Joseph Doherty bed8c8e12b Remove equipment protocol survey — driver list confirmed by v2 team
The OtOpcUa v2 implementation team committed all 8 core drivers from
internal knowledge of the estate, making the formal protocol survey
unnecessary for driver scoping. Removed
current-state/equipment-protocol-survey.md and cleaned up all references
across 7 files.

The UNS hierarchy snapshot (per-site equipment-instance walk for site/area/
line/equipment assignments + UUIDs) is now a standalone Year 1 deliverable,
decoupled from protocol discovery. Tracked in status.md and goal-state.md
UNS naming hierarchy section.

Eliminates ~52 TBDs (all placeholder data in the pre-seeded survey rows).
2026-04-17 11:54:46 -04:00


# Current State
Snapshot of today's shopfloor IT/OT interfaces and data collection. Keep this updated as discovery progresses.
> When a section below grows beyond a few paragraphs, break it out into `current-state/<component>.md` and leave a short summary + link here. See [`CLAUDE.md`](CLAUDE.md#breaking-out-components).
## Enterprise Layout
### Primary Data Center
- **South Bend Data Center** — primary data center.
### Largest Sites
- **Warsaw West campus**
- **Warsaw North campus**
> Largest sites run **one server cluster per production building** (each larger production building gets its own dedicated cluster of equipment servers).
### Other Integrated Sites
- **Shannon**
- **Galway**
- **TMT**
- **Ponce**
> Other integrated sites run a **single server cluster** covering the whole site.
### Not Yet Integrated
- A number of **smaller sites globally** are **not yet integrated** into the current SCADA system. Known examples include:
- **Berlin**
- **Winterthur**
- **Jacksonville**
- _…others — see note on volatility below._
- Characteristic: these tend to be **smaller footprint** sites distributed across multiple regions (EU, US, etc.), likely requiring a lighter-weight onboarding pattern than the large Warsaw campuses.
- **Volatility note:** the list of smaller sites is **expected to change** — sites may be added, removed, reprioritized, or handled by adjacent programs. This file deliberately **does not** dive into per-site detail (equipment, PLC vendors, network topology, etc.) for the smaller sites because that detail would go stale quickly. Rely on the named examples as illustrative rather than authoritative until a firm enterprise-wide site list is established.
## Systems & Interfaces
### SCADA — Split Stack
SCADA responsibilities are split across two platforms by purpose:
- **Aveva System Platform** — used for **validated data collection** (regulated/compliance-grade data).
- **Ignition SCADA** — used for **KPI** monitoring and reporting.
### Aveva System Platform
- **Role:** validated data collection (see SCADA split above).
- **Primary cluster:** hosted in the **South Bend Data Center**.
- **Site clusters:** each integrated site runs its own **site-level application server cluster** on Aveva System Platform (at the largest Warsaw campuses this is one cluster per production building — see Enterprise Layout).
- **Version:** **Aveva System Platform 2023 R2** across the estate. _TBD — whether every cluster is actually at 2023 R2 (confirm no version skew between primary and site clusters) and the patch/update level within 2023 R2._
- **Galaxy structure:** federation is handled entirely through **Global Galaxy** — that is the structural shape of the System Platform estate. Individual site galaxies and the primary cluster galaxy are tied together via Global Galaxy rather than a separate enterprise-galaxy layer on top. _TBD — exact count of underlying galaxies, naming, and which objects live where in the federation._
- **Inter-cluster communication:** clusters talk to each other via this Global Galaxy federation.
- **Redundancy model:** **hot-warm pairs** — Aveva System Platform's standard AppEngine redundancy pattern. Each engine runs a hot primary with a warm standby partner; the warm partner takes over on primary failure. Applies across both the primary cluster in South Bend and the site-level application server clusters. _TBD — which engines specifically run as redundant pairs (not every engine in a galaxy typically does), failover drill cadence, and how redundancy interacts with Global Galaxy federation during a failover._
- **Web API interface:** a Web API runs on the **primary cluster** (South Bend), serving as the enterprise-level integration entry point. It currently exposes two integration interfaces:
- **Delmia DNC** — interface for DNC (file/program distribution) integration.
- **Camstar MES** — interface for MES integration.
- **Out of scope for this plan:** licensing posture. License model and renewal strategy are not tracked here even if they shift as Redpanda-based event flows offload work from System Platform.
- _TBD — patch/update level within 2023 R2, full Galaxy structure detail, and per-engine redundancy specifics (all tracked inline above)._
### Ignition SCADA
- **Role:** KPI monitoring and reporting (see SCADA split above).
- **Deployment topology:** **centrally hosted in the South Bend Data Center** today. Ignition is **not** deployed per-site — there is a single central Ignition footprint, and every site's KPI UX reaches it over the WAN. This is the opposite of the Aveva System Platform topology (which has site-level clusters) and means Ignition KPI UX at a site depends on WAN reachability to South Bend.
- **Data source today: direct OPC UA from equipment.** Ignition pulls data **directly over OPC UA** from equipment — it does **not** go through ScadaBridge, LmxOpcUa, Aveva Historian, or the Global Galaxy to get its values. Because Ignition is centrally hosted in South Bend, this means **OPC UA connections run from South Bend to every site's equipment over the WAN**.
- **Contrast with ScadaBridge:** ScadaBridge is built around a data-locality principle (equipment talks to the local site's ScadaBridge instance). Ignition does the opposite today — equipment talks to a remote central Ignition over WAN OPC UA.
- **Implication for WAN outages:** during a WAN outage between a site and South Bend, Ignition loses access to that site's equipment — KPI UX for that site goes stale until the WAN recovers. This is a known characteristic of the current topology, not a defect to fix piecemeal; any remediation belongs in the goal-state discussion about Ignition's future deployment shape.
- **Version:** **Ignition 8.3**.
- **Modules in use:**
- **Perspective** — Ignition's web-native UX module, used for the KPI user interface.
- **OPC UA** — used to pull data directly from equipment (see data source above).
- **Reporting** — used for KPI/operational reports on top of Ignition.
- Notably **not** in use: **Tag Historian** (Aveva Historian is the sole historian in the estate), **Vision** (Perspective is the only UX module), and no third-party modules (no Sepasoft MES, no Cirrus Link MQTT, etc.).
- _TBD — whether a per-site or regional Ignition footprint is on the roadmap given the WAN-dependency implication, and the patch level within 8.3._
### ScadaBridge (in-house)
- **What:** clustered **Akka.NET** application built in-house.
- **Role:** interfaces with **OPC UA** sources, bridging device/equipment data into the broader SCADA stack.
- **Capabilities:**
- **Scripting** — custom logic can be written and executed inside the bridge. Scripts run in **C# via Roslyn scripting** (the same language as ScadaBridge itself), so users can reuse .NET libraries and ScadaBridge's internal types without an extra binding layer.
- **Templating** — reusable templates for configuring devices/data flows at scale. **Authoring and distribution model:**
- Templates are authored in a **UI** (not hand-edited files).
- The UI writes template definitions to a **central database** that serves as the source of truth for all templates across the enterprise.
- When templates are updated, changes are **serialized and pushed from the central DB out to the site server clusters**, so every ScadaBridge cluster runs a consistent, up-to-date template set without requiring per-site edits.
- _TBD — serialization format on the wire, push mechanism (pull vs push), conflict/version handling if a site is offline during an update, audit trail of template changes._
- **Secure Web API (inbound)** — external systems can interface with ScadaBridge over an authenticated Web API. Authentication is handled via **API keys** — clients present a static, per-client API key on each call. _TBD — key issuance and rotation process, storage at the client side, scoping (per client vs per capability), revocation process, audit trail of key usage._
- **Web API client (outbound) — pre-configured, script-callable.** ScadaBridge provides a generic outbound Web API client capability: **any Web API can be pre-configured** (endpoint URL, credentials, headers, auth scheme, etc.) and then **called easily from scripts** using the configured name. There is no hard-coded list of "known" external Web APIs — the set of callable APIs is whatever is configured today, and new APIs can be added without ScadaBridge code changes.
- **Notifications — contact-list driven.** ScadaBridge maintains **contact lists** (named groups of recipients) as a first-class concept. Scripts send notifications **to a contact list**; ScadaBridge handles the delivery. **Supported transport today: email only** (SMTP-based with OAuth2 Client Credentials for modern providers; plain-text message bodies, no HTML). Scripts do not care about the transport — they call a single "notify" capability against a named contact list, and routing/fan-out happens inside ScadaBridge. New contact lists and new recipients can be added without script changes. _Note: an earlier version of this plan claimed Microsoft Teams support — the ScadaBridge design repo does not document Teams as an implemented transport. If Teams support is needed, it would be a ScadaBridge extension, not an existing capability._
- **Database writes** — can write to databases as a sink for collected/processed data. **Supported target today: SQL Server only** — other databases (PostgreSQL, Oracle, etc.) are not currently supported. _TBD — whether a generic ADO.NET / ODBC path is planned to broaden support, or whether SQL Server is intentionally the only target._
- **Equipment writes via OPC UA** — can write back to equipment over OPC UA (not just read).
- **EventHub forwarding (committed, not yet implemented)** — designed to forward events to an **EventHub** (Kafka-compatible) for async downstream consumers. The design is committed and the architecture accommodates it, but the implementation does not exist yet in the ScadaBridge codebase. Year 1 ScadaBridge Extensions workstream in [`roadmap.md`](roadmap.md) includes this as the "EventHub producer" deliverable.
- **Store-and-forward (per-call, optional)** — Web API calls, notifications, and database interactions can **optionally** be cached in a **store-and-forward** queue on a **per-call basis**. If the downstream target is unreachable (WAN outage, target down), the call is persisted locally and replayed when connectivity returns — preserving site resilience without forcing every integration to be async.
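The per-call store-and-forward behavior can be sketched as follows — illustrative Python only, not the Akka.NET implementation; the table schema, class name, and delivery-callback shape are assumptions:

```python
import json
import sqlite3
import time


class StoreAndForwardQueue:
    """Toy model of a per-call store-and-forward buffer: calls that fail to
    reach their target are persisted locally and replayed in order once the
    target is reachable again."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(id INTEGER PRIMARY KEY, target TEXT, payload TEXT, queued_at REAL)")

    def send(self, target, payload, deliver):
        """Try to deliver now; on failure, persist the call for later replay."""
        try:
            deliver(target, payload)
            return True
        except ConnectionError:
            self.db.execute(
                "INSERT INTO pending (target, payload, queued_at) VALUES (?, ?, ?)",
                (target, json.dumps(payload), time.time()))
            self.db.commit()
            return False

    def replay(self, deliver):
        """Replay persisted calls in FIFO order; stop at the first failure."""
        delivered = 0
        rows = self.db.execute(
            "SELECT id, target, payload FROM pending ORDER BY id").fetchall()
        for row_id, target, payload in rows:
            try:
                deliver(target, json.loads(payload))
            except ConnectionError:
                break  # target still down; keep the rest queued
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            delivered += 1
        self.db.commit()
        return delivered
```

The key property the sketch preserves: the caller opts in per call, and ordering is maintained across the outage window.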
- **Deployment topology:** runs as **2-node Akka.NET clusters**, **co-located on the existing Aveva System Platform cluster nodes** (no dedicated hardware — shares the same physical/virtual nodes that host System Platform).
- **Benchmarked throughput (OPC UA ingestion ceiling):** a single 2-node site cluster has been **benchmarked** to handle **~225,000 OPC UA updates per second** at the ingestion layer. This is the **input rate ceiling**, not the downstream work rate — triggered events, script executions, Web API calls, DB writes, EventHub forwards, and notifications all happen on a filtered subset of those updates and run at significantly lower rates. _TBD — actual production load per site (typically far below this ceiling), downstream work-rate profile (what percent of ingested updates trigger work), whether the benchmark was sustained or peak, and the co-located System Platform node headroom at benchmark load._
- **Dual transport (Akka.NET + gRPC):** cluster-internal communication uses two channels — **Akka.NET ClusterClient** for command/control and **gRPC server-streaming** for real-time data. This is a design-repo-documented architectural detail, not visible to external consumers.
- **Supervision model (Akka.NET):** ScadaBridge uses Akka.NET supervision to self-heal around transient failures. Concretely:
- **OPC UA connection restarts.** When an OPC UA source disconnects, returns malformed data, or stalls, ScadaBridge **restarts the connection to that source** rather than letting the failure propagate up. Individual source failures are isolated from each other.
- **Actor tree restarts on failure.** When a failure escalates beyond a single connection (e.g., a faulty script or a downstream integration wedged in an unrecoverable state), ScadaBridge can **restart the affected actor tree**, bringing its children back to a known-good state without taking the whole cluster down. The Deployment Manager supervises Instance Actors with a **OneForOneStrategy**; Script/Alarm Actors use **Resume** (preserves coordinator state); Execution Actors use **Stop**.
- **Split-brain resolver: keep-oldest strategy** with 15-second stability windows and `down-if-alone = on` safeguards. Approximate failover time: **~25 seconds** (detection + stabilization).
- **Staggered batch startup** — the Deployment Manager creates Instance Actors in staggered batches (e.g., 20 at a time) to avoid overwhelming OPC UA servers and network capacity during cluster start or restart.
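The keep-oldest / `down-if-alone` decision can be modeled in a few lines — a toy sketch, not Akka.NET's actual split-brain resolver (which also applies the 15-second stability window before deciding); the member representation as `(name, join_order)` tuples is an assumption:

```python
def survives(reachable, unreachable, down_if_alone=True):
    """Decide whether this side of a network partition keeps running.

    Each member is a (node_name, join_order) tuple; the lowest join_order
    in the cluster is the 'oldest' node. Keep-oldest: the partition that
    still sees the oldest member survives -- unless the oldest is alone
    and down_if_alone is set, in which case it downs itself and the other
    partition survives instead."""
    everyone = reachable + unreachable
    oldest = min(everyone, key=lambda m: m[1])
    oldest_alone = (
        (oldest in reachable and len(reachable) == 1)
        or (oldest in unreachable and len(unreachable) == 1))
    if down_if_alone and oldest_alone and len(everyone) > 1:
        return oldest not in reachable  # lone oldest downs itself
    return oldest in reachable
```

In the 2-node ScadaBridge clusters this means a clean partition downs the oldest node and keeps its partner, rather than letting both halves run.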
- **Central UI: Blazor Server.** Full web-based authoring environment for templates, scripts, external system definitions, and deployment workflows. Role-based access with three levels (**Admin**, **Design**, **Deployment**). **LDAP / Active Directory integration** with JWT session management (15-minute expiry, sliding refresh, HttpOnly/Secure cookies). Includes **real-time debug streaming** via SignalR WebSocket for live attribute values and alarm state changes.
- **Audit logging.** Comprehensive synchronous audit entries within the same database transaction as changes. After-state serialized as JSON for full change-history reconstruction.
- **Three-phase deployment process.** Central nodes first, database validation required, staggered site deployment, rollback strategy with retained binaries and backups.
- **Site-level configuration: SQLite.** Sites receive only **flattened configurations** and read from a **local SQLite** database, not a full SQL Server instance. The central SQL Server configuration database is the source of truth; sites consume a read-only serialized subset.
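The same-transaction audit pattern described above can be sketched as below — illustrative Python against SQLite for self-containment (the real audit store is the central SQL Server configuration database); table names, columns, and the function shape are assumptions:

```python
import json
import sqlite3


def apply_with_audit(db, template_id, new_body, user):
    """Apply a template change and write its audit entry in ONE transaction.

    If either write fails, both roll back, so the audit trail can never
    drift from the data. The after-state is serialized as JSON so the full
    change history can be reconstructed later."""
    with db:  # sqlite3 connection as context manager: commit or roll back atomically
        db.execute("UPDATE templates SET body = ? WHERE id = ?",
                   (new_body, template_id))
        after = db.execute(
            "SELECT id, body FROM templates WHERE id = ?",
            (template_id,)).fetchone()
        db.execute(
            "INSERT INTO audit (entity, entity_id, after_state, changed_by) "
            "VALUES (?, ?, ?, ?)",
            ("template", template_id,
             json.dumps({"id": after[0], "body": after[1]}), user))
```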
- **Downstream consumers / integration targets in production today:**
- **Aveva System Platform — via LmxOpcUa.** ScadaBridge interacts with System Platform through the in-house **LmxOpcUa** server rather than a direct System Platform API. LmxOpcUa exposes System Platform objects over OPC UA; ScadaBridge reads from and writes to System Platform through that OPC UA surface. This is the primary OT-side consumer.
- **Internal Web APIs.** ScadaBridge makes outbound calls to internal enterprise Web APIs using its generic pre-configured Web API client capability (see Capabilities above). Because any Web API can be configured dynamically, there is no fixed enumeration of "ScadaBridge's Web API integrations" to capture here; specific IT↔OT Web API crossings land in the legacy integrations inventory (`current-state/legacy-integrations.md`) regardless of whether they're reached via ScadaBridge's generic client.
- **Batch tracking database.** ScadaBridge writes batch tracking records directly to a SQL Server batch tracking database.
- **Camstar MES — direct.** ScadaBridge integrates with Camstar via a **direct outbound Web API call from ScadaBridge to Camstar**, using its own Web API client and credentials. It does **not** go through the Aveva System Platform primary cluster's Camstar Web API interface (LEG-002). This means ScadaBridge already has a native Camstar path; the LEG-002 retirement work is about moving the **other** consumers of that System Platform Web API off it, not about building a new ScadaBridge-to-Camstar path.
- _TBD — other databases written to besides batch tracking, and any additional consumers not listed here. **Enumeration of internal Web API endpoints is not tracked here** because ScadaBridge's Web API client is generic/configurable (see Capabilities); specific IT↔OT Web API crossings that need migration live in `current-state/legacy-integrations.md`. **Notification destination teams are similarly not enumerated** because they're contact-list-driven and transport-agnostic (see Capabilities) — the list of actual recipients lives in ScadaBridge's configuration, not in this plan._
- **Routing topology:**
- **Hub-and-spoke** — ScadaBridge nodes on the **central cluster (South Bend)** can route to ScadaBridge nodes on other clusters, forming a hub-and-spoke network with the central cluster as the hub.
- **Direct access** — site-level ScadaBridge clusters can also be reached directly (not only via the hub), enabling point-to-point integration where appropriate.
- **Data locality (design principle):** ScadaBridge is designed to **keep local data sources localized** — equipment at a site communicates with the **local ScadaBridge instance** at that site, not with the central cluster. This minimizes cross-site/WAN traffic, reduces latency, and keeps site operations resilient to WAN outages.
- **Deployment status:** ScadaBridge is **already deployed** across the current cluster footprint. However, **not all legacy API integrations have been migrated onto it yet** — some older point-to-point integrations still run outside ScadaBridge and need to be ported. The authoritative inventory of these integrations (and their retirement tracking against `goal-state.md` pillar 3) lives in [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md).
- _TBD — resource impact of co-location with System Platform at the largest sites; whether any additional downstream consumers exist beyond those listed above; whether the notification capability will be extended to support Microsoft Teams (not currently implemented)._
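The contact-list-driven notification capability above reduces to a simple fan-out shape — a hedged sketch, not ScadaBridge's actual API; the class, method names, and injected transport are assumptions:

```python
class Notifier:
    """Toy model of contact-list-driven notification fan-out.

    Scripts call notify(list_name, message); resolving recipients and
    invoking the transport (email-only today) happens inside the notifier,
    so recipients can change without touching any script."""

    def __init__(self, send_email):
        self.contact_lists = {}       # list name -> [email addresses]
        self.send_email = send_email  # transport, injected (SMTP in production)

    def add_contact(self, list_name, address):
        self.contact_lists.setdefault(list_name, []).append(address)

    def notify(self, list_name, message):
        recipients = self.contact_lists.get(list_name, [])
        for address in recipients:  # fan-out is the notifier's job, not the script's
            self.send_email(address, message)
        return len(recipients)
```

Adding a second transport later (e.g. Teams, if that extension is ever built) would slot in behind the same `notify` call without script changes — which is the point of the abstraction.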
### LmxOpcUa (in-house)
- **What:** in-house **OPC UA server** with **tight integration to Aveva System Platform**.
- **Role:** exposes System Platform data/objects via OPC UA, enabling OPC UA clients (including ScadaBridge and third parties) to consume System Platform data natively.
- **Goal-state note:** in the target architecture, LmxOpcUa is **folded into `OtOpcUa`** — the new unified site-level OPC UA layer. Its System Platform namespace carries forward; it runs alongside a new equipment namespace on the same per-site clustered OPC UA server. See [`goal-state.md`](goal-state.md) → **OtOpcUa — the unified site-level OPC UA layer** for the fold-in details.
- **Deployment footprint:** built and deployed to **every Aveva System Platform node** — primary cluster in South Bend and every site-level application server cluster. LmxOpcUa is not a centralized gateway; each System Platform node runs its own local instance, so OPC UA clients can reach the System Platform objects hosted on that node directly.
- **Namespace source:** each LmxOpcUa instance is built to interface with its **local Application Server's LMX API**. The OPC UA address space exposed by a given LmxOpcUa node reflects the System Platform objects reachable through **that node's** LMX API — i.e., the namespace is inherently per-node and scoped to whatever the local App Server surfaces. Cross-node visibility happens at the System Platform / Global Galaxy layer, not at the LmxOpcUa layer.
- **Security model:** standard **OPC UA security** — supports the standard OPC UA security modes (`None` / `Sign` / `SignAndEncrypt`) and standard security profiles (`Basic256Sha256` and related), with **UserName token** authentication for clients. No bespoke auth scheme. _TBD — which security mode + profile combinations are required vs allowed in production, where the UserName credentials come from (local accounts, AD/LDAP, a dedicated credential store), and how credentials are rotated and audited._
- _TBD — exact OPC UA namespace shape exposed to clients (hierarchy mirroring Galaxy areas/objects vs flat vs custom), and how ScadaBridge templates address equipment across multiple per-node LmxOpcUa instances._
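The per-node consequence can be made concrete: because each System Platform node runs its own LmxOpcUa instance, "which server do I dial" is part of an object's address. A hedged sketch — the endpoint names, namespace index, and NodeId format are all illustrative, not the real LmxOpcUa address space (which is itself a TBD above):

```python
# Hypothetical per-node LmxOpcUa endpoints (names are invented for illustration).
SITE_ENDPOINTS = {
    ("warsaw-west", "appsrv01"): "opc.tcp://ww-appsrv01:4840",
    ("warsaw-west", "appsrv02"): "opc.tcp://ww-appsrv02:4840",
}


def address_galaxy_object(site, node, object_path, ns=2):
    """Resolve (endpoint, NodeId) for a Galaxy object.

    The NodeId alone is only meaningful against that node's local
    namespace -- the same object path on a different node is a different
    address because it lives behind a different server."""
    endpoint = SITE_ENDPOINTS[(site, node)]
    node_id = "ns={};s={}".format(ns, ".".join(object_path))
    return endpoint, node_id
```

This is also exactly the wrinkle flagged in the TBD above: ScadaBridge templates have to carry the node dimension, not just the object path.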
### Equipment OPC UA — multiple direct connections today
- **Current access pattern:** some equipment is connected to by **multiple systems directly**, concurrently, rather than through a single shared access layer. Depending on the equipment, any of the following may hold OPC UA sessions against it at the same time:
- **Aveva System Platform** (for validated data collection via its IO drivers)
- **Ignition SCADA** (for KPI data, central from South Bend over the WAN — see Ignition data source)
- **ScadaBridge** (for bridge/integration workloads via its Akka.NET OPC UA client)
- **Consequences of the current pattern:**
- **Multiple OPC UA sessions per equipment.** Equipment takes the session load of every consumer independently, which can strain devices with limited concurrent-session support.
- **No single access-control point.** Authorization is enforced by whatever each consumer happens to present to the equipment — no site-level chokepoint exists to inspect, audit, or limit equipment access.
- **Inconsistent data.** The same tag read by three different consumers can produce three subtly different values (different sampling intervals, different deadbands, different session buffers).
- _TBD — exact inventory of which equipment is reached by which consumers today; whether any equipment is already fronted by a shared OPC UA aggregator at the site level._
- **Equipment protocols — resolved.** The OtOpcUa v2 implementation design has committed the core driver library based on the team's internal knowledge of the estate: OPC UA Client, Modbus TCP, AB CIP, AB Legacy (PCCC), Siemens S7, Beckhoff TwinCAT (ADS), FANUC FOCAS, plus the Galaxy driver carried forward from LmxOpcUa. See [`goal-state.md`](goal-state.md) → OtOpcUa → Driver strategy for the full list and stability tiers. A formal protocol survey is no longer needed for driver scoping; the **UNS hierarchy snapshot** (equipment-instance-level site/area/line/equipment walk) is still required — see [`goal-state.md`](goal-state.md) → UNS naming hierarchy standard.
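The "inconsistent data" consequence above is worth making concrete. A toy model — not any real consumer's sampling engine; the numbers and function shape are invented — showing how three direct sessions with different sampling intervals and deadbands report different "current" values for the same tag:

```python
def last_reported(samples, period, deadband):
    """Replay a raw (t, value) stream through one consumer's session
    settings and return the value that consumer would report at the end.

    The consumer samples at most every `period` seconds and only registers
    a change when it exceeds `deadband` -- a toy model of independent
    OPC UA monitored-item settings."""
    reported = None
    next_sample = 0.0
    for t, value in samples:
        if t >= next_sample:
            if reported is None or abs(value - reported) > deadband:
                reported = value
            next_sample = t + period
    return reported
```

Three consumers, three answers for the same physical signal — which is why a shared site-level access layer, with one set of sampling semantics, matters.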
### Camstar MES (sole MES)
- **Role:** the **only MES** in use across the estate. There are no other MES products at any site — Camstar is the enterprise-wide system.
- **Integration today:** accessed from the shopfloor via the **Camstar interface on the Aveva System Platform primary cluster's Web API** (LEG-002 in the legacy integrations inventory), and separately by **ScadaBridge** via its own direct outbound Web API call (see ScadaBridge downstream consumers).
- _TBD — Camstar version, hosting (on-prem vs SaaS), owner team, which business capabilities it covers._
### Aveva Historian (sole historian)
- **Role:** the **only historian** in use across the estate. No other historian products (OSIsoft PI, Canary, GE Proficy, etc.) run at any site.
- **Deployment topology: central-only in the South Bend Data Center.** A single Aveva Historian instance in South Bend serves the entire estate. There are **no per-site tier-1 historians**, and there is **no tier-1 → tier-2 replication** model in play today — every site's historian data lands directly in the central South Bend historian.
- **Implication for ingestion:** the SnowBridge reads from **one** historian, not many — no per-site historian enumeration, no replication topology to account for.
- **Implication for WAN:** because the historian is central, the collection path from a site's System Platform cluster to the historian already crosses the WAN today. This is a pre-existing WAN dependency, not something this plan introduces.
- **Version:** **2023 R2**, on the same release cadence as Aveva System Platform.
- **Retention policy: permanent.** No TTL or rollup is applied — historian data is retained **forever** as a matter of policy. This means the "drill-down to Historian for raw data" pattern in `goal-state.md` works at any historical horizon, and the historian is the authoritative long-term system of record for validated tag data regardless of how much Snowflake chooses to store.
- **Integration role:** serves as the system of record for validated/compliance-grade tag data collected via Aveva System Platform, and exposes a **SQL interface** (OPENQUERY and history views) for read access. Downstream use of that SQL interface for Snowflake ingestion is discussed in `goal-state.md` under Aveva Historian → Snowflake.
- **Current consumers (reporting):** the primary consumer of Historian data today is **enterprise reporting**, currently on **SAP BusinessObjects (BOBJ)**. Reporting is actively **migrating from SAP BOBJ to Power BI** — this is an in-flight transition that this plan should be aware of but does not own.
- **Implication for pillar 2:** the "enterprise analytics/AI enablement" target in `goal-state.md` sits alongside this Power BI migration, not in competition with it. Whether Power BI consumes from Snowflake (via the dbt curated layer), from Historian directly, or from both is a TBD that coordinates between the two initiatives.
- _TBD — current storage footprint and growth rate, other consumers beyond reporting (e.g., Aveva Historian Client / Insight / Trend tools, ad-hoc analyst SQL, regulatory/audit exports), and how the BOBJ→Power BI migration coordinates with the Snowflake path for machine data._
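The SQL interface mentioned above is typically driven through a linked server. A hedged sketch of composing such a query in Python — the `History` view and the `INSQL` linked-server name follow the common Historian setup but should be confirmed against this estate before anything depends on them:

```python
def history_query(tags, start, end, linked_server="INSQL"):
    """Compose the T-SQL a downstream consumer (e.g. Snowflake ingestion)
    might run against the central Historian's SQL interface.

    OPENQUERY wraps the inner query in single quotes, so literals inside
    it are escaped by doubling the quote characters."""
    tag_list = ", ".join("''{}''".format(t) for t in tags)
    inner = (
        "SELECT DateTime, TagName, Value FROM History "
        "WHERE TagName IN ({tags}) "
        "AND DateTime >= ''{start}'' AND DateTime <= ''{end}''"
    ).format(tags=tag_list, start=start, end=end)
    return "SELECT * FROM OPENQUERY({}, '{}')".format(linked_server, inner)
```

Because the historian is central-only, one query surface like this covers the whole estate — no per-site endpoint fan-out.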
_TBD — additional shopfloor systems and HMIs not covered above (if any)._
## IT/OT Integration Points
_TBD — how IT and OT layers currently connect (protocols, gateways, brokers)._
## Data Collection
_TBD — what data is collected, how, where it lands, frequency, gaps._
## Operator / User Interfaces
_TBD — current UIs operators interact with, pain points._
## Known Pain Points & Constraints
_TBD._
## Stakeholders
_TBD._