3yearplan/current-state.md
Joseph Doherty ec1dfe59e4 Initial commit: 3-year shopfloor IT/OT transformation plan
Core plan: current-state, goal-state (layered architecture, OtOpcUa,
Redpanda EventHub, SnowBridge, canonical model, UNS posture + naming
hierarchy, digital twin use cases absorbed), roadmap (7 workstreams x 3
years), and status bookmark.

Component detail files: legacy integrations inventory (3 integrations,
pillar 3 denominator closed), equipment protocol survey template (dual
mandate with UNS hierarchy snapshot), digital twin management brief
(conversation complete, outcome recorded).

Output generation pipeline: specs for 18-slide mixed-stakeholder PPTX
and faithful-typeset PDF, with README, design doc, and implementation
plan. No generated outputs yet — deferred until source data is stable.
2026-04-17 09:12:35 -04:00


Current State

Snapshot of today's shopfloor IT/OT interfaces and data collection. Keep this updated as discovery progresses.

When a section below grows beyond a few paragraphs, break it out into current-state/<component>.md and leave a short summary + link here. See CLAUDE.md.

Enterprise Layout

Primary Data Center

  • South Bend Data Center — primary data center.

Largest Sites

  • Warsaw West campus
  • Warsaw North campus

The largest sites run one server cluster per production building: each larger production building gets its own dedicated cluster of equipment servers.

Other Integrated Sites

  • Shannon
  • Galway
  • TMT
  • Ponce

Other integrated sites run a single server cluster covering the whole site.

Not Yet Integrated

  • A number of smaller sites globally are not yet integrated into the current SCADA system. Known examples include:
    • Berlin
    • Winterthur
    • Jacksonville
    • …others — see note on volatility below.
  • Characteristic: these tend to be smaller footprint sites distributed across multiple regions (EU, US, etc.), likely requiring a lighter-weight onboarding pattern than the large Warsaw campuses.
  • Volatility note: the list of smaller sites is expected to change — sites may be added, removed, reprioritized, or handled by adjacent programs. This file deliberately does not dive into per-site detail (equipment, PLC vendors, network topology, etc.) for the smaller sites because that detail would go stale quickly. Rely on the named examples as illustrative rather than authoritative until a firm enterprise-wide site list is established.

Systems & Interfaces

SCADA — Split Stack

SCADA responsibilities are split across two platforms by purpose:

  • Aveva System Platform — used for validated data collection (regulated/compliance-grade data).
  • Ignition SCADA — used for KPI monitoring and reporting.

Aveva System Platform

  • Role: validated data collection (see SCADA split above).
  • Primary cluster: hosted in the South Bend Data Center.
  • Site clusters: each smaller site runs its own site-level application server cluster on Aveva System Platform.
  • Version: Aveva System Platform 2023 R2 across the estate. TBD — whether every cluster is actually at 2023 R2 (confirm no version skew between primary and site clusters) and the patch/update level within 2023 R2.
  • Galaxy structure: federation is handled entirely through Global Galaxy — that is the structural shape of the System Platform estate. Individual site galaxies and the primary cluster galaxy are tied together via Global Galaxy rather than a separate enterprise-galaxy layer on top. TBD — exact count of underlying galaxies, naming, and which objects live where in the federation.
  • Inter-cluster communication: clusters talk to each other via this Global Galaxy federation.
  • Redundancy model: hot-warm pairs — Aveva System Platform's standard AppEngine redundancy pattern. Each engine runs a hot primary with a warm standby partner; the warm partner takes over on primary failure. Applies across both the primary cluster in South Bend and the site-level application server clusters. TBD — which engines specifically run as redundant pairs (not every engine in a galaxy typically does), failover drill cadence, and how redundancy interacts with Global Galaxy federation during a failover.
  • Web API interface: a Web API runs on the primary cluster (South Bend), serving as the enterprise-level integration entry point. It currently exposes two integration interfaces:
    • Delmia DNC — interface for DNC (file/program distribution) integration.
    • Camstar MES — interface for MES integration.
  • Out of scope for this plan: licensing posture. License model and renewal strategy are not tracked here even if they shift as Redpanda-based event flows offload work from System Platform.
  • TBD — patch/update level within 2023 R2, full Galaxy structure detail, and per-engine redundancy specifics (all tracked inline above).

Ignition SCADA

  • Role: KPI monitoring and reporting (see SCADA split above).
  • Deployment topology: centrally hosted in the South Bend Data Center today. Ignition is not deployed per-site — there is a single central Ignition footprint, and every site's KPI UX reaches it over the WAN. This is the opposite of the Aveva System Platform topology (which has site-level clusters) and means Ignition KPI UX at a site depends on WAN reachability to South Bend.
  • Data source today: direct OPC UA from equipment. Ignition does not go through ScadaBridge, LmxOpcUa, Aveva Historian, or the Global Galaxy to get its values. Because Ignition is centrally hosted, its OPC UA connections run from South Bend to every site's equipment over the WAN.
    • Contrast with ScadaBridge: ScadaBridge is built around a data-locality principle (equipment talks to the local site's ScadaBridge instance). Ignition does the opposite today — equipment talks to a remote central Ignition over WAN OPC UA.
    • Implication for WAN outages: during a WAN outage between a site and South Bend, Ignition loses access to that site's equipment — KPI UX for that site goes stale until the WAN recovers. This is a known characteristic of the current topology, not a defect to fix piecemeal; any remediation belongs in the goal-state discussion about Ignition's future deployment shape.
  • Version: Ignition 8.3.
  • Modules in use:
    • Perspective — Ignition's web-native UX module, used for the KPI user interface.
    • OPC UA — used to pull data directly from equipment (see data source above).
    • Reporting — used for KPI/operational reports on top of Ignition.
    • Notably not in use: Tag Historian (Aveva Historian is the sole historian in the estate), Vision (Perspective is the only UX module), and third-party modules (no Sepasoft MES, no Cirrus Link MQTT, etc.).
  • TBD — whether a per-site or regional Ignition footprint is on the roadmap given the WAN-dependency implication, and the patch level within 8.3.

ScadaBridge (in-house)

  • What: clustered Akka.NET application built in-house.
  • Role: interfaces with OPC UA sources, bridging device/equipment data into the broader SCADA stack.
  • Capabilities:
    • Scripting — custom logic can be written and executed inside the bridge. Scripts run in C# via Roslyn scripting (the same language as ScadaBridge itself), so users can reuse .NET libraries and ScadaBridge's internal types without an extra binding layer.
    • Templating — reusable templates for configuring devices/data flows at scale. Authoring and distribution model:
      • Templates are authored in a UI (not hand-edited files).
      • The UI writes template definitions to a central database that serves as the source of truth for all templates across the enterprise.
      • When templates are updated, changes are serialized and pushed from the central DB out to the site server clusters, so every ScadaBridge cluster runs a consistent, up-to-date template set without requiring per-site edits.
      • TBD — serialization format on the wire, push mechanism (pull vs push), conflict/version handling if a site is offline during an update, audit trail of template changes.
    • Secure Web API (inbound) — external systems can interface with ScadaBridge over an authenticated Web API. Authentication is handled via API keys — clients present a static, per-client API key on each call. TBD — key issuance and rotation process, storage at the client side, scoping (per client vs per capability), revocation process, audit trail of key usage.
    • Web API client (outbound) — pre-configured, script-callable. ScadaBridge provides a generic outbound Web API client capability: any Web API can be pre-configured (endpoint URL, credentials, headers, auth scheme, etc.) and then called easily from scripts using the configured name. There is no hard-coded list of "known" external Web APIs — the set of callable APIs is whatever is configured today, and new APIs can be added without ScadaBridge code changes.
    • Notifications — contact-list driven, transport-agnostic. ScadaBridge maintains contact lists (named groups of recipients) as a first-class concept. Scripts send notifications to a contact list; ScadaBridge handles the delivery over the appropriate transport (email or Microsoft Teams) based on how the contacts are configured. Scripts do not care about the transport — they call a single "notify" capability against a named contact list, and routing/fan-out happens inside ScadaBridge. New contact lists and new recipients can be added without script changes.
    • Database writes — can write to databases as a sink for collected/processed data. Supported target today: SQL Server only — other databases (PostgreSQL, Oracle, etc.) are not currently supported. TBD — whether a generic ADO.NET / ODBC path is planned to broaden support, or whether SQL Server is intentionally the only target.
    • Equipment writes via OPC UA — can write back to equipment over OPC UA (not just read).
    • EventHub forwarding — can forward events to an EventHub (Kafka-compatible) for async downstream consumers.
    • Store-and-forward (per-call, optional) — Web API calls, notifications, and database interactions can optionally be cached in a store-and-forward queue on a per-call basis. If the downstream target is unreachable (WAN outage, target down), the call is persisted locally and replayed when connectivity returns — preserving site resilience without forcing every integration to be async.
  • Deployment topology: runs as 2-node Akka.NET clusters, co-located on the existing Aveva System Platform cluster nodes (no dedicated hardware — shares the same physical/virtual nodes that host System Platform).
  • Benchmarked throughput (OPC UA ingestion ceiling): a single 2-node site cluster has been benchmarked to handle ~225,000 OPC UA updates per second at the ingestion layer. This is the input rate ceiling, not the downstream work rate — triggered events, script executions, Web API calls, DB writes, EventHub forwards, and notifications all happen on a filtered subset of those updates and run at significantly lower rates. TBD — actual production load per site (typically far below this ceiling), downstream work-rate profile (what percent of ingested updates trigger work), whether the benchmark was sustained or peak, and the co-located System Platform node headroom at benchmark load.
  • Supervision model (Akka.NET): ScadaBridge uses Akka.NET supervision to self-heal around transient failures. Concretely:
    • OPC UA connection restarts. When an OPC UA source disconnects, returns malformed data, or stalls, ScadaBridge restarts the connection to that source rather than letting the failure propagate up. Individual source failures are isolated from each other.
    • Actor tree restarts on failure. When a failure escalates beyond a single connection (e.g., a faulty script or a downstream integration wedged in an unrecoverable state), ScadaBridge can restart the affected actor tree, bringing its children back to a known-good state without taking the whole cluster down.
    • TBD — specific supervision strategies per actor tier (OneForOne vs AllForOne, restart limits, backoff), what failures escalate to cluster-level rather than tree-level, and how recurring script failures are throttled/quarantined rather than restart-looped.
  • Downstream consumers / integration targets in production today:
    • Aveva System Platform — via LmxOpcUa. ScadaBridge interacts with System Platform through the in-house LmxOpcUa server rather than a direct System Platform API. LmxOpcUa exposes System Platform objects over OPC UA; ScadaBridge reads from and writes to System Platform through that OPC UA surface. This is the primary OT-side consumer.
    • Internal Web APIs. ScadaBridge makes outbound calls to internal enterprise Web APIs using its generic pre-configured Web API client capability (see Capabilities above). Because any Web API can be configured dynamically, there is no fixed enumeration of "ScadaBridge's Web API integrations" to capture here; specific IT↔OT Web API crossings land in the legacy integrations inventory (current-state/legacy-integrations.md) regardless of whether they're reached via ScadaBridge's generic client.
    • Batch tracking database. ScadaBridge writes batch tracking records directly to a SQL Server batch tracking database.
    • Camstar MES — direct. ScadaBridge integrates with Camstar via a direct outbound Web API call from ScadaBridge to Camstar, using its own Web API client and credentials. It does not go through the Aveva System Platform primary cluster's Camstar Web API interface (LEG-002). This means ScadaBridge already has a native Camstar path; the LEG-002 retirement work is about moving the other consumers of that System Platform Web API off it, not about building a new ScadaBridge-to-Camstar path.
    • TBD — other databases written to besides batch tracking, and any additional consumers not listed here. Enumeration of internal Web API endpoints is not tracked here because ScadaBridge's Web API client is generic/configurable (see Capabilities); specific IT↔OT Web API crossings that need migration live in current-state/legacy-integrations.md. Notification destination teams are similarly not enumerated because they're contact-list-driven and transport-agnostic (see Capabilities) — the list of actual recipients lives in ScadaBridge's configuration, not in this plan.
  • Routing topology:
    • Hub-and-spoke — ScadaBridge nodes on the central cluster (South Bend) can route to ScadaBridge nodes on other clusters, forming a hub-and-spoke network with the central cluster as the hub.
    • Direct access — site-level ScadaBridge clusters can also be reached directly (not only via the hub), enabling point-to-point integration where appropriate.
  • Data locality (design principle): ScadaBridge is designed to keep local data sources localized — equipment at a site communicates with the local ScadaBridge instance at that site, not with the central cluster. This minimizes cross-site/WAN traffic, reduces latency, and keeps site operations resilient to WAN outages.
  • Deployment status: ScadaBridge is already deployed across the current cluster footprint. However, not all legacy API integrations have been migrated onto it yet — some older point-to-point integrations still run outside ScadaBridge and need to be ported. The authoritative inventory of these integrations (and their retirement tracking against goal-state.md pillar 3) lives in current-state/legacy-integrations.md.
  • TBD — resource impact of co-location with System Platform. (The other open items once listed here, such as scripting runtime, template format, Web API auth, DB targets, supervision, throughput, and downstream consumers, are now documented inline above, with their remaining unknowns in the per-capability TBDs.)
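
The per-call store-and-forward behavior described above can be sketched in a few lines. This is an illustrative Python model, not ScadaBridge code (which is C#/Akka.NET); the class and method names are invented, and a real implementation would persist the queue to disk rather than memory.

```python
from collections import deque

class StoreAndForwardCaller:
    """Toy model of per-call, opt-in store-and-forward: try the downstream
    call; on failure, queue it locally and replay once connectivity returns."""

    def __init__(self, send):
        self.send = send      # downstream call (Web API, DB write, notify...)
        self.queue = deque()  # stands in for a durable on-disk queue

    def call(self, payload, store_and_forward=True):
        try:
            return self.send(payload)
        except ConnectionError:
            if not store_and_forward:
                raise                     # caller opted out: fail fast
            self.queue.append(payload)    # persist for later replay
            return None

    def replay(self):
        """Drain queued calls; raises again if the target is still down."""
        delivered = 0
        while self.queue:
            self.send(self.queue[0])      # oldest first, at-least-once
            self.queue.popleft()
            delivered += 1
        return delivered
```

During a WAN outage the call is queued instead of lost; replay preserves ordering and gives at-least-once delivery, matching the site-resilience goal stated above.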
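
The contact-list notification model can likewise be illustrated with a minimal router. Everything here (class name, transport keys, list names) is hypothetical; the point is the shape: scripts call a single notify capability against a named list and never touch the transport.

```python
class Notifier:
    """Toy contact-list notifier: scripts address a named list; routing and
    fan-out over email/Teams happen inside the notifier, not in the script."""

    def __init__(self, senders):
        self.senders = senders   # transport name -> send(address, message)
        self.lists = {}          # contact-list name -> [(transport, address)]

    def define_list(self, name, contacts):
        self.lists[name] = list(contacts)

    def notify(self, list_name, message):
        """Fan out to every contact on the list, each over its own
        configured transport; new recipients need no script changes."""
        for transport, address in self.lists[list_name]:
            self.senders[transport](address, message)
```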
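
The OPC UA connection-restart behavior in the supervision model is essentially restart-with-backoff. ScadaBridge's actual per-tier strategies are an open TBD above; this sketch only shows the generic pattern (restart a failing child a bounded number of times, backing off exponentially, then escalate to the parent), not Akka.NET's real API.

```python
import time

def supervise(start_child, max_restarts=5, base_delay=0.5, sleep=time.sleep):
    """Restart a failing child (e.g. one OPC UA source connection) with
    exponential backoff; escalate after max_restarts so a persistently
    broken child doesn't restart-loop forever."""
    for attempt in range(max_restarts + 1):
        try:
            return start_child()                 # child ran to completion
        except Exception as failure:
            if attempt == max_restarts:
                raise RuntimeError("escalating to parent") from failure
            sleep(base_delay * (2 ** attempt))   # 0.5s, 1s, 2s, ...
```

Isolating each source behind its own supervised child is what keeps one flaky connection from propagating failure to the rest of the cluster.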

LmxOpcUa (in-house)

  • What: in-house OPC UA server with tight integration to Aveva System Platform.
  • Role: exposes System Platform data/objects via OPC UA, enabling OPC UA clients (including ScadaBridge and third parties) to consume System Platform data natively.
  • Goal-state note: in the target architecture, LmxOpcUa is folded into OtOpcUa — the new unified site-level OPC UA layer. Its System Platform namespace carries forward; it runs alongside a new equipment namespace on the same per-site clustered OPC UA server. See goal-state.md → OtOpcUa for the fold-in details.
  • Deployment footprint: built and deployed to every Aveva System Platform node — primary cluster in South Bend and every site-level application server cluster. LmxOpcUa is not a centralized gateway; each System Platform node runs its own local instance, so OPC UA clients can reach the System Platform objects hosted on that node directly.
  • Namespace source: each LmxOpcUa instance is built to interface with its local Application Server's LMX API. The OPC UA address space exposed by a given LmxOpcUa node reflects the System Platform objects reachable through that node's LMX API — i.e., the namespace is inherently per-node and scoped to whatever the local App Server surfaces. Cross-node visibility happens at the System Platform / Global Galaxy layer, not at the LmxOpcUa layer.
  • Security model: standard OPC UA security — supports the standard OPC UA security modes (None / Sign / SignAndEncrypt) and standard security profiles (Basic256Sha256 and related), with UserName token authentication for clients. No bespoke auth scheme. TBD — which security mode + profile combinations are required vs allowed in production, where the UserName credentials come from (local accounts, AD/LDAP, a dedicated credential store), and how credentials are rotated and audited.
  • TBD — exact OPC UA namespace shape exposed to clients (hierarchy mirroring Galaxy areas/objects vs flat vs custom), and how ScadaBridge templates address equipment across multiple per-node LmxOpcUa instances.

Equipment OPC UA — multiple direct connections today

  • Current access pattern: multiple systems connect to some equipment directly and concurrently, rather than through a single shared access layer. Depending on the equipment, any of the following may hold OPC UA sessions against it at the same time:
    • Aveva System Platform (for validated data collection via its IO drivers)
    • Ignition SCADA (for KPI data, central from South Bend over the WAN — see Ignition data source)
    • ScadaBridge (for bridge/integration workloads via its Akka.NET OPC UA client)
  • Consequences of the current pattern:
    • Multiple OPC UA sessions per equipment. Equipment takes the session load of every consumer independently, which can strain devices with limited concurrent-session support.
    • No single access-control point. Authorization is enforced by whatever each consumer happens to present to the equipment — no site-level chokepoint exists to inspect, audit, or limit equipment access.
    • Inconsistent data. The same tag read by three different consumers can produce three subtly different values (different sampling intervals, different deadbands, different session buffers).
  • TBD — exact inventory of which equipment is reached by which consumers today; whether any equipment is already fronted by a shared OPC UA aggregator at the site level.
  • Equipment protocol survey. The authoritative inventory of native equipment protocols across the estate (Modbus, EtherNet/IP, Siemens S7, Fanuc FOCAS, native OPC UA, long-tail) lives in current-state/equipment-protocol-survey.md. That file is the Year 1 input to the OtOpcUa core driver library scope — see goal-state.md → OtOpcUa → Driver strategy and roadmap.md → OtOpcUa → Year 1.
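
The "inconsistent data" point above is easy to see concretely. The sketch below applies OPC UA-style absolute-deadband filtering to one tag's sample stream with two different deadbands (the values and deadbands are invented, not measured): the two consumers end up holding different "current" values for the same tag.

```python
def reported_values(signal, deadband):
    """OPC UA-style absolute deadband: a new sample is reported only when
    it differs from the last *reported* value by more than the deadband."""
    reported = [signal[0]]
    for sample in signal[1:]:
        if abs(sample - reported[-1]) > deadband:
            reported.append(sample)
    return reported

# One tag's raw samples, seen through two consumers' subscriptions:
raw = [10.0, 10.4, 10.9, 11.5, 12.0]
validated_view = reported_values(raw, deadband=0.3)  # tight, e.g. System Platform
kpi_view = reported_values(raw, deadband=1.0)        # loose, e.g. central Ignition
# validated_view[-1] == 12.0 but kpi_view[-1] == 11.5: same tag, two answers.
```

Different sampling intervals and session buffers compound the effect; a shared site-level access layer gives every consumer the same filtered stream.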

Camstar MES (sole MES)

  • Role: the only MES in use across the estate. There are no other MES products at any site — Camstar is the enterprise-wide system.
  • Integration today: accessed from the shopfloor via the Camstar interface on the Aveva System Platform primary cluster's Web API (LEG-002 in the legacy integrations inventory), and separately by ScadaBridge via its own direct outbound Web API call (see ScadaBridge downstream consumers).
  • TBD — Camstar version, hosting (on-prem vs SaaS), owner team, which business capabilities it covers.

Aveva Historian (sole historian)

  • Role: the only historian in use across the estate. No other historian products (OSIsoft PI, Canary, GE Proficy, etc.) run at any site.
  • Deployment topology: central-only in the South Bend Data Center. A single Aveva Historian instance in South Bend serves the entire estate. There are no per-site tier-1 historians, and there is no tier-1 → tier-2 replication model in play today — every site's historian data lands directly in the central South Bend historian.
    • Implication for ingestion: SnowBridge reads from one historian, not many — no per-site historian enumeration, no replication topology to account for.
    • Implication for WAN: because the historian is central, the collection path from a site's System Platform cluster to the historian already crosses the WAN today. This is a pre-existing WAN dependency, not something this plan introduces.
  • Version: 2023 R2, same release cadence as Aveva System Platform.
  • Retention policy: permanent. No TTL or rollup is applied — historian data is retained forever as a matter of policy. This means the "drill-down to Historian for raw data" pattern in goal-state.md works at any historical horizon, and the historian is the authoritative long-term system of record for validated tag data regardless of how much Snowflake chooses to store.
  • Integration role: serves as the system of record for validated/compliance-grade tag data collected via Aveva System Platform, and exposes a SQL interface (OPENQUERY and history views) for read access. Downstream use of that SQL interface for Snowflake ingestion is discussed in goal-state.md under Aveva Historian → Snowflake.
  • Current consumers (reporting): the primary consumer of Historian data today is enterprise reporting, currently on SAP BusinessObjects (BOBJ). Reporting is actively migrating from SAP BOBJ to Power BI — this is an in-flight transition that this plan should be aware of but does not own.
    • Implication for pillar 2: the "enterprise analytics/AI enablement" target in goal-state.md sits alongside this Power BI migration, not in competition with it. Whether Power BI consumes from Snowflake (via the dbt curated layer), from Historian directly, or from both is a TBD that coordinates between the two initiatives.
  • TBD — current storage footprint and growth rate, other consumers beyond reporting (e.g., Aveva Historian Client / Insight / Trend tools, ad-hoc analyst SQL, regulatory/audit exports), and how the BOBJ→Power BI migration coordinates with the Snowflake path for machine data.
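
For reference, the SQL read path mentioned above is typically an OPENQUERY pass-through from SQL Server to the Historian's History view. The helper below only builds such a statement as a string; the linked-server name INSQL, the column names, and the wwRetrievalMode option follow the common Aveva/Wonderware Historian pattern but should be confirmed against this estate's actual configuration.

```python
def history_openquery(tags, start, end, linked_server="INSQL"):
    """Build a SQL Server OPENQUERY statement against the Historian's
    History view. Single quotes inside the pass-through query must be
    doubled because the whole inner query is one string literal."""
    tag_list = ", ".join(f"''{t}''" for t in tags)
    inner = (
        "SELECT DateTime, TagName, Value FROM History "
        f"WHERE TagName IN ({tag_list}) "
        f"AND DateTime >= ''{start}'' AND DateTime <= ''{end}'' "
        "AND wwRetrievalMode = ''Cyclic''"
    )
    return f"SELECT * FROM OPENQUERY({linked_server}, '{inner}')"
```

For example, `history_openquery(["Line1.Temp"], "2026-01-01 00:00", "2026-01-02 00:00")` yields a statement runnable from any SQL Server with the Historian configured as a linked server, which is the surface the Snowflake ingestion discussion in goal-state.md builds on.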

TBD — additional shopfloor systems and HMIs not covered above (if any).

IT/OT Integration Points

TBD — how IT and OT layers currently connect (protocols, gateways, brokers).

Data Collection

TBD — what data is collected, how, where it lands, frequency, gaps.

Operator / User Interfaces

TBD — current UIs operators interact with, pain points.

Known Pain Points & Constraints

TBD.

Stakeholders

TBD.