Remove equipment protocol survey — driver list confirmed by v2 team

The OtOpcUa v2 implementation team committed all 8 core drivers from
internal knowledge of the estate, making the formal protocol survey
unnecessary for driver scoping. Removed
current-state/equipment-protocol-survey.md and cleaned up all
references across 7 files.

The UNS hierarchy snapshot (per-site equipment-instance walk for site/area/
line/equipment assignments + UUIDs) is now a standalone Year 1 deliverable,
decoupled from protocol discovery. Tracked in status.md and goal-state.md
UNS naming hierarchy section.
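A minimal sketch of the record each step of the snapshot walk might emit, assuming one stable UUID per equipment instance (the `EquipmentNode` class, its field names, and the example values are hypothetical, not taken from the actual deliverable):

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EquipmentNode:
    """One equipment instance recorded by the per-site hierarchy walk."""
    site: str
    area: str
    line: str
    equipment: str
    # Stable identifier assigned at snapshot time; never reissued.
    uuid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def uns_path(self) -> str:
        """Render the site/area/line/equipment path used for UNS naming."""
        return "/".join([self.site, self.area, self.line, self.equipment])

node = EquipmentNode(site="south-bend", area="packaging",
                     line="line-2", equipment="cnc-07")
print(node.uns_path())  # south-bend/packaging/line-2/cnc-07
```

Decoupling this walk from protocol discovery means the record only needs placement and identity, not protocol details.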

Eliminates ~52 TBDs (all placeholder data in the pre-seeded survey rows).
Author: Joseph Doherty
Date: 2026-04-17 11:54:37 -04:00
parent f53a775968
commit bed8c8e12b
8 changed files with 13 additions and 282 deletions


@@ -63,7 +63,7 @@ The roadmap is laid out as a 2D grid — **workstreams** (rows) crossed with **y
| Workstream | **Year 1 — Foundation** | **Year 2 — Scale** | **Year 3 — Completion** |
|---|---|---|---|
| **OtOpcUa** | **Evolve LmxOpcUa into OtOpcUa** — extend the existing in-house OPC UA server to add (a) a new equipment namespace with single session per equipment via native protocols translated to OPC UA (committed core drivers: OPC UA Client, Modbus TCP, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, plus Galaxy carried forward), and (b) clustering (non-transparent redundancy, 2-node per site) on top of the existing per-node deployment. **Driver stability tiers:** Tier A in-process (Modbus, OPC UA Client), Tier B in-process with guards (S7, AB CIP, AB Legacy, TwinCAT), Tier C out-of-process (Galaxy — bitness constraint, FOCAS — uncatchable AVE). **Protocol survey** across the estate — template in [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md); target steps 1–3 done Q1 to validate committed driver list and feed initial UNS hierarchy snapshot. **Build ACL surface** (per-cluster `EquipmentAcl` table, Admin UI, OPC UA NodeManager enforcement) — required before tier-1 cutover. **Deploy OtOpcUa to every site** as fast as practical. **Begin tier 1 cutover (ScadaBridge)** at large sites. **Prerequisite: certificate-distribution** to consumer trust stores before each cutover. **Aveva System Platform IO pattern validation** — Year 1 or early Year 2 research to confirm Aveva supports upstream OPC UA data sources, well ahead of Year 3 tier 3. _TBD — survey owner; first-cutover site selection; cutover plan owner (OtOpcUa team or integration team); enterprise shortname for UNS hierarchy root._ | **Complete tier 1 (ScadaBridge)** across all sites. **Begin tier 2 (Ignition)** — Ignition consumers redirected from direct-equipment OPC UA to each site's OtOpcUa, collapsing WAN session counts from *N per equipment* to *one per site*. **Build long-tail drivers** on demand as sites require them. Resolve Warsaw per-building multi-cluster consumer-addressing pattern (consumer-side stitching vs site-aggregator OtOpcUa instance). _TBD — per-site tier-2 rollout sequence._ | **Complete tier 2 (Ignition)** across all sites. **Execute tier 3 (Aveva System Platform IO)** with compliance stakeholder validation — the hardest cutover because System Platform IO feeds validated data collection. Reach steady state: every equipment session is held by OtOpcUa, every downstream consumer reads OT data through it. _TBD — per-equipment-class criteria for System Platform IO re-validation._ |
| **OtOpcUa** | **Evolve LmxOpcUa into OtOpcUa** — extend the existing in-house OPC UA server to add (a) a new equipment namespace with single session per equipment via native protocols translated to OPC UA (committed core drivers: OPC UA Client, Modbus TCP, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, plus Galaxy carried forward), and (b) clustering (non-transparent redundancy, 2-node per site) on top of the existing per-node deployment. **Driver stability tiers:** Tier A in-process (Modbus, OPC UA Client), Tier B in-process with guards (S7, AB CIP, AB Legacy, TwinCAT), Tier C out-of-process (Galaxy — bitness constraint, FOCAS — uncatchable AVE). Core driver list confirmed by v2 implementation team (protocol survey no longer needed for driver scoping). **UNS hierarchy snapshot walk** — per-site equipment-instance discovery (site/area/line/equipment + UUID assignment) to feed the initial schemas-repo hierarchy definition and canonical model; target done Q1–Q2. **Build ACL surface** (per-cluster `EquipmentAcl` table, Admin UI, OPC UA NodeManager enforcement) — required before tier-1 cutover. **Deploy OtOpcUa to every site** as fast as practical. **Begin tier 1 cutover (ScadaBridge)** at large sites. **Prerequisite: certificate-distribution** to consumer trust stores before each cutover. **Aveva System Platform IO pattern validation** — Year 1 or early Year 2 research to confirm Aveva supports upstream OPC UA data sources, well ahead of Year 3 tier 3. _TBD — first-cutover site selection; cutover plan owner (OtOpcUa team or integration team); enterprise shortname for UNS hierarchy root._ | **Complete tier 1 (ScadaBridge)** across all sites. **Begin tier 2 (Ignition)** — Ignition consumers redirected from direct-equipment OPC UA to each site's OtOpcUa, collapsing WAN session counts from *N per equipment* to *one per site*. **Build long-tail drivers** on demand as sites require them. Resolve Warsaw per-building multi-cluster consumer-addressing pattern (consumer-side stitching vs site-aggregator OtOpcUa instance). _TBD — per-site tier-2 rollout sequence._ | **Complete tier 2 (Ignition)** across all sites. **Execute tier 3 (Aveva System Platform IO)** with compliance stakeholder validation — the hardest cutover because System Platform IO feeds validated data collection. Reach steady state: every equipment session is held by OtOpcUa, every downstream consumer reads OT data through it. _TBD — per-equipment-class criteria for System Platform IO re-validation._ |
| **Redpanda EventHub** | Stand up central Redpanda cluster in South Bend (single-cluster HA). Stand up bundled Schema Registry. Wire SASL/OAUTHBEARER to enterprise IdP. Create initial topic set (prefix-based ACLs). Hook up observability minimum signal set. Define the three retention tiers (`operational`/`analytics`/`compliance`). **Stand up the central `schemas` repo** with `buf` CI, CODEOWNERS, and the NuGet publishing pipeline. **Publish the canonical equipment/production/event model v1** — including the canonical machine state vocabulary (`Running / Idle / Faulted / Starved / Blocked` + any agreed additions) as a Protobuf enum, the `equipment.state.transitioned` event schema, and initial equipment-class definitions for pilot equipment. This is the foundation for Digital Twin Use Cases 1 and 3 (see `goal-state.md` → Strategic Considerations → Digital twin) and is load-bearing for pillar 2. **Pilot equipment class for canonical definition: FANUC CNC** (pre-defined FOCAS2 hierarchy already exists in OtOpcUa v2 driver design). Land the FANUC CNC class template in the schemas repo before Tier 1 cutover begins. _TBD — sizing decisions, initial topic list, canonical vocabulary ownership (domain SME group)._ | Expand topic coverage as additional domains onboard. Enforce tiered retention and ACLs at scale. Prove backlog replay after a WAN-outage drill (also exercises the Digital Twin Use Case 2 simulation-lite replay path). Exercise long-outage planning (ScadaBridge queue capacity vs. outage duration). Iterate the canonical model as additional equipment classes and domains onboard. _TBD — concrete drill cadence._ | Steady-state operation. Harden alerting and runbooks against the observed failure modes from Years 1–2. Canonical model is mature and covers every in-scope equipment class; schema changes are routine rather than foundational. |
| **SnowBridge** | Design and begin custom build in .NET. **Filtered, governed upload to Snowflake is the Year 1 purpose** — the service is the component that decides which topics/tags flow to Snowflake, applies the governed selection model, and writes into Snowflake. Ship an initial version with **one working source adapter** — starting with **Aveva Historian (SQL interface)** because it's central-only, exists today, and lets the workstream progress in parallel with Redpanda rather than waiting on it. First end-to-end **filtered** flow to Snowflake landing tables on a handful of priority tags. Selection model in place even if the operator UI isn't yet (config-driven is acceptable for Year 1). _TBD — team, credential management, datastore for selection state._ | Add the **ScadaBridge/Redpanda source adapter** alongside Historian. Build and ship the operator **web UI + API** on top of the Year 1 selection model, including the blast-radius-based approval workflow, audit trail, RBAC, and exportable state. Onboard priority tags per domain under the UI-driven governance path. _TBD — UI framework._ | All planned source adapters live behind the unified interface. Approval workflow tuned based on Year 2 operational experience. Feature freeze; focus on hardening. |
| **Snowflake dbt Transform Layer** | Scaffold a dbt project in git, wired to the self-hosted orchestrator (per `goal-state.md`; specific orchestrator chosen outside this plan). Build first **landing → curated** model for priority tags. **Align curated views with the canonical model v1** published in the `schemas` repo — equipment, production, and event entities in the curated layer use the canonical state vocabulary and the same event-type enum values, so downstream consumers (Power BI, ad-hoc analysts, future AI/ML) see the same shape of data Redpanda publishes. This is the dbt-side delivery for Digital Twin Use Cases 1 and 3. Establish `dbt test` discipline from day one — including tests that catch divergence between curated views and the canonical enums. _TBD — project layout (single vs per-domain); reconciliation rule if derived state in curated views disagrees with the layer-3 derivation (should not happen, but the rule needs to exist)._ | Build curated layers for all in-scope domains. **Ship a canonical-state-based OEE model** as a strong candidate for the pillar-2 "not possible before" use case — accurate cross-equipment, cross-site OEE computed once in dbt from the canonical state stream, rather than re-derived in every reporting surface. Source-freshness SLAs tied to the **≤15-minute analytics** budget. Begin development of the first **"not possible before" AI/analytics use case** (pillar 2). | The "not possible before" use case is **in production**, consuming the curated layer, meeting its own SLO. Pillar 2 check passes. |
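The deny-by-default lookup the OPC UA NodeManager would run against the per-cluster `EquipmentAcl` table can be sketched as follows. This is a hypothetical shape, not the actual schema: the `AclEntry` fields, the `is_allowed` helper, and the example principal are all invented for illustration.

```python
from typing import NamedTuple

class AclEntry(NamedTuple):
    """One row of a hypothetical per-cluster EquipmentAcl table."""
    principal: str        # consumer identity, e.g. from the client certificate
    equipment_uuid: str   # equipment instance from the UNS hierarchy snapshot
    can_read: bool
    can_write: bool

def is_allowed(acl: list[AclEntry], principal: str,
               equipment_uuid: str, write: bool) -> bool:
    """Deny by default: access requires an explicit matching row."""
    for entry in acl:
        if entry.principal == principal and entry.equipment_uuid == equipment_uuid:
            return entry.can_write if write else entry.can_read
    return False

acl = [AclEntry("scadabridge@site-a", "eq-123", can_read=True, can_write=False)]
print(is_allowed(acl, "scadabridge@site-a", "eq-123", write=False))  # True
print(is_allowed(acl, "scadabridge@site-a", "eq-123", write=True))   # False
```

Making the check deny-by-default is what lets the ACL surface gate tier-1 cutover: equipment is invisible to a consumer until a row is explicitly added.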
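The canonical machine state vocabulary and the `equipment.state.transitioned` event named in the Redpanda row could look like this Python mirror of the intended Protobuf artifacts. The state names come from the roadmap; the event field names and the example are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class MachineState(Enum):
    """Canonical state vocabulary from the roadmap; the real artifact
    would be a Protobuf enum published in the central schemas repo."""
    RUNNING = "Running"
    IDLE = "Idle"
    FAULTED = "Faulted"
    STARVED = "Starved"
    BLOCKED = "Blocked"

@dataclass(frozen=True)
class EquipmentStateTransitioned:
    """Hypothetical shape for the equipment.state.transitioned event."""
    equipment_uuid: str       # UUID assigned by the UNS hierarchy snapshot
    previous: MachineState
    current: MachineState
    occurred_at: datetime

evt = EquipmentStateTransitioned(
    equipment_uuid="example-uuid",
    previous=MachineState.RUNNING,
    current=MachineState.STARVED,
    occurred_at=datetime.now(timezone.utc),
)
print(evt.current.value)  # Starved
```

Because the dbt curated layer is required to use the same enum values, a downstream `dbt test` can simply assert that every state string in a curated view is one of these five names.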