Initial commit: 3-year shopfloor IT/OT transformation plan
Core plan: current-state, goal-state (layered architecture, OtOpcUa, Redpanda EventHub, SnowBridge, canonical model, UNS posture + naming hierarchy, digital twin use cases absorbed), roadmap (7 workstreams × 3 years), and status bookmark. Component detail files: legacy integrations inventory (3 integrations, pillar 3 denominator closed), equipment protocol survey template (dual mandate with UNS hierarchy snapshot), digital twin management brief (conversation complete, outcome recorded). Output generation pipeline: specs for 18-slide mixed-stakeholder PPTX and faithful-typeset PDF, with README, design doc, and implementation plan. No generated outputs yet — deferred until source data is stable.
.gitignore (new file, 1 line, vendored)
@@ -0,0 +1 @@
.DS_Store
CLAUDE.md (new file, 43 lines)
@@ -0,0 +1,43 @@
# Project Context

This directory contains a **3-year plan for transforming and enhancing shopfloor IT/OT interfaces and data collection**.

Work in this repo focuses on planning, designing, and tracking the multi-year roadmap for modernizing shopfloor systems — bridging IT and OT layers, improving operator interfaces, and upgrading data collection pipelines.

## Structure

Plan content lives in markdown files at the repo root to keep it easy to read and update:

- [`current-state.md`](current-state.md) — snapshot of today's shopfloor IT/OT systems, integrations, data collection, and pain points.
- [`goal-state.md`](goal-state.md) — target end-state at the close of the 3-year plan, including success criteria.
- [`roadmap.md`](roadmap.md) — migration plan / sequencing from current state to goal state over the 3 years.
- [`STATUS.md`](STATUS.md) — working-session bookmark; records where we left off and the top pending items. Not authoritative plan content.

### Component detail files

- [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md) — authoritative inventory of **bespoke IT↔OT integrations** that cross the ScadaBridge-central boundary outside ScadaBridge. Denominator for pillar 3 retirement.
- [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md) — authoritative inventory of **native equipment protocols** across the estate. Input to the OtOpcUa core driver library scope decision in Year 1.
- [`goal-state/digital-twin-management-brief.md`](goal-state/digital-twin-management-brief.md) — meeting-prep artifact for the management conversation that turns "we want digital twins" into a scoped response. Parallel structure to `goal-state.md` → Strategic Considerations → Digital twin.

### Output generation pipeline

- [`outputs/`](outputs/) — **repeatable PPTX + PDF generation** over the plan markdown. Entry point: [`outputs/README.md`](outputs/README.md) (trigger phrases + regeneration checklist). Structure anchors: [`outputs/presentation-spec.md`](outputs/presentation-spec.md) for the 18-slide mixed-stakeholder deck and [`outputs/longform-spec.md`](outputs/longform-spec.md) for the faithful-typeset long-form PDF. Regeneration trigger: `regenerate outputs` / `regenerate presentation` / `regenerate longform`. Outputs live under `outputs/generated/`; do not hand-edit them.

As the plan grows, add further markdown files and link them from here.

## Breaking out components

Individual components (a system, an interface, a data pipeline, an initiative) often need more detail than fits inline. When a section in `current-state.md` or `goal-state.md` outgrows a few paragraphs:

1. Create a dedicated file under a topical subdirectory, e.g.:
   - `current-state/<component>.md`
   - `goal-state/<component>.md`
   - `components/<component>.md` (when it spans both states)
2. Leave a short summary in the parent file and link to the detail file.
3. Keep filenames lowercase-kebab-case so they're easy to reference.

## Conventions

- Keep everything in markdown — no proprietary formats.
- Prefer editing existing files over adding new ones; split a file only when it gets unwieldy or when a component needs its own detail page (see above).
- Use `_TBD_` placeholders for sections that aren't yet filled in, so gaps stay visible.
STATUS.md (new file, 119 lines)
@@ -0,0 +1,119 @@
# Plan — Working Session Status

**Saved:** 2026-04-15

**Claude Code session ID:** `4851af09-c6b2-41fd-a419-a43d8cf80721`

**Resume with:** `claude --resume 4851af09-c6b2-41fd-a419-a43d8cf80721` (or pick from `/resume` inside Claude Code)

> This file is a **bookmark**, not a replacement for the plan. The authoritative content lives in `CLAUDE.md`, `current-state.md`, `goal-state.md`, `roadmap.md`, and the component detail files under `current-state/`, `goal-state/`, and `outputs/`. Read this file only to find out where we left off.

## Where we are

The plan is substantively fleshed out. All three core documents are populated, the canonical model + UNS framing has been declared, the digital twin management conversation has happened and been absorbed, and most open items are either design-time details the build team can close or items deliberately marked out of scope. An output generation pipeline (PPTX + PDF) has been scaffolded with spec files, but no outputs have been generated yet.

### Files

**Core plan content:**

- [`CLAUDE.md`](CLAUDE.md) — plan purpose, document index (now including the component detail files and outputs pipeline), markdown-first conventions, component breakout rules.
- [`current-state.md`](current-state.md) — snapshot of today's estate (enterprise layout, clusters, systems, integrations, equipment access patterns).
- [`goal-state.md`](goal-state.md) — target end-state with Vision, layered architecture, **Unified Namespace posture + naming hierarchy standard**, component designs (OtOpcUa, SnowBridge, Redpanda EventHub with **Canonical Equipment/Production/Event Model + canonical state vocabulary**, dbt layer, ScadaBridge extensions), success criteria, observability, Strategic Considerations (Digital Twin with use cases received + Power BI), and Non-Goals.
- [`roadmap.md`](roadmap.md) — 3-year workstreams × years grid with 7 workstreams and cross-workstream dependencies; Year 1 Redpanda and dbt cells updated for canonical model delivery.

**Component detail files:**

- [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md) — authoritative inventory for pillar 3 retirement. **Closed as denominator = 3**: LEG-001 Delmia DNC, LEG-002 Camstar MES, LEG-003 custom email notification service. Historian MSSQL reporting surface explicitly carved out as *not* legacy.
- [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md) — template/schema for the Year 1 protocol survey that drives OtOpcUa core driver library scope. **Dual mandate:** the same discovery walk also produces the initial UNS naming hierarchy snapshot (equipment-instance granularity) for the `schemas` repo. Six expected categories pre-seeded as placeholders.
- [`goal-state/digital-twin-management-brief.md`](goal-state/digital-twin-management-brief.md) — meeting-prep artifact for the (now completed) digital twin management conversation; the "Outcome" section at the top captures the resolution.

**Input / reference files:**

- [`digital_twin_usecases.md.txt`](digital_twin_usecases.md.txt) — management's delivered requirements for digital twin (three use cases: standardized state model, virtual testing/simulation, cross-system canonical model). Source for the plan's digital twin response.

**Output generation pipeline (specs only — no outputs generated yet):**

- [`outputs/README.md`](outputs/README.md) — trigger phrases (`regenerate outputs` / `regenerate presentation` / `regenerate longform`), regeneration procedure, edit-this-not-that rules.
- [`outputs/DESIGN.md`](outputs/DESIGN.md) — design for the generation pipeline.
- [`outputs/IMPLEMENTATION-PLAN.md`](outputs/IMPLEMENTATION-PLAN.md) — scaffolding plan (partially executed — specs written, generation not yet run).
- [`outputs/presentation-spec.md`](outputs/presentation-spec.md) — 18-slide mixed-stakeholder deck structure anchor.
- [`outputs/longform-spec.md`](outputs/longform-spec.md) — faithful-typeset PDF structure anchor.
- `outputs/diagrams/` and `outputs/generated/` — empty, waiting for the first regeneration run.

### Major decisions captured (pointers, not restatements)

- **Vision theme:** *stable, single point of integration between shopfloor OT and enterprise IT* — used as the tiebreaker for ambiguous decisions.
- **Three in-scope pillars:** unification (100% of sites on the standardized stack), analytics/AI enablement (≤15-minute analytics SLO, one "not possible before" use case in production), legacy middleware retirement (inventory to zero). Success criteria are binary at end of plan.
- **UX split:** Ignition owns KPI UX long-term; Aveva System Platform HMI owns validated-data UX long-term. Not a primary goal of this plan.
- **IT↔OT boundary:** single crossing at ScadaBridge central. OT = machine data (System Platform, equipment OPC UA, OtOpcUa, ScadaBridge, Aveva Historian, Ignition). IT = enterprise apps (Camstar, Delmia, Snowflake, SnowBridge, Power BI/BOBJ).
- **Layered architecture:** Layer 1 Equipment → Layer 2 OtOpcUa → Layer 3 SCADA (System Platform + Ignition) → Layer 4 ScadaBridge → Enterprise IT.
- **OtOpcUa** (layer 2): custom-built, clustered, co-located on System Platform nodes, hybrid driver strategy (proactive core library + on-demand long-tail), OPC UA-native auth, **absorbs LmxOpcUa** as its System Platform namespace. Tiered cutover: ScadaBridge first, Ignition second, System Platform IO last. **Namespace architecture supports a future `simulated` namespace** for integration testing (digital twin use case 2) — architecturally supported, not committed for build.
- **Redpanda EventHub:** self-hosted, central cluster in South Bend (single-cluster HA, VM-level DR out of scope), per-topic tiered retention (operational 7d / analytics 30d / compliance 90d), bundled Schema Registry, Protobuf via central `schemas` repo with `buf` CI, `BACKWARD_TRANSITIVE` compatibility, `TopicNameStrategy` subjects, `{domain}.{entity}.{event-type}` naming, site identity in the message, not the topic (see the topic-naming sketch after this list). Store-and-forward at ScadaBridge handles site resilience. **Analytics-tier retention is also a replay surface** for integration testing / simulation-lite (digital twin use case 2).
- **Canonical Equipment, Production, and Event Model** (added via digital twin use cases 1 and 3): the plan commits to declaring the composition of OtOpcUa equipment namespace + Redpanda canonical topics + `schemas` repo + dbt curated layer as **the** canonical model. Three surfaces, one source of truth (the `schemas` repo). Includes a **canonical machine state vocabulary** (`Running / Idle / Faulted / Starved / Blocked` + TBD additions like `Changeover`, `Maintenance`, `Setup`); see the vocabulary sketch after this list. Year 1 Redpanda and dbt cells are updated to deliver v1.
- **Unified Namespace (UNS) posture:** the canonical model above is also declared as the plan's UNS, framed for stakeholders using UNS vocabulary. **Deliberate deviations from classic MQTT/Sparkplug UNS:** Kafka instead of MQTT (for analytics/replay), flat `{domain}.{entity}.{event-type}` topics with site in message (for bounded topic count), stateless events instead of the Sparkplug state machine. An optional future **UNS projection service** (MQTT/Sparkplug and/or enterprise OPC UA aggregator) is architecturally supported but not committed for build; the decision trigger is documented.
- **UNS naming hierarchy standard:** 5 levels always present — Enterprise → Site → Area → Line → Equipment, with a `_default` placeholder where a level doesn't apply. Text form `ent.warsaw-west.bldg-3.line-2.cnc-mill-05` / OPC UA form `ent/warsaw-west/bldg-3/line-2/cnc-mill-05`. Stable **equipment UUIDv4** alongside the path (path is navigation, UUID is lineage). Authority lives in the `schemas` repo; OtOpcUa / Redpanda / dbt consume the authoritative definition. **Enterprise shortname is currently the `ent` placeholder — needs assignment.**
- **SnowBridge:** custom-built machine-data-to-Snowflake upload service; the Year 1 starting source is Aveva Historian SQL; UI + API with a blast-radius-based approval workflow; selection state in an internal datastore (not git).
- **Snowflake transform tooling:** dbt only, run by a self-hosted orchestrator (the specific orchestrator is out of scope).
- **Aggregation boundary:** aggregation lives in Snowflake (dbt). ScadaBridge does deadband/exception-based filtering (global default ~1% of span; see the filter sketch after this list) plus tag opt-in via SnowBridge — not source-side summarization.
- **Observability:** commit to signals (Redpanda, ScadaBridge, SnowBridge, dbt); the tool choice is out of scope.
- **Digital Twin (management-delivered use cases, 2026-04-15):** three use cases received — (1) standardized equipment state model, (2) virtual testing / simulation, (3) cross-system canonical model. **Use cases 1 and 3 absorbed into the plan** as the canonical state vocabulary + canonical model declaration (see above). **Use case 2 served minimally** by Redpanda historical replay + the future OtOpcUa `simulated` namespace; full commissioning-grade simulation stays out of scope pending a separately funded initiative.
- **Enterprise reporting coordination (BOBJ → Power BI migration, in-flight adjacent initiative):** three consumption paths analyzed (Snowflake dbt / Historian direct / both). Recommended position: **Path C with Path A as strategic direction** — most machine-data and cross-domain reports move to Snowflake over Years 2–3, compliance reports stay on Historian indefinitely. Conversation with the reporting team still to be scheduled.
- **Output generation pipeline:** PPTX + PDF generation from plan markdown, repeatability anchored by spec files (`presentation-spec.md`, `longform-spec.md`) rather than prompts. Spec files written; diagrams and the generation run deferred until the source plan is stable.
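
The topic-name rule and the site-in-message decision above are mechanical enough to pin down in code. A minimal Python sketch, assuming only what the Redpanda bullet states (`{domain}.{entity}.{event-type}` topics, `[a-z0-9-]` segments, site identity carried in the event body); the example topic and field names are hypothetical, not committed schema:

```python
import re

# Each topic is {domain}.{entity}.{event-type}; every segment is [a-z0-9-].
_SEGMENT = re.compile(r"^[a-z0-9-]+$")

def validate_topic(name: str) -> str:
    """Reject topic names that don't follow the three-segment convention."""
    parts = name.split(".")
    if len(parts) != 3 or not all(_SEGMENT.match(p) for p in parts):
        raise ValueError(
            f"topic {name!r} must be {{domain}}.{{entity}}.{{event-type}} "
            "with segments limited to [a-z0-9-]"
        )
    return name

# Site identity travels in the message body, not in the topic name,
# so the topic count stays bounded as sites onboard.
event = {
    "site": "warsaw-west",  # site in the message, never in the topic
    "equipment_path": "ent/warsaw-west/bldg-3/line-2/cnc-mill-05",
    "state": "Running",
}
topic = validate_topic("production.equipment.state-changed")  # hypothetical topic
```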
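The canonical machine state vocabulary is equally small. A sketch of it as a plain enumeration; the authoritative definition lives in the `schemas` repo (as Protobuf), so the Python form below is illustrative only:

```python
from enum import Enum

class MachineState(str, Enum):
    """Committed canonical machine state vocabulary (see goal-state.md).

    TBD additions under discussion (Changeover, Maintenance, Setup) are
    deliberately omitted until the vocabulary's governance admits them.
    """
    RUNNING = "Running"
    IDLE = "Idle"
    FAULTED = "Faulted"
    STARVED = "Starved"
    BLOCKED = "Blocked"
```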
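The aggregation-boundary bullet names a deadband/exception-based filter with a global default of ~1% of span. ScadaBridge's actual filter is product configuration, not code in this repo; the sketch below only illustrates the principle (span values and the test sequence are made up):

```python
def deadband_filter(span_low: float, span_high: float, deadband_pct: float = 1.0):
    """Emit a value only when it has moved more than deadband_pct of the
    tag's engineering span since the last emitted value."""
    threshold = (span_high - span_low) * deadband_pct / 100.0
    last_emitted = None

    def should_emit(value: float) -> bool:
        nonlocal last_emitted
        if last_emitted is None or abs(value - last_emitted) >= threshold:
            last_emitted = value
            return True
        return False

    return should_emit

emit = deadband_filter(span_low=0.0, span_high=200.0)  # 1% of span = 2.0
assert emit(100.0) is True    # first value always passes
assert emit(101.0) is False   # within the deadband, suppressed
assert emit(102.5) is True    # moved >= 2.0, emitted
```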

## Top pending items (from most recent status check)

All four items from the previous status check have been **advanced to the point where the next move is a real-world action** (management meeting, reporting-team conversation, field survey, or — for legacy — closed outright). The in-room plan work that could be done without external input has been done. The remaining open items are **external dependencies**, not plan-authoring gaps.

### External-dependency items — waiting on real-world action

1. **BOBJ → Power BI coordination with reporting team.** Plan position documented in `goal-state.md` → Strategic Considerations → **Enterprise reporting: BOBJ → Power BI migration (adjacent initiative)** — three consumption paths analyzed, recommended position stated (Path C with Path A as strategic direction), eight questions and a four-bucket decision rubric included. **Action needed:** schedule the coordination conversation with the reporting team; bring back a bucket assignment. Once a bucket is assigned, update `goal-state.md` → Enterprise reporting and, if the outcome is Bucket A or B, update `roadmap.md` → Snowflake dbt Transform Layer to include reporting-shaped views.
2. **Equipment protocol survey execution (dual mandate: protocol survey + initial UNS hierarchy snapshot).** Template, schema, classification rule, rollup views, and discovery approach are documented in [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md). The file is pre-seeded with expected categories (OPC UA-native, Siemens S7, Allen-Bradley EtherNet/IP, Modbus, Fanuc FOCAS, long-tail) as placeholders — these are **not authoritative** until real discovery data replaces them.
   - **Dual mandate:** the same discovery walk also produces the initial **UNS naming hierarchy snapshot** (equipment-instance granularity, with site/area/line assignments and stable UUIDs) that gets committed to the `schemas` repo — see `current-state/equipment-protocol-survey.md` → **Companion deliverable** and `goal-state.md` → **Unified Namespace (UNS) posture → UNS naming hierarchy standard**. Two outputs, one walk.
   - **Action needed:** assign a survey owner; walk System Platform IO config, Ignition OPC UA connections, and ScadaBridge templates (steps 1–3 of the discovery approach) within Q1 of Year 1, capturing both equipment-class data (for the protocol survey) and equipment-instance data (for the UNS hierarchy) in the same pass. Interview-driven gap-filling follows in parallel with the core driver build.
   - **Year 1 prerequisite:** the OtOpcUa core driver library cannot be scoped without the survey, and canonical model v1 cannot be published without the initial hierarchy snapshot.
   - **Sub-blocker:** the UNS hierarchy's enterprise-level shortname is currently a placeholder (`ent` in `goal-state.md`); the real shortname needs to be assigned before the initial hierarchy snapshot can be committed to the `schemas` repo. A small decision, but it blocks the hierarchy materialization.
3. **Digital twin use case 2 — funded simulation initiative (exploratory).** The digital twin management conversation is complete; management provided three use cases and the plan absorbs two of them (canonical state vocabulary + canonical model declaration — see closed items). **Use case 2 (Virtual Testing / Simulation)** is served minimally by Redpanda historical replay + OtOpcUa's architectural support for a future `simulated` namespace, but **full commissioning-grade simulation stays out of scope** for this plan. **No action needed** unless and until a funded simulation initiative materializes with a sponsor, scope, and timeline; at that point the meeting-prep brief at [`goal-state/digital-twin-management-brief.md`](goal-state/digital-twin-management-brief.md) can be reused for a use-case-2-specific scoping conversation. Keep it on the radar; don't actively work on it.
### Closed since last status check

All closed items below were worked through in the same 2026-04-15 session. Grouped roughly chronologically.

**Denominators and discovery templates:**

- ~~**Legacy integration inventory population.**~~ **Closed 2026-04-15.** The inventory in `current-state/legacy-integrations.md` is complete as the pillar 3 denominator: **3 rows** — LEG-001 Delmia DNC, LEG-002 Camstar MES (Camstar-initiated, confirmed this session), LEG-003 custom email notification service (added this session). Historian's MSSQL reporting surface (BOBJ / Power BI) was explicitly carved out as **not legacy** and documented under "Deliberately not tracked" in the inventory file — the rationale is that Historian's SQL interface is its native consumption surface, not a bespoke integration. Detail fields on the three rows (sites, owners, volumes, exact transports) remain `_TBD_` and will get filled in during migration planning.
- ~~**Equipment protocol survey template.**~~ **Advanced 2026-04-15.** The survey was listed as a Year 1 prerequisite but had no template; a full template with schema, classification rule, rollup views, and discovery approach now lives at `current-state/equipment-protocol-survey.md`. **Then further advanced** to carry a dual mandate (see below). Still open: actually running the survey (tracked above as item #2).

**Digital twin — full lifecycle in one session:**

- ~~**Digital twin conversation preparation.**~~ **Advanced 2026-04-15.** The 8 clarification questions existed but lacked framing; they are now wrapped in a full meeting-prep brief at `goal-state/digital-twin-management-brief.md`. Superseded by the conversation outcome below.
- ~~**Digital twin management conversation.**~~ **Closed 2026-04-15.** The conversation happened. Management delivered three use cases (source: `digital_twin_usecases.md.txt`): (1) standardized equipment state model, (2) virtual testing / simulation, (3) cross-system canonical model. Plan response: (a) use cases 1 and 3 absorbed as plan additions (see the Canonical Model + UNS work below); (b) use case 2 served minimally by Redpanda historical replay + OtOpcUa architectural support for a future `simulated` namespace — full commissioning simulation stays out of scope. The `goal-state.md` Digital Twin section, the `digital-twin-management-brief.md` outcome, and the Year 1 Redpanda/dbt roadmap cells are all updated. The narrower open item is carried forward as external-dependency item #3 above.

**Canonical model and UNS work (follows from digital twin use cases 1 and 3):**

- ~~**Canonical Equipment, Production, and Event Model declaration.**~~ **Closed 2026-04-15.** A new subsection under `goal-state.md` → Async Event Backbone declares the canonical model: three surfaces (OtOpcUa equipment namespace, Redpanda topics + Protobuf schemas, dbt curated layer) with the `schemas` repo as the single source of truth. Committed **canonical machine state vocabulary** (`Running / Idle / Faulted / Starved / Blocked` + TBD additions) with explicit semantics, rules, and governance. OEE computed on the canonical state stream is named as a candidate for pillar 2's "not possible before" use case. The Year 1 Redpanda cell in `roadmap.md` commits to publishing v1.
- ~~**Unified Namespace (UNS) posture declaration.**~~ **Closed 2026-04-15.** A new subsection under `goal-state.md` → Target IT/OT Integration declares the canonical model as **the plan's UNS**, with three deliberate deviations from classic MQTT/Sparkplug UNS (Kafka instead of MQTT, flat topics with site-in-message, stateless events instead of Sparkplug state). An optional future **UNS projection service** (MQTT/Sparkplug and/or enterprise OPC UA aggregator) is documented as architecturally supported but not committed for build. Cross-references added from the Canonical Model subsection and the Digital Twin section.
- ~~**UNS naming hierarchy standard.**~~ **Closed 2026-04-15.** Five-level hierarchy committed: Enterprise → Site → Area → Line → Equipment, always present, with a `_default` placeholder where a level doesn't apply. Naming rules align with the Redpanda topic convention (`[a-z0-9-]`, dots/slashes between segments, hyphens within); see the path sketch after this list. Stable **equipment UUIDv4** alongside the path. Authority in the `schemas` repo. Evolution governance, worked examples, an out-of-scope list (no product/job hierarchy — that's Camstar MES), and TBDs are all captured. `current-state/equipment-protocol-survey.md` updated to note the dual mandate — the same discovery walk produces the initial hierarchy snapshot at equipment-instance granularity.
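
To make the committed hierarchy concrete, a minimal sketch of path construction under the stated rules (five levels always present, `[a-z0-9-]` segments, `_default` where a level doesn't apply, dot-separated text form vs slash-separated OPC UA form). The helper is illustrative; the authority remains the `schemas` repo:

```python
import re

_SEGMENT = re.compile(r"^[a-z0-9-]+$")
_LEVELS = ("enterprise", "site", "area", "line", "equipment")

def uns_path(enterprise: str, site: str, area: str, line: str,
             equipment: str, sep: str = ".") -> str:
    """Build a 5-level UNS path; '_default' fills levels that don't apply.
    sep='.' yields the text form, sep='/' the OPC UA form."""
    segments = (enterprise, site, area, line, equipment)
    for level, seg in zip(_LEVELS, segments):
        if seg != "_default" and not _SEGMENT.match(seg):
            raise ValueError(f"{level} segment {seg!r} violates [a-z0-9-]")
    return sep.join(segments)

# Worked example from the standard ('ent' is still a placeholder shortname):
assert uns_path("ent", "warsaw-west", "bldg-3", "line-2", "cnc-mill-05") \
    == "ent.warsaw-west.bldg-3.line-2.cnc-mill-05"
assert uns_path("ent", "warsaw-west", "bldg-3", "line-2", "cnc-mill-05", sep="/") \
    == "ent/warsaw-west/bldg-3/line-2/cnc-mill-05"
```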

**Adjacent initiatives:**

- ~~**BOBJ → Power BI coordination framing.**~~ **Advanced 2026-04-15.** The coordination question was flagged but no plan position existed; it is now documented as a new Strategic Considerations subsection in `goal-state.md` with three paths, a recommended position, and eight questions for the reporting team. Still open: actually having the coordination conversation (tracked above as item #1).

**Output generation pipeline:**

- ~~**Repeatable PPTX + PDF generation pipeline.**~~ **Advanced 2026-04-15.** Design brainstormed (A+D pattern — Claude in full control, spec-file anchors, no templates yet). Directory scaffold created at `outputs/`. README, DESIGN, IMPLEMENTATION-PLAN, presentation-spec (18 slides, mixed-stakeholder), and longform-spec (3 chapters + 2 appendices, faithful typeset) all written. **Deferred:** Mermaid diagram source files, first PNG rendering, first PPTX and PDF generation, inaugural run-log entry — all wait on the source data being stable. Trigger phrases (`regenerate outputs` / `regenerate presentation` / `regenerate longform`) documented in `outputs/README.md` for any future session.

Items that can wait, design details that close during implementation, and deliberately deferred / out-of-scope items are listed in the working conversation — no need to re-enumerate them here; they're all captured as `_TBD_` markers in the authoritative files.
## Recommended resume flow

1. Resume the Claude Code session (`claude --resume 4851af09-c6b2-41fd-a419-a43d8cf80721`).
2. Skim this file and `CLAUDE.md` to re-orient (~2 minutes).
3. Pick one of the three external-dependency items above — or whatever has become most pressing — and tell me which.
4. If you've already had the Power BI coordination conversation with the reporting team, bring the answers and I'll fold them into the plan.
5. If a funded simulation initiative has materialized (digital twin use case 2), say so and I'll reuse the meeting brief for a scoping conversation.
6. If the equipment protocol survey walk has been run, bring the data and I'll populate both `current-state/equipment-protocol-survey.md` and the initial UNS hierarchy snapshot in one pass.
7. If the enterprise shortname has been assigned (currently the `ent` placeholder in `goal-state.md`), tell me and I'll propagate it across the UNS hierarchy examples with a find-and-replace.
8. When ready to generate outputs (deck + PDF), say `regenerate outputs`, `regenerate presentation`, or `regenerate longform` and I'll follow the checklist in `outputs/README.md`.

## What not to do on resume

- Don't re-open settled decisions without a reason. The plan's decisions are load-bearing and have explicit rationale captured inline; reversing one should require new information, not re-litigation.
- Don't add new workstreams to `roadmap.md` without a matching commitment to one of the three pillars. That's how plans quietly bloat.
- Don't let Digital Twin reappear as a new committed workstream. Management's three use cases have been resolved: use cases 1 and 3 absorbed into the canonical model + UNS work; use case 2 stays out of scope unless a separately funded simulation initiative materializes. Full commissioning-grade simulation is not a stealth pillar.
- Don't let Copilot 365 reappear. It was deliberately removed earlier — it's handled implicitly by the Snowflake/dbt + canonical model path.
- Don't build a parallel MQTT UNS broker just because "UNS" means MQTT to many vendors. The plan's UNS posture is deliberate: Redpanda IS the UNS backbone, and a projection service is a small optional addition when a specific consumer requires it — not the default path.
- Don't hand-edit files under `outputs/generated/` — they're disposable, regenerated from the spec files on every run. Edit the specs or the source plan files instead.
current-state.md (new file, 160 lines)
@@ -0,0 +1,160 @@
# Current State

Snapshot of today's shopfloor IT/OT interfaces and data collection. Keep this updated as discovery progresses.

> When a section below grows beyond a few paragraphs, break it out into `current-state/<component>.md` and leave a short summary + link here. See [`CLAUDE.md`](CLAUDE.md#breaking-out-components).

## Enterprise Layout

### Primary Data Center

- **South Bend Data Center** — primary data center.

### Largest Sites

- **Warsaw West campus**
- **Warsaw North campus**

> Largest sites run **one server cluster per production building** (each larger production building gets its own dedicated cluster of equipment servers).

### Other Integrated Sites

- **Shannon**
- **Galway**
- **TMT**
- **Ponce**

> Other integrated sites run a **single server cluster** covering the whole site.

### Not Yet Integrated

- A number of **smaller sites globally** are **not yet integrated** into the current SCADA system. Known examples include:
  - **Berlin**
  - **Winterthur**
  - **Jacksonville**
  - _…others — see note on volatility below._
- Characteristic: these tend to be **smaller-footprint** sites distributed across multiple regions (EU, US, etc.), likely requiring a lighter-weight onboarding pattern than the large Warsaw campuses.
- **Volatility note:** the list of smaller sites is **expected to change** — sites may be added, removed, reprioritized, or handled by adjacent programs. This file deliberately **does not** dive into per-site detail (equipment, PLC vendors, network topology, etc.) for the smaller sites because that detail would go stale quickly. Treat the named examples as illustrative rather than authoritative until a firm enterprise-wide site list is established.

## Systems & Interfaces

### SCADA — Split Stack

SCADA responsibilities are split across two platforms by purpose:

- **Aveva System Platform** — used for **validated data collection** (regulated/compliance-grade data).
- **Ignition SCADA** — used for **KPI** monitoring and reporting.

### Aveva System Platform

- **Role:** validated data collection (see SCADA split above).
- **Primary cluster:** hosted in the **South Bend Data Center**.
- **Site clusters:** each smaller site runs its own **site-level application server cluster** on Aveva System Platform.
- **Version:** **Aveva System Platform 2023 R2** across the estate. _TBD — whether every cluster is actually at 2023 R2 (confirm no version skew between primary and site clusters) and the patch/update level within 2023 R2._
- **Galaxy structure:** federation is handled entirely through **Global Galaxy** — that is the structural shape of the System Platform estate. Individual site galaxies and the primary cluster galaxy are tied together via Global Galaxy rather than a separate enterprise-galaxy layer on top. _TBD — exact count of underlying galaxies, naming, and which objects live where in the federation._
- **Inter-cluster communication:** clusters talk to each other via this Global Galaxy federation.
- **Redundancy model:** **hot-warm pairs** — Aveva System Platform's standard AppEngine redundancy pattern. Each engine runs a hot primary with a warm standby partner; the warm partner takes over on primary failure. Applies across both the primary cluster in South Bend and the site-level application server clusters. _TBD — which engines specifically run as redundant pairs (not every engine in a galaxy typically does), failover drill cadence, and how redundancy interacts with Global Galaxy federation during a failover._
- **Web API interface:** a Web API runs on the **primary cluster** (South Bend), serving as the enterprise-level integration entry point. It currently exposes two integration interfaces:
  - **Delmia DNC** — interface for DNC (file/program distribution) integration.
  - **Camstar MES** — interface for MES integration.
- **Out of scope for this plan:** licensing posture. License model and renewal strategy are not tracked here even if they shift as Redpanda-based event flows offload work from System Platform.
- _TBD — patch/update level within 2023 R2, full Galaxy structure detail, and per-engine redundancy specifics (all tracked inline above)._

### Ignition SCADA

- **Role:** KPI monitoring and reporting (see SCADA split above).
- **Deployment topology:** **centrally hosted in the South Bend Data Center** today. Ignition is **not** deployed per-site — there is a single central Ignition footprint, and every site's KPI UX reaches it over the WAN. This is the opposite of the Aveva System Platform topology (which has site-level clusters) and means Ignition KPI UX at a site depends on WAN reachability to South Bend.
- **Data source today: direct OPC UA from equipment.** Ignition pulls data **directly over OPC UA** from equipment — it does **not** go through ScadaBridge, LmxOpcUa, Aveva Historian, or the Global Galaxy to get its values. Because Ignition is centrally hosted in South Bend, this means **OPC UA connections run from South Bend to every site's equipment over the WAN**.
- **Contrast with ScadaBridge:** ScadaBridge is built around a data-locality principle (equipment talks to the local site's ScadaBridge instance). Ignition does the opposite today — equipment talks to a remote central Ignition over WAN OPC UA.
- **Implication for WAN outages:** during a WAN outage between a site and South Bend, Ignition loses access to that site's equipment — KPI UX for that site goes stale until the WAN recovers. This is a known characteristic of the current topology, not a defect to fix piecemeal; any remediation belongs in the goal-state discussion about Ignition's future deployment shape.
- **Version:** **Ignition 8.3**.
- **Modules in use:**
  - **Perspective** — Ignition's web-native UX module, used for the KPI user interface.
  - **OPC UA** — used to pull data directly from equipment (see data source above).
  - **Reporting** — used for KPI/operational reports on top of Ignition.
- Notably **not** in use: **Tag Historian** (Aveva Historian is the sole historian in the estate), **Vision** (Perspective is the only UX module), and no third-party modules (no Sepasoft MES, no Cirrus Link MQTT, etc.).
- _TBD — whether a per-site or regional Ignition footprint is on the roadmap given the WAN-dependency implication, and the patch level within 8.3._

### ScadaBridge (in-house)

- **What:** clustered **Akka.NET** application built in-house.
- **Role:** interfaces with **OPC UA** sources, bridging device/equipment data into the broader SCADA stack.
- **Capabilities:**
  - **Scripting** — custom logic can be written and executed inside the bridge. Scripts run in **C# via Roslyn scripting** (the same language as ScadaBridge itself), so users can reuse .NET libraries and ScadaBridge's internal types without an extra binding layer.
  - **Templating** — reusable templates for configuring devices/data flows at scale. **Authoring and distribution model:**
    - Templates are authored in a **UI** (not hand-edited files).
    - The UI writes template definitions to a **central database** that serves as the source of truth for all templates across the enterprise.
    - When templates are updated, changes are **serialized and pushed from the central DB out to the site server clusters**, so every ScadaBridge cluster runs a consistent, up-to-date template set without requiring per-site edits.
    - _TBD — serialization format on the wire, distribution mechanism (pull vs push), conflict/version handling if a site is offline during an update, audit trail of template changes._
  - **Secure Web API (inbound)** — external systems can interface with ScadaBridge over an authenticated Web API. Authentication is handled via **API keys** — clients present a static, per-client API key on each call. _TBD — key issuance and rotation process, storage at the client side, scoping (per client vs per capability), revocation process, audit trail of key usage._
  - **Web API client (outbound) — pre-configured, script-callable.** ScadaBridge provides a generic outbound Web API client capability: **any Web API can be pre-configured** (endpoint URL, credentials, headers, auth scheme, etc.) and then **called easily from scripts** using the configured name. There is no hard-coded list of "known" external Web APIs — the set of callable APIs is whatever is configured today, and new APIs can be added without ScadaBridge code changes.
  - **Notifications — contact-list driven, transport-agnostic.** ScadaBridge maintains **contact lists** (named groups of recipients) as a first-class concept. Scripts send notifications **to a contact list**; ScadaBridge handles the delivery over the appropriate transport (**email** or **Microsoft Teams**) based on how the contacts are configured. Scripts do not care about the transport — they call a single "notify" capability against a named contact list, and routing/fan-out happens inside ScadaBridge. New contact lists and new recipients can be added without script changes.
  - **Database writes** — can write to databases as a sink for collected/processed data. **Supported target today: SQL Server only** — other databases (PostgreSQL, Oracle, etc.) are not currently supported. _TBD — whether a generic ADO.NET / ODBC path is planned to broaden support, or whether SQL Server is intentionally the only target._
  - **Equipment writes via OPC UA** — can write back to equipment over OPC UA (not just read).
  - **EventHub forwarding** — can forward events to an **EventHub** (Kafka-compatible) for async downstream consumers.
  - **Store-and-forward (per-call, optional)** — Web API calls, notifications, and database interactions can **optionally** be cached in a **store-and-forward** queue on a **per-call basis**. If the downstream target is unreachable (WAN outage, target down), the call is persisted locally and replayed when connectivity returns — preserving site resilience without forcing every integration to be async. (A sketch of the pattern follows this section.)
- **Deployment topology:** runs as **2-node Akka.NET clusters**, **co-located on the existing Aveva System Platform cluster nodes** (no dedicated hardware — shares the same physical/virtual nodes that host System Platform).
- **Benchmarked throughput (OPC UA ingestion ceiling):** a single 2-node site cluster has been **benchmarked** at **~225,000 OPC UA updates per second** at the ingestion layer. This is the **input rate ceiling**, not the downstream work rate — triggered events, script executions, Web API calls, DB writes, EventHub forwards, and notifications all happen on a filtered subset of those updates and run at significantly lower rates. _TBD — actual production load per site (typically far below this ceiling), downstream work-rate profile (what percent of ingested updates trigger work), whether the benchmark was sustained or peak, and the co-located System Platform node headroom at benchmark load._
- **Supervision model (Akka.NET):** ScadaBridge uses Akka.NET supervision to self-heal around transient failures. Concretely:
  - **OPC UA connection restarts.** When an OPC UA source disconnects, returns malformed data, or stalls, ScadaBridge **restarts the connection to that source** rather than letting the failure propagate up. Individual source failures are isolated from each other.
  - **Actor tree restarts on failure.** When a failure escalates beyond a single connection (e.g., a faulty script or a downstream integration wedged in an unrecoverable state), ScadaBridge can **restart the affected actor tree**, bringing its children back to a known-good state without taking the whole cluster down.
  - _TBD — specific supervision strategies per actor tier (OneForOne vs AllForOne, restart limits, backoff), what failures escalate to cluster-level rather than tree-level, and how recurring script failures are throttled/quarantined rather than restart-looped._
- **Downstream consumers / integration targets in production today:**
  - **Aveva System Platform — via LmxOpcUa.** ScadaBridge interacts with System Platform through the in-house **LmxOpcUa** server rather than a direct System Platform API. LmxOpcUa exposes System Platform objects over OPC UA; ScadaBridge reads from and writes to System Platform through that OPC UA surface. This is the primary OT-side consumer.
  - **Internal Web APIs.** ScadaBridge makes outbound calls to internal enterprise Web APIs using its generic pre-configured Web API client capability (see Capabilities above). Because any Web API can be configured dynamically, there is no fixed enumeration of "ScadaBridge's Web API integrations" to capture here; specific IT↔OT Web API crossings land in the legacy integrations inventory (`current-state/legacy-integrations.md`) regardless of whether they're reached via ScadaBridge's generic client.
  - **Batch tracking database.** ScadaBridge writes batch tracking records directly to a SQL Server batch tracking database.
  - **Camstar MES — direct.** ScadaBridge integrates with Camstar via a **direct outbound Web API call from ScadaBridge to Camstar**, using its own Web API client and credentials. It does **not** go through the Aveva System Platform primary cluster's Camstar Web API interface (LEG-002). This means ScadaBridge already has a native Camstar path; the LEG-002 retirement work is about moving the **other** consumers of that System Platform Web API off it, not about building a new ScadaBridge-to-Camstar path.
  - _TBD — other databases written to besides batch tracking, and any additional consumers not listed here. **Enumeration of internal Web API endpoints is not tracked here** because ScadaBridge's Web API client is generic/configurable (see Capabilities); specific IT↔OT Web API crossings that need migration live in `current-state/legacy-integrations.md`. **Notification destination teams are similarly not enumerated** because they're contact-list-driven and transport-agnostic (see Capabilities) — the list of actual recipients lives in ScadaBridge's configuration, not in this plan._
- **Routing topology:**
  - **Hub-and-spoke** — ScadaBridge nodes on the **central cluster (South Bend)** can route to ScadaBridge nodes on other clusters, forming a hub-and-spoke network with the central cluster as the hub.
  - **Direct access** — site-level ScadaBridge clusters can also be reached directly (not only via the hub), enabling point-to-point integration where appropriate.
- **Data locality (design principle):** ScadaBridge is designed to **keep local data sources localized** — equipment at a site communicates with the **local ScadaBridge instance** at that site, not with the central cluster. This minimizes cross-site/WAN traffic, reduces latency, and keeps site operations resilient to WAN outages.
- **Deployment status:** ScadaBridge is **already deployed** across the current cluster footprint. However, **not all legacy API integrations have been migrated onto it yet** — some older point-to-point integrations still run outside ScadaBridge and need to be ported. The authoritative inventory of these integrations (and their retirement tracking against `goal-state.md` pillar 3) lives in [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md).
- _TBD — resource impact of co-location with System Platform beyond the benchmark note above; remaining open details (template format, key rotation, supervision tuning, additional DB sinks) are tracked inline in their own bullets._
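
ScadaBridge's store-and-forward is a feature of the product itself (C#/Akka.NET); purely to illustrate the per-call pattern described under Capabilities, here is a file-backed sketch in Python. The names and queue mechanics are hypothetical, not how ScadaBridge implements it:

```python
import json
import time
from pathlib import Path

class StoreAndForward:
    """Per-call store-and-forward: try the call; on failure, persist the
    payload locally and replay it when connectivity returns."""

    def __init__(self, queue_dir: str):
        self.queue = Path(queue_dir)
        self.queue.mkdir(parents=True, exist_ok=True)

    def call(self, send, payload: dict) -> bool:
        try:
            send(payload)
            return True
        except ConnectionError:
            # Target unreachable (WAN outage, target down): persist the
            # call locally instead of losing it.
            (self.queue / f"{time.time_ns()}.json").write_text(json.dumps(payload))
            return False

    def replay(self, send) -> int:
        """Replay queued calls in arrival order; stop at the first failure."""
        replayed = 0
        for item in sorted(self.queue.glob("*.json")):
            try:
                send(json.loads(item.read_text()))
            except ConnectionError:
                break
            item.unlink()
            replayed += 1
        return replayed
```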

### LmxOpcUa (in-house)

- **What:** in-house **OPC UA server** with **tight integration to Aveva System Platform**.
- **Role:** exposes System Platform data/objects via OPC UA, enabling OPC UA clients (including ScadaBridge and third parties) to consume System Platform data natively.
- **Goal-state note:** in the target architecture, LmxOpcUa is **folded into `OtOpcUa`** — the new unified site-level OPC UA layer. Its System Platform namespace carries forward; it runs alongside a new equipment namespace on the same per-site clustered OPC UA server. See `goal-state.md` → **OtOpcUa — the unified site-level OPC UA layer** for the fold-in details.
- **Deployment footprint:** built and deployed to **every Aveva System Platform node** — the primary cluster in South Bend and every site-level application server cluster. LmxOpcUa is not a centralized gateway; each System Platform node runs its own local instance, so OPC UA clients can reach the System Platform objects hosted on that node directly.
- **Namespace source:** each LmxOpcUa instance interfaces with its **local System Platform node's LMX API**. The OPC UA address space exposed by a given LmxOpcUa node reflects the System Platform objects reachable through **that node's** LMX API — i.e., the namespace is inherently per-node and scoped to whatever the local App Server surfaces. Cross-node visibility happens at the System Platform / Global Galaxy layer, not at the LmxOpcUa layer.
- **Security model:** standard **OPC UA security** — supports the standard OPC UA security modes (`None` / `Sign` / `SignAndEncrypt`) and standard security profiles (`Basic256Sha256` and related), with **UserName token** authentication for clients. No bespoke auth scheme. _TBD — which security mode + profile combinations are required vs allowed in production, where the UserName credentials come from (local accounts, AD/LDAP, a dedicated credential store), and how credentials are rotated and audited._
- _TBD — exact OPC UA namespace shape exposed to clients (hierarchy mirroring Galaxy areas/objects vs flat vs custom), and how ScadaBridge templates address equipment across multiple per-node LmxOpcUa instances._

### Equipment OPC UA — multiple direct connections today

- **Current access pattern:** some equipment is connected to by **multiple systems directly**, concurrently, rather than through a single shared access layer. Depending on the equipment, any of the following may hold OPC UA sessions against it at the same time:
  - **Aveva System Platform** (for validated data collection via its IO drivers)
  - **Ignition SCADA** (for KPI data, central from South Bend over the WAN — see Ignition data source)
  - **ScadaBridge** (for bridge/integration workloads via its Akka.NET OPC UA client)
- **Consequences of the current pattern:**
  - **Multiple OPC UA sessions per equipment.** Equipment takes the session load of every consumer independently, which can strain devices with limited concurrent-session support.
  - **No single access-control point.** Authorization is enforced by whatever each consumer happens to present to the equipment — no site-level chokepoint exists to inspect, audit, or limit equipment access.
  - **Inconsistent data.** The same tag read by three different consumers can produce three subtly different values (different sampling intervals, different deadbands, different session buffers).
- _TBD — exact inventory of which equipment is reached by which consumers today; whether any equipment is already fronted by a shared OPC UA aggregator at the site level._
- **Equipment protocol survey.** The authoritative inventory of **native equipment protocols** across the estate (Modbus, EtherNet/IP, Siemens S7, Fanuc FOCAS, native OPC UA, long-tail) lives in [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md). That file is the Year 1 input to the **OtOpcUa core driver library** scope — see [`goal-state.md`](goal-state.md) → OtOpcUa → Driver strategy and [`roadmap.md`](roadmap.md) → OtOpcUa → Year 1.

### Camstar MES (sole MES)

- **Role:** the **only MES** in use across the estate. There are no other MES products at any site — Camstar is the enterprise-wide system.
- **Integration today:** accessed from the shopfloor via the **Camstar interface on the Aveva System Platform primary cluster's Web API** (LEG-002 in the legacy integrations inventory), and separately by **ScadaBridge** via its direct outbound Web API call (see ScadaBridge downstream consumers).
- _TBD — Camstar version, hosting (on-prem vs SaaS), owner team, which business capabilities it covers._

### Aveva Historian (sole historian)

- **Role:** the **only historian** in use across the estate. No other historian products (OSIsoft PI, Canary, GE Proficy, etc.) run at any site.
- **Deployment topology: central-only in the South Bend Data Center.** A single Aveva Historian instance in South Bend serves the entire estate. There are **no per-site tier-1 historians**, and there is **no tier-1 → tier-2 replication** model in play today — every site's historian data lands directly in the central South Bend historian.
  - **Implication for ingestion:** SnowBridge reads from **one** historian, not many — no per-site historian enumeration, no replication topology to account for.
  - **Implication for WAN:** because the historian is central, the collection path from a site's System Platform cluster to the historian already crosses the WAN today. This is a pre-existing WAN dependency, not something this plan introduces.
- **Version:** **2023 R2**, on the same release cadence as Aveva System Platform.
- **Retention policy: permanent.** No TTL or rollup is applied — historian data is retained **forever** as a matter of policy. This means the "drill-down to Historian for raw data" pattern in `goal-state.md` works at any historical horizon, and the historian is the authoritative long-term system of record for validated tag data regardless of how much Snowflake chooses to store.
- **Integration role:** serves as the system of record for validated/compliance-grade tag data collected via Aveva System Platform, and exposes a **SQL interface** (OPENQUERY and history views) for read access (a sketch of a read against this interface follows this section). Downstream use of that SQL interface for Snowflake ingestion is discussed in `goal-state.md` under Aveva Historian → Snowflake.
- **Current consumers (reporting):** the primary consumer of Historian data today is **enterprise reporting**, currently on **SAP BusinessObjects (BOBJ)**. Reporting is actively **migrating from SAP BOBJ to Power BI** — an in-flight transition that this plan should be aware of but does not own.
  - **Implication for pillar 2:** the "enterprise analytics/AI enablement" target in `goal-state.md` sits alongside this Power BI migration, not in competition with it. Whether Power BI consumes from Snowflake (via the dbt curated layer), from Historian directly, or from both is a TBD that coordinates between the two initiatives.
- _TBD — current storage footprint and growth rate, other consumers beyond reporting (e.g., Aveva Historian Client / Insight / Trend tools, ad-hoc analyst SQL, regulatory/audit exports), and how the BOBJ→Power BI migration coordinates with the Snowflake path for machine data._
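
To ground the SQL-interface bullet above: Aveva (formerly Wonderware) Historian exposes history data through views in its `Runtime` database, conventionally queried with retrieval-mode options. A sketch of a read over ODBC; the server name, tag name, and the exact view/column set are assumptions to confirm against the deployed 2023 R2 instance:

```python
import pyodbc

# Illustrative read against Historian's SQL surface. Connection details
# and the tag name are placeholders for the sketch.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=southbend-historian;DATABASE=Runtime;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT DateTime, TagName, Value
    FROM History
    WHERE TagName = 'WW.Line2.CncMill05.SpindleLoad'
      AND DateTime >= DATEADD(hour, -1, GETUTCDATE())
      AND wwRetrievalMode = 'Cyclic'   -- Historian-specific retrieval option
    ORDER BY DateTime
    """
)
for row in cursor.fetchall():
    print(row.DateTime, row.TagName, row.Value)
```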

_TBD — additional shopfloor systems and HMIs not covered above (if any)._

## IT/OT Integration Points

_TBD — how IT and OT layers currently connect (protocols, gateways, brokers)._

## Data Collection

_TBD — what data is collected, how, where it lands, frequency, gaps._

## Operator / User Interfaces

_TBD — current UIs operators interact with, pain points._

## Known Pain Points & Constraints

_TBD._

## Stakeholders

_TBD._
current-state/equipment-protocol-survey.md (new file, 232 lines)
@@ -0,0 +1,232 @@
# Equipment Protocol Survey

The authoritative inventory of **equipment protocols in use across the estate**. Feeds scope decisions for the **OtOpcUa core driver library** (see [`../goal-state.md`](../goal-state.md) → OtOpcUa → Driver strategy).

> This file is the **input** to the Year 1 OtOpcUa decision: which protocols get a driver built **proactively** (core library) vs. which get built **on-demand** (long-tail) as each site onboards. Without this survey, the core driver library cannot be scoped — the whole Year 1 OtOpcUa workstream sits on top of it. See [`../roadmap.md`](../roadmap.md) → OtOpcUa → Year 1.

## Why this inventory exists (and why it's not the legacy-integrations inventory)

Different question, different denominator:

- [`legacy-integrations.md`](legacy-integrations.md) tracks **bespoke IT↔OT integrations that cross the ScadaBridge-central boundary outside ScadaBridge**. Denominator for pillar 3 retirement.
- **This file** tracks **equipment protocols on the OT side** — the native protocols that PLCs, controllers, and instruments actually speak on the shopfloor, which OtOpcUa has to translate into OPC UA. Denominator for the OtOpcUa core driver library scope.

They are unrelated lists. Equipment that already speaks OPC UA natively is **in scope for this survey** (because OtOpcUa still needs to connect to it), even though "OPC UA → OtOpcUa" is not a driver build.

## Companion deliverable: initial UNS hierarchy snapshot

**The discovery walk for this survey is the same walk that produces the initial UNS naming-hierarchy snapshot.** See [`../goal-state.md`](../goal-state.md) → **Target IT/OT Integration → Unified Namespace (UNS) posture → UNS naming hierarchy standard**.

The UNS hierarchy (Enterprise → Site → Area → Line → Equipment, with stable equipment UUIDs) and this protocol survey both require someone to walk System Platform IO config, Ignition OPC UA connections, and ScadaBridge templates across the estate. Running those walks once and producing two artifacts is dramatically cheaper than running them twice.

**Different granularity, same source:**

- **This file** captures data at **equipment-class granularity** — "approximately 40 three-axis CNC mills, Vendor X, Modbus TCP, across Warsaw West and Warsaw North" — because the core driver library scope is a per-protocol decision and individual instance detail would be noise at that level.
- **The UNS hierarchy snapshot** captures data at **equipment-instance granularity** — one entry per physical machine, with site / area / line / equipment-name / stable UUID — because the hierarchy is a per-instance addressing surface.

**Guidance for the walker:** at every step of the Discovery approach below, capture both levels of data in the same pass. Each equipment instance found becomes both (a) an increment to this file's equipment-class count for its protocol, and (b) a row in the initial UNS hierarchy snapshot for the `schemas` repo. Do not split into two discovery efforts.

The initial hierarchy snapshot is not stored in this file (it belongs in the `schemas` repo as part of the UNS canonical definition), but capturing it is a **required output** of the same discovery walk. If the walker produces only protocol-survey data and no hierarchy data, the work has to be redone for the UNS snapshot — unacceptable duplication.
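
For concreteness, one possible shape for a snapshot row at equipment-instance granularity. Field names are illustrative; the authoritative schema belongs in the `schemas` repo, not here:

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class HierarchySnapshotRow:
    """One physical machine in the initial UNS hierarchy snapshot."""
    site: str
    area: str        # "_default" where the level doesn't apply
    line: str
    equipment: str
    # Stable identity: the path is navigation, the UUID is lineage.
    equipment_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))

row = HierarchySnapshotRow(
    site="warsaw-west", area="bldg-3", line="line-2", equipment="cnc-mill-05"
)
```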
|
||||||
|
|
||||||
|
## How to use this file

- **One row per (site, equipment class, protocol) combination.** An equipment class is a grouping of functionally equivalent machines — "3-axis CNC milling machines, Vendor X, 2015–2020 generation" is one class, regardless of how many individual machines fit that description. If the same equipment class appears at multiple sites speaking the same protocol, list all sites in the `Sites` field of a single row. If the same class speaks different protocols at different sites (e.g., older fleet on Modbus, newer fleet on OPC UA), split into separate rows.
- **Rows are discovery-grade.** Exact model numbers, exact counts, and exact firmware revisions are **not** required. The survey only has to be precise enough to scope driver work — "approximately 40 units across 3 sites, Modbus TCP" is enough; a CMDB dump is not.
- **Missing values are fine during discovery** — mark them `_TBD_` rather than leaving blank.
- **Rows are not removed once the driver ships.** A row represents "this protocol exists in the estate," which stays true even after OtOpcUa can speak it. Rows are removed only when the underlying equipment is decommissioned.
## Field schema

| Field | Description |
|---|---|
| `ID` | Short stable identifier (e.g., `EQP-001`). Never reused. |
| `Equipment Class` | Human-readable grouping — machine type + vendor + generation. Precise enough to mean something to a shopfloor engineer, not precise enough to require a part number. |
| `Vendor(s)` | Equipment manufacturer(s) covered by this class. |
| `Native Protocol` | The protocol the equipment actually speaks on the wire. One of: `OPC UA`, `Modbus TCP`, `Modbus RTU`, `EtherNet/IP` (CIP), `Siemens S7` (ISO-on-TCP), `Profinet`, `Profibus`, `DeviceNet`, `MTConnect`, `Fanuc FOCAS`, `Heidenhain DNC`, `Siemens SINUMERIK OPC UA`, `ASCII serial`, `proprietary`, other (name it). |
| `Protocol Variant / Notes` | Sub-variant if relevant (e.g., `Modbus TCP with custom register map`, `EtherNet/IP explicit messaging only`, `S7-300 vs S7-1500`). |
| `Sites` | Sites where equipment in this class is present. `All integrated sites` is acceptable shorthand. |
| `Approx. Instance Count` | Rough order-of-magnitude across all listed sites (e.g., `~40`, `~10–20`, `>100`, `unknown`). A magnitude is enough. |
| `Current Access Path` | How equipment in this class is accessed **today** — direct OPC UA from Aveva System Platform / Ignition / ScadaBridge, via a site-level gateway, via a vendor-specific driver in System Platform, not connected at all, etc. |
| `OtOpcUa Driver Needed?` | One of: `No — already OPC UA` (OtOpcUa connects as an OPC UA client, no driver build), `Yes — core candidate` (protocol is broad enough to warrant a proactive driver), `Yes — long-tail` (protocol is narrow; driver built on-demand when the first site that needs it onboards), `Unknown` (survey incomplete). |
| `Driver Complexity (Estimate)` | `Low` / `Medium` / `High` / `Unknown`. Proxy for how much the driver will cost to build and maintain; influences the core-vs-long-tail decision. |
| `Priority Site(s)` | If this driver is on the critical path for onboarding a specific site or cluster, name it. Drives sequencing. |
| `Notes` | Vendor docs availability, known quirks, existing LMX or ScadaBridge work that could be reused, anything else. |
## Classification: core vs long-tail

A protocol becomes a **core library driver** if it meets **any one** of these criteria:

1. **Breadth** — present at **three or more sites**, regardless of instance count.
2. **Volume** — present at **any number of sites** with a combined instance count **above ~25** across the estate.
3. **Blocker** — needed to onboard a site that is on the roadmap for Year 1 or Year 2, regardless of how narrowly the protocol is used elsewhere.
4. **Strategic vendor** — the protocol belongs to a vendor whose equipment is expected to grow in the estate (e.g., the vendor is winning new purchases), even if today's footprint is small. This is a **judgment call**, not a hard rule — use sparingly.

A protocol is **long-tail** by default if none of the above apply. Long-tail drivers are built **on-demand** when the first site that needs the protocol reaches onboarding.

**Protocols already OPC UA** are **neither core nor long-tail** — OtOpcUa speaks OPC UA natively and the work is a connection configuration, not a driver build.

**Tiebreakers:**

- When a protocol narrowly misses a threshold, **err toward long-tail.** Core drivers are a commitment to maintain; long-tail drivers are one-off builds. The cost of building a long-tail driver later is bounded; the cost of committing to maintain a core driver forever is not.
- When a protocol narrowly makes a threshold but is **known to be retiring** from the estate (old-generation equipment scheduled for replacement), **err toward long-tail** and make a note about the retirement horizon.
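These criteria are mechanical enough to sanity-check in code. Below is a minimal sketch, assuming a hypothetical `SurveyRow` shape that mirrors the field schema; the field names, the `retiring` flag, and the way the blocker criterion interacts with the retiring tiebreaker are all illustrative assumptions, not an agreed data model or a settled policy.

```python
from dataclasses import dataclass, field

# Hypothetical row shape mirroring the field schema above — names are
# illustrative, not an agreed data model.
@dataclass
class SurveyRow:
    row_id: str                      # e.g. "EQP-004"
    native_protocol: str             # e.g. "Modbus TCP"
    sites: list[str] = field(default_factory=list)
    approx_count: int | None = None  # None while the count is still TBD
    blocks_y1_y2_site: bool = False  # criterion 3: blocker for a Y1/Y2 site
    strategic_vendor: bool = False   # criterion 4: judgment call, use sparingly
    retiring: bool = False           # tiebreaker: fleet scheduled for replacement

def classify(row: SurveyRow) -> str:
    """Core / long-tail / already-OPC-UA decision per the criteria above."""
    if row.native_protocol == "OPC UA":
        return "already-opc-ua"              # neither core nor long-tail
    core = (
        len(row.sites) >= 3                  # 1. breadth
        or (row.approx_count or 0) > 25      # 2. volume (~25 threshold)
        or row.blocks_y1_y2_site             # 3. blocker
        or row.strategic_vendor              # 4. strategic vendor
    )
    # Tiebreaker: a retiring protocol that qualifies goes long-tail anyway.
    # One reading (assumed here) is that a genuine Year 1/2 blocker still wins.
    if core and row.retiring and not row.blocks_y1_y2_site:
        return "long-tail (note the retirement horizon)"
    return "core" if core else "long-tail"

print(classify(SurveyRow("EQP-004", "Modbus TCP",
                         sites=["shannon", "galway"], approx_count=55)))  # core
```

Criterion 4 stays a plain boolean on the row rather than anything derivable, which matches the "use sparingly" guidance above.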
## Current inventory

> **Discovery not started.** Populate as the protocol survey is conducted. The rows below are pre-seeded with **expected categories** based on typical discrete-manufacturing estates — they are **placeholders** to make the shape of the table obvious, not confirmed observations. Remove or merge them as real discovery data arrives. **Nothing in this section is authoritative until it has a non-TBD `Approx. Instance Count` and at least one confirmed `Sites` entry.**
### EQP-001 — OPC UA-native equipment

| Field | Value |
|---|---|
| **ID** | EQP-001 |
| **Equipment Class** | Equipment speaking OPC UA natively (mixed vendors and generations) |
| **Vendor(s)** | _TBD — almost certainly a mix, name the top 3–5 vendors once known_ |
| **Native Protocol** | OPC UA |
| **Protocol Variant / Notes** | _TBD — security modes actually in use (`None` / `Sign` / `SignAndEncrypt`), profiles (`Basic256Sha256`, `Aes256_Sha256_RsaPss`, etc.), auth tokens (anonymous vs UserName vs certificate)_ |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | Mixed — direct OPC UA sessions from Aveva System Platform, Ignition, and/or ScadaBridge depending on the equipment. See [`../current-state.md`](../current-state.md) → Equipment OPC UA. |
| **OtOpcUa Driver Needed?** | **No — already OPC UA.** OtOpcUa acts as an OPC UA client to these devices; no driver build, but connection configuration and auth setup are still required. |
| **Driver Complexity (Estimate)** | N/A — connection work only. |
| **Priority Site(s)** | N/A |
| **Notes** | Will be the **easiest** equipment class to bring onto OtOpcUa once OtOpcUa ships — no driver work, just redirect the client-side connection. Expected to be a meaningful fraction of the estate given the "OPC UA-first" posture of most equipment vendors over the last decade. The survey should **still** capture this category because the count informs how much of the tier-1 ScadaBridge cutover is "redirect an existing OPC UA client" vs "bridge through a new driver." |
### EQP-002 — Siemens PLC family (S7)

| Field | Value |
|---|---|
| **ID** | EQP-002 |
| **Equipment Class** | Siemens S7 PLCs (S7-300 / S7-400 / S7-1200 / S7-1500) |
| **Vendor(s)** | Siemens |
| **Native Protocol** | Siemens S7 (ISO-on-TCP); newer S7-1500 also speaks OPC UA natively |
| **Protocol Variant / Notes** | _TBD — mix of S7 generations determines whether the S7 driver is actually needed or whether OPC UA covers most units_ |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | _TBD — likely a mix of System Platform S7 IO drivers and direct OPC UA on newer units_ |
| **OtOpcUa Driver Needed?** | **Unknown.** Depends on whether the S7-1500 OPC UA footprint has displaced older S7 generations. If S7-300/400 still dominate → **core candidate**. If the fleet is mostly S7-1500 → `No — already OPC UA` for most units, long-tail for the residual older generations. |
| **Driver Complexity (Estimate)** | Medium — the S7 protocol is well documented and multiple open-source implementations exist; the work is in matching existing System Platform semantics. |
| **Priority Site(s)** | _TBD_ |
| **Notes** | Strong core candidate on first-principles grounds — Siemens is a common PLC vendor in discrete manufacturing. Confirm or refute with a specific count during discovery. |
### EQP-003 — Allen-Bradley / Rockwell PLC family (EtherNet/IP)

| Field | Value |
|---|---|
| **ID** | EQP-003 |
| **Equipment Class** | Allen-Bradley / Rockwell ControlLogix, CompactLogix, MicroLogix, SLC families |
| **Vendor(s)** | Rockwell Automation |
| **Native Protocol** | EtherNet/IP (CIP) |
| **Protocol Variant / Notes** | _TBD — implicit vs explicit messaging, tag-based vs legacy data table access_ |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | _TBD — likely System Platform EtherNet/IP IO driver_ |
| **OtOpcUa Driver Needed?** | **Unknown — likely core candidate.** Rockwell equipment does not generally speak OPC UA natively at the controller level, so if Rockwell has any meaningful footprint in the estate, an EtherNet/IP driver is a core candidate. |
| **Driver Complexity (Estimate)** | Medium-to-high — CIP is a large protocol family; scope depends on which message classes are actually needed. |
| **Priority Site(s)** | _TBD_ |
| **Notes** | Paired with EQP-002 (Siemens) as the two most likely dominant PLC protocol families. Confirm scope during discovery. |
### EQP-004 — Generic Modbus devices

| Field | Value |
|---|---|
| **ID** | EQP-004 |
| **Equipment Class** | Modbus TCP and Modbus RTU devices across instruments, sensors, power meters, older PLCs, variable-frequency drives |
| **Vendor(s)** | Mixed — Modbus is multi-vendor |
| **Native Protocol** | Modbus TCP, Modbus RTU (often bridged to TCP via a gateway) |
| **Protocol Variant / Notes** | _TBD — per-device register map is the real work, not the protocol itself_ |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | _TBD — likely a mix of System Platform Modbus driver and site-level gateways_ |
| **OtOpcUa Driver Needed?** | **Likely core candidate.** Modbus is the most common low-cost protocol in the estate on first-principles grounds. The driver itself is simple; the long-tail is **per-device register maps**. |
| **Driver Complexity (Estimate)** | Low for the protocol; Medium-to-High for the register-map configuration surface (needs to be editable per-device without code changes). |
| **Priority Site(s)** | _TBD_ |
| **Notes** | **Register map configuration is the real work.** The core driver should ship a configuration mechanism (UI or templates) for register-map definition so new Modbus devices don't require code changes. Strong candidate for core library. |
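To make "editable per-device without code changes" concrete, here is one possible shape for a register-map template: a plain data structure the driver would load at connect time. Every key and convention here (the function-code names, the `scale`/`offset` pattern, the addresses) is a hypothetical sketch, not a committed OtOpcUa configuration format.

```python
# Hypothetical register-map template for one Modbus device class.
# Keys and conventions are illustrative only — the real OtOpcUa
# configuration format is a Year 1 design decision.
POWER_METER_V1 = {
    "device_class": "power-meter-v1",
    "default_unit_id": 1,
    "registers": [
        {
            "tag": "voltage-l1",      # published node name on the OPC UA surface
            "function": "holding",    # Modbus table: holding/input/coil/discrete
            "address": 40001,
            "type": "uint16",
            "scale": 0.1,             # engineering units: raw * scale + offset
            "offset": 0.0,
            "unit": "V",
        },
        {
            "tag": "energy-total",
            "function": "holding",
            "address": 40010,
            "type": "uint32",         # two consecutive registers, big-endian
            "scale": 1.0,
            "offset": 0.0,
            "unit": "kWh",
        },
    ],
}
```

Under a shape like this, a new device of a known class needs only an address and a template reference, and a new device class needs a new template file — configuration review rather than a driver release.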
### EQP-005 — Fanuc CNC controllers (FOCAS)

| Field | Value |
|---|---|
| **ID** | EQP-005 |
| **Equipment Class** | Fanuc CNC machine controls |
| **Vendor(s)** | Fanuc |
| **Native Protocol** | Fanuc FOCAS (proprietary library, not a wire protocol) |
| **Protocol Variant / Notes** | FOCAS1 / FOCAS2 library versions |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | _TBD — likely direct or via a vendor-specific driver_ |
| **OtOpcUa Driver Needed?** | **Core candidate if any significant CNC footprint exists.** FOCAS is the de-facto API for Fanuc CNCs and does not come "for free" with OPC UA. |
| **Driver Complexity (Estimate)** | High — FOCAS is a C library with platform-specific bindings and licensing considerations; wrapping it into a .NET driver carries non-trivial ops/licensing work. |
| **Priority Site(s)** | _TBD — Warsaw campuses or TMT likely candidates if they have CNC machining centers_ |
| **Notes** | **Decouple from the core library decision until the CNC count is known.** If CNC is a meaningful fraction of the estate, FOCAS is unavoidable. If CNC is a handful of machines, build on-demand as long-tail. A separate alternative — MTConnect — is worth asking about (some modern CNCs expose MTConnect, which is a simpler target). |
### EQP-006 — Other long-tail equipment

| Field | Value |
|---|---|
| **ID** | EQP-006 |
| **Equipment Class** | Everything else — instruments, ovens, vision systems, stand-alone controllers, legacy proprietary devices |
| **Vendor(s)** | Mixed |
| **Native Protocol** | Various — ASCII serial, proprietary, vendor-specific |
| **Protocol Variant / Notes** | _TBD — catalog as encountered_ |
| **Sites** | _TBD_ |
| **Approx. Instance Count** | _TBD_ |
| **Current Access Path** | _TBD — often bespoke per-device integration work in System Platform today_ |
| **OtOpcUa Driver Needed?** | **Long-tail — build on-demand.** This is the category the "on-demand long-tail" driver strategy exists to serve. |
| **Driver Complexity (Estimate)** | Low-to-High per device — varies wildly. |
| **Priority Site(s)** | Wherever the first blocker appears. |
| **Notes** | Track individual long-tail cases as separate rows (EQP-007, EQP-008, …) as discovery identifies them. The placeholder above exists only to anchor the category. |

### _Further rows TBD — add as discovery progresses_
## Rollup views

These views are **derived** from the row-level data above. Regenerate as rows are updated.
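If the rows are ever mirrored into a machine-readable form, the rollups can be derived rather than hand-maintained. A minimal sketch follows, assuming a hypothetical list-of-dicts mirror of the rows above; the two example rows and their counts are invented placeholders, not survey data.

```python
from collections import defaultdict

# Hypothetical machine-readable mirror of the inventory rows above.
# The counts and sites below are invented placeholders.
rows = [
    {"id": "EQP-002", "protocol": "Siemens S7", "sites": ["warsaw-west"], "count": 30},
    {"id": "EQP-004", "protocol": "Modbus TCP/RTU", "sites": ["shannon", "galway"], "count": 55},
]

def by_protocol(rows):
    """Aggregate row-level data into the by-protocol rollup."""
    rollup = defaultdict(lambda: {"ids": [], "instances": 0, "sites": set()})
    for r in rows:
        entry = rollup[r["protocol"]]
        entry["ids"].append(r["id"])
        entry["instances"] += r["count"] or 0  # treat TBD counts as 0
        entry["sites"].update(r["sites"])
    return dict(rollup)

for protocol, agg in by_protocol(rows).items():
    print(protocol, agg["ids"], agg["instances"], sorted(agg["sites"]))
```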
### By protocol — drives core library scope

| Native Protocol | Row IDs | Total Approx. Instances | Sites | Core / Long-tail / Already OPC UA |
|---|---|---|---|---|
| OPC UA | EQP-001 | _TBD_ | _TBD_ | Already OPC UA — no driver needed |
| Siemens S7 | EQP-002 | _TBD_ | _TBD_ | _TBD — depends on S7-1500 fraction_ |
| EtherNet/IP | EQP-003 | _TBD_ | _TBD_ | _TBD — likely core_ |
| Modbus TCP/RTU | EQP-004 | _TBD_ | _TBD_ | _TBD — likely core_ |
| Fanuc FOCAS | EQP-005 | _TBD_ | _TBD_ | _TBD — depends on CNC count_ |
| Long-tail mix | EQP-006+ | _TBD_ | _TBD_ | Long-tail — on-demand |

**Decision output of this table:** the **core driver library scope** for Year 1 OtOpcUa. A protocol row tagged `Core` becomes a Year 1 build commitment; `Long-tail` becomes a Year 2+ on-demand build budget; `Already OPC UA` becomes connection configuration work only.
### By site — drives onboarding sequencing

| Site | Row IDs present | Protocols present | Blockers for tier-1 cutover |
|---|---|---|---|
| South Bend DC (primary cluster) | _TBD_ | _TBD_ | _TBD_ |
| Warsaw West | _TBD_ | _TBD_ | _TBD_ |
| Warsaw North | _TBD_ | _TBD_ | _TBD_ |
| Shannon | _TBD_ | _TBD_ | _TBD_ |
| Galway | _TBD_ | _TBD_ | _TBD_ |
| TMT | _TBD_ | _TBD_ | _TBD_ |
| Ponce | _TBD_ | _TBD_ | _TBD_ |

**Decision output of this table:** **tier-1 ScadaBridge cutover sequencing.** The roadmap commits tier 1 at large sites first; this view identifies which sites can cut over with which drivers available, so that sequencing is driven by driver availability, not just site size.
## Discovery approach

Recommended path — not prescriptive. **Each step produces both protocol-survey rows (equipment-class granularity) and UNS hierarchy snapshot rows (equipment-instance granularity) in the same pass** — see "Companion deliverable" above.

1. **Walk the Aveva System Platform IO configuration** at the primary cluster and each site cluster. System Platform's IO server layer is the most complete existing inventory of "what protocols are we talking to what equipment with." Every configured IO object is a row candidate — and its parent galaxy / area / cluster assignment maps directly onto UNS Site and Area levels.
2. **Walk the Ignition OPC UA connections** at the central Ignition footprint in South Bend. These are the equipment Ignition talks to directly today. Every distinct endpoint is a row candidate; Ignition's tag browse tree typically groups endpoints by site and production area, which feeds the UNS hierarchy walk directly.
3. **Walk ScadaBridge template configuration** for equipment-facing templates. Less complete than System Platform IO, but captures anything already templated on ScadaBridge. Template groupings also surface line-level organization where it exists.
4. **Walk any existing site asset registers or CMDBs** if present. Lower priority — often stale, rarely matching reality — but they may surface equipment not yet integrated with any SCADA layer, and may be the best source for a stable enterprise equipment ID that can seed the UNS UUID.
5. **Interview site controls/automation engineers** at each large site. They are the ground truth for "what's actually out there" — especially for long-tail equipment that never made it into a central inventory. Interviews are also the best source for **Line-level structure**, which is frequently implicit in operator knowledge and absent from System Platform or Ignition configuration.
6. **Cross-check against Camstar's equipment master data** (if Camstar tracks equipment at that granularity) as a sanity check against what manufacturing operations believes is on the floor — and as a tiebreaker when different configuration sources name the same machine differently.

**Order matters.** Steps 1–3 are cheap and produce most of the signal for both outputs. Steps 4–6 are slower, record- and people-dependent work; reserve them for gap-filling after the System Platform / Ignition / ScadaBridge walks are done.

**Dual output at each step:** the walker should carry two notebooks (or two sheets of a spreadsheet) and capture entries in both as they go — one equipment instance observed produces one row in each notebook; a minimal sketch follows below. Reconciliation between the two outputs happens at the end of the walk, not during it. If reconciliation reveals mismatches (a machine shows up in the protocol survey but not the hierarchy, or vice versa), that's a walker error to chase down, not a data difference to split the difference on.
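Here is that sketch of the two-notebook discipline, with illustrative field names only. The point is that one observed machine emits exactly one entry in each output, so end-of-walk reconciliation reduces to a count check and a join.

```python
# One observation during the walk → one entry in each notebook.
protocol_survey = []   # equipment-class granularity (this file)
uns_snapshot = []      # equipment-instance granularity (for the schemas repo)

def record(site, area, line, equipment, uuid, equipment_class, protocol):
    # Notebook 1: increment the class-level protocol survey.
    protocol_survey.append({"class": equipment_class,
                            "protocol": protocol, "site": site})
    # Notebook 2: one row per physical machine for the UNS hierarchy snapshot.
    uns_snapshot.append({"site": site, "area": area, "line": line,
                         "equipment": equipment, "uuid": uuid})

record("warsaw-west", "bldg-3", "line-2", "cnc-mill-05",
       "3f9a…", "3-axis CNC mill, Vendor X", "Modbus TCP")

# End-of-walk reconciliation: any count mismatch is a walker error to chase.
assert len(protocol_survey) == len(uns_snapshot)
```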
## Open questions for the survey itself

- **Who owns the survey?** The OtOpcUa workstream lead, a dedicated discovery resource, or a rotating responsibility across site automation engineers? _TBD._
- **Deadline.** Year 1 OtOpcUa work cannot scope the core driver library without this survey — it's a **Year 1 prerequisite**. Aim to complete at least steps 1–3 (System Platform / Ignition / ScadaBridge walks) within the first quarter of Year 1 so the core driver library build can start by quarter 2. Interview-driven gap-filling can extend through the rest of Year 1 in parallel with the core driver build.
- **How often is this file reviewed after Year 1?** Recommended: quarterly during Year 1 (active discovery), annually thereafter (to catch drift from new equipment purchases). Add a row when a new equipment class shows up in the estate; do **not** remove rows when drivers ship.
- **Relationship to the Site Onboarding workstream.** When a smaller site (Berlin, Winterthur, Jacksonville, …) is onboarded in Year 2, its equipment should be added to this file as part of the onboarding checklist — that way the core-vs-long-tail decision is re-evaluated as the estate grows.
- **MTConnect.** Modern CNCs often expose MTConnect as an alternative to FOCAS. If MTConnect coverage is broad enough, it may replace FOCAS as the CNC driver choice. Worth an explicit "is MTConnect in play here?" question during discovery.

126
current-state/legacy-integrations.md
Normal file
@@ -0,0 +1,126 @@

# Legacy Integrations Inventory

The authoritative list of **legacy point-to-point integrations** that currently run **outside ScadaBridge** and must be retired by end of plan (see `goal-state.md` → Success Criteria → pillar 3).

> This file is the **denominator** for the zero-count retirement target. If an integration is not captured here, it cannot be tracked to retirement — so capture first, argue about classification later.

## How to use this file

- **One row per integration.** If the same logical integration runs at multiple sites with the same code path, it's one row with multiple sites listed. If each site runs a meaningfully different variant, split into separate rows.
- **"Legacy" here means any IT↔OT integration path that crosses the ScadaBridge-central boundary without going through ScadaBridge.** In the target architecture, **ScadaBridge central is the sole IT↔OT crossing point**; anything that crosses that boundary via another path — Web API interfaces exposed by the Aveva System Platform primary cluster, custom services, scheduled jobs, file drops, direct DB links, etc. — is legacy and in scope for retirement.
- **Not in scope for this inventory:** OT-internal traffic. System Platform ↔ System Platform traffic over Global Galaxy, site-level ScadaBridge ↔ local equipment, site System Platform clusters ↔ central System Platform cluster — all of this stays on the OT side in the target architecture and is not tracked here.
- **Fields are described in the schema below.** Missing values are fine during discovery — mark them `_TBD_` rather than leaving blank, so gaps are visible.
- **Do not remove rows.** Retired integrations stay in the file with `Status: Retired` and a retirement date, so pillar 3 progress is auditable.
## Field schema

| Field | Description |
|---|---|
| `ID` | Short stable identifier (e.g., `LEG-001`). Never reused. |
| `Name` | Human-readable name of the integration. |
| `Source System` | System the data comes from (e.g., System Platform primary cluster, specific SCADA node, MES, PLC). |
| `Target System` | System the data goes to (e.g., Camstar MES, Delmia DNC, a specific database, an external partner). |
| `Direction` | One of: `Source→Target`, `Target→Source`, `Bidirectional`. |
| `Transport` | How the integration moves data (e.g., Web API, direct DB link, file drop, OPC DA, scheduled SQL job, SOAP). |
| `Site(s)` | Sites where this integration runs. `All integrated sites` is acceptable shorthand. |
| `Traffic Volume` | Rough order-of-magnitude (e.g., ~10 req/min, ~1k events/day, ~50 MB/day, unknown). |
| `Business Purpose` | One sentence: what this integration exists to do. |
| `Current Owner` | Team or person responsible today. |
| `Dependencies` | Other integrations or systems that depend on this one continuing to work. |
| `Migration Target` | What ScadaBridge pattern replaces it (e.g., `ScadaBridge inbound Web API`, `ScadaBridge → EventHub topic X`, `ScadaBridge outbound Web API call`, `ScadaBridge DB write`). |
| `Retirement Criteria` | Concrete, testable conditions that must be true before this integration can be switched off (e.g., "all consumers reading topic `mes.workorder.started` in prod for 30 days"). |
| `Status` | One of: `Discovered`, `Planned`, `In Migration`, `Dual-Run`, `Retired`. |
| `Retirement Date` | Actual retirement date once `Status = Retired`. |
| `Notes` | Anything else — known risks, gotchas, related tickets. |
## Current inventory

> **Discovery complete — denominator = 3.** Three legacy IT↔OT integrations are tracked: **LEG-001 Delmia DNC** (Aveva Web API), **LEG-002 Camstar MES** (Aveva Web API, Camstar-initiated), **LEG-003 custom email notification service**. Row-level field details on these three are still largely TBD and get filled in as migration planning proceeds, but no further integrations are expected to be added — the inventory is closed as the pillar 3 denominator unless new evidence surfaces during migration work. See **Deliberately not tracked** below for categories explicitly carved out of the definition.
### LEG-001 — Aveva Web API ↔ Delmia DNC

| Field | Value |
|---|---|
| **ID** | LEG-001 |
| **Name** | Aveva System Platform Web API — Delmia DNC interface |
| **Source System** | Aveva System Platform primary cluster (initiates the download request) **and** Delmia DNC (initiates the completion notification back). |
| **Target System** | Delmia DNC (recipe library / download service) on one leg; Aveva System Platform primary cluster + equipment on the return leg. |
| **Direction** | **Bidirectional — orchestrated handshake.** <br>(1) **System Platform → Delmia:** System Platform triggers a Delmia download ("fetch recipe X"). <br>(2) **Delmia → System Platform:** when the download completes, Delmia notifies System Platform so System Platform can parse/load the recipe file to the equipment. <br>Both legs go over the Web API on the Aveva System Platform primary cluster. |
| **Transport** | Web API on the primary cluster — used in both directions (System Platform as client for leg 1, System Platform Web API as server for leg 2). |
| **Site(s)** | _TBD — presumably all sites with equipment that consumes Delmia recipe files, confirm_ |
| **Traffic Volume** | _TBD — recipe download events are typically low-frequency compared to tag data, order-of-magnitude unknown_ |
| **Business Purpose** | **Delmia DNC distributes recipe (NC program) files to equipment**, with **System Platform orchestrating the handshake**. The sequence is: System Platform triggers Delmia to download a recipe file → Delmia fetches it → Delmia notifies System Platform that the download is ready → System Platform parses/loads the recipe file to the equipment. Parsing is required for some equipment classes; pass-through for others. |
| **Current Owner** | _TBD_ |
| **Dependencies** | Equipment that consumes Delmia recipe files is the downstream dependency. Any equipment requiring the System Platform parse step is a harder-to-migrate case than equipment that accepts a raw file. |
| **Migration Target** | ScadaBridge replaces the System Platform Web API as the Delmia-facing surface on **both legs** of the handshake: <br>— **Leg 1 (trigger):** ScadaBridge calls Delmia via its **outbound Web API client** capability to trigger a recipe download (replacing the System Platform → Delmia call). <br>— **Leg 2 (notification):** ScadaBridge exposes an **inbound secure Web API** endpoint that Delmia calls to notify download completion (replacing the Delmia → System Platform Web API callback). <br>— **Parse/load step:** ScadaBridge **scripts** (C#/Roslyn) re-implement the recipe-parse logic that currently runs on System Platform; ScadaBridge then writes the parsed result to equipment via its OPC UA write path. <br>_TBD — whether the parse logic lives per-site or centrally; whether Delmia can target per-site ScadaBridge endpoints directly or must go through ScadaBridge central; how orchestration state (pending triggers, in-flight downloads) is held during the handshake, especially across a WAN outage that separates the trigger and the notification._ |
| **Retirement Criteria** | _TBD — almost certainly requires per-equipment-class validation that the ScadaBridge-parsed output matches the System Platform-parsed output byte-for-byte, for any equipment where parsing is involved._ |
| **Status** | Discovered |
| **Retirement Date** | — |
| **Notes** | One of the two existing Aveva Web API interfaces documented in `../current-state.md`. Unlike LEG-002 (which has a ready replacement in ScadaBridge's native Camstar path), **LEG-001 requires new work** — the recipe-parse logic currently on System Platform has to be re-implemented (most likely as ScadaBridge scripts) before the legacy path can be retired. Expected to be a **harder retirement** than LEG-002. |
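To make the two-leg handshake concrete, here is a rough sketch of the target shape. This is illustrative Python, not ScadaBridge's actual scripting surface (which is C#/Roslyn); the Delmia URL, the correlation-id field, and the in-flight `PENDING` store are all assumptions pending the TBDs in the row above.

```python
import requests

# In-flight downloads keyed by correlation id. Where this state lives across
# a WAN outage that separates leg 1 from leg 2 is an open TBD above.
PENDING = {}

# Leg 1 — trigger: ScadaBridge (outbound Web API client) asks Delmia to
# fetch a recipe. URL and payload shape are hypothetical.
def trigger_recipe_download(equipment_path: str, recipe_id: str) -> None:
    resp = requests.post("https://delmia.example/api/downloads",
                         json={"recipe": recipe_id}, timeout=10)
    resp.raise_for_status()
    PENDING[resp.json()["correlation_id"]] = (equipment_path, recipe_id)

# Leg 2 — notification: Delmia calls back into a ScadaBridge inbound secure
# Web API endpoint when the download is ready (handler body sketched here).
def on_download_complete(correlation_id: str, file_bytes: bytes) -> None:
    equipment_path, recipe_id = PENDING.pop(correlation_id)
    payload = parse_recipe(file_bytes)           # re-implemented parse logic
    write_to_equipment(equipment_path, payload)  # OPC UA write path

def parse_recipe(raw: bytes) -> bytes:
    # Placeholder for the per-equipment-class parse step currently on
    # System Platform; pass-through for classes that accept a raw file.
    return raw

def write_to_equipment(path: str, payload: bytes) -> None:
    ...  # ScadaBridge → OtOpcUa OPC UA write, not modeled here
```

The sketch also makes the open orchestration question visible: whatever replaces the `PENDING` dict has to survive a WAN outage between the trigger and the notification.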
### LEG-002 — Aveva Web API ↔ Camstar MES

| Field | Value |
|---|---|
| **ID** | LEG-002 |
| **Name** | Aveva System Platform Web API — Camstar MES interface |
| **Source System** | Aveva System Platform primary cluster (South Bend) — **hosts** the Web API. |
| **Target System** | Camstar MES — **is the caller** (Camstar initiates the interaction by calling into the System Platform Web API). |
| **Direction** | **Bidirectional, Camstar-initiated.** Camstar calls the Aveva System Platform primary-cluster Web API to initiate; request/response data flows in both directions over that call. _TBD — whether Camstar is primarily pulling data out of System Platform, pushing data into it, or doing a true two-way exchange during the call, and whether System Platform ever initiates back to Camstar through any other channel._ |
| **Transport** | Web API on the primary cluster — **inbound to System Platform** (Camstar is the HTTP client, System Platform Web API is the HTTP server). |
| **Site(s)** | _TBD — presumably all sites, confirm_ |
| **Traffic Volume** | _TBD_ |
| **Business Purpose** | MES integration — details _TBD_ |
| **Current Owner** | _TBD_ |
| **Dependencies** | _TBD_ |
| **Migration Target** | **Move callers off the System Platform Web API and onto ScadaBridge's existing direct Camstar integration.** ScadaBridge already calls Camstar directly (see `../current-state.md` → ScadaBridge downstream consumers), so the retirement work for this integration is about redirecting existing consumers, **not** building a new ScadaBridge→Camstar path. |
| **Retirement Criteria** | _TBD_ |
| **Status** | Discovered |
| **Retirement Date** | — |
| **Notes** | One of the two existing Aveva Web API interfaces documented in `../current-state.md`. ScadaBridge's native Camstar path exists today, which means LEG-002 is an "end-user redirect" problem rather than a "build something new" problem — typically cheaper and faster to retire than an integration that requires replacement work. **Important directional nuance:** because **Camstar** is the caller (not System Platform), retirement can't be done purely by changing System Platform's outbound code — it requires reconfiguring Camstar to call ScadaBridge instead of the System Platform Web API. That makes Camstar's change-control process a critical dependency for retirement planning. The deeper Camstar integration described in `goal-state.md` may still inform the long-term shape. |
### LEG-003 — System Platform → email notifications via custom service

| Field | Value |
|---|---|
| **ID** | LEG-003 |
| **Name** | Aveva System Platform → email notifications via custom in-house service |
| **Source System** | Aveva System Platform (emits notification events) — _TBD whether from the primary cluster only, from site clusters, or both_. |
| **Target System** | Enterprise email system, reached via a **custom in-house notification service** that sits between System Platform and email. |
| **Direction** | Source→Target. System Platform emits notification events; the custom service relays them to email. No known return path from email back to System Platform. |
| **Transport** | Custom in-house notification service. _TBD — how the service is fed from System Platform (scripted handler in System Platform, direct DB read, OPC UA / LmxOpcUa subscription, Web API push, file drop, …) and how it reaches email (SMTP to enterprise relay, Exchange Web Services, Graph API, …)._ |
| **Site(s)** | _TBD — confirm which clusters actually emit through this path and whether site clusters use the same service or have their own variants_ |
| **Traffic Volume** | _TBD_ |
| **Business Purpose** | Operator / engineering / maintenance notifications (alarms, state changes, threshold breaches — _TBD_ which categories) emitted from System Platform and delivered by email. |
| **Current Owner** | _TBD — service is in-house but ownership of the custom service codebase and its operational runbook needs to be confirmed._ |
| **Dependencies** | Enterprise email system availability; recipient distribution lists baked into the custom service's configuration; any downstream workflows (ticketing, on-call escalation, acknowledgement) that trigger off these emails — _TBD_ whether any such downstream automation exists. |
| **Migration Target** | **ScadaBridge native notifications.** ScadaBridge already provides contact-list-driven, transport-agnostic notifications over **email and Microsoft Teams** (see `../current-state.md` → ScadaBridge Capabilities → Notifications). Migration path: port the existing notification triggers to ScadaBridge scripts, recreate recipient lists as ScadaBridge contact lists, dual-run both notification paths for a window long enough to catch low-frequency alarms, then cut over and decommission the custom service. |
| **Retirement Criteria** | _TBD — at minimum: (1) **trigger parity** — every notification the custom service emits today is emitted by a ScadaBridge script; (2) **recipient parity** — every recipient on every distribution list exists on the equivalent ScadaBridge contact list; (3) **delivery SLO parity** — email delivery latency on the ScadaBridge path meets or beats today's; (4) **dual-run window** long enough to surface low-frequency notifications (monthly / quarterly events)._ |
| **Status** | Discovered |
| **Retirement Date** | — |
| **Notes** | **Easier retirement than LEG-001** — ScadaBridge's notification capability already exists in production, so this is a migration of **triggers and recipient lists**, not new build work. Closer in difficulty to LEG-002. **Open risk:** if the custom service also handles non-notification workflows (acknowledgement callbacks, escalation state, ticketing integration), those are out of scope for ScadaBridge's notification capability and would need a separate migration path — confirm during discovery whether the service is purely fire-and-forget email or has state. **Also — Teams as a by-product:** moving to ScadaBridge opens the door to routing the same notifications over Microsoft Teams without code changes (contact-list config only), which may be a quick win to surface with stakeholders during dual-run. |
## Deliberately not tracked

Categories explicitly carved out of the "legacy integration" definition. These crossings may look like they cross the IT↔OT boundary, but they are **not** pillar 3 work and are **not** counted against the "drive to zero" target. The carve-outs are load-bearing — without them, pillar 3's zero target is ambiguous.

### Historian SQL reporting consumers (SAP BOBJ, Power BI, ad-hoc analyst SQL)

**Not legacy.** Aveva Historian exposes data via **MSSQL** (SQL Server linked-server / OPENQUERY views) as its **native consumption surface** — the supported, designed way to read historian data. A reporting tool hitting those SQL views is using Historian as intended, not a bespoke point-to-point integration grafted onto the side of it.

Pillar 3's "legacy integration" target covers **bespoke IT↔OT crossings** — Web API interfaces exposed by the System Platform primary cluster, custom services, file drops, direct DB links into internal stores that weren't designed for external reads. Consuming a system's own first-class SQL interface is categorically different and does not fit that definition.

The BOBJ → Power BI migration currently in flight (see `../current-state.md` → Aveva Historian → Current consumers) will reshape this surface independently of pillar 3. Whether Power BI ultimately reads from Historian's SQL interface, from Snowflake's dbt curated layer, or from both is a coordination question between the reporting team and this plan (tracked in `../status.md` as a top pending item) — but whichever way it lands, the resulting path is **not** tracked here as a retirement target.

> **Implication:** if at any point a reporting consumer stops using Historian's SQL views and instead starts talking to Historian via a bespoke side-channel (custom extract job, scheduled export, direct file read of the historian store, etc.), **that** side-channel **would** be legacy and would need a row in the inventory. The carve-out applies specifically to the native MSSQL surface.

### OT-internal traffic

Restating the rule from "How to use this file" above, for explicit visibility: System Platform ↔ System Platform traffic over Global Galaxy, site-level ScadaBridge ↔ local equipment, site System Platform clusters ↔ central System Platform cluster, and any other traffic that stays entirely within the OT side of the boundary is **not** legacy for pillar 3 purposes. The IT↔OT crossing at ScadaBridge central is the only boundary this inventory tracks.
## Open questions for the inventory itself

- Is there an existing system of record (e.g., a CMDB, an integration catalog, a SharePoint inventory) that this file should pull from or cross-reference rather than duplicate?
- Discovery approach: walk the Aveva System Platform cluster configs, interview site owners, scan network traffic, or some combination?
- ~~Definition boundary: does "legacy integration" include internal System Platform ↔ System Platform traffic over Global Galaxy, or only IT↔OT crossings?~~ **Resolved:** **only IT↔OT crossings at the central ScadaBridge boundary.** In the target architecture, System Platform traffic (including Global Galaxy, site↔site, and site↔central System Platform clusters) stays **entirely on the OT side** and is not subject to pillar 3 retirement. The **IT↔OT boundary** is specifically **ScadaBridge central ↔ enterprise integrations** — that crossing is the only one the inventory tracks and the only one subject to the "zero legacy paths" target.
- How often is this file reviewed, and by whom? (Needs to be at least quarterly to feed pillar 3 progress metrics.)

74
digital_twin_usecases.md.txt
Normal file
@@ -0,0 +1,74 @@

1) Standardized Equipment State / Metadata Model

Use case:
Create a consistent, high-level representation of machine state derived from raw signals.

What it does:
• Converts low-level sensor/PLC data into meaningful states (e.g., Running, Idle, Faulted, Starved, Blocked)
• Normalizes differences across equipment types
• Aggregates multiple signals into a single, authoritative “machine state”

Examples:
• Deriving true run state from multiple interlocks and status bits
• Calculating actual cycle time vs. theoretical
• Identifying the top fault instead of exposing dozens of raw alarms

Value:
• Provides a single, consistent view of equipment behavior
• Reduces complexity for downstream systems and users
• Improves accuracy of KPIs like OEE and downtime tracking

⸻

2) Virtual Testing / Simulation (FAT, Integration, Validation)

Use case:
Use a digital representation of equipment to simulate behavior for testing without requiring physical machines.

What it does:
• Emulates machine signals, states, and sequences
• Allows testing of automation logic, workflows, and integrations
• Supports replay of historical scenarios or generation of synthetic ones

Examples:
• Simulating startup, shutdown, and fault conditions
• Testing alarm handling and recovery workflows
• Validating system behavior under edge cases (missing data, delays, abnormal sequences)

Value:
• Enables earlier testing before equipment is available
• Reduces commissioning time and risk
• Improves quality and stability of deployed systems

⸻

3) Cross-System Data Normalization / Canonical Model

Use case:
Act as a common semantic layer between multiple systems interacting with manufacturing data.

What it does:
• Defines standardized data structures for equipment, production, and events
• Translates system-specific formats into a unified model
• Provides a consistent interface for all consumers

Examples:
• Mapping different machine tag structures into a common equipment model
• Standardizing production counts, states, and identifiers
• Providing uniform event definitions (e.g., “machine fault,” “job complete”)

Value:
• Simplifies integration between disparate systems
• Reduces duplication of transformation logic
• Improves data consistency and interoperability across the enterprise

⸻

Combined Outcome

Together, these three use cases position a digital twin as:
• A translator (raw signals → meaningful state)
• A simulator (test without physical dependency)
• A standard interface (consistent data across systems)

This approach focuses on practical operational value rather than high-fidelity modeling, aligning well with discrete manufacturing environments.

815
goal-state.md
Normal file
@@ -0,0 +1,815 @@

# Goal State (3-Year Target)

Target end-state for shopfloor IT/OT interfaces and data collection at the end of the 3-year plan.

> When a section below grows beyond a few paragraphs, break it out into `goal-state/<component>.md` and leave a short summary + link here. See [`CLAUDE.md`](CLAUDE.md#breaking-out-components).
## Vision

**Overarching theme: provide a stable, single point of integration between shopfloor OT and enterprise IT.** Every design decision in this plan reduces back to that goal — one bridge (ScadaBridge central), one event backbone (Redpanda), one machine-data path to Snowflake (the integration service), one place for schemas, one set of conventions. "Stable" and "single" are load-bearing words: stability rules out bespoke one-offs that drift, and singleness rules out parallel integration estates that compete with the unified model. When a decision is ambiguous later in the plan, this theme is the tiebreaker.

By the end of the 3-year plan, shopfloor IT/OT is built on **one unified integration model across every site** — large campuses (Warsaw West/North), mid-size integrated sites (Shannon, Galway, TMT, Ponce), and the currently-unintegrated smaller sites (Berlin, Winterthur, Jacksonville, and others) all onboard through the same standardized pattern centered on **ScadaBridge** as the IT/OT bridge and **EventHub** as the async backbone. That unified foundation **unlocks enterprise analytics and AI on shopfloor data** by making curated, governed shopfloor data available in **Snowflake** at the right latency and granularity for downstream consumers. In parallel, **legacy point-to-point middleware and bespoke integrations are retired** in favor of ScadaBridge-managed flows, leaving a single, supportable IT/OT integration estate.

**Explicitly in scope:**

- (a) Unifying all sites under one integration model.
- (b) Unlocking enterprise analytics/AI on shopfloor data.
- (d) Replacing legacy middleware.

**Explicitly not a primary goal:**

- (c) Modernizing operator UX is **not** a primary driver of this 3-year plan. Operator-facing UIs may improve incidentally (e.g., as a side effect of migrating to new integrations), but UX modernization is not a success criterion and should not compete for scope, budget, or sequencing. See **Non-Goals**.
## Target IT/OT Integration

### Layered architecture — the mental model

The target architecture has **four layers** on the OT side plus the enterprise IT side above them. Each layer has one job, one kind of data, and one set of downstream consumers. This is the framing to hold in mind before diving into the component-level sections below.

```
                            ┌──────────────────────────────────────┐
                            │          Enterprise IT side          │
                            │  (Camstar, Delmia, Snowflake via     │
                            │   Machine Data → Snowflake Service,  │
                            │   Power BI / BOBJ, other apps)       │
                            └───────────────▲──────────────────────┘
                                            │
                       ── IT / OT boundary ─┼───────────────────
                                            │  (ScadaBridge central
                                            │   is the sole crossing)
                                            │
                            ┌───────────────┴──────────────────────┐
Layer 4 — Bridge            │ ScadaBridge                          │
(bridge to enterprise)      │ (site clusters + central cluster)    │
                            └───────────────▲──────────────────────┘
                                            │
                            ┌───────────────┴──────────────────────┐
Layer 3 — SCADA             │ Aveva System Platform │ Ignition     │
(processed data)            │ (validated collection) │ (KPI)       │
                            └───────────────▲──────────────────────┘
                                            │
                            ┌───────────────┴──────────────────────┐
Layer 2 — Raw data          │ OtOpcUa                              │
(single OT OPC UA endpoint  │ (one cluster per site; one session   │
 per site, two namespaces)  │  per equipment; System Platform      │
                            │  namespace folded in from LmxOpcUa)  │
                            └───────────────▲──────────────────────┘
                                            │
                            ┌───────────────┴──────────────────────┐
Layer 1 — Equipment         │ PLCs · CNCs · robots · controllers · │
(physical devices)          │ instruments · sensors                │
                            └──────────────────────────────────────┘
```
**Layer responsibilities.**

- **Layer 1 — Equipment.** Physical devices. Speaks whatever native protocol each device supports (native OPC UA, Modbus, EtherNet/IP, Siemens S7, proprietary, etc.). This layer does not change as part of the plan.
- **Layer 2 — Raw data: OtOpcUa.** Holds the **single** session to each piece of equipment. Translates native device protocols into a uniform OPC UA surface (equipment namespace). Also hosts a **System Platform namespace** — the evolved LmxOpcUa capability — that exposes Aveva System Platform objects as OPC UA through the same endpoint, so OT OPC UA clients have one place to read both raw equipment data and processed System Platform data. Enforces access control and audit at the OT data boundary. Raw equipment data at this layer is exactly that — **raw** — no deadbanding, no aggregation, no business meaning; the System Platform namespace is a view onto layer 3, not a transformation.
- **Layer 3 — SCADA: Aveva System Platform + Ignition.** Transforms raw OPC UA data into **processed data** with business meaning. Aveva System Platform handles validated/compliance-grade processing and collection; Ignition handles KPI-focused processing and presentation. Both layer-3 systems read raw equipment data from OtOpcUa's equipment namespace — neither holds direct equipment sessions in the target state. This is where engineering-unit conversions, state derivations, alarm definitions, and validated archiving happen. System Platform's outputs are re-exposed back through OtOpcUa's System Platform namespace for OPC UA-native consumers that need them.
- **Layer 4 — Bridge: ScadaBridge.** Bridges processed OT data into the enterprise IT side. Everything that needs to cross IT↔OT goes through ScadaBridge central. Site-level ScadaBridge clusters handle site-local scripting, templating, notifications, and store-and-forward; ScadaBridge central is the sanctioned crossing point. ScadaBridge does **not** reach into equipment directly — it consumes OT data from OtOpcUa (both namespaces as needed).
- **Enterprise IT side.** Camstar, Delmia, Snowflake (via the SnowBridge), Power BI / BusinessObjects, and any other enterprise app. Lives entirely above the IT↔OT boundary; consumes layer-4 outputs through sanctioned interfaces.

**Where the other components fit in this mental model.**

- **LmxOpcUa is not a separate component in the target state — it is absorbed into OtOpcUa as the System Platform namespace.** The layered model therefore has no separate "LmxOpcUa tap"; OtOpcUa serves both raw equipment data (its primary layer-2 role) and a System Platform view (the absorbed LmxOpcUa role) through a single endpoint with two namespaces. See the OtOpcUa section above for the fold-in details.
- **Aveva Historian** is a **store adjacent to layer 3**, not a layer of its own. System Platform (layer 3) writes into it; consumers read historical validated data out of it (either through its SQL interface or through Snowflake after SnowBridge has pulled from it). It is the long-term system of record for validated data regardless of what Snowflake chooses to store.
- **Redpanda (EventHub)** is **infrastructure used between layer 4 and enterprise IT**, not a layer of its own. ScadaBridge publishes events into it; enterprise consumers (SnowBridge, KPI processors, Camstar integration) read from it. It decouples layer-4 producers from enterprise consumers without introducing a fifth layer.
- **SnowBridge** is an **enterprise-side consumer** that happens to read from both the adjacent-to-layer-3 store (Aveva Historian) and the layer-4-to-enterprise backbone (Redpanda). Its job is the governed, filtered upload to Snowflake — it does not fit inside the layered data path itself.
- **dbt** runs **inside Snowflake**, so it is enterprise-side infrastructure that transforms landed data. It has no layer-1-through-4 position.
- **OtOpcUa's raw data goes "up" through this stack and back out the top on the IT side.** A tag read from a machine in Warsaw West flows: equipment → OtOpcUa (layer 2) → System Platform or Ignition (layer 3) → ScadaBridge (layer 4) → Redpanda → SnowBridge → Snowflake → dbt → Power BI or downstream consumer.

**What the layering rules out.** Cross-layer shortcuts that bypass the layer in between:

- No layer-4 component (ScadaBridge) reads equipment directly. It must go through OtOpcUa (for OPC UA reads) or layer 3 (for processed data).
- No enterprise-IT component reads layer 2 or equipment directly. It must go through layer 4 (or, for historical data, through the Aveva Historian → SnowBridge path, which still originates in layer 3's write-to-store).
- No layer-3 component holds direct equipment sessions in the target state. Today they do (see current-state); the OtOpcUa tier-3 cutover is what ends that.
- No enterprise-IT component holds direct Redpanda-bypass connections into site infrastructure. Redpanda is the only path from layer 4 out to enterprise consumers for event-driven flows.
- No second OPC UA server runs alongside OtOpcUa. LmxOpcUa's role lives inside OtOpcUa; a future component that needs to expose OT data over OPC UA should add a namespace to OtOpcUa, not stand up its own OPC UA server.

If a future component (including a digital twin, a new AI/ML platform, an additional enterprise app, or a new third-party tool) does not fit into one of these layer roles, it is almost certainly either a layer-3 or layer-4 consumer that needs to be reshaped to consume through the existing stack — not a new parallel path.
### Unified Namespace (UNS) posture

"Unified Namespace" is a concept popularized by Walker Reynolds / 4.0 Solutions / HighByte and the MQTT-Sparkplug community. It typically means: one hierarchical data tree (ISA-95 — Enterprise / Site / Area / Line / Cell / Equipment), MQTT Sparkplug B as transport, a central broker as the hub, and every system publishing to or consuming from the same namespace instead of building point-to-point integrations. The plan **does not build a classic MQTT/Sparkplug B UNS** — but the composition of existing commitments already delivers the UNS value proposition, and the plan reframes that composition as **"our UNS"** for stakeholders who use that vocabulary.

This subsection is a **framing commitment**, not a build commitment. It does not add a new component, a new workstream, or a new pillar criterion. It is the plan's answer to the question "do we have a unified namespace?"

#### The plan's UNS composition

Four existing commitments, together, constitute the unified namespace:
| Component | UNS role |
|---|---|
| **OtOpcUa equipment namespace** (per site) | Hierarchical real-time OT data surface. Equipment, tags, and derived state exposed as a canonical OPC UA tree at each site. This is the "classic UNS hierarchy" at the site level — equipment-class templates in the `schemas` repo define the node layout. |
| **Redpanda topics + canonical Protobuf schemas** | Enterprise-wide pub-sub backbone carrying canonical equipment / production / event messages. The `{domain}.{entity}.{event-type}` taxonomy + schema registry define what "speaking UNS" means on the wire. Retention tiers give consumers a bounded replay window against the UNS. |
| **`schemas` repo + canonical model declaration** | The shared **context layer** — equipment classes, machine states (`Running / Idle / Faulted / Starved / Blocked`), event types — that makes every surface's data semantically consistent. See **Async Event Backbone → Canonical Equipment, Production, and Event Model** for the full declaration. This is where the ISA-95 hierarchy conceptually lives, even though it is not the topic path. |
| **dbt curated layer in Snowflake** | Canonical historical / analytical surface. Consumers that need "what has this equipment done over time" read the UNS via the curated layer, with the same vocabulary as the real-time surfaces. Same canonical model, different access pattern. |

Together: a single canonical data model (the `schemas` repo), a single real-time backbone (Redpanda), a canonical OT-side hierarchy at each site (OtOpcUa), and a canonical analytical surface (dbt). That is the UNS.
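As a tiny mechanical illustration of the `{domain}.{entity}.{event-type}` taxonomy (the helper is hypothetical; `mes.workorder.started` is the example topic used elsewhere in these docs, and the charset rule anticipates the naming rules below):

```python
import re

SEGMENT = re.compile(r"^[a-z0-9-]+$")  # same character rules as the UNS naming standard

def topic(domain: str, entity: str, event_type: str) -> str:
    """Compose a canonical topic name and enforce the segment charset."""
    for part in (domain, entity, event_type):
        if not SEGMENT.match(part):
            raise ValueError(f"invalid segment: {part!r}")
    return ".".join((domain, entity, event_type))

print(topic("mes", "workorder", "started"))  # -> mes.workorder.started
```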
#### UNS naming hierarchy standard

The plan commits to a **single canonical naming hierarchy** for addressing equipment across every UNS surface (OtOpcUa, Redpanda, dbt, `schemas` repo). Without this, each surface would re-derive its own naming and drift apart; the whole point of "a single canonical model" evaporates.

##### Hierarchy — five levels, always present

| Level | Name | Semantics | Example |
|---|---|---|---|
| 1 | **Enterprise** | Single root for the whole organization. One value for the entire estate. | `ent` (working placeholder — replace with the real enterprise shortname when assigned) |
| 2 | **Site** | Physical location. Matches the authoritative site list in [`current-state.md`](current-state.md) → Enterprise Layout. | `south-bend`, `warsaw-west`, `warsaw-north`, `shannon`, `galway`, `tmt`, `ponce`, `berlin`, `winterthur`, `jacksonville`, … |
| 3 | **Area** | A section of the site — typically a **production building** at the Warsaw campuses (which run one cluster per building), or `_default` at sites that have a single cluster covering the whole site. Always present; uniform path depth is a design goal. | `bldg-3`, `bldg-7`, `_default` |
| 4 | **Line** | A production line or work cell within an area. One line = one coherent sequence of equipment working together toward a product or sub-assembly. | `line-2`, `assembly-a`, `packout-1` |
| 5 | **Equipment** | An individual machine instance. Equipment class prefix + instance number or shortname. | `cnc-mill-05`, `injection-molder-02`, `vision-system-01` |

Five levels is a **hard commitment**. Consumers can assume every equipment instance in the UNS has exactly five path segments from root to leaf. This simplifies parsing, filtering, and joining across surfaces.

**Signal / tag is a property of equipment, not a sixth path level.** Individual data points on a piece of equipment (e.g., `spindle-speed`, `door-state`, `top-fault`) live as child nodes under the equipment in the OtOpcUa namespace and as field references in canonical event payloads. The "address" of the **equipment** stops at level 5; a **signal** is addressed as `equipment-path + signal-name`.
##### Why "always present with placeholders" rather than "variable depth"

- **Uniform depth makes consumers simpler.** Subscribers and dbt models assume a fixed schema for the equipment identifier; variable-depth paths require special-casing.
- **Adding a building later doesn't shift path depth.** If a small site adds a second production building and needs a real Area level, the existing equipment at that site keeps its five-segment shape — its `_default` area segment is promoted to a named area — and the new building gets its own area segment. No depth change, no structural rewrites; historical consumers join across the promotion via the UUID (see below).
- **Explicit placeholder is more discoverable than an implicit skip.** A reader looking at `ent.shannon._default.line-1.cnc-mill-03` immediately sees that Shannon has no area distinction today; a variable-depth alternative like `ent.shannon.line-1.cnc-mill-03` leaves the reader wondering whether a level is missing.

##### Naming rules

Identical conventions to the existing Redpanda topic naming — one vocabulary, two serializations (text form for messages / docs / dbt keys; OPC UA browse-path form for OtOpcUa):

- **Character set:** `[a-z0-9-]` only. Lowercase enforced. No underscores (except the literal placeholder `_default`), no camelCase, no spaces, no Unicode.
- **Segment separator:** `.` (dot) in text form; `/` (slash) in OPC UA browse paths. The two forms are **mechanically interchangeable** — same segments, different delimiter.
- **Within-segment separator:** `-` (hyphen). Multi-word segments use hyphens (`warsaw-west`, `cnc-mill-05`).
- **Segment length:** max 32 characters per segment. Keeps individual segments readable and bounds OPC UA node-name length.
- **Total path length:** max 200 characters in text form. Five 32-character segments plus four dots is 164 characters at worst, which leaves headroom within the cap for a dot-separated signal suffix on an equipment path.
- **Reserved tokens:** `_default` is the only reserved segment name, used exclusively as the placeholder for a level that does not apply at a given site.
- **Case rule on display:** paths are always shown in their canonical lowercase form. UIs may style them (larger font, breadcrumbs) but may not transform case.

**Worked example — full path for one tag, two serializations:**

| Form | Example |
|---|---|
| Text (messages, docs, dbt keys) | `ent.warsaw-west.bldg-3.line-2.cnc-mill-05.spindle-speed` |
| OPC UA browse path | `ent/warsaw-west/bldg-3/line-2/cnc-mill-05/spindle-speed` |
| Same machine at a small site (area placeholder) | `ent.shannon._default.line-1.cnc-mill-03` |

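The naming rules are mechanical enough to enforce in code. A minimal illustrative sketch, in Python, of validating a five-segment equipment path and converting between the two serializations; the function names here are hypothetical, not committed tooling:

```python
import re

# One segment: [a-z0-9-], max 32 chars; `_default` is the only reserved exception.
SEGMENT = re.compile(r"^(_default|[a-z0-9-]{1,32})$")

def validate_equipment_path(text_path: str) -> list[str]:
    """Validate a canonical text-form equipment path and return its segments."""
    if len(text_path) > 200:
        raise ValueError("path exceeds the 200-character cap")
    segments = text_path.split(".")
    if len(segments) != 5:
        raise ValueError("equipment paths have exactly five segments")
    for seg in segments:
        if not SEGMENT.match(seg):
            raise ValueError(f"segment {seg!r} violates the naming rules")
    return segments

def to_browse_path(text_path: str) -> str:
    """Same segments, '/' delimiter (the OPC UA browse-path form)."""
    return "/".join(validate_equipment_path(text_path))

assert to_browse_path("ent.shannon._default.line-1.cnc-mill-03") == \
    "ent/shannon/_default/line-1/cnc-mill-03"
```
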
##### Stable equipment identity — path is navigation, UUID is lineage

The path is the **navigation identifier**: it tells you where the equipment lives today and how to browse to it. But paths **can change** — a machine moves from one line to another, a building gets renumbered, a campus reorganizes. If the path were the only identifier, every rename would break historical queries, genealogy traces, and audit trails.

The plan commits to a **stable equipment UUID** that lives alongside the path:

- **UUID is assigned once, never reused, never changes** over the equipment's lifetime in the estate.
- **Path can change** if the equipment moves, the area is renamed, or the line is restructured. Each change is an event in the `schemas` repo history.
- **Canonical events carry both** the UUID (authoritative, for lineage and joins) and the current path (convenient, for filtering and human readability at the time the event was produced).
- **Historical messages retain the path that was current at the time.** When a machine moves, historical events still carry the old path; new events carry the new path; joining through the UUID gives a continuous view across the rename.
- **UUID format:** RFC 4122 UUIDv4 (random). Not derived from the path — derivation would couple UUID stability to path stability, defeating the purpose.

**Consumer rule of thumb:**

- Use the **UUID** for joins, lineage, genealogy, audit, and long-running historical queries.
- Use the **path** for dashboards, filters, browsing, human search, and anywhere a reader needs to know *where* the equipment is right now.

A dbt dimension table (`dim_equipment`) carries both and is the authoritative join point for analytical consumers. OtOpcUa's equipment namespace exposes the UUID as a property on each equipment node so OPC UA consumers can read it without leaving the OPC UA surface. Redpanda canonical events carry both as fields on the message.

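To make the dual-identifier rule concrete, a minimal sketch (Python, illustrative only) of a canonical event carrying both fields and a UUID-keyed join across a line move; the event shape is simplified, since real canonical events are Protobuf messages:

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class EquipmentEvent:
    equipment_uuid: str   # stable identity (UUIDv4) — authoritative for joins
    equipment_path: str   # path current at event time — convenient for filtering
    event_type: str
    payload: dict

MILL = str(uuid.uuid4())  # assigned once at onboarding, never reused

history = [
    EquipmentEvent(MILL, "ent.shannon._default.line-1.cnc-mill-03",
                   "equipment.state.transitioned", {"to": "Running"}),
    # ...the machine later moves to line-2; old events keep the old path...
    EquipmentEvent(MILL, "ent.shannon._default.line-2.cnc-mill-03",
                   "equipment.state.transitioned", {"to": "Idle"}),
]

# Joining on the UUID gives a continuous view across the move;
# filtering on the path answers "what is on line-1 right now?"
lineage = [e for e in history if e.equipment_uuid == MILL]
assert len(lineage) == 2
```
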
##### Where the authoritative hierarchy lives

**The `schemas` repo is the single source of truth.** Every other UNS surface consumes the authoritative definition; none of them carry an independent copy.

| Surface | What the surface carries | Authority relationship |
|---|---|---|
| `schemas` repo | Canonical hierarchy definition — the full tree with UUIDs, current paths, equipment-class assignments, and evolution history. Likely stored as a structured file (YAML or Protobuf message) alongside the Protobuf schemas that reference it. | **Authoritative.** Changes go through the same CODEOWNERS + `buf`-CI governance as other schema changes. |
| OtOpcUa equipment namespace | Browse-path structure matching the hierarchy; equipment nodes carry the UUID as a property. Built per-site from the relevant subtree (each site's OtOpcUa only exposes equipment at that site). | **Consumer.** Generated from the `schemas` repo definition at deploy/config time. Drift between OtOpcUa and the `schemas` repo is a defect. |
| Redpanda canonical event payloads | Every event payload carries `equipment_uuid` (stable) and `equipment_path` (current at event time) as fields. Enables filtering without topic explosion. | **Consumer.** Protobuf schemas reference the hierarchy definition in the same `schemas` repo. |
| dbt curated layer in Snowflake | `dim_equipment` dimension table with UUID, current path, historical path versions, equipment class, site, area, line. Used as the join key by every analytical consumer. | **Consumer.** Populated by a dbt model that reads from an upstream reference table synced from the `schemas` repo — not hand-maintained in Snowflake. |

##### Evolution and change management

The hierarchy will change. Sites get added (smaller sites onboarding in Year 2). Buildings get reorganized. Lines get restructured. Equipment moves. Renames happen. The plan commits to a governance model that makes these changes safe:

- **All changes go through the `schemas` repo** via normal PR + CODEOWNERS review.
- **CI enforces uniqueness** — no two pieces of equipment can share a UUID; no two pieces of equipment at the same (site, area, line) can share a leaf name; `_default` is reserved.
- **CI enforces structural invariants** — every equipment entry has exactly 5 path segments; every segment matches the naming rules; no reserved tokens except `_default`. (A sketch of both CI checks follows this list.)
- **Path changes are tracked as history** — when an equipment's path changes, the old path is retained in a `path_history` list on that equipment's definition, with a timestamp and reason. Historical messages that carry the old path can be joined back to the current definition via UUID.
- **Renames never reuse UUIDs** — if a machine is physically replaced, the new machine gets a new UUID even if it sits at the same path. Genealogy stays clean.
- **Area placeholder → named area promotion** — when a small site grows to justify area distinctions (e.g., adding a second building), the existing equipment has its path updated from `_default` to the new named area via a single PR. Historical events keep the old `_default` path; new events carry the new path; the UUID stays stable.

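A sketch of what those two CI checks could look like, assuming the hierarchy file parses into a list of dicts (the plan leaves YAML vs Protobuf open); field names are illustrative only:

```python
import re

SEGMENT = re.compile(r"^(_default|[a-z0-9-]{1,32})$")

def check_hierarchy(entries: list[dict]) -> list[str]:
    """entries: [{'uuid': ..., 'path': 'ent.site.area.line.equipment'}, ...]"""
    errors, seen_uuids, seen_paths = [], set(), set()
    for e in entries:
        # Uniqueness: no shared UUIDs; a duplicate full path means two pieces of
        # equipment at the same (site, area, line) share a leaf name.
        if e["uuid"] in seen_uuids:
            errors.append(f"duplicate uuid: {e['uuid']}")
        seen_uuids.add(e["uuid"])
        if e["path"] in seen_paths:
            errors.append(f"duplicate path: {e['path']}")
        seen_paths.add(e["path"])
        # Structural invariants: exactly five segments, each matching the rules.
        segments = e["path"].split(".")
        if len(segments) != 5:
            errors.append(f"{e['path']}: expected 5 segments, got {len(segments)}")
        errors += [f"{e['path']}: bad segment {s!r}"
                   for s in segments if not SEGMENT.match(s)]
    return errors
```
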
##### Out of scope for this subsection

- **Product / job / traveler hierarchy.** Products flow through equipment orthogonally to the equipment tree and are tracked in Camstar's MES genealogy, not in the UNS equipment hierarchy. A product's current equipment is joined in via MES events (`mes.workorder.*`) referencing equipment UUIDs — not by putting products into the equipment path.
- **Operator / crew / shift hierarchy.** Same reason — orthogonal to equipment; lives elsewhere.
- **Logical vs physical equipment.** The plan's hierarchy addresses **physical equipment instances**. Logical groupings (e.g., "all CNC mills," "all equipment on the shop floor") are queryable via equipment class + attributes in dbt or via OPC UA browse filters — not via the path hierarchy.
- **Real-time UNS browsing UI.** If stakeholders want a tree-browse experience against the UNS (an HMI, an engineering tool), that is a consumer surface, not a hierarchy definition. The projection service discussed below is the likely delivery path if this is ever funded.

_TBD — enterprise shortname (currently `ent` placeholder); authoritative initial hierarchy snapshot for the currently-integrated sites (blocked on the per-site area/line/equipment walk that the equipment-protocol survey will partially produce — overlap with `current-state/equipment-protocol-survey.md`); storage format for the hierarchy in the `schemas` repo (YAML vs Protobuf vs both); whether the dbt `dim_equipment` historical-path tracking needs a slowly-changing-dimension type-2 pattern or a simpler current+history list; ownership of hierarchy change PRs (likely a domain SME group, not the ScadaBridge team)._

#### How this differs from a classic MQTT-based UNS

Three deliberate deviations from the MQTT/Sparkplug B template. Each is a decision this plan already made for other reasons — none was made to avoid UNS, and none precludes serving consumers that expect UNS shape.

- **Transport is Kafka, not MQTT.** Kafka was chosen for analytics replay, a bundled schema registry, long-horizon retention (30/90 days per tier), and native Snowflake ingestion via the Snowflake Kafka Connector — all of which are critical to pillar 2. MQTT is better for lightweight real-time pub-sub and has a richer tooling ecosystem for HMIs and COTS integration products; Kafka is better for the plan's actual consumer mix. **Cost of the deviation:** vendor tools that expect "UNS = MQTT" need either a Kafka client (many modern vendors have one) or a projection layer (see below).
- **Topic structure is flat, not ISA-95-hierarchical.** Topics are named `{domain}.{entity}.{event-type}` with **site identity in the message**, not `enterprise.site.area.line.cell.equipment.tag` with per-equipment topic explosion. This was a deliberate topic-count bounding decision — adding Berlin, Winterthur, Jacksonville, etc. does not multiply the topic set. The ISA-95 hierarchy still exists **conceptually** in the `schemas` repo's equipment-class definitions; it is just not the Kafka topic path. **Cost of the deviation:** consumers expecting hierarchical topic navigation get it via the `schemas` repo and (optionally) via the projection service below, not via Kafka topic listing. (The sketch after this list contrasts the two addressing styles.)
- **No Sparkplug B state machine.** Sparkplug B adds birth/death/state certificates and a stateful producer model on top of MQTT. The plan's canonical events are **stateless** messages carrying explicit state transitions (`equipment.state.transitioned`) rather than implicit Sparkplug state. **Cost of the deviation:** if a consumer needs Sparkplug state semantics, the projection service translates at the boundary; the plan does not emit Sparkplug natively.

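A contrast of the two addressing styles in miniature (Python, purely illustrative; the topic and field names follow the plan's taxonomy, the helper function is hypothetical):

```python
# Classic UNS: identity lives in a per-equipment MQTT topic path.
mqtt_topic = "ent/warsaw-west/bldg-3/line-2/cnc-mill-05/spindle-speed"

# This plan: one flat Kafka topic per {domain}.{entity}.{event-type}...
kafka_topic = "equipment.state.transitioned"

# ...with site and equipment identity carried inside the message.
message = {
    "equipment_uuid": "0d1f...",  # stable identity (truncated, illustrative)
    "equipment_path": "ent.warsaw-west.bldg-3.line-2.cnc-mill-05",
    "to_state": "Running",
}

# Consumers filter on fields instead of subscribing to a topic subtree:
def is_at_site(msg: dict, site: str) -> bool:
    return msg["equipment_path"].split(".")[1] == site

assert is_at_site(message, "warsaw-west")
```
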
#### Optional future: UNS projection service (architecturally supported, not committed for build)

For consumers that expect a "classic UNS" surface — vendor COTS tools, operator HMI platforms with an MQTT / Sparkplug B client, third-party analytics tools pre-wired to MQTT, or enterprise OPC UA clients expecting a unified address space — a **small projection service** can consume from Redpanda topics and expose the same canonical data on a classic UNS surface. The plan supports this architecturally (Pattern A from the brainstorm that produced this subsection) but **does not commit to building it** in the 3-year scope.

Two projection flavors are possible, not mutually exclusive (a sketch of the first follows this list):

1. **MQTT Sparkplug B broker projection.** A central MQTT broker (HiveMQ, EMQX, Mosquitto, or similar) republishes selected Redpanda topics as Sparkplug-shaped messages on an ISA-95 topic path constructed from the canonical model's equipment-class metadata. Consumers connect to the MQTT broker as if it were their UNS.
2. **Enterprise OPC UA aggregator projection.** A central OPC UA server unions the per-site OtOpcUa instances into one enterprise-wide hierarchical address space, for consumers that prefer OPC UA over MQTT. This flavor needs care around the IT↔OT boundary — an enterprise-reachable OPC UA surface on top of OT-side servers can blur the "ScadaBridge central is the sole IT↔OT crossing" rule, and would need to be either routed through ScadaBridge central or limited to OT-side clients. Build this only if a consumer specifically needs it and the boundary question gets explicit resolution.

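For concreteness, a sketch of the MQTT projection flavor, assuming `confluent-kafka` and `paho-mqtt` (1.x-style client construction) purely for illustration. Real Sparkplug B encoding (Protobuf payloads, birth/death certificates) and schema-registry decoding are deliberately omitted; host names and the group id are hypothetical:

```python
import json
from confluent_kafka import Consumer
import paho.mqtt.client as mqtt

consumer = Consumer({"bootstrap.servers": "redpanda:9092",
                     "group.id": "uns-projection",
                     "auto.offset.reset": "latest"})
consumer.subscribe(["equipment.state.transitioned"])

broker = mqtt.Client()          # paho-mqtt 1.x constructor style
broker.connect("mqtt-broker", 1883)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Canonical events are Protobuf in the plan; JSON here for brevity.
    event = json.loads(msg.value())
    # Rebuild a hierarchical topic path from the in-message equipment path.
    topic = event["equipment_path"].replace(".", "/") + "/state"
    broker.publish(topic, json.dumps(event))
```
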
**Decision trigger for building a projection service:** when a specific consumer (vendor tool, COTS HMI, analytics product, new initiative) requires a classic UNS surface and the cost of writing a Kafka client for that consumer exceeds the cost of operating the projection layer for the rest of the consumer's lifetime. Until that trigger is hit, the canonical model + Redpanda **is** the UNS and consumers reach it directly.

This mirrors the treatment of OtOpcUa's future `simulated` namespace and the Digital Twin Use Case 2 simulation-lite foundation: the architecture supports the addition; the plan does not commit the build until a specific need justifies it.

#### What the UNS framing does and does not change

**Changes:**

- Stakeholders who ask "do we have a UNS?" get a direct "yes — composed of OtOpcUa + Redpanda + `schemas` repo + dbt" answer instead of "we have a canonical model but we didn't use that word."
- Digital Twin Use Cases 1 and 3 (see **Strategic Considerations → Digital twin**) — which are functionally UNS use cases in another vocabulary — now have a second name and a second stakeholder audience.
- A future projection service is pre-legitimized as a small optional addition, not a parallel or competing initiative.
- Vendor conversations that assume "UNS" means a specific MQTT broker purchase can be reframed: the plan delivers the UNS value proposition via different transport; the vendor's MQTT expectations become a projection-layer concern, not a core-architecture concern.

**Does not change:**

- **Does not add a new workstream** to [`roadmap.md`](roadmap.md).
- **Does not commit to building** an MQTT broker, a Sparkplug B producer layer, or an enterprise OPC UA aggregator.
- **Does not change** the Redpanda topic naming convention, the `{domain}.{entity}.{event-type}` taxonomy, or the site-identity-in-message decision.
- **Does not change** the IT/OT boundary or the "ScadaBridge central is the sole IT↔OT crossing" rule. A projection service, if ever built, would live on one side of the boundary (most likely the IT side, as a Redpanda consumer) and would not create a new crossing.
- **Does not invalidate** any existing plan decision — the UNS framing is additive and interpretive, not restructural.

_TBD — whether any stakeholder has specifically asked for UNS vocabulary, or whether this framing is proactive; whether any vendor tool currently in evaluation expects an MQTT/Sparkplug UNS and would motivate a Year 2 or Year 3 projection build; whether to add the projection service as an explicit "optional future capability" callout in roadmap.md, or leave it as an unscheduled architectural option (current posture)._

### IT/OT Bridge — ScadaBridge as the Strategic Layer

- **ScadaBridge** is the **global integration network** providing controlled access between **IT and OT**.
- **The IT↔OT boundary sits at ScadaBridge central.** In the target architecture:
  - **OT side = machine data.** Everything that collects, transforms, or stores machine data lives on the OT side. Concretely this includes: **Aveva System Platform** (primary and site clusters, Global Galaxy federation, hot-warm redundancy), **equipment OPC UA and native device protocols** (PLCs, controllers, instruments), **OtOpcUa** (the unified per-site OPC UA layer that exposes raw equipment data and the System Platform namespace — the evolution of LmxOpcUa), **ScadaBridge** (site clusters and central), **Aveva Historian**, and **Ignition SCADA** (as the KPI SCADA UX layer per the UX split).
  - **IT side = enterprise applications.** Everything business-facing lives on the IT side. Concretely this includes: **Camstar** (MES), **Delmia** (DNC / digital manufacturing), **enterprise reporting and analytics** (Snowflake, dbt, SAP BusinessObjects today / Power BI tomorrow), **SnowBridge** (a Snowflake-facing enterprise consumer, not an OT component — it happens to read from OT sources, but its identity, hosting, and governance are IT), and **any other enterprise app** that needs shopfloor data or has to drive shopfloor behavior.
  - **Long-term posture.** System Platform traffic (Global Galaxy, site↔site cluster federation, site System Platform clusters ↔ central System Platform cluster, site-level ScadaBridge ↔ local equipment) **stays on the OT side** and is **not** subject to "retire to ScadaBridge." Global Galaxy is how System Platform is supposed to federate and stays the authorized mechanism for OT-internal integration.
  - **The crossing point.** **ScadaBridge central ↔ enterprise integrations is the single IT↔OT bridge.** Any traffic that crosses between the two zones must cross *through* ScadaBridge central; nothing else is permitted as a long-term path. That includes reads (an enterprise app wanting machine data) and writes (an enterprise app driving shopfloor state).
  - **Implication for the Global Galaxy Web API** on the Aveva System Platform primary cluster: its two existing interfaces (Delmia DNC, Camstar MES) are IT↔OT crossings that currently run *outside* ScadaBridge and are therefore in scope for retirement under pillar 3.
  - **Implication for the EventHub backbone.** Redpanda is consumed by both sides: ScadaBridge (OT) produces to it, and SnowBridge (IT) consumes from it. The cluster itself lives in South Bend and is operationally an IT-zone resource — the physical network path from site ScadaBridge to the central cluster is therefore an IT↔OT crossing, and its ScadaBridge-side producer counts as the ScadaBridge-central-mediated crossing per the rule above.
  - **Out of scope here:** physical network segmentation (VLANs, firewalls, DMZ design, IDMZ conduits, specific IEC 62443 zone/conduit definitions, etc.). This plan defines the **logical** IT↔OT boundary and the **single sanctioned crossing point**. How the enterprise network team implements that logical boundary in physical infrastructure is owned outside this plan.
- All cross-domain traffic (enterprise systems ↔ shopfloor equipment/SCADA) flows through ScadaBridge rather than ad-hoc point-to-point connections.
- Leverages ScadaBridge's existing capabilities — **OPC UA** on the OT side, **secure Web API**, **scripting**, and **templating** — to standardize how integrations are built and governed.
- **Legacy integration migration:** ScadaBridge is already deployed, but **remaining legacy IT↔OT integrations must be migrated onto it** to retire point-to-point paths. This migration is a prerequisite for the "all cross-domain traffic flows through ScadaBridge" target state. The authoritative inventory lives in [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md). _TBD — migration plan, sequencing, decommission criteria for each legacy path._
- _TBD — identity & auth details beyond what's already captured (EventHub SASL/OAUTHBEARER, ScadaBridge API keys, OtOpcUa UserName tokens with standard OPC UA security modes/profiles — inherited from the LmxOpcUa pattern), change control, HA/DR, capacity planning per site._

### Enterprise System Integration

- **Camstar MES** — deeper, richer integration than today's single Web API interface. _TBD — bidirectional flows, event-driven vs request/response, data scope, error handling._
- **Snowflake** — first-class integration with the enterprise data platform so shopfloor data lands in Snowflake for analytics, reporting, and downstream consumers. See **Aveva Historian → Snowflake** below for the time-series ingestion pattern. _TBD — non-historian data flows (MES, ScadaBridge events, metadata), schema ownership, latency targets, governance._
- _TBD — other enterprise systems (ERP, PLM, quality, etc.) that need integration._

### SnowBridge — the Machine Data to Snowflake upload service

**SnowBridge** is a **dedicated integration service** that owns all **machine data** flows into Snowflake. It is a new, purpose-built component — **not** the Snowflake Kafka Connector directly, and **not** configuration living inside ScadaBridge or the central `schemas` repo.

**Responsibilities.**

- **Source abstraction.** Reads from multiple machine-data sources behind a common interface: **Aveva Historian** (via its SQL interface), **ScadaBridge / EventHub (Redpanda topics)**, and any future source (e.g., Ignition, Aveva Data Hub, direct OPC UA collectors) without each source needing its own bespoke pipeline. (A sketch of the interface shape follows this list.)
- **Selection.** Operators configure **which topics** (for Redpanda-backed sources) and **which tags/streams** (for Historian and other sources) flow to Snowflake. Selection is a first-class, governed action inside this service — not a side effect of deploying a schema or a ScadaBridge template.
- **Sink to Snowflake.** Writes into Snowflake via the appropriate native mechanism per source (Snowpipe Streaming for event/topic sources, bulk or COPY-based loads for Historian backfills, etc.) while presenting one unified operational surface.
- **Governance home.** Topic/source/tag opt-in, mapping to Snowflake target tables, schema bindings, and freshness expectations all live in this service's configuration — one place to ask "is this tag going to Snowflake, and why?"

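A sketch of the source-abstraction shape (Python and all names are illustrative; the service itself is expected to be .NET, per the build-vs-buy decision below):

```python
from abc import ABC, abstractmethod
from typing import Iterator

class MachineDataSource(ABC):
    """One interface over heterogeneous sources; each yields canonical records."""

    @abstractmethod
    def read(self, selection: list[str]) -> Iterator[dict]:
        """Yield records for the opted-in topics/tags only."""

class RedpandaSource(MachineDataSource):
    def read(self, selection: list[str]) -> Iterator[dict]:
        # Would consume only the opted-in topics from the EventHub backbone.
        yield from ()

class HistorianSource(MachineDataSource):
    def read(self, selection: list[str]) -> Iterator[dict]:
        # Would query the Aveva Historian SQL interface for the opted-in tags.
        yield from ()

def write_to_snowflake(record: dict) -> None:
    ...  # per-source native sink (Snowpipe Streaming, COPY-based backfill, etc.)

def run(sources: list[MachineDataSource], selection: list[str]) -> None:
    for source in sources:
        for record in source.read(selection):
            write_to_snowflake(record)
```
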
**Rationale.**

- **Separation of concerns.** ScadaBridge's job is IT/OT integration at the edge (site-local, OPC UA, store-and-forward, scripting). Shoveling curated data into Snowflake is a different job — long-lived connector state, Snowflake credentials, per-source backfill logic, schema mapping — and does not belong on every ScadaBridge cluster.
- **Source-agnostic.** Not all machine data will flow through Redpanda. Aveva Historian in particular has its own SQL interface that is better read directly for bulk/historical work than replayed through EventHub. This service handles that heterogeneity in one place.
- **Governance visibility.** A single operator-facing system answers the "what machine data is in Snowflake?" question, which matters for compliance, cost attribution, and incident response.
- **Decouples schema evolution from data flow.** Adding a Protobuf schema to the central repo no longer implicitly adds data to Snowflake — that requires an explicit action in this service. Prevents accidental volume.

**Implications for other decisions in this plan.**

- The **Aveva Historian → Snowflake** recommendation (below) is updated: this service is the component that actually implements the path, rather than a direct ScadaBridge → EventHub → Snowflake Kafka Connector pipeline.
- The **tag opt-in governance** question for the Snowflake-bound stream is resolved here: the opt-in list lives in **this service**, not in the central `schemas` repo and not in ScadaBridge configuration.
- The **Snowflake Kafka Connector** is no longer presumed to be the primary path. It may still be used internally by this service for Redpanda-backed flows, or this service may implement its own consumer — an implementation choice inside the service, not a plan-level commitment.
- The **central EventHub cluster** does not change — machine data still flows through Redpanda for event/topic sources; this service is just one of several consumers (alongside KPI processors, Camstar integration, etc.).

**Build-vs-buy: custom build, in-house.** The service is built in-house rather than adopting Aveva Data Hub or a third-party ETL tool (Fivetran, Airbyte, StreamSets, Precog, etc.).

- **Rationale:** bespoke fit for the exact source mix (Aveva Historian SQL + Redpanda/ScadaBridge + future sources), full control over the selection/governance model, alignment with the existing .NET ecosystem that ScadaBridge and OtOpcUa already run on, no commercial license dependency, and no vendor roadmap risk for a component this central.
- **Trade-off accepted:** commits the organization to building and operating another service over the lifetime of the plan. Justified because the requirements (multi-source abstraction, topic/tag selection as a governed first-class action, Snowflake as a targeted sink) don't map cleanly onto any off-the-shelf tool, and the cost of a bad fit would be paid forever.
- **Implementation hint (not a commitment):** the most natural starting point is a .NET service — possibly an Akka.NET application to share infrastructure patterns with ScadaBridge — but the specific runtime/framework is an implementation detail for the build team.

**Operator interface: web UI backed by an API.** Operators manage source/topic/tag selection through a **dedicated web UI** that sits on top of the service's own API. Selection state lives in an **internal datastore owned by the service**, not in a git repo.

- **Rationale:** lowest friction for the operators who actually run the machine-data estate — non-engineers can onboard a tag or disable a topic without opening a PR. Makes the "what's flowing to Snowflake right now?" question answerable from one screen instead of correlating git state with running state. The underlying API lets ScadaBridge, dbt, or future tooling drive selection changes programmatically when needed.
- **Trade-off accepted:** git is **not** the source of truth for selection state. Audit, change review, and rollback all have to be built into the service itself — they do not come for free from `git log` and PR review.
- **Non-negotiable requirements on the UI/API (to offset the trade-off):**
  - **Full audit trail** — every selection change records who, what, when, and why (with a required change-reason field). Audit entries are queryable and exportable.
  - **Role-based access control** — viewing vs proposing vs approving selection changes are separate permissions; ties into the same enterprise IdP used for Redpanda SASL/OAUTHBEARER (see EventHub auth).
  - **Approval workflow — blast-radius based.** Not every change needs four-eyes review; the gate is the **potential blast radius** of the change, not the environment it runs in. (A sketch of the gating rule follows this list.)
    - **Self-service (no approval):** adding/removing a single tag from a non-compliance topic, adjusting a non-structural mapping, toggling an individual tag on/off, renaming a display label.
    - **Requires approval (four-eyes):** adding a brand-new topic or source, adding or modifying any tag/topic classified as **compliance-tier**, any change whose **estimated Snowflake storage or compute impact** crosses a configured threshold (the UI must compute and display that estimate before the change can be submitted), and any change to the approval rules themselves.
    - **Rationale:** cost and compliance incidents can happen in any environment connected to a real Snowflake account, so gating approvals purely by environment is weaker than gating by blast radius. The fast path stays fast; risky changes always get reviewed regardless of where they're being made.
    - _TBD — concrete storage/compute impact thresholds that trigger the gate, whether approvers can be inside or must be outside the requester's team, SLA for approvals so the gate doesn't become a bottleneck, and how "estimated impact" is calculated (heuristic, dry-run query, cost modeling service)._
  - **Readable, exportable state** — the current selection can be dumped as a machine-readable document (YAML/JSON) at any time, so operators and auditors can see the full picture without walking the UI screen-by-screen, and so disaster recovery of the service has a clear input format.
  - **Reversible changes** — every change can be reverted; reverts are themselves audited.

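A sketch of the blast-radius gate (Python, with hypothetical field names and an illustrative threshold; the real thresholds and the impact estimator are explicitly TBD above):

```python
from dataclasses import dataclass

IMPACT_THRESHOLD_GB_PER_DAY = 5.0  # illustrative placeholder, not a plan value

@dataclass
class SelectionChange:
    kind: str                      # e.g. "add-tag", "add-topic", "edit-approval-rules"
    compliance_tier: bool          # is the tag/topic classified compliance-tier?
    est_storage_gb_per_day: float  # estimate the UI must compute before submit

def requires_four_eyes(change: SelectionChange) -> bool:
    """Gate on potential blast radius, not on environment."""
    return (
        change.kind in {"add-topic", "add-source", "edit-approval-rules"}
        or change.compliance_tier
        or change.est_storage_gb_per_day >= IMPACT_THRESHOLD_GB_PER_DAY
    )

assert not requires_four_eyes(SelectionChange("add-tag", False, 0.1))  # self-service
assert requires_four_eyes(SelectionChange("add-topic", False, 0.1))    # new topic
```
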
_TBD — service name (working title only); hosting (South Bend, alongside Redpanda, or elsewhere); high-availability posture; how it authenticates to Snowflake, Historian, and Redpanda; which datastore holds selection state (SQL? internal KV?); how the service recovers its selection state after a failure; how it handles schema evolution when the upstream Protobuf schema changes; backfill and replay semantics per source; whether the UI is a standalone app or embeds into an existing operations portal._

### Aveva Historian → Snowflake

**Problem.** Aveva Historian exposes a SQL interface (OPENQUERY, history views) that *can* be queried from Snowflake, but full-fidelity bulk loading of raw tag data into Snowflake is not viable at enterprise scale — the data volume (sub-second tag values across thousands of tags across many sites) would overwhelm the export path and drive Snowflake storage/compute costs to untenable levels.

**Design constraints.**

- **Volume-aware** — we cannot ship raw full-resolution data wholesale.
- **Data locality** — collection stays local to each site (see ScadaBridge principle). Heavy pre-processing should happen near the source, not after a WAN hop.
- **Replayability** — downstream systems (including Snowflake) must tolerate outages without data loss.
- **Single backbone** — reuse the planned **EventHub (Kafka-compatible)** backbone rather than inventing a point-to-point Historian↔Snowflake path.

**Candidate options researched.**

1. **Direct Snowflake → Historian SQL (OPENQUERY / linked server / JDBC pulls).**
   - *Pros:* simplest conceptually; no new infrastructure; uses the existing Historian SQL surface.
   - *Cons:* **does not scale** — pulling raw resolution is the volume problem we're trying to avoid; puts read load on production Historians; no store-and-forward; Snowflake compute wakes up repeatedly to poll.
   - *Verdict:* viable only for low-volume metadata/config tables, not tag data.

2. **Aveva Historian Tier-2 Summary Replication → Snowflake.**
   - Aveva Historian natively supports **Tier-2 replication** with **summary replication** (periodic analog/state summary stats rather than full resolution). Summaries are stored as history blocks, not raw SQL rows.
   - *Pros in principle:* native feature; summaries designed for analytics.
   - *Why it doesn't apply here:* the current-state Historian topology is **central-only in South Bend** — there is no tier-1/tier-2 replication in play today, so "Tier-2 summary replication" would require introducing a new historian server to replicate into before it could feed Snowflake. That's infrastructure we don't have and don't want to add just to serve Snowflake ingestion.
   - *Verdict:* **not used.** This option assumed a tiered historian that doesn't exist in the current topology, and standing one up just for Snowflake contradicts the "stable, single point of integration" theme.

3. **ScadaBridge → EventHub → Snowflake Kafka Connector (Snowpipe Streaming).** *(recommended primary path)*
   - ScadaBridge already has **EventHub forwarding** as a capability and runs at each site (data locality). It can publish tag values, aggregates, or exception-based events to EventHub.
   - Snowflake offers the **Snowflake Connector for Kafka with Snowpipe Streaming**, which streams rows directly into Snowflake tables with ~1 second latency and lower per-row cost than file-based Snowpipe.
   - *Pros:* reuses the planned EventHub backbone; ScadaBridge scripting/templating does the aggregation/thinning at the source; store-and-forward handles WAN outages; Snowflake-native streaming ingestion; horizontally scalable per site.
   - *Cons:* requires disciplined schema/topic taxonomy and a schema registry; ScadaBridge has to carry aggregation logic (or delegate it to Historian Tier-2 summaries — see option 2); cost of EventHub + Snowpipe Streaming must be modeled.
   - *Verdict:* **recommended primary path.** It aligns with every other goal-state decision (ScadaBridge as IT/OT bridge, EventHub as async backbone, data locality).

4. **Aveva Data Hub / Cogent DataHub as the bridge.**
   - Aveva's own DataHub products can connect to AVEVA Historian and forward to external targets including **Apache Kafka** and **Azure Event Hubs**, with store-and-forward.
   - *Pros:* supported Aveva product; pre-built Historian connector; could reduce custom ScadaBridge logic for historian-sourced flows.
   - *Cons:* additional licensed product to buy, deploy, and operate; overlaps significantly with ScadaBridge's role and risks creating two IT/OT bridges; may not fit the existing ScadaBridge-centric governance model.
   - *Verdict:* fallback/complement — evaluate if ScadaBridge's historian read path proves insufficient, or for sites where a packaged product is preferable to scripting.

5. **File-based bulk export → object storage → Snowpipe / external tables.**
   - Periodic SQL export to Parquet/CSV on S3/ADLS/GCS, loaded via Snowpipe or queried as external tables / Iceberg.
   - *Pros:* cheap for large batch windows; decouples Snowflake cost from ingestion timing.
   - *Cons:* high latency (minutes to hours); still requires someone to do the SQL pull; not great for operational/near-real-time KPIs.
   - *Verdict:* useful for **historical backfill** and cold/archive tier only.

**Recommended direction.**

- **Primary path (updated):** all machine-data ingestion into Snowflake is owned by the **SnowBridge** (see its dedicated section above). That service reads from ScadaBridge/EventHub for event-driven flows and directly from Aveva Historian's SQL interface for historian-native flows, then writes to Snowflake. ScadaBridge remains the producer for event-driven machine data into Redpanda; SnowBridge consumes from Redpanda rather than Snowflake pulling directly.
- **Aggregation boundary: aggregation lives in Snowflake.** Heavy transforms — summary statistics, time-window rollups, state derivations, cross-site joins, enrichment with MES/other data, business-level KPIs — are **all done in Snowflake** using Snowflake-native transform tooling (tool selection resolved below: **dbt**, with Dynamic Tables and Streams + Tasks not adopted).
  - **Rationale:** keeps transform logic where the data analysts and platform owners already work; avoids scattering business logic across Historian Tier-2, ScadaBridge scripts, and Snowflake. One place to version, review, and trace the lineage of transforms. Accepts higher Snowflake compute/storage cost as the explicit trade-off.
  - **Not used:** Aveva Historian **Tier-2 summary replication** is **not** used as the aggregation layer for the Snowflake path. (Tier-2 may still be used for its original historian purpose, but it is not part of the Snowflake ingestion pipeline.)
  - **Not used:** ScadaBridge **scripting** is **not** the aggregation layer either — aggregation logic does not live in ScadaBridge scripts.
- **Volume control — two layers.** Since aggregation happens in Snowflake, both ScadaBridge and the SnowBridge share responsibility for preventing a full-fidelity firehose from reaching Snowflake:
  - **At ScadaBridge (producing to EventHub):** **deadband / exception-based publishing** — only publish when a tag value changes by a configured threshold or a state changes, not on every OPC UA sample. **Rate limiting** per tag / per site where needed. This controls how much machine data reaches Redpanda in the first place.
  - **At the SnowBridge (selecting what reaches Snowflake):** **topic and tag selection** — only topics and tags explicitly opted in within the service are forwarded to Snowflake. Not every Redpanda topic or every Historian tag automatically flows. Adding a tag or topic to Snowflake is a governed action in this service.
- Keep **raw full-resolution data in Aveva Historian** as the system of record — Snowflake stores the selected, deadband-filtered stream plus whatever aggregations dbt builds on top, not a mirror of Historian.
- **Drill-down:** for rare raw-data investigations, query the Historian SQL interface directly from analyst tooling rather than copying raw data into Snowflake.
- **Historical backfill:** one-off file-based exports (option 5) to seed Snowflake history when a new tag set comes online.

**Snowflake-side transform tooling: dbt.** All Snowflake transforms are built in **dbt**, versioned in git alongside the other integration source (schemas, topic config, etc.), and run on a schedule (or via CI) — not as real-time streaming transforms.

- **Rationale:** dbt is the mature, portable standard for SQL transforms. Strong lineage, testing (`dbt test`), environment separation, and documentation generation. Fits the "everything load-bearing lives in git and is reviewed before it ships" discipline already established for schemas and topic definitions.
- **Explicit trade-off — no real-time transforms.** dbt is batch/micro-batch, not streaming. Transforms land in Snowflake tables on whatever cadence dbt runs (likely minutes to hours depending on the model), **not** sub-second. This is acceptable because operational/real-time KPIs continue to run on **Ignition SCADA**, not on Snowflake (see Target Operator / User Experience — Ignition owns KPI UX). Snowflake's job is analytics and enterprise rollups, which tolerate minute-plus latency.
- **Not used:** Dynamic Tables and Streams + Tasks are **not** adopted as part of the primary transform toolchain. If a specific future use case genuinely needs sub-minute latency from Snowflake itself (not Ignition), re-open this decision — don't quietly add a second tool.
- **Orchestration: dbt Core on a self-hosted scheduler.** dbt runs are driven by a self-hosted orchestrator (Airflow / Dagster / Prefect — specific tool TBD), **not** dbt Cloud and **not** CI-only. The orchestrator schedules `dbt build` / `dbt run` / `dbt test`, manages freshness SLAs and backfills, and coordinates dbt alongside the rest of the data pipeline (Snowflake Kafka Connector health checks, ad-hoc backfill jobs, downstream notifications). (A sketch of the pattern follows this list.)
  - **Rationale:** gives full control over scheduling, dependencies, and alerting; avoids recurring dbt Cloud SaaS spend; keeps dbt runs decoupled from CI so a long-running transform isn't sitting inside a CI build minute. Accepts the operational cost of running one more platform service.
  - **Not used:** dbt Cloud (recurring SaaS cost and vendor coupling), pure CI-driven runs (too coupled to PR merge cadence), and Snowflake Tasks as the primary scheduler (too limited).
  - **Out of scope for this plan:** specific orchestrator selection (Airflow vs Dagster vs Prefect), whether to stand up a new one or reuse one the enterprise data platform already runs, hosting, and credential management. This plan commits to the *pattern* (dbt Core run by a self-hosted orchestrator) but leaves the concrete orchestrator choice to the team that owns the Snowflake-side data platform.
  - _TBD — dbt project layout (one project vs per-domain), model cadence targets, test coverage expectations, source freshness thresholds, CI/CD pipeline for dbt changes._

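For illustration only, the orchestration pattern sketched with Airflow 2.x; the plan deliberately leaves the orchestrator choice open, and the project path, schedule, and task split below are hypothetical:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_curated_layer",
    start_date=datetime(2025, 1, 1),
    schedule=timedelta(minutes=10),  # cadence chosen to fit the ≤ 15 min analytics SLO
    catchup=False,
) as dag:
    # Source-freshness check gates the run; `dbt build` runs models and tests together.
    source_freshness = BashOperator(
        task_id="dbt_source_freshness",
        bash_command="dbt source freshness --project-dir /opt/dbt/shopfloor",
    )
    build = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /opt/dbt/shopfloor",
    )
    source_freshness >> build
```
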
**Deadband / filtering model: global default with explicit per-tag overrides.** ScadaBridge applies **one global deadband** to every tag opted in to the Snowflake stream, and specific tags can be **explicitly overridden** when the global default is too loose or too tight. No per-tag-class templating for deadband — the global default is the floor, overrides are the only fine-tuning mechanism. (A sketch of the filter follows this list.)

- **Rationale:** simplest model to reason about and operate — one number to understand across the whole fleet, plus an explicit list of exceptions. Makes it immediately obvious which tags have bespoke tuning (any tag *not* on the override list uses the global default). Avoids per-class template proliferation as a governance surface.
- **Trade-off accepted:** a single global default will be wrong for many tags — too coarse for some (losing signal) and too fine for others (wasting throughput). This is acceptable because (1) overrides exist for the tags that matter, and (2) the Snowflake compute cost of over-sampled tags is a known-and-bounded cost, not a silent failure.
- **Starting value (not a final commitment):** **approximately 1% of span** for analog tags, **change-only** for booleans/state tags, and **every-increment** for counters — a sensible, industry-conservative starting point. The exact percentage is deliberately not pinned in this plan: the mechanism (global default + per-tag overrides) is what's load-bearing, and the starting value will be retuned in Year 2 based on observed Snowflake cost and signal loss. The build team may adjust the starting point during implementation without reopening the plan.
- _TBD — override governance (how overrides are requested, reviewed, and tracked) and whether the override list lives in the central `schemas` repo alongside tag opt-in metadata or in ScadaBridge configuration directly._

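A minimal sketch of the deadband mechanism (Python, hypothetical names; the 1%-of-span figure is the plan's starting value for analogs, not a commitment):

```python
GLOBAL_DEADBAND_PCT = 0.01  # fraction of tag span; starting value, retuned in Year 2
OVERRIDES = {"ent.warsaw-west.bldg-3.line-2.cnc-mill-05.spindle-speed": 0.002}

last_published: dict[str, float] = {}

def should_publish_analog(tag: str, value: float, span: float) -> bool:
    """Publish only when the value moved by more than the tag's deadband."""
    deadband = OVERRIDES.get(tag, GLOBAL_DEADBAND_PCT) * span
    previous = last_published.get(tag)
    if previous is None or abs(value - previous) > deadband:
        last_published[tag] = value
        return True
    return False

# Booleans/state tags are change-only; counters publish every increment.
assert should_publish_analog("t", 10.0, span=100.0)      # first sample publishes
assert not should_publish_analog("t", 10.5, span=100.0)  # within 1% of span
assert should_publish_analog("t", 11.5, span=100.0)      # moved more than 1.0
```
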
**Open questions (TBD).**

- Tag and topic opt-in governance: who approves adding a tag/topic to Snowflake? The **list now lives inside the SnowBridge**, not the central `schemas` repo — the approval workflow, audit trail, and change control for that service's selection state are TBD.
- Cost model: EventHub throughput + ingestion (Snowpipe Streaming or whatever the SnowBridge uses internally) + Snowflake **compute for transforms** + Snowflake storage for the target tag volume. Compute cost matters more under this choice than it would have under a source-aggregated model — worth pricing early.
- Whether Aveva Data Hub (option 4) should still be piloted as a **reference point** for the custom build — useful for comparison on specific capabilities (Historian connector depth, store-and-forward behavior) even though it is not the target implementation.
- ~~Latency SLOs per data class.~~ **Resolved — captured below.**

**Latency SLOs per data class.** End-to-end latency is measured from the moment ScadaBridge (or the Historian source) emits a value to the moment it is queryable in its target consumer.

- **Operational / real-time KPI UX — out of scope for the Snowflake path.** Real-time KPI runs on **Ignition SCADA** per the UX split (see Target Operator / User Experience). Snowflake has no sub-minute SLO obligation because no operational UI depends on it.
- **Analytics feeds (Snowflake): ≤ 15 minutes end-to-end.** Covers ScadaBridge emit → Redpanda → SnowBridge → Snowflake landing table → dbt model refresh → queryable in the curated layer. Tight enough to feel alive for analysts and dashboards, loose enough to be reachable with dbt on a self-hosted scheduler and no streaming transforms.
- **Compliance / validated data feeds (Snowflake): ≤ 60 minutes end-to-end.** Snowflake is an investigation/reporting tier for validated data; the system of record remains **Aveva Historian**. A 60-minute SLO is sufficient because no compliance control depends on Snowflake freshness — if an investigation needs sub-hour data, it queries Historian directly.
- **Ad-hoc raw drill-down — no SLO.** Analysts query the Historian SQL interface directly for rare raw-resolution investigations; this path is not budgeted against any latency target.
- _TBD — which layer is responsible for each segment of the budget (e.g., how much of the 15 minutes is Redpanda vs SnowBridge vs dbt), and how the SLOs are monitored and alerted on in practice._

### OtOpcUa — the unified site-level OPC UA layer (absorbs LmxOpcUa)

**OtOpcUa** is a per-site **clustered OPC UA server** that is the **single sanctioned OPC UA access point for all OT data at each site**. It owns the one connection to each piece of equipment and exposes a unified OPC UA surface — containing **two logical namespaces** — to every downstream consumer (System Platform, Ignition, ScadaBridge, future consumers).

**The two namespaces served by OtOpcUa:**

1. **Equipment namespace (raw data).** Live values read from equipment via native OPC UA or native device protocols (Modbus, EtherNet/IP, Siemens S7, etc.) translated to OPC UA. This is the new capability the plan introduces — what the "layer 2 — raw data" role in the layered architecture describes.
2. **System Platform namespace (processed data tap).** The former **LmxOpcUa** functionality, folded in. Exposes Aveva System Platform objects (via the local App Server's LMX API) as OPC UA so that OPC UA-native consumers can read processed data through the same endpoint they use for raw equipment data.

**Namespace model is extensible — future "simulated" namespace supported architecturally, not committed for build.** The two-namespace design is not a hard cap. A future **`simulated` namespace** could expose synthetic or replayed equipment data to consumers, letting tier-1 / tier-2 consumers (ScadaBridge, Ignition, System Platform IO) be exercised against real-shaped-but-offline data streams without physical equipment. This is the **OtOpcUa-side foundation for Digital Twin Use Case 2** (Virtual Testing / Simulation — see **Strategic Considerations → Digital twin**). The plan **does not commit to building** a simulated namespace in the 3-year scope; it commits that the namespace architecture can accommodate one when a specific testing need justifies it, without reshaping OtOpcUa. The complementary foundation (historical event replay) lives in the Redpanda layer — see **Async Event Backbone → Usage patterns → Historical replay**.

**LmxOpcUa is absorbed into OtOpcUa, not replaced by a separate component.** The existing LmxOpcUa software and deployment pattern (per-node service on every System Platform node) evolves into OtOpcUa. Consumers that previously pointed at LmxOpcUa for System Platform data and at "nothing yet" for equipment data now point at OtOpcUa and see both in its namespace. There is not a second OPC UA server running alongside.

**Responsibilities.**

- **Single connection per equipment.** OtOpcUa is the **only** OPC UA client that talks to equipment directly. Equipment holds one session — to OtOpcUa — regardless of how many downstream consumers need its data. This eliminates the multiple-direct-sessions problem documented in `current-state.md` → Equipment OPC UA.
- **Site-local aggregation.** Downstream consumers (System Platform IO, Ignition, ScadaBridge, and any future consumers such as a prospective digital twin layer) connect to OtOpcUa rather than to equipment directly. A consumer reading the same tag gets the same value regardless of who else is subscribed.
- **Unified OPC UA endpoint for OT data.** Clients that need both raw equipment data and processed System Platform data read from **one OPC UA endpoint** with two namespaces instead of connecting to two separate OPC UA servers (as they would have in the previous "LmxOpcUa + new cluster" design). One fewer connection, one fewer credential, one fewer monitoring target per client.
- **Access control / authorization chokepoint.** Authentication, authorization, rate limiting, and audit of OT OPC UA reads/writes are enforced at OtOpcUa, not at each consumer. This is the site-level analogue of the "single sanctioned crossing point" theme the plan applies between IT and OT.
- **Clustered for HA.** Runs as a cluster (multi-node), not a single server, so a node loss does not drop equipment or System Platform visibility.

**Rationale.**

- **Stable, single point of integration — applied at the equipment boundary.** The overarching plan theme is a stable, single point of integration between IT and OT. This component extends the same discipline down one layer: a stable, single point of integration between **consumers and equipment** at each site. Every other decision in this plan presumes equipment data is sharable — this is what makes that true in practice.
- **Protects equipment.** Many devices have hard caps on concurrent OPC UA sessions; some degrade under session load. Collapsing to one session per equipment removes that risk entirely.
- **Data consistency.** Downstream consumers reading the same tag see the same value (sampling cadence, deadband, and buffer are controlled in one place).
- **Auditability.** Equipment access becomes observable and governable from one place — which also gives OT security a concrete control surface.
- **Preserves data locality.** The cluster is per-site, same pattern as ScadaBridge and Aveva System Platform site clusters. Equipment never has to be reachable from outside its site to serve a downstream consumer.

**Implications for other decisions in this plan.**

- **ScadaBridge** stops connecting to equipment OPC UA directly. In the target state, ScadaBridge reads equipment data from OtOpcUa's equipment namespace and System Platform data from OtOpcUa's System Platform namespace — all from the same OPC UA endpoint. ScadaBridge's **data locality** principle is preserved and strengthened — the local ScadaBridge talks to the local OtOpcUa, which talks to local equipment and the local System Platform node.
- **Ignition** stops connecting to equipment OPC UA directly. Today Ignition is central in South Bend and holds direct OPC UA sessions to equipment across the WAN; in the target state, Ignition consumes from each site's OtOpcUa instead. If Ignition remains centrally hosted, the WAN dependency is still present but collapses from *a session per piece of equipment* to *one session per site*. (A future goal-state decision on Ignition deployment topology — per-site vs central — is independent of this change.)
- **Aveva System Platform IO** consumes equipment data from OtOpcUa's equipment namespace rather than holding its own direct equipment sessions. This is a meaningful shift in System Platform's IO layer and needs validation against Aveva's supported patterns — System Platform is the most opinionated consumer about how its IO is sourced. (System Platform is still the owner of the objects in OtOpcUa's System Platform namespace — that namespace is a view onto System Platform, not a replacement for it.)
- **LmxOpcUa evolves into OtOpcUa**, rather than running alongside it. The existing LmxOpcUa deployment (per-node service on every System Platform node, exposing System Platform objects) grows to also expose the equipment namespace, picks up clustering, and is renamed OtOpcUa. Consumers that used LmxOpcUa-style OPC UA access to System Platform continue to get that access through OtOpcUa; the previous LmxOpcUa operational pattern (credentials, security modes, namespace shapes for System Platform) carries forward.
- **The IT↔OT boundary is unchanged.** OtOpcUa lives entirely on the **OT side** — it's OT-data-facing, site-local, and fronts OT consumers. It does not change where the IT↔OT crossing sits (still ScadaBridge central ↔ enterprise integrations).

**Build-vs-buy: custom build, in-house.** The site-level OPC UA server cluster is built in-house rather than adopting Kepware, Matrikon, Aveva Communication Drivers, or any other off-the-shelf OPC UA aggregator.

- **Rationale:** matches the existing in-house .NET pattern for ScadaBridge and SnowBridge (and continues the in-house .NET pattern LmxOpcUa already established — which OtOpcUa is the evolution of); full control over clustering semantics, access model, and integration with ScadaBridge's operational surface; no per-site commercial license; no vendor roadmap risk for a component this central to the OT plan.

- **Primary cost driver acknowledged upfront: equipment driver coverage.** Unlike ScadaBridge (which reads OPC UA and doesn't have to speak native device protocols) or the SnowBridge (which reads from a small set of well-defined sources), this component has to **speak the actual protocols of every piece of equipment it fronts**. Commercial aggregators like Kepware justify their license cost largely through their driver library, and picking custom build means that library has to be built or sourced internally over the lifetime of the plan. This is the real cost of the custom-build option, and it is accepted as the trade-off for control.

- **Mitigation:** where equipment already speaks native OPC UA, no driver work is required — the cluster simply proxies the OPC UA session. The driver-build effort is scoped to equipment that exposes non-OPC-UA protocols (Modbus, EtherNet/IP, Siemens S7, proprietary serial, etc.).

- **Driver strategy: hybrid — proactive core library plus on-demand long-tail.** A **core driver library** covering the top equipment protocols for the estate is built **proactively** (Year 1 into Year 2), so that most site onboardings can draw from existing drivers rather than blocking on driver work. Protocols beyond the core library — long-tail equipment specific to one site or one equipment class — are built **on-demand** as each site onboards.

- **Why hybrid:** purely lazy (on-demand only) makes site onboarding unpredictable and bumpy; purely proactive risks building drivers for protocols nobody actually uses. The hybrid matches the reality of a mixed equipment estate over a 3-year horizon.

- **Core library scope** is driven by the equipment-protocol inventory, not by guessing — the top protocols are whichever ones are actually most common in the estate once surveyed.

- **Implementation approach (not committed, one possible tactic):** embedded open-source protocol stacks (e.g., NModbus for Modbus, Sharp7 for Siemens S7, libplctag for EtherNet/IP) wrapped in the cluster's driver framework, rather than from-scratch protocol implementations. This reduces driver work to "write the OPC UA ↔ protocol adapter" rather than "implement the protocol." The build team may pick this or a cleaner-room approach per driver; this plan does not commit to a specific library choice. (A sketch of the adapter seam follows this block.)

- _TBD — inventory of the actual non-OPC-UA equipment protocols in the estate, which determines the core library scope; how long-tail driver requests are prioritized vs site-onboarding deadlines._

- **Not used:** Kepware, Matrikon, Aveva Communication Drivers, HiveMQ Edge, and other off-the-shelf options. Reference products may still be useful for comparison on specific capabilities (clustering patterns, security features, driver implementations) even though they are not the target implementation.

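To make the "adapter, not protocol" split concrete, here is a minimal sketch of what the driver framework's seam could look like. This is illustrative only: the interface and type names (`IEquipmentDriver`, `TagSample`) are hypothetical, not a committed design.

```csharp
// Hypothetical driver seam: a protocol driver (wrapping NModbus, Sharp7,
// libplctag, or a native OPC UA session) normalizes reads into TagSamples,
// which the cluster surfaces as nodes in the OtOpcUa equipment namespace.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record TagSample(
    string TagPath,             // canonical path within the equipment namespace
    object Value,
    DateTimeOffset Timestamp,
    bool GoodQuality);

public interface IEquipmentDriver : IAsyncDisposable
{
    // e.g. "modbus-tcp", "s7", "ethernet-ip" — keys into the core driver library.
    string Protocol { get; }

    // Connect to one piece of equipment (endpoint format is protocol-specific).
    Task ConnectAsync(Uri endpoint, CancellationToken ct);

    // Polled or subscription-based reads, normalized into one sample stream.
    IAsyncEnumerable<TagSample> SubscribeAsync(
        IReadOnlyList<string> tagPaths, CancellationToken ct);
}
```

Under a shape like this, adding a long-tail protocol is one class implementing `IEquipmentDriver`, and the OPC UA layer stays protocol-agnostic.
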
**Deployment footprint per site: co-located on existing Aveva System Platform nodes.** Same pattern as ScadaBridge today — the site-level OPC UA server cluster runs on the **same physical/virtual nodes** that host Aveva System Platform and ScadaBridge, not on dedicated hardware.

- **Rationale:** zero new hardware footprint, consistent operational model across the in-house .NET OT components (ScadaBridge and OtOpcUa — with OtOpcUa running on the same nodes LmxOpcUa already runs on today), and the driver workload at a typical site is modest compared to ScadaBridge's 225k/sec OPC UA ingestion ceiling that these nodes already handle. Co-location keeps site infrastructure simple as smaller sites onboard.

- **Cluster size:** same as ScadaBridge — **2-node clusters at most sites**, with the understanding that the **largest sites** (Warsaw West, Warsaw North) run **one cluster per production building**, matching ScadaBridge's and System Platform's existing per-building cluster pattern. No special hardware or quorum model is required beyond what ScadaBridge already uses.

- **Trade-off accepted:** System Platform, ScadaBridge, and OtOpcUa all share the same nodes' CPU, memory, and network. Resource contention is a risk — mitigated by (1) the modest driver workload relative to ScadaBridge's proven ingestion ceiling, (2) monitoring via the observability minimum signal set, and (3) the option to move off-node if contention is observed during rollout. Note: the OtOpcUa workload largely replaces what LmxOpcUa already runs on these nodes, so the *incremental* resource draw is just the new equipment-driver and clustering work, not a full new service. _TBD — measured impact of adding this workload to already-shared nodes; headroom numbers at the largest sites; whether any specific site needs to escalate to dedicated hardware._

**Authorization model: OPC UA-native — user tokens for authentication + namespace-level ACLs for authorization.** Every downstream consumer (System Platform IO, Ignition, ScadaBridge, future consumers) authenticates to the cluster using **standard OPC UA user tokens** (UserName tokens and/or X.509 client certs, per site/consumer policy), and authorization is enforced via **namespace-level ACLs** inside the cluster — each authenticated identity is scoped to the equipment/namespaces it is permitted to read/write. (One illustrative ACL shape is sketched after this block.)

- **Rationale:** OPC UA is the protocol we're fronting, so the auth model stays in OPC UA's own terms. No SASL/OAUTHBEARER bridging, no custom token-exchange glue — OtOpcUa is self-contained and operable with standard OPC UA client tooling. **Inherits the LmxOpcUa auth pattern** — UserName tokens with standard OPC UA security modes/profiles — so the consumer-side experience does not change for clients that used LmxOpcUa previously, and the fold-in is an evolution rather than a rewrite.

- **Explicitly not federated with the enterprise IdP.** Unlike Redpanda (which uses SASL/OAUTHBEARER against the enterprise IdP) and SnowBridge (which uses the same IdP for RBAC), OtOpcUa does **not** pull enterprise IdP identity into the OT data access path. OT data access is a pure OT concern, and the plan's IT/OT boundary stays at ScadaBridge central — not here.

- **Trade-off accepted:** identity lifecycle (user token/cert provisioning, rotation, revocation) is managed locally in the OT estate rather than inherited from the enterprise IdP. Two identity stores to operate (enterprise IdP for IT-facing components, OPC UA-native identities for OtOpcUa) is the cost of keeping the OPC UA layer clean and self-contained.

- _TBD — specific OPC UA security mode + profile combinations required vs allowed; where UserName credentials/certs are sourced from (local site directory, a per-site credential vault, AD/LDAP); rotation cadence; audit trail of authz decisions; whether the namespace ACL definitions live alongside driver/topology config or in their own governance surface._

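Purely illustrative, since the ACL definition format and its home are explicitly TBD above: one possible shape, assuming a per-site YAML file mapping OPC UA identities to namespace scopes.

```yaml
# Hypothetical per-site ACL definition — format, field names, and location all TBD.
site: warsaw-west
identities:
  - name: ignition-central
    token: username              # standard OPC UA UserName token
    allow:
      - namespace: equipment/*   # read-only across the whole equipment namespace
        access: read
  - name: system-platform-io
    token: x509                  # X.509 client certificate
    allow:
      - namespace: equipment/building-2/*
        access: read-write
```
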
**Rollout posture, migration path, and open questions.**

- **Driver coverage (TBD).** Which equipment protocols need to be bridged to OPC UA beyond native OPC UA equipment — this is where product-driven decisions matter most.

- **Rollout posture: build and deploy the cluster software to every site ASAP.** The cluster software (server + core driver library) is built and rolled out to **every site's System Platform nodes as fast as practical** — deployment to all sites is treated as a prerequisite for the rest of the OT plan, not a gradual per-site effort. "Deployment" here means installing and configuring the cluster software at each site so the node is ready to front equipment; it does **not** mean immediately migrating consumers (that follows the tiered cutover below). A deployed but inactive cluster is cheap; what's expensive is delaying deployment and then trying to do it site-by-site on the critical path of every other workstream.

- **Migration path: tiered consumer-by-consumer cutover, sequenced by risk.** Existing direct equipment connections are moved to the cluster **one consumer at a time**, in risk order — not equipment-by-equipment or site-by-site.

1. **ScadaBridge first.** We own both the server and the client, so redirecting ScadaBridge to consume from the new cluster is the lowest-risk cutover. It also validates the cluster under real ingestion load before Ignition or System Platform are affected.

2. **Ignition second.** Moving Ignition off direct equipment OPC UA to the site-level cluster collapses its WAN footprint from *N sessions per equipment* to *one session per site cluster*. Medium risk — Ignition is the KPI SCADA and a cutover mistake is user-visible, but Ignition has no validated-data obligations.

3. **Aveva System Platform IO last.** System Platform IO is the hardest cutover because its IO path feeds validated data collection. Moving it through the new cluster needs validation/re-qualification with compliance stakeholders, and System Platform is the most opinionated consumer about how its IO is sourced. Doing it last lets us accumulate operational confidence on the cluster from the ScadaBridge and Ignition cutovers first.

- **Acceptable double-connection windows.** During each consumer's cutover, a short window where **both the old direct connection and the new cluster connection** exist at the same time for the same equipment is **tolerated** — it temporarily aggravates the session-load problem the cluster is meant to solve, but keeping the window short (minutes to hours, not days) bounds the exposure. Longer parallel windows are only acceptable for the System Platform cutover, where compliance validation may require an extended dual-run.

- **Rollback posture.** Each consumer's cutover is reversible — if the cluster misbehaves during or immediately after a cutover, the consumer falls back to direct equipment OPC UA, and the cutover is retried after the issue is understood. The old direct-connection capability is **not removed** from consumers until all three cutover tiers are complete and stable at a site.

- _TBD — per-site cutover sequencing across the three tiers (all sites reach tier 1 before any reaches tier 2, or one site completes all three tiers before the next site starts); per-equipment-class criteria for when a System Platform IO cutover requires compliance re-validation._

- **Validated-data implication.** System Platform's validated data collection currently uses its own IO path; moving that through the new cluster may require validation/re-qualification depending on the regulated context. Confirm with compliance stakeholders before committing System Platform IO to this path.

- **Authorization model inside the cluster.** Decided above (**Authorization model** — OPC UA-native user tokens plus namespace-level ACLs, not federated with the enterprise IdP); the open remainder is tracked there: credential sourcing, rotation cadence, audit trail, and where the ACL definitions live.

- **Relationship to ScadaBridge's 225k/sec ingestion ceiling** (per `current-state.md`): the cluster's aggregate throughput must be able to feed ScadaBridge at its capacity without becoming a bottleneck — sizing needs to reflect this.

### Async Event Backbone

- **EventHub (Kafka-compatible)** — introduce an EventHub as the **async event integration layer** between shopfloor systems and enterprise consumers. Provides a Kafka-compatible API so producers/consumers can use standard Kafka tooling.

- **Purpose:** decouple producers from consumers; enable fan-out, replay, and event-driven integrations (Camstar, Snowflake, and future systems).

- **Hosting model: self-hosted Redpanda (Kafka-API-compatible).** Chosen over Apache Kafka and over managed cloud offerings (Azure Event Hubs, Confluent Cloud, AWS MSK) for: maximum control, lower per-message cost at scale, no cloud-provider coupling, and — vs Apache Kafka specifically — a single-binary operational model (no ZooKeeper/KRaft quorum to manage separately), a bundled **Schema Registry** and HTTP Proxy, and a lower node count for equivalent throughput. Trade-off accepted: the higher operational burden relative to managed offerings (cluster ops, upgrades, capacity planning, DR) is owned internally, and we commit to the Redpanda ecosystem for broker-adjacent features.

- **Deployment footprint: central cluster only (South Bend).** A single self-hosted EventHub cluster runs in the South Bend Data Center. No per-site or per-region clusters.

  - **Rationale:** keeps operational burden manageable (one cluster to run, upgrade, secure, and back up) and gives all enterprise consumers (Snowflake, Camstar, global dashboards) a single integration point. Avoids the cost and complexity of federation/mirroring across N sites.

  - **Write-path resilience via ScadaBridge store-and-forward.** Because the cluster is central, every site→EventHub write traverses the WAN. To preserve the ScadaBridge **data locality** and **WAN-outage resilience** principles, **all ScadaBridge writes to EventHub are configured as store-and-forward**. During a WAN outage, ScadaBridge queues events locally at the site and replays them to the central EventHub when connectivity returns — no data loss, no site operational impact. This leverages ScadaBridge's existing per-call store-and-forward capability (see `current-state.md`).

    - **Consequence:** site-local producers never block on the central cluster being reachable. Producers remain local; durability during outages is handled at the ScadaBridge layer, not the Kafka layer.

  - **DR posture: single-cluster HA only; disaster recovery is handled at the VM layer, outside this plan.** The Redpanda cluster is deployed as a **multi-node HA cluster inside South Bend** (rack/zone awareness, replication factor sufficient for node-level failures) and **does not** run a second Redpanda cluster, Cluster Linking, or MirrorMaker. Cross-DC recovery of South Bend as a whole is covered by the **existing enterprise VM-level DR** solution, which is **out of scope for this plan** — EventHub is just another VM workload as far as that DR story is concerned.

    - **Rationale:** avoids operating a standby Redpanda cluster (significant ongoing cost for a failure mode the enterprise already has a solution for), and inherits the DR posture already decided for every other central workload rather than inventing a bespoke one.

    - **Medium-outage resilience comes from ScadaBridge, not from a second cluster.** ScadaBridge's per-call store-and-forward (see `current-state.md`) keeps site writes durable through any outage shorter than its local queue capacity — that's the mechanism for ride-through, not a standby Redpanda.

    - _TBD — long-outage planning: at what outage duration does ScadaBridge local queue capacity become the binding constraint, and what's the operational response then (grow local disk, prioritize topics, drop operational-tier events)? Should be modeled once the Redpanda write-volume numbers are firm._

  - _TBD — read-path implications: any site-local consumers (e.g., site KPI processors, alarm pipelines) that need to react to events would need to traverse the WAN to the central cluster. Confirm whether all planned consumers tolerate this, or whether specific high-criticality local consumers need a different pattern (e.g., consume directly from ScadaBridge rather than EventHub)._

  - _TBD — sizing the central cluster for the aggregate write volume of every site plus full-site backlog replay after a WAN outage._

- **Topic design — site identity lives in the message, not the topic name.** Topics are named by **domain and event type only**, and every event carries a **`site` field in its headers/payload**. Consumers that want a single site's data filter in code (see the sketch after this block) or via consumer-side stream processing.

  - **Rationale:** keeps topic count bounded as new sites onboard — adding Berlin, Winterthur, Jacksonville, etc. does not create a new topic set per site. Consumers that span all sites (Snowflake ingestion, global dashboards, enterprise Camstar integration) subscribe once and see the whole fleet; consumers that need a single site filter on the header.

  - **Trade-off:** per-site ACLs and quotas are harder — Kafka ACLs are topic-scoped, so site-level authorization has to be enforced by producers (ScadaBridge) or by a stream-processing layer, not by the broker.

  - _TBD — whether the `site` identifier goes in Kafka **record headers** (cheaper to filter, not part of payload) or in the **payload schema** (forces schema discipline, survives header-unaware consumers). Likely both: header for routing/filtering, payload for durability._

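A minimal sketch of the consumer-side filter, assuming Confluent.Kafka from .NET; the broker address, group id, and the `warsaw-west` site id are placeholders.

```csharp
// Fleet-wide topic, single-site consumer: filter on the `site` record header.
using System.Text;
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "eventhub.southbend.example:9092",
    GroupId = "kpi-processor-warsaw-west",
    AutoOffsetReset = AutoOffsetReset.Earliest,
};

using var consumer = new ConsumerBuilder<string, byte[]>(config).Build();
consumer.Subscribe("equipment.tag.value-changed");

while (true)
{
    var result = consumer.Consume();

    // Keep only this site's events; everything else is skipped cheaply,
    // without deserializing the Protobuf payload.
    if (result.Message.Headers.TryGetLastBytes("site", out var site) &&
        Encoding.UTF8.GetString(site) == "warsaw-west")
    {
        // handle the event
    }
}
```
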
- **Topic naming convention: `{domain}.{entity}.{event-type}`.** Hierarchical, lowercase, dot-separated. Examples:

  - `mes.workorder.started`
  - `mes.workorder.completed`
  - `equipment.tag.value-changed`
  - `equipment.tag.quality-changed`
  - `scada.alarm.raised`
  - `scada.alarm.acknowledged`
  - `quality.inspection.completed`

  - **Rules of thumb:**

    - `domain` is the bounded context (mes, scada, equipment, quality, maintenance, …).
    - `entity` is the thing the event is about (workorder, tag, alarm, inspection, …).
    - `event-type` is what happened, in past tense (started, completed, raised, acknowledged, …).
    - Use **hyphens within a segment** (`value-changed`), **dots between segments** (`equipment.tag.value-changed`).
    - Keep all segments lowercase; no site names, no environment names (dev/prod is handled by cluster, not topic).

  - _TBD — authoritative list of `domain` values and who owns each; governance process for adding a new domain or event type._

- **Schema registry: Redpanda Schema Registry (bundled).** Reuse the schema registry bundled with Redpanda — Confluent-API-compatible, so the **Snowflake Kafka Connector** and any Confluent-compatible client libraries work unmodified. Avoids running a separate Apicurio/Confluent registry service.

- **Schema format: Protobuf.** All EventHub payloads are encoded as **Protobuf** and registered in the schema registry.

  - **Rationale:** first-class .NET tooling (matters because ScadaBridge is Akka.NET), a compact binary wire format, and `.proto` files are reusable outside Kafka — the same schemas can be shared with Web API consumers, internal services, and any downstream system that wants a canonical definition of a shopfloor event.

  - **Trade-off:** Protobuf's schema-evolution rules are looser than Avro's, so we compensate with a strict **compatibility policy** in the registry (see below) and discipline around field numbering and `reserved` markers, both illustrated in the sketch that follows.

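A sketch of what one canonical event definition could look like; field names and numbers are illustrative, not committed, and the `site` field follows the topic-design decision above.

```protobuf
// Illustrative message for the mes.workorder.started topic.
syntax = "proto3";

package mes.workorder;

message WorkorderStarted {
  reserved 4;                     // example of reserving a retired field number
  string site = 1;                // site identity lives in the message, not the topic
  string workorder_id = 2;
  string equipment_id = 3;
  int64 started_at_unix_ms = 5;   // field 4 stays reserved forever
}
```
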
- **Compatibility policy: `BACKWARD_TRANSITIVE`.** New schema versions must be readable by consumers compiled against **any** previous version, not just the immediately preceding one. Chosen because: (1) it matches the natural producer-first rollout order (producers upgrade, consumers follow), (2) it is what the Snowflake Kafka Connector expects, and (3) the *transitive* variant protects against long-tail consumers that haven't upgraded in a while — a real risk given the per-site, per-ScadaBridge-cluster producer footprint. Trade-off: disallows some schema changes (e.g., removing a field still used by an older consumer); `buf breaking` in CI catches these at PR time. The policy is set per subject via the registry's Confluent-compatible config API; a sketch follows.

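A sketch of pinning the policy on one subject through the Confluent-compatible config endpoint that Redpanda's bundled registry exposes; the host and port are placeholders.

```bash
# Set BACKWARD_TRANSITIVE compatibility for the mes.workorder.started value subject.
curl -X PUT \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD_TRANSITIVE"}' \
  http://schema-registry.example:8081/config/mes.workorder.started-value
```
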
- **Subject naming strategy: `TopicNameStrategy`.** Each topic has exactly one message type, and registry subjects follow `{topic}-value` (and `{topic}-key` if keys are schematized) — e.g., `mes.workorder.started-value`. One topic, one schema, one subject. Chosen because: (1) it matches the "one topic = one event type" convention already adopted for topic naming, (2) it's the Snowflake Kafka Connector's default, so the Historian→Snowflake path works with no extra connector config, and (3) per-topic compatibility is easy to reason about (each subject evolves independently). Trade-off: topics cannot carry mixed record types — if we ever need an "event envelope" pattern, that topic will need a dedicated wrapper message rather than multiple concrete records on the same topic.

- **Retention policy: per-topic tiered, classified by purpose.** Retention is not a single cluster-wide default; every topic is assigned a **retention tier** as part of its onboarding metadata.

  - **`operational` tier — 7 days.** Short-lived operational events (e.g., `scada.alarm.raised`, transient state changes) whose value decays rapidly. Enough to cover consumer restarts, deploys, and long weekends, but not so long that storage and replay costs balloon.

  - **`analytics` tier — 30 days.** Streams consumed by Snowflake ingestion, KPI processors, and other analytical consumers. 30 days gives enough room for backfill, connector re-bootstrapping, and monthly reprocessing windows without turning Redpanda into a long-term store.

  - **`compliance` tier — 90 days.** Events tied to validated/regulated flows where retention is driven by compliance requirements rather than operational need. Explicitly not the compliance system of record — Aveva Historian remains that — but 90 days gives a working window for investigations and audit support.

  - **Rationale:** retention is cheap to vary per topic, and these event classes have genuinely different needs. A cluster-wide default would either under-serve analytics or over-pay for operational topics.

  - **Enforcement:** the tier is set when the topic is created (via infra-as-code for topic definitions; a sketch follows this block), not edited ad hoc. Classification lives next to the schema in the central `schemas` repo so retention moves with the topic's definition.

  - _TBD — exact topic-classification criteria; exception process for topics that don't fit cleanly; tiered-storage offload for the analytics/compliance tiers (Redpanda Tiered Storage → S3/ADLS) to keep hot broker storage small; whether compliance-tier topics need separate ACLs or encryption._

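A sketch of per-tier retention expressed at topic creation, assuming `rpk` (Redpanda's CLI) as the infra-as-code driver; partition and replica counts are placeholders, not sizing commitments.

```bash
# operational tier — 7 days
rpk topic create scada.alarm.raised -p 12 -r 3 -c retention.ms=604800000

# analytics tier — 30 days
rpk topic create equipment.tag.value-changed -p 24 -r 3 -c retention.ms=2592000000

# compliance tier — 90 days
rpk topic create quality.inspection.completed -p 12 -r 3 -c retention.ms=7776000000
```
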
- **Security / auth model: SASL/OAUTHBEARER + prefix-based ACLs.**

- **Authentication:** producers and consumers authenticate to Redpanda with **SASL/OAUTHBEARER** tokens issued by the enterprise identity provider. Every ScadaBridge cluster, every connector (Snowflake Kafka Connector), and every downstream service acquires a token via the OAuth **client-credentials** flow against the IdP, then presents it to Redpanda. Human operators reuse the same IdP via existing SSO. (A client-configuration sketch follows this block.)

  - **Rationale:** reuses enterprise SSO/IdP identity for machine workloads, avoids managing per-client X.509 certificates or long-lived passwords, and gives the security team one place to revoke, rotate, and audit access. Aligns with the broader enterprise integration direction.

  - **WAN-outage implication:** OAuth requires the IdP to be reachable for token issuance/refresh. When the WAN is down, a site's ScadaBridge cannot refresh its token. This is **acceptable** because the site's write path uses ScadaBridge **store-and-forward** (see EventHub Deployment footprint above) — queued events simply wait for both the IdP and Redpanda to be reachable before replaying. Local site operations are unaffected because ScadaBridge's producers don't block on EventHub availability.

  - _TBD — which IdP (Azure AD/Entra ID is the obvious candidate but not confirmed); token lifetime and refresh strategy; how ScadaBridge securely stores its client-credentials secret at each site; whether there's a local IdP fallback (short-lived cached tokens, emergency break-glass credentials) for extended multi-day WAN outages._

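A minimal sketch of the producer-side configuration, assuming Confluent.Kafka's OIDC support for SASL/OAUTHBEARER (librdkafka ≥ 1.9); the endpoint, client id, and secret handling are placeholders pending the IdP decision above.

```csharp
// Machine workload authenticating via the OAuth client-credentials flow.
using Confluent.Kafka;

var config = new ProducerConfig
{
    BootstrapServers = "eventhub.southbend.example:9092",
    SecurityProtocol = SecurityProtocol.SaslSsl,
    SaslMechanism = SaslMechanism.OAuthBearer,
    SaslOauthbearerMethod = SaslOauthbearerMethod.Oidc,
    SaslOauthbearerTokenEndpointUrl = "https://idp.example.com/oauth2/token",
    SaslOauthbearerClientId = "scadabridge-warsaw-west",
    SaslOauthbearerClientSecret = "<sourced from the site secret store - TBD>",
};

using var producer = new ProducerBuilder<string, byte[]>(config).Build();
// Produce canonical Protobuf events as usual; token refresh is handled by the client.
```
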
- **Authorization: prefix-based ACLs.** Redpanda ACLs are granted on **topic prefixes** that follow the `{domain}.{entity}.{event-type}` naming convention, so each principal gets a compact, legible rule set instead of per-topic grants. (A grant sketch follows this block.)

  - **Principal shape:** one principal per ScadaBridge cluster (per site), one per connector/consumer service, one per human role.

  - **Grant pattern:** write access is scoped to the domains a site owns (e.g., a site's ScadaBridge cluster gets `PRODUCE` on `equipment.*` and `scada.*`), while read access is scoped to the domains a consumer cares about (e.g., Snowflake ingestion gets `CONSUME` on `equipment.*`, `scada.*`, `mes.*`, and `quality.*`).

  - **Rationale:** the topic naming convention already carries ownership and domain information; prefix ACLs let that structure drive authorization without per-topic bookkeeping. Adding a new topic inside a domain automatically inherits the domain's grants.

  - _TBD — authoritative mapping of principals to prefix grants; whether ScadaBridge cluster principals should be scoped by site (producing only its own site's events). Note that site filtering is enforced at the producer today — site identity lives in the message, not the topic name, so Redpanda ACLs cannot enforce "only Warsaw West produces Warsaw West events"; that invariant is a ScadaBridge-side responsibility and should be called out explicitly in the ScadaBridge configuration contract._

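A sketch of one prefix grant, assuming `rpk`'s ACL commands; the principal and prefix names are illustrative.

```bash
# A site's ScadaBridge cluster may produce to the domains it owns.
rpk acl create --allow-principal 'User:scadabridge-warsaw-west' \
  --operation write --topic 'equipment.' --resource-pattern-type prefixed

# The Snowflake ingestion principal reads a domain prefix
# (repeat per domain: scada., mes., quality., ...).
rpk acl create --allow-principal 'User:snowflake-connector' \
  --operation read --topic 'equipment.' --resource-pattern-type prefixed
```
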
- **Schema source of truth: single central `schemas` repo.** All `.proto` files live in one central repository, not per-domain repos and not co-located with producing services.

  - **Structure:** organized by `domain/entity` inside the repo, mirroring the topic naming convention (`mes/workorder.proto`, `equipment/tag.proto`, `scada/alarm.proto`, etc.; a layout sketch follows). **CODEOWNERS** enforces per-domain ownership inside the single repo — reviews still go to the right team, but the style, layout, and compatibility policy are enforced uniformly.

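An illustrative layout; file names beyond those mentioned above are hypothetical.

```text
schemas/
├── buf.yaml              # lint + breaking-change configuration (see Schema tooling)
├── CODEOWNERS            # per-domain review routing
├── mes/
│   └── workorder.proto
├── equipment/
│   ├── tag.proto
│   └── state.proto
└── scada/
    └── alarm.proto
```
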
- **Publishing pipeline:** on every merge to main, the repo builds and publishes:

  - A **NuGet package** for .NET consumers (ScadaBridge and any .NET-based Web API clients).
  - Additional language packages (Java/Maven, Python/PyPI, etc.) **added only when a concrete consumer needs them** — not up front.
  - Compiled schemas are also **registered with the Redpanda Schema Registry** as part of the same pipeline, so registry state and source state never drift.

- **Rationale:** one place to enforce the compatibility policy, one chokepoint for schema review, one versioned artifact for every consumer. Prevents drift between copies, avoids duplicate message definitions across domains, and gives enterprise consumers (Snowflake ingestion, Camstar integration) exactly one dependency to track.

- **Trade-off accepted:** the central repo is a review chokepoint. Mitigated by CODEOWNERS routing and by keeping the compatibility policy mechanical (tooling enforces it — see TBD) rather than relying on a gatekeeper team.

- **Schema tooling: `buf` (buf.build).** The central `schemas` repo standardizes on `buf` for lint, breaking-change detection, and code generation. (A configuration sketch follows this block.)

  - **`buf lint`** enforces style in CI — field naming, package layout, `reserved` discipline — so schema style doesn't drift across domains.

  - **`buf breaking`** runs in CI on every PR, checking the proposed changes against the previously published version of the package. Breaking changes fail the PR **before** merge — compatibility is enforced at PR time, not at registry-publish time.

  - **`buf generate`** drives code generation for the NuGet package (and any other language packages added on demand), replacing ad-hoc `protoc` invocations.

  - **Rationale:** PR-time feedback is dramatically better than publish-time errors; `buf` is the de-facto modern Protobuf toolchain with strong .NET/C# support; and having lint, breaking-change checks, and codegen in one tool keeps the schema repo's CI simple.

  - _TBD — `buf.yaml` lint/breaking rule set (likely default + a few project-specific additions); versioning scheme on the NuGet package (semver, how breaking changes are signaled); release cadence (per-merge vs batched); whether to use `buf push` to a BSR or keep the NuGet package as the only distribution artifact._

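A minimal sketch of the repo-root `buf.yaml`, assuming buf's v1 config layout with the default rule sets; the exact rule additions are the TBD above.

```yaml
# schemas/buf.yaml — lint and breaking-change policy enforced in CI.
version: v1
lint:
  use:
    - DEFAULT    # naming, package layout, enum zero-value conventions
breaking:
  use:
    - FILE       # strictest built-in category; checked against the last published version
```
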
- **Usage patterns:**

  - **Async event notifications** — shopfloor events (state changes, alarms, lifecycle events, etc.) published to EventHub for any interested consumer to subscribe to, without producers needing to know who's listening.

  - **Async processing for KPI** — KPI calculations (currently handled on Ignition SCADA) can consume event streams from EventHub, enabling decoupled, replayable KPI pipelines instead of tightly coupled point queries.

  - **System integrations** — other enterprise systems (Camstar, Snowflake, future consumers) integrate by subscribing to EventHub topics rather than opening point-to-point connections into OT.

  - **Historical replay for integration testing and simulation-lite.** The `analytics`-tier retention (30 days) is explicitly also a **replay surface** for testing and simulation-lite: downstream consumers (ScadaBridge scripts, KPI pipelines, dbt models, a future digital twin layer) can be exercised against real historical event streams instead of synthetic data. This is the minimal answer to **Digital Twin Use Case 2 (Virtual Testing / Simulation)** — see **Strategic Considerations → Digital twin** → use case 2 — and does not require any new component. When longer horizons are needed, extend to the `compliance` tier (90 days). Replay windows beyond 90 days are served by the dbt curated layer in Snowflake, not by Redpanda.

- _Remaining open items are tracked inline in the subsections above — sizing, read-path implications, long-outage planning, IdP selection, schema subject/versioning details, etc. Support staffing and on-call ownership are out of scope for this plan._

#### Canonical Equipment, Production, and Event Model

The plan already delivers the infrastructure for a cross-system canonical model — OtOpcUa's equipment namespace, Redpanda's `{domain}.{entity}.{event-type}` topic taxonomy, Protobuf schemas in the central `schemas` repo, and the dbt curated layer in Snowflake. What it had not, until now, explicitly committed to is **declaring** that these pieces together constitute the enterprise's canonical equipment / production / event model, and that consumers are entitled to treat them as an integration interface.

This subsection makes that declaration. It is the plan's answer to **Digital Twin Use Cases 1 and 3** (see **Strategic Considerations → Digital twin**) and — independent of digital twin framing — is load-bearing for pillar 2 (analytics/AI enablement), because a canonical model is what makes "not possible before" cross-domain analytics possible at all.

> **Unified Namespace framing:** this canonical model is also the plan's **Unified Namespace** (UNS) — see **Target IT/OT Integration → Unified Namespace (UNS) posture**. The UNS posture is a higher-level framing of the same mechanics described here: this section specifies the canonical model mechanically; the UNS posture explains what stakeholders asking about UNS should understand about how the plan delivers the UNS value proposition without an MQTT/Sparkplug broker.

##### The three surfaces

The canonical model is exposed on three surfaces, one per layer:

| Layer | Surface | What it canonicalizes |
|---|---|---|
| Layer 2 — Equipment | **OtOpcUa equipment namespace** | Canonical per-equipment OPC UA node structure. A consumer reading tag `X` from equipment `Y` at any site gets the same node path, the same data type, and the same unit. Equipment-class templates (e.g., "3-axis CNC," "injection molding cell") live here and are referenced from the Redpanda and dbt surfaces. |
| Layer 4 → IT — Events | **Redpanda topics + Protobuf schemas** (`schemas` repo) | Canonical event shape. Every shopfloor event — `equipment.tag.value-changed`, `equipment.state.transitioned`, `mes.workorder.started`, `scada.alarm.raised`, etc. — has exactly one Protobuf message type, registered once, consumed everywhere. **This is where the canonical model is source-of-truth.** |
| IT — Analytics | **dbt curated layer in Snowflake** | Canonical analytics model. Curated views expose equipment, production runs, events, and aggregates with the same vocabulary, dimensions, and state values as the OtOpcUa and Redpanda surfaces. Downstream reporting (Power BI, ad-hoc SQL) and AI/ML consume from here. |

**Single source of truth: the `schemas` repo.** The three surfaces reference a shared canonical definition — they do not each carry their own. Specifically:

- **Protobuf message definitions** in the `schemas` repo define the wire format for every canonical event.

- **Shared enum types** in the `schemas` repo define the canonical **machine state vocabulary** (see below), canonical event-type values, and any other closed sets of values.

- **Equipment-class definitions** in the `schemas` repo (format TBD — could be a Protobuf message set, could be a YAML document set referenced from Protobuf) describe the canonical node layout that OtOpcUa templates instantiate and that dbt curated views flatten into fact/dim tables.

Consumers that need to know "what does a `Faulted` state mean" or "what are all the event types in the `mes` domain" look at the `schemas` repo. Any divergence between a surface and the `schemas` repo is a defect in the surface, not in the schema.

##### Canonical machine state vocabulary

The plan commits to a **single authoritative set of machine state values** used consistently across layer-3 state derivations, Redpanda event payloads, and dbt curated views. This is the answer to Digital Twin Use Case 1.

Starting set (subject to refinement during implementation, but the names and semantics below are committed as the baseline):

| State | Semantics |
|---|---|
| `Running` | Equipment is actively producing at or near theoretical cycle time. |
| `Idle` | Equipment is powered and available but not producing — no work in progress, no fault, nothing blocking. |
| `Faulted` | Equipment has raised a fault that requires acknowledgement or intervention before it can return to `Running`. |
| `Starved` | Equipment is ready to run but is blocked by missing upstream input (material, preceding operation). |
| `Blocked` | Equipment is ready to run but is blocked by a downstream constraint (full buffer, unavailable downstream operation). |

**Rules of the vocabulary:**

- **One state at a time.** An equipment instance is in exactly one of these states at any moment. Multi-dimensional status (e.g., alarm severity, operator mode) is carried in **additional fields** on the state event, not by overloading the state value.

- **Derivation lives at layer 3.** Deriving "true state" from raw signals (interlocks, status bits, PLC words, alarm registers) is a **Layer 3** responsibility — Aveva System Platform for validated derivations, Ignition for KPI-facing derivations. The dbt curated layer consumes the already-derived state; it does not re-derive.

- **Events carry state transitions, not state polls.** Redpanda publishes a canonical `equipment.state.transitioned` event every time an equipment instance changes state, with the previous state, the new state, a reason code when available, and the underlying derivation inputs referenced by ID where possible. Current state is reconstructable from the transition stream.

- **State values are an enum in the `schemas` repo.** Adding a state value is a schema change reviewed through the normal `schemas` repo governance (CODEOWNERS, `buf` CI, compatibility checks). Removing a state value is effectively impossible without a long-tail consumer migration — treat the starting set as durable. (A sketch of the enum and transition event follows this list.)

- **Top-fault derivation.** When `Faulted`, the canonical event carries a `top_fault` field (single fault code or string, per the `schemas` repo enum) rather than exposing the full alarm vector. The derivation of "top" from the underlying alarm set lives at layer 3 and is documented per-equipment-class in the `schemas` repo alongside the equipment-class definition.

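A sketch of how the vocabulary and the transition event could look in the `schemas` repo; field names and numbers are illustrative, and the committed content is only the five state names and their semantics.

```protobuf
syntax = "proto3";

package equipment.state;

// The canonical machine state vocabulary — one value at a time per equipment.
enum EquipmentState {
  EQUIPMENT_STATE_UNSPECIFIED = 0;
  EQUIPMENT_STATE_RUNNING = 1;
  EQUIPMENT_STATE_IDLE = 2;
  EQUIPMENT_STATE_FAULTED = 3;
  EQUIPMENT_STATE_STARVED = 4;
  EQUIPMENT_STATE_BLOCKED = 5;
}

// Published on equipment.state.transitioned for every state change.
message StateTransitioned {
  string site = 1;
  string equipment_id = 2;
  EquipmentState previous_state = 3;
  EquipmentState new_state = 4;
  string reason_code = 5;                    // when the layer-3 derivation provides one
  string top_fault = 6;                      // populated only when new_state is FAULTED
  repeated string derivation_input_ids = 7;  // references, not raw signal values
  int64 transitioned_at_unix_ms = 8;
}
```
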
**Additions under active consideration (TBD — resolve during Year 1 implementation):**

- `Changeover` — distinct from `Idle` because it is planned downtime with a known duration and setup workflow. Likely needed for OEE accuracy.

- `Maintenance` — planned or unplanned maintenance state, again distinct from `Idle` for OEE accounting.

- `Setup` / `WarmingUp` — start-of-shift or start-of-run conditioning where equipment is powered but not yet eligible to run.

These are strong candidates but not committed in the starting set; the implementation team closes them with domain SMEs once the first equipment classes are modeled.

##### Relationship to OEE and KPI

The canonical state vocabulary directly enables accurate OEE computation in the dbt curated layer without each consumer having to re-derive availability / performance / quality from scratch. This is one of the most immediate answers to pillar 2's "not possible before" use case criterion: cross-equipment, cross-site OEE computed once in dbt from a canonical state stream is meaningfully harder today because the state-derivation logic is fragmented across System Platform and Ignition scripts. Once the canonical state vocabulary is in place, OEE becomes a dbt model, not a bespoke script per site; a sketch follows.

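A sketch of the availability leg of OEE over the canonical transition stream. Model and column names are illustrative, and `stg_state_transitions` is an assumed staging model over the landed `equipment.state.transitioned` events; the SQL uses Snowflake functions per the plan's warehouse choice.

```sql
-- models/marts/equipment_availability.sql (illustrative dbt model)
with intervals as (
    select
        site,
        equipment_id,
        new_state,
        transitioned_at,
        lead(transitioned_at) over (
            partition by site, equipment_id
            order by transitioned_at
        ) as next_transition_at
    from {{ ref('stg_state_transitions') }}
)

select
    site,
    equipment_id,
    date_trunc('day', transitioned_at) as day,
    sum(iff(new_state = 'Running',
            datediff('second', transitioned_at, next_transition_at), 0))
      / nullif(sum(datediff('second', transitioned_at, next_transition_at)), 0)
      as availability
from intervals
where next_transition_at is not null
group by 1, 2, 3
```
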
##### Not in scope for this subsection

- **UI / visualization of the canonical model.** The model is a data contract, not a product. Dashboards, HMIs, and reporting surfaces (Ignition KPI, Power BI, any future digital-twin dashboard) all consume the canonical model — but building those surfaces is **not** what the canonical model commits to.

- **Real-time state event frequency guarantees.** Latency and delivery semantics for `equipment.state.transitioned` events are governed by the general Redpanda latency profile and the `analytics`-tier retention (30 days); this subsection does not add a per-event SLO beyond what pillar 2's `≤15-minute analytics` commitment already provides.

- **Predictive state modeling.** Forecasting whether an equipment instance is "about to fault" is a pillar-2 AI/ML use case on top of the canonical stream, not a canonical-model deliverable. The canonical model just has to be clean enough to train on.

_TBD — exact `schemas` repo layout for equipment-class definitions (Protobuf vs YAML vs both); ownership of the canonical state vocabulary (likely a domain SME group rather than the ScadaBridge team); pilot equipment class for the first canonical definition; reconciliation process if System Platform and Ignition derivations of the same equipment's state diverge (they should not, but the canonical surface needs a tiebreak rule)._

## Observability

The plan commits to **what must be observable**, not to **which tool** emits/stores/visualizes the signals. Each new component must expose its signals to whatever enterprise observability stack exists (reuse, don't reinvent). Tool selection is out of scope for this plan — the same posture as VM-level DR and orchestrator selection.

**Minimum signal set — non-negotiable requirements on the goal-state components.**

- **Redpanda (central EventHub cluster).**

  - Per-topic throughput (messages/sec, bytes/sec).
  - Producer and consumer **lag** per consumer group.
  - Broker node health and partition distribution.
  - **ACL deny counts** (authorization failures).
  - **OAuth token acquisition failures** (ties to the SASL/OAUTHBEARER auth model).
  - Disk utilization against retention policy per tier (`operational`, `analytics`, `compliance`).

- **ScadaBridge (existing component, observability obligations added by this plan).**

  - Per-site store-and-forward queue depth and oldest-unsent age (critical for detecting extended WAN outages against ScadaBridge disk capacity).
  - EventHub publish success/failure rate per site.
  - Outbound Web API success/failure rate per destination.
  - Deadband filter activity (how many samples suppressed vs published) per tag class, to tune the global deadband value.

- **SnowBridge.**

  - Per-source ingest rate (events/sec from Redpanda, rows/sec from Historian, etc.).
  - Per-source error rate and last-successful-read timestamp.
  - Snowflake write latency and failure rate per target table.
  - **Selection-change audit events** — every approved change is observable as an event, not just a DB row (so alerting on unusual change patterns is possible).
  - End-to-end latency (source emit → Snowflake queryable), measured against the **≤15-minute analytics** and **≤60-minute compliance** SLOs.

- **dbt (on the self-hosted orchestrator).**

  - Per-model run duration, success/failure, and last-successful-run timestamp.
  - **Source freshness** — how stale the landing-table sources are that dbt reads from.
  - Failed test count (not just failed model count — `dbt test` results are a first-class signal).
  - Queued/stalled job counts on the orchestrator.

**Alerting floor (what must page someone, whatever tool is chosen).**

- Any component above breaching its SLO for sustained periods (the definition of "sustained" is a per-signal TBD).
- ScadaBridge store-and-forward queue approaching disk capacity at any site.
- Consumer lag on any Snowflake-bound topic exceeding the analytics SLO budget.
- Any selection change that bypasses the approval workflow (defense-in-depth against the UI being compromised).
- Any OAuth token acquisition failure affecting production producers.

_TBD — "sustained" durations per signal, and concrete alert routing (on-call rotation, ticketing, etc.) — both outside this plan's scope per the "don't cover support staffing" decision, but noted as a seam for operations teams to wire up._

## Target Operator / User Experience

> UX modernization is **not a primary goal** of this plan (see **Vision** and **Non-Goals**). This section captures the **long-term UX split** so that any incidental UX work done during migrations lands in the right place — not so that UX becomes a funded workstream.

The long-term UX split **mirrors the SCADA data split** already in place:

- **Ignition SCADA → KPI UX.** All KPI-facing operator and supervisor UX lives on Ignition. Dashboards, OEE views, production status boards, and any other KPI visualizations are built and maintained in Ignition.

- **Aveva System Platform HMI → Validated data UX.** Any operator UX that interacts with **validated/regulated data** (compliance-grade data collection, GxP-relevant workflows, etc.) lives on System Platform HMI. Validation controls, audit trails, and regulated operator actions stay on System Platform.

**Implications for the 3-year plan:**

- Don't introduce a third UX platform. New operator UX should be placed on Ignition or System Platform HMI based on whether it's KPI-facing or validated-data-facing.

- Migrations that touch UX should route it to the correct side of the split (e.g., a legacy KPI screen living outside Ignition should move to Ignition; a validated workflow outside System Platform HMI should move to System Platform).

- No commitment is made here to rebuild existing UX that already runs acceptably — we're setting the target shape, not scheduling a UX refresh.

_TBD — criteria for when a screen counts as "KPI" vs "validated"; handling of edge cases that need data from both worlds._

## Success Criteria / KPIs

Success is measured against the three in-scope pillars from the **Vision**. Each pillar has one concrete, measurable criterion — vague criteria produce plans that quietly expand, so "done" is unambiguous.

1. **Unification — 100% of sites integrated under the standardized model.**

   - Every site is onboarded through the standard stack: **ScadaBridge** at the edge, **Redpanda EventHub** as the async backbone, the **SnowBridge** for Snowflake-bound flows, and the **long-term UX split** (Ignition for KPI, System Platform HMI for validated).
   - In-scope sites: South Bend, Warsaw West, Warsaw North, Shannon, Galway, TMT, Ponce, **and all currently unintegrated smaller sites** (Berlin, Winterthur, Jacksonville, and the rest of the list once firm).
   - Measurement: a site counts as "integrated" when its equipment produces events into Redpanda via a local ScadaBridge cluster (or a legitimate exception is documented), it is represented in the standard topic taxonomy, and its legacy point-to-point integrations are retired (pillar 3).

2. **Analytics / AI enablement — machine data queryable in Snowflake at the committed SLOs.**

   - **Aveva Historian** machine data (and event-driven data from ScadaBridge) is queryable in **Snowflake** at the committed latencies: **analytics ≤ 15 minutes end-to-end**, **compliance ≤ 60 minutes end-to-end**.
   - A defined set of **priority tags** (list TBD as part of onboarding) is flowing end-to-end through the SnowBridge, landing in Snowflake, and transformed into **dbt curated layers**.
   - At least **one production enterprise analytics or AI use case that was not possible before this pipeline existed** is consuming the curated layer in production. The test is enablement, not throughput: the use case must depend on data, freshness, or cross-site reach that the pre-plan stack could not deliver. Re-platforming an existing report onto Snowflake does not count; a new AI/ML model trained on cross-site machine data, a net-new cross-site OEE view, or an alert that depends on the ≤15-minute SLO would count.

3. **Legacy middleware retirement — zero remaining legacy point-to-point integrations.**

   - The inventory of "legacy API integrations not yet migrated to ScadaBridge" (captured in `current-state.md`) goes to **zero** by end of plan.
   - Every cross-domain IT↔OT path runs through **ScadaBridge** (or, for the Snowflake path, through ScadaBridge → Redpanda → the SnowBridge).
   - Documented exceptions are allowed only with explicit approval and a retirement date; "temporary" carve-outs that outlast the plan count as a failure of this criterion.

**Operating principle.** A pillar is not "partially done." Each criterion is binary at the end of the 3 years — either every site is on the standardized model or not, either the analytics pipeline is in production or not, either the legacy inventory is zero or not. Intermediate progress is tracked per-site / per-tag / per-integration in the plan's working documents, not by softening the end-state criteria.

_TBD — named owners for each pillar's criterion; quarterly progress metrics (e.g., sites integrated/quarter, tags onboarded/quarter, legacy integrations retired/quarter) that roll up into the three binary end-state checks; the priority-tag list for pillar 2; the authoritative legacy-integration inventory for pillar 3._

## Strategic Considerations (Adjacent Asks)

External strategic asks that are **not** part of this plan's three pillars but that the plan should be *shaped to serve* when they materialize. None of these commit the plan to deliver anything — they are constraints on how components are built so that future adjacent initiatives can consume them.

### Digital twin (management ask — use cases received 2026-04-15)

**Status: management has delivered the requirements; the plan absorbs two of the three use cases and treats the third as exploratory.** The plan does not add a new "digital twin workstream" to `roadmap.md`, and no pillar criterion depends on a digital twin deliverable. What the plan does is **commit to the pieces** that management's three use cases actually require, as additions to existing components rather than as a parallel initiative. See [`goal-state/digital-twin-management-brief.md`](goal-state/digital-twin-management-brief.md) → "Outcome" for the meeting resolution.

#### Management-provided use cases

These are the **only requirements** management can provide — high-level framing, no product selection, no sponsor, no timeline beyond "directionally, this is what we want." Captured here verbatim in intent; the source document lives at [`../digital_twin_usecases.md.txt`](../digital_twin_usecases.md.txt) in its original form.

1. **Standardized Equipment State / Metadata Model.** A consistent, high-level representation of machine state derived from raw signals: Running / Idle / Faulted / Starved / Blocked. Normalized across equipment types. Single authoritative machine state, derived from multiple interlocks and status bits. Actual-vs-theoretical cycle time. Top-fault instead of dozens of raw alarms. Value: single consistent view of equipment behavior, reduced downstream complexity, improved KPI accuracy (OEE, downtime).

2. **Virtual Testing / Simulation (FAT, Integration, Validation).** A digital representation of equipment that emulates signals, states, and sequences, so automation logic / workflows / integrations can be tested without physical machines. Replay of historical scenarios, synthetic scenarios, edge-case coverage. Value: earlier testing, reduced commissioning time and risk, improved deployed-system stability.

3. **Cross-System Data Normalization / Canonical Model.** A common semantic layer between systems: standardized data structures for equipment, production, and events. Translates system-specific formats into a unified model. Consistent interface for all consumers. Uniform event definitions (`machine fault`, `job complete`). Value: simplified integration, reduced duplication of transformation logic, improved consistency across the enterprise.

Management's own framing of the combined outcome: "a translator (raw signals → meaningful state), a simulator (test without physical dependency), and a standard interface (consistent data across systems)."

#### Plan mapping — what each use case costs this plan

| # | Use case | Maps to existing plan components | Delta this plan commits to |
|---|---|---|---|
| 1 | Standardized equipment state model | Layer 3 (Aveva System Platform + Ignition state derivation) for real-time; dbt curated layer for historical; Redpanda event schemas for event-level state transitions | **Canonical machine state vocabulary.** Adopt `Running / Idle / Faulted / Starved / Blocked` (plus any additions agreed during implementation) as the **authoritative state set** across layer-3 derivations, Redpanda event payloads, and dbt curated views. No new component — the commitment is that every surface uses the same state values, and the vocabulary is published in the central `schemas` repo. See **Async Event Backbone → Canonical Equipment, Production, and Event Model.** |
| 2 | Virtual testing / simulation | Not served today by the plan, and not going to be served by a full simulation stack. | **Simulation-lite via replay.** Redpanda's analytics-tier retention (30 days) already enables historical event replay to exercise downstream consumers. OtOpcUa's namespace architecture can in principle host a future "simulated" namespace that replays historical equipment data to exercise tier-1 and tier-2 consumers — architecturally supported, not committed for build in this plan. **Full commissioning-grade simulation stays out of scope** pending a separate funded initiative. |
| 3 | Cross-system canonical model | OtOpcUa equipment namespace (canonical OPC UA surface); Redpanda topic taxonomy (`{domain}.{entity}.{event-type}`) + Protobuf schemas; dbt curated layer (canonical analytics model) — all three already committed. | **Canonical model declaration.** The plan already builds the pieces; what it did not do is **declare** that these pieces together constitute a canonical equipment/production/event model that consumers are entitled to use as an integration interface. This declaration lives in the central `schemas` repo as first-class content and is referenced from every surface that exposes the model. See **Async Event Backbone → Canonical Equipment, Production, and Event Model.** |

#### Resolution against the meeting brief's four buckets

The meeting brief framed four outcome buckets (#1 already-delivered, #2 adjacent-funded, #3 future-plan-cycle, #4 exploratory). Management's actual answer does not land in a single bucket — it **splits per use case:**

- **Use cases 1 and 3 → Bucket #1 with small plan additions.** The plan already delivers the substrate; it now also commits to the canonical state vocabulary (use case 1) and the canonical model declaration (use case 3), both captured under **Async Event Backbone → Canonical Equipment, Production, and Event Model**. No new workstream, no new component, no pillar impact.

- **Use case 2 → Bucket #4, served minimally.** Replay-based "simulation-lite" is architecturally enabled by Redpanda's retention tiers and OtOpcUa's namespace model. Full FAT / commissioning / integration-test simulation remains out of scope for this plan. If a funded simulation initiative materializes later, this plan's foundation supports it; until then, the narrow answer to use case 2 is "replay what Redpanda already holds, and build a simulated OtOpcUa namespace when a specific testing need justifies it."

#### Design constraints this imposes (unchanged)

- **Any digital twin capability must consume equipment data through OtOpcUa.** No direct equipment OPC UA sessions.

- **Any digital twin capability must consume historical and analytical data through Snowflake + dbt** — not from Historian directly, not through a bespoke pipeline. The `≤15-minute analytics` SLO is the freshness budget available to it.

- **Any digital twin capability must consume event streams through Redpanda** — not a parallel bus. The same schemas-in-git and `{domain}.{entity}.{event-type}` topic naming apply. The canonical state vocabulary and canonical model declaration (see above) are how "consistent state semantics" is delivered.

- **Any digital twin capability must stay within the IT↔OT boundary.** Enterprise-hosted twins cross through ScadaBridge central and the SnowBridge like every other enterprise consumer.

> **Unified Namespace vocabulary:** stakeholders framing the digital twin ask in "Unified Namespace" terms are asking for the same thing Use Cases 1 and 3 describe, just in UNS language. See **Target IT/OT Integration → Unified Namespace (UNS) posture** for the plan's explicit UNS framing and the decision trigger for a future MQTT/Sparkplug projection service. In short: the plan **already** delivers the UNS value proposition; an MQTT-native projection can be added later if a consumer specifically requires it.

#### What this does and does not commit

**Commits:**

- A canonical machine state vocabulary (`Running / Idle / Faulted / Starved / Blocked` + any additions), published in the `schemas` repo and used consistently across layer-3 derivations, Redpanda event schemas, and dbt curated views.

- A canonical equipment / production / event model declaration in the `schemas` repo, referencing the three surfaces (OtOpcUa, Redpanda, dbt) where it is exposed.

- Retention-tier replay of Redpanda analytics topics as a documented capability usable for integration testing and simulation-lite.

**Does not commit:**

- Building or buying a full commissioning-grade simulation product (Aveva Digital Twin, Siemens NX, DELMIA, Azure Digital Twins, etc.).

- A digital twin UI, dashboard, 3D visualization, or product surface.

- Predictive / AI models specific to digital twin use cases — those are captured under pillar 2 as general analytics/AI enablement, not as digital-twin-specific deliverables.

- Any new workstream, pillar, or end-of-plan criterion tied to digital twin delivery.

_TBD — whether any equipment state additions beyond the five names above are needed (e.g., `Changeover`, `Maintenance`, `Setup`); ownership of the canonical state vocabulary in the `schemas` repo (likely a domain-specific team rather than the ScadaBridge team); whether a use-case-2 funded simulation initiative is on anyone's horizon._

### Enterprise reporting: BOBJ → Power BI migration (adjacent initiative)
|
||||||
|
|
||||||
|
**Status: in-flight, not owned by this plan.** Enterprise reporting is actively migrating from **SAP BusinessObjects** to **Microsoft Power BI** (see [`current-state.md`](current-state.md) → Aveva Historian → Current consumers). This is a reporting-team initiative, not a workstream of this 3-year plan — but it **overlaps with pillar 2** (analytics/AI enablement) in a way that requires explicit coordination, because both initiatives ultimately consume machine data and both ultimately present analytics to business users.
|
||||||
|
|
||||||
|
**This plan's posture:** no workstream is added to `roadmap.md`, and no pillar criterion depends on the Power BI migration landing on any particular schedule. However, the plan's Snowflake-side components (SnowBridge, dbt curated layer) are shaped so that Power BI can consume them cleanly **if and when** the reporting team decides to point there. Whether Power BI actually does so, on what timeline, and for which reports is **not this plan's decision** — it is a coordination question between this plan and the reporting team.
|
||||||
|
|
||||||
|
#### Three consumption paths for Power BI
|
||||||
|
|
||||||
|
The reporting team's Power BI migration can land on any of three paths. Each has different implications for this plan:
|
||||||
|
|
||||||
|
**Path A — Power BI reads from the Snowflake dbt curated layer.**
|
||||||
|
- *Fit with this plan's architecture:* **best.** Machine data flows through the planned pipeline (equipment → OtOpcUa → layer 3 → ScadaBridge → Redpanda → SnowBridge → Snowflake → dbt → Power BI). The architectural diagram in `## Layered Architecture` above already shows this as the intended shape.
|
||||||
|
- *What it requires from this plan:* the dbt curated layer must be built to serve **reporting**, not only AI/ML. Likely adds a **reporting-shaped view or semantic layer** on top of the curated layer, tuned for Power BI's query patterns and cross-domain joins. SnowBridge's tag selection must include tags that feed reporting, not only tags that feed the pillar-2 AI use case.
|
||||||
|
- *What it requires from the reporting team:* capacity and willingness to consume Snowflake as a data source (Power BI has a native Snowflake connector; the learning curve is in the semantic layer, not the connection). Commitment to defer at least the machine-data portion of the BOBJ migration until the dbt curated layer is live — which ties the reporting migration's machine-data cutover to this plan's Year 2+ delivery.
|
||||||
|
- *Risk:* **timing coupling.** If the reporting team wants to finish their migration inside Year 1, this path doesn't work for machine-data reports. They'd need to hold on machine-data reports and migrate the rest first — which is tenable (reports migrate in waves anyway) but needs agreement.
|
||||||
|
- *"Not possible before" hook:* Path A opens the door to **cross-domain reports** (machine data joined with MES/ERP data in one query) that BOBJ couldn't easily deliver. This is a strong candidate for pillar 2's "not possible before" use case.

**Path B — Power BI reads from Historian's MSSQL surface directly.**

- *Fit with this plan's architecture:* **neutral.** Historian's SQL interface is its native consumption surface (see [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md) → Deliberately not tracked → Historian SQL reporting consumers). This path is not legacy, not a retirement target.

- *What it requires from this plan:* **nothing.** This plan makes no changes to Historian's MSSQL surface.

- *What it requires from the reporting team:* a pure tool migration (BOBJ → Power BI, same data path). Shortest path to finishing the Power BI migration on the reporting team's preferred timeline.

- *Risk:* **perpetuates the current pattern.** All of the reasons the plan chose a Snowflake-based analytics substrate still apply — cross-domain joins are hard, raw-resolution scale is painful, Historian carries reporting read load on top of its compliance role. Pillar 2's "single analytics substrate" story weakens; the organization ends up running two reporting substrates (Historian SQL for machine data, Snowflake for AI/ML use cases). The machine-data analytics cost moves with Historian rather than with Snowflake's pay-per-use model, which makes the "Snowflake cost story" of this plan less compelling against a baseline that doesn't include reporting load.

- *"Not possible before" hook:* none beyond what Historian SQL already offers.

**Path C — Both, partitioned by report category.**

- *Shape:* compliance/validation reports read Historian directly (because Historian is the authoritative system of record and auditors typically want reports against it); machine-data analytics and cross-domain reports read from Snowflake dbt; reports sourced from Camstar/Delmia/ERP stay on their native connectors. Reports migrate per category.

- *Fit with this plan's architecture:* **pragmatic.** Acknowledges that enterprise reporting is heterogeneous and that one path doesn't fit everything.

- *What it requires from this plan:* Path-A requirements (reporting-shaped dbt layer, tag selection in SnowBridge) for the Snowflake portion. No new requirements for the Historian portion.

- *What it requires from the reporting team:* a published **report-category → data-source** rubric that dev teams can use to place new reports on the right path. Needs governance; otherwise new reports land wherever feels easiest at the time.

- *Risk:* **complexity.** Two semantic layers, two connection paths, two mental models for report authors. Worth it only if the volume of cross-domain / AI-adjacent reporting is high enough to justify Path A alongside Path B.

#### Recommended position

**Path C (with Path A as the strategic direction).** Expect most machine-data-heavy reports and all cross-domain reports to move to Snowflake (Path A) over Years 2–3 as the dbt curated layer matures; expect compliance reports to stay on Historian's SQL surface (Path B) indefinitely because Historian is the authoritative regulatory system of record and moving compliance reporting off it introduces chain-of-custody questions we don't want to open. Path B is **explicitly** not a retirement target (see the carve-out in the legacy inventory), so "staying" is a valid end state for compliance reporting.

**Why not pure Path A:** forces a needless fight over compliance reports that have no business case for leaving Historian.

**Why not pure Path B:** gives up the cross-domain reporting upside that is one of the most compelling answers to "what does pillar 2 get us that we couldn't do before?"

**Why not leave the decision open:** without a plan position, the reporting team will default to Path B by inertia (it's the shortest path and they're already mid-migration). That locks in the weakest of the three outcomes.

#### Questions to take to the reporting team

Use these to land the coordination conversation. Priority order — the first four are the must-answers:

1. **What's your timeline for completing BOBJ → Power BI?** Specifically, when do you expect to have migrated (a) all non-machine-data reports, (b) machine-data reports that read Historian, and (c) cross-domain reports? This tells us whether holding machine-data reports for Path A is even tenable on your side.

2. **Have you made an architectural decision on Power BI's connection to Historian?** Direct MSSQL link, Power BI gateway + on-prem data source, Azure Analysis Services in front of Historian, dataflows, something else? A decision already baked in may be hard to unwind.

3. **Has Snowflake been evaluated as a Power BI data source?** If yes, what were the findings (cost, performance, semantic modeling effort)? If no, would you be open to an evaluation once the first dbt curated layer is live in Year 2?

4. **Is there a business stakeholder asking for cross-domain reports** (machine data joined with MES/ERP/Camstar data in one report) that BOBJ can't deliver today? A named stakeholder here is the strongest signal that Path A is worth the coordination cost.

5. **What's the rough split of your report inventory** between machine-data-heavy reports, compliance reports, cross-domain reports, and pure-enterprise reports? A rough count is enough — we're not looking for a census, just the shape of the portfolio.

6. **Does the reporting team have capacity to learn Snowflake + dbt semantic modeling?** If that's a deal-breaker, Path A is off the table and we should plan for Path B + a parallel Snowflake analytics stack that non-reporting users consume.

7. **Who owns the decision on Power BI's data sources?** Your team, a BI governance body, IT architecture, the CIO? We need to know who to bring into the Path-A discussion if it progresses.

8. **Would you be willing to pilot one cross-domain report on Snowflake (Path A) during Year 2** as a proof point, independent of the rest of the migration? This is a low-commitment way to validate Path A before betting more reports on it.

#### Decision rubric

After the conversation, place the outcome into one of these buckets:

- **Bucket A — Full Path A commitment.** Reporting team commits to migrating all non-compliance reports to Snowflake over Years 2–3. → Update `roadmap.md` (Snowflake dbt Transform Layer workstream) to include reporting-shaped views in Year 2. Update `goal-state.md` to name cross-domain reporting as a pillar 2 "not possible before" candidate.

- **Bucket B — Path C commitment.** Reporting team commits to the hybrid path with a published report-category rubric. → Same roadmap updates as A, plus document the rubric as a link from this subsection.

- **Bucket C — Path B lock-in.** Reporting team declines Path A for cost, capacity, or timing reasons. → Update `goal-state.md` here to record the decision. No roadmap changes. Pillar 2's "not possible before" use case must come from a different source (e.g., predictive maintenance, OEE anomaly detection) because cross-domain reporting is off the table.

- **Bucket D — Conversation inconclusive.** Reporting team needs more time, or the decision is above their level. → Schedule follow-up. Note which questions were answered and which are still open.

#### What this does NOT decide

- Whether the reporting team completes their Power BI migration (their decision).

- Whether Historian's SQL surface is ever retired (no — it's the compliance system of record).

- Whether this plan's Snowflake dbt layer supports Power BI (yes, it can — the question is only whether the reporting team will consume it).

- Whether the SnowBridge's tag selection is driven by reporting requirements (partly — SnowBridge's selection is governed by blast-radius approval, so reporting-team requests are handled through the same workflow as any other).

_TBD — name and sponsor of the Power BI migration initiative; named owner on the reporting team for this coordination; whether a joint session between this plan's build team and the reporting team has been scheduled; whether a Power BI + Snowflake proof-of-concept can fit into Year 1 as a forward-looking test, independent of the rest of Year 1's scope._

## Non-Goals

- **Operator UX modernization is not a primary goal** of this 3-year plan. Revamping HMIs, operator dashboards, or shopfloor UI frameworks is explicitly deprioritized against the three in-scope pillars (unification, analytics/AI enablement, legacy middleware retirement). If UX work happens, it happens as a by-product of migrations — never as a standalone initiative funded from this plan.

- _TBD — other explicit non-goals (e.g., specific technologies we are not adopting, scope boundaries vs. adjacent programs)._

150
goal-state/digital-twin-management-brief.md
Normal file
@@ -0,0 +1,150 @@

# Digital Twin — Management Conversation Brief

A walk-into-the-meeting artifact for the **management conversation** that turns the ask ("we want digital twins") into a scoped response.

> This brief is a **meeting prep document**, not plan content. The authoritative plan position on digital twin lives in [`../goal-state.md`](../goal-state.md) → **Strategic Considerations (Adjacent Asks)** → **Digital twin** — this file exists to prepare for the clarification conversation referenced there.

## Outcome — conversation complete (2026-04-15)

**Status: the conversation has happened.** Management delivered three concrete high-level use cases as their complete answer — that is all the requirements framing they can provide. Source document: [`../digital_twin_usecases.md.txt`](../digital_twin_usecases.md.txt).

**The three use cases management delivered:**

1. **Standardized Equipment State / Metadata Model** — raw signals → meaningful canonical state (`Running` / `Idle` / `Faulted` / `Starved` / `Blocked`), cycle-time accuracy, top-fault derivation.

2. **Virtual Testing / Simulation** — emulate equipment signals/states for automation-logic testing, FAT, integration validation, replay of historical and synthetic scenarios.

3. **Cross-System Data Normalization / Canonical Model** — common semantic layer with standardized equipment/production/event structures and uniform event definitions across systems.

**Bucket resolution — splits across use cases, does not land in a single bucket:**

| Use case | Bucket | Plan response |
|---|---|---|
| 1 — Standardized state model | **#1 with a small addition** — plan absorbs it. | Commit to a canonical machine state vocabulary (`Running / Idle / Faulted / Starved / Blocked` + TBD additions like `Changeover`, `Maintenance`). Derived at layer 3, published as an enum in the central `schemas` repo, consumed uniformly across Redpanda events and dbt curated views. See [`../goal-state.md`](../goal-state.md) → Async Event Backbone → **Canonical Equipment, Production, and Event Model** → **Canonical machine state vocabulary**. |
| 2 — Virtual testing / simulation | **#4 — served minimally, full scope exploratory.** | Replay-based simulation-lite enabled by Redpanda's `analytics`-tier retention (30 days); OtOpcUa's namespace architecture can accommodate a future `simulated` namespace without reshaping the component. Full commissioning-grade FAT / integration simulation stays **out of scope** for this plan. If a funded simulation initiative materializes, this plan's foundation supports it — no new workstream until then. |
| 3 — Cross-system canonical model | **#1 with a framing commitment** — plan absorbs it. | The plan already builds the pieces (OtOpcUa equipment namespace, Redpanda topic taxonomy, Protobuf schemas in central `schemas` repo, dbt curated layer). Commit to declaring these pieces as **the** canonical equipment/production/event model that consumers are entitled to treat as an integration interface. See [`../goal-state.md`](../goal-state.md) → Async Event Backbone → **Canonical Equipment, Production, and Event Model**. |

**What this meeting did NOT produce** (deliberately, because management could not provide these details and the plan does not require them to move forward):

- A named sponsor for a separately funded digital twin initiative.

- A budget or timeline for use case 2 (simulation).

- A specific vendor product selection.

- A "kind of twin" framing (equipment twin vs line twin vs genealogy twin vs simulation twin) — the three use cases above cut across multiple categories from the brief's Q2, which is fine given how the plan absorbs them.

- Any decision that would add a workstream to [`../roadmap.md`](../roadmap.md).

**What comes next:**

- **Use cases 1 and 3 are now plan commitments** and get implemented under existing workstreams (Redpanda EventHub for the schemas/vocabulary, Snowflake dbt Transform Layer for the curated-view side). See [`../roadmap.md`](../roadmap.md) → Year 1 updates.

- **Use case 2 remains open as an exploratory item.** The narrower open question carried forward is tracked in [`../status.md`](../status.md) → Top pending items: "Simulation initiative (digital twin use case 2) — exploratory; no plan action until/unless a funded initiative materializes with a sponsor."

**This brief is retained for reference.** The pre-meeting framing (question priority, interpretation table, decision tree, four-bucket framework) remains useful if a follow-up conversation is needed — especially around use case 2 (simulation scoping) or if management surfaces additional use cases beyond the three above. The rest of the document continues below unchanged for that purpose.

---

## Goal of the meeting

Come out with enough information to place the ask into **one of four buckets**:

1. **Already delivered by this plan.** The "real" need is a Snowflake-backed historical / predictive view of equipment health and performance. Recommendation: no new workstream; the first twin use case lands in Year 2 or Year 3 as one of pillar 2's "not possible before" analytics use cases. This is the predicted outcome (see `goal-state.md` → Digital twin → "Likely outcome of this conversation").

2. **Adjacent initiative, consumes this plan's foundation.** A funded, sponsored, separately-scoped twin effort runs alongside this plan and consumes OtOpcUa, Redpanda, Snowflake, and the SnowBridge as its data substrate. Recommendation: no changes to this plan's pillars; digital twin team owns delivery; this plan commits to keeping the foundation consumable.

3. **Folded into a future version of this plan.** A twin capability becomes a new pillar in a v2 of this plan — not today. Recommendation: document the agreement, park until the next planning cycle.

4. **Genuinely undefined — exploratory ask.** Management wants us to "look at it" but has no problem statement, sponsor, or timeline. Recommendation: run a scoped proof-of-concept (one equipment class, one site) on OtOpcUa's new equipment namespace as an inexpensive, low-commitment response; defer the bigger question.

Any outcome other than these four means the conversation did not converge; schedule a follow-up rather than try to commit on the spot.

## Suggested opener

> "Thanks for raising digital twin as something you want us to look at. Before we commit anything into the 3-year plan, we want to make sure what we build actually lands against what you're after — 'digital twin' covers enough different things that it's worth an hour to sharpen the ask. We've come with a short list of clarifying questions. Good news up front: most of the likely shapes of this ask are already served by the foundation we're building for analytics and AI enablement, so this conversation is more likely to end with 'here's how you already get it' than 'we need a new workstream.'"

This framing is deliberately **not** defensive. The plan already shapes its components for a prospective digital twin layer; we're not pushing back, we're helping the ask land in a form we can execute against.

## Question priority grouping

The 8 questions in `goal-state.md` are all useful, but they are not equally diagnostic for placing the ask into one of the four buckets. Use this order:

### Must-answer (drive the bucket decision)

These three typically resolve the entire conversation:

- **Q1. What problem are you trying to solve?** — The single most diagnostic question. If the answer is framed in terms of downtime, predictive maintenance, quality yields, or compliance evidence, the likely bucket is #1 (Snowflake-backed) or #2 (adjacent initiative on this foundation). If it is framed in terms of operator training or line simulation, the likely bucket is #2 (adjacent, probably vendor product) or #4 (exploratory). If there is no problem — "we just need to be doing digital twin" — the bucket is #4.

- **Q7. Is there a named sponsor and funding?** — Hard gate between buckets. Sponsor + funding → bucket #2. No sponsor, no funding → bucket #4. Future plan cycle → bucket #3. This question also controls how much time it's worth spending on the other seven.

- **Q8. Is this connected to an initiative already underway?** — If yes (operational excellence, predictive maintenance pilot, AI/ML platform, sustainability dashboards), the "real" ask is that parent initiative and we should talk to it directly. Finding the parent is often the fastest path to bucket #1.

### Nice-to-have (sharpen the scope once the bucket is known)

Once the bucket is known, these refine the response:

- **Q2. Which *kind* of digital twin?** — Pins the architectural fit. Equipment/asset twin → OtOpcUa + real-time layer. Line/cell twin → Snowflake + dbt. Product/genealogy twin → Camstar MES, **not** this plan. Simulation twin → vendor product. Predictive/AI twin → Snowflake + dbt + an ML layer.

- **Q4. Real-time, historical, predictive, or simulation?** — Overlaps with Q2 but is useful as a sanity-check if the answer to Q2 is "a bit of everything" (which usually means "undefined").

- **Q5. Scope and timing?** — Converts an abstract ask into something you can actually say yes or no to. Also the easiest question to get a "someday" answer on, which is itself informative.

### Skip if time is short

- **Q3. Who uses it?** — Helpful if answered crisply, usually vague if not. Can be deferred to a follow-up.

- **Q6. Assumed product?** — Only relevant if the bucket is #2 and build-vs-buy is on the table. Irrelevant if we're in bucket #1, #3, or #4.

## Interpretation table — likely answer patterns and what they mean

| If the answer sounds like... | The real ask is probably... | Bucket | Response |
|---|---|---|---|
| "Reduce unplanned downtime on our critical equipment" | Predictive maintenance on historical equipment data | #1 | "This is a pillar 2 use case. Year 2–3 delivery on the dbt curated layer." |
| "See equipment state in real time from anywhere" | Real-time equipment dashboard | #1 or #2 | Year 2+ on Ignition + Snowflake (pillar 2) if enterprise-read-only; separate initiative if interactive/bidirectional. |
| "Train operators without touching real equipment" | Simulation / process twin | #2 | Vendor product (Aveva Digital Twin, DELMIA, Siemens NX). Separate initiative — this plan provides the data substrate only. |
| "Track every part through the factory with its full history" | Product / genealogy twin | Not this plan | Camstar MES territory — direct management to the Camstar owner. |
| "Forecast future equipment failures from sensor data" | Predictive / AI twin | #1 | Pillar 2 use case. Year 2–3 on the curated layer + an ML layer. |
| "We saw a demo of \<specific product\> and want to evaluate it" | Vendor-driven exploration | #4 or #2 | Proof-of-concept, scoped to one equipment class on OtOpcUa's equipment namespace. |
| "The board wants to hear about our digital transformation" | No concrete ask; political positioning | #4 | Reframe as "here's what we're already doing that counts as digital transformation" rather than building something new. |
| "\<Parent initiative\> needs a digital twin component" | The parent initiative is the real ask | Depends on parent | Route the conversation to the parent initiative's sponsor. |

## Decision tree

Use this in the moment to place the ask:

```
Is there a named sponsor and funding? (Q7)
├── No → Is there a concrete problem? (Q1)
│   ├── No → Bucket #4 (exploratory). Offer: PoC on one equipment class, deferred bigger decision.
│   └── Yes → Does it fit pillar 2? (Q1, Q4)
│       ├── Yes → Bucket #1. Already delivered; Year 2–3 use case.
│       └── No → Bucket #3. Park for next planning cycle.
└── Yes → Is there a parent initiative? (Q8)
    ├── Yes → Route to parent initiative owner. Out of this plan's hands.
    └── No → Does it fit the foundation this plan delivers? (Q2, Q4)
        ├── Yes → Bucket #2. Adjacent, consumes this plan's foundation.
        └── No → Bucket #2 anyway, but flag that the foundation gap may need to be filled.
```

## Non-negotiables to hold in the conversation

Whatever the bucket turns out to be, these are already committed positions of the plan and should not be renegotiated in the meeting:

- **Any twin must consume equipment data through OtOpcUa.** No direct equipment OPC UA sessions.

- **Any twin must consume historical/analytical data through Snowflake + dbt.** No direct Historian pulls, no bespoke pipelines.

- **Any twin must consume event streams through Redpanda.** No parallel messaging bus.

- **Any twin must stay within the IT↔OT boundary** — enterprise-hosted twins cross through ScadaBridge central and the SnowBridge like every other enterprise consumer.

These are set out in `goal-state.md` → Digital twin → "Design constraints this imposes." Restate them if the conversation drifts toward a parallel integration path.

## Outputs of the meeting

Bring back:

1. The **bucket assignment** (or a reason the conversation did not converge and needs a follow-up).

2. The **sponsor and funding** status, if known.

3. Any **parent initiative** identified.

4. A **one-line summary** of the actual problem the ask exists to solve, in management's own words — this is the quotable thing you'll use to explain the decision later.

5. Agreement on the **next action**: file the use case into pillar 2, stand up a PoC, park until next planning cycle, or route to a parent initiative owner.

If you come back without (1) and (5), the meeting did not do its job — schedule the follow-up before leaving the room.

## What to do after the meeting

- If **bucket #1**: update `goal-state.md` → Digital twin section with a one-line pointer noting "resolved to pillar 2 analytics use case" and a date. Add the use case to the pillar 2 candidate list. Remove the top-pending-item entry from `../status.md`.

- If **bucket #2**: update `goal-state.md` with the sponsor, scope, and foundation touchpoints. No changes to pillars. Keep this brief on file for the adjacent initiative's kickoff.

- If **bucket #3**: note the agreement in `goal-state.md` and move on. Surface in the next planning cycle.

- If **bucket #4**: document the PoC scope in `goal-state.md` (one equipment class, one site, one quarter) and kick it off as a Year 1 side activity on OtOpcUa. Do **not** add a workstream to `roadmap.md` — PoCs don't belong on the grid.

---

**Related:**

- [`../goal-state.md`](../goal-state.md) → Strategic Considerations → Digital twin — plan position and design constraints.

- [`../goal-state.md`](../goal-state.md) → OtOpcUa — "any future consumers such as a prospective digital twin layer."

- [`../status.md`](../status.md) → Top pending items — where this meeting sits in the open-work queue.

164
outputs/DESIGN.md
Normal file
@@ -0,0 +1,164 @@

# Design — Repeatable Output Generation

**Date:** 2026-04-15

**Topic:** PPTX presentation and PDF longform generation from the 3-year plan markdown source.

**Status:** Approved — proceeding to implementation.

> **Path note.** The `superpowers-extended-cc:brainstorming` skill's default location for this file is `docs/plans/YYYY-MM-DD-<topic>-design.md`, but this repo's convention (per [`../CLAUDE.md`](../CLAUDE.md)) is markdown files at root or under topical subdirectories. This design document is about the output-generation pipeline, so it lives alongside the pipeline it describes in [`outputs/`](.). Same intent, different path.

## Problem

The plan lives in ~10 markdown files. It needs to be published as:

1. A **mixed-stakeholder PowerPoint** (18 slides) for steering / leadership / technical-lead audiences.

2. A **long-form PDF** of the authoritative plan content for archival, handoff, and offline reading.

Both outputs must be **reliably repeatable** across sessions and across time — regenerating them after the plan has evolved should produce consistent structure and style, with bounded variability in Claude-authored phrasing only.

## Approach — four-part reliability harness

Free-form "Claude, regenerate the outputs" drifts. The four fixes:

1. **Spec files as structure anchors.** `outputs/presentation-spec.md` and `outputs/longform-spec.md` enumerate exactly which source feeds which slide / chapter, with no Claude discretion over structure. The prompt ("regenerate outputs") stays constant; the specs evolve as the plan does.

2. **Pinned invocation checklist.** `outputs/README.md` documents the trigger phrases and the numbered procedure I follow on every regeneration. No free-form prompting.

3. **Hand-drawn diagrams as committed files.** `outputs/diagrams/*.mmd` + `outputs/diagrams/*.png`. Diagrams change only when the `.mmd` is edited and re-rendered; regeneration just embeds the existing PNG.

4. **Diff-friendly intermediate + run log.** Every regeneration appends a line to `outputs/run-log.md` recording the timestamp, what was regenerated, and any warnings (missing source sections, broken links, overflow truncations). Drift between runs is visible by construction.

Together these reduce "what Claude has to decide" to **text phrasing inside a fixed structure**, which has bounded variability that the run log makes auditable.

## Section 1 — Repo structure
```
plan/
├── current-state.md, goal-state.md, roadmap.md, status.md (existing, unchanged)
├── current-state/, goal-state/ (existing, unchanged)
└── outputs/ (NEW)
    ├── README.md — trigger phrases + numbered regeneration checklist
    ├── DESIGN.md — this document
    ├── run-log.md — append-only regeneration audit log
    ├── presentation-spec.md — slide-by-slide PPTX structure anchor
    ├── longform-spec.md — chapter/appendix structure anchor
    ├── diagrams/
    │   ├── architecture-layers.mmd (Mermaid source, committed)
    │   ├── architecture-layers.png (exported, committed)
    │   ├── end-to-end-flow.mmd
    │   └── end-to-end-flow.png
    └── generated/
        ├── plan-presentation.pptx
        └── plan-longform.pdf
```

Spec files and diagrams are **source**; `generated/` is **artifact**.

## Section 2 — Generation workflow

**Trigger phrases** (any future Claude session recognizes these from `outputs/README.md`):

- `regenerate outputs` — both

- `regenerate presentation` — PPTX only

- `regenerate longform` — PDF only

**Procedure** (checklist in `outputs/README.md`):

1. Read `outputs/presentation-spec.md` or `outputs/longform-spec.md` — the structure anchor.

2. Read every source file named in the spec. Files not in the spec are **not** read.

3. For each slide / chapter in the spec, populate from the named source using the spec's rules (full text, N-bullet summary, diagram embed, etc.).

4. Invoke `document-skills:pptx` or `document-skills:pdf` to render to `outputs/generated/`.

5. Append a run log entry to `outputs/run-log.md`.

Regeneration is **read-only on plan content**. The only files it writes are `outputs/run-log.md` and the artifacts under `outputs/generated/`.

## Section 3 — PPTX spec (18 slides, mixed-stakeholder)

Slide-by-slide mapping (full detail lives in `outputs/presentation-spec.md`):

| # | Slide | Source anchor |
|---|---|---|
| 1 | Title | — |
| 2 | Executive Summary | `goal-state.md` → Vision + Success Criteria |
| 3 | Today's Reality | `current-state.md` → pain points |
| 4 | Vision | `goal-state.md` → Vision |
| 5 | Three Pillars | `goal-state.md` → Success Criteria |
| 6 | Enterprise Layout | `current-state.md` → Enterprise Layout |
| 7 | Today's Systems | `current-state.md` → Systems & Interfaces |
| 8 | Architecture (diagram) | `goal-state.md` → Layered Architecture + `diagrams/architecture-layers.png` |
| 9 | Data Flow (diagram) | `goal-state.md` → tag flow sentence + `diagrams/end-to-end-flow.png` |
| 10 | OtOpcUa | `goal-state.md` → OtOpcUa |
| 11 | Analytics Stack | `goal-state.md` → SnowBridge + Historian→Snowflake |
| 12 | Redpanda EventHub | `goal-state.md` → Async Event Backbone |
| 13 | Roadmap grid (table) | `roadmap.md` → The grid |
| 14 | Year 1 Focus | `roadmap.md` → Year 1 column |
| 15 | Pillar 3: Legacy Retirement | `current-state/legacy-integrations.md` |
| 16 | Open Coordination Items | `goal-state.md` → Strategic Considerations |
| 17 | Non-Goals | `goal-state.md` → Non-Goals |
| 18 | Asks & Next Steps | `status.md` → Top pending items |

**Truncation:** max 6 bullets per slide, max ~12 words per bullet. Overflow → footer pointer to source.

**Theme:** `document-skills:theme-factory` default professional theme. No custom branding on first pass.

## Section 4 — Longform PDF spec

**Document shape:** Cover → TOC → Chapter 1 (`current-state.md`) → Chapter 2 (`goal-state.md`) → Chapter 3 (`roadmap.md`) → Appendix A (`current-state/legacy-integrations.md`) → Appendix B (`current-state/equipment-protocol-survey.md`).

**Transformations** (applied at render time, source markdown not edited):

- File H1 → chapter title

- H2/H3/H4 → numbered sections (1.1, 1.1.1); a numbering sketch follows this list

- Inter-file links → "see Chapter N" / "see Appendix X"

- `_TBD_` markers → visual callout

- ASCII diagrams → preserved as monospace code blocks

- Tables → preserved as multi-page tables
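
The heading-numbering rule is the only transformation with real bookkeeping. A minimal sketch covering H2/H3 only (H4 and the other rules omitted), assuming ATX-style headings in the source markdown:

```python
import re

def number_headings(md: str, chapter: int) -> str:
    """Prefix H2/H3 headings with chapter-relative section numbers."""
    counters = [0, 0]  # [H2 counter, H3 counter]
    out = []
    for line in md.splitlines():
        m = re.match(r"^(#{2,3}) (.+)$", line)
        if m:
            depth = len(m.group(1)) - 2  # 0 for H2, 1 for H3
            counters[depth] += 1
            if depth == 0:
                counters[1] = 0  # a new H2 resets the H3 counter
            nums = ".".join(str(n) for n in [chapter] + counters[: depth + 1])
            line = f"{m.group(1)} {nums} {m.group(2)}"
        out.append(line)
    return "\n".join(out)
```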

**Page setup:** Letter, 1" margins, chapter-name running header, page-number + as-of-date footer. `document-skills:theme-factory` default serif.

**Excluded from PDF:** `status.md`, `CLAUDE.md`, `goal-state/digital-twin-management-brief.md`, `outputs/*` (all meta, working, or prep content — not plan content).

## Section 5 — Diagrams

**Starter set — two diagrams:**

1. **`architecture-layers`** — the 4-layer stack with IT↔OT boundary. Mermaid `flowchart TB`.

2. **`end-to-end-flow`** — horizontal data flow from equipment to Power BI. Mermaid `flowchart LR`.

**No third roadmap-timeline diagram** — the roadmap renders as a PPTX table on slide 13; a Gantt would compete with the grid rather than complement it. Can be added later as `diagrams/roadmap-timeline.mmd` without changing any other file.

**Rendering:** on first regeneration, I attempt to render `.mmd` → `.png` in-session. If the Claude Code environment has Mermaid rendering available, done. If not, the run log directs the human to render the `.mmd` file at https://mermaid.live and drop the PNG into `outputs/diagrams/`. After that, regeneration embeds the existing PNG verbatim — zero drift.
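
If the environment turns out to have `@mermaid-js/mermaid-cli` on the PATH, the in-session attempt reduces to a short loop. A sketch assuming the standard `mmdc -i/-o` invocation; whether the binary is actually available is exactly the open question below:

```python
import subprocess
from pathlib import Path

# Render every committed .mmd source to a sibling .png. If mmdc is not
# installed this raises, and the run log falls back to the manual
# mermaid.live instruction instead.
for mmd in sorted(Path("outputs/diagrams").glob("*.mmd")):
    subprocess.run(
        ["mmdc", "-i", str(mmd), "-o", str(mmd.with_suffix(".png"))],
        check=True,
    )
```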

**Editing:** edit `.mmd` → re-render PNG → `regenerate presentation` picks up the new PNG automatically.

## Open questions for first-run verification

These get resolved during implementation, not during design:

1. **Mermaid rendering in the Claude Code environment.** Unknown until I try. Fallback is manual rendering at mermaid.live; neither path breaks the design.

2. **Whether `document-skills:pptx` can produce a 3-column layout** (needed for slide 5: Three Pillars) and a 2-column layout (needed for slide 16: Open Coordination Items). If not, the spec falls back to single-column with visual separation.

3. **Table overflow behavior on slide 13.** The 7×3 roadmap grid with truncated cell content should fit one slide, but if it overflows, the spec needs a fallback: either shrink text or split across two slides.

4. **First-pass theme quality.** I'll use the theme-factory default; the first output becomes the visual baseline. If it looks wrong, section 3's "visual style" line is where the override goes.

## Task list

Implementation tasks are tracked in the native task list:

- **#4** — Scaffold `outputs/` directory

- **#5** — Write `outputs/README.md` (trigger phrases + checklist)

- **#6** — Write `outputs/presentation-spec.md`

- **#7** — Write `outputs/longform-spec.md`

Additional tasks to be created on entry to implementation:

- Draft `architecture-layers.mmd` + `end-to-end-flow.mmd` Mermaid source

- Attempt first PNG rendering (or document manual fallback in run log)

- Execute first PPTX and PDF generation, save to `outputs/generated/`

- Append inaugural run log entry

## What happens after first generation

The first outputs are the baseline. Review produces one of:

- **Looks right** — done. `regenerate outputs` works; use it whenever the plan changes.

- **Structure wrong** — edit `presentation-spec.md` or `longform-spec.md`, regenerate.

- **Style wrong** — note override in spec (theme change, truncation rule tweak), regenerate.

- **Content faithful but phrasing drifts between runs** — tighten spec rules (e.g., "quote verbatim" instead of "summarize to 5 bullets"), regenerate.

If drift proves unmanageable after a few iterations, the fallback per earlier conversation is to extract the first good output as a **template** and regenerate against that template instead — turning the pipeline into a content-substitution step against a known-good visual shell. Design section 5's "once the PNG exists" principle (fixed visual, variable text) becomes the model for the whole deck and PDF at that point.

275
outputs/IMPLEMENTATION-PLAN.md
Normal file
@@ -0,0 +1,275 @@

# Output Generation Pipeline Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use `superpowers-extended-cc:executing-plans` to implement this plan task-by-task. Note: this repo is not a git project and the work is markdown authoring + skill invocation, not code. The standard writing-plans template (TDD, pytest, git commits) does not apply; use the **verification steps** per task as the equivalent of "run the tests."

**Goal:** Scaffold a repeatable PPTX + PDF generation pipeline over the plan markdown source, and produce the first pair of outputs.

**Architecture:** Spec files (`presentation-spec.md`, `longform-spec.md`) as structure anchors; hand-drawn Mermaid diagrams committed as `.mmd` + `.png`; `document-skills:pptx` and `document-skills:pdf` invoked per a fixed checklist in `outputs/README.md`; append-only `run-log.md` records every regeneration.

**Tech Stack:** Markdown, Mermaid (for diagrams), `document-skills:pptx`, `document-skills:pdf`, `document-skills:theme-factory` default theme.

**Design reference:** [`outputs/DESIGN.md`](DESIGN.md)

**Path note:** Writing-plans' default location is `docs/plans/YYYY-MM-DD-<name>.md`; this repo convention keeps documents in topical directories, so this plan lives alongside the pipeline it implements.

---

## Task 0: Directory scaffold

**Status:** Parent directory `outputs/` already exists (created when `DESIGN.md` was written). This task creates the remaining subdirectories.

**Files:**

- Create: `outputs/diagrams/` (directory)

- Create: `outputs/generated/` (directory)

**Step 1:** `mkdir -p outputs/diagrams outputs/generated`

**Step 2: Verify**

Run: `ls outputs/`

Expected: `DESIGN.md IMPLEMENTATION-PLAN.md diagrams generated` (README, specs, run-log not yet created)

---

## Task 1: outputs/README.md — trigger phrases + regeneration checklist

**Files:**

- Create: `outputs/README.md`

**Content requirements:**

- Purpose statement (what `outputs/` is and isn't)

- Trigger phrases section — `regenerate outputs`, `regenerate presentation`, `regenerate longform`

- Numbered regeneration procedure (the 5-step checklist from design section 2)

- Edit-this-not-that rules — when to edit specs vs. source vs. diagrams

- Pointers to `DESIGN.md`, `presentation-spec.md`, `longform-spec.md`, `run-log.md`

**Verification:**

- File exists at `outputs/README.md`

- Contains the three trigger phrases verbatim

- Contains the 5-step procedure numbered 1–5

- Links to the four companion files resolve to existing or soon-to-exist paths

---

## Task 2: outputs/presentation-spec.md — 18-slide PPTX structure anchor

**Files:**

- Create: `outputs/presentation-spec.md`

**Content requirements:**

- Meta header — audience (mixed-stakeholder), theme (theme-factory default), truncation rules (6 bullets, ~12 words each, overflow → footer pointer)

- 18 slide entries, one per slide. Each entry specifies:

  - Slide number, title, layout type

  - Source file + section anchor

  - Population instructions (verbatim / N-bullet summary / diagram embed / table render)

  - Optional: notes on layout fallbacks if the PPTX skill can't do the requested layout

- A closing "editing this spec" section explaining that structural changes go here, not in prompts

**Verification:**

- File exists

- Has exactly 18 slide entries

- Every slide cites a specific source file (or `—` for the title slide)

- Slides 8 and 9 reference `outputs/diagrams/architecture-layers.png` and `outputs/diagrams/end-to-end-flow.png` respectively

---

## Task 3: outputs/longform-spec.md — PDF chapter/appendix structure anchor

**Files:**

- Create: `outputs/longform-spec.md`

**Content requirements:**

- Meta header — audience (anyone reading the plan standalone), theme (theme-factory default serif), page setup (Letter, 1" margins, chapter header, page+date footer)

- File-to-chapter mapping:

  - Cover page (title, subtitle, as-of date, abstract from `goal-state.md` → Vision)

  - TOC (auto, two levels)

  - Chapter 1: `current-state.md`

  - Chapter 2: `goal-state.md`

  - Chapter 3: `roadmap.md`

  - Appendix A: `current-state/legacy-integrations.md`

  - Appendix B: `current-state/equipment-protocol-survey.md`

- Transformation rules (numbered heading, link normalization, `_TBD_` highlight, ASCII diagram preservation, table handling)

- Exclusion list (`status.md`, `CLAUDE.md`, `goal-state/digital-twin-management-brief.md`, `outputs/*`)

**Verification:**

- File exists

- Lists 3 chapters + 2 appendices in the specified order

- Transformation rules section is present and enumerated

---

## Task 4: outputs/diagrams/architecture-layers.mmd — Mermaid source for the 4-layer stack

**Files:**

- Create: `outputs/diagrams/architecture-layers.mmd`

**Content requirements:**

- Mermaid `flowchart TB` (top-to-bottom)

- Nodes: Equipment (bottom) → OtOpcUa (L2, with two namespace sub-nodes) → System Platform + Ignition (L3) → ScadaBridge (L4) → Enterprise IT (Camstar, Delmia, Snowflake, Power BI)

- A visible IT↔OT boundary line between L4 and Enterprise IT (Mermaid supports this via subgraphs or styled edges)

- Labels use the exact vocabulary from `goal-state.md` → Layered Architecture

**Verification:**

- File exists

- Starts with `flowchart TB` or equivalent directive

- Contains node labels for `Equipment`, `OtOpcUa`, `System Platform`, `Ignition`, `ScadaBridge`, and `Enterprise IT`

- Passes basic Mermaid syntax sanity check (paste into mermaid.live renders without error — verify this step on first authoring)

---

## Task 5: outputs/diagrams/end-to-end-flow.mmd — Mermaid source for the data flow

**Files:**

- Create: `outputs/diagrams/end-to-end-flow.mmd`

**Content requirements:**

- Mermaid `flowchart LR` (left-to-right)

- Nodes match `goal-state.md` line 77 exactly: Equipment → OtOpcUa → System Platform/Ignition → ScadaBridge → Redpanda → SnowBridge → Snowflake → dbt → Power BI

- IT↔OT boundary marker between ScadaBridge and Redpanda (Redpanda is IT-adjacent from ScadaBridge's central crossing)
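
One possible shape for the file, generated here in Python for concreteness; the node labels are abbreviations and must be reconciled against the exact vocabulary of `goal-state.md` line 77 before committing:

```python
from pathlib import Path

# Sketch of the Task 5 Mermaid source: nine nodes, left to right, with the
# IT/OT boundary shown as a labeled edge between ScadaBridge and Redpanda.
mmd = """flowchart LR
  EQ[Equipment] --> OT[OtOpcUa] --> L3[System Platform / Ignition] --> SB[ScadaBridge]
  SB -- "IT/OT boundary" --> RP[Redpanda]
  RP --> SNB[SnowBridge] --> SF[Snowflake] --> DBT[dbt] --> PBI[Power BI]
"""
Path("outputs/diagrams/end-to-end-flow.mmd").write_text(mmd)
```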

**Verification:**

- File exists

- Starts with `flowchart LR` or equivalent

- Has exactly the 9 nodes in the exact order from `goal-state.md` line 77

- Renders at mermaid.live without error

---

## Task 6: Render diagrams to PNG (or document manual fallback)

**Files:**

- Create: `outputs/diagrams/architecture-layers.png` (if rendering available)

- Create: `outputs/diagrams/end-to-end-flow.png` (if rendering available)

**Step 1:** Attempt to render Mermaid → PNG in the Claude Code environment. Options, in order of preference:

1. `document-skills:pptx` skill may expose a diagram embed path that handles Mermaid natively

2. Python with `mermaid-py` or `mermaid-cli` if available

3. A headless browser with mermaid.js

**Step 2 (if rendering succeeds):** Write the PNGs to `outputs/diagrams/`. Verify file size > 0 and file type is PNG.

**Step 3 (if rendering fails):** Do not block. Instead:

- Create placeholder empty PNGs or skip PNG creation entirely

- Record in the run log: *"Render `outputs/diagrams/architecture-layers.mmd` and `outputs/diagrams/end-to-end-flow.mmd` at https://mermaid.live and save the PNGs to `outputs/diagrams/`. Then re-run `regenerate presentation`."*

- Note in the human-facing summary that this is the one-time manual step

**Verification:**

- Either both PNG files exist with non-zero size, OR the run log contains the explicit manual-render instruction

---

## Task 7: Update CLAUDE.md index

**Files:**

- Modify: `CLAUDE.md`

**Change:** Add `outputs/` to the component detail files list with a one-line description, so future Claude sessions see the pipeline exists when loading the project.

**Verification:**

- `CLAUDE.md` mentions `outputs/` and points to `outputs/README.md` as the regeneration entry point

---

## Task 8: Execute first PPTX generation

**Files:**

- Create: `outputs/generated/plan-presentation.pptx`

**Step 1:** Read `outputs/presentation-spec.md` (the structure anchor).

**Step 2:** Read every source file named in the spec.

**Step 3:** For each of the 18 slides, populate content per the spec's rules.

**Step 4:** Invoke `document-skills:pptx` with:

- Theme: `document-skills:theme-factory` default professional

- Title: "Shopfloor IT/OT Transformation — 3-Year Plan"

- Output path: `outputs/generated/plan-presentation.pptx`

- Diagram images from `outputs/diagrams/`

**Step 5:** Append run log entry.

**Verification:**

- `outputs/generated/plan-presentation.pptx` exists, file size > 0

- Slide count is 18 (verify by re-reading the file's structure if possible; a check sketch follows this list)

- Run log has a new entry with timestamp and a "presentation regenerated" note

- If any source content was truncated, a warning appears in the run log entry
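
The slide-count check is scriptable rather than eyeballed; a sketch assuming `python-pptx` is available in the environment:

```python
from pptx import Presentation

deck = Presentation("outputs/generated/plan-presentation.pptx")
count = len(deck.slides)
assert count == 18, f"expected 18 slides, found {count}"
print(f"OK: {count} slides")
```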

---

## Task 9: Execute first PDF generation

**Files:**

- Create: `outputs/generated/plan-longform.pdf`

**Step 1:** Read `outputs/longform-spec.md`.

**Step 2:** Read all 5 source files (current-state.md, goal-state.md, roadmap.md, legacy-integrations.md, equipment-protocol-survey.md).

**Step 3:** Apply the transformation rules from the spec — numbered headings, link normalization, `_TBD_` highlighting.

**Step 4:** Invoke `document-skills:pdf` with:

- Theme: theme-factory default

- Cover page with title, subtitle, as-of date, abstract

- TOC, 3 chapters, 2 appendices

- Output path: `outputs/generated/plan-longform.pdf`

**Step 5:** Append run log entry.

**Verification:**

- `outputs/generated/plan-longform.pdf` exists, file size > 0

- Basic sanity: first page is cover, second is TOC, chapters appear in the expected order

- Run log has a new entry for the PDF generation

- Broken/unresolvable inter-file links are logged as warnings, not silently dropped

---

## Task 10: Initial run log + human-facing summary

**Files:**

- Create: `outputs/run-log.md`

**Content:**

- Header explaining the run log's purpose (append-only audit trail, one entry per regeneration, newest at bottom)

- Inaugural entry:

  - Timestamp (2026-04-15)

  - What ran: first PPTX + PDF generation

  - Source plan version: brief summary ("plan as of 2026-04-15 — 5 source files, 3 legacy integrations, 6 protocol survey placeholder rows")

  - Warnings: any truncations, missing sections, broken links, or PNG rendering fallbacks from tasks 6, 8, 9

  - Next actions for the human reviewer

**Verification:**

- File exists

- Contains the inaugural entry

- If Task 6 hit the manual-render fallback, the run log surfaces that prominently

---

## Dependencies

```
Task 0 (scaffold)
├── Task 1 (README)
├── Task 2 (presentation-spec)
├── Task 3 (longform-spec)
├── Task 4 (architecture-layers.mmd)
├── Task 5 (end-to-end-flow.mmd)
└── Task 7 (CLAUDE.md index) — can run in parallel

Tasks 4, 5 → Task 6 (render PNGs)
Tasks 1, 2, 6 → Task 8 (PPTX generation)
Tasks 1, 3 → Task 9 (PDF generation)
Tasks 8, 9 → Task 10 (run log)
```

Tasks 0–5 and 7 can run in parallel where sensible; tasks 6, 8, 9, 10 are sequentially dependent.

---

## Out of scope for this plan

- A second theme / branded version of the deck (first run uses theme-factory default; branding iteration is a follow-up)

- A third diagram (roadmap-timeline) — starter set is 2, per design

- A `.claude/commands/regenerate-outputs.md` slash command — plain English trigger + README checklist is enough for first pass

- A curated long-form narrative PDF (option b from earlier brainstorm) — first run is option (a) faithful typeset only

- Editing source plan files — regeneration is read-only on plan content

- Any git operations — this repo is not a git project
110
outputs/README.md
Normal file
@@ -0,0 +1,110 @@

# outputs/ — repeatable PPTX & PDF generation
|
||||||
|
|
||||||
|
This directory holds the **generation pipeline** for two output artifacts derived from the plan markdown source:
|
||||||
|
|
||||||
|
1. A **mixed-stakeholder PowerPoint** (18 slides) — [`generated/plan-presentation.pptx`](generated/)
|
||||||
|
2. A **long-form PDF** of the authoritative plan content — [`generated/plan-longform.pdf`](generated/)
|
||||||
|
|
||||||
|
Outputs are regenerated on demand from the current markdown source. Repeatability is anchored by **spec files** (this directory), not by prompts. See [`DESIGN.md`](DESIGN.md) for the full design rationale.

## What lives here

| File / directory | Role |
|---|---|
| [`README.md`](README.md) | This file — entry point, trigger phrases, regeneration procedure. |
| [`DESIGN.md`](DESIGN.md) | Design document for the generation pipeline. Read this once; edit when the approach changes. |
| [`presentation-spec.md`](presentation-spec.md) | **Structure anchor** for the PPTX. Enumerates every slide, its source, and how to populate it. Edit this when the deck's shape needs to change. |
| [`longform-spec.md`](longform-spec.md) | **Structure anchor** for the PDF. Chapter/appendix ordering, transformation rules, page setup. Edit this when the PDF's shape needs to change. |
| [`run-log.md`](run-log.md) | Append-only audit log — one entry per regeneration. Created on first run. |
| [`diagrams/`](diagrams/) | Hand-drawn diagram sources (`.mmd`) and exported images (`.png`). Committed source; regeneration embeds the existing PNG verbatim and does not re-render. |
| [`generated/`](generated/) | The actual PPTX and PDF artifacts. Disposable — regenerated from scratch on every run. Do not hand-edit. |

## Trigger phrases

In any Claude Code session (this one or a future one), ask for a regeneration with one of:

- `regenerate outputs` — regenerates both the PPTX and the PDF.
- `regenerate presentation` — PPTX only.
- `regenerate longform` — PDF only.

These phrases are the documented triggers. They map onto the procedure below. Prompts that paraphrase them loosely will still work, but they lose the "same phrase every time" determinism — prefer the exact phrase.

## Regeneration procedure

The fixed checklist any Claude session follows when triggered. This is what makes regeneration repeatable — every run reads the same inputs in the same order and produces outputs from them.

1. **Read the relevant spec file.**
   - For PPTX: [`presentation-spec.md`](presentation-spec.md).
   - For PDF: [`longform-spec.md`](longform-spec.md).
   - The spec file is the **structure anchor** — it says exactly what slides / chapters exist and what source feeds each one.

2. **Read every source file named in the spec.** Do **not** read other files in the repo. If a source file is not referenced by the spec, it is not part of the output. This is the rule that prevents drift when new plan files are added — new content only appears in outputs when the spec is updated to reference it. (See the sketch after this checklist for how this rule could be checked mechanically.)

3. **For each slide / chapter, populate content from the named source using the spec's rules.** Rules are per-entry and may say: quote verbatim, summarize to N bullets max, embed the diagram image at path X, render as a table with these columns, etc. When the spec says N bullets, use N bullets — no fewer, no more.

4. **Invoke the rendering skill.**
   - For PPTX: `document-skills:pptx` with the theme and layout from the spec.
   - For PDF: `document-skills:pdf` with the page setup and transformation rules from the spec.
   - Output goes to [`generated/plan-presentation.pptx`](generated/) or [`generated/plan-longform.pdf`](generated/).

5. **Append a run log entry** to [`run-log.md`](run-log.md) with:
   - Timestamp
   - What was regenerated
   - Any warnings: truncations, missing sections, unresolved inter-file links, missing diagram images, anything unexpected in the source
   - A one-line summary of the source plan state ("3 legacy integrations, 6 protocol survey rows, no new source files since last run") so drift is detectable by diffing the log
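
The drift rule in step 2 is mechanical enough to sanity-check with a script. A hypothetical helper, not part of the pipeline — it assumes it lives in `outputs/` and that specs name their sources as `../<file>.md` paths:

```python
#!/usr/bin/env python3
"""Hypothetical drift guard for step 2 — a sketch, not pipeline code."""
import re
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent.parent  # assumes this lives in outputs/

def referenced_sources(spec_path: Path) -> set[str]:
    """Collect every ../<plan-file>.md path named anywhere in a spec file."""
    text = spec_path.read_text(encoding="utf-8")
    return {m.lstrip("./") for m in re.findall(r"\.\./[\w/-]+\.md", text)}

def plan_files() -> set[str]:
    """Plan markdown at the repo root and in the component-detail subdirs."""
    found: set[str] = set()
    for pattern in ("*.md", "current-state/*.md", "goal-state/*.md"):
        found |= {str(p.relative_to(REPO_ROOT)) for p in REPO_ROOT.glob(pattern)}
    return found - {"CLAUDE.md", "status.md"}  # excluded from outputs by design

if __name__ == "__main__":
    spec = REPO_ROOT / "outputs" / "longform-spec.md"
    for path in sorted(plan_files() - referenced_sources(spec)):
        print(f"not referenced by the spec (will not appear in outputs): {path}")
```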

## Edit-this-not-that rules

| If you want to change... | Edit this | Do NOT edit this |
|---|---|---|
| What goes on slide 7 | `presentation-spec.md` | `current-state.md` |
| The order of chapters in the PDF | `longform-spec.md` | the source files |
| A typo in the plan | The source markdown file (`current-state.md`, `goal-state.md`, etc.) | The spec files |
| The architecture diagram | `diagrams/architecture-layers.mmd` (and re-render to PNG) | the generated `.pptx` |
| The deck's theme / visual style | `presentation-spec.md` (meta header section) | the generated `.pptx` |
| The PDF's page setup | `longform-spec.md` (meta header section) | the generated `.pdf` |
| Anything in `generated/` | — | Don't. Files there are regenerated from scratch. Edit the spec or source instead. |

**The golden rule:** if it's in `generated/`, it's disposable. If it's anywhere else in `outputs/`, it's committed source and hand-edits stick. If it's a plan markdown file, it's the authoritative source of truth and the spec files consume it, not the other way around.

## Diagrams

Diagrams are **hand-drawn source files** committed to [`diagrams/`](diagrams/), not generated fresh on each run. This is deliberate — the single biggest drift vector in LLM-driven generation is "Claude redraws the diagram each run and it looks slightly different every time."

- The source is a Mermaid `.mmd` text file — diffable, editable, renderable at [https://mermaid.live](https://mermaid.live).
- The render is a `.png` committed alongside the `.mmd`.
- Regeneration embeds the existing `.png` verbatim. Zero drift between runs as long as nobody edits the `.mmd`.

**When a diagram needs to change:**

1. Edit the `.mmd` file (or ask Claude to rewrite it).
2. Re-render the PNG — either Claude does it in-session if Mermaid tooling is available, or you render it manually at [https://mermaid.live](https://mermaid.live) and save the PNG into `diagrams/`.
3. Run `regenerate presentation`. The new PNG is picked up automatically.

## Starter diagram set

The spec currently references two diagrams:

- `diagrams/architecture-layers.png` — the 4-layer goal-state architecture stack (Equipment → OtOpcUa → SCADA → ScadaBridge → Enterprise IT), with the IT↔OT boundary marked.
- `diagrams/end-to-end-flow.png` — the left-to-right data flow for one tag, matching `goal-state.md` line 77 exactly (Equipment → OtOpcUa → System Platform/Ignition → ScadaBridge → Redpanda → SnowBridge → Snowflake → dbt → Power BI).

Both are **not yet authored.** On first regeneration, Claude will either author the `.mmd` sources and attempt to render them, or flag this as a manual step in the run log. Until the PNGs exist, the corresponding slides (slides 8 and 9 in the deck) will have placeholder boxes.
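
As a concrete starting point for that first authoring pass, a minimal sketch of what `diagrams/end-to-end-flow.mmd` could look like — node labels follow the tag flow above; shapes and styling are assumptions until the diagram is actually authored and rendered:

```mermaid
flowchart LR
    EQ[Equipment] --> OT[OtOpcUa]
    OT --> SC[System Platform / Ignition]
    SC --> SB[ScadaBridge]
    SB --> RP[Redpanda]
    RP --> SNB[SnowBridge]
    SNB --> SF[Snowflake]
    SF --> DBT[dbt]
    DBT --> PBI[Power BI]
```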

A third diagram (`roadmap-timeline`) is **intentionally not part of the starter set** — the roadmap renders as a PPTX table on slide 13, and a Gantt-style diagram would compete with the grid rather than complement it. Add later if the table feels flat.
## What this pipeline does NOT do

- **No auto-regeneration.** Outputs are regenerated only when explicitly triggered. This avoids producing stale-looking artifacts between meaningful plan updates.
- **No source-file edits.** Regeneration is read-only on the plan markdown. The only writes are to `generated/` and `run-log.md`.
- **No diagram re-rendering on each run.** Diagrams are embedded verbatim from committed PNGs — see the Diagrams section above.
- **No second theme.** First pass uses the `document-skills:theme-factory` default professional theme. Branding and custom themes are a follow-up iteration.
- **No curated narrative PDF.** First pass is a faithful typeset of the source markdown (option (a) from the design brainstorm). A reshaped, executive-narrative PDF is a potential second output.
- **No slash command.** A `.claude/commands/regenerate-outputs.md` slash command was considered and deferred — the plain English trigger + this README's checklist is enough for the first pass.

## Related files

- [`../CLAUDE.md`](../CLAUDE.md) — plan structure and conventions
- [`../current-state.md`](../current-state.md), [`../goal-state.md`](../goal-state.md), [`../roadmap.md`](../roadmap.md) — the authoritative plan
- [`../status.md`](../status.md) — working bookmark, NOT part of the generated outputs
- [`DESIGN.md`](DESIGN.md) — full design document for this pipeline
- [`IMPLEMENTATION-PLAN.md`](IMPLEMENTATION-PLAN.md) — the scaffolding plan (partially executed; see its status section)

173
outputs/longform-spec.md
Normal file

# Longform Spec — Faithful Typeset PDF

**The structure anchor for [`generated/plan-longform.pdf`](generated/).** Every regeneration reads this file first and produces a PDF that renders the current plan markdown as a typeset document with cover, TOC, chapters, and appendices. Edit this file to change PDF structure; do not edit prompts, do not edit the source plan files.

## Meta

| Property | Value |
|---|---|
| **Type** | Faithful typeset (option (a) from the design brainstorm). Structure and headings map 1:1 from source. No curated narrative reshaping. |
| **Audience** | Anyone reading the plan standalone — new stakeholders, archival readers, anyone handed the plan without having followed the day-to-day authoring. |
| **Expected length** | Roughly 50–80 pages depending on rendering density. |
| **Theme** | `document-skills:theme-factory` default professional theme (serif body font). Override by adding a `**Theme override:**` line here. |
| **Page size** | US Letter (8.5" × 11"). Change to A4 if distribution is primarily non-US. |
| **Margins** | 1" on all sides |
| **Running header** | Chapter title (left-aligned) · "Shopfloor IT/OT Transformation — 3-Year Plan" (right-aligned) |
| **Running footer** | Page number (center) · As-of date (right) |
| **Title on cover** | "Shopfloor IT/OT Transformation" |
| **Subtitle on cover** | "3-Year Plan" |
| **Cover as-of line** | "As of {{regeneration-date}}" |
| **Cover abstract** | A single paragraph lifted **verbatim** from [`../goal-state.md`](../goal-state.md) → **Vision** (the opening paragraph). Do not paraphrase. |
## Document structure

```
Cover page — title, subtitle, as-of date, abstract
Table of Contents — auto-generated, two levels deep
Chapter 1 — Current State (source: ../current-state.md)
Chapter 2 — Goal State (source: ../goal-state.md)
Chapter 3 — Roadmap (source: ../roadmap.md)
Appendix A — Legacy Integrations Inventory (source: ../current-state/legacy-integrations.md)
Appendix B — Equipment Protocol Survey (source: ../current-state/equipment-protocol-survey.md)
```

**Chapter / appendix file-to-source mapping:**

| Label | Title | Source file |
|---|---|---|
| Chapter 1 | Current State | [`../current-state.md`](../current-state.md) |
| Chapter 2 | Goal State | [`../goal-state.md`](../goal-state.md) |
| Chapter 3 | Roadmap | [`../roadmap.md`](../roadmap.md) |
| Appendix A | Legacy Integrations Inventory | [`../current-state/legacy-integrations.md`](../current-state/legacy-integrations.md) |
| Appendix B | Equipment Protocol Survey | [`../current-state/equipment-protocol-survey.md`](../current-state/equipment-protocol-survey.md) |
## Explicitly excluded

These files are **not** part of the PDF — do not include them even if they seem relevant:

| File | Why excluded |
|---|---|
| [`../CLAUDE.md`](../CLAUDE.md) | Repo meta — instructions for Claude, not plan content. |
| [`../status.md`](../status.md) | Working bookmark — a session-state artifact, not authoritative plan content. |
| [`../goal-state/digital-twin-management-brief.md`](../goal-state/digital-twin-management-brief.md) | Meeting-prep artifact. Its own header explicitly says it is not plan content. |
| [`./README.md`](README.md), [`./DESIGN.md`](DESIGN.md), [`./presentation-spec.md`](presentation-spec.md), [`./longform-spec.md`](longform-spec.md), [`./IMPLEMENTATION-PLAN.md`](IMPLEMENTATION-PLAN.md), [`./run-log.md`](run-log.md) | Output pipeline files — the pipeline does not document itself inside its own output. |
| [`./diagrams/*`](diagrams/), [`./generated/*`](generated/) | Output pipeline artifacts. |

If a new plan file is added to the repo and should be in the PDF, **add it to this spec** under Document structure, as a new chapter or appendix — not in the source file and not in the prompt.
## Transformations applied during rendering

The narrow set of changes applied to source markdown when producing the PDF. Source markdown files are **not edited** — transformations are applied at render time.

### 1. File H1 → chapter/appendix title

`# Current State` (the first line of `current-state.md`) becomes the chapter title *"Chapter 1 — Current State"* on the chapter's opening page. The H1 is consumed by the chapter heading; it does not appear a second time in the body.

### 2. Heading numbering

All `##`, `###`, and `####` headings receive section numbers:

- `##` → `N.M` where N = chapter number, M increments per `##` within the chapter
- `###` → `N.M.K`
- `####` → `N.M.K.L`

Example: `## Systems & Interfaces` in Chapter 1 becomes `1.2 Systems & Interfaces`.

Numbering is **added at render time**; source markdown remains unchanged. Numbering is **applied to body text** but **not to the TOC** (the TOC uses unnumbered section titles for readability).
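
The rule is simple enough to pin down in a few lines. A minimal sketch of how a render-time numbering pass could work — illustrative only, not pipeline code (it also ignores headings inside code fences for brevity):

```python
import re

def number_headings(markdown: str, chapter: int) -> str:
    """Sketch of transformation 2: prefix ##/###/#### headings with chapter-based
    section numbers at render time. The H1 is untouched (transformation 1 consumes
    it); headings inside code fences are not special-cased here."""
    counters = [0, 0, 0]  # running counts for ##, ###, #### in this chapter
    out = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{2,4})\s+(.*)$", line)
        if m:
            level = len(m.group(1)) - 2           # 0 for ##, 1 for ###, 2 for ####
            counters[level] += 1
            for deeper in range(level + 1, len(counters)):
                counters[deeper] = 0              # reset deeper levels
            number = ".".join(str(n) for n in (chapter, *counters[: level + 1]))
            out.append(f"{m.group(1)} {number} {m.group(2)}")
        else:
            out.append(line)
    return "\n".join(out)

# The second `##` in chapter 1 — e.g. `## Systems & Interfaces` — renders
# as `## 1.2 Systems & Interfaces`, matching the example above.
```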

### 3. Inter-file link normalization

Markdown links between plan files are resolved to section references in the rendered PDF:

| Source link form | Rendered form |
|---|---|
| `[current-state.md](current-state.md)` | "see Chapter 1 — Current State" |
| `[goal-state.md](goal-state.md)` | "see Chapter 2 — Goal State" |
| `[roadmap.md](roadmap.md)` | "see Chapter 3 — Roadmap" |
| `[legacy-integrations.md](current-state/legacy-integrations.md)` | "see Appendix A — Legacy Integrations Inventory" |
| `[equipment-protocol-survey.md](current-state/equipment-protocol-survey.md)` | "see Appendix B — Equipment Protocol Survey" |
| Intra-file anchor links like `[X](#section-name)` | Rendered as internal PDF cross-reference to the numbered section (e.g., "see §1.2") |
| Links to excluded files (e.g., `status.md`, `digital-twin-management-brief.md`) | Rendered as **plain text** — the link target is dropped, the link text stays. Logged as a warning in the run log. |
| External links (`http://`, `https://`) | Rendered as clickable external links, unchanged. |
| Unresolvable links (file not found) | Rendered as plain text, logged as a warning in the run log. **Do not silently drop.** |
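
Like heading numbering, this table is mechanical. A hypothetical sketch of the rewrite pass — the mapping mirrors the table above, the function and its names are illustrative, and intra-file anchors are left to the PDF renderer's own cross-referencing:

```python
import re

# Mirrors the mapping table above; the rendering pass itself is hypothetical.
CHAPTER_FOR = {
    "current-state.md": "Chapter 1 — Current State",
    "goal-state.md": "Chapter 2 — Goal State",
    "roadmap.md": "Chapter 3 — Roadmap",
    "current-state/legacy-integrations.md": "Appendix A — Legacy Integrations Inventory",
    "current-state/equipment-protocol-survey.md": "Appendix B — Equipment Protocol Survey",
}

def normalize_links(markdown: str) -> tuple[str, list[str]]:
    """Rewrite inter-file links to section references; degrade excluded or
    unresolvable targets to plain text and collect run-log warnings."""
    warnings: list[str] = []

    def rewrite(m: re.Match) -> str:
        text, target = m.group(1), m.group(2)
        if target.startswith(("http://", "https://", "#")):
            return m.group(0)                  # external + intra-file: unchanged here
        key = target.lstrip("./")
        if key in CHAPTER_FOR:
            return f"see {CHAPTER_FOR[key]}"
        warnings.append(f"link rendered as plain text: {target}")
        return text                            # excluded/unresolvable: keep text only

    return re.sub(r"\[([^\]]+)\]\(([^)]+)\)", rewrite, markdown), warnings
```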

### 4. `_TBD_` marker highlighting

Every occurrence of `_TBD_` in the source becomes a **visual callout** in the rendered PDF:

- Inline `_TBD_` tokens get a colored background (theme-appropriate warning color)
- A small margin icon (⚑ or similar) appears in the left margin on lines containing a TBD
- TBDs remain readable — the goal is visibility, not obstruction

This makes the "gaps stay visible" convention from [`../CLAUDE.md`](../CLAUDE.md) → Conventions actually visible on paper.

### 5. ASCII / text diagrams preserved

The Layered Architecture text diagram in `goal-state.md` (and any similar text diagrams elsewhere) renders as a monospace code block with a subtle border. Option (a) faithful typeset explicitly accepts this — proper visual diagrams live in the PPTX, not the PDF.

### 6. Tables preserved (with multi-page support)

Markdown tables render as PDF tables with row-level borders and header-row emphasis. Tables that exceed one page split cleanly at row boundaries, with the header row repeated at the top of each continuation page. `document-skills:pdf` handles this natively.

Specific large tables to expect:

- [`../roadmap.md`](../roadmap.md) → **The grid** (7 workstreams × 3 years) — likely spans 2–3 pages
- [`../current-state/legacy-integrations.md`](../current-state/legacy-integrations.md) → per-row integration detail tables (one per integration)
- [`../current-state/equipment-protocol-survey.md`](../current-state/equipment-protocol-survey.md) → field schema table, classification table, rollup views

### 7. Code blocks preserved

Fenced code blocks in the source (for example, the directory-structure `dot` block in [`../goal-state.md`](../goal-state.md) or the structure trees in this file) render as monospace blocks with a light background. Line wrapping is preferred over horizontal scroll.

### 8. Block quotes preserved

Markdown `>` block quotes render as indented italic or colored blocks per the theme. Several key notes in the source (file headers, carve-outs) use block quotes, and their emphasis is load-bearing.

### 9. Emphasis preserved

`**bold**`, `*italic*`, and `` `inline code` `` all render as expected. The source uses these deliberately (especially bold for decisions and vocabulary terms) — preserve fidelity.
## Front matter

### Cover page

```
[Large type] Shopfloor IT/OT Transformation
[Medium type] 3-Year Plan

[Small type] As of {{regeneration-date}}

[Body paragraph, centered or full-width]
{{abstract — verbatim from ../goal-state.md → Vision, opening paragraph}}
```

### Table of Contents

- Auto-generated from chapter and appendix headings
- **Two levels deep:** chapter titles + `##` headings within each chapter
- Unnumbered in the TOC for readability (numbering appears in the body)
- Page numbers right-aligned
- Leaders (`.....`) between heading text and page number
- Chapters and appendices visually distinguished (slight indent, different weight, or section label)

### Back matter

- **No index.** Generating a useful index requires curation; automated indexing produces noise. Skip for the first pass.
- **No glossary.** Plan vocabulary (OtOpcUa, ScadaBridge, SnowBridge, Redpanda, etc.) is defined at first use in the source. If readers need a glossary, that's a future enhancement.
## Editing this spec

- **Structural changes go here** — chapter order, appendix additions, cover layout, page setup, transformation rules.
- **Content changes go in the source** — typos, new sections, corrections to the plan. The next regeneration picks them up automatically.
- **Theme / page-setup changes go in the Meta section** at the top of this file. Override with a `**Theme override:**` or `**Page size override:**` line.
- **Never hand-edit `generated/plan-longform.pdf`.** It is overwritten on the next regeneration.

## Planned follow-ups (not first-pass scope)

These are deliberately deferred:

- **Curated executive-narrative PDF** (option (b) from the design brainstorm) — a second output shaped as a standalone written report with intro, argument flow, and transitions. Would live at `generated/plan-narrative.pdf` with its own `narrative-spec.md`.
- **Bilingual rendering** — if the plan ever serves a non-English audience.
- **Redacted variant** — a version with sensitive detail removed, for external stakeholders.
- **Date-stamped archival copies** — `generated/plan-longform-YYYY-MM-DD.pdf` alongside the current one, so the history of the plan is preserved as a series of snapshots.

None of these require changes to the current spec; add them when/if they become real requirements.

206
outputs/presentation-spec.md
Normal file

# Presentation Spec — Mixed-Stakeholder Deck

**The structure anchor for [`generated/plan-presentation.pptx`](generated/).** Every regeneration reads this file first and populates slides exactly as described below. Edit this file to change deck structure; do not edit prompts, do not edit the source plan files.

## Meta

| Property | Value |
|---|---|
| **Audience** | Mixed stakeholder — steering committee, leadership, build team leads, adjacent-initiative owners. Some execs, some technical. |
| **Total slides** | 18 |
| **Target read time** | 30–45 minutes walked through; ~10 minutes self-read |
| **Theme** | `document-skills:theme-factory` default professional theme. Clean, neutral, no custom branding on first pass. Override: add a `**Theme override:**` line here with the target preset if you want something different. |
| **Body font** | Theme default sans-serif |
| **Accent color** | Theme default |
| **Source of truth for title** | "Shopfloor IT/OT Transformation — 3-Year Plan" |
| **Subtitle template** | "As of {{date}}" where `{{date}}` is the regeneration timestamp in `YYYY-MM-DD` form |

## Truncation rules (apply to every slide unless overridden per-slide)

- **Max 6 bullets per slide.** If the source section has more than 6 bullets of load-bearing content, pick the 6 most load-bearing and add a footer line: *"Full detail: `<source file> → <section name>`."*
- **Max ~12 words per bullet.** Aim for scan-ability, not completeness.
- **No nested bullets.** If the source has nested bullets, flatten to one level by merging or dropping sub-points.
- **Preserve source vocabulary.** Use the same words the plan uses (e.g., "OtOpcUa," "ScadaBridge," "SnowBridge," "pillar 3," "data locality," "single sanctioned crossing point") — don't paraphrase technical terms or the Vision line.
- **TBDs stay visible.** If a source section contains a `_TBD_` that is load-bearing for the slide's point, call it out explicitly: *"(TBD — see source.)"* Don't silently drop TBDs.
- **Overflow callout.** When content is truncated, the footer line *"Full detail: `<source file> → <section name>`"* appears in small text at the bottom of the slide.
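
The bullet cap and the overflow footer are deterministic even though "most load-bearing" is not. A hypothetical sketch of the deterministic part — which bullets survive stays a judgment call for the generating session:

```python
def apply_truncation(bullets: list[str], source_ref: str, cap: int = 6):
    """Sketch of the max-6-bullets rule plus the overflow callout. The
    load-bearing selection happens upstream; this only enforces the cap
    and emits the footer line when content was truncated."""
    if len(bullets) <= cap:
        return bullets, None
    return bullets[:cap], f"Full detail: {source_ref}"

# e.g. apply_truncation(picked, "current-state.md → Systems & Interfaces")
```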

## Layout fallbacks

If `document-skills:pptx` cannot render a requested layout:

- **3-column content → single column with visual separators.** Applies to slide 5 (Three Pillars).
- **2-column content → single column with a horizontal rule.** Applies to slide 16 (Open Coordination Items).
- **Diagram + caption → caption only with a "(Diagram not yet available — render `<path>.mmd` and re-run)" note.** Applies to slides 8 and 9 until the PNG files exist in `diagrams/`.
- **Multi-column table that overflows → split across two slides labeled "1/2" and "2/2".** Applies to slide 13 (Roadmap grid) if 7 rows × 3 columns doesn't fit one slide at readable type size.

---

## Slide 1 — Title

| Property | Value |
|---|---|
| **Layout** | Title |
| **Source** | — (no source file) |
| **Content** | Title: *"Shopfloor IT/OT Transformation"* · Subtitle: *"3-Year Plan"* · Footer: *"As of {{regeneration-date}}"* |
| **Rules** | No bullets, no extra content. Title slide is deliberately minimal. |

## Slide 2 — Executive Summary

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Vision** + **Success Criteria** |
| **Population** | 5 bullets: (1) the Vision line verbatim, (2)–(4) one line per pillar (the binary success criterion, not the rationale), (5) a single line naming the key non-goals in the form *"Out of scope: operator UX modernization, licensing, VM-level DR, physical network segmentation."* |
| **Notes** | This slide is the TL;DR for anyone who reads only one slide. Keep it tight. |

## Slide 3 — Today's Reality

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../current-state.md`](../current-state.md) — synthesize from Systems & Interfaces and Equipment OPC UA sections |
| **Population** | 5 bullets describing the pain points of the current state: (1) split SCADA stack (Aveva + Ignition by purpose), (2) WAN-dependent central Ignition with direct OPC UA sessions to site equipment, (3) multiple concurrent OPC UA sessions to the same equipment, (4) legacy IT↔OT integrations outside ScadaBridge (3 of them), (5) fragmented data access and inconsistent values between consumers |
| **Notes** | The source file doesn't have a dedicated "pain points" section (it's a `_TBD_`). Synthesize from the inline observations in Systems & Interfaces and Equipment OPC UA. Don't invent pain points that aren't grounded in the source. |

## Slide 4 — Vision

| Property | Value |
|---|---|
| **Layout** | Big-text / quote layout |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Vision** |
| **Population** | The Vision line ("stable, single point of integration between shopfloor OT and enterprise IT") as the centerpiece. 3 sub-bullets below it on what that means concretely — extracted from the paragraph that follows the Vision line in goal-state.md. |
| **Notes** | This slide repeats the Vision from slide 2 deliberately — slide 2 is the 30-second summary; slide 4 is where the Vision actually gets the airtime the conversation will return to. |

## Slide 5 — Three Pillars

| Property | Value |
|---|---|
| **Layout** | 3-column content (fallback: single column with visual separators) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Success Criteria** |
| **Population** | Column 1: Pillar 1 — Unification (100% of sites on standardized stack). Column 2: Pillar 2 — Analytics/AI Enablement (≤15-minute analytics SLO; one "not possible before" use case in production). Column 3: Pillar 3 — Legacy Retirement (inventory to zero). Each column: (a) title, (b) the binary criterion in one line, (c) one line of "what this means" context. |
| **Rules** | All three pillars must appear. Do not prioritize one over the others here — they are equal end-state criteria per the plan. |

## Slide 6 — Enterprise Layout

| Property | Value |
|---|---|
| **Layout** | Content (bulleted, grouped by tier) |
| **Source** | [`../current-state.md`](../current-state.md) → **Enterprise Layout** |
| **Population** | Grouped bullets: (a) Primary Data Center — South Bend. (b) Largest Sites — Warsaw West, Warsaw North (one cluster per building). (c) Other Integrated Sites — Shannon, Galway, TMT, Ponce (single cluster per site). (d) Not Yet Integrated — Berlin, Winterthur, Jacksonville (and others, with volatility note). |
| **Notes** | The volatility note about smaller sites changing belongs on the slide as a one-line disclaimer at the bottom. |

## Slide 7 — Today's Systems

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../current-state.md`](../current-state.md) → **Systems & Interfaces** |
| **Population** | 6 bullets, one line each: (1) Aveva System Platform — validated data collection; (2) Ignition SCADA — KPI monitoring (central, WAN-dependent); (3) ScadaBridge — in-house Akka.NET integration layer, already deployed; (4) LmxOpcUa — in-house OPC UA server exposing System Platform objects; (5) Camstar MES — sole enterprise MES; (6) Aveva Historian — sole historian, central-only, permanent retention. |
| **Notes** | Delmia DNC is mentioned as one of the two legacy Web API interfaces but does not need its own bullet here — it's captured on slide 15. |

## Slide 8 — Goal State: Layered Architecture

| Property | Value |
|---|---|
| **Layout** | Diagram + caption |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Layered Architecture** + [`diagrams/architecture-layers.png`](diagrams/) |
| **Population** | Embed `diagrams/architecture-layers.png` as the centerpiece. Caption: one line per layer — *"Layer 1 Equipment · Layer 2 OtOpcUa (equipment + System Platform namespaces) · Layer 3 SCADA (System Platform + Ignition) · Layer 4 ScadaBridge (IT↔OT bridge)"*. Callout: *"ScadaBridge central is the sole IT↔OT crossing point."* |
| **Fallback** | If `architecture-layers.png` does not exist, render a placeholder box with the text *"Diagram not yet available — render `diagrams/architecture-layers.mmd` at https://mermaid.live and save PNG to `diagrams/architecture-layers.png`, then re-run `regenerate presentation`."* |

## Slide 9 — Goal State: End-to-End Data Flow

| Property | Value |
|---|---|
| **Layout** | Diagram + caption |
| **Source** | [`../goal-state.md`](../goal-state.md) → the tag-flow sentence (search for "A tag read from a machine in Warsaw West") + [`diagrams/end-to-end-flow.png`](diagrams/) |
| **Population** | Embed `diagrams/end-to-end-flow.png`. Caption: the tag-flow sentence from goal-state.md, verbatim or near-verbatim. |
| **Fallback** | Same fallback pattern as slide 8, but for `end-to-end-flow.mmd` / `.png`. |

## Slide 10 — OtOpcUa — the unification layer

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **OtOpcUa — the unified site-level OPC UA layer** |
| **Population** | 6 bullets: (1) single sanctioned OPC UA access point per site, one session per equipment; (2) two namespaces — equipment + System Platform (absorbs LmxOpcUa); (3) clustered, co-located on existing System Platform nodes; (4) hybrid driver strategy — proactive core library + on-demand long-tail; (5) OPC UA-native auth (UserName + standard security modes, inherited from LmxOpcUa); (6) tiered cutover — ScadaBridge → Ignition → System Platform IO across Years 1–3. |

## Slide 11 — Analytics Stack: SnowBridge, Snowflake, dbt

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **SnowBridge** + **Aveva Historian → Snowflake** + **Snowflake-side transform tooling** |
| **Population** | 6 bullets: (1) SnowBridge — custom-built machine-data-to-Snowflake upload service; (2) source abstraction — Aveva Historian SQL in Year 1, Redpanda/ScadaBridge in Year 2; (3) governed selection with blast-radius approval workflow; (4) dbt curated layers, orchestrator out of scope; (5) ≤15-minute analytics SLO; (6) one "not possible before" AI/analytics use case in production by end of plan (pillar 2 gate). |

## Slide 12 — Redpanda EventHub: the async backbone

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Async Event Backbone** |
| **Population** | 6 bullets: (1) self-hosted, central in South Bend (single-cluster HA); (2) per-topic tiered retention — operational 7d / analytics 30d / compliance 90d; (3) bundled Schema Registry; (4) Protobuf schemas in central `schemas` repo with `buf` CI, `BACKWARD_TRANSITIVE` compatibility; (5) `{domain}.{entity}.{event-type}` topic naming, site identity in message not topic; (6) SASL/OAUTHBEARER auth + prefix ACLs. |

## Slide 13 — 3-Year Roadmap (workstreams × years)

| Property | Value |
|---|---|
| **Layout** | Table — 7 rows × 3 columns (+ workstream name column = 4 columns total) |
| **Source** | [`../roadmap.md`](../roadmap.md) → **The grid** |
| **Population** | Render the 7×3 roadmap grid as a PPTX table. **Truncate** each cell to the **single most important commitment** (not the full cell text, which would overflow). Workstream column: full name. Year columns: ~10-word headline per cell. Color-code cells by pillar if the theme supports it. |
| **Fallback** | If the 7×3 table doesn't fit one slide at readable type size, split across two slides: workstreams 1–4 on slide 13a (OtOpcUa, Redpanda, SnowBridge, dbt), workstreams 5–7 on slide 13b (ScadaBridge Extensions, Site Onboarding, Legacy Retirement). Label slides 13 and 14, renumber subsequent slides. |

## Slide 14 — Year 1 Focus

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../roadmap.md`](../roadmap.md) → the Year 1 column across all 7 workstreams |
| **Population** | 7 bullets, one per workstream, ordered by prerequisite position: (1) OtOpcUa — evolve LmxOpcUa, protocol survey, deploy to every site, begin tier-1 cutover; (2) Redpanda — stand up central cluster, schema registry, initial topics; (3) SnowBridge — design + first source adapter (Historian SQL) with filtered flow; (4) dbt — scaffold project, first curated model; (5) ScadaBridge Extensions — deadband publishing + EventHub producer; (6) Site Onboarding — document lightweight onboarding pattern (no new sites Year 1); (7) Legacy Retirement — populate inventory (done), retire first integration as pattern-proving exercise. |
| **Rules** | **Exceeds the 6-bullet truncation rule.** 7 bullets here is intentional because each bullet represents one workstream's Year 1 commitment — dropping one would misrepresent the plan. Keep all 7, tighten wording to ≤10 words per bullet. |

## Slide 15 — Pillar 3: Legacy Retirement (3 → 0)

| Property | Value |
|---|---|
| **Layout** | Content (bulleted + callout) |
| **Source** | [`../current-state/legacy-integrations.md`](../current-state/legacy-integrations.md) → Current inventory |
| **Population** | 3 bullets — one per legacy integration: (1) **LEG-001** Aveva Web API → Delmia DNC (bidirectional orchestrated handshake; harder retirement — requires ScadaBridge scripts to re-implement System Platform parse logic). (2) **LEG-002** Aveva Web API ← Camstar MES (Camstar-initiated; easier retirement — ScadaBridge already has native Camstar path; requires Camstar-side reconfiguration). (3) **LEG-003** System Platform → custom email notification service (easier retirement — ScadaBridge native notifications already exist). Callout at bottom: *"Historian MSSQL reporting surface (BOBJ / Power BI) is explicitly carved out as not legacy — see `legacy-integrations.md` → Deliberately not tracked."* |

## Slide 16 — Open Coordination Items

| Property | Value |
|---|---|
| **Layout** | 2-column content (fallback: single column with horizontal rule) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Strategic Considerations (Adjacent Asks)** |
| **Population** | **Left column — Digital Twin:** 4 bullets: (1) Management ask, not a committed workstream; (2) Plan shaped to serve if it materializes (OtOpcUa, Redpanda, Snowflake); (3) 8 clarification questions + 4-bucket decision framework ready; (4) Next: schedule management conversation — brief at `goal-state/digital-twin-management-brief.md`. **Right column — BOBJ → Power BI:** 4 bullets: (1) In-flight reporting initiative, not owned by this plan; (2) Three consumption paths analyzed (Snowflake dbt / Historian direct / both); (3) Recommended position: Path C — hybrid, with Path A as strategic direction; (4) Next: schedule coordination conversation with reporting team — 8 questions ready in `goal-state.md`. |

## Slide 17 — Non-Goals

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../goal-state.md`](../goal-state.md) → **Non-Goals** |
| **Population** | 6 bullets, one line each: (1) Operator UX modernization — deprioritized against the three pillars; (2) Support staffing decisions — other teams; (3) Licensing strategy — not tracked; (4) Self-hosted orchestrator selection — chosen outside this plan; (5) VM-level DR — out of scope for Redpanda; (6) Physical network segmentation — out of scope. |
| **Notes** | This slide is important for managing stakeholder expectations — what the plan *does not* commit to is as load-bearing as what it does commit to. Do not drop this slide even if time is short. |

## Slide 18 — Asks & Next Steps

| Property | Value |
|---|---|
| **Layout** | Content (bulleted) |
| **Source** | [`../status.md`](../status.md) → **Top pending items** + inferred from [`../roadmap.md`](../roadmap.md) → Year 1 |
| **Population** | 5 bullets: (1) Sponsor confirmation + Year 1 funding commitment; (2) Named owners for each of the 7 workstreams (build team alignment); (3) Digital Twin management conversation — schedule (see brief); (4) Power BI coordination conversation with reporting team — schedule; (5) Equipment protocol survey owner named (Q1 Year 1 prerequisite for OtOpcUa core driver library). |
| **Notes** | This is the closer slide. Each bullet should be a discrete ask with a clear "who needs to do what" so the audience leaves with action. |

---

## Editing this spec

- **Structural changes go here.** If you want to add, remove, reorder, or reshape a slide, edit this file. Don't edit the prompt ("regenerate presentation") and don't edit the source plan files.
- **Content changes go in the source.** If a slide's content is wrong because the plan itself is wrong, edit the source plan file (`current-state.md`, `goal-state.md`, `roadmap.md`, etc.) — the next regeneration will pick up the change automatically because this spec references the source file, not a snapshot.
- **Theme / visual changes go in the Meta section at the top of this file.** Add a `**Theme override:**` line naming the target preset, or tweak the truncation rules. First regeneration uses the theme-factory default; iterate from there.
- **Never edit `generated/plan-presentation.pptx` by hand.** Any hand-edits are overwritten on the next regeneration. If you find yourself wanting to hand-edit, the correct move is to edit this spec and regenerate.

## Slide-count budget

Current: **18 slides.** If additions push this above 22, reconsider whether the deck is still "mixed-stakeholder" or has quietly become a build-team deck. The mixed-stakeholder audience tops out around 20 slides before attention fragments; a build-team deck belongs in a separate spec file (`build-team-spec.md` or similar) feeding a second generated output.

95
roadmap.md
Normal file

# Migration Plan / Roadmap

How we get from `current-state.md` to `goal-state.md` over the 3-year plan.

> **Status: in progress.** The structure below is a scaffold. Cells are intentionally thin — fill them in as decisions land; don't over-commit before the work is real.

## Purpose

This document answers **how** the 3-year plan is actually executed — what gets built, migrated, or retired, in what order, and with what dependencies. It does **not** restate `goal-state.md` (the destination) or `current-state.md` (the starting point); it refers to them.

Read this together with:

- [`goal-state.md`](goal-state.md) — the destination, success criteria, and the three in-scope pillars.
- [`current-state.md`](current-state.md) — today's system of record.
- [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md) — the authoritative inventory for pillar 3 retirement.

## Guiding principles

Carried forward from `goal-state.md` → Vision; load-bearing for any sequencing decision:

- **Stable, single point of integration.** Every step in the roadmap should move the estate *toward* the single ScadaBridge-central IT↔OT bridge, never away from it. No step should create a parallel or bespoke integration path, even as a "temporary" measure.
- **Three pillars are binary at end of plan.** Unification at 100% of sites, analytics/AI enablement with at least one "not possible before" use case in production, and zero remaining legacy IT↔OT paths. Intermediate progress is tracked per-site / per-tag / per-integration here — not by softening the end-state criteria.
- **Data locality is preserved throughout.** ScadaBridge remains the local data path at each site; centralizations (Redpanda, SnowBridge, Snowflake) sit above that, not around it.
- **Dual-run before retire.** No legacy integration is switched off until its replacement has been running in production with the same consumers for a period defined in the integration's retirement criteria. Roadmap steps reflect this with explicit dual-run phases.
- **Support staffing, licensing, orchestrator selection, VM-level DR, and physical network segmentation are out of scope for this plan.** Sequencing should not gate on decisions that belong to other teams.
## Organizing axis: workstreams × years

The roadmap is laid out as a 2D grid — **workstreams** (rows) crossed with **years** (columns). Each workstream owns a component or capability, and each year's cell describes what happens in that workstream during that year.

### Workstreams

1. **OtOpcUa** — evolve the existing in-house `LmxOpcUa` into a unified clustered OPC UA server (**OtOpcUa**) with two namespaces: the existing System Platform namespace plus a new equipment namespace that holds the single session to each piece of equipment. Ship it to every site and execute the tiered cutover of downstream consumers (see `goal-state.md` → **OtOpcUa — the unified site-level OPC UA layer (absorbs LmxOpcUa)**). Prioritized first because it is **foundational** for the rest of the OT plan.
2. **Redpanda EventHub** — stand up and operate the central Kafka-compatible backbone (see `goal-state.md` → Async Event Backbone).
3. **SnowBridge** — custom-build the dedicated service that owns all machine-data flows into Snowflake (see `goal-state.md` → SnowBridge).
4. **Snowflake dbt Transform Layer** — build and evolve the dbt curated layers that Snowflake consumers read from (see `goal-state.md` → Aveva Historian → Snowflake → Snowflake-side transform tooling).
5. **ScadaBridge Extensions** — add and tune the capabilities ScadaBridge needs to serve the new architecture (deadband publishing, EventHub producer configuration, auth alignment).
6. **Site Onboarding** — bring the currently unintegrated smaller sites onto the standardized stack, and keep the already-integrated sites aligned with the evolving pattern.
7. **Legacy Retirement** — discover, sequence, migrate, dual-run, and retire every legacy IT↔OT path tracked in [`current-state/legacy-integrations.md`](current-state/legacy-integrations.md).

### Workstream → pillar mapping

| Workstream | Primary pillar(s) |
|---|---|
| OtOpcUa | Pillars 1, 2 — foundational (unblocks consistent equipment access for both unification and analytics paths) |
| Redpanda EventHub | Pillar 2 (analytics/AI enablement) — foundational |
| SnowBridge | Pillar 2 |
| Snowflake dbt Transform Layer | Pillar 2 |
| ScadaBridge Extensions | Pillars 1, 2, 3 — touches all three |
| Site Onboarding | Pillar 1 (unification) |
| Legacy Retirement | Pillar 3 (legacy retirement) |
### Cross-workstream dependencies

- **OtOpcUa is foundational** and its *deployment* (software installed and ready at every site) is a Year 1 prerequisite for everything else. Its *cutover* (consumers redirected to it) follows the tiered order and extends across all three years, but the software must be present at every site before other workstreams take hard dependencies on equipment-data consistency. LmxOpcUa is already deployed per-node; Year 1 grows it into OtOpcUa in place, which keeps the rollout a low-risk evolution rather than a parallel install.
- **Redpanda** must be in place before the **SnowBridge** can consume Redpanda-backed flows, and before **ScadaBridge Extensions** can test the EventHub producer path end-to-end.
- The **SnowBridge** must be in place before **dbt** curated layers can be built on real machine-data landing tables.
- The **Legacy inventory** (in `current-state/legacy-integrations.md`) must be populated before **Legacy Retirement** can be sequenced; inventory discovery is a Year 1 prerequisite.
- **ScadaBridge tier-1 cutover** (ScadaBridge reading from OtOpcUa instead of equipment directly) must be completed at a site before **ScadaBridge Extensions** at that site can rely on consistent equipment-data semantics for downstream Redpanda publishing.
- **Site Onboarding** for the smaller sites depends on having the **standardized stack** (OtOpcUa + ScadaBridge + Redpanda + SnowBridge) reasonably proven at the large sites — so heavy onboarding is Year 2+, not Year 1.
- **Dual-run** for any retired legacy path requires the replacement path to be live — so Legacy Retirement's execution lags the workstream that delivers the replacement (most often ScadaBridge Extensions or the SnowBridge).
## The grid

| Workstream | **Year 1 — Foundation** | **Year 2 — Scale** | **Year 3 — Completion** |
|---|---|---|---|
| **OtOpcUa** | **Evolve LmxOpcUa into OtOpcUa** — extend the existing in-house OPC UA server to add (a) a new equipment namespace that holds the single session to each piece of equipment via native device protocols translated to OPC UA, and (b) clustering on top of the existing per-node deployment. The System Platform namespace carries forward from LmxOpcUa; consumers that already use LmxOpcUa keep working. **Proactive protocol survey** across the estate — template, schema, rollup, and classification rule (core vs long-tail) live in [`current-state/equipment-protocol-survey.md`](current-state/equipment-protocol-survey.md); survey is a **Year 1 prerequisite** for core library scope, target steps 1–3 (System Platform IO / Ignition / ScadaBridge walks) done inside Q1 so the core driver build can start Q2. **Build the core driver library** for the protocols that meet the classification rule. **Deploy OtOpcUa to every site** (in place on existing System Platform nodes where LmxOpcUa already runs) as fast as practical — deployment ≠ cutover; an idle equipment namespace is cheap. **Begin tier 1 cutover (ScadaBridge)** at the large sites where we own both ends of the connection. _TBD — survey owner; first-cutover site selection._ | **Complete tier 1 (ScadaBridge)** across all sites. **Begin tier 2 (Ignition)** — Ignition consumers redirected from direct-equipment OPC UA to each site's OtOpcUa, collapsing WAN session counts from *N per equipment* to *one per site*. **Build long-tail drivers** on demand as sites require them. _TBD — per-site tier-2 rollout sequence._ | **Complete tier 2 (Ignition)** across all sites. **Execute tier 3 (Aveva System Platform IO)** with compliance stakeholder validation — the hardest cutover because System Platform IO feeds validated data collection. Reach steady state: every equipment session is held by OtOpcUa, every downstream consumer reads OT data through it. _TBD — per-equipment-class criteria for System Platform IO re-validation._ |
| **Redpanda EventHub** | Stand up central Redpanda cluster in South Bend (single-cluster HA). Stand up bundled Schema Registry. Wire SASL/OAUTHBEARER to enterprise IdP. Create initial topic set (prefix-based ACLs). Hook up observability minimum signal set. Define the three retention tiers (`operational`/`analytics`/`compliance`). **Stand up the central `schemas` repo** with `buf` CI, CODEOWNERS, and the NuGet publishing pipeline. **Publish the canonical equipment/production/event model v1** — including the canonical machine state vocabulary (`Running / Idle / Faulted / Starved / Blocked` + any agreed additions) as a Protobuf enum, the `equipment.state.transitioned` event schema, and initial equipment-class definitions for pilot equipment. This is the foundation for Digital Twin Use Cases 1 and 3 (see `goal-state.md` → Strategic Considerations → Digital twin) and is load-bearing for pillar 2. (A schema sketch follows the grid.) _TBD — sizing decisions, initial topic list, pilot equipment classes for the first canonical definition, canonical vocabulary ownership (domain SME group)._ | Expand topic coverage as additional domains onboard. Enforce tiered retention and ACLs at scale. Prove backlog replay after a WAN-outage drill (also exercises the Digital Twin Use Case 2 simulation-lite replay path). Exercise long-outage planning (ScadaBridge queue capacity vs. outage duration). Iterate the canonical model as additional equipment classes and domains onboard. _TBD — concrete drill cadence._ | Steady-state operation. Harden alerting and runbooks against the observed failure modes from Years 1–2. Canonical model is mature and covers every in-scope equipment class; schema changes are routine rather than foundational. |
| **SnowBridge** | Design and begin custom build in .NET. **Filtered, governed upload to Snowflake is the Year 1 purpose** — the service is the component that decides which topics/tags flow to Snowflake, applies the governed selection model, and writes into Snowflake. Ship an initial version with **one working source adapter** — starting with **Aveva Historian (SQL interface)** because it's central-only, exists today, and lets the workstream progress in parallel with Redpanda rather than waiting on it. First end-to-end **filtered** flow to Snowflake landing tables on a handful of priority tags. Selection model in place even if the operator UI isn't yet (config-driven is acceptable for Year 1). _TBD — team, credential management, datastore for selection state._ | Add the **ScadaBridge/Redpanda source adapter** alongside Historian. Build and ship the operator **web UI + API** on top of the Year 1 selection model, including the blast-radius-based approval workflow, audit trail, RBAC, and exportable state. Onboard priority tags per domain under the UI-driven governance path. _TBD — UI framework._ | All planned source adapters live behind the unified interface. Approval workflow tuned based on Year 2 operational experience. Feature freeze; focus on hardening. |
| **Snowflake dbt Transform Layer** | Scaffold a dbt project in git, wired to the self-hosted orchestrator (per `goal-state.md`; specific orchestrator chosen outside this plan). Build first **landing → curated** model for priority tags. **Align curated views with the canonical model v1** published in the `schemas` repo — equipment, production, and event entities in the curated layer use the canonical state vocabulary and the same event-type enum values, so downstream consumers (Power BI, ad-hoc analysts, future AI/ML) see the same shape of data Redpanda publishes. This is the dbt-side delivery for Digital Twin Use Cases 1 and 3. Establish `dbt test` discipline from day one — including tests that catch divergence between curated views and the canonical enums. _TBD — project layout (single vs per-domain); reconciliation rule if derived state in curated views disagrees with the layer-3 derivation (should not happen, but the rule needs to exist)._ | Build curated layers for all in-scope domains. **Ship a canonical-state-based OEE model** as a strong candidate for the pillar-2 "not possible before" use case — accurate cross-equipment, cross-site OEE computed once in dbt from the canonical state stream, rather than re-derived in every reporting surface. Source-freshness SLAs tied to the **≤15-minute analytics** budget. Begin development of the first **"not possible before" AI/analytics use case** (pillar 2). | The "not possible before" use case is **in production**, consuming the curated layer, meeting its own SLO. Pillar 2 check passes. |
| **ScadaBridge Extensions** | Implement **deadband / exception-based publishing** with the global-default model (+ override mechanism). Add **EventHub producer** capability with per-call **store-and-forward** to Redpanda. Verify co-located footprint doesn't degrade System Platform. _TBD — global deadband value, override mechanism location._ | Roll deadband + EventHub producer to **all currently-integrated sites**. Tune deadband and overrides based on observed Snowflake cost. Support early legacy-retirement work with outbound Web API / DB write patterns as needed. | Steady state. Any remaining Extensions work is residual cleanup or support for the tail end of Site Onboarding / Legacy Retirement. |
| **Site Onboarding** | **No new site onboardings in Year 1.** Use the year to define and document the **lightweight onboarding pattern** for smaller sites — equipment types, network requirements, standard ScadaBridge template set, standard topic/tag set. Keep the existing integrated sites stable. | **Pilot the onboarding pattern** on one smaller site end-to-end (Berlin, Winterthur, or Jacksonville — choice TBD). Use learnings to refine the pattern, then **begin scaling** onboarding to additional smaller sites. _TBD — pilot site selection criteria, per-site effort estimate._ | **Complete onboarding of all remaining smaller sites.** Every site on the authoritative list is on the standardized stack. Pillar 1 check passes. |
| **Legacy Retirement** | **Populate the legacy inventory** (`current-state/legacy-integrations.md`) — this is the prerequisite for sequencing. Identify **early-retirement candidates** where the replacement path already exists (e.g., **LEG-002 Camstar**, since ScadaBridge already has a native Camstar path). Retire at least one integration end-to-end as a pattern-proving exercise (including dual-run + decommission). _TBD — inventory ownership, discovery approach._ | **Bulk migration.** Execute retirements in sequence against the inventory, prioritized by a mix of risk and ease. Each retirement follows: plan → build replacement (often in ScadaBridge Extensions) → dual-run → cutover → decommission. Inventory burn-down tracked quarterly. _TBD — prioritization rubric, dual-run duration per integration class._ | **Drive inventory to zero.** Any remaining integrations are in dual-run or decommission phase at start of year; the inventory reaches zero by end of year. Pillar 3 check passes. |
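
The Redpanda Year 1 cell commits to publishing the canonical model v1, including the machine-state vocabulary as a Protobuf enum and the `equipment.state.transitioned` event. A minimal sketch of what that schema could look like — message and field names are assumptions; the authoritative definition lands in the central `schemas` repo under `buf` CI:

```protobuf
// Hypothetical sketch of canonical model v1 — field names and numbering are
// illustrative; the real schema lives in the central `schemas` repo with
// BACKWARD_TRANSITIVE compatibility enforced by buf CI.
syntax = "proto3";

package canonical.equipment.v1;

// Canonical machine state vocabulary (plus any agreed additions).
enum MachineState {
  MACHINE_STATE_UNSPECIFIED = 0;
  MACHINE_STATE_RUNNING = 1;
  MACHINE_STATE_IDLE = 2;
  MACHINE_STATE_FAULTED = 3;
  MACHINE_STATE_STARVED = 4;
  MACHINE_STATE_BLOCKED = 5;
}

// Published to the `equipment.state.transitioned` topic; site identity
// travels in the message, not the topic, per the EventHub naming rule.
message EquipmentStateTransitioned {
  string site = 1;
  string equipment_id = 2;
  MachineState previous_state = 3;
  MachineState new_state = 4;
  int64 transitioned_at_unix_ms = 5;
}
```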

### Cell reading guide

- Each cell describes **what happens in that workstream during that year** — not every task, just the shape.
- `_TBD_` markers inside cells are open items that don't need to be resolved today but will before execution of that cell.
- Cells deliberately omit dates beyond "Year N" until individual workstreams firm up their delivery plans. This file should **not** become a Gantt chart; it's the strategic shape, not the project plan.
## End-of-plan pillar checks

At the end of Year 3, the three pillar criteria from `goal-state.md` → Success Criteria are each **binary**. The cells above are structured so that the relevant workstream ends Year 3 having either satisfied its share of the check or not.

- **Pillar 1 — Unification:** Site Onboarding ends Year 3 with all sites on the standardized stack.
- **Pillar 2 — Analytics/AI Enablement:** dbt + SnowBridge + Redpanda end Year 3 with the "not possible before" use case in production against the ≤15-minute analytics SLO.
- **Pillar 3 — Legacy Retirement:** Legacy Retirement ends Year 3 with the inventory at zero.

If a workstream appears to be falling behind its Year 3 cell, the response is **never** to soften the end-state criterion. It is either to accelerate the workstream, reallocate from a lower-risk workstream, or formally accept slippage and adjust the plan — but the success criteria are not moved.
## Open questions

- **Starting source adapter for the SnowBridge.** Year 1 commits to **Aveva Historian (SQL interface) as the first source adapter** — it is central-only, exists today, and lets the workstream progress in parallel with Redpanda. The Redpanda/ScadaBridge source adapter follows in Year 2 once that workstream has matured. Validate with the build team only if Historian SQL read proves painful at scale.
- ~~Deadband global default value.~~ **Resolved.** The starting value is approximately **1% of span** for analogs, change-only for booleans/state, every-increment for counters — captured in `goal-state.md` under the deadband model. The build team may adjust during implementation; the mechanism is the load-bearing commitment, not the number. (A sketch of the rule follows this list.)
- **Pilot smaller-site selection.** Year 2 Site Onboarding needs a pilot site chosen early in Year 2 (or late in Year 1).
- **Quarterly milestones.** This grid is year-level only. Quarterly milestones that roll up into the three pillar checks are not yet defined — if leadership reporting needs them, they belong in a companion document, not in this grid.
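
The resolved deadband default above reduces to a small decision rule. A hypothetical sketch — the real mechanism lives in ScadaBridge (which is .NET, not Python); the types, names, and the 1% figure are illustrative starting points only:

```python
def should_publish(tag_type: str, previous: float, current: float,
                   span: float | None = None) -> bool:
    """Sketch of the resolved deadband defaults: ~1% of span for analogs,
    change-only for booleans/state, every-increment for counters. The
    per-tag override mechanism is out of scope for this sketch."""
    if tag_type == "analog":
        if span is None:
            raise ValueError("analog tags need a configured span")
        return abs(current - previous) >= 0.01 * span  # ~1% of span
    if tag_type in ("boolean", "state"):
        return current != previous                     # change-only
    if tag_type == "counter":
        return True                                    # every increment publishes
    return True                                        # unknown types: publish

# e.g. should_publish("analog", previous=41.0, current=42.5, span=100.0) -> True
```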