Integrate v2 corrections addendum — ACL committed, schemas seed, cutover ownership

B1 resolved: ACL model designed and committed (decisions #129-132).
6-level scope hierarchy, NodePermissions bitmask, generation-versioned
NodeAcl table, Phase 1 ships before any driver phase. Updated goal-state
and roadmap.

B2 partially resolved: schemas repo seed exists at schemas/ (temporary).
FANUC CNC pilot class, JSON Schema format definitions, UNS subtree
example, docs. Still needs: owner team, dedicated repo, format ratification,
CI gate, consumer integration plumbing.

C5 resolved: consumer cutover OUT of OtOpcUa v2 scope (decision #136).
Integration/operations team owns cutover, not yet named. Plan updated
to explicitly assign ownership outside OtOpcUa.

CLAUDE.md updated with schemas/ in the file index.
Joseph Doherty
2026-04-17 12:40:14 -04:00
parent 5953685ffb
commit 6b0883ff95
3 changed files with 19 additions and 4 deletions


@@ -463,7 +463,14 @@ _TBD — service name (working title only); hosting (South Bend, alongside Redpa
- **Rationale:** OPC UA is the protocol we're fronting, so the auth model stays in OPC UA's own terms. No SASL/OAUTHBEARER bridging, no custom token-exchange glue — OtOpcUa is self-contained and operable with standard OPC UA client tooling. **Inherits the LmxOpcUa auth pattern** — UserName tokens with standard OPC UA security modes/profiles — so the consumer-side experience does not change for clients that used LmxOpcUa previously, and the fold-in is an evolution rather than a rewrite.
- **Explicitly not federated with the enterprise IdP.** Unlike Redpanda (which uses SASL/OAUTHBEARER against the enterprise IdP) and SnowBridge (which uses the same IdP for RBAC), OtOpcUa does **not** pull enterprise IdP identity into the OT data access path. OT data access is a pure OT concern, and the plan's IT/OT boundary stays at ScadaBridge central — not here.
- **Trade-off accepted:** identity lifecycle (user token/cert provisioning, rotation, revocation) is managed locally in the OT estate rather than inherited from the enterprise IdP. Two identity stores to operate (enterprise IdP for IT-facing components, OPC UA-native identities for OtOpcUa) is the cost of keeping the OPC UA layer clean and self-contained.
- **ACL implementation (Year 1 deliverable — required before Tier 1 cutover).** The v2 implementation design surfaced that namespace-level ACLs are not yet modeled. The plan commits to: a per-cluster `EquipmentAcl` table (or equivalent) in the **central configuration database** mapping LDAP-group → permitted Namespace + UnsArea / UnsLine / Equipment subtree + permission level (Read / Write / AlarmAck). ACLs support four granularity levels with inheritance: Namespace → UnsArea → UnsLine → Equipment (grant at UnsArea cascades to all children unless overridden). ACLs are edited through the **Admin UI**, go through the same draft → diff → publish flow as driver/topology config, and are **generation-versioned** for auditability and rollback. The OPC UA NodeManager checks the ACL on every browse / read / write / subscribe against the connected user's LDAP group claims. **This is a substantial missing surface area that must be built before Tier 1 ScadaBridge cutover**, since the "access control / authorization chokepoint" responsibility is the plan's core promise at this layer.
- **Data-path ACL model (designed and committed — lmxopcua decisions #129–132).** The v2 implementation design has committed the full ACL model in `lmxopcua/docs/v2/acl-design.md`. Key design points:
- **`NodePermissions` bitmask:** Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall, plus common bundles (`ReadOnly` / `Operator` / `Engineer` / `Admin`).
- **6-level scope hierarchy** with default-deny + additive grants: Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag. Grant at UnsArea cascades to all children unless overridden. Browse-implication on ancestors (granting Read on a child implies Browse on its parents).
- **`NodeAcl` table is generation-versioned** (decision #130) — ACL changes go through draft → diff → publish → rollback like every other content table.
- **Cluster-create seeds default ACLs** matching the v1 LmxOpcUa LDAP-role-to-permission map (decision #131), preserving behavioral parity for v1 → v2 consumer migration.
- **Per-session permission-trie evaluator** with O(depth × group-count) cost; cache invalidated on generation-apply or LDAP group cache expiry.
- **Admin UI:** ACL tab + bulk grant + permission simulator.
- **Phasing:** Phase 1 ships the schema + Admin UI + evaluator unit tests; per-driver enforcement lands in each driver's phase (Phase 2+). **Phase 1 completes before any driver phase**, so the ACL model exists in the central config DB before any driver consumes it — satisfying the "must be working before Tier 1 cutover" timing constraint.
- _TBD — specific OPC UA security mode + profile combinations required vs allowed; where UserName credentials/certs are sourced from (local site directory, a per-site credential vault, AD/LDAP); rotation cadence; audit trail of authz decisions._
**Open questions (TBD).**
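The bitmask-plus-additive-grants model above can be sketched as follows. This is a hypothetical illustration, not the committed implementation from `acl-design.md`: the permission names follow the bitmask list, the bundle contents are an assumption, and `effective_permissions` walks the 6-level scope path root-to-leaf with default-deny and additive grants, giving the O(depth × group-count) cost noted for the evaluator.

```python
from enum import Flag, auto

class NodePermissions(Flag):
    """Atomic permissions, named after the committed bitmask."""
    NONE = 0
    BROWSE = auto()
    READ = auto()
    SUBSCRIBE = auto()
    HISTORY_READ = auto()
    WRITE_OPERATE = auto()
    WRITE_TUNE = auto()
    WRITE_CONFIGURE = auto()
    ALARM_READ = auto()
    ALARM_ACKNOWLEDGE = auto()
    ALARM_CONFIRM = auto()
    ALARM_SHELVE = auto()
    METHOD_CALL = auto()

# Illustrative bundles mirroring ReadOnly / Operator / Engineer / Admin
# (exact bundle membership is an assumption, not taken from acl-design.md).
READ_ONLY = (NodePermissions.BROWSE | NodePermissions.READ
             | NodePermissions.SUBSCRIBE | NodePermissions.HISTORY_READ
             | NodePermissions.ALARM_READ)
OPERATOR = READ_ONLY | NodePermissions.WRITE_OPERATE | NodePermissions.ALARM_ACKNOWLEDGE
ENGINEER = (OPERATOR | NodePermissions.WRITE_TUNE
            | NodePermissions.ALARM_CONFIRM | NodePermissions.ALARM_SHELVE)
ADMIN = ENGINEER | NodePermissions.WRITE_CONFIGURE | NodePermissions.METHOD_CALL

def effective_permissions(scope_path, grants, groups):
    """Default-deny with additive grants, walked root-to-leaf.

    scope_path: 6-level tuple, e.g. (cluster, namespace, area, line, equipment, tag)
    grants:     {(scope_prefix_tuple, ldap_group): NodePermissions}
    groups:     LDAP group claims of the connected session
    Cost is O(depth x group-count), matching the evaluator note above.
    (Browse-implication on ancestors is omitted from this sketch.)
    """
    perms = NodePermissions.NONE
    for depth in range(1, len(scope_path) + 1):
        prefix = scope_path[:depth]
        for group in groups:
            perms |= grants.get((prefix, group), NodePermissions.NONE)
    return perms
```

A grant at the UnsArea level cascades to every tag beneath it, because each deeper scope re-applies all ancestor grants.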
@@ -476,7 +483,7 @@ _TBD — service name (working title only); hosting (South Bend, alongside Redpa
- **Certificate-distribution pre-cutover step (B3 from v2 corrections).** Before any consumer is cut over at a site, that consumer's OPC UA certificate trust store must be populated with the target OtOpcUa cluster's **per-node certificates and ApplicationUris** (2 per cluster; at Warsaw campuses with per-building clusters, multiply by building count if the consumer needs cross-building visibility). Consumers without pre-loaded trust will fail to connect. **Once a consumer has trusted a node's `ApplicationUri`, changing that `ApplicationUri` requires the consumer to re-establish trust** — this is an OPC UA spec constraint, not an implementation choice. OtOpcUa's Admin UI auto-suggests `urn:{Host}:OtOpcUa` on node creation but warns if `Host` changes later.
- **Acceptable double-connection windows.** During each consumer's cutover, a short window of **both old direct connection and new cluster connection** existing at the same time for the same equipment is **tolerated** — it temporarily aggravates the session-load problem the cluster is meant to solve, but keeping the window short (minutes to hours, not days) bounds the exposure. Longer parallel windows are only acceptable for the System Platform cutover where compliance validation may require extended dual-run.
- **Rollback posture.** Each consumer's cutover is reversible — if the cluster misbehaves during or immediately after a cutover, the consumer falls back to direct equipment OPC UA, and the cutover is retried after the issue is understood. The old direct-connection capability is **not removed** from consumers until all three cutover tiers are complete and stable at a site.
- **Consumer cutover plan needs an owner.** The v2 OtOpcUa implementation design covers building the server, drivers, config, and Admin UI (Phases 0–5) but does **not** address consumer cutover planning. The following are unaddressed and need ownership: per-site cutover sequencing, per-equipment validation methodology (proving consumers see equivalent data through OtOpcUa), rollback procedures, coordination with Aveva for System Platform IO cutover, operational runbooks for consumer connection failures. **Either** the OtOpcUa team adds cutover phases (6/7/8) to the v2 design, **or** an integration / operations team owns the cutover plan separately — in which case this section should name them and link the doc.
- **Consumer cutover plan — owned by a separate integration / operations team (not OtOpcUa).** Per lmxopcua decision #136, consumer cutover is **out of OtOpcUa v2 scope**. The OtOpcUa team's responsibility ends at Phase 5 — all drivers built, all stability protections in place, full Admin UI shipped including the data-path ACL editor. Cutover sequencing per site, validation methodology (proving consumers see equivalent data through OtOpcUa), rollback procedures, coordination with Aveva for System Platform IO cutover (tier 3), and operational runbooks are deliverables of a separate **integration / operations team that has yet to be named**. The handoff's tier 1/2/3 sequencing (above) remains the authoritative high-level roadmap; the implementation-level cutover plan lives outside OtOpcUa's docs. _TBD — name the integration/operations team and link their cutover plan doc._
- _TBD — per-site cutover sequencing across the three tiers (all sites reach tier 1 before any reaches tier 2, or one site completes all three tiers before the next site starts), and per-equipment-class criteria for when a System Platform IO cutover requires compliance re-validation; cutover plan owner assignment._
- **Validated-data implication (E2 — Aveva pattern validation needed Year 1 or early Year 2).** System Platform's validated data collection currently uses its own IO path; moving that through OtOpcUa may require validation/re-qualification depending on the regulated context. **Year 1 or early Year 2 research deliverable:** validate with Aveva that System Platform IO drivers support upstream OPC UA-server data sources (OtOpcUa), including any restrictions on security mode, namespace structure, or session model. If Aveva's pattern requires something OtOpcUa doesn't expose, that's a long-lead-time discovery that must surface well before Year 3's Tier 3 cutover.
- **Relationship to ScadaBridge's 225k/sec ingestion ceiling** (per `current-state.md`): the cluster's aggregate throughput must be able to feed ScadaBridge at its capacity without becoming a bottleneck — sizing needs to reflect this.
@@ -564,7 +571,14 @@ The plan already delivers the infrastructure for a cross-system canonical model
This subsection makes that declaration. It is the plan's answer to **Digital Twin Use Cases 1 and 3** (see **Strategic Considerations → Digital twin**) and — independent of digital twin framing — is load-bearing for pillar 2 (analytics/AI enablement) because a canonical model is what makes "not possible before" cross-domain analytics possible at all.
> **Schemas-repo dependency is on the OtOpcUa critical path (B2 from v2 corrections).** The `schemas` repo does not exist yet. Until it does, OtOpcUa equipment configurations are hand-curated per-equipment with no class templates, no auto-generated tag lists, no cross-cluster consistency checks, and no signal-validation contract for Layer 3 state derivation. The plan commits to **schemas-repo creation as a Year 1 deliverable** (its own scope, distinct from the OtOpcUa workstream) with a **pilot equipment class (FANUC CNC)** landed in the repo before Tier 1 cutover begins. The **UNS hierarchy snapshot** (a per-site equipment-instance walk) feeds the initial schemas-repo equipment-class list and hierarchy definition. Core driver scope is already resolved by the v2 implementation team's committed driver list.
> **Schemas-repo dependency — partially resolved.** The OtOpcUa team has contributed an initial seed at [`schemas/`](../schemas/) (temporary location in the 3-year-plan repo until the dedicated `schemas` repo is created — Gitea push-to-create is disabled). The seed includes: JSON Schema format definitions (`format/equipment-class.schema.json`, `format/tag-definition.schema.json`, `format/uns-subtree.schema.json`), the **FANUC CNC pilot equipment class** (`classes/fanuc-cnc.json` — 16 signals + 3 alarm definitions + state-derivation notes), a worked UNS subtree example (`uns/example-warsaw-west.json`), and documentation (`docs/overview.md`, `docs/format-decisions.md` with 8 numbered decisions, `docs/consumer-integration.md`). The **UNS hierarchy snapshot** (a per-site equipment-instance walk) feeds the initial hierarchy definition. Core driver scope is already resolved by the v2 implementation team's committed driver list.
>
> **Still needs cross-team ownership:**
> - Name an owner team for the schemas content (it's consumed by OT and IT systems alike — OtOpcUa, Redpanda, dbt)
> - Decide whether to move to a dedicated `gitea.dohertylan.com/dohertj2/schemas` repo (proposed) or keep as a 3-year-plan sub-tree
> - Ratify or revise the 8 format decisions in `schemas/docs/format-decisions.md`
> - Establish the CI gate for JSON Schema validation
> - Decide on consumer-integration plumbing for Redpanda Protobuf code-gen and dbt macro generation per `schemas/docs/consumer-integration.md`
> **Unified Namespace framing:** this canonical model is also the plan's **Unified Namespace** (UNS) — see **Target IT/OT Integration → Unified Namespace (UNS) posture**. The UNS posture is a higher-level framing of the same mechanics described here: this section specifies the canonical model mechanically; the UNS posture explains what stakeholders asking about UNS should understand about how the plan delivers the UNS value proposition without an MQTT/Sparkplug broker.
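Pending the dedicated repo and CI gate, the schemas validation step can be sketched as a minimal structural check. This is hypothetical and deliberately simplified: the real gate would validate class files against the seed's `format/equipment-class.schema.json` with a full JSON Schema validator, and the field names here are assumptions, not the committed format.

```python
def check_equipment_class(doc: dict) -> list[str]:
    """Simplified stand-in for the JSON Schema CI gate.

    Returns a list of error strings; an empty list means the check passes.
    Assumed shape: {"class": str, "signals": [{"name": ..., "type": ...}, ...]}.
    """
    errors = []
    if not isinstance(doc.get("class"), str):
        errors.append("missing or non-string 'class'")
    signals = doc.get("signals")
    if not isinstance(signals, list) or not signals:
        errors.append("'signals' must be a non-empty array")
    else:
        for i, sig in enumerate(signals):
            if not isinstance(sig, dict) or "name" not in sig or "type" not in sig:
                errors.append(f"signals[{i}] needs 'name' and 'type'")
    return errors
```

A CI job would run a check like this over every file under `classes/` and fail the build on any non-empty error list.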