After shipping the four Phase 6 plan drafts (PRs 77-80), the adversarial-review
adjustments lived only as trailing "Review" sections. An implementer reading
Stream A would find the original unadjusted guidance, then have to cross-reference
the review to reconcile. This PR makes the plans genuinely executable:
1. Merges every ACCEPTed review finding into the actual Scope / Stream / Compliance
sections of each phase plan:
- phase-6-1: Scope table rewrite (per-capability retry, (instance,host) pipeline key,
MemoryTracking vs MemoryRecycle split, hybrid watchdog formula, demand-aware
wedge detector, generation-sealed LiteDB). Streams A/B/D + Compliance rewritten.
- phase-6-2: AuthorizationDecision tri-state, control/data-plane separation,
MembershipFreshnessInterval (15 min), AuthCacheMaxStaleness (5 min),
subscription stamp-and-reevaluate. Stream C widened to 11 OPC UA operations.
- phase-6-3: 8-state ServiceLevel matrix (OPC UA Part 5 §6.3.34-compliant),
two-layer peer probe (/healthz + UaHealthProbe), apply-lease via await using,
publish-generation fencing, InvalidTopology runtime state, ServerUriArray
self-first + peers. New Stream F (interop matrix + Galaxy failover).
- phase-6-4: DraftRevisionToken concurrency control, staged-import via
EquipmentImportBatch with user-scoped visibility, CSV header version marker,
decision-#117-aligned identifier columns, 1000-row diff cap,
decision-#139 OPC 40010 fields, Identification inherits Equipment ACL.
2. Appends decisions #143 through #162 to docs/v2/plan.md capturing the
architectural commitments the adjustments created. Each decision carries its
dated rationale so future readers know why the choice was made.
3. Scaffolds scripts/compliance/phase-6-{1,2,3,4}-compliance.ps1 — PowerShell
stubs with Assert-Todo / Assert-Pass / Assert-Fail helpers. Every check
maps to a Stream task ID from the corresponding phase plan. Currently all
checks are TODO and scripts exit 0; each implementation task is responsible
for replacing its TODO with a real check before closing that task. Saved
as UTF-8 with BOM so Windows PowerShell 5.1 parses em-dash characters
without breaking.
Net result: the Phase 6.1 plan is genuinely ready to execute. Stream A.3 can
start tomorrow without reconciling Streams vs. Review on every task; the
compliance script is wired to the Stream IDs; plan.md has the architectural
commitments that justify the Stream choices.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6.2 — Authorization Runtime (ACL + LDAP grants)
Status: DRAFT. The v2 plan.md decision #129 + acl-design.md specify a 6-level permission-trie evaluator with NodePermissions bitmask grants, but no runtime evaluator exists. ACL tables are schematized but unread by the data path.
Branch: v2/phase-6-2-authorization-runtime
Estimated duration: 2.5 weeks
Predecessor: Phase 6.1 (Resilience & Observability); reuses the Polly pipeline for ACL-cache refresh retries
Successor: Phase 6.3 (Redundancy)
Phase Objective
Wire ACL enforcement on every OPC UA Read / Write / Subscribe / Call path + LDAP group → admin role grants that the v2 plan specified but never ran. End-state: a user's effective permissions resolve through a per-session permission-trie over the 6-level Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag hierarchy, cached per session, invalidated on generation-apply + LDAP group expiry.
Closes these gaps:
- Data-path ACL enforcement: NodeAcl table + NodePermissions flags shipped; NodeAclService.cs present as a CRUD surface; no code consults ACLs at Read/Write time. OPC UA server answers everything to everyone.
- LdapGroupRoleMapping for cluster-scoped admin grants: decision #105 shipped as the design; admin roles are hardcoded (FleetAdmin / ConfigEditor / ReadOnly) with no cluster-scoping and no LDAP-to-grant table. Decision #105 explicitly lifts this from v2.1 into v2.0.
- Explicit Deny pathway: deferred to v2.1 (decision #129 note). Phase 6.2 ships grants only; Deny stays out.
- Admin UI ACL grant editor: AclsTab.razor exists but edits the now-unused NodeAcl table; needs to wire to the runtime evaluator + the new LdapGroupRoleMapping table.
Scope — What Changes
Architectural separation (critical for correctness): LdapGroupRoleMapping is control-plane only — it maps LDAP groups to Admin UI roles (FleetAdmin / ConfigEditor / ReadOnly) and cluster scopes for Admin access. It is NOT consulted by the OPC UA data-path evaluator. The data-path evaluator reads NodeAcl rows joined directly against the session's resolved LDAP group memberships. The two concerns share zero runtime code path.
| Concern | Change |
|---|---|
| Configuration project | New entity LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }. Consumed only by Admin UI role routing. Migration. Admin CRUD. |
| Core → new Core.Authorization sub-namespace | IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, OpcUaOperation op, NodeId nodeId) → AuthorizationDecision. op covers every OPC UA surface: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve. Result is tri-state (internal model distinguishes Allow / NotGranted / Denied + carries matched-grant provenance). Phase 6.2 only produces Allow + NotGranted; v2.1 Deny lands without API break. |
| PermissionTrieBuilder | Builds trie from NodeAcl rows joined against resolved LDAP group memberships, keyed on 6-level scope hierarchy for Equipment namespaces. SystemPlatform namespaces (Galaxy) use a FolderSegment scope level between Namespace and Tag, populated from Tag.FolderPath segments, so folder subtree authorization works on Galaxy trees the same way UNS works on Equipment trees. Trie node carries ScopeKind enum. |
| PermissionTrieCache + freshness | One trie per (ClusterId, GenerationId). Invalidated on sp_PublishGeneration via in-process event bus AND generation-ID check on hot path: every authz call looks up CurrentGenerationId (Polly-wrapped, sub-second cache); a Backup that cached a stale generation detects the mismatch + forces re-load. Redundancy-safe. |
| UserAuthorizationState freshness | Cached per session BUT bounded by MembershipFreshnessInterval (default 15 min). Past that, the next hot-path authz call re-resolves LDAP group memberships via LdapGroupService. Failure to re-resolve (LDAP unreachable) → fail-closed: evaluator returns NotGranted for every call until memberships refresh successfully. Decoupled from Phase 6.1's availability-oriented 24h cache. |
| AuthCacheMaxStaleness | Separate from Phase 6.1's UsingStaleConfig window. Default 5 min; beyond that, authz fails closed regardless of Phase 6.1 cache warmth. |
| OPC UA server dispatch — all enforcement surfaces | DriverNodeManager wires evaluator on: Browse + TranslateBrowsePathsToNodeIds (ancestors implicitly visible if any descendant has a grant; denied ancestors filter from results), Read (per-attribute StatusCode BadUserAccessDenied in mixed-authorization batches; batch never poisons), Write (uses NodePermissions.WriteOperate/Tune/Configure based on driver SecurityClassification), HistoryRead (uses NodePermissions.HistoryRead — distinct flag, not Read), HistoryUpdate (NodePermissions.HistoryUpdate), CreateMonitoredItems (per-MonitoredItemCreateResult denial), TransferSubscriptions (re-evaluates items on transfer), Call (NodePermissions.MethodCall), Acknowledge/Confirm/Shelve (per-alarm flags). |
| Subscription re-authorization | Each MonitoredItem is stamped with (AuthGenerationId, MembershipVersion) at create time. On every Publish, items with a stamp mismatching the session's current (AuthGenerationId, MembershipVersion) get re-evaluated; revoked items drop to BadUserAccessDenied within one publish cycle. Unchanged items stay fast-path. |
| LdapAuthService | On cookie-auth success: resolves LDAP group memberships; loads matching LdapGroupRoleMapping rows → role claims + cluster-scope claims (control plane); stores UserAuthorizationState.LdapGroups on the session for the data-plane evaluator. |
| ValidatedNodeAclAuthoringService | Replaces CRUD-only NodeAclService for authoring. Validates (LDAP group exists, scope exists in current or target draft, grant shape is valid, no duplicate (LdapGroup, Scope) pair). Admin UI writes only through it. |
| Admin UI AclsTab.razor | Writes via ValidatedNodeAclAuthoringService. Adds Probe-This-Permission row that runs the real evaluator against a chosen (LDAP group, node, operation) and shows Allow / NotGranted + matched-grant provenance. |
| Admin UI new tab RoleGrantsTab.razor | CRUD over LdapGroupRoleMapping. Per-cluster + system-wide grants. FleetAdmin only. Documentation explicit that this only affects Admin UI access, not OPC UA data plane. |
| Audit log | Every Grant/Revoke/Publish on LdapGroupRoleMapping or NodeAcl writes an AuditLog row with old/new state + user. |
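The tri-state result described in the Scope table could look roughly like the sketch below. Only the names AuthorizationDecision, Allow, NotGranted, Denied, MatchedGrant, and OpcUaOperation come from this plan; the AuthorizationVerdict enum, the MatchedGrant fields, and the factory methods are illustrative assumptions, not the shipped API.

```csharp
using System;
using System.Collections.Generic;

// Operation list taken from the Scope table; enum layout is an assumption.
public enum OpcUaOperation
{
    Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems,
    TransferSubscriptions, Call, Acknowledge, Confirm, Shelve
}

// Hypothetical name for the three outcomes the plan lists.
public enum AuthorizationVerdict { Allow, NotGranted, Denied }

// Assumed provenance shape: which group's grant matched at which scope.
public sealed record MatchedGrant(string LdapGroup, string Scope);

public sealed record AuthorizationDecision(
    AuthorizationVerdict Verdict,
    IReadOnlyList<MatchedGrant> Provenance)
{
    public bool IsAllowed => Verdict == AuthorizationVerdict.Allow;

    public static AuthorizationDecision Allow(IReadOnlyList<MatchedGrant> provenance) =>
        new(AuthorizationVerdict.Allow, provenance);

    // Phase 6.2 default: no matching grant yields NotGranted, never Denied.
    public static AuthorizationDecision NotGranted() =>
        new(AuthorizationVerdict.NotGranted, Array.Empty<MatchedGrant>());
}
```

Because Denied already exists in the enum, a later Deny feature would only need a new factory and evaluator branch; callers that check IsAllowed would not change.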
Scope — What Does NOT Change
| Item | Reason |
|---|---|
| OPC UA authn | Already done (PR 19 LDAP user identity + Basic256Sha256 profile). Phase 6.2 is authorization only. |
| Explicit Deny grants | Decision #129 note explicitly defers to v2.1. Default-deny + additive grants only. |
| Driver-side SecurityClassification metadata | Drivers keep reporting Operate / ViewOnly / etc.; the evaluator uses them as part of the decision but doesn't replace them. |
| Galaxy namespace (SystemPlatform kind) | UNS levels don't apply; evaluator treats Galaxy nodes as Cluster → Namespace → FolderSegment → Tag (skips UnsArea/UnsLine/Equipment). |
Entry Gate Checklist
- Phase 6.1 merged (reuse Core.Resilience Polly pipeline for the ACL cache-refresh retries)
- acl-design.md re-read in full
- Decision log #105, #129, corrections-doc B1 re-skimmed
- Existing NodeAcl + NodePermissions flag enum audited; confirm bitmask flags match acl-design.md table
- Existing LdapAuthService group-resolution code path traced end-to-end; confirm it already queries group memberships (we only need the caller to consume the result)
- Test DB scenarios catalogued: two clusters, three LDAP groups per cluster, mixed grant shapes; captured as seed-data fixtures
Task Breakdown
Stream A — LdapGroupRoleMapping table + migration (3 days)
- A.1 Entity + EF Core migration. Columns per §Scope table. Unique constraint on (LdapGroup, ClusterId) with null-tolerant comparer for the system-wide case. Index on LdapGroup for the hot-path lookup on auth.
- A.2 ILdapGroupRoleMappingService CRUD. Wrap in the Phase 6.1 Polly pipeline (timeout → retry → fallback-to-cache).
- A.3 Seed-data migration: preserve the current hardcoded FleetAdmin / ConfigEditor / ReadOnly mappings by seeding rows for the existing LDAP groups the dev box uses (cn=fleet-admin,…, cn=config-editor,…, cn=read-only,…). Operationally a no-op migration for existing deployments.
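As an illustration of the null-tolerant uniqueness rule in A.1, here is a minimal in-memory sketch. It assumes a null ClusterId means system-wide and that two system-wide rows for the same group must collide. The row type, the MappingUniqueness helper, and the case-insensitive group comparison are hypothetical; the real constraint would live in the EF Core migration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Simplified stand-in for the entity; only the key columns matter here.
public sealed record LdapGroupRoleMappingRow(string LdapGroup, string Role, Guid? ClusterId);

public static class MappingUniqueness
{
    // Returns the first row whose (LdapGroup, ClusterId) key repeats an earlier
    // row, or null when every key is unique. Tuple equality treats two null
    // ClusterIds as equal, which is the "null-tolerant" behaviour sketched here.
    public static LdapGroupRoleMappingRow? FindDuplicate(IEnumerable<LdapGroupRoleMappingRow> rows)
    {
        var seen = new HashSet<(string Group, Guid? Cluster)>();
        foreach (var row in rows)
        {
            // Assumption: LDAP group names compare case-insensitively.
            var key = (row.LdapGroup.ToUpperInvariant(), row.ClusterId);
            if (!seen.Add(key)) return row;
        }
        return null;
    }
}
```

Whether the database enforces the same behaviour depends on the provider: some engines treat NULLs in a unique index as distinct, so an equivalent check in the authoring service may still be worth keeping.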
Stream B — Permission-trie evaluator (1 week)
- B.1 IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, OpcUaOperation op, NodeId nodeId) returns AuthorizationDecision. Phase 6.2 produces only Allow and NotGranted; the Denied variant exists in the type but is never produced, so v2.1 Deny can land without an API break.
- B.2 PermissionTrieBuilder builds the trie from NodeAcl rows joined against the session's resolved LDAP group memberships and the current generation's UnsArea + UnsLine + Equipment + Tag tables. LdapGroupRoleMapping is control-plane only and is never read here. One trie per (ClusterId, GenerationId) so rollback doesn't smear permissions across generations.
- B.3 Trie node structure: { Level: enum, ScopeId: Guid, AllowedPermissions: NodePermissions, ChildrenByLevel: Dictionary<Guid, TrieNode> }. Evaluation walks from Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag, ORing allowed permissions at each level. Additive semantics: a grant at Cluster level cascades to every descendant tag.
- B.4 PermissionTrieCache service scoped as singleton; exposes GetTrieAsync(ClusterId, ct) that returns the current-generation trie. Invalidated on sp_PublishGeneration via an in-process event bus; also on TTL expiry (24 h safety net). The hot path additionally checks CurrentGenerationId from the shared config DB (Polly-wrapped, sub-second cache), so a Backup holding a stale generation forces a re-load.
- B.5 Per-session cached evaluator: OPC UA Session authentication produces UserAuthorizationState { ClusterId, LdapGroups[], Trie }; cached on the session until session close or generation-apply, and bounded by MembershipFreshnessInterval (default 15 min).
- B.6 Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → NotGranted, (e) Galaxy nodes skip UnsArea/UnsLine levels and resolve through FolderSegment levels instead.
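The additive walk from B.3 can be sketched as follows. This is a simplified illustration, not the planned implementation: the NodePermissions bit values and the reduced flag set are assumptions, the node omits Level and ScopeId, and TrieWalk is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Subset of the flag enum with assumed bit values, for illustration only.
[Flags]
public enum NodePermissions { None = 0, Read = 1, WriteOperate = 2, HistoryRead = 4 }

public sealed class TrieNode
{
    public NodePermissions Allowed { get; set; } = NodePermissions.None;
    public Dictionary<Guid, TrieNode> Children { get; } = new();
}

public static class TrieWalk
{
    // Walks from the root (Cluster level) down the scope path, ORing the grants
    // found at each level. A missing child ends the walk: nothing deeper can
    // hold a grant, and everything accumulated so far still applies.
    public static NodePermissions Effective(TrieNode root, IReadOnlyList<Guid> scopePath)
    {
        var granted = root.Allowed;
        var node = root;
        foreach (var scopeId in scopePath)
        {
            if (!node.Children.TryGetValue(scopeId, out var child)) break;
            granted |= child.Allowed;
            node = child;
        }
        return granted;
    }

    // True only when every needed bit is present in the granted set.
    public static bool Has(NodePermissions granted, NodePermissions needed) =>
        (granted & needed) == needed;
}
```

With this shape, a Read grant on the root reaches every descendant, while a WriteOperate grant on one Equipment node stays on that branch, which is what unit-test cases (a) and (b) in B.6 check.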
Stream C — OPC UA server dispatch wiring (6 days, widened)
- C.1 DriverNodeManager.Read: evaluator consulted per ReadValueId with OpcUaOperation.Read. Denied attributes get BadUserAccessDenied per-item; batch never poisons. Integration test covers mixed-authorization batch (3 authorized + 2 denied → 3 Good values + 2 Bad StatusCodes, request completes).
- C.2 DriverNodeManager.Write: evaluator chooses NodePermissions.WriteOperate / WriteTune / WriteConfigure based on the driver-reported SecurityClassification.
- C.3 DriverNodeManager.HistoryRead: uses NodePermissions.HistoryRead, which is a distinct flag from Read. Test: user with Read but not HistoryRead can read live values but gets BadUserAccessDenied on HistoryRead.
- C.4 DriverNodeManager.HistoryUpdate: uses NodePermissions.HistoryUpdate.
- C.5 DriverNodeManager.CreateMonitoredItems: per-MonitoredItemCreateResult denial in mixed-authorization batch; partial success path per OPC UA Part 4. Each created item stamped (AuthGenerationId, MembershipVersion).
- C.6 DriverNodeManager.TransferSubscriptions: on reconnect, re-evaluate every transferred MonitoredItem against the session's current auth state. Stale-stamp items drop to BadUserAccessDenied.
- C.7 Browse + TranslateBrowsePathsToNodeIds: evaluator called with OpcUaOperation.Browse. Ancestor visibility implied when any descendant has a grant (per acl-design.md §Browse). Denied ancestors filter from browse results; the UA browser sees a hierarchy truncated at the denied ancestor rather than an inconsistent child-without-parent view.
- C.8 DriverNodeManager.Call: NodePermissions.MethodCall.
- C.9 Alarm actions (Acknowledge / Confirm / Shelve): per-alarm NodePermissions.AlarmAck / AlarmConfirm / AlarmShelve.
- C.10 Publish path: for each MonitoredItem with a mismatched (AuthGenerationId, MembershipVersion) stamp, re-evaluate. Unchanged items stay fast-path; changes happen at next publish cycle.
- C.11 Integration tests: three-user seed with different memberships; matrix covers every operation in §Scope. Mixed-batch tests for Read + CreateMonitoredItems.
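The stamp-and-reevaluate step from C.5 and C.10 might be sketched like this. The AuthStamp, MonitoredItemAuth, and PublishReauthorizer types are hypothetical, the long-typed IDs are an assumption, and the isStillAllowed delegate stands in for a real evaluator call.

```csharp
using System;
using System.Collections.Generic;

// Assumed representation of the (AuthGenerationId, MembershipVersion) stamp.
public readonly record struct AuthStamp(long AuthGenerationId, long MembershipVersion);

public sealed class MonitoredItemAuth
{
    public MonitoredItemAuth(string nodeId, AuthStamp stamp)
    {
        NodeId = nodeId;
        Stamp = stamp;
    }

    public string NodeId { get; }
    public AuthStamp Stamp { get; set; }
    public bool AccessDenied { get; set; }
}

public static class PublishReauthorizer
{
    // Re-evaluates only items whose stamp differs from the session's current
    // stamp, then re-stamps them. Returns how many items were re-evaluated.
    public static int Reevaluate(
        IEnumerable<MonitoredItemAuth> items,
        AuthStamp current,
        Func<string, bool> isStillAllowed)
    {
        var reevaluated = 0;
        foreach (var item in items)
        {
            if (item.Stamp == current) continue; // fast path: nothing changed
            item.AccessDenied = !isStillAllowed(item.NodeId);
            item.Stamp = current;
            reevaluated++;
        }
        return reevaluated;
    }
}
```

Re-stamping after evaluation keeps the next publish cycle on the fast path, so the per-item cost is paid once per generation or membership change rather than once per publish.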
Stream D — Admin UI refresh (4 days)
- D.1 RoleGrantsTab.razor: FleetAdmin-gated CRUD on LdapGroupRoleMapping. Per-cluster dropdown + system-wide checkbox. Validation: LDAP group must exist in the dev LDAP (GLAuth) before saving; best-effort probe with graceful degradation.
- D.2 AclsTab.razor rewrites its edit path to write through the new ValidatedNodeAclAuthoringService. Adds a "Probe this permission" row: choose (LDAP group, node, operation) → shows Allow / NotGranted + the reason (which grant matched).
- D.3 Draft-generation diff viewer now includes an ACL section: "X grants added, Y grants removed, Z grants changed."
- D.4 SignalR notification: PermissionTrieCache invalidation on sp_PublishGeneration pushes to Admin UI so operators see "this cluster's permissions were just updated" within 2 s.
Compliance Checks (run at exit gate)
- Control/data-plane separation: LdapGroupRoleMapping consumed only by Admin UI; the data-path evaluator has zero references to it. Enforced via a project-reference audit (Admin project references the mapping service; Core.Authorization does not).
- Every operation wired: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve all consult the evaluator. Integration test matrix covers every operation × allow/deny.
- HistoryRead uses its own flag: test "user with Read + no HistoryRead gets BadUserAccessDenied on HistoryRead".
- Mixed-batch semantics: Read of 5 nodes (3 allowed + 2 denied) returns 3 Good + 2 BadUserAccessDenied per-ReadValueId; CreateMonitoredItems equivalent.
- Browse ancestor visibility: user with a grant only on a deep equipment node can browse the path to it (ancestors implied); denied ancestors filter from browse results otherwise.
- Galaxy FolderSegment coverage: a grant on a Galaxy folder subtree cascades to its tags; sibling folders are unaffected. Trie test covers this.
- Subscription re-authorization: integration test that creates an item and revokes the grant via draft+publish; on the next publish cycle the item returns BadUserAccessDenied (not silently still-notifying).
- Membership freshness: test where the 15 min MembershipFreshnessInterval elapses on a long-lived session + LDAP is now unreachable → authz fails closed on the next request until LDAP recovers.
- Auth cache fail-closed: test where the Phase 6.1 cache serves stale config for 6 min; authz evaluator refuses all calls after 5 min regardless.
- Trie invariants: PermissionTrieBuilder is idempotent (build twice with identical inputs → equal tries).
- Additive grants + cluster isolation: cluster-grant cascades; cross-cluster leakage impossible.
- Redundancy-safe invalidation: integration test with two nodes, a publish on one, and an authorize request on the other before the in-process event propagates → generation-mismatch forces re-load, no stale decision.
- Authoring validation: AclsTab cannot save a (LdapGroup, Scope) pair that already exists in the draft; operator sees the validation error pre-save.
- AuthorizationDecision shape stability: API surface exposes Allow + NotGranted only; Denied variant exists in the type but is never produced; v2.1 can add Deny without API break.
- No regression in driver test counts.
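The membership-freshness check above exercises a fail-closed gate; a minimal sketch of such a gate follows. MembershipFreshnessGate is a hypothetical name, the clock is passed in explicitly to keep the example testable, and the tryResolve delegate stands in for the LDAP re-resolution call.

```csharp
using System;

public sealed class MembershipFreshnessGate
{
    private readonly TimeSpan _interval;
    private DateTime _lastResolvedUtc;

    public MembershipFreshnessGate(TimeSpan interval, DateTime resolvedAtUtc)
    {
        _interval = interval;
        _lastResolvedUtc = resolvedAtUtc;
    }

    // Returns true when the session's memberships may be used for this call.
    // tryResolve runs only once the cached memberships are older than the
    // interval; if it reports failure (e.g. LDAP unreachable) the gate stays
    // closed and the caller should answer NotGranted.
    public bool CanAuthorize(DateTime nowUtc, Func<bool> tryResolve)
    {
        if (nowUtc - _lastResolvedUtc <= _interval) return true;
        if (!tryResolve()) return false; // fail closed
        _lastResolvedUtc = nowUtc;
        return true;
    }
}
```

A failed re-resolution leaves the timestamp untouched, so every later call retries until LDAP recovers, matching the "fails closed until memberships refresh" wording in the Scope table.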
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| ACL evaluator latency on per-read hot path | Medium | High | Trie lookup is O(depth) = O(6); session-cached UserAuthorizationState avoids per-Read trie rebuild; benchmark in Stream B.6 |
| Trie cache stale after a rollback | Medium | High | sp_PublishGeneration + sp_RollbackGeneration both emit the invalidation event; trie keyed on (ClusterId, GenerationId) so rollback fetches the prior trie cleanly |
| BadUserAccessDenied returns expose sensitive browse-name metadata | Low | Medium | Server returns only the status code + NodeId; no message leak per OPC UA Part 4 §7.34 guidance |
| LdapGroupRoleMapping migration breaks existing deployments | Low | High | Seed-migration preserves the hardcoded groups' effective grants verbatim; smoke test exercises the post-migration fleet admin login |
| Deny semantics accidentally ship (would break acl-design.md defer) | Low | Medium | AuthorizationDecision carries a Denied variant, but the Phase 6.2 evaluator never produces it; the shape-stability compliance check asserts that only Allow + NotGranted appear. Producing Denied is a v2.1 ticket |
Completion Checklist
- Stream A: LdapGroupRoleMapping entity + migration + CRUD + seed
- Stream B: evaluator + trie builder + cache + per-session state + unit tests
- Stream C: OPC UA dispatch wiring on Read/Write/HistoryRead/Subscribe/Alarm paths
- Stream D: Admin UI RoleGrantsTab + AclsTab refresh + SignalR invalidation
- phase-6-2-compliance.ps1 exits 0; exit-gate doc recorded
Adversarial Review — 2026-04-19 (Codex, thread 019da48d-0d2b-7171-aed2-fc05f1f39ca3)
- Crit · ACCEPT: Trie must not conflate LdapGroupRoleMapping (control-plane admin claims per decision #105) with data-plane ACLs (decision #129). Change: LdapGroupRoleMapping is consumed only by the Admin UI role router. Data-plane trie reads NodeAcl rows joined against the session's resolved LDAP groups, never admin roles. Stream B.2 updated.
- Crit · ACCEPT: Cached UserAuthorizationState survives LDAP group changes because memberships only refresh at cookie-auth. Change: add MembershipFreshnessInterval (default 15 min); past that, next hot-path authz call forces group re-resolution (fail-closed if LDAP unreachable). Session-close-wins on config-rollback.
- High · ACCEPT: Node-local invalidation doesn't extend across redundant pair. Change: trie keyed on (ClusterId, GenerationId); hot-path authz looks up CurrentGenerationId from the shared config DB (Polly-wrapped + sub-second cache). A Backup that read stale generation gets a mismatched trie → forces re-load. Implementation note added to Stream B.4.
- High · ACCEPT: Browse enforcement missing. Change: new Stream C.7 (Browse + TranslateBrowsePathsToNodeIds enforcement). Ancestor visibility implied when any descendant has a grant; denied ancestors filter from browse results per acl-design.md §Browse.
- High · ACCEPT: HistoryRead should use NodePermissions.HistoryRead bit, not Read. Change: Stream C.3 revised; separate unit test asserts Read+no-HistoryRead denies HistoryRead while allowing current-value reads.
- High · ACCEPT: Galaxy shallow-path (Cluster→Namespace→Tag) loses folder hierarchy authorization. Change: SystemPlatform namespaces use a FolderSegment scope-level between Namespace and Tag, populated from Tag.FolderPath; UNS-kind namespaces keep the 6-level hierarchy. Trie supports both via ScopeKind on each node.
- High · ACCEPT: Subscription re-authorization policy unresolved between create-time-only (fast, wrong on revoke) and per-publish (slow). Change: stamp each MonitoredItem with (AuthGenerationId, MembershipVersion); re-evaluate on Publish only when either version changed. Revoked items drop to BadUserAccessDenied within one publish cycle.
- Med · ACCEPT: Mixed-authorization batch Read / CreateMonitoredItems service-result semantics underspecified. Change: Stream C.6 explicitly tests per-ReadValueId + per-MonitoredItemCreateResult denial in mixed batches; batch never collapses to a coarse failure.
- Med · ACCEPT: Missing surfaces: Method.Call, HistoryUpdate, event filter on subscriptions, subscription-transfer on reconnect, alarm-ack. Change: scope expanded; every OPC UA authorization surface enumerated in Stream C: Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge/Confirm/Shelve, Browse, TranslateBrowsePathsToNodeIds.
- Med · ACCEPT: bool evaluator bakes in grant-only semantics; collides with v2.1 Deny. Change: internal model uses AuthorizationDecision { Allow | NotGranted | Denied, IReadOnlyList<MatchedGrant> Provenance }. Phase 6.2 maps Denied → never produced; UI + audit log use the full record so v2.1 Deny lands without API break.
- Med · ACCEPT: 6.1 cache fallback is availability-oriented; applying it to auth is correctness-dangerous. Change: auth-specific staleness budget AuthCacheMaxStaleness (default 5 min, not 24 h). Past that, hot-path evaluator fails closed on cached reads; all authorization calls return NotGranted until fresh data lands. Documented in risks + compliance.
- Low · ACCEPT: Existing NodeAclService is raw CRUD. Change: new ValidatedNodeAclAuthoringService enforces scope-uniqueness + draft/publish invariants + rejects invalid (LDAP group, scope) pairs; Admin UI writes through it only. Stream D.2 adjusted.