lmxopcua/docs/v2/implementation/phase-6-2-authorization-runtime.md
Joseph Doherty bd53ebd192 Phase 6.2 exit gate — compliance script real-checks + phase doc = SHIPPED (core)
scripts/compliance/phase-6-2-compliance.ps1 replaces the stub TODOs with 23
real checks spanning:
- Stream A: LdapGroupRoleMapping entity + AdminRole enum + ILdapGroupRoleMappingService
  + impl + write-time invariant + EF migration all present.
- Stream B: OpcUaOperation enum + NodeScope + AuthorizationDecision tri-state
  + IPermissionEvaluator + PermissionTrie + Builder + Cache keyed on
  GenerationId + UserAuthorizationState with MembershipFreshnessInterval=15m
  and AuthCacheMaxStaleness=5m + TriePermissionEvaluator + HistoryRead uses
  its own flag.
- Control/data-plane separation: the evaluator + trie + cache + builder +
  interface all have zero references to LdapGroupRoleMapping (decision #150).
- Stream C foundation: ILdapGroupsBearer + AuthorizationGate with StrictMode
  knob. DriverNodeManager dispatch-path wiring (11 surfaces) is Deferred,
  tracked as task #143.
- Stream D data layer: ValidatedNodeAclAuthoringService + exception type +
  rejects None permissions. Blazor UI pieces (RoleGrantsTab, AclsTab,
  SignalR invalidation, draft diff) are Deferred, tracked as task #144.
- Cross-cutting: full solution dotnet test runs; 1097 >= 1042 baseline;
  tolerates the one pre-existing Client.CLI Subscribe flake.

IPermissionEvaluator doc-comment reworded to avoid mentioning the literal
type name "LdapGroupRoleMapping" — the compliance check does a text-absence
sweep for that identifier across the data-plane files.
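The text-absence sweep described above can be approximated in a few lines. This is a hedged Python sketch, not the actual phase-6-2-compliance.ps1 logic. The function names and the plain-substring rule are assumptions. The sketch reports every (file, line) that mentions the forbidden identifier and maps the result to an exit code. Under this rule, a doc-comment that merely names the type would still turn the check red.

```python
from pathlib import Path

FORBIDDEN = "LdapGroupRoleMapping"


def find_references(sources, identifier=FORBIDDEN):
    """Return (file name, 1-based line) for each line that contains identifier.

    sources maps file name -> file text. A plain substring match also flags
    comments and longer names that embed the identifier.
    """
    hits = []
    for name, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if identifier in line:
                hits.append((name, lineno))
    return hits


def main(paths):
    """Return a process exit code: 0 when no data-plane file mentions FORBIDDEN, else 1."""
    found = find_references({str(p): Path(p).read_text(encoding="utf-8") for p in paths})
    for name, lineno in found:
        print(f"{name}:{lineno}: references {FORBIDDEN}")
    return 1 if found else 0
```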

docs/v2/implementation/phase-6-2-authorization-runtime.md status updated from
DRAFT to SHIPPED (core). Two deferred follow-ups explicitly called out so
operators see what's still pending for the "Phase 6.2 fully wired end-to-end"
milestone.

`Phase 6.2 compliance: PASS` — exit 0. Any regression that deletes a class
or re-introduces an LdapGroupRoleMapping reference into the data-plane
evaluator turns a green check red + exit non-zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:45:58 -04:00


Phase 6.2 — Authorization Runtime (ACL + LDAP grants)

Status: SHIPPED (core) 2026-04-19 — Streams A, B, C (foundation), D (data layer) merged to v2 across PRs #84-87. Final exit-gate PR #88 turns the compliance stub into real checks (all pass, 2 deferred surfaces tracked).

Deferred follow-ups (tracked separately):

  • Stream C dispatch wiring on the 11 OPC UA operation surfaces (task #143).
  • Stream D Admin UI — RoleGrantsTab, AclsTab Probe-this-permission, SignalR invalidation, draft-diff ACL section + visual-compliance reviewer signoff (task #144).

Baseline pre-Phase-6.2: 1042 solution tests → post-Phase-6.2 core: 1097 passing (+55 net). One pre-existing Client.CLI Subscribe flake unchanged.

Branch: v2/phase-6-2-authorization-runtime
Estimated duration: 2.5 weeks
Predecessor: Phase 6.1 (Resilience & Observability); reuses the Polly pipeline for ACL-cache refresh retries
Successor: Phase 6.3 (Redundancy)

Phase Objective

Wire ACL enforcement on every OPC UA Read / Write / Subscribe / Call path, plus the LDAP group → admin role grants that the v2 plan specified but never implemented. End-state: a user's effective permissions resolve through a per-session permission-trie over the 6-level Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag hierarchy, cached per session, invalidated on generation-apply + LDAP group expiry.

Closes these gaps:

  1. Data-path ACL enforcement: the NodeAcl table + NodePermissions flags shipped and NodeAclService.cs exists as a CRUD surface, but no code consults ACLs at Read/Write time. The OPC UA server answers everything to everyone.
  2. LdapGroupRoleMapping for cluster-scoped admin grants: decision #105 shipped only as a design; admin roles are hardcoded (FleetAdmin / ConfigEditor / ReadOnly) with no cluster-scoping and no LDAP-to-grant table. Decision #105 explicitly lifts this from v2.1 into v2.0.
  3. Explicit Deny pathway: deferred to v2.1 (decision #129 note). Phase 6.2 ships grants only; Deny stays out.
  4. Admin UI ACL grant editor: AclsTab.razor exists but edits the NodeAcl table, which nothing reads at runtime; it needs to be wired to the runtime evaluator + the new LdapGroupRoleMapping table.

Scope — What Changes

Architectural separation (critical for correctness): LdapGroupRoleMapping is control-plane only — it maps LDAP groups to Admin UI roles (FleetAdmin / ConfigEditor / ReadOnly) and cluster scopes for Admin access. It is NOT consulted by the OPC UA data-path evaluator. The data-path evaluator reads NodeAcl rows joined directly against the session's resolved LDAP group memberships. The two concerns share zero runtime code path.

| Concern | Change |
| --- | --- |
| Configuration project | New entity LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }. Consumed only by Admin UI role routing. Migration. Admin CRUD. |
| Core → new Core.Authorization sub-namespace | IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, OpcUaOperation op, NodeId nodeId) → AuthorizationDecision. op covers every OPC UA surface: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve. Result is tri-state (internal model distinguishes Allow / NotGranted / Denied + carries matched-grant provenance). Phase 6.2 only produces Allow + NotGranted; v2.1 Deny lands without API break. |
| PermissionTrieBuilder | Builds trie from NodeAcl rows joined against resolved LDAP group memberships, keyed on 6-level scope hierarchy for Equipment namespaces. SystemPlatform namespaces (Galaxy) use a FolderSegment scope level between Namespace and Tag, populated from Tag.FolderPath segments, so folder subtree authorization works on Galaxy trees the same way UNS works on Equipment trees. Trie node carries ScopeKind enum. |
| PermissionTrieCache + freshness | One trie per (ClusterId, GenerationId). Invalidated on sp_PublishGeneration via in-process event bus AND a generation-ID check on the hot path: every authz call looks up CurrentGenerationId (Polly-wrapped, sub-second cache); a Backup that cached a stale generation detects the mismatch + forces re-load. Redundancy-safe. |
| UserAuthorizationState freshness | Cached per session BUT bounded by MembershipFreshnessInterval (default 15 min). Past that, the next hot-path authz call re-resolves LDAP group memberships via LdapGroupService. Failure to re-resolve (LDAP unreachable) → fail-closed: evaluator returns NotGranted for every call until memberships refresh successfully. Decoupled from Phase 6.1's availability-oriented 24h cache. |
| AuthCacheMaxStaleness | Separate from Phase 6.1's UsingStaleConfig window. Default 5 min; beyond that, authz fails closed regardless of Phase 6.1 cache warmth. |
| OPC UA server dispatch (all enforcement surfaces) | DriverNodeManager wires the evaluator on: Browse + TranslateBrowsePathsToNodeIds (ancestors implicitly visible if any descendant has a grant; denied ancestors filter from results), Read (per-attribute StatusCode BadUserAccessDenied in mixed-authorization batches; batch never poisons), Write (uses NodePermissions.WriteOperate/Tune/Configure based on driver SecurityClassification), HistoryRead (uses NodePermissions.HistoryRead, a flag distinct from Read), HistoryUpdate (NodePermissions.HistoryUpdate), CreateMonitoredItems (per-MonitoredItemCreateResult denial), TransferSubscriptions (re-evaluates items on transfer), Call (NodePermissions.MethodCall), Acknowledge/Confirm/Shelve (per-alarm flags). |
| Subscription re-authorization | Each MonitoredItem is stamped with (AuthGenerationId, MembershipVersion) at create time. On every Publish, items with a stamp mismatching the session's current (AuthGenerationId, MembershipVersion) get re-evaluated; revoked items drop to BadUserAccessDenied within one publish cycle. Unchanged items stay fast-path. |
| LdapAuthService | On cookie-auth success: resolves LDAP group memberships; loads matching LdapGroupRoleMapping rows → role claims + cluster-scope claims (control plane); stores UserAuthorizationState.LdapGroups on the session for the data-plane evaluator. |
| ValidatedNodeAclAuthoringService | Replaces CRUD-only NodeAclService for authoring. Validates (LDAP group exists, scope exists in current or target draft, grant shape is valid, no duplicate (LdapGroup, Scope) pair). Admin UI writes only through it. |
| Admin UI AclsTab.razor | Writes via ValidatedNodeAclAuthoringService. Adds Probe-This-Permission row that runs the real evaluator against a chosen (LDAP group, node, operation) and shows Allow / NotGranted + matched-grant provenance. |
| Admin UI new tab RoleGrantsTab.razor | CRUD over LdapGroupRoleMapping. Per-cluster + system-wide grants. FleetAdmin only. Documentation states explicitly that this affects only Admin UI access, not the OPC UA data plane. |
| Audit log | Every Grant/Revoke/Publish on LdapGroupRoleMapping or NodeAcl writes an AuditLog row with old/new state + user. |
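The freshness rules in the table (MembershipFreshnessInterval, AuthCacheMaxStaleness) and the tri-state result can be sketched as a gate that runs before the trie walk. This is a minimal Python sketch under stated assumptions. SessionAuthState, freshness_gate, and the resolve_groups callback are hypothetical stand-ins, and Python enum/dataclass types take the place of the C# AuthorizationDecision record. Only the two default durations come from this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

# Defaults quoted from the scope table; everything else here is illustrative.
MEMBERSHIP_FRESHNESS_INTERVAL = timedelta(minutes=15)
AUTH_CACHE_MAX_STALENESS = timedelta(minutes=5)


class Verdict(Enum):
    ALLOW = "Allow"
    NOT_GRANTED = "NotGranted"
    DENIED = "Denied"  # present in the type, never produced in Phase 6.2


@dataclass(frozen=True)
class AuthorizationDecision:
    verdict: Verdict
    provenance: tuple = ()  # matched grants; empty for NotGranted


@dataclass
class SessionAuthState:
    """Hypothetical stand-in for UserAuthorizationState."""
    ldap_groups: frozenset
    memberships_resolved_at: datetime  # last successful LDAP group resolution
    config_fresh_at: datetime  # last authz-input read NOT served by the Phase 6.1 fallback cache


def freshness_gate(state, now, resolve_groups):
    """Return a fail-closed NotGranted decision when inputs are too stale, else None.

    resolve_groups is a hypothetical callable that queries LDAP and raises
    ConnectionError when LDAP is unreachable.
    """
    if now - state.config_fresh_at > AUTH_CACHE_MAX_STALENESS:
        return AuthorizationDecision(Verdict.NOT_GRANTED)
    if now - state.memberships_resolved_at > MEMBERSHIP_FRESHNESS_INTERVAL:
        try:
            state.ldap_groups = frozenset(resolve_groups())
        except ConnectionError:
            return AuthorizationDecision(Verdict.NOT_GRANTED)
        state.memberships_resolved_at = now
    return None  # fresh enough: continue to the trie walk
```

Both staleness paths return NotGranted rather than raising. Data-path callers therefore treat a stale session exactly like a missing grant.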

Scope — What Does NOT Change

| Item | Reason |
| --- | --- |
| OPC UA authn | Already done (PR 19 LDAP user identity + Basic256Sha256 profile). Phase 6.2 is authorization only. |
| Explicit Deny grants | Decision #129 note explicitly defers to v2.1. Default-deny + additive grants only. |
| Driver-side SecurityClassification metadata | Drivers keep reporting Operate / ViewOnly / etc.; the evaluator uses them as part of the decision but doesn't replace them. |
| Galaxy namespace (SystemPlatform kind) | UNS levels don't apply; the evaluator treats Galaxy nodes as Cluster → Namespace → FolderSegment(s) → Tag, skipping UnsArea/UnsLine/Equipment. FolderSegment levels are populated from Tag.FolderPath. |
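For Galaxy nodes, the scope table and review item 6 place FolderSegment levels between Namespace and Tag, populated from Tag.FolderPath. Equipment namespaces keep the UNS levels. The hedged sketch below shows how one tag's scope chain might be assembled. The function name and the "/" separator for FolderPath are assumptions, not taken from the codebase.

```python
def scope_path(cluster_id, namespace_id, namespace_kind, tag,
               area=None, line=None, equipment=None, folder_path=""):
    """Build the (ScopeKind, scope id) chain the trie walks for one tag."""
    path = [("Cluster", cluster_id), ("Namespace", namespace_id)]
    if namespace_kind == "Equipment":
        if None in (area, line, equipment):
            raise ValueError("Equipment namespaces need UnsArea, UnsLine and Equipment")
        path += [("UnsArea", area), ("UnsLine", line), ("Equipment", equipment)]
    elif namespace_kind == "SystemPlatform":
        # Assumption: FolderPath segments are '/'-separated; empty segments are ignored.
        path += [("FolderSegment", seg) for seg in folder_path.split("/") if seg]
    else:
        raise ValueError(f"unknown namespace kind {namespace_kind!r}")
    path.append(("Tag", tag))
    return tuple(path)
```

A Galaxy tag at the namespace root gets a three-level chain. Deeper tags get one FolderSegment per folder, so a grant on a folder covers its whole subtree.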

Entry Gate Checklist

  • Phase 6.1 merged (reuse Core.Resilience Polly pipeline for the ACL cache-refresh retries)
  • acl-design.md re-read in full
  • Decision log #105, #129, corrections-doc B1 re-skimmed
  • Existing NodeAcl + NodePermissions flag enum audited; confirm bitmask flags match acl-design.md table
  • Existing LdapAuthService group-resolution code path traced end-to-end — confirm it already queries group memberships (we only need the caller to consume the result)
  • Test DB scenarios catalogued: two clusters, three LDAP groups per cluster, mixed grant shapes; captured as seed-data fixtures

Task Breakdown

Stream A — LdapGroupRoleMapping table + migration (3 days)

  1. A.1 Entity + EF Core migration. Columns per §Scope table. Unique constraint on (LdapGroup, ClusterId) with null-tolerant comparer for the system-wide case. Index on LdapGroup for the hot-path lookup on auth.
  2. A.2 ILdapGroupRoleMappingService CRUD. Wrap in the Phase 6.1 Polly pipeline (timeout → retry → fallback-to-cache).
  3. A.3 Seed-data migration: preserve the current hardcoded FleetAdmin / ConfigEditor / ReadOnly mappings by seeding rows for the existing LDAP groups the dev box uses (cn=fleet-admin,…, cn=config-editor,…, cn=read-only,…), so the migration is a functional no-op for existing deployments.
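The uniqueness rule in A.1 depends on treating a null ClusterId as a real key value, so that two system-wide rows for the same LDAP group collide. The Python sketch below shows one way to enforce that at write time. RoleMapping, uniqueness_key, and validate_new_mapping are hypothetical. The case-insensitive group comparison is an assumption. The rule that IsSystemWide holds exactly when ClusterId is null is also assumed, not quoted from the implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoleMapping:
    """Illustrative subset of the LdapGroupRoleMapping columns."""
    ldap_group: str
    role: str
    cluster_id: Optional[str]  # None = system-wide
    is_system_wide: bool


def uniqueness_key(m):
    # A null ClusterId becomes its own key component, so two system-wide rows for
    # the same group collide even on stores that treat NULLs as distinct.
    # Case-insensitive group comparison is an assumption.
    return (m.ldap_group.casefold(), m.cluster_id is None, m.cluster_id or "")


def validate_new_mapping(existing, candidate):
    """Raise ValueError if candidate breaks the (LdapGroup, ClusterId) uniqueness rule."""
    # Assumed write-time invariant: IsSystemWide is true exactly when ClusterId is null.
    if candidate.is_system_wide != (candidate.cluster_id is None):
        raise ValueError("IsSystemWide must match a null ClusterId")
    if uniqueness_key(candidate) in {uniqueness_key(m) for m in existing}:
        raise ValueError(f"mapping for {candidate.ldap_group!r} already exists at this scope")
```

The key deliberately excludes Role. Per A.1, one group maps to at most one row per cluster (or one system-wide row), regardless of role.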

Stream B — Permission-trie evaluator (1 week)

  1. B.1 IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, OpcUaOperation op, NodeId nodeId) → AuthorizationDecision. Phase 6.2 produces only Allow / NotGranted (with matched-grant provenance); the Denied variant exists in the type so v2.1 can add Deny without an API break.
  2. B.2 PermissionTrieBuilder builds the trie from NodeAcl rows joined against resolved LDAP group memberships and the current generation's UnsArea + UnsLine + Equipment + Tag tables (Tag.FolderPath segments for SystemPlatform namespaces). LdapGroupRoleMapping is not an input; it is control-plane only. One trie per (ClusterId, GenerationId) so rollback doesn't smear permissions across generations.
  3. B.3 Trie node structure: { Level: enum, ScopeId: Guid, AllowedPermissions: NodePermissions, ChildrenByLevel: Dictionary<Guid, TrieNode> }. Evaluation walks from Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag, ORing allowed permissions at each level. Additive semantics: a grant at Cluster level cascades to every descendant tag.
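  4. B.4 PermissionTrieCache service scoped as singleton; exposes GetTrieAsync(ClusterId, ct) that returns the current-generation trie. Invalidated on sp_PublishGeneration via an in-process event bus; also on TTL expiry (24 h safety net). Because the event bus is node-local, every hot-path authz call also looks up CurrentGenerationId (Polly-wrapped, sub-second cache) and re-loads on mismatch, so a redundant Backup never serves a stale generation's trie.
  5. B.5 Per-session cached evaluator: OPC UA Session authentication produces UserAuthorizationState { ClusterId, LdapGroups[], Trie }; cached on the session until session close or generation-apply. Group memberships are re-resolved once MembershipFreshnessInterval (default 15 min) elapses, failing closed if LDAP is unreachable.
  6. B.6 Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → deny, (e) Galaxy nodes skip UnsArea/UnsLine/Equipment and walk FolderSegment levels instead.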
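A small Python sketch can illustrate the trie shape in B.3 and the additive walk. It rests on two assumptions. First, the NodePermissions bit values are placeholders; acl-design.md is authoritative. Second, each trie node stores grants per LDAP group, so one trie per (ClusterId, GenerationId) can be shared across sessions, with the session's groups applied during the walk. The assertions mirror cases (a) to (d) of B.6, plus cluster isolation.

```python
from enum import IntFlag


class NodePermissions(IntFlag):
    # Subset of flags named in this doc; bit values are placeholders, not acl-design.md's.
    NONE = 0
    READ = 1
    HISTORY_READ = 2
    WRITE_OPERATE = 4


class TrieNode:
    def __init__(self):
        self.grants = {}    # LDAP group -> NodePermissions granted at this scope
        self.children = {}  # (ScopeKind, scope id) -> TrieNode


def build_trie(acl_rows):
    """acl_rows: iterable of (ldap_group, scope_path, permissions) NodeAcl-style rows."""
    root = TrieNode()
    for group, path, perms in acl_rows:
        node = root
        for step in path:
            node = node.children.setdefault(step, TrieNode())
        node.grants[group] = node.grants.get(group, NodePermissions.NONE) | perms
    return root


def effective_permissions(root, session_groups, tag_path):
    """OR the grants for the session's groups at every level from Cluster down to Tag."""
    allowed = NodePermissions.NONE
    node = root
    for step in tag_path:
        node = node.children.get(step)
        if node is None:  # no grants at or below this scope
            break
        for group in session_groups:
            allowed |= node.grants.get(group, NodePermissions.NONE)
    return allowed
```

Grants are only ever ORed together, so a grant at a higher level cannot be narrowed further down. This matches the additive, Deny-free semantics Phase 6.2 ships.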

Stream C — OPC UA server dispatch wiring (6 days, widened)

  1. C.1 DriverNodeManager.Read — evaluator consulted per ReadValueId with OpcUaOperation.Read. Denied attributes get BadUserAccessDenied per-item; batch never poisons. Integration test covers mixed-authorization batch (3 authorized + 2 denied → 3 Good values + 2 Bad StatusCodes, request completes).
  2. C.2 DriverNodeManager.Write — evaluator chooses NodePermissions.WriteOperate / WriteTune / WriteConfigure based on the driver-reported SecurityClassification.
  3. C.3 DriverNodeManager.HistoryRead: uses NodePermissions.HistoryRead, which is a distinct flag from Read. Test: user with Read but not HistoryRead can read live values but gets BadUserAccessDenied on HistoryRead.
  4. C.4 DriverNodeManager.HistoryUpdate — uses NodePermissions.HistoryUpdate.
  5. C.5 DriverNodeManager.CreateMonitoredItems — per-MonitoredItemCreateResult denial in mixed-authorization batch; partial success path per OPC UA Part 4. Each created item stamped (AuthGenerationId, MembershipVersion).
  6. C.6 DriverNodeManager.TransferSubscriptions — on reconnect, re-evaluate every transferred MonitoredItem against the session's current auth state. Stale-stamp items drop to BadUserAccessDenied.
  7. C.7 Browse + TranslateBrowsePathsToNodeIds — evaluator called with OpcUaOperation.Browse. Ancestor visibility implied when any descendant has a grant (per acl-design.md §Browse). Denied ancestors filter from browse results — the UA browser sees a hierarchy truncated at the denied ancestor rather than an inconsistent child-without-parent view.
  8. C.8 DriverNodeManager.Call: NodePermissions.MethodCall.
  9. C.9 Alarm actions (Acknowledge / Confirm / Shelve) — per-alarm NodePermissions.AlarmAck / AlarmConfirm / AlarmShelve.
  10. C.10 Publish path — for each MonitoredItem with a mismatched (AuthGenerationId, MembershipVersion) stamp, re-evaluate. Unchanged items stay fast-path; changes happen at next publish cycle.
  11. C.11 Integration tests: three-user seed with different memberships; matrix covers every operation in §Scope. Mixed-batch tests for Read + CreateMonitoredItems.
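The per-item semantics of C.1, C.3, C.8 and C.9 reduce to one rule. Each request element is checked against the flag its operation needs, and the server emits a per-element StatusCode. Below is a hedged Python sketch. authorize_batch and the placeholder bit values are hypothetical, and status codes appear by name only. Write is omitted because its flag depends on the driver-reported SecurityClassification.

```python
from enum import IntFlag


class NodePermissions(IntFlag):
    # Flag names follow this doc; bit values are placeholders.
    NONE = 0
    READ = 1
    HISTORY_READ = 2
    HISTORY_UPDATE = 4
    METHOD_CALL = 8
    ALARM_ACK = 16
    ALARM_CONFIRM = 32
    ALARM_SHELVE = 64


# Operation -> flag it requires, per Stream C. Write is left out because its
# flag (WriteOperate/Tune/Configure) depends on the driver's SecurityClassification.
REQUIRED = {
    "Read": NodePermissions.READ,
    "HistoryRead": NodePermissions.HISTORY_READ,
    "HistoryUpdate": NodePermissions.HISTORY_UPDATE,
    "Call": NodePermissions.METHOD_CALL,
    "Acknowledge": NodePermissions.ALARM_ACK,
    "Confirm": NodePermissions.ALARM_CONFIRM,
    "Shelve": NodePermissions.ALARM_SHELVE,
}


def authorize_batch(operation, node_ids, granted):
    """Return one StatusCode name per element; a denied element never fails the whole batch.

    granted is a hypothetical callable: node id -> NodePermissions held by the session.
    """
    need = REQUIRED[operation]
    return ["Good" if (granted(n) & need) == need else "BadUserAccessDenied" for n in node_ids]
```

A 5-node Read with 3 grants yields 3 Good and 2 BadUserAccessDenied entries in request order. A node holding only Read is still denied on HistoryRead.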

Stream D — Admin UI refresh (4 days)

  1. D.1 RoleGrantsTab.razor — FleetAdmin-gated CRUD on LdapGroupRoleMapping. Per-cluster dropdown + system-wide checkbox. Validation: LDAP group must exist in the dev LDAP (GLAuth) before saving — best-effort probe with graceful degradation.
  2. D.2 AclsTab.razor rewrites its edit path to write through ValidatedNodeAclAuthoringService. Adds a "Probe this permission" row: choose (LDAP group, node, operation) → shows Allow / NotGranted + the matched-grant provenance (which grant matched).
  3. D.3 Draft-generation diff viewer now includes an ACL section: "X grants added, Y grants removed, Z grants changed."
  4. D.4 SignalR notification: PermissionTrieCache invalidation on sp_PublishGeneration pushes to Admin UI so operators see "this cluster's permissions were just updated" within 2 s.
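The D.3 summary line can be derived by diffing the published and draft ACL sets keyed on (LdapGroup, Scope). Below is a minimal Python sketch. It assumes each generation's grants are available as a dictionary; acl_diff_summary is a hypothetical name.

```python
def acl_diff_summary(published, draft):
    """Summarise ACL changes between the published generation and a draft.

    published / draft: dict mapping (LdapGroup, Scope) -> permission flags; the
    pair is the same key ValidatedNodeAclAuthoringService keeps unique.
    """
    added = draft.keys() - published.keys()
    removed = published.keys() - draft.keys()
    changed = {k for k in draft.keys() & published.keys() if draft[k] != published[k]}
    return f"{len(added)} grants added, {len(removed)} grants removed, {len(changed)} grants changed."
```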

Compliance Checks (run at exit gate)

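  • Control/data-plane separation: LdapGroupRoleMapping consumed only by Admin UI; the data-path evaluator has zero references to it. Enforced via a project-reference audit (Admin project references the mapping service; Core.Authorization does not) and, in the exit-gate script, a text-absence sweep for the identifier across the evaluator, trie, cache, builder, and interface files.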
  • Every operation wired: Browse, Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge, Confirm, Shelve all consult the evaluator. Integration test matrix covers every operation × allow/deny.
  • HistoryRead uses its own flag: test "user with Read + no HistoryRead gets BadUserAccessDenied on HistoryRead".
  • Mixed-batch semantics: Read of 5 nodes (3 allowed + 2 denied) returns 3 Good + 2 BadUserAccessDenied per-ReadValueId; CreateMonitoredItems equivalent.
  • Browse ancestor visibility: user with a grant only on a deep equipment node can browse the path to it (ancestors implied); denied ancestors filter from browse results otherwise.
  • Galaxy FolderSegment coverage: a grant on a Galaxy folder subtree cascades to its tags; sibling folders are unaffected. Trie test covers this.
  • Subscription re-authorization: integration test — create item, revoke grant via draft+publish, next publish cycle the item returns BadUserAccessDenied (not silently still-notifying).
  • Membership freshness: test — 15 min MembershipFreshnessInterval elapses on a long-lived session + LDAP now unreachable → authz fails closed on the next request until LDAP recovers.
  • Auth cache fail-closed: test — Phase 6.1 cache serves stale config for 6 min; authz evaluator refuses all calls after 5 min regardless.
  • Trie invariants: PermissionTrieBuilder is idempotent (build twice with identical inputs → equal tries).
  • Additive grants + cluster isolation: cluster-grant cascades; cross-cluster leakage impossible.
  • Redundancy-safe invalidation: integration test — two nodes, a publish on one, authorize a request on the other before in-process event propagates → generation-mismatch forces re-load, no stale decision.
  • Authoring validation: AclsTab cannot save a (LdapGroup, Scope) pair that already exists in the draft; operator sees the validation error pre-save.
  • AuthorizationDecision shape stability: API surface exposes Allow + NotGranted only; Denied variant exists in the type but is never produced; v2.1 can add Deny without API break.
  • No regression in driver test counts.
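The redundancy-safe invalidation check depends on two things. The cache is keyed on (ClusterId, GenerationId), and every lookup reads the current generation. Below is a hedged Python sketch of that shape. The class mirrors PermissionTrieCache in name only, and both callbacks are hypothetical. The Polly pipeline, the sub-second memo, the TTL and eviction of old generations are all omitted.

```python
class PermissionTrieCache:
    """Sketch of a trie cache keyed on (ClusterId, GenerationId).

    current_generation: callable(cluster_id) -> generation id read from the shared
    config DB. build: callable(cluster_id, generation_id) -> trie.
    """

    def __init__(self, current_generation, build):
        self._current_generation = current_generation
        self._build = build
        self._tries = {}

    def get_trie(self, cluster_id):
        # Hot path: always compare against the shared generation, so a node that
        # missed the in-process invalidation event cannot serve a stale trie.
        key = (cluster_id, self._current_generation(cluster_id))
        if key not in self._tries:
            self._tries[key] = self._build(*key)
        return self._tries[key]

    def invalidate(self, cluster_id):
        # In-process event-bus handler for sp_PublishGeneration / sp_RollbackGeneration.
        for key in [k for k in self._tries if k[0] == cluster_id]:
            del self._tries[key]
```

A rollback to a generation whose trie is still cached simply reuses that trie, in line with the rollback mitigation in the risk table.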

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| ACL evaluator latency on per-read hot path | Medium | High | Trie lookup is O(depth) = O(6); session-cached UserAuthorizationState avoids per-Read trie rebuild; benchmark in Stream B.6 |
| Trie cache stale after a rollback | Medium | High | sp_PublishGeneration + sp_RollbackGeneration both emit the invalidation event; trie keyed on (ClusterId, GenerationId) so rollback fetches the prior trie cleanly |
| BadUserAccessDenied returns expose sensitive browse-name metadata | Low | Medium | Server returns only the status code + NodeId; no message leak per OPC UA Part 4 §7.34 guidance |
| LdapGroupRoleMapping migration breaks existing deployments | Low | High | Seed-migration preserves the hardcoded groups' effective grants verbatim; smoke test exercises the post-migration fleet admin login |
| Deny semantics accidentally ship (would break acl-design.md defer) | Low | Medium | IPermissionEvaluator.Authorize returns AuthorizationDecision, but Phase 6.2 never produces the Denied variant; producing Deny is a v2.1 ticket |

Completion Checklist

  • Stream A: LdapGroupRoleMapping entity + migration + CRUD + seed
  • Stream B: evaluator + trie builder + cache + per-session state + unit tests
  • Stream C: OPC UA dispatch wiring on Read/Write/HistoryRead/Subscribe/Alarm paths
  • Stream D: Admin UI RoleGrantsTab + AclsTab refresh + SignalR invalidation
  • phase-6-2-compliance.ps1 exits 0; exit-gate doc recorded

Adversarial Review — 2026-04-19 (Codex, thread 019da48d-0d2b-7171-aed2-fc05f1f39ca3)

  1. Crit · ACCEPT — Trie must not conflate LdapGroupRoleMapping (control-plane admin claims per decision #105) with data-plane ACLs (decision #129). Change: LdapGroupRoleMapping is consumed only by the Admin UI role router. Data-plane trie reads NodeAcl rows joined against the session's resolved LDAP groups, never admin roles. Stream B.2 updated.
  2. Crit · ACCEPT — Cached UserAuthorizationState survives LDAP group changes because memberships only refresh at cookie-auth. Change: add MembershipFreshnessInterval (default 15 min); past that, next hot-path authz call forces group re-resolution (fail-closed if LDAP unreachable). Session-close-wins on config-rollback.
  3. High · ACCEPT — Node-local invalidation doesn't extend across redundant pair. Change: trie keyed on (ClusterId, GenerationId); hot-path authz looks up CurrentGenerationId from the shared config DB (Polly-wrapped + sub-second cache). A Backup that read stale generation gets a mismatched trie → forces re-load. Implementation note added to Stream B.4.
  4. High · ACCEPT — Browse enforcement missing. Change: new Stream C.7 (Browse + TranslateBrowsePathsToNodeIds enforcement). Ancestor visibility implied when any descendant has a grant; denied ancestors filter from browse results per acl-design.md §Browse.
  5. High · ACCEPT – HistoryRead should use the NodePermissions.HistoryRead bit, not Read. Change: Stream C.3 revised; separate unit test asserts Read+no-HistoryRead denies HistoryRead while allowing current-value reads.
  6. High · ACCEPT — Galaxy shallow-path (Cluster→Namespace→Tag) loses folder hierarchy authorization. Change: SystemPlatform namespaces use a FolderSegment scope-level between Namespace and Tag, populated from Tag.FolderPath; UNS-kind namespaces keep the 6-level hierarchy. Trie supports both via ScopeKind on each node.
  7. High · ACCEPT — Subscription re-authorization policy unresolved between create-time-only (fast, wrong on revoke) and per-publish (slow). Change: stamp each MonitoredItem with (AuthGenerationId, MembershipVersion); re-evaluate on Publish only when either version changed. Revoked items drop to BadUserAccessDenied within one publish cycle.
  8. Med · ACCEPT – Mixed-authorization batch Read / CreateMonitoredItems service-result semantics underspecified. Change: Streams C.1 and C.5 explicitly test per-ReadValueId + per-MonitoredItemCreateResult denial in mixed batches; batch never collapses to a coarse failure.
  9. Med · ACCEPT — Missing surfaces: Method.Call, HistoryUpdate, event filter on subscriptions, subscription-transfer on reconnect, alarm-ack. Change: scope expanded — every OPC UA authorization surface enumerated in Stream C: Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge/Confirm/Shelve, Browse, TranslateBrowsePathsToNodeIds.
  10. Med · ACCEPT – A bool evaluator bakes in grant-only semantics and collides with v2.1 Deny. Change: internal model uses AuthorizationDecision { Allow | NotGranted | Denied, IReadOnlyList<MatchedGrant> Provenance }. Phase 6.2 maps Denied → never produced; UI + audit log use the full record so v2.1 Deny lands without API break.
  11. Med · ACCEPT — 6.1 cache fallback is availability-oriented; applying it to auth is correctness-dangerous. Change: auth-specific staleness budget AuthCacheMaxStaleness (default 5 min, not 24 h). Past that, hot-path evaluator fails closed on cached reads; all authorization calls return NotGranted until fresh data lands. Documented in risks + compliance.
  12. Low · ACCEPT — Existing NodeAclService is raw CRUD. Change: new ValidatedNodeAclAuthoringService enforces scope-uniqueness + draft/publish invariants + rejects invalid (LDAP group, scope) pairs; Admin UI writes through it only. Stream D.2 adjusted.
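The stamp scheme from item 7 can be sketched in a few lines. Each MonitoredItem remembers the (AuthGenerationId, MembershipVersion) it was last authorized under. A Publish re-evaluates only items whose stamp differs from the session's. The Python types and the is_allowed callback are hypothetical; a real server would carry this state on its own MonitoredItem objects.

```python
from dataclasses import dataclass


@dataclass
class MonitoredItem:
    node_id: str
    stamp: tuple  # (AuthGenerationId, MembershipVersion) when last authorized
    status: str = "Good"


def reauthorize_on_publish(items, session_stamp, is_allowed):
    """Re-check only items whose stamp differs from the session's; return how many were re-checked."""
    rechecked = 0
    for item in items:
        if item.stamp == session_stamp:
            continue  # fast path: nothing relevant changed since the last check
        rechecked += 1
        item.status = "Good" if is_allowed(item.node_id) else "BadUserAccessDenied"
        item.stamp = session_stamp
    return rechecked
```

After a revoke bumps AuthGenerationId, the next Publish re-checks every item once. Revoked items drop to BadUserAccessDenied, and later Publish cycles return to the fast path.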