Phase 6.2 — Authorization Runtime (ACL + LDAP grants)
Status: DRAFT. The v2 `plan.md` decision #129 + `acl-design.md` specify a 6-level permission-trie evaluator with `NodePermissions` bitmask grants, but no runtime evaluator exists. ACL tables are schematized but unread by the data path.

Branch: `v2/phase-6-2-authorization-runtime`

Estimated duration: 2.5 weeks

Predecessor: Phase 6.1 (Resilience & Observability), whose Polly pipeline is reused for ACL-cache refresh retries

Successor: Phase 6.3 (Redundancy)
Phase Objective
Wire ACL enforcement on every OPC UA Read / Write / Subscribe / Call path + LDAP group → admin role grants that the v2 plan specified but never ran. End-state: a user's effective permissions resolve through a per-session permission-trie over the 6-level Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag hierarchy, cached per session, invalidated on generation-apply + LDAP group expiry.
Closes these gaps:
- Data-path ACL enforcement: `NodeAcl` table + `NodePermissions` flags shipped; `NodeAclService.cs` present as a CRUD surface; no code consults ACLs at `Read`/`Write` time. OPC UA server answers everything to everyone.
- `LdapGroupRoleMapping` for cluster-scoped admin grants: decision #105 shipped as the design; admin roles are hardcoded (`FleetAdmin` / `ConfigEditor` / `ReadOnly`) with no cluster-scoping and no LDAP-to-grant table. Decision #105 explicitly lifts this from v2.1 into v2.0.
- Explicit Deny pathway: deferred to v2.1 (decision #129 note). Phase 6.2 ships grants only; `Deny` stays out.
- Admin UI ACL grant editor: `AclsTab.razor` exists but edits the now-unused `NodeAcl` table; needs to wire to the runtime evaluator + the new `LdapGroupRoleMapping` table.
Scope — What Changes
| Concern | Change |
|---|---|
| Configuration project | New entity `LdapGroupRoleMapping { Id, LdapGroup, Role, ClusterId? (nullable = system-wide), IsSystemWide, GeneratedAtUtc }`. Migration. Admin CRUD. |
| `Core` → new `Core.Authorization` sub-namespace | `IPermissionEvaluator` interface; concrete `PermissionTrieEvaluator` implementation loads ACLs + LDAP mappings from Configuration, builds a trie keyed on the 6-level scope hierarchy, evaluates a `(UserClaim[], NodeId, NodePermissions) → bool` decision in O(depth × group-count). |
| `Core.Authorization` cache | `PermissionTrieCache`: one trie per `(ClusterId, GenerationId)`. Rebuilt on `sp_PublishGeneration` confirmation; served from memory thereafter. Per-session evaluator keeps a reference to the current trie + user's LDAP groups. |
| OPC UA server dispatch | `OtOpcUa.Server/OpcUa/DriverNodeManager.cs` Read/Write/HistoryRead/MonitoredItem-create paths call `PermissionEvaluator.Authorize(session.Identity, nodeId, NodePermissions.Read)` etc. before delegating to the driver. Unauthorized returns `BadUserAccessDenied` (0x801F0000), not a silent no-op, per corrections-doc B1. |
| `LdapAuthService` (existing) | On cookie-auth success, resolves the user's LDAP groups via `LdapGroupService.GetMemberships` + loads the matching `LdapGroupRoleMapping` rows → produces a role-claim list + cluster-scope claim list. Stored on the auth cookie. |
| Admin UI `AclsTab.razor` | Repoint edits at the new `NodeAclService` API that writes through to the same table the evaluator reads. Add a "test this permission" probe that runs a dummy evaluator against a chosen `(user, nodeId, action)` so ops can sanity-check grants before publishing a draft. |
| Admin UI new tab `RoleGrantsTab.razor` | CRUD over `LdapGroupRoleMapping`. Per-cluster + system-wide grants. FleetAdmin only. |
| Audit log | Every Grant/Revoke/Publish on `LdapGroupRoleMapping` or `NodeAcl` writes an `AuditLog` row with old/new state + user. |
Scope — What Does NOT Change
| Item | Reason |
|---|---|
| OPC UA authn | Already done (PR 19 LDAP user identity + Basic256Sha256 profile). Phase 6.2 is authorization only. |
| Explicit `Deny` grants | Decision #129 note explicitly defers to v2.1. Default-deny + additive grants only. |
| Driver-side `SecurityClassification` metadata | Drivers keep reporting `Operate` / `ViewOnly` / etc.; the evaluator uses them as part of the decision but doesn't replace them. |
| Galaxy namespace (SystemPlatform kind) | UNS levels don't apply; evaluator treats Galaxy nodes as Cluster → Namespace → Tag (skip UnsArea/UnsLine/Equipment). |
Entry Gate Checklist
- Phase 6.1 merged (reuse `Core.Resilience` Polly pipeline for the ACL cache-refresh retries)
- `acl-design.md` re-read in full
- Decision log #105, #129, corrections-doc B1 re-skimmed
- Existing `NodeAcl` + `NodePermissions` flag enum audited; confirm bitmask flags match `acl-design.md` table
- Existing `LdapAuthService` group-resolution code path traced end-to-end; confirm it already queries group memberships (we only need the caller to consume the result)
- Test DB scenarios catalogued: two clusters, three LDAP groups per cluster, mixed grant shapes; captured as seed-data fixtures
Task Breakdown
Stream A — LdapGroupRoleMapping table + migration (3 days)
- A.1 Entity + EF Core migration. Columns per §Scope table. Unique constraint on `(LdapGroup, ClusterId)` with null-tolerant comparer for the system-wide case. Index on `LdapGroup` for the hot-path lookup on auth.
- A.2 `ILdapGroupRoleMappingService` CRUD. Wrap in the Phase 6.1 Polly pipeline (timeout → retry → fallback-to-cache).
- A.3 Seed-data migration: preserve the current hardcoded `FleetAdmin` / `ConfigEditor` / `ReadOnly` mappings by seeding rows for the existing LDAP groups the dev box uses (`cn=fleet-admin,…`, `cn=config-editor,…`, `cn=read-only,…`). The migration is a no-op for existing deployments.
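
The null-tolerant uniqueness rule in A.1 can be illustrated in memory. This is only a sketch under assumptions: `GrantKey` and `GrantKeyComparer` are hypothetical names, the group DN is a placeholder, and the case-insensitive group comparison is an assumption rather than something the design states. It treats a null `ClusterId` as a real key value, so two system-wide rows for the same group collide.

```csharp
using System;
using System.Collections.Generic;

// Usage: a second system-wide row for the same group is rejected.
var clusterId = Guid.NewGuid();
var seen = new HashSet<GrantKey>(GrantKeyComparer.Instance);
Console.WriteLine(seen.Add(new GrantKey("cn=example-group", null)));      // True
Console.WriteLine(seen.Add(new GrantKey("CN=Example-Group", null)));      // False: duplicate system-wide row
Console.WriteLine(seen.Add(new GrantKey("cn=example-group", clusterId))); // True: cluster-scoped row is distinct

// Hypothetical key shape mirroring the (LdapGroup, ClusterId) constraint.
public readonly record struct GrantKey(string LdapGroup, Guid? ClusterId);

public sealed class GrantKeyComparer : IEqualityComparer<GrantKey>
{
    public static readonly GrantKeyComparer Instance = new();

    // Null ClusterId (system-wide) equals null; group names compare
    // case-insensitively (an assumption for this sketch).
    public bool Equals(GrantKey x, GrantKey y) =>
        string.Equals(x.LdapGroup, y.LdapGroup, StringComparison.OrdinalIgnoreCase)
        && Nullable.Equals(x.ClusterId, y.ClusterId);

    public int GetHashCode(GrantKey key) =>
        HashCode.Combine(
            StringComparer.OrdinalIgnoreCase.GetHashCode(key.LdapGroup),
            key.ClusterId);
}
```

How the database-level constraint treats NULLs depends on the engine, so the same rule may need an explicit filtered index or a check in the authoring service.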
Stream B — Permission-trie evaluator (1 week)
- B.1 `IPermissionEvaluator.Authorize(IEnumerable<Claim> identity, NodeId nodeId, NodePermissions needed)` returns `bool`. Phase 6.2 returns only `true` / `false`; v2.1 can widen to `Allow` / `Deny` / `Indeterminate` if Deny lands.
- B.2 `PermissionTrieBuilder` builds the trie from `NodeAcl` + `LdapGroupRoleMapping` joined to the current generation's `UnsArea` + `UnsLine` + `Equipment` + `Tag` tables. One trie per `(ClusterId, GenerationId)` so rollback doesn't smear permissions across generations.
- B.3 Trie node structure: `{ Level: enum, ScopeId: Guid, AllowedPermissions: NodePermissions, ChildrenByLevel: Dictionary<Guid, TrieNode> }`. Evaluation walks from Cluster → Namespace → UnsArea → UnsLine → Equipment → Tag, ORing allowed permissions at each level. Additive semantics: a grant at Cluster level cascades to every descendant tag.
- B.4 `PermissionTrieCache` service scoped as singleton; exposes `GetTrieAsync(ClusterId, ct)` that returns the current-generation trie. Invalidated on `sp_PublishGeneration` via an in-process event bus; also on TTL expiry (24 h safety net).
- B.5 Per-session cached evaluator: OPC UA Session authentication produces `UserAuthorizationState { ClusterId, LdapGroups[], Trie }`; cached on the session until session close or generation-apply.
- B.6 Unit tests: trie-walk theory covering (a) Cluster-level grant cascades to tags, (b) Equipment-level grant doesn't leak to sibling Equipment, (c) multi-group union, (d) no-grant → deny, (e) Galaxy nodes skip UnsArea/UnsLine levels.
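
The additive walk in B.3 can be sketched as follows. This is a minimal illustration, not the planned implementation: `TrieWalk`, the per-group grant dictionary, the group names, and the bit values of `NodePermissions` are all assumptions made for the example (the real flag values live in `acl-design.md`). The walk ORs together every grant held by the caller's groups from the root down to the target, then checks the needed bits.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: a root (Cluster-level) grant cascades; a tag-level grant does not leak to a sibling.
var areaId = Guid.NewGuid();
var tagId = Guid.NewGuid();
var siblingTagId = Guid.NewGuid();

var root = new TrieNode();
root.Grant("ops", NodePermissions.Read);
root.Child(areaId).Child(tagId).Grant("eng", NodePermissions.WriteOperate);

var tagPath = new[] { areaId, tagId };
var siblingPath = new[] { areaId, siblingTagId };
Console.WriteLine(TrieWalk.IsAllowed(root, tagPath, new[] { "ops" }, NodePermissions.Read));             // True
Console.WriteLine(TrieWalk.IsAllowed(root, tagPath, new[] { "ops" }, NodePermissions.WriteOperate));     // False
Console.WriteLine(TrieWalk.IsAllowed(root, tagPath, new[] { "eng" }, NodePermissions.WriteOperate));     // True
Console.WriteLine(TrieWalk.IsAllowed(root, siblingPath, new[] { "eng" }, NodePermissions.WriteOperate)); // False

// Placeholder bit values for illustration only.
[Flags]
public enum NodePermissions { None = 0, Read = 1, WriteOperate = 2, HistoryRead = 4 }

public sealed class TrieNode
{
    private readonly Dictionary<string, NodePermissions> _grantsByGroup = new(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<Guid, TrieNode> _children = new();

    // Grants accumulate: granting twice ORs the bits together.
    public void Grant(string group, NodePermissions permissions) =>
        _grantsByGroup[group] = _grantsByGroup.TryGetValue(group, out var existing)
            ? existing | permissions
            : permissions;

    // Returns the child for scopeId, creating it if absent (builder use).
    public TrieNode Child(Guid scopeId)
    {
        if (!_children.TryGetValue(scopeId, out var node))
            _children[scopeId] = node = new TrieNode();
        return node;
    }

    public TrieNode? Find(Guid scopeId) => _children.GetValueOrDefault(scopeId);

    // Union of the permissions granted at this node to any of the groups.
    public NodePermissions GrantsFor(IEnumerable<string> groups)
    {
        var acc = NodePermissions.None;
        foreach (var group in groups)
            if (_grantsByGroup.TryGetValue(group, out var p)) acc |= p;
        return acc;
    }
}

public static class TrieWalk
{
    public static bool IsAllowed(
        TrieNode root, IReadOnlyList<Guid> path,
        IReadOnlyCollection<string> groups, NodePermissions needed)
    {
        var node = root;
        var effective = node.GrantsFor(groups);
        foreach (var scopeId in path)
        {
            var child = node.Find(scopeId);
            if (child is null) break; // no grants exist below this point
            node = child;
            effective |= node.GrantsFor(groups);
        }
        return (effective & needed) == needed; // default-deny when bits are missing
    }
}
```

Each level costs one dictionary lookup per group, which matches the O(depth × group-count) bound quoted in the scope table.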
Stream C — OPC UA server dispatch wiring (4 days)
- C.1 `DriverNodeManager.Read`: consult evaluator before delegating to `IReadable`. Unauthorized nodes get `BadUserAccessDenied` per-attribute, not on the whole batch.
- C.2 `DriverNodeManager.Write`: same. Evaluator needs `NodePermissions.WriteOperate` / `WriteTune` / `WriteConfigure` depending on driver-reported `SecurityClassification` of the attribute.
- C.3 `DriverNodeManager.HistoryRead`: ACL checks `NodePermissions.Read` (history uses the same Read flag per `acl-design.md`).
- C.4 `DriverNodeManager.CreateMonitoredItem`: denies unauthorized nodes at subscription create time, not after the first publish. Cleaner than silently omitting notifications.
- C.5 Alarm actions (acknowledge / confirm / shelve): checks `AlarmAck` / `AlarmConfirm` / `AlarmShelve` flags.
- C.6 Integration tests: boot server with a seed trie, auth as three distinct users with different group memberships, assert read of one tag allowed + read of another denied + write denied where Read allowed.
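
The per-attribute denial in C.1 can be sketched without the OPC UA SDK. Everything here is hypothetical scaffolding (`BatchRead`, `ReadResult`, `OpcStatus`, string node ids, delegate-based evaluator and driver); the only external fact used is that the OPC UA specification assigns `Bad_UserAccessDenied` the code 0x801F0000. The point is that each node in a batch gets its own status, so one denied node never fails the whole request.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Usage: one allowed node and one denied node in the same batch.
var results = BatchRead.Execute(
    new[] { "tag-a", "tag-b" },
    isAllowed: id => id == "tag-a",
    readFromDriver: id => 42.0);

foreach (var r in results)
    Console.WriteLine($"{r.NodeId}: 0x{r.StatusCode:X8}");
// tag-a: 0x00000000
// tag-b: 0x801F0000

public static class OpcStatus
{
    public const uint Good = 0x00000000;
    public const uint BadUserAccessDenied = 0x801F0000;
}

// Hypothetical per-node result; the real server returns DataValue entries.
public readonly record struct ReadResult(string NodeId, uint StatusCode, object? Value);

public static class BatchRead
{
    public static IReadOnlyList<ReadResult> Execute(
        IReadOnlyList<string> nodeIds,
        Func<string, bool> isAllowed,
        Func<string, object> readFromDriver)
    {
        var results = new ReadResult[nodeIds.Count];
        for (var i = 0; i < nodeIds.Count; i++)
        {
            var id = nodeIds[i];
            // Authorize first; the driver is never asked about denied nodes.
            results[i] = isAllowed(id)
                ? new ReadResult(id, OpcStatus.Good, readFromDriver(id))
                : new ReadResult(id, OpcStatus.BadUserAccessDenied, null);
        }
        return results;
    }
}
```

The same shape would apply to `CreateMonitoredItems`, where each create result carries its own status.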
Stream D — Admin UI refresh (4 days)
- D.1 `RoleGrantsTab.razor`: FleetAdmin-gated CRUD on `LdapGroupRoleMapping`. Per-cluster dropdown + system-wide checkbox. Validation: LDAP group must exist in the dev LDAP (GLAuth) before saving; best-effort probe with graceful degradation.
- D.2 `AclsTab.razor` rewrites its edit path to write through the new `NodeAclService`. Adds a "Probe this permission" row: choose `(LDAP group, node, action)` → shows Allow / Deny + the reason (which grant matched).
- D.3 Draft-generation diff viewer now includes an ACL section: "X grants added, Y grants removed, Z grants changed."
- D.4 SignalR notification: `PermissionTrieCache` invalidation on `sp_PublishGeneration` pushes to Admin UI so operators see "this cluster's permissions were just updated" within 2 s.
Compliance Checks (run at exit gate)
- Data-path enforcement: OPC UA Read against a NodeId the current user has no grant for returns `BadUserAccessDenied` with a ServiceResult, not Good with stale data. Verified by an integration test with a Basic256Sha256-secured session + a read-only LDAP identity.
- Trie invariants: `PermissionTrieBuilder` is idempotent (building twice with identical inputs produces equal tries; override `Equals` to assert).
- Additive grants: Cluster-level grant on User A means User A can read every tag in that cluster without needing any lower-level grant.
- Isolation between clusters: a grant on Cluster 1 has zero effect on Cluster 2 for the same user.
- Galaxy path coverage: ACL checks work on `Galaxy` folder nodes + tag nodes where the UNS levels are absent (the trie treats them as shallow `Cluster → Namespace → Tag`).
- No regression in driver test counts.
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| ACL evaluator latency on per-read hot path | Medium | High | Trie lookup is O(depth) = O(6); session-cached UserAuthorizationState avoids per-Read trie rebuild; benchmark in Stream B.6 |
| Trie cache stale after a rollback | Medium | High | sp_PublishGeneration + sp_RollbackGeneration both emit the invalidation event; trie keyed on (ClusterId, GenerationId) so rollback fetches the prior trie cleanly |
| `BadUserAccessDenied` returns expose sensitive browse-name metadata | Low | Medium | Server returns only the status code + NodeId; no message leak per OPC UA Part 4 §7.34 guidance |
| `LdapGroupRoleMapping` migration breaks existing deployments | Low | High | Seed-migration preserves the hardcoded groups' effective grants verbatim; smoke test exercises the post-migration fleet admin login |
| Deny semantics accidentally ship (would break `acl-design.md` defer) | Low | Medium | `IPermissionEvaluator.Authorize` returns `bool` (not tri-state) through Phase 6.2; widening to `Allow` / `Deny` / `Indeterminate` is a v2.1 ticket |
Completion Checklist
- Stream A: `LdapGroupRoleMapping` entity + migration + CRUD + seed
- Stream B: evaluator + trie builder + cache + per-session state + unit tests
- Stream C: OPC UA dispatch wiring on Read/Write/HistoryRead/Subscribe/Alarm paths
- Stream D: Admin UI `RoleGrantsTab` + `AclsTab` refresh + SignalR invalidation
- `phase-6-2-compliance.ps1` exits 0; exit-gate doc recorded
Adversarial Review — 2026-04-19 (Codex, thread 019da48d-0d2b-7171-aed2-fc05f1f39ca3)
- Crit · ACCEPT: Trie must not conflate `LdapGroupRoleMapping` (control-plane admin claims per decision #105) with data-plane ACLs (decision #129). Change: `LdapGroupRoleMapping` is consumed only by the Admin UI role router. Data-plane trie reads `NodeAcl` rows joined against the session's resolved LDAP groups, never admin roles. Stream B.2 updated.
- Crit · ACCEPT: Cached `UserAuthorizationState` survives LDAP group changes because memberships only refresh at cookie-auth. Change: add `MembershipFreshnessInterval` (default 15 min); past that, next hot-path authz call forces group re-resolution (fail-closed if LDAP unreachable). Session-close-wins on config-rollback.
- High · ACCEPT: Node-local invalidation doesn't extend across redundant pair. Change: trie keyed on `(ClusterId, GenerationId)`; hot-path authz looks up `CurrentGenerationId` from the shared config DB (Polly-wrapped + sub-second cache). A Backup that read stale generation gets a mismatched trie → forces re-load. Implementation note added to Stream B.4.
- High · ACCEPT: Browse enforcement missing. Change: new Stream C.7 (`Browse + TranslateBrowsePathsToNodeIds` enforcement). Ancestor visibility implied when any descendant has a grant; denied ancestors filter from browse results per `acl-design.md` §Browse.
- High · ACCEPT: `HistoryRead` should use `NodePermissions.HistoryRead` bit, not `Read`. Change: Stream C.3 revised; separate unit test asserts `Read` + no-`HistoryRead` denies HistoryRead while allowing current-value reads.
- High · ACCEPT: Galaxy shallow-path (Cluster→Namespace→Tag) loses folder hierarchy authorization. Change: SystemPlatform namespaces use a `FolderSegment` scope-level between Namespace and Tag, populated from `Tag.FolderPath`; UNS-kind namespaces keep the 6-level hierarchy. Trie supports both via `ScopeKind` on each node.
- High · ACCEPT: Subscription re-authorization policy unresolved between create-time-only (fast, wrong on revoke) and per-publish (slow). Change: stamp each `MonitoredItem` with `(AuthGenerationId, MembershipVersion)`; re-evaluate on Publish only when either version changed. Revoked items drop to `BadUserAccessDenied` within one publish cycle.
- Med · ACCEPT: Mixed-authorization batch `Read` / `CreateMonitoredItems` service-result semantics underspecified. Change: Stream C.6 explicitly tests per-`ReadValueId` + per-`MonitoredItemCreateResult` denial in mixed batches; batch never collapses to a coarse failure.
- Med · ACCEPT: Missing surfaces: `Method.Call`, `HistoryUpdate`, event filter on subscriptions, subscription-transfer on reconnect, alarm-ack. Change: scope expanded; every OPC UA authorization surface enumerated in Stream C: Read, Write, HistoryRead, HistoryUpdate, CreateMonitoredItems, TransferSubscriptions, Call, Acknowledge/Confirm/Shelve, Browse, TranslateBrowsePathsToNodeIds.
- Med · ACCEPT: `bool` evaluator bakes in grant-only semantics; collides with v2.1 Deny. Change: internal model uses `AuthorizationDecision { Allow | NotGranted | Denied, IReadOnlyList<MatchedGrant> Provenance }`. Phase 6.2 maps `Denied` → never produced; UI + audit log use the full record so v2.1 Deny lands without API break.
- Med · ACCEPT: 6.1 cache fallback is availability-oriented; applying it to auth is correctness-dangerous. Change: auth-specific staleness budget `AuthCacheMaxStaleness` (default 5 min, not 24 h). Past that, hot-path evaluator fails closed on cached reads; all authorization calls return `NotGranted` until fresh data lands. Documented in risks + compliance.
- Low · ACCEPT: Existing `NodeAclService` is raw CRUD. Change: new `ValidatedNodeAclAuthoringService` enforces scope-uniqueness + draft/publish invariants + rejects invalid (LDAP group, scope) pairs; Admin UI writes through it only. Stream D.2 adjusted.
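
Two of the accepted changes above, the `AuthorizationDecision` record and the `AuthCacheMaxStaleness` fail-closed rule, combine naturally. The sketch below is one possible reading, not the agreed design: `AuthorizationVerdict`, `StalenessGate`, and the fields of `MatchedGrant` are hypothetical names, and the group DN is a placeholder. A cached decision is honored only while the underlying data is within the staleness budget; past it, the gate returns `NotGranted` with empty provenance.

```csharp
using System;
using System.Collections.Generic;

// Usage: the same cached Allow is honored at 4 minutes and refused at 6.
var loadedAt = new DateTimeOffset(2026, 4, 19, 12, 0, 0, TimeSpan.Zero);
var maxStaleness = TimeSpan.FromMinutes(5); // the review's stated default
var cached = new AuthorizationDecision(
    AuthorizationVerdict.Allow,
    new[] { new MatchedGrant("cn=example-group", "Cluster") });

Console.WriteLine(StalenessGate.Apply(cached, loadedAt, loadedAt.AddMinutes(4), maxStaleness).Verdict); // Allow
Console.WriteLine(StalenessGate.Apply(cached, loadedAt, loadedAt.AddMinutes(6), maxStaleness).Verdict); // NotGranted

public enum AuthorizationVerdict { Allow, NotGranted, Denied }

// Hypothetical provenance entry: which group matched at which scope.
public sealed record MatchedGrant(string LdapGroup, string Scope);

public sealed record AuthorizationDecision(
    AuthorizationVerdict Verdict,
    IReadOnlyList<MatchedGrant> Provenance);

public static class StalenessGate
{
    // Fail closed: stale authorization data never yields Allow.
    public static AuthorizationDecision Apply(
        AuthorizationDecision cached,
        DateTimeOffset loadedAt,
        DateTimeOffset now,
        TimeSpan maxStaleness) =>
        now - loadedAt > maxStaleness
            ? new AuthorizationDecision(AuthorizationVerdict.NotGranted, Array.Empty<MatchedGrant>())
            : cached;
}
```

Passing `now` explicitly keeps the gate deterministic in unit tests; a production caller would supply the current UTC time.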