Driver-instance bootstrap pipeline (#248) — DriverInstance rows materialise as live IDriver instances #196

Merged
dohertj2 merged 2 commits from phase-7-fu-248-driver-bootstrap into v2 2026-04-20 22:52:14 -04:00
Owner

Closes the gap surfaced by Phase 7 live smoke (#240): DriverInstance rows in the central config DB had no path to materialise as live IDriver instances in DriverHost, so virtual-tag scripts read BadNodeIdUnknown for every tag.

DriverFactoryRegistry (Core.Hosting)

Process-singleton type-name → factory map. Each driver project's static Register call pre-loads its factory at Program.cs startup; the bootstrapper looks up by DriverInstance.DriverType + invokes with (DriverInstanceId, DriverConfig JSON). Case-insensitive; duplicate-type registration throws.

GalaxyProxyDriverFactoryExtensions.Register (Driver.Galaxy.Proxy)

Static helper — no Microsoft.Extensions.DependencyInjection dep, keeps the driver project free of DI machinery. Parses DriverConfig JSON for PipeName + SharedSecret + ConnectTimeoutMs. DriverInstanceId from the row wins over JSON per the schema's UX_DriverInstance_Generation_LogicalId.

DriverInstanceBootstrapper (Server)

After NodeBootstrap loads the published generation: queries DriverInstance rows scoped to that generation, looks up the factory per row, constructs + DriverHost.RegisterAsync (which calls InitializeAsync). Per plan decision #12 (driver isolation), failure of one driver doesn't prevent others — logs ERR + continues + returns the count actually registered. Unknown DriverType (factory not registered) logs WRN + skips so a missing-assembly deployment doesn't take down the whole server.

Wired into OpcUaServerService.ExecuteAsync

After NodeBootstrap.LoadCurrentGenerationAsync, before PopulateEquipmentContentAsync + Phase7Composer.PrepareAsync. The Phase 7 chain now sees a populated DriverHost so CachedTagUpstreamSource has an upstream feed.

Live evidence on the dev box

Re-ran the Phase 7 smoke from #240. Pre vs post #248:

Equipment namespace snapshots loaded for 0/0 driver(s)  ← before
Equipment namespace snapshots loaded for 1/1 driver(s)  ← after

Galaxy.Host pipe ACL denied our SID (env-config issue documented in docs/ServiceHosting.md, NOT a code issue) — the bootstrapper logged it as 'failed to initialize, driver state will reflect Faulted' and continued past the failure exactly per plan #12. The rest of the pipeline (Equipment walker + Phase 7 composer) ran to completion.

Tests — 5 new

DriverFactoryRegistryTests: Register + TryGet round-trip, case-insensitive lookup, duplicate-type throws, null-arg guards, RegisteredTypes snapshot. Pure functions; no DI/DB needed. The bootstrapper's DB-query path is exercised by the live smoke (#240) which operators run before each release.

Closes the gap surfaced by Phase 7 live smoke (#240): `DriverInstance` rows in the central config DB had no path to materialise as live `IDriver` instances in `DriverHost`, so virtual-tag scripts read `BadNodeIdUnknown` for every tag. ## `DriverFactoryRegistry` (Core.Hosting) Process-singleton type-name → factory map. Each driver project's static Register call pre-loads its factory at Program.cs startup; the bootstrapper looks up by `DriverInstance.DriverType` + invokes with `(DriverInstanceId, DriverConfig JSON)`. Case-insensitive; duplicate-type registration throws. ## `GalaxyProxyDriverFactoryExtensions.Register` (Driver.Galaxy.Proxy) Static helper — no `Microsoft.Extensions.DependencyInjection` dep, keeps the driver project free of DI machinery. Parses DriverConfig JSON for `PipeName` + `SharedSecret` + `ConnectTimeoutMs`. `DriverInstanceId` from the row wins over JSON per the schema's `UX_DriverInstance_Generation_LogicalId`. ## `DriverInstanceBootstrapper` (Server) After `NodeBootstrap` loads the published generation: queries DriverInstance rows scoped to that generation, looks up the factory per row, constructs + `DriverHost.RegisterAsync` (which calls `InitializeAsync`). Per plan decision #12 (driver isolation), failure of one driver doesn't prevent others — logs `ERR` + continues + returns the count actually registered. Unknown DriverType (factory not registered) logs `WRN` + skips so a missing-assembly deployment doesn't take down the whole server. ## Wired into `OpcUaServerService.ExecuteAsync` After `NodeBootstrap.LoadCurrentGenerationAsync`, before `PopulateEquipmentContentAsync` + `Phase7Composer.PrepareAsync`. The Phase 7 chain now sees a populated `DriverHost` so `CachedTagUpstreamSource` has an upstream feed. ## Live evidence on the dev box Re-ran the Phase 7 smoke from #240. Pre vs post #248: ``` Equipment namespace snapshots loaded for 0/0 driver(s) ← before Equipment namespace snapshots loaded for 1/1 driver(s) ← after ``` Galaxy.Host pipe ACL denied our SID (env-config issue documented in `docs/ServiceHosting.md`, NOT a code issue) — the bootstrapper logged it as 'failed to initialize, driver state will reflect Faulted' and continued past the failure exactly per plan #12. The rest of the pipeline (Equipment walker + Phase 7 composer) ran to completion. ## Tests — 5 new `DriverFactoryRegistryTests`: Register + TryGet round-trip, case-insensitive lookup, duplicate-type throws, null-arg guards, RegisteredTypes snapshot. Pure functions; no DI/DB needed. The bootstrapper's DB-query path is exercised by the live smoke (#240) which operators run before each release.
dohertj2 added 2 commits 2026-04-20 22:52:01 -04:00
Closes the gap surfaced by Phase 7 live smoke (#240): DriverInstance rows in
the central config DB had no path to materialise as live IDriver instances in
DriverHost, so virtual-tag scripts read BadNodeIdUnknown for every tag.

## DriverFactoryRegistry (Core.Hosting)
Process-singleton type-name → factory map. Each driver project's static
Register call pre-loads its factory at Program.cs startup; the bootstrapper
looks up by DriverInstance.DriverType + invokes with (DriverInstanceId,
DriverConfig JSON). Case-insensitive; duplicate-type registration throws.

## GalaxyProxyDriverFactoryExtensions.Register (Driver.Galaxy.Proxy)
Static helper — no Microsoft.Extensions.DependencyInjection dep, keeps the
driver project free of DI machinery. Parses DriverConfig JSON for PipeName +
SharedSecret + ConnectTimeoutMs. DriverInstanceId from the row wins over JSON
per the schema's UX_DriverInstance_Generation_LogicalId.

## DriverInstanceBootstrapper (Server)
After NodeBootstrap loads the published generation: queries DriverInstance
rows scoped to that generation, looks up the factory per row, constructs +
DriverHost.RegisterAsync (which calls InitializeAsync). Per plan decision
#12 (driver isolation), failure of one driver doesn't prevent others —
logs ERR + continues + returns the count actually registered. Unknown
DriverType (factory not registered) logs WRN + skips so a missing-assembly
deployment doesn't take down the whole server.

## Wired into OpcUaServerService.ExecuteAsync
After NodeBootstrap.LoadCurrentGenerationAsync, before
PopulateEquipmentContentAsync + Phase7Composer.PrepareAsync. The Phase 7
chain now sees a populated DriverHost so CachedTagUpstreamSource has an
upstream feed.

## Live evidence on the dev box
Re-ran the Phase 7 smoke from task #240. Pre-#248 vs post-#248:
  Equipment namespace snapshots loaded for 0/0 driver(s)  ← before
  Equipment namespace snapshots loaded for 1/1 driver(s)  ← after

Galaxy.Host pipe ACL denied our SID (env-config issue documented in
docs/ServiceHosting.md, NOT a code issue) — the bootstrapper logged it as
"failed to initialize, driver state will reflect Faulted" and continued past
the failure exactly per plan #12. The rest of the pipeline (Equipment walker
+ Phase 7 composer) ran to completion.

## Tests — 5 new DriverFactoryRegistryTests
Register + TryGet round-trip, case-insensitive lookup, duplicate-type throws,
null-arg guards, RegisteredTypes snapshot. Pure functions; no DI/DB needed.
The bootstrapper's DB-query path is exercised by the live smoke (#240) which
operators run before each release.
The previous commit (#248 wiring) inadvertently picked up
src/ZB.MOM.WW.OtOpcUa.Server/config_cache.db — generated by the live smoke
re-run that proved the bootstrapper works. Remove from tracking + ignore
going forward so future runs don't dirty the working tree.
dohertj2 merged commit 7e143e293b into v2 2026-04-20 22:52:14 -04:00
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
Admin RoleGrants page — LDAP-group → Admin-role mapping CRUD. Closes the RoleGrantsTab slice of task #144 (Phase 6.2 Stream D follow-up); the remaining three sub-items (Probe-this-permission on AclsTab, SignalR invalidation on role/ACL changes, draft-diff ACL section) are split into new follow-up task #196 so each can ship independently. The permission-trie evaluator + ILdapGroupRoleMappingService already exist from Phase 6.2 Streams A + B — this PR adds the consuming UI + the DI registration that was missing. New /role-grants page at Components/Pages/RoleGrants.razor registered in MainLayout's sidebar next to Certificates. Lists every LdapGroupRoleMapping row with columns LDAP group / Role / Scope (Fleet-wide or Cluster:X) / Created / Notes / Revoke. Add-grant form takes LDAP group DN + AdminRole dropdown (ConfigViewer, ConfigEditor, FleetAdmin) + Fleet-wide checkbox + Cluster dropdown (disabled when Fleet-wide checked) + optional Notes. Service-layer invariants — IsSystemWide=true + ClusterId=null, or IsSystemWide=false + ClusterId populated — enforced in ValidateInvariants; UI catches InvalidLdapGroupRoleMappingException and displays the message in a red alert. ILdapGroupRoleMappingService was present in the Configuration project from Stream A but never registered in the Admin DI container — this PR adds the AddScoped registration so the injection can resolve. Control-plane/data-plane separation note rendered in an info banner at the top of the page per decision #150 (these grants do NOT govern OPC UA data-path authorization; NodeAcl rows are read directly by the permission-trie evaluator without consulting role mappings). Admin project builds 0 errors; Admin.Tests 72/72 passing. Task #196 created to track: (1) AclsTab Probe-this-permission form that takes (ldap group, node path, permission flag) and runs it through the permission trie, showing which row granted it + the actual resolved grant; (2) SignalR invalidation — push a RoleGrantsChanged event when rows are created/deleted so connected Admin sessions reload without polling, ditto NodeAclChanged on ACL writes; (3) DiffViewer ACL section — show NodeAcl + LdapGroupRoleMapping deltas between draft + published alongside equipment/uns diffs.
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
DiffViewer refactor — 6-section plugin pattern + 1000-row cap. Closes task #156 (Phase 6.4 Stream C). Replaces the flat single-table rendering that mixed Namespace/DriverInstance/Equipment/Tag rows into one untyped list with a per-section-card layout that makes draft review actually scannable on non-trivial diffs. New DiffSection.razor reusable component encapsulates the per-section rendering — card header shows Title + Description + a three-badge summary (+added / −removed / ~modified plus a "no changes" grey badge when the section is empty) so operators can glance at a six-card page and see what areas of the draft actually shifted before drilling into any one table. Hard row-cap at DefaultRowCap=1000 per section lives inside the component so a pathological draft (e.g. 20k tags churned by a block rebuild) can't freeze the browser on render — excess rows are silently dropped with a yellow warning banner that surfaces "Showing the first 1000 of N rows" + a pointer to run sp_ComputeGenerationDiff directly for the full set. Body max-height: 400px + overflow-y: auto gives each section its own scroll region so one big section doesn't push the others off screen. DiffViewer.razor refactored to a static Sections table driving a single foreach that instantiates one DiffSection per known TableName. Sections listed in author-order (Namespace → DriverInstance → Equipment → Tag → UnsLine → NodeAcl) — six entries matching the task acceptance criterion. The first four correspond to what sp_ComputeGenerationDiff currently emits; the last two (UnsLine + NodeAcl) render as empty "no changes" cards today + will light up when the proc is extended (tracked in task #196 for NodeAcl; UnsLine proc extension is a natural follow-up since UnsImpactAnalyzer already tracks UNS moves). RowsFor(tableName) replaces the prior flat table — each section filters the overall DiffRow list by its TableName so the proc output format stays stable. Header-bar summary at the top of the page now reads "N rows across M of 6 sections" so operators see overall change weight at a glance before scanning. Two Razor-specific fixes landed along the way: loop variable renamed from `section` to `sec` because `@section` collides with the Razor section directive + trips RZ2005; helper method renamed from Group to RowsFor because the Razor generator gets confused by a parameter-flowing method whose name clashes with LINQ's Group extension (the source-gen output referenced TypeCheck<T> with no argument). Admin project builds 0 errors; Admin.Tests suite 76/76 (unchanged — the refactor is structural + no service-layer logic changed, so the existing DraftValidator + EquipmentService + AdminServicesIntegrationTests cover the consuming paths). No bUnit in this project so the cap behavior isn't unit-tested at the component level; DiffSection.OnParametersSet is small + deterministic (int counts + Take(RowCap)) + reviewed before ship.
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
AclsTab Probe-this-permission — first of three #196 slices. New /clusters/{ClusterId}/draft/{GenerationId} ACLs-tab gains a probe card above the grant table so operators can ask the trie "if cn=X asks for permission Y on node Z, would it be granted, and which rows contributed?" without shell-ing into the DB. Service thinly wraps the same PermissionTrieBuilder + PermissionTrie.CollectMatches call path the Server's dispatch layer uses at request time, so a probe answer is by construction identical to what the live server would decide. New PermissionProbeService.ProbeAsync(generationId, ldapGroup, NodeScope, requiredFlags) — loads the target generation's NodeAcl rows filtered to the cluster (critical: without the cluster filter, cross-cluster grants leak into the probe which tested false-positive in the unit suite), builds a trie, CollectMatches against the supplied scope + [ldapGroup], ORs the matched-grant flags into Effective, compares to Required. Returns PermissionProbeResult(Granted, Required, Effective, Matches) — Matches carries LdapGroup + Scope + PermissionFlags per matched row so the UI can render the contribution chain. Zero side effects + no audit rows — a failing probe is a question, not a denial. AclsTab.razor gains the probe card at the top (before the New-grant form + grant table): six inputs for ldap group + every NodeScope level (NamespaceId → UnsAreaId → UnsLineId → EquipmentId → TagId — blank fields become null so the trie walks only as deep as the operator specified), a NodePermissions dropdown filtered to skip None, Probe button, green Granted / red Denied badge + Required/Effective bitmask display, and (when matches exist) a small table showing which LdapGroup matched at which level with which flags. Admin csproj adds ProjectReference to Core — the trie + NodeScope live there + were previously Server-only. Five new PermissionProbeServiceTests covering: cluster-level row grants a namespace-level read; no-group-match denies with empty Effective; matching group but insufficient flags (Browse+Read vs WriteOperate required) denies with correct Effective bitmask; cross-cluster grants stay isolated (c2's WriteOperate does NOT leak into c1's probe); generation isolation (gen1's Read-only does NOT let gen2's WriteOperate-requiring probe pass). Admin.Tests 92/92 passing (was 87, +5). Admin builds 0 errors. Remaining #196 slices — SignalR invalidation + draft-diff ACL section — ship in follow-up PRs so the review surface per PR stays tight.
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
ACL + role-grant SignalR invalidation — #196 slice 2. Adds the live-push layer so an operator editing permissions in one Admin session sees the change in peer sessions without a manual reload. Covers both axes of task #196's invalidation requirement: cluster-scoped NodeAcl mutations push NodeAclChanged to that cluster's subscribers; fleet-wide LdapGroupRoleMapping CRUD pushes RoleGrantsChanged to every Admin session on the fleet group. New AclChangeNotifier service wraps IHubContext<FleetStatusHub> with two methods: NotifyNodeAclChangedAsync(clusterId, generationId) + NotifyRoleGrantsChangedAsync(). Both are fire-and-forget — a failed hub send logs a warning + returns; the authoritative DB write already committed, so worst-case peers see stale data until their next poll (AclsTab has no polling today; on-parameter-set reload + this signal covers the practical refresh cases). Catching OperationCanceledException separately so request-teardown doesn't log a false-positive hub-failure. NodeAclService constructor gains an optional AclChangeNotifier param (defaults to null so the existing unit tests that pass only a DbContext keep compiling). GrantAsync + RevokeAsync both emit NodeAclChanged after the SaveChanges completes — the Revoke path uses the loaded row's ClusterId + GenerationId for accurate routing since the caller passes only the surrogate rowId. RoleGrants.razor consumes the notifier after every Create + Delete + opens a fleet-scoped HubConnection on first render that reloads the grant list on RoleGrantsChanged. AclsTab.razor opens a cluster-scoped connection on first render and reloads only when the incoming NodeAclChanged message matches both the current ClusterId + GenerationId (so a peer editing a different draft doesn't trigger spurious reloads). Both pages IAsyncDisposable the connection on navigation away. AclChangeNotifier is DI-registered alongside PermissionProbeService. Two new message records in AclChangeNotifier.cs: NodeAclChangedMessage(ClusterId, GenerationId, ObservedAtUtc) + RoleGrantsChangedMessage(ObservedAtUtc). Admin.Tests 92/92 passing (unchanged — the notifier is fire-and-forget + tested at hub level in existing FleetStatusPoller suite). Admin builds 0 errors. One slice of #196 remains: the draft-diff ACL section (extend sp_ComputeGenerationDiff to emit NodeAcl rows + wire the DiffViewer NodeAcl card from the empty placeholder it currently shows). Next PR.
dohertj2 referenced this issue from a commit 2026-04-30 08:21:26 -04:00
DiffViewer ACL section — extend sp_ComputeGenerationDiff with NodeAcl rows. Closes the final slice of task #196 (draft-diff ACL section). The DiffViewer already rendered a placeholder "NodeAcl" card from the task #156 refactor; it stayed empty because the stored proc didn't emit NodeAcl rows. This PR lights the card up by adding a fifth UNION to the proc. Logical id for NodeAcl is the composite LdapGroup + ScopeKind + ScopeId triple — format "cn=group|Cluster|scope-id" or "cn=group|Cluster|(cluster)" when ScopeId is null (Cluster-wide rows). That shape means a permission-only change (same group + same scope, PermissionFlags shifted) appears as a single Modified row with the full triple as its identifier, whereas a scope move (same group, new ScopeId) correctly surfaces as Added + Removed of two different logical ids. CHECKSUM signature covers ClusterId + PermissionFlags + Notes so both operator-visible changes (permission bitmask) and audit-tier changes (notes) round-trip through the diff. New migration 20260420000001_ExtendComputeGenerationDiffWithNodeAcl.cs ships both Up (install V2 proc) + Down (restore the exact V1 proc text shipped in 20260417215224_StoredProcedures so the migration is reversible). Row-id column widens from nvarchar(64) to nvarchar(128) in V2 since the composite key (group DN + scope + scope-id) exceeds 64 chars comfortably — narrow column would silently truncate in prod. Designer .cs cloned from the prior migration since the EF model is unchanged; DiffViewer.razor section description updated to drop the "(proc-extension pending)" note it carried since task #156 — the card will now populate live. Admin + Core full-solution build clean. No unit-test changes needed — the existing StoredProceduresTests cover the proc-exec path + would immediately catch any SQL syntax regression on next SQL Server integration run. Task #196 fully closed now — Probe-this-permission (slice 1, PR 144), SignalR invalidation (slice 2, PR 145), draft-diff ACL section (this PR).
Sign in to join this conversation.