Files
lmxopcua/src/ZB.MOM.WW.OtOpcUa.Server
Joseph Doherty 432173c5c4 ADR-001 last-mile — Program.cs composes EquipmentNodeWalker into the production boot path. Closes task #214 + fully lands ADR-001 Option A as a live code path, not just a connected set of unit-tested primitives. After this PR a server booted against a real Config DB with Published Equipment rows materializes the UNS tree into the OPC UA address space on startup — the whole walker → wire-in → loader chain (PRs #153, #154, #155, #156) finally fires end-to-end in the production process. DriverEquipmentContentRegistry is the handoff between OpcUaServerService's bootstrap-time populate pass + OpcUaApplicationHost's StartAsync walker invocation. It's a singleton mutable holder with Get/Set/Count + Lock-guarded internal dictionary keyed OrdinalIgnoreCase to match the DriverInstanceId convention used by Equipment / Tag rows + walker grouping. Set-once-per-bootstrap semantics in practice though nothing enforces that at the type level — OpcUaServerService.PopulateEquipmentContentAsync is the only expected writer. Shared-mutable rather than immutable-passed-by-value because the DI graph builds OpcUaApplicationHost before NodeBootstrap has resolved the generation, so the registry must exist at compose time + fill at boot time. Program.cs now registers OpcUaApplicationHost via a factory lambda that threads registry.Get as the equipmentContentLookup delegate PR #155 added to the ctor seam — the one-line composition the earlier PR promised. EquipmentNamespaceContentLoader (from PR #156) is AddScoped since it takes the scoped OtOpcUaConfigDbContext; the populate pass in OpcUaServerService opens one IServiceScopeFactory scope + reuses the same loader + DbContext across every driver query rather than scoping-per-driver. OpcUaServerService.ExecuteAsync gets a new PopulateEquipmentContentAsync step between bootstrap + StartAsync: iterates DriverHost.RegisteredDriverIds, calls loader.LoadAsync per driver at the bootstrapped generationId, stashes non-null results in the registry. Null results are skipped — the wire-in's null-check treats absent registry entries as "this driver isn't Equipment-kind; let DiscoverAsync own the address space" which is the correct backward-compat path for Modbus / AB CIP / TwinCAT / FOCAS. Guarded on result.GenerationId being non-null — a fleet with no Published generation yet boots cleanly into a UNS-less address space and fills the registry on the next restart after first publish. Ctor on OpcUaServerService gained two new dependencies (DriverEquipmentContentRegistry + IServiceScopeFactory). No test file constructs OpcUaServerService directly so no downstream test breakage — the BackgroundService is only wired via DI in Program.cs. Four new DriverEquipmentContentRegistryTests: Get-null-for-unknown, Set-then-Get, case-insensitive driver-id lookup, Set-overwrites-existing. Server.Tests 190/190 (was 186, +4 new registry tests). Full ADR-001 Option A now lives at every layer: Core.OpcUa walker (#153) → ScopePathIndexBuilder (#154) → OpcUaApplicationHost wire-in (#155) → EquipmentNamespaceContentLoader (#156) → this PR's registry + Program.cs composition. The last pending loose end (full-integration smoke test that boots Program.cs against a seeded Config DB + verifies UNS tree via live OPC UA client) isn't strictly necessary because PR #155's OpcUaEquipmentWalkerIntegrationTests already proves the wire-in at the OPC UA client-browse level — the Program.cs composition added here is purely mechanical + well-covered by the four-file audit trail plus registry unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 03:50:37 -04:00
..
ADR-001 last-mile — Program.cs composes EquipmentNodeWalker into the production boot path. Closes task #214 + fully lands ADR-001 Option A as a live code path, not just a connected set of unit-tested primitives. After this PR a server booted against a real Config DB with Published Equipment rows materializes the UNS tree into the OPC UA address space on startup — the whole walker → wire-in → loader chain (PRs #153, #154, #155, #156) finally fires end-to-end in the production process. DriverEquipmentContentRegistry is the handoff between OpcUaServerService's bootstrap-time populate pass + OpcUaApplicationHost's StartAsync walker invocation. It's a singleton mutable holder with Get/Set/Count + Lock-guarded internal dictionary keyed OrdinalIgnoreCase to match the DriverInstanceId convention used by Equipment / Tag rows + walker grouping. Set-once-per-bootstrap semantics in practice though nothing enforces that at the type level — OpcUaServerService.PopulateEquipmentContentAsync is the only expected writer. Shared-mutable rather than immutable-passed-by-value because the DI graph builds OpcUaApplicationHost before NodeBootstrap has resolved the generation, so the registry must exist at compose time + fill at boot time. Program.cs now registers OpcUaApplicationHost via a factory lambda that threads registry.Get as the equipmentContentLookup delegate PR #155 added to the ctor seam — the one-line composition the earlier PR promised. EquipmentNamespaceContentLoader (from PR #156) is AddScoped since it takes the scoped OtOpcUaConfigDbContext; the populate pass in OpcUaServerService opens one IServiceScopeFactory scope + reuses the same loader + DbContext across every driver query rather than scoping-per-driver. OpcUaServerService.ExecuteAsync gets a new PopulateEquipmentContentAsync step between bootstrap + StartAsync: iterates DriverHost.RegisteredDriverIds, calls loader.LoadAsync per driver at the bootstrapped generationId, stashes non-null results in the registry. Null results are skipped — the wire-in's null-check treats absent registry entries as "this driver isn't Equipment-kind; let DiscoverAsync own the address space" which is the correct backward-compat path for Modbus / AB CIP / TwinCAT / FOCAS. Guarded on result.GenerationId being non-null — a fleet with no Published generation yet boots cleanly into a UNS-less address space and fills the registry on the next restart after first publish. Ctor on OpcUaServerService gained two new dependencies (DriverEquipmentContentRegistry + IServiceScopeFactory). No test file constructs OpcUaServerService directly so no downstream test breakage — the BackgroundService is only wired via DI in Program.cs. Four new DriverEquipmentContentRegistryTests: Get-null-for-unknown, Set-then-Get, case-insensitive driver-id lookup, Set-overwrites-existing. Server.Tests 190/190 (was 186, +4 new registry tests). Full ADR-001 Option A now lives at every layer: Core.OpcUa walker (#153) → ScopePathIndexBuilder (#154) → OpcUaApplicationHost wire-in (#155) → EquipmentNamespaceContentLoader (#156) → this PR's registry + Program.cs composition. The last pending loose end (full-integration smoke test that boots Program.cs against a seeded Config DB + verifies UNS tree via live OPC UA client) isn't strictly necessary because PR #155's OpcUaEquipmentWalkerIntegrationTests already proves the wire-in at the OPC UA client-browse level — the Program.cs composition added here is purely mechanical + well-covered by the four-file audit trail plus registry unit tests.
2026-04-20 03:50:37 -04:00
ADR-001 Task B — NodeScopeResolver full-path + ScopePathIndexBuilder + evaluator-level ACL test closing #195. Two production additions + one end-to-end authz regression test proving the Identification ACL contract the IdentificationFolderBuilder docstring promises. Task A (PR #153) shipped the walker as a pure function that materializes the UNS → Equipment → Tag browse tree + IdentificationFolderBuilder.Build per Equipment. This PR lands the authz half of the walker's story — the resolver side that turns a driver-side full reference into a full NodeScope path (NamespaceId + UnsAreaId + UnsLineId + EquipmentId + TagId) so the permission trie can walk the UNS hierarchy + apply Equipment-scope grants correctly at dispatch time. The actual in-server wiring (load snapshot → call walker during BuildAddressSpaceAsync → swap in the full-path resolver) is split into follow-up task #212 because it's a bigger surface (Server bootstrap + DriverNodeManager override + real OPC UA client-browse integration test). NodeScopeResolver extended with a second constructor taking IReadOnlyDictionary<string, NodeScope> pathIndex — when supplied, Resolve looks up the full reference in the index + returns the indexed scope with every UNS level populated; when absent or on miss, falls back to the pre-ADR-001 cluster-only scope so driver-discovered tags that haven't been indexed yet (between a DiscoverAsync result + the next generation publish) stay addressable without crashing the resolver. Index is frozen into a FrozenDictionary<string, NodeScope> under Ordinal comparer for O(1) hot-path lookups. Thread-safety by immutability — callers swap atomically on generation change via the server's publish pipeline. New ScopePathIndexBuilder.Build in Server.Security takes (clusterId, namespaceId, EquipmentNamespaceContent) + produces the fullReference → NodeScope dictionary by joining Tag → Equipment → UnsLine → UnsArea through up-front dictionaries keyed Ordinal-ignoring-case. Tag rows with null EquipmentId (SystemPlatform-namespace Galaxy tags per decision #120) are excluded from the index; cluster-only fallback path covers them. Broken FKs (Tag references missing Equipment row, or Equipment references missing UnsLine) are skipped rather than crashing — sp_ValidateDraft should have caught these at publish, any drift here is unexpected but non-fatal. Duplicate keys throw InvalidOperationException at bootstrap so corrupt-data drift surfaces up-front instead of producing silently-last-wins scopes at dispatch. End-to-end authz regression test in EquipmentIdentificationAuthzTests walks the full dispatch flow against a Config-DB-style fixture: ScopePathIndexBuilder.Build from the same EquipmentNamespaceContent the EquipmentNodeWalker consumes → NodeScopeResolver with that index → AuthorizationGate + TriePermissionEvaluator → PermissionTrieBuilder with one Equipment-scope NodeAcl grant + a NodeAclPath resolving Equipment ScopeId to (namespace, area, line, equipment). Four tests prove the contract: (a) authorized group Read granted on Identification property; (b) unauthorized group Read denied on Identification property — the #195 contract the IdentificationFolderBuilder docstring promises (the BadUserAccessDenied surfacing happens at the DriverNodeManager dispatch layer which is already wired to AuthorizationGate.IsAllowed → StatusCodes.BadUserAccessDenied in PR #94); (c) Equipment-scope grant cascades to both the Equipment's tag + its Identification properties because they share the Equipment ScopeId — no new scope level for Identification per the builder's Remarks section; (d) grant on oven-3 does NOT leak to press-7 (different equipment under the same UnsLine) proving per-Equipment isolation at dispatch when the resolver populates the full path. NodeScopeResolverTests extended with two new tests covering the indexed-lookup path + fallback-on-miss path; renamed the existing "_For_Phase1" test to "_When_NoIndexSupplied" to match the current framing. Server project builds 0 errors; Server.Tests 179/179 (was 173, +6 new across the two test files). Task #212 captures the remaining in-server wiring work — Server.SealedBootstrap load of EquipmentNamespaceContent, DriverNodeManager override that calls EquipmentNodeWalker during BuildAddressSpaceAsync for Equipment-kind namespaces, and a real OPC UA client-browse integration test. With that wiring + this PR's authz-layer proof, #195's "ACL integration test" line is satisfied at two layers (evaluator + live endpoint) which is stronger than the task originally asked for.
2026-04-20 02:50:27 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
2026-04-18 15:51:55 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
ADR-001 last-mile — Program.cs composes EquipmentNodeWalker into the production boot path. Closes task #214 + fully lands ADR-001 Option A as a live code path, not just a connected set of unit-tested primitives. After this PR a server booted against a real Config DB with Published Equipment rows materializes the UNS tree into the OPC UA address space on startup — the whole walker → wire-in → loader chain (PRs #153, #154, #155, #156) finally fires end-to-end in the production process. DriverEquipmentContentRegistry is the handoff between OpcUaServerService's bootstrap-time populate pass + OpcUaApplicationHost's StartAsync walker invocation. It's a singleton mutable holder with Get/Set/Count + Lock-guarded internal dictionary keyed OrdinalIgnoreCase to match the DriverInstanceId convention used by Equipment / Tag rows + walker grouping. Set-once-per-bootstrap semantics in practice though nothing enforces that at the type level — OpcUaServerService.PopulateEquipmentContentAsync is the only expected writer. Shared-mutable rather than immutable-passed-by-value because the DI graph builds OpcUaApplicationHost before NodeBootstrap has resolved the generation, so the registry must exist at compose time + fill at boot time. Program.cs now registers OpcUaApplicationHost via a factory lambda that threads registry.Get as the equipmentContentLookup delegate PR #155 added to the ctor seam — the one-line composition the earlier PR promised. EquipmentNamespaceContentLoader (from PR #156) is AddScoped since it takes the scoped OtOpcUaConfigDbContext; the populate pass in OpcUaServerService opens one IServiceScopeFactory scope + reuses the same loader + DbContext across every driver query rather than scoping-per-driver. OpcUaServerService.ExecuteAsync gets a new PopulateEquipmentContentAsync step between bootstrap + StartAsync: iterates DriverHost.RegisteredDriverIds, calls loader.LoadAsync per driver at the bootstrapped generationId, stashes non-null results in the registry. Null results are skipped — the wire-in's null-check treats absent registry entries as "this driver isn't Equipment-kind; let DiscoverAsync own the address space" which is the correct backward-compat path for Modbus / AB CIP / TwinCAT / FOCAS. Guarded on result.GenerationId being non-null — a fleet with no Published generation yet boots cleanly into a UNS-less address space and fills the registry on the next restart after first publish. Ctor on OpcUaServerService gained two new dependencies (DriverEquipmentContentRegistry + IServiceScopeFactory). No test file constructs OpcUaServerService directly so no downstream test breakage — the BackgroundService is only wired via DI in Program.cs. Four new DriverEquipmentContentRegistryTests: Get-null-for-unknown, Set-then-Get, case-insensitive driver-id lookup, Set-overwrites-existing. Server.Tests 190/190 (was 186, +4 new registry tests). Full ADR-001 Option A now lives at every layer: Core.OpcUa walker (#153) → ScopePathIndexBuilder (#154) → OpcUaApplicationHost wire-in (#155) → EquipmentNamespaceContentLoader (#156) → this PR's registry + Program.cs composition. The last pending loose end (full-integration smoke test that boots Program.cs against a seeded Config DB + verifies UNS tree via live OPC UA client) isn't strictly necessary because PR #155's OpcUaEquipmentWalkerIntegrationTests already proves the wire-in at the OPC UA client-browse level — the Program.cs composition added here is purely mechanical + well-covered by the four-file audit trail plus registry unit tests.
2026-04-20 03:50:37 -04:00
ADR-001 last-mile — Program.cs composes EquipmentNodeWalker into the production boot path. Closes task #214 + fully lands ADR-001 Option A as a live code path, not just a connected set of unit-tested primitives. After this PR a server booted against a real Config DB with Published Equipment rows materializes the UNS tree into the OPC UA address space on startup — the whole walker → wire-in → loader chain (PRs #153, #154, #155, #156) finally fires end-to-end in the production process. DriverEquipmentContentRegistry is the handoff between OpcUaServerService's bootstrap-time populate pass + OpcUaApplicationHost's StartAsync walker invocation. It's a singleton mutable holder with Get/Set/Count + Lock-guarded internal dictionary keyed OrdinalIgnoreCase to match the DriverInstanceId convention used by Equipment / Tag rows + walker grouping. Set-once-per-bootstrap semantics in practice though nothing enforces that at the type level — OpcUaServerService.PopulateEquipmentContentAsync is the only expected writer. Shared-mutable rather than immutable-passed-by-value because the DI graph builds OpcUaApplicationHost before NodeBootstrap has resolved the generation, so the registry must exist at compose time + fill at boot time. Program.cs now registers OpcUaApplicationHost via a factory lambda that threads registry.Get as the equipmentContentLookup delegate PR #155 added to the ctor seam — the one-line composition the earlier PR promised. EquipmentNamespaceContentLoader (from PR #156) is AddScoped since it takes the scoped OtOpcUaConfigDbContext; the populate pass in OpcUaServerService opens one IServiceScopeFactory scope + reuses the same loader + DbContext across every driver query rather than scoping-per-driver. OpcUaServerService.ExecuteAsync gets a new PopulateEquipmentContentAsync step between bootstrap + StartAsync: iterates DriverHost.RegisteredDriverIds, calls loader.LoadAsync per driver at the bootstrapped generationId, stashes non-null results in the registry. Null results are skipped — the wire-in's null-check treats absent registry entries as "this driver isn't Equipment-kind; let DiscoverAsync own the address space" which is the correct backward-compat path for Modbus / AB CIP / TwinCAT / FOCAS. Guarded on result.GenerationId being non-null — a fleet with no Published generation yet boots cleanly into a UNS-less address space and fills the registry on the next restart after first publish. Ctor on OpcUaServerService gained two new dependencies (DriverEquipmentContentRegistry + IServiceScopeFactory). No test file constructs OpcUaServerService directly so no downstream test breakage — the BackgroundService is only wired via DI in Program.cs. Four new DriverEquipmentContentRegistryTests: Get-null-for-unknown, Set-then-Get, case-insensitive driver-id lookup, Set-overwrites-existing. Server.Tests 190/190 (was 186, +4 new registry tests). Full ADR-001 Option A now lives at every layer: Core.OpcUa walker (#153) → ScopePathIndexBuilder (#154) → OpcUaApplicationHost wire-in (#155) → EquipmentNamespaceContentLoader (#156) → this PR's registry + Program.cs composition. The last pending loose end (full-integration smoke test that boots Program.cs against a seeded Config DB + verifies UNS tree via live OPC UA client) isn't strictly necessary because PR #155's OpcUaEquipmentWalkerIntegrationTests already proves the wire-in at the OPC UA client-browse level — the Program.cs composition added here is purely mechanical + well-covered by the four-file audit trail plus registry unit tests.
2026-04-20 03:50:37 -04:00
Roslyn analyzer — detect unwrapped driver-capability calls (OTOPCUA0001). Closes task #200. New netstandard2.0 analyzer project src/ZB.MOM.WW.OtOpcUa.Analyzers registered as an <Analyzer>-item ProjectReference from the Server csproj so the warning fires at every Server compile. First (and only so far) rule OTOPCUA0001 — "Driver capability call must be wrapped in CapabilityInvoker" — walks every InvocationOperation in the AST + trips when (a) the target method implements one of the seven guarded capability interfaces (IReadable / IWritable / ITagDiscovery / ISubscribable / IHostConnectivityProbe / IAlarmSource / IHistoryProvider) AND (b) the method's return type is Task, Task<T>, ValueTask, or ValueTask<T> — the async-wire-call constraint narrows the rule to the surfaces the Phase 6.1 pipeline actually wraps + sidesteps pure in-memory accessors like IHostConnectivityProbe.GetHostStatuses() which would trigger false positives AND (c) the call does NOT sit inside a lambda argument passed to CapabilityInvoker.ExecuteAsync / ExecuteWriteAsync / AlarmSurfaceInvoker.*. The wrapper detection walks up the syntax tree from the call site, finds any enclosing InvocationExpressionSyntax whose method's containing type is one of the wrapper classes, + verifies the call lives transitively inside that invocation's AnonymousFunctionExpressionSyntax argument — a sibling "result = await driver.ReadAsync(...)" followed by a separate invoker.ExecuteAsync(...) call does NOT satisfy the wrapping rule + the analyzer flags it (regression guard in the 5th test). Five xunit-v3 + Shouldly tests at tests/ZB.MOM.WW.OtOpcUa.Analyzers.Tests: direct ReadAsync in server namespace trips; wrapped ReadAsync inside CapabilityInvoker.ExecuteAsync lambda passes; direct WriteAsync trips; direct DiscoverAsync trips; sneaky pattern — read outside the lambda + ExecuteAsync with unrelated lambda nearby — still trips. Hand-rolled test harness compiles a stub-plus-user snippet via CSharpCompilation.WithAnalyzers + runs GetAnalyzerDiagnosticsAsync directly, deliberately avoiding Microsoft.CodeAnalysis.CSharp.Analyzer.Testing.XUnit because that package pins to xunit v2 + this repo is on xunit.v3 everywhere else. RS2008 release-tracking noise suppressed by adding AnalyzerReleases.Shipped.md + AnalyzerReleases.Unshipped.md as AdditionalFiles, which is the canonical Roslyn-analyzer hygiene path. Analyzer DLL referenced from Server.csproj via ProjectReference with OutputItemType=Analyzer + ReferenceOutputAssembly=false — the DLL ships as a compiler plugin, not a runtime dependency. Server build validates clean: the analyzer activates on every Server file but finds zero violations, which confirms the Phase 6.1 wrapping work done in prior PRs is complete + the analyzer is now the regression guard preventing the next new capability surface from being added raw. slnx updated with both the src + tests project entries. Full solution build clean, analyzer suite 5/5 passing.
2026-04-20 00:52:40 -04:00