Files
lmxopcua/tests/ZB.MOM.WW.OtOpcUa.Server.Tests
Joseph Doherty 016122841b Phase 6.1 Stream E.2 partial — ResilienceStatusPublisherHostedService persists tracker snapshots to DB
Closes the HostedService half of Phase 6.1 Stream E.2 flagged as a follow-up
when the DriverResilienceStatusTracker shipped in PR #82. The Admin /hosts
column refresh + SignalR push + red-badge visual (Stream E.3) remain
deferred to the visual-compliance pass — this PR owns the persistence
story alone.

Server.Hosting:
- ResilienceStatusPublisherHostedService : BackgroundService. Samples the
  DriverResilienceStatusTracker every TickInterval (default 5 s) and upserts
  each (DriverInstanceId, HostName) counter pair into
  DriverInstanceResilienceStatus via EF. New rows on first sight; in-place
  updates on subsequent ticks.
- PersistOnceAsync extracted public so tests drive one tick directly —
  matches the ScheduledRecycleHostedService pattern for deterministic
  timing.
- Best-effort persistence: a DB outage logs a warning + continues; the next
  tick retries. Never crashes the app on sample failure. Cancellation
  propagates through cleanly.
- Tracks the bulkhead depth / recycle / footprint columns the entity was
  designed for. CurrentBulkheadDepth currently persisted as 0 — the tracker
  doesn't yet expose live bulkhead depth; a narrower follow-up wires the
  Polly bulkhead-depth observer into the tracker.

Tests (6 new in ResilienceStatusPublisherHostedServiceTests):
- Empty tracker → tick is a no-op, zero rows written.
- Single-host counters → upsert a new row with ConsecutiveFailures + breaker
  timestamp + sampled timestamp.
- Second tick updates the existing row in place (not a second insert).
- Multi-host pairs persist independently.
- Footprint counters (Baseline + Current) round-trip.
- TickCount advances on every PersistOnceAsync call.

Full solution dotnet test: 1225 passing (was 1219, +6). Pre-existing
Client.CLI Subscribe flake unchanged.

Production wiring (Program.cs) example:
  builder.Services.AddSingleton<DriverResilienceStatusTracker>();
  builder.Services.AddHostedService<ResilienceStatusPublisherHostedService>();
  // Tracker gets wired into CapabilityInvoker via OtOpcUaServer resolution
  // + the existing Phase 6.1 layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:36:00 -04:00
..
Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
2026-04-18 15:51:55 -04:00
Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
2026-04-18 15:23:22 -04:00
Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
2026-04-18 15:23:22 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 3 PR 19 — LDAP user identity + Basic256Sha256 security profile. Replaces the anonymous-only endpoint with a configurable security profile and an LDAP-backed UserName token validator. New IUserAuthenticator abstraction in Backend/Security/: LdapUserAuthenticator binds to the configured directory (reuses the pattern from Admin.Security.LdapAuthService without the cross-app dependency — Novell.Directory.Ldap.NETStandard 3.6.0 package ref added to Server alongside the existing OPCFoundation packages) and maps group membership to OPC UA roles via LdapOptions.GroupToRole (case-insensitive). DenyAllUserAuthenticator is the default when Ldap.Enabled=false so UserName token attempts return a clean BadUserAccessDenied rather than hanging on a localhost:3893 bind attempt. OpcUaSecurityProfile enum + LdapOptions nested record on OpcUaServerOptions. Profile=None keeps the PR 17 shape (SecurityPolicies.None + Anonymous token only) so existing integration tests stay green; Profile=Basic256Sha256SignAndEncrypt adds a second ServerSecurityPolicy (Basic256Sha256 + SignAndEncrypt) to the collection and, when Ldap.Enabled=true, adds a UserName token policy scoped to SecurityPolicies.Basic256Sha256 only — passwords must ride an encrypted channel, the stack rejects UserName over None. OtOpcUaServer.OnServerStarted hooks SessionManager.ImpersonateUser: AnonymousIdentityToken passes through; UserNameIdentityToken delegates to IUserAuthenticator.AuthenticateAsync — rejected identities throw ServiceResultException(BadUserAccessDenied); accepted identities get a RoleBasedIdentity that carries the resolved roles through session.Identity so future PRs can gate writes by role. OpcUaApplicationHost + OtOpcUaServer constructors take IUserAuthenticator as a dependency. Program.cs binds the new OpcUaServer:Ldap section from appsettings (Enabled defaults false, GroupToRole parsed as Dictionary<string,string>), registers IUserAuthenticator as LdapUserAuthenticator when enabled or DenyAllUserAuthenticator otherwise. PR 17 integration test updated to pass DenyAllUserAuthenticator so it keeps exercising the anonymous-only path unchanged. Tests — SecurityConfigurationTests (new, 13 cases): DenyAllAuthenticator rejects every credential; LdapAuthenticator rejects blank creds without hitting the server; rejects when Enabled=false; rejects plaintext when both UseTls=false AND AllowInsecureLdap=false (safety guard matching the Admin service); EscapeLdapFilter theory (4 rows: plain passthrough, parens/asterisk/backslash → hex escape) — regression guard against LDAP injection; ExtractOuSegment theory (3 rows: finds ou=, returns null when absent, handles multiple ou segments by returning first); ExtractFirstRdnValue theory (3 rows: strips cn= prefix, handles single-segment DN, returns plain string unchanged when no =). OpcUaServerOptions_default_is_anonymous_only asserts the default posture preserves PR 17 behavior. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Server.Tests') added to Server csproj so ExtractOuSegment and siblings are reachable from the tests. Full solution: 0 errors, 180 tests pass (8 Core + 14 Proxy + 24 Configuration + 6 Shared + 91 Galaxy.Host + 19 Server (17 unit + 2 integration) + 18 Admin). Live-LDAP integration test (connect via Basic256Sha256 endpoint with a real user from GLAuth, assert the session.Identity carries the mapped role) is deferred to a follow-up — it requires the GLAuth dev instance to be running at localhost:3893 which is dev-machine-specific, and the test harness for that also needs a fresh client-side certificate provisioned by the live server's trusted store.
2026-04-18 08:49:46 -04:00
Phase 3 PR 26 — server-layer write authorization gating by role. Per the user's ACL-at-server-layer directive (saved as feedback_acl_at_server_layer.md in memory), write authorization is enforced in DriverNodeManager.OnWriteValue and never delegated to the driver or to driver-specific auth (the v1 Galaxy-provided security path is explicitly not part of v2 — drivers report SecurityClassification as discovery metadata only). New WriteAuthzPolicy static class in Server/Security/ maps SecurityClassification → required role per the table documented in docs/Configuration.md: FreeAccess = no role required (anonymous sessions can write), Operate + SecuredWrite = WriteOperate, Tune = WriteTune, VerifiedWrite + Configure = WriteConfigure, ViewOnly = deny regardless of roles. Role matching is case-insensitive and role requirements do NOT cascade — a session with WriteConfigure can write Configure attributes but needs WriteOperate separately to write Operate attributes; this is deliberate so escalation is an explicit LDAP group assignment, not a hierarchy the policy silently grants. DriverNodeManager gains a _securityByFullRef Dictionary populated during Variable() registration (parallel to the existing _variablesByFullRef) so OnWriteValue can look up the classification in O(1) on the hot path. OnWriteValue casts the session's context.UserIdentity to the new IRoleBearer interface (implemented by OtOpcUaServer.RoleBasedIdentity from PR 19) — empty Roles collection when the session is anonymous; the same WriteAuthzPolicy.IsAllowed check then either short-circuits true (FreeAccess), false (ViewOnly), or walks the roles list looking for the required one. On deny, OnWriteValue logs 'Write denied for {FullRef}: classification=X userRoles=[...]' at Information level (readable trail for operator complaints) and returns BadUserAccessDenied without touching IWritable.WriteAsync — drivers never see a request we'd have refused. IRoleBearer kept as a minimal server-side interface rather than reusing some abstraction from Core.Abstractions because the concept is OPC-UA-session-scoped and doesn't generalize (the driver side has no notion of a user session). Tests — WriteAuthzPolicyTests (17 new cases): FreeAccess allows write with empty role set + arbitrary roles; ViewOnly denies write even with every role; Operate requires WriteOperate; role match is case-insensitive; Operate denies empty role set + wrong role; SecuredWrite shares Operate's requirement; Tune requires WriteTune; Tune denies WriteOperate-only (asserts roles don't cascade — this is the test that catches a future regression where someone 'helpfully' adds a role-escalation table); Configure requires WriteConfigure; VerifiedWrite shares Configure's requirement; multi-role session allowed when any role matches; unrelated roles denied; RequiredRole theory covering all 5 classified-and-mapped rows + null for FreeAccess/ViewOnly special cases. lmx-followups.md follow-up #2 marked DONE with a back-reference to this PR and the memory note. Full Server.Tests Unit suite: 38 pass / 0 fail (17 new WriteAuthz + 14 SecurityConfiguration from PR 19 + 2 NodeBootstrap + 5 others). Server.Tests Integration (Category=Integration) 2 pass — existing PR 17 anonymous-endpoint smoke tests stay green since the read path doesn't hit OnWriteValue.
2026-04-18 13:01:01 -04:00