Commit Graph

43 Commits

Author SHA1 Message Date
Joseph Doherty
6e282b9946 server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)
Sixth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.2 (sidecar
serves IAlarmEventWriter when enabled), already merged.

Today Phase7Composer.ResolveHistorianSink only scans drivers for an
IAlarmHistorianWriter — no Galaxy driver provides one since PR 7.2,
so the resolution falls through to NullAlarmHistorianSink and
scripted-alarm transitions are silently discarded.

WonderwareHistorianClient already implements IAlarmHistorianWriter
and Program.cs:178 already registers it as a singleton when
Historian:Wonderware:Enabled=true. The gap was that Phase7Composer
ignored DI: this PR adds an optional injectedWriter constructor
parameter, and ASP.NET Core DI resolves it from the same
registration when present.

- Phase7Composer constructor: new optional IAlarmHistorianWriter?
  injectedWriter parameter (default null). Backward-compatible —
  existing callers don't need to change; DI populates it
  automatically when the singleton is registered.
- New static SelectAlarmHistorianWriter helper — resolution order
  is driver → DI → null. Drivers win when both are present so a
  future GalaxyDriver-as-IAlarmHistorianWriter takes the write
  path directly, preserving the v1 invariant where a driver that
  natively owns the historian client doesn't bounce through the
  sidecar IPC.
- ResolveHistorianSink uses the helper + emits a structured log
  line identifying which source provided the writer.

Tests:
- 4 SelectAlarmHistorianWriter precedence tests — no source / DI
  only / driver wins over DI / first-driver-with-writer wins.
- Pre-existing 4 HostStatusPublisherTests SQL failures unrelated
  to this change (require the docker-host SQL Server at
  10.100.0.35,14330 per CLAUDE.md). Phase7 + alarm tests all
  green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:31:00 -04:00
Joseph Doherty
21cac4c8c4 PR 4.W — Galaxy:Backend wiring + server-side factory registration
- GalaxyDriver.InitializeAsync now builds the production gw runtime (MxGatewayClient,
  GalaxyMxSession, GatewayGalaxySubscriber, GatewayGalaxyDataWriter,
  ReconnectSupervisor, HostConnectivityForwarder, PerPlatformProbeWatcher) when no
  test seams are pre-injected; Dispose tears the chain down in order.
- GetHealth surfaces supervisor.IsDegraded as DriverState.Degraded so a transport
  drop is observable without polling the supervisor directly.
- DiscoverAsync now refreshes the per-platform probe watcher's membership against
  $WinPlatform / $AppEngine objects after every discovery pass.
- OnPumpDataChange routes ScanState changes through the probe watcher in addition
  to fanning out OnDataChange to ISubscribable consumers.
- Server registers GalaxyDriver under "GalaxyMxGateway" alongside the legacy
  "Galaxy" GalaxyProxyDriver factory so DriverInstance rows can opt in.
- Bumped Server.Tests' Microsoft.Extensions.Logging.Abstractions to 10.0.7 to
  resolve the downgrade pulled in transitively via MxGateway.Client.
- Lifecycle factory tests switched to the internal seam-injection ctor so they
  no longer attempt a real gRPC connect during InitializeAsync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:10:31 -04:00
Joseph Doherty
ef22a61c39 v2 mxgw migration — Phase 1+2+3.1 wiring (7 PRs)
Foundational PRs from lmx_mxgw_impl.md, all green. Bodies only — DI/wiring
deferred to PR 1+2.W (combined wire-up) and PR 3.W.

PR 1.1 — IHistorianDataSource lifted to Core.Abstractions/Historian/
  Reuses existing DataValueSnapshot + HistoricalEvent shapes; sidecar (PR
  3.4) translates byte-quality → uint StatusCode internally.

PR 1.2 — IHistoryRouter + HistoryRouter on the server
  Longest-prefix-match resolution, case-insensitive, ObjectDisposed-guarded,
  swallow-on-shutdown disposal of misbehaving sources.

PR 1.3 — DriverNodeManager.HistoryRead* dispatch through IHistoryRouter
  Per-tag resolution with LegacyDriverHistoryAdapter wrapping
  `_driver as IHistoryProvider` so existing tests + drivers keep working
  until PR 7.2 retires the fallback.

PR 2.1 — AlarmConditionInfo extended with five sub-attribute refs
  InAlarmRef / PriorityRef / DescAttrNameRef / AckedRef / AckMsgWriteRef.
  Optional defaulted parameters preserve all existing 3-arg call sites.

PR 2.2 — AlarmConditionService state machine in Server/Alarms/
  Driver-agnostic port of GalaxyAlarmTracker. Sub-attribute refs come from
  AlarmConditionInfo, values arrive as DataValueSnapshot, ack writes route
  through IAlarmAcknowledger. State machine preserves Active/Acknowledged/
  Inactive transitions, Acked-on-active reset, post-disposal silence.

PR 2.3 — DriverNodeManager wires AlarmConditionService
  MarkAsAlarmCondition registers each alarm-bearing variable with the
  service; DriverWritableAcknowledger routes ack-message writes through
  the driver's IWritable + CapabilityInvoker. Service-raised transitions
  route via OnAlarmServiceTransition → matching ConditionSink. Legacy
  IAlarmSource path unchanged for null service.

PR 3.1 — Driver.Historian.Wonderware shell project (net48 x86)
  Console host shell + smoke test; SDK references + code lift come in
  PR 3.2.

Tests: 9 (PR 1.1) + 5 (PR 2.1) + 10 (PR 1.2) + 19 (PR 2.2) + 1 (PR 3.1)
all pass. Existing AlarmSubscribeIntegrationTests + HistoryReadIntegrationTests
unchanged.

Plan + audit docs (lmx_backend.md, lmx_mxgw.md, lmx_mxgw_impl.md)
included so parallel subagent worktrees can read them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:03:36 -04:00
Joseph Doherty
802366c2c6 Task #154 — driver-diagnostics RPC: HTTP endpoint + Admin client
Foundation for surfacing per-driver runtime state from the Server process to
the Admin UI. #152 shipped GetAutoProhibitedRanges() as an in-process
accessor; #154 makes it reachable across processes.

Server side (HealthEndpointsHost):
- New URL family: /diagnostics/drivers/{driverInstanceId}/{driverType}/{topic}
- First wired topic: /diagnostics/drivers/{id}/modbus/auto-prohibited
- Driver-agnostic at the URL level — future driver types add their own
  segments[3] cases (e.g. /diagnostics/drivers/{id}/s7/dropped-pdus).
- 404 when the driver instance doesn't exist; 400 when the driver exists
  but isn't a Modbus driver (the per-type endpoint is wrong for this row).
- Response shape is flat JSON (unitId / region / startAddress / endAddress /
  lastProbedUtc / bisectionPending) so consumers don't have to reference the
  Driver.Modbus assembly's ModbusAutoProhibition record.
- Re-uses the existing HttpListener bound to localhost:4841 — same auth /
  reachability story as /healthz and /readyz.

Admin side:
- DriverDiagnosticsClient (Services/) — HttpClient wrapper that fetches the
  per-driver Modbus prohibition list. Returns null on 404/400 (driver
  missing or wrong type); throws on transport failures.
- ModbusAutoProhibitionsResponse + ModbusAutoProhibitionRow flat DTOs —
  client doesn't take a dep on Driver.Modbus.
- ModbusDiagnostics.razor at /modbus/diagnostics/{driverInstanceId} —
  table view with BISECTING (warning yellow) / ISOLATED (danger red)
  badges, relative timestamps (e.g. "5m ago"), Refresh button. Errors
  surface inline rather than swallowing.
- HttpClient registration in Program.cs reads
  DriverDiagnostics:ServerBaseUrl from appsettings.json (default
  http://localhost:4841/ for same-host deployments).

Tests (3 new in HealthEndpointsHostTests):
- Diagnostics_ReturnsModbusAutoProhibitions_ForLiveDriver — registers a
  Modbus driver with a programmable transport that protects register 102,
  records the prohibition via a coalesced ReadAsync, hits the endpoint,
  asserts the returned JSON matches (unitId / region / start / end / pending).
- Diagnostics_404_When_Driver_Not_Found
- Diagnostics_400_When_Driver_Is_Wrong_Type

Architecture note: the Admin-side bUnit-style component test isn't included
because Admin.Tests doesn't have bUnit set up. The DriverDiagnosticsClient
is unit-testable on its own with a mock HandlerStub if needed — left as a
follow-up alongside the broader bUnit setup task.

The diagnostic page is now reachable at /modbus/diagnostics/{driverId} from
any Admin instance pointing at a Server endpoint URL. Future driver types
(S7, AbCip) plug into the same channel by adding their own URL segments
in HealthEndpointsHost.WriteDriverDiagnosticsAsync.
2026-04-25 01:32:21 -04:00
Joseph Doherty
fb760bc465 Task #135 — update integration-test NodeIds for path-based scheme
7 integration tests in Server.Tests were left behind by the path-based
NodeId rename (#134). Each was constructing test NodeIds in the old
"FullReference" shape ("TestFolder.Var1", "raw.var", "AlphaFolder.Var1",
"plcaddr-temperature"), which the node manager no longer mints — the new
shape is `{driverId}/{folder-path}/{browseName}` per OPC UA Part 3 §5.2.2
NodeId immutability.

Fixed by re-deriving each test NodeId from the actual browse path the test
fixture's driver registers:

- OpcUaServerIntegrationTests: "TestFolder.Var1" → "fake/TestFolder/Var1"
- HistoryReadIntegrationTests (4 tests): "raw.var" → "history-driver/raw",
  "proc.var" → "history-driver/proc" (×2), "atTime.var" → "history-driver/atTime"
- MultipleDriverInstancesIntegrationTests: "AlphaFolder.Var1" →
  "alpha/AlphaFolder/Var1"; "BetaFolder.Var1" → "beta/BetaFolder/Var1"
- OpcUaEquipmentWalkerIntegrationTests: "plcaddr-temperature" →
  "galaxy-prod/warsaw/line-a/oven-3/Temperature" (the walker uses Tag.Name
  as the browseName; the FullReference lives in TagConfig but no longer
  surfaces in the NodeId path)

Server.Tests now 277/277 green excluding LiveLdap. Clears the regression
flagged during the #124 verification run.
2026-04-24 22:03:03 -04:00
Joseph Doherty
75c07149d4 Task #124 — Phase 6.2 multi-user authz interop matrix + close LdapGroups gap
The Phase 6.2 evaluator was wired but received no input in production:
RoleBasedIdentity (the IUserIdentity our LDAP path produces) implemented
IRoleBearer but not ILdapGroupsBearer, so AuthorizationGate.BuildSessionState
always returned null and the gate lax-mode-allowed every request. UserAuthResult
also never carried the resolved LDAP groups, only the role-mapped strings.

Closing the gap so the evaluator gets real data:

- UserAuthResult adds Groups alongside Roles. LdapUserAuthenticator now
  surfaces the raw RDN values (ReadOnly / WriteOperate / ...) it already
  collected during the directory query. Roles stay separate per decision #150
  (control-plane Admin role mapping vs data-plane NodeAcl key).
- RoleBasedIdentity implements ILdapGroupsBearer so AuthorizationGate sees
  the groups via the same seam unit tests already use.

ThreeUserInteropMatrixTests drives the closure end-to-end against the live
GLAuth dev directory:

- 5 distinct group memberships (readonly / writeop / writetune /
  writeconfig / alarmack) plus the multi-group admin user
- Each is bound through the real LdapUserAuthenticator
- Resolved groups feed an LdapBoundIdentity that goes through the strict-mode
  AuthorizationGate against a seeded TriePermissionEvaluator
- 31 InlineData rows assert the role × operation matrix; failures pinpoint
  the exact (user, op) cell

The remaining wire-level leg of #124 — a real OPC UA client driving UserName
tokens through an encrypted endpoint policy — still needs a deployment knob
and stays a manual cross-vendor smoke (#119 / #124 manual scope). The doc
audit note in admin-ui-phase-6-status.md is updated to reflect what's now
auto'd vs what stays manual.

33/33 new tests pass against live GLAuth; existing 270 non-LiveLdap tests
in Server.Tests still pass; Core.Tests 205/205, Admin.Tests 109/109. The 7
integration-test failures observed during this run pre-exist this commit
(NodeId-scheme regression from #134) and are tracked separately as #135.
2026-04-24 20:40:07 -04:00
Joseph Doherty
1be0fb5a29 Phase 6.2 Stream C.12 — lock in ScopePathIndexBuilder semantics with tests
Closes task #123 (partial — builder semantics unit-tested; production
wiring is the new task #133).

ScopePathIndexBuilder + NodeScopeResolver indexed mode already exist —
they produce a full Cluster → Namespace → UnsArea → UnsLine → Equipment
→ Tag scope from the published generation's config rows. What was
missing: unit coverage of the Build semantics (the only consumers were
compile-time references) + explicit acknowledgement in the readiness
doc that the gate/resolver aren't yet wired into Program.cs.

Tests — 6 cases in ScopePathIndexBuilderTests.cs:
- Well-formed content emits full hierarchy.
- Tags with null EquipmentId skipped (SystemPlatform-namespace fallback).
- Tags with broken Equipment FK skipped (publish-time validation
  should have caught; builder is defensive).
- Equipment with broken Line FK skipped.
- Duplicate TagConfig throws InvalidOperationException.
- Resolver with index returns full-path scope; un-indexed ref falls
  through to cluster-only scope (pre-ADR-001 behaviour preserved).

Server.Tests 277 → 283.

Critical follow-up (task #133): Program.cs still constructs
OpcUaApplicationHost WITHOUT authzGate or scopeResolver, so all six
dispatch-layer gates (Read, Write, HistoryRead, Browse,
CreateMonitoredItems, Call) are currently inert in production. Wiring
them up — load NodeAcl + EquipmentNamespaceContent at bootstrap,
construct gate + resolver, pass into OpcUaApplicationHost, rebind on
generation refresh — is the last Phase 6.2 GA blocker.

Docs: v2-release-readiness.md Phase 6.2 Stream C hardening list marks
the scope-resolution bullet struck-through with a close-out note that
calls out the gate-inert-in-production gap + task #133.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:28:19 -04:00
Joseph Doherty
ded292ecd7 Phase 6.2 Stream C — Call + Alarm Acknowledge/Confirm gating
Closes task #122 (Acknowledge + Confirm + generic Call — Shelve stays as
a follow-up pending per-instance method-NodeId resolution).

Before this commit any session with a connected channel could invoke
method nodes on driver-materialized equipment — including alarm
Acknowledge / Confirm. Combined with the Browse + CreateMonitoredItems
gates that landed earlier in Stream C, this was the last service-layer
entry point where a session could still affect state without passing
the authz trie.

Implementation on DriverNodeManager:
- `Call` override — pre-iterates methodsToCall, gates each through
  AuthorizationGate with the operation kind returned by
  MapCallOperation. Denied calls get errors[i] = BadUserAccessDenied
  before delegating to base.Call.
- `MapCallOperation(NodeId methodId)` — maps well-known Part 9 method
  NodeIds to dedicated operation kinds:
    MethodIds.AcknowledgeableConditionType_Acknowledge →
        OpcUaOperation.AlarmAcknowledge
    MethodIds.AcknowledgeableConditionType_Confirm →
        OpcUaOperation.AlarmConfirm
    everything else → OpcUaOperation.Call
  Lets the ACL distinguish "can acknowledge alarms" from "can invoke
  arbitrary methods" without conflating the two roles.
- Shelve dispatch paths through per-instance ShelvedStateMachine methods
  with dynamic NodeIds that can't be constant-matched — falls through
  to generic Call. Fine-grained OpcUaOperation.AlarmShelve is a follow-
  up when the method-invocation path grows a "method-role" annotation.

Extracted GateCallMethodRequests + MapCallOperation as static internal
for unit-testability. 8 new tests (MapCallOperation Acknowledge /
Confirm / generic; gate-null no-op, denied-Acknowledge, allowed-
Acknowledge, mixed-batch, pre-populated-error-preserved).
Server.Tests 269 → 277.

Known follow-ups:
- Shelve per-operation gating (see above).
- TranslateBrowsePathsToNodeIds gating (Browse follow-up from #120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:22:19 -04:00
Joseph Doherty
6a6b0f56f2 Phase 6.2 Stream C — CreateMonitoredItems per-item gating
Closes task #121 (partial — creation-time gate; decision #153 per-item
revocation stamp is a follow-up).

Before this commit a session could subscribe to any node via
CreateMonitoredItems, even nodes where Read was denied — the
subscription would surface BadUserAccessDenied on each data-change
read, but the client saw a successful CreateMonitoredItems response
and held the subscription open, wasting resources and leaking the
address-space shape through the item metadata.

New override on DriverNodeManager.CreateMonitoredItems:
- Pre-iterates itemsToCreate, gates each through AuthorizationGate with
  OpcUaOperation.CreateMonitoredItems at the target node's scope.
- For denied slots: sets errors[i] = new ServiceResult(
  StatusCodes.BadUserAccessDenied). The OPC Foundation base stack
  honours pre-populated non-success errors and skips item creation for
  those slots — the subscription never holds a handle to a denied
  node.
- Preserves prior errors (e.g. BadNodeIdUnknown) — first diagnosis wins.
- Non-string-identifier references (stack-synthesized numeric ids)
  bypass the gate.

Extracted the pure filter logic into
GateMonitoredItemCreateRequests(items, errors, identity, gate,
scopeResolver) — static internal, unit-testable without the OPC UA
server stack.

Tests — 6 new in MonitoredItemGatingTests.cs (gate-null no-op,
denied-gets-BadUserAccessDenied, allowed-passes, mixed-batch-denies-
per-item, pre-populated-error-preserved, numeric-id-bypass). Server.Tests
263 → 269.

Known follow-ups:
- Per-item (AuthGenerationId, MembershipVersion) stamp (decision #153)
  for detecting revocation mid-subscription — needs subscription-layer
  plumbing.
- TransferSubscriptions not yet wired (same pattern, smaller scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:17:40 -04:00
Joseph Doherty
e8b8541554 Phase 6.2 Stream C — Browse gating on DriverNodeManager
Closes task #120 (partial — strict point-check; ancestor-visibility
implication is a follow-up).

Before this commit DriverNodeManager exposed every materialized node to
every browsing session regardless of the user's ACL. Read + Write +
HistoryRead were already gated through AuthorizationGate in Phase 6.2
Stream C core; Browse was the one surface where the session could still
enumerate nodes it had no permission to touch, discovering structure
even when reads failed with BadUserAccessDenied.

Implementation
- New `Browse` override on DriverNodeManager that calls base.Browse
  first (lets the stack populate the reference list normally), then
  post-filters the IList<ReferenceDescription> so denied nodes are
  removed silently. OPC UA convention: Browse filtering is invisible to
  the client; no BadUserAccessDenied surfaces.
- Extracted the filter loop into the static internal
  `FilterBrowseReferences(references, userIdentity, gate, scopeResolver)`
  so the policy is unit-testable without standing up the full OPC UA
  server stack.
- Non-string NodeId identifiers (stack-synthesized standard-type
  references with numeric identifiers) bypass the gate — only driver-
  materialized nodes key into the authz trie.
- When AuthorizationGate or NodeScopeResolver is null, the filter is a
  no-op — preserves the pre-Phase-6.2 dispatch path for integration
  tests that construct DriverNodeManager without authz.

Tests — 6 new in BrowseGatingTests.cs (gate-null no-op, empty-list
no-op, denied-removed, allowed-passes-through, numeric-id bypass,
lax-mode null-identity keeps references). Server.Tests 257 → 263.

Known follow-up (tracked implicitly under #120 re-scope):
- Ancestor-visibility implication (acl-design.md §Browse line 111): a
  user with Read at `Line/Tag` should be able to Browse `Line` even
  without an explicit Browse grant. Current filter does a strict
  point-check. Proper fix needs TriePermissionEvaluator to expose a
  "subtree-has-any-grant" query.
- TranslateBrowsePathsToNodeIds not yet filtered (same extension
  pattern; small follow-up).

Docs: v2-release-readiness.md Phase 6.2 Stream C hardening list marks
the Browse bullet struck-through with "Partial" close-out note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:11:19 -04:00
Joseph Doherty
a23de2a7e4 Phase 6.3 A.2 + D.1 — GenerationRefreshHostedService: poll + lease-wrap apply
Closes tasks #132 + #118 (GA hardening backlog).

Before this commit, the Server only observed the generation in force at
process start (SealedBootstrap). Peer-published generations accumulated
in the shared config DB while the running node kept serving the
generation it had sealed on boot. Two consequences:

1. Operator role-swaps required a process restart — Admin publishes a
   new generation, but the Server's RedundancyCoordinator never re-read
   the topology.
2. ApplyLeaseRegistry had no apply to wrap. ServiceLevelBand sat at
   PrimaryHealthy (255) during every publish because nothing opened a
   lease; PrimaryMidApply (200) was effectively dead code.

New GenerationRefreshHostedService (src/.../Server/Hosting/):
- Polls sp_GetCurrentGenerationForCluster every 5s (tunable).
- On change: opens leases.BeginApplyLease(newGenerationId, Guid.NewGuid()),
  calls coordinator.RefreshAsync inside the `await using`, releases on
  scope exit (success / exception / cancellation via IAsyncDisposable).
- Diagnostic properties: LastAppliedGenerationId, TickCount, RefreshCount.
- Delegate-injected currentGenerationQuery for test drive-through; real
  path is the private static DefaultQueryCurrentGenerationAsync.
- Registered as HostedService in Program.cs alongside the Phase 6.3
  redundancy / peer-probe stack.

Scope intentionally narrow: only the coordinator refreshes today. Driver
re-init, virtual-tag re-bind, script-engine reload remain as follow-up
wiring. The lease wrap is the right seam for those subscribers to hook
once they grow hot-reload support — the doc comments say so.

Tests
- 5 new unit tests in GenerationRefreshHostedServiceTests (first-apply,
  identity no-op, change-triggers-refresh, null-generation-is-no-op,
  lease-is-released-on-exit). Stub generation-query delegate; real
  coordinator backed by EF InMemory DB.
- Server.Tests total 252 → 257.

Docs
- v2-release-readiness.md Phase 6.3 follow-ups list marks the
  sp_PublishGeneration lease wrap bullet struck-through with close-out
  note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:02:33 -04:00
Joseph Doherty
de77d42eab Phase 6.3 Stream B — peer-probe HostedServices populating PeerReachabilityTracker
Closes task #116 (GA hardening backlog). Before this commit the
RedundancyStatePublisher saw PeerReachability.Unknown for every peer
because the tracker had no writers — every healthy peer got
degraded to the Isolated-Primary band (230) even when fully reachable.
Not release-blocking (safe default), but not the full non-transparent-
redundancy UX either.

Two-layer probe model per docs/v2/implementation/phase-6-3-redundancy-runtime.md
§Stream B:

- PeerHttpProbeLoop (Stream B.1) — fast-fail layer at 2 s / 1 s timeout.
  Hits each peer's http://{Host}:{DashboardPort}/healthz via an injected
  IHttpClientFactory. Writes the HTTP bit of PeerReachability while
  preserving the UA bit from the last UA probe so a transient HTTP blip
  doesn't clobber the authoritative UA reading.

- PeerUaProbeLoop (Stream B.2) — authoritative layer at 10 s / 5 s
  timeout. Calls DiscoveryClient.GetEndpoints against opc.tcp://{Host}:
  {OpcUaPort} — cheap compared to a full Session.Create, no cert trust
  required. Short-circuits when the HTTP probe last reported the peer
  unhealthy (no wasted handshakes on a known-dead endpoint), clearing
  the stale UaHealthy bit in that case.

Both inherit from BackgroundService, follow the tick/delay/catch pattern
RedundancyPublisherHostedService + ResilienceStatusPublisherHostedService
established, and expose TickAsync() as internal for test drive-through.

New PeerProbeOptions class carries the four intervals/timeouts so
operators can tune cadence per site. Registered as singleton in Program.cs;
HTTP client registered by name so the OtOpcUa handler chain
(Serilog enrichers, potential future OpenTelemetry instrumentation) isn't
bypassed.

Tests — 9 new unit tests across PeerHttpProbeLoopTests (5) and
PeerUaProbeLoopTests (4). All pass. Server.Tests total 243 → 252.
Full solution build clean.

Docs: v2-release-readiness.md Phase 6.3 follow-ups list marks the
peer-probe bullet struck-through with a close-out note.

Still deferred in Phase 6.3:
  - OPC UA variable-node binding (task #117 — ServiceLevel + ServerUriArray)
  - sp_PublishGeneration lease wrap (task #118)
  - Client interop matrix (task #119)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:53:38 -04:00
Joseph Doherty
69e0d02c72 task-galaxy-e2e branch — non-FOCAS work-in-progress snapshot
Catch-all commit for pending work on the task-galaxy-e2e branch that
wasn't part of the FOCAS migration. Grouping by topic so future per-topic
commits can be cherry-picked if needed.

TwinCAT
- src/.../Driver.TwinCAT/AdsTwinCATClient.cs + TwinCATDriverFactoryExtensions.cs:
  factory-registration extensions + ADS client refinements.
- src/.../Driver.TwinCAT.Cli/Commands/BrowseCommand.cs: new browse command
  for the TwinCAT test-client CLI.
- tests/.../Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs + TwinCatProject/:
  fixture scaffold with a minimal POU + README pointing at the TCBSD/ESXi
  VM for e2e.
- docs/Driver.TwinCAT.Cli.md + docs/drivers/TwinCAT-Test-Fixture.md:
  documentation for the above.
- docs/v3/twincat-backlog.md: forward-looking backlog seed.

Admin UI + fleet status
- src/.../Admin/Components/Pages/Clusters/DriversTab.razor + Hosts.razor:
  UI refresh for fleet-status rendering.
- src/.../Admin/Hubs/FleetStatusHub.cs + FleetStatusPoller.cs +
  Admin/Program.cs: SignalR hub + poller plumbing for live fleet data.
- tests/.../Admin.Tests/FleetStatusPollerTests.cs: poller coverage.

Server + redundancy runtime (Phase 6.3 follow-ups)
- src/.../Server/Hosting/RedundancyPublisherHostedService.cs: HostedService
  that owns the RedundancyStatePublisher lifecycle + wires peer reachability.
- src/.../Server/Redundancy/ServerRedundancyNodeWriter.cs: OPC UA
  variable-node writer binding ServiceLevel + ServerUriArray to the
  publisher's events.
- src/.../Server/Program.cs + Server.csproj: hosted-service registration.
- tests/.../Server.Tests/ServerRedundancyNodeWriterTests.cs +
  Server.Tests.csproj: coverage for the above.

Configuration
- src/.../Configuration/Validation/DraftValidator.cs +
  tests/.../Configuration.Tests/DraftValidatorTests.cs: draft-validation
  refinements.

E2E scripts (shared infrastructure)
- scripts/e2e/README.md + _common.ps1 + test-all.ps1: shared helpers + the
  all-drivers test-all runner.
- scripts/e2e/test-opcuaclient.ps1: OPC UA Client e2e runner.

Docs
- docs/v2/implementation/phase-6-{1,2,3,4}*.md + exit-gate-phase-{3,7}.md:
  phase-gate + implementation doc updates.
- docs/v2/plan.md: top-level plan refresh.
- docs/v2/redundancy-interop-playbook.md: client interop playbook for the
  Phase 6.3 redundancy-runtime work.

Two orphan FOCAS docs remain on disk but deliberately unstaged —
docs/v2/focas-deployment.md and docs/v2/implementation/focas-simulator-plan.md
describe the now-retired Tier-C topology and should either be rewritten
or deleted in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:12:19 -04:00
Joseph Doherty
8221fac8c1 Task #219 follow-up — close AlarmConditionState child-NodeId + event-propagation gaps
PR #197 surfaced two integration-level wiring gaps in DriverNodeManager's
MarkAsAlarmCondition path; this commit fixes both and upgrades the integration
test to assert them end-to-end.

Fix 1 — addressable child nodes: AlarmConditionState inherits ~50 typed children
(Severity / Message / ActiveState / AckedState / EnabledState / …). The stack
was leaving them with Foundation-namespace NodeIds (type-declaration defaults) or
shared ns=0 counter allocations, so client Read on a child returned
BadNodeIdUnknown. Pass assignNodeIds=true to alarm.Create, then walk the condition
subtree and rewrite each descendant's NodeId symbolically as
  {condition-full-ref}.{symbolic-path}
in the node manager's namespace. Stable, unique, and collision-free across
multiple alarm instances in the same driver.

Fix 2 — event propagation to Server.EventNotifier: OPC UA Part 9 event
propagation relies on the alarm condition being reachable from Objects/Server
via HasNotifier. Call CustomNodeManager2.AddRootNotifier(alarm) after registering
the condition so subscriptions placed on Server-object EventNotifier receive the
ReportEvent calls ConditionSink emits per-transition.

Test upgrades in AlarmSubscribeIntegrationTests:
  - Driver_alarm_transition_updates_server_side_AlarmConditionState_node — now
    asserts Severity == 700, Message text, and ActiveState.Id == true through
    the OPC UA client (previously scoped out as BadNodeIdUnknown).
  - New: Driver_alarm_event_flows_to_client_subscription_on_Server_EventNotifier
    subscribes an OPC UA event monitor on ObjectIds.Server, fires a driver
    transition, and waits for the AlarmConditionType event to be delivered,
    asserting Message + Severity fields. Previously scoped out as "Part 9 event
    propagation out of reach."

Regression checks: 239 server tests pass (+1 new event-subscription test),
195 Core tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 00:22:02 -04:00
Joseph Doherty
acf31fd943 Task #219 — Server-integration test coverage for IAlarmSource dispatch path
Adds AlarmSubscribeIntegrationTests alongside HistoryReadIntegrationTests so both
optional driver capabilities — IHistoryProvider (already covered) and IAlarmSource
(new) — have end-to-end coverage that boots the full OPC UA stack and exercises the
wiring path from driver event → GenericDriverNodeManager forwarder → DriverNodeManager
ConditionSink through a real Session.

Two tests:
  1. Driver_alarm_transition_updates_server_side_AlarmConditionState_node — a fake
     IAlarmSource declares an IsAlarm=true variable, calls MarkAsAlarmCondition in
     DiscoverAsync, and fires OnAlarmEvent for that source. Verifies the
     client can browse the alarm condition node at FullReference + ".Condition"
     and reads the DisplayName back through Session.Read.
  2. Each_IsAlarm_variable_registers_its_own_condition_node_in_the_driver_namespace —
     two IsAlarm variables each produce their own addressable AlarmConditionState,
     proving the CapturingHandle per-variable registration works.

Scoped-out (documented in the class docstring): the stack exposes AlarmConditionState's
inherited children (Severity / Message / ActiveState / …) with Foundation-namespace
NodeIds that DriverNodeManager does not add to its predefined-node index, so reading
those child attributes through a client returns BadNodeIdUnknown. OPC UA Part 9 event
propagation (subscribe-on-Server + ConditionRefresh) is likewise out of reach until
the node manager wires HasNotifier + child-node registration. The existing Core-level
GenericDriverNodeManagerTests cover the in-memory alarm-sink fan-out semantics.

Full Server.Tests suite: 238 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 23:33:45 -04:00
Joseph Doherty
3d78033ea4 Driver-instance bootstrap pipeline (#248) — DriverInstance rows materialise as live IDriver instances
Closes the gap surfaced by Phase 7 live smoke (#240): DriverInstance rows in
the central config DB had no path to materialise as live IDriver instances in
DriverHost, so virtual-tag scripts read BadNodeIdUnknown for every tag.

## DriverFactoryRegistry (Core.Hosting)
Process-singleton type-name → factory map. Each driver project's static
Register call pre-loads its factory at Program.cs startup; the bootstrapper
looks up by DriverInstance.DriverType + invokes with (DriverInstanceId,
DriverConfig JSON). Case-insensitive; duplicate-type registration throws.

## GalaxyProxyDriverFactoryExtensions.Register (Driver.Galaxy.Proxy)
Static helper — no Microsoft.Extensions.DependencyInjection dep, keeps the
driver project free of DI machinery. Parses DriverConfig JSON for PipeName +
SharedSecret + ConnectTimeoutMs. DriverInstanceId from the row wins over JSON
per the schema's UX_DriverInstance_Generation_LogicalId.

## DriverInstanceBootstrapper (Server)
After NodeBootstrap loads the published generation: queries DriverInstance
rows scoped to that generation, looks up the factory per row, constructs +
DriverHost.RegisterAsync (which calls InitializeAsync). Per plan decision
#12 (driver isolation), failure of one driver doesn't prevent others —
logs ERR + continues + returns the count actually registered. Unknown
DriverType (factory not registered) logs WRN + skips so a missing-assembly
deployment doesn't take down the whole server.

## Wired into OpcUaServerService.ExecuteAsync
After NodeBootstrap.LoadCurrentGenerationAsync, before
PopulateEquipmentContentAsync + Phase7Composer.PrepareAsync. The Phase 7
chain now sees a populated DriverHost so CachedTagUpstreamSource has an
upstream feed.

## Live evidence on the dev box
Re-ran the Phase 7 smoke from task #240. Pre-#248 vs post-#248:
  Equipment namespace snapshots loaded for 0/0 driver(s)  ← before
  Equipment namespace snapshots loaded for 1/1 driver(s)  ← after

Galaxy.Host pipe ACL denied our SID (env-config issue documented in
docs/ServiceHosting.md, NOT a code issue) — the bootstrapper logged it as
"failed to initialize, driver state will reflect Faulted" and continued past
the failure exactly per plan #12. The rest of the pipeline (Equipment walker
+ Phase 7 composer) ran to completion.

## Tests — 5 new DriverFactoryRegistryTests
Register + TryGet round-trip, case-insensitive lookup, duplicate-type throws,
null-arg guards, RegisteredTypes snapshot. Pure functions; no DI/DB needed.
The bootstrapper's DB-query path is exercised by the live smoke (#240) which
operators run before each release.
2026-04-20 22:49:25 -04:00
Joseph Doherty
7352db28a6 Phase 7 follow-up #246 — Phase7Composer + Program.cs wire-in
Activates the Phase 7 engines in production. Loads Script + VirtualTag +
ScriptedAlarm rows from the bootstrapped generation, wires the engines through
the Phase7EngineComposer kernel (#243), starts the DriverSubscriptionBridge feed
(#244), and late-binds the resulting IReadable sources to OpcUaApplicationHost
before OPC UA server start.

## Phase7Composer (Server.Phase7)

Singleton orchestrator. PrepareAsync loads the three Phase 7 row sets in one
DB scope, builds CachedTagUpstreamSource, calls Phase7EngineComposer.Compose,
constructs DriverSubscriptionBridge with one DriverFeed per registered
ISubscribable driver (path-to-fullRef map built from EquipmentNamespaceContent
via MapPathsToFullRefs), starts the bridge.

DisposeAsync tears down in the right order: bridge first (no more events fired
into the cache), then engines (cascades + timers stop), then any disposable sink.

MapPathsToFullRefs: deterministic path convention is
  /{areaName}/{lineName}/{equipmentName}/{tagName}
matching exactly what EquipmentNodeWalker emits into the OPC UA browse tree, so
script literals against the operator-visible UNS tree work without translation.
Tags missing EquipmentId or pointing at unknown Equipment are skipped silently
(Galaxy SystemPlatform-style tags + dangling references handled).

## OpcUaApplicationHost.SetPhase7Sources

New late-bind setter. Throws InvalidOperationException if called after
StartAsync because OtOpcUaServer + DriverNodeManagers capture the field values
at construction; mutation post-start would silently fail.

## OpcUaServerService

After bootstrap loads the current generation, calls phase7Composer.PrepareAsync
+ applicationHost.SetPhase7Sources before applicationHost.StartAsync. StopAsync
disposes Phase7Composer first so the bridge stops feeding the cache before the
OPC UA server tears down its node managers (avoids in-flight cascades surfacing
as noisy shutdown warnings).

## Program.cs

Registers IAlarmHistorianSink as NullAlarmHistorianSink.Instance (task #247
swaps in the real Galaxy.Host-writer-backed SqliteStoreAndForwardSink), Serilog
root logger, and Phase7Composer singleton.

## Tests — 5 new Phase7ComposerMappingTests = 34 Phase 7 tests total

Maps tag → walker UNS path, skips null EquipmentId, skips unknown Equipment
reference, multiple tags under same equipment map distinctly, empty content
yields empty map. Pure functions; no DI/DB needed.

The real PrepareAsync DB query path can't be exercised without SQL Server in
the test environment — it's exercised by the live E2E smoke (task #240) which
unblocks once #247 lands.

## Phase 7 production wiring chain status
-  #243 composition kernel
-  #245 scripted-alarm IReadable adapter
-  #244 driver bridge
-  #246 this — Program.cs wire-in
- 🟡 #247 — Galaxy.Host SqliteStoreAndForwardSink writer adapter (replaces NullSink)
- 🟡 #240 — live E2E smoke (unblocks once #247 lands)
2026-04-20 22:06:03 -04:00
Joseph Doherty
e11350cf80 Phase 7 follow-up #244 — DriverSubscriptionBridge
Pumps live driver OnDataChange notifications into CachedTagUpstreamSource so
ctx.GetTag in user scripts sees the freshest driver value. The last missing piece
between #243 (composition kernel) and #246 (Program.cs wire-in).

## DriverSubscriptionBridge

IAsyncDisposable. Per DriverFeed: groups all paths for one ISubscribable into a
single SubscribeAsync call (consolidating polled drivers' work + giving
native-subscription drivers one watch list), keeps a per-feed reverse map from
driver-opaque fullRef back to script-side UNS path, hooks OnDataChange to
translate + push into the cache. DisposeAsync awaits UnsubscribeAsync per active
subscription + unhooks every handler so events post-dispose are silent.

Empty PathToFullRef map → feed skipped (no SubscribeAsync call). Subscribe failure
on any feed unhooks that feed's handler + propagates so misconfiguration aborts
bridge start cleanly. Double-Start throws InvalidOperationException; double-Dispose
is idempotent.

OTOPCUA0001 suppressed at the two ISubscribable call sites with comments
explaining the carve-out: bridge is the lifecycle-coordinator for Phase 7
subscriptions (one Subscribe at engine compose, one Unsubscribe at shutdown),
not the per-call hot-path. Driver Read dispatch still goes through CapabilityInvoker
via DriverNodeManager.

## Tests — 9 new = 29 Phase 7 tests total

DriverSubscriptionBridgeTests covers: SubscribeAsync called with distinct fullRefs,
OnDataChange pushes to cache keyed by UNS path, unmapped fullRef ignored, empty
PathToFullRef skips Subscribe, DisposeAsync unsubscribes + unhooks (post-dispose
events don't push), StartAsync called twice throws, DisposeAsync idempotent,
Subscribe failure unhooks handler + propagates, ctor null guards.

## Phase 7 production wiring chain status
- #243  composition kernel
- #245  scripted-alarm IReadable adapter
- #244  this — driver bridge
- #246 pending — Program.cs Compose call + SqliteStoreAndForwardSink lifecycle
- #240 pending — live E2E smoke (unblocks once #246 lands)
2026-04-20 21:53:05 -04:00
Joseph Doherty
d6a8bb1064 Phase 7 follow-up #245 — ScriptedAlarmReadable adapter over engine state
Task #245 — exposes each scripted alarm's current ActiveState as IReadable so
OPC UA variable reads on Source=ScriptedAlarm nodes return the live predicate
truth instead of BadNotFound.

## ScriptedAlarmReadable

Wraps ScriptedAlarmEngine + implements IReadable:
- Known alarm + Active → DataValueSnapshot(true, Good)
- Known alarm + Inactive → DataValueSnapshot(false, Good)
- Unknown alarm id → DataValueSnapshot(null, BadNodeIdUnknown) — surfaces
  misconfiguration rather than silently reading false
- Batch reads preserve request order

Phase7EngineComposer.Compose now returns this as ScriptedAlarmReadable when
ScriptedAlarm rows are present. ScriptedAlarmSource (IAlarmSource for the event
stream) stays in place — the IReadable is a separate adapter over the same engine.

## Tests — 6 new + 1 updated composer test = 19 total Phase 7 tests

ScriptedAlarmReadableTests covers: inactive + active predicate → bool snapshot,
unknown alarm id → BadNodeIdUnknown, batch order preservation, null-engine +
null-fullReferences guards. The active-predicate test uses ctx.GetTag on a seeded
upstream value to drive a real cascade through the engine.

Updated Phase7EngineComposerTests to assert ScriptedAlarmReadable is non-null
when alarms compose, null when only virtual tags.

## Follow-ups remaining
- #244 — driver-bridge feed populating CachedTagUpstreamSource
- #246 — Program.cs Compose call + SqliteStoreAndForwardSink lifecycle
2026-04-20 21:30:56 -04:00
Joseph Doherty
f64a8049d8 Phase 7 follow-up #243 — CachedTagUpstreamSource + Phase7EngineComposer
Ships the composition kernel that maps Config DB rows (Script / VirtualTag /
ScriptedAlarm) to the runtime definitions VirtualTagEngine + ScriptedAlarmEngine
consume, builds the engine instances, and wires OnEvent → historian-sink routing.

## src/ZB.MOM.WW.OtOpcUa.Server/Phase7/

- CachedTagUpstreamSource — implements both Core.VirtualTags.ITagUpstreamSource and
  Core.ScriptedAlarms.ITagUpstreamSource (identical shape, distinct namespaces) on one
  concrete type so the composer can hand one instance to both engines. Thread-safe
  ConcurrentDictionary value cache with synchronous ReadTag + fire-on-write
  Push(path, snapshot) that fans out to every observer registered via SubscribeTag.
  Unknown-path reads return a BadNodeIdUnknown-quality snapshot (status 0x80340000)
  so scripts branch on quality naturally.
- Phase7EngineComposer.Compose(scripts, virtualTags, scriptedAlarms, upstream,
  alarmStateStore, historianSink, rootScriptLogger, loggerFactory) — single static
  entry point that:
  * Indexes scripts by ScriptId, resolves VirtualTag.ScriptId + ScriptedAlarm.PredicateScriptId
    to full SourceCode
  * Projects DB rows to VirtualTagDefinition + ScriptedAlarmDefinition (mapping
    DataType string → DriverDataType enum, AlarmType string → AlarmKind enum,
    Severity 1..1000 → AlarmSeverity bucket matching the OPC UA Part 9 bands
    that AbCipAlarmProjection + OpcUaClient MapSeverity already use)
  * Constructs VirtualTagEngine + loads definitions (throws InvalidOperationException
    with the list of scripts that failed to compile — aggregated like Streams B+C)
  * Constructs ScriptedAlarmEngine + loads definitions + wires OnEvent →
    IAlarmHistorianSink.EnqueueAsync using ScriptedAlarmEvent.Emission as the event
    kind + Condition.LastAckUser/LastAckComment for audit fields
  * Returns Phase7ComposedSources with Disposables list the caller owns

Empty Phase 7 config returns Phase7ComposedSources.Empty so deployments without
scripts / alarms behave exactly as pre-Phase-7. Non-null sources flow into
OpcUaApplicationHost's virtualReadable / scriptedAlarmReadable plumbing landed by
task #239 — DriverNodeManager then dispatches reads by NodeSourceKind per PR #186.

## Tests — 12/12

CachedTagUpstreamSourceTests (6):
- Unknown-path read returns BadNodeIdUnknown-quality snapshot
- Push-then-Read returns cached value
- Push fans out to subscribers in registration order
- Push to one path doesn't fire another path's observer
- Dispose of subscription handle stops fan-out
- Satisfies both Core.VirtualTags + Core.ScriptedAlarms ITagUpstreamSource interfaces

Phase7EngineComposerTests (6):
- Empty rows → Phase7ComposedSources.Empty (both sources null)
- VirtualTag rows → VirtualReadable non-null + Disposables populated
- Missing script reference throws InvalidOperationException with the missing ScriptId
  in the message
- Disabled VirtualTag row skipped by projection
- TimerIntervalMs → TimeSpan.FromMilliseconds round-trip
- Severity 1..1000 maps to Low/Medium/High/Critical at 250/500/750 boundaries
  (matches AbCipAlarmProjection + OpcUaClient.MapSeverity banding)

## Scope — what this PR does NOT do

The composition kernel is the tricky part; the remaining wiring is three narrower
follow-ups that each build on this PR:

- task #244 — driver-bridge feed that populates CachedTagUpstreamSource from live
  driver subscriptions. Without this, ctx.GetTag returns BadNodeIdUnknown even when
  the driver has a fresh value.
- task #245 — ScriptedAlarmReadable adapter exposing each alarm's current Active
  state as IReadable. Phase7EngineComposer.Compose currently returns
  ScriptedAlarmReadable=null so reads on Source=ScriptedAlarm variables return
  BadNotFound per the ADR-002 "misconfiguration not silent fallback" signal.
- task #246 — Program.cs call to Phase7EngineComposer.Compose with config rows
  loaded from the sealed-cache DB read, plus SqliteStoreAndForwardSink lifecycle
  wiring at %ProgramData%/OtOpcUa/alarm-historian-queue.db with the Galaxy.Host
  IPC writer from Stream D.

Task #240 (live OPC UA E2E smoke) depends on all three follow-ups landing.
2026-04-20 21:23:31 -04:00
Joseph Doherty
f0851af6b5 Phase 7 Stream G follow-up — DriverNodeManager dispatch routing by NodeSourceKind
Honors the ADR-002 discriminator at OPC UA Read/Write dispatch time. Virtual tag
reads route to the VirtualTagEngine-backed IReadable; scripted alarm reads route
to the ScriptedAlarmEngine-backed IReadable; driver reads continue to route to the
driver's own IReadable (no regression for any existing driver test).

## Changes

DriverNodeManager ctor gains optional `virtualReadable` + `scriptedAlarmReadable`
parameters. When callers omit them (every existing driver test) the manager behaves
exactly as before. SealedBootstrap wires the engines' IReadable adapters once the
Phase 7 composition root is added.

Per-variable NodeSourceKind tracked in `_sourceByFullRef` during Variable() registration
alongside the existing `_writeIdempotentByFullRef` / `_securityByFullRef` maps.

OnReadValue now picks the IReadable by source kind via the new internal
SelectReadable helper. When the engine-backed IReadable isn't wired (virtual tag
node but no engine provided), returns BadNotFound rather than silently falling
back to the driver — surfaces a misconfiguration instead of masking it.

OnWriteValue gates on IsWriteAllowedBySource which returns true only for Driver.
Plan decision #6: virtual tags + scripted alarms reject direct OPC UA writes with
BadUserAccessDenied. Scripts write virtual tags via `ctx.SetVirtualTag`; operators
ack alarms via the Part 9 method nodes.

## Tests — 7/7 (internal helpers exposed via InternalsVisibleTo)

DriverNodeManagerSourceDispatchTests covers:
- Driver source routes to driver IReadable
- Virtual source routes to virtual IReadable
- ScriptedAlarm source routes to alarm IReadable
- Virtual source with null virtual IReadable returns null (→ BadNotFound)
- ScriptedAlarm source with null alarm IReadable returns null
- Driver source with null driver IReadable returns null (preserves BadNotReadable)
- IsWriteAllowedBySource: only Driver=true (Virtual=false, ScriptedAlarm=false)

Full solution builds clean. Phase 7 test total now 197 green.
2026-04-20 20:12:17 -04:00
Joseph Doherty
432173c5c4 ADR-001 last-mile — Program.cs composes EquipmentNodeWalker into the production boot path. Closes task #214 + fully lands ADR-001 Option A as a live code path, not just a connected set of unit-tested primitives. After this PR a server booted against a real Config DB with Published Equipment rows materializes the UNS tree into the OPC UA address space on startup — the whole walker → wire-in → loader chain (PRs #153, #154, #155, #156) finally fires end-to-end in the production process. DriverEquipmentContentRegistry is the handoff between OpcUaServerService's bootstrap-time populate pass + OpcUaApplicationHost's StartAsync walker invocation. It's a singleton mutable holder with Get/Set/Count + Lock-guarded internal dictionary keyed OrdinalIgnoreCase to match the DriverInstanceId convention used by Equipment / Tag rows + walker grouping. Set-once-per-bootstrap semantics in practice though nothing enforces that at the type level — OpcUaServerService.PopulateEquipmentContentAsync is the only expected writer. Shared-mutable rather than immutable-passed-by-value because the DI graph builds OpcUaApplicationHost before NodeBootstrap has resolved the generation, so the registry must exist at compose time + fill at boot time. Program.cs now registers OpcUaApplicationHost via a factory lambda that threads registry.Get as the equipmentContentLookup delegate PR #155 added to the ctor seam — the one-line composition the earlier PR promised. EquipmentNamespaceContentLoader (from PR #156) is AddScoped since it takes the scoped OtOpcUaConfigDbContext; the populate pass in OpcUaServerService opens one IServiceScopeFactory scope + reuses the same loader + DbContext across every driver query rather than scoping-per-driver. OpcUaServerService.ExecuteAsync gets a new PopulateEquipmentContentAsync step between bootstrap + StartAsync: iterates DriverHost.RegisteredDriverIds, calls loader.LoadAsync per driver at the bootstrapped generationId, stashes non-null results in the registry. Null results are skipped — the wire-in's null-check treats absent registry entries as "this driver isn't Equipment-kind; let DiscoverAsync own the address space" which is the correct backward-compat path for Modbus / AB CIP / TwinCAT / FOCAS. Guarded on result.GenerationId being non-null — a fleet with no Published generation yet boots cleanly into a UNS-less address space and fills the registry on the next restart after first publish. Ctor on OpcUaServerService gained two new dependencies (DriverEquipmentContentRegistry + IServiceScopeFactory). No test file constructs OpcUaServerService directly so no downstream test breakage — the BackgroundService is only wired via DI in Program.cs. Four new DriverEquipmentContentRegistryTests: Get-null-for-unknown, Set-then-Get, case-insensitive driver-id lookup, Set-overwrites-existing. Server.Tests 190/190 (was 186, +4 new registry tests). Full ADR-001 Option A now lives at every layer: Core.OpcUa walker (#153) → ScopePathIndexBuilder (#154) → OpcUaApplicationHost wire-in (#155) → EquipmentNamespaceContentLoader (#156) → this PR's registry + Program.cs composition. The last pending loose end (full-integration smoke test that boots Program.cs against a seeded Config DB + verifies UNS tree via live OPC UA client) isn't strictly necessary because PR #155's OpcUaEquipmentWalkerIntegrationTests already proves the wire-in at the OPC UA client-browse level — the Program.cs composition added here is purely mechanical + well-covered by the four-file audit trail plus registry unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 03:50:37 -04:00
Joseph Doherty
a29828e41e EquipmentNamespaceContentLoader — Config-DB loader that fills the (driverInstanceId, generationId) shape the walker wire-in from PR #155 consumes. Narrow follow-up to PR #155: the ctor plumbing on OpcUaApplicationHost already takes a Func<string, EquipmentNamespaceContent?>? lookup; this PR lands the loader that will back that lookup against the central Config DB at SealedBootstrap time. DI composition in Program.cs is a separate structural PR because it needs the generation-resolve chain restructured to run before OpcUaApplicationHost construction — this one just lands the loader + unit tests so the wiring PR reduces to one factory lambda. Loader scope is one driver instance at one generation: joins Equipment filtered by (DriverInstanceId == driver, GenerationId == gen, Enabled) first, then UnsLines reachable from those Equipment rows, then UnsAreas reachable from those lines, then Tags filtered by (DriverInstanceId == driver, GenerationId == gen). Returns null when the driver has no Equipment at the supplied generation — the wire-in's null-check treats that as "skip the walker; let DiscoverAsync own the whole address space" which is the correct backward-compat behavior for non-Equipment-kind drivers (Modbus / AB CIP / TwinCAT / FOCAS whose namespace-kind is native per decisions #116-#121). Only loads the UNS branches that actually host this driver's Equipment — skips pulling unrelated UNS folders from other drivers' regions of the cluster by deriving lineIds/areaIds from the filtered Equipment set rather than reloading the full UNS tree. Enabled=false Equipment are skipped at the query level so a decommissioned machine doesn't produce a phantom browse folder — Admin still sees it in the diff view via the regular Config-DB queries but the walker's browse output reflects the operational fleet. AsNoTracking on every query because the bootstrap flow is read-only + the result is handed off to a pure-function walker immediately; change tracking would pin rows in the DbContext for the full server lifetime with no corresponding write path. Five new EquipmentNamespaceContentLoaderTests using InMemoryDatabase: (a) null result when driver has no Equipment; (b) baseline happy-path loads the full shape correctly; (c) other driver's rows at the same generation don't leak into this driver's result (per-driver scope contract); (d) same-driver rows at a different generation are skipped (per-generation scope contract per decision #148); (e) Enabled=false Equipment are skipped. Server project builds 0 errors; Server.Tests 186/186 (was 181, +5 new loader tests). Once the wiring PR lands the factory lambda in Program.cs the loader closes over the SealedBootstrap-resolved generationId + the lookup delegate delegates to LoadAsync via IServiceScopeFactory — a one-line composition, no ctor-signature churn on OpcUaApplicationHost because PR #155 already established the seam.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 03:19:45 -04:00
Joseph Doherty
2d97f241c0 ADR-001 wire-in — EquipmentNodeWalker runs inside OpcUaApplicationHost before driver DiscoverAsync, closing tasks #212 + #213. Completes the in-server half of the ADR-001 Option A story: Task A (PR #153) shipped the pure-function walker in Core.OpcUa; Task B (PR #154) shipped the NodeScopeResolver + ScopePathIndexBuilder + evaluator-level authz proof. This PR lands the BuildAddressSpaceAsync wire-in the walker was always meant to plug into + a full-stack OPC UA client-browse integration test that proves the UNS folder skeleton is actually visible to real UA clients end-to-end, not just to the RecordingBuilder test double. OpcUaApplicationHost gains an optional ctor parameter equipmentContentLookup of type Func<string, EquipmentNamespaceContent?>? — when supplied + non-null for a driver instance, EquipmentNodeWalker.Walk is invoked against that driver's node manager BEFORE GenericDriverNodeManager.BuildAddressSpaceAsync streams the driver's native DiscoverAsync output on top. Walker-first ordering matters: the UNS Area/Line/Equipment folder skeleton + Identification sub-folders + the five identifier properties (decision #121) are in place so driver-native references (driver-specific tag paths) land ALONGSIDE the UNS tree rather than racing it. Callers that don't supply a lookup (every existing pre-ADR-001 test + the v1 upgrade path) get identical behavior — the null-check is the backward-compat seam per the opt-in design sketched in ADR-001. The lookup delegate is driver-instance-scoped, not server-scoped, so a single server with multiple drivers can serve e.g. one Equipment-kind namespace (Galaxy proxy with a full UNS) alongside several native-kind namespaces (Modbus / AB CIP / TwinCAT / FOCAS that do not have their own UNS because decisions #116-#121 scope UNS to Equipment-kind only). SealedBootstrap.Start will wire this lookup against the Config-DB snapshot loader in a follow-up — the lookup plumbing lands first so that wiring reduces to one-line composition rather than a ctor-signature churn. New OpcUaEquipmentWalkerIntegrationTests spins up a real OtOpcUaServer on a non-default port with an EmptyDriver that registers with zero native content + a lookup that returns a seeded EquipmentNamespaceContent (one area warsaw / one line line-a / one equipment oven-3 / one tag Temperature). An OPC UA client session connects anonymously against the un-secured endpoint, browses the standard hierarchy, + asserts: (a) area folder warsaw contains line-a folder as a child; (b) line folder line-a contains oven-3 folder as a child; (c) equipment folder oven-3 contains EquipmentId + EquipmentUuid + MachineCode identifier properties — ZTag + SAPID correctly absent because the fixture leaves them null per decision #121 skip-when-null behavior; (d) the bound Tag emits a Variable node under the equipment folder with NodeId == Tag.TagConfig (the wire-level driver address) + the client can ReadValue against it end-to-end through the DriverNodeManager dispatch path. Because the EmptyDriver's DiscoverAsync is a no-op the test proves UNS content came from the walker, not the driver — the original ADR-001 question "what actually owns the browse tree" now has a mechanical answer visible at the OPC UA wire level. Test class uses its own port (48500+rand) + per-test PKI root so it runs in parallel with the existing OpcUaServerIntegrationTests fixture (48400+rand) without binding or cert collisions. Server project builds 0 errors; Server.Tests 181/181 (was 179, +2 new full-stack walker tests). Task #212 + #213 closed; the follow-up SealedBootstrap wiring is the natural next pickup because the ctor plumbing lands here + that becomes a narrow downstream PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 03:09:37 -04:00
Joseph Doherty
1bf3938cdf ADR-001 Task B — NodeScopeResolver full-path + ScopePathIndexBuilder + evaluator-level ACL test closing #195. Two production additions + one end-to-end authz regression test proving the Identification ACL contract the IdentificationFolderBuilder docstring promises. Task A (PR #153) shipped the walker as a pure function that materializes the UNS → Equipment → Tag browse tree + IdentificationFolderBuilder.Build per Equipment. This PR lands the authz half of the walker's story — the resolver side that turns a driver-side full reference into a full NodeScope path (NamespaceId + UnsAreaId + UnsLineId + EquipmentId + TagId) so the permission trie can walk the UNS hierarchy + apply Equipment-scope grants correctly at dispatch time. The actual in-server wiring (load snapshot → call walker during BuildAddressSpaceAsync → swap in the full-path resolver) is split into follow-up task #212 because it's a bigger surface (Server bootstrap + DriverNodeManager override + real OPC UA client-browse integration test). NodeScopeResolver extended with a second constructor taking IReadOnlyDictionary<string, NodeScope> pathIndex — when supplied, Resolve looks up the full reference in the index + returns the indexed scope with every UNS level populated; when absent or on miss, falls back to the pre-ADR-001 cluster-only scope so driver-discovered tags that haven't been indexed yet (between a DiscoverAsync result + the next generation publish) stay addressable without crashing the resolver. Index is frozen into a FrozenDictionary<string, NodeScope> under Ordinal comparer for O(1) hot-path lookups. Thread-safety by immutability — callers swap atomically on generation change via the server's publish pipeline. New ScopePathIndexBuilder.Build in Server.Security takes (clusterId, namespaceId, EquipmentNamespaceContent) + produces the fullReference → NodeScope dictionary by joining Tag → Equipment → UnsLine → UnsArea through up-front dictionaries keyed Ordinal-ignoring-case. Tag rows with null EquipmentId (SystemPlatform-namespace Galaxy tags per decision #120) are excluded from the index; cluster-only fallback path covers them. Broken FKs (Tag references missing Equipment row, or Equipment references missing UnsLine) are skipped rather than crashing — sp_ValidateDraft should have caught these at publish, any drift here is unexpected but non-fatal. Duplicate keys throw InvalidOperationException at bootstrap so corrupt-data drift surfaces up-front instead of producing silently-last-wins scopes at dispatch. End-to-end authz regression test in EquipmentIdentificationAuthzTests walks the full dispatch flow against a Config-DB-style fixture: ScopePathIndexBuilder.Build from the same EquipmentNamespaceContent the EquipmentNodeWalker consumes → NodeScopeResolver with that index → AuthorizationGate + TriePermissionEvaluator → PermissionTrieBuilder with one Equipment-scope NodeAcl grant + a NodeAclPath resolving Equipment ScopeId to (namespace, area, line, equipment). Four tests prove the contract: (a) authorized group Read granted on Identification property; (b) unauthorized group Read denied on Identification property — the #195 contract the IdentificationFolderBuilder docstring promises (the BadUserAccessDenied surfacing happens at the DriverNodeManager dispatch layer which is already wired to AuthorizationGate.IsAllowed → StatusCodes.BadUserAccessDenied in PR #94); (c) Equipment-scope grant cascades to both the Equipment's tag + its Identification properties because they share the Equipment ScopeId — no new scope level for Identification per the builder's Remarks section; (d) grant on oven-3 does NOT leak to press-7 (different equipment under the same UnsLine) proving per-Equipment isolation at dispatch when the resolver populates the full path. NodeScopeResolverTests extended with two new tests covering the indexed-lookup path + fallback-on-miss path; renamed the existing "_For_Phase1" test to "_When_NoIndexSupplied" to match the current framing. Server project builds 0 errors; Server.Tests 179/179 (was 173, +6 new across the two test files). Task #212 captures the remaining in-server wiring work — Server.SealedBootstrap load of EquipmentNamespaceContent, DriverNodeManager override that calls EquipmentNodeWalker during BuildAddressSpaceAsync for Equipment-kind namespaces, and a real OPC UA client-browse integration test. With that wiring + this PR's authz-layer proof, #195's "ACL integration test" line is satisfied at two layers (evaluator + live endpoint) which is stronger than the task originally asked for.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:50:27 -04:00
Joseph Doherty
016122841b Phase 6.1 Stream E.2 partial — ResilienceStatusPublisherHostedService persists tracker snapshots to DB
Closes the HostedService half of Phase 6.1 Stream E.2 flagged as a follow-up
when the DriverResilienceStatusTracker shipped in PR #82. The Admin /hosts
column refresh + SignalR push + red-badge visual (Stream E.3) remain
deferred to the visual-compliance pass — this PR owns the persistence
story alone.

Server.Hosting:
- ResilienceStatusPublisherHostedService : BackgroundService. Samples the
  DriverResilienceStatusTracker every TickInterval (default 5 s) and upserts
  each (DriverInstanceId, HostName) counter pair into
  DriverInstanceResilienceStatus via EF. New rows on first sight; in-place
  updates on subsequent ticks.
- PersistOnceAsync extracted public so tests drive one tick directly —
  matches the ScheduledRecycleHostedService pattern for deterministic
  timing.
- Best-effort persistence: a DB outage logs a warning + continues; the next
  tick retries. Never crashes the app on sample failure. Cancellation
  propagates through cleanly.
- Tracks the bulkhead depth / recycle / footprint columns the entity was
  designed for. CurrentBulkheadDepth currently persisted as 0 — the tracker
  doesn't yet expose live bulkhead depth; a narrower follow-up wires the
  Polly bulkhead-depth observer into the tracker.

Tests (6 new in ResilienceStatusPublisherHostedServiceTests):
- Empty tracker → tick is a no-op, zero rows written.
- Single-host counters → upsert a new row with ConsecutiveFailures + breaker
  timestamp + sampled timestamp.
- Second tick updates the existing row in place (not a second insert).
- Multi-host pairs persist independently.
- Footprint counters (Baseline + Current) round-trip.
- TickCount advances on every PersistOnceAsync call.

Full solution dotnet test: 1225 passing (was 1219, +6). Pre-existing
Client.CLI Subscribe flake unchanged.

Production wiring (Program.cs) example:
  builder.Services.AddSingleton<DriverResilienceStatusTracker>();
  builder.Services.AddHostedService<ResilienceStatusPublisherHostedService>();
  // Tracker gets wired into CapabilityInvoker via OtOpcUaServer resolution
  // + the existing Phase 6.1 layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:36:00 -04:00
Joseph Doherty
c4a92f424a Phase 6.1 Stream B.4 follow-up — ScheduledRecycleHostedService drives registered schedulers on a fixed tick
Turns the Phase 6.1 Stream B.4 pure-logic ScheduledRecycleScheduler (shipped
in PR #79) into a running background feature. A Tier C driver registers its
scheduler at startup; the hosted service ticks every TickInterval (default
1 min) and invokes TickAsync on each registered scheduler.

Server.Hosting:
- ScheduledRecycleHostedService : BackgroundService. AddScheduler(s) must be
  called before StartAsync — registering post-start throws
  InvalidOperationException to avoid "some ticks saw my scheduler, some
  didn't" races. ExecuteAsync loops on Task.Delay(TickInterval, _timeProvider,
  stoppingToken) + delegates to a public TickOnceAsync method for one tick.
- TickOnceAsync extracted as the unit-of-work so tests drive it directly
  without needing to synchronize with FakeTimeProvider + BackgroundService
  timing semantics.
- Exception isolation: if one scheduler throws, the loop logs + continues
  to the next scheduler. A flaky supervisor can't take down the tick for
  every other Tier C driver.
- Diagnostics: TickCount + SchedulerCount properties for tests + logs.

Tests (7 new ScheduledRecycleHostedServiceTests, all pass):
- TickOnce before interval doesn't fire; TickCount still advances.
- TickOnce at/after interval fires the underlying scheduler exactly once.
- Multiple ticks accumulate count.
- AddScheduler after StartAsync throws.
- Throwing scheduler doesn't poison its neighbours (logs + continues).
- SchedulerCount matches registrations.
- Empty scheduler list ticks cleanly (no-op + counter advances).

Full solution dotnet test: 1193 passing (was 1186, +7). Pre-existing
Client.CLI Subscribe flake unchanged.

Production wiring (Program.cs):
  builder.Services.AddSingleton<ScheduledRecycleHostedService>();
  builder.Services.AddHostedService(sp => sp.GetRequiredService<ScheduledRecycleHostedService>());
  // During DI configuration, once Tier C drivers + their ScheduledRecycleSchedulers
  // are resolved, call host.AddScheduler(scheduler) for each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:42:08 -04:00
Joseph Doherty
c4824bea12 Phase 6.3 Stream C core — RedundancyStatePublisher + PeerReachability; orchestrates calculator inputs end-to-end
Wires the Phase 6.3 Stream B pure-logic pieces (ServiceLevelCalculator,
RecoveryStateManager, ApplyLeaseRegistry) + Stream A topology loader
(RedundancyCoordinator) into one orchestrator the runtime + OPC UA node
surface consume. The actual OPC UA variable-node plumbing (mapping
ServiceLevel Byte + ServerUriArray String[] onto the Opc.Ua.Server stack)
is narrower follow-up on top of this — the publisher emits change events
the OPC UA layer subscribes to.

Server.Redundancy additions:
- PeerReachability record + PeerReachabilityTracker — thread-safe
  per-peer-NodeId holder of the latest (HttpHealthy, UaHealthy) tuple. Probe
  loops (Stream B.1/B.2 runtime follow-up) write via Update; the publisher
  reads via Get. PeerReachability.FullyHealthy / Unknown sentinels for the
  two most-common states.
- RedundancyStatePublisher — pure orchestrator, no background timer, no OPC
  UA stack dep. ComputeAndPublish reads the 6 inputs + calls the calculator:
    * role (from coordinator.Current.SelfRole)
    * selfHealthy (caller-supplied Func<bool>)
    * peerHttpHealthy + peerUaHealthy (aggregate across all peers in
      coordinator.Current.Peers)
    * applyInProgress (ApplyLeaseRegistry.IsApplyInProgress)
    * recoveryDwellMet (RecoveryStateManager.IsDwellMet)
    * topologyValid (coordinator.IsTopologyValid)
    * operatorMaintenance (caller-supplied Func<bool>)
  Before-coordinator-init returns NoData=1 so clients never see an
  authoritative value from an un-bootstrapped server.
  OnStateChanged event fires edge-triggered when the byte changes;
  OnServerUriArrayChanged fires edge-triggered when the topology's self-first
  peer-sorted URI array content changes.
- ServiceLevelSnapshot record — per-tick output with Value + Band +
  Topology. The OPC UA layer's ServiceLevel Byte node subscribes to
  OnStateChanged; the ServerUriArray node subscribes to OnServerUriArrayChanged.

Tests (8 new RedundancyStatePublisherTests, all pass):
- Before-init returns NoData (Value=1, Band=NoData).
- Authoritative-Primary when healthy + peer fully reachable.
- Isolated-Primary (230) retains authority when peer unreachable — matches
  decision #154 non-promotion semantics.
- Mid-apply band dominates: open lease → Value=200 even with peer healthy.
- Self-unhealthy → NoData regardless of other inputs.
- OnStateChanged fires only on value transitions (edge-triggered).
- OnServerUriArrayChanged fires once per topology content change; repeat
  ticks with same topology don't re-emit.
- Standalone cluster treats healthy as AuthoritativePrimary=255.

Microsoft.EntityFrameworkCore.InMemory 10.0.0 added to Server.Tests for the
coordinator-backed publisher tests.

Full solution dotnet test: 1186 passing (was 1178, +8). Pre-existing
Client.CLI Subscribe flake unchanged.

Closes the core of release blocker #3 — the pure-logic + orchestration
layer now exists + is unit-tested. Remaining Stream C surfaces: OPC UA
ServiceLevel Byte variable wiring (binds to OnStateChanged), ServerUriArray
String[] wiring (binds to OnServerUriArrayChanged), RedundancySupport
static from RedundancyMode. Those touch the OPC UA stack directly + land
as Stream C.2 follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:31:50 -04:00
Joseph Doherty
84fe88fadb Phase 6.3 Stream A — RedundancyTopology + ClusterTopologyLoader + RedundancyCoordinator
Lands the data path that feeds the Phase 6.3 ServiceLevelCalculator shipped in
PR #89. OPC UA node wiring (ServiceLevel variable + ServerUriArray +
RedundancySupport) still deferred to task #147; peer-probe loops (Stream B.1/B.2
runtime layer beyond the calculator logic) deferred.

Server.Redundancy additions:
- RedundancyTopology record — immutable snapshot (ClusterId, SelfNodeId,
  SelfRole, Mode, Peers[], SelfApplicationUri). ServerUriArray() emits the
  OPC UA Part 4 §6.6.2.2 shape (self first, peers lexicographically by
  NodeId). RedundancyPeer record with per-peer Host/OpcUaPort/DashboardPort/
  ApplicationUri so the follow-up peer-probe loops know where to probe.
- ClusterTopologyLoader — pure fn from ServerCluster + ClusterNode[] to
  RedundancyTopology. Enforces Phase 6.3 Stream A.1 invariants:
    * At least one node per cluster.
    * At most 2 nodes (decision #83, v2.0 cap).
    * Every node belongs to the target cluster.
    * Unique ApplicationUri across the cluster (OPC UA Part 4 trust pin,
      decision #86).
    * At most 1 Primary per cluster in Warm/Hot modes (decision #84).
    * Self NodeId must be a member of the cluster.
  Violations throw InvalidTopologyException with a decision-ID-tagged message
  so operators know which invariant + what to fix.
- RedundancyCoordinator singleton — holds the current topology + IsTopologyValid
  flag. InitializeAsync throws on invariant violation (startup fails fast).
  RefreshAsync logs + flips IsTopologyValid=false (runtime won't tear down a
  running server; ServiceLevelCalculator falls to InvalidTopology band = 2
  which surfaces the problem to clients without crashing). CAS-style swap
  via Volatile.Write so readers always see a coherent snapshot.

Tests (10 new ClusterTopologyLoaderTests):
- Single-node standalone loads + empty peer list.
- Two-node cluster loads self + peer.
- ServerUriArray puts self first + peers sort lexicographically.
- Empty-nodes throws.
- Self-not-in-cluster throws.
- Three-node cluster rejected with decision #83 message.
- Duplicate ApplicationUri rejected with decision #86 shape reference.
- Two Primaries in Warm mode rejected (decision #84 + runtime-band reference).
- Cross-cluster node rejected.
- None-mode allows any role mix (standalone clusters don't enforce Primary count).

Full solution dotnet test: 1178 passing (was 1168, +10). Pre-existing
Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:24:14 -04:00
Joseph Doherty
19a0bfcc43 Phase 6.1 Stream D follow-up — SealedBootstrap consumes ResilientConfigReader + GenerationSealedCache + StaleConfigFlag; /healthz surfaces the flag
Closes release blocker #2 from docs/v2/v2-release-readiness.md — the
generation-sealed cache + resilient reader + stale-config flag shipped as
unit-tested primitives in PR #81, but no production path consumed them until
now. This PR wires them end-to-end.

Server additions:
- SealedBootstrap — Phase 6.1 Stream D consumption hook. Resolves the node's
  current generation through ResilientConfigReader's timeout → retry →
  fallback-to-sealed pipeline. On every successful central-DB fetch it seals
  a fresh snapshot to <cache-root>/<cluster>/<generationId>.db so a future
  cache-miss has a known-good fallback. Alongside the original NodeBootstrap
  (which still uses the single-file ILocalConfigCache); Program.cs can
  switch between them once operators are ready for the generation-sealed
  semantics.
- OpcUaApplicationHost: new optional staleConfigFlag ctor parameter. When
  wired, HealthEndpointsHost consumes `flag.IsStale` via the existing
  usingStaleConfig Func<bool> hook. Means `/healthz` actually reports
  `usingStaleConfig: true` whenever a read fell back to the sealed cache —
  closes the loop between Stream D's flag + Stream C's /healthz body shape.

Tests (4 new SealedBootstrapIntegrationTests, all pass):
- Central-DB success path seals snapshot + flag stays fresh.
- Central-DB failure falls back to sealed snapshot + flag flips stale (the
  SQL-kill scenario from Phase 6.1 Stream D.4.a).
- No-snapshot + central-down throws GenerationCacheUnavailableException
  with a clear error (the first-boot scenario from D.4.c).
- Next successful bootstrap after a fallback clears the stale flag.

Full solution dotnet test: 1168 passing (was 1164, +4). Pre-existing
Client.CLI Subscribe flake unchanged.

Production activation: Program.cs wires SealedBootstrap (instead of
NodeBootstrap), constructs OpcUaApplicationHost with the staleConfigFlag,
and a HostedService polls sp_GetCurrentGenerationForCluster periodically so
peer-published generations land in this node's sealed cache. The poller
itself is Stream D.1.b follow-up.

The sp_PublishGeneration SQL-side hook (where the publish commit itself
could also write to a shared sealed cache) stays deferred — the per-node
seal pattern shipped here is the correct v2 GA model: each Server node
owns its own on-disk cache and refreshes from its own DB reads, matching
the Phase 6.1 scope-table description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:14:59 -04:00
Joseph Doherty
f8d5b0fdbb Phase 6.2 Stream C follow-up — wire AuthorizationGate into DriverNodeManager Read / Write / HistoryRead dispatch
Closes the Phase 6.2 security gap the v2 release-readiness dashboard flagged:
the evaluator + trie + gate shipped as code in PRs #84-88 but no dispatch
path called them. This PR threads the gate end-to-end from
OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager and calls it on
every Read / Write / 4 HistoryRead paths.

Server.Security additions:
- NodeScopeResolver — maps driver fullRef → Core.Authorization NodeScope.
  Phase 1 shape: populates ClusterId + TagId; leaves NamespaceId / UnsArea /
  UnsLine / Equipment null. The cluster-level ACL cascade covers this
  configuration (decision #129 additive grants). Finer-grained scope
  resolution (joining against the live Configuration DB for UnsArea / UnsLine
  path) lands as Stream C.12 follow-up.
- WriteAuthzPolicy.ToOpcUaOperation — maps SecurityClassification → the
  OpcUaOperation the gate evaluator consults (Operate/SecuredWrite →
  WriteOperate; Tune → WriteTune; Configure/VerifiedWrite → WriteConfigure).

DriverNodeManager wiring:
- Ctor gains optional AuthorizationGate + NodeScopeResolver; both null means
  the pre-Phase-6.2 dispatch runs unchanged (backwards-compat for every
  integration test that constructs DriverNodeManager directly).
- OnReadValue: ahead of the invoker call, builds NodeScope + calls
  gate.IsAllowed(identity, Read, scope). Denied reads return
  BadUserAccessDenied without hitting the driver.
- OnWriteValue: preserves the existing WriteAuthzPolicy check (classification
  vs session roles) + adds an additive gate check using
  WriteAuthzPolicy.ToOpcUaOperation(classification) to pick the right
  WriteOperate/Tune/Configure surface. Lax mode falls through for identities
  without LDAP groups.
- Four HistoryRead paths (Raw / Processed / AtTime / Events): gate check
  runs per-node before the invoker. Events path tolerates fullRef=null
  (event-history queries can target a notifier / driver-root; those are
  cluster-wide reads that need a different scope shape — deferred).
- New WriteAccessDenied helper surfaces BadUserAccessDenied in the
  OpcHistoryReadResult slot + errors list, matching the shape of the
  existing WriteUnsupported / WriteInternalError helpers.

OtOpcUaServer + OpcUaApplicationHost: gate + resolver thread through as
optional constructor parameters (same pattern as DriverResiliencePipelineBuilder
in Phase 6.1). Null defaults keep the existing 3 OpcUaApplicationHost
integration tests constructing without them unchanged.

Tests (5 new in NodeScopeResolverTests):
- Resolve populates ClusterId + TagId + Equipment Kind.
- Resolve leaves finer path null per Phase 1 shape (doc'd as follow-up).
- Empty fullReference throws.
- Empty clusterId throws at ctor.
- Resolver is stateless across calls.

The existing 9 AuthorizationGate tests (shipped in PR #86) continue to
cover the gate's allow/deny semantics under strict + lax mode.

Full solution dotnet test: 1164 passing (was 1159, +5). Pre-existing
Client.CLI Subscribe flake unchanged. Existing OpcUaApplicationHost +
HealthEndpointsHost + driver integration tests continue to pass because the
gate defaults to null → no enforcement, and the lax-mode fallback returns
true for identities without LDAP groups (the anonymous test path).

Production deployments flip the gate on by constructing it via
OpcUaApplicationHost's new authzGate parameter + setting
`Authorization:StrictMode = true` once ACL data is populated. Flipping the
switch post-seed turns the evaluator + trie from scaffolded code into
actual enforcement.

This closes release blocker #1 listed in docs/v2/v2-release-readiness.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:02:17 -04:00
Joseph Doherty
483f55557c Phase 6.3 Stream B + Stream D (core) — ServiceLevelCalculator + RecoveryStateManager + ApplyLeaseRegistry
Lands the pure-logic heart of Phase 6.3. OPC UA node wiring (Stream C),
RedundancyCoordinator topology loader (Stream A), Admin UI + metrics (Stream E),
and client interop tests (Stream F) are follow-up work — tracked as
tasks #145-150.

New Server.Redundancy sub-namespace:

- ServiceLevelCalculator — pure 8-state matrix per decision #154. Inputs:
  role, selfHealthy, peerUa/HttpHealthy, applyInProgress, recoveryDwellMet,
  topologyValid, operatorMaintenance. Output: OPC UA Part 5 §6.3.34 Byte.
  Reserved bands (0=Maintenance, 1=NoData, 2=InvalidTopology) override
  everything; operational bands occupy 30..255.
  Key invariants:
    * Authoritative-Primary = 255, Authoritative-Backup = 100.
    * Isolated-Primary = 230 (retains authority with peer down).
    * Isolated-Backup = 80 (does NOT auto-promote — non-transparent model).
    * Primary-Mid-Apply = 200, Backup-Mid-Apply = 50; apply dominates
      peer-unreachable per Stream C.4 integration expectation.
    * Recovering-Primary = 180, Recovering-Backup = 30.
    * Standalone treats healthy as Authoritative-Primary (no peer concept).
- ServiceLevelBand enum — labels every numeric band for logs + Admin UI.
  Values match the calculator table exactly; compliance script asserts
  drift detection.
- RecoveryStateManager — holds Recovering band until (dwell ≥ 60s default)
  AND (one publish witness observed). Re-fault resets both gates so a
  flapping node doesn't shortcut through recovery twice.
- ApplyLeaseRegistry — keyed on (ConfigGenerationId, PublishRequestId) per
  decision #162. BeginApplyLease returns an IAsyncDisposable so every exit
  path (success, exception, cancellation, dispose-twice) closes the lease.
  ApplyMaxDuration watchdog (10 min default) via PruneStale tick forces
  close after a crashed publisher so ServiceLevel can't stick at mid-apply.

Tests (40 new, all pass):
- ServiceLevelCalculatorTests (27): reserved bands override; self-unhealthy
  → NoData; invalid topology demotes both nodes to 2; authoritative primary
  255; backup 100; isolated primary 230 retains authority; isolated backup
  80 does not promote; http-only unreachable triggers isolated; mid-apply
  primary 200; mid-apply backup 50; apply dominates peer-unreachable; recovering
  primary 180; recovering backup 30; standalone treats healthy as 255;
  classify round-trips every band including Unknown sentinel.
- RecoveryStateManagerTests (6): never-faulted auto-meets dwell; faulted-only
  returns true (semantics-doc test — coordinator short-circuits on
  selfHealthy=false); recovered without witness never meets; witness without
  dwell never meets; witness + dwell-elapsed meets; re-fault resets.
- ApplyLeaseRegistryTests (7): empty registry not-in-progress; begin+dispose
  closes; dispose on exception still closes; dispose twice safe; concurrent
  leases isolated; watchdog closes stale; watchdog leaves recent alone.

Full solution dotnet test: 1137 passing (Phase 6.2 shipped at 1097, Phase 6.3
B + D core = +40 = 1137). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:56:34 -04:00
Joseph Doherty
8efb99b6be Phase 6.2 Stream C (foundation) — AuthorizationGate + ILdapGroupsBearer
Lands the integration seam between the Server project's OPC UA stack and the
Core.Authorization evaluator. Actual DriverNodeManager dispatch-path wiring
(Read/Write/HistoryRead/Browse/Call/Subscribe/Alarm surfaces) lands in the
follow-up PR on this branch — covered by Task #143 below.

Server.Security additions:
- ILdapGroupsBearer — marker interface a custom IUserIdentity implements to
  expose its resolved LDAP group DNs. Parallel to the existing IRoleBearer
  (admin roles) — control/data-plane separation per decision #150.
- AuthorizationGate — stateless bridge between Opc.Ua.IUserIdentity and
  IPermissionEvaluator. IsAllowed(identity, operation, scope) materializes a
  UserAuthorizationState from the identity's LDAP groups, delegates to the
  evaluator, and returns a single bool the dispatch paths use to decide
  whether to surface BadUserAccessDenied.
- StrictMode knob controls fail-open-during-transition vs fail-closed:
  - strict=false (default during rollout) — null identity, identity without
    ILdapGroupsBearer, or NotGranted outcome all return true so older
    deployments without ACL data keep working.
  - strict=true (production target) — any of the above returns false.
  The appsetting `Authorization:StrictMode = true` flips deployments over
  once their ACL data is populated.

Tests (9 new in Server.Tests/AuthorizationGateTests):
- Null identity — strict denies, lax allows.
- Identity without LDAP groups — strict denies, lax allows.
- LDAP group with matching grant allows.
- LDAP group without grant — strict denies.
- Wrong operation denied (Read-only grant, WriteOperate requested).
- BuildSessionState returns materialized state with LDAP groups + null when
  identity doesn't carry them.

Full solution dotnet test: 1087 passing (Phase 6.1 = 1042, Phase 6.2 A = +9,
B = +27, C foundation = +9 = 1087). Pre-existing Client.CLI Subscribe flake
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:33:51 -04:00
Joseph Doherty
9dd5e4e745 Phase 6.1 Stream C — health endpoints on :4841 + LogContextEnricher + Serilog JSON sink + CapabilityInvoker enrichment
Closes Stream C per docs/v2/implementation/phase-6-1-resilience-and-observability.md.

Core.Observability (new namespace):
- DriverHealthReport — pure-function aggregation over DriverHealthSnapshot list.
  Empty fleet = Healthy. Any Faulted = Faulted. Any Unknown/Initializing (no
  Faulted) = NotReady. Any Degraded or Reconnecting (no Faulted, no NotReady)
  = Degraded. Else Healthy. HttpStatus(verdict) maps to the Stream C.1 state
  matrix: Healthy/Degraded → 200, NotReady/Faulted → 503.
- LogContextEnricher — Serilog LogContext wrapper. Push(id, type, capability,
  correlationId) returns an IDisposable scope; inner log calls carry
  DriverInstanceId / DriverType / CapabilityName / CorrelationId structured
  properties automatically. NewCorrelationId = 12-hex-char GUID slice for
  cases where no OPC UA RequestHeader.RequestHandle is in flight.

CapabilityInvoker — now threads LogContextEnricher around every ExecuteAsync /
ExecuteWriteAsync call site. OtOpcUaServer passes driver.DriverType through
so logs correlate to the driver type too. Every capability call emits
structured fields per the Stream C.4 compliance check.

Server.Observability:
- HealthEndpointsHost — standalone HttpListener on http://localhost:4841/
  (loopback avoids Windows URL-ACL elevation; remote probing via reverse
  proxy or explicit netsh urlacl grant). Routes:
    /healthz → 200 when (configDbReachable OR usingStaleConfig); 503 otherwise.
      Body: status, uptimeSeconds, configDbReachable, usingStaleConfig.
    /readyz  → DriverHealthReport.Aggregate + HttpStatus mapping.
      Body: verdict, drivers[], degradedDrivers[], uptimeSeconds.
    anything else → 404.
  Disposal cooperative with the HttpListener shutdown.
- OpcUaApplicationHost starts the health host after the OPC UA server comes up
  and disposes it on shutdown. New OpcUaServerOptions knobs:
  HealthEndpointsEnabled (default true), HealthEndpointsPrefix (default
  http://localhost:4841/).

Program.cs:
- Serilog pipeline adds Enrich.FromLogContext + opt-in JSON file sink via
  `Serilog:WriteJson = true` appsetting. Uses Serilog.Formatting.Compact's
  CompactJsonFormatter (one JSON object per line — SIEMs like Splunk,
  Datadog, Graylog ingest without a regex parser).

Server.Tests:
- Existing 3 OpcUaApplicationHost integration tests now set
  HealthEndpointsEnabled=false to avoid port :4841 collisions under parallel
  execution.
- New HealthEndpointsHostTests (9): /healthz healthy empty fleet; stale-config
  returns 200 with flag; unreachable+no-cache returns 503; /readyz empty/
  Healthy/Faulted/Degraded/Initializing drivers return correct status and
  bodies; unknown path → 404. Uses ephemeral ports via Interlocked counter.

Core.Tests:
- DriverHealthReportTests (8): empty fleet, all-healthy, any-Faulted trumps,
  any-NotReady without Faulted, Degraded without Faulted/NotReady, HttpStatus
  per-verdict theory.
- LogContextEnricherTests (8): all 4 properties attach; scope disposes cleanly;
  NewCorrelationId shape; null/whitespace driverInstanceId throws.
- CapabilityInvokerEnrichmentTests (2): inner logs carry structured
  properties; no context leak outside the call site.

Full solution dotnet test: 1016 passing (baseline 906, +110 for Phase 6.1 so
far across Streams A+B+C). Pre-existing Client.CLI Subscribe flake unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:15:44 -04:00
Joseph Doherty
52a29100b1 Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish). Wires the OPC UA HistoryRead service through CustomNodeManager2's four protected per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents — each dispatching to the driver's IHistoryProvider capability (PR 35 for ReadAtTime + ReadEvents on top of PR 19-era ReadRaw + ReadProcessed). Was the last missing piece of the end-to-end HistoryRead path: PR 10 + PR 11 shipped the Galaxy.Host IPC contracts, PR 35 surfaced them on IHistoryProvider + GalaxyProxyDriver, but no server-side handler bridged OPC UA HistoryRead service requests onto the capability interface. Now it does.
Per-kind override shape: each hook receives the pre-filtered nodesToProcess list (NodeHandles for nodes this manager claimed), iterates them, resolves handle.NodeId.Identifier to the driver-side full reference string, and dispatches to the right IHistoryProvider method. Write back into the outer results + errors slots at handle.Index (not the local loop counter — nodesToProcess is a filtered subset of nodesToRead, so indexing by the loop counter lands in the wrong slot for mixed-manager batches). WriteResult helper sets both results[i] AND errors[i]; this matters because MasterNodeManager merges them and leaving errors[i] at its default (BadHistoryOperationUnsupported) overrides a Good result with Unsupported on the wire — this was the subtle failure mode that masked a correctly-constructed HistoryData response during debugging. Failure-isolation per node: NotSupportedException from a driver that doesn't implement a particular HistoryProvider method translates to BadHistoryOperationUnsupported in that slot; generic exceptions log and surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch continues unconditionally.
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average / Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum. Null for anything else (e.g. TimeAverage, Interpolative) so the handler surfaces BadAggregateNotSupported at the batch level — per Part 13, one unsupported aggregate means the whole request fails since ReadProcessedDetails carries one aggregate list for all nodes. BuildHistoryData wraps driver DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject; BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the canonical BaseEventType field list (EventId, SourceName, Message, Severity, Time, ReceiveTime — the order OPC UA clients that didn't customize the SelectClause expect). ToDataValue preserves null SourceTimestamp (Galaxy historian rows often carry only ServerTimestamp) — synthesizing a SourceTimestamp would lie about actual sample time.
Two address-space changes were required to make the stack dispatch reach the per-kind hooks at all: (1) historized variables get AccessLevels.HistoryRead added to their AccessLevel byte — the base's early-gate check on (variable.AccessLevel & HistoryRead != 0) was rejecting requests before our override ever ran; (2) the driver-root folder gets EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it (the conventional pattern for alarm-history browse against a driver-owned object). Document the 'set both bits' requirement inline since it's not obvious from the surface API.
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (service-layer per-node result) collides with Core.Abstractions.HistoryReadResult (driver-side samples + continuation point) by type name; the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult' keeps the override signatures unambiguous and the test project applies the mirror pattern for its stub driver impl.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases): MapAggregate translates each supported aggregate NodeId via reflection-backed theory (guards against the stack renaming AggregateFunction_* constants); returns null for unsupported NodeIds (TimeAverage) and null input; BuildHistoryData wraps samples with correct DataValues + SourceTimestamp preservation; BuildHistoryEvent emits the 6-element BaseEventType field list in canonical order (regression guard for a future 'respect the client's SelectClauses' change); null SourceName / Message translate to empty-string Variants (nullable-Variant refactor trap); ToDataValue preserves StatusCode + both timestamps; ToDataValue leaves SourceTimestamp at default when the snapshot omits it. HistoryReadIntegrationTests (5 new Category=Integration): drives a real OPC UA client Session.HistoryRead against a fake HistoryDriver through the running server. Covers raw round-trip (verifies per-node DataValue ordering + values); processed with Average aggregate (captures the driver's received aggregate + interval, asserting MapAggregate routed correctly); unsupported aggregate (TimeAverage → BadAggregateNotSupported); at-time (forwards the per-timestamp list); events (BaseEventType field list shape, SelectClauses populated to satisfy the stack's filter validator). Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit deferred items: continuation-point plumbing (driver returns null today so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents (clients with custom field selections get the canonical BaseEventType layout today).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:50:23 -04:00
Joseph Doherty
ef2a810b2d Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record).
Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:51:55 -04:00
Joseph Doherty
2f00c74bbb Phase 3 PR 32 — Multi-driver integration test. Closes LMX follow-up #6 with Server.Tests/MultipleDriverInstancesIntegrationTests.cs: registers two StubDriver instances (alpha + beta) with distinct DriverInstanceIds on one DriverHost, boots the full OpcUaApplicationHost, and exercises three behaviors end-to-end via a real OPC UA client session. (1) Each driver's namespace URI resolves to a distinct index in the client's NamespaceUris (alpha → urn:OtOpcUa:alpha, beta → urn:OtOpcUa:beta) — proves DriverNodeManager's namespaceUris-per-driver base-ctor wiring actually lands two separate INodeManager registrations. (2) Browsing one subtree returns only that driver's folder; the other driver's folder does NOT leak into the wrong subtree. This is the test that catches a cross-driver routing regression the v1 single-driver code path couldn't surface — if a future refactor flattens both drivers into a shared namespace, the 'shouldNotContain' assertion fails cleanly. (3) Reads route to the owning driver by namespace — alpha's ReadAsync returns 42 while beta's returns 99; a misroute would surface as 99 showing up on an alpha node id or vice versa. StubDriver is parameterized on (DriverInstanceId, folderName, readValue) so the same class constructs both instances without copy-paste.
No production code changes — pure additive test. Server.Tests Integration: 3 new tests pass; existing OpcUaServerIntegrationTests stays green (single-driver case still exercised there). Full Server.Tests Unit still 43 / 0. Deferred: multi-driver alarm-event case (two drivers each raising a GalaxyAlarmEvent, assert each condition lands on its owning instance's condition node) — needs a stub IAlarmSource and is worth its own focused PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:29:49 -04:00
Joseph Doherty
4886a5783f Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
Active Directory compatibility. LdapOptions xml-doc expanded with a cheat-sheet covering Server (DC FQDN), Port 389 vs 636, UseTls=true under AD LDAP-signing enforcement, dedicated read-only service account DN, sAMAccountName vs userPrincipalName vs cn trade-offs, memberOf DN shape (CN=Group,OU=...,DC=... with the CN= RDN stripped to become the GroupToRole key), and the explicit 'nested groups NOT expanded' call-out (LDAP_MATCHING_RULE_IN_CHAIN / tokenGroups is a future authenticator enhancement, not a config change). docs/security.md §'Active Directory configuration' adds a complete appsettings.json snippet with realistic AD group names (OPCUA-Operators → WriteOperate, OPCUA-Engineers → WriteConfigure, OPCUA-AlarmAck → AlarmAck, OPCUA-Tuners → WriteTune), LDAPS port 636, TLS on, insecure-LDAP off, and operator-facing notes on each field. LdapUserAuthenticatorAdCompatTests (5 unit guards): ExtractFirstRdnValue parses AD-style 'CN=OPCUA-Operators,OU=...,DC=...' DNs correctly (case-preserving — operators' GroupToRole keys stay readable); also handles mixed case and spaces in group names ('Domain Users'); also works against the OpenLDAP ou=<group>,ou=groups shape (GLAuth) so one extractor tolerates both memberOf formats common in the field; EscapeLdapFilter escapes the RFC 4515 injection set (\, *, (, ), \0) so a malicious login like 'admin)(cn=*' can't break out of the filter; default UserNameAttribute regression guard.
Test posture — Server.Tests Unit: 43 pass / 0 fail (38 prior + 5 new AD-compat guards). Server.Tests LiveLdap category: 6 pass / 0 fail against running GLAuth (would skip cleanly without). Server build clean, 0 errors, 0 warnings.
Deferred: the session-identity end-to-end check (drive a full OPC UA UserName session, then read a 'whoami' node to verify the role landed on RoleBasedIdentity). That needs a test-only address-space node and is scoped for a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:23:22 -04:00
Joseph Doherty
6b04a85f86 Phase 3 PR 26 — server-layer write authorization gating by role. Per the user's ACL-at-server-layer directive (saved as feedback_acl_at_server_layer.md in memory), write authorization is enforced in DriverNodeManager.OnWriteValue and never delegated to the driver or to driver-specific auth (the v1 Galaxy-provided security path is explicitly not part of v2 — drivers report SecurityClassification as discovery metadata only). New WriteAuthzPolicy static class in Server/Security/ maps SecurityClassification → required role per the table documented in docs/Configuration.md: FreeAccess = no role required (anonymous sessions can write), Operate + SecuredWrite = WriteOperate, Tune = WriteTune, VerifiedWrite + Configure = WriteConfigure, ViewOnly = deny regardless of roles. Role matching is case-insensitive and role requirements do NOT cascade — a session with WriteConfigure can write Configure attributes but needs WriteOperate separately to write Operate attributes; this is deliberate so escalation is an explicit LDAP group assignment, not a hierarchy the policy silently grants. DriverNodeManager gains a _securityByFullRef Dictionary populated during Variable() registration (parallel to the existing _variablesByFullRef) so OnWriteValue can look up the classification in O(1) on the hot path. OnWriteValue casts the session's context.UserIdentity to the new IRoleBearer interface (implemented by OtOpcUaServer.RoleBasedIdentity from PR 19) — empty Roles collection when the session is anonymous; the same WriteAuthzPolicy.IsAllowed check then either short-circuits true (FreeAccess), false (ViewOnly), or walks the roles list looking for the required one. On deny, OnWriteValue logs 'Write denied for {FullRef}: classification=X userRoles=[...]' at Information level (readable trail for operator complaints) and returns BadUserAccessDenied without touching IWritable.WriteAsync — drivers never see a request we'd have refused. IRoleBearer kept as a minimal server-side interface rather than reusing some abstraction from Core.Abstractions because the concept is OPC-UA-session-scoped and doesn't generalize (the driver side has no notion of a user session). Tests — WriteAuthzPolicyTests (17 new cases): FreeAccess allows write with empty role set + arbitrary roles; ViewOnly denies write even with every role; Operate requires WriteOperate; role match is case-insensitive; Operate denies empty role set + wrong role; SecuredWrite shares Operate's requirement; Tune requires WriteTune; Tune denies WriteOperate-only (asserts roles don't cascade — this is the test that catches a future regression where someone 'helpfully' adds a role-escalation table); Configure requires WriteConfigure; VerifiedWrite shares Configure's requirement; multi-role session allowed when any role matches; unrelated roles denied; RequiredRole theory covering all 5 classified-and-mapped rows + null for FreeAccess/ViewOnly special cases. lmx-followups.md follow-up #2 marked DONE with a back-reference to this PR and the memory note. Full Server.Tests Unit suite: 38 pass / 0 fail (17 new WriteAuthz + 14 SecurityConfiguration from PR 19 + 2 NodeBootstrap + 5 others). Server.Tests Integration (Category=Integration) 2 pass — existing PR 17 anonymous-endpoint smoke tests stay green since the read path doesn't hit OnWriteValue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:01:01 -04:00
Joseph Doherty
22d3b0d23c Phase 3 PR 19 — LDAP user identity + Basic256Sha256 security profile. Replaces the anonymous-only endpoint with a configurable security profile and an LDAP-backed UserName token validator. New IUserAuthenticator abstraction in Backend/Security/: LdapUserAuthenticator binds to the configured directory (reuses the pattern from Admin.Security.LdapAuthService without the cross-app dependency — Novell.Directory.Ldap.NETStandard 3.6.0 package ref added to Server alongside the existing OPCFoundation packages) and maps group membership to OPC UA roles via LdapOptions.GroupToRole (case-insensitive). DenyAllUserAuthenticator is the default when Ldap.Enabled=false so UserName token attempts return a clean BadUserAccessDenied rather than hanging on a localhost:3893 bind attempt. OpcUaSecurityProfile enum + LdapOptions nested record on OpcUaServerOptions. Profile=None keeps the PR 17 shape (SecurityPolicies.None + Anonymous token only) so existing integration tests stay green; Profile=Basic256Sha256SignAndEncrypt adds a second ServerSecurityPolicy (Basic256Sha256 + SignAndEncrypt) to the collection and, when Ldap.Enabled=true, adds a UserName token policy scoped to SecurityPolicies.Basic256Sha256 only — passwords must ride an encrypted channel, the stack rejects UserName over None. OtOpcUaServer.OnServerStarted hooks SessionManager.ImpersonateUser: AnonymousIdentityToken passes through; UserNameIdentityToken delegates to IUserAuthenticator.AuthenticateAsync — rejected identities throw ServiceResultException(BadUserAccessDenied); accepted identities get a RoleBasedIdentity that carries the resolved roles through session.Identity so future PRs can gate writes by role. OpcUaApplicationHost + OtOpcUaServer constructors take IUserAuthenticator as a dependency. Program.cs binds the new OpcUaServer:Ldap section from appsettings (Enabled defaults false, GroupToRole parsed as Dictionary<string,string>), registers IUserAuthenticator as LdapUserAuthenticator when enabled or DenyAllUserAuthenticator otherwise. PR 17 integration test updated to pass DenyAllUserAuthenticator so it keeps exercising the anonymous-only path unchanged. Tests — SecurityConfigurationTests (new, 13 cases): DenyAllAuthenticator rejects every credential; LdapAuthenticator rejects blank creds without hitting the server; rejects when Enabled=false; rejects plaintext when both UseTls=false AND AllowInsecureLdap=false (safety guard matching the Admin service); EscapeLdapFilter theory (4 rows: plain passthrough, parens/asterisk/backslash → hex escape) — regression guard against LDAP injection; ExtractOuSegment theory (3 rows: finds ou=, returns null when absent, handles multiple ou segments by returning first); ExtractFirstRdnValue theory (3 rows: strips cn= prefix, handles single-segment DN, returns plain string unchanged when no =). OpcUaServerOptions_default_is_anonymous_only asserts the default posture preserves PR 17 behavior. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Server.Tests') added to Server csproj so ExtractOuSegment and siblings are reachable from the tests. Full solution: 0 errors, 180 tests pass (8 Core + 14 Proxy + 24 Configuration + 6 Shared + 91 Galaxy.Host + 19 Server (17 unit + 2 integration) + 18 Admin). Live-LDAP integration test (connect via Basic256Sha256 endpoint with a real user from GLAuth, assert the session.Identity carries the mapped role) is deferred to a follow-up — it requires the GLAuth dev instance to be running at localhost:3893 which is dev-machine-specific, and the test harness for that also needs a fresh client-side certificate provisioned by the live server's trusted store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 08:49:46 -04:00
Joseph Doherty
46834a43bd Phase 3 PR 17 — complete OPC UA server startup end-to-end + integration test. PR 16 shipped the materialization shape (DriverNodeManager / OtOpcUaServer) without the activation glue; this PR finishes the scope so an external OPC UA client can actually connect, browse, and read. New OpcUaServerOptions DTO bound from the OpcUaServer section of appsettings.json (EndpointUrl default opc.tcp://0.0.0.0:4840/OtOpcUa, ApplicationName, ApplicationUri, PkiStoreRoot default %ProgramData%\OtOpcUa\pki, AutoAcceptUntrustedClientCertificates default true for dev — production flips via config). OpcUaApplicationHost wraps Opc.Ua.Configuration.ApplicationInstance: BuildConfiguration constructs the ApplicationConfiguration programmatically (no external XML) with SecurityConfiguration pointing at <PkiStoreRoot>/own, /issuers, /trusted, /rejected directories — stack auto-creates the cert folders on first run and generates a self-signed application certificate via CheckApplicationInstanceCertificate, ServerConfiguration.BaseAddresses set to the endpoint URL + SecurityPolicies just None + UserTokenPolicies just Anonymous with PolicyId='Anonymous' + SecurityPolicyUri=None so the client's UserTokenPolicy lookup succeeds at OpenSession, TransportQuotas.OperationTimeout=15s + MinRequestThreadCount=5 / MaxRequestThreadCount=100 / MaxQueuedRequestCount=200, CertificateValidator auto-accepts untrusted when configured. StartAsync creates the OtOpcUaServer (passes DriverHost + ILoggerFactory so one DriverNodeManager is created per registered driver in CreateMasterNodeManager from PR 16), calls ApplicationInstance.Start(server) to bind the endpoint, then walks each DriverNodeManager and drives a fresh GenericDriverNodeManager.BuildAddressSpaceAsync against it so the driver's discovery streams into the address space that's already serving clients. Per-driver discovery is isolated per decision #12: a discovery exception marks the driver's subtree faulted but the server stays up serving the other drivers' subtrees. DriverHost.GetDriver(instanceId) public accessor added alongside the existing GetHealth so OtOpcUaServer can enumerate drivers during CreateMasterNodeManager. DriverNodeManager.Driver property made public so OpcUaApplicationHost can identify which driver each node manager wraps during the discovery loop. OpcUaServerService constructor takes OpcUaApplicationHost — ExecuteAsync sequence now: bootstrap.LoadCurrentGenerationAsync → applicationHost.StartAsync → infinite Task.Delay until stop. StopAsync disposes the application host (which stops the server via OtOpcUaServer.Stop) before disposing DriverHost. Program.cs binds OpcUaServerOptions from appsettings + registers OpcUaApplicationHost + OpcUaServerOptions as singletons. Integration test (OpcUaServerIntegrationTests, Category=Integration): IAsyncLifetime spins up the server on a random non-default port (48400+random for test isolation) with a per-test-run PKI store root (%temp%/otopcua-test-<guid>) + a FakeDriver registered in DriverHost that has ITagDiscovery + IReadable implementations — DiscoverAsync registers TestFolder>Var1, ReadAsync returns 42. Client_can_connect_and_browse_driver_subtree creates an in-process OPC UA client session via CoreClientUtils.SelectEndpoint (which talks to the running server's GetEndpoints and fetches the live EndpointDescription with the actual PolicyId), browses the fake driver's root, asserts TestFolder appears in the returned references. Client_can_read_a_driver_variable_through_the_node_manager constructs the variable NodeId using the namespace index the server registered (urn:OtOpcUa:fake), calls Session.ReadValue, asserts the DataValue.Value is 42 — the whole pipeline (client → server endpoint → DriverNodeManager.OnReadValue → FakeDriver.ReadAsync → back through the node manager → response to client) round-trips correctly. Dispose tears down the session, server, driver host, and PKI store directory. Full solution: 0 errors, 165 tests pass (8 Core unit + 14 Proxy unit + 24 Configuration unit + 6 Shared unit + 91 Galaxy.Host unit + 4 Server (2 unit NodeBootstrap + 2 new integration) + 18 Admin). End-to-end outcome: PR 14's GalaxyAlarmTracker alarm events now flow through PR 15's GenericDriverNodeManager event forwarder → PR 16's ConditionSink → OPC UA AlarmConditionState.ReportEvent → out to every OPC UA client subscribed to the alarm condition. The full alarm subsystem (driver-side subscription of the Galaxy 4-attribute quartet, Core-side routing by source node id, Server-side AlarmConditionState materialization with ReportEvent dispatch) is now complete and observable through any compliant OPC UA client. LDAP / security-profile wire-up (replacing the anonymous-only endpoint with BasicSignAndEncrypt + user identity mapping to NodePermissions role) is the next layer — it reuses the same ApplicationConfiguration plumbing this PR introduces but needs a deployment-policy source (central config DB) for the cert trust decisions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 08:18:37 -04:00
Joseph Doherty
f53c39a598 Phase 3 PR 16 — concrete OPC UA server scaffolding with AlarmConditionState materialization. Introduces the OPCFoundation.NetStandard.Opc.Ua.Server package (v1.5.374.126, same version the v1 stack already uses) and two new server-side classes: DriverNodeManager : CustomNodeManager2 is the concrete realization of PR 15's IAddressSpaceBuilder contract — Folder() creates FolderState nodes under an Organizes hierarchy rooted at ObjectsFolder > DriverInstanceId; Variable() creates BaseDataVariableState with DataType mapped from DriverDataType (Boolean/Int32/Float/Double/String/DateTime) + ValueRank (Scalar or OneDimension) + AccessLevel CurrentReadOrWrite; AddProperty() creates PropertyState with HasProperty reference. Read hook wires OnReadValue per variable to route to IReadable.ReadAsync; Write hook wires OnWriteValue to route to IWritable.WriteAsync and surface per-tag StatusCode. MarkAsAlarmCondition() materializes an OPC UA AlarmConditionState child of the variable, seeded from AlarmConditionInfo (SourceName, InitialSeverity → UA severity via Low=250/Medium=500/High=700/Critical=900, InitialDescription), initial state Enabled + Acknowledged + Inactive + Retain=false. Returns an IAlarmConditionSink whose OnTransition updates alarm.Severity/Time/Message and switches state per AlarmType string ('Active' → SetActiveState(true) + SetAcknowledgedState(false) + Retain=true; 'Acknowledged' → SetAcknowledgedState(true); 'Inactive' → SetActiveState(false) + Retain=false if already Acked) then calls alarm.ReportEvent to emit the OPC UA event to subscribed clients. Galaxy's GalaxyAlarmTracker (PR 14) now lands at a concrete AlarmConditionState node instead of just raising an unobserved C# event. OtOpcUaServer : StandardServer wires one DriverNodeManager per DriverHost.GetDriver during CreateMasterNodeManager — anonymous endpoint, no security profile (minimum-viable; LDAP + security-profile wire-up is the next PR). DriverHost gains public GetDriver(instanceId) so the server can enumerate drivers at startup. NestedBuilder inner class in DriverNodeManager implements IAddressSpaceBuilder by temporarily retargeting the parent's _currentFolder during each call so Folder→Variable→AddProperty land under the correct subtree — not thread-safe if discovery ran concurrently, but GenericDriverNodeManager.BuildAddressSpaceAsync is sequential per driver so this is safe by construction. NuGet audit suppress for GHSA-h958-fxgg-g7w3 (moderate-severity in OPCFoundation.NetStandard.Opc.Ua.Core 1.5.374.126; v1 stack already accepts this risk on the same package version). PR 16 is scoped as scaffolding — the actual server startup (ApplicationInstance, certificate config, endpoint binding, session management wiring into OpcUaServerService.ExecuteAsync) is deferred to a follow-up PR because it needs ApplicationConfiguration XML + optional-cert-store logic that depends on per-deployment policy decisions. The materialization shape is complete: a subsequent PR adds 100 LOC to start the server and all the already-written IAddressSpaceBuilder + alarm-condition + read/write wire-up activates end-to-end. Full solution: 0 errors, 152 unit tests pass (no new tests this PR — DriverNodeManager unit testing needs an IServerInternal mock which is heavyweight; live-endpoint integration tests land alongside the server-startup PR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 08:00:36 -04:00
Joseph Doherty
01fd90c178 Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:35:25 -04:00