Closes the HostedService half of Phase 6.1 Stream E.2, flagged as a follow-up
when the DriverResilienceStatusTracker shipped in PR #82. The Admin /hosts
column refresh + SignalR push + red-badge visual (Stream E.3) remain
deferred to the visual-compliance pass — this PR owns the persistence
story alone.
Server.Hosting:
- ResilienceStatusPublisherHostedService : BackgroundService. Samples the
DriverResilienceStatusTracker every TickInterval (default 5 s) and upserts
each (DriverInstanceId, HostName) counter pair into
DriverInstanceResilienceStatus via EF. New rows on first sight; in-place
updates on subsequent ticks (sketched after this list).
- PersistOnceAsync extracted as a public method so tests drive one tick
directly —
matches the ScheduledRecycleHostedService pattern for deterministic
timing.
- Best-effort persistence: a DB outage logs a warning + continues; the next
tick retries. Never crashes the app on sample failure. Cancellation
propagates through cleanly.
- Tracks the bulkhead depth / recycle / footprint columns the entity was
designed for. CurrentBulkheadDepth currently persisted as 0 — the tracker
doesn't yet expose live bulkhead depth; a narrower follow-up wires the
Polly bulkhead-depth observer into the tracker.
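A minimal sketch of the per-tick shape, assuming hypothetical names for the
tracker accessor (Snapshot), entity helpers (NewRow/Apply), and composite-key
order — the shipped API may differ:
public async Task PersistOnceAsync(CancellationToken ct)
{
    try
    {
        await using var db = await _dbFactory.CreateDbContextAsync(ct);
        foreach (var s in _tracker.Snapshot())          // accessor name assumed
        {
            var row = await db.DriverInstanceResilienceStatus.FindAsync(
                new object[] { s.DriverInstanceId, s.HostName }, ct);
            if (row is null) db.Add(NewRow(s));         // first sight → insert
            else Apply(s, row);                         // later ticks → in-place update
        }
        await db.SaveChangesAsync(ct);
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        // Best-effort: a DB outage logs a warning; the next tick retries.
        _logger.LogWarning(ex, "Resilience status persist failed; next tick retries.");
    }
    finally { TickCount++; }
}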
Tests (6 new in ResilienceStatusPublisherHostedServiceTests):
- Empty tracker → tick is a no-op, zero rows written.
- Single-host counters → upsert a new row with ConsecutiveFailures + breaker
timestamp + sampled timestamp.
- Second tick updates the existing row in place (not a second insert).
- Multi-host pairs persist independently.
- Footprint counters (Baseline + Current) round-trip.
- TickCount advances on every PersistOnceAsync call.
Full solution dotnet test: 1225 passing (was 1219, +6). Pre-existing
Client.CLI Subscribe flake unchanged.
Production wiring (Program.cs) example:
builder.Services.AddSingleton<DriverResilienceStatusTracker>();
builder.Services.AddHostedService<ResilienceStatusPublisherHostedService>();
// Tracker gets wired into CapabilityInvoker via OtOpcUaServer resolution
// + the existing Phase 6.1 layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Turns the Phase 6.1 Stream B.4 pure-logic ScheduledRecycleScheduler (shipped
in PR #79) into a running background feature. A Tier C driver registers its
scheduler at startup; the hosted service ticks every TickInterval (default
1 min) and invokes TickAsync on each registered scheduler.
Server.Hosting:
- ScheduledRecycleHostedService : BackgroundService. AddScheduler(s) must be
called before StartAsync — registering post-start throws
InvalidOperationException to avoid "some ticks saw my scheduler, some
didn't" races. ExecuteAsync loops on Task.Delay(TickInterval, _timeProvider,
stoppingToken) + delegates to a public TickOnceAsync method for one tick
(sketched after this list).
- TickOnceAsync extracted as the unit-of-work so tests drive it directly
without needing to synchronize with FakeTimeProvider + BackgroundService
timing semantics.
- Exception isolation: if one scheduler throws, the loop logs + continues
to the next scheduler. A flaky supervisor can't take down the tick for
every other Tier C driver.
- Diagnostics: TickCount + SchedulerCount properties for tests + logs.
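A minimal sketch of the loop + isolation described above — member names and
TickAsync's exact signature are assumptions:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        await Task.Delay(TickInterval, _timeProvider, stoppingToken);
        await TickOnceAsync(stoppingToken);
    }
}
public async Task TickOnceAsync(CancellationToken ct)
{
    foreach (var scheduler in _schedulers)
    {
        try { await scheduler.TickAsync(_timeProvider.GetUtcNow(), ct); }
        catch (Exception ex)
        {
            // One flaky supervisor can't poison the tick for its neighbours.
            _logger.LogError(ex, "Scheduler tick failed; continuing.");
        }
    }
    TickCount++;
}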
Tests (7 new ScheduledRecycleHostedServiceTests, all pass):
- TickOnce before interval doesn't fire; TickCount still advances.
- TickOnce at/after interval fires the underlying scheduler exactly once.
- Multiple ticks accumulate count.
- AddScheduler after StartAsync throws.
- Throwing scheduler doesn't poison its neighbours (logs + continues).
- SchedulerCount matches registrations.
- Empty scheduler list ticks cleanly (no-op + counter advances).
Full solution dotnet test: 1193 passing (was 1186, +7). Pre-existing
Client.CLI Subscribe flake unchanged.
Production wiring (Program.cs):
builder.Services.AddSingleton<ScheduledRecycleHostedService>();
builder.Services.AddHostedService(sp => sp.GetRequiredService<ScheduledRecycleHostedService>());
// During DI configuration, once Tier C drivers + their ScheduledRecycleSchedulers
// are resolved, call host.AddScheduler(scheduler) for each.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the Phase 6.3 Stream B pure-logic pieces (ServiceLevelCalculator,
RecoveryStateManager, ApplyLeaseRegistry) + Stream A topology loader
(RedundancyCoordinator) into one orchestrator the runtime + OPC UA node
surface consume. The actual OPC UA variable-node plumbing (mapping
ServiceLevel Byte + ServerUriArray String[] onto the Opc.Ua.Server stack)
is a narrower follow-up on top of this — the publisher emits change events
the OPC UA layer subscribes to.
Server.Redundancy additions:
- PeerReachability record + PeerReachabilityTracker — thread-safe
per-peer-NodeId holder of the latest (HttpHealthy, UaHealthy) tuple. Probe
loops (Stream B.1/B.2 runtime follow-up) write via Update; the publisher
reads via Get. PeerReachability.FullyHealthy / Unknown sentinels for the
two most-common states.
- RedundancyStatePublisher — pure orchestrator, no background timer, no OPC
UA stack dep. ComputeAndPublish reads the inputs below + calls the calculator:
* role (from coordinator.Current.SelfRole)
* selfHealthy (caller-supplied Func<bool>)
* peerHttpHealthy + peerUaHealthy (aggregate across all peers in
coordinator.Current.Peers)
* applyInProgress (ApplyLeaseRegistry.IsApplyInProgress)
* recoveryDwellMet (RecoveryStateManager.IsDwellMet)
* topologyValid (coordinator.IsTopologyValid)
* operatorMaintenance (caller-supplied Func<bool>)
Before-coordinator-init returns NoData=1 so clients never see an
authoritative value from an un-bootstrapped server.
OnStateChanged event fires edge-triggered when the byte changes;
OnServerUriArrayChanged fires edge-triggered when the topology's self-first
peer-sorted URI array content changes.
- ServiceLevelSnapshot record — per-tick output with Value + Band +
Topology. The OPC UA layer's ServiceLevel Byte node subscribes to
OnStateChanged; the ServerUriArray node subscribes to OnServerUriArrayChanged.
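A minimal sketch of the edge-trigger semantics above — field names, event
shapes, and the snapshot helper are assumptions:
public ServiceLevelSnapshot ComputeAndPublish()
{
    var snapshot = ComputeSnapshot();              // gathers the inputs listed above
    if (snapshot.Value != _lastValue)              // ServiceLevel byte changed
    {
        _lastValue = snapshot.Value;
        OnStateChanged?.Invoke(snapshot);
    }
    var uris = snapshot.Topology?.ServerUriArray() ?? Array.Empty<string>();
    if (!uris.SequenceEqual(_lastUris))            // content compare, not reference
    {
        _lastUris = uris;
        OnServerUriArrayChanged?.Invoke(uris);
    }
    return snapshot;
}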
Tests (8 new RedundancyStatePublisherTests, all pass):
- Before-init returns NoData (Value=1, Band=NoData).
- Authoritative-Primary when healthy + peer fully reachable.
- Isolated-Primary (230) retains authority when peer unreachable — matches
decision #154 non-promotion semantics.
- Mid-apply band dominates: open lease → Value=200 even with peer healthy.
- Self-unhealthy → NoData regardless of other inputs.
- OnStateChanged fires only on value transitions (edge-triggered).
- OnServerUriArrayChanged fires once per topology content change; repeat
ticks with same topology don't re-emit.
- Standalone cluster treats healthy as AuthoritativePrimary=255.
Microsoft.EntityFrameworkCore.InMemory 10.0.0 added to Server.Tests for the
coordinator-backed publisher tests.
Full solution dotnet test: 1186 passing (was 1178, +8). Pre-existing
Client.CLI Subscribe flake unchanged.
Closes the core of release blocker #3 — the pure-logic + orchestration
layer now exists + is unit-tested. Remaining Stream C surfaces: OPC UA
ServiceLevel Byte variable wiring (binds to OnStateChanged), ServerUriArray
String[] wiring (binds to OnServerUriArrayChanged), RedundancySupport
static from RedundancyMode. Those touch the OPC UA stack directly + land
as Stream C.2 follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the data path that feeds the Phase 6.3 ServiceLevelCalculator shipped in
PR #89. OPC UA node wiring (ServiceLevel variable + ServerUriArray +
RedundancySupport) is still deferred to task #147; the peer-probe loops
(Stream B.1/B.2 runtime layer beyond the calculator logic) are likewise
deferred.
Server.Redundancy additions:
- RedundancyTopology record — immutable snapshot (ClusterId, SelfNodeId,
SelfRole, Mode, Peers[], SelfApplicationUri). ServerUriArray() emits the
OPC UA Part 4 §6.6.2.2 shape (self first, peers lexicographically by
NodeId — sketched after this list). RedundancyPeer record with per-peer
Host/OpcUaPort/DashboardPort/
ApplicationUri so the follow-up peer-probe loops know where to probe.
- ClusterTopologyLoader — pure function from ServerCluster + ClusterNode[] to
RedundancyTopology. Enforces Phase 6.3 Stream A.1 invariants:
* At least one node per cluster.
* At most 2 nodes (decision #83, v2.0 cap).
* Every node belongs to the target cluster.
* Unique ApplicationUri across the cluster (OPC UA Part 4 trust pin,
decision #86).
* At most 1 Primary per cluster in Warm/Hot modes (decision #84).
* Self NodeId must be a member of the cluster.
Violations throw InvalidTopologyException with a decision-ID-tagged message
so operators know which invariant failed + what to fix.
- RedundancyCoordinator singleton — holds the current topology + IsTopologyValid
flag. InitializeAsync throws on invariant violation (startup fails fast).
RefreshAsync logs + flips IsTopologyValid=false (runtime won't tear down a
running server; ServiceLevelCalculator falls to InvalidTopology band = 2
which surfaces the problem to clients without crashing). Atomic reference
swap via Volatile.Write so readers always see a coherent snapshot.
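A minimal sketch of ServerUriArray()'s ordering rule, assuming NodeId
compares as a string (the shipped comparer may differ):
public string[] ServerUriArray() =>
    new[] { SelfApplicationUri }                   // self always first
        .Concat(Peers
            .OrderBy(p => p.NodeId.ToString(), StringComparer.Ordinal)
            .Select(p => p.ApplicationUri))
        .ToArray();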
Tests (10 new ClusterTopologyLoaderTests):
- Single-node standalone loads + empty peer list.
- Two-node cluster loads self + peer.
- ServerUriArray puts self first + peers sort lexicographically.
- Empty-nodes throws.
- Self-not-in-cluster throws.
- Three-node cluster rejected with decision #83 message.
- Duplicate ApplicationUri rejected with decision #86 shape reference.
- Two Primaries in Warm mode rejected (decision #84 + runtime-band reference).
- Cross-cluster node rejected.
- None-mode allows any role mix (standalone clusters don't enforce Primary count).
Full solution dotnet test: 1178 passing (was 1168, +10). Pre-existing
Client.CLI Subscribe flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes release blocker #2 from docs/v2/v2-release-readiness.md — the
generation-sealed cache + resilient reader + stale-config flag shipped as
unit-tested primitives in PR #81, but no production path consumed them until
now. This PR wires them end-to-end.
Server additions:
- SealedBootstrap — Phase 6.1 Stream D consumption hook. Resolves the node's
current generation through ResilientConfigReader's timeout → retry →
fallback-to-sealed pipeline. On every successful central-DB fetch it seals
a fresh snapshot to <cache-root>/<cluster>/<generationId>.db so a future
cache-miss has a known-good fallback. It sits alongside the original
NodeBootstrap
(which still uses the single-file ILocalConfigCache); Program.cs can
switch between them once operators are ready for the generation-sealed
semantics.
- OpcUaApplicationHost: new optional staleConfigFlag ctor parameter. When
wired, HealthEndpointsHost consumes `flag.IsStale` via the existing
usingStaleConfig Func<bool> hook. This means `/healthz` actually reports
`usingStaleConfig: true` whenever a read fell back to the sealed cache —
closes the loop between Stream D's flag + Stream C's /healthz body shape.
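A minimal sketch of the read → seal → flag flow above; every member name
below (ReadCurrentGenerationAsync, FromSealedCache, SealAsync,
MarkStale/MarkFresh) is an assumption about the shipped API:
var read = await _resilientReader.ReadCurrentGenerationAsync(clusterId, ct);
if (read.FromSealedCache)
{
    _staleConfigFlag.MarkStale();      // /healthz now reports usingStaleConfig: true
}
else
{
    await _sealer.SealAsync(read.Generation, ct);  // <cache-root>/<cluster>/<generationId>.db
    _staleConfigFlag.MarkFresh();      // next successful bootstrap clears the flag
}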
Tests (4 new SealedBootstrapIntegrationTests, all pass):
- Central-DB success path seals snapshot + flag stays fresh.
- Central-DB failure falls back to sealed snapshot + flag flips stale (the
SQL-kill scenario from Phase 6.1 Stream D.4.a).
- No-snapshot + central-down throws GenerationCacheUnavailableException
with a clear error (the first-boot scenario from D.4.c).
- Next successful bootstrap after a fallback clears the stale flag.
Full solution dotnet test: 1168 passing (was 1164, +4). Pre-existing
Client.CLI Subscribe flake unchanged.
Production activation: Program.cs wires SealedBootstrap (instead of
NodeBootstrap), constructs OpcUaApplicationHost with the staleConfigFlag,
and adds a HostedService that polls sp_GetCurrentGenerationForCluster
periodically so peer-published generations land in this node's sealed cache.
The poller itself is a Stream D.1.b follow-up.
The sp_PublishGeneration SQL-side hook (where the publish commit itself
could also write to a shared sealed cache) stays deferred — the per-node
seal pattern shipped here is the correct v2 GA model: each Server node
owns its own on-disk cache and refreshes from its own DB reads, matching
the Phase 6.1 scope-table description.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Phase 6.2 security gap the v2 release-readiness dashboard flagged:
the evaluator + trie + gate shipped as code in PRs #84-88 but no dispatch
path called them. This PR threads the gate end-to-end from
OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager and calls it on
every Read / Write path and all 4 HistoryRead paths.
Server.Security additions:
- NodeScopeResolver — maps driver fullRef → Core.Authorization NodeScope.
Phase 1 shape: populates ClusterId + TagId; leaves NamespaceId / UnsArea /
UnsLine / Equipment null. The cluster-level ACL cascade covers this
configuration (decision #129 additive grants). Finer-grained scope
resolution (joining against the live Configuration DB for UnsArea / UnsLine
path) lands as Stream C.12 follow-up.
- WriteAuthzPolicy.ToOpcUaOperation — maps SecurityClassification → the
OpcUaOperation the gate evaluator consults (Operate/SecuredWrite →
WriteOperate; Tune → WriteTune; Configure/VerifiedWrite → WriteConfigure).
DriverNodeManager wiring:
- Ctor gains optional AuthorizationGate + NodeScopeResolver; both null means
the pre-Phase-6.2 dispatch runs unchanged (backwards-compat for every
integration test that constructs DriverNodeManager directly).
- OnReadValue: ahead of the invoker call, builds NodeScope + calls
gate.IsAllowed(identity, Read, scope). Denied reads return
BadUserAccessDenied without hitting the driver (sketched after this list).
- OnWriteValue: preserves the existing WriteAuthzPolicy check (classification
vs session roles) + adds an additive gate check using
WriteAuthzPolicy.ToOpcUaOperation(classification) to pick the right
WriteOperate/Tune/Configure surface. Lax mode falls through for identities
without LDAP groups.
- Four HistoryRead paths (Raw / Processed / AtTime / Events): gate check
runs per-node before the invoker. Events path tolerates fullRef=null
(event-history queries can target a notifier / driver-root; those are
cluster-wide reads that need a different scope shape — deferred).
- New WriteAccessDenied helper surfaces BadUserAccessDenied in the
OpcHistoryReadResult slot + errors list, matching the shape of the
existing WriteUnsupported / WriteInternalError helpers.
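A minimal sketch of the read-path check shape — the denial status comes from
the commit text; the surrounding names are assumptions:
// Runs ahead of the invoker call; both dependencies null = pre-6.2 dispatch.
if (_gate is not null && _scopeResolver is not null)
{
    var scope = _scopeResolver.Resolve(fullRef);   // ClusterId + TagId populated
    if (!_gate.IsAllowed(context.UserIdentity, OpcUaOperation.Read, scope))
        return StatusCodes.BadUserAccessDenied;    // denied before the driver runs
}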
OtOpcUaServer + OpcUaApplicationHost: gate + resolver thread through as
optional constructor parameters (same pattern as DriverResiliencePipelineBuilder
in Phase 6.1). Null defaults keep the existing 3 OpcUaApplicationHost
integration tests constructing without them unchanged.
Tests (5 new in NodeScopeResolverTests):
- Resolve populates ClusterId + TagId + Equipment Kind.
- Resolve leaves finer path null per Phase 1 shape (doc'd as follow-up).
- Empty fullReference throws.
- Empty clusterId throws at ctor.
- Resolver is stateless across calls.
The existing 9 AuthorizationGate tests (shipped in PR #86) continue to
cover the gate's allow/deny semantics under strict + lax mode.
Full solution dotnet test: 1164 passing (was 1159, +5). Pre-existing
Client.CLI Subscribe flake unchanged. Existing OpcUaApplicationHost +
HealthEndpointsHost + driver integration tests continue to pass because the
gate defaults to null → no enforcement, and the lax-mode fallback returns
true for identities without LDAP groups (the anonymous test path).
Production deployments flip the gate on by constructing it via
OpcUaApplicationHost's new authzGate parameter + setting
`Authorization:StrictMode = true` once ACL data is populated. Flipping the
switch post-seed turns the evaluator + trie from scaffolded code into
actual enforcement.
This closes release blocker #1 listed in docs/v2/v2-release-readiness.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the integration seam between the Server project's OPC UA stack and the
Core.Authorization evaluator. Actual DriverNodeManager dispatch-path wiring
(Read/Write/HistoryRead/Browse/Call/Subscribe/Alarm surfaces) lands in the
follow-up PR on this branch — covered by Task #143 below.
Server.Security additions:
- ILdapGroupsBearer — marker interface a custom IUserIdentity implements to
expose its resolved LDAP group DNs. Parallel to the existing IRoleBearer
(admin roles) — control/data-plane separation per decision #150.
- AuthorizationGate — stateless bridge between Opc.Ua.IUserIdentity and
IPermissionEvaluator. IsAllowed(identity, operation, scope) materializes a
UserAuthorizationState from the identity's LDAP groups, delegates to the
evaluator, and returns a single bool the dispatch paths use to decide
whether to surface BadUserAccessDenied.
- StrictMode knob controls fail-open-during-transition vs fail-closed:
- strict=false (default during rollout) — null identity, identity without
ILdapGroupsBearer, or NotGranted outcome all return true so older
deployments without ACL data keep working.
- strict=true (production target) — any of the above returns false.
The appsetting `Authorization:StrictMode = true` flips deployments over
once their ACL data is populated.
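A minimal sketch of the strict/lax decision — the evaluator call and
property names are assumptions:
public bool IsAllowed(IUserIdentity? identity, OpcUaOperation op, NodeScope scope)
{
    var groups = (identity as ILdapGroupsBearer)?.LdapGroupDns;
    if (groups is null || groups.Count == 0)
        return !StrictMode;                        // null identity / no groups
    var state = BuildSessionState(groups);
    var granted = _evaluator.IsGranted(state, op, scope);
    return granted || !StrictMode;                 // NotGranted passes only in lax
}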
Tests (9 new in Server.Tests/AuthorizationGateTests):
- Null identity — strict denies, lax allows.
- Identity without LDAP groups — strict denies, lax allows.
- LDAP group with matching grant allows.
- LDAP group without grant — strict denies.
- Wrong operation denied (Read-only grant, WriteOperate requested).
- BuildSessionState returns materialized state with LDAP groups + null when
identity doesn't carry them.
Full solution dotnet test: 1087 passing (Phase 6.1 = 1042, Phase 6.2 A = +9,
B = +27, C foundation = +9 = 1087). Pre-existing Client.CLI Subscribe flake
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Stream C per docs/v2/implementation/phase-6-1-resilience-and-observability.md.
Core.Observability (new namespace):
- DriverHealthReport — pure-function aggregation over DriverHealthSnapshot list.
Empty fleet = Healthy. Any Faulted = Faulted. Any Unknown/Initializing (no
Faulted) = NotReady. Any Degraded or Reconnecting (no Faulted, no NotReady)
= Degraded. Else Healthy. HttpStatus(verdict) maps to the Stream C.1 state
matrix: Healthy/Degraded → 200, NotReady/Faulted → 503 (aggregation sketched
after this list).
- LogContextEnricher — Serilog LogContext wrapper. Push(id, type, capability,
correlationId) returns an IDisposable scope; inner log calls carry
DriverInstanceId / DriverType / CapabilityName / CorrelationId structured
properties automatically. NewCorrelationId = 12-hex-char GUID slice for
cases where no OPC UA RequestHeader.RequestHandle is in flight.
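A minimal sketch of DriverHealthReport's aggregation precedence — the enum
and property names are assumptions:
public static HealthVerdict Aggregate(IReadOnlyList<DriverHealthSnapshot> fleet)
{
    if (fleet.Count == 0) return HealthVerdict.Healthy;          // empty fleet
    if (fleet.Any(d => d.State == DriverState.Faulted))
        return HealthVerdict.Faulted;                            // Faulted trumps all
    if (fleet.Any(d => d.State is DriverState.Unknown or DriverState.Initializing))
        return HealthVerdict.NotReady;
    if (fleet.Any(d => d.State is DriverState.Degraded or DriverState.Reconnecting))
        return HealthVerdict.Degraded;
    return HealthVerdict.Healthy;
}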
CapabilityInvoker — now threads LogContextEnricher around every ExecuteAsync /
ExecuteWriteAsync call site. OtOpcUaServer passes driver.DriverType through
so logs correlate to the driver type too. Every capability call emits
structured fields per the Stream C.4 compliance check.
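A minimal sketch of the call-site wrap (parameter order is an assumption):
using (LogContextEnricher.Push(driverInstanceId, driver.DriverType,
                               capabilityName, correlationId))
{
    // Inner log calls carry the four structured properties automatically.
    return await capability.ExecuteAsync(request, ct);
}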
Server.Observability:
- HealthEndpointsHost — standalone HttpListener on http://localhost:4841/
(loopback avoids Windows URL-ACL elevation; remote probing via reverse
proxy or explicit netsh urlacl grant). Routes:
/healthz → 200 when (configDbReachable OR usingStaleConfig); 503 otherwise.
Body: status, uptimeSeconds, configDbReachable, usingStaleConfig.
/readyz → DriverHealthReport.Aggregate + HttpStatus mapping.
Body: verdict, drivers[], degradedDrivers[], uptimeSeconds.
anything else → 404.
Disposal cooperative with the HttpListener shutdown.
- OpcUaApplicationHost starts the health host after the OPC UA server comes up
and disposes it on shutdown. New OpcUaServerOptions knobs:
HealthEndpointsEnabled (default true), HealthEndpointsPrefix (default
http://localhost:4841/).
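A minimal sketch of the /healthz decision above; the JSON-writing helper and
literal status strings are assumptions:
var healthy = configDbReachable() || usingStaleConfig();
ctx.Response.StatusCode = healthy ? 200 : 503;
await WriteJsonAsync(ctx.Response, new
{
    status = healthy ? "healthy" : "unhealthy",
    uptimeSeconds = (long)_uptime.Elapsed.TotalSeconds,
    configDbReachable = configDbReachable(),
    usingStaleConfig = usingStaleConfig()
});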
Program.cs:
- Serilog pipeline adds Enrich.FromLogContext + opt-in JSON file sink via
`Serilog:WriteJson = true` appsetting. Uses Serilog.Formatting.Compact's
CompactJsonFormatter (one JSON object per line — SIEMs like Splunk,
Datadog, Graylog ingest without a regex parser).
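A minimal sketch of the pipeline shape — the console sink and file path are
assumptions beyond the commit text:
var logConfig = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .WriteTo.Console();
if (builder.Configuration.GetValue<bool>("Serilog:WriteJson"))
    logConfig.WriteTo.File(new CompactJsonFormatter(), "logs/server.clef");
Log.Logger = logConfig.CreateLogger();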
Server.Tests:
- Existing 3 OpcUaApplicationHost integration tests now set
HealthEndpointsEnabled=false to avoid port :4841 collisions under parallel
execution.
- New HealthEndpointsHostTests (9): /healthz healthy empty fleet; stale-config
returns 200 with flag; unreachable+no-cache returns 503; /readyz empty/
Healthy/Faulted/Degraded/Initializing drivers return correct status and
bodies; unknown path → 404. Uses ephemeral ports via Interlocked counter.
Core.Tests:
- DriverHealthReportTests (8): empty fleet, all-healthy, any-Faulted trumps,
any-NotReady without Faulted, Degraded without Faulted/NotReady, HttpStatus
per-verdict theory.
- LogContextEnricherTests (8): all 4 properties attach; scope disposes cleanly;
NewCorrelationId shape; null/whitespace driverInstanceId throws.
- CapabilityInvokerEnrichmentTests (2): inner logs carry structured
properties; no context leak outside the call site.
Full solution dotnet test: 1016 passing (baseline 906, +110 for Phase 6.1 so
far across Streams A+B+C). Pre-existing Client.CLI Subscribe flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-kind override shape: each hook receives the pre-filtered nodesToProcess
list (NodeHandles for nodes this manager claimed), iterates them, resolves
handle.NodeId.Identifier to the driver-side full-reference string, and
dispatches to the right IHistoryProvider method.
- Write back into the outer results + errors slots at handle.Index, not the
local loop counter — nodesToProcess is a filtered subset of nodesToRead, so
indexing by the loop counter lands in the wrong slot for mixed-manager
batches.
- The WriteResult helper sets both results[i] AND errors[i]. This matters
because MasterNodeManager merges them: leaving errors[i] at its default
(BadHistoryOperationUnsupported) overrides a Good result with Unsupported on
the wire — the subtle failure mode that masked a correctly-constructed
HistoryData response during debugging.
- Failure isolation per node: NotSupportedException from a driver that doesn't
implement a particular IHistoryProvider method translates to
BadHistoryOperationUnsupported in that slot; generic exceptions log and
surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch
continues unconditionally.
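A minimal sketch of the slot rule above — the provider call and resolver are
assumptions:
for (int i = 0; i < nodesToProcess.Count; i++)
{
    var handle = nodesToProcess[i];
    var slot = handle.Index;                     // position in the original nodesToRead
    var fullRef = ResolveFullRef(handle.NodeId); // hypothetical resolver
    results[slot] = await ReadRawForAsync(fullRef, details, ct);
    errors[slot] = ServiceResult.Good;           // leaving the default overrides Good
}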
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average
/ Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum
and returns null for anything else (e.g. TimeAverage, Interpolative), so the
handler surfaces BadAggregateNotSupported at the batch level — per Part 13,
one unsupported aggregate fails the whole request, since ReadProcessedDetails
carries one aggregate list for all nodes. BuildHistoryData wraps driver
DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject;
BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the
canonical BaseEventType field list (EventId, SourceName, Message, Severity,
Time, ReceiveTime — the order OPC UA clients that didn't customize the
SelectClause expect). ToDataValue preserves a null SourceTimestamp (Galaxy
historian rows often carry only a ServerTimestamp) — synthesizing a
SourceTimestamp would lie about the actual sample time.
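A minimal sketch of MapAggregate's shape (the HistoryAggregateType member
names are assumptions):
static HistoryAggregateType? MapAggregate(NodeId? aggregate) =>
    aggregate == ObjectIds.AggregateFunction_Average ? HistoryAggregateType.Average :
    aggregate == ObjectIds.AggregateFunction_Minimum ? HistoryAggregateType.Minimum :
    aggregate == ObjectIds.AggregateFunction_Maximum ? HistoryAggregateType.Maximum :
    aggregate == ObjectIds.AggregateFunction_Total   ? HistoryAggregateType.Total :
    aggregate == ObjectIds.AggregateFunction_Count   ? HistoryAggregateType.Count :
    (HistoryAggregateType?)null;                 // TimeAverage, Interpolative, etc.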
Two address-space changes were required before the stack's dispatch could
reach the per-kind hooks at all: (1) historized variables get
AccessLevels.HistoryRead added to their AccessLevel byte — the base's
early-gate check on (variable.AccessLevel & AccessLevels.HistoryRead) != 0
was rejecting requests before our override ever ran; (2) the driver-root
folder gets EventNotifiers.HistoryRead | SubscribeToEvents so
HistoryReadEvents can target it (the conventional pattern for alarm-history
browse against a driver-owned object). The 'set both bits' requirement is
documented inline since it's not obvious from the surface API.
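The two changes above in sketch form (the variable/folder references are
assumptions):
variable.AccessLevel |= AccessLevels.HistoryRead;  // (1) pass the base's early gate
// (2) both bits, so HistoryReadEvents can target the driver root:
driverRoot.EventNotifier =
    EventNotifiers.SubscribeToEvents | EventNotifiers.HistoryRead;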
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (the service-layer
per-node result) collides by type name with
Core.Abstractions.HistoryReadResult (the driver-side samples + continuation
point); the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult'
keeps the override signatures unambiguous, and the test project applies the
mirror pattern for its stub driver implementation.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases):
- MapAggregate translates each supported aggregate NodeId via a
reflection-backed theory (guards against the stack renaming
AggregateFunction_* constants); returns null for unsupported NodeIds
(TimeAverage) and for null input.
- BuildHistoryData wraps samples with correct DataValues + SourceTimestamp
preservation.
- BuildHistoryEvent emits the 6-element BaseEventType field list in canonical
order (a regression guard for a future 'respect the client's SelectClauses'
change); null SourceName / Message translate to empty-string Variants (the
nullable-Variant refactor trap).
- ToDataValue preserves StatusCode + both timestamps, and leaves
SourceTimestamp at default when the snapshot omits it.
HistoryReadIntegrationTests (5 new Category=Integration) drive a real OPC UA
client Session.HistoryRead against a fake HistoryDriver through the running
server:
- Raw round-trip (verifies per-node DataValue ordering + values).
- Processed with the Average aggregate (captures the driver's received
aggregate + interval, asserting MapAggregate routed correctly).
- Unsupported aggregate (TimeAverage → BadAggregateNotSupported).
- At-time (forwards the per-timestamp list).
- Events (BaseEventType field-list shape, SelectClauses populated to satisfy
the stack's filter validator).
Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests
Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build
clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit
deferred items: continuation-point plumbing (the driver returns null today,
so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents
(clients with custom field selections get the canonical BaseEventType layout
today).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Polling chosen over event-driven for the initial scope: it is simpler,
matches the Admin UI consumer cadence, and avoids DriverHost lifecycle-event
plumbing that doesn't exist today. Event-driven push for sub-heartbeat
latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against
ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't
exist yet (the first-boot bootstrap case; the join is sketched after this
list). StaleThreshold = 30 s — covers one missed publisher heartbeat plus a
generous buffer for clock skew and GC pauses.
Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by
cluster (handles the '(unassigned)' case for rows without a matching
ClusterNode):
- Four summary cards (Hosts / Running / Stale / Faulted).
- Per-cluster table with Node / Driver / Host / State + Stale-badge /
Last-transition / Last-seen / Detail columns.
- 10 s auto-refresh via the IServiceScopeFactory timer pattern matching
FleetStatusPoller + the Fleet dashboard (PR 27).
- Row-class highlighting: Faulted → table-danger, Stale → table-warning, else
default. The State badge maps the DriverHostState enum to bootstrap color
classes.
- Sidebar link added between 'Fleet status' and 'Clusters'.
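A minimal sketch of the left join + staleness rule (entity and property
names are assumptions):
var now = DateTime.UtcNow;
var rows = await (
    from s in db.DriverHostStatus
    join n in db.ClusterNodes on s.NodeId equals n.NodeId into match
    from n in match.DefaultIfEmpty()             // keep rows with no ClusterNode yet
    select new { Status = s, Cluster = n != null ? n.ClusterName : "(unassigned)" })
    .ToListAsync();
var stale = rows.Where(r => now - r.Status.LastSeenUtc > TimeSpan.FromSeconds(30));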
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers
OtOpcUaConfigDbContext in Program.cs, scoped via
NodeOptions.ConfigDbConnectionString (no Admin-style raw manual SQL — the
DbContext is the only access path, which keeps migrations the owner of
record).
Tests — HostStatusPublisherTests (4 new Integration cases, using a per-run
throwaway DB matching the FleetStatusPollerTests pattern):
- Publisher upserts one row per host from each probe-capable driver and skips
non-probe drivers.
- Second tick advances LastSeenUtc without creating duplicate rows (the
upsert pattern verified end-to-end).
- State change between ticks updates State AND StateChangedUtc (datetime2(3)
rounds to millisecond precision, so the comparison uses a 1 ms tolerance —
documented inline).
- MapState translates every HostState enum member.
Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests
Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three
explicit deferred items (event-driven push, failure-count column, SignalR
fan-out).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No production code changes — a pure additive test. Server.Tests Integration:
3 new tests pass; the existing OpcUaServerIntegrationTests stay green (the
single-driver case is still exercised there). Full Server.Tests Unit still
43 / 0. Deferred: the multi-driver alarm-event case (two drivers each raising
a GalaxyAlarmEvent, asserting each condition lands on its owning instance's
condition node) — it needs a stub IAlarmSource and is worth its own focused
PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Active Directory compatibility. LdapOptions xml-doc expanded with a
cheat-sheet covering: Server (DC FQDN); Port 389 vs 636; UseTls=true under AD
LDAP-signing enforcement; a dedicated read-only service-account DN;
sAMAccountName vs userPrincipalName vs cn trade-offs; the memberOf DN shape
(CN=Group,OU=...,DC=... with the CN= RDN stripped to become the GroupToRole
key); and an explicit 'nested groups NOT expanded' call-out
(LDAP_MATCHING_RULE_IN_CHAIN / tokenGroups is a future authenticator
enhancement, not a config change).
docs/security.md §'Active Directory configuration' adds a complete
appsettings.json snippet with realistic AD group names (OPCUA-Operators →
WriteOperate, OPCUA-Engineers → WriteConfigure, OPCUA-AlarmAck → AlarmAck,
OPCUA-Tuners → WriteTune), LDAPS port 636, TLS on, insecure LDAP off, and
operator-facing notes on each field.
LdapUserAuthenticatorAdCompatTests (5 unit guards):
- ExtractFirstRdnValue parses AD-style 'CN=OPCUA-Operators,OU=...,DC=...' DNs
correctly (case-preserving — operators' GroupToRole keys stay readable).
- It also handles mixed case and spaces in group names ('Domain Users').
- It also works against the OpenLDAP ou=<group>,ou=groups shape (GLAuth), so
one extractor tolerates both memberOf formats common in the field (sketched
below).
- EscapeLdapFilter escapes the RFC 4515 injection set (\, *, (, ), \0) so a
malicious login like 'admin)(cn=*' can't break out of the filter.
- Default UserNameAttribute regression guard.
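A minimal sketch of the first-RDN extraction behavior those guards pin down
(the shipped helper may additionally handle escaped commas):
static string ExtractFirstRdnValue(string dn)
{
    var firstRdn = dn.Split(',')[0];             // "CN=OPCUA-Operators" or "ou=admins"
    var eq = firstRdn.IndexOf('=');
    return eq < 0 ? firstRdn.Trim() : firstRdn[(eq + 1)..].Trim();  // case-preserving
}
// ExtractFirstRdnValue("CN=OPCUA-Operators,OU=Groups,DC=x") => "OPCUA-Operators"
// ExtractFirstRdnValue("ou=Domain Users,ou=groups")         => "Domain Users"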
Test posture — Server.Tests Unit: 43 pass / 0 fail (38 prior + 5 new
AD-compat guards). Server.Tests LiveLdap category: 6 pass / 0 fail against a
running GLAuth (would skip cleanly without one). Server build clean, 0
errors, 0 warnings.
Deferred: the session-identity end-to-end check (drive a full OPC UA
UserName session, then read a 'whoami' node to verify the role landed on
RoleBasedIdentity). That needs a test-only address-space node and is scoped
for a separate PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>