The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.
EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.
This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.
Resolves code-review finding Driver.Galaxy-001 (Critical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.
- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
of StreamAlarms that decodes the active-alarm snapshot plus live
transitions into GalaxyAlarmTransition and reopens the stream on
transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.
Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.
The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.
Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the spec-required 100/1000-event batching tests and cluster-failover
tests that were missing from the existing C.1 suite:
- AahClientManagedAlarmEventWriterTests: add Large_batch_all_ack_returns_all_true
(batchSize 100 + 1000) and Large_batch_alternating_outcomes_are_positionally_correct
(batchSize 100 + 1000) to satisfy the "1 / 100 / 1000 events" spec requirement;
add Backend_retry_then_succeed_simulates_cluster_failover to cover the
RetryPlease-then-Ack sequence at the IPC layer (unit-level stand-in for the
rig-gated live cluster-failover path).
- SdkAlarmHistorianWriteBackendTests (new file): unit tests that pin the
placeholder backend's RetryPlease-for-every-slot contract (preserves queued
events while D.1 is unresolved); plus two Skip("rig-required") integration
tests covering the live SDK single-event roundtrip and cluster failover via
HistorianClusterEndpointPicker — remove the Skip in PR D.1.
Feasibility note: aahClientManaged.dll IS present in lib/ and referenced in
the csproj; the SDK call site is isolated behind IAlarmHistorianWriteBackend
in SdkAlarmHistorianWriteBackend.WriteBatchAsync (single method, D.1 seam).
The full AahClientManagedAlarmEventWriter implementation was already complete.
Build: 0 errors, 0 warnings.
Tests: 64 passed, 2 skipped (rig-gated), 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.
#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.
#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.
- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
integration, install).
Build green (0 errors); unit tests pass. Docs left for a separate pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>