Commit Graph

63 Commits

Author SHA1 Message Date
Joseph Doherty
ebc0511c72 fix(driver-opcuaclient): resolve High code-review findings (Driver.OpcUaClient-001..-005)
Driver.OpcUaClient-001 — ReadAsync/WriteAsync/DiscoverAsync captured the
session before acquiring _gate, so a reconnect that completed while the
operation was blocked on the gate left the wire call bound to a stale,
closed session. All three now re-read Session (and parse NodeIds) inside
the _gate critical section after WaitAsync returns.

Driver.OpcUaClient-002 — OnReconnectComplete ignored the give-up (null
session) case, permanently wedging the driver with no Faulted signal and
no reconnect loop. The give-up branch now transitions HostState to
Faulted, sets a Faulted DriverHealth with an explanatory message, and
re-arms a fresh SessionReconnectHandler (TryRearmReconnect) against the
last-known session so an always-on gateway self-heals.

Driver.OpcUaClient-003 — BrowseRecursiveAsync discarded browse
continuation points, silently truncating large remote folders.
It now loops on BrowseResult.ContinuationPoint calling BrowseNextAsync
and appending each page until the continuation point is empty.

Driver.OpcUaClient-004 — driver-specs.md §8 namespace handling was
absent. Added NamespaceMap (built from session.NamespaceUris at connect,
rebuilt on reconnect) which persists discovered NodeIds in the
server-stable nsu=<uri>;... form; reads/writes re-resolve that form
against the current session so a remote namespace-table reorder no
longer misaddresses nodes. Added the TargetNamespaceKind option +
UnsMappingTable and ValidateNamespaceKind startup enforcement.

Driver.OpcUaClient-005 — OnKeepAlive read/wrote _reconnectHandler
without a lock, racing the SDK keep-alive timer thread and leaking
handlers. The check-and-set in OnKeepAlive, the take-and-clear in
ShutdownAsync, and the dispose/re-arm in OnReconnectComplete now all
run inside the _probeLock critical section.

Adds OpcUaClientNamespaceTests (11 xUnit + Shouldly regression tests)
covering ValidateNamespaceKind and the NamespaceMap stable encoding.
Reconnect/browse wire paths remain fixture-gated per finding -015.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:41:28 -04:00
Joseph Doherty
090d2a4b44 fix(driver-s7): resolve High code-review findings (Driver.S7-001, -006, -007, -011)
Driver.S7-001: Timer (T{n}) / Counter (C{n}) addresses parsed cleanly but
the read path had no S7DataType or decode case for them, so a Timer/Counter
tag passed fail-fast init and then threw a misleading type-mismatch on every
read. InitializeAsync now runs RejectUnsupportedTagAddresses, throwing a clear
NotSupportedException ("not yet supported", echoing tag name + address) so the
config error fails fast at init.

Driver.S7-006: ShutdownAsync cancelled the probe/poll CTSs but did not await
the fire-and-forget loop tasks before DisposeAsync disposed _gate, letting a
loop iteration mid-semaphore race a disposed object. The probe task is now
tracked in _probeTask and each poll task in SubscriptionState.PollTask;
ShutdownAsync cancels every CTS, awaits Task.WhenAll of those handles with a
bounded 5 s DrainTimeout, then disposes the CTSs and gate. Task.Run is passed
CancellationToken.None so the handle is always awaitable.

Driver.S7-007: a PUT/GET-disabled fault (permanent misconfiguration) was
mapped identically to a transient PlcException — both BadDeviceFailure +
Degraded. ReadAsync/WriteAsync now split the catch via an IsAccessDenied
filter (S7.Net exposes no typed code for AccessingObjectNotAllowed, so the
inner-exception chain is inspected for the "not allowed" marker). Access-denied
now maps to BadNotSupported and Faulted with a config-alert message pointing
at the TIA Portal PUT/GET toggle; genuine device faults stay BadDeviceFailure.

Driver.S7-011: S7Driver ignored driverConfigJson on Initialize/Reinitialize,
so a config change delivered through ReinitializeAsync (the only Core-initiated
in-process recovery path) was silently discarded. Config parsing was factored
into S7DriverFactoryExtensions.ParseOptions; InitializeAsync now re-parses
driverConfigJson and rebuilds _options whenever the document has a real body.
An empty / placeholder document keeps the constructor options.

Adds S7DriverCodeReviewFixTests covering Timer/Counter rejection, config-json
application on Initialize/Reinitialize, and shutdown-drain with active
subscriptions. All 68 S7 driver tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:41:26 -04:00
Joseph Doherty
d89be2a011 fix(driver-ablegacy): resolve High code-review findings (Driver.AbLegacy-001, Driver.AbLegacy-006)
Driver.AbLegacy-001 — PCCC bit-index range. AbLegacyAddress.TryParse
accepted a bit index of 0..31 for every file type, but a 16-bit
N/B/I/O/S/A word only has bits 0..15. TryParse now range-checks the
bit index against the file's word width (0..15 for 16-bit element
files, 0..31 for the 32-bit L file, no bits on float files), so
addresses like N7:0/20 are rejected at parse time instead of silently
truncating in the (short) cast. WriteBitInWordAsync reads and writes
an L-file parent word as 32-bit Long and masks the RMW arithmetic to
the native width, so a sign-extended 16-bit decode can no longer
corrupt the high bits.

Driver.AbLegacy-006 — shared-runtime concurrency. A per-tag libplctag
Tag handle is cached and reused by both the server read path and the
poll loop, with no synchronisation around Read/GetStatus/DecodeValue.
Added a per-runtime SemaphoreSlim (DeviceState.GetRuntimeLock, keyed
by tag name); ReadAsync and WriteAsync now hold it across the whole
Read -> GetStatus -> Decode / Encode -> Write -> GetStatus sequence so
no two threads touch the same Tag handle concurrently.

Added xUnit + Shouldly regression coverage: AbLegacyBitIndexRangeTests
(per-file bit-range validation + L-file 32-bit RMW + sign-extension
safety) and AbLegacyRuntimeConcurrencyTests (overlap-detecting fake
proving concurrent read/read and read/write are serialised).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:41:26 -04:00
Joseph Doherty
8a7668c678 fix(driver-abcip): resolve High code-review findings (Driver.AbCip-001, -002, -003, -008)
Driver.AbCip-001 — ReinitializeAsync silently discarded its config JSON.
Extracted AbCipDriverFactoryExtensions.ParseOptions; InitializeAsync now
re-parses a content-bearing driverConfigJson and replaces _options (and
recreates the alarm projection), so a reinitialize with a changed config
(new device/tag, changed timeout) actually takes effect. A blank or
empty-object JSON keeps construction-time options for the unit-test seam.

Driver.AbCip-002 — libplctag status mapping used wrong integer constants.
MapLibplctagStatus now switches on the libplctag.NET Status enum members
(Ok/Pending/ErrorTimeout/ErrorNotFound/ErrorNotAllowed/ErrorOutOfBounds/…)
instead of hand-typed natives, so timeout/not-found/not-allowed/out-of-bounds
get their specific OPC UA codes instead of all collapsing to
BadCommunicationError. The int overload casts to Status to stay correct
against the wrapper's contiguous renumbering.

Driver.AbCip-003 — whole-UDT reads decoded members at declaration-order
offsets, which Studio 5000 does not guarantee. Added the opt-in
AbCipDriverOptions.EnableDeclarationOnlyUdtGrouping flag (default false);
AbCipUdtReadPlanner.Build forms no groups when it is off, so by default
every UDT member reads per-tag rather than at possibly-wrong offsets.

Driver.AbCip-008 — probe loops were fire-and-forget and ShutdownAsync raced
them. Each probe Task is stored on DeviceState.ProbeTask; ShutdownAsync now
cancels every CTS, awaits each probe Task (10s timeout), then disposes the
CTS and handles. DeviceState.Runtimes/ParentRuntimes are ConcurrentDictionary
and the Ensure*RuntimeAsync paths use TryAdd, disposing the losing concurrent
creator instead of leaking a native tag handle.

Adds AbCipDriverCodeReviewRegressionTests and updates existing AbCip tests
to the corrected status constants + opt-in grouping flag. AbCip driver +
test project build clean; all 244 AbCip tests pass. (The full-solution
build has pre-existing, unrelated Driver.Galaxy protobuf-generation errors
in this worktree.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:41:25 -04:00
Joseph Doherty
5197b6c237 fix(driver-twincat): resolve High code-review findings (Driver.TwinCAT-001, -002, -007, -008, -013)
Driver.TwinCAT-001 — InitializeAsync/ReinitializeAsync ignored driverConfigJson.
Extracted the DTO-to-options parse into a shared TwinCATDriverFactoryExtensions.ParseOptions;
InitializeAsync now re-parses driverConfigJson into a mutable _options field, so a config
generation pushed via ReinitializeAsync (added/removed devices, tags, probe settings) is
actually applied at runtime.

Driver.TwinCAT-002 — LInt/ULInt narrowed to Int32. ToDriverDataType now maps LInt to Int64,
ULInt to UInt64, UDInt to UInt32, UInt/USInt to UInt16, Int/SInt to Int16, and the IEC
TIME/DATE/DT/TOD types to UInt32 (their raw UDINT counter). Removed the stale "Int64 gap"
comment — no truncation or sign flips at the OPC UA encode layer.

Driver.TwinCAT-007 — EnsureConnectedAsync was not thread-safe. Connect/reconnect is now
serialized per device by a SemaphoreSlim (DeviceState.ConnectGate) with a double-checked
connect, mirroring the S7 driver. Concurrent read/write/probe callers can no longer leak a
client or race a create-vs-dispose.

Driver.TwinCAT-008 — native ADS notification callbacks ran driver logic on the AMS router
thread. AdsTwinCATClient now enqueues AdsNotificationEx callbacks onto a bounded Channel
drained by a dedicated managed task; the router-thread callback only does a non-blocking
TryWrite, so a slow consumer cannot stall ADS notification delivery process-wide.

Driver.TwinCAT-013 — TwinCATDriver did not implement IRediscoverable. The driver now
implements IRediscoverable; AdsTwinCATClient detects ADS 0x0702 (symbol-version-changed) on
read/write paths and raises OnSymbolVersionChanged, which the driver forwards as
OnRediscoveryNeeded so Core rebuilds the address space after a PLC program re-download.

Adds TwinCATHighFindingsRegressionTests covering all five fixes; updates the data-type
mapping assertion in TwinCATDriverTests. TwinCAT driver builds clean; 119 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:37:05 -04:00
Joseph Doherty
6300a9e4a8 fix(driver-cli-common): resolve High code-review finding (Driver.Cli.Common-001)
SnapshotFormatter.FormatStatus mapped four OPC UA status names to
incorrect numeric codes, mislabelling operator-facing CLI output. The
codes were corrected to their canonical OPC Foundation
Opc.Ua.StatusCodes values:

  BadTimeout                0x80060000 -> 0x800A0000
  BadNoCommunication        0x80070000 -> 0x80310000
  BadWaitingForInitialData  0x80080000 -> 0x80320000
  BadNodeIdInvalid          0x80350000 -> 0x80330000

The Cli.Common project does not reference the Opc.Ua package (only
Core.Abstractions / CliFx / Serilog), so the hex literals were
corrected in place with a sync note rather than adding a heavy new
dependency.

SnapshotFormatterTests was updated: the [Theory] expectations now use
the correct spec codes and assert the full rendered form, plus a new
regression [Theory] confirms the pre-fix wrong names no longer apply.
All 24 tests pass.

findings.md: Driver.Cli.Common-001 set to Resolved; open count 6 -> 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:39 -04:00
Joseph Doherty
4df8737c86 fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.

This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:33 -04:00
Joseph Doherty
27a8d05b7c feat(driver-galaxy): consume the gateway's session-less alarm model
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.

- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
  outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
  of StreamAlarms that decodes the active-alarm snapshot plus live
  transitions into GalaxyAlarmTransition and reopens the stream on
  transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
  per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
  starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
  move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.

Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 03:59:36 -04:00
Joseph Doherty
cd2306db66 feat(historian-sidecar): live aahClientManaged alarm-event write path (C.1)
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.

The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.

Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:08:32 -04:00
Joseph Doherty
8a51842e89 test(historian-sidecar): complete PR C.1 test coverage for AahClientManagedAlarmEventWriter
Add the spec-required 100/1000-event batching tests and cluster-failover
tests that were missing from the existing C.1 suite:

- AahClientManagedAlarmEventWriterTests: add Large_batch_all_ack_returns_all_true
  (batchSize 100 + 1000) and Large_batch_alternating_outcomes_are_positionally_correct
  (batchSize 100 + 1000) to satisfy the "1 / 100 / 1000 events" spec requirement;
  add Backend_retry_then_succeed_simulates_cluster_failover to cover the
  RetryPlease-then-Ack sequence at the IPC layer (unit-level stand-in for the
  rig-gated live cluster-failover path).

- SdkAlarmHistorianWriteBackendTests (new file): unit tests that pin the
  placeholder backend's RetryPlease-for-every-slot contract (preserves queued
  events while D.1 is unresolved); plus two Skip("rig-required") integration
  tests covering the live SDK single-event roundtrip and cluster failover via
  HistorianClusterEndpointPicker — remove the Skip in PR D.1.

Feasibility note: aahClientManaged.dll IS present in lib/ and referenced in
the csproj; the SDK call site is isolated behind IAlarmHistorianWriteBackend
in SdkAlarmHistorianWriteBackend.WriteBatchAsync (single method, D.1 seam).
The full AahClientManagedAlarmEventWriter implementation was already complete.

Build: 0 errors, 0 warnings.
Tests: 64 passed, 2 skipped (rig-gated), 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 06:25:11 -04:00
Joseph Doherty
392b219233 fix(tests): stabilize three flaky tests under parallel full-solution load
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.

#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.

#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:59:00 -04:00
Joseph Doherty
969b0847a1 docs: update path references for module-folder reorganization
Rewrite src/ and tests/ project paths in docs, CLAUDE.md, README.md, and
test-fixture READMEs to the new module-folder layout (Core/Server/Drivers/
Client/Tooling). References to retired v1 projects (Galaxy.Host/Proxy/Shared,
the legacy monolithic test projects) are left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:10:29 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00