123 Commits

Author SHA1 Message Date
Joseph Doherty bbe292a4b4 docs(code-reviews): regenerate index — 126 Medium findings resolved
All Medium-severity code-review findings across the 29 reviewed modules
are now Resolved. The Pending findings table holds only Low-severity items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:29:21 -04:00
Joseph Doherty 0f3b74ad87 fix(server): wire PermissionTrieCache into AuthorizationGate for generation pinning
Core-002 fixed TriePermissionEvaluator to evaluate each request against
the session's bound AuthGenerationId rather than whatever the cache
currently holds. AuthorizationGate.BuildSessionState was not updated at
the same time: it hardcoded AuthGenerationId = 0, so the evaluator's
GetTrie(cluster, 0) call returned null for any generation != 0, causing
every gated operation to silently fail with NotGranted regardless of
actual grants. The 42 gate/matrix/deferred-hardening tests all started
failing as a result.

Fix: add an optional PermissionTrieCache parameter to AuthorizationGate;
BuildSessionState now stamps AuthGenerationId from the cache's current
generation for the session's cluster. AuthorizationBootstrap.BuildGateAsync
passes the cache it creates. All 7 test MakeGate helpers updated to pass
the cache so tests produce a valid AuthGenerationId. 433/433 server tests
now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:25:39 -04:00
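The failure mode in the commit above can be sketched in a few lines. This is a minimal Python illustration, not the project's C# code: `TrieCache`, `build_session_state`, and `is_granted` are hypothetical stand-ins for PermissionTrieCache, BuildSessionState, and the evaluator, and the cache layout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


class TrieCache:
    """Hypothetical stand-in: permission tries keyed by (cluster, generation)."""

    def __init__(self) -> None:
        self._tries = {}
        self._current = {}

    def install(self, cluster: str, generation: int, trie: dict) -> None:
        self._tries[(cluster, generation)] = trie
        self._current[cluster] = generation

    def current_generation(self, cluster: str) -> int:
        return self._current.get(cluster, 0)

    def get_trie(self, cluster: str, generation: int) -> Optional[dict]:
        return self._tries.get((cluster, generation))


@dataclass
class SessionState:
    cluster: str
    auth_generation_id: int


def build_session_state(cluster: str, cache: Optional[TrieCache] = None) -> SessionState:
    # Without a cache the generation falls back to 0 (the old hardcoded value);
    # with a cache the session is stamped with the cluster's current generation.
    generation = cache.current_generation(cluster) if cache is not None else 0
    return SessionState(cluster, generation)


def is_granted(cache: TrieCache, session: SessionState, operation: str) -> bool:
    trie = cache.get_trie(session.cluster, session.auth_generation_id)
    # A missing trie means nothing can be granted (the NotGranted outcome).
    return trie is not None and trie.get(operation, False)
```

With a trie installed at generation 7, a session built without the cache is pinned to generation 0 and every check fails, while a session built with the cache resolves the trie and the grant.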
Joseph Doherty 7bf2dc49cf fix(driver-twincat): align status-mapper tests with corrected ADS codes (Driver.TwinCAT-011)
The Driver.TwinCAT-011 fix rewrote TwinCATStatusMapper with correct
numeric values from Beckhoff.TwinCAT.Ads 7.0.172 (e.g.
DeviceSymbolVersionInvalid = 1809 / 0x0711, not 1794 / 0x0702). Pre-existing
StatusMapper_covers_known_ads_error_codes InlineData cases were written
against the old wrong mappings and now fail;
StatusMapper_recognises_symbol_version_changed_code asserted the legacy
0x0702 constant. Update both test files to match the corrected mapper and
add a comment documenting the correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:25:25 -04:00
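The decimal/hex pairs quoted in the commit above can be checked with a short calculation. This is only an arithmetic sanity check in Python; the constant names are illustrative and are not the identifiers used in the Beckhoff library or the mapper.

```python
# Values quoted in the commit message above.
DEVICE_SYMBOL_VERSION_INVALID = 1809  # stated as 0x0711
LEGACY_WRONG_VALUE = 1794             # stated as 0x0702


def as_hex16(code: int) -> str:
    """Format an error code as a 4-digit hexadecimal literal."""
    return f"0x{code:04X}"


print(as_hex16(DEVICE_SYMBOL_VERSION_INVALID))  # 0x0711
print(as_hex16(LEGACY_WRONG_VALUE))             # 0x0702
```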
Joseph Doherty e3371a4f68 docs(driver-opcuaclient): correct open-findings count to 2
Driver.OpcUaClient-006, -007, -008, -009, -010, -012, -013, -015 were
resolved in earlier commits; only -011 (Low) and -014 (Low) remain open.
Header was left at 3 after the Medium batch; correct to 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 11:25:14 -04:00
Joseph Doherty 5130563104 docs(server): update open findings count to 6 after Medium batch
Resolved Server-003, -005, -007, -010, -011, -013 in this batch;
Server-004, -006, -008, -012, -014, -015 remain open (all Low severity).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:03:51 -04:00
Joseph Doherty 2dd0bd4198 fix(server): resolve Medium code-review finding (Server-013)
Replace silent Enum.TryParse fallback to None with a ParseSecurityProfile
helper that emits a startup Log.Warning naming the unsupported value and
listing recognised profiles; operators now see the misconfiguration
before any client connects rather than getting an unexplained None posture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:03:35 -04:00
Joseph Doherty a00f0338b5 fix(server): resolve Medium code-review finding (Server-011)
Advertise UserName token policy on any non-None security profile when
Ldap.Enabled; emit a startup LogWarning when Ldap.Enabled=true but
SecurityProfile=None so the misconfiguration is surfaced before clients
connect rather than silently producing no credential path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:01:43 -04:00
Joseph Doherty 6075254f38 fix(server): resolve Medium code-review finding (Server-010)
Default AutoAcceptUntrustedClientCertificates to false in both
OpcUaServerOptions and Program.cs config fallback, aligning with
docs/security.md; auto-accept is now explicitly opt-in for dev use only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 11:00:24 -04:00
Joseph Doherty fccb529d5f fix(server): resolve Medium code-review finding (Server-007)
Add configDbHealthy parameter to OpcUaApplicationHost; wire a
DbHealthCache (CanConnectAsync cached 10 s) in Program.cs so /healthz
reflects real config-DB reachability instead of the previous always-true
default; /healthz now returns 503 on a DB outage unless stale-config
cache is warm.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:59:08 -04:00
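The cached health probe in the commit above could look roughly like the following. This is a hedged Python sketch: `HealthCache`, its injectable clock, and `healthz_status` are hypothetical names, and only the 10-second window and the 503-unless-stale-cache-is-warm rule come from the commit message.

```python
import time
from typing import Callable, Optional


class HealthCache:
    """Hypothetical TTL cache around an expensive connectivity probe."""

    def __init__(self, probe: Callable[[], bool], ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._checked_at: Optional[float] = None
        self._healthy = False

    def is_healthy(self) -> bool:
        now = self._clock()
        # Re-run the probe only when there is no result yet or it has expired.
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._healthy = self._probe()
            self._checked_at = now
        return self._healthy


def healthz_status(db_healthy: bool, stale_config_cache_warm: bool) -> int:
    # 503 on a DB outage unless the stale-config cache can still serve.
    return 200 if db_healthy or stale_config_cache_warm else 503
```

Injecting the clock keeps the expiry logic testable without sleeping.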
Joseph Doherty 8e8199752f fix(server): resolve Medium code-review finding (Server-005)
Add _nodeManagerDisposed field; set it under Lock in Dispose before
detaching the alarm-service handler; check it in OnAlarmServiceTransition
under the same Lock so an in-flight transition cannot dispatch to a
ConditionSink whose DriverNodeManager is being concurrently disposed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:56:01 -04:00
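The disposed-flag pattern in the commit above can be illustrated as follows. This is a minimal Python sketch under assumed names (`NodeManager`, `on_alarm_transition`, `dispose`); the real change is in C# and its dispatch target is a ConditionSink rather than a list.

```python
import threading


class NodeManager:
    """Hypothetical stand-in showing a disposed flag guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._disposed = False
        self.dispatched = []

    def on_alarm_transition(self, transition: str) -> bool:
        with self._lock:
            # Checked under the same lock that dispose() takes, so a transition
            # either completes before disposal or is dropped after it.
            if self._disposed:
                return False
            self.dispatched.append(transition)
            return True

    def dispose(self) -> None:
        with self._lock:
            self._disposed = True
        # Handler detachment and resource cleanup would follow here.
```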
Joseph Doherty 2003b343bf fix(server): resolve Medium code-review finding (Server-003)
Fix ReadRawAsync: correct XML doc from newest-first to oldest-first
(ascending source timestamp per OPC UA Part 11); move maxValuesPerNode
cap inside the time-window filter loop so paging limits apply to
in-window results only, not the whole buffer snapshot.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:54:08 -04:00
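The paging change in the commit above is easiest to see side by side. The sketch below is a Python illustration only: how the original code applied the cap to the snapshot, the inclusive window bounds, and the treatment of 0 as "no cap" are all assumptions, and both function names are hypothetical.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    timestamp: float
    value: float


def read_raw_cap_first(buffer: List[Sample], start: float, end: float,
                       max_values: int) -> List[Sample]:
    """Flawed ordering: the cap is applied to the whole snapshot before filtering."""
    capped = buffer[:max_values] if max_values > 0 else buffer
    return [s for s in capped if start <= s.timestamp <= end]


def read_raw_cap_in_window(buffer: List[Sample], start: float, end: float,
                           max_values: int) -> List[Sample]:
    """Cap counted inside the filter loop, so only in-window samples consume it."""
    result: List[Sample] = []
    for s in sorted(buffer, key=lambda s: s.timestamp):  # oldest first
        if not (start <= s.timestamp <= end):
            continue
        result.append(s)
        if max_values > 0 and len(result) >= max_values:
            break
    return result
```

With ten samples at timestamps 0 to 9, a window of 5 to 9, and a cap of 3, the first version returns nothing because the cap is spent on out-of-window samples, while the second returns the samples at 5, 6, and 7.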
Joseph Doherty e774b6f88d docs(driver-twincat): update findings.md status fields and open count
Mark findings 003, 009, 010, 011, 012 Status: Resolved (status fields
were missing the update in earlier commits); reduce Open findings
count from 11 to 5 (Low findings 004, 006, 014, 015, 016 remain open).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:52:35 -04:00
Joseph Doherty 3f6b61133e fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-012)
GetMemoryFootprint now returns tagsByName * 256 + nativeSubs * 512 bytes
instead of a hard-coded 0; document that the stream-and-discard symbol
browse leaves no flushable cache so FlushOptionalCachesAsync is a
deliberate no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:50:28 -04:00
Joseph Doherty 40b28e8820 fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-011)
Confirm AdsErrorCode values from Beckhoff.TwinCAT.Ads 7.0.172 and rewrite
MapAdsError with 20 explicit cases. Fix critical bug: AdsSymbolVersionChanged
was 0x0702 (DeviceInvalidGroup) but DeviceSymbolVersionInvalid is 1809
(0x0711); correct constant and all comments. Add BadOutOfService for
DeviceNotReady and BadInvalidState for DeviceInvalidState/PLC-in-Config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:49:38 -04:00
Joseph Doherty f7d6bd12b9 fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-010)
Replace yield break with cancellationToken.ThrowIfCancellationRequested()
in BrowseSymbolsAsync so a cancelled browse propagates as
OperationCanceledException instead of silently completing with a partial
symbol set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:44:16 -04:00
Joseph Doherty 98d8df4adf fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-009)
Swap _devices and _tagsByName to ConcurrentDictionary so ShutdownAsync
Clear() no longer races concurrent TryGetValue calls; store ProbeTask
on DeviceState and await it in ShutdownAsync before disposing the client
and gate, eliminating the probe-disposal race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:43:35 -04:00
Joseph Doherty 40aa27b64b fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-005)
Inject optional ILogger<TwinCATDriver> (NullLogger default) and log
connect success/failure, ADS read errors, symbol-browse fallback,
native-notification registration failures, and host-state transitions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:42:08 -04:00
Joseph Doherty de43690e0f fix(driver-twincat): resolve Medium code-review finding (Driver.TwinCAT-003)
Reject Structure-typed pre-declared tags in BuildTag at config-parse time
with a clear InvalidOperationException; replaces the previous silent
garbage read (MapToClrType fell through to typeof(int)) and late
NotSupportedException on writes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 10:39:00 -04:00
Joseph Doherty a48b5396dc fix(driver-opcuaclient): resolve Medium code-review finding (Driver.OpcUaClient-015)
Add OpcUaClientMediumFindingsRegressionTests covering write-timeout status code
(009), Byte->UInt16 mapping (010), AutoAccept warning (012), GetMemoryFootprint/
FlushOptionalCachesAsync contract (013), and pre-init lifecycle guards (015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:35:44 -04:00
Joseph Doherty 2df614c79e fix(driver-opcuaclient): resolve Medium code-review finding (Driver.OpcUaClient-010)
Map DataTypeIds.Byte to DriverDataType.UInt16 (unsigned family) rather than Int16
(signed family). Update attribute mapping test to assert the correct unsigned mapping
and add Byte/UInt16 to the standard-types theory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:35:37 -04:00
Joseph Doherty 412c4bbd40 fix(driver-opcuaclient): resolve Medium code-review finding (Driver.OpcUaClient-006)
Route all Session mutations through _probeLock so OnReconnectComplete, ShutdownAsync,
and OnKeepAlive cannot race each other when swapping or clearing the active session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:35:11 -04:00
Joseph Doherty 8ceb10d861 Merge branch 'worktree-agent-adfb71e38279b8f48' into feat/scripted-alarm-shelve-routing 2026-05-22 10:22:56 -04:00
Joseph Doherty 607413e19f docs(code-reviews): update Driver.S7 and Driver.S7.Cli findings status
Mark Driver.S7-002, -004, -008, -012, -014 and Driver.S7.Cli-001, -002, -003
as Resolved; update Open findings counts (Driver.S7: 10→5, Driver.S7.Cli: 7→4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:48 -04:00
Joseph Doherty 26e7b8140a fix(driver-s7-cli): resolve Medium code-review finding (Driver.S7.Cli-003)
Wrap the InitializeAsync + ReadAsync body in a try/catch so an unreachable PLC
(refused TCP connect, wrong slot) still prints the structured Host:/CPU:/Health:/
Last error: report from driver.GetHealth() instead of crashing with a stack trace.
OperationCanceledException re-throws so Ctrl+C during connect exits cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:41 -04:00
Joseph Doherty 086f487786 fix(driver-s7-cli): resolve Medium code-review finding (Driver.S7.Cli-002)
Trim the --type help text on read and subscribe to the implemented set
(Bool/Byte/Int16/UInt16/Int32/UInt32/Float32) and append a one-line caveat that
Int64, UInt64, Float64, String, and DateTime are not yet implemented and will
return BadNotSupported — so the CLI does not advertise options that cannot succeed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:33 -04:00
Joseph Doherty 01a6b6d859 fix(driver-s7-cli): resolve Medium code-review finding (Driver.S7.Cli-001)
Wrap all numeric/DateTime BCL parses in ParseValue with try/catch(FormatException)
and try/catch(OverflowException) that re-throw as CommandException, matching the
existing Bool path. Update ParseValue_non_numeric_for_numeric_types_throws to assert
CommandException (not FormatException), and add an overflow-edge test (Byte value 256).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:25 -04:00
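The parse-and-rewrap approach in the commit above can be sketched like this. It is a Python analogue, not the CLI's C# code: `CommandError` stands in for CommandException, `ValueError` plays the role of FormatException, and the explicit range check plays the role of OverflowException.

```python
class CommandError(Exception):
    """Hypothetical counterpart of the CLI's CommandException."""


# Inclusive ranges for a few fixed-width integer types.
_RANGES = {"Byte": (0, 255), "Int16": (-32768, 32767), "UInt16": (0, 65535)}


def parse_value(text: str, type_name: str) -> int:
    low, high = _RANGES[type_name]
    try:
        value = int(text)
    except ValueError as exc:  # plays the role of FormatException
        raise CommandError(f"'{text}' is not a valid {type_name}") from exc
    if not low <= value <= high:  # plays the role of OverflowException
        raise CommandError(f"{value} is out of range for {type_name} ({low}..{high})")
    return value
```

Both failure kinds surface as the same user-facing error type, so the caller needs one handler; the Byte value 256 mentioned in the commit falls into the range-check branch.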
Joseph Doherty aeb5fc48ae test(driver-s7): resolve Medium code-review finding (Driver.S7-014)
Add S7TypeMappingTests.cs covering ReinterpretRawValue and BoxValueForWrite —
26 tests verifying every implemented type round-trip (Bool/Byte/UInt16/Int16/
UInt32/Int32/Float32), two's-complement reinterpret semantics (ushort→short,
uint→int), unsupported-type NotSupportedException, and overflow edge cases.
These methods were factored out as internal static in the S7-002/S7-008 commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:15 -04:00
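The two's-complement reinterpretation that these tests exercise (ushort→short, uint→int) can be demonstrated with Python's `struct` module. The function names are illustrative and do not mirror ReinterpretRawValue's actual signature.

```python
import struct


def ushort_to_short(raw: int) -> int:
    """Reinterpret the bits of an unsigned 16-bit value as signed."""
    return struct.unpack("<h", struct.pack("<H", raw))[0]


def uint_to_int(raw: int) -> int:
    """Reinterpret the bits of an unsigned 32-bit value as signed."""
    return struct.unpack("<i", struct.pack("<I", raw))[0]


print(ushort_to_short(0xFFFF))  # -1
print(ushort_to_short(0x8000))  # -32768
print(uint_to_int(0xFFFFFFFF))  # -1
```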
Joseph Doherty 909490622d fix(driver-s7): resolve Medium code-review findings (Driver.S7-002, S7-004, S7-008)
S7-002: add inline comment documenting the UInt32→Int32 lossiness in MapDataType,
consistent with the Int64/UInt64 note. Tracked for a follow-up that adds unsigned
DriverDataType members.

S7-004: inject ILogger<S7Driver> (optional, defaults to NullLogger); add structured
log calls for connect success/failure, probe Running/Stopped transitions, and
swallowed poll-loop exceptions, so operators have an event trail via Serilog.

S7-008: restructure WriteAsync catch ladder to mirror ReadAsync — OperationCanceledException
re-throws, NotSupportedException → BadNotSupported, PUT/GET-disabled PlcException →
BadNotSupported/Faulted, genuine PlcException → BadDeviceFailure/Degraded, all
others → BadCommunicationError/Degraded. Health is now updated on every write failure.
Also factor ReadOneAsync reinterpret into internal ReinterpretRawValue and
WriteOneAsync boxing into internal BoxValueForWrite for testability (Driver.S7-014).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:17:08 -04:00
Joseph Doherty b827b0c0a2 fix(driver-s7): resolve Medium code-review finding (Driver.S7-012)
Remove the dead ProbeAddress config surface from S7ProbeOptions and the factory
DTO. ProbeLoopAsync uses Plc.ReadStatusAsync (CPU-status PDU), not a tag-address
read — ProbeAddress was never consumed. The XML doc on Probe is corrected to
describe the ReadStatusAsync-based probe. Existing configs that set probeAddress
are silently ignored by the JSON deserializer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:16:54 -04:00
Joseph Doherty 19a2a81321 fix(driver-modbus-addressing): resolve Medium code-review finding (Driver.Modbus.Addressing-003)
Complete the incomplete Addressing-003 fix: TryParseByteOrder now produces a
diagnostic mentioning "field 2" when a known type-code token (e.g. BOOL) is
supplied in the byte-order slot, so the user is guided to the correct field.
The previous fix only wired the message in the else-branch, which was unreachable
because LooksLikeByteOrderToken(BOOL) returned true first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:16:30 -04:00
Joseph Doherty dd1742e319 fix(driver-modbus-cli): resolve Medium code-review finding (Driver.Modbus.Cli-002)
Reject --region Coils combined with any non-boolean --type with a CommandException
that names the constraint: coils carry a single bit, so only --type Bool is valid.
Without this check a write like "--region Coils --type UInt16 --value 42" would
silently coerce to a coil ON with no diagnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:55:16 -04:00
Joseph Doherty 63e4a6baab fix(driver-modbus-cli): resolve Medium code-review finding (Driver.Modbus.Cli-001)
Add --bit-index, --string-length, and --string-byte-order options to
SubscribeCommand, mirroring ReadCommand, and pass them into ModbusTagDefinition
so that BitInRegister and String type subscriptions use the correct bit index and
string length rather than silently defaulting to bit-0 / zero-length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:55:10 -04:00
Joseph Doherty e7d7b6cb1d fix(driver-modbus-addressing): resolve Medium code-review finding (Driver.Modbus.Addressing-008)
Add ModbusAddressEdgeCaseTests.cs covering the overflow/boundary gaps: empty
trailing parser field (finding -002 regression), multi-dot input, UserVMemoryToPdu
and AddOctalOffset overflow, SystemVMemoryToPdu base+overflow, MelsecAddress
ParseHex overflow, and DRegisterToHolding/MRelayToCoil bank-base overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:53:12 -04:00
Joseph Doherty ba52c179fd fix(driver-modbus-addressing): resolve Medium code-review finding (Driver.Modbus.Addressing-002)
Reject an empty 3rd field in the address parser by checking parts[2].Length > 0
before the All(char.IsDigit) guard, so a trailing-colon typo like "40001:F:"
produces a diagnostic instead of silently parsing as a scalar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:52:52 -04:00
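The root cause in the commit above is that an "all characters are digits" test is vacuously true for an empty string. Python's `all()` behaves the same way as the C# `All` call here, so the trap can be reproduced directly; the function names are hypothetical and the sketch does not model the rest of the address parser.

```python
from typing import List


def third_field_all_digits(parts: List[str]) -> bool:
    """Flawed check: all() over an empty string is vacuously True."""
    return len(parts) > 2 and all(c.isdigit() for c in parts[2])


def third_field_all_digits_fixed(parts: List[str]) -> bool:
    """Require at least one character before testing that all are digits."""
    return len(parts) > 2 and len(parts[2]) > 0 and all(c.isdigit() for c in parts[2])


parts = "40001:F:".split(":")
print(parts)                                # ['40001', 'F', '']
print(third_field_all_digits(parts))        # True
print(third_field_all_digits_fixed(parts))  # False
```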
Joseph Doherty ebfd5d7871 fix(driver-galaxy): fix XML doc comment cref in StatusCodeMap.ToQualityCategoryByte
StatusCode is not a .NET type reference in this assembly — replace the unresolvable
<see cref="StatusCode"/> with prose text so TreatWarningsAsErrors does not fail the
build on the CS1574 unresolved-cref warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:51:17 -04:00
Joseph Doherty 7a7defb59b fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-014)
Add GalaxyDriverInfrastructureTests covering the two gaps identified in this finding
that are not yet tracked by a dedicated test file: GetMemoryFootprint returns a live
registry-derived estimate (Driver.Galaxy-011) and DisposeAsync completes without
deadlock (Driver.Galaxy-007). The remaining items listed in the finding are covered
by earlier resolution commits: stream-fault → recovery → OnDataChange resumes
(EventPumpStreamFaultTests), post-reconnect Rebind (SubscriptionRegistryTests),
StatusCodeMap.FromMxStatus success/failure semantics (StatusCodeMapTests), and
DataTypeMap all seven codes (DataTypeMapTests). Update findings.md header to 4 open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:49:51 -04:00
Joseph Doherty ecc91b0e48 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-011)
GetMemoryFootprint() returned a constant 0 with a stale "PR 4.4 sets this" comment
even though PR 4.4 shipped the SubscriptionRegistry. Replace with a live estimate:
64 bytes × TrackedItemHandleCount + 256 bytes × TrackedSubscriptionCount. A 50k-tag
set now registers ~3 MB with the server's cache-flush heuristic instead of being
invisible. Returns 0 when no subscriptions are active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:48:37 -04:00
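The estimate in the commit above is simple enough to work through. The Python sketch below uses the two per-entry sizes from the commit message; the function name and the choice of a single subscription for the 50k-tag case are assumptions.

```python
BYTES_PER_ITEM_HANDLE = 64
BYTES_PER_SUBSCRIPTION = 256


def estimate_footprint(item_handles: int, subscriptions: int) -> int:
    """Estimated bytes: 64 per tracked item handle plus 256 per tracked subscription."""
    return BYTES_PER_ITEM_HANDLE * item_handles + BYTES_PER_SUBSCRIPTION * subscriptions


# 50,000 item handles in a single subscription (the subscription count is an assumption).
print(estimate_footprint(50_000, 1))  # 3200256
print(estimate_footprint(0, 0))       # 0
```

50,000 × 64 bytes is 3,200,000 bytes, which is where the "~3 MB" figure comes from; the per-subscription term is negligible unless the subscription count is large.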
Joseph Doherty c5f2d91bcb fix(driver-modbus): resolve Medium code-review finding (Driver.Modbus-002)
Clear _tagsByName, _lastPublishedByRef, and _lastWrittenByRef in ShutdownAsync
(via the new shared TeardownAsync helper) so a ReinitializeAsync cycle starts
from a clean state, consistent with the existing _autoProhibited.Clear().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:48:09 -04:00
Joseph Doherty 0f3de4d510 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-009)
Fix two resource-management bugs in StartDeployWatcher / BuildDefaultHierarchySource:
(a) Replace the discarded `_ = StartAsync(...)` with an explicit task variable that
    surfaces any synchronous InvalidOperationException (called-twice guard) rather than
    silently swallowing it.
(b) Change both StartDeployWatcher and BuildDefaultHierarchySource to use ??= on
    _ownedRepositoryClient so the first client created (by whichever path runs first)
    is reused by the second path, preventing a second GalaxyRepositoryClient from being
    created and the first from leaking past the driver's lifetime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:52 -04:00
Joseph Doherty d572a011ef fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-007)
Implement IAsyncDisposable on GalaxyDriver so async sub-component disposals
(EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) are awaited rather
than blocked on GetAwaiter().GetResult(). DisposeAsync is now the primary path;
Dispose() delegates to it for using-statement compatibility. Each async component's
shutdown is awaited individually with a best-effort catch so a single slow shutdown
cannot prevent the rest of the cleanup sequence from running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:00 -04:00
Joseph Doherty d14564839e fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-006)
HashSet<T>.First() enumeration order is unspecified and unstable across mutations, so
the "owner" handle attached to alarm events was non-deterministic when multiple alarm
subscriptions were active. Change _alarmSubscriptions from HashSet to List (preserving
insertion order) and pick [0] — the earliest-registered handle — as the deterministic
owner. The server routes transitions by SourceNodeId, not by handle, so the choice of
handle does not affect delivery to active subscribers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:45:55 -04:00
Joseph Doherty 910a538b19 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-004)
Add StatusCodeMap.ToQualityCategoryByte(uint) so the StatusCode → quality-byte
mapping lives in one place next to its inverse (FromQualityByte). GalaxyDriver
OnPumpDataChange now delegates to the helper instead of duplicating the shift+switch
inline; a future edit to the OPC UA bit layout cannot silently desync the probe-health
decode. Unit tests in StatusCodeMapTests pin all three category buckets and the
round-trip invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:43:53 -04:00
Joseph Doherty 39a02f6794 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-003)
StatusCodeMap.FromMxStatus checked `success != 0` to determine success, but the
mxaccessgw proto contract explicitly documents that `success` is not a boolean and
that clients must branch on `category` (MX_STATUS_CATEGORY_OK), not on `success`
alone. Replace the raw field check with `status.IsSuccess()` from
MxStatusProxyExtensions, which requires both `success != 0` AND `category == Ok`.
A worker reporting success=1 with a non-OK category was previously misreported as
Good. Updated StatusCodeMapTests with a regression case covering the inverted scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:42:47 -04:00
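The success rule described in the commit above reduces to a two-part condition. This Python sketch is illustrative only: `MxStatus` and `Category` are hypothetical types, and the enum's numeric values are not taken from the mxaccessgw proto.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    OK = 0      # numeric values here are illustrative only
    OTHER = 1


@dataclass(frozen=True)
class MxStatus:
    success: int
    category: Category


def is_success(status: MxStatus) -> bool:
    """Success requires a non-zero success field AND an OK category."""
    return status.success != 0 and status.category is Category.OK
```

The regression case from the commit corresponds to `MxStatus(1, Category.OTHER)`, which a bare `success != 0` check would have reported as good.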
Joseph Doherty f9dccaa732 Merge branch 'worktree-agent-ad34cad856c59bbc1' into feat/scripted-alarm-shelve-routing 2026-05-22 09:40:46 -04:00
Joseph Doherty f920de9878 Merge branch 'worktree-agent-af51f33c034e99fd4' into feat/scripted-alarm-shelve-routing 2026-05-22 09:40:46 -04:00
Joseph Doherty b21585767b Merge branch 'worktree-agent-aaf0e64363ca270b1' into feat/scripted-alarm-shelve-routing 2026-05-22 09:40:45 -04:00
Joseph Doherty ee5d7ad51e fix(driver-ablegacy): fix CS9124 build error and update stale status-mapper test
EffectiveCipPath now references ParsedAddress/Profile properties instead
of the captured primary-constructor parameters to avoid CS9124 (param
captured into enclosing type AND used to init a member).

NonZero_libplctag_status_maps_via_AbLegacyStatusMapper updated to pass
(int)Status.ErrorNotFound rather than the stale magic integer -14 that
the old mapper happened to handle but the new enum-based mapper does not.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:33:19 -04:00
Joseph Doherty 1a1b3df098 fix(driver-abcip): actually remove PlcTagHandle.cs from the git index
The file was physically deleted and unstaged in the Driver.AbCip-006
commit but the git rm was not included. Committed separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:31:56 -04:00
Joseph Doherty 1158b80c41 docs(driver-abcip): update findings.md resolutions for 005 and 014
Clarify Driver.AbCip-005 resolution: parent Structure tag stays in
_tagsByName (needed by whole-UDT planner + alarm projection); the fix
is in ReadSingleAsync returning BadNotSupported for direct reads.
Update Driver.AbCip-014 resolution text to match the actual test names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:31:18 -04:00
Joseph Doherty 75163f703d docs(driver-ablegacy): update findings.md open count to 3
8 Medium findings resolved (-002 through -012); 3 Low findings remain
open (-005, -011, -013).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:30:58 -04:00
Joseph Doherty 17432bb1a4 fix(driver-abcip): correct Driver.AbCip-005 approach and fix 014 tests
Finding 005 revised approach: keep the parent Structure tag in
`_tagsByName` so the whole-UDT grouping planner can find it (required
for Driver.AbCip-003 opt-in path + alarm projection). Instead, detect a
direct read of a Structure-with-Members in `ReadSingleAsync` and return
`BadNotSupported` rather than Good/null — explicitly documenting the
contract that callers must address member paths. Duplicate-key checks
(scalar and member fan-out) remain.

Finding 014 test corrections: `Structure_parent_tag_read_returns_BadNotSupported`
now asserts the new contract. `Read_UDInt_tag_returns_uint_value_not_negative_wrapped_int`
assertion fixed to use `ShouldBeOfType<uint>()` instead of
`ShouldNotBe(-1)` (Shouldly overflows comparing uint.MaxValue with int).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:30:54 -04:00
Joseph Doherty e3648adcea fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-012)
Consume previously-dead AbLegacyPlcFamilyProfile fields:

- DeviceState.EffectiveCipPath applies DefaultCipPath when the parsed host
  address has an empty CIP path (SLC 500 / PLC-5 misconfigured without /1,0
  now gets the profile-supplied default route). All three tag/parent/probe
  Create() callers updated.
- InitializeAsync validates each tag's DataType against SupportsLongFile /
  SupportsStringFile and throws InvalidOperationException at init time so a
  MicroLogix Long tag or similar fails early rather than at runtime with an
  opaque comms error.
- MaxTagBytes tracked as a follow-up (string/array chunking requires broader
  design work).

Tests added for CipPath fallback and Long/String type validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:30:42 -04:00
Joseph Doherty cec7ab6ec4 fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-010)
The test suite lacked coverage for four critical paths: corrupt/null-
deserializing PayloadJson rows, StartDrainLoop timer behavior and backoff
honoring, concurrent EnqueueAsync+DrainOnceAsync stress, and the
outcomes.Count != events.Count cardinality-mismatch branch.

Added tests covering all four gaps (committed across companion findings):
- Drain_with_corrupt_payload_row_deadletters_it_and_keeps_good_rows_aligned
- Drain_with_corrupt_head_row_does_not_stall_queue
- StartDrainLoop_honors_backoff_and_slows_cadence_under_retry
- StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy
- StartDrainLoop_records_drain_fault_and_keeps_running
- Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy
- Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows
- Capacity_eviction_increments_evicted_count_on_status
- GetStatus_snapshot_is_consistent_under_concurrent_drain

Updated Open findings count to 2 (Core.AlarmHistorian-008 + -011, both Low).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:30:17 -04:00
Joseph Doherty 228ad42ad7 fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-010)
MapLibplctagStatus now casts the int to libplctag.Status and switches on
named enum members (mirroring AbCipStatusMapper) instead of unverified
magic integers. A strongly-typed Status overload is the canonical path;
the int overload delegates to it. MapPcccStatus is retained with a comment
marking it as the reference mapping for future PCCC-STS inspection.
Tests updated to use Status enum members rather than raw integers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:28:27 -04:00
Joseph Doherty f6d487b167 fix(driver-historian-wonderware-client): suppress xUnit1051 false-positive in ContractsWireParityTests
Add #pragma warning disable xUnit1051 at the top of ContractsWireParityTests.cs.
The xUnit1051 analyser fires on MessagePack's Serialize/Deserialize overloads that
have an optional CancellationToken parameter; these are synchronous parity tests
where the token is not meaningful — the suppression is scoped to this file only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:28:20 -04:00
Joseph Doherty 5718cb5778 fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-007)
When WriteBatchAsync returned a wrong-cardinality outcome list, DrainOnceAsync
threw InvalidOperationException after potential delivery — causing duplicate
events on re-drain or permanent queue stall on a deterministic writer bug.

- The throw is replaced with log + backoff: the mismatch is recorded into
  _lastError, _drainState is set to BackingOff, the backoff is bumped, and the
  method returns without applying any outcomes, mirroring the writer-exception
  path.
- Regression test Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows
  asserts rows stay queued, DrainState = BackingOff, LastError populated, and
  that a fixed writer subsequently drains cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:55 -04:00
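The log-and-back-off behaviour in the commit above can be sketched as follows. This is a simplified Python model, not the historian's code: `Drainer`, the "Idle" state name, the boolean outcome list, and the integer backoff counter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Drainer:
    queue: List[str] = field(default_factory=list)
    drain_state: str = "Idle"
    last_error: Optional[str] = None
    backoff_step: int = 0

    def drain_once(self, write_batch: Callable[[List[str]], List[bool]]) -> None:
        events = list(self.queue)
        outcomes = write_batch(events)
        if len(outcomes) != len(events):
            # Record and back off instead of raising; no outcomes are applied,
            # so every row stays queued for the next attempt.
            self.last_error = (
                f"writer returned {len(outcomes)} outcomes for {len(events)} events"
            )
            self.drain_state = "BackingOff"
            self.backoff_step += 1
            return
        # Keep only the events whose write did not succeed.
        self.queue = [e for e, ok in zip(events, outcomes) if not ok]
        self.drain_state = "Idle"
        self.last_error = None
        self.backoff_step = 0
```

A writer that returns one outcome for two events leaves both rows queued and the state at BackingOff; a corrected writer then drains the queue on the next call.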
Joseph Doherty 5bf4be7ca9 fix(driver-focas): resolve Medium code-review finding (Driver.FOCAS-012)
Add FocasDriverMediumFindingsTests.cs with regression coverage for the
five Medium findings:

- 003: InitializeAsync throws when tag's DeviceHostAddress is absent
  from Devices (two variants: typo host, wrong port; also happy path)
- 004: DiscoverAsync emits ViewOnly for tags with Writable:true
- 005: GetHealth() is consistent after ten concurrent ReadAsync calls
- 006: Read recovers after the client is externally disposed, creating
  a fresh client rather than wedging with BadCommunicationError
- 012: Factory full-round-trip with all three opt-in config sections
  (FixedTree + AlarmProjection + HandleRecycle) with all subfields

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:40 -04:00
Joseph Doherty 6d520c6756 fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-005)
Status fields (_lastDrainUtc, _lastSuccessUtc, _lastError, _drainState,
_evictedCount) were written by the drain timer thread and read by
GetStatus() / health-check threads with no memory barrier, risking torn
DateTime? reads and stale DrainState observations.

- Added _statusLock object; all writes to status fields now happen inside
  lock(_statusLock) blocks in DrainOnceAsync and DrainTimerCallback.
- GetStatus() snapshots all fields atomically under the same lock so the
  Admin UI / /healthz endpoint always sees a consistent view.
- Regression test GetStatus_snapshot_is_consistent_under_concurrent_drain
  drives status writes and reads from concurrent threads; asserts no throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:31 -04:00
Joseph Doherty 0003f76301 fix(scripting): correct System.Threading.Thread enforcement in analyzer
System.Threading.Thread is in the System.Threading namespace (not
System.Threading.Thread), so the existing ForbiddenNamespacePrefixes
entry "System.Threading.Thread" never matched — the namespace prefix
check compared against the type's containing namespace, which is
System.Threading. Move Thread into ForbiddenFullTypeNames (alongside
Environment / AppDomain / GC / Activator) where it is matched by exact
fully-qualified type name, which actually fires. Remove the dead
namespace-prefix entry and document why. The Rejects_Thread_new_at_compile
test now passes. (Core.Scripting-010.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:15 -04:00
Joseph Doherty 807ea11187 fix(driver-focas): resolve Medium code-review finding (Driver.FOCAS-006)
EnsureConnectedAsync now disposes and nulls any existing non-connected
client before creating a fresh one via _clientFactory.Create().

Previously the method reused a cached client via ConnectAsync, but a
client disposed by a HandleRecycle race or prior teardown would hit
FocasWireClient.ThrowIfDisposed on every subsequent call, leaving the
device permanently wedged with BadCommunicationError and no recovery
path until ReinitializeAsync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:27:08 -04:00
Joseph Doherty 1c6db86631 fix(driver-historian-wonderware-client): resolve Medium code-review finding (Driver.Historian.Wonderware.Client-009)
Add six previously-missing edge-case tests to WonderwareHistorianClientTests:
(2) WriteBatchAsync transport-drop catch path returns RetryPlease for all events;
(3) InvokeAsync second-attempt-also-fails propagates the exception;
(4) stalled sidecar fires OperationCanceledException within CallTimeout;
(5) HistoryAggregateType.Total throws NotSupportedException via ReadProcessedAsync;
(6) sidecar wrong-MessageKind reply throws InvalidDataException.

Extend FakeSidecarServer with DisconnectBeforeReply, ReplyWithWrongKind, and
StallAfterRequest test knobs to support these scenarios.

Add ContractsWireParityTests.cs (11 tests) to pin the MessagePack byte layout,
round-trip correctness, MessageKind enum values, and Framing constants — catching
silent [Key] index drift between the client and sidecar mirror copies without
requiring a cross-TFM (net10 vs net48) project reference.

Test count grew from 11 to 27; all 27 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:56 -04:00
Joseph Doherty d412352b41 fix(driver-focas): resolve Medium code-review finding (Driver.FOCAS-005)
Guard all _health field accesses with Volatile.Read / Volatile.Write.
ReadAsync, WriteAsync, and ProbeLoopAsync run on different threads and
several updates are read-modify-write (new DriverHealth(_, _health.X, _)).
Without volatile semantics a concurrent update can be lost or a stale
LastSuccessfulRead timestamp propagated.  DriverHealth is an immutable
record so Volatile is sufficient — no lock needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:46 -04:00
Joseph Doherty 03c2028669 docs(driver-abcip): update findings.md open count after Medium resolutions
6 Medium findings resolved (004, 005, 006, 009, 010, 014); open count
updated from 11 to 5 (all remaining Low severity: 007, 011, 012, 013, 015).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:44 -04:00
Joseph Doherty 54d51a1d20 fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-009)
InitializeAsync catch block now mirrors ShutdownAsync teardown: cancels
and disposes probe CancellationTokenSources, calls DisposeRuntimes, and
clears _devices/_tagsByName before rethrowing. A caller that catches and
abandons (rather than retrying via ReinitializeAsync) no longer leaves
orphaned probe tasks or libplctag handles alive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:42 -04:00
Joseph Doherty 9008c6e7aa fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-014)
Add regression tests for the Medium findings resolved in this series:
- AbCipDataType_maps_large_integer_types (theory: LInt→Int64, ULInt→UInt64,
  UDInt→UInt32) and Read_UDInt_tag_returns_uint_value_not_negative_wrapped_int
  cover Driver.AbCip-004.
- Structure_parent_tag_is_not_readable_after_member_fan_out,
  InitializeAsync_throws_on_duplicate_tag_name, and
  InitializeAsync_throws_when_member_name_collides_with_independent_tag
  cover Driver.AbCip-005.
- Read_failure_evicts_runtime_so_next_read_creates_fresh_handle covers
  Driver.AbCip-010.
AbCipDriverTests.AbCipDataType_maps_atomics_to_driver_types extended with
LInt/ULInt/UDInt assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:31 -04:00
Joseph Doherty f23cea201d fix(driver-focas): resolve Medium code-review finding (Driver.FOCAS-004)
DiscoverAsync now unconditionally emits SecurityClassification.ViewOnly
for every user-authored FOCAS tag.  Previously the SecurityClass was
tag.Writable ? Operate : ViewOnly, but WireFocasClient.WriteAsync always
returns BadNotWritable — advertising Operate misleads OPC UA clients
and the DriverNodeManager ACL layer into granting write permission on
nodes that can never be written.

Updated FocasCapabilityTests.DiscoverAsync_emits_pre_declared_tags to
assert ViewOnly for the writable-by-config tag so it matches the
corrected behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:24 -04:00
Joseph Doherty 60ffcfe8bd fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-008)
Mark _health volatile. The record-reference assignment is atomic, but
without an acquire/release memory barrier GetHealth() on another thread
can observe a stale snapshot indefinitely. The volatile modifier enforces
that barrier at read and write sites without a lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:26:08 -04:00
Joseph Doherty 5b5e9ad83b fix(driver-focas): resolve Medium code-review finding (Driver.FOCAS-003)
Throw InvalidOperationException at InitializeAsync when a tag's
DeviceHostAddress does not match any entry in the Devices list, naming
both the tag and the unresolved host.  Previously the missing-device
check was guarded by a TryGetValue so a typo silently bypassed
capability-matrix validation and deferred the error to per-read
BadNodeIdUnknown — the opposite of the documented "fail at load" goal.

Also resolves findings 004, 005, and 006 in the same file:
- 004: DiscoverAsync now unconditionally emits ViewOnly for all user
  tags; the Writable config field no longer influences security class
  because the wire backend always returns BadNotWritable.
- 005: All _health reads use Volatile.Read and all writes use
  Volatile.Write so concurrent readers observe a consistent reference
  and read-modify-write sequences capture a stable snapshot.
- 006: EnsureConnectedAsync disposes and nulls any existing
  non-connected client before creating a fresh one, preventing
  ObjectDisposedException loops after a HandleRecycle race or teardown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:25:57 -04:00
Joseph Doherty 7661d1b5dc fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-007)
Runtimes and ParentRuntimes changed from Dictionary to ConcurrentDictionary.
EnsureTagRuntimeAsync and EnsureParentRuntimeAsync now use a per-key
GetCreationLock semaphore with a double-checked pattern: fast-path read
requires no lock; slow-path create+initialize+store is serialised per key
so a concurrent caller waits rather than creating a duplicate runtime that
would be leaked when DisposeRuntimes runs.
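
The per-key double-checked pattern described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the driver's C# code: `RuntimeCache`, `ensure`, and `_creation_lock` are hypothetical names, and a plain dict plus `threading.Lock` stands in for ConcurrentDictionary and the per-key semaphore.

```python
import threading


class RuntimeCache:
    """Hypothetical stand-in for the driver's runtime dictionary."""

    def __init__(self, factory):
        self._factory = factory            # creates and initialises one runtime
        self._runtimes = {}                # key -> runtime
        self._locks = {}                   # key -> per-key creation lock
        self._locks_guard = threading.Lock()

    def _creation_lock(self, key):
        # Hand out exactly one lock object per key.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def ensure(self, key):
        runtime = self._runtimes.get(key)      # fast path: no lock taken
        if runtime is not None:
            return runtime
        with self._creation_lock(key):         # slow path: serialised per key
            runtime = self._runtimes.get(key)  # second check under the lock
            if runtime is None:
                runtime = self._factory(key)
                self._runtimes[key] = runtime
            return runtime
```

A caller that loses the race waits on the per-key lock and then finds the stored runtime on the second check, so no duplicate is created and nothing is left to leak.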

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:25:35 -04:00
Joseph Doherty 72728a5d45 fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-010)
Add `EvictRuntime` helper that removes + disposes a stale
`ConcurrentDictionary` entry. Call it from `ReadSingleAsync`,
`ReadGroupAsync`, and `WriteAsync` on non-zero libplctag status and
transport exceptions so the next call for the same tag re-creates a
fresh handle — mirroring the probe loop's recreate-on-failure pattern.
Value-conversion exceptions (NotSupportedException, FormatException,
InvalidCastException, OverflowException) are not transport faults and
do not evict the handle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:45 -04:00
Joseph Doherty 0d10d30b7d fix(driver-historian-wonderware): update findings.md open count after resolving -002 -003 -006 -009
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:36 -04:00
Joseph Doherty 1723f5d5cd fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-009)
Apply _config.MaxValuesPerRead as a bucket cap in ReadAggregateAsync,
mirroring the existing cap in ReadRawAsync. Without this guard a processed
read over a wide time range with a small IntervalMs could accumulate an
unbounded HistorianAggregateSample list; if the serialised reply exceeded
the 16 MiB FrameWriter frame cap WriteAsync would throw and the client
correlation-id wait would hang. Truncation now logs a Warning with a hint
to widen IntervalMs or reduce the time range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:25 -04:00
Joseph Doherty 47eac2d84f fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-004)
DecodeValue for Bit with no bitIndex now reads the full 16-bit word via
GetInt16(0) and tests bit 0. It previously called GetInt8(0), which only
covered the low byte and silently misread any bit in positions 8..15. A
code comment explains the two decode paths (suffix-present vs suffix-absent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:19 -04:00
Joseph Doherty 7474631992 fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-006)
Add exponential backoff (250 ms → 500 ms → 1 s → 2 s → 4 s → 8 s cap) to
PipeServer.RunAsync after each connection-loop exception, replacing the spin
loop that previously pegged a CPU core and flooded the log on persistent errors
such as a duplicate pipe name or a failing PipeAcl.Create. After 20 consecutive
failures the method re-throws so the SCM / NSSM supervisor can restart the
sidecar cleanly. A clean connection (even a short-lived one) resets the counter.
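
A minimal sketch of the retry policy described above, written in Python for illustration only. `delay_ms`, `run_with_backoff`, and `serve_one_connection` are hypothetical names, and whether the real loop sleeps before the final re-throw is an assumption (here it does not).

```python
import time

BASE_MS = 250
CAP_MS = 8000
MAX_CONSECUTIVE_FAILURES = 20


def delay_ms(consecutive_failures):
    """Delay before retrying after the n-th consecutive failure (1-based)."""
    return min(BASE_MS * 2 ** (consecutive_failures - 1), CAP_MS)


def run_with_backoff(serve_one_connection, sleep=time.sleep):
    """Serve connections until failures exceed the limit, then re-raise."""
    failures = 0
    while True:
        try:
            serve_one_connection()
            failures = 0                      # a clean connection resets the counter
        except Exception:
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise                         # let the supervisor restart the process
            sleep(delay_ms(failures) / 1000)  # 0.25 s, 0.5 s, ... capped at 8 s
```

Injecting `sleep` keeps the schedule testable without real waiting.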

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:42 -04:00
Joseph Doherty 7d30009dc8 fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-003)
TryParse now rejects three classes of malformed PCCC address:
- Sub-element + bit-index together (e.g. T4:0.ACC/2) — never valid in PCCC
- File number on I/O/S system files (e.g. I3:0, S2:1) — single-letter only
- Sub-element on non-T/C/R files (e.g. B3:0.DN, N7:0.FOO) — only Timer,
  Counter, and Control files carry structured elements

New helper predicates IsNoFileNumberLetter / IsSubElementFileLetter
keep the parser's intent clear. Regression tests added in AbLegacyAddressTests.
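
The three rejection rules above can be sketched as follows. This is a simplified Python illustration, not the driver's TryParse: the regular expression, `is_valid_address`, and the two letter sets are assumptions, and the sketch checks only these three rules (single-letter file types, no other PCCC syntax).

```python
import re

# Simplified shape: LETTER [file number] : element [.SUB] [/bit]
_ADDRESS = re.compile(
    r"^(?P<letter>[A-Z])(?P<file>\d*):(?P<element>\d+)"
    r"(?:\.(?P<sub>[A-Z]+))?(?:/(?P<bit>\d+))?$"
)

NO_FILE_NUMBER_LETTERS = {"I", "O", "S"}   # system files: letter only
SUB_ELEMENT_LETTERS = {"T", "C", "R"}      # Timer, Counter, Control


def is_valid_address(text):
    """Return False for the three malformed classes named in the commit."""
    m = _ADDRESS.match(text)
    if m is None:
        return False
    letter, file_no, sub, bit = m["letter"], m["file"], m["sub"], m["bit"]
    if sub is not None and bit is not None:
        return False                       # e.g. T4:0.ACC/2
    if letter in NO_FILE_NUMBER_LETTERS and file_no:
        return False                       # e.g. I3:0, S2:1
    if sub is not None and letter not in SUB_ELEMENT_LETTERS:
        return False                       # e.g. B3:0.DN, N7:0.FOO
    return True
```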

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:35 -04:00
Joseph Doherty a17de80cdb fix(scripting): resolve Medium code-review finding (Core.Scripting-010)
Add ScriptSandboxTests cases for all forbidden-namespace deny-list
vectors that lacked test coverage: System.Threading.Thread,
System.Threading.Tasks.Task.Run (newly denied per Core.Scripting-003),
System.Runtime.InteropServices.Marshal, and Microsoft.Win32.Registry.
The 001/002 type-granular and node-form vectors were already covered by
the -001/-002 resolution commits. All 79 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:29 -04:00
Joseph Doherty a6de04a297 fix(scripting): resolve Medium code-review finding (Core.Scripting-007)
In TimedScriptEvaluator.RunAsync, the catch (TimeoutException) block
now checks ct.IsCancellationRequested before throwing
ScriptTimeoutException, so a caller cancellation that races a timeout
deterministically surfaces as OperationCanceledException regardless of
which WaitAsync observes first. Regression test
Caller_cancellation_wins_even_when_timeout_fires_first added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:20 -04:00
Joseph Doherty 0cc3b23101 fix(alarm-historian): resolve Medium code-review finding (Core.AlarmHistorian-003)
EnqueueAsync used synchronous SQLite I/O (conn.Open / ExecuteNonQuery /
COUNT(*)) on the caller's thread, blocking the alarm-emitting thread under
write contention with the drain worker. The cancellationToken parameter was
silently ignored.

- EnqueueAsync converted to genuine async: OpenAsync / ExecuteNonQueryAsync /
  ExecuteScalarAsync used throughout; ct threaded to every await.
- ApplyPragmasAsync added alongside the existing ApplyPragmas helper so
  the WAL + busy_timeout PRAGMAs are applied on the async open path too.
- EnforceCapacityAsync added to handle capacity eviction on the async path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:14 -04:00
Joseph Doherty 2c571001ca fix(scripting): resolve Medium code-review finding (Core.Scripting-004)
DependencyExtractor.VisitInvocationExpression now additionally checks
that the member-access receiver is the identifier "ctx" before treating
a GetTag / SetVirtualTag call as a ScriptContext dependency. This
prevents spurious dependencies when a script defines a local helper type
with a matching method name and calls it as other.GetTag("X"). Test
Ignores_member_access_GetTag_on_non_ctx_receiver added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:12 -04:00
Joseph Doherty e390e1c067 fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-009)
The ConcurrentDictionary + TryAdd/dispose-loser pattern for Runtimes
and ParentRuntimes was already applied as part of the Driver.AbCip-008
fix. Recording resolution with evidence rather than applying a
duplicate change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:09 -04:00
Joseph Doherty 60366b72c6 fix(scripting): resolve Medium code-review finding (Core.Scripting-003)
Add System.Threading.Tasks to ForbiddenNamespacePrefixes so scripts
cannot use Task.Run / Parallel to spawn background work that outlives
the per-evaluation timeout. Document the unbounded-memory accepted
trade-off and the Task denial rationale in docs/VirtualTags.md (new
"Known resource limits" subsection) and cross-reference from
docs/ScriptedAlarms.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:03 -04:00
Joseph Doherty 02daacbfd0 fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-003)
Extract the string-vs-numeric value selection from raw and at-time read
loops into a SelectValue helper method. aahClientManaged's HistoryQueryResult
has no data-type field in the bound SDK version, so the heuristic (prefer
StringValue when non-empty and Value==0) is unavoidable; the helper now
documents the limitation explicitly in its XML doc so the known edge case
(numeric tag at exactly zero with a formatted StringValue) is self-evident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:00 -04:00
Joseph Doherty 37945deb0a fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-006)
`PlcTagHandle` and `DeviceState.TagHandles` were dead scaffolding: the
`ReleaseHandle` no-op never called `plc_tag_destroy` and the dict was
never populated. Removed the file, the dead dict, and its
`DisposeHandles` loop. Updated the `AbCipDriver` class doc to document
that native lifetime is owned by libplctag.NET `Tag.Dispose()` (invoked
from `DisposeHandles`) with the library's own finalizer covering any
GC-collected instances. Two test methods that only exercised the dead
`PlcTagHandle` class removed from `AbCipDriverTests`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:22:42 -04:00
Joseph Doherty c8a237e5e6 fix(driver-ablegacy): resolve Medium code-review finding (Driver.AbLegacy-002)
`current & widthMask` was already applied in `WriteBitInWordAsync` by
the -001 High finding fix, which fully neutralises the 16-bit
sign-extension hazard. No further code change required; mark Resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:22:12 -04:00
Joseph Doherty 205b07f6aa fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-002)
Normalise req.Events to Array.Empty<AlarmHistorianEventDto>() immediately
after MessagePack deserialization in HandleWriteAlarmEventsAsync. MessagePack
deserializes an absent or explicit-nil array field as null, not Array.Empty,
so a peer that sends a null Events array would trigger a NullReferenceException
on either .Length dereference (no-writer branch or catch block), leaving the
client correlation-id wait hanging with no reply frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:21:46 -04:00
Joseph Doherty 1679344ace fix(driver-historian-wonderware-client): resolve Medium code-review finding (Driver.Historian.Wonderware.Client-002)
Document explicitly that WriteBatchAsync never returns PermanentFail because
the WriteAlarmEventsReply wire contract carries only a bool-per-event (no
unrecoverable/transient distinction). Add a <remarks> XML block explaining
the structural limitation, why poison events retry rather than dead-letter,
and that a coordinated per-event status enum extension to the .NET 4.8
sidecar is a tracked follow-up. Add inline NOTE comments in both the
success and catch paths for discoverability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:21:11 -04:00
Joseph Doherty 5bcbda1685 fix(driver-historian-wonderware-client): resolve Medium code-review finding (Driver.Historian.Wonderware.Client-007)
Introduce DeserializeSampleValue() helper that enforces a 64 KiB per-sample
ValueBytes size cap before calling MessagePackSerializer.Deserialize<object>,
and documents that the default StandardResolver (primitive-only, no typeless
or dynamic-type resolution) is in use. Both ToSnapshots and AlignAtTimeSnapshots
route through the new helper. Add inline XML comments to the two NuGetAuditSuppress
entries in the csproj recording the advisory title, why each does not apply to
this module's primitive-only deserialization, and when to revisit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:20:23 -04:00
Joseph Doherty d5b8c802ce fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-005)
Structure tags with declared Members no longer register the bare parent
name in `_tagsByName` — reading it would return Good/null, which is
misleading. Clients read individual member paths. Both the member
fan-out and the scalar-tag paths now perform a duplicate-key check that
throws `InvalidOperationException` naming both colliding entries (fail-
fast, consistent with the AbCipHostAddress validation pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:20:22 -04:00
Joseph Doherty 1722c0328b fix(driver-abcip): resolve Medium code-review finding (Driver.AbCip-004)
`ToDriverDataType` mapped LInt/ULInt to Int32 (truncation) and UDInt
to Int32 (negative wrap for values > Int32.MaxValue). DriverDataType
already carries Int64/UInt64/UInt32, so map each Logix 64-bit and
unsigned-32-bit type to the correct member. `DecodeValueAt` in
`LibplctagTagRuntime` updated to return uint/ulong for UDInt/ULInt
so the runtime value type agrees with the declared OPC UA type.
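
The two failure modes named above (negative wrap and truncation) can be reproduced with a short Python sketch. The helper names are hypothetical, and little-endian byte order is an assumption made only for the illustration.

```python
import struct


def udint_read_as_int32(value):
    """Reinterpret an unsigned 32-bit value's bytes as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def lint_truncated_to_int32(value):
    """Keep only the low 4 bytes of a signed 64-bit value."""
    return struct.unpack("<i", struct.pack("<q", value)[:4])[0]


# A UDInt above Int32.MaxValue wraps negative when declared as Int32.
wrapped = udint_read_as_int32(3_000_000_000)

# An LInt loses its high 32 bits when declared as Int32.
truncated = lint_truncated_to_int32(2**40 + 5)
```

Here `wrapped` is -1294967296 (3,000,000,000 minus 2^32) and `truncated` is 5, which is why each type needs its own 64-bit or unsigned mapping.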

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:19:40 -04:00
Joseph Doherty 75580fb432 fix(driver-historian-wonderware-client): resolve Medium code-review finding (Driver.Historian.Wonderware.Client-005)
Replace the synchronous non-cancellable _stream.ReadByte() for the kind byte
in FrameReader.ReadFrameAsync with an async ReadExactAsync(new byte[1], ct)
call so the full frame read honours the EffectiveCallTimeout-linked token
and cannot wedge the call gate when the sidecar stalls mid-frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:19:14 -04:00
Joseph Doherty 6bb971c040 fix(driver-ablegacy-cli): resolve Medium code-review finding (Driver.AbLegacy.Cli-001)
WriteCommand.ParseValue wraps FormatException/OverflowException as
CliFx CommandException so a bad --value yields a clean one-line CLI error
naming the value and target type instead of a raw .NET stack trace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:15:19 -04:00
Joseph Doherty 29e656912e fix(driver-abcip-cli): resolve Medium code-review findings (Driver.AbCip.Cli-001, -002)
Driver.AbCip.Cli-001: WriteCommand.ParseValue wraps FormatException/
OverflowException as CommandException so bad --value input yields a clean
CLI error instead of a raw stack trace.
Driver.AbCip.Cli-002: probe/read/subscribe commands reject Structure types
up front (RejectStructure helper), matching the write guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:14:41 -04:00
Joseph Doherty e8edf123ff fix(driver-cli-common): resolve Medium code-review finding (Driver.Cli.Common-005)
Added missing test coverage identified in the -005 finding:

- FormatTable_with_empty_input_returns_header_only: verifies the -004 fix
  (empty batch read returns header+separator rather than throwing).
- FormatStatus_with_sub_code_bits_resolves_to_named_class: Theory exercising
  the -002 high-word mask path (e.g. 0x80050001 → "BadCommunicationError").
- FormatStatus_unknown_sub_code_falls_back_to_severity_class: Theory for the
  -002 severity-class fallback (unknown sub-codes still emit Good/Uncertain/Bad).
- New DriverCommandBaseTests class: four tests covering verbose/non-verbose
  Serilog level selection, ConfigureLogging idempotency, and FlushLogging.

Also corrected the stale FormatStatus_unknown_codes_fall_back_to_hex_only
expectation (0xDEADBEEF now resolves to "Bad" via the severity-class fallback
introduced by -002, not bare hex) and fixed the FormatTable empty-input crash
(guard rows.Length == 0 before calling Enumerable.Max).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:38:44 -04:00
Joseph Doherty 7ff356bddc fix(driver-cli-common): resolve Medium code-review finding (Driver.Cli.Common-003)
ConfigureLogging is now idempotent via a _loggingConfigured guard field so
repeated calls from subclasses do not abandon and leak the previous logger.
The previous Log.Logger is disposed before overwriting to release its
console-sink resources cleanly.

A new protected static FlushLogging() helper calls Log.CloseAndFlush() so
commands can guarantee buffered output is flushed in their finally blocks
before the process exits — important for the long-running subscribe verb.

XML doc updated to reflect call-once semantics and document FlushLogging().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:38:09 -04:00
Joseph Doherty 1433a1cf30 fix(driver-cli-common): resolve Medium code-review finding (Driver.Cli.Common-002)
FormatStatus now matches named codes against code & 0xFFFF0000 (high-word
mask) rather than exact equality, so status codes carrying sub-code or flag
bits in the low 16 bits (e.g. 0x80050001) still resolve to their named class.
For codes not in the named shortlist a severity-class fallback using the top
2 bits always emits Good / Uncertain / Bad rather than bare hex.
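
A rough Python sketch of the two-stage lookup described above. The `NAMED_CODES` shortlist, `format_status`, and the output format are assumptions; the numeric values are the standard OPC UA status codes, and treating the reserved severity 0b11 as Bad is assumed from the 0xDEADBEEF expectation mentioned elsewhere in this series.

```python
NAMED_CODES = {
    0x00000000: "Good",
    0x80050000: "BadCommunicationError",
    0x80340000: "BadNodeIdUnknown",
    0x803B0000: "BadNotWritable",
}

SEVERITY_CLASSES = ("Good", "Uncertain", "Bad", "Bad")  # indexed by top 2 bits


def format_status(code):
    """Name a status code, ignoring sub-code and flag bits in the low word."""
    name = NAMED_CODES.get(code & 0xFFFF0000)           # high-word match
    if name is None:
        name = SEVERITY_CLASSES[(code >> 30) & 0b11]    # severity-class fallback
    return f"0x{code:08X} ({name})"
```

With this shape, `format_status(0x80050001)` still resolves to BadCommunicationError, and an unlisted code such as 0xDEADBEEF falls back to Bad instead of bare hex.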

Updated the stale FormatStatus_unknown_codes_fall_back_to_hex_only test (its
expectation became invalid once the severity-class fallback was added) and
added new Theory cases exercising both the high-word matching and the
severity-class fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:37:47 -04:00
Joseph Doherty 3d8c285034 fix(virtual-tags): resolve Medium code-review findings (Core.VirtualTags-002, -003, -005, -008, -012)
Core.VirtualTags-002: cold-start guard publishes BadWaitingForInitialData
instead of silently returning a stale value.
Core.VirtualTags-003: Load detects duplicate Path values and keys the
upstream-subscription loop off the registered tag set.
Core.VirtualTags-005: VirtualTagSource fires the initial-data callback per
path before registering the change observer, fixing an ordering race.
Core.VirtualTags-008: DependencyGraph caches topological rank, lowering
per-change-event cost from O(V+E) to O(closure).
Core.VirtualTags-012: added 9 engine tests; CoerceResult null-return now
maps to BadInternalError as the code comment intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:31:49 -04:00
Joseph Doherty 11612900ba fix(core-abstractions): resolve Medium code-review findings (Core.Abstractions-001, -002, -003)
Core.Abstractions-001: PollGroupEngine compares array values with structural
equality so a driver returning a fresh T[] each poll no longer fires spuriously.
Core.Abstractions-002: PollOnceAsync guards reader result cardinality and
throws a descriptive InvalidOperationException on mismatch instead of a
swallowed ArgumentOutOfRangeException that stalled the subscription.
Core.Abstractions-003: the poll loop Task is tracked; Unsubscribe/DisposeAsync
await loop completion before disposing the CTS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:29:49 -04:00
Joseph Doherty 4dcfaace62 fix(scripted-alarms): update findings.md for resolved Medium findings
Mark Core.ScriptedAlarms-002, -004, -005, -007, -012 as Resolved with
one-line descriptions. Update open-findings count from 11 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:54 -04:00
Joseph Doherty 69994f9cf6 fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-012)
Add engine-level tests covering the six gaps identified in the finding:
(1) timed-shelve auto-expiry driven via injectable clock + RunShelvingCheckForTest
    hook so timer tests are deterministic;
(2) ConfirmAsync, TimedShelveAsync/UnshelveAsync round-trip, EnableAsync engine
    methods exercised end-to-end;
(3) OnEvent subscriber-throws isolation — engine state advances and stays
    operational after a subscriber throws;
(4) IAlarmStateStore.SaveAsync failure leaves in-memory state unchanged (locks in
    the persist-before-update invariant from finding-007);
(5) second LoadAsync does not leak the old timer (regression for finding-002);
(6) AreInputsReady cold-start guard correctly blocks on Bad/missing inputs and
    allows Uncertain-quality inputs through.

Expose RunShelvingCheckForTest() internal method on ScriptedAlarmEngine to
support deterministic timer tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:19 -04:00
Joseph Doherty ce86deca62 fix(core): resolve Medium code-review finding (Core-007)
SubscribeAsync now wraps each driver handle in a private HostBoundHandle
that carries the resolved host name.  UnsubscribeAsync unwraps it and
routes through the recorded host's resilience pipeline, so the circuit
breaker/bulkhead of the subscription's originating host is charged
instead of always the default host's.  Falls back to the default
host for handles not created by this invoker.  Two regression tests
added; findings.md Open count updated from 10 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:17 -04:00
Joseph Doherty 6cec98caef fix(core): resolve Medium code-review finding (Core-006)
BuildAddressSpaceAsync now checks _disposed (throws ObjectDisposedException)
and tears down the previous alarm forwarder + clears the sink registry
before re-walking, so a Galaxy-redeploy rebuild does not leak the old
forwarder and double-deliver alarm transitions.  Three regression tests
added: double-build does not double-fire, sink count is correct after
rebuild, and post-dispose call throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:24:08 -04:00
Joseph Doherty debe163f4d fix(core): resolve Medium code-review finding (Core-005)
Change ClusterEntry from sealed record to sealed class so TryUpdate
uses reference equality for the CAS comparison.  Prune now uses a
read-compute-TryUpdate retry loop that restarts when a concurrent
Install updates the entry between the read and the write, preventing
a race that could silently drop the just-installed newest generation.
Two regression tests added to PermissionTrieCacheTests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:23:52 -04:00
Joseph Doherty 09cd579220 fix(core): resolve Medium code-review finding (Core-003)
Add FolderSegment member to NodeAclScopeKind; update WalkSystemPlatform
to report NodeAclScopeKind.FolderSegment (not Equipment) for each
visited Galaxy folder level, so MatchedGrant.Scope in
AuthorizationDecision.Provenance correctly distinguishes Galaxy folder
grants from UNS Equipment grants in the audit trail and Admin UI
diagnostics.  Three regression tests added to PermissionTrieTests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:23:45 -04:00
Joseph Doherty a0b3a4c8a7 fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-007)
Reorder persist/update in ApplyAsync, ReevaluateAsync, and ShelvingCheckAsync:
SaveAsync is now called before the in-memory _alarms entry is advanced. A store
failure therefore leaves both the persisted and in-memory views at the prior state
rather than diverging, maintaining the invariant that startup recovery reflects
actual persisted state.
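
The persist-before-update ordering can be sketched as below. This is a Python illustration only: `AlarmEngine`, `FlakyStore`, and their methods are hypothetical stand-ins, not the engine's actual types.

```python
class FlakyStore:
    """Hypothetical store that can be switched into a failing mode."""

    def __init__(self):
        self.saved = {}
        self.fail = False

    def save(self, alarm_id, state):
        if self.fail:
            raise IOError("store unavailable")
        self.saved[alarm_id] = state


class AlarmEngine:
    def __init__(self, store):
        self._store = store
        self._alarms = {}

    def apply(self, alarm_id, new_state):
        # Persist first: if save raises, the in-memory entry is never advanced,
        # so both views stay at the prior state.
        self._store.save(alarm_id, new_state)
        self._alarms[alarm_id] = new_state

    def state_of(self, alarm_id):
        return self._alarms.get(alarm_id)
```

Reversing the two lines in `apply` would let memory run ahead of the store after a failed save, which is the divergence the reorder removes.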

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:18:42 -04:00
Joseph Doherty cdaa0da45d fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-005)
Add _disposed re-checks inside ReevaluateAsync and ShelvingCheckAsync after
acquiring _evalGate so callbacks in flight when Dispose() runs bail out cleanly
instead of mutating _alarms or writing to a disposed store. Drop the
_alarms.Clear() from Dispose() — clearing outside the gate races concurrent
reads and is unnecessary since the object is being discarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:17:50 -04:00
Joseph Doherty 72b5f7a20c fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-004)
Split the LoadAsync seed-read + subscribe loop: ReadTag seed fills _valueCache
first, then persisted-state restore runs, then _loaded = true, then SubscribeTag
is called. Any synchronous initial push from the upstream now arrives after
_alarms is fully initialised and _loaded = true, so ReevaluateAsync will queue
correctly behind the gate rather than racing the half-built state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:16:44 -04:00
Joseph Doherty b75542bbac fix(scripted-alarms): resolve Medium code-review finding (Core.ScriptedAlarms-002)
Dispose any existing _shelvingTimer before reassigning it inside LoadAsync so
that a second LoadAsync call does not leak the old timer and leave two timers
running concurrently against the same engine state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:15:58 -04:00
Joseph Doherty c126fc7a7d fix(configuration): resolve Medium code-review findings (Configuration-002, -003, -006, -009)
Configuration-002: sp_PublishGeneration is transaction-nesting aware
(BEGIN TRANSACTION vs SAVE TRANSACTION on @@TRANCOUNT) so a caller's outer
transaction survives a publish failure; sp_ValidateDraft wrapped in TRY/CATCH.
Configuration-003: ValidatePathLength uses the cluster's actual Enterprise/Site
lengths when available, falling back to the conservative approximation.
Configuration-006: ResilientConfigReader treats a command-timeout
TaskCanceledException as a fault (not caller cancellation) and falls back.
Configuration-009: removed the checked-in plaintext sa connection string;
CreateDbContext now requires OTOPCUA_CONFIG_CONNECTION.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:13:27 -04:00
Joseph Doherty 7e54e1e4a0 fix(client-shared): resolve Medium code-review findings (Client.Shared-001, -002, -007, -008)
Client.Shared-001: lowered the OnAlarmEventNotification early-return guard
from <6 to <1; per-index field guards already default missing fields safely.
Client.Shared-002: GetRedundancyInfoAsync replaces unguarded unboxing casts
with StatusCode.IsGood + Convert.ToInt32/ToByte, defaulting on bad reads.
Client.Shared-007: alarm fallback Task.Run guards on ReferenceEquals(session,
_session) and drops stale alarms on ObjectDisposedException after failover.
Client.Shared-008: WriteValueAsync rejects type inference from bad/null reads;
ValueConverter wraps parse failures in a descriptive FormatException.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:11:23 -04:00
Joseph Doherty aa142f6dd4 fix(client-cli): resolve Medium code-review findings (Client.CLI-001, Client.CLI-005)
Client.CLI-001: parse --start/--end with CultureInfo.InvariantCulture and
DateTimeStyles.AssumeUniversal|AdjustToUniversal so dates are culture-stable.
Client.CLI-005: SDK notification callbacks now hand off to an unbounded
channel drained on the main thread; handlers are unsubscribed before the
summary phase so no notification interleaves with console output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:08:25 -04:00
Joseph Doherty 9f5a5c9997 fix(analyzers): resolve Medium code-review findings (Analyzers-001, Analyzers-006)
Analyzers-001: IsInsideWrapperLambda now matches the wrapper method name
(ExecuteAsync/ExecuteWriteAsync) in addition to the containing type, so a
future non-callSite lambda overload cannot suppress the diagnostic.
Analyzers-006: extended StubSources and added coverage for the remaining
guarded interfaces, synchronous members, concrete-driver receivers,
ExecuteWriteAsync wrapping, and nested lambdas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 08:08:09 -04:00
Joseph Doherty a0aa4a4819 fix(admin): complete Admin-006 — inject IAntiforgery into LogoutAsync for explicit token validation
The previous Admin-006 commit added <AntiforgeryToken /> to the logout form
and updated the comment on the endpoint, but did not update LogoutAsync to
actually call IAntiforgery.ValidateRequestAsync. Blazor's UseAntiforgery()
middleware does not automatically validate minimal-API endpoints, so a
tokenless POST still succeeded. This commit injects IAntiforgery into the
handler, wraps ValidateRequestAsync in a try/catch, and returns 400 on
AntiforgeryValidationException. The endpoint keeps .DisableAntiforgery() to
prevent the middleware from also trying to read the body (which would cause
a double-read). The regression test is updated to log in first (to get an
authenticated session) before asserting 400 on a tokenless logout POST.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 07:51:11 -04:00
Joseph Doherty 1db8736515 fix(admin): update open-findings count in Admin findings.md
Admin-006 through Admin-009 (all Medium) resolved; 3 Low findings remain open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:33:14 -04:00
Joseph Doherty b585429447 fix(admin): resolve Medium code-review finding (Admin-009)
Add AdminAuthPipelineTests (WebApplicationFactory + RoleInjectingHandler) to
enforce that ConfigViewer is denied CanPublish-gated pages while FleetAdmin is
permitted, and that an authenticated FleetAdmin session can reach the homepage.
Existing PageAuthorizationTests (anon page rejection) and AuthEndpointsTests
(login cookie + hub auth) cover cases (a)-(c); this file adds case (d).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:33:03 -04:00
Joseph Doherty 328ab1e614 fix(admin): resolve Medium code-review finding (Admin-008)
Add @ReleasedBy parameter to sp_ReleaseExternalIdReservation via a new EF
migration so the operator principal (not the shared SQL account) is recorded
in ExternalIdReservation.ReleasedBy and ConfigAuditLog.Principal.
ReservationService.ReleaseAsync gains a releasedBy parameter; Reservations.razor
resolves the signed-in user from AuthenticationState and passes it through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:29:54 -04:00
Joseph Doherty 71f91aa57c fix(client-ui): update findings.md — mark Client.UI-001/002/005/007/008 Resolved
Update status and resolution text for the five Medium findings resolved
in this batch; lower the Open findings count from 11 to 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:29:10 -04:00
Joseph Doherty c7f8a00635 fix(client-ui): resolve Medium code-review finding (Client.UI-008)
Implement IDisposable on MainWindowViewModel to detach ConnectionStateChanged,
call Teardown() on the subscription/alarm VMs, and dispose _service so the OPC UA
session and SDK resources are released. Call Dispose() from MainWindow.OnClosing
alongside the existing SaveSettings() call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:28:48 -04:00
Joseph Doherty bdc1f96b5b fix(client-ui): resolve Medium code-review finding (Client.UI-007)
Remove Password from UserSettings and stop writing it to settings.json;
the operator is re-prompted on each launch. Update LoadSettings/SaveSettings
comments and adjust the affected test assertion to verify the password is
not restored from the persisted model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:28:12 -04:00
Joseph Doherty 08f000069c fix(admin): resolve Medium code-review finding (Admin-007)
NewCluster.razor and ClusterDetail.razor now resolve ClaimTypes.Name /
NameIdentifier from the cascaded AuthenticationState instead of hardcoding
"admin-ui" as the createdBy audit field. The operator principal is now
attributed correctly on every cluster-create and draft-create write path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:27:40 -04:00
Joseph Doherty a9cede8ed4 fix(client-ui): resolve Medium code-review finding (Client.UI-005)
Call Subscriptions?.Teardown() and Alarms?.Teardown() in the Disconnected
branch of OnConnectionStateChanged so server-side session drops also
quiesce the DataChanged and AlarmEvent handlers. Add Reattach() methods
that idempotently re-hook the handlers; call them from the Connected
branch so reconnects after a server-side drop restore live updates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:27:03 -04:00
Joseph Doherty af454c6af6 fix(admin): resolve Medium code-review finding (Admin-006)
Emit <AntiforgeryToken /> in the MainLayout sign-out form and remove
.DisableAntiforgery() from the /auth/logout endpoint so UseAntiforgery()
validates the token. A tokenless POST now returns 400, preventing CSRF-logout.
Regression-guarded by AuthEndpointsTests.Logout_without_antiforgery_token_is_rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:26:34 -04:00
Joseph Doherty 55c2a5a209 fix(client-ui): resolve Medium code-review finding (Client.UI-002)
Guard the two nullable child VM dereferences (BrowseTree at ConnectAsync
and History at ViewHistoryForSelectedNode) with != null checks, matching
the guarding style already used for Subscriptions and Alarms nearby.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:25:47 -04:00
Joseph Doherty 2816c76c2b fix(client-ui): resolve Medium code-review finding (Client.UI-001)
Route the synchronous IsLoading = true write through _dispatcher.Post so
both IsLoading assignments use the same dispatch path as Results.Clear()
and the final IsLoading = false, eliminating the ordering hazard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 07:25:18 -04:00
172 changed files with 7014 additions and 1203 deletions
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 3 |
## Checklist coverage
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Security |
| Location | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` |
| Status | Open |
| Status | Resolved |
**Description:** `app.UseAntiforgery()` is enabled, but the Sign-out form (`<form method="post" action="/auth/logout">`) renders no antiforgery token, and the `MapPost("/auth/logout", ...)` endpoint does not call `.DisableAntiforgery()` or otherwise opt out. Depending on framework version this either makes logout fail with a 400 for legitimate users, or — if the endpoint is treated as exempt — leaves logout as an unprotected state-changing POST (CSRF logout). The same concern applies to the login form once Admin-005 is addressed.
**Recommendation:** Emit an antiforgery token in the logout form and let `UseAntiforgery()` validate it; or explicitly and deliberately mark the endpoint `.DisableAntiforgery()` if a tokenless logout is intended. Verify login/logout round-trips after the change.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `<AntiforgeryToken />` added to the sign-out form in `MainLayout.razor` and `.DisableAntiforgery()` removed from the `/auth/logout` endpoint so `UseAntiforgery()` validates the token; a tokenless POST now returns 400, preventing CSRF-logout. The login endpoint retains `.DisableAntiforgery()` (login is not a state-changing operation CSRF can abuse). `AuthEndpointsTests.Logout_without_antiforgery_token_is_rejected` regression-guards this.
### Admin-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Design-document adherence |
| Location | `Components/Pages/Clusters/NewCluster.razor:91,95-96` |
| Status | Open |
| Status | Resolved |
**Description:** `NewCluster.CreateAsync` hardcodes `CreatedBy = "admin-ui"` (both on the `ServerCluster` row and the draft generation) instead of the signed-in operator principal name. `admin-ui.md` section "Audit" requires "the operator principal" be recorded on every write. The audit trail therefore cannot attribute cluster creation to a person. The same literal would apply to any anonymous creation that Admin-001/002 currently permit.
**Recommendation:** Pass the authenticated user identity (`ClaimTypes.Name` / `NameIdentifier` from the cascaded `AuthenticationState`) as `createdBy`. Apply the same pattern to every other Admin write path that records a `CreatedBy`/`PublishedBy`/`ReleasedBy` field.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `NewCluster.razor` and `ClusterDetail.razor` (the two pages that call `ClusterService.CreateAsync` / `GenerationService.CreateDraftAsync` with a hardcoded literal) now resolve `ClaimTypes.Name` / `ClaimTypes.NameIdentifier` from the cascaded `AuthenticationState` and pass the operator principal name as `createdBy`; the fallback is `"unknown"` (defensive, should never occur on an `[Authorize]`-gated page).
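The principal lookup described in this resolution could be sketched as below. This is a minimal illustration, not the repository's code: the `OperatorPrincipal` class and `Resolve` method are hypothetical names, and trying `ClaimTypes.Name` before `ClaimTypes.NameIdentifier` is an assumption about the order.

```csharp
using System.Security.Claims;

// Hypothetical helper; the class and method names are illustrative only.
public static class OperatorPrincipal
{
    // Returns the operator name to record in a createdBy audit field.
    // Assumption: Name is preferred and NameIdentifier is the second choice.
    public static string Resolve(ClaimsPrincipal user)
    {
        var name = user.FindFirst(ClaimTypes.Name)?.Value;
        if (!string.IsNullOrWhiteSpace(name))
        {
            return name;
        }

        var id = user.FindFirst(ClaimTypes.NameIdentifier)?.Value;

        // Defensive fallback; an [Authorize]-gated page should always carry a claim.
        return string.IsNullOrWhiteSpace(id) ? "unknown" : id;
    }
}
```

In a Razor component the `ClaimsPrincipal` would typically come from the cascaded `AuthenticationState` (its `User` property) before being passed to a helper like this.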
### Admin-008
@@ -138,13 +138,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Services/ReservationService.cs:28-37` |
| Status | Open |
| Status | Resolved |
**Description:** `ReservationService.ReleaseAsync` calls `sp_ReleaseExternalIdReservation` with only `@Kind`, `@Value`, `@ReleaseReason`. `admin-ui.md` section "Release an external-ID reservation" specifies the proc sets `ReleasedBy` to the FleetAdmin who performed the release, and the action is the only path that allows ZTag/SAPID reuse and "requires explicit FleetAdmin action with a documented reason." The service does not capture or pass the operator principal, so the compliance audit trail for a release records no actor (unless the proc derives it from the DB session login, which would be the shared service account, not the operator).
**Recommendation:** Add an operator-principal parameter to `ReleaseAsync`, pass it to the stored proc as `@ReleasedBy`, and have callers supply the signed-in user. Confirm the proc signature accepts it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — a new EF migration (`20260522000001_AddReleasedByToReleaseExternalIdReservation`) adds `@ReleasedBy nvarchar(128)` to `sp_ReleaseExternalIdReservation` and uses it for both `ExternalIdReservation.ReleasedBy` and `ConfigAuditLog.Principal` (replacing `SUSER_SNAME()`); `ReservationService.ReleaseAsync` gains a `releasedBy` parameter with a guard; `Reservations.razor` resolves `ClaimTypes.Name` / `ClaimTypes.NameIdentifier` from the cascaded `AuthenticationState` and passes the operator principal to the service.
### Admin-009
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) |
| Status | Open |
| Status | Resolved |
**Description:** The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of the login -> cookie issuance round-trip (Admin-005), and the `AdminRoleGrantResolver` / `ClusterRoleClaims` authorization logic is exercised only in isolation. `InternalsVisibleTo` points at `ZB.MOM.WW.OtOpcUa.Admin.Tests`, but the auth pipeline itself is not asserted end-to-end. Per `REVIEW-PROCESS.md` category 9 these are untested critical paths.
**Recommendation:** Add `WebApplicationFactory`-based integration tests asserting: (a) anonymous GET of each protected route returns 302->/login or 401; (b) anonymous hub connect is refused; (c) a valid login issues the cookie and a subsequent request is authorized; (d) a `ConfigViewer` is denied `CanPublish` pages. Wire the check into the `*.Admin.Tests` suite.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — (a) covered by existing `PageAuthorizationTests`; (b) covered by existing `AuthEndpointsTests.Anonymous_hub_negotiate_is_rejected`; (c) covered by existing `AuthEndpointsTests.Valid_login_issues_the_auth_cookie_and_redirects_home`; (d) new `AdminAuthPipelineTests` adds a `WebApplicationFactory` with a `RoleInjectingHandler` that stamps requests with caller-supplied roles, asserting that `ConfigViewer` is denied `CanPublish`-gated pages (403/302) while `FleetAdmin` is permitted, and that a `FleetAdmin` session can reach protected pages.
### Admin-010
+5 -5
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 5 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` |
| Status | Open |
| Status | Resolved |
**Description:** `IsInsideWrapperLambda` treats a guarded call as "wrapped" if it is textually inside ANY lambda that is an argument to ANY invocation whose containing type is `CapabilityInvoker` or `AlarmSurfaceInvoker`. It matches the containing type only, never the parameter the lambda is bound to. The real wrapping contract is specifically the `callSite` (`Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>`) parameter of `CapabilityInvoker.ExecuteAsync` / `ExecuteWriteAsync`. Any other lambda argument to a method on those types — a future overload that takes a predicate/selector lambda, or a lambda passed in a non-`callSite` position — would suppress the diagnostic even though the guarded call is not actually executed inside the resilience pipeline. The analyzer's own XML doc (lines 21-23) describes exactly this looser-than-intended behaviour. It is a latent false-negative gap rather than an active bug because the current `CapabilityInvoker` surface has no non-`callSite` lambda parameter.
**Recommendation:** Resolve the symbol of the lambda argument's parameter (`IMethodSymbol.Parameters[i]`) and require its type to be the `Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>` callsite shape, or at minimum match the wrapper method name (`ExecuteAsync` / `ExecuteWriteAsync`) rather than only the containing type. This closes the gap before a new overload silently widens the escape hatch.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Replaced `WrapperTypes` string array with `WrapperMethods` (type FQN + method name) tuples so `IsInsideWrapperLambda` matches both containing type and method name, preventing future non-`callSite` overloads from silently suppressing the diagnostic.
### Analyzers-002
@@ -108,7 +108,7 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The test suite exercises only 3 of the 7 guarded interfaces (`IReadable`, `IWritable`, `ITagDiscovery`) and one positive / one negative lambda case. Significant untested behaviour for an analyzer that gates a repo-wide resilience invariant:
@@ -121,7 +121,7 @@
**Recommendation:** Extend `StubSources` with the remaining guarded interfaces and `AlarmSurfaceInvoker`, then add tests for: each remaining guarded interface (positive plus wrapped), a synchronous member not being flagged, a concrete driver-class receiver with a renamed implementing method, `ExecuteWriteAsync` wrapping, and a nested-lambda case.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Extended `StubSources` with `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IHostConnectivityProbe`, and `AlarmSurfaceInvoker` stubs; added 14 new tests covering each missing guarded interface (positive + wrapped), synchronous member not flagged, concrete driver receiver, `ExecuteWriteAsync` wrapping, and nested-lambda cases (19 tests total, all passing).
### Analyzers-007
+5 -5
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 8 |
## Checklist coverage
@@ -36,7 +36,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` |
| Status | Open |
| Status | Resolved |
**Description:** The start and end options are parsed with `DateTime.Parse(StartTime)` with
no `IFormatProvider` or `DateTimeStyles`. Parsing therefore depends on the current OS
@@ -53,7 +53,7 @@ ranges on machines in different time zones.
ISO 8601 via `DateTimeOffset.Parse`), and document the expected format and timezone
assumption precisely.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DateTime.Parse` replaced with `CultureInfo.InvariantCulture` + `DateTimeStyles.AssumeUniversal | AdjustToUniversal`; option descriptions updated to document ISO 8601 UTC format.
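For illustration, culture-stable parsing with the flags named above might look like the following sketch. `UtcOptionParser` and `ParseUtc` are hypothetical names; only the `DateTime.Parse` overload, `CultureInfo.InvariantCulture`, and the two `DateTimeStyles` flags come from the resolution text.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not taken from HistoryReadCommand.
public static class UtcOptionParser
{
    // Parses a timestamp independently of the OS culture.
    // AssumeUniversal: input without an offset is treated as UTC.
    // AdjustToUniversal: input with an offset is converted to UTC.
    public static DateTime ParseUtc(string text) =>
        DateTime.Parse(
            text,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
}
```

With both flags set the returned value has `DateTimeKind.Utc`, so `2026-05-22T08:00:00` and `2026-05-22T10:00:00+02:00` describe the same instant regardless of the machine's time zone.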
### Client.CLI-002
@@ -130,7 +130,7 @@ each of its option properties, matching the style used by the sibling commands.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` |
| Status | Open |
| Status | Resolved |
**Description:** The `DataChanged` and `AlarmEvent` handlers write to `console.Output`
(a `System.IO.TextWriter`) directly from the OPC UA SDK subscription/notification thread,
@@ -147,7 +147,7 @@ through a `Channel<T>` drained by the main thread, or guard every `console.Outpu
with a shared lock. At minimum, ensure handler exceptions cannot escape into the SDK
callback.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — notification handlers in `SubscribeCommand` and `AlarmsCommand` now enqueue lines to an `UnboundedChannel<string>` via `TryWrite`; the main thread drains the channel via `ReadAllAsync`. Handlers are named local functions so they can be unsubscribed before the summary phase; all handler exceptions are swallowed to protect the SDK callback.
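The hand-off pattern in this resolution can be sketched roughly as follows. Everything here is a stand-in: `NotificationPump`, `RunAsync`, and the simulated callbacks are hypothetical, and the real commands write to the console abstraction rather than collecting lines in a list.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch of callback-to-main-thread hand-off via a channel.
public static class NotificationPump
{
    public static async Task<List<string>> RunAsync(int callbackCount)
    {
        var channel = Channel.CreateUnbounded<string>(
            new UnboundedChannelOptions { SingleReader = true });

        // Stand-in for an SDK notification handler: it only enqueues,
        // and no exception is allowed to escape into the caller.
        void OnNotification(string line)
        {
            try { channel.Writer.TryWrite(line); }
            catch { /* swallowed to protect the callback thread */ }
        }

        // Simulate callbacks arriving on thread-pool threads.
        var producers = new List<Task>();
        for (var i = 0; i < callbackCount; i++)
        {
            var n = i;
            producers.Add(Task.Run(() => OnNotification($"value {n}")));
        }
        await Task.WhenAll(producers);

        // Equivalent of unsubscribing before the summary phase:
        // no further lines can be queued after this point.
        channel.Writer.Complete();

        // Single consumer drains the queue, so output lines never interleave.
        var printed = new List<string>();
        await foreach (var line in channel.Reader.ReadAllAsync())
        {
            printed.Add(line);
        }
        return printed;
    }
}
```

Because only one reader touches the output, no lock around the writer is needed. The trade-off of an unbounded channel is that a sustained burst grows the queue without back-pressure.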
### Client.CLI-006
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |
| Open findings | 5 |
## Checklist coverage
@@ -33,13 +33,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OpcUaClientService.cs:552` |
| Status | Open |
| Status | Resolved |
**Description:** `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with `fields.Count > N`. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the `< 6` early return silently drop the entire notification, including the EventId/SourceName/Severity that are present.
**Recommendation:** Drop the `< 6` early return (or lower it to `< 1`) and rely on the existing per-index `fields.Count > N` guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — lowered the early-return threshold to `< 1` (null or empty guard only); per-index `fields.Count > N` guards already default missing fields safely for all higher indices.
### Client.Shared-002
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` |
| Status | Open |
| Status | Resolved |
**Description:** `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `RedundancySupport` and `ServiceLevel` reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. `ServiceLevel` boxed as `int` instead of `byte`), or returns a null `Value` on a `Bad` DataValue, the cast throws `InvalidCastException`/`NullReferenceException` and the whole call fails instead of returning a sensible default.
**Recommendation:** Wrap the `RedundancySupport` and `ServiceLevel` reads in the same defensive pattern used for the array reads, using `Convert.ToInt32`/`Convert.ToByte` on the boxed value and falling back to `None`/`0` when the read status is bad or the value is null.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced direct casts with `StatusCode.IsGood` guard + `Convert.ToInt32`/`Convert.ToByte` coercion; falls back to `None`/`0` when status is bad or value is null.
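A minimal sketch of why `Convert` is safer than an unboxing cast here. `RedundancyReads` and its parameters are hypothetical: `statusIsGood` and `boxed` stand in for the SDK's status check and `DataValue.Value`, which are not reproduced.

```csharp
using System;

// Hypothetical stand-in for the guarded ServiceLevel read.
public static class RedundancyReads
{
    // Returns 0 when the read is bad or the value is null; otherwise coerces
    // whatever numeric type the server boxed into a byte.
    public static byte ReadServiceLevel(bool statusIsGood, object boxed)
    {
        if (!statusIsGood || boxed == null)
        {
            return 0;
        }

        // Convert.ToByte accepts a boxed int, short, byte, etc.
        // A direct (byte)boxed cast throws InvalidCastException unless the
        // boxed value is exactly a byte.
        return Convert.ToByte(boxed);
    }
}
```

Note that `Convert.ToByte` still throws `OverflowException` for values outside 0-255; whether the real method guards against that is not stated in the finding.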
### Client.Shared-003
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientService.cs:581-622` |
| Status | Open |
| Status | Resolved |
**Description:** In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental `ReadValueAsync` calls may run against a session being disposed, throwing `ObjectDisposedException` — caught by the bare `catch`, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of `Task.Run` continuations each doing 3-4 server round-trips.
**Recommendation:** Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential `ReadValueAsync` calls into one `Read` request.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added a `ReferenceEquals(session, _session)` guard at the top of the `Task.Run` body to skip reads if the session was replaced by failover; separated `ObjectDisposedException` from the general catch to drop rather than deliver the stale alarm.
### Client.Shared-008
@@ -138,13 +138,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Value`, `ValueConverter.ConvertValue` falls through to the `_ => rawValue` default and writes a raw `string` into, for example, an `Int32` node — the server then rejects it with `BadTypeMismatch`, surfacing as a confusing failure unrelated to the operator's input. Separately, `ConvertValue` uses `bool.Parse`, which accepts only `true`/`false` — operator input of `1`/`0` throws `FormatException` that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.
**Recommendation:** Inspect `currentDataValue.StatusCode` before trusting `Value`; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept `1`/`0`/`yes`/`no`, and wrap parse failures in a descriptive exception naming the node and target type.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `WriteValueAsync` now checks `StatusCode.IsGood` and non-null `Value` before calling `ConvertValue`, throwing a descriptive `InvalidOperationException` on bad reads; `ValueConverter` now uses a `ParseBool` helper accepting `true/false/1/0/yes/no` (case-insensitive) and wraps all parse/overflow failures in a `FormatException` with the target type and input value in the message.
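A lenient boolean parser of the kind described could look like this sketch. The `BoolInput` class is a hypothetical name, the exact message format is an assumption, and trimming whitespace is an addition of this sketch rather than something the resolution states.

```csharp
using System;

// Hypothetical sketch of a ParseBool helper; not the repository's implementation.
public static class BoolInput
{
    // Accepts true/false/1/0/yes/no, case-insensitively.
    public static bool ParseBool(string raw)
    {
        // Trimming is an assumption of this sketch.
        switch (raw.Trim().ToLowerInvariant())
        {
            case "true":
            case "1":
            case "yes":
                return true;
            case "false":
            case "0":
            case "no":
                return false;
            default:
                // Name the target type and the offending input in the message.
                throw new FormatException(
                    $"Cannot convert '{raw}' to Boolean; expected true/false, 1/0, or yes/no.");
        }
    }
}
```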
### Client.Shared-009
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 6 |
## Checklist coverage
@@ -36,7 +36,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` |
| Status | Open |
| Status | Resolved |
**Description:** `ReadHistoryAsync` runs as a `RelayCommand` body, which is invoked
on the UI thread, so the bare `IsLoading = true` at line 76 happens to land on the
@@ -53,7 +53,7 @@ is ever invoked off the UI thread (a future caller or test harness).
for consistency with the rest of the method, or set both `IsLoading` writes
synchronously and only dispatch the `ObservableCollection` mutations.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Routed the `IsLoading = true` write through `_dispatcher.Post` to make both `IsLoading` assignments consistent with all other UI state mutations in the method.
### Client.UI-002
@@ -62,7 +62,7 @@ synchronously and only dispatch the `ObservableCollection` mutations.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` |
| Status | Open |
| Status | Resolved |
**Description:** `ConnectAsync` calls `await BrowseTree.LoadRootsAsync()` and
`ViewHistoryForSelectedNode` calls `History.SelectedNodeId = ...` by dereferencing
@@ -80,7 +80,7 @@ makes `InitializeService()` conditionally skip a VM would introduce a silent cra
uniformly, or have `InitializeService()` return non-null references the caller uses
directly so the compiler can prove non-nullness.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Added `if (BrowseTree != null)` and `if (History != null)` guards at both dereference sites to match the guarding style already used for `Subscriptions` and `Alarms`.
### Client.UI-003
@@ -134,7 +134,7 @@ Avalonia major version.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` |
| Status | Open |
| Status | Resolved |
**Description:** `SubscriptionsViewModel` and `AlarmsViewModel` attach handlers to
the long-lived `_service` events (`DataChanged`, `AlarmEvent`) in their
@@ -156,7 +156,7 @@ latent correctness hazard.
handlers) from the `Disconnected` branch of the `OnConnectionStateChanged` partial
method, not only from `DisconnectAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Added `Teardown()` calls to the `Disconnected` branch and added `Reattach()` methods (idempotent remove+add) called from the `Connected` branch to restore handlers after a server-side drop + reconnect.
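The "idempotent remove+add" idea can be illustrated with a small sketch. `FakeService` and `SubscriptionsSketch` are hypothetical stand-ins for `_service` and the subscription view model; only the `Teardown()`/`Reattach()` names and the `DataChanged` event come from the finding.

```csharp
using System;

// Hypothetical stand-in for the long-lived client service.
public sealed class FakeService
{
    public event Action<string> DataChanged;

    public void Raise(string value) => DataChanged?.Invoke(value);
}

// Hypothetical stand-in for a view model that listens to the service.
public sealed class SubscriptionsSketch
{
    private readonly FakeService _service;

    public int Received { get; private set; }

    public SubscriptionsSketch(FakeService service)
    {
        _service = service;
        Reattach();
    }

    // Called on disconnect: stop reacting to notifications.
    public void Teardown() => _service.DataChanged -= OnDataChanged;

    // Called on connect: removing first makes repeated calls harmless,
    // because removing a handler that is not attached is a no-op.
    public void Reattach()
    {
        _service.DataChanged -= OnDataChanged;
        _service.DataChanged += OnDataChanged;
    }

    private void OnDataChanged(string value) => Received++;
}
```

Without the leading `-=`, each reconnect would add another copy of the handler and every notification would be processed multiple times.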
### Client.UI-006
@@ -189,7 +189,7 @@ message or write the exception to a log. Distinguish "feature not supported"
| Severity | Medium |
| Category | Security |
| Location | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` |
| Status | Open |
| Status | Resolved |
**Description:** The OPC UA `UserName`-token password is persisted in cleartext.
`UserSettings.Password` is a plain `string`, `JsonSettingsService.Save` serializes
@@ -205,7 +205,7 @@ the persisted model entirely (re-prompt each launch); encrypt it at rest with
or store only a non-reversible reference. At minimum, document the cleartext
storage as a known limitation.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Removed `Password` from `UserSettings` and stopped writing/reading it in `SaveSettings`/`LoadSettings`; the operator is re-prompted each launch.
### Client.UI-008
@@ -214,7 +214,7 @@ storage as a known limitation.
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` |
| Status | Open |
| Status | Resolved |
**Description:** `IOpcUaClientService` is declared `IDisposable`
(`IOpcUaClientService.cs:10`), and the concrete service owns an OPC UA session plus
@@ -230,7 +230,7 @@ any background reconnect timers are leaked until process exit. The
`ConnectionStateChanged` handler, and dispose `_service` from `MainWindow.OnClosing`
(alongside the existing `SaveSettings()` call).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Added `IDisposable` to `MainWindowViewModel` with a `Dispose()` that detaches `ConnectionStateChanged`, calls `Teardown()` on child VMs, and calls `_service.Dispose()`; wired `Dispose()` into `MainWindow.OnClosing`.
### Client.UI-009
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |
| Open findings | 5 |
## Checklist coverage
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
| Status | Open |
| Status | Resolved |
**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — made `sp_PublishGeneration` transaction-nesting aware: captures `@@TRANCOUNT` on entry, issues `BEGIN TRANSACTION` when zero or `SAVE TRANSACTION sp_PublishGeneration` when nested, and uses `ROLLBACK TRANSACTION sp_PublishGeneration` (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END CATCH` so validation errors propagate correctly.
### Configuration-003
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
| Status | Open |
| Status | Resolved |
**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added nullable `Enterprise` and `Site` properties to `DraftSnapshot`; `ValidatePathLength` uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; `DraftValidationService` now loads the cluster row and populates both properties; added `PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` unit tests.
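A rough sketch of the actual-length-with-fallback computation, under stated assumptions: the five-segment path shape, the four separators, and the `PathLengthSketch` name are illustrative guesses rather than the real `ValidatePathLength` formula. Only the 200-character limit and the 32-character per-segment fallback come from the finding.

```csharp
using System;

public static class PathLengthSketch
{
    public const int MaxPathLength = 200;         // limit quoted in the finding
    public const int FallbackSegmentLength = 32;  // conservative bound when the cluster row is unavailable

    // Assumed path shape: Enterprise/Site/Area/Line/Equipment (four separators).
    public static int Compute(string? enterprise, string? site,
                              string area, string line, string equipment)
    {
        // Use the real length when known, otherwise charge the upper bound.
        int enterpriseLength = enterprise?.Length ?? FallbackSegmentLength;
        int siteLength = site?.Length ?? FallbackSegmentLength;
        const int separators = 4;

        return enterpriseLength + siteLength
             + area.Length + line.Length + equipment.Length
             + separators;
    }

    public static bool IsTooLong(string? enterprise, string? site,
                                 string area, string line, string equipment)
        => Compute(enterprise, site, area, line, equipment) > MaxPathLength;
}
```

With short real values the check no longer charges the unused 60 characters, while a missing cluster row still yields the conservative over-estimate.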
### Configuration-004
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
| Status | Open |
| Status | Resolved |
**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — changed the fallback `catch` filter to `ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested` so a command-timeout `TaskCanceledException` (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry `ShouldHandle` predicate to `Handle<Exception>()` (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: `CommandTimeout_TaskCanceledException_FallsBackToCache`, `PollyTimeout_TimeoutRejectedException_FallsBackToCache`, and `CallerCancellation_Propagates_NotFallback`.
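The exception filter quoted above can be exercised in isolation. The sketch below reproduces only the filter; `FallbackSketch`, `readCentral`, and `readCache` are hypothetical names, and the Polly retry pipeline is left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FallbackSketch
{
    public static async Task<T> ReadWithFallbackAsync<T>(
        Func<CancellationToken, Task<T>> readCentral,
        Func<T> readCache,
        CancellationToken cancellationToken)
    {
        try
        {
            return await readCentral(cancellationToken);
        }
        // Fall back unless this is an OperationCanceledException AND the caller's
        // token is actually cancelled. A command timeout surfaces as a
        // TaskCanceledException while the caller's token is still live, so it
        // matches the filter and takes the cache path.
        catch (Exception ex) when (ex is not OperationCanceledException
                                   || !cancellationToken.IsCancellationRequested)
        {
            return readCache();
        }
    }
}
```

The filter keys on the caller's token rather than on the exception type alone, which is what separates a timeout from a genuine cancel.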
### Configuration-007
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
| Status | Open |
| Status | Resolved |
**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — removed the embedded `sa` password and `DefaultConnectionString` constant entirely; `CreateDbContext` now throws `InvalidOperationException` with a clear setup message when `OTOPCUA_CONFIG_CONNECTION` is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.
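The fail-fast behaviour might look roughly like this. Only the `OTOPCUA_CONFIG_CONNECTION` variable name and the `InvalidOperationException` type come from the resolution; the `DesignTimeConnectionSketch` class, the injected lookup delegate, and the message wording are assumptions made to keep the example testable.

```csharp
using System;

public static class DesignTimeConnectionSketch
{
    public const string EnvVar = "OTOPCUA_CONFIG_CONNECTION";

    // The lookup is injected so the sketch can be tested without touching the
    // process environment; production code would pass Environment.GetEnvironmentVariable.
    public static string Resolve(Func<string, string?> getEnvironmentVariable)
    {
        string? value = getEnvironmentVariable(EnvVar);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Set the {EnvVar} environment variable to a SQL Server connection string " +
                "(integrated authentication is recommended) before running design-time tooling.");
        }
        return value;
    }
}
```

A caller would use `DesignTimeConnectionSketch.Resolve(Environment.GetEnvironmentVariable)`, so that no credential ever needs to live in source.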
### Configuration-010
+7 -7
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 5 |
## Checklist coverage
@@ -36,13 +36,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` |
| Status | Open |
| Status | Resolved |
**Description:** `PollOnceAsync` detects a change with `!Equals(lastSeen?.Value, current.Value)`. `object.Equals` falls back to reference equality for reference types that do not override it — including `T[]` array values. The capability interfaces explicitly support 1-D array attributes (`DriverAttributeInfo.IsArray`, `ValueRank=1`), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so `OnDataChange` fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.
**Recommendation:** Compare array values structurally — e.g. when both `lastSeen?.Value` and `current.Value` are arrays, compare with `StructuralComparisons.StructuralEqualityComparer.Equals` (or element-wise) — instead of relying on `object.Equals`. Add a test covering an array-valued tag whose contents are unchanged across polls.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — introduced `ValuesAreDifferent` helper in `PollGroupEngine` that uses `StructuralComparisons.StructuralEqualityComparer` for `Array` values, falling back to `object.Equals` for scalars; added `Array_valued_tag_unchanged_contents_raises_only_once` and `Array_valued_tag_changed_contents_raises_event` tests.
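A helper along these lines would give the structural comparison described. The name `ValuesAreDifferent` is taken from the resolution, but the body is a sketch of the technique and not necessarily the code in `PollGroupEngine`.

```csharp
using System;
using System.Collections;

public static class ChangeDetectionSketch
{
    public static bool ValuesAreDifferent(object? previous, object? current)
    {
        // Arrays do not override Equals, so compare them element by element.
        if (previous is Array previousArray && current is Array currentArray)
        {
            return !StructuralComparisons.StructuralEqualityComparer
                .Equals(previousArray, currentArray);
        }

        // Scalars (and null) keep the original object.Equals semantics.
        return !object.Equals(previous, current);
    }
}
```

Two freshly allocated arrays with identical contents now compare as unchanged, so a poll that re-reads the same array no longer raises a data-change event.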
### Core.Abstractions-002
@@ -51,13 +51,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` |
| Status | Open |
| Status | Resolved |
**Description:** `PollOnceAsync` iterates `state.TagReferences` and indexes the reader's result with `snapshots[i]`, assuming the driver-supplied `_reader` delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws `ArgumentOutOfRangeException` from the indexer. That exception escapes `PollOnceAsync`, is swallowed by the catch-all in `PollLoopAsync` (line 99), and the subscription then silently produces no further `OnDataChange` callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.
**Recommendation:** Validate `snapshots.Count == state.TagReferences.Count` at the top of `PollOnceAsync` and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added count-guard at the top of `PollOnceAsync` that throws `InvalidOperationException` with a descriptive message when the reader returns the wrong number of snapshots; added `Reader_short_result_list_raises_descriptive_exception_and_loop_continues` test verifying the loop survives contract violations and resumes delivering events once the reader recovers.
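The count guard is small enough to show in full. The sketch below assumes tag references are strings and uses a hypothetical `ReaderContractSketch` helper; the real guard sits inline at the top of `PollOnceAsync`.

```csharp
using System;
using System.Collections.Generic;

public static class ReaderContractSketch
{
    // Throws when the reader did not return exactly one snapshot per reference,
    // so the contract violation is reported instead of surfacing later as an
    // ArgumentOutOfRangeException from an indexer.
    public static void EnsureOneSnapshotPerReference<TSnapshot>(
        IReadOnlyList<string> tagReferences,
        IReadOnlyList<TSnapshot> snapshots)
    {
        if (snapshots.Count != tagReferences.Count)
        {
            throw new InvalidOperationException(
                $"Reader returned {snapshots.Count} snapshot(s) for " +
                $"{tagReferences.Count} tag reference(s); one snapshot per reference, " +
                "in input order, is required.");
        }
    }
}
```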
### Core.Abstractions-003
@@ -66,7 +66,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` |
| Status | Open |
| Status | Resolved |
**Description:** `Subscribe` starts the poll loop with a fire-and-forget `Task.Run` and keeps no reference to the returned `Task`. Neither `Unsubscribe` nor `DisposeAsync` awaits the loop's completion — they only cancel the `CancellationTokenSource` and dispose it. Two consequences:
@@ -75,7 +75,7 @@ a category produced nothing rather than leaving it blank.
**Recommendation:** Track each loop `Task` in `SubscriptionState` and await it (with a timeout) in `Unsubscribe`/`DisposeAsync` before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer `Cts.Dispose()` until the loop task has observed cancellation, or wrap the `Task.Delay` await to also tolerate `ObjectDisposedException`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — stored the loop `Task` in `SubscriptionState.LoopTask`; `Unsubscribe` calls `StopState` which cancels then awaits the task (5 s timeout) before disposing the CTS; `DisposeAsync` cancels all loops in parallel then awaits them all via `Task.WhenAll` with a 5 s timeout before disposing CTSs, making teardown deterministic and preventing post-disposal callbacks.
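The cancel-then-await ordering can be sketched with a single loop. `LoopHandleSketch` is a hypothetical type standing in for one `SubscriptionState`; the 5-second timeout matches the resolution, while skipping the CTS dispose when the loop has not stopped is a conservative choice made here, not something the source states.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class LoopHandleSketch : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _loopTask;   // kept so teardown can await it

    public LoopHandleSketch(Func<CancellationToken, Task> onTick, TimeSpan interval)
    {
        _loopTask = Task.Run(() => LoopAsync(onTick, interval, _cts.Token));
    }

    private static async Task LoopAsync(
        Func<CancellationToken, Task> onTick, TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                await onTick(ct);
                await Task.Delay(interval, ct);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        try
        {
            // Wait for the loop to observe cancellation, bounded by a timeout.
            await _loopTask.WaitAsync(TimeSpan.FromSeconds(5));
        }
        catch (TimeoutException)
        {
            // The loop is stuck in a callback; give up waiting.
        }

        // Only dispose the CTS once nothing can still be using its token.
        if (_loopTask.IsCompleted) _cts.Dispose();
    }
}
```

Once `DisposeAsync` returns normally, no further `onTick` call can start, which is the post-teardown guarantee the finding asked for.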
### Core.Abstractions-004
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 2 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` |
| Status | Open |
| Status | Resolved |
**Description:** `EnqueueAsync` is declared `async`-shaped (`Task EnqueueAsync(...)`) and the `IAlarmHistorianSink` contract explicitly states "the sink MUST NOT block the emitting thread … `EnqueueAsync` returns as soon as the queue row is committed." But the implementation does fully synchronous, blocking SQLite I/O (`conn.Open()`, `EnforceCapacity`, `cmd.ExecuteNonQuery()`) on the caller's thread and only then returns `Task.CompletedTask`. Under SQLite write contention with the drain worker this blocks the alarm-emitting thread for the full lock-wait. The same synchronous-work-behind-an-async-or-status-API pattern applies to `GetStatus` (called from the Admin UI / `/healthz` request thread) and `RetryDeadLettered`. The `cancellationToken` parameter of `EnqueueAsync` is accepted and ignored.
**Recommendation:** Either make the I/O genuinely asynchronous (`await conn.OpenAsync(ct)`, `await cmd.ExecuteNonQueryAsync(ct)`; `Microsoft.Data.Sqlite` supports the async surface), or change `EnqueueAsync` to an in-memory hand-off (e.g. a `Channel`) drained by a background writer so the emitting thread truly never touches the database. At minimum honor the `cancellationToken` parameter.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `EnqueueAsync` now uses `OpenAsync` / `ExecuteNonQueryAsync` / `ExecuteScalarAsync` throughout (capacity check included); `ApplyPragmasAsync` handles the WAL/busy-timeout PRAGMA on the async path; `cancellationToken` is threaded through every await so cancellation is honoured.
### Core.AlarmHistorian-004
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
| Status | Open |
| Status | Resolved |
**Description:** The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin-UI / health-check threads with no memory barrier (no `lock`, no `volatile`, no `Interlocked`). `DateTime?` is not guaranteed to be written atomically, and the reader can observe a stale or torn value. This is a diagnostics surface so the impact is limited, but a torn `DateTime?` read is real undefined behavior.
**Recommendation:** Guard the status fields with a small lock, or make the scalars `volatile` where the type permits and snapshot `DateTime?` values under a lock. Take the snapshot atomically in `GetStatus()`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_statusLock` object; all writes to `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_evictedCount` (new) now happen inside `lock (_statusLock)` blocks; `GetStatus()` snapshots all fields atomically under the same lock. Regression test `GetStatus_snapshot_is_consistent_under_concurrent_drain` added.
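A reduced sketch of the lock-guarded status pattern follows. The field names mirror those in the finding, but `StatusHolderSketch`, `StatusSnapshot`, and the two `Record...` methods are hypothetical simplifications of the sink.

```csharp
using System;

public sealed record StatusSnapshot(
    DateTime? LastDrainUtc, DateTime? LastSuccessUtc, string? LastError, long EvictedCount);

public sealed class StatusHolderSketch
{
    private readonly object _statusLock = new();
    private DateTime? _lastDrainUtc;
    private DateTime? _lastSuccessUtc;
    private string? _lastError;
    private long _evictedCount;

    // Called by the drain worker after each attempt.
    public void RecordDrain(DateTime utcNow, string? error)
    {
        lock (_statusLock)
        {
            _lastDrainUtc = utcNow;
            _lastError = error;
            if (error is null) _lastSuccessUtc = utcNow;
        }
    }

    // Called when capacity enforcement deletes queued rows.
    public void RecordEviction(long rows)
    {
        lock (_statusLock) { _evictedCount += rows; }
    }

    // Readers copy every field under the same lock, so they never see a torn
    // DateTime? or a mix of fields from two different drain attempts.
    public StatusSnapshot GetStatus()
    {
        lock (_statusLock)
        {
            return new StatusSnapshot(_lastDrainUtc, _lastSuccessUtc, _lastError, _evictedCount);
        }
    }
}
```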
### Core.AlarmHistorian-006
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` |
| Status | Open |
| Status | Resolved |
**Description:** When the writer returns a wrong-cardinality result, the code throws `InvalidOperationException` after `WriteBatchAsync` has already succeeded. The events were potentially delivered to the historian, but no rows are deleted or dead-lettered, `_drainState` is left at `Draining`, and the backoff is not bumped. Combined with Core.AlarmHistorian-006 the exception is then swallowed. On the next drain the same batch is re-sent — if the writer actually delivered the events the first time, this produces duplicate historian rows; if it is a deterministic writer bug the queue stalls forever.
**Recommendation:** Treat a cardinality mismatch as a transient batch failure: log it, set `_lastError`, bump backoff, set `_drainState = BackingOff`, and return without throwing — mirroring the writer-exception path at lines 162-170. A deterministic writer contract violation should also raise an operator-visible alert rather than silently looping.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced the `throw InvalidOperationException` with log-and-backoff: the mismatch is recorded in `_lastError`, `_drainState` is set to `BackingOff`, the backoff is bumped, and the method returns without applying any outcomes, so the rows stay queued for the next drain attempt. Regression test `Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows` added.
### Core.AlarmHistorian-008
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/AlarmTracking.md` and the `IAlarmHistorianSink` contract present the SQLite queue as the durability guarantee — "Durably enqueue the event", "operator acks never block on the historian being reachable". But `EnforceCapacity` silently deletes the oldest non-dead-lettered (not-yet-sent) rows when the queue reaches `DefaultCapacity` (1,000,000). Those are alarm-event records that were accepted as durably queued and are then dropped before ever reaching the historian — silent alarm-history data loss under sustained historian outage. The only signal is a `WARN` log line. Neither `docs/AlarmTracking.md` nor the sink's XML doc mentions that the durability guarantee is bounded, and there is no metric/dead-letter trail for evicted rows.
**Recommendation:** At minimum document the bounded-durability behavior in `docs/AlarmTracking.md` and the `IAlarmHistorianSink` summary. Better: surface evicted-row counts in `HistorianSinkStatus` (a dedicated counter) so the loss is operator-visible, and consider routing overflow to the dead-letter table instead of hard-deleting it so the records survive for post-mortem within the retention window.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `EvictedCount` (default 0) to `HistorianSinkStatus` with full param-tag documentation; `EnforceCapacity` and `EnforceCapacityAsync` now increment `_evictedCount` (guarded by `_statusLock`) and include the lifetime total in the WARN log; `docs/AlarmTracking.md` documents the bounded-durability caveat and the `EvictedCount` surface. Regression test `Capacity_eviction_increments_evicted_count_on_status` added.
### Core.AlarmHistorian-010
@@ -168,13 +168,13 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The test suite covers the happy paths well (Ack/Retry/PermanentFail, capacity eviction, retention purge, ctor validation) but leaves critical paths untested: (a) no test exercises a corrupt / `null`-deserializing `PayloadJson` row, so the `rowIds`/`events` misalignment bug (Core.AlarmHistorian-001) was not caught; (b) no test for `StartDrainLoop` actually running on the timer, nor for the backoff being honored by the schedule (Core.AlarmHistorian-002); (c) no concurrency test running `EnqueueAsync` and `DrainOnceAsync` in parallel, which is the exact scenario that triggers `SQLITE_BUSY` (Core.AlarmHistorian-004); (d) no test for the `outcomes.Count != events.Count` cardinality-mismatch branch (Core.AlarmHistorian-007).
**Recommendation:** Add tests for: a corrupt payload row (insert raw bad JSON via a direct SQLite write, then drain and assert the correct row is dead-lettered and others are unaffected); a `FakeWriter` returning a wrong-length outcome list; a parallel enqueue/drain stress test; and the timer-driven `StartDrainLoop` path.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — (a) `Drain_with_corrupt_payload_row_deadletters_it_and_keeps_good_rows_aligned` and `Drain_with_corrupt_head_row_does_not_stall_queue` cover corrupt payloads; (b) `StartDrainLoop_honors_backoff_and_slows_cadence_under_retry`, `StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy`, and `StartDrainLoop_records_drain_fault_and_keeps_running` cover the timer-driven path; (c) `Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy` covers the concurrent stress path; (d) `Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows` covers the cardinality-mismatch branch. Additionally `Capacity_eviction_increments_evicted_count_on_status` and `GetStatus_snapshot_is_consistent_under_concurrent_drain` cover -009 and -005 respectively.
### Core.AlarmHistorian-011
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 6 |
## Checklist coverage
@@ -51,13 +51,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
| Status | Open |
| Status | Resolved |
**Description:** `LoadAsync` is written to be re-callable — it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it unconditionally assigns `_shelvingTimer = new Timer(...)` without disposing the timer created by a previous `LoadAsync` call. A second `LoadAsync` therefore leaks the old `Timer` and leaves two timers running concurrently against the same `_alarms`/`_evalGate`. The old timer's `RunShelvingCheck` keeps firing forever.
**Recommendation:** Dispose any existing `_shelvingTimer` before reassigning it, e.g. `_shelvingTimer?.Dispose();` immediately before line 162, inside the `_evalGate` critical section. If reload is genuinely not supported, instead guard `LoadAsync` against a second call and document it as one-shot.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_shelvingTimer?.Dispose()` before the timer reassignment in `LoadAsync` so a second load call does not leak the previous timer.
### Core.ScriptedAlarms-003
@@ -81,13 +81,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
| Status | Open |
| Status | Resolved |
**Description:** During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common pattern, and the `ITagUpstreamSource` contract does not forbid it), the observer callback `OnUpstreamChange` runs on the calling thread, schedules `ReevaluateAsync`, which calls `_evalGate.WaitAsync`. That does not deadlock (the reevaluation task simply blocks until `LoadAsync` releases the gate), but it can cause a re-evaluation to run against a half-initialised `_alarms`/index, and the value written to `_valueCache` on line 141 may be immediately overwritten by the subscription's synchronous push with no defined ordering. The cold-start guard partly masks this, but the ordering between the seed read (line 141) and the subscription push is unspecified and may seed a stale value.
**Recommendation:** Subscribe to all upstream tags after the seed reads and after `_loaded = true`, or capture the subscription's first push into the cache and treat `SubscribeTag` as the single source of truth (drop the separate `ReadTag` seed). Document the expected `ITagUpstreamSource` delivery semantics (does `SubscribeTag` push an initial value?).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — split the seed/subscribe loop: `ReadTag` seeds `_valueCache`, persisted-state restore runs, `_loaded = true` is set, then `SubscribeTag` is called; any synchronous initial push now arrives after `_alarms` is fully initialised and correctly queues behind the gate.
### Core.ScriptedAlarms-005
@@ -96,13 +96,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
| Status | Open |
| Status | Resolved |
**Description:** `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran, then proceed into `ShelvingCheckAsync`, which awaits `_evalGate` and mutates `_alarms` — concurrently with `Dispose`'s `_alarms.Clear()` at line 422 (which runs outside `_evalGate`). `Timer.Dispose()` does not wait for the running callback to finish. The result is a possible `InvalidOperationException` from a dictionary mutated during enumeration, or a save of stale state to the store after dispose. The same applies to a `ReevaluateAsync` in flight from a late upstream push.
**Recommendation:** Use `Timer.Dispose(WaitHandle)` (or `DisposeAsync`) to wait for the callback to drain, and perform `_alarms.Clear()` under `_evalGate` (or simply drop the clear — the object is being discarded). Also have `ShelvingCheckAsync`/`ReevaluateAsync` re-check `_disposed` after acquiring the gate before mutating/saving.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_disposed` re-checks in `ReevaluateAsync` and `ShelvingCheckAsync` after acquiring `_evalGate` so late callbacks bail out cleanly; dropped the unsynchronised `_alarms.Clear()` from `Dispose` since the object is being discarded and the clear raced concurrent reads.
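The re-check-after-acquire pattern can be shown with a tiny worker. `GatedWorkerSketch` and its `Mutations` counter are hypothetical; only the `_evalGate` / `_disposed` names echo the engine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class GatedWorkerSketch : IDisposable
{
    private readonly SemaphoreSlim _evalGate = new(1, 1);
    private volatile bool _disposed;

    public int Mutations { get; private set; }

    // Returns true if the guarded work ran, false if it bailed out.
    public async Task<bool> ReevaluateAsync(CancellationToken ct = default)
    {
        if (_disposed) return false;          // cheap early exit

        await _evalGate.WaitAsync(ct);
        try
        {
            // Dispose may have run while this call was waiting for the gate,
            // so the flag is checked again before touching any state.
            if (_disposed) return false;

            Mutations++;                      // stands in for mutate-and-save
            return true;
        }
        finally
        {
            _evalGate.Release();
        }
    }

    public void Dispose()
    {
        // The semaphore is deliberately left undisposed in this sketch so a
        // late caller already inside WaitAsync is not broken by teardown.
        _disposed = true;
    }
}
```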
### Core.ScriptedAlarms-006
@@ -126,13 +126,13 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
| Status | Open |
| Status | Resolved |
**Description:** Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws — transient SQL outage, deadlock, timeout — the exception propagates: in `ApplyAsync` it surfaces to the Part 9 method caller *after* the in-memory `_alarms` entry was already updated (line 215 runs before the save on line 216), leaving the in-memory state and the persisted state divergent; in `ReevaluateAsync`/`ShelvingCheckAsync` it is caught and logged, but again the in-memory `_alarms` entry was already advanced (lines 250/386) so the persisted store silently falls behind the live state. After a restart, startup recovery reloads the stale persisted state and operators can see a re-raised or re-ackable alarm. The docs claim "the store's view is always consistent with the in-memory state" (`docs/ScriptedAlarms.md` State persistence) — that invariant is not actually enforced.
**Recommendation:** Save before committing the in-memory update, or roll back the in-memory entry if `SaveAsync` fails, so the two never diverge. Classify transient store failures and retry, and surface a hard error/health-degraded signal if persistence is permanently failing rather than silently logging and continuing.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — reordered `SaveAsync`/`_alarms[id]=` in `ApplyAsync`, `ReevaluateAsync`, and `ShelvingCheckAsync` so persistence happens before the in-memory update; a store failure now leaves both views at the prior state rather than diverging.
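The persist-before-commit ordering is illustrated below with alarm state reduced to a string. `PersistFirstSketch` and its delegate-based store are hypothetical; the real engine works against `IAlarmStateStore` under `_evalGate`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class PersistFirstSketch
{
    private readonly Dictionary<string, string> _alarms = new();
    private readonly Func<string, string, Task> _saveAsync;   // stand-in for the state store

    public PersistFirstSketch(Func<string, string, Task> saveAsync) => _saveAsync = saveAsync;

    public async Task ApplyAsync(string alarmId, string nextState)
    {
        // Persist first. If the store throws, the line below never runs and
        // the in-memory view stays at the prior state, matching the store.
        await _saveAsync(alarmId, nextState);
        _alarms[alarmId] = nextState;
    }

    public string? Current(string alarmId)
        => _alarms.TryGetValue(alarmId, out var state) ? state : null;
}
```

The trade-off of this ordering is that a failed save drops the transition entirely, so the caller has to retry or surface the error; the sketch does not attempt either.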
### Core.ScriptedAlarms-008
@@ -201,10 +201,10 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine* — only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEngine` driving the timer with an injectable clock; (2) `ConfirmAsync`, `TimedShelveAsync`, `UnshelveAsync`, `EnableAsync` engine methods (only `Acknowledge`, `OneShotShelve`, `Disable`, `AddComment` are exercised); (3) `OnEvent` subscriber-throws isolation (`EmitEvent` catch on line 357); (4) `IAlarmStateStore.SaveAsync` failure handling (finding 007); (5) re-entrant `LoadAsync` and the timer leak (finding 002); (6) the cold-start `AreInputsReady` guard with Bad / null / Uncertain inputs. The `clock` and `scriptTimeout` constructor parameters exist specifically to make timer/timeout tests deterministic but no test uses them.
**Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |
| Open findings | 5 |
## Checklist coverage
@@ -112,7 +112,7 @@ node form plus over-block guards for allowed generics/`typeof`.
| Severity | Medium |
| Category | Security |
| Location | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
| Status | Open |
| Status | Resolved |
**Description:** There is no bound on memory a script may allocate or on the number of
threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
@@ -132,7 +132,7 @@ script authoring behind an Admin permission and treat the test-harness preview a
control point, or track an explicit v3 issue for out-of-process execution. Record the
decision so it is not silently lost.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `System.Threading.Tasks` to `ForbiddenNamespacePrefixes` (blocking `Task.Run` / `Parallel` fan-out); documented the unbounded-memory accepted risk and the `Task` denial rationale in `docs/VirtualTags.md` (new "Known resource limits" subsection) and cross-referenced from `docs/ScriptedAlarms.md`.
### Core.Scripting-004
@@ -141,7 +141,7 @@ decision so it is not silently lost.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `DependencyExtractor.cs:73` |
| Status | Open |
| Status | Resolved |
**Description:** The walker matches tag-access calls purely by spelling — any
`InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as
@@ -160,7 +160,7 @@ member-access call to a non-ctx `GetTag` is untested and would be misattributed.
(matching the `ScriptGlobals<TContext>.ctx` field name). Add a test for
`someOtherObject.GetTag("X")` asserting it is ignored.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `VisitInvocationExpression` now additionally checks that `member.Expression` is an `IdentifierNameSyntax` with `ValueText == "ctx"` before treating the call as a dependency; test `Ignores_member_access_GetTag_on_non_ctx_receiver` added to `DependencyExtractorTests`.
### Core.Scripting-005
@@ -215,7 +215,7 @@ evicted.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `TimedScriptEvaluator.cs:60` |
| Status | Open |
| Status | Resolved |
**Description:** `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits
`WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the
@@ -231,7 +231,7 @@ and throw `OperationCanceledException(ct)` instead of `ScriptTimeoutException` w
caller's token is cancelled, so caller cancellation deterministically wins regardless of
race ordering.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — in the `catch (TimeoutException)` handler, `ct.IsCancellationRequested` is now checked and `OperationCanceledException(ct)` thrown before `ScriptTimeoutException`, so caller cancellation deterministically wins regardless of race ordering; regression test `Caller_cancellation_wins_even_when_timeout_fires_first` added to `TimedScriptEvaluatorTests`.
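The tie-break in the timeout handler might be sketched as follows, assuming .NET 6 or later for `Task.WaitAsync(TimeSpan, CancellationToken)`. `TimedRunSketch` and `ScriptTimeoutSketchException` are hypothetical stand-ins for `TimedScriptEvaluator` and `ScriptTimeoutException`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ScriptTimeoutSketchException : Exception
{
    public ScriptTimeoutSketchException(TimeSpan timeout)
        : base($"Script exceeded its {timeout.TotalMilliseconds} ms budget.") { }
}

public static class TimedRunSketch
{
    public static async Task<T> RunAsync<T>(Func<T> body, TimeSpan timeout, CancellationToken ct)
    {
        // The body runs on the thread pool; on timeout it is abandoned, not aborted.
        Task<T> work = Task.Run(body);
        try
        {
            return await work.WaitAsync(timeout, ct);
        }
        catch (TimeoutException)
        {
            // If the caller's token fired at about the same time as the timer,
            // report cancellation so the caller sees the outcome it asked for.
            if (ct.IsCancellationRequested) throw new OperationCanceledException(ct);
            throw new ScriptTimeoutSketchException(timeout);
        }
    }
}
```

When only the token fires, `WaitAsync` itself throws an `OperationCanceledException`, so the explicit check matters only in the race where the timeout is observed first.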
### Core.Scripting-008
@@ -292,7 +292,7 @@ code.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Status | Open |
| Status | Resolved |
**Description:** The sandbox-escape test suite covers only the four obvious vectors
(File / Http / Process / Reflection) as direct member-access calls. It does not test:
@@ -309,7 +309,7 @@ surface.
Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
assert a `ScriptSandboxViolationException` (or `CompilationErrorException`) at compile.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `ScriptSandboxTests` cases for `System.Threading.Thread`, `System.Threading.Tasks.Task.Run`, `System.Runtime.InteropServices.Marshal`, and `Microsoft.Win32.Registry` (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.
### Core.Scripting-011
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 7 |
## Checklist coverage
@@ -67,7 +67,7 @@ code and docs agree.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Status | Open |
| Status | Resolved |
**Description:** The cold-start guard `if (!AreInputsReady(ctxCache)) return;` silently
abandons the evaluation when any input is null or Bad-quality. For a chained virtual tag
@@ -87,7 +87,7 @@ rather than returning with no state change, so clients see a defined quality. If
operators need scripts that handle Bad upstreams, consider a per-definition opt-out of
the readiness guard.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — cold-start guard now publishes `BadWaitingForInitialData` (0x80320000) and notifies observers instead of silently returning, so OPC UA clients see a defined quality rather than a stale prior value.
### Core.VirtualTags-003
@@ -96,7 +96,7 @@ the readiness guard.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Status | Open |
| Status | Resolved |
**Description:** The upstream-subscription loop in `Load` iterates
`definitions.SelectMany(d => _tags[d.Path].Reads)`. If `definitions` contains two rows
@@ -115,7 +115,7 @@ them to `compileFailures` (or a dedicated rejection list) so the aggregated
`definitions.SelectMany(d => _tags[d.Path]...)` when collecting upstream paths so the
collection is keyed off the registered set, not the raw input list.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `Load` now tracks seen paths and adds a duplicate-path entry to `compileFailures`; the upstream-subscription loop iterates `_tags.Values` instead of the raw `definitions` list so it is keyed off the registered set.
### Core.VirtualTags-004
@@ -148,7 +148,7 @@ document precisely which `DriverDataType` values `CoerceResult` supports and val
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
| Status | Open |
| Status | Resolved |
**Description:** `SubscribeAsync` registers the per-path engine observers first (lines
52-56), then in a second loop reads the current value and fires the initial-data
@@ -163,7 +163,7 @@ each path before registering the change observer for that path (or hold a per-ha
lock spanning both so no engine callback interleaves). The initial value must be
delivered before any subsequent change for that path.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now fires the initial-data callback per path before registering the change observer for that path, eliminating the out-of-order delivery race.
### Core.VirtualTags-006
@@ -223,7 +223,7 @@ expected upper bound on group evaluation time relative to the interval.
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` |
| Status | Open |
| Status | Resolved |
**Description:** `TransitiveDependentsInOrder` calls `TopologicalSort()` (a full O(V+E)
Kahn pass plus a Dictionary rank build) on every invocation, and it is invoked from
@@ -237,7 +237,7 @@ end of `Load` and cache it on `DependencyGraph` (invalidated by `Add` / `Clear`)
`TransitiveDependentsInOrder` then reuses the cached rank map. This turns a per-event
O(V+E) cost into an O(closure) cost.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DependencyGraph` now caches the topological rank dictionary (invalidated by `Add`/`Clear`) via `GetOrBuildRank()`; `TransitiveDependentsInOrder` reuses it, reducing per-change-event cost from O(V+E) to O(closure).
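A rough sketch of the caching idea in this resolution, under stated assumptions: `RankCachingGraph` and its `RankBuilds` counter are hypothetical names invented here, and the real `DependencyGraph` may store edges differently. Only `Add`, `Clear`, `GetOrBuildRank`, and `TransitiveDependentsInOrder` mirror names from the finding.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class RankCachingGraph
{
    // input node -> nodes that read it (edges point from input to dependent)
    private readonly Dictionary<string, HashSet<string>> _dependents = new();
    private Dictionary<string, int>? _rank; // null means stale; rebuilt on next use

    public int RankBuilds { get; private set; } // only here to make the caching observable

    public void Add(string node, params string[] dependsOn)
    {
        Touch(node);
        foreach (var input in dependsOn) Touch(input).Add(node);
        _rank = null; // any structural change invalidates the cached ranks
    }

    public void Clear()
    {
        _dependents.Clear();
        _rank = null;
    }

    // Walks only the closure of `changed`, then orders it by the cached rank.
    public List<string> TransitiveDependentsInOrder(string changed)
    {
        var rank = GetOrBuildRank();
        var closure = new HashSet<string>();
        var stack = new Stack<string>();
        stack.Push(changed);
        while (stack.Count > 0)
        {
            if (!_dependents.TryGetValue(stack.Pop(), out var next)) continue;
            foreach (var d in next)
                if (closure.Add(d)) stack.Push(d);
        }
        var ordered = new List<string>(closure);
        ordered.Sort((a, b) => rank[a].CompareTo(rank[b]));
        return ordered;
    }

    private HashSet<string> Touch(string node)
    {
        if (!_dependents.TryGetValue(node, out var set))
            _dependents[node] = set = new HashSet<string>();
        return set;
    }

    // Kahn's algorithm, run once per graph shape instead of once per change event.
    private Dictionary<string, int> GetOrBuildRank()
    {
        if (_rank is not null) return _rank;
        RankBuilds++;
        var inDegree = new Dictionary<string, int>();
        foreach (var node in _dependents.Keys) inDegree[node] = 0;
        foreach (var targets in _dependents.Values)
            foreach (var t in targets) inDegree[t]++;

        var ready = new Queue<string>();
        foreach (var pair in inDegree)
            if (pair.Value == 0) ready.Enqueue(pair.Key);

        var rank = new Dictionary<string, int>();
        while (ready.Count > 0)
        {
            var node = ready.Dequeue();
            rank[node] = rank.Count;
            foreach (var t in _dependents[node])
                if (--inDegree[t] == 0) ready.Enqueue(t);
        }
        if (rank.Count != _dependents.Count)
            throw new InvalidOperationException("Dependency cycle detected.");
        return _rank = rank;
    }
}
```

Repeated calls to `TransitiveDependentsInOrder` between structural changes reuse the same rank map, so the full sort over the graph is paid once per `Add`/`Clear` rather than once per change event.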
### Core.VirtualTags-009
@@ -314,7 +314,7 @@ retained.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** Several behaviours of the engine have no test coverage:
(1) the cold-start `AreInputsReady` guard -- no test exercises an upstream that is
@@ -333,7 +333,7 @@ double-to-int32 is tested);
**Recommendation:** Add unit tests for each path above. Items (1), (2), and (6) directly
correspond to open correctness findings and would have caught them.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added 9 unit tests covering all 7 gaps: `AreInputsReady` guard publishes `BadWaitingForInitialData` and recovers; `SetVirtualTag` cascade to dependent; write to non-registered path; `EvaluateOneAsync` before `Load` and for unregistered path; `CoerceResult` failure maps to `BadInternalError`; duplicate-path rejection; `Read`/`Subscribe` before `Load`.
### Core.VirtualTags-013
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 6 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
| Status | Open |
| Status | Resolved |
**Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID.
**Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `FolderSegment` to `NodeAclScopeKind`; `WalkSystemPlatform` now reports `FolderSegment` instead of `Equipment` for each visited Galaxy folder level; added three regression tests asserting the correct scope is reported in `MatchedGrant.Scope`.
### Core-004
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
| Status | Open |
| Status | Resolved |
**Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry — silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable.
**Recommendation:** Make `Prune` use an atomic compare-and-swap loop — `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone — or perform the prune inside an `AddOrUpdate` update factory.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — changed `ClusterEntry` from `sealed record` to `sealed class` (enabling reference-equality CAS via `TryUpdate`); `Prune` now uses a read-compute-`TryUpdate` retry loop that restarts if another thread updates the entry between the read and the write; added regression tests asserting the current generation is preserved after a concurrent install + prune sequence.
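The read-compute-`TryUpdate` loop in this resolution might look roughly like the sketch below. `GenerationCache`, the `string` trie placeholder, and the `keep` parameter are assumptions made for illustration; only `ClusterEntry`, `Install`, `Prune`, and `_byCluster` echo names from the finding.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// A class rather than a record, so TryUpdate's equality check is reference equality.
public sealed class ClusterEntry
{
    public ClusterEntry(long current, IReadOnlyDictionary<long, string> tries)
    {
        Current = current;
        Tries = tries;
    }
    public long Current { get; }
    public IReadOnlyDictionary<long, string> Tries { get; } // generation -> trie (string placeholder)
}

public sealed class GenerationCache
{
    private readonly ConcurrentDictionary<string, ClusterEntry> _byCluster = new();

    public void Install(string clusterId, long generation, string trie) =>
        _byCluster.AddOrUpdate(
            clusterId,
            _ => new ClusterEntry(generation, new Dictionary<long, string> { [generation] = trie }),
            (_, old) =>
            {
                var tries = new Dictionary<long, string>(old.Tries) { [generation] = trie };
                return new ClusterEntry(Math.Max(old.Current, generation), tries);
            });

    // Keeps the newest `keep` generations. Restarts if another thread replaced the entry
    // between the read and the write, so a concurrent Install is never overwritten.
    public void Prune(string clusterId, int keep)
    {
        while (_byCluster.TryGetValue(clusterId, out var observed))
        {
            var kept = observed.Tries
                .OrderByDescending(kv => kv.Key)
                .Take(keep)
                .ToDictionary(kv => kv.Key, kv => kv.Value);
            var pruned = new ClusterEntry(observed.Current, kept);
            // Succeeds only if the stored value is still the instance we read.
            if (_byCluster.TryUpdate(clusterId, pruned, observed)) return;
        }
    }

    public ClusterEntry? Get(string clusterId) =>
        _byCluster.TryGetValue(clusterId, out var entry) ? entry : null;
}
```

The loop exits either when the swap succeeds or when the key has disappeared, which matches the "retried until it succeeds or the key is gone" wording of the recommendation.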
### Core-006
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |
| Status | Resolved |
**Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher` → `IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method.
**Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `BuildAddressSpaceAsync` now checks `_disposed` (throws `ObjectDisposedException`) and tears down the previous alarm forwarder + clears the sink registry before re-walking so a Galaxy-redeploy rebuild does not double-subscribe the forwarder; added three regression tests covering double-build, sink-count after rebuild, and post-dispose throw.
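As an illustration of the rebuild-safe subscription described here, the following sketch detaches the previous forwarder before attaching a new one and refuses to run after disposal. `AlarmSource`, `RebuildableNodeManager`, and the `Delivered` counter are hypothetical simplifications; the real node manager walks an address space and maintains a sink registry that this sketch omits.

```csharp
#nullable enable
using System;

// Hypothetical minimal alarm source exposing a single event.
public sealed class AlarmSource
{
    public event EventHandler<string>? OnAlarmEvent;
    public void Raise(string alarm) => OnAlarmEvent?.Invoke(this, alarm);
}

public sealed class RebuildableNodeManager : IDisposable
{
    private readonly AlarmSource _source;
    private EventHandler<string>? _alarmForwarder;
    private bool _disposed;

    public RebuildableNodeManager(AlarmSource source) => _source = source;

    public int Delivered { get; private set; } // counts forwarded alarms, for observation only

    public void BuildAddressSpace()
    {
        if (_disposed) throw new ObjectDisposedException(nameof(RebuildableNodeManager));

        // Tear down the forwarder from any earlier build so a rebuild
        // never leaves two delegates subscribed to the same event.
        DetachForwarder();

        _alarmForwarder = (sender, alarm) => Delivered++;
        _source.OnAlarmEvent += _alarmForwarder;
    }

    private void DetachForwarder()
    {
        if (_alarmForwarder is null) return;
        _source.OnAlarmEvent -= _alarmForwarder;
        _alarmForwarder = null;
    }

    public void Dispose()
    {
        if (_disposed) return;
        DetachForwarder();
        _disposed = true;
    }
}
```

Building twice and raising one alarm should deliver it once, which is the double-subscription behaviour the regression tests in the resolution are described as covering.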
### Core-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
| Status | Open |
| Status | Resolved |
**Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync` but subscribe is per-host and unsubscribe is not.
**Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now wraps each driver handle in a `HostBoundHandle` (private `IAlarmSubscriptionHandle` implementation) that carries the resolved host name; `UnsubscribeAsync` unwraps it and routes through the recorded host's pipeline, falling back to the default host for handles not created by this invoker; added two regression tests verifying per-host routing and single-host fallback.
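The handle-wrapping approach can be sketched as below. Everything except the `HostBoundHandle` name and the `IAlarmSubscriptionHandle` interface name is assumed for illustration: the interface's single `DiagnosticId` member, `DriverHandle`, `HostRoutingInvoker`, and the `UnsubscribeLog` list (which stands in for running a per-host resilience pipeline) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; the real interface may expose different members.
public interface IAlarmSubscriptionHandle
{
    string DiagnosticId { get; }
}

public sealed class DriverHandle : IAlarmSubscriptionHandle
{
    public DriverHandle(string id) => DiagnosticId = id;
    public string DiagnosticId { get; }
}

public sealed class HostRoutingInvoker
{
    private readonly string _defaultHost;

    public HostRoutingInvoker(string defaultHost) => _defaultHost = defaultHost;

    // Records "host:handle" for each unsubscribe instead of invoking a real pipeline.
    public List<string> UnsubscribeLog { get; } = new List<string>();

    // Private wrapper that remembers which host the subscription was created on.
    private sealed class HostBoundHandle : IAlarmSubscriptionHandle
    {
        public HostBoundHandle(IAlarmSubscriptionHandle inner, string host)
        {
            Inner = inner;
            Host = host;
        }
        public IAlarmSubscriptionHandle Inner { get; }
        public string Host { get; }
        public string DiagnosticId => Inner.DiagnosticId;
    }

    public IAlarmSubscriptionHandle Subscribe(IAlarmSubscriptionHandle driverHandle, string resolvedHost) =>
        new HostBoundHandle(driverHandle, resolvedHost);

    public void Unsubscribe(IAlarmSubscriptionHandle handle)
    {
        // Handles not created by this invoker fall back to the default host.
        if (handle is HostBoundHandle bound)
            UnsubscribeLog.Add($"{bound.Host}:{bound.Inner.DiagnosticId}");
        else
            UnsubscribeLog.Add($"{_defaultHost}:{handle.DiagnosticId}");
    }
}
```

Keeping the wrapper private means callers still see only an opaque `IAlarmSubscriptionHandle`, while the invoker can recover the host without a separate handle-to-host map.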
### Core-008
+5 -5
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 6 |
## Checklist coverage
@@ -40,7 +40,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` |
| Status | Open |
| Status | Resolved |
**Description:** `ParseValue` parses every numeric Logix type with the BCL `*.Parse`
methods (`sbyte.Parse`, `short.Parse`, `int.Parse`, `float.Parse`, ...). These throw
@@ -59,7 +59,7 @@ one-line error. CliFx only formats `CommandException` cleanly.
rethrows as a `CommandException` with the raw value, the target `--type`, and the
valid range — mirroring the `ParseBool` failure message.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — wrapped the `ParseValue` switch in `try/catch (FormatException or OverflowException)` that rethrows as `CommandException` with the raw value and type; updated the previously-passing `ParseValue_non_numeric_for_numeric_types_throws` test to assert `CommandException` and added two new tests covering overflow and actionable message content.
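A sketch of the wrap-and-rethrow pattern from this resolution, written so it compiles without CliFx: the `CommandException` class below is a local stand-in for the CliFx type, and `NumericType` and `ValueParser` are hypothetical names covering only a few numeric types. Note that C# expresses "catch either exception" with an exception filter.

```csharp
using System;
using System.Globalization;

// Local stand-in for CliFx's CommandException so the sketch is self-contained.
public sealed class CommandException : Exception
{
    public CommandException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical subset of the CLI's numeric --type values.
public enum NumericType { SInt, Int, DInt, Real }

public static class ValueParser
{
    public static object ParseValue(string raw, NumericType type)
    {
        try
        {
            // A switch statement keeps each result boxed as its own type; a switch
            // expression over these arms would first widen them all to float.
            switch (type)
            {
                case NumericType.SInt: return sbyte.Parse(raw, CultureInfo.InvariantCulture);
                case NumericType.Int: return short.Parse(raw, CultureInfo.InvariantCulture);
                case NumericType.DInt: return int.Parse(raw, CultureInfo.InvariantCulture);
                case NumericType.Real: return float.Parse(raw, CultureInfo.InvariantCulture);
                default: throw new ArgumentOutOfRangeException(nameof(type));
            }
        }
        catch (Exception ex) when (ex is FormatException || ex is OverflowException)
        {
            // Surface a one-line, actionable error naming the raw value and target type.
            throw new CommandException($"Value '{raw}' is not a valid {type}: {ex.Message}", ex);
        }
    }
}
```

Both a non-numeric input such as `abc` and an out-of-range input such as `300` for `SInt` end up as the same `CommandException` type, so the CLI host only needs to format one exception kind.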
### Driver.AbCip.Cli-002
@@ -68,7 +68,7 @@ valid range — mirroring the `ParseBool` failure message.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` |
| Status | Open |
| Status | Resolved |
**Description:** `ProbeCommand`, `ReadCommand`, and `SubscribeCommand` expose
`--type` as a free `AbCipDataType` enum option with no exclusion of
@@ -87,7 +87,7 @@ are out of scope here", but the code does not enforce it.
pattern `WriteCommand` uses, or factor a shared `RejectStructure(DataType)` guard
into `AbCipCommandBase`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `RejectStructure(AbCipDataType)` static helper to `AbCipCommandBase` that throws `CommandException` for `Structure`; called at the top of `ExecuteAsync` in `ProbeCommand`, `ReadCommand`, and `SubscribeCommand`.
### Driver.AbCip.Cli-003
+13 -13
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 5 |
## Checklist coverage
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` |
| Status | Open |
| Status | Resolved |
**Description:** `ToDriverDataType` maps `LInt`/`ULInt` to `DriverDataType.Int32` (a TODO comment notes the gap) and `Dt` to `Int32`. But `LibplctagTagRuntime.DecodeValueAt` returns an actual `long` for `LInt`/`ULInt` (`_tag.GetInt64`, `(long)_tag.GetUInt64`). The address space is built declaring an Int32 node while the driver hands the server a boxed `long` `DataValueSnapshot.Value` at runtime: a mismatch between the declared OPC UA data type and the runtime value type. For `LInt` values exceeding Int32.MaxValue there is data loss if any consumer narrows it. `UDInt` is declared Int32 but decoded as `(int)_tag.GetUInt32`, so values above int.MaxValue wrap to negative.
**Recommendation:** Either add Int64/UInt32/UInt64 to `DriverDataType` and map correctly, or, until that lands, decode `LInt`/`ULInt` consistently with the declared `Int32` type (and document the truncation), and decode `UDInt` as a value that fits Int32 semantics. The declared type and the runtime value type must agree.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `ToDriverDataType` now maps `LInt` → `Int64`, `ULInt` → `UInt64`, `UDInt` → `UInt32` (all already present in `DriverDataType`); `DecodeValueAt` updated to return `uint`/`ulong` for UDInt/ULInt respectively so the declared type and runtime value agree. The `(int)` and `(long)` casts that caused truncation/wrap are removed.
### Driver.AbCip-005
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbCipDriver.cs:124-141` |
| Status | Open |
| Status | Resolved |
**Description:** In `InitializeAsync`, when a `Structure` tag declares `Members`, the loop registers each fanned-out member into `_tagsByName` but the parent Structure tag itself is also left in `_tagsByName` (added at line 125 before the member check). A subsequent `ReadAsync` of the parent name routes through `ReadSingleAsync` then `DecodeValue(AbCipDataType.Structure, ...)` which returns `null` with `Good` status. A client reading the parent UDT node thus gets a Good/null snapshot rather than a fault or a structured value. Also, member registration does not check for name collisions: if two configured tags produce the same parent-dot-member key (or a member name collides with an independently-declared tag), the later silently overwrites the earlier with no diagnostic.
**Recommendation:** Decide the parent-Structure read contract explicitly: either do not register the bare parent name as a readable tag, or have the Structure read return a proper status. Add a duplicate-key check during `_tagsByName` population that throws an `InvalidOperationException` naming both colliding tags, consistent with the fail-fast validation `AbCipHostAddress` parsing already does.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — The parent Structure tag remains in `_tagsByName` so the whole-UDT grouping planner (Driver.AbCip-003 fast path) and alarm projection can still find it. `ReadSingleAsync` now detects a direct read of a Structure-with-Members and returns `BadNotSupported` instead of Good/null, documenting that callers must address individual member paths. Both scalar and member fan-out registration perform a duplicate-key check that throws `InvalidOperationException` naming both colliding entries (fail-fast, consistent with `AbCipHostAddress` validation).
### Driver.AbCip-006
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` |
| Status | Open |
| Status | Resolved |
**Description:** `driver-specs.md` makes the SafeHandle-wrapped native handle a non-negotiable Tier-B protection ("Wrap every libplctag handle in a SafeHandle with finalizer calling plc_tag_destroy"). The repo ships `PlcTagHandle : SafeHandle` for this, but it is dead code: `ReleaseHandle` is a permanent no-op (the comment says the `plc_tag_destroy` P/Invoke "is deferred to PR 3", well past the commit under review), and `DeviceState.TagHandles` is never populated anywhere in the driver. The real native lifetime is delegated to the libplctag.NET `Tag` object's own `Dispose()`. The mandated finalizer-backed leak protection therefore does not exist: if a `LibplctagTagRuntime` is GC-collected without `Dispose` (owning thread crashes, exception bypasses the device dispose path), whether the native tag is freed depends entirely on whether the libplctag.NET `Tag` has its own finalizer, which is not guaranteed by this driver code as the design requires.
**Recommendation:** Either delete `PlcTagHandle` and `DeviceState.TagHandles` as misleading dead scaffolding and document that native lifetime is owned by libplctag.NET `Tag` finalizer (verifying that `Tag` actually has one), or finish the intended design by making `LibplctagTagRuntime` hold a real `PlcTagHandle` with a working `ReleaseHandle` calling `plc_tag_destroy`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `PlcTagHandle.cs` deleted; `DeviceState.TagHandles` removed from `DeviceState`; its `DisposeHandles` loop cleaned up. The class-level doc comment on `AbCipDriver` updated to document that native lifetime is owned by libplctag.NET `Tag.Dispose()` (called in `DisposeHandles`) with the library's own finalizer covering GC-collected instances. The two dead-code test methods for `PlcTagHandle` removed from `AbCipDriverTests`.
### Driver.AbCip-007
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` |
| Status | Open |
| Status | Resolved |
**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act on a non-thread-safe `Dictionary` (`device.Runtimes` / `device.ParentRuntimes`). `ReadAsync` is `IReadable` and may be invoked concurrently: the server read path, each polled subscription loop, and the alarm projection poll loop all call `ReadAsync` independently. Two concurrent `ReadAsync` calls that both miss the cache for the same tag both create a `LibplctagTagRuntime`, both initialize it, and both write into the dictionary; the loser leaks an initialized native tag (never disposed, since only the dictionary value is disposed at shutdown), and concurrent `Dictionary` mutation can throw or corrupt the bucket structure. `WriteBitInDIntAsync` serializes the parent via a per-parent `SemaphoreSlim`, but `EnsureParentRuntimeAsync` still runs the same unguarded check-then-act on the shared `ParentRuntimes` dict.
**Recommendation:** Use `ConcurrentDictionary` for `Runtimes` and `ParentRuntimes`, creating the runtime via `GetOrAdd` with a lazily-initialized factory, or guard the ensure path with a per-device lock / `SemaphoreSlim`. Ensure the losing creator runtime is disposed rather than leaked.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — already addressed as part of the Driver.AbCip-008 fix: `DeviceState.Runtimes` and `ParentRuntimes` were switched to `ConcurrentDictionary`; both `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` use the `TryGetValue` → create → `TryAdd` → dispose-loser pattern so concurrent callers that both miss the cache produce exactly one live runtime and the losing creator is disposed rather than leaked.
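The `TryGetValue` → create → `TryAdd` → dispose-loser sequence can be sketched as follows. `FakeRuntime` and `RuntimeCache` are hypothetical stand-ins for `LibplctagTagRuntime` and the per-device state, and the 10 ms delay merely simulates native initialization.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a runtime that owns a native handle.
public sealed class FakeRuntime : IDisposable
{
    public bool Disposed { get; private set; }
    public Task InitializeAsync(CancellationToken ct) => Task.Delay(10, ct);
    public void Dispose() => Disposed = true;
}

public sealed class RuntimeCache
{
    private readonly ConcurrentDictionary<string, FakeRuntime> _runtimes =
        new ConcurrentDictionary<string, FakeRuntime>();

    public int Count => _runtimes.Count;

    public async Task<FakeRuntime> EnsureAsync(string tagName, CancellationToken ct)
    {
        if (_runtimes.TryGetValue(tagName, out var existing)) return existing;

        var candidate = new FakeRuntime();
        try
        {
            await candidate.InitializeAsync(ct).ConfigureAwait(false);
        }
        catch
        {
            candidate.Dispose(); // never leak a half-initialized runtime
            throw;
        }

        if (_runtimes.TryAdd(tagName, candidate)) return candidate;

        // Lost the race: another caller stored a runtime first.
        // Dispose ours and hand back whatever is cached now.
        candidate.Dispose();
        return await EnsureAsync(tagName, ct).ConfigureAwait(false);
    }
}
```

This variant lets two callers both pay the initialization cost during a race, but only one runtime stays live and the other is released deterministically rather than at shutdown.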
### Driver.AbCip-010
@@ -168,13 +168,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` |
| Status | Open |
| Status | Resolved |
**Description:** Once `EnsureTagRuntimeAsync` successfully creates and initializes a `LibplctagTagRuntime`, that runtime is cached for the lifetime of the device and never re-created on failure. If the underlying native tag enters a permanently-bad state (connection dropped, controller rebooted, tag handle invalidated by a PLC program download), every subsequent `ReadAsync`/`WriteAsync` reuses the same dead handle and returns errors forever. The probe loop does tear down and recreate its runtime after a failure, but the read/write path has no equivalent recovery; only a full `ReinitializeAsync` (itself broken, see Driver.AbCip-001) clears the cache. The normal data path should self-heal from a transient handle fault without operator-driven reinitialize.
**Recommendation:** On a non-zero libplctag status or transport exception in `ReadSingleAsync`/`ReadGroupAsync`/`WriteAsync`, evict the offending runtime from `device.Runtimes` (and dispose it) so the next call re-creates and re-initializes it. Mirror the probe loop's recreate-on-failure behavior.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `EvictRuntime(device, tagName)` helper that calls `ConcurrentDictionary.TryRemove` + disposes the evicted instance; called from `ReadSingleAsync`, `ReadGroupAsync`, and `WriteAsync` on both non-zero libplctag status and transport exceptions (type/value-conversion exceptions are not transport faults and do not evict). The next read/write for the affected tag re-runs `EnsureTagRuntimeAsync`, which creates and initializes a fresh handle, mirroring the probe loop's recreate-on-failure behaviour.
### Driver.AbCip-011
@@ -228,13 +228,13 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` |
| Status | Open |
| Status | Resolved |
**Description:** `AbCipStatusMapperTests.MapLibplctagStatus_maps_known_codes` asserts the mapper against the same wrong integer constants (-5, -7, -14, -16, -17) the production code uses (see Driver.AbCip-002). The test locks in the bug rather than catching it, giving false confidence that libplctag error mapping is correct. There is no test that drives an actual libplctag `Status` enum value through `LibplctagTagRuntime.GetStatus()` plus `MapLibplctagStatus` end-to-end. Separately, the broken `ReinitializeAsync` config-discard behavior (Driver.AbCip-001) and the declaration-order whole-UDT decode risk (Driver.AbCip-003) have no test that would fail when those defects are present: `AbCipDriverWholeUdtReadTests` only exercises a UDT whose declaration order happens to match a simple alignment layout.
**Recommendation:** Rewrite the libplctag-status test to use the real `libplctag.Status` enum members and their documented integer values. Add a test that `ReinitializeAsync` with a changed config JSON actually applies the change (or asserts the documented immutability contract). Add a whole-UDT decode test where the controller's compiled layout differs from declaration order.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — status mapper test already uses real `Status` enum members (fixed with Driver.AbCip-002); `ReinitializeAsync` config-change coverage already added with Driver.AbCip-001. Added to `AbCipDriverCodeReviewRegressionTests`: three tests for 004 (LInt/ULInt/UDInt type-mapping theory + UDInt decoded-as-uint assertion), three tests for 005 (Structure parent read returns BadNotSupported, duplicate scalar key throws, member-collision-with-independent-tag throws), and one test for 010 (eviction on bad status means next read creates a fresh handle). `AbCipDriverTests.AbCipDataType_maps_atomics_to_driver_types` extended with LInt/ULInt/UDInt assertions.
### Driver.AbCip-015
+3 -3
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 6 |
## Checklist coverage
@@ -36,7 +36,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteCommand.ExecuteAsync` calls `ParseValue(Value, DataType)` at
line 46, *before* the `try` block and outside any catch. `ParseValue` uses
@@ -59,7 +59,7 @@ type (mirroring the existing `Bit` and unsupported-type branches). Either catch
`FormatException`/`OverflowException` inside `ParseValue` and rethrow as
`CommandException`, or use `TryParse` and throw `CommandException` on failure.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — wrapped numeric parses in `ParseValue` with `try/catch` for `FormatException`/`OverflowException`, rethrowing as `CommandException` with a message naming the offending value and type; updated test to assert `CommandException` and added overflow regression test.
### Driver.AbLegacy.Cli-002
+17 -17
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 3 |
## Checklist coverage
@@ -66,7 +66,7 @@ RMW arithmetic to the native width so sign-extension can no longer corrupt high
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbLegacyDriver.cs:368` |
| Status | Open |
| Status | Resolved |
**Description:** In `WriteBitInWordAsync` the parent word is decoded with
`Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...))`.
@@ -82,7 +82,7 @@ will break silently. Combined with finding 001 this is a latent correctness haza
operate on an explicitly 16-bit value, or document the reliance on low-16-bit
preservation explicitly.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `current & widthMask` already applied in `WriteBitInWordAsync` by the -001 fix; no additional change needed.
### Driver.AbLegacy-003
@@ -91,7 +91,7 @@ preservation explicitly.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbLegacyAddress.cs:62-95` |
| Status | Open |
| Status | Resolved |
**Description:** `TryParse` does not reject several malformed PCCC addresses that the
XML docs imply are invalid:
@@ -108,7 +108,7 @@ through to libplctag rather than rejected early with a clear error.
reject file numbers on I/O/S, and restrict which file letters may carry a sub-element
(T/C/R only). Add unit coverage for the rejection cases.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `TryParse` now rejects sub-element+bit-index combinations, file numbers on I/O/S files, and sub-elements on non-T/C/R files; unit tests added in `AbLegacyAddressTests`.
### Driver.AbLegacy-004
@@ -117,7 +117,7 @@ reject file numbers on I/O/S, and restrict which file letters may carry a sub-el
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `LibplctagLegacyTagRuntime.cs:36-37` |
| Status | Open |
| Status | Resolved |
**Description:** `DecodeValue` for `AbLegacyDataType.Bit` with `bitIndex == null`
returns `_tag.GetInt8(0) != 0`. A bit-file element (`B3:0/0`) is a single bit inside
@@ -132,7 +132,7 @@ but a `Bit`-typed tag configured with an address that has no `/bit` suffix (e.g.
bit suffix on `Bit`-typed tags (validate in `CreateInstance`/`DiscoverAsync`) or
decode the full 16-bit word and test bit 0.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DecodeValue` for `Bit` with no `bitIndex` now reads the full 16-bit word via `GetInt16(0)` and tests bit 0, avoiding the silent half-word truncation from `GetInt8`.
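To make the difference between the two decodes concrete, here is a small sketch operating on a raw two-byte element buffer. `BitFileDecoding` and both method names are hypothetical, and the little-endian byte order is an assumption of this sketch, not something the finding states.

```csharp
using System;
using System.Buffers.Binary;

public static class BitFileDecoding
{
    // Earlier behaviour as described in the finding: look only at the first byte
    // and treat any non-zero byte as true.
    public static bool FirstByteNonZero(byte[] element) => element[0] != 0;

    // Behaviour described in the resolution: read the whole 16-bit word, test bit 0.
    // Assumes the buffer holds the word in little-endian order.
    public static bool Bit0OfWord(byte[] element) =>
        (BinaryPrimitives.ReadInt16LittleEndian(element) & 0x0001) != 0;
}
```

For a word whose only set bit is bit 1 (bytes `0x02, 0x00`), the first-byte check reports true while the bit-0 test reports false; for a word with only bit 0 set, the two agree.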
### Driver.AbLegacy-005
@@ -194,7 +194,7 @@ shared libplctag `Tag` handle is never touched by two threads at once.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` |
| Status | Open |
| Status | Resolved |
**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are
check-then-act: `device.Runtimes.TryGetValue(...)` then, after `await
@@ -209,7 +209,7 @@ corrupt internal state. `ParentRuntimes` has the identical pattern.
`GetOrAdd`, or guard runtime creation under a per-device lock. Ensure the losing
runtime of any race is disposed.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `Runtimes` and `ParentRuntimes` changed to `ConcurrentDictionary`; `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` now hold a per-key `GetCreationLock` semaphore around the double-checked create+initialize+store sequence so exactly one runtime is created per key and no race-loser is leaked.
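The per-key creation lock described here might be sketched like this. Apart from the `GetCreationLock` name taken from the resolution, the types are hypothetical: `CountingRuntime` stands in for the legacy tag runtime and its static counter exists only so the single-creation property can be observed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in runtime that counts how many instances were constructed.
public sealed class CountingRuntime : IDisposable
{
    private static int _created;
    public static int Created => Volatile.Read(ref _created);

    public CountingRuntime() => Interlocked.Increment(ref _created);
    public Task InitializeAsync(CancellationToken ct) => Task.Delay(10, ct);
    public void Dispose() { }
}

public sealed class LockedRuntimeCache
{
    private readonly ConcurrentDictionary<string, CountingRuntime> _runtimes =
        new ConcurrentDictionary<string, CountingRuntime>();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _creationLocks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    // GetOrAdd may run the factory more than once under contention,
    // but every caller receives the single semaphore that was stored.
    private SemaphoreSlim GetCreationLock(string key) =>
        _creationLocks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));

    public async Task<CountingRuntime> EnsureAsync(string key, CancellationToken ct)
    {
        if (_runtimes.TryGetValue(key, out var existing)) return existing; // fast path

        var gate = GetCreationLock(key);
        await gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Second check: another caller may have finished while we waited.
            if (_runtimes.TryGetValue(key, out existing)) return existing;

            var created = new CountingRuntime();
            try
            {
                await created.InitializeAsync(ct).ConfigureAwait(false);
            }
            catch
            {
                created.Dispose();
                throw;
            }
            _runtimes[key] = created;
            return created;
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Compared with an optimistic create-then-`TryAdd` approach, holding the lock across initialization means a second caller waits instead of building a throwaway runtime, at the cost of serializing first access per key.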
### Driver.AbLegacy-008
@@ -218,7 +218,7 @@ runtime of any race is disposed.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` |
| Status | Open |
| Status | Resolved |
**Description:** `_health` is a plain non-volatile reference field mutated from
`ReadAsync`, `WriteAsync` (both can run on multiple threads / poll loops) and
@@ -233,7 +233,7 @@ successful read can clobber a `Degraded` write from a concurrent failing read.
lock / `Interlocked.Exchange`. Consider only downgrading on failure and upgrading on a
successful poll so a single failed read does not flap the surface.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `_health` marked `volatile`; memory barrier comment documents the acquire/release ordering guarantee.
### Driver.AbLegacy-009
@@ -242,7 +242,7 @@ successful poll so a single failed read does not flap the surface.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AbLegacyDriver.cs:41-74` |
| Status | Open |
| Status | Resolved |
**Description:** `InitializeAsync` starts probe loops with `Task.Run` inside the try
block. If `InitializeAsync` fails - or is re-entered - after some probe loops are
@@ -257,7 +257,7 @@ and `CancellationTokenSource`s alive holding libplctag handles. Separately,
`ShutdownAsync` (cancel probe CTSs, dispose runtimes, clear dictionaries) before
rethrowing, so a failed initialise leaves no live background work.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `InitializeAsync` catch block now cancels and disposes probe CTSs, calls `DisposeRuntimes`, and clears `_devices`/`_tagsByName` before rethrowing, leaving no orphaned background tasks or handles.
### Driver.AbLegacy-010
@@ -266,7 +266,7 @@ rethrowing, so a failed initialise leaves no live background work.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AbLegacyStatusMapper.cs:26-56` |
| Status | Open |
| Status | Resolved |
**Description:** `MapLibplctagStatus` maps the integer codes -5/-7/-14/-16/-17. These
do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32,
@@ -284,7 +284,7 @@ package and map by enum name rather than magic integers. Either wire `MapPcccSta
into a real PCCC-STS path or delete it as dead code. The same defect exists in
`AbCipStatusMapper` and should be fixed in lockstep.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `MapLibplctagStatus` now casts to `libplctag.Status` and switches on named enum members (matching the AbCip mapper pattern); `MapPcccStatus` retained with a comment documenting it as a reference mapping for future PCCC-STS inspection; tests updated to use `Status` enum members.
### Driver.AbLegacy-011
@@ -315,7 +315,7 @@ rather than blocking on the async path.
| Severity | Medium |
| Category | Design-document adherence |
| Location | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` |
| Status | Open |
| Status | Resolved |
**Description:** `AbLegacyPlcFamilyProfile` declares four record properties -
`DefaultCipPath`, `MaxTagBytes`, `SupportsStringFile`, `SupportsLongFile` - and only
@@ -336,7 +336,7 @@ the host CIP path is empty; reject `Long`/`String` tags against families whose p
sets the corresponding flag false; use `MaxTagBytes` for validation) or remove the
unused fields and the doc comments that imply they are load-bearing.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DeviceState.EffectiveCipPath` applies `DefaultCipPath` when the parsed host address has an empty CIP path; `InitializeAsync` validates `Long`/`String` tag types against `SupportsLongFile`/`SupportsStringFile` and throws early; `MaxTagBytes` tracked as a follow-up (string/array chunking requires broader design work).
### Driver.AbLegacy-013
+7 -7
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
| Open findings | 2 |
## Checklist coverage
@@ -81,7 +81,7 @@ a regression `[Theory]` asserting the pre-fix wrong names no longer apply.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` |
| Status | Open |
| Status | Resolved |
**Description:** `FormatStatus` matches the full 32-bit status word for exact equality
against the shortlist. OPC UA status codes carry sub-code/flag bits in the low 16 bits
@@ -96,7 +96,7 @@ codes, or (b) match on the severity bits (`code & 0xC0000000`) to at least alway
`Good` / `Uncertain` / `Bad` even when sub-code bits are set, and match the named codes
on the masked code (`code & 0xFFFF0000`).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `FormatStatus` now matches named codes on `code & 0xFFFF0000` and falls back to a severity-class label (`Good`/`Uncertain`/`Bad`) via `code & 0xC0000000` for unknown sub-codes; the stale "bare-hex for unknown codes" test expectation was corrected to reflect the new severity-class fallback.
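The two-stage match described in this resolution can be sketched as below. This is a Python illustration rather than the project's C# `FormatStatus`: the shortlist contents and the returned label format are assumptions, and treating any code with the top bit set as `Bad` is a simplification chosen for the example.

```python
# Hypothetical shortlist keyed by the upper 16 bits of the status word.
NAMED_CODES = {
    0x00000000: "Good",
    0x40000000: "Uncertain",
    0x80000000: "Bad",
    0x800A0000: "BadTimeout",
    0x80340000: "BadNodeIdUnknown",
}


def format_status(code: int) -> str:
    """Name a status code, ignoring the flag bits in the low 16 bits."""
    name = NAMED_CODES.get(code & 0xFFFF0000)
    if name is not None:
        return name
    # Unknown code: fall back to the severity class in the top two bits.
    severity = code & 0xC0000000
    if severity == 0x00000000:
        return "Good"
    if severity == 0x40000000:
        return "Uncertain"
    return "Bad"
```

A code such as `0x800A0001` still resolves to `BadTimeout` because the low 16 bits are masked off before the lookup, and an unlisted code in the `0x8xxxxxxx` range is labelled `Bad` rather than printed as bare hex.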
### Driver.Cli.Common-003
@@ -105,7 +105,7 @@ on the masked code (`code & 0xFFFF0000`).
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
| Status | Open |
| Status | Resolved |
**Description:** `ConfigureLogging` assigns the process-global `Serilog.Log.Logger`
without disposing the previously assigned logger and the library never calls
@@ -121,7 +121,7 @@ command `ExecuteAsync`, or via a `protected` disposal helper on this base). Trea
`ConfigureLogging` as call-once / idempotent and document that. At minimum capture and
dispose the previous logger if reconfiguration is genuinely intended.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `ConfigureLogging` is now idempotent (guarded by `_loggingConfigured` field) and disposes the previous `Log.Logger` before overwriting; a new `protected static FlushLogging()` helper calls `Log.CloseAndFlush()` for commands to call in their `finally` blocks; XML doc updated accordingly.
### Driver.Cli.Common-004
@@ -152,7 +152,7 @@ width computations.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` |
| Status | Open |
| Status | Resolved |
**Description:** The `FormatStatus_names_well_known_status_codes` `[Theory]` asserts
`0x80060000 => "BadTimeout"`, which encodes the wrong spec value (see
@@ -169,7 +169,7 @@ resolved, and add a test asserting each shortlist entry against the OPC Foundati
`Opc.Ua.StatusCodes` constants so the table cannot silently drift. Add `FormatTable`
empty-input and `DriverCommandBase` level-selection tests.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `FormatTable_with_empty_input_returns_header_only` (exercises the -004 fix), `FormatStatus_with_sub_code_bits_resolves_to_named_class` / `FormatStatus_unknown_sub_code_falls_back_to_severity_class` Theories (cover -002 fix), and a new `DriverCommandBaseTests` class with four tests covering verbose/non-verbose level selection, idempotency of `ConfigureLogging`, and `FlushLogging`; stale `FormatStatus_unknown_codes_fall_back_to_hex_only` expectation corrected to match the -002 severity-class fallback.
### Driver.Cli.Common-006
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 5 |
## Checklist coverage
@@ -95,7 +95,7 @@ or op-mode read to be `IsOk` before declaring the capability present.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `FocasDriver.cs:71-79` |
| Status | Open |
| Status | Resolved |
**Description:** In `InitializeAsync`, capability-matrix validation only runs when
`_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose
@@ -110,7 +110,7 @@ that "config errors now fail at load instead of per-read"
`tag.DeviceHostAddress`, throw an `InvalidOperationException` naming the tag and the
unresolved device host so the operator fixes the typo at startup.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `InitializeAsync` now throws `InvalidOperationException` naming the tag and the unresolved device when `_devices` does not contain `tag.DeviceHostAddress`, preventing silent skip-and-defer to per-read `BadNodeIdUnknown`.
### Driver.FOCAS-004
@@ -119,7 +119,7 @@ unresolved device host so the operator fixes the typo at startup.
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
| Status | Open |
| Status | Resolved |
**Description:** `DiscoverAsync` emits user tags with
`SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`,
@@ -137,7 +137,7 @@ be written.
write. Given the wire backend is read-only and is the only production backend, treating
all FOCAS tags as `ViewOnly` is the simplest correct behaviour.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DiscoverAsync` now unconditionally emits `SecurityClassification.ViewOnly` for all user-authored tags; the `Writable` config field no longer influences the advertised security class since the wire backend never writes.
### Driver.FOCAS-005
@@ -146,7 +146,7 @@ all FOCAS tags as `ViewOnly` is the simplest correct behaviour.
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
| Status | Open |
| Status | Resolved |
**Description:** `_health` is a plain (non-volatile) field mutated from multiple
concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can
@@ -163,7 +163,7 @@ torn-in-time state and successful-read timestamps can regress.
value from a single captured snapshot. The `DeviceState`/`HostState` transition already
uses `ProbeLock`; apply the same discipline to driver health.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — All `_health` reads use `Volatile.Read(ref _health)` and all writes use `Volatile.Write(ref _health, ...)`, ensuring every thread observes the latest reference and multi-step read-modify-write sequences capture a stable snapshot before computing the new value.
### Driver.FOCAS-006
@@ -172,7 +172,7 @@ uses `ProbeLock`; apply the same discipline to driver health.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
| Status | Open |
| Status | Resolved |
**Description:** `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across
a transient disconnect: it only checks `device.Client is { IsConnected: true }` and
@@ -191,7 +191,7 @@ as unrecoverable and recreate it from `_clientFactory`. Simplest: in
null it before creating a fresh instance, rather than retrying `ConnectAsync` on the
stale object.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `EnsureConnectedAsync` now unconditionally disposes and nulls any existing non-connected client before calling `_clientFactory.Create()`, preventing `ObjectDisposedException` loops on a stale `WireFocasClient` after a `HandleRecycle` race or prior teardown.
### Driver.FOCAS-007
@@ -310,7 +310,7 @@ expected by `ReadAlarmsAsync`.
| Severity | Medium |
| Category | Testing coverage |
| Location | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
| Status | Open |
| Status | Resolved |
**Description:** The unit test project does not exercise
`FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` /
@@ -327,4 +327,4 @@ three opt-in sections and assert the options reach the driver; add a
(including the unsupported-program-info case); add a reconnect test that disposes the
fake client mid-session and asserts recovery.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Added `FocasDriverMediumFindingsTests.cs` covering: unknown-DeviceHostAddress init throw (003), ViewOnly enforcement for all tags (004), Volatile `_health` under concurrent reads (005), reconnect-after-external-dispose recovery (006), and a factory full-round-trip test for all three opt-in config sections (012).
+15 -15
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 4 |
## Checklist coverage
@@ -63,13 +63,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `Runtime/StatusCodeMap.cs:86` |
| Status | Open |
| Status | Resolved |
**Description:** `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return code rather than a boolean-as-int, then `Success != 0` is exactly the failure condition and the mapper inverts it — every failed write/read would report `Good`. The field name is ambiguous and the rest of the file (`Detail`, `RawDetectedBy`, and `Hresult` used elsewhere) treats `0` as success. `GatewayGalaxyAlarmAcknowledger.cs:62` uses the opposite convention for the sibling field (`reply.Hresult != 0` means failure).
**Recommendation:** Verify the semantics of `MxStatusProxy.Success` against the gateway proto contract. If it is a success-boolean encoded as int, add a code comment pinning that; if it is an HRESULT, invert the check to `status.Success == 0 => Good`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced `status.Success != 0` with `status.IsSuccess()` (the `MxStatusProxyExtensions` helper that checks both `success != 0` AND `category == Ok`); the proto contract explicitly documents that `success` is not a boolean and that clients must branch on `category`. Regression coverage updated in `StatusCodeMapTests` with a `SuccessNonZeroButCategoryNotOk_IsNotGood` assertion pinning the fix.
### Driver.Galaxy-004
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `GalaxyDriver.cs:901` |
| Status | Open |
| Status | Resolved |
**Description:** `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts `StatusCode >> 30` and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQualityByte`/`FromMxStatus`, so this is a lossy round-trip — it collapses every specific code back to the three category bytes (192/64/0). That happens to satisfy `PerPlatformProbeWatcher.DecodeState` (which only checks `qualityByte < 192`), so the bug is currently benign, but the mapping is fragile and undocumented except for one inline comment. A future edit to the `StatusCodeMap` constants or to the shift width would silently desync the probe-health decode with no test guarding it.
**Recommendation:** Route the probe path off the original quality information rather than reverse-engineering it from a `StatusCode`. Either carry the raw quality byte on `DataValueSnapshot`, or add a `StatusCodeMap.ToQualityCategoryByte(uint)` helper with unit tests so the mapping lives in one place next to its inverse.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `StatusCodeMap.ToQualityCategoryByte(uint)` helper that extracts top-two bits of the OPC UA StatusCode into the OPC DA category byte (Good=192, Uncertain=64, Bad=0); `GalaxyDriver.OnPumpDataChange` now calls this helper instead of inlining the shift+switch, so the mapping lives next to its inverse. Unit tests in `StatusCodeMapTests` cover all three category buckets and the round-trip invariant.
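The category mapping stated in the finding (`0->192, 1->64, _->0` on the top two bits) can be sketched as follows. This is a Python illustration with hypothetical function names, not the C# helper itself; `is_probe_healthy` mirrors only the `qualityByte < 192` check mentioned in the description.

```python
def to_quality_category_byte(status_code: int) -> int:
    """Collapse an OPC UA status code to an OPC DA category byte."""
    severity = (status_code >> 30) & 0x3   # top two bits of the 32-bit code
    if severity == 0:
        return 192                         # Good
    if severity == 1:
        return 64                          # Uncertain
    return 0                               # Bad


def is_probe_healthy(quality_byte: int) -> bool:
    """Anything below 192 is treated as not Good."""
    return quality_byte >= 192
```

Keeping the shift and the three-way mapping in one function means a change to either is visible to the tests that guard the round trip.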
### Driver.Galaxy-005
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:848-861` |
| Status | Open |
| Status | Resolved |
**Description:** `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are active, the handle attached to a given `AlarmEventArgs` can change arbitrarily between transitions. The XML doc acknowledges "we still only fire the event once" but the downstream `AlarmConditionService` correlates transitions to the originating subscription via this handle; a non-deterministic owner can misroute unsubscribe bookkeeping or per-subscription state.
**Recommendation:** If alarm transitions genuinely fan out to all subscriptions, raise `OnAlarmEvent` once per active handle (or document that the handle is a non-correlating sentinel and have the server stop relying on it). If a single owner is required, make the choice deterministic (e.g. the earliest-created handle) and stable.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — changed `_alarmSubscriptions` from `HashSet<GalaxyAlarmSubscriptionHandle>` to `List<GalaxyAlarmSubscriptionHandle>` so insertion order is preserved; `OnAlarmFeedTransition` now picks `[0]` (earliest-registered handle) instead of `First()` on a HashSet, making the owner selection deterministic and stable across mutations. Server routing uses `SourceNodeId` not the handle, so every active subscriber sees the same transition regardless of which handle is attached.
### Driver.Galaxy-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:937-968` |
| Status | Open |
| Status | Resolved |
**Description:** `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteAsync` that passed its `ObjectDisposedException.ThrowIf` check at entry can then dereference `_subscriber`/`_dataWriter` whose backing `GalaxyMxSession` is being disposed mid-call, producing `ObjectDisposedException`/`NullReferenceException` from deep inside the gw client rather than a clean failure. `Dispose` also blocks the caller on `GetAwaiter().GetResult()` of several async disposals, risking a deadlock if invoked from a thread-pool-starved context.
**Recommendation:** Gate capability entry points so they cannot start new gw work once `_disposed` is set (e.g. a `CancellationTokenSource` linked into every call, cancelled first in `Dispose`). Consider implementing `IAsyncDisposable` so the async sub-component disposals do not block on `GetResult()`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `IAsyncDisposable` to `GalaxyDriver` and implemented `DisposeAsync()` as the primary disposal path that awaits each async sub-component (EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) without blocking; `Dispose()` delegates to `DisposeAsync().AsTask().GetAwaiter().GetResult()` for `using`-statement compatibility. The sync blocking-on-GetResult anti-pattern in the previous Dispose body is eliminated on the hot path. Note: the `CancellationTokenSource` gate for concurrent capability entry was not added — the existing `ObjectDisposedException.ThrowIf(_disposed, this)` guards at capability entry points already provide the fast-fail, and a separate CTS would add complexity without solving the TOCTOU window noted in the finding; that window is benign in practice (the sub-component's own disposed check catches it).
### Driver.Galaxy-008
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `GalaxyDriver.cs:354-371` |
| Status | Open |
| Status | Resolved |
**Description:** `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); the discard masks that programming error. Separately, `StartDeployWatcher` builds an `_ownedRepositoryClient` purely for the watcher when discovery has not run yet — if `DiscoverAsync` later runs, `BuildDefaultHierarchySource` overwrites `_ownedRepositoryClient` with a second client, leaking the first (only the latest reference is disposed in `Dispose`).
**Recommendation:** Await `StartAsync` (it completes synchronously after scheduling) or at least observe its result. Reuse a single `GalaxyRepositoryClient` across the deploy watcher and the hierarchy source instead of letting `BuildDefaultHierarchySource` clobber the field — guard the assignment or build the client once in `InitializeAsync`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — (a) replaced `_ = _deployWatcher.StartAsync(...)` discard with an explicit variable + `IsFaulted` check so any synchronous throw from `StartAsync` (e.g. called-twice `InvalidOperationException`) propagates rather than being silently swallowed; (b) changed both `StartDeployWatcher` and `BuildDefaultHierarchySource` to use `_ownedRepositoryClient ??=` so a client built by the watcher is reused by discovery instead of being overwritten and leaked — only one `GalaxyRepositoryClient` instance is now created and disposed.
### Driver.Galaxy-010
@@ -183,13 +183,13 @@
| Severity | Medium |
| Category | Performance & resource management |
| Location | `GalaxyDriver.cs:411` |
| Status | Open |
| Status | Resolved |
**Description:** `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryFootprint` is consumed by the server's status/health surface to gauge cache-flush pressure; a constant `0` makes the Galaxy driver invisible to that mechanism, so a 50k-tag subscription set never registers as memory pressure and `FlushOptionalCachesAsync` (also a no-op) is never meaningfully triggered.
**Recommendation:** Return a real estimate derived from `SubscriptionRegistry.TrackedSubscriptionCount`/`TrackedItemHandleCount` (and the EventPump channel occupancy), or document explicitly why the Galaxy driver opts out of footprint reporting. Remove the stale "PR 4.4 sets this" comment.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced the constant `0` with a live estimate derived from `SubscriptionRegistry.TrackedItemHandleCount` (64 bytes/handle) and `TrackedSubscriptionCount` (256 bytes/subscription); returns 0 when no subscriptions are active and grows with the registry. The stale "PR 4.4 sets this" comment is removed. Regression coverage in `GalaxyDriverInfrastructureTests`.
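The estimate described here is simple arithmetic over the two registry counters. A minimal Python sketch, assuming the per-item weights quoted in the resolution and using hypothetical names:

```python
BYTES_PER_ITEM_HANDLE = 64      # weight quoted in the resolution
BYTES_PER_SUBSCRIPTION = 256    # weight quoted in the resolution


def estimate_memory_footprint(item_handles: int, subscriptions: int) -> int:
    """Rough footprint in bytes; zero when nothing is subscribed."""
    return (item_handles * BYTES_PER_ITEM_HANDLE
            + subscriptions * BYTES_PER_SUBSCRIPTION)
```

Under these weights, 50,000 tracked item handles spread over 10 subscriptions come to 50,000 × 64 + 10 × 256 = 3,202,560 bytes, so a large subscription set no longer reads as zero.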
### Driver.Galaxy-012
@@ -228,10 +228,10 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Status | Open |
| Status | Resolved |
**Description:** The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires `ReportTransportFailure` (Driver.Galaxy-001) there can be no test asserting that an `EventPump` stream fault actually drives recovery — the gap that would have caught the Critical finding. Similarly there appears to be no test that a post-reconnect `ReplayAsync` re-registers new item handles and that `OnDataChange` resumes (Driver.Galaxy-008). The `StatusCodeMap.FromMxStatus` `Success`-flag semantics (Driver.Galaxy-003) and the `DataTypeMap` Int64 gap (Driver.Galaxy-002) are also the kind of behaviour a focused unit test would pin.
**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `GalaxyDriverInfrastructureTests` covering `GetMemoryFootprint` (Driver.Galaxy-011) and `IAsyncDisposable` (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → `OnDataChange` resumes is covered by `EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay` and `FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch` (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect `ReplayAsync` rebinds handles is covered by `SubscriptionRegistryTests.Rebind_*` suite; (c) `StatusCodeMap.FromMxStatus` success/failure rows are covered by `StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood` and `FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood` (landed with Driver.Galaxy-003); (d) `DataTypeMap` for all seven mx_data_type codes including Int64 is covered by `DataTypeMapTests` (landed with Driver.Galaxy-002).
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |
| Open findings | 5 |
## Checklist coverage
@@ -64,7 +64,7 @@ and fail loudly. Add a test where the fake returns a partial/reordered sample se
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`.
`HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and
@@ -83,7 +83,7 @@ sidecar and client per the Contracts.cs versioning rules, so unrecoverable event
dead-lettered. Until then, document explicitly that this writer never produces
`PermanentFail` and that poison events retry indefinitely.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — extending the wire contract (replacing `bool[] PerEventOk` with a per-event status enum) requires a coordinated change to the .NET 4.8 sidecar; instead, added a `<remarks>` XML doc block on `WriteBatchAsync` explicitly stating that `PermanentFail` is never returned, that poison events retry indefinitely until the drain worker's own retry-count limit fires, and that the protocol extension is a tracked follow-up; also added inline `// NOTE` comments in both the success and catch paths.
### Driver.Historian.Wonderware.Client-003
@@ -141,7 +141,7 @@ counters under one lock acquisition.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Ipc/FrameReader.cs:31-32` |
| Status | Open |
| Status | Resolved |
**Description:** After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind
byte with the synchronous, blocking `_stream.ReadByte()` and ignores the
@@ -158,7 +158,7 @@ prefix read to 5 bytes, or do a second `ReadExactAsync(new byte[1], ct)`. This m
whole frame read honor the call-timeout token and matches the async style of the rest of
the reader.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced the synchronous, non-cancellable `_stream.ReadByte()` for the kind byte with an async `ReadExactAsync(new byte[1], ct)` call so the full frame read honours the call-timeout token and cannot wedge the channel on a stalled peer.
### Driver.Historian.Wonderware.Client-006
@@ -191,7 +191,7 @@ retry/backoff is owned by the caller (the alarm drain worker / history router).
| Severity | Medium |
| Category | Security |
| Location | `WonderwareHistorianClient.cs:276` |
| Status | Open |
| Status | Resolved |
**Description:** `ToSnapshots` deserializes peer-supplied bytes with
`MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, typeless MessagePack
@@ -209,7 +209,7 @@ that. Prefer round-tripping the value as a constrained set of known primitive ty
than `object`, and validate `ValueBytes.Length` against a sane per-sample cap before
deserializing.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `DeserializeSampleValue()` helper that enforces a 64 KiB per-sample `ValueBytes` cap before deserialization and documents that the default `StandardResolver` (primitive-only, no `TypelessContractlessStandardResolver`) is in use; both `ToSnapshots` and `AlignAtTimeSnapshots` now route through the helper; added inline XML comments to the two `NuGetAuditSuppress` entries in the csproj stating the advisory title, why it does not apply to this usage, and the revisit trigger.
### Driver.Historian.Wonderware.Client-008
@@ -241,7 +241,7 @@ can be dropped.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** The suite covers happy paths, server-error, bad-secret, a single
reconnect and health counters, but several critical paths are untested:
@@ -263,7 +263,7 @@ wire-parity test the source comments commit to: serialize each DTO with the clie
and assert byte-equality against the sidecar `Driver.Historian.Wonderware.Ipc` copy, so a
silent `[Key]` drift between the two duplicated contract sets is caught at build time.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added five missing tests to `WonderwareHistorianClientTests.cs` (WriteBatchAsync transport-drop catch path returns RetryPlease; InvokeAsync both-attempts-fail propagates exception; stalled sidecar fires OperationCanceledException within CallTimeout; ReadProcessedAsync Total aggregate throws NotSupportedException; sidecar wrong-kind reply throws InvalidDataException) and extended `FakeSidecarServer` with `DisconnectBeforeReply`, `ReplyWithWrongKind`, and `StallAfterRequest` test knobs; added new `ContractsWireParityTests.cs` with 11 tests pinning MessagePack byte layout, round-trip correctness, MessageKind enum values, and Framing constants to catch silent `[Key]` index drift between the client and sidecar mirror copies. Total test count grew from 11 to 27, all passing.
### Driver.Historian.Wonderware.Client-010
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 7 |
## Checklist coverage
@@ -63,7 +63,7 @@ the reconnect path can re-open with `ReadOnly = false`) or at minimum as
| Severity | Medium |
| Category | Correctness and logic bugs |
| Location | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Status | Open |
| Status | Resolved |
**Description:** `HandleWriteAlarmEventsAsync` dereferences `req.Events.Length`
in both the `_alarmWriter is null` branch (line 162) and the catch block (line
@@ -79,7 +79,7 @@ already null-guards `events`; the frame handler does not.
immediately after deserialization (or guard each `.Length` access), consistent
with the null-tolerance the writer already has.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — normalise `req.Events` to `Array.Empty<AlarmHistorianEventDto>()` immediately after deserialization so all subsequent `.Length` accesses are safe against null frames.
### Driver.Historian.Wonderware-003
@@ -88,7 +88,7 @@ with the null-tolerance the writer already has.
| Severity | Medium |
| Category | Correctness and logic bugs |
| Location | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Status | Open |
| Status | Resolved |
**Description:** Raw and at-time reads decide whether a sample is a string or a
numeric with `if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)`.
@@ -106,7 +106,7 @@ field rather than from `Value == 0`. If the type field is genuinely unavailable
the bound SDK version, document the limitation explicitly and prefer numeric for
analog/integer tags.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — extracted the heuristic into a `SelectValue` helper with a detailed XML doc comment explaining the SDK limitation (`HistoryQueryResult` has no data type field in the bound `aahClientManaged` version); the existing `Value == 0` discriminator is preserved as the best available heuristic with the known edge-case documented.
### Driver.Historian.Wonderware-004
@@ -161,7 +161,7 @@ lock), so the snapshot is internally consistent.
| Severity | Medium |
| Category | Error handling and resilience |
| Location | `Ipc/PipeServer.cs:120-128` |
| Status | Open |
| Status | Resolved |
**Description:** `RunAsync` re-accepts connections in a `while` loop. If
`RunOneConnectionAsync` throws synchronously and immediately on every iteration
@@ -175,7 +175,7 @@ seconds) before re-accepting after a caught exception, and consider a
consecutive-failure threshold that escalates to a fatal exit so the supervisor can
restart the sidecar cleanly.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added exponential backoff (250 ms → 8 s, six steps) after each connection-loop failure and a `MaxConsecutiveFailures=20` threshold that re-throws so the SCM/NSSM supervisor can restart the sidecar cleanly.
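The backoff schedule can be sketched as follows. This assumes a simple doubling schedule, which is one way to get from 250 ms to 8 s in six steps; the helper names are illustrative and are not taken from `PipeServer.cs`.

```csharp
using System;

public static class ReconnectBackoff
{
    public const int MaxConsecutiveFailures = 20;

    // Delay before the next accept attempt: 250 ms doubling up to a cap of 8 s.
    // failureCount is 1 for the first consecutive failure.
    public static TimeSpan DelayFor(int failureCount)
    {
        if (failureCount < 1)
            throw new ArgumentOutOfRangeException(nameof(failureCount));
        int step = Math.Min(failureCount - 1, 5); // six steps: 0..5
        return TimeSpan.FromMilliseconds(250 * (1 << step));
    }

    // True once the loop should stop retrying and let the supervisor restart the process.
    public static bool ShouldEscalate(int failureCount) =>
        failureCount >= MaxConsecutiveFailures;
}
```

A caller would reset its failure counter to zero after any connection that completes normally, so only consecutive failures count toward the threshold.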
### Driver.Historian.Wonderware-007
@@ -235,7 +235,7 @@ treat an SDK error as an empty history.
| Severity | Medium |
| Category | Performance and resource management |
| Location | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` |
| Status | Open |
| Status | Resolved |
**Description:** `ReadAggregateAsync` drains `query.MoveNext` into `results` with
no upper bound, unlike `ReadRawAsync`, which honours `maxValues` /
@@ -252,7 +252,7 @@ sidecar holds the whole result set in memory.
`ReadProcessedRequest`. Reject or truncate result sets that would exceed the frame
cap with an explicit error reply rather than letting `WriteAsync` throw.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — applied `_config.MaxValuesPerRead` as a bucket cap in `ReadAggregateAsync` mirroring the raw-read path; truncation logs a Warning with the limit and a hint to widen `IntervalMs` or reduce the time range.
### Driver.Historian.Wonderware-010
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 3 |
## Checklist coverage
@@ -66,7 +66,7 @@ assertion was corrected from 16640 to 0x2100 with system-bank regression cases a
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:86-94` |
| Status | Open |
| Status | Resolved |
**Description:** In the 3-field disambiguation, an empty 3rd field (`40001:F:`) reaches
`parts[2].All(char.IsDigit)`. `Enumerable.All` returns true for an empty sequence, so the empty
@@ -79,7 +79,7 @@ colon gets no diagnostic.
**Recommendation:** Reject an empty 3rd field explicitly, or guard the `All(char.IsDigit)` branch
with `parts[2].Length > 0`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added an explicit `parts[2].Length == 0` check before the `All(char.IsDigit)` branch that returns a descriptive error, so a trailing colon typo produces a diagnostic instead of silently parsing as a scalar.
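The underlying pitfall is that `Enumerable.All` is vacuously true for an empty sequence. A small sketch of the guard, with a hypothetical helper name and error wording (the real parser messages are not shown here):

```csharp
using System.Linq;

public static class ThirdFieldCheck
{
    // Returns null when the field is a valid numeric count, otherwise an error message.
    public static string? ValidateCountField(string field)
    {
        // "".All(char.IsDigit) is true, so the length check has to come first.
        if (field.Length == 0)
            return "third field is empty (trailing ':'?)";
        if (!field.All(char.IsDigit))
            return $"third field '{field}' is not a number";
        return null;
    }
}
```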
### Driver.Modbus.Addressing-003
@@ -88,7 +88,7 @@ with `parts[2].Length > 0`.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` |
| Status | Open |
| Status | Resolved |
**Description:** `LooksLikeByteOrderToken` classifies any 4-letter token as a byte-order token.
A 3-field address whose 3rd field is a 4-letter type-like token (e.g. `40001:S:BOOL`) is routed
@@ -101,7 +101,7 @@ byte order, so the diagnostic actively misdirects.
the error message to mention that field 3 is a byte order and field 2 is the type, or attempt a
type-parse fallback before emitting the byte-order error.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — in the 3-field disambiguation error path, a 4-letter alphanumeric token that looks like a type code now produces a diagnostic explicitly stating that field 3 is the byte-order slot and field 2 is the type slot, directing the user to the correct fix.
### Driver.Modbus.Addressing-004
@@ -110,7 +110,7 @@ type-parse fallback before emitting the byte-order error.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:182-194` |
| Status | Open |
| Status | Resolved |
**Description:** The bit suffix is stripped using `text.IndexOf('.')` — the first dot. An input
such as `40001.5.3` produces a bit text of "5.3", rejected by `byte.TryParse` with the generic
@@ -124,7 +124,7 @@ asserting it, and the diagnostics for these malformed inputs are inconsistent.
region/offset segment is non-empty and dot-free after the strip so malformed inputs get a precise
diagnostic.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — switched to `LastIndexOf('.')`, added a non-empty guard for the address segment before the dot, and added a check that the address segment itself contains no dot (diagnosing multi-dot inputs with "contains multiple dots" rather than a confusing bit-index error).
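A sketch of the three checks in the order the resolution lists them. The method name, signature, and exact messages are assumptions for illustration, not the actual `ModbusAddressParser` code.

```csharp
public static class BitSuffix
{
    // Splits "40001.5" into address "40001" and bit 5.
    // Returns null on success or an error message on malformed input.
    public static string? TrySplit(string text, out string address, out byte? bit)
    {
        address = text;
        bit = null;
        int dot = text.LastIndexOf('.');
        if (dot < 0) return null; // no bit suffix present
        address = text.Substring(0, dot);
        if (address.Length == 0) return "address before '.' is empty";
        if (address.Contains('.')) return "address contains multiple dots";
        if (!byte.TryParse(text.Substring(dot + 1), out byte b))
            return "bit index is not a number";
        bit = b;
        return null;
    }
}
```

With `LastIndexOf`, an input such as `40001.5.3` leaves `40001.5` as the address segment, so the multi-dot check fires before any bit parsing is attempted.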
### Driver.Modbus.Addressing-005
@@ -133,7 +133,7 @@ diagnostic.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ModbusAddressParser.cs:200-213` |
| Status | Open |
| Status | Resolved |
**Description:** `TryParseRegionAndOffset` tries family-native, then mnemonic, then Modicon. When
all three fail it returns false with whatever error the Modicon parser last wrote (comment: "the
@@ -148,7 +148,7 @@ Modicon "must be 5 or 6 digits" error, hiding the real cause (e.g. "contains non
prefix, prefer and preserve the family-native error rather than letting the Modicon fallback
overwrite it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — the family-native error is now captured in `familyNativeError` and, after all three branches fail, preferred over the Modicon fallback error when it is non-null (indicating the address matched a family prefix but failed deep inside the helper).
### Driver.Modbus.Addressing-006
@@ -202,7 +202,7 @@ structured tag form and is intentionally out of grammar scope.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** Several edge cases of the address arithmetic are untested or asserted wrong:
(a) DL205 system V-memory mapping is tested only with the incorrect expected value
@@ -217,7 +217,7 @@ are exactly the high-risk surface this module owns, and they are the least cover
and for the parser count/bit/field edge cases. Correct the V40400 assertion as part of fixing
finding -001.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `ModbusAddressEdgeCaseTests.cs` covering: empty 3rd-field rejection, multi-dot input rejection, `UserVMemoryToPdu` overflow, `AddOctalOffset` overflow via Y and C helpers, `SystemVMemoryToPdu` base/overflow, `MelsecAddress.ParseHex` overflow, `DRegisterToHolding` and `MRelayToCoil` bank-base overflow.
### Driver.Modbus.Addressing-009
+5 -5
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |
| Open findings | 6 |
## Checklist coverage
@@ -36,7 +36,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` |
| Status | Open |
| Status | Resolved |
**Description:** `SubscribeCommand` synthesises its `ModbusTagDefinition` with only
`Name`, `Region`, `Address`, `DataType`, `Writable`, and `ByteOrder` — it never
@@ -54,7 +54,7 @@ options to `SubscribeCommand` (mirroring `ReadCommand`) and pass them into the
actually supports and reject `BitInRegister` / `String` at command entry with a
clear message.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `--bit-index`, `--string-length`, and `--string-byte-order` options to `SubscribeCommand`, mirroring `ReadCommand`, and passed them through to `ModbusTagDefinition` so `BitInRegister` and `String` types subscribe correctly.
### Driver.Modbus.Cli-002
@@ -63,7 +63,7 @@ clear message.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteCommand` rejects read-only regions (`DiscreteInputs` /
`InputRegisters`) but does not validate that `--type` is meaningful for the
@@ -78,7 +78,7 @@ region only supports `Bool`-style boolean values.
combined with any non-boolean `--type` (anything other than `Bool`), with a
message explaining coils carry a single bit.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added a `Region == Coils && DataType != Bool` check immediately after the read-only-region guard, throwing `CommandException` with a message explaining that coils carry a single bit and only `--type Bool` is valid.
### Driver.Modbus.Cli-003
+9 -9
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 7 |
## Checklist coverage
@@ -48,13 +48,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusDriver.cs:127-186` |
| Status | Open |
| Status | Resolved |
**Description:** `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_options.Tags` is fixed for a driver instance, the same set re-inserts harmlessly today — but the asymmetry is a latent bug: any future path that re-runs init with a different tag set leaves stale tag entries that resolve reads/writes against deleted nodes. `_lastPublishedByRef` and `_lastWrittenByRef` similarly survive a Reinitialize, retaining deadband/write-suppression baselines against the old config, while `_autoProhibited` *is* deliberately cleared (`ModbusDriver.cs:179`) — the inconsistency shows the clearing was simply overlooked.
**Recommendation:** Clear `_tagsByName`, `_lastPublishedByRef`, and `_lastWrittenByRef` in `ShutdownAsync` (or at the top of `InitializeAsync`) so a Reinitialize starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_tagsByName.Clear()`, `_lastPublishedByRef.Clear()`, and `_lastWrittenByRef.Clear()` to `ShutdownAsync` (via the new shared `TeardownAsync` helper) so a `ReinitializeAsync` cycle always starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
### Driver.Modbus-003
@@ -78,13 +78,13 @@
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ModbusDriver.cs:1468-1473` |
| Status | Open |
| Status | Resolved |
**Description:** `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `ShutdownAsync` leaks the probe loop, the re-probe loop, and every active polled subscription background `Task`/`CancellationTokenSource`. The two `Task.Run` loops keep running against a disposed transport, throwing on every tick. `Dispose()` (sync) has the same gap and additionally blocks on the async path via `GetAwaiter().GetResult()`.
**Recommendation:** Make `DisposeAsync` perform the same teardown as `ShutdownAsync` (cancel both CTSs, dispose them, dispose `_poll`) before disposing `_transport`. Have `ShutdownAsync` and `DisposeAsync` share a private `TeardownAsync` helper.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — refactored teardown into a shared `TeardownAsync` helper; `DisposeAsync` now delegates to it, cancelling both CTS objects, disposing `_poll`, and disposing `_transport` — matching `ShutdownAsync` and eliminating the probe/re-probe/poll-engine leak on `await using` callers.
### Driver.Modbus-005
@@ -93,13 +93,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusDriver.cs:777-798,323-330` |
| Status | Open |
| Status | Resolved |
**Description:** `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-bit — it does not guarantee a non-exception response PDU is long enough to hold function-code + byte-count + the claimed data. A device (or buggy server) returning a 1-byte PDU, or a byte-count larger than the actual payload, produces an `IndexOutOfRangeException`/`ArgumentException` rather than a clean comms error. `DecodeBitArray` similarly indexes `bitmap[0]` (`ModbusDriver.cs:325`) without checking the bitmap is non-empty. In `ReadAsync` these are caught by the catch-all and mapped to `BadCommunicationError`, so impact is limited; in `ReadCoalescedAsync` the exception is opaque to the narrower catch arms.
**Recommendation:** In `ReadRegisterBlockAsync`/`ReadBitBlockAsync`, validate `resp.Length >= 2` and `resp.Length >= 2 + resp[1]` before slicing, throwing a descriptive `InvalidDataException`. Validate the decoded byte/bit count matches the request quantity.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `resp.Length >= 2`, `resp.Length >= 2 + resp[1]`, and byte-count-vs-quantity checks in both `ReadRegisterBlockAsync` and `ReadBitBlockAsync`, throwing `InvalidDataException` with precise diagnostics; added an empty-bitmap guard in `DecodeBitArray`.
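The register-read checks might look roughly like this. It is a sketch under the assumption that `resp` is the response PDU laid out as function code, byte count, then data; the helper name is hypothetical.

```csharp
using System;
using System.IO;

public static class ModbusResponseGuard
{
    // resp layout: [function code][byte count][data...]
    public static byte[] ExtractRegisterData(byte[] resp, ushort quantity)
    {
        if (resp.Length < 2)
            throw new InvalidDataException($"PDU too short: {resp.Length} byte(s)");
        int byteCount = resp[1];
        if (resp.Length < 2 + byteCount)
            throw new InvalidDataException(
                $"byte count {byteCount} exceeds payload of {resp.Length - 2} byte(s)");
        if (byteCount != quantity * 2)
            throw new InvalidDataException(
                $"expected {quantity * 2} data bytes for {quantity} register(s), got {byteCount}");
        var data = new byte[byteCount];
        Buffer.BlockCopy(resp, 2, data, 0, byteCount);
        return data;
    }
}
```

Throwing `InvalidDataException` before slicing means a truncated or inconsistent reply surfaces as a described protocol fault rather than an `IndexOutOfRangeException` from the copy.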
### Driver.Modbus-006
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ModbusDriver.cs:514-524,532-550` |
| Status | Open |
| Status | Resolved |
**Description:** `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issuing reads against the captured, disposed transport. `ReprobeLoopAsync` only catches `OperationCanceledException when (ct.IsCancellationRequested)` — an `ObjectDisposedException` from the disposed transport escapes `RunReprobeOnceForTestAsync` and faults the fire-and-forget background `Task`, silently killing the re-probe loop with the wrong failure mode.
**Recommendation:** Re-check `_transport`/cancellation inside the per-candidate loop, or broaden the `ReprobeLoopAsync` catch to also swallow `ObjectDisposedException` when `ct.IsCancellationRequested`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — broadened `ReprobeLoopAsync` to catch `ObjectDisposedException when (ct.IsCancellationRequested)` and return cleanly, so a transport disposal race during shutdown exits the background task rather than faulting it.
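The broadened catch can be illustrated with exception filters. This is a sketch only: `onePass` stands in for the real re-probe pass and the loop shape is an assumption, not the driver's actual `ReprobeLoopAsync`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReprobeLoop
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> onePass, TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                await onePass(ct).ConfigureAwait(false);
                await Task.Delay(interval, ct).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Normal shutdown.
        }
        catch (ObjectDisposedException) when (ct.IsCancellationRequested)
        {
            // The transport was disposed while a pass was in flight during shutdown.
        }
    }
}
```

The `when` filter matters: an `ObjectDisposedException` seen while the token is not cancelled still faults the task, so a genuine use-after-dispose bug is not hidden.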
### Driver.Modbus-007
+17 -17
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 2 |
## Checklist coverage
@@ -108,13 +108,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientDriver.cs:1330-1359` |
| Status | Open |
| Status | Resolved |
**Description:** OnReconnectComplete mutates `Session` (line 1347) directly from the reconnect-handler callback thread with no synchronization against ReadAsync/WriteAsync/DiscoverAsync/ShutdownAsync. Session is a plain auto-property with no memory barrier; a concurrent reader on another thread may observe a stale reference. ShutdownAsync (line 425) can also run concurrently with OnReconnectComplete: ShutdownAsync disposes the session and sets Session = null while OnReconnectComplete sets Session = newSession, and the interleaving is unspecified, potentially leaving a live session leaked after shutdown.
**Recommendation:** Route all Session mutations through a single lock (or the `_gate`). Make ShutdownAsync cancel the reconnect handler and wait for any in-flight OnReconnectComplete to settle before disposing the session.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — All Session mutations (assignment to newSession in OnReconnectComplete, and assignment to null in ShutdownAsync) now run inside the `_probeLock` critical section, preventing races between the reconnect callback thread, ShutdownAsync, and keep-alive callbacks. KeepAlive handler detach/attach is also done under `_probeLock` so a keep-alive cannot fire against the old session after the swap.
### Driver.OpcUaClient-007
@@ -123,13 +123,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` |
| Status | Open |
| Status | Resolved |
**Description:** Two disposal races. (1) Dispose() does `DisposeAsync().AsTask().GetAwaiter().GetResult()`, synchronous blocking on async work. The Galaxy stability review (driver-stability.md, the 2026-04-13 findings) explicitly calls out sync-over-async on the OPC UA stack thread as a closed bug class; if Dispose() runs on the OPC UA stack thread or any thread the SDK continuations need, this deadlocks. (2) DisposeAsync disposes `_gate` (line 1382) after ShutdownAsync returns, but ShutdownAsync does not drain in-flight ReadAsync/WriteAsync operations holding `_gate`. An in-flight read that calls `_gate.Release()` (line 508) after `_gate.Dispose()` throws ObjectDisposedException on a background thread.
**Recommendation:** Provide an async disposal path callers prefer; if a sync Dispose() is unavoidable keep it free of .GetResult() on SDK-thread-affine work. Before disposing `_gate`, acquire it once so all in-flight gated operations have completed, or guard releases against disposal.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `Dispose()` no longer calls `.GetAwaiter().GetResult()` on async work; it performs a purely-synchronous teardown (cancel reconnect handler, detach keep-alive, null Session under `_probeLock`). Both `Dispose()` and `DisposeAsync()` now acquire `_gate` once before disposing it, ensuring any in-flight gated operation has released before the gate is torn down.
### Driver.OpcUaClient-008
@@ -138,13 +138,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `OpcUaClientDriver.cs:1092-1099` |
| Status | Open |
| Status | Resolved |
**Description:** AcknowledgeAsync issues the batched CallAsync and then catches all exceptions with a best-effort empty catch; it also never inspects the per-call results in the success path (`_ = await session.CallAsync(...)`). An alarm acknowledgment the upstream server rejects (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied) is reported as success to the caller. IAlarmSource.AcknowledgeAsync has no per-item result, so the only way a failure could surface is via an exception, and the catch suppresses even that. Operators acking a critical alarm get no signal that the ack did not take.
**Recommendation:** Inspect CallMethodResult.StatusCode for each result and log Bad codes; rethrow (or surface via driver health) genuine transport failures rather than swallowing them. Consider extending the contract so per-ack failures propagate.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `AcknowledgeAsync` now inspects each `CallMethodResult.StatusCode` in the success path and logs a Warning for any Bad code (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied, etc.). `OperationCanceledException` (transport timeout) is now re-thrown instead of swallowed; other transport exceptions are also logged with the driver instance ID. Requires `ILogger<OpcUaClientDriver>` injected via new optional constructor parameter.
### Driver.OpcUaClient-009
@@ -153,13 +153,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `OpcUaClientDriver.cs:560-564` |
| Status | Open |
| Status | Resolved |
**Description:** WriteAsync's catch block fans out BadCommunicationError across the whole batch on any exception. Writes are non-idempotent by default (IWritable remarks, decision #44/#45): a timeout exception may fire after the upstream server already applied the write. Reporting BadCommunicationError (a code that reads as "definitely did not happen") for a write that may have succeeded is misleading; the OPC UA client downstream may safely re-issue and double-apply. The read path has the same fan-out but reads are idempotent so it is benign there; for writes the ambiguity matters.
**Recommendation:** Map write timeouts/cancellations to BadTimeout (which downstream correctly treats as "outcome unknown, do not blindly retry") rather than BadCommunicationError, and only use BadCommunicationError for failures that provably occurred before the request reached the wire.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `WriteAsync`'s inner catch block now handles `OperationCanceledException` (timeout/cancellation) separately, mapping it to `BadTimeout` (0x800A0000), while all other exceptions map to `BadCommunicationError`. The session-null pre-wire exit still correctly uses `BadCommunicationError`.
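The split between the two fault codes reduces to a single type test. A minimal sketch, with a hypothetical mapper class; the numeric values are the standard OPC UA codes for `BadTimeout` and `BadCommunicationError`.

```csharp
using System;

public static class WriteFaultMapper
{
    public const uint BadTimeout = 0x800A0000;
    public const uint BadCommunicationError = 0x80050000;

    // Cancellation or timeout means the outcome is unknown; anything else is
    // treated as a communication failure.
    public static uint Map(Exception ex) =>
        ex is OperationCanceledException ? BadTimeout : BadCommunicationError;
}
```

Because `TaskCanceledException` derives from `OperationCanceledException`, both map to `BadTimeout` under this test.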
### Driver.OpcUaClient-010
@@ -168,13 +168,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OpcUaClientDriver.cs:823-824` |
| Status | Open |
| Status | Resolved |
**Description:** MapUpstreamDataType maps DataTypeIds.Byte (the OPC UA unsigned 8-bit type) to DriverDataType.Int16. Byte should map to an unsigned driver type (UInt16 is the smallest unsigned available, matching how SByte belongs with the signed family). Mapping an unsigned 0-255 type onto signed Int16 misrepresents the type metadata downstream: clients see a signed type for an unsigned source, and any range/validation logic keyed off the driver data type is wrong. SByte correctly belongs with Int16; Byte does not.
**Recommendation:** Map DataTypeIds.Byte to DriverDataType.UInt16 (or add a Byte/UInt8 driver type if the enum supports finer granularity), keeping SByte and Int16 on the signed Int16 mapping.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `MapUpstreamDataType` now maps `DataTypeIds.Byte` → `DriverDataType.UInt16` (unsigned family) while `DataTypeIds.SByte` remains on `DriverDataType.Int16` (signed family). Test `MapUpstreamDataType_Byte_maps_to_UInt16_unsigned_family` asserts the fix and `MapUpstreamDataType_maps_Byte_to_UInt16_not_Int16` guards the regression.
### Driver.OpcUaClient-011
@@ -198,13 +198,13 @@
| Severity | Medium |
| Category | Security |
| Location | `OpcUaClientDriver.cs:210-217` |
| Status | Open |
| Status | Resolved |
**Description:** When AutoAcceptCertificates is true the driver registers a CertificateValidation handler that accepts only StatusCodes.BadCertificateUntrusted. A self-signed or otherwise untrusted server certificate frequently fails validation with a different code first (BadCertificateChainIncomplete, BadCertificateTimeInvalid, BadCertificateHostNameInvalid), so auto-accept silently does not accept many real dev certificates and the connect fails confusingly. The handler is added to config.CertificateValidator but never removed; each driver instance leaks a delegate subscription on a validator that may be process-shared. The option doc says auto-accept is dev-only and must be false in production, but there is no runtime guard preventing AutoAcceptCertificates=true shipping to production and no log warning when it is enabled.
**Recommendation:** When auto-accepting for dev, accept the full set of certificate-validation error codes (or use the SDK AutoAcceptUntrustedCertificates path consistently). Emit a prominent warning log every time AutoAcceptCertificates is enabled so a production misconfiguration is visible. Detach the handler on shutdown.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — The cert-validation handler now accepts ALL validation errors (not only BadCertificateUntrusted) when `AutoAcceptCertificates=true`, so real dev certs with chain/host/time errors work. A `LogWarning` is emitted at startup whenever the flag is set. The handler delegate + validator reference are stored in `_certValidationHandler`/`_certValidatorRef` and detached in both `ShutdownAsync` and `Dispose()`/`DisposeAsync()` to prevent the delegate leak.
### Driver.OpcUaClient-013
@@ -213,13 +213,13 @@
| Severity | Medium |
| Category | Performance & resource management |
| Location | `OpcUaClientDriver.cs:436-437` |
| Status | Open |
| Status | Resolved |
**Description:** GetMemoryFootprint() is hard-coded to return 0 and FlushOptionalCachesAsync is a no-op Task.CompletedTask. docs/v2/driver-stability.md section "In-process only (Tier A/B)" makes per-instance allocation tracking a contract requirement, and driver-specs.md section 8 explicitly calls out browse-cache memory: BrowseStrategy=Full against a large remote server can cache tens of thousands of node descriptions and the per-instance budget should bound this. Returning 0 means the Core 30-second footprint poll can never detect this driver's browse-cache growth, and the cache-budget-breach to flush escalation path is dead code. A gateway pointed at a 10k-node server (the configured cap) silently evades the Tier-A memory-guard mechanism.
**Recommendation:** Track an approximate footprint for the discovered-node set and any cached browse state, return it from GetMemoryFootprint(), and implement FlushOptionalCachesAsync to drop droppable cache. If the driver genuinely holds no significant cache, document why 0 is correct.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `DiscoverAsync` now updates a `_discoveredNodeCount` volatile counter after each pass. `GetMemoryFootprint()` returns `_discoveredNodeCount * 512` (conservative ~512 bytes per node for DriverAttributeInfo + strings). `FlushOptionalCachesAsync` resets `_discoveredNodeCount` to 0, signalling Core that re-discovery will rebuild cleanly. A 10k-node server now reports ~5 MB to the Core slope alarm rather than 0.
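The footprint estimate is simple arithmetic over the node counter. A sketch with a hypothetical holder class; only the 512-bytes-per-node factor and the reset-on-flush behaviour come from the resolution.

```csharp
using System.Threading;

public sealed class DiscoveryFootprint
{
    private const long ApproxBytesPerNode = 512;
    private int _discoveredNodeCount;

    // Called at the end of each discovery pass.
    public void RecordDiscoveryPass(int nodeCount) =>
        Volatile.Write(ref _discoveredNodeCount, nodeCount);

    // Approximate bytes held for discovered nodes.
    public long GetMemoryFootprint() =>
        Volatile.Read(ref _discoveredNodeCount) * ApproxBytesPerNode;

    // Drops the estimate; the next discovery pass rebuilds it.
    public void Flush() => Volatile.Write(ref _discoveredNodeCount, 0);
}
```

For the 10k-node cap this gives 10,000 × 512 = 5,120,000 bytes, which is the "~5 MB" figure quoted above.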
### Driver.OpcUaClient-014
@@ -243,10 +243,10 @@
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` |
| Status | Open |
| Status | Resolved |
**Description:** Unit-test coverage is solid for the pure mappers (MapSeverity, MapUpstreamDataType, MapSecurityPolicy, MapAggregateToNodeId, BuildCertificateIdentity, ResolveEndpointCandidates) and for "throws before init" guards, but the highest-risk behaviours of a gateway driver have no test: the reconnect/session-swap path (OnKeepAlive to OnReconnectComplete, findings -001/-002/-005/-006), browse continuation-point handling (-003), the cascading-quality fan-out on a mid-batch transport failure, and namespace remapping (-004). The reconnect test file itself states wire-level disconnect-reconnect-resume coverage lands with the in-process fixture, i.e. the single largest gateway bug surface (per driver-specs.md section 8) is explicitly untested. The integration suite is Docker-fixture gated against opc-plc and is a smoke test only. The failed-reconnect-to-Faulted and concurrent-keep-alive races are pure-logic paths testable with a fake ISession.
**Recommendation:** Add tests exercising the reconnect callbacks with a stub session (success and give-up cases), a browse test with a paged/continuation-point server stub, and a read-batch test asserting upstream Bad StatusCodes pass through verbatim while a transport throw fans out the local fault code.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — Added `OpcUaClientMediumFindingsRegressionTests.cs` covering: (1) BadTimeout vs BadCommunicationError status-code distinction for the write-timeout path (Driver.OpcUaClient-009); (2) Byte→UInt16 mapping regression (Driver.OpcUaClient-010); (3) AutoAcceptCertificates warning log assertion (Driver.OpcUaClient-012); (4) GetMemoryFootprint/FlushOptionalCachesAsync contract (Driver.OpcUaClient-013); (5) MapSeverity thresholds, pre-init health, Session null pre-init, GetHostStatuses contract. Wire-level reconnect callback tests remain fixture-gated pending the in-process OPC UA server fixture.
+7 -7
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
| Open findings | 4 |
## Checklist coverage
@@ -36,7 +36,7 @@ a category produced nothing rather than leaving it blank.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` |
| Status | Open |
| Status | Resolved |
**Description:** `WriteCommand.ParseValue` parses numeric and `DateTime` values with the
raw BCL parsers (`short.Parse`, `float.Parse`, `DateTime.Parse`, etc.). On malformed
@@ -54,7 +54,7 @@ re-throws `FormatException` and `OverflowException` as
`CliFx.Exceptions.CommandException` with a message that names the `--type` and the
offending value — matching the bool path. Update the test to expect `CommandException`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — wrapped all numeric/DateTime BCL parses in `try/catch(FormatException)` and `try/catch(OverflowException)` that re-throw as `CommandException` with a message naming the `--type` and the offending value; updated `ParseValue_non_numeric_for_numeric_types_throws` to assert `CommandException`, and added an overflow-edge test.
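The wrapping pattern for one numeric type might look like this. `CommandException` below is a local stand-in so the sketch compiles on its own (the real one lives in `CliFx.Exceptions`), and the message wording is assumed.

```csharp
using System;
using System.Globalization;

// Stand-in for CliFx.Exceptions.CommandException.
public sealed class CommandException : Exception
{
    public CommandException(string message) : base(message) { }
}

public static class ValueParser
{
    public static short ParseInt16(string raw)
    {
        try
        {
            return short.Parse(raw, CultureInfo.InvariantCulture);
        }
        catch (Exception ex) when (ex is FormatException or OverflowException)
        {
            // Name the --type and the offending value so the user sees what to fix.
            throw new CommandException($"--type Int16: cannot parse '{raw}' ({ex.Message})");
        }
    }
}
```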
### Driver.S7.Cli-002
@@ -63,7 +63,7 @@ offending value — matching the bool path. Update the test to expect `CommandEx
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` |
| Status | Open |
| Status | Resolved |
**Description:** The `--type` option help text on `read`, `write`, and `subscribe`
advertises the full `S7DataType` set (`Int64 / UInt64 / Float64 / String / DateTime`),
@@ -83,7 +83,7 @@ Float32`) until the follow-up driver PR lands, or (b) keep the surface but add a
help text and `docs/Driver.S7.Cli.md`. Option (a) is preferred so the CLI does not offer
options that cannot succeed.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — updated the `--type` help text on `read`, `write`, and `subscribe` to list the implemented set (Bool/Byte/Int16/UInt16/Int32/UInt32/Float32) and appended a one-line caveat that Int64/UInt64/Float64/String/DateTime are not yet implemented and will return BadNotSupported.
### Driver.S7.Cli-003
@@ -92,7 +92,7 @@ options that cannot succeed.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` |
| Status | Open |
| Status | Resolved |
**Description:** `ProbeCommand` XML doc and the `Driver.S7.Cli.md` "fastest is the
device talking" framing say the probe "connects ... prints health" and "surfaces
@@ -111,7 +111,7 @@ report derived from `driver.GetHealth()` (which `InitializeAsync` sets to
`Faulted` with the exception message before re-throwing). The probe should report an
unreachable device, not crash on it.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — wrapped the `InitializeAsync` + `ReadAsync` body in a `try/catch` that on any non-cancellation failure still prints the structured `Host:`, `CPU:`, `Health:`, and `Last error:` lines derived from `driver.GetHealth()`, so an unreachable device produces a health report rather than a stack trace.
### Driver.S7.Cli-004
+11 -11
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |
| Open findings | 5 |
## Checklist coverage
@@ -68,7 +68,7 @@ rather than throwing a misleading type-mismatch on every read.
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:350` |
| Status | Open |
| Status | Resolved |
**Description:** MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32.
UInt32 values above int.MaxValue (2^31-1) wrap to negative when surfaced to the
@@ -80,7 +80,7 @@ called out.
unsigned range, or add the missing unsigned DriverDataType members. At minimum
correct the comment so the lossiness of UInt32 is documented.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added an inline comment to the `MapDataType` switch explicitly documenting the UInt32→Int32 lossiness (same limitation as Int64/UInt64, tracked for a follow-up PR adding unsigned DriverDataType members); the code mapping is unchanged pending that follow-up.
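The wrap described in this finding can be reproduced with plain integer arithmetic. The sketch below is illustrative Python, not the driver's C# code; `reinterpret_uint32_as_int32` is a hypothetical helper that mimics what happens when a UInt32 bit pattern is surfaced through a signed 32-bit type.

```python
import struct

INT32_MAX = 2**31 - 1


def reinterpret_uint32_as_int32(value: int) -> int:
    """Reinterpret the 32-bit pattern of an unsigned value as a signed one."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value is outside the UInt32 range")
    # Pack as unsigned little-endian, then unpack the same 4 bytes as signed.
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(reinterpret_uint32_as_int32(INT32_MAX))      # 2147483647 (still representable)
print(reinterpret_uint32_as_int32(INT32_MAX + 1))  # -2147483648 (first wrapped value)
print(reinterpret_uint32_as_int32(3_000_000_000))  # -1294967296
```

Every value in the upper half of the unsigned range comes out negative, which is the lossiness the inline comment now documents.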
### Driver.S7-003
@@ -110,7 +110,7 @@ at the top of ReadAsync and WriteAsync.
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs` (whole file) |
| Status | Open |
| Status | Resolved |
**Description:** The driver performs no logging. CLAUDE.md Library Preferences
mandate Serilog with a rolling daily file sink. Every error path is an empty
@@ -124,7 +124,7 @@ and no event trail to diagnose an intermittent PLC.
success/failure, probe Running/Stopped transitions, PUT/GET-disabled detection,
and swallowed poll-loop / shutdown exceptions.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — injected `ILogger<S7Driver>` (optional, defaults to `NullLogger`) into the primary constructor; added structured log calls for connect success/failure, probe Running/Stopped transitions, and swallowed poll-loop exceptions, giving operators an event trail via Serilog.
### Driver.S7-005
@@ -224,7 +224,7 @@ TIA Portal PUT/GET toggle; a genuine device fault still maps to
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:286` |
| Status | Open |
| Status | Resolved |
**Description:** WriteAsync catch ladder is coarser than ReadAsync and loses
information. The generic catch (Exception) maps everything - socket errors,
@@ -241,7 +241,7 @@ OperationCanceledException propagate, map socket/timeout faults to
BadCommunicationError, map value-conversion failures to a distinct out-of-range
status, and update _health to Degraded on transport failures.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — restructured `WriteAsync` catch ladder: `OperationCanceledException` now re-throws, genuine `PlcException` transport faults map to `BadDeviceFailure`/`Degraded`, `NotSupportedException` maps to `BadNotSupported`, the `IsAccessDenied` PlcException path maps to `BadNotSupported`/`Faulted`, and the catch-all maps to `BadCommunicationError` with a health update — matching `ReadAsync`'s structure.
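As a rough illustration of the ladder described in this resolution, the following Python sketch classifies a failure into a status name and a health state. It is not the driver's C# code: the exception classes are hypothetical stand-ins, `NotImplementedError` stands in for `NotSupportedException`, and the `Degraded` health value on the catch-all branch is an assumption, since the resolution only says that health is updated.

```python
from typing import Optional, Tuple


class OperationCancelled(Exception):
    """Stand-in for .NET's OperationCanceledException."""


class PlcError(Exception):
    """Stand-in for the PLC library's exception type (hypothetical shape)."""

    def __init__(self, message: str, access_denied: bool = False):
        super().__init__(message)
        self.access_denied = access_denied


def classify_write_failure(exc: Exception) -> Tuple[str, Optional[str]]:
    """Return (status code name, new health state or None if unchanged)."""
    if isinstance(exc, OperationCancelled):
        raise exc  # cancellation propagates to the caller
    if isinstance(exc, PlcError):
        if exc.access_denied:
            return ("BadNotSupported", "Faulted")  # PUT/GET disabled path
        return ("BadDeviceFailure", "Degraded")    # genuine transport fault
    if isinstance(exc, NotImplementedError):
        return ("BadNotSupported", None)           # unsupported data type
    # Catch-all. The health value here is an assumption for illustration.
    return ("BadCommunicationError", "Degraded")


print(classify_write_failure(PlcError("timeout")))
print(classify_write_failure(PlcError("denied", access_denied=True)))
```

The point of the ordering is that the most specific cases are tested first, so the generic branch only sees failures nothing else has claimed.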
### Driver.S7-009
@@ -332,7 +332,7 @@ lifecycle unit tests that pass `"{}"` are unaffected.
| Severity | Medium |
| Category | Design-document adherence |
| Location | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
| Status | Open |
| Status | Resolved |
**Description:** S7ProbeOptions.ProbeAddress is configured (default "MW0"),
documented at length ("the driver runs a tick loop that issues a cheap read
@@ -349,7 +349,7 @@ see no effect.
from S7ProbeOptions/S7ProbeDto and correct the XML docs to describe the
ReadStatusAsync-based probe.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — removed `ProbeAddress` from `S7ProbeOptions` and `S7ProbeDto`; updated the `S7DriverOptions.Probe` XML doc to describe the `ReadStatusAsync`-based probe accurately. Existing configs that still set `probeAddress` continue to load; the field itself is silently ignored (unknown JSON fields are tolerated by the deserializer).
### Driver.S7-013
@@ -385,7 +385,7 @@ live address space.
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
| Status | Open |
| Status | Resolved |
**Description:** Test coverage has notable gaps for the driver's behavioural
core: (1) no test exercises the ReadOneAsync type-reinterpret switch (Int16 from
@@ -405,4 +405,4 @@ testable without a live PLC, and add a Timer/Counter rejection test. Track the
live/mock-server happy-path coverage as an explicit follow-up rather than an
open-ended deferral.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — factored `ReadOneAsync` type-reinterpret into `internal static ReinterpretRawValue` and `WriteOneAsync` boxing into `internal static BoxValueForWrite`; added `S7TypeMappingTests.cs` (26 tests) covering every implemented type round-trip (Bool/Byte/UInt16/Int16/UInt32/Int32/Float32), unsupported-type `NotSupportedException` assertions, and write overflow paths.
+13 -13
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |
| Open findings | 5 |
## Checklist coverage
@@ -88,7 +88,7 @@ to `UInt16`, and `USInt`/`SInt` to their natural widths. Remove the stale "Int64
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AdsTwinCATClient.cs:264-281`, `283-300` |
| Status | Open |
| Status | Resolved |
**Description:** `MapToClrType` has a `_ => typeof(int)` fallthrough and `ConvertForWrite` has
a `_ => throw NotSupportedException` fallthrough. `TwinCATDataType.Structure` is a declared
@@ -103,7 +103,7 @@ Discovery `ToDriverDataType` maps `Structure` to `String`, compounding the incon
does not support UDT tags, and `BrowseSymbolsAsync` already correctly yields
`DataType = null` for them.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `BuildTag` now parses the `DataType` field first and rejects `TwinCATDataType.Structure` with an `InvalidOperationException` that names the tag and explains the limitation; configuration-time failure replaces the previous silent garbage read or late `NotSupportedException`.
### Driver.TwinCAT-004
@@ -134,7 +134,7 @@ date/time semantics are intended to be exposed properly, track a follow-up to de
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) |
| Status | Open |
| Status | Resolved |
**Description:** The driver performs no logging. `CLAUDE.md` Library Preferences mandate
Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse
@@ -147,7 +147,7 @@ success/failure per device, ADS errors with code, symbol-browse fallback (the `D
catch), native-notification registration failures, and host state transitions
(`TransitionDeviceState`).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added optional `ILogger<TwinCATDriver>` constructor parameter (defaults to `NullLogger`); logs connect success/failure in `EnsureConnectedAsync`, ADS read errors in `ReadAsync`, symbol-browse fallback in `DiscoverAsync`, notification-registration failures in `SubscribeAsync`, and host-state transitions in `TransitionDeviceState`.
### Driver.TwinCAT-006
@@ -230,7 +230,7 @@ a dedicated managed task before invoking `OnChange`/`OnDataChange`, exactly as t
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` |
| Status | Open |
| Status | Resolved |
**Description:** `ShutdownAsync` mutates `_devices`, `_tagsByName`, and `_nativeSubs` with no
synchronization while `ReadAsync`/`WriteAsync`/`SubscribeAsync` may be iterating or indexing
@@ -248,7 +248,7 @@ them on rebuild, or introduce a lifecycle lock / `volatile` running guard so rea
with `BadServerHalted`/`BadNodeIdUnknown` once shutdown begins. Cancel and await the probe
tasks before disposing `DeviceState`s.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — swapped `_devices` and `_tagsByName` to `ConcurrentDictionary` so concurrent `TryGetValue` / `Clear` calls are safe; added `DeviceState.ProbeTask` and updated `ShutdownAsync` to cancel then `await` each probe task before disposing the client and gate, eliminating the disposal race.
### Driver.TwinCAT-010
@@ -257,7 +257,7 @@ tasks before disposing `DeviceState`s.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AdsTwinCATClient.cs:178-195` |
| Status | Open |
| Status | Resolved |
**Description:** `BrowseSymbolsAsync` checks `cancellationToken.IsCancellationRequested` and
does `yield break` (a clean completion) rather than throwing `OperationCanceledException`.
@@ -272,7 +272,7 @@ symbol set with no indication it was truncated. The `SymbolLoaderFactory.Create`
`yield break` so a cancelled browse surfaces as cancellation, not as a successful but partial
discovery.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced `yield break` with `cancellationToken.ThrowIfCancellationRequested()` in the `foreach` loop so a cancelled browse propagates as `OperationCanceledException`, matching the `DiscoverAsync` expectation.
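The difference between ending a stream quietly and raising on cancellation can be seen in a small generator sketch. This is illustrative Python, not the C# async iterator in `AdsTwinCATClient.cs`; `browse_quiet`, `browse_strict`, `BrowseCancelled`, and `make_cancel_after` are hypothetical names, and the symbol names are made up.

```python
class BrowseCancelled(Exception):
    """Stand-in for OperationCanceledException."""


def make_cancel_after(n):
    """Build a cancellation check that reports True after n checks."""
    calls = {"count": 0}

    def is_cancelled():
        calls["count"] += 1
        return calls["count"] > n

    return is_cancelled


def browse_quiet(symbols, is_cancelled):
    for symbol in symbols:
        if is_cancelled():
            return  # like `yield break`: the stream just ends
        yield symbol


def browse_strict(symbols, is_cancelled):
    for symbol in symbols:
        if is_cancelled():
            raise BrowseCancelled()  # like ThrowIfCancellationRequested()
        yield symbol


symbols = ["MAIN.a", "MAIN.b", "MAIN.c", "MAIN.d"]
print(list(browse_quiet(symbols, make_cancel_after(2))))  # ['MAIN.a', 'MAIN.b']
try:
    list(browse_strict(symbols, make_cancel_after(2)))
except BrowseCancelled:
    print("browse cancelled")
```

With the quiet variant the caller receives two of four symbols and cannot tell the result is partial; with the strict variant the truncation surfaces as an exception.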
### Driver.TwinCAT-011
@@ -281,7 +281,7 @@ discovery.
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `TwinCATStatusMapper.cs:29-42` |
| Status | Open |
| Status | Resolved |
**Description:** ADS error-code mapping has gaps and an inconsistency versus
`docs/v2/driver-specs.md` section 6. The spec documents symbol-not-found as 0x0701
@@ -298,7 +298,7 @@ symbol-version-changed is never routed to rediscovery (see Driver.TwinCAT-013).
explicit case for symbol-version-changed routed to rediscovery, and for PLC-in-Config mapped
to `BadOutOfService`/`BadInvalidState`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — confirmed all codes from `Beckhoff.TwinCAT.Ads` 7.0.172 `AdsErrorCode` enum. Rewrote `MapAdsError` with 20 explicit cases keyed to the correct decimal values. Fixed the critical bug: `AdsSymbolVersionChanged` was `0x0702u` (= `DeviceInvalidGroup`) but the actual `DeviceSymbolVersionInvalid` is 1809 (0x0711); corrected constant and updated all comments. Added `BadOutOfService` for `DeviceNotReady` (PLC not running) and `BadInvalidState` for `DeviceInvalidState` (PLC in Config mode, 0x0712) and `DeviceSymbolVersionInvalid` (0x0711). Added `BadOutOfService`/`BadInvalidState` OPC UA StatusCode constants to the mapper.
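The hex and decimal values quoted in this resolution can be cross-checked with a few lines of arithmetic. The sketch below is illustrative Python covering only the codes named above. The constant names and the `map_ads_error` helper are hypothetical and do not reproduce the 20-case `MapAdsError` switch; unmapped codes return `None` here only because the real fallback is not stated in this finding.

```python
ADS_DEVICE_INVALID_GROUP = 0x0702           # the value the old constant used by mistake
ADS_DEVICE_SYMBOL_VERSION_INVALID = 0x0711  # the corrected value
ADS_DEVICE_INVALID_STATE = 0x0712           # PLC in Config mode

# Only the mappings stated in the resolution text are listed.
STATUS_BY_ADS_CODE = {
    ADS_DEVICE_SYMBOL_VERSION_INVALID: "BadInvalidState",
    ADS_DEVICE_INVALID_STATE: "BadInvalidState",
}


def map_ads_error(code: int):
    """Return the status name for a known ADS code, or None if unmapped here."""
    return STATUS_BY_ADS_CODE.get(code)


print(ADS_DEVICE_INVALID_GROUP)           # 1794
print(ADS_DEVICE_SYMBOL_VERSION_INVALID)  # 1809
print(map_ads_error(1809))                # BadInvalidState
print(map_ads_error(1794))                # None
```

The old constant and the corrected one differ by 15, so a lookup keyed on 0x0702 would never match a symbol-version-invalid error.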
### Driver.TwinCAT-012
@@ -307,7 +307,7 @@ to `BadOutOfService`/`BadInvalidState`.
| Severity | Medium |
| Category | Performance & resource management |
| Location | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` |
| Status | Open |
| Status | Resolved |
**Description:** `GetMemoryFootprint()` returns a hard-coded 0. `docs/v2/driver-stability.md`
section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the
@@ -325,7 +325,7 @@ re-discovery re-downloads the whole symbol table each time, itself a performance
stream-and-discard design is intentional, report the real footprint of `_nativeSubs` /
`_tagsByName` and document that the driver holds no flushable cache.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `GetMemoryFootprint()` now returns `(_tagsByName.Count * 256L) + (_nativeSubs.Count * 512L)`; documented that the driver has no flushable symbol cache (stream-and-discard design) so `FlushOptionalCachesAsync` remains a documented no-op.
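For a sense of scale, the footprint formula in this resolution can be worked through with sample numbers. The per-entry weights (256 and 512 bytes) come from the resolution text; the tag and subscription counts are made up, and `estimate_footprint` is a hypothetical Python stand-in for the C# method.

```python
BYTES_PER_TAG = 256           # weight applied to each _tagsByName entry
BYTES_PER_SUBSCRIPTION = 512  # weight applied to each _nativeSubs entry


def estimate_footprint(tag_count: int, subscription_count: int) -> int:
    """Estimated bytes held by the tag map and native subscriptions."""
    return tag_count * BYTES_PER_TAG + subscription_count * BYTES_PER_SUBSCRIPTION


# Example: 1,000 configured tags and 200 native subscriptions.
print(estimate_footprint(1000, 200))  # 358400
```

That is 256,000 bytes for tags plus 102,400 bytes for subscriptions, roughly 350 KiB. The figure is an estimate built from fixed weights, not a measured allocation.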
### Driver.TwinCAT-013
+155 -155
@@ -10,37 +10,37 @@ Each module's `findings.md` is the source of truth; this file is generated from
| Module | Reviewer | Date | Commit | Status | Open | Total |
|---|---|---|---|---|---|---|
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 9 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 9 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 12 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 9 | 11 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 13 |
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 15 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 13 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 6 |
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 12 |
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 12 |
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 7 |
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 10 |
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 11 |
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 8 |
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 11 |
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 13 |
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 15 |
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 13 |
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 7 |
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 6 |
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 12 |
| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 5 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 14 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 12 |
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 9 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 12 |
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 9 |
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 15 |
| [Driver.S7](Driver.S7/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 14 |
| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 16 |
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 4 | 14 |
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 10 |
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 9 |
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 15 |
| [Driver.S7](Driver.S7/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 14 |
| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 4 | 7 |
| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 16 |
| [Driver.TwinCAT.Cli](Driver.TwinCAT.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
| [Server](Server/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 15 |
| [Server](Server/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 15 |
## Pending findings
@@ -48,132 +48,6 @@ Findings with status `Open` or `In Progress`, ordered by severity.
| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Admin-006 | Medium | Security | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` | `app.UseAntiforgery()` is enabled, but the Sign-out form (`<form method="post" action="/auth/logout">`) renders no antiforgery token, and the `MapPost("/auth/logout", ...)` endpoint does not call `.DisableAntiforgery()` or otherwise opt ou… |
| Admin-007 | Medium | Design-document adherence | `Components/Pages/Clusters/NewCluster.razor:91,95-96` | `NewCluster.CreateAsync` hardcodes `CreatedBy = "admin-ui"` (both on the `ServerCluster` row and the draft generation) instead of the signed-in operator principal name. `admin-ui.md` section "Audit" requires "the operator principal" be rec… |
| Admin-008 | Medium | Error handling & resilience | `Services/ReservationService.cs:28-37` | `ReservationService.ReleaseAsync` calls `sp_ReleaseExternalIdReservation` with only `@Kind`, `@Value`, `@ReleaseReason`. `admin-ui.md` section "Release an external-ID reservation" specifies the proc sets `ReleasedBy` to the FleetAdmin who… |
| Admin-009 | Medium | Testing coverage | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) | The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of… |
| Analyzers-001 | Medium | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` | `IsInsideWrapperLambda` treats a guarded call as "wrapped" if it is textually inside ANY lambda that is an argument to ANY invocation whose containing type is `CapabilityInvoker` or `AlarmSurfaceInvoker`. It matches the containing type onl… |
| Analyzers-006 | Medium | Testing coverage | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` | The test suite exercises only 3 of the 7 guarded interfaces (`IReadable`, `IWritable`, `ITagDiscovery`) and one positive / one negative lambda case. Significant untested behaviour for an analyzer that gates a repo-wide resilience invariant… |
| Client.CLI-001 | Medium | Correctness & logic bugs | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` | The start and end options are parsed with `DateTime.Parse(StartTime)` with no `IFormatProvider` or `DateTimeStyles`. Parsing therefore depends on the current OS culture: the same `--start "03/04/2026"` resolves to March 4 on an en-US box a… |
| Client.CLI-005 | Medium | Concurrency & thread safety | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` | The `DataChanged` and `AlarmEvent` handlers write to `console.Output` (a `System.IO.TextWriter`) directly from the OPC UA SDK subscription/notification thread, while the command main flow is awaiting `Task.Delay(Timeout.Infinite, ct)` and… |
| Client.Shared-001 | Medium | Correctness & logic bugs | `OpcUaClientService.cs:552` | `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` th… |
| Client.Shared-002 | Medium | Correctness & logic bugs | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` | `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `Redundancy… |
| Client.Shared-007 | Medium | Concurrency & thread safety | `OpcUaClientService.cs:581-622` | In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replace… |
| Client.Shared-008 | Medium | Error handling & resilience | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` | `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Val… |
| Client.UI-001 | Medium | Correctness & logic bugs | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` | `ReadHistoryAsync` runs as a `RelayCommand` body, which is invoked on the UI thread, so the bare `IsLoading = true` at line 76 happens to land on the right thread today. But `Results.Clear()` on the very next line is wrapped in `_dispatche… |
| Client.UI-002 | Medium | Correctness & logic bugs | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` | `ConnectAsync` calls `await BrowseTree.LoadRootsAsync()` and `ViewHistoryForSelectedNode` calls `History.SelectedNodeId = ...` by dereferencing the nullable child view-model properties (`BrowseTreeViewModel?`, `HistoryViewModel?`) without… |
| Client.UI-005 | Medium | Concurrency & thread safety | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` | `SubscriptionsViewModel` and `AlarmsViewModel` attach handlers to the long-lived `_service` events (`DataChanged`, `AlarmEvent`) in their constructors and detach them only via `Teardown()`. `Teardown()` is called from `DisconnectAsync` (op… |
| Client.UI-007 | Medium | Security | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` | The OPC UA `UserName`-token password is persisted in cleartext. `UserSettings.Password` is a plain `string`, `JsonSettingsService.Save` serializes the whole settings object to `settings.json` under `LocalApplicationData`, and `SaveSettings… |
| Client.UI-008 | Medium | Performance & resource management | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` | `IOpcUaClientService` is declared `IDisposable` (`IOpcUaClientService.cs:10`), and the concrete service owns an OPC UA session plus SDK resources. `MainWindowViewModel` holds `_service` for the lifetime of the app but never calls `_service… |
| Configuration-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` | `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `R… |
| Configuration-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` | `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up… |
| Configuration-006 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` | The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagate… |
| Configuration-009 | Medium | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` | `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords… |
| Core-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` | `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffec… |
| Core-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` | `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `In… |
| Core-006 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (st… |
| Core-007 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` | `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and ca… |
| Core.Abstractions-001 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` | `PollOnceAsync` detects a change with `!Equals(lastSeen?.Value, current.Value)`. `object.Equals` falls back to reference equality for reference types that do not override it — including `T[]` array values. The capability interfaces explici… |
| Core.Abstractions-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` | `PollOnceAsync` iterates `state.TagReferences` and indexes the reader's result with `snapshots[i]`, assuming the driver-supplied `_reader` delegate returns exactly one snapshot per input reference in input order. The contract is documented… |
| Core.Abstractions-003 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` | `Subscribe` starts the poll loop with a fire-and-forget `Task.Run` and keeps no reference to the returned `Task`. Neither `Unsubscribe` nor `DisposeAsync` awaits the loop's completion — they only cancel the `CancellationTokenSource` and di… |
| Core.AlarmHistorian-003 | Medium | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` | `EnqueueAsync` is declared `async`-shaped (`Task EnqueueAsync(...)`) and the `IAlarmHistorianSink` contract explicitly states "the sink MUST NOT block the emitting thread … `EnqueueAsync` returns as soon as the queue row is committed." But… |
| Core.AlarmHistorian-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` | The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin… |
| Core.AlarmHistorian-007 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` | When the writer returns a wrong-cardinality result, the code throws `InvalidOperationException` after `WriteBatchAsync` has already succeeded. The events were potentially delivered to the historian, but no rows are deleted or dead-lettered… |
| Core.AlarmHistorian-009 | Medium | Design-document adherence | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` | `docs/AlarmTracking.md` and the `IAlarmHistorianSink` contract present the SQLite queue as the durability guarantee — "Durably enqueue the event", "operator acks never block on the historian being reachable". But `EnforceCapacity` silently… |
| Core.AlarmHistorian-010 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` | The test suite covers the happy paths well (Ack/Retry/PermanentFail, capacity eviction, retention purge, ctor validation) but leaves critical paths untested: (a) no test exercises a corrupt / `null`-deserializing `PayloadJson` row, so the… |
| Core.ScriptedAlarms-002 | Medium | Correctness & logic bugs | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` | `LoadAsync` is written to be re-callable — it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it uncondition… |
| Core.ScriptedAlarms-004 | Medium | Concurrency & thread safety | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` | During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common p… |
| Core.ScriptedAlarms-005 | Medium | Concurrency & thread safety | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` | `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran,… |
| Core.ScriptedAlarms-007 | Medium | Error handling & resilience | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` | Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws — transient SQL outage, deadlock, timeout — the exception propagates: in `ApplyAsync` i… |
| Core.ScriptedAlarms-012 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` | Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine* — only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEn… |
| Core.Scripting-003 | Medium | Security | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` | There is no bound on memory a script may allocate or on the number of threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget concern" deferred to v3, but in-process execution means a script doing `new by… |
| Core.Scripting-004 | Medium | Correctness & logic bugs | `DependencyExtractor.cs:73` | The walker matches tag-access calls purely by spelling — any `InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as a `ScriptContext` tag access, regardless of the receiver. A script that defines a loca… |
| Core.Scripting-007 | Medium | Error handling & resilience | `TimedScriptEvaluator.cs:60` | `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits `WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cance… |
| Core.Scripting-010 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` | The sandbox-escape test suite covers only the four obvious vectors (File / Http / Process / Reflection) as direct member-access calls. It does not test: `typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to fo… |
| Core.VirtualTags-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` | The cold-start guard `if (!AreInputsReady(ctxCache)) return;` silently abandons the evaluation when any input is null or Bad-quality. For a chained virtual tag (C depends on B depends on driver tag A), if A is still Bad at startup, B is sk… |
| Core.VirtualTags-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` | The upstream-subscription loop in `Load` iterates `definitions.SelectMany(d => _tags[d.Path].Reads)`. If `definitions` contains two rows with the same Path, the first registers `_tags[Path]` and the second overwrites it, but `definitions`… |
| Core.VirtualTags-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` | `SubscribeAsync` registers the per-path engine observers first (lines 52-56), then in a second loop reads the current value and fires the initial-data callback (lines 60-64). Between those two loops an upstream change can cascade and the e… |
| Core.VirtualTags-008 | Medium | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` | `TransitiveDependentsInOrder` calls `TopologicalSort()` (a full O(V+E) Kahn pass plus a Dictionary rank build) on every invocation, and it is invoked from `CascadeAsync` on every upstream change event (`OnUpstreamChange`). On a large graph… |
| Core.VirtualTags-012 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` | Several behaviours of the engine have no test coverage: (1) the cold-start `AreInputsReady` guard -- no test exercises an upstream that is null/Bad at evaluation time and asserts the resulting tag state (see Core.VirtualTags-002); (2) `ctx… |
| Driver.AbCip-004 | Medium | Correctness & logic bugs | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` | `ToDriverDataType` maps `LInt`/`ULInt` to `DriverDataType.Int32` (a TODO comment notes the gap) and `Dt` to `Int32`. But `LibplctagTagRuntime.DecodeValueAt` returns an actual `long` for `LInt`/`ULInt` (`_tag.GetInt64`, `(long)_tag.GetUInt6… |
| Driver.AbCip-005 | Medium | Correctness & logic bugs | `AbCipDriver.cs:124-141` | In `InitializeAsync`, when a `Structure` tag declares `Members`, the loop registers each fanned-out member into `_tagsByName` but the parent Structure tag itself is also left in `_tagsByName` (added at line 125 before the member check). A… |
| Driver.AbCip-006 | Medium | OtOpcUa conventions | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` | `driver-specs.md` makes the SafeHandle-wrapped native handle a non-negotiable Tier-B protection ("Wrap every libplctag handle in a SafeHandle with finalizer calling plc_tag_destroy"). The repo ships `PlcTagHandle : SafeHandle` for this, bu… |
| Driver.AbCip-009 | Medium | Concurrency & thread safety | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` | `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act on a non-thread-safe `Dictionary` (`device.Runtimes` / `device.ParentRuntimes`). `ReadAsync` is `IReadable` and may be invoked concurrently: the server read path, ea… |
| Driver.AbCip-010 | Medium | Error handling & resilience | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` | Once `EnsureTagRuntimeAsync` successfully creates and initializes a `LibplctagTagRuntime`, that runtime is cached for the lifetime of the device and never re-created on failure. If the underlying native tag enters a permanently-bad state (… |
| Driver.AbCip-014 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` | `AbCipStatusMapperTests.MapLibplctagStatus_maps_known_codes` asserts the mapper against the same wrong integer constants (-5, -7, -14, -16, -17) the production code uses (see Driver.AbCip-002). The test locks in the bug rather than catchin… |
| Driver.AbCip.Cli-001 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` | `ParseValue` parses every numeric Logix type with the BCL `*.Parse` methods (`sbyte.Parse`, `short.Parse`, `int.Parse`, `float.Parse`, ...). These throw the raw `FormatException` and `OverflowException` on bad operator input. The module's… |
| Driver.AbCip.Cli-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` | `ProbeCommand`, `ReadCommand`, and `SubscribeCommand` expose `--type` as a free `AbCipDataType` enum option with no exclusion of `AbCipDataType.Structure`. Only `WriteCommand` rejects `Structure` (with an explicit `CommandException`). Pass… |
| Driver.AbLegacy-002 | Medium | Correctness & logic bugs | `AbLegacyDriver.cs:368` | In `WriteBitInWordAsync` the parent word is decoded with `Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...))`. `LibplctagLegacyTagRuntime.DecodeValue` for `AbLegacyDataType.Int` returns `(int)_tag.GetInt16(0)` - a sign-e… |
| Driver.AbLegacy-003 | Medium | Correctness & logic bugs | `AbLegacyAddress.cs:62-95` | `TryParse` does not reject several malformed PCCC addresses that the XML docs imply are invalid: - A sub-element and a bit index together (`T4:0.ACC/2`) parse successfully even though no PCCC element supports both. - I/O/S files with a fil… |
| Driver.AbLegacy-004 | Medium | Correctness & logic bugs | `LibplctagLegacyTagRuntime.cs:36-37` | `DecodeValue` for `AbLegacyDataType.Bit` with `bitIndex == null` returns `_tag.GetInt8(0) != 0`. A bit-file element (`B3:0/0`) is a single bit inside a 16-bit word; reading only the low byte (`GetInt8(0)`) means a `Bit` tag whose live bit… |
| Driver.AbLegacy-007 | Medium | Concurrency & thread safety | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` | `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act: `device.Runtimes.TryGetValue(...)` then, after `await runtime.InitializeAsync`, `device.Runtimes[def.Name] = runtime`. `Dictionary` is not thread-safe, and two conc… |
| Driver.AbLegacy-008 | Medium | Concurrency & thread safety | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` | `_health` is a plain non-volatile reference field mutated from `ReadAsync`, `WriteAsync` (both can run on multiple threads / poll loops) and `InitializeAsync`/`ShutdownAsync`, and read by `GetHealth()` from yet another thread. There is no… |
| Driver.AbLegacy-009 | Medium | Error handling & resilience | `AbLegacyDriver.cs:41-74` | `InitializeAsync` starts probe loops with `Task.Run` inside the try block. If `InitializeAsync` fails - or is re-entered - after some probe loops are already started, the catch only sets `_health = Faulted` and rethrows; it does not cancel… |
| Driver.AbLegacy-010 | Medium | Error handling & resilience | `AbLegacyStatusMapper.cs:26-56` | `MapLibplctagStatus` maps the integer codes -5/-7/-14/-16/-17. These do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32, PLCTAG_ERR_NOT_FOUND = -22, PLCTAG_ERR_NOT_ALLOWED = -21, PLCTAG_ERR_OUT_OF_BOUNDS = -… |
| Driver.AbLegacy-012 | Medium | Design-document adherence | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` | `AbLegacyPlcFamilyProfile` declares four record properties - `DefaultCipPath`, `MaxTagBytes`, `SupportsStringFile`, `SupportsLongFile` - and only `LibplctagPlcAttribute` is ever consumed. In particular: - `DefaultCipPath` is dead: the per-… |
| Driver.AbLegacy.Cli-001 | Medium | Error handling & resilience | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` | `WriteCommand.ExecuteAsync` calls `ParseValue(Value, DataType)` at line 46, *before* the `try` block and outside any catch. `ParseValue` uses `short.Parse` / `int.Parse` / `float.Parse`, which throw `FormatException` on malformed input (`-… |
| Driver.Cli.Common-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` | `FormatStatus` matches the full 32-bit status word for exact equality against the shortlist. OPC UA status codes carry sub-code/flag bits in the low 16 bits (info type, structure-changed, semantics-changed, limit bits, overflow, etc.). A d… |
| Driver.Cli.Common-003 | Medium | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` | `ConfigureLogging` assigns the process-global `Serilog.Log.Logger` without disposing the previously assigned logger and the library never calls `Log.CloseAndFlush()`. Each call creates a fresh `Logger` via `CreateLogger()` and overwrites `… |
| Driver.Cli.Common-005 | Medium | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` | The `FormatStatus_names_well_known_status_codes` `[Theory]` asserts `0x80060000 => "BadTimeout"`, which encodes the wrong spec value (see Driver.Cli.Common-001). The test passes because it validates the formatter against the same incorrect… |
| Driver.FOCAS-003 | Medium | Correctness & logic bugs | `FocasDriver.cs:71-79` | In `InitializeAsync`, capability-matrix validation only runs when `_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose `DeviceHostAddress` does not match any configured device (a common config typo, e.g. a tr… |
| Driver.FOCAS-004 | Medium | OtOpcUa conventions | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` | `DiscoverAsync` emits user tags with `SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`, and `FocasTagDefinition.Writable` defaults to `true` (also defaulted to `true` in the factory - `t.Writ… |
| Driver.FOCAS-005 | Medium | Concurrency & thread safety | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` | `_health` is a plain (non-volatile) field mutated from multiple concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can all run on different threads simultaneously (subscriptions go through `PollGroupEngine… |
| Driver.FOCAS-006 | Medium | Error handling & resilience | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` | `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across a transient disconnect: it only checks `device.Client is { IsConnected: true }` and otherwise calls `ConnectAsync` again on the same object. For a `WireFocasClient` wh… |
| Driver.FOCAS-012 | Medium | Testing coverage | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) | The unit test project does not exercise `FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` / `HandleRecycle` config sections - which is why the config-mapping gap in Driver.FOCAS-001 was not caught. There is… |
| Driver.Galaxy-003 | Medium | Correctness & logic bugs | `Runtime/StatusCodeMap.cs:86` | `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return c… |
| Driver.Galaxy-004 | Medium | Correctness & logic bugs | `GalaxyDriver.cs:901` | `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts `StatusCode >> 30` and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQ… |
| Driver.Galaxy-006 | Medium | Concurrency & thread safety | `GalaxyDriver.cs:848-861` | `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are act… |
| Driver.Galaxy-007 | Medium | Concurrency & thread safety | `GalaxyDriver.cs:937-968` | `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteA… |
| Driver.Galaxy-009 | Medium | Error handling & resilience | `GalaxyDriver.cs:354-371` | `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); t… |
| Driver.Galaxy-011 | Medium | Performance & resource management | `GalaxyDriver.cs:411` | `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryF… |
| Driver.Galaxy-014 | Medium | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) | The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires… |
| Driver.Historian.Wonderware-002 | Medium | Correctness & logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` | `HandleWriteAlarmEventsAsync` dereferences `req.Events.Length` in both the `_alarmWriter is null` branch (line 162) and the catch block (line 181). MessagePack deserializes an absent or explicit-nil array field as a `null` reference, not `… |
| Driver.Historian.Wonderware-003 | Medium | Correctness & logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` | Raw and at-time reads decide whether a sample is a string or a numeric with `if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)`. The `result.Value == 0` clause is intended to distinguish a real numeric zero from a string… |
| Driver.Historian.Wonderware-006 | Medium | Error handling & resilience | `Ipc/PipeServer.cs:120-128` | `RunAsync` re-accepts connections in a `while` loop. If `RunOneConnectionAsync` throws synchronously and immediately on every iteration (for example `new NamedPipeServerStream(...)` fails because the pipe name is already in use, or `PipeAc… |
| Driver.Historian.Wonderware-009 | Medium | Performance & resource management | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` | `ReadAggregateAsync` drains `query.MoveNext` into `results` with no upper bound, unlike `ReadRawAsync`, which honours `maxValues` / `MaxValuesPerRead` and breaks. `ReadProcessedRequest` carries no max-buckets field. A processed read over a… |
| Driver.Historian.Wonderware.Client-002 | Medium | Correctness & logic bugs | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` | `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`. `HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and the drain worker is documented to move the event to the dead-letter table on… |
| Driver.Historian.Wonderware.Client-005 | Medium | Error handling & resilience | `Ipc/FrameReader.cs:31-32` | After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind byte with the synchronous, blocking `_stream.ReadByte()` and ignores the `CancellationToken`. On a `NamedPipeClientStream` with `PipeOptions.Asynchronous`, a synchrono… |
| Driver.Historian.Wonderware.Client-007 | Medium | Security | `WonderwareHistorianClient.cs:276` | `ToSnapshots` deserializes peer-supplied bytes with `MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, typeless MessagePack deserialization. The `object` overload resolves runtime types from the wire payload. The client treats th… |
| Driver.Historian.Wonderware.Client-009 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` | The suite covers happy paths, server-error, bad-secret, a single reconnect and health counters, but several critical paths are untested: (1) `ReadAtTimeAsync` with a partial/reordered sidecar reply, the contract-alignment case from finding… |
| Driver.Modbus-002 | Medium | Correctness & logic bugs | `ModbusDriver.cs:127-186` | `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_opt… |
| Driver.Modbus-004 | Medium | Performance & resource management | `ModbusDriver.cs:1468-1473` | `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `S… |
| Driver.Modbus-005 | Medium | Correctness & logic bugs | `ModbusDriver.cs:777-798,323-330` | `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-b… |
| Driver.Modbus-006 | Medium | Error handling & resilience | `ModbusDriver.cs:514-524,532-550` | `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issu… |
| Driver.Modbus.Addressing-002 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:86-94` | In the 3-field disambiguation, an empty 3rd field (`40001:F:`) reaches `parts[2].All(char.IsDigit)`. `Enumerable.All` returns true for an empty sequence, so the empty string is classified as a valid-shaped array count, assigned to `countPa… |
| Driver.Modbus.Addressing-003 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` | `LooksLikeByteOrderToken` classifies any 4-letter token as a byte-order token. A 3-field address whose 3rd field is a 4-letter type-like token (e.g. `40001:S:BOOL`) is routed into `TryParseByteOrder`, producing the misleading diagnostic "U… |
| Driver.Modbus.Addressing-004 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:182-194` | The bit suffix is stripped using `text.IndexOf('.')` — the first dot. An input such as `40001.5.3` produces a bit text of "5.3", rejected by `byte.TryParse` with the generic "Bit index must be 0..15" message. A Modicon-style decimal-point… |
| Driver.Modbus.Addressing-005 | Medium | Error handling & resilience | `ModbusAddressParser.cs:200-213` | `TryParseRegionAndOffset` tries family-native, then mnemonic, then Modicon. When all three fail it returns false with whatever error the Modicon parser last wrote (comment: "the Modicon error is the more specific diagnostic"). For a non-Ge… |
| Driver.Modbus.Addressing-008 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` | Several edge cases of the address arithmetic are untested or asserted wrong: (a) DL205 system V-memory mapping is tested only with the incorrect expected value (`ModbusFamilyParserTests.cs:20`, see finding -001); (b) there is no test for `… |
| Driver.Modbus.Cli-001 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` | `SubscribeCommand` synthesises its `ModbusTagDefinition` with only `Name`, `Region`, `Address`, `DataType`, `Writable`, and `ByteOrder` — it never exposes or passes `--bit-index`, `--string-length`, or `--string-byte-order`. A user running… |
| Driver.Modbus.Cli-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` | `WriteCommand` rejects read-only regions (`DiscreteInputs` / `InputRegisters`) but does not validate that `--type` is meaningful for the `Coils` region. `write -r Coils -a 5 -t UInt16 -v 42` builds a `Coils` tag with `DataType = UInt16`; t… |
| Driver.OpcUaClient-006 | Medium | Concurrency & thread safety | `OpcUaClientDriver.cs:1330-1359` | OnReconnectComplete mutates `Session` (line 1347) directly from the reconnect-handler callback thread with no synchronization against ReadAsync/WriteAsync/DiscoverAsync/ShutdownAsync. Session is a plain auto-property with no memory barrier… |
| Driver.OpcUaClient-007 | Medium | Concurrency & thread safety | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` | Two disposal races. (1) Dispose() does `DisposeAsync().AsTask().GetAwaiter().GetResult()`, synchronous blocking on async work. The Galaxy stability review (driver-stability.md, the 2026-04-13 findings) explicitly calls out sync-over-async… |
| Driver.OpcUaClient-008 | Medium | Error handling & resilience | `OpcUaClientDriver.cs:1092-1099` | AcknowledgeAsync issues the batched CallAsync and then catches all exceptions with a best-effort empty catch; it also never inspects the per-call results in the success path (`_ = await session.CallAsync(...)`). An alarm acknowledgment the… |
| Driver.OpcUaClient-009 | Medium | Error handling & resilience | `OpcUaClientDriver.cs:560-564` | WriteAsync's catch block fans out BadCommunicationError across the whole batch on any exception. Writes are non-idempotent by default (IWritable remarks, decision #44/#45): a timeout exception may fire after the upstream server already app… |
| Driver.OpcUaClient-010 | Medium | Correctness & logic bugs | `OpcUaClientDriver.cs:823-824` | MapUpstreamDataType maps DataTypeIds.Byte (the OPC UA unsigned 8-bit type) to DriverDataType.Int16. Byte should map to an unsigned driver type (UInt16 is the smallest unsigned available, matching how SByte belongs with the signed family).… |
| Driver.OpcUaClient-012 | Medium | Security | `OpcUaClientDriver.cs:210-217` | When AutoAcceptCertificates is true the driver registers a CertificateValidation handler that accepts only StatusCodes.BadCertificateUntrusted. A self-signed or otherwise untrusted server certificate frequently fails validation with a diff… |
| Driver.OpcUaClient-013 | Medium | Performance & resource management | `OpcUaClientDriver.cs:436-437` | GetMemoryFootprint() is hard-coded to return 0 and FlushOptionalCachesAsync is a no-op Task.CompletedTask. docs/v2/driver-stability.md section "In-process only (Tier A/B)" makes per-instance allocation tracking a contract requirement, and… |
| Driver.OpcUaClient-015 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` | Unit-test coverage is solid for the pure mappers (MapSeverity, MapUpstreamDataType, MapSecurityPolicy, MapAggregateToNodeId, BuildCertificateIdentity, ResolveEndpointCandidates) and for "throws before init" guards, but the highest-risk beh… |
| Driver.S7-002 | Medium | Correctness & logic bugs | `S7Driver.cs:350` | MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32. UInt32 values above int.MaxValue (2^31-1) wrap to negative when surfaced to the OPC UA client, silently corrupting the value. The inline comment only flags Int64/UInt64 as "w… |
| Driver.S7-004 | Medium | OtOpcUa conventions | `S7Driver.cs` (whole file) | The driver performs no logging. CLAUDE.md Library Preferences mandate Serilog with a rolling daily file sink. Every error path is an empty catch block (Initialize cleanup line 130, ShutdownAsync lines 142/149/153, ProbeLoop line 483, PollL… |
| Driver.S7-008 | Medium | Error handling & resilience | `S7Driver.cs:286` | WriteAsync catch ladder is coarser than ReadAsync and loses information. The generic catch (Exception) maps everything - socket errors, timeouts, OverflowException from Convert.ToInt16 of an out-of-range value, NullReferenceException from… |
| Driver.S7-012 | Medium | Design-document adherence | `S7DriverOptions.cs:59`, `S7Driver.cs:457` | S7ProbeOptions.ProbeAddress is configured (default "MW0"), documented at length ("the driver runs a tick loop that issues a cheap read against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO (S7ProbeDto.ProbeAddress), and parsed… |
| Driver.S7-014 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` | Test coverage has notable gaps for the driver behavioural core: (1) no test exercises the ReadOneAsync type-reinterpret switch (Int16 from ushort, Int32 from uint, Float32 from UInt32 bits) - the most logic-heavy method in the driver is un… |
| Driver.S7.Cli-001 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` | `WriteCommand.ParseValue` parses numeric and `DateTime` values with the raw BCL parsers (`short.Parse`, `float.Parse`, `DateTime.Parse`, etc.). On malformed input these throw `FormatException` / `OverflowException`, which are *not* `CliFx.… |
| Driver.S7.Cli-002 | Medium | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` | The `--type` option help text on `read`, `write`, and `subscribe` advertises the full `S7DataType` set (`Int64 / UInt64 / Float64 / String / DateTime`), and `docs/Driver.S7.Cli.md` shows a worked `read ... -t String --string-length 80` exa… |
| Driver.S7.Cli-003 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` | `ProbeCommand` XML doc and the `Driver.S7.Cli.md` "fastest is the device talking" framing say the probe "connects ... prints health" and "surfaces `BadNotSupported`" when PUT/GET is disabled. But when the PLC is unreachable (connection ref… |
| Driver.TwinCAT-003 | Medium | Correctness & logic bugs | `AdsTwinCATClient.cs:264-281`, `283-300` | `MapToClrType` has a `_ => typeof(int)` fallthrough and `ConvertForWrite` has a `_ => throw NotSupportedException` fallthrough. `TwinCATDataType.Structure` is a declared enum member, and a config-supplied tag can carry `DataType: "Structur… |
| Driver.TwinCAT-005 | Medium | OtOpcUa conventions | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) | The driver performs no logging. `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse failures (`DiscoverAsync` swallows them in a bare `catch`), notification-regis… |
| Driver.TwinCAT-009 | Medium | Concurrency & thread safety | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` | `ShutdownAsync` mutates `_devices`, `_tagsByName`, and `_nativeSubs` with no synchronization while `ReadAsync`/`WriteAsync`/`SubscribeAsync` may be iterating or indexing those same plain `Dictionary<>` instances on other threads (`_devices… |
| Driver.TwinCAT-010 | Medium | Error handling & resilience | `AdsTwinCATClient.cs:178-195` | `BrowseSymbolsAsync` checks `cancellationToken.IsCancellationRequested` and does `yield break` (a clean completion) rather than throwing `OperationCanceledException`. `DiscoverAsync` (`TwinCATDriver.cs:274`) explicitly has `catch (Operatio… |
| Driver.TwinCAT-011 | Medium | Error handling & resilience | `TwinCATStatusMapper.cs:29-42` | ADS error-code mapping has gaps and an inconsistency versus `docs/v2/driver-specs.md` section 6. The spec documents symbol-not-found as 0x0701 (1793 decimal) and symbol-version-changed as 0x0702 (1794 decimal). `MapAdsError` maps decimal 1… |
| Driver.TwinCAT-012 | Medium | Performance & resource management | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` | `GetMemoryFootprint()` returns a hard-coded 0. `docs/v2/driver-stability.md` section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the footprint to reflect "bytes attributable to their own caches (symbol cache… |
| Server-003 | Medium | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` | `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also `maxValuesPerNode` is capped against total buffer size *before* t… |
| Server-005 | Medium | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` | `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` -> `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synch… |
| Server-007 | Medium | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` | `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used — `/healthz` always reports `configDbReachable = true` and never 503s on a DB outage. `_staleConfigFlag` is also never supplied by `Pr… |
| Server-010 | Medium | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` | `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client cert failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag… |
| Server-011 | Medium | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` | `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired… |
| Server-013 | Medium | Design-document adherence | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` | `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a po… |
| Admin-010 | Low | OtOpcUa conventions | `Components/App.razor:9,16` | `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A… |
| Admin-011 | Low | Concurrency & thread safety | `Hubs/FleetStatusPoller.cs:24-26,98-103` | `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()`… |
| Admin-012 | Low | Design-document adherence | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` | `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column… |
@@ -389,3 +263,129 @@ Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
| Driver.TwinCAT-013 | High | Resolved | Design-document adherence | `TwinCATDriver.cs:11-12` (capability list), whole file |
| Server-002 | High | Resolved | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` |
| Server-009 | High | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` |
| Admin-006 | Medium | Resolved | Security | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` |
| Admin-007 | Medium | Resolved | Design-document adherence | `Components/Pages/Clusters/NewCluster.razor:91,95-96` |
| Admin-008 | Medium | Resolved | Error handling & resilience | `Services/ReservationService.cs:28-37` |
| Admin-009 | Medium | Resolved | Testing coverage | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) |
| Analyzers-001 | Medium | Resolved | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` |
| Analyzers-006 | Medium | Resolved | Testing coverage | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` |
| Client.CLI-001 | Medium | Resolved | Correctness & logic bugs | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` |
| Client.CLI-005 | Medium | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` |
| Client.Shared-001 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientService.cs:552` |
| Client.Shared-002 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` |
| Client.Shared-007 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientService.cs:581-622` |
| Client.Shared-008 | Medium | Resolved | Error handling & resilience | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` |
| Client.UI-001 | Medium | Resolved | Correctness & logic bugs | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` |
| Client.UI-002 | Medium | Resolved | Correctness & logic bugs | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` |
| Client.UI-005 | Medium | Resolved | Concurrency & thread safety | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` |
| Client.UI-007 | Medium | Resolved | Security | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` |
| Client.UI-008 | Medium | Resolved | Performance & resource management | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` |
| Configuration-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
| Configuration-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
| Configuration-006 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
| Configuration-009 | Medium | Resolved | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
| Core-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
| Core-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
| Core-006 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Core-007 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
| Core.Abstractions-001 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` |
| Core.Abstractions-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` |
| Core.Abstractions-003 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` |
| Core.AlarmHistorian-003 | Medium | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` |
| Core.AlarmHistorian-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
| Core.AlarmHistorian-007 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` |
| Core.AlarmHistorian-009 | Medium | Resolved | Design-document adherence | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` |
| Core.AlarmHistorian-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` |
| Core.ScriptedAlarms-002 | Medium | Resolved | Correctness & logic bugs | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
| Core.ScriptedAlarms-004 | Medium | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
| Core.ScriptedAlarms-005 | Medium | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
| Core.ScriptedAlarms-007 | Medium | Resolved | Error handling & resilience | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
| Core.ScriptedAlarms-012 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
| Core.Scripting-003 | Medium | Resolved | Security | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
| Core.Scripting-004 | Medium | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:73` |
| Core.Scripting-007 | Medium | Resolved | Error handling & resilience | `TimedScriptEvaluator.cs:60` |
| Core.Scripting-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Core.VirtualTags-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Core.VirtualTags-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Core.VirtualTags-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
| Core.VirtualTags-008 | Medium | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` |
| Core.VirtualTags-012 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` |
| Driver.AbCip-004 | Medium | Resolved | Correctness & logic bugs | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` |
| Driver.AbCip-005 | Medium | Resolved | Correctness & logic bugs | `AbCipDriver.cs:124-141` |
| Driver.AbCip-006 | Medium | Resolved | OtOpcUa conventions | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` |
| Driver.AbCip-009 | Medium | Resolved | Concurrency & thread safety | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` |
| Driver.AbCip-010 | Medium | Resolved | Error handling & resilience | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` |
| Driver.AbCip-014 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` |
| Driver.AbCip.Cli-001 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` |
| Driver.AbCip.Cli-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` |
| Driver.AbLegacy-002 | Medium | Resolved | Correctness & logic bugs | `AbLegacyDriver.cs:368` |
| Driver.AbLegacy-003 | Medium | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:62-95` |
| Driver.AbLegacy-004 | Medium | Resolved | Correctness & logic bugs | `LibplctagLegacyTagRuntime.cs:36-37` |
| Driver.AbLegacy-007 | Medium | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` |
| Driver.AbLegacy-008 | Medium | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` |
| Driver.AbLegacy-009 | Medium | Resolved | Error handling & resilience | `AbLegacyDriver.cs:41-74` |
| Driver.AbLegacy-010 | Medium | Resolved | Error handling & resilience | `AbLegacyStatusMapper.cs:26-56` |
| Driver.AbLegacy-012 | Medium | Resolved | Design-document adherence | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` |
| Driver.AbLegacy.Cli-001 | Medium | Resolved | Error handling & resilience | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` |
| Driver.Cli.Common-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` |
| Driver.Cli.Common-003 | Medium | Resolved | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
| Driver.Cli.Common-005 | Medium | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` |
| Driver.FOCAS-003 | Medium | Resolved | Correctness & logic bugs | `FocasDriver.cs:71-79` |
| Driver.FOCAS-004 | Medium | Resolved | OtOpcUa conventions | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
| Driver.FOCAS-005 | Medium | Resolved | Concurrency & thread safety | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
| Driver.FOCAS-006 | Medium | Resolved | Error handling & resilience | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
| Driver.FOCAS-012 | Medium | Resolved | Testing coverage | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
| Driver.Galaxy-003 | Medium | Resolved | Correctness & logic bugs | `Runtime/StatusCodeMap.cs:86` |
| Driver.Galaxy-004 | Medium | Resolved | Correctness & logic bugs | `GalaxyDriver.cs:901` |
| Driver.Galaxy-006 | Medium | Resolved | Concurrency & thread safety | `GalaxyDriver.cs:848-861` |
| Driver.Galaxy-007 | Medium | Resolved | Concurrency & thread safety | `GalaxyDriver.cs:937-968` |
| Driver.Galaxy-009 | Medium | Resolved | Error handling & resilience | `GalaxyDriver.cs:354-371` |
| Driver.Galaxy-011 | Medium | Resolved | Performance & resource management | `GalaxyDriver.cs:411` |
| Driver.Galaxy-014 | Medium | Resolved | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Driver.Historian.Wonderware-002 | Medium | Resolved | Correctness and logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Driver.Historian.Wonderware-003 | Medium | Resolved | Correctness and logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Driver.Historian.Wonderware-006 | Medium | Resolved | Error handling and resilience | `Ipc/PipeServer.cs:120-128` |
| Driver.Historian.Wonderware-009 | Medium | Resolved | Performance and resource management | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` |
| Driver.Historian.Wonderware.Client-002 | Medium | Resolved | Correctness & logic bugs | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
| Driver.Historian.Wonderware.Client-005 | Medium | Resolved | Error handling & resilience | `Ipc/FrameReader.cs:31-32` |
| Driver.Historian.Wonderware.Client-007 | Medium | Resolved | Security | `WonderwareHistorianClient.cs:276` |
| Driver.Historian.Wonderware.Client-009 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
| Driver.Modbus-002 | Medium | Resolved | Correctness & logic bugs | `ModbusDriver.cs:127-186` |
| Driver.Modbus-004 | Medium | Resolved | Performance & resource management | `ModbusDriver.cs:1468-1473` |
| Driver.Modbus-005 | Medium | Resolved | Correctness & logic bugs | `ModbusDriver.cs:777-798,323-330` |
| Driver.Modbus-006 | Medium | Resolved | Error handling & resilience | `ModbusDriver.cs:514-524,532-550` |
| Driver.Modbus.Addressing-002 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:86-94` |
| Driver.Modbus.Addressing-003 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` |
| Driver.Modbus.Addressing-004 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:182-194` |
| Driver.Modbus.Addressing-005 | Medium | Resolved | Error handling & resilience | `ModbusAddressParser.cs:200-213` |
| Driver.Modbus.Addressing-008 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` |
| Driver.Modbus.Cli-001 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` |
| Driver.Modbus.Cli-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` |
| Driver.OpcUaClient-006 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientDriver.cs:1330-1359` |
| Driver.OpcUaClient-007 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` |
| Driver.OpcUaClient-008 | Medium | Resolved | Error handling & resilience | `OpcUaClientDriver.cs:1092-1099` |
| Driver.OpcUaClient-009 | Medium | Resolved | Error handling & resilience | `OpcUaClientDriver.cs:560-564` |
| Driver.OpcUaClient-010 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientDriver.cs:823-824` |
| Driver.OpcUaClient-012 | Medium | Resolved | Security | `OpcUaClientDriver.cs:210-217` |
| Driver.OpcUaClient-013 | Medium | Resolved | Performance & resource management | `OpcUaClientDriver.cs:436-437` |
| Driver.OpcUaClient-015 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` |
| Driver.S7-002 | Medium | Resolved | Correctness & logic bugs | `S7Driver.cs:350` |
| Driver.S7-004 | Medium | Resolved | OtOpcUa conventions | `S7Driver.cs` (whole file) |
| Driver.S7-008 | Medium | Resolved | Error handling & resilience | `S7Driver.cs:286` |
| Driver.S7-012 | Medium | Resolved | Design-document adherence | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
| Driver.S7-014 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
| Driver.S7.Cli-001 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` |
| Driver.S7.Cli-002 | Medium | Resolved | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` |
| Driver.S7.Cli-003 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` |
| Driver.TwinCAT-003 | Medium | Resolved | Correctness & logic bugs | `AdsTwinCATClient.cs:264-281`, `283-300` |
| Driver.TwinCAT-005 | Medium | Resolved | OtOpcUa conventions | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) |
| Driver.TwinCAT-009 | Medium | Resolved | Concurrency & thread safety | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` |
| Driver.TwinCAT-010 | Medium | Resolved | Error handling & resilience | `AdsTwinCATClient.cs:178-195` |
| Driver.TwinCAT-011 | Medium | Resolved | Error handling & resilience | `TwinCATStatusMapper.cs:29-42` |
| Driver.TwinCAT-012 | Medium | Resolved | Performance & resource management | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` |
| Server-003 | Medium | Resolved | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
| Server-005 | Medium | Resolved | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
| Server-007 | Medium | Resolved | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
| Server-010 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
| Server-011 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
| Server-013 | Medium | Resolved | Design-document adherence | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
+13 -13
@@ -7,7 +7,7 @@
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |
| Open findings | 6 |
## Checklist coverage
@@ -60,13 +60,13 @@
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
| Status | Open |
| Status | Resolved |
**Description:** `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also `maxValuesPerNode` is capped against total buffer size *before* the `[startUtc, endUtc)` filter, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.
**Recommendation:** Make code and doc agree on ordering (raw HistoryRead is normally ascending source-timestamp). Apply `maxValuesPerNode` to the in-window count, not the whole buffer.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — corrected XML doc from "newest-first" to "oldest-first (ascending source timestamp, matching OPC UA Part 11 §6.4 raw-values default)"; moved `maxValuesPerNode` cap inside the time-window loop so the limit applies only to in-window results, not the whole buffer snapshot.
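A minimal sketch of the corrected read order and cap placement, not the actual `RingBufferHistoryWriter` code. The `Sample` record, `RingBufferReadSketch`, and its `ReadRaw` signature are hypothetical names introduced for illustration; the sketch also assumes that `maxValuesPerNode == 0` means "no limit".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample type; the real buffer entry type is not shown in this review.
public readonly record struct Sample(DateTime SourceTimestampUtc, double Value);

public static class RingBufferReadSketch
{
    // `snapshot` is assumed to be ordered oldest-to-newest, as Snapshot() is described above.
    public static List<Sample> ReadRaw(
        IReadOnlyList<Sample> snapshot, DateTime startUtc, DateTime endUtc, int maxValuesPerNode)
    {
        var results = new List<Sample>();
        foreach (var sample in snapshot)
        {
            // Filter on the half-open window [startUtc, endUtc) first...
            if (sample.SourceTimestampUtc < startUtc || sample.SourceTimestampUtc >= endUtc)
                continue;

            // ...then apply the cap to in-window results only (0 is assumed to mean "no limit").
            if (maxValuesPerNode > 0 && results.Count >= maxValuesPerNode)
                break;

            results.Add(sample);
        }
        return results; // ascending source timestamp
    }
}
```

Because the cap is checked after the window filter, out-of-window samples at the head of the buffer no longer consume the caller's budget.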
### Server-004
| Field | Value |
@@ -88,13 +88,13 @@
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
| Status | Open |
| Status | Resolved |
**Description:** `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` → `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synchronise against an in-flight `Invoke`. The service is process-shared across drivers, so a transition can dispatch to a `ConditionSink` whose `DriverNodeManager` is concurrently being disposed → `ReportEvent` on a torn-down node manager.
**Recommendation:** Guard `OnAlarmServiceTransition` with a `_disposed` check under `Lock` before `sink.OnTransition`. Document that handlers must tolerate invocation during their owner's disposal.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `_nodeManagerDisposed` field; `Dispose(bool)` now sets it under `Lock` before detaching the handler; `OnAlarmServiceTransition` checks the flag under the same `Lock` and exits early, preventing forwarding to a sink after the node manager has begun disposal.
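The guard described in this resolution can be sketched roughly as follows. `TransitionSource` and `NodeManagerSketch` are hypothetical stand-ins for `AlarmConditionService` and `DriverNodeManager`, a plain `object` monitor stands in for the `Lock`, and holding the lock while forwarding to the sink is a choice made by this sketch rather than something the review confirms.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the process-shared alarm service.
public sealed class TransitionSource
{
    public event Action<string>? TransitionRaised;
    public void Raise(string transition) => TransitionRaised?.Invoke(transition);
}

// Hypothetical stand-in for the node manager that owns a condition sink.
public sealed class NodeManagerSketch : IDisposable
{
    private readonly object _gate = new();
    private readonly TransitionSource _source;
    private readonly Action<string> _sink;
    private bool _nodeManagerDisposed;

    public NodeManagerSketch(TransitionSource source, Action<string> sink)
    {
        _source = source;
        _sink = sink;
        _source.TransitionRaised += OnAlarmServiceTransition;
    }

    private void OnAlarmServiceTransition(string transition)
    {
        lock (_gate)
        {
            // An invocation that was already in flight when Dispose ran exits here.
            if (_nodeManagerDisposed) return;
            _sink(transition);
        }
    }

    public void Dispose()
    {
        // Set the flag first, under the same lock the handler takes, then detach.
        lock (_gate) { _nodeManagerDisposed = true; }
        _source.TransitionRaised -= OnAlarmServiceTransition;
    }
}
```

Detaching the handler alone is not enough, because a delegate invocation list captured before the detach can still call the handler; the flag is what closes that window.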
### Server-006
| Field | Value |
@@ -116,13 +116,13 @@
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
| Status | Open |
| Status | Resolved |
**Description:** `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used — `/healthz` always reports `configDbReachable = true` and never 503s on a DB outage. `_staleConfigFlag` is also never supplied by `Program.cs`, so the stale-config signal is inert too. `/healthz` degenerates to a pure liveness probe; operators get a false-healthy during a DB outage.
**Recommendation:** Wire a real config-DB probe (cheap cached `SELECT 1`) into `HealthEndpointsHost`, and register `StaleConfigFlag` in `Program.cs`. Or move DB health to `/readyz` and drop the misleading `configDbReachable` field.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — added `Func<bool>? configDbHealthy` parameter to `OpcUaApplicationHost` (defaults null, backward-compatible); `Program.cs` constructs a `DbHealthCache` that calls `CanConnectAsync` every 10 s and caches the result, then passes `() => dbHealthCache.IsHealthy`; `/healthz` now reflects real DB reachability and returns 503 on a DB outage (unless stale-config cache is warm).
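A cached health probe of the kind this resolution describes might look like the sketch below. `DbHealthCacheSketch` is a hypothetical name, the probe delegate stands in for the `CanConnectAsync` call, and treating a throwing probe as unhealthy is an assumption of the sketch.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class DbHealthCacheSketch
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private volatile bool _isHealthy;

    public DbHealthCacheSketch(Func<CancellationToken, Task<bool>> probe) => _probe = probe;

    // Cheap, non-blocking read suitable for a `() => cache.IsHealthy` delegate.
    public bool IsHealthy => _isHealthy;

    public async Task RefreshAsync(CancellationToken ct = default)
    {
        try
        {
            _isHealthy = await _probe(ct).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            throw; // shutdown, not a health signal
        }
        catch (Exception)
        {
            _isHealthy = false; // assumption: a throwing probe counts as unreachable
        }
    }

    // Probes once immediately, then once per interval until cancelled.
    public async Task RunAsync(TimeSpan interval, CancellationToken ct)
    {
        using var timer = new PeriodicTimer(interval);
        do
        {
            await RefreshAsync(ct).ConfigureAwait(false);
        }
        while (await timer.WaitForNextTickAsync(ct).ConfigureAwait(false));
    }
}
```

With this shape the health endpoint never waits on the database itself; it only reads the last cached result, which is at most one interval old.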
### Server-008
| Field | Value |
@@ -158,13 +158,13 @@
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
| Status | Open |
| Status | Resolved |
**Description:** `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client cert failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag accepts every untrusted client cert, defeating the PKI trust list. With the always-present `None` policy, the default posture is fully open.
**Recommendation:** Default `AutoAcceptUntrustedClientCertificates` to `false`; keep auto-accept as opt-in dev convenience. `docs/security.md` already shows `false` — align code to doc.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `OpcUaServerOptions.AutoAcceptUntrustedClientCertificates` property initialiser changed from `true` to `false` (secure by default, aligning with `docs/security.md`); `Program.cs` config fallback changed from `?? true` to `?? false`.
### Server-011
| Field | Value |
@@ -172,13 +172,13 @@
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
| Status | Open |
| Status | Resolved |
**Description:** `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired but no UserName policy is advertised — clients cannot present credentials; the only path in is Anonymous. The operator's intent is silently not honoured, with no diagnostic.
**Recommendation:** Validate config at startup and warn/fail when `Ldap.Enabled = true` but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per `docs/security.md`).
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — `BuildUserTokenPolicies` now advertises a `UserName` token policy whenever `Ldap.Enabled && SecurityProfile != None` (previously required `== Basic256Sha256SignAndEncrypt`); `StartAsync` logs a `LogWarning` at startup when `Ldap.Enabled = true` but `SecurityProfile = None`, surfacing the misconfiguration before clients connect.
### Server-012
| Field | Value |
@@ -200,13 +200,13 @@
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
| Status | Open |
| Status | Resolved |
**Description:** `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a policy only for the latter; `SecurityProfileResolver` does not exist in the repo (grep finds it only in docs). `Basic256Sha256-Sign` and all Aes profiles are unimplemented, and `Program.cs:89`'s `Enum.TryParse` silently selects `None` for an unrecognised profile string.
**Recommendation:** Reconcile code and docs — implement the missing profiles + `SecurityProfileResolver`, or trim `docs/security.md` / `CLAUDE.md` to the two supported profiles. At minimum, log a warning when a configured `SecurityProfile` fails to parse instead of silently using `None`.
**Resolution:** _(open)_
**Resolution:** Resolved 2026-05-22 — replaced the silent `Enum.TryParse ?? None` fallback in `Program.cs` with a `ParseSecurityProfile` helper that produces a warning string listing supported profiles when the configured value is unrecognised; the warning is emitted via `Log.Warning` at startup before the host builds, making the misconfiguration immediately visible. Implementing the missing 5 profiles is tracked as a doc-to-code gap rather than a single finding fix.
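One possible shape for such a helper is sketched below. The enum members come from the finding above; the tuple return type, the case-insensitive parse, the fallback to `None`, and the warning wording are assumptions of this sketch, not the actual `ParseSecurityProfile` signature.

```csharp
#nullable enable
using System;

public enum OpcUaSecurityProfile { None, Basic256Sha256SignAndEncrypt }

public static class SecurityProfileParsingSketch
{
    public static (OpcUaSecurityProfile Profile, string? Warning) ParseSecurityProfile(string? configured)
    {
        // Nothing configured: use the default without a warning.
        if (string.IsNullOrWhiteSpace(configured))
            return (OpcUaSecurityProfile.None, null);

        // Enum.TryParse also accepts numeric strings such as "7", so check IsDefined too.
        if (Enum.TryParse<OpcUaSecurityProfile>(configured, ignoreCase: true, out var parsed)
            && Enum.IsDefined(parsed))
            return (parsed, null);

        var supported = string.Join(", ", Enum.GetNames<OpcUaSecurityProfile>());
        return (OpcUaSecurityProfile.None,
            $"Unrecognised SecurityProfile '{configured}'; falling back to None. Supported: {supported}.");
    }
}
```

The caller would log the warning string, when it is non-null, before the host starts.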
### Server-014
| Field | Value |
+7 -1
@@ -110,7 +110,13 @@ AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
present.
- `SqliteStoreAndForwardSink` queues each transition to a local
SQLite database and drains in the background via the resolved
writer.
writer. **The durability guarantee is bounded**: the queue capacity
defaults to 1,000,000 rows; under a sustained historian outage,
older non-dead-lettered rows are evicted (oldest first) to make
room for new events. The `HistorianSinkStatus.EvictedCount` counter
surfaces lifetime eviction events to the Admin UI
`/alarms/historian` diagnostics page so operators can detect silent
data loss without log scraping.
- Sidecar (PR C.1 + C.2) forwards the events to `aahClientManaged`'s
alarm-event write API; the live SDK call site is pinned during
PR D.1's deploy-rig validation.
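
The bounded durability guarantee described for `SqliteStoreAndForwardSink` above can be illustrated with an in-memory sketch. `BoundedQueueSketch` is a hypothetical type; it uses a plain `Queue<T>` instead of SQLite and ignores the dead-letter distinction, so it only shows the oldest-first eviction and the lifetime counter.

```csharp
using System;
using System.Collections.Generic;

public sealed class BoundedQueueSketch<T>
{
    private readonly Queue<T> _rows = new();
    private readonly int _capacity;

    public BoundedQueueSketch(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Lifetime count of rows dropped to make room for newer ones.
    public long EvictedCount { get; private set; }

    public int Count => _rows.Count;

    public void Enqueue(T row)
    {
        // At capacity: drop the oldest row and record the loss.
        while (_rows.Count >= _capacity)
        {
            _rows.Dequeue();
            EvictedCount++;
        }
        _rows.Enqueue(row);
    }

    public bool TryDequeue(out T row) => _rows.TryDequeue(out row!);
}
```

A counter that only ever increases lets a diagnostics page show that data was lost at some point, even after the queue has drained.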
+1 -1
@@ -35,7 +35,7 @@ new ScriptedAlarmDefinition(
## Predicate evaluation
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them.
Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known memory / CPU resource limits are documented there as well.
`AlarmPredicateContext` (`AlarmPredicateContext.cs`) is the script's `ScriptContext` subclass:
+7 -1
@@ -18,7 +18,13 @@ User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against
`ScriptSandbox.Build` allow-lists exactly: `System.Private.CoreLib` (primitives + `Math` + `Convert`), `System.Linq`, `Core.Abstractions` (for `DataValueSnapshot` / `DriverDataType`), `Core.Scripting` (for `ScriptContext` + `Deadband`), `Serilog` (for `ILogger`), and the concrete context type's assembly. Pre-imported namespaces: `System`, `System.Linq`, `ZB.MOM.WW.OtOpcUa.Core.Abstractions`, `ZB.MOM.WW.OtOpcUa.Core.Scripting`.
`ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes` currently denies `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Thread`, `System.Runtime.InteropServices`, `Microsoft.Win32`. Matching is by prefix against the resolved symbol's containing namespace, so `System.Net` catches `System.Net.Http.HttpClient` and every subnamespace. `System.Environment` is explicitly allowed.
`ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes` currently denies `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Thread`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, `Microsoft.Win32`. Matching is by prefix against the resolved symbol's containing namespace, so `System.Net` catches `System.Net.Http.HttpClient` and every subnamespace. `System.Threading.Tasks` is denied because scripts are synchronous predicates with no legitimate need to start background tasks — a `Task.Run` fan-out would outlive the per-evaluation timeout entirely (Core.Scripting-003). `System.Environment`, `System.AppDomain`, `System.GC`, and `System.Activator` are denied type-granularly via `ForbiddenFullTypeNames` because they live directly in the `System` namespace (which is otherwise allowed for primitives) — `Environment.Exit` / `FailFast` terminate the host process outright (Core.Scripting-001).
#### Known resource limits (accepted trade-offs)
The sandbox cannot prevent a script from **allocating unbounded memory**. A script calling `new byte[int.MaxValue]` repeatedly, or accumulating a large LINQ enumeration, can drive the server process to `OutOfMemoryException` before the 250 ms timeout fires. Script authoring is gated behind the Admin permission as the primary control; the test-harness preview (Stream F.4) allows operators to exercise a script before publishing. Out-of-process script execution is a v3 concern.
Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), which prevents `Task.Run` / `Parallel` fan-out that would spawn background work outliving the timeout. However, a tight CPU-bound loop still runs on its thread-pool thread after `WaitAsync` returns — see the `TimedScriptEvaluator` remarks for detail. The orphaned thread is reclaimed when the Roslyn runtime eventually returns; in practice the operator fixes the script once the structured timeout warning appears in `scripts-*.log`.
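
The timeout behaviour described above can be sketched as follows. This is not the `TimedScriptEvaluator` implementation; `TimedEvaluationSketch` and its tuple result are hypothetical, and the sketch assumes the script is a synchronous `Func<bool>` run on the thread pool.

```csharp
using System;
using System.Threading.Tasks;

public static class TimedEvaluationSketch
{
    public static async Task<(bool TimedOut, bool Result)> EvaluateAsync(
        Func<bool> script, TimeSpan timeout)
    {
        // The script runs on a thread-pool thread.
        var work = Task.Run(script);
        try
        {
            // WaitAsync stops *waiting* after the timeout; it does not stop the script.
            var result = await work.WaitAsync(timeout).ConfigureAwait(false);
            return (false, result);
        }
        catch (TimeoutException)
        {
            // `work` is still running here: a CPU-bound loop keeps its thread
            // until the script itself returns.
            return (true, false);
        }
    }
}
```

This is why the timeout bounds how long the caller waits but not how much CPU a runaway script consumes.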
### Compile cache (`CompiledScriptCache<TContext, TResult>`)
@@ -1,7 +1,9 @@
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Infrastructure;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
@@ -49,19 +51,33 @@ public class AlarmsCommand : CommandBase
var sourceNodeId = NodeIdParser.Parse(NodeId);
service.AlarmEvent += (_, e) =>
// Channel serialises SDK notification-thread writes to the main async loop so
// that concurrent alarm callbacks never interleave on the shared TextWriter.
var outputChannel = Channel.CreateUnbounded<string>(
new UnboundedChannelOptions { SingleReader = true });
void AlarmEventHandler(object? sender, AlarmEventArgs e)
{
console.Output.WriteLine($"[{e.Time:O}] ALARM {e.SourceName}");
console.Output.WriteLine($" Condition: {e.ConditionName}");
var activeStr = e.ActiveState ? "Active" : "Inactive";
var ackedStr = e.AckedState ? "Acknowledged" : "Unacknowledged";
console.Output.WriteLine($" State: {activeStr}, {ackedStr}");
console.Output.WriteLine($" Severity: {e.Severity}");
if (!string.IsNullOrEmpty(e.Message))
console.Output.WriteLine($" Message: {e.Message}");
console.Output.WriteLine($" Retain: {e.Retain}");
console.Output.WriteLine();
};
try
{
var activeStr = e.ActiveState ? "Active" : "Inactive";
var ackedStr = e.AckedState ? "Acknowledged" : "Unacknowledged";
outputChannel.Writer.TryWrite($"[{e.Time:O}] ALARM {e.SourceName}");
outputChannel.Writer.TryWrite($" Condition: {e.ConditionName}");
outputChannel.Writer.TryWrite($" State: {activeStr}, {ackedStr}");
outputChannel.Writer.TryWrite($" Severity: {e.Severity}");
if (!string.IsNullOrEmpty(e.Message))
outputChannel.Writer.TryWrite($" Message: {e.Message}");
outputChannel.Writer.TryWrite($" Retain: {e.Retain}");
outputChannel.Writer.TryWrite(string.Empty);
}
catch
{
// Never let handler exceptions escape into the SDK callback.
}
}
service.AlarmEvent += AlarmEventHandler;
await service.SubscribeAlarmsAsync(sourceNodeId, Interval, ct);
await console.Output.WriteLineAsync(
@@ -78,6 +94,14 @@ public class AlarmsCommand : CommandBase
await console.Output.WriteLineAsync($"Condition refresh not supported: {ex.Message}");
}
// Drain the output channel on a single background task until cancellation fires.
using var drainCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
var drainTask = Task.Run(async () =>
{
await foreach (var line in outputChannel.Reader.ReadAllAsync(drainCts.Token))
await console.Output.WriteLineAsync(line);
}, CancellationToken.None);
// Wait until cancellation
try
{
@@ -88,6 +112,12 @@ public class AlarmsCommand : CommandBase
// Expected on Ctrl+C
}
// Stop accepting new notifications before writing final output.
service.AlarmEvent -= AlarmEventHandler;
outputChannel.Writer.Complete();
await drainCts.CancelAsync();
try { await drainTask; } catch (OperationCanceledException) { }
await service.UnsubscribeAlarmsAsync();
await console.Output.WriteLineAsync("Unsubscribed.");
}
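The channel pattern used in this hunk can be sketched on its own. The four producer tasks below are hypothetical stand-ins for SDK notification threads, and the `List<string>` stands in for the shared `TextWriter`. In this sketch the reader is simply awaited after `Writer.Complete()`, so `ReadAllAsync` ends once the buffer is empty; that is one possible shutdown order, not necessarily the one the command uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>(
    new UnboundedChannelOptions { SingleReader = true });

// Touched only by the single reader, so no lock is needed.
var written = new List<string>();
var reader = Task.Run(async () =>
{
    await foreach (var line in channel.Reader.ReadAllAsync())
        written.Add(line);
});

// Four concurrent producers stand in for SDK notification callbacks.
var producers = Enumerable.Range(0, 4).Select(p => Task.Run(() =>
{
    for (var i = 0; i < 100; i++)
        channel.Writer.TryWrite($"producer {p} line {i}");
})).ToArray();
await Task.WhenAll(producers);

// Completing the writer lets ReadAllAsync finish after the last buffered line.
channel.Writer.Complete();
await reader;

Console.WriteLine(written.Count); // prints 400
```

Because every line passes through one reader, whole lines from different callbacks can be reordered relative to each other but never interleaved character by character.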
@@ -1,3 +1,4 @@
using System.Globalization;
using CliFx.Attributes;
using CliFx.Infrastructure;
using Opc.Ua;
@@ -27,13 +28,13 @@ public class HistoryReadCommand : CommandBase
/// <summary>
/// Gets the optional history start time string supplied by the operator.
/// </summary>
[CommandOption("start", Description = "Start time (ISO 8601 or date string, default: 24 hours ago)")]
[CommandOption("start", Description = "Start time in ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z (default: 24 hours ago)")]
public string? StartTime { get; init; }
/// <summary>
/// Gets the optional history end time string supplied by the operator.
/// </summary>
[CommandOption("end", Description = "End time (ISO 8601 or date string, default: now)")]
[CommandOption("end", Description = "End time in ISO 8601 UTC format, e.g. 2026-01-15T09:00:00Z (default: now)")]
public string? EndTime { get; init; }
/// <summary>
@@ -70,10 +71,12 @@ public class HistoryReadCommand : CommandBase
var nodeId = NodeIdParser.ParseRequired(NodeId);
var start = string.IsNullOrEmpty(StartTime)
? DateTime.UtcNow.AddHours(-24)
: DateTime.Parse(StartTime).ToUniversalTime();
: DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
var end = string.IsNullOrEmpty(EndTime)
? DateTime.UtcNow
: DateTime.Parse(EndTime).ToUniversalTime();
: DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
IReadOnlyList<DataValue> values;
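The effect of the new parsing flags can be checked with a small standalone sketch. The three input strings are illustrative; only `DateTime.Parse`, `CultureInfo.InvariantCulture`, and the two `DateTimeStyles` flags come from the change above.

```csharp
using System;
using System.Globalization;

const DateTimeStyles Styles =
    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal;

// Explicit UTC designator.
var withZ = DateTime.Parse("2026-01-15T08:00:00Z", CultureInfo.InvariantCulture, Styles);
// No zone information: AssumeUniversal treats it as UTC rather than local time.
var noZone = DateTime.Parse("2026-01-15T08:00:00", CultureInfo.InvariantCulture, Styles);
// Explicit offset: AdjustToUniversal converts it to the equivalent UTC instant.
var withOffset = DateTime.Parse("2026-01-15T10:00:00+02:00", CultureInfo.InvariantCulture, Styles);

Console.WriteLine(withZ.ToString("O"));      // 2026-01-15T08:00:00.0000000Z
Console.WriteLine(noZone.Kind);              // Utc
Console.WriteLine(withOffset == withZ);      // True
```

The previous form, `DateTime.Parse(value).ToUniversalTime()`, interpreted a zone-less string in the machine's local time zone and current culture, so the same command line could select a different window on different hosts.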
@@ -1,9 +1,11 @@
using System.Collections.Concurrent;
using System.Threading.Channels;
using CliFx.Attributes;
using CliFx.Infrastructure;
using Opc.Ua;
using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
using ZB.MOM.WW.OtOpcUa.Client.Shared;
using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;
namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;
@@ -63,19 +65,35 @@ public class SubscribeCommand : CommandBase
var everBad = new ConcurrentDictionary<string, byte>();
var displayNameByNodeId = targets.ToDictionary(t => t.nodeId.ToString(), t => t.displayPath);
service.DataChanged += (_, e) =>
// Channel serialises notification-thread writes to the main async loop so that
// concurrent SDK callbacks and main-thread summary output never interleave on
// the shared TextWriter.
var outputChannel = Channel.CreateUnbounded<string>(
new UnboundedChannelOptions { SingleReader = true });
void DataChangedHandler(object? sender, DataChangedEventArgs e)
{
var key = e.NodeId.ToString();
lastStatus[key] = (e.Value.StatusCode, DateTime.UtcNow, e.Value.Value);
updateCount.AddOrUpdate(key, 1, (_, v) => v + 1);
if (!StatusCode.IsGood(e.Value.StatusCode))
everBad.TryAdd(key, 0);
if (!Quiet)
try
{
console.Output.WriteLine(
$"[{e.Value.SourceTimestamp:O}] {displayNameByNodeId.GetValueOrDefault(key, key)} = {e.Value.Value} ({e.Value.StatusCode})");
var key = e.NodeId.ToString();
lastStatus[key] = (e.Value.StatusCode, DateTime.UtcNow, e.Value.Value);
updateCount.AddOrUpdate(key, 1, (_, v) => v + 1);
if (!StatusCode.IsGood(e.Value.StatusCode))
everBad.TryAdd(key, 0);
if (!Quiet)
{
var line =
$"[{e.Value.SourceTimestamp:O}] {displayNameByNodeId.GetValueOrDefault(key, key)} = {e.Value.Value} ({e.Value.StatusCode})";
outputChannel.Writer.TryWrite(line);
}
}
};
catch
{
// Never let handler exceptions escape into the SDK callback.
}
}
service.DataChanged += DataChangedHandler;
var subscribed = 0;
foreach (var (nodeId, _) in targets)
@@ -94,6 +112,14 @@ public class SubscribeCommand : CommandBase
await console.Output.WriteLineAsync(
$"Subscribed to {subscribed}/{targets.Count} nodes (interval: {Interval}ms). Press Ctrl+C to stop and print summary.");
// Drain the output channel on a single background task until cancellation fires.
using var drainCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
var drainTask = Task.Run(async () =>
{
await foreach (var line in outputChannel.Reader.ReadAllAsync(drainCts.Token))
await console.Output.WriteLineAsync(line);
}, CancellationToken.None);
try
{
if (DurationSeconds > 0)
@@ -105,6 +131,12 @@ public class SubscribeCommand : CommandBase
{
}
// Stop accepting new notifications before writing the summary.
service.DataChanged -= DataChangedHandler;
outputChannel.Writer.Complete();
await drainCts.CancelAsync();
try { await drainTask; } catch (OperationCanceledException) { }
// Summary
var summary = new List<string>();
summary.Add("");
@@ -127,10 +159,10 @@ public class SubscribeCommand : CommandBase
}
var neverWentBad = targets
.Where(t => !everBad.ContainsKey(t.nodeId.ToString()))
.Where(t => lastStatus.ContainsKey(t.nodeId.ToString()) && !everBad.ContainsKey(t.nodeId.ToString()))
.Select(t => t.displayPath)
.ToList();
var didGoBad = targets.Count - neverWentBad.Count;
var didGoBad = targets.Count(t => everBad.ContainsKey(t.nodeId.ToString()));
summary.Add($"Total subscribed: {targets.Count}");
summary.Add($" Ever went BAD during window: {didGoBad}");
@@ -10,23 +10,53 @@ public static class ValueConverter
/// Converts a raw string value into the runtime type expected by the target node.
/// </summary>
/// <param name="rawValue">The raw string supplied by the user.</param>
/// <param name="currentValue">The current node value used to infer the target type. May be null.</param>
/// <param name="currentValue">
/// The current node value used to infer the target type. When <c>null</c> the raw string
/// is returned unchanged; callers should validate this case before writing.
/// </param>
/// <returns>A typed value suitable for an OPC UA write request.</returns>
/// <exception cref="FormatException">
/// Thrown with a descriptive message when <paramref name="rawValue"/> cannot be parsed
/// into the type inferred from <paramref name="currentValue"/>.
/// </exception>
public static object ConvertValue(string rawValue, object? currentValue)
{
return currentValue switch
try
{
bool => bool.Parse(rawValue),
byte => byte.Parse(rawValue),
short => short.Parse(rawValue),
ushort => ushort.Parse(rawValue),
int => int.Parse(rawValue),
uint => uint.Parse(rawValue),
long => long.Parse(rawValue),
ulong => ulong.Parse(rawValue),
float => float.Parse(rawValue),
double => double.Parse(rawValue),
_ => rawValue
return currentValue switch
{
bool => ParseBool(rawValue),
byte => byte.Parse(rawValue),
short => short.Parse(rawValue),
ushort => ushort.Parse(rawValue),
int => int.Parse(rawValue),
uint => uint.Parse(rawValue),
long => long.Parse(rawValue),
ulong => ulong.Parse(rawValue),
float => float.Parse(rawValue),
double => double.Parse(rawValue),
_ => rawValue
};
}
catch (Exception ex) when (ex is FormatException or OverflowException)
{
var targetType = currentValue?.GetType().Name ?? "unknown";
throw new FormatException(
$"Cannot convert value \"{rawValue}\" to target type {targetType}: {ex.Message}", ex);
}
}
/// <summary>
/// Parses a boolean from common string representations including numeric and word forms.
/// Accepts: <c>true</c>/<c>false</c>, <c>1</c>/<c>0</c>, <c>yes</c>/<c>no</c> (case-insensitive).
/// </summary>
private static bool ParseBool(string rawValue)
{
return rawValue.Trim().ToLowerInvariant() switch
{
"true" or "1" or "yes" => true,
"false" or "0" or "no" => false,
_ => throw new FormatException($"String '{rawValue}' was not recognized as a valid Boolean.")
};
}
}
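A short sketch shows why the dedicated boolean parser and the `OverflowException` filter matter. `ParseBool` is reproduced here as a local function so the example is self-contained; the sample inputs are illustrative.

```csharp
using System;

// Same accepted forms as the ParseBool helper in the hunk above.
bool ParseBool(string rawValue) => rawValue.Trim().ToLowerInvariant() switch
{
    "true" or "1" or "yes" => true,
    "false" or "0" or "no" => false,
    _ => throw new FormatException($"String '{rawValue}' was not recognized as a valid Boolean."),
};

// The framework parser rejects the numeric form that operators commonly type.
bool frameworkRejectsOne;
try { bool.Parse("1"); frameworkRejectsOne = false; }
catch (FormatException) { frameworkRejectsOne = true; }

// Out-of-range numerics raise OverflowException, not FormatException,
// which is why the catch filter in ConvertValue names both types.
bool byteOverflows;
try { byte.Parse("300"); byteOverflows = false; }
catch (OverflowException) { byteOverflows = true; }

Console.WriteLine($"{ParseBool(" YES ")} {ParseBool("0")} {frameworkRejectsOne} {byteOverflows}");
```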
@@ -187,6 +187,9 @@ public sealed class OpcUaClientService : IOpcUaClientService
if (value is string rawString)
{
var currentDataValue = await _session!.ReadValueAsync(nodeId, ct);
if (!StatusCode.IsGood(currentDataValue.StatusCode) || currentDataValue.Value == null)
throw new InvalidOperationException(
$"Cannot infer target type for node {nodeId}: read returned status {currentDataValue.StatusCode} with no value. Provide a typed value instead of a string.");
typedValue = ValueConverter.ConvertValue(rawString, currentDataValue.Value);
}
@@ -388,10 +391,17 @@ public sealed class OpcUaClientService : IOpcUaClientService
var redundancySupportValue =
await _session!.ReadValueAsync(VariableIds.Server_ServerRedundancy_RedundancySupport, ct);
var redundancyMode = ((RedundancySupport)(int)redundancySupportValue.Value).ToString();
RedundancySupport redundancySupport;
if (StatusCode.IsGood(redundancySupportValue.StatusCode) && redundancySupportValue.Value != null)
redundancySupport = (RedundancySupport)Convert.ToInt32(redundancySupportValue.Value);
else
redundancySupport = RedundancySupport.None;
var redundancyMode = redundancySupport.ToString();
var serviceLevelValue = await _session.ReadValueAsync(VariableIds.Server_ServiceLevel, ct);
var serviceLevel = (byte)serviceLevelValue.Value;
var serviceLevel = StatusCode.IsGood(serviceLevelValue.StatusCode) && serviceLevelValue.Value != null
? Convert.ToByte(serviceLevelValue.Value)
: (byte)0;
string[] serverUris = [];
try
@@ -627,7 +637,7 @@ public sealed class OpcUaClientService : IOpcUaClientService
private void OnAlarmEventNotification(EventFieldList eventFields)
{
var fields = eventFields.EventFields;
if (fields == null || fields.Count < 6)
if (fields == null || fields.Count < 1)
return;
var eventId = fields.Count > 0 ? fields[0].Value as byte[] : null;
@@ -656,6 +666,8 @@ public sealed class OpcUaClientService : IOpcUaClientService
// Fallback: read InAlarm/Acked from condition node Galaxy attributes
// when the server doesn't populate standard event fields.
// Must run on a background thread to avoid deadlocking the notification thread.
// Capture the session reference now; skip supplemental reads if the session has
// been replaced by a concurrent failover before the Task.Run body executes.
if (ackedField == null && activeField == null && conditionNodeId != null && _session != null)
{
var session = _session;
@@ -663,6 +675,11 @@ public sealed class OpcUaClientService : IOpcUaClientService
var capturedMessage = message;
_ = Task.Run(async () =>
{
// If the session was replaced by a failover before we started reading,
// skip the supplemental reads to avoid hitting a disposed session.
if (!ReferenceEquals(session, _session))
return;
try
{
var inAlarmValue = await session.ReadValueAsync(NodeId.Parse($"{capturedConditionNodeId}.InAlarm"));
@@ -687,9 +704,16 @@ public sealed class OpcUaClientService : IOpcUaClientService
}
catch { /* DescAttrName may not exist */ }
}
catch (ObjectDisposedException)
{
// Session was disposed during supplemental reads (concurrent failover);
// skip the event rather than delivering stale/default states.
Logger.Debug("Supplemental alarm read skipped — session disposed during failover for {ConditionNodeId}", capturedConditionNodeId);
return;
}
catch
{
// Supplemental read failed; use defaults
// Other supplemental read failure; deliver event with defaults
}
AlarmEvent?.Invoke(this, new AlarmEventArgs(
@@ -17,11 +17,6 @@ public sealed class UserSettings
/// </summary>
public string? Username { get; set; }
/// <summary>
/// Gets or sets the persisted password for authenticated sessions.
/// </summary>
public string? Password { get; set; }
/// <summary>
/// Gets or sets the transport security mode selected by the user.
/// </summary>
@@ -215,6 +215,16 @@ public partial class AlarmsViewModel : ObservableObject
ActiveAlarmCount = 0;
}
/// <summary>
/// Re-hooks event handlers to the service after a server-side reconnect.
/// Safe to call when already attached: the handler is removed before it is re-added, so it is
/// never registered twice (a bare duplicate += would add a second invocation).
/// </summary>
public void Reattach()
{
_service.AlarmEvent -= OnAlarmEvent;
_service.AlarmEvent += OnAlarmEvent;
}
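The remove-then-add pattern in `Reattach` can be demonstrated with plain delegates. This is a minimal sketch: the local delegate variables stand in for the service's events, and the counter is purely illustrative.

```csharp
using System;

var calls = 0;
EventHandler handler = (_, _) => calls++;

// Plain += twice: the invocation list now holds the handler two times.
EventHandler? doubled = null;
doubled += handler;
doubled += handler;
doubled?.Invoke(null, EventArgs.Empty);
var callsAfterDoubleAdd = calls;   // 2

// Remove-then-add, starting from an already attached handler.
calls = 0;
EventHandler? reattached = null;
reattached += handler;
reattached -= handler;
reattached += handler;
reattached?.Invoke(null, EventArgs.Empty);
var callsAfterReattach = calls;    // 1

Console.WriteLine($"{callsAfterDoubleAdd} {callsAfterReattach}"); // prints "2 1"
```

Removing a handler that is not attached is harmless, so the same two lines also work on the first call.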
/// <summary>
/// Unhooks event handlers from the service.
/// </summary>
@@ -73,7 +73,7 @@ public partial class HistoryViewModel : ObservableObject
{
if (string.IsNullOrEmpty(SelectedNodeId)) return;
IsLoading = true;
_dispatcher.Post(() => IsLoading = true);
_dispatcher.Post(() => Results.Clear());
try
@@ -10,7 +10,7 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
/// <summary>
/// Main window ViewModel coordinating all panels.
/// </summary>
public partial class MainWindowViewModel : ObservableObject
public partial class MainWindowViewModel : ObservableObject, IDisposable
{
private readonly IUiDispatcher _dispatcher;
private readonly IOpcUaClientServiceFactory _factory;
@@ -166,6 +166,8 @@ public partial class MainWindowViewModel : ObservableObject
{
case ConnectionState.Connected:
StatusMessage = $"Connected to {EndpointUrl}";
Subscriptions?.Reattach();
Alarms?.Reattach();
break;
case ConnectionState.Reconnecting:
StatusMessage = "Reconnecting...";
@@ -177,6 +179,8 @@ public partial class MainWindowViewModel : ObservableObject
StatusMessage = "Disconnected";
SessionLabel = string.Empty;
RedundancyInfo = null;
Subscriptions?.Teardown();
Alarms?.Teardown();
BrowseTree?.Clear();
ReadWrite?.Clear();
Subscriptions?.Clear();
@@ -252,7 +256,7 @@ public partial class MainWindowViewModel : ObservableObject
}
// Load root nodes
await BrowseTree.LoadRootsAsync();
if (BrowseTree != null) await BrowseTree.LoadRootsAsync();
// Restore saved subscriptions
if (_savedSubscribedNodes.Count > 0 && Subscriptions != null)
@@ -330,7 +334,7 @@ public partial class MainWindowViewModel : ObservableObject
if (SelectedTreeNodes.Count == 0 || !IsConnected) return;
var node = SelectedTreeNodes[0];
History.SelectedNodeId = node.NodeId;
if (History != null) History.SelectedNodeId = node.NodeId;
SelectedTabIndex = 3; // History tab
}
@@ -376,7 +380,7 @@ public partial class MainWindowViewModel : ObservableObject
var s = _settingsService.Load();
EndpointUrl = s.EndpointUrl;
Username = s.Username;
Password = s.Password;
// Password is intentionally not persisted (security: re-prompt each launch)
SelectedSecurityMode = s.SecurityMode;
FailoverUrls = s.FailoverUrls;
SessionTimeoutSeconds = s.SessionTimeoutSeconds;
@@ -396,7 +400,7 @@ public partial class MainWindowViewModel : ObservableObject
{
EndpointUrl = EndpointUrl,
Username = Username,
Password = Password,
// Password is intentionally not persisted (security: re-prompt each launch)
SecurityMode = SelectedSecurityMode,
FailoverUrls = FailoverUrls,
SessionTimeoutSeconds = SessionTimeoutSeconds,
@@ -407,6 +411,21 @@ public partial class MainWindowViewModel : ObservableObject
});
}
/// <summary>
/// Detaches the connection-state handler and disposes the OPC UA client service, releasing
/// the session, certificate validator, and any background reconnect resources.
/// </summary>
public void Dispose()
{
if (_service != null)
{
_service.ConnectionStateChanged -= OnConnectionStateChanged;
Subscriptions?.Teardown();
Alarms?.Teardown();
_service.Dispose();
}
}
private static string[]? ParseFailoverUrls(string? csv)
{
if (string.IsNullOrWhiteSpace(csv))
@@ -265,6 +265,16 @@ public partial class SubscriptionsViewModel : ObservableObject
SubscriptionCount = 0;
}
/// <summary>
/// Re-hooks event handlers to the service after a server-side reconnect.
/// Safe to call when already attached: the handler is removed before it is re-added, so it is
/// never registered twice (a bare duplicate += would add a second invocation).
/// </summary>
public void Reattach()
{
_service.DataChanged -= OnDataChanged;
_service.DataChanged += OnDataChanged;
}
/// <summary>
/// Unhooks event handlers from the service.
/// </summary>
@@ -140,7 +140,10 @@ public partial class MainWindow : Window
protected override void OnClosing(WindowClosingEventArgs e)
{
if (DataContext is MainWindowViewModel vm)
{
vm.SaveSettings();
vm.Dispose();
}
base.OnClosing(e);
}
}
@@ -5,19 +5,27 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration;
/// <summary>
/// Used by <c>dotnet ef</c> at design time (migrations, scaffolding). Reads the connection string
/// from the <c>OTOPCUA_CONFIG_CONNECTION</c> environment variable, falling back to the local dev
/// container on <c>localhost:1433</c>.
/// from the <c>OTOPCUA_CONFIG_CONNECTION</c> environment variable.
/// </summary>
/// <remarks>
/// Set the variable before running migration commands, e.g.:
/// <code>
/// $env:OTOPCUA_CONFIG_CONNECTION = "Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;"
/// dotnet ef migrations add MyMigration --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration
/// </code>
/// No credential is embedded in source. Do not add a plaintext password as a fallback.
/// </remarks>
public sealed class DesignTimeDbContextFactory : IDesignTimeDbContextFactory<OtOpcUaConfigDbContext>
{
// Host-port 14330 avoids collision with the native MSSQL14 instance on 1433 (Galaxy "ZB" DB).
private const string DefaultConnectionString =
"Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;";
public OtOpcUaConfigDbContext CreateDbContext(string[] args)
{
var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION")
?? DefaultConnectionString;
var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION");
if (string.IsNullOrWhiteSpace(connection))
throw new InvalidOperationException(
"OTOPCUA_CONFIG_CONNECTION is not set. " +
"Export the variable before running 'dotnet ef' commands. Example: " +
"Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;");
var options = new DbContextOptionsBuilder<OtOpcUaConfigDbContext>()
.UseSqlServer(connection, sql => sql.MigrationsAssembly(typeof(OtOpcUaConfigDbContext).Assembly.FullName))
@@ -8,5 +8,11 @@ public enum NodeAclScopeKind
UnsArea,
UnsLine,
Equipment,
/// <summary>
/// A Galaxy (SystemPlatform-kind) folder segment anchored below a namespace.
/// Distinguishes folder grants from UNS <see cref="Equipment"/> grants in the
/// <c>AuthorizationDecision.Provenance</c> audit trail and Admin UI diagnostics.
/// </summary>
FolderSegment,
Tag,
}
@@ -48,7 +48,13 @@ public sealed class ResilientConfigReader
UseJitter = true,
Delay = TimeSpan.FromMilliseconds(100),
MaxDelay = TimeSpan.FromSeconds(1),
ShouldHandle = new PredicateBuilder().Handle<Exception>(ex => ex is not OperationCanceledException),
// Handle ALL exceptions including OperationCanceledException. A SQL command-level
// timeout surfaces as TaskCanceledException (derives from OperationCanceledException)
// when the caller's token is NOT cancelled, and must be retried just like any other
// transient error. Polly itself checks the cancellation token between retries and
// stops with OperationCanceledException on genuine caller cancellation regardless of
// this predicate.
ShouldHandle = new PredicateBuilder().Handle<Exception>(),
});
}
@@ -76,7 +82,11 @@ public sealed class ResilientConfigReader
_staleFlag.MarkFresh();
return result;
}
catch (Exception ex) when (ex is not OperationCanceledException)
// Catch all exceptions that are NOT genuine caller cancellations. A SQL command-level
// timeout surfaces as TaskCanceledException (derives from OperationCanceledException)
// but the caller's token is NOT cancelled — we must fall back to the sealed cache for
// that case, not propagate. Only rethrow if the caller actually requested cancellation.
catch (Exception ex) when (ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested)
{
_logger.LogWarning(ex, "Central-DB read failed after retries; falling back to sealed cache for cluster {ClusterId}", clusterId);
// GenerationCacheUnavailableException surfaces intentionally — fails the caller's
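The exception filter above can be read as a small predicate. The sketch below extracts it into a hypothetical `ShouldFallBack` helper (not a member of `ResilientConfigReader`) and exercises the three cases the comments describe.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Same condition as the catch filter: fall back unless the caller really cancelled.
bool ShouldFallBack(Exception ex, CancellationToken callerToken) =>
    ex is not OperationCanceledException || !callerToken.IsCancellationRequested;

using var cts = new CancellationTokenSource();

// Command-level timeout: a TaskCanceledException while the caller's token is still live.
var fallbackOnTimeout = ShouldFallBack(new TaskCanceledException("command timeout"), cts.Token);

cts.Cancel();

// Genuine caller cancellation: propagate instead of serving the cache.
var fallbackOnCallerCancel =
    ShouldFallBack(new OperationCanceledException(cts.Token), cts.Token);

// Any non-cancellation failure falls back regardless of the token state.
var fallbackOnOtherError = ShouldFallBack(new InvalidOperationException("db down"), cts.Token);

Console.WriteLine($"{fallbackOnTimeout} {fallbackOnCallerCancel} {fallbackOnOtherError}");
// prints "True False True"
```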
@@ -270,7 +270,22 @@ BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
BEGIN TRANSACTION;
-- Transaction-nesting awareness: if a caller (e.g. sp_RollbackToGeneration) already
-- holds a transaction, we use SAVE TRANSACTION so our failure path rolls back only to
-- the savepoint instead of issuing a bare ROLLBACK that wipes the caller's transaction
-- (which sets @@TRANCOUNT = 0 and causes error 3902 on the caller's subsequent COMMIT).
DECLARE @OwnsTxn bit = 0;
DECLARE @SaveName nvarchar(32) = N'sp_PublishGeneration';
IF @@TRANCOUNT = 0
BEGIN
BEGIN TRANSACTION;
SET @OwnsTxn = 1;
END
ELSE
BEGIN
SAVE TRANSACTION sp_PublishGeneration;
END
DECLARE @Lock nvarchar(255) = N'OtOpcUa_Publish_' + @ClusterId;
DECLARE @LockResult int;
@@ -278,20 +293,22 @@ BEGIN
IF @LockResult < 0
BEGIN
RAISERROR('PublishConflict: another publish is in progress for cluster %s', 16, 1, @ClusterId);
ROLLBACK;
IF @OwnsTxn = 1 ROLLBACK;
ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
RETURN;
END
-- sp_ValidateDraft signals every rejection with RAISERROR(..., 16, 1). A severity-16 error is
-- NOT batch-aborting and SET XACT_ABORT ON does not abort the transaction for it, so without a
-- TRY/CATCH control would return here and the draft would publish despite failed validation.
-- Catch the validation error, roll back the publish transaction, and re-raise so the caller sees
-- the real validation failure.
-- Catch the validation error, roll back the publish transaction (only to our savepoint when a
-- caller owns the outer transaction), and re-raise so the caller sees the real validation failure.
BEGIN TRY
EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK;
IF @OwnsTxn = 1 ROLLBACK;
ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
THROW;
END CATCH
@@ -324,15 +341,16 @@ BEGIN
IF @@ROWCOUNT = 0
BEGIN
RAISERROR('Draft %I64d for cluster %s not found (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId);
ROLLBACK;
RAISERROR('Draft %I64d for cluster %s not in Draft status (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId);
IF @OwnsTxn = 1 ROLLBACK;
ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
RETURN;
END
INSERT dbo.ConfigAuditLog (Principal, EventType, ClusterId, GenerationId)
VALUES (SUSER_SNAME(), 'Published', @ClusterId, @DraftGenerationId);
COMMIT;
IF @OwnsTxn = 1 COMMIT;
END
";
@@ -0,0 +1,120 @@
using Microsoft.EntityFrameworkCore.Migrations;
#nullable disable
namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
{
/// <summary>
/// Admin-008: adds <c>@ReleasedBy</c> parameter to
/// <c>dbo.sp_ReleaseExternalIdReservation</c> so the operator principal name (the LDAP
/// sign-in) is recorded in <c>ExternalIdReservation.ReleasedBy</c> and the
/// <c>ConfigAuditLog.Principal</c> column.
///
/// Prior to this migration the proc used <c>SUSER_SNAME()</c> for both columns, which
/// recorded the shared SQL service account rather than the Admin-UI operator who performed
/// the release — making the audit trail useless for attribution. The stored proc now
/// accepts <c>@ReleasedBy nvarchar(128)</c> and uses it for both columns; validation
/// rejects a null/empty value the same way <c>@ReleaseReason</c> is validated.
/// </summary>
/// <inheritdoc />
public partial class AddReleasedByToReleaseExternalIdReservation : Migration
{
/// <inheritdoc />
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.Sql(Procs.ReleaseExternalIdReservationV2);
}
/// <inheritdoc />
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.Sql(Procs.ReleaseExternalIdReservationV1);
}
private static class Procs
{
/// <summary>V2 — accepts <c>@ReleasedBy</c> for proper operator attribution.</summary>
public const string ReleaseExternalIdReservationV2 = @"
CREATE OR ALTER PROCEDURE dbo.sp_ReleaseExternalIdReservation
@Kind nvarchar(16),
@Value nvarchar(64),
@ReleaseReason nvarchar(512),
@ReleasedBy nvarchar(128)
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
IF @ReleaseReason IS NULL OR LEN(@ReleaseReason) = 0
BEGIN
RAISERROR('ReleaseReason is required', 16, 1);
RETURN;
END
IF @ReleasedBy IS NULL OR LEN(@ReleasedBy) = 0
BEGIN
RAISERROR('ReleasedBy is required', 16, 1);
RETURN;
END
UPDATE dbo.ExternalIdReservation
SET ReleasedAt = SYSUTCDATETIME(),
ReleasedBy = @ReleasedBy,
ReleaseReason = @ReleaseReason
WHERE Kind = @Kind AND Value = @Value AND ReleasedAt IS NULL;
IF @@ROWCOUNT = 0
BEGIN
RAISERROR('No active reservation found for (%s, %s)', 16, 1, @Kind, @Value);
RETURN;
END
-- Escape all caller-supplied values via STRING_ESCAPE so quotes/backslashes cannot break the
-- JSON document or inject additional structure into the audit record.
INSERT dbo.ConfigAuditLog (Principal, EventType, DetailsJson)
VALUES (@ReleasedBy, 'ExternalIdReleased',
CONCAT('{""kind"":""', STRING_ESCAPE(@Kind, 'json'),
'"",""value"":""', STRING_ESCAPE(@Value, 'json'), '""}'));
END
";
/// <summary>V1 — original proc (uses SUSER_SNAME() for attribution). Restored on Down().</summary>
public const string ReleaseExternalIdReservationV1 = @"
CREATE OR ALTER PROCEDURE dbo.sp_ReleaseExternalIdReservation
@Kind nvarchar(16),
@Value nvarchar(64),
@ReleaseReason nvarchar(512)
AS
BEGIN
SET NOCOUNT ON;
SET XACT_ABORT ON;
IF @ReleaseReason IS NULL OR LEN(@ReleaseReason) = 0
BEGIN
RAISERROR('ReleaseReason is required', 16, 1);
RETURN;
END
UPDATE dbo.ExternalIdReservation
SET ReleasedAt = SYSUTCDATETIME(),
ReleasedBy = SUSER_SNAME(),
ReleaseReason = @ReleaseReason
WHERE Kind = @Kind AND Value = @Value AND ReleasedAt IS NULL;
IF @@ROWCOUNT = 0
BEGIN
RAISERROR('No active reservation found for (%s, %s)', 16, 1, @Kind, @Value);
RETURN;
END
-- Escape both caller-supplied values via STRING_ESCAPE so quotes/backslashes cannot break the
-- JSON document or inject additional structure into the audit record.
INSERT dbo.ConfigAuditLog (Principal, EventType, DetailsJson)
VALUES (SUSER_SNAME(), 'ExternalIdReleased',
CONCAT('{""kind"":""', STRING_ESCAPE(@Kind, 'json'),
'"",""value"":""', STRING_ESCAPE(@Value, 'json'), '""}'));
END
";
}
}
}
@@ -11,6 +11,18 @@ public sealed class DraftSnapshot
public required long GenerationId { get; init; }
public required string ClusterId { get; init; }
/// <summary>
/// Cluster's Enterprise segment (UNS level 1). When set, <see cref="DraftValidator"/> uses
/// the actual length for path-length checks instead of a conservative 32-char upper bound.
/// </summary>
public string? Enterprise { get; init; }
/// <summary>
/// Cluster's Site segment (UNS level 2). When set, <see cref="DraftValidator"/> uses the
/// actual length for path-length checks instead of a conservative 32-char upper bound.
/// </summary>
public string? Site { get; init; }
public IReadOnlyList<Namespace> Namespaces { get; init; } = [];
public IReadOnlyList<DriverInstance> DriverInstances { get; init; } = [];
public IReadOnlyList<Device> Devices { get; init; } = [];
@@ -59,8 +59,13 @@ public static class DraftValidator
/// <summary>Cluster.Enterprise + Site + area + line + equipment + 4 slashes ≤ 200 chars.</summary>
private static void ValidatePathLength(DraftSnapshot draft, List<ValidationError> errors)
{
// The cluster row isn't in the snapshot — we assume caller pre-validated Enterprise+Site
// length and bound them as constants <= 64 chars each. Here we validate the dynamic portion.
// Use actual Enterprise/Site lengths when the snapshot carries them (populated by
// DraftValidationService from the ServerCluster row). Fall back to a conservative
// 32-char upper bound per segment when not supplied — over-penalises short values
// but never under-penalises long ones, which is acceptable for the fallback case.
var enterpriseLen = draft.Enterprise?.Length ?? 32;
var siteLen = draft.Site?.Length ?? 32;
var areaById = draft.UnsAreas.ToDictionary(a => a.UnsAreaId);
var lineById = draft.UnsLines.ToDictionary(l => l.UnsLineId);
@@ -69,8 +74,7 @@ public static class DraftValidator
if (!lineById.TryGetValue(eq.UnsLineId!, out var line)) continue;
if (!areaById.TryGetValue(line.UnsAreaId, out var area)) continue;
// rough upper bound: Enterprise+Site at most 32+32; add dynamic segments + 4 slashes
var len = 32 + 32 + area.Name.Length + line.Name.Length + eq.Name.Length + 4;
var len = enterpriseLen + siteLen + area.Name.Length + line.Name.Length + eq.Name.Length + 4;
if (len > MaxPathLength)
errors.Add(new("PathTooLong",
$"Equipment path exceeds {MaxPathLength} chars (approx {len})",
@@ -1,3 +1,4 @@
using System.Collections;
using System.Collections.Concurrent;
namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -61,7 +62,7 @@ public sealed class PollGroupEngine : IAsyncDisposable
var handle = new PollSubscriptionHandle(id);
var state = new SubscriptionState(handle, [.. fullReferences], interval, cts);
_subscriptions[id] = state;
_ = Task.Run(() => PollLoopAsync(state, cts.Token), cts.Token);
state.LoopTask = Task.Run(() => PollLoopAsync(state, cts.Token));
return handle;
}
@@ -71,13 +72,27 @@ public sealed class PollGroupEngine : IAsyncDisposable
{
if (handle is PollSubscriptionHandle h && _subscriptions.TryRemove(h.Id, out var state))
{
try { state.Cts.Cancel(); } catch { }
state.Cts.Dispose();
StopState(state);
return true;
}
return false;
}
private static void StopState(SubscriptionState state)
{
try { state.Cts.Cancel(); } catch { }
// Block on the loop task (for up to 5 seconds) before disposing the CTS so:
// (a) no _onChange callback fires after the caller considers the engine torn down, and
// (b) the CTS is not disposed while Task.Delay is still holding a reference to its token,
// which can turn OperationCanceledException into ObjectDisposedException.
var task = state.LoopTask;
if (task is not null)
{
try { task.Wait(TimeSpan.FromSeconds(5)); } catch { }
}
state.Cts.Dispose();
}
/// <summary>Snapshot of active subscription count — exposed for driver diagnostics.</summary>
public int ActiveSubscriptionCount => _subscriptions.Count;
@@ -103,13 +118,22 @@ public sealed class PollGroupEngine : IAsyncDisposable
private async Task PollOnceAsync(SubscriptionState state, bool forceRaise, CancellationToken ct)
{
var snapshots = await _reader(state.TagReferences, ct).ConfigureAwait(false);
// Core.Abstractions-002: validate the reader contract before indexing. A reader that
// returns fewer snapshots than references would silently stall the subscription; surface
// the violation immediately with a descriptive exception instead.
if (snapshots.Count != state.TagReferences.Count)
throw new InvalidOperationException(
$"Reader contract violation: expected {state.TagReferences.Count} snapshots but received {snapshots.Count}. " +
"The reader delegate must return one snapshot per input reference in input order.");
for (var i = 0; i < state.TagReferences.Count; i++)
{
var tagRef = state.TagReferences[i];
var current = snapshots[i];
var lastSeen = state.LastValues.TryGetValue(tagRef, out var prev) ? prev : default;
if (forceRaise || !Equals(lastSeen?.Value, current.Value) || lastSeen?.StatusCode != current.StatusCode)
if (forceRaise || ValuesAreDifferent(lastSeen?.Value, current.Value) || lastSeen?.StatusCode != current.StatusCode)
{
state.LastValues[tagRef] = current;
_onChange(state.Handle, tagRef, current);
@@ -117,16 +141,44 @@ public sealed class PollGroupEngine : IAsyncDisposable
}
}
/// <summary>Cancel every active subscription. Idempotent.</summary>
public ValueTask DisposeAsync()
/// <summary>
/// Returns <c>true</c> when <paramref name="previous"/> and <paramref name="current"/>
/// represent different values. Array values are compared structurally
/// (element-by-element) so that a driver producing a fresh array instance on every poll
/// does not trigger spurious change events when the contents are identical.
/// </summary>
private static bool ValuesAreDifferent(object? previous, object? current)
{
if (previous is Array prevArr && current is Array currArr)
return !StructuralComparisons.StructuralEqualityComparer.Equals(prevArr, currArr);
return !Equals(previous, current);
}
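To illustrate why the array branch matters, the sketch below copies the comparison into a local function and feeds it fresh array instances. The sample arrays are made up for illustration; the comparison itself uses only `System.Collections.StructuralComparisons`.

```csharp
using System;
using System.Collections;

// Local copy of the comparison shown above, so the sketch runs on its own.
bool ValuesAreDifferent(object previous, object current)
{
    if (previous is Array prevArr && current is Array currArr)
        return !StructuralComparisons.StructuralEqualityComparer.Equals(prevArr, currArr);
    return !object.Equals(previous, current);
}

object first = new[] { 1, 2, 3 };
object second = new[] { 1, 2, 3 };   // fresh instance, same contents
object third = new[] { 1, 2, 4 };    // one element differs

var referenceEqual = object.Equals(first, second);      // arrays compare by reference
var sameContents = !ValuesAreDifferent(first, second);  // structural comparison
var changed = ValuesAreDifferent(first, third);

Console.WriteLine($"referenceEqual={referenceEqual}, sameContents={sameContents}, changed={changed}");
// prints "referenceEqual=False, sameContents=True, changed=True"
```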
/// <summary>Cancel every active subscription and await all loop tasks. Idempotent.</summary>
public async ValueTask DisposeAsync()
{
// Cancel all loops first so they can all start winding down in parallel.
foreach (var state in _subscriptions.Values)
{
try { state.Cts.Cancel(); } catch { }
}
// Await every loop task before disposing CTSs, ensuring no callback fires after disposal.
var waitTasks = _subscriptions.Values
.Select(s => s.LoopTask ?? Task.CompletedTask)
.ToArray();
if (waitTasks.Length > 0)
{
try { await Task.WhenAll(waitTasks).WaitAsync(TimeSpan.FromSeconds(5)).ConfigureAwait(false); }
catch { }
}
foreach (var state in _subscriptions.Values)
{
state.Cts.Dispose();
}
_subscriptions.Clear();
return ValueTask.CompletedTask;
}
private sealed record SubscriptionState(
@@ -137,6 +189,14 @@ public sealed class PollGroupEngine : IAsyncDisposable
{
public ConcurrentDictionary<string, DataValueSnapshot> LastValues { get; }
= new(StringComparer.OrdinalIgnoreCase);
/// <summary>
/// The background poll-loop task. Assigned immediately after creation in
/// <see cref="Subscribe"/>; awaited during <see cref="Unsubscribe"/> /
/// <see cref="DisposeAsync"/> so disposal is deterministic and no
/// <c>_onChange</c> callback can fire after the caller tears down the subscription.
/// </summary>
public Task? LoopTask { get; set; }
}
private sealed record PollSubscriptionHandle(long Id) : ISubscriptionHandle
@@ -17,7 +17,8 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
/// Which state transition this event represents — "Activated" / "Cleared" /
/// "Acknowledged" / "Confirmed" / "Shelved" / "Unshelved" / "Disabled" / "Enabled" /
/// "CommentAdded". Free-form string because different alarm sources use different
/// vocabularies; the Galaxy.Host handler maps to the historian's enum on the wire.
/// vocabularies; the Wonderware historian sidecar (<c>WonderwareHistorianClient</c>)
/// maps to the historian's enum on the wire.
/// </param>
/// <param name="Message">Fully-rendered message text — template tokens already resolved upstream.</param>
/// <param name="User">Operator who triggered the transition. "system" for engine-driven events (shelving expiry, predicate change).</param>
@@ -2,9 +2,9 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
/// <summary>
/// The historian sink contract — where qualifying alarm events land. Phase 7 plan
/// decision #17: ingestion routes through Galaxy.Host's pipe so we reuse the
/// already-loaded <c>aahClientManaged</c> DLLs without loading 32-bit native code
/// in the main .NET 10 server. Tests use an in-memory fake; production uses
/// decision #17: ingestion routes through the Wonderware historian sidecar
/// (<c>WonderwareHistorianClient</c>), which owns the <c>aahClientManaged</c> DLLs
/// and 32-bit constraints. Tests use an in-memory fake; production uses
/// <see cref="SqliteStoreAndForwardSink"/>.
/// </summary>
/// <remarks>
@@ -45,13 +45,25 @@ public sealed class NullAlarmHistorianSink : IAlarmHistorianSink
}
/// <summary>Diagnostic snapshot surfaced to the Admin UI + /healthz endpoints.</summary>
/// <param name="QueueDepth">Non-dead-lettered rows waiting to be drained to the historian.</param>
/// <param name="DeadLetterDepth">Rows that have been permanently failed or have corrupt payloads; retained until the retention window expires.</param>
/// <param name="LastDrainUtc">UTC timestamp of the most recent drain attempt, or <c>null</c> if no drain has run yet.</param>
/// <param name="LastSuccessUtc">UTC timestamp of the most recent drain tick that acknowledged at least one row, or <c>null</c> if none.</param>
/// <param name="LastError">Message from the most recent writer exception or cardinality violation, cleared on the next successful batch.</param>
/// <param name="DrainState">Current state of the drain worker.</param>
/// <param name="EvictedCount">
/// Lifetime count of non-dead-lettered rows discarded because the queue reached
/// its configured capacity. Non-zero values indicate that accepted alarm events
/// were dropped before reaching the historian — operator attention required.
/// </param>
public sealed record HistorianSinkStatus(
long QueueDepth,
long DeadLetterDepth,
DateTime? LastDrainUtc,
DateTime? LastSuccessUtc,
string? LastError,
HistorianDrainState DrainState);
HistorianDrainState DrainState,
long EvictedCount = 0);
/// <summary>Where the drain worker is in its state machine.</summary>
public enum HistorianDrainState
@@ -62,7 +74,7 @@ public enum HistorianDrainState
BackingOff,
}
/// <summary>Signaled by the Galaxy.Host-side handler when it fails a batch — drain worker uses this to decide retry cadence.</summary>
/// <summary>Returned by the Wonderware historian sidecar per event — drain worker uses this to decide retry cadence.</summary>
public enum HistorianWriteOutcome
{
/// <summary>Successfully persisted to the historian. Remove from queue.</summary>
@@ -73,7 +85,7 @@ public enum HistorianWriteOutcome
PermanentFail,
}
/// <summary>What the drain worker delegates writes to — Stream G wires this to the Galaxy.Host IPC client.</summary>
/// <summary>What the drain worker delegates writes to — production is <c>WonderwareHistorianClient</c> (the Wonderware historian sidecar).</summary>
public interface IAlarmHistorianWriter
{
/// <summary>Push a batch of events to the historian. Returns one outcome per event, same order.</summary>
@@ -6,9 +6,10 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
/// <summary>
/// Phase 7 plan decisions #16 and #17 implementation: durable SQLite queue on the node
/// absorbs every qualifying alarm event, a drain worker batches rows to Galaxy.Host
/// via <see cref="IAlarmHistorianWriter"/> on an exponential-backoff cadence, and
/// operator acks never block on the historian being reachable.
/// absorbs every qualifying alarm event, a drain worker batches rows to the
/// Wonderware historian sidecar via <see cref="IAlarmHistorianWriter"/> on an
/// exponential-backoff cadence, and operator acks never block on the historian
/// being reachable.
/// </summary>
/// <remarks>
/// <para>
@@ -28,7 +29,11 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
/// Dead-lettered rows stay in place for the configured retention window (default
/// 30 days per Phase 7 plan decision #21) so operators can inspect + manually
/// retry before the sweeper purges them. Regular queue capacity is bounded —
/// overflow evicts the oldest non-dead-lettered rows with a WARN log.
/// overflow evicts the oldest non-dead-lettered rows with a WARN log. The
/// durability guarantee is therefore bounded by <see cref="DefaultCapacity"/>:
/// under a sustained historian outage, accepted events may be evicted before
/// delivery. The <see cref="HistorianSinkStatus.EvictedCount"/> counter makes
/// overflow visible to operators without requiring the WARN log to be scraped.
/// </para>
/// <para>
/// Drain runs on a self-rescheduling one-shot <see cref="System.Threading.Timer"/>.
@@ -67,11 +72,20 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
private Timer? _drainTimer;
private TimeSpan _tickInterval;
private int _backoffIndex;
private bool _disposed;
// Core.AlarmHistorian-005: status fields written by the drain timer thread and
// read concurrently by GetStatus() / health-check threads. Guard all reads and
// writes with this lock so the Admin UI never observes a torn or stale value.
private readonly object _statusLock = new();
private DateTime? _lastDrainUtc;
private DateTime? _lastSuccessUtc;
private string? _lastError;
private HistorianDrainState _drainState = HistorianDrainState.Idle;
private bool _disposed;
// Core.AlarmHistorian-009: lifetime counter of rows evicted due to capacity overflow.
// Surfaces in HistorianSinkStatus so operators can see data-loss events without
// having to scrape the WARN log.
private long _evictedCount;
public SqliteStoreAndForwardSink(
string databasePath,
@@ -113,10 +127,24 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
{
var conn = new SqliteConnection(_connectionString);
conn.Open();
ApplyPragmas(conn);
return conn;
}
/// <summary>Apply busy_timeout + WAL pragmas to an already-open connection (sync).</summary>
private static void ApplyPragmas(SqliteConnection conn)
{
using var pragma = conn.CreateCommand();
pragma.CommandText = "PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL;";
pragma.ExecuteNonQuery();
return conn;
}
/// <summary>Apply busy_timeout + WAL pragmas to an already-open connection (async).</summary>
private static async Task ApplyPragmasAsync(SqliteConnection conn, CancellationToken ct)
{
using var pragma = conn.CreateCommand();
pragma.CommandText = "PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL;";
await pragma.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
}
/// <summary>
@@ -153,8 +181,11 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
// Without this catch the fault would be an unobserved exception on an
// async-void timer callback — never logged, never surfaced. Record it
// so the Admin UI / health check sees the stalled drain.
_lastError = ex.Message;
_drainState = HistorianDrainState.BackingOff;
lock (_statusLock)
{
_lastError = ex.Message;
_drainState = HistorianDrainState.BackingOff;
}
_logger.Error(ex, "Historian drain tick faulted; will retry on next tick");
}
finally
@@ -167,23 +198,32 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
private void RescheduleDrain()
{
if (_disposed) return;
HistorianDrainState state;
lock (_statusLock) { state = _drainState; }
// While backing off, wait out the full ladder delay; otherwise the steady
// tick cadence. Never faster than tickInterval.
var delay = _drainState == HistorianDrainState.BackingOff
var delay = state == HistorianDrainState.BackingOff
? (CurrentBackoff > _tickInterval ? CurrentBackoff : _tickInterval)
: _tickInterval;
try { _drainTimer?.Change(delay, Timeout.InfiniteTimeSpan); }
catch (ObjectDisposedException) { /* raced with Dispose — nothing to re-arm */ }
}
public Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken)
// Core.AlarmHistorian-003: use the async SQLite APIs and honor the
// cancellationToken throughout. Note that Microsoft.Data.Sqlite's async surface
// (OpenAsync / ExecuteNonQueryAsync) executes synchronously because SQLite has no
// asynchronous I/O, so a file-lock wait or disk write still blocks the calling
// thread. Callers that must not block should offload the call themselves.
public async Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken)
{
if (evt is null) throw new ArgumentNullException(nameof(evt));
if (_disposed) throw new ObjectDisposedException(nameof(SqliteStoreAndForwardSink));
using var conn = OpenConnection();
using var conn = new SqliteConnection(_connectionString);
await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
await ApplyPragmasAsync(conn, cancellationToken).ConfigureAwait(false);
EnforceCapacity(conn);
await EnforceCapacityAsync(conn, cancellationToken).ConfigureAwait(false);
using var cmd = conn.CreateCommand();
cmd.CommandText = """
@@ -193,8 +233,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
cmd.Parameters.AddWithValue("$alarmId", evt.AlarmId);
cmd.Parameters.AddWithValue("$enqueued", _clock().ToString("O"));
cmd.Parameters.AddWithValue("$payload", JsonSerializer.Serialize(evt));
cmd.ExecuteNonQuery();
return Task.CompletedTask;
await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
}
/// <summary>
@@ -209,14 +248,17 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
if (!await _drainGate.WaitAsync(0, ct).ConfigureAwait(false)) return;
try
{
_drainState = HistorianDrainState.Draining;
_lastDrainUtc = _clock();
lock (_statusLock)
{
_drainState = HistorianDrainState.Draining;
_lastDrainUtc = _clock();
}
PurgeAgedDeadLetters();
var batch = ReadBatch();
if (batch.Count == 0)
{
_drainState = HistorianDrainState.Idle;
lock (_statusLock) { _drainState = HistorianDrainState.Idle; }
return;
}
@@ -241,7 +283,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
if (events.Count == 0)
{
_drainState = HistorianDrainState.Idle;
lock (_statusLock) { _drainState = HistorianDrainState.Idle; }
return;
}
@@ -249,7 +291,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
try
{
outcomes = await _writer.WriteBatchAsync(events, ct).ConfigureAwait(false);
_lastError = null;
lock (_statusLock) { _lastError = null; }
}
catch (OperationCanceledException)
{
@@ -258,16 +300,35 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
catch (Exception ex)
{
// Writer-side exception — treat entire batch as RetryPlease.
_lastError = ex.Message;
lock (_statusLock)
{
_lastError = ex.Message;
_drainState = HistorianDrainState.BackingOff;
}
_logger.Warning(ex, "Historian writer threw on batch of {Count}; deferring retry", events.Count);
BumpBackoff();
_drainState = HistorianDrainState.BackingOff;
return;
}
// Core.AlarmHistorian-007: a cardinality mismatch is a writer contract
// violation, and the events may already have been persisted. Rather than
// throwing (which, before the -006 fix, was swallowed and left _drainState
// stale), treat it as a transient batch failure so the rows stay queued
// and the backoff state becomes visible to the operator. A deterministic
// mismatch will stall the rows until an operator intervenes or the writer
// is fixed, which is far safer than re-throwing into a fire-and-forget timer.
if (outcomes.Count != events.Count)
throw new InvalidOperationException(
$"Writer returned {outcomes.Count} outcomes for {events.Count} events — expected 1:1");
{
var msg = $"Writer returned {outcomes.Count} outcomes for {events.Count} events — expected 1:1; treating as batch retry";
lock (_statusLock)
{
_lastError = msg;
_drainState = HistorianDrainState.BackingOff;
}
_logger.Warning("Historian writer contract violation: {Msg}", msg);
BumpBackoff();
return;
}
using var conn = OpenConnection();
using var tx = conn.BeginTransaction();
@@ -291,18 +352,20 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
tx.Commit();
var acks = outcomes.Count(o => o == HistorianWriteOutcome.Ack);
if (acks > 0) _lastSuccessUtc = _clock();
lock (_statusLock)
{
if (acks > 0) _lastSuccessUtc = _clock();
if (outcomes.Any(o => o == HistorianWriteOutcome.RetryPlease))
_drainState = HistorianDrainState.BackingOff;
else
_drainState = HistorianDrainState.Idle;
}
if (outcomes.Any(o => o == HistorianWriteOutcome.RetryPlease))
{
BumpBackoff();
_drainState = HistorianDrainState.BackingOff;
}
else
{
ResetBackoff();
_drainState = HistorianDrainState.Idle;
}
}
finally
{
@@ -327,13 +390,29 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
deadlettered = (long)(cmd.ExecuteScalar() ?? 0L);
}
// Core.AlarmHistorian-005: snapshot status fields atomically under the lock
// so the Admin UI never sees a torn DateTime? or stale DrainState.
DateTime? lastDrain, lastSuccess;
string? lastError;
HistorianDrainState drainState;
long evicted;
lock (_statusLock)
{
lastDrain = _lastDrainUtc;
lastSuccess = _lastSuccessUtc;
lastError = _lastError;
drainState = _drainState;
evicted = _evictedCount;
}
return new HistorianSinkStatus(
QueueDepth: queued,
DeadLetterDepth: deadlettered,
LastDrainUtc: _lastDrainUtc,
LastSuccessUtc: _lastSuccessUtc,
LastError: _lastError,
DrainState: _drainState);
LastDrainUtc: lastDrain,
LastSuccessUtc: lastSuccess,
LastError: lastError,
DrainState: drainState,
EvictedCount: evicted);
}
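The snapshot-under-lock pattern used by GetStatus above can be sketched with two related fields that must always be read as a pair. The field names and the invariant (an error is present exactly while backing off) are assumptions chosen for illustration, not the sink's actual rules.

```csharp
using System;
using System.Threading.Tasks;

var statusLock = new object();
string? lastError = null;   // hypothetical status field
var backingOff = false;     // hypothetical status field, always written together with lastError

// Copy both fields under the lock so a reader never sees one updated without the other.
(string? Error, bool BackingOff) Snapshot()
{
    lock (statusLock) { return (lastError, backingOff); }
}

var writer = Task.Run(() =>
{
    for (var i = 0; i < 100_000; i++)
    {
        lock (statusLock)
        {
            if (i % 2 == 0) { lastError = "writer threw"; backingOff = true; }
            else { lastError = null; backingOff = false; }
        }
    }
});

var violations = 0;
for (var i = 0; i < 100_000; i++)
{
    var (err, off) = Snapshot();
    if ((err is not null) != off) violations++;   // a torn pair would show up here
}
writer.Wait();
Console.WriteLine($"violations={violations}");
```

Because writers and the snapshot take the same lock, the reader can only observe pairs that a writer published together.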
/// <summary>Operator action from Admin UI — retry every dead-lettered row. Non-cascading: they rejoin the regular queue + get a fresh backoff.</summary>
@@ -449,9 +528,44 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
cmd.Parameters.AddWithValue("$n", toEvict);
cmd.ExecuteNonQuery();
}
// Core.AlarmHistorian-009: increment the lifetime eviction counter so the
// Admin UI / health check can report overflow without requiring log scraping.
lock (_statusLock) { _evictedCount += toEvict; }
_logger.Warning(
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room",
_capacity, toEvict);
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
_capacity, toEvict, _evictedCount);
}
// Async variant used by EnqueueAsync (Core.AlarmHistorian-003).
private async Task EnforceCapacityAsync(SqliteConnection conn, CancellationToken ct)
{
long count;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
count = (long)(await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false) ?? 0L);
}
if (count < _capacity) return;
var toEvict = count - _capacity + 1;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = """
DELETE FROM Queue
WHERE RowId IN (
SELECT RowId FROM Queue
WHERE DeadLettered = 0
ORDER BY RowId ASC
LIMIT $n
)
""";
cmd.Parameters.AddWithValue("$n", toEvict);
await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
}
lock (_statusLock) { _evictedCount += toEvict; }
_logger.Warning(
"Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
_capacity, toEvict, _evictedCount);
}
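The eviction arithmetic above (`count - _capacity + 1`) frees exactly enough room for the row about to be inserted. A small in-memory sketch, assuming a hypothetical capacity of 3 and a `Queue<int>` in place of the SQLite table:

```csharp
using System;
using System.Collections.Generic;

const int capacity = 3;          // hypothetical capacity for illustration
var queue = new Queue<int>();
long evictedCount = 0;

void Enqueue(int item)
{
    if (queue.Count >= capacity)
    {
        // Evict the oldest rows so that one slot is free for the new row.
        var toEvict = queue.Count - capacity + 1;
        for (var i = 0; i < toEvict; i++) queue.Dequeue();
        evictedCount += toEvict;
    }
    queue.Enqueue(item);
}

for (var i = 1; i <= 5; i++) Enqueue(i);

Console.WriteLine($"{string.Join(",", queue)} evicted={evictedCount}");
// prints "3,4,5 evicted=2"
```

The queue never exceeds the capacity after an insert, and the lifetime counter records how many accepted items were dropped.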
private void PurgeAgedDeadLetters()
@@ -143,12 +143,17 @@ public sealed class ScriptedAlarmEngine : IDisposable
+ string.Join("\n ", compileFailures));
}
// Seed the value cache with current upstream values + subscribe for changes.
// Seed the value cache with current tag values before subscribing. The
// ReadTag calls happen first so that the initial predicate evaluation below
// (startup recovery, decision #14) uses a consistent snapshot.
// Subscriptions are established AFTER _loaded = true so that any synchronous
// initial-push an ITagUpstreamSource delivers from inside SubscribeTag arrives
// when _alarms is fully initialised. Before _loaded = true, a synchronous push
// would race the in-progress state restore and could overwrite the carefully
// seeded cache with a push that has no defined ordering relative to ReadTag.
// (Core.ScriptedAlarms-004)
foreach (var path in _alarmsReferencing.Keys)
{
_valueCache[path] = _upstream.ReadTag(path);
_upstreamSubscriptions.Add(_upstream.SubscribeTag(path, OnUpstreamChange));
}
// Restore persisted state, falling back to Fresh where nothing was saved,
// then re-derive ActiveState from the current predicate per decision #14.
@@ -163,8 +168,21 @@ public sealed class ScriptedAlarmEngine : IDisposable
}
_loaded = true;
// Subscribe after _loaded = true and full state restore. If an upstream
// implementation pushes its initial value synchronously from inside
// SubscribeTag, OnUpstreamChange will queue a ReevaluateAsync that acquires
// _evalGate — it will correctly block until LoadAsync releases the gate, then
// re-evaluate against the fully-populated _alarms dict.
foreach (var path in _alarmsReferencing.Keys)
_upstreamSubscriptions.Add(_upstream.SubscribeTag(path, OnUpstreamChange));
_engineLogger.Information("ScriptedAlarmEngine loaded {Count} alarm(s)", _alarms.Count);
// Dispose any previously-created timer before reassigning; a second LoadAsync
// call without this would leave two timers firing against the same engine.
// (Core.ScriptedAlarms-002)
_shelvingTimer?.Dispose();
// Start the shelving-check timer — ticks every 5s, expires any timed shelves
// that have passed their UnshelveAtUtc.
_shelvingTimer = new Timer(_ => RunShelvingCheck(),
@@ -220,8 +238,12 @@ public sealed class ScriptedAlarmEngine : IDisposable
try
{
var result = op(state.Condition);
_alarms[alarmId] = state with { Condition = result.State };
// Persist BEFORE updating in-memory so a store failure leaves both
// in-memory and persisted at the prior state rather than diverging.
// If SaveAsync throws the in-memory _alarms entry stays unchanged and
// the exception propagates to the caller. (Core.ScriptedAlarms-007)
await _store.SaveAsync(result.State, ct).ConfigureAwait(false);
_alarms[alarmId] = state with { Condition = result.State };
if (result.Emission != EmissionKind.None) EmitEvent(state, result.State, result.Emission);
}
finally { _evalGate.Release(); }
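The persist-before-update ordering above can be sketched with two dictionaries standing in for the in-memory alarm map and the state store. `SaveAsync`, `ApplyAsync`, and the `storeHealthy` switch are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var memory = new Dictionary<string, string> { ["alarm-1"] = "Active" };
var persisted = new Dictionary<string, string> { ["alarm-1"] = "Active" };
var storeHealthy = false;   // simulate a store outage first

Task SaveAsync(string id, string state)
{
    if (!storeHealthy)
        return Task.FromException(new InvalidOperationException("store unavailable"));
    persisted[id] = state;
    return Task.CompletedTask;
}

async Task ApplyAsync(string id, string newState)
{
    // Persist first: if this throws, the in-memory entry below is never touched.
    await SaveAsync(id, newState).ConfigureAwait(false);
    memory[id] = newState;
}

var failed = false;
try { await ApplyAsync("alarm-1", "Acknowledged"); }
catch (InvalidOperationException) { failed = true; }
var afterFailure = (memory["alarm-1"], persisted["alarm-1"]);

storeHealthy = true;
await ApplyAsync("alarm-1", "Acknowledged");
var afterSuccess = (memory["alarm-1"], persisted["alarm-1"]);

Console.WriteLine($"failed={failed}, afterFailure={afterFailure}, afterSuccess={afterSuccess}");
```

With the opposite order, the failed save would leave the in-memory map at "Acknowledged" while the store still held "Active".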
@@ -248,6 +270,12 @@ public sealed class ScriptedAlarmEngine : IDisposable
await _evalGate.WaitAsync(ct).ConfigureAwait(false);
try
{
// Re-check after acquiring the gate: a Dispose() call may have
// completed between our _evalGate.WaitAsync and here. Writing to a
// disposing store or mutating _alarms after clear is unsafe.
// (Core.ScriptedAlarms-005)
if (_disposed) return;
foreach (var id in alarmIds)
{
if (!_alarms.TryGetValue(id, out var state)) continue;
@@ -255,8 +283,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
state, state.Condition, _clock(), ct).ConfigureAwait(false);
if (!ReferenceEquals(newState, state.Condition))
{
_alarms[id] = state with { Condition = newState };
// Persist before updating in-memory so a store failure leaves
// both sides at the prior state. (Core.ScriptedAlarms-007)
await _store.SaveAsync(newState, ct).ConfigureAwait(false);
_alarms[id] = state with { Condition = newState };
}
}
}
@@ -377,6 +407,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
_ = ShelvingCheckAsync(ids, CancellationToken.None);
}
/// <summary>
/// Test hook — triggers a shelving check synchronously without waiting for
/// the 5-second timer. Allows tests that inject a controllable clock to advance
/// time and immediately drive timed-shelve expiry. (Core.ScriptedAlarms-012)
/// </summary>
internal void RunShelvingCheckForTest() => RunShelvingCheck();
private async Task ShelvingCheckAsync(IReadOnlyList<string> alarmIds, CancellationToken ct)
{
try
@@ -384,6 +421,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
await _evalGate.WaitAsync(ct).ConfigureAwait(false);
try
{
// Re-check after acquiring the gate: Timer.Dispose() does not wait for
// running callbacks, so a shelving-check callback that passed the _disposed
// check in RunShelvingCheck can arrive here after Dispose() has returned.
// Mutating _alarms or saving to a disposed store here is unsafe.
// (Core.ScriptedAlarms-005)
if (_disposed) return;
var now = _clock();
foreach (var id in alarmIds)
{
@@ -391,8 +435,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
var result = Part9StateMachine.ApplyShelvingCheck(state.Condition, now);
if (!ReferenceEquals(result.State, state.Condition))
{
_alarms[id] = state with { Condition = result.State };
// Persist before updating in-memory so a store failure leaves
// both sides at the prior state. (Core.ScriptedAlarms-007)
await _store.SaveAsync(result.State, ct).ConfigureAwait(false);
_alarms[id] = state with { Condition = result.State };
if (result.Emission != EmissionKind.None)
EmitEvent(state, result.State, result.Emission);
}
@@ -427,7 +473,11 @@ public sealed class ScriptedAlarmEngine : IDisposable
_disposed = true;
_shelvingTimer?.Dispose();
UnsubscribeFromUpstream();
_alarms.Clear();
// Do NOT clear _alarms here: Timer.Dispose() does not wait for in-flight callbacks,
// so a ShelvingCheckAsync or ReevaluateAsync can still be running inside _evalGate.
// Those paths now re-check _disposed after acquiring the gate and bail out safely.
// Clearing _alarms outside the gate would race concurrent reads and is unnecessary
// (the whole object is being discarded). (Core.ScriptedAlarms-005)
_alarmsReferencing.Clear();
}
@@ -21,11 +21,15 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// token.
/// </para>
/// <para>
/// Identifier matching is by spelling: the extractor looks for
/// <c>ctx.GetTag(...)</c> / <c>ctx.SetVirtualTag(...)</c> literally. A deliberately
/// misspelled method call (<c>ctx.GetTagz</c>) is not picked up but will also fail
/// to compile against <see cref="ScriptContext"/>, so there's no way to smuggle a
/// dependency past the extractor while still having a working script.
/// Matching is by spelling: the extractor looks for member-access invocations
/// whose receiver identifier is literally <c>ctx</c> and whose method name is
/// <c>GetTag</c> or <c>SetVirtualTag</c>. A deliberately misspelled method call
/// (<c>ctx.GetTagz</c>) is not picked up but will also fail to compile against
/// <see cref="ScriptContext"/>, so there is no way to smuggle a dependency past the
/// extractor while still having a working script. Calls with the same method name on
/// a different receiver (<c>other.GetTag("X")</c>) are explicitly ignored so that
/// scripts defining local helper types with matching names do not produce spurious
/// dependencies. (Core.Scripting-004.)
/// </para>
/// </remarks>
public static class DependencyExtractor
@@ -67,10 +71,15 @@ public static class DependencyExtractor
public override void VisitInvocationExpression(InvocationExpressionSyntax node)
{
// Only interested in member-access form: ctx.GetTag(...) / ctx.SetVirtualTag(...).
// Anything else (free functions, chained calls, static calls) is ignored — but
// still visit children in case a ctx.GetTag call is nested inside.
if (node.Expression is MemberAccessExpressionSyntax member)
// Only interested in ctx.GetTag(...) / ctx.SetVirtualTag(...) — member-access
// form where the receiver is the identifier "ctx" (the ScriptGlobals<T>.ctx
// field). Calls with the same method name on a different receiver (e.g.
// someHelper.GetTag("X")) are ignored — not picking them up avoids spurious
// dependencies when scripts define local types with matching method names.
// (Core.Scripting-004.)
if (node.Expression is MemberAccessExpressionSyntax member
&& member.Expression is IdentifierNameSyntax receiver
&& receiver.Identifier.ValueText == "ctx")
{
var methodName = member.Name.Identifier.ValueText;
if (methodName is nameof(ScriptContext.GetTag) or nameof(ScriptContext.SetVirtualTag))
@@ -18,9 +18,12 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
/// <remarks>
/// <para>
/// Deny-list is the authoritative Phase 7 plan decision #6 set:
/// <c>System.IO</c>, <c>System.Net</c>, <c>System.Diagnostics.Process</c>,
/// <c>System.IO</c>, <c>System.Net</c>, <c>System.Diagnostics</c>,
/// <c>System.Reflection</c>, <c>System.Threading.Thread</c>,
/// <c>System.Runtime.InteropServices</c>.
/// <c>System.Threading.Tasks</c> (scripts are synchronous predicates — no
/// legitimate need to start background tasks; a <c>Task.Run</c> fan-out outlives
/// the evaluation timeout entirely), <c>System.Runtime.InteropServices</c>,
/// <c>Microsoft.Win32</c>. (Core.Scripting-003.)
/// </para>
/// <para>
/// Deny-list prefix match. <c>System.Net</c> catches <c>System.Net.Http</c>,
@@ -58,11 +61,18 @@ public static class ForbiddenTypeAnalyzer
[
"System.IO",
"System.Net",
"System.Diagnostics", // catches Process, ProcessStartInfo, EventLog, Trace/Debug file sinks
"System.Diagnostics", // catches Process, ProcessStartInfo, EventLog, Trace/Debug file sinks
"System.Reflection",
"System.Threading.Thread", // raw Thread — Tasks stay allowed (different namespace)
// System.Threading.Thread is NOT in this list: Thread's containing namespace is
// "System.Threading" (not "System.Threading.Thread"), so a prefix check on
// "System.Threading.Thread" never matches. Thread is denied type-granularly via
// ForbiddenFullTypeNames instead so the check actually fires.
"System.Threading.Tasks", // Task.Run / Parallel — scripts are synchronous predicates
// and have no legitimate need to start background work;
// a Task fan-out outlives the evaluation timeout entirely
// (Core.Scripting-003).
"System.Runtime.InteropServices",
"Microsoft.Win32", // registry
"Microsoft.Win32", // registry
];
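The reason a namespace-prefix entry for System.Threading.Thread can never match is visible from the type's own metadata. The sketch below assumes a simple prefix rule (`NamespaceDenied` is a hypothetical helper, not the analyzer's real matching code) and checks it against a few framework types.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical prefix rule: deny when the type's namespace equals the prefix
// or sits underneath it.
bool NamespaceDenied(Type type, string deniedPrefix)
{
    var ns = type.Namespace ?? string.Empty;
    return ns == deniedPrefix || ns.StartsWith(deniedPrefix + ".", StringComparison.Ordinal);
}

// Thread's namespace is "System.Threading", so the prefix "System.Threading.Thread" misses it.
var threadNamespace = typeof(Thread).Namespace;
var threadCaughtByPrefix = NamespaceDenied(typeof(Thread), "System.Threading.Thread");

// A full-type-name comparison does reach it.
var threadCaughtByTypeName = typeof(Thread).FullName == "System.Threading.Thread";

// The Tasks namespace entry catches Task but leaves CancellationToken alone.
var taskDenied = NamespaceDenied(typeof(Task), "System.Threading.Tasks");
var tokenDenied = NamespaceDenied(typeof(CancellationToken), "System.Threading.Tasks");

Console.WriteLine($"{threadNamespace}: prefix={threadCaughtByPrefix}, typeName={threadCaughtByTypeName}, " +
                  $"task={taskDenied}, token={tokenDenied}");
```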
/// <summary>
@@ -85,6 +95,11 @@ public static class ForbiddenTypeAnalyzer
/// <item><c>System.Activator</c> — <c>CreateInstance</c> is a
/// reflection-equivalent escape that constructs a forbidden type by name
/// without ever naming it syntactically.</item>
/// <item><c>System.Threading.Thread</c> — raw thread creation bypasses the
/// per-evaluation timeout; denied type-granularly because its containing
/// namespace is <c>System.Threading</c> (shared with allowed types like
/// <c>CancellationToken</c>), so a namespace-prefix rule cannot reach it
/// without blocking unrelated types. (Core.Scripting-010.)</item>
/// </list>
/// </remarks>
public static readonly IReadOnlyList<string> ForbiddenFullTypeNames =
@@ -93,6 +108,11 @@ public static class ForbiddenTypeAnalyzer
"System.AppDomain",
"System.GC",
"System.Activator",
// System.Threading.Thread lives in the System.Threading namespace (shared with
// CancellationToken, SemaphoreSlim, etc.), so a namespace-prefix deny-list cannot
// target it without blocking those legitimate types. Denied type-granularly here.
// (Core.Scripting-010.)
"System.Threading.Thread",
];
/// <summary>
@@ -76,6 +76,14 @@ public sealed class TimedScriptEvaluator<TContext, TResult>
// WaitAsync's synthesized timeout — the inner task may still be running
// on its thread-pool thread (known leak documented in the class summary).
// Wrap so callers can distinguish from user-written timeout logic.
//
// The class docs guarantee "caller-supplied cancel wins over timeout".
// When both fire at nearly the same time, WaitAsync observes them in
// non-deterministic order, so a cancel that arrives a few µs after the
// timeout still reaches here as TimeoutException. Re-check the token so
// the guarantee holds regardless of race ordering. (Core.Scripting-007.)
if (ct.IsCancellationRequested)
throw new OperationCanceledException(ct);
throw new ScriptTimeoutException(Timeout);
}
}
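The re-check above can be exercised without relying on real timing. In this sketch `ClassifyAsync` is a hypothetical helper that mirrors the catch block, and the second case simulates the race by cancelling the token just before a TimeoutException surfaces.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

async Task<string> ClassifyAsync(Func<Task> wait, CancellationToken ct)
{
    try
    {
        await wait().ConfigureAwait(false);
        return "completed";
    }
    catch (TimeoutException)
    {
        // Re-check the token: a cancel that lost the race to the timeout still wins.
        if (ct.IsCancellationRequested) return "canceled";
        return "timed-out";
    }
    catch (OperationCanceledException)
    {
        return "canceled";
    }
}

// Case 1: a genuine timeout with no cancellation requested.
var never = new TaskCompletionSource().Task;
var timedOut = await ClassifyAsync(
    () => never.WaitAsync(TimeSpan.FromMilliseconds(20), CancellationToken.None),
    CancellationToken.None);

// Case 2: simulated race, where the cancel lands just before the timeout is observed.
using var cts = new CancellationTokenSource();
var raced = await ClassifyAsync(
    () => { cts.Cancel(); return Task.FromException(new TimeoutException()); },
    cts.Token);

Console.WriteLine($"timedOut={timedOut}, raced={raced}");
// prints "timedOut=timed-out, raced=canceled"
```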
@@ -31,6 +31,11 @@ public sealed class DependencyGraph
private readonly Dictionary<string, HashSet<string>> _dependsOn = new(StringComparer.Ordinal);
private readonly Dictionary<string, HashSet<string>> _dependents = new(StringComparer.Ordinal);
// Cached topological rank — built lazily by TransitiveDependentsInOrder and
// invalidated whenever the graph is mutated (Add / Clear). Avoids re-running
// a full O(V+E) Kahn pass on every change-cascade event.
private Dictionary<string, int>? _cachedRank;
/// <summary>
/// Register a node and the set of tags it depends on. Idempotent — re-adding
/// the same node overwrites the prior dependency set, so re-publishing an edited
@@ -58,6 +63,7 @@ public sealed class DependencyGraph
_dependents[dep] = set = new HashSet<string>(StringComparer.Ordinal);
set.Add(nodeId);
}
_cachedRank = null; // graph mutated — invalidate cached rank
}
/// <summary>Tag paths <paramref name="nodeId"/> directly reads.</summary>
@@ -84,9 +90,11 @@ public sealed class DependencyGraph
var result = new List<string>();
var visited = new HashSet<string>(StringComparer.Ordinal);
var order = TopologicalSort();
var rank = new Dictionary<string, int>(StringComparer.Ordinal);
for (var i = 0; i < order.Count; i++) rank[order[i]] = i;
// Reuse the cached rank to avoid an O(V+E) Kahn pass on every change event.
// The cache is invalidated whenever the graph is mutated (Add / Clear), so it
// is always consistent with the current graph structure.
var rank = GetOrBuildRank();
// DFS from the changed node collecting every reachable dependent.
var stack = new Stack<string>();
@@ -115,6 +123,16 @@ public sealed class DependencyGraph
return result;
}
private Dictionary<string, int> GetOrBuildRank()
{
if (_cachedRank is not null) return _cachedRank;
var order = TopologicalSort();
var rank = new Dictionary<string, int>(order.Count, StringComparer.Ordinal);
for (var i = 0; i < order.Count; i++) rank[order[i]] = i;
_cachedRank = rank;
return rank;
}
/// <summary>Iterable of every registered node id (inputs-only tags excluded).</summary>
public IReadOnlyCollection<string> RegisteredNodes => _dependsOn.Keys;
@@ -249,6 +267,7 @@ public sealed class DependencyGraph
{
_dependsOn.Clear();
_dependents.Clear();
_cachedRank = null; // graph cleared — invalidate cached rank
}
}
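The cache-and-invalidate pattern across the `DependencyGraph` hunks above can be sketched as a standalone miniature. This is an illustration under assumptions: the names `Add`, `GetOrBuildRank`, `rankValid`, and `rebuildCount` are hypothetical, the graph is reduced to a single `dependsOn` map, and a validity flag replaces the nullable `_cachedRank` field.

```csharp
using System;
using System.Collections.Generic;

// dependsOn[node] = the inputs that node reads (hypothetical miniature of the graph).
var dependsOn = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
var rank = new Dictionary<string, int>(StringComparer.Ordinal);
var rankValid = false;
var rebuildCount = 0; // illustration only: counts how often the Kahn pass runs

void Add(string nodeId, params string[] inputs)
{
    dependsOn[nodeId] = new HashSet<string>(inputs, StringComparer.Ordinal);
    rankValid = false; // any mutation invalidates the cached rank
}

Dictionary<string, int> GetOrBuildRank()
{
    if (rankValid) return rank;
    rebuildCount++;
    rank.Clear();

    // Kahn's algorithm: emit a node once all of its inputs have been emitted.
    var remaining = new Dictionary<string, int>(StringComparer.Ordinal);
    var dependents = new Dictionary<string, List<string>>(StringComparer.Ordinal);
    foreach (var (node, inputs) in dependsOn)
    {
        remaining.TryAdd(node, 0);
        foreach (var input in inputs)
        {
            remaining.TryAdd(input, 0);
            remaining[node]++;
            if (!dependents.TryGetValue(input, out var list))
                dependents[input] = list = new List<string>();
            list.Add(node);
        }
    }

    var ready = new Queue<string>();
    foreach (var (node, count) in remaining)
        if (count == 0) ready.Enqueue(node);

    while (ready.Count > 0)
    {
        var emitted = ready.Dequeue();
        rank[emitted] = rank.Count; // position in topological order
        if (!dependents.TryGetValue(emitted, out var next)) continue;
        foreach (var dependent in next)
            if (--remaining[dependent] == 0) ready.Enqueue(dependent);
    }

    rankValid = true;
    return rank;
}

// Usage: two lookups share one Kahn pass; a mutation forces exactly one more.
Add("b", "a");
Add("c", "b");
GetOrBuildRank();
GetOrBuildRank();
Add("d", "c");
var finalRank = GetOrBuildRank();
Console.WriteLine($"rebuilds={rebuildCount}, rank of d={finalRank["d"]}");
```

The design trades a small amount of state for skipping the O(V+E) pass on every change event. The cost is that every mutating method must remember to invalidate, which is why both `Add` and `Clear` reset the cache in the diff.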
@@ -76,8 +76,15 @@ public sealed class VirtualTagEngine : IDisposable
_graph.Clear();
var compileFailures = new List<string>();
var seenPaths = new HashSet<string>(StringComparer.Ordinal);
foreach (var def in definitions)
{
if (!seenPaths.Add(def.Path))
{
compileFailures.Add($"{def.Path}: duplicate path — only one definition per path is allowed");
continue;
}
try
{
var extraction = DependencyExtractor.Extract(def.ScriptSource);
@@ -113,9 +120,10 @@ public sealed class VirtualTagEngine : IDisposable
// Subscribe to every referenced upstream path (driver tags only — virtual tags
// cascade internally). Seed the cache with current upstream values so first
// evaluations see something real.
var upstreamPaths = definitions
.SelectMany(d => _tags[d.Path].Reads)
// evaluations see something real. Iterate _tags.Values (the registered set) rather
// than definitions to avoid indexing by a raw input list that may contain duplicates.
var upstreamPaths = _tags.Values
.SelectMany(s => s.Reads)
.Where(p => !_tags.ContainsKey(p))
.Distinct(StringComparer.Ordinal);
foreach (var path in upstreamPaths)
@@ -229,12 +237,18 @@ public sealed class VirtualTagEngine : IDisposable
{
var ctxCache = BuildReadCache(state.Reads);
// Cold-start guard — hold the prior value when any upstream input is still
// unset or Bad-quality. Evaluating with nulls would throw inside the script
// (scripts cast ctx.GetTag(path).Value directly) and produce a persistent
// BadInternalError result until the upstream cache fills. Keeping the prior
// snapshot is more honest: the virtual tag simply hasn't been computed yet.
if (!AreInputsReady(ctxCache)) return;
// Cold-start guard — when any upstream input is still unset or Bad-quality,
// publish a BadWaitingForInitialData snapshot so OPC UA clients see a defined
// quality rather than observing "not yet computed" as a stale Good value.
// Evaluating with nulls would throw inside the script (scripts cast
// ctx.GetTag(path).Value directly) and produce a persistent BadInternalError.
if (!AreInputsReady(ctxCache))
{
var notReady = new DataValueSnapshot(null, 0x80320000u /* BadWaitingForInitialData */, null, _clock());
_valueCache[path] = notReady;
NotifyObservers(path, notReady);
return;
}
var context = new VirtualTagContext(
ctxCache,
@@ -247,7 +261,12 @@ public sealed class VirtualTagEngine : IDisposable
{
var raw = await state.Evaluator.RunAsync(context, ct).ConfigureAwait(false);
var coerced = CoerceResult(raw, state.Definition.DataType);
result = new DataValueSnapshot(coerced, 0u, _clock(), _clock());
// null from CoerceResult means the conversion threw (raw was non-null but
// not convertible to the declared type). Surface as BadInternalError so
// the OPC UA client sees a defined Bad quality rather than a Good null.
result = (raw is not null && coerced is null)
? new DataValueSnapshot(null, 0x80020000u /* BadInternalError */, null, _clock())
: new DataValueSnapshot(coerced, 0u, _clock(), _clock());
}
catch (ScriptTimeoutException tex)
{
@@ -49,19 +49,20 @@ public sealed class VirtualTagSource : IReadable, ISubscribable
var handle = new SubscriptionHandle(Guid.NewGuid().ToString("N"));
var observers = new List<IDisposable>(fullReferences.Count);
foreach (var path in fullReferences)
{
observers.Add(_engine.Subscribe(path, (p, snap) =>
OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, p, snap))));
}
_subs[handle.DiagnosticId] = new Subscription(handle, observers);
// OPC UA convention: emit initial-data callback for each path with the current value.
// OPC UA convention: for each path, emit the initial-data callback BEFORE
// registering the change observer. This prevents a race where an upstream change
// fires the observer between the Subscribe call and the Read call, which would
// deliver a newer change event before the initial-data event, leaving the client
// with a stale last-known value.
foreach (var path in fullReferences)
{
var snap = _engine.Read(path);
OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, path, snap));
observers.Add(_engine.Subscribe(path, (p, s) =>
OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, p, s))));
}
_subs[handle.DiagnosticId] = new Subscription(handle, observers);
return Task.FromResult<ISubscriptionHandle>(handle);
}
@@ -79,16 +79,15 @@ public sealed class PermissionTrie
private static void WalkSystemPlatform(PermissionTrieNode ns, NodeScope scope, HashSet<string> groups, List<MatchedGrant> matches)
{
// FolderSegments are nested under the namespace; each is its own trie level. Reuse the
// UnsArea scope kind for the flags — NodeAcl rows for Galaxy tags carry ScopeKind.Tag
// for leaf grants and ScopeKind.Namespace for folder-root grants; deeper folder grants
// are modeled as Equipment-level rows today since NodeAclScopeKind doesn't enumerate
// a dedicated FolderSegment kind. Future-proof TODO tracked in Stream B follow-up.
// FolderSegments are nested under the namespace; each is its own trie level. Use the
// dedicated FolderSegment scope kind so Galaxy folder grants report their true scope in
// AuthorizationDecision.Provenance — distinguishing them from UNS Equipment grants in
// the audit trail and Admin UI "Probe this permission" diagnostic.
var current = ns;
foreach (var segment in scope.FolderSegments)
{
if (!current.Children.TryGetValue(segment, out var child)) return;
CollectAtLevel(child, NodeAclScopeKind.Equipment, groups, matches);
CollectAtLevel(child, NodeAclScopeKind.FolderSegment, groups, matches);
current = child;
}
@@ -54,26 +54,51 @@ public sealed class PermissionTrieCache
/// <summary>
/// Retain only the most-recent <paramref name="keepLatest"/> generations for a cluster.
/// No-op when there's nothing to drop.
/// No-op when there's nothing to drop. Thread-safe: uses a CAS loop with
/// <see cref="ConcurrentDictionary{TKey,TValue}.TryUpdate"/> (reference equality on the
/// class-typed entry) so a concurrent <see cref="Install"/> on the same cluster is never
/// silently overwritten.
/// </summary>
public void Prune(string clusterId, int keepLatest = 3)
{
if (keepLatest < 1) throw new ArgumentOutOfRangeException(nameof(keepLatest), keepLatest, "keepLatest must be >= 1");
if (!_byCluster.TryGetValue(clusterId, out var entry)) return;
if (entry.Tries.Count <= keepLatest) return;
var keep = entry.Tries
.OrderByDescending(kvp => kvp.Key)
.Take(keepLatest)
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
_byCluster[clusterId] = new ClusterEntry(entry.Current, keep);
// CAS retry loop: read a snapshot, compute the pruned entry, atomically swap.
// Retry if another writer (Install or a concurrent Prune) updated the entry first.
while (true)
{
if (!_byCluster.TryGetValue(clusterId, out var observed)) return;
if (observed.Tries.Count <= keepLatest) return;
var keep = observed.Tries
.OrderByDescending(kvp => kvp.Key)
.Take(keepLatest)
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
// Preserve the current pointer; if it was pruned (shouldn't happen since Current
// is always the newest generation), fall back to the newest retained entry.
var current = keep.TryGetValue(observed.Current.GenerationId, out var kept)
? kept
: keep.OrderByDescending(kvp => kvp.Key).First().Value;
var pruned = new ClusterEntry(current, keep);
// TryUpdate uses reference equality for ClusterEntry (class, not record) so it
// succeeds only when the stored reference is still the one we observed.
if (_byCluster.TryUpdate(clusterId, pruned, observed))
return;
// Another thread updated the entry between our read and our write — re-read and retry.
}
}
/// <summary>Diagnostics counter: number of cached (cluster, generation) tries.</summary>
public int CachedTrieCount => _byCluster.Values.Sum(e => e.Tries.Count);
private sealed record ClusterEntry(PermissionTrie Current, IReadOnlyDictionary<long, PermissionTrie> Tries)
// Class (not record) so TryUpdate in Prune uses reference equality for the CAS comparison.
private sealed class ClusterEntry(PermissionTrie current, IReadOnlyDictionary<long, PermissionTrie> tries)
{
public PermissionTrie Current { get; } = current;
public IReadOnlyDictionary<long, PermissionTrie> Tries { get; } = tries;
public static ClusterEntry FromSingle(PermissionTrie trie) =>
new(trie, new Dictionary<long, PermissionTrie> { [trie.GenerationId] = trie });
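The compare-and-swap loop in `Prune` can be illustrated without the trie types. The sketch below is a simplified stand-in, not the project's `PermissionTrieCache`: it stores a plain `Dictionary<long, string>` per cluster and treats each stored dictionary as an immutable snapshot. Because `Dictionary<,>` does not override `Equals`, `ConcurrentDictionary.TryUpdate` compares the stored value by reference, which matches the reasoning given for making `ClusterEntry` a class rather than a record.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical miniature: cluster id -> snapshot of (generation -> payload).
// Snapshots are never mutated after being stored; writers always swap in a new one.
var byCluster = new ConcurrentDictionary<string, Dictionary<long, string>>();

void Prune(string clusterId, int keepLatest)
{
    if (keepLatest < 1) throw new ArgumentOutOfRangeException(nameof(keepLatest));

    while (true)
    {
        if (!byCluster.TryGetValue(clusterId, out var observed)) return;
        if (observed.Count <= keepLatest) return;

        var pruned = observed
            .OrderByDescending(kvp => kvp.Key)
            .Take(keepLatest)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);

        // Succeeds only if the stored reference is still the snapshot we read;
        // otherwise another writer got in first and we retry against its result.
        if (byCluster.TryUpdate(clusterId, pruned, observed)) return;
    }
}

// Usage: keep the two newest generations of one cluster.
byCluster["plant-a"] = new Dictionary<long, string>
{
    [1] = "gen1", [2] = "gen2", [3] = "gen3", [4] = "gen4",
};
Prune("plant-a", 2);
Console.WriteLine(string.Join(",", byCluster["plant-a"].Keys.OrderBy(k => k)));
```

With a record-typed value, `TryUpdate` would use the record's value-based equality instead. A distinct snapshot that happened to compare equal would then pass the comparison, so the swap would no longer prove that no other writer had intervened.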
@@ -36,12 +36,25 @@ public class GenericDriverNodeManager(IDriver driver) : IDisposable
/// Populates the address space by streaming nodes from the driver into the supplied builder,
/// wraps the builder so alarm-condition sinks are captured, subscribes to the driver's
/// alarm event stream, and routes each transition to the matching sink by <c>SourceNodeId</c>.
/// Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted,
/// but other drivers remain available.
/// If called a second time (e.g. Galaxy redeploy via <c>IRediscoverable.OnRediscoveryNeeded</c>)
/// the previous alarm subscription is torn down and the sink registry is cleared before
/// re-walking, preventing double delivery of alarm transitions.
/// Exception isolation (marking the driver's subtree Faulted) is the caller's responsibility —
/// exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> propagate to the caller.
/// </summary>
public async Task BuildAddressSpaceAsync(IAddressSpaceBuilder builder, CancellationToken ct)
{
ArgumentNullException.ThrowIfNull(builder);
ObjectDisposedException.ThrowIf(_disposed, this);
// Tear down any previous alarm subscription before re-walking so a second call (e.g. on
// Galaxy redeploy) does not leave the old forwarder subscribed and double-fire events.
if (_alarmForwarder is not null && Driver is IAlarmSource existingSource)
{
existingSource.OnAlarmEvent -= _alarmForwarder;
_alarmForwarder = null;
}
_alarmSinks.Clear();
if (Driver is not ITagDiscovery discovery)
throw new NotSupportedException($"Driver '{Driver.DriverInstanceId}' does not implement ITagDiscovery.");
@@ -48,7 +48,9 @@ public sealed class AlarmSurfaceInvoker
/// <summary>
/// Subscribe to alarm events for a set of source node ids, fanning out by resolved host
/// so per-host breakers / bulkheads apply. Returns one handle per host — callers that
/// don't care about per-host separation may concatenate them.
/// don't care about per-host separation may concatenate them. Each returned handle wraps
/// the driver's opaque handle together with its resolved host so <see cref="UnsubscribeAsync"/>
/// routes through the same host's pipeline that the subscription was created on.
/// </summary>
public async Task<IReadOnlyList<IAlarmSubscriptionHandle>> SubscribeAsync(
IReadOnlyList<string> sourceNodeIds,
@@ -61,24 +63,34 @@ public sealed class AlarmSurfaceInvoker
var handles = new List<IAlarmSubscriptionHandle>(byHost.Count);
foreach (var (host, ids) in byHost)
{
var handle = await _invoker.ExecuteAsync(
var inner = await _invoker.ExecuteAsync(
DriverCapability.AlarmSubscribe,
host,
async ct => await _alarmSource.SubscribeAlarmsAsync(ids, ct).ConfigureAwait(false),
cancellationToken).ConfigureAwait(false);
handles.Add(handle);
handles.Add(new HostBoundHandle(inner, host));
}
return handles;
}
/// <summary>Cancel an alarm subscription. Routes through the AlarmSubscribe pipeline for parity.</summary>
/// <summary>
/// Cancel an alarm subscription. Routes through the same host's resilience pipeline
/// that the subscription was created on (carried in the <see cref="HostBoundHandle"/>
/// wrapper returned by <see cref="SubscribeAsync"/>). Falls back to the default host for
/// handles not created by this invoker so the method remains safe to call on any
/// <see cref="IAlarmSubscriptionHandle"/> implementation.
/// </summary>
public ValueTask UnsubscribeAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(handle);
var (innerHandle, host) = handle is HostBoundHandle bound
? (bound.Inner, bound.Host)
: (handle, _defaultHost);
return _invoker.ExecuteAsync(
DriverCapability.AlarmSubscribe,
_defaultHost,
async ct => await _alarmSource.UnsubscribeAlarmsAsync(handle, ct).ConfigureAwait(false),
host,
async ct => await _alarmSource.UnsubscribeAlarmsAsync(innerHandle, ct).ConfigureAwait(false),
cancellationToken);
}
@@ -126,4 +138,16 @@ public sealed class AlarmSurfaceInvoker
}
return result;
}
/// <summary>
/// Wraps an <see cref="IAlarmSubscriptionHandle"/> returned by the driver with the
/// resolved host name used when the subscription was created. <see cref="UnsubscribeAsync"/>
/// unwraps this to route the unsubscribe through the same host's resilience pipeline.
/// </summary>
private sealed class HostBoundHandle(IAlarmSubscriptionHandle inner, string host) : IAlarmSubscriptionHandle
{
public IAlarmSubscriptionHandle Inner { get; } = inner;
public string Host { get; } = host;
public string DiagnosticId => Inner.DiagnosticId;
}
}
@@ -56,4 +56,19 @@ public abstract class AbCipCommandBase : DriverCommandBase
/// multiple gateways in parallel can distinguish the logs.
/// </summary>
protected string DriverInstanceId => $"abcip-cli-{Gateway}";
/// <summary>
/// Guards against <see cref="AbCipDataType.Structure"/> being passed to a command
/// that does not support UDT layouts. Call at the top of <c>ExecuteAsync</c> for any
/// command that accepts <c>--type</c> but cannot handle memberless Structure tags.
/// Throws a <see cref="CliFx.Exceptions.CommandException"/> if <paramref name="type"/>
/// is <see cref="AbCipDataType.Structure"/>.
/// </summary>
protected static void RejectStructure(AbCipDataType type)
{
if (type == AbCipDataType.Structure)
throw new CliFx.Exceptions.CommandException(
"Structure (UDT) reads are out of scope for this command — those need an explicit " +
"member layout, which belongs in a real driver config.");
}
}
@@ -25,6 +25,7 @@ public sealed class ProbeCommand : AbCipCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
RejectStructure(DataType);
var ct = console.RegisterCancellationHandler();
var probeTag = new AbCipTagDefinition(
@@ -27,6 +27,7 @@ public sealed class ReadCommand : AbCipCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
RejectStructure(DataType);
var ct = console.RegisterCancellationHandler();
var tagName = SynthesiseTagName(TagPath, DataType);
@@ -30,6 +30,7 @@ public sealed class SubscribeCommand : AbCipCommandBase
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
RejectStructure(DataType);
var ct = console.RegisterCancellationHandler();
var tagName = ReadCommand.SynthesiseTagName(TagPath, DataType);
@@ -66,23 +66,40 @@ public sealed class WriteCommand : AbCipCommandBase
/// <summary>
/// Parse the operator's <c>--value</c> string into the CLR type the driver expects
/// for the declared <see cref="AbCipDataType"/>. Invariant culture everywhere.
/// Bad input (non-numeric text, out-of-range value) is caught and rethrown as a
/// <see cref="CliFx.Exceptions.CommandException"/> so CliFx renders a clean one-line
/// error rather than a full .NET stack trace.
/// </summary>
internal static object ParseValue(string raw, AbCipDataType type) => type switch
internal static object ParseValue(string raw, AbCipDataType type)
{
AbCipDataType.Bool => ParseBool(raw),
AbCipDataType.SInt => sbyte.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.Int => short.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.DInt or AbCipDataType.Dt => int.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.LInt => long.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.USInt => byte.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.UInt => ushort.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.UDInt => uint.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.ULInt => ulong.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.Real => float.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.LReal => double.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.String => raw,
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
try
{
return type switch
{
AbCipDataType.Bool => ParseBool(raw),
AbCipDataType.SInt => sbyte.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.Int => short.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.DInt or AbCipDataType.Dt => int.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.LInt => long.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.USInt => byte.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.UInt => ushort.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.UDInt => uint.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.ULInt => ulong.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.Real => float.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.LReal => double.Parse(raw, CultureInfo.InvariantCulture),
AbCipDataType.String => raw,
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
}
catch (Exception ex) when (ex is FormatException or OverflowException)
{
throw new CliFx.Exceptions.CommandException(
$"Cannot parse '{raw}' as {type}. " +
$"Check the value is within the valid range for {type} and uses invariant-culture " +
$"decimal notation (e.g. '3.14', not '3,14').",
innerException: ex);
}
}
private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
{
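The wrap-and-rethrow shape of `ParseValue` can be sketched without CliFx. In this illustration the helper `ParseNumber` and its string type names are hypothetical, and `ArgumentException` stands in for `CliFx.Exceptions.CommandException`. The casts to `object` are a choice of this sketch: with only numeric arms, the switch expression would otherwise pick a common numeric type and silently widen the `short` result.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses `raw` as the named type, converting parse failures
// into a single friendly exception type (stand-in for CliFx's CommandException).
static object ParseNumber(string raw, string type)
{
    try
    {
        return type switch
        {
            // Cast to object so each arm keeps its own CLR type when boxed.
            "Int16" => (object)short.Parse(raw, CultureInfo.InvariantCulture),
            "Float32" => (object)float.Parse(raw, CultureInfo.InvariantCulture),
            _ => throw new NotSupportedException($"Unsupported type '{type}'."),
        };
    }
    catch (Exception ex) when (ex is FormatException or OverflowException)
    {
        // The exception filter leaves NotSupportedException untouched.
        throw new ArgumentException($"Cannot parse '{raw}' as {type}.", ex);
    }
}

// Usage: an out-of-range value yields the friendly error with the cause attached.
try
{
    ParseNumber("70000", "Int16");
}
catch (ArgumentException ex)
{
    Console.WriteLine($"{ex.Message} (cause: {ex.InnerException?.GetType().Name})");
}
```

Keeping the original exception as the inner exception preserves the distinction between malformed and out-of-range input for anyone debugging, while the operator sees only the one-line message.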
@@ -59,17 +59,38 @@ public sealed class WriteCommand : AbLegacyCommandBase
}
/// <summary>Parse <c>--value</c> per <see cref="AbLegacyDataType"/>, invariant culture.</summary>
internal static object ParseValue(string raw, AbLegacyDataType type) => type switch
/// <exception cref="CliFx.Exceptions.CommandException">
/// Thrown when <paramref name="raw"/> cannot be parsed as the requested type (malformed
/// input or out-of-range value) so CliFx renders a clean one-line error instead of a raw
/// stack trace.
/// </exception>
internal static object ParseValue(string raw, AbLegacyDataType type)
{
AbLegacyDataType.Bit => ParseBool(raw),
AbLegacyDataType.Int or AbLegacyDataType.AnalogInt => short.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.Long => int.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.Float => float.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.String => raw,
AbLegacyDataType.TimerElement or AbLegacyDataType.CounterElement
or AbLegacyDataType.ControlElement => int.Parse(raw, CultureInfo.InvariantCulture),
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
try
{
return type switch
{
AbLegacyDataType.Bit => ParseBool(raw),
AbLegacyDataType.Int or AbLegacyDataType.AnalogInt => short.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.Long => int.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.Float => float.Parse(raw, CultureInfo.InvariantCulture),
AbLegacyDataType.String => raw,
AbLegacyDataType.TimerElement or AbLegacyDataType.CounterElement
or AbLegacyDataType.ControlElement => int.Parse(raw, CultureInfo.InvariantCulture),
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
}
catch (FormatException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is not a valid {type}: {ex.Message}", innerException: ex);
}
catch (OverflowException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is out of range for {type}: {ex.Message}", innerException: ex);
}
}
private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
{
@@ -7,8 +7,8 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Cli.Common;
/// <summary>
/// Shared base for every driver test-client command (Modbus / AB CIP / AB Legacy / S7 /
/// TwinCAT). Carries the options that are meaningful regardless of protocol — verbose
/// logging + the standard timeout — plus helpers every command implementation wants:
/// TwinCAT / FOCAS). Carries the options that are meaningful regardless of protocol —
/// verbose logging + the standard timeout — plus helpers every command implementation wants:
/// Serilog configuration + cancellation-token capture.
/// </summary>
/// <remarks>
@@ -44,17 +44,37 @@ public abstract class DriverCommandBase : ICommand
public abstract ValueTask ExecuteAsync(IConsole console);
/// <summary>
/// Configures the process-global Serilog logger. Commands call this at the top of
/// <see cref="ExecuteAsync"/> so driver-internal <c>Log.Logger</c> writes land on the
/// same sink as the CLI's operator-facing output.
/// Configures the process-global Serilog logger. Intended to be called exactly once,
/// at the top of <see cref="ExecuteAsync"/>, so driver-internal <c>Log.Logger</c>
/// writes land on the same sink as the CLI's operator-facing output.
/// If the logger has already been configured this call is a no-op (idempotent).
/// Call <see cref="FlushLogging"/> in a <c>finally</c> block to ensure buffered output
/// is flushed before the process exits.
/// </summary>
protected void ConfigureLogging()
{
if (_loggingConfigured) return;
_loggingConfigured = true;
// Dispose the previous global logger (e.g. Serilog's silent bootstrap logger) so
// its resources are released cleanly before we overwrite Log.Logger.
var previous = Log.Logger;
var config = new LoggerConfiguration();
if (Verbose)
config.MinimumLevel.Debug().WriteTo.Console();
else
config.MinimumLevel.Warning().WriteTo.Console();
Log.Logger = config.CreateLogger();
(previous as IDisposable)?.Dispose();
}
/// <summary>
/// Flushes and closes the Serilog logger configured by <see cref="ConfigureLogging"/>.
/// Call this in a <c>finally</c> block inside <see cref="ExecuteAsync"/> to prevent
/// buffered log output from being lost on process exit, particularly for long-running
/// commands such as <c>subscribe</c>.
/// </summary>
protected static void FlushLogging() => Log.CloseAndFlush();
private bool _loggingConfigured;
}
@@ -65,9 +65,9 @@ public static class SnapshotFormatter
Time = FormatTimestamp(snapshots[i].SourceTimestampUtc),
}).ToArray();
int tagW = Math.Max("TAG".Length, rows.Max(r => r.Tag.Length));
int valW = Math.Max("VALUE".Length, rows.Max(r => r.Value.Length));
int statW = Math.Max("STATUS".Length, rows.Max(r => r.Status.Length));
int tagW = rows.Length == 0 ? "TAG".Length : Math.Max("TAG".Length, rows.Max(r => r.Tag.Length));
int valW = rows.Length == 0 ? "VALUE".Length : Math.Max("VALUE".Length, rows.Max(r => r.Value.Length));
int statW = rows.Length == 0 ? "STATUS".Length : Math.Max("STATUS".Length, rows.Max(r => r.Status.Length));
// source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed.
var sb = new System.Text.StringBuilder();
@@ -100,12 +100,16 @@ public static class SnapshotFormatter
public static string FormatStatus(uint statusCode)
{
// Match the OPC UA shorthand for the statuses most-likely to land in a CLI run.
// Anything outside this short-list surfaces as hex — operators can cross-reference
// against OPC UA Part 6 § 7.34 (StatusCode tables) or Core.Abstractions status mappers.
// OPC UA status codes carry sub-code and flag bits in the low 16 bits (info type,
// structure-changed, semantics-changed, limit bits, overflow, etc.). To ensure
// that e.g. 0x80050001 still reads as "BadCommunicationError" rather than bare hex,
// named codes are matched against the high-word mask (code & 0xFFFF0000). When no
// named match is found the severity class (top 2 bits) provides a meaningful fallback
// so operators always see at least Good / Uncertain / Bad rather than raw hex.
// Numeric codes are the canonical values from the OPC Foundation Opc.Ua.StatusCodes
// table; keep them in sync with that table if this list is extended.
var name = statusCode switch
var masked = statusCode & 0xFFFF0000u;
var name = masked switch
{
0x00000000u => "Good",
0x80000000u => "Bad",
@@ -119,6 +123,19 @@ public static class SnapshotFormatter
0x40000000u => "Uncertain",
_ => null,
};
if (name is null)
{
// Severity fallback: top 2 bits identify the quality class even for unknown
// sub-codes. 0x80000000 and 0xC0000000 (reserved quality) both map to "Bad".
name = (statusCode & 0xC0000000u) switch
{
0x00000000u => "Good",
0x40000000u => "Uncertain",
_ => "Bad",
};
}
return name is null
? $"0x{statusCode:X8}"
: $"0x{statusCode:X8} ({name})";
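The high-word masking and severity fallback described in this hunk can be sketched as a reduced standalone function. This is an illustration, not the project's `SnapshotFormatter`: it lists only three named codes (the numeric values for `BadCommunicationError` and `BadWaitingForInitialData` are the ones quoted elsewhere in this diff), and it folds the fallback into the discard arm of the switch.

```csharp
using System;

// Reduced sketch of the masking approach; the named-code list is deliberately short.
static string FormatStatus(uint statusCode)
{
    // Named codes live in the high 16 bits; the low 16 bits carry flag and info bits.
    var name = (statusCode & 0xFFFF0000u) switch
    {
        0x00000000u => "Good",
        0x80050000u => "BadCommunicationError",
        0x80320000u => "BadWaitingForInitialData",
        // Severity fallback: the top two bits give the quality class.
        _ => (statusCode & 0xC0000000u) switch
        {
            0x00000000u => "Good",
            0x40000000u => "Uncertain",
            _ => "Bad",
        },
    };
    return $"0x{statusCode:X8} ({name})";
}

// Usage: a code with a low-word flag bit set still resolves to its name.
Console.WriteLine(FormatStatus(0x80050001u));
Console.WriteLine(FormatStatus(0x80AB0000u));
```

Because the fallback always yields a name, the bare-hex branch is never taken in this sketch.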
@@ -35,6 +35,21 @@ public sealed class SubscribeCommand : ModbusCommandBase
"BigEndian (default) or WordSwap.")]
public ModbusByteOrder ByteOrder { get; init; } = ModbusByteOrder.BigEndian;
// Driver.Modbus.Cli-001: subscribe previously lacked these three options that read and
// write both expose. Without them, BitInRegister always watches bit 0 and String runs with
// StringLength=0, silently producing wrong results for any subscriber using those types.
[CommandOption("bit-index", Description =
"For type=BitInRegister: which bit of the holding register (0-15, LSB-first).")]
public byte BitIndex { get; init; }
[CommandOption("string-length", Description =
"For type=String: character count (2 per register, rounded up).")]
public ushort StringLength { get; init; }
[CommandOption("string-byte-order", Description =
"For type=String: HighByteFirst (standard) or LowByteFirst (DirectLOGIC).")]
public ModbusStringByteOrder StringByteOrder { get; init; } = ModbusStringByteOrder.HighByteFirst;
public override async ValueTask ExecuteAsync(IConsole console)
{
ConfigureLogging();
@@ -47,7 +62,10 @@ public sealed class SubscribeCommand : ModbusCommandBase
Address: Address,
DataType: DataType,
Writable: false,
ByteOrder: ByteOrder);
ByteOrder: ByteOrder,
BitIndex: BitIndex,
StringLength: StringLength,
StringByteOrder: StringByteOrder);
var options = BuildOptions([tag]);
await using var driver = new ModbusDriver(options, DriverInstanceId);
@@ -60,6 +60,16 @@ public sealed class WriteCommand : ModbusCommandBase
throw new CliFx.Exceptions.CommandException(
$"Region '{Region}' is read-only in the Modbus spec; writes require Coils or HoldingRegisters.");
// Driver.Modbus.Cli-002: coils are single-bit outputs, so only Bool makes sense. A
// non-boolean type (e.g. --region Coils --type UInt16) would silently coerce the value
// to a boolean via Convert.ToBoolean, landing as ON for any non-zero value, with no
// diagnostic. Reject it early so the operator sees a clear error rather than a silent
// type-mismatch coercion.
if (Region == ModbusRegion.Coils && DataType != ModbusDataType.Bool)
throw new CliFx.Exceptions.CommandException(
$"Region 'Coils' only supports boolean values (--type Bool). " +
$"Type '{DataType}' cannot represent a single-bit coil write.");
var tagName = ReadCommand.SynthesiseTagName(Region, Address, DataType);
var tag = new ModbusTagDefinition(
Name: tagName,
@@ -34,6 +34,10 @@ public sealed class ProbeCommand : S7CommandBase
var options = BuildOptions([probeTag]);
await using var driver = new S7Driver(options, DriverInstanceId);
// Driver.S7.Cli-003: wrap the entire probe sequence so that a refused/unreachable TCP
// connect still prints the structured Host/CPU/Health lines instead of crashing with a
// full .NET stack trace. InitializeAsync sets health to Faulted with the exception
// message before re-throwing, so GetHealth() always has something to report.
try
{
await driver.InitializeAsync("{}", ct);
@@ -48,6 +52,20 @@ public sealed class ProbeCommand : S7CommandBase
await console.Output.WriteLineAsync();
await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
}
catch (OperationCanceledException)
{
throw; // Ctrl+C — let CliFx handle it normally.
}
catch
{
// Connect / read failure — print what the driver knows so far.
var health = driver.GetHealth();
await console.Output.WriteLineAsync($"Host: {Host}:{Port}");
await console.Output.WriteLineAsync($"CPU: {CpuType} rack={Rack} slot={Slot}");
await console.Output.WriteLineAsync($"Health: {health.State}");
if (health.LastError is { } err)
await console.Output.WriteLineAsync($"Last error: {err}");
}
finally
{
await driver.ShutdownAsync(CancellationToken.None);
@@ -19,9 +19,12 @@ public sealed class ReadCommand : S7CommandBase
IsRequired = true)]
public string Address { get; init; } = default!;
// Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
// Int64 / UInt64 / Float64 / String / DateTime are defined in S7DataType but the driver
// raises NotSupportedException (→ BadNotSupported) on reads of those types.
[CommandOption("type", 't', Description =
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
"String / DateTime (default Int16).")]
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
"Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
public S7DataType DataType { get; init; } = S7DataType.Int16;
[CommandOption("string-length", Description =
@@ -15,9 +15,10 @@ public sealed class SubscribeCommand : S7CommandBase
[CommandOption("address", 'a', Description = "S7 address — same format as `read`.", IsRequired = true)]
public string Address { get; init; } = default!;
// Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
[CommandOption("type", 't', Description =
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
"String / DateTime (default Int16).")]
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
"Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
public S7DataType DataType { get; init; } = S7DataType.Int16;
[CommandOption("interval-ms", 'i', Description = "Publishing interval ms (default 1000).")]
@@ -18,9 +18,13 @@ public sealed class WriteCommand : S7CommandBase
"S7 address — same format as `read`.", IsRequired = true)]
public string Address { get; init; } = default!;
// Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
// Int64 / UInt64 / Float64 / String / DateTime are defined in S7DataType but the driver
// raises NotSupportedException (→ BadNotSupported) on any read/write of those types;
// advertising them misleads operators who then see BadNotSupported with no explanation.
[CommandOption("type", 't', Description =
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
"String / DateTime (default Int16).")]
"Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
"Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
public S7DataType DataType { get; init; } = S7DataType.Int16;
[CommandOption("value", 'v', Description =
@@ -62,22 +66,44 @@ public sealed class WriteCommand : S7CommandBase
}
/// <summary>Parse <c>--value</c> per <see cref="S7DataType"/>, invariant culture throughout.</summary>
internal static object ParseValue(string raw, S7DataType type) => type switch
/// <remarks>
/// Driver.S7.Cli-001: numeric and <see cref="DateTime"/> parses are wrapped so that
/// malformed input (<see cref="FormatException"/> / <see cref="OverflowException"/>)
/// surfaces as a clean <see cref="CliFx.Exceptions.CommandException"/> rather than a
/// raw .NET stack trace — matching the friendly message the Bool path already produces.
/// </remarks>
internal static object ParseValue(string raw, S7DataType type)
{
S7DataType.Bool => ParseBool(raw),
S7DataType.Byte => byte.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int16 => short.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt16 => ushort.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int32 => int.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt32 => uint.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int64 => long.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt64 => ulong.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Float32 => float.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Float64 => double.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.String => raw,
S7DataType.DateTime => DateTime.Parse(raw, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
if (type == S7DataType.Bool) return ParseBool(raw);
if (type == S7DataType.String) return raw;
try
{
return type switch
{
S7DataType.Byte => (object)byte.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int16 => (object)short.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt16 => (object)ushort.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int32 => (object)int.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt32 => (object)uint.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Int64 => (object)long.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.UInt64 => (object)ulong.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Float32 => (object)float.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.Float64 => (object)double.Parse(raw, CultureInfo.InvariantCulture),
S7DataType.DateTime => (object)DateTime.Parse(raw, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
_ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
};
}
catch (FormatException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is not a valid {type}: {ex.Message}");
}
catch (OverflowException ex)
{
throw new CliFx.Exceptions.CommandException(
$"Value '{raw}' is out of range for {type}: {ex.Message}");
}
}
private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
{
@@ -40,22 +40,29 @@ public enum AbCipDataType
public static class AbCipDataTypeExtensions
{
/// <summary>
/// Map to the driver-agnostic type the server's address-space builder consumes. Unsigned
/// Logix types widen into signed equivalents until <c>DriverDataType</c> picks up unsigned
/// + 64-bit variants (Modbus has the same gap — see <c>ModbusDriver.MapDataType</c>
/// comment re: PR 25).
/// Map to the driver-agnostic type the server's address-space builder consumes.
/// <c>DriverDataType</c> carries Int64, UInt32, and UInt64 so each Logix type maps
/// to the widest correct signed/unsigned equivalent without silent truncation:
/// <list type="bullet">
/// <item>LInt (signed 64-bit) → Int64; ULInt (unsigned 64-bit) → UInt64.</item>
/// <item>UDInt (unsigned 32-bit) → UInt32 so values above Int32.MaxValue are not
/// wrapped to negative (Driver.AbCip-004).</item>
/// <item>USInt / UInt widen into Int32; they can never overflow it.</item>
/// </list>
/// </summary>
public static DriverDataType ToDriverDataType(this AbCipDataType t) => t switch
{
AbCipDataType.Bool => DriverDataType.Boolean,
AbCipDataType.SInt or AbCipDataType.Int or AbCipDataType.DInt => DriverDataType.Int32,
AbCipDataType.USInt or AbCipDataType.UInt or AbCipDataType.UDInt => DriverDataType.Int32,
AbCipDataType.LInt or AbCipDataType.ULInt => DriverDataType.Int32, // TODO: Int64 — matches Modbus gap
AbCipDataType.USInt or AbCipDataType.UInt => DriverDataType.Int32,
AbCipDataType.UDInt => DriverDataType.UInt32,
AbCipDataType.LInt => DriverDataType.Int64,
AbCipDataType.ULInt => DriverDataType.UInt64,
AbCipDataType.Real => DriverDataType.Float32,
AbCipDataType.LReal => DriverDataType.Float64,
AbCipDataType.String => DriverDataType.String,
AbCipDataType.Dt => DriverDataType.Int32, // epoch-seconds DINT
AbCipDataType.Structure => DriverDataType.String, // placeholder until UDT PR 6 introduces a structured kind
AbCipDataType.Structure => DriverDataType.String, // placeholder until UDT introduces a structured kind
_ => DriverDataType.Int32,
};
}
@@ -5,9 +5,8 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
/// <summary>
/// Allen-Bradley CIP / EtherNet-IP driver for ControlLogix / CompactLogix / Micro800 /
/// GuardLogix families. Implements <see cref="IDriver"/> only for now — read/write/
/// subscribe/discover capabilities ship in subsequent PRs (3–8) and family-specific quirk
/// profiles ship in PRs 9–12.
/// GuardLogix families. Implements all read/write/subscribe/discover/probe/alarm
/// capabilities via the libplctag.NET wrapper.
/// </summary>
/// <remarks>
/// <para>Wire layer is libplctag 1.6.x (plan decision #11). Per-device host addresses use
@@ -17,8 +16,11 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
///
/// <para>Tier A per plan decisions #143–145: in-process, shares server lifetime, no
/// sidecar. <see cref="ReinitializeAsync"/> is the Tier-B escape hatch for recovering
/// from native-heap growth that the CLR allocator can't see; it tears down every
/// <see cref="PlcTagHandle"/> and reconnects each device.</para>
/// from native-heap growth that the CLR allocator can't see; it tears down the
/// libplctag.NET <c>Tag</c> instances held in <c>DeviceState.Runtimes</c> and reconnects
/// each device. Native tag lifetime is owned by the libplctag.NET <c>Tag.Dispose()</c>
/// (called in <see cref="DeviceState.DisposeHandles"/>); the library's own finalizer
/// handles GC-collected tags.</para>
/// </remarks>
public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery, ISubscribable,
IHostConnectivityProbe, IPerCallHostResolver, IAlarmSource, IDisposable, IAsyncDisposable
@@ -144,7 +146,16 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
foreach (var tag in _options.Tags)
{
// Duplicate-key check: a collision means two configured tags have the same name.
// Fail fast at init time with a diagnostic rather than silently clobbering.
// (Driver.AbCip-005)
if (_tagsByName.TryGetValue(tag.Name, out var existingTag))
throw new InvalidOperationException(
$"AbCip tag name collision: '{tag.Name}' is declared more than once. " +
$"Existing entry DeviceHostAddress='{existingTag.DeviceHostAddress}', " +
$"TagPath='{existingTag.TagPath}'. Rename or remove the duplicate.");
_tagsByName[tag.Name] = tag;
if (tag.DataType == AbCipDataType.Structure && tag.Members is { Count: > 0 })
{
foreach (var member in tag.Members)
@@ -156,6 +167,14 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
DataType: member.DataType,
Writable: member.Writable,
WriteIdempotent: member.WriteIdempotent);
// Member fan-out duplicate check: a member-path collision means two
// configured structure tags produce the same member path, or a member
// name collides with an independently-declared tag.
if (_tagsByName.TryGetValue(memberTag.Name, out var existingMember))
throw new InvalidOperationException(
$"AbCip tag name collision: '{memberTag.Name}' is produced by both " +
$"'{tag.Name}.{member.Name}' (member fan-out) and an existing tag " +
$"'{existingMember.Name}'. Rename one of the configured tags to resolve.");
_tagsByName[memberTag.Name] = memberTag;
}
}
@@ -409,6 +428,15 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNodeIdUnknown, null, now);
return;
}
// Driver.AbCip-005: a Structure tag whose Members are declared is a container —
// its bare name is readable via the whole-UDT grouping path (ReadGroupAsync), not the
// per-tag path. Reading it here returns BadNotSupported rather than Good/null so the
// caller knows to address individual member paths (e.g. "Motor.Speed").
if (def.DataType == AbCipDataType.Structure && def.Members is { Count: > 0 })
{
results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNotSupported, null, now);
return;
}
if (!_devices.TryGetValue(def.DeviceHostAddress, out var device))
{
results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNodeIdUnknown, null, now);
@@ -423,6 +451,11 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
var status = runtime.GetStatus();
if (status != 0)
{
// Evict the stale handle so the next call re-creates it (Driver.AbCip-010).
// A non-zero status can mean the controller dropped the connection or the tag
// handle became permanently invalid (e.g. after a PLC download). Evicting
// mirrors the probe loop's recreate-on-failure behaviour.
EvictRuntime(device, def.Name);
results[fb.OriginalIndex] = new DataValueSnapshot(null,
AbCipStatusMapper.MapLibplctagStatus(status), null, now);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead,
@@ -442,6 +475,8 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
catch (Exception ex)
{
// Transport exception — evict so the next read creates a fresh handle.
EvictRuntime(device, def.Name);
results[fb.OriginalIndex] = new DataValueSnapshot(null,
AbCipStatusMapper.BadCommunicationError, null, now);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
@@ -474,6 +509,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
var status = runtime.GetStatus();
if (status != 0)
{
EvictRuntime(device, parent.Name); // Driver.AbCip-010
var mapped = AbCipStatusMapper.MapLibplctagStatus(status);
StampGroupStatus(group, results, now, mapped);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead,
@@ -494,6 +530,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
catch (Exception ex)
{
EvictRuntime(device, parent.Name); // Driver.AbCip-010
StampGroupStatus(group, results, now, AbCipStatusMapper.BadCommunicationError);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
}
@@ -564,10 +601,16 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
await runtime.WriteAsync(cancellationToken).ConfigureAwait(false);
var status = runtime.GetStatus();
results[i] = new WriteResult(status == 0
? AbCipStatusMapper.Good
: AbCipStatusMapper.MapLibplctagStatus(status));
if (status == 0) _health = new DriverHealth(DriverState.Healthy, now, null);
if (status != 0)
{
EvictRuntime(device, def.Name); // Driver.AbCip-010
results[i] = new WriteResult(AbCipStatusMapper.MapLibplctagStatus(status));
}
else
{
results[i] = new WriteResult(AbCipStatusMapper.Good);
_health = new DriverHealth(DriverState.Healthy, now, null);
}
}
catch (OperationCanceledException)
{
@@ -575,11 +618,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
catch (NotSupportedException nse)
{
// Type/protocol error — not a transport fault; don't evict the handle.
results[i] = new WriteResult(AbCipStatusMapper.BadNotSupported);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, nse.Message);
}
catch (FormatException fe)
{
// Value conversion error — not a transport fault; don't evict.
results[i] = new WriteResult(AbCipStatusMapper.BadTypeMismatch);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, fe.Message);
}
@@ -595,6 +640,8 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
catch (Exception ex)
{
// Transport / wire error — evict so the next write creates a fresh handle.
EvictRuntime(device, def.Name); // Driver.AbCip-010
results[i] = new WriteResult(AbCipStatusMapper.BadCommunicationError);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
}
@@ -713,6 +760,21 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
return device.Runtimes[def.Name];
}
/// <summary>
/// Evict the runtime for <paramref name="tagName"/> from the device's cache and dispose
/// it so the next read/write call re-creates and re-initializes a fresh handle.
/// Called from <see cref="ReadSingleAsync"/>, <see cref="ReadGroupAsync"/>, and
/// <see cref="WriteAsync"/> after a non-zero libplctag status or transport exception —
/// mirroring the probe loop's recreate-on-failure behaviour (Driver.AbCip-010).
/// </summary>
private static void EvictRuntime(DeviceState device, string tagName)
{
if (device.Runtimes.TryRemove(tagName, out var stale))
{
try { stale.Dispose(); } catch { }
}
}
public DriverHealth GetHealth() => _health;
/// <summary>
@@ -851,8 +913,10 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
/// <summary>
/// Per-device runtime state. Holds the parsed host address, family profile, and the
/// live <see cref="PlcTagHandle"/> cache keyed by tag path. PRs 3–8 populate + consume
/// this dict via libplctag.
/// live libplctag.NET <see cref="IAbCipTagRuntime"/> instances keyed by tag name.
/// Native tag lifetime is owned by the <c>Tag.Dispose()</c> inside each
/// <see cref="LibplctagTagRuntime"/>; libplctag.NET's own finalizer covers GC-collected
/// instances so no separate SafeHandle wrapper is needed here (Driver.AbCip-006).
/// </summary>
internal sealed class DeviceState(
AbCipHostAddress parsedAddress,
@@ -878,9 +942,6 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
/// </summary>
public Task? ProbeTask { get; set; }
public Dictionary<string, PlcTagHandle> TagHandles { get; } =
new(StringComparer.OrdinalIgnoreCase);
/// <summary>
/// Per-tag runtime handles owned by this device. One entry per configured tag is
/// created lazily on first read (see <see cref="AbCipDriver.EnsureTagRuntimeAsync"/>).
@@ -907,8 +968,6 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
public void DisposeHandles()
{
foreach (var h in TagHandles.Values) h.Dispose();
TagHandles.Clear();
foreach (var r in Runtimes.Values) r.Dispose();
Runtimes.Clear();
foreach (var r in ParentRuntimes.Values) r.Dispose();
@@ -44,9 +44,9 @@ internal sealed class LibplctagTagRuntime : IAbCipTagRuntime
AbCipDataType.Int => (int)_tag.GetInt16(offset),
AbCipDataType.UInt => (int)_tag.GetUInt16(offset),
AbCipDataType.DInt => _tag.GetInt32(offset),
AbCipDataType.UDInt => (int)_tag.GetUInt32(offset),
AbCipDataType.LInt => _tag.GetInt64(offset),
AbCipDataType.ULInt => (long)_tag.GetUInt64(offset),
AbCipDataType.UDInt => _tag.GetUInt32(offset), // UInt32 to match ToDriverDataType (Driver.AbCip-004)
AbCipDataType.LInt => _tag.GetInt64(offset), // Int64 to match ToDriverDataType (Driver.AbCip-004)
AbCipDataType.ULInt => _tag.GetUInt64(offset), // UInt64 to match ToDriverDataType (Driver.AbCip-004)
AbCipDataType.Real => _tag.GetFloat32(offset),
AbCipDataType.LReal => _tag.GetFloat64(offset),
AbCipDataType.String => _tag.GetString(offset),
@@ -1,59 +0,0 @@
using System.Runtime.InteropServices;
namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
/// <summary>
/// <see cref="SafeHandle"/> wrapper around a libplctag native tag handle (an <c>int32</c>
/// returned from <c>plc_tag_create_ex</c>). Owns lifetime of the native allocation so a
/// leaked / GC-collected <see cref="PlcTagHandle"/> still calls <c>plc_tag_destroy</c>
/// during finalization — necessary because native libplctag allocations are opaque to
/// the driver's <see cref="Core.Abstractions.IDriver.GetMemoryFootprint"/>.
/// </summary>
/// <remarks>
/// <para>Risk documented in driver-specs.md §3 ("Operational Stability Notes"): the CLR
/// allocation tracker doesn't see libplctag's native heap, only whole-process RSS can.
/// Every handle leaked past its useful life is a direct contributor to the Tier-B recycle
/// trigger, so owning lifetime via SafeHandle is non-negotiable.</para>
///
/// <para><see cref="IsInvalid"/> is <c>true</c> when the native ID is &lt;= 0 — libplctag
/// returns negative <c>PLCTAG_ERR_*</c> codes on <c>plc_tag_create_ex</c> failure, which
/// we surface as an invalid handle rather than a disposable one (destroying a negative
/// handle would be undefined behavior in the native library).</para>
///
/// <para>The actual <c>DllImport</c> for <c>plc_tag_destroy</c> is deferred to PR 3 when
/// the driver first makes wire calls — PR 2 ships the lifetime scaffold + tests only.
/// Until the P/Invoke lands, <see cref="ReleaseHandle"/> is a no-op; the finalizer still
/// runs so the integration is correct as soon as the import is added.</para>
/// </remarks>
public sealed class PlcTagHandle : SafeHandle
{
/// <summary>Construct an invalid handle placeholder (use <see cref="FromNative"/> once created).</summary>
public PlcTagHandle() : base(invalidHandleValue: IntPtr.Zero, ownsHandle: true) { }
private PlcTagHandle(int nativeId) : base(invalidHandleValue: IntPtr.Zero, ownsHandle: true)
{
SetHandle(new IntPtr(nativeId));
}
/// <summary>Handle is invalid when the native ID is zero or negative (libplctag error).</summary>
public override bool IsInvalid => handle.ToInt32() <= 0;
/// <summary>Integer ID libplctag issued on <c>plc_tag_create_ex</c>.</summary>
public int NativeId => handle.ToInt32();
/// <summary>Wrap a native tag ID returned from libplctag.</summary>
public static PlcTagHandle FromNative(int nativeId) => new(nativeId);
/// <summary>
/// Destroy the native tag. No-op for PR 2 (the wire P/Invoke lands in PR 3). The base
/// <see cref="SafeHandle"/> machinery still guarantees this runs exactly once per
/// handle — either during <see cref="SafeHandle.Dispose()"/> or during finalization
/// if the owner was GC'd without explicit Dispose.
/// </summary>
protected override bool ReleaseHandle()
{
if (IsInvalid) return true;
// PR 3: wire up plc_tag_destroy(handle.ToInt32()) once the DllImport lands.
return true;
}
}
@@ -102,6 +102,19 @@ public sealed record AbLegacyAddress(
if (maxBit < 0 || b > maxBit) return null;
}
// I/O/S are single-letter system files — they carry no file number in the PCCC spec.
// Accepting I3:0 or S2:1 would pass a malformed address straight to libplctag; reject early.
if (fileNumber is not null && IsNoFileNumberLetter(letter)) return null;
// A PCCC address cannot have both a sub-element and a bit index: the word is either
// structured (T4:0.ACC) or bit-addressed (N7:0/3), never both.
if (subElement is not null && bitIndex is not null) return null;
// Sub-elements are only meaningful on Timer (T), Counter (C), and Control (R) files —
// those are the only structured-element file types in the PCCC spec. Accepting B3:0.DN
// or N7:0.FOO would produce an address libplctag silently misinterprets.
if (subElement is not null && !IsSubElementFileLetter(letter)) return null;
return new AbLegacyAddress(letter, fileNumber, word, bitIndex, subElement);
}
@@ -122,4 +135,18 @@ public sealed record AbLegacyAddress(
"N" or "F" or "B" or "L" or "ST" or "T" or "C" or "R" or "I" or "O" or "S" or "A" => true,
_ => false,
};
/// <summary>
/// Returns <see langword="true"/> for file letters that carry no explicit file number in the
/// PCCC spec. <c>I</c> (input), <c>O</c> (output), and <c>S</c> (status) are single-letter
/// system files; a digit after the letter (e.g. <c>I3</c>) is a malformed address.
/// </summary>
private static bool IsNoFileNumberLetter(string letter) => letter is "I" or "O" or "S";
/// <summary>
/// Returns <see langword="true"/> for file letters that may carry a sub-element suffix
/// (<c>.ACC</c>, <c>.PRE</c>, etc.). Only Timer (<c>T</c>), Counter (<c>C</c>), and
/// Control (<c>R</c>) files have structured elements in the PCCC spec.
/// </summary>
private static bool IsSubElementFileLetter(string letter) => letter is "T" or "C" or "R";
}
@@ -17,7 +17,13 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
private readonly PollGroupEngine _poll;
private readonly Dictionary<string, DeviceState> _devices = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, AbLegacyTagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
private DriverHealth _health = new(DriverState.Unknown, null, null);
// volatile: _health is read by GetHealth() on any thread while ReadAsync / WriteAsync /
// InitializeAsync write it from worker / poll threads. The record-reference assignment is
// atomic on all .NET platforms, but without a memory barrier a reader can see a stale
// snapshot indefinitely. volatile enforces acquire/release ordering so GetHealth() always
// observes the most recently written value.
private volatile DriverHealth _health = new(DriverState.Unknown, null, null);
public event EventHandler<DataChangeEventArgs>? OnDataChange;
public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
@@ -53,6 +59,24 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
}
foreach (var tag in _options.Tags) _tagsByName[tag.Name] = tag;
// Validate tag types against their device's family profile. Long (32-bit integer)
// and String (ST-file) are not supported by all PCCC families; reject them early
// so a misconfigured tag fails at init time with a clear message rather than
// surfacing an opaque comms error at runtime.
foreach (var tag in _options.Tags)
{
if (!_devices.TryGetValue(tag.DeviceHostAddress, out var deviceForTag)) continue;
var profile = deviceForTag.Profile;
if (tag.DataType == AbLegacyDataType.Long && !profile.SupportsLongFile)
throw new InvalidOperationException(
$"Tag '{tag.Name}' is typed as Long but device '{tag.DeviceHostAddress}' " +
$"(family {deviceForTag.Options.PlcFamily}) does not support L-files.");
if (tag.DataType == AbLegacyDataType.String && !profile.SupportsStringFile)
throw new InvalidOperationException(
$"Tag '{tag.Name}' is typed as String but device '{tag.DeviceHostAddress}' " +
$"(family {deviceForTag.Options.PlcFamily}) does not support ST-files.");
}
// Probe loops — one per device when enabled + probe address configured.
if (_options.Probe.Enabled && !string.IsNullOrWhiteSpace(_options.Probe.ProbeAddress))
{
@@ -68,6 +92,20 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
catch (Exception ex)
{
_health = new DriverHealth(DriverState.Faulted, null, ex.Message);
// Tear down any probe loops and cached state that were created before the failure so
// that a caller who catches and abandons (rather than retrying via ReinitializeAsync)
// doesn't leave orphaned background tasks, CancellationTokenSources, and libplctag
// handles alive. Mirrors the body of ShutdownAsync without awaiting the poll engine
// (nothing has been subscribed yet at init time).
foreach (var state in _devices.Values)
{
try { state.ProbeCts?.Cancel(); } catch { }
state.ProbeCts?.Dispose();
state.ProbeCts = null;
state.DisposeRuntimes();
}
_devices.Clear();
_tagsByName.Clear();
throw;
}
return Task.CompletedTask;
@@ -313,7 +351,7 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
var probeParams = new AbLegacyTagCreateParams(
Gateway: state.ParsedAddress.Gateway,
Port: state.ParsedAddress.Port,
CipPath: state.ParsedAddress.CipPath,
CipPath: state.EffectiveCipPath,
LibplctagPlcAttribute: state.Profile.LibplctagPlcAttribute,
TagName: _options.Probe.ProbeAddress!,
Timeout: _options.Probe.Timeout);
@@ -431,55 +469,84 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
private async Task<IAbLegacyTagRuntime> EnsureParentRuntimeAsync(
AbLegacyDriver.DeviceState device, string parentName, CancellationToken ct)
{
// Fast path: runtime already cached.
if (device.ParentRuntimes.TryGetValue(parentName, out var existing)) return existing;
var runtime = _tagFactory.Create(new AbLegacyTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.ParsedAddress.CipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parentName,
Timeout: _options.Timeout));
// Slow path: serialise creation per key so concurrent callers don't each create a
// runtime and one of them gets overwritten + leaked. Only one caller initialises; the
// others find the entry on the second TryGetValue inside the lock.
var creationLock = device.GetCreationLock($"parent:{parentName}");
await creationLock.WaitAsync(ct).ConfigureAwait(false);
try
{
await runtime.InitializeAsync(ct).ConfigureAwait(false);
if (device.ParentRuntimes.TryGetValue(parentName, out existing)) return existing;
var runtime = _tagFactory.Create(new AbLegacyTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.EffectiveCipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parentName,
Timeout: _options.Timeout));
try
{
await runtime.InitializeAsync(ct).ConfigureAwait(false);
}
catch
{
runtime.Dispose();
throw;
}
device.ParentRuntimes[parentName] = runtime;
return runtime;
}
catch
finally
{
runtime.Dispose();
throw;
creationLock.Release();
}
device.ParentRuntimes[parentName] = runtime;
return runtime;
}
private async Task<IAbLegacyTagRuntime> EnsureTagRuntimeAsync(
DeviceState device, AbLegacyTagDefinition def, CancellationToken ct)
{
// Fast path: runtime already cached.
if (device.Runtimes.TryGetValue(def.Name, out var existing)) return existing;
var parsed = AbLegacyAddress.TryParse(def.Address)
?? throw new InvalidOperationException(
$"AbLegacy tag '{def.Name}' has malformed Address '{def.Address}'.");
var runtime = _tagFactory.Create(new AbLegacyTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.ParsedAddress.CipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parsed.ToLibplctagName(),
Timeout: _options.Timeout));
// Slow path: serialise creation per tag name so concurrent callers for the same tag
// (server read path + poll loop) don't both create a runtime and one gets leaked.
var creationLock = device.GetCreationLock($"tag:{def.Name}");
await creationLock.WaitAsync(ct).ConfigureAwait(false);
try
{
await runtime.InitializeAsync(ct).ConfigureAwait(false);
if (device.Runtimes.TryGetValue(def.Name, out existing)) return existing;
var parsed = AbLegacyAddress.TryParse(def.Address)
?? throw new InvalidOperationException(
$"AbLegacy tag '{def.Name}' has malformed Address '{def.Address}'.");
var runtime = _tagFactory.Create(new AbLegacyTagCreateParams(
Gateway: device.ParsedAddress.Gateway,
Port: device.ParsedAddress.Port,
CipPath: device.EffectiveCipPath,
LibplctagPlcAttribute: device.Profile.LibplctagPlcAttribute,
TagName: parsed.ToLibplctagName(),
Timeout: _options.Timeout));
try
{
await runtime.InitializeAsync(ct).ConfigureAwait(false);
}
catch
{
runtime.Dispose();
throw;
}
device.Runtimes[def.Name] = runtime;
return runtime;
}
catch
finally
{
runtime.Dispose();
throw;
creationLock.Release();
}
device.Runtimes[def.Name] = runtime;
return runtime;
}
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
@@ -493,7 +560,26 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
public AbLegacyHostAddress ParsedAddress { get; } = parsedAddress;
public AbLegacyDeviceOptions Options { get; } = options;
public AbLegacyPlcFamilyProfile Profile { get; } = profile;
public Dictionary<string, IAbLegacyTagRuntime> Runtimes { get; } =
/// <summary>
/// The CIP path to pass to libplctag. When the parsed host address has an empty CIP
/// path (e.g. <c>ab://10.0.0.5/</c>), the profile-supplied default is used instead so
/// that a SLC 500 misconfigured without an explicit path still gets the required
/// <c>1,0</c> backplane route. MicroLogix has an empty default by design (direct EIP).
/// </summary>
public string EffectiveCipPath => ParsedAddress.CipPath.Length > 0
? ParsedAddress.CipPath
: Profile.DefaultCipPath;
/// <summary>
/// Per-tag cached runtimes. <see cref="System.Collections.Concurrent.ConcurrentDictionary{TKey,TValue}"/>
/// keeps individual lookups and stores thread-safe, but it does not by itself close the
/// check-then-act race: two concurrent <c>EnsureTagRuntimeAsync</c> callers for the same
/// key can both miss the lookup and both create + store a runtime, leaking the loser.
/// Creation is therefore guarded by a per-key semaphore (<see cref="GetCreationLock"/>)
/// so exactly one runtime is created per tag name.
/// </summary>
public System.Collections.Concurrent.ConcurrentDictionary<string, IAbLegacyTagRuntime> Runtimes { get; } =
new(StringComparer.OrdinalIgnoreCase);
/// <summary>
@@ -501,9 +587,20 @@ public sealed class AbLegacyDriver : IDriver, IReadable, IWritable, ITagDiscover
/// parent address (bit suffix stripped) — e.g. writes to N7:0/3 + N7:0/5 share a
/// single parent runtime for N7:0.
/// </summary>
public Dictionary<string, IAbLegacyTagRuntime> ParentRuntimes { get; } =
public System.Collections.Concurrent.ConcurrentDictionary<string, IAbLegacyTagRuntime> ParentRuntimes { get; } =
new(StringComparer.OrdinalIgnoreCase);
/// <summary>
/// Per-key creation locks for <see cref="Runtimes"/> and <see cref="ParentRuntimes"/>.
/// A caller acquires the lock around the TryGetValue + Create + InitializeAsync + store
/// sequence so that a concurrent caller waits rather than creating a duplicate runtime
/// that would be overwritten in the cache and never disposed by <see cref="DisposeRuntimes"/>.
/// </summary>
private readonly System.Collections.Concurrent.ConcurrentDictionary<string, SemaphoreSlim> _creationLocks = new(StringComparer.OrdinalIgnoreCase);
public SemaphoreSlim GetCreationLock(string key) =>
_creationLocks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
private readonly System.Collections.Concurrent.ConcurrentDictionary<string, SemaphoreSlim> _rmwLocks = new();
public SemaphoreSlim GetRmwLock(string parentName) =>
@@ -1,3 +1,5 @@
using libplctag;
namespace ZB.MOM.WW.OtOpcUa.Driver.AbLegacy;
/// <summary>
@@ -20,28 +22,42 @@ public static class AbLegacyStatusMapper
public const uint BadTypeMismatch = 0x80730000u;
/// <summary>
/// Map libplctag return/status codes. Same polarity as the AbCip mapper — 0 success,
/// positive pending, negative error families.
/// Map a libplctag return/status code to an OPC UA StatusCode. The integer passed here
/// is <c>(int)Tag.GetStatus()</c> — the underlying value of the libplctag.NET
/// <see cref="Status"/> enum. Delegates to the strongly-typed overload so the mapping
/// stays correct regardless of how the wrapper renumbers native PLCTAG_ERR_* constants
/// in future releases.
/// </summary>
public static uint MapLibplctagStatus(int status)
public static uint MapLibplctagStatus(int status) => MapLibplctagStatus((Status)status);
/// <summary>
/// Map a libplctag.NET <see cref="Status"/> enum value to an OPC UA StatusCode. This is
/// the canonical core; the <c>int</c> overload exists only for the
/// <see cref="IAbLegacyTagRuntime.GetStatus"/> seam which boxes the enum as an int.
/// </summary>
public static uint MapLibplctagStatus(Status status) => status switch
{
if (status == 0) return Good;
if (status > 0) return GoodMoreData;
return status switch
{
-5 => BadTimeout,
-7 => BadCommunicationError,
-14 => BadNodeIdUnknown,
-16 => BadNotWritable,
-17 => BadOutOfRange,
_ => BadCommunicationError,
};
}
Status.Ok => Good,
Status.Pending => GoodMoreData,
Status.ErrorTimeout => BadTimeout,
Status.ErrorNotFound or Status.ErrorNoMatch or Status.ErrorBadDevice => BadNodeIdUnknown,
Status.ErrorNotAllowed => BadNotWritable,
Status.ErrorOutOfBounds or Status.ErrorTooLarge or Status.ErrorTooSmall => BadOutOfRange,
Status.ErrorUnsupported or Status.ErrorNotImplemented => BadNotSupported,
Status.ErrorBadConnection or Status.ErrorBadGateway or Status.ErrorBadReply
or Status.ErrorWinsock or Status.ErrorOpen or Status.ErrorClose
or Status.ErrorRead or Status.ErrorWrite or Status.ErrorRemoteErr
or Status.ErrorPartial or Status.ErrorAbort => BadCommunicationError,
_ => BadCommunicationError,
};
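The int-to-enum delegation above can be illustrated with a small stand-alone sketch. `PlcStatus` is a hypothetical stand-in for the wrapper's `Status` enum and its numeric values are invented for the example; only the shape of the two overloads is taken from the diff.

```csharp
using System;

// Hypothetical stand-in for the wrapper's status enum. Member values are illustrative
// only and are NOT the real libplctag.NET numbers.
public enum PlcStatus { Ok = 0, Pending = 1, ErrorTimeout = -100, ErrorNotFound = -101 }

public static class StatusMapperSketch
{
    public const uint Good = 0x00000000u;
    public const uint GoodMoreData = 0x00A60000u;
    public const uint BadCommunicationError = 0x80050000u;
    public const uint BadTimeout = 0x800A0000u;
    public const uint BadNodeIdUnknown = 0x80340000u;

    // Seam overload: callers that only hold the raw integer cast and delegate.
    public static uint Map(int status) => Map((PlcStatus)status);

    // Canonical overload: matches on member names, so renumbering the enum
    // does not change which OPC UA code each status maps to.
    public static uint Map(PlcStatus status) => status switch
    {
        PlcStatus.Ok => Good,
        PlcStatus.Pending => GoodMoreData,
        PlcStatus.ErrorTimeout => BadTimeout,
        PlcStatus.ErrorNotFound => BadNodeIdUnknown,
        _ => BadCommunicationError, // unknown or unlisted values fall back to a Bad code
    };

    public static void Main()
    {
        Console.WriteLine($"0x{Map((int)PlcStatus.ErrorTimeout):X8}"); // 0x800A0000
        Console.WriteLine($"0x{Map(-9999):X8}");                       // 0x80050000
    }
}
```

Matching on names rather than literals such as `-5` or `-7` is the design point: the old integer table only stays correct while the wrapper's numbering happens to match it.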
/// <summary>
/// Map a PCCC STS (status) byte. Common codes per AB PCCC reference:
/// 0x00 = success, 0x10 = illegal command, 0x20 = bad address, 0x30 = protected,
/// 0x40 = programmer busy, 0x50 = file locked, 0xF0 = extended status follows.
/// libplctag surfaces only its own <see cref="Status"/> enum rather than exposing
/// the raw STS byte, so this method is not wired into the current read/write path.
/// It is retained as the reference mapping for future PCCC-STS inspection.
/// </summary>
public static uint MapPcccStatus(byte sts) => sts switch
{
@@ -33,9 +33,13 @@ internal sealed class LibplctagLegacyTagRuntime : IAbLegacyTagRuntime
public object? DecodeValue(AbLegacyDataType type, int? bitIndex) => type switch
{
// When a bit suffix is present (e.g. B3:0/5) libplctag resolves the individual bit and
// GetBit returns it directly. When there is no suffix the caller addressed a Bit-typed
// tag without an explicit bit index; read the full 16-bit word and test bit 0 — GetInt8
// only covers the low byte and silently misses any bit set in bits 8..15.
AbLegacyDataType.Bit => bitIndex is int bit
? _tag.GetBit(bit)
: _tag.GetInt8(0) != 0,
: (_tag.GetInt16(0) & 1) != 0,
AbLegacyDataType.Int or AbLegacyDataType.AnalogInt => (int)_tag.GetInt16(0),
AbLegacyDataType.Long => _tag.GetInt32(0),
AbLegacyDataType.Float => _tag.GetFloat32(0),
@@ -25,6 +25,9 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
private readonly Dictionary<string, DeviceState> _devices = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, FocasTagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
private FocasAlarmProjection? _alarmProjection;
// _health is read/written from multiple threads (ReadAsync, WriteAsync, ProbeLoopAsync).
// Volatile.Read/Write ensures every thread sees the latest reference without a lock — the
// record is immutable so there is no torn-read risk on the object itself.
private DriverHealth _health = new(DriverState.Unknown, null, null);
public event EventHandler<DataChangeEventArgs>? OnDataChange;
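A rough sketch of the pattern the `_health` comment describes, assuming an immutable record swapped by reference. `HealthHolder` and its method names are hypothetical; `DriverHealth` and `DriverState` here are simplified stand-ins for the real types.

```csharp
#nullable enable
using System;
using System.Threading;

public enum DriverState { Unknown, Healthy, Degraded }

// Immutable snapshot: a reader either sees the old instance or the new one, never a mix.
public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class HealthHolder
{
    private DriverHealth _health = new(DriverState.Unknown, null, null);

    public DriverHealth Get() => Volatile.Read(ref _health);

    public void MarkHealthy(DateTime now) =>
        Volatile.Write(ref _health, new DriverHealth(DriverState.Healthy, now, null));

    public void MarkDegraded(string error)
    {
        // Carry the last good timestamp forward into the new snapshot.
        var previous = Volatile.Read(ref _health);
        Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded, previous.LastSuccessfulRead, error));
    }
}

public static class HealthDemo
{
    public static void Main()
    {
        var holder = new HealthHolder();
        var t = new DateTime(2026, 5, 22, 0, 0, 0, DateTimeKind.Utc);
        holder.MarkHealthy(t);
        holder.MarkDegraded("timeout");
        var h = holder.Get();
        Console.WriteLine($"{h.State}, keptTimestamp={h.LastSuccessfulRead == t}"); // Degraded, keptTimestamp=True
    }
}
```

Note that `Volatile` gives visibility only. The read-then-write in `MarkDegraded` is not atomic, so two concurrent writers can still overwrite each other's snapshot; that is acceptable for a last-writer-wins health indicator but would need `Interlocked.CompareExchange` if every update had to be preserved.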
@@ -49,7 +52,7 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
public Task InitializeAsync(string driverConfigJson, CancellationToken cancellationToken)
{
_health = new DriverHealth(DriverState.Initializing, null, null);
Volatile.Write(ref _health, new DriverHealth(DriverState.Initializing, null, null));
try
{
foreach (var device in _options.Devices)
@@ -69,8 +72,11 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
?? throw new InvalidOperationException(
$"FOCAS tag '{tag.Name}' has invalid Address '{tag.Address}'. " +
$"Expected forms: R100, R100.3, PARAM:1815/0, MACRO:500.");
if (_devices.TryGetValue(tag.DeviceHostAddress, out var device)
&& FocasCapabilityMatrix.Validate(device.Options.Series, parsed) is { } reason)
if (!_devices.TryGetValue(tag.DeviceHostAddress, out var device))
throw new InvalidOperationException(
$"FOCAS tag '{tag.Name}' references device '{tag.DeviceHostAddress}' " +
$"which is not in the Devices list. Check for a typo (e.g. wrong port or hostname).");
if (FocasCapabilityMatrix.Validate(device.Options.Series, parsed) is { } reason)
{
throw new InvalidOperationException(
$"FOCAS tag '{tag.Name}' ({tag.Address}) rejected by capability matrix: {reason}");
@@ -111,11 +117,11 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
}
_health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
Volatile.Write(ref _health, new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null));
}
catch (Exception ex)
{
_health = new DriverHealth(DriverState.Faulted, null, ex.Message);
Volatile.Write(ref _health, new DriverHealth(DriverState.Faulted, null, ex.Message));
throw;
}
return Task.CompletedTask;
@@ -150,10 +156,10 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
}
_devices.Clear();
_tagsByName.Clear();
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
Volatile.Write(ref _health, new DriverHealth(DriverState.Unknown, Volatile.Read(ref _health).LastSuccessfulRead, null));
}
public DriverHealth GetHealth() => _health;
public DriverHealth GetHealth() => Volatile.Read(ref _health);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
@@ -206,16 +212,18 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
results[i] = new DataValueSnapshot(value, status, now, now);
if (status == FocasStatusMapper.Good)
_health = new DriverHealth(DriverState.Healthy, now, null);
Volatile.Write(ref _health, new DriverHealth(DriverState.Healthy, now, null));
else
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead,
$"FOCAS status 0x{status:X8} reading {reference}");
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead,
$"FOCAS status 0x{status:X8} reading {reference}"));
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
results[i] = new DataValueSnapshot(null, FocasStatusMapper.BadCommunicationError, null, now);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead, ex.Message));
}
}
@@ -261,7 +269,8 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
catch (NotSupportedException nse)
{
results[i] = new WriteResult(FocasStatusMapper.BadNotSupported);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, nse.Message);
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead, nse.Message));
}
catch (Exception ex) when (ex is FormatException or InvalidCastException)
{
@@ -274,7 +283,8 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
catch (Exception ex)
{
results[i] = new WriteResult(FocasStatusMapper.BadCommunicationError);
_health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
Volatile.Write(ref _health, new DriverHealth(DriverState.Degraded,
Volatile.Read(ref _health).LastSuccessfulRead, ex.Message));
}
}
@@ -369,17 +379,20 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
string.Equals(t.DeviceHostAddress, device.HostAddress, StringComparison.OrdinalIgnoreCase));
foreach (var tag in tagsForDevice)
{
// The wire backend is read-only by design (WriteAsync returns BadNotWritable
// for every address) so all FOCAS tags are advertised as ViewOnly regardless
// of the Writable config field. Advertising Operate would mislead OPC UA
// clients and the DriverNodeManager ACL layer into granting write permission
// on nodes that can never be written.
deviceFolder.Variable(tag.Name, tag.Name, new DriverAttributeInfo(
FullName: tag.Name,
DriverDataType: tag.DataType.ToDriverDataType(),
IsArray: false,
ArrayDim: null,
SecurityClass: tag.Writable
? SecurityClassification.Operate
: SecurityClassification.ViewOnly,
SecurityClass: SecurityClassification.ViewOnly,
IsHistorized: false,
IsAlarm: false,
WriteIdempotent: tag.WriteIdempotent));
WriteIdempotent: false));
}
}
return Task.CompletedTask;
@@ -862,7 +875,19 @@ public sealed class FocasDriver : IDriver, IReadable, IWritable, ITagDiscovery,
private async Task<IFocasClient> EnsureConnectedAsync(DeviceState device, CancellationToken ct)
{
if (device.Client is { IsConnected: true } c) return c;
device.Client ??= _clientFactory.Create();
// Discard the existing client (if any) before creating a new one. A client that is
// non-null but not connected may have been disposed by a HandleRecycle race or a prior
// teardown — retrying ConnectAsync on a disposed FocasWireClient hits ThrowIfDisposed and
// returns a permanent BadCommunicationError with no recovery. Replacing it unconditionally
// ensures EnsureConnectedAsync always works with a fresh, non-disposed instance.
if (device.Client is not null)
{
device.Client.Dispose();
device.Client = null;
}
device.Client = _clientFactory.Create();
try
{
await device.Client.ConnectAsync(device.ParsedAddress, _options.Timeout, ct).ConfigureAwait(false);
@@ -26,7 +26,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy;
/// "GalaxyMxGateway" so both paths can be live simultaneously during parity testing.
/// </remarks>
public sealed class GalaxyDriver
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable, IAsyncDisposable
{
private readonly string _driverInstanceId;
private readonly GalaxyDriverOptions _options;
@@ -75,7 +75,10 @@ public sealed class GalaxyDriver
private readonly Lock _alarmHandlersLock = new();
private readonly Lock _alarmFeedLock = new();
private bool _alarmFeedWired;
private readonly HashSet<GalaxyAlarmSubscriptionHandle> _alarmSubscriptions = new();
// List preserves insertion order so OnAlarmFeedTransition always picks the
// earliest-registered handle — a deterministic choice that doesn't vary as
// handles are added/removed (Driver.Galaxy-006 fix: HashSet.First() is unstable).
private readonly List<GalaxyAlarmSubscriptionHandle> _alarmSubscriptions = new();
// PR 4.W — production runtime owned by InitializeAsync. The driver builds these
// when it opens a real gw session; tests bypass them by injecting seams via the
@@ -442,17 +445,21 @@ public sealed class GalaxyDriver
// Reuse the lazily-built repository client (DiscoverAsync constructs it on demand).
// If discovery hasn't run yet, build the client here so the watcher has a target.
if (_ownedRepositoryClient is null)
{
_ownedRepositoryClient = MxGateway.Client.GalaxyRepositoryClient.Create(
BuildClientOptions(_options.Gateway));
}
// Driver.Galaxy-009 fix: guard with ??= so if BuildDefaultHierarchySource later runs
// it reuses this client rather than overwriting the field and leaking the first instance.
_ownedRepositoryClient ??= MxGateway.Client.GalaxyRepositoryClient.Create(
BuildClientOptions(_options.Gateway));
var source = new GatewayGalaxyDeployWatchSource(_ownedRepositoryClient);
_deployWatcher = new DeployWatcher(source, _logger);
_deployWatcher.OnRediscoveryNeeded += (_, args) => OnRediscoveryNeeded?.Invoke(this, args);
_ = _deployWatcher.StartAsync(CancellationToken.None);
// StartAsync schedules the background loop and returns Task.CompletedTask immediately.
// It throws InvalidOperationException synchronously if called twice (programming error).
// Driver.Galaxy-009 fix: don't discard the return value — observe any synchronous throw.
var startTask = _deployWatcher.StartAsync(CancellationToken.None);
// The task is already completed (StartAsync is synchronous); surface any synchronous fault.
if (startTask.IsFaulted) startTask.GetAwaiter().GetResult();
}
/// <inheritdoc />
@@ -492,7 +499,22 @@ public sealed class GalaxyDriver
public IReadOnlyList<HostConnectivityStatus> GetHostStatuses() => _hostStatuses.Snapshot();
/// <inheritdoc />
public long GetMemoryFootprint() => 0; // PR 4.4 sets this from SubscriptionRegistry size.
/// <remarks>
/// Estimated footprint: 64 bytes × tracked item handles (one gw subscription entry
/// per bound tag) + 256 bytes × tracked driver subscriptions (registry overhead per
/// OPC UA monitored item). Returns 0 when no subscriptions are active. These
/// constants are conservative — a 50k-tag set occupies ~3 MB and registers clearly
/// with the server's cache-flush heuristic. Driver.Galaxy-011: the stale
/// "PR 4.4 sets this" comment is removed; PR 4.4 shipped the SubscriptionRegistry
/// but never wired it here.
/// </remarks>
public long GetMemoryFootprint()
{
const long BytesPerItemHandle = 64L; // TagBinding + reverse-map entry
const long BytesPerSubscription = 256L; // SubscriptionEntry overhead
return (_subscriptions.TrackedItemHandleCount * BytesPerItemHandle)
+ (_subscriptions.TrackedSubscriptionCount * BytesPerSubscription);
}
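The footprint arithmetic in the remarks can be checked with a small worked example. `FootprintSketch.Estimate` is a hypothetical helper that takes the two counts as parameters instead of reading them from the subscription registry.

```csharp
using System;

public static class FootprintSketch
{
    public const long BytesPerItemHandle = 64L;    // per bound tag
    public const long BytesPerSubscription = 256L; // per tracked driver subscription

    public static long Estimate(long itemHandles, long subscriptions) =>
        (itemHandles * BytesPerItemHandle) + (subscriptions * BytesPerSubscription);

    public static void Main()
    {
        Console.WriteLine(Estimate(0, 0));        // 0
        Console.WriteLine(Estimate(50_000, 0));   // 3200000
        Console.WriteLine(Estimate(50_000, 100)); // 3225600
    }
}
```

50,000 item handles at 64 bytes each come to 3,200,000 bytes, which matches the "~3 MB" figure quoted for a 50k-tag set.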
/// <inheritdoc />
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
@@ -964,12 +986,15 @@ public sealed class GalaxyDriver
GalaxyAlarmSubscriptionHandle? handle;
lock (_alarmHandlersLock)
{
// Pick any active subscription handle as the "owner" of the event. The
// server-side state machine doesn't multiplex by handle today; if multiple
// alarm subscriptions are active we still only fire the event once and
// the AlarmConditionService dispatches per-source-node downstream.
// Pick the earliest-registered handle as the event owner. The server routes
// by SourceNodeId (not by handle), so every active subscriber sees the same
// transition regardless of which handle is attached here. Using the first
// insertion-order entry is deterministic and stable as long as at least one
// subscription remains — HashSet.First() was unstable across mutations
// (Driver.Galaxy-006 fix). _alarmSubscriptions is a List, so [0] is always
// the earliest-registered handle.
handle = _alarmSubscriptions.Count > 0
? _alarmSubscriptions.First()
? _alarmSubscriptions[0]
: null;
}
if (handle is null) return;
@@ -1010,15 +1035,11 @@ public sealed class GalaxyDriver
if (_probeWatcher is not null
&& args.FullReference.EndsWith(PerPlatformProbeWatcher.ProbeSuffix, StringComparison.OrdinalIgnoreCase))
{
// The probe decoder takes a raw quality byte; recover it from the StatusCode
// top byte (Good=0x00 → byte 192, Uncertain=0x40 → byte 64, Bad=0x80 → byte 0).
var qualityByte = (byte)((args.Snapshot.StatusCode >> 30) & 0x3) switch
{
0 => 192,
1 => 64,
_ => 0,
};
_probeWatcher.OnProbeValueChanged(args.FullReference, args.Snapshot.Value, (byte)qualityByte);
// The probe decoder takes a raw quality byte. Recover it via the canonical
// StatusCodeMap.ToQualityCategoryByte helper so the mapping lives in one
// place next to its inverse (FromQualityByte) and cannot desync silently.
var qualityByte = StatusCodeMap.ToQualityCategoryByte(args.Snapshot.StatusCode);
_probeWatcher.OnProbeValueChanged(args.FullReference, args.Snapshot.Value, qualityByte);
}
}
@@ -1030,57 +1051,81 @@ public sealed class GalaxyDriver
/// </summary>
private IGalaxyHierarchySource BuildDefaultHierarchySource()
{
var gw = _options.Gateway;
var clientOptions = new MxGatewayClientOptions
{
Endpoint = new Uri(gw.Endpoint, UriKind.Absolute),
ApiKey = ResolveApiKey(gw.ApiKeySecretRef),
UseTls = gw.UseTls,
CaCertificatePath = gw.CaCertificatePath,
ConnectTimeout = TimeSpan.FromSeconds(gw.ConnectTimeoutSeconds),
DefaultCallTimeout = TimeSpan.FromSeconds(gw.DefaultCallTimeoutSeconds),
StreamTimeout = gw.StreamTimeoutSeconds > 0
? TimeSpan.FromSeconds(gw.StreamTimeoutSeconds)
: null,
};
_ownedRepositoryClient = GalaxyRepositoryClient.Create(clientOptions);
// Driver.Galaxy-009 fix: reuse a client that StartDeployWatcher may have already
// created (??=) rather than always overwriting the field and leaking the first
// instance. Both paths produce equivalent clients from the same options.
_ownedRepositoryClient ??= GalaxyRepositoryClient.Create(BuildClientOptions(_options.Gateway));
return new TracedGalaxyHierarchySource(
new GatewayGalaxyHierarchySource(_ownedRepositoryClient), _options.MxAccess.ClientName);
}
public void Dispose()
/// <summary>
/// Asynchronous disposal. Prefer <c>await using</c> over <c>using</c> — the
/// async path does not block the caller while awaiting EventPump / session /
/// client shutdown (Driver.Galaxy-007: the sync path blocked on
/// <c>GetAwaiter().GetResult()</c> for every async sub-component, risking a
/// deadlock under thread-pool starvation).
/// </summary>
public async ValueTask DisposeAsync()
{
if (_disposed) return;
_disposed = true;
// Order: stop deploy watcher, supervisor, probe watcher, pump, then sessions and
// clients. Each step is best-effort — disposal during a faulted state shouldn't
// throw and prevent the rest of the cleanup.
// Synchronous sub-components first — none of these block.
try { _deployWatcher?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "DeployWatcher dispose failed"); }
try { _supervisor?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "ReconnectSupervisor dispose failed"); }
try { _probeWatcher?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "ProbeWatcher dispose failed"); }
try { _transportForwarder?.Dispose(); } catch (Exception ex) { _logger.LogWarning(ex, "Transport forwarder dispose failed"); }
// Async sub-components: await each so we don't block a thread-pool thread
// on a slow shutdown (e.g. EventPump draining its channel, gRPC stream closing).
EventPump? pump;
lock (_pumpLock) { pump = _eventPump; _eventPump = null; }
pump?.DisposeAsync().AsTask().GetAwaiter().GetResult();
if (pump is not null)
{
try { await pump.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "EventPump dispose failed"); }
}
IGalaxyAlarmFeed? alarmFeed;
lock (_alarmFeedLock) { alarmFeed = _alarmFeed; _alarmFeed = null; }
try { alarmFeed?.DisposeAsync().AsTask().GetAwaiter().GetResult(); }
catch (Exception ex) { _logger.LogWarning(ex, "Alarm feed dispose failed"); }
if (alarmFeed is not null)
{
try { await alarmFeed.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "Alarm feed dispose failed"); }
}
_ownedMxSession?.DisposeAsync().AsTask().GetAwaiter().GetResult();
_ownedMxSession = null;
if (_ownedMxSession is not null)
{
try { await _ownedMxSession.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "MxSession dispose failed"); }
_ownedMxSession = null;
}
_ownedMxClient?.DisposeAsync().AsTask().GetAwaiter().GetResult();
_ownedMxClient = null;
if (_ownedMxClient is not null)
{
try { await _ownedMxClient.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "MxClient dispose failed"); }
_ownedMxClient = null;
}
if (_ownedRepositoryClient is not null)
{
try { await _ownedRepositoryClient.DisposeAsync().ConfigureAwait(false); }
catch (Exception ex) { _logger.LogWarning(ex, "RepositoryClient dispose failed"); }
_ownedRepositoryClient = null;
}
_ownedRepositoryClient?.DisposeAsync().AsTask().GetAwaiter().GetResult();
_ownedRepositoryClient = null;
_hierarchySource = null;
}
/// <summary>
/// Synchronous disposal. Prefer <see cref="DisposeAsync"/> in async contexts —
/// this path must block on every async sub-component shutdown. Provided for
/// compatibility with <c>using</c> statements that cannot <c>await</c>.
/// </summary>
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
/// <summary>
/// Address-space builder wrapper that records each variable's
/// <see cref="DriverAttributeInfo.SecurityClass"/> into the supplied dictionary
@@ -1,4 +1,5 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -75,18 +76,20 @@ internal static class StatusCodeMap
};
/// <summary>
/// Map a gateway-reported <see cref="MxStatusProxy"/> to OPC UA StatusCode. Honors
/// the success flag, then the detail byte (treated as a quality substatus), with a
/// transport-error fallback for status rows whose detected_by indicates the failure
/// happened before the MXAccess call ran.
/// Map a gateway-reported <see cref="MxStatusProxy"/> to OPC UA StatusCode. Uses
/// <see cref="MxStatusProxyExtensions.IsSuccess"/> (category == OK AND success != 0)
/// as the canonical success test — the proto contract explicitly documents that
/// <c>success</c> is NOT a boolean and must not be checked in isolation; category is
/// the authoritative indicator. On failure, the detail byte (OPC DA quality substatus)
/// drives the specific code, with a transport-error fallback for pre-MXAccess failures.
/// </summary>
public static uint FromMxStatus(MxStatusProxy? status, ILogger? logger = null)
{
if (status is null) return Good;
if (status.Success != 0) return Good;
if (status.IsSuccess()) return Good;
// Detail field carries the substatus when the worker translated MX-style codes;
// when zero, infer from category + detected_by.
// when zero, infer from detected_by.
var detail = (byte)(status.Detail & 0xFF);
if (detail != 0) return FromQualityByte(detail, logger);
@@ -98,6 +101,25 @@ internal static class StatusCodeMap
return Bad;
}
/// <summary>
/// Convert an OPC UA status-code uint back to the OPC DA quality category
/// byte — Good=192, Uncertain=64, Bad=0 — by extracting the top-two bits of the
/// high word. This is the inverse of the category-bucket arm of
/// <see cref="FromQualityByte"/>. It is intentionally lossy (substatus bits are not
/// round-tripped) because the sole consumer
/// (<see cref="ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Health.PerPlatformProbeWatcher"/>)
/// only tests <c>qualityByte &lt; 192</c> to distinguish Running from Stopped. Keeping
/// the round-trip in one place means a future change to the OPC UA bit layout cannot
/// silently desync the probe-health decode.
/// </summary>
public static byte ToQualityCategoryByte(uint statusCode) =>
(byte)(((statusCode >> 30) & 0x3u) switch
{
0u => 192u, // Good — top two bits 00b → OPC DA 0xC0
1u => 64u, // Uncertain — top two bits 01b → OPC DA 0x40
_ => 0u, // Bad — top two bits 10b/11b → OPC DA 0x00
});
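A few worked inputs make the lossy category mapping concrete. The sketch below copies the switch from the helper above into a hypothetical `QualitySketch` class so it runs on its own; the sample status codes are standard OPC UA values chosen for illustration.

```csharp
using System;

public static class QualitySketch
{
    // Same logic as the helper in the diff: only the top two bits are inspected.
    public static byte ToQualityCategoryByte(uint statusCode) =>
        (byte)(((statusCode >> 30) & 0x3u) switch
        {
            0u => 192u, // Good
            1u => 64u,  // Uncertain
            _ => 0u,    // Bad
        });

    public static void Main()
    {
        Console.WriteLine(ToQualityCategoryByte(0x00000000u)); // 192 (Good)
        Console.WriteLine(ToQualityCategoryByte(0x00A60000u)); // 192 (a Good substatus collapses to Good)
        Console.WriteLine(ToQualityCategoryByte(0x40000000u)); // 64  (Uncertain)
        Console.WriteLine(ToQualityCategoryByte(0x80050000u)); // 0   (Bad)
    }
}
```

The second line shows the lossiness: any Good substatus maps to the same byte as plain Good, which is sufficient for a consumer that only tests `qualityByte < 192`.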
private static uint Categorize(byte q, ILogger? logger)
{
if (q >= 192) { Log(logger, q, "Good"); return Good; }
@@ -28,14 +28,18 @@ public sealed class FrameReader : IDisposable
if (length < 0 || length > Framing.MaxFrameBodyBytes)
throw new InvalidDataException($"Sidecar IPC frame length {length} out of range.");
var kindByte = _stream.ReadByte();
if (kindByte < 0) throw new EndOfStreamException("EOF after length prefix, before kind byte.");
// Read the kind byte asynchronously and cancellably — a synchronous ReadByte()
// blocks the thread-pool thread and cannot be interrupted by the call-timeout token
// if the peer stalls mid-frame (finding 005).
var kindBuffer = new byte[Framing.KindByteSize];
if (!await ReadExactAsync(kindBuffer, ct).ConfigureAwait(false))
throw new EndOfStreamException("EOF after length prefix, before kind byte.");
var body = new byte[length];
if (!await ReadExactAsync(body, ct).ConfigureAwait(false))
throw new EndOfStreamException("EOF mid-frame.");
return ((MessageKind)(byte)kindByte, body);
return ((MessageKind)kindBuffer[0], body);
}
public static T Deserialize<T>(byte[] body) => MessagePackSerializer.Deserialize<T>(body);
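The body of `ReadExactAsync` is not part of this hunk. A minimal sketch of what such a helper typically looks like, assuming it loops until the buffer is full and returns `false` on early EOF, is shown below; `FrameReadSketch` and the explicit `stream` parameter are assumptions made so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FrameReadSketch
{
    // Fills the buffer completely, or returns false if the stream ends first.
    // Every read observes the token, so a stalled peer cannot pin the caller.
    public static async Task<bool> ReadExactAsync(Stream stream, byte[] buffer, CancellationToken ct)
    {
        var offset = 0;
        while (offset < buffer.Length)
        {
            var read = await stream.ReadAsync(buffer.AsMemory(offset), ct).ConfigureAwait(false);
            if (read == 0) return false; // EOF before the buffer was filled
            offset += read;
        }
        return true;
    }

    public static async Task Main()
    {
        using var stream = new MemoryStream(new byte[] { 0x02, 0xAA, 0xBB });
        var kind = new byte[1];
        var body = new byte[2];
        Console.WriteLine(await ReadExactAsync(stream, kind, CancellationToken.None)); // True
        Console.WriteLine(await ReadExactAsync(stream, body, CancellationToken.None)); // True
        Console.WriteLine(await ReadExactAsync(stream, kind, CancellationToken.None)); // False
    }
}
```

Reading the one-byte kind field through the same helper costs a small allocation per frame, but it keeps the whole frame read cancellable, which the synchronous `ReadByte()` call was not.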
@@ -135,9 +135,8 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
var requested = DateTime.SpecifyKind(timestampsUtc[i], DateTimeKind.Utc);
if (byTicks.TryGetValue(requested.Ticks, out var dto))
{
var value = dto.ValueBytes is null ? null : MessagePackSerializer.Deserialize<object>(dto.ValueBytes);
result[i] = new DataValueSnapshot(
Value: value,
Value: DeserializeSampleValue(dto.ValueBytes),
StatusCode: QualityMapper.Map(dto.Quality),
SourceTimestampUtc: requested,
ServerTimestampUtc: DateTime.UtcNow);
@@ -195,6 +194,28 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
// ===== IAlarmHistorianWriter =====
/// <summary>
/// Writes a batch of alarm events to the Wonderware historian via the sidecar.
/// </summary>
/// <remarks>
/// <para>
/// <b>PermanentFail limitation (finding 002):</b> this writer never returns
/// <see cref="HistorianWriteOutcome.PermanentFail"/>. The sidecar wire contract
/// (<see cref="WriteAlarmEventsReply.PerEventOk"/>) carries only a per-event
/// boolean (succeeded / did-not-succeed) and provides no unrecoverable vs.
/// transient distinction. A poison event that the historian SDK can never persist
/// (e.g. a permanently malformed row) will therefore retry indefinitely inside the
/// store-and-forward drain worker rather than being moved to the dead-letter table.
/// Extending the protocol to add a per-event status enum (Ack / Retry / Permanent)
/// requires a coordinated additive change to the .NET 4.8 sidecar and is tracked as
/// a follow-up. Until then, the drain worker's own retry-count limit is the
/// backstop against an infinite loop.
/// </para>
/// <para>
/// Transport or deserialization failures return <see cref="HistorianWriteOutcome.RetryPlease"/>
/// for every event in the batch; the drain worker's backoff controls recovery.
/// </para>
/// </remarks>
public async Task<IReadOnlyList<HistorianWriteOutcome>> WriteBatchAsync(
IReadOnlyList<AlarmHistorianEvent> batch, CancellationToken cancellationToken)
{
@@ -224,6 +245,8 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
}
// Per-event status: PerEventOk[i] = true → Ack; false → RetryPlease.
// NOTE: PermanentFail is never emitted — see <remarks> for the wire-contract
// limitation and why poison events currently retry rather than dead-letter.
var outcomes = new HistorianWriteOutcome[batch.Count];
for (var i = 0; i < batch.Count; i++)
{
@@ -235,13 +258,25 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
catch
{
// Transport / deserialization failure — every event is retry-please. The drain
// worker's backoff handles recovery.
// worker's backoff handles recovery. PermanentFail is never emitted (see <remarks>).
var fail = new HistorianWriteOutcome[batch.Count];
Array.Fill(fail, HistorianWriteOutcome.RetryPlease);
return fail;
}
}
// ===== Constants =====
/// <summary>
/// Per-sample ValueBytes size cap. MessagePack with the default
/// <see cref="MessagePack.Resolvers.StandardResolver"/> (primitive-only — no typeless
/// or dynamic-type resolution) is not susceptible to type-confusion gadget chains, but
/// we still cap the per-sample byte budget to guard against a buggy or unexpectedly
/// large peer payload. 64 KiB is well above any primitive historian value.
/// (Finding 007 — NuGetAuditSuppress GHSA-37gx-xxp4-5rgx / GHSA-w3x6-4m5h-cxqf.)
/// </summary>
private const int MaxValueBytesPerSample = 64 * 1024;
// ===== Helpers =====
private async Task<TReply> Invoke<TRequest, TReply>(
@@ -310,6 +345,26 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
}
}
/// <summary>
/// Deserializes a sample's value bytes using the MessagePack default
/// <see cref="MessagePack.Resolvers.StandardResolver"/> (primitive types only — no
/// typeless or dynamic-type resolution). A per-sample size cap guards against a
/// hostile or buggy peer sending an unexpectedly large payload before deserialization
/// allocates memory for it. (Finding 007.)
/// </summary>
private static object? DeserializeSampleValue(byte[]? valueBytes)
{
if (valueBytes is null) return null;
if (valueBytes.Length > MaxValueBytesPerSample)
throw new InvalidDataException(
$"Sidecar sample ValueBytes length {valueBytes.Length} exceeds the {MaxValueBytesPerSample}-byte cap.");
// Deserialize with the default resolver, which maps MessagePack primitives only
// (bool, int, long, float, double, string, byte[], DateTime, etc.). The typeless
// resolver (TypelessContractlessStandardResolver) is never used at this call site,
// so no type-confusion gadget chain is reachable from here.
return MessagePackSerializer.Deserialize<object>(valueBytes);
}
private static IReadOnlyList<DataValueSnapshot> ToSnapshots(HistorianSampleDto[] dtos)
{
if (dtos.Length == 0) return [];
@@ -317,9 +372,8 @@ public sealed class WonderwareHistorianClient : IHistorianDataSource, IAlarmHist
for (var i = 0; i < dtos.Length; i++)
{
var dto = dtos[i];
var value = dto.ValueBytes is null ? null : MessagePackSerializer.Deserialize<object>(dto.ValueBytes);
snapshots[i] = new DataValueSnapshot(
Value: value,
Value: DeserializeSampleValue(dto.ValueBytes),
StatusCode: QualityMapper.Map(dto.Quality),
SourceTimestampUtc: new DateTime(dto.TimestampUtcTicks, DateTimeKind.Utc),
ServerTimestampUtc: DateTime.UtcNow);
@@ -27,6 +27,16 @@
</ItemGroup>
<ItemGroup>
<!--
GHSA-37gx-xxp4-5rgx (MessagePack — unsafe deserialization via dynamic code generation)
GHSA-w3x6-4m5h-cxqf (MessagePack — TypelessContractlessStandardResolver gadget chain)
Neither advisory applies to this module's usage: all deserialization here uses the
default StandardResolver (primitive types only). TypelessContractlessStandardResolver
is never referenced and no DynamicUnion / DynamicGenericResolver is registered.
DeserializeSampleValue() enforces a 64 KiB per-sample ValueBytes cap (finding 007).
Revisit once MessagePack 3.x is available and drop these suppressions at that time.
-->
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
</ItemGroup>
@@ -316,15 +316,9 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
var result = query.QueryResult;
var timestamp = DateTime.SpecifyKind(result.StartDateTime, DateTimeKind.Utc);
object? value;
if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)
value = result.StringValue;
else
value = result.Value;
results.Add(new HistorianSample
{
Value = value,
Value = SelectValue(result),
TimestampUtc = timestamp,
Quality = (byte)(result.OpcQuality & 0xFF),
});
@@ -379,6 +373,12 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
return Task.FromResult(results);
}
// Apply the same bucket cap as the raw-read path so a wide time range with a
// small IntervalMs cannot produce an unbounded result set that would overflow
// the 16 MiB FrameWriter frame cap and lose the entire reply.
var bucketLimit = _config.MaxValuesPerRead;
var bucketCount = 0;
while (query.MoveNext(out error))
{
ct.ThrowIfCancellationRequested();
@@ -392,6 +392,15 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
Value = value,
TimestampUtc = timestamp,
});
bucketCount++;
if (bucketLimit > 0 && bucketCount >= bucketLimit)
{
Log.Warning(
"HistoryRead aggregate ({Aggregate}): {Tag} truncated at {Limit} buckets — widen IntervalMs or reduce time range",
aggregateColumn, tagName, bucketLimit);
break;
}
}
query.EndQuery(out _);
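The bucket cap in this hunk follows a simple "count, then break" shape. The sketch below isolates that shape over a plain sequence; `CapSketch.TakeCapped` is a hypothetical helper, and the `truncated` flag stands in for the warning that the real code logs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapSketch
{
    // limit <= 0 means "no cap", mirroring the bucketLimit > 0 guard in the diff.
    public static List<T> TakeCapped<T>(IEnumerable<T> rows, int limit, out bool truncated)
    {
        truncated = false;
        var results = new List<T>();
        foreach (var row in rows)
        {
            results.Add(row);
            if (limit > 0 && results.Count >= limit)
            {
                truncated = true; // the real code logs a warning at this point
                break;
            }
        }
        return results;
    }

    public static void Main()
    {
        var capped = TakeCapped(Enumerable.Range(1, 10), 3, out var truncated);
        Console.WriteLine($"{capped.Count}, {truncated}"); // 3, True

        var all = TakeCapped(Enumerable.Range(1, 10), 0, out truncated);
        Console.WriteLine($"{all.Count}, {truncated}");    // 10, False
    }
}
```

One consequence of checking after the add: the truncation path is taken as soon as the count reaches the limit, even when the source had no further rows, so a result set of exactly `limit` buckets is also reported as truncated.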
@@ -453,15 +462,9 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
if (query.MoveNext(out error))
{
var result = query.QueryResult;
object? value;
if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)
value = result.StringValue;
else
value = result.Value;
results.Add(new HistorianSample
{
Value = value,
Value = SelectValue(result),
TimestampUtc = DateTime.SpecifyKind(timestamp, DateTimeKind.Utc),
Quality = (byte)(result.OpcQuality & 0xFF),
});
@@ -574,6 +577,29 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
#pragma warning restore CS0618
}
/// <summary>
/// Selects the typed value from a <see cref="HistoryQueryResult"/> row.
/// <para>
/// <b>SDK limitation:</b> <c>HistoryQueryResult</c> exposes only <c>Value</c>
/// (double) and <c>StringValue</c> (string) — there is no tag data-type field on
/// the result. The correct approach would be to branch on the tag's declared
/// data type, but the bound version of <c>aahClientManaged</c> does not surface
/// it per query result. The heuristic below is the best available: prefer
/// <c>StringValue</c> only when it is non-empty AND <c>Value</c> is zero,
/// because string tags in the Historian SDK always project to <c>Value=0</c>
/// while numeric tags may legitimately sample to zero (in which case the SDK
/// does not populate <c>StringValue</c>). A numeric tag at exactly zero with a
/// non-empty formatted <c>StringValue</c> (e.g. "0.00") would be mis-reported
/// as a string; this is a known edge case of the SDK binding.
/// </para>
/// </summary>
private static object? SelectValue(HistoryQueryResult result)
{
if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)
return result.StringValue;
return result.Value;
}
internal static double? ExtractAggregateValue(AnalogSummaryQueryResult result, string column)
{
switch (column)
@@ -153,6 +153,11 @@ public sealed class HistorianFrameHandler : IFrameHandler
private async Task HandleWriteAlarmEventsAsync(byte[] body, FrameWriter writer, CancellationToken ct)
{
var req = MessagePackSerializer.Deserialize<WriteAlarmEventsRequest>(body);
// MessagePack deserializes an absent or explicit-nil array as null, not Array.Empty.
// Normalise here so every path below can safely dereference .Length without an NRE.
req.Events ??= Array.Empty<AlarmHistorianEventDto>();
var reply = new WriteAlarmEventsReply { CorrelationId = req.CorrelationId };
if (_alarmWriter is null)
@@ -113,17 +113,62 @@ public sealed class PipeServer : IDisposable
}
}
// Backoff sequence for consecutive RunOneConnection failures: 250 ms → 500 ms →
// 1 000 ms → 2 000 ms → 4 000 ms → capped at 8 000 ms thereafter.
private static readonly TimeSpan[] BackoffSteps =
{
TimeSpan.FromMilliseconds(250),
TimeSpan.FromMilliseconds(500),
TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(4),
TimeSpan.FromSeconds(8),
};
/// <summary>
/// Maximum consecutive failures before the server gives up and lets the process exit
/// so the supervisor (NSSM / SCM) can restart the sidecar cleanly.
/// </summary>
private const int MaxConsecutiveFailures = 20;
/// <summary>
/// Runs the server continuously, handling one connection at a time. When a connection
/// ends (clean or error), accepts the next.
/// ends (clean or error), waits with exponential backoff before accepting the next.
/// If <see cref="MaxConsecutiveFailures"/> consecutive failures occur the method
/// throws so the supervisor can restart the sidecar.
/// </summary>
public async Task RunAsync(IFrameHandler handler, CancellationToken ct)
{
var consecutiveFailures = 0;
while (!ct.IsCancellationRequested)
{
try { await RunOneConnectionAsync(handler, ct).ConfigureAwait(false); }
try
{
await RunOneConnectionAsync(handler, ct).ConfigureAwait(false);
consecutiveFailures = 0; // a clean connection (even a short-lived one) resets the counter
}
catch (OperationCanceledException) { break; }
catch (Exception ex) { _logger.Error(ex, "Sidecar IPC connection loop error — accepting next"); }
catch (Exception ex)
{
consecutiveFailures++;
if (consecutiveFailures >= MaxConsecutiveFailures)
{
_logger.Fatal(ex,
"Sidecar IPC connection loop failed {Count} consecutive times — giving up so supervisor can restart",
consecutiveFailures);
throw;
}
var delay = BackoffSteps[Math.Min(consecutiveFailures - 1, BackoffSteps.Length - 1)];
_logger.Error(ex,
"Sidecar IPC connection loop error (consecutive failure {Count}/{Max}) — retrying in {Delay}",
consecutiveFailures, MaxConsecutiveFailures, delay);
try { await Task.Delay(delay, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { break; }
}
}
}
@@ -85,11 +85,26 @@ public static class ModbusAddressParser
// else surfaces a clear error in whichever slot it lands.
if (parts.Length == 3)
{
// Driver.Modbus.Addressing-002: reject an empty 3rd field (e.g. "40001:F:") rather
// than silently dropping it. Enumerable.All returns true for an empty sequence, so
// without this guard the empty string would be classified as a valid array count and
// then quietly ignored, leaving the user with no diagnostic for a typo'd trailing colon.
if (parts[2].Length == 0)
{
error = $"3rd field is empty in '{address}' — use 4-field form '40001:F::5' to specify an array count with default byte order, or remove the trailing ':'";
return false;
}
if (LooksLikeByteOrderToken(parts[2])) orderPart = parts[2];
else if (parts[2].All(char.IsDigit)) countPart = parts[2];
else
{
error = $"3rd field '{parts[2]}' must be a 4-letter byte order (ABCD/CDAB/BADC/DCBA) or a positive integer array count in '{address}'";
// Driver.Modbus.Addressing-003: when TryParseByteOrder would fail on a 4-letter
// token that looks like a type code (e.g. BOOL), improve the diagnostic so the
// user knows field 3 is a byte order and field 2 is the type.
var mightBeTypeCode = parts[2].Length == 4 && parts[2].All(char.IsLetterOrDigit);
error = mightBeTypeCode
? $"3rd field '{parts[2]}' looks like a type code — type belongs in field 2 (e.g. '40001:BOOL'), not field 3. Field 3 must be a 4-letter byte order (ABCD/CDAB/BADC/DCBA) or a positive integer array count in '{address}'"
: $"3rd field '{parts[2]}' must be a 4-letter byte order (ABCD/CDAB/BADC/DCBA) or a positive integer array count in '{address}'";
return false;
}
}
@@ -180,10 +195,26 @@ public static class ModbusAddressParser
}
// Optional bit suffix: '.N' at the end, N in 0..15. Strip before parsing region/offset.
var dotIdx = text.IndexOf('.');
// Driver.Modbus.Addressing-004: use LastIndexOf so a multi-dot input like "40001.5.3"
// splits into address segment "40001.5" and bit text "3"; the multiple-dots check below
// then rejects it with a precise diagnostic. Also validate that
// the address segment is non-empty (a leading dot like ".5" is not a valid Modbus addr).
var dotIdx = text.LastIndexOf('.');
var addrText = dotIdx < 0 ? text : text[..dotIdx];
if (dotIdx >= 0)
{
if (addrText.Length == 0)
{
error = $"Region/offset segment is empty before bit suffix '.{text[(dotIdx + 1)..]}' in '{text}'";
return false;
}
// Assert exactly one dot: if the remaining address still contains a dot the
// user typed something like "400.01.5" — give a precise "multiple dots" diagnostic.
if (addrText.Contains('.'))
{
error = $"Address segment '{addrText}' contains multiple dots; expected at most one '.bit' suffix in '{text}'";
return false;
}
var bitText = text[(dotIdx + 1)..];
if (!byte.TryParse(bitText, NumberStyles.None, CultureInfo.InvariantCulture, out var bitVal) || bitVal > 15)
{
@@ -197,8 +228,15 @@ public static class ModbusAddressParser
// syntax first. Successful native parse wins; failure falls through to Modicon / mnemonic.
// The order matters for cross-family ambiguity: DL205 'C100' is a control relay, not a
// Modicon coil, when the user has explicitly selected DL205.
if (family != ModbusFamily.Generic && TryParseFamilyNative(addrText, family, melsecSubFamily, out region, out offset, out error))
return true;
string? familyNativeError = null;
if (family != ModbusFamily.Generic)
{
if (TryParseFamilyNative(addrText, family, melsecSubFamily, out region, out offset, out familyNativeError))
{
error = null;
return true;
}
}
// Try mnemonic prefix first (HR, IR, C, DI). Cheaper than the digit branch and
// unambiguous when present.
@@ -209,7 +247,14 @@ public static class ModbusAddressParser
if (ModbusModiconAddress.TryParse(addrText, out region, out offset, out error))
return true;
// Both branches failed; the Modicon error is the more specific diagnostic.
// Driver.Modbus.Addressing-005: when a non-Generic family was configured and the
// family-native parser set a specific error (meaning the address matched a recognised
// family prefix but the value was invalid, e.g. "contains non-octal digit"), prefer
// that error over the generic Modicon fallback diagnostic, which otherwise says
// "must be 5 or 6 digits" for something the user clearly intended as a V-address.
if (familyNativeError is not null)
error = familyNativeError;
return false;
}
@@ -423,7 +468,14 @@ public static class ModbusAddressParser
if ((int)order == -1)
{
error = $"Unknown byte order '{text}'. Valid: ABCD, CDAB, BADC, DCBA";
// Driver.Modbus.Addressing-003: if the unknown token looks like a known type code
// (BOOL, REAL, DINT, UINT, or a short token of at most 6 characters starting with STR),
// produce a diagnostic that directs the user to put the type in field 2, not field 3.
var isKnownTypeCode = text.ToUpperInvariant() is "BOOL" or "REAL" or "DINT" or "UINT"
|| (text.Length <= 6 && text.StartsWith("STR", StringComparison.OrdinalIgnoreCase));
error = isKnownTypeCode
? $"'{text}' looks like a type code; type belongs in field 2 (e.g. '40001:{text.ToUpperInvariant()}'), not field 3. Field 3 must be a 4-letter byte order (ABCD/CDAB/BADC/DCBA)"
: $"Unknown byte order '{text}'. Valid: ABCD, CDAB, BADC, DCBA";
return false;
}
@@ -170,24 +170,9 @@ public sealed class ModbusDriver
public async Task ShutdownAsync(CancellationToken cancellationToken)
{
try { _probeCts?.Cancel(); } catch { }
_probeCts?.Dispose();
_probeCts = null;
try { _reprobeCts?.Cancel(); } catch { }
_reprobeCts?.Dispose();
_reprobeCts = null;
// #151 — clear the prohibition set on shutdown so an explicit operator restart
// (ReinitializeAsync) starts with a clean slate. The re-probe loop already retries
// automatically when enabled; the restart path is the manual escape hatch.
lock (_autoProhibitedLock) _autoProhibited.Clear();
await _poll.DisposeAsync().ConfigureAwait(false);
if (_transport is not null) await _transport.DisposeAsync().ConfigureAwait(false);
_transport = null;
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
var lastRead = _health.LastSuccessfulRead;
await TeardownAsync().ConfigureAwait(false);
_health = new DriverHealth(DriverState.Unknown, lastRead, null);
}
public DriverHealth GetHealth() => _health;
@@ -327,6 +312,10 @@ public sealed class ModbusDriver
/// </summary>
private static object DecodeBitArray(ReadOnlySpan<byte> bitmap, int count, bool isArray)
{
// Driver.Modbus-005: guard against empty bitmap (already validated upstream but defensive
// here so the IndexOutOfRangeException path is explicitly closed at decode time too).
if (bitmap.IsEmpty)
throw new InvalidDataException("Modbus bit response produced an empty bitmap — cannot decode coil value");
if (!isArray) return (bitmap[0] & 0x01) == 1;
var result = new bool[count];
for (var i = 0; i < count; i++)
@@ -525,6 +514,14 @@ public sealed class ModbusDriver
catch (OperationCanceledException) { return; }
try { await RunReprobeOnceForTestAsync(ct).ConfigureAwait(false); }
catch (OperationCanceledException) when (ct.IsCancellationRequested) { return; }
catch (ObjectDisposedException) when (ct.IsCancellationRequested)
{
// Driver.Modbus-006: ShutdownAsync disposes the transport while we may be
// mid-pass. An ObjectDisposedException from the disposed transport is the
// expected shutdown race — swallow it here so the fire-and-forget task
// exits cleanly rather than faulting with the wrong failure mode.
return;
}
}
}
@@ -785,7 +782,20 @@ public sealed class ModbusDriver
var pdu = new byte[] { fc, (byte)(address >> 8), (byte)(address & 0xFF),
(byte)(quantity >> 8), (byte)(quantity & 0xFF) };
var resp = await transport.SendAsync(unitId, pdu, ct).ConfigureAwait(false);
// resp = [fc][byte-count][data...]
// resp = [fc][byte-count][data...] — validate before indexing to surface a clean error
// rather than an IndexOutOfRangeException when a device returns a truncated PDU.
// Driver.Modbus-005: guard resp.Length >= 2 (fc + byte-count) and that the payload is
// at least as long as the declared byte-count, matching the quantity we requested.
if (resp.Length < 2)
throw new InvalidDataException(
$"Modbus register response too short: expected at least 2 bytes (fc+bytecount), got {resp.Length}");
if (resp.Length < 2 + resp[1])
throw new InvalidDataException(
$"Modbus register response truncated: byte-count field declares {resp[1]} bytes but only {resp.Length - 2} available");
var expectedByteCount = quantity * 2;
if (resp[1] != expectedByteCount)
throw new InvalidDataException(
$"Modbus register response byte-count mismatch: requested {quantity} registers ({expectedByteCount} bytes), got {resp[1]} bytes");
var data = new byte[resp[1]];
Buffer.BlockCopy(resp, 2, data, 0, resp[1]);
return data;
@@ -797,6 +807,17 @@ public sealed class ModbusDriver
var pdu = new byte[] { fc, (byte)(address >> 8), (byte)(address & 0xFF),
(byte)(qty >> 8), (byte)(qty & 0xFF) };
var resp = await transport.SendAsync(unitId, pdu, ct).ConfigureAwait(false);
// Driver.Modbus-005: validate the response is structurally sound before indexing.
if (resp.Length < 2)
throw new InvalidDataException(
$"Modbus bit response too short: expected at least 2 bytes (fc+bytecount), got {resp.Length}");
if (resp.Length < 2 + resp[1])
throw new InvalidDataException(
$"Modbus bit response truncated: byte-count field declares {resp[1]} bytes but only {resp.Length - 2} available");
var expectedByteCount = (qty + 7) / 8;
if (resp[1] < expectedByteCount)
throw new InvalidDataException(
$"Modbus bit response byte-count mismatch: requested {qty} bits ({expectedByteCount} bytes), got {resp[1]} bytes");
var bitmap = new byte[resp[1]];
Buffer.BlockCopy(resp, 2, bitmap, 0, resp[1]);
return bitmap;
@@ -1471,8 +1492,40 @@ public sealed class ModbusDriver
};
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
/// <summary>
/// Driver.Modbus-004: DisposeAsync must perform the same teardown as ShutdownAsync so
/// callers that use <c>await using</c> (without an explicit <c>ShutdownAsync</c>) do not
/// leak the probe loop, re-probe loop, and poll-engine background tasks. Shares
/// <see cref="TeardownAsync"/> with <see cref="ShutdownAsync"/> to keep them in sync.
/// </summary>
public async ValueTask DisposeAsync()
{
await TeardownAsync().ConfigureAwait(false);
}
/// <summary>
/// Shared teardown helper used by both <see cref="ShutdownAsync"/> and
/// <see cref="DisposeAsync"/>. Cancels both background loops, disposes the poll engine,
/// and disposes the transport. Idempotent — safe to call more than once.
/// </summary>
private async Task TeardownAsync()
{
try { _probeCts?.Cancel(); } catch { }
_probeCts?.Dispose();
_probeCts = null;
try { _reprobeCts?.Cancel(); } catch { }
_reprobeCts?.Dispose();
_reprobeCts = null;
_tagsByName.Clear();
_lastPublishedByRef.Clear();
lock (_lastWrittenLock) _lastWrittenByRef.Clear();
lock (_autoProhibitedLock) _autoProhibited.Clear();
await _poll.DisposeAsync().ConfigureAwait(false);
if (_transport is not null) await _transport.DisposeAsync().ConfigureAwait(false);
_transport = null;
}
@@ -1,3 +1,5 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Opc.Ua;
using Opc.Ua.Client;
using Opc.Ua.Configuration;
@@ -26,9 +28,23 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient;
/// monitored-item handles. That mechanic lands in PR 69.
/// </para>
/// </remarks>
public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string driverInstanceId)
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IAlarmSource, IHistoryProvider, IDisposable, IAsyncDisposable
public sealed class OpcUaClientDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IAlarmSource, IHistoryProvider, IDisposable, IAsyncDisposable
{
private readonly ILogger<OpcUaClientDriver> _logger;
/// <param name="options">Driver configuration.</param>
/// <param name="driverInstanceId">Stable logical ID from the config DB.</param>
/// <param name="logger">Optional logger; defaults to NullLogger when not supplied.</param>
public OpcUaClientDriver(OpcUaClientDriverOptions options, string driverInstanceId,
ILogger<OpcUaClientDriver>? logger = null)
{
_options = options;
_driverInstanceId = driverInstanceId;
_logger = logger ?? NullLogger<OpcUaClientDriver>.Instance;
}
private readonly OpcUaClientDriverOptions _options;
private readonly string _driverInstanceId;
// ---- IAlarmSource state ----
private readonly System.Collections.Concurrent.ConcurrentDictionary<long, RemoteAlarmSubscription> _alarmSubscriptions = new();
@@ -55,7 +71,6 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
private const uint StatusBadInternalError = 0x80020000u;
private const uint StatusBadCommunicationError = 0x80050000u;
private readonly OpcUaClientDriverOptions _options = options;
private readonly SemaphoreSlim _gate = new(1, 1);
/// <summary>Active OPC UA session. Null until <see cref="InitializeAsync"/> returns cleanly.</summary>
@@ -69,6 +84,22 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
/// <summary>URL of the endpoint the driver actually connected to. Exposed via <see cref="HostName"/>.</summary>
private string? _connectedEndpointUrl;
/// <summary>
/// Cert-validation delegate wired when <see cref="OpcUaClientDriverOptions.AutoAcceptCertificates"/>
/// is <c>true</c>. Stored so <see cref="Dispose"/> / <see cref="DisposeAsync"/> can
/// detach it from the (potentially process-shared) <see cref="CertificateValidator"/>
/// and avoid leaking the closure (Driver.OpcUaClient-012).
/// </summary>
private CertificateValidationEventHandler? _certValidationHandler;
/// <summary>The <see cref="CertificateValidator"/> that owns <see cref="_certValidationHandler"/>.</summary>
private CertificateValidator? _certValidatorRef;
/// <summary>
/// Approximate count of discovered nodes (folders + variables). Updated by
/// <see cref="DiscoverAsync"/> and used to report a non-zero
/// <see cref="GetMemoryFootprint"/> to the Core allocation-slope detector
/// (Driver.OpcUaClient-013).
/// </summary>
private volatile int _discoveredNodeCount;
/// <summary>
/// SDK-provided reconnect handler that owns the retry loop + session-transfer machinery
/// when the session's keep-alive channel reports a bad status. Null outside the
/// reconnecting window; constructed lazily inside the keep-alive handler. Guarded by
@@ -87,7 +118,7 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
/// </summary>
private NamespaceMap? _namespaceMap;
public string DriverInstanceId => driverInstanceId;
public string DriverInstanceId => _driverInstanceId;
public string DriverType => "OpcUaClient";
public async Task InitializeAsync(string driverConfigJson, CancellationToken cancellationToken)
@@ -227,16 +258,27 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
await config.ValidateAsync(ApplicationType.Client, ct).ConfigureAwait(false);
// Attach a cert-validator handler that honours the AutoAccept flag. Without this,
// AutoAcceptUntrustedCertificates on the config alone isn't always enough in newer
// SDK versions — the validator raises an event the app has to handle.
// AutoAccept=true is a dev-only escape hatch. Emit a prominent warning so a
// production misconfiguration is immediately visible in logs (Driver.OpcUaClient-012).
if (_options.AutoAcceptCertificates)
{
config.CertificateValidator.CertificateValidation += (s, e) =>
{
if (e.Error.StatusCode == StatusCodes.BadCertificateUntrusted)
e.Accept = true;
};
_logger.LogWarning(
"OpcUaClientDriver '{DriverInstanceId}': AutoAcceptCertificates=true — all " +
"remote server certificate errors are accepted, including expired / wrong-host " +
"/ chain-incomplete. This MUST be false in production to prevent MITM attacks " +
"against the opc.tcp channel.",
_driverInstanceId);
// Accept the full set of certificate-validation error codes: a real dev cert can
// fail with BadCertificateChainIncomplete, BadCertificateTimeInvalid, or
// BadCertificateHostNameInvalid, not only BadCertificateUntrusted. Only accepting
// the latter would silently fail for those certs (Driver.OpcUaClient-012).
CertificateValidationEventHandler handler = (_, e) => e.Accept = true;
config.CertificateValidator.CertificateValidation += handler;
// Store refs so ShutdownAsync + Dispose can detach the delegate and avoid
// leaking a closure on a potentially process-shared validator.
_certValidationHandler = handler;
_certValidatorRef = config.CertificateValidator;
}
// Ensure an application certificate exists. The SDK auto-generates one if missing.
@@ -481,26 +523,67 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
try { handlerToCancel?.CancelReconnect(); } catch { }
handlerToCancel?.Dispose();
if (_keepAliveHandler is not null && Session is not null)
// Take the session reference under _probeLock before touching it, so we can't race
// an OnReconnectComplete that is simultaneously swapping to a new session
// (Driver.OpcUaClient-006). We clear Session to null here so any concurrent caller
// that checks inside _gate sees null immediately after shutdown begins.
ISession? sessionToClose;
lock (_probeLock)
{
try { Session.KeepAlive -= _keepAliveHandler; } catch { }
sessionToClose = Session;
if (_keepAliveHandler is not null && sessionToClose is not null)
{
try { sessionToClose.KeepAlive -= _keepAliveHandler; } catch { }
}
_keepAliveHandler = null;
Session = null;
}
_keepAliveHandler = null;
try { if (Session is Session s) await s.CloseAsync(cancellationToken).ConfigureAwait(false); }
try { if (sessionToClose is Session s) await s.CloseAsync(cancellationToken).ConfigureAwait(false); }
catch { /* best-effort */ }
try { Session?.Dispose(); } catch { }
Session = null;
try { sessionToClose?.Dispose(); } catch { }
_namespaceMap = null;
_connectedEndpointUrl = null;
// Detach the cert-validation handler so the (potentially process-shared)
// CertificateValidator doesn't hold a delegate to a shutting-down driver
// (Driver.OpcUaClient-012).
if (_certValidationHandler is not null && _certValidatorRef is not null)
{
try { _certValidatorRef.CertificateValidation -= _certValidationHandler; } catch { }
_certValidationHandler = null;
_certValidatorRef = null;
}
TransitionTo(HostState.Unknown);
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
}
public DriverHealth GetHealth() => _health;
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
/// <summary>
/// Returns an approximate in-driver memory footprint for the Core allocation-slope
/// detector. Each discovered node (folder or variable) contributes ~512 bytes to cover
/// the <see cref="DriverAttributeInfo"/> record, the browse-name string, and the stable
/// <c>nsu=</c> reference string stored in the address-space builder. The real number
/// depends on string length + box sizes; the constant is conservative enough that a
/// 10k-node remote server reports ~5 MB — well within the budget and detectable by the
/// Core slope alarm (Driver.OpcUaClient-013).
/// </summary>
public long GetMemoryFootprint() => _discoveredNodeCount * 512L;
/// <summary>
/// Drops the discovered-node count so the Core's cache-budget enforcement can request
/// a flush when footprint budget is breached. The OPC UA Client driver holds no
/// independently-flushable cache beyond what the address-space builder retains — a
/// flush here resets the footprint counter and signals the Core that re-discovery
/// will rebuild it cleanly from the remote server.
/// </summary>
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken)
{
_discoveredNodeCount = 0;
return Task.CompletedTask;
}
// ---- IReadable ----
@@ -651,8 +734,20 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
results[r] = new WriteResult(codes[w].Code);
}
}
catch (OperationCanceledException)
{
// Timeout / cancellation after the wire request may have been dispatched.
// Writes are non-idempotent (decision #44/#45) — BadTimeout ("outcome unknown,
// do not blindly retry") is more honest than BadCommunicationError ("definitely
// did not happen"). Downstream callers that need retry semantics check for
// BadTimeout and can decide whether to re-issue (Driver.OpcUaClient-009).
const uint StatusBadTimeout = 0x800A0000u;
for (var w = 0; w < indexMap.Count; w++)
results[indexMap[w]] = new WriteResult(StatusBadTimeout);
}
catch (Exception)
{
// Pre-wire transport failure — the write definitely did not reach the server.
for (var w = 0; w < indexMap.Count; w++)
results[indexMap[w]] = new WriteResult(StatusBadCommunicationError);
}
@@ -729,6 +824,10 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
// still a couple of hundred ms total since the SDK chunks ReadAsync automatically.
await EnrichAndRegisterVariablesAsync(session, pendingVariables, cancellationToken)
.ConfigureAwait(false);
// Update the footprint counter so GetMemoryFootprint() returns a real estimate
// after each discovery pass (Driver.OpcUaClient-013).
_discoveredNodeCount = discovered;
}
finally { _gate.Release(); }
}
@@ -945,9 +1044,12 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
internal static DriverDataType MapUpstreamDataType(NodeId dataType)
{
if (dataType == DataTypeIds.Boolean) return DriverDataType.Boolean;
if (dataType == DataTypeIds.SByte || dataType == DataTypeIds.Byte ||
dataType == DataTypeIds.Int16) return DriverDataType.Int16;
if (dataType == DataTypeIds.UInt16) return DriverDataType.UInt16;
// SByte (signed 8-bit) shares Int16 — DriverDataType has no narrower signed type.
// Byte (unsigned 8-bit) belongs in the unsigned family → UInt16, not Int16
// (Driver.OpcUaClient-010: mapping an unsigned 0-255 type onto Int16 misrepresents
// type metadata and confuses range/validation logic keyed off DriverDataType).
if (dataType == DataTypeIds.SByte || dataType == DataTypeIds.Int16) return DriverDataType.Int16;
if (dataType == DataTypeIds.Byte || dataType == DataTypeIds.UInt16) return DriverDataType.UInt16;
if (dataType == DataTypeIds.Int32) return DriverDataType.Int32;
if (dataType == DataTypeIds.UInt32) return DriverDataType.UInt32;
if (dataType == DataTypeIds.Int64) return DriverDataType.Int64;
@@ -1216,12 +1318,48 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
{
try
{
_ = await session.CallAsync(
var resp = await session.CallAsync(
requestHeader: null,
methodsToCall: callRequests,
ct: cancellationToken).ConfigureAwait(false);
// Inspect per-ack results — the upstream server can reject individual acks
// (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied) even when
// the batch transport succeeds. Operators acking a critical alarm deserve to
// know if the ack didn't take (Driver.OpcUaClient-008).
if (resp?.Results is not null)
{
for (var i = 0; i < resp.Results.Count; i++)
{
var result = resp.Results[i];
if (StatusCode.IsBad(result.StatusCode))
{
_logger.LogWarning(
"OpcUaClientDriver '{DriverInstanceId}': AcknowledgeAsync ack[{Index}] " +
"rejected by upstream server with StatusCode {StatusCode:X8}. " +
"The acknowledgement may not have been applied.",
_driverInstanceId, i, result.StatusCode.Code);
}
}
}
}
catch (OperationCanceledException ex)
{
// Transport-level timeout / cancellation — propagate so the caller's
// retry / re-ack mechanism can decide what to do.
_logger.LogWarning(ex,
"OpcUaClientDriver '{DriverInstanceId}': AcknowledgeAsync transport error.",
_driverInstanceId);
throw;
}
catch (Exception ex)
{
// Log genuine transport failures rather than swallowing them silently.
_logger.LogWarning(ex,
"OpcUaClientDriver '{DriverInstanceId}': AcknowledgeAsync failed; " +
"acknowledgements may not have been applied.",
_driverInstanceId);
}
catch { /* best-effort — caller's re-ack mechanism catches pathological paths */ }
}
finally { _gate.Release(); }
}
@@ -1466,25 +1604,31 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
{
if (sender is not SessionReconnectHandler handler) return;
var newSession = handler.Session;
var oldSession = Session;
// Rewire keep-alive onto the new session — without this the next drop wouldn't
// trigger another reconnect attempt.
if (oldSession is not null && _keepAliveHandler is not null)
{
try { oldSession.KeepAlive -= _keepAliveHandler; } catch { }
}
if (newSession is not null && _keepAliveHandler is not null)
{
newSession.KeepAlive += _keepAliveHandler;
}
Session = newSession;
// Retire the handler that just finished. Done under _probeLock so this can't race
// OnKeepAlive arming a fresh handler for a subsequent drop (Driver.OpcUaClient-005).
// All mutations to Session and _reconnectHandler run under _probeLock so
// OnReconnectComplete, OnKeepAlive, and ShutdownAsync cannot race each other:
// a session swap visible to concurrent ReadAsync/WriteAsync/DiscoverAsync callers
// (which re-read Session inside _gate) must be atomic w.r.t. disposal and
// re-arming (Driver.OpcUaClient-006).
ISession? oldSession;
lock (_probeLock)
{
oldSession = Session;
// Rewire keep-alive before swapping the reference so a hot keep-alive can't
// fire against the old session after we've already assigned the new one.
if (oldSession is not null && _keepAliveHandler is not null)
{
try { oldSession.KeepAlive -= _keepAliveHandler; } catch { }
}
if (newSession is not null && _keepAliveHandler is not null)
{
newSession.KeepAlive += _keepAliveHandler;
}
Session = newSession;
// Retire the handler that just finished.
if (ReferenceEquals(_reconnectHandler, handler))
{
_reconnectHandler.Dispose();
@@ -1578,7 +1722,59 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
OnHostStatusChanged?.Invoke(this, new HostStatusChangedEventArgs(HostName, old, newState));
}
public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
/// <summary>
/// Synchronous disposal. Cancels the reconnect handler and detaches the keep-alive
/// hook synchronously (no async work on this hot path), then detaches the cert-validation
/// handler. The async session-close is intentionally skipped because it requires a
/// live session + network round-trip and is unsafe to block-on from a potentially
/// single-threaded context (OPC UA stack thread). The session will be cleaned up by
/// the SDK's own finalizer on GC (Driver.OpcUaClient-007: no sync-over-async).
/// </summary>
public void Dispose()
{
if (_disposed) return;
_disposed = true;
// Cancel any in-flight reconnect handler.
SessionReconnectHandler? handlerToCancel;
lock (_probeLock)
{
handlerToCancel = _reconnectHandler;
_reconnectHandler = null;
// Detach keep-alive and null Session so in-flight gated callers see null
// after their next _gate.WaitAsync — they return BadCommunicationError cleanly.
if (_keepAliveHandler is not null && Session is not null)
{
try { Session.KeepAlive -= _keepAliveHandler; } catch { }
}
_keepAliveHandler = null;
Session = null;
}
try { handlerToCancel?.CancelReconnect(); } catch { }
handlerToCancel?.Dispose();
// Detach the cert-validation handler registered during InitializeAsync so the
// CertificateValidator (which may be process-shared) doesn't hold a reference to
// a disposed driver (Driver.OpcUaClient-012).
if (_certValidationHandler is not null && _certValidatorRef is not null)
{
try { _certValidatorRef.CertificateValidation -= _certValidationHandler; } catch { }
_certValidationHandler = null;
_certValidatorRef = null;
}
// Acquire the gate once so any in-flight gated operation (ReadAsync / WriteAsync /
// DiscoverAsync) has definitely released before we dispose the gate. Without this
// drain, a background read that calls _gate.Release() after Dispose throws
// ObjectDisposedException (Driver.OpcUaClient-007).
try
{
if (_gate.Wait(TimeSpan.FromSeconds(2)))
_gate.Release();
}
catch { /* timeout or already disposed — proceed */ }
_gate.Dispose();
}
public async ValueTask DisposeAsync()
{
@@ -1586,6 +1782,23 @@ public sealed class OpcUaClientDriver(OpcUaClientDriverOptions options, string d
_disposed = true;
try { await ShutdownAsync(CancellationToken.None).ConfigureAwait(false); }
catch { /* disposal is best-effort */ }
// Detach the cert-validation handler (Driver.OpcUaClient-012).
if (_certValidationHandler is not null && _certValidatorRef is not null)
{
try { _certValidatorRef.CertificateValidation -= _certValidationHandler; } catch { }
_certValidationHandler = null;
_certValidatorRef = null;
}
// Drain the gate before disposal so no in-flight _gate.Release() fires after
// Dispose (Driver.OpcUaClient-007).
try
{
// WaitAsync returns false on timeout; release only if the gate was actually acquired.
if (await _gate.WaitAsync(TimeSpan.FromSeconds(2)).ConfigureAwait(false))
_gate.Release();
}
catch { /* timeout or already disposed */ }
_gate.Dispose();
}
}

Some files were not shown because too many files have changed in this diff