lmxopcua/code-reviews/Driver.S7/findings.md
Joseph Doherty 090d2a4b44 fix(driver-s7): resolve High code-review findings (Driver.S7-001, -006, -007, -011)
Driver.S7-001: Timer (T{n}) / Counter (C{n}) addresses parsed cleanly but
the read path had no S7DataType or decode case for them, so a Timer/Counter
tag passed fail-fast init and then threw a misleading type-mismatch on every
read. InitializeAsync now runs RejectUnsupportedTagAddresses, throwing a clear
NotSupportedException ("not yet supported", echoing tag name + address) so the
config error fails fast at init.

Driver.S7-006: ShutdownAsync cancelled the probe/poll CTSs but did not await
the fire-and-forget loop tasks before DisposeAsync disposed _gate, letting a
loop iteration mid-semaphore race a disposed object. The probe task is now
tracked in _probeTask and each poll task in SubscriptionState.PollTask;
ShutdownAsync cancels every CTS, awaits Task.WhenAll of those handles with a
bounded 5 s DrainTimeout, then disposes the CTSs and gate. Task.Run is passed
CancellationToken.None so the handle is always awaitable.

Driver.S7-007: a PUT/GET-disabled fault (permanent misconfiguration) was
mapped identically to a transient PlcException — both BadDeviceFailure +
Degraded. ReadAsync/WriteAsync now split the catch via an IsAccessDenied
filter (S7.Net exposes no typed code for AccessingObjectNotAllowed, so the
inner-exception chain is inspected for the "not allowed" marker). Access-denied
now maps to BadNotSupported and Faulted with a config-alert message pointing
at the TIA Portal PUT/GET toggle; genuine device faults stay BadDeviceFailure.

Driver.S7-011: S7Driver ignored driverConfigJson on Initialize/Reinitialize,
so a config change delivered through ReinitializeAsync (the only Core-initiated
in-process recovery path) was silently discarded. Config parsing was factored
into S7DriverFactoryExtensions.ParseOptions; InitializeAsync now re-parses
driverConfigJson and rebuilds _options whenever the document has a real body.
An empty / placeholder document keeps the constructor options.

Adds S7DriverCodeReviewFixTests covering Timer/Counter rejection, config-json
application on Initialize/Reinitialize, and shutdown-drain with active
subscriptions. All 68 S7 driver tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:41:26 -04:00

Code Review — Driver.S7

| Field | Value |
|---|---|
| Module | src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7 |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | 76d35d1 |
| Status | Reviewed |
| Open findings | 10 |

Checklist coverage

A comprehensive review completes every category, recording "No issues found" where a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.S7-001, Driver.S7-002, Driver.S7-003 |
| 2 | OtOpcUa conventions | Driver.S7-004, Driver.S7-005 |
| 3 | Concurrency & thread safety | Driver.S7-006 |
| 4 | Error handling & resilience | Driver.S7-007, Driver.S7-008, Driver.S7-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.S7-010 |
| 7 | Design-document adherence | Driver.S7-011, Driver.S7-012 |
| 8 | Code organization & conventions | Driver.S7-013 |
| 9 | Testing coverage | Driver.S7-014 |
| 10 | Documentation & comments | Driver.S7-012 (shared) |

Findings

Driver.S7-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | S7AddressParser.cs:93, S7Driver.cs:231 |
| Status | Resolved |

Description: S7AddressParser.Parse accepts Timer (T0) and Counter (C0) addresses, and the test suite asserts that they parse successfully, but the read path cannot serve them. Two problems compound: (1) ReadOneAsync's type-mapping switch (lines 231-250) has no case for any Timer/Counter combination, so a Timer/Counter tag falls through to the default arm and throws InvalidDataException with a misleading "type-mismatch" message on every read; (2) the read is issued via plc.ReadAsync(tag.Address, ...), passing the raw address string, and S7.Net's string-based parser does not understand the T{n}/C{n} syntax. A tag configured with a timer or counter address therefore passes init-time parsing (where the docstring promises that config typos fail fast) and then fails on every read, which is exactly the undiagnosable failure mode the fail-fast parse was meant to prevent.

Recommendation: Either drop Timer/Counter from S7AddressParser and S7Area until they are wired through to S7.Net, or implement the Timer/Counter read path. If kept, reject Timer/Counter tags at InitializeAsync with a clear "not yet supported" error rather than letting them parse clean.

Resolution: Resolved 2026-05-22 — InitializeAsync now runs RejectUnsupportedTagAddresses, which throws NotSupportedException with a clear "not yet supported" message (echoing the tag name + address) for any tag whose address parses as a Timer or Counter, so the bad config fails fast at init rather than throwing a misleading type-mismatch on every read.
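For illustration, the init-time rejection described in this resolution could look roughly like the sketch below. Only the names RejectUnsupportedTagAddresses and NotSupportedException come from the review; the S7Area members, TagSketch, and ClassifyArea are hypothetical stand-ins, and the area classification is deliberately simplified (it assumes the address has already parsed cleanly).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the driver's own types.
public enum S7Area { DataBlock, Memory, Input, Output, Timer, Counter }

public sealed record TagSketch(string Name, string Address);

public static class InitValidationSketch
{
    // Simplified classification by leading letters; the real parser is richer.
    public static S7Area ClassifyArea(string address)
    {
        var a = address.Trim().ToUpperInvariant();
        if (a.StartsWith("DB", StringComparison.Ordinal)) return S7Area.DataBlock;
        return a[0] switch
        {
            'T' => S7Area.Timer,
            'C' => S7Area.Counter,
            'I' => S7Area.Input,
            'Q' => S7Area.Output,
            _ => S7Area.Memory,
        };
    }

    // Throws on the first Timer/Counter tag so the config error surfaces at init,
    // echoing the tag name and address in the message.
    public static void RejectUnsupportedTagAddresses(IEnumerable<TagSketch> tags)
    {
        foreach (var tag in tags)
        {
            var area = ClassifyArea(tag.Address);
            if (area is S7Area.Timer or S7Area.Counter)
            {
                throw new NotSupportedException(
                    $"Tag '{tag.Name}' uses {area} address '{tag.Address}', which is not yet supported.");
            }
        }
    }
}
```

Throwing on the first offending tag keeps the failure at init, which is the behaviour the recommendation asked for; collecting all offending tags into one message would be an equally reasonable variant.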

Driver.S7-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | S7Driver.cs:350 |
| Status | Open |

Description: MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32. UInt32 values above int.MaxValue (2^31-1) wrap to negative when surfaced to the OPC UA client, silently corrupting the value. The inline comment only flags Int64/UInt64 as "widens; lossy" but UInt32 to Int32 is equally lossy and is not called out.

Recommendation: Map UInt32/UInt16 to a DriverDataType wide enough to hold the unsigned range, or add the missing unsigned DriverDataType members. At minimum correct the comment so the lossiness of UInt32 is documented.
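The wrap-around is easy to reproduce in isolation. The sketch below uses hypothetical helper names and is not the driver's MapDataType; it contrasts the narrowing reinterpretation with a widening one. Whether DriverDataType has a member wide enough to carry the widened value is an assumption this sketch does not settle.

```csharp
using System;

public static class UnsignedWideningSketch
{
    // What a UInt32 -> Int32 mapping does to the bits: values above
    // int.MaxValue come out negative.
    public static int AsInt32(uint raw) => unchecked((int)raw);

    // Lossless alternative: Int64 holds the full UInt32 range.
    public static long AsInt64(uint raw) => raw;
}
```

For example, AsInt32(3_000_000_000u) yields -1294967296, while AsInt64(3_000_000_000u) yields 3000000000.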

Resolution: (open)

Driver.S7-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | S7Driver.cs:172, S7Driver.cs:255 |
| Status | Open |

Description: ReadAsync and WriteAsync dereference fullReferences.Count / writes.Count with no null guard. A null argument throws NullReferenceException rather than ArgumentNullException, and the NRE escapes before the _gate is taken so it is not wrapped in a per-item status. DiscoverAsync correctly uses ArgumentNullException.ThrowIfNull(builder); the read/write entry points are inconsistent with it.

Recommendation: Add ArgumentNullException.ThrowIfNull for the list parameters at the top of ReadAsync and WriteAsync.
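A minimal sketch of the guard follows. The method below is a hypothetical stand-in for the entry point; only the parameter name fullReferences is taken from the finding.

```csharp
using System;
using System.Collections.Generic;

public static class GuardSketch
{
    // Hypothetical stand-in for the top of ReadAsync.
    public static int CountReferences(IReadOnlyList<string>? fullReferences)
    {
        // Throws ArgumentNullException carrying the parameter name,
        // instead of a NullReferenceException on the .Count dereference.
        ArgumentNullException.ThrowIfNull(fullReferences);
        return fullReferences.Count;
    }
}
```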

Resolution: (open)

Driver.S7-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | S7Driver.cs (whole file) |
| Status | Open |

Description: The driver performs no logging. CLAUDE.md Library Preferences mandate Serilog with a rolling daily file sink. Every error path is an empty catch block (Initialize cleanup line 130, ShutdownAsync lines 142/149/153, ProbeLoop line 483, PollLoop lines 396/406, Dispose line 511). Connection faults, probe transitions, PUT/GET-disabled config errors, and poll-loop exceptions are all silently swallowed. An operator has only the DriverHealth.LastError string and no event trail to diagnose an intermittent PLC.

Recommendation: Inject an ILogger/ILoggerFactory and log connect success/failure, probe Running/Stopped transitions, PUT/GET-disabled detection, and swallowed poll-loop / shutdown exceptions.

Resolution: (open)

Driver.S7-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | S7Driver.cs:33, S7Driver.cs:433 |
| Status | Open |

Description: System.Collections.Concurrent.ConcurrentDictionary is written out with a fully-qualified namespace at the field declarations instead of a using System.Collections.Concurrent directive. ImplicitUsings is enabled and the rest of the codebase relies on using directives; the inline FQN is inconsistent with house style. Similar redundant global::S7.Net.* qualifiers appear throughout S7Driver.cs despite the file-top using S7.Net.

Recommendation: Add using System.Collections.Concurrent and drop the redundant global::S7.Net. qualifiers where using S7.Net already covers them.

Resolution: (open)

Driver.S7-006

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | S7Driver.cs:140, S7Driver.cs:457, S7Driver.cs:506 |
| Status | Resolved |

Description: Disposal races with the in-flight probe / poll tasks. ShutdownAsync calls _probeCts.Cancel() and cancels each subscription CTS, but it does not await the ProbeLoopAsync / PollLoopAsync tasks (they are fire-and-forget Task.Run calls with the task handle discarded). DisposeAsync then calls ShutdownAsync followed immediately by _gate.Dispose(). A probe or poll iteration that is between _gate.WaitAsync and _gate.Release() when cancellation fires will either call Release() (line 479) on a disposed semaphore or have WaitAsync observe one, and both throw ObjectDisposedException. The probe loop's broad catch swallows it, but the disposal-ordering bug is real: the semaphore can be disposed while a worker still holds or is waiting on it. The same applies to _probeCts.Dispose() (line 143), which can run while ProbeLoopAsync may still touch the linked token.

Recommendation: Track the probe and poll Task handles, and in ShutdownAsync (or DisposeAsync) await Task.WhenAll(...) with a bounded timeout after cancelling, before disposing _gate and the CTS objects.

Resolution: Resolved 2026-05-22 — the probe loop now stores its Task in _probeTask and each subscription records its poll Task in SubscriptionState.PollTask. ShutdownAsync cancels every CTS, awaits Task.WhenAll of those handles with a bounded 5 s DrainTimeout, and only then disposes _probeCts, the subscription CTSs, and (via DisposeAsync) _gate — so no loop can touch a disposed semaphore. Task.Run is now passed CancellationToken.None so the handle is always awaitable even if the token is already cancelled.
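The cancel, drain, then dispose ordering described in this resolution can be sketched as below. This is a simplified model under stated assumptions, not the driver's code: DrainSketch, StartLoop, and the Iterations counter are hypothetical, and a single CTS stands in for the probe and per-subscription CTSs. The 5 s DrainTimeout and the use of CancellationToken.None with Task.Run follow the resolution text.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class DrainSketch : IAsyncDisposable
{
    private static readonly TimeSpan DrainTimeout = TimeSpan.FromSeconds(5);
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly CancellationTokenSource _cts = new();
    private readonly List<Task> _loops = new();
    private int _iterations;

    public int Iterations => Volatile.Read(ref _iterations);

    public void StartLoop(TimeSpan interval)
    {
        var token = _cts.Token;
        // CancellationToken.None: the task is always created and awaitable;
        // the loop body observes the real token itself.
        _loops.Add(Task.Run(() => LoopAsync(interval, token), CancellationToken.None));
    }

    private async Task LoopAsync(TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                await _gate.WaitAsync(token);
                try { Interlocked.Increment(ref _iterations); }
                finally { _gate.Release(); }
                await Task.Delay(interval, token);
            }
        }
        catch (OperationCanceledException) { /* normal shutdown */ }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();                                             // 1. cancel
        try { await Task.WhenAll(_loops).WaitAsync(DrainTimeout); } // 2. bounded drain
        catch (TimeoutException) { /* a loop did not drain in time; dispose anyway */ }
        _cts.Dispose();                                            // 3. only now dispose
        _gate.Dispose();
    }
}
```

The bounded wait is a trade-off: it keeps shutdown from hanging on a stuck loop, at the cost of falling back to the old race if a loop exceeds the timeout.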

Driver.S7-007

| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | S7Driver.cs:200, S7DriverOptions.cs:13, docs/v2/driver-specs.md:434 |
| Status | Resolved |

Description: PUT/GET-disabled handling contradicts the design and the module's own docstring. driver-specs.md section 5 (line 434) and the S7DriverOptions class remark both state that PUT/GET-disabled must be mapped to BadNotSupported and surfaced as a configuration alert, not a transient fault, because blind retry is wasted effort. The actual code (ReadAsync, lines 200-208) catches every S7.Net.PlcException and maps it to StatusBadDeviceFailure, then sets health to Degraded. Consequences: (1) a genuinely transient PlcException (e.g. CPU briefly in STOP) is reported identically to a permanent PUT/GET misconfiguration, so the operator cannot tell a config problem from a transient one, which is the exact distinction the spec demands; (2) the promised BadNotSupported status code is never produced, so the S7DriverOptions docstring is now false.

Recommendation: Inspect PlcException.ErrorCode and map the PUT/GET-disabled / access-denied code to BadNotSupported with a distinct config-alert health state; keep BadDeviceFailure/Degraded only for genuine device-fault error codes.

Resolution: Resolved 2026-05-22 — ReadAsync / WriteAsync now split the PlcException catch via an IsAccessDenied filter. S7.Net exposes no typed error code for the S7 AccessingObjectNotAllowed status (its ValidateResponseCode throws a plain Exception wrapped as the inner exception of a PlcException), so IsAccessDenied walks the inner-exception chain for the "not allowed" marker. A PUT/GET-disabled fault now maps to BadNotSupported and sets health to Faulted with a config-alert message pointing operators at the TIA Portal PUT/GET toggle; a genuine device fault still maps to BadDeviceFailure/Degraded.
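The inner-exception walk and the filtered catch described in this resolution could be sketched as follows. To stay compilable without S7.Net, plain Exception stands in for PlcException here, status codes are shown as strings, and Classify is a hypothetical helper; only IsAccessDenied and the "not allowed" marker come from the resolution text.

```csharp
using System;

public static class AccessDeniedSketch
{
    // Walks the exception and its inner-exception chain looking for the marker text.
    public static bool IsAccessDenied(Exception ex)
    {
        for (Exception? e = ex; e is not null; e = e.InnerException)
        {
            if (e.Message.Contains("not allowed", StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Hypothetical wrapper showing the split catch via an exception filter.
    public static string Classify(Action plcCall)
    {
        try
        {
            plcCall();
            return "Good";
        }
        catch (Exception ex) when (IsAccessDenied(ex))
        {
            return "BadNotSupported";   // permanent misconfiguration
        }
        catch (Exception)
        {
            return "BadDeviceFailure";  // genuine device fault
        }
    }
}
```

Matching on message text is brittle if the library rewords its messages, so a test pinning the marker string is worth keeping alongside this kind of check.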

Driver.S7-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | S7Driver.cs:286 |
| Status | Open |

Description: WriteAsync's catch ladder is coarser than ReadAsync's and loses information. The generic catch (Exception) maps everything (socket errors, timeouts, OverflowException from Convert.ToInt16 of an out-of-range value, FormatException from Convert.ToBoolean of an unparsable string) to StatusBadInternalError. A genuine transport fault during a write is reported to the client as an internal error rather than BadCommunicationError, and unlike ReadAsync the write path never updates _health on failure, so a PLC that is down stays Healthy in the dashboard as long as only writes are attempted. OperationCanceledException is also caught and turned into a status code rather than propagating.

Recommendation: Mirror the ReadAsync catch structure: let OperationCanceledException propagate, map socket/timeout faults to BadCommunicationError, map value-conversion failures to a distinct out-of-range status, and update _health to Degraded on transport failures.
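One possible shape for the recommended ladder is sketched below. The helper name, the string status codes, and the choice of "BadOutOfRange" for conversion failures are assumptions for illustration; the health update is omitted.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class WriteStatusSketch
{
    // Hypothetical wrapper: runs one write and maps failures to a status name.
    public static string Run(Action write)
    {
        try
        {
            write();
            return "Good";
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation propagates instead of becoming a status code
        }
        catch (Exception ex) when (ex is SocketException or IOException or TimeoutException)
        {
            return "BadCommunicationError"; // transport fault; a real driver would also degrade health
        }
        catch (Exception ex) when (ex is OverflowException or FormatException or InvalidCastException)
        {
            return "BadOutOfRange"; // value could not be converted to the tag's type
        }
        catch (Exception)
        {
            return "BadInternalError";
        }
    }
}
```

The order matters: the cancellation catch must come first, because later filters would otherwise never see it but the final catch-all would swallow it.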

Resolution: (open)

Driver.S7-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | S7Driver.cs:392 |
| Status | Open |

Description: The subscription poll loop never reflects sustained polling failure anywhere an operator can see it. PollLoopAsync swallows every non-cancellation exception with an empty catch and the comment claims "the health surface reflects it" - but a poll failure routes through ReadAsync, which only sets DriverState.Degraded when the per-tag read throws inside the gate; exceptions thrown before that (e.g. RequirePlc() when Plc is null after a drop) bypass the health update entirely. A subscription against an uninitialized or dropped driver loops forever silently, with no backoff - re-polling every Interval indefinitely on a hard failure.

Recommendation: Have the poll loop update _health on repeated failure and apply a capped backoff after consecutive errors; at minimum log the swallowed exception (see Driver.S7-004).
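As an illustration of the capped backoff, the delay calculation can be kept as a pure function that the poll loop calls with its consecutive-failure count. The name NextDelay, the doubling policy, and the cap parameter are assumptions, not part of the driver.

```csharp
using System;

public static class PollBackoffSketch
{
    // Doubles the poll interval per consecutive failure, never exceeding cap.
    // With zero failures the configured interval is returned unchanged.
    public static TimeSpan NextDelay(TimeSpan interval, int consecutiveFailures, TimeSpan cap)
    {
        if (consecutiveFailures <= 0) return interval;

        // Clamp the exponent so the multiplication cannot overflow to infinity.
        var factor = Math.Pow(2, Math.Min(consecutiveFailures, 30));
        var ms = Math.Min(interval.TotalMilliseconds * factor, cap.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }
}
```

With a 1 s interval and a 30 s cap, three consecutive failures give an 8 s delay and ten give the 30 s cap; the counter would reset to zero on the first successful poll.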

Resolution: (open)

Driver.S7-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | S7Driver.cs:504 |
| Status | Open |

Description: Dispose() is implemented as DisposeAsync().AsTask().GetAwaiter().GetResult() - sync-over-async. Inside the generic host this is currently safe (no captured SynchronizationContext), but it is a known deadlock pattern. The only async work behind DisposeAsync is ShutdownAsync, which does nothing async (returns Task.CompletedTask). The blocking wrap is unnecessary risk.

Recommendation: Since ShutdownAsync is effectively synchronous, have Dispose() perform the teardown directly (cancel CTSs, close Plc, dispose _gate) without round-tripping through the async path.

Resolution: (open)

Driver.S7-011

| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | S7Driver.cs:82, S7Driver.cs:134, IDriver.cs:24 |
| Status | Resolved |

Description: S7Driver ignores the driverConfigJson parameter on both InitializeAsync and ReinitializeAsync. The IDriver contract states InitializeAsync initializes the driver "from its DriverConfig JSON" and ReinitializeAsync "applies a config change in place". All configuration is instead captured in the constructor (S7DriverOptions options), and ReinitializeAsync simply calls ShutdownAsync then InitializeAsync with the same options object. Consequently a config change delivered to ReinitializeAsync (the documented IGenerationApplier recovery path per driver-stability.md) is silently discarded - the driver re-opens with the old config. This breaks the only Core-initiated in-process recovery path.

Recommendation: Either re-parse driverConfigJson inside InitializeAsync/ReinitializeAsync and rebuild _options from it, or document explicitly that S7 reconfiguration requires instance recreation and have ReinitializeAsync signal that the passed JSON is unused so the contract mismatch is visible.

Resolution: Resolved 2026-05-22 — config parsing was factored out of the factory into S7DriverFactoryExtensions.ParseOptions. InitializeAsync (and therefore ReinitializeAsync, which delegates to it) now re-parses driverConfigJson and rebuilds _options from it whenever the document carries a real body, so a config change delivered through ReinitializeAsync — the only Core-initiated in-process recovery path — is honoured. An empty / placeholder document ("", {}, []) keeps the constructor-supplied options so existing lifecycle unit tests that pass "{}" are unaffected.
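The "real body" test mentioned in this resolution might be approximated as below. HasRealBody is a hypothetical name, and the treatment of anything other than the three listed placeholders ("", {}, []) is an assumption; the actual ParseOptions logic is not shown in this review.

```csharp
using System.Linq;
using System.Text.Json;

public static class ConfigBodySketch
{
    // Returns false for "", whitespace, {} and []; true for an object or
    // array that carries at least one member. Malformed JSON throws JsonException.
    public static bool HasRealBody(string? driverConfigJson)
    {
        if (string.IsNullOrWhiteSpace(driverConfigJson)) return false;

        using var doc = JsonDocument.Parse(driverConfigJson);
        var root = doc.RootElement;
        return root.ValueKind switch
        {
            JsonValueKind.Object => root.EnumerateObject().Any(),
            JsonValueKind.Array => root.GetArrayLength() > 0,
            _ => false, // assumption: bare scalars are not a usable config
        };
    }
}
```

A caller would rebuild its options only when HasRealBody returns true and otherwise keep the constructor-supplied options, which is the behaviour the resolution describes for existing tests that pass "{}".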

Driver.S7-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | S7DriverOptions.cs:59, S7Driver.cs:457 |
| Status | Open |

Description: S7ProbeOptions.ProbeAddress is configured (default "MW0"), documented at length ("the driver runs a tick loop that issues a cheap read against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO (S7ProbeDto.ProbeAddress), and parsed from JSON - but it is never read by any code. ProbeLoopAsync probes liveness via plc.ReadStatusAsync (CPU status), not via a read of ProbeAddress. The XML doc on the S7DriverOptions.Probe property and on ProbeAddress describes behaviour the driver does not implement. An operator who sets ProbeAddress to a known-good DB word expecting the probe to exercise it will see no effect.

Recommendation: Either make ProbeLoopAsync actually read ProbeAddress (parsing it once at init and rejecting a bad value early), or delete ProbeAddress from S7ProbeOptions/S7ProbeDto and correct the XML docs to describe the ReadStatusAsync-based probe.

Resolution: (open)

Driver.S7-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | S7DriverOptions.cs:90, S7Driver.cs:300 |
| Status | Open |

Description: S7TagDefinition.StringLength is a public configured/JSON-bound parameter (default 254) but is dead: S7DataType.String reads and writes both throw NotSupportedException ("...land in a follow-up PR"), so StringLength is never consumed. Likewise S7DataType.Int64, UInt64, Float64, String, and DateTime are exposed as configurable, browse through MapDataType into real DriverDataType values, and pass DiscoverAsync - creating address-space nodes - yet every read/write of them throws NotSupportedException, becoming BadNotSupported. A site can configure a Float64 tag, see the node appear, and get BadNotSupported on every access. The scaffold/follow-up-PR split leaks half-implemented types into the configurable surface.

Recommendation: Reject the not-yet-implemented S7DataType values (and StringLength) at InitializeAsync / factory validation with a clear "not yet supported" error, so a partially-implemented type cannot be configured into a live address space.

Resolution: (open)

Driver.S7-014

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/ |
| Status | Open |

Description: Test coverage has notable gaps in the driver's behavioural core: (1) no test exercises the ReadOneAsync type-reinterpret switch (Int16 from ushort, Int32 from uint, Float32 from UInt32 bits), so the most logic-heavy method in the driver is untested and the unsigned/signed unchecked casts are unverified; (2) no test covers a Timer/Counter tag end-to-end, which would have caught Driver.S7-001; (3) no test covers WriteOneAsync boxing conversions or the out-of-range Convert failure paths; (4) the read-write tests only cover error paths (uninitialized, bad address), and the happy path is explicitly deferred to "a follow-up PR" with no mock S7 server, leaving the entire successful read, write, poll, and probe-transition surface untested; (5) ReinitializeAsync and the driverConfigJson-ignored behaviour (Driver.S7-011) have no test.

Recommendation: Add unit tests for ReadOneAsync/WriteOneAsync type mapping by factoring the pure reinterpret/boxing logic out of the PLC round-trip so it is testable without a live PLC, and add a Timer/Counter rejection test. Track the live/mock-server happy-path coverage as an explicit follow-up rather than an open-ended deferral.
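Factored out of the PLC round-trip, the reinterpret logic named in the description reduces to a few pure functions that can be tested without a live PLC. The class and method names below are hypothetical; the three conversions are the ones listed in item (1).

```csharp
using System;

public static class ReinterpretSketch
{
    // Int16 from ushort: same 16 bits, signed interpretation.
    public static short ToInt16(ushort raw) => unchecked((short)raw);

    // Int32 from uint: same 32 bits, signed interpretation.
    public static int ToInt32(uint raw) => unchecked((int)raw);

    // Float32 from UInt32 bits: IEEE 754 single-precision reinterpretation.
    public static float ToFloat32(uint bits) => BitConverter.UInt32BitsToSingle(bits);
}
```

Boundary cases worth asserting in a unit test include ToInt16(0xFFFF) giving -1, ToInt32(0x80000000) giving int.MinValue, and ToFloat32(0x3F800000) giving 1.0f.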

Resolution: (open)