lmxopcua/code-reviews/Driver.S7/findings.md
Joseph Doherty af0f09d07e fix(driver-s7): resolve Low code-review findings (Driver.S7-003,005,009,010,013)
- Driver.S7-003: ArgumentNullException.ThrowIfNull on the references
  argument at the top of ReadAsync / WriteAsync (was reaching .Count
  before any null check).
- Driver.S7-005: drop the redundant global::S7.Net.Plc qualifiers in
  ReadOneAsync / WriteOneAsync — using S7.Net already covers Plc.
- Driver.S7-009: PollLoopAsync degrades _health to Degraded after
  sustained failure and backs off exponentially up to PollBackoffCap;
  resets on a healthy tick so an operator can see the loop wedge.
- Driver.S7-010: Dispose runs the synchronous teardown directly with a
  bounded WhenAll Wait drain instead of bridging via DisposeAsync().
- Driver.S7-013: reject unsupported S7DataType values (Int64 / UInt64 /
  Float64 / String / DateTime) at InitializeAsync so half-implemented
  types no longer leak BadNotSupported live nodes into the address space.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 07:45:45 -04:00


# Code Review — Driver.S7
| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 0 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.S7-001, Driver.S7-002, Driver.S7-003 |
| 2 | OtOpcUa conventions | Driver.S7-004, Driver.S7-005 |
| 3 | Concurrency & thread safety | Driver.S7-006 |
| 4 | Error handling & resilience | Driver.S7-007, Driver.S7-008, Driver.S7-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.S7-010 |
| 7 | Design-document adherence | Driver.S7-011, Driver.S7-012 |
| 8 | Code organization & conventions | Driver.S7-013 |
| 9 | Testing coverage | Driver.S7-014 |
| 10 | Documentation & comments | Driver.S7-012 (shared) |
## Findings
### Driver.S7-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `S7AddressParser.cs:93`, `S7Driver.cs:231` |
| Status | Resolved |
**Description:** S7AddressParser.Parse accepts Timer (T0) and Counter (C0)
addresses and the test suite asserts they parse successfully, but the read path
cannot serve them. Two problems compound: (1) the ReadOneAsync type-mapping switch
(lines 231-250) has no case for any Timer/Counter combination, so a Timer/Counter
tag falls through to the default arm and throws InvalidDataException with a
misleading "type-mismatch" message on every read; (2) the read is issued via
plc.ReadAsync(tag.Address, ...) passing the raw address string, and S7.Net's
string-based parser does not understand T{n}/C{n} syntax. A tag configured with a
timer or counter address passes init-time parsing (the docstring promises config
typos fail fast at init) and then fails on every read - exactly the
un-diagnosable failure mode the fail-fast parse was meant to prevent.
**Recommendation:** Either drop Timer/Counter from S7AddressParser and S7Area
until they are wired through to S7.Net, or implement the Timer/Counter read path.
If kept, reject Timer/Counter tags at InitializeAsync with a clear "not yet
supported" error rather than letting them parse clean.
**Resolution:** Resolved 2026-05-22 — `InitializeAsync` now runs
`RejectUnsupportedTagAddresses`, which throws `NotSupportedException` with a
clear "not yet supported" message (echoing the tag name + address) for any tag
whose address parses as a Timer or Counter, so the bad config fails fast at init
rather than throwing a misleading type-mismatch on every read.
### Driver.S7-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:350` |
| Status | Resolved |
**Description:** MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32.
UInt32 values above int.MaxValue (2^31-1) wrap to negative when surfaced to the
OPC UA client, silently corrupting the value. The inline comment only flags
Int64/UInt64 as "widens; lossy" but UInt32 to Int32 is equally lossy and is not
called out.
**Recommendation:** Map UInt32/UInt16 to a DriverDataType wide enough to hold the
unsigned range, or add the missing unsigned DriverDataType members. At minimum
correct the comment so the lossiness of UInt32 is documented.
**Resolution:** Resolved 2026-05-22 — added an inline comment to the `MapDataType` switch explicitly documenting the UInt32→Int32 lossiness (same limitation as Int64/UInt64, tracked for a follow-up PR adding unsigned DriverDataType members); the code mapping is unchanged pending that follow-up.
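To make the lossiness concrete, here is a minimal sketch of what happens when an unsigned 32-bit value is surfaced through a signed 32-bit type. The class and method names are hypothetical and are not taken from `S7Driver.cs`; the widening variant only illustrates the "wide enough" option from the recommendation.

```csharp
using System;

public static class UnsignedWrapSketch
{
    // Reinterprets the 32 bits as signed, which is what a UInt32 -> Int32
    // mapping amounts to. Values above int.MaxValue come out negative.
    public static int AsInt32(uint raw) => unchecked((int)raw);

    // Widening to a 64-bit signed type preserves the full unsigned range.
    public static long AsInt64(uint raw) => raw;
}
```

With this sketch, `AsInt32(3_000_000_000u)` yields `-1294967296` (3,000,000,000 minus 2^32), while `AsInt64(3_000_000_000u)` yields `3000000000`.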
### Driver.S7-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:172`, `S7Driver.cs:255` |
| Status | Resolved |
**Description:** ReadAsync and WriteAsync dereference fullReferences.Count /
writes.Count with no null guard. A null argument throws NullReferenceException
rather than ArgumentNullException, and the NRE escapes before the _gate is taken
so it is not wrapped in a per-item status. DiscoverAsync correctly uses
ArgumentNullException.ThrowIfNull(builder); the read/write entry points are
inconsistent with it.
**Recommendation:** Add ArgumentNullException.ThrowIfNull for the list parameters
at the top of ReadAsync and WriteAsync.
**Resolution:** Resolved 2026-05-23 — added `ArgumentNullException.ThrowIfNull`
at the top of both `ReadAsync` and `WriteAsync`, placed BEFORE `RequirePlc()` so
a null argument produces a typed `ArgumentNullException` (consistent with
`DiscoverAsync`) rather than either an NRE on `.Count` or the "not initialized"
`InvalidOperationException` from `RequirePlc`. Regression tests
`ReadAsync_with_null_fullReferences_throws_ArgumentNullException` and
`WriteAsync_with_null_writes_throws_ArgumentNullException`.
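The guard ordering described in this resolution can be sketched as follows. This is an illustration only: `NullGuardSketch`, `Read`, and `RequireInitialized` are hypothetical stand-ins for the driver's `ReadAsync` and `RequirePlc`, and `ArgumentNullException.ThrowIfNull` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

public sealed class NullGuardSketch
{
    private readonly bool _initialized;

    public NullGuardSketch(bool initialized) => _initialized = initialized;

    public int Read(IReadOnlyList<string> fullReferences)
    {
        // Argument validation comes first, so a null argument is reported as
        // ArgumentNullException even when the driver is not initialized.
        ArgumentNullException.ThrowIfNull(fullReferences);
        RequireInitialized();
        return fullReferences.Count;
    }

    // Stand-in for the driver's "not initialized" check.
    private void RequireInitialized()
    {
        if (!_initialized)
            throw new InvalidOperationException("Driver is not initialized.");
    }
}
```

Calling `Read(null)` on an uninitialized instance throws `ArgumentNullException`; calling it with an empty list throws `InvalidOperationException`.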
### Driver.S7-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs` (whole file) |
| Status | Resolved |
**Description:** The driver performs no logging. CLAUDE.md Library Preferences
mandate Serilog with a rolling daily file sink. Every error path is an empty
catch block (Initialize cleanup line 130, ShutdownAsync lines 142/149/153,
ProbeLoop line 483, PollLoop lines 396/406, Dispose line 511). Connection faults,
probe transitions, PUT/GET-disabled config errors, and poll-loop exceptions are
all silently swallowed. An operator has only the DriverHealth.LastError string
and no event trail to diagnose an intermittent PLC.
**Recommendation:** Inject an ILogger/ILoggerFactory and log connect
success/failure, probe Running/Stopped transitions, PUT/GET-disabled detection,
and swallowed poll-loop / shutdown exceptions.
**Resolution:** Resolved 2026-05-22 — injected `ILogger<S7Driver>` (optional, defaults to `NullLogger`) into the primary constructor; added structured log calls for connect success/failure, probe Running/Stopped transitions, and swallowed poll-loop exceptions, giving operators an event trail via Serilog.
### Driver.S7-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs:33`, `S7Driver.cs:433` |
| Status | Resolved |
**Description:** System.Collections.Concurrent.ConcurrentDictionary is written
out with a fully-qualified namespace at the field declarations instead of a
using System.Collections.Concurrent directive. ImplicitUsings is enabled and the
rest of the codebase relies on using directives; the inline FQN is inconsistent
with house style. Similar redundant global::S7.Net.* qualifiers appear throughout
S7Driver.cs despite the file-top using S7.Net.
**Recommendation:** Add using System.Collections.Concurrent and drop the
redundant global::S7.Net. qualifiers where using S7.Net already covers them.
**Resolution:** Resolved 2026-05-23 — `using System.Collections.Concurrent` was
already added by an earlier finding fix; this resolution removes the remaining
`global::S7.Net.Plc` qualifiers from the `ReadOneAsync` and `WriteOneAsync`
signatures, now using the unqualified `Plc` type (the file-top `using S7.Net`
already covers it). House style restored.
### Driver.S7-006
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` |
| Status | Resolved |
**Description:** Disposal races with the in-flight probe / poll tasks.
ShutdownAsync calls _probeCts.Cancel() and cancels each subscription CTS, but it
does not await the ProbeLoopAsync / PollLoopAsync tasks (they are fire-and-forget
Task.Run with the task handle discarded). DisposeAsync then calls ShutdownAsync
followed immediately by _gate.Dispose(). A probe or poll iteration that is
between _gate.WaitAsync and _gate.Release() when cancellation fires will either
call Release() (line 479) on a disposed semaphore or have WaitAsync observe one;
both raise ObjectDisposedException. The probe loop's broad catch swallows it, but the
disposal-ordering bug is real: the semaphore can be disposed while a worker still
holds or is waiting on it. The same applies to _probeCts.Dispose() (line 143)
running while ProbeLoopAsync may still touch the linked token.
**Recommendation:** Track the probe and poll Task handles, and in ShutdownAsync
(or DisposeAsync) await Task.WhenAll(...) with a bounded timeout after cancelling,
before disposing _gate and the CTS objects.
**Resolution:** Resolved 2026-05-22 — the probe loop now stores its Task in
`_probeTask` and each subscription records its poll Task in `SubscriptionState.PollTask`.
`ShutdownAsync` cancels every CTS, awaits `Task.WhenAll` of those handles with a
bounded 5 s `DrainTimeout`, and only then disposes `_probeCts`, the subscription
CTSs, and (via `DisposeAsync`) `_gate` — so no loop can touch a disposed
semaphore. `Task.Run` is now passed `CancellationToken.None` so the handle is
always awaitable even if the token is already cancelled.
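A minimal sketch of the cancel, drain, then dispose ordering follows, assuming a single background loop and .NET 6 or later (for `Task.WaitAsync`). The class name and loop body are hypothetical; only the ordering and the 5 s `DrainTimeout` mirror the resolution text.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class DrainBeforeDisposeSketch : IAsyncDisposable
{
    private static readonly TimeSpan DrainTimeout = TimeSpan.FromSeconds(5);

    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _loop;

    public DrainBeforeDisposeSketch()
    {
        CancellationToken token = _cts.Token;
        // Task.Run itself gets CancellationToken.None so the stored handle
        // always represents the loop, even if the token is already cancelled.
        _loop = Task.Run(() => LoopAsync(token), CancellationToken.None);
    }

    public bool LoopCompleted => _loop.IsCompleted;

    private async Task LoopAsync(CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                await _gate.WaitAsync(ct).ConfigureAwait(false);
                try { await Task.Delay(10, ct).ConfigureAwait(false); }
                finally { _gate.Release(); }
            }
        }
        catch (OperationCanceledException)
        {
            // Expected on shutdown.
        }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        try
        {
            // Bounded drain: wait for the loop, but never longer than DrainTimeout.
            await _loop.WaitAsync(DrainTimeout).ConfigureAwait(false);
        }
        catch (TimeoutException)
        {
            // A wedged loop must not hang disposal indefinitely.
        }
        // Only now is it safe to dispose what the loop was using.
        _cts.Dispose();
        _gate.Dispose();
    }
}
```

The sketch is not idempotent (a second `DisposeAsync` call would hit the disposed CTS); a real implementation would need a disposed flag.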
### Driver.S7-007
| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` |
| Status | Resolved |
**Description:** PUT/GET-disabled handling contradicts the design and the
module's own docstring. driver-specs.md section 5 (line 434) and the
S7DriverOptions class remark both state PUT/GET-disabled must be mapped to
BadNotSupported and surfaced as a configuration alert, not a transient fault,
because blind retry is wasted effort. The actual code (ReadAsync, lines 200-208)
catches every S7.Net.PlcException and maps it to StatusBadDeviceFailure, then
sets health to Degraded. Consequences: (1) a genuinely transient PlcException
(e.g. CPU briefly in STOP) is reported identically to a permanent PUT/GET
misconfiguration - the operator cannot tell a config problem from a transient
one, which is the exact distinction the spec demands; (2) the promised
BadNotSupported status code is never produced, so the S7DriverOptions docstring
is now false.
**Recommendation:** Inspect PlcException.ErrorCode and map the
PUT/GET-disabled / access-denied code to BadNotSupported with a distinct
config-alert health state; keep BadDeviceFailure/Degraded only for genuine
device-fault error codes.
**Resolution:** Resolved 2026-05-22 — `ReadAsync` / `WriteAsync` now split the
`PlcException` catch via an `IsAccessDenied` filter. S7.Net exposes no typed
error code for the S7 `AccessingObjectNotAllowed` status (its
`ValidateResponseCode` throws a plain `Exception` wrapped as the inner exception
of a `PlcException`), so `IsAccessDenied` walks the inner-exception chain for the
"not allowed" marker. A PUT/GET-disabled fault now maps to `BadNotSupported` and
sets health to `Faulted` with a config-alert message pointing operators at the
TIA Portal PUT/GET toggle; a genuine device fault still maps to
`BadDeviceFailure`/`Degraded`.
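The inner-exception walk can be sketched as below. This is a hedged illustration rather than the driver's code: it takes a plain `Exception` instead of S7.Net's `PlcException` to stay self-contained, and the case-insensitive comparison is an assumption. Only the "not allowed" marker comes from the resolution text.

```csharp
#nullable enable
using System;

public static class AccessDeniedSketch
{
    // Walks the exception and its InnerException chain looking for the
    // "not allowed" marker in any message.
    public static bool IsAccessDenied(Exception ex)
    {
        for (Exception? current = ex; current is not null; current = current.InnerException)
        {
            if (current.Message.Contains("not allowed", StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Matching on message text is brittle, since a library update that rewords the message silently disables the filter, so a regression test pinned to the current wording is worth keeping alongside it.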
### Driver.S7-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:286` |
| Status | Resolved |
**Description:** The WriteAsync catch ladder is coarser than ReadAsync's and loses
information. The generic catch (Exception) maps everything - socket errors,
timeouts, OverflowException from Convert.ToInt16 of an out-of-range value,
FormatException from Convert.ToBoolean of a non-boolean string - to StatusBadInternalError.
A genuine transport fault during a write is reported to the client as an internal
error rather than BadCommunicationError, and unlike ReadAsync the write path never
updates _health on failure, so a PLC that is down stays Healthy in the dashboard
as long as only writes are attempted. OperationCanceledException is also caught
and turned into a status code rather than propagating.
**Recommendation:** Mirror the ReadAsync catch structure: let
OperationCanceledException propagate, map socket/timeout faults to
BadCommunicationError, map value-conversion failures to a distinct out-of-range
status, and update _health to Degraded on transport failures.
**Resolution:** Resolved 2026-05-22 — restructured `WriteAsync` catch ladder: `OperationCanceledException` now re-throws, genuine `PlcException` transport faults map to `BadDeviceFailure`/`Degraded`, `NotSupportedException` maps to `BadNotSupported`, the `IsAccessDenied` PlcException path maps to `BadNotSupported`/`Faulted`, and the catch-all maps to `BadCommunicationError` with a health update — matching `ReadAsync`'s structure.
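The restructured ladder can be sketched as a classifier around a write delegate. Everything here is illustrative: `DeviceFaultException` is a hypothetical stand-in for S7.Net's `PlcException`, the enums are local, and the health effect chosen for `NotSupportedException` (none) and for the catch-all (`Degraded`) are assumptions the resolution does not spell out.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for S7.Net's PlcException.
public sealed class DeviceFaultException : Exception
{
    public DeviceFaultException(string message, Exception? inner = null) : base(message, inner) { }
}

public enum WriteStatus { Good, BadNotSupported, BadDeviceFailure, BadCommunicationError }
public enum HealthChange { None, Degraded, Faulted }

public static class WriteLadderSketch
{
    public static (WriteStatus Status, HealthChange Health) Classify(Action write)
    {
        try
        {
            write();
            return (WriteStatus.Good, HealthChange.None);
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation propagates instead of becoming a status code
        }
        catch (NotSupportedException)
        {
            return (WriteStatus.BadNotSupported, HealthChange.None);
        }
        catch (DeviceFaultException ex) when (IsAccessDenied(ex))
        {
            return (WriteStatus.BadNotSupported, HealthChange.Faulted);
        }
        catch (DeviceFaultException)
        {
            return (WriteStatus.BadDeviceFailure, HealthChange.Degraded);
        }
        catch (Exception)
        {
            return (WriteStatus.BadCommunicationError, HealthChange.Degraded);
        }
    }

    private static bool IsAccessDenied(Exception ex)
    {
        for (Exception? e = ex; e is not null; e = e.InnerException)
            if (e.Message.Contains("not allowed", StringComparison.OrdinalIgnoreCase))
                return true;
        return false;
    }
}
```

The order of the `catch` clauses matters: the filtered `DeviceFaultException` clause must precede the unfiltered one, and both must precede the catch-all.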
### Driver.S7-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:392` |
| Status | Resolved |
**Description:** The subscription poll loop never reflects sustained polling
failure anywhere an operator can see it. PollLoopAsync swallows every
non-cancellation exception with an empty catch and the comment claims "the health
surface reflects it" - but a poll failure routes through ReadAsync, which only
sets DriverState.Degraded when the per-tag read throws inside the gate;
exceptions thrown before that (e.g. RequirePlc() when Plc is null after a drop)
bypass the health update entirely. A subscription against an uninitialized or
dropped driver loops forever silently, with no backoff - re-polling every
Interval indefinitely on a hard failure.
**Recommendation:** Have the poll loop update _health on repeated failure and
apply a capped backoff after consecutive errors; at minimum log the swallowed
exception (see Driver.S7-004).
**Resolution:** Resolved 2026-05-23 — `PollLoopAsync` now tracks
`consecutiveFailures`, calls new `HandlePollFailure` which both logs (with the
failure count) AND degrades `_health` to `Degraded` once
`PollFailureHealthThreshold` (1) consecutive failures have accumulated, and
applies a capped exponential backoff via new `ComputeBackoffDelay` (doubles the
wait each consecutive failure up to a 30 s `PollBackoffCap`). A healthy tick
resets the counter so the cadence snaps back to the configured Interval.
`HandlePollFailure` refuses to downgrade a `Faulted` state (reserved for
permanent config faults like PUT/GET-denied). Regression test
`PollLoop_against_uninitialized_driver_degrades_health` proves the health
surface now reflects sustained failure; `PollLoop_applies_capped_backoff_after_consecutive_failures`
proves shutdown still completes inside the drain window even under a fault
storm.
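A capped exponential backoff of the kind described here might look like the following sketch. The 30 s cap and the names `ComputeBackoffDelay` and `PollBackoffCap` come from the resolution text; the signature, the choice to double starting from the first failure, and the rule that an interval already at or above the cap is left alone are assumptions.

```csharp
using System;

public static class PollBackoffSketch
{
    public static readonly TimeSpan PollBackoffCap = TimeSpan.FromSeconds(30);

    // Returns the wait before the next poll: the configured interval when
    // healthy, otherwise interval * 2^failures, capped at PollBackoffCap.
    public static TimeSpan ComputeBackoffDelay(TimeSpan interval, int consecutiveFailures)
    {
        if (consecutiveFailures <= 0 || interval >= PollBackoffCap)
            return interval;

        // Clamp the exponent so a long fault storm cannot overflow the math.
        int doublings = Math.Min(consecutiveFailures, 30);
        double delayMs = interval.TotalMilliseconds * Math.Pow(2, doublings);

        return delayMs >= PollBackoffCap.TotalMilliseconds
            ? PollBackoffCap
            : TimeSpan.FromMilliseconds(delayMs);
    }
}
```

With a 1 s interval this gives 2 s, 4 s, 8 s, and 16 s for the first four consecutive failures and 30 s from the fifth onward; a healthy tick (counter reset to 0) returns to 1 s.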
### Driver.S7-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `S7Driver.cs:504` |
| Status | Resolved |
**Description:** Dispose() is implemented as
DisposeAsync().AsTask().GetAwaiter().GetResult() - sync-over-async. Inside the
generic host this is currently safe (no captured SynchronizationContext), but it
is a known deadlock pattern. The only async work behind DisposeAsync is
ShutdownAsync, which does nothing async (returns Task.CompletedTask). The
blocking wrap is unnecessary risk.
**Recommendation:** Since ShutdownAsync is effectively synchronous, have Dispose()
perform the teardown directly (cancel CTSs, close Plc, dispose _gate) without
round-tripping through the async path.
**Resolution:** Resolved 2026-05-23 — `Dispose()` now performs teardown
directly via a new private `SynchronousTeardown` method that mirrors
`ShutdownAsync` but uses `Task.WhenAll(...).Wait(DrainTimeout)` instead of
`await Task.WhenAll(...).WaitAsync(...)`. Probe + poll Tasks are still drained
with the bounded 5 s timeout (so a wedged loop cannot hang `Dispose` indefinitely),
but the sync path no longer round-trips through `DisposeAsync().AsTask().GetAwaiter().GetResult()`.
`DisposeAsync` keeps its existing implementation for callers that opt into the
async dispose pattern. Regression tests
`Dispose_completes_synchronously_without_sync_over_async_round_trip` and
`Dispose_is_idempotent`.
### Driver.S7-011
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` |
| Status | Resolved |
**Description:** S7Driver ignores the driverConfigJson parameter on both
InitializeAsync and ReinitializeAsync. The IDriver contract states InitializeAsync
initializes the driver "from its DriverConfig JSON" and ReinitializeAsync "applies
a config change in place". All configuration is instead captured in the
constructor (S7DriverOptions options), and ReinitializeAsync simply calls
ShutdownAsync then InitializeAsync with the same options object. Consequently a
config change delivered to ReinitializeAsync (the documented IGenerationApplier
recovery path per driver-stability.md) is silently discarded - the driver
re-opens with the old config. This breaks the only Core-initiated in-process
recovery path.
**Recommendation:** Either re-parse driverConfigJson inside
InitializeAsync/ReinitializeAsync and rebuild _options from it, or document
explicitly that S7 reconfiguration requires instance recreation and have
ReinitializeAsync signal that the passed JSON is unused so the contract mismatch
is visible.
**Resolution:** Resolved 2026-05-22 — config parsing was factored out of the
factory into `S7DriverFactoryExtensions.ParseOptions`. `InitializeAsync` (and
therefore `ReinitializeAsync`, which delegates to it) now re-parses
`driverConfigJson` and rebuilds `_options` from it whenever the document carries
a real body, so a config change delivered through `ReinitializeAsync` — the only
Core-initiated in-process recovery path — is honoured. An empty / placeholder
document (`""`, `{}`, `[]`) keeps the constructor-supplied options so existing
lifecycle unit tests that pass `"{}"` are unaffected.
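One way to distinguish a "real body" from a placeholder document is sketched below using `System.Text.Json`. `HasRealBody` is a hypothetical helper, not the driver's actual check; treating any non-object, non-array root as a placeholder is an assumption.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class ConfigBodySketch
{
    // Returns false for "", whitespace, "{}" and "[]"; true when the document
    // carries at least one property or array element.
    // Malformed JSON throws JsonException from JsonDocument.Parse.
    public static bool HasRealBody(string? driverConfigJson)
    {
        if (string.IsNullOrWhiteSpace(driverConfigJson))
            return false;

        using JsonDocument doc = JsonDocument.Parse(driverConfigJson);
        JsonElement root = doc.RootElement;

        return root.ValueKind switch
        {
            JsonValueKind.Object => root.EnumerateObject().MoveNext(),
            JsonValueKind.Array => root.GetArrayLength() > 0,
            _ => false,
        };
    }
}
```

Under this sketch, `InitializeAsync` would keep the constructor-supplied options when `HasRealBody` returns false and re-parse otherwise.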
### Driver.S7-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
| Status | Resolved |
**Description:** S7ProbeOptions.ProbeAddress is configured (default "MW0"),
documented at length ("the driver runs a tick loop that issues a cheap read
against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO
(S7ProbeDto.ProbeAddress), and parsed from JSON - but it is never read by any
code. ProbeLoopAsync probes liveness via plc.ReadStatusAsync (CPU status), not via
a read of ProbeAddress. The XML doc on the S7DriverOptions.Probe property and on
ProbeAddress describes behaviour the driver does not implement. An operator who
sets ProbeAddress to a known-good DB word expecting the probe to exercise it will
see no effect.
**Recommendation:** Either make ProbeLoopAsync actually read ProbeAddress
(parsing it once at init and rejecting a bad value early), or delete ProbeAddress
from S7ProbeOptions/S7ProbeDto and correct the XML docs to describe the
ReadStatusAsync-based probe.
**Resolution:** Resolved 2026-05-22 — removed `ProbeAddress` from `S7ProbeOptions` and `S7ProbeDto`; updated the `S7DriverOptions.Probe` XML doc to describe the `ReadStatusAsync`-based probe accurately. Existing configs that set `probeAddress` are silently ignored (unknown JSON fields are tolerated by the deserializer).
### Driver.S7-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `S7DriverOptions.cs:90`, `S7Driver.cs:300` |
| Status | Resolved |
**Description:** S7TagDefinition.StringLength is a public configured/JSON-bound
parameter (default 254) but is dead: S7DataType.String reads and writes both
throw NotSupportedException ("...land in a follow-up PR"), so StringLength is
never consumed. Likewise S7DataType.Int64, UInt64, Float64, String, and DateTime
are exposed as configurable, browse through MapDataType into real DriverDataType
values, and pass DiscoverAsync - creating address-space nodes - yet every
read/write of them throws NotSupportedException, becoming BadNotSupported. A site
can configure a Float64 tag, see the node appear, and get BadNotSupported on
every access. The scaffold/follow-up-PR split leaks half-implemented types into
the configurable surface.
**Recommendation:** Reject the not-yet-implemented S7DataType values (and
StringLength) at InitializeAsync / factory validation with a clear "not yet
supported" error, so a partially-implemented type cannot be configured into a
live address space.
**Resolution:** Resolved 2026-05-23 — `InitializeAsync` now runs new
`RejectUnsupportedTagDataTypes`, which throws `NotSupportedException` for any
tag whose `DataType` is in the `UnimplementedDataTypes` set (`Int64`, `UInt64`,
`Float64`, `String`, `DateTime`). The half-implemented types can no longer leak
into the live address space — a site that configures one fails fast at init
rather than seeing a node that returns `BadNotSupported` on every access.
Entries should be removed from `UnimplementedDataTypes` as each type is wired
through; the comment on `RejectUnsupportedTagDataTypes` makes it a single grep
target for that follow-up. `StringLength` remains in `S7TagDefinition` because
removing it would be a breaking change to existing config JSON; once `String`
is implemented it will be consumed without further config changes. Regression
tests `Initialize_rejects_not_yet_implemented_data_type_with_NotSupportedException`
(Theory, 5 types) and `Initialize_accepts_implemented_data_types` (Theory, 7
types) prove the guard is targeted.
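The init-time guard can be sketched as follows. The enum and `TagSketch` record are local stand-ins for the driver's `S7DataType` and `S7TagDefinition`, and the exact error wording is an assumption; the set contents match the five types named in the resolution.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the driver's enum, limited to the types named in this review.
public enum S7DataType
{
    Bool, Byte, UInt16, Int16, UInt32, Int32, Float32,
    Int64, UInt64, Float64, String, DateTime
}

// Local stand-in for S7TagDefinition.
public sealed record TagSketch(string Name, S7DataType DataType);

public static class DataTypeGuardSketch
{
    // Remove an entry here as each type is wired through to the read/write path.
    private static readonly HashSet<S7DataType> UnimplementedDataTypes = new()
    {
        S7DataType.Int64, S7DataType.UInt64, S7DataType.Float64,
        S7DataType.String, S7DataType.DateTime,
    };

    // Fails fast on the first tag whose data type has no read/write support.
    public static void RejectUnsupportedTagDataTypes(IEnumerable<TagSketch> tags)
    {
        ArgumentNullException.ThrowIfNull(tags);
        foreach (TagSketch tag in tags)
        {
            if (UnimplementedDataTypes.Contains(tag.DataType))
                throw new NotSupportedException(
                    $"Tag '{tag.Name}' uses data type {tag.DataType}, which is not yet supported.");
        }
    }
}
```

Keeping the unsupported set in one place means discovery, read, and write never need their own per-type "not supported" branches for these values.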
### Driver.S7-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
| Status | Resolved |
**Description:** Test coverage has notable gaps in the driver's behavioural
core: (1) no test exercises the ReadOneAsync type-reinterpret switch (Int16 from
ushort, Int32 from uint, Float32 from UInt32 bits) - the most logic-heavy method
in the driver is untested, and the unsigned/signed unchecked casts are
unverified; (2) no test covers a Timer/Counter tag end-to-end, which would have
caught Driver.S7-001; (3) no test covers WriteOneAsync boxing conversions or
the out-of-range Convert failure paths; (4) the read-write tests only cover error
paths (uninitialized, bad address) - the happy path is explicitly deferred to "a
follow-up PR" with no mock S7 server, leaving the entire successful read, write,
poll, and probe-transition surface untested; (5) ReinitializeAsync and the
driverConfigJson-ignored behaviour (Driver.S7-011) have no test.
**Recommendation:** Add unit tests for ReadOneAsync/WriteOneAsync type mapping by
factoring the pure reinterpret/boxing logic out of the PLC round-trip so it is
testable without a live PLC, and add a Timer/Counter rejection test. Track the
live/mock-server happy-path coverage as an explicit follow-up rather than an
open-ended deferral.
**Resolution:** Resolved 2026-05-22 — factored `ReadOneAsync` type-reinterpret into `internal static ReinterpretRawValue` and `WriteOneAsync` boxing into `internal static BoxValueForWrite`; added `S7TypeMappingTests.cs` (26 tests) covering every implemented type round-trip (Bool/Byte/UInt16/Int16/UInt32/Int32/Float32), unsupported-type `NotSupportedException` assertions, and write overflow paths.
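A pure reinterpret helper of the kind this resolution describes could look like the sketch below, covering only the three reinterpretations named in the finding (Int16 from ushort, Int32 from uint, Float32 from UInt32 bits). The `WireType` enum and the signature are assumptions, not the driver's actual `ReinterpretRawValue`; `BitConverter.UInt32BitsToSingle` requires .NET 6 or later.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the subset of S7DataType covered here.
public enum WireType { Int16, Int32, Float32 }

public static class ReinterpretSketch
{
    // Converts the unsigned raw value read from the PLC into the tag's
    // declared type without touching the network, so it is unit-testable.
    public static object ReinterpretRawValue(WireType type, object raw)
    {
        switch (type)
        {
            case WireType.Int16 when raw is ushort word:
                return unchecked((short)word);
            case WireType.Int32 when raw is uint dword:
                return unchecked((int)dword);
            case WireType.Float32 when raw is uint bits:
                return BitConverter.UInt32BitsToSingle(bits);
            default:
                throw new InvalidDataException(
                    $"Raw value of type {raw?.GetType().Name ?? "null"} does not match tag type {type}.");
        }
    }
}
```

A `switch` statement is used rather than a switch expression on purpose: with arms of type `short`, `int`, and `float`, a switch expression can settle on `float` as its common type and box every result as a `float`.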