docs(code-reviews): comprehensive per-module review pass at 76d35d1

Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-22 05:20:27 -04:00
parent 76d35d1b9f
commit 8568f5cd85
32 changed files with 8134 additions and 2 deletions

# Code Review — Driver.S7
| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 14 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.S7-001, Driver.S7-002, Driver.S7-003 |
| 2 | OtOpcUa conventions | Driver.S7-004, Driver.S7-005 |
| 3 | Concurrency & thread safety | Driver.S7-006 |
| 4 | Error handling & resilience | Driver.S7-007, Driver.S7-008, Driver.S7-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.S7-010 |
| 7 | Design-document adherence | Driver.S7-011, Driver.S7-012 |
| 8 | Code organization & conventions | Driver.S7-013 |
| 9 | Testing coverage | Driver.S7-014 |
| 10 | Documentation & comments | Driver.S7-012 (shared) |
## Findings
### Driver.S7-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `S7AddressParser.cs:93`, `S7Driver.cs:231` |
| Status | Open |

**Description:** S7AddressParser.Parse accepts Timer (T0) and Counter (C0)
addresses, and the test suite asserts that they parse successfully, but the read
path cannot serve them. Two problems compound:

1. The type-mapping switch in ReadOneAsync (lines 231-250) has no case for any
   Timer/Counter combination. A Timer/Counter tag therefore falls through to the
   default arm and throws InvalidDataException, with a misleading
   "type-mismatch" message, on every read.
2. The read is issued via plc.ReadAsync(tag.Address, ...), which passes the raw
   address string, and S7.Net's string-based address parser does not understand
   the T{n}/C{n} syntax.

A tag configured with a timer or counter address therefore passes init-time
parsing (the docstring promises that config typos fail fast at init) and then
fails on every read. This is exactly the hard-to-diagnose failure mode the
fail-fast parse was meant to prevent.

**Recommendation:** Either drop Timer/Counter from S7AddressParser and S7Area
until they are wired through to S7.Net, or implement the Timer/Counter read path.
If they stay in the parser before the read path exists, reject Timer/Counter
tags at InitializeAsync with a clear "not yet supported" error rather than
letting them parse cleanly.

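
The following is a minimal sketch of the init-time rejection. It is written in Java as a language-neutral stand-in for the C# driver. S7Area's members other than Timer and Counter, TagDef, TagValidation, and the error text are hypothetical. The sketch only shows that the check runs once at initialization and names every offending tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: only Timer and Counter come from the finding.
enum S7Area { DataBlock, Memory, Input, Output, Timer, Counter }

record TagDef(String name, String address, S7Area area) {}

final class TagValidation {
    // Intended to run at initialization time. Collects every offending tag so a
    // single failed init reports all of them instead of only the first.
    static void rejectUnsupportedAreas(List<TagDef> tags) {
        List<String> problems = new ArrayList<>();
        for (TagDef tag : tags) {
            if (tag.area() == S7Area.Timer || tag.area() == S7Area.Counter) {
                problems.add(tag.name() + " (" + tag.address() + "): "
                        + tag.area() + " addresses are not yet supported");
            }
        }
        if (!problems.isEmpty()) {
            throw new IllegalArgumentException(String.join("; ", problems));
        }
    }
}
```
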
**Resolution:** _(open)_
### Driver.S7-002
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:350` |
| Status | Open |

**Description:** MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32.
UInt32 values above int.MaxValue (2^31-1) wrap to negative numbers when surfaced
to the OPC UA client, silently corrupting the value. The inline comment flags
only Int64/UInt64 as "widens; lossy", but UInt32 to Int32 is equally lossy and
is not called out.

**Recommendation:** Map UInt32/UInt16 to a DriverDataType wide enough to hold the
unsigned range, or add the missing unsigned DriverDataType members. At minimum,
correct the comment so that the lossiness of UInt32 is documented.

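
The following is a short Java illustration of the wrap. It assumes the UInt32 value is held in a long, because Java has no unsigned int. The (int) narrowing keeps the low 32 bits, which matches C#'s unchecked uint-to-int cast. UInt32Mapping and its methods are illustrative names only.

```java
final class UInt32Mapping {
    // Java has no uint, so a UInt32 value read from the PLC is held in a long
    // in the range 0..4294967295.

    // What mapping UInt32 -> Int32 amounts to: only the low 32 bits survive, as
    // with C#'s unchecked (int) cast of a uint, so values above 2^31-1 turn negative.
    static int surfaceAsInt32(long uint32Value) {
        return (int) uint32Value;
    }

    // The original value is recoverable only by a consumer that knows to treat the
    // Int32 bits as unsigned; a client browsing an Int32 node gets no such hint.
    static long reinterpretAsUInt32(int bits) {
        return Integer.toUnsignedLong(bits);
    }
}
```

For example, surfaceAsInt32(3_000_000_000L) yields -1294967296. Only a consumer that knows to reinterpret those bits as unsigned, via reinterpretAsUInt32, gets 3000000000 back.
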
**Resolution:** _(open)_
### Driver.S7-003
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:172`, `S7Driver.cs:255` |
| Status | Open |

**Description:** ReadAsync and WriteAsync dereference fullReferences.Count and
writes.Count with no null guard. A null argument therefore throws
NullReferenceException rather than ArgumentNullException. Because the NRE
escapes before _gate is taken, it is not wrapped in a per-item status.
DiscoverAsync correctly uses ArgumentNullException.ThrowIfNull(builder), so the
read/write entry points are inconsistent with it.

**Recommendation:** Add ArgumentNullException.ThrowIfNull for the list
parameters at the top of ReadAsync and WriteAsync.

**Resolution:** _(open)_
### Driver.S7-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs` (whole file) |
| Status | Open |

**Description:** The driver performs no logging, although the Library
Preferences in CLAUDE.md mandate Serilog with a rolling daily file sink. Every
error path is an empty catch block:

- the Initialize cleanup at line 130
- ShutdownAsync at lines 142/149/153
- ProbeLoop at line 483
- PollLoop at lines 396/406
- Dispose at line 511

Connection faults, probe transitions, PUT/GET-disabled configuration errors, and
poll-loop exceptions are all silently swallowed. An operator diagnosing an
intermittent PLC therefore has only the DriverHealth.LastError string and no
event trail.

**Recommendation:** Inject an ILogger/ILoggerFactory and log the following:

- connect success and failure
- probe Running/Stopped transitions
- PUT/GET-disabled detection
- the poll-loop and shutdown exceptions that are currently swallowed

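
The following sketch shows one way to log probe transitions without flooding the log. It is written in Java, with java.util.logging standing in for the Serilog-backed ILogger the driver would use. The Unknown initial state and all class and method names are assumptions. The idea is to log on state changes rather than on every probe tick.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Running/Stopped come from the finding; Unknown is an assumed initial state.
enum ProbeState { Unknown, Running, Stopped }

final class ProbeTransitionLogger {
    private final Logger log;
    private ProbeState last = ProbeState.Unknown;

    ProbeTransitionLogger(Logger log) {
        this.log = log;
    }

    // Called once per probe tick, from the single probe loop (not thread-safe).
    // Logs only when the state changes, so a PLC that stays down yields one
    // warning instead of one per tick. Returns true if a transition was logged.
    boolean onProbeResult(ProbeState current, String host, Throwable cause) {
        if (current == last) {
            return false;
        }
        String message = "S7 probe " + host + ": " + last + " -> " + current;
        if (current == ProbeState.Stopped) {
            log.log(Level.WARNING, message, cause); // keep the exception for diagnosis
        } else {
            log.info(message);
        }
        last = current;
        return true;
    }
}
```
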
**Resolution:** _(open)_
### Driver.S7-005
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs:33`, `S7Driver.cs:433` |
| Status | Open |

**Description:** System.Collections.Concurrent.ConcurrentDictionary is written
out with its fully qualified namespace in the field declarations instead of
relying on a using System.Collections.Concurrent directive. ImplicitUsings is
enabled and the rest of the codebase relies on using directives, so the inline
FQN is inconsistent with house style. Similarly redundant global::S7.Net.*
qualifiers appear throughout S7Driver.cs despite the using S7.Net directive at
the top of the file.

**Recommendation:** Add using System.Collections.Concurrent and drop the
redundant global::S7.Net. qualifiers wherever using S7.Net already covers them.

**Resolution:** _(open)_
### Driver.S7-006
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` |
| Status | Open |

**Description:** Disposal races with the in-flight probe and poll tasks.

ShutdownAsync calls _probeCts.Cancel() and cancels each subscription CTS, but it
does not await ProbeLoopAsync or PollLoopAsync. Both are started fire-and-forget
with Task.Run, and the task handles are discarded. DisposeAsync then calls
ShutdownAsync, followed immediately by _gate.Dispose().

A probe or poll iteration can be between _gate.WaitAsync and _gate.Release()
when cancellation fires. It will then either call Release() (line 479) on the
disposed semaphore or observe the disposed semaphore from WaitAsync. Both throw
ObjectDisposedException. The probe loop's broad catch swallows the exception,
but the disposal-ordering bug is real: the semaphore can be disposed while a
worker still holds it or is waiting on it. The same applies to
_probeCts.Dispose() (line 143), which runs while ProbeLoopAsync may still touch
the linked token.

**Recommendation:** Track the probe and poll Task handles. In ShutdownAsync (or
DisposeAsync), after cancelling, await Task.WhenAll(...) with a bounded timeout
before disposing _gate and the CTS objects.

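
The following Java sketch shows the cancel, wait, then dispose ordering. ExecutorService.shutdownNow() plays the role of cancelling the CTSs, and awaitTermination() plays the role of a bounded Task.WhenAll. DriverWorkers and its methods are hypothetical. The C# fix would keep the Task handles returned by Task.Run instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class DriverWorkers {
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final Semaphore gate = new Semaphore(1); // plays the role of _gate

    // Start a probe or poll loop on the executor, which keeps track of it
    // (unlike a Task.Run whose handle is discarded).
    void startLoop(Runnable iteration, long periodMillis) {
        workers.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    gate.acquire();
                } catch (InterruptedException e) {
                    return; // cancelled while waiting for the gate
                }
                try {
                    iteration.run();
                } catch (RuntimeException e) {
                    // Keep looping; a real driver would log this (see Driver.S7-004).
                } finally {
                    gate.release();
                }
                try {
                    Thread.sleep(periodMillis);
                } catch (InterruptedException e) {
                    return; // cancelled between iterations
                }
            }
        });
    }

    // Cancel every loop, then wait (bounded) for all of them to exit. Only when
    // this returns true is it safe to dispose the gate and cancellation sources.
    boolean shutdown(long timeoutMillis) throws InterruptedException {
        workers.shutdownNow(); // interrupts each loop: the CTS.Cancel() analogue
        return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    int availablePermits() {
        return gate.availablePermits();
    }
}
```

shutdown reports whether every loop actually stopped. When the timeout expires, the caller can therefore skip or defer disposing the gate instead of racing a worker that still holds it.
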
**Resolution:** _(open)_
### Driver.S7-007
| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` |
| Status | Open |

**Description:** PUT/GET-disabled handling contradicts both the design and the
module's own docstring.

driver-specs.md section 5 (line 434) and the S7DriverOptions class remarks both
state that a PUT/GET-disabled PLC must be mapped to BadNotSupported. They also
state that it must be surfaced as a configuration alert, not a transient fault,
because blind retry is wasted effort.

The actual code (ReadAsync, lines 200-208) catches every S7.Net.PlcException,
maps it to StatusBadDeviceFailure, and sets health to Degraded. This has two
consequences:

1. A genuinely transient PlcException (e.g. the CPU briefly in STOP) is reported
   identically to a permanent PUT/GET misconfiguration. The operator cannot tell
   a configuration problem from a transient one, which is the exact distinction
   the spec demands.
2. The promised BadNotSupported status code is never produced, so the
   S7DriverOptions docstring is now false.

**Recommendation:** Inspect PlcException.ErrorCode and map the PUT/GET-disabled
(access-denied) code to BadNotSupported with a distinct config-alert health
state. Keep BadDeviceFailure/Degraded only for genuine device-fault error codes.

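
The following is a minimal Java sketch of the split mapping. How a PUT/GET-disabled PLC shows up in S7.Net's PlcException.ErrorCode is not established here. PlcErrorKind is therefore a hypothetical pre-classified input, and ConfigAlert is a hypothetical health state. The status-code names are the ones this finding already uses.

```java
// Hypothetical classification of a caught PlcException. Which S7.Net error
// code(s) indicate a PUT/GET-disabled PLC still has to be confirmed.
enum PlcErrorKind { AccessDenied, DeviceFault }

// ConfigAlert is a hypothetical health state distinct from Degraded.
enum HealthState { Healthy, Degraded, ConfigAlert }

record ReadOutcome(String statusCode, HealthState health) {}

final class PlcErrorMapping {
    static ReadOutcome map(PlcErrorKind kind) {
        return switch (kind) {
            // Permanent misconfiguration: retrying cannot help, so flag it as config.
            case AccessDenied -> new ReadOutcome("BadNotSupported", HealthState.ConfigAlert);
            // Possibly transient (e.g. CPU briefly in STOP): keep today's behaviour.
            case DeviceFault -> new ReadOutcome("BadDeviceFailure", HealthState.Degraded);
        };
    }
}
```
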
**Resolution:** _(open)_
### Driver.S7-008
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:286` |
| Status | Open |

**Description:** The catch ladder in WriteAsync is coarser than the one in
ReadAsync and loses information. Its generic catch (Exception) maps everything
to StatusBadInternalError, including:

- socket errors and timeouts
- OverflowException from Convert.ToInt16 of an out-of-range value
- InvalidCastException or FormatException from a Convert call on an
  unconvertible value

A genuine transport fault during a write is therefore reported to the client as
an internal error rather than BadCommunicationError. Unlike ReadAsync, the write
path also never updates _health on failure. A PLC that is down therefore stays
Healthy in the dashboard as long as only writes are attempted.
OperationCanceledException is caught too and turned into a status code instead
of propagating.

**Recommendation:** Mirror the ReadAsync catch structure:

- Let OperationCanceledException propagate.
- Map socket/timeout faults to BadCommunicationError.
- Map value-conversion failures to a distinct out-of-range status.
- Set _health to Degraded on transport failures.

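
The following Java sketch shows a finer-grained write catch ladder. The Java types stand in for their C# counterparts:

- CancellationException stands in for OperationCanceledException.
- IOException, which covers SocketTimeoutException, stands in for transport faults.
- ArithmeticException (e.g. from Math.toIntExact) stands in for C#'s OverflowException.

WriteStatusMapper is hypothetical. Mapping parse and cast failures to BadTypeMismatch is one possible choice, not something the finding prescribes.

```java
import java.io.IOException;
import java.util.concurrent.CancellationException;

enum DriverState { Healthy, Degraded }

final class WriteStatusMapper {
    private volatile DriverState health = DriverState.Healthy;

    DriverState health() {
        return health;
    }

    // Map one item's write failure to an OPC UA status name.
    // Cancellation is rethrown rather than converted into a status.
    String statusFor(Exception e) {
        if (e instanceof CancellationException ce) {
            throw ce;
        }
        if (e instanceof IOException) {           // socket errors, timeouts
            health = DriverState.Degraded;         // a down PLC must show up on writes too
            return "BadCommunicationError";
        }
        if (e instanceof ArithmeticException) {   // value does not fit the target type
            return "BadOutOfRange";
        }
        if (e instanceof NumberFormatException || e instanceof ClassCastException) {
            return "BadTypeMismatch";              // value cannot be converted at all
        }
        return "BadInternalError";                 // genuinely unexpected
    }
}
```
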
**Resolution:** _(open)_
### Driver.S7-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:392` |
| Status | Open |

**Description:** The subscription poll loop never surfaces sustained polling
failure anywhere an operator can see it.

PollLoopAsync swallows every non-cancellation exception in an empty catch, and
its comment claims "the health surface reflects it". A poll, however, goes
through ReadAsync, which sets DriverState.Degraded only when the per-tag read
throws inside the gate. Exceptions thrown before that point bypass the health
update entirely, for example from RequirePlc() when Plc is null after a
connection drop.

A subscription against an uninitialized or dropped driver therefore loops
forever, silently and with no backoff, re-polling every Interval on a hard
failure.

**Recommendation:** Have the poll loop update _health after repeated failures
and apply a capped backoff after consecutive errors. At minimum, log the
swallowed exception (see Driver.S7-004).

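
The following Java sketch shows capped exponential backoff plus a failure threshold for the health update. PollBackoff, the doubling policy, the cap, and the threshold are all assumptions. The loop would call recordSuccess() or recordFailure() after each poll and wait nextDelay() before the next one.

```java
import java.time.Duration;

final class PollBackoff {
    private final Duration interval;      // the subscription's normal poll interval
    private final Duration maxDelay;      // hypothetical cap on the backoff
    private final int degradedThreshold;  // hypothetical: failures before reporting Degraded
    private int consecutiveFailures;

    PollBackoff(Duration interval, Duration maxDelay, int degradedThreshold) {
        this.interval = interval;
        this.maxDelay = maxDelay;
        this.degradedThreshold = degradedThreshold;
    }

    void recordSuccess() {
        consecutiveFailures = 0;
    }

    void recordFailure() {
        consecutiveFailures++;
    }

    // Normal interval while healthy; doubles with each consecutive failure, up to maxDelay.
    Duration nextDelay() {
        int doublings = Math.min(consecutiveFailures, 30); // keep the multiplier bounded
        Duration delay = interval.multipliedBy(1L << doublings);
        return delay.compareTo(maxDelay) > 0 ? maxDelay : delay;
    }

    // True once failures have persisted long enough to surface in the health state.
    boolean shouldReportDegraded() {
        return consecutiveFailures >= degradedThreshold;
    }
}
```
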
**Resolution:** _(open)_
### Driver.S7-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `S7Driver.cs:504` |
| Status | Open |

**Description:** Dispose() is implemented as
DisposeAsync().AsTask().GetAwaiter().GetResult(), i.e. sync-over-async. Inside
the generic host this is currently safe because there is no captured
SynchronizationContext, but it is a known deadlock pattern. The only async work
behind DisposeAsync is ShutdownAsync, which does nothing asynchronous (it returns
Task.CompletedTask). The blocking wrapper therefore adds risk for no benefit.

**Recommendation:** Since ShutdownAsync is effectively synchronous, have
Dispose() perform the teardown directly (cancel the CTSs, close Plc, dispose
_gate) without round-tripping through the async path.

**Resolution:** _(open)_
### Driver.S7-011
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` |
| Status | Open |

**Description:** S7Driver ignores the driverConfigJson parameter of both
InitializeAsync and ReinitializeAsync. The IDriver contract states that
InitializeAsync initializes the driver "from its DriverConfig JSON" and that
ReinitializeAsync "applies a config change in place".

Instead, all configuration is captured in the constructor (S7DriverOptions
options). ReinitializeAsync simply calls ShutdownAsync and then InitializeAsync
with the same options object.

ReinitializeAsync is the documented IGenerationApplier recovery path per
driver-stability.md. A config change delivered to it is silently discarded, and
the driver re-opens with the old config. This breaks the only Core-initiated
in-process recovery path.

**Recommendation:** Choose one of the following:

- Re-parse driverConfigJson inside InitializeAsync/ReinitializeAsync and rebuild
  _options from it.
- Document explicitly that S7 reconfiguration requires recreating the instance.
  Also have ReinitializeAsync signal that the passed JSON is unused, so that the
  contract mismatch is visible.

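
The following Java sketch shows a reinitialize that honours the JSON it is given. Options, ReconfigurableDriver, and the parseConfig hook are hypothetical; the hook stands in for whatever JSON binder the factory uses. Parsing before shutting down is a design choice layered on the recommendation: a malformed change is rejected while the driver keeps running on its previous config.

```java
import java.util.function.Function;

// Hypothetical, trimmed-down options; the real S7DriverOptions holds far more.
record Options(String host, int pollMillis) {}

final class ReconfigurableDriver {
    // Hypothetical hook standing in for a JSON binder; not the module's API.
    private final Function<String, Options> parseConfig;
    private Options options;
    private boolean connected;

    ReconfigurableDriver(Function<String, Options> parseConfig) {
        this.parseConfig = parseConfig;
    }

    void initialize(String driverConfigJson) {
        options = parseConfig.apply(driverConfigJson); // honour the JSON actually passed in
        connected = true;                              // stand-in for opening the PLC connection
    }

    // Parse first, so a malformed change throws while the driver is still
    // running on the old, known-good config; only then tear down and reopen.
    void reinitialize(String driverConfigJson) {
        Options next = parseConfig.apply(driverConfigJson);
        shutdown();
        options = next;
        connected = true;
    }

    void shutdown() {
        connected = false;
    }

    Options options() {
        return options;
    }

    boolean isConnected() {
        return connected;
    }
}
```
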
**Resolution:** _(open)_
### Driver.S7-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
| Status | Open |

**Description:** S7ProbeOptions.ProbeAddress is wired through configuration but
never used:

- It is configured, with a default of "MW0".
- It is documented at length ("the driver runs a tick loop that issues a cheap
  read against S7ProbeOptions.ProbeAddress").
- It is surfaced in the factory DTO (S7ProbeDto.ProbeAddress).
- It is parsed from JSON.

No code ever reads it. ProbeLoopAsync checks liveness via plc.ReadStatusAsync
(CPU status), not via a read of ProbeAddress. The XML docs on the
S7DriverOptions.Probe property and on ProbeAddress therefore describe behaviour
the driver does not implement. An operator who sets ProbeAddress to a
known-good DB word, expecting the probe to exercise it, will see no effect.

**Recommendation:** Choose one of the following:

- Make ProbeLoopAsync actually read ProbeAddress, parsing it once at init and
  rejecting a bad value early.
- Delete ProbeAddress from S7ProbeOptions/S7ProbeDto and correct the XML docs to
  describe the ReadStatusAsync-based probe.

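
If ProbeAddress is kept, it could be parsed once at initialization, as in the following Java sketch. The grammar covers only a simplified, assumed subset of S7 syntax: MB/MW/MD and DBn.DBB/DBW/DBD byte-offset addresses, which includes the "MW0" default. ProbeTarget and ProbeAddressParser are hypothetical; the real driver would reuse S7AddressParser.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// area is "M" or "DB"; dbNumber is 0 for Merker addresses; size is B, W or D.
record ProbeTarget(String area, int dbNumber, char size, int byteOffset) {}

final class ProbeAddressParser {
    // Simplified, assumed subset of S7 syntax: MB/MW/MD n and DBn.DBB/DBW/DBD n.
    private static final Pattern ADDRESS =
            Pattern.compile("^(?:M([BWD])(\\d+)|DB(\\d+)\\.DB([BWD])(\\d+))$");

    // Run once at init so a typo fails fast instead of on every probe tick.
    // An over-long number makes parseInt throw NumberFormatException, which is
    // also an IllegalArgumentException, so it is rejected the same way.
    static ProbeTarget parse(String address) {
        Matcher m = ADDRESS.matcher(address.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid probe address: '" + address + "'");
        }
        if (m.group(1) != null) {
            return new ProbeTarget("M", 0, m.group(1).charAt(0), Integer.parseInt(m.group(2)));
        }
        return new ProbeTarget("DB", Integer.parseInt(m.group(3)),
                m.group(4).charAt(0), Integer.parseInt(m.group(5)));
    }
}
```
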
**Resolution:** _(open)_
### Driver.S7-013
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `S7DriverOptions.cs:90`, `S7Driver.cs:300` |
| Status | Open |

**Description:** S7TagDefinition.StringLength is a public, JSON-bound
configuration parameter (default 254), but it is dead. Reads and writes of
S7DataType.String both throw NotSupportedException ("...land in a follow-up
PR"), so StringLength is never consumed.

Likewise, S7DataType.Int64, UInt64, Float64, String, and DateTime are exposed as
configurable and map through MapDataType to real DriverDataType values. They
also pass DiscoverAsync, which creates address-space nodes for them. Yet every
read or write of them throws NotSupportedException and surfaces as
BadNotSupported.

A site can configure a Float64 tag, see the node appear, and get BadNotSupported
on every access. The scaffold/follow-up-PR split leaks half-implemented types
into the configurable surface.

**Recommendation:** Reject the not-yet-implemented S7DataType values (and
StringLength) at InitializeAsync / factory validation with a clear "not yet
supported" error. A partially implemented type then cannot be configured into a
live address space.

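
The following is a minimal Java sketch of the factory-time check. The enum lists only the S7DataType members this review mentions, and DataTypeValidation is hypothetical. The not-yet-supported set is the one this finding names.

```java
import java.util.EnumSet;
import java.util.Set;

// Only the S7DataType members this review mentions; the real enum may differ.
enum S7DataType { Int16, UInt16, Int32, UInt32, Float32, Int64, UInt64, Float64, String, DateTime }

final class DataTypeValidation {
    // The types whose reads/writes currently throw NotSupportedException. Keeping
    // them in one set means enabling a type is a one-line change, made in the
    // same PR that implements it.
    static final Set<S7DataType> NOT_YET_SUPPORTED =
            EnumSet.of(S7DataType.Int64, S7DataType.UInt64, S7DataType.Float64,
                       S7DataType.String, S7DataType.DateTime);

    static void requireSupported(String tagName, S7DataType type) {
        if (NOT_YET_SUPPORTED.contains(type)) {
            throw new IllegalArgumentException(
                    "Tag '" + tagName + "': S7 data type " + type + " is not yet supported");
        }
    }
}
```
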
**Resolution:** _(open)_
### Driver.S7-014
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
| Status | Open |

**Description:** Test coverage has notable gaps in the driver's behavioural
core:

1. No test exercises the type-reinterpret switch in ReadOneAsync (Int16 from
   ushort, Int32 from uint, Float32 from UInt32 bits). The most logic-heavy
   method in the driver is therefore untested, and its unchecked signed/unsigned
   casts are unverified.
2. No test covers a Timer/Counter tag end-to-end, which would have caught
   Driver.S7-001.
3. No test covers the WriteOneAsync boxing conversions or the out-of-range
   Convert failure paths.
4. The read/write tests cover only error paths (uninitialized, bad address). The
   happy path is explicitly deferred to "a follow-up PR" with no mock S7 server,
   which leaves the entire successful read, write, poll, and probe-transition
   surface untested.
5. ReinitializeAsync and the ignored driverConfigJson (Driver.S7-011) have no
   test.

**Recommendation:** Add unit tests for the ReadOneAsync/WriteOneAsync type
mapping. To make this possible without a live PLC, factor the pure
reinterpret/boxing logic out of the PLC round-trip. Also add a Timer/Counter
rejection test. Track live/mock-server happy-path coverage as an explicit
follow-up rather than an open-ended deferral.

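
The pure reinterpret step can be factored out roughly as in the following Java sketch. The raw ushort/uint from the PLC arrives as an int/long holding the unsigned bit pattern. S7Reinterpret and its method names are hypothetical. Each conversion keeps the raw bits and changes only their interpretation, which is what makes it unit-testable without a PLC.

```java
final class S7Reinterpret {
    // Java has no unsigned types, so the raw ushort/uint arrives as an int/long
    // holding the unsigned bit pattern.

    // INT: 16-bit two's complement carried in a ushort.
    static short int16FromUShort(int ushort) {
        return (short) ushort;                 // keeps the low 16 bits
    }

    // DINT: 32-bit two's complement carried in a uint.
    static int int32FromUInt(long uint) {
        return (int) uint;                     // keeps the low 32 bits
    }

    // REAL: IEEE 754 single-precision bits carried in a uint.
    static float float32FromUIntBits(long uint) {
        return Float.intBitsToFloat((int) uint);
    }
}
```

For instance, int16FromUShort(0xFFFF) is -1 and float32FromUIntBits(0x3F800000L) is 1.0f. These are the kinds of edge cases the missing tests would pin down.
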
**Resolution:** _(open)_