docs(code-reviews): comprehensive per-module review pass at 76d35d1

Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Joseph Doherty
2026-05-22 05:20:27 -04:00
parent 76d35d1b9f
commit 8568f5cd85
32 changed files with 8134 additions and 2 deletions

View File

@@ -0,0 +1,330 @@
# Code Review — Driver.FOCAS
| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |
## Checklist coverage
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.FOCAS-001, Driver.FOCAS-002, Driver.FOCAS-003 |
| 2 | OtOpcUa conventions | Driver.FOCAS-004 |
| 3 | Concurrency & thread safety | Driver.FOCAS-005 |
| 4 | Error handling & resilience | Driver.FOCAS-006, Driver.FOCAS-007 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.FOCAS-008 |
| 7 | Design-document adherence | Driver.FOCAS-009 |
| 8 | Code organization & conventions | Driver.FOCAS-010, Driver.FOCAS-011 |
| 9 | Testing coverage | Driver.FOCAS-012 |
| 10 | Documentation & comments | No issues found |
## Findings
### Driver.FOCAS-001
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
| Status | Open |
**Description:** `FocasDriverConfigDto` exposes only `Backend`, `Series`, `TimeoutMs`,
`Devices`, `Tags`, and `Probe`. It has no `FixedTree`, `AlarmProjection`, or
`HandleRecycle` properties, and `CreateInstance` never sets those three options on
`FocasDriverOptions`. As a result, a deployment that follows the documented config -
`docs/drivers/FOCAS.md` shows `"FixedTree": { "Enabled": true }`,
`"AlarmProjection": { "Enabled": true }`, and `"HandleRecycle": { "Enabled": true }`
inside `Config` - is parsed with `PropertyNameCaseInsensitive` and the unknown sections
are discarded. The features stay at their hard-coded defaults (all `Enabled = false`).
The fixed-node tree never appears, alarm subscriptions throw `NotSupportedException`
("FOCAS alarm projection is disabled"), and handle recycling never runs - despite the
operator explicitly opting in.
**Recommendation:** Add `FixedTree`, `AlarmProjection`, and `HandleRecycle` DTO classes
to `FocasDriverConfigDto`, parse their `TimeSpan`/`bool` fields, and populate the
corresponding `FocasDriverOptions` properties in `CreateInstance`. Consider enabling
strict JSON handling (`UnmappedMemberHandling.Disallow`) so future unknown config
sections fail loudly instead of being dropped.
**Resolution:** _(open)_
### Driver.FOCAS-002
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
| Status | Open |
**Description:** The fixed-tree bootstrap probes the `ProgramInfo` capability via
`SafeTryProbe(() => client.GetProgramInfoAsync(ct))` and treats a non-null result as
"supported". But `WireFocasClient.GetProgramInfoAsync` never throws on a FOCAS error
return code: `ReadExecutingProgramNameAsync`, `ReadBlockCountAsync`, and
`ReadOperationModeCodeAsync` all return `FocasResult<T>` envelopes, and the method
substitutes defaults (`string.Empty`, `0`) when `IsOk` is false instead of throwing. It
only throws from `RequireConnected()`. Consequently `GetProgramInfoAsync` always
returns a non-null `FocasProgramInfo`, so `Capabilities.ProgramInfo` is set `true` even
on a CNC series that returns `EW_FUNC`/`EW_NOOPT` for `cnc_exeprgname2`/`cnc_rdopmode`.
The driver then emits the `Program/` and `OperationMode/` subtrees and polls them every
tick against a controller that does not support them - the exact "nodes that only ever
return BadDeviceFailure" outcome the capability suppression was designed to prevent
(`docs/drivers/FOCAS.md`, "Per-series node suppression").
**Recommendation:** Make `GetProgramInfoAsync` throw (or return a nullable result) when
the underlying `cnc_exeprgname2` / `cnc_rdopmode` calls report a non-zero RC, so
`SafeTryProbe` can correctly classify the series. At minimum require the program-name
or op-mode read to be `IsOk` before declaring the capability present.
**Resolution:** _(open)_
### Driver.FOCAS-003
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `FocasDriver.cs:71-79` |
| Status | Open |
**Description:** In `InitializeAsync`, capability-matrix validation only runs when
`_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose
`DeviceHostAddress` does not match any configured device (a common config typo, e.g. a
trailing `:8193` mismatch or a wrong host) silently skips validation and is still added
to `_tagsByName`. The mistake is not surfaced at load time - it only manifests at read
time as `BadNodeIdUnknown` (`ReadAsync` lines 191-194), defeating the documented goal
that "config errors now fail at load instead of per-read"
(`docs/v2/focas-version-matrix.md`).
**Recommendation:** After parsing the tag address, if `_devices` does not contain
`tag.DeviceHostAddress`, throw an `InvalidOperationException` naming the tag and the
unresolved device host so the operator fixes the typo at startup.
**Resolution:** _(open)_
### Driver.FOCAS-004
| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
| Status | Open |
**Description:** `DiscoverAsync` emits user tags with
`SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`,
and `FocasTagDefinition.Writable` defaults to `true` (also defaulted to `true` in the
factory - `t.Writable ?? true`). But the production `wire` backend's
`WireFocasClient.WriteAsync` unconditionally returns `FocasStatusMapper.BadNotWritable`
- the driver is read-only against FOCAS by design (`docs/drivers/FOCAS.md`). The result
is that every tag is advertised in the address space as a writable `Operate` node, yet
every write attempt fails. This is misleading to OPC UA clients and to the
`DriverNodeManager` ACL layer, which will grant write permission on nodes that can never
be written.
**Recommendation:** Either default `Writable` to `false` for the FOCAS driver, or have
`DiscoverAsync` force `SecurityClassification.ViewOnly` when the active backend cannot
write. Given the wire backend is read-only and is the only production backend, treating
all FOCAS tags as `ViewOnly` is the simplest correct behaviour.
**Resolution:** _(open)_
### Driver.FOCAS-005
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
| Status | Open |
**Description:** `_health` is a plain (non-volatile) field mutated from multiple
concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can
all run on different threads simultaneously (subscriptions go through `PollGroupEngine`
timers; probe loops are `Task.Run`). Several updates are read-modify-write - e.g.
`new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ...)` reads `_health`
then writes a new instance - so a concurrent update can be lost or a stale
`LastSuccessfulRead` propagated. While `DriverHealth` is an immutable record and the
reference write is atomic, the lack of synchronization means `GetHealth()` can observe
torn-in-time state and successful-read timestamps can regress.
**Recommendation:** Guard `_health` reads/writes with a lock, or use
`Interlocked.Exchange`/`Volatile` around the whole record reference and compute the new
value from a single captured snapshot. The `DeviceState`/`HostState` transition already
uses `ProbeLock`; apply the same discipline to driver health.
**Resolution:** _(open)_
### Driver.FOCAS-006
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
| Status | Open |
**Description:** `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across
a transient disconnect: it only checks `device.Client is { IsConnected: true }` and
otherwise calls `ConnectAsync` again on the same object. For a `WireFocasClient` whose
underlying `FocasWireClient` has been disposed (e.g. via a `HandleRecycle` /
`DisposeClient` race, or a prior teardown), every subsequent call hits
`FocasWireClient.ThrowIfDisposed` and throws `ObjectDisposedException`. In `ReadAsync`
that exception is caught only by the generic `catch (Exception ex)` and mapped to a
permanent `BadCommunicationError` - the device stays wedged with no recovery path until
`ReinitializeAsync` is invoked, because the reconnect logic never discards the disposed
client.
**Recommendation:** On any connect/use failure, treat a disposed or non-connected client
as unrecoverable and recreate it from `_clientFactory`. Simplest: in
`EnsureConnectedAsync`, when `device.Client` is non-null but not connected, dispose and
null it before creating a fresh instance, rather than retrying `ConnectAsync` on the
stale object.
**Resolution:** _(open)_
### Driver.FOCAS-007
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
| Status | Open |
**Description:** Numerous `try { ... } catch {}` blocks swallow every exception with no
logging - `ShutdownAsync` (CTS cancel/dispose), `RecycleLoopAsync` (`DisposeClient`),
`FixedTreeLoopAsync` transient catches, `ProbeLoopAsync`, and the alarm projection's
`sub.Cts.Cancel()`. The driver takes no `ILogger` dependency at all (only
`FocasWireClient` optionally accepts one, and the driver never supplies it). A CNC that
is silently failing every probe/poll tick produces no diagnostic trail, which conflicts
with the project's Serilog logging convention and forces field troubleshooting to rely
solely on `GetHealth()`.
**Recommendation:** Inject an `ILogger<FocasDriver>` and log caught exceptions in the
poll/probe/recycle loops at `Debug`/`Warning`. Pass a logger into `FocasWireClient` so
the per-response `Debug` entries it already emits are actually captured.
**Resolution:** _(open)_
### Driver.FOCAS-008
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
| Status | Open |
**Description:** `ReadAsync` and `WriteAsync` call `FocasAddress.TryParse(def.Address)`
on every operation, even though `InitializeAsync` already parsed and validated every
tag address. On a subscription hot path (each poll tick re-enters `ReadAsync`) this
re-parses and allocates a `FocasAddress` record per tag per tick unnecessarily.
**Recommendation:** Parse each tag address once at `InitializeAsync` and store the
parsed `FocasAddress` on `FocasTagDefinition` (or in a side dictionary), so the runtime
read/write paths use the cached value.
**Resolution:** _(open)_
### Driver.FOCAS-009
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
| Status | Open |
**Description:** `FocasProbeOptions.Timeout` is parsed by the factory
(`FocasProbeDto.TimeoutMs` to `FocasProbeOptions.Timeout`) but never consumed.
`ProbeLoopAsync` calls `client.ProbeAsync(ct)` with only the probe-loop cancellation
token; no per-probe timeout is applied, and `EnsureConnectedAsync` uses
`_options.Timeout` rather than `Probe.Timeout`. A hung CNC socket during a probe blocks
until the OS TCP timeout rather than the configured `Probe.Timeout`.
**Recommendation:** Apply `Probe.Timeout` as a linked `CancellationTokenSource` timeout
around the `ProbeAsync` call, or remove the dead `Timeout` field from
`FocasProbeOptions` / `FocasProbeDto` if it is genuinely not intended.
**Resolution:** _(open)_
### Driver.FOCAS-010
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) |
| Status | Open |
**Description:** There are two parallel operation-mode-to-text mappings with divergent
labels. `FocasOpMode.ToText` (used by the driver fixed-tree `OperationMode/ModeText`
node) yields `"TJOG"`, `"TEACH_IN_HANDLE"`; `FocasOperationModeExtensions.ToText` (in
the Wire layer) yields `"T-JOG"`, `"TEACH-IN-HANDLE"`. They also use different fallback
formats (`Mode{mode}` vs the bare number). The same concept is encoded twice with
inconsistent results depending on which path renders it.
**Recommendation:** Consolidate to a single op-mode enum + `ToText` helper shared by
both the wire layer and the driver projection, with one canonical label set.
**Resolution:** _(open)_
### Driver.FOCAS-011
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` |
| Status | Open |
**Description:** `FocasAlarmType` declares its constants as `public const int`, but the
only consumers - `FocasAlarmProjection.MapAlarmType(short type)` and
`MapSeverity(short type)` - take a `short` and `switch` against these `int` constants. It
compiles only because the values (0..13) fit in `short` range as constant expressions.
The type mismatch is a latent maintenance hazard: adding a constant above
`short.MaxValue`, or changing the projection signatures, would break the switch in
non-obvious ways. `FocasAlarmType.All` is `-1` and is also passed where a `short` is
expected by `ReadAlarmsAsync`.
**Recommendation:** Declare the `FocasAlarmType` constants as `short` (or make it an
`enum : short`) so the type matches the wire field width and the projection signatures.
**Resolution:** _(open)_
### Driver.FOCAS-012
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
| Status | Open |
**Description:** The unit test project does not exercise
`FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` /
`HandleRecycle` config sections - which is why the config-mapping gap in
Driver.FOCAS-001 was not caught. There is also no test that drives the fixed-tree
bootstrap / capability-probe path (`FixedTreeLoopAsync`), so the false-positive
`ProgramInfo` capability in Driver.FOCAS-002 is untested, and the
`EnsureConnectedAsync` reconnect-after-disconnect path (Driver.FOCAS-006) has no
coverage.
**Recommendation:** Add factory tests that round-trip a full JSON config including the
three opt-in sections and assert the options reach the driver; add a
`FakeFocasClient`-driven test for fixed-tree bootstrap capability classification
(including the unsupported-program-info case); add a reconnect test that disposes the
fake client mid-session and asserts recovery.
**Resolution:** _(open)_