Commit Graph

8 Commits

Author SHA1 Message Date
Joseph Doherty
1f29b215c8 fix(driver-historian-wonderware): resolve Low code-review findings (Driver.Historian.Wonderware-004,005,007,008,010,011,012)
- Driver.Historian.Wonderware-004: ToHistorianEvent synthesises a fresh
  Guid when the upstream EventId is unparseable and logs the substitution
  instead of writing the historian with Guid.Empty.
- Driver.Historian.Wonderware-005: GetHealthSnapshot derives the
  connection-open booleans from the active-node fields so the snapshot
  is self-consistent without depending on the secondary lock.
- Driver.Historian.Wonderware-007: SID-mismatch branch in PipeServer now
  sends a HelloAck { Accepted=false, RejectReason } so the client sees a
  symmetric rejection.
- Driver.Historian.Wonderware-008: classify StartQuery failures —
  connection-class codes drop the connection, query-class codes throw
  QueryClassStartQueryException so the IPC layer surfaces Success=false.
- Driver.Historian.Wonderware-010: RequestTimeoutSeconds now enforced
  via BuildRequestCts linked to the caller's CancellationToken.
- Driver.Historian.Wonderware-011: refreshed XML docs to describe the
  current sidecar / named-pipe architecture (Galaxy.Host / Proxy
  references reframed as historical context).
- Driver.Historian.Wonderware-012: pinned the previously-uncovered
  HistorianDataSource behaviours with five new test files; also removed
  the stale empty tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests
  directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 08:18:10 -04:00
Joseph Doherty
0d10d30b7d fix(driver-historian-wonderware): update findings.md open count after resolving -002 -003 -006 -009
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:36 -04:00
Joseph Doherty
1723f5d5cd fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-009)
Apply _config.MaxValuesPerRead as a bucket cap in ReadAggregateAsync,
mirroring the existing cap in ReadRawAsync. Without this guard a processed
read over a wide time range with a small IntervalMs could accumulate an
unbounded HistorianAggregateSample list; if the serialised reply exceeded
the 16 MiB FrameWriter frame cap WriteAsync would throw and the client
correlation-id wait would hang. Truncation now logs a Warning with a hint
to widen IntervalMs or reduce the time range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:25 -04:00
Joseph Doherty
7474631992 fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-006)
Add exponential backoff (250 ms → 500 ms → 1 s → 2 s → 4 s → 8 s cap) to
PipeServer.RunAsync after each connection-loop exception, replacing the spin
loop that previously pegged a CPU core and flooded the log on persistent errors
such as a duplicate pipe name or a failing PipeAcl.Create. After 20 consecutive
failures the method re-throws so the SCM / NSSM supervisor can restart the
sidecar cleanly. A clean connection (even a short-lived one) resets the counter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:42 -04:00
Joseph Doherty
02daacbfd0 fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-003)
Extract the string-vs-numeric value selection from raw and at-time read
loops into a SelectValue helper method. aahClientManaged's HistoryQueryResult
has no data-type field in the bound SDK version, so the heuristic (prefer
StringValue when non-empty and Value==0) is unavoidable; the helper now
documents the limitation explicitly in its XML doc so the known edge case
(numeric tag at exactly zero with a formatted StringValue) is self-evident.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:23:00 -04:00
Joseph Doherty
205b07f6aa fix(driver-historian-wonderware): resolve Medium code-review finding (Driver.Historian.Wonderware-002)
Normalise req.Events to Array.Empty<AlarmHistorianEventDto>() immediately
after MessagePack deserialization in HandleWriteAlarmEventsAsync. MessagePack
deserializes an absent or explicit-nil array field as null, not Array.Empty,
so a peer that sends a null Events array would trigger a NullReferenceException
on either .Length dereference (no-writer branch or catch block), leaving the
client correlation-id wait hanging with no reply frame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:21:46 -04:00
Joseph Doherty
f982fa1f69 fix(driver-historian-wonderware): resolve High code-review finding (Driver.Historian.Wonderware-001)
WriteToReadOnlyFile was listed in MalformedErrors, so ClassifyOutcome/
MapOutcome routed it to PermanentFail and the store-and-forward sink
dead-lettered every alarm event in the batch. But WriteToReadOnlyFile is
a connection-configuration fault (the write session was opened without
ReadOnly = false), not an event-payload fault — treating it as permanent
silently and permanently discards alarm events on a misconfigured or
regressed connection, which is data loss.

Move WriteToReadOnlyFile from MalformedErrors into ConnectionErrors. The
batch loop now aborts the batch, resets the connection (so the reconnect
path re-opens a writable ReadOnly = false session), and defers the
events as RetryPlease for the next drain tick.

Updated the ClassifyOutcome theory data and added a dedicated regression
test pinning WriteToReadOnlyFile -> RetryPlease.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:59:40 -04:00
Joseph Doherty
8568f5cd85 docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:20:27 -04:00