Commit Graph

9 Commits

Author SHA1 Message Date
Joseph Doherty
ecc91b0e48 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-011)
GetMemoryFootprint() returned a constant 0 with a stale "PR 4.4 sets this" comment
even though PR 4.4 shipped the SubscriptionRegistry. Replace with a live estimate:
64 bytes × TrackedItemHandleCount + 256 bytes × TrackedSubscriptionCount. A 50k-tag
set now registers ~3 MB with the server's cache-flush heuristic instead of being
invisible. Returns 0 when no subscriptions are active.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:48:37 -04:00
Joseph Doherty
0f3de4d510 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-009)
Fix two resource-management bugs in StartDeployWatcher / BuildDefaultHierarchySource:
(a) Replace the discarded `_ = StartAsync(...)` with an explicit task variable that
    surfaces any synchronous InvalidOperationException (called-twice guard) rather than
    silently swallowing it.
(b) Change both StartDeployWatcher and BuildDefaultHierarchySource to use ??= on
    _ownedRepositoryClient so the first client created (by whichever path runs first)
    is reused by the second path, preventing a second GalaxyRepositoryClient from being
    created and the first from leaking past the driver's lifetime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:52 -04:00
Joseph Doherty
d572a011ef fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-007)
Implement IAsyncDisposable on GalaxyDriver so async sub-component disposals
(EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) are awaited rather
than blocked on GetAwaiter().GetResult(). DisposeAsync is now the primary path;
Dispose() delegates to it for using-statement compatibility. Each async component's
shutdown is awaited individually with a best-effort catch so a single slow shutdown
cannot prevent the rest of the cleanup sequence from running.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:47:00 -04:00
Joseph Doherty
d14564839e fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-006)
HashSet<T>.First() enumeration order is unspecified and unstable across mutations, so
the "owner" handle attached to alarm events was non-deterministic when multiple alarm
subscriptions were active. Change _alarmSubscriptions from HashSet to List (preserving
insertion order) and pick [0] — the earliest-registered handle — as the deterministic
owner. The server routes transitions by SourceNodeId, not by handle, so the choice of
handle does not affect delivery to active subscribers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:45:55 -04:00
Joseph Doherty
910a538b19 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-004)
Add StatusCodeMap.ToQualityCategoryByte(uint) so the StatusCode → quality-byte
mapping lives in one place next to its inverse (FromQualityByte). GalaxyDriver
OnPumpDataChange now delegates to the helper instead of duplicating the shift+switch
inline; a future edit to the OPC UA bit layout cannot silently desync the probe-health
decode. Unit tests in StatusCodeMapTests pin all three category buckets and the
round-trip invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:43:53 -04:00
Joseph Doherty
39a02f6794 fix(driver-galaxy): resolve Medium code-review finding (Driver.Galaxy-003)
StatusCodeMap.FromMxStatus checked `success != 0` to determine success, but the
mxaccessgw proto contract explicitly documents that `success` is not a boolean and
that clients must branch on `category` (MX_STATUS_CATEGORY_OK), not on `success`
alone. Replace the raw field check with `status.IsSuccess()` from
MxStatusProxyExtensions, which requires both `success != 0` AND `category == Ok`.
A worker reporting success=1 with a non-OK category was previously misreported as
Good. Updated StatusCodeMapTests with a regression case covering the inverted scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:42:47 -04:00
Joseph Doherty
7f2e144f8d fix(driver-galaxy): resolve High code-review findings (Driver.Galaxy-002, Driver.Galaxy-008)
Driver.Galaxy-002 — DataTypeMap.Map had no Int64 arm though MxValueDecoder/
MxValueEncoder both fully support Int64. Galaxy attributes with the Int64
mx_data_type code fell through to the String default, creating a String
address-space node while runtime reads decoded a boxed long. Added
`6 => DriverDataType.Int64`, extending the contiguous 0..5 scheme so the type
map agrees with the decoder/encoder on all seven Galaxy data types.

Driver.Galaxy-008 — after a stream fault the EventPump's StreamEvents consumer
loop exited and its channel completed; EventPump.Start() is a no-op on a
completed-but-non-null loop, so a replayed subscription had no consumer and
ReplayAsync never re-registered the post-reconnect item handles. ReplayAsync
now recreates the EventPump (RestartEventPumpForReplay) and rebinds the
SubscriptionRegistry per subscription with the fresh item handles returned by
the post-reconnect SubscribeBulkAsync, via new SubscriptionRegistry.SnapshotEntries
and Rebind APIs.

Regression tests: DataTypeMapTests (every code incl. Int64), SubscriptionRegistry
Tests (Rebind/SnapshotEntries), EventPumpStreamFaultTests (faulted pump dead,
fresh pump resumes dispatch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:59:38 -04:00
Joseph Doherty
4df8737c86 fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.

This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:33 -04:00
Joseph Doherty
8568f5cd85 docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:20:27 -04:00