Commit Graph

608 Commits

Author SHA1 Message Date
Joseph Doherty
9db2edcbb5 parity: matrix fully green on dev rig (2026-04-30)
End-to-end run on the live ZB galaxy with mxaccessgw on
http://localhost:5120: 14 passed / 1 skipped / 0 failed in 18m53s.
PR 7.2's matrix-gate condition met. Three resolution patches in this
commit; the matrix doc records the new state.

1. Discoverer: defensive `[]` array-suffix strip
   ----------------------------------------------------
   The gw's GalaxyRepository.cs:173-175 appends `[]` to
   array-typed full_tag_reference values, but MxAccess COM
   IInstance.AddItem doesn't accept `[]`-suffixed addresses.
   GalaxyDiscoverer.StripArraySuffix removes the suffix client-side
   so SubscribeBulk / Read / Write paths see the canonical form.
   Tracked in mxaccessgw/requirements-array-suffix-fix.md; this
   workaround is removed when the gw fix lands.
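
A minimal sketch of the client-side strip described in item 1, written in Python for brevity rather than the driver's C#. The name `strip_array_suffix` is hypothetical; it only mirrors what GalaxyDiscoverer.StripArraySuffix is described as doing, and the real helper may cover more cases.

```python
def strip_array_suffix(full_tag_reference: str) -> str:
    """Return the reference without a trailing "[]" marker, if one is present."""
    suffix = "[]"
    if full_tag_reference.endswith(suffix):
        return full_tag_reference[: -len(suffix)]
    # Indexed references such as "Tank.Levels[3]" are left untouched.
    return full_tag_reference
```

Only an empty trailing `[]` is removed, so already-canonical references pass through unchanged.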

2. WriteByClassification: pin status class, not exact code
   ---------------------------------------------------------
   Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every
   failure to BadInternalError (0x80020000); mxgw's
   GatewayGalaxyDataWriter.TranslateReply uses
   MxStatusProxy.RawDetectedBy to distinguish gw-layer faults
   (BadCommunicationError, 0x80050000) from MxAccess HRESULT
   faults. Both yield Bad-status — the parity invariant is the
   status class (Good/Uncertain/Bad), not the exact code. Both
   write tests now use AssertStatusClassMatches; legacy mapping
   retires alongside GalaxyProxyDriver in PR 7.2.
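
The status-class comparison in item 2 can be sketched as follows. This is an illustration in Python, not the body of AssertStatusClassMatches. It relies on the OPC UA convention that the top two bits of a 32-bit StatusCode carry the severity (00 Good, 01 Uncertain, 10 Bad). The function names are hypothetical, and treating the reserved 11 pattern as Bad is an assumption.

```python
from enum import Enum


class StatusClass(Enum):
    GOOD = "Good"
    UNCERTAIN = "Uncertain"
    BAD = "Bad"


def status_class(code: int) -> StatusClass:
    """Classify a 32-bit OPC UA StatusCode by its two severity bits."""
    severity = (code >> 30) & 0b11
    if severity == 0b00:
        return StatusClass.GOOD
    if severity == 0b01:
        return StatusClass.UNCERTAIN
    # 0b10 is Bad; 0b11 is reserved and treated as Bad here (assumption).
    return StatusClass.BAD


def status_class_matches(legacy_code: int, mxgw_code: int) -> bool:
    """True when both codes fall in the same Good/Uncertain/Bad class."""
    return status_class(legacy_code) is status_class(mxgw_code)
```

Under this check, 0x80020000 and 0x80050000 compare equal because both are Bad-class, even though the exact codes differ.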

3. BrowseAndReadParity Read scenario: drop CLR-type assertion
   ------------------------------------------------------------
   Legacy returns the raw VARIANT (e.g. byte[]) for an attribute
   that hasn't received its first value cycle from MxAccess yet,
   while mxgw returns the typed value (Single, Int32, etc.). Once
   a real value is written or scanned, both converge. Pinning
   CLR-type equality across the uninitialized window adds noise
   without a real parity invariant — the StatusCode-class
   assertion already covers the "did the read succeed" question.
   The test still pins StatusCode-class parity per scenario.

4. Galaxy.ParityMatrix.md — first-rig results captured
   -----------------------------------------------------
   Per-row status flipped from "n/a unverified" to actual
   green / yellow / deferred outcomes from this run. Four new
   accepted-deltas added (read-value CLR type, write-status code
   mapping, single-platform ScanState scope, gw `[]` suffix
   workaround), bringing the total to nine. Outstanding deltas
   section flipped to "none as of 2026-04-30."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 04:19:56 -04:00
Joseph Doherty
5e890ec9d6 parity: triage 3 false-positives from first-rig run (2026-04-30)
After running the matrix end-to-end against the live rig for the
first time, three of the nine failures were false positives — bugs in
the harness and test invariants, not real backend deltas:

1. ParityHarness configured the legacy backend with
   OTOPCUA_GALAXY_BACKEND=db, which is Discover-only. Reads, writes,
   and reinits all returned "MXAccess code lift pending — DB-backed
   backend covers Discover only". Switched to mxaccess backend; the
   ZB connection string still drives the discovery path.

2. HistoryReadParityTests asserted "neither backend implements
   IHistoryProvider" — but the legacy GalaxyProxyDriver still does
   (it's an accepted back-compat delta retired in PR 7.2). The
   architectural pin we *want* is "the new path doesn't regress to
   per-driver history", so the test now asserts only the mxgw side.

3. AlarmTransitionParityTests strict-pinned the five sub-attribute
   refs (InAlarmRef, etc.) on the legacy condition. PR 2.1 added
   those refs specifically so the new mxgw driver could populate them
   via AlarmRefBuilder; legacy pre-dates PR 2.1 and leaves them null
   — that's correct, not a regression. Test now asserts a one-way
   invariant: when legacy populated a ref, mxgw must match. When
   legacy is null, mxgw is free to populate (the mxgw → server-side
   AlarmConditionService direction).
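
The one-way invariant in item 3 can be sketched like this. It is a Python illustration only: the function name, the dictionary shape, and the example reference strings are hypothetical and are not taken from AlarmTransitionParityTests.

```python
from typing import Dict, List, Optional


def one_way_ref_mismatches(
    legacy_refs: Dict[str, Optional[str]],
    mxgw_refs: Dict[str, Optional[str]],
) -> List[str]:
    """Return the ref names where legacy is populated but mxgw disagrees."""
    mismatches = []
    for name, legacy_value in legacy_refs.items():
        if legacy_value is None:
            # Legacy left the ref empty, so mxgw is free to populate it.
            continue
        if mxgw_refs.get(name) != legacy_value:
            mismatches.append(name)
    return mismatches
```

An empty result means the invariant holds; a ref that only mxgw populates never counts as a mismatch.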

The six remaining failures are real:

- 2 from the gw-side `[]` array suffix (filed in
  mxaccessgw/requirements-array-suffix-fix.md)
- 2 write-StatusCode mapping deltas (0x80050000 vs 0x80020000) —
  Bad-status both ways but mapped to different OPC UA codes
- 1 event-rate ratio of 5x (mxgw dispatches 5x legacy in the same
  3s window)
- (Plus the 2 ScanState scenarios that skip cleanly — single-platform
  rig as documented)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:00:44 -04:00
Joseph Doherty
580c45f494 docs: parity rig — concrete mxaccessgw setup recipe
Replaces the placeholder "configure an API key per gateway.md" with
the actual commands that worked end-to-end on this dev box:

- Build both halves (Worker x86 net48, Server net10)
- apikey init-db + apikey create-key with the seven scopes the parity
  test exercises (session:*, invoke:*, events:read, metadata:read)
- Three env-var overrides at server startup — capturing real lessons
  learned standing the rig up:
  * Kestrel__Endpoints__Http__Url = http://localhost:5120
  * Kestrel__Endpoints__Http__Protocols = Http2 (gRPC needs h2c on
    plain HTTP — without this flag the client gets HTTP_1_1_REQUIRED)
  * MxGateway__Worker__ExecutablePath = absolute path to the built
    worker (appsettings.json's relative path drops \net48 and the
    server can't resolve it)
- Note that workers spawn lazily on first OpenSession, not at server
  startup — so port-listening is necessary but not sufficient
  evidence the gateway is healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:27:08 -04:00
Joseph Doherty
da277a843a docs: provisioning recipes for parity rig via graccess-cli
Calls out the single-platform constraint on this dev box and the
graccess-cli at C:\Users\dohertj2\Desktop\graccess as the way to
configure the rest of the parity-rig Galaxy shape:

- ScanState probe parity (multi-platform) is deferred to a customer
  rig — not feasible on this dev box. PR 7.2 gate accepts
  "n/a, deferred" on those rows because PR 4.7's unit tests already
  pin the state-decoder + member-tracking logic.
- Per-row provisioning recipes for the five ⚙-scriptable rows:
  FreeAccess/Operate UDA, Configure/Tune UDA, value-change source
  (recommend external write-loop over template surgery), $Alarm*
  extension, History extension. All against a reserved
  OtOpcUaParityTest sandbox UDO so plant-relevant objects stay
  untouched.
- Trailing deploy + Galaxy.Host restart so MxAccess picks up the
  change before re-running the matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:40:31 -04:00
Joseph Doherty
c55da145ec docs: add Galaxy parity rig runbook
Walks through standing up both Galaxy backends side-by-side against a
single live Galaxy:

- Conceptual layout (two MxAccess sessions on distinct ClientNames so
  they don't evict each other)
- What's already on the dev box (AVEVA + OtOpcUaGalaxyHost service)
- mxaccessgw build + run + config (API key, ClientName)
- The three OTOPCUA_PARITY_* env vars the harness reads
- HarnessShapeTests as the two-line truth-teller for "did both halves
  resolve"
- Galaxy-shape coverage matrix mapping each scenario to what's needed
  for it to assert (rather than skip)
- Soak run recipes, including the compressed-tag fallback when the dev
  Galaxy doesn't have 50k attributes
- Troubleshooting for the four common SkipReasons
- Three further gates before PR 7.2 lands (matrix green, soak data,
  pilot flip)

Explicitly drops the stale "use a non-elevated shell" precondition —
the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated
dohertj2 alike (resolved 2026-04-24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:08:43 -04:00
Joseph Doherty
42f41fbe50 v2-mxgw follow-ups: production reads, secret resolution, perf knobs
Lands the five concrete code-level follow-ups identified after Phase 7.1:

#1 GalaxyDriver.ReadAsync now works in production. Previously threw
   NotSupportedException when no test reader was injected. New path
   subscribes through the existing SubscriptionRegistry + EventPump,
   waits for the first OnDataChange per item handle (gw pushes the
   initial value after SubscribeBulk), then unsubscribes. Tags the gw
   rejects up front, or that don't publish before the caller's CT
   fires, return Bad-status snapshots in input order so callers still
   get one snapshot per requested reference.

#2 ResolveApiKey() routes Gateway.ApiKeySecretRef through three forms:
   env:NAME, file:PATH, or literal-string fallback. A future DPAPI arm
   slots in here without touching the call site.
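
A sketch of the three-form resolution described in #2, in Python rather than the driver's C#. The function name is hypothetical, and two behaviors are assumptions not stated above: raising when the environment variable is missing, and trimming whitespace from the file contents.

```python
import os
from pathlib import Path


def resolve_api_key(secret_ref: str) -> str:
    """Resolve env:NAME, file:PATH, or fall back to the literal string."""
    if secret_ref.startswith("env:"):
        name = secret_ref[len("env:"):]
        value = os.environ.get(name)
        if value is None:
            # Assumption: a missing variable is an error rather than an empty key.
            raise KeyError(f"environment variable {name!r} is not set")
        return value
    if secret_ref.startswith("file:"):
        path = Path(secret_ref[len("file:"):])
        # Assumption: surrounding whitespace and newlines are not part of the key.
        return path.read_text(encoding="utf-8").strip()
    # Literal fallback: the reference itself is the key.
    return secret_ref
```

Because each form is a prefix check ahead of the literal fallback, a further arm (such as the DPAPI one mentioned above) would be one more branch in the same function.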

#3 GatewayGalaxySubscriber actually honors bufferedUpdateIntervalMs now
   (was being silently dropped). Calls SetBufferedUpdateInterval via
   the gw's MxCommandKind.SetBufferedUpdateInterval before SubscribeBulk
   when the requested interval differs from the cached last-applied
   value. Soft-fails on a non-Ok protocol status (the SubscribeBulk
   still succeeds at gw cadence).

#4 GalaxyMxAccessOptions.EventPumpChannelCapacity surfaces the bounded-
   channel size through DriverConfig JSON, defaulting to 50_000.

#5 Cleans up stale doc-comments in HostStatusAggregator and
   GatewayGalaxySubscriber that described follow-ups which already shipped.

Tests: +6 (read subscribe-once happy path + rejected-tag fallback;
five resolver scenarios). Total Galaxy driver tests now 180/180 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:27:24 -04:00
Joseph Doherty
d5a87c7467 PR 7.3 — Doc updates for v2 Galaxy backend (partial)
Forward-looking doc surface for the new in-process GalaxyDriver:

- CLAUDE.md gains a "v2 Galaxy backend" preamble at the top pointing
  readers at lmx_mxgw.md and docs/v2/Galaxy.Performance.md, and
  framing the rest of the doc as the still-accurate v1 Galaxy.Host
  description.
- New auto-memory entry project_galaxy_via_mxgateway.md captures the
  default-since-PR-7.1 status, perf surface entry points, and the
  soak validation knobs.

Intentionally deferred until PR 7.2 (parity-rig-validated):

- Removing the v1 description and rewriting the architecture section
  outright.
- Deleting mxaccess_documentation.md (still consumed by Galaxy.Host).
- Retiring memory entries for project_galaxy_host_service.md /
  project_galaxy_host_installed.md / project_aveva_platform_installed.md
  — those describe a stack that's still installed and in active use.
- Scrubbing Galaxy.Host references from docs/v2/dev-environment.md,
  docs/ServiceHosting.md, docs/Redundancy.md, docs/security.md.

All those changes presuppose the legacy stack is gone, which it isn't
yet. Re-open this PR's tail once the parity matrix in
docs/v2/Galaxy.ParityMatrix.md is fully green on a live rig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:07:23 -04:00
Joseph Doherty
6f4cbf8449 PR 7.1 — Default-flip Galaxy backend to mxgateway
Adds Galaxy.DefaultBackend = "GalaxyMxGateway" to the server
appsettings as the forward-looking default for tooling and migration
scripts that author new Galaxy DriverInstance rows. No runtime
behavior change — both factories register independently at startup,
so existing rows keep working until PR 7.2 retires the legacy
registration (gated on the parity matrix in
docs/v2/Galaxy.ParityMatrix.md going fully green on the parity rig).

The e2e-config.sample.json comment is updated to reflect the new
default endpoint (http://localhost:5120 mxaccessgw) while still
pointing pre-flip rigs at the legacy OtOpcUaGalaxyHost path.

Install-Services.ps1's OtOpcUaGalaxyHost registration is intentionally
unchanged — yanking that mid-flight without a soaked parity rig would
leave any in-progress installation without a Galaxy backend at all.
PR 7.2 retires it alongside the legacy projects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:50 -04:00
Joseph Doherty
edee47d77f PR 6.W — Galaxy.Performance.md
Documents the five perf surfaces shipped in Phase 6:

- Tracing surface (PR 6.1) — table of every span the driver emits +
  rationale for stream-level (not per-event) coverage.
- Metrics surface (PR 6.2) — three EventPump counters, tagging
  scheme, the bounded-channel design, and the
  received = dispatched + dropped + in-flight invariant.
- Buffered update interval (PR 6.3) — how MxAccess.PublishingIntervalMs
  flows through both subscribe paths and what's still pending on the
  gw side (typed SetBufferedUpdateInterval helper).
- Soak scenario (PR 6.4) — env-var-gated 24h × 50k validation with
  the CI-compressed override recipe.
- Tuned defaults (PR 6.5) — table of every default with source +
  notes; rows marked "unchanged" carry the explicit "no live data
  argues for changing this" caveat.

Closes with a "where to look first when something's slow" runbook
section so on-call doesn't have to re-derive the trace+metric
correlation map from primary docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:04:23 -04:00
Joseph Doherty
22ef2eb5ba PR 6.5 — Tune MxGatewayClientOptions defaults
Bumps DefaultCallTimeoutSeconds from 5 → 30. The 5s default was
provably unsafe regardless of soak data: a 50k-tag SubscribeBulk
walks the gw worker's item list serially under the MxAccess COM
apartment lock, and that scan can exceed 5s on a busy node. 30s
leaves comfortable headroom for the legitimate worst case while
still failing fast on a wedged worker.

ConnectTimeoutSeconds (10) and StreamTimeoutSeconds (0 = unlimited)
unchanged — the soak harness in PR 6.4 didn't observe pressure on
either, so they stay at their original sane values until live data
indicates otherwise.

Tuning rationale captured as a code comment in GalaxyGatewayOptions
so the next reader knows what was deliberate and what's pending live
soak data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:03:06 -04:00
Joseph Doherty
698bdef572 PR 6.4 — Soak scenario test
Long-running soak harness exercising the in-process GalaxyDriver
against a live mxaccessgw. Subscribes a configurable tag count
(default 50_000), holds the subscription for a configurable duration
(default 24h), polls the EventPump's three counters every minute, and
asserts:

- events.received continues to grow (gw stream isn't stuck)
- events.dropped stays under a configurable percent ceiling
  (default 0.5%)
- process working-set doesn't grow >1 GB above baseline (leak guard)

Always skipped unless the operator opts in via OTOPCUA_SOAK_RUN=1.
Tag count, duration, and drop ceiling are env-overridable
(OTOPCUA_SOAK_TAGS / OTOPCUA_SOAK_MINUTES / OTOPCUA_SOAK_DROP_PCT) so
a smoke run can compress the scenario for CI gating.
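
A sketch of how the opt-in flag and the three overrides could be read, using the defaults stated above (50_000 tags, 24h, 0.5%). It is written in Python for illustration; the `SoakSettings` type and `load_soak_settings` function are hypothetical and do not come from the soak harness.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SoakSettings:
    enabled: bool
    tag_count: int
    duration_minutes: int
    drop_ceiling_pct: float


def load_soak_settings(env: Optional[Mapping[str, str]] = None) -> SoakSettings:
    """Read the soak knobs from the environment, falling back to the defaults."""
    env = os.environ if env is None else env
    return SoakSettings(
        enabled=env.get("OTOPCUA_SOAK_RUN") == "1",
        tag_count=int(env.get("OTOPCUA_SOAK_TAGS", "50000")),
        duration_minutes=int(env.get("OTOPCUA_SOAK_MINUTES", str(24 * 60))),
        drop_ceiling_pct=float(env.get("OTOPCUA_SOAK_DROP_PCT", "0.5")),
    )
```

A compressed smoke run would then set, for example, OTOPCUA_SOAK_RUN=1 with a small OTOPCUA_SOAK_TAGS and OTOPCUA_SOAK_MINUTES.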

Per-minute progress is logged as a CSV-style line to stdout so an
operator can grep the test runner output mid-run. PR 6.5 consumes the
data this scenario emits to tune MxGatewayClientOptions defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:00:52 -04:00
Joseph Doherty
2fdad81af3 PR 6.3 — Buffered update interval landing
Wires MxAccess.PublishingIntervalMs into the gw's SubscribeBulk
bufferedUpdateIntervalMs parameter on both subscribe paths:

- GalaxyDriver.SubscribeAsync — when the caller passes TimeSpan.Zero
  (typical for infrastructure callers like the deploy watcher), the
  driver substitutes _options.MxAccess.PublishingIntervalMs. When the
  caller sets a non-zero interval (the server's UA subscription
  publishingInterval), that wins.
- PerPlatformProbeWatcher — new bufferedUpdateIntervalMs ctor parameter
  defaulting to 0 (gw default cadence). GalaxyDriver passes
  _options.MxAccess.PublishingIntervalMs so probe ScanState changes
  publish at the configured rate.
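
The caller-wins rule above reduces to a small selection function. This Python sketch uses a hypothetical name; rejecting negative values is borrowed from the probe-watcher test list below and is an assumption for the driver path.

```python
def effective_buffered_interval_ms(requested_ms: int, configured_ms: int) -> int:
    """Pick the interval to pass to SubscribeBulk.

    A non-zero caller interval wins; zero falls back to the configured
    MxAccess.PublishingIntervalMs value.
    """
    if requested_ms < 0 or configured_ms < 0:
        raise ValueError("intervals must be non-negative")
    return requested_ms if requested_ms != 0 else configured_ms
```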

Tests: caller-wins-when-non-zero, fallback-to-config-when-zero on the
driver; default-zero, configured-forwarded, negative-rejected on the
probe watcher.

A session-level SetBufferedUpdateInterval RPC exists in the gw protocol
(MxCommandKind.SetBufferedUpdateInterval) but the .NET client doesn't
expose a typed helper yet — adjusting an existing subscription's
interval is a follow-up. Today's path subscribes once with the right
interval, which covers the common case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:56:33 -04:00
Joseph Doherty
7b21c3b428 PR 6.2 — Bounded EventPump channel + drop-newest metrics
Decouples the gw stream-read loop from the listener-fanout loop with a
bounded Channel<MxEvent> (default capacity 50_000) sitting between them.
When a slow listener fills the channel, the producer's TryWrite returns
false and we count the drop rather than back-pressuring the gw stream.

Three counters on the ZB.MOM.WW.OtOpcUa.Driver.Galaxy meter expose the
pressure curve before it manifests as user-visible loss:

- galaxy.events.received  — MxEvents read from StreamEvents
- galaxy.events.dispatched — MxEvents that made it through to OnDataChange
- galaxy.events.dropped   — MxEvents discarded because the channel was full
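
The drop-newest behavior and the three counters can be sketched with a plain bounded queue. This is a single-threaded Python illustration, not the Channel<MxEvent>-based EventPump; the class and method names are hypothetical.

```python
from collections import deque
from typing import Any, Callable


class BoundedEventBuffer:
    """Bounded buffer that counts and discards new events when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._queue: deque = deque()
        self.received = 0
        self.dispatched = 0
        self.dropped = 0

    def offer(self, event: Any) -> bool:
        """Producer side: never blocks; returns False when the event is dropped."""
        self.received += 1
        if len(self._queue) >= self._capacity:
            self.dropped += 1
            return False
        self._queue.append(event)
        return True

    def dispatch_one(self, listener: Callable[[Any], None]) -> bool:
        """Consumer side: hand the oldest buffered event to the listener."""
        if not self._queue:
            return False
        listener(self._queue.popleft())
        self.dispatched += 1
        return True

    @property
    def in_flight(self) -> int:
        return len(self._queue)
```

At any point in this sketch, received equals dispatched + dropped + in_flight, which makes a growing dropped counter the early signal of a slow listener.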

Each measurement carries a galaxy.client tag so multi-driver hosts can
split by source. The driver wires _options.MxAccess.ClientName into the
new EventPump constructor parameter.

Tests: drop-newest under pressure, capacity validation, and per-pump
measurement filtering (xUnit can run other pump tests in parallel and
their measurements land on the same listener — the test filters to its
own client name).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:50:39 -04:00
Joseph Doherty
619207e7f5 PR 6.1 — OpenTelemetry traces around gw calls
In-box ActivitySource ("ZB.MOM.WW.OtOpcUa.Driver.Galaxy") wrapped around
the three gw-facing seams via decorators:

- TracedGalaxySubscriber — galaxy.subscribe_bulk / galaxy.unsubscribe_bulk
  / galaxy.stream_events spans. Stream span covers the entire stream
  lifetime with a galaxy.event_count tag (per-event spans would dominate
  the trace volume at 50k tags / 1Hz; PR 6.2 owns per-event metrics).
- TracedGalaxyDataWriter — galaxy.write spans tagged with
  galaxy.tag_count, galaxy.secured_write_count (split between FreeAccess
  /Operate vs Tune/Configure/VerifiedWrite, computed only when a listener
  is recording so the hot path stays free), galaxy.success_count.
- TracedGalaxyHierarchySource — galaxy.get_hierarchy spans tagged with
  galaxy.object_count.

GalaxyDriver.BuildProductionRuntimeAsync wraps the production seams in
the decorators. The driver itself doesn't take an OpenTelemetry package
dependency — System.Diagnostics.ActivitySource is in-box; the host
process picks the listener.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:36:47 -04:00
Joseph Doherty
78fe3e8a45 PR 5.W — Galaxy.ParityMatrix.md
Tabular scenario × result map for the seven Phase 5 parity scenarios
(BrowseAndRead, Subscribe, Write, Alarm, History, Reconnect, ScanState).
Each row records the assertion strength (green strict, yellow soft) and
flags accepted-delta cases:

- Transport-entry host name divergence (legacy = Galaxy.Host process,
  mxgw = MxAccess.ClientName)
- Reconnect latency cadence — different paths, both correct for their
  own session shape
- Sampled-read value drift (we pin StatusCode + type, not value)
- Event-rate ±50% tolerance over a 3s window
- Per-driver IHistoryProvider absence (architectural pin from PR 1.3)

Phase 7 (PR 7.1) consumes this matrix as the default-flip gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:32:20 -04:00
Joseph Doherty
837172ab39 PR 5.8 — Per-platform ScanState probe parity scenarios
Closes Phase 5 scenario coverage. Both
GalaxyRuntimeProbeManager (legacy) and PerPlatformProbeWatcher (PR 4.7)
must surface the same per-host status stream:

- GetHostStatuses_emits_same_host_set_after_Discover — drives Discover
  on both backends, waits 1.5s for the probe watcher's first push, then
  asserts the platform-host set agrees (transport-entry names differ
  by design — legacy uses the Galaxy.Host process identity, mxgw uses
  MxAccess.ClientName, so we strip those before comparing).
- GetHostStatuses_state_per_platform_matches_across_backends — for
  every overlapping platform host, the HostState must be identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:31:09 -04:00
Joseph Doherty
80a0ca2651 PR 5.7 — Reconnect / disruption parity scenarios
- Reinitialize_returns_both_backends_to_Healthy — drives
  ReinitializeAsync on each backend, asserts DriverState.Healthy
  afterwards, then re-reads a 3-tag sample to confirm the runtime
  surface is back. Recovery latency isn't pinned tightly (legacy = pipe
  + MxAccess COM client, mxgw = re-Register gw session — different
  cadences are expected).
- Health_state_diverges_only_when_one_backend_is_in_recovery — soft
  pin that both backends sit in Healthy or Degraded after init.

A tighter fault-injection scenario (toxiproxy-style) is the 5.7
follow-up, to land once the parity rig grows that capability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:29:44 -04:00
Joseph Doherty
8d042c631b PR 5.6 — History-read parity scenarios
Galaxy history reads route through the server-owned HistoryRouter
(Phase 1, PR 1.3) — neither Galaxy backend implements IHistoryProvider
directly. Parity surface here is the routing decision:

- Discover_emits_same_historized_attribute_set_for_both_backends — the
  IsHistorized attribute set must agree symmetric-set-wise; that's what
  HistoryRouter consumes when deciding whether to route a HistoryRead to
  the Wonderware historian sidecar.
- Neither_Galaxy_backend_implements_IHistoryProvider_directly — pins
  the architectural decision so a regression that re-introduces a
  per-driver history path fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:29:01 -04:00
Joseph Doherty
bbdbdf8afb PR 5.5 — Alarm transition parity scenarios
- Discover_emits_same_AlarmConditionInfo_per_alarm_attribute — both
  backends produce the same alarm-condition source-node-id set, with
  matching SourceName / InitialSeverity / InAlarmRef / DescAttrNameRef
  per condition. Skips when the rig's Galaxy carries no alarm-marked
  attributes.
- Discover_marks_at_least_one_alarm_attribute_when_dev_Galaxy_has_alarms
  — IsAlarm-marked variable count parity, soft-pinned (count must
  match across backends but doesn't have to be non-zero).

Alarm-event persistence (the SQLite store-and-forward → Wonderware
historian event store path) is exercised in PR 5.6 against the
historian sidecar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:28:13 -04:00
Joseph Doherty
982771df9a PR 5.4 — Write-by-classification parity scenarios
Both backends route a write through the same path keyed off the attribute's
SecurityClassification, so a single write request must produce the same
StatusCode on each:

- FreeAccess_or_Operate_write_returns_same_StatusCode_on_both_backends
  picks the first numeric FreeAccess/Operate attribute and writes 0.0.
- Configure_class_write_routes_through_secured_path_on_both_backends
  picks a Configure/Tune attribute, writes through the secured path,
  asserts StatusCode parity (the test doesn't care whether the write
  succeeds — only that both backends produce the same outcome).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:26:57 -04:00
Joseph Doherty
9db6da9c20 PR 5.3 — Subscribe + event-rate parity scenarios
- Subscribe_returns_a_handle_for_each_backend — both backends accept
  the same full-reference list and return a non-null handle, with
  symmetric Unsubscribe cleanup.
- Subscribe_event_rate_within_tolerance_for_a_3s_window — counts
  OnDataChange invocations on each backend across a 3s window and
  asserts the mxgw/legacy ratio sits in [0.5, 1.5]. Skips when the
  sampled tags don't change in the window (configuration-only Galaxy).
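
The tolerance check in the second scenario can be sketched as below. The function name is hypothetical. Returning None for "skip" when neither backend saw a change, and failing when only legacy saw none, are assumptions about how the test treats those cases.

```python
from typing import Optional


def event_rate_within_tolerance(
    mxgw_count: int,
    legacy_count: int,
    low: float = 0.5,
    high: float = 1.5,
) -> Optional[bool]:
    """Return True/False for the ratio check, or None when the check should skip."""
    if mxgw_count == 0 and legacy_count == 0:
        # No sampled tag changed in the window: nothing to compare.
        return None
    if legacy_count == 0:
        # Assumption: events on mxgw only count as a failure, not a skip.
        return False
    ratio = mxgw_count / legacy_count
    return low <= ratio <= high
```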

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:25:42 -04:00
Joseph Doherty
71443ecbf3 PR 5.2 — Browse + read parity scenarios
Three scenarios using ParityHarness.RequireBoth:

- Discover_emits_same_variable_set_for_both_backends — symmetric set diff
  on the full-reference set must be empty.
- Discover_emits_same_DataType_and_SecurityClass_per_attribute — meta
  triple (DriverDataType, SecurityClass, IsHistorized) must match per
  attribute.
- Read_returns_same_value_and_status_for_a_sampled_attribute — samples
  the first 5 discovered variables, reads through both backends, asserts
  StatusCode equality and value-CLR-type equality (raw values may drift
  between the two reads on a live Galaxy).
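
The symmetric set diff used by the first scenario can be sketched as follows (Python, hypothetical function name). Returning the two one-sided differences separately is a design choice of this sketch, so a failure message can say which backend is missing which reference.

```python
from typing import Iterable, List, Tuple


def symmetric_reference_diff(
    legacy_refs: Iterable[str],
    mxgw_refs: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (only_in_legacy, only_in_mxgw); both empty means the sets agree."""
    legacy_set, mxgw_set = set(legacy_refs), set(mxgw_refs)
    return sorted(legacy_set - mxgw_set), sorted(mxgw_set - legacy_set)
```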

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:24:36 -04:00
Joseph Doherty
82cdf460c5 PR 5.1 — Driver.Galaxy.ParityTests project shell + ParityHarness
Side-by-side fixture that boots both backends against the same dev Galaxy:

- Legacy GalaxyProxyDriver against an out-of-process Galaxy.Host EXE
  (skipped when ZB SQL on localhost:1433 isn't reachable or when the EXE
  hasn't been built).
- New in-process GalaxyDriver against an mxaccessgw gateway at
  http://localhost:5120 by default (skipped when the gateway isn't
  reachable). Endpoint, API key, and client name are env-var overridable
  for the central parity host.

Per-backend availability is independent — each scenario decides whether
to RequireBoth, GetDriver(specific), or use RunOnAvailableAsync to drive
both with the same closure and diff snapshots. PR 5.2–5.8 land scenarios
on top of this shell.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:22:04 -04:00
Joseph Doherty
21cac4c8c4 PR 4.W — Galaxy:Backend wiring + server-side factory registration
- GalaxyDriver.InitializeAsync now builds the production gw runtime (MxGatewayClient,
  GalaxyMxSession, GatewayGalaxySubscriber, GatewayGalaxyDataWriter,
  ReconnectSupervisor, HostConnectivityForwarder, PerPlatformProbeWatcher) when no
  test seams are pre-injected; Dispose tears the chain down in order.
- GetHealth surfaces supervisor.IsDegraded as DriverState.Degraded so a transport
  drop is observable without polling the supervisor directly.
- DiscoverAsync now refreshes the per-platform probe watcher's membership against
  $WinPlatform / $AppEngine objects after every discovery pass.
- OnPumpDataChange routes ScanState changes through the probe watcher in addition
  to fanning out OnDataChange to ISubscribable consumers.
- Server registers GalaxyDriver under "GalaxyMxGateway" alongside the legacy
  "Galaxy" GalaxyProxyDriver factory so DriverInstance rows can opt in.
- Bumped Server.Tests' Microsoft.Extensions.Logging.Abstractions to 10.0.7 to
  resolve the downgrade pulled in transitively via MxGateway.Client.
- Lifecycle factory tests switched to the internal seam-injection ctor so they
  no longer attempt a real gRPC connect during InitializeAsync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:10:31 -04:00
Joseph Doherty
dae520b9c0 PR 4.7 — Host-connectivity probes (IHostConnectivityProbe scaffold)
HostStatusAggregator merges transport + per-platform host entries with
change-event diffing (re-asserting same state is a no-op so a stable
ScanState=Running burst doesn't fan out duplicates). PerPlatformProbeWatcher
ports the legacy GalaxyRuntimeProbeManager state machine onto the gw
subscription path: SubscribeBulk for `<tag>.ScanState`, idempotent
SyncPlatformsAsync (subscribe new, unsubscribe dropped), and a
DecodeState helper pinning bool/int/string ScanState values + bad-quality
fallback. HostConnectivityForwarder is the skeleton for the gw-6
StreamSessionHealth signal — until that mxaccessgw RPC ships, PR 4.5's
ReconnectSupervisor pushes transport state by calling SetTransport on
session connect/disconnect.
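
The change-event diffing described for HostStatusAggregator can be sketched as below. This is a Python illustration with hypothetical names; it is not thread-safe, whereas the real aggregator is tested under concurrent updates, and representing states as plain strings is an assumption.

```python
from typing import Callable, Dict, Optional


class HostStatusAggregatorSketch:
    """Tracks per-host state and reports only genuine transitions."""

    def __init__(self, on_change: Optional[Callable[[str, str, str], None]] = None) -> None:
        self._states: Dict[str, str] = {}
        self._on_change = on_change

    def update(self, host: str, state: str) -> bool:
        """Record a state; return True only when it differs from the last one."""
        key = host.casefold()  # host names compare case-insensitively
        previous = self._states.get(key)
        if previous == state:
            return False  # re-asserting the same state is a no-op
        self._states[key] = state
        if self._on_change is not None:
            # A host seen for the first time has "Unknown" as its predecessor.
            self._on_change(host, previous or "Unknown", state)
        return True

    def remove(self, host: str) -> bool:
        return self._states.pop(host.casefold(), None) is not None

    def snapshot(self) -> Dict[str, str]:
        return dict(self._states)
```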

GalaxyDriver wiring (implement IHostConnectivityProbe, route OnDataChange
to PerPlatformProbeWatcher, expose GetHostStatuses() / OnHostStatusChanged,
push transport from supervisor) is deferred to PR 4.W to avoid conflict
with the rest of the Phase 4 deferred wiring (4.5 supervisor + 4.6
DeployWatcher).

Tests: 19 new
- HostStatusAggregatorTests (9): empty snapshot, new-host change with
  Unknown predecessor, same-state silence, transition diff, snapshot
  reflects every host, case-insensitive host names, Remove returns true
  for tracked, Remove false for unknown, concurrent updates don't corrupt.
- HostConnectivityForwarderTests (5): SetTransport routes under client
  name, transitions fire change, repeated same-state silent, empty client
  name throws, post-dispose throws.
- PerPlatformProbeWatcherTests (5 + theory pinning DecodeState's full
  truth table): subscribe N platforms, idempotent re-sync, removed
  platforms unsubscribed + dropped from aggregator, OnProbeValueChanged
  routing for Running/Stopped/bad-quality/foreign-ref, Dispose
  unsubscribes everything.

NOTE: build is currently broken because mxaccessgw/clients/dotnet/ has
been removed from C:\Users\dohertj2\Desktop\mxaccessgw — this PR's source
is internally consistent and isolated from the missing dependency, but the
existing Driver.Galaxy code (PRs 4.1–4.6) can't compile until the .NET
client is restored. Once it is, expect 116 + 19 = 135 tests in the
Driver.Galaxy.Tests project.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:47:13 -04:00
Joseph Doherty
123e3e48b9 PR 4.5 — ReconnectSupervisor
State machine that drives GalaxyDriver's recovery from gw transport
failure. Healthy → TransportLost → Reopening → Replaying → Healthy. Drivers
report failure signals; the supervisor runs reopen + replay with capped
exponential backoff (default 500ms → 30s) until both succeed.
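
The capped exponential backoff can be sketched as a delay generator. The 500ms start and 30s cap come from the description above; the doubling factor and the absence of jitter are assumptions, and the function name is hypothetical.

```python
from typing import Iterator


def backoff_delays_ms(
    initial_ms: int = 500,
    cap_ms: int = 30_000,
    factor: float = 2.0,
) -> Iterator[int]:
    """Yield retry delays that grow geometrically and then stay at the cap."""
    delay = float(initial_ms)
    while True:
        yield min(int(delay), cap_ms)
        delay = min(delay * factor, float(cap_ms))
```

With these assumed parameters the delays run 500, 1000, 2000, 4000, 8000, 16000 and then hold at 30000 until reopen and replay both succeed.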

Files:
- Runtime/ReconnectSupervisor.cs — state machine with snapshot, change
  event, last-error tracking, and a one-attempt-at-a-time recovery loop.
  Idempotent ReportTransportFailure: repeated failure reports during an
  in-flight recovery do not spawn parallel loops. Reopen + replay are
  caller-supplied callbacks (the driver injects them in the wire-up PR);
  reopen re-Registers the gw session, replay re-establishes every active
  subscription via gw's ReplaySubscriptionsCommand (mxaccessgw issue gw-3)
  or the SubscribeBulk fallback. Dispose cancels the loop cleanly.
- Public StateTransition record + IsDegraded predicate the driver maps
  to DriverState.Degraded for health snapshots.

Wiring (GalaxyDriver subscribes the supervisor to its EventPump's
transport-failure signal, exposes IsDegraded through GetHealth(), routes
reopen/replay callbacks through GalaxyMxSession + SubscriptionRegistry)
lands in PR 4.W to avoid conflict with the parallel host-probe track
(PR 4.7) and align the wire-up with the rest of Phase 4's plumbing.

9 supervisor tests (full state-machine traversal, retry-until-success on
both reopen and replay failures, idempotent failure reports, last-error
propagation, Dispose mid-recovery, post-dispose throws, fast-path Healthy
WaitForHealthy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:39:21 -04:00
Joseph Doherty
7922e573b1 PR 4.6 — DeployWatcher (IRediscoverable scaffold)
DeployWatcher consumes GalaxyRepositoryClient.WatchDeployEventsAsync,
suppresses the bootstrap event, and raises RediscoveryEventArgs whenever
time_of_last_deploy actually changes. Reconnect-on-error with capped
exponential backoff. GalaxyDriver wiring (IRediscoverable.OnRediscoveryNeeded
event + StartAsync inside InitializeAsync) lands in a follow-up so this PR
doesn't conflict with the parallel runtime track.
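
The bootstrap suppression and change detection can be sketched as a small stateful filter. This Python illustration uses a hypothetical class name and treats time_of_last_deploy as an opaque comparable value; it does not model the stream or the reconnect loop.

```python
from typing import Any


class DeployChangeDetector:
    """Decides whether an observed time_of_last_deploy warrants rediscovery."""

    def __init__(self) -> None:
        self._seen_bootstrap = False
        self._last: Any = None

    def observe(self, time_of_last_deploy: Any) -> bool:
        """Return True only when the value changed after the bootstrap event."""
        if not self._seen_bootstrap:
            # The first event only establishes the baseline.
            self._seen_bootstrap = True
            self._last = time_of_last_deploy
            return False
        if time_of_last_deploy == self._last:
            return False
        self._last = time_of_last_deploy
        return True
```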

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:33:37 -04:00
Joseph Doherty
ce004c80ab PR 4.4 — ISubscribable + EventPump
Subscription path online. GalaxyDriver implements ISubscribable; subscribes
batches via gw SubscribeBulkAsync, runs a single shared EventPump consumer
of StreamEventsAsync, fans out OnDataChange events to every driver
subscription that observes the changed gw item handle.

Files:
- Runtime/GalaxySubscriptionHandle.cs — record implementing ISubscriptionHandle.
- Runtime/SubscriptionRegistry.cs — bookkeeping with forward (subscriptionId
  → bindings) and reverse (itemHandle → list of subscriptionIds) maps. The
  reverse map is the fan-out index so a single OnDataChange dispatches to
  every subscription that observes the changed handle.
- Runtime/IGalaxySubscriber.cs — driver-side seam: SubscribeBulk +
  UnsubscribeBulk + StreamEventsAsync. Production wraps GalaxyMxSession;
  tests substitute a fake driving synthetic MxEvents.
- Runtime/GatewayGalaxySubscriber.cs — production. Forwards to
  MxGatewaySession; bufferedUpdateIntervalMs is captured for now and
  becomes a SetBufferedUpdateInterval call once gw issue #102 / gw-9 lands
  (PR 6.3 picks this up).
- Runtime/EventPump.cs — long-running background consumer of
  StreamEventsAsync. Decodes MxValue + maps quality byte/MxStatusProxy via
  StatusCodeMap. Fan-out per subscriber resolves through the registry;
  exceptions thrown by handlers are caught + logged and never break the
  dispatch loop.
  Filters out non-OnDataChange families (write-complete and operation-
  complete come back via InvokeAsync's reply path, not the event stream).
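
A rough sketch of the forward/reverse bookkeeping described for SubscriptionRegistry, in Python rather than the project's C#. Method names are hypothetical; treating item handle 0 as a failed gateway entry follows the GalaxyDriver notes in this commit.

```python
from collections import defaultdict
from itertools import count

class SubscriptionRegistry:
    def __init__(self):
        self._ids = count(1)                # monotonic subscription ids
        self._forward = {}                  # subscription id -> [(tag, item handle)]
        self._reverse = defaultdict(list)   # item handle -> [subscription ids]

    def register(self, bindings):
        sub_id = next(self._ids)
        self._forward[sub_id] = list(bindings)
        for _tag, handle in bindings:
            if handle != 0:  # failed entries stay out of the fan-out index
                self._reverse[handle].append(sub_id)
        return sub_id

    def remove(self, sub_id):
        for _tag, handle in self._forward.pop(sub_id, []):
            if handle == 0:
                continue
            subs = self._reverse.get(handle, [])
            if sub_id in subs:
                subs.remove(sub_id)
            if not subs:
                self._reverse.pop(handle, None)

    def subscribers_for(self, handle):
        """Return every subscription id that observes the given item handle."""
        return list(self._reverse.get(handle, []))
```

The reverse map is what lets one incoming change be dispatched to all interested subscriptions without scanning every binding.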

GalaxyDriver:
- Adds ISubscribable. SubscribeAsync allocates a subscription id,
  calls SubscribeBulk, builds the binding list (failed gw entries get
  ItemHandle=0 + a per-tag warn log), registers, and returns the handle.
  EventPump is started lazily on first subscribe; one pump per driver
  shared across all subscriptions.
- UnsubscribeAsync removes from the registry first (so stale events are
  filtered immediately) then calls UnsubscribeBulk best-effort. Foreign
  handles throw ArgumentException.
- ReadAsync NotSupportedException message updated: it no longer points at
  PR 4.4, because the read path is deferred to a small follow-up that wraps
  the pump as a one-shot reader.
- Dispose tears down the pump first, then the repository client, then
  clears state.
- Internal ctor extended with optional subscriber parameter.

Tests (15 new, 109 Galaxy total):
- SubscriptionRegistryTests: monotonic id allocation, single+multi
  subscription fan-out, failed-handle exclusion, removal isolation, count
  invariants.
- GalaxyDriverSubscribeTests: handle allocation + value-change dispatch,
  multi-subscription fan-out, failed-tag silence, unsubscribe drops gw
  handle and stops dispatch, foreign handle throws, no-subscriber throws,
  empty-tag-list returns handle without calling gw.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:33:27 -04:00
Joseph Doherty
a617086da1 PR 4.3 — IWritable + secured-write routing
Write path online. GalaxyDriver implements IWritable; routes by
SecurityClassification — SecuredWrite / VerifiedWrite tags go through
MxCommandKind.WriteSecured, everything else through MxGatewaySession.
WriteAsync. Per-tag classifications are captured during ITagDiscovery via a
SecurityCapturingBuilder wrapper that intercepts Variable() calls without
the discoverer needing to know about the driver's internal state.

Files:
- Runtime/MxValueEncoder.cs — boxed CLR value → MxValue. Covers seven Galaxy
  scalar types (bool/int8-32/uint8-32 → Int32, int64/uint64 → Int64, float,
  double, string, DateTime/DateTimeOffset → Timestamp) and 1-D array
  variants. Inverse of MxValueDecoder; round-trip pinned by tests.
  DateTime.Local converts to UTC; unsupported types throw ArgumentException.
- Runtime/IGalaxyDataWriter.cs — driver-side seam. Tests inject a fake to
  capture routing decisions; production path uses GatewayGalaxyDataWriter.
- Runtime/GatewayGalaxyDataWriter.cs — production. Lazy-AddItem caches
  itemHandles, encodes value, routes Write vs WriteSecured, translates
  MxCommandReply (ProtocolStatus → BadCommunicationError; first
  MxStatusProxy in statuses[] via StatusCodeMap.FromMxStatus). Per-tag
  exception isolation: one bad write doesn't fail the batch.
- GalaxyDriver: now implements IWritable. Discovery wraps the supplied
  IAddressSpaceBuilder in SecurityCapturingBuilder which records each
  attribute's SecurityClass into _securityByFullRef before delegating.
  WriteAsync resolves classification per tag (FreeAccess default for
  unknown tags — matches the legacy backend), routes through the injected
  writer. Throws NotSupportedException with PR 4.4 pointer when no writer
  is wired (production path requires GalaxyMxSession.Connect from PR 4.4).
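
The routing rule can be illustrated with a short Python sketch (not the project's C#). The string constants stand in for the SecurityClassification and command-kind enums, and `route_write` is a hypothetical name.

```python
SECURED_KINDS = {"SecuredWrite", "VerifiedWrite"}

def route_write(full_ref, security_by_ref):
    """Pick the write path for a tag from the classifications captured at discovery."""
    # Unknown tags default to FreeAccess, matching the behaviour described above.
    classification = security_by_ref.get(full_ref, "FreeAccess")
    return "WriteSecured" if classification in SECURED_KINDS else "Write"

captured = {"Tank1.Setpoint": "VerifiedWrite", "Tank1.Level": "Operate"}
```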

Tests (32 new, 94 Galaxy total):
- MxValueEncoder: every scalar type, narrowing checks (sbyte/short/byte/
  ushort fit Int32; uint within Int32 range; ulong within Int64),
  DateTime.Local → UTC conversion, array variants for bool/double/string/
  DateTime, Dimensions populated, unsupported-type throws ArgumentException,
  encoder/decoder round-trip pin.
- GalaxyDriverWriteTests: WriteAsync routes through fake writer with
  values intact; theory exercises every SecurityClassification value through
  the discovery-then-write path; unknown-tag defaults to FreeAccess; empty-
  request short-circuit; no-writer fail-loud; post-dispose throws.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:24:22 -04:00
Joseph Doherty
85bdf0d58b PR 4.2 — IReadable abstraction + StatusCodeMap + MxValueDecoder
Read path scaffold + the byte→uint quality mapping table that the parity
matrix (PR 5.x) pins. PR 4.4 supplies the production GW-backed reader; this
PR ships the abstraction and the supporting infrastructure so 4.4 just
plugs the implementation in.

Files:
- Runtime/StatusCodeMap.cs — explicit OPC DA quality byte → OPC UA
  StatusCode uint mapping. Extends the legacy Galaxy.Host
  HistorianQualityMapper with named constants (Good / GoodLocalOverride,
  Uncertain + 4 substatuses, Bad + 7 substatuses, BadInternalError) and an
  MxStatusProxy → uint helper that honors success flag → detail byte →
  detected_by transport-error fallback. Unknown bytes fall back to category
  bucket with a once-per-session diagnostic log so field captures can
  extend the table.
- Runtime/MxValueDecoder.cs — gateway MxValue → boxed CLR value for the
  seven Galaxy data types (Boolean, Int32, Int64, Float32, Float64, String,
  DateTime) plus their array variants. Honors MxValue.IsNull and
  RawValue passthrough.
- Runtime/IGalaxyDataReader.cs — driver-side seam for one-shot reads. PR
  4.4 ships the production wrapper around MxGatewaySession.SubscribeBulk +
  StreamEvents + UnsubscribeBulk; this PR exposes the contract so
  GalaxyDriver.ReadAsync wires through it.
- Runtime/GalaxyMxSession.cs — wrapper around MxGatewaySession that owns
  the Register handle. ConnectAsync opens session + Register; AttachForTests
  lets tests bypass real gw construction. PR 4.3/4.4/4.5 add write,
  subscribe, and reconnect surfaces.
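
A hedged Python sketch of the category-bucket fallback for unknown quality bytes (the real StatusCodeMap is C# and also carries the named substatus codes). It assumes the standard OPC DA layout, where the top two bits of the quality byte carry the major category, and the standard OPC UA severity bases.

```python
GOOD, UNCERTAIN, BAD = 0x00000000, 0x40000000, 0x80000000

def category_bucket(quality_byte):
    """Map an OPC DA quality byte to an OPC UA severity base via its top two bits."""
    top = (quality_byte >> 6) & 0b11
    if top == 0b11:
        return GOOD
    if top == 0b01:
        return UNCERTAIN
    # 0b00 is Bad; 0b10 is not a defined OPC DA category and is treated
    # as Bad here, which is an assumption of this sketch.
    return BAD
```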

GalaxyDriver:
- Implements IReadable. ReadAsync routes through the injected
  IGalaxyDataReader (test seam) when present; production path throws
  NotSupportedException pointing at PR 4.4 — protects deployments running
  this PR from silent wrong reads while signaling that the legacy-host
  backend (Galaxy:Backend=legacy-host) handles reads in the meantime.
- Internal ctor extended with optional dataReader parameter (default null,
  preserves PR 4.0/4.1 callers).

Tests: 42 new — exhaustive byte→uint table for StatusCodeMap (15 known
codes + category-bucket fallback for unknowns + MxStatusProxy precedence
rules + OPC UA top-byte invariants), every MxValue oneof case for the
decoder (bool/int32/int64/float/double/string/timestamp/3 array variants/
raw bytes/null), GalaxyDriver IReadable wiring (route-through, empty-
request, no-reader-throws, post-dispose-throws, status-code preservation).
62 Galaxy tests total pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:15:42 -04:00
Joseph Doherty
ecba5cedf9 PR 4.1 — ITagDiscovery via GalaxyRepositoryClient + AlarmRefBuilder
Browse path online. GalaxyDriver now implements ITagDiscovery against the
gateway's GalaxyRepositoryClient (PR 0.1's mxaccessgw browse RPC) and feeds
the address-space builder one folder per gobject + one variable per dynamic
attribute, with alarm-bearing attributes carrying all five sub-attribute refs
the server-level AlarmConditionService (PR 2.2) needs.

Files:
- Browse/IGalaxyHierarchySource.cs — driver-side seam between the discoverer
  and the gateway. Test fakes return canned hierarchies so the discoverer's
  translation logic is exercised without a real gRPC channel.
- Browse/GatewayGalaxyHierarchySource.cs — production wrapper around
  GalaxyRepositoryClient.DiscoverHierarchyAsync (paged internally).
- Browse/GalaxyDiscoverer.cs — translates GalaxyObject → IAddressSpaceBuilder
  calls. Browse name = contained_name (falls back to tag_name); full
  reference = attr.full_tag_reference when set, else tag_name + "." +
  attribute_name. Skips objects/attributes with empty identity.
- Browse/DataTypeMap.cs — mx_data_type → DriverDataType (port from legacy
  GalaxyProxyDriver.MapDataType, same fallback to String for unknown codes).
- Browse/SecurityMap.cs — security_classification → SecurityClassification
  (port from legacy GalaxyProxyDriver.MapSecurity).
- Browse/AlarmRefBuilder.cs — populates the five sub-attribute refs by
  Galaxy convention (.InAlarm/.Priority/.DescAttrName/.Acked/.AckMsg). The
  same convention the legacy GalaxyAlarmTracker hard-coded; concentrated
  here so PR 2.2's service receives complete AlarmConditionInfo rows.
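
The reference-naming rules above can be sketched in Python (the project itself is C#). Function names are hypothetical, and appending each suffix directly to the alarm attribute's full reference is an assumption about how the convention is applied.

```python
ALARM_SUFFIXES = {
    "in_alarm": ".InAlarm",
    "priority": ".Priority",
    "desc_attr_name": ".DescAttrName",
    "acked": ".Acked",
    "ack_msg": ".AckMsg",
}

def full_reference(tag_name, attribute_name, full_tag_reference=""):
    """Prefer the gateway-supplied reference; otherwise join tag and attribute names."""
    if not tag_name or not attribute_name:
        return None  # empty identity: the discoverer skips the row
    return full_tag_reference or f"{tag_name}.{attribute_name}"

def alarm_refs(alarm_full_ref):
    """Build the five sub-attribute references for an alarm-bearing attribute."""
    return {name: alarm_full_ref + suffix for name, suffix in ALARM_SUFFIXES.items()}
```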

GalaxyDriver:
- Added internal ctor accepting IGalaxyHierarchySource? for test injection.
  Default lazily builds GatewayGalaxyHierarchySource around a
  GalaxyRepositoryClient constructed from options on first DiscoverAsync.
- Owned GalaxyRepositoryClient disposed in Dispose.
- ApiKey resolution is currently a passthrough of ApiKeySecretRef — PR 4.W
  (or follow-up) wires DPAPI-backed secret resolution.

csproj: path-based ProjectReference to mxaccessgw (the user is shipping
that repo on a parallel track; both repos sit side-by-side on the dev box).
Tests project also references MxGateway.Contracts directly to construct
GalaxyObject / GalaxyAttribute fixtures.

Tests: 10 new in Browse/GalaxyDiscovererTests.cs covering folder-per-object,
variable-per-attribute, full-ref defaulting + gw-supplied override, browse-
name fallback, every metadata field propagation, alarm sub-attribute ref
population, non-alarm rows skip MarkAsAlarmCondition, empty-identity skips,
empty-attribute-name skips, end-to-end through GalaxyDriver.DiscoverAsync.
20 total Galaxy tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:06:02 -04:00
Joseph Doherty
f6a4f919e2 PR 4.0 — Driver.Galaxy project skeleton + factory
New in-process .NET 10 driver project at
src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/. The Tier-A replacement for
Driver.Galaxy.Host + Driver.Galaxy.Proxy. PR 4.0 ships only the IDriver
shape + factory + options; capability bodies (browse, read, write,
subscribe, deploy-watch, host probes) land in PRs 4.1–4.7.

Files:
- Driver.Galaxy.csproj — net10 x64, AnyCPU+x64 platforms, references
  Core.Abstractions + Core. No MxGatewayClient ProjectReference yet — that
  comes in PR 4.2 once the gw NuGet package is wired (the user is
  shipping mxaccessgw on a parallel track).
- Config/GalaxyDriverOptions.cs — nested record hierarchy
  (Gateway/MxAccess/Repository/Reconnect) mirroring the JSON shape spelled
  out in lmx_mxgw_impl.md PR 4.0 acceptance section.
- GalaxyDriver.cs — minimal IDriver impl. Initialize/Shutdown toggle
  DriverHealth between Healthy/Unknown; Reinitialize bumps the timestamp;
  GetMemoryFootprint=0 (PR 4.4 wires SubscriptionRegistry size);
  FlushOptionalCachesAsync no-op. Logs intent on lifecycle calls so
  partial deployments are diagnosable.
- GalaxyDriverFactoryExtensions.cs — JSON parser, default fill-ins,
  validation throw on missing required fields. Driver type name
  "GalaxyMxGateway" intentionally distinct from legacy "Galaxy" so both
  factories coexist during parity testing (Phase 5). PR 4.W's
  Galaxy:Backend switch picks one or the other.

Tests:
- 10 tests in Driver.Galaxy.Tests covering minimal-config defaults, full
  override path, three required-field error cases, factory registration
  via DriverFactoryRegistry.TryGet, lifecycle health transitions
  (Init → Shutdown → Reinit), Dispose idempotency, and post-disposal
  ObjectDisposedException.

slnx: registers the new Driver.Galaxy + Driver.Galaxy.Tests projects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:57:31 -04:00
Joseph Doherty
854827090a PR 3.W — Phase 3 wire-up: Wonderware sidecar DI registration
Solution + DI plumbing to complete Phase 3. With this PR the .NET 10 server
can boot with the Wonderware historian sidecar in the loop, gated by config
so existing deployments are unaffected.

slnx: registers Driver.Historian.Wonderware (net48 sidecar),
Driver.Historian.Wonderware.Client (net10 client), and both test projects.

Server.csproj: adds ProjectReference to the .NET 10 client.

Program.cs: reads Historian:Wonderware:* configuration. When Enabled=true,
constructs a WonderwareHistorianClient singleton and:
  - Registers it as IAlarmHistorianWriter so the SqliteStoreAndForwardSink
    drain (task #248) can pick it up.
  - Registers a WonderwareHistorianBootstrap hosted service that, on
    StartAsync, calls IHistoryRouter.Register(prefix, client) under the
    configured DriverInstancePrefix (default "galaxy") — lets the
    HistoryRead* dispatch in DriverNodeManager find the sidecar via
    longest-prefix-match resolution.
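
Longest-prefix-match resolution can be pictured with a small Python sketch (not the server's C# HistoryRouter). The method names are hypothetical, and the case-insensitive comparison follows how the router's own commit describes it.

```python
class HistoryRouter:
    def __init__(self):
        self._sources = {}  # lower-cased prefix -> registered source

    def register(self, prefix, source):
        self._sources[prefix.lower()] = source

    def resolve(self, full_ref):
        """Return the source whose prefix is the longest match, or None."""
        key = full_ref.lower()
        matches = [p for p in self._sources if key.startswith(p)]
        return self._sources[max(matches, key=len)] if matches else None

router = HistoryRouter()
router.register("galaxy", "sidecar-A")
router.register("galaxy.area1", "sidecar-B")
```

A `None` result corresponds to the case where the node manager falls back to its legacy adapter.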

When Enabled=false (the default), DriverNodeManager keeps using its
internal LegacyDriverHistoryAdapter for the read path and the existing
NullAlarmHistorianSink stays in place — drop-in compatible with every
deployment that hasn't moved off Galaxy.Host yet.

42 server integration tests + 10 client tests pass. Full solution build
clean (0/0).

Note: scripts/install/Install-Services.ps1 and
src/.../Server/appsettings.json carry intermixed user WIP and are NOT
committed in this PR. Equivalent edits applied locally:

  Install-Services.ps1: new -InstallWonderwareHistorian switch installs the
  OtOpcUaWonderwareHistorian service alongside OtOpcUaGalaxyHost;
  generates a fresh historian shared secret; OtOpcUa service depends on
  both when historian sidecar is installed.

  Server/appsettings.json: new Historian.Wonderware section with
  Enabled=false default, PipeName/SharedSecret/PeerName/
  DriverInstancePrefix/ConnectTimeoutSeconds/CallTimeoutSeconds keys.

Both pieces should land in a follow-up commit once the user's WIP on those
files clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:48:47 -04:00
Joseph Doherty
14947fde51 PR 3.4 — Wonderware historian sidecar .NET 10 client
New project Driver.Historian.Wonderware.Client (net10 x64) implements both
Core.Abstractions.IHistorianDataSource (read paths consumed by the server's
IHistoryRouter) and Core.AlarmHistorian.IAlarmHistorianWriter (alarm-event
drain consumed by SqliteStoreAndForwardSink) against the sidecar's PR 3.3
pipe protocol.

Wire-format files (Framing/MessageKind, Hello, Contracts, FrameReader,
FrameWriter) are byte-identical mirrors of the sidecar's net48 originals —
the sidecar can't be referenced as a ProjectReference because of the
runtime/bitness gap, so we duplicate and pin the wire bytes via tests.

PipeChannel owns one bidirectional NamedPipeClientStream + Hello handshake +
serializes calls. Single in-flight at a time (semaphore); transport failures
trigger one in-flight reconnect-and-retry before propagating. Connect is
abstracted behind a Func<CancellationToken, Task<Stream>> so tests inject
in-process pipes.

WonderwareHistorianClient maps:
- HistorianSampleDto.Quality (raw OPC DA byte) → OPC UA StatusCode uint via
  QualityMapper (port of HistorianQualityMapper from sidecar).
- HistorianAggregateSampleDto.Value=null → BadNoData (0x800E0000).
- WriteAlarmEventsReply.PerEventOk[i]=true → Ack, false → RetryPlease.
  Whole-call failure or transport exception → RetryPlease for every event in
  the batch (drain worker handles backoff).
- AlarmHistorianEvent → AlarmHistorianEventDto with severity bucketed via
  AlarmSeverity-to-ushort mapping (Low=250, Medium=500, High=700, Crit=900).
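
A compact Python sketch of two of the mappings listed above (the client itself is C#). The function name and the string outcome values are stand-ins for the real enum members.

```python
# Severity buckets as listed in the commit message.
SEVERITY_BUCKETS = {"Low": 250, "Medium": 500, "High": 700, "Crit": 900}

def write_outcomes(per_event_ok, batch_size, call_failed=False):
    """Translate a WriteAlarmEvents reply into one outcome per event in the batch."""
    if call_failed:
        # Whole-call failure or transport exception: retry every event.
        return ["RetryPlease"] * batch_size
    return ["Ack" if ok else "RetryPlease" for ok in per_event_ok]
```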

GetHealthSnapshot tracks transport success + sidecar-reported failure
separately; ConsecutiveFailures rises on operation-level errors, not just
transport drops.

10 round-trip tests via FakeSidecarServer (in-process net10 fake using the
client's own framing): byte→uint quality mapping, null-bucket BadNoData,
at-time order preservation, event-field round-trip, sidecar error surfacing,
WriteBatch per-event status, whole-call retry-please mapping, Hello
shared-secret rejection, transport-drop reconnect-and-retry, health snapshot
counters.

PR 3.W will register this client as IHistorianDataSource + IAlarmHistorianWriter
in OpcUaServerService DI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:40:56 -04:00
Joseph Doherty
9f7a4ac769 PR 3.3 — Wonderware sidecar pipe protocol + dispatcher
Sidecar now serves a length-prefixed, kind-tagged MessagePack pipe protocol
mirroring Galaxy.Host's: 4-byte BE length + 1-byte MessageKind + body, 16 MiB
cap. Hello handshake validates per-process shared secret + protocol major
version + caller SID via ImpersonateNamedPipeClient before any work frame
runs.
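
The frame layout can be sketched in Python as below (the sidecar and client are C#). Whether the length prefix counts the kind byte is not stated in this commit; the sketch assumes it counts the body only, and the helper names are hypothetical.

```python
import io
import struct

MAX_BODY = 16 * 1024 * 1024  # 16 MiB cap

def write_frame(stream, kind, body):
    if len(body) > MAX_BODY:
        raise ValueError("frame body exceeds 16 MiB cap")
    # 4-byte big-endian length, then 1-byte message kind, then the body.
    stream.write(struct.pack(">IB", len(body), kind))
    stream.write(body)

def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended mid-frame")
    return data

def read_frame(stream):
    length, kind = struct.unpack(">IB", _read_exact(stream, 5))
    if length > MAX_BODY:
        raise ValueError("frame body exceeds 16 MiB cap")
    return kind, _read_exact(stream, length)

buf = io.BytesIO()
write_frame(buf, 3, b"hello")
```

Checking the cap before reading the body keeps a corrupt or hostile length prefix from forcing a large allocation.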

Five contract pairs ship in this PR:

  ReadRawRequest          ↔ ReadRawReply
  ReadProcessedRequest    ↔ ReadProcessedReply
  ReadAtTimeRequest       ↔ ReadAtTimeReply
  ReadEventsRequest       ↔ ReadEventsReply
  WriteAlarmEventsRequest ↔ WriteAlarmEventsReply

Timestamps cross the wire as DateTime ticks (long) to dodge MessagePack's
DateTime kind/timezone quirks; both sides convert with DateTime(ticks, Utc).
Sample values cross as MessagePack-serialized byte[] so the .NET 10 client
(PR 3.4) deserializes per the tag's mx_data_type without the sidecar needing
to know OPC UA types.
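
For readers outside .NET, a Python sketch of the tick convention: a tick is 100 ns counted from 0001-01-01T00:00:00. The helper names are hypothetical, and precision below one microsecond is dropped because Python's datetime cannot hold it.

```python
from datetime import datetime, timedelta, timezone

DOTNET_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def to_ticks(dt):
    """Convert an aware datetime to .NET-style ticks (100 ns units), normalised to UTC."""
    delta = dt.astimezone(timezone.utc) - DOTNET_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10

def from_ticks(ticks):
    """Convert ticks back to a UTC datetime, truncating to whole microseconds."""
    return DOTNET_EPOCH + timedelta(microseconds=ticks // 10)

unix_epoch_ticks = to_ticks(datetime(1970, 1, 1, tzinfo=timezone.utc))
```

Sending a bare integer sidesteps any ambiguity about kind or timezone on the wire, since both ends agree the value is UTC.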

HistorianFrameHandler dispatches by MessageKind to IHistorianDataSource (the
PR 3.2 lifted interface) for reads, and to a new IAlarmEventWriter strategy
for the alarm-event persistence path. Per-call exceptions surface as
Success=false replies so a single bad request doesn't kill the connection.
WriteAlarmEvents replies carry per-event success flags; the SQLite
store-and-forward sink retries failed slots on the next drain tick.

Program.cs spins the pipe server when OTOPCUA_HISTORIAN_ENABLED=true. Pipe-
only mode (default false) preserves PR 3.1's smoke-test behaviour: the host
still validates env vars and waits for Ctrl-C, but doesn't initialize the
Wonderware SDK.

Sidecar test project gains 8 round-trip tests (37 total now): every contract
pair round-trips through FrameReader/FrameWriter via in-memory streams, the
handler surfaces historian exceptions cleanly, WriteAlarmEvents per-event
status flows through, and the no-writer-configured path returns a clean
error reply.

Added MessagePack 2.5.187 to the sidecar csproj.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:27:17 -04:00
Joseph Doherty
bc7ec746c5 PR 1+2.W — Wire HistoryRouter + AlarmConditionService into DI
Server-side singletons threaded through OpcUaApplicationHost → OtOpcUaServer
→ DriverNodeManager construction. New ctor parameters are last-position
optional with null defaults so every existing test construction site
(OpcUaServerIntegrationTests, AlarmSubscribeIntegrationTests, etc.) keeps
working unchanged.

Program.cs:
  AddSingleton<IHistoryRouter, HistoryRouter>();
  AddSingleton<AlarmConditionService>();

The router stays empty after this PR. DriverNodeManager's internal
LegacyDriverHistoryAdapter handles every driver that still implements
IHistoryProvider; PR 3.W will register the Wonderware sidecar as a router
source; PR 7.2 retires the legacy fallback entirely.

44 alarm + history + integration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:13:51 -04:00
Joseph Doherty
9365beb966 PR 3.2 — Lift Wonderware Historian SDK code to sidecar
Move all historian implementation files from Driver.Galaxy.Host/Backend/Historian/
to Driver.Historian.Wonderware/Backend/. Sidecar now owns the aahClientManaged /
aahClientCommon SDK references; Galaxy.Host project-references the sidecar so
MxAccessGalaxyBackend keeps building until PR 7.2 retires Galaxy.Host entirely.

10 source files moved (preserving git history via git mv):
  IHistorianDataSource, HistorianDataSource, HistorianClusterEndpointPicker,
  HistorianClusterNodeState, HistorianConfiguration, HistorianEventDto,
  HistorianHealthSnapshot, HistorianQualityMapper, HistorianSample,
  IHistorianConnectionFactory.

2 historian test classes moved alongside (HistorianClusterEndpointPickerTests,
HistorianQualityMapperTests). Sidecar test project now hosts 29 tests (1 PR 3.1
smoke + 28 moved historian tests, all passing).

Galaxy.Host's remaining 6 historian-flavored tests (HistorianWiringTests,
HistoryReadAtTimeTests, HistoryReadEventsTests, HistoryReadProcessedTests)
keep passing via the project reference — using directives updated to reach
the new namespace.

Sidecar deliberately speaks no Core.Abstractions — its surface is the legacy
List<HistorianSample> shape; PR 3.4's .NET 10 client translates to the
Core.Abstractions shapes added in PR 1.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:13:13 -04:00
Joseph Doherty
ef22a61c39 v2 mxgw migration — Phase 1+2+3.1 wiring (7 PRs)
Foundational PRs from lmx_mxgw_impl.md, all green. Bodies only — DI/wiring
deferred to PR 1+2.W (combined wire-up) and PR 3.W.

PR 1.1 — IHistorianDataSource lifted to Core.Abstractions/Historian/
  Reuses existing DataValueSnapshot + HistoricalEvent shapes; sidecar (PR
  3.4) translates byte-quality → uint StatusCode internally.

PR 1.2 — IHistoryRouter + HistoryRouter on the server
  Longest-prefix-match resolution, case-insensitive, ObjectDisposed-guarded,
  swallow-on-shutdown disposal of misbehaving sources.

PR 1.3 — DriverNodeManager.HistoryRead* dispatch through IHistoryRouter
  Per-tag resolution with LegacyDriverHistoryAdapter wrapping
  `_driver as IHistoryProvider` so existing tests + drivers keep working
  until PR 7.2 retires the fallback.

PR 2.1 — AlarmConditionInfo extended with five sub-attribute refs
  InAlarmRef / PriorityRef / DescAttrNameRef / AckedRef / AckMsgWriteRef.
  Optional defaulted parameters preserve all existing 3-arg call sites.

PR 2.2 — AlarmConditionService state machine in Server/Alarms/
  Driver-agnostic port of GalaxyAlarmTracker. Sub-attribute refs come from
  AlarmConditionInfo, values arrive as DataValueSnapshot, ack writes route
  through IAlarmAcknowledger. State machine preserves Active/Acknowledged/
  Inactive transitions, Acked-on-active reset, post-disposal silence.

PR 2.3 — DriverNodeManager wires AlarmConditionService
  MarkAsAlarmCondition registers each alarm-bearing variable with the
  service; DriverWritableAcknowledger routes ack-message writes through
  the driver's IWritable + CapabilityInvoker. Service-raised transitions
  route via OnAlarmServiceTransition → matching ConditionSink. Legacy
  IAlarmSource path unchanged for null service.

PR 3.1 — Driver.Historian.Wonderware shell project (net48 x86)
  Console host shell + smoke test; SDK references + code lift come in
  PR 3.2.

Tests: 9 (PR 1.1) + 5 (PR 2.1) + 10 (PR 1.2) + 19 (PR 2.2) + 1 (PR 3.1)
all pass. Existing AlarmSubscribeIntegrationTests + HistoryReadIntegrationTests
unchanged.

Plan + audit docs (lmx_backend.md, lmx_mxgw.md, lmx_mxgw_impl.md)
included so parallel subagent worktrees can read them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:03:36 -04:00
Joseph Doherty
012c42a846 Task #156 — TagsTab: per-tag advanced Modbus fields (Deadband, UnitId, CoalesceProhibited)
#155 wired the basic tag form (Name / Driver / Equipment / DataType / Access /
WriteIdempotent + ModbusAddressEditor for the address). The per-tag knobs added
across #141 / #142 / #143 still required operators to hand-edit TagConfig JSON.
This commit exposes them through an "Advanced" expander.

UI changes (TagsTab.razor):

- Collapsible "▶ Advanced (Deadband / UnitId override / CoalesceProhibited)"
  button below the address editor, visible only when the selected driver is
  Modbus. Collapsed by default — basic form covers the typical edit workflow.
- Three numeric / checkbox inputs with inline help text explaining each knob's
  purpose and when to use it.
- _showAdvanced auto-opens on Edit when any of the advanced fields are present
  in the existing TagConfig — operators see immediately what's been configured.

Save-side serialization:

- New RefreshTagConfigJson serializes the address + advanced fields into a
  structured JSON object using a Dictionary<string, object?>. Fields with
  default / empty values are omitted to keep diffs in the existing draft-diff
  viewer minimal — a tag with only an address still produces
  `{"addressString":"40001:F"}` and not a full superset object with nulls.
- OnAddressChanged + OnAdvancedChanged both delegate to RefreshTagConfigJson
  so any input change keeps TagConfig in sync.
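
The omit-defaults serialization can be sketched in Python (the UI code is C#/Razor). The function name and parameter defaults are hypothetical; the JSON field names are the ones this commit's round-trip test mentions.

```python
import json

def tag_config_json(address, deadband=None, unit_id=None, coalesce_prohibited=False):
    """Serialize a tag config, leaving out any field still at its default."""
    fields = {"addressString": address}
    if deadband is not None:
        fields["deadband"] = deadband
    if unit_id is not None:
        fields["unitId"] = unit_id
    if coalesce_prohibited:
        fields["coalesceProhibited"] = True
    # Compact separators keep the output free of incidental whitespace.
    return json.dumps(fields, separators=(",", ":"))

minimal = tag_config_json("40001:F")
```

Only fields the operator actually set appear in the output, so a draft diff shows just those changes.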

Read-side hydration:

- New HydrateModbusFromTagConfig parses an existing TagConfig JSON and
  populates _modbusAddress + the three advanced fields. Falls back to empty
  defaults on malformed JSON. ResetAdvanced is called before hydration on
  every form open so leftover state from a previous edit doesn't leak.

ResetAdvanced helper introduced + called from StartAdd so a fresh "New tag"
form starts with everything cleared.

Tests (1 new in TagServiceTests):
- TagConfig_With_Advanced_Modbus_Fields_RoundTrips_Through_Factory — creates a
  tag whose TagConfig carries addressString + deadband + unitId +
  coalesceProhibited, persists via TagService, reloads, asserts every field
  survives. Then constructs a wrapping driver-config JSON and feeds it to
  ModbusDriverFactoryExtensions.CreateInstance — confirms the field NAMES the
  UI emits match what BuildTag's DTO consumes. If the UI's JSON shape ever
  drifts from the factory's expected DTO, this test catches it before users do.

119 + 1 = 120 Admin tests green. Solution build clean.
2026-04-25 04:22:50 -04:00
Joseph Doherty
ec57df1009 Task #155 — TagService + TagsTab CRUD UI for Modbus tags
Closes the remaining loop on user-visible Modbus tag editing. Pre-#155 tags
arrived only via SQL seeding or runtime ITagDiscovery; the Admin UI had no
interactive surface for creating / editing / deleting tag rows.

Changes:

- TagService.cs (Admin/Services/) — CRUD wrapper around OtOpcUaConfigDbContext.Tags.
  ListAsync supports optional driver / equipment filters; CreateAsync auto-derives
  TagId; UpdateAsync persists editable fields; DeleteAsync removes the row. Mirrors
  the EquipmentService shape.
- TagsTab.razor (Components/Pages/Clusters/) — list + filter + add/edit/remove form.
  The address/config editor is conditional: when the selected DriverInstance is
  Modbus, ModbusAddressEditor (#145) renders with live-parse preview; otherwise a
  generic JSON textarea (matches the DriversTab pattern from #147). Save-side
  serializes the address-string into TagConfig as `{"addressString":"..."}` JSON.
- ClusterDetail.razor — new "Tags" tab in the cluster-detail nav strip + the routing
  switch.
- Program.cs — TagService registered as a scoped DI service.

Drive-by fix: ModbusDriverFactoryExtensions.CreateInstance promoted from internal
to public — Admin.Tests was using it via reflection-friendly internal access that
broke under the #153 logger overload addition. Public is the right access modifier
anyway since the Server-side bootstrapper calls it from a different assembly.

Drive-by fix #2: ModbusDriverConfigDto was missing MaxReadGap (#143) — surfaced by
the #147 round-trip test that flips MaxReadGap=12 in the view model and asserts
it lands on the resolved options. Added the field + binding line. Confirms #143's
DriverConfig JSON binding was incomplete since the original commit; no production
deployment configured this knob through JSON until now so the gap stayed hidden.

Tests (4 new TagServiceTests):
- Create_And_List_Surfaces_The_Tag — CreateAsync auto-assigns TagId; list returns
  the row.
- List_Filters_By_DriverInstance — driver-scoped filter works.
- Update_Persists_Editable_Fields — Name / DataType / AccessLevel / TagConfig all
  persist through Update.
- Delete_Removes_The_Row — basic delete verification.

113 + 4 (TagService) + 2 (DriversTab round-trip restored after compile fix) = 119
Admin tests green. Solution build clean.

Caveat: bUnit-style render tests for TagsTab still aren't included — Admin.Tests
doesn't have bUnit set up. The TagService logic is fully covered; the razor
component's parser/save glue is exercised by hand at runtime for now.
2026-04-25 01:51:02 -04:00
Joseph Doherty
802366c2c6 Task #154 — driver-diagnostics RPC: HTTP endpoint + Admin client
Foundation for surfacing per-driver runtime state from the Server process to
the Admin UI. #152 shipped GetAutoProhibitedRanges() as an in-process
accessor; #154 makes it reachable across processes.

Server side (HealthEndpointsHost):
- New URL family: /diagnostics/drivers/{driverInstanceId}/{driverType}/{topic}
- First wired topic: /diagnostics/drivers/{id}/modbus/auto-prohibited
- Driver-agnostic at the URL level — future driver types add their own
  segments[3] cases (e.g. /diagnostics/drivers/{id}/s7/dropped-pdus).
- 404 when the driver instance doesn't exist; 400 when the driver exists
  but isn't a Modbus driver (the per-type endpoint is wrong for this row).
- Response shape is flat JSON (unitId / region / startAddress / endAddress /
  lastProbedUtc / bisectionPending) so consumers don't have to reference the
  Driver.Modbus assembly's ModbusAutoProhibition record.
- Re-uses the existing HttpListener bound to localhost:4841 — same auth /
  reachability story as /healthz and /readyz.
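
The status-code rules for the URL family can be illustrated with a Python sketch (the host itself is a C# HttpListener). The function name, the `drivers` lookup, and returning 404 for an unrecognised topic are assumptions of this sketch.

```python
KNOWN_TOPICS = {"modbus": {"auto-prohibited"}}

def route_diagnostics(path, drivers):
    """Return the HTTP status for a diagnostics path.

    drivers maps driver instance id -> driver type name.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) != 5 or segments[:2] != ["diagnostics", "drivers"]:
        return 404
    driver_id, driver_type, topic = segments[2:]
    if driver_id not in drivers:
        return 404  # driver instance does not exist
    if drivers[driver_id].lower() != driver_type.lower():
        return 400  # driver exists but the per-type endpoint is wrong for it
    if topic not in KNOWN_TOPICS.get(driver_type.lower(), set()):
        return 404  # assumption: unknown topics are reported as not found
    return 200

drivers = {"plc-1": "Modbus", "s7-1": "S7"}
```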

Admin side:
- DriverDiagnosticsClient (Services/) — HttpClient wrapper that fetches the
  per-driver Modbus prohibition list. Returns null on 404/400 (driver
  missing or wrong type); throws on transport failures.
- ModbusAutoProhibitionsResponse + ModbusAutoProhibitionRow flat DTOs —
  client doesn't take a dep on Driver.Modbus.
- ModbusDiagnostics.razor at /modbus/diagnostics/{driverInstanceId} —
  table view with BISECTING (warning yellow) / ISOLATED (danger red)
  badges, relative timestamps (e.g. "5m ago"), Refresh button. Errors
  surface inline rather than swallowing.
- HttpClient registration in Program.cs reads
  DriverDiagnostics:ServerBaseUrl from appsettings.json (default
  http://localhost:4841/ for same-host deployments).

Tests (3 new in HealthEndpointsHostTests):
- Diagnostics_ReturnsModbusAutoProhibitions_ForLiveDriver — registers a
  Modbus driver with a programmable transport that protects register 102,
  records the prohibition via a coalesced ReadAsync, hits the endpoint,
  asserts the returned JSON matches (unitId / region / start / end / pending).
- Diagnostics_404_When_Driver_Not_Found
- Diagnostics_400_When_Driver_Is_Wrong_Type

Architecture note: the Admin-side bUnit-style component test isn't included
because Admin.Tests doesn't have bUnit set up. The DriverDiagnosticsClient
is unit-testable on its own with a mock HandlerStub if needed — left as a
follow-up alongside the broader bUnit setup task.

The diagnostic page is now reachable at /modbus/diagnostics/{driverInstanceId}
from any Admin instance pointing at a Server endpoint URL. Future driver types
(S7, AbCip) plug into the same channel by adding their own URL segments
in HealthEndpointsHost.WriteDriverDiagnosticsAsync.
2026-04-25 01:32:21 -04:00
Joseph Doherty
8004394892 Task #153 — ModbusDriver: inject ILogger so prohibition events reach a sink
#152 left a hook for structured logging when an auto-prohibition first
fires; this commit completes the wiring.

Changes:
- ModbusDriver constructor takes an optional ILogger<ModbusDriver> (defaults
  to NullLogger). Existing standalone callers stay compile-clean.
- RecordAutoProhibition logs LogWarning on first-fire only (re-fires of the
  same range stay quiet via the existing isNew de-dupe). Format includes
  DriverInstanceId, UnitId, Region, Start, End, Span — log aggregators can
  filter / count by any field.
- New LogProhibitionCleared helper called by both StraightReprobeAsync (when
  the re-probe succeeds on a single-register range) and BisectAndReprobeAsync
  (per-half clearing + a single combined line when both halves succeed).
- ModbusDriverFactoryExtensions.Register accepts an optional ILoggerFactory.
  Captured at registration time and used in the factory closure to construct
  a per-driver logger. Server bootstrap code that already has an ILoggerFactory
  in DI threads it through with a single argument addition; old call sites
  (Register(registry)) keep working with a null logger.
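
The first-fire de-dupe and the optional logger can be sketched as follows. This is a Python illustration, not the driver's C# code: `ProhibitionRecorder` is a hypothetical name, the silent fallback logger stands in for NullLogger, and computing Span as `end - start + 1` is an assumption.

```python
import logging


class ProhibitionRecorder:
    def __init__(self, logger=None):
        if logger is None:
            # Silent fallback so callers that pass no logger are unaffected.
            logger = logging.getLogger("modbus.null")
            logger.addHandler(logging.NullHandler())
            logger.propagate = False
        self._logger = logger
        self._ranges = set()

    def record(self, unit, region, start, end):
        """Record a prohibited range; warn only the first time it is seen."""
        key = (unit, region, start, end)
        is_new = key not in self._ranges
        self._ranges.add(key)
        if is_new:
            self._logger.warning(
                "auto-prohibited unit=%s region=%s start=%s end=%s span=%s",
                unit, region, start, end, end - start + 1,
            )
        return is_new
```

Keeping the fields as separate format arguments, rather than pre-formatting the string, is what lets a structured sink filter or count by any one of them.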

Tests (2 new ModbusLoggerInjectionTests):
- First_Failure_Emits_Single_Warning_Subsequent_Refire_Stays_Quiet — pins
  the de-dupe behaviour. First scan logs one warning with the expected
  structured fields; second scan with the same prohibition stays silent.
- Reprobe_Clearing_Prohibition_Emits_Information_Log — protected register
  unlocked between record and re-probe; re-probe success emits an info log
  containing "cleared".

CapturingLogger test harness is purpose-built (xUnit doesn't ship a logger
mock by default and adding Moq is overkill for two tests).

240 + 2 = 242 unit tests green.
2026-04-25 01:26:20 -04:00
Joseph Doherty
b8df230eb8 Task #152 — Modbus coalescing: surface auto-prohibitions through diagnostics
Auto-prohibited ranges (#148) were previously visible only through an
internal AutoProhibitedRangeCount accessor used by tests. Production
operators had no way to see what the planner had learned without pulling
logs or inspecting driver state.

Changes:

- New public record `ModbusAutoProhibition(UnitId, Region, StartAddress,
  EndAddress, LastProbedUtc, BisectionPending)` — operator-facing snapshot
  shape. Lives in the addressing assembly's logical namespace alongside
  the other public types.
- `ModbusDriver.GetAutoProhibitedRanges()` returns
  `IReadOnlyList<ModbusAutoProhibition>` — a copy of the live prohibition
  map. Lock-protected snapshot so consumers don't race with the re-probe
  loop.
- RecordAutoProhibition tracks first-fire vs re-fire via the dictionary
  insert path, leaving a hook for structured logging. No ILogger is plumbed
  through yet, which keeps the constructor minimal for testability; a future
  change can wire one in and emit a single warning per first-fire.
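
A rough Python sketch of the lock-protected snapshot idea, assuming a dictionary keyed by `(unit, region, start, end)`. The class and method names are hypothetical; only the record's six fields come from this commit.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AutoProhibition:
    unit_id: int
    region: str
    start_address: int
    end_address: int
    last_probed_utc: datetime
    bisection_pending: bool


class ProhibitionMap:
    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}  # (unit, region, start, end) -> (last_probed, pending)

    def record(self, unit, region, start, end, pending):
        with self._lock:
            self._state[(unit, region, start, end)] = (datetime.now(timezone.utc), pending)

    def snapshot(self):
        """Return an immutable copy so readers never race with writers."""
        with self._lock:
            return tuple(
                AutoProhibition(u, r, s, e, probed, pending)
                for (u, r, s, e), (probed, pending) in self._state.items()
            )
```

Because the snapshot is built while the lock is held and returned as an immutable tuple, later writes to the live map do not show up in a snapshot a consumer already holds.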

Tests (1 new, additive to the 6 in ModbusCoalescingAutoRecoveryTests):
- GetAutoProhibitedRanges_Surfaces_Operator_Visible_Snapshot — confirms
  the snapshot shape: empty before any failure, populated with correct
  UnitId/Region/Start/End/BisectionPending after a failed coalesced read,
  LastProbedUtc within the recent past.

Docs:
- docs/v2/modbus-addressing.md — new "Coalescing auto-recovery" subsection
  consolidates the #148/#150/#151/#152 surface in one place. Documents
  the diagnostic accessor + flags the in-process consumption pattern
  (Server health endpoints today; Admin UI when an RPC channel exists).

239 + 1 = 240 unit tests green.

Caveat: the Admin UI surfacing (table render, "clear all prohibitions"
button) is intentionally NOT shipped here. Admin can't reach a live
ModbusDriver instance without a driver-diagnostics RPC channel that
doesn't exist yet — that's a larger architectural piece. For now the
data is queryable in-process by the Server's health endpoints; once an
RPC channel lands, Admin can wire the existing GetAutoProhibitedRanges
into a Blazor table without further driver changes.
2026-04-25 01:19:10 -04:00
Joseph Doherty
f823c81c96 Task #150 — Modbus coalescing: bisection-style range narrowing
Pre-#150 a coalesced read failure recorded the FULL failed range as
permanently prohibited. Healthy registers around the actual protected
register stayed in per-tag mode forever (until ReinitializeAsync). The
re-probe loop shipped in #151 retried the whole range as a single block,
which would either succeed (clearing everything) or fail (changing
nothing).

Post-#150 the re-probe loop bisects multi-register prohibitions:

- _autoProhibited refactored from Dictionary<key, DateTime> to
  Dictionary<key, ProhibitionState> where ProhibitionState carries
  LastProbedUtc + SplitPending. Multi-register prohibitions enter with
  SplitPending=true; single-register prohibitions enter with
  SplitPending=false (already minimal).
- ReprobeLoopAsync delegates the per-pass work to
  RunReprobeOnceForTestAsync (also exposed for synchronous test driving).
  Each entry routes to BisectAndReprobeAsync (split-pending + multi-reg)
  or StraightReprobeAsync (single-reg / non-split-pending).
- Bisection: split (start, end) at mid = (start+end)/2. Try (start, mid)
  and (mid+1, end) as separate coalesced reads. Each FAILED half re-enters
  the prohibition map with SplitPending = (its end > its start). SUCCEEDED
  halves vanish, freeing the planner to coalesce across them on the next
  scan.
- Convergence: roughly ceil(log2(span)) re-probe ticks pin the prohibition
  to the actual offending register(s). For a 100-register block with one
  protected address that's ~7 ticks.
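
The bisection step above can be modelled in a few lines. This is an illustrative Python sketch, not the driver's C# code: `bisect_once` and the `probe` callback are hypothetical names, and the map is reduced to `(start, end) -> split_pending`.

```python
def bisect_once(prohibited, probe):
    """Run one re-probe pass over the prohibition map.

    prohibited: dict mapping (start, end) -> split_pending flag.
    probe(start, end): returns True when a coalesced read of the range succeeds.
    Returns the new prohibition map.
    """
    result = {}
    for (start, end), split_pending in prohibited.items():
        if split_pending and end > start:
            mid = (start + end) // 2
            halves = [(start, mid), (mid + 1, end)]
        else:
            halves = [(start, end)]  # already minimal: straight re-probe
        for lo, hi in halves:
            if not probe(lo, hi):
                # Failed halves re-enter; they stay splittable only if multi-register.
                result[(lo, hi)] = hi > lo
            # Succeeded halves are simply dropped.
    return result
```

With a single protected register at 105 and an initial entry (100, 110), repeated calls yield (100, 105), then (103, 105), then (105, 105), which matches the narrowing sequence in the first test listed for this commit.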

Tests (3 new ModbusCoalescingBisectionTests):
- Bisection_Narrows_Multi_Register_Prohibition_Per_Reprobe — 11 tags
  100..110 with protected address 105. After 4 re-probe passes the
  prohibition collapses from (100..110) → (100..105) → (103..105) →
  (105..105).
- Bisection_Clears_When_Both_Halves_Are_Healthy — transient failure
  scenario; protection lifted before re-probe; both bisection halves
  succeed and the parent vanishes entirely.
- Bisection_Splits_Into_Two_When_Both_Halves_Still_Fail — TwoHoleTransport
  with protected addresses 102 + 108 in the same coalesced range. After
  bisection both halves still fail (each contains one of the protected
  addresses); the prohibition map grows to 2 entries.

236 + 3 = 239 unit tests green. Solution build clean.
2026-04-25 01:16:09 -04:00
Joseph Doherty
9e4aae350b Task #151 — Modbus coalescing: periodic re-probe of auto-prohibitions
#148 introduced auto-prohibited coalesced ranges that persist for the
driver lifetime. Long-running deployments with transient PLC permission
changes (firmware update unlocking a previously-protected register,
operator reconfiguring the device) had no recovery short of operator
restart.

Adds an opt-in background loop that re-probes each prohibition periodically:

- ModbusDriverOptions.AutoProhibitReprobeInterval (TimeSpan?, default null
  = disabled). Set to e.g. TimeSpan.FromHours(1) to opt in.
- _autoProhibited refactored from HashSet<key> to Dictionary<key, DateTime>
  so each entry tracks its last failure / last re-probe timestamp.
- ReprobeLoopAsync runs on the same Task.Run pattern as ProbeLoopAsync;
  cancelled by ShutdownAsync. Each tick snapshots the prohibition set
  and issues a one-shot coalesced read per range. Successful re-probes
  drop the prohibition; failed ones bump the timestamp + leave the
  prohibition in place.
- Communication failures during re-probe (transport-level) are treated
  the same as PLC-exception failures — the prohibition stays, but isn't
  upgraded to "permanent" since transports recover. The driver-instance
  health surface picks up the failure separately.
- ShutdownAsync explicitly clears the prohibition set so a manual restart
  via ReinitializeAsync starts with a clean slate (matches the old
  "restart to clear" semantics).
- Factory DTO + JSON binding extended with AutoProhibitReprobeMs field.
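
One re-probe pass, as described above, might look like this in outline. The Python below is a sketch under assumptions: `reprobe_once` and `read_range` are hypothetical names, and any exception from the read (PLC exception or transport failure) is treated as "still prohibited".

```python
from datetime import datetime, timezone


def reprobe_once(prohibited, read_range, now=None):
    """Re-probe every prohibited range once.

    prohibited: dict mapping (unit, region, start, end) -> last-probed timestamp.
    read_range(unit, region, start, end): one-shot coalesced read; raises on failure.
    """
    now = now or datetime.now(timezone.utc)
    for key in list(prohibited):  # snapshot the keys before mutating the dict
        try:
            read_range(*key)
        except Exception:
            prohibited[key] = now  # still failing: bump the timestamp, keep the entry
        else:
            del prohibited[key]    # healthy again: drop the prohibition
```

Driving this function directly, instead of waiting for a timer, is the same idea as the synchronous test helper mentioned in this commit.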

Tests (2 new, additive to the 3 in ModbusCoalescingAutoRecoveryTests):
- Reprobe_Clears_Prohibition_When_Range_Becomes_Healthy — protected
  register at 102 records prohibition; clearing the simulated protection
  + invoking the re-probe drops the prohibition.
- Reprobe_Leaves_Prohibition_When_Range_Is_Still_Bad — re-probe on a
  still-failing range keeps the prohibition in place.

Tests use a new internal RunReprobeOnceForTestAsync helper to fire one
re-probe pass synchronously, so the suite doesn't have to wait on the
background timer. The loop's timer behaviour is exercised only implicitly:
InitializeAsync wires up the loop, and the synchronous helper shares the
actual re-probe code path.

234 + 2 = 236 unit tests green.
2026-04-25 01:12:48 -04:00
Joseph Doherty
8de152df4f Task #149 — Modbus address-preview page + ImportEquipment help
The original task scope assumed a per-tag editor lived in EquipmentTab.razor
or a similar surface. Reading the codebase confirmed that's not the case:
tags are seeded via SQL (scripts/smoke/*) or arrive at runtime through
ITagDiscovery; the Admin UI has no per-tag CRUD page today. Equipment
import is for equipment metadata (Name / MachineCode / ZTag / SAPID /
Identification) — not tag rows.

Adjusted scope:

1. ModbusAddressPreview.razor — new standalone page at /modbus/address-preview.
   Hosts the ModbusAddressEditor component shipped in #145 + the family
   selector + a copy-pasteable grammar reference. Operators can sanity-check
   address-string syntax (40001:F:CDAB / HR1:I / V2000:F / D100:I etc.)
   without committing it to a config row first.

2. ImportEquipment.razor — appended a secondary alert banner clarifying
   that Modbus per-tag addressing isn't part of equipment import; points
   users at the Drivers tab + the new preview tool.

Builds clean against the existing Admin app. The actual per-tag CRUD UI is
still a separate piece of work — when it ships, it can drop in
ModbusAddressEditor directly. The preview page acts as the canonical
demonstration of how to use the component.

Razor caveat: the grammar reference uses literal `<...>` syntax tokens
that the Razor parser interprets as malformed elements when inlined in a
<pre> block. Held as a string field (_grammarReference) and rendered
through @ binding to sidestep the parser conflict.
2026-04-25 01:09:24 -04:00
Joseph Doherty
3b0e093002 Task #148 — Modbus block-coalescing: auto-recover from protected register holes
Pre-#148 behaviour: a coalesced FC03/FC04 read that crossed a write-only or
PLC-fault register marked every member tag Bad until the operator manually
flagged the offending tag with CoalesceProhibited. Healthy tags around the
hole stayed broken indefinitely.

Post-#148: two-stage recovery, no operator intervention needed.

1. Same-scan fallback: when a coalesced read fails with a Modbus exception
   (IllegalDataAddress, SlaveDeviceFailure, etc.), the planner does NOT
   mark members handled. The per-tag fallback in the same scan reads each
   member individually — non-protected members surface Good values
   immediately, and only the actual protected register stays Bad.

2. Cross-scan prohibition: the failed range (Unit, Region, Start, End) is
   recorded in a per-driver `_autoProhibited` set. On subsequent scans the
   planner checks each candidate merge against the set and refuses to
   re-form any block that overlaps a known-bad range. Net effect: after one
   scan with a failure, the protected range goes "per-tag mode" indefinitely
   while ranges around it keep coalescing normally.

Communication failures (timeouts, socket drops) are NOT auto-prohibited —
they're transport-level, not structural. The same coalesced read can succeed
once the transport recovers; recording it as "permanently bad" would defeat
coalescing for the whole driver instance.
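
The two-stage behaviour, including the transport-failure exception, can be sketched as below. This is a simplified Python model with hypothetical names (`scan_cluster`, `read_block`, `read_single`); `ModbusError` stands in for a Modbus exception response and `OSError` for a timeout or socket drop.

```python
class ModbusError(Exception):
    """Stand-in for a Modbus exception response (e.g. IllegalDataAddress)."""


def scan_cluster(addresses, read_block, read_single, prohibited):
    """Read a cluster of tag addresses, coalesced first, per-tag on PLC failure."""
    start, end = min(addresses), max(addresses)
    try:
        values = read_block(start, end)
        return {a: ("Good", values[a - start]) for a in addresses}
    except ModbusError:
        prohibited.add((start, end))  # structural failure: remember the range
    except OSError:
        # Transport failure: mark all Bad for this scan, do not prohibit.
        return {a: ("Bad", None) for a in addresses}
    # Same-scan fallback: only the truly protected register stays Bad.
    results = {}
    for a in addresses:
        try:
            results[a] = ("Good", read_single(a))
        except ModbusError:
            results[a] = ("Bad", None)
    return results
```

The asymmetry between the two `except` branches is the design choice the paragraph above argues for: a structural failure is worth remembering, a transport failure is not.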

Auto-prohibition state lives for the driver lifetime and clears on
ReinitializeAsync (operator restart). A periodic re-probe is a follow-up if
deployments need it without a restart.

Implementation:
- Added `_autoProhibited` HashSet<(byte, ModbusRegion, ushort, ushort)> +
  `_autoProhibitedLock` on ModbusDriver.
- `RangeIsAutoProhibited(unit, region, start, end)` overlap check called
  from the planner when forming blocks.
- `RecordAutoProhibition(...)` called from the catch (ModbusException)
  branch.
- The catch (Exception) branch (non-Modbus failures) keeps the pre-#148
  "mark all Bad in this scan, don't auto-prohibit" behaviour.
- Internal `AutoProhibitedRangeCount` accessor for tests.
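
For illustration, the overlap check the planner relies on reduces to a closed-interval intersection test. The Python below is a sketch with a hypothetical function name; the tuple layout follows the `(unit, region, start, end)` key described above.

```python
def range_is_auto_prohibited(prohibited, unit, region, start, end):
    """True when [start, end] overlaps any recorded range for the same unit/region."""
    return any(
        u == unit and r == region and s <= end and start <= e
        for (u, r, s, e) in prohibited
    )
```

Two closed ranges overlap exactly when each one starts no later than the other ends, so a candidate block touching even one register of a known-bad range is refused while neighbouring clusters are unaffected.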

Tests (3 new ModbusCoalescingAutoRecoveryTests):
- First_Failure_Falls_Back_To_PerTag_Same_Scan — three tags around a
  protected register at 102: T100 + T104 surface Good values via the
  per-tag fallback in the SAME scan; T102 surfaces the exception.
- Second_Scan_Skips_Coalesced_Read_Of_Prohibited_Range — confirms scan 2
  doesn't re-attempt the failed merge (no FC03 with quantity > 1 at the
  prohibited start).
- Tags_Outside_Prohibited_Range_Still_Coalesce — separate cluster at HR
  200..202 keeps coalescing normally even after the 100..104 cluster is
  prohibited.

234/234 unit tests green.

Follow-ups intentionally NOT shipped (smaller, independent changes):
- Bisection-style range narrowing: currently the prohibition range is the
  full failed block; the planner doesn't try to find the exact protected
  register. The operator-visible diagnostic and the prohibition stay correct,
  just wider than necessary.
- Periodic re-probe to clear stale prohibitions.
- Surface auto-prohibited ranges through GetHostStatuses or a new
  diagnostic so the Admin UI can show what's been auto-isolated.
2026-04-25 01:01:42 -04:00
Joseph Doherty
0b7653d3b2 Task #147 — wire ModbusOptionsEditor into DriversTab
Branches the DriversTab driver-add form on driver type:
- For DriverType=Modbus, render the typed <ModbusOptionsEditor> component
  shipped in #145 instead of the generic JSON textarea.
- For other driver types, the existing textarea stays (other drivers ship
  their own typed editors per decision #94).

On Save, when type is Modbus, the form serialises ModbusOptionsViewModel
into the JSON DTO shape ModbusDriverFactoryExtensions consumes (host /
port / unitId / family / keepAlive / reconnect / max*** / writeOnChangeOnly
/ etc.). Other types still pass the textarea contents verbatim.

Drive-by fix: the DriverType dropdown listed "ModbusTcp" but the actual
factory-registered name is "Modbus" — DriverInstanceBootstrapper would
silently skip a row created with the old label because the factory lookup
would miss. Renamed to match.

Tests (2 new in ModbusOptionsViewModelTests):
- DriversTab_Serialized_Defaults_RoundTrip_Through_Factory — unedited
  view-model serializes to a JSON the factory accepts; resulting
  ModbusDriverOptions matches the form defaults bit-for-bit.
- DriversTab_Serializes_Edited_Values_Correctly — flipping Host / Port /
  UnitId / Family / MaxReadGap / WriteOnChangeOnly in the view model
  surfaces in the constructed driver's options.

The serializer in the test mirrors DriversTab.razor's SerializeModbusOptions
helper. If the form's serialization shape drifts, both must be updated
together; that's the cost of testing through the JSON DTO without bUnit.

Follow-up still open: the per-tag editor (ModbusAddressEditor wiring into
EquipmentTab.razor + the bulk-import help-text update). That's a separate
surface touching the equipment-row CRUD flow, deferred until the equipment
tag editor is next touched.
2026-04-25 00:58:03 -04:00
Joseph Doherty
dfd027ebca Task #146 — Modbus addressing: align type codes with Wonderware DASMBTCP + Ignition
Web verification (2026-04-25) against current vendor docs surfaced concrete
grammar conflicts in the v1 suffix grammar shipped in #137. Hard cutover
before the Admin UI rolls out widely so users don't paste `:I` from a
Wonderware spreadsheet and silently get wrong-typed reads.

Sources:
- Wonderware DASMBTCP user guide
  https://cdn.logic-control.com/media/DASMBTCP.pdf
- Ignition Modbus addressing (8.1)
  https://www.docs.inductiveautomation.com/docs/8.1/ignition-modules/opc-ua/opc-ua-drivers/modbus/modbus-addressing

Type-code changes:

| Code      | Pre-#146 | Post-#146  | Vendor reference                |
|-----------|----------|------------|---------------------------------|
| `:S`      | (n/a)    | Int16      | Wonderware DASMBTCP `S`         |
| `:US`     | (n/a)    | UInt16     | Ignition `HRUS`                 |
| `:I`      | Int16    | **Int32**  | Wonderware `I` + Ignition `HRI` |
| `:UI`     | UInt16   | **UInt32** | Ignition `HRUI`                 |
| `:I_64`   | (n/a)    | Int64      | Ignition `HRI_64`               |
| `:UI_64`  | (n/a)    | UInt64     | Ignition `HRUI_64`              |
| `:BCD_32` | (n/a)    | BCD32      | Ignition `HRBCD_32`             |

Codes REMOVED (no clear vendor precedent + conflict with the new mapping):
`:DI`, `:L`, `:UDI`, `:UL`, `:LI`, `:ULI`, `:LBCD`. Pre-#146 configs that
use them get an "Unknown type code" diagnostic at parse time so users get
a fast surface-level error rather than silent wrong-typed reads.

Codes UNCHANGED (already vendor-aligned): `:BOOL`, `:F`, `:D`, `:BCD`,
`:STR<n>`. Modicon 5/6-digit + mnemonic regions (HR/IR/C/DI) + bit suffix
`.N` are also unchanged.

Defaults:
- Coils / DiscreteInputs → `BOOL` (unchanged)
- HoldingRegisters / InputRegisters with no explicit type → Int16 (matches
  Ignition's bare `HR` default)
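
A small Python sketch of the post-#146 lookup, for illustration only. `resolve_type` and the region strings are hypothetical; the table covers just the seven codes changed or added here (the unchanged codes are omitted), and the error text is abbreviated rather than the parser's real message.

```python
# Codes from the table above (post-#146 meanings).
POST_146 = {
    "S": "Int16", "US": "UInt16",
    "I": "Int32", "UI": "UInt32",
    "I_64": "Int64", "UI_64": "UInt64",
    "BCD_32": "BCD32",
}

# Aliases removed in #146; they must fail fast rather than map silently.
REMOVED = {"DI", "L", "UDI", "UL", "LI", "ULI", "LBCD"}


def resolve_type(code, region):
    """Map a type-code suffix (without the colon) to a data type name."""
    if code is None:
        # Defaults: bit regions are BOOL, register regions are Int16.
        return "BOOL" if region in ("Coils", "DiscreteInputs") else "Int16"
    if code in POST_146:
        return POST_146[code]
    raise ValueError(f"Unknown type code '{code}'")
```

A spreadsheet value such as `:I` now resolves to Int32, and an old `:DI` raises immediately instead of producing a wrong-typed read.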

Byte-order mnemonics (`:ABCD` / `:CDAB` / `:BADC` / `:DCBA`) are kept but
documented as OtOpcUa-specific — they aren't in any major vendor's per-tag
address string. Ignition uses a `-R` suffix per prefix; Wonderware
configures word-order at the topic level.
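
As an illustration of what the four mnemonics mean for a 32-bit value, here is a Python sketch that decodes a Float32 from two registers. It assumes the common convention that the letters name the value's bytes (A most significant) in the order they arrive on the wire; `decode_float32` is a hypothetical helper, not the driver's API.

```python
import struct


def decode_float32(registers, order="ABCD"):
    """Decode an IEEE-754 float from two 16-bit registers in wire order."""
    if sorted(order) != ["A", "B", "C", "D"]:
        raise ValueError(f"Unknown byte order '{order}'")
    first, second = registers
    wire = struct.pack(">HH", first, second)  # the four bytes as received
    # Put the wire bytes back into big-endian (A, B, C, D) order.
    ordered = bytes(wire[order.index(letter)] for letter in "ABCD")
    return struct.unpack(">f", ordered)[0]
```

For the value 1.0 (bytes 3F 80 00 00), `ABCD` arrives as registers (0x3F80, 0x0000) and the word-swapped `CDAB` form as (0x0000, 0x3F80); both decode to the same float.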

Tests:
- 12 Type_Codes_Parse rows updated to assert the new mappings.
- New Removed_Aliases_Are_Rejected (×7) confirms each pre-#146 alias now
  fails fast with "Unknown type code".
- Worked_Example_Int16_Array uses the new `:S` code.
- New Worked_Example_Int32_Array_Via_I_Code documents the `:I = Int32`
  vendor-alignment intent so a future "fix" doesn't accidentally regress.
- Unknown_Type_Code_Rejected_With_Catalog updated to match the new error
  message ("Valid: BOOL, S, US, I, ...").

Docs:
- docs/v2/modbus-addressing.md — table replaced with the post-#146 codes,
  each row cites its Wonderware / Ignition reference. New "Codes removed
  in #146" subsection documents the cutover.
- docs/Driver.Modbus.Cli.md — example grammar list updated; explicit
  type-code reminder appended.

114 addressing tests + 231 driver tests still green. Solution build clean.
2026-04-25 00:51:50 -04:00
Joseph Doherty
5ea57d2d70 Task #138 — Modbus addressing grammar docs + e2e
Closes the docs/e2e end of the Modbus addressing line shipped across
#136-#145.

Docs:

- docs/v2/modbus-addressing.md (new) — full grammar reference.
  Region+offset (Modicon 5-digit / 6-digit / mnemonic), bit suffix,
  type codes (BOOL / I / UI / DI / UDI / LI / ULI / F / D / BCD / LBCD /
  STR<n>), all four byte-order mnemonics (ABCD / CDAB / BADC / DCBA),
  array-count semantics, family-native syntax (DL205 V/Y/C/X/SP and
  MELSEC D/M/X/Y with hex-vs-octal sub-family selection), driver-instance
  options (KeepAlive / Reconnect / IdleDisconnect, MaxCoilsPerRead and
  FC15/16 forcing, Deadband + WriteOnChangeOnly, MaxReadGap +
  CoalesceProhibited, multi-unit IPerCallHostResolver). Includes a worked
  JSON DTO example mixing AddressString + structured tag forms.

- docs/Driver.Modbus.Cli.md — appended a "v2 addressing grammar" section
  pointing users at the full reference, with quick-reference examples.

- Vendor-compatibility caveat documented: type codes and byte-order
  mnemonics were synthesised from training-era vendor docs (Wonderware
  DASMBTCP, Kepware KEPServerEX, Ignition, Matrikon, OAS) and should be
  verified against current vendor manuals before locking for production.

E2E tests (4 new AddressingGrammarTests in IntegrationTests):
- Modicon 5-digit and 6-digit forms map to identical wire offsets.
- Float32 + WordSwap (CDAB) round-trips end-to-end through the
  pymodbus simulator.
- Int16[5] array round-trips as a typed short[] surface.
- Block-read coalescing produces a wire-acceptable PDU when MaxReadGap=5
  bridges three nearby tags.
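
The first test's claim, that 5-digit and 6-digit Modicon forms land on the same wire offset, follows from the standard Modicon convention: the leading digit selects the region and the remaining digits are a 1-based register number. A Python sketch, with `parse_modicon` as a hypothetical name and the region strings chosen for illustration:

```python
_REGIONS = {
    "0": "Coils",
    "1": "DiscreteInputs",
    "3": "InputRegisters",
    "4": "HoldingRegisters",
}


def parse_modicon(address):
    """Parse a 5- or 6-digit Modicon address into (region, zero-based offset)."""
    if len(address) not in (5, 6) or not address.isdigit():
        raise ValueError(f"Not a Modicon address: {address!r}")
    region = _REGIONS.get(address[0])
    if region is None:
        raise ValueError(f"Unknown Modicon region digit in {address!r}")
    number = int(address[1:])  # 1-based register number
    if number < 1:
        raise ValueError(f"Register number must be >= 1 in {address!r}")
    return region, number - 1  # wire offsets are zero-based
```

Both `40001` and `400001` parse to holding register offset 0; the 6-digit form only widens the addressable range.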

All tests skip gracefully when the pymodbus simulator at localhost:5020
is unreachable (matches the existing ModbusSimulatorFixture pattern).

Final test count across the Modbus addressing surface:
- 107 ModbusAddressing.Tests (parser + family + Modicon)
- 231 Driver.Modbus.Tests (driver, byte order, array, multi-unit, coalescing,
  protocol, subscribe, connection options)
- 110 Admin.Tests (incl. ModbusOptionsViewModel defaults pinning)
- 4 new AddressingGrammar integration tests (skip when sim down)
2026-04-25 00:32:27 -04:00