Commit Graph

255 Commits

Author SHA1 Message Date
Joseph Doherty
802366c2c6 Task #154 — driver-diagnostics RPC: HTTP endpoint + Admin client
Foundation for surfacing per-driver runtime state from the Server process to
the Admin UI. #152 shipped GetAutoProhibitedRanges() as an in-process
accessor; #154 makes it reachable across processes.

Server side (HealthEndpointsHost):
- New URL family: /diagnostics/drivers/{driverInstanceId}/{driverType}/{topic}
- First wired topic: /diagnostics/drivers/{id}/modbus/auto-prohibited
- Driver-agnostic at the URL level — future driver types add their own
  segments[3] cases (e.g. /diagnostics/drivers/{id}/s7/dropped-pdus).
- 404 when the driver instance doesn't exist; 400 when the driver exists
  but isn't a Modbus driver (the per-type endpoint is wrong for this row).
- Response shape is flat JSON (unitId / region / startAddress / endAddress /
  lastProbedUtc / bisectionPending) so consumers don't have to reference the
  Driver.Modbus assembly's ModbusAutoProhibition record.
- Re-uses the existing HttpListener bound to localhost:4841 — same auth /
  reachability story as /healthz and /readyz.
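
A minimal sketch of the status-code rules above, written in Python purely as an illustration and not taken from HealthEndpointsHost. The function name `diagnostics_status`, the `drivers` dictionary, and the 404 for an unrecognised topic are assumptions made for the example.

```python
def diagnostics_status(path, drivers):
    """Return the HTTP status for a diagnostics path.

    drivers is a hypothetical registry: driverInstanceId -> driver type name.
    """
    segments = [s for s in path.split("/") if s]
    # Expected layout: diagnostics / drivers / {id} / {driverType} / {topic}
    if len(segments) != 5 or segments[:2] != ["diagnostics", "drivers"]:
        return 404
    driver_id, driver_type, topic = segments[2:]
    if driver_id not in drivers:
        return 404  # driver instance doesn't exist
    if drivers[driver_id].lower() != driver_type:
        return 400  # driver exists but the per-type endpoint is wrong for it
    if (driver_type, topic) != ("modbus", "auto-prohibited"):
        return 404  # assumption: topics that are not wired yet are "not found"
    return 200
```

Here `segments[3]` is the driver-type segment, which is where a future driver type would add its own case.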

Admin side:
- DriverDiagnosticsClient (Services/) — HttpClient wrapper that fetches the
  per-driver Modbus prohibition list. Returns null on 404/400 (driver
  missing or wrong type); throws on transport failures.
- ModbusAutoProhibitionsResponse + ModbusAutoProhibitionRow flat DTOs —
  client doesn't take a dep on Driver.Modbus.
- ModbusDiagnostics.razor at /modbus/diagnostics/{driverInstanceId} —
  table view with BISECTING (warning yellow) / ISOLATED (danger red)
  badges, relative timestamps (e.g. "5m ago"), Refresh button. Errors
  surface inline rather than swallowing.
- HttpClient registration in Program.cs reads
  DriverDiagnostics:ServerBaseUrl from appsettings.json (default
  http://localhost:4841/ for same-host deployments).

Tests (3 new in HealthEndpointsHostTests):
- Diagnostics_ReturnsModbusAutoProhibitions_ForLiveDriver — registers a
  Modbus driver with a programmable transport that protects register 102,
  records the prohibition via a coalesced ReadAsync, hits the endpoint,
  asserts the returned JSON matches (unitId / region / start / end / pending).
- Diagnostics_404_When_Driver_Not_Found
- Diagnostics_400_When_Driver_Is_Wrong_Type

Architecture note: the Admin-side bUnit-style component test isn't included
because Admin.Tests doesn't have bUnit set up. The DriverDiagnosticsClient
is unit-testable on its own with a mock HandlerStub if needed — left as a
follow-up alongside the broader bUnit setup task.

The diagnostic page is now reachable at /modbus/diagnostics/{driverInstanceId} from
any Admin instance pointing at a Server endpoint URL. Future driver types
(S7, AbCip) plug into the same channel by adding their own URL segments
in HealthEndpointsHost.WriteDriverDiagnosticsAsync.
2026-04-25 01:32:21 -04:00
Joseph Doherty
8004394892 Task #153 — ModbusDriver: inject ILogger so prohibition events reach a sink
#152 left a hook for structured logging when an auto-prohibition first
fires; this commit completes the wiring.

Changes:
- ModbusDriver constructor takes an optional ILogger<ModbusDriver> (defaults
  to NullLogger). Existing standalone callers stay compile-clean.
- RecordAutoProhibition logs LogWarning on first-fire only (re-fires of the
  same range stay quiet via the existing isNew de-dupe). Format includes
  DriverInstanceId, UnitId, Region, Start, End, Span — log aggregators can
  filter / count by any field.
- New LogProhibitionCleared helper called by both StraightReprobeAsync (when
  the re-probe succeeds on a single-register range) and BisectAndReprobeAsync
  (per-half clearing + a single combined line when both halves succeed).
- ModbusDriverFactoryExtensions.Register accepts an optional ILoggerFactory.
  Captured at registration time and used in the factory closure to construct
  a per-driver logger. Server bootstrap code that already has an ILoggerFactory
  in DI threads it through with a single argument addition; old call sites
  (Register(registry)) keep working with a null logger.

Tests (2 new ModbusLoggerInjectionTests):
- First_Failure_Emits_Single_Warning_Subsequent_Refire_Stays_Quiet — pins
  the de-dupe behaviour. First scan logs one warning with the expected
  structured fields; second scan with the same prohibition stays silent.
- Reprobe_Clearing_Prohibition_Emits_Information_Log — protected register
  unlocked between record and re-probe; re-probe success emits an info log
  containing "cleared".

CapturingLogger test harness is purpose-built (xUnit doesn't ship a logger
mock by default and adding Moq is overkill for two tests).

240 + 2 = 242 unit tests green.
2026-04-25 01:26:20 -04:00
Joseph Doherty
b8df230eb8 Task #152 — Modbus coalescing: surface auto-prohibitions through diagnostics
Auto-prohibited ranges (#148) were previously visible only through an
internal AutoProhibitedRangeCount accessor used by tests. Production
operators had no way to see what the planner had learned without pulling
logs or inspecting driver state.

Changes:

- New public record `ModbusAutoProhibition(UnitId, Region, StartAddress,
  EndAddress, LastProbedUtc, BisectionPending)` — operator-facing snapshot
  shape. Lives in the addressing assembly's logical namespace alongside
  the other public types.
- `ModbusDriver.GetAutoProhibitedRanges()` returns
  `IReadOnlyList<ModbusAutoProhibition>` — a copy of the live prohibition
  map. Lock-protected snapshot so consumers don't race with the re-probe
  loop.
- RecordAutoProhibition tracks first-fire vs re-fire via the dictionary
  insert path, leaving a hook to add structured logging once an ILogger
  is plumbed through (currently elided to keep the constructor minimal
  for testability — a future change can wire ILogger and emit a single
  warning per first-fire).

Tests (1 new, additive to the 5 in ModbusCoalescingAutoRecoveryTests):
- GetAutoProhibitedRanges_Surfaces_Operator_Visible_Snapshot — confirms
  the snapshot shape: empty before any failure, populated with correct
  UnitId/Region/Start/End/BisectionPending after a failed coalesced read,
  LastProbedUtc within the recent past.

Docs:
- docs/v2/modbus-addressing.md — new "Coalescing auto-recovery" subsection
  consolidates the #148/#150/#151/#152 surface in one place. Documents
  the diagnostic accessor + flags the in-process consumption pattern
  (Server health endpoints today; Admin UI when an RPC channel exists).

239 + 1 = 240 unit tests green.

Caveat: the Admin UI surfacing (table render, "clear all prohibitions"
button) is intentionally NOT shipped here. Admin can't reach a live
ModbusDriver instance without a driver-diagnostics RPC channel that
doesn't exist yet — that's a larger architectural piece. For now the
data is queryable in-process by the Server's health endpoints; once an
RPC channel lands, Admin can wire the existing GetAutoProhibitedRanges
into a Blazor table without further driver changes.
2026-04-25 01:19:10 -04:00
Joseph Doherty
f823c81c96 Task #150 — Modbus coalescing: bisection-style range narrowing
Pre-#150 a coalesced read failure recorded the FULL failed range as
permanently prohibited. Healthy registers around the actual protected
register stayed in per-tag mode forever (until ReinitializeAsync). The
re-probe loop shipped in #151 retried the whole range as a single block,
which would either succeed (clearing everything) or fail (changing
nothing).

Post-#150 the re-probe loop bisects multi-register prohibitions:

- _autoProhibited refactored from Dictionary<key, DateTime> to
  Dictionary<key, ProhibitionState> where ProhibitionState carries
  LastProbedUtc + SplitPending. Multi-register prohibitions enter with
  SplitPending=true; single-register prohibitions enter with
  SplitPending=false (already minimal).
- ReprobeLoopAsync delegates the per-pass work to
  RunReprobeOnceForTestAsync (also exposed for synchronous test driving).
  Each entry routes to BisectAndReprobeAsync (split-pending + multi-reg)
  or StraightReprobeAsync (single-reg / non-split-pending).
- Bisection: split (start, end) at mid = (start+end)/2. Try (start, mid)
  and (mid+1, end) as separate coalesced reads. Each FAILED half re-enters
  the prohibition map with SplitPending = (its end > its start). SUCCEEDED
  halves vanish, freeing the planner to coalesce across them on the next
  scan.
- Convergence: log2(span) re-probe ticks pin the prohibition to the
  actual single offending register(s). For a 100-register block with one
  protected address that's ~7 ticks.
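
The bisection step can be sketched as follows. This is an illustrative Python model, not the driver's BisectAndReprobeAsync; `read_ok(lo, hi)` is a hypothetical stand-in for a coalesced read of registers lo..hi, and `narrow` is a helper invented for the example.

```python
def bisect_once(start, end, read_ok):
    """One re-probe pass over a prohibited range; returns the ranges still failing."""
    if end <= start:
        # Single register: nothing to split, so re-probe it directly.
        return [] if read_ok(start, end) else [(start, end)]
    mid = (start + end) // 2
    halves = [(start, mid), (mid + 1, end)]
    # Halves that read cleanly vanish; failed halves stay prohibited.
    return [(lo, hi) for lo, hi in halves if not read_ok(lo, hi)]


def narrow(start, end, read_ok):
    """Repeat passes until every remaining range is a single register."""
    history = []
    pending = [(start, end)]
    while any(hi > lo for lo, hi in pending):
        next_pending = []
        for lo, hi in pending:
            next_pending.extend(bisect_once(lo, hi, read_ok))
        pending = next_pending
        history.append(list(pending))
    return history


protected = {105}
read_ok = lambda lo, hi: not any(lo <= p <= hi for p in protected)
print(narrow(100, 110, read_ok))
# → [[(100, 105)], [(103, 105)], [(105, 105)]]
```

With two protected addresses in one range (for example 102 and 108 in 100..110), the first pass leaves both halves prohibited, so the map grows rather than shrinks.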

Tests (3 new ModbusCoalescingBisectionTests):
- Bisection_Narrows_Multi_Register_Prohibition_Per_Reprobe — 11 tags
  100..110 with protected address 105. After 4 re-probe passes the
  prohibition collapses from (100..110) → (100..105) → (103..105) →
  (105..105).
- Bisection_Clears_When_Both_Halves_Are_Healthy — transient failure
  scenario; protection lifted before re-probe; both bisection halves
  succeed and the parent vanishes entirely.
- Bisection_Splits_Into_Two_When_Both_Halves_Still_Fail — TwoHoleTransport
  with protected addresses 102 + 108 in the same coalesced range. After
  bisection both halves still fail (each contains one of the protected
  addresses); the prohibition map grows to 2 entries.

236 + 3 = 239 unit tests green. Solution build clean.
2026-04-25 01:16:09 -04:00
Joseph Doherty
9e4aae350b Task #151 — Modbus coalescing: periodic re-probe of auto-prohibitions
#148 introduced auto-prohibited coalesced ranges that persist for the
driver lifetime. Long-running deployments with transient PLC permission
changes (firmware update unlocking a previously-protected register,
operator reconfiguring the device) had no recovery short of operator
restart.

Adds an opt-in background loop that re-probes each prohibition periodically:

- ModbusDriverOptions.AutoProhibitReprobeInterval (TimeSpan?, default null
  = disabled). Set to e.g. TimeSpan.FromHours(1) to opt in.
- _autoProhibited refactored from HashSet<key> to Dictionary<key, DateTime>
  so each entry tracks its last failure / last re-probe timestamp.
- ReprobeLoopAsync runs on the same Task.Run pattern as ProbeLoopAsync;
  cancelled by ShutdownAsync. Each tick snapshots the prohibition set
  and issues a one-shot coalesced read per range. Successful re-probes
  drop the prohibition; failed ones bump the timestamp + leave the
  prohibition in place.
- Communication failures during re-probe (transport-level) are treated
  the same as PLC-exception failures — the prohibition stays, but isn't
  upgraded to "permanent" since transports recover. The driver-instance
  health surface picks up the failure separately.
- ShutdownAsync explicitly clears the prohibition set so a manual restart
  via ReinitializeAsync starts with a clean slate (matches the old
  "restart to clear" semantics).
- Factory DTO + JSON binding extended with AutoProhibitReprobeMs field.

Tests (2 new, additive to the 3 in ModbusCoalescingAutoRecoveryTests):
- Reprobe_Clears_Prohibition_When_Range_Becomes_Healthy — protected
  register at 102 records prohibition; clearing the simulated protection
  + invoking the re-probe drops the prohibition.
- Reprobe_Leaves_Prohibition_When_Range_Is_Still_Bad — re-probe on a
  still-failing range keeps the prohibition in place.

Tests use a new internal RunReprobeOnceForTestAsync helper to fire one
re-probe pass synchronously, so the suite doesn't have to wait on the
background timer (the loop's timer behaviour is exercised implicitly via
the InitializeAsync wire-up + the synchronous helper sharing the actual
re-probe code path).

234 + 2 = 236 unit tests green.
2026-04-25 01:12:48 -04:00
Joseph Doherty
8de152df4f Task #149 — Modbus address-preview page + ImportEquipment help
The original task scope assumed a per-tag editor lived in EquipmentTab.razor
or a similar surface. Reading the codebase confirmed that's not the case:
tags are seeded via SQL (scripts/smoke/*) or arrive at runtime through
ITagDiscovery; the Admin UI has no per-tag CRUD page today. Equipment
import is for equipment metadata (Name / MachineCode / ZTag / SAPID /
Identification) — not tag rows.

Adjusted scope:

1. ModbusAddressPreview.razor — new standalone page at /modbus/address-preview.
   Hosts the ModbusAddressEditor component shipped in #145 + the family
   selector + a copy-pasteable grammar reference. Operators can sanity-check
   address-string syntax (40001:F:CDAB / HR1:I / V2000:F / D100:I etc.)
   without committing it to a config row first.

2. ImportEquipment.razor — appended a secondary alert banner clarifying
   that Modbus per-tag addressing isn't part of equipment import; points
   users at the Drivers tab + the new preview tool.

Builds clean against the existing Admin app. The actual per-tag CRUD UI is
still a separate piece of work — when it ships, it can drop in
ModbusAddressEditor directly. The preview page acts as the canonical
demonstration of how to use the component.

Razor caveat: the grammar reference uses literal `<...>` syntax tokens
that the Razor parser interprets as malformed elements when inlined in a
<pre> block. Held as a string field (_grammarReference) and rendered
through @ binding to sidestep the parser conflict.
2026-04-25 01:09:24 -04:00
Joseph Doherty
3b0e093002 Task #148 — Modbus block-coalescing: auto-recover from protected register holes
Pre-#148 behaviour: a coalesced FC03/FC04 read that crossed a write-only or
PLC-fault register marked every member tag Bad until the operator manually
flagged the offending tag with CoalesceProhibited. Healthy tags around the
hole stayed broken indefinitely.

Post-#148: two-stage recovery, no operator intervention needed.

1. Same-scan fallback: when a coalesced read fails with a Modbus exception
   (IllegalDataAddress, SlaveDeviceFailure, etc.), the planner does NOT
   mark members handled. The per-tag fallback in the same scan reads each
   member individually — non-protected members surface Good values
   immediately, and only the actual protected register stays Bad.

2. Cross-scan prohibition: the failed range (Unit, Region, Start, End) is
   recorded in a per-driver `_autoProhibited` set. On subsequent scans the
   planner checks each candidate merge against the set and refuses to
   re-form any block that overlaps a known-bad range. Net effect: after one
   scan with a failure, the protected range goes "per-tag mode" indefinitely
   while ranges around it keep coalescing normally.

Communication failures (timeouts, socket drops) are NOT auto-prohibited —
they're transport-level, not structural. The same coalesced read can succeed
once the transport recovers; recording it as "permanently bad" would defeat
coalescing for the whole driver instance.

Auto-prohibition state lives for the driver lifetime and clears on
ReinitializeAsync (operator restart). A periodic re-probe is a follow-up if
deployments need it without a restart.

Implementation:
- Added `_autoProhibited` HashSet<(byte, ModbusRegion, ushort, ushort)> +
  `_autoProhibitedLock` on ModbusDriver.
- `RangeIsAutoProhibited(unit, region, start, end)` overlap check called
  from the planner when forming blocks.
- `RecordAutoProhibition(...)` called from the catch (ModbusException)
  branch.
- The catch (Exception) branch (non-Modbus failures) keeps the pre-#148
  "mark all Bad in this scan, don't auto-prohibit" behaviour.
- Internal `AutoProhibitedRangeCount` accessor for tests.
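
As a rough illustration of the overlap check, here is a Python sketch under the assumption that a prohibition is stored as a `(unit, region, start, end)` tuple with inclusive bounds. The function names are hypothetical and only mirror the C# names above.

```python
def ranges_overlap(a_start, a_end, b_start, b_end):
    """Inclusive interval overlap test."""
    return a_start <= b_end and b_start <= a_end


def range_is_auto_prohibited(prohibited, unit, region, start, end):
    """True when the candidate block touches any recorded bad range
    for the same unit and region."""
    return any(
        u == unit and r == region and ranges_overlap(start, end, s, e)
        for (u, r, s, e) in prohibited
    )


prohibited = {(1, "HR", 100, 104)}
print(range_is_auto_prohibited(prohibited, 1, "HR", 104, 106))  # → True
print(range_is_auto_prohibited(prohibited, 1, "HR", 200, 202))  # → False
```

A candidate merge that merely touches a bad range is refused, while clusters elsewhere in the same region keep coalescing.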

Tests (3 new ModbusCoalescingAutoRecoveryTests):
- First_Failure_Falls_Back_To_PerTag_Same_Scan — three tags around a
  protected register at 102: T100 + T104 surface Good values via the
  per-tag fallback in the SAME scan; T102 surfaces the exception.
- Second_Scan_Skips_Coalesced_Read_Of_Prohibited_Range — confirms scan 2
  doesn't re-attempt the failed merge (no FC03 with quantity > 1 at the
  prohibited start).
- Tags_Outside_Prohibited_Range_Still_Coalesce — separate cluster at HR
  200..202 keeps coalescing normally even after the 100..104 cluster is
  prohibited.

234/234 unit tests green.

Follow-ups intentionally NOT shipped (smaller, independent changes):
- Bisection-style range narrowing — currently the prohibition range is the
  full failed block; the planner doesn't try to find the exact protected
  register. Operator-visible diagnostic + prohibition stays correct.
- Periodic re-probe to clear stale prohibitions.
- Surface auto-prohibited ranges through GetHostStatuses or a new
  diagnostic so the Admin UI can show what's been auto-isolated.
2026-04-25 01:01:42 -04:00
Joseph Doherty
0b7653d3b2 Task #147 — wire ModbusOptionsEditor into DriversTab
Branches the DriversTab driver-add form on driver type:
- For DriverType=Modbus, render the typed <ModbusOptionsEditor> component
  shipped in #145 instead of the generic JSON textarea.
- For other driver types, the existing textarea stays (other drivers ship
  their own typed editors per decision #94).

On Save, when type is Modbus, the form serialises ModbusOptionsViewModel
into the JSON DTO shape ModbusDriverFactoryExtensions consumes (host /
port / unitId / family / keepAlive / reconnect / max*** / writeOnChangeOnly
/ etc.). Other types still pass the textarea contents verbatim.

Drive-by fix: the DriverType dropdown listed "ModbusTcp" but the actual
factory-registered name is "Modbus" — DriverInstanceBootstrapper would
silently skip a row created with the old label because the factory lookup
would miss. Renamed to match.

Tests (2 new in ModbusOptionsViewModelTests):
- DriversTab_Serialized_Defaults_RoundTrip_Through_Factory — unedited
  view-model serializes to a JSON the factory accepts; resulting
  ModbusDriverOptions matches the form defaults bit-for-bit.
- DriversTab_Serializes_Edited_Values_Correctly — flipping Host / Port /
  UnitId / Family / MaxReadGap / WriteOnChangeOnly in the view model
  surfaces in the constructed driver's options.

The serializer in the test mirrors DriversTab.razor's SerializeModbusOptions
helper. If the form's serialization shape drifts, both must be updated
together; that's the cost of testing through the JSON DTO without bUnit.

Follow-up still open: the per-tag editor (ModbusAddressEditor wiring into
EquipmentTab.razor + the bulk-import help-text update) — that's a separate
surface that touches the equipment-row CRUD flow; covered as a follow-up
when the equipment tag editor surface is next touched.
2026-04-25 00:58:03 -04:00
Joseph Doherty
dfd027ebca Task #146 — Modbus addressing: align type codes with Wonderware DASMBTCP + Ignition
Web verification (2026-04-25) against current vendor docs surfaced concrete
grammar conflicts in the v1 suffix grammar shipped in #137. Hard cutover
before the Admin UI rolls out widely so users don't paste `:I` from a
Wonderware spreadsheet and silently get wrong-typed reads.

Sources:
- Wonderware DASMBTCP user guide
  https://cdn.logic-control.com/media/DASMBTCP.pdf
- Ignition Modbus addressing (8.1)
  https://www.docs.inductiveautomation.com/docs/8.1/ignition-modules/opc-ua/opc-ua-drivers/modbus/modbus-addressing

Type-code changes:

| Code      | Pre-#146 | Post-#146  | Vendor reference                |
|-----------|----------|------------|---------------------------------|
| `:S`      | (n/a)    | Int16      | Wonderware DASMBTCP `S`         |
| `:US`     | (n/a)    | UInt16     | Ignition `HRUS`                 |
| `:I`      | Int16    | **Int32**  | Wonderware `I` + Ignition `HRI` |
| `:UI`     | UInt16   | **UInt32** | Ignition `HRUI`                 |
| `:I_64`   | (n/a)    | Int64      | Ignition `HRI_64`               |
| `:UI_64`  | (n/a)    | UInt64     | Ignition `HRUI_64`              |
| `:BCD_32` | (n/a)    | BCD32      | Ignition `HRBCD_32`             |

Codes REMOVED (no clear vendor precedent + conflict with the new mapping):
`:DI`, `:L`, `:UDI`, `:UL`, `:LI`, `:ULI`, `:LBCD`. Pre-#146 configs that
use them get an "Unknown type code" diagnostic at parse time so users get
a fast surface-level error rather than silent wrong-typed reads.

Codes UNCHANGED (already vendor-aligned): `:BOOL`, `:F`, `:D`, `:BCD`,
`:STR<n>`. Modicon 5/6-digit + mnemonic regions (HR/IR/C/DI) + bit suffix
`.N` are also unchanged.

Defaults:
- Coils / DiscreteInputs → `BOOL` (unchanged)
- HoldingRegisters / InputRegisters with no explicit type → Int16 (matches
  Ignition's bare `HR` default)

Byte-order mnemonics (`:ABCD` / `:CDAB` / `:BADC` / `:DCBA`) are kept but
documented as OtOpcUa-specific — they aren't in any major vendor's per-tag
address string. Ignition uses a `-R` suffix per prefix; Wonderware
configures word-order at the topic level.

Tests:
- 12 Type_Codes_Parse rows updated to assert the new mappings.
- New Removed_Aliases_Are_Rejected (×7) confirms each pre-#146 alias now
  fails fast with "Unknown type code".
- Worked_Example_Int16_Array uses the new `:S` code.
- New Worked_Example_Int32_Array_Via_I_Code documents the `:I = Int32`
  vendor-alignment intent so a future "fix" doesn't accidentally regress.
- Unknown_Type_Code_Rejected_With_Catalog updated to match the new error
  message ("Valid: BOOL, S, US, I, ...").

Docs:
- docs/v2/modbus-addressing.md — table replaced with the post-#146 codes,
  each row cites its Wonderware / Ignition reference. New "Codes removed
  in #146" subsection documents the cutover.
- docs/Driver.Modbus.Cli.md — example grammar list updated; explicit
  type-code reminder appended.

114 addressing tests + 231 driver tests still green. Solution build clean.
2026-04-25 00:51:50 -04:00
Joseph Doherty
858f300a61 Task #145 — Admin UI: expose new Modbus driver config
Two new Blazor components surface every Modbus knob added by #136-#144 so
users can configure the driver without hand-editing DriverConfig JSON.

ModbusAddressEditor.razor (live address-string parser preview):
- Bound to a string AddressString + a Family / MelsecSubFamily hint.
- On every input keystroke, runs ModbusAddressParser.TryParse and surfaces
  the resolved breakdown (Region, Offset, DataType, Bit, ByteOrder,
  ArrayCount, StringLength) inline as a green badge.
- On parse error, shows the parser's diagnostic in red.
- Re-uses the SAME parser the wire driver uses — grammar drift is
  impossible by construction.

ModbusOptionsEditor.razor (driver-instance options panel):
- Connection group (Host / Port / UnitId).
- Family group (#144) with conditional MelsecSubFamily dropdown.
- Keep-alive group (#139): Enabled / Time / Interval / RetryCount.
- Reconnect group (#139): InitialDelay / MaxDelay / BackoffMultiplier.
- Protocol group (#140): MaxRegistersPerRead / Write / Coils / ReadGap.
- Behaviour toggles (#140 + #141): UseFC15 / UseFC16 / WriteOnChangeOnly.
- Bound to ModbusOptionsViewModel — defaults match ModbusDriverOptions
  defaults so unedited rows produce the historical wire output verbatim.

Architecture:
- Admin project gains a ProjectReference to Driver.Modbus.Addressing
  (the shared parser assembly extracted in #136). Admin does NOT take a
  dep on Driver.Modbus itself — the addressing concerns are cleanly
  separated from the wire driver.
- Same-namespace shared assembly means components reference
  ModbusAddressParser / ModbusFamily / etc. without prefix gymnastics.

Tests:
- ModbusOptionsViewModelTests (1 test) — pins every default in the view
  model against the corresponding ModbusDriverOptions default. A
  regression that flips an unedited row to a non-default value gets
  caught here. (Test references both Admin and Driver.Modbus to make the
  cross-assembly comparison.)
- Live Blazor component testing requires bUnit, which isn't currently
  in the test setup; the parser logic the component wraps is fully
  covered by the 91 ModbusAddressParser tests in the addressing project,
  so the glue layer's behaviour is verifiable end-to-end already.

Caveat: the wiring into the existing DriverInstance edit page lives in
DriversTab.razor — that integration is left as a follow-up because it
touches the cluster-edit workflow specifically and the components in
this commit are framework-agnostic enough to drop in. The components
build clean against the existing Admin project; no behavioural change
to other tabs.
2026-04-25 00:26:43 -04:00
Joseph Doherty
366212417c Task #143 — Modbus block-read coalescing (with max-gap knob)
Adds a coalescing read planner that merges nearby tags into single FC03/FC04
PDUs, opt-in via ModbusDriverOptions.MaxReadGap. Default 0 = no coalescing
(every tag gets its own PDU — preserves pre-#143 wire output).

Worked example with MaxReadGap=10:
  T1 @ HR 100 (Int16, 1 reg)
  T2 @ HR 102 (Int16, 1 reg, gap 1 → joins block)
  T3 @ HR 110 (Float32, 2 regs, gap 7 → joins block)
  T4 @ HR 200 (Int16, 1 reg, gap 88 → splits, separate read)
  → 2 PDUs total: FC03 start=100 quantity=12 + FC03 start=200 quantity=1.

Planner:
- Eligible tags: known + register region (HR/IR) + scalar + not String /
  BitInRegister / array + not CoalesceProhibited.
- Groups by (UnitId, Region) — never coalesces across slaves or regions.
- Sorts by start address; merges when (next.start - last.end - 1) ≤ MaxReadGap
  AND the resulting span ≤ MaxRegistersPerRead. Otherwise opens a new block.
- Single-tag blocks are deferred to the per-tag path so WriteOnChange cache
  semantics stay correct without duplication.
- Per-block failure marks every member tag Bad and degrades health — same
  semantics the per-tag path has, but at the block granularity.
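
The merge rule can be sketched in a few lines of Python. This is an illustration only, not the driver's planner: tags are modelled as hypothetical `(start, register_count)` pairs already grouped by (UnitId, Region), eligibility filtering and the single-tag deferral are left out, and the cap of 125 is an assumed value (the Modbus spec ceiling for FC03/FC04), not a default confirmed by this commit.

```python
def plan_blocks(tags, max_read_gap, max_registers_per_read=125):
    """Merge sorted tags into (start, quantity) read blocks."""
    blocks = []  # each entry is [start, end], inclusive
    for start, count in sorted(tags):
        end = start + count - 1
        if blocks:
            last = blocks[-1]
            gap = start - last[1] - 1           # registers skipped between tags
            new_end = max(end, last[1])
            span = new_end - last[0] + 1        # size of the merged block
            if gap <= max_read_gap and span <= max_registers_per_read:
                last[1] = new_end
                continue
        blocks.append([start, end])
    return [(s, e - s + 1) for s, e in blocks]


# T1..T4 from the worked example: three Int16 tags and one Float32 (2 regs).
tags = [(100, 1), (102, 1), (110, 2), (200, 1)]
print(plan_blocks(tags, max_read_gap=10))
# → [(100, 12), (200, 1)]
```

With `max_read_gap=0` the same input yields four separate blocks, which corresponds to the per-tag wire output.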

Per-tag escape hatch ModbusTagDefinition.CoalesceProhibited (bool, default
false) — when true, the tag is read in isolation regardless of MaxReadGap.
For PLCs with protected register holes between adjacent tags.

Tests (7 new ModbusCoalescingTests):
- MaxReadGap=0 keeps the per-tag behavior (2 reads for 2 tags).
- MaxReadGap=2 merges 3 tags within 5 registers into 1 read of qty=5.
- MaxReadGap=10 splits T1+T2 from T3 when the gap exceeds the threshold.
- CoalesceProhibited tag reads alone even when neighbours are eligible.
- Coalescing never crosses UnitId boundaries (multi-slave gateway safety).
- MaxRegistersPerRead caps a would-be block; planner falls back to separate
  reads when the merged span would exceed the cap.
- Per-tag values surface independently after coalescing (slice-math sanity).

Existing 224 unit tests still green; total 231 pass with the new file (tests
are additive, no regressions).

Follow-up: auto-split-on-protected-hole isn't shipped — a coalesced read
that hits an Illegal Data Address right now marks every member Bad until
the operator sets CoalesceProhibited on the offending tag. Tracked
implicitly by #138's e2e drill against a pymodbus profile with a protected
hole mid-block.
2026-04-25 00:21:18 -04:00
Joseph Doherty
ad7d811f69 Task #142 — Modbus multi-unit-ID per TCP connection (gateway support)
Lifts the previous "one driver = one slave" assumption so a single Modbus
driver instance can front N RTU slaves behind one Ethernet gateway (Anybus,
ProSoft, Lantronix style). Each tag carries an optional UnitId that drives
the MBAP unit-id byte per-PDU, and the IPerCallHostResolver contract surfaces
per-slave host strings so per-PLC circuit breakers fire per-slave (matches
the AB CIP template documented in docs/v2/multi-host-dispatch.md).

Changes:

- ModbusTagDefinition gains optional UnitId (byte?). Null = use driver-level
  ModbusDriverOptions.UnitId (preserves single-slave deployments verbatim).
- ResolveUnitId(tag) helper computed once per ReadOneAsync / WriteOneAsync
  call; passed through ReadRegisterBlockAsync / ReadBitBlockAsync /
  ReadRegisterBlockChunkedAsync / ReadBitBlockChunkedAsync explicitly. The
  probe loop continues using driver-level UnitId (the probe is a
  connection-health check, not slave-specific).
- ModbusDriver implements IPerCallHostResolver. ResolveHost(fullReference)
  returns "host:port/unitN" — distinct strings per slave so the resilience
  pipeline keys breakers on the right granularity. Unknown references fall
  back to the bare HostName (single-slave behaviour).
- BitInRegister RMW path also threads the per-tag UnitId through both the
  read and write halves so a multi-slave deployment stays correct under bit-
  level writes.
- Factory DTO + JSON binding extended with the per-tag UnitId field.
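
A small Python sketch of the unit-id fallback and the per-slave host string, for illustration only. The `tags` dictionary (full reference to optional unit id) and both function names are hypothetical; the exact form of the "bare HostName" fallback is assumed here to be the host string alone.

```python
def resolve_unit_id(tag_unit_id, driver_unit_id):
    """Per-tag UnitId wins; None falls back to the driver-level UnitId."""
    return driver_unit_id if tag_unit_id is None else tag_unit_id


def resolve_host(host, port, driver_unit_id, tags, full_reference):
    """Return a distinct 'host:port/unitN' string per slave."""
    if full_reference not in tags:
        return host  # unknown reference: bare host name (assumed shape)
    unit = resolve_unit_id(tags[full_reference], driver_unit_id)
    return f"{host}:{port}/unit{unit}"


tags = {"Line1.Temp": 3, "Line1.Flow": 7, "Line1.Mode": None}
print(resolve_host("gw1", 502, 1, tags, "Line1.Temp"))  # → gw1:502/unit3
print(resolve_host("gw1", 502, 1, tags, "Line1.Mode"))  # → gw1:502/unit1
```

Because each slave produces a different key, a breaker keyed on this string can trip for one slave without affecting the others behind the same gateway.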

Tests (4 new ModbusMultiUnitTests):
- Per-tag UnitId routes to the correct slave in the MBAP header (driver-level
  UnitId=99 must NOT appear when both tags override).
- Tag without override falls back to driver-level UnitId.
- IPerCallHostResolver returns distinct "host:port/unitN" strings per slave.
- Unknown reference returns the bare HostName fallback.

Existing 220 unit tests + 107 addressing tests still green. Per-PLC breaker
isolation under simulated dead slaves is verifiable via the existing AB CIP
test infra; live coverage lands as an integration test in the #138 docs/e2e
refresh.
2026-04-25 00:16:41 -04:00
Joseph Doherty
4cf0b4eb73 Task #144 — Modbus family-native parser branch (DL205 / MELSEC)
Promotes DirectLogicAddress + MelsecAddress from "utility helpers an engineer
calls manually" to "first-class branch of ModbusAddressParser." Users can now
paste DL205-native (V2000, Y0, C100, X17, SP10) and MELSEC-native (D100, M50,
X20 hex/octal, Y0) addresses directly into TagConfig and the parser handles
the PLC-native → Modbus PDU translation.

Changes:

- Both helper files moved into the shared Driver.Modbus.Addressing assembly
  (same namespace, zero-churn for callers). Required because the parser
  needs to call them and the dependency direction is parser→helpers, not
  the other way.
- New ModbusFamily enum (Generic / DL205 / MELSEC) on
  ModbusDriverOptions.Family. Generic preserves pre-#144 behaviour exactly.
- ModbusDriverOptions.MelsecSubFamily picks the X/Y notation (Q_L_iQR hex
  vs F_iQF octal). Default Q_L_iQR.
- ModbusAddressParser.Parse now takes optional family + sub-family hints.
  When non-Generic, family-native parsing runs FIRST; on miss falls back to
  Modicon / mnemonic. Cross-family ambiguity (C100 = Modicon coil under
  Generic, DL205 control relay under DL205) is unambiguous within one
  driver instance.
- Suffix grammar composes with native addresses: V2000:F:CDAB:5 parses
  end-to-end as DL205 V-memory at PDU 1024 + Float32 + word-swap + array of 5.
- Bit suffix composes too: V2000.7 parses as bit 7 of HR[1024].
- Factory DTO fields Family / MelsecSubFamily flow through to BuildTag so
  the JSON binding can drive everything per-driver.
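
To make the V2000 → PDU 1024 translation concrete, here is a hedged Python sketch. It assumes only what the example above implies: the digits after `V` are octal, `:` separates suffix fields, and `.N` selects a bit. Both function names are hypothetical, and the real ModbusAddressParser grammar covers far more than this.

```python
def dl205_v_to_pdu(address):
    """Translate a DL205 V-memory address such as 'V2000' to a PDU offset."""
    if not address.upper().startswith("V"):
        raise ValueError(f"not a V-memory address: {address!r}")
    return int(address[1:], 8)  # V-memory numbers are octal


def split_native_address(text):
    """Split 'V2000:F:CDAB:5' or 'V2000.7' into (base, bit, suffixes)."""
    head, *suffixes = text.split(":")
    base, _, bit = head.partition(".")
    return base, (int(bit) if bit else None), suffixes


base, bit, suffixes = split_native_address("V2000:F:CDAB:5")
print(dl205_v_to_pdu(base), bit, suffixes)
# → 1024 None ['F', 'CDAB', '5']
```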

Tests: 16 new ModbusFamilyParserTests covering DL205 V/Y/C/X/SP, MELSEC
D/M/X/Y, sub-family hex-vs-octal disambiguation, cross-family C100 ambiguity,
fallback to Modicon when native misses, and grammar composition with bit/
byte-order/array modifiers. Existing 91 parser tests still green; 220 driver
tests still green.

Caveat: bank-base offsets for MELSEC X/Y/M default to 0 in the grammar
string. Sites with non-zero "Modbus Device Assignment Parameter" bases must
use the structured tag form to override — addressed in the docs refresh
(#138).
2026-04-25 00:10:43 -04:00
Joseph Doherty
4bffe879c5 Task #141 — Modbus subscribe-side knobs (deadband + write-on-change)
Two driver-side filters that ≥5 of 6 surveyed vendors expose:

1. Per-tag Deadband (double?, on ModbusTagDefinition) — when set, the
   PollGroupEngine onChange callback suppresses publishes whose distance
   from the last-published value is below the threshold. Reduces wire
   traffic to OPC UA clients on noisy analog signals (flow meters,
   temperatures). Numeric scalar types only — Bool / BitInRegister / String
   / array tags publish unconditionally.

2. WriteOnChangeOnly (bool, on ModbusDriverOptions) — when true, the driver
   short-circuits writes whose value matches the most recent successful
   write to that tag. Saves PLC bandwidth on clients that re-publish the
   same setpoint every scan. Cache invalidates on any read that returns a
   different value, so HMI-side changes don't get masked.

Both default off so existing deployments see no behaviour change.

Implementation:
- ShouldPublish guard wraps the existing OnDataChange invocation. First sample
  always passes through (no baseline); subsequent samples compare via
  Convert.ToDouble for the cross-numeric-type math.
- IsRedundantWrite check at the top of WriteAsync; on success the cache is
  populated. Object.Equals handles boxed-numeric equality; arrays are
  excluded (reference-equality would never match anyway).
- ReadAsync invalidates the WriteOnChangeOnly cache when the new value
  differs from the cached last-written value.
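
The deadband guard can be sketched as follows. This is an illustration in
Python, not the driver's ShouldPublish code: the class and helper names are
hypothetical, and it assumes a change equal to the threshold is published
(the message only says changes below the threshold are suppressed).

```python
from typing import Iterable, List, Optional


class DeadbandFilter:
    """Suppress samples that stay within `deadband` of the last published value."""

    def __init__(self, deadband: Optional[float]) -> None:
        self.deadband = deadband
        self.last_published: Optional[float] = None

    def should_publish(self, value: float) -> bool:
        if self.last_published is None or self.deadband is None:
            passes = True  # first sample has no baseline; no deadband means no filter
        else:
            passes = abs(float(value) - self.last_published) >= self.deadband
        if passes:
            # The baseline moves only when a value is actually published.
            self.last_published = float(value)
        return passes


def published(samples: Iterable[float], deadband: Optional[float]) -> List[float]:
    """Return the subset of samples that would reach subscribers."""
    guard = DeadbandFilter(deadband)
    return [s for s in samples if guard.should_publish(s)]
```

Because the baseline is the last published value rather than the last
sample, a slow drift still publishes once it accumulates past the threshold.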

Tests (5 new ModbusSubscribeOptionsTests):
- Deadband suppresses sub-threshold changes (100 → 102 → 106 → 107 with
  deadband=5 publishes 100 and 106 only).
- Deadband=null still publishes every change.
- WriteOnChangeOnly suppresses 3 identical 42 writes (only first hits wire).
- WriteOnChangeOnly default false hits the wire every time.
- Read-divergence cache invalidation: external panel write to 99, our
  client's re-write of 42 must NOT be suppressed.
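
A sketch of the write-on-change cache exercised by these tests, in Python
for brevity. Names are hypothetical and the structure is an assumption
based on the description above; it is not the driver's IsRedundantWrite
implementation.

```python
from typing import Any, Dict


class WriteOnChangeCache:
    """Remember the last successful write per tag and flag exact repeats."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self._last_written: Dict[str, Any] = {}

    def is_redundant_write(self, tag: str, value: Any) -> bool:
        # Arrays are never treated as redundant, mirroring the exclusion above.
        if not self.enabled or isinstance(value, (list, tuple)):
            return False
        return tag in self._last_written and self._last_written[tag] == value

    def record_successful_write(self, tag: str, value: Any) -> None:
        if not isinstance(value, (list, tuple)):
            self._last_written[tag] = value

    def observe_read(self, tag: str, value: Any) -> None:
        # A read that disagrees with the cache means something else wrote the tag.
        if tag in self._last_written and self._last_written[tag] != value:
            del self._last_written[tag]


def wire_writes(cache: WriteOnChangeCache, tag: str, values) -> int:
    """Count how many of `values` would actually be sent to the device."""
    sent = 0
    for value in values:
        if cache.is_redundant_write(tag, value):
            continue
        sent += 1
        cache.record_successful_write(tag, value)
    return sent
```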

220/220 unit tests green; existing ProtocolOptions tests hardened against
probe-loop noise by disabling the probe in their fixtures.
2026-04-25 00:05:25 -04:00
Joseph Doherty
55f4044a69 Task #140 — Modbus protocol-behavior knobs
Adds ModbusDriverOptions knobs that ≥4 of 6 surveyed vendors expose:

1. MaxCoilsPerRead (ushort, default 2000) — separate from MaxRegistersPerRead
   because coil packing (1 bit per coil) and register packing (16 bits each)
   have different spec ceilings. Coil-array reads above the cap auto-chunk
   the same way register reads have always done. New ReadBitBlockChunkedAsync
   re-assembles per-chunk LSB-first bitmaps into one logical bitmap.

2. UseFC15ForSingleCoilWrites (default false) — forces FC15 (Write Multiple
   Coils with quantity=1) for single-coil writes instead of the default FC05
   (Write Single Coil). Safety / audit PLCs that only accept the multi-write
   codes need this.

3. UseFC16ForSingleRegisterWrites (default false) — same idea for FC16 vs
   FC06 on single holding-register writes.

4. DisableFC23 (default false) — placeholder no-op for the future block-read
   coalescing (#143) work that may opt into FC23 (Read/Write Multiple
   Registers). Lets deployments pre-disable FC23 for PLCs that won't accept
   it, before we ship the optimisation that emits it.

Defaults preserve the historical wire output bit-for-bit (FC05/FC06 for
singles, no chunking under 2000 coils, no FC23). Factory DTO + JSON-binding
extended with parallel fields.

6 new ModbusProtocolOptionsTests covering: defaults, FC05→FC15 forcing,
FC06→FC16 forcing, MaxCoilsPerRead chunking math (2500 coils / 2000 cap →
2 reads of 2000 + 500). Existing 209 unit tests still green.
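
The chunking math and bitmap re-assembly can be sketched like this. It is
a Python illustration with hypothetical function names, assuming each chunk
arrives as an LSB-first packed bitmap as described for
ReadBitBlockChunkedAsync; it is not that method's code.

```python
from typing import List, Tuple


def chunk_ranges(start: int, count: int, cap: int) -> List[Tuple[int, int]]:
    """Split one logical read into (start, quantity) requests of at most `cap`."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    ranges = []
    offset = 0
    while offset < count:
        quantity = min(cap, count - offset)
        ranges.append((start + offset, quantity))
        offset += quantity
    return ranges


def merge_bitmaps(chunks: List[Tuple[bytes, int]]) -> Tuple[bytes, int]:
    """Join per-chunk LSB-first bitmaps into one LSB-first bitmap.

    Each chunk is (packed bytes, number of valid bits); padding bits in a
    chunk's final byte are dropped before the next chunk is appended.
    """
    bits = []
    for data, bit_count in chunks:
        for i in range(bit_count):
            bits.append((data[i // 8] >> (i % 8)) & 1)
    merged = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            merged[i // 8] |= 1 << (i % 8)
    return bytes(merged), len(bits)
```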
2026-04-24 23:59:04 -04:00
Joseph Doherty
6cf20131fe Task #139 — Modbus connection-layer config knobs (keep-alive / idle / reconnect)
Promotes the previously hardcoded transport-layer settings to ModbusDriverOptions
so users can tune them through DriverConfig JSON without recompiling.

Three new option groups:

1. KeepAlive (ModbusKeepAliveOptions): Enabled / Time / Interval / RetryCount.
   Defaults preserve the historical PR 53 wire output exactly (Enabled=true,
   Time=30s, Interval=10s, RetryCount=3). Set Enabled=false for PLCs that
   reject SO_KEEPALIVE.

2. IdleDisconnectTimeout (TimeSpan?): when set, the transport tracks last-PDU-
   success and proactively closes + reconnects on the next request after the
   threshold. Defends against silent NAT / firewall socket reaping. Default
   null = disabled (no behaviour change).

3. Reconnect (ModbusReconnectOptions): InitialDelay / MaxDelay /
   BackoffMultiplier for the post-drop reconnect loop. Defaults
   (InitialDelay=0, MaxDelay=30s, Multiplier=2.0) preserve the historical
   immediate-retry behaviour for the first attempt and add geometric backoff
   only if the reconnect itself fails. Capped at 10 attempts before the
   failure propagates.
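
A sketch of the idle-disconnect check from item 2, in Python for brevity.
The class name, the injectable clock, and the choice of "elapsed >= timeout"
as the trigger are assumptions for illustration; the transport's actual
bookkeeping may differ.

```python
import time
from typing import Callable, Optional


class IdleDisconnectTracker:
    """Decide whether the socket has been idle long enough to be recycled."""

    def __init__(
        self,
        timeout_s: Optional[float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_success: Optional[float] = None

    def mark_success(self) -> None:
        """Call after every PDU exchange that completed successfully."""
        self._last_success = self._clock()

    def should_reconnect(self) -> bool:
        """Call before the next request; True means close and reconnect first."""
        if self._timeout_s is None or self._last_success is None:
            return False  # feature disabled, or nothing has succeeded yet
        return self._clock() - self._last_success >= self._timeout_s
```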

ModbusTcpTransport ctor extended with optional keepAlive / idleDisconnect /
reconnect parameters; existing 4-arg call sites continue to compile. Factory
DTO gains parallel KeepAlive / IdleDisconnectMs / Reconnect fields with
default-aware binding.

5 new ModbusConnectionOptionsTests covering the default-preservation contract
(every default field matches pre-#139) and the JSON-binding round-trip for
each knob group. Existing 204 unit tests still green.
2026-04-24 23:53:26 -04:00
Joseph Doherty
850b816873 Task #137 — Modbus per-tag suffix grammar (type / bit / byte-order / array)
Adds the full Wonderware/Kepware/Ignition-style address suffix grammar so
users paste tag spreadsheets without per-tag manual translation:

  <region><offset>[.<bit>][:<type>[<len>]][:<order>][:<count>]

Examples that now parse end-to-end:
  40001                          HoldingRegisters[0], Int16
  400001                         same, 6-digit form
  40001.5                        bit 5 of HR[0]
  40001:F                        Float32 (HR[0..1])
  40001:F:CDAB                   word-swapped Float32
  40001:STR20                    20-char ASCII string
  HR1:DI                         Int32 via mnemonic region
  C100                           Coils[99] (mnemonic)
  40001:F:5                      Float32[5] array (3-field shorthand)
  40001:I:CDAB:10                Int16[10] word-swapped (4-field strict)
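
A rough sketch of how the suffix fields in these examples can be split, in
Python for brevity. It only separates the fields; it does not resolve the
region or validate type codes, and the names (Suffixes, split_suffixes) are
hypothetical rather than the ModbusAddressParser API.

```python
import re
from dataclasses import dataclass
from typing import Optional

ORDERS = {"ABCD", "CDAB", "BADC", "DCBA"}
_ADDRESS = re.compile(r"^(?P<addr>[A-Za-z]*\d+)(?:\.(?P<bit>\d+))?$")
_TYPE = re.compile(r"^(?P<code>[A-Za-z]+)(?P<len>\d+)?$")


@dataclass
class Suffixes:
    address: str
    bit: Optional[int] = None
    type_code: Optional[str] = None
    type_len: Optional[int] = None
    order: Optional[str] = None
    count: Optional[int] = None


def split_suffixes(text: str) -> Suffixes:
    fields = text.strip().split(":")
    head = _ADDRESS.match(fields[0])
    if head is None:
        raise ValueError(f"bad address part: {fields[0]!r}")
    out = Suffixes(head["addr"], int(head["bit"]) if head["bit"] else None)
    rest = fields[1:]
    if len(rest) > 3:
        raise ValueError("too many ':' fields")
    if rest:
        kind = _TYPE.match(rest[0])
        if kind is None:
            raise ValueError(f"bad type field: {rest[0]!r}")
        out.type_code = kind["code"].upper()
        out.type_len = int(kind["len"]) if kind["len"] else None
    for field in rest[1:]:
        # A byte-order mnemonic may only appear before the count.
        if field.upper() in ORDERS and out.order is None and out.count is None:
            out.order = field.upper()
        elif field.isdigit() and out.count is None:
            out.count = int(field)  # 3-field shorthand lands here directly
        else:
            raise ValueError(f"unexpected field: {field!r}")
    return out
```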

Driver-side plumbing:
- ModbusAddressParser + ParsedModbusAddress in the shared Addressing
  assembly. 91 parser tests (every grammar variant + malformed shapes).
- ModbusDataType / ModbusByteOrder moved to shared (with the same namespace
  so callers compile unchanged). ModbusByteOrder gains ByteSwap (BADC) and
  FullReverse (DCBA) alongside the existing BigEndian (ABCD) and WordSwap
  (CDAB).
- NormalizeWordOrder extended to honor all four orders for both 4-byte and
  8-byte values. Old WordSwap behavior preserved bit-for-bit.
- ModbusTagDefinition gains optional ArrayCount.
- ReadOneAsync / WriteOneAsync handle array fan-out: one FC03/04 read covers
  N consecutive register-typed elements, decoded into a typed array (short[],
  float[], etc.). Coil arrays use FC01 reads + FC15 writes (FakeTransport
  in tests gains FC15 support to match).
- DriverAttributeInfo IsArray / ArrayDim flow from ArrayCount so the OPC UA
  address space surfaces ValueRank=1 + ArrayDimensions to clients.
- ModbusDriverFactoryExtensions gains AddressString DTO field. When
  present, the parser drives Region/Address/DataType/ByteOrder/Bit/
  StringLength/ArrayCount; structured fields (Writable, WriteIdempotent,
  StringByteOrder) still come from the DTO. Existing structured tag rows
  keep working unchanged.
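
The four byte orders can be illustrated for 4-byte values as below. This is
a Python sketch with hypothetical names, not NormalizeWordOrder itself, and
it deliberately leaves out the 8-byte case, whose exact permutation is not
spelled out above.

```python
import struct

# Index of the wire byte that supplies each position of the big-endian value.
_PERMUTATIONS = {
    "ABCD": (0, 1, 2, 3),  # BigEndian: already in order
    "CDAB": (2, 3, 0, 1),  # WordSwap: swap the two 16-bit words
    "BADC": (1, 0, 3, 2),  # ByteSwap: swap bytes inside each word
    "DCBA": (3, 2, 1, 0),  # FullReverse: little-endian
}


def normalize_4(wire: bytes, order: str) -> bytes:
    """Reorder 4 wire bytes into big-endian (ABCD) order."""
    if len(wire) != 4:
        raise ValueError("expected exactly 4 bytes")
    return bytes(wire[i] for i in _PERMUTATIONS[order])


def decode_float32(wire: bytes, order: str) -> float:
    return struct.unpack(">f", normalize_4(wire, order))[0]


def encode_float32(value: float, order: str) -> bytes:
    # Every permutation above is its own inverse, so the same table encodes.
    return normalize_4(struct.pack(">f", value), order)
```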

Tests: 91 parser unit tests (Driver.Modbus.Addressing.Tests, all green) +
204 driver tests including new ModbusByteOrderTests (BADC/DCBA roundtrips
across Int32/Float32/Float64) and ModbusArrayTests (Int16[5], Float32[3]
CDAB, Coil[10], length-mismatch error, IsArray/ArrayDim discovery).
Solution-wide build clean.

Caveat: grammar names (type codes, byte-order mnemonics, the :count
shorthand) were synthesized from training-era vendor docs. Verify against
current Kepware Modbus Ethernet Driver Help and Ignition Modbus Addressing
manuals before freezing for production deployments — naming may need a
back-compat layer if vendor wording has shifted.
2026-04-24 23:49:22 -04:00
Joseph Doherty
501d8f494b Task #136 — Modicon address-string parser (5/6-digit) + shared addressing assembly
Foundation for the Modbus addressing-grammar work tracked in #137-#145. Adds
ModbusModiconAddress.Parse / TryParse that turns classic Modicon strings
(40001 / 400001 / 30001 / 00001 / 10001) into (Region, ushort PduOffset).

Also extracts ModbusRegion to a new Driver.Modbus.Addressing assembly so the
Admin UI (#145) can reference the addressing surface without taking a dep on
the wire driver. The new assembly intentionally extends the same
ZB.MOM.WW.OtOpcUa.Driver.Modbus namespace as the driver — callers see the
type as if it lived in one place; only the project layout changes. No
existing call site needed editing (zero-churn move).

Behaviour:
- Single leading digit selects region (0=Coils, 1=DiscreteInputs,
  3=InputRegisters, 4=HoldingRegisters).
- 5-digit form: trailing 4 digits are 1-based register, supports 1..9999.
- 6-digit form: trailing 5 digits are 1-based register, supports 1..65536
  (full PDU address space).
- Strict 5-or-6 length check; whitespace trimmed; clear FormatException
  diagnostics for every malformed shape (wrong length, non-digit body,
  illegal leading digit, register zero, register overflow).
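
The rules above can be sketched as follows. This is a Python illustration,
not ModbusModiconAddress.Parse: the function name and region strings are
stand-ins, and ValueError takes the place of FormatException.

```python
from typing import Tuple

REGIONS = {
    "0": "Coils",
    "1": "DiscreteInputs",
    "3": "InputRegisters",
    "4": "HoldingRegisters",
}


def parse_modicon(text: str) -> Tuple[str, int]:
    """Turn a 5- or 6-digit Modicon string into (region, zero-based offset)."""
    s = text.strip()
    if len(s) not in (5, 6):
        raise ValueError(f"expected 5 or 6 digits, got {len(s)}")
    if not (s.isascii() and s.isdigit()):
        raise ValueError("address must contain digits only")
    region = REGIONS.get(s[0])
    if region is None:
        raise ValueError(f"illegal leading digit {s[0]!r}")
    register = int(s[1:])  # 1-based register number
    if register == 0:
        raise ValueError("register numbers start at 1")
    if register > 65536:
        raise ValueError("register exceeds the 16-bit PDU address space")
    return region, register - 1
```

The 5-digit form needs no explicit upper bound because four digits cannot
exceed 9999.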

29/29 new unit tests pass. Full Driver.Modbus suite (182 tests) and the
solution-wide build still green after the ModbusRegion move.
2026-04-24 23:34:18 -04:00
Joseph Doherty
75c07149d4 Task #124 — Phase 6.2 multi-user authz interop matrix + close LdapGroups gap
The Phase 6.2 evaluator was wired but received no input in production:
RoleBasedIdentity (the IUserIdentity our LDAP path produces) implemented
IRoleBearer but not ILdapGroupsBearer, so AuthorizationGate.BuildSessionState
always returned null and the gate lax-mode-allowed every request. UserAuthResult
also never carried the resolved LDAP groups, only the role-mapped strings.

Closing the gap so the evaluator gets real data:

- UserAuthResult adds Groups alongside Roles. LdapUserAuthenticator now
  surfaces the raw RDN values (ReadOnly / WriteOperate / ...) it already
  collected during the directory query. Roles stay separate per decision #150
  (control-plane Admin role mapping vs data-plane NodeAcl key).
- RoleBasedIdentity implements ILdapGroupsBearer so AuthorizationGate sees
  the groups via the same seam unit tests already use.

ThreeUserInteropMatrixTests drives the closure end-to-end against the live
GLAuth dev directory:

- 5 distinct group memberships (readonly / writeop / writetune /
  writeconfig / alarmack) plus the multi-group admin user
- Each is bound through the real LdapUserAuthenticator
- Resolved groups feed an LdapBoundIdentity that goes through the strict-mode
  AuthorizationGate against a seeded TriePermissionEvaluator
- 31 InlineData rows assert the role × operation matrix; failures pinpoint
  the exact (user, op) cell

The remaining wire-level leg of #124 — a real OPC UA client driving UserName
tokens through an encrypted endpoint policy — still needs a deployment knob
and stays a manual cross-vendor smoke (#119 / #124 manual scope). The doc
audit note in admin-ui-phase-6-status.md is updated to reflect what's now
automated vs what stays manual.

33/33 new tests pass against live GLAuth; existing 270 non-LiveLdap tests
in Server.Tests still pass; Core.Tests 205/205, Admin.Tests 109/109. The 7
integration-test failures observed during this run pre-exist this commit
(NodeId-scheme regression from #134) and are tracked separately as #135.
2026-04-24 20:40:07 -04:00
Joseph Doherty
bd6568bcbd Phase 6.1 Stream B.4 — wire ScheduledRecycleHostedService into bootstrap
Task #125 / #137. The hosted service + scheduler classes already shipped;
this commit connects them to the published-generation driver list so a
Tier C driver with `RecycleIntervalSeconds` in its `ResilienceConfig`
actually gets an armed scheduler at bootstrap.

Wiring:

- `DriverFactoryRegistry.Register` gains an optional `DriverTier`
  parameter (default Tier.A). Existing call sites unchanged —
  `GalaxyProxyDriverFactoryExtensions.Register` explicitly passes
  Tier.C so the bootstrapper can identify out-of-process drivers
  without a per-driver-type allow-list.
- `DriverResilienceOptions` + parser grow `RecycleIntervalSeconds`.
  Tier A/B values are rejected with a diagnostic (decision #74 —
  recycling an in-process driver would kill every OPC UA session).
  Non-positive values are rejected the same way.
- `DriverInstanceBootstrapper` auto-arms a `ScheduledRecycleScheduler`
  after a successful driver register when: (1) the registered tier is
  C, (2) the row's ResilienceConfig carries a positive recycle interval,
  (3) DI has an `IDriverSupervisor` keyed by that `DriverInstanceId`.
  Missing supervisor → warn + skip (no crash). That keeps the wiring
  harmless by default: no driver ships a supervisor today, so the
  hosted service runs with zero schedulers out of the box.
- `Program.cs` registers `ScheduledRecycleHostedService` as singleton
  (shared with `DriverInstanceBootstrapper`) + hosted service (drives
  the tick loop). Constructor changes on the bootstrapper ripple into
  DI resolution automatically.
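
The validation and arming conditions reduce to a small predicate. The
sketch below is Python with hypothetical names and string tiers; it mirrors
the rules listed above rather than the DriverInstanceBootstrapper code.

```python
from typing import Optional


def validate_recycle_interval(tier: str, seconds: Optional[int]) -> Optional[int]:
    """Reject recycle intervals that the rules above do not allow."""
    if seconds is None:
        return None  # not configured: nothing to validate
    if tier != "C":
        raise ValueError("RecycleIntervalSeconds is only valid for Tier C drivers")
    if seconds <= 0:
        raise ValueError("RecycleIntervalSeconds must be positive")
    return seconds


def should_arm_scheduler(tier: str, seconds: Optional[int], has_supervisor: bool) -> bool:
    """All three conditions must hold; a missing supervisor just skips arming."""
    return tier == "C" and seconds is not None and seconds > 0 and has_supervisor
```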

Tests: 4 new parser tests covering RecycleIntervalSeconds on Tier C
happy path, null default, Tier A/B rejection, non-positive rejection.
Existing 283 Server.Tests + 200 Core.Tests all still green.

No behavioural change for existing deployments: Galaxy driver + any
future Tier C driver gain the opt-in automatically; Tier A/B drivers
(FOCAS, Modbus, S7, AB CIP, AB Legacy, TwinCAT) are structurally
excluded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:58:13 -04:00
Joseph Doherty
69e1d320ac Cold-start guard for script engines — skip evaluation with empty upstream
Both VirtualTagEngine and ScriptedAlarmEngine share a pattern: the
BuildReadCache helper iterates the script's declared input set, reading
from _valueCache with a fallback to _upstream.ReadTag. When an upstream
tag hasn't yet delivered its first subscription push, ReadTag returns a
DataValueSnapshot with a null Value and BadNotConnected quality. User
scripts then cast `(double)ctx.GetTag(path).Value` unconditionally and
throw NullReferenceException — once per evaluation tick until the cache
fills, spamming the log with identical stack traces. The existing catch
block recovered (kept the prior state) but didn't silence the churn.

Add AreInputsReady(cache) to both engines: return true only when every
entry has a non-null Value and a non-Bad StatusCode (Good + Uncertain
are both considered ready). Skip script evaluation when the check
returns false — the engine holds the prior state (alarm) or the prior
snapshot (virtual tag) until upstream delivers. Eliminates the cold-
start NRE spam at root without changing the script-engine contract.
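
The readiness check can be sketched as below, in Python for brevity. It
assumes the cache maps a tag path to a (value, status code) pair and uses
the OPC UA convention that the top two bits of a status code carry the
severity (00 Good, 01 Uncertain, 10 Bad); the names are hypothetical.

```python
from typing import Any, Dict, Tuple

GOOD = 0x00000000
UNCERTAIN = 0x40000000   # generic Uncertain code, for illustration
BAD = 0x80000000         # generic Bad code, for illustration


def is_bad(status_code: int) -> bool:
    """True when the severity bits (31..30) read 0b10."""
    return (status_code >> 30) & 0x3 == 0x2


def are_inputs_ready(cache: Dict[str, Tuple[Any, int]]) -> bool:
    """Ready only when every input has a value and a non-Bad status."""
    return all(
        value is not None and not is_bad(status)
        for value, status in cache.values()
    )
```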

Also: fix $changeLines.Count in test-galaxy.ps1 — PowerShell's
Set-StrictMode -Version 3.0 errors on .Count when Where-Object returns
0 or 1 items. Wrap in `@(...)` to force an array; same pattern the
sibling _common.ps1 already uses in Write-Summary for the same reason.

Task #112 — the Galaxy live E2E now passes 3/7 stages (probe + source
read + reverse-bridge-ACL). The remaining 4 stages (virtual-tag,
subscribe-sees-change, alarm-fires, history-read) are deployment-
specific: MoveInBatchID is idle in this Galaxy + its AccessLevel blocks
writes + it's not historized. Cold-start behaviour is now correct, so
once the seed points at a live attribute those stages should light up.

Tests: 36/36 VirtualTags.Tests + 47/47 ScriptedAlarms.Tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 17:43:48 -04:00
Joseph Doherty
8be82e02c2 Path-based NodeIds — decouple client contract from driver address
The pre-refactor design minted OPC UA NodeIds directly from the driver's
FullReference (the native-address string). That had three long-term
problems:

1. OPC UA Part 3 §5.2.2 requires NodeIds to be immutable across a node's
   lifetime. A rename of the underlying device address — Galaxy attribute,
   S7 tag, Modbus register alias — changed the NodeId and broke every
   client that had pinned the previous identifier.
2. Two drivers with coincidentally-matching native addresses (e.g. `temp`
   in Modbus and `temp` in S7 under different Equipment rows) collided on
   the NodeId identifier.
3. TagConfig was being placed verbatim on the wire; for drivers whose
   TagConfig is JSON (every driver shipped today, per the
   CK_Tag_TagConfig_IsJson check constraint), clients saw the raw JSON
   blob as the NodeId string.

Refactor:

* DriverNodeManager.Variable now mints a stable path-based NodeId
  `{driverId}/{folder-path}/{browseName}` and records the driver-side
  FullReference in a new _fullRefByNodeId map. OnReadValue / OnWriteValue
  / ResolveFullRef look the FullReference up via that map instead of
  casting NodeId.Identifier. The old cast path is preserved as a
  fallback so any test fixture that still registers variables with
  FullRef-shaped NodeIds keeps working.

* EquipmentNodeWalker.AddTagVariable now extracts the cross-driver
  `FullName` field from Tag.TagConfig before handing the address to
  DriverAttributeInfo. Every shipped driver stores the wire reference in
  TagConfig[FullName]; falling back to the raw string covers any future
  driver that wants an opaque non-JSON address. ExtractFullName is
  exposed internal for unit coverage.

* scripts/e2e/test-galaxy.ps1 defaults updated to the new path-based
  NodeIds. Verified live against p7-smoke-galaxy on the dev box:
  `ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/Source` reads
  return Status=0x00000000 with a real Galaxy byte-array value.
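
A sketch of the two pieces described above: minting a path-based identifier
while remembering the driver-side reference, and pulling FullName out of a
JSON TagConfig with a raw-string fallback. Python is used for brevity; the
function names and the sample FullName value are hypothetical.

```python
import json
from typing import Dict, Sequence

# Path-based identifier -> driver-side FullReference.
full_ref_by_node_id: Dict[str, str] = {}


def mint_node_id(driver_id: str, folder_path: Sequence[str], browse_name: str) -> str:
    """Build '{driverId}/{folder-path}/{browseName}' from its parts."""
    return "/".join([driver_id, *folder_path, browse_name])


def extract_full_name(tag_config: str) -> str:
    """Return TagConfig['FullName'] when present, else the raw string."""
    try:
        doc = json.loads(tag_config)
    except (ValueError, TypeError):
        return tag_config  # opaque, non-JSON address
    if isinstance(doc, dict) and isinstance(doc.get("FullName"), str):
        return doc["FullName"]
    return tag_config


def register_variable(driver_id, folder_path, browse_name, tag_config) -> str:
    node_id = mint_node_id(driver_id, folder_path, browse_name)
    full_ref_by_node_id[node_id] = extract_full_name(tag_config)
    return node_id
```

Reads and writes then look the driver address up by node id, so renaming
the device address changes only the map value, not the identifier clients
have pinned.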

Test suite: 195/195 Core.Tests + 283/283 Server.Tests green. Five new
ExtractFullName / FullName-passthrough tests added.

Task #112 GA-3 — golden-path read verified end-to-end; remaining E2E
script stages still blocked on pre-existing issues (ScriptedAlarm
predicate NRE on empty upstream cache, PowerShell $changeLines.Count
guard), tracked separately.
Task #134 — complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:57:20 -04:00
Joseph Doherty
d11dd0520b Galaxy IPC unblock — live dev-box E2E path
Four root-cause fixes to get an elevated dev-box shell past session open
through to real MXAccess reads:

1. PipeAcl — drop BUILTIN\Administrators deny ACE. UAC's filtered token
   carries the Admins SID as deny-only, so the deny fired even from
   non-elevated admin-account shells. The per-connection SID check in
   PipeServer.VerifyCaller remains the real authorization boundary.

2. PipeServer — swap the Hello-read / VerifyCaller order. ImpersonateNamedPipeClient
   returns ERROR_CANNOT_IMPERSONATE until at least one frame has been read
   from the pipe; reading Hello first satisfies that rule. Previously the
   ACL deny-first path masked this race — removing the deny ACE exposed it.

3. GalaxyIpcClient — add a background reader + single pending-response
   slot. A RuntimeStatusChange event between OpenSessionRequest and
   OpenSessionResponse used to satisfy the caller's single ReadFrameAsync
   and fail CallAsync with "Expected OpenSessionResponse, got
   RuntimeStatusChange". The reader now routes response kinds (and
   ErrorResponse) to the pending TCS and everything else to a handler the
   driver registers in InitializeAsync. The Proxy was already set up to
   raise managed events from RaiseDataChange / RaiseAlarmEvent /
   OnHostConnectivityUpdate — those helpers had no caller until now.

4. RedundancyPublisherHostedService — swallow BadServerHalted while
   polling host.Server.CurrentInstance. StandardServer throws that code
   during startup rather than returning null, so the first poll attempt
   crashed the BackgroundService (and the host) before OnServerStarted
   ran. This race was latent behind the Galaxy init failure above.
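
The routing idea in item 3 can be sketched with a single pending slot. This
is a Python/asyncio illustration with hypothetical names; the set of
response kinds is a stand-in, and sending an unsolicited response to the
event handler is an assumption of this sketch.

```python
import asyncio
from typing import Callable, Optional

# Stand-in for the frame kinds that answer a request.
RESPONSE_KINDS = {"OpenSessionResponse", "ErrorResponse"}


class FrameRouter:
    """Route response frames to the waiting caller and the rest to a handler."""

    def __init__(self, on_event: Callable[[str, bytes], None]) -> None:
        self._on_event = on_event
        self._pending: Optional[asyncio.Future] = None

    def expect_response(self) -> asyncio.Future:
        if self._pending is not None and not self._pending.done():
            raise RuntimeError("only one request may be in flight")
        self._pending = asyncio.get_running_loop().create_future()
        return self._pending

    def route(self, kind: str, body: bytes) -> None:
        """Called by the background reader for every frame it receives."""
        waiting = self._pending is not None and not self._pending.done()
        if kind in RESPONSE_KINDS and waiting:
            self._pending.set_result((kind, body))
        else:
            self._on_event(kind, body)


async def demo():
    events = []
    router = FrameRouter(lambda kind, body: events.append(kind))
    pending = router.expect_response()
    router.route("RuntimeStatusChange", b"")    # event arrives mid-call
    router.route("OpenSessionResponse", b"ok")  # the actual answer
    return await pending, events
```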

Updates docs that described the Admins deny ACE + mandatory non-elevated
shells, and drops the admin-skip guards from every Galaxy integration +
E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests,
ParityFixture, LiveStackFixture, HostSubprocessParityTests).

Adds GalaxyIpcClientRoutingTests covering the router's
request/response match, ErrorResponse, event-between-call, idle event,
and peer-close paths.

Verified live on the dev box against the p7-smoke cluster (gen 6):
driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA
server up on 4840, MXAccess read round-trip returns real data with
Status=0x00000000.

Task #112 — partial: Galaxy live stack is functional end-to-end. The
supplied test-galaxy.ps1 script still fails because the UNS walker
encodes TagConfig JSON as the tag's NodeId instead of the seeded TagId
(pre-existing; separate issue from this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:30:16 -04:00
Joseph Doherty
fb6dd3478d Phase 6.2 Stream C wiring — AuthorizationBootstrap + OpcUaApplicationHost.SetAuthorization
Closes task #133 — the "authz gate is inert in production" blocker
surfaced during task #123. Before this commit, every ACL check on the
six dispatch surfaces (Read, Write, HistoryRead, Browse,
CreateMonitoredItems, Call) short-circuited to allow because Program.cs
constructed OpcUaApplicationHost without passing authzGate or
scopeResolver.

New pieces:

- `AuthorizationOptions` — bound to `Node:Authorization` in
  appsettings.json. `Enabled` (default false) is the master switch;
  `StrictMode` (default false) controls the anonymous / no-LDAP-groups
  fallback behaviour.
- `AuthorizationBootstrap` — singleton service that loads `NodeAcl`
  rows for the published generation, builds a `PermissionTrieCache` +
  `AuthorizationGate`, merges every registered driver's
  `EquipmentNamespaceContent` through `ScopePathIndexBuilder` into one
  full-path `NodeScopeResolver`. Returns `(null, null)` when disabled
  or when no generation is Published yet.
- `DriverEquipmentContentRegistry.Snapshot()` — new method returning a
  defensive copy of the driver → content map so the bootstrap can
  iterate without holding the lock.
- `OpcUaApplicationHost.SetAuthorization(gate, resolver)` — late-bind
  method matching the existing `SetPhase7Sources` pattern. Must run
  before `StartAsync`; rejects post-start rebinding with
  InvalidOperationException.
- `OpcUaServerService.ExecuteAsync` calls `AuthorizationBootstrap.BuildAsync`
  after `PopulateEquipmentContentAsync` and before `applicationHost.StartAsync`,
  in the same window that `SetPhase7Sources` runs.

Behaviour change
- Default (Enabled=false): no behaviour change — the gate stays null,
  all six dispatch surfaces run unchanged. Safe for any existing
  deployment on upgrade.
- Enabled=true with StrictMode=false: identities carrying LDAP groups
  are evaluated against the trie; anonymous / no-groups identities
  pass through (v1 legacy-client compatibility).
- Enabled=true with StrictMode=true: everything evaluates. Anonymous
  or no-groups identities are denied.
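
The three modes can be summarised as one decision function. The sketch is
Python with hypothetical names; trie_allows stands in for the real
evaluator and is only consulted when the identity carries groups.

```python
from typing import Callable, Sequence


def is_request_allowed(
    enabled: bool,
    strict_mode: bool,
    ldap_groups: Sequence[str],
    trie_allows: Callable[[Sequence[str]], bool],
) -> bool:
    if not enabled:
        return True  # gate is null: dispatch runs unchanged
    if not ldap_groups:
        # Anonymous or no-groups identity: pass in lax mode, deny in strict mode.
        return not strict_mode
    return trie_allows(ldap_groups)
```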

Follow-up not covered here: rebind the gate+resolver on generation
refresh (the `GenerationRefreshHostedService` that shipped earlier in
this session). Today the gate only reflects the bootstrap generation
— operators publishing new ACL changes need a process restart to see
them. Matches the current driver-hot-reload limitation and is tracked
in the existing 6.3 follow-up bullet.

Docs: v2-release-readiness.md Phase 6.2 Stream C.12 bullet flipped to
Closed with operator-facing config pointer (`Node:Authorization:Enabled`).

All 283/283 Server.Tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:35:46 -04:00
Joseph Doherty
ded292ecd7 Phase 6.2 Stream C — Call + Alarm Acknowledge/Confirm gating
Closes task #122 (Acknowledge + Confirm + generic Call — Shelve stays as
a follow-up pending per-instance method-NodeId resolution).

Before this commit any session with a connected channel could invoke
method nodes on driver-materialized equipment — including alarm
Acknowledge / Confirm. Combined with the Browse + CreateMonitoredItems
gates that landed earlier in Stream C, this was the last service-layer
entry point where a session could still affect state without passing
the authz trie.

Implementation on DriverNodeManager:
- `Call` override — pre-iterates methodsToCall, gates each through
  AuthorizationGate with the operation kind returned by
  MapCallOperation. Denied calls get errors[i] = BadUserAccessDenied
  before delegating to base.Call.
- `MapCallOperation(NodeId methodId)` — maps well-known Part 9 method
  NodeIds to dedicated operation kinds:
    MethodIds.AcknowledgeableConditionType_Acknowledge →
        OpcUaOperation.AlarmAcknowledge
    MethodIds.AcknowledgeableConditionType_Confirm →
        OpcUaOperation.AlarmConfirm
    everything else → OpcUaOperation.Call
  Lets the ACL distinguish "can acknowledge alarms" from "can invoke
  arbitrary methods" without conflating the two roles.
- Shelve dispatch goes through per-instance ShelvedStateMachine methods
  with dynamic NodeIds that can't be constant-matched — falls through
  to generic Call. Fine-grained OpcUaOperation.AlarmShelve is a follow-
  up when the method-invocation path grows a "method-role" annotation.
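
The mapping can be sketched as below. Python is used for brevity, and plain
strings stand in for the well-known method NodeIds and operation kinds, so
this shows the shape of MapCallOperation rather than its actual code.

```python
from typing import Set

# String stand-ins for the well-known Part 9 method NodeIds.
ACKNOWLEDGE = "AcknowledgeableConditionType_Acknowledge"
CONFIRM = "AcknowledgeableConditionType_Confirm"


def map_call_operation(method_id: str) -> str:
    """Pick the operation kind the ACL should be checked against."""
    if method_id == ACKNOWLEDGE:
        return "AlarmAcknowledge"
    if method_id == CONFIRM:
        return "AlarmConfirm"
    return "Call"  # includes Shelve, whose NodeIds are per-instance


def may_invoke(method_id: str, granted_operations: Set[str]) -> bool:
    return map_call_operation(method_id) in granted_operations
```

With this split, a role granted only "AlarmAcknowledge" can acknowledge
alarms without gaining the ability to invoke arbitrary methods.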

Extracted GateCallMethodRequests + MapCallOperation as static internal
for unit-testability. 8 new tests (MapCallOperation Acknowledge /
Confirm / generic; gate-null no-op, denied-Acknowledge, allowed-
Acknowledge, mixed-batch, pre-populated-error-preserved).
Server.Tests 269 → 277.

Known follow-ups:
- Shelve per-operation gating (see above).
- TranslateBrowsePathsToNodeIds gating (Browse follow-up from #120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:22:19 -04:00
Joseph Doherty
6a6b0f56f2 Phase 6.2 Stream C — CreateMonitoredItems per-item gating
Closes task #121 (partial — creation-time gate; decision #153 per-item
revocation stamp is a follow-up).

Before this commit a session could subscribe to any node via
CreateMonitoredItems, even nodes where Read was denied — the
subscription would surface BadUserAccessDenied on each data-change
read, but the client saw a successful CreateMonitoredItems response
and held the subscription open, wasting resources and leaking the
address-space shape through the item metadata.

New override on DriverNodeManager.CreateMonitoredItems:
- Pre-iterates itemsToCreate, gates each through AuthorizationGate with
  OpcUaOperation.CreateMonitoredItems at the target node's scope.
- For denied slots: sets errors[i] = new ServiceResult(
  StatusCodes.BadUserAccessDenied). The OPC Foundation base stack
  honours pre-populated non-success errors and skips item creation for
  those slots — the subscription never holds a handle to a denied
  node.
- Preserves prior errors (e.g. BadNodeIdUnknown) — first diagnosis wins.
- Non-string-identifier references (stack-synthesized numeric ids)
  bypass the gate.
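
The per-slot rules above can be sketched as a pure function. This is a
Python illustration with hypothetical names; string error codes and a
callable gate stand in for ServiceResult and AuthorizationGate.

```python
from typing import Callable, List, Optional, Union

DENIED = "BadUserAccessDenied"


def gate_monitored_items(
    node_ids: List[Union[str, int]],
    errors: List[Optional[str]],
    is_allowed: Optional[Callable[[str], bool]],
) -> None:
    """Mark denied slots in `errors` in place; everything else is untouched."""
    if is_allowed is None:
        return  # no gate configured: no-op
    for i, node_id in enumerate(node_ids):
        if errors[i] is not None:
            continue  # an earlier diagnosis wins
        if not isinstance(node_id, str):
            continue  # numeric, stack-synthesized ids bypass the gate
        if not is_allowed(node_id):
            errors[i] = DENIED
```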

Extracted the pure filter logic into
GateMonitoredItemCreateRequests(items, errors, identity, gate,
scopeResolver) — static internal, unit-testable without the OPC UA
server stack.

Tests — 6 new in MonitoredItemGatingTests.cs (gate-null no-op,
denied-gets-BadUserAccessDenied, allowed-passes, mixed-batch-denies-
per-item, pre-populated-error-preserved, numeric-id-bypass). Server.Tests
263 → 269.

Known follow-ups:
- Per-item (AuthGenerationId, MembershipVersion) stamp (decision #153)
  for detecting revocation mid-subscription — needs subscription-layer
  plumbing.
- TransferSubscriptions not yet wired (same pattern, smaller scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:17:40 -04:00
Joseph Doherty
e8b8541554 Phase 6.2 Stream C — Browse gating on DriverNodeManager
Closes task #120 (partial — strict point-check; ancestor-visibility
implication is a follow-up).

Before this commit DriverNodeManager exposed every materialized node to
every browsing session regardless of the user's ACL. Read + Write +
HistoryRead were already gated through AuthorizationGate in Phase 6.2
Stream C core; Browse was the one surface where the session could still
enumerate nodes it had no permission to touch, discovering structure
even when reads failed with BadUserAccessDenied.

Implementation
- New `Browse` override on DriverNodeManager that calls base.Browse
  first (lets the stack populate the reference list normally), then
  post-filters the IList<ReferenceDescription> so denied nodes are
  removed silently. OPC UA convention: Browse filtering is invisible to
  the client; no BadUserAccessDenied surfaces.
- Extracted the filter loop into the static internal
  `FilterBrowseReferences(references, userIdentity, gate, scopeResolver)`
  so the policy is unit-testable without standing up the full OPC UA
  server stack.
- Non-string NodeId identifiers (stack-synthesized standard-type
  references with numeric identifiers) bypass the gate — only driver-
  materialized nodes key into the authz trie.
- When AuthorizationGate or NodeScopeResolver is null, the filter is a
  no-op — preserves the pre-Phase-6.2 dispatch path for integration
  tests that construct DriverNodeManager without authz.
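
The post-filter differs from the error-slot approach in that denied entries
simply disappear. A minimal Python sketch, with hypothetical names and a
callable standing in for the gate plus scope resolver:

```python
from typing import Callable, List, Optional, Union


def filter_browse_references(
    references: List[Union[str, int]],
    is_allowed: Optional[Callable[[str], bool]],
) -> None:
    """Drop denied string-identified references from the list, in place."""
    if is_allowed is None:
        return  # gate or resolver missing: leave the list as the stack built it
    references[:] = [
        ref for ref in references
        # Non-string identifiers are not keyed into the authz trie.
        if not isinstance(ref, str) or is_allowed(ref)
    ]
```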

Tests — 6 new in BrowseGatingTests.cs (gate-null no-op, empty-list
no-op, denied-removed, allowed-passes-through, numeric-id bypass,
lax-mode null-identity keeps references). Server.Tests 257 → 263.

Known follow-up (tracked implicitly under #120 re-scope):
- Ancestor-visibility implication (acl-design.md §Browse line 111): a
  user with Read at `Line/Tag` should be able to Browse `Line` even
  without an explicit Browse grant. Current filter does a strict
  point-check. Proper fix needs TriePermissionEvaluator to expose a
  "subtree-has-any-grant" query.
- TranslateBrowsePathsToNodeIds not yet filtered (same extension
  pattern; small follow-up).

Docs: v2-release-readiness.md Phase 6.2 Stream C hardening list marks
the Browse bullet struck-through with "Partial" close-out note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:11:19 -04:00
Joseph Doherty
a23de2a7e4 Phase 6.3 A.2 + D.1 — GenerationRefreshHostedService: poll + lease-wrap apply
Closes tasks #132 + #118 (GA hardening backlog).

Before this commit, the Server only observed the generation in force at
process start (SealedBootstrap). Peer-published generations accumulated
in the shared config DB while the running node kept serving the
generation it had sealed on boot. Two consequences:

1. Operator role-swaps required a process restart — Admin publishes a
   new generation, but the Server's RedundancyCoordinator never re-read
   the topology.
2. ApplyLeaseRegistry had no apply to wrap. ServiceLevelBand sat at
   PrimaryHealthy (255) during every publish because nothing opened a
   lease; PrimaryMidApply (200) was effectively dead code.

New GenerationRefreshHostedService (src/.../Server/Hosting/):
- Polls sp_GetCurrentGenerationForCluster every 5s (tunable).
- On change: opens leases.BeginApplyLease(newGenerationId, Guid.NewGuid()),
  calls coordinator.RefreshAsync inside the `await using`, releases on
  scope exit (success / exception / cancellation via IAsyncDisposable).
- Diagnostic properties: LastAppliedGenerationId, TickCount, RefreshCount.
- Delegate-injected currentGenerationQuery for test drive-through; real
  path is the private static DefaultQueryCurrentGenerationAsync.
- Registered as HostedService in Program.cs alongside the Phase 6.3
  redundancy / peer-probe stack.
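
One poll tick can be sketched as below. This is a synchronous Python
illustration with hypothetical names: a context manager plays the role of
the `await using` lease scope, and leaving the applied generation unchanged
when the refresh throws is an assumption of this sketch.

```python
from contextlib import contextmanager
from typing import Callable, Optional


class GenerationRefresher:
    def __init__(
        self,
        query_current: Callable[[], Optional[int]],
        refresh: Callable[[int], None],
    ) -> None:
        self._query_current = query_current
        self._refresh = refresh
        self.last_applied: Optional[int] = None
        self.tick_count = 0
        self.refresh_count = 0
        self.open_leases = 0

    @contextmanager
    def _apply_lease(self, generation_id: int):
        self.open_leases += 1
        try:
            yield
        finally:
            self.open_leases -= 1  # released on success and on exception

    def tick(self) -> bool:
        """Poll once; return True when a new generation was applied."""
        self.tick_count += 1
        current = self._query_current()
        if current is None or current == self.last_applied:
            return False
        with self._apply_lease(current):
            self._refresh(current)
        self.last_applied = current
        self.refresh_count += 1
        return True
```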

Scope intentionally narrow: only the coordinator refreshes today. Driver
re-init, virtual-tag re-bind, script-engine reload remain as follow-up
wiring. The lease wrap is the right seam for those subscribers to hook
once they grow hot-reload support — the doc comments say so.

Tests
- 5 new unit tests in GenerationRefreshHostedServiceTests (first-apply,
  identity no-op, change-triggers-refresh, null-generation-is-no-op,
  lease-is-released-on-exit). Stub generation-query delegate; real
  coordinator backed by EF InMemory DB.
- Server.Tests total 252 → 257.

Docs
- v2-release-readiness.md Phase 6.3 follow-ups list marks the
  sp_PublishGeneration lease wrap bullet struck-through with close-out
  note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:02:33 -04:00
Joseph Doherty
de77d42eab Phase 6.3 Stream B — peer-probe HostedServices populating PeerReachabilityTracker
Closes task #116 (GA hardening backlog). Before this commit the
RedundancyStatePublisher saw PeerReachability.Unknown for every peer
because the tracker had no writers — every healthy peer got
degraded to the Isolated-Primary band (230) even when fully reachable.
Not release-blocking (safe default), but not the full non-transparent-
redundancy UX either.

Two-layer probe model per docs/v2/implementation/phase-6-3-redundancy-runtime.md
§Stream B:

- PeerHttpProbeLoop (Stream B.1): fast-fail layer, 2 s interval / 1 s timeout.
  Hits each peer's http://{Host}:{DashboardPort}/healthz via an injected
  IHttpClientFactory. Writes the HTTP bit of PeerReachability while
  preserving the UA bit from the last UA probe so a transient HTTP blip
  doesn't clobber the authoritative UA reading.

- PeerUaProbeLoop (Stream B.2): authoritative layer, 10 s interval / 5 s
  timeout. Calls DiscoveryClient.GetEndpoints against opc.tcp://{Host}:
  {OpcUaPort} — cheap compared to a full Session.Create, no cert trust
  required. Short-circuits when the HTTP probe last reported the peer
  unhealthy (no wasted handshakes on a known-dead endpoint), clearing
  the stale UaHealthy bit in that case.
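
The interaction between the two probe layers can be sketched as follows, in
Python rather than the repo's C#. This is a simplified model under stated
assumptions: two booleans per peer instead of the real PeerReachability
type (there is no Unknown state here), and the method names are
hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PeerReachability:
    http_healthy: bool = False
    ua_healthy: bool = False


class PeerReachabilityTracker:
    def __init__(self):
        self._peers = {}

    def get(self, peer_id):
        return self._peers.get(peer_id, PeerReachability())

    def record_http(self, peer_id, healthy):
        # Touch only the HTTP bit so a transient HTTP blip does not
        # overwrite the last UA reading.
        self._peers[peer_id] = replace(self.get(peer_id), http_healthy=healthy)

    def record_ua(self, peer_id, probe_ua):
        """Run the UA probe unless HTTP already reported the peer unhealthy.

        Returns True when the probe was attempted, False when skipped.
        """
        current = self.get(peer_id)
        if not current.http_healthy:
            # Short-circuit: no handshake, and clear the stale UA bit.
            self._peers[peer_id] = replace(current, ua_healthy=False)
            return False
        self._peers[peer_id] = replace(current, ua_healthy=bool(probe_ua()))
        return True
```

`probe_ua` is passed in as a callable so the short-circuit path can be
verified by counting how often it was invoked.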

Both inherit from BackgroundService, follow the tick/delay/catch pattern
RedundancyPublisherHostedService + ResilienceStatusPublisherHostedService
established, and expose TickAsync() as internal for test drive-through.

New PeerProbeOptions class carries the four intervals/timeouts so
operators can tune cadence per site. Registered as singleton in Program.cs;
HTTP client registered by name so the OtOpcUa handler chain
(Serilog enrichers, potential future OpenTelemetry instrumentation) isn't
bypassed.

Tests — 9 new unit tests across PeerHttpProbeLoopTests (5) and
PeerUaProbeLoopTests (4). All pass. Server.Tests total 243 → 252.
Full solution build clean.

Docs: v2-release-readiness.md Phase 6.3 follow-ups list marks the
peer-probe bullet struck-through with a close-out note.

Still deferred in Phase 6.3:
  - OPC UA variable-node binding (task #117 — ServiceLevel + ServerUriArray)
  - sp_PublishGeneration lease wrap (task #118)
  - Client interop matrix (task #119)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:53:38 -04:00
Joseph Doherty
96918b148c Unblock phase-6 compliance meta-runner on task-galaxy-e2e
Two small fixes so `scripts/compliance/phase-6-all.ps1` exits 0 — this is
GA exit-criterion #1 from docs/v2/v2-release-readiness.md.

1. Admin csproj: bump OpenTelemetry.Extensions.Hosting 1.15.2 → 1.15.3 +
   OpenTelemetry.Exporter.Prometheus.AspNetCore 1.15.2-beta.1 →
   1.15.3-beta.1. Fixes NU1902 moderate-severity advisory
   (GHSA-g94r-2vxg-569j) on the transitive OpenTelemetry.Api 1.15.2 pull.
   TreatWarningsAsErrors on the Admin project promoted the advisory to an
   error and failed the whole `dotnet test` run at restore.

2. SchemaComplianceTests.All_expected_tables_exist: the expected-tables
   list drifted behind four Phase 7 migration additions — Script,
   ScriptedAlarm, ScriptedAlarmState, VirtualTag. The EF model + live
   migrations have carried these tables for a while; the compliance test
   just needed the four names added. Applied migrations against a scratch
   DB to confirm the list is exhaustive.

Verification: full solution test pass 2301 / 2301 (one tolerated
pre-existing CLI flake). Phase 6 aggregate compliance: all four phases
PASS with no test-count regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:36:20 -04:00
Joseph Doherty
69e0d02c72 task-galaxy-e2e branch — non-FOCAS work-in-progress snapshot
Catch-all commit for pending work on the task-galaxy-e2e branch that
wasn't part of the FOCAS migration. Grouping by topic so future per-topic
commits can be cherry-picked if needed.

TwinCAT
- src/.../Driver.TwinCAT/AdsTwinCATClient.cs + TwinCATDriverFactoryExtensions.cs:
  factory-registration extensions + ADS client refinements.
- src/.../Driver.TwinCAT.Cli/Commands/BrowseCommand.cs: new browse command
  for the TwinCAT test-client CLI.
- tests/.../Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs + TwinCatProject/:
  fixture scaffold with a minimal POU + README pointing at the TCBSD/ESXi
  VM for e2e.
- docs/Driver.TwinCAT.Cli.md + docs/drivers/TwinCAT-Test-Fixture.md:
  documentation for the above.
- docs/v3/twincat-backlog.md: forward-looking backlog seed.

Admin UI + fleet status
- src/.../Admin/Components/Pages/Clusters/DriversTab.razor + Hosts.razor:
  UI refresh for fleet-status rendering.
- src/.../Admin/Hubs/FleetStatusHub.cs + FleetStatusPoller.cs +
  Admin/Program.cs: SignalR hub + poller plumbing for live fleet data.
- tests/.../Admin.Tests/FleetStatusPollerTests.cs: poller coverage.

Server + redundancy runtime (Phase 6.3 follow-ups)
- src/.../Server/Hosting/RedundancyPublisherHostedService.cs: HostedService
  that owns the RedundancyStatePublisher lifecycle + wires peer reachability.
- src/.../Server/Redundancy/ServerRedundancyNodeWriter.cs: OPC UA
  variable-node writer binding ServiceLevel + ServerUriArray to the
  publisher's events.
- src/.../Server/Program.cs + Server.csproj: hosted-service registration.
- tests/.../Server.Tests/ServerRedundancyNodeWriterTests.cs +
  Server.Tests.csproj: coverage for the above.

Configuration
- src/.../Configuration/Validation/DraftValidator.cs +
  tests/.../Configuration.Tests/DraftValidatorTests.cs: draft-validation
  refinements.

E2E scripts (shared infrastructure)
- scripts/e2e/README.md + _common.ps1 + test-all.ps1: shared helpers + the
  all-drivers test-all runner.
- scripts/e2e/test-opcuaclient.ps1: OPC UA Client e2e runner.

Docs
- docs/v2/implementation/phase-6-{1,2,3,4}*.md + exit-gate-phase-{3,7}.md:
  phase-gate + implementation doc updates.
- docs/v2/plan.md: top-level plan refresh.
- docs/v2/redundancy-interop-playbook.md: client interop playbook for the
  Phase 6.3 redundancy-runtime work.

Two orphan FOCAS docs remain on disk but deliberately unstaged —
docs/v2/focas-deployment.md and docs/v2/implementation/focas-simulator-plan.md
describe the now-retired Tier-C topology and should either be rewritten
or deleted in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:12:19 -04:00
Joseph Doherty
4b0664bd55 FOCAS — retire Tier-C split, inline managed wire client, make read-only
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.

Architecture

- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
  (owner-imported — see Wire/FocasWireClient.cs for the full surface).
  Opens two TCP sockets, runs the initiate handshake, serialises requests
  on socket 2 through a semaphore, closes cleanly with PDU + socket
  teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
  `IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
  alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
  against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
  and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
  rejected at startup with a diagnostic pointing at the migration doc.

Deletions

- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
  Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
  FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
  P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
  `IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
  Host-process supervisor (backoff, circuit breaker, heartbeat, post-
  mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
  FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
  SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
  exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
  DLL that masqueraded as Fwlib64.dll.

Solution changes

- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
  `Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
  hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
  entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
  `WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.

Integration tests

- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
  tree reads (identity, axes, dynamic, program, operation mode, timers,
  spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
  PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
  `IAlarmSource` raise/clear transitions, and `ProbeAsync` /
  `OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
  (Docker container with the vendored Python mock's native FOCAS/2
  Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
  compose down. Dropped the shim-build stage + DLL-copy step + the split
  testhost workaround (the latter only existed because of native-DLL
  lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
  service. Tests seed per-series state via `mock_load_profile` at test
  start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
  FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
  pre-refresh snapshot only spoke the JSON admin protocol.

Tests

- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
  63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
  surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
  to assert the new diagnostic message pointing at
  `docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.

Docs

- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
  deployment collapses to one `"Backend": "wire"` config block, no
  separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
  gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
  topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
  the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
  driver complement closed. FOCAS change-log entry added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:10:59 -04:00
Joseph Doherty
404b54add0 FOCAS — commit previously-orphaned support files
Brings seven FOCAS-related files into git that shipped as part of earlier
FOCAS work but were never staged. Adding them now so the tree reflects the
compilable state + pre-empts dead references from the migration commit that
follows:

- src/.../Driver.FOCAS/FocasAlarmProjection.cs — raise/clear diffing + severity
  mapping surfaced via IAlarmSource on FocasDriver. Referenced by committed
  FocasDriver.cs; tests in FocasAlarmProjectionTests.cs.
- src/.../Admin/Services/FocasDriverDetailService.cs — Admin UI per-instance
  detail page data source.
- src/.../Admin/Components/Pages/Drivers/FocasDetail.razor — Blazor page
  rendering the above (from task #69).
- tests/.../Admin.Tests/FocasDriverDetailServiceTests.cs — exercises the
  detail service.
- tests/.../Driver.FOCAS.Tests/FocasAlarmProjectionTests.cs — raise/clear
  diff semantics against FakeFocasClient.
- tests/.../Driver.FOCAS.Tests/FocasHandleRecycleTests.cs — proactive recycle
  cadence test.
- docs/v2/implementation/focas-wire-protocol.md — captured FOCAS/2 Ethernet
  wire protocol reference. Useful going forward even though the Tier-C /
  simulator plan docs are historical.

No runtime behaviour change — these files compile today and the solution
build/test pass already depends on them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:09:51 -04:00
Joseph Doherty
2ec6aa480e Task #219 — OpcUaServerOptions.AnonymousRoles (5/5 e2e stages pass)
Anonymous OPC UA sessions had no roles (`UserIdentity()`), so
`WriteAuthzPolicy.IsAllowed(SecurityClassification.Operate, [])`
rejected every write with `BadUserAccessDenied`. The reverse-write
stage of the Modbus e2e script surfaced this: stages 1-3 + 5 pass
forward-direction, stage 4 (OPC UA client → server → driver → PLC)
blew up with `0x801F0000` even with the factory + seed perfectly
wired.

Adds a single config knob:

    "OpcUaServer": {
      "AnonymousRoles": ["WriteOperate"]
    }

Default empty preserves the pre-existing production-safe behaviour
(anonymous reads FreeAccess tags, rejected on everything else). When
non-empty, `OtOpcUaServer.OnImpersonateUser` wraps the anonymous token
in a `RoleBasedIdentity("(anonymous)", "Anonymous", AnonymousRoles)`
so the server-layer write guard sees the configured roles.
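
A rough Python sketch of the effect of this knob (the real code is C#).
The role-per-classification table is an assumption inferred from the
failing and passing cases in this message, and Identity,
impersonate_anonymous and is_write_allowed are hypothetical names.

```python
from dataclasses import dataclass

# Assumed mapping: classification -> role required to write.
# Only the Operate/WriteOperate pairing is implied by this commit.
REQUIRED_WRITE_ROLE = {"Operate": "WriteOperate"}


@dataclass(frozen=True)
class Identity:
    display_name: str
    roles: tuple = ()


def impersonate_anonymous(anonymous_roles):
    """Build the identity an anonymous session ends up with."""
    # Empty config keeps the old behaviour: anonymous carries no roles.
    return Identity("(anonymous)", tuple(anonymous_roles or ()))


def is_write_allowed(classification, roles):
    """Allow the write only when the caller holds the required role."""
    required = REQUIRED_WRITE_ROLE.get(classification)
    return required is not None and required in roles
```

With the default empty list the write guard still rejects the anonymous
write; configuring ["WriteOperate"] is what lets stage 4 through.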

Wire-through:
 - OpcUaServerOptions.AnonymousRoles (new)
 - OpcUaApplicationHost passes it to OtOpcUaServer ctor
 - OtOpcUaServer new anonymousRoles ctor param + OnImpersonateUser
   branch
 - Program.cs reads `OpcUaServer:AnonymousRoles` section from config

Env override syntax: `OpcUaServer__AnonymousRoles__0=WriteOperate`.
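
For readers unfamiliar with the syntax: the .NET environment-variable
configuration provider treats `__` as the section separator and a numeric
segment as an array index. The Python sketch below mimics that mapping for
illustration only; env_to_config and as_list are hypothetical helpers, and
unlike .NET this version is case-sensitive.

```python
def env_to_config(environ):
    """Expand NAME__CHILD__0=value entries into nested dictionaries."""
    config = {}
    for name, value in environ.items():
        path = name.split("__")
        node = config
        for segment in path[:-1]:
            node = node.setdefault(segment, {})
        node[path[-1]] = value
    return config


def as_list(section):
    """Collect numerically keyed children in index order."""
    return [section[key] for key in sorted(section, key=int)]
```

Applied to the override above, the AnonymousRoles section holds one child
keyed "0", which reads back as a one-element list.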

## Verified live

Booted server against `seed-modbus-smoke.sql` with
`OpcUaServer__AnonymousRoles__0=WriteOperate` + pymodbus fixture →
`test-modbus.ps1 -BridgeNodeId "ns=2;s=HR200"`:

    === Modbus e2e summary: 5/5 passed ===
    [PASS] Probe
    [PASS] Driver loopback
    [PASS] Server bridge            (driver → server → client)
    [PASS] OPC UA write bridge      (client → server → driver)
    [PASS] Subscribe sees change

All five stages green end-to-end. Issue #219 closed by this PR; the
Modbus-seed update to set AnonymousRoles lives in the follow-up #220
live-boot PR (same AnonymousRoles value applies to every driver since
the classification is a driver-constant, not per-tag).

Full-solution build: 0 errors, only pre-existing xUnit1051 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:49:41 -04:00
Joseph Doherty
7ba783de77 Tasks #211 #212 #213 — AbCip / S7 / AbLegacy server-side factories + seed SQL
Parent: #209. Follow-up to #210 (Modbus). Registers the remaining three
non-Galaxy driver factories so a Config DB `DriverType` in
{`AbCip`, `S7`, `AbLegacy`} actually boots a live driver instead of
being silently skipped by DriverInstanceBootstrapper.

Each factory follows the same shape as ModbusDriverFactoryExtensions +
the existing Galaxy + FOCAS patterns:
 - Static `Register(DriverFactoryRegistry)` entry point.
 - Internal `CreateInstance(driverInstanceId, driverConfigJson)` —
   deserialises a DTO, strict-parses enum fields (fail-fast with an
   explicit "expected one of" list), composes the driver's options object,
   returns a new driver.
 - DriverType keys: `"AbCip"`, `"S7"`, `"AbLegacy"` (case-insensitive at
   the registry layer).
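
The factory shape above can be illustrated with a small Python sketch (the
repo's factories are C#). The CpuType members other than S71500, the JSON
field names, and the error wording are assumptions for illustration; the
case-insensitive lookup, duplicate-registration failure, and fail-fast
"expected one of" enum parsing are the behaviours being shown.

```python
import json
from enum import Enum


class CpuType(Enum):  # hypothetical subset of the S7 CPU types
    S71200 = "S71200"
    S71500 = "S71500"


def parse_enum_strict(enum_cls, raw, field):
    """Exact-name lookup that fails fast with the list of valid values."""
    try:
        return enum_cls[raw]
    except KeyError:
        expected = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"{field}: unknown value {raw!r}; expected one of: {expected}"
        ) from None


class DriverFactoryRegistry:
    def __init__(self):
        self._factories = {}

    def register(self, driver_type, factory):
        key = driver_type.casefold()  # case-insensitive keys
        if key in self._factories:
            raise ValueError(f"driver type {driver_type!r} already registered")
        self._factories[key] = factory

    def create(self, driver_type, driver_instance_id, driver_config_json):
        try:
            factory = self._factories[driver_type.casefold()]
        except KeyError:
            raise KeyError(f"unknown driver type {driver_type!r}") from None
        return factory(driver_instance_id, driver_config_json)


def create_s7_instance(driver_instance_id, driver_config_json):
    """Deserialise the DTO, validate enums, and return the options."""
    dto = json.loads(driver_config_json)
    return {
        "id": driver_instance_id,
        "cpu_type": parse_enum_strict(CpuType, dto["CpuType"], "CpuType"),
        "rack": int(dto.get("Rack", 0)),
        "slot": int(dto.get("Slot", 0)),
    }
```

Validating enum fields inside the factory means a typo in the config JSON
surfaces at bootstrap rather than at the first read.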

DTO surfaces cover every option the respective driver's Options class
exposes — devices, tags, probe, timeouts, per-driver quirks
(AbCip `EnableControllerBrowse` / `EnableAlarmProjection`, S7 Rack/Slot/
CpuType, AbLegacy PlcFamily).

Seed SQL (mirrors `seed-modbus-smoke.sql` shape):
 - `seed-abcip-smoke.sql` — `abcip-smoke` cluster + ControlLogix device +
   `TestDINT:DInt` tag, pointing at the ab_server compose fixture
   (`ab://127.0.0.1:44818/1,0`).
 - `seed-s7-smoke.sql` — `s7-smoke` cluster + S71500 CPU + `DB1.DBW0:Int16`
   tag at the python-snap7 fixture (`127.0.0.1:1102`, non-privileged port).
 - `seed-ablegacy-smoke.sql` — `ablegacy-smoke` cluster + SLC 500 + `N7:5`
   tag. Hardware-gated per #222; placeholder gateway to be replaced with
   real SLC/MicroLogix/PLC-5/RSEmulate before running.

Build plumbing:
 - Each driver project now ProjectReferences `Core` (was
   `Core.Abstractions`-only). `DriverFactoryRegistry` lives in `Core.Hosting`
   so the factory extensions can't compile without it. Matches the FOCAS +
   Galaxy.Proxy reference shape.
 - `Server.csproj` adds the three new driver ProjectReferences so Program.cs
   resolves the symbols at compile-time + ships the assemblies at runtime.

Full-solution build: 0 errors, 334 pre-existing xUnit1051 warnings only.

Live boot verification of all four (Modbus + these three) happens in the
exit-gate PR — factories + seeds are pre-conditions and are being
shipped first so the exit-gate PR can scope to "does the server publish
the expected NodeIds + does the e2e script pass."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:15:38 -04:00
Joseph Doherty
55245a962e Task #210 — Modbus server-side factory + seed SQL (closes first of #209 umbrella)
Parent: #209. Adds the server-side wiring so a Config DB `DriverType='Modbus'`
row actually boots a Modbus driver instance + publishes its tags under OPC UA
NodeIds, instead of being silently skipped by DriverInstanceBootstrapper.

Changes:
 - `ModbusDriverFactoryExtensions` (new) — mirrors
   `GalaxyProxyDriverFactoryExtensions` + `FocasDriverFactoryExtensions`.
   `DriverTypeName="Modbus"`, `CreateInstance` deserialises
   `ModbusDriverConfigDto` (Host/Port/UnitId/TimeoutMs/Probe/Tags) to a full
   `ModbusDriverOptions` and hands back a `ModbusDriver`. Strict enum parsing
   (Region / DataType / ByteOrder / StringByteOrder) — unknown values fail
   fast with an explicit "expected one of" error rather than at first read.
 - `Program.cs` — register the factory after Galaxy + FOCAS.
 - `Driver.Modbus.csproj` — add `Core` project reference (the DI-free factory
   needs `DriverFactoryRegistry` from `Core.Hosting`). Matches the FOCAS
   driver's reference shape.
 - `Server.csproj` — add the `Driver.Modbus` ProjectReference so the
   Program.cs registration compiles against the same assembly the server
   loads at runtime.
 - `scripts/smoke/seed-modbus-smoke.sql` (new) — one-cluster smoke seed
   modelled on `seed-phase-7-smoke.sql`. Creates a `modbus-smoke` cluster +
   `modbus-smoke-node` + Draft generation + Namespace + UnsArea/UnsLine/
   Equipment + one Modbus `DriverInstance` pointing at the pymodbus standard
   fixture (`127.0.0.1:5020`) + one Tag at `HR[200]:UInt16`, ending in
   `EXEC sp_PublishGeneration`. HR[100] is deliberately *not* used because
   pymodbus `standard.json` runs an auto-increment action on that register.

Full-solution build: 0 errors, only the pre-existing xUnit1051 warnings.

AB CIP / S7 / AB Legacy factories follow in their own PRs per #211 / #212 /
#213. Live boot verification happens in the exit-gate PR once all four
factories are in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 11:06:08 -04:00
Joseph Doherty
8d92e00e38 Task #253 — E2E CLI test scripts + FOCAS test-client CLI
The driver-layer integration tests confirm the driver sees the PLC, and
the Client.CLI tests confirm the client sees the server. Nothing glued
them end-to-end until this PR.

- scripts/e2e/_common.ps1: shared helpers — CLI invocation (published-
  binary OR `dotnet run` fallback), Test-Probe / Test-DriverLoopback /
  Test-ServerBridge (all return @{Passed;Reason} hashtables).
- scripts/e2e/test-<modbus|abcip|ablegacy|s7|focas|twincat>.ps1: per-
  driver three-stage script (probe → driver-loopback → server-bridge).
  AB Legacy / FOCAS / TwinCAT are gated behind *_TRUST_WIRE env vars
  since they need real hardware (#222) or a licensed runtime (#221).
- scripts/e2e/test-phase7-virtualtags.ps1: writes a Modbus HR, reads
  the server-side VirtualTag (VT = input * 2) back via OPC UA, triggers
  + clears a scripted alarm. Exercises the Phase 7 CachedTagUpstreamSource
  + ScriptedAlarmEngine path.
- scripts/e2e/test-all.ps1: reads e2e-config.json sidecar, runs each
  present driver, prints a FINAL MATRIX (PASS/FAIL/SKIP). Missing
  sections SKIP rather than fail hard.
- scripts/e2e/e2e-config.sample.json: commented sample — each dev's
  NodeIds are local-seed-specific so e2e-config.json is .gitignore-d.
- scripts/e2e/README.md: full walkthrough — prereqs, three-stage design,
  env-var gates, expected matrix, why this is separate from `dotnet test`.
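
The PASS/FAIL/SKIP roll-up can be sketched like this, in Python rather
than the PowerShell the scripts actually use. run_matrix and the runner
callables are hypothetical; the stage results reuse the Passed/Reason
shape described above.

```python
def run_matrix(config, runners):
    """Run each configured driver's stages and summarise one row per driver."""
    matrix = {}
    for driver, runner in runners.items():
        section = config.get(driver)
        if section is None:
            # A missing config section is a skip, not a failure.
            matrix[driver] = "SKIP"
            continue
        stages = runner(section)  # list of {"Passed": bool, "Reason": str}
        passed = bool(stages) and all(stage["Passed"] for stage in stages)
        matrix[driver] = "PASS" if passed else "FAIL"
    return matrix
```

Treating an absent section as SKIP lets a developer with only one fixture
running still get a meaningful matrix.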

Tasks #249-#251 shipped Modbus/AbCip/AbLegacy/S7/TwinCAT CLIs but left
FOCAS out. Since test-focas.ps1 needs it, the 6th CLI ships here:

- src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli: probe/read/write/subscribe
  commands, AssemblyName `otopcua-focas-cli`. WriteCommand.ParseValue
  handles the full FocasDataType enum (Bit/Byte/Int16/Int32/Float32/
  Float64/String — no UInt variants; the FOCAS protocol exposes signed
  PMC + Fanuc-Float only). Default DataType is Int16 to match the PMC
  register convention.

Full-solution build clean (0 errors). FOCAS CLI wired into
ZB.MOM.WW.OtOpcUa.slnx. There is no .Tests project for the FOCAS CLI yet:
ProbeCommand has no unit-testable pure logic (the same is true in the
other 5 CLIs), and WriteCommand.ParseValue test parity will land in a
follow-up to keep this PR scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 09:51:13 -04:00
Joseph Doherty
4dc685a365 Task #251 — S7 + TwinCAT test-client CLIs (driver CLI suite complete)
Final two of the five driver test clients. Pattern carried forward from
#249 (Modbus) + #250 (AB CIP, AB Legacy) — each CLI inherits Driver.Cli.Common
for DriverCommandBase + SnapshotFormatter and adds a protocol-specific
CommandBase + 4 commands (probe / read / write / subscribe).

New projects:
  - src/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/ — otopcua-s7-cli.
    S7CommandBase carries host/port/cpu/rack/slot/timeout. Handles all S7
    atomic types (Bool, Byte, Int16..UInt64, Float32/64, String, DateTime).
    DateTime parses via RoundtripKind so "2026-04-21T12:34:56Z" works.
  - src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli/ — otopcua-twincat-cli.
    TwinCATCommandBase carries ams-net-id + ams-port + --poll-only toggle
    (flips UseNativeNotifications=false). Covers the full IEC 61131-3
    atomic set: Bool, SInt/USInt, Int/UInt, DInt/UDInt, LInt/ULInt, Real,
    LReal, String, WString, Time/Date/DateTime/TimeOfDay. Structure writes
    refused as out-of-scope (same as AB CIP). IEC time/date variants marshal
    as UDINT on the wire per IEC spec. Subscribe banner announces "ADS
    notification" vs "polling" so the mechanism is obvious in bug reports.

Tests (49 new, 122 cumulative driver-CLI):
  - S7: 22 tests. Every S7DataType has a happy-path + bounds case. DateTime
    round-trips an ISO-8601 string. Tag-name synthesis round-trips every
    S7 address form (DB / M / I / Q, bit/word/dword, strings).
  - TwinCAT: 27 tests. Full IEC type matrix including WString UTF-8 pass-
    through + the four IEC time/date variants landing on UDINT. Structure
    rejection case. Tag-name synthesis for Program scope, GVL scope, nested
    UDT members, and array elements.

Docs:
  - docs/Driver.S7.Cli.md — address grammar cheat sheet + the PUT/GET-must-
    be-enabled gotcha every S7-1200/1500 operator hits.
  - docs/Driver.TwinCAT.Cli.md — AMS router prerequisite (XAR / standalone
    Router NuGet / remote AMS route) + per-command examples.

Wiring:
  - ZB.MOM.WW.OtOpcUa.slnx grew 4 entries (2 src + 2 tests).

Full-solution build clean. Both --help outputs verified end-to-end.

Driver CLI suite complete: 5 CLIs (otopcua-{modbus,abcip,ablegacy,s7,twincat}-cli)
sharing a common base + formatter. 122 CLI tests cumulative. Every driver family
shipped in v2 now has a shell-level ad-hoc validation tool.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:44:53 -04:00
Joseph Doherty
b2065f8730 Task #250 — AB CIP + AB Legacy test-client CLIs
Second + third of the five driver test clients. Both follow the same shape as
otopcua-modbus-cli (#249) and consume Driver.Cli.Common for DriverCommandBase +
SnapshotFormatter.

New projects:
  - src/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/ — otopcua-abcip-cli.
    AbCipCommandBase carries gateway (ab://host[:port]/cip-path) + family
    (ControlLogix/CompactLogix/Micro800/GuardLogix) + timeout.
    Commands: probe, read, write, subscribe.
    Value parser covers every AbCipDataType atomic type (Bool, SInt..LInt,
    USInt..ULInt, Real, LReal, String, Dt); Structure writes refused as
    out-of-scope for the CLI.
  - src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli/ — otopcua-ablegacy-cli.
    AbLegacyCommandBase carries gateway + plc-type (Slc500/MicroLogix/Plc5/
    LogixPccc) + timeout.
    Commands: probe (default address N7:0), read, write, subscribe.
    Value parser covers Bit, Int, Long, Float, AnalogInt, String, and the
    three sub-element types (TimerElement / CounterElement / ControlElement
    all land on int32 at the wire).

Tests (35 new, 73 cumulative across the driver CLI family):
  - AB CIP: 17 tests — ParseValue happy-paths for every Logix atomic type,
    failure cases (non-numeric / bool garbage), tag-name synthesis.
  - AB Legacy: 18 tests — ParseValue coverage (Bit / Int / AnalogInt / Long /
    Float / String / sub-elements), PCCC address round-trip in tag names
    including bit-within-word + sub-element syntax.

Docs:
  - docs/Driver.AbCip.Cli.md — family ↔ CIP-path cheat sheet + examples per
    command + typical workflows.
  - docs/Driver.AbLegacy.Cli.md — PCCC address primer (file letters → CLI
    --type) + known ab_server upstream gap cross-ref to #224 close-out.

Wiring:
  - ZB.MOM.WW.OtOpcUa.slnx grew 4 entries (2 src + 2 tests).

Full-solution build clean. `otopcua-abcip-cli --help` + `otopcua-ablegacy-cli
--help` verified end-to-end.

Next up (#251): S7 + TwinCAT CLIs, same pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:32:43 -04:00
Joseph Doherty
5dac2e9375 Task #249 — Driver test-client CLIs: shared lib + Modbus CLI first
Mirrors the v1 otopcua-cli value prop (ad-hoc shell-level PLC validation) for
the Modbus-TCP driver, and lays down the shared scaffolding that AB CIP, AB
Legacy, S7, and TwinCAT CLIs will build on.

New projects:
  - src/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/ — DriverCommandBase (verbose
    flag + Serilog config) + SnapshotFormatter (single-tag + table +
    write-result renders with invariant-culture value formatting + OPC UA
    status-code shortnames + UTC-normalised timestamps).
  - src/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ — otopcua-modbus-cli executable.
    Commands: probe, read, write, subscribe. ModbusCommandBase carries the
    host/port/unit-id flags + builds ModbusDriverOptions with Probe.Enabled
    =false (CLI runs are one-shot; driver-internal keep-alive would race).
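
A small Python sketch of the formatting rules listed for SnapshotFormatter
(the real formatter is C#). The status-code table is a short excerpt of
standard OPC UA codes, and the null placeholder text is an assumption, not
the formatter's actual output.

```python
# Excerpt of OPC UA status codes; anything else falls back to hex.
STATUS_SHORTNAMES = {
    0x00000000: "Good",
    0x801F0000: "BadUserAccessDenied",
    0x80340000: "BadNodeIdUnknown",
    0x803B0000: "BadNotWritable",
}


def format_status(code):
    """Return the short name, or the code as 8-digit hex if unknown."""
    return STATUS_SHORTNAMES.get(code, f"0x{code:08X}")


def format_value(value):
    """Render a value the same way regardless of the machine's locale."""
    if value is None:
        return "<null>"  # placeholder text is hypothetical
    if isinstance(value, bool):  # check bool before any numeric handling
        return "true" if value else "false"
    if isinstance(value, float):
        return repr(value)  # always uses '.' as the decimal separator
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)
```

Falling back to hex keeps an unrecognised code visible in the output
instead of hiding it behind a generic label.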

Commands + coverage:
  - probe              single FC03 + GetHealth() + pretty-print
  - read               region × address × type synth into one driver tag
  - write              same shape + --value parsed per --type
  - subscribe          polled-subscription stream until Ctrl+C

Tests (38 total):
  - 16 SnapshotFormatterTests covering: status-code shortnames, unknown
    codes fall back to hex, null value + timestamp placeholders, bool
    lowercase, float invariant culture, string quoting, write-result shape,
    aligned table columns, mismatched-length rejection, UTC normalisation.
  - 22 Modbus CLI tests:
      · ReadCommandTests.SynthesiseTagName (5 theory cases)
      · WriteCommandParseValueTests (17 cases: bool aliases, unknown rejected,
        Int16 bounds, UInt16/Bcd16 type, Float32/64 invariant culture,
        String passthrough, BitInRegister, Int32 MinValue, non-numeric reject)
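
The kind of parsing these cases exercise can be sketched in Python as
below (the real WriteCommand.ParseValue is C#). The accepted bool aliases
and the error messages are assumptions; the integer ranges are the
standard ones for the named types.

```python
# Hypothetical alias set; the CLI's actual accepted spellings may differ.
BOOL_ALIASES = {
    "true": True, "1": True, "on": True,
    "false": False, "0": False, "off": False,
}

INT_RANGES = {
    "Int16": (-32768, 32767),
    "UInt16": (0, 65535),
    "Int32": (-2**31, 2**31 - 1),
}


def parse_value(raw, data_type):
    """Convert a --value string into the Python value for data_type."""
    if data_type == "Bool":
        try:
            return BOOL_ALIASES[raw.strip().lower()]
        except KeyError:
            raise ValueError(f"not a boolean: {raw!r}") from None
    if data_type in INT_RANGES:
        value = int(raw)  # raises ValueError on non-numeric input
        low, high = INT_RANGES[data_type]
        if not low <= value <= high:
            raise ValueError(f"{value} out of range for {data_type}")
        return value
    if data_type in ("Float32", "Float64"):
        return float(raw)  # '.' decimal separator, independent of locale
    if data_type == "String":
        return raw  # passed through unchanged
    raise ValueError(f"unsupported data type: {data_type}")
```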

Wiring:
  - ZB.MOM.WW.OtOpcUa.slnx grew 4 entries (2 src + 2 tests).
  - docs/Driver.Modbus.Cli.md — operator-facing runbook with examples per
    command + output format + typical workflows.

Regression: full-solution build clean; shared-lib tests 16/0, Modbus CLI tests
22/0.

Next up: repeat the pattern for AB CIP (shares ~40% more with Modbus via
libplctag), then AB Legacy, S7, TwinCAT. The shared base stays as-is unless
one of those exposes a gap the Modbus-first pass missed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:15:14 -04:00
Joseph Doherty
dfe3731c73 Task #220 — Wire FOCAS into DriverFactoryRegistry bootstrap pipeline
Closes the non-hardware gap surfaced in the #220 audit: FOCAS had full Tier-C
architecture (Driver.FOCAS + Driver.FOCAS.Host + Driver.FOCAS.Shared, supervisor,
post-mortem MMF, NSSM scripts, 239 tests) but no factory registration, so config-DB
DriverInstance rows of type "FOCAS" would fail at bootstrap with "unknown driver
type". Hardware-gated FwlibHostedBackend (real Fwlib32 P/Invoke inside the Host
process) stays deferred under #222 lab-rig.

Ships:
  - FocasDriverFactoryExtensions.Register(registry) mirroring the Galaxy pattern.
    JSON schema selects backend via "Backend" field:
      "ipc" (default) — IpcFocasClientFactory → named-pipe FocasIpcClient →
                        Driver.FOCAS.Host process (Tier-C isolation)
      "fwlib"         — direct in-process FwlibFocasClientFactory (P/Invoke)
      "unimplemented" — UnimplementedFocasClientFactory (fail-fast on use —
                        useful for staging DriverInstance rows pre-Host-deploy)
  - Devices / Tags / Probe / Timeout / Series feed into FocasDriverOptions.
    Series validated eagerly at top-level so typos fail at bootstrap, not first
    read. Tag DataType + Series enum values surface clear errors listing valid
    options.
  - Program.cs adds FocasDriverFactoryExtensions.Register alongside Galaxy.
  - Driver.FOCAS.csproj references Core (for DriverFactoryRegistry).
  - Server.csproj adds Driver.FOCAS ProjectReference so the factory type is
    reachable from Program.cs.

Tests: 13 new FocasDriverFactoryExtensionsTests covering: registry entry,
case-insensitive lookup, ipc backend with full config, ipc defaults, missing
PipeName/SharedSecret errors, fwlib backend short-path, unimplemented backend,
unknown-backend error, unknown-Series error, tag missing DataType, null/whitespace args,
duplicate-register throws.

Regression: 202 FOCAS + 13 FOCAS.Host + 24 FOCAS.Shared + 239 Server all pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 01:08:25 -04:00
Joseph Doherty
8221fac8c1 Task #219 follow-up — close AlarmConditionState child-NodeId + event-propagation gaps
PR #197 surfaced two integration-level wiring gaps in DriverNodeManager's
MarkAsAlarmCondition path; this commit fixes both and upgrades the integration
test to assert them end-to-end.

Fix 1 — addressable child nodes: AlarmConditionState inherits ~50 typed children
(Severity / Message / ActiveState / AckedState / EnabledState / …). The stack
was leaving them with Foundation-namespace NodeIds (type-declaration defaults) or
shared ns=0 counter allocations, so client Read on a child returned
BadNodeIdUnknown. Pass assignNodeIds=true to alarm.Create, then walk the condition
subtree and rewrite each descendant's NodeId symbolically as
  {condition-full-ref}.{symbolic-path}
in the node manager's namespace. Stable, unique, and collision-free across
multiple alarm instances in the same driver.
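
The naming scheme in Fix 1 amounts to a tree walk. The Python sketch below
shows only the ID construction, not the OPC UA stack calls; Node and
assign_child_node_ids are hypothetical, and NodeIds are modelled as plain
strings.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbolic_name: str
    node_id: str = ""
    children: list = field(default_factory=list)


def assign_child_node_ids(condition):
    """Rewrite every descendant's id as {condition-full-ref}.{symbolic-path}."""
    root_ref = condition.node_id  # the condition's own id is left untouched

    def walk(node, path):
        for child in node.children:
            child_path = path + [child.symbolic_name]
            child.node_id = f"{root_ref}.{'.'.join(child_path)}"
            walk(child, child_path)

    walk(condition, [])
```

Because every id is prefixed with the condition's own full reference, two
alarms in the same driver cannot collide even when their children share
symbolic names.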

Fix 2 — event propagation to Server.EventNotifier: OPC UA Part 9 event
propagation relies on the alarm condition being reachable from Objects/Server
via HasNotifier. Call CustomNodeManager2.AddRootNotifier(alarm) after registering
the condition so subscriptions placed on Server-object EventNotifier receive the
ReportEvent calls that ConditionSink emits on each transition.

Test upgrades in AlarmSubscribeIntegrationTests:
  - Driver_alarm_transition_updates_server_side_AlarmConditionState_node — now
    asserts Severity == 700, Message text, and ActiveState.Id == true through
    the OPC UA client (previously scoped out as BadNodeIdUnknown).
  - New: Driver_alarm_event_flows_to_client_subscription_on_Server_EventNotifier
    subscribes an OPC UA event monitor on ObjectIds.Server, fires a driver
    transition, and waits for the AlarmConditionType event to be delivered,
    asserting Message + Severity fields. Previously scoped out as "Part 9 event
    propagation out of reach."

Regression checks: 239 server tests pass (+1 new event-subscription test),
195 Core tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 00:22:02 -04:00
Joseph Doherty
2cb22598d6 Drop accidentally-committed LiteDB cache file + add to .gitignore
The previous commit (#248 wiring) inadvertently picked up
src/ZB.MOM.WW.OtOpcUa.Server/config_cache.db — generated by the live smoke
re-run that proved the bootstrapper works. Remove from tracking + ignore
going forward so future runs don't dirty the working tree.
2026-04-20 22:49:48 -04:00
Joseph Doherty
3d78033ea4 Driver-instance bootstrap pipeline (#248) — DriverInstance rows materialise as live IDriver instances
Closes the gap surfaced by Phase 7 live smoke (#240): DriverInstance rows in
the central config DB had no path to materialise as live IDriver instances in
DriverHost, so virtual-tag scripts read BadNodeIdUnknown for every tag.

## DriverFactoryRegistry (Core.Hosting)
Process-singleton type-name → factory map. Each driver project's static
Register call pre-loads its factory at Program.cs startup; the bootstrapper
looks up by DriverInstance.DriverType + invokes with (DriverInstanceId,
DriverConfig JSON). Case-insensitive; duplicate-type registration throws.
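
The registry contract above (case-insensitive lookup, duplicate registration
throws, null-arg guards) can be sketched as follows. This is a Python
illustration under assumed names, not the project's C# DriverFactoryRegistry.

```python
class DriverFactoryRegistry:
    """Case-insensitive driver-type -> factory map (illustrative sketch)."""

    def __init__(self):
        self._factories = {}

    def register(self, driver_type, factory):
        if not driver_type or not driver_type.strip():
            raise ValueError("driver_type is required")
        if factory is None:
            raise ValueError("factory is required")
        key = driver_type.casefold()
        if key in self._factories:
            raise ValueError(f"driver type '{driver_type}' is already registered")
        self._factories[key] = factory

    def try_get(self, driver_type):
        """Return the factory, or None when the type is unknown."""
        return self._factories.get(driver_type.casefold())

    def registered_types(self):
        """Snapshot of the registered type names (normalized to lower case)."""
        return sorted(self._factories)


registry = DriverFactoryRegistry()
registry.register("Focas", lambda instance_id, config_json: (instance_id, config_json))
factory = registry.try_get("FOCAS")  # lookup ignores case
```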

## GalaxyProxyDriverFactoryExtensions.Register (Driver.Galaxy.Proxy)
Static helper — no Microsoft.Extensions.DependencyInjection dep, keeps the
driver project free of DI machinery. Parses DriverConfig JSON for PipeName +
SharedSecret + ConnectTimeoutMs. DriverInstanceId from the row wins over JSON
per the schema's UX_DriverInstance_Generation_LogicalId.

## DriverInstanceBootstrapper (Server)
After NodeBootstrap loads the published generation: queries DriverInstance
rows scoped to that generation, looks up the factory per row, constructs the
driver, and calls DriverHost.RegisterAsync (which calls InitializeAsync). Per plan decision
#12 (driver isolation), failure of one driver doesn't prevent others —
logs ERR + continues + returns the count actually registered. Unknown
DriverType (factory not registered) logs WRN + skips so a missing-assembly
deployment doesn't take down the whole server.
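
A rough sketch of the isolation loop described above, in Python for
illustration. The row shape, the register callback, and the choice to count
only drivers whose registration did not raise are assumptions made for this
example.

```python
import logging

log = logging.getLogger("bootstrap")


def bootstrap_drivers(rows, factories, register):
    """Register one driver per row; return how many actually registered.

    rows:      iterable of (instance_id, driver_type, config_json)
    factories: dict keyed by lower-cased driver type
    register:  callable that initializes and registers a driver (may raise)
    """
    registered = 0
    for instance_id, driver_type, config_json in rows:
        factory = factories.get(driver_type.lower())
        if factory is None:
            # Unknown type: warn and skip, do not abort the whole server.
            log.warning("no factory for %s (%s); skipping", driver_type, instance_id)
            continue
        try:
            register(factory(instance_id, config_json))
            registered += 1
        except Exception:
            # One failing driver must not prevent the others.
            log.exception("driver %s failed; continuing", instance_id)
    return registered


def flaky_factory(instance_id, config_json):
    raise RuntimeError("pipe ACL denied")


live = []
count = bootstrap_drivers(
    rows=[("d1", "Galaxy", "{}"), ("d2", "Modbus", "{}"), ("d3", "Unknown", "{}")],
    factories={"galaxy": flaky_factory, "modbus": lambda i, c: ("modbus", i)},
    register=live.append,
)
```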

## Wired into OpcUaServerService.ExecuteAsync
After NodeBootstrap.LoadCurrentGenerationAsync, before
PopulateEquipmentContentAsync + Phase7Composer.PrepareAsync. The Phase 7
chain now sees a populated DriverHost so CachedTagUpstreamSource has an
upstream feed.

## Live evidence on the dev box
Re-ran the Phase 7 smoke from task #240. Pre-#248 vs post-#248:
  Equipment namespace snapshots loaded for 0/0 driver(s)  ← before
  Equipment namespace snapshots loaded for 1/1 driver(s)  ← after

Galaxy.Host pipe ACL denied our SID (env-config issue documented in
docs/ServiceHosting.md, NOT a code issue) — the bootstrapper logged it as
"failed to initialize, driver state will reflect Faulted" and continued past
the failure exactly per plan #12. The rest of the pipeline (Equipment walker
+ Phase 7 composer) ran to completion.

## Tests — 5 new DriverFactoryRegistryTests
Register + TryGet round-trip, case-insensitive lookup, duplicate-type throws,
null-arg guards, RegisteredTypes snapshot. Pure functions; no DI/DB needed.
The bootstrapper's DB-query path is exercised by the live smoke (#240) which
operators run before each release.
2026-04-20 22:49:25 -04:00
Joseph Doherty
bb10ba7108 Phase 7 follow-up #247 — Galaxy.Host historian writer + SQLite sink activation
Closes the historian leg of Phase 7. Scripted alarm transitions now flow in
batches through the existing Galaxy.Host pipe and queue durably in a local SQLite
store-and-forward sink when Galaxy is the registered driver, instead of being
dropped into NullAlarmHistorianSink.

## GalaxyHistorianWriter (Driver.Galaxy.Proxy.Ipc)

IAlarmHistorianWriter implementation. Translates AlarmHistorianEvent →
HistorianAlarmEventDto (Stream D contract), batches via the existing
GalaxyIpcClient.CallAsync round-trip on MessageKind.HistorianAlarmEventRequest /
Response, maps per-event HistorianAlarmEventOutcomeDto bytes back to
HistorianWriteOutcome (Ack/RetryPlease/PermanentFail) so the SQLite drain
worker knows what to ack vs dead-letter vs retry. Empty-batch fast path.
Pipe-level transport faults (broken pipe, host crash) bubble up as
GalaxyIpcException which the SQLite sink's drain worker translates to
whole-batch RetryPlease per its catch contract.
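
The outcome handling above might look roughly like this. It is a Python sketch,
not the C# writer: the byte values 0/1/2 are assumed (the real
HistorianAlarmEventOutcomeDto values are not stated here), and ConnectionError
stands in for GalaxyIpcException.

```python
from enum import Enum


class HistorianWriteOutcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


# Assumed wire values; the real outcome bytes may differ.
OUTCOME_BY_BYTE = {
    0: HistorianWriteOutcome.ACK,
    1: HistorianWriteOutcome.RETRY_PLEASE,
    2: HistorianWriteOutcome.PERMANENT_FAIL,
}


def map_outcomes(raw: bytes) -> list:
    """Map one outcome byte per event; an unknown byte is an error."""
    outcomes = []
    for value in raw:
        if value not in OUTCOME_BY_BYTE:
            raise ValueError(f"unknown outcome byte {value}")
        outcomes.append(OUTCOME_BY_BYTE[value])
    return outcomes


def write_batch(events, call):
    """Writer side: the empty batch skips the IPC round-trip entirely."""
    if not events:
        return []
    return map_outcomes(call(events))


def drain(batch, call):
    """Drain-worker side: a transport fault becomes whole-batch RetryPlease."""
    try:
        return write_batch(batch, call)
    except ConnectionError:
        return [HistorianWriteOutcome.RETRY_PLEASE] * len(batch)
```

Keeping the transport-fault translation in the drain worker rather than the
writer matches the split described above: the writer only maps per-event
results, and the queue decides what a failed round-trip means.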

## GalaxyProxyDriver implements IAlarmHistorianWriter

Marker interface lets Phase7Composer discover it via type check at compose
time. WriteBatchAsync delegates to a thin GalaxyHistorianWriter wrapping the
driver's existing _client. Throws InvalidOperationException if InitializeAsync
hasn't connected yet — the SQLite drain worker treats that as a transient
batch failure and retries.

## Phase7Composer.ResolveHistorianSink

Replaces the injected sink dep when any registered driver implements
IAlarmHistorianWriter. Constructs SqliteStoreAndForwardSink at
%ProgramData%/OtOpcUa/alarm-historian-queue.db (falls back to %TEMP% when
ProgramData unavailable, e.g. dev), starts the 2s drain timer, owns the sink
disposable for clean teardown. When no driver provides the writer, keeps the
NullAlarmHistorianSink wired by Program.cs (#246).

DisposeAsync now also disposes the owned SQLite sink in the right order:
bridge → engines → owned sink → injected fallback.

## Tests — 7 new GalaxyHistorianWriterMappingTests

ToDto round-trips every field; preserves null Comment; per-byte outcome enum
mapping (Ack / RetryPlease / PermanentFail) via [Theory]; unknown byte throws;
ctor null-guard. The IPC round-trip itself is covered by the live Host suite
(task #240) which constructs a real pipe.

Server.Phase7 tests: 34/34 still pass; Galaxy.Proxy tests: 25 existing + 7 new = 32 total, all passing.

## Phase 7 production wiring chain — COMPLETE
- #243 composition kernel
- #245 scripted-alarm IReadable adapter
- #244 driver bridge
- #246 Program.cs wire-in
- #247 this: Galaxy.Host historian writer + SQLite sink activation

What unblocks now: task #240 live OPC UA E2E smoke. With a Galaxy driver
registered, scripted alarm transitions flow end-to-end through the engine →
SQLite queue → drain worker → Galaxy.Host IPC → Aveva Historian alarm schema.
Without Galaxy, NullSink keeps the engines functional and the queue dormant.
2026-04-20 22:18:39 -04:00
Joseph Doherty
7352db28a6 Phase 7 follow-up #246 — Phase7Composer + Program.cs wire-in
Activates the Phase 7 engines in production. Loads Script + VirtualTag +
ScriptedAlarm rows from the bootstrapped generation, wires the engines through
the Phase7EngineComposer kernel (#243), starts the DriverSubscriptionBridge feed
(#244), and late-binds the resulting IReadable sources to OpcUaApplicationHost
before OPC UA server start.

## Phase7Composer (Server.Phase7)

Singleton orchestrator. PrepareAsync loads the three Phase 7 row sets in one
DB scope, builds CachedTagUpstreamSource, calls Phase7EngineComposer.Compose,
constructs DriverSubscriptionBridge with one DriverFeed per registered
ISubscribable driver (path-to-fullRef map built from EquipmentNamespaceContent
via MapPathsToFullRefs), starts the bridge.

DisposeAsync tears down in the right order: bridge first (no more events fired
into the cache), then engines (cascades + timers stop), then any disposable sink.

MapPathsToFullRefs: deterministic path convention is
  /{areaName}/{lineName}/{equipmentName}/{tagName}
matching exactly what EquipmentNodeWalker emits into the OPC UA browse tree, so
script literals against the operator-visible UNS tree work without translation.
Tags missing EquipmentId or pointing at unknown Equipment are skipped silently
(Galaxy SystemPlatform-style tags + dangling references handled).
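
The path convention and skip rules above can be sketched as follows. This is a
Python illustration; the tuple shapes for equipment and tags are assumptions
and do not mirror EquipmentNamespaceContent.

```python
def map_paths_to_full_refs(equipment, tags):
    """Build {'/area/line/equipment/tag': full_ref}.

    equipment: {equipment_id: (area_name, line_name, equipment_name)}
    tags:      iterable of (equipment_id or None, tag_name, full_ref)
    """
    result = {}
    for equipment_id, tag_name, full_ref in tags:
        if equipment_id is None:
            continue  # tag not bound to equipment: skipped silently
        location = equipment.get(equipment_id)
        if location is None:
            continue  # dangling equipment reference: skipped silently
        area, line, name = location
        result[f"/{area}/{line}/{name}/{tag_name}"] = full_ref
    return result


equipment = {"eq1": ("Plant", "Line1", "Oven")}
tags = [
    ("eq1", "Temp", "Oven.Temp.PV"),
    ("eq1", "Speed", "Oven.Speed.PV"),
    (None, "Orphan", "Sys.Orphan"),
    ("eq9", "Ghost", "Ghost.PV"),
]
paths = map_paths_to_full_refs(equipment, tags)
```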

## OpcUaApplicationHost.SetPhase7Sources

New late-bind setter. Throws InvalidOperationException if called after
StartAsync because OtOpcUaServer + DriverNodeManagers capture the field values
at construction; mutation post-start would silently fail.
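
A small sketch of why the setter guards on start, in Python with hypothetical
names. The point is that start captures the field values once, so a later
mutation would never reach the consumers.

```python
class ApplicationHost:
    """Toy host; stands in for the late-bind pattern only."""

    def __init__(self):
        self._started = False
        self._virtual = None
        self._scripted_alarm = None
        self.captured = None

    def set_phase7_sources(self, virtual_readable, scripted_alarm_readable):
        if self._started:
            # Failing loudly beats a mutation that silently has no effect.
            raise RuntimeError("Phase 7 sources must be set before start")
        self._virtual = virtual_readable
        self._scripted_alarm = scripted_alarm_readable

    def start(self):
        # Consumers capture the values here, once.
        self.captured = (self._virtual, self._scripted_alarm)
        self._started = True


host = ApplicationHost()
host.set_phase7_sources("virtual", "alarms")
host.start()
```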

## OpcUaServerService

After bootstrap loads the current generation, calls phase7Composer.PrepareAsync
+ applicationHost.SetPhase7Sources before applicationHost.StartAsync. StopAsync
disposes Phase7Composer first so the bridge stops feeding the cache before the
OPC UA server tears down its node managers (avoids in-flight cascades surfacing
as noisy shutdown warnings).

## Program.cs

Registers IAlarmHistorianSink as NullAlarmHistorianSink.Instance (task #247
swaps in the real Galaxy.Host-writer-backed SqliteStoreAndForwardSink), Serilog
root logger, and Phase7Composer singleton.

## Tests — 5 new Phase7ComposerMappingTests = 34 Phase 7 tests total

Maps tag → walker UNS path, skips null EquipmentId, skips unknown Equipment
reference, multiple tags under same equipment map distinctly, empty content
yields empty map. Pure functions; no DI/DB needed.

The real PrepareAsync DB query path can't be exercised without SQL Server in
the test environment — it's exercised by the live E2E smoke (task #240) which
unblocks once #247 lands.

## Phase 7 production wiring chain status
- #243 composition kernel
- #245 scripted-alarm IReadable adapter
- #244 driver bridge
- #246 this: Program.cs wire-in
- 🟡 #247 — Galaxy.Host SqliteStoreAndForwardSink writer adapter (replaces NullSink)
- 🟡 #240 — live E2E smoke (unblocks once #247 lands)
2026-04-20 22:06:03 -04:00
Joseph Doherty
e11350cf80 Phase 7 follow-up #244 — DriverSubscriptionBridge
Pumps live driver OnDataChange notifications into CachedTagUpstreamSource so
ctx.GetTag in user scripts sees the freshest driver value. The last missing piece
between #243 (composition kernel) and #246 (Program.cs wire-in).

## DriverSubscriptionBridge

IAsyncDisposable. Per DriverFeed: groups all paths for one ISubscribable into a
single SubscribeAsync call (consolidating polled drivers' work + giving
native-subscription drivers one watch list), keeps a per-feed reverse map from
driver-opaque fullRef back to script-side UNS path, hooks OnDataChange to
translate + push into the cache. DisposeAsync awaits UnsubscribeAsync per active
subscription + unhooks every handler so events post-dispose are silent.

Empty PathToFullRef map → feed skipped (no SubscribeAsync call). Subscribe failure
on any feed unhooks that feed's handler + propagates so misconfiguration aborts
bridge start cleanly. Double-Start throws InvalidOperationException; double-Dispose
is idempotent.
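
The bridge behaviour above, reduced to a synchronous Python sketch. FakeDriver
and the handler-list hook are hypothetical stand-ins for ISubscribable and
OnDataChange, and the unsubscribe call is left out for brevity.

```python
class FakeDriver:
    """Hypothetical subscribable driver used only for this sketch."""

    def __init__(self):
        self.handlers = []
        self.subscribed = []

    def subscribe(self, full_refs):
        self.subscribed.append(list(full_refs))

    def emit(self, full_ref, value):
        for handler in list(self.handlers):
            handler(full_ref, value)


class SubscriptionBridge:
    def __init__(self, push):
        self._push = push      # e.g. the upstream cache's push(path, value)
        self._hooks = []       # (driver, handler) pairs to undo on dispose
        self._started = False

    def start(self, feeds):
        """feeds: iterable of (driver, {uns_path: full_ref})."""
        if self._started:
            raise RuntimeError("bridge already started")
        self._started = True
        for driver, path_to_full_ref in feeds:
            if not path_to_full_ref:
                continue  # empty feed: no subscribe call at all
            reverse = {ref: path for path, ref in path_to_full_ref.items()}
            handler = self._make_handler(reverse)
            driver.handlers.append(handler)
            try:
                driver.subscribe(sorted(reverse))  # one call, distinct refs
            except Exception:
                driver.handlers.remove(handler)    # unhook, then propagate
                raise
            self._hooks.append((driver, handler))

    def _make_handler(self, reverse):
        def on_data_change(full_ref, value):
            path = reverse.get(full_ref)
            if path is not None:  # unmapped refs are ignored
                self._push(path, value)
        return on_data_change

    def dispose(self):
        for driver, handler in self._hooks:
            driver.handlers.remove(handler)
        self._hooks.clear()  # a second dispose finds nothing to undo


cache = {}
driver = FakeDriver()
bridge = SubscriptionBridge(cache.__setitem__)
bridge.start([(driver, {"/Plant/Line1/Oven/Temp": "Oven.Temp.PV"})])
driver.emit("Oven.Temp.PV", 181.5)
```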

OTOPCUA0001 suppressed at the two ISubscribable call sites with comments
explaining the carve-out: bridge is the lifecycle-coordinator for Phase 7
subscriptions (one Subscribe at engine compose, one Unsubscribe at shutdown),
not the per-call hot-path. Driver Read dispatch still goes through CapabilityInvoker
via DriverNodeManager.

## Tests — 9 new = 29 Phase 7 tests total

DriverSubscriptionBridgeTests covers: SubscribeAsync called with distinct fullRefs,
OnDataChange pushes to cache keyed by UNS path, unmapped fullRef ignored, empty
PathToFullRef skips Subscribe, DisposeAsync unsubscribes + unhooks (post-dispose
events don't push), StartAsync called twice throws, DisposeAsync idempotent,
Subscribe failure unhooks handler + propagates, ctor null guards.

## Phase 7 production wiring chain status
- #243 composition kernel
- #245 scripted-alarm IReadable adapter
- #244 this: driver bridge
- #246 pending — Program.cs Compose call + SqliteStoreAndForwardSink lifecycle
- #240 pending — live E2E smoke (unblocks once #246 lands)
2026-04-20 21:53:05 -04:00
Joseph Doherty
d6a8bb1064 Phase 7 follow-up #245 — ScriptedAlarmReadable adapter over engine state
Task #245 — exposes each scripted alarm's current ActiveState as IReadable so
OPC UA variable reads on Source=ScriptedAlarm nodes return the live predicate
truth instead of BadNotFound.

## ScriptedAlarmReadable

Wraps ScriptedAlarmEngine + implements IReadable:
- Known alarm + Active → DataValueSnapshot(true, Good)
- Known alarm + Inactive → DataValueSnapshot(false, Good)
- Unknown alarm id → DataValueSnapshot(null, BadNodeIdUnknown) — surfaces
  misconfiguration rather than silently reading false
- Batch reads preserve request order
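
The read rules listed above can be sketched in a few lines. This is a Python
illustration, not the C# adapter: snapshots are modelled as (value, status)
tuples, and the dictionary of active states stands in for the engine.

```python
GOOD = 0x00000000
BAD_NODE_ID_UNKNOWN = 0x80340000  # OPC UA Bad_NodeIdUnknown status code


def read_alarms(active_by_id, full_references):
    """Return one (value, status) snapshot per reference, in request order."""
    snapshots = []
    for ref in full_references:
        if ref in active_by_id:
            snapshots.append((bool(active_by_id[ref]), GOOD))
        else:
            # Unknown id: report it instead of silently reading False.
            snapshots.append((None, BAD_NODE_ID_UNKNOWN))
    return snapshots


state = {"HighTemp": True, "LowFlow": False}
result = read_alarms(state, ["LowFlow", "Missing", "HighTemp"])
```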

Phase7EngineComposer.Compose now returns this as ScriptedAlarmReadable when
ScriptedAlarm rows are present. ScriptedAlarmSource (IAlarmSource for the event
stream) stays in place — the IReadable is a separate adapter over the same engine.

## Tests — 6 new + 1 updated composer test = 19 total Phase 7 tests

ScriptedAlarmReadableTests covers: inactive + active predicate → bool snapshot,
unknown alarm id → BadNodeIdUnknown, batch order preservation, null-engine +
null-fullReferences guards. The active-predicate test uses ctx.GetTag on a seeded
upstream value to drive a real cascade through the engine.

Updated Phase7EngineComposerTests to assert ScriptedAlarmReadable is non-null
when alarms compose, null when only virtual tags.

## Follow-ups remaining
- #244 — driver-bridge feed populating CachedTagUpstreamSource
- #246 — Program.cs Compose call + SqliteStoreAndForwardSink lifecycle
2026-04-20 21:30:56 -04:00
Joseph Doherty
f64a8049d8 Phase 7 follow-up #243 — CachedTagUpstreamSource + Phase7EngineComposer
Ships the composition kernel that maps Config DB rows (Script / VirtualTag /
ScriptedAlarm) to the runtime definitions VirtualTagEngine + ScriptedAlarmEngine
consume, builds the engine instances, and wires OnEvent → historian-sink routing.

## src/ZB.MOM.WW.OtOpcUa.Server/Phase7/

- CachedTagUpstreamSource — implements both Core.VirtualTags.ITagUpstreamSource and
  Core.ScriptedAlarms.ITagUpstreamSource (identical shape, distinct namespaces) on one
  concrete type so the composer can hand one instance to both engines. Thread-safe
  ConcurrentDictionary value cache with synchronous ReadTag + fire-on-write
  Push(path, snapshot) that fans out to every observer registered via SubscribeTag.
  Unknown-path reads return a BadNodeIdUnknown-quality snapshot (status 0x80340000)
  so scripts branch on quality naturally.
- Phase7EngineComposer.Compose(scripts, virtualTags, scriptedAlarms, upstream,
  alarmStateStore, historianSink, rootScriptLogger, loggerFactory) — single static
  entry point that:
  * Indexes scripts by ScriptId, resolves VirtualTag.ScriptId + ScriptedAlarm.PredicateScriptId
    to full SourceCode
  * Projects DB rows to VirtualTagDefinition + ScriptedAlarmDefinition (mapping
    DataType string → DriverDataType enum, AlarmType string → AlarmKind enum,
    Severity 1..1000 → AlarmSeverity bucket matching the OPC UA Part 9 bands
    that AbCipAlarmProjection + OpcUaClient MapSeverity already use)
  * Constructs VirtualTagEngine + loads definitions (throws InvalidOperationException
    with the list of scripts that failed to compile — aggregated like Streams B+C)
  * Constructs ScriptedAlarmEngine + loads definitions + wires OnEvent →
    IAlarmHistorianSink.EnqueueAsync using ScriptedAlarmEvent.Emission as the event
    kind + Condition.LastAckUser/LastAckComment for audit fields
  * Returns Phase7ComposedSources with Disposables list the caller owns
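
A minimal sketch of the cache semantics described for CachedTagUpstreamSource,
in Python for illustration. Method names, the (value, status) snapshot shape,
and the lock-based storage are assumptions; the original uses a
ConcurrentDictionary.

```python
import threading

BAD_NODE_ID_UNKNOWN = 0x80340000  # status returned for unknown paths


class CachedTagUpstreamSource:
    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}
        self._observers = {}

    def read_tag(self, path):
        """Synchronous read; unknown paths yield a bad-quality snapshot."""
        with self._lock:
            return self._values.get(path, (None, BAD_NODE_ID_UNKNOWN))

    def push(self, path, snapshot):
        """Store the value, then notify that path's observers in order."""
        with self._lock:
            self._values[path] = snapshot
            observers = list(self._observers.get(path, ()))
        for observer in observers:  # called outside the lock
            observer(path, snapshot)

    def subscribe_tag(self, path, observer):
        """Register an observer; the returned callable stops the fan-out."""
        with self._lock:
            self._observers.setdefault(path, []).append(observer)

        def unsubscribe():
            with self._lock:
                if observer in self._observers.get(path, []):
                    self._observers[path].remove(observer)

        return unsubscribe


source = CachedTagUpstreamSource()
seen = []
stop = source.subscribe_tag("/a/b/c/temp", lambda path, snap: seen.append(snap))
source.push("/a/b/c/temp", (21.5, 0))
source.push("/a/b/c/other", (1, 0))   # different path: observer not fired
stop()
source.push("/a/b/c/temp", (22.0, 0))  # after unsubscribe: not observed
```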

Empty Phase 7 config returns Phase7ComposedSources.Empty so deployments without
scripts / alarms behave exactly as pre-Phase-7. Non-null sources flow into
OpcUaApplicationHost's virtualReadable / scriptedAlarmReadable plumbing landed by
task #239 — DriverNodeManager then dispatches reads by NodeSourceKind per PR #186.

## Tests — 12/12

CachedTagUpstreamSourceTests (6):
- Unknown-path read returns BadNodeIdUnknown-quality snapshot
- Push-then-Read returns cached value
- Push fans out to subscribers in registration order
- Push to one path doesn't fire another path's observer
- Dispose of subscription handle stops fan-out
- Satisfies both Core.VirtualTags + Core.ScriptedAlarms ITagUpstreamSource interfaces

Phase7EngineComposerTests (6):
- Empty rows → Phase7ComposedSources.Empty (both sources null)
- VirtualTag rows → VirtualReadable non-null + Disposables populated
- Missing script reference throws InvalidOperationException with the missing ScriptId
  in the message
- Disabled VirtualTag row skipped by projection
- TimerIntervalMs → TimeSpan.FromMilliseconds round-trip
- Severity 1..1000 maps to Low/Medium/High/Critical at 250/500/750 boundaries
  (matches AbCipAlarmProjection + OpcUaClient.MapSeverity banding)
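
The severity banding in the last bullet could be expressed as below. This is a
Python sketch; whether each boundary value (250, 500, 750) belongs to the lower
or the upper bucket is an assumption here, and the example treats the
boundaries as inclusive upper limits.

```python
def severity_bucket(severity: int) -> str:
    """Map an OPC UA severity in 1..1000 to a coarse bucket.

    Assumption: 250, 500, and 750 are inclusive upper bounds of their bands.
    """
    if not 1 <= severity <= 1000:
        raise ValueError("severity must be in 1..1000")
    if severity <= 250:
        return "Low"
    if severity <= 500:
        return "Medium"
    if severity <= 750:
        return "High"
    return "Critical"


buckets = [severity_bucket(s) for s in (1, 250, 251, 500, 501, 750, 751, 1000)]
```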

## Scope — what this PR does NOT do

The composition kernel is the tricky part; the remaining wiring is three narrower
follow-ups that each build on this PR:

- task #244 — driver-bridge feed that populates CachedTagUpstreamSource from live
  driver subscriptions. Without this, ctx.GetTag returns BadNodeIdUnknown even when
  the driver has a fresh value.
- task #245 — ScriptedAlarmReadable adapter exposing each alarm's current Active
  state as IReadable. Phase7EngineComposer.Compose currently returns
  ScriptedAlarmReadable=null so reads on Source=ScriptedAlarm variables return
  BadNotFound per the ADR-002 "misconfiguration not silent fallback" signal.
- task #246 — Program.cs call to Phase7EngineComposer.Compose with config rows
  loaded from the sealed-cache DB read, plus SqliteStoreAndForwardSink lifecycle
  wiring at %ProgramData%/OtOpcUa/alarm-historian-queue.db with the Galaxy.Host
  IPC writer from Stream D.

Task #240 (live OPC UA E2E smoke) depends on all three follow-ups landing.
2026-04-20 21:23:31 -04:00
Joseph Doherty
63b31e240e Phase 7 follow-ups #239 (plumbing) + #241 (diff-proc extension)
Two complementary pieces that together unblock the last Phase 7 exit-gate deferrals.

## #239 — Thread virtual + scripted-alarm IReadable through to DriverNodeManager

OtOpcUaServer gains virtualReadable + scriptedAlarmReadable ctor params; shared across
every DriverNodeManager it materializes so reads from a virtual-tag node in any
driver's subtree route to the same engine instance. Nulls preserve pre-Phase-7
behaviour (existing tests + drivers untouched).

OpcUaApplicationHost mirrors the same params and forwards them to OtOpcUaServer.

This is the minimum viable wiring — the actual VirtualTagEngine + ScriptedAlarmEngine
instantiation (loading Script/VirtualTag/ScriptedAlarm rows from the sealed cache,
building an ITagUpstreamSource bridge to DriverNodeManager reads, compiling each
script via ScriptEvaluator) lands in task #243. Without that follow-up, deployments
composed with null sources behave exactly as they did before Phase 7 — address-space
nodes with Source=Virtual return BadNotFound per ADR-002, which is the designed
"misconfiguration, not silent fallback" behaviour from PR #186.

## #241 — sp_ComputeGenerationDiff V3 adds Script / VirtualTag / ScriptedAlarm sections

Migration 20260420232000_ExtendComputeGenerationDiffWithPhase7. Same CHECKSUM-based
Modified detection the existing sections use. Logical ids: ScriptId / VirtualTagId /
ScriptedAlarmId. Script CHECKSUM covers Name + SourceHash + Language — source edits
surface as Modified because SourceHash changes; renames surface as Modified on Name
alone; identical (hash + name + language) = Unchanged. VirtualTag + ScriptedAlarm
CHECKSUMs cover their content columns.
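
The checksum-based classification can be sketched as follows, in Python rather
than T-SQL. zlib.crc32 stands in for SQL Server's CHECKSUM, and the Added and
Removed labels are assumed names for rows present on only one side.

```python
import zlib


def checksum(*columns):
    """Stand-in for SQL CHECKSUM over the listed content columns."""
    joined = "\x1f".join(str(c) for c in columns)
    return zlib.crc32(joined.encode("utf-8"))


def diff_generation(old, new):
    """old/new: {logical_id: checksum}. Returns {logical_id: change kind}."""
    changes = {}
    for logical_id in sorted(old.keys() | new.keys()):
        if logical_id not in old:
            changes[logical_id] = "Added"
        elif logical_id not in new:
            changes[logical_id] = "Removed"
        elif old[logical_id] != new[logical_id]:
            changes[logical_id] = "Modified"
        else:
            changes[logical_id] = "Unchanged"
    return changes


# Script rows: checksum covers Name + SourceHash + Language.
old_scripts = {
    "s1": checksum("Ramp", "hash-a", "CSharp"),
    "s2": checksum("Alarm", "hash-b", "CSharp"),
    "s3": checksum("Old", "hash-c", "CSharp"),
}
new_scripts = {
    "s1": checksum("Ramp", "hash-a", "CSharp"),      # identical
    "s2": checksum("Alarm", "hash-b2", "CSharp"),    # source edit
    "s4": checksum("New", "hash-d", "CSharp"),
}
script_diff = diff_generation(old_scripts, new_scripts)
```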

ScriptedAlarmState is deliberately excluded — it's logical-id keyed outside the
generation scope per plan decision #14 (ack state follows alarm identity across
Modified generations); diffing it between generations is semantically meaningless.

Down() restores V2 (the NodeAcl-extended proc from migration 20260420000001).

## No new test count — both pieces are proven by existing suites

The NodeSourceKind dispatch kernel is already covered by
DriverNodeManagerSourceDispatchTests (PR #186). The diff-proc extension is exercised
by the existing Admin DiffViewer pipeline test suite once operators publish Phase 7
drafts; a Phase 7 end-to-end diff assertion lands with task #240.
2026-04-20 21:07:59 -04:00