Commit Graph

85 Commits

Author SHA1 Message Date
Joseph Doherty
c55da145ec docs: add Galaxy parity rig runbook
Walks through standing up both Galaxy backends side-by-side against a
single live Galaxy:

- Conceptual layout (two MxAccess sessions on distinct ClientNames so
  they don't evict each other)
- What's already on the dev box (AVEVA + OtOpcUaGalaxyHost service)
- mxaccessgw build + run + config (API key, ClientName)
- The three OTOPCUA_PARITY_* env vars the harness reads
- HarnessShapeTests as the two-line truth-teller for "did both halves
  resolve"
- Galaxy-shape coverage matrix mapping each scenario to what's needed
  for it to assert (rather than skip)
- Soak run recipes, including the compressed-tag fallback when the dev
  Galaxy doesn't have 50k attributes
- Troubleshooting for the four common SkipReasons
- Three further gates before PR 7.2 lands (matrix green, soak data,
  pilot flip)

Explicitly drops the stale "use a non-elevated shell" precondition —
the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated
dohertj2 alike (resolved 2026-04-24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:08:43 -04:00
Joseph Doherty
edee47d77f PR 6.W — Galaxy.Performance.md
Documents the four perf surfaces shipped in Phase 6:

- Tracing surface (PR 6.1) — table of every span the driver emits +
  rationale for stream-level (not per-event) coverage.
- Metrics surface (PR 6.2) — three EventPump counters, tagging
  scheme, the bounded-channel design, and the
  received = dispatched + dropped + in-flight invariant.
- Buffered update interval (PR 6.3) — how MxAccess.PublishingIntervalMs
  flows through both subscribe paths and what's still pending on the
  gw side (typed SetBufferedUpdateInterval helper).
- Soak scenario (PR 6.4) — env-var-gated 24h × 50k validation with
  the CI-compressed override recipe.
- Tuned defaults (PR 6.5) — table of every default with source +
  notes; rows marked "unchanged" carry the explicit "no live data
  argues for changing this" caveat.

Closes with a "where to look first when something's slow" runbook
section so on-call doesn't have to re-derive the trace+metric
correlation map from primary docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:04:23 -04:00
Joseph Doherty
78fe3e8a45 PR 5.W — Galaxy.ParityMatrix.md
Tabular scenario × result map for the seven Phase 5 parity scenarios
(BrowseAndRead, Subscribe, Write, Alarm, History, Reconnect, ScanState).
Each row records the assertion strength (green strict, yellow soft) and
flags accepted-delta cases:

- Transport-entry host name divergence (legacy = Galaxy.Host process,
  mxgw = MxAccess.ClientName)
- Reconnect latency cadence — different paths, both correct for their
  own session shape
- Sampled-read value drift (we pin StatusCode + type, not value)
- Event-rate ±50% tolerance over a 3s window
- Per-driver IHistoryProvider absence (architectural pin from PR 1.3)

Phase 7 (PR 7.1) consumes this matrix as the default-flip gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:32:20 -04:00
Joseph Doherty
b8df230eb8 Task #152 — Modbus coalescing: surface auto-prohibitions through diagnostics
Auto-prohibited ranges (#148) were previously visible only through an
internal AutoProhibitedRangeCount accessor used by tests. Production
operators had no way to see what the planner had learned without pulling
logs or inspecting driver state.

Changes:

- New public record `ModbusAutoProhibition(UnitId, Region, StartAddress,
  EndAddress, LastProbedUtc, BisectionPending)` — operator-facing snapshot
  shape. Lives in the addressing assembly's logical namespace alongside
  the other public types.
- `ModbusDriver.GetAutoProhibitedRanges()` returns
  `IReadOnlyList<ModbusAutoProhibition>` — a copy of the live prohibition
  map. Lock-protected snapshot so consumers don't race with the re-probe
  loop.
- RecordAutoProhibition tracks first-fire vs re-fire via the dictionary
  insert path, leaving a hook to add structured logging once an ILogger
  is plumbed through (currently elided to keep the constructor minimal
  for testability — a future change can wire ILogger and emit a single
  warning per first-fire).

Tests (1 new, additive to the 6 in ModbusCoalescingAutoRecoveryTests):
- GetAutoProhibitedRanges_Surfaces_Operator_Visible_Snapshot — confirms
  the snapshot shape: empty before any failure, populated with correct
  UnitId/Region/Start/End/BisectionPending after a failed coalesced read,
  LastProbedUtc within the recent past.

Docs:
- docs/v2/modbus-addressing.md — new "Coalescing auto-recovery" subsection
  consolidates the #148/#150/#151/#152 surface in one place. Documents
  the diagnostic accessor + flags the in-process consumption pattern
  (Server health endpoints today; Admin UI when an RPC channel exists).

239 + 1 = 240 unit tests green.

Caveat: the Admin UI surfacing (table render, "clear all prohibitions"
button) is intentionally NOT shipped here. Admin can't reach a live
ModbusDriver instance without a driver-diagnostics RPC channel that
doesn't exist yet — that's a larger architectural piece. For now the
data is queryable in-process by the Server's health endpoints; once an
RPC channel lands, Admin can wire the existing GetAutoProhibitedRanges
into a Blazor table without further driver changes.
2026-04-25 01:19:10 -04:00
Joseph Doherty
dfd027ebca Task #146 — Modbus addressing: align type codes with Wonderware DASMBTCP + Ignition
Web verification (2026-04-25) against current vendor docs surfaced concrete
grammar conflicts in the v1 suffix grammar shipped in #137. Hard cutover
before the Admin UI rolls out widely so users don't paste `:I` from a
Wonderware spreadsheet and silently get wrong-typed reads.

Sources:
- Wonderware DASMBTCP user guide
  https://cdn.logic-control.com/media/DASMBTCP.pdf
- Ignition Modbus addressing (8.1)
  https://www.docs.inductiveautomation.com/docs/8.1/ignition-modules/opc-ua/opc-ua-drivers/modbus/modbus-addressing

Type-code changes:

| Code   | Pre-#146 | Post-#146  | Vendor reference            |
|--------|----------|------------|------------------------------|
| `:S`   | (n/a)    | Int16      | Wonderware DASMBTCP `S`      |
| `:US`  | (n/a)    | UInt16     | Ignition `HRUS`              |
| `:I`   | Int16    | **Int32**  | Wonderware `I` + Ignition `HRI` |
| `:UI`  | UInt16   | **UInt32** | Ignition `HRUI`              |
| `:I_64`  | (n/a)  | Int64      | Ignition `HRI_64`            |
| `:UI_64` | (n/a)  | UInt64     | Ignition `HRUI_64`           |
| `:BCD_32`| (n/a)  | BCD32      | Ignition `HRBCD_32`          |

Codes REMOVED (no clear vendor precedent + conflict with the new mapping):
`:DI`, `:L`, `:UDI`, `:UL`, `:LI`, `:ULI`, `:LBCD`. Pre-#146 configs that
use them get an "Unknown type code" diagnostic at parse time so users get
a fast surface-level error rather than silent wrong-typed reads.

Codes UNCHANGED (already vendor-aligned): `:BOOL`, `:F`, `:D`, `:BCD`,
`:STR<n>`. Modicon 5/6-digit + mnemonic regions (HR/IR/C/DI) + bit suffix
`.N` are also unchanged.

Defaults:
- Coils / DiscreteInputs → `BOOL` (unchanged)
- HoldingRegisters / InputRegisters with no explicit type → Int16 (matches
  Ignition's bare `HR` default)

Byte-order mnemonics (`:ABCD` / `:CDAB` / `:BADC` / `:DCBA`) are kept but
documented as OtOpcUa-specific — they aren't in any major vendor's per-tag
address string. Ignition uses a `-R` suffix per prefix; Wonderware
configures word-order at the topic level.

Tests:
- 12 Type_Codes_Parse rows updated to assert the new mappings.
- New Removed_Aliases_Are_Rejected (×7) confirms each pre-#146 alias now
  fails fast with "Unknown type code".
- Worked_Example_Int16_Array uses the new `:S` code.
- New Worked_Example_Int32_Array_Via_I_Code documents the `:I = Int32`
  vendor-alignment intent so a future "fix" doesn't accidentally regress.
- Unknown_Type_Code_Rejected_With_Catalog updated to match the new error
  message ("Valid: BOOL, S, US, I, ...").

Docs:
- docs/v2/modbus-addressing.md — table replaced with the post-#146 codes,
  each row cites its Wonderware / Ignition reference. New "Codes removed
  in #146" subsection documents the cutover.
- docs/Driver.Modbus.Cli.md — example grammar list updated; explicit
  type-code reminder appended.

114 addressing tests + 231 driver tests still green. Solution build clean.
2026-04-25 00:51:50 -04:00
Joseph Doherty
5ea57d2d70 Task #138 — Modbus addressing grammar docs + e2e
Closes the docs/e2e end of the Modbus addressing line shipped across
#136-#145.

Docs:

- docs/v2/modbus-addressing.md (new) — full grammar reference.
  Region+offset (Modicon 5-digit / 6-digit / mnemonic), bit suffix,
  type codes (BOOL / I / UI / DI / UDI / LI / ULI / F / D / BCD / LBCD /
  STR<n>), all four byte-order mnemonics (ABCD / CDAB / BADC / DCBA),
  array-count semantics, family-native syntax (DL205 V/Y/C/X/SP and
  MELSEC D/M/X/Y with hex-vs-octal sub-family selection), driver-instance
  options (KeepAlive / Reconnect / IdleDisconnect, MaxCoilsPerRead and
  FC15/16 forcing, Deadband + WriteOnChangeOnly, MaxReadGap +
  CoalesceProhibited, multi-unit IPerCallHostResolver). Includes a worked
  JSON DTO example mixing AddressString + structured tag forms.

- docs/Driver.Modbus.Cli.md — appended a "v2 addressing grammar" section
  pointing users at the full reference, with quick-reference examples.

- Vendor-compatibility caveat documented: type codes and byte-order
  mnemonics were synthesised from training-era vendor docs (Wonderware
  DASMBTCP, Kepware KEPServerEX, Ignition, Matrikon, OAS) and should be
  verified against current vendor manuals before locking for production.

E2E tests (4 new AddressingGrammarTests in IntegrationTests):
- Modicon 5-digit and 6-digit forms map to identical wire offsets.
- Float32 + WordSwap (CDAB) round-trips end-to-end through the
  pymodbus simulator.
- Int16[5] array round-trips as a typed short[] surface.
- Block-read coalescing produces a wire-acceptable PDU when MaxReadGap=5
  bridges three nearby tags.

All tests skip gracefully when the pymodbus simulator at localhost:5020
is unreachable (matches the existing ModbusSimulatorFixture pattern).

Final test count across the Modbus addressing surface:
- 107 ModbusAddressing.Tests (parser + family + Modicon)
- 231 Driver.Modbus.Tests (driver, byte order, array, multi-unit, coalescing,
  protocol, subscribe, connection options)
- 110 Admin.Tests (incl. ModbusOptionsViewModel defaults pinning)
- 4 new AddressingGrammar integration tests (skip when sim down)
2026-04-25 00:32:27 -04:00
Joseph Doherty
75c07149d4 Task #124 — Phase 6.2 multi-user authz interop matrix + close LdapGroups gap
The Phase 6.2 evaluator was wired but received no input in production:
RoleBasedIdentity (the IUserIdentity our LDAP path produces) implemented
IRoleBearer but not ILdapGroupsBearer, so AuthorizationGate.BuildSessionState
always returned null and the gate lax-mode-allowed every request. UserAuthResult
also never carried the resolved LDAP groups, only the role-mapped strings.

Closing the gap so the evaluator gets real data:

- UserAuthResult adds Groups alongside Roles. LdapUserAuthenticator now
  surfaces the raw RDN values (ReadOnly / WriteOperate / ...) it already
  collected during the directory query. Roles stay separate per decision #150
  (control-plane Admin role mapping vs data-plane NodeAcl key).
- RoleBasedIdentity implements ILdapGroupsBearer so AuthorizationGate sees
  the groups via the same seam unit tests already use.

ThreeUserInteropMatrixTests drives the closure end-to-end against the live
GLAuth dev directory:

- 5 distinct group memberships (readonly / writeop / writetune /
  writeconfig / alarmack) plus the multi-group admin user
- Each is bound through the real LdapUserAuthenticator
- Resolved groups feed an LdapBoundIdentity that goes through the strict-mode
  AuthorizationGate against a seeded TriePermissionEvaluator
- 31 InlineData rows assert the role × operation matrix; failures pinpoint
  the exact (user, op) cell

The remaining wire-level leg of #124 — a real OPC UA client driving UserName
tokens through an encrypted endpoint policy — still needs a deployment knob
and stays a manual cross-vendor smoke (#119 / #124 manual scope). The doc
audit note in admin-ui-phase-6-status.md is updated to reflect what's now
auto'd vs what stays manual.

33/33 new tests pass against live GLAuth; existing 270 non-LiveLdap tests
in Server.Tests still pass; Core.Tests 205/205, Admin.Tests 109/109. The 7
integration-test failures observed during this run pre-exist this commit
(NodeId-scheme regression from #134) and are tracked separately as #135.
2026-04-24 20:40:07 -04:00
Joseph Doherty
d11d160395 Admin UI Phase 6 audit — close #128–#131 as already-shipped
Task-by-task audit of the Admin UI quartet shows every page listed in
the task descriptions is already built, routed, DI-wired, SignalR-live,
and covered by Admin.Tests (112/112 green):

- #128 /hosts — Hosts.razor 233 LOC with ConsecutiveFailures +
  LastCircuitBreakerOpenUtc + Stale/Faulted/Running cards
- #129 RoleGrants + AclsTab + Probe — RoleGrants.razor (192 LOC),
  AclsTab.razor (279 LOC) with the embedded Probe form at line 38
- #130 RedundancyTab — RedundancyTab.razor 175 LOC with peer
  reachability / ServiceLevel / apply-lease / failover button
- #131 Draft/Publish/Diff/Identification — DraftEditor (105 LOC) +
  Generations (73 LOC) + DiffViewer (87 LOC) + IdentificationFields
  (49 LOC), all wired to GenerationService / DraftValidationService

Shipping docs/v2/implementation/admin-ui-phase-6-status.md as the
canonical reference. Each task's required features are listed with the
exact file / LOC / routing + DI injection so future auditors don't
need to re-derive the status.

No code change in this commit — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:07:05 -04:00
Joseph Doherty
e5d1c9c9b9 Phase 6.1 multi-host dispatch — document shipped contract + per-driver status
Task #127 / decision #144. The resilience infrastructure for per-PLC
circuit breakers is shipped and fully tested — the task description's
"current pipeline keys on DriverInstanceId only" was stale. The actual
state:

- `DriverResiliencePipelineBuilder` keys on
  `(DriverInstanceId, HostName, DriverCapability)`.
- `CapabilityInvoker.ExecuteAsync` takes `hostName` per call.
- `IPerCallHostResolver` is the driver-side hook; AB CIP implements it.
- `PerCallHostResolverDispatchTests.DeadPlc_DoesNotOpenBreaker_For_HealthyPlc_With_Resolver`
  proves the end-to-end isolation.

Remaining work is per-driver adoption, not shared infrastructure:
- AB CIP: live + tested
- Galaxy / FOCAS / OPC UA Client / AB Legacy: 1 device per instance by
  design, trivially isolated
- Modbus / S7 / TwinCAT: single-device today; multi-device refactor is
  per-driver surgery (Device row + options + resolver + transport
  fan-out), not a shared-infra change

Shipping docs/v2/multi-host-dispatch.md as the canonical reference:
contract + driver-author checklist + current fleet-wide status table.
Future driver authors follow the AB CIP template.

No code change in this commit — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:01:47 -04:00
Joseph Doherty
a52086efc5 Refresh phase-7-e2e-smoke.md to match current wiring
The runbook shipped at phase-7 close (2026-04-20) described the original
`Doubled = Source × 2` virtual tag, Float64 seed, and flat TagId-shaped
NodeIds. Four commits later the wiring has moved:

- Seed now targets `TestMachine_001.TestHistoryValue` (Int32, writable,
  historized) — no placeholder to fill in for the dev box.
- VirtualTag is `MachineStatus` (Boolean, `Source > 0`, historized).
- NodeIds are path-based per OPC UA Part 3 §5.2.2
  (`{driverId}/{folder-path}/{browseName}`).
- Seed inserts the ClusterNodeCredential row — without it the Server
  bootstrap fails `Unauthorized: caller X is not bound to NodeId`.

Changes:

1. Step 3 — replace "edit the placeholder" instructions with the ZB
   Galaxy-Repository query that finds writable historized attributes
   (dpc CTE + HistoryExtension EXISTS + `security_classification > 0`).
2. New step 4a — LDAP + `SecurityProfile = Basic256Sha256-Sign` recipe
   for the reverse-bridge + alarm-fires stages. Anonymous sessions are
   denied writes against `Operate`-classified attributes (PR 26 gate);
   `writeop / writeop123` against the dev-box GLAuth clears it.
3. Step 6 validation commands updated to the new NodeIds + reference
   the path-based scheme's Part-3 rationale.
4. Drive-the-alarm snippet now calls `otopcua-cli write … -U writeop`
   so operators see the explicit auth step.
5. Acceptance checklist updated for the new tag names + the
   test-galaxy.ps1 `-Username` invocation.
6. Added a 2026-04-24 second-run evidence section alongside the original
   — documents the 3/7 anonymous ceiling and what's needed to reach 7/7.

No code or seed changes in this commit — doc-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 18:13:27 -04:00
Joseph Doherty
d11dd0520b Galaxy IPC unblock — live dev-box E2E path
Three root-cause fixes to get an elevated dev-box shell past session open
through to real MXAccess reads:

1. PipeAcl — drop BUILTIN\Administrators deny ACE. UAC's filtered token
   carries the Admins SID as deny-only, so the deny fired even from
   non-elevated admin-account shells. The per-connection SID check in
   PipeServer.VerifyCaller remains the real authorization boundary.

2. PipeServer — swap the Hello-read / VerifyCaller order. ImpersonateNamedPipeClient
   returns ERROR_CANNOT_IMPERSONATE until at least one frame has been read
   from the pipe; reading Hello first satisfies that rule. Previously the
   ACL deny-first path masked this race — removing the deny ACE exposed it.

3. GalaxyIpcClient — add a background reader + single pending-response
   slot. A RuntimeStatusChange event between OpenSessionRequest and
   OpenSessionResponse used to satisfy the caller's single ReadFrameAsync
   and fail CallAsync with "Expected OpenSessionResponse, got
   RuntimeStatusChange". The reader now routes response kinds (and
   ErrorResponse) to the pending TCS and everything else to a handler the
   driver registers in InitializeAsync. The Proxy was already set up to
   raise managed events from RaiseDataChange / RaiseAlarmEvent /
   OnHostConnectivityUpdate — those helpers had no caller until now.

4. RedundancyPublisherHostedService — swallow BadServerHalted while
   polling host.Server.CurrentInstance. StandardServer throws that code
   during startup rather than returning null, so the first poll attempt
   crashed the BackgroundService (and the host) before OnServerStarted
   ran. This race was latent behind the Galaxy init failure above.

Updates docs that described the Admins deny ACE + mandatory non-elevated
shells, and drops the admin-skip guards from every Galaxy integration +
E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests,
ParityFixture, LiveStackFixture, HostSubprocessParityTests).

Adds GalaxyIpcClientRoutingTests covering the router's
request/response match, ErrorResponse, event-between-call, idle event,
and peer-close paths.

Verified live on the dev box against the p7-smoke cluster (gen 6):
driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA
server up on 4840, MXAccess read round-trip returns real data with
Status=0x00000000.

Task #112 — partial: Galaxy live stack is functional end-to-end. The
supplied test-galaxy.ps1 script still fails because the UNS walker
encodes TagConfig JSON as the tag's NodeId instead of the seeded TagId
(pre-existing; separate issue from this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:30:16 -04:00
Joseph Doherty
fb6dd3478d Phase 6.2 Stream C wiring — AuthorizationBootstrap + OpcUaApplicationHost.SetAuthorization
Closes task #133 — the "authz gate is inert in production" blocker
surfaced during task #123. Before this commit, every ACL check on the
six dispatch surfaces (Read, Write, HistoryRead, Browse,
CreateMonitoredItems, Call) short-circuited to allow because Program.cs
constructed OpcUaApplicationHost without passing authzGate or
scopeResolver.

New pieces:

- `AuthorizationOptions` — bound to `Node:Authorization` in
  appsettings.json. `Enabled` (default false) is the master switch;
  `StrictMode` (default false) controls the anonymous / no-LDAP-groups
  fallback behaviour.
- `AuthorizationBootstrap` — singleton service that loads `NodeAcl`
  rows for the published generation, builds a `PermissionTrieCache` +
  `AuthorizationGate`, merges every registered driver's
  `EquipmentNamespaceContent` through `ScopePathIndexBuilder` into one
  full-path `NodeScopeResolver`. Returns `(null, null)` when disabled
  or when no generation is Published yet.
- `DriverEquipmentContentRegistry.Snapshot()` — new method returning a
  defensive copy of the driver → content map so the bootstrap can
  iterate without holding the lock.
- `OpcUaApplicationHost.SetAuthorization(gate, resolver)` — late-bind
  method matching the existing `SetPhase7Sources` pattern. Must run
  before `StartAsync`; rejects post-start rebinding with
  InvalidOperationException.
- `OpcUaServerService.ExecuteAsync` calls `AuthorizationBootstrap.BuildAsync`
  after `PopulateEquipmentContentAsync` and before `applicationHost.StartAsync`,
  in the same window that `SetPhase7Sources` runs.

Behaviour change
- Default (Enabled=false): no behaviour change — the gate stays null,
  all six dispatch surfaces run unchanged. Safe for any existing
  deployment on upgrade.
- Enabled=true with StrictMode=false: identities carrying LDAP groups
  are evaluated against the trie; anonymous / no-groups identities
  pass through (v1 legacy-client compatibility).
- Enabled=true with StrictMode=true: everything evaluates. Anonymous
  or no-groups identities are denied.

Follow-up not covered here: rebind the gate+resolver on generation
refresh (the `GenerationRefreshHostedService` that shipped earlier in
this session). Today the gate only reflects the bootstrap generation
— operators publishing new ACL changes need a process restart to see
them. Matches the current driver-hot-reload limitation and is tracked
in the existing 6.3 follow-up bullet.

Docs: v2-release-readiness.md Phase 6.2 Stream C.12 bullet flipped to
Closed with operator-facing config pointer (`Node:Authorization:Enabled`).

All 283/283 Server.Tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:35:46 -04:00
Joseph Doherty
1be0fb5a29 Phase 6.2 Stream C.12 — lock in ScopePathIndexBuilder semantics with tests
Closes task #123 (partial — builder semantics unit-tested; production
wiring is the new task #133).

ScopePathIndexBuilder + NodeScopeResolver indexed mode already exist —
they produce a full Cluster → Namespace → UnsArea → UnsLine → Equipment
→ Tag scope from the published generation's config rows. What was
missing: unit coverage of the Build semantics (the only consumers were
compile-time references) + explicit acknowledgement in the readiness
doc that the gate/resolver aren't yet wired into Program.cs.

Tests — 6 cases in ScopePathIndexBuilderTests.cs:
- Well-formed content emits full hierarchy.
- Tags with null EquipmentId skipped (SystemPlatform-namespace fallback).
- Tags with broken Equipment FK skipped (publish-time validation
  should have caught; builder is defensive).
- Equipment with broken Line FK skipped.
- Duplicate TagConfig throws InvalidOperationException.
- Resolver with index returns full-path scope; un-indexed ref falls
  through to cluster-only scope (pre-ADR-001 behaviour preserved).

Server.Tests 277 → 283.

Critical follow-up (task #133): Program.cs still constructs
OpcUaApplicationHost WITHOUT authzGate or scopeResolver, so all six
dispatch-layer gates (Read, Write, HistoryRead, Browse,
CreateMonitoredItems, Call) are currently inert in production. Wiring
them up — load NodeAcl + EquipmentNamespaceContent at bootstrap,
construct gate + resolver, pass into OpcUaApplicationHost, rebind on
generation refresh — is the last Phase 6.2 GA blocker.

Docs: v2-release-readiness.md Phase 6.2 Stream C hardening list marks
the scope-resolution bullet struck-through with a close-out note that
calls out the gate-inert-in-production gap + task #133.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:28:19 -04:00
Joseph Doherty
ded292ecd7 Phase 6.2 Stream C — Call + Alarm Acknowledge/Confirm gating
Closes task #122 (Acknowledge + Confirm + generic Call — Shelve stays as
a follow-up pending per-instance method-NodeId resolution).

Before this commit any session with a connected channel could invoke
method nodes on driver-materialized equipment — including alarm
Acknowledge / Confirm. Combined with the Browse + CreateMonitoredItems
gates that landed earlier in Stream C, this was the last service-layer
entry point where a session could still affect state without passing
the authz trie.

Implementation on DriverNodeManager:
- `Call` override — pre-iterates methodsToCall, gates each through
  AuthorizationGate with the operation kind returned by
  MapCallOperation. Denied calls get errors[i] = BadUserAccessDenied
  before delegating to base.Call.
- `MapCallOperation(NodeId methodId)` — maps well-known Part 9 method
  NodeIds to dedicated operation kinds:
    MethodIds.AcknowledgeableConditionType_Acknowledge →
        OpcUaOperation.AlarmAcknowledge
    MethodIds.AcknowledgeableConditionType_Confirm →
        OpcUaOperation.AlarmConfirm
    everything else → OpcUaOperation.Call
  Lets the ACL distinguish "can acknowledge alarms" from "can invoke
  arbitrary methods" without conflating the two roles.
- Shelve dispatch paths through per-instance ShelvedStateMachine methods
  with dynamic NodeIds that can't be constant-matched — falls through
  to generic Call. Fine-grained OpcUaOperation.AlarmShelve is a follow-
  up when the method-invocation path grows a "method-role" annotation.

Extracted GateCallMethodRequests + MapCallOperation as static internal
for unit-testability. 8 new tests (MapCallOperation Acknowledge /
Confirm / generic; gate-null no-op, denied-Acknowledge, allowed-
Acknowledge, mixed-batch, pre-populated-error-preserved).
Server.Tests 269 → 277.

Known follow-ups:
- Shelve per-operation gating (see above).
- TranslateBrowsePathsToNodeIds gating (Browse follow-up from #120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:22:19 -04:00
Joseph Doherty
6a6b0f56f2 Phase 6.2 Stream C — CreateMonitoredItems per-item gating
Closes task #121 (partial — creation-time gate; decision #153 per-item
revocation stamp is a follow-up).

Before this commit a session could subscribe to any node via
CreateMonitoredItems, even nodes where Read was denied — the
subscription would surface BadUserAccessDenied on each data-change
read, but the client saw a successful CreateMonitoredItems response
and held the subscription open, wasting resources and leaking the
address-space shape through the item metadata.

New override on DriverNodeManager.CreateMonitoredItems:
- Pre-iterates itemsToCreate, gates each through AuthorizationGate with
  OpcUaOperation.CreateMonitoredItems at the target node's scope.
- For denied slots: sets errors[i] = new ServiceResult(
  StatusCodes.BadUserAccessDenied). The OPC Foundation base stack
  honours pre-populated non-success errors and skips item creation for
  those slots — the subscription never holds a handle to a denied
  node.
- Preserves prior errors (e.g. BadNodeIdUnknown) — first diagnosis wins.
- Non-string-identifier references (stack-synthesized numeric ids)
  bypass the gate.

Extracted the pure filter logic into
GateMonitoredItemCreateRequests(items, errors, identity, gate,
scopeResolver) — static internal, unit-testable without the OPC UA
server stack.

Tests — 6 new in MonitoredItemGatingTests.cs (gate-null no-op,
denied-gets-BadUserAccessDenied, allowed-passes, mixed-batch-denies-
per-item, pre-populated-error-preserved, numeric-id-bypass). Server.Tests
263 → 269.

Known follow-ups:
- Per-item (AuthGenerationId, MembershipVersion) stamp (decision #153)
  for detecting revocation mid-subscription — needs subscription-layer
  plumbing.
- TransferSubscriptions not yet wired (same pattern, smaller scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:17:40 -04:00
Joseph Doherty
e8b8541554 Phase 6.2 Stream C — Browse gating on DriverNodeManager
Closes task #120 (partial — strict point-check; ancestor-visibility
implication is a follow-up).

Before this commit DriverNodeManager exposed every materialized node to
every browsing session regardless of the user's ACL. Read + Write +
HistoryRead were already gated through AuthorizationGate in Phase 6.2
Stream C core; Browse was the one surface where the session could still
enumerate nodes it had no permission to touch, discovering structure
even when reads failed with BadUserAccessDenied.

Implementation
- New `Browse` override on DriverNodeManager that calls base.Browse
  first (lets the stack populate the reference list normally), then
  post-filters the IList<ReferenceDescription> so denied nodes are
  removed silently. OPC UA convention: Browse filtering is invisible to
  the client; no BadUserAccessDenied surfaces.
- Extracted the filter loop into the static internal
  `FilterBrowseReferences(references, userIdentity, gate, scopeResolver)`
  so the policy is unit-testable without standing up the full OPC UA
  server stack.
- Non-string NodeId identifiers (stack-synthesized standard-type
  references with numeric identifiers) bypass the gate — only driver-
  materialized nodes key into the authz trie.
- When AuthorizationGate or NodeScopeResolver is null, the filter is a
  no-op — preserves the pre-Phase-6.2 dispatch path for integration
  tests that construct DriverNodeManager without authz.

Tests — 6 new in BrowseGatingTests.cs (gate-null no-op, empty-list
no-op, denied-removed, allowed-passes-through, numeric-id bypass,
lax-mode null-identity keeps references). Server.Tests 257 → 263.

Known follow-up (tracked implicitly under #120 re-scope):
- Ancestor-visibility implication (acl-design.md §Browse line 111): a
  user with Read at `Line/Tag` should be able to Browse `Line` even
  without an explicit Browse grant. Current filter does a strict
  point-check. Proper fix needs TriePermissionEvaluator to expose a
  "subtree-has-any-grant" query.
- TranslateBrowsePathsToNodeIds not yet filtered (same extension
  pattern; small follow-up).

Docs: v2-release-readiness.md Phase 6.2 Stream C hardening list marks
the Browse bullet struck-through with "Partial" close-out note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:11:19 -04:00
Joseph Doherty
a23de2a7e4 Phase 6.3 A.2 + D.1 — GenerationRefreshHostedService: poll + lease-wrap apply
Closes tasks #132 + #118 (GA hardening backlog).

Before this commit, the Server only observed the generation in force at
process start (SealedBootstrap). Peer-published generations accumulated
in the shared config DB while the running node kept serving the
generation it had sealed on boot. Two consequences:

1. Operator role-swaps required a process restart — Admin publishes a
   new generation, but the Server's RedundancyCoordinator never re-read
   the topology.
2. ApplyLeaseRegistry had no apply to wrap. ServiceLevelBand sat at
   PrimaryHealthy (255) during every publish because nothing opened a
   lease; PrimaryMidApply (200) was effectively dead code.

New GenerationRefreshHostedService (src/.../Server/Hosting/):
- Polls sp_GetCurrentGenerationForCluster every 5s (tunable).
- On change: opens leases.BeginApplyLease(newGenerationId, Guid.NewGuid()),
  calls coordinator.RefreshAsync inside the `await using`, releases on
  scope exit (success / exception / cancellation via IAsyncDisposable).
- Diagnostic properties: LastAppliedGenerationId, TickCount, RefreshCount.
- Delegate-injected currentGenerationQuery for test drive-through; real
  path is the private static DefaultQueryCurrentGenerationAsync.
- Registered as HostedService in Program.cs alongside the Phase 6.3
  redundancy / peer-probe stack.

Scope intentionally narrow: only the coordinator refreshes today. Driver
re-init, virtual-tag re-bind, script-engine reload remain as follow-up
wiring. The lease wrap is the right seam for those subscribers to hook
once they grow hot-reload support — the doc comments say so.

Tests
- 5 new unit tests in GenerationRefreshHostedServiceTests (first-apply,
  identity no-op, change-triggers-refresh, null-generation-is-no-op,
  lease-is-released-on-exit). Stub generation-query delegate; real
  coordinator backed by EF InMemory DB.
- Server.Tests total 252 → 257.

Docs
- v2-release-readiness.md Phase 6.3 follow-ups list marks the
  sp_PublishGeneration lease wrap bullet struck-through with close-out
  note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:02:33 -04:00
Joseph Doherty
de77d42eab Phase 6.3 Stream B — peer-probe HostedServices populating PeerReachabilityTracker
Closes task #116 (GA hardening backlog). Before this commit the
RedundancyStatePublisher saw PeerReachability.Unknown for every peer
because the tracker had no writers — every healthy peer got
degraded to the Isolated-Primary band (230) even when fully reachable.
Not release-blocking (safe default), but not the full non-transparent-
redundancy UX either.

Two-layer probe model per docs/v2/implementation/phase-6-3-redundancy-runtime.md
§Stream B:

- PeerHttpProbeLoop (Stream B.1) — fast-fail layer at 2 s / 1 s timeout.
  Hits each peer's http://{Host}:{DashboardPort}/healthz via an injected
  IHttpClientFactory. Writes the HTTP bit of PeerReachability while
  preserving the UA bit from the last UA probe so a transient HTTP blip
  doesn't clobber the authoritative UA reading.

- PeerUaProbeLoop (Stream B.2) — authoritative layer at 10 s / 5 s
  timeout. Calls DiscoveryClient.GetEndpoints against opc.tcp://{Host}:
  {OpcUaPort} — cheap compared to a full Session.Create, no cert trust
  required. Short-circuits when the HTTP probe last reported the peer
  unhealthy (no wasted handshakes on a known-dead endpoint), clearing
  the stale UaHealthy bit in that case.

Both inherit from BackgroundService, follow the tick/delay/catch pattern
RedundancyPublisherHostedService + ResilienceStatusPublisherHostedService
established, and expose TickAsync() as internal for test drive-through.

New PeerProbeOptions class carries the four intervals/timeouts so
operators can tune cadence per site. Registered as singleton in Program.cs;
HTTP client registered by name so the OtOpcUa handler chain
(Serilog enrichers, potential future OpenTelemetry instrumentation) isn't
bypassed.

Tests — 9 new unit tests across PeerHttpProbeLoopTests (5) and
PeerUaProbeLoopTests (4). All pass. Server.Tests total 243 → 252.
Full solution build clean.

Docs: v2-release-readiness.md Phase 6.3 follow-ups list marks the
peer-probe bullet struck-through with a close-out note.

Still deferred in Phase 6.3:
  - OPC UA variable-node binding (task #117 — ServiceLevel + ServerUriArray)
  - sp_PublishGeneration lease wrap (task #118)
  - Client interop matrix (task #119)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:53:38 -04:00
Joseph Doherty
69e0d02c72 task-galaxy-e2e branch — non-FOCAS work-in-progress snapshot
Catch-all commit for pending work on the task-galaxy-e2e branch that
wasn't part of the FOCAS migration. Grouping by topic so future per-topic
commits can be cherry-picked if needed.

TwinCAT
- src/.../Driver.TwinCAT/AdsTwinCATClient.cs + TwinCATDriverFactoryExtensions.cs:
  factory-registration extensions + ADS client refinements.
- src/.../Driver.TwinCAT.Cli/Commands/BrowseCommand.cs: new browse command
  for the TwinCAT test-client CLI.
- tests/.../Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs + TwinCatProject/:
  fixture scaffold with a minimal POU + README pointing at the TCBSD/ESXi
  VM for e2e.
- docs/Driver.TwinCAT.Cli.md + docs/drivers/TwinCAT-Test-Fixture.md:
  documentation for the above.
- docs/v3/twincat-backlog.md: forward-looking backlog seed.

Admin UI + fleet status
- src/.../Admin/Components/Pages/Clusters/DriversTab.razor + Hosts.razor:
  UI refresh for fleet-status rendering.
- src/.../Admin/Hubs/FleetStatusHub.cs + FleetStatusPoller.cs +
  Admin/Program.cs: SignalR hub + poller plumbing for live fleet data.
- tests/.../Admin.Tests/FleetStatusPollerTests.cs: poller coverage.

Server + redundancy runtime (Phase 6.3 follow-ups)
- src/.../Server/Hosting/RedundancyPublisherHostedService.cs: HostedService
  that owns the RedundancyStatePublisher lifecycle + wires peer reachability.
- src/.../Server/Redundancy/ServerRedundancyNodeWriter.cs: OPC UA
  variable-node writer binding ServiceLevel + ServerUriArray to the
  publisher's events.
- src/.../Server/Program.cs + Server.csproj: hosted-service registration.
- tests/.../Server.Tests/ServerRedundancyNodeWriterTests.cs +
  Server.Tests.csproj: coverage for the above.

Configuration
- src/.../Configuration/Validation/DraftValidator.cs +
  tests/.../Configuration.Tests/DraftValidatorTests.cs: draft-validation
  refinements.

E2E scripts (shared infrastructure)
- scripts/e2e/README.md + _common.ps1 + test-all.ps1: shared helpers + the
  all-drivers test-all runner.
- scripts/e2e/test-opcuaclient.ps1: OPC UA Client e2e runner.

Docs
- docs/v2/implementation/phase-6-{1,2,3,4}*.md + exit-gate-phase-{3,7}.md:
  phase-gate + implementation doc updates.
- docs/v2/plan.md: top-level plan refresh.
- docs/v2/redundancy-interop-playbook.md: client interop playbook for the
  Phase 6.3 redundancy-runtime work.

Two orphan FOCAS docs remain on disk but deliberately unstaged —
docs/v2/focas-deployment.md and docs/v2/implementation/focas-simulator-plan.md
describe the now-retired Tier-C topology and should either be rewritten
or deleted in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:12:19 -04:00
Joseph Doherty
4b0664bd55 FOCAS — retire Tier-C split, inline managed wire client, make read-only
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.

Architecture

- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
  (owner-imported — see Wire/FocasWireClient.cs for the full surface).
  Opens two TCP sockets, runs the initiate handshake, serialises requests
  on socket 2 through a semaphore, closes cleanly with PDU + socket
  teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
  `IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
  alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
  against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
  and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
  rejected at startup with a diagnostic pointing at the migration doc.

Deletions

- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
  Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
  FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
  P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
  `IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
  Host-process supervisor (backoff, circuit breaker, heartbeat, post-
  mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
  FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
  SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
  exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
  DLL that masqueraded as Fwlib64.dll.

Solution changes

- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
  `Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
  hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
  entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
  `WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.

Integration tests

- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
  tree reads (identity, axes, dynamic, program, operation mode, timers,
  spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
  PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
  `IAlarmSource` raise/clear transitions, and `ProbeAsync` /
  `OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
  (Docker container with the vendored Python mock's native FOCAS/2
  Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
  compose down. Dropped the shim-build stage + DLL-copy step + the split
  testhost workaround (the latter only existed because of native-DLL
  lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
  service. Tests seed per-series state via `mock_load_profile` at test
  start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
  FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
  pre-refresh snapshot only spoke the JSON admin protocol.

Tests

- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
  63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
  surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
  to assert the new diagnostic message pointing at
  `docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.

Docs

- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
  deployment collapses to one `"Backend": "wire"` config block, no
  separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
  gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
  topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
  the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
  driver complement closed. FOCAS change-log entry added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:10:59 -04:00
Joseph Doherty
404b54add0 FOCAS — commit previously-orphaned support files
Brings seven FOCAS-related files into git that shipped as part of earlier
FOCAS work but were never staged. Adding them now so the tree reflects the
compilable state + pre-empts dead references from the migration commit that
follows:

- src/.../Driver.FOCAS/FocasAlarmProjection.cs — raise/clear diffing + severity
  mapping surfaced via IAlarmSource on FocasDriver. Referenced by committed
  FocasDriver.cs; tests in FocasAlarmProjectionTests.cs.
- src/.../Admin/Services/FocasDriverDetailService.cs — Admin UI per-instance
  detail page data source.
- src/.../Admin/Components/Pages/Drivers/FocasDetail.razor — Blazor page
  rendering the above (from task #69).
- tests/.../Admin.Tests/FocasDriverDetailServiceTests.cs — exercises the
  detail service.
- tests/.../Driver.FOCAS.Tests/FocasAlarmProjectionTests.cs — raise/clear
  diff semantics against FakeFocasClient.
- tests/.../Driver.FOCAS.Tests/FocasHandleRecycleTests.cs — proactive recycle
  cadence test.
- docs/v2/implementation/focas-wire-protocol.md — captured FOCAS/2 Ethernet
  wire protocol reference. Useful going forward even though the Tier-C /
  simulator plan docs are historical.

No runtime behaviour change — these files compile today and the solution
build/test pass already depends on them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:09:51 -04:00
Joseph Doherty
98a8031772 Phase 7 follow-up #240 — Live OPC UA E2E smoke runbook + seed + first-run evidence
Closes the live-smoke validation Phase 7 deferred to. Ships:

## docs/v2/implementation/phase-7-e2e-smoke.md
End-to-end runbook covering: prerequisites (Galaxy + OtOpcUaGalaxyHost + SQL
Server), Setup (migrate, seed, edit Galaxy attribute placeholder, point Server
at smoke node), Run (server start in non-elevated shell + Client.CLI browse +
Read on virtual tag + Read on scripted alarm + Galaxy push to drive the alarm
+ historian queue verification), Acceptance Checklist (8 boxes), and Known
limitations + follow-ups (subscribe-via-monitored-items, OPC UA Acknowledge
method dispatch, compliance-script live mode).

## scripts/smoke/seed-phase-7-smoke.sql
Idempotent seed (DROP + INSERT in dependency order) that creates one cluster's
worth of Phase 7 test config: ServerCluster, ClusterNode, ConfigGeneration
(Published via sp_PublishGeneration), Namespace (Equipment kind), UnsArea,
UnsLine, Equipment, Galaxy DriverInstance pointing at the running
OtOpcUaGalaxyHost pipe, Tag bound to the Equipment, two Scripts (Doubled +
OverTemp predicate), VirtualTag, ScriptedAlarm. Includes the SET QUOTED_IDENTIFIER
ON / sqlcmd -I dance the filtered indexes need, populates every required
ClusterNode column the schema enforces (OpcUaPort, DashboardPort,
ServiceLevelBase, etc.), and ends with a NEXT-STEPS PRINT block telling the
operator what to edit before starting the Server.

## First-run evidence on the dev box

Running the seed + starting the Server (non-elevated shell, Galaxy.Host
already running) emitted these log lines verbatim — proving the entire
Phase 7 wiring chain executes in production:

  Bootstrapped from central DB: generation 1
  Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using NullAlarmHistorianSink
  VirtualTagEngine loaded 1 tag(s), 1 upstream subscription(s)
  ScriptedAlarmEngine loaded 1 alarm(s)
  Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)

Each line corresponds to a piece shipped in #243 / #244 / #245 / #246 / #247.
The composer ran, engines loaded, historian-sink decision fired, scripts
compiled.

## Surfaced — pre-Phase-7 deployment-wiring gaps (NOT Phase 7 regressions)

1. Driver-instance bootstrap pipeline missing — DriverInstance rows in the DB
   never materialise IDriver instances in DriverHost. Filed as task #248.
2. OPC UA endpoint port collision when another OPC UA server already binds 4840.
   Operator concern; documented in the runbook prereqs.

Both predate Phase 7 + are orthogonal. Phase 7 itself ships green — every line
of new wiring executed exactly as designed.

## Phase 7 production wiring chain — VALIDATED end-to-end

-  #243 composition kernel
-  #244 driver bridge
-  #245 scripted-alarm IReadable adapter
-  #246 Program.cs wire-in
-  #247 Galaxy.Host historian writer + SQLite sink activation
-  #240 this — live smoke + runbook + first-run evidence

Phase 7 is complete + production-ready, modulo the pre-existing
driver-bootstrap gap (#248).
2026-04-20 22:32:33 -04:00
Joseph Doherty
82e4e8c8de Phase 7 Stream H — exit gate compliance script + closeout doc
Ships the check-everything PowerShell script + the human-readable exit-gate doc that
closes Phase 7 (scripting runtime + virtual tags + scripted alarms + historian sink
+ Admin UI + address-space integration).

## scripts/compliance/phase-7-compliance.ps1

Mirrors the Phase 6.x compliance pattern. Checks:
- Stream A: Roslyn sandbox wiring, ForbiddenTypeAnalyzer, DependencyExtractor,
  ScriptLogCompanionSink, Deadband helper
- Stream B: VirtualTagEngine, DependencyGraph (iterative Tarjan),
  SemaphoreSlim async-safe cascade, TimerTriggerScheduler, VirtualTagSource
- Stream C: Part9StateMachine, AlarmConditionState GxP audit Comments,
  MessageTemplate {TagPath}, AlarmPredicateContext SetVirtualTag rejection,
  ScriptedAlarmSource IAlarmSource, IAlarmStateStore + in-memory store
- Stream D: BackoffLadder 1-60s, DefaultDeadLetterRetention (30 days),
  HistorianWriteOutcome enum, Galaxy.Host IPC contracts
- Stream E: Four new entities + check constraints + Phase 7 migration
- Stream F: Five Admin services + ScriptEditor + ScriptsTab + AlarmsHistorian
  page + Monaco loader + DraftEditor wire-up + declared-inputs-only contract
- Stream G: NodeSourceKind discriminator + walker VirtualTag/ScriptedAlarm emission
  + DriverNodeManager SelectReadable + IsWriteAllowedBySource
- Deferred (flagged, not blocking): SealedBootstrap composition, live end-to-end
  smoke, sp_ComputeGenerationDiff extension
- Cross-cutting: full-solution dotnet test (regression check against 1300 baseline)

## docs/v2/implementation/exit-gate-phase-7.md

Summarises shipped PRs (Streams A-G + G follow-up = 8 PRs, ~197 tests), lists the
compliance checks covered, names the deferred follow-ups with task IDs, and points
at the compliance script for verification.

## Exit-gate local run

2191 tests green (baseline 1300), 0 failures, 55 compliance checks PASS,
3 deferred (with follow-up task IDs).

Phase 7 ships.
2026-04-20 20:25:11 -04:00
Joseph Doherty
2a74daf228 ADR-002 — driver-vs-virtual dispatch: DriverNodeManager routes reads/writes/subscriptions across driver tags and virtual (scripted) tags via a single NodeManager with a NodeSource tag on NodeScopeResolver's output. Locks the architecture decision Phase 7 Stream G was going to have to make anyway — documenting it up front so the stream implementation can reference the chosen shape instead of rediscovering it. Option A (separate VirtualTagNodeManager sibling) rejected because shared Equipment folders owning both driver and virtual children would force two NodeManagers to fight for ownership on every Equipment node — the common case, not the exception — defeating the separation. Option C (virtual engine registers as a synthetic IDriver through DriverTypeRegistry) rejected because DriverInstance shape is wrong for scripting config (no DriverType, no HostAddress, no connectivity probe, no NSSM wrapper), IDriver.InitializeAsync semantics don't match script compilation, Polly resilience wrappers calibrated for network calls would either passthrough pointlessly or tune wrong, and Admin UI would need special-casing everywhere to hide fields that don't apply. Option B (single DriverNodeManager, NodeScopeResolver returns NodeSource enum alongside ScopeId, dispatch branches on source) accepted because it preserves one address-space tree with one walker, ACL binding works identically for both kinds, Phase 6.1 resilience + Phase 6.2 audit apply uniformly to the driver branch without needing Roslyn analyzer exemptions, and adding future source kinds is a single-enum-case addition. NodeScopeResolver.Resolve returns NodeScope(ScopeId, NodeSource, DriverInstanceId?, VirtualTagId?); DriverNodeManager pattern-matches on scope.Source and routes to either the driver dictionary or IVirtualTagEngine. OPC UA client writes to a virtual node return BadUserAccessDenied before the dispatch branch because Phase 7 decision #6 restricts virtual-tag writes to scripts via ctx.SetVirtualTag. Dispatch test coverage specified for Stream G.4: mixed Equipment folders browsing correctly, read routing per source kind, subscription fan-out across both kinds, the BadUserAccessDenied guard on virtual writes, and script-driven writes firing subscription notifications. ADR-001's walker gains the VirtualTag config-DB table as an additional input channel alongside Tag; NodeScopeResolver's ScopeId return stays unchanged so Phase 6.2's ACL trie needs no modification. Consequences flagged: whether IVirtualTagEngine lives in Core.Abstractions vs Phase 7's Core.VirtualTags project, and whether future server-side methods on virtual nodes would route through this dispatch, both marked out-of-scope for ADR-002.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:08:01 -04:00
Joseph Doherty
f2c1cc84e9 Phase 7 plan doc — scripting runtime + virtual tags + scripted alarms + historian alarm sink. Draft output from the 2026-04-20 interactive planning session. Phase 7 is the last phase before v2 release readiness; adds two additive runtime capabilities on top of the existing driver + Equipment address-space foundation: (1) virtual (calculated) tags — OPC UA variables whose values are computed by user-authored C# scripts against other tags, evaluated on change and/or timer, living in the existing Equipment tree alongside driver tags, behaving identically to clients; (2) Part 9 scripted alarms — full state machine (EnabledState/ActiveState/AckedState/ConfirmedState/ShelvingState) with persistent operator-supplied state across restarts, complementing (not replacing) the existing Galaxy-native and AB CIP ALMD alarm sources. A third tie-in capability — Aveva Historian as alarm system of record — routes every qualifying alarm transition from any IAlarmSource (scripted + Galaxy + ALMD) through a local SQLite store-and-forward queue to Galaxy.Host, which uses its already-loaded aahClientManaged DLLs to write to the Historian alarm schema; per-alarm HistorizeToAveva toggle gates which sources flow (default off for Galaxy-native to avoid duplicating the direct Galaxy historian path, default on for scripted).
Locks in 22 design decisions from the planning conversation: C# via Roslyn scripting; virtual tags in the Equipment tree (not a separate /Virtual/ namespace); change-driven + timer-driven triggers operator-configurable per tag; Shape A one-script-per-tag-or-alarm (no predicate/action split); full OPC UA Part 9 alarm fidelity; read-only sandbox (scripts read any tag, write only to virtual tags, no File/HttpClient/Process/reflection); AST-inferred dependencies via CSharpSyntaxWalker (non-literal tag paths rejected at publish); config DB storage with generation-sealed cache; ctx.GetTag returns a full DataValue {Value, StatusCode, Timestamp}; per-tag Historize checkbox; per-tag error isolation (throwing script sets tag quality BadInternalError, engine unaffected); dedicated scripts-*.log Serilog sink bound to ctx.Logger; alarm message as template with {TagPath} substitution resolved at event emission; ActiveState recomputed from tags on startup while EnabledState/AckedState/ConfirmedState/ShelvingState + audit persist to config DB; historian sink scope = all IAlarmSource impls with per-alarm toggle; SQLite store-and-forward on the node so operators are never blocked by Historian downtime; IPC to Galaxy.Host for ingestion reusing the already-loaded aahClientManaged DLLs; Monaco editor for Admin code editing; serial cascade evaluation for v1 (parallel as follow-up); shelving UX via OPC UA method calls only with no custom Admin controls (operator drives state transitions from plant HMIs or Client.CLI); 30-day dead-letter retention with manual retry button; test harness accepts only declared-input paths so the harness enforces dependency declaration.

Eight streams totaling ~10-12 weeks, scope-comparable to Phase 6: A - Core.Scripting (Roslyn engine + sandbox + AST inference + logger); B - virtual tag engine (dependency graph + change/timer schedulers + historize); C - scripted alarm engine (Part 9 state machine + template messages + startup recovery + OPC UA method binding); D - historian alarm sink (SQLite store-and-forward + Galaxy.Host IPC contract extension); E - config DB schema (four new tables under sp_PublishGeneration); F - Admin UI scripting tab (Monaco + test harness + dependency preview + script-log viewer + historian diagnostics); G - address-space integration (extend EquipmentNodeWalker for virtual source kind + extend DriverNodeManager dispatch); H - exit gate.

Compliance-check surface covers sandbox escape (typeof/Assembly.Load/File/HttpClient attempts must fail at compile), dependency inference (literal-only paths), change cascade (topological ordering), cycle rejection at publish, startup recovery (ack/confirm/shelve survive restart but ActiveState recomputed), ack audit trail persistence, historian queue durability (Galaxy.Host offline → online drains in-order), per-alarm historian toggle gating, script timeout isolation, log sink isolation, ACL binding (virtual tags inherit Equipment scope grants).

Follow-up artifacts tracked as tasks #231-#238 (stream placeholders). Supporting doc updates (plan.md §6 Migration Strategy, config-db-schema.md §§ for the four new tables, driver-specs.md §Alarm semantics clarification, new ADR-002 for driver-vs-virtual dispatch) will land alongside the streams that touch them, not in this doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:05:12 -04:00
Joseph Doherty
8d88ffa14d FOCAS Tier-C PR E — ops glue: ProcessHostLauncher + post-mortem MMF + NSSM install scripts + doc close-out. Final of the 5 PRs for #220. With this landing, the Tier-C architecture is fully shipped; the only remaining FOCAS work is the hardware-dependent FwlibHostedBackend (real Fwlib32.dll P/Invoke, gated on #222 lab rig).
Production IHostProcessLauncher (ProcessHostLauncher.cs): Process.Start spawns OtOpcUa.Driver.FOCAS.Host.exe with OTOPCUA_FOCAS_PIPE / OTOPCUA_ALLOWED_SID / OTOPCUA_FOCAS_SECRET / OTOPCUA_FOCAS_BACKEND in the environment (supervisor-owned, never disk), polls FocasIpcClient.ConnectAsync at 250ms cadence until the pipe is up or the Host exits or the ConnectTimeout deadline passes, then wraps the connected client in an IpcFocasClient. TerminateAsync kills the entire process tree + disposes the IPC stream. ProcessHostLauncherOptions carries HostExePath + PipeName + AllowedSid plus optional SharedSecret (auto-generated from a GUID when omitted so install scripts don't have to), Arguments, Backend (fwlib32/fake/unconfigured default-unconfigured), ConnectTimeout (15s), and Series for CNC pre-flight.

Post-mortem MMF (Host/Stability/PostMortemMmf.cs + Proxy/Supervisor/PostMortemReader.cs): ring-buffer of the last ~1000 IPC operations written by the Host into a memory-mapped file. On a Host crash the supervisor reads the MMF — which survives process death — to see what was in flight. File format: 16-byte header [magic 'OFPC' (0x4F465043) | version | capacity | writeIndex] + N × 256-byte entries [8-byte UTC unix ms | 8-byte opKind | 240-byte UTF-8 message + null terminator]. Magic distinguishes FOCAS MMFs from the Galaxy MMFs that ship the same format shape. Writer is single-producer (Host) with a lock_writeGate; reader is multi-consumer (Proxy + any diagnostic tool) using a separate MemoryMappedFile handle.

NSSM install wrappers (scripts/install/Install-FocasHost.ps1 + Uninstall-FocasHost.ps1): idempotent service registration for OtOpcUaFocasHost. Resolves SID from the ServiceAccount, generates a fresh shared secret per install if not supplied, stages OTOPCUA_FOCAS_PIPE/SID/SECRET/BACKEND in AppEnvironmentExtra so they never hit disk, rotates 10MB stdout/stderr logs under %ProgramData%\OtOpcUa, DependOnService=OtOpcUa so startup order is deterministic. Backend selector defaults to unconfigured so a fresh install doesn't accidentally load a half-configured Fwlib32.dll on first start.

Tests (7 new, 2 files): PostMortemMmfTests.cs in FOCAS.Host.Tests — round-trip write+read preserves order + content, ring-buffer wraps at capacity (writes 10 entries to a 3-slot buffer, asserts only op-7/8/9 survive in FIFO order), message truncation at the 240-byte cap is null-terminated + non-overflowing, reopening an existing file preserves entries. PostMortemReaderCompatibilityTests.cs in FOCAS.Tests — hand-writes a file in the exact host format (magic/entry layout) + asserts the Proxy reader decodes with correct ring-walk ordering when writeIndex != 0, empty-return on missing file + magic mismatch. Keeps the two codebases in format-lockstep without the net10 test project referencing the net48 Host assembly.

Docs updated: docs/v2/implementation/focas-isolation-plan.md promoted from DRAFT to PRs A-E shipped status with per-PR citations + post-ship test counts (189 + 24 + 13 = 226 FOCAS-family tests green). docs/drivers/FOCAS-Test-Fixture.md §5 updated from "architecture scoped but not implemented" to listing the shipped components with the FwlibHostedBackend gap explicitly labeled as hardware-gated. Install-FocasHost.ps1 documents the OTOPCUA_FOCAS_BACKEND selector + points at docs/v2/focas-deployment.md for Fwlib32.dll licensing.

What ISN'T in this PR: (1) the real FwlibHostedBackend implementing IFocasBackend with the P/Invoke — requires either a CNC on the bench or a licensed FANUC developer kit to validate, tracked under #220 as a single follow-up task; (2) Admin /hosts surface integration for FOCAS runtime status — Galaxy Tier-C already has the shape, FOCAS can slot in when someone wires ObservedCrashes/StickyAlertActive/BackoffAttempt to the FleetStatusHub; (3) a full integration test that actually spawns a real FOCAS Host process — ProcessHostLauncher is tested via its contract + the MMF is tested via round-trip, but no test spins up the real exe (the Galaxy Tier-C tests do this, but the FOCAS equivalent adds no new coverage over what's already in place).

Total FOCAS-family tests green after this PR: 189 driver + 24 Shared + 13 Host = 226.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:24:13 -04:00
Joseph Doherty
a6be2f77b5 FOCAS version-matrix stabilization (PR 1 of #220 split) — ship the cheap half of the hardware-free stability gap ahead of the Tier-C out-of-process split. Without any CNC or simulator on the bench, the highest-leverage move is to catch operator config errors at init time instead of at steady-state per-read. Adds FocasCncSeries enum (Unknown/16i/0i-D/0i-F family/30i family/PowerMotion-i) + FocasCapabilityMatrix static class that encodes the per-series documented ranges for macro variables (cnc_rdmacro/wrmacro), parameters (cnc_rdparam/wrparam), and PMC letters + byte ceilings (pmc_rdpmcrng/wrpmcrng) straight from the Fanuc FOCAS Developer Kit. FocasDeviceOptions gains a Series knob (defaults Unknown = permissive so pre-matrix configs don't break on upgrade). FocasDriver.InitializeAsync now calls FocasAddress.TryParse on every tag + runs FocasCapabilityMatrix.Validate against the owning device's declared series, throwing InvalidOperationException with a reason string that names both the series and the documented limit ("Parameter #30000 is outside the documented range [0, 29999] for Thirty_i") so an operator can tell whether the mismatch is in the config or in their declared CNC model. Unknown series skips validation entirely. Ships 46 new theory cases in FocasCapabilityMatrixTests.cs — covering every boundary in the matrix (widen 16i->0i-F: macro ceiling 999->9999, param 9999->14999; widen 0i-F->30i: PMC letters +K+T; PMC-number 16i=999/0i-D=1999/0i-F=9999/30i=59999), permissive Unknown-series behavior, rejection-message content, and case-insensitive PMC-letter matching. Widening a range without updating docs/v2/focas-version-matrix.md fails a test because every InlineData cites the row it reflects. Full FOCAS test suite stays at 165/165 passing (119 existing + 46 new). Also authors docs/v2/focas-version-matrix.md as the authoritative range reference with per-function citations, CNC-series era context, error-surface shape, and the link back to the matrix code; docs/v2/implementation/focas-isolation-plan.md as the multi-PR plan for #220 Tier-C isolation (Shared contracts -> Host skeleton -> move Fwlib32 calls -> Supervisor+respawn -> MMF+ops glue, 2200-3200 LOC across 5 PRs mirroring the Galaxy Tier-C topology); and promotes docs/drivers/FOCAS-Test-Fixture.md from "version-matrix coverage = no" to explicit coverage via the new test file + cross-links to the matrix and isolation-plan docs. Leaves task #220 open since isolation itself (the expensive half) is still ahead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:44:37 -04:00
Joseph Doherty
58a0cccc67 AB CIP Logix Emulate golden-box tier — scaffold the code + docs so the L5X + Emulate PC drop in without fixture-code changes. Closes the initial design question the user raised; the actual Emulate-side work (author project, commit L5X, install Emulate on the dev box) is tracked as #223. Scaffolding ships everything that doesn't need the live Emulate instance: tier-gated test classes that skip cleanly when AB_SERVER_PROFILE is unset, the profile gate helper, the LogixProject/README.md documenting the exact project state the tests expect, the fixture coverage doc's new §Logix Emulate tier section with the when-to-trust table extended from 3 columns to 4, and the dev-environment.md integration-host row.
AbServerProfileGate — static helper that reads `AB_SERVER_PROFILE` env var (defaults to "abserver") + exposes `SkipUnless(params string[] requiredProfiles)` matching the MODBUS_SIM_PROFILE pattern the DL205StringQuirkTests uses one directory over. Emulate-only tests call `AbServerProfileGate.SkipUnless("emulate")` at the top of each fact body; ab_server-default runs see them skip with a clear message pointing at the Emulate setup steps.

AbCipEmulateUdtReadTests — one test proving the #194 whole-UDT read optimization works against the real Logix Template Object, not just the golden byte buffers the unit suite uses. Builds an `AbCipDriverOptions` with a Structure tag `Motor1 : Motor_UDT` that has three declared members (Speed : DINT, Torque : REAL, Status : DINT), reads them via the `.Speed / .Torque / .Status` dotted-tag syntax, asserts the driver gets the grouped whole-UDT path + decodes each at the right offset. Required seed values documented inline + in LogixProject/README.md: Speed=1800, Torque=42.5f, Status=0x0001.

AbCipEmulateAlmdTests — one test proving the #177 ALMD projection fires `OnAlarmEvent` when a real ALMD instruction's `In` edge rises, not just the fake `InFaulted` timer edges the unit suite drives. Needs a `SimulateAlarm : BOOL` tag routed through `MainRoutine` ladder (`XIC SimulateAlarm OTE HighTempAlarm.In`) so the test case can pulse the input via the existing `IWritable.WriteAsync` path instead of scripting Emulate via its own socket. Alarm-projection options carry `EnableAlarmProjection = true` + 200 ms poll interval; a `TaskCompletionSource` gates the raise-event assertion with a 5 s deadline. Cleanup writes SimulateAlarm=false so consecutive runs start from known state.

LogixProject/README.md — the Studio 5000 project state the Emulate-tier tests depend on. Explains why L5X over ACD (text diff, reproducible import, no per-install state), the UDT + tag + routine structure, how to bring it up on the Emulate PC. Ships as a stub pending actual author + L5X export + commit; the README itself keeps the requirements visible so the L5X author has a checklist.

docs/drivers/AbServer-Test-Fixture.md — new §Logix Emulate golden-box tier section with the coverage-promotion table (ab_server / Emulate / hardware per gap), the setup-env-var recipe, the costs to accept (license, Hyper-V conflict, manual lifecycle). "When to trust" table extended from 3 columns (ab_server / unit / rig) to 4 (ab_server / unit / Logix Emulate / rig); two new rows for EtherNet/IP embedded-switch + redundant-chassis failover that even Emulate can't help with. Follow-up candidates list gets Logix Emulate as option 1 ahead of the pre-existing "extend ab_server upstream" + "stand up a lab rig". See-also file list gains AbServerProfileGate.cs + Docker/ + Emulate/ + LogixProject/README.md entries.

docs/v2/dev-environment.md — §C Integration host gains a Rockwell Studio 5000 Logix Emulate row: purpose (AB CIP golden-box tier closing UDT/ALMD/AOI/safety/ConnectionSize gaps), type (Windows-only, Hyper-V conflict matching TwinCAT XAR's constraint), port 44818, credentials note, owner split between integration-host admin for license+install and developer for per-session runtime start.

Verified: Emulate tests skip cleanly when AB_SERVER_PROFILE is unset — both `[SKIP]` with the operator-facing message pointing at the env-var setup. Whole-solution build 0 errors. Tests will transition from skip → pass once the L5X + Emulate PC land per #223.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:54:39 -04:00
Joseph Doherty
fdb268cee0 Docs + code-comment sweep — remove stale Pymodbus/ + PythonSnap7/ + LocateBinary references left behind by the native-fallback removal PR. Answer to "is the dev inventory + documentation updated": it was partial; this PR finishes the job.
Files touched — docs/drivers/Modbus-Test-Fixture.md dropped the key-files pointer at deleted Pymodbus/ + flipped "primary launcher is Docker, native fallback retained" framing to "Docker is the only supported launch path" (matching the code). docs/v2/dev-environment.md dropped the "skips both Docker + native-binary paths" parenthetical from AB_SERVER_ENDPOINT + flipped the "Native fallbacks" subsection to a one-liner that says Docker is the only supported path. docs/v2/modbus-test-plan.md rewrote §Harness from "pip install pymodbus + serve.ps1" setup pattern to "docker compose --profile <…> up" + updated the §PR 43 status bullet to point at Docker/profiles/. docs/v2/test-data-sources.md §"CI fixture (task #180)" rewrote the AB CIP section from "LocateBinary() picks binary off PATH" + GitHub Actions zip-download step to "Docker is the only supported reproducible build path" + docker compose GitHub Actions step; dropped the pinned-version SHA256 table + lock-file reference because the Dockerfile's LIBPLCTAG_TAG build-arg is the new pin.

Code docstrings + error messages — these are developer-facing operational text too. ModbusSimulatorFixture SkipReason strings (both branches) now point at `docker compose -f Docker/docker-compose.yml --profile standard up -d` instead of the deleted `Pymodbus\serve.ps1`; doc-comment at the top references Docker/docker-compose.yml. Snap7ServerFixture SkipReason strings + doc-comment point at Docker/docker-compose.yml instead of PythonSnap7/serve.ps1. S7_1500Profile.cs docstring updated. Modbus Dockerfile comment pointing at deleted tests/.../Pymodbus/README.md redirected to docs/drivers/Modbus-Test-Fixture.md. DL205Profile.cs + DL205StringQuirkTests.cs + S7_1500Profile.cs (in Modbus project) docstrings flipped from Pymodbus/*.json references to Docker/profiles/*.json.

Left untouched deliberately: docs/v2/implementation/exit-gate-phase-2-closed.md — that's a historical as-of-2026-04-18 snapshot documenting what was skipped at Phase 2 closure; rewriting would lose the date-stamped context. Its "oitc/modbus-server Docker container not started" + "ab_server binary not on PATH" lines describe the fixture landscape that existed at close time, not current operational guidance.

Final sweep confirms zero remaining `Pymodbus/` / `PythonSnap7/` / `LocateBinary` / `AbServerSeedTag` / `BuildCliArgs` / `AbServerPlcArg` mentions anywhere in tracked files outside that historical exit-gate doc. Whole-solution build still 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:36:19 -04:00
Joseph Doherty
0e1dcc119e Remove native-launcher fallbacks for the four Dockerized fixtures — Docker is the only supported path for Modbus / S7 / AB CIP / OpcUaClient integration. Native paths stay in place only where Docker isn't compatible (Galaxy: MXAccess COM + Windows-only; TwinCAT: Beckhoff runtime vs Hyper-V; FOCAS: closed-source Fanuc Fwlib32.dll; AB Legacy: PCCC has no OSS simulator). Simplifies the fixture landscape + removes the "which path do I run" ambiguity; removes two full native-launcher directories + the AB CIP native-spawn path; removes the parallel profile-as-CLI-arg-builder code from AbServerFixture.
Modbus — deletes tests/.../Modbus.IntegrationTests/Pymodbus/ (serve.ps1, standard.json, dl205.json, mitsubishi.json, s7_1500.json, README.md). Profile JSONs live only under Docker/profiles/ now. Docker/README.md loses its "Native-Python fallback" section; docs/drivers/Modbus-Test-Fixture.md "What the fixture is" bullet flipped from "primary launcher is Docker, native fallback under Pymodbus/" to "Docker is the only supported launch path".

S7 — deletes tests/.../S7.IntegrationTests/PythonSnap7/ (server.py, s7_1500.json, serve.ps1, README.md). Docker/README.md loses "Native-Python fallback"; docs/drivers/S7-Test-Fixture.md updated to match.

AB CIP — the biggest simplification because the native-binary spawn had the most code. AbServerFixture.cs rewrites: drops Process management (no more Process _proc + Kill/WaitForExit), drops LocateBinary() PATH lookup, drops the IAsyncLifetime initialize-spawns-server behavior. Fixture is now a thin TCP probe against localhost:44818 (or AB_SERVER_ENDPOINT override) — same shape as Snap7ServerFixture / ModbusSimulatorFixture / OpcPlcFixture. IsServerAvailable() simplifies to a single 500 ms probe. AbServerProfile.cs drops AbServerPlcArg + SeedTags + BuildCliArgs + ToCliSpec + the entire AbServerSeedTag record — the compose file is the canonical source of truth for which tags + which --plc mode each family gets; the profile record now carries just Family + ComposeProfile (matches the docker-compose service key) + Notes. KnownProfiles.ForFamily + .All stay for tests that iterate families. AbServerProfileTests.cs rewrites to match: drops BuildCliArgs_* + ToCliSpec_* + SeedTags_* tests; keeps the family-coverage contract tests + verifies the ComposeProfile strings match compose-file service names (a typo in either surfaces as a unit-test failure, not a silent "wrong family booted" at runtime). Docker/README.md loses "Native-binary fallback" section; docs/drivers/AbServer-Test-Fixture.md "What the fixture is" flipped to Docker-only with clearer skip rules.

dev-environment.md §Docker fixtures — the "Native fallbacks" subsection goes away; replaced with a one-line note that Docker is the only supported path for these four fixtures + a fresh clone needs Docker Desktop and nothing else.

Verified: whole-solution build 0 errors, AB CIP profile unit tests 6/6, AB CIP Docker smoke 4/4 (all family theory rows), S7 Docker smoke 3/3. Container lifecycle clean. The deleted native code surface was already redundant — every fixture the native paths served is now covered by Docker; keeping them invited drift between the two paths (the original AB CIP native profile had three undetected bugs per the #162 commit message: case-sensitive --plc, bracket tag notation, --path=1,0 requirement — noise the Docker path now avoids by never running the buggy code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:27:44 -04:00
Joseph Doherty
6609141493 Dockerize Modbus + AB CIP + S7 test fixtures for reproducibility. Every driver integration simulator now has a pinned Docker image alongside the existing native launcher — Docker is the primary path, native fallbacks kept for contributors who prefer them. Matches the already-Dockerized OpcUaClient/opc-plc pattern from #215 so every fixture in the fleet presents the same compose-up/test/compose-down loop. Reproducibility gain: what used to require a local pip/Python install (Modbus pymodbus, S7 python-snap7) or a per-OS C build from source (AB CIP ab_server from libplctag) now collapses to a Dockerfile + docker compose up. Modbus — new tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/Docker/ with Dockerfile (python:3.12-slim-bookworm + pymodbus[simulator]==3.13.0) + docker-compose.yml with four compose profiles (standard / dl205 / mitsubishi / s7_1500) backed by the existing profile JSONs copied under Docker/profiles/ as canonical; native fallback in Pymodbus/ retained with the same JSON set (symlink-equivalent — manual re-sync when profiles change, noted in both READMEs). Port 5020 unchanged so MODBUS_SIM_ENDPOINT + ModbusSimulatorFixture work without code change. Dropped the --no_http CLI arg the old serve.ps1 + compose draft passed — pymodbus 3.13 doesn't recognize it; the simulator's http ui just binds inside the container where nothing maps it out and costs nothing. S7 — new tests/ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests/Docker/ with Dockerfile (python:3.12-slim-bookworm + python-snap7>=2.0) + docker-compose.yml with one s7_1500 compose profile; copies the existing server.py shim + s7_1500.json seed profile; runs python -u server.py ... --port 1102. Native fallback in PythonSnap7/ retained. Port 1102 unchanged. AB CIP — hardest because ab_server is a source-only C tool in libplctag's src/tools/ab_server/. New tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/Docker/ Dockerfile is multi-stage: build stage (debian:bookworm-slim + build-essential + cmake) clones libplctag at a pinned tag + cmake --build build --target ab_server; runtime stage (debian:bookworm-slim) copies just the binary from /src/build/bin_dist/ab_server. docker-compose.yml ships four compose profiles (controllogix / compactlogix / micro800 / guardlogix) with per-family ab_server CLI args matching AbServerProfile.cs. AbServerFixture updated: tries TCP probe on 127.0.0.1:44818 first (Docker path) + spawns the native binary only as fallback when no listener is there. AB_SERVER_ENDPOINT env var supported for pointing at a real PLC. AbServerFact/Theory attributes updated to IsServerAvailable() which accepts any of: live listener on 44818, AB_SERVER_ENDPOINT set, or binary on PATH. Required two CLI-compat fixes to ab_server's argument expectations that the existing native profile never caught because it was never actually run at CI: --plc is case-sensitive (ControlLogix not controllogix), CIP tags need [size] bracket notation (DINT[1] not bare DINT), ControlLogix also requires --path=1,0. Compose files carry the corrected flags; the existing native-path AbServerProfile.cs was never invoked in practice so we don't rewrite it here. Micro800 now uses the --plc=Micro800 mode rather than falling back to ControlLogix emulation — ab_server does have the dedicated mode, the old Notes saying otherwise were wrong. Updated docs: three fixture coverage docs (Modbus-Test-Fixture.md, S7-Test-Fixture.md, AbServer-Test-Fixture.md) flip their "What the fixture is" section from native-only to Docker-primary-with-native-fallback; dev-environment.md §Resource Inventory replaces the old ambiguous "Docker Desktop + ab_server native" mix with four per-driver rows (each listing the image, compose file, compose profiles, port, credentials) + a new Docker fixtures — quick reference subsection giving the one-line docker compose -f <…> --profile <…> up for each driver + the env-var override names + the native fallback install recipes. drivers/README.md coverage map table updated — Modbus/AB CIP/S7 entries now read "Dockerized …" consistent with OpcUaClient's line. Verified end-to-end against live containers: Modbus DL205 smoke 1/1, S7 3/3, AB CIP ControlLogix 4/4 (all family theory rows). Container lifecycle clean (up/test/down, no leaked state). Every fixture keeps its skip-when-absent probe + env-var endpoint override so dotnet test on a fresh clone without Docker running still gets a green run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:09:44 -04:00
Joseph Doherty
1ddc13b7fc ADR-001 accepted — Option A (Config-primary walker); Option D (discovery-assist) deferred to v2.1. Spawning Task A + Task B. 2026-04-20 02:30:57 -04:00
Joseph Doherty
97e1f55bbb Draft ADR-001 — Equipment node walker: how driver tags bind to the UNS address space. Frames the decision blocking task #195 (IdentificationFolderBuilder wire-in): the Equipment-namespace browse tree requires a Config-DB-driven walker that traverses UNS → Equipment → Tag + hangs Identification sub-folders + identifier properties, and the open question is how driver-discovered tags bind to the UNS Equipment nodes the walker materializes. Context section documents what already exists (IdentificationFolderBuilder unused; NodeScopeResolver at Phase-1 cluster-only stub; Equipment + UnsArea + UnsLine + Tag tables with decisions #110 #116 #117 #120 #121 already landed as the data-model contract) vs what's missing (the walker itself + the ITagDiscovery/Config-DB composition strategy). Four options laid out with trade-offs: Option A Config-primary (Tag rows are the sole source of truth; ITagDiscovery becomes enrichment; BadNotFound placeholder when driver can't address a declared tag); Option B Discovery-primary (driver output is authoritative; Config-DB Equipment rows select subsets); Option C Parallel namespaces (driver-native ns + UNS overlay ns cross-referencing via OPC UA Organizes); Option D Config-primary-with-discovery-assist (same as A at runtime, plus an Admin UI offline discovery panel that lets operators one-click-import discovered tags into the draft). Recommendation: Option A now, defer Option D to v2.1. Reasons: matches decision #110's framing straight-through, identical composition across every Equipment-kind driver, Phase 6.4 Admin UI already authors Tag rows, BadNotFound is a legible failure mode, and nothing in A blocks adding D later without changing the walker contract. If the ADR is accepted, spawns two tasks: Task A builds EquipmentNodeWalker in Core.OpcUa (cluster → namespace → area → line → equipment → tag traversal, IdentificationFolderBuilder per Equipment, 5 identifier properties, BadNotFound placeholders, integration tests); Task B extends NodeScopeResolver to join against Config DB + populate full NodeScope path (unblocks per-Equipment/per-UnsLine ACL granularity + closes task #195 with the ACL integration test from the builder's docstring cross-reference). Consequences-if-we-don't-decide section captures the status quo: Identification metadata ships in DB + Admin UI but never reaches the OPC UA endpoint, external consumers can't resolve equipment via OPC UA properties as decision #121 promises, and NodeScopeResolver stays cluster-level so finer ACL grants are effectively cluster-wide at dispatch (Phase 6.2 rollout limitation, not correctness bug). Draft status — seeking decision before spawning the two implementation tasks. If accepted I'll add the tasks + start on Task A.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:28:10 -04:00
Joseph Doherty
33b87a3aa4 Phase 2 official close-out. Closes task #209. The 2026-04-18 exit-gate-phase-2-final.md captured Phase 2 state at PR 2 merge — four High/Medium adversarial findings still OPEN, Historian port + alarm subsystem + v1 archive deletion all deferred. Since then: PR 4 closed all four findings end-to-end (High 1 Read subscription-leak, High 2 no reconnect loop, Medium 3 SubscribeAsync doesn't push frames, Medium 4 WriteValuesAsync doesn't await OnWriteComplete — mapped + resolved inline in the new doc), PR 12 landed the richer historian quality mapper, PR 13 shipped GalaxyRuntimeProbeManager with per-Platform/AppEngine ScanState subscriptions + StateChanged events forwarded through the existing OnHostStatusChanged IPC frame, PR 14 wired the alarm subsystem (GalaxyAlarmTracker advising the four alarm-state attributes per IsAlarm=true attribute, raising AlarmTransition events forwarded through OnAlarmEvent IPC frames), Phase 3 PR 18 deleted the v1 source trees, and PR 61 closed V1_ARCHIVE_STATUS.md. Phase 2 is functionally done; this commit is the bookkeeping pass. New exit-gate-phase-2-closed.md at docs/v2/implementation/ — five-stream status table (A/B/C/D/E all complete with the specific close commits named), full resolution table for every 2026-04-18 adversarial finding mapped to the PR 4 resolution, cross-cutting deferrals table marking every one resolved (Historian SDK plugin port → done, subscription push frames → done under Medium 3, Historian-backed HistoryRead → done, alarm subsystem wire-up → done, reconnect-without-recycle → done under High 2, v1 archive deletion → done). Fresh 2026-04-20 test baseline captured from the current v2 tip: 1844 passing + 29 infra-gated skips across 21 test projects, including the net48 x86 Galaxy.Host.Tests suite (107 pass) that exercises the MXAccess COM path on the dev box. Flake observed — Configuration.Tests 70/71 on first full-solution run, 71/71 on retry; logged as a known non-stable flake rather than chased because it did not reproduce. The prior exit-gate-phase-2-final.md is kept in place (historical record of the 2026-04-18 snapshot) but gets a superseded-by banner at the top pointing at the new close-out doc so future readers land on current status first. docs/v2/plan.md Phase 2 section header gains the CLOSED 2026-04-20 marker + a link to the close-out doc so the top-level plan index reflects reality. "What Phase 2 closed means for Phase 3 and later" section in the new doc captures the downstream contract: Galaxy now runs as a first-class v2 driver with the same capability-interface shape as Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient; no v1 code path remains; the 2026-04-13 stability findings persist as named regression tests under tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/StabilityFindingsRegressionTests.cs so any future refactor reintroducing them trips the test. "Outstanding — not Phase 2 blockers" section lists the four pending non-Phase-2 tasks (#177, #194, #195, #199) so nobody mistakes them for Phase 2 tail work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 02:00:35 -04:00
Joseph Doherty
8ce5791f49 Pin libplctag ab_server to v2.6.16 — real release tag + SHA256 hashes for all three Windows arches. Closes the "pick a current version + pin" deferral left by the #180 PR docs stub. Verified the release lands ab_server.exe inside libplctag_2.6.16_windows_<arch>_tools.zip alongside plctag.dll + list_tags_* helpers by downloading each tools zip + unzip -l'ing to confirm ab_server.exe is present at 331264 bytes. New ci/ab-server.lock.json is the single source of truth — one file the CI YAML reads via ConvertFrom-Json instead of duplicating the hash across the workflow + the docs. Structure: repo (libplctag/libplctag) + tag (v2.6.16) + published date (2026-03-29) + assets keyed by platform (windows-x64 / windows-x86 / windows-arm64) each carrying filename + sha256. docs/v2/test-data-sources.md §2.CI updated — replaces the prior placeholder (ver = '<pinned libplctag release tag>', expected = '<pinned sha256>') with the real v2.6.16 + 9b78a3de... hashes pinned table, and replaces the hardcoded URL with a lockfile-driven pwsh step that picks windows-x64 by default but swaps to x86/arm64 by changing one line for non-x64 CI runners. Hash-mismatch path throws with both the expected + actual values so on the first drift the CI log tells the maintainer exactly what to update in the lockfile. Two verification notes from the release fetch: (1) libplctag v2.6.16 tools zips ship ab_server.exe + plctag.dll together — tests don't need a separate libplctag NuGet download for the integration path, the extracted tools dir covers both the simulator + the driver's native dependency; (2) the three Windows arches all carry ab_server.exe, so ARM64 Windows GitHub runners (when they arrive) can run the integration suite without changes beyond swapping the asset key. No code changes in this PR — purely docs + the new lockfile. Admin tests + Core tests unchanged + passing per the prior commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 00:04:35 -04:00
Joseph Doherty
32dff7f1d6 ab_server integration fixture — per-family profiles + documented CI-fetch contract. Closes task #180 (AB CIP follow-up — ab_server CI fixture). Replaces the prior hardcoded single-family fixture with a parametric AbServerProfile abstraction covering ControlLogix / CompactLogix / Micro800 / GuardLogix. Prebuilt-Windows-binary fetch is documented as a CI YAML step rather than fabricated C#-side, because SHA-pinned binary distribution is a CI workflow concern (libplctag owns releases, we pin a version + verify hash) not a test-framework concern. New AbServerProfile record + KnownProfiles static class at tests/.../AbServerProfile.cs. Four profiles: ControlLogix (widest coverage — DINT/REAL/BOOL/SINT/STRING atomic + DINT[16] array so the driver's @tags Symbol-Object decoder + array-bound path both get end-to-end coverage), CompactLogix (atomic subset — driver-side ConnectionSize quirk from PR 10 still applies since ab_server doesn't enforce the narrower limit), Micro800 (ab_server has no dedicated --plc micro800 mode — falls back to controllogix while driver-side path enforces empty routing + unconnected-only per PR 11; real Micro800 coverage requires a 2080 lab rig), GuardLogix (ab_server has no safety subsystem — profile emulates the _S-suffixed naming contract the driver's safety-ViewOnly classification reads in PR 12; real safety-lock behavior requires a 1756-L8xS physical rig). Each profile composes --plc + --tag args via BuildCliArgs(port) — pure string formatter so the composition logic is unit-testable without launching the simulator. AbServerFixture gains a ctor overload taking AbServerProfile + port (defaults back to ControlLogix on parameterless ctor so existing test suites keep compiling). Fixture's InitializeAsync hands the profile's CLI args to ProcessStartInfo.Arguments. New AbServerTheoryAttribute mirrors AbServerFactAttribute but extends TheoryAttribute so a single test can MemberData over KnownProfiles.All + cover all four families. AbCipReadSmokeTests converted from single-fact to theory parametrized over KnownProfiles.All — one row per family reads TestDINT + asserts Good status + Healthy driver state. Fixture lifecycle is explicit try/finally rather than await using because IAsyncLifetime.DisposeAsync returns ValueTask + xUnit's concrete IAsyncDisposable shim depends on xunit version; explicit beats implicit here. Eight new unit tests in AbServerProfileTests.cs (runs without the simulator so CI green even when the binary is absent): BuildCliArgs composes port + plc + tag flags in the documented order; empty seed-tag list still emits port + plc; SeedTag.ToCliSpec handles both 2-segment scalar + 3-segment array; KnownProfiles.ForFamily returns expected --plc arg for every family (verifies Micro800 + GuardLogix both fall back to controllogix); KnownProfiles.All covers every AbCipPlcFamily enum value (regression guard — adding a new family without a profile fails this test); ControlLogix seeds every atomic type the driver supports; GuardLogix seeds at least one _S-suffixed safety tag. Integration tests still skip cleanly when ab_server isn't on PATH. 11/11 unit tests passing in this project (8 new + 3 prior). Full Admin solution builds 0 errors. docs/v2/test-data-sources.md gets a new "CI fixture" subsection under §2.Gotchas with the exact GitHub Actions YAML step — fetch the pinned libplctag release, SHA256-verify against a pinned hash recorded in the repo's CI lockfile (drift = fail closed), extract, append to PATH. The C# harness stays PATH-driven so dev-box installs (cmake + make from source) work identically to CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:57:24 -04:00
Joseph Doherty
e71f44603c v2 release-readiness — blocker #3 closed; all three code-path blockers shut
Phase 6.3 Streams A + C core shipped (PRs #98-99):
- RedundancyCoordinator + ClusterTopologyLoader read the shared config DB +
  enforce the Phase 6.3 invariants (1-2 nodes, unique ApplicationUri, ≤1
  Primary in Warm/Hot). Startup fails fast on violation.
- RedundancyStatePublisher orchestrates topology + apply lease + recovery
  state + peer reachability through ServiceLevelCalculator. Edge-triggered
  OnStateChanged + OnServerUriArrayChanged events the OPC UA variable-node
  layer subscribes to.

Doc updates:
- Top status flips from NOT YET RELEASE-READY → RELEASE-READY (code-path).
  Remaining work is manual (client interop matrix, deployment signoff,
  OPC UA CTT pass) + hardening follow-ups that don't block v2 GA ship.
- Release-blocker #3 section struck through + CLOSED with PR links.
  Remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA
  variable-node binding, sp_PublishGeneration lease wrap, client interop)
  explicitly listed as hardening follow-ups.
- Change log: new dated entry.

All three release blockers identified at the capstone are closed:
- #1 Phase 6.2 dispatch wiring  → PR #94 (2026-04-19)
- #2 Phase 6.1 Stream D wiring  → PR #96 (2026-04-19)
- #3 Phase 6.3 Streams A/C core → PRs #98-99 (2026-04-19)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:33:37 -04:00
Joseph Doherty
a8401ab8fd v2 release-readiness — blocker #2 closed; doc reflects state
PR #96 closed the Phase 6.1 Stream D config-cache wiring blocker.

- Status line: "one of three release blockers remains".
- Blocker #2 struck through + CLOSED with PR link. Periodic-poller + richer-
  snapshot-payload follow-ups downgraded to hardening.
- Change log: dated entry.

One blocker remains: Phase 6.3 Streams A/C/F redundancy runtime (tasks
#145, #147, #150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:16:31 -04:00
Joseph Doherty
ba42967943 v2 release-readiness — blocker #1 closed; doc reflects state
PR #94 closed the Phase 6.2 dispatch wiring blocker. Update the dashboard:
- Status line: "two of three release blockers remain".
- Release-blocker #1 section struck through + marked CLOSED with PR link.
  Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-
  grained scope resolution) downgraded to hardening follow-ups — not
  release-blocking.
- Change log: new dated entry.

Two remaining blockers: Phase 6.1 Stream D config-cache wiring (task #136)
+ Phase 6.3 Streams A/C/F redundancy runtime (tasks #145, #147, #150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:04:30 -04:00
Joseph Doherty
3b2d0474a7 v2 release-readiness capstone — aggregate compliance runner + release-readiness dashboard
Closes out Phase 6 with the two pieces a release engineer needs before
tagging v2 GA:

1. scripts/compliance/phase-6-all.ps1 — meta-runner that invokes every
   per-phase Phase 6.N compliance script in sequence + aggregates results.
   Each sub-script runs in its own powershell.exe child process so per-script
   $ErrorActionPreference + exit semantics can't interfere with the parent.
   Exit 0 = every phase passes; exit 1 = one or more phases failed. Prints a
   PASS/FAIL summary matrix at the end.

2. docs/v2/v2-release-readiness.md — single-view dashboard of everything
   shipped + everything still deferred + release exit criteria. Called out
   explicitly:
   - Three release BLOCKERS (must close before v2 GA):
     * Phase 6.2 Stream C dispatch wiring — AuthorizationGate exists but no
       DriverNodeManager Read/Write/etc. path calls it (task #143).
     * Phase 6.1 Stream D follow-up — ResilientConfigReader + sealed-cache
       hook not yet consumed by any read path (task #136).
     * Phase 6.3 Streams A/C/F — coordinator + UA-node wiring + client
       interop still deferred (tasks #145, #147, #150).
   - Three nice-to-haves (not release-blocking) — Admin UI polish, background
     services, multi-host dispatch.
   - Release exit criteria: all 4 compliance scripts exit 0, dotnet test ≤ 1
     known flake, blockers closed or v2.1-deferred with written decision,
     Fleet Admin signoff on deployment checklist, live-Galaxy smoke test,
     OPC UA CTT pass, redundancy cutover validated with at least one
     production client.
   - Change log at the bottom so future ships of deferred follow-ups just
     append dates + close out dashboard rows.

Meta-runner verified locally:
  Phase 6.1 — PASS
  Phase 6.2 — PASS
  Phase 6.3 — PASS
  Phase 6.4 — PASS
  Aggregate: PASS (elapsed 340 s — most of that is the full solution
  `dotnet test` each phase runs).

Net counts at capstone time: 906 baseline → 1159 passing across Phase 6
(+253). 15 deferred follow-up tasks tracked with IDs (#134-137, #143-144,
#145, #147, #149-150, #153, #155-157). v2 is NOT YET release-ready —
capstone makes that explicit rather than letting the "shipped" label on
each phase imply full readiness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:32:21 -04:00
Joseph Doherty
99cf1197c5 Phase 6.4 exit gate — compliance real-checks + phase doc = SHIPPED (data layer)
scripts/compliance/phase-6-4-compliance.ps1 turns stub TODOs into 11 real
checks covering:
- Stream A data layer: UnsImpactAnalyzer + DraftRevisionToken + cross-cluster
  rejection (decision #82) + all three move kinds (LineMove / AreaRename /
  LineMerge).
- Stream B data layer: EquipmentCsvImporter + version marker
  '# OtOpcUaCsv v1' + decision-#117 required columns + decision-#139
  optional columns including DeviceManualUri + duplicate-ZTag rejection +
  unknown-column rejection.

Four [DEFERRED] surfaces tracked explicitly with task IDs:
  - Stream A UI drag/drop (task #153)
  - Stream B staging + finalize + UI (task #155)
  - Stream C DiffViewer refactor (task #156)
  - Stream D OPC 40010 Identification sub-folder + Razor component (task #157)

Cross-cutting: full solution dotnet test passes 1159 >= 1137 pre-Phase-6.4
baseline; pre-existing Client.CLI Subscribe flake tolerated.

docs/v2/implementation/phase-6-4-admin-ui-completion.md status updated from
DRAFT to SHIPPED (data layer). Four Blazor / SignalR / EF / address-space
follow-ups tracked as tasks — the visual-compliance review pattern from
Phase 6.1 Stream E applies to each.

`Phase 6.4 compliance: PASS` — exit 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:13:46 -04:00
Joseph Doherty
2fe4bac508 Phase 6.3 exit gate — compliance real-checks + phase doc = SHIPPED (core)
scripts/compliance/phase-6-3-compliance.ps1 turns stub TODOs into 21 real
checks covering:
- Stream B 8-state matrix: ServiceLevelCalculator + ServiceLevelBand present;
  Maintenance=0, NoData=1, InvalidTopology=2, AuthoritativePrimary=255,
  IsolatedPrimary=230, PrimaryMidApply=200, RecoveringPrimary=180,
  AuthoritativeBackup=100, IsolatedBackup=80, BackupMidApply=50,
  RecoveringBackup=30 — every numeric band pattern-matched in source (any
  drift turns a check red).
- Stream B RecoveryStateManager with dwell + publish-witness gate + 60s
  default dwell.
- Stream D ApplyLeaseRegistry: BeginApplyLease returns IAsyncDisposable;
  key includes PublishRequestId (decision #162); PruneStale watchdog present;
  10 min default ApplyMaxDuration.

Five [DEFERRED] follow-up surfaces explicitly listed with task IDs:
  - Stream A topology loader (task #145)
  - Stream C OPC UA node wiring (task #147)
  - Stream E Admin UI (task #149)
  - Stream F interop + Galaxy failover (task #150)
  - sp_PublishGeneration Transparent-mode rejection (task #148 part 2)

Cross-cutting: full solution dotnet test passes 1137 >= 1097 pre-Phase-6.3
baseline; pre-existing Client.CLI Subscribe flake tolerated.

docs/v2/implementation/phase-6-3-redundancy-runtime.md status updated from
DRAFT to SHIPPED (core). Non-transparent redundancy per decision #84 keeps
role election out of scope — operator-driven failover is the v2.0 model.

`Phase 6.3 compliance: PASS` — exit 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:00:30 -04:00
Joseph Doherty
bd53ebd192 Phase 6.2 exit gate — compliance script real-checks + phase doc = SHIPPED (core)
scripts/compliance/phase-6-2-compliance.ps1 replaces the stub TODOs with 23
real checks spanning:
- Stream A: LdapGroupRoleMapping entity + AdminRole enum + ILdapGroupRoleMappingService
  + impl + write-time invariant + EF migration all present.
- Stream B: OpcUaOperation enum + NodeScope + AuthorizationDecision tri-state
  + IPermissionEvaluator + PermissionTrie + Builder + Cache keyed on
  GenerationId + UserAuthorizationState with MembershipFreshnessInterval=15m
  and AuthCacheMaxStaleness=5m + TriePermissionEvaluator + HistoryRead uses
  its own flag.
- Control/data-plane separation: the evaluator + trie + cache + builder +
  interface all have zero references to LdapGroupRoleMapping (decision #150).
- Stream C foundation: ILdapGroupsBearer + AuthorizationGate with StrictMode
  knob. DriverNodeManager dispatch-path wiring (11 surfaces) is Deferred,
  tracked as task #143.
- Stream D data layer: ValidatedNodeAclAuthoringService + exception type +
  rejects None permissions. Blazor UI pieces (RoleGrantsTab, AclsTab,
  SignalR invalidation, draft diff) are Deferred, tracked as task #144.
- Cross-cutting: full solution dotnet test runs; 1097 >= 1042 baseline;
  tolerates the one pre-existing Client.CLI Subscribe flake.

IPermissionEvaluator doc-comment reworded to avoid mentioning the literal
type name "LdapGroupRoleMapping" — the compliance check does a text-absence
sweep for that identifier across the data-plane files.

docs/v2/implementation/phase-6-2-authorization-runtime.md status updated from
DRAFT to SHIPPED (core). Two deferred follow-ups explicitly called out so
operators see what's still pending for the "Phase 6.2 fully wired end-to-end"
milestone.

`Phase 6.2 compliance: PASS` — exit 0. Any regression that deletes a class
or re-introduces an LdapGroupRoleMapping reference into the data-plane
evaluator turns a green check red + exit non-zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 09:45:58 -04:00
Joseph Doherty
f29043c66a Phase 6.1 exit gate — compliance script real-checks + phase doc status = SHIPPED
scripts/compliance/phase-6-1-compliance.ps1 replaces the stub TODOs with 34
real checks covering:
- Stream A: pipeline builder + CapabilityInvoker + WriteIdempotentAttribute
  present; pipeline key includes HostName (per-device isolation per decision
  #144); OnReadValue / OnWriteValue / HistoryRead route through invoker in
  DriverNodeManager; Galaxy supervisor CircuitBreaker + Backoff preserved.
- Stream B: DriverTier enum; DriverTypeMetadata requires Tier; MemoryTracking
  + MemoryRecycle (Tier C-gated) + ScheduledRecycleScheduler (rejects Tier
  A/B) + demand-aware WedgeDetector all present.
- Stream C: DriverHealthReport + HealthEndpointsHost; state matrix Healthy=200
  / Faulted=503 asserted in code; LogContextEnricher; JSON sink opt-in via
  Serilog:WriteJson.
- Stream D: GenerationSealedCache + ReadOnly marking + GenerationCacheUnavailable
  exception path; ResilientConfigReader + StaleConfigFlag.
- Stream E data layer: DriverInstanceResilienceStatus entity +
  DriverResilienceStatusTracker. SignalR/Blazor surface is Deferred per the
  visual-compliance follow-up pattern borrowed from Phase 6.4.
- Cross-cutting: full solution `dotnet test` runs; asserts 1042 >= 906
  baseline; tolerates the one pre-existing Client.CLI Subscribe flake and
  flags any new failure.

Running the script locally returns "Phase 6.1 compliance: PASS" — exit 0. Any
future regression that deletes a class or un-wires a dispatch path turns a
green check red + exit non-zero.

docs/v2/implementation/phase-6-1-resilience-and-observability.md status
updated from DRAFT to SHIPPED with the merged-PRs summary + test count delta +
the single deferred follow-up (visual review of the Admin /hosts columns).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 08:53:47 -04:00
Joseph Doherty
ba31f200f6 Phase 6 reconcile — merge adjustments into plan bodies, add decisions #143-162, scaffold compliance stubs
After shipping the four Phase 6 plan drafts (PRs 77-80), the adversarial-review
adjustments lived only as trailing "Review" sections. An implementer reading
Stream A would find the original unadjusted guidance, then have to cross-reference
the review to reconcile. This PR makes the plans genuinely executable:

1. Merges every ACCEPTed review finding into the actual Scope / Stream / Compliance
   sections of each phase plan:
   - phase-6-1: Scope table rewrite (per-capability retry, (instance,host) pipeline key,
     MemoryTracking vs MemoryRecycle split, hybrid watchdog formula, demand-aware
     wedge detector, generation-sealed LiteDB). Streams A/B/D + Compliance rewritten.
   - phase-6-2: AuthorizationDecision tri-state, control/data-plane separation,
     MembershipFreshnessInterval (15 min), AuthCacheMaxStaleness (5 min),
     subscription stamp-and-reevaluate. Stream C widened to 11 OPC UA operations.
   - phase-6-3: 8-state ServiceLevel matrix (OPC UA Part 5 §6.3.34-compliant),
     two-layer peer probe (/healthz + UaHealthProbe), apply-lease via await using,
     publish-generation fencing, InvalidTopology runtime state, ServerUriArray
     self-first + peers. New Stream F (interop matrix + Galaxy failover).
   - phase-6-4: DraftRevisionToken concurrency control, staged-import via
     EquipmentImportBatch with user-scoped visibility, CSV header version marker,
     decision-#117-aligned identifier columns, 1000-row diff cap,
     decision-#139 OPC 40010 fields, Identification inherits Equipment ACL.

2. Appends decisions #143 through #162 to docs/v2/plan.md capturing the
   architectural commitments the adjustments created. Each decision carries its
   dated rationale so future readers know why the choice was made.

3. Scaffolds scripts/compliance/phase-6-{1,2,3,4}-compliance.ps1 — PowerShell
   stubs with Assert-Todo / Assert-Pass / Assert-Fail helpers. Every check
   maps to a Stream task ID from the corresponding phase plan. Currently all
   checks are TODO and scripts exit 0; each implementation task is responsible
   for replacing its TODO with a real check before closing that task. Saved
   as UTF-8 with BOM so Windows PowerShell 5.1 parses em-dash characters
   without breaking.

Net result: the Phase 6.1 plan is genuinely ready to execute. Stream A.3 can
start tomorrow without reconciling Streams vs. Review on every task; the
compliance script is wired to the Stream IDs; plan.md has the architectural
commitments that justify the Stream choices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 03:49:41 -04:00
Joseph Doherty
4695a5c88e Phase 6 — Draft 4 implementation plans covering v2 unimplemented features + adversarial review + adjustments. After drivers were paused per user direction, audited the v2 plan for features documented-but-unshipped and identified four coherent tracks that had no implementation plan at all. Each plan follows the docs/v2/implementation/phase-*.md template (DRAFT status, branch name, Stream A-E task breakdown, Compliance Checks, Risks, Completion Checklist). docs/v2/implementation/phase-6-1-resilience-and-observability.md (243 lines) covers Polly resilience pipelines wired to every capability interface, Tier A/B/C runtime enforcement (memory watchdog generalized beyond Galaxy, scheduled recycle per decision #67, wedge detection), health endpoints on :4841, structured Serilog with correlation IDs, LiteDB local-cache fallback per decision #36. phase-6-2-authorization-runtime.md (145 lines) wires ACL enforcement on every OPC UA Read/Write/Subscribe/Call path + LDAP-group-to-admin-role grants per decisions #105 and #129 -- runtime permission-trie evaluator over the 6-level Cluster/Namespace/UnsArea/UnsLine/Equipment/Tag hierarchy, per-session cache invalidated on generation-apply + LDAP-cache expiry. phase-6-3-redundancy-runtime.md (165 lines) lands the non-transparent warm/hot redundancy runtime per decisions #79-85: dynamic ServiceLevel node, ServerUriArray peer broadcast, mid-apply dip via sp_PublishGeneration hook, operator-driven role transition (no auto-election -- plan remains explicit about what's out of scope). phase-6-4-admin-ui-completion.md (178 lines) closes Phase 1 Stream E completion-checklist items that never landed: UNS drag-reorder + impact preview, Equipment CSV import, 5-identifier search, draft-diff viewer enhancements, OPC 40010 _base Identification field exposure per decisions #138-139. Each plan then got a Codex adversarial-review pass (codex mcp tool, read-only sandbox, synchronous). Reviews explicitly targeted decision-log conflicts, API-shape assumptions, unbounded blast radius, under-specified state transitions, and testing holes. Appended 'Adversarial Review — 2026-04-19' section to each plan with numbered findings (severity / finding / why-it-matters / adjustment accepted). Review surfaced real substantive issues that the initial drafts glossed over: Phase 6.1 auto-retry conflicting with decisions #44-45 no-auto-write-retry rule; Phase 6.1 per-driver-instance pipeline breaking decision #35's per-device isolation; Phase 6.1 recycle/watchdog at Tier A/B breaching decisions #73-74 Tier-C-only constraint; Phase 6.2 conflating control-plane LdapGroupRoleMapping with data-plane ACL grants; Phase 6.2 missing Browse enforcement entirely; Phase 6.2 subscription re-authorization policy unresolved between create-time-only and per-publish; Phase 6.3 ServiceLevel=0 colliding with OPC UA Part 5 Maintenance semantics; Phase 6.3 ServerUriArray excluding self (spec-bug); Phase 6.3 apply-window counter race on cancellation; Phase 6.3 client cutover for Kepware/Aveva OI Gateway is unverified hearsay; Phase 6.4 stale UNS impact preview overwriting concurrent draft edits; Phase 6.4 identifier contract drifting from admin-ui.md canonical set (ZTag/MachineCode/SAPID/EquipmentId/EquipmentUuid, not ZTag/SAPID/UniqueId/Alias1/Alias2); Phase 6.4 CSV import atomicity internally contradictory (single txn vs chunked inserts); Phase 6.4 OPC 40010 field list not matching decision #139. Every finding has an adjustment in the plan doc -- plans are meant to be executable from the next session with the critique already baked in rather than a clean draft that would run into the same issues at implementation time. Codex thread IDs cited in each plan's review section for reproducibility. Pure documentation PR -- no code changes. Plans are DRAFT status; each becomes its own implementation phase with its own entry-gate + exit-gate when business prioritizes. 2026-04-19 03:15:00 -04:00
Joseph Doherty
56d8af8bdb Phase 2 PR 61 -- Close V1_ARCHIVE_STATUS.md; Phase 2 Streams D + E done. Purely a documentation-closure PR. The v1 archive deletion itself happened across earlier PRs: PR 2 on phase-2-stream-d archive-marked the four v1 projects (IsTestProject=false so dotnet test slnx bypassed them); Phase 3 PR 18 deleted the archived project source trees. What remained on disk was stale bin/obj residue from pre-deletion builds -- git never tracked those, so removing them from the working tree is cosmetic only (no source-file diff in this PR). What this PR actually changes: V1_ARCHIVE_STATUS.md is rewritten from 'Deletion plan (Phase 2 PR 3)' pre-work prose to a CLOSED retrospective that (a) lists all five v1 directories as deleted with check-marks (src/OtOpcUa.Host, src/Historian.Aveva, tests/Historian.Aveva.Tests, tests/Tests.v1Archive, tests/IntegrationTests), (b) names the parity-bar tests that now fill the role the 494 v1 tests originally held (Driver.Galaxy.E2E cross-FX subprocess parity + stability-findings regression, per-component *.Tests projects, Driver.Modbus.IntegrationTests, LiveStack/ smoke tests), and (c) gives the closure timeline connecting PR 2 -> Phase 3 PR 18 -> this PR 61. Also added the Modbus TCP driver family as parity coverage that didn't exist in v1 (DL205 + S7-1500 + Mitsubishi MELSEC via pymodbus sim). Stream D (retire legacy Host) has been effectively done since Phase 3 PR 18; Stream E (parity validation) is done since PR 2 landed the Driver.Galaxy.E2E project with HostSubprocessParityTests + HierarchyParityTests + StabilityFindingsRegressionTests. This PR exists to definitively close the two pending Phase 2 tasks on the task list and give future-me (or anyone picking up Phase 2 retrospectives) a single 'what actually happened' doc instead of a 'what we plan to do' prose that didn't match reality. dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors, 200 warnings (all xunit1051 cancellation-token analyzer advisories, unchanged from v2 tip). No test regressions -- no source code changed. 2026-04-18 23:20:54 -04:00
8c89d603e8 Merge pull request 'Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research doc' (#54) from phase-3-pr55-mitsubishi-research-doc into v2 2026-04-18 22:54:09 -04:00
Joseph Doherty
c506ea298a Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research document. 451-line doc at docs/v2/mitsubishi.md mirroring the docs/v2/dl205.md template for the MELSEC family (Q-series + QJ71MT91, L-series + LJ71MT91, iQ-R + RJ71EN71, iQ-R built-in Ethernet, iQ-F FX5U built-in, FX3U + FX3U-ENET / FX3U-ENET-P502, FX3GE built-in). Like Siemens S7, MELSEC Modbus is a patchwork of per-site-configured add-on modules rather than a fixed firmware stack, but the MELSEC-specific traps are different enough to warrant their own document. Key findings worth flagging for the PR 58+ implementation track: (1) MODULE NAMING TRAP -- QJ71MB91 is SERIAL RTU, not TCP. The Q-series TCP module is QJ71MT91. Driver docs + config UI should surface this clearly because the confusion costs operators hours when they try to connect to an RS-232 module via Ethernet. (2) NO CANONICAL MAPPING -- every MELSEC Modbus site has a unique 'Modbus Device Assignment Parameter' block of up to 16 assignments (each binding a MELSEC device range like D0..D1023 to a Modbus-address range); the driver must treat the mapping as runtime config, not device-family profile. (3) X/Y BASE DEPENDS ON FAMILY -- Q/L/iQ-R use HEX notation for X/Y (X20 = decimal 32), FX/iQ-F use OCTAL (X20 = decimal 16, same as DL260); iQ-F has a GX Works3 project toggle that can flip this. Single biggest off-by-N source in MELSEC driver code -- driver address helper must take a family selector. (4) Word order CDAB across Q/L/iQ-R/iQ-F by default (CPU-level, not module-level) -- no user-configurable swap on the server side. FX5U's SWAP instruction is for CLIENT mode only. Driver Mitsubishi profile default must be ByteOrder.WordSwap, matching DL260 but OPPOSITE of Siemens S7. (5) D-registers are BINARY by default (opposite of DL205's BCD-by-default). FNC 18 BCD / FNC 19 BIN instructions confirm binary-by-default in the ladder. Caller must explicitly opt-in to Bcd16/Bcd32 tags when the ladder stores BCD, same pattern as DL205 but the default is inverted. (6) FX5U FIRMWARE GATE -- needs firmware >= 1.060 for native Modbus TCP server; older firmware is client-only. Surface a clear capability error on connect. (7) FX3U PORT 502 SPLIT -- the standard FX3U-ENET cannot bind port 502 (lower port range restricted on the firmware); only FX3U-ENET-P502 can. FX3U-ENET-ADP has no Modbus at all and is a common operator mis-purchase -- driver should surface 'module does not support Modbus' as a distinct error, not 'connection refused'. (8) QJ71MT91 does NOT support FC22 (Mask Write) or FC23 (Read-Write Multiple). iQ-R and iQ-F do. Driver bulk-read optimization must gate on module capability. (9) MAX CONNECTIONS -- 16 simultaneous on Q/L/iQ-R, 8 on FX5U and FX3U-ENET. (10) STOP-mode writes -- configurable on Q/L/iQ-R/iQ-F (default = accept writes even in STOP), always rejected with exception 04 on FX3U-ENET. Per-model test differentiation section names the tests Mitsubishi_QJ71MT91_*, Mitsubishi_FX5U_*, Mitsubishi_FX3U_ENET_*, with a shared Mitsubishi_Common_* fixture for CDAB-word-order + binary-not-BCD + standard-exception-codes tests. 17 cited references including primary Mitsubishi manuals (SH-080446 for QJ71MT91, JY997D56101 for FX5, SH-081259 for iQ-R Ethernet, JY997D18101 for FX3U-ENET) plus Ignition / Kepware / Fernhill / HMS third-party driver release notes. Three unconfirmed rumours flagged explicitly: iQ-R RJ71EN71 early firmware rumoured ABCD word order (no primary source), QJ71MT91 firmware < 2010-05 FC15 odd-byte-count truncation (forum report only), FX3U-ENET firmware < 1.14 out-of-order TxId echoes under load (unreproducible on bench). Pure documentation PR -- no code, no tests. Per-quirk implementation lands in PRs 58+. Research conducted 2026-04-18. 2026-04-18 22:51:28 -04:00
Joseph Doherty
9e2b5b330f Phase 3 PR 54 -- Siemens S7 Modbus TCP quirks research document. 485-line doc at docs/v2/s7.md mirroring the docs/v2/dl205.md template for the Siemens SIMATIC S7 family (S7-1200 / S7-1500 / S7-300 / S7-400 / ET 200SP / CP 343-1 / CP 443-1 / CP 343-1 Lean / MODBUSPN). Siemens S7 is fundamentally different from DL260: there is no fixed Modbus memory map baked into firmware -- every deployment runs MB_SERVER (S7-1200/1500/ET 200SP), MODBUSCP (S7-300/400 + CP), or MODBUSPN (S7-300/400 PN) library blocks wired up to user DBs via the MB_HOLD_REG / ADDR parameters. The driver's job is therefore to handle per-site CONFIG rather than per-family QUIRKS, and the doc makes that explicit. Key findings worth flagging for the PR 56+ implementation track: (1) S7 has no fixed memory map -- must accept per-site DriverConfig, cannot assume vendor-standard layout. (2) MB_SERVER requires NON-optimized DBs in TIA Portal; optimized DBs cause the library to return STATUS 0x8383 on every access -- the single most common S7 Modbus deployment bug in the field. (3) Word order is ABCD by default (big-endian bytes + big-endian words) across all Siemens S7 Modbus paths, which is the OPPOSITE of DL260 CDAB -- the Modbus driver's S7 profile default must be ByteOrder.BigEndian, not WordSwap. (4) MB_SERVER listens on ONE port per FB instance; multi-client support requires running MB_SERVER on 502 / 503 / 504 / ... simultaneously -- most clients assume port 502 multiplexes, which is wrong on S7. (5) CP 343-1 Lean is SERVER-ONLY and requires the separate 2XV9450-1MB00 MODBUS TCP CP library license; client mode calls return immediate error on Lean. (6) MB_SERVER does NOT filter Unit ID, accepts any value. Means the driver can't use Unit ID to detect 'direct vs gateway' topology. (7) FC23 Read-Write Multiple, FC22 Mask Write, FC20/21 File Records, FC43 Device Identification all return exception 01 Illegal Function on every S7 variant -- the driver MUST NOT attempt bulk-read optimisation via FC23 when talking to S7. (8) STOP-mode read/write behaviour is non-deterministic across firmware bands: reads may return cached data (library internal buffer), writes may succeed-silently or return exception 04 depending on CPU firmware version -- flagged as 'driver treats both as unavailable, do not distinguish'. Unconfirmed rumours flagged separately: 'V2.0+ reverses float byte order' claim (cited but not reproduced), STOP-mode caching location (folklore, no primary source). Per-model test differentiation section names the tests as S7_<model>_<behavior> matching the DL205 template convention (e.g. S7_1200_MB_SERVER_requires_non_optimized_DB, S7_343_1_Lean_rejects_client_mode, S7_FC23_returns_IllegalFunction). 31 cited references across the Siemens Industry Online Support entry-ID system (68011496 for MB_SERVER FAQ, etc.), TIA Portal library manuals, and three third-party driver vendor release notes (Kepware, Ignition, FactoryTalk). This is a pure documentation PR -- no code, no tests, no csproj changes. Per-quirk implementation lands in PRs 56+. Research conducted 2026-04-18 against latest publicly-available Siemens documentation; STOP-mode behaviour and MB_SERVER versioning specifically cross-checked against Siemens forum answers from 2024-2025. 2026-04-18 22:50:51 -04:00