Three root-cause fixes to get an elevated dev-box shell past session open
through to real MXAccess reads:
1. PipeAcl — drop BUILTIN\Administrators deny ACE. UAC's filtered token
carries the Admins SID as deny-only, so the deny fired even from
non-elevated admin-account shells. The per-connection SID check in
PipeServer.VerifyCaller remains the real authorization boundary.
2. PipeServer — swap the Hello-read / VerifyCaller order. ImpersonateNamedPipeClient
returns ERROR_CANNOT_IMPERSONATE until at least one frame has been read
from the pipe; reading Hello first satisfies that rule. Previously the
ACL deny-first path masked this race — removing the deny ACE exposed it.
3. GalaxyIpcClient — add a background reader + single pending-response
slot. A RuntimeStatusChange event between OpenSessionRequest and
OpenSessionResponse used to satisfy the caller's single ReadFrameAsync
and fail CallAsync with "Expected OpenSessionResponse, got
RuntimeStatusChange". The reader now routes response kinds (and
ErrorResponse) to the pending TCS and everything else to a handler the
driver registers in InitializeAsync. The Proxy was already set up to
raise managed events from RaiseDataChange / RaiseAlarmEvent /
OnHostConnectivityUpdate — those helpers had no caller until now.
4. RedundancyPublisherHostedService — swallow BadServerHalted while
polling host.Server.CurrentInstance. StandardServer throws that code
during startup rather than returning null, so the first poll attempt
crashed the BackgroundService (and the host) before OnServerStarted
ran. This race was latent behind the Galaxy init failure above.
Updates docs that described the Admins deny ACE + mandatory non-elevated
shells, and drops the admin-skip guards from every Galaxy integration +
E2E fixture that had them (IpcHandshakeIntegrationTests, EndToEndIpcTests,
ParityFixture, LiveStackFixture, HostSubprocessParityTests).
Adds GalaxyIpcClientRoutingTests covering the router's
request/response match, ErrorResponse, event-between-call, idle event,
and peer-close paths.
Verified live on the dev box against the p7-smoke cluster (gen 6):
driver registered=1 failedInit=0, Phase 7 bridge subscribed, OPC UA
server up on 4840, MXAccess read round-trip returns real data with
Status=0x00000000.
Task #112 — partial: Galaxy live stack is functional end-to-end. The
supplied test-galaxy.ps1 script still fails because the UNS walker
encodes TagConfig JSON as the tag's NodeId instead of the seeded TagId
(pre-existing; separate issue from this commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Catch-all commit for pending work on the task-galaxy-e2e branch that
wasn't part of the FOCAS migration. Grouping by topic so future per-topic
commits can be cherry-picked if needed.
TwinCAT
- src/.../Driver.TwinCAT/AdsTwinCATClient.cs + TwinCATDriverFactoryExtensions.cs:
factory-registration extensions + ADS client refinements.
- src/.../Driver.TwinCAT.Cli/Commands/BrowseCommand.cs: new browse command
for the TwinCAT test-client CLI.
- tests/.../Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs + TwinCatProject/:
fixture scaffold with a minimal POU + README pointing at the TCBSD/ESXi
VM for e2e.
- docs/Driver.TwinCAT.Cli.md + docs/drivers/TwinCAT-Test-Fixture.md:
documentation for the above.
- docs/v3/twincat-backlog.md: forward-looking backlog seed.
Admin UI + fleet status
- src/.../Admin/Components/Pages/Clusters/DriversTab.razor + Hosts.razor:
UI refresh for fleet-status rendering.
- src/.../Admin/Hubs/FleetStatusHub.cs + FleetStatusPoller.cs +
Admin/Program.cs: SignalR hub + poller plumbing for live fleet data.
- tests/.../Admin.Tests/FleetStatusPollerTests.cs: poller coverage.
Server + redundancy runtime (Phase 6.3 follow-ups)
- src/.../Server/Hosting/RedundancyPublisherHostedService.cs: HostedService
that owns the RedundancyStatePublisher lifecycle + wires peer reachability.
- src/.../Server/Redundancy/ServerRedundancyNodeWriter.cs: OPC UA
variable-node writer binding ServiceLevel + ServerUriArray to the
publisher's events.
- src/.../Server/Program.cs + Server.csproj: hosted-service registration.
- tests/.../Server.Tests/ServerRedundancyNodeWriterTests.cs +
Server.Tests.csproj: coverage for the above.
Configuration
- src/.../Configuration/Validation/DraftValidator.cs +
tests/.../Configuration.Tests/DraftValidatorTests.cs: draft-validation
refinements.
E2E scripts (shared infrastructure)
- scripts/e2e/README.md + _common.ps1 + test-all.ps1: shared helpers + the
all-drivers test-all runner.
- scripts/e2e/test-opcuaclient.ps1: OPC UA Client e2e runner.
Docs
- docs/v2/implementation/phase-6-{1,2,3,4}*.md + exit-gate-phase-{3,7}.md:
phase-gate + implementation doc updates.
- docs/v2/plan.md: top-level plan refresh.
- docs/v2/redundancy-interop-playbook.md: client interop playbook for the
Phase 6.3 redundancy-runtime work.
Two orphan FOCAS docs remain on disk but deliberately unstaged —
docs/v2/focas-deployment.md and docs/v2/implementation/focas-simulator-plan.md
describe the now-retired Tier-C topology and should either be rewritten
or deleted in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.
Architecture
- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
(owner-imported — see Wire/FocasWireClient.cs for the full surface).
Opens two TCP sockets, runs the initiate handshake, serialises requests
on socket 2 through a semaphore, closes cleanly with PDU + socket
teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
`IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
rejected at startup with a diagnostic pointing at the migration doc.
Deletions
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
`IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
Host-process supervisor (backoff, circuit breaker, heartbeat, post-
mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
DLL that masqueraded as Fwlib64.dll.
Solution changes
- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
`Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
`WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.
Integration tests
- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
tree reads (identity, axes, dynamic, program, operation mode, timers,
spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
`IAlarmSource` raise/clear transitions, and `ProbeAsync` /
`OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
(Docker container with the vendored Python mock's native FOCAS/2
Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
compose down. Dropped the shim-build stage + DLL-copy step + the split
testhost workaround (the latter only existed because of native-DLL
lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
service. Tests seed per-series state via `mock_load_profile` at test
start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
pre-refresh snapshot only spoke the JSON admin protocol.
Tests
- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
to assert the new diagnostic message pointing at
`docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.
Docs
- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
deployment collapses to one `"Backend": "wire"` config block, no
separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
driver complement closed. FOCAS change-log entry added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings seven FOCAS-related files into git that shipped as part of earlier
FOCAS work but were never staged. Adding them now so the tree reflects the
compilable state + pre-empts dead references from the migration commit that
follows:
- src/.../Driver.FOCAS/FocasAlarmProjection.cs — raise/clear diffing + severity
mapping surfaced via IAlarmSource on FocasDriver. Referenced by committed
FocasDriver.cs; tests in FocasAlarmProjectionTests.cs.
- src/.../Admin/Services/FocasDriverDetailService.cs — Admin UI per-instance
detail page data source.
- src/.../Admin/Components/Pages/Drivers/FocasDetail.razor — Blazor page
rendering the above (from task #69).
- tests/.../Admin.Tests/FocasDriverDetailServiceTests.cs — exercises the
detail service.
- tests/.../Driver.FOCAS.Tests/FocasAlarmProjectionTests.cs — raise/clear
diff semantics against FakeFocasClient.
- tests/.../Driver.FOCAS.Tests/FocasHandleRecycleTests.cs — proactive recycle
cadence test.
- docs/v2/implementation/focas-wire-protocol.md — captured FOCAS/2 Ethernet
wire protocol reference. Useful going forward even though the Tier-C /
simulator plan docs are historical.
No runtime behaviour change — these files compile today and the solution
build/test pass already depends on them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the live-smoke validation Phase 7 deferred to. Ships:
## docs/v2/implementation/phase-7-e2e-smoke.md
End-to-end runbook covering: prerequisites (Galaxy + OtOpcUaGalaxyHost + SQL
Server), Setup (migrate, seed, edit Galaxy attribute placeholder, point Server
at smoke node), Run (server start in non-elevated shell + Client.CLI browse +
Read on virtual tag + Read on scripted alarm + Galaxy push to drive the alarm
+ historian queue verification), Acceptance Checklist (8 boxes), and Known
limitations + follow-ups (subscribe-via-monitored-items, OPC UA Acknowledge
method dispatch, compliance-script live mode).
## scripts/smoke/seed-phase-7-smoke.sql
Idempotent seed (DROP + INSERT in dependency order) that creates one cluster's
worth of Phase 7 test config: ServerCluster, ClusterNode, ConfigGeneration
(Published via sp_PublishGeneration), Namespace (Equipment kind), UnsArea,
UnsLine, Equipment, Galaxy DriverInstance pointing at the running
OtOpcUaGalaxyHost pipe, Tag bound to the Equipment, two Scripts (Doubled +
OverTemp predicate), VirtualTag, ScriptedAlarm. Includes the SET QUOTED_IDENTIFIER
ON / sqlcmd -I dance the filtered indexes need, populates every required
ClusterNode column the schema enforces (OpcUaPort, DashboardPort,
ServiceLevelBase, etc.), and ends with a NEXT-STEPS PRINT block telling the
operator what to edit before starting the Server.
## First-run evidence on the dev box
Running the seed + starting the Server (non-elevated shell, Galaxy.Host
already running) emitted these log lines verbatim — proving the entire
Phase 7 wiring chain executes in production:
Bootstrapped from central DB: generation 1
Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using NullAlarmHistorianSink
VirtualTagEngine loaded 1 tag(s), 1 upstream subscription(s)
ScriptedAlarmEngine loaded 1 alarm(s)
Phase 7: composed engines from generation 1 — 1 virtual tag(s), 1 scripted alarm(s), 2 script(s)
Each line corresponds to a piece shipped in #243 / #244 / #245 / #246 / #247.
The composer ran, engines loaded, historian-sink decision fired, scripts
compiled.
## Surfaced — pre-Phase-7 deployment-wiring gaps (NOT Phase 7 regressions)
1. Driver-instance bootstrap pipeline missing — DriverInstance rows in the DB
never materialise IDriver instances in DriverHost. Filed as task #248.
2. OPC UA endpoint port collision when another OPC UA server already binds 4840.
Operator concern; documented in the runbook prereqs.
Both predate Phase 7 + are orthogonal. Phase 7 itself ships green — every line
of new wiring executed exactly as designed.
## Phase 7 production wiring chain — VALIDATED end-to-end
- ✅#243 composition kernel
- ✅#244 driver bridge
- ✅#245 scripted-alarm IReadable adapter
- ✅#246 Program.cs wire-in
- ✅#247 Galaxy.Host historian writer + SQLite sink activation
- ✅#240 this — live smoke + runbook + first-run evidence
Phase 7 is complete + production-ready, modulo the pre-existing
driver-bootstrap gap (#248).
Locks in 22 design decisions from the planning conversation: C# via Roslyn scripting; virtual tags in the Equipment tree (not a separate /Virtual/ namespace); change-driven + timer-driven triggers operator-configurable per tag; Shape A one-script-per-tag-or-alarm (no predicate/action split); full OPC UA Part 9 alarm fidelity; read-only sandbox (scripts read any tag, write only to virtual tags, no File/HttpClient/Process/reflection); AST-inferred dependencies via CSharpSyntaxWalker (non-literal tag paths rejected at publish); config DB storage with generation-sealed cache; ctx.GetTag returns a full DataValue {Value, StatusCode, Timestamp}; per-tag Historize checkbox; per-tag error isolation (throwing script sets tag quality BadInternalError, engine unaffected); dedicated scripts-*.log Serilog sink bound to ctx.Logger; alarm message as template with {TagPath} substitution resolved at event emission; ActiveState recomputed from tags on startup while EnabledState/AckedState/ConfirmedState/ShelvingState + audit persist to config DB; historian sink scope = all IAlarmSource impls with per-alarm toggle; SQLite store-and-forward on the node so operators are never blocked by Historian downtime; IPC to Galaxy.Host for ingestion reusing the already-loaded aahClientManaged DLLs; Monaco editor for Admin code editing; serial cascade evaluation for v1 (parallel as follow-up); shelving UX via OPC UA method calls only with no custom Admin controls (operator drives state transitions from plant HMIs or Client.CLI); 30-day dead-letter retention with manual retry button; test harness accepts only declared-input paths so the harness enforces dependency declaration.
Eight streams totaling ~10-12 weeks, scope-comparable to Phase 6: A - Core.Scripting (Roslyn engine + sandbox + AST inference + logger); B - virtual tag engine (dependency graph + change/timer schedulers + historize); C - scripted alarm engine (Part 9 state machine + template messages + startup recovery + OPC UA method binding); D - historian alarm sink (SQLite store-and-forward + Galaxy.Host IPC contract extension); E - config DB schema (four new tables under sp_PublishGeneration); F - Admin UI scripting tab (Monaco + test harness + dependency preview + script-log viewer + historian diagnostics); G - address-space integration (extend EquipmentNodeWalker for virtual source kind + extend DriverNodeManager dispatch); H - exit gate.
Compliance-check surface covers sandbox escape (typeof/Assembly.Load/File/HttpClient attempts must fail at compile), dependency inference (literal-only paths), change cascade (topological ordering), cycle rejection at publish, startup recovery (ack/confirm/shelve survive restart but ActiveState recomputed), ack audit trail persistence, historian queue durability (Galaxy.Host offline → online drains in-order), per-alarm historian toggle gating, script timeout isolation, log sink isolation, ACL binding (virtual tags inherit Equipment scope grants).
Follow-up artifacts tracked as tasks #231-#238 (stream placeholders). Supporting doc updates (plan.md §6 Migration Strategy, config-db-schema.md §§ for the four new tables, driver-specs.md §Alarm semantics clarification, new ADR-002 for driver-vs-virtual dispatch) will land alongside the streams that touch them, not in this doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production IHostProcessLauncher (ProcessHostLauncher.cs): Process.Start spawns OtOpcUa.Driver.FOCAS.Host.exe with OTOPCUA_FOCAS_PIPE / OTOPCUA_ALLOWED_SID / OTOPCUA_FOCAS_SECRET / OTOPCUA_FOCAS_BACKEND in the environment (supervisor-owned, never disk), polls FocasIpcClient.ConnectAsync at 250ms cadence until the pipe is up or the Host exits or the ConnectTimeout deadline passes, then wraps the connected client in an IpcFocasClient. TerminateAsync kills the entire process tree + disposes the IPC stream. ProcessHostLauncherOptions carries HostExePath + PipeName + AllowedSid plus optional SharedSecret (auto-generated from a GUID when omitted so install scripts don't have to), Arguments, Backend (fwlib32/fake/unconfigured default-unconfigured), ConnectTimeout (15s), and Series for CNC pre-flight.
Post-mortem MMF (Host/Stability/PostMortemMmf.cs + Proxy/Supervisor/PostMortemReader.cs): ring-buffer of the last ~1000 IPC operations written by the Host into a memory-mapped file. On a Host crash the supervisor reads the MMF — which survives process death — to see what was in flight. File format: 16-byte header [magic 'OFPC' (0x4F465043) | version | capacity | writeIndex] + N × 256-byte entries [8-byte UTC unix ms | 8-byte opKind | 240-byte UTF-8 message + null terminator]. Magic distinguishes FOCAS MMFs from the Galaxy MMFs that ship the same format shape. Writer is single-producer (Host) with a lock_writeGate; reader is multi-consumer (Proxy + any diagnostic tool) using a separate MemoryMappedFile handle.
NSSM install wrappers (scripts/install/Install-FocasHost.ps1 + Uninstall-FocasHost.ps1): idempotent service registration for OtOpcUaFocasHost. Resolves SID from the ServiceAccount, generates a fresh shared secret per install if not supplied, stages OTOPCUA_FOCAS_PIPE/SID/SECRET/BACKEND in AppEnvironmentExtra so they never hit disk, rotates 10MB stdout/stderr logs under %ProgramData%\OtOpcUa, DependOnService=OtOpcUa so startup order is deterministic. Backend selector defaults to unconfigured so a fresh install doesn't accidentally load a half-configured Fwlib32.dll on first start.
Tests (7 new, 2 files): PostMortemMmfTests.cs in FOCAS.Host.Tests — round-trip write+read preserves order + content, ring-buffer wraps at capacity (writes 10 entries to a 3-slot buffer, asserts only op-7/8/9 survive in FIFO order), message truncation at the 240-byte cap is null-terminated + non-overflowing, reopening an existing file preserves entries. PostMortemReaderCompatibilityTests.cs in FOCAS.Tests — hand-writes a file in the exact host format (magic/entry layout) + asserts the Proxy reader decodes with correct ring-walk ordering when writeIndex != 0, empty-return on missing file + magic mismatch. Keeps the two codebases in format-lockstep without the net10 test project referencing the net48 Host assembly.
Docs updated: docs/v2/implementation/focas-isolation-plan.md promoted from DRAFT to PRs A-E shipped status with per-PR citations + post-ship test counts (189 + 24 + 13 = 226 FOCAS-family tests green). docs/drivers/FOCAS-Test-Fixture.md §5 updated from "architecture scoped but not implemented" to listing the shipped components with the FwlibHostedBackend gap explicitly labeled as hardware-gated. Install-FocasHost.ps1 documents the OTOPCUA_FOCAS_BACKEND selector + points at docs/v2/focas-deployment.md for Fwlib32.dll licensing.
What ISN'T in this PR: (1) the real FwlibHostedBackend implementing IFocasBackend with the P/Invoke — requires either a CNC on the bench or a licensed FANUC developer kit to validate, tracked under #220 as a single follow-up task; (2) Admin /hosts surface integration for FOCAS runtime status — Galaxy Tier-C already has the shape, FOCAS can slot in when someone wires ObservedCrashes/StickyAlertActive/BackoffAttempt to the FleetStatusHub; (3) a full integration test that actually spawns a real FOCAS Host process — ProcessHostLauncher is tested via its contract + the MMF is tested via round-trip, but no test spins up the real exe (the Galaxy Tier-C tests do this, but the FOCAS equivalent adds no new coverage over what's already in place).
Total FOCAS-family tests green after this PR: 189 driver + 24 Shared + 13 Host = 226.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/compliance/phase-6-2-compliance.ps1 replaces the stub TODOs with 23
real checks spanning:
- Stream A: LdapGroupRoleMapping entity + AdminRole enum + ILdapGroupRoleMappingService
+ impl + write-time invariant + EF migration all present.
- Stream B: OpcUaOperation enum + NodeScope + AuthorizationDecision tri-state
+ IPermissionEvaluator + PermissionTrie + Builder + Cache keyed on
GenerationId + UserAuthorizationState with MembershipFreshnessInterval=15m
and AuthCacheMaxStaleness=5m + TriePermissionEvaluator + HistoryRead uses
its own flag.
- Control/data-plane separation: the evaluator + trie + cache + builder +
interface all have zero references to LdapGroupRoleMapping (decision #150).
- Stream C foundation: ILdapGroupsBearer + AuthorizationGate with StrictMode
knob. DriverNodeManager dispatch-path wiring (11 surfaces) is Deferred,
tracked as task #143.
- Stream D data layer: ValidatedNodeAclAuthoringService + exception type +
rejects None permissions. Blazor UI pieces (RoleGrantsTab, AclsTab,
SignalR invalidation, draft diff) are Deferred, tracked as task #144.
- Cross-cutting: full solution dotnet test runs; 1097 >= 1042 baseline;
tolerates the one pre-existing Client.CLI Subscribe flake.
IPermissionEvaluator doc-comment reworded to avoid mentioning the literal
type name "LdapGroupRoleMapping" — the compliance check does a text-absence
sweep for that identifier across the data-plane files.
docs/v2/implementation/phase-6-2-authorization-runtime.md status updated from
DRAFT to SHIPPED (core). Two deferred follow-ups explicitly called out so
operators see what's still pending for the "Phase 6.2 fully wired end-to-end"
milestone.
`Phase 6.2 compliance: PASS` — exit 0. Any regression that deletes a class
or re-introduces an LdapGroupRoleMapping reference into the data-plane
evaluator turns a green check red + exit non-zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/compliance/phase-6-1-compliance.ps1 replaces the stub TODOs with 34
real checks covering:
- Stream A: pipeline builder + CapabilityInvoker + WriteIdempotentAttribute
present; pipeline key includes HostName (per-device isolation per decision
#144); OnReadValue / OnWriteValue / HistoryRead route through invoker in
DriverNodeManager; Galaxy supervisor CircuitBreaker + Backoff preserved.
- Stream B: DriverTier enum; DriverTypeMetadata requires Tier; MemoryTracking
+ MemoryRecycle (Tier C-gated) + ScheduledRecycleScheduler (rejects Tier
A/B) + demand-aware WedgeDetector all present.
- Stream C: DriverHealthReport + HealthEndpointsHost; state matrix Healthy=200
/ Faulted=503 asserted in code; LogContextEnricher; JSON sink opt-in via
Serilog:WriteJson.
- Stream D: GenerationSealedCache + ReadOnly marking + GenerationCacheUnavailable
exception path; ResilientConfigReader + StaleConfigFlag.
- Stream E data layer: DriverInstanceResilienceStatus entity +
DriverResilienceStatusTracker. SignalR/Blazor surface is Deferred per the
visual-compliance follow-up pattern borrowed from Phase 6.4.
- Cross-cutting: full solution `dotnet test` runs; asserts 1042 >= 906
baseline; tolerates the one pre-existing Client.CLI Subscribe flake and
flags any new failure.
Running the script locally returns "Phase 6.1 compliance: PASS" — exit 0. Any
future regression that deletes a class or un-wires a dispatch path turns a
green check red + exit non-zero.
docs/v2/implementation/phase-6-1-resilience-and-observability.md status
updated from DRAFT to SHIPPED with the merged-PRs summary + test count delta +
the single deferred follow-up (visual review of the Admin /hosts columns).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After shipping the four Phase 6 plan drafts (PRs 77-80), the adversarial-review
adjustments lived only as trailing "Review" sections. An implementer reading
Stream A would find the original unadjusted guidance, then have to cross-reference
the review to reconcile. This PR makes the plans genuinely executable:
1. Merges every ACCEPTed review finding into the actual Scope / Stream / Compliance
sections of each phase plan:
- phase-6-1: Scope table rewrite (per-capability retry, (instance,host) pipeline key,
MemoryTracking vs MemoryRecycle split, hybrid watchdog formula, demand-aware
wedge detector, generation-sealed LiteDB). Streams A/B/D + Compliance rewritten.
- phase-6-2: AuthorizationDecision tri-state, control/data-plane separation,
MembershipFreshnessInterval (15 min), AuthCacheMaxStaleness (5 min),
subscription stamp-and-reevaluate. Stream C widened to 11 OPC UA operations.
- phase-6-3: 8-state ServiceLevel matrix (OPC UA Part 5 §6.3.34-compliant),
two-layer peer probe (/healthz + UaHealthProbe), apply-lease via await using,
publish-generation fencing, InvalidTopology runtime state, ServerUriArray
self-first + peers. New Stream F (interop matrix + Galaxy failover).
- phase-6-4: DraftRevisionToken concurrency control, staged-import via
EquipmentImportBatch with user-scoped visibility, CSV header version marker,
decision-#117-aligned identifier columns, 1000-row diff cap,
decision-#139 OPC 40010 fields, Identification inherits Equipment ACL.
2. Appends decisions #143 through #162 to docs/v2/plan.md capturing the
architectural commitments the adjustments created. Each decision carries its
dated rationale so future readers know why the choice was made.
3. Scaffolds scripts/compliance/phase-6-{1,2,3,4}-compliance.ps1 — PowerShell
stubs with Assert-Todo / Assert-Pass / Assert-Fail helpers. Every check
maps to a Stream task ID from the corresponding phase plan. Currently all
checks are TODO and scripts exit 0; each implementation task is responsible
for replacing its TODO with a real check before closing that task. Saved
as UTF-8 with BOM so Windows PowerShell 5.1 parses em-dash characters
without breaking.
Net result: the Phase 6.1 plan is genuinely ready to execute. Stream A.3 can
start tomorrow without reconciling Streams vs. Review on every task; the
compliance script is wired to the Stream IDs; plan.md has the architectural
commitments that justify the Stream choices.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames all 11 projects (5 src + 6 tests), the .slnx solution file, all source-file namespaces, all axaml namespace references, and all v1 documentation references in CLAUDE.md and docs/*.md (excluding docs/v2/ which is already in OtOpcUa form). Also updates the TopShelf service registration name from "LmxOpcUa" to "OtOpcUa" per Phase 0 Task 0.6.
Preserves runtime identifiers per Phase 0 Out-of-Scope rules to avoid breaking v1/v2 client trust during coexistence: OPC UA `ApplicationUri` defaults (`urn:{GalaxyName}:LmxOpcUa`), server `EndpointPath` (`/LmxOpcUa`), `ServerName` default (feeds cert subject CN), `MxAccessConfiguration.ClientName` default (defensive — stays "LmxOpcUa" for MxAccess audit-trail consistency), client OPC UA identifiers (`ApplicationName = "LmxOpcUaClient"`, `ApplicationUri = "urn:localhost:LmxOpcUaClient"`, cert directory `%LocalAppData%\LmxOpcUaClient\pki\`), and the `LmxOpcUaServer` class name (class rename out of Phase 0 scope per Task 0.5 sed pattern; happens in Phase 1 alongside `LmxNodeManager → GenericDriverNodeManager` Core extraction). 23 LmxOpcUa references retained, all enumerated and justified in `docs/v2/implementation/exit-gate-phase-0.md`.
Build clean: 0 errors, 30 warnings (lower than baseline 167). Tests at strict improvement over baseline: 821 passing / 1 failing vs baseline 820 / 2 (one flaky pre-existing failure passed this run; the other still fails — both pre-existing and unrelated to the rename). `Client.UI.Tests`, `Historian.Aveva.Tests`, `Client.Shared.Tests`, `IntegrationTests` all match baseline exactly. Exit gate compliance results recorded in `docs/v2/implementation/exit-gate-phase-0.md` with all 7 checks PASS or DEFERRED-to-PR-review (#7 service install verification needs Windows service permissions on the reviewer's box).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record Phase 0 entry baseline: 820 passing, 2 pre-existing failures (Client.CLI.Tests.SubscribeCommandTests.Execute_PrintsSubscriptionMessage and Tests.MxAccess.MxAccessClientMonitorTests.Monitor_ProbeDataChange_PreventsStaleReconnect), 0 build errors, 167 build warnings. The two failures exist on v2 as of commit 1189dc8 and are unrelated to the rename. Phase 0 exit gate adapts the requirement to "failure count = baseline (2); pass count ≥ baseline (820)".
Branch-naming convention updated in implementation/overview.md and phase-0 doc: cannot use `v2/phase-N-slug` form because git treats `/` as path separator and `v2` already exists as a branch, blocking creation of any `v2/...` branch. Convention is now `phase-N-slug` (no v2/ prefix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OtOpcUa-side requirements all met or trivially met by v2: Basic256Sha256 + SignAndEncrypt + username token (transport security covers this), reject-and-trust cert workflow, endpoint URL must NOT include /discovery suffix (forum-documented failure mode), hostname-stable certs (decision #86 already enforces this since clients pin trust to ApplicationUri), OI Gateway service must NOT run under SYSTEM (deployment-guide concern). Two integrator-burden risks tracked: validation/GxP paperwork (no AVEVA blueprint exists for non-AVEVA upstream servers in Part 11 deployments — engage QA/regulatory in Year 1) and unpublished scale benchmarks (in-house benchmark required in Year 2 before cutover scheduling).
Phase 1 acceptance gains Task E.10 (decision #142): end-to-end AppServer-via-OI-Gateway smoke test against a Phase 1 OtOpcUa instance, catching AppServer-specific quirks (cert exchange, endpoint URL handling, service account, security mode combo) well before the Year 3 tier-3 cutover schedule. Non-blocking for Phase 1 exit if it surfaces only documentation-level fixes; blocking if it surfaces architectural incompatibility.
New file `docs/v2/aveva-system-platform-io-research.md` captures the full research with all source citations (AVEVA docs, Communications Drivers Pack readmes, Software Toolbox / InSource partner walkthroughs, Inductive Automation forum failure-mode reports). Plan.md decision log gains #141 and #142; Reference Documents section links the new doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ACL design defines NodePermissions bitmask flags covering Browse / Read / Subscribe / HistoryRead / WriteOperate / WriteTune / WriteConfigure / AlarmRead / AlarmAcknowledge / AlarmConfirm / AlarmShelve / MethodCall plus common bundles (ReadOnly / Operator / Engineer / Admin); 6-level scope hierarchy (Cluster / Namespace / UnsArea / UnsLine / Equipment / Tag) with default-deny + additive grants and Browse-implication on ancestors; per-LDAP-group grants in a new generation-versioned NodeAcl table edited via the same draft → diff → publish → rollback boundary as every other content table; per-session permission-trie evaluator with O(depth × group-count) cost cached for the lifetime of the session and rebuilt on generation-apply or LDAP group cache expiry; cluster-create workflow seeds a default ACL set matching the v1 LmxOpcUa LDAP-role-to-permission map for v1 → v2 consumer migration parity; Admin UI ACL tab with two views (by LDAP group, by scope), bulk-grant flow, and permission simulator that lets operators preview "as user X" effective permissions across the cluster's UNS tree before publishing; explicit Deny deferred to v2.1 since verbose grants suffice at v2.0 fleet sizes; only denied OPC UA operations are audit-logged (not allowed ones — would dwarf the audit log). Schema doc gains the NodeAcl table with cross-cluster invariant enforcement and same-generation FK validation; admin-ui.md gains the ACLs tab; phase-1 doc gains Task E.9 wiring this through Stream E plus a NodeAcl entry in Task B.1's DbContext list.
Dev-environment doc inventories every external resource the v2 build needs across two tiers per decision #99 — inner-loop (in-process simulators on developer machines: SQL Server local or container, GLAuth at C:\publish\glauth\, local dev Galaxy) and integration (one dedicated Windows host with Docker Desktop on WSL2 backend so TwinCAT XAR VM can run in Hyper-V alongside containerized oitc/modbus-server, plus WSL2-hosted Snap7 and ab_server, plus OPC Foundation reference server, plus FOCAS TestStub and FaultShim) — with concrete container images, ports, default dev credentials (clearly marked dev-only since production uses Integrated Security / gMSA per decision #46), bootstrap order for both tiers, network topology diagram, test data seed locations, and operational risks (TwinCAT trial expiry automation, Docker pricing, integration host SPOF mitigation, per-developer GLAuth config sync, Aveva license scoping that keeps Galaxy tests on developer machines and off the shared host).
Removes consumer cutover (ScadaBridge / Ignition / System Platform IO) from OtOpcUa v2 scope per decision #136 — owned by a separate integration / operations team, tracked in 3-year-plan handoff §"Rollout Posture" and corrections §C5; OtOpcUa team's scope ends at Phase 5. Updates implementation/overview.md phase index to drop the "6+" row and add an explicit "OUT of v2 scope" callout; updates phase-1 and phase-2 docs to reframe cutover as integration-team-owned rather than future-phase numbered.
Decisions #129–137 added: ACL model (#129), NodeAcl generation-versioned (#130), v1-compatibility seed (#131), denied-only audit logging (#132), two-tier dev environment (#133), Docker WSL2 backend for TwinCAT VM coexistence (#134), TwinCAT VM centrally managed / Galaxy on dev machines only (#135), cutover out of v2 scope (#136), dev credentials documented openly (#137).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add Phase 2 detailed implementation plan (docs/v2/implementation/phase-2-galaxy-out-of-process.md) covering the largest refactor phase — moving Galaxy from the legacy in-process OtOpcUa.Host project into the Tier C out-of-process topology specified in driver-stability.md. Five work streams: A. Driver.Galaxy.Shared (.NET Standard 2.0 IPC contracts using MessagePack with hello-message version negotiation), B. Driver.Galaxy.Host (.NET 4.8 x86 separate Windows service that owns MxAccessBridge / GalaxyRepository / alarm tracking / GalaxyRuntimeProbeManager / Wonderware Historian SDK / STA thread + Win32 message pump with health probe / MxAccessHandle SafeHandle for COM lifetime / subscription registry with cross-host quality scoping / named-pipe IPC server with mandatory ACL + caller SID verification + per-process shared secret / memory watchdog with Galaxy-specific 1.5x baseline + 200MB floor + 1.5GB ceiling / recycle policy with 15s grace + WM_QUIT escalation to hard-exit / post-mortem MMF writer / Driver.Galaxy.FaultShim test-only assembly), C. Driver.Galaxy.Proxy (.NET 10 in-process driver implementing every capability interface, heartbeat sender on dedicated channel with 2s/3-miss tolerance, supervisor with respawn-with-backoff and crash-loop circuit breaker with escalating cooldown 1h/4h/24h, address space build via IAddressSpaceBuilder producing byte-equivalent v1 output), D. Retire legacy OtOpcUa.Host (delete from solution, two-service Windows installer, migrate appsettings.json Galaxy sections to central DB DriverConfig blob), E. Parity validation (v1 IntegrationTests pass count = baseline failures = 0, scripted Client.CLI walkthrough output diff vs v1 only differs in timestamps/latency, four named regression tests for the 2026-04-13 stability findings). Compliance script verifies all eight Tier C cross-cutting protections have named passing tests. Decision #128 captures the survey-removal; cross-references added to plan.md Reference Documents and overview.md phase index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>