a23de2a7e45d52b6508fb5f92ec27a24d29c39d8
7 Commits
a23de2a7e4
Phase 6.3 A.2 + D.1 — GenerationRefreshHostedService: poll + lease-wrap apply
Closes tasks #132 + #118 (GA hardening backlog).

Before this commit, the Server only observed the generation in force at
process start (SealedBootstrap). Peer-published generations accumulated
in the shared config DB while the running node kept serving the
generation it had sealed on boot. Two consequences:
1. Operator role-swaps required a process restart — Admin publishes a
   new generation, but the Server's RedundancyCoordinator never re-read
   the topology.
2. ApplyLeaseRegistry had no apply to wrap. ServiceLevelBand sat at
   PrimaryHealthy (255) during every publish because nothing opened a
   lease; PrimaryMidApply (200) was effectively dead code.

New GenerationRefreshHostedService (src/.../Server/Hosting/):
- Polls sp_GetCurrentGenerationForCluster every 5 s (tunable).
- On change: opens leases.BeginApplyLease(newGenerationId, Guid.NewGuid()),
  calls coordinator.RefreshAsync inside the `await using`, releases on
  scope exit (success / exception / cancellation via IAsyncDisposable).
- Diagnostic properties: LastAppliedGenerationId, TickCount, RefreshCount.
- Delegate-injected currentGenerationQuery for test drive-through; real
  path is the private static DefaultQueryCurrentGenerationAsync.
- Registered as HostedService in Program.cs alongside the Phase 6.3
  redundancy / peer-probe stack.

Scope intentionally narrow: only the coordinator refreshes today.
Driver re-init, virtual-tag re-bind, script-engine reload remain as
follow-up wiring. The lease wrap is the right seam for those
subscribers to hook once they grow hot-reload support — the doc
comments say so.

Tests
- 5 new unit tests in GenerationRefreshHostedServiceTests (first-apply,
  identity no-op, change-triggers-refresh, null-generation-is-no-op,
  lease-is-released-on-exit). Stub generation-query delegate; real
  coordinator backed by EF InMemory DB.
- Server.Tests total 252 → 257.

Docs
- v2-release-readiness.md Phase 6.3 follow-ups list marks the
  sp_PublishGeneration lease-wrap bullet struck through with a
  close-out note.
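The change-detect + lease-wrap tick described above can be sketched as
follows. This is a minimal sketch with stand-in interfaces; the real
ApplyLeaseRegistry / RedundancyCoordinator surfaces may differ:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: ILeaseRegistry / ICoordinator are stand-ins for the real
// ApplyLeaseRegistry / RedundancyCoordinator surfaces, which may differ.
public interface ILeaseRegistry
{
    IAsyncDisposable BeginApplyLease(Guid generationId, Guid leaseId);
}

public interface ICoordinator
{
    Task RefreshAsync(CancellationToken ct);
}

public sealed class RefreshTickSketch(
    Func<CancellationToken, Task<Guid?>> currentGenerationQuery,
    ILeaseRegistry leases,
    ICoordinator coordinator)
{
    public Guid? LastAppliedGenerationId { get; private set; }
    public int RefreshCount { get; private set; }

    public async Task TickAsync(CancellationToken ct)
    {
        Guid? current = await currentGenerationQuery(ct);

        // Null result or identity: nothing to apply this tick.
        if (current is null || current == LastAppliedGenerationId)
            return;

        // Lease held for the duration of the apply; IAsyncDisposable
        // guarantees release on success, exception, or cancellation.
        await using (leases.BeginApplyLease(current.Value, Guid.NewGuid()))
        {
            await coordinator.RefreshAsync(ct);
        }

        LastAppliedGenerationId = current;
        RefreshCount++;
    }
}
```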
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

de77d42eab
Phase 6.3 Stream B — peer-probe HostedServices populating PeerReachabilityTracker
Closes task #116 (GA hardening backlog).

Before this commit the RedundancyStatePublisher saw
PeerReachability.Unknown for every peer because the tracker had no
writers — every healthy peer got degraded to the Isolated-Primary band
(230) even when fully reachable. Not release-blocking (safe default),
but not the full non-transparent-redundancy UX either.

Two-layer probe model per
docs/v2/implementation/phase-6-3-redundancy-runtime.md §Stream B:
- PeerHttpProbeLoop (Stream B.1) — fast-fail layer at 2 s / 1 s
  timeout. Hits each peer's http://{Host}:{DashboardPort}/healthz via
  an injected IHttpClientFactory. Writes the HTTP bit of
  PeerReachability while preserving the UA bit from the last UA probe
  so a transient HTTP blip doesn't clobber the authoritative UA reading.
- PeerUaProbeLoop (Stream B.2) — authoritative layer at 10 s / 5 s
  timeout. Calls DiscoveryClient.GetEndpoints against
  opc.tcp://{Host}:{OpcUaPort} — cheap compared to a full
  Session.Create, no cert trust required. Short-circuits when the HTTP
  probe last reported the peer unhealthy (no wasted handshakes on a
  known-dead endpoint), clearing the stale UaHealthy bit in that case.

Both inherit from BackgroundService, follow the tick/delay/catch
pattern RedundancyPublisherHostedService +
ResilienceStatusPublisherHostedService established, and expose
TickAsync() as internal for test drive-through. New PeerProbeOptions
class carries the four intervals/timeouts so operators can tune cadence
per site. Registered as singleton in Program.cs; HTTP client registered
by name so the OtOpcUa handler chain (Serilog enrichers, potential
future OpenTelemetry instrumentation) isn't bypassed.

Tests — 9 new unit tests across PeerHttpProbeLoopTests (5) and
PeerUaProbeLoopTests (4). All pass. Server.Tests total 243 → 252. Full
solution build clean.

Docs: v2-release-readiness.md Phase 6.3 follow-ups list marks the
peer-probe bullet struck through with a close-out note.
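The preserve-the-other-bit write described above is a read-modify-write
on a flags value. A sketch, assuming a hypothetical two-bit
PeerReachability flags enum (the real tracker's representation may
differ):

```csharp
using System;

// Sketch: a hypothetical two-bit reachability value; the real
// PeerReachabilityTracker's representation may differ.
[Flags]
public enum PeerReachability
{
    Unknown     = 0,
    HttpHealthy = 1 << 0, // fast-fail HTTP probe bit
    UaHealthy   = 1 << 1, // authoritative OPC UA probe bit
}

public static class ReachabilitySketch
{
    // HTTP probe result: set/clear only the HTTP bit, preserving the
    // UA bit so a transient HTTP blip doesn't clobber the
    // authoritative UA reading.
    public static PeerReachability ApplyHttpResult(PeerReachability last, bool healthy) =>
        healthy ? last | PeerReachability.HttpHealthy
                : last & ~PeerReachability.HttpHealthy;

    // UA probe short-circuit: HTTP says the peer is dead, so skip the
    // GetEndpoints handshake and clear the now-stale UA bit.
    public static PeerReachability ClearStaleUa(PeerReachability last) =>
        last & ~PeerReachability.UaHealthy;
}
```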
Still deferred in Phase 6.3:
- OPC UA variable-node binding (task #117 — ServiceLevel + ServerUriArray)
- sp_PublishGeneration lease wrap (task #118)
- Client interop matrix (task #119)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4b0664bd55 |
FOCAS — retire Tier-C split, inline managed wire client, make read-only
Migration closes the FOCAS Tier-C architecture. OtOpcUa previously had
`Driver.FOCAS.Host` (NSSM-wrapped Windows service loading Fwlib64.dll via
P/Invoke) + `Driver.FOCAS.Shared` (MessagePack IPC contracts) + a C shim
DLL stand-in for unit tests. All of it is deleted; the driver is now a
single in-process managed assembly talking the FOCAS/2 Ethernet binary
protocol directly on TCP:8193.
Architecture
- Pure-managed `FocasWireClient` inlined at `src/.../Driver.FOCAS/Wire/`
(owner-imported — see Wire/FocasWireClient.cs for the full surface).
Opens two TCP sockets, runs the initiate handshake, serialises requests
on socket 2 through a semaphore, closes cleanly with PDU + socket
teardown. Both sync `IDisposable` and async `IAsyncDisposable`.
- `WireFocasClient` (same folder) adapts the wire client to OtOpcUa's
`IFocasClient` surface — fixed-tree reads, PARAM/MACRO/PMC addresses,
alarms. Writes return `BadNotWritable` by design — OtOpcUa is read-only
against FOCAS.
- `FocasDriverFactoryExtensions` now accepts `"Backend": "wire"` (default)
and `"Backend": "unimplemented"`. Legacy `ipc` and `fwlib` backends are
rejected at startup with a diagnostic pointing at the migration doc.
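A hypothetical driver-config fragment for the backend selection. The
`Drivers`/`FOCAS` nesting and the `Host`/`Port` keys are illustrative
assumptions; only the `"Backend"` values come from this commit
(`"wire"` is the default, `"unimplemented"` is accepted, and `ipc` /
`fwlib` are rejected at startup):

```json
{
  "Drivers": {
    "FOCAS": {
      "Backend": "wire",
      "Host": "192.168.0.10",
      "Port": 8193
    }
  }
}
```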
Deletions
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/` — whole project + Ipc/,
Backend/, Stability/, Program.cs.
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/` — Contracts/, FrameReader,
FrameWriter, whole project.
- `tests/...Driver.FOCAS.Host.Tests/` + `.Shared.Tests/` — whole projects.
- `src/.../Driver.FOCAS/FwlibNative.cs` + `FwlibFocasClient.cs` — 21
P/Invokes + 7 `Pack=1` marshalling structs + the Fwlib-backed
`IFocasClient` implementation.
- `src/.../Driver.FOCAS/Ipc/` + `Supervisor/` — IPC client wrapper +
Host-process supervisor (backoff, circuit breaker, heartbeat, post-
mortem reader, process launcher).
- `scripts/install/Install-FocasHost.ps1` — NSSM service installer.
- `tests/.../Driver.FOCAS.Tests/{IpcFocasClientTests, IpcLoopback,
FwlibNativeHelperTests, PostMortemReaderCompatibilityTests,
SupervisorTests, FocasDriverFactoryExtensionsTests}.cs` — tests that
exercised the retired surfaces.
- `tests/.../Driver.FOCAS.IntegrationTests/Shim/` — the zig-built C shim
DLL that masqueraded as Fwlib64.dll.
Solution changes
- `ZB.MOM.WW.OtOpcUa.slnx` drops the 4 retired project refs.
- `src/.../Driver.FOCAS.csproj` drops the Shared ProjectReference, adds
`Microsoft.Extensions.Logging.Abstractions` for the optional `ILogger`
hook in `FocasWireClient`.
- `src/.../Driver.FOCAS.Cli.csproj` drops the six `<Content Include>`
entries that copied `vendor/fanuc/*.dll` into the CLI bin. CLI now uses
`WireFocasClient` directly.
- `FocasDriver` default factory flips to `Wire.WireFocasClientFactory`.
Integration tests
- New `tests/.../Driver.FOCAS.IntegrationTests/` project covering fixed-
tree reads (identity, axes, dynamic, program, operation mode, timers,
spindle load + max RPM, servo meters), user-authored PARAM / MACRO /
PMC reads, `DiscoverAsync` emission, `SubscribeAsync` + `OnDataChange`,
`IAlarmSource` raise/clear transitions, and `ProbeAsync` /
`OnHostStatusChanged`. 9 e2e tests against the focas-mock fixture
(Docker container with the vendored Python mock's native FOCAS/2
Ethernet responder).
- `scripts/integration/run-focas.ps1` orchestrates compose up → tests →
compose down. Dropped the shim-build stage + DLL-copy step + the split
testhost workaround (the latter only existed because of native-DLL
lifecycle bugs the shim tripped).
- Docker compose collapses from 11 per-series services to one `focas-sim`
service. Tests seed per-series state via `mock_load_profile` at test
start.
- Vendored focas-mock snapshot refreshed to pick up upstream's native
FOCAS/2 Ethernet responder (was 660 lines, now 1018) — the
pre-refresh snapshot only spoke the JSON admin protocol.
Tests
- 145/145 unit tests in `Driver.FOCAS.Tests` pass (was 208 pre-deletion;
63 removed tests exercised the retired IPC/shim/supervisor/Fwlib
surfaces).
- 9/9 integration tests pass against the refreshed mock.
- `FocasScaffoldingTests.Unimplemented_factory_throws_on_Create…` updated
to assert the new diagnostic message pointing at
`docs/drivers/FOCAS.md` rather than the now-gone `Fwlib64.dll`.
Docs
- `docs/drivers/FOCAS.md` rewritten for the managed wire topology —
deployment collapses to one `"Backend": "wire"` config block, no
separate service, no DLL deployment, no pipe ACL.
- `docs/drivers/FOCAS-Test-Fixture.md` updated — single TCP probe skip
gate instead of TCP + shim probe; fewer moving parts.
- `docs/drivers/README.md` row for FOCAS reflects the Tier-A managed
topology (previously listed Tier-C + `Fwlib64.dll` P/Invoke).
- `docs/Driver.FOCAS.Cli.md` drops the Tier-C architecture-note section.
- `docs/v2/implementation/focas-isolation-plan.md` marked historical —
the plan it documents was executed then superseded by the wire client.
- `docs/v2/v2-release-readiness.md` re-audited 2026-04-24. Phase 5
driver complement closed. FOCAS change-log entry added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e71f44603c |
v2 release-readiness — blocker #3 closed; all three code-path blockers shut
Phase 6.3 Streams A + C core shipped (PRs #98-99):
- RedundancyCoordinator + ClusterTopologyLoader read the shared config
  DB + enforce the Phase 6.3 invariants (1-2 nodes, unique
  ApplicationUri, ≤1 Primary in Warm/Hot). Startup fails fast on
  violation.
- RedundancyStatePublisher orchestrates topology + apply lease +
  recovery state + peer reachability through ServiceLevelCalculator.
  Edge-triggered OnStateChanged + OnServerUriArrayChanged events the
  OPC UA variable-node layer subscribes to.

Doc updates:
- Top status flips from NOT YET RELEASE-READY → RELEASE-READY
  (code-path). Remaining work is manual (client interop matrix,
  deployment signoff, OPC UA CTT pass) + hardening follow-ups that
  don't block v2 GA ship.
- Release-blocker #3 section struck through + CLOSED with PR links.
  Remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA
  variable-node binding, sp_PublishGeneration lease wrap, client
  interop) explicitly listed as hardening follow-ups.
- Change log: new dated entry.

All three release blockers identified at the capstone are closed:
- #1 Phase 6.2 dispatch wiring → PR #94 (2026-04-19)
- #2 Phase 6.1 Stream D wiring → PR #96 (2026-04-19)
- #3 Phase 6.3 Streams A/C core → PRs #98-99 (2026-04-19)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
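The ServiceLevelBand values quoted across these commit messages map
onto the OPC UA ServiceLevel byte. A sketch using only those quoted
values; the names are cast as a C# enum here, and the existence and
naming of further bands are assumptions:

```csharp
// Band values as quoted in these commit messages (OPC UA ServiceLevel
// is a byte). The real type in ServiceLevelCalculator's assembly
// likely defines more bands than this sketch shows.
public enum ServiceLevelBand : byte
{
    PrimaryMidApply = 200, // apply lease open: generation mid-apply
    IsolatedPrimary = 230, // healthy, but peer reachability unknown/down
    PrimaryHealthy  = 255, // fully healthy primary
}
```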
a8401ab8fd |
v2 release-readiness — blocker #2 closed; doc reflects state
PR #96 closed the Phase 6.1 Stream D config-cache wiring blocker.
- Status line: "one of three release blockers remains".
- Blocker #2 struck through + CLOSED with PR link. Periodic-poller +
  richer-snapshot-payload follow-ups downgraded to hardening.
- Change log: dated entry.

One blocker remains: Phase 6.3 Streams A/C/F redundancy runtime
(tasks #145, #147, #150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ba42967943 |
v2 release-readiness — blocker #1 closed; doc reflects state
PR #94 closed the Phase 6.2 dispatch wiring blocker. Update the dashboard:
- Status line: "two of three release blockers remain".
- Release-blocker #1 section struck through + marked CLOSED with PR
  link. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call
  + finer-grained scope resolution) downgraded to hardening follow-ups
  — not release-blocking.
- Change log: new dated entry.

Two remaining blockers: Phase 6.1 Stream D config-cache wiring (task
#136) + Phase 6.3 Streams A/C/F redundancy runtime (tasks #145, #147,
#150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3b2d0474a7 |
v2 release-readiness capstone — aggregate compliance runner + release-readiness dashboard
Closes out Phase 6 with the two pieces a release engineer needs before
tagging v2 GA:
1. scripts/compliance/phase-6-all.ps1 — meta-runner that invokes every
per-phase Phase 6.N compliance script in sequence + aggregates results.
Each sub-script runs in its own powershell.exe child process so per-script
$ErrorActionPreference + exit semantics can't interfere with the parent.
Exit 0 = every phase passes; exit 1 = one or more phases failed. Prints a
PASS/FAIL summary matrix at the end.
2. docs/v2/v2-release-readiness.md — single-view dashboard of everything
shipped + everything still deferred + release exit criteria. Called out
explicitly:
- Three release BLOCKERS (must close before v2 GA):
* Phase 6.2 Stream C dispatch wiring — AuthorizationGate exists but no
DriverNodeManager Read/Write/etc. path calls it (task #143).
* Phase 6.1 Stream D follow-up — ResilientConfigReader + sealed-cache
hook not yet consumed by any read path (task #136).
* Phase 6.3 Streams A/C/F — coordinator + UA-node wiring + client
interop still deferred (tasks #145, #147, #150).
- Three nice-to-haves (not release-blocking) — Admin UI polish, background
services, multi-host dispatch.
- Release exit criteria: all 4 compliance scripts exit 0, dotnet test ≤ 1
known flake, blockers closed or v2.1-deferred with written decision,
Fleet Admin signoff on deployment checklist, live-Galaxy smoke test,
OPC UA CTT pass, redundancy cutover validated with at least one
production client.
- Change log at the bottom so future ships of deferred follow-ups just
append dates + close out dashboard rows.
Meta-runner verified locally:
Phase 6.1 — PASS
Phase 6.2 — PASS
Phase 6.3 — PASS
Phase 6.4 — PASS
Aggregate: PASS (elapsed 340 s — most of that is the full solution
`dotnet test` each phase runs).
Net counts at capstone time: 906 baseline → 1159 passing across Phase 6
(+253). 15 deferred follow-up tasks tracked with IDs (#134-137, #143-144,
#145, #147, #149-150, #153, #155-157). v2 is NOT YET release-ready —
capstone makes that explicit rather than letting the "shipped" label on
each phase imply full readiness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>