FOCAS cnc_getfigure + Wonderware poison-event status + AbCip nested-UDT live-gate — Design

Date: 2026-06-18
Branch: feat/focas-figure-ww-poison-abcip-gate (off master 274ba2b1)
Backlog items: stillpending.md §A #3 (FOCAS cnc_getfigure), #5 (Wonderware poison-event sidecar wire), #6 (AbCip nested-struct prod live-gate)

Three independent backlog items bundled into one phase. They touch disjoint projects (Driver.FOCAS + its python sim / Driver.Historian.Wonderware{,.Client} + Core.AlarmHistorian / Driver.AbCip.IntegrationTests + docs), so they are independently implementable and parallelizable.


Standing constraints (in force)

  • NO Commons wire/proto contract change, NO Core.Abstractions / breaking interface contract change, NO EF migration, NO bUnit (Razor proven only by live /run — N/A this phase, no Razor touched).
  • Stage by explicit path, never git add .; never stage the never-stage files (sql_login.txt, src/Server/.../Host/pki/, pending.md, current.md, stillpending.md, docker-dev/docker-compose.yml).
  • No force-push, no --no-verify. Never echo/commit secrets. Finish = merge to master + push.
  • dangerouslyDisableSandbox: true for all build/test/rig commands.

Component A — FOCAS cnc_getfigure wire command (backlog #3)

The finding that makes this safe

WireFocasClient (the pure-managed FOCAS/2 Ethernet client) is the production path to real CNCs (not the Fwlib P/Invoke path). Its GetPositionFiguresAsync is a hard-coded empty-list stub (Wire/WireFocasClient.cs:287-297). The consumer wraps it in a graceful probe:

// FocasDriver.cs:662
state.PositionFigures = await SafeProbe(() => client.GetPositionFiguresAsync(ct), []);

and AxisFactor (FocasDriver.cs:~853) uses a per-axis figure only when present and ≥ 0, else falls back to the PositionDecimalPlaces config knob. So a wire command that errors or returns empty degrades to exactly today's behavior — implementing it is monotonic and cannot regress real hardware.
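
The fallback rule above can be sketched as follows. This is a minimal illustration in Python (the language of the mock), not the driver's C# code: the names axis_factor and scale_position and the list-based figures argument are assumptions made for the example.

```python
from typing import Sequence


def axis_factor(figures: Sequence[int], axis_index: int,
                config_decimal_places: int) -> float:
    """Return the divisor applied to a raw integer position for one axis.

    A per-axis figure is used only when it is present and >= 0; otherwise
    the configured decimal-place count (the PositionDecimalPlaces knob in
    the text) is used.
    """
    has_figure = 0 <= axis_index < len(figures) and figures[axis_index] >= 0
    places = figures[axis_index] if has_figure else config_decimal_places
    return float(10 ** places)


def scale_position(raw: int, factor: float) -> float:
    """Scaled position = raw / 10^dec."""
    return raw / factor
```

With an empty figure list (today's stub, or any wire error) every axis resolves to the config knob, which is why a failing wire command degrades to current behavior.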

The .NET FocasWireClient and the python focas_mock (tests/.../FOCAS.IntegrationTests/Docker/focas-mock/) speak a co-designed, internally-consistent wire protocol: the mock dispatches on command codes in server.py:_wire_payload (0x56=servo-meter, 0x26=axis position, 0x89=axis names, 0x120=timer, …). The protocol is validated against the sim, not against real Fanuc hardware — that validation is bench-CNC-gated for the entire wire backend (docs/v2/implementation/focas-wire-protocol.md), not unique to this command.

Approach (driver-internal + sim; NO interface change — IFocasClient.GetPositionFiguresAsync already exists)

  1. FocasWireClient.ReadPositionFiguresAsync(...): mirror ReadServoMeterAsync (FocasWireClient.cs:351-386) by sending the figure request paired with the axis-name request (0x0089), so figures align positionally to axes (exactly how servo-meter pairs 0x0056+0x0089). Pick a wire command id currently unused by both the client and the mock. The mock uses 0x0E/0x10/0x15/0x16/0x18/0x19/0x1A/0x1C/0x1D/0x23/0x24/0x25/0x26/0x35/0x40/0x56/0x57/0x89/0x8A/0x98/0xE1/0xFC/0x120/0x8001/0x8002, so choose a clearly-unused code, e.g. 0x00D3, and document it as sim-consistent / bench-CNC-unvalidated. Response payload = per-axis short dec (decimal-place count); parse into IReadOnlyList<int>. rc != 0 → empty list (graceful).
  2. WireFocasClient.GetPositionFiguresAsync: replace the stub by calling the new client method and returning its list; any failure path returns empty (never throws, so the SafeProbe + per-axis fallback contract is preserved). Rewrite the now-stale <returns> doc.
  3. focas_mock: add a cnc_getfigure admin/handler entry + a _wire_payload branch for the chosen code returning per-axis short dec from a new data-store key position_figures (default 0 per axis → 10^0 = 1.0 factor → no scaling → every existing integration assertion preserved). Register the method name in constants.py:IMPLEMENTED_FOCAS_METHODS + the server handler map + profile exports as needed.
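
A minimal sketch of the payload round trip described in the steps above, under stated assumptions: the figures are packed as signed 16-bit values in big-endian order (the byte order is an assumption, not taken from the protocol doc), and the helper names encode_position_figures and parse_position_figures are hypothetical. 0x00D3 is the example code suggested in the text.

```python
import struct
from typing import List, Sequence

# Example command id from the text; sim-consistent, bench-CNC-unvalidated.
FIGURE_COMMAND = 0x00D3


def encode_position_figures(figures: Sequence[int]) -> bytes:
    """Mock side: pack one signed 16-bit decimal-place count per axis.

    Big-endian byte order is an assumption made for this sketch.
    """
    return struct.pack(f">{len(figures)}h", *figures)


def parse_position_figures(rc: int, payload: bytes,
                           axis_count: int) -> List[int]:
    """Client side: return per-axis figures, or [] on any failure.

    A non-zero rc or a short payload yields an empty list, so the caller
    falls back to the configured decimal places instead of raising.
    """
    if rc != 0 or len(payload) < 2 * axis_count:
        return []
    return list(struct.unpack_from(f">{axis_count}h", payload, 0))
```

Returning an empty list rather than raising keeps the graceful-probe contract: the consumer sees the same result as with the current stub.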

Testing

  • Driver consumption is already covered offline by FocasPositionAutoScaleTests (FakeFocasClient returns figures → asserts scaled vs. fallback). No change needed there.
  • New end-to-end proof = a skip-gated integration test in Series/WireBackendCoverageTests.cs (the established pattern, [Collection(FocasSimCollection.Name)] + if (_fx.SkipReason is not null) Assert.Skip(...)): mock_patch non-zero position_figures, init the wire-backed FocasDriver, assert the published AbsolutePosition is the scaled value (raw ÷ 10^dec), not the raw integer.
  • Live /run for A = bring the focas-mock up locally on the Mac (docker compose -f tests/.../FOCAS.IntegrationTests/Docker/docker-compose.yml up -d --build) and run the FOCAS integration suite; the new test executes (does not skip) and passes.

Honest boundary (documented, not built)

Real-Fanuc validation of the chosen wire code/payload stays bench-CNC-gated — same status as the whole wire backend. Worst case on real hardware = graceful fallback to the config knob (i.e. no regression).


Component B — Wonderware poison-event per-event status (backlog #5)

The finding that shapes it

The sidecar IPC reply WriteAlarmEventsReply.PerEventOk is a bool[] on both ends — which are both in this repo (...Wonderware.Client/Ipc/Contracts.cs + ...Wonderware/Ipc/Contracts.cs, MessagePack over TCP; not a Commons proto). The client (WonderwareHistorianClient.WriteBatchAsync:340-369) can therefore only produce Ack/RetryPlease, never PermanentFail, so a poison event loops to the retry cap instead of dead-lettering immediately. The HistorianWriteOutcome enum already has PermanentFail and the sink (SqliteStoreAndForwardSink.cs:456-465) already dead-letters it immediately. The sidecar writer seam IAlarmEventWriter.WriteAsync returns only bool[] and its sole real impl is the infra-gated AahClientManagedAlarmEventWriter (AAH SDK).

Approach (additive IPC field + sidecar classifier; NO IAlarmEventWriter change, NO Commons)

  1. Additive wire field (both Contracts.cs): add [Key(4)] byte[] PerEventStatus to WriteAlarmEventsReply (0=Ack, 1=Retry, 2=Permanent). Keep PerEventOk [Key(3)] populated for rolling-deploy back-compat (new client ↔ old sidecar: empty Key(4) → fall back to PerEventOk; old client ↔ new sidecar: ignores Key(4)).
  2. Sidecar HistorianFrameHandler.HandleWriteAlarmEventsAsync: add a pure ClassifyEvents(events) step — an event that is structurally malformed (empty SourceName, empty AlarmType, or EventTimeUtcTicks <= 0) can never persist → mark Permanent and exclude it from the writer batch (mirrors the client's existing corrupt-row exclusion). Remaining events go to the writer; true→Ack, false→Retry. Populate both PerEventOk (Ack→true else false) and PerEventStatus.
  3. Client WriteBatchAsync: when reply.PerEventStatus.Length == batch.Count, map 0/1/2 → Ack/RetryPlease/PermanentFail; else fall back to the existing PerEventOk path. Rewrite the stale <remarks> + inline "PermanentFail is never emitted" comments.
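
The sidecar classification in step 2 can be sketched as below. This is an illustration in Python rather than the sidecar's C# code; the AlarmEvent shape, the writer callable, and the function names are assumptions. Skipping the writer call when no valid events remain is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

ACK, RETRY, PERMANENT = 0, 1, 2  # PerEventStatus byte values from the text


@dataclass(frozen=True)
class AlarmEvent:
    source_name: str
    alarm_type: str
    event_time_utc_ticks: int


def is_malformed(event: AlarmEvent) -> bool:
    """Structurally malformed events can never persist."""
    return (not event.source_name or not event.alarm_type
            or event.event_time_utc_ticks <= 0)


def handle_write(events: Sequence[AlarmEvent],
                 writer: Callable[[List[AlarmEvent]], List[bool]]
                 ) -> Tuple[List[bool], bytes]:
    """Classify events, write only the valid ones, and build both reply fields."""
    status = [PERMANENT if is_malformed(e) else RETRY for e in events]
    valid_indexes = [i for i, s in enumerate(status) if s != PERMANENT]
    # Malformed events are excluded from the writer batch.
    results = writer([events[i] for i in valid_indexes]) if valid_indexes else []
    for i, ok in zip(valid_indexes, results):
        status[i] = ACK if ok else RETRY
    per_event_ok = [s == ACK for s in status]  # legacy Key(3) field
    return per_event_ok, bytes(status)         # new Key(4) field
```

Populating both fields from the same status list keeps them consistent, so an old client reading only PerEventOk still sees a poison event as not acknowledged.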

Testing (fully offline — no rig)

  • Sidecar: pure ClassifyEvents unit test (malformed → Permanent + excluded; valid → delegated; writer false → Retry).
  • Client: a FakeSidecarServer reply with PerEventStatus=[2] → WriteBatchAsync returns PermanentFail; a reply with empty PerEventStatus → falls back to PerEventOk (back-compat).
  • End-to-end sink: an existing SqliteStoreAndForwardSink test already proves PermanentFail → immediate dead-letter; add/confirm a test that a Permanent classification dead-letters on the first drain (vs. the retry-cap path the finding-002 regression test covers).
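
The two client cases in the bullets above (status present, status absent) reduce to a small mapping function. The sketch below is illustrative Python, not the client's C# code; the Outcome enum and map_reply are hypothetical names, and treating an unknown status byte as RetryPlease is an assumption of this sketch.

```python
from enum import Enum
from typing import List, Sequence


class Outcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


_STATUS_TO_OUTCOME = {
    0: Outcome.ACK,
    1: Outcome.RETRY_PLEASE,
    2: Outcome.PERMANENT_FAIL,
}


def map_reply(batch_count: int, per_event_ok: Sequence[bool],
              per_event_status: bytes) -> List[Outcome]:
    """Prefer PerEventStatus when its length matches the batch.

    Otherwise (e.g. an old sidecar that never sets the field) fall back to
    the legacy PerEventOk path, which can only yield Ack or RetryPlease.
    """
    if len(per_event_status) == batch_count:
        # Unknown byte values map to RetryPlease (an assumption here).
        return [_STATUS_TO_OUTCOME.get(s, Outcome.RETRY_PLEASE)
                for s in per_event_status]
    return [Outcome.ACK if ok else Outcome.RETRY_PLEASE for ok in per_event_ok]
```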

Honest boundary (documented)

SDK-semantic permanent rejections (a structurally-valid event the AAH SDK rejects, e.g. unknown tag) still map to Retry→cap until the infra-gated AahClientManagedAlarmEventWriter surfaces richer per-event status — a noted follow-up. This phase closes the structurally-malformed (poison) case the finding describes.


Component C — AbCip nested-struct live-gate (backlog #6)

Verdict (from the feasibility pass)

A local live-gate is architecturally impossible: ab_server (the libplctag CIP sim used by the default abserver tier) does not implement the CIP Template Object service (class 0x6C) that nested-UDT discovery depends on. The decode + threading already shipped (3d8ce4e8/d203f31c; AbCipUdtMember.NestedTemplateId → existing @udt/{id} fetch) and 301 offline tests pass. The honest close formalizes the gate at the existing Emulate fidelity tier (Logix Emulate / real ControlLogix).

Approach (skip-gated test + docs; NO runtime change)

  1. New AbCipEmulateNestedUdtTests (mirror AbCipEmulateUdtReadTests + AbServerProfileGate.SkipUnless(Emulate)): drives AbCip driver discovery against a nested-UDT-bearing Emulate project and asserts the nested struct's atomic leaves are addressable (Parent.Status.Code, Parent.Status.Running) + the nested sub-folder materializes. Skips cleanly on the default abserver tier (which can't serve Template Object).
  2. docs/drivers/AbCip.md — document nested-struct support as Emulate-tier verified (ab_server lacks CIP Template Object), referencing the new test + the existing offline unit coverage.
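
To make the asserted outcome concrete, the sketch below flattens a nested UDT shape into the dotted leaf addresses the test expects. It is an illustration only: UdtMember, UdtShape, and leaf_paths are hypothetical Python stand-ins and do not reflect the driver's actual types or its template fetch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UdtMember:
    name: str
    data_type: str
    nested: Optional["UdtShape"] = None  # set when the member is itself a struct


@dataclass
class UdtShape:
    members: List[UdtMember] = field(default_factory=list)


def leaf_paths(prefix: str, shape: UdtShape) -> List[str]:
    """Return dotted addresses of all atomic leaves, recursing into nested structs."""
    paths: List[str] = []
    for member in shape.members:
        path = f"{prefix}.{member.name}"
        if member.nested is not None:
            paths.extend(leaf_paths(path, member.nested))
        else:
            paths.append(path)
    return paths
```

For a Parent tag whose Status member is a struct with Code and Running, this yields Parent.Status.Code and Parent.Status.Running alongside any top-level atomic members.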

Testing

The Emulate test compiles and skips locally (no Logix Emulate on this Mac) — proving it is wired into the suite. The decode/threading risk is already pinned by the shipped offline unit tests (CipTemplateObjectDecoderTests + AbCipDriverDiscoveryTests).


Component D — Reconcile + finish

  • stillpending.md (never-staged): mark #3 (FOCAS wire command shipped + sim-proven), #5 (Wonderware poison-event structurally-malformed close + the SDK-semantic follow-up boundary), #6 (AbCip live-gate formalized as the Emulate skip-gated test).
  • Update memory (project_stillpending_backlog.md + MEMORY.md index line).
  • Build clean + targeted tests green (FOCAS + Wonderware + AbCip) + Component A live /run (focas-mock) + merge to master + push.

Task slicing (independent → parallelizable)

| Task | Component | Project(s) | Class | Parallel with |
|------|-----------|------------|-------|---------------|
| T1 | FOCAS wire cnc_getfigure (client method + stub wire-in + mock handler) | Driver.FOCAS + focas-mock (py) | standard | T2, T3 |
| T2 | Wonderware per-event status (DTO ×2 + sidecar classifier + client consume + tests) | Wonderware{,.Client} + Core.AlarmHistorian.Tests | standard | T1, T3 |
| T3 | AbCip Emulate nested-UDT skip-gated test + AbCip.md | AbCip.IntegrationTests + docs | small | T1, T2 |
| T4 | FOCAS integration test (mock) + live /run verify | FOCAS.IntegrationTests | small | none (after T1) |
| T5 | Reconcile stillpending #3/#5/#6 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none |

Parallel implementers use worktree isolation (the shared-tree git-race lesson) since T1/T2/T3 touch disjoint projects. T4 depends on T1; T5 runs last.
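
The ordering constraints can be checked mechanically. The sketch below uses Python's standard graphlib (3.9+) to group the tasks into waves that can run in parallel; modeling "T5 runs last" as a dependency on T1 through T4 is an assumption of this example.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it must wait for.
TASK_DEPS: Dict[str, Set[str]] = {
    "T1": set(),
    "T2": set(),
    "T3": set(),
    "T4": {"T1"},                    # integration test needs the wire command
    "T5": {"T1", "T2", "T3", "T4"},  # reconcile + finish runs last (assumed)
}


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```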

Done =

Build clean + dotnet test green (Driver.FOCAS + Wonderware client/sink + AbCip) + Component A live /run (focas-mock integration test executes & passes) + Components B/C offline-proven + merged to master + pushed.