# FOCAS cnc_getfigure + Wonderware poison-event status + AbCip nested-UDT live-gate — Design
**Date:** 2026-06-18
**Branch:** `feat/focas-figure-ww-poison-abcip-gate` (off master `274ba2b1`)
**Backlog items:** `stillpending.md` §A #3 (FOCAS `cnc_getfigure`), #5 (Wonderware poison-event sidecar wire), #6 (AbCip nested-struct prod live-gate)
Three independent backlog items bundled into one phase. They touch disjoint projects
(Driver.FOCAS + its python sim / Driver.Historian.Wonderware{,.Client} + Core.AlarmHistorian /
Driver.AbCip.IntegrationTests + docs), so they are independently implementable and parallelizable.
---
## Standing constraints (in force)
- **NO Commons wire/proto contract change**, NO Core.Abstractions / breaking interface contract change,
NO EF migration, NO bUnit (Razor proven only by live `/run` — N/A this phase, no Razor touched).
- Stage by explicit path, never `git add .`; never stage the never-stage files (`sql_login.txt`,
`src/Server/.../Host/pki/`, `pending.md`, `current.md`, `stillpending.md`, `docker-dev/docker-compose.yml`).
- No force-push, no `--no-verify`. Never echo/commit secrets. Finish = merge to master + push.
- `dangerouslyDisableSandbox: true` for all build/test/rig commands.
---
## Component A — FOCAS `cnc_getfigure` wire command (backlog #3)
### The finding that makes this safe
`WireFocasClient` (the pure-managed FOCAS/2 Ethernet client) is the **production path to real CNCs**
(not the Fwlib P/Invoke path). Its `GetPositionFiguresAsync` is a hard-coded empty-list stub
(`Wire/WireFocasClient.cs:287-297`). The consumer wraps it in a graceful probe:
```csharp
// FocasDriver.cs:662
state.PositionFigures = await SafeProbe(() => client.GetPositionFiguresAsync(ct), []);
```
and `AxisFactor` (`FocasDriver.cs:~853`) uses a per-axis figure **only when present and ≥ 0**, else falls
back to the `PositionDecimalPlaces` config knob. So a wire command that errors or returns empty degrades to
**exactly today's behavior** — implementing it is monotonic and cannot regress real hardware.
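
The fallback rule above can be modelled in a few lines. This is only a sketch in Python (the sim's language), not the C# `AxisFactor` itself; the function and parameter names are hypothetical, and `config_decimal_places` stands in for the `PositionDecimalPlaces` knob.

```python
from typing import Optional, Sequence


def axis_factor(figures: Sequence[int], axis_index: int, config_decimal_places: int) -> float:
    """Model of the per-axis divisor: use the probed figure only when present and >= 0."""
    # An empty or short list means "no figure for this axis" (the stub/error case).
    figure: Optional[int] = figures[axis_index] if 0 <= axis_index < len(figures) else None
    decimals = figure if figure is not None and figure >= 0 else config_decimal_places
    return 10.0 ** decimals


print(axis_factor([], 0, 3))       # 1000.0  (empty probe result, config knob wins)
print(axis_factor([4, -1], 0, 3))  # 10000.0 (valid figure wins)
print(axis_factor([4, -1], 1, 3))  # 1000.0  (negative figure, config knob wins)
```

An empty list always lands on the config knob, which is why a failing wire command cannot change today's output.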
The .NET `FocasWireClient` and the Python `focas_mock` (`tests/.../FOCAS.IntegrationTests/Docker/focas-mock/`)
speak a **co-designed, internally-consistent** wire protocol: the mock dispatches on command codes in
`server.py:_wire_payload` (0x56=servo-meter, 0x26=axis position, 0x89=axis names, 0x120=timer, …). The
protocol is validated **against the sim**, not against real Fanuc hardware — that validation is bench-CNC-gated
for the *entire* wire backend (`docs/v2/implementation/focas-wire-protocol.md`), not unique to this command.
### Approach (driver-internal + sim; NO interface change — `IFocasClient.GetPositionFiguresAsync` already exists)
1. **`FocasWireClient.ReadPositionFiguresAsync(...)`** — mirror `ReadServoMeterAsync` (`FocasWireClient.cs:351-386`):
send the figure request **paired with the axis-name request (`0x0089`)** so figures align positionally to
axes (exactly how servo-meter pairs `0x0056`+`0x0089`). Pick a wire command id **currently unused by both
the client and the mock** (the mock uses 0x0E/0x10/0x15/0x16/0x18/0x19/0x1A/0x1C/0x1D/0x23/0x24/0x25/0x26/
0x35/0x40/0x56/0x57/0x89/0x8A/0x98/0xE1/0xFC/0x120/0x8001/0x8002 — choose a clearly-unused code, e.g. `0x00D3`,
and document it as sim-consistent / bench-CNC-unvalidated). Response payload = per-axis `short dec`
(decimal-place count); parse into `IReadOnlyList<int>`. `rc != 0` → empty list (graceful).
2. **`WireFocasClient.GetPositionFiguresAsync`** — replace the stub: call the new client method, return its list;
any failure path returns empty (never throws — the `SafeProbe` + per-axis fallback contract is preserved).
Rewrite the now-stale `<returns>` doc.
3. **`focas_mock`** — add a `cnc_getfigure` admin/handler entry + a `_wire_payload` branch for the chosen code
returning per-axis `short dec` from a **new data-store key `position_figures`** (default **0 per axis** →
`10^0 = 1.0` factor → no scaling → every existing integration assertion preserved). Register the method name
in `constants.py:IMPLEMENTED_FOCAS_METHODS` + the server handler map + profile `exports` as needed.
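
A minimal sketch of the payload round-trip described in steps 1 and 3, under loud assumptions: the figures are packed as consecutive big-endian signed 16-bit values with no header, and both helper names are hypothetical. The real framing is whatever `focas-wire-protocol.md` and `_wire_payload` define; only the "`rc != 0` or short payload → empty list" behaviour is taken from the text above.

```python
import struct
from typing import List


def encode_position_figures(figures: List[int]) -> bytes:
    """Mock side: pack one `short dec` per axis (ASSUMED big-endian, no header)."""
    return struct.pack(f">{len(figures)}h", *figures)


def decode_position_figures(rc: int, payload: bytes, axis_count: int) -> List[int]:
    """Client side: parse per-axis figures; any failure degrades to an empty list."""
    if rc != 0 or len(payload) < 2 * axis_count:
        return []  # graceful: the caller falls back to the config knob
    return list(struct.unpack(f">{axis_count}h", payload[: 2 * axis_count]))


# Hypothetical mock data store with the default of 0 per axis.
store = {"position_figures": [0, 0, 0]}
payload = encode_position_figures(store["position_figures"])
print(decode_position_figures(0, payload, 3))  # [0, 0, 0]
print(decode_position_figures(1, payload, 3))  # []
```

`axis_count` would come from the paired axis-name (`0x0089`) response, which is what keeps the figures positionally aligned with the axes.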
### Testing
- **Driver consumption is already covered offline** by `FocasPositionAutoScaleTests` (FakeFocasClient returns
figures → asserts scaled vs. fallback). No change needed there.
- **New end-to-end proof** = a skip-gated integration test in `Series/WireBackendCoverageTests.cs` (the
established pattern, `[Collection(FocasSimCollection.Name)]` + `if (_fx.SkipReason is not null) Assert.Skip(...)`):
`mock_patch` non-zero `position_figures`, init the wire-backed `FocasDriver`, assert the published
`AbsolutePosition` is the **scaled** value (raw ÷ 10^dec), not the raw integer.
- **Live `/run` for A** = bring the focas-mock up locally on the Mac
(`docker compose -f tests/.../FOCAS.IntegrationTests/Docker/docker-compose.yml up -d --build`) and run the
FOCAS integration suite; the new test executes (does not skip) and passes.
### Honest boundary (documented, not built)
Real-Fanuc validation of the chosen wire code/payload stays bench-CNC-gated — same status as the whole wire
backend. Worst case on real hardware = graceful fallback to the config knob (i.e. no regression).
---
## Component B — Wonderware poison-event per-event status (backlog #5)
### The finding that shapes it
The sidecar IPC reply `WriteAlarmEventsReply.PerEventOk` is a `bool[]` on **both** ends — which are **both in
this repo** (`...Wonderware.Client/Ipc/Contracts.cs` + `...Wonderware/Ipc/Contracts.cs`, MessagePack over TCP;
**not** a Commons proto). The client (`WonderwareHistorianClient.WriteBatchAsync:340-369`) can therefore only
produce `Ack`/`RetryPlease`, never `PermanentFail`, so a poison event loops to the retry cap instead of
dead-lettering immediately. The `HistorianWriteOutcome` enum **already has `PermanentFail`** and the sink
(`SqliteStoreAndForwardSink.cs:456-465`) **already dead-letters it immediately**. The sidecar writer seam
`IAlarmEventWriter.WriteAsync` returns only `bool[]` and its sole real impl is the **infra-gated**
`AahClientManagedAlarmEventWriter` (AAH SDK).
### Approach (additive IPC field + sidecar classifier; NO `IAlarmEventWriter` change, NO Commons)
1. **Additive wire field (both Contracts.cs):** add `[Key(4)] byte[] PerEventStatus` to `WriteAlarmEventsReply`
(0=Ack, 1=Retry, 2=Permanent). **Keep `PerEventOk [Key(3)]`** populated for rolling-deploy back-compat
(new client ↔ old sidecar: empty Key(4) → fall back to PerEventOk; old client ↔ new sidecar: ignores Key(4)).
2. **Sidecar `HistorianFrameHandler.HandleWriteAlarmEventsAsync`:** add a pure `ClassifyEvents(events)` step —
an event that is **structurally malformed** (empty `SourceName`, empty `AlarmType`, or `EventTimeUtcTicks <= 0`)
can never persist → mark **Permanent** and **exclude it from the writer batch** (mirrors the client's existing
corrupt-row exclusion). Remaining events go to the writer; `true`→Ack, `false`→Retry. Populate **both**
`PerEventOk` (Ack→true else false) and `PerEventStatus`.
3. **Client `WriteBatchAsync`:** when `reply.PerEventStatus.Length == batch.Count`, map 0/1/2 →
`Ack`/`RetryPlease`/`PermanentFail`; else fall back to the existing `PerEventOk` path. Rewrite the stale
`<remarks>` + inline "PermanentFail is never emitted" comments.
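
The sidecar step in item 2 can be sketched as follows. This is a Python model of the logic only, not the C# `HistorianFrameHandler`; the dataclass, function names, and the `writer` callable (standing in for `IAlarmEventWriter.WriteAsync`) are hypothetical. Treating a missing writer result as Retry is an assumption made here to stay conservative.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

ACK, RETRY, PERMANENT = 0, 1, 2  # PerEventStatus byte values from item 1


@dataclass
class AlarmEvent:
    source_name: str
    alarm_type: str
    event_time_utc_ticks: int


def is_structurally_malformed(event: AlarmEvent) -> bool:
    """Poison check: these events can never persist, so retrying is pointless."""
    return not event.source_name or not event.alarm_type or event.event_time_utc_ticks <= 0


def classify_and_write(
    events: Sequence[AlarmEvent],
    writer: Callable[[List[AlarmEvent]], List[bool]],
) -> Tuple[List[bool], bytes]:
    """Return (PerEventOk, PerEventStatus) aligned one-to-one with `events`."""
    statuses = [PERMANENT if is_structurally_malformed(e) else RETRY for e in events]
    valid = [i for i, s in enumerate(statuses) if s != PERMANENT]
    # Malformed events are excluded from the writer batch entirely.
    results = writer([events[i] for i in valid]) if valid else []
    for i, ok in zip(valid, results):  # a missing result leaves the event at RETRY
        statuses[i] = ACK if ok else RETRY
    return [s == ACK for s in statuses], bytes(statuses)
```

Keeping the classification pure (no I/O apart from the injected writer) is what lets the unit test run without the AAH SDK.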
### Testing (fully offline — no rig)
- Sidecar: pure `ClassifyEvents` unit test (malformed → Permanent + excluded; valid → delegated; writer
false → Retry).
- Client: a `FakeSidecarServer` reply with `PerEventStatus=[2]` → `WriteBatchAsync` returns `PermanentFail`;
a reply with empty `PerEventStatus` → falls back to `PerEventOk` (back-compat).
- End-to-end sink: an existing `SqliteStoreAndForwardSink` test already proves `PermanentFail` → immediate
dead-letter; add/confirm a test that a Permanent classification dead-letters on the **first** drain (vs. the
retry-cap path the finding-002 regression test covers).
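
The two client cases above (status present vs. back-compat fallback) reduce to one mapping function. A sketch in Python under stated assumptions: `map_reply` and the `Outcome` enum are hypothetical stand-ins for the C# client code and `HistorianWriteOutcome`, and mapping an unknown status byte or a short `PerEventOk` array to `RetryPlease` is a choice made here, not something the design specifies.

```python
from enum import Enum
from typing import List, Sequence


class Outcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


_STATUS_TO_OUTCOME = {0: Outcome.ACK, 1: Outcome.RETRY_PLEASE, 2: Outcome.PERMANENT_FAIL}


def map_reply(per_event_ok: Sequence[bool], per_event_status: bytes, batch_count: int) -> List[Outcome]:
    """Prefer PerEventStatus when it covers the batch; otherwise use PerEventOk."""
    if len(per_event_status) == batch_count:
        # ASSUMPTION: an unrecognised status byte is treated as retryable.
        return [_STATUS_TO_OUTCOME.get(s, Outcome.RETRY_PLEASE) for s in per_event_status]
    # Old sidecar: Key(4) is empty, so only Ack / RetryPlease can be produced.
    return [
        Outcome.ACK if i < len(per_event_ok) and per_event_ok[i] else Outcome.RETRY_PLEASE
        for i in range(batch_count)
    ]


print(map_reply([False], bytes([2]), 1))   # [<Outcome.PERMANENT_FAIL: 'PermanentFail'>]
print(map_reply([True, False], b"", 2))    # [<Outcome.ACK: 'Ack'>, <Outcome.RETRY_PLEASE: 'RetryPlease'>]
```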
### Honest boundary (documented)
SDK-**semantic** permanent rejections (a structurally-valid event the AAH SDK rejects, e.g. unknown tag) still
map to Retry→cap until the infra-gated `AahClientManagedAlarmEventWriter` surfaces richer per-event status — a
noted follow-up. This phase closes the **structurally-malformed (poison)** case the finding describes.
---
## Component C — AbCip nested-struct live-gate (backlog #6)
### Verdict (from the feasibility pass)
A **local** live-gate is architecturally impossible: `ab_server` (the libplctag CIP sim used by the default
`abserver` tier) does **not** implement the CIP Template Object service (class 0x6C) that nested-UDT discovery
depends on. The decode + threading already **shipped** (`3d8ce4e8`/`d203f31c`; `AbCipUdtMember.NestedTemplateId`
→ existing `@udt/{id}` fetch) and 301 offline tests pass. The honest close formalizes the gate at the existing
**Emulate** fidelity tier (Logix Emulate / real ControlLogix).
### Approach (skip-gated test + docs; NO runtime change)
1. **New `AbCipEmulateNestedUdtTests`** (mirror `AbCipEmulateUdtReadTests` + `AbServerProfileGate.SkipUnless(Emulate)`):
drives AbCip driver discovery (the AbCip counterpart of `FocasDriver`) against a nested-UDT-bearing Emulate project and asserts the
nested struct's atomic leaves are addressable (`Parent.Status.Code`, `Parent.Status.Running`) + the nested
sub-folder materializes. Skips cleanly on the default `abserver` tier (which can't serve Template Object).
2. **`docs/drivers/AbCip.md`** — document nested-struct support as **Emulate-tier verified** (ab_server lacks
CIP Template Object), referencing the new test + the existing offline unit coverage.
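
What "atomic leaves are addressable" means can be illustrated with a small model of nested-template flattening. This is a Python sketch, not the AbCip driver's discovery code: the template table, its ids, and `leaf_paths` are all hypothetical, with a nested id playing the role of `AbCipUdtMember.NestedTemplateId`.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# template id -> [(member name, nested template id or None for an atomic member)]
Templates = Dict[int, List[Tuple[str, Optional[int]]]]


def leaf_paths(
    root: str,
    template_id: int,
    templates: Templates,
    _seen: FrozenSet[int] = frozenset(),
) -> List[str]:
    """Flatten a (possibly nested) UDT into dotted addresses of its atomic leaves."""
    if template_id in _seen:  # guard against a self-referencing template table
        return []
    paths: List[str] = []
    for name, nested_id in templates.get(template_id, []):
        path = f"{root}.{name}"
        if nested_id is None:
            paths.append(path)
        else:
            paths.extend(leaf_paths(path, nested_id, templates, _seen | {template_id}))
    return paths


# Hypothetical Emulate project: Parent (id 1) contains a nested Status struct (id 2).
templates: Templates = {
    1: [("Id", None), ("Status", 2)],
    2: [("Code", None), ("Running", None)],
}
print(leaf_paths("Parent", 1, templates))
# ['Parent.Id', 'Parent.Status.Code', 'Parent.Status.Running']
```

The Emulate-tier test asserts the same shape end to end: the dotted leaves exist, and `Status` appears as a sub-folder rather than as an opaque member.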
### Testing
The Emulate test **compiles and skips** locally (no Logix Emulate on this Mac) — proving it is wired into the
suite. The decode/threading risk is already pinned by the shipped offline unit tests
(`CipTemplateObjectDecoderTests` + `AbCipDriverDiscoveryTests`).
---
## Component D — Reconcile + finish
- `stillpending.md` (never-staged): mark #3 (FOCAS wire command shipped + sim-proven), #5 (Wonderware
poison-event structurally-malformed close + the SDK-semantic follow-up boundary), #6 (AbCip live-gate
formalized as the Emulate skip-gated test).
- Update memory (`project_stillpending_backlog.md` + `MEMORY.md` index line).
- Build clean + targeted tests green (FOCAS + Wonderware + AbCip) + Component A live `/run` (focas-mock) +
merge to master + push.
---
## Task slicing (independent → parallelizable)
| Task | Component | Project(s) | Class | Parallel with |
|---|---|---|---|---|
| T1 | FOCAS wire `cnc_getfigure` (client method + stub wire-in + mock handler) | Driver.FOCAS + focas-mock (py) | standard | T2, T3 |
| T2 | Wonderware per-event status (DTO ×2 + sidecar classifier + client consume + tests) | Wonderware{,.Client} + Core.AlarmHistorian.Tests | standard | T1, T3 |
| T3 | AbCip Emulate nested-UDT skip-gated test + AbCip.md | AbCip.IntegrationTests + docs | small | T1, T2 |
| T4 | FOCAS integration test (mock) + live `/run` verify | FOCAS.IntegrationTests | small | none (after T1) |
| T5 | Reconcile stillpending #3/#5/#6 + memory + finish (build, tests, merge+push) | docs (never-staged) | small | none |
Parallel implementers use **worktree isolation** (the shared-tree git-race lesson) since T1/T2/T3 touch disjoint
projects. T4 depends on T1; T5 runs last.
## Done =
Build clean + `dotnet test` green (Driver.FOCAS + Wonderware client/sink + AbCip) + Component A live `/run`
(focas-mock integration test executes & passes) + Components B/C offline-proven + merged to master + pushed.