docs(plans): phased completion design for the still-pending backlog

Roadmap for closing stillpending.md §1-§5 + §7/§9 cleanup in 9 phases
(0 hygiene -> 1 silent-deploy bugs H1/H5 -> 2 ServiceLevel H3 ->
3 OPC UA standards H4/H2-bit/H6 -> 4 driver coverage -> 5 probes ->
6 AdminUI -> 7 Client.UI -> 8 per-cluster scoping). Conservative
rebuild-on-change for H1; plan-and-execute phase-by-phase; no EF
migration; defer-list flagged (Denied/Simulated/Language/InlayHints/
HistoryUpdate-service/Galaxy-gateway-write).
Joseph Doherty
2026-06-15 09:27:06 -04:00
parent 151b7165af
commit f64be52796
@@ -0,0 +1,123 @@
# Still-Pending Backlog — phased completion design
> **Status:** approved 2026-06-15. Source backlog: `stillpending.md` (system-wide audit, master `151b7165`).
> **Scope:** the full actionable backlog — `stillpending.md` §1–§5 + §7/§9 cleanup. Excludes §6
> (by-design / backend-gated limitations) and §8 (live-`/run` gates, which are user-driven), plus the
> recommended-defer list below.
> **Planning cadence:** plan-and-execute **one phase at a time** — author each phase's implementation
> plan when we reach it (no up-front 60-task monolith).
## Goal
Close every genuine deferred / partial / missing functional gap catalogued in `stillpending.md`,
in priority order, without a schema migration and without regressing the actor model.
## Delivery approach
- **Phased feature branches.** One phase = one branch off master, merged back after green
`dotnet build` + `dotnet test` + the classification-driven review chain (trivial→implementer only;
small→code review; standard→spec ∥ code review; high-risk→serial reviews + final integration review).
- **Live `/run` gates stay user-driven.** The user drives docker-dev; the agent never signs in. Razor/JS
changes are proven only by live `/run`, never bUnit.
- **Plan-and-execute phase-by-phase.** Later phases (drivers, AdminUI) depend on what the early phases
settle; plans authored up front would be stale on arrival.
Rejected alternatives: one long-lived mega-branch (un-reviewable, conflict magnet); parallel
per-area branches (the deploy-path and driver phases share `Phase7*` / `DriverHostActor`, so they collide).
## Hard constraints (carried into every phase)
- **NO Configuration entity / EF migration.** The audit confirms the whole backlog is achievable without
one (e.g. `VirtualTag.Historize` already exists; the `HistoryUpdate` bit is a Core `[Flags]` enum, not EF;
`isHistorized` / native `HistorizeToAveva` ride the `TagConfig` JSON blob).
- Stage by path — never `git add .`. Never stage `sql_login.txt`, `src/Server/.../Host/pki/`, `pending.md`,
`current.md`, `docker-dev/docker-compose.yml`. Never echo or commit secrets. No force-push, no `--no-verify`.
- Tests: xUnit + Shouldly, TDD fail-then-pass; in-memory EF where DB-backed. **No bUnit.**
## Phased roadmap
### Phase 0 — Hygiene & no-risk cleanup *(trivial / small)*
- §9 stale comments (7 spots: `DriverHostActor.cs:45`, `OpcUaPublishActor.cs:246-252`,
`ServiceCollectionExtensions.cs:66-67`, `IScriptLogPublisher.cs:8` + `ScriptLogTopicSink.cs:10`,
`ScriptAnalysisService.cs:21-23`, the Galaxy guard-message refs) + the `docs/security.md`
write-outcome section (B1 already shipped both halves).
- §7 — mark the confirmed-shipped `.tasks.json` files completed so future audits don't re-flag them.
- §3 low-consequence residue — document/verify-benign `DraftSnapshotFactory` placeholders and
`Cluster LeaderChanged` no-op (no behavioral change expected).
### Phase 1 — Silent-deploy bugs (H1 + H5) *(high-risk)*
- **H1 (conservative):** add the `Changed*` counts to `Phase7Applier.needsRebuild`; make
`VirtualTagHostActor.OnApply` stop+respawn children whose plan changed (so an edited Expression /
DependencyRefs takes effect). A rebuild repopulates from the persisted artifact and is idempotent.
- **H5:** thread `VirtualTag.Historize` through the equipment-namespace path
(`Phase7Composer` → `EquipmentVirtualTagPlan` → `VirtualTagHostActor` → `VirtualTagActor`) and wire a
real production `IHistoryWriter` (today drops to `NullHistoryWriter`).
- Verify on the 2-node / docker-dev rig (user-driven `/run`).
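A minimal sketch of the conservative H1 rule, assuming the current `needsRebuild` predicate only looks at added/removed counts. `TagPlan`, `RespawnPlanner`, and the parameter names are hypothetical stand-ins; the real `Phase7Applier` and `VirtualTagHostActor` shapes are not shown in this plan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plan shape, for illustration only.
public sealed record TagPlan(string Id, string Expression, string[] DependencyRefs);

public static class RespawnPlanner
{
    // Conservative rule: an in-place change forces a rebuild just like an add or remove.
    public static bool NeedsRebuild(int added, int removed, int changed) =>
        added + removed + changed > 0;

    // Records compare array members by reference, so compare the contents explicitly.
    private static bool SamePlan(TagPlan a, TagPlan b) =>
        a.Expression == b.Expression && a.DependencyRefs.SequenceEqual(b.DependencyRefs);

    // Ids present on both sides whose plan content differs: these children get stop + respawn.
    public static List<string> ChangedIds(
        IReadOnlyDictionary<string, TagPlan> current,
        IReadOnlyDictionary<string, TagPlan> incoming) =>
        incoming.Values
            .Where(p => current.TryGetValue(p.Id, out var old) && !SamePlan(old, p))
            .Select(p => p.Id)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
}
```

Newly added ids are deliberately excluded from `ChangedIds`: in this sketch they would be spawned by the normal add path rather than respawned.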
### Phase 2 — Redundancy ServiceLevel (H3) *(high-risk)*
- Feed `DbHealthProbeActor` / `PeerOpcUaProbeActor` health into `ServiceLevelCalculator.Compute`
(already written, never invoked) → published `Server.ServiceLevel` byte via `RedundancyStateActor`.
A DB-unreachable / probe-failed primary must drop below its role-based level.
- Unit tests can't catch the wiring (per memory) — verify on the 2-node rig.
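The degradation rule above can be sketched as follows. This is not the existing `ServiceLevelCalculator.Compute`, whose signature is not shown here; the class name, parameters, and every numeric level are assumptions chosen only to illustrate "drop below the role-based level". The band boundaries in the comment follow the OPC UA ServiceLevel sub-ranges.

```csharp
using System;

public static class ServiceLevelSketch
{
    // OPC UA ServiceLevel sub-ranges: 0 = Maintenance, 1 = NoData,
    // 2..199 = Degraded, 200..255 = Healthy.
    // All concrete values below are hypothetical.
    public static byte Compute(bool isPrimary, bool dbReachable, bool peerProbeOk)
    {
        int level = isPrimary ? 250 : 200;      // assumed role-based levels

        if (!dbReachable)
        {
            level = Math.Min(level, 100);       // assumed cap inside the Degraded band
        }

        if (!peerProbeOk)
        {
            level -= 10;                        // assumed penalty for a failed peer probe
        }

        return (byte)Math.Clamp(level, 0, 255);
    }
}
```

With these assumed numbers a healthy primary reports 250, while a primary that has lost its database reports 100, so a redundancy-aware client would prefer the healthy secondary at 200.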
### Phase 3 — OPC UA standards completeness (H4 + H2-bit + H6) *(high-risk)*
- **H4:** `OnEnableDisable` node-manager seam → `alarm-commands` DPS topic (engine + AdminUI already
handle Enable/Disable; only the OPC UA half is missing).
- **H2 (permission bit only):** add `NodePermissions.HistoryUpdate` and fix the
`TriePermissionEvaluator:86` mapping (today a HistoryRead grant would also authorize a future
HistoryUpdate). The full HistoryUpdate *service* is **deferred** (infra-gated — no backend RPC, like
modified-value history).
- **H6:** inbound native-alarm `Acknowledge` → `IAlarmSource.AcknowledgeAsync` → AVEVA, recording the
authenticated principal rather than a generic `"opcua-client"`.
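To illustrate the H2 permission-bit fix, here is a minimal sketch of a `[Flags]` enum with a distinct `HistoryUpdate` bit and a one-to-one operation mapping. The enum is named `NodePermissionsSketch` on purpose: the members and bit values of the real `NodePermissions`, and the shape of the `TriePermissionEvaluator` mapping, are not shown in this plan and are assumed here.

```csharp
using System;

// Stand-in for the Core [Flags] enum; members and bit values are assumptions.
[Flags]
public enum NodePermissionsSketch
{
    None = 0,
    Read = 1 << 0,
    Write = 1 << 1,
    HistoryRead = 1 << 2,
    HistoryUpdate = 1 << 3,   // the new, separate bit
}

// Hypothetical operation kinds an evaluator might be asked about.
public enum Operation { Read, Write, HistoryRead, HistoryUpdate }

public static class PermissionMapping
{
    // Each operation maps to exactly one bit, so a HistoryRead grant
    // can no longer satisfy a HistoryUpdate check.
    public static NodePermissionsSketch Required(Operation op) => op switch
    {
        Operation.Read => NodePermissionsSketch.Read,
        Operation.Write => NodePermissionsSketch.Write,
        Operation.HistoryRead => NodePermissionsSketch.HistoryRead,
        Operation.HistoryUpdate => NodePermissionsSketch.HistoryUpdate,
        _ => throw new ArgumentOutOfRangeException(nameof(op)),
    };

    public static bool IsAllowed(NodePermissionsSketch granted, Operation op)
    {
        var required = Required(op);
        return (granted & required) == required;
    }
}
```

Because the bit lives in an in-code enum, adding it in this way needs no schema change, which is consistent with the no-migration constraint.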
### Phase 4 — Driver data-type & structure coverage *(standard, parallelizable per-driver)*
Splits into per-driver sub-plans (4a–4i), each independent:
- S7 wide types (Int64/UInt64/LReal/String/DateTime) + Timer/Counter areas.
- Modbus Int64/UInt64 node `DataType` (add `Int64` to `DriverDataType`) + String/BitInRegister arrays.
- Array `IsArray` discovery for S7 / AbCip / AbLegacy / TwinCAT.
- AbCip + TwinCAT UDT member-path reads/writes.
- AbLegacy + TwinCAT BOOL-within-word (bit-index) writes (read-modify-write).
- FOCAS position scaling (`10^DecimalPlaces` divide) + `Unimplemented…Factory` fail at config time, not per-read.
- Galaxy nested gobject hierarchy + writer item-handle cache shared with the subscription registry.
- Historian.Wonderware `Total` aggregate + poison-event dead-letter (don't retry forever).
- OpcUaClient `IHistoryProvider.ReadEventsAsync` event-history passthrough.
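Two of the items above reduce to small pure functions, sketched here under stated assumptions: a 16-bit word for the BOOL-within-word case, and an integer raw position with a non-negative `DecimalPlaces` for FOCAS. `DriverMath` and its method names are hypothetical; the actual driver code is not shown in this plan.

```csharp
using System;

public static class DriverMath
{
    // Modify step of a read-modify-write: change one bit and keep the other 15.
    // The caller is assumed to read the word first and write the result back.
    public static ushort SetBit(ushort word, int bitIndex, bool value)
    {
        if (bitIndex < 0 || bitIndex > 15)
        {
            throw new ArgumentOutOfRangeException(nameof(bitIndex));
        }

        int mask = 1 << bitIndex;
        return value ? (ushort)(word | mask) : (ushort)(word & ~mask);
    }

    // Position scaling: divide the raw integer by 10^decimalPlaces.
    // decimal keeps the result exact for the small exponents assumed here.
    public static decimal ScalePosition(long raw, int decimalPlaces)
    {
        if (decimalPlaces < 0 || decimalPlaces > 9)
        {
            throw new ArgumentOutOfRangeException(nameof(decimalPlaces));
        }

        decimal divisor = 1m;
        for (int i = 0; i < decimalPlaces; i++)
        {
            divisor *= 10m;
        }

        return raw / divisor;
    }
}
```

For example, `DriverMath.ScalePosition(123456, 3)` yields `123.456`. The read and the write-back around `SetBit` would need to be serialized per word in a real driver, otherwise two concurrent bit writes to the same word could overwrite each other.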
### Phase 5 — Test-Connect protocol probes *(small)*
- Replace TCP-only probes with real handshakes: Modbus FC, FOCAS `cnc_allclibhndl3`, TwinCAT ADS-state,
OpcUaClient session-open, historian handshake (§2 probes + plan `2026-05-28-adminui-driver-pages`
Phase 7 + `2026-06-12-historian-tcp-transport` task 9).
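As an illustration of what a protocol-level Modbus probe checks beyond a TCP connect, the sketch below builds a Modbus TCP Read Holding Registers (function code `0x03`) request and recognizes a plausible reply. The choice of `0x03` and the `ModbusProbeFrames` helper are assumptions; the plan does not say which function code the probe will use.

```csharp
using System;

public static class ModbusProbeFrames
{
    // MBAP header (7 bytes) followed by a 5-byte Read Holding Registers PDU.
    public static byte[] BuildReadHoldingRegisters(
        ushort transactionId, byte unitId, ushort startAddress, ushort quantity)
    {
        return new byte[]
        {
            (byte)(transactionId >> 8), (byte)(transactionId & 0xFF),
            0x00, 0x00,                       // protocol id: always 0 for Modbus
            0x00, 0x06,                       // length: unit id + 5 PDU bytes
            unitId,
            0x03,                             // function code
            (byte)(startAddress >> 8), (byte)(startAddress & 0xFF),
            (byte)(quantity >> 8), (byte)(quantity & 0xFF),
        };
    }

    // A normal reply or an exception reply both prove a Modbus endpoint answered.
    public static bool LooksLikeModbusReply(ReadOnlySpan<byte> reply, ushort transactionId)
    {
        if (reply.Length < 9)                 // MBAP (7) + function code + one more byte
        {
            return false;
        }

        bool transactionMatches =
            reply[0] == (byte)(transactionId >> 8) && reply[1] == (byte)(transactionId & 0xFF);
        bool protocolIsModbus = reply[2] == 0 && reply[3] == 0;
        int function = reply[7] & 0x7F;       // exception replies set the high bit
        return transactionMatches && protocolIsModbus && function == 0x03;
    }
}
```

Treating an exception reply as success is a design choice for a connectivity probe: a device that rejects the address has still completed a Modbus exchange, which a bare TCP connect cannot show.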
### Phase 6 — AdminUI typed editors, pickers & UX *(standard; Razor → live-`/run` only, NO bUnit)*
- OpcUaClient + Historian.Wonderware typed `TagConfig` editors in the `/uns` TagModal.
- Driver-tag `isHistorized` / `historianTagname` first-class field.
- Native-alarm `HistorizeToAveva` opt-out (TagConfig + UI) — §5.
- Galaxy picker pre-fills `alarm` fields from `DriverAttributeInfo.IsAlarm` — §5.
- Typed address pickers for Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS (plan Phase 9).
- UNS-tree Delete for Enterprise / Cluster node kinds.
- Hosts page per-driver-instance rows.
- Monaco `ctx.SetVirtualTag(...)` write-target completions/hover; create-new-script from the inline panel.
### Phase 7 — Client.UI alarm controls *(small)*
- AlarmsView per-row Acknowledge + Shelve + Confirm commands/buttons (backends + CLI already exist).
### Phase 8 — Per-cluster scoping *(high-risk, standalone)*
- Execute tasks 3–10 of the **existing** `2026-06-07-per-cluster-scoping` plan (SubscribeBulk
cluster-filtering, `OpcUaPublishActor` scoping on rebuild, multi-cluster E2E harness, docker-dev
rewrite, verify, docs). Reference/execute that plan rather than re-designing here.
## Recommended defer (YAGNI) — flagged, not silently dropped
Forward-looking reservations or out-of-repo; left out unless explicitly pulled back in:
- `AuthorizationVerdict.Denied` — v2.1 explicit-deny; no authoring path exists.
- `NamespaceKind.Simulated`, `Script.Language` — future driver / scripting engine that don't exist yet.
- Monaco InlayHints — parameter-name hints; likely intended-permanent stub.
- The full **HistoryUpdate service** — infra-gated (no backend insert/replace/delete RPC).
- Galaxy `ExecuteWrite` fire-and-forget write-outcome — lives in the **mxaccessgw sibling repo**, not here.
## Verification per phase
- Each phase: red→green unit tests for the load-bearing logic; `dotnet build` clean (production projects
are `TreatWarningsAsErrors`); full `dotnet test` green before merge.
- High-risk phases (1, 2, 3, 8) additionally gated on a user-driven live `/run` on docker-dev (and the
2-node rig for redundancy in Phase 2).
- AdminUI (Phase 6) and Client.UI (Phase 7) Razor/XAML changes proven only by live `/run`.