docs(plans): design — Hosts per-driver rows + AbCip nested-struct + Galaxy hygiene
This commit is contained in:
@@ -0,0 +1,166 @@
|
||||
# Hosts per-driver rows + AbCip nested-struct + Galaxy hygiene — Design
|
||||
|
||||
**Date:** 2026-06-18
|
||||
**Branch:** `feat/hosts-rows-abcip-nested-hygiene` (off master `f59680fa`)
|
||||
**Backlog items:** `stillpending.md` §A #8 (Hosts per-driver rows), #6 (AbCip nested-struct), #3/#13 + #10 (Galaxy stale-comment hygiene + reconcile)
|
||||
|
||||
A three-component backlog phase. One live-provable AdminUI feature, one offline-provable driver
|
||||
enhancement, one doc/comment hygiene sweep. The three touch disjoint projects (AdminUI / Driver.AbCip /
|
||||
Driver.Galaxy + docs), so they are independently implementable.
|
||||
|
||||
The triggering follow-up ("cursor-based paging WITHIN a single timestamp / oversized tie clusters") is
|
||||
**stale** — it shipped at `2e6c6d3a` (`HistoryPaging.SliceTieCluster`, wired in `OtOpcUaNodeManager.cs:1928`,
|
||||
tested in `HistoryPagingTests.cs`). This phase handles the "+ backlog" half.
|
||||
|
||||
---
|
||||
|
||||
## Standing constraints (in force)
|
||||
|
||||
- **NO Commons wire/proto contract change**, NO interface/Core.Abstractions contract change, NO EF
|
||||
migration, NO bUnit (Razor proven only by live `/run`).
|
||||
- Stage by explicit path, never `git add .`; never stage the never-stage files (`sql_login.txt`,
|
||||
`src/Server/.../Host/pki/`, `pending.md`, `current.md`, `stillpending.md`, `docker-dev/docker-compose.yml`).
|
||||
- No force-push, no `--no-verify`. Finish = merge to master + push.
|
||||
- `dangerouslyDisableSandbox: true` for all build/test/rig commands.
|
||||
|
||||
---
|
||||
|
||||
## Component A — `/hosts` per-driver-instance status (backlog #8)
|
||||
|
||||
### The constraint that shapes the architecture
|
||||
|
||||
`DriverHealthChanged` (the `driver-health` DPS message, `Commons/Messages/Drivers/DriverHealthChanged.cs`)
|
||||
carries `ClusterId` + `DriverInstanceId` + health, but **no hosting-node identity**. The snapshot store
|
||||
(`InMemoryDriverStatusSnapshotStore`) keys by `DriverInstanceId` alone, so in a redundancy pair the two
|
||||
nodes' health snapshots overwrite each other. `ClusterNode.NodeId` is a *logical* id (`"LINE3-OPCUA-A"`),
|
||||
not the Akka member address — the runtime keys "this node" by that logical id
|
||||
(`DriverHostActor.cs:428` filters `s.NodeId == _localNode.Value`).
|
||||
|
||||
Consequences:
|
||||
- True per-Akka-**member** driver health would need a hosting-node field on `DriverHealthChanged` (a
|
||||
Commons change — forbidden) plus a node-keyed store, OR a per-node `DriverHostActor`-children Ask (new
|
||||
cluster actor messaging). Both are out of scope here.
|
||||
- The **reliable join key both sides share is `ClusterId`.**
|
||||
|
||||
### Approach (no Commons change)
|
||||
|
||||
- **Reuse the existing health pipeline unchanged** (`DriverStatusSignalRBridge` → snapshot store).
|
||||
- **Add one AdminUI-internal store method:** `IReadOnlyCollection<DriverHealthChanged> GetAll()` on
|
||||
`IDriverStatusSnapshotStore`, implemented in `InMemoryDriverStatusSnapshotStore` by snapshotting the
|
||||
backing `ConcurrentDictionary` values. (AdminUI-internal interface — **not** Commons.)
|
||||
- **New "Driver Instances" section on `/hosts`**, grouped by `ClusterId`:
|
||||
- Each cluster-group header lists that cluster's nodes from ConfigDB `ClusterNode` (logical NodeId + Host).
|
||||
- The body lists that cluster's live driver snapshots, enriched with `Name` + `DriverType` by joining
|
||||
ConfigDB `DriverInstance` on `DriverInstanceId`. Each row: a live status chip (the same state→color
|
||||
mapping `DriverStatusPanel` uses), last successful read, last error, 5-min error count.
|
||||
- Live-refreshed by subscribing to the store's `SnapshotChanged` event (in-process — dodges the Traefik
|
||||
self-hub trap, same pattern as `DriverStatusPanel`).
|
||||
- **Keep the existing Akka member topology rows as-is** — they are authoritative for Akka-level
|
||||
Up/Unreachable/Leader, which the DB does not know. Two sections, each authoritative for its own data.
|
||||
|
||||
### Testing
|
||||
|
||||
Extract the grouping/enrichment into a **pure view-model builder**:
|
||||
`HostsDriverView.Build(snapshots, clusterNodes, driverInstances) -> IReadOnlyList<ClusterDriverGroup>`.
|
||||
xUnit + Shouldly: group-by-cluster, Name/DriverType enrichment, unknown-driver (snapshot with no matching
|
||||
`DriverInstance`) fallback, empty inputs. The Razor is a thin shell over the builder — proven only by live
|
||||
`/run` (no bUnit), per established pattern.
|
||||
|
||||
### Live `/run` (Component A)
|
||||
|
||||
Rebuild **both** central-1 AND central-2 (the `:9200` AdminUI is Traefik round-robined across both — a
|
||||
half-deploy round-robins old/new code). Deploy a Modbus driver (Galaxy too if convenient), open
|
||||
`http://localhost:9200/hosts`, confirm the Driver Instances section lists the deployed drivers grouped by
|
||||
cluster with live state, then drive a Reconnect from a driver page and confirm the chip updates.
|
||||
|
||||
### Deferred follow-up (noted, not built)
|
||||
|
||||
True per-Akka-member nesting needs node identity on the health feed (Commons field) or a per-node
|
||||
`DriverHostActor`-children Ask. Out of scope under the no-Commons constraint.
|
||||
|
||||
---
|
||||
|
||||
## Component B — AbCip nested-struct member expansion (backlog #6)
|
||||
|
||||
### Finding (simpler than the backlog framed)
|
||||
|
||||
The backlog said the CIP Template Object member block "carries no nested template id." It actually **does**:
|
||||
the per-member info `u16` uses the *same encoding as the Symbol Object* — bit 15 = struct flag, lower 12
|
||||
bits = template instance id for a struct member (the codebase's own comment in `CipTemplateObjectDecoder`
|
||||
documents this). The prior member-expansion fix (`4e141402`/`4a7b0fde`) **discarded** that id and recursed
|
||||
with `templateInstanceId: null` (`AbCipDriver.cs:~1179`), leaving a test-only `SeedDiscoveredUdtShapeForTest`
|
||||
seam. So the close is to stop discarding it — **no new CIP query needed**.
|
||||
|
||||
### Approach (driver-internal only)
|
||||
|
||||
- Carry `uint? NestedTemplateId` on the `AbCipUdtMember` record (driver-internal — not Commons).
|
||||
- `CipTemplateObjectDecoder` sets it from `info & 0x0FFF` when the struct flag (bit 15) is set; `null` for
|
||||
scalar members.
|
||||
- The driver threads `member.NestedTemplateId` into the existing `FetchUdtShapeAsync` (recursion site
|
||||
`AbCipDriver.cs:~1179`) instead of `null`. Reuses `IAbCipTemplateReader.ReadAsync("@udt/{id}")` — zero new
|
||||
CIP primitives.
|
||||
|
||||
### Testing (fully offline)
|
||||
|
||||
Via the injectable `IAbCipTemplateReader` / `FakeTemplateReader`: feed a Template Object blob with a struct
|
||||
member whose info = `0x8000 | nestedId`; assert (1) the decoder extracts `NestedTemplateId == nestedId`,
|
||||
(2) the driver fetches the nested shape (`@udt/{nestedId}`) so nested members become addressable. Add a
|
||||
scalar-member case asserting `NestedTemplateId == null`. Live-verify stays infra-gated (no nested-UDT
|
||||
ControlLogix rig) — the unit test pins the real risk (decode + threading).
|
||||
|
||||
### Risk + fallback
|
||||
|
||||
Load-bearing assumption: struct-member lower-12 = template id. The codebase comment asserts it and the test
|
||||
pins it. If it proved false on real controller bytes, the fallback is the backlog's original per-member
|
||||
controller query — but the direct path is the confident choice.
|
||||
|
||||
---
|
||||
|
||||
## Component C — Backlog hygiene + reconcile (backlog #3/#13 + #10)
|
||||
|
||||
### Galaxy stale comments (comment-only — verify each claim against current code first)
|
||||
|
||||
Rewrite 5 sites that reference an unshipped "PR 4.W", a never-added `Galaxy:Backend` switch, and the
|
||||
**retired** "legacy-host" backend (PR 7.2 retired Galaxy.Host/Proxy/Shared):
|
||||
|
||||
- `GalaxyDriver.cs:~52` — `IGalaxyDataReader` "until then... legacy-host backend handles reads in
|
||||
production" → reads ARE supported in production now (standard driver, Phase A shipped).
|
||||
- `GalaxyDriver.cs:~92` — `_ownedMxSession` "PR 4.W — production runtime owned by InitializeAsync" →
|
||||
confirm it's built in `InitializeAsync` now, relabel to present tense.
|
||||
- `GalaxyDriver.cs:~669` — `DiscoverAsync` "PR 4.W — also refresh the per-platform probe watcher's
|
||||
membership after discovery" → confirm wired, drop the forward-ref.
|
||||
- `GalaxyDriverFactoryExtensions.cs:~19` — "PR 4.W will add a server-side `Galaxy:Backend` switch" → never
|
||||
landed and won't (legacy "Galaxy" proxy type retired; only `GalaxyMxGateway`). Replace with the shipped
|
||||
reality.
|
||||
- `HostStatusAggregator.cs:~21` — "re-raises OnHostStatusChanged as the driver-level event (wired in PR
|
||||
4.W)" → confirm wired, drop the forward-ref.
|
||||
|
||||
Each rewrite must verify the claim against the current code before asserting it (e.g. confirm
|
||||
`_ownedMxSession` really is built in `InitializeAsync`); convert to accurate present-tense or a real TODO.
|
||||
|
||||
### Reconcile
|
||||
|
||||
- `stillpending.md` (never-staged): mark item #10 SHIPPED (the `ctx`-receiver guard is in
|
||||
`ScriptAnalysisService.cs:224`, `70e1bde9`); record the #6/#8 outcomes; tidy.
|
||||
- Update the two memory files (`MEMORY.md`, `project_stillpending_backlog.md`).
|
||||
|
||||
---
|
||||
|
||||
## Task slicing (independent → parallelizable)
|
||||
|
||||
| Task | Component | Project | Class | Parallel with |
|
||||
|---|---|---|---|---|
|
||||
| T1 | AbCip nested-struct (decoder + record + driver threading + tests) | Driver.AbCip | standard | T2, T4 |
|
||||
| T2 | Hosts store `GetAll()` + pure grouping view-model + tests | AdminUI | standard | T1, T4 |
|
||||
| T3 | Hosts Razor "Driver Instances" section (uses T2) | AdminUI | small | none (after T2) |
|
||||
| T4 | Galaxy stale-comment rewrites | Driver.Galaxy | trivial | T1, T2 |
|
||||
| T5 | Reconcile stillpending #10/#6/#8 + memory | docs (never-staged) | trivial | none |
|
||||
| T6 | Build + driver/AdminUI tests + live `/run` + finish (merge+push) | — | small | none |
|
||||
|
||||
Parallel implementers will use **worktree isolation** (or serial commits) per the shared-tree git-race
|
||||
lesson. T3 depends on T2; T5/T6 run at the end.
|
||||
|
||||
## Done =
|
||||
|
||||
Build clean + `dotnet test` green (Driver.AbCip + AdminUI + Driver.Galaxy) + Component A live `/run` pass +
|
||||
merged to master + pushed.
|
||||
Reference in New Issue
Block a user