Hosts per-driver rows + AbCip nested-struct + Galaxy hygiene — Design
Date: 2026-06-18
Branch: feat/hosts-rows-abcip-nested-hygiene (off master f59680fa)
Backlog items: stillpending.md §A #8 (Hosts per-driver rows), #6 (AbCip nested-struct), #3/#13 + #10 (Galaxy stale-comment hygiene + reconcile)
A three-component backlog phase. One live-provable AdminUI feature, one offline-provable driver enhancement, one doc/comment hygiene sweep. The three touch disjoint projects (AdminUI / Driver.AbCip / Driver.Galaxy + docs), so they are independently implementable.
The triggering follow-up ("cursor-based paging WITHIN a single timestamp / oversized tie clusters") is
stale — it shipped at 2e6c6d3a (HistoryPaging.SliceTieCluster, wired in OtOpcUaNodeManager.cs:1928,
tested in HistoryPagingTests.cs). This phase handles the "+ backlog" half.
Standing constraints (in force)
- NO Commons wire/proto contract change, NO interface/Core.Abstractions contract change, NO EF migration, NO bUnit (Razor proven only by live /run).
- Stage by explicit path, never git add .; never stage the never-stage files (sql_login.txt, src/Server/.../Host/pki/, pending.md, current.md, stillpending.md, docker-dev/docker-compose.yml).
- No force-push, no --no-verify. Finish = merge to master + push. dangerouslyDisableSandbox: true for all build/test/rig commands.
Component A — /hosts per-driver-instance status (backlog #8)
The constraint that shapes the architecture
DriverHealthChanged (the driver-health DPS message, Commons/Messages/Drivers/DriverHealthChanged.cs)
carries ClusterId + DriverInstanceId + health, but no hosting-node identity. The snapshot store
(InMemoryDriverStatusSnapshotStore) keys by DriverInstanceId alone, so in a redundancy pair the two
nodes' health snapshots overwrite each other. ClusterNode.NodeId is a logical id ("LINE3-OPCUA-A"),
not the Akka member address — the runtime keys "this node" by that logical id
(DriverHostActor.cs:428 filters s.NodeId == _localNode.Value).
Consequences:
- True per-Akka-member driver health would need a hosting-node field on DriverHealthChanged (a Commons change, which is forbidden) plus a node-keyed store, OR a per-node DriverHostActor-children Ask (new cluster actor messaging). Both are out of scope here.
- The reliable join key both sides share is ClusterId.
Approach (no Commons change)
- Reuse the existing health pipeline unchanged (DriverStatusSignalRBridge → snapshot store).
- Add one AdminUI-internal store method: IReadOnlyCollection<DriverHealthChanged> GetAll() on IDriverStatusSnapshotStore, implemented in InMemoryDriverStatusSnapshotStore by snapshotting the backing ConcurrentDictionary values. (AdminUI-internal interface, not Commons.)
- New "Driver Instances" section on /hosts, grouped by ClusterId:
  - Each cluster-group header lists that cluster's nodes from ConfigDB ClusterNode (logical NodeId + Host).
  - The body lists that cluster's live driver snapshots, enriched with Name + DriverType by joining ConfigDB DriverInstance on DriverInstanceId. Each row: a live status chip (the same state→color mapping DriverStatusPanel uses), last successful read, last error, 5-min error count.
  - Live-refreshed by subscribing to the store's SnapshotChanged event (in-process, which dodges the Traefik self-hub trap; same pattern as DriverStatusPanel).
- Keep the existing Akka member topology rows as-is — they are authoritative for Akka-level Up/Unreachable/Leader, which the DB does not know. Two sections, each authoritative for its own data.
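The GetAll() addition can be sketched as below. This is a minimal illustration, not the real store: the DriverHealthChanged record is a trimmed stand-in, and the Update method and the Action-typed SnapshotChanged event are assumed shapes. It also shows why the store cannot tell redundancy-pair nodes apart: the key is DriverInstanceId alone, so a second report for the same driver replaces the first.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Commons message; the real type carries more fields.
public sealed record DriverHealthChanged(string ClusterId, string DriverInstanceId, string State);

public interface IDriverStatusSnapshotStore
{
    event Action? SnapshotChanged;                      // assumed event shape
    void Update(DriverHealthChanged snapshot);          // assumed write path
    IReadOnlyCollection<DriverHealthChanged> GetAll();  // the one new method
}

public sealed class InMemoryDriverStatusSnapshotStore : IDriverStatusSnapshotStore
{
    // Keyed by DriverInstanceId only: a later report for the same driver,
    // from either node of a redundancy pair, overwrites the earlier one.
    private readonly ConcurrentDictionary<string, DriverHealthChanged> _byDriver = new();

    public event Action? SnapshotChanged;

    public void Update(DriverHealthChanged snapshot)
    {
        _byDriver[snapshot.DriverInstanceId] = snapshot;
        SnapshotChanged?.Invoke();
    }

    // Copy the values into an array so callers enumerate a stable snapshot
    // that later Update calls cannot change.
    public IReadOnlyCollection<DriverHealthChanged> GetAll() => _byDriver.Values.ToArray();
}
```

Returning a copy rather than the live dictionary view keeps the Razor render loop independent of concurrent health updates.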
Testing
Extract the grouping/enrichment into a pure view-model builder:
HostsDriverView.Build(snapshots, clusterNodes, driverInstances) -> IReadOnlyList<ClusterDriverGroup>.
xUnit + Shouldly: group-by-cluster, Name/DriverType enrichment, unknown-driver (snapshot with no matching
DriverInstance) fallback, empty inputs. The Razor is a thin shell over the builder — proven only by live
/run (no bUnit), per established pattern.
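A rough sketch of what such a pure builder could look like follows. Only the HostsDriverView.Build name and the ClusterDriverGroup return type come from this design; the record shapes, the DriverRow type, and the "(unknown driver)" fallback text are illustrative assumptions. In this sketch a cluster with nodes but no live snapshots produces no group; whether the real page should show it is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down shapes; the real types carry more fields.
public sealed record DriverHealthChanged(string ClusterId, string DriverInstanceId, string State);
public sealed record ClusterNode(string ClusterId, string NodeId, string Host);
public sealed record DriverInstance(string DriverInstanceId, string Name, string DriverType);
public sealed record DriverRow(string DriverInstanceId, string Name, string DriverType, string State);
public sealed record ClusterDriverGroup(
    string ClusterId, IReadOnlyList<ClusterNode> Nodes, IReadOnlyList<DriverRow> Drivers);

public static class HostsDriverView
{
    public static IReadOnlyList<ClusterDriverGroup> Build(
        IEnumerable<DriverHealthChanged> snapshots,
        IEnumerable<ClusterNode> clusterNodes,
        IEnumerable<DriverInstance> driverInstances)
    {
        var instancesById = driverInstances.ToDictionary(d => d.DriverInstanceId, StringComparer.Ordinal);
        var nodesByCluster = clusterNodes.ToLookup(n => n.ClusterId, StringComparer.Ordinal);

        // One group per ClusterId seen in the live snapshots, in a stable order.
        return snapshots
            .GroupBy(s => s.ClusterId, StringComparer.Ordinal)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => new ClusterDriverGroup(
                g.Key,
                nodesByCluster[g.Key].OrderBy(n => n.NodeId, StringComparer.Ordinal).ToList(),
                g.OrderBy(s => s.DriverInstanceId, StringComparer.Ordinal)
                 .Select(s => ToRow(s, instancesById))
                 .ToList()))
            .ToList();
    }

    // Enrich with Name/DriverType; fall back to placeholders when ConfigDB has no matching row.
    private static DriverRow ToRow(DriverHealthChanged s, IReadOnlyDictionary<string, DriverInstance> byId) =>
        byId.TryGetValue(s.DriverInstanceId, out var d)
            ? new DriverRow(s.DriverInstanceId, d.Name, d.DriverType, s.State)
            : new DriverRow(s.DriverInstanceId, "(unknown driver)", "(unknown)", s.State);
}
```

Because the builder takes plain collections and returns plain records, the four test cases listed above need no Razor host, no DB, and no actor system.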
Live /run (Component A)
Rebuild both central-1 AND central-2 (the :9200 AdminUI is Traefik round-robined across both — a
half-deploy round-robins old/new code). Deploy a Modbus driver (Galaxy too if convenient), open
http://localhost:9200/hosts, confirm the Driver Instances section lists the deployed drivers grouped by
cluster with live state, then drive a Reconnect from a driver page and confirm the chip updates.
Deferred follow-up (noted, not built)
True per-Akka-member nesting needs node identity on the health feed (Commons field) or a per-node
DriverHostActor-children Ask. Out of scope under the no-Commons constraint.
Component B — AbCip nested-struct member expansion (backlog #6)
Finding (simpler than the backlog framed)
The backlog said the CIP Template Object member block "carries no nested template id." It actually does:
the per-member info u16 uses the same encoding as the Symbol Object — bit 15 = struct flag, lower 12
bits = template instance id for a struct member (the codebase's own comment in CipTemplateObjectDecoder
documents this). The prior member-expansion fix (4e141402/4a7b0fde) discarded that id and recursed
with templateInstanceId: null (AbCipDriver.cs:~1179), leaving a test-only SeedDiscoveredUdtShapeForTest
seam. So the close is to stop discarding it — no new CIP query needed.
Approach (driver-internal only)
- Carry uint? NestedTemplateId on the AbCipUdtMember record (driver-internal, not Commons).
- CipTemplateObjectDecoder sets it from info & 0x0FFF when the struct flag (bit 15) is set; null for scalar members.
- The driver threads member.NestedTemplateId into the existing FetchUdtShapeAsync (recursion site AbCipDriver.cs:~1179) instead of null. Reuses IAbCipTemplateReader.ReadAsync("@udt/{id}"): zero new CIP primitives.
Testing (fully offline)
Via the injectable IAbCipTemplateReader / FakeTemplateReader: feed a Template Object blob with a struct
member whose info = 0x8000 | nestedId; assert (1) the decoder extracts NestedTemplateId == nestedId,
(2) the driver fetches the nested shape (@udt/{nestedId}) so nested members become addressable. Add a
scalar-member case asserting NestedTemplateId == null. Live-verify stays infra-gated (no nested-UDT
ControlLogix rig) — the unit test pins the real risk (decode + threading).
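The threading half of that test might be exercised along these lines. Everything here is a simplified stand-in: ITemplateReader returns already-decoded members rather than a Template Object blob, UdtMember and UdtShapeWalker are invented names, and only the "@udt/{id}" path format is taken from this design. The sketch has no cycle or depth guard.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical simplified shapes; the real reader and member record differ.
public sealed record UdtMember(string Name, uint? NestedTemplateId);

public interface ITemplateReader
{
    Task<IReadOnlyList<UdtMember>> ReadAsync(string path);
}

// Fake that serves canned shapes and records which paths were requested.
public sealed class FakeTemplateReader : ITemplateReader
{
    private readonly Dictionary<string, IReadOnlyList<UdtMember>> _shapes;
    public List<string> Requests { get; } = new();

    public FakeTemplateReader(Dictionary<string, IReadOnlyList<UdtMember>> shapes) => _shapes = shapes;

    public Task<IReadOnlyList<UdtMember>> ReadAsync(string path)
    {
        Requests.Add(path);
        return Task.FromResult(_shapes[path]);
    }
}

public static class UdtShapeWalker
{
    // Expands a template into dotted leaf-member paths, fetching each nested
    // template by the id carried on its parent member instead of passing null.
    public static async Task<List<string>> ExpandAsync(
        ITemplateReader reader, uint templateId, string prefix = "")
    {
        var paths = new List<string>();
        foreach (var member in await reader.ReadAsync($"@udt/{templateId}"))
        {
            var path = prefix + member.Name;
            if (member.NestedTemplateId is uint nested)
                paths.AddRange(await ExpandAsync(reader, nested, path + "."));
            else
                paths.Add(path);
        }
        return paths;
    }
}
```

A test can then assert both the resulting member paths and the recorded Requests list, which pins that the nested shape was fetched by id.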
Risk + fallback
Load-bearing assumption: the lower 12 bits of a struct member's info word are its template id. The codebase comment asserts it and the test pins it. If it proves false on real controller bytes, the fallback is the backlog's original per-member controller query, but the direct path is the confident choice.
Component C — Backlog hygiene + reconcile (backlog #3/#13 + #10)
Galaxy stale comments (comment-only — verify each claim against current code first)
Rewrite 5 sites that reference an unshipped "PR 4.W", a never-added Galaxy:Backend switch, and the
retired "legacy-host" backend (PR 7.2 retired Galaxy.Host/Proxy/Shared):
- GalaxyDriver.cs:~52: IGalaxyDataReader "until then... legacy-host backend handles reads in production" → reads ARE supported in production now (standard driver, Phase A shipped).
- GalaxyDriver.cs:~92: _ownedMxSession "PR 4.W — production runtime owned by InitializeAsync" → confirm it's built in InitializeAsync now, relabel to present tense.
- GalaxyDriver.cs:~669: DiscoverAsync "PR 4.W — also refresh the per-platform probe watcher's membership after discovery" → confirm wired, drop the forward-ref.
- GalaxyDriverFactoryExtensions.cs:~19: "PR 4.W will add a server-side Galaxy:Backend switch" → never landed and won't (legacy "Galaxy" proxy type retired; only GalaxyMxGateway). Replace with the shipped reality.
- HostStatusAggregator.cs:~21: "re-raises OnHostStatusChanged as the driver-level event (wired in PR 4.W)" → confirm wired, drop the forward-ref.
Each rewrite must verify the claim against the current code before asserting it (e.g. confirm
_ownedMxSession really is built in InitializeAsync); convert to accurate present-tense or a real TODO.
Reconcile
- stillpending.md (never-staged): mark item #10 SHIPPED (the ctx-receiver guard is in ScriptAnalysisService.cs:224, 70e1bde9); record the #6/#8 outcomes; tidy.
- Update the two memory files (MEMORY.md, project_stillpending_backlog.md).
Task slicing (independent → parallelizable)
| Task | Component | Project | Class | Parallel with |
|---|---|---|---|---|
| T1 | AbCip nested-struct (decoder + record + driver threading + tests) | Driver.AbCip | standard | T2, T4 |
| T2 | Hosts store GetAll() + pure grouping view-model + tests | AdminUI | standard | T1, T4 |
| T3 | Hosts Razor "Driver Instances" section (uses T2) | AdminUI | small | none (after T2) |
| T4 | Galaxy stale-comment rewrites | Driver.Galaxy | trivial | T1, T2 |
| T5 | Reconcile stillpending #10/#6/#8 + memory | docs (never-staged) | trivial | none |
| T6 | Build + driver/AdminUI tests + live /run + finish (merge+push) | - | small | none |
Parallel implementers will use worktree isolation (or serial commits) per the shared-tree git-race lesson. T3 depends on T2; T5/T6 run at the end.
Done =
Build clean + dotnet test green (Driver.AbCip + AdminUI + Driver.Galaxy) + Component A live /run pass +
merged to master + pushed.