Covers the 11 worker command kinds (§1.1), audit CorrelationId threading (§1.2), client CLI/helper parity (§4), and doc hygiene (§7). Key finding: all 11 commands already have proto/validation/scope/routing, so this is a worker-executor + COM-wrapper + client-CLI effort with zero contract changes.
7.8 KiB
Still-Pending Completion — Design
Date: 2026-06-15
Source: stillpending.md (audit at commit c7f754c)
Branch: feat/stillpending-completion
Status: Design approved; implementation plan to follow.
Goal
Close the genuinely actionable items in stillpending.md: the 11 unimplemented
worker command kinds (§1.1), audit-record CorrelationId threading (§1.2), the
client CLI/helper parity gaps (§4), and the documentation/residual-recording
hygiene (§7). Items that are deliberate v1 scope (§2), vendor- or rig-gated
(§1.3, §1.4, §3), or opt-in verification gates (§5) are documented, not built.
Key discovery that shapes the work
All 11 worker commands already have proto request and reply messages,
gateway-side validation (MxAccessGrpcRequestValidator.cs:86-111), scope
mapping (GatewayGrpcScopeResolver.cs:45-54), and generic pass-through routing
(MxAccessGatewayService.Invoke). Therefore:
- No
.protochanges — no codegen, no net48CS0246regen risk. - No gateway routing changes for the 11 commands.
- The real code surface is: worker executor arms, six new COM-wrapper methods, the constraint-path CorrelationId thread, and client CLI/helper additions.
8 of the 11 have dedicated reply messages; 3 (SetBufferedUpdateInterval,
Ping, ShutdownWorker) intentionally return the base OK reply.
Architecture
Two-process design is unchanged. Work lands in five independent workstreams.
Workstream A — Worker control/lifecycle commands (5)
Ping, GetSessionState, GetWorkerInfo, DrainEvents, ShutdownWorker.
No COM involved. New arms in MxAccessCommandExecutor.Execute
(src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:97-128).
The executor currently holds only COM collaborators, so it gains the runtime state these commands report on:
DrainEvents→MxAccessEventQueue.Drain(maxEvents)(already exists;maxEvents == 0drains all) →DrainEventsReply { repeated MxEvent events }.Ping→ echoesPingCommand.message; base OK reply.GetSessionState→SessionStateReply { SessionState state }from the runtime/heartbeat snapshot.GetWorkerInfo→WorkerInfoReply { worker_process_id, worker_version, mxaccess_progid, mxaccess_clsid }fromMxAccessInteropInfo+ process info.ShutdownWorker→ honorgrace_periodthen signalStaRuntimeshutdown; base OK reply.
Where the control arms physically live (inside MxAccessCommandExecutor with
injected collaborators, vs. intercepted one layer up where the runtime context
already exists) is an implementation decision for the plan; the contract surface
is identical either way.
Workstream B — Worker MXAccess COM commands (6)
Suspend, Activate, AuthenticateUser, ArchestrAUserToId,
AddBufferedItem, SetBufferedUpdateInterval. Each needs:
- A wrapper method on
IMxAccessServer(src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IMxAccessServer.cs) and dispatch inMxAccessComServer(MxAccessComServer.cs). Open question resolved on windev: which native interface (ILMXProxyServer/…3/…4) exposes each method — confirm against the interop anddocs/MXAccess-Public-API.mdbefore writing the dispatch. - An executor arm mapping request → wrapper call → reply
(
SuspendReply/ActivateReplycarryMxStatusProxy;AuthenticateUserReply/ArchestrAUserToIdReplycarryuser_id;AddBufferedItemReplycarriesitem_handle;SetBufferedUpdateIntervalreturns base OK).
AuthenticateUser credentials must never reach logs (standing rule).
AddBufferedItem/SetBufferedUpdateInterval enable the already-wired
OnBufferedDataChange event path (MxAccessEventMapper.cs:231-254); per the
approved decision we verify the buffered round-trip live on windev,
capturing a real multi-sample batch to validate the §3.2 conversion path
(Conversion/VariantConverter.cs).
Workstream C — §1.2 audit CorrelationId
Thread request.ClientCorrelationId from MxAccessGatewayService.Invoke →
ApplyConstraintsAsync → the six filter helpers (EnforceReadTagAsync,
EnforceWriteHandleAsync, FilterTagBulkAsync, FilterReadBulkAsync,
FilterWriteBulkAsync, FilterHandleBulkAsync) →
IConstraintEnforcer.RecordDenialAsync, which gains a correlationId
parameter.
Type detail: AuditEvent.CorrelationId is Guid?, but ClientCorrelationId
is a free-form string. The enforcer does Guid.TryParse and stores the value
when parseable, else null. No audit-schema or contract change. Remove the
TODO(Task 2.3) at ConstraintEnforcer.cs:134-136.
Workstream D — Client CLI/helper parity (5 clients)
- Go
Write2single-session helper, modeled onWrite(session.go:559). - Python CLI — add the 4
galaxy-*commands wrapping existinggalaxy.pylibrary methods (test_connection,get_last_deploy_time,discover_hierarchy,watch_deploy_events). pingCLI — add to Go (cmd/mxgw-go/main.go) and Java (MxGatewayCli.java).browseCLI — add to all 5, wrapping each client's existingLazyBrowseNode/Browsehelper → 0/5 → 5/5 parity.- Galaxy name unification — add canonical
galaxy-test-connection/galaxy-last-deployto Java, keepinggalaxy-test/galaxy-deploy-timeas deprecated aliases (no script breakage). versionin dotnet — the explorer found aversionpath atMxGatewayClientCli.cs:85that conflicts with the audit's §4.4. Treat as verify-then-fix-only-if-genuinely-missing.
Workstream E — Docs/hygiene (§7) + residual recording
- D1 plan header (
docs/plans/2026-06-14-deferred-followups.md:4). - Stale STA "production fix needed" prose (
docs/AlarmClientDiscovery.md:765-774). - Stale EventsHub follow-up comment (
Dashboard/Hubs/EventsHub.cs:9-17). - CLAUDE.md project-name drift (
MxGateway.*→ZB.MOM.WW.MxGateway.*). - Remove dead
MapSqlException(GalaxyRepositoryGrpcService.cs:350-360). - Record §1.3 (live failover counter unproven on rig) and §1.4 (8-arg ack drops operator domain/full-name; vendor stub) as explicitly-documented residuals.
- Update
stillpending.mdto reflect what closed.
Data flow / error handling specifics
- Control commands return base/typed replies with
ProtocolStatusCode.Ok; failures map to the existingCreateInvalidRequestReply/CreateAlarmFailureReplyhelpers, never a thrown exception across the STA. - COM commands surface the native
HResulton the reply exactly as the other COM arms do — MXAccess parity means we do not "fix" surprising returns (e.g.AuthenticateUseris allowed to fail; the live test tolerates it). ShutdownWorkermust not deadlock the STA: signal shutdown, return the reply, then let the pump drain — sequencing is a plan-level concern.
Testing & cross-platform verification
| Workstream | Build/test | Host |
|---|---|---|
| A, B (worker) | Worker.Tests xUnit with a fake IMxAccessServer asserting each arm calls the right wrapper and maps the reply |
windev (net48/x86) |
| B (live) | MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 smoke incl. buffered capture |
windev + live MXAccess |
| C (gateway) | unit test: denied op persists parsed CorrelationId |
local |
| D (clients) | go test, pytest, cargo test+clippy, gradle test, dotnet test |
local; Java on windev |
| E (docs) | doc-only; --check regen where applicable |
local |
TDD throughout, per-task commits, docs updated in the same change as source.
Out of scope (documented, not built)
§1.3 live-drive (rig can't drive a real failover), §1.4 actual delivery +
§3.4/§3.5 (AVEVA AlarmAckByName stub returns -55, AlarmAckByGUID is
E_NOTIMPL), §3.1/§3.2-conversion-fix/§3.3/§3.6/§3.7 (await live captures),
all of §2 (deliberate v1 scope, incl. validator-blocked multi-subscriber), §5
(opt-in verification gates), §7.6 (Won't Fix review findings).