# Still-Pending Completion — Design **Date:** 2026-06-15 **Source:** `stillpending.md` (audit at commit `c7f754c`) **Branch:** `feat/stillpending-completion` **Status:** Design approved; implementation plan to follow. ## Goal Close the genuinely actionable items in `stillpending.md`: the 11 unimplemented worker command kinds (§1.1), audit-record CorrelationId threading (§1.2), the client CLI/helper parity gaps (§4), and the documentation/residual-recording hygiene (§7). Items that are deliberate v1 scope (§2), vendor- or rig-gated (§1.3, §1.4, §3), or opt-in verification gates (§5) are documented, not built. ## Key discovery that shapes the work All 11 worker commands **already** have proto request *and* reply messages, gateway-side validation (`MxAccessGrpcRequestValidator.cs:86-111`), scope mapping (`GatewayGrpcScopeResolver.cs:45-54`), and generic pass-through routing (`MxAccessGatewayService.Invoke`). Therefore: - **No `.proto` changes** — no codegen, no net48 `CS0246` regen risk. - **No gateway routing changes** for the 11 commands. - The real code surface is: worker executor arms, six new COM-wrapper methods, the constraint-path CorrelationId thread, and client CLI/helper additions. 8 of the 11 have dedicated reply messages; 3 (`SetBufferedUpdateInterval`, `Ping`, `ShutdownWorker`) intentionally return the base OK reply. ## Architecture Two-process design is unchanged. Work lands in five independent workstreams. ### Workstream A — Worker control/lifecycle commands (5) `Ping`, `GetSessionState`, `GetWorkerInfo`, `DrainEvents`, `ShutdownWorker`. No COM involved. New arms in `MxAccessCommandExecutor.Execute` (`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:97-128`). The executor currently holds only COM collaborators, so it gains the runtime state these commands report on: - `DrainEvents` → `MxAccessEventQueue.Drain(maxEvents)` (already exists; `maxEvents == 0` drains all) → `DrainEventsReply { repeated MxEvent events }`. - `Ping` → echoes `PingCommand.message`; base OK reply. - `GetSessionState` → `SessionStateReply { SessionState state }` from the runtime/heartbeat snapshot. - `GetWorkerInfo` → `WorkerInfoReply { worker_process_id, worker_version, mxaccess_progid, mxaccess_clsid }` from `MxAccessInteropInfo` + process info. - `ShutdownWorker` → honor `grace_period` then signal `StaRuntime` shutdown; base OK reply. Where the control arms physically live (inside `MxAccessCommandExecutor` with injected collaborators, vs. intercepted one layer up where the runtime context already exists) is an implementation decision for the plan; the contract surface is identical either way. ### Workstream B — Worker MXAccess COM commands (6) `Suspend`, `Activate`, `AuthenticateUser`, `ArchestrAUserToId`, `AddBufferedItem`, `SetBufferedUpdateInterval`. Each needs: 1. A wrapper method on `IMxAccessServer` (`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IMxAccessServer.cs`) and dispatch in `MxAccessComServer` (`MxAccessComServer.cs`). **Open question resolved on windev:** which native interface (`ILMXProxyServer` / `…3` / `…4`) exposes each method — confirm against the interop and `docs/MXAccess-Public-API.md` before writing the dispatch. 2. An executor arm mapping request → wrapper call → reply (`SuspendReply`/`ActivateReply` carry `MxStatusProxy`; `AuthenticateUserReply`/ `ArchestrAUserToIdReply` carry `user_id`; `AddBufferedItemReply` carries `item_handle`; `SetBufferedUpdateInterval` returns base OK). `AuthenticateUser` credentials must never reach logs (standing rule). `AddBufferedItem`/`SetBufferedUpdateInterval` enable the already-wired `OnBufferedDataChange` event path (`MxAccessEventMapper.cs:231-254`); per the approved decision we **verify the buffered round-trip live on windev**, capturing a real multi-sample batch to validate the §3.2 conversion path (`Conversion/VariantConverter.cs`). ### Workstream C — §1.2 audit CorrelationId Thread `request.ClientCorrelationId` from `MxAccessGatewayService.Invoke` → `ApplyConstraintsAsync` → the six filter helpers (`EnforceReadTagAsync`, `EnforceWriteHandleAsync`, `FilterTagBulkAsync`, `FilterReadBulkAsync`, `FilterWriteBulkAsync`, `FilterHandleBulkAsync`) → `IConstraintEnforcer.RecordDenialAsync`, which gains a `correlationId` parameter. **Type detail:** `AuditEvent.CorrelationId` is `Guid?`, but `ClientCorrelationId` is a free-form string. The enforcer does `Guid.TryParse` and stores the value when parseable, else null. No audit-schema or contract change. Remove the `TODO(Task 2.3)` at `ConstraintEnforcer.cs:134-136`. ### Workstream D — Client CLI/helper parity (5 clients) - **Go** `Write2` single-session helper, modeled on `Write` (`session.go:559`). - **Python CLI** — add the 4 `galaxy-*` commands wrapping existing `galaxy.py` library methods (`test_connection`, `get_last_deploy_time`, `discover_hierarchy`, `watch_deploy_events`). - **`ping` CLI** — add to Go (`cmd/mxgw-go/main.go`) and Java (`MxGatewayCli.java`). - **`browse` CLI** — add to **all 5**, wrapping each client's existing `LazyBrowseNode`/`Browse` helper → 0/5 → 5/5 parity. - **Galaxy name unification** — add canonical `galaxy-test-connection` / `galaxy-last-deploy` to Java, keeping `galaxy-test` / `galaxy-deploy-time` as **deprecated aliases** (no script breakage). - **`version` in dotnet** — the explorer found a `version` path at `MxGatewayClientCli.cs:85` that conflicts with the audit's §4.4. Treat as **verify-then-fix-only-if-genuinely-missing**. ### Workstream E — Docs/hygiene (§7) + residual recording - D1 plan header (`docs/plans/2026-06-14-deferred-followups.md:4`). - Stale STA "production fix needed" prose (`docs/AlarmClientDiscovery.md:765-774`). - Stale EventsHub follow-up comment (`Dashboard/Hubs/EventsHub.cs:9-17`). - CLAUDE.md project-name drift (`MxGateway.*` → `ZB.MOM.WW.MxGateway.*`). - Remove dead `MapSqlException` (`GalaxyRepositoryGrpcService.cs:350-360`). - Record §1.3 (live failover counter unproven on rig) and §1.4 (8-arg ack drops operator domain/full-name; vendor stub) as explicitly-documented residuals. - Update `stillpending.md` to reflect what closed. ## Data flow / error handling specifics - Control commands return base/typed replies with `ProtocolStatusCode.Ok`; failures map to the existing `CreateInvalidRequestReply` / `CreateAlarmFailureReply` helpers, never a thrown exception across the STA. - COM commands surface the native `HResult` on the reply exactly as the other COM arms do — MXAccess parity means we do not "fix" surprising returns (e.g. `AuthenticateUser` is allowed to fail; the live test tolerates it). - `ShutdownWorker` must not deadlock the STA: signal shutdown, return the reply, then let the pump drain — sequencing is a plan-level concern. ## Testing & cross-platform verification | Workstream | Build/test | Host | |---|---|---| | A, B (worker) | `Worker.Tests` xUnit with a fake `IMxAccessServer` asserting each arm calls the right wrapper and maps the reply | **windev** (net48/x86) | | B (live) | `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` smoke incl. buffered capture | **windev** + live MXAccess | | C (gateway) | unit test: denied op persists parsed `CorrelationId` | local | | D (clients) | `go test`, `pytest`, `cargo test`+clippy, `gradle test`, `dotnet test` | local; Java on windev | | E (docs) | doc-only; `--check` regen where applicable | local | TDD throughout, per-task commits, docs updated in the same change as source. ## Out of scope (documented, not built) §1.3 live-drive (rig can't drive a real failover), §1.4 actual delivery + §3.4/§3.5 (AVEVA `AlarmAckByName` stub returns -55, `AlarmAckByGUID` is `E_NOTIMPL`), §3.1/§3.2-conversion-fix/§3.3/§3.6/§3.7 (await live captures), all of §2 (deliberate v1 scope, incl. validator-blocked multi-subscriber), §5 (opt-in verification gates), §7.6 (`Won't Fix` review findings).