docs: design for completing stillpending.md actionable items
Covers the 11 worker command kinds (§1.1), audit CorrelationId threading (§1.2), client CLI/helper parity (§4), and doc hygiene (§7). Key finding: all 11 commands already have proto/validation/scope/routing, so this is a worker-executor + COM-wrapper + client-CLI effort with zero contract changes.
This commit is contained in:
@@ -0,0 +1,152 @@
|
||||
# Still-Pending Completion — Design
|
||||
|
||||
**Date:** 2026-06-15
|
||||
**Source:** `stillpending.md` (audit at commit `c7f754c`)
|
||||
**Branch:** `feat/stillpending-completion`
|
||||
**Status:** Design approved; implementation plan to follow.
|
||||
|
||||
## Goal
|
||||
|
||||
Close the genuinely actionable items in `stillpending.md`: the 11 unimplemented
|
||||
worker command kinds (§1.1), audit-record CorrelationId threading (§1.2), the
|
||||
client CLI/helper parity gaps (§4), and the documentation/residual-recording
|
||||
hygiene (§7). Items that are deliberate v1 scope (§2), vendor- or rig-gated
|
||||
(§1.3, §1.4, §3), or opt-in verification gates (§5) are documented, not built.
|
||||
|
||||
## Key discovery that shapes the work
|
||||
|
||||
All 11 worker commands **already** have proto request *and* reply messages,
|
||||
gateway-side validation (`MxAccessGrpcRequestValidator.cs:86-111`), scope
|
||||
mapping (`GatewayGrpcScopeResolver.cs:45-54`), and generic pass-through routing
|
||||
(`MxAccessGatewayService.Invoke`). Therefore:
|
||||
|
||||
- **No `.proto` changes** — no codegen, no net48 `CS0246` regen risk.
|
||||
- **No gateway routing changes** for the 11 commands.
|
||||
- The real code surface is: worker executor arms, six new COM-wrapper methods,
|
||||
the constraint-path CorrelationId thread, and client CLI/helper additions.
|
||||
|
||||
8 of the 11 have dedicated reply messages; 3 (`SetBufferedUpdateInterval`,
|
||||
`Ping`, `ShutdownWorker`) intentionally return the base OK reply.
|
||||
|
||||
## Architecture
|
||||
|
||||
Two-process design is unchanged. Work lands in five independent workstreams.
|
||||
|
||||
### Workstream A — Worker control/lifecycle commands (5)
|
||||
|
||||
`Ping`, `GetSessionState`, `GetWorkerInfo`, `DrainEvents`, `ShutdownWorker`.
|
||||
No COM involved. New arms in `MxAccessCommandExecutor.Execute`
|
||||
(`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:97-128`).
|
||||
|
||||
The executor currently holds only COM collaborators, so it gains the runtime
|
||||
state these commands report on:
|
||||
|
||||
- `DrainEvents` → `MxAccessEventQueue.Drain(maxEvents)` (already exists;
|
||||
`maxEvents == 0` drains all) → `DrainEventsReply { repeated MxEvent events }`.
|
||||
- `Ping` → echoes `PingCommand.message`; base OK reply.
|
||||
- `GetSessionState` → `SessionStateReply { SessionState state }` from the
|
||||
runtime/heartbeat snapshot.
|
||||
- `GetWorkerInfo` → `WorkerInfoReply { worker_process_id, worker_version,
|
||||
mxaccess_progid, mxaccess_clsid }` from `MxAccessInteropInfo` + process info.
|
||||
- `ShutdownWorker` → honor `grace_period` then signal `StaRuntime` shutdown;
|
||||
base OK reply.
|
||||
|
||||
Where the control arms physically live (inside `MxAccessCommandExecutor` with
|
||||
injected collaborators, vs. intercepted one layer up where the runtime context
|
||||
already exists) is an implementation decision for the plan; the contract surface
|
||||
is identical either way.
|
||||
|
||||
### Workstream B — Worker MXAccess COM commands (6)
|
||||
|
||||
`Suspend`, `Activate`, `AuthenticateUser`, `ArchestrAUserToId`,
|
||||
`AddBufferedItem`, `SetBufferedUpdateInterval`. Each needs:
|
||||
|
||||
1. A wrapper method on `IMxAccessServer`
|
||||
(`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/IMxAccessServer.cs`) and dispatch in
|
||||
`MxAccessComServer` (`MxAccessComServer.cs`). **Open question resolved on
|
||||
windev:** which native interface (`ILMXProxyServer` / `…3` / `…4`) exposes
|
||||
each method — confirm against the interop and `docs/MXAccess-Public-API.md`
|
||||
before writing the dispatch.
|
||||
2. An executor arm mapping request → wrapper call → reply
|
||||
(`SuspendReply`/`ActivateReply` carry `MxStatusProxy`; `AuthenticateUserReply`/
|
||||
`ArchestrAUserToIdReply` carry `user_id`; `AddBufferedItemReply` carries
|
||||
`item_handle`; `SetBufferedUpdateInterval` returns base OK).
|
||||
|
||||
`AuthenticateUser` credentials must never reach logs (standing rule).
|
||||
`AddBufferedItem`/`SetBufferedUpdateInterval` enable the already-wired
|
||||
`OnBufferedDataChange` event path (`MxAccessEventMapper.cs:231-254`); per the
|
||||
approved decision we **verify the buffered round-trip live on windev**,
|
||||
capturing a real multi-sample batch to validate the §3.2 conversion path
|
||||
(`Conversion/VariantConverter.cs`).
|
||||
|
||||
### Workstream C — §1.2 audit CorrelationId
|
||||
|
||||
Thread `request.ClientCorrelationId` from `MxAccessGatewayService.Invoke` →
|
||||
`ApplyConstraintsAsync` → the six filter helpers (`EnforceReadTagAsync`,
|
||||
`EnforceWriteHandleAsync`, `FilterTagBulkAsync`, `FilterReadBulkAsync`,
|
||||
`FilterWriteBulkAsync`, `FilterHandleBulkAsync`) →
|
||||
`IConstraintEnforcer.RecordDenialAsync`, which gains a `correlationId`
|
||||
parameter.
|
||||
|
||||
**Type detail:** `AuditEvent.CorrelationId` is `Guid?`, but `ClientCorrelationId`
|
||||
is a free-form string. The enforcer does `Guid.TryParse` and stores the value
|
||||
when parseable, else null. No audit-schema or contract change. Remove the
|
||||
`TODO(Task 2.3)` at `ConstraintEnforcer.cs:134-136`.
|
||||
|
||||
### Workstream D — Client CLI/helper parity (5 clients)
|
||||
|
||||
- **Go** `Write2` single-session helper, modeled on `Write` (`session.go:559`).
|
||||
- **Python CLI** — add the 4 `galaxy-*` commands wrapping existing `galaxy.py`
|
||||
library methods (`test_connection`, `get_last_deploy_time`,
|
||||
`discover_hierarchy`, `watch_deploy_events`).
|
||||
- **`ping` CLI** — add to Go (`cmd/mxgw-go/main.go`) and Java (`MxGatewayCli.java`).
|
||||
- **`browse` CLI** — add to **all 5**, wrapping each client's existing
|
||||
`LazyBrowseNode`/`Browse` helper → 0/5 → 5/5 parity.
|
||||
- **Galaxy name unification** — add canonical `galaxy-test-connection` /
|
||||
`galaxy-last-deploy` to Java, keeping `galaxy-test` / `galaxy-deploy-time` as
|
||||
**deprecated aliases** (no script breakage).
|
||||
- **`version` in dotnet** — the explorer found a `version` path at
|
||||
`MxGatewayClientCli.cs:85` that conflicts with the audit's §4.4. Treat as
|
||||
**verify-then-fix-only-if-genuinely-missing**.
|
||||
|
||||
### Workstream E — Docs/hygiene (§7) + residual recording
|
||||
|
||||
- D1 plan header (`docs/plans/2026-06-14-deferred-followups.md:4`).
|
||||
- Stale STA "production fix needed" prose (`docs/AlarmClientDiscovery.md:765-774`).
|
||||
- Stale EventsHub follow-up comment (`Dashboard/Hubs/EventsHub.cs:9-17`).
|
||||
- CLAUDE.md project-name drift (`MxGateway.*` → `ZB.MOM.WW.MxGateway.*`).
|
||||
- Remove dead `MapSqlException` (`GalaxyRepositoryGrpcService.cs:350-360`).
|
||||
- Record §1.3 (live failover counter unproven on rig) and §1.4 (8-arg ack drops
|
||||
operator domain/full-name; vendor stub) as explicitly-documented residuals.
|
||||
- Update `stillpending.md` to reflect what closed.
|
||||
|
||||
## Data flow / error handling specifics
|
||||
|
||||
- Control commands return base/typed replies with `ProtocolStatusCode.Ok`;
|
||||
failures map to the existing `CreateInvalidRequestReply` /
|
||||
`CreateAlarmFailureReply` helpers, never a thrown exception across the STA.
|
||||
- COM commands surface the native `HResult` on the reply exactly as the other
|
||||
COM arms do — MXAccess parity means we do not "fix" surprising returns
|
||||
(e.g. `AuthenticateUser` is allowed to fail; the live test tolerates it).
|
||||
- `ShutdownWorker` must not deadlock the STA: signal shutdown, return the reply,
|
||||
then let the pump drain — sequencing is a plan-level concern.
|
||||
|
||||
## Testing & cross-platform verification
|
||||
|
||||
| Workstream | Build/test | Host |
|
||||
|---|---|---|
|
||||
| A, B (worker) | `Worker.Tests` xUnit with a fake `IMxAccessServer` asserting each arm calls the right wrapper and maps the reply | **windev** (net48/x86) |
|
||||
| B (live) | `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` smoke incl. buffered capture | **windev** + live MXAccess |
|
||||
| C (gateway) | unit test: denied op persists parsed `CorrelationId` | local |
|
||||
| D (clients) | `go test`, `pytest`, `cargo test`+clippy, `gradle test`, `dotnet test` | local; Java on windev |
|
||||
| E (docs) | doc-only; `--check` regen where applicable | local |
|
||||
|
||||
TDD throughout, per-task commits, docs updated in the same change as source.
|
||||
|
||||
## Out of scope (documented, not built)
|
||||
|
||||
§1.3 live-drive (rig can't drive a real failover), §1.4 actual delivery +
|
||||
§3.4/§3.5 (AVEVA `AlarmAckByName` stub returns -55, `AlarmAckByGUID` is
|
||||
`E_NOTIMPL`), §3.1/§3.2-conversion-fix/§3.3/§3.6/§3.7 (await live captures),
|
||||
all of §2 (deliberate v1 scope, incl. validator-blocked multi-subscriber), §5
|
||||
(opt-in verification gates), §7.6 (`Won't Fix` review findings).
|
||||
Reference in New Issue
Block a user