Still Pending — Deferred / Partial / Unfinished / Missing Functionality
Generated: 2026-06-15 · Commit: c7f754c (main) · Method: six parallel read-only audits (Server, Worker, Contracts/proto, all five clients, docs/design/plans, tests + review backlog). Every item cites a verified file:line.
Resolution update (2026-06-15, branch `feat/stillpending-completion`): The actionable items were implemented and verified per `docs/plans/2026-06-15-stillpending-completion.md`. §1.1 (all 11 worker command kinds), §1.2 (audit `CorrelationId`), and the §4 client CLI/helper parity gaps are Resolved; see the per-item annotations below. Worker COM commands are live-verified on the dev rig (`efd9971`, `f7ada90`). Remaining open items are the documented residuals (§1.3, §1.4, the §3 vendor/capture-gated questions including the new §3.2 multi-sample buffered residual) and the deliberate v1 scope of §2. Zero `.proto` changes were needed (all reply messages already existed).
How to read this
Items are graded by what they actually are, because most "pending" surface in this codebase is deliberate v1 scope, not accidental:
- 🔴 Genuine gap — real unfinished/missing functionality with user-visible impact; a candidate to actually build.
- 🟠 Parity hole — declared in-scope (proto/design) but not wired through; breaks "MXAccess parity" or cross-client parity.
- 🟡 Open question / vendor-gated — intentionally incomplete, awaiting a live MXAccess capture or an AVEVA fix; raw data preserved meanwhile.
- 🔵 Intentional v1 scope — deliberately deferred and documented; listed so it's catalogued, not because it's broken.
- ⚪ Verification gap — code exists but is unverified by default (opt-in/live tests).
- 📄 Stale doc / dead code — prose or code that lags reality.
1. Genuine gaps (real unfinished functionality)
✅ 1.1 — Worker does not implement 11 declared command kinds (biggest real gap) — RESOLVED
- Resolved: all 11 now implemented. The 5 control commands (`Ping`, `GetSessionState`, `GetWorkerInfo`, `DrainEvents`, `ShutdownWorker`) are handled in `WorkerPipeSession`, off-STA because `ShutdownWorker` on the STA would deadlock and these commands read pipe-session state (`bf72cd8`). The 6 COM commands (`Suspend`, `Activate`, `AuthenticateUser`, `ArchestrAUserToId`, `AddBufferedItem`, `SetBufferedUpdateInterval`) are implemented in `MxAccessCommandExecutor` (STA-dispatched) via new `IMxAccessServer`/`MxAccessComServer` wrappers selecting `ILMXProxyServer2/4/5` (`2939932`). Live-verified on the dev rig (`efd9971`, `f7ada90`): `ArchestrAUserToId` → `Ok(user_id=1)`, `AddBufferedItem`/`SetBufferedUpdateInterval` → `Ok`; `AuthenticateUser`/`Suspend`/`Activate` → real `MxaccessFailure`/HResult (parity, not `INVALID_REQUEST`). `FakeWorkerHarness` now answers the control commands so the default gateway suite covers them (`bb5139f`). Note: `DrainEvents` is a diagnostic snapshot. It competes with the 25 ms background stream-drain loop, so with an active event stream it returns only events not yet pushed (no loss/double-drain; the queue drain is lock-protected and destructive).
- Original finding below (for history):
- Location: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/MxAccessCommandExecutor.cs:97-128` (the `Execute` switch; everything else falls to `_ => CreateInvalidRequestReply(...)`).
- What's missing: the proto `MxCommandKind` enum defines these, the gateway validates (`Grpc/MxAccessGrpcRequestValidator.cs:86-95`), scopes (`Security/Authorization/GatewayGrpcScopeResolver.cs:45-47`), and routes them, and reply messages exist, but the worker answers `INVALID_REQUEST`:
  - MXAccess COM commands (parity-critical): `AddBufferedItem`, `SetBufferedUpdateInterval`, `Suspend`, `Activate`, `AuthenticateUser`, `ArchestrAUserToId` (`mxaccess_gateway.proto:107-116,151-160`).
  - Worker control/lifecycle: `Ping`, `GetSessionState`, `GetWorkerInfo`, `DrainEvents`, `ShutdownWorker` (`mxaccess_gateway.proto:133-137,177-181`).
- Why it matters: `gateway.md:890-899`, `docs/MxAccessWorkerInstanceDesign.md:424-433`, and `docs/DesignDecisions.md:169-173` list all six COM commands under "Phase 4: Full Command Surface" with the exit criterion "gRPC command surface covers the installed MXAccess public method set." So this is a declared-in-scope phase that isn't finished, not a v1 cut.
- Masked by tests: the control kinds (`Ping`/`GetWorkerInfo`/`DrainEvents`/`ShutdownWorker`) are exercised only through `FakeWorkerHarness`, so the unit suite is green while the real worker can't answer them. The live integration test `WorkerLiveMxAccessSmokeTests.cs:931` even sends `AuthenticateUser`, which would get an invalid-request reply today.
- Note: `OnBufferedDataChange` event mapping IS fully wired (`Conversion/MxAccessEventMapper.cs:231-254`), but with `AddBufferedItem`/`SetBufferedUpdateInterval` unimplemented there is no way to set buffered eventing up.
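
The `DrainEvents` snapshot semantics noted in the resolution above can be illustrated with a minimal Python sketch. The `EventQueue` class and its method names are hypothetical stand-ins, not the worker's actual types; the sketch only shows why a lock-protected, destructive drain shared by two callers neither loses nor duplicates events, and why the diagnostic caller sees only what the background loop has not already taken.

```python
import threading
from collections import deque


class EventQueue:
    """Hypothetical stand-in for the worker's event queue (names assumed)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()

    def push(self, event):
        with self._lock:
            self._items.append(event)

    def drain(self):
        """Destructive drain: atomically take everything queued so far."""
        with self._lock:
            taken = list(self._items)
            self._items.clear()
            return taken


queue = EventQueue()
for n in range(5):
    queue.push(n)

stream_batch = queue.drain()  # the background stream-drain loop gets there first
queue.push(5)                 # one more event arrives afterwards
snapshot = queue.drain()      # a diagnostic DrainEvents-style call

# Every event went to exactly one drainer: nothing lost, nothing delivered twice.
assert stream_batch == [0, 1, 2, 3, 4]
assert snapshot == [5]
```

Because both callers go through the same locked `drain`, the snapshot is only ever the remainder, which is why it is best treated as a diagnostic rather than a second event feed.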
✅ 1.2 — Constraint-denial audit events drop CorrelationId — RESOLVED
- Resolved (`8415f35`, `55526d5`): `request.ClientCorrelationId` is threaded from `Invoke` → `ApplyConstraintsAsync` → the six filter helpers → `IConstraintEnforcer.RecordDenialAsync` (new `string? correlationId` param); the `TODO(Task 2.3)` is gone. The audit `CorrelationId` column is `Guid?`, so a GUID-parseable id is stored typed, and the raw string is always preserved in the audit record's `DetailsJson["clientCorrelationId"]`. This matters because the Rust client sends non-GUID ids (`rust-client-<op>-<n>`) on all traffic and Python/Java default to empty, which would otherwise have left the typed column null. An end-to-end test asserts the value propagates through `Invoke`.
- Original finding: denied-operation audit records always wrote `CorrelationId = null` (`ConstraintEnforcer.cs:134-136,147`); threading needed an `IConstraintEnforcer` signature change.
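
The typed-plus-raw storage rule described in the resolution can be sketched as follows. This is an illustration in Python, not the gateway's C# code: `build_denial_audit` is a hypothetical helper, and the sample id `rust-client-read-7` is an assumed instance of the `rust-client-<op>-<n>` pattern.

```python
import json
import uuid


def build_denial_audit(client_correlation_id):
    """Return (typed_id, details_json) for a denial audit record (sketch only)."""
    raw = client_correlation_id or ""
    try:
        typed = uuid.UUID(raw)  # typed column is filled only for GUID-parseable ids
    except ValueError:
        typed = None            # non-GUID or empty id: typed column stays null
    # The raw string is preserved regardless of whether it parsed.
    details = json.dumps({"clientCorrelationId": raw})
    return typed, details


typed, details = build_denial_audit("rust-client-read-7")
assert typed is None
assert json.loads(details)["clientCorrelationId"] == "rust-client-read-7"
```

Keeping the raw value alongside the typed one means a denial can still be correlated for clients that never send GUIDs.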
🔴 1.3 — provider_switches{from,to,reason} counter never exercised live
- Location: metric emitted in `Alarms/GatewayAlarmMonitor.cs` (failover/failback path); residual recorded in `docs/plans/2026-06-14-deferred-followups.md:124-125`.
- Evidence: "that counter's live exercise remains the one gap; record it explicitly rather than claiming coverage." Unit-tested (Tests-032) but the dev rig can't drive a real alarmmgr→subtag failover (`project_rig_alarms_object_driven`), so the counter's `reason` tagging is unproven in production.
🟠 1.4 — Worker 8-arg alarm ack silently discards operator domain / full name
- Location: `src/ZB.MOM.WW.MxGateway.Worker/MxAccess/WnWrapAlarmConsumer.cs:261-278` (`_ = ackOperatorDomain; _ = ackOperatorFullName;`).
- Evidence: "the IwwAlarmConsumer2 8-arg AlarmAckByName returns -55 on this AVEVA build (looks like a stub) … fields are accepted by the proto for forward-compat but are not propagated to AVEVA today."
- Impact: two contract fields are accepted on the wire and silently not delivered. Root cause is the vendor stub (see 3.5), but the drop is currently invisible to callers.
2. Intentional v1 scope decisions (deliberately deferred — catalog)
These are documented, deliberate, and mostly enforced. Listed so the deferred surface is in one place — none are bugs. Canonical register: docs/DesignDecisions.md:466-474 ("Later Revisit Items") + gateway.md "Post-v1 revisit items".
- 🔵 Reconnectable sessions: not in v1. `docs/DesignDecisions.md:63-73`, `gateway.md:1087,1101`.
- 🔵 Multi-event-subscriber fan-out: plumbed but blocked. The option flows all the way to `Sessions/GatewaySession.cs:387-408` `AttachEventSubscriber(allowMultipleSubscribers)`, but `Configuration/GatewayOptionsValidator.cs:181-185` hard-rejects the only enabling value: "AllowMultipleEventSubscribers is not supported until event fan-out is implemented." So the fan-out code path never runs. `docs/DesignDecisions.md:75-80`.
- 🔵 Gateway restart does not reattach orphan workers: it terminates them on startup. `docs/DesignDecisions.md:65-69`, `CLAUDE.md`.
- 🔵 Workers run as the gateway service identity: a restricted service account is a reserved extension point. `docs/DesignDecisions.md:179-184`.
- 🔵 Fail-fast event backpressure, no coalescing: opt-in coalescing is post-v1. `docs/DesignDecisions.md:187-203`.
- 🔵 No public command batching. `docs/DesignDecisions.md:206-212`.
- 🔵 API-key admin is a local CLI only: no public admin RPC. `docs/DesignDecisions.md:308-323`.
- 🔵 No Blazor UI component libraries: hard constraint. `docs/DesignDecisions.md:342-358`.
- 🔵 Lazy browse is wire-only: no lazy SQL / cache loading. `docs/DesignDecisions.md:365-376`, `docs/plans/2026-05-28-lazy-browse-design.md:30`.
- 🔵 No server-side / streaming browse search. `docs/plans/2026-05-28-lazy-browse-design.md:208`.
- 🔵 Alarm command surface is ack + query only: no Clear/Disable/Enable/Silence/Shelve/Inhibit; matches the MXAccess alarm-client set. `Worker/MxAccess/AlarmCommandHandler.cs`; shelve/suppress out of scope per `docs/AlarmClientDiscovery.md:60-66`.
- 🔵 Dashboard EventsHub has no per-session ACL: any authenticated dashboard user may subscribe to any session group. `Dashboard/Hubs/EventsHub.cs:36-50` (`TODO(per-session-acl)`); only relevant once a per-session role model exists.
3. MXAccess parity — open questions & vendor-gated items
Intentionally incomplete, awaiting a live capture or an AVEVA fix; raw payload/metadata is preserved in the meantime (no synthesis).
- 🟡 3.1 `OperationComplete` native trigger condition unknown: modeled and emitted only from the real event (no synthesis), but the runtime condition that fires it isn't captured. `docs/DesignDecisions.md:280-289`, `gateway.md:1094`, `docs/MxAccessWorkerInstanceDesign.md:341,366`.
- 🟡 3.2 `OnBufferedDataChange` multi-sample conversion unvalidated, STILL OPEN (residual after B8): `AddBufferedItem`/`SetBufferedUpdateInterval` are now implemented and live-confirmed (§1.1), and the live B8 test (`f7ada90`) confirms the worker receives and cleanly converts the empty `NoData` bootstrap `OnBufferedDataChange` (no crash, no dropped payload). But the rig's object logic does not drive a multi-sample buffered batch on demand (same limitation as the alarm rig), so a real parallel quality/timestamp sample array (length > 1) has never been observed live; it is exercised only by the B-bundle unit tests against a fake `IMxAccessServer`. Re-run `GatewaySession_WithLiveWorker_BufferedItem_*` against a fast-changing simulated tag to close this. `docs/DesignDecisions.md:291-297`.
- 🟡 3.3 Completion-only status → `MXSTATUS_PROXY[]` mapping unproven: completion-only operation-status bytes are kept as raw diagnostic metadata until the analysis proves an exact mapping. `docs/DesignDecisions.md:299-306`.
- 🟡 3.4 `AlarmAckByGUID` is `E_NOTIMPL` on this AVEVA build: it throws `NotImplementedException`; all acks route through `AlarmAckByName`. Proto/worker keep the path for forward-compat but it is dead today. `docs/AlarmClientDiscovery.md:750-763`.
- 🟡 3.5 8-arg `AlarmAckByName` v2 is a vendor stub (returns -55): the worker uses the 6-arg method; the 8-arg `domain`/`full_name` fields are carried for forward-compat only (see 1.4). `docs/AlarmClientDiscovery.md:743-748`.
- 🟡 3.6 Subtag degraded-mode fidelity limits: `category`, `description`, `alarm_type_name`, operator fields, and `retrigger` are not populated/synthesized in subtag fallback (no subtag exists for them). Documented, by design. `docs/AlarmClientDiscovery.md:913-931`, `docs/plans/2026-06-13-alarm-subtag-fallback-design.md:292-298`.
- 🟡 3.7 Subtag `Clear` transition unvalidated live: Raise/Ack/AckMsg are live-confirmed; Clear is externally undrivable on the rig (object logic owns alarm state). Environmental, not code. (`project_alarm_subtag_fallback`, `project_rig_alarms_object_driven`.)
4. Clients — gaps & cross-client parity
Library RPC surface is at full parity: all gateway + GalaxyRepository RPCs and the LazyBrowseNode helper exist in all five clients, with no TODO/stub/not-implemented markers in production code. The CLI/helper gaps below are RESOLVED.
| Capability | Dotnet | Go | Python | Rust | Java |
|---|---|---|---|---|---|
| `Write2` single session helper | ✅ | ✅ `849f1d2` | ✅ | ✅ | ✅ |
| `ping` CLI subcommand | ✅ | ✅ `90529dc` | ✅ | ✅ | ✅ `0d5b488` |
| `version` CLI subcommand | ✅ (already worked) | ✅ | ✅ | ✅ | ✅ |
| `galaxy-*` CLI commands (4) | ✅ | ✅ | ✅ `a211fae` | ✅ | ✅ |
| `galaxy-browse` / `BrowseChildren` CLI | ✅ | ✅ | ✅ | ✅ | ✅ (5/5) |

- ✅ 4.1 Go single `Write2` helper: RESOLVED (`849f1d2`). Added `Write2`/`Write2Raw` to `clients/go/mxgateway/session.go`, matching the other four clients' signature.
- ✅ 4.2 Python `galaxy-*` CLI commands: RESOLVED (`a211fae`, `a59fc99`). Added `galaxy-test-connection`/`galaxy-last-deploy`/`galaxy-discover`/`galaxy-watch` Click commands wrapping `galaxy.py`; README corrected. (Fixed a UTC-offset bug in last-deploy output during review.)
- ✅ 4.3 `ping` CLI added to Go + Java: RESOLVED (Go `90529dc`/`742ced7`, Java `0d5b488`).
- ✅ 4.4 `version` CLI in Dotnet: NOT MISSING (audit correction). The dotnet `version` subcommand already worked (`MxGatewayClientCli.cs:85` → `WriteVersion`, prints gateway/worker protocol versions). The original audit was wrong. Minor: unlike Go, dotnet's `version` omits a client-package-version line (`MxGatewayClientContractInfo` exposes only the two protocol versions); cosmetic, not tracked.
- ✅ 4.5 Galaxy CLI command-name divergence: RESOLVED (Java `0d5b488`). `galaxy-test-connection`/`galaxy-last-deploy` are now the canonical Java names, with `galaxy-test`/`galaxy-deploy-time` kept as deprecated picocli aliases so existing scripts don't break. (Rust keeps its `galaxy <subcommand>` group style, which is a clap structural choice, not a name divergence.)
- ✅ 4.6 `browse`/`BrowseChildren` CLI: RESOLVED, 0/5 → 5/5 (Rust `639e36b`, Go `8cb416b`, Python `39ec2a3`, dotnet `d7e2a8b`, Java `0d5b488`). All five emit the per-node JSON key `hasChildrenHint` (unified during review). Minor residual divergence: dotnet nests the Galaxy object fields under an `object` key while Go/Rust/Python/Java flatten them; both carry `hasChildrenHint` + a nested `children` array. Harmonizing the object nesting is a cosmetic follow-up, not tracked.
- ⚪ 4.7 No typed wrappers for the rarer commands: `AuthenticateUser`, `ArchestrAUserToId`, `AddBufferedItem`, `Suspend`/`Activate`, `GetSessionState`/`GetWorkerInfo`/`DrainEvents`/`ShutdownWorker` remain reachable via the generic `Invoke`/`invoke_raw` escape hatch in all five clients. This is consistent and deliberate: the worker-side commands are now implemented per §1.1, but no client adds dedicated typed wrappers (out of scope; the CLIs that needed them got `ping`/`browse` subcommands).
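
The canonical-name-plus-deprecated-alias approach from 4.5 can be sketched with Python's standard `argparse`. This is the same idea expressed in Python, not the Java picocli code; the `mxgw` program name and the `resolve` helper are assumptions made for illustration, while the command names are the ones listed in 4.5.

```python
import argparse

# Canonical command name -> deprecated aliases kept so old scripts keep working.
COMMANDS = {
    "galaxy-test-connection": ["galaxy-test"],
    "galaxy-last-deploy": ["galaxy-deploy-time"],
}

parser = argparse.ArgumentParser(prog="mxgw")  # program name is hypothetical
subparsers = parser.add_subparsers(dest="invoked", required=True)
for canonical, deprecated in COMMANDS.items():
    sub = subparsers.add_parser(canonical, aliases=deprecated)
    sub.set_defaults(canonical=canonical)  # remember the canonical name


def resolve(argv):
    """Return (canonical_name, used_deprecated_alias) for a command line."""
    args = parser.parse_args(argv)
    return args.canonical, args.invoked != args.canonical


assert resolve(["galaxy-test"]) == ("galaxy-test-connection", True)
assert resolve(["galaxy-last-deploy"]) == ("galaxy-last-deploy", False)
```

A CLI built this way could print a deprecation notice whenever the second element of the result is true, without changing what the command does.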
5. Verification gaps (code exists, unverified by default)
All live/integration paths are opt-in; the default unit suites do not exercise them.
- ⚪ Live MXAccess COM + STA + message pump: `Worker.Tests/MxAccess/MxAccessLiveComCreationTests.cs` (5 `[LiveMxAccessFact]`), gated by `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`.
- ⚪ Live gateway↔worker↔MXAccess round-trip: `IntegrationTests/WorkerLiveMxAccessSmokeTests.cs` (6 `[LiveMxAccessFact]`).
- ⚪ Live Galaxy Repository SQL: `IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs` (4 `[LiveGalaxyRepositoryFact]`), gated by `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1`.
- ⚪ Live LDAP dashboard auth: `IntegrationTests/DashboardLdapLiveTests.cs` (5 `[LiveLdapFact]`), gated by `MXGATEWAY_RUN_LIVE_LDAP_TESTS=1`.
- ⚪ Alarm runtime/discovery probes (dev-rig): `Worker.Tests/Probes/{WnWrapConsumerProbeTests,AlarmClientWmProbeTests}.cs`, `AlarmClientDiscoveryTests.cs`; hard `[Fact(Skip=...)]`.
- ⚪ Live alarm + subtag-fallback smoke: `Worker.Tests/Probes/{AlarmSubtagLiveSmokeTests,AlarmsLiveSmokeTests}.cs`; `Skip` + one `[LiveMxAccessFact]`; Clear path remains undrivable even when enabled.
- ⚪ Python loopback TLS: `clients/python/tests/test_tls.py:111-112`; gated by `MXGATEWAY_RUN_TLS_TESTS=1` + openssl; only cert-config parsing runs by default.
- ⚪ .NET client live browse smoke: `clients/dotnet/.../BrowseChildrenSmokeTests.cs:17-18`; hard `[Fact(Skip=...)]`.
- ⚪ Cross-language client↔gateway wire behavior: no per-client integration unit tests; only `scripts/run-client-e2e-tests.ps1` against a live gateway (`MXGATEWAY_INTEGRATION=1`). All client wire behavior is unverified in default unit runs.
No placeholder/empty/Assert.True(true) tests were found anywhere.
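
The opt-in gating pattern used throughout this section can be sketched in Python with the standard `unittest` module. Only the environment variable name `MXGATEWAY_RUN_TLS_TESTS` comes from the list above; the `live_tests_enabled` helper, the test class, and the exact-match-on-`"1"` rule are assumptions for illustration and are not taken from `test_tls.py`.

```python
import os
import shutil
import unittest


def live_tests_enabled(flag, environ=os.environ):
    """True only when the opt-in flag is set to exactly "1" (assumed rule)."""
    return environ.get(flag) == "1"


@unittest.skipUnless(
    live_tests_enabled("MXGATEWAY_RUN_TLS_TESTS"),
    "set MXGATEWAY_RUN_TLS_TESTS=1 to run loopback TLS tests",
)
class LoopbackTlsTests(unittest.TestCase):
    """Hypothetical live test class; skipped in default unit runs."""

    def test_openssl_is_available(self):
        # The live path also needs the openssl binary on PATH.
        self.assertIsNotNone(shutil.which("openssl"))


# Default runs leave the flag unset, so the whole class is reported as skipped.
assert not live_tests_enabled("MXGATEWAY_RUN_TLS_TESTS", {})
assert live_tests_enabled("MXGATEWAY_RUN_TLS_TESTS", {"MXGATEWAY_RUN_TLS_TESTS": "1"})
```

Gating at class level keeps the default suite green on machines without the live dependencies while still reporting the skipped tests, which is what makes the verification gap visible rather than silent.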
6. Config-gated functional gaps (work only after configuration)
- 🟠 6.1 Alarm ack in subtag mode requires `AckComment` subtag configured: empty by default; ack fails in subtag mode until set. Names must be validated against live MXAccess, not guessed. `docs/DesignDecisions.md:454-458`. (`AckCommentSubtag` is write-only; `Worker/MxAccess/SubtagAlarmStateMachine.cs:21`.)
- 🔵 6.2 Multi-subscriber: see §2 (option exists, validator-blocked).
7. Stale docs, dead code, accepted gaps
- 📄 7.1 D1 plan header stale: `docs/plans/2026-06-14-deferred-followups.md:4` still says "Plan only — NOT yet executed," but D1 is done (`Dashboard/DashboardSnapshotService.cs:198`, commit `4af24b9`). Update the plan status.
- 📄 7.2 `AlarmClientDiscovery.md` STA "production fix needed" prose is stale: `docs/AlarmClientDiscovery.md:765-774` reads as a pending follow-up, but alarms now run through the worker STA / `GatewayAlarmMonitor` (merged). Re-check against current code.
- 📄 7.3 EventsHub "publisher side is a follow-up" comment is stale: `Dashboard/Hubs/EventsHub.cs:9-17`; the `DashboardEventBroadcaster` exists, is DI-registered (`Dashboard/DashboardServiceCollectionExtensions.cs:47`), runs in the live loop (`Grpc/EventStreamService.cs:133`), and `SessionDetailsPage.razor` renders the feed.
- 📄 7.4 CLAUDE.md project-name drift: CLAUDE.md uses `src/MxGateway.Server`/`MxGateway.Tests`; the actual tree is `src/ZB.MOM.WW.MxGateway.*`. Misleads path-based work.
- ⚪ 7.5 Dead `MapSqlException` helper: `Grpc/GalaxyRepositoryGrpcService.cs:350-360`, IDE0051-suppressed, kept for a hypothetical direct-SQL path that doesn't exist.
- 7.6 Accepted code-review gaps (`Won't Fix`, by design):
  - `Client.Python-012`: `Session.invoke_raw` deliberately skips `ensure_mxaccess_success`, so an embedded MXAccess HRESULT failure surfaces silently (raw-parity inspection). `code-reviews/Client.Python/findings.md:290`.
  - `Contracts-003`: closed as not-a-defect. `code-reviews/Contracts/findings.md`.
  - (All 351 review findings are otherwise Resolved; none Open or Deferred.)
8. Deferred test-coverage follow-ups (noted in resolutions, never filed as findings)
- Java CLI bulk-subcommand coverage: 6 of 13 non-trivial subcommands untested: `read-bulk`, `write-bulk`, `write2-bulk`, `write-secured-bulk`, `write-secured2-bulk`, `bench-read-bulk` (plus `stream-events`, the four `galaxy-*`, `close-session`). `code-reviews/Client.Java/findings.md:495` (Client.Java-026).
- Per-session-ACL TODO at `Server/Dashboard/Hubs/EventsHub.cs` (`code-reviews/Server/findings.md:765`).
- Worker-Ready retry race noted at `code-reviews/Server/findings.md:611`.
- Duplicated `FakeWorkerProcess` harness flagged as a latent regression vector: `code-reviews/Tests/findings.md:463`.
Bottom line
Status after feat/stillpending-completion (2026-06-15): the net-new functionality is done — §1.1 (all 11 worker command kinds, COM half live-verified on the rig), §1.2 (audit CorrelationId, with a raw-string fallback for non-GUID clients), and the entire §4 client CLI/helper parity surface (Write2, ping, galaxy-*, galaxy-browse 5/5, name aliases). Doc hygiene §7 is done (0032d2d, bd46ba1). Zero .proto changes were required.
Still open (all deliberate or environment/vendor-gated):
- §1.3: `provider_switches` counter still only unit-tested; the dev rig can't drive a real alarmmgr→subtag failover, so live `reason`-tag coverage remains a recorded residual.
- §1.4 / §3.4 / §3.5: the AVEVA 8-arg `AlarmAckByName` is a vendor stub (−55) and `AlarmAckByGUID` is `E_NOTIMPL`; the `domain`/`full_name` fields stay forward-compat-only until AVEVA implements them.
- §3.2: buffered commands work and the empty bootstrap converts cleanly live, but a multi-sample buffered batch is undrivable on the rig (unit-tested only).
- §3.1 / §3.3 / §3.6 / §3.7: await live MXAccess captures.
- §2: deliberate v1 scope. §5: opt-in verification gates. §7.6: accepted `Won't Fix` review findings.
MXAccess event/data/value/write mapping, the Galaxy RPC surface, and now the full command surface are complete; no NotImplementedExceptions, stubbed RPC bodies, or empty tests remain in the production paths.