1aafd6bde4
Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
255 lines
38 KiB
Markdown
255 lines
38 KiB
Markdown
# Code Review — Contracts
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Module | `src/MxGateway.Contracts` |
|
|
| Reviewer | Claude Code |
|
|
| Review date | 2026-05-20 |
|
|
| Commit reviewed | `a020350` |
|
|
| Status | Reviewed |
|
|
| Open findings | 0 |
|
|
|
|
## Checklist coverage
|
|
|
|
This re-review covers the Contracts module at `a020350`, after Contracts-009 through Contracts-013 (plus Client.Rust-013's proto comment reformat on `ReadBulkCommand`) were resolved against `1cd51bb`. The Contracts source under review is unchanged from `1cd51bb` apart from the documentation-only updates introduced by `a020350`; the pass re-checks every category on the bulk write/read family, the alarm reply surface, and the GalaxyRepository contract that were the target of those resolutions.
|
|
|
|
| # | Category | Result |
|
|
|---|---|---|
|
|
| 1 | Correctness & logic bugs | Bulk command kinds, `BulkWriteResult`, and `BulkReadResult` align with the worker executor (`MxAccessSession.ReadBulk` / `ExecuteBulkWriteEntry`), the gateway server-side filter (`MxAccessGatewayService.ReplaceWriteBulkEntries`), the validator (`MxAccessGrpcRequestValidator.ExpectedPayload`, covering every new kind), and the round-trip tests added under Contracts-010. Field numbering across all three protos remains additive and contiguous — `MxCommand.payload` 10-43 + 100-104, `MxCommandReply.payload` 20-40 + 100-102, `MxCommandKind` 0-34 + 100-104, `WorkerEnvelope.body` 10-20 — with no number reused or repurposed. No new functional bugs. |
|
|
| 2 | mxaccessgw conventions | Wire-compatibility policy comment blocks (Contracts-005 resolution) are present at the top of all three `.proto` files and the bulk additions honour them — every change since the prior review is additive. Generated code under `Generated/` is untouched. Naming, `snake_case` field names, `PascalCase` messages, enum-prefix discipline, oneof usage for command/reply/value/event/envelope, and the credential-sensitivity comments per the ProtobufStyleGuide are all consistent. No new violations. |
|
|
| 3 | Concurrency & thread safety | N/A — pure contract definitions plus a static constants class (`GatewayContractInfo`) with no shared mutable state. |
|
|
| 4 | Error handling & resilience | `BulkWriteResult` carries `was_successful` + `optional int32 hresult` + `repeated MxStatusProxy statuses` + `error_message` per entry; `BulkReadResult` carries `was_successful` + `was_cached` + per-entry `value`/`quality`/`source_timestamp`/statuses/`error_message`. The deliberate absence of `hresult` on `BulkReadResult` is pinned by `ProtobufContractRoundTripTests.BulkReadReply_RoundTripsCachedAndSnapshotResults` (descriptor assertion) and matches the documented "ReadBulk outcomes are timeout / cache / lifecycle states, not MXAccess COM return codes" rationale. The `AcknowledgeAlarmReply.status` reservation comment (Contracts-008) and the by-name ack reuse comment (Contracts-002) keep ack outcome handling unambiguous. No new issues. |
|
|
| 5 | Security | The single-item and bulk `WriteSecured` / `WriteSecured2` paths now carry the credential-sensitivity comment on both the outer command (`WriteSecuredBulkCommand` / `WriteSecured2BulkCommand`) and each entry's `value` field (Contracts-011 resolution). `AuthenticateUserCommand.verify_user_password` carries the same redaction note. No new secret-leak surfaces. |
|
|
| 6 | Performance & resource management | `ReadBulk` is still the only command without a 1:1 MXAccess analogue; the per-tag `timeout_ms` cap and `was_cached` short-circuit prevent disturbing existing subscriptions. `BulkWriteReply` / `BulkReadReply` are flat repeated lists with no nested pagination machinery, matching the "one round-trip per batch" Bulk Command Family decision. No bloat issues. |
|
|
| 7 | Design-document adherence | `gateway.md`, `docs/Contracts.md` (Contracts-009 resolution), `docs/DesignDecisions.md` (Bulk Command Family), and `docs/AlarmClientDiscovery.md` (Contracts-002 / Contracts-008 resolutions) describe the contracts now in force. The `MX_COMMAND_KIND_WRITE2_BULK` / `MX_COMMAND_KIND_WRITE_SECURED2_BULK` enum-value names use the `<BASE>2_BULK` suffix order while the public reply oneof case names use `Write2Bulk` / `WriteSecured2Bulk` (the `2` precedes `Bulk` in PascalCase); both match the corresponding command-message names — no design-doc divergence. The proto comment on `BulkWriteResult` describes a "gateway's tag-allowlist filter" that does not exist by that name in source or docs — see new finding Contracts-014. |
|
|
| 8 | Code organization & conventions | Package / namespace / file layout correct; `csharp_namespace` options remain consistent; the worker proto continues to import `mxaccess_gateway.proto` rather than duplicate the command/reply/event/value/status surface. Additive-only contract evolution observed; field numbers continuous and isolated by 100+ from diagnostic/control commands. No new issues. |
|
|
| 9 | Testing coverage | `ProtobufContractRoundTripTests` now exercises all five new bulk write/read commands, both new reply types (with `HasHresult == true` / `HasHresult == false` arms for the proto3 optional, and a descriptor-level assertion that `BulkReadResult` has no `hresult` field), every new `MxCommandReply.payload` oneof case (parameterised `[Theory]`), and the existing alarm / Galaxy / worker-envelope cases. `GatewayContractInfoTests` pins the `GatewayProtocolVersion = 3` constant for both the alarm and bulk write/read additions. No new gaps observed at the contracts surface. |
|
|
| 10 | Documentation & comments | The bulk additions all carry per-message documentation comments (`WriteBulkCommand`, `Write2BulkCommand`, `WriteSecuredBulkCommand`, `WriteSecured2BulkCommand`, `ReadBulkCommand`) and per-field credential-sensitivity comments on `WriteSecured*BulkEntry.value`. `GalaxyAttribute.mx_data_type` / `data_type_name` / `mx_attribute_category` / `security_classification` carry the parity-detail comments added under Contracts-012. Two residual gaps remain — the misleading "tag-allowlist filter" wording on `BulkWriteResult` (new finding Contracts-014), and the absence of a comment on `BulkReadResult.value` / `quality` / `source_timestamp` / `statuses` describing what they carry when `was_successful = false` (new finding Contracts-015). |
|
|
|
|
## Findings
|
|
|
|
### Contracts-001
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `docs/Grpc.md:13` (and `:3`, `:32`, `:39`) |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `mxaccess_gateway.proto` now declares six RPCs on `MxAccessGateway` (`OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, `QueryActiveAlarms`). `docs/Grpc.md` still describes "the four `MxAccessGateway` RPCs" in its type table and omits `AcknowledgeAlarm`/`QueryActiveAlarms` from the Validation Rules table. CLAUDE.md requires docs to change in the same commit as the contract; the alarm RPC commits left this doc stale and misleading about the public surface.
|
|
|
|
**Recommendation:** Update `docs/Grpc.md` to enumerate all six RPCs and add `AcknowledgeAlarm`/`QueryActiveAlarms` to the type/handler and validation tables, or explicitly cross-reference `AlarmClientDiscovery.md`.
|
|
|
|
**Resolution:** _(2026-05-18)_ Confirmed against `mxaccess_gateway.proto` — six RPCs declared, doc said "four". Updated `docs/Grpc.md`: the collaborator table now says "six `MxAccessGateway` RPCs", the RPC Handlers intro enumerates all six, added dedicated `AcknowledgeAlarm` and `QueryActiveAlarms` handler subsections (noting the alarm surface routes through `IAlarmRpcDispatcher` and is validated inline rather than via `MxAccessGrpcRequestValidator`, with a cross-reference to `AlarmClientDiscovery.md`), and added both alarm RPCs to the Validation Rules table.
|
|
|
|
### Contracts-002
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Error handling & resilience |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:384-385`, `:95` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `MxCommandKind` includes `MX_COMMAND_KIND_ACKNOWLEDGE_ALARM_BY_NAME = 29` and `MxCommand.payload` carries `AcknowledgeAlarmByNameCommand acknowledge_alarm_by_name_command = 38`, but `MxCommandReply.payload` has only `acknowledge_alarm = 34` and `query_active_alarms = 35` — there is no by-name reply case. The by-name ack must reuse `AcknowledgeAlarmReplyPayload` or rely on the top-level `hresult`. The command/reply payload asymmetry is undocumented and easy to dispatch incorrectly.
|
|
|
|
**Recommendation:** Either add an explicit comment to `MxCommandReply` stating that by-name ack reuses the `acknowledge_alarm` payload case, or add a dedicated payload case for symmetry, and document the chosen contract in `docs/Contracts.md` / `AlarmClientDiscovery.md`.
|
|
|
|
**Resolution:** _(2026-05-18)_ Verified against both the `.proto` and the dispatch code. The asymmetry is intentional and the code is correct: the worker's `MxAccessCommandExecutor.ExecuteAcknowledgeAlarmByName` builds `reply.AcknowledgeAlarm = new AcknowledgeAlarmReplyPayload { NativeStatus = rc }` — deliberately reusing the `acknowledge_alarm` payload case — and the gateway's `WorkerAlarmRpcDispatcher.AcknowledgeAsync` only reads the top-level `hresult`/`protocol_status`, so both ack arms work. The gap was documentation only. Took the finding's preferred option (a) — comment-only, no wire-format or generated-type change: added explicit comments to the `acknowledge_alarm` reply-payload case and to the `AcknowledgeAlarmReplyPayload` message in `mxaccess_gateway.proto` stating both ack kinds reuse this case and consumers must dispatch on `MxCommandReply.kind`, and documented the contract in `docs/AlarmClientDiscovery.md` section 4. Added regression test `ProtobufContractRoundTripTests.MxCommandReply_AcknowledgeAlarmByName_ReusesAcknowledgeAlarmPayloadCase` pinning the by-name-ack → `acknowledge_alarm` reuse and asserting no by-name-specific reply oneof case exists.
|
|
|
|
### Contracts-003
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Code organization & conventions |
|
|
| Location | `src/MxGateway.Contracts/MxGateway.Contracts.csproj:10` |
|
|
| Status | Won't Fix |
|
|
|
|
**Description:** The `<Protobuf>` item for `mxaccess_worker.proto` omits `ProtoRoot="Protos"`, while the items for `mxaccess_gateway.proto` (line 9) and `galaxy_repository.proto` (line 11) both set it. `mxaccess_worker.proto` does `import "mxaccess_gateway.proto"`, which resolves only because Grpc.Tools adds the importing file's own directory to the proto path. The inconsistency is fragile — tooling changes to ProtoRoot handling could break import resolution.
|
|
|
|
**Recommendation:** Add `ProtoRoot="Protos"` to the `mxaccess_worker.proto` `<Protobuf>` item so all three entries are consistent.
|
|
|
|
**Resolution:** _(2026-05-18)_ Re-triaged as not-a-defect: the finding's premise is factually wrong. Line 10 of `MxGateway.Contracts.csproj` already carries `ProtoRoot="Protos"` — all three `<Protobuf>` items are already consistent. `git show 6c64030:src/MxGateway.Contracts/MxGateway.Contracts.csproj` (the reviewed commit) confirms the attribute was present at review time too; the csproj has not been touched since `133c830`. No code change made. Status set to Won't Fix because there is nothing to fix.
|
|
|
|
### Contracts-004
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.Contracts/GatewayContractInfo.cs:3-6` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML summary says the class exposes version metadata "before generated protobuf contracts are introduced." Generated protobuf contracts have long been introduced and are consumed across the solution. The comment is stale; the class now holds the authoritative `GatewayProtocolVersion`/`WorkerProtocolVersion` advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` framing.
|
|
|
|
**Recommendation:** Reword the summary to describe the current purpose — version constants advertised in `OpenSessionReply` and used to validate `WorkerEnvelope` protocol framing.
|
|
|
|
**Resolution:** _(2026-05-18)_ Confirmed stale — the class is consumed by `GatewayApplication`/`OpenSessionReply` and `WorkerEnvelope` framing checks across the solution. Reworded the XML summary on `GatewayContractInfo` to describe the actual current purpose: `GatewayProtocolVersion` is advertised to clients in `OpenSessionReply`, and `WorkerProtocolVersion` validates `WorkerEnvelope` protocol framing on the gateway↔worker pipe.
|
|
|
|
### Contracts-005
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | mxaccessgw conventions |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`, `src/MxGateway.Contracts/Protos/mxaccess_worker.proto` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The ProtobufStyleGuide mandates reserving removed field numbers / enum values. Evolution to date has been purely additive, so this is not a current violation — but none of the `.proto` files contain any `reserved` declarations, leaving no in-file guardrail for the first removal. This is a latent maintainability gap.
|
|
|
|
**Recommendation:** When any field or enum value is eventually removed, add a `reserved` range/name in the same change. Consider a short comment block in each message documenting the policy so future editors apply `reserved` rather than reusing tags.
|
|
|
|
**Resolution:** _(2026-05-18)_ Confirmed: no field or enum value has ever been removed, so adding `reserved` ranges now would be incorrect (there are no retired tags to reserve, and inventing ranges for never-used numbers would itself violate the contract). Took the finding's least-invasive option — added a short wire-compatibility policy comment block at the top of all three `.proto` files (`mxaccess_gateway.proto`, `mxaccess_worker.proto`, `galaxy_repository.proto`) stating the additive-only rule and instructing future editors to add a `reserved` range + name in the same change as any removal. Comment-only, no wire-format or generated-type change. The `reserved` declarations themselves remain correctly deferred to the first actual removal.
|
|
|
|
### Contracts-006
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Correctness & logic bugs |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:647` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `MxStatusProxy.success` is declared `int32 success = 1` with no comment. The name reads like a boolean flag but the type is a 32-bit integer (mirroring MXAccess `MXSTATUS_PROXY`, which stores a numeric success/HResult-like value). Without a comment a client author can reasonably misinterpret the field (treat non-1 as failure, or expect only 0/1).
|
|
|
|
**Recommendation:** Add a comment clarifying the semantic — what range of values it carries and how 0 vs non-zero map to MXAccess status — per the style guide rule to comment fields carrying raw MXAccess status detail.
|
|
|
|
**Resolution:** _(2026-05-18)_ Confirmed: `int32 success = 1` had no comment. Cross-checked against the worker `MxStatusProxyConverter`, which reads the COM struct's `success` field verbatim (a 16-bit signed value) without reinterpretation, and against the MXAccess analysis (`MXAccess-Public-API.md`: `MxStatus`/`MXSTATUS_PROXY` are identical structs with a `short success` member). Added a field comment to `MxStatusProxy.success` stating it mirrors the COM struct's numeric `success` member (NOT a boolean), is carried verbatim for diagnostics, and that clients should branch on `category` (`MX_STATUS_CATEGORY_OK` marks success) — deliberately avoiding an over-specified 0-vs-1 claim, since the gateway never maps `success` to an outcome and `category` is the authoritative field. Comment-only change.
|
|
|
|
### Contracts-007
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `ProtobufContractRoundTripTests` covers gateway command/reply/event, alarm transition, alarm ack request/reply, active-alarm snapshot, and the worker envelope. It has no coverage for: (a) any `galaxy_repository.proto` message (`DiscoverHierarchy*`, `GalaxyObject`, `GalaxyAttribute`, `DeployEvent`, the `root` oneof, wrapper-typed fields); (b) `BulkSubscribeReply`/`SubscribeResult` and the bulk command kinds; (c) `MxValue`/`MxArray` `raw_value`/`RawArray` (`bytes`) paths and the `WorkerFault`/`WorkerHeartbeat` IPC bodies.
|
|
|
|
**Recommendation:** Add round-trip tests for the Galaxy Repository messages (including the `root` oneof and proto wrapper fields), the bulk-subscribe reply, and the remaining `WorkerEnvelope` body cases.
|
|
|
|
**Resolution:** _(2026-05-18)_ Confirmed the listed gaps and added round-trip tests to `ProtobufContractRoundTripTests` covering all three areas: (a) Galaxy Repository — `GalaxyRepositoryDescriptor_ContainsBrowseServiceMethods`, `DiscoverHierarchyRequest_RoundTripsRootOneofAndWrapperFields` (a `[Theory]` exercising all three `root` oneof arms plus the `Int32Value` wrapper `max_depth`), `DiscoverHierarchyReply_RoundTripsObjectAndAttributeGraph`, `DeployEvent_RoundTripsTimestampAndCounters`, `GalaxyConnectionReplies_RoundTrip`; (b) `BulkSubscribeReply_RoundTripsSubscribeResults` and `MxCommandReply_RoundTripsBulkSubscribePayload` (bulk-subscribe command kind + payload case); (c) `MxValue_RoundTripsRawValueBytesPayload`, `MxArray_RoundTripsRawArrayPayload`, `WorkerEnvelope_RoundTripsWorkerFaultBody`, `WorkerEnvelope_RoundTripsWorkerHeartbeatBody`. All new tests pass; the full `ProtobufContractRoundTripTests` class is 27 tests green.
|
|
|
|
### Contracts-008
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Design-document adherence |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:451-459`, `:627-636` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The worker-side `AcknowledgeAlarmReplyPayload` carries the alarm-ack outcome as `int32 native_status`, while the public `AcknowledgeAlarmReply` carries it as `MxStatusProxy status` plus `optional int32 hresult`. The comment explains the worker echoes `native_status` into `AcknowledgeAlarmReply.hresult`, but the two outcome shapes (raw `int32` vs structured `MxStatusProxy`) are not reconciled in `docs/Contracts.md` / `AlarmClientDiscovery.md`. A reader cannot tell whether `MxStatusProxy status` is always populated or only on COM-layer failure.
|
|
|
|
**Recommendation:** Document in `docs/Contracts.md` (or `AlarmClientDiscovery.md`) how the worker `native_status` maps onto the public reply's `status`/`hresult` pair so client authors know which field is authoritative.
|
|
|
|
**Resolution:** _(2026-05-18)_ Verified against `WorkerAlarmRpcDispatcher.AcknowledgeAsync`. The asymmetry is larger than the finding implies: the dispatcher copies the worker `MxCommandReply.hresult` into `AcknowledgeAlarmReply.hresult` but **never** assigns `AcknowledgeAlarmReply.status` — the `MxStatusProxy status` field is left UNSET on every reply. The proto comment on `status` ("Native MxAccess status describing the outcome of the ack") was therefore actively misleading. Fixed: (1) reworded the `mxaccess_gateway.proto` comments on `AcknowledgeAlarmReply.hresult` (now identifies it as the authoritative native-return-code field) and `AcknowledgeAlarmReply.status` (now states it is reserved/unset and clients must not depend on it); (2) extended `docs/AlarmClientDiscovery.md` section 4 with a "Worker `native_status` → public `AcknowledgeAlarmReply` mapping" subsection spelling out that `hresult` is authoritative (`0` = success) and `status` is always unset, and that clients should branch on `protocol_status` then `hresult`, never `status`.
|
|
|
|
### Contracts-009
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Medium |
|
|
| Category | Design-document adherence |
|
|
| Location | `docs/Contracts.md:13-24` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Commit `5e375f6` ("Add bulk read/write command family across worker, gateway, and clients") added five new command kinds — `WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk` — plus the `BulkWriteReply` / `BulkWriteResult` and `BulkReadReply` / `BulkReadResult` shapes to `mxaccess_gateway.proto`. `gateway.md` (lines 299-322) was updated in that commit, but `docs/Contracts.md` was not. It still describes only the older bulk subscription family (`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`, `SubscribeBulk`, `UnsubscribeBulk`) returning `BulkSubscribeReply` with no mention of the bulk write/read commands or their per-entry result types. The CLAUDE.md rule "Update docs in the same change as the source. When public APIs, contracts, configuration, build steps, security behavior, event shapes, value conversion, status mapping, or lifecycle rules change, the affected docs … must change in the same commit" was violated for this addition. The result is that the canonical contracts document undercounts the public bulk surface by five commands.
|
|
|
|
**Recommendation:** Extend the bulk-commands paragraph in `docs/Contracts.md` to list the new `WriteBulk` / `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk` / `ReadBulk` command kinds, the per-entry request shape (`WriteBulkEntry` etc.), and the new reply types (`BulkWriteReply` carrying `BulkWriteResult`; `BulkReadReply` carrying `BulkReadResult`). Cross-reference `gateway.md` for the cached-vs-snapshot `ReadBulk` lifecycle and `docs/DesignDecisions.md` "Bulk Command Family" for the per-entry-result rationale rather than re-stating those details.
|
|
|
|
**Resolution:** _(2026-05-20)_ Confirmed `docs/Contracts.md` documented only the older bulk subscription family and never mentioned the bulk write/read additions from commit `5e375f6`. Cross-checked against `mxaccess_gateway.proto` (`MxCommand.payload` cases 39-43, `MxCommandKind` 30-34, the `Write*BulkCommand` / `Write*BulkEntry` shapes, `ReadBulkCommand` with `tag_addresses` + `timeout_ms`, `MxCommandReply.payload` cases 36-40, and the `BulkWriteReply`/`BulkWriteResult` + `BulkReadReply`/`BulkReadResult` messages). Extended the "Files" section of `docs/Contracts.md` with a new paragraph listing the five command kinds, the per-entry request shape for each `Write*Bulk` family (with the credential-sensitive redaction rule carried through to `WriteSecuredBulkEntry`/`WriteSecured2BulkEntry`), the `BulkWriteReply` + `BulkWriteResult` reply (including the `optional int32 hresult` field and the no-raise per-entry failure contract), and the `ReadBulkCommand` → `BulkReadReply` + `BulkReadResult` reply with the cached-vs-snapshot dual-mode semantics and the deliberate absence of `hresult` on `BulkReadResult`. Cross-references to `gateway.md` (lifecycle + scopes) and `docs/DesignDecisions.md` "Bulk Command Family" (rationale) added rather than re-stating those details.
|
|
|
|
### Contracts-010
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Testing coverage |
|
|
| Location | `src/MxGateway.Tests/Contracts/ProtobufContractRoundTripTests.cs` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** Contracts-007 (closed 2026-05-18) added Galaxy Repository, bulk-subscribe, `MxValue.raw_value` / `MxArray.raw_values`, and `WorkerFault`/`WorkerHeartbeat` round-trip coverage. The bulk write/read messages added in commit `5e375f6` were never given equivalent coverage. `ProtobufContractRoundTripTests` has no test that exercises any of: `WriteBulkCommand` / `Write2BulkCommand` / `WriteSecuredBulkCommand` / `WriteSecured2BulkCommand` / `ReadBulkCommand`; `BulkWriteReply` / `BulkWriteResult`; `BulkReadReply` / `BulkReadResult`; the new `MxCommandReply.payload` oneof cases (`write_bulk`, `write2_bulk`, `write_secured_bulk`, `write_secured2_bulk`, `read_bulk`). The asymmetry that `BulkWriteResult` carries `hresult` and `BulkReadResult` does not, and the `optional int32 hresult` semantics on `BulkWriteResult`, are exactly the kind of wire-shape details prior contract tests have been written to pin.
|
|
|
|
**Recommendation:** Add `ProtobufContractRoundTripTests` cases mirroring the existing `BulkSubscribeReply_RoundTripsSubscribeResults` / `MxCommandReply_RoundTripsBulkSubscribePayload` pattern: at minimum one round-trip per new request-side message (`WriteBulkCommand` covers the entry-list case; one secured variant proves the credential-sensitive shape; `ReadBulkCommand` covers `timeout_ms`), one round-trip for each new reply payload (`BulkWriteReply` carrying `BulkWriteResult` with `hresult` set + unset to exercise the proto3 `optional` presence; `BulkReadReply` carrying a `was_cached = true` and a `was_cached = false` entry), and at least one `MxCommandReply` test pinning a new payload-oneof case (e.g. `MxCommandReply.PayloadCase == PayloadOneofCase.ReadBulk` for `MxCommandKind.ReadBulk`).
|
|
|
|
**Resolution:** _(2026-05-20)_ Added round-trip tests in `ProtobufContractRoundTripTests` covering every gap listed: per-request `WriteBulkCommand_RoundTripsEntries`, `Write2BulkCommand_RoundTripsEntriesWithTimestampValue`, `WriteSecuredBulkCommand_RoundTripsCredentialBearingEntries`, `WriteSecured2BulkCommand_RoundTripsCredentialBearingEntriesWithTimestamp`, `ReadBulkCommand_RoundTripsTagAddressesAndTimeout`; per-reply `BulkWriteReply_RoundTripsResultsWithOptionalHresultPresence` (asserts both `HasHresult == true` and `HasHresult == false` arms of the proto3 `optional int32 hresult`) and `BulkReadReply_RoundTripsCachedAndSnapshotResults` (covers `was_cached = true`, `was_cached = false`, and a per-entry failure with `error_message`; additionally pins the deliberate absence of an `hresult` field on `BulkReadResult` via the descriptor); and `MxCommandReply` oneof-case pinning via `MxCommandReply_RoundTripsBulkWritePayloadCases` (a `[Theory]` exercising the four bulk-write payload-oneof cases) plus `MxCommandReply_RoundTripsReadBulkPayload`. All new tests pass; the full `ProtobufContractRoundTripTests` + `GatewayContractInfoTests` filter is 42 tests green.
|
|
|
|
### Contracts-011
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Security |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:392-397`, `:406-412` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The single-item `WriteSecuredCommand` (line 234-242) and `WriteSecured2Command` (line 244-253) put the credential-sensitivity redaction note on the `value` field directly ("Credential-sensitive write value. Implementations must not log this field unless an explicit redacted value-logging path is enabled."). The bulk equivalents move the note to the outer message instead — `WriteSecuredBulkCommand` (line 383-386) and `WriteSecured2BulkCommand` (line 399-400) carry it as a header comment — and the inner `WriteSecuredBulkEntry.value` (line 396) and `WriteSecured2BulkEntry.value` (line 410) are left without per-field comments. A future editor reading just `WriteSecuredBulkEntry` to add a new field or change the entry shape will not see the redaction rule. The ProtobufStyleGuide explicitly requires "Mark credential-bearing request fields clearly in comments"; the single-item path follows that rule, the bulk path does not.
|
|
|
|
**Recommendation:** Add per-field credential-sensitivity comments to `WriteSecuredBulkEntry.value` and `WriteSecured2BulkEntry.value` matching the wording on `WriteSecuredCommand.value` / `WriteSecured2Command.value`. Comment-only change with no wire-format or generated-type impact.
|
|
|
|
**Resolution:** _(2026-05-20)_ Added per-field credential-sensitivity comments to `WriteSecuredBulkEntry.value` and `WriteSecured2BulkEntry.value` in `mxaccess_gateway.proto`, mirroring verbatim the wording carried on `WriteSecuredCommand.value` / `WriteSecured2Command.value` ("Credential-sensitive write value. Implementations must not log this field unless an explicit redacted value-logging path is enabled."). The outer-message header redaction comment on `WriteSecuredBulkCommand` / `WriteSecured2BulkCommand` is retained so the rule is visible at both scopes. Comment-only change; no wire-format or generated-type impact (the `MxGateway.Contracts` build is clean against the regenerated code).
|
|
|
|
### Contracts-012
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.Contracts/Protos/galaxy_repository.proto:120` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `GalaxyAttribute.mx_data_type` is declared as `int32` with no in-proto comment. The field carries the raw Galaxy SQL DB type identifier (from `dbo.data_type`), which deliberately does NOT correspond to the public `MxDataType` enum in `mxaccess_gateway.proto`; `docs/Contracts.md` calls this out ("The service is metadata-only and does not share types with mxaccess_gateway.proto") and `docs/GalaxyRepository.md:190` documents the choice ("`mx_data_type` is returned as the raw Galaxy integer rather than mapped to a language-neutral enum"), but the proto file itself gives the reader no signal. A client author looking at the .proto without those docs is likely to assume the field is a `MxDataType` value and write a `(MxDataType)` cast that silently misclassifies most attributes. The ProtobufStyleGuide rule "Comment fields that carry MXAccess parity details, raw HRESULT/status information, or compatibility constraints" applies — this is exactly a parity-detail / compatibility-constraint field where the int32 has non-obvious semantics. The accompanying `data_type_name`, `mx_attribute_category`, and `security_classification` int fields share the same gap.
|
|
|
|
**Recommendation:** Add a short comment on `GalaxyAttribute.mx_data_type` (and ideally on `mx_attribute_category` and `security_classification`) clarifying that the value is a raw Galaxy SQL identifier passed through unchanged, NOT a member of the `mxaccess_gateway.v1.MxDataType` enum, with a pointer to `docs/GalaxyRepository.md`. Comment-only change; no wire-format impact.
|
|
|
|
**Resolution:** _(2026-05-20)_ Added in-proto comments to `GalaxyAttribute.mx_data_type`, `data_type_name`, `mx_attribute_category`, and `security_classification` in `galaxy_repository.proto`. The `mx_data_type` comment explicitly calls out that the value is a raw Galaxy SQL `dbo.data_type` identifier passed through unchanged, that it is NOT a member of `mxaccess_gateway.v1.MxDataType`, and that the two enumerations must not be cast or compared (closing the silent-misclassification trap the finding describes). The `data_type_name` comment clarifies it is free-form Galaxy text from the same table, not a stable enum. `mx_attribute_category` and `security_classification` comments mark them as raw Galaxy-specific identifiers not mapped to any gateway enum. All four comments cross-reference `docs/GalaxyRepository.md` for the rationale rather than restating it. Comment-only change; no wire-format impact.
|
|
|
|
### Contracts-013
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.Tests/Contracts/GatewayContractInfoTests.cs:14` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The XML summary on `GatewayContractInfoTests.GatewayProtocolVersion_IsVersionThree` reads "Verifies that the gateway protocol version is bumped to three after the alarm proto extension." That description is now incomplete: since the comment was written, the contract has been extended again (the bulk write/read command family in commit `5e375f6`) without a corresponding `GatewayProtocolVersion` bump. The test name says "IsVersionThree" but the summary attributes the value-of-3 to a single historical event (the alarm extension) — readers checking whether subsequent contract additions should have bumped the version will get a misleading rationale. This is the same class of stale-summary issue as Contracts-004 (`GatewayContractInfo` class summary), just relocated to the test that pins the constant.
|
|
|
|
**Recommendation:** Reword the summary to describe what the test pins (the current `GatewayProtocolVersion` constant equals 3) rather than narrating a specific historical bump, OR explicitly enumerate the alarm- and bulk-write/read additions covered under version 3 so readers know both extensions were additive and intentionally did not require a bump.
|
|
|
|
**Resolution:** _(2026-05-20)_ Reworded the XML summary on `GatewayContractInfoTests.GatewayProtocolVersion_IsVersionThree` to describe what the test actually pins: the current `GatewayProtocolVersion` constant equals 3, with both the alarm proto extension (`AcknowledgeAlarm` / `QueryActiveAlarms` RPCs, `OnAlarmTransitionEvent`, the alarm command/reply payload cases) AND the bulk write/read command family extension (`WriteBulk` / `Write2Bulk` / `WriteSecuredBulk` / `WriteSecured2Bulk` / `ReadBulk` with their `BulkWriteReply` / `BulkReadReply` payloads) shipping under version 3 as strictly additive changes that did not require a further bump. The new summary also instructs that a future breaking contract change should bump the constant and update the test in lock-step. Test logic is unchanged; the test still passes.
|
|
|
|
### Contracts-014
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:549-553` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** The `BulkWriteResult` header comment says `item_handle` mirrors the request entry "so callers can correlate inputs to outputs even when the gateway's tag-allowlist filter dropped some entries before reaching the worker." No "tag-allowlist filter" exists by that name anywhere in `src/`, `gateway.md`, `docs/`, or `docs/style-guides/` — a full-tree search returns matches only inside this proto comment and the prior-pass code-reviews. The real gateway-side bulk-write filter is `MxAccessGatewayService` calling `IConstraintEnforcer.CheckWriteHandleAsync` per entry (see `src/MxGateway.Server/Grpc/MxAccessGatewayService.cs:565-585` and `src/MxGateway.Server/Security/Authorization/IConstraintEnforcer.cs`); failures populate a synthetic `BulkWriteResult` with `was_successful = false` and the constraint's `ErrorMessage` is recorded via `constraintEnforcer.RecordDenialAsync`. The mechanism is a per-API-key constraint enforcer that can reject by handle (not a "tag" list), and the failure path covers any `ConstraintFailure` reason (write-handle scope, audit policy, etc.) — not a single inclusive tag allowlist. A future reader of the proto will search for "tag-allowlist" and find nothing, or worse, build a non-existent feature against the misleading name. The contract concept the comment is trying to communicate (item-level correlation matters because the gateway can drop entries before the worker sees them) is correct and worth keeping.
|
|
|
|
**Recommendation:** Reword the `BulkWriteResult` header comment to identify the actual mechanism — for example: "...so callers can correlate inputs to outputs even when the gateway's per-entry `IConstraintEnforcer.CheckWriteHandleAsync` filter (see `docs/Authorization.md`) dropped some entries before reaching the worker." Comment-only change with no wire-format impact.
|
|
|
|
**Resolution:** _(2026-05-20)_ Reworded the `BulkWriteResult` header comment in `mxaccess_gateway.proto` to identify the real gateway-side per-entry filter — `IConstraintEnforcer.CheckWriteHandleAsync` invoked by `MxAccessGatewayService.ReplaceWriteBulkEntries` — and cross-referenced `docs/Authorization.md` for the rationale. The contract concept (item-level correlation matters because the gateway can drop entries before the worker sees them) is preserved; the misleading "tag-allowlist filter" name is removed so future readers will not search for or build against a non-existent feature. The "Per-item failures populate `error_message` + `hresult` and never raise" sentence is retained verbatim. Comment-only change; `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` succeeded with 0 warnings / 0 errors on both `net48` and `net10.0`.
|
|
|
|
### Contracts-015
|
|
|
|
| Field | Value |
|
|
|---|---|
|
|
| Severity | Low |
|
|
| Category | Documentation & comments |
|
|
| Location | `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto:571-582` |
|
|
| Status | Resolved |
|
|
|
|
**Description:** `BulkReadResult` carries seven payload-bearing fields beyond the carrier flags — `value`, `quality`, `source_timestamp`, `statuses`, `error_message`, plus `item_handle` and `tag_address` — and the header comment only documents the `was_cached` arm. There is no in-proto statement of which fields carry data on `was_successful = true` versus `was_successful = false`. Cross-checked against the worker: `MxAccessSession.FailedRead` (line 940-956) populates only `ServerHandle`, `TagAddress`, `ItemHandle`, `WasSuccessful = false`, `WasCached`, and `ErrorMessage` — `value`, `quality`, `source_timestamp`, and `statuses` are all left at their proto3 defaults (null / 0 / null / empty). `SucceededRead` populates the value/quality/source_timestamp/statuses from the cached or snapshotted `OnDataChange`. A client reading `BulkReadResult` from the proto alone has no way to know that `value == null` and `quality == 0` on failure are deliberate "absent" markers rather than "value is null with quality bad" data — both interpretations are wire-equivalent. `BulkWriteResult` has the same shape gap for `statuses` / `hresult` on failed entries, but its header comment at least says "Per-item failures populate `error_message` + `hresult` and never raise"; `BulkReadResult` has no equivalent statement.
|
|
|
|
**Recommendation:** Extend the `BulkReadResult` header comment (or add per-field comments on `value` / `quality` / `source_timestamp` / `statuses` / `error_message`) to state explicitly which fields are populated on success and which are left at their proto3 defaults on failure — e.g. "On `was_successful = false`, only `server_handle`, `tag_address`, `item_handle` (when allocated), `was_cached`, and `error_message` are populated; `value`, `quality`, `source_timestamp`, and `statuses` are left at their proto3 defaults and must not be read as data." Comment-only change with no wire-format impact.
|
|
|
|
**Resolution:** _(2026-05-20)_ Extended the `BulkReadResult` header comment in `mxaccess_gateway.proto` with explicit per-arm documentation, mirroring the level of detail the existing `BulkWriteResult` header carries. On `was_successful = true` the comment now states `value` / `quality` / `source_timestamp` / `statuses` carry the read data (from the cached subscription or the snapshot lifecycle, depending on `was_cached`) and `error_message` is empty. On `was_successful = false` the comment lists exactly which fields are populated (`server_handle`, `tag_address`, `item_handle` when allocated, `was_cached`, `error_message`) and warns that `value` / `quality` / `source_timestamp` / `statuses` are left at their proto3 defaults and must not be read as data — explicitly noting they are wire-indistinguishable from "value is null with quality bad" data so a future reader cannot make that mistake. The comment also pins the deliberate absence of an `hresult` field on `BulkReadResult` (cross-referencing `docs/DesignDecisions.md` "Bulk Command Family" for the rationale) and the "Per-tag failures populate `error_message` and never raise" semantic that parallels `BulkWriteResult`. Comment-only change; `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` succeeded with 0 warnings / 0 errors on both `net48` and `net10.0`.
|