Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
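The packaging regression can be caught without running a build. As a hedged sketch in Python (the client is a Python package), a unit test could lint the parsed `[project]` table before `pip wheel .` is ever attempted. The function `license_field_problems` and the three-entry identifier allowlist are hypothetical stand-ins. A real check should consult the full SPDX license list, and only an actual wheel build proves that the backend accepts the metadata.

```python
import re

# Hypothetical, deliberately tiny allowlist; a real check needs the full SPDX list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause"}
OPERATORS = {"AND", "OR", "WITH"}
# Tokens are single parentheses or runs of non-space, non-paren characters.
TOKEN_RE = re.compile(r"[()]|[^\s()]+")


def license_field_problems(project: dict) -> list:
    """Return reasons to flag the `license` value of a parsed [project] table.

    Simplified PEP 639 rule: a string value must be an SPDX expression made of
    known identifiers or LicenseRef-* references joined by AND/OR/WITH, with
    optional parentheses. Expression structure (for example a dangling "AND")
    is not checked, and an exception name after WITH is reported like any
    other unknown token.
    """
    value = project.get("license")
    if value is None:
        return []  # omitting the field is accepted
    if not isinstance(value, str):
        return ["table-form license is deprecated by PEP 639; use an SPDX string"]
    if not value.strip():
        return ["license expression is empty"]
    problems = []
    for token in TOKEN_RE.findall(value):
        if token in ("(", ")") or token in OPERATORS:
            continue
        if token in KNOWN_SPDX_IDS or token.startswith("LicenseRef-"):
            continue
        problems.append(f"{token!r} is not a recognised SPDX identifier")
    return problems
```

In a pytest module, the input would come from `tomllib.loads(Path("pyproject.toml").read_text())["project"]` (Python 3.11+). A missing `license` passes, which matches the fix of dropping the invalid string and keeping the classifier.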
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions caps how long an in-flight command can
suppress the STA watchdog, so a truly stuck COM call still triggers
StaHung instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
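The two-window idea from IntegrationTests-017 can be sketched as a helper that waits until the event counter stops moving. This is an illustrative Python sketch, not the .NET test itself. `wait_for_stable_count`, its `window_s` and `max_windows` parameters, and the injectable `sleep` are all assumptions. The helper returns only after two consecutive windows see no new events, so events already in flight at UnAdvise time can land before the assertion.

```python
import time
from typing import Callable


def wait_for_stable_count(
    read_count: Callable[[], int],
    window_s: float,
    max_windows: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the event count once two consecutive windows add no events."""
    previous = read_count()
    quiet_windows = 0
    for _ in range(max_windows):
        sleep(window_s)
        current = read_count()
        if current == previous:
            quiet_windows += 1
            if quiet_windows == 2:
                return current  # stable across two full windows
        else:
            quiet_windows = 0  # late in-flight events arrived; restart the check
            previous = current
    raise AssertionError(f"event count still changing after {max_windows} windows")
```

A test would call it after UnAdvise and then assert that later reads still equal the returned baseline, instead of comparing against the pre-UnAdvise count.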
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc cleaned up and the missing
scope-resolver and state-machine tests added (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes the success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
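The Client.Java-023 fix addresses a signedness problem. Java's `long` is a signed 64-bit type, so a `worker_sequence` value at or above 2^63 prints as a negative number through `Long.toString`, but correctly through `Long.toUnsignedString`. The Python sketch below emulates both behaviours. It assumes `worker_sequence` is an unsigned 64-bit field on the wire, which is what the switch to the unsigned formatter implies. The helper names are hypothetical.

```python
MASK_64 = (1 << 64) - 1


def as_java_long(value: int) -> int:
    """Reinterpret the low 64 bits of value as a signed Java long."""
    value &= MASK_64
    return value - (1 << 64) if value >= (1 << 63) else value


def to_unsigned_string(java_long: int) -> str:
    """Emulate Long.toUnsignedString: read the same 64 bits as unsigned."""
    return str(java_long & MASK_64)
```

Python integers are unbounded, so the 64-bit mask stands in for Java's fixed-width arithmetic.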
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+19 -8

````diff
@@ -123,10 +123,14 @@ private static string ResolveCommandScope(MxCommandKind kind)
 return kind switch
 {
 MxCommandKind.Write or
-MxCommandKind.Write2 => GatewayScopes.InvokeWrite,
+MxCommandKind.Write2 or
+MxCommandKind.WriteBulk or
+MxCommandKind.Write2Bulk => GatewayScopes.InvokeWrite,
 
 MxCommandKind.WriteSecured or
 MxCommandKind.WriteSecured2 or
+MxCommandKind.WriteSecuredBulk or
+MxCommandKind.WriteSecured2Bulk or
 MxCommandKind.AuthenticateUser => GatewayScopes.InvokeSecure,
 
 MxCommandKind.ArchestraUserToId or
````
````diff
@@ -141,7 +145,7 @@ private static string ResolveCommandScope(MxCommandKind kind)
 }
 ```
 
-Reads (`Register`, `AddItem`, `Advise`, and any other unspecified kind) fall through to `InvokeRead`, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown.
+Reads (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any other unspecified kind) fall through to `InvokeRead`, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown. The four bulk-write families (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`) are mapped explicitly so a missing arm cannot silently demote a bulk write to a read scope.
 
 ## Constraint Enforcement
 
````
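The `ResolveCommandScope` change matters because a default arm makes omissions silent: any kind left out of the switch quietly becomes a read. Here is a hedged Python sketch of the same mapping, with a guard check that would have flagged the missing bulk-write arms. The enum lists only the kinds named in this document. The "name contains `Write`" heuristic is an illustrative assumption, not how the gateway's own tests are written.

```python
from enum import Enum, auto


class MxCommandKind(Enum):
    # Only the kinds named in Authorization.md; the real enum has more members.
    Register = auto()
    AddItem = auto()
    Advise = auto()
    ReadBulk = auto()
    Write = auto()
    Write2 = auto()
    WriteBulk = auto()
    Write2Bulk = auto()
    WriteSecured = auto()
    WriteSecured2 = auto()
    WriteSecuredBulk = auto()
    WriteSecured2Bulk = auto()
    AuthenticateUser = auto()


INVOKE_READ = "invoke:read"
INVOKE_WRITE = "invoke:write"
INVOKE_SECURE = "invoke:secure"

# Explicit arms, mirroring the switch expression in the documented snippet.
EXPLICIT_SCOPES = {
    MxCommandKind.Write: INVOKE_WRITE,
    MxCommandKind.Write2: INVOKE_WRITE,
    MxCommandKind.WriteBulk: INVOKE_WRITE,
    MxCommandKind.Write2Bulk: INVOKE_WRITE,
    MxCommandKind.WriteSecured: INVOKE_SECURE,
    MxCommandKind.WriteSecured2: INVOKE_SECURE,
    MxCommandKind.WriteSecuredBulk: INVOKE_SECURE,
    MxCommandKind.WriteSecured2Bulk: INVOKE_SECURE,
    MxCommandKind.AuthenticateUser: INVOKE_SECURE,
}


def resolve_command_scope(kind: MxCommandKind) -> str:
    # Any kind without an explicit arm falls through to the read scope.
    return EXPLICIT_SCOPES.get(kind, INVOKE_READ)


def write_kinds_demoted_to_read() -> list:
    """Guard check: no kind whose name contains "Write" may resolve to invoke:read."""
    return [
        kind
        for kind in MxCommandKind
        if "Write" in kind.name and resolve_command_scope(kind) == INVOKE_READ
    ]
```

If an arm such as `WriteBulk` were deleted from the mapping, the guard would return it instead of an empty list.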
````diff
@@ -174,11 +178,18 @@ page create dialog (see
 dashboard API Keys page also renders each key's effective constraints.
 
 The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`,
-`SubscribeBulk`, and `AdviseItemBulk`. It checks write constraints for
-`Write`, `Write2`, `WriteSecured`, and `WriteSecured2`. Successful item
-registrations are tracked per session so later item-handle commands resolve
-back to the original tag address. If a constrained key presents an unknown item
-handle, the gateway fails closed.
+`SubscribeBulk`, `AdviseItemBulk`, and `ReadBulk`. It checks write constraints
+for `Write`, `Write2`, `WriteSecured`, `WriteSecured2`, `WriteBulk`,
+`Write2Bulk`, `WriteSecuredBulk`, and `WriteSecured2Bulk`. Bulk commands run
+through `BulkConstraintPlan` (`ReadBulkConstraintPlan`,
+`WriteBulkConstraintPlan`, `SubscribeBulkConstraintPlan`), which preserves the
+caller's input order: each entry is evaluated against the constraint surface,
+and `BulkConstraintPlan.MergeDeniedInto` re-merges denied entries back into
+their original index positions so the reply slot at `entries[i]` always
+corresponds to the request slot at `entries[i]`. Successful item registrations
+are tracked per session so later item-handle commands resolve back to the
+original tag address. If a constrained key presents an unknown item handle,
+the gateway fails closed.
 
 Non-bulk constraint failures return gRPC `PermissionDenied`. Bulk read
 commands preserve input order and return a failed `SubscribeResult` for each
````
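The order-preserving contract described for `BulkConstraintPlan` can be illustrated with a small Python sketch. It splits the request into forwarded and denied positions, sends only the allowed entries to the worker, and then writes worker replies and synthetic denials back into their original slots. `plan_bulk`, `merge_denied_into`, and the reply-count check are hypothetical. They only mirror the described behaviour of `BulkConstraintPlan.MergeDeniedInto` and are not its actual signature.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class BulkPlan:
    allowed_indices: List[int] = field(default_factory=list)  # request slots forwarded to the worker
    denied_indices: List[int] = field(default_factory=list)   # request slots rejected by constraints
    forwarded: List[Any] = field(default_factory=list)        # allowed entries, in original order


def plan_bulk(entries: List[Any], is_allowed: Callable[[Any], bool]) -> BulkPlan:
    plan = BulkPlan()
    for index, entry in enumerate(entries):
        if is_allowed(entry):
            plan.allowed_indices.append(index)
            plan.forwarded.append(entry)
        else:
            plan.denied_indices.append(index)
    return plan


def merge_denied_into(
    plan: BulkPlan, worker_replies: List[Any], make_denied: Callable[[int], Any]
) -> List[Any]:
    # Hypothetical guard: the worker must answer every forwarded entry exactly once.
    if len(worker_replies) != len(plan.allowed_indices):
        raise ValueError(
            f"expected {len(plan.allowed_indices)} worker replies, got {len(worker_replies)}"
        )
    merged: List[Any] = [None] * (len(plan.allowed_indices) + len(plan.denied_indices))
    # Worker replies arrive in forwarded order, so zip them back onto their original slots.
    for index, reply in zip(plan.allowed_indices, worker_replies):
        merged[index] = reply
    # Denied slots get a synthetic failure result at the same position.
    for index in plan.denied_indices:
        merged[index] = make_denied(index)
    return merged
```

In this sketch, recording both index lists up front means the merge never has to re-run the constraint check to know where each reply belongs.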
````diff
@@ -195,7 +206,7 @@ blocking constraint; secured values and raw credentials are never logged.
 | `SessionOpen` | `session:open` | `OpenSessionRequest` |
 | `SessionClose` | `session:close` | `CloseSessionRequest` |
 | `EventsRead` | `events:read` | `StreamEventsRequest`, `QueryActiveAlarmsRequest`, `MxCommandKind.DrainEvents` |
-| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, and any kind not otherwise mapped) |
+| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any kind not otherwise mapped) |
 | `InvokeWrite` | `invoke:write` | `AcknowledgeAlarmRequest`, `MxCommandKind.Write`, `MxCommandKind.Write2`, `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk` |
 | `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.WriteSecuredBulk`, `MxCommandKind.WriteSecured2Bulk`, `MxCommandKind.AuthenticateUser` |
 | `MetadataRead` | `metadata:read` | `MxCommandKind.ArchestraUserToId`, `MxCommandKind.GetSessionState`, `MxCommandKind.GetWorkerInfo`, `GalaxyRepository.TestConnection`, `GalaxyRepository.GetLastDeployTime`, `GalaxyRepository.DiscoverHierarchy`, `GalaxyRepository.WatchDeployEvents` |
````
````diff
@@ -104,6 +104,36 @@ The test output includes session id, worker process id, command status,
 HRESULT/status diagnostics, event sequence and handles, close status, and worker
 stdout/stderr lines emitted during the run.
 
+## Dev-rig Probes
+
+`src/MxGateway.Worker.Tests/Probes/` partitions runtime probes from the regular
+Worker.Tests regression suite. The folder is its own
+`MxGateway.Worker.Tests.Probes` namespace so a discovery filter (e.g. `dotnet
+test --filter FullyQualifiedName~MxGateway.Worker.Tests.Probes`) can target or
+exclude them without enumerating individual class names. The probes are
+`[Fact(Skip = "...")]` by default and exist to characterize live AVEVA
+behavior on the dev rig, not to gate CI — flip `Skip = null` on the dev box
+with installed MXAccess + a running Galaxy provider when running them:
+
+- `AlarmsLiveSmokeTests` — end-to-end smoke for the alarms-over-gateway
+pipeline (`WnWrapAlarmConsumer` + `AlarmDispatcher` +
+`MxAccessAlarmEventSink`) against `\\<machine>\Galaxy!DEV` with the dev rig's
+10-second flip script writing `TestMachine_001.TestAlarm001`.
+- `AlarmClientWmProbeTests` — registers as an `AlarmClient` consumer on a real
+hidden message-only window and logs every Win32 message that arrives during
+a fixed pump window. Used to identify the `WM_APP` /
+`RegisterWindowMessage` IDs alarm callbacks use.
+- `WnWrapConsumerProbeTests` — instantiates AVEVA's standalone `wnwrapConsumer`
+COM class, subscribes to the dev rig's `\\<machine>\Galaxy!DEV` provider,
+and polls `GetXmlCurrentAlarms2`. The XML payload bypasses the
+`FILETIME→DateTime` auto-marshaling that crashes
+`aaAlarmManagedClient.AlarmClient.GetHighPriAlarm` on this rig.
+
+The probes share the Worker.Tests project (so they can use its `net48`/`x86`
+configuration and the installed `ArchestrA.MxAccess` / `aaAlarmManagedClient`
+references), but they are not part of the regression contract — a Worker.Tests
+run with `Skip` left in place passes them as skipped.
+
 ## Live Galaxy Repository
 
 `GalaxyRepositoryLiveTests` in `src/MxGateway.IntegrationTests/Galaxy/` exercises
````
````diff
@@ -672,6 +672,23 @@ heartbeat fields until dedicated thresholds own those warnings. The worker
 reports stale STA activity, but the gateway owns the final kill decision
 through its existing heartbeat and worker lifecycle policy.
 
+The in-flight-command suppression itself is bounded by
+`WorkerPipeSessionOptions.HeartbeatStuckCeiling` (default 75 seconds = 5 ×
+`HeartbeatGrace`). The motivating case for the suppression is a legitimately
+slow synchronous command — but a genuinely stuck COM call (for example
+against a dead MXAccess provider whose cross-apartment marshaler is
+permanently blocked, or a write completion that never fires) leaves
+`CurrentCommandCorrelationId` non-empty indefinitely. Without an upper bound
+the worker-side `StaHung` watchdog would be permanently defeated for that
+session and only the gateway's per-command timeout would catch the hang —
+losing the worker-originated diagnostic (`StaHung` fault category, the
+stale-by interval) from the gateway audit trail. Once `LastStaActivityUtc`
+has been stale for longer than `HeartbeatStuckCeiling`, the watchdog fires
+`StaHung` regardless of whether a command is in flight, on the assumption
+that no legitimate STA command should run that long without periodically
+refreshing activity. Deployments that legitimately run very long bulk
+operations should raise the ceiling rather than disable it.
+
 ## Shutdown
 
 Graceful shutdown sequence:
````
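The `HeartbeatStuckCeiling` rule can be written as a small decision function. This Python sketch makes two assumptions. First, the ordinary `StaHung` threshold is `HeartbeatGrace` (15 seconds, inferred from the documented 75 s = 5 × `HeartbeatGrace` default). Second, a non-empty `CurrentCommandCorrelationId` means a command is in flight. The function name and signature are hypothetical, and the real worker may compute staleness differently.

```python
from datetime import datetime, timedelta

# Assumed base threshold: 15 s, inferred from "75 s = 5 x HeartbeatGrace".
HEARTBEAT_GRACE = timedelta(seconds=15)
# Documented default for WorkerPipeSessionOptions.HeartbeatStuckCeiling.
HEARTBEAT_STUCK_CEILING = 5 * HEARTBEAT_GRACE


def should_report_sta_hung(
    now: datetime,
    last_sta_activity_utc: datetime,
    current_command_correlation_id: str,
    grace: timedelta = HEARTBEAT_GRACE,
    ceiling: timedelta = HEARTBEAT_STUCK_CEILING,
) -> bool:
    stale_by = now - last_sta_activity_utc
    if stale_by <= grace:
        return False  # STA activity is recent enough; nothing to report
    if current_command_correlation_id and stale_by <= ceiling:
        return False  # a command is in flight: suppress, but only up to the ceiling
    return True  # stale with no command, or stale past the ceiling regardless
```

Raising `ceiling` for deployments with very long bulk operations keeps the watchdog armed, whereas skipping the ceiling check would restore the unbounded suppression.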