Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
source; switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.
ReadBulk has no MXAccess analogue. The worker satisfies it by:
- Returning the last cached OnDataChange payload (was_cached=true)
when the requested tag is already in the session''s item registry
AND advised — the existing subscription is NOT touched, since the
caller did not create it.
- Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
and leaving the session exactly as it was. The wait pumps Windows
messages on the STA so the inbound MXAccess event can dispatch
while the executor still holds the thread.
The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.
Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.
All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.
Tests added:
- MxAccessValueCacheTests (6 cases) — Set/TryGet, Remove resets the
version, TryWaitForUpdate signals on Set, pump step fires each poll.
- MxAccessBaseEventSinkTests — OnDataChange populates the cache,
ValueCache property exposes the bound instance.
- MxAccessCommandExecutorTests — four bulk-write variants (per-entry
success/failure, value+timestamp forwarding, secured user ids),
ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
was_successful=false), invalid-payload reply.
- GatewayGrpcScopeResolverTests — five new MxCommandKind cases.
- SessionManagerTests — WriteBulk and ReadBulk forwarding through
FakeWorkerHarness; ReadBulk forwards timeout_ms.
- Per-client (.NET, Go, Rust, Python, Java) — WriteBulk builds the
right command and returns per-entry results, ReadBulk forwards the
timeout and unpacks the was_cached flag.
Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.
Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker.Tests-003: removed the wall-clock `Elapsed < 2s` assertion from
InvokeAsync_WakesIdlePumpForQueuedCommand; the awaited completion against a
30s idle period already proves the wake event drove dispatch.
Worker.Tests-004: MxAccessStaSession.Dispose now joins the alarm poll task
after cancelling the CTS (consistent with ShutdownGracefullyAsync), and
Dispose_StopsAlarmPollLoop asserts deterministically instead of via Task.Delay.
Worker.Tests-005: undisposed MemoryStream instances across the frame-protocol
and pipe-session tests are now `using` declarations.
Worker.Tests-006: Dispose_StopsAlarmPollLoop now constructs MxAccessStaSession
with `using` so a failed assertion cannot leak the STA poll loop.
Worker.Tests-007: docs/WorkerFrameProtocol.md verification section corrected
to target MxGateway.Worker.Tests / MxGateway.Worker with -p:Platform=x86.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker-004: post-watchdog-fault heartbeats reported a non-faulted state.
ReportWatchdogFaultIfNeededAsync now sets _state = Faulted before writing
the StaHung fault.
Worker-005 (re-triaged): the cited OnPoll site was removed by Worker-001;
the real silent-failure bug was in MxAccessStaSession.RunAlarmPollLoopAsync,
which caught only graceful-stop exceptions. A failing PollOnce now records a
WorkerFault on the event queue instead of vanishing on a non-awaited task.
Worker-006: RunAsync's finally skipped runtime disposal when shutdown timed
out, leaking the STA thread and COM object. It now always disposes
(MxAccessStaSession.Dispose is idempotent and bounded).
Worker-007 (re-triaged): replaced MxAccessComServer's Type.InvokeMember
reflection fallback with an IMxAccessServer fast path plus typed
ILMXProxyServer* casts; a non-conforming object now fails fast.
Worker-008: alarm consumer STA affinity was unenforced. MxAccessStaSession
records the alarm consumer's STA thread id and asserts every PollOnce runs
on it via a unit-testable guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gap 1 — WorkerPipeSession now passes `eq => new AlarmCommandHandler(eq)` as
the alarmCommandHandlerFactory in all three places it constructs
MxAccessStaSession (two convenience constructors and InitializeMxAccessAsync).
Previously the parameterless MxAccessStaSession() set the factory to null,
so every SubscribeAlarms / AcknowledgeAlarm / QueryActiveAlarms command
returned "alarm consumer not configured" in a deployed worker.
- Added internal `MxAccessStaSession(Func<MxAccessEventQueue, IAlarmCommandHandler>?)`
constructor that builds all defaults but accepts a factory.
- Added public `MxAccessStaSession(StaRuntime, factory, eventQueue, alarmFactory?)`
4-arg overload to complete the constructor chain.
Gap 2 — WnWrapAlarmConsumer now disables its internal threadpool Timer
(pollIntervalMilliseconds=0 in the default constructor). MxAccessStaSession
starts a `RunAlarmPollLoopAsync` background task that sleeps off-STA then
calls `staRuntime.InvokeAsync(() => handler.PollOnce())` at 500ms intervals.
This satisfies the ThreadingModel=Apartment requirement of wwAlarmConsumerClass:
every GetXmlCurrentAlarms2 call now runs on the worker's STA.
- Added `PollOnce()` to `IMxAccessAlarmConsumer`, `AlarmDispatcher`,
`IAlarmCommandHandler`, and `AlarmCommandHandler`.
- Poll loop cancelled and awaited before alarm handler disposal in both
ShutdownGracefullyAsync and Dispose.
Tests: 4 new tests in MxAccessStaSessionTests verify that
- SubscribeAlarms reaches the handler when the factory is wired (Gap 1)
- SubscribeAlarms returns InvalidRequest without a factory (regression guard)
- PollOnce is called on the STA thread within 3s (Gap 2)
- The poll loop stops after Dispose (Gap 2 lifecycle)
All fake IMxAccessAlarmConsumer / IAlarmCommandHandler test implementations
updated with no-op PollOnce() to satisfy the new interface member.
Worker tests: 199 passed / 1 pre-existing failure / 4 skipped (was 195/1/4).
Server tests: 308 passed / 0 failures (unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the worker-side IPC surface for the alarm subsystem so the gateway
can drive the AlarmDispatcher across the named-pipe boundary. Adds four
proto MxCommandKind values + matching command messages and two
MxCommandReply payload variants:
- SubscribeAlarmsCommand(subscription_expression)
- UnsubscribeAlarmsCommand
- AcknowledgeAlarmCommand(alarm_guid, comment, operator_user/node/domain/full_name)
- QueryActiveAlarmsCommand(alarm_filter_prefix)
- AcknowledgeAlarmReplyPayload(native_status)
- QueryActiveAlarmsReplyPayload(repeated ActiveAlarmSnapshot snapshots)
Worker plumbing:
- New IAlarmCommandHandler interface + AlarmCommandHandler production
impl. Lazy-creates an AlarmDispatcher (with a wnwrap-backed consumer
by default) on the first SubscribeAlarms; routes Acknowledge / QueryActive /
Unsubscribe through it. Idempotent under repeated Unsubscribe; rejects
a second Subscribe without an intervening Unsubscribe; cleans up the
consumer if the underlying Subscribe call throws.
- MxAccessCommandExecutor: 4 new switch arms map MxCommandKind values to
IAlarmCommandHandler calls. Acknowledge surfaces the AVEVA native
status into both MxCommandReply.Hresult and the dedicated
AcknowledgeAlarmReplyPayload.NativeStatus so gateway-side consumers
can echo it without unpacking the outer envelope. Invalid GUIDs and
missing payloads return InvalidRequest; handler exceptions return
MxaccessFailure with the exception message in DiagnosticMessage.
- MxAccessStaSession: new constructor overload accepts an
alarmCommandHandlerFactory; it's invoked on the STA thread during
StartAsync and the resulting handler is passed into the executor.
ShutdownGracefullyAsync + Dispose tear it down on the STA before the
data-side cleanup runs.
Tests: 20 new unit tests covering AlarmCommandHandler lazy lifecycle
(Subscribe/Unsubscribe/Acknowledge/Query/Dispose, error paths) and the
executor's 4 alarm switch arms (OK/InvalidRequest/MxaccessFailure paths,
hresult propagation, prefix filtering). Worker test suite total: 192
passed / 3 skipped (live probes) / 1 pre-existing structure-test fail
(untouched).
Deferred to next slice: gateway-side WorkerAlarmRpcDispatcher that
replaces NotWiredAlarmRpcDispatcher, builds + sends these commands across
the IPC, and unwraps the resulting MxCommandReply into AcknowledgeAlarmReply
/ ActiveAlarmSnapshot stream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>