Browse renders the Galaxy hierarchy tree from IGalaxyHierarchyCache:
expandable areas/objects with attribute name, data type and the
alarm/historized flags, plus a name/reference filter. Right-click or
double-click an attribute to add it to a subscription panel that polls
live value, quality and source timestamp every two seconds.
Alarms lists the worker's currently-active alarm set via
IAlarmRpcDispatcher, defaulting to unacknowledged Active alarms with
filters for acknowledged alarms, area, severity range and text. It is
read-only and warns when alarm auto-subscribe is disabled.
Both tabs read live MXAccess data through a new singleton
DashboardLiveDataService that owns one shared, lazily-opened gateway
session (one worker) for the whole dashboard, re-opened transparently
if it faults or its lease expires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set MxGateway:Alarms with Enabled=true and the dev-rig subscription
expression \DESKTOP-6JL3KKO\Galaxy!DEV. Worker sessions now subscribe
their wnwrap alarm consumer on open; without this the QueryActiveAlarms
RPC returns an empty stream because no session is ever alarm-subscribed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both findings surfaced when running the cross-language e2e matrix
(scripts/run-client-e2e-tests.ps1) against the redeployed gateway at
commit 84d36b7. Filed in code-reviews/Server/findings.md and
code-reviews/Client.Dotnet/findings.md and fixed in the same change.
Server-030 (Medium / Error handling): GatewaySession.GetReadyWorkerClient
gated on `_state == Ready && _workerClient.State == Ready` but only
formatted `_state` into the SessionManagerException message. Under load
the gateway-driven `_state` and the worker-driven `WorkerClient.State`
can diverge, producing a self-contradictory diagnostic ("Session ... is
not ready. Current state is Ready."). The Java e2e client hit this on
the 56th item after 55 successful add-items. Rewrote the message to
include both states ("Session state is X; worker state is Y"), added
an XML doc explaining the two-state contract and that this branch is
the fail-fast for a divergence race, and added regression test
SessionManagerTests.InvokeAsync_WhenWorkerNotReadyButSessionReady_DiagnosticIncludesBothStates
that pins both states appear in the message. The deeper race (should
the gateway briefly wait for worker-Ready before failing?) remains
open as a follow-up.
Client.Dotnet-017 (Low / Error handling): stream-events CLI threw
OperationCanceledException as an unhandled exception when the user's
--timeout expired before --max-events was reached. Exit code
-532462766, no aggregate JSON. The other client CLIs (Go, Rust, Python,
Java) exit 0 in this case. Wrapped the `await foreach` in
`catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)`
so the supplied token's cancellation (--timeout, Ctrl+C, or parent
CTS) becomes graceful completion; the aggregate `{ "events": [...] }`
JSON still runs after the catch. Added regression test
RunAsync_StreamEvents_WhenTimeoutFiresAfterEvents_EmitsCollectedEventsAndExitsZero
backed by a new FakeCliClient.StreamHangAfterEvents hook that yields
the configured events then parks on the cancellation token.
Side cleanup: the GatewayApplicationTests test added under Server-020
was asserting an invariant (`/dashboard/dashboard/X` doesn't exist)
that I broke by reverting Server-020 in 84d36b7. The doubled endpoint
shapes do exist now (MapGroup("/dashboard") prefixing an already
"/dashboard/X" @page directive) but they're harmless — no client
requests `/dashboard/dashboard/X`. Replaced the test with a positive
assertion (`/dashboard/X` routes ARE registered) and rewrote the XML
doc to record the actual contract.
Verified: dotnet test src/MxGateway.Tests passes 480/480, dotnet test
clients/dotnet/MxGateway.Client.Tests passes 77/77, gateway redeployed
at this commit and GET http://localhost:5130/dashboard returns 200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-020 incorrectly removed the duplicate `@page "/dashboard/X"` directive
from each dashboard Razor page on the assumption that `MapGroup("/dashboard")`
would prepend the prefix to Blazor SSR route matching. It does not — Blazor's
`@page` template matcher operates on the full URL path, not relative to a
MapGroup. The removal left the dashboard returning HTTP 500 with
"Unable to find the provided template '/dashboard/'" from
RouteTableFactory.CreateEntry on every page.
Restored the eight `@page "/dashboard/X"` directives. The accompanying
regression test still passes (it asserts the genuinely-double-prefixed shape
`/dashboard/dashboard/X` never appears — it never did, since the original
duplicates were `"/"` + `"/dashboard/"`, not `"/dashboard/"` repeated). XML
doc on the test rewritten to record what was actually wrong with Server-020.
Verified: gateway redeploy + `GET http://localhost:5130/dashboard` returns
HTTP 200 7.3 KB; `/dashboard/sessions` also 200.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second re-review pass at commit a020350 caught 48 new findings — including
one High-severity regression I introduced in the prior sweep — and fixed
them all in one parallel wave.
High (1)
- Client.Python-018: prior sweep set `license = "Proprietary"` in
pyproject.toml. setuptools >= 77 enforces PEP 639 and rejects the
string (it must be a valid SPDX expression), so `pip wheel .` and
`pip install -e .` both fail before any source compiles. Tests
still pass because pytest bypasses the build backend via
`pythonpath`. Dropped the invalid license string, kept the
`License :: Other/Proprietary License` classifier, and added
`tests/test_packaging.py` so a future regression of the same shape
is caught in CI.
Mediums (6)
- Worker-023: `HeartbeatStuckCeiling` (default 75s = 5x HeartbeatGrace)
on WorkerPipeSessionOptions bounds the in-flight-command watchdog
suppression so a truly stuck COM call still triggers StaHung
instead of permanently defeating the watchdog.
- Client.Rust-018: reverted Rust's `latencyMs` split so the
cross-language bench comparison is apples-to-apples again;
`failureLatencyMs` kept as Rust-only enrichment.
- Client.Java-021: applied Client.Java-002's terminal-state
serialisation pattern to DeployEventStream so close() arriving
after queue-overflow can't erase the overflow exception.
- IntegrationTests-017: teardown-parity test now uses a two-window
stability check after UnAdvise instead of strict equality against
the pre-UnAdvise count (which raced against in-flight events).
- IntegrationTests-019: new RecordingTestOutputHelper wraps every
log sink the WriteSecured live test owns (worker stdout/stderr,
gateway logs, direct WriteLine) so the credential is proven
absent from the full output buffer, not just the diagnostic
message.
- Tests-020: added MxAccessGatewayServiceConstraintTests coverage
for the previously-uncovered Write2Bulk and WriteSecured2Bulk
arms of WriteBulkConstraintPlan.SetPayload.
Lows (41 — highlights)
- Server: Galaxy glob cache eviction is race-free (Server-024);
GalaxyRepositoryGrpcService takes IGalaxyRepository (Server-025);
AlarmsOptions validated at startup (Server-026); Authorization.md
Constraint Enforcement snippet/prose enumerate the bulk write/read
family (Server-027); bulk-read-commands and bulk-write-commands
capability tokens added to OpenSession (Server-029);
NotWiredAlarmRpcDispatcher XML doc and missing scope-resolver and
state-machine tests cleaned up (023, 028).
- Worker: AlarmCommandHandler now invokes the same STA-affinity
guard the poll path uses, at every command entry (Worker-024);
RunAsync null-checks the runtime-session factory result
(Worker-025).
- Worker.Tests: shared LiveMxAccessOptInVariableName lives on
GatewayContractInfo (Worker.Tests-025); MxAccessSession.CreateForTesting
rejects production sinks (Worker.Tests-026); FakeRuntimeSession's
CancelCommandReturnValue serialised under lock (Worker.Tests-027);
Probes namespace lifted to MxGateway.Worker.Tests.Probes
(Worker.Tests-029); cancel-envelope sequence numbers monotonised
(Worker.Tests-030); docs/GatewayTesting.md gains a "Dev-rig Probes"
section (Worker.Tests-028).
- Tests: ManualTimeProvider consolidated into one TestSupport/ copy
(Tests-021); SessionManagerBulkTests adds a mid-flight cancellation
test backed by a TaskCompletionSource fake (Tests-022); companion
FakeWorkerProcess.WaitForExitAsync no longer fakes its exit signal
(Tests-023); constraint plan reply-count divergence pinned
(Tests-024).
- IntegrationTests: TryGetSession chain carries [MaybeNullWhen(false)]
end-to-end (IntegrationTests-018); abnormal-exit keyword set
tightened to pipe-disconnected/end-of-stream and the test now
asserts streamTask.IsFaulted (020, 021).
- Client.Dotnet: bench commands added to isLongRunning so the
default 30s wall-clock budget doesn't kill them (015);
BenchStreamEventsAsync observes the inner stream task on every
exit path (016).
- Client.Go: parseValue wraps strconv errors with flag context and
%w (017); bench loops honour ctx.Done() (018); galaxy-watch parses
RFC3339Nano with fractional seconds (019); runStreamEvents installs
signal.NotifyContext like runGalaxyWatch (020); five new CLI-level
table-driven tests cover the bulk/bench subcommands (021).
- Client.Java: toCompletable Javadoc rewritten to match the actual
cancellation contract Client.Java-015 established (022); stream-events
text path uses Long.toUnsignedString for worker_sequence (023);
bench-read-bulk no longer pollutes success-latency histogram with
failure durations (024); --shutdown-timeout CLI option propagates
through to ClientOptions (025); seven new MxGatewayCliTests cover
the bulk and bench commands (026).
- Client.Python: mxgateway_cli ships its own py.typed marker (019);
wheel-build smoke test added under tests/test_packaging.py (020);
README documents the Galaxy CLI parity gap explicitly (021).
- Client.Rust: RustClientDesign.md signatures match session.rs and
document the AsRef<str> read_bulk genericism (019);
next_correlation_id re-exported at the crate root, with a
property-style doc contract and an explicit disclaimer that the
literal textual format is not part of the contract (020).
- Contracts: BulkWriteResult comment names the actual
IConstraintEnforcer mechanism instead of "tag-allowlist filter"
(014); BulkReadResult gains explicit per-arm payload-population
documentation for the success vs failure cases (015).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-reviewed every module/client against the 10-category checklist
(REVIEW-PROCESS.md) at commit 1cd51bb, filed 72 new findings, and
fixed them in three priority waves (3 High, 17 Medium, 52 Low).
Highs
- Server-017: enumerate AcknowledgeAlarm / QueryActiveAlarms in
GatewayGrpcScopeResolver so non-admin keys can use them; document
the mapping in docs/Authorization.md; add interceptor tests.
- Client.Java-013: add the five missing bulk-method stubs to the
CLI FakeSession so the test module compiles on a clean tree.
- Client.Rust-013: fix the clippy::doc_lazy_continuation regression
in generated tonic code by reformatting the ReadBulkCommand proto
comment and scoping a #![allow(...)] to the generated submodules.
Mediums (highlights)
- Server: unify GatewaySession state-lock discipline (-015) and
make DisposeAsync race-safe against in-flight CloseAsync (-016);
add constraint-enforcement test coverage for the bulk-plan path
(-021).
- Worker: introduce StaRuntimeShutdownException so RunAlarmPollLoop
can distinguish graceful shutdown from a real STA-affinity
violation (-016); have the watchdog skip StaHung while
CurrentCommandCorrelationId is non-empty so a legitimate slow
ReadBulk no longer self-faults (-017).
- Tests: add per-method round-trip + cancellation coverage for the
11 GatewaySession bulk methods (-013); replace the real TCP probe
in GalaxyHierarchyCacheTests with an IGalaxyRepository fake
(-016).
- IntegrationTests: drive the StreamEvents writer in the live Write
test and assert OnWriteComplete (-012); add live tests for
Unadvise/RemoveItem/Unregister ordering, WriteSecured, and
abnormal worker exit (-014).
- Worker.Tests: replace MxAccessSession reflection with an internal
CreateForTesting factory (-016); cover WorkerCancel and
unexpected-body envelope branches (-017).
- Client.Java: cancel MxEventStream when close() races
beforeStart() (-014); return a CancellingCompletableFuture that
actually forwards cancellation through .thenApply chains (-015).
- Client.Python: drop the silent localhost-plaintext downgrade in
the CLI; require explicit --plaintext (-013).
- Client.Rust: stop bench-read-bulk from polluting success-latency
histograms with failed-call durations (-015); add coverage for
the five MalformedReply paths, the bulk-write helpers, the
Error::Unavailable mapping, and the unary-fault path (-016).
- Contracts: extend docs/Contracts.md with the bulk read/write
command family (-009).
Lows (highlights)
- Server: cap GalaxyGlobMatcher.RegexCache; align
WorkerAlarmRpcDispatcher missing-session handling; drop the
duplicate dashboard @page routes; refresh IAlarmRpcDispatcher
XML doc.
- Worker: surface SetXmlAlarmQuery COM failures; remove dead
subscriptionExpression / ExecutingCommand arms; preserve
factory-supplied runtime sessions; split MxAlarmSnapshot.cs into
three files.
- Tests: dispose the WebApplication in seven test classes; rebuild
FakeWorkerProcess.WaitForExitAsync against a real TaskCompletion
source; switch the heartbeat-expires test to ManualTimeProvider;
add InvariantCulture to the remaining DateTimeOffset.Parse sites;
document GalaxyFilterInputSafetyTests in GatewayTesting.md.
- IntegrationTests: comment fixes, RecordingServerStreamWriter
IDisposable, class-level [Trait], single-source ZB default
connection string.
- Worker.Tests: replace silent-return gating with LiveMxAccessFact
so absent env vars SKIP not pass; PascalCase rename of probe
[Fact]s; deterministic deadline test; new frame-protocol error
tests; ComputeTransitions diff-coverage; relocate dev-rig probes
to Probes/.
- Contracts: add round-trip coverage and per-field redaction /
Galaxy-identifier comments to the protos.
- Client.Dotnet: introduce clients/dotnet/Directory.Build.props so
TreatWarningsAsErrors / analysers apply; document
DiscoverHierarchyOptions and IMxGatewayCliClient; require typed
bulk-read handles in CLI; surface AcknowledgeAlarm transport
faults through Translate().
- Client.Go: kill dead code in alarms_test / fakeGalaxyServer /
runWriteBulkVariant; document the six new subcommands in
writeUsage; drain galaxy-watch events on limit; switch io.EOF
comparisons to errors.Is.
- Client.Java: shared shutdown helpers + new shutdownTimeout
option; regex-based credential redaction; Long.toUnsignedString
for uint64 sequence; doc fixes.
- Client.Python: combine duplicate imports; add coverage for
_percentile / bench-read-bulk / MAX_AGGREGATE_EVENTS /
_api_key_from_env; populate pyproject metadata and ship py.typed.
- Client.Rust: expose next_correlation_id() so CLI ping/close
stop hard-coding correlation IDs; resync RustClientDesign.md
with the current Session / Error surface and CLI subcommand set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds five new MXAccess command kinds (WriteBulk, Write2Bulk,
WriteSecuredBulk, WriteSecured2Bulk, ReadBulk) that ride the existing
"one round-trip, per-entry results" bulk shape used by AddItemBulk and
SubscribeBulk today. MXAccess COM has no native bulk API; the worker
runs each bulk operation as a sequential loop on its STA, returning
one BulkWriteResult / BulkReadResult per requested entry so per-item
MXAccess failures surface as was_successful=false rather than throwing.
ReadBulk has no MXAccess analogue. The worker satisfies it by:
- Returning the last cached OnDataChange payload (was_cached=true)
when the requested tag is already in the session''s item registry
AND advised — the existing subscription is NOT touched, since the
caller did not create it.
- Otherwise taking the AddItem + Advise + wait-for-OnDataChange +
UnAdvise + RemoveItem snapshot lifecycle itself (was_cached=false)
and leaving the session exactly as it was. The wait pumps Windows
messages on the STA so the inbound MXAccess event can dispatch
while the executor still holds the thread.
The new MxAccessValueCache lives on each MxAccessSession, shared with
MxAccessBaseEventSink which populates it on every OnDataChange after
the event clears the outbound queue. Eviction on RemoveItem keeps
reused MXAccess handles from serving stale values from a previous
lifetime.
Gateway-side authorization wires WriteBulk/Write2Bulk to invoke:write,
WriteSecuredBulk/WriteSecured2Bulk to invoke:secure, ReadBulk to
invoke:read. The constraint-filter pipeline is refactored from a single
BulkConstraintPlan record into an abstract base plus three concretes
(SubscribeBulk, WriteBulk, ReadBulk), each owning its own denied-entry
merge so the dispatch site never branches on reply shape. A new
FilterWriteBulkAsync<TEntry> generic over the four write-entry shapes
runs CheckWriteHandleAsync per entry; denied entries surface as the
BulkWriteResult shape, preserving original-index order.
All five language clients (.NET, Go, Rust, Python, Java) gained the
five new methods following their existing bulk pattern, with regenerated
protobufs.
Tests added:
- MxAccessValueCacheTests (6 cases) — Set/TryGet, Remove resets the
version, TryWaitForUpdate signals on Set, pump step fires each poll.
- MxAccessBaseEventSinkTests — OnDataChange populates the cache,
ValueCache property exposes the bound instance.
- MxAccessCommandExecutorTests — four bulk-write variants (per-entry
success/failure, value+timestamp forwarding, secured user ids),
ReadBulk snapshot lifecycle on uncached tag (timeout surfaces as
was_successful=false), invalid-payload reply.
- GatewayGrpcScopeResolverTests — five new MxCommandKind cases.
- SessionManagerTests — WriteBulk and ReadBulk forwarding through
FakeWorkerHarness; ReadBulk forwards timeout_ms.
- Per-client (.NET, Go, Rust, Python, Java) — WriteBulk builds the
right command and returns per-entry results, ReadBulk forwards the
timeout and unpacks the was_cached flag.
Cross-language e2e CLI subcommands for the new bulks are deliberately
scoped out of this change (each of the five client CLIs would need
five new subcommands plus matching phases in
scripts/run-client-e2e-tests.ps1); coverage equivalent to the existing
bulk-subscribe coverage is provided by worker + gateway + per-client
unit tests.
Docs updated in the same commit: gateway.md (Public MXAccess Command
Surface), docs/DesignDecisions.md (new "Bulk Command Family" section
with the ReadBulk cache-then-snapshot rationale), and every client
README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-007: GalaxyHierarchyProjector re-filtered the whole hierarchy per
page (O(total) paging). It now memoizes the filtered list per cache-entry +
filter signature so subsequent pages are an O(pageSize) slice.
Server-008: WatchDeployEvents re-resolved browse subtrees and rebuilt globs
per streamed event. ResolveBrowseSubtrees is hoisted out of the loop and
GalaxyGlobMatcher caches compiled Regex instances per pattern.
Server-009: auth-store connections used no busy timeout or WAL. A new
OpenConnectionAsync applies journal_mode=WAL and a busy_timeout; all auth
call sites use it. docs/Authentication.md updated.
Server-010: the dashboard rendered Rotate/Revoke for revoked keys, where
Rotate silently reactivates them. ApiKeysPage now shows actions only for
Active keys. docs/Authentication.md updated.
Server-011: WorkerAlarmRpcDispatcher converted to a primary constructor and
brought in line with module conventions.
Server-012: CLAUDE.md corrected to the canonical *:* scope strings.
Server-013 (partly re-triaged): three named coverage gaps were already
closed; the genuine gap (WorkerExecutableValidator) is now covered.
Server-014: rewrote stale "alarm path not yet wired" comments in
MxAccessGatewayService to describe the production WorkerAlarmRpcDispatcher.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-002: the gateway never terminated leftover MxGateway.Worker.exe
processes at startup, contradicting gateway.md and CLAUDE.md. Added
IRunningProcessInspector/SystemRunningProcessInspector, OrphanWorkerTerminator,
and OrphanWorkerCleanupHostedService (best-effort, runs before sessions are
accepted); updated gateway.md to describe the implemented behavior.
Server-004: API-key scopes were persisted verbatim with no validation. Added
GatewayScopes.All/IsKnown; the CLI parser and dashboard create path now
reject unknown scope strings.
Server-005: a non-SqlException/InvalidOperationException fault on the initial
Galaxy hierarchy load faulted the BackgroundService. ExecuteAsync now catches
all non-cancellation exceptions on first load and RefreshCoreAsync broadens
its catch so the cache records Stale/Unavailable instead.
Server-006: OpenSessionAsync incremented the open-sessions gauge before
alarm auto-subscribe; an auto-subscribe failure leaked the gauge. The catch
path now calls SessionRemoved() when the gauge was incremented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes code-review findings Server-001 (Critical) and Server-003 (High).
Server-001: the dashboard Razor components were mapped with no
authorization policy, so every dashboard page — including the API Keys
page — was reachable unauthenticated. MapRazorComponents<App>() now
requires DashboardAuthenticationDefaults.AuthorizationPolicy;
unauthenticated requests are challenged by the cookie scheme and
redirected to the login page.
Server-003: DashboardAuthenticator.CreatePrincipal never issued the
'scope' claim that DashboardAuthorizationHandler checks when
Dashboard:RequireAdminScope is enabled, so enforcing the policy would
have denied every LDAP login. CreatePrincipal (reached only after the
required-group check passes) now emits the admin scope claim.
Replaces the GatewayApplicationTests case that asserted dashboard
routes allow anonymous access — it encoded the bug as expected
behavior — with tests that verify component routes require the policy
and the login/logout/denied endpoints allow anonymous.
All 309 MxGateway.Tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group the two double-width timestamp cards at the start of the metric
row so the deploy/refresh pair reads together, ahead of the count cards.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Last Deploy and Last Refresh metric cards hold full timestamps that
wrapped to three or four lines in a single-width card. Add a Wide option
to MetricCard (grid-column: span 2) and set it on both Galaxy timestamp
cards. Also switch .metric-value to overflow-wrap: break-word so a date
token is never split mid-value.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
overflow-wrap: anywhere let the table layout shrink a column below its
longest word, splitting tokens like "unconstrained" mid-word. Switch to
break-word so a word only breaks when it genuinely cannot fit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bootstrap 5 .btn no longer pins white-space, so the Rotate/Revoke action
buttons broke mid-word in the narrow API Keys actions column. Pin .btn
to white-space: nowrap so a label always reads as one word.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status chips wrapped to two lines in narrow table columns (e.g. the
session State column). Pin .chip to white-space: nowrap so an enumerated
state always reads as one token.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restyles the Blazor dashboard onto a portable token-based theme so it
reads like an instrument panel: warm-paper background, hairline-ruled
panels, IBM Plex type, monospace tabular numerics, and status carried by
colour chips. Vendors theme.css + IBM Plex fonts, rewrites dashboard.css
as a thin token-driven view layer, and swaps the Bootstrap navbar and
status badges for the design-system app bar and chips.
Also includes pending API-key management, Galaxy hierarchy projection,
and constraint-enforcement work with their tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the gap where the public AcknowledgeAlarm RPC required canonical
GUIDs but OnAlarmTransitionEvent.AlarmFullReference is "Provider!Group.Tag".
Adds an AVEVA AlarmAckByName path that wraps wwAlarmConsumerClass.AlarmAckByName
so callers can ack with the natural reference.
Proto:
- New MxCommandKind.AcknowledgeAlarmByName (=29).
- New AcknowledgeAlarmByNameCommand(alarm_name, provider_name, group_name,
comment, operator_user/node/domain/full_name) on MxCommand oneof.
- AcknowledgeAlarmReplyPayload (existing) carries the AVEVA native
status; reused for the by-name path.
Worker:
- IMxAccessAlarmConsumer + WnWrapAlarmConsumer + AlarmDispatcher +
AlarmCommandHandler all gain an AcknowledgeByName(name, provider,
group, comment, operator-identity) overload that maps to
wwAlarmConsumerClass.AlarmAckByName.
- MxAccessCommandExecutor: new switch arm routes
MxCommandKind.AcknowledgeAlarmByName to the handler. Empty alarm_name
yields InvalidRequest; handler exceptions surface as MxaccessFailure.
Gateway:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference: parses
"Provider!Group.Tag" with the convention that the FIRST '!' separates
provider, the FIRST '.' after '!' separates group; tag may contain
more dots.
- AcknowledgeAsync now branches: GUID input → AcknowledgeAlarm command
(existing path); reference input → AcknowledgeAlarmByName command
(new path); neither parses → InvalidRequest with a clear diagnostic.
Tests: 13 new unit tests cover each layer end-to-end:
- WorkerAlarmRpcDispatcher.TryParseAlarmReference (3 valid + 8 invalid
forms) including the realistic 4-component "Galaxy!TestArea.
TestMachine_001.TestAlarm001" reference.
- WorkerAlarmRpcDispatcher.AcknowledgeAsync routes references through
AcknowledgeAlarmByName + propagates the full operator tuple.
- Executor switch arm carries the by-name tuple and rejects empty
alarm_name.
- AlarmDispatcher.AcknowledgeByName forwards to consumer.
- Existing fakes extended for the new overload.
Counts: server 308/0, worker 195/3 skip / 1 pre-existing structure-fail
(untouched). Solution builds clean.
End-to-end alarms-over-gateway now serves the full lmxopcua flow:
client.AcknowledgeAlarm(reference="Galaxy!TestArea.TestMachine_001.TestAlarm001",
operator_user="alice") → gateway parses → IPC AcknowledgeAlarmByName →
worker AlarmAckByName → AVEVA history. The remaining piece for full
parity is a live dev-rig smoke test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the missing trigger that activates the worker's wnwrap consumer.
Without this, every session opened in OK state but the consumer never
started, so AcknowledgeAlarm/QueryActiveAlarms returned "alarm consumer
not configured" forever.
New AlarmsOptions config block (under MxGateway:Alarms):
- Enabled (default false): gates the auto-subscribe path so existing
deployments without alarm configuration are unaffected.
- SubscriptionExpression: explicit AVEVA expression like
\<machine>\Galaxy!<area>.
- DefaultArea: fallback used when SubscriptionExpression is empty;
composes \$(MachineName)\Galaxy!$(DefaultArea).
- RequireSubscribeOnOpen (default false): when true, an auto-subscribe
failure faults the session; when false, the failure is logged and
the session stays Ready (data subscriptions keep working, alarms
return "not subscribed" until the operator retries).
SessionManager.OpenSessionAsync gains a TryAutoSubscribeAlarmsAsync hook
that runs after MarkReady. Skips when alarms are disabled; otherwise
builds a SubscribeAlarmsCommand, invokes it on the session's worker
client, and either logs the resulting status or escalates per
RequireSubscribeOnOpen. SessionManagerException is the failure mode for
the strict path so callers in MxAccessGatewayService surface it as
session-open-failed.
Tests: 7 new unit tests cover the disabled lane, expression-driven
subscribe, DefaultArea fallback, success path, soft-failure (require
off), strict-failure (require on), and missing-config-strict-throw.
Server suite total: 295 pass / 0 fail. Solution builds clean.
End-to-end alarms-over-gateway path is now live (with config). Open a
session against a gateway with Alarms.Enabled=true + a valid
SubscriptionExpression; the worker's wnwrap consumer auto-subscribes;
QueryActiveAlarms streams snapshots; AcknowledgeAlarm acks by GUID.
Reference→GUID resolution (AlarmAckByName worker command) and the live
dev-rig smoke test remain follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces NotWiredAlarmRpcDispatcher in DI with a production
implementation that issues the new MxCommandKind.{AcknowledgeAlarm,
QueryActiveAlarms} commands across the IPC and unwraps the resulting
MxCommandReply into the public RPC types.
QueryActiveAlarms is fully wired: builds the QueryActiveAlarmsCommand
(forwarding alarm_filter_prefix), invokes it on the resolved
GatewaySession's worker client, and yields each ActiveAlarmSnapshot
from the QueryActiveAlarmsReplyPayload as the RPC stream. Worker
failures + missing sessions yield an empty stream — matches the
ConditionRefresh contract clients already speak to.
AcknowledgeAlarm is partially wired: the public RPC takes
AlarmFullReference (Provider!Group.Tag), but the worker's wnwrap
consumer acks by GUID. Strategy:
- If AlarmFullReference parses as a canonical GUID, forward it
directly through MxCommandKind.AcknowledgeAlarm. Native status
flows back via MxCommandReply.Hresult and the dedicated
AcknowledgeAlarmReplyPayload.NativeStatus.
- Otherwise, return InvalidRequest with a clear diagnostic naming the
follow-up — reference→GUID lookup needs a worker-side AlarmAckByName
command wrapping wwAlarmConsumerClass.AlarmAckByName.
DI: SessionServiceCollectionExtensions registers WorkerAlarmRpcDispatcher
as the default IAlarmRpcDispatcher; MxAccessGatewayService picks it up
via constructor injection. NotWiredAlarmRpcDispatcher is retained for
test fixtures that want the no-side-effect fake.
Tests: 7 new unit tests cover session-not-found short-circuit, GUID-vs-
reference branching, native-status propagation, worker MxaccessFailure
diagnostic propagation, and snapshot-stream yielding. Server test
suite total: 288/0 fail. Solution builds clean.
End-to-end alarms-over-gateway pipeline status:
consumer → sink → queue (A.2 + A.3 in-process slice)
worker IPC commands (A.3 worker slice)
gateway dispatcher (this slice)
Remaining for full E2E:
- Auto-issue SubscribeAlarms on session open (or add a public
SubscribeAlarms RPC). Without this trigger the consumer never
starts and Acknowledge/Query return "not subscribed".
- AlarmAckByName worker command for ack-by-reference.
- End-to-end live test against the dev rig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the inline diagnostic strings in PR A.3's AcknowledgeAlarm
+ QueryActiveAlarms handlers with an IAlarmRpcDispatcher seam.
- IAlarmRpcDispatcher (new) — gateway-side abstraction over the
worker-RPC path that fronts AlarmClient.AlarmAckByGUID and the
active-alarm walk. AcknowledgeAsync returns the
AcknowledgeAlarmReply directly; QueryActiveAlarmsAsync yields an
IAsyncEnumerable<ActiveAlarmSnapshot>.
- NotWiredAlarmRpcDispatcher (new, default impl) — returns
PROTOCOL_STATUS_OK with a structured worker-pending diagnostic
on Acknowledge, yields an empty stream on QueryActiveAlarms.
Same observable shape as PR A.3, but the integration seam is
now in code instead of hardcoded inside the handler.
- MxAccessGatewayService — handlers delegate to the dispatcher.
Constructor accepts an optional IAlarmRpcDispatcher (default
NotWiredAlarmRpcDispatcher); a future WorkerAlarmRpcDispatcher
registration in DI swaps in the live worker-IPC routing without
changing the public RPC surface.
- 2 new dispatcher tests pin the not-wired contract; 279 → 281
total tests, all green.
Worker-side dispatch (translating Acknowledge / QueryActiveAlarms
to the IPC method that calls IMxAccessAlarmConsumer from PR A.5)
is the dev-rig follow-up — it depends on validating the AVEVA
GetAlarmChangesCompleted event subscription against a live alarm
provider before pinning a wire format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the public RPC handler
surface that PR A.1's proto introduced. The actual worker-side
ack call + active-alarm walk depend on PR A.2 (worker MxAccess
subscription); this PR ensures clients can call the RPCs and
receive a meaningful response without UNIMPLEMENTED at the gRPC
layer.
- AcknowledgeAlarm — validates session_id + alarm_full_reference,
resolves the session (NotFound on miss), returns a successful
reply with a structured DiagnosticMessage indicating worker
dispatch is pending PR A.2. Once A.2 ships, the body translates
the request into a WorkerCommand and forwards through
SessionManager.InvokeAsync.
- QueryActiveAlarms — validates session_id, returns an empty
stream. PR A.4 layers the actual ConditionRefresh implementation
once the worker's QueryActiveAlarmsCommand is available.
- OpenSessionReply.Capabilities advertises both new RPCs
(unary-acknowledge-alarm, server-stream-active-alarms) so
clients can negotiate against the contract surface.
OnAlarmTransition events flow through the existing StreamEvents
path automatically — EventStreamService and MxAccessGrpcMapper
forward whatever family the worker emits without filtering, so
no changes are needed there for A.3.
Tests: full 273-test suite still green. Per-handler unit tests
ship with PR A.4's expanded surface; A.3's stub handlers are
narrow enough that the existing parity-fixture tests cover the
contract round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Resolve 14 conflicts from popping local stash on top of origin's
eed1e88 + 8d3352f doc-comment additions (11 mechanical, plus
version.rs, DashboardAuthenticatorTests.cs, DashboardGalaxyProjector.cs)
- Fix 4 test files that used AGENTS.md as the repo-root sentinel
(now use CLAUDE.md, since AGENTS.md was removed in 4731ab5)
- Redirect 10 doc citations from AGENTS.md to the matching gateway.md
sections (Value Model, Status Model, Security, STA Worker Thread
Model, gRPC Layer rule, cancellation rule)
Verified: solution build clean, x86 worker build clean, 266/266
gateway tests passing, 121/121 worker tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>