EventsHub publisher (closes the v2.1 follow-up flagged in the previous commit)
EventStreamService now mirrors every MxEvent it forwards to a gRPC client
into the `EventsHub` group for the session. The fan-out goes through a new
singleton `IDashboardEventBroadcaster`:
* IDashboardEventBroadcaster — abstraction so EventStreamService doesn't
take a direct dependency on SignalR.
* DashboardEventBroadcaster — singleton implementation that hands the
SendAsync to IHubContext<EventsHub> as fire-and-forget. Errors are
logged at debug and dropped so the source gRPC stream is never
blocked.
EventStreamService now takes IDashboardEventBroadcaster as a ctor parameter
and calls Publish(sessionId, publicEvent) once per event after sequence
filtering, before the bounded queue write. Test fixtures and the live
integration harness pass NullDashboardEventBroadcaster.Instance so the
broadcaster is a no-op in unit tests.
SessionDetailsPage adds a "Recent events" panel:
* implements IAsyncDisposable
* opens a second HubConnection via DashboardHubConnectionFactory targeting
/hubs/events
* calls SubscribeSession(SessionId) on Start
* renders the most recent 50 events in a small table (worker seq, family,
server/item handle, alarm reference when the event is OnAlarmTransition)
* shows a live/offline conn-pill driven by HubConnection.Closed /
Reconnected events
The dashboard mirror is intentionally passive — events appear only while a
gRPC client is also consuming that session's events. Documented as such in
the empty-state copy and in GatewayDashboardDesign.md.
Documentation refresh
Every doc that referenced the retired options (PathBase, RequireAdminScope,
RequiredGroup) and the old API-key-cookie auth flow is updated to describe
the new model:
* CLAUDE.md — Authentication section now explains LDAP bind +
GroupToRole + HubToken bearer flow.
* gateway.md — Dashboard section: root-mounted routes, snapshot/alarms/
events SignalR hubs, LDAP cookie + bearer scheme.
* docs/GatewayConfiguration.md — drop PathBase / RequireAdminScope rows,
add GroupToRole row, append "Authorization policies" and "SignalR hubs"
subsections describing the three policies and the /hubs/* endpoints.
* docs/GatewayDashboardDesign.md — hosting model (root mount, new
endpoint layout), Realtime Updates rewritten as a hub table
(DashboardSnapshotHub / AlarmsHub / EventsHub with producers, payloads,
and routing), Authentication And Authorization rewritten around LDAP +
role mapping + the hub bearer flow, Configuration block updated.
* docs/GatewayProcessDesign.md — security-section dashboard paragraph
and the example config block both refreshed to LDAP/role auth.
* docs/ImplementationPlanGateway.md — dashboard-auth deliverable list
updated (LDAP bind + GroupToRole + /hubs/token bearer mint replace the
API-key login flow).
* docs/GatewayTesting.md — DashboardLdapLiveTests blurb describes the
GroupToRole fixture (`{ GwAdmin: Admin }`) instead of the retired
RequiredGroup default; success-path assertion explains the role-claim
check.
Verification: 475 server tests, 275 worker tests (+ 9 dev-rig skips), 18
integration tests (live MxAccess + LDAP + Galaxy) all pass — including the
live worker smoke test fixture that now constructs EventStreamService with
the new broadcaster parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Gateway Testing
Gateway tests run without installed MXAccess by using fake workers, fake transports, and in-process gRPC service fakes. Live MXAccess verification belongs in opt-in integration tests because it depends on installed COM components and provider state.
Fake Worker Harness
FakeWorkerHarness in src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/Fakes/ provides an
in-process worker side for named-pipe IPC tests. It uses the same
WorkerFrameReader, WorkerFrameWriter, and WorkerEnvelope contract as the
gateway so tests exercise real frame validation and worker-client state changes.
Use the harness when a gateway or session test needs worker behavior without
starting ZB.MOM.WW.MxGateway.Worker.exe or loading MXAccess COM. The harness scripts:
- WorkerHello and WorkerReady startup,
- command replies with matching correlation ids,
- ordered WorkerEvent frames,
- WorkerHeartbeat frames,
- WorkerFault frames,
- shutdown acknowledgements,
- malformed protobuf payloads and oversized frame headers,
- slow or hung workers by withholding a reply.
Session-level tests can connect the harness to the pipe created by
SessionWorkerClientFactory with ConnectToGatewayPipeAsync. Lower-level
WorkerClient tests can use CreateConnectedPairAsync to create both pipe ends
inside the test.
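The oversized-frame-header case the harness scripts can be pictured with a small sketch. The wire format used here (a 4-byte little-endian length prefix followed by the payload) and the 1 MiB cap are assumptions made for illustration only; the actual WorkerFrameReader / WorkerFrameWriter contract is not described in this document, and read_frame / write_frame are hypothetical names.

```python
import io
import struct

MAX_FRAME_BYTES = 1024 * 1024  # hypothetical cap, not the gateway's real limit


class FrameError(Exception):
    """Raised when a frame header or payload is invalid."""


def read_frame(stream, max_bytes=MAX_FRAME_BYTES):
    """Read one length-prefixed frame; return None on clean end-of-stream."""
    header = stream.read(4)
    if len(header) == 0:
        return None  # peer closed between frames
    if len(header) < 4:
        raise FrameError("truncated frame header")
    (length,) = struct.unpack("<I", header)
    if length > max_bytes:
        # Reject before reading (or allocating for) the payload.
        raise FrameError(f"frame of {length} bytes exceeds limit {max_bytes}")
    payload = stream.read(length)
    if len(payload) < length:
        raise FrameError("truncated frame payload")
    return payload


def write_frame(stream, payload):
    """Write one frame: assumed 4-byte little-endian length, then payload."""
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)
```

A fake worker built along these lines could provoke the oversized-header path simply by writing a length prefix larger than the cap and nothing else.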
GatewayEndToEndFakeWorkerSmokeTests composes the real gRPC service,
SessionManager, SessionWorkerClientFactory, WorkerClient, and
EventStreamService with a scripted fake worker launcher. The smoke test covers
OpenSession, Register, AddItem, Advise, one streamed OnDataChange
event, and CloseSession without loading MXAccess COM.
Live MXAccess Smoke
WorkerLiveMxAccessSmokeTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/ composes the
real gRPC service, SessionManager, SessionWorkerClientFactory,
WorkerClient, WorkerProcessLauncher, and ZB.MOM.WW.MxGateway.Worker.exe. It is
skipped unless MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 is set because it creates
the installed MXAccess COM object and depends on live provider state.
The live smoke opens a gateway session, launches the x86 worker, runs
Register, AddItem, and Advise, waits a bounded time for the first
OnDataChange event (skipping any earlier bootstrap/registration-state event),
and closes the session in a finally block so the worker gets a graceful
shutdown request even when a command or event assertion fails. Cleanup failures
in that finally block are logged rather than thrown, so a real assertion
failure is never masked by a shutdown timeout.
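The cleanup rule described above (close in a finally block, log cleanup failures instead of throwing them) can be sketched as follows. This is a language-neutral illustration in Python, not the test's actual C# code; run_with_graceful_close and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("live_smoke_sketch")  # hypothetical logger name


def run_with_graceful_close(body, close_session):
    """Run body(), then always attempt close_session().

    A failure inside close_session() is logged and swallowed, so an
    exception raised by body() is the one the caller sees.
    """
    try:
        return body()
    finally:
        try:
            close_session()
        except Exception:
            logger.exception("session cleanup failed; continuing")
```

With this shape, a failed event assertion in body() still propagates even when the shutdown request itself times out.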
WorkerLiveMxAccessSmokeTests additionally covers five MXAccess parity paths the
fake-worker tests cannot validate:
- a Write round-trip against an advised item, asserting both that the reply is Ok / MxCommandKind.Write and that the worker emits a matching OnWriteComplete event for the targeted (server, item) handle pair (the same round-trip proof used by scripts/run-client-e2e-tests.ps1),
- an AddItem against an invalid server handle, asserting the MXAccess failure surfaces in the command reply without faulting the gateway transport,
- the UnAdvise → RemoveItem → Unregister teardown chain, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events arrive for the un-advised pair, and that a second RemoveItem against the freed handle relays a non-Ok MXAccess failure,
- a WriteSecured round-trip after AuthenticateUser, asserting the reply carries MxCommandKind.WriteSecured and the credential password never appears in the diagnostic message (parity for both the secured-write ordering rule and the "do not log secrets" contract), and
- an abnormal worker exit (the worker process is killed mid-session) where the gateway must transition the session to SessionState.Faulted with a non-empty fault description carrying a known worker-client classification (pipe disconnected / worker faulted / end-of-stream / heartbeat expired).
All six tests are gated by the same MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1
opt-in variable.
Build the worker before running the smoke:
dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86
Run the smoke explicitly:
$env:MXGATEWAY_RUN_LIVE_MXACCESS_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~WorkerLiveMxAccessSmokeTests
Optional live smoke variables:
| Variable | Default | Description |
|---|---|---|
| MXGATEWAY_LIVE_MXACCESS_WORKER_EXE | First existing ZB.MOM.WW.MxGateway.Worker.exe under src/ZB.MOM.WW.MxGateway.Worker/bin/... | Worker executable path. Set this when running against a packaged worker or a non-default build output. |
| MXGATEWAY_LIVE_MXACCESS_ITEM | TestChildObject.TestInt | MXAccess item reference used by AddItem. |
| MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME | ZB.MOM.WW.MxGateway.IntegrationTests | Client name passed to Register. |
| MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS | 15 | Maximum wait for the first OnDataChange (also used for the OnWriteComplete round-trip and the abnormal-exit fault transition). |
| MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER | admin | ArchestrA user name passed to AuthenticateUser before the WriteSecured parity step. |
| MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_PASSWORD | admin123 | Password paired with the user above. Never logged; the test asserts the value does not appear in the WriteSecured diagnostic message. |
The test output includes session id, worker process id, command status, HRESULT/status diagnostics, event sequence and handles, close status, and worker stdout/stderr lines emitted during the run.
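The opt-in gate and the optional variables listed above amount to "read the environment, fall back to a documented default". A minimal sketch of that pattern follows, assuming a plain mapping in place of the process environment. LiveSmokeSettings and load_live_smoke_settings are hypothetical names, and the real tests are C#, so this shows the shape only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveSmokeSettings:
    enabled: bool
    item: str
    client_name: str
    event_timeout_seconds: int


def load_live_smoke_settings(env):
    """Build settings from an environment mapping, applying documented defaults."""
    return LiveSmokeSettings(
        # The suite is skipped unless the opt-in variable is exactly "1".
        enabled=env.get("MXGATEWAY_RUN_LIVE_MXACCESS_TESTS") == "1",
        item=env.get("MXGATEWAY_LIVE_MXACCESS_ITEM", "TestChildObject.TestInt"),
        client_name=env.get(
            "MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME",
            "ZB.MOM.WW.MxGateway.IntegrationTests",
        ),
        event_timeout_seconds=int(
            env.get("MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS", "15")
        ),
    )
```

Passing os.environ as env would read the real process environment; passing a dict keeps the function easy to exercise in isolation.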
Dev-rig Probes
src/ZB.MOM.WW.MxGateway.Worker.Tests/Probes/ partitions runtime probes from the regular
Worker.Tests regression suite. The folder is its own
ZB.MOM.WW.MxGateway.Worker.Tests.Probes namespace so a discovery filter (e.g. dotnet test --filter FullyQualifiedName~ZB.MOM.WW.MxGateway.Worker.Tests.Probes) can target or
exclude them without enumerating individual class names. The probes are
[Fact(Skip = "...")] by default and exist to characterize live AVEVA
behavior on the dev rig, not to gate CI — flip Skip = null on the dev box
with installed MXAccess + a running Galaxy provider when running them:
- AlarmsLiveSmokeTests: end-to-end smoke for the alarms-over-gateway pipeline (WnWrapAlarmConsumer + AlarmDispatcher + MxAccessAlarmEventSink) against \\<machine>\Galaxy!DEV with the dev rig's 10-second flip script writing TestMachine_001.TestAlarm001.
- AlarmClientWmProbeTests: registers as an AlarmClient consumer on a real hidden message-only window and logs every Win32 message that arrives during a fixed pump window. Used to identify the WM_APP / RegisterWindowMessage IDs alarm callbacks use.
- WnWrapConsumerProbeTests: instantiates AVEVA's standalone wnwrapConsumer COM class, subscribes to the dev rig's \\<machine>\Galaxy!DEV provider, and polls GetXmlCurrentAlarms2. The XML payload bypasses the FILETIME → DateTime auto-marshaling that crashes aaAlarmManagedClient.AlarmClient.GetHighPriAlarm on this rig.
The probes share the Worker.Tests project (so they can use its net48/x86
configuration and the installed ArchestrA.MxAccess / aaAlarmManagedClient
references), but they are not part of the regression contract — a Worker.Tests
run with Skip left in place passes them as skipped.
Live Galaxy Repository
GalaxyRepositoryLiveTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/Galaxy/ exercises
GalaxyRepository directly against the ZB Galaxy Repository SQL database. It is
skipped unless MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1 is set because it depends on a
reachable SQL Server instance and deployed Galaxy state — fake-worker tests cannot
cover the SQL browse RPCs.
The suite covers TestConnectionAsync, GetLastDeployTimeAsync,
GetHierarchyAsync, and GetAttributesAsync. GetHierarchyAsync and
GetAttributesAsync assert a non-empty result, so the connected ZB database
must contain a deployed Galaxy, not just an empty schema.
Run the Galaxy live tests explicitly:
$env:MXGATEWAY_RUN_LIVE_GALAXY_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~GalaxyRepositoryLiveTests
Optional live Galaxy variables:
| Variable | Default | Description |
|---|---|---|
| MXGATEWAY_LIVE_GALAXY_CONN | Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False; | Galaxy Repository connection string. Set this when the ZB database is on a non-default instance or needs SQL authentication. |
The default connection string targets ZB on localhost with Windows
authentication, which matches the Galaxy Repository conventions in CLAUDE.md.
Galaxy Filter Safety
GalaxyFilterInputSafetyTests in src/ZB.MOM.WW.MxGateway.Tests/Galaxy/ covers adversarial
input handling for the Galaxy Repository browse filter layer. It runs in the
unit-test project (no live SQL needed) and complements the live SQL coverage in
GalaxyRepositoryLiveTests.
The test class re-frames the original "Galaxy SQL injection" concern (Tests-002 in
code-reviews/Tests/findings.md). GalaxyRepository issues only four constant
SQL statements (HierarchySql, AttributesSql, SELECT 1,
SELECT time_of_last_deploy FROM galaxy) — no DiscoverHierarchyRequest field
is ever concatenated into a SQL string, so there is no dynamic SQL surface and no
LIKE-escaping helper to test. All filters (TagNameGlob, RootTagName,
template-chain, category, contained-path) are applied in memory by
GalaxyHierarchyProjector / GalaxyGlobMatcher against the cached snapshot.
The adversarial-input matrix (', ' OR '1'='1, '; DROP TABLE gobject;--,
%, _, 100%_off, [abc], Pump'001) pins the following invariants:
- SQL metacharacters (', ;) and LIKE-wildcards (%, _) are treated as opaque literals by GalaxyGlobMatcher: they never act as wildcards and never spuriously match unrelated text.
- Only * and ? are glob wildcards.
- GalaxyGlobMatcher applies a 100 ms regex timeout so a pathological glob (e.g. 5 000 "a" characters plus a literal "!") completes promptly rather than catastrophically backtracking.
- GalaxyHierarchyProjector returns zero matches (rather than the whole hierarchy) for an adversarial TagNameGlob or TemplateChainContains, and surfaces NotFound for an adversarial RootTagName.
- The DiscoverHierarchy RPC end-to-end returns zero matches for adversarial TagNameGlob rather than faulting.
These invariants are the real security surface of the Galaxy browse path; the SQL-injection framing does not apply to a constant-query layer.
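The "only * and ? are wildcards, everything else is a literal" invariant can be illustrated with a short glob-to-regex translation. This is a sketch of the general technique, not GalaxyGlobMatcher itself: glob_to_regex and glob_match are hypothetical names, case-sensitive matching is an assumption, and the 100 ms timeout is omitted because Python's standard re module has no timeout parameter.

```python
import re


def glob_to_regex(glob):
    """Translate a glob where only '*' and '?' are wildcards into a regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")  # any run of characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            # Every other character, including %, _, [, ] and ', is escaped
            # and therefore matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def glob_match(glob, text):
    """Return True when the whole of text matches the glob."""
    return glob_to_regex(glob).fullmatch(text) is not None
```

Under this translation, a filter such as % or [abc] can only match a tag whose name is literally that string, which is the behavior the adversarial matrix pins.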
Live LDAP
DashboardLdapLiveTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/ exercises
DashboardAuthenticator against the live GLAuth directory. It is skipped unless
MXGATEWAY_RUN_LIVE_LDAP_TESTS=1 is set because it binds against the GLAuth
service described in glauth.md.
The suite builds the authenticator with GatewayOptions.Dashboard.GroupToRole
set to { GwAdmin: Admin }. GwAdmin is the gateway-specific
dashboard-admin role and is not part of the five baseline GLAuth role
groups — it must be provisioned before the LDAP live tests pass.
AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (rather than skips)
when GLAuth has only the baseline groups, so this is a hard prerequisite
beyond "LDAP is up." See the "Adding a gw-specific group" section of
glauth.md for the provisioning step that adds GwAdmin and grants it to
admin.
The suite covers both the success path and the DashboardAuthenticator failure
branches: admin whose LDAP groups resolve to the Admin role succeeds and
emits the role claim; readonly is denied because no group in their memberOf
appears in GroupToRole; admin with a wrong password is rejected by the
candidate bind without leaking the password into FailureMessage; an unknown
username yields no candidate; and an unreachable LDAP server is absorbed into a
failed result rather than throwing.
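The role-mapping decision described above (a user succeeds only when at least one of their groups appears in GroupToRole) can be sketched as below. This is an illustration rather than DashboardAuthenticator's code: resolve_roles and authorize are hypothetical names, groups are treated as plain names rather than full LDAP distinguished names, and the LDAP bind itself is out of scope.

```python
def resolve_roles(member_of, group_to_role):
    """Return the sorted, de-duplicated roles granted by the user's groups."""
    return sorted({group_to_role[g] for g in member_of if g in group_to_role})


def authorize(member_of, group_to_role):
    """Return (succeeded, roles); a user with no mapped group is denied."""
    roles = resolve_roles(member_of, group_to_role)
    return (len(roles) > 0, roles)
```

With the fixture mapping { GwAdmin: Admin }, a user in GwAdmin resolves to the Admin role, while a user whose groups are all unmapped is denied with an empty role list.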
Run the LDAP live tests explicitly:
$env:MXGATEWAY_RUN_LIVE_LDAP_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~DashboardLdapLiveTests
Client E2E Scripts
scripts/discover-testmachine-tags.ps1 queries the ZB Galaxy Repository for the
deployed runtime references used by the live client e2e scripts. It reads
TestMachine_001 through TestMachine_020 and the expected attributes:
- ProtectedValue
- TestChangingInt
- TestBoolArray
- TestIntArray
- TestDateTimeArray
- TestStringArray
The discovery output includes the exact fullTagReference, data type, array
dimension, and security classification. The array attributes are expected to be
dimension 50. ProtectedValue has security classification 2 and requires
secured write semantics; the current client CLI e2e runner subscribes to it but
does not attempt a normal Write.
Run discovery directly when validating the Galaxy Repository inputs:
powershell -ExecutionPolicy Bypass -File scripts/discover-testmachine-tags.ps1 -Json
scripts/run-client-e2e-tests.ps1 drives the .NET, Go, Rust, Python, and Java
client CLIs through a live gateway session. The gateway and worker are assumed
to be already running at -Endpoint; the script does not start or stop them.
For each client it runs these phases, then closes the session in a finally
path and writes a JSON report under artifacts/e2e/:
- Session + register: opens one session and registers.
- Bulk: verifies SubscribeBulk / UnsubscribeBulk on a bounded tag subset (skip with -SkipBulk).
- Add-item / advise: adds and advises every discovered test tag. The loop has no StreamEvents consumer attached, so advised tags accumulate MXAccess change events in the worker event channel (MxGateway:Events:QueueCapacity); left unbounded it overflows under FailFast backpressure and faults the worker. Every -DrainEveryTags advised tags (default 15) the loop connects a short-lived StreamEvents drain so the gateway pumps that channel empty. -DrainEveryTags 0 disables the drain.
- Stream: asserts a bounded event stream delivers at least one event (skip with -SkipStream).
- Parity: asserts MXAccess error paths are rejected rather than silently succeeding: an invalid item handle and an unknown session id (skip with -SkipParity).
- Auth rejection: asserts open-session is rejected when the API key is missing, and (when -RejectScopeApiKeyEnv names an insufficient-scope key) when the key lacks the required scope. Skip with -SkipAuth.
- Write round-trip: opt-in (-VerifyWrite). Runs right after register: adds and advises a configurable writable attribute (-WriteAttribute, default TestChangingInt), writes a per-client sentinel value, then streams events and asserts an OnWriteComplete event for that item is observed, which proves the write round-tripped through the gateway, worker, and MXAccess provider. The written value being echoed back in an OnDataChange is recorded best-effort (echoObserved): a provider-driven attribute such as TestChangingInt accepts the write but immediately overwrites it, so no data-change carries the value back. The Rust stream-events CLI emits full per-event JSON (family, itemHandle, value) so all five clients apply the same checks.
  It is opt-in because it mutates live tag state. The phase fails fast if the write command is rejected, e.g. against a gateway whose worker predates write support (MxAccessCommandExecutor returning InvalidRequest for Write / Write2 / WriteSecured / WriteSecured2).
- Alarm feed + acknowledge: opt-in (-VerifyAlarms). Runs after the stream phase. Exercises the two session-less alarm subcommands against the gateway's central alarm monitor: stream-alarms reads a bounded slice of the feed (-AlarmStreamMax, default 1, because the feed's first message always arrives immediately, whereas later ones depend on live transitions) and asserts at least one AlarmFeedMessage; acknowledge-alarm acknowledges -AlarmReference (default Galaxy!TestArea.TestMachine_001.TestAlarm001) and asserts the RPC round-trips. The native ack outcome is not asserted because it depends on whether that alarm is currently active.
  It is opt-in because it depends on the gateway's central alarm monitor being enabled (MxGateway:Alarms:Enabled) and a live alarm provider.
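The drain cadence in the add-item / advise phase can be sketched as a loop that interleaves a drain after every N advised tags. The real runner is a PowerShell script; this Python version only illustrates the control flow. advise_with_drain and its advise and drain callbacks are hypothetical stand-ins for the CLI calls.

```python
def advise_with_drain(tags, drain_every, advise, drain):
    """Advise each tag, calling drain() after every `drain_every` tags.

    drain_every == 0 disables draining, mirroring -DrainEveryTags 0.
    Returns the number of drains performed.
    """
    drains = 0
    for count, tag in enumerate(tags, start=1):
        advise(tag)
        if drain_every > 0 and count % drain_every == 0:
            drain()  # short-lived consumer that empties the event channel
            drains += 1
    return drains
```

With the default of 15, a run over 40 tags would drain after the 15th and 30th tag, leaving at most 14 undrained tags' worth of events queued at any time between drains.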
Each client CLI is driven through one long-lived batch process. Every CLI
exposes a batch subcommand: a process that reads one command line from stdin,
runs it through the normal subcommand dispatch, writes the JSON result, then a
line containing exactly __MXGW_BATCH_EOR__. The harness launches one such
process per client and pipes the ~250 operations of the flow through it, so the
cold-start cost of the process (and, for the JVM, of the runtime) is paid once
per client instead of once per operation. A command that fails inside the batch process
writes its {"error":...} envelope and the loop continues; the harness treats
that envelope as the operation failure (used by the parity and auth phases).
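The batch protocol described above (one command line in, one JSON result out, then an end-of-record line) can be sketched from both sides. This is an illustrative sketch, not the code of any of the five CLIs: run_batch, read_result, and the dispatch callback are hypothetical, and only the __MXGW_BATCH_EOR__ marker and the {"error":...} envelope shape come from the description.

```python
import io
import json

EOR = "__MXGW_BATCH_EOR__"


def run_batch(stdin, stdout, dispatch):
    """CLI side: read one command per line, write its JSON result, then EOR."""
    for line in stdin:
        command = line.strip()
        if not command:
            continue
        try:
            result = dispatch(command)
        except Exception as exc:
            # A failed command reports an error envelope; the loop continues.
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()


def read_result(stdout_lines):
    """Harness side: collect lines up to the EOR marker and parse the JSON."""
    body = []
    for line in stdout_lines:
        if line.rstrip("\n") == EOR:
            return json.loads("".join(body))
        body.append(line)
    raise EOFError("batch process exited before the end-of-record marker")
```

Framing each reply with a sentinel line lets the harness read multi-line JSON without knowing its length in advance, at the cost of reserving that one line value.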
Before the per-client phases run, the script builds the .NET CLI
(dotnet build) and installs the Java CLI (gradle :mxgateway-cli:installDist)
once, so the batch process launches straight from the compiled exe / the
installed launcher. The Go, Rust, and Python batch processes are launched via
go run / cargo run / python -m, which compile-or-start once when that
single per-client process starts.
Build the gateway and worker, start the gateway, and provide a valid API key before running the client e2e script:
$env:MXGATEWAY_API_KEY = "<api-key>"
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1
Useful runner options:
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Clients dotnet,python -MachineStart 1 -MachineEnd 2
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -BulkTagCount 10
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipStream
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipBulk
# Write round-trip (opt-in): point at a writable scalar attribute and its
# value type.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyWrite -WriteAttribute TestChangingInt -WriteType int32
# Alarm feed + acknowledge (opt-in): needs MxGateway:Alarms:Enabled on the gateway.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyAlarms -AlarmReference "Galaxy!TestArea.TestMachine_001.TestAlarm001"
# Auth rejection: also assert an insufficient-scope key is denied.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -RejectScopeApiKeyEnv MXGATEWAY_READONLY_API_KEY
# Run all five clients concurrently as isolated child processes.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Parallel
# Validate the flow offline (prints commands, contacts no gateway).
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -DryRun
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Endpoint localhost:5000 -ApiKeyEnv MXGATEWAY_API_KEY
When -VerifyWrite is enabled, the write round-trip fails loudly if the write
command is rejected, if -WriteAttribute does not name a writable scalar
attribute, or if no OnWriteComplete event is observed for the written item
within -WriteEchoMaxEvents (default 200) streamed events. Raise
-WriteEchoMaxEvents if the gateway's per-session event backlog is large
enough to push OnWriteComplete past that bound.
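The bounded check behind -WriteEchoMaxEvents can be sketched as a scan over at most N streamed events. The field names family, itemHandle, and value follow the per-event JSON mentioned earlier; the function name, the exact family strings, and the returned dictionary shape are assumptions made for illustration.

```python
def check_write_round_trip(events, item_handle, sentinel, max_events=200):
    """Scan at most max_events events for the written item.

    writeComplete is the hard requirement; echoObserved is best-effort,
    since a provider-driven attribute may overwrite the sentinel at once.
    """
    write_complete = False
    echo_observed = False
    for index, event in enumerate(events):
        if index >= max_events:
            break  # bound reached; later events are not considered
        if event.get("itemHandle") != item_handle:
            continue
        family = event.get("family")
        if family == "OnWriteComplete":
            write_complete = True
        elif family == "OnDataChange" and event.get("value") == sentinel:
            echo_observed = True
    return {"writeComplete": write_complete, "echoObserved": echo_observed}
```

If a large backlog of OnDataChange events precedes the OnWriteComplete, the scan runs out of budget first, which is the situation where raising the bound helps.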
Focused Commands
Run the cross-language smoke matrix tests after changing the documented client smoke command list:
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests
Run the parity fixture matrix tests after changing the integration parity scenario list:
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests
Run the fake worker tests after changing gateway worker IPC, session startup, or event streaming behavior:
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~FakeWorkerHarnessTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~GatewayEndToEndFakeWorkerSmokeTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~WorkerClientTests
dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter FullyQualifiedName~WorkerPipeSessionTests
Run the gateway test project after shared gateway test infrastructure changes:
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj