Gateway Testing

Gateway tests run without installed MXAccess by using fake workers, fake transports, and in-process gRPC service fakes. Live MXAccess verification belongs in opt-in integration tests because it depends on installed COM components and provider state.

Fake Worker Harness

FakeWorkerHarness in src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/Fakes/ provides an in-process worker side for named-pipe IPC tests. It uses the same WorkerFrameReader, WorkerFrameWriter, and WorkerEnvelope contract as the gateway so tests exercise real frame validation and worker-client state changes.
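The Python sketch below shows what "real frame validation" can mean in practice. It is a length-prefixed frame reader that rejects an oversized header before reading the body. Several details are assumptions for illustration: the 4-byte little-endian length prefix, the MAX_FRAME_BYTES limit, and the write_frame / read_frame names. The actual WorkerFrameReader / WorkerFrameWriter layout and limits may differ.

```python
import io
import struct

# Hypothetical wire layout for illustration only: a 4-byte little-endian
# unsigned length header followed by the payload bytes.
HEADER = struct.Struct("<I")
MAX_FRAME_BYTES = 1024 * 1024  # assumed limit, not the gateway's real value


def write_frame(stream, payload: bytes) -> None:
    """Write one length-prefixed frame to a binary stream."""
    if len(payload) > MAX_FRAME_BYTES:
        raise ValueError("payload exceeds frame limit")
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def _read_exact(stream, count: int) -> bytes:
    """Read exactly count bytes; pipes may return short reads, so loop."""
    data = bytearray()
    while len(data) < count:
        chunk = stream.read(count - len(data))
        if not chunk:
            raise EOFError("stream ended mid-frame")
        data.extend(chunk)
    return bytes(data)


def read_frame(stream) -> bytes:
    """Read one frame, rejecting an oversized header before reading the body."""
    (length,) = HEADER.unpack(_read_exact(stream, HEADER.size))
    if length > MAX_FRAME_BYTES:
        raise ValueError(f"frame header declares {length} bytes")
    return _read_exact(stream, length)


# Usage: round-trip one frame through an in-memory buffer.
buffer = io.BytesIO()
write_frame(buffer, b"WorkerHello")
buffer.seek(0)
assert read_frame(buffer) == b"WorkerHello"
```

Checking the declared length before reading the body means a malformed or hostile header cannot force a large allocation.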

Use the harness when a gateway or session test needs worker behavior without starting ZB.MOM.WW.MxGateway.Worker.exe or loading MXAccess COM. The harness can script:

  • WorkerHello and WorkerReady startup,
  • command replies with matching correlation ids,
  • ordered WorkerEvent frames,
  • WorkerHeartbeat frames,
  • WorkerFault frames,
  • shutdown acknowledgements,
  • malformed protobuf payloads and oversized frame headers,
  • slow or hung workers by withholding a reply.
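Two of the scripted behaviors above can be modeled with a small synchronous toy: correlated replies and a hung worker. This is a sketch, not the harness API. ScriptedFakeWorker, send_command, and the status strings are hypothetical names, and the real harness runs over a named pipe. The toy shows the two properties the gateway relies on:

- a reply echoes its command's correlation id;
- a withheld reply surfaces as a client-side timeout.

```python
import queue


class ScriptedFakeWorker:
    """Toy scripted worker: replies echo the command's correlation id.

    Hypothetical model only; the real FakeWorkerHarness speaks
    WorkerEnvelope frames over a named pipe.
    """

    def __init__(self, replies, hang_on=()):
        self._replies = dict(replies)    # command kind -> scripted status
        self._hang_on = set(hang_on)     # command kinds that never get a reply
        self.outbox = queue.Queue()

    def handle(self, correlation_id, kind):
        if kind in self._hang_on:
            return  # withhold the reply to simulate a slow or hung worker
        status = self._replies.get(kind, "InvalidRequest")
        self.outbox.put({"correlation_id": correlation_id, "kind": kind, "status": status})


def send_command(worker, correlation_id, kind, timeout):
    """Send one command and wait, bounded by timeout, for its matching reply."""
    worker.handle(correlation_id, kind)
    try:
        reply = worker.outbox.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply for {kind} (correlation id {correlation_id})") from None
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply correlation id does not match the command")
    return reply
```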

Session-level tests can connect the harness to the pipe created by SessionWorkerClientFactory with ConnectToGatewayPipeAsync. Lower-level WorkerClient tests can use CreateConnectedPairAsync to create both pipe ends inside the test.

GatewayEndToEndFakeWorkerSmokeTests composes the real gRPC service, SessionManager, SessionWorkerClientFactory, WorkerClient, and EventStreamService with a scripted fake worker launcher. The smoke test covers OpenSession, Register, AddItem, Advise, one streamed OnDataChange event, and CloseSession without loading MXAccess COM.

Live MXAccess Smoke

WorkerLiveMxAccessSmokeTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/ composes the real gRPC service, SessionManager, SessionWorkerClientFactory, WorkerClient, WorkerProcessLauncher, and ZB.MOM.WW.MxGateway.Worker.exe. It is skipped unless MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 is set because it creates the installed MXAccess COM object and depends on live provider state.

The live smoke opens a gateway session, launches the x86 worker, runs Register, AddItem, and Advise, waits a bounded time for the first OnDataChange event (skipping any earlier bootstrap/registration-state event), and closes the session in a finally block so the worker gets a graceful shutdown request even when a command or event assertion fails. Cleanup failures in that finally block are logged rather than thrown, so a real assertion failure is never masked by a shutdown timeout.
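The cleanup rule can be sketched in Python. run_with_graceful_close and its two callables are hypothetical stand-ins for the C# test's try/finally. The point is that the close step catches and logs its own exception, so it never replaces the exception raised by the test body.

```python
import logging

log = logging.getLogger("live-smoke")


def run_with_graceful_close(body, close_session):
    """Run the test body, then always attempt a graceful close.

    A failure in close_session is logged, never raised, so an assertion
    error from body is not replaced by a shutdown error.
    """
    try:
        return body()
    finally:
        try:
            close_session()
        except Exception:  # cleanup must not mask the real failure
            log.exception("CloseSession failed during cleanup")
```

Without the inner try/except, an exception raised in the finally clause would replace the in-flight assertion error. That is exactly the masking the live smoke avoids.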

WorkerLiveMxAccessSmokeTests additionally covers five MXAccess parity paths the fake-worker tests cannot validate:

  • a Write round-trip against an advised item, asserting both that the reply is Ok / MxCommandKind.Write and that the worker emits a matching OnWriteComplete event for the targeted (server, item) handle pair — the same round-trip proof used by scripts/run-client-e2e-tests.ps1,
  • an AddItem against an invalid server handle, asserting the MXAccess failure surfaces in the command reply without faulting the gateway transport,
  • the UnAdvise → RemoveItem → Unregister teardown chain, asserting each step replies Ok with the matching MxCommandKind, that no further OnDataChange events arrive for the un-advised pair, and that a second RemoveItem against the freed handle relays a non-Ok MXAccess failure,
  • a WriteSecured round-trip after AuthenticateUser, asserting the reply carries MxCommandKind.WriteSecured and the credential password never appears in the diagnostic message (parity for both the secured-write ordering rule and the "do not log secrets" contract), and
  • an abnormal worker exit (the worker process is killed mid-session) where the gateway must transition the session to SessionState.Faulted with a non-empty fault description carrying a known worker-client classification (pipe disconnected / worker faulted / end-of-stream / heartbeat expired).

All six tests (the base smoke plus the five parity paths above) are gated by the same MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1 opt-in variable.
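The abnormal-exit assertion can be sketched as a predicate over the session state and the fault description. The phrase list below is an assumption that paraphrases the four classifications named above. The gateway's actual fault strings may be worded differently, so a real test should match whatever text WorkerClient produces. is_classified_fault is a hypothetical name.

```python
# Hypothetical classification phrases; the real worker-client fault strings
# may differ. The live test requires a non-empty description that carries
# one of the known classifications.
KNOWN_FAULT_CLASSES = (
    "pipe disconnected",
    "worker faulted",
    "end-of-stream",
    "heartbeat expired",
)


def is_classified_fault(state: str, description: str) -> bool:
    """True when the session is Faulted with a recognizable description."""
    if state != "Faulted" or not description.strip():
        return False
    text = description.lower()
    return any(phrase in text for phrase in KNOWN_FAULT_CLASSES)
```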

Build the worker before running the smoke:

dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86

Run the smoke explicitly:

$env:MXGATEWAY_RUN_LIVE_MXACCESS_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~WorkerLiveMxAccessSmokeTests

Optional live smoke variables:

  • MXGATEWAY_LIVE_MXACCESS_WORKER_EXE (default: the first existing ZB.MOM.WW.MxGateway.Worker.exe under src/ZB.MOM.WW.MxGateway.Worker/bin/...): worker executable path. Set this when running against a packaged worker or a non-default build output.
  • MXGATEWAY_LIVE_MXACCESS_ITEM (default: TestChildObject.TestInt): MXAccess item reference used by AddItem.
  • MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME (default: ZB.MOM.WW.MxGateway.IntegrationTests): client name passed to Register.
  • MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS (default: 15): maximum wait, in seconds, for the first OnDataChange. It also bounds the OnWriteComplete round-trip and the abnormal-exit fault transition.
  • MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER (default: admin): ArchestrA user name passed to AuthenticateUser before the WriteSecured parity step.
  • MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_PASSWORD (default: admin123): password paired with the user above. It is never logged; the test asserts the value does not appear in the WriteSecured diagnostic message.
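The sketch below shows how a test could apply the opt-in gate and fall back to the defaults listed above. It makes two assumptions: the gate accepts only the exact value "1", and an unset variable means "use the default". live_tests_enabled and resolve_settings are hypothetical helper names. The worker executable lookup is omitted because its default is a search over build output rather than a fixed value.

```python
import os

# Defaults from the optional-variable list above. The worker executable
# default is a search over build output, so it is not modeled here.
DEFAULTS = {
    "MXGATEWAY_LIVE_MXACCESS_ITEM": "TestChildObject.TestInt",
    "MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME": "ZB.MOM.WW.MxGateway.IntegrationTests",
    "MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS": "15",
    "MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER": "admin",
    "MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_PASSWORD": "admin123",
}


def live_tests_enabled(env=None) -> bool:
    """Assumed gate: only the exact value "1" opts in; anything else skips."""
    env = os.environ if env is None else env
    return env.get("MXGATEWAY_RUN_LIVE_MXACCESS_TESTS") == "1"


def resolve_settings(env=None) -> dict:
    """Overlay set variables on the defaults and parse the timeout."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    timeout_key = "MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS"
    settings[timeout_key] = float(settings[timeout_key])
    return settings
```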

The test output includes session id, worker process id, command status, HRESULT/status diagnostics, event sequence and handles, close status, and worker stdout/stderr lines emitted during the run.

Dev-rig Probes

src/ZB.MOM.WW.MxGateway.Worker.Tests/Probes/ separates runtime probes from the regular Worker.Tests regression suite. The probes live in their own ZB.MOM.WW.MxGateway.Worker.Tests.Probes namespace, so a discovery filter (e.g. dotnet test --filter FullyQualifiedName~ZB.MOM.WW.MxGateway.Worker.Tests.Probes) can target or exclude them without enumerating individual class names. The probes are marked [Fact(Skip = "...")] by default. They exist to characterize live AVEVA behavior on the dev rig, not to gate CI. To run them, set Skip = null on a dev box that has MXAccess installed and a running Galaxy provider:

  • AlarmsLiveSmokeTests — end-to-end smoke for the alarms-over-gateway pipeline (WnWrapAlarmConsumer + AlarmDispatcher + MxAccessAlarmEventSink) against \\<machine>\Galaxy!DEV with the dev rig's 10-second flip script writing TestMachine_001.TestAlarm001.
  • AlarmClientWmProbeTests — registers as an AlarmClient consumer on a real hidden message-only window and logs every Win32 message that arrives during a fixed pump window. Used to identify the WM_APP / RegisterWindowMessage IDs alarm callbacks use.
  • WnWrapConsumerProbeTests — instantiates AVEVA's standalone wnwrapConsumer COM class, subscribes to the dev rig's \\<machine>\Galaxy!DEV provider, and polls GetXmlCurrentAlarms2. The XML payload bypasses the FILETIME→DateTime auto-marshaling that crashes aaAlarmManagedClient.AlarmClient.GetHighPriAlarm on this rig.

The probes share the Worker.Tests project so they can use its net48/x86 configuration and the installed ArchestrA.MxAccess / aaAlarmManagedClient references. They are not part of the regression contract: a Worker.Tests run with Skip left in place reports them as skipped.
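The namespace partition works because of how dotnet test --filter handles two clauses:

- FullyQualifiedName~Value selects tests whose fully qualified name contains Value.
- FullyQualifiedName!~Value selects tests whose name does not contain it.

The Python sketch below models only those two clauses, to show why one namespace prefix is enough to include or exclude every probe. It ignores the rest of the VSTest filter grammar (such as & and | combinations) and does not reproduce its case-handling rules. clause_matches and the test names are hypothetical.

```python
PROPERTY = "FullyQualifiedName"


def clause_matches(fully_qualified_name: str, clause: str) -> bool:
    """Evaluate one FullyQualifiedName~X (contains) or !~X (not contains) clause."""
    if clause.startswith(PROPERTY + "!~"):
        return clause[len(PROPERTY) + 2:] not in fully_qualified_name
    if clause.startswith(PROPERTY + "~"):
        return clause[len(PROPERTY) + 1:] in fully_qualified_name
    raise ValueError("only the ~ and !~ clauses on FullyQualifiedName are modeled")


# Hypothetical test names, for illustration.
PROBE = "ZB.MOM.WW.MxGateway.Worker.Tests.Probes.AlarmsLiveSmokeTests.Smoke"
REGULAR = "ZB.MOM.WW.MxGateway.Worker.Tests.WorkerPipeSessionTests.Smoke"
NAMESPACE = "ZB.MOM.WW.MxGateway.Worker.Tests.Probes"
```

On the dev rig, the matching exclusion would be dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter "FullyQualifiedName!~ZB.MOM.WW.MxGateway.Worker.Tests.Probes".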

Live Galaxy Repository

GalaxyRepositoryLiveTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/Galaxy/ exercises GalaxyRepository directly against the ZB Galaxy Repository SQL database. It is skipped unless MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1 is set because it depends on a reachable SQL Server instance and deployed Galaxy state — fake-worker tests cannot cover the SQL browse RPCs.

The suite covers TestConnectionAsync, GetLastDeployTimeAsync, GetHierarchyAsync, and GetAttributesAsync. GetHierarchyAsync and GetAttributesAsync assert a non-empty result, so the connected ZB database must contain a deployed Galaxy, not just an empty schema.

Run the Galaxy live tests explicitly:

$env:MXGATEWAY_RUN_LIVE_GALAXY_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~GalaxyRepositoryLiveTests

Optional live Galaxy variables:

  • MXGATEWAY_LIVE_GALAXY_CONN (default: Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;): Galaxy Repository connection string. Set this when the ZB database is on a non-default instance or needs SQL authentication.

The default connection string targets ZB on localhost with Windows authentication, which matches the Galaxy Repository conventions in CLAUDE.md.

Galaxy Filter Safety

GalaxyFilterInputSafetyTests in src/ZB.MOM.WW.MxGateway.Tests/Galaxy/ covers adversarial input handling for the Galaxy Repository browse filter layer. It runs in the unit-test project (no live SQL needed) and complements the live SQL coverage in GalaxyRepositoryLiveTests.

The test class re-frames the original "Galaxy SQL injection" concern (Tests-002 in code-reviews/Tests/findings.md). GalaxyRepository issues only four constant SQL statements (HierarchySql, AttributesSql, SELECT 1, SELECT time_of_last_deploy FROM galaxy) — no DiscoverHierarchyRequest field is ever concatenated into a SQL string, so there is no dynamic SQL surface and no LIKE-escaping helper to test. All filters (TagNameGlob, RootTagName, template-chain, category, contained-path) are applied in memory by GalaxyHierarchyProjector / GalaxyGlobMatcher against the cached snapshot.

The adversarial-input matrix ("'", "' OR '1'='1", "'; DROP TABLE gobject;--", "%", "_", "100%_off", "[abc]", "Pump'001") pins the following invariants:

  • SQL metacharacters (', ;) and LIKE wildcards (%, _) are treated as opaque literals by GalaxyGlobMatcher: they never act as wildcards and never spuriously match unrelated text.
  • Only * and ? are glob wildcards.
  • GalaxyGlobMatcher applies a 100 ms regex timeout, so a pathological glob (e.g. 5,000 "a" characters followed by a literal "!") completes promptly rather than backtracking catastrophically.
  • GalaxyHierarchyProjector returns zero matches (rather than the whole hierarchy) for an adversarial TagNameGlob or TemplateChainContains, and surfaces NotFound for an adversarial RootTagName.
  • End to end, the DiscoverHierarchy RPC returns zero matches for an adversarial TagNameGlob rather than faulting.

These invariants are the real security surface of the Galaxy browse path; the SQL-injection framing does not apply to a constant-query layer.
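The in-memory matching rule can be illustrated with a Python sketch. It translates a glob into a regular expression, treating only * and ? as wildcards and escaping every other character. This is not GalaxyGlobMatcher itself, and it differs in two ways:

- Case sensitivity is an assumption; the sketch is case-sensitive.
- Python's re module has no per-match timeout, so the 100 ms guard the .NET matcher applies has no equivalent here.

```python
import re


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate a tag glob where only * and ? are wildcards.

    Every other character, including ', ;, %, _, [ and ], is escaped so it
    matches only itself.
    """
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")       # any run of characters
        elif ch == "?":
            parts.append(".")        # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def glob_match(glob: str, tag_name: str) -> bool:
    """True when the whole tag name matches the glob."""
    return glob_to_regex(glob).fullmatch(tag_name) is not None
```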

Live LDAP

DashboardLdapLiveTests in src/ZB.MOM.WW.MxGateway.IntegrationTests/ exercises DashboardAuthenticator against the live GLAuth directory. It is skipped unless MXGATEWAY_RUN_LIVE_LDAP_TESTS=1 is set because it binds against the GLAuth service described in glauth.md.

The suite builds the authenticator with GatewayOptions.Dashboard.GroupToRole set to { GwAdmin: Admin }. GwAdmin is a gateway-specific dashboard-admin group and is not one of the five baseline GLAuth role groups, so it must be provisioned before the LDAP live tests can pass. AuthenticateAsync_AdminInGwAdminGroup_Succeeds fails (rather than skips) when GLAuth has only the baseline groups, which makes provisioning a hard prerequisite beyond "LDAP is up." See the "Adding a gw-specific group" section of glauth.md for the step that adds GwAdmin and grants it to the admin user.

The suite covers both the success path and the DashboardAuthenticator failure branches: admin whose LDAP groups resolve to the Admin role succeeds and emits the role claim; readonly is denied because no group in their memberOf appears in GroupToRole; admin with a wrong password is rejected by the candidate bind without leaking the password into FailureMessage; an unknown username yields no candidate; and an unreachable LDAP server is absorbed into a failed result rather than throwing.
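The role-mapping step that separates the admin and readonly outcomes can be sketched as below. The sketch makes two assumptions:

- memberOf has already been reduced to plain group names; real LDAP memberOf values are usually distinguished names.
- An empty role set means denial.

resolve_roles, authorize, AuthResult, and the group names in the test are hypothetical. The candidate bind and password check happen before this step and are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class AuthResult:
    succeeded: bool
    roles: set = field(default_factory=set)
    failure_message: str = ""


def resolve_roles(member_of, group_to_role):
    """Map the user's group names to dashboard roles via GroupToRole."""
    return {group_to_role[group] for group in member_of if group in group_to_role}


def authorize(username, member_of, group_to_role):
    """Succeed only when at least one group maps to a role."""
    roles = resolve_roles(member_of, group_to_role)
    if not roles:
        # Mirrors the readonly case: no group appears in GroupToRole.
        return AuthResult(False, failure_message=f"{username}: no mapped dashboard role")
    return AuthResult(True, roles=roles)
```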

Run the LDAP live tests explicitly:

$env:MXGATEWAY_RUN_LIVE_LDAP_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~DashboardLdapLiveTests

Client E2E Scripts

scripts/discover-testmachine-tags.ps1 queries the ZB Galaxy Repository for the deployed runtime references used by the live client e2e scripts. It reads TestMachine_001 through TestMachine_020 and the expected attributes:

  • ProtectedValue
  • TestChangingInt
  • TestBoolArray
  • TestIntArray
  • TestDateTimeArray
  • TestStringArray

The discovery output includes the exact fullTagReference, data type, array dimension, and security classification. The array attributes are expected to be dimension 50. ProtectedValue has security classification 2 and requires secured write semantics; the current client CLI e2e runner subscribes to it but does not attempt a normal Write.
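When cross-checking discovery output, the expected candidate set is every combination of the machine range and the attribute list above. The sketch assumes references take the form Object.Attribute, as in TestMachine_001.TestAlarm001 elsewhere in this document, with three-digit machine numbers. expected_references is a hypothetical helper, not part of the script; the script reads the real references from the Galaxy Repository.

```python
ATTRIBUTES = (
    "ProtectedValue",
    "TestChangingInt",
    "TestBoolArray",
    "TestIntArray",
    "TestDateTimeArray",
    "TestStringArray",
)


def expected_references(machine_start=1, machine_end=20):
    """Yield Object.Attribute candidates for TestMachine_NNN over an inclusive range."""
    for number in range(machine_start, machine_end + 1):
        for attribute in ATTRIBUTES:
            yield f"TestMachine_{number:03d}.{attribute}"
```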

Run discovery directly when validating the Galaxy Repository inputs:

powershell -ExecutionPolicy Bypass -File scripts/discover-testmachine-tags.ps1 -Json

scripts/run-client-e2e-tests.ps1 drives the .NET, Go, Rust, Python, and Java client CLIs through a live gateway session. The gateway and worker are assumed to be already running at -Endpoint; the script does not start or stop them. For each client it runs these phases, then closes the session in a finally path and writes a JSON report under artifacts/e2e/:

  1. Session + register — opens one session and registers.

  2. Bulk — verifies SubscribeBulk / UnsubscribeBulk on a bounded tag subset (skip with -SkipBulk).

  3. Add-item / advise — adds and advises every discovered test tag. The loop has no StreamEvents consumer attached, so advised tags accumulate MXAccess change events in the worker event channel (MxGateway:Events:QueueCapacity). Left undrained, that channel overflows under FailFast backpressure and faults the worker. After every -DrainEveryTags advised tags (default 15), the loop connects a short-lived StreamEvents drain so the gateway pumps the channel empty. -DrainEveryTags 0 disables the drain.

  4. Stream — asserts a bounded event stream delivers at least one event (skip with -SkipStream).

  5. Parity — asserts MXAccess error paths are rejected rather than silently succeeding: an invalid item handle and an unknown session id (skip with -SkipParity).

  6. Auth rejection — asserts open-session is rejected when the API key is missing, and (when -RejectScopeApiKeyEnv names an insufficient-scope key) when the key lacks the required scope. Skip with -SkipAuth.

  7. Write round-trip (opt-in, -VerifyWrite). Runs right after register. It adds and advises a configurable writable attribute (-WriteAttribute, default TestChangingInt), writes a per-client sentinel value, then streams events and asserts that an OnWriteComplete event for that item is observed. That event proves the write round-tripped through the gateway, worker, and MXAccess provider. Whether the written value is echoed back in an OnDataChange is recorded best-effort (echoObserved). A provider-driven attribute such as TestChangingInt accepts the write but immediately overwrites it, so no data-change event carries the value back. The Rust stream-events CLI emits full per-event JSON (family, itemHandle, value), so all five clients apply the same checks.

    It is opt-in because it mutates live tag state. The phase fails fast if the write command is rejected — e.g. against a gateway whose worker predates write support (MxAccessCommandExecutor returning InvalidRequest for Write/Write2/WriteSecured/WriteSecured2).

  8. Alarm feed + acknowledge (opt-in, -VerifyAlarms). Runs after the stream phase and exercises the two session-less alarm subcommands against the gateway's central alarm monitor:
     • stream-alarms reads a bounded slice of the feed (-AlarmStreamMax, default 1) and asserts at least one AlarmFeedMessage. The default is 1 because the feed's first message always arrives immediately, while later ones depend on live transitions.
     • acknowledge-alarm acknowledges -AlarmReference (default Galaxy!TestArea.TestMachine_001.TestAlarm001) and asserts the RPC round-trips. The native acknowledgement outcome is not asserted because it depends on whether that alarm is currently active.

    It is opt-in because it depends on the gateway's central alarm monitor being enabled (MxGateway:Alarms:Enabled) and a live alarm provider.
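A toy backlog model shows why the periodic drain in the add-item / advise phase matters. The sketch makes two assumptions:

- Every already-advised tag queues one change event per loop step.
- A drain empties the channel completely.

Real MXAccess change rates and MxGateway:Events:QueueCapacity values differ, and simulate_advise_loop is a hypothetical name. In this model the backlog grows with the number of advised tags, so without drains it rises roughly quadratically; each drain resets it.

```python
def simulate_advise_loop(tag_count, capacity, drain_every=15):
    """Toy model: after each advise, every advised tag queues one change event.

    Returns the peak backlog; raises OverflowError if it exceeds capacity,
    mimicking FailFast backpressure faulting the worker. drain_every=0
    disables draining.
    """
    backlog = peak = 0
    for advised in range(1, tag_count + 1):
        backlog += advised              # all advised tags produce one event
        peak = max(peak, backlog)
        if backlog > capacity:
            raise OverflowError(f"channel full after {advised} tags")
        if drain_every and advised % drain_every == 0:
            backlog = 0                 # short-lived StreamEvents drain empties it
    return peak
```

With a capacity of 1000, 60 tags overflow without draining. With the default cadence of 15, the same 60 tags peak at 795 queued events.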

Each client CLI is driven through one long-lived batch process. Every CLI exposes a batch subcommand that runs this loop:

  1. Read one command line from stdin.
  2. Run it through the normal subcommand dispatch.
  3. Write the JSON result.
  4. Write a line containing exactly __MXGW_BATCH_EOR__.

The harness launches one such process per client and sends the roughly 250 operations of the flow through it. Process cold start (and, for Java, JVM startup) is therefore paid once per client instead of once per operation. A command that fails inside the batch process writes its {"error":...} envelope and the loop continues. The harness treats that envelope as the operation failure, which the parity and auth phases rely on.
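The batch protocol can be sketched as a loop on the CLI side and a reader on the harness side. serve_batch, read_response, and the dispatch callable are hypothetical. The sketch assumes commands are shell-style argument lines (split with shlex) and that each result fits on one JSON line. The harness reader must be given a line iterator, so that successive calls continue where the previous response ended.

```python
import io
import json
import shlex

EOR = "__MXGW_BATCH_EOR__"


def serve_batch(stdin, stdout, dispatch):
    """CLI side: one command line in, one JSON result line plus EOR marker out."""
    for line in stdin:
        try:
            result = dispatch(shlex.split(line))
        except Exception as exc:  # a failing command reports and the loop continues
            result = {"error": str(exc)}
        stdout.write(json.dumps(result) + "\n")
        stdout.write(EOR + "\n")
        stdout.flush()


def read_response(lines):
    """Harness side: gather lines up to the EOR marker and parse them as JSON."""
    body = []
    for line in lines:
        if line.rstrip("\r\n") == EOR:
            return json.loads("".join(body))
        body.append(line)
    raise EOFError("batch process exited before writing the EOR marker")


# Usage with an in-memory pipe and a hypothetical dispatcher.
def demo_dispatch(argv):
    if argv == ["ping"]:
        return {"ok": True}
    raise ValueError("unknown command: " + " ".join(argv))


out = io.StringIO()
serve_batch(io.StringIO("ping\nbogus --flag\n"), out, demo_dispatch)
responses = iter(out.getvalue().splitlines(keepends=True))
assert read_response(responses) == {"ok": True}
assert "error" in read_response(responses)
```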

Before the per-client phases run, the script builds the .NET CLI (dotnet build) and installs the Java CLI (gradle :zb-mom-ww-mxgateway-cli:installDist) once, so the batch process launches straight from the compiled exe / the installed launcher. The Go, Rust, and Python batch processes are launched via go run / cargo run / python -m, which compile-or-start once when that single per-client process starts.

Build the gateway and worker, start the gateway, and provide a valid API key before running the client e2e script:

$env:MXGATEWAY_API_KEY = "<api-key>"
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1

Useful runner options:

powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Clients dotnet,python -MachineStart 1 -MachineEnd 2
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -BulkTagCount 10
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipStream
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipBulk
# Write round-trip (opt-in): point at a writable scalar attribute and its
# value type.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyWrite -WriteAttribute TestChangingInt -WriteType int32
# Alarm feed + acknowledge (opt-in): needs MxGateway:Alarms:Enabled on the gateway.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyAlarms -AlarmReference "Galaxy!TestArea.TestMachine_001.TestAlarm001"
# Auth rejection: also assert an insufficient-scope key is denied.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -RejectScopeApiKeyEnv MXGATEWAY_READONLY_API_KEY
# Run all five clients concurrently as isolated child processes.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Parallel
# Validate the flow offline (prints commands, contacts no gateway).
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -DryRun
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Endpoint localhost:5000 -ApiKeyEnv MXGATEWAY_API_KEY

When -VerifyWrite is enabled, the write round-trip fails loudly if the write command is rejected, if -WriteAttribute does not name a writable scalar attribute, or if no OnWriteComplete event is observed for the written item within -WriteEchoMaxEvents (default 200) streamed events. Raise -WriteEchoMaxEvents if the gateway's per-session event backlog is large enough to push OnWriteComplete past that bound.
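The pass/fail rule for the write round-trip can be sketched as a bounded scan over streamed events. It uses the family, itemHandle, and value fields the CLIs emit. It assumes the family strings are literally OnWriteComplete and OnDataChange, and that the written value compares equal to the decoded event value. check_write_round_trip is a hypothetical name.

```python
def check_write_round_trip(events, item_handle, written_value, max_events=200):
    """Scan at most max_events streamed events for the written item.

    An OnWriteComplete for the item is required; seeing the written value
    echoed in an OnDataChange is recorded best-effort as echoObserved.
    """
    write_complete = False
    echo_observed = False
    for index, event in enumerate(events):
        if index >= max_events:
            break                       # bound reached, like -WriteEchoMaxEvents
        if event.get("itemHandle") != item_handle:
            continue                    # events for other items do not count
        family = event.get("family")
        if family == "OnWriteComplete":
            write_complete = True
        elif family == "OnDataChange" and event.get("value") == written_value:
            echo_observed = True
        if write_complete and echo_observed:
            break
    if not write_complete:
        raise AssertionError(
            f"no OnWriteComplete for item {item_handle} within {max_events} events")
    return {"writeComplete": True, "echoObserved": echo_observed}
```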

Focused Commands

Run the cross-language smoke matrix tests after changing the documented client smoke command list:

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests

Run the parity fixture matrix tests after changing the integration parity scenario list:

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests

Run the fake worker tests after changing gateway worker IPC, session startup, or event streaming behavior:

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~FakeWorkerHarnessTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~GatewayEndToEndFakeWorkerSmokeTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~WorkerClientTests
dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter FullyQualifiedName~WorkerPipeSessionTests

Run the gateway test project after shared gateway test infrastructure changes:

dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj