rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx

Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two test issues that were unrelated to the rename but
surfaced while validating it:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.
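
The marker check described above can be sketched roughly as follows.
This is an illustration only, not the code from this commit: the helper
names `HasSolutionExtension` and `FindRepositoryRoot` are hypothetical,
and the sketch reads "next to src/" literally, as a solution file in the
same directory as `src/`. If the solution lives inside `src/`, the check
would need to look there instead.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch; the real IsRepositoryRoot is not shown in this commit view.
bool IsRepositoryRoot(string directory)
{
    // A git checkout has a .git directory; worktrees and submodules use a .git file.
    string gitMarker = Path.Combine(directory, ".git");
    if (Directory.Exists(gitMarker) || File.Exists(gitMarker))
    {
        return true;
    }

    // Non-git working copy: require src/ plus a solution file beside it.
    if (!Directory.Exists(Path.Combine(directory, "src")))
    {
        return false;
    }

    return Directory.EnumerateFiles(directory).Any(HasSolutionExtension);
}

// Compare extensions explicitly so ".sln" and ".slnx" are both accepted.
bool HasSolutionExtension(string path)
{
    string extension = Path.GetExtension(path);
    return extension.Equals(".sln", StringComparison.OrdinalIgnoreCase)
        || extension.Equals(".slnx", StringComparison.OrdinalIgnoreCase);
}

// Walk upward from a starting directory until a repository root is found.
string? FindRepositoryRoot(string startDirectory)
{
    DirectoryInfo? current = new DirectoryInfo(startDirectory);
    while (current is not null)
    {
        if (IsRepositoryRoot(current.FullName))
        {
            return current.FullName;
        }

        current = current.Parent;
    }

    return null;
}
```

Checking the extension explicitly, rather than using a `*.sln` search
pattern, avoids depending on platform-specific wildcard behavior.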

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-23 16:22:23 -04:00
parent 867bf18116
commit dc9c0c950c
491 changed files with 32854 additions and 8414 deletions
+5 -5
@@ -1,7 +1,7 @@
# aaAlarmManagedClient discovery — public surface, 2026-05-01
Result of running
`MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface`
`ZB.MOM.WW.MxGateway.Worker.Tests.AlarmClientDiscoveryTests.DumpAlarmClientPublicSurface`
against the deployed AVEVA assembly:
- File:
@@ -68,7 +68,7 @@ list.
## What this means
The architecture comment on
`src/MxGateway.Worker/MxAccess/AlarmClientConsumer.cs` (PR A.5) is
`src/ZB.MOM.WW.MxGateway.Worker/MxAccess/AlarmClientConsumer.cs` (PR A.5) is
**wrong against this deployed assembly**:
> "The AVEVA alarm-manager surface (`IAlarmMgrDataProvider`) exposes
@@ -89,7 +89,7 @@ never gets invoked at runtime. Until A.2 lands a WM_APP pump,
## Live runtime probe — 2026-05-01
`MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages`
`ZB.MOM.WW.MxGateway.Worker.Tests.AlarmClientWmProbeTests.ProbeAlarmClientWmMessages`
is a Skip-gated runtime probe that creates a real message-only
window, calls `AlarmClient.RegisterConsumer(hWnd, …)` +
`Subscribe(@"\Galaxy!", …)`, and pumps for 20s while logging every
@@ -505,7 +505,7 @@ Interop.WNWRAPCONSUMERLib.dll`). The COM class is registered in
Apartment` — `new wwAlarmConsumerClass()` succeeds via
`CoCreateInstance`.
The probe `MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs`
The probe `ZB.MOM.WW.MxGateway.Worker.Tests/WnWrapConsumerProbeTests.cs`
(Skip-gated, archival) drove the captured run. Lifecycle:
1. `new wwAlarmConsumerClass()` — instantiated.
@@ -622,7 +622,7 @@ Replacing `aaAlarmManagedClient.AlarmClient` with
alarm-consumer surface unblocks A.2 fully. Outline:
1. **Reference path:** drop `aaAlarmManagedClient.dll` reference
from `MxGateway.Worker.csproj`; add `Interop.WNWRAPCONSUMERLib.dll`
from `ZB.MOM.WW.MxGateway.Worker.csproj`; add `Interop.WNWRAPCONSUMERLib.dll`
reference from `mxaccessgw/lib/`. (Or commit the interop dll
in-tree under `lib/` and reference relatively.)
2. **`AlarmClientConsumer` → `WnWrapAlarmConsumer`:** rewrite
+16 -18
@@ -107,29 +107,20 @@ The gateway keeps API key state in a dedicated SQLite database. SQLite is suffic
### Connection factory
`AuthSqliteConnectionFactory` reads `GatewayOptions.Authentication.SqlitePath`, ensures the parent directory exists, and opens the connection in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning:
`AuthSqliteConnectionFactory` reads `GatewayOptions.Authentication.SqlitePath`, ensures the parent directory exists, and builds a connection string in `ReadWriteCreate` mode so first-run installations can create the file without manual provisioning. Connection pooling is enabled and the connection string carries a non-zero `DefaultTimeout`:
```csharp
public SqliteConnection CreateConnection()
SqliteConnectionStringBuilder builder = new()
{
string sqlitePath = options.Value.Authentication.SqlitePath;
string? directory = Path.GetDirectoryName(sqlitePath);
if (!string.IsNullOrWhiteSpace(directory))
{
Directory.CreateDirectory(directory);
}
SqliteConnectionStringBuilder builder = new()
{
DataSource = sqlitePath,
Mode = SqliteOpenMode.ReadWriteCreate
};
return new SqliteConnection(builder.ToString());
}
DataSource = sqlitePath,
Mode = SqliteOpenMode.ReadWriteCreate,
Pooling = true,
DefaultTimeout = (int)BusyTimeout.TotalSeconds,
};
```
Every store opens its connection through `OpenConnectionAsync`, which opens the connection and then applies `PRAGMA journal_mode=WAL` and `PRAGMA busy_timeout`. WAL is a persistent database-level setting so re-applying it per connection is a cheap no-op; `busy_timeout` is per-connection state. Because `MarkKeyUsedAsync` runs on every authenticated request and `SqliteApiKeyAuditStore` appends on every denial, this lets concurrent readers and writers retry briefly instead of surfacing `SQLITE_BUSY` as a hard failure on the request path.
### Schema
`SqliteAuthSchema` declares table names and the current schema version as constants. Three tables are involved:
@@ -166,6 +157,8 @@ public static ApiKeyRecord Read(SqliteDataReader reader)
`SqliteApiKeyAdminStore` (`IApiKeyAdminStore`) implements administrative mutations: `CreateAsync` accepts an `ApiKeyCreateRequest`, `RevokeAsync` sets `revoked_utc` only when not already revoked, and `RotateAsync` replaces `secret_hash`, clears `last_used_utc`, and clears `revoked_utc` so a rotated key is immediately usable.
Because `RotateAsync` clears `revoked_utc`, rotating a previously revoked key reactivates it. The dashboard API Keys page therefore offers the Rotate (and Revoke) action only for keys whose status is `Active`; a revoked key shows no actions, so an operator cannot un-revoke a deliberately disabled key as a side effect of a rotation.
### Audit trail
`SqliteApiKeyAuditStore` (`IApiKeyAuditStore`) appends `ApiKeyAuditEntry` values to the `api_key_audit` table and stamps each row with a UTC timestamp inside the store rather than trusting the caller. `ListRecentAsync` returns the most recent rows ordered by `audit_id` descending and projects them into `ApiKeyAuditRecord`. Rows are kept even after the referenced key is revoked because the audit history is the durable record of administrative action; the `key_id` column is nullable to accommodate non-key-scoped events such as `init-db`.
@@ -223,6 +216,10 @@ constraints remain fully unconstrained after migration.
Key ids are restricted by the parser to ASCII letters, digits, periods, and hyphens so they remain safe to embed in the token format and in URL paths used by administrative tooling.
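
As an illustration of that character rule, a validator could look like the sketch below. The function name `IsValidKeyId` is hypothetical, and rejecting an empty id is an assumption of this sketch; the actual parser is not shown here and may apply further limits such as a maximum length.

```csharp
using System;

// Hypothetical sketch of the key-id character rule; not the gateway's parser.
bool IsValidKeyId(string keyId)
{
    // Assumption: an empty id is rejected.
    if (string.IsNullOrEmpty(keyId))
    {
        return false;
    }

    foreach (char c in keyId)
    {
        // Explicit ASCII ranges: char.IsLetterOrDigit would also accept
        // non-ASCII letters, which the rule above excludes.
        bool isAllowed =
            (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') ||
            c == '.' ||
            c == '-';

        if (!isAllowed)
        {
            return false;
        }
    }

    return true;
}
```

Ids such as `ops.reader-01` pass this check, while ids containing `_`, `/`, spaces, or non-ASCII letters do not.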
The CLI is not the only management surface: the dashboard API Keys page
creates, rotates, and revokes keys through the same `IApiKeyAdminStore`. See
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page).
## Scope Serialization
Scopes are persisted as a single TEXT column rather than a join table because the set is small, never queried by membership at the database level, and changes atomically with the owning row. `ApiKeyScopeSerializer.Serialize` writes a JSON array sorted with `StringComparer.Ordinal` so equivalent scope sets produce byte-identical column values, which makes audit diffing and database comparisons deterministic:
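
A minimal sketch of the ordinal-sort idea, assuming `System.Text.Json` as the JSON writer. `SerializeScopes` is a hypothetical stand-in and is not the `ApiKeyScopeSerializer` source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical sketch: sort with ordinal comparison, then write a JSON array.
string SerializeScopes(IEnumerable<string> scopes)
{
    // Ordinal ordering is culture-independent, so the output does not vary by machine locale.
    string[] sorted = scopes
        .OrderBy(scope => scope, StringComparer.Ordinal)
        .ToArray();

    return JsonSerializer.Serialize(sorted);
}
```

Two callers that pass the same scopes in a different order get the same string back, which is what makes column values directly comparable.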
@@ -276,4 +273,5 @@ Singletons are safe because each operation opens its own short-lived `SqliteConn
- [Gateway Configuration](./GatewayConfiguration.md)
- [Authorization](./Authorization.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Diagnostics](./Diagnostics.md)
+37 -13
@@ -8,7 +8,7 @@ what an authenticated API key can browse, read, or write inside the Galaxy.
Authorization runs as a single gRPC server interceptor registered for every call on the gateway. It pulls the authenticated identity for the current request, derives the scope that the request type requires, and either lets the call continue or fails the call with a gRPC status. The pipeline keeps service classes free of cross-cutting checks, which matches the `gateway.md` "thin gRPC layer" rule that service handlers translate between contracts and domain code without owning policy.
The participating types live under `src/MxGateway.Server/Security/Authorization/`:
The participating types live under `src/ZB.MOM.WW.MxGateway.Server/Security/Authorization/`:
- `GatewayGrpcAuthorizationInterceptor` runs the authenticate-then-authorize pipeline for unary and server-streaming calls.
- `GatewayGrpcScopeResolver` maps a request message (and, for `MxCommandRequest`, the inner `MxCommandKind`) to the scope string that must be present on the caller.
@@ -102,12 +102,18 @@ public string ResolveRequiredScope(object request)
CloseSessionRequest => GatewayScopes.SessionClose,
StreamEventsRequest => GatewayScopes.EventsRead,
MxCommandRequest commandRequest => ResolveCommandScope(commandRequest.Command?.Kind ?? MxCommandKind.Unspecified),
AcknowledgeAlarmRequest => GatewayScopes.InvokeWrite,
StreamAlarmsRequest => GatewayScopes.EventsRead,
TestConnectionRequest or
GetLastDeployTimeRequest or
DiscoverHierarchyRequest or
WatchDeployEventsRequest => GatewayScopes.MetadataRead,
_ => GatewayScopes.Admin
};
}
```
The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated.
The `_ => GatewayScopes.Admin` fallback is intentional: any future request type that the resolver does not recognize fails closed, requiring the strongest scope until the resolver is updated. `AcknowledgeAlarm` is treated as a write — it mutates alarm state, mirroring `MxCommandKind.Write*` — and `StreamAlarms` shares the alarm/event surface with `StreamEvents` and `MxCommandKind.DrainEvents`, so it carries `events:read`. Both alarm RPCs are session-less: the scope check is the only authorization gate, since there is no per-session ownership to enforce.
`MxCommandRequest` is special because it multiplexes many MxAccess operations through a single RPC. The resolver inspects the embedded `MxCommandKind` so each operation gets its own scope:
@@ -117,10 +123,14 @@ private static string ResolveCommandScope(MxCommandKind kind)
return kind switch
{
MxCommandKind.Write or
MxCommandKind.Write2 => GatewayScopes.InvokeWrite,
MxCommandKind.Write2 or
MxCommandKind.WriteBulk or
MxCommandKind.Write2Bulk => GatewayScopes.InvokeWrite,
MxCommandKind.WriteSecured or
MxCommandKind.WriteSecured2 or
MxCommandKind.WriteSecuredBulk or
MxCommandKind.WriteSecured2Bulk or
MxCommandKind.AuthenticateUser => GatewayScopes.InvokeSecure,
MxCommandKind.ArchestraUserToId or
@@ -135,7 +145,7 @@ private static string ResolveCommandScope(MxCommandKind kind)
}
```
Reads (`Register`, `AddItem`, `Advise`, and any other unspecified kind) fall through to `InvokeRead`, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown.
Reads (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any other unspecified kind) fall through to `InvokeRead`, which keeps the matrix small while still separating reads from writes, secured writes, metadata lookups, event drains, and worker shutdown. The four bulk-write families (`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`) are mapped explicitly so a missing arm cannot silently demote a bulk write to a read scope.
## Constraint Enforcement
@@ -161,12 +171,25 @@ Glob matching is anchored, case-insensitive, and supports `*` and `?`.
Subtree and tag glob lists are alternatives: matching either list allows that
scope dimension. Empty lists mean unconstrained for that dimension.
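
One way to get anchored, case-insensitive `*` / `?` matching is to translate the glob into a regular expression. The sketch below is an assumption about how such a matcher could be written; `GlobMatches` is a hypothetical name and the gateway's actual matcher may be implemented differently.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of anchored, case-insensitive glob matching.
bool GlobMatches(string pattern, string value)
{
    // Regex.Escape turns '*' into "\*" and '?' into "\?", so every other
    // character is matched literally; the two wildcards are then re-enabled.
    string body = Regex.Escape(pattern)
        .Replace("\\*", ".*")
        .Replace("\\?", ".");

    // \A and \z anchor the match to the whole input.
    string regex = "\\A" + body + "\\z";

    return Regex.IsMatch(
        value,
        regex,
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}
```

With this sketch, `Tank*.PV` matches `tank01.pv`, while `Tank*` does not match `MyTank1` because the pattern is anchored at both ends.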
Constraints are set when a key is created — through the `apikey create-key`
flags (see [Authentication](./Authentication.md)) or the dashboard API Keys
page create dialog (see
[Gateway Dashboard Design](./GatewayDashboardDesign.md#api-keys-page)). The
dashboard API Keys page also renders each key's effective constraints.
The service checks read constraints for `AddItem`, `AddItem2`, `AddItemBulk`,
`SubscribeBulk`, and `AdviseItemBulk`. It checks write constraints for
`Write`, `Write2`, `WriteSecured`, and `WriteSecured2`. Successful item
registrations are tracked per session so later item-handle commands resolve
back to the original tag address. If a constrained key presents an unknown item
handle, the gateway fails closed.
`SubscribeBulk`, `AdviseItemBulk`, and `ReadBulk`. It checks write constraints
for `Write`, `Write2`, `WriteSecured`, `WriteSecured2`, `WriteBulk`,
`Write2Bulk`, `WriteSecuredBulk`, and `WriteSecured2Bulk`. Bulk commands run
through `BulkConstraintPlan` (`ReadBulkConstraintPlan`,
`WriteBulkConstraintPlan`, `SubscribeBulkConstraintPlan`), which preserves the
caller's input order: each entry is evaluated against the constraint surface,
and `BulkConstraintPlan.MergeDeniedInto` re-merges denied entries back into
their original index positions so the reply slot at `entries[i]` always
corresponds to the request slot at `entries[i]`. Successful item registrations
are tracked per session so later item-handle commands resolve back to the
original tag address. If a constrained key presents an unknown item handle,
the gateway fails closed.
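
The order-preserving merge described above can be illustrated with a small sketch. It is not the `BulkConstraintPlan` implementation: `RunBulk`, the `isAllowed` predicate, the `execute` delegate, and the string-based results are hypothetical stand-ins chosen to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: filter denied entries out, execute the rest,
// then put every result back at its original index.
List<string> RunBulk(
    IReadOnlyList<string> tags,
    Func<string, bool> isAllowed,
    Func<IReadOnlyList<string>, IReadOnlyList<string>> execute)
{
    var results = new string[tags.Count];
    var allowedTags = new List<string>();
    var allowedIndexes = new List<int>();

    for (int i = 0; i < tags.Count; i++)
    {
        if (isAllowed(tags[i]))
        {
            // Remember where this entry came from so its reply can be slotted back.
            allowedIndexes.Add(i);
            allowedTags.Add(tags[i]);
        }
        else
        {
            // Denied entries get their failure result immediately, in place.
            results[i] = "denied:" + tags[i];
        }
    }

    IReadOnlyList<string> executed = execute(allowedTags);
    if (executed.Count != allowedTags.Count)
    {
        throw new InvalidOperationException("Executor must return one result per allowed entry.");
    }

    for (int j = 0; j < executed.Count; j++)
    {
        results[allowedIndexes[j]] = executed[j];
    }

    return new List<string>(results);
}
```

Because denied entries never reach the executor, the worker only sees permitted items, yet the caller still receives exactly one result per request slot.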
Non-bulk constraint failures return gRPC `PermissionDenied`. Bulk read
commands preserve input order and return a failed `SubscribeResult` for each
@@ -182,10 +205,10 @@ blocking constraint; secured values and raw credentials are never logged.
|----------|-------|--------------|
| `SessionOpen` | `session:open` | `OpenSessionRequest` |
| `SessionClose` | `session:close` | `CloseSessionRequest` |
| `EventsRead` | `events:read` | `StreamEventsRequest`, `MxCommandKind.DrainEvents` |
| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, and any kind not otherwise mapped) |
| `InvokeWrite` | `invoke:write` | `MxCommandKind.Write`, `MxCommandKind.Write2` |
| `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.AuthenticateUser` |
| `EventsRead` | `events:read` | `StreamEventsRequest`, `StreamAlarmsRequest`, `MxCommandKind.DrainEvents` |
| `InvokeRead` | `invoke:read` | `MxCommandRequest` for read-style command kinds (`Register`, `AddItem`, `Advise`, `ReadBulk`, and any kind not otherwise mapped) |
| `InvokeWrite` | `invoke:write` | `AcknowledgeAlarmRequest`, `MxCommandKind.Write`, `MxCommandKind.Write2`, `MxCommandKind.WriteBulk`, `MxCommandKind.Write2Bulk` |
| `InvokeSecure` | `invoke:secure` | `MxCommandKind.WriteSecured`, `MxCommandKind.WriteSecured2`, `MxCommandKind.WriteSecuredBulk`, `MxCommandKind.WriteSecured2Bulk`, `MxCommandKind.AuthenticateUser` |
| `MetadataRead` | `metadata:read` | `MxCommandKind.ArchestraUserToId`, `MxCommandKind.GetSessionState`, `MxCommandKind.GetWorkerInfo`, `GalaxyRepository.TestConnection`, `GalaxyRepository.GetLastDeployTime`, `GalaxyRepository.DiscoverHierarchy`, `GalaxyRepository.WatchDeployEvents` |
| `Admin` | `admin` | `MxCommandKind.ShutdownWorker`, the default for any unrecognized request type, and the dashboard authorization policy |
@@ -252,6 +275,7 @@ Singleton lifetimes are appropriate because none of the three classes hold per-r
## Related Documentation
- [Authentication](./Authentication.md)
- [Gateway Dashboard Design](./GatewayDashboardDesign.md)
- [Grpc](./Grpc.md)
- [GatewayConfiguration](./GatewayConfiguration.md)
- [Galaxy Repository Browse](./GalaxyRepository.md)
+1 -1
@@ -398,7 +398,7 @@ README.md
examples/
```
Generated code should be reproducible from `src/MxGateway.Contracts/Protos/`.
Generated code should be reproducible from `src/ZB.MOM.WW.MxGateway.Contracts/Protos/`.
Do not hand-edit generated code.
The stable client proto manifest defines the generated-code directories:
+10 -10
@@ -8,7 +8,7 @@ in [Toolchain Links](./ToolchainLinks.md) when a command is missing from
## Shared Inputs
All clients generate bindings from the shared protobuf files under
`src/MxGateway.Contracts/Protos`. Regenerate the published client descriptor
`src/ZB.MOM.WW.MxGateway.Contracts/Protos`. Regenerate the published client descriptor
after changing either `.proto` file or `clients/proto/proto-inputs.json`:
```powershell
@@ -35,37 +35,37 @@ machine boundary or uses a production certificate.
## .NET
The .NET client uses .NET 10 and references
`src/MxGateway.Contracts/MxGateway.Contracts.csproj` for generated C# contract
`src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj` for generated C# contract
types. `clients/dotnet/generated` remains reserved for client-local generator
output if the client later decouples from the contracts project.
Regenerate the generated C# contract types:
```powershell
dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj
dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
```
Build and test from the repository root:
```powershell
dotnet build clients/dotnet/MxGateway.Client.sln
dotnet test clients/dotnet/MxGateway.Client.sln --no-build
dotnet build clients/dotnet/ZB.MOM.WW.MxGateway.Client.sln
dotnet test clients/dotnet/ZB.MOM.WW.MxGateway.Client.sln --no-build
```
Create local package artifacts:
```powershell
$dotnetPackageOutput = Join-Path (Get-Location) 'artifacts/clients/dotnet'
dotnet pack clients/dotnet/MxGateway.Client/MxGateway.Client.csproj -c Release -p:PackageOutputPath="$dotnetPackageOutput"
dotnet publish clients/dotnet/MxGateway.Client.Cli/MxGateway.Client.Cli.csproj -c Release -o artifacts/clients/dotnet/mxgw-dotnet
dotnet pack clients/dotnet/ZB.MOM.WW.MxGateway.Client/ZB.MOM.WW.MxGateway.Client.csproj -c Release -p:PackageOutputPath="$dotnetPackageOutput"
dotnet publish clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli/ZB.MOM.WW.MxGateway.Client.Cli.csproj -c Release -o artifacts/clients/dotnet/mxgw-dotnet
```
Run the CLI from source:
```powershell
dotnet run --project clients/dotnet/MxGateway.Client.Cli -- version --json
dotnet run --project clients/dotnet/MxGateway.Client.Cli -- smoke --endpoint "http://$env:MXGATEWAY_ENDPOINT" --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json
dotnet run --project clients/dotnet/MxGateway.Client.Cli -- smoke --endpoint "https://mxgateway.example.local:5001" --tls --ca-file C:\certs\mxgateway-ca.pem --server-name mxgateway.example.local --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- version --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- smoke --endpoint "http://$env:MXGATEWAY_ENDPOINT" --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json
dotnet run --project clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli -- smoke --endpoint "https://mxgateway.example.local:5001" --tls --ca-file C:\certs\mxgateway-ca.pem --server-name mxgateway.example.local --api-key-env MXGATEWAY_API_KEY --item $env:MXGATEWAY_TEST_ITEM --json
```
## Go
+7 -7
@@ -21,9 +21,9 @@ records:
The source files listed by the manifest are:
- `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`
- `src/MxGateway.Contracts/Protos/mxaccess_worker.proto`
- `src/MxGateway.Contracts/Protos/galaxy_repository.proto`
- `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto`
- `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto`
- `src/ZB.MOM.WW.MxGateway.Contracts/Protos/galaxy_repository.proto`
`mxaccess_gateway.proto` defines the public gRPC service and shared DTOs.
`mxaccess_worker.proto` is included in the descriptor because worker-aware
@@ -86,7 +86,7 @@ issues.
## Language Generation Inputs
All generators use `src/MxGateway.Contracts/Protos` as the protobuf import
All generators use `src/ZB.MOM.WW.MxGateway.Contracts/Protos` as the protobuf import
root. The checked-in descriptor is available when a language build prefers a
descriptor input, but the `.proto` files remain canonical.
@@ -94,7 +94,7 @@ Use these commands to regenerate language-specific client bindings:
| Client | Command |
|--------|---------|
| .NET | `dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj` |
| .NET | `dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj` |
| Go | `Push-Location clients/go; ./generate-proto.ps1; Pop-Location` |
| Rust | `Push-Location clients/rust; cargo check --workspace; Pop-Location` |
| Python | `Push-Location clients/python; ./generate-proto.ps1; Pop-Location` |
@@ -103,10 +103,10 @@ Use these commands to regenerate language-specific client bindings:
.NET generation currently runs through the contracts project:
```powershell
dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj
dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
```
Future .NET client projects may either reference `MxGateway.Contracts` or
Future .NET client projects may either reference `ZB.MOM.WW.MxGateway.Contracts` or
generate client-local files into `clients/dotnet/generated` with `Grpc.Tools`.
Go clients should generate `mxaccess_gateway.proto` and
+49 -7
@@ -6,7 +6,7 @@ recreated by the contracts project build.
## Files
`src/MxGateway.Contracts/Protos/mxaccess_gateway.proto` defines the public
`src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto` defines the public
`MxAccessGateway` gRPC service, command payloads, command replies, event DTOs,
`MxValue`, `MxArray`, and `MxStatusProxy`.
@@ -23,19 +23,61 @@ the corresponding MXAccess `AddItem`, `Advise`, `UnAdvise`, and `RemoveItem`
calls sequentially on the session STA and preserves input order in the result
list.
`src/MxGateway.Contracts/Protos/mxaccess_worker.proto` defines the named-pipe
The command model also includes bulk write/read command kinds:
`WriteBulk`, `Write2Bulk`, `WriteSecuredBulk`, `WriteSecured2Bulk`, and
`ReadBulk`. They are unary `Invoke` payloads on the same `MxAccessGateway`
surface (not separate gRPC methods) and exist so a caller can submit one list
of items per round trip while preserving MXAccess parity per entry.
- `WriteBulkCommand` / `Write2BulkCommand` / `WriteSecuredBulkCommand` /
`WriteSecured2BulkCommand` each carry a `server_handle` and a `repeated`
list of entries (`WriteBulkEntry`, `Write2BulkEntry`,
`WriteSecuredBulkEntry`, `WriteSecured2BulkEntry`). Each entry mirrors the
single-item command shape — `item_handle` + `value` (+ `timestamp_value` on
the `*2` variants, + `current_user_id` / `verifier_user_id` on the secured
variants). All four replies use `BulkWriteReply`, which carries
`repeated BulkWriteResult`. A `BulkWriteResult` has `server_handle`,
`item_handle`, `was_successful`, `optional int32 hresult`, `repeated
MxStatusProxy statuses`, and `error_message`. Per-entry failures populate
`error_message` + `hresult` and never raise — callers iterate and inspect
each entry. The credential-sensitive redaction rules for `WriteSecured` /
`WriteSecured2` apply to every `value` inside `WriteSecuredBulkEntry` and
`WriteSecured2BulkEntry`.
- `ReadBulkCommand` carries `server_handle`, `repeated string tag_addresses`,
and `uint32 timeout_ms` (0 means use the gateway-configured default). The
reply is `BulkReadReply` carrying `repeated BulkReadResult`. A
`BulkReadResult` has `server_handle`, `tag_address`, `item_handle`,
`was_successful`, `was_cached`, `value`, `quality`, `source_timestamp`,
`repeated MxStatusProxy statuses`, and `error_message`. MXAccess has no
synchronous `Read`, so `ReadBulk` is dual-mode per entry: when a tag is
already advised in the session the worker returns the cached
`OnDataChange` payload without touching the subscription
(`was_cached = true`); otherwise the worker takes a full
`AddItem` + `Advise` + wait-for-first-`OnDataChange` + `UnAdvise` +
`RemoveItem` snapshot lifecycle and returns the result
(`was_cached = false`). `BulkReadResult` intentionally has no
`hresult` field, unlike `BulkWriteResult`: `ReadBulk` outcomes are timeout /
cache / lifecycle states rather than MXAccess COM return codes.
See `gateway.md` for the full cached-vs-snapshot `ReadBulk` lifecycle and the
per-command scope requirements, and `docs/DesignDecisions.md` "Bulk Command
Family" for the rationale behind the per-entry result shape (independent
success tracking, input-order preservation, no partial-failure exceptions).
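
Since per-entry failures are reported in the reply rather than raised, a caller typically walks the result list. The sketch below shows that pattern under stated assumptions: the tuple is a hypothetical stand-in for the generated `BulkWriteResult` type (whose C# property names are not shown here), and `DescribeFailures` is an illustrative helper, not part of the client library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BulkWriteResult: item_handle, was_successful,
// optional hresult, error_message. The tuple shape is illustrative only.
List<string> DescribeFailures(
    IEnumerable<(int ItemHandle, bool WasSuccessful, int? Hresult, string ErrorMessage)> results)
{
    var failures = new List<string>();

    foreach (var result in results)
    {
        if (result.WasSuccessful)
        {
            continue;
        }

        // hresult is optional in the contract, so it may be absent on a failure.
        string code = result.Hresult is int hr ? "0x" + hr.ToString("X8") : "n/a";
        failures.Add($"item {result.ItemHandle}: {result.ErrorMessage} (hresult {code})");
    }

    return failures;
}
```

A caller that needs all-or-nothing behavior would have to build it on top of this list, because the bulk reply itself does not fail as a whole.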
`src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto` defines the named-pipe
worker IPC envelope and control messages. It imports
`mxaccess_gateway.proto` so the worker and gateway use the same command, reply,
event, value, and status shapes.
`src/MxGateway.Contracts/Protos/galaxy_repository.proto` defines the
`src/ZB.MOM.WW.MxGateway.Contracts/Protos/galaxy_repository.proto` defines the
`GalaxyRepository` service used by clients to browse the Galaxy Repository
(deployed object hierarchy and dynamic attributes). The service is metadata-
only and does not share types with `mxaccess_gateway.proto`. See
[Galaxy Repository Browse](./GalaxyRepository.md) for the RPC catalog and
behavior.
Generated C# output is written to `src/MxGateway.Contracts/Generated/`. Do not
Generated C# output is written to `src/ZB.MOM.WW.MxGateway.Contracts/Generated/`. Do not
hand-edit generated files.
Client generation inputs are published through
@@ -49,20 +91,20 @@ generation inputs, output directories, and golden protobuf JSON fixtures.
Run the contracts build to regenerate C# protobuf and gRPC code:
```bash
dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj
dotnet build src/ZB.MOM.WW.MxGateway.Contracts/ZB.MOM.WW.MxGateway.Contracts.csproj
```
Run the focused contract tests after changing either `.proto` file:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter ProtobufContractRoundTripTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter ProtobufContractRoundTripTests
```
The full solution build also regenerates the C# contracts before compiling
gateway and test projects:
```bash
dotnet build src/MxGateway.sln
dotnet build src/ZB.MOM.WW.MxGateway.slnx
```
Regenerate the client descriptor after changing either `.proto` file:
+1 -1
@@ -85,7 +85,7 @@ The explicit sequence remains the parity baseline for issue-level validation.
Run the matrix shape tests after changing the smoke matrix:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests
```
Live execution remains a separate opt-in step because it depends on a running
+63
@@ -82,6 +82,18 @@ fan-out may be added later with explicit backpressure semantics.
Rationale: one subscriber preserves simple event ordering and failure behavior
while parity is being proven.
### Alarms — superseded for the alarm subsystem
The single-subscriber rule above no longer applies to alarms. The gateway runs
an always-on central alarm monitor (`GatewayAlarmMonitor`) that owns one
gateway-managed worker session, caches the active-alarm set, and fans it out to
any number of clients through the session-less `StreamAlarms` RPC. Per-session
alarm auto-subscribe is removed; `AcknowledgeAlarm` is session-less and routes
through the monitor. Data-side `StreamEvents` remains one subscriber per
session. Rationale: alarm state is gateway-wide, not session-scoped — every
client wants the same current set plus updates, and forcing each to own a
worker would multiply AVEVA polling load for no benefit.
## Authentication
Decision: API key authentication for the public gateway.
@@ -199,6 +211,57 @@ and failure behavior are easy to compare against direct MXAccess.
Batch tag registration can be added later if measured setup latency requires it.
## Bulk Command Family
Decision: the gateway exposes a fixed set of *bulk* command kinds —
`AddItemBulk`, `AdviseItemBulk`, `RemoveItemBulk`, `UnAdviseItemBulk`,
`SubscribeBulk`, `UnsubscribeBulk`, `WriteBulk`, `Write2Bulk`,
`WriteSecuredBulk`, `WriteSecured2Bulk`, `ReadBulk` — that carry a list of
entries in one round-trip and return one result per entry. Each command kind
runs the corresponding single-item MXAccess COM call sequentially on the
worker STA; per-entry failures populate `was_successful = false` with the
underlying HRESULT and never throw. There is no transactional / fail-fast
semantic — bulk here means "one round-trip, per-entry results", not
"atomic".
Rationale: MXAccess COM itself has no native bulk API for any of these
operations. Surfacing the per-entry result list keeps parity transparent —
the caller sees the same per-item HRESULT they would see calling MXAccess
N times directly — while the bulk shape collapses the gateway/IPC overhead
to one round-trip per batch and lets the worker keep the STA hot.
`ReadBulk` is the only bulk command without a 1:1 MXAccess analogue. Two
choices were considered:
1. **Cache-then-snapshot** (chosen): when a requested tag is already in the
session's item registry AND advised, the worker returns the last cached
`OnDataChange` value without touching the subscription
(`was_cached = true`). Otherwise it takes the full `AddItem + Advise +
wait-for-first-OnDataChange + UnAdvise + RemoveItem` lifecycle itself
(`was_cached = false`) and leaves the session exactly as it was before
the call. The cache lives on a per-session `MxAccessValueCache`,
populated by `MxAccessBaseEventSink` on every `OnDataChange` after the
event clears the outbound queue.
2. **Always-snapshot**: take the AddItem-through-RemoveItem lifecycle for
every requested tag. Cleaner conceptually but pays the full lifecycle
cost on every call and would interfere with existing subscriptions if
MXAccess reuses item handles.
The chosen behavior matches what callers actually want from "current
value" — a free read of an already-streaming tag, and a one-shot snapshot
otherwise — and never disturbs subscriptions the caller did not create.
The decision intentionally does NOT synthesize an `OnDataChange` event
from the snapshot path: the snapshot value reaches the caller through
`ReadBulk`'s reply payload only, not through the event stream. This
preserves the "Don't synthesize events" rule that scopes the rest of the
worker.
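
The cache-then-snapshot choice reduces to a small per-tag decision, sketched below. This is a simplified illustration, not the worker code: `ReadOne`, `advisedValues`, and `takeSnapshot` are hypothetical names, the dictionary stands in for the per-session `MxAccessValueCache`, and the delegate stands in for the full `AddItem` through `RemoveItem` lifecycle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the per-tag decision behind ReadBulk.
(string Value, bool WasCached) ReadOne(
    string tagAddress,
    IReadOnlyDictionary<string, string> advisedValues,
    Func<string, string> takeSnapshot)
{
    // Already advised: return the last cached value and leave the subscription alone.
    if (advisedValues.TryGetValue(tagAddress, out var cached))
    {
        return (cached, true);
    }

    // Not advised: run a one-shot snapshot. The value is returned to the
    // caller only; nothing is pushed onto the event stream.
    return (takeSnapshot(tagAddress), false);
}
```

The snapshot delegate is only invoked on a cache miss, which mirrors the point above that an already-streaming tag costs nothing extra to read.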
`ReadBulk`'s wait loop pumps Windows messages on the worker STA
(`StaRuntime.PumpPendingMessages`) on every poll iteration so the inbound
MXAccess COM event can dispatch while the bulk executor still holds the
thread. Without the pump, the `OnDataChange` event would never be delivered.
## Graceful Worker Shutdown
Decision: best-effort cleanup before COM release.
+4 -4
@@ -1,6 +1,6 @@
# Gateway Diagnostics
The diagnostics subsystem provides structured logging, credential redaction, and request-scoped log enrichment for the gateway. It lives under `src/MxGateway.Server/Diagnostics/` and is wired into the ASP.NET Core pipeline so every gRPC and HTTP request carries the same correlation fields.
The diagnostics subsystem provides structured logging, credential redaction, and request-scoped log enrichment for the gateway. It lives under `src/ZB.MOM.WW.MxGateway.Server/Diagnostics/` and is wired into the ASP.NET Core pipeline so every gRPC and HTTP request carries the same correlation fields.
## Goals
@@ -162,7 +162,7 @@ public static IApplicationBuilder UseGatewayRequestLoggingScope(this IApplicatio
{
ILogger logger = context.RequestServices
.GetRequiredService<ILoggerFactory>()
.CreateLogger("MxGateway.Request");
.CreateLogger("ZB.MOM.WW.MxGateway.Request");
using IDisposable? scope = logger.BeginGatewayScope(new GatewayLogScope(
SessionId: ReadHeader(context, SessionIdHeaderName),
@@ -188,7 +188,7 @@ The scope is keyed off four custom headers and the standard `authorization` head
The numeric headers use `int.TryParse` and `ulong.TryParse`; missing or unparseable values become `null` and are dropped by `GatewayLogScope.ToDictionary`. This keeps the middleware tolerant of clients that do not yet emit every header, which matters because the earliest call in a session (`OpenSession`) has no `SessionId` to send.
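A minimal sketch of the tolerant parsing described above, assuming nullable results that are simply omitted from the scope dictionary. Only `int.TryParse` and `ulong.TryParse` come from the text; `ParseInt`, `ParseULong`, `ToScope`, and the `WorkerId` / `Sequence` field names are hypothetical and stand in for `GatewayLogScope.ToDictionary`.

```csharp
using System;
using System.Collections.Generic;

// Missing or unparseable header values become null instead of throwing.
int? ParseInt(string? raw) => int.TryParse(raw, out var v) ? v : (int?)null;
ulong? ParseULong(string? raw) => ulong.TryParse(raw, out var v) ? v : (ulong?)null;

// Hypothetical equivalent of GatewayLogScope.ToDictionary: null fields are dropped.
Dictionary<string, object> ToScope(string? sessionId, int? workerId, ulong? sequence)
{
    var fields = new Dictionary<string, object>();
    if (!string.IsNullOrEmpty(sessionId)) fields["SessionId"] = sessionId;
    if (workerId.HasValue) fields["WorkerId"] = workerId.Value;
    if (sequence.HasValue) fields["Sequence"] = sequence.Value;
    return fields;
}

// An OpenSession-style call: no session id yet, one numeric header unparseable.
var logScope = ToScope(null, ParseInt("7"), ParseULong("not-a-number"));
Console.WriteLine(string.Join(", ", logScope.Keys));
```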
The logger category is `MxGateway.Request`, which lets operators filter the request scope events independently from per-component categories.
The logger category is `ZB.MOM.WW.MxGateway.Request`, which lets operators filter the request scope events independently from per-component categories.
### Pipeline ordering
@@ -213,7 +213,7 @@ The order matters: putting the logging scope first ensures that authentication f
- `GatewayLogScope.ToDictionary` redacts `ClientIdentity` whenever a scope is materialized.
- `DashboardRedactor.Redact` delegates to `RedactClientIdentity` for any value containing the `mxgw_` marker, then falls back to a marker-keyword check for fields like `password` or `token`. This keeps dashboard renders aligned with log redaction.
- `MxGateway.Tests/Diagnostics/GatewayLogRedactorTests.cs` covers each redaction branch, including the assertion that `WriteSecured` values stay redacted even when `valueLoggingEnabled` is true.
- `ZB.MOM.WW.MxGateway.Tests/Diagnostics/GatewayLogRedactorTests.cs` covers each redaction branch, including the assertion that `WriteSecured` values stay redacted even when `valueLoggingEnabled` is true.
## Related Documentation
+91 -19
@@ -2,7 +2,7 @@
The gateway exposes a read-only browse surface over the AVEVA System Platform
Galaxy Repository (the SQL Server database named `ZB`). Clients use it to
enumerate the deployed object hierarchy and each object's dynamic attributes
enumerate the deployed object hierarchy and each object's attributes
before subscribing to runtime values via the existing `MxAccessGateway` RPCs.
This is a metadata layer: it never reads or writes runtime tag values, never
@@ -19,20 +19,22 @@ ArchestrA IDE renders the deployment tree. Surfacing that data over gRPC lets
remote clients build a navigable address space without any coupling to the
COM layer or the host platform.
The query bodies are kept byte-for-byte identical to the equivalent OPC UA
server in the OtOpcUa project so the two consumers see the same row sets.
`HierarchySql` is the object-hierarchy query originally ported from the
equivalent OPC UA server in the OtOpcUa project. `AttributesSql` has since
diverged from OtOpcUa — see [Built-in vs configured attributes](#built-in-vs-configured-attributes)
— and is no longer kept in sync with it.
## RPC Surface
The service is defined in
`src/MxGateway.Contracts/Protos/galaxy_repository.proto` under package
`src/ZB.MOM.WW.MxGateway.Contracts/Protos/galaxy_repository.proto` under package
`galaxy_repository.v1`.
| RPC | Purpose |
|-----|---------|
| `TestConnection` | Connectivity probe. Returns `{ ok: bool }` after a `SELECT 1`. Does not throw on SQL failure — returns `ok = false`. Always hits SQL directly so it remains a true health check. |
| `GetLastDeployTime` | Returns the cached `galaxy.time_of_last_deploy`. Served from the shared hierarchy cache; refreshed in the background. |
| `DiscoverHierarchy` | Returns one page of the deployed hierarchy plus each returned object's dynamic attributes. **Served from cache** — see [Hierarchy Cache](#hierarchy-cache). |
| `DiscoverHierarchy` | Returns one page of the deployed hierarchy plus each returned object's attributes (configured and built-in — see [Built-in vs configured attributes](#built-in-vs-configured-attributes)). **Served from cache** — see [Hierarchy Cache](#hierarchy-cache). |
| `WatchDeployEvents` | **Server-streaming.** The server emits the current state immediately on subscribe (so clients can bootstrap without waiting), then emits one event per detected deploy change. See [Deploy Notifications](#deploy-notifications). |
`DiscoverHierarchy` is a paged unary RPC. The raw request accepts `page_size`
@@ -53,7 +55,7 @@ reports the post-filter count.
## Hierarchy Cache
The gateway holds a single shared `IGalaxyHierarchyCache`
(`src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`) — every
(`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`) — every
`DiscoverHierarchy` and `GetLastDeployTime` request reads from this cache
rather than hitting SQL. Many clients can browse concurrently with at most
one SQL query in flight.
@@ -87,10 +89,40 @@ load to complete before returning. If the first load fails or times out,
the client gets `Unavailable` with a short reason. Once any load completes
(success or failure), this wait is skipped on subsequent calls.
### On-disk snapshot
The gateway may lose connectivity to the Galaxy database — and the database is
often unreachable right when the gateway itself restarts. To keep browse
working across that gap, the cache persists its dataset to disk:
- After every successful **heavy** refresh (a deploy change), the raw
hierarchy and attribute rowsets are written to
`MxGateway:Galaxy:SnapshotCachePath`
(default `C:\ProgramData\MxGateway\galaxy-snapshot.json`). The write is
atomic — a temp file plus rename — so a crash mid-write cannot corrupt the
snapshot. Cheap no-change ticks write nothing; the file is already current.
- On the **first** refresh after startup, before any SQL runs, the cache
reloads that file. The restored data is served with `Stale` status —
it is last-known data, not live — so clients can browse immediately even
when the Galaxy database is unreachable.
- The first live query then reconciles: if it observes the **same**
`time_of_last_deploy` the snapshot was saved at, the entry is promoted to
`Healthy` with no heavy re-query (the snapshot is provably current); if it
observes a newer deploy, the heavy queries run and replace the snapshot; if
the database is still unreachable, the entry stays `Stale`.
`is_alarm` / `is_historized` filters, paging, and the dashboard summary all
work against a restored snapshot exactly as against a live pull — the restore
path runs the same materialization. Persistence is disabled by setting
`MxGateway:Galaxy:PersistSnapshot` to `false`; the snapshot file is then
neither written nor read, and a cold start with an unreachable database comes
up `Unavailable` as before. The on-disk file is a cache, not a system of
record: deleting it only forces the next cold start to wait for live SQL.
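The write and reconcile rules above can be sketched as follows. This is an illustration under stated assumptions: `SaveAtomically` and `Reconcile` are hypothetical names, the `.tmp` suffix and the JSON shape are invented for the example, and treating any differing deploy time as "newer" is a simplification. `File.Move` with `overwrite: true` requires .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Temp file plus rename: readers see either the old snapshot or the new one,
// never a partially written file.
void SaveAtomically<T>(string path, T snapshot)
{
    string temp = path + ".tmp";
    File.WriteAllText(temp, JsonSerializer.Serialize(snapshot));
    File.Move(temp, path, overwrite: true);
}

// Decide what the first live query does with a restored snapshot.
// liveDeployTime is null when the Galaxy database is still unreachable.
string Reconcile(DateTime snapshotDeployTime, DateTime? liveDeployTime)
{
    if (liveDeployTime is null) return "Stale";                        // keep serving last-known data
    if (liveDeployTime.Value == snapshotDeployTime) return "Healthy";  // promote, no heavy re-query
    return "Reload";                                                   // run heavy queries, replace snapshot
}

string snapshotPath = Path.Combine(Path.GetTempPath(), $"galaxy-snapshot-{Guid.NewGuid():N}.json");
var deployedAt = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
SaveAtomically(snapshotPath, new { TimeOfLastDeploy = deployedAt, Objects = new[] { "TestChildObject" } });

Console.WriteLine(Reconcile(deployedAt, null));                    // Stale
Console.WriteLine(Reconcile(deployedAt, deployedAt));              // Healthy
Console.WriteLine(Reconcile(deployedAt, deployedAt.AddHours(1)));  // Reload
File.Delete(snapshotPath);
```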
## Deploy Notifications
`WatchDeployEvents` is a server-streaming RPC backed by
`IGalaxyDeployNotifier` (`src/MxGateway.Server/Galaxy/GalaxyDeployNotifier.cs`).
`IGalaxyDeployNotifier` (`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyDeployNotifier.cs`).
The notifier maintains a private bounded channel per subscriber so a slow
client cannot back-pressure other subscribers or the publisher.
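A minimal sketch of a per-subscriber bounded fan-out using `System.Threading.Channels`. The names `subscribers`, `Subscribe`, and `Publish` are hypothetical, and the `DropOldest` full mode is an assumption made for illustration; the text above states only that each subscriber has a private bounded channel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;

var subscribers = new ConcurrentDictionary<Guid, Channel<string>>();

// Each subscriber gets its own bounded channel.
ChannelReader<string> Subscribe(out Guid id, int capacity = 16)
{
    var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
    {
        // Assumption: when a subscriber falls behind, its oldest event is dropped.
        FullMode = BoundedChannelFullMode.DropOldest,
        SingleReader = true,
    });
    id = Guid.NewGuid();
    subscribers[id] = channel;
    return channel.Reader;
}

// TryWrite never waits in DropOldest mode, so one slow reader cannot stall
// the publisher or the other subscribers.
void Publish(string deployEvent)
{
    foreach (var channel in subscribers.Values)
    {
        channel.Writer.TryWrite(deployEvent);
    }
}

var slow = Subscribe(out _, capacity: 2);
var fast = Subscribe(out _, capacity: 2);

Publish("deploy-1");
while (fast.TryRead(out var seen)) Console.WriteLine($"fast: {seen}");
Publish("deploy-2");
Publish("deploy-3");

// The slow reader never drained, so it holds at most `capacity` events.
while (slow.TryRead(out var late)) Console.WriteLine($"slow: {late}");
```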
@@ -176,6 +208,43 @@ message DiscoverHierarchyReply {
}
```
### Built-in vs configured attributes
Each `GalaxyObject` carries two kinds of attribute, both surfaced the same way
in the `attributes` list:
- **Configured (dynamic) attributes** — attributes added in the ArchestrA IDE
attribute editor. Stored in the Galaxy `dynamic_attribute` table.
- **Built-in attributes** — attributes every object inherits from its
primitives: the object framework, the engine/platform primitives, and the
per-attribute extensions (Alarm, History, Boolean, …). Stored in
`attribute_definition` and reached through `primitive_instance`.
Built-in attributes are why an `AppEngine` or `WinPlatform` object reports its
`Engine.*` and `Alarm*` attributes, and why an alarmed attribute such as
`TestAlarm001` reports its extension leaves `TestAlarm001.Acked`,
`TestAlarm001.AckMsg`, `TestAlarm001.ActiveAlarmState`, and so on. An earlier
version of the browse query returned only configured attributes, so those
objects came back empty or partial; including built-ins makes the browse
surface match what System Platform's own Object Viewer shows. Expect roughly
seven times as many attributes as configured-only — the dashboard attribute
count reflects this.
Two rules govern the built-in rows:
- **No category filter.** `attribute_definition` uses a different
`mx_attribute_category` numbering than `dynamic_attribute`, so only the
`_`-prefixed-name and `.Description` exclusions apply to built-ins. (The
configured-attribute category allow-list is unchanged.)
- **`is_historized` / `is_alarm` are always `false` for built-in rows.** Those
flags identify a configured attribute that *anchors* a history or alarm
extension (e.g. `TestAlarm001`), not the extension's machinery leaves
(`TestAlarm001.Acked`). `alarm_bearing_only` and `historized_only` therefore
still select the anchor attributes, not their built-in children.
When a configured attribute and a built-in attribute resolve to the same
reference, the configured attribute wins.
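The merge rules above can be illustrated in memory. This sketch is not the `AttributesSql` query: `Merge`, the tuple row shape, and the sample references are hypothetical, and applying the `_`-prefix exclusion to the whole reference string is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: (Reference, IsAlarm, IsHistorized).
List<(string Reference, bool IsAlarm, bool IsHistorized)> Merge(
    IEnumerable<(string Reference, bool IsAlarm, bool IsHistorized)> configured,
    IEnumerable<string> builtInReferences)
{
    // Configured attributes go in first so they win on a reference collision.
    var merged = configured.ToDictionary(a => a.Reference);

    foreach (var reference in builtInReferences)
    {
        // Only the name-based exclusions apply to built-ins (no category filter).
        if (reference.StartsWith('_') || reference.EndsWith(".Description", StringComparison.Ordinal))
        {
            continue;
        }

        // Built-in rows never carry the alarm or historized flags.
        merged.TryAdd(reference, (reference, false, false));
    }

    return merged.Values.OrderBy(a => a.Reference, StringComparer.Ordinal).ToList();
}

var rows = Merge(
    new[] { (Reference: "TestAlarm001", IsAlarm: true, IsHistorized: false) },
    new[] { "TestAlarm001", "TestAlarm001.Acked", "_InternalState", "TestAlarm001.Description" });

foreach (var row in rows)
{
    Console.WriteLine($"{row.Reference} is_alarm={row.IsAlarm}");
}
```

With this input, an `alarm_bearing_only`-style filter on `IsAlarm` would select only the `TestAlarm001` anchor, not its `TestAlarm001.Acked` leaf.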
### Contained name vs tag name
Galaxy objects carry two names. `tag_name` is globally unique and is what
@@ -201,7 +270,7 @@ fields cannot express null. Use it to distinguish "no dimension reported" from
```text
gRPC client(s)
-> GalaxyRepositoryGrpcService (src/MxGateway.Server/Grpc/)
-> GalaxyRepositoryGrpcService (src/ZB.MOM.WW.MxGateway.Server/Grpc/)
DiscoverHierarchy, GetLastDeployTime -> IGalaxyHierarchyCache.Current
WatchDeployEvents -> IGalaxyDeployNotifier
TestConnection -> GalaxyRepository (direct SQL)
@@ -218,29 +287,30 @@ GalaxyHierarchyRefreshService (BackgroundService)
Component breakdown:
- `GalaxyRepository` (`src/MxGateway.Server/Galaxy/GalaxyRepository.cs`) holds
the SQL. Its constants `HierarchySql` and `AttributesSql` are copied verbatim
from the OtOpcUa project; do not edit them in isolation here. The two
queries walk template-derivation and package-derivation chains via
recursive CTEs and pick the most-derived attribute override per object.
- `GalaxyRepository` (`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyRepository.cs`) holds
the SQL. Both `HierarchySql` and `AttributesSql` walk template-derivation and
package-derivation chains via recursive CTEs and pick the most-derived
override per object. `HierarchySql` still matches the OtOpcUa original;
`AttributesSql` does not — it additionally enumerates built-in primitive
attributes (see [Built-in vs configured attributes](#built-in-vs-configured-attributes)).
- `GalaxyHierarchyCache`
(`src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`) holds the most
(`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`) holds the most
recent immutable `GalaxyHierarchyCacheEntry` (materialized objects +
precomputed dashboard summary + counts + status). All gRPC clients share the
same entry.
- `GalaxyHierarchyRefreshService`
(`src/MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs`) is a
(`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyHierarchyRefreshService.cs`) is a
hosted `BackgroundService` that drives `RefreshAsync` on the configured
interval, with deploy-time gating to avoid unnecessary heavy queries.
- `GalaxyDeployNotifier`
(`src/MxGateway.Server/Galaxy/GalaxyDeployNotifier.cs`) is a thin
(`src/ZB.MOM.WW.MxGateway.Server/Galaxy/GalaxyDeployNotifier.cs`) is a thin
per-subscriber-channel fan-out for streaming clients.
- `GalaxyProtoMapper`
(`src/MxGateway.Server/Grpc/GalaxyProtoMapper.cs`) converts row models to
(`src/ZB.MOM.WW.MxGateway.Server/Grpc/GalaxyProtoMapper.cs`) converts row models to
proto messages. Used by the cache during refresh to materialize the reply
once.
- `GalaxyRepositoryGrpcService`
(`src/MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs`) implements
(`src/ZB.MOM.WW.MxGateway.Server/Grpc/GalaxyRepositoryGrpcService.cs`) implements
the four RPCs.
## Configuration
@@ -251,6 +321,8 @@ Bound to `MxGateway:Galaxy` via `GalaxyRepositoryOptions`.
|--------|---------|-------------|
| `MxGateway:Galaxy:ConnectionString` | `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` | SQL Server connection string for the Galaxy Repository. Integrated Security against `localhost` is the dev default; production deployments should override this through the standard double-underscore environment variable form, e.g. `MxGateway__Galaxy__ConnectionString`. |
| `MxGateway:Galaxy:CommandTimeoutSeconds` | `60` | Per-command SQL timeout. Applies to all Galaxy browse RPCs. |
| `MxGateway:Galaxy:PersistSnapshot` | `true` | Persists each successful browse dataset to disk and reloads it at startup. See [On-disk snapshot](#on-disk-snapshot). |
| `MxGateway:Galaxy:SnapshotCachePath` | `C:\ProgramData\MxGateway\galaxy-snapshot.json` | File path for the persisted browse snapshot. Ignored when `PersistSnapshot` is `false`. |
The connection string is not treated as a secret in dev (`Integrated
Security`), but production deployments that use SQL authentication should set
@@ -306,7 +378,7 @@ that as a yellow or red status badge plus the truncated error.
- Failures to reach the Galaxy database surface as `Unavailable`. Detailed
SQL exceptions are logged at `Warning` and never returned to clients.
- Integration tests live in
`src/MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs`. Set
`src/ZB.MOM.WW.MxGateway.IntegrationTests/Galaxy/GalaxyRepositoryLiveTests.cs`. Set
`MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1` (and optionally
`MXGATEWAY_LIVE_GALAXY_CONN`) to run them; otherwise they skip.
+25 -3
@@ -19,7 +19,7 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
"RunMigrationsOnStartup": true
},
"Worker": {
"ExecutablePath": "src\\MxGateway.Worker\\bin\\x86\\Release\\MxGateway.Worker.exe",
"ExecutablePath": "src\\ZB.MOM.WW.MxGateway.Worker\\bin\\x86\\Release\\ZB.MOM.WW.MxGateway.Worker.exe",
"WorkingDirectory": null,
"RequiredArchitecture": "X86",
"StartupTimeoutSeconds": 30,
@@ -60,7 +60,15 @@ paths, timeouts, queue sizes, enum values, or protocol values are invalid.
"Galaxy": {
"ConnectionString": "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;",
"CommandTimeoutSeconds": 60,
"DashboardRefreshIntervalSeconds": 30
"DashboardRefreshIntervalSeconds": 30,
"PersistSnapshot": true,
"SnapshotCachePath": "C:\\ProgramData\\MxGateway\\galaxy-snapshot.json"
},
"Alarms": {
"Enabled": false,
"SubscriptionExpression": "",
"DefaultArea": "",
"ReconcileIntervalSeconds": 30
}
}
}
@@ -86,7 +94,7 @@ When `Mode` is `ApiKey`, `SqlitePath` and `PepperSecretName` must be present.
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Worker:ExecutablePath` | `src\MxGateway.Worker\bin\x86\Release\MxGateway.Worker.exe` | Path to the x86 worker executable launched for each gateway session. The path must be valid and point to a `.exe` file. |
| `MxGateway:Worker:ExecutablePath` | `src\ZB.MOM.WW.MxGateway.Worker\bin\x86\Release\ZB.MOM.WW.MxGateway.Worker.exe` | Path to the x86 worker executable launched for each gateway session. The path must be valid and point to a `.exe` file. |
| `MxGateway:Worker:WorkingDirectory` | `null` | Optional working directory for the worker process. When set, it must be a valid filesystem path. |
| `MxGateway:Worker:RequiredArchitecture` | `X86` | Required Portable Executable architecture for the worker. Supported values are `X86` and `X64`; MXAccess parity uses `X86`. |
| `MxGateway:Worker:StartupTimeoutSeconds` | `30` | Total startup budget for process launch, startup probe, pipe connect, handshake, and worker readiness. |
@@ -164,10 +172,24 @@ at startup.
| `MxGateway:Galaxy:ConnectionString` | `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` | SQL Server connection string for the Galaxy Repository (`ZB`) used by the `GalaxyRepository` browse RPCs. Override in production via `MxGateway__Galaxy__ConnectionString`. |
| `MxGateway:Galaxy:CommandTimeoutSeconds` | `60` | Per-command SQL timeout for all Galaxy browse RPCs. |
| `MxGateway:Galaxy:DashboardRefreshIntervalSeconds` | `30` | Interval between background refreshes of the dashboard Galaxy summary cache. SQL is hit at most once per interval regardless of dashboard render rate. |
| `MxGateway:Galaxy:PersistSnapshot` | `true` | Persists the latest successful Galaxy browse dataset to disk. When `true`, the cache reloads that snapshot at startup so clients can still browse last-known data while the Galaxy database is unreachable. The restored data is served with `Stale` status until a live query confirms it. |
| `MxGateway:Galaxy:SnapshotCachePath` | `C:\ProgramData\MxGateway\galaxy-snapshot.json` | File path for the persisted Galaxy browse snapshot. Ignored when `PersistSnapshot` is `false`. The snapshot is written atomically (temp file plus rename). |
See [Galaxy Repository Browse](./GalaxyRepository.md) for the RPC surface and
behavior.
## Alarm Options
| Option | Default | Description |
|--------|---------|-------------|
| `MxGateway:Alarms:Enabled` | `false` | Gates the gateway's always-on central alarm monitor. When `true`, the gateway opens one gateway-owned worker session dedicated to alarms, caches the active-alarm set, and fans it out to every client through the `StreamAlarms` RPC — no client opens its own session to see alarms. |
| `MxGateway:Alarms:SubscriptionExpression` | _(empty)_ | AVEVA alarm-subscription expression the monitor subscribes on startup, in canonical `\\<machine>\Galaxy!<area>` form. The literal `Galaxy` provider is correct regardless of the Galaxy database name. When empty and `Enabled` is `true`, the gateway falls back to `\\<MachineName>\Galaxy!<DefaultArea>` if `DefaultArea` is set. |
| `MxGateway:Alarms:DefaultArea` | _(empty)_ | Area name used to compose a default subscription when `SubscriptionExpression` is empty. If both are empty while `Enabled` is `true`, the monitor faults with a configuration diagnostic. |
| `MxGateway:Alarms:ReconcileIntervalSeconds` | `30` | How often the monitor reconciles its in-process alarm cache against the worker's authoritative active-alarm snapshot, catching transitions the live poll-and-diff feed missed. Floored at 5 seconds. |
The alarm monitor is independent of client sessions: `AcknowledgeAlarm` and
`StreamAlarms` are session-less RPCs served by the monitor.
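The fallback and floor rules in the table above can be sketched as follows. `ResolveSubscription` and `EffectiveReconcileSeconds` are hypothetical helper names, and the exception type and message are placeholders for whatever configuration diagnostic the monitor actually raises.

```csharp
using System;

// An explicit expression wins; otherwise compose \\<MachineName>\Galaxy!<DefaultArea>.
string ResolveSubscription(string? expression, string? defaultArea, string machineName)
{
    if (!string.IsNullOrWhiteSpace(expression)) return expression;
    if (!string.IsNullOrWhiteSpace(defaultArea)) return $@"\\{machineName}\Galaxy!{defaultArea}";

    // Both empty while alarms are enabled: a configuration fault, not a silent no-op.
    throw new InvalidOperationException(
        "Alarms are enabled but neither SubscriptionExpression nor DefaultArea is set.");
}

// ReconcileIntervalSeconds is floored at 5 seconds.
int EffectiveReconcileSeconds(int configuredSeconds) => Math.Max(5, configuredSeconds);

Console.WriteLine(ResolveSubscription("", "Area1", "HOST01"));   // \\HOST01\Galaxy!Area1
Console.WriteLine(EffectiveReconcileSeconds(2));                 // 5
Console.WriteLine(EffectiveReconcileSeconds(30));                // 30
```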
## Related Documentation
- [Gateway Process Detailed Design](./GatewayProcessDesign.md)
+103 -5
@@ -34,7 +34,7 @@ SignalR circuit. Bootstrap is sufficient for a basic dashboard.
## Hosting Model
The dashboard is hosted by `MxGateway.Server` alongside the gRPC API. When
The dashboard is hosted by `ZB.MOM.WW.MxGateway.Server` alongside the gRPC API. When
`MxGateway:Dashboard:Enabled` is `true`, `MapGatewayDashboard()` maps the
configured `Dashboard:PathBase` to the Blazor Server app and maps the login,
logout, and access-denied HTTP endpoints beside it. When dashboard hosting is
@@ -49,6 +49,7 @@ Endpoint layout:
/dashboard/workers
/dashboard/events
/dashboard/galaxy
/dashboard/apikeys
/dashboard/settings
/dashboard/_blazor
```
@@ -68,7 +69,7 @@ dashboard as the default web page. Otherwise leave gRPC/API hosting unaffected.
## High-Level Components
```text
MxGateway.Server
ZB.MOM.WW.MxGateway.Server
Dashboard/
Components/
App.razor
@@ -83,6 +84,7 @@ MxGateway.Server
SessionDetailsPage.razor
WorkersPage.razor
EventsPage.razor
ApiKeysPage.razor
SettingsPage.razor
Shared/
MetricCard.razor
@@ -91,6 +93,9 @@ MxGateway.Server
DashboardSnapshotService.cs
DashboardAuthorizationHandler.cs
DashboardAuthenticator.cs
DashboardApiKeyAuthorization.cs
DashboardApiKeyManagementService.cs
DashboardApiKeySummary.cs
DashboardSnapshot.cs
DashboardSessionSummary.cs
DashboardWorkerSummary.cs
@@ -249,6 +254,99 @@ Show aggregate event diagnostics:
Do not display full tag values by default. If value display is later added, make
it opt-in and redacted.
### Browse page
`/dashboard/browse` lets an operator explore the Galaxy tag hierarchy and watch
live values. The tree is built in-process by `DashboardBrowseTreeBuilder` from
`IGalaxyHierarchyCache.Current` — the same cache the Galaxy page reads — so a
render costs no gRPC call and no SQL round-trip. Each node shows its child
objects and, when expanded, its attributes with attribute name, data type
(including array dimension), and the alarm / historized flags. Galaxy SQL
carries no attribute description, so none is shown. A filter box switches the
tree to a flat list of matching attributes.
Right-clicking an attribute (or double-clicking it) adds it to the subscription
panel. The panel shows each subscribed tag's live value, MXAccess data type,
quality and source timestamp, refreshed every two seconds. The subscription
panel is the explicit opt-in tag-value surface: it always shows values
regardless of `Dashboard:ShowTagValues`, which continues to govern only the
diagnostic session/worker views.
### Alarms page
`/dashboard/alarms` lists the alarms the gateway's central alarm monitor
currently holds as Active or ActiveAcked, refreshed every three seconds. It
defaults to showing unacknowledged `Active` alarms; filters add acknowledged
alarms and narrow by area, severity range, and a reference/source/description
text search. Cleared alarms are not retained — the gateway holds no
alarm-history store, so the page reflects only the live active set. The page is
read-only; it does not acknowledge alarms. If `MxGateway:Alarms:Enabled` is
false the central monitor never starts, and the page says so instead of showing
an empty list with no explanation.
### Live data source
Both the Browse subscription panel and the Alarms page read live MXAccess data
through `IDashboardLiveDataService` (`DashboardLiveDataService`). For tag data
it owns one shared gateway session for the whole dashboard, opened lazily on
first use via `ISessionManager` and re-opened transparently when it faults or
its lease expires. One session means one worker process backs every dashboard
circuit; all access is serialised so the worker sees one in-flight command at a
time. Tag reads go through `GatewaySession.SubscribeBulkAsync` / `ReadBulkAsync`.
The Alarms page does **not** use the dashboard session: alarm data comes from
the gateway's always-on central monitor. `QueryAlarmsAsync` reads
`IGatewayAlarmService.CurrentAlarms` — the monitor's in-process cache — so the
dashboard sees the same active-alarm set as every `StreamAlarms` client, with
no per-dashboard alarm subscription. When `MxGateway:Alarms:Enabled` is false
the monitor never starts and the cache stays empty.
### API keys page
`/dashboard/apikeys` lists the gateway's API keys and, for authorized
operators, manages them. It reads key metadata through the same
`IApiKeyAdminStore` the `apikey` CLI uses, so the dashboard and the CLI act
on one source of truth.
The table shows one row per key:
- key id,
- status (`Active` or `Revoked`),
- display name,
- scopes,
- constraints (rendered as `unconstrained` when none are set),
- created timestamp,
- last-used timestamp.
Key secrets are never listed. Only the peppered hash is stored, and the page
never reconstructs a key. See [Authorization](./Authorization.md#constraint-enforcement)
for what each constraint means and how it is enforced on the gRPC path.
#### Management actions
Create, Rotate, and Revoke controls render only when the signed-in user is
authorized. `DashboardApiKeyAuthorization.CanManage` requires an authenticated
principal that is a member of the LDAP `MxGateway:Ldap:RequiredGroup` — the
same group the dashboard login enforces. An anonymous localhost viewer can read
the table but sees no action controls.
- **Create** opens a dialog for the key id, display name, scope checkboxes
(the `GatewayScopes` catalog), and the optional constraint fields: read and
write subtrees, read and write tag globs, browse subtrees, max write
classification, and the read-alarm-only / read-historized-only flags.
- **Rotate** issues a new secret for an existing key id and invalidates the
old one.
- **Revoke** marks a key revoked; a revoked key cannot be un-revoked.
Create and Rotate return the assembled `mxgw_<keyId>_<secret>` token **once**,
in a one-time banner. It is never shown again, so the operator must copy it
immediately. This mirrors the `apikey create-key` / `rotate-key` CLI.
Every management action appends an `api_key_audit` entry
(`dashboard-create-key`, `dashboard-rotate-key`, `dashboard-revoke-key`) with
the key id and the caller's remote address. Secrets and pepper values are never
logged.
### Settings page
Show read-only effective configuration:
@@ -330,8 +428,8 @@ Suggested configuration:
## Styling
The dashboard serves Bootstrap 5.3.3 assets from
`src/MxGateway.Server/wwwroot/lib/bootstrap/` and local layout/status styling
from `src/MxGateway.Server/wwwroot/css/dashboard.css`.
`src/ZB.MOM.WW.MxGateway.Server/wwwroot/lib/bootstrap/` and local layout/status styling
from `src/ZB.MOM.WW.MxGateway.Server/wwwroot/css/dashboard.css`.
Recommended visual language:
@@ -377,7 +475,7 @@ Integration tests should verify:
The first dashboard slice implements:
1. Blazor Server hosting in `MxGateway.Server`.
1. Blazor Server hosting in `ZB.MOM.WW.MxGateway.Server`.
2. local Bootstrap static assets.
3. dashboard configuration binding.
4. dashboard auth using API key login and HTTP-only cookie.
+10 -10
@@ -59,7 +59,7 @@ Those belong to the worker.
## High-Level Components
```text
MxGateway.Server
ZB.MOM.WW.MxGateway.Server
Program / Host
Configuration
Grpc
@@ -677,7 +677,7 @@ development only.
Dashboard authentication reuses the API-key verifier and scope model. The
dashboard login endpoint accepts the key in a form post, checks `admin` scope
when `Dashboard:RequireAdminScope` is enabled, and signs in with the
`MxGateway.Dashboard` cookie scheme. The cookie is HTTP-only, secure, strict
`ZB.MOM.WW.MxGateway.Dashboard` cookie scheme. The cookie is HTTP-only, secure, strict
SameSite, and scoped with the `__Host-MxGatewayDashboard` name. Logout clears
that cookie. Login and logout posts use anti-forgery validation, and dashboard
API keys are not accepted in query strings. `Dashboard:AllowAnonymousLocalhost`
@@ -703,15 +703,15 @@ gRPC admin API. It should initialize the auth database, create keys, list keys
without secrets, revoke keys, rotate keys, and print raw secrets only once at
creation.
`MxGateway.Server` exposes local API-key administration as an `apikey`
`ZB.MOM.WW.MxGateway.Server` exposes local API-key administration as an `apikey`
subcommand before the web host starts:
```bash
MxGateway.Server apikey init-db --sqlite-path C:\ProgramData\MxGateway\gateway-auth.db
MxGateway.Server apikey create-key --key-id operator01 --display-name Operator --scopes session:open,events:read
MxGateway.Server apikey list-keys --json
MxGateway.Server apikey revoke-key --key-id operator01
MxGateway.Server apikey rotate-key --key-id operator01 --json
ZB.MOM.WW.MxGateway.Server apikey init-db --sqlite-path C:\ProgramData\MxGateway\gateway-auth.db
ZB.MOM.WW.MxGateway.Server apikey create-key --key-id operator01 --display-name Operator --scopes session:open,events:read
ZB.MOM.WW.MxGateway.Server apikey list-keys --json
ZB.MOM.WW.MxGateway.Server apikey revoke-key --key-id operator01
ZB.MOM.WW.MxGateway.Server apikey rotate-key --key-id operator01 --json
```
The subcommands accept `--sqlite-path`, `--pepper`, and `--json`. `--pepper`
@@ -846,7 +846,7 @@ Suggested configuration shape:
"RunMigrationsOnStartup": true
},
"Worker": {
"ExecutablePath": "src/MxGateway.Worker/bin/x86/Release/MxGateway.Worker.exe",
"ExecutablePath": "src/ZB.MOM.WW.MxGateway.Worker/bin/x86/Release/ZB.MOM.WW.MxGateway.Worker.exe",
"WorkingDirectory": null,
"RequiredArchitecture": "X86",
"StartupTimeoutSeconds": 30,
@@ -887,7 +887,7 @@ Suggested configuration shape:
Do not scatter connection or path constants through implementation code.
`MxGateway.Server` binds this section to `GatewayOptions` at startup and
`ZB.MOM.WW.MxGateway.Server` binds this section to `GatewayOptions` at startup and
registers validation with `ValidateOnStart()`. Startup fails before the gateway
begins serving traffic when required authentication settings are missing,
timeouts or queue sizes are not positive, dashboard settings are malformed, or
+260 -25
@@ -7,13 +7,13 @@ provider state.
## Fake Worker Harness
`FakeWorkerHarness` in `src/MxGateway.Tests/Gateway/Workers/Fakes/` provides an
`FakeWorkerHarness` in `src/ZB.MOM.WW.MxGateway.Tests/Gateway/Workers/Fakes/` provides an
in-process worker side for named-pipe IPC tests. It uses the same
`WorkerFrameReader`, `WorkerFrameWriter`, and `WorkerEnvelope` contract as the
gateway so tests exercise real frame validation and worker-client state changes.
Use the harness when a gateway or session test needs worker behavior without
starting `MxGateway.Worker.exe` or loading MXAccess COM. The harness scripts:
starting `ZB.MOM.WW.MxGateway.Worker.exe` or loading MXAccess COM. The harness scripts:
- `WorkerHello` and `WorkerReady` startup,
- command replies with matching correlation ids,
@@ -37,43 +37,196 @@ event, and `CloseSession` without loading MXAccess COM.
## Live MXAccess Smoke
`WorkerLiveMxAccessSmokeTests` in `src/MxGateway.IntegrationTests/` composes the
`WorkerLiveMxAccessSmokeTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/` composes the
real gRPC service, `SessionManager`, `SessionWorkerClientFactory`,
`WorkerClient`, `WorkerProcessLauncher`, and `MxGateway.Worker.exe`. It is
`WorkerClient`, `WorkerProcessLauncher`, and `ZB.MOM.WW.MxGateway.Worker.exe`. It is
skipped unless `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1` is set because it creates
the installed MXAccess COM object and depends on live provider state.
The live smoke opens a gateway session, launches the x86 worker, runs
`Register`, `AddItem`, and `Advise`, waits a bounded time for one
`OnDataChange`, and closes the session in a `finally` block so the worker gets a
graceful shutdown request even when a command or event assertion fails.
`Register`, `AddItem`, and `Advise`, waits a bounded time for the first
`OnDataChange` event (skipping any earlier bootstrap/registration-state event),
and closes the session in a `finally` block so the worker gets a graceful
shutdown request even when a command or event assertion fails. Cleanup failures
in that `finally` block are logged rather than thrown, so a real assertion
failure is never masked by a shutdown timeout.
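The "log, do not throw" cleanup rule can be sketched as below. `RunWithCleanupAsync` is a hypothetical helper, not the test's actual code; it shows only the shape of a `finally` block whose own failures are recorded so the original exception still surfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

async Task RunWithCleanupAsync(Func<Task> body, Func<Task> cleanup, Action<string> log)
{
    try
    {
        await body();
    }
    finally
    {
        try
        {
            await cleanup();
        }
        catch (Exception cleanupError)
        {
            // Swallow and record: a shutdown timeout must not replace the real failure.
            log($"cleanup failed: {cleanupError.Message}");
        }
    }
}

var cleanupLog = new List<string>();
try
{
    await RunWithCleanupAsync(
        body: () => throw new InvalidOperationException("assertion failed"),
        cleanup: () => throw new TimeoutException("shutdown timed out"),
        log: cleanupLog.Add);
}
catch (InvalidOperationException failure)
{
    Console.WriteLine($"surfaced: {failure.Message}; cleanup entries logged: {cleanupLog.Count}");
}
```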
`WorkerLiveMxAccessSmokeTests` additionally covers five MXAccess parity paths the
fake-worker tests cannot validate:
- a `Write` round-trip against an advised item, asserting both that the reply is
`Ok` / `MxCommandKind.Write` *and* that the worker emits a matching
`OnWriteComplete` event for the targeted (server, item) handle pair — the
same round-trip proof used by `scripts/run-client-e2e-tests.ps1`,
- an `AddItem` against an invalid server handle, asserting the MXAccess failure
surfaces in the command reply without faulting the gateway transport,
- the `UnAdvise` → `RemoveItem` → `Unregister` teardown chain, asserting each
step replies `Ok` with the matching `MxCommandKind`, that no further
`OnDataChange` events arrive for the un-advised pair, and that a second
`RemoveItem` against the freed handle relays a non-`Ok` MXAccess failure,
- a `WriteSecured` round-trip after `AuthenticateUser`, asserting the reply
carries `MxCommandKind.WriteSecured` and the credential password never
appears in the diagnostic message (parity for both the secured-write
ordering rule and the "do not log secrets" contract), and
- an abnormal worker exit (the worker process is killed mid-session) where the
gateway must transition the session to `SessionState.Faulted` with a
non-empty fault description carrying a known worker-client classification
(pipe disconnected / worker faulted / end-of-stream / heartbeat expired).
All six tests are gated by the same `MXGATEWAY_RUN_LIVE_MXACCESS_TESTS=1`
opt-in variable.
Build the worker before running the smoke:
```bash
dotnet build src/MxGateway.Worker/MxGateway.Worker.csproj -p:Platform=x86
dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86
```
Run the smoke explicitly:
```powershell
$env:MXGATEWAY_RUN_LIVE_MXACCESS_TESTS = "1"
dotnet test src/MxGateway.IntegrationTests/MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~WorkerLiveMxAccessSmokeTests
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~WorkerLiveMxAccessSmokeTests
```
Optional live smoke variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` | First existing `MxGateway.Worker.exe` under `src/MxGateway.Worker/bin/...` | Worker executable path. Set this when running against a packaged worker or a non-default build output. |
| `MXGATEWAY_LIVE_MXACCESS_WORKER_EXE` | First existing `ZB.MOM.WW.MxGateway.Worker.exe` under `src/ZB.MOM.WW.MxGateway.Worker/bin/...` | Worker executable path. Set this when running against a packaged worker or a non-default build output. |
| `MXGATEWAY_LIVE_MXACCESS_ITEM` | `TestChildObject.TestInt` | MXAccess item reference used by `AddItem`. |
| `MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME` | `MxGateway.IntegrationTests` | Client name passed to `Register`. |
| `MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS` | `15` | Maximum wait for the first `OnDataChange`. |
| `MXGATEWAY_LIVE_MXACCESS_CLIENT_NAME` | `ZB.MOM.WW.MxGateway.IntegrationTests` | Client name passed to `Register`. |
| `MXGATEWAY_LIVE_MXACCESS_EVENT_TIMEOUT_SECONDS` | `15` | Maximum wait for the first `OnDataChange` (also used for the `OnWriteComplete` round-trip and the abnormal-exit fault transition). |
| `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_USER` | `admin` | ArchestrA user name passed to `AuthenticateUser` before the `WriteSecured` parity step. |
| `MXGATEWAY_LIVE_MXACCESS_WRITE_SECURED_PASSWORD` | `admin123` | Password paired with the user above. Never logged; the test asserts the value does not appear in the WriteSecured diagnostic message. |
The test output includes session id, worker process id, command status,
HRESULT/status diagnostics, event sequence and handles, close status, and worker
stdout/stderr lines emitted during the run.
## Dev-rig Probes
`src/ZB.MOM.WW.MxGateway.Worker.Tests/Probes/` partitions runtime probes from the regular
Worker.Tests regression suite. The folder is its own
`ZB.MOM.WW.MxGateway.Worker.Tests.Probes` namespace so a discovery filter (e.g. `dotnet
test --filter FullyQualifiedName~ZB.MOM.WW.MxGateway.Worker.Tests.Probes`) can target or
exclude them without enumerating individual class names. The probes are
`[Fact(Skip = "...")]` by default and exist to characterize live AVEVA
behavior on the dev rig, not to gate CI — flip `Skip = null` on the dev box
with installed MXAccess + a running Galaxy provider when running them:
- `AlarmsLiveSmokeTests` — end-to-end smoke for the alarms-over-gateway
pipeline (`WnWrapAlarmConsumer` + `AlarmDispatcher` +
`MxAccessAlarmEventSink`) against `\\<machine>\Galaxy!DEV` with the dev rig's
10-second flip script writing `TestMachine_001.TestAlarm001`.
- `AlarmClientWmProbeTests` — registers as an `AlarmClient` consumer on a real
hidden message-only window and logs every Win32 message that arrives during
a fixed pump window. Used to identify the `WM_APP` /
`RegisterWindowMessage` IDs alarm callbacks use.
- `WnWrapConsumerProbeTests` — instantiates AVEVA's standalone `wnwrapConsumer`
COM class, subscribes to the dev rig's `\\<machine>\Galaxy!DEV` provider,
and polls `GetXmlCurrentAlarms2`. The XML payload bypasses the
`FILETIME→DateTime` auto-marshaling that crashes
`aaAlarmManagedClient.AlarmClient.GetHighPriAlarm` on this rig.
The probes share the Worker.Tests project (so they can use its `net48`/`x86`
configuration and the installed `ArchestrA.MxAccess` / `aaAlarmManagedClient`
references), but they are not part of the regression contract — a Worker.Tests
run with `Skip` left in place passes them as skipped.
## Live Galaxy Repository
`GalaxyRepositoryLiveTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/Galaxy/` exercises
`GalaxyRepository` directly against the `ZB` Galaxy Repository SQL database. It is
skipped unless `MXGATEWAY_RUN_LIVE_GALAXY_TESTS=1` is set because it depends on a
reachable SQL Server instance and deployed Galaxy state — fake-worker tests cannot
cover the SQL browse RPCs.
The suite covers `TestConnectionAsync`, `GetLastDeployTimeAsync`,
`GetHierarchyAsync`, and `GetAttributesAsync`. `GetHierarchyAsync` and
`GetAttributesAsync` assert a non-empty result, so the connected `ZB` database
must contain a deployed Galaxy, not just an empty schema.
Run the Galaxy live tests explicitly:
```powershell
$env:MXGATEWAY_RUN_LIVE_GALAXY_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~GalaxyRepositoryLiveTests
```
Optional live Galaxy variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `MXGATEWAY_LIVE_GALAXY_CONN` | `Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;` | Galaxy Repository connection string. Set this when the `ZB` database is on a non-default instance or needs SQL authentication. |
The default connection string targets `ZB` on `localhost` with Windows
authentication, which matches the Galaxy Repository conventions in CLAUDE.md.
## Galaxy Filter Safety
`GalaxyFilterInputSafetyTests` in `src/ZB.MOM.WW.MxGateway.Tests/Galaxy/` covers adversarial
input handling for the Galaxy Repository browse filter layer. It runs in the
unit-test project (no live SQL needed) and complements the live SQL coverage in
`GalaxyRepositoryLiveTests`.
The test class re-frames the original "Galaxy SQL injection" concern (Tests-002 in
`code-reviews/Tests/findings.md`). `GalaxyRepository` issues only four *constant*
SQL statements (`HierarchySql`, `AttributesSql`, `SELECT 1`,
`SELECT time_of_last_deploy FROM galaxy`) — no `DiscoverHierarchyRequest` field
is ever concatenated into a SQL string, so there is no dynamic SQL surface and no
`LIKE`-escaping helper to test. All filters (`TagNameGlob`, `RootTagName`,
template-chain, category, contained-path) are applied **in memory** by
`GalaxyHierarchyProjector` / `GalaxyGlobMatcher` against the cached snapshot.
The adversarial-input matrix (`'`, `' OR '1'='1`, `'; DROP TABLE gobject;--`,
`%`, `_`, `100%_off`, `[abc]`, `Pump'001`) pins the following invariants:
- SQL metacharacters (`'`, `;`) and `LIKE`-wildcards (`%`, `_`) are treated as
opaque literals by `GalaxyGlobMatcher` — they never act as wildcards, never
spuriously match unrelated text.
- Only `*` and `?` are glob wildcards.
- `GalaxyGlobMatcher` applies a 100 ms regex timeout so a pathological glob
(e.g. 5 000 `a` characters plus a literal `!`) completes promptly rather than
catastrophically backtracking.
- `GalaxyHierarchyProjector` returns zero matches (rather than the whole
hierarchy) for an adversarial `TagNameGlob` or `TemplateChainContains`, and
surfaces `NotFound` for an adversarial `RootTagName`.
- The `DiscoverHierarchy` RPC end-to-end returns zero matches for adversarial
`TagNameGlob` rather than faulting.
These invariants are the real security surface of the Galaxy browse path; the
SQL-injection framing does not apply to a constant-query layer.
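The glob invariants listed above can be illustrated with a small matcher. This is a sketch under stated assumptions, not the source of `GalaxyGlobMatcher`: the class name `GlobSketch`, case-sensitive matching, and treating a regex timeout as "no match" are all assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlobSketch
{
    // 100 ms bound, matching the timeout described in the text.
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromMilliseconds(100);

    public static bool IsMatch(string glob, string candidate)
    {
        // Escape everything first so ', ;, %, _, [ and ] are opaque literals,
        // then re-enable only the two glob wildcards.
        string pattern = @"\A"
            + Regex.Escape(glob).Replace(@"\*", ".*").Replace(@"\?", ".")
            + @"\z";
        try
        {
            return Regex.IsMatch(
                candidate,
                pattern,
                RegexOptions.CultureInvariant | RegexOptions.Singleline,
                MatchTimeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // Assumption: a pathological pattern counts as "no match".
            return false;
        }
    }
}
```

With this shape, `GlobSketch.IsMatch("%", "Pump001")` is false because `%` is matched literally, while `GlobSketch.IsMatch("Pump*", "Pump001")` is true.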
## Live LDAP
`DashboardLdapLiveTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/` exercises
`DashboardAuthenticator` against the live GLAuth directory. It is skipped unless
`MXGATEWAY_RUN_LIVE_LDAP_TESTS=1` is set because it binds against the GLAuth
service described in `glauth.md`.
The suite builds the authenticator with a default `GatewayOptions`, so
`LdapOptions.RequiredGroup` keeps its `GwAdmin` default. `GwAdmin` is the
gateway-specific dashboard-admin role and is **not** part of the five baseline
GLAuth role groups — it must be provisioned before the LDAP live tests pass.
`AuthenticateAsync_AdminInGwAdminGroup_Succeeds` fails (rather than skips) when
GLAuth has only the baseline groups, so this is a hard prerequisite beyond "LDAP
is up." See the "Adding a gw-specific group" section of `glauth.md` for the
provisioning step that adds `GwAdmin` and grants it to `admin`.
The suite covers both the success path and the `DashboardAuthenticator` failure
branches: `admin` in `GwAdmin` succeeds; `readonly` is denied for missing group;
`admin` with a wrong password is rejected by the candidate bind without leaking
the password into `FailureMessage`; an unknown username yields no candidate; and
an unreachable LDAP server is absorbed into a failed result rather than throwing.
Run the LDAP live tests explicitly:
```powershell
$env:MXGATEWAY_RUN_LIVE_LDAP_TESTS = "1"
dotnet test src/ZB.MOM.WW.MxGateway.IntegrationTests/ZB.MOM.WW.MxGateway.IntegrationTests.csproj --filter FullyQualifiedName~DashboardLdapLiveTests
```
## Client E2E Scripts
`scripts/discover-testmachine-tags.ps1` queries the ZB Galaxy Repository for the
@@ -100,11 +253,75 @@ powershell -ExecutionPolicy Bypass -File scripts/discover-testmachine-tags.ps1 -
```
`scripts/run-client-e2e-tests.ps1` drives the .NET, Go, Rust, Python, and Java
client CLIs through a live gateway session. For each client it opens one
session, registers, verifies `SubscribeBulk` and `UnsubscribeBulk` on a bounded
tag subset, adds and advises every discovered test tag, reads a bounded event
stream, then closes the session in a `finally` path. The script writes a JSON
report under `artifacts/e2e/`.
client CLIs through a live gateway session. The gateway and worker are assumed
to be already running at `-Endpoint`; the script does not start or stop them.
For each client it runs these phases, then closes the session in a `finally`
path and writes a JSON report under `artifacts/e2e/`:
1. **Session + register** — opens one session and registers.
2. **Bulk** — verifies `SubscribeBulk` / `UnsubscribeBulk` on a bounded tag
subset (skip with `-SkipBulk`).
3. **Add-item / advise** — adds and advises every discovered test tag. The
loop has no `StreamEvents` consumer attached, so advised tags accumulate
MXAccess change events in the worker event channel
(`MxGateway:Events:QueueCapacity`); left unbounded it overflows under
`FailFast` backpressure and faults the worker. Every `-DrainEveryTags`
advised tags (default 15) the loop connects a short-lived `StreamEvents`
drain so the gateway pumps that channel empty. `-DrainEveryTags 0` disables
the drain.
4. **Stream** — asserts a bounded event stream delivers at least one event
(skip with `-SkipStream`).
5. **Parity** — asserts MXAccess error paths are rejected rather than silently
succeeding: an invalid item handle and an unknown session id (skip with
`-SkipParity`).
6. **Auth rejection** — asserts `open-session` is rejected when the API key is
missing, and (when `-RejectScopeApiKeyEnv` names an insufficient-scope key)
when the key lacks the required scope. Skip with `-SkipAuth`.
7. **Write round-trip**: *opt-in (`-VerifyWrite`).* Runs right after
`register`: adds and advises a configurable writable attribute
(`-WriteAttribute`, default `TestChangingInt`), writes a per-client
sentinel value, then streams events and asserts an `OnWriteComplete` event
for that item is observed — proof the write round-tripped through the
gateway, worker, and MXAccess provider. The written value being echoed back
in an `OnDataChange` is recorded best-effort (`echoObserved`): a
provider-driven attribute such as `TestChangingInt` accepts the write but
immediately overwrites it, so no data-change carries the value back. The
Rust `stream-events` CLI emits full per-event JSON (`family`, `itemHandle`,
`value`) so all five clients apply the same checks.
It is opt-in because it mutates live tag state. The phase fails fast if the
write command is rejected — e.g. against a gateway whose worker predates
write support (`MxAccessCommandExecutor` returning `InvalidRequest` for
`Write`/`Write2`/`WriteSecured`/`WriteSecured2`).
8. **Alarm feed + acknowledge**: *opt-in (`-VerifyAlarms`).* Runs after the
stream phase. Exercises the two session-less alarm subcommands against the
gateway's central alarm monitor: `stream-alarms` reads a bounded slice of
the feed (`-AlarmStreamMax`, default 1 — the feed's first message always
arrives immediately, whereas later ones depend on live transitions) and
asserts at least one `AlarmFeedMessage`; `acknowledge-alarm` acknowledges
`-AlarmReference` (default `Galaxy!TestArea.TestMachine_001.TestAlarm001`)
and asserts the RPC round-trips. The native ack outcome is not asserted —
it depends on whether that alarm is currently active.
It is opt-in because it depends on the gateway's central alarm monitor
being enabled (`MxGateway:Alarms:Enabled`) and a live alarm provider.
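The drain cadence of the add-item / advise phase (item 3 above) can be sketched as a loop. The script itself is PowerShell; this C# version only illustrates the counting rule, and `AdviseLoopSketch`, `advise`, and `drain` are hypothetical names. Whether the real script also drains once after the last tag is not stated here, so the sketch does not.

```csharp
using System;
using System.Collections.Generic;

public static class AdviseLoopSketch
{
    // Advises every tag and invokes `drain` after each block of
    // `drainEveryTags` advised tags. A value of 0 disables draining.
    // Returns how many drains ran.
    public static int AdviseAll(
        IReadOnlyList<string> tags,
        int drainEveryTags,
        Action<string> advise,
        Action drain)
    {
        int drains = 0;
        for (int i = 0; i < tags.Count; i++)
        {
            advise(tags[i]);
            int advisedSoFar = i + 1;
            if (drainEveryTags > 0 && advisedSoFar % drainEveryTags == 0)
            {
                drain(); // stands in for the short-lived StreamEvents consumer
                drains++;
            }
        }
        return drains;
    }
}
```

With 40 tags and the default of 15, the drain runs after tag 15 and tag 30.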
Each client CLI is driven through one long-lived `batch` process. Every CLI
exposes a `batch` subcommand: a process that reads one command line from stdin,
runs it through the normal subcommand dispatch, writes the JSON result, then a
line containing exactly `__MXGW_BATCH_EOR__`. The harness launches one such
process per client and pipes the ~250 operations of the flow through it, so the
process — and, for the JVM, the runtime — cold-start is paid once per client
instead of once per operation. A command that fails inside the batch process
writes its `{"error":...}` envelope and the loop continues; the harness treats
that envelope as the operation failure (used by the parity and auth phases).
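The batch protocol described above is small enough to sketch. This is an illustration of the framing only, assuming a `dispatch` delegate that stands in for each CLI's normal subcommand dispatch; `BatchLoopSketch` and the exact error envelope fields beyond `error` are hypothetical.

```csharp
using System;
using System.IO;
using System.Text.Json;

public static class BatchLoopSketch
{
    public const string EndOfRecord = "__MXGW_BATCH_EOR__";

    // Reads one command line at a time, writes its JSON result, then the
    // end-of-record marker. A failing command produces an error envelope
    // and the loop keeps going.
    public static void Run(TextReader input, TextWriter output, Func<string, string> dispatch)
    {
        for (var line = input.ReadLine(); line != null; line = input.ReadLine())
        {
            string result;
            try
            {
                result = dispatch(line);
            }
            catch (Exception ex)
            {
                // Assumption: the envelope carries only the message.
                result = JsonSerializer.Serialize(new { error = ex.Message });
            }

            output.WriteLine(result);
            output.WriteLine(EndOfRecord);
            output.Flush(); // the harness waits on the marker, so flush per record
        }
    }
}
```

The marker line lets the harness read a multi-line JSON result without knowing its length in advance: it reads until it sees a line equal to `__MXGW_BATCH_EOR__`.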
Before the per-client phases run, the script builds the .NET CLI
(`dotnet build`) and installs the Java CLI (`gradle :mxgateway-cli:installDist`)
once, so the `batch` process launches straight from the compiled exe / the
installed launcher. The Go, Rust, and Python batch processes are launched via
`go run` / `cargo run` / `python -m`, which compile-or-start once when that
single per-client process starts.
Build the gateway and worker, start the gateway, and provide a valid API key
before running the client e2e script:
@@ -121,40 +338,58 @@ powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Clien
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -BulkTagCount 10
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipStream
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -SkipBulk
# Write round-trip (opt-in): point at a writable scalar attribute and its
# value type.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyWrite -WriteAttribute TestChangingInt -WriteType int32
# Alarm feed + acknowledge (opt-in): needs MxGateway:Alarms:Enabled on the gateway.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -VerifyAlarms -AlarmReference "Galaxy!TestArea.TestMachine_001.TestAlarm001"
# Auth rejection: also assert an insufficient-scope key is denied.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -RejectScopeApiKeyEnv MXGATEWAY_READONLY_API_KEY
# Run all five clients concurrently as isolated child processes.
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Parallel
# Validate the flow offline (prints commands, contacts no gateway).
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -DryRun
powershell -ExecutionPolicy Bypass -File scripts/run-client-e2e-tests.ps1 -Endpoint localhost:5000 -ApiKeyEnv MXGATEWAY_API_KEY
```
When `-VerifyWrite` is enabled, the write round-trip fails loudly if the write
command is rejected, if `-WriteAttribute` does not name a writable scalar
attribute, or if no `OnWriteComplete` event is observed for the written item
within `-WriteEchoMaxEvents` (default 200) streamed events. Raise
`-WriteEchoMaxEvents` if the gateway's per-session event backlog is large
enough to push `OnWriteComplete` past that bound.
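The bounded search for `OnWriteComplete` can be sketched as a scan over at most `-WriteEchoMaxEvents` events. The event shape here is an assumption based on the `family` and `itemHandle` fields mentioned earlier; `StreamedEvent` and `WriteCompleteScanSketch` are hypothetical names, and the literal family string is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal event shape: the per-event JSON is described as carrying
// `family` and `itemHandle` among other fields.
public sealed class StreamedEvent
{
    public StreamedEvent(string family, int itemHandle)
    {
        Family = family;
        ItemHandle = itemHandle;
    }

    public string Family { get; }
    public int ItemHandle { get; }
}

public static class WriteCompleteScanSketch
{
    // Returns true if an OnWriteComplete for `itemHandle` appears within the
    // first `maxEvents` events; later events are never inspected.
    public static bool Observed(IEnumerable<StreamedEvent> events, int itemHandle, int maxEvents = 200)
    {
        return events
            .Take(maxEvents)
            .Any(e => e.Family == "OnWriteComplete" && e.ItemHandle == itemHandle);
    }
}
```

A large backlog of `OnDataChange` events ahead of the completion is what makes the bound matter: if the matching event is number 201, the default scan reports failure.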
## Focused Commands
Run the cross-language smoke matrix tests after changing the documented client
smoke command list:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~CrossLanguageSmokeMatrixTests
```
Run the parity fixture matrix tests after changing the integration parity
scenario list:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests
```
Run the fake worker tests after changing gateway worker IPC, session startup, or
event streaming behavior:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~FakeWorkerHarnessTests
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~GatewayEndToEndFakeWorkerSmokeTests
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~WorkerClientTests
dotnet test src/MxGateway.Worker.Tests/MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter FullyQualifiedName~WorkerPipeSessionTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~FakeWorkerHarnessTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~SessionWorkerClientFactoryFakeWorkerTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~GatewayEndToEndFakeWorkerSmokeTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~WorkerClientTests
dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter FullyQualifiedName~WorkerPipeSessionTests
```
Run the gateway test project after shared gateway test infrastructure changes:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj
```
## Related Documentation
+12 -2
@@ -10,7 +10,7 @@ The layer is composed of four collaborators:
| Type | Lifetime | Role |
|------|----------|------|
| `MxAccessGatewayService` | scoped (gRPC) | Implements the four `MxAccessGateway` RPCs, performs exception mapping. |
| `MxAccessGatewayService` | scoped (gRPC) | Implements the six `MxAccessGateway` RPCs, performs exception mapping. |
| `MxAccessGrpcRequestValidator` | singleton | Rejects malformed requests before any session work runs. |
| `MxAccessGrpcMapper` | singleton | Converts public proto types to internal `WorkerCommand`/`WorkerEvent` types and back. |
| `IEventStreamService` (`EventStreamService`) | singleton | Owns the event stream pipeline, including bounded queue and backpressure handling. |
@@ -29,7 +29,7 @@ A second gRPC service, `GalaxyRepositoryGrpcService`, is mapped alongside it. It
## RPC Handlers
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
`MxAccessGatewayService` derives from the generated `MxAccessGateway.MxAccessGatewayBase` and implements every RPC declared in `mxaccess_gateway.proto` — six in total: `OpenSession`, `CloseSession`, `Invoke`, `StreamEvents`, `AcknowledgeAlarm`, and `StreamAlarms`. The proto contract itself is documented in [Contracts](./Contracts.md); this section covers only what the server-side handler does on top of that contract.
Public gRPC send and receive message sizes are configured from
`MxGateway:Protocol:MaxGrpcMessageBytes` (default 16 MiB). Official clients use
@@ -86,6 +86,14 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
`StreamEvents` is a server-streaming RPC. The handler delegates the full pipeline to `IEventStreamService` and just forwards each `MxEvent` onto the response stream. Keeping the channel and producer/consumer machinery out of the handler means cancellation, exception mapping, and metric bookkeeping live in one place.
### `AcknowledgeAlarm`
`AcknowledgeAlarm` is a unary, **session-less** RPC that acknowledges a single alarm. The handler validates `alarm_full_reference` inline (it does not run through `MxAccessGrpcRequestValidator`) and delegates to `IGatewayAlarmService.AcknowledgeAsync`. The always-on `GatewayAlarmMonitor` routes the ack over its own gateway-managed worker session — clients no longer open a session to acknowledge an alarm. A reference that parses as a canonical GUID forwards to `AcknowledgeAlarmCommand`; a `Provider!Group.Tag` reference forwards to `AcknowledgeAlarmByNameCommand`.
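The reference routing described above can be sketched as a single classification step. This is a sketch, not the handler's code: `AckRoute` and `AlarmReferenceRoutingSketch` are hypothetical, and reading "canonical GUID" as the hyphenated `D` format is an assumption.

```csharp
using System;

public enum AckRoute
{
    ByGuid, // would forward to AcknowledgeAlarmCommand
    ByName  // would forward to AcknowledgeAlarmByNameCommand
}

public static class AlarmReferenceRoutingSketch
{
    public static AckRoute Classify(string alarmFullReference)
    {
        // Mirrors the inline validation: the reference must be non-empty.
        if (string.IsNullOrEmpty(alarmFullReference))
        {
            throw new ArgumentException(
                "alarm_full_reference must be non-empty.", nameof(alarmFullReference));
        }

        // Assumption: "canonical" means the 36-character hyphenated form.
        return Guid.TryParseExact(alarmFullReference, "D", out _)
            ? AckRoute.ByGuid
            : AckRoute.ByName;
    }
}
```

A reference such as `Galaxy!TestArea.TestMachine_001.TestAlarm001` fails the GUID parse and takes the by-name route.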
### `StreamAlarms`
`StreamAlarms` is a server-streaming, **session-less** RPC that attaches to the gateway's central alarm feed. The handler delegates to `IGatewayAlarmService.StreamAsync`. The stream opens with one `AlarmFeedMessage` carrying an `active_alarm` per currently-active alarm (the ConditionRefresh snapshot), then a single `snapshot_complete`, then a `transition` for every subsequent raise / acknowledge / clear. It is served by the always-on `GatewayAlarmMonitor`, which owns a single gateway-managed worker session and fans out to every attached client — clients no longer open a session of their own. `alarm_filter_prefix`, when set, scopes the stream to a sub-tree.
## Validation Rules
`MxAccessGrpcRequestValidator` rejects requests with `StatusCode.InvalidArgument` before any session work happens. The rules are intentionally narrow — anything that requires session state (for example, "session does not exist") is left for `ISessionManager` so the validator can stay synchronous and side-effect free.
@@ -96,6 +104,8 @@ Carrying the enqueue timestamp into the worker layer is what lets queue-wait tim
| `CloseSession` | `session_id` must be non-empty. | `InvalidArgument` |
| `StreamEvents` | `session_id` must be non-empty. | `InvalidArgument` |
| `Invoke` | `session_id` non-empty, `command` present, `kind` not `Unspecified`, payload oneof must match `kind`. | `InvalidArgument` |
| `AcknowledgeAlarm` | `alarm_full_reference` must be non-empty. Validated inline in the handler, not by `MxAccessGrpcRequestValidator`. | `InvalidArgument` |
| `StreamAlarms` | No required fields — `alarm_filter_prefix` is optional. | — |
The payload-vs-kind check matters because the `MxCommand.payload` oneof is non-discriminated on the wire — a misaligned client could send `kind = Write` with a `Register` payload and silently confuse the worker. The validator turns that into a clear client error:
+3 -3
@@ -64,9 +64,9 @@ Labels: `area:client-dotnet`, `type:infra`, `priority:p0`
Deliverables:
- `clients/dotnet/MxGateway.Client`,
- `clients/dotnet/MxGateway.Client.Cli`,
- `clients/dotnet/MxGateway.Client.Tests`,
- `clients/dotnet/ZB.MOM.WW.MxGateway.Client`,
- `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Cli`,
- `clients/dotnet/ZB.MOM.WW.MxGateway.Client.Tests`,
- optional integration test project,
- generated protobuf setup.
+10 -10
@@ -22,19 +22,19 @@ Labels: `area:gateway`, `type:infra`, `priority:p0`
Deliverables:
- create `src/MxGateway.sln`,
- create `src/MxGateway.Contracts`,
- create `src/MxGateway.Server`,
- create `src/MxGateway.Tests`,
- create `src/MxGateway.IntegrationTests`,
- target `MxGateway.Server` to `net10.0`,
- create `src/ZB.MOM.WW.MxGateway.slnx`,
- create `src/ZB.MOM.WW.MxGateway.Contracts`,
- create `src/ZB.MOM.WW.MxGateway.Server`,
- create `src/ZB.MOM.WW.MxGateway.Tests`,
- create `src/ZB.MOM.WW.MxGateway.IntegrationTests`,
- target `ZB.MOM.WW.MxGateway.Server` to `net10.0`,
- add shared C# build settings in `Directory.Build.props`,
- add baseline tests.
Acceptance criteria:
- `dotnet build src/MxGateway.sln` succeeds,
- `dotnet test src/MxGateway.sln` succeeds,
- `dotnet build src/ZB.MOM.WW.MxGateway.slnx` succeeds,
- `dotnet test src/ZB.MOM.WW.MxGateway.slnx` succeeds,
- gateway project does not reference MXAccess COM.
### Issue: Define Protobuf Contracts
@@ -43,8 +43,8 @@ Labels: `area:contracts`, `type:feature`, `priority:p0`
Deliverables:
- `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`,
- `src/MxGateway.Contracts/Protos/mxaccess_worker.proto`,
- `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_gateway.proto`,
- `src/ZB.MOM.WW.MxGateway.Contracts/Protos/mxaccess_worker.proto`,
- `MxAccessGateway` service with `OpenSession`, `CloseSession`, `Invoke`, and
`StreamEvents`,
- `WorkerEnvelope` and worker IPC messages,
+2 -2
@@ -23,12 +23,12 @@ Labels: `area:worker`, `type:infra`, `priority:p0`
Deliverables:
- create `src/MxGateway.Worker`,
- create `src/ZB.MOM.WW.MxGateway.Worker`,
- target `.NET Framework 4.8`,
- platform target `x86`,
- reference generated worker contracts,
- reference `ArchestrA.MXAccess.dll`,
- create `src/MxGateway.Worker.Tests`,
- create `src/ZB.MOM.WW.MxGateway.Worker.Tests`,
- document MSBuild command from `docs/ToolchainLinks.md`.
Acceptance criteria:
+2 -2
@@ -4,7 +4,7 @@ The metrics subsystem exposes counters, histograms, and observable gauges that d
## Overview
`GatewayMetrics` is a singleton (registered in `GatewayApplication.cs`) that owns a single `Meter` named `MxGateway.Server` and a set of synchronised counters, histograms, and observable gauges. Subsystems call typed mutator methods (`SessionOpened`, `CommandFailed`, `EventReceived`, etc.) rather than touching the `Meter` directly, which keeps the OpenTelemetry instrument names and tag conventions in one place. A `lock (_syncRoot)` block guards the scalar fields used by `GetSnapshot`, while per-event maps use `ConcurrentDictionary<string, long>` so the hot event path avoids the lock.
`GatewayMetrics` is a singleton (registered in `GatewayApplication.cs`) that owns a single `Meter` named `ZB.MOM.WW.MxGateway.Server` and a set of synchronised counters, histograms, and observable gauges. Subsystems call typed mutator methods (`SessionOpened`, `CommandFailed`, `EventReceived`, etc.) rather than touching the `Meter` directly, which keeps the OpenTelemetry instrument names and tag conventions in one place. A `lock (_syncRoot)` block guards the scalar fields used by `GetSnapshot`, while per-event maps use `ConcurrentDictionary<string, long>` so the hot event path avoids the lock.
## Meter and OpenTelemetry Compatibility
@@ -13,7 +13,7 @@ The meter name is exposed as a constant so that hosting code can register it wit
```csharp
public sealed class GatewayMetrics : IDisposable
{
public const string MeterName = "MxGateway.Server";
public const string MeterName = "ZB.MOM.WW.MxGateway.Server";
public GatewayMetrics()
{
+40 -13
@@ -33,23 +33,23 @@ project targets .NET Framework 4.8, but the SDK resolver comes from the .NET SDK
installation:
```powershell
dotnet msbuild src\MxGateway.Worker\MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
dotnet msbuild src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
```
`docs/ToolchainLinks.md` records the Visual Studio MSBuild executable for
classic .NET Framework and COM interop builds:
```powershell
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\MxGateway.Worker\MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
```
Run the worker tests with the same platform target:
```powershell
dotnet test src\MxGateway.Worker.Tests\MxGateway.Worker.Tests.csproj -p:Platform=x86
dotnet test src\ZB.MOM.WW.MxGateway.Worker.Tests\ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86
```
The only MXAccess interop reference belongs in `MxGateway.Worker`. Gateway and
The only MXAccess interop reference belongs in `ZB.MOM.WW.MxGateway.Worker`. Gateway and
test projects may reference the worker project for metadata and scaffold tests,
but they must not reference `ArchestrA.MXAccess.dll` directly.
@@ -132,7 +132,7 @@ credential, or API key values before the message is written.
## Internal Components
```text
MxGateway.Worker
ZB.MOM.WW.MxGateway.Worker
Program
Bootstrap
WorkerOptions
@@ -251,7 +251,7 @@ The loop should update a heartbeat timestamp after:
- processing an MXAccess event.
`StaRuntime` implements this runtime boundary in the worker. It starts one
background thread named `MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
background thread named `ZB.MOM.WW.MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
initializes COM through `StaComApartmentInitializer`, and runs
`StaMessagePump`. Commands are scheduled through `InvokeAsync`; the command
queue signals an `AutoResetEvent` so `MsgWaitForMultipleObjectsEx` can wake the
@@ -655,12 +655,39 @@ the event queue implementation owns those counters.
The STA watchdog currently emits a `WorkerFault` with
`WorkerFaultCategory.StaHung` when `LastStaActivityUtc` is older than
`WorkerPipeSessionOptions.HeartbeatGrace`. The fault includes the current
command correlation id when a command is active. Command duration and high event
queue depth remain observable through heartbeat fields until dedicated
thresholds own those warnings. The worker reports stale STA activity, but the
gateway owns the final kill decision through its existing heartbeat and worker
lifecycle policy.
`WorkerPipeSessionOptions.HeartbeatGrace` **and no command is in flight**.
`StaRuntime.ProcessQueuedCommands` calls `MarkActivity()` only immediately
before and after each work item, so a synchronously long-running STA command
(for example a `ReadBulk` waiting `timeout_ms` for the first `OnDataChange`)
legitimately freezes `LastStaActivityUtc` for the duration of the wait while
the worker is healthy. The watchdog is therefore suppressed while the
heartbeat snapshot's `CurrentCommandCorrelationId` is non-empty: the worker is
busy executing a command, not hung, and the heartbeat already surfaces the
in-flight correlation id so the gateway can apply its own per-command timeout
if it considers the command too slow. The fault still fires on a truly hung
STA (no command in flight and no activity for longer than `HeartbeatGrace`),
which is the only case the watchdog can usefully distinguish from a slow
command. Command duration and high event queue depth remain observable through
heartbeat fields until dedicated thresholds own those warnings. The worker
reports stale STA activity, but the gateway owns the final kill decision
through its existing heartbeat and worker lifecycle policy.
The in-flight-command suppression itself is bounded by
`WorkerPipeSessionOptions.HeartbeatStuckCeiling` (default 75 seconds = 5 ×
`HeartbeatGrace`). The motivating case for the suppression is a legitimately
slow synchronous command — but a genuinely stuck COM call (for example
against a dead MXAccess provider whose cross-apartment marshaler is
permanently blocked, or a write completion that never fires) leaves
`CurrentCommandCorrelationId` non-empty indefinitely. Without an upper bound
the worker-side `StaHung` watchdog would be permanently defeated for that
session and only the gateway's per-command timeout would catch the hang —
losing the worker-originated diagnostic (`StaHung` fault category, the
stale-by interval) from the gateway audit trail. Once `LastStaActivityUtc`
has been stale for longer than `HeartbeatStuckCeiling`, the watchdog fires
`StaHung` regardless of whether a command is in flight, on the assumption
that no legitimate STA command should run that long without periodically
refreshing activity. Deployments that legitimately run very long bulk
operations should raise the ceiling rather than disable it.
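
The two thresholds described above can be read as a single decision function. The following is a minimal sketch, not the worker's actual implementation: `ShouldReportStaHung` and its parameter names are hypothetical, and the 15-second grace value is inferred from the stated default of 75 seconds = 5 × `HeartbeatGrace`.

```csharp
using System;

// Hypothetical helper illustrating the documented watchdog rule.
// Returns true when the worker should raise a StaHung fault.
bool ShouldReportStaHung(
    DateTimeOffset nowUtc,
    DateTimeOffset lastStaActivityUtc,
    string currentCommandCorrelationId,
    TimeSpan heartbeatGrace,
    TimeSpan heartbeatStuckCeiling)
{
    TimeSpan staleFor = nowUtc - lastStaActivityUtc;

    // Activity is recent enough: never a fault.
    if (staleFor <= heartbeatGrace)
    {
        return false;
    }

    // Past the ceiling: fire even if a command is still in flight.
    if (staleFor > heartbeatStuckCeiling)
    {
        return true;
    }

    // Between grace and ceiling: suppress while a command is executing.
    return string.IsNullOrEmpty(currentCommandCorrelationId);
}

var grace = TimeSpan.FromSeconds(15);   // assumed: 75 s / 5
var ceiling = TimeSpan.FromSeconds(75); // documented default
var now = DateTimeOffset.UtcNow;

Console.WriteLine(ShouldReportStaHung(now, now.AddSeconds(-30), "cmd-1", grace, ceiling)); // False
Console.WriteLine(ShouldReportStaHung(now, now.AddSeconds(-30), "", grace, ceiling));      // True
Console.WriteLine(ShouldReportStaHung(now, now.AddSeconds(-80), "cmd-1", grace, ceiling)); // True
```

The middle branch is the suppression window: a slow but healthy command is tolerated there, while the ceiling keeps a permanently blocked COM call from hiding behind its own correlation id.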
## Shutdown
@@ -807,7 +834,7 @@ tests. `AddItem` uses `TestChildObject.TestInt` by default and accepts an
override through `MXGATEWAY_LIVE_MXACCESS_ITEM`; `AddItem2` uses the captured
parity fixture shape `AddItem2("TestInt", "TestChildObject")`.
`WorkerLiveMxAccessSmokeTests` in `src/MxGateway.IntegrationTests/` uses the
`WorkerLiveMxAccessSmokeTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/` uses the
same opt-in variable for the gateway-to-worker live smoke. It launches the x86
worker through `WorkerProcessLauncher`, opens a gateway session, runs
`Register`, `AddItem`, and `Advise`, waits for one `OnDataChange`, and closes
+1 -1
@@ -88,7 +88,7 @@ into a transport failure when the worker captured HRESULT or status details.
Run the parity fixture matrix tests after changing the matrix:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter FullyQualifiedName~ParityFixtureMatrixTests
```
Live MXAccess execution remains opt-in. The matrix defines which scenarios to
+10 -3
@@ -16,7 +16,7 @@ All four interfaces (`ISessionManager`, `ISessionRegistry`, `ISessionWorkerClien
The session id is an opaque string in the form `session-{guid:N}` and the per-session pipe name is `mxaccess-gateway-{ProcessId}-{SessionId}`. Encoding the gateway PID into the pipe name avoids collisions when an old gateway process leaks pipes that the OS has not yet reclaimed.
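
As an illustration of the two formats named above, the sketch below builds a session id and the matching pipe name. `NewSessionId` and `BuildPipeName` are hypothetical helper names, not the gateway's actual API; only the string shapes come from the text.

```csharp
using System;

// Hypothetical helpers; only the string formats are taken from the docs.
// "N" formats a Guid as 32 hex digits with no hyphens or braces.
string NewSessionId() => $"session-{Guid.NewGuid():N}";

// The gateway PID is embedded so a pipe leaked by an old gateway process
// cannot collide with a pipe created by the current one.
string BuildPipeName(int gatewayProcessId, string sessionId) =>
    $"mxaccess-gateway-{gatewayProcessId}-{sessionId}";

string sessionId = NewSessionId();
string pipeName = BuildPipeName(Environment.ProcessId, sessionId);
Console.WriteLine(pipeName);
```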
`SessionState` itself is the protobuf-generated enum from `MxGateway.Contracts.Proto`, so it is shared between the gateway and clients on the wire.
`SessionState` itself is the protobuf-generated enum from `ZB.MOM.WW.MxGateway.Contracts.Proto`, so it is shared between the gateway and clients on the wire.
```csharp
public void TransitionTo(SessionState nextState)
@@ -33,12 +33,19 @@ public void TransitionTo(SessionState nextState)
return;
}
if (_state is SessionState.Closing
&& nextState is not SessionState.Closed
&& nextState is not SessionState.Faulted)
{
return;
}
_state = nextState;
}
}
```
`Closed` is terminal and `Faulted` only allows a transition to `Closed`. This guards against late callbacks (worker exit, heartbeat timeout) re-animating a session that is already torn down.
`Closed` is terminal, `Faulted` only allows a transition to `Closed`, and `Closing` only allows a transition to `Closed` or `Faulted`. This guards against late callbacks (worker exit, heartbeat timeout) re-animating a session that is already tearing down or torn down — once `CloseAsync` has set `Closing` under `_syncRoot`, no `TransitionTo(Ready)` from another thread can walk the session back to `Ready`. Both close-related writes (`Closing` and `Closed`) go through `_syncRoot` exactly like every other state write; `_closeLock` only serializes concurrent close attempts.
### SessionManager (ISessionManager)
@@ -184,7 +191,7 @@ Sessions open with `MxGateway:Sessions:DefaultLeaseSeconds` (default 1800) added
### Close
`GatewaySession.CloseAsync` is serialized by a per-session `SemaphoreSlim` (`_closeLock`). It transitions to `Closing`, asks the worker client to shut down within `ShutdownTimeout`, and on success transitions to `Closed`. If `WorkerClient.ShutdownAsync` throws, the session falls back to `IWorkerClient.Kill` (forced close):
`GatewaySession.CloseAsync` is serialized by a per-session `SemaphoreSlim` (`_closeLock`) so only one close runs at a time, but every read/write of `_state` still passes through `_syncRoot` (via `TryBeginClose` and `MarkClosed`). The close path therefore obeys the same lock discipline as `TransitionTo` / `MarkFaulted`: it transitions to `Closing`, asks the worker client to shut down within `ShutdownTimeout`, and on success transitions to `Closed`. `DisposeAsync` waits on `_closeLock` once before disposing the semaphore so an in-flight close's `Release()` cannot race against the dispose. If `WorkerClient.ShutdownAsync` throws, the session falls back to `IWorkerClient.Kill` (forced close):
```csharp
if (_workerClient is not null)
+2 -2
@@ -1,13 +1,13 @@
# Worker Bootstrap
The bootstrap layer parses the command-line arguments and environment variables passed to the `MxGateway.Worker` process, validates them against the gateway contract, and produces either a populated `WorkerOptions` instance or a structured failure that maps to a `WorkerExitCode`.
The bootstrap layer parses the command-line arguments and environment variables passed to the `ZB.MOM.WW.MxGateway.Worker` process, validates them against the gateway contract, and produces either a populated `WorkerOptions` instance or a structured failure that maps to a `WorkerExitCode`.
## Overview
The worker process is a short-lived child of the gateway. The gateway side of this contract lives in [WorkerProcessLauncher](./WorkerProcessLauncher.md). On the worker side, `Program.cs` is a single line that delegates to `WorkerApplication.Run(args)`:
```csharp
using MxGateway.Worker;
using ZB.MOM.WW.MxGateway.Worker;
return WorkerApplication.Run(args);
```
+2 -2
@@ -1,10 +1,10 @@
# Worker Conversion Layer
The conversion layer in `MxGateway.Worker.Conversion` projects COM `VARIANT` payloads, `HRESULT` codes, and `MXSTATUS_PROXY` records into the protobuf wire types in `MxGateway.Contracts.Proto`. The design is parity-first: every projection preserves enough raw metadata that the original COM observation can be reconstructed downstream.
The conversion layer in `ZB.MOM.WW.MxGateway.Worker.Conversion` projects COM `VARIANT` payloads, `HRESULT` codes, and `MXSTATUS_PROXY` records into the protobuf wire types in `ZB.MOM.WW.MxGateway.Contracts.Proto`. The design is parity-first: every projection preserves enough raw metadata that the original COM observation can be reconstructed downstream.
## Overview
`gateway.md` (sections "Value Model" and "Status Model") requires that the wire format use a value union capable of representing COM `VARIANT` values and arrays, that lossy conversions retain both the typed projection and raw diagnostic metadata, and that `MXSTATUS_PROXY` arrays never collapse to a single success flag. The types in `src/MxGateway.Worker/Conversion/` are the worker-side enforcement of those rules.
`gateway.md` (sections "Value Model" and "Status Model") requires that the wire format use a value union capable of representing COM `VARIANT` values and arrays, that lossy conversions retain both the typed projection and raw diagnostic metadata, and that `MXSTATUS_PROXY` arrays never collapse to a single success flag. The types in `src/ZB.MOM.WW.MxGateway.Worker/Conversion/` are the worker-side enforcement of those rules.
The layer is split into three concerns:
+11 -6
@@ -35,17 +35,22 @@ oversized frames, protocol version mismatches, and session mismatches.
## Verification
The frame protocol lives in `ZB.MOM.WW.MxGateway.Worker.Ipc` (`WorkerFrameReader`,
`WorkerFrameWriter`, `WorkerFrameProtocolOptions`) and is covered by
`src/ZB.MOM.WW.MxGateway.Worker.Tests/Ipc/WorkerFrameProtocolTests.cs`. The worker is an
x86 process, so build and test it with `-p:Platform=x86`.
Run the focused tests after changing the frame protocol:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerFrameProtocolTests
```powershell
dotnet test src/ZB.MOM.WW.MxGateway.Worker.Tests/ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86 --filter WorkerFrameProtocolTests
```
Run the gateway build because the frame protocol is part of
`MxGateway.Server`:
Run the x86 worker build because the frame protocol is part of
`ZB.MOM.WW.MxGateway.Worker`:
```bash
dotnet build src/MxGateway.Server/MxGateway.Server.csproj
```powershell
dotnet build src/ZB.MOM.WW.MxGateway.Worker/ZB.MOM.WW.MxGateway.Worker.csproj -p:Platform=x86
```
## Related Documentation
+3 -3
@@ -60,13 +60,13 @@ optional pipe reservation, records a worker kill metric, and reports a
Run the focused launcher tests after changing process launch behavior:
```bash
dotnet test src/MxGateway.Tests/MxGateway.Tests.csproj --filter WorkerProcessLauncherTests
dotnet test src/ZB.MOM.WW.MxGateway.Tests/ZB.MOM.WW.MxGateway.Tests.csproj --filter WorkerProcessLauncherTests
```
Run the gateway build because the launcher is part of `MxGateway.Server`:
Run the gateway build because the launcher is part of `ZB.MOM.WW.MxGateway.Server`:
```bash
dotnet build src/MxGateway.Server/MxGateway.Server.csproj
dotnet build src/ZB.MOM.WW.MxGateway.Server/ZB.MOM.WW.MxGateway.Server.csproj
```
## Related Documentation
+3 -3
@@ -4,7 +4,7 @@ The worker STA runtime owns the dedicated single-threaded apartment thread that
## Why an STA Is Required
The installed MXAccess interop assembly declares an `Apartment` threading model (see `gateway.md` under "STA Worker Thread Model"). COM objects with that model must be created and called on a thread initialized as a single-threaded apartment, and any callbacks the object raises (event sink calls) are delivered through the thread's Windows message queue. A plain blocking queue is not sufficient: the STA loop must pump Windows messages so that the COM marshaler can deliver event invocations on the same thread that holds the object. Because of that constraint, every MXAccess operation in the worker is funneled through the types in `src/MxGateway.Worker/Sta/`.
The installed MXAccess interop assembly declares an `Apartment` threading model (see `gateway.md` under "STA Worker Thread Model"). COM objects with that model must be created and called on a thread initialized as a single-threaded apartment, and any callbacks the object raises (event sink calls) are delivered through the thread's Windows message queue. A plain blocking queue is not sufficient: the STA loop must pump Windows messages so that the COM marshaler can deliver event invocations on the same thread that holds the object. Because of that constraint, every MXAccess operation in the worker is funneled through the types in `src/ZB.MOM.WW.MxGateway.Worker/Sta/`.
## Types
@@ -20,13 +20,13 @@ The installed MXAccess interop assembly declares an `Apartment` threading model
## STA Thread Initialization
`StaRuntime`'s constructor configures a background `Thread` named `MxGateway.Worker.STA` and forces it into `ApartmentState.STA` before the thread starts. `Start()` releases the thread and then blocks on `startedEvent` so callers observe a fully-initialized apartment (or a captured `startupException`) before the first `InvokeAsync` call:
`StaRuntime`'s constructor configures a background `Thread` named `MxGateway.Worker.STA` (an external-runtime identifier that this rename intentionally leaves unprefixed) and forces it into `ApartmentState.STA` before the thread starts. `Start()` releases the thread and then blocks on `startedEvent` so callers observe a fully-initialized apartment (or a captured `startupException`) before the first `InvokeAsync` call:
```csharp
staThread = new Thread(ThreadMain)
{
IsBackground = true,
Name = "MxGateway.Worker.STA"
Name = "MxGateway.Worker.STA"
};
staThread.SetApartmentState(ApartmentState.STA);
```