rename: prefix gateway projects/namespaces with ZB.MOM.WW + sln→slnx

Apply the ZB.MOM.WW. prefix to all gateway-side projects, folders,
.csproj/.sln contents, C# namespaces, using directives, generated proto
C# (csharp_namespace + checked-in generated files), InternalsVisibleTo
attributes, project-name string literals (LoadProject, .sln lookups,
worker exe paths, staticwebassets manifest), and the install/script/doc
references that point at any of the above. Migrate the solution from
.sln to .slnx via `dotnet sln migrate` and delete the old file.

External-runtime identifiers are intentionally NOT prefixed so external
configuration keeps working:
- GatewayMetrics.cs MeterName ("MxGateway.Server")
- DashboardAuthenticationDefaults Scheme/Policy ("MxGateway.Dashboard")
- GatewayRequestLoggingMiddleware logger category ("MxGateway.Request")
- StaRuntime thread name ("MxGateway.Worker.STA")
- appsettings.json root section "MxGateway" + env-var prefix
  MxGateway__... and secret-name MxGateway:ApiKeyPepper
- C:\ProgramData\MxGateway\ data dir paths

Also fixes two test failures that were unrelated to the rename but
surfaced while validating it:

- WorkerLiveMxAccessSmokeTests.ShutDownAsync: cancellation that the
  gateway service correctly maps to RpcException(Cancelled) per gRPC
  convention was being misclassified as a stream fault. Added a sibling
  catch on RpcException with StatusCode.Cancelled.

- IntegrationTestEnvironment.ResolveRepositoryRoot: extracted IsRepositoryRoot
  and made it accept either a .git marker OR a .sln/.slnx next to src/
  so the worker-exe walker works in non-git working copies.
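
A minimal sketch of the root check described in the second fix, assuming
"next to src/" means the solution file sits in the same directory as the
src/ folder. The class name and the exact shape of both methods are
hypothetical; only the names IsRepositoryRoot and ResolveRepositoryRoot come
from this commit.

```csharp
#nullable enable
using System;
using System.IO;
using System.Linq;

public static class RepositoryRootSketch
{
    // Hypothetical reconstruction: a directory counts as the repository root
    // when it has a .git marker, or a src/ folder with a .sln/.slnx beside it.
    public static bool IsRepositoryRoot(string directory)
    {
        string gitMarker = Path.Combine(directory, ".git");

        // .git is a directory in a normal clone and a file in a worktree.
        if (Directory.Exists(gitMarker) || File.Exists(gitMarker))
        {
            return true;
        }

        if (!Directory.Exists(Path.Combine(directory, "src")))
        {
            return false;
        }

        // Compare extensions explicitly so the result does not depend on
        // platform-specific wildcard matching.
        return Directory.EnumerateFiles(directory).Any(file =>
        {
            string extension = Path.GetExtension(file);
            return extension.Equals(".sln", StringComparison.OrdinalIgnoreCase)
                || extension.Equals(".slnx", StringComparison.OrdinalIgnoreCase);
        });
    }

    // Walks upward from startDirectory and returns the first directory that
    // passes IsRepositoryRoot, or null when the filesystem root is reached.
    public static string? ResolveRepositoryRoot(string startDirectory)
    {
        DirectoryInfo? current = new DirectoryInfo(startDirectory);
        while (current != null)
        {
            if (IsRepositoryRoot(current.FullName))
            {
                return current.FullName;
            }

            current = current.Parent;
        }

        return null;
    }
}
```

With a check of this shape, a source export that has no .git directory still
resolves as long as the .slnx sits beside src/.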

clients/proto/proto-inputs.json's protoRoot updated to point at
src/ZB.MOM.WW.MxGateway.Contracts/Protos.

Verified by `dotnet build` and a full `dotnet test` of the .slnx with
MXGATEWAY_RUN_LIVE_{MXACCESS,LDAP,GALAXY}_TESTS=1:
  Tests: 472/472 pass
  Worker.Tests: 280/280 pass (4 dev-rig [Fact(Skip=...)] skipped)
  IntegrationTests: 18/18 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-05-23 16:22:23 -04:00
parent 867bf18116
commit dc9c0c950c
491 changed files with 32854 additions and 8414 deletions
+40 -13
@@ -33,23 +33,23 @@ project targets .NET Framework 4.8, but the SDK resolver comes from the .NET SDK
installation:
```powershell
dotnet msbuild src\MxGateway.Worker\MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
dotnet msbuild src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /restore /p:Configuration=Debug /p:Platform=x86
```
`docs/ToolchainLinks.md` records the Visual Studio MSBuild executable for
classic .NET Framework and COM interop builds:
```powershell
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\MxGateway.Worker\MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
& "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild.exe" src\ZB.MOM.WW.MxGateway.Worker\ZB.MOM.WW.MxGateway.Worker.csproj /p:Configuration=Debug /p:Platform=x86
```
Run the worker tests with the same platform target:
```powershell
dotnet test src\MxGateway.Worker.Tests\MxGateway.Worker.Tests.csproj -p:Platform=x86
dotnet test src\ZB.MOM.WW.MxGateway.Worker.Tests\ZB.MOM.WW.MxGateway.Worker.Tests.csproj -p:Platform=x86
```
The only MXAccess interop reference belongs in `MxGateway.Worker`. Gateway and
The only MXAccess interop reference belongs in `ZB.MOM.WW.MxGateway.Worker`. Gateway and
test projects may reference the worker project for metadata and scaffold tests,
but they must not reference `ArchestrA.MXAccess.dll` directly.
@@ -132,7 +132,7 @@ credential, or API key values before the message is written.
## Internal Components
```text
MxGateway.Worker
ZB.MOM.WW.MxGateway.Worker
Program
Bootstrap
WorkerOptions
@@ -251,7 +251,7 @@ The loop should update a heartbeat timestamp after:
- processing an MXAccess event.
`StaRuntime` implements this runtime boundary in the worker. It starts one
background thread named `MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
background thread named `ZB.MOM.WW.MxGateway.Worker.STA`, sets it to `ApartmentState.STA`,
initializes COM through `StaComApartmentInitializer`, and runs
`StaMessagePump`. Commands are scheduled through `InvokeAsync`; the command
queue signals an `AutoResetEvent` so `MsgWaitForMultipleObjectsEx` can wake the
@@ -655,12 +655,39 @@ the event queue implementation owns those counters.
The STA watchdog currently emits a `WorkerFault` with
`WorkerFaultCategory.StaHung` when `LastStaActivityUtc` is older than
`WorkerPipeSessionOptions.HeartbeatGrace`. The fault includes the current
command correlation id when a command is active. Command duration and high event
queue depth remain observable through heartbeat fields until dedicated
thresholds own those warnings. The worker reports stale STA activity, but the
gateway owns the final kill decision through its existing heartbeat and worker
lifecycle policy.
`WorkerPipeSessionOptions.HeartbeatGrace` **and no command is in flight**.
`StaRuntime.ProcessQueuedCommands` calls `MarkActivity()` only immediately
before and after each work item, so a synchronously long-running STA command
(for example a `ReadBulk` waiting `timeout_ms` for the first `OnDataChange`)
legitimately freezes `LastStaActivityUtc` for the duration of the wait while
the worker is healthy. The watchdog is therefore suppressed while the
heartbeat snapshot's `CurrentCommandCorrelationId` is non-empty: the worker is
busy executing a command, not hung, and the heartbeat already surfaces the
in-flight correlation id so the gateway can apply its own per-command timeout
if it considers the command too slow. The fault still fires on a truly hung
STA (no command in flight and no activity for longer than `HeartbeatGrace`),
which is the only case the watchdog can usefully distinguish from a slow
command. Command duration and high event queue depth remain observable through
heartbeat fields until dedicated thresholds own those warnings. The worker
reports stale STA activity, but the gateway owns the final kill decision
through its existing heartbeat and worker lifecycle policy.
The in-flight-command suppression itself is bounded by
`WorkerPipeSessionOptions.HeartbeatStuckCeiling` (default 75 seconds = 5 ×
`HeartbeatGrace`). The motivating case for the suppression is a legitimately
slow synchronous command — but a genuinely stuck COM call (for example
against a dead MXAccess provider whose cross-apartment marshaler is
permanently blocked, or a write completion that never fires) leaves
`CurrentCommandCorrelationId` non-empty indefinitely. Without an upper bound
the worker-side `StaHung` watchdog would be permanently defeated for that
session and only the gateway's per-command timeout would catch the hang —
losing the worker-originated diagnostic (`StaHung` fault category, the
stale-by interval) from the gateway audit trail. Once `LastStaActivityUtc`
has been stale for longer than `HeartbeatStuckCeiling`, the watchdog fires
`StaHung` regardless of whether a command is in flight, on the assumption
that no legitimate STA command should run that long without periodically
refreshing activity. Deployments that legitimately run very long bulk
operations should raise the ceiling rather than disable it.
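
The watchdog rules above can be condensed into a single decision function.
This is a sketch under stated assumptions, not the worker's actual code: the
class name, method name, and parameter list are hypothetical, and only
`LastStaActivityUtc`, `CurrentCommandCorrelationId`, `HeartbeatGrace`, and
`HeartbeatStuckCeiling` are names taken from this document.

```csharp
using System;

public static class StaWatchdogSketch
{
    // Hypothetical helper: returns true when the watchdog should emit a
    // WorkerFault with WorkerFaultCategory.StaHung.
    public static bool ShouldReportStaHung(
        DateTime utcNow,
        DateTime lastStaActivityUtc,
        string currentCommandCorrelationId,
        TimeSpan heartbeatGrace,
        TimeSpan heartbeatStuckCeiling)
    {
        TimeSpan staleFor = utcNow - lastStaActivityUtc;

        // Recent STA activity: nothing to report.
        if (staleFor <= heartbeatGrace)
        {
            return false;
        }

        // Stale with no command in flight: a truly hung STA.
        bool commandInFlight = !string.IsNullOrEmpty(currentCommandCorrelationId);
        if (!commandInFlight)
        {
            return true;
        }

        // Stale with a command in flight: suppressed until the ceiling is
        // exceeded, after which the command is treated as stuck.
        return staleFor > heartbeatStuckCeiling;
    }
}
```

Taking the 75 second default ceiling and the 15 second grace it implies, a
command that has held the STA for 20 seconds is suppressed, while the same
command at 80 seconds produces `StaHung`.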
## Shutdown
@@ -807,7 +834,7 @@ tests. `AddItem` uses `TestChildObject.TestInt` by default and accepts an
override through `MXGATEWAY_LIVE_MXACCESS_ITEM`; `AddItem2` uses the captured
parity fixture shape `AddItem2("TestInt", "TestChildObject")`.
`WorkerLiveMxAccessSmokeTests` in `src/MxGateway.IntegrationTests/` uses the
`WorkerLiveMxAccessSmokeTests` in `src/ZB.MOM.WW.MxGateway.IntegrationTests/` uses the
same opt-in variable for the gateway-to-worker live smoke. It launches the x86
worker through `WorkerProcessLauncher`, opens a gateway session, runs
`Register`, `AddItem`, and `Advise`, waits for one `OnDataChange`, and closes