Merge remote-tracking branch 'origin/main' into agent-3/issue-32-implement-heartbeat-and-watchdog

# Conflicts:
#	src/MxGateway.Worker/Ipc/WorkerPipeSession.cs
#	src/MxGateway.Worker/MxAccess/MxAccessStaSession.cs
This commit is contained in:
Joseph Doherty
2026-04-26 19:16:42 -04:00
70 changed files with 3195 additions and 364 deletions
+13
View File
@@ -18,6 +18,12 @@ event, value, and status shapes.
Generated C# output is written to `src/MxGateway.Contracts/Generated/`. Do not
hand-edit generated files.
Client generation inputs are published through
`clients/proto/proto-inputs.json` and the descriptor set under
`clients/proto/descriptors/`. See
[Client Proto Generation](./client-proto-generation.md) for language-specific
generation inputs, output directories, and golden protobuf JSON fixtures.
## Generation
Run the contracts build to regenerate C# protobuf and gRPC code:
@@ -39,8 +45,15 @@ gateway and test projects:
dotnet build src/MxGateway.sln
```
Regenerate the client descriptor after changing either `.proto` file:
```bash
powershell -ExecutionPolicy Bypass -File scripts/publish-client-proto-inputs.ps1
```
## Related Documentation
- [Client Proto Generation](./client-proto-generation.md)
- [Gateway Process Detailed Design](./gateway-process-design.md)
- [MXAccess Worker Instance Detailed Design](./mxaccess-worker-instance-design.md)
- [Protobuf Style Guide](./style-guides/ProtobufStyleGuide.md)
+15
View File
@@ -26,6 +26,11 @@ Language-specific plans:
- `docs/clients-python-design.md`
- `docs/clients-java-design.md`
Shared generation inputs:
- `docs/client-proto-generation.md`
- `clients/proto/proto-inputs.json`
Language style guides:
| Client | Style guide |
@@ -365,6 +370,16 @@ examples/
Generated code should be reproducible from `src/MxGateway.Contracts/Protos/`.
Do not hand-edit generated code.
The stable client proto manifest defines the generated-code directories:
```text
clients/dotnet/generated
clients/go/internal/generated
clients/rust/src/generated
clients/python/src/mxgateway/generated
clients/java/src/main/generated
```
## Versioning
All clients should expose:
+133
View File
@@ -0,0 +1,133 @@
# Client Proto Generation
This document defines the stable protobuf inputs that official clients use to
generate language-specific gRPC bindings. The checked-in `.proto` files remain
the source of truth so clients do not drift from the gateway and worker
contracts.
## Stable Inputs
The stable client input manifest is `clients/proto/proto-inputs.json`. It
records:
- the public gateway protocol version,
- the worker IPC protocol version,
- the protobuf import root,
- the public and worker source files,
- the descriptor set path,
- golden fixture locations,
- generated-code output directories for each planned client.
The source files listed by the manifest are:
- `src/MxGateway.Contracts/Protos/mxaccess_gateway.proto`
- `src/MxGateway.Contracts/Protos/mxaccess_worker.proto`
`mxaccess_gateway.proto` defines the public gRPC service and shared DTOs.
`mxaccess_worker.proto` is included in the descriptor because worker-aware
tests and fake-worker clients need the same command, reply, event, value, and
status shapes.
## Protocol Version
`GatewayContractInfo.GatewayProtocolVersion` is the public gateway protocol
version. `OpenSessionReply.gateway_protocol_version` returns the same value so
clients can compare their generated bindings against the gateway before issuing
MXAccess commands.
`GatewayContractInfo.WorkerProtocolVersion` remains the gateway-to-worker IPC
protocol version. It is also present in `OpenSessionReply` because parity
fixtures and fake-worker tests need to know the worker contract used by the
session.
## Descriptor Publishing
Run this command after changing either source `.proto` file or the client proto
manifest:
```powershell
scripts/publish-client-proto-inputs.ps1
```
The script writes
`clients/proto/descriptors/mxaccessgw-client-v1.protoset` with imports and
source information included. The descriptor is a generated artifact; do not edit
it by hand.
Use the check mode in CI or before committing:
```powershell
scripts/publish-client-proto-inputs.ps1 -Check
```
`-Check` rebuilds the descriptor in a temporary path and fails when the checked
in descriptor is stale.
## Output Directories
The manifest declares these generated-code directories:
| Client | Directory |
|--------|-----------|
| .NET | `clients/dotnet/generated` |
| Go | `clients/go/internal/generated` |
| Rust | `clients/rust/src/generated` |
| Python | `clients/python/src/mxgateway/generated` |
| Java | `clients/java/src/main/generated` |
Only generator output belongs in these directories. Handwritten client wrappers
belong in the language-specific source trees created by the client scaffold
issues.
## Language Generation Inputs
All generators use `src/MxGateway.Contracts/Protos` as the protobuf import
root. The checked-in descriptor is available when a language build prefers a
descriptor input, but the `.proto` files remain canonical.
.NET generation currently runs through the contracts project:
```powershell
dotnet build src/MxGateway.Contracts/MxGateway.Contracts.csproj
```
Future .NET client projects may either reference `MxGateway.Contracts` or
generate client-local files into `clients/dotnet/generated` with `Grpc.Tools`.
Go clients should generate `mxaccess_gateway.proto` and
`mxaccess_worker.proto` into `clients/go/internal/generated` with
`protoc-gen-go` and `protoc-gen-go-grpc`. Keep generated packages internal
unless the wrapper API intentionally exposes raw protobuf messages.
Rust clients should use `tonic-build` or the selected protobuf generator from
the Rust client build script, with generated modules placed under
`clients/rust/src/generated` or included from the build output according to the
client crate design.
Python clients should use `grpc_tools.protoc` and write generated modules under
`clients/python/src/mxgateway/generated` so imports stay separate from
handwritten async wrappers.
Java clients should use the Gradle protobuf plugin and write generated sources
under `clients/java/src/main/generated`. The Java client scaffold owns the
Gradle plugin versions and source-set wiring.
## Golden Fixtures
Golden protobuf JSON fixtures live in `clients/proto/fixtures/golden`. They
exercise payloads that every language client must parse:
- `open-session-reply.ok.json`
- `register-command-request.json`
- `on-data-change-event.json`
The fixtures use protobuf JSON field names and enum values. Contract tests parse
them with the generated C# types so schema drift is caught before client
generation work starts.
## Related Documentation
- [Protobuf Contracts](./Contracts.md)
- [Client Libraries Detailed Design](./client-libraries-design.md)
- [Client Libraries Implementation Plan](./implementation-plan-clients.md)
- [Protobuf Style Guide](./style-guides/ProtobufStyleGuide.md)
+33 -30
View File
@@ -34,9 +34,13 @@ SignalR circuit. Bootstrap is sufficient for a basic dashboard.
## Hosting Model
The dashboard is hosted by `MxGateway.Server` alongside the gRPC API.
The dashboard is hosted by `MxGateway.Server` alongside the gRPC API. When
`MxGateway:Dashboard:Enabled` is `true`, `MapGatewayDashboard()` maps the
configured `Dashboard:PathBase` to the Blazor Server app and maps the login,
logout, and access-denied HTTP endpoints beside it. When dashboard hosting is
disabled, those routes are not mapped.
Suggested endpoint layout:
Endpoint layout:
```text
/dashboard
@@ -45,7 +49,7 @@ Suggested endpoint layout:
/dashboard/workers
/dashboard/events
/dashboard/settings
/_blazor
/dashboard/_blazor
```
The app should redirect `/` to `/dashboard` only if the deployment wants the
@@ -59,9 +63,10 @@ MxGateway.Server
Components/
App.razor
Routes.razor
DashboardPageBase.cs
DashboardDisplay.cs
Layout/
DashboardLayout.razor
NavMenu.razor
Pages/
DashboardHome.razor
SessionsPage.razor
@@ -69,26 +74,21 @@ MxGateway.Server
WorkersPage.razor
EventsPage.razor
SettingsPage.razor
Components/
Shared/
MetricCard.razor
SessionTable.razor
WorkerTable.razor
EventRatePanel.razor
StatusBadge.razor
FaultList.razor
Services/
DashboardSnapshotService.cs
DashboardUpdateHub.cs
DashboardAuthorization.cs
Models/
DashboardSnapshot.cs
SessionSummary.cs
WorkerSummary.cs
MetricSummary.cs
DashboardSnapshotService.cs
DashboardAuthorizationHandler.cs
DashboardAuthenticator.cs
DashboardSnapshot.cs
DashboardSessionSummary.cs
DashboardWorkerSummary.cs
DashboardMetricSummary.cs
```
`DashboardUpdateHub` here means an internal application update service, not a
separate public SignalR hub unless implementation proves one is needed. Blazor
Server already uses SignalR for UI circuits.
Blazor Server provides the SignalR circuit for UI updates. The implementation
does not add a separate public dashboard hub.
## Dashboard Data Source
@@ -137,7 +137,7 @@ gateway internals.
Use Blazor Server component state updates for real-time dashboard refresh.
Recommended pattern:
Implemented pattern:
1. Page/component subscribes to `WatchSnapshotsAsync`.
2. Snapshot service emits updates from a bounded channel or timer.
@@ -147,10 +147,8 @@ Recommended pattern:
Default update cadence:
- immediate update on session create/close/fault,
- immediate update on worker fault,
- periodic metrics refresh every 1 second,
- event-rate windows updated every 1 second.
- event counters update on the next snapshot tick.
Avoid pushing every MXAccess data-change event to the dashboard. Aggregate event
counts and rates instead.
@@ -320,7 +318,9 @@ Suggested configuration:
## Styling
Use Bootstrap utility classes and a small local stylesheet.
The dashboard serves Bootstrap 5.3.3 assets from
`src/MxGateway.Server/wwwroot/lib/bootstrap/` and local layout/status styling
from `src/MxGateway.Server/wwwroot/css/dashboard.css`.
Recommended visual language:
@@ -361,15 +361,18 @@ Integration tests should verify:
## Initial Implementation Slice
The first dashboard slice should implement:
The first dashboard slice implements:
1. Blazor Server hosting in `MxGateway.Server`.
2. Bootstrap static assets.
2. local Bootstrap static assets.
3. dashboard configuration binding.
4. dashboard auth using API key login and HTTP-only cookie.
5. read-only `DashboardSnapshotService`.
6. home page with metric cards.
7. sessions page with active session table.
7. sessions page with active session table and session details.
8. workers page with worker table.
9. 1-second realtime refresh through Blazor Server.
10. redaction tests for secrets.
9. events page with aggregate counters.
10. settings page with redacted effective configuration.
11. periodic realtime refresh through Blazor Server.
12. route-mapping tests, disabled-dashboard tests, auth tests, and snapshot
projection/redaction tests.
@@ -277,6 +277,8 @@ Live tests:
Labels: `area:worker`, `type:feature`, `priority:p0`
Status: implemented.
Deliverables:
- handlers for `OnDataChange`,
+45 -3
View File
@@ -348,9 +348,28 @@ Event handling rules:
- Enqueue to the outbound event queue.
- Return quickly to preserve message pumping.
If event conversion throws, catch it inside the event handler, enqueue a
structured `WorkerFault` or diagnostic event, and keep the worker alive only if
the fault policy allows it.
`MxAccessBaseEventSink` implements the COM connection-point handlers and keeps
the handlers limited to event argument conversion plus enqueue. It uses
`MxAccessEventMapper` to create `MxEvent` DTOs for `OnDataChange`,
`OnWriteComplete`, `OperationComplete`, and `OnBufferedDataChange`. The mapper
converts scalar and array values through `VariantConverter`, converts
`MXSTATUS_PROXY[]` through `MxStatusProxyConverter`, and maps installed
`MxDataType` values to the public protobuf enum while preserving the raw data
type on buffered events. `OperationComplete` is only emitted from the native
`OperationComplete` handler; write completion does not synthesize it.
`MxAccessEventQueue` is the bounded outbound event queue for one worker
session. It assigns the monotonic `WorkerSequence` and `WorkerTimestamp` when an
event is accepted, preserving the order in which MXAccess handlers enqueue
events. The default capacity is `10000`. When the queue reaches capacity it
records a `WorkerFaultCategory.QueueOverflow` fault and rejects further events.
The event handler catches conversion and enqueue failures, records the first
fault on the queue, and returns to the STA message pump instead of writing to
the pipe.
If event conversion throws, catch it inside the event handler, record a
structured `WorkerFault`, and keep the worker alive only if the fault policy
allows it.
## Command Queue
@@ -451,6 +470,26 @@ cross-server handle behavior remains owned by MXAccess. COM exceptions continue
through `StaCommandDispatcher`, which preserves the HRESULT and leaves
diagnostic registry state unchanged for failed cleanup calls.
`MxAccessCommandExecutor` implements advice lifecycle commands on the same STA
path:
- `Advise` calls `LMXProxyServerClass.Advise` with the requested server handle
and item handle.
- `AdviseSupervisory` calls `LMXProxyServerClass.AdviseSupervisory` with the
requested server handle and item handle. This remains a distinct command from
plain `Advise` even though observed scalar captures share the same lower-level
subscription body.
- `UnAdvise` calls `LMXProxyServerClass.UnAdvise` with the requested server
handle and item handle.
The worker records plain and supervisory advice separately only after the COM
call returns normally. Successful `UnAdvise` removes all tracked advice for the
server and item pair because the public MXAccess cleanup method has no plain
versus supervisory selector. Successful `RemoveItem` and `Unregister` also clear
related advice state from the worker registry. Failed advice and cleanup calls
leave registry state unchanged so diagnostics continue to reflect the last
successful MXAccess-owned state transition.
## Handle Registry
The worker should track MXAccess state for diagnostics and cleanup, while still
@@ -475,6 +514,9 @@ Rules:
- Remove server handles only after `Unregister` succeeds.
- Record item handles only after `AddItem` or `AddItem2` succeeds.
- Remove item handles only after `RemoveItem` succeeds.
- Record advice state only after `Advise` or `AdviseSupervisory` succeeds.
- Remove advice state only after `UnAdvise`, `RemoveItem`, or `Unregister`
succeeds.
- Preserve invalid-handle behavior from MXAccess.
- Preserve cross-server handle behavior from MXAccess.
- Use registry state for cleanup and diagnostics, not semantic correction.