docs: alarms-over-gateway — add Track E client surface refresh

Cover both client surfaces that become user-visible when the alarm
path lights up:

- mxaccessgw client SDKs in 5 languages (.NET, Python, Go, Java,
  Rust). E.1 regens proto across all of them; E.2-E.6 add per-language
  alarms helpers (subscribe / acknowledge / query-active) plus matching
  CLI verbs.
- lmxopcua OPC UA-facing clients (Client.CLI, Client.UI). E.7 extends
  AlarmEventArgs with the new optional fields, surfaces them in the
  CLI's --verbose / --json output and the UI's Show-details toggle,
  and updates ClientRequirements + Client.{CLI,UI}.md.

Sequencing: E.1 first (mechanical regen), then E.2-E.7 in parallel.
E.2 (.NET) is on the critical path because lmxopcua consumes it; the
other-language SDKs can ship asynchronously without gating D.1.

12 PRs grew to 19 total: 4 in A, 5 in B, 2 in C, 7 in E, 1 in D.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-30 15:20:57 -04:00
parent 8d0e13e69e
commit 49ae6e7b6f


@@ -534,6 +534,252 @@ completes that slot. Two PRs in the sidecar + one consumer-side PR
C.2's lmxopcua-side consumer is **PR B.4 in Track B**, which depends
on C.2 being deployed.
## Track E — client surface refresh
Two surfaces become user-visible when the alarm path lights up: the
**mxaccessgw client SDKs** (5 languages, each with its own CLI) that
consume the new `OnAlarmTransition` event family + `AcknowledgeAlarm`
/ `QueryActiveAlarms` RPCs directly, and the **lmxopcua OPC UA-facing
clients** (Client.CLI, Client.UI) that consume the richer Part 9
condition payload through the OPC UA server. Both need updates so the
new fields actually reach end-users; without Track E, the data
arrives at the gateway / OPC UA server but the off-the-shelf clients
display the same five columns they did under v2-pre-this-epic.
Track E is split per-language so each PR stays small and reviewable.
PRs E.2 through E.6 are independent — they share only the proto
regen from E.1 — and can land in parallel by whoever owns each
language binding.
### PR E.1 — regenerate proto across all client SDKs
**Depends on:** A.1 merged (proto change live).
**Files** (`c:\Users\dohertj2\Desktop\mxaccessgw\clients\`):
1. **.NET** — codegen runs on csproj rebuild via `Grpc.Tools`; just
rebuild `MxGateway.Client.csproj` after pulling A.1.
2. **Python** — run `clients\python\generate-proto.ps1`; commit the
regenerated `_pb2.py` + `_pb2_grpc.py` files under
`clients\python\src\`.
3. **Go** — run `clients\go\generate-proto.ps1`; commit the
regenerated `*.pb.go` + `*_grpc.pb.go` files under
`clients\go\mxgateway\`.
4. **Java** — Gradle's `protobuf-gradle-plugin` regenerates on
`gradle build`; verify the new types appear in the build
output. Commit any pinned generated source under
`clients\java\mxgateway-client\src\main\java\` if that's the
convention (check `JavaClientDesign.md`).
5. **Rust** — `build.rs` runs `tonic-build` on the proto; just
`cargo build`. Generated code lives under
`clients\rust\target\` (gitignored) — nothing to commit;
verify the new types compile.
No hand-written code in this PR. Pure regen + commit of generated
artifacts. Per-language pre-existing proto-regen tests in each
client's test suite must stay green.
### PR E.2 — .NET client SDK + CLI
**Depends on:** E.1, A.3 (gateway alarm dispatch + ack RPC live).
**Files** (`clients\dotnet\MxGateway.Client\` + `MxGateway.Client.Cli\`):
1. `MxGatewayClient.cs` — new public methods:
```csharp
IAsyncEnumerable<AlarmTransition> SubscribeAlarmsAsync(
MxGatewaySession session,
AlarmFilter? filter = null,
CancellationToken ct = default);
Task<MxStatus> AcknowledgeAlarmAsync(
MxGatewaySession session,
string alarmFullReference,
string comment,
string userPrincipal,
CancellationToken ct = default);
IAsyncEnumerable<ActiveAlarmSnapshot> QueryActiveAlarmsAsync(
MxGatewaySession session,
string? filterPrefix = null,
CancellationToken ct = default);
```
Existing `MxGatewayClientRetryPolicy` covers the new operations
without bespoke retry config.
2. `MxGateway.Client.Cli` — add `alarms` verb with subcommands:
`subscribe` (streams transitions until cancelled),
`acknowledge --ref <full-ref> --comment "<text>"`,
`query-active [--prefix <equipment>]`. Output formatting mirrors
the existing `events stream` verb (default human-readable +
`--json` flag for machine output).
3. AuthN — `MxGatewayClientOptions` validates new scopes
`invoke:alarm-ack` / `invoke:alarm-query` exist on the API key
when those operations are invoked; pre-flight check fails fast
with a clear error rather than letting the gateway return
`PERMISSION_DENIED` mid-stream.
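
The pre-flight check in item 3 is language-independent, so it can be sketched in a few lines of Python. This is only an illustration under assumptions: the scope strings come from the plan, but `ApiKey`, `require_scope`, `MissingScopeError`, and the operation keys in `REQUIRED_SCOPES` are hypothetical names, not part of any existing SDK.

```python
from dataclasses import dataclass, field


class MissingScopeError(Exception):
    """Raised before any RPC is sent when the API key lacks a scope."""


# Scope strings are from the plan; the operation keys are hypothetical.
REQUIRED_SCOPES = {
    "acknowledge_alarm": "invoke:alarm-ack",
    "query_active_alarms": "invoke:alarm-query",
}


@dataclass(frozen=True)
class ApiKey:
    """Hypothetical stand-in for what the client knows about its key."""
    key_id: str
    scopes: frozenset = field(default_factory=frozenset)


def require_scope(api_key: ApiKey, operation: str) -> None:
    """Fail fast with a clear message if the key cannot run `operation`."""
    needed = REQUIRED_SCOPES.get(operation)
    if needed is None:
        return  # no alarm-specific scope is required for this operation
    if needed not in api_key.scopes:
        raise MissingScopeError(
            f"API key {api_key.key_id!r} lacks scope {needed!r} "
            f"required for {operation}"
        )
```

Calling `require_scope(key, "acknowledge_alarm")` at the top of the ack helper would surface a missing scope locally, before the gateway has a chance to reject the call.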
**Tests** (`clients\dotnet\MxGateway.Client.Tests\`):
- `FakeGatewayTransport` extended to emit `OnAlarmTransition`
events; assert `SubscribeAlarmsAsync` yields each as the right
payload shape.
- Ack: assert request shape, retry policy, and error wrapping
(Unauthenticated → `MxGatewayAuthenticationException`,
PermissionDenied → `MxGatewayAuthorizationException`,
ResourceExhausted → `MxGatewayException` with the right
message).
- CLI verb tests in `MxGatewayClientCliTests.cs` — argument
parsing, JSON output shape, exit codes.
### PR E.3 — Python client SDK + CLI
**Depends on:** E.1.
**Files** (`clients\python\src\mxgateway\` + the existing CLI entry
point — verify the exact name during PR; `PythonClientDesign.md`
documents it):
1. New module `alarms.py` exposing async helpers:
```python
async def subscribe_alarms(session, *, filter=None) -> AsyncIterator[AlarmTransition]: ...
async def acknowledge_alarm(session, *, alarm_ref, comment, user) -> MxStatus: ...
async def query_active_alarms(session, *, prefix=None) -> AsyncIterator[ActiveAlarmSnapshot]: ...
```
2. CLI: add `alarms subscribe / acknowledge / query-active` verbs.
Use the same JSON output schema as E.2's CLI so cross-language
tooling can parse either.
3. Type stubs (`*.pyi`) updated for the new types.
**Tests** (`clients\python\tests\`):
- pytest-asyncio fixtures using a stub gRPC server; assert each
helper's request/response shape.
- CLI smoke via `subprocess` + captured stdout JSON comparison.
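
As a rough sketch of what a helper-level test could look like before a real stub gRPC server exists, the example below drives a `subscribe_alarms`-shaped async generator from an in-memory fake. Everything beyond the helper name and keyword-only `filter` argument is an assumption: the `AlarmTransition` fields, the `FakeSession` class, and the choice to model the filter as a plain predicate are illustrative, and the real types come from the regenerated proto.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Iterable, Optional


@dataclass(frozen=True)
class AlarmTransition:
    # Illustrative fields only; the real message is defined by the proto.
    alarm_ref: str
    state: str


class FakeSession:
    """In-memory stand-in for a session backed by a stub server."""

    def __init__(self, transitions: Iterable[AlarmTransition]) -> None:
        self._transitions = list(transitions)

    async def alarm_stream(self) -> AsyncIterator[AlarmTransition]:
        for transition in self._transitions:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield transition


async def subscribe_alarms(
    session: FakeSession,
    *,
    filter: Optional[Callable[[AlarmTransition], bool]] = None,
) -> AsyncIterator[AlarmTransition]:
    """Yield transitions from the session, skipping any the filter rejects."""
    async for transition in session.alarm_stream():
        if filter is None or filter(transition):
            yield transition


async def collect(session: FakeSession, **kwargs) -> list:
    """Drain the subscription into a list so a test can assert on it."""
    return [t async for t in subscribe_alarms(session, **kwargs)]
```

A pytest-asyncio test would await `collect(...)` directly; outside pytest, `asyncio.run(collect(session))` does the same job.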
### PR E.4 — Go client SDK + CLI
**Depends on:** E.1.
**Files** (`clients\go\mxgateway\` + `clients\go\cmd\`):
1. New `alarms.go` exposing:
```go
func (c *Client) SubscribeAlarms(ctx context.Context, opts ...SubscribeOption) (<-chan AlarmTransition, error)
func (c *Client) AcknowledgeAlarm(ctx context.Context, ref, comment, user string) (MxStatus, error)
func (c *Client) QueryActiveAlarms(ctx context.Context, prefix string) ([]ActiveAlarmSnapshot, error)
```
2. CLI: add `alarms` subcommand under `clients\go\cmd\mxgateway-cli\`
(verify the binary name in `GoClientDesign.md`). Same verb shape
as E.2 / E.3.
3. Errors wrap named sentinels (`ErrAuthFailed`, `ErrPermissionDenied`,
etc.) so callers can distinguish failure modes programmatically
with `errors.Is`.
**Tests:** standard Go table-driven tests against a stub gRPC server
under `clients\go\internal\testserver\`.
### PR E.5 — Java client SDK + CLI
**Depends on:** E.1.
**Files** (`clients\java\mxgateway-client\src\main\java\` +
`clients\java\mxgateway-cli\`):
1. New methods on the existing client class (verify in
`JavaClientDesign.md`):
```java
Flowable<AlarmTransition> subscribeAlarms(Session s, AlarmFilter filter);
Single<MxStatus> acknowledgeAlarm(Session s, String alarmRef, String comment, String user);
Flowable<ActiveAlarmSnapshot> queryActiveAlarms(Session s, String prefix);
```
(RxJava idiom matching the existing data-change subscription
API; if the existing API uses `CompletableFuture` instead, follow
that convention — verify during PR.)
2. CLI: same `alarms subscribe / acknowledge / query-active`
verbs.
**Tests:** JUnit 5 + a stub gRPC server. CLI tested via
`ProcessBuilder` exec + JSON output comparison.
### PR E.6 — Rust client SDK
**Depends on:** E.1.
**Files** (`clients\rust\crates\mxgateway-client\src\` +
likely a `mxgateway-cli` crate — verify in `RustClientDesign.md`):
1. New methods on the client struct:
```rust
pub fn subscribe_alarms(&self, filter: Option<AlarmFilter>) -> impl Stream<Item = Result<AlarmTransition>>;
pub async fn acknowledge_alarm(&self, alarm_ref: &str, comment: &str, user: &str) -> Result<MxStatus>;
pub fn query_active_alarms(&self, prefix: Option<&str>) -> impl Stream<Item = Result<ActiveAlarmSnapshot>>;
```
2. CLI: same verb shape.
3. `thiserror`-based error enum extended with `AlarmAckPermissionDenied`
etc. variants if the existing pattern uses one.
**Tests:** `tokio::test` against an in-process stub gRPC server that
implements the `tonic`-generated service trait. CLI tested via `assert_cmd`.
### PR E.7 — lmxopcua OPC UA-facing client refresh
**Depends on:** B.2 + B.3 (server-side payload final on the OPC UA
wire). Independent of E.2-E.6 — different consumer surface (OPC UA
Part 9, not gateway gRPC).
**Files** (`c:\Users\dohertj2\Desktop\lmxopcua\src\`):
1. `Core.Abstractions\AlarmEventArgs.cs` *(extend, not new)* — add
optional fields the new path surfaces:
- `OperatorComment` (nullable string — populated by the native
ack path; null on sub-attribute fallback path)
- `OriginalRaiseTimestampUtc` (nullable; null on fallback path)
- `AlarmCategory` (nullable string)
- `AlarmTypeName` (already exists per v1 docs — leave alone)
2. `Server\OpcUa\DriverNodeManager.cs` — populate the corresponding
OPC UA Part 9 condition fields when the new payload is non-null:
`Comment` (from OperatorComment), `Time` (from OriginalRaiseTimestampUtc
when present, else event arrival time), `ConditionClassName` (from
AlarmCategory if mapping is defined).
3. `Client.Shared\Models\AlarmEventArgs.cs` — mirror the new fields
on the client-side DTO.
4. `Client.CLI\Commands\AlarmsCommand.cs` — add columns under a new
`--verbose` flag, plus full payload under `--json`. Default output
stays five-column compatible.
5. `Client.UI\ViewModels\AlarmEventViewModel.cs` — bind the new
fields. Add columns to `Views\AlarmsView.axaml` (collapsible
under a "Show details" toggle so the default view stays compact).
Surface `OperatorComment` in `AckAlarmWindow.axaml` as a
prepopulated default when re-acknowledging an already-acked
alarm.
6. `docs\Client.CLI.md` — add the new `--verbose` and `--json`
flag examples to the alarms section.
7. `docs\Client.UI.md` — add a screenshot or description of the
"Show details" expansion behavior.
8. `docs\reqs\ClientRequirements.md` — lines 116 and 153 reference
the alarm subscription contract; extend the field list to cover
the new payload.
9. `docs\AlarmTracking.md` (new in B.5) — wire in client-side
examples.
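
The fallback rules in item 2 (populate `Comment`, `Time`, and `ConditionClassName` only when the source value is available) can be sketched as a small pure function. This is a Python illustration of the logic, not the `DriverNodeManager` code: the dataclass mirrors only the three new optional fields, and `to_condition_fields` and the `category_to_class` mapping are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class AlarmEventArgs:
    """Only the three new optional fields named in the plan."""
    operator_comment: Optional[str] = None
    original_raise_timestamp_utc: Optional[datetime] = None
    alarm_category: Optional[str] = None


def to_condition_fields(
    args: AlarmEventArgs,
    arrival_time_utc: datetime,
    category_to_class: Mapping[str, str],
) -> dict:
    """Return the Part 9 fields to set; absent optional values are omitted."""
    fields = {}
    # Time uses the original raise timestamp when present,
    # otherwise the event arrival time.
    if args.original_raise_timestamp_utc is not None:
        fields["Time"] = args.original_raise_timestamp_utc
    else:
        fields["Time"] = arrival_time_utc
    if args.operator_comment is not None:
        fields["Comment"] = args.operator_comment
    # ConditionClassName is set only if a mapping exists for the category.
    if args.alarm_category is not None:
        class_name = category_to_class.get(args.alarm_category)
        if class_name is not None:
            fields["ConditionClassName"] = class_name
    return fields
```

Keeping the mapping as a function of plain values makes the all-populated and all-null cases from the test list easy to cover without an OPC UA server.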
**Tests:**
- `Client.Shared.Tests` — DTO round-trip through the alarm event
pump with all fields populated and all-null cases.
- `Client.CLI.Tests` — `--verbose` column ordering, `--json`
schema validation, default output stays five-column.
- `Client.UI.Tests` — `AlarmEventViewModel` bindings exposed,
collapsible-detail toggle behavior.
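
The three CLI output modes (default five columns, extra columns under `--verbose`, full payload under `--json`) reduce to one formatting function, sketched here in Python. The column names are placeholders: the plan says the default view has five columns but does not name them, so `DEFAULT_COLUMNS` is an assumption, as is the tab-separated layout.

```python
import json
from typing import Any, Mapping

# Hypothetical names: the plan only states that there are five default columns.
DEFAULT_COLUMNS = ("time", "source", "condition", "severity", "state")
# The new optional fields, appended after the defaults in verbose mode.
VERBOSE_COLUMNS = (
    "operator_comment",
    "original_raise_timestamp_utc",
    "alarm_category",
)


def format_alarm(
    event: Mapping[str, Any],
    *,
    verbose: bool = False,
    as_json: bool = False,
) -> str:
    """Render one alarm event as a tab-separated row or a JSON object."""
    if as_json:
        # JSON mode always carries the full payload, including null fields.
        return json.dumps(dict(event), sort_keys=True, default=str)
    columns = DEFAULT_COLUMNS + (VERBOSE_COLUMNS if verbose else ())
    # Missing or null values render as empty cells so column positions hold.
    return "\t".join(
        "" if event.get(name) is None else str(event[name]) for name in columns
    )
```

Because verbose columns are only ever appended, scripts that parse the first five fields of the default output keep working.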
### Sequencing within Track E
E.1 first (mechanical). E.2-E.7 can land in parallel. E.7 has its own
dependency chain inside lmxopcua (B.2 + B.3) and doesn't gate any
other E PR. The .NET client (E.2) is the only language SDK
**lmxopcua** consumes today; if the gateway repo's release schedule
prefers landing E.2 first and shipping E.3-E.6 in a follow-up release,
that's a valid sequence — the customer-facing constraint is "at
least one language SDK ships at the same time as A.4 lights up the
gateway dispatch."
## Track D — deployment refresh
The dev box at `DESKTOP-6JL3KKO` runs three live services from
@@ -674,38 +920,51 @@ duration (parity-rig validation gate; see "Test gates" above).
## Sequencing matrix
```
Track A (mxaccessgw) Track B (lmxopcua) Track C (sidecar)
───────────────────────── ───────────────────────── ─────────────────────────
A.1 proto (waits) C.1 AahClientManagedAlarmEventWriter
│ │ no cross-repo dep
├──────────────────────────► B.1 EventPump branch │
A.2 worker subscription │ uses proto types only │
│ │ unit-testable │
│ C.2 Program.cs wires writer
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource │
│ │ │
│ ──►B.3 DriverNodeManager routing │
│ │
A.4 ConditionRefresh
B.4 SidecarAlarmHistorianWriter
(depends on C.2 deployed)
Track D (deployment)
─────────────────────────
D.1 Refresh C:\publish + restart services
(depends on A.4 + B.4 + C.2 merged)
──►B.5 docs + memory + completion banner
Track A (mxaccessgw) Track B (lmxopcua) Track C (sidecar) Track E (clients)
───────────────────────── ───────────────────────── ───────────────────── ──────────────────────────
A.1 proto (waits) C.1 AahClientManagedWriter E.1 proto regen ×5 langs
│ │ │ (mechanical, after A.1)
├──────────────────────────► B.1 EventPump branch │
A.2 worker subscription │ uses proto types only │
│ │ unit-testable │
│ C.2 Program.cs wires
A.3 gateway dispatch + ack RPC ──►B.2 GalaxyDriver : IAlarmSource │ ──►E.2 .NET SDK + CLI
│ │ │ ──►E.3 Python SDK + CLI
│ ──►B.3 DriverNodeManager routing │ ──►E.4 Go SDK + CLI
│ │ ──►E.5 Java SDK + CLI
│ ──►E.6 Rust SDK
A.4 ConditionRefresh │ │
│ │ │
B.4 SidecarAlarmHistorianWriter │
(depends on C.2 deployed) │ │
│ │
(B.2 + B.3 done) ────────────────────────────────────────────► E.7 lmxopcua client refresh
│ │
▼ │
Track D (deployment) │
───────────────────────── │
D.1 Refresh C:\publish + restart services │
(depends on A.4 + B.4 + C.2 + E.2 merged) │
▼ │
──►B.5 docs + memory + completion banner ◄─────────(E.7 done)──┘
```
A.1 + B.1 + C.1 can all land in parallel — none have cross-repo runtime
dependencies. B.1's tests use proto types without needing a running
gateway. C.1 is purely sidecar-internal. The gateway-side dispatch (A.3)
gates B.2; the sidecar-side wiring (C.2) gates B.4. D.1 (deployment
refresh) gates B.5 (docs) — the docs sweep records the as-deployed
state, so the deploy must be live first.
A.1 + B.1 + C.1 + E.1 can all land in parallel — none have cross-repo
runtime dependencies. B.1's tests use proto types without needing a
running gateway. C.1 is purely sidecar-internal. E.1 is mechanical
codegen.
The gateway-side dispatch (A.3) gates B.2 and E.2-E.6. The
sidecar-side wiring (C.2) gates B.4. E.7 gates on B.2 + B.3 only —
it's the OPC UA client surface, not the gateway client surface.
D.1 (deployment refresh) requires E.2 to also be merged because the
deployed `MxGateway.Client.dll` consumed by GalaxyDriver needs the new
methods. E.3-E.6 (other-language SDKs) don't gate D.1 — they ship on
their own release cadence.
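
The gates described in this section, together with the B.5 rule that follows and the per-PR "Depends on" lines, form a small dependency graph. As a sanity check, a sketch like the one below (Python standard library only) can compute a valid merge order and confirm which PRs actually sit on the D.1 path. The edge list is a transcription of the prose and is an assumption in that sense; edges that appear only in the ASCII matrix are deliberately left out.

```python
from graphlib import TopologicalSorter

# PR -> PRs that must merge first, transcribed from the prose in this plan.
GATES = {
    "E.1": {"A.1"},
    "B.2": {"A.3"},
    "E.2": {"E.1", "A.3"},
    "E.3": {"E.1", "A.3"},
    "E.4": {"E.1", "A.3"},
    "E.5": {"E.1", "A.3"},
    "E.6": {"E.1", "A.3"},
    "E.7": {"B.2", "B.3"},
    "B.4": {"C.2"},
    "D.1": {"A.4", "B.4", "C.2", "E.2"},
    "B.5": {"D.1", "E.7"},
}


def merge_order(gates):
    """Return one valid merge order; raises CycleError if the gates loop."""
    return list(TopologicalSorter(gates).static_order())


def blocks(gates, pr, target="D.1"):
    """True if `pr` is a direct or transitive prerequisite of `target`."""
    seen, stack = set(), [target]
    while stack:
        for dep in gates.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return pr in seen
```

With these edges, `blocks(GATES, "E.2")` is true while `blocks(GATES, "E.3")` is false, which matches the statement that only the .NET SDK gates the deployment refresh.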
B.5 (docs sweep) gates on D.1 + E.7 both merged — it's the final
"snapshot the as-shipped state" pass.
## Test gates
@@ -830,8 +1089,56 @@ needed); land B.4 last and only after end-of-epic gate is green.
- `docs\v2\dev-environment.md` (D.1 — document the refresh workflow)
- `docs\plans\artifacts\d1-rollout-YYYY-MM-DD.md` *(new — D.1 captured smoke run)*
Total: ~10 source files added/modified in mxaccessgw; ~14 in lmxopcua
proper; ~3 in the historian sidecar; ~2 deployment scripts; ~12 test
files across all repos. Should land in 4-6 weeks of focused work given
the parity-rig dependency for end-to-end validation, plus a short
final-week ops slot for D.1.
**mxaccessgw — client SDKs (Track E):**
- `clients\proto\` — no source change; downstream codegen consumes A.1
- **.NET (E.2)**:
- `clients\dotnet\MxGateway.Client\MxGatewayClient.cs`
- `clients\dotnet\MxGateway.Client\Alarms\` *(new namespace)*
- `clients\dotnet\MxGateway.Client.Cli\Verbs\AlarmsVerb.cs` *(new)*
- `clients\dotnet\MxGateway.Client.Tests\AlarmsTests.cs` *(new)*
- **Python (E.3)**:
- `clients\python\src\mxgateway\alarms.py` *(new)*
- `clients\python\src\mxgateway\cli\alarms.py` *(new — verify CLI module path)*
- `clients\python\tests\test_alarms.py` *(new)*
- **Go (E.4)**:
- `clients\go\mxgateway\alarms.go` *(new)*
- `clients\go\cmd\mxgateway-cli\alarms.go` *(new — verify dir name)*
- `clients\go\internal\testserver\alarms_test.go` *(new)*
- **Java (E.5)**:
- `clients\java\mxgateway-client\src\main\java\…\AlarmsApi.java` *(new)*
- `clients\java\mxgateway-cli\src\main\java\…\AlarmsCommand.java` *(new)*
- `clients\java\mxgateway-client\src\test\java\…\AlarmsApiTest.java` *(new)*
- **Rust (E.6)**:
- `clients\rust\crates\mxgateway-client\src\alarms.rs` *(new)*
- `clients\rust\crates\mxgateway-cli\src\alarms.rs` *(new — verify crate name)*
- `clients\rust\tests\alarms.rs` *(new)*
**lmxopcua — OPC UA client refresh (Track E.7):**
- `src\ZB.MOM.WW.OtOpcUa.Core.Abstractions\AlarmEventArgs.cs` (extend)
- `src\ZB.MOM.WW.OtOpcUa.Server\OpcUa\DriverNodeManager.cs` (Part 9 field population)
- `src\ZB.MOM.WW.OtOpcUa.Client.Shared\Models\AlarmEventArgs.cs` (DTO mirror)
- `src\ZB.MOM.WW.OtOpcUa.Client.CLI\Commands\AlarmsCommand.cs` (verbose / json flags)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmEventViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\ViewModels\AlarmsViewModel.cs`
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AlarmsView.axaml` (+ `.cs`)
- `src\ZB.MOM.WW.OtOpcUa.Client.UI\Views\AckAlarmWindow.axaml` (+ `.cs`)
- `docs\Client.CLI.md` (alarms section examples)
- `docs\Client.UI.md` (Show-details toggle description)
- `docs\reqs\ClientRequirements.md` (extend AlarmEventArgs contract)
- `docs\AlarmTracking.md` (B.5 — cross-link client examples)
- `tests\ZB.MOM.WW.OtOpcUa.Client.Shared.Tests\` (DTO round-trip)
- `tests\ZB.MOM.WW.OtOpcUa.Client.CLI.Tests\` (flag behaviour)
- `tests\ZB.MOM.WW.OtOpcUa.Client.UI.Tests\` (view-model bindings)
Total: ~10 source files added/modified in mxaccessgw server/worker
side; ~14 in lmxopcua server/driver side; ~3 in the historian sidecar;
~2 deployment scripts; ~30 across the five gateway-client SDK
languages; ~12 in lmxopcua client surfaces; ~25 test files across
all repos. The gateway-client multi-language work is parallelizable
across maintainers, so wall-clock effort lands in 4-6 weeks of
coordinated work given the parity-rig dependency for end-to-end
validation. If only the .NET SDK ships at first (E.2 only) and
E.3-E.6 follow asynchronously, lmxopcua's critical path stays
unchanged.