diff --git a/docs/plans/2026-05-28-mxgateway-data-connection-design.md b/docs/plans/2026-05-28-mxgateway-data-connection-design.md
new file mode 100644
index 00000000..c5365ffe
--- /dev/null
+++ b/docs/plans/2026-05-28-mxgateway-data-connection-design.md
@@ -0,0 +1,199 @@
+# MxGateway Data Connection — Design
+
+**Date:** 2026-05-28
+**Component:** Data Connection Layer (#4), with touches to Commons (#16), Central UI (#9), Host (#15)
+**Status:** Approved — ready for implementation planning
+
+## Summary
+
+Add a second data-connection protocol, **MxGateway**, alongside the existing OPC UA
+client. MxGateway connects to the MxAccess Gateway
+(`https://gitea.dohertylan.com/dohertj2`, packages `ZB.MOM.WW.MxGateway.Client` +
+`ZB.MOM.WW.MxGateway.Contracts`) over gRPC and exposes an AVEVA/Wonderware
+MXAccess-backed Galaxy as a clean tag-value pipe, identical in role to the OPC UA
+adapter.
+
+The Data Connection Layer was built for exactly this: `DataConnectionFactory`
+exposes `RegisterAdapter(protocolType, factory)` and every surrounding mechanism
+(the `DataConnectionActor` Become/Stash state machine, primary/backup failover,
+health reporting, re-subscribe-on-reconnect) is protocol-agnostic. The new
+protocol is a single `IDataConnection` adapter plus one registration line — no
+changes to the actor, the entity schema, or the failover machinery.
+
+## Scope
+
+**In scope (this slice):**
+- Read / Subscribe / Write — MxGateway as a clean tag-value pipe.
+- Galaxy hierarchy browse for the instance-config tag picker.
+- Optional second endpoint for failover (reusing the existing primary/backup model).
+
+**Out of scope (possible later slices):**
+- Native MXAccess alarms (`QueryActiveAlarms` / `StreamAlarms` / `AcknowledgeAlarm`).
+  ScadaBridge evaluates its own alarms via Alarm Actors from tag values; native
+  alarms are a new concept.
+- Secured writes (`WriteSecured`, operator + verifier userId). Plain writes carry a
+  configurable `WriteUserId` only.
+
+## Decisions
+
+| Decision | Choice |
+|---|---|
+| Approach | New `IDataConnection` adapter behind the existing factory extension point (not a shared base class, not a separate subsystem). |
+| Protocol string | `"MxGateway"` (matches the NuGet package family). |
+| Browse plumbing | **Generalized** to protocol-agnostic browse driven by `IBrowsableDataConnection`; OPC UA and MxGateway share one path. |
+| Write user context | Optional `WriteUserId` config field, default `0`. No script API change. |
+| Endpoint redundancy | Reuse existing primary/backup failover; backup = a second gateway endpoint. |
+| ApiKey secret handling | Match whatever OPC UA `UserIdentityConfig` username/password does today. |
+
+## Section 1 — Adapter & client lifecycle mapping
+
+New project-internal `MxGatewayDataConnection : IDataConnection, IBrowsableDataConnection`
+in `ZB.MOM.WW.ScadaBridge.DataConnectionLayer/Adapters/`, wrapping an injected
+`IMxGatewayClientFactory` (mirrors the `IOpcUaClientFactory` seam so it is
+unit-testable with a fake).
+
+| `IDataConnection` | MxGateway client |
+|---|---|
+| `ConnectAsync(details)` | `MxGatewayClient.Create(Endpoint, ApiKey, TLS)` → `OpenSessionAsync` → `RegisterAsync(clientName)` (store `serverHandle`); start background `StreamEventsAsync` consumer loop |
+| `SubscribeAsync(tagPath, cb)` | `AddItemAsync` → `AdviseAsync` (or `SubscribeBulkAsync`); map `itemHandle ↔ tagPath ↔ callback`; return subscriptionId |
+| `UnsubscribeAsync(id)` | `UnAdviseAsync` + `RemoveItemAsync` |
+| `ReadAsync` / `ReadBatchAsync` | `ReadBulkAsync` (uses cached advised value when present) |
+| `WriteAsync` / `WriteBatchAsync` | `WriteBulkAsync` with `WriteUserId`; value via `ToMxValue()` |
+| `WriteBatchAndWaitAsync` | generic compose: write values → write flag → poll `responsePath` (advised value or `ReadBulk`) until match/timeout |
+| `Status` | `ConnectionHealth` tracked across session state |
+| `Disconnected` | fired once (Interlocked guard) when `StreamEventsAsync` faults or the channel breaks |
+
+**Value/quality mapping.** Each `OnDataChange` `MxEvent` carries `item_handle`,
+`value` (`MxValue` → `ToClrValue()`), `quality` (OPC-style int), `source_timestamp`,
+`statuses`, and `worker_sequence`. Dispatched to the matching tag's
+`SubscriptionCallback` as `TagValue(ToClrValue(value), mapQuality(quality, statuses),
+source_timestamp)`. Quality: `quality >= 192` → `Good`; bad-category status → `Bad`;
+otherwise `Uncertain`. The loop tracks `worker_sequence` and resumes with
+`afterWorkerSequence` on reconnect so no change is missed.
+
+**Reconnection needs no new logic.** The existing `DataConnectionActor` catches
+`Disconnected`, pushes bad quality to all subscribed tags, disposes the adapter, and
+on retry calls `ConnectAsync` on a fresh adapter then re-subscribes all tags —
+identical to OPC UA.
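+
+The quality rule in the value/quality mapping can be sketched as a small pure
+function. This is a minimal sketch under stated assumptions, not the adapter's actual
+code: `TagQuality`, `ItemStatus`, and `QualityMapper` are hypothetical stand-ins for
+whatever the Commons and `ZB.MOM.WW.MxGateway.Contracts` types really look like, and
+the evaluation order (Good first, then Bad, then Uncertain) is assumed from the order
+the rule is written in.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+// Usage: classify a few (quality, statuses) pairs.
+Console.WriteLine(QualityMapper.Map(192, Array.Empty<ItemStatus>()));
+Console.WriteLine(QualityMapper.Map(64, new[] { new ItemStatus(IsBadCategory: false) }));
+Console.WriteLine(QualityMapper.Map(64, new[] { new ItemStatus(IsBadCategory: true) }));
+
+// Hypothetical stand-ins; the real types come from the gateway contracts package.
+public enum TagQuality { Good, Uncertain, Bad }
+public readonly record struct ItemStatus(bool IsBadCategory);
+
+public static class QualityMapper
+{
+    // OPC-style quality: 192 (0xC0) and above is treated as Good.
+    private const int GoodThreshold = 192;
+
+    public static TagQuality Map(int quality, IReadOnlyList<ItemStatus> statuses)
+    {
+        // Assumed precedence: a Good quality code wins over any status entry.
+        if (quality >= GoodThreshold) return TagQuality.Good;
+        // Any bad-category status downgrades the value to Bad.
+        if (statuses.Any(s => s.IsBadCategory)) return TagQuality.Bad;
+        // Everything else is reported as Uncertain.
+        return TagQuality.Uncertain;
+    }
+}
+```
+
+Keeping the mapping free of client and actor dependencies would let the quality cases
+in the adapter tests be covered without a fake event stream.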
+
+## Section 2 — Configuration, secrets & endpoint redundancy
+
+New `MxGatewayEndpointConfig` in Commons (alongside `OpcUaEndpointConfig`) with a
+matching `MxGatewayEndpointConfigSerializer` (flat-dict ⇄ JSON) and
+`MxGatewayEndpointConfigValidator`. Stored exactly like OPC UA: per-connection JSON
+in `DataConnection.PrimaryConfiguration` / `BackupConfiguration`. **Primary/backup
+failover works for free** — backup = a second gateway endpoint, round-robin, no
+auto-failback, driven by the existing `FailoverRetryCount` state machine. No entity
+or migration changes.
+
+| Key | Type | Default | Notes |
+|---|---|---|---|
+| `Endpoint` | string | `http://localhost:5000` | Gateway base URL |
+| `ApiKey` | string | — | Sent as `authorization: Bearer ` |
+| `ClientName` | string | `scadabridge-` | Registration name |
+| `WriteUserId` | int | `0` | Applied to every write-back |
+| `UseTls` / `CaFile` / `ServerName` | bool/string/string | `false` / — / — | TLS to a secured gateway |
+| `ReadTimeoutMs` | int | `5000` | `ReadBulk` per-call timeout |
+
+**Secrets.** `ApiKey` follows whatever OPC UA `UserIdentityConfig` username/password
+does today (same at-rest treatment, same log/telemetry redaction). Match that pattern
+exactly; if OPC UA stores credentials in plaintext, `ApiKey` inherits the same known
+limitation (not a new regression) — flag during implementation.
+
+**Shared settings** (`ReconnectInterval`, `TagResolutionRetryInterval`,
+`WriteTimeout`) stay in `DataConnectionOptions`, unchanged, applying to all protocols.
+
+## Section 3 — Protocol-agnostic browse (tag picker)
+
+`IBrowsableDataConnection` is already protocol-neutral (node ids are opaque strings).
+Generalize the OPC-UA-named plumbing so both protocols flow through one path.
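+
+A minimal sketch of what "one path" could look like: the caller only asks whether the
+adapter implements `IBrowsableDataConnection` and never branches on protocol. The
+record shapes, the `Browse` helper, and `FakeGalaxyAdapter` below are hypothetical and
+simplified for illustration (the real `BrowseNode` also carries `NodeClass`, and the
+real interface signature may differ).
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading.Tasks;
+
+// Usage: the same call works for a browsable adapter and a non-browsable one.
+var browsable = await Browse.ChildrenAsync(new FakeGalaxyAdapter(), parentNodeId: null);
+var plain = await Browse.ChildrenAsync(new object(), parentNodeId: null);
+Console.WriteLine($"{browsable.Nodes.Count} node(s), failure: {browsable.Failure ?? "none"}");
+Console.WriteLine($"{plain.Nodes.Count} node(s), failure: {plain.Failure ?? "none"}");
+
+// Hypothetical, simplified message shapes.
+public sealed record BrowseNode(string NodeId, string DisplayName, bool HasChildren);
+public sealed record BrowseNodeResult(IReadOnlyList<BrowseNode> Nodes, string? Failure);
+
+// Assumed signature; node ids are opaque strings for every protocol.
+public interface IBrowsableDataConnection
+{
+    Task<IReadOnlyList<BrowseNode>> BrowseChildrenAsync(string? parentNodeId);
+}
+
+public static class Browse
+{
+    public static async Task<BrowseNodeResult> ChildrenAsync(object adapter, string? parentNodeId)
+    {
+        // Capability check instead of a protocol switch.
+        if (adapter is not IBrowsableDataConnection b)
+            return new BrowseNodeResult(Array.Empty<BrowseNode>(), "browse not supported");
+        return new BrowseNodeResult(await b.BrowseChildrenAsync(parentNodeId), null);
+    }
+}
+
+// Fake adapter returning one selectable attribute row.
+public sealed class FakeGalaxyAdapter : IBrowsableDataConnection
+{
+    public Task<IReadOnlyList<BrowseNode>> BrowseChildrenAsync(string? parentNodeId) =>
+        Task.FromResult<IReadOnlyList<BrowseNode>>(
+            new[] { new BrowseNode("Tank01.Level", "Level", HasChildren: false) });
+}
+```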
+
+**Renames (site + central + UI):**
+
+| Today | Becomes |
+|---|---|
+| `BrowseOpcUaNodeCommand` / `BrowseOpcUaNodeResult` | `BrowseNodeCommand` / `BrowseNodeResult` |
+| `OpcUaBrowseService` / `IOpcUaBrowseService` | `BrowseService` / `IBrowseService` |
+| `OpcUaBrowserDialog.razor` | `NodeBrowserDialog.razor` |
+| `BrowseFailure` / `BrowseFailureKind` | kept (already generic) |
+
+`DataConnectionManagerActor` resolves the connection, checks
+`adapter is IBrowsableDataConnection`, and calls `BrowseChildrenAsync(parentNodeId)`
+regardless of protocol (already the OPC UA logic — just drop the "OpcUa" from names).
+Adapters without the interface return a "browse not supported" failure (unchanged).
+
+**MxGateway side.** `MxGatewayDataConnection.BrowseChildrenAsync` wraps
+`GalaxyRepositoryClient.BrowseChildrenAsync` (one Galaxy level per call). Mapping:
+- Galaxy object → `BrowseNode(NodeId = gobjectId-or-contained-path,
+  DisplayName = tagName, NodeClass = Object, HasChildren = child_has_children[i])`.
+- Each object's attributes → `BrowseNode(NodeId = FullTagReference,
+  NodeClass = Variable, HasChildren = false)` — Variable rows are the selectable tag
+  paths stored in instance config.
+
+`GalaxyRepositoryClient` is a separate gRPC client from `MxGatewayClient`, so the
+adapter holds both (same Endpoint + ApiKey): browse uses the read-only repository
+client, the hot path uses the gateway client. The tag-picker dialog opens identically
+for either protocol; only the tree shape and opaque node-id strings differ.
+
+## Section 4 — Packaging, DI registration & error classification
+
+**NuGet feed.** Add a repo-root `nuget.config` declaring the Gitea feed
+(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`) alongside
+nuget.org. Credentials are **not** committed — from the developer's `~/.nuget`, or
+for the Docker image build a build-arg/secret-mounted credential (wire into
+`docker/deploy.sh`).
+The DCL project references `ZB.MOM.WW.MxGateway.Client`
+(`…Contracts` transitively); both target net10.0.
+
+**DI registration** in `DataConnectionFactory`:
+```csharp
+RegisterAdapter("MxGateway", details => new MxGatewayDataConnection(
+    new MxGatewayClientFactory(_loggerFactory),
+    _loggerFactory.CreateLogger<MxGatewayDataConnection>()));
+```
+plus an `MxGatewayGlobalOptions` (parallel to `OpcUaGlobalOptions`) bound in Host.
+OPC UA registration untouched.
+
+**Error classification** (drives bad-quality push vs. synchronous script error):
+- *Connection/transport faults* (`MxGatewaySessionException`, gRPC unavailable, stream
+  break) → `Disconnected` → reconnect + bad quality. Transient.
+- *Per-item read/write failures* (`BulkReadResult` / `BulkWriteResult` with
+  `WasSuccessful = false`: bad tag, MXAccess rejection) → returned to caller (write) or
+  bad quality (read). Not a disconnect.
+- *Auth failures* (`MxGatewayAuthenticationException` / `…AuthorizationException`) →
+  treated like a failed connect (logged, retried on failover/reconnect cadence); a
+  rotated key is operationally a connection problem, not per-tag.
+
+Matches OPC UA's "operations fail immediately to the caller; connection loss triggers
+reconnect" split.
+
+## Section 5 — Testing, docs & deploy
+
+**Testing** (fake client seam, no live gateway, following the OPC UA adapter style):
+- `MxGatewayDataConnection` against a `FakeMxGatewayClient`: connect→register→advise
+  lifecycle; `OnDataChange` → `TagValue` dispatch incl. quality mapping; read/write/batch
+  success + per-item failure; `WriteBatchAndWait` match & timeout; `Disconnected` fires
+  once on stream fault; `worker_sequence` resume on reconnect.
+- `MxGatewayEndpointConfigSerializer` / `Validator` round-trip + defaults +
+  invalid-numeric fallback.
+- Browse mapping (object→Object, attribute→Variable, `HasChildren` hint) against a fake
+  repository client.
+- Generalized-browse regression: existing OPC UA browse tests updated to renamed
+  `BrowseNodeCommand` / `BrowseService` and still passing.
+
+**Docs (spec travels with code):**
+- `Component-DataConnectionLayer.md`: add MxGateway under "Supported Protocols", an
+  "MxGateway Settings" config table, note `IBrowsableDataConnection` now backs both
+  protocols.
+- `README.md` protocol mentions if any.
+- This design doc.
+
+**Deploy.** `bash docker/deploy.sh` rebuilds the image; only deploy-config change is
+NuGet credential wiring for restore. Sites get the adapter automatically (compiled into
+Host). No new ports/services — the adapter is an outbound gRPC client to the gateway.
+
+**Affected components:** DCL (adapter, factory, options), Commons (config type,
+serializer, validator, renamed browse messages + `IBrowsableDataConnection`
+consumers), Configuration Database (none — no schema change), Central UI (renamed
+browse service/dialog, protocol selector + `MxGatewayEndpointEditor` in
+`DataConnectionForm` — net-new UI, use `frontend-design` skill), Host (options
+binding), tests, docs, `nuget.config`.
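+
+As an illustration of the "defaults + invalid-numeric fallback" behaviour named in the
+testing list, here is a minimal sketch of reading a flat dictionary into a config
+record. `MxGatewayEndpointConfigSketch`, `FromFlat`, and `IntOr` are hypothetical
+names, only a subset of the Section 2 keys is shown, and the sketch assumes an
+unparseable number falls back to the documented default rather than raising an error.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Usage: a missing key and an invalid number both resolve to defaults.
+var cfg = MxGatewayEndpointConfigSketch.FromFlat(new Dictionary<string, string>
+{
+    ["Endpoint"] = "https://gateway.example:5001",
+    ["WriteUserId"] = "not-a-number", // invalid numeric: assumed to fall back to 0
+});
+Console.WriteLine($"{cfg.Endpoint} user={cfg.WriteUserId} timeout={cfg.ReadTimeoutMs}ms");
+
+// Hypothetical subset of the settings table in Section 2.
+public sealed record MxGatewayEndpointConfigSketch(
+    string Endpoint, int WriteUserId, bool UseTls, int ReadTimeoutMs)
+{
+    public static MxGatewayEndpointConfigSketch FromFlat(
+        IReadOnlyDictionary<string, string> flat) =>
+        new(
+            // Missing or empty endpoint uses the documented default.
+            Endpoint: flat.TryGetValue("Endpoint", out var e) && e.Length > 0
+                ? e
+                : "http://localhost:5000",
+            WriteUserId: IntOr(flat, "WriteUserId", 0),
+            // Anything other than a parseable "true" leaves TLS off.
+            UseTls: flat.TryGetValue("UseTls", out var t)
+                && bool.TryParse(t, out var tls) && tls,
+            ReadTimeoutMs: IntOr(flat, "ReadTimeoutMs", 5000));
+
+    // Returns the parsed integer, or the fallback when the key is absent or invalid.
+    private static int IntOr(
+        IReadOnlyDictionary<string, string> flat, string key, int fallback) =>
+        flat.TryGetValue(key, out var raw) && int.TryParse(raw, out var n) ? n : fallback;
+}
+```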