docs(dcl): design for MxGateway data connection (2nd protocol)

Add design doc for a second data-connection protocol, MxGateway, alongside
the OPC UA client. New IDataConnection adapter behind the existing
DataConnectionFactory extension point; tag pipe (read/subscribe/write) plus
Galaxy hierarchy browse, optional 2nd endpoint for failover. Generalizes the
OPC-UA-named browse plumbing to protocol-agnostic browse via
IBrowsableDataConnection. No entity/schema changes.
Joseph Doherty
2026-05-29 07:28:21 -04:00
parent 5c98d23800
commit 8730c6e30a
@@ -0,0 +1,199 @@
# MxGateway Data Connection — Design
**Date:** 2026-05-28
**Component:** Data Connection Layer (#4), with touches to Commons (#16), Central UI (#9), Host (#15)
**Status:** Approved — ready for implementation planning
## Summary
Add a second data-connection protocol, **MxGateway**, alongside the existing OPC UA
client. MxGateway connects to the MxAccess Gateway
(`https://gitea.dohertylan.com/dohertj2`, packages `ZB.MOM.WW.MxGateway.Client` +
`ZB.MOM.WW.MxGateway.Contracts`) over gRPC and exposes an AVEVA/Wonderware
MXAccess-backed Galaxy as a clean tag-value pipe, identical in role to the OPC UA
adapter.
The Data Connection Layer was built for exactly this: `DataConnectionFactory`
exposes `RegisterAdapter(protocolType, factory)` and every surrounding mechanism
(the `DataConnectionActor` Become/Stash state machine, primary/backup failover,
health reporting, re-subscribe-on-reconnect) is protocol-agnostic. The new
protocol is a single `IDataConnection` adapter plus one registration line — no
changes to the actor, the entity schema, or the failover machinery.
## Scope
**In scope (this slice):**
- Read / Subscribe / Write — MxGateway as a clean tag-value pipe.
- Galaxy hierarchy browse for the instance-config tag picker.
- Optional second endpoint for failover (reusing the existing primary/backup model).
**Out of scope (possible later slices):**
- Native MXAccess alarms (`QueryActiveAlarms` / `StreamAlarms` / `AcknowledgeAlarm`).
ScadaBridge evaluates its own alarms via Alarm Actors from tag values; native
alarms are a new concept.
- Secured writes (`WriteSecured`, operator + verifier userId). Plain writes carry a
configurable `WriteUserId` only.
## Decisions
| Decision | Choice |
|---|---|
| Approach | New `IDataConnection` adapter behind the existing factory extension point (not a shared base class, not a separate subsystem). |
| Protocol string | `"MxGateway"` (matches the NuGet package family). |
| Browse plumbing | **Generalized** to protocol-agnostic browse driven by `IBrowsableDataConnection`; OPC UA and MxGateway share one path. |
| Write user context | Optional `WriteUserId` config field, default `0`. No script API change. |
| Endpoint redundancy | Reuse existing primary/backup failover; backup = a second gateway endpoint. |
| ApiKey secret handling | Match whatever OPC UA `UserIdentityConfig` username/password does today. |
## Section 1 — Adapter & client lifecycle mapping
New project-internal `MxGatewayDataConnection : IDataConnection, IBrowsableDataConnection`
in `ZB.MOM.WW.ScadaBridge.DataConnectionLayer/Adapters/`, wrapping an injected
`IMxGatewayClientFactory` (mirrors the `IOpcUaClientFactory` seam so it is
unit-testable with a fake).
| `IDataConnection` | MxGateway client |
|---|---|
| `ConnectAsync(details)` | `MxGatewayClient.Create(Endpoint, ApiKey, TLS)` → `OpenSessionAsync` → `RegisterAsync(clientName)` (store `serverHandle`); start background `StreamEventsAsync` consumer loop |
| `SubscribeAsync(tagPath, cb)` | `AddItemAsync` → `AdviseAsync` (or `SubscribeBulkAsync`); map `itemHandle ↔ tagPath ↔ callback`; return subscriptionId |
| `UnsubscribeAsync(id)` | `UnAdviseAsync` + `RemoveItemAsync` |
| `ReadAsync` / `ReadBatchAsync` | `ReadBulkAsync` (uses cached advised value when present) |
| `WriteAsync` / `WriteBatchAsync` | `WriteBulkAsync` with `WriteUserId`; value via `ToMxValue()` |
| `WriteBatchAndWaitAsync` | generic compose: write values → write flag → poll `responsePath` (advised value or `ReadBulk`) until match/timeout |
| `Status` | `ConnectionHealth` tracked across session state |
| `Disconnected` | fired once (Interlocked guard) when `StreamEventsAsync` faults or the channel breaks |
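
The `WriteBatchAndWaitAsync` row describes a generic compose: write the values, write the flag, then poll the response path until it matches or the timeout elapses. A minimal sketch of that ordering follows. It assumes delegates in place of the real client calls (`writeAsync` standing in for `WriteBulkAsync`, `readAsync` for the advised-value cache or `ReadBulk`); the class name, parameter names, and `pollInterval` are hypothetical and not taken from the adapter.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: the delegates stand in for the gateway client's
// write and read calls; nothing here is the real adapter API.
public static class WriteAndWaitSketch
{
    public static async Task<bool> WriteBatchAndWaitAsync(
        IReadOnlyDictionary<string, object> values,
        string flagPath, object flagValue,
        string responsePath, object expectedResponse,
        Func<string, object, Task> writeAsync,
        Func<string, Task<object>> readAsync,
        TimeSpan timeout, TimeSpan pollInterval,
        CancellationToken ct = default)
    {
        // 1. Write the data values first so they are in place before the flag flips.
        foreach (var (path, value) in values)
            await writeAsync(path, value);

        // 2. Write the trigger flag.
        await writeAsync(flagPath, flagValue);

        // 3. Poll the response tag until it matches or the deadline passes.
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            ct.ThrowIfCancellationRequested();
            var current = await readAsync(responsePath);
            if (Equals(current, expectedResponse)) return true;   // match
            if (DateTime.UtcNow >= deadline) return false;        // timeout
            await Task.Delay(pollInterval, ct);
        }
    }
}
```

The comparison uses `object.Equals`, so a boxed `int` never matches a boxed `long` of the same numeric value; a real implementation would probably normalize types before comparing.
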
**Value/quality mapping.** Each `OnDataChange` `MxEvent` carries `item_handle`,
`value` (`MxValue` → `ToClrValue()`), `quality` (OPC-style int), `source_timestamp`,
`statuses`, and `worker_sequence`. The event is dispatched to the matching tag's
`SubscriptionCallback` as `TagValue(ToClrValue(value), mapQuality(quality, statuses),
source_timestamp)`. Quality: `quality >= 192` → `Good`; bad-category status → `Bad`;
otherwise `Uncertain`. The loop tracks `worker_sequence` and resumes with
`afterWorkerSequence` on reconnect so no change is missed.
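
The quality rule above can be sketched as a small pure function. This is an illustration only: `TagQualitySketch` is a hypothetical stand-in for whatever quality type `TagValue` carries, and `hasBadCategoryStatus` is an assumed, pre-computed summary of the event's `statuses` (the real shape of that field is not shown here). The checks are applied in the order the rule lists them, which is itself an assumption about precedence.

```csharp
using System;

// Hypothetical stand-in for the quality type carried by TagValue.
public enum TagQualitySketch { Good, Uncertain, Bad }

public static class QualityMapperSketch
{
    // OPC-style quality: values of 192 and above fall in the "good" range.
    private const int GoodThreshold = 192;

    // hasBadCategoryStatus is an assumed summary of the event's statuses.
    public static TagQualitySketch MapQuality(int quality, bool hasBadCategoryStatus)
    {
        if (quality >= GoodThreshold) return TagQualitySketch.Good;
        if (hasBadCategoryStatus) return TagQualitySketch.Bad;
        return TagQualitySketch.Uncertain;
    }
}
```

Keeping the mapping free of I/O makes it easy to cover with table-driven unit tests alongside the fake-client tests.
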
**Reconnection needs no new logic.** The existing `DataConnectionActor` catches
`Disconnected`, pushes bad quality to all subscribed tags, disposes the adapter, and
on retry calls `ConnectAsync` on a fresh adapter then re-subscribes all tags —
identical to OPC UA.
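
Because the actor disposes the adapter after the first `Disconnected` signal, the adapter should raise the event only once even if the stream fault and the channel break are both observed. A minimal sketch of the Interlocked guard mentioned in the mapping table, with a hypothetical class name and a plain `Action` event standing in for the real event signature:

```csharp
using System;
using System.Threading;

// Hypothetical sketch of a fire-once disconnect signal.
public sealed class DisconnectSignalSketch
{
    private int _fired; // 0 = not yet raised, 1 = already raised

    public event Action Disconnected = delegate { };

    // Safe to call from several threads; only the first caller raises the event.
    public void RaiseDisconnected()
    {
        if (Interlocked.Exchange(ref _fired, 1) == 0)
            Disconnected?.Invoke();
    }
}
```

`Interlocked.Exchange` returns the previous value, so exactly one caller sees `0` and invokes the handlers.
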
## Section 2 — Configuration, secrets & endpoint redundancy
New `MxGatewayEndpointConfig` in Commons (alongside `OpcUaEndpointConfig`) with a
matching `MxGatewayEndpointConfigSerializer` (flat-dict ⇄ JSON) and
`MxGatewayEndpointConfigValidator`. Stored exactly like OPC UA: per-connection JSON
in `DataConnection.PrimaryConfiguration` / `BackupConfiguration`. **Primary/backup
failover works for free** — backup = a second gateway endpoint, round-robin, no
auto-failback, driven by the existing `FailoverRetryCount` state machine. No entity
or migration changes.
| Key | Type | Default | Notes |
|---|---|---|---|
| `Endpoint` | string | `http://localhost:5000` | Gateway base URL |
| `ApiKey` | string | — | Sent as `authorization: Bearer <key>` |
| `ClientName` | string | `scadabridge-<connName>` | Registration name |
| `WriteUserId` | int | `0` | Applied to every write-back |
| `UseTls` / `CaFile` / `ServerName` | bool/string/string | `false` / — / — | TLS to a secured gateway |
| `ReadTimeoutMs` | int | `5000` | `ReadBulk` per-call timeout |
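
The defaults in the table, together with an invalid-numeric fallback, could be applied when reading the flat dictionary roughly as follows. This is a sketch under assumptions: the record and parser names are hypothetical, `CaFile` and `ServerName` are omitted for brevity, and the real `MxGatewayEndpointConfigSerializer` may be shaped quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; the real MxGatewayEndpointConfig may differ.
public sealed record MxGatewayEndpointConfigSketch(
    string Endpoint, string ApiKey, string ClientName,
    int WriteUserId, bool UseTls, int ReadTimeoutMs);

public static class MxGatewayEndpointConfigParserSketch
{
    public static MxGatewayEndpointConfigSketch FromFlatDictionary(
        IReadOnlyDictionary<string, string> settings, string connectionName)
    {
        // Missing or blank strings fall back to the documented default.
        string Text(string key, string fallback) =>
            settings.TryGetValue(key, out var raw) && !string.IsNullOrWhiteSpace(raw) ? raw : fallback;

        // Missing or non-numeric values fall back instead of throwing.
        int Number(string key, int fallback) =>
            settings.TryGetValue(key, out var raw) && int.TryParse(raw, out var n) ? n : fallback;

        bool Flag(string key, bool fallback) =>
            settings.TryGetValue(key, out var raw) && bool.TryParse(raw, out var b) ? b : fallback;

        return new MxGatewayEndpointConfigSketch(
            Endpoint: Text("Endpoint", "http://localhost:5000"),
            ApiKey: Text("ApiKey", ""),                         // no default; left empty here
            ClientName: Text("ClientName", "scadabridge-" + connectionName),
            WriteUserId: Number("WriteUserId", 0),
            UseTls: Flag("UseTls", false),
            ReadTimeoutMs: Number("ReadTimeoutMs", 5000));
    }
}
```

Whether an empty `ApiKey` is accepted or rejected is a decision for the validator, not for this parsing step.
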
**Secrets.** `ApiKey` follows whatever OPC UA `UserIdentityConfig` username/password
does today (same at-rest treatment, same log/telemetry redaction). Match that pattern
exactly; if OPC UA stores credentials in plaintext, `ApiKey` inherits the same known
limitation (not a new regression) — flag during implementation.
**Shared settings** (`ReconnectInterval`, `TagResolutionRetryInterval`,
`WriteTimeout`) stay in `DataConnectionOptions`, unchanged, applying to all protocols.
## Section 3 — Protocol-agnostic browse (tag picker)
`IBrowsableDataConnection` is already protocol-neutral (node ids are opaque strings).
Generalize the OPC-UA-named plumbing so both protocols flow through one path.
**Renames (site + central + UI):**
| Today | Becomes |
|---|---|
| `BrowseOpcUaNodeCommand` / `BrowseOpcUaNodeResult` | `BrowseNodeCommand` / `BrowseNodeResult` |
| `OpcUaBrowseService` / `IOpcUaBrowseService` | `BrowseService` / `IBrowseService` |
| `OpcUaBrowserDialog.razor` | `NodeBrowserDialog.razor` |
| `BrowseFailure` / `BrowseFailureKind` | kept (already generic) |
`DataConnectionManagerActor` resolves the connection, checks
`adapter is IBrowsableDataConnection`, and calls `BrowseChildrenAsync(parentNodeId)`
regardless of protocol (already the OPC UA logic — just drop the "OpcUa" from names).
Adapters without the interface return a "browse not supported" failure (unchanged).
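
The capability check described above can be sketched as follows. All type names here are hypothetical stand-ins (suffixed `Sketch`) for the real interfaces and browse messages, and the two toy adapters exist only to exercise both branches.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins; the real interfaces and message types may differ.
public interface IDataConnectionSketch { }

public sealed record BrowseNodeSketch(string NodeId, string DisplayName, bool HasChildren);

public interface IBrowsableDataConnectionSketch
{
    Task<IReadOnlyList<BrowseNodeSketch>> BrowseChildrenAsync(string parentNodeId);
}

public sealed record BrowseResultSketch(
    bool Success, IReadOnlyList<BrowseNodeSketch> Nodes, string FailureReason);

public static class BrowseDispatchSketch
{
    // Protocol-agnostic: only the capability interface matters, not the protocol name.
    public static async Task<BrowseResultSketch> BrowseAsync(
        IDataConnectionSketch adapter, string parentNodeId)
    {
        if (adapter is IBrowsableDataConnectionSketch browsable)
        {
            var nodes = await browsable.BrowseChildrenAsync(parentNodeId);
            return new BrowseResultSketch(true, nodes, "");
        }
        return new BrowseResultSketch(false, Array.Empty<BrowseNodeSketch>(), "browse not supported");
    }
}

// Toy adapters: one without browse support, one returning a single child.
public sealed class PlainAdapterSketch : IDataConnectionSketch { }

public sealed class TreeAdapterSketch : IDataConnectionSketch, IBrowsableDataConnectionSketch
{
    public Task<IReadOnlyList<BrowseNodeSketch>> BrowseChildrenAsync(string parentNodeId)
    {
        IReadOnlyList<BrowseNodeSketch> children = new[]
        {
            new BrowseNodeSketch(parentNodeId + "/child", "child", false),
        };
        return Task.FromResult(children);
    }
}
```

Because node ids stay opaque strings, nothing in the dispatch path needs to know whether they are OPC UA node ids or Galaxy tag references.
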
**MxGateway side.** `MxGatewayDataConnection.BrowseChildrenAsync` wraps
`GalaxyRepositoryClient.BrowseChildrenAsync` (one Galaxy level per call). Mapping:
- Galaxy object → `BrowseNode(NodeId = gobjectId-or-contained-path,
DisplayName = tagName, NodeClass = Object, HasChildren = child_has_children[i])`.
- Each object's attributes → `BrowseNode(NodeId = FullTagReference,
NodeClass = Variable, HasChildren = false)` — Variable rows are the selectable tag
paths stored in instance config.
`GalaxyRepositoryClient` is a separate gRPC client from `MxGatewayClient`, so the
adapter holds both (same Endpoint + ApiKey): browse uses the read-only repository
client, the hot path uses the gateway client. The tag-picker dialog opens identically
for either protocol; only the tree shape and opaque node-id strings differ.
## Section 4 — Packaging, DI registration & error classification
**NuGet feed.** Add a repo-root `nuget.config` declaring the Gitea feed
(`https://gitea.dohertylan.com/api/packages/dohertj2/nuget/index.json`) alongside
nuget.org. Credentials are **not** committed: developers supply them from their own
`~/.nuget` configuration, and the Docker image build receives them as a build-arg or
secret-mounted credential (wire into `docker/deploy.sh`). The DCL project references
`ZB.MOM.WW.MxGateway.Client` (`…Contracts` transitively); both target net10.0.
**DI registration** in `DataConnectionFactory`:
```csharp
RegisterAdapter("MxGateway", details => new MxGatewayDataConnection(
new MxGatewayClientFactory(_loggerFactory),
_loggerFactory.CreateLogger<MxGatewayDataConnection>()));
```
plus an `MxGatewayGlobalOptions` (parallel to `OpcUaGlobalOptions`) bound in Host.
OPC UA registration untouched.
**Error classification** (drives bad-quality push vs. synchronous script error):
- *Connection/transport faults* (`MxGatewaySessionException`, gRPC unavailable, stream
break) → `Disconnected` → reconnect + bad quality. Transient.
- *Per-item read/write failures* (`BulkReadResult` / `BulkWriteResult` with
`WasSuccessful = false`: bad tag, MXAccess rejection) → returned to caller (write) or
bad quality (read). Not a disconnect.
- *Auth failures* (`MxGatewayAuthenticationException` / `…AuthorizationException`) →
treated like a failed connect (logged, retried on failover/reconnect cadence); a
rotated key is operationally a connection problem, not per-tag.
Matches OPC UA's "operations fail immediately to the caller; connection loss triggers
reconnect" split.
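
The three-way split above could be expressed as a single classification function. The sketch below uses local stand-in exception types rather than the real client exceptions (which live in the gateway client package), and the enum, class, and method names are hypothetical. Per-item failures arrive as results with `WasSuccessful = false` rather than as exceptions, so they are modelled by a separate helper. Treating unknown exceptions as caller-visible errors is an assumption of this sketch.

```csharp
using System;

// Local stand-ins for the client exception types named above (not the real classes).
public class FakeSessionException : Exception { }
public class FakeAuthenticationException : Exception { }
public class FakeAuthorizationException : Exception { }

public enum FaultHandlingSketch
{
    DisconnectAndReconnect, // transport fault: raise Disconnected, push bad quality
    FailedConnectRetry,     // auth fault: treated like a failed connect
    ReturnToCaller,         // operation-level failure: surfaced synchronously
}

public static class ErrorClassifierSketch
{
    public static FaultHandlingSketch Classify(Exception ex) => ex switch
    {
        FakeSessionException => FaultHandlingSketch.DisconnectAndReconnect,
        FakeAuthenticationException or FakeAuthorizationException
            => FaultHandlingSketch.FailedConnectRetry,
        // Assumption: anything unrecognized is reported to the caller, not treated as a disconnect.
        _ => FaultHandlingSketch.ReturnToCaller,
    };

    // Per-item results never trigger a disconnect; reads degrade to bad quality,
    // writes report the failure back to the caller.
    public static string DescribeItemFailure(bool wasSuccessful, bool isWrite) =>
        wasSuccessful ? "ok" : (isWrite ? "error returned to caller" : "bad quality");
}
```
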
## Section 5 — Testing, docs & deploy
**Testing** (fake client seam, no live gateway, following the OPC UA adapter style):
- `MxGatewayDataConnection` against a `FakeMxGatewayClient`: connect→register→advise
lifecycle; `OnDataChange` → `TagValue` dispatch incl. quality mapping; read/write/batch
success + per-item failure; `WriteBatchAndWait` match & timeout; `Disconnected` fires
once on stream fault; `worker_sequence` resume on reconnect.
- `MxGatewayEndpointConfigSerializer` / `Validator` round-trip + defaults +
invalid-numeric fallback.
- Browse mapping (object→Object, attribute→Variable, `HasChildren` hint) against a fake
repository client.
- Generalized-browse regression: existing OPC UA browse tests updated to renamed
`BrowseNodeCommand` / `BrowseService` and still passing.
**Docs (spec travels with code):**
- `Component-DataConnectionLayer.md`: add MxGateway under "Supported Protocols", an
"MxGateway Settings" config table, note `IBrowsableDataConnection` now backs both
protocols.
- `README.md`: update protocol mentions, if any.
- This design doc.
**Deploy.** `bash docker/deploy.sh` rebuilds the image; the only deploy-config change is
NuGet credential wiring for restore. Sites get the adapter automatically (compiled into
Host). No new ports/services — the adapter is an outbound gRPC client to the gateway.
**Affected components:** DCL (adapter, factory, options), Commons (config type,
serializer, validator, renamed browse messages + `IBrowsableDataConnection`
consumers), Configuration Database (none — no schema change), Central UI (renamed
browse service/dialog, protocol selector + `MxGatewayEndpointEditor` in
`DataConnectionForm` — net-new UI, use `frontend-design` skill), Host (options
binding), tests, docs, `nuget.config`.