diff --git a/lmx_backend.md b/lmx_backend.md new file mode 100644 index 0000000..94f1c4d --- /dev/null +++ b/lmx_backend.md @@ -0,0 +1,274 @@ +# Galaxy / LMX Backend — Restructuring Options + +## Context + +Today the Galaxy driver is structured very differently from every other driver +in this repo: + +- **Galaxy.Proxy** (.NET 10, in-process): tiny shim that frames IPC to the host. +- **Galaxy.Host** (.NET Framework 4.8 **x86**, NSSM-wrapped Windows service): + owns MXAccess COM, the STA pump, the ZB Galaxy Repository SQL queries, the + Wonderware Historian SDK plugin, the per-platform `ScanState` probe manager, + the alarm tracker (`.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` state + machine + ack writer), recycle policy, and post-mortem MMF. + +Other drivers (Modbus, S7, AB CIP, OpcUaClient, TwinCAT, FOCAS Tier-C) are +**in-process Tier-A drivers** in the .NET 10 server. They do data + browse +only; historian and alarming are driver-agnostic concerns at the server layer. + +A sibling project, **mxaccessgw** +(`C:\Users\dohertj2\Desktop\mxaccessgw`), already provides: + +- A .NET 10 x64 gRPC gateway in front of per-session .NET 4.8 x86 worker + processes that own MXAccess COM, the STA, and event sinks + (`MxGateway.Server` + `MxGateway.Worker`). +- A full MXAccess command + event surface (`Register`, `AddItem`, `Advise`, + `Write`, `WriteSecured`, `OnDataChange`, `OnWriteComplete`, etc.). +- A cached, deploy-gated, paged **Galaxy Repository browse** RPC + (`galaxy_repository.v1`) reading the same ZB tables we read today, with the + query bodies kept byte-identical to OtOpcUa. +- A .NET client library (`clients/dotnet/MxGateway.Client`). +- API-key auth, Blazor dashboard, structured logs, metrics, watchdog/recycle. + +The proposal is to **strip Galaxy down to data + browse** — push historian and +alarming out to server-level subsystems where they live for every other driver +— and pick how the slimmed-down driver talks to MXAccess. 
+ +--- + +## What "push historian and alarming out" means + +Both options below assume the same scope reduction; they only differ in how +the driver reaches MXAccess. + +| Concern | Today (Galaxy.Host) | After | +|---|---|---| +| Galaxy hierarchy browse | `GalaxyRepository` (SQL) inside Host | Driver (Option 1: via gw browse RPC; Option 2: own SQL or worker) | +| Live read / write / subscribe | `MxAccessClient` + STA pump in Host | gw (Option 1) or embedded worker (Option 2) | +| Wonderware Historian SDK | `HistorianDataSource` in Host (x86) | Separate Historian data source plugged into the server's HA service. Likely stays its own .NET 4.8 x86 sidecar because the SDK is x86-only; **independent of the Galaxy driver lifecycle**. | +| Alarm state machine (`.InAlarm`/`.Acked` quartet, transitions, ack writer) | `GalaxyAlarmTracker` in Host | Server-level A&E subsystem subscribes to alarm-bearing attributes the driver advertises and runs the AlarmCondition state machine generically. Driver only flags `IsAlarm=true` in node metadata. | +| `ScanState` per-platform probes | `GalaxyRuntimeProbeManager` in Host | Driver-side: ScanState is just another tag subscription; the driver re-advises one per discovered `$WinPlatform`/`$AppEngine` and reports `HostConnectivityStatus` from the value stream. No special host-side machinery. | + +After the strip-down, the Galaxy driver looks like Modbus or OpcUaClient: it +discovers nodes, reads/writes/subscribes, and reports per-host transport +health. Everything else is the server's problem. + +--- + +## Option 1 — Tier-A driver against the MxAccess Gateway + +`Driver.Galaxy` becomes a regular **in-process .NET 10 driver** in the OtOpcUa +server (no `.Host`, no `.Proxy` split, no x86). It talks to a separately +deployed `MxGateway.Server` over gRPC using `MxGateway.Client`. Browse comes +from `galaxy_repository.v1.DiscoverHierarchy`. Live data comes from +`MxAccessGateway.OpenSession`/`AddItem`/`Advise`/`StreamEvents`. 
+ +``` +OtOpcUa.Server (.NET 10 x64) + └── Driver.Galaxy (in-proc, .NET 10) + └── gRPC ──► MxGateway.Server (.NET 10 x64) + └── pipe ──► MxGateway.Worker (.NET 4.8 x86) + └── MXAccess COM (STA) +``` + +### Pros + +- **Architectural parity with other drivers.** No bespoke `Host` service, no + x86 build target, no NSSM wrapper, no STA pump in this repo, no + `PostMortemMmf`/`RecyclePolicy` we maintain ourselves. +- **OtOpcUa server stops needing AVEVA installed on its own host.** The + gateway runs where MXAccess lives; the OPC UA server can live on a different + box, in a container, or on a hardened jump host. +- **One canonical MXAccess surface across the org.** Any future tool — a + diagnostic CLI, a Historian replacement, an integration harness — talks to + the same gw with the same parity guarantees we get. +- **Multi-instance friendly.** Two OtOpcUa servers (warm/hot redundancy) share + one gw and one MXAccess footprint instead of each running their own + `Galaxy.Host` with duplicate Wonderware client identities. +- **Browse + cache for free.** `galaxy_repository.v1` already implements the + hierarchy cache, deploy-time gating, paging, and `WatchDeployEvents` — we + delete `GalaxyRepository.cs`, `GalaxyHierarchyRow.cs`, the change-detection + poll loop, and the matching SQL plumbing. +- **Operability for free.** API-key auth, Blazor dashboard at `/dashboard`, + metrics via `Meter`, structured logs with redaction. We currently have + none of that in `Galaxy.Host`. +- **Future backend swap.** When AVEVA exposes managed NMX or another modern + path, gw routes to it without OtOpcUa changes (gw's stated roadmap). +- **Tighter blast radius.** A hung COM event, a leaking COM object, a + crashing worker — all owned by gw's session/worker isolation, not the + OPC UA server process. +- **Simpler version story for OtOpcUa.** Driver is plain .NET 10; the + bitness/runtime split lives entirely in mxaccessgw's repo. 
+ +### Cons + +- **Extra deployment dependency.** mxaccessgw is now a service that has to be + installed, monitored, and kept on a compatible protocol version. For a + single-box install this is one more moving piece. +- **Two hops on every call** (driver→gw, gw→worker) instead of one + (proxy→host). Today's hop is MessagePack over a named pipe; the new outer + hop is gRPC over TCP. Per-call overhead is a few hundred microseconds, not + a regression for OPC UA workloads but measurable for very chatty bursts. +- **Auth/secret surface added.** OtOpcUa now holds an API key for gw and + rotates it; gw's SQLite-backed key store has to be managed. +- **Failure model spans two processes we don't own** — gw + worker. Reconnect + logic in our driver has to ride both: gw transport drop, gw session lease + expiry, gw-detected worker crash, plus the worker's own MXAccess reconnect. + All of it is exposed in the gRPC contract, but it's still surface area. +- **Cross-repo protocol coupling.** Bumping `mxaccessgw` major version (gRPC + contract changes, session shape changes) ripples into OtOpcUa releases. + Mitigated by versioned contracts; not free. +- **Galaxy redundancy still has to think about gw.** A redundancy fail-over of + OtOpcUa is independent of the gw's session lifecycle. Need to decide whether + the standby holds an open session or only opens it on takeover. +- **Sensitive writes (`WriteSecured`, `AuthenticateUser`) cross the network** + if gw is remote. TLS + mTLS solves it but adds setup. + +--- + +## Option 2 — Embed mxaccessgw worker, no gateway + +`Driver.Galaxy` is still in-process .NET 10, but instead of speaking gRPC to a +gateway service, it directly **launches and supervises one (or more) +`MxGateway.Worker` processes** and talks to them over the same named-pipe +worker protocol gw uses internally +(`docs/WorkerFrameProtocol.md`, `docs/WorkerProcessLauncher.md`). Browse stays +local — driver runs the SQL queries against ZB itself. 
+ +``` +OtOpcUa.Server (.NET 10 x64) + └── Driver.Galaxy (in-proc, .NET 10) + ├── ZB SQL (local, in-proc) + └── pipe ──► MxGateway.Worker (.NET 4.8 x86, child process) + └── MXAccess COM (STA) +``` + +### Pros + +- **One hop, not two.** Driver → worker pipe is the same shape as today's + Proxy → Host pipe. Latency is on par with the current implementation. +- **No new service to deploy.** Worker is launched as a child process the + same way `Galaxy.Host` is launched today (just with mxaccessgw's worker + binary). Single-machine install story stays simple. +- **Keeps the trust boundary local.** No API keys, no TLS, no exposed gRPC + port on the OtOpcUa box. +- **Reuses mxaccessgw's parity-tested worker code** — STA pump, COM lifetime, + event conversion, fault model — without inheriting gw's ASP.NET Core / + Blazor / SQLite footprint. +- **Tighter ownership.** OtOpcUa owns the worker lifecycle; recycle, kill, + restart, post-mortem all decided by the driver, not by an external service + we don't control. +- **Easier to reason about during integration tests.** No second service to + spin up in CI; just a child process per test fixture. + +### Cons + +- **OtOpcUa server box must still have AVEVA + MXAccess installed**, since + the worker runs locally. The major deployment win of Option 1 + (separating where MXAccess runs from where OtOpcUa runs) is lost. +- **OtOpcUa still ships an x86 .NET 4.8 binary alongside it.** Even if we + vendor mxaccessgw's worker rather than write our own, installer complexity + and bitness considerations remain. +- **We re-implement everything gw already gives.** Process supervision, + watchdog, recycle policy, heartbeat, post-mortem — these are exactly what + `Galaxy.Host` does today, and they'd live in our repo again, just calling a + different worker binary. 
+- **No browse cache, no deploy gating, no `WatchDeployEvents`** — we keep + running our own ZB queries and our own `time_of_last_deploy` poll, or we + port gw's cache code into the driver. Either way it's duplicated logic. +- **No auth, no dashboard, no metrics.** Operability stays where it is today + (i.e., minimal). Adding it ourselves is a separate project. +- **Multiple OtOpcUa instances multiply MXAccess sessions.** Redundancy pair + → two MXAccess clients on the Galaxy from the same software, vs. Option 1 + where one gw arbitrates. +- **Worker protocol coupling without the contract surface.** We depend on + mxaccessgw's worker IPC frame format — a surface that mxaccessgw treats as + *internal* to its own gw↔worker boundary. If they refactor it, we have to + follow. The public gRPC contract (Option 1) is more stable by design. +- **Loses the "common MXAccess access point" benefit.** Other consumers + (CLI, integration harnesses, future tools) can't share state with our + embedded worker. + +--- + +## Status quo (for comparison) + +Keep `Galaxy.Host` as today, and in-place rip out historian + alarming + +probe manager. End state: the Host shrinks to `MxAccessClient` + `GalaxyRepository`, +which is roughly what Option 2 ends up looking like — but with our hand-rolled +COM bridge instead of mxaccessgw's worker. Not a serious option once +mxaccessgw exists; we'd be maintaining a parallel implementation of the same +thing. + +--- + +## Recommendation (effort-agnostic) + +**Go with Option 1 — Tier-A driver against the MxAccess Gateway.** + +The decisive arguments: + +1. **It's the only option that aligns Galaxy with how every other driver in + this repo is structured.** The user's stated goal — "keep lmx to data + + browsing, similar to other drivers" — only fully resolves if there is no + `.Host` and no x86 build artifact in this repo at all. Option 2 still has + an x86 child process and supervisor code; it's `Galaxy.Host` with a + different worker binary inside. 
+ +2. **It separates *where MXAccess runs* from *where OtOpcUa runs*.** That is + a strategically larger win than a few hundred microseconds of per-call + latency. The OPC UA server stops being chained to the AVEVA install footprint, + bitness, and Wonderware client identity, which removes a class of + deployment, redundancy, and CI problems we hit today (e.g., the + `DESKTOP-6JL3KKO` Hyper-V/Docker conflict, the `dohertj2`-only pipe ACL, + the live-Galaxy smoke test prerequisites). + +3. **It collapses scope.** A non-trivial fraction of `Galaxy.Host` (browse + cache, deploy-event watch, worker supervision, COM bridge, post-mortem, + recycle, ACL hardening) is reproduced *better* in mxaccessgw. Option 1 + deletes our copy. Option 2 keeps it. + +4. **It positions historian and alarming for the right home.** Once the + Galaxy driver is "just another driver", historian becomes a server-level + data source (one that can also feed Modbus/S7 history if we ever want it), + and alarming becomes a server-level A&E subsystem. Option 2 nominally + allows the same move, but the temptation to keep them in `Galaxy.Host` + "while we're already there" is real. + +5. **It future-proofs against AVEVA's roadmap.** Managed NMX, ASB, or any + replacement that shows up over the next few years gets adopted in + mxaccessgw without a release in this repo. + +The case for Option 2 is real but narrow: it's the right call **only** if we +commit to single-box deployments forever, refuse to take a gRPC dependency, +and value local-trust simplicity over the consolidation/operability benefits +gw provides. None of those constraints hold here. + +### What flips the recommendation + +- If the gw protocol proves unstable, or perf testing under our subscription + patterns turns out worse than expected → revisit Option 2. +- If org-policy forbids running an MXAccess gateway as its own service → + Option 2.
+- If Galaxy goes from one of several drivers to *the* primary driver and + raw call-rate matters more than architectural fit → revisit. + +Otherwise: Option 1. + +--- + +## Out-of-scope follow-ups (don't decide here, but flag them) + +- **Where does the Wonderware Historian SDK live?** Likely its own + .NET 4.8 x86 sidecar exposing a small `IHistorianDataSource` over a pipe or + gRPC, plugged into the OPC UA server's HA service alongside any future + historian sources. Independent of which option above is chosen. +- **Alarm subsystem ownership.** Decide whether the server hosts a generic + AlarmCondition state machine driven by driver-advertised alarm metadata, or + whether each driver continues to emit pre-shaped alarm transitions. Galaxy's + 4-attr quartet is a strong forcing function for the generic approach. +- **Redundancy + gw sessions.** Standby OtOpcUa holds an open gw session + (warm) vs. opens on takeover (cold). Affects gw worker count and Galaxy + client-identity collisions. +- **Auth between OtOpcUa and gw.** API key in DPAPI-protected secret file vs. + Windows-auth gRPC. Both supported by gw; pick before rollout. diff --git a/lmx_mxgw.md b/lmx_mxgw.md new file mode 100644 index 0000000..c231c0b --- /dev/null +++ b/lmx_mxgw.md @@ -0,0 +1,476 @@ +# Galaxy → MxAccessGateway Migration Plan + +Implements **Option 1** from `lmx_backend.md`: replace the bespoke `Galaxy.Host` ++ `Galaxy.Proxy` IPC pair with an **in-process Tier-A** `Driver.Galaxy` running +in the .NET 10 OtOpcUa server, talking to a separately-deployed +`MxGateway.Server` (mxaccessgw repo) over gRPC for live MXAccess work and +Galaxy Repository browse. + +## Outcome + +After this work: + +- `OtOpcUa.Server` is fully .NET 10 x64 — no x86 build artifacts in this repo. +- `Driver.Galaxy.Host` (Windows service, NSSM-wrapped, .NET 4.8 x86) is + retired. `Driver.Galaxy.Proxy` and `Driver.Galaxy.Shared` are deleted. + AVEVA platform is no longer required on the OtOpcUa box. 
+- A new in-process `Driver.Galaxy` lives next to `Driver.Modbus`, + `Driver.OpcUaClient`, etc. It implements the same `IDriver` capability set + the proxy implements today, but its body calls `MxGateway.Client` + (`MxGatewayClient`, `MxGatewaySession`, `GalaxyRepositoryClient`). +- Wonderware Historian SDK access moves out of the Galaxy driver into a + driver-agnostic historian data source (`Driver.Historian.Wonderware`, + separate sidecar, .NET 4.8 x86). The OPC UA HA service plugs into it the + same way it would plug into any future historian. +- Alarm condition tracking moves out of the driver into the OPC UA server's + generic A&E subsystem. The driver only flags `IsAlarm=true` on attribute + metadata and forwards live `.InAlarm`/`.Acked`/etc value changes; the + server runs the AlarmCondition state machine. +- Per-platform `ScanState` probes degrade to plain attribute subscriptions — + no special probe manager. + +--- + +## Pre-flight: improvements to land in mxaccessgw first + +These are **integration-quality changes** in the mxaccessgw repo that make +the OtOpcUa side dramatically simpler / faster / more robust. They aren't +strictly required to start, but ship enough of them before phase 3 that we're +not designing around gaps. + +### gw-1. Galaxy attribute metadata parity + +**What's there:** `galaxy_repository.v1.DiscoverHierarchy` returns +`GalaxyObject` with name, parent, category, and dynamic attributes. + +**What's missing for OtOpcUa:** every field today's `MxAccessGalaxyBackend` +copies into `GalaxyAttributeInfo` — confirm gw's `Attribute` proto carries: +- `mx_data_type` (int) +- `is_array` (bool) +- `array_dimension` (uint, optional) +- `security_classification` (int) +- `is_historized` (bool, from `HistorizedExtension` primitive) +- `is_alarm` (bool, from `AlarmExtension` primitive) + +If any are missing, add them to the proto and the server-side query mapper. 
+Without `IsAlarm` and `IsHistorized` the OPC UA server can't decide which +nodes get HasHistoricalConfiguration / which become AlarmConditions. + +### gw-2. Stable, documented event-stream resume semantics + +**What's needed:** the OtOpcUa driver must survive a transient gw transport +drop without losing subscription state or duplicating change events. gw's +`StreamEventsAsync(afterWorkerSequence)` already exposes resumption. +Document the per-session retention window (how long does the worker buffer +events the gateway hasn't acked?) and the "events were dropped, you must +re-subscribe" signal. If retention is bounded by count rather than time, +expose the bound in `OpenSessionReply` so the client can size its own buffer. + +### gw-3. Reconnectable sessions + +Listed under "post-v1 revisit" in `gateway.md`. Without it, every gw or +OtOpcUa restart re-`Register`s, re-`AddItem`s, re-`Advise`s the entire +address space — for a 50k-tag Galaxy that's a non-trivial cold-start. With +reconnectable sessions, the driver presents its `SessionId` after a restart +and the worker keeps its handles. + +If full reconnection is too large, ship a **bulk replay** instead: a single +RPC that takes the full subscription set and the worker performs the +register/add/advise inside one round trip. We can drive it from a +client-side cache rather than gw state. See gw-5 below. + +### gw-4. Driver-shaped subscribe primitive + +`MxGatewaySession` already has `SubscribeBulkAsync` (one RPC: `Register` +implicit + `AddItem` + `Advise` for a list of tag addresses, returning +per-tag `SubscribeResult`). That's exactly what `ISubscribable.SubscribeAsync` +wants. Confirm it returns enough per-tag detail to surface a partial-failure +list to OPC UA monitored items (good handle, status code, error text). 
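To make the "enough per-tag detail" check in gw-4 concrete, here is a minimal, language-neutral sketch (in Python, not the .NET client) of how a driver might split per-tag subscribe results into a handle map and a partial-failure list. The `SubscribeResult` fields shown (`tag`, `item_handle`, `status_code`, `error_text`) and the "0 means success" convention are assumptions for illustration, not gw's actual proto shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubscribeResult:
    """Hypothetical per-tag result; field names are assumptions, not gw's proto."""
    tag: str
    item_handle: Optional[int]
    status_code: int  # 0 is treated as success in this sketch
    error_text: str = ""


def split_results(results):
    """Partition results into a tag -> handle map and a list of failures."""
    good = {}
    failed = []
    for r in results:
        if r.status_code == 0 and r.item_handle is not None:
            good[r.tag] = r.item_handle
        else:
            # Keep status and text so the OPC UA layer can report per-item errors.
            failed.append((r.tag, r.status_code, r.error_text))
    return good, failed
```

If gw's real result lacks any of these three pieces (handle, status, text), the failure list cannot be mapped onto per-monitored-item status, which is the gap gw-4 asks to confirm.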
+ +If not already, expose **`SubscribeBulk` with optional update-rate hint** +forwarded to `SetBufferedUpdateInterval` so the OPC UA publishing interval +becomes a single field on the subscribe call rather than a follow-up RPC. + +### gw-5. Subscription replay snapshot + +Provide an RPC `ReplaySubscriptionsAsync(SessionId, IEnumerable)` +that re-establishes a list of subscriptions after a session reset and returns +per-tag results. The client stores its tag list locally (the driver already +has it from `Discover`), and the gw worker turns it into one +register/add/advise sequence. This is the minimum surface we need; full +"reattach to a previous session by id" (gw-3) is a richer version of the +same thing. + +### gw-6. Transport-health stream + +The gw already exposes worker / session health on its dashboard. Add a small +streaming RPC `StreamSessionHealth(SessionId) → stream SessionHealth` so the +OtOpcUa driver can surface "MXAccess transport up/down" to its +`IHostConnectivityProbe` without faking it via probe-tag subscriptions. +Today `MxAccessClient.ConnectionStateChanged` does this in-process; we want +the same signal at the gw boundary. + +### gw-7. Optional .NET 10 client polish + +- Async-disposable session pattern is already there. +- Add a **typed `MxValue` ⇄ `object` adapter** for the seven Galaxy types + OtOpcUa cares about (Boolean, Int32, Float, Double, String, DateTime, + arrays of the same). Today every consumer writes its own `MxValue.From` + helpers; this shaves boilerplate from the driver. +- Add a **`SubscribeWithCallback`** convenience wrapper that combines + `OpenSession` + `SubscribeBulk` + `StreamEvents` and routes events through + a delegate per tag. Keeps the OPC UA driver from re-implementing the + fan-out / sequencer pattern. + +### gw-8. Auth minimums + +Document API-key scoping as it applies to OtOpcUa: the server identity needs +`session`, `invoke`, `event`, and `metadata:read` scopes. 
Provide a CLI to +mint a key bound to those scopes for an OtOpcUa instance. + +### gw-9. Performance: bulk paths and value coalescing + +- Confirm `SubscribeBulkAsync` is implemented as a single MXAccess + `AddItem`+`Advise` loop on the worker, not N pipe round trips. If not, fix + before we drive 50k-tag Galaxies through it. +- Expose `SetBufferedUpdateInterval` per session so OtOpcUa can request + buffered updates at the OPC UA publishing interval and get one batched + `OnBufferedDataChange` per tick rather than N `OnDataChange` events. + +These can all ship in mxaccessgw independently and improve every consumer. + +--- + +## OtOpcUa-side improvements to land in parallel + +Some are forced by removing `Galaxy.Host`; others are quality-of-life. + +### ot-1. Promote `IHistorianDataSource` to a server-level extension point + +Today `IHistorianDataSource` is a Galaxy-internal abstraction in +`Driver.Galaxy.Host`. Lift it to `OtOpcUa.Core.Abstractions` (or a similar +home next to `IDriver`) and let the OPC UA HA service consume **any number +of registered data sources** keyed by node namespace. Drivers don't own +historian access; the server mounts data sources alongside drivers. This is +the prerequisite that lets us move Wonderware Historian out of the Galaxy +driver without losing the feature. + +### ot-2. Generic alarm condition state machine in the server + +Move the `.InAlarm`/`.Priority`/`.DescAttrName`/`.Acked` quartet handling +out of `GalaxyAlarmTracker` into a server-level alarm subsystem keyed off the +`IsAlarm=true` flag drivers set during discovery. The server subscribes to +the four sub-attributes itself and runs the AlarmCondition state machine. +Driver only: +- declares `IsAlarm=true` in `DriverAttributeInfo`, +- forwards plain attribute value changes (already done by `ISubscribable`). + +This is also a precondition for future drivers (Modbus DL205 alarm bits, +S7 alarm DBs) to emit alarms without each writing their own tracker. + +### ot-3. 
Driver capabilities trim + +After ot-1 and ot-2, `Driver.Galaxy` no longer needs to implement: +- `IHistoryProvider` (server's HA service handles it via Wonderware + historian data source) +- `IAlarmHistorianWriter` (server's A&E historian, or kept generic — Galaxy + shouldn't own the SQLite path) +- `IAlarmSource` ack route (server-level alarm subsystem writes back via the + driver's `IWritable.WriteAsync`, which the gw already supports) + +Keep: +- `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, + `IRediscoverable`, `IHostConnectivityProbe`. + +### ot-4. Treat `time_of_last_deploy` as `IRediscoverable`'s pump + +Replace the Host-side change-detection poll with a managed +`GalaxyRepositoryClient.WatchDeployEventsAsync` consumer in the driver. +Each event raises `OnRediscoveryNeeded` with the new deploy time as the +`scopeHint`. No polling code in this repo. + +### ot-5. Connection pool at the server, not the driver + +If the redundancy pair runs two OtOpcUa instances against one gw, both +should share a single `GrpcChannel` per process (already gRPC default) but +**different sessions** (one MXAccess client identity per OtOpcUa instance, +not one shared session that fights over Wonderware client state). Encode +the per-instance MXAccess client name in driver config — already partly +there (`OTOPCUA_GALAXY_CLIENT_NAME`); make it explicit in the new driver's +`appsettings.json` shape. + +--- + +## Phased implementation + +Each phase is a working, mergeable slice. Keep `Galaxy.Host` running +alongside the new driver until phase 7 — gated by a config switch +`Galaxy:Backend = legacy-host | mxgateway`. + +### Phase 0 — pre-flight (mxaccessgw repo) + +Ship gw-1, gw-2, gw-4, gw-9 (the parity, performance, and contract bits the +plan immediately depends on). gw-3, gw-5, gw-6, gw-7 can come during or +after phase 5. 
+ +**Exit:** local OtOpcUa dev box can `MxGatewayClient.Create` a client, open a +session, `SubscribeBulkAsync` 100 tags, and observe `OnDataChange` events at +the configured update rate. + +### Phase 1 — server-level historian extension point (ot-1) + +1. Extract `IHistorianDataSource` (and its DTOs `HistorianSample`, + `HistorianAggregateSample`, `HistoricalEvent`) from + `Driver.Galaxy.Host/Backend/Historian/` into + `src/ZB.MOM.WW.OtOpcUa.Core/Abstractions/Historian/`. +2. Extend the OPC UA HA service to look up a registered + `IHistorianDataSource` per namespace and call into it for `HistoryRead`, + `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`. Drivers + stop implementing `IHistoryProvider` directly; the server proxies. +3. Add a no-op default registration so drivers without history keep working. + +**Exit:** all current Galaxy history reads route through an +`IHistorianDataSource` registered by `Driver.Galaxy.Host` (still legacy) +without behavior change. Other drivers untouched. + +### Phase 2 — server-level alarm subsystem (ot-2) + +1. Add an `IAlarmConditionDeclaration` API on the address-space builder so + discovery can flag a node as alarm-bearing and supply the four + sub-attribute references. +2. Add a hosted `AlarmConditionService` in the server that, on driver + `Discover`, subscribes to the four sub-attributes via the driver's own + `ISubscribable`, runs the state machine, and emits + `IAlarmSource.OnAlarmEvent` itself. Acks route back through the driver's + `IWritable.WriteAsync` to the `.AckMsg` attribute. +3. Add Galaxy-specific defaults (sub-attribute naming) as a small adapter + so the same service can serve future drivers with different conventions. + +**Exit:** Galaxy alarms still work end-to-end; the tracker code that runs +inside `Galaxy.Host` is dead but kept for the legacy-host backend path. + +### Phase 3 — Wonderware Historian sidecar (`Driver.Historian.Wonderware`) + +1. 
New solution project: `Driver.Historian.Wonderware`, .NET 4.8 x86, + console app + NSSM (mirrors today's Galaxy.Host packaging exactly, + minus Galaxy responsibilities). +2. Hosts the existing `HistorianDataSource`, `HistorianClusterEndpointPicker`, + `HistorianHealthSnapshot` code lifted from `Galaxy.Host/Backend/Historian/` + and exposes them over a small named-pipe protocol (or local gRPC if + .NET 4.8 cost is acceptable; named pipe is simpler). +3. Add `Driver.Historian.Wonderware.Client` — .NET 10 — implementing + `IHistorianDataSource` against the sidecar. +4. Server registers it as a data source for the `Galaxy` namespace. + +**Exit:** OPC UA history reads work via the sidecar with the legacy-host +backend still in place. We've decoupled history from MXAccess. + +### Phase 4 — new `Driver.Galaxy` against gw + +This is the meat. New project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`, .NET 10, +in-process. Capabilities (post ot-3): `IDriver`, `ITagDiscovery`, `IReadable`, +`IWritable`, `ISubscribable`, `IRediscoverable`, `IHostConnectivityProbe`. 
+ +Shape: + +``` +Driver.Galaxy/ + GalaxyDriver.cs # IDriver root + Browse/ + GalaxyDiscoverer.cs # consumes GalaxyRepositoryClient.DiscoverHierarchyAsync + DataTypeMap.cs # mx_data_type → DriverDataType + SecurityMap.cs # security_classification → SecurityClassification + Runtime/ + GalaxyMxSession.cs # owns one MxGatewaySession; Register + map per-driver client name + SubscriptionRegistry.cs # tag → server/item handles; persists to memory only + EventPump.cs # consumes session.StreamEventsAsync, fans out to OnDataChange + ReconnectSupervisor.cs # gw transport drop / session-lost recovery + DeployWatcher.cs # GalaxyRepositoryClient.WatchDeployEventsAsync → OnRediscoveryNeeded + Health/ + HostConnectivityForwarder.cs # gw-6 SessionHealth → IHostConnectivityProbe + Config/ + GalaxyDriverOptions.cs # endpoint, ApiKey, ClientName, TLS, retry, intervals + GalaxyDriverFactoryExtensions.cs # AddGalaxyDriver(IServiceCollection) +``` + +Key behaviors: + +- **Discovery** calls `GalaxyRepositoryClient.DiscoverHierarchyAsync()` + once at init and on every `WatchDeployEvents` event, then drives the + address space builder. Same node naming as today (parent contained-name + hierarchy + leaf attributes named `tag_name.AttributeName`). +- **Read** via a one-off `AddItem` + `Advise` + read-after-first-callback + is overkill; instead, use **`Register` + per-call `AddItem`/`Read`** if gw + exposes a synchronous read, and fall back to a short-lived advise + otherwise. *Action item:* + confirm gw's read story; if absent, request a synchronous `ReadAsync` RPC + on top of MXAccess `Read` (which exists in the COM API). +- **Write** maps `WriteRequest.Value` to `MxValue` via gw-7 helpers and + calls `WriteAsync(serverHandle, itemHandle, value, userId=0)`. Routes + `WriteSecured` (where `SecurityClassification == SecuredWrite/Verified`) + to `WriteSecuredAsync` once exposed on `MxGatewaySession`. +- **Subscribe** calls `SubscribeBulkAsync` once per `ISubscribable.Subscribe` + call.
Stores `(tag → itemHandle, sid)` in `SubscriptionRegistry`. The + single `EventPump` consumes one `StreamEventsAsync` per session and fans + out per `sid`. +- **Unsubscribe** calls `UnsubscribeBulkAsync` and drops registry entries. +- **Reconnect** — when the gRPC channel drops or `StreamEvents` returns, + `ReconnectSupervisor` reopens the session and replays subscriptions via + gw-5 `ReplaySubscriptionsAsync`. The driver flags `DriverState.Degraded` + during recovery; the server keeps publishing last-good values with + `Uncertain` quality. +- **Host connectivity** — single synthesized host entry named after + `OTOPCUA_GALAXY_CLIENT_NAME` driven by gw-6 `SessionHealth` updates + (or, until gw-6 lands, by transport drops). + +Wire into the server next to other Tier-A drivers in the +`AddDrivers(...)` call site. + +**Exit:** flipping `Galaxy:Backend` to `mxgateway` runs the OPC UA server +end-to-end with no `Galaxy.Host` involvement. Live read, live write, live +subscribe pass against the dev Galaxy. Historian + alarms still work via +phases 1–3. + +### Phase 5 — parity test matrix + +Reuse the existing live-Galaxy integration tests; run each scenario twice: +once with `Galaxy:Backend=legacy-host`, once with `mxgateway`. Compare: + +- discovered hierarchy node count + names + datatypes, +- subscribed publish rates (allow ±10% tolerance vs. legacy), +- write success / status codes for each `SecurityClassification`, +- alarm condition transitions (Active / Acked / Inactive) — already + routed through phase 2's server-level subsystem, +- history reads — phase 3 sidecar, identical results both backends, +- reconnect behavior under gw kill, worker kill, network drop, ZB drop. + +Document the matrix; resolve every discrepancy or explicitly accept it. + +**Exit:** parity matrix has zero unexplained deltas. Performance budget +agreed: e.g. ≤ 2× per-call latency vs. named-pipe baseline at the 95th +percentile, equal or better throughput in `SubscribeBulk` setup time. 
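The numeric gates in the phase 5 comparison (±10% publish-rate tolerance, ≤ 2× latency at the 95th percentile) can be expressed as two small checks. This is a sketch only, written in Python rather than in the test suite's own language: the function names are hypothetical, the nearest-rank percentile is one of several reasonable definitions, and the 2× ratio is the example budget from the exit criterion, not an agreed number.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rate_within_tolerance(legacy_rate, new_rate, tolerance=0.10):
    """True if the new publish rate is within +/- tolerance of the legacy rate."""
    return abs(new_rate - legacy_rate) <= tolerance * legacy_rate


def latency_within_budget(legacy_ms, new_ms, max_ratio=2.0, pct=95):
    """True if the new backend's percentile latency stays under the budget."""
    return percentile(new_ms, pct) <= max_ratio * percentile(legacy_ms, pct)
```

Running both checks per scenario, and recording the raw samples alongside the pass/fail result, keeps "explicitly accepted" deltas auditable later.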
+ +### Phase 6 — perf + hardening + +- Land gw-9 buffered-update intervals. +- Add OpenTelemetry traces from the driver around every gw call, + correlated via `client_correlation_id`. +- Write soak test: 50k tags subscribed, 24h, count missed events, gw + restarts, OtOpcUa restarts. +- Tune `MxGatewayClientOptions.MaxGrpcMessageBytes`, retry pipeline, + call timeouts based on soak results. + +**Exit:** production-acceptable perf numbers documented in +`docs/Galaxy.Driver.md`. + +### Phase 7 — retirement + +1. Default `Galaxy:Backend = mxgateway` everywhere (sample configs, + install scripts, e2e configs). +2. Delete `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`, + `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy`, + `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared`, and matching tests. +3. Remove `OtOpcUaGalaxyHost` NSSM registration from + `scripts/install/Install-Services.ps1`. Add a registration block for the + Wonderware historian sidecar from phase 3. +4. Remove every x86 .NET 4.8 reference, build target, and CI step from this + repo; remove `mxaccess_documentation.md`-driven dependencies that no + longer apply. +5. Update CLAUDE.md, `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`, + `docs/Redundancy.md` to reflect the new topology. +6. Memory housekeeping: retire `project_galaxy_host_service.md` and + `project_galaxy_host_installed.md`; add a short note about the gw + dependency. + +**Exit:** `git grep -i 'Galaxy\.Host'` returns nothing in source. 
+ +--- + +## Configuration shape (new driver) + +```jsonc +"Drivers": { + "Galaxy": { + "Type": "Galaxy", + "InstanceId": "galaxy-prod-1", + "Gateway": { + "Endpoint": "https://mxgw.aveva.local:5001", + "ApiKeySecretRef": "galaxy:apiKey", // resolved via existing secret store + "UseTls": true, + "CaCertificatePath": "C:\\publish\\mxgw\\ca.crt", + "ConnectTimeoutSeconds": 10, + "DefaultCallTimeoutSeconds": 5, + "StreamTimeoutSeconds": 0 // unbounded + }, + "MxAccess": { + "ClientName": "OtOpcUa-A", // unique per OtOpcUa instance + "PublishingIntervalMs": 1000, // hint for SetBufferedUpdateInterval + "WriteUserId": 0 + }, + "Repository": { + "DiscoverPageSize": 5000, + "WatchDeployEvents": true + }, + "Reconnect": { + "InitialBackoffMs": 500, + "MaxBackoffMs": 30000, + "ReplayOnSessionLost": true + } + } +} +``` + +The OtOpcUa secret store already handles DPAPI-protected values for LDAP +binds; reuse it for the gw API key. Never put the key in plaintext in the +sample config. + +--- + +## Risks and mitigations + +| Risk | Mitigation | +|---|---| +| gw protocol regression breaks production | Pin gw NuGet to a contract version range; CI runs parity matrix on every gw bump; staged rollout via `Galaxy:Backend` flag. | +| Per-call latency regresses for chatty workloads | Land gw-9 (buffered updates) before phase 5; soak the 95p in phase 6. | +| Reconnect storm after gw restart re-registers 50k tags | Land gw-3 or gw-5 before phase 6; client-side bulk replay throttled by `SubscribeBulkAsync` chunk size. | +| Alarm parity gap from moving tracker server-side | Phase 2 ships before phase 4; parity matrix gates phase 7. | +| Historian sidecar adds a second .NET 4.8 x86 service | Acceptable: it's a *driver-agnostic* component, and it ships only where Wonderware historian access is actually needed. | +| Two OtOpcUa instances both registering as same MXAccess client | `ClientName` is per-instance config (ot-5); install scripts lint that the redundancy pair has distinct names. 
| +| Cross-machine MXAccess writes traverse plaintext gRPC | Phase 0 enforces `UseTls=true` for any non-loopback `Endpoint`; CI lints the sample configs. | +| gw API key leaked in logs | gw and `MxGatewayClient` already redact `authorization` metadata; phase 6 audit. | +| Memory leak in `EventPump` under high event rate | Bounded channel between `StreamEventsAsync` and per-sub fan-out, drop-newest with a metric counter; soak test catches. | + +--- + +## Cross-cutting deliverables + +- **Docs:** `docs/v2/Galaxy.Driver.md` (new), updates to + `docs/v2/dev-environment.md`, `docs/ServiceHosting.md`, + `docs/Redundancy.md`, `CLAUDE.md`. +- **Install scripts:** `scripts/install/Install-Services.ps1` removes + `OtOpcUaGalaxyHost`, adds `OtOpcUaWonderwareHistorian`, no Galaxy + service registration on the OtOpcUa node. +- **e2e:** `scripts/e2e/e2e-config.sample.json` — drop `OTOPCUA_GALAXY_*` + pipe vars, add `Drivers:Galaxy:Gateway:Endpoint` etc. +- **Memory:** retire stale Galaxy.Host entries; add gw dependency entry, + redundancy + client-name guidance. + +--- + +## Order-of-work summary + +``` +Phase 0 (gw repo): gw-1, gw-2, gw-4, gw-9 +Phase 1 (this): ot-1 — historian extension point +Phase 2 (this): ot-2 — alarm subsystem +Phase 3 (this): Driver.Historian.Wonderware sidecar +Phase 4 (this): Driver.Galaxy (new) behind backend flag + — depends on Phase 0, 1, 2 +Phase 5 (this+gw): parity matrix + — drives gw-3 / gw-5 / gw-6 / gw-7 if gaps surface +Phase 6 (this): perf + hardening +Phase 7 (this): retire Galaxy.Host / Proxy / Shared +``` + +Phases 1–3 are independent of each other and can run in parallel. Phase 4 +needs all three plus Phase 0. Phase 5 requires Phase 4. Phases 6 and 7 are +sequential after Phase 5. diff --git a/lmx_mxgw_impl.md b/lmx_mxgw_impl.md new file mode 100644 index 0000000..65b9675 --- /dev/null +++ b/lmx_mxgw_impl.md @@ -0,0 +1,1050 @@ +# Galaxy → MxGateway Migration — Detailed Implementation Plan + +Companion to `lmx_mxgw.md` (design plan). 
This document breaks the plan into +PR-sized tasks with concrete file paths, acceptance checks, test deltas, and +explicit parallel-safety analysis for subagent execution. + +Cross-repo scope: +- **`lmxopcua`** (this repo) — drivers, server, install scripts, e2e, docs. +- **`mxaccessgw`** (`C:\Users\dohertj2\Desktop\mxaccessgw`) — gRPC gateway, + worker, .NET client. + +--- + +## How to use parallel subagents safely + +The plan lists each task with a `parallel-key`. Two tasks share a key when +they touch the same file(s); tasks with **disjoint keys are safe to run in +parallel**. Tasks within the same phase that share a key MUST run +sequentially. + +### Subagent execution rules + +1. **One git worktree per parallel subagent.** Spawn each parallel agent + with `Agent({ isolation: "worktree", ... })` so they never collide on the + working tree. Merge back to a shared integration branch after each + parallel batch completes. +2. **Interface-defining tasks run first, then their consumers.** Anywhere + the plan says "PR X.0: define interface", that PR must merge to the + integration branch before its consumers fan out in parallel. +3. **Shared-file edits serialize.** Files touched by more than one PR in a + batch — `ZB.MOM.WW.OtOpcUa.slnx`, `Install-Services.ps1`, + `appsettings.json`, `CLAUDE.md`, `MEMORY.md` — get a single dedicated + "wire-up" PR at the end of the batch that ingests every parallel branch's + needed line. Don't let parallel agents edit them. +4. **Test fixtures own their fixture file.** When two PRs both need a + `FakeMxGatewayClient`, the first PR creates it and exposes the contract; + subsequent PRs add cases to the same file or extend it via partial class + in their own test files. +5. **Subagent prompt must include the parallel-key and disallowed paths.** + Any agent prompt must say "you may NOT edit ``, + ``, or files outside ``. If you discover a + needed change there, surface it as a task for the wire-up PR; do not + make it yourself." 
This prevents merge conflicts at integration time. +6. **Choose the right subagent type.** + - `Explore` — read-only research/locate. Cheap. Use before any PR that + needs to learn the surrounding code. + - `Plan` — produce a step-by-step PR plan from a brief; no code writes. + Use when a task description below is too coarse for a fresh agent. + - `general-purpose` — code-writing. Use for PRs that create/modify + source. + - `code-simplifier` — post-PR cleanup pass on the same files. + - `codex:rescue` — a stuck PR; use sparingly. +7. **Foreground vs. background.** Run one PR foreground if its result + gates the rest of your work this turn. Run the rest in background and + read results when they complete. +8. **Trust but verify.** After every subagent claims completion, the + parent runs the build (`dotnet build ZB.MOM.WW.OtOpcUa.slnx`) and the + target tests. The agent's report is hearsay until the build is green. +9. **Worktree cleanup.** When `isolation: "worktree"` returns no path, + nothing was changed; if it returns a path, integrate by cherry-picking + or fast-forwarding into the integration branch, then prune the worktree. 
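The parallel-key rule at the top of this section reduces to a grouping step: tasks that share a key form one sequential chain, and chains with different keys can each get their own worktree. A minimal sketch, assuming a hypothetical `PlanTask` record and `ParallelPlanner` helper (neither is a type from either repo):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one PR in the plan; not a type from either repo.
public sealed record PlanTask(string Id, string ParallelKey);

public static class ParallelPlanner
{
    // Groups tasks by parallel-key. Each returned chain must run sequentially
    // (in the order listed); distinct chains are safe to run in parallel.
    public static IReadOnlyList<IReadOnlyList<PlanTask>> ToChains(IEnumerable<PlanTask> tasks)
        => tasks
            .GroupBy(t => t.ParallelKey)
            .Select(g => (IReadOnlyList<PlanTask>)g.ToList())
            .ToList();

    // True when no two tasks in a proposed parallel batch share a key.
    public static bool IsParallelSafe(IEnumerable<PlanTask> batch)
    {
        var seen = new HashSet<string>();
        return batch.All(t => seen.Add(t.ParallelKey));
    }
}
```

With the Phase 0 keys listed later in this plan, tasks 0.2 and 0.3 (both `gw-proto-mxaccess`) land in one chain, while 0.1 (`gw-proto-galaxy`) and 0.5 (`gw-docs`) each form their own. The sketch only covers key conflicts; "depends on" ordering between chains still has to be applied separately.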
+ +### Locked files (never edit from a parallel batch) + +These get a dedicated wire-up PR at the **end** of each phase's parallel +fanout: + +| File | Why locked | +|---|---| +| `ZB.MOM.WW.OtOpcUa.slnx` | New project additions stack and conflict | +| `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` | Config schema additions stack | +| `src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` (or `Startup.cs`) | DI registrations stack | +| `scripts/install/Install-Services.ps1` | Service registrations stack | +| `scripts/e2e/e2e-config.sample.json` | E2E config stacks | +| `CLAUDE.md`, `docs/v2/dev-environment.md` | Doc edits stack | +| `MEMORY.md` (auto-memory index) | One line per change; conflicts often | +| `mxaccessgw/MxGateway.sln` | Same reason as our slnx | +| `mxaccessgw/clients/proto/*.proto` files | Proto edits stack and reorder field numbers | + +--- + +## Phase 0 — mxaccessgw foundation work + +Repo: `C:\Users\dohertj2\Desktop\mxaccessgw`. Branch off `main` per task. + +| PR | Title | Parallel-key | Files | +|----|-------|--------------|-------| +| 0.1 | Galaxy attribute metadata parity | `gw-proto-galaxy` | `clients/proto/galaxy_repository.proto`, `src/MxGateway.Server/Galaxy/AttributeMapper.cs`, `src/MxGateway.Server/Galaxy/GalaxyHierarchyCache.cs`, `gr/`-equivalent SQL in `src/MxGateway.Server/Galaxy/Sql/`, contract tests | +| 0.2 | Bulk subscribe with publishing-interval hint | `gw-proto-mxaccess` | `clients/proto/mxaccess_gateway.proto` (extend `SubscribeBulkCommand` with `optional uint32 buffered_update_interval_ms`), `src/MxGateway.Worker/MxAccess/Commands/SubscribeBulkHandler.cs`, `src/MxGateway.Server/Sessions/Mappers.cs`, worker tests | +| 0.3 | Subscription replay RPC | `gw-proto-mxaccess` | Same proto file as 0.2 (add `ReplaySubscriptionsCommand`), `src/MxGateway.Worker/MxAccess/Commands/ReplaySubscriptionsHandler.cs`, gateway forwarder, tests | +| 0.4 | Session health stream | `gw-proto-mxaccess` | Same proto (add `StreamSessionHealth(SessionId) returns 
(stream SessionHealth)`), `src/MxGateway.Server/Sessions/SessionHealthService.cs`, dashboard projection, tests | +| 0.5 | Document event-stream resume contract | `gw-docs` | `docs/Sessions.md`, `docs/gateway-process-design.md` — define retention bound, `events_lost` signal in `MxEvent` envelope | +| 0.6 | .NET client `MxValue` adapter + `SubscribeWithCallback` | `gw-dotnet-client` | `clients/dotnet/MxGateway.Client/MxValueAdapter.cs` (new), `clients/dotnet/MxGateway.Client/MxGatewaySession.cs` (extend with `SubscribeWithCallbackAsync`), `clients/dotnet/MxGateway.Client.Tests/` | +| 0.7 | API key scopes + `mxgw-key` minting CLI | `gw-auth` | `src/MxGateway.Server/Auth/`, `src/MxGateway.Cli/`, `docs/Authentication.md` | + +### Phase 0 parallel batches + +- **Batch 0a (parallel):** 0.1 (`gw-proto-galaxy`), 0.5 (`gw-docs`), + 0.6 (`gw-dotnet-client`), 0.7 (`gw-auth`). Four worktrees, four + `general-purpose` agents. +- **Batch 0b (sequential within key, parallel across keys):** 0.2 → 0.3 → + 0.4 all share `gw-proto-mxaccess`. Land them in order on the same agent + (or three sequential calls). Field number assignment must be coordinated + through the wire-up PR. +- **Wire-up 0.W:** integrate proto-generated descriptors, regenerate + `clients/proto/descriptors`, run cross-language smoke matrix. + +**Phase 0 exit:** mxaccessgw `main` carries all seven PRs. Tag the gw NuGet +release. Bump `MxGateway.Client` consumed by lmxopcua. + +--- + +## Phase 1 — Server-level historian extension point (lmxopcua) + +Goal: detach `IHistorianDataSource` from the Galaxy driver. Server's +`HistoryRead*` operations call into a registered data source by namespace, +not into `IHistoryProvider` on the driver. + +### Tasks + +#### PR 1.1 — Lift `IHistorianDataSource` to `Core.Abstractions` + +**Parallel-key:** `core-abs-historian` (locks files in +`Core.Abstractions/Historian/`). 
+ +**Files** +- Create: + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs` + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianSample.cs` + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianAggregateSample.cs` + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianEvent.cs` + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs` +- Move-from (Galaxy.Host originals stay until phase 7; new copies live in + Core.Abstractions and are pure POCO): + - source bodies in `src/.../Driver.Galaxy.Host/Backend/Historian/` +- Modify: + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj` (no change if files auto-included) +- Tests: + - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs` — + contract documentation tests (null arg behavior, time-range ordering). + +**Acceptance** +- `dotnet build` clean. +- New tests run and pass. +- Galaxy.Host still compiles (it keeps its own copies until phase 7). + +**Subagent prompt boilerplate** (template — re-use this shape for each PR): +> You are working in worktree ``. Create the files listed in PR 1.1 of +> `lmx_mxgw_impl.md`. Do NOT edit any file under `Driver.Galaxy.Host/`, +> `appsettings.json`, the `.slnx`, or `Program.cs`. The DTOs are pure value +> records — do not import OPC UA types or COM types. Run +> `dotnet build src/ZB.MOM.WW.OtOpcUa.Core.Abstractions` before reporting. + +#### PR 1.2 — `IHistoryService` plugin host on the server + +**Parallel-key:** `server-history`. + +**Files** +- Create: + - `src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs` — namespace → `IHistorianDataSource`. + - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs` — registry impl. + - `src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryServiceAdapter.cs` — + bridges OPC UA `HistoryRead`/`HistoryReadProcessed`/`HistoryReadAtTime`/ + `HistoryReadEvents` to the router. 
+- Modify: + - `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register + `HistoryServiceAdapter`. *Locked file* — defer to wire-up PR 1.W. +- Tests: + - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs`. + +**Acceptance** +- Router resolves data source by namespace prefix. +- Unknown namespace returns `BadHistoryOperationUnsupported` (or current + status used for that case — verify against existing server behavior in + `OpcUaServerService.cs` before coding). + +**Depends on:** 1.1 merged. + +#### PR 1.3 — Driver capability shrink: drop `IHistoryProvider` requirement + +**Parallel-key:** `server-history`. + +**Files** +- Modify: + - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` (or wherever + `IHistoryProvider` is consumed; locate via `Grep "IHistoryProvider"`). + Replace direct calls with `IHistoryRouter.Resolve(...)`. +- Tests: + - Update any test that exercised `IHistoryProvider` paths to register a + fake data source via the router. + +**Depends on:** 1.2 merged. + +#### PR 1.W — Phase 1 wire-up + +**Parallel-key:** locked-files. + +**Files** +- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — DI registration of + `HistoryRouter` + the legacy Galaxy.Host historian adapter. +- `ZB.MOM.WW.OtOpcUa.slnx` — no change unless a new project was added; if + PR 1.1 went into the existing `Core.Abstractions` project, no slnx edit. + +### Phase 1 parallel batches + +- **Batch 1a (sequential):** 1.1 → 1.2 → 1.3 → 1.W. Each blocks the next. +- Total: one foreground sequence; no parallelism in Phase 1. Use one + `general-purpose` agent across all four PRs, or one PR per agent in + order. + +--- + +## Phase 2 — Server-level alarm condition subsystem (lmxopcua) + +Goal: drop `GalaxyAlarmTracker` from the driver's responsibilities; the +server runs the AlarmCondition state machine driven by `IsAlarm=true` +attribute metadata. + +### Tasks + +#### PR 2.1 — Address-space builder alarm-declaration API + +**Parallel-key:** `core-abs-alarms`. 
+ +**Files** +- Modify: + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs` — + add `IAlarmConditionDeclaration MarkAsAlarmCondition(...)` (the + method already exists per `GalaxyProxyDriver.cs:146`; verify shape and + extend with the four sub-attribute references). + - `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Alarms/AlarmConditionInfo.cs` + — add `InAlarmRef`, `PriorityRef`, `DescAttrNameRef`, `AckedRef`, + `AckMsgWriteRef` fields. +- Tests: + - `tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs`. + +**Acceptance** +- Existing call sites (`GalaxyProxyDriver.DiscoverAsync`) still compile — + add the new fields with safe defaults. + +#### PR 2.2 — `AlarmConditionService` (state machine) + +**Parallel-key:** `server-alarms`. + +**Files** +- Create: + - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs` + - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionState.cs` + - `src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs` +- Reference impl to **port** (do not duplicate — read it for invariants): + - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs` +- Tests: + - `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs` — + port the existing tracker tests (`tests/.../Galaxy.Host.Tests/`). + +**Subagent guidance** +- **Two-step.** First a `Plan` agent: read `GalaxyAlarmTracker.cs` and + produce a state-transition table + a list of tests to port. Then a + `general-purpose` agent: implement `AlarmConditionService` against that + table. + +**Depends on:** 2.1 merged. + +#### PR 2.3 — Wire alarm service into `DriverNodeManager` + +**Parallel-key:** `server-alarms`. + +**Files** +- Modify: + - `src/ZB.MOM.WW.OtOpcUa.Server/DriverNodeManager.cs` — on each driver's + discovery, collect alarm declarations and hand to `AlarmConditionService` + along with the driver's `ISubscribable` and `IWritable` for sub-attribute + advise + ack writes. 
+- Tests: + - extend `DriverNodeManagerTests` with a fake driver that declares one + alarm-bearing node. + +**Depends on:** 2.2 merged. + +#### PR 2.W — Phase 2 wire-up + +DI registration of `AlarmConditionService` in `OpcUaServerService.cs`. + +### Phase 2 parallel batches + +- **Batch 2a (sequential):** 2.1 → 2.2 → 2.3 → 2.W. + +### Phases 1 + 2 cross-batch parallelism + +PR 1.1 and PR 2.1 touch **different files** in `Core.Abstractions/` (one +under `Historian/`, one in `IAddressSpaceBuilder.cs` + `Alarms/`). They are +**parallel-safe**. + +PR 1.2/1.3 and PR 2.2/2.3 both modify `OpcUaServerService.cs` and +`DriverNodeManager.cs`. They share **two locked files** — but only at the +DI-registration level. If we split the `OpcUaServerService.cs` edits into a +single combined wire-up PR (1+2.W), the body PRs 1.2/1.3 and 2.2/2.3 don't +touch them. Then the body PRs *can* run in parallel batches across +phase 1 and phase 2. + +**Recommended Phase 1+2 plan** (parallel): +1. Run **PR 1.1 and PR 2.1 in parallel** (two worktrees, two + `general-purpose` agents). Both target `Core.Abstractions` only. +2. Merge both to integration branch. +3. Run **PR 1.2/1.3 and PR 2.2/2.3 in parallel**, each as a sequential + 2-PR chain on its own worktree. Constraint: neither chain edits + `OpcUaServerService.cs` or `DriverNodeManager.cs` — defer all DI/wiring + to the combined wire-up. +4. Merge both chains. +5. **Combined wire-up PR 1+2.W** edits `OpcUaServerService.cs` and + `DriverNodeManager.cs` once. + +--- + +## Phase 3 — `Driver.Historian.Wonderware` sidecar + +Goal: house the existing `HistorianDataSource` code in its own .NET 4.8 x86 +service, exposed over named pipe; ship a .NET 10 client implementing +`IHistorianDataSource`. + +### Tasks + +#### PR 3.1 — Create the sidecar shell project + +**Parallel-key:** `historian-sidecar-host`. + +**Files** +- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/` + - `Driver.Historian.Wonderware.csproj` (`net48`, + `x86`). 
+ - `Program.cs` — Serilog + console host + named pipe server (mirror + `Driver.Galaxy.Host/Program.cs` shape: env-driven pipe name, allowed SID, + shared secret). +- Create test project: + - `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` +- *Locked:* `.slnx`, `Install-Services.ps1` (wire-up). + +#### PR 3.2 — Lift `HistorianDataSource` & friends + +**Parallel-key:** `historian-sidecar-host`. + +**Files** +- Move (preserve git history with `git mv`): + - `src/.../Driver.Galaxy.Host/Backend/Historian/HistorianDataSource.cs` + → `src/.../Driver.Historian.Wonderware/Backend/HistorianDataSource.cs` + - `HistorianClusterEndpointPicker.cs` + - `HistorianClusterNodeState.cs` + - `HistorianConfiguration.cs` + - `HistorianEventDto.cs` + - `HistorianHealthSnapshot.cs` + - `HistorianQualityMapper.cs` + - `HistorianSample.cs` + - `IHistorianConnectionFactory.cs` +- Add a thin `IHistorianDataSource` shim in the sidecar that re-implements + the **interface from `Core.Abstractions/Historian/`** (after PR 1.1). +- Galaxy.Host needs to keep building until phase 7. Either: + - Add `Driver.Historian.Wonderware` ProjectReference from + `Driver.Galaxy.Host` and re-use the moved code, OR + - Leave a stub copy in Galaxy.Host that delegates to the sidecar via the + new client. Pick option 1 (cleaner). +- Tests: + - `git mv` matching test files from + `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/Backend/Historian/` + to `tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`. + +**Depends on:** PR 1.1 merged (interface lives in Core.Abstractions). + +#### PR 3.3 — Pipe contract + handler + +**Parallel-key:** `historian-sidecar-pipe`. 
+ +**Files** +- Create: + - `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Ipc/Contracts.cs` + (MessagePack DTOs: `ReadRawRequest/Reply`, `ReadProcessedRequest/Reply`, + `ReadAtTimeRequest/Reply`, `ReadEventsRequest/Reply`, + **`WriteAlarmEventsRequest/Reply`** — alarm-event persistence write + path; mirror today's `GalaxyHistorianWriter.WriteBatchAsync` payload + so the SQLite store-and-forward sink in `Core.AlarmHistorian` can + drain into the Wonderware historian event store after Galaxy.Proxy is + deleted). + - `Ipc/PipeServer.cs` — copy + adapt + `Driver.Galaxy.Host/Ipc/PipeServer.cs` (same ACL/secret model). + - `Ipc/HistorianFrameHandler.cs` — handles all five contract pairs + above. +- Tests: + - `tests/.../Driver.Historian.Wonderware.Tests/Ipc/PipeRoundTripTests.cs` + — round-trip every contract pair including `WriteAlarmEvents`. + +#### PR 3.4 — .NET 10 client + +**Parallel-key:** `historian-sidecar-client`. + +**Files** +- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/` + (.NET 10 x64). Implements: + - `IHistorianDataSource` (read path: raw / processed / at-time / events) + against the sidecar pipe. + - `IAlarmHistorianWriter` (write path: alarm-event persistence) against + the sidecar pipe `WriteAlarmEvents` contract from PR 3.3. +- Tests: + - `tests/.../Driver.Historian.Wonderware.Client.Tests/` against an + in-proc fake pipe server. Cover both the read interface and the + alarm-event write interface; verify the SQLite store-and-forward sink + (`Core.AlarmHistorian.SqliteStoreAndForwardSink`) drains successfully + when the client is plugged in as its target. + +**Depends on:** PR 3.3 merged (contracts published). + +#### PR 3.W — Phase 3 wire-up + +**Files** +- `ZB.MOM.WW.OtOpcUa.slnx` — register three new projects + two new test + projects. +- `scripts/install/Install-Services.ps1` — register + `OtOpcUaWonderwareHistorian` NSSM service. 
+- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs` — register the + client as both an `IHistorianDataSource` for the Galaxy namespace **and** + the `IAlarmHistorianWriter` target for the SQLite store-and-forward + sink, replacing today's `GalaxyProxyDriver.WriteBatchAsync` route. +- `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` — `Historian:Wonderware` + block. + +### Phase 3 parallel batches + +- **Batch 3a (sequential):** 3.1 (shell) → 3.2 (lift code). +- **Batch 3b (parallel after 3.2):** 3.3 (pipe) and 3.4 (client) — but + 3.4 depends on 3.3's contracts. So sequential within Phase 3. +- **Batch 3c:** 3.W. + +But Phase 3 is **fully independent of Phase 1.1's downstream work** once +1.1 has merged. Phase 3 can run in parallel with Phase 1.2/1.3 and all of +Phase 2. + +**Recommended phasing**: kick off Phase 3 in parallel with Phase 2, both +gated only on Phase 1.1's merge. + +--- + +## Phase 4 — New `Driver.Galaxy` (Tier-A, .NET 10) against gw + +This is the bulk of the work. Each PR adds one capability to the new driver. +The driver builds and links from PR 4.0 onward; capabilities arrive as +incremental green bars. + +The driver lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (note: same +short name as the old `.Proxy`, but new project. The `.Host`, `.Proxy`, +`.Shared` projects continue to coexist until phase 7). + +### Tasks + +#### PR 4.0 — Project skeleton, options, factory + +**Parallel-key:** `galaxy-shell`. + +**Files** +- Create project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` + - `Driver.Galaxy.csproj` (.NET 10 x64), references + `Core.Abstractions`, `Core`, `MxGateway.Client` (NuGet from gw repo). + - `GalaxyDriver.cs` — `IDriver` + `IDisposable` skeleton; `Initialize` + creates `MxGatewayClient` and opens a session; `Shutdown` disposes. + - `Config/GalaxyDriverOptions.cs` — POCO matching the JSON shape in + `lmx_mxgw.md`. + - `GalaxyDriverFactoryExtensions.cs` — `AddGalaxyDriver(IServiceCollection)`. 
+- Tests: + - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` (new project) + - `Tests/GalaxyDriverInitializationTests.cs` — uses a fake + `IMxGatewayClientTransport` to verify open-session behavior. +- *Locked:* `.slnx` (wire-up PR 4.W). + +**Acceptance** +- Driver builds, `Initialize` opens a session against a fake transport, + `Shutdown` closes it. +- `IDriver.RecycleAsync` (if present in the interface today) returns the + same stub shape as the legacy backend — `{Accepted = true, GraceSeconds + = 15}` — and is documented in the file as intentionally a no-op until a + future PR wires it through gw. Today's `MxAccessGalaxyBackend.RecycleAsync` + is itself a stub, so this preserves behavior exactly. + +#### PR 4.1 — `ITagDiscovery` via `GalaxyRepositoryClient` + +**Parallel-key:** `galaxy-discover`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Browse/GalaxyDiscoverer.cs` + - `src/.../Driver.Galaxy/Browse/DataTypeMap.cs` — + `mx_data_type → DriverDataType`. Port table from + `GalaxyProxyDriver.MapDataType` (lines 523–532) and verify against + `gr/data_type_mapping.md`. + - `src/.../Driver.Galaxy/Browse/SecurityMap.cs` — port + `GalaxyProxyDriver.MapSecurity` (lines 534–544). + - `src/.../Driver.Galaxy/Browse/AlarmRefBuilder.cs` — for any attribute + where `IsAlarm=true`, compute the five sub-attribute references by + Galaxy naming convention (`..InAlarm`, + `..Priority`, `..DescAttrName`, + `..Acked`, `..AckMsg`) and populate + `AlarmConditionInfo.{InAlarmRef, PriorityRef, DescAttrNameRef, + AckedRef, AckMsgWriteRef}` before passing to `MarkAsAlarmCondition`. + Mirrors today's behavior in + `MxAccessGalaxyBackend.SubscribeAlarmsAsync` so the server-level + `AlarmConditionService` (Phase 2) has every ref it needs. +- Modify: + - `GalaxyDriver.cs` — implement `ITagDiscovery.DiscoverAsync` calling + discoverer. +- Tests: + - `Tests/Browse/GalaxyDiscovererTests.cs` — fake + `IGalaxyRepositoryClientTransport` with canned `GalaxyObject` list. 
+ - `Tests/Browse/AlarmRefBuilderTests.cs` — for an alarm-bearing + attribute, verify all five refs match the `..{...}` shape + and round-trip cleanly through `MarkAsAlarmCondition`. + +**Acceptance** +- Discovered nodes carry `mx_data_type`, `IsArray`, `ArrayDim`, + `SecurityClassification`, `IsHistorized`, `IsAlarm` matching what the + legacy backend produces (snapshot-compared in Phase 5). +- Every `IsAlarm=true` attribute calls `MarkAsAlarmCondition` with all + five sub-attribute refs populated. The `AlarmConditionService` from + Phase 2 must be able to subscribe and ack without further help from + the driver. + +**Subagent guidance** +- Use an `Explore` agent first: "find every place in + `Driver.Galaxy.Proxy/GalaxyProxyDriver.cs` that consumes + `DiscoverHierarchyResponse` and list every wire field it reads, so we + know what gw's proto must surface." + +**Depends on:** PR 4.0 merged + PR 0.1 (gw attribute parity) NuGet bumped. + +#### PR 4.2 — `IReadable` (one-shot read path) + +**Parallel-key:** `galaxy-read`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Runtime/GalaxyMxSession.cs` — owns + `MxGatewaySession`, `Register` server handle, in-memory + `tag → itemHandle` registry. + - `src/.../Driver.Galaxy/Runtime/MxValueDecoder.cs` — + `MxValue → object` (boolean/int32/float/double/string/datetime, plus + array variants). + - `src/.../Driver.Galaxy/Runtime/StatusCodeMap.cs` — explicit + `MxStatusProxy → uint OPC UA StatusCode` mapping table. Today's + coarse `vtq.Quality >= 192 ? Good : Uncertain_Placeholder` becomes a + full mapping covering at minimum: + `Good (0x0)`, `Uncertain (0x40000000)`, `Uncertain_LastUsableValue + (0x40900000)`, `Bad (0x80000000)`, `Bad_NotConnected (0x808A0000)`, + `Bad_NoCommunication (0x80310000)`, `Bad_OutOfService (0x808D0000)`. + Document any unmapped category as `Bad_InternalError` and log once + with the raw `MxStatusProxy` so the matrix can be extended from + field data.
+- Modify: + - `GalaxyDriver.cs` — implement `IReadable.ReadAsync`: per tag, + `AddItem` → short-lived `Advise` → first `OnDataChange`. (If + Phase 0 added a synchronous `ReadAsync` RPC, use that; flag a follow-up + if missing.) +- Tests: + - `Tests/Runtime/GalaxyReadTests.cs` — fake transport with scripted + `OnDataChange` responses. + - `Tests/Runtime/StatusCodeMapTests.cs` — exhaustive mapping cases plus + "unknown category falls back to Bad_InternalError and emits a single + diagnostic log" assertion. + +**Depends on:** PR 4.0. + +#### PR 4.3 — `IWritable` + secured-write routing + +**Parallel-key:** `galaxy-write`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Runtime/MxValueEncoder.cs` — + `object → MxValue` (the inverse of 4.2's decoder; unify into one type + if simpler). +- Modify: + - `GalaxyDriver.cs` — implement `IWritable.WriteAsync`. + Route writes whose attribute carries + `SecurityClassification.SecuredWrite` / `VerifiedWrite` through + `WriteSecuredAsync` (mxaccessgw exposes this in `MxGatewaySession`). +- Tests: + - `Tests/Runtime/GalaxyWriteTests.cs` — verify the routing decision + given each `SecurityClassification` value. + +**Depends on:** PR 4.2 merged (shares `GalaxyMxSession` + value type code). + +#### PR 4.4 — `ISubscribable` + `EventPump` + +**Parallel-key:** `galaxy-subscribe`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Runtime/SubscriptionRegistry.cs` — + `(driverSubId → list)` and reverse map. + - `src/.../Driver.Galaxy/Runtime/EventPump.cs` — single consumer of + `MxGatewaySession.StreamEventsAsync`. Maps each `OnDataChange` to a + `DataChangeEventArgs` per registered driver subscription. + - `src/.../Driver.Galaxy/Runtime/GalaxySubscriptionHandle.cs` (port from + Proxy). +- Modify: + - `GalaxyDriver.cs` — implement `ISubscribable.SubscribeAsync` using + `SubscribeBulkAsync` with the `buffered_update_interval_ms` hint + from PR 0.2. 
+- Tests: + - `Tests/Runtime/EventPumpFanoutTests.cs` — one item → multiple driver + subscriptions → one event per driver subscription. + - `Tests/Runtime/SubscribeBulkTests.cs` — partial failures. + +**Depends on:** PR 4.3. + +#### PR 4.5 — `ReconnectSupervisor` + +**Parallel-key:** `galaxy-reconnect`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Runtime/ReconnectSupervisor.cs` — state machine + `(Healthy → TransportLost → ReopeningSession → ReplayingSubscriptions + → Healthy)`. Surfaces `DriverState.Degraded` while not Healthy. +- Modify: + - `GalaxyDriver.cs` + `GalaxyMxSession.cs` — wire transport-error + callbacks into the supervisor; replay subscriptions via + `ReplaySubscriptionsCommand` (PR 0.3). +- Tests: + - `Tests/Runtime/ReconnectSupervisorTests.cs` with simulated drops. + +**Depends on:** PR 4.4. Strongly recommend that Phase 0.3 (replay RPC) is merged first. + +#### PR 4.6 — `IRediscoverable` via `WatchDeployEvents` + +**Parallel-key:** `galaxy-deploy`. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Browse/DeployWatcher.cs` — long-lived consumer + of `GalaxyRepositoryClient.WatchDeployEventsAsync`. +- Modify: + - `GalaxyDriver.cs` — start watcher on Initialize; raise + `OnRediscoveryNeeded` per event. +- Tests: + - `Tests/Browse/DeployWatcherTests.cs`. + +**Depends on:** PR 4.0. **Independent of PR 4.2–4.5** — can run in +parallel with all of them. + +#### PR 4.7 — `IHostConnectivityProbe` (transport health + per-platform probes) + +**Parallel-key:** `galaxy-health`. + +The current driver reports two flavors of host connectivity: + +1. **Top-level transport health** — flips `Running`/`Stopped` on the + synthetic host named after `OTOPCUA_GALAXY_CLIENT_NAME` whenever the + MXAccess COM proxy connects/disconnects. +2. **Per-platform `ScanState` probes** — for each discovered + `$WinPlatform` and `$AppEngine` gobject, advise its `ScanState` + attribute and translate value transitions into per-host + `Running`/`Stopped`/`Unknown`.
Lives in + `Driver.Galaxy.Host/Backend/Stability/GalaxyRuntimeProbeManager.cs`. + +This PR ports both. + +**Files** +- Create: + - `src/.../Driver.Galaxy/Health/HostConnectivityForwarder.cs` — + consumes PR 0.4 `StreamSessionHealth` and surfaces the synthetic + top-level host entry (named after the configured MXAccess + `ClientName`). + - `src/.../Driver.Galaxy/Health/PerPlatformProbeWatcher.cs` — port of + `GalaxyRuntimeProbeManager`. On `Discover`, takes the list of + discovered `$WinPlatform`/`$AppEngine` tag names, subscribes their + `ScanState` via the driver's own `GalaxyMxSession.SubscribeBulkAsync` + (or directly through the gw session), runs the same state machine + (`OnProbeCallback` interpretation logic — port verbatim with tests), + and raises per-host `HostStatusChangedEventArgs` through the + aggregator below. + - `src/.../Driver.Galaxy/Health/HostStatusAggregator.cs` — single + sink that merges the forwarder's transport entry with the watcher's + per-platform entries into the `IReadOnlyList` + surfaced by `IHostConnectivityProbe.GetHostStatuses()`. Owns the + de-dup + diff logic that today lives in + `GalaxyProxyDriver.OnHostConnectivityUpdate`. +- Modify: + - `GalaxyDriver.cs` — wire forwarder + watcher + aggregator into + Initialize. On every `ITagDiscovery.DiscoverAsync` completion (incl. + re-discovery from PR 4.6), feed the watcher the fresh platform list + so probe subscriptions follow Galaxy redeploys. +- Tests: + - `Tests/Health/HostConnectivityForwarderTests.cs`. + - `Tests/Health/PerPlatformProbeWatcherTests.cs` — port the existing + `GalaxyRuntimeProbeManagerTests` (or whatever covers + `OnProbeCallback`) verbatim. Cover: initial subscribe on Discover, + re-subscribe after Rediscover, value-transition state machine, + cleanup on Shutdown. + - `Tests/Health/HostStatusAggregatorTests.cs` — transport entry plus + multiple per-platform entries, transitions, aggregator emits + `OnHostStatusChanged` only on actual state change. 
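The aggregator's "emit only on actual state change" rule described in the files and tests above can be sketched as follows. This is a minimal illustration, not the planned implementation: `HostState`, `HostStatus`, and `HostStatusAggregatorSketch` are hypothetical stand-ins for the real `HostConnectivityStatus` / `HostStatusChangedEventArgs` types, whose shapes are not shown in this plan. The sketch also assumes that the first report for a host counts as a change and that host names compare case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration only; the real status types live in
// Core.Abstractions and may differ.
public enum HostState { Unknown, Running, Stopped }

public sealed record HostStatus(string HostName, HostState State, DateTime LastChangedUtc);

public sealed class HostStatusAggregatorSketch
{
    private readonly object _lock = new();
    private readonly Dictionary<string, HostStatus> _hosts =
        new(StringComparer.OrdinalIgnoreCase);

    /// <summary>Raised only when a host's stored state actually changes.</summary>
    public event EventHandler<HostStatus>? HostStatusChanged;

    /// <summary>
    /// Single entry point for both the transport forwarder and the
    /// per-platform watcher. Returns true when the report changed state.
    /// </summary>
    public bool Report(string hostName, HostState state, DateTime utcNow)
    {
        HostStatus updated;
        lock (_lock)
        {
            // De-dup: a repeat of the current state is swallowed silently.
            if (_hosts.TryGetValue(hostName, out var current) && current.State == state)
                return false;

            updated = new HostStatus(hostName, state, utcNow);
            _hosts[hostName] = updated;
        }

        // Raise outside the lock so a slow handler cannot block other reporters.
        HostStatusChanged?.Invoke(this, updated);
        return true;
    }

    /// <summary>Snapshot of every known host, transport entry included.</summary>
    public IReadOnlyList<HostStatus> GetHostStatuses()
    {
        lock (_lock) return _hosts.Values.ToList();
    }
}
```

Routing both sources through one `Report` method keeps the diff logic in a single place, which is what makes the "only on actual state change" assertion in `HostStatusAggregatorTests` straightforward to write.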
+ +**Acceptance** +- Top-level transport up/down reflected within 1s of gw `SessionHealth` + flip. +- Each `$WinPlatform` / `$AppEngine` gobject in the discovered hierarchy + produces exactly one entry in `GetHostStatuses()`, transitioning on + `ScanState` changes. +- After a redeploy that adds a new platform, the watcher subscribes its + `ScanState` without restarting the driver. + +**Depends on:** PR 4.0 + PR 4.1 (needs the discoverer's platform list). +**Independent of PR 4.2–4.6** — parallel-safe with the runtime track. + +#### PR 4.W — Backend-flag wiring + +**Parallel-key:** locked-files. + +**Files** +- `src/.../Server/Configuration/DriverFactoryRegistry.cs` (or wherever + drivers are wired) — add a `Galaxy:Backend` switch: + - `legacy-host` → existing `GalaxyProxyDriver` registration (untouched). + - `mxgateway` → new `GalaxyDriver` registration via PR 4.0's extension. +- `src/.../Server/appsettings.json` — sample new config block. +- `ZB.MOM.WW.OtOpcUa.slnx` — register `Driver.Galaxy` and its tests. +- `CLAUDE.md` — note new driver, retain old driver pointers. + +**Acceptance** +- With `Galaxy:Backend=legacy-host` (default), unchanged behavior. +- With `Galaxy:Backend=mxgateway`, server boots against the new driver and + passes a smoke test against the dev gw. + +### Phase 4 parallel batches + +Dependency graph: + +``` +4.0 (shell) ──┬── 4.1 (discover) ──┬── 4.6 (deploy) + │ └── 4.7 (health: needs platform list) + ├── 4.2 (read) ── 4.3 (write) ── 4.4 (subscribe) ── 4.5 (reconnect) + │ \ + │ → 4.W (wire-up) + └── (no longer parallel-with-4.1: 4.7 moved under 4.1) +``` + +- After 4.0 merges, **4.1 and the 4.2-chain head** can run in two parallel + worktrees. +- After 4.1 merges, **4.6 and 4.7** can run in two parallel worktrees. +- 4.2 → 4.3 → 4.4 → 4.5 is one sequential chain on its own worktree + (they all touch `GalaxyDriver.cs` and `GalaxyMxSession.cs`) and runs + alongside the discover/deploy/health track. +- 4.W gathers everything. 
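The `Galaxy:Backend` switch from PR 4.W could be resolved along the following lines. This is a sketch under stated assumptions: `GalaxyBackend` and `GalaxyBackendSelector` are hypothetical names, the configuration is modeled as a plain dictionary rather than the server's real configuration object, and both the case-insensitive matching and the decision to throw on an unrecognized value are choices made here, not something the plan specifies. Only the two values and the `legacy-host` default come from the plan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum and helper for illustration only; the real wiring lives in
// DriverFactoryRegistry (or wherever drivers are registered).
public enum GalaxyBackend { LegacyHost, MxGateway }

public static class GalaxyBackendSelector
{
    public const string ConfigKey = "Galaxy:Backend";

    public static GalaxyBackend Resolve(IReadOnlyDictionary<string, string?> config)
    {
        // A missing or blank value keeps the default, matching the 4.W
        // acceptance criterion that legacy-host behavior stays unchanged.
        if (!config.TryGetValue(ConfigKey, out var raw) || string.IsNullOrWhiteSpace(raw))
            return GalaxyBackend.LegacyHost;

        return raw.Trim().ToLowerInvariant() switch
        {
            "legacy-host" => GalaxyBackend.LegacyHost,
            "mxgateway" => GalaxyBackend.MxGateway,
            // Assumption: fail loudly on a typo instead of silently falling back.
            _ => throw new InvalidOperationException(
                $"Unknown {ConfigKey} value '{raw}'; expected 'legacy-host' or 'mxgateway'.")
        };
    }
}
```

Keeping the decision in one pure function means PR 7.2 can later remove the `legacy-host` arm in a single place.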
+ +**Recommended Phase 4 plan:** +- Stage 1 (after 4.0): two worktrees — W1: 4.1; W2: 4.2 → 4.3 → 4.4 → 4.5. +- Stage 2 (after 4.1 merges, W2 still running): three worktrees — + W1: 4.6; W3: 4.7; W2: continues runtime chain. +- Stage 3: 4.W wire-up. + +--- + +## Phase 5 — Parity test matrix + +### Tasks + +#### PR 5.1 — `Driver.Galaxy.ParityTests` project + +**Parallel-key:** `parity-shell`. + +**Files** +- Create: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/` + - `ParityHarness.cs` — boots the OtOpcUa server twice with each backend, + drives the same OPC UA scenarios, captures structured snapshots. + - Theory data per scenario (browse, subscribe, alarm transition, write + by classification, history read). +- Reuses existing live-Galaxy fixtures from + `tests/.../Driver.Galaxy.E2E/`. + +#### PR 5.2 — Browse + read parity scenarios + +**Parallel-key:** `parity-browse`. + +#### PR 5.3 — Subscribe + event-rate parity scenarios + +**Parallel-key:** `parity-subscribe`. + +#### PR 5.4 — Write-by-classification parity scenarios + +**Parallel-key:** `parity-write`. + +#### PR 5.5 — Alarm-transition parity scenarios + +**Parallel-key:** `parity-alarms`. + +Cover both: +- **Live transitions:** Active / Acknowledged / Inactive sequences against + `.InAlarm` / `.Acked` value flips on the dev Galaxy. Must match + legacy-host event ordering and severity mapping. +- **Alarm-event persistence:** trigger N alarm transitions, then verify + the SQLite store-and-forward sink drains them into the Wonderware + historian event store via the new sidecar's `WriteAlarmEvents` + contract (PR 3.3). Compare the persisted rows to those produced by the + legacy `GalaxyHistorianWriter` path. + +#### PR 5.6 — History-read parity scenarios + +**Parallel-key:** `parity-history`. + +#### PR 5.7 — Reconnect/disruption scenarios + +**Parallel-key:** `parity-reconnect`. + +#### PR 5.8 — Per-platform `ScanState` probe parity + +**Parallel-key:** `parity-probes`. 
+ +Verify the new `PerPlatformProbeWatcher` (PR 4.7) produces the same +per-host `HostConnectivityStatus` stream as the legacy +`GalaxyRuntimeProbeManager`: +- Initial state on Discover for each `$WinPlatform` / `$AppEngine`. +- Transition events when a runtime is stopped/started on the dev Galaxy. +- Re-subscription after a redeploy that adds/removes a platform. +- Cleanup of probe subscriptions on Shutdown (no leaked advises in gw). + +#### PR 5.W — Parity matrix doc + +**Files** +- `docs/v2/Galaxy.ParityMatrix.md` — table of scenario × result for both + backends. Resolved deltas marked, accepted deltas justified. + +### Phase 5 parallel batches + +After 5.1 lands, scenarios 5.2–5.8 are **fully parallel** — they each add +a separate test class file. Seven worktrees, seven `general-purpose` agents. + +5.W runs after all scenarios merge and pass. + +--- + +## Phase 6 — Performance + hardening + +### Tasks + +#### PR 6.1 — OpenTelemetry traces + +**Parallel-key:** `perf-otel`. + +#### PR 6.2 — Bounded channel + drop-newest metrics + +**Parallel-key:** `perf-eventpump`. + +#### PR 6.3 — Buffered update interval landing + +**Parallel-key:** `perf-buffered`. +Wire `MxAccess:PublishingIntervalMs` → `SetBufferedUpdateInterval` once +gw exposes it. + +#### PR 6.4 — Soak test scenario + +**Parallel-key:** `perf-soak`. +50k tags, 24h, automated metric collection. + +#### PR 6.5 — Tune `MxGatewayClientOptions` defaults + +**Parallel-key:** `perf-tuning`. +Based on soak data. + +#### PR 6.W — Performance doc + +`docs/v2/Galaxy.Performance.md`. + +### Phase 6 parallel batches + +6.1, 6.2, 6.3 all touch `Driver.Galaxy/Runtime/`. Serialize them, OR split +files explicitly: +- 6.1 owns a new `Runtime/Tracing.cs` injected via decorator. Parallel-safe. +- 6.2 owns `Runtime/EventPump.cs`. Conflicts with PR 4.4 only if reordered; + not in parallel with 6.1 if 6.1 also wraps EventPump. Decide upfront: + PR 6.1 wraps at the gateway-client boundary, PR 6.2 owns EventPump + internals. 
Parallel-safe. +- 6.3 modifies `GalaxyDriver.SubscribeAsync` only. Parallel-safe. + +So 6.1, 6.2, 6.3 parallel, then 6.4 (depends on all three). 6.5 sequential +after 6.4 (uses its data). 6.W last. + +--- + +## Phase 7 — Retire legacy + +### Tasks + +#### PR 7.1 — Default flip + +**Parallel-key:** `retire-defaults`. + +**Files** +- `src/.../Server/appsettings.json` → `Galaxy:Backend = mxgateway`. +- `scripts/e2e/e2e-config.sample.json` → drop `OTOPCUA_GALAXY_*` pipe vars, + add gw endpoint. +- `scripts/install/Install-Services.ps1` → remove + `OtOpcUaGalaxyHost` registration; keep `OtOpcUaWonderwareHistorian` from + PR 3.W. + +#### PR 7.2 — Delete legacy projects + +**Parallel-key:** `retire-delete`. + +**Files** +- Delete: + - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` + - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` + - `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` + - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/` + - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/` + - `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/` +- Modify: + - `ZB.MOM.WW.OtOpcUa.slnx` — remove the six entries. + - `Server/Configuration/DriverFactoryRegistry.cs` — remove the + `legacy-host` switch arm. + +**Depends on:** PR 7.1 fully soaked (no rollback risk). + +#### PR 7.3 — Doc + memory housekeeping + +**Parallel-key:** `retire-docs`. + +**Files** +- `CLAUDE.md` — rewrite Galaxy section. +- `docs/v2/dev-environment.md` — drop `OtOpcUaGalaxyHost` references. +- `docs/ServiceHosting.md`, `docs/Redundancy.md`, `docs/security.md` — + scrub `Galaxy.Host`/`Galaxy.Proxy` mentions. +- `~/.claude/projects/.../memory/MEMORY.md` — retire entries: + - `project_galaxy_host_service.md` + - `project_galaxy_host_installed.md` + - `project_aveva_platform_installed.md` (revise — server box no longer + needs AVEVA; gw box does) +- Delete: + - `mxaccess_documentation.md` (no longer consumed by this repo). +- Add memory entry: `project_galaxy_via_mxgateway.md`. 
+
+### Phase 7 parallel batches
+
+- **Batch 7a (sequential, gated by phase 6 production soak):** 7.1.
+- **Batch 7b (parallel after 7.1):** 7.2 (`retire-delete`) and 7.3
+  (`retire-docs`) — disjoint files.
+
+---
+
+## Cross-phase dependency graph
+
+```
+Phase 0 (gw repo) ────────────────────────────────────┐
+                                                      │
+Phase 1.1 (Core.Abs/Historian) ──┐                    │
+                                  ├── Phase 1.2/1.3   │
+                                  │   (server History)│
+Phase 2.1 (Core.Abs/Alarms) ──────┤                   │
+                                  ├── Phase 2.2/2.3   │
+                                  │   (server Alarms) │
+                                  │                   │
+                                  └── Phase 3 (sidecar host + client)
+                                            │         │
+                                            └─────────┴── Phase 4 (Driver.Galaxy)
+                                                                │
+                                                          Phase 5 (parity)
+                                                                │
+                                                          Phase 6 (perf)
+                                                                │
+                                                          Phase 7 (retire)
+```
+
+### Maximum-parallelism rollout (one possible execution)
+
+- **Day 0–N (mxaccessgw):** Phase 0 batches 0a + 0b + 0.W in parallel
+  worktrees, separate repo from this one — runs in parallel with everything
+  below until consumers need the gw bump.
+- **Day 0–N (this repo):** Phases 1.1 and 2.1 in parallel (two worktrees).
+  Merge.
+- **Day N+:** Phases 1.2/1.3, 2.2/2.3, 3.1+3.2+3.3+3.4 in parallel (three
+  worktrees, each a sequential chain).
+- **Day M:** combined wire-up PR 1+2.W, then PR 3.W. Server passes existing
+  e2e against legacy backend.
+- **Day M+:** Phase 4.0 lands. Phase 4 fan-out (four worktrees) starts.
+- **Day P:** Phase 4 wire-up. Phase 5 fan-out (seven worktrees) starts.
+- **Day Q:** Phase 5 wire-up. Phase 6 fan-out (three worktrees + sequential).
+- **Day R:** Phase 7. Done.
+
+---
+
+## Subagent prompt template
+
+Re-use this shell when launching any of the parallel coding tasks. Replace
+`` parts.
+
+```
+You are implementing PR from lmx_mxgw_impl.md ("").
+Repo: <C:\Users\dohertj2\Desktop\lmxopcua | C:\Users\dohertj2\Desktop\mxaccessgw>.
+Worktree: <path>.
+ +Scope (you may create/edit only these files): +<list> + +DO NOT edit: +- Any file outside the scope above +- ZB.MOM.WW.OtOpcUa.slnx / mxaccessgw/MxGateway.sln +- src/.../Server/Program.cs, OpcUaServerService.cs, appsettings.json +- scripts/install/Install-Services.ps1 +- scripts/e2e/e2e-config.sample.json +- CLAUDE.md, docs/**, MEMORY.md, mxaccess_documentation.md + +Acceptance: +<list> + +Tests: +<list> + +If you find a needed change outside scope, STOP and surface it as a +finding rather than editing — it will be picked up by the wire-up PR. + +Before reporting completion: +1. Run `dotnet build <smallest project tree that covers your scope>`. +2. Run the new/changed tests. +3. Report: files changed, test command + result, any out-of-scope + findings. +``` + +--- + +## Risk register (operational) + +| Risk | When it bites | Mitigation | +|---|---|---| +| Phase 0 gw bump breaks existing mxaccessgw consumers | Phase 0 wire-up | Cross-language smoke matrix in mxaccessgw must run before merge | +| Two parallel agents both edit `OpcUaServerService.cs` despite the rule | Phases 1+2 parallel | Wire-up convention + grep-based pre-merge check (`git diff --stat origin/main` of locked files in the integration branch must be empty until the wire-up PR) | +| Subagent silently adds a stray `using` to a locked file | Anytime | The build-and-test step in the prompt will fail if the locked file changed and broke compile; a `git diff --name-only` whitelist check at integration-branch merge time enforces it | +| Galaxy.Host can't build during phase 3.2 because lifted files vanished | Phase 3 mid-flight | PR 3.2 adds a ProjectReference from Galaxy.Host to Driver.Historian.Wonderware so the moved files remain reachable; tests cover both call sites | +| Phase 4 chain stalls because gw exposes no synchronous read | PR 4.2 | Surface as a Phase 0 finding immediately — add a `ReadCommand` to gw or accept short-lived advise as the read mechanism (document as a perf accepted delta in 5.W) 
| +| Phase 5 parity matrix exposes a delta no one wants to fix | Phase 5 | Phase 7 gating: `Galaxy:Backend=mxgateway` does not become default until every parity delta is either resolved or has a written acceptance from the user | +| Soak test in 6.4 finds a memory leak in `EventPump` | Phase 6 | EventPump bounded-channel design (PR 6.2) is shipped before soak so the leak is bounded by design | +| Stale memory file references retired code after phase 7 | Phase 7 | PR 7.3 explicitly retires `project_galaxy_host_*` entries; add a memory-audit step to phase-close checklist | + +--- + +## Phase-close checklist (apply at the end of each phase) + +Before declaring a phase done: +1. `dotnet build ZB.MOM.WW.OtOpcUa.slnx` clean on integration branch. +2. `dotnet test ZB.MOM.WW.OtOpcUa.slnx` clean (or all-but-known-skipped). +3. Live-Galaxy smoke (when applicable) green on dev box. +4. No locked files modified outside their wire-up PR + (`git log --name-only origin/main..HEAD -- <locked-paths>` shows only + the wire-up commit). +5. `MEMORY.md` updated for any persistent context this phase introduced. +6. Doc updates limited to the phase's scope (no doc edits sprinkled across + non-doc PRs). diff --git a/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianClusterNodeState.cs b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianClusterNodeState.cs new file mode 100644 index 0000000..d54041f --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianClusterNodeState.cs @@ -0,0 +1,19 @@ +namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +/// <summary> +/// Point-in-time state of a single historian cluster node, included inside +/// <see cref="HistorianHealthSnapshot.Nodes"/> when the backend is clustered. 
+/// </summary> +/// <param name="Name">Node identifier — backend-specific (typically a hostname).</param> +/// <param name="IsHealthy">True when the node is currently considered usable for reads.</param> +/// <param name="CooldownUntil">When the next retry against an unhealthy node is allowed; null when no cooldown is active.</param> +/// <param name="FailureCount">Consecutive failures observed against this node since the last success.</param> +/// <param name="LastError">Diagnostic text from the last failure against this node; null when no failures.</param> +/// <param name="LastFailureTime">UTC of the last failure against this node; null when no failures.</param> +public sealed record HistorianClusterNodeState( + string Name, + bool IsHealthy, + DateTime? CooldownUntil, + int FailureCount, + string? LastError, + DateTime? LastFailureTime); diff --git a/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs new file mode 100644 index 0000000..8831922 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/HistorianHealthSnapshot.cs @@ -0,0 +1,32 @@ +namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +/// <summary> +/// Point-in-time runtime health of a historian data source. Returned by +/// <see cref="IHistorianDataSource.GetHealthSnapshot"/> and projected onto the +/// server status dashboard. 
+/// </summary> +/// <param name="TotalQueries">Lifetime count of read calls received.</param> +/// <param name="TotalSuccesses">Subset of <paramref name="TotalQueries"/> that completed without error.</param> +/// <param name="TotalFailures">Subset of <paramref name="TotalQueries"/> that ended in error.</param> +/// <param name="ConsecutiveFailures">Failures since the last success — non-zero means the source is currently degraded.</param> +/// <param name="LastSuccessTime">UTC of the most recent successful read; null if none yet.</param> +/// <param name="LastFailureTime">UTC of the most recent failed read; null if none yet.</param> +/// <param name="LastError">Diagnostic text from the most recent failure; null when no failures recorded.</param> +/// <param name="ProcessConnectionOpen">True when the source's process-data connection is currently established.</param> +/// <param name="EventConnectionOpen">True when the source's event-data connection is currently established. Some backends share one connection — implementations may report the same value here as <paramref name="ProcessConnectionOpen"/>.</param> +/// <param name="ActiveProcessNode">Cluster node currently serving process reads; null when no node is active or the backend is non-clustered.</param> +/// <param name="ActiveEventNode">Cluster node currently serving event reads; null when no node is active or the backend is non-clustered.</param> +/// <param name="Nodes">Per-cluster-node state. Empty when the backend is non-clustered.</param> +public sealed record HistorianHealthSnapshot( + long TotalQueries, + long TotalSuccesses, + long TotalFailures, + int ConsecutiveFailures, + DateTime? LastSuccessTime, + DateTime? LastFailureTime, + string? LastError, + bool ProcessConnectionOpen, + bool EventConnectionOpen, + string? ActiveProcessNode, + string? 
ActiveEventNode, + IReadOnlyList<HistorianClusterNodeState> Nodes); diff --git a/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs new file mode 100644 index 0000000..581ce77 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs @@ -0,0 +1,74 @@ +namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +/// <summary> +/// Server-side historian data source. Registered with the server's history router +/// and resolved per OPC UA namespace, independent of any driver's lifecycle. +/// </summary> +/// <remarks> +/// Distinct from <see cref="IHistoryProvider"/>: +/// <list type="bullet"> +/// <item><see cref="IHistoryProvider"/> is a *driver capability* — the server +/// dispatches to it via the driver instance.</item> +/// <item><see cref="IHistorianDataSource"/> is a *server registration* — the +/// server resolves it via namespace and calls it directly, so a single +/// historian (e.g. Wonderware) can serve many drivers' nodes, and drivers can +/// restart without dropping history availability.</item> +/// </list> +/// All values returned use the shared <see cref="DataValueSnapshot"/> / +/// <see cref="HistoricalEvent"/> shapes; backend-specific quality / type encodings +/// are translated to OPC UA <c>StatusCode</c> uints inside the data source. +/// </remarks> +public interface IHistorianDataSource : IDisposable +{ + /// <summary> + /// Read raw historical samples for a single tag over a time range. + /// </summary> + Task<HistoryReadResult> ReadRawAsync( + string fullReference, + DateTime startUtc, + DateTime endUtc, + uint maxValuesPerNode, + CancellationToken cancellationToken); + + /// <summary> + /// Read processed (interval-bucketed) samples — average / min / max / count / etc. + /// A bucket with no source data returns a sample whose + /// <see cref="DataValueSnapshot.StatusCode"/> indicates BadNoData. 
+ /// </summary> + Task<HistoryReadResult> ReadProcessedAsync( + string fullReference, + DateTime startUtc, + DateTime endUtc, + TimeSpan interval, + HistoryAggregateType aggregate, + CancellationToken cancellationToken); + + /// <summary> + /// Read one sample per requested timestamp — OPC UA HistoryReadAtTime service. + /// Implementations interpolate or return prior-boundary samples per their + /// backend's policy. The returned list MUST be the same length and order as + /// <paramref name="timestampsUtc"/>; gaps are returned as Bad-quality snapshots. + /// </summary> + Task<HistoryReadResult> ReadAtTimeAsync( + string fullReference, + IReadOnlyList<DateTime> timestampsUtc, + CancellationToken cancellationToken); + + /// <summary> + /// Read historical alarm / event records — OPC UA HistoryReadEvents service. + /// Distinct from any live event stream; sources here come from the historian's + /// event log. <paramref name="sourceName"/> is null to return all sources. + /// </summary> + Task<HistoricalEventsResult> ReadEventsAsync( + string? sourceName, + DateTime startUtc, + DateTime endUtc, + int maxEvents, + CancellationToken cancellationToken); + + /// <summary> + /// Point-in-time health snapshot for diagnostics and dashboards. Pure + /// observation; never blocks on backend I/O. 
+ /// </summary> + HistorianHealthSnapshot GetHealthSnapshot(); +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs index 96075c8..32ac0ac 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAddressSpaceBuilder.cs @@ -62,10 +62,41 @@ public interface IVariableHandle /// <param name="SourceName">Human-readable alarm name used for the <c>SourceName</c> event field.</param> /// <param name="InitialSeverity">Severity at address-space build time; updates arrive via <see cref="IAlarmConditionSink"/>.</param> /// <param name="InitialDescription">Initial description; updates arrive via <see cref="IAlarmConditionSink"/>.</param> +/// <param name="InAlarmRef"> +/// Driver-side full reference for the boolean attribute that toggles when the +/// alarm condition becomes active. Consumed by the server-level alarm-condition +/// service to subscribe to active/inactive transitions. Null when the driver +/// reports alarm transitions through some other channel. +/// </param> +/// <param name="PriorityRef"> +/// Driver-side full reference for the integer attribute carrying the alarm's +/// current priority / severity. Live updates flow through the same subscription +/// pipeline as <paramref name="InAlarmRef"/>. Null when the driver does not +/// expose live priority changes. +/// </param> +/// <param name="DescAttrNameRef"> +/// Driver-side full reference for the string attribute carrying the human-readable +/// description / message. Null when the driver does not expose a live description. +/// </param> +/// <param name="AckedRef"> +/// Driver-side full reference for the boolean attribute that toggles when the +/// alarm is acknowledged. Null when acknowledgement is not observable on the +/// driver side. 
+/// </param> +/// <param name="AckMsgWriteRef"> +/// Driver-side full reference the server writes to acknowledge the condition, +/// typically the alarm's <c>.AckMsg</c> attribute. Null when the driver does not +/// accept acknowledgement writes (or routes them through a separate API). +/// </param> public sealed record AlarmConditionInfo( string SourceName, AlarmSeverity InitialSeverity, - string? InitialDescription); + string? InitialDescription, + string? InAlarmRef = null, + string? PriorityRef = null, + string? DescAttrNameRef = null, + string? AckedRef = null, + string? AckMsgWriteRef = null); /// <summary> /// Sink a concrete address-space builder returns from <see cref="IVariableHandle.MarkAsAlarmCondition"/>. diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Program.cs b/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Program.cs new file mode 100644 index 0000000..3dcf8c6 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Program.cs @@ -0,0 +1,58 @@ +using System; +using System.Threading; +using Serilog; + +namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware; + +/// <summary> +/// Entry point for the Wonderware Historian sidecar host. PR 3.1 only scaffolds the +/// console host shell — pipe server wiring and SDK access are added in PR 3.3 and +/// PR 3.2 respectively. The host reads the pipe name, allowed-SID, and shared secret +/// from environment variables (passed by the supervisor at spawn time per +/// <c>driver-stability.md</c>) and validates them up front so misconfiguration fails +/// loudly rather than silently degrading. 
+/// </summary> +public static class Program +{ + public static int Main(string[] args) + { + Log.Logger = new LoggerConfiguration() + .MinimumLevel.Information() + .WriteTo.File( + @"%ProgramData%\OtOpcUa\historian-wonderware-.log".Replace("%ProgramData%", Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData)), + rollingInterval: RollingInterval.Day) + .CreateLogger(); + + try + { + var pipeName = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_PIPE") + ?? throw new InvalidOperationException("OTOPCUA_HISTORIAN_PIPE not set — supervisor must pass the sidecar pipe name"); + var allowedSidValue = Environment.GetEnvironmentVariable("OTOPCUA_ALLOWED_SID") + ?? throw new InvalidOperationException("OTOPCUA_ALLOWED_SID not set — supervisor must pass the server principal SID"); + var sharedSecret = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_SECRET") + ?? throw new InvalidOperationException("OTOPCUA_HISTORIAN_SECRET not set — supervisor must pass the per-process secret at spawn time"); + + // Touch the secret so a future trim/AOT pass cannot strip the read; the value is + // consumed for real in PR 3.3 when the pipe handshake is wired in. + _ = sharedSecret.Length; + + using var cts = new CancellationTokenSource(); + Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); }; + + Log.Information("Wonderware historian sidecar starting — pipe={Pipe} allowedSid={Sid}", pipeName, allowedSidValue); + + // PR 3.1 has no pipe server yet. Block until Ctrl-C so NSSM/the supervisor sees a + // long-running process and the smoke harness can exercise the host lifecycle. 
+ cts.Token.WaitHandle.WaitOne(); + + Log.Information("Wonderware historian sidecar stopping cleanly"); + return 0; + } + catch (Exception ex) + { + Log.Fatal(ex, "Wonderware historian sidecar fatal"); + return 2; + } + finally { Log.CloseAndFlush(); } + } +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj b/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj new file mode 100644 index 0000000..2acff21 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj @@ -0,0 +1,28 @@ +<Project Sdk="Microsoft.NET.Sdk"> + + <PropertyGroup> + <OutputType>Exe</OutputType> + <TargetFramework>net48</TargetFramework> + <!-- x86 to match the in-process bitness expectations of the Wonderware Historian SDK + that PR 3.2 will lift in. Mirrors Driver.Galaxy.Host's bitness for consistency. --> + <PlatformTarget>x86</PlatformTarget> + <Prefer32Bit>true</Prefer32Bit> + <Nullable>enable</Nullable> + <LangVersion>latest</LangVersion> + <TreatWarningsAsErrors>true</TreatWarningsAsErrors> + <GenerateDocumentationFile>true</GenerateDocumentationFile> + <NoWarn>$(NoWarn);CS1591</NoWarn> + <RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware</RootNamespace> + <AssemblyName>OtOpcUa.Driver.Historian.Wonderware</AssemblyName> + </PropertyGroup> + + <ItemGroup> + <PackageReference Include="Serilog" Version="4.2.0"/> + <PackageReference Include="Serilog.Sinks.File" Version="7.0.0"/> + </ItemGroup> + + <ItemGroup> + <InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests"/> + </ItemGroup> + +</Project> diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs new file mode 100644 index 0000000..ff59f9a --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs @@ -0,0 +1,289 @@ +using 
System.Collections.Concurrent; +using ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +namespace ZB.MOM.WW.OtOpcUa.Server.Alarms; + +/// <summary> +/// Server-level alarm-condition state machine. Tracks one entry per registered +/// condition; consumes value changes from the four sub-attribute references in +/// <see cref="AlarmConditionInfo"/> (InAlarm / Priority / Description / Acked) and +/// raises <see cref="TransitionRaised"/> on Active / Acknowledged / Inactive +/// transitions per OPC UA Part 9 (simplified). Operator acknowledgement routes +/// through <see cref="IAlarmAcknowledger"/> against <c>AckMsgWriteRef</c>. +/// </summary> +/// <remarks> +/// This is the driver-agnostic replacement for <c>GalaxyAlarmTracker</c>. The +/// service does not own subscription lifecycle — PR 2.3 will wire DriverNodeManager +/// to subscribe through the driver's <c>ISubscribable</c> and forward value changes +/// here via <see cref="OnValueChanged"/>. Keeping the service free of subscription +/// plumbing makes it trivially testable and lets future drivers feed it from any +/// value source (in-process, gRPC, named pipe). +/// </remarks> +public sealed class AlarmConditionService : IDisposable +{ + private readonly Func<DateTime> _clock; + + // ConditionId → state. + private readonly ConcurrentDictionary<string, AlarmConditionState> _conditions = + new(StringComparer.OrdinalIgnoreCase); + + // Sub-attribute full ref → (conditionId, which field). Multiple conditions may + // observe the same sub-attribute (rare but legal); the value is a list to support + // fan-out on a single value change. + private readonly ConcurrentDictionary<string, List<(string ConditionId, AlarmField Field)>> _refToCondition = + new(StringComparer.OrdinalIgnoreCase); + + private readonly object _refMapLock = new(); + + private bool _disposed; + + /// <summary> + /// Fired when a registered condition transitions Active / Acknowledged / Inactive. 
+ /// Handlers must be cheap; the event is raised on whatever thread feeds + /// <see cref="OnValueChanged"/> and blocks the value-change pipeline. + /// </summary> + public event EventHandler<AlarmConditionTransition>? TransitionRaised; + + public AlarmConditionService() : this(() => DateTime.UtcNow) { } + + /// <summary>Test seam — inject a fixed clock for deterministic transition timestamps.</summary> + internal AlarmConditionService(Func<DateTime> clock) + { + _clock = clock ?? throw new ArgumentNullException(nameof(clock)); + } + + /// <summary>Number of currently tracked conditions. Diagnostic only.</summary> + public int TrackedCount => _conditions.Count; + + /// <summary> + /// Register a condition. Idempotent — repeat calls for the same + /// <paramref name="conditionId"/> are a no-op. The acker is captured for the + /// condition's lifetime; pass null when the driver does not accept acks. + /// </summary> + public void Track(string conditionId, AlarmConditionInfo info, IAlarmAcknowledger? acker = null) + { + ObjectDisposedException.ThrowIf(_disposed, this); + ArgumentException.ThrowIfNullOrWhiteSpace(conditionId); + ArgumentNullException.ThrowIfNull(info); + + var state = new AlarmConditionState(conditionId, info, acker); + if (!_conditions.TryAdd(conditionId, state)) return; + + lock (_refMapLock) + { + AddRefMapping(info.InAlarmRef, conditionId, AlarmField.InAlarm); + AddRefMapping(info.PriorityRef, conditionId, AlarmField.Priority); + AddRefMapping(info.DescAttrNameRef, conditionId, AlarmField.DescAttrName); + AddRefMapping(info.AckedRef, conditionId, AlarmField.Acked); + } + } + + /// <summary>Deregister a condition. 
No-op when not tracked.</summary> + public void Untrack(string conditionId) + { + if (_disposed) return; + if (!_conditions.TryRemove(conditionId, out var state)) return; + + lock (_refMapLock) + { + RemoveRefMapping(state.Info.InAlarmRef, conditionId); + RemoveRefMapping(state.Info.PriorityRef, conditionId); + RemoveRefMapping(state.Info.DescAttrNameRef, conditionId); + RemoveRefMapping(state.Info.AckedRef, conditionId); + } + } + + /// <summary> + /// Returns the set of sub-attribute references the service currently needs + /// subscribed. Callers wire one subscription per ref through the driver's + /// <see cref="ISubscribable"/>; PR 2.3 owns that wiring. + /// </summary> + public IReadOnlyCollection<string> GetSubscribedReferences() + { + lock (_refMapLock) return [.. _refToCondition.Keys]; + } + + /// <summary> + /// Operator acknowledgement entry point. Returns false when the condition is + /// not tracked, the condition has no acker registered, the condition has no + /// <c>AckMsgWriteRef</c>, or the acker reports the write failed. + /// </summary> + public Task<bool> AcknowledgeAsync(string conditionId, string comment, CancellationToken cancellationToken = default) + { + if (_disposed || !_conditions.TryGetValue(conditionId, out var state)) + return Task.FromResult(false); + if (state.Acker is null || string.IsNullOrEmpty(state.Info.AckMsgWriteRef)) + return Task.FromResult(false); + return state.Acker.WriteAckMessageAsync(state.Info.AckMsgWriteRef, comment ?? string.Empty, cancellationToken); + } + + /// <summary> + /// Snapshot every tracked condition's current state. Diagnostic / dashboard use only. + /// </summary> + public IReadOnlyList<AlarmConditionSnapshot> Snapshot() + { + return [.. _conditions.Values.Select(s => + { + lock (s.Lock) + return new AlarmConditionSnapshot(s.ConditionId, s.InAlarm, s.Acked, s.Priority, s.Description); + })]; + } + + /// <summary> + /// Feed a value change for one of the registered sub-attribute references. 
+ /// The service runs the state machine and raises <see cref="TransitionRaised"/> + /// when the change produces a lifecycle transition. Unknown references are + /// silently dropped — the caller may register and unregister concurrently with + /// value-change delivery, and a stale callback for a recently-untracked + /// condition must not throw. + /// </summary> + public void OnValueChanged(string fullReference, DataValueSnapshot value) + { + if (_disposed) return; + if (string.IsNullOrEmpty(fullReference)) return; + + List<(string ConditionId, AlarmField Field)>? targets; + lock (_refMapLock) + { + if (!_refToCondition.TryGetValue(fullReference, out targets) || targets.Count == 0) return; + // Snapshot under lock; the state machine runs outside. + targets = [.. targets]; + } + + var now = _clock(); + foreach (var (conditionId, field) in targets) + { + if (!_conditions.TryGetValue(conditionId, out var state)) continue; + + AlarmConditionTransition? transition = null; + lock (state.Lock) + { + transition = ApplyValue(state, field, value, now); + } + + if (transition is { } t) + { + TransitionRaised?.Invoke(this, t); + } + } + } + + /// <summary> + /// Apply one value change to one condition. Returns a transition when the + /// change crosses a state boundary; null otherwise. Caller holds <c>state.Lock</c>. + /// </summary> + private static AlarmConditionTransition? ApplyValue( + AlarmConditionState state, AlarmField field, DataValueSnapshot value, DateTime now) + { + AlarmConditionTransition? transition = null; + state.LastUpdateUtc = now; + + switch (field) + { + case AlarmField.InAlarm: + { + var wasActive = state.InAlarm; + var isActive = value.Value is bool b && b; + state.InAlarm = isActive; + if (!wasActive && isActive) + { + // Reset Acked on every active transition so a re-alarm requires fresh ack. 
+ state.Acked = false; + transition = new AlarmConditionTransition( + state.ConditionId, AlarmStateTransition.Active, + state.Priority, state.Description, now); + } + else if (wasActive && !isActive) + { + transition = new AlarmConditionTransition( + state.ConditionId, AlarmStateTransition.Inactive, + state.Priority, state.Description, now); + } + break; + } + case AlarmField.Priority: + state.Priority = CoercePriority(value.Value, state.Priority); + break; + case AlarmField.DescAttrName: + state.Description = value.Value as string; + break; + case AlarmField.Acked: + { + var wasAcked = state.Acked; + var isAcked = value.Value is bool b && b; + state.Acked = isAcked; + // Only fire Acknowledged on false → true while still active. The first + // post-Track callback often arrives with isAcked == wasAcked (state starts + // Acked=true so an initially-quiet alarm doesn't misfire). + if (!wasAcked && isAcked && state.InAlarm) + { + transition = new AlarmConditionTransition( + state.ConditionId, AlarmStateTransition.Acknowledged, + state.Priority, state.Description, now); + } + break; + } + } + + return transition; + } + + private static int CoercePriority(object? raw, int fallback) => raw switch + { + int i => i, + short s => s, + long l when l is >= int.MinValue and <= int.MaxValue => (int)l, + byte b => b, + ushort us => us, + uint ui when ui <= int.MaxValue => (int)ui, + _ => fallback, + }; + + private void AddRefMapping(string? fullRef, string conditionId, AlarmField field) + { + if (string.IsNullOrEmpty(fullRef)) return; + if (!_refToCondition.TryGetValue(fullRef, out var list)) + { + list = []; + _refToCondition[fullRef] = list; + } + list.Add((conditionId, field)); + } + + private void RemoveRefMapping(string? 
fullRef, string conditionId) + { + if (string.IsNullOrEmpty(fullRef)) return; + if (!_refToCondition.TryGetValue(fullRef, out var list)) return; + list.RemoveAll(t => string.Equals(t.ConditionId, conditionId, StringComparison.OrdinalIgnoreCase)); + if (list.Count == 0) _refToCondition.TryRemove(fullRef, out _); + } + + public void Dispose() + { + if (_disposed) return; + _disposed = true; + _conditions.Clear(); + lock (_refMapLock) _refToCondition.Clear(); + } + + private enum AlarmField { InAlarm, Priority, DescAttrName, Acked } + + /// <summary>Per-condition mutable state. Access guarded by <see cref="Lock"/>.</summary> + private sealed class AlarmConditionState(string conditionId, AlarmConditionInfo info, IAlarmAcknowledger? acker) + { + public readonly object Lock = new(); + public string ConditionId { get; } = conditionId; + public AlarmConditionInfo Info { get; } = info; + public IAlarmAcknowledger? Acker { get; } = acker; + + public bool InAlarm; + + // Default Acked=true so the first post-Track callback (.Acked=true on a quiet + // alarm) doesn't misfire as a transition. Active sets it back to false. + public bool Acked = true; + + public int Priority; + public string? Description; + public DateTime LastUpdateUtc; + } +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionTransition.cs b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionTransition.cs new file mode 100644 index 0000000..c4c3612 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionTransition.cs @@ -0,0 +1,44 @@ +namespace ZB.MOM.WW.OtOpcUa.Server.Alarms; + +/// <summary> +/// Lifecycle transition for an alarm condition. Mirrors OPC UA Part 9 alarm states +/// simplified to the active / acknowledged / inactive triplet that every driver in +/// the repo exposes today. +/// </summary> +public enum AlarmStateTransition +{ + /// <summary>InAlarm flipped false → true. 
Default to unacknowledged.</summary> + Active, + + /// <summary>Acked flipped false → true while the alarm is still active.</summary> + Acknowledged, + + /// <summary>InAlarm flipped true → false.</summary> + Inactive, +} + +/// <summary> +/// One alarm-state transition raised by <see cref="AlarmConditionService.TransitionRaised"/>. +/// </summary> +/// <param name="ConditionId">Stable identifier the caller registered the condition under (typically the driver's alarm full reference).</param> +/// <param name="Transition">Which state the alarm transitioned to.</param> +/// <param name="Priority">Latest known priority. 0 when no priority sub-attribute was registered or no value has been observed yet.</param> +/// <param name="Description">Latest known description text; null when not registered or not yet observed.</param> +/// <param name="AtUtc">Server-clock UTC of the value change that produced this transition.</param> +public sealed record AlarmConditionTransition( + string ConditionId, + AlarmStateTransition Transition, + int Priority, + string? Description, + DateTime AtUtc); + +/// <summary> +/// Read-only snapshot of an alarm condition's current state. Used for diagnostics +/// and dashboards; not part of the live transition stream. +/// </summary> +public sealed record AlarmConditionSnapshot( + string ConditionId, + bool InAlarm, + bool Acked, + int Priority, + string? Description); diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs new file mode 100644 index 0000000..eaf2634 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Server/Alarms/IAlarmAcknowledger.cs @@ -0,0 +1,23 @@ +namespace ZB.MOM.WW.OtOpcUa.Server.Alarms; + +/// <summary> +/// Strategy for routing operator acknowledgement writes back to the underlying driver. 
+/// Decouples <see cref="AlarmConditionService"/> from any specific driver's write API +/// so the service can be tested without a real driver and reused across drivers with +/// different write paths. +/// </summary> +/// <remarks> +/// PR 2.3 supplies a default implementation that writes through the driver's +/// <c>IWritable.WriteAsync</c> using the <c>AckMsgWriteRef</c> from +/// <c>AlarmConditionInfo</c>. Drivers that route acks differently (e.g. a dedicated +/// RPC) can supply a custom implementation when registering the condition. +/// </remarks> +public interface IAlarmAcknowledger +{ + /// <summary> + /// Writes the operator's <paramref name="comment"/> to <paramref name="ackMsgWriteRef"/>. + /// Returns true on driver-reported success, false otherwise. Implementations should + /// propagate cancellation but never throw on a write that the driver cleanly rejects. + /// </summary> + Task<bool> WriteAckMessageAsync(string ackMsgWriteRef, string comment, CancellationToken cancellationToken); +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs b/src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs new file mode 100644 index 0000000..6d14167 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Server/History/HistoryRouter.cs @@ -0,0 +1,71 @@ +using System.Collections.Concurrent; +using ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +namespace ZB.MOM.WW.OtOpcUa.Server.History; + +/// <summary> +/// Default <see cref="IHistoryRouter"/> implementation. 
+/// </summary> +public sealed class HistoryRouter : IHistoryRouter +{ + private readonly ConcurrentDictionary<string, IHistorianDataSource> _registry = + new(StringComparer.OrdinalIgnoreCase); + + private bool _disposed; + + /// <inheritdoc /> + public void Register(string fullReferencePrefix, IHistorianDataSource source) + { + ObjectDisposedException.ThrowIf(_disposed, this); + ArgumentNullException.ThrowIfNull(fullReferencePrefix); + ArgumentNullException.ThrowIfNull(source); + + if (!_registry.TryAdd(fullReferencePrefix, source)) + { + throw new InvalidOperationException( + $"A historian data source is already registered for prefix '{fullReferencePrefix}'."); + } + } + + /// <inheritdoc /> + public IHistorianDataSource? Resolve(string fullReference) + { + ObjectDisposedException.ThrowIf(_disposed, this); + ArgumentNullException.ThrowIfNull(fullReference); + + // Longest-prefix match. Sources are typically a handful per server, so a linear + // scan is fine and avoids building a trie for a low-cardinality registry. + IHistorianDataSource? best = null; + var bestPrefixLength = -1; + + foreach (var (prefix, source) in _registry) + { + if (fullReference.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) + && prefix.Length > bestPrefixLength) + { + best = source; + bestPrefixLength = prefix.Length; + } + } + + return best; + } + + /// <summary> + /// Disposes every registered source and prevents further registrations or + /// resolutions. Disposal is best-effort: a source that throws does not stop the rest. 
+ /// </summary> + public void Dispose() + { + if (_disposed) return; + _disposed = true; + + foreach (var source in _registry.Values) + { + try { source.Dispose(); } + catch { /* best-effort — server shutdown should not throw on a misbehaving source */ } + } + + _registry.Clear(); + } +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs b/src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs new file mode 100644 index 0000000..ee6c2e8 --- /dev/null +++ b/src/ZB.MOM.WW.OtOpcUa.Server/History/IHistoryRouter.cs @@ -0,0 +1,37 @@ +using ZB.MOM.WW.OtOpcUa.Core.Abstractions; + +namespace ZB.MOM.WW.OtOpcUa.Server.History; + +/// <summary> +/// Server-level routing of OPC UA HistoryRead service calls to a registered +/// <see cref="IHistorianDataSource"/>. One router per server instance; sources are +/// registered at startup keyed by a driver-side full-reference prefix (typically the +/// driver instance id). +/// </summary> +/// <remarks> +/// <para> +/// The router decouples history availability from the driver lifecycle: a driver +/// can restart (or be temporarily disconnected) without taking history offline, +/// and a single historian can serve nodes from multiple drivers. +/// </para> +/// <para> +/// Resolution is by longest-prefix match so a per-driver source registered under +/// <c>"galaxy"</c> wins over a fallback registered under empty string. +/// </para> +/// </remarks> +public interface IHistoryRouter : IDisposable +{ + /// <summary> + /// Resolves a full reference to its registered data source, or null when no source + /// covers it. + /// </summary> + IHistorianDataSource? Resolve(string fullReference); + + /// <summary> + /// Registers a data source for full references that start with + /// <paramref name="fullReferencePrefix"/>. Throws when the prefix is already + /// registered — duplicate registrations indicate a startup-config bug rather than + /// a runtime concern. 
+ /// </summary> + void Register(string fullReferencePrefix, IHistorianDataSource source); +} diff --git a/src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs b/src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs index 3b1874f..83d0c20 100644 --- a/src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs +++ b/src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs @@ -5,6 +5,8 @@ using Opc.Ua.Server; using ZB.MOM.WW.OtOpcUa.Core.Abstractions; using ZB.MOM.WW.OtOpcUa.Core.Authorization; using ZB.MOM.WW.OtOpcUa.Core.Resilience; +using ZB.MOM.WW.OtOpcUa.Server.Alarms; +using ZB.MOM.WW.OtOpcUa.Server.History; using ZB.MOM.WW.OtOpcUa.Server.Security; using DriverWriteRequest = ZB.MOM.WW.OtOpcUa.Core.Abstractions.WriteRequest; // Core.Abstractions defines a type-named HistoryReadResult (driver-side samples + continuation @@ -85,10 +87,31 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder private readonly IReadable? _virtualReadable; private readonly IReadable? _scriptedAlarmReadable; + // PR 1.3 — server-level history routing. When non-null + a source is registered for + // the requested full reference, the four HistoryRead* overrides dispatch through the + // router. Otherwise we fall back to the legacy `_driver as IHistoryProvider` path + // wrapped in a thin adapter, so existing tests and drivers that still implement + // IHistoryProvider directly keep working until PR 1.W flips DI to register the + // legacy path inside the router. + private readonly IHistoryRouter? _historyRouter; + private LegacyDriverHistoryAdapter? _legacyHistoryAdapter; + + // PR 2.3 — server-level alarm-condition state machine. When non-null, every + // MarkAsAlarmCondition call also registers the condition with the service so the + // server runs the Active/Acknowledged/Inactive transitions itself instead of + // relying on the driver's own tracker. 
_conditionSinks maps conditionId → + // ConditionSink so service-raised transitions reach the right OPC UA AlarmCondition + // sibling. Legacy IAlarmSource path keeps working in parallel until PR 7.2. + private readonly AlarmConditionService? _alarmService; + private readonly Dictionary<string, ConditionSink> _conditionSinks = new(StringComparer.OrdinalIgnoreCase); + private EventHandler<AlarmConditionTransition>? _alarmTransitionHandler; + public DriverNodeManager(IServerInternal server, ApplicationConfiguration configuration, IDriver driver, CapabilityInvoker invoker, ILogger<DriverNodeManager> logger, AuthorizationGate? authzGate = null, NodeScopeResolver? scopeResolver = null, - IReadable? virtualReadable = null, IReadable? scriptedAlarmReadable = null) + IReadable? virtualReadable = null, IReadable? scriptedAlarmReadable = null, + IHistoryRouter? historyRouter = null, + AlarmConditionService? alarmService = null) : base(server, configuration, namespaceUris: $"urn:OtOpcUa:{driver.DriverInstanceId}") { _driver = driver; @@ -100,7 +123,117 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder _scopeResolver = scopeResolver; _virtualReadable = virtualReadable; _scriptedAlarmReadable = scriptedAlarmReadable; + _historyRouter = historyRouter; + _alarmService = alarmService; _logger = logger; + + if (_alarmService is not null) + { + _alarmTransitionHandler = OnAlarmServiceTransition; + _alarmService.TransitionRaised += _alarmTransitionHandler; + } + } + + /// <summary> + /// Routes <see cref="AlarmConditionService.TransitionRaised"/> to the matching + /// <see cref="ConditionSink"/> registered during <c>MarkAsAlarmCondition</c>. Translates + /// <see cref="AlarmConditionTransition"/> into the legacy <see cref="AlarmEventArgs"/> + /// shape the existing sink consumes — the sink's switch on <c>AlarmType</c> string + /// ("Active" / "Acknowledged" / "Inactive") is preserved so PR 2.3 doesn't perturb the + /// OPC UA Part 9 state mapping. 
Stale transitions for an untracked condition are + /// silently dropped. + /// </summary> + private void OnAlarmServiceTransition(object? sender, AlarmConditionTransition t) + { + ConditionSink? sink; + lock (Lock) + { + _conditionSinks.TryGetValue(t.ConditionId, out sink); + } + if (sink is null) return; + + var transitionName = t.Transition switch + { + AlarmStateTransition.Active => "Active", + AlarmStateTransition.Acknowledged => "Acknowledged", + AlarmStateTransition.Inactive => "Inactive", + _ => "Unknown", + }; + + sink.OnTransition(new AlarmEventArgs( + SubscriptionHandle: null!, + SourceNodeId: t.ConditionId, + ConditionId: t.ConditionId, + AlarmType: transitionName, + Message: t.Description ?? t.ConditionId, + Severity: MapPriorityToSeverity(t.Priority), + SourceTimestampUtc: t.AtUtc)); + } + + /// <summary> + /// Maps the integer priority Galaxy carries on <c>.Priority</c> (typically 1-1000) to + /// the four-bucket <see cref="AlarmSeverity"/> the OPC UA condition sibling consumes. + /// Mirrors the legacy <c>GalaxyProxyDriver.MapSeverity</c> bucketing. + /// </summary> + private static AlarmSeverity MapPriorityToSeverity(int priority) => priority switch + { + <= 250 => AlarmSeverity.Low, + <= 500 => AlarmSeverity.Medium, + <= 800 => AlarmSeverity.High, + _ => AlarmSeverity.Critical, + }; + + /// <summary> + /// Default <see cref="IAlarmAcknowledger"/> bound to a driver's <see cref="IWritable"/>. + /// Writes the operator comment to the alarm's <c>.AckMsg</c> sub-attribute via the same + /// dispatcher OnWriteValue uses so the resilience pipeline gates the call. Returns + /// false when the driver doesn't implement <see cref="IWritable"/> — alarms whose + /// drivers can't write are tracked but cannot be acknowledged through this path. + /// </summary> + private sealed class DriverWritableAcknowledger( + IWritable? 
writable, CapabilityInvoker invoker, string driverInstanceId) : IAlarmAcknowledger + { + public async Task<bool> WriteAckMessageAsync( + string ackMsgWriteRef, string comment, CancellationToken cancellationToken) + { + if (writable is null || string.IsNullOrEmpty(ackMsgWriteRef)) return false; + + var request = new DriverWriteRequest( + FullReference: ackMsgWriteRef, + Value: comment ?? string.Empty); + + try + { + // Ack writes are not idempotent — repeating an ack would re-trigger the + // driver-side acknowledgement state change. False matches the OnWriteValue + // default path for non-Idempotent attributes. + var results = await invoker.ExecuteWriteAsync( + driverInstanceId, + isIdempotent: false, + async ct => await writable.WriteAsync(new[] { request }, ct).ConfigureAwait(false), + cancellationToken).ConfigureAwait(false); + return results.Count > 0 && results[0].StatusCode == 0; + } + catch (Exception ex) when (ex is not OperationCanceledException) // cancellation propagates per the IAlarmAcknowledger contract + { + return false; + } + } + } + + /// <summary> + /// Detach from the alarm service before the base disposes. The service is shared across + /// drivers, so leaking the handler keeps a dead DriverNodeManager pinned in memory and + /// dispatches transitions to a sink that's no longer wired to any OPC UA node. + /// </summary> + protected override void Dispose(bool disposing) + { + if (disposing && _alarmService is not null && _alarmTransitionHandler is not null) + { + _alarmService.TransitionRaised -= _alarmTransitionHandler; + _alarmTransitionHandler = null; + } + base.Dispose(disposing); } protected override NodeStateCollection LoadPredefinedNodes(ISystemContext context) => new(); @@ -644,7 +777,22 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder // Without this the Report fires but has no subscribers to deliver to. 
_owner.AddRootNotifier(alarm); - return new ConditionSink(_owner, alarm); + var sink = new ConditionSink(_owner, alarm); + + // PR 2.3 — when the server-level alarm-condition service is wired, register + // this condition with it so the state machine runs server-side. The sink-map + // entry routes future TransitionRaised events back to this OPC UA node. + // Conditions whose info lacks an InAlarmRef can't be observed without driver + // help — those still rely on the legacy IAlarmSource path until PR 7.2. + if (_owner._alarmService is not null && !string.IsNullOrEmpty(info.InAlarmRef)) + { + _owner._conditionSinks[FullReference] = sink; + var acker = new DriverWritableAcknowledger( + _owner._writable, _owner._invoker, _owner._driver.DriverInstanceId); + _owner._alarmService.Track(FullReference, info, acker); + } + + return sink; } } @@ -808,29 +956,97 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder internal bool TryGetVariable(string fullRef, out BaseDataVariableState? v) => _variablesByFullRef.TryGetValue(fullRef, out v!); - // ===================== HistoryRead service handlers (LMX #1, PR 38) ===================== + // ===================== HistoryRead service handlers (LMX #1, PR 38; PR 1.3 routing) ===================== // - // Wires the driver's IHistoryProvider capability (PR 35 added ReadAtTimeAsync / ReadEventsAsync - // alongside the PR 19 ReadRawAsync / ReadProcessedAsync) to the OPC UA HistoryRead service. - // CustomNodeManager2 has four protected per-kind hooks; the base dispatches to the right one - // based on the concrete HistoryReadDetails subtype. Each hook is sync-returning-void — the - // per-driver async calls are bridged via GetAwaiter().GetResult(), matching the pattern - // OnReadValue / OnWriteValue already use in this class so HistoryRead doesn't introduce a - // different sync-over-async convention. + // Wires HistoryRead to the server-level IHistoryRouter (PR 1.2). 
For each tag: + // (1) the router resolves the longest-matching IHistorianDataSource registration — + // when a server-registered source covers the namespace it wins; (2) when the router + // doesn't match (or no router is configured), we fall back to the driver's own + // IHistoryProvider capability via a thin adapter, preserving the legacy behavior tests + // rely on. PR 1.W will register the legacy adapter inside the router as well, at + // which point this fallback can be deleted. // - // Per-node routing: every HistoryReadValueId in nodesToRead has a NodeHandle in - // nodesToProcess; the NodeHandle's NodeId.Identifier is the driver-side full reference - // (set during Variable() registration) so we can dispatch straight to IHistoryProvider - // without a second lookup. Nodes without IHistoryProvider backing (drivers that don't - // implement the capability) surface BadHistoryOperationUnsupported per slot and the - // rest of the batch continues — same failure-isolation pattern as OnWriteValue. - // - // Continuation-point handling is pass-through only in this PR: the driver returns null - // from its ContinuationPoint field today so the outer result's ContinuationPoint stays - // empty. Full Session.SaveHistoryContinuationPoint plumbing is a follow-up when a driver - // actually needs paging — the dispatch shape doesn't change, only the result-population. + // Continuation-point handling is pass-through only: the source returns null from its + // ContinuationPoint today so the outer result's ContinuationPoint stays empty. Proper + // Session.SaveHistoryContinuationPoint plumbing is a follow-up when a source actually + // needs paging — the dispatch shape doesn't change, only the result-population. - private IHistoryProvider? History => _driver as IHistoryProvider; + /// <summary> + /// Resolves the historian data source for a given driver full reference. Returns + /// null when neither the router nor the legacy IHistoryProvider path can serve it. 
+ /// </summary> + /// <param name="fullRef"> + /// Full reference, or null for driver-root event-history queries (event reads can + /// target a notifier rather than a specific variable). Null fullRef skips router + /// lookup and goes straight to the legacy fallback so today's "all events in the + /// driver namespace" path keeps working. + /// </param> + private IHistorianDataSource? ResolveHistory(string? fullRef) + { + if (fullRef is not null + && _historyRouter?.Resolve(fullRef) is { } routed) + { + return routed; + } + + if (_driver is IHistoryProvider legacy) + { + return _legacyHistoryAdapter ??= new LegacyDriverHistoryAdapter(legacy); + } + + return null; + } + + /// <summary> + /// Wraps a driver's <see cref="IHistoryProvider"/> as an + /// <see cref="IHistorianDataSource"/> so the four HistoryRead* methods can dispatch + /// through one interface regardless of resolution path. PR 1.W's legacy + /// auto-registration uses the same adapter; PR 7.2 deletes both once + /// IHistoryProvider stops being a driver capability. + /// </summary> + // OTOPCUA0001 (UnwrappedCapabilityCallAnalyzer) flags every direct IHistoryProvider call + // that isn't lexically inside a CapabilityInvoker.ExecuteAsync lambda. The adapter's + // pass-throughs are direct calls — but the four HistoryRead* call sites that own the + // adapter ARE inside ExecuteAsync lambdas, so the wrapping is preserved at runtime. + // Suppress here rather than at every call site. +#pragma warning disable OTOPCUA0001 + private sealed class LegacyDriverHistoryAdapter(IHistoryProvider provider) : IHistorianDataSource + { + // HistoryReadResult is unqualified-ambiguous in this file (Core.Abstractions vs. + // Opc.Ua); fully qualify on the adapter signatures so the file's existing var-based + // dispatch sites stay readable. 
+ public Task<Core.Abstractions.HistoryReadResult> ReadRawAsync( + string fullReference, DateTime startUtc, DateTime endUtc, uint maxValuesPerNode, + CancellationToken cancellationToken) + => provider.ReadRawAsync(fullReference, startUtc, endUtc, maxValuesPerNode, cancellationToken); + + public Task<Core.Abstractions.HistoryReadResult> ReadProcessedAsync( + string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval, + HistoryAggregateType aggregate, CancellationToken cancellationToken) + => provider.ReadProcessedAsync(fullReference, startUtc, endUtc, interval, aggregate, cancellationToken); + + public Task<Core.Abstractions.HistoryReadResult> ReadAtTimeAsync( + string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken cancellationToken) + => provider.ReadAtTimeAsync(fullReference, timestampsUtc, cancellationToken); + + public Task<HistoricalEventsResult> ReadEventsAsync( + string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, + CancellationToken cancellationToken) + => provider.ReadEventsAsync(sourceName, startUtc, endUtc, maxEvents, cancellationToken); + + // Legacy IHistoryProvider has no health surface. Return an "unknown but reachable" + // snapshot so dashboards don't show the data source as broken. + public HistorianHealthSnapshot GetHealthSnapshot() + => new(0, 0, 0, 0, null, null, null, + ProcessConnectionOpen: true, EventConnectionOpen: true, + ActiveProcessNode: null, ActiveEventNode: null, + Nodes: []); + + // Legacy lifecycle is the driver's responsibility — disposing the adapter must + // not dispose the driver out from under DriverNodeManager. 
+ public void Dispose() { } + } +#pragma warning restore OTOPCUA0001 protected override void HistoryReadRawModified( ServerSystemContext context, ReadRawModifiedDetails details, TimestampsToReturn timestamps, @@ -838,12 +1054,6 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder IList<ServiceResult> errors, List<NodeHandle> nodesToProcess, IDictionary<NodeId, NodeState> cache) { - if (History is null) - { - MarkAllUnsupported(nodesToProcess, results, errors); - return; - } - // IsReadModified=true requests a "modifications" history (who changed the data, when // it was re-written). The driver side has no modifications store — surface that // explicitly rather than silently returning raw data, which would mislead the client. @@ -868,6 +1078,13 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder continue; } + var source = ResolveHistory(fullRef); + if (source is null) + { + WriteUnsupported(results, errors, i); + continue; + } + if (_authzGate is not null && _scopeResolver is not null) { var historyScope = _scopeResolver.Resolve(fullRef); @@ -883,7 +1100,7 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder var driverResult = _invoker.ExecuteAsync( DriverCapability.HistoryRead, ResolveHostFor(fullRef), - async ct => await History.ReadRawAsync( + async ct => await source.ReadRawAsync( fullRef, details.StartTime, details.EndTime, @@ -912,12 +1129,6 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder IList<ServiceResult> errors, List<NodeHandle> nodesToProcess, IDictionary<NodeId, NodeState> cache) { - if (History is null) - { - MarkAllUnsupported(nodesToProcess, results, errors); - return; - } - // AggregateType is one NodeId shared across every item in the batch — map once. 
         var aggregate = MapAggregate(details.AggregateType?.FirstOrDefault());
         if (aggregate is null)
@@ -930,10 +1141,6 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
         for (var n = 0; n < nodesToProcess.Count; n++)
         {
             var handle = nodesToProcess[n];
-            // NodeHandle.Index points back to the slot in the outer results/errors/nodesToRead
-            // arrays. nodesToProcess is the filtered subset (just the nodes this manager
-            // claimed), so writing to results[n] lands in the wrong slot when N > 1 and nodes
-            // are interleaved across multiple node managers.
             var i = handle.Index;
             var fullRef = ResolveFullRef(handle);
             if (fullRef is null)
@@ -942,6 +1149,13 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
                 continue;
             }

+            var source = ResolveHistory(fullRef);
+            if (source is null)
+            {
+                WriteUnsupported(results, errors, i);
+                continue;
+            }
+
             if (_authzGate is not null && _scopeResolver is not null)
             {
                 var historyScope = _scopeResolver.Resolve(fullRef);
@@ -957,7 +1171,7 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
                 var driverResult = _invoker.ExecuteAsync(
                     DriverCapability.HistoryRead,
                     ResolveHostFor(fullRef),
-                    async ct => await History.ReadProcessedAsync(
+                    async ct => await source.ReadProcessedAsync(
                         fullRef,
                         details.StartTime,
                         details.EndTime,
@@ -987,20 +1201,10 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
         IList<ServiceResult> errors, List<NodeHandle> nodesToProcess,
         IDictionary<NodeId, NodeState> cache)
     {
-        if (History is null)
-        {
-            MarkAllUnsupported(nodesToProcess, results, errors);
-            return;
-        }
-
         var requestedTimes = (IReadOnlyList<DateTime>)(details.ReqTimes?.ToArray() ?? Array.Empty<DateTime>());
         for (var n = 0; n < nodesToProcess.Count; n++)
         {
             var handle = nodesToProcess[n];
-            // NodeHandle.Index points back to the slot in the outer results/errors/nodesToRead
-            // arrays. nodesToProcess is the filtered subset (just the nodes this manager
-            // claimed), so writing to results[n] lands in the wrong slot when N > 1 and nodes
-            // are interleaved across multiple node managers.
             var i = handle.Index;
             var fullRef = ResolveFullRef(handle);
             if (fullRef is null)
@@ -1009,6 +1213,13 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
                 continue;
             }

+            var source = ResolveHistory(fullRef);
+            if (source is null)
+            {
+                WriteUnsupported(results, errors, i);
+                continue;
+            }
+
             if (_authzGate is not null && _scopeResolver is not null)
             {
                 var historyScope = _scopeResolver.Resolve(fullRef);
@@ -1024,7 +1235,7 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
                 var driverResult = _invoker.ExecuteAsync(
                     DriverCapability.HistoryRead,
                     ResolveHostFor(fullRef),
-                    async ct => await History.ReadAtTimeAsync(fullRef, requestedTimes, ct).ConfigureAwait(false),
+                    async ct => await source.ReadAtTimeAsync(fullRef, requestedTimes, ct).ConfigureAwait(false),
                     CancellationToken.None).AsTask().GetAwaiter().GetResult();

                 WriteResult(results, errors, i, StatusCodes.Good,
@@ -1048,34 +1259,30 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
         IList<ServiceResult> errors, List<NodeHandle> nodesToProcess,
         IDictionary<NodeId, NodeState> cache)
     {
-        if (History is null)
-        {
-            MarkAllUnsupported(nodesToProcess, results, errors);
-            return;
-        }
-
         // SourceName filter extraction is deferred — EventFilter SelectClauses + WhereClause
-        // handling is a dedicated concern (proper per-select-clause Variant population + where
-        // filter evaluation). This PR treats the event query as "all events in range for the
-        // node's source" and populates only the standard BaseEventType fields. Richer filter
-        // handling is a follow-up; clients issuing empty/default filters get the right answer
-        // today which covers the common alarm-history browse case.
+        // handling is a dedicated concern. This PR treats the event query as "all events in
+        // range for the node's source" and populates only the standard BaseEventType fields.
         var maxEvents = (int)details.NumValuesPerNode;
         if (maxEvents <= 0) maxEvents = 1000;
         for (var n = 0; n < nodesToProcess.Count; n++)
         {
             var handle = nodesToProcess[n];
-            // NodeHandle.Index points back to the slot in the outer results/errors/nodesToRead
-            // arrays. nodesToProcess is the filtered subset (just the nodes this manager
-            // claimed), so writing to results[n] lands in the wrong slot when N > 1 and nodes
-            // are interleaved across multiple node managers.
             var i = handle.Index;

             // Event history queries may target a notifier object (e.g. the driver-root folder)
-            // rather than a specific variable — in that case we pass sourceName=null to mean
-            // "all sources in the driver's namespace" per the IHistoryProvider contract.
+            // rather than a specific variable — in that case fullRef is null and we pass
+            // sourceName=null to the source meaning "all sources in this source's namespace."
             var fullRef = ResolveFullRef(handle);

+            // ResolveHistory tolerates null fullRef — for notifier queries the router is
+            // skipped and the legacy fallback handles "all sources" reads.
+            var source = ResolveHistory(fullRef);
+            if (source is null)
+            {
+                WriteUnsupported(results, errors, i);
+                continue;
+            }
+
             // fullRef is null for event-history queries that target a notifier (driver root).
             // Those are cluster-wide reads + need a different scope shape; skip the gate here
             // and let the driver-level authz handle them. Non-null path gets per-node gating.
@@ -1094,7 +1301,7 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
                 var driverResult = _invoker.ExecuteAsync(
                     DriverCapability.HistoryRead,
                     fullRef is null ? _driver.DriverInstanceId : ResolveHostFor(fullRef),
-                    async ct => await History.ReadEventsAsync(
+                    async ct => await source.ReadEventsAsync(
                         sourceName: fullRef,
                         startUtc: details.StartTime,
                         endUtc: details.EndTime,
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs
new file mode 100644
index 0000000..664de66
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Alarms/AlarmConditionInfoTests.cs
@@ -0,0 +1,100 @@
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
+
+namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests.Alarms;
+
+/// <summary>
+/// Contract tests for the <see cref="AlarmConditionInfo"/> record extension added in PR 2.1.
+/// Five sub-attribute references (InAlarmRef, PriorityRef, DescAttrNameRef, AckedRef,
+/// AckMsgWriteRef) carry the driver-side tag references the server-level alarm-condition
+/// service uses to subscribe to live alarm-state attributes and route ack writes.
+/// </summary>
+public sealed class AlarmConditionInfoTests
+{
+    [Fact]
+    public void LegacyThreeArgConstructor_StillCompiles_AndDefaultsRefsToNull()
+    {
+        var info = new AlarmConditionInfo(
+            SourceName: "Tank.HiHi",
+            InitialSeverity: AlarmSeverity.High,
+            InitialDescription: "High-high alarm");
+
+        info.SourceName.ShouldBe("Tank.HiHi");
+        info.InitialSeverity.ShouldBe(AlarmSeverity.High);
+        info.InitialDescription.ShouldBe("High-high alarm");
+        info.InAlarmRef.ShouldBeNull();
+        info.PriorityRef.ShouldBeNull();
+        info.DescAttrNameRef.ShouldBeNull();
+        info.AckedRef.ShouldBeNull();
+        info.AckMsgWriteRef.ShouldBeNull();
+    }
+
+    [Fact]
+    public void FullConstructor_PopulatesAllFiveSubAttributeRefs()
+    {
+        var info = new AlarmConditionInfo(
+            SourceName: "Tank1.HiAlarm",
+            InitialSeverity: AlarmSeverity.Medium,
+            InitialDescription: "Tank level high",
+            InAlarmRef: "Tank1.HiAlarm.InAlarm",
+            PriorityRef: "Tank1.HiAlarm.Priority",
+            DescAttrNameRef: "Tank1.HiAlarm.DescAttrName",
+            AckedRef: "Tank1.HiAlarm.Acked",
+            AckMsgWriteRef: "Tank1.HiAlarm.AckMsg");
+
+        info.InAlarmRef.ShouldBe("Tank1.HiAlarm.InAlarm");
+        info.PriorityRef.ShouldBe("Tank1.HiAlarm.Priority");
+        info.DescAttrNameRef.ShouldBe("Tank1.HiAlarm.DescAttrName");
+        info.AckedRef.ShouldBe("Tank1.HiAlarm.Acked");
+        info.AckMsgWriteRef.ShouldBe("Tank1.HiAlarm.AckMsg");
+    }
+
+    [Fact]
+    public void RecordEquality_ComparesAllEightFields()
+    {
+        var a = new AlarmConditionInfo(
+            "T.Alarm", AlarmSeverity.Low, "desc",
+            "T.Alarm.InAlarm", "T.Alarm.Priority", "T.Alarm.DescAttrName",
+            "T.Alarm.Acked", "T.Alarm.AckMsg");
+
+        var b = new AlarmConditionInfo(
+            "T.Alarm", AlarmSeverity.Low, "desc",
+            "T.Alarm.InAlarm", "T.Alarm.Priority", "T.Alarm.DescAttrName",
+            "T.Alarm.Acked", "T.Alarm.AckMsg");
+
+        a.ShouldBe(b);
+    }
+
+    [Fact]
+    public void RecordEquality_DistinctWhenAnyRefDiffers()
+    {
+        var baseInfo = new AlarmConditionInfo(
+            "T.Alarm", AlarmSeverity.Low, "desc",
+            InAlarmRef: "T.Alarm.InAlarm");
+
+        var differingAckRef = baseInfo with { AckedRef = "T.Alarm.Acked" };
+
+        baseInfo.ShouldNotBe(differingAckRef);
+    }
+
+    [Fact]
+    public void WithExpression_AllowsPartialUpdates()
+    {
+        var legacy = new AlarmConditionInfo("S", AlarmSeverity.Medium, null);
+
+        var enriched = legacy with
+        {
+            InAlarmRef = "S.InAlarm",
+            AckedRef = "S.Acked",
+            AckMsgWriteRef = "S.AckMsg",
+        };
+
+        enriched.SourceName.ShouldBe("S");
+        enriched.InAlarmRef.ShouldBe("S.InAlarm");
+        enriched.PriorityRef.ShouldBeNull();
+        enriched.DescAttrNameRef.ShouldBeNull();
+        enriched.AckedRef.ShouldBe("S.Acked");
+        enriched.AckMsgWriteRef.ShouldBe("S.AckMsg");
+    }
+}
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs
new file mode 100644
index 0000000..9f81311
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/Historian/IHistorianDataSourceContractTests.cs
@@ -0,0 +1,121 @@
+using System.Reflection;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
+
+namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests.Historian;
+
+/// <summary>
+/// Structural contract tests for the historian data-source surface added in PR 1.1.
+/// Asserts the type shape — implementations are tested in their own projects.
+/// </summary>
+public sealed class IHistorianDataSourceContractTests
+{
+    [Fact]
+    public void Interface_LivesInRootNamespace()
+    {
+        typeof(IHistorianDataSource).Namespace
+            .ShouldBe("ZB.MOM.WW.OtOpcUa.Core.Abstractions");
+    }
+
+    [Fact]
+    public void Interface_IsPublic()
+    {
+        typeof(IHistorianDataSource).IsPublic.ShouldBeTrue();
+        typeof(IHistorianDataSource).IsInterface.ShouldBeTrue();
+    }
+
+    [Fact]
+    public void Interface_ExtendsIDisposable()
+    {
+        typeof(IDisposable).IsAssignableFrom(typeof(IHistorianDataSource))
+            .ShouldBeTrue("data sources own backend connections; the server disposes them on shutdown");
+    }
+
+    [Theory]
+    [InlineData("ReadRawAsync", typeof(Task<HistoryReadResult>))]
+    [InlineData("ReadProcessedAsync", typeof(Task<HistoryReadResult>))]
+    [InlineData("ReadAtTimeAsync", typeof(Task<HistoryReadResult>))]
+    [InlineData("ReadEventsAsync", typeof(Task<HistoricalEventsResult>))]
+    public void ReadMethods_ReturnExpectedTaskShape(string methodName, Type expectedReturnType)
+    {
+        var method = typeof(IHistorianDataSource).GetMethod(methodName);
+        method.ShouldNotBeNull();
+        method!.ReturnType.ShouldBe(expectedReturnType);
+    }
+
+    [Fact]
+    public void GetHealthSnapshot_IsSynchronous()
+    {
+        var method = typeof(IHistorianDataSource).GetMethod("GetHealthSnapshot");
+        method.ShouldNotBeNull();
+        method!.ReturnType.ShouldBe(typeof(HistorianHealthSnapshot));
+    }
+
+    [Fact]
+    public void HealthSnapshot_AcceptsEmptyClusterNodeList()
+    {
+        var snapshot = new HistorianHealthSnapshot(
+            TotalQueries: 0,
+            TotalSuccesses: 0,
+            TotalFailures: 0,
+            ConsecutiveFailures: 0,
+            LastSuccessTime: null,
+            LastFailureTime: null,
+            LastError: null,
+            ProcessConnectionOpen: false,
+            EventConnectionOpen: false,
+            ActiveProcessNode: null,
+            ActiveEventNode: null,
+            Nodes: Array.Empty<HistorianClusterNodeState>());
+
+        snapshot.Nodes.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public void HealthSnapshot_PreservesClusterNodes()
+    {
+        var node = new HistorianClusterNodeState(
+            Name: "hist-01",
+            IsHealthy: true,
+            CooldownUntil: null,
+            FailureCount: 0,
+            LastError: null,
+            LastFailureTime: null);
+
+        var snapshot = new HistorianHealthSnapshot(
+            TotalQueries: 5,
+            TotalSuccesses: 5,
+            TotalFailures: 0,
+            ConsecutiveFailures: 0,
+            LastSuccessTime: new DateTime(2026, 4, 29, 12, 0, 0, DateTimeKind.Utc),
+            LastFailureTime: null,
+            LastError: null,
+            ProcessConnectionOpen: true,
+            EventConnectionOpen: true,
+            ActiveProcessNode: "hist-01",
+            ActiveEventNode: "hist-01",
+            Nodes: new[] { node });
+
+        snapshot.Nodes.Count.ShouldBe(1);
+        snapshot.Nodes[0].ShouldBe(node);
+    }
+
+    [Fact]
+    public void ClusterNodeState_RecordEqualityByValue()
+    {
+        var a = new HistorianClusterNodeState("hist-01", true, null, 0, null, null);
+        var b = new HistorianClusterNodeState("hist-01", true, null, 0, null, null);
+
+        a.ShouldBe(b);
+    }
+
+    [Fact]
+    public void ClusterNodeState_DistinctByAnyField()
+    {
+        var healthy = new HistorianClusterNodeState("hist-01", true, null, 0, null, null);
+        var unhealthy = new HistorianClusterNodeState("hist-01", false, null, 1, "boom", null);
+
+        healthy.ShouldNotBe(unhealthy);
+    }
+}
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ProgramSmokeTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ProgramSmokeTests.cs
new file mode 100644
index 0000000..81bbe95
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ProgramSmokeTests.cs
@@ -0,0 +1,21 @@
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware;
+
+namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests;
+
+/// <summary>
+/// Smoke test confirming the sidecar project links and the test project resolves a
+/// ProjectReference to it. Real behavioural tests arrive in PR 3.2 (backend lift) and
+/// PR 3.3 (pipe server). For PR 3.1 we just verify the assembly identity is what the
+/// csproj declares.
+/// </summary>
+public class ProgramSmokeTests
+{
+    [Fact]
+    public void Program_Assembly_HasExpectedName()
+    {
+        typeof(Program).Assembly.GetName().Name
+            .ShouldBe("OtOpcUa.Driver.Historian.Wonderware");
+    }
+}
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj b/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj
new file mode 100644
index 0000000..2d80209
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj
@@ -0,0 +1,28 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <TargetFramework>net48</TargetFramework>
+    <PlatformTarget>x86</PlatformTarget>
+    <Prefer32Bit>true</Prefer32Bit>
+    <Nullable>enable</Nullable>
+    <LangVersion>latest</LangVersion>
+    <IsPackable>false</IsPackable>
+    <IsTestProject>true</IsTestProject>
+    <RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests</RootNamespace>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <PackageReference Include="xunit" Version="2.9.2"/>
+    <PackageReference Include="xunit.runner.visualstudio" Version="3.0.2">
+      <PrivateAssets>all</PrivateAssets>
+      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
+    </PackageReference>
+    <PackageReference Include="Shouldly" Version="4.3.0"/>
+    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.12.0"/>
+  </ItemGroup>
+
+  <ItemGroup>
+    <ProjectReference Include="..\..\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj"/>
+  </ItemGroup>
+
+</Project>
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs
new file mode 100644
index 0000000..40f5551
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Alarms/AlarmConditionServiceTests.cs
@@ -0,0 +1,331 @@
+using System.Collections.Concurrent;
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
+using ZB.MOM.WW.OtOpcUa.Server.Alarms;
+
+namespace ZB.MOM.WW.OtOpcUa.Server.Tests.Alarms;
+
+/// <summary>
+/// Server-level alarm-condition state-machine tests added in PR 2.2. Ports the live
+/// transition cases from <c>GalaxyAlarmTrackerTests</c> against the new
+/// driver-agnostic <see cref="AlarmConditionService"/>: sub-attribute references come
+/// from <see cref="AlarmConditionInfo"/>, value changes flow as
+/// <see cref="DataValueSnapshot"/> instead of MX-specific <c>Vtq</c>, and the ack
+/// write path is decoupled into <see cref="IAlarmAcknowledger"/>.
+/// </summary>
+public sealed class AlarmConditionServiceTests
+{
+    private const string ConditionId = "TankFarm.Tank1.Level.HiHi";
+    private const string InAlarmRef = "TankFarm.Tank1.Level.HiHi.InAlarm";
+    private const string PriorityRef = "TankFarm.Tank1.Level.HiHi.Priority";
+    private const string DescRef = "TankFarm.Tank1.Level.HiHi.DescAttrName";
+    private const string AckedRef = "TankFarm.Tank1.Level.HiHi.Acked";
+    private const string AckMsgWriteRef = "TankFarm.Tank1.Level.HiHi.AckMsg";
+
+    private static AlarmConditionInfo Info(
+        string? inAlarm = InAlarmRef, string? priority = PriorityRef,
+        string? desc = DescRef, string? acked = AckedRef, string? ackMsg = AckMsgWriteRef)
+        => new(
+            SourceName: ConditionId,
+            InitialSeverity: AlarmSeverity.Medium,
+            InitialDescription: null,
+            InAlarmRef: inAlarm,
+            PriorityRef: priority,
+            DescAttrNameRef: desc,
+            AckedRef: acked,
+            AckMsgWriteRef: ackMsg);
+
+    private static DataValueSnapshot Bool(bool v) =>
+        new(v, StatusCode: 0, SourceTimestampUtc: DateTime.UtcNow, ServerTimestampUtc: DateTime.UtcNow);
+    private static DataValueSnapshot Int(int v) =>
+        new(v, 0, DateTime.UtcNow, DateTime.UtcNow);
+    private static DataValueSnapshot Str(string v) =>
+        new(v, 0, DateTime.UtcNow, DateTime.UtcNow);
+
+    private sealed class FakeAcker : IAlarmAcknowledger
+    {
+        public readonly ConcurrentQueue<(string Ref, string Comment)> Writes = new();
+        public bool ReturnValue { get; set; } = true;
+
+        public Task<bool> WriteAckMessageAsync(string ackMsgWriteRef, string comment, CancellationToken cancellationToken)
+        {
+            Writes.Enqueue((ackMsgWriteRef, comment));
+            return Task.FromResult(ReturnValue);
+        }
+    }
+
+    [Fact]
+    public void Track_AddsCondition_AndExposesSubscribedReferences()
+    {
+        using var svc = new AlarmConditionService();
+
+        svc.Track(ConditionId, Info());
+
+        svc.TrackedCount.ShouldBe(1);
+        var refs = svc.GetSubscribedReferences();
+        refs.ShouldContain(InAlarmRef);
+        refs.ShouldContain(PriorityRef);
+        refs.ShouldContain(DescRef);
+        refs.ShouldContain(AckedRef);
+        refs.Count.ShouldBe(4);
+    }
+
+    [Fact]
+    public void Track_IsIdempotentOnRepeatCall()
+    {
+        using var svc = new AlarmConditionService();
+
+        svc.Track(ConditionId, Info());
+        svc.Track(ConditionId, Info());
+
+        svc.TrackedCount.ShouldBe(1);
+    }
+
+    [Fact]
+    public void Track_OmitsNullSubAttributeRefs()
+    {
+        using var svc = new AlarmConditionService();
+
+        // Driver may not expose every sub-attribute (e.g. no .Acked observable).
+        svc.Track(ConditionId, Info(priority: null, desc: null, acked: null));
+
+        svc.GetSubscribedReferences().ShouldBe(new[] { InAlarmRef });
+    }
+
+    [Fact]
+    public void InAlarmFalseToTrue_FiresActiveTransition()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+        svc.Track(ConditionId, Info());
+
+        svc.OnValueChanged(PriorityRef, Int(500));
+        svc.OnValueChanged(DescRef, Str("Tank level high-high"));
+        svc.OnValueChanged(InAlarmRef, Bool(true));
+
+        transitions.Count.ShouldBe(1);
+        transitions.TryDequeue(out var t).ShouldBeTrue();
+        t!.Transition.ShouldBe(AlarmStateTransition.Active);
+        t.Priority.ShouldBe(500);
+        t.Description.ShouldBe("Tank level high-high");
+        t.ConditionId.ShouldBe(ConditionId);
+    }
+
+    [Fact]
+    public void InAlarmTrueToFalse_FiresInactiveTransition()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+        svc.Track(ConditionId, Info());
+
+        svc.OnValueChanged(InAlarmRef, Bool(true));
+        svc.OnValueChanged(InAlarmRef, Bool(false));
+
+        transitions.Count.ShouldBe(2);
+        transitions.TryDequeue(out _);
+        transitions.TryDequeue(out var t).ShouldBeTrue();
+        t!.Transition.ShouldBe(AlarmStateTransition.Inactive);
+    }
+
+    [Fact]
+    public void AckedFalseToTrue_FiresAcknowledged_WhileActive()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+        svc.Track(ConditionId, Info());
+
+        svc.OnValueChanged(InAlarmRef, Bool(true)); // Active, resets Acked → false
+        svc.OnValueChanged(AckedRef, Bool(true));   // Acknowledged
+
+        transitions.Count.ShouldBe(2);
+        transitions.TryDequeue(out _);
+        transitions.TryDequeue(out var t).ShouldBeTrue();
+        t!.Transition.ShouldBe(AlarmStateTransition.Acknowledged);
+    }
+
+    [Fact]
+    public void AckedTransitionWhileInactive_DoesNotFire()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+        svc.Track(ConditionId, Info());
+
+        // Initial Acked=true on subscribe (alarm at rest, pre-ack'd) — must not fire.
+        svc.OnValueChanged(AckedRef, Bool(true));
+
+        transitions.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public void RepeatedActiveTransitions_ResetAckedFlag()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+        svc.Track(ConditionId, Info());
+
+        // Cycle 1: active → ack → inactive → active again
+        svc.OnValueChanged(InAlarmRef, Bool(true));
+        svc.OnValueChanged(AckedRef, Bool(true));
+        svc.OnValueChanged(InAlarmRef, Bool(false));
+        svc.OnValueChanged(InAlarmRef, Bool(true)); // re-arms — Acked must reset to false
+        svc.OnValueChanged(AckedRef, Bool(true));   // produces a fresh Acknowledged
+
+        // Active, Acknowledged, Inactive, Active, Acknowledged
+        transitions.Count.ShouldBe(5);
+        var ordered = transitions.Select(t => t.Transition).ToArray();
+        ordered.ShouldBe(new[]
+        {
+            AlarmStateTransition.Active,
+            AlarmStateTransition.Acknowledged,
+            AlarmStateTransition.Inactive,
+            AlarmStateTransition.Active,
+            AlarmStateTransition.Acknowledged,
+        });
+    }
+
+    [Fact]
+    public async Task AcknowledgeAsync_RoutesToAckerWithAckMsgRef()
+    {
+        using var svc = new AlarmConditionService();
+        var acker = new FakeAcker();
+        svc.Track(ConditionId, Info(), acker);
+
+        var ok = await svc.AcknowledgeAsync(ConditionId, "operator-1: cleared", CancellationToken.None);
+
+        ok.ShouldBeTrue();
+        acker.Writes.Count.ShouldBe(1);
+        acker.Writes.TryDequeue(out var w).ShouldBeTrue();
+        w.Ref.ShouldBe(AckMsgWriteRef);
+        w.Comment.ShouldBe("operator-1: cleared");
+    }
+
+    [Fact]
+    public async Task AcknowledgeAsync_ReturnsFalse_WhenConditionUntracked()
+    {
+        using var svc = new AlarmConditionService();
+        var acker = new FakeAcker();
+        svc.Track("OtherCondition", Info(), acker);
+
+        var ok = await svc.AcknowledgeAsync(ConditionId, "comment");
+
+        ok.ShouldBeFalse();
+        acker.Writes.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public async Task AcknowledgeAsync_ReturnsFalse_WhenNoAckerRegistered()
+    {
+        using var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info(), acker: null);
+
+        var ok = await svc.AcknowledgeAsync(ConditionId, "comment");
+
+        ok.ShouldBeFalse();
+    }
+
+    [Fact]
+    public async Task AcknowledgeAsync_ReturnsFalse_WhenAckMsgRefMissing()
+    {
+        using var svc = new AlarmConditionService();
+        var acker = new FakeAcker();
+        svc.Track(ConditionId, Info(ackMsg: null), acker);
+
+        var ok = await svc.AcknowledgeAsync(ConditionId, "comment");
+
+        ok.ShouldBeFalse();
+        acker.Writes.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public void Snapshot_ReportsLatestFields()
+    {
+        using var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info());
+        svc.OnValueChanged(InAlarmRef, Bool(true));
+        svc.OnValueChanged(PriorityRef, Int(900));
+        svc.OnValueChanged(DescRef, Str("MyAlarm"));
+        svc.OnValueChanged(AckedRef, Bool(true));
+
+        var snap = svc.Snapshot();
+
+        snap.Count.ShouldBe(1);
+        snap[0].ConditionId.ShouldBe(ConditionId);
+        snap[0].InAlarm.ShouldBeTrue();
+        snap[0].Acked.ShouldBeTrue();
+        snap[0].Priority.ShouldBe(900);
+        snap[0].Description.ShouldBe("MyAlarm");
+    }
+
+    [Fact]
+    public void OnValueChanged_ForUnknownReference_IsSilentlyIgnored()
+    {
+        using var svc = new AlarmConditionService();
+        var transitions = new ConcurrentQueue<AlarmConditionTransition>();
+        svc.TransitionRaised += (_, t) => transitions.Enqueue(t);
+
+        svc.OnValueChanged("Some.Random.Tag.InAlarm", Bool(true));
+
+        transitions.ShouldBeEmpty();
+    }
+
+    [Fact]
+    public void Untrack_RemovesConditionAndReleasesReferences()
+    {
+        using var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info());
+
+        svc.Untrack(ConditionId);
+
+        svc.TrackedCount.ShouldBe(0);
+        svc.GetSubscribedReferences().ShouldBeEmpty();
+    }
+
+    [Fact]
+    public void Untrack_NonexistentConditionIsNoOp()
+    {
+        using var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info());
+
+        Should.NotThrow(() => svc.Untrack("does-not-exist"));
+        svc.TrackedCount.ShouldBe(1);
+    }
+
+    [Fact]
+    public void Track_ThrowsAfterDisposal()
+    {
+        var svc = new AlarmConditionService();
+        svc.Dispose();
+
+        Should.Throw<ObjectDisposedException>(() => svc.Track(ConditionId, Info()));
+    }
+
+    [Fact]
+    public void OnValueChanged_AfterDisposal_IsSilentlyDropped()
+    {
+        var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info());
+        svc.Dispose();
+
+        // Stale callbacks during disposal must not throw.
+        Should.NotThrow(() => svc.OnValueChanged(InAlarmRef, Bool(true)));
+    }
+
+    [Fact]
+    public void PriorityCoercion_AcceptsCommonNumericTypes()
+    {
+        using var svc = new AlarmConditionService();
+        svc.Track(ConditionId, Info());
+
+        svc.OnValueChanged(PriorityRef, new DataValueSnapshot((short)123, 0, null, DateTime.UtcNow));
+        svc.OnValueChanged(InAlarmRef, Bool(true));
+
+        var snap = svc.Snapshot()[0];
+        snap.Priority.ShouldBe(123);
+    }
+}
diff --git a/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs b/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs
new file mode 100644
index 0000000..082a1d3
--- /dev/null
+++ b/tests/ZB.MOM.WW.OtOpcUa.Server.Tests/History/HistoryRouterTests.cs
@@ -0,0 +1,169 @@
+using Shouldly;
+using Xunit;
+using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
+using ZB.MOM.WW.OtOpcUa.Server.History;
+
+namespace ZB.MOM.WW.OtOpcUa.Server.Tests.History;
+
+/// <summary>
+/// Tests for <see cref="HistoryRouter"/> registration + resolution semantics added
+/// in PR 1.2. The router is the only seam between OPC UA HistoryRead service calls
+/// and registered <see cref="IHistorianDataSource"/> implementations, so the
+/// resolution rules (case-insensitive prefix, longest-match wins, no source =>
+/// null) need explicit coverage.
+/// </summary>
+public sealed class HistoryRouterTests
+{
+    [Fact]
+    public void Resolve_ReturnsNull_WhenNoSourceRegistered()
+    {
+        using var router = new HistoryRouter();
+        router.Resolve("anything").ShouldBeNull();
+    }
+
+    [Fact]
+    public void Resolve_ReturnsRegisteredSource_WhenPrefixMatches()
+    {
+        using var router = new HistoryRouter();
+        var source = new FakeSource("galaxy");
+        router.Register("galaxy", source);
+
+        router.Resolve("galaxy.TankFarm.Tank1.Level").ShouldBe(source);
+    }
+
+    [Fact]
+    public void Resolve_ReturnsNull_WhenPrefixDoesNotMatch()
+    {
+        using var router = new HistoryRouter();
+        router.Register("galaxy", new FakeSource("galaxy"));
+
+        router.Resolve("modbus.MyDevice.Tag1").ShouldBeNull();
+    }
+
+    [Fact]
+    public void Resolve_LongestPrefixWins_WhenMultipleRegistered()
+    {
+        using var router = new HistoryRouter();
+        var generic = new FakeSource("generic");
+        var specific = new FakeSource("specific");
+
+        router.Register("galaxy", generic);
+        router.Register("galaxy.HighRate", specific);
+
+        router.Resolve("galaxy.HighRate.Sensor1").ShouldBe(specific);
+        router.Resolve("galaxy.LowRate.Sensor2").ShouldBe(generic);
+    }
+
+    [Fact]
+    public void Resolve_IsCaseInsensitive_OnPrefixMatch()
+    {
+        using var router = new HistoryRouter();
+        var source = new FakeSource("galaxy");
+        router.Register("Galaxy", source);
+
+        router.Resolve("galaxy.foo").ShouldBe(source);
+        router.Resolve("GALAXY.foo").ShouldBe(source);
+    }
+
+    [Fact]
+    public void Register_Throws_WhenPrefixAlreadyRegistered()
+    {
+        using var router = new HistoryRouter();
+        router.Register("galaxy", new FakeSource("first"));
+
+        Should.Throw<InvalidOperationException>(
+            () => router.Register("galaxy", new FakeSource("second")));
+    }
+
+    [Fact]
+    public void Dispose_DisposesAllRegisteredSources()
+    {
+        var router = new HistoryRouter();
+        var a = new FakeSource("a");
+        var b = new FakeSource("b");
+        router.Register("ns_a", a);
+        router.Register("ns_b", b);
+
+        router.Dispose();
+
+        a.IsDisposed.ShouldBeTrue();
+        b.IsDisposed.ShouldBeTrue();
+    }
+
+    [Fact]
+    public void Dispose_SwallowsExceptionsFromMisbehavingSource()
+    {
+        var router = new HistoryRouter();
+        var throwing = new ThrowingFakeSource();
+        var clean = new FakeSource("clean");
+        router.Register("bad", throwing);
+        router.Register("good", clean);
+
+        // Even when one source's Dispose throws, the router must finish disposing the
+        // remaining sources (server shutdown invariant).
+        Should.NotThrow(() => router.Dispose());
+        clean.IsDisposed.ShouldBeTrue();
+    }
+
+    [Fact]
+    public void Resolve_Throws_AfterDisposal()
+    {
+        var router = new HistoryRouter();
+        router.Dispose();
+
+        Should.Throw<ObjectDisposedException>(() => router.Resolve("anything"));
+    }
+
+    [Fact]
+    public void Register_Throws_AfterDisposal()
+    {
+        var router = new HistoryRouter();
+        router.Dispose();
+
+        Should.Throw<ObjectDisposedException>(
+            () => router.Register("ns", new FakeSource("x")));
+    }
+
+    private sealed class FakeSource(string name) : IHistorianDataSource
+    {
+        public string Name { get; } = name;
+        public bool IsDisposed { get; private set; }
+
+        public void Dispose() => IsDisposed = true;
+
+        public Task<HistoryReadResult> ReadRawAsync(string fullReference, DateTime startUtc, DateTime endUtc, uint maxValuesPerNode, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoryReadResult> ReadProcessedAsync(string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval, HistoryAggregateType aggregate, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoryReadResult> ReadAtTimeAsync(string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoricalEventsResult> ReadEventsAsync(string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public HistorianHealthSnapshot GetHealthSnapshot()
+            => new(0, 0, 0, 0, null, null, null, false, false, null, null, []);
+    }
+
+    private sealed class ThrowingFakeSource : IHistorianDataSource
+    {
+        public void Dispose() => throw new InvalidOperationException("boom");
+
+        public Task<HistoryReadResult> ReadRawAsync(string fullReference, DateTime startUtc, DateTime endUtc, uint maxValuesPerNode, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoryReadResult> ReadProcessedAsync(string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval, HistoryAggregateType aggregate, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoryReadResult> ReadAtTimeAsync(string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public Task<HistoricalEventsResult> ReadEventsAsync(string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, CancellationToken cancellationToken)
+            => throw new NotImplementedException();
+
+        public HistorianHealthSnapshot GetHealthSnapshot()
+            => new(0, 0, 0, 0, null, null, null, false, false, null, null, []);
+    }
+}