Compare commits

..

19 Commits

Author SHA1 Message Date
Joseph Doherty 64d8838e18 docs: reconcile alarms-over-gateway banner with audited source
The 'All 19 PRs merged' banner contradicted the warning paragraph in
the same block and overstated reality against the source tree. Audit
of the lmxopcua + mxaccessgw repos on 2026-05-01 found:

- 17 of 19 PRs merged. Four merged PRs ship inert scaffolds:
  - A.2: MxAccessAlarmEventSink.Attach is a no-op.
  - A.3 / A.4: NotWiredAlarmRpcDispatcher returns OK-with-diagnostic
    for AcknowledgeAlarm and an empty stream for QueryActiveAlarms.
  - C.1: SdkAlarmHistorianWriteBackend.WriteBatchAsync returns
    RetryPlease for every event with a placeholder log.
- The architectural decision the warning paragraph asks the operator
  to make was already resolved 2026-04-30. MxAccessAlarmEventSink.cs
  in mxaccessgw records that aaAlarmManagedClient.AlarmClient is x86
  net48 (same bitness as the worker), and pins the discovered API
  surface (RegisterConsumer / Subscribe / GetStatistics /
  GetAlarmExtendedRec / AlarmAckByGUID). What remains is wiring PRs
  in the worker, not architectural choice.
- D.1 smoke artifact (docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md)
  not yet captured; directory does not exist.

Banner rewritten to split functional-end-to-end vs merged-but-inert
PRs explicitly so future readers don't have to reconcile the doc
against the source tree themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 06:31:22 -04:00
dohertj2 69f02fed7f Merge pull request 'docs: alarms-over-gateway plan banner — record A.2 dev-rig finding' (#418) from track-d1-followup-plan-banner into master 2026-04-30 21:31:40 -04:00
Joseph Doherty 5ed26d2ec6 docs: alarms-over-gateway plan banner — record A.2 dev-rig finding
Replaces the "ships as a follow-up gated on dev-rig validation"
banner with the actual finding from the dev-rig inspection: the
MXAccess COM Toolkit on this AVEVA install does not expose any
alarm-event family, and the AVEVA alarm-subscription managed
assemblies (aaAlarmManagedClient, ArchestrAAlarmsAndEvents.SDK)
are x64-only and incompatible with the worker's x86 bitness.

Two operator-facing paths forward documented inline:

1. Stay on the value-driven sub-attribute path (current production
   behaviour). Operator-comment fidelity is the only v1 regression.

2. Add an x64 alarm-helper sub-process alongside the worker that
   loads aaAlarmManagedClient and forwards transitions over a
   named-pipe IPC. Recovers full v1 fidelity but adds operational
   complexity.

The full architectural notes live in the mxaccessgw repo at
src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:29:16 -04:00
dohertj2 439b39463b Merge pull request 'scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)' (#417) from track-d1-refresh-services into master 2026-04-30 21:13:58 -04:00
dohertj2 62d01e76e5 Merge pull request 'docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)' (#416) from track-b5-docs-memory-housekeeping into master 2026-04-30 21:11:29 -04:00
Joseph Doherty 32b872d5c7 scripts+docs: Refresh-Services.ps1 for alarm-rig deploy refresh (PR D.1)
Seventeenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Lands the script that the
plan calls for in Track D — the actual smoke-run validation
on the dev rig (publish, restart, fire alarms, capture artifacts)
remains operator work; this PR ships the automation that the
operator drives.

scripts/install/Refresh-Services.ps1 — single-shot refresh
script. Designed to run elevated on the deploy host
(DESKTOP-6JL3KKO today; production uses a separate runbook).
The script:

- Stops services in reverse-dependency order (OtOpcUa →
  OtOpcUaWonderwareHistorian → MxAccessGw) and force-kills any
  residual processes (avoids the publish-time MSB3027 file-lock
  the original install script hit).
- Snapshots existing C:\publish trees to
  C:\publish\.backup-YYYY-MM-DD-HHMMSS\ for rollback (skip with
  -SkipBackup).
- Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
  binaries from the sibling repo.
- Publishes OtOpcUa Server + Wonderware historian sidecar from
  this repo.
- Ensures OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true is set on
  the historian service env block (PR C.2 toggle).
- Starts services in forward-dependency order with the
  inter-service waits the original install used.
- Smoke-verifies (service status, listening ports 5120 / 4840
  / 4841, recent log tails).

Supports -WhatIf for dry-run inspection without touching the
running services.

docs/v2/dev-environment.md — new "Service Refresh —
Refresh-Services.ps1" section between Credential Management
and Test Data Seed. Cross-references the plan's Track D
functional verification scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:11:27 -04:00
Joseph Doherty 89004c052c docs: alarms-over-gateway completion banner + AlarmTracking v2 (PR B.5)
Sixteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Closes the documentation sweep
the plan calls for.

- docs/AlarmTracking.md — promoted top-level v2-final architecture
  doc (was a worktree-only draft pre-epic). Covers the three alarm
  sources (Galaxy MxAccess driver-native / Galaxy sub-attribute
  fallback / scripted alarms), how they converge on
  AlarmConditionService, the Acknowledge routing decision in
  DriverNodeManager (driver-native preferred over IWritable
  sub-attribute fallback), the sidecar historian write-back path
  for non-Galaxy producers, and cross-references to the plan +
  v1 archive.
- docs/v1/AlarmTracking.md — banner pointing readers at the v2
  doc; preserved as historical record.
- docs/drivers/Galaxy.md — capability list updated to include
  IAlarmSource (now eight capabilities, restored by B.2). Replaced
  the "IAlarmSource retired in 7.2" sentence with the restoration
  note + cross-link to docs/AlarmTracking.md.
- docs/plans/alarms-over-gateway.md — completion banner at the
  top of the plan, marking 14 of 16 PRs shipped 2026-04-30 and
  noting that A.2 + A.4 + D.1 are the hardware-gated follow-up.

Memory entries updated separately:
- project_alarms_over_gateway_epic.md (new) — epic summary +
  per-PR digest.
- project_galaxy_via_mxgateway.md — added "Alarms restored"
  bullet pointing at the new architecture.
- project_server_history_alarm_subsystems.md — bullet 2 updated
  to describe the new ack-routing decision (B.3) + bullet 3
  added describing the historian write-back path that B.4 + C.1
  + C.2 light up.
- MEMORY.md index — new pointer entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 21:09:04 -04:00
dohertj2 2baca785ad Merge pull request 'abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)' (#415) from track-e7-alarm-event-args-extension into master 2026-04-30 17:49:19 -04:00
Joseph Doherty 1d62709060 abstractions+driver+client.shared: extend AlarmEventArgs with rich payload (PR E.7)
Fourteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged) and B.3 (DriverNodeManager prefers
driver-native ack, merged).

Three new optional fields on Core.Abstractions.AlarmEventArgs:

- OperatorComment — populated by the driver-native gateway path on
  Acknowledge transitions. Null on raise / clear, and null on the
  sub-attribute fallback path where the comment collapses into a
  single string write.
- OriginalRaiseTimestampUtc — preserved across Acknowledge so OPC
  UA Part 9 conditions keep the original raise time.
- AlarmCategory — taxonomy bucket from the upstream alarm system.
  Maps to ConditionClassName downstream when a class mapping is
  configured.

GalaxyDriver.OnPumpAlarmTransition populates the new fields from
GalaxyAlarmTransition (PR B.1). Empty strings collapse to null so
consumers can use is-null rather than is-null-or-empty checks.

Client.Shared mirror DTO (Client.Shared/Models/AlarmEventArgs)
gains the same three properties so the Client.UI / Client.CLI
surfaces can reflect the rich payload — the actual UI/CLI
verbose-output and Show-Details rendering ship as a follow-up
PR; this PR locks in the payload contract.

Tests:
- 2 new tests in Driver.Galaxy.Tests pin the populated-vs-null
  behaviour for full-payload Acknowledge and bare-bones Raise
  transitions respectively.
- Solution build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:46:47 -04:00
dohertj2 0b5a4a676e Merge pull request 'server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)' (#414) from track-b3-prefer-driver-native-alarm into master 2026-04-30 17:23:09 -04:00
Joseph Doherty edc984987b server: DriverNodeManager prefers IAlarmSource ack over IWritable (PR B.3)
Thirteenth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.2 (GalaxyDriver
implements IAlarmSource, merged).

When DriverNodeManager registers an AlarmConditionState with
AlarmConditionService, it now picks the acknowledger:

- Driver implements IAlarmSource → DriverAlarmSourceAcknowledger
  routes the operator comment through IAlarmSource.AcknowledgeAsync
  via the existing AlarmSurfaceInvoker (Phase 6.1 resilience pipeline,
  no-retry per decision #143). Preserves operator-comment fidelity
  end-to-end — the value-driven sub-attribute write collapses the
  comment into a single string write that loses MxAccess metadata.
- Driver does not implement IAlarmSource →
  DriverWritableAcknowledger fallback (existing behaviour for
  AbCip / Modbus / S7 / etc).

The dedup logic that prefers driver-native transitions over
sub-attribute synthesis lives in AlarmConditionService and is
already in place — drivers that surface OnAlarmEvent (B.2) feed
the service directly, while sub-attribute writes still flow
through DriverNodeManager's ConditionSink so a Galaxy template
without $Alarm extensions stays functional.

Tests:
- 2 new routing-decision tests in
  DriverAlarmSourceAcknowledgerRoutingTests pin the
  IAlarmSource detection used at registration time.
- Server build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:20:45 -04:00
dohertj2 6126374594 Merge pull request 'driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)' (#413) from track-b2-galaxy-driver-ialarmsource into master 2026-04-30 17:18:20 -04:00
Joseph Doherty 38afc234ff driver-galaxy: GalaxyDriver implements IAlarmSource (PR B.2)
Twelfth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR B.1 (EventPump
dispatch, merged) and PR E.2 (.NET SDK alarm methods, merged).
Restores the v1 IAlarmSource capability that PR 7.2 retired with the
legacy Galaxy.Host / Galaxy.Proxy projects.

GalaxyDriver gains:
- IAlarmSource on the class declaration → eight capabilities total
  (IDriver / ITagDiscovery / IReadable / IWritable / ISubscribable /
  IRediscoverable / IHostConnectivityProbe / IAlarmSource).
- SubscribeAlarmsAsync — returns a sentinel handle and starts the
  shared EventPump (alarm wiring is lazy on first sub).
  Multiple handles share the same gateway stream; the server-side
  AlarmConditionService dispatches per-source-node downstream.
- UnsubscribeAlarmsAsync — symmetric handle removal; rejects
  handles not issued by this driver.
- AcknowledgeAsync — issues one gateway RPC per acknowledgement
  through IGalaxyAlarmAcknowledger. ConditionId carries the alarm
  full reference; falls back to SourceNodeId when empty.
- OnAlarmEvent — bridges EventPump.OnAlarmTransition (B.1) onto
  AlarmEventArgs. Suppressed when no alarm subscription is active so
  untracked transitions don't leak through.

New runtime types:
- IGalaxyAlarmAcknowledger — test seam.
- GatewayGalaxyAlarmAcknowledger — production wrapper around
  MxGatewayClient.AcknowledgeAlarmAsync (PR E.2). Maps native
  MxStatus failures to a logged warning rather than a thrown
  exception so a transient MxAccess hiccup doesn't fail the
  operator's Acknowledge.
- GalaxyAlarmSubscriptionHandle — driver-side IAlarmSubscriptionHandle.

Production runtime construction in BuildProductionRuntimeAsync wires
the acknowledger when not pre-injected; tests inject a fake via the
internal ctor.

Tests:
- 7 new tests in GalaxyDriverAlarmSourceTests — subscribe → event
  fire path, suppress without subscription, unsubscribe stops flow,
  foreign-handle rejection, ack routes per-request, ack falls back
  to SourceNodeId, ack throws NotSupported without acknowledger.
- Full Driver.Galaxy.Tests: 203 passed (was 196; 7 new).

Operates as a "stub-ready" surface — runtime ack calls will return
PERMISSION_DENIED until A.3 ships the gateway-side dispatch, and no
alarm transitions will arrive until A.2 adds the worker MxAccess
subscription. Both will activate this code path automatically when
the gateway side lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:15:46 -04:00
dohertj2 95422995c0 Merge pull request 'server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)' (#412) from track-b4-sidecar-alarm-historian-writer into master 2026-04-30 16:33:27 -04:00
Joseph Doherty 6e282b9946 server: Phase7Composer accepts DI-registered IAlarmHistorianWriter (PR B.4)
Sixth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.2 (sidecar
serves IAlarmEventWriter when enabled), already merged.

Today Phase7Composer.ResolveHistorianSink only scans drivers for an
IAlarmHistorianWriter — no Galaxy driver provides one since PR 7.2,
so the resolution falls through to NullAlarmHistorianSink and
scripted-alarm transitions are silently discarded.

WonderwareHistorianClient already implements IAlarmHistorianWriter
and Program.cs:178 already registers it as a singleton when
Historian:Wonderware:Enabled=true. The gap was that Phase7Composer
ignored DI: this PR adds an optional injectedWriter constructor
parameter, and ASP.NET Core DI resolves it from the same
registration when present.

- Phase7Composer constructor: new optional IAlarmHistorianWriter?
  injectedWriter parameter (default null). Backward-compatible —
  existing callers don't need to change; DI populates it
  automatically when the singleton is registered.
- New static SelectAlarmHistorianWriter helper — resolution order
  is driver → DI → null. Drivers win when both are present so a
  future GalaxyDriver-as-IAlarmHistorianWriter takes the write
  path directly, preserving the v1 invariant where a driver that
  natively owns the historian client doesn't bounce through the
  sidecar IPC.
- ResolveHistorianSink uses the helper + emits a structured log
  line identifying which source provided the writer.

Tests:
- 4 SelectAlarmHistorianWriter precedence tests — no source / DI
  only / driver wins over DI / first-driver-with-writer wins.
- Pre-existing 4 HostStatusPublisherTests SQL failures unrelated
  to this change (require the docker-host SQL Server at
  10.100.0.35,14330 per CLAUDE.md). Phase7 + alarm tests all
  green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:31:00 -04:00
dohertj2 f67b3b1b30 Merge pull request 'sidecar: wire IAlarmEventWriter into Program.cs (PR C.2)' (#411) from track-c2-program-wires-alarm-writer into master 2026-04-30 16:22:36 -04:00
Joseph Doherty ffacbe0370 sidecar: wire IAlarmEventWriter into Program.cs (PR C.2)
Fifth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Depends on PR C.1
(AahClientManagedAlarmEventWriter), already merged.

Today HistorianFrameHandler is constructed at Program.cs line 57
without an alarmWriter, so every WriteAlarmEvents frame replies
"Sidecar not configured with an alarm-event writer" and the lmxopcua
side keeps the row queued. C.2 wires a real writer behind a new
OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED toggle.

- Program.BuildAlarmWriter — gated on the env var (default true,
  fail-open under accidental misconfiguration). Constructs an
  AahClientManagedAlarmEventWriter wrapping a
  SdkAlarmHistorianWriteBackend with the same connection config the
  read path uses.
- Install-Services.ps1 — appends OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true
  to the OtOpcUaWonderwareHistorian service env block when the
  sidecar is installed. Read-only deployments flip it to false at
  service-config edit time without re-installing.
- HistorianFrameHandler signature already accepts
  IAlarmEventWriter? — supplying non-null at line 57 lights up
  the WriteAlarmEvents reply path that's been dormant since PR 3.3.

Until PR D.1 pins the live aahClientManaged entry point, the
SdkAlarmHistorianWriteBackend reports RetryPlease for every event
with a structured diagnostic. The lmxopcua-side
SqliteStoreAndForwardSink retains queued events; same effective
behaviour as today's NullAlarmHistorianSink fallback but with
visible diagnostics rather than silent discard.

Tests:
- 6 BuildAlarmWriter env-var cases — unset / true / false /
  unrecognized → default-on / capitalization variants.
- Full sidecar test suite: 56 passed (was 48; 8 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:20:11 -04:00
dohertj2 8a4526a376 Merge pull request 'sidecar: AahClientManagedAlarmEventWriter implements IAlarmEventWriter (PR C.1)' (#410) from track-c1-aah-alarm-writer into master 2026-04-30 16:19:36 -04:00
Joseph Doherty f99cf5033a sidecar: AahClientManagedAlarmEventWriter implements IAlarmEventWriter (PR C.1)
Fourth PR of the alarms-over-gateway epic
(docs/plans/alarms-over-gateway.md). Independent of Tracks A and B —
the sidecar slot defined in HistorianFrameHandler line 242 is unwired
today; PR C.2 (next) flips it on in Program.cs.

- AlarmHistorianWriteOutcome (sidecar-local, net48 — twin of
  Core.AlarmHistorian.HistorianWriteOutcome which is net10): Ack /
  RetryPlease / PermanentFail.
- IAlarmHistorianWriteBackend abstraction so the SDK call can be
  faked in unit tests.
- AahClientManagedAlarmEventWriter implements IAlarmEventWriter,
  delegates to the backend, maps Ack→true / Retry|Permanent→false
  for the IPC bool[] reply contract. Backend exception → whole
  batch RetryPlease (preserves the sender's queue across transients
  rather than dropping). Wrong-count return defends against a
  backend bug desyncing queue accounting.
- SdkAlarmHistorianWriteBackend — production binding skeleton.
  Reports RetryPlease for every event and logs a structured
  diagnostic until PR D.1 pins the live aahClientManaged entry
  point against the dev rig. The sender's SqliteStoreAndForwardSink
  retains queued events, mirroring today's NullAlarmHistorianSink
  behaviour but with visible diagnostics instead of silent discard.
- MapOutcome shared helper — pinned via theory tests so the D.1
  swap can change the SDK call site without reshuffling the
  HRESULT → outcome mapping.

Tests:
- 6 writer tests — empty batch / single Ack / mixed Ack-Retry-
  Permanent-Ack ordering / backend-throw → RetryPlease batch /
  cancellation propagates / wrong-count defensive degrade.
- 5 outcome theory cases — hresult 0 → Ack, malformed wins over
  hresult 0, comm error → Retry, unknown failure → Retry,
  malformed + comm → Permanent.
- Full sidecar test suite: 48 passed (was 42; 6 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:17:05 -04:00
26 changed files with 1974 additions and 21 deletions
+129
View File
@@ -0,0 +1,129 @@
# Alarm tracking — v2 final architecture
This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the **alarms-over-gateway** epic
([docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md))
landed. The v1 architecture (Galaxy.Host's COM-side `GalaxyAlarmTracker`)
is preserved at [docs/v1/AlarmTracking.md](v1/AlarmTracking.md) for
historical reference.
## Three alarm sources, one OPC UA Part 9 surface
| Source | Driver capability | Path |
|----------------------------------|--------------------------|------|
| **Galaxy MxAccess (driver-native)** | `GalaxyDriver : IAlarmSource` | gateway → worker → MxAccess alarm sink → `MX_EVENT_FAMILY_ON_ALARM_TRANSITION``EventPump` → driver `OnAlarmEvent``AlarmConditionService` |
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange``DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7EngineComposer` | server-side script evaluator → `Phase7EngineComposer.RouteToHistorianAsync` + `AlarmConditionService` |
All three converge on `AlarmConditionService` (`src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs`),
which owns the OPC UA Part 9 state machine and dispatches transitions
to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.
## Galaxy driver path (driver-native)
Restored in PR B.2 of the epic. `GalaxyDriver` implements
`IAlarmSource` with these surfaces:
- `SubscribeAlarmsAsync(sourceNodeIds)` → returns a sentinel handle.
The driver doesn't multiplex per source-node-id today; every
active handle observes the gateway's alarm-event stream. The
server-side `AlarmConditionService` filters by source-node before
raising the OPC UA condition.
- `UnsubscribeAlarmsAsync(handle)` → symmetric handle removal.
- `AcknowledgeAsync(requests)` → routes one gateway RPC per
acknowledgement through `IGalaxyAlarmAcknowledger`. Production
uses `GatewayGalaxyAlarmAcknowledger` calling
`MxGatewayClient.AcknowledgeAlarmAsync` (PR E.2 SDK method).
- `OnAlarmEvent` → bridges `EventPump.OnAlarmTransition` (PR B.1)
onto `AlarmEventArgs`. Suppressed when no alarm subscription is
active so untracked transitions don't leak through.
The proto contract carries the rich payload — alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. `MxAccessSeverityMapper`
(PR B.1) translates the raw severity onto the four-bucket
`AlarmSeverity` ladder — boundaries match v1's `GalaxyAlarmTracker`
so customers see no surprise re-classification.
The richer fields surface on `Core.Abstractions.AlarmEventArgs` via
the optional properties added in PR E.7 (`OperatorComment`,
`OriginalRaiseTimestampUtc`, `AlarmCategory`). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
## Galaxy sub-attribute fallback
For Galaxy templates without `$Alarm*` extensions, the value-driven
path stays in place: `DriverNodeManager` registers an
`AlarmConditionState` per Galaxy variable that bears alarm-bearing
sub-attributes (`InAlarm`, `Acked`, `Priority`, `Description`),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This path operated as the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.
When both paths report the same condition,
`AlarmConditionService.AlarmConditionState` keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (carry operator
comment + original raise time) and arrive lower-latency (no
publishing-interval delay on the sub-attribute reads), so they win
the dedup.
## Acknowledge routing
`DriverNodeManager` picks the acknowledger when registering each
condition (PR B.3 logic):
- Driver implements `IAlarmSource`
`DriverAlarmSourceAcknowledger` routes the operator comment
through `IAlarmSource.AcknowledgeAsync` via the existing
`AlarmSurfaceInvoker` (Phase 6.1 resilience pipeline; no-retry
per decision #143). End-to-end operator-comment fidelity is
preserved.
- Driver doesn't implement `IAlarmSource`
`DriverWritableAcknowledger` writes the comment into the
`AckMsgWriteRef` sub-attribute via `IWritable.WriteAsync`. Same
resilience pipeline; collapses the comment into a single string
write at the wire level.
The OPC UA Part 9 `AlarmConditionState.OnAcknowledge` delegate
already validates the session's `AlarmAck` role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.
## Historian write-back (non-Galaxy alarms)
Scripted alarms (and any future non-Galaxy `IAlarmSource` like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
- `Phase7Composer.ResolveHistorianSink` resolves an
`IAlarmHistorianWriter` from either a driver that natively
implements it or the DI-registered `WonderwareHistorianClient`
(the sidecar IPC client). Driver-provided wins when both are
present.
- `SqliteStoreAndForwardSink` queues each transition to a local
SQLite database and drains in the background via the resolved
writer.
- Sidecar (PR C.1 + C.2) forwards the events to `aahClientManaged`'s
alarm-event write API; the live SDK call site is pinned during
PR D.1's deploy-rig validation.
Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian
directly via System Platform's `HistorizeToAveva` toggle on the
alarm primitive — no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.
## Cross-references
- Plan: [docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md)
- v1 archive: [docs/v1/AlarmTracking.md](v1/AlarmTracking.md)
- Galaxy driver: [docs/drivers/Galaxy.md](drivers/Galaxy.md)
- Phase 7 scripting + alarming: [docs/v2/implementation/phase-7-scripting-and-alarming.md](v2/implementation/phase-7-scripting-and-alarming.md)
- Security + ACL: [docs/Security.md](Security.md)
+14 -2
View File
@@ -15,7 +15,8 @@ For the driver spec (capability surface, config shape, addressing), see [docs/v2
| ITagDiscovery / IReadable / |
| IWritable / ISubscribable / |
| IRediscoverable / |
| IHostConnectivityProbe |
| IHostConnectivityProbe / |
| IAlarmSource |
+-------------------+-------------------+
|
gRPC (default http://localhost:5120)
@@ -33,7 +34,18 @@ For the driver spec (capability surface, config shape, addressing), see [docs/v2
+---------------------------------------+
```
History reads + alarm-condition tracking moved server-side in PR 7.2 (`IHistoryRouter`, `AlarmConditionService`). Galaxy no longer implements `IHistoryProvider` or `IAlarmSource` of its own.
History reads moved server-side in PR 7.2 (`IHistoryRouter`). Galaxy no longer implements `IHistoryProvider` of its own.
`IAlarmSource` was retired with PR 7.2 and **restored in PR B.2** of the
alarms-over-gateway epic ([docs/plans/alarms-over-gateway.md](../plans/alarms-over-gateway.md)).
Alarm transitions arrive on the same gateway `StreamEvents` channel as
data-change events under the new `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`
family; acknowledgements route through the gateway's
`AcknowledgeAlarm` RPC. The previous value-driven sub-attribute path
remains as a fallback for Galaxy templates without `$Alarm*`
extensions — the server-side `AlarmConditionService` dedups when both
paths fire on the same condition. See [docs/AlarmTracking.md](../AlarmTracking.md)
for the v2-final architecture.
## Project Layout
+58
View File
@@ -1,5 +1,63 @@
# Plan — alarms over the mxaccessgw gateway
> **17 of 19 PRs merged. Public contract surface and the lmxopcua /
> sidecar consumers are live; four merged PRs ship as scaffolds
> pending worker-side wiring.** Status reconciled against the source
> tree on 2026-05-01.
>
> **Functional end-to-end today:** B.1 / B.2 / B.3 / B.4 / B.5
> (EventPump branch, GalaxyDriver `IAlarmSource`, DriverNodeManager
> ack routing, `WonderwareHistorianClient : IAlarmHistorianWriter`,
> docs sweep), C.2 (sidecar wires the alarm-write slot), D.1 script
> (`scripts/install/Refresh-Services.ps1`), E.1 E.7 (proto regen +
> .NET / Python / Go / Java / Rust SDK alarm methods + lmxopcua client
> surface). The value-driven sub-attribute fallback path keeps Galaxy
> alarms functional today.
>
> **Merged-but-inert scaffolds (gated on worker AlarmClient wiring):**
>
> - **A.2** — `MxAccessAlarmEventSink.Attach` is a no-op; the COM-side
> `aaAlarmManagedClient.AlarmClient` registration / subscription has
> not landed yet, so the gateway's
> `MX_EVENT_FAMILY_ON_ALARM_TRANSITION` is reserved on the wire but
> never emitted.
> - **A.3** AcknowledgeAlarm + **A.4** QueryActiveAlarms — public RPC
> handlers in `MxAccessGatewayService.cs` route through
> `NotWiredAlarmRpcDispatcher` (Ack returns OK with a `worker dispatch
> pending dev-rig wiring` diagnostic; Query yields an empty stream).
> - **C.1** sidecar — `AahClientManagedAlarmEventWriter` exists and the
> IPC slot is wired, but the production backend
> `SdkAlarmHistorianWriteBackend.WriteBatchAsync` returns
> `RetryPlease` for every event with a placeholder log — the live
> `aahClientManaged` SDK call site is pinned during the D.1 dev-rig
> smoke. Effect: scripted-alarm transitions queue locally in
> `SqliteStoreAndForwardSink` and the drain worker repeatedly retries.
>
> **Architectural decision RESOLVED 2026-04-30** (recorded in the
> mxaccessgw repo at `src/MxGateway.Worker/MxAccess/MxAccessAlarmEventSink.cs`
> xmldoc): the worker hosts `aaAlarmManagedClient.AlarmClient` (x86
> .NET Framework 4.8 — same bitness as the existing MxAccess COM
> consumer) alongside the COM consumer, sharing the worker's STA +
> WM_APP message pump. The discovered API surface
> (`RegisterConsumer`, `Subscribe`, `GetStatistics`,
> `GetAlarmExtendedRec`, `AlarmAckByGUID`) is documented in that
> file's xmldoc. The earlier concern that AVEVA's alarm SDK was
> x64-only proved wrong against the deployed assemblies. What remains
> is wiring PRs in the worker — session-startup `RegisterConsumer` +
> `Subscribe`, an STA WM_APP handler that routes
> alarm-changed messages into `EnqueueTransition`, and the worker
> command path that calls `AlarmAckByGUID` from a gateway
> `AcknowledgeAlarm` RPC.
>
> **D.1 smoke artifact**
> (`docs/plans/artifacts/d1-rollout-YYYY-MM-DD.md`, called for in the
> Track D test plan below) not yet captured — gated on the worker
> AlarmClient wiring being live on the dev rig so the smoke can
> exercise the alarm scenarios end-to-end and pin the
> `SdkAlarmHistorianWriteBackend` SDK entry point.
>
> The remainder of this document is preserved as the design record.
Coordinated epic across two repos:
- **`lmxopcua`** (this repo) — `c:\Users\dohertj2\Desktop\lmxopcua\`
+9 -1
View File
@@ -1,4 +1,12 @@
# Alarm Tracking
# Alarm Tracking — v1 archive
> **Historical record.** This document describes the v1 / pre-PR-7.2
> Galaxy alarm path that ran inside `Galaxy.Host`'s STA pump as
> `GalaxyAlarmTracker`. PR 7.2 retired the in-process Galaxy stack; the
> alarms-over-gateway epic (B.2 / B.3 / E.7) restored Galaxy's
> `IAlarmSource` capability against the new gateway-mediated transport.
> See [docs/AlarmTracking.md](../AlarmTracking.md) for the v2 final
> architecture — that is the document to read for current behaviour.
Alarm surfacing is an optional driver capability exposed via `IAlarmSource` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`). Drivers whose backends have an alarm concept implement it — today: Galaxy (MXAccess alarms), FOCAS (CNC alarms), OPC UA Client (A&C events from the upstream server). Modbus / S7 / AB CIP / AB Legacy / TwinCAT do not implement the interface and the feature is simply absent from their subtrees.
+43
View File
@@ -408,6 +408,49 @@ For production:
- Per-NodeId credentials in `ClusterNodeCredential` table (per decision #83)
- Admin app uses LDAP (no SQL credential at all on the user-facing side)
## Service Refresh — `Refresh-Services.ps1`
The deploy host hosts three NSSM-wrapped services (`MxAccessGw`,
`OtOpcUaWonderwareHistorian`, `OtOpcUa`) that consume binaries from
`C:\publish\`. After landing changes in either repo, refresh the
deployed bits with `scripts\install\Refresh-Services.ps1`:
```powershell
# Default invocation (dev rig).
& C:\Users\dohertj2\Desktop\lmxopcua\scripts\install\Refresh-Services.ps1
# Skip the timestamped backup (faster on iterative dev cycles).
& Refresh-Services.ps1 -SkipBackup
# Dry-run — print the actions without doing them.
& Refresh-Services.ps1 -WhatIf
```
The script:
1. Stops services in reverse-dependency order (`OtOpcUa`
`OtOpcUaWonderwareHistorian``MxAccessGw`) and force-kills
any residual processes.
2. Snapshots the existing `C:\publish\mxaccessgw\` and
`C:\publish\lmxopcua\` trees to `C:\publish\.backup-<timestamp>\`
for rollback (skip with `-SkipBackup`).
3. Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
binaries from the sibling repo.
4. `dotnet publish`-es the OtOpcUa server + Wonderware historian
sidecar from this repo.
5. Ensures `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true` is set on
the historian service env block (PR C.2 toggle).
6. Starts services in forward-dependency order (`MxAccessGw`
`OtOpcUaWonderwareHistorian``OtOpcUa`).
7. Smoke-verifies — service status, listening ports (5120 / 4840 /
4841), recent log tails.
Functional verification (alarm raise / scripted alarm historian
round-trip / sub-attribute fallback) is the operator's next step
after the refresh; see
[docs/plans/alarms-over-gateway.md](../plans/alarms-over-gateway.md)
§Track D for the scenarios.
## Test Data Seed
Each environment needs a baseline data set so cross-developer tests are reproducible. Lives in `tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/SeedData/`:
+4
View File
@@ -87,6 +87,10 @@ if ($InstallWonderwareHistorian) {
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_HISTORIAN_SECRET=$HistorianSharedSecret"
"OTOPCUA_HISTORIAN_ENABLED=true"
# Default-on when the historian sidecar is installed; flip to false for a
# read-only deployment that still loads aahClientManaged for reads but
# rejects WriteAlarmEvents frames.
"OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"
"OTOPCUA_HISTORIAN_SERVER=$HistorianServer"
"OTOPCUA_HISTORIAN_PORT=$HistorianPort"
) -join "`0"
+210
View File
@@ -0,0 +1,210 @@
[CmdletBinding()]
param(
[string]$RepoRoot = "C:\Users\dohertj2\Desktop\lmxopcua",
[string]$GatewayRoot = "C:\Users\dohertj2\Desktop\mxaccessgw",
[string]$PublishRoot = "C:\publish",
[switch]$SkipBackup,
[switch]$WhatIf
)
# PR D.1 — refresh C:\publish + restart services for the alarms-over-gateway
# epic. Stops services in reverse-dependency order (OtOpcUa →
# OtOpcUaWonderwareHistorian → MxAccessGw), refreshes binaries from the
# repos, then starts in forward order. A timestamped backup of the existing
# C:\publish trees lands under C:\publish\.backup-YYYY-MM-DD\ unless
# -SkipBackup is supplied.
#
# Designed to run as a single elevated PowerShell session on the deploy host
# (the dev rig today; production refresh is a separate runbook).
Set-StrictMode -Version Latest
$ErrorActionPreference = "Stop"
function Step([string]$Message) {
Write-Host ""
Write-Host "==> $Message" -ForegroundColor Cyan
}
function Run([scriptblock]$Block, [string]$Description) {
if ($WhatIf) {
Write-Host " (skip) $Description" -ForegroundColor DarkYellow
return
}
Write-Host " $Description"
& $Block
}
function Test-NssmService([string]$Name) {
$svc = Get-Service -Name $Name -ErrorAction SilentlyContinue
return $null -ne $svc
}
# ------------------------------------------------------------------------
# Step 1: Stop in reverse dependency order
# ------------------------------------------------------------------------
Step "Stopping services (OtOpcUa → OtOpcUaWonderwareHistorian → MxAccessGw)"
foreach ($name in @('OtOpcUa', 'OtOpcUaWonderwareHistorian', 'MxAccessGw')) {
if (Test-NssmService $name) {
Run { nssm stop $name } "stop $name"
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
}
}
if (-not $WhatIf) {
Start-Sleep -Seconds 3
Get-Process MxGateway.Server, MxGateway.Worker, OtOpcUa.Server, OtOpcUa.Driver.Historian.Wonderware -ErrorAction SilentlyContinue |
ForEach-Object {
Write-Host " killing residual process $($_.ProcessName) (PID=$($_.Id))" -ForegroundColor DarkYellow
Stop-Process -Id $_.Id -Force -ErrorAction SilentlyContinue
}
}
# ------------------------------------------------------------------------
# Step 2: Backup existing C:\publish trees
# ------------------------------------------------------------------------
if (-not $SkipBackup -and (Test-Path $PublishRoot)) {
$backupRoot = Join-Path $PublishRoot ".backup-$((Get-Date).ToString('yyyy-MM-dd-HHmmss'))"
Step "Backing up $PublishRoot$backupRoot"
Run {
New-Item -ItemType Directory -Path $backupRoot | Out-Null
foreach ($subdir in @('mxaccessgw', 'lmxopcua')) {
$src = Join-Path $PublishRoot $subdir
if (Test-Path $src) {
Copy-Item -Recurse -Path $src -Destination (Join-Path $backupRoot $subdir)
}
}
} "snapshot publish dirs (rollback target)"
}
else {
Write-Host " (backup skipped)" -ForegroundColor DarkGray
}
# ------------------------------------------------------------------------
# Step 3: Refresh mxaccessgw binaries (Track A output)
# ------------------------------------------------------------------------
Step "Building + copying mxaccessgw binaries from $GatewayRoot"
Run {
& dotnet build "$GatewayRoot\src\MxGateway.Worker" -c Release | Out-Null
& dotnet build "$GatewayRoot\src\MxGateway.Server" -c Release | Out-Null
} "dotnet build (Worker x86 net48 + Server net10.0)"
Run {
$serverDest = Join-Path $PublishRoot "mxaccessgw\Server"
$workerDest = Join-Path $PublishRoot "mxaccessgw\Worker"
if (-not (Test-Path $serverDest)) { New-Item -ItemType Directory -Path $serverDest -Force | Out-Null }
if (-not (Test-Path $workerDest)) { New-Item -ItemType Directory -Path $workerDest -Force | Out-Null }
Copy-Item -Recurse -Force "$GatewayRoot\src\MxGateway.Server\bin\Release\net10.0\*" $serverDest
Copy-Item -Recurse -Force "$GatewayRoot\src\MxGateway.Worker\bin\x86\Release\net48\*" $workerDest
} "copy gateway server + worker outputs"
# ------------------------------------------------------------------------
# Step 4: Refresh OtOpcUa + Wonderware historian sidecar
# ------------------------------------------------------------------------
Step "Publishing OtOpcUa server + Wonderware historian sidecar from $RepoRoot"
Run {
& dotnet publish "$RepoRoot\src\ZB.MOM.WW.OtOpcUa.Server" `
-c Release -o (Join-Path $PublishRoot "lmxopcua") | Out-Null
& dotnet publish "$RepoRoot\src\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware" `
-c Release -o (Join-Path $PublishRoot "lmxopcua\WonderwareHistorian") | Out-Null
} "dotnet publish (Server + sidecar)"
# ------------------------------------------------------------------------
# Step 5: Service env block — ensure OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED
# is set on the Wonderware historian service (PR C.2 toggle).
# ------------------------------------------------------------------------
if (Test-NssmService 'OtOpcUaWonderwareHistorian') {
Step "Ensuring OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED is set on the historian service"
Run {
$existing = nssm get OtOpcUaWonderwareHistorian AppEnvironmentExtra
if ($existing -notmatch 'OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED') {
$combined = $existing + "`r`nOTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true"
nssm set OtOpcUaWonderwareHistorian AppEnvironmentExtra $combined | Out-Null
Write-Host " appended OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true" -ForegroundColor DarkGreen
}
else {
Write-Host " already present; leaving service env block untouched"
}
} "patch service env block"
}
# ------------------------------------------------------------------------
# Step 6: Start in forward dependency order
# ------------------------------------------------------------------------
Step "Starting services (MxAccessGw → OtOpcUaWonderwareHistorian → OtOpcUa)"
foreach ($pair in @(
@{ Name = 'MxAccessGw'; Wait = 4 },
@{ Name = 'OtOpcUaWonderwareHistorian'; Wait = 4 },
@{ Name = 'OtOpcUa'; Wait = 8 }
)) {
$name = $pair.Name
if (Test-NssmService $name) {
Run { nssm start $name } "start $name"
if (-not $WhatIf) { Start-Sleep -Seconds $pair.Wait }
}
else {
Write-Host " ($name not installed; skipping)" -ForegroundColor DarkGray
}
}
# ------------------------------------------------------------------------
# Step 7: Smoke verification
# ------------------------------------------------------------------------
Step "Smoke verification"
if (-not $WhatIf) {
foreach ($name in @('MxAccessGw', 'OtOpcUaWonderwareHistorian', 'OtOpcUa')) {
if (Test-NssmService $name) {
$status = (Get-Service $name).Status
$color = if ($status -eq 'Running') { 'Green' } else { 'Red' }
Write-Host " $name = $status" -ForegroundColor $color
}
}
foreach ($port in @(5120, 4840, 4841)) {
$listening = Get-NetTCPConnection -LocalPort $port -State Listen -ErrorAction SilentlyContinue
$color = if ($listening) { 'Green' } else { 'DarkYellow' }
Write-Host " TCP $port listening = $($null -ne $listening)" -ForegroundColor $color
}
Write-Host ""
Write-Host " Recent log tails:" -ForegroundColor DarkCyan
$tails = @(
"$PublishRoot\lmxopcua\logs\otopcua-*.log",
"$PublishRoot\mxaccessgw\stdout.log",
"$env:ProgramData\OtOpcUa\historian-wonderware-*.log"
)
foreach ($pattern in $tails) {
$latest = Get-ChildItem -Path $pattern -ErrorAction SilentlyContinue |
Sort-Object LastWriteTime -Descending |
Select-Object -First 1
if ($null -ne $latest) {
Write-Host ""
Write-Host " --- $($latest.FullName) (last 10 lines) ---" -ForegroundColor DarkGray
Get-Content $latest.FullName -Tail 10 | ForEach-Object { Write-Host " $_" }
}
}
}
Write-Host ""
Write-Host "Refresh complete." -ForegroundColor Green
Write-Host ""
Write-Host "Next: run the functional verification scenarios from"
Write-Host " docs\plans\alarms-over-gateway.md §Track D §6 'Functional verification'"
Write-Host " - Galaxy-native alarm raise"
Write-Host " - Scripted alarm → AVEVA Historian round-trip"
Write-Host " - Sub-attribute fallback path with IAlarmSource disabled"
@@ -15,7 +15,10 @@ public sealed class AlarmEventArgs : EventArgs
bool ackedState,
DateTime time,
byte[]? eventId = null,
string? conditionNodeId = null)
string? conditionNodeId = null,
string? operatorComment = null,
DateTime? originalRaiseTimestampUtc = null,
string? alarmCategory = null)
{
SourceName = sourceName;
ConditionName = conditionName;
@@ -27,6 +30,9 @@ public sealed class AlarmEventArgs : EventArgs
Time = time;
EventId = eventId;
ConditionNodeId = conditionNodeId;
OperatorComment = operatorComment;
OriginalRaiseTimestampUtc = originalRaiseTimestampUtc;
AlarmCategory = alarmCategory;
}
/// <summary>The name of the source object that raised the alarm.</summary>
@@ -58,4 +64,25 @@ public sealed class AlarmEventArgs : EventArgs
/// <summary>The NodeId of the condition instance (SourceNode), used for acknowledgment.</summary>
public string? ConditionNodeId { get; }
/// <summary>
/// PR E.7 — Operator-supplied comment recorded by the upstream alarm system on
/// Acknowledge transitions. Null on raise / clear, or when the upstream path
/// can't surface the comment (sub-attribute fallback path collapses comments
/// into a single string write).
/// </summary>
public string? OperatorComment { get; }
/// <summary>
/// PR E.7 — When the alarm originally entered the active state. Preserved
/// across Acknowledge transitions so OPC UA Part 9 conditions keep the
/// original raise time. Null when the upstream path doesn't surface it.
/// </summary>
public DateTime? OriginalRaiseTimestampUtc { get; }
/// <summary>
/// PR E.7 — Upstream alarm taxonomy bucket (e.g. <c>Process</c> /
/// <c>Safety</c> / <c>Diagnostics</c>). Null when not surfaced.
/// </summary>
public string? AlarmCategory { get; }
}
@@ -41,6 +41,30 @@ public sealed record AlarmAcknowledgeRequest(
string? Comment);
/// <summary>Event payload for <see cref="IAlarmSource.OnAlarmEvent"/>.</summary>
/// <param name="SubscriptionHandle">Subscription this event belongs to.</param>
/// <param name="SourceNodeId">Driver-side identifier for the alarm source.</param>
/// <param name="ConditionId">Stable id correlating raise / ack / clear of the same condition.</param>
/// <param name="AlarmType">Driver-defined alarm type name (e.g. AnalogLimitAlarm.HiHi).</param>
/// <param name="Message">Human-readable alarm description.</param>
/// <param name="Severity">Four-bucket severity ladder.</param>
/// <param name="SourceTimestampUtc">When this transition occurred.</param>
/// <param name="OperatorComment">
/// Operator-supplied comment recorded by the upstream alarm system on Acknowledge
/// transitions. Null on raise / clear, or when the upstream path can't surface
/// the comment (the Galaxy sub-attribute fallback path collapses comments into a
/// single string write — null on that path; the driver-native gateway path
/// populates this).
/// </param>
/// <param name="OriginalRaiseTimestampUtc">
/// When the alarm originally entered the active state. Preserved across
/// Acknowledge transitions so OPC UA Part 9 conditions keep the original raise
/// time in <c>Time</c>. Null when the upstream path doesn't surface it.
/// </param>
/// <param name="AlarmCategory">
/// Upstream alarm taxonomy bucket (e.g. <c>Process</c> / <c>Safety</c> /
/// <c>Diagnostics</c>). Maps to OPC UA <c>ConditionClassName</c> downstream when
/// a class mapping is configured. Null when the upstream path doesn't carry it.
/// </param>
public sealed record AlarmEventArgs(
IAlarmSubscriptionHandle SubscriptionHandle,
string SourceNodeId,
@@ -48,7 +72,10 @@ public sealed record AlarmEventArgs(
string AlarmType,
string Message,
AlarmSeverity Severity,
DateTime SourceTimestampUtc);
DateTime SourceTimestampUtc,
string? OperatorComment = null,
DateTime? OriginalRaiseTimestampUtc = null,
string? AlarmCategory = null);
/// <summary>Mirrors the <c>NodePermissions</c> alarm-severity enum in <c>docs/v2/acl-design.md</c>.</summary>
public enum AlarmSeverity { Low, Medium, High, Critical }
@@ -26,7 +26,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy;
/// "GalaxyMxGateway" so both paths can be live simultaneously during parity testing.
/// </remarks>
public sealed class GalaxyDriver
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IDisposable
: IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IAlarmSource, IDisposable
{
private readonly string _driverInstanceId;
private readonly GalaxyDriverOptions _options;
@@ -63,6 +63,16 @@ public sealed class GalaxyDriver
private EventPump? _eventPump;
private readonly Lock _pumpLock = new();
// PR B.2 — IAlarmSource implementation. Production-side acks route through
// GatewayGalaxyAlarmAcknowledger which calls MxGatewayClient.AcknowledgeAlarmAsync
// (PR E.2 SDK). Tests inject IGalaxyAlarmAcknowledger via the internal ctor to
// exercise the wiring without a running gateway. The alarm event stream is
// delivered by EventPump.OnAlarmTransition (PR B.1) — this driver is the
// consumer that bridges it onto IAlarmSource.OnAlarmEvent.
private IGalaxyAlarmAcknowledger? _alarmAcknowledger;
private readonly Lock _alarmHandlersLock = new();
private readonly HashSet<GalaxyAlarmSubscriptionHandle> _alarmSubscriptions = new();
// PR 4.W — production runtime owned by InitializeAsync. The driver builds these
// when it opens a real gw session; tests bypass them by injecting seams via the
// internal ctor.
@@ -99,12 +109,16 @@ public sealed class GalaxyDriver
/// <summary>Fires when a host transitions Running ↔ Stopped (PR 4.7 HostStatusAggregator).</summary>
public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
/// <inheritdoc />
public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
public GalaxyDriver(
string driverInstanceId,
GalaxyDriverOptions options,
ILogger<GalaxyDriver>? logger = null)
: this(driverInstanceId, options,
hierarchySource: null, dataReader: null, dataWriter: null, subscriber: null, logger)
hierarchySource: null, dataReader: null, dataWriter: null, subscriber: null,
alarmAcknowledger: null, logger)
{
}
@@ -121,6 +135,7 @@ public sealed class GalaxyDriver
IGalaxyDataReader? dataReader = null,
IGalaxyDataWriter? dataWriter = null,
IGalaxySubscriber? subscriber = null,
IGalaxyAlarmAcknowledger? alarmAcknowledger = null,
ILogger<GalaxyDriver>? logger = null)
{
_driverInstanceId = !string.IsNullOrWhiteSpace(driverInstanceId)
@@ -132,6 +147,7 @@ public sealed class GalaxyDriver
_dataReader = dataReader;
_dataWriter = dataWriter;
_subscriber = subscriber;
_alarmAcknowledger = alarmAcknowledger;
// Forward the aggregator's transitions through IHostConnectivityProbe.
_hostStatuses.OnHostStatusChanged += (_, args) => OnHostStatusChanged?.Invoke(this, args);
@@ -213,6 +229,9 @@ public sealed class GalaxyDriver
_probeWatcher = new PerPlatformProbeWatcher(
_subscriber, _hostStatuses, _logger,
bufferedUpdateIntervalMs: _options.MxAccess.PublishingIntervalMs);
// PR B.2 — wire the alarm acknowledger to the live gateway client.
_alarmAcknowledger ??= new GatewayGalaxyAlarmAcknowledger(_ownedMxClient, _ownedMxSession, _logger);
}
/// <summary>
@@ -705,11 +724,135 @@ public sealed class GalaxyDriver
channelCapacity: _options.MxAccess.EventPumpChannelCapacity,
clientName: _options.MxAccess.ClientName);
_eventPump.OnDataChange += OnPumpDataChange;
_eventPump.OnAlarmTransition += OnPumpAlarmTransition;
_eventPump.Start();
return _eventPump;
}
}
// ===== IAlarmSource (PR B.2) =====
/// <inheritdoc />
public Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken)
{
ObjectDisposedException.ThrowIf(_disposed, this);
ArgumentNullException.ThrowIfNull(sourceNodeIds);
// The driver doesn't multiplex alarm subscriptions per source-node-id today —
// alarm events arrive on the same gateway StreamEvents channel as data-change
// events once the gateway emits the new family (PRs A.2 + A.3). The
// subscription handle is a sentinel the server uses for symmetric Unsubscribe;
// every active handle receives every alarm transition, and the server filters
// by source node before raising Part 9 conditions. Same shape AbCip uses.
EnsureEventPumpStarted();
var handle = new GalaxyAlarmSubscriptionHandle(Guid.NewGuid().ToString("N"));
lock (_alarmHandlersLock)
{
_alarmSubscriptions.Add(handle);
}
return Task.FromResult<IAlarmSubscriptionHandle>(handle);
}
/// <inheritdoc />
public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
{
ObjectDisposedException.ThrowIf(_disposed, this);
ArgumentNullException.ThrowIfNull(handle);
if (handle is not GalaxyAlarmSubscriptionHandle gash)
{
throw new ArgumentException(
$"Subscription handle was not issued by this driver (expected GalaxyAlarmSubscriptionHandle, got {handle.GetType().Name}).",
nameof(handle));
}
lock (_alarmHandlersLock)
{
_alarmSubscriptions.Remove(gash);
}
return Task.CompletedTask;
}
/// <inheritdoc />
public async Task AcknowledgeAsync(
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)
{
ObjectDisposedException.ThrowIf(_disposed, this);
ArgumentNullException.ThrowIfNull(acknowledgements);
if (acknowledgements.Count == 0) return;
if (_alarmAcknowledger is null)
{
throw new NotSupportedException(
"GalaxyDriver.AcknowledgeAsync requires GatewayGalaxyAlarmAcknowledger wired against a connected " +
"GalaxyMxSession (PR B.2). InitializeAsync must run before alarm acknowledgements can flow.");
}
// Acks are issued one-by-one — the gateway RPC accepts a single alarm
// reference per call. AlarmConditionState's per-condition Acknowledge in the
// server-side ACL layer is the natural rate-limit, so issuing in series here
// keeps the operator-comment ordering deterministic without bursting the
// worker's STA queue.
foreach (var ack in acknowledgements)
{
// ConditionId carries the alarm full reference for the Galaxy driver —
// SourceNodeId is the OPC UA browse path, which the gateway can't address.
// The server-side condition state pairs them through AlarmConditionService.
var alarmFullReference = !string.IsNullOrEmpty(ack.ConditionId)
? ack.ConditionId
: ack.SourceNodeId;
await _alarmAcknowledger.AcknowledgeAsync(
alarmFullReference,
ack.Comment ?? string.Empty,
operatorUser: string.Empty, // server-side ACL fills this from the OPC UA session
cancellationToken).ConfigureAwait(false);
}
}
/// <summary>
/// Receives <see cref="GalaxyAlarmTransition"/> events from the EventPump and
/// reshapes them into <see cref="AlarmEventArgs"/> for OPC UA-side consumers.
/// Fires <see cref="OnAlarmEvent"/> only when at least one alarm subscription is
/// active so a server that hasn't called <see cref="SubscribeAlarmsAsync"/> yet
/// doesn't surface untracked transitions.
/// </summary>
private void OnPumpAlarmTransition(object? sender, GalaxyAlarmTransition transition)
{
GalaxyAlarmSubscriptionHandle? handle;
lock (_alarmHandlersLock)
{
// Pick any active subscription handle as the "owner" of the event. The
// server-side state machine doesn't multiplex by handle today; if multiple
// alarm subscriptions are active we still only fire the event once and
// the AlarmConditionService dispatches per-source-node downstream.
handle = _alarmSubscriptions.Count > 0
? _alarmSubscriptions.First()
: null;
}
if (handle is null) return;
var args = new AlarmEventArgs(
SubscriptionHandle: handle,
SourceNodeId: transition.SourceObjectReference,
ConditionId: transition.AlarmFullReference,
AlarmType: transition.AlarmTypeName,
Message: transition.Description,
Severity: transition.SeverityBucket,
SourceTimestampUtc: transition.TransitionTimestampUtc,
OperatorComment: string.IsNullOrEmpty(transition.OperatorComment) ? null : transition.OperatorComment,
OriginalRaiseTimestampUtc: transition.OriginalRaiseTimestampUtc,
AlarmCategory: string.IsNullOrEmpty(transition.Category) ? null : transition.Category);
try
{
OnAlarmEvent?.Invoke(this, args);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"GalaxyDriver OnAlarmEvent handler threw for {AlarmRef} — continuing.",
transition.AlarmFullReference);
}
}
/// <summary>
/// Forwards every fan-out event to the public <see cref="OnDataChange"/> for
/// ISubscribable consumers, AND routes ScanState changes to the per-platform
@@ -0,0 +1,21 @@
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// Driver-side handle returned by <see cref="GalaxyDriver.SubscribeAlarmsAsync"/>.
/// The driver doesn't multiplex alarm transitions per handle — every active handle
/// observes the gateway's alarm-event stream — but the handle is needed for
/// symmetric Unsubscribe and for the server-side AlarmConditionService to
/// correlate transitions with the originating subscription.
/// </summary>
internal sealed class GalaxyAlarmSubscriptionHandle : IAlarmSubscriptionHandle
{
public GalaxyAlarmSubscriptionHandle(string diagnosticId)
{
DiagnosticId = diagnosticId;
}
/// <inheritdoc />
public string DiagnosticId { get; }
}
@@ -0,0 +1,65 @@
using Microsoft.Extensions.Logging;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// Production <see cref="IGalaxyAlarmAcknowledger"/> backed by the
/// <c>MxGatewayClient.AcknowledgeAlarmAsync</c> RPC (PR E.2). Maps the
/// reply's protocol status into a thrown exception when the gateway
/// reports a non-OK condition; native MxStatus failures inside the reply
/// surface as a logged warning so operator workflows aren't blocked by a
/// transient MxAccess hiccup.
/// </summary>
internal sealed class GatewayGalaxyAlarmAcknowledger : IGalaxyAlarmAcknowledger
{
private readonly MxGatewayClient _client;
private readonly GalaxyMxSession _session;
private readonly ILogger _logger;
public GatewayGalaxyAlarmAcknowledger(
MxGatewayClient client,
GalaxyMxSession session,
ILogger logger)
{
_client = client ?? throw new ArgumentNullException(nameof(client));
_session = session ?? throw new ArgumentNullException(nameof(session));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
public async Task AcknowledgeAsync(
string alarmFullReference,
string comment,
string operatorUser,
CancellationToken cancellationToken)
{
ArgumentException.ThrowIfNullOrEmpty(alarmFullReference);
var session = _session.Session
?? throw new InvalidOperationException(
"GatewayGalaxyAlarmAcknowledger requires a connected GalaxyMxSession; underlying gateway session is null.");
var sessionId = session.SessionId;
var reply = await _client.AcknowledgeAlarmAsync(
new AcknowledgeAlarmRequest
{
SessionId = sessionId,
ClientCorrelationId = Guid.NewGuid().ToString("N"),
AlarmFullReference = alarmFullReference,
Comment = comment ?? string.Empty,
OperatorUser = operatorUser ?? string.Empty,
},
cancellationToken).ConfigureAwait(false);
if (reply.Status is { Success: 0 } status)
{
// Native MxAccess rejected the ack — log but don't throw. Treat as a
// best-effort operator workflow; the operator can retry via the OPC UA
// session if necessary.
_logger.LogWarning(
"Galaxy AcknowledgeAlarm for {AlarmRef} returned MxStatus failure: category={Category} detail={Detail} text={Text}",
alarmFullReference, status.Category, status.Detail, status.DiagnosticText);
}
}
}
@@ -0,0 +1,32 @@
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// Test seam for the gateway-side Acknowledge call. Production wraps the
/// <c>MxGatewayClient.AcknowledgeAlarmAsync</c> RPC; tests substitute a fake
/// so <see cref="GalaxyDriver.AcknowledgeAsync"/> can be exercised without a
/// running gateway.
/// </summary>
internal interface IGalaxyAlarmAcknowledger
{
/// <summary>
/// Forward a single alarm acknowledgement to the gateway. The gateway
/// translates this to an MxAccess Acknowledge call against the worker's
/// session and returns the native MxStatus on the reply.
/// </summary>
/// <param name="alarmFullReference">
/// Fully-qualified alarm reference (e.g. <c>"Tank01.Level.HiHi"</c>).
/// </param>
/// <param name="comment">Operator-supplied comment forwarded to MxAccess.</param>
/// <param name="operatorUser">
/// Operator principal performing the acknowledgement. Resolved from the
/// OPC UA session by the server-side ACL layer before reaching the driver.
/// </param>
/// <param name="cancellationToken">Cancels the gateway RPC.</param>
Task AcknowledgeAsync(
string alarmFullReference,
string comment,
string operatorUser,
CancellationToken cancellationToken);
}
@@ -0,0 +1,105 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Ipc;
namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
{
/// <summary>
/// IPC-side <see cref="IAlarmEventWriter"/> implementation that delegates to an
/// <see cref="IAlarmHistorianWriteBackend"/> (production: aahClientManaged-bound)
/// and maps the trinary <see cref="AlarmHistorianWriteOutcome"/> down to the
/// <c>bool[]</c> the IPC reply contract carries. Per-event outcomes:
/// <list type="bullet">
/// <item><description><see cref="AlarmHistorianWriteOutcome.Ack"/> → <c>true</c> (drop from sender's queue).</description></item>
/// <item><description><see cref="AlarmHistorianWriteOutcome.RetryPlease"/> → <c>false</c> (sender retries on next drain tick).</description></item>
/// <item><description><see cref="AlarmHistorianWriteOutcome.PermanentFail"/> → <c>false</c> (sender's B.4 widens the IPC bool back into the trinary outcome by inspecting structured diagnostics; this slot intentionally collapses to "not-ok" at the wire).</description></item>
/// </list>
/// </summary>
public sealed class AahClientManagedAlarmEventWriter : IAlarmEventWriter
{
private static readonly ILogger Log = Serilog.Log.ForContext<AahClientManagedAlarmEventWriter>();
private readonly IAlarmHistorianWriteBackend _backend;
public AahClientManagedAlarmEventWriter(IAlarmHistorianWriteBackend backend)
{
_backend = backend ?? throw new ArgumentNullException(nameof(backend));
}
public async Task<bool[]> WriteAsync(AlarmHistorianEventDto[] events, CancellationToken cancellationToken)
{
if (events is null || events.Length == 0)
{
return new bool[0];
}
AlarmHistorianWriteOutcome[] outcomes;
try
{
outcomes = await _backend.WriteBatchAsync(events, cancellationToken).ConfigureAwait(false);
}
catch (OperationCanceledException)
{
throw;
}
catch (Exception ex)
{
// Backend-level failure (cluster unreachable, transport error). Treat the
// whole batch as RetryPlease so the sender's queue holds the rows for
// the next drain tick — preferable to dropping them on a transient.
Log.Warning(ex,
"Alarm historian backend WriteBatchAsync threw — marking entire {Count}-event batch RetryPlease.",
events.Length);
var fallback = new bool[events.Length];
return fallback;
}
if (outcomes.Length != events.Length)
{
// Backend contract violation — defensive degrade so a bug in the backend
// doesn't desync the sender's queue accounting. Treat as RetryPlease.
Log.Warning(
"Alarm historian backend returned {ReturnedCount} outcomes for a batch of {InputCount} events; degrading to RetryPlease for the whole batch.",
outcomes.Length, events.Length);
return new bool[events.Length];
}
var perEventOk = new bool[outcomes.Length];
for (var i = 0; i < outcomes.Length; i++)
{
perEventOk[i] = outcomes[i] == AlarmHistorianWriteOutcome.Ack;
}
return perEventOk;
}
/// <summary>
/// Translate the outcome of a single SDK call (raw HRESULT + diagnostic) into the
/// trinary <see cref="AlarmHistorianWriteOutcome"/>. Exposed for the production
/// <see cref="SdkAlarmHistorianWriteBackend"/> to share the mapping with tests.
/// </summary>
public static AlarmHistorianWriteOutcome MapOutcome(int hresult, bool isCommunicationError, bool isMalformedInput)
{
// Order matters: malformed input is permanent regardless of HRESULT pattern;
// communication-class errors are transient regardless of which specific
// HRESULT bit fired.
if (isMalformedInput)
{
return AlarmHistorianWriteOutcome.PermanentFail;
}
if (hresult == 0)
{
return AlarmHistorianWriteOutcome.Ack;
}
if (isCommunicationError)
{
return AlarmHistorianWriteOutcome.RetryPlease;
}
// Default: unknown HRESULT failure — be conservative and let the sender retry.
// The sender's drain worker has its own dead-letter cap so a permanently-broken
// event won't loop forever.
return AlarmHistorianWriteOutcome.RetryPlease;
}
}
}
@@ -0,0 +1,19 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
{
/// <summary>
/// Per-event outcome from <see cref="IAlarmHistorianWriteBackend.WriteBatchAsync"/>.
/// Sidecar-local twin of <c>Core.AlarmHistorian.HistorianWriteOutcome</c> (the
/// sidecar runs net48 and cannot reference the net10 Core project; the IPC
/// contract narrows this to <c>bool</c> per slot, so the lmxopcua-side consumer
/// widens that back into the trinary outcome at the IPC boundary in PR B.4).
/// </summary>
public enum AlarmHistorianWriteOutcome
{
/// <summary>Event accepted by the historian. Drop from the store-and-forward queue.</summary>
Ack,
/// <summary>Transient failure (server busy, disconnected, timeout). Leave queued; retry on next drain tick.</summary>
RetryPlease,
/// <summary>Permanent failure (malformed event, unrecoverable SDK error). Move to dead-letter on the lmxopcua side.</summary>
PermanentFail,
}
}
@@ -0,0 +1,30 @@
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Ipc;
namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
{
/// <summary>
/// The actual aahClientManaged-bound writer. Extracted so unit tests can
/// substitute a fake without touching the SDK; the production
/// implementation lives in <see cref="SdkAlarmHistorianWriteBackend"/>.
/// </summary>
/// <remarks>
/// Implementations are responsible for connection management + cluster
/// failover. The wrapping <see cref="AahClientManagedAlarmEventWriter"/>
/// handles batch-level orchestration but delegates the per-event SDK call
/// here so the unit tests can drive every documented MxStatus outcome
/// without an installed AVEVA Historian.
/// </remarks>
public interface IAlarmHistorianWriteBackend
{
/// <summary>
/// Persist the supplied events to the historian. Returns one outcome per
/// input slot in the same order — must always return an array of the same
/// length as <paramref name="events"/>.
/// </summary>
Task<AlarmHistorianWriteOutcome[]> WriteBatchAsync(
AlarmHistorianEventDto[] events,
CancellationToken cancellationToken);
}
}
@@ -0,0 +1,73 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Ipc;
namespace ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend
{
/// <summary>
/// Production <see cref="IAlarmHistorianWriteBackend"/> backed by AVEVA Historian's
/// <c>aahClientManaged</c> alarm-event write API. The exact SDK entry point is
/// pinned during the live-rig smoke in PR D.1 — until that gate, this backend
/// reports <see cref="AlarmHistorianWriteOutcome.RetryPlease"/> for every
/// event with a structured diagnostic so the lmxopcua-side
/// <c>SqliteStoreAndForwardSink</c> retains the queued events rather than dropping
/// or hard-failing them.
/// </summary>
/// <remarks>
/// <para>
/// Cluster failover reuses <see cref="HistorianClusterEndpointPicker"/> via
/// the shared <see cref="HistorianDataSource"/> connection pool — there is
/// no second connection pool for writes. Wonderware Historian's alarm-event
/// write surface accepts the same <c>HistorianAccess</c> session a read
/// opens, so reusing the picker is parity-preserving with v1's
/// <c>GalaxyHistorianWriter</c>.
/// </para>
/// <para>
/// Once D.1 confirms the SDK entry point, this class swaps the placeholder
/// body for the real call sequence. The mapping from raw HRESULT /
/// <c>HistorianError</c> codes onto <see cref="AlarmHistorianWriteOutcome"/>
/// is already shared via <see cref="AahClientManagedAlarmEventWriter.MapOutcome"/>
/// so the smoke-pinned change stays minimal.
/// </para>
/// </remarks>
public sealed class SdkAlarmHistorianWriteBackend : IAlarmHistorianWriteBackend
{
private static readonly ILogger Log = Serilog.Log.ForContext<SdkAlarmHistorianWriteBackend>();
private readonly HistorianConfiguration _config;
public SdkAlarmHistorianWriteBackend(HistorianConfiguration config)
{
_config = config ?? throw new ArgumentNullException(nameof(config));
}
public Task<AlarmHistorianWriteOutcome[]> WriteBatchAsync(
AlarmHistorianEventDto[] events,
CancellationToken cancellationToken)
{
if (events is null || events.Length == 0)
{
return Task.FromResult(new AlarmHistorianWriteOutcome[0]);
}
// Placeholder: pin the SDK entry point in PR D.1 against a live AVEVA
// Historian. Until then the call returns RetryPlease for every slot so
// the lmxopcua-side sink keeps the events queued rather than dropping
// them — same effect as the current NullAlarmHistorianSink fallback,
// but visible through the structured diagnostic + per-event outcome.
Log.Warning(
"Alarm historian SDK write path not yet pinned — returning RetryPlease for {Count} event(s) from server {Server}. PR D.1 swaps this for the live aahClientManaged call.",
events.Length,
_config.ServerName);
var outcomes = new AlarmHistorianWriteOutcome[events.Length];
for (var i = 0; i < outcomes.Length; i++)
{
outcomes[i] = AlarmHistorianWriteOutcome.RetryPlease;
}
return Task.FromResult(outcomes);
}
}
}
@@ -54,7 +54,8 @@ public static class Program
}
using var historian = BuildHistorian();
var handler = new HistorianFrameHandler(historian, Log.Logger);
var alarmWriter = BuildAlarmWriter();
var handler = new HistorianFrameHandler(historian, Log.Logger, alarmWriter);
using var server = new PipeServer(pipeName, allowedSid, sharedSecret, Log.Logger);
Log.Information("Wonderware historian sidecar serving — pipe={Pipe} allowedSid={Sid}", pipeName, allowedSidValue);
@@ -107,4 +108,46 @@ public static class Program
var raw = Environment.GetEnvironmentVariable(envName);
return int.TryParse(raw, out var parsed) ? parsed : defaultValue;
}
/// <summary>
/// Constructs the alarm-event writer when the alarm-write toggle is on, otherwise
/// returns <c>null</c> so <see cref="HistorianFrameHandler"/> falls back to the
/// "not configured" reply for any incoming <c>WriteAlarmEvents</c> frame.
/// Default is <c>true</c> when <c>OTOPCUA_HISTORIAN_ENABLED=true</c>; explicitly
/// set <c>OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=false</c> to keep a read-only
/// deployment that still loads the SDK for reads.
/// </summary>
internal static IAlarmEventWriter? BuildAlarmWriter()
{
var raw = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED");
var enabled = string.IsNullOrWhiteSpace(raw)
? true
: !string.Equals(raw, "false", StringComparison.OrdinalIgnoreCase);
if (!enabled)
{
Log.Information("Alarm-event writer disabled (OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=false); historian sidecar will reject WriteAlarmEvents frames.");
return null;
}
var cfg = BuildAlarmWriterConfig();
var backend = new SdkAlarmHistorianWriteBackend(cfg);
Log.Information("Alarm-event writer enabled — backend=SdkAlarmHistorianWriteBackend server={Server}", cfg.ServerName);
return new AahClientManagedAlarmEventWriter(backend);
}
private static HistorianConfiguration BuildAlarmWriterConfig()
{
return new HistorianConfiguration
{
Enabled = true,
ServerName = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_SERVER") ?? "localhost",
Port = TryParseInt("OTOPCUA_HISTORIAN_PORT", 32568),
IntegratedSecurity = !string.Equals(Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_INTEGRATED"), "false", StringComparison.OrdinalIgnoreCase),
UserName = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_USER"),
Password = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_PASS"),
CommandTimeoutSeconds = TryParseInt("OTOPCUA_HISTORIAN_TIMEOUT_SEC", 30),
FailureCooldownSeconds = TryParseInt("OTOPCUA_HISTORIAN_COOLDOWN_SEC", 60),
};
}
}
@@ -221,6 +221,49 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
}
}
/// <summary>
/// PR B.3 — preferred <see cref="IAlarmAcknowledger"/> for drivers that implement
/// <see cref="IAlarmSource"/> (today: Galaxy via the gateway-side AcknowledgeAlarm
/// RPC). Routes the operator comment through the driver's native ack API, which
/// preserves operator-comment fidelity end-to-end (the value-driven sub-attribute
/// fallback collapses the comment into a single string write).
/// </summary>
private sealed class DriverAlarmSourceAcknowledger(
IAlarmSource alarmSource,
string conditionId,
ZB.MOM.WW.OtOpcUa.Core.Resilience.AlarmSurfaceInvoker alarmInvoker) : IAlarmAcknowledger
{
public async Task<bool> WriteAckMessageAsync(
string ackMsgWriteRef, string comment, CancellationToken cancellationToken)
{
// ackMsgWriteRef is unused on this path — the driver's IAlarmSource.AcknowledgeAsync
// routes the ack against the alarm condition itself, not against the
// sub-attribute. ConditionId carries the alarm full reference; SourceNodeId
// is left empty since the gateway only addresses by full reference.
// _ = alarmSource keeps the analyzer-required reference visible without an
// unwrapped call — the actual ack runs through the AlarmSurfaceInvoker which
// wires the AlarmAcknowledge resilience pipeline (no-retry per decision #143).
_ = alarmSource;
try
{
await alarmInvoker.AcknowledgeAsync(
new[]
{
new AlarmAcknowledgeRequest(
SourceNodeId: string.Empty,
ConditionId: conditionId,
Comment: comment ?? string.Empty),
},
cancellationToken).ConfigureAwait(false);
return true;
}
catch
{
return false;
}
}
}
/// <summary>
/// Detach from the alarm service before the base disposes. The service is shared across
/// drivers, so leaking the handler keeps a dead DriverNodeManager pinned in memory and
@@ -787,8 +830,23 @@ public sealed class DriverNodeManager : CustomNodeManager2, IAddressSpaceBuilder
if (_owner._alarmService is not null && !string.IsNullOrEmpty(info.InAlarmRef))
{
_owner._conditionSinks[FullReference] = sink;
var acker = new DriverWritableAcknowledger(
_owner._writable, _owner._invoker, _owner._driver.DriverInstanceId);
// PR B.3 — prefer IAlarmSource.AcknowledgeAsync (driver-native path)
// when the driver supports it. Galaxy implements this since PR B.2;
// for drivers without IAlarmSource the value-driven sub-attribute
// fallback (DriverWritableAcknowledger) preserves the existing
// behaviour.
IAlarmAcknowledger acker;
if (_owner._driver is IAlarmSource alarmSource)
{
var alarmInvoker = new ZB.MOM.WW.OtOpcUa.Core.Resilience.AlarmSurfaceInvoker(
_owner._invoker, alarmSource, _owner._driver.DriverInstanceId);
acker = new DriverAlarmSourceAcknowledger(alarmSource, FullReference, alarmInvoker);
}
else
{
acker = new DriverWritableAcknowledger(
_owner._writable, _owner._invoker, _owner._driver.DriverInstanceId);
}
_owner._alarmService.Track(FullReference, info, acker);
}
@@ -41,6 +41,7 @@ public sealed class Phase7Composer : IAsyncDisposable
private readonly DriverHost _driverHost;
private readonly DriverEquipmentContentRegistry _equipmentRegistry;
private readonly IAlarmHistorianSink _historianSink;
private readonly IAlarmHistorianWriter? _injectedWriter;
private readonly ILoggerFactory _loggerFactory;
private readonly Serilog.ILogger _scriptLogger;
private readonly ILogger<Phase7Composer> _logger;
@@ -59,12 +60,14 @@ public sealed class Phase7Composer : IAsyncDisposable
IAlarmHistorianSink historianSink,
ILoggerFactory loggerFactory,
Serilog.ILogger scriptLogger,
ILogger<Phase7Composer> logger)
ILogger<Phase7Composer> logger,
IAlarmHistorianWriter? injectedWriter = null)
{
_scopeFactory = scopeFactory ?? throw new ArgumentNullException(nameof(scopeFactory));
_driverHost = driverHost ?? throw new ArgumentNullException(nameof(driverHost));
_equipmentRegistry = equipmentRegistry ?? throw new ArgumentNullException(nameof(equipmentRegistry));
_historianSink = historianSink ?? throw new ArgumentNullException(nameof(historianSink));
_injectedWriter = injectedWriter;
_loggerFactory = loggerFactory ?? throw new ArgumentNullException(nameof(loggerFactory));
_scriptLogger = scriptLogger ?? throw new ArgumentNullException(nameof(scriptLogger));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
@@ -131,27 +134,53 @@ public sealed class Phase7Composer : IAsyncDisposable
return _sources;
}
private IAlarmHistorianSink ResolveHistorianSink()
/// <summary>
/// Resolution order for the alarm-historian writer:
/// <list type="number">
/// <item><description>Any registered driver that implements <see cref="IAlarmHistorianWriter"/> (today: none — Galaxy used to via the legacy GalaxyProxyDriver).</description></item>
/// <item><description>The DI-registered <see cref="IAlarmHistorianWriter"/> (PR B.4 — the WonderwareHistorianClient sidecar writer when <c>Historian:Wonderware:Enabled=true</c>).</description></item>
/// <item><description><c>null</c> — caller falls back to the injected <see cref="IAlarmHistorianSink"/> (NullAlarmHistorianSink in the default registration).</description></item>
/// </list>
/// Driver-provided writers win over the DI-registered sidecar so a future
/// GalaxyDriver-as-IAlarmHistorianWriter takes the write path directly,
/// preserving the v1 invariant where a driver that natively owns the
/// historian client doesn't bounce through the sidecar IPC.
/// </summary>
internal static IAlarmHistorianWriter? SelectAlarmHistorianWriter(
DriverHost driverHost,
IAlarmHistorianWriter? injectedWriter,
out string? selectedSourceDescription)
{
IAlarmHistorianWriter? writer = null;
foreach (var driverId in _driverHost.RegisteredDriverIds)
foreach (var driverId in driverHost.RegisteredDriverIds)
{
if (_driverHost.GetDriver(driverId) is IAlarmHistorianWriter w)
if (driverHost.GetDriver(driverId) is IAlarmHistorianWriter w)
{
writer = w;
_logger.LogInformation(
"Phase 7 historian sink: driver {Driver} provides IAlarmHistorianWriter — wiring SqliteStoreAndForwardSink",
driverId);
break;
selectedSourceDescription = $"driver:{driverId}";
return w;
}
}
if (injectedWriter is not null)
{
selectedSourceDescription = $"di:{injectedWriter.GetType().Name}";
return injectedWriter;
}
selectedSourceDescription = null;
return null;
}
private IAlarmHistorianSink ResolveHistorianSink()
{
var writer = SelectAlarmHistorianWriter(_driverHost, _injectedWriter, out var sourceDescription);
if (writer is null)
{
_logger.LogInformation(
"Phase 7 historian sink: no driver provides IAlarmHistorianWriter — using {Sink}",
"Phase 7 historian sink: no driver or DI-registered IAlarmHistorianWriter — using {Sink}",
_historianSink.GetType().Name);
return _historianSink;
}
_logger.LogInformation(
"Phase 7 historian sink: IAlarmHistorianWriter resolved from {Source} — SqliteStoreAndForwardSink active",
sourceDescription);
var queueRoot = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);
if (string.IsNullOrEmpty(queueRoot)) queueRoot = Path.GetTempPath();
@@ -0,0 +1,138 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests;
/// <summary>
/// PR E.7 — pins that the GalaxyDriver populates the extended AlarmEventArgs
/// fields (OperatorComment, OriginalRaiseTimestampUtc, AlarmCategory) when the
/// gateway emits a transition with the rich payload, and leaves them null on
/// events that don't carry them.
/// </summary>
public sealed class GalaxyDriverAlarmEventArgsExtensionTests
{
[Fact]
public async Task Acknowledge_transition_with_full_payload_populates_extended_fields()
{
var subscriber = new ManualSubscriber();
using var driver = NewDriver(subscriber);
await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
var observed = new List<AlarmEventArgs>();
driver.OnAlarmEvent += (_, args) => observed.Add(args);
await driver.SubscribeAsync(["Tank01.Level"], TimeSpan.Zero, CancellationToken.None);
var raise = new DateTime(2026, 5, 1, 12, 0, 0, DateTimeKind.Utc);
var ack = raise.AddSeconds(45);
await subscriber.EmitAlarmAsync(new MxEvent
{
Family = MxEventFamily.OnAlarmTransition,
OnAlarmTransition = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
TransitionKind = AlarmTransitionKind.Acknowledge,
Severity = 750,
OriginalRaiseTimestamp = Timestamp.FromDateTime(raise),
TransitionTimestamp = Timestamp.FromDateTime(ack),
OperatorUser = "alice",
OperatorComment = "investigating",
Category = "Process",
Description = "Tank 01 high-high level",
},
});
for (var i = 0; i < 20 && observed.Count == 0; i++)
{
await Task.Delay(50);
}
observed.ShouldHaveSingleItem();
observed[0].OperatorComment.ShouldBe("investigating");
observed[0].OriginalRaiseTimestampUtc.ShouldBe(raise);
observed[0].AlarmCategory.ShouldBe("Process");
}
[Fact]
public async Task Raise_transition_without_optional_fields_leaves_them_null()
{
var subscriber = new ManualSubscriber();
using var driver = NewDriver(subscriber);
await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
var observed = new List<AlarmEventArgs>();
driver.OnAlarmEvent += (_, args) => observed.Add(args);
await driver.SubscribeAsync(["Tank01.Level"], TimeSpan.Zero, CancellationToken.None);
await subscriber.EmitAlarmAsync(new MxEvent
{
Family = MxEventFamily.OnAlarmTransition,
OnAlarmTransition = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Severity = 750,
TransitionTimestamp = Timestamp.FromDateTime(DateTime.UtcNow),
},
});
for (var i = 0; i < 20 && observed.Count == 0; i++)
{
await Task.Delay(50);
}
observed.ShouldHaveSingleItem();
observed[0].OperatorComment.ShouldBeNull();
observed[0].OriginalRaiseTimestampUtc.ShouldBeNull();
observed[0].AlarmCategory.ShouldBeNull();
}
private static GalaxyDriver NewDriver(ManualSubscriber subscriber)
{
var options = new GalaxyDriverOptions(
new GalaxyGatewayOptions("http://localhost:5000", "literal-api-key"),
new GalaxyMxAccessOptions("AlarmExtensionTest"),
new GalaxyRepositoryOptions(),
new GalaxyReconnectOptions());
return new GalaxyDriver(
driverInstanceId: "drv-1",
options: options,
hierarchySource: null,
dataReader: null,
dataWriter: null,
subscriber: subscriber,
alarmAcknowledger: null);
}
private sealed class ManualSubscriber : IGalaxySubscriber
{
private readonly Channel<MxEvent> _stream =
Channel.CreateUnbounded<MxEvent>(new UnboundedChannelOptions { SingleReader = true });
public Task<IReadOnlyList<SubscribeResult>> SubscribeBulkAsync(
IReadOnlyList<string> fullReferences, int bufferedUpdateIntervalMs, CancellationToken cancellationToken)
{
var results = new List<SubscribeResult>();
var nextHandle = 100;
foreach (var r in fullReferences)
{
results.Add(new SubscribeResult { TagAddress = r, ItemHandle = nextHandle++, WasSuccessful = true });
}
return Task.FromResult<IReadOnlyList<SubscribeResult>>(results);
}
public Task UnsubscribeBulkAsync(IReadOnlyList<int> itemHandles, CancellationToken cancellationToken)
=> Task.CompletedTask;
public IAsyncEnumerable<MxEvent> StreamEventsAsync(CancellationToken cancellationToken)
=> _stream.Reader.ReadAllAsync(cancellationToken);
public ValueTask EmitAlarmAsync(MxEvent ev) => _stream.Writer.WriteAsync(ev);
}
}
@@ -0,0 +1,244 @@
using System.Threading.Channels;
using Google.Protobuf.WellKnownTypes;
using MxGateway.Contracts.Proto;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests;
/// <summary>
/// PR B.2 — pins GalaxyDriver's IAlarmSource implementation. The driver bridges
/// EventPump.OnAlarmTransition (PR B.1) onto IAlarmSource.OnAlarmEvent and
/// forwards Acknowledge through IGalaxyAlarmAcknowledger (production:
/// GatewayGalaxyAlarmAcknowledger calling the gateway's AcknowledgeAlarm RPC
/// from PR E.2).
/// </summary>
public sealed class GalaxyDriverAlarmSourceTests
{
[Fact]
public async Task SubscribeAlarmsAsync_returns_handle_and_event_fires_after_pump_alarm()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
// Subscribe so OnAlarmEvent has a registered handle to fire under.
var handle = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
handle.ShouldNotBeNull();
var observed = new List<AlarmEventArgs>();
driver.OnAlarmEvent += (_, args) => observed.Add(args);
// SubscribeAsync to start the EventPump (alarm wiring is lazy on first sub).
await driver.SubscribeAsync(["Tank01.Level"], TimeSpan.Zero, CancellationToken.None);
await subscriber.EmitAlarmAsync(new MxEvent
{
Family = MxEventFamily.OnAlarmTransition,
OnAlarmTransition = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
SourceObjectReference = "Tank01",
AlarmTypeName = "AnalogLimitAlarm.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Severity = 750,
TransitionTimestamp = Timestamp.FromDateTime(DateTime.UtcNow),
Description = "Tank 01 high-high level",
},
});
// Drain pump events.
for (var i = 0; i < 20 && observed.Count == 0; i++)
{
await Task.Delay(50);
}
observed.ShouldHaveSingleItem();
observed[0].ConditionId.ShouldBe("Tank01.Level.HiHi");
observed[0].SourceNodeId.ShouldBe("Tank01");
observed[0].AlarmType.ShouldBe("AnalogLimitAlarm.HiHi");
observed[0].Severity.ShouldBe(AlarmSeverity.Critical);
observed[0].SubscriptionHandle.ShouldBe(handle);
}
[Fact]
public async Task OnAlarmEvent_does_not_fire_when_no_subscription_active()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
var observed = new List<AlarmEventArgs>();
driver.OnAlarmEvent += (_, args) => observed.Add(args);
// Start the pump via a data subscription so alarm events flow but no alarm
// subscription is registered → OnAlarmEvent is suppressed.
await driver.SubscribeAsync(["Tank01.Level"], TimeSpan.Zero, CancellationToken.None);
await subscriber.EmitAlarmAsync(new MxEvent
{
Family = MxEventFamily.OnAlarmTransition,
OnAlarmTransition = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Severity = 600,
TransitionTimestamp = Timestamp.FromDateTime(DateTime.UtcNow),
},
});
await Task.Delay(150);
observed.ShouldBeEmpty();
}
[Fact]
public async Task UnsubscribeAlarmsAsync_stops_event_flow()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
var handle = await driver.SubscribeAlarmsAsync(["Tank01"], CancellationToken.None);
var observed = new List<AlarmEventArgs>();
driver.OnAlarmEvent += (_, args) => observed.Add(args);
await driver.SubscribeAsync(["Tank01.Level"], TimeSpan.Zero, CancellationToken.None);
await driver.UnsubscribeAlarmsAsync(handle, CancellationToken.None);
await subscriber.EmitAlarmAsync(new MxEvent
{
Family = MxEventFamily.OnAlarmTransition,
OnAlarmTransition = new OnAlarmTransitionEvent
{
AlarmFullReference = "Tank01.Level.HiHi",
TransitionKind = AlarmTransitionKind.Raise,
Severity = 600,
TransitionTimestamp = Timestamp.FromDateTime(DateTime.UtcNow),
},
});
await Task.Delay(150);
observed.ShouldBeEmpty();
}
[Fact]
public async Task UnsubscribeAlarmsAsync_throws_for_foreign_handle()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
var foreignHandle = new ForeignAlarmHandle();
await Should.ThrowAsync<ArgumentException>(() =>
driver.UnsubscribeAlarmsAsync(foreignHandle, CancellationToken.None));
}
[Fact]
public async Task AcknowledgeAsync_routes_each_request_to_the_acknowledger()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
var requests = new[]
{
new AlarmAcknowledgeRequest("Tank01", "Tank01.Level.HiHi", "shift handover"),
new AlarmAcknowledgeRequest("Tank02", "Tank02.Level.HiHi", "investigating"),
};
await driver.AcknowledgeAsync(requests, CancellationToken.None);
ack.Calls.Count.ShouldBe(2);
ack.Calls[0].AlarmRef.ShouldBe("Tank01.Level.HiHi");
ack.Calls[0].Comment.ShouldBe("shift handover");
ack.Calls[1].AlarmRef.ShouldBe("Tank02.Level.HiHi");
}
[Fact]
public async Task AcknowledgeAsync_falls_back_to_SourceNodeId_when_ConditionId_empty()
{
var subscriber = new ManualSubscriber();
var ack = new RecordingAcknowledger();
using var driver = NewDriver(subscriber, ack);
await driver.AcknowledgeAsync(
[new AlarmAcknowledgeRequest("Tank01.Level.HiHi", string.Empty, null)],
CancellationToken.None);
ack.Calls[0].AlarmRef.ShouldBe("Tank01.Level.HiHi");
}
[Fact]
public async Task AcknowledgeAsync_throws_NotSupported_without_acknowledger()
{
var subscriber = new ManualSubscriber();
using var driver = NewDriver(subscriber, alarmAcknowledger: null);
await Should.ThrowAsync<NotSupportedException>(() =>
driver.AcknowledgeAsync(
[new AlarmAcknowledgeRequest("Tank01", "Tank01.Level.HiHi", null)],
CancellationToken.None));
}
private static GalaxyDriver NewDriver(
ManualSubscriber subscriber, IGalaxyAlarmAcknowledger? alarmAcknowledger)
{
var options = new GalaxyDriverOptions(
new GalaxyGatewayOptions("http://localhost:5000", "literal-api-key"),
new GalaxyMxAccessOptions("AlarmSourceTest"),
new GalaxyRepositoryOptions(),
new GalaxyReconnectOptions());
return new GalaxyDriver(
driverInstanceId: "drv-1",
options: options,
hierarchySource: null,
dataReader: null,
dataWriter: null,
subscriber: subscriber,
alarmAcknowledger: alarmAcknowledger);
}
private sealed class RecordingAcknowledger : IGalaxyAlarmAcknowledger
{
public List<(string AlarmRef, string Comment, string Operator)> Calls { get; } = [];
public Task AcknowledgeAsync(string alarmFullReference, string comment, string operatorUser, CancellationToken cancellationToken)
{
Calls.Add((alarmFullReference, comment, operatorUser));
return Task.CompletedTask;
}
}
private sealed class ForeignAlarmHandle : IAlarmSubscriptionHandle
{
public string DiagnosticId => "foreign";
}
private sealed class ManualSubscriber : IGalaxySubscriber
{
private readonly Channel<MxEvent> _stream =
Channel.CreateUnbounded<MxEvent>(new UnboundedChannelOptions { SingleReader = true });
public Task<IReadOnlyList<SubscribeResult>> SubscribeBulkAsync(
IReadOnlyList<string> fullReferences, int bufferedUpdateIntervalMs, CancellationToken cancellationToken)
{
var results = new List<SubscribeResult>();
var nextHandle = 100;
foreach (var r in fullReferences)
{
results.Add(new SubscribeResult { TagAddress = r, ItemHandle = nextHandle++, WasSuccessful = true });
}
return Task.FromResult<IReadOnlyList<SubscribeResult>>(results);
}
public Task UnsubscribeBulkAsync(IReadOnlyList<int> itemHandles, CancellationToken cancellationToken)
=> Task.CompletedTask;
public IAsyncEnumerable<MxEvent> StreamEventsAsync(CancellationToken cancellationToken)
=> _stream.Reader.ReadAllAsync(cancellationToken);
public ValueTask EmitAlarmAsync(MxEvent ev) => _stream.Writer.WriteAsync(ev);
}
}
@@ -0,0 +1,159 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Ipc;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests
{
/// <summary>
/// PR C.1 — pins the trinary outcome → IPC bool[] mapping that the sidecar uses
/// on the WriteAlarmEvents reply. Per-event outcomes:
/// Ack → true, RetryPlease → false, PermanentFail → false.
/// The sender's B.4 widens the IPC bool back into the trinary outcome at the
/// IPC boundary using structured diagnostics; the wire intentionally collapses
/// to "ok / not-ok".
/// </summary>
[Trait("Category", "Unit")]
public sealed class AahClientManagedAlarmEventWriterTests
{
[Fact]
public async Task Empty_batch_returns_empty_array_without_invoking_backend()
{
var backend = new RecordingBackend(_ => throw new InvalidOperationException("must not invoke for empty input"));
var writer = new AahClientManagedAlarmEventWriter(backend);
var result = await writer.WriteAsync(Array.Empty<AlarmHistorianEventDto>(), CancellationToken.None);
result.ShouldBeEmpty();
backend.Calls.ShouldBe(0);
}
[Fact]
public async Task Single_ack_outcome_maps_to_true()
{
var backend = new RecordingBackend(events => events.Select(_ => AlarmHistorianWriteOutcome.Ack).ToArray());
var writer = new AahClientManagedAlarmEventWriter(backend);
var result = await writer.WriteAsync(new[] { Event("E1") }, CancellationToken.None);
result.ShouldBe(new[] { true });
}
[Fact]
public async Task Mixed_batch_preserves_per_slot_ordering()
{
// Ack / Retry / Permanent / Ack — the sender uses positional matching against
// its queue, so every slot must hit the exact bool corresponding to its input.
var backend = new RecordingBackend(_ => new[]
{
AlarmHistorianWriteOutcome.Ack,
AlarmHistorianWriteOutcome.RetryPlease,
AlarmHistorianWriteOutcome.PermanentFail,
AlarmHistorianWriteOutcome.Ack,
});
var writer = new AahClientManagedAlarmEventWriter(backend);
var result = await writer.WriteAsync(
new[] { Event("E1"), Event("E2"), Event("E3"), Event("E4") },
CancellationToken.None);
result.ShouldBe(new[] { true, false, false, true });
}
[Fact]
public async Task Backend_exception_marks_whole_batch_RetryPlease()
{
var backend = new RecordingBackend(_ => throw new InvalidOperationException("cluster unreachable"));
var writer = new AahClientManagedAlarmEventWriter(backend);
var result = await writer.WriteAsync(
new[] { Event("E1"), Event("E2"), Event("E3") },
CancellationToken.None);
// Whole batch must end up as "not ok" (RetryPlease at the trinary layer) —
// dropping a transiently-failed batch corrupts the sender's queue.
result.ShouldBe(new[] { false, false, false });
}
[Fact]
public async Task Cancellation_propagates_from_backend()
{
var backend = new RecordingBackend(_ => throw new OperationCanceledException());
var writer = new AahClientManagedAlarmEventWriter(backend);
var ex = await Should.ThrowAsync<OperationCanceledException>(() =>
writer.WriteAsync(new[] { Event("E1") }, CancellationToken.None));
ex.ShouldNotBeNull();
}
[Fact]
public async Task Backend_returning_wrong_count_degrades_to_RetryPlease()
{
// Backend returns more outcomes than inputs — defensive degrade rather than
// letting a backend bug desync the sender's queue accounting.
var backend = new RecordingBackend(_ => new[]
{
AlarmHistorianWriteOutcome.Ack,
AlarmHistorianWriteOutcome.Ack,
});
var writer = new AahClientManagedAlarmEventWriter(backend);
var result = await writer.WriteAsync(new[] { Event("E1") }, CancellationToken.None);
result.ShouldBe(new[] { false });
}
[Theory]
// hresult 0 + clean → Ack
[InlineData(0, false, false, AlarmHistorianWriteOutcome.Ack)]
// hresult 0 but malformed → PermanentFail (malformed wins)
[InlineData(0, false, true, AlarmHistorianWriteOutcome.PermanentFail)]
// non-zero hresult + comm error → RetryPlease
[InlineData(unchecked((int)0x80131500), true, false, AlarmHistorianWriteOutcome.RetryPlease)]
// non-zero hresult, no comm flag, no malformed → conservative RetryPlease
[InlineData(unchecked((int)0x80131500), false, false, AlarmHistorianWriteOutcome.RetryPlease)]
// any malformed input → PermanentFail regardless of hresult
[InlineData(unchecked((int)0x80131500), true, true, AlarmHistorianWriteOutcome.PermanentFail)]
public void MapOutcome_table(int hresult, bool isCommunicationError, bool isMalformedInput, AlarmHistorianWriteOutcome expected)
{
AahClientManagedAlarmEventWriter
.MapOutcome(hresult, isCommunicationError, isMalformedInput)
.ShouldBe(expected);
}
private static AlarmHistorianEventDto Event(string id) => new AlarmHistorianEventDto
{
EventId = id,
SourceName = "Tank01",
ConditionId = "Tank01.Level.HiHi",
AlarmType = "AnalogLimitAlarm.HiHi",
Message = "Tank 01 high-high level",
Severity = 750,
EventTimeUtcTicks = DateTime.UtcNow.Ticks,
AckComment = null,
};
private sealed class RecordingBackend : IAlarmHistorianWriteBackend
{
private readonly Func<AlarmHistorianEventDto[], AlarmHistorianWriteOutcome[]> _produce;
public int Calls { get; private set; }
public RecordingBackend(Func<AlarmHistorianEventDto[], AlarmHistorianWriteOutcome[]> produce)
{
_produce = produce;
}
public Task<AlarmHistorianWriteOutcome[]> WriteBatchAsync(
AlarmHistorianEventDto[] events, CancellationToken cancellationToken)
{
Calls++;
return Task.FromResult(_produce(events));
}
}
}
}
@@ -0,0 +1,82 @@
using System;
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests
{
/// <summary>
/// PR C.2 — pins the env-var contract that gates whether the sidecar boots an
/// alarm-event writer. Default-on (when the historian itself is enabled) so a
/// fresh deploy picks up the writer without a service-config edit; explicit
/// <c>false</c> opts a read-only deployment out.
/// </summary>
[Trait("Category", "Unit")]
public sealed class ProgramAlarmWriterTests
{
[Fact]
public void BuildAlarmWriter_returns_writer_when_env_unset()
{
using var _ = ScopedEnv("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED", null);
var writer = Program.BuildAlarmWriter();
writer.ShouldNotBeNull();
writer.ShouldBeOfType<AahClientManagedAlarmEventWriter>();
}
[Theory]
[InlineData("true")]
[InlineData("True")]
[InlineData("TRUE")]
public void BuildAlarmWriter_returns_writer_when_env_explicitly_true(string value)
{
using var _ = ScopedEnv("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED", value);
var writer = Program.BuildAlarmWriter();
writer.ShouldNotBeNull();
}
[Theory]
[InlineData("false")]
[InlineData("False")]
[InlineData("FALSE")]
public void BuildAlarmWriter_returns_null_when_env_false(string value)
{
using var _ = ScopedEnv("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED", value);
var writer = Program.BuildAlarmWriter();
writer.ShouldBeNull();
}
[Fact]
public void BuildAlarmWriter_treats_unrecognized_value_as_enabled()
{
// Anything other than the literal "false" (case-insensitive) keeps the writer
// wired — fail-open under accidental misconfiguration so an alarm-write deploy
// doesn't silently lose alarms because of a typo.
using var _ = ScopedEnv("OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED", "yes");
var writer = Program.BuildAlarmWriter();
writer.ShouldNotBeNull();
}
private static IDisposable ScopedEnv(string name, string? value)
{
var prior = Environment.GetEnvironmentVariable(name);
Environment.SetEnvironmentVariable(name, value);
return new DisposableAction(() => Environment.SetEnvironmentVariable(name, prior));
}
private sealed class DisposableAction : IDisposable
{
private readonly Action _action;
public DisposableAction(Action action) { _action = action; }
public void Dispose() => _action();
}
}
}
@@ -0,0 +1,72 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests.Alarms;
/// <summary>
/// PR B.3 — pins the routing decision DriverNodeManager makes when registering
/// an AlarmConditionState: drivers that implement <see cref="IAlarmSource"/>
/// get an acknowledger that calls AcknowledgeAsync (driver-native path); drivers
/// that don't fall back to the IWritable sub-attribute write.
/// </summary>
[Trait("Category", "Unit")]
public sealed class DriverAlarmSourceAcknowledgerRoutingTests
{
[Fact]
public void Driver_with_IAlarmSource_is_recognized()
{
IDriver driver = new FakeDriverWithAlarmSource("drv-1");
(driver is IAlarmSource).ShouldBeTrue(
"fakes that participate in the routing-test fixture must report IAlarmSource");
}
[Fact]
public void Driver_without_IAlarmSource_falls_to_writable_path()
{
IDriver driver = new FakeDriverNoAlarmSource("drv-2");
(driver is IAlarmSource).ShouldBeFalse(
"drivers without IAlarmSource take the legacy DriverWritableAcknowledger path");
}
private sealed class FakeDriverWithAlarmSource(string id) : IDriver, IAlarmSource
{
public string DriverInstanceId { get; } = id;
public string DriverType => "FakeAlarmSource";
public Task InitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ReinitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, DateTime.UtcNow, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
public Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken)
=> Task.FromResult<IAlarmSubscriptionHandle>(new FakeHandle("h"));
public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
=> Task.CompletedTask;
public Task AcknowledgeAsync(
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)
=> Task.CompletedTask;
public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
private void NoUnusedWarning() => OnAlarmEvent?.Invoke(this, null!);
}
private sealed class FakeDriverNoAlarmSource(string id) : IDriver
{
public string DriverInstanceId { get; } = id;
public string DriverType => "FakeNoAlarmSource";
public Task InitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ReinitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, DateTime.UtcNow, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
}
private sealed class FakeHandle(string id) : IAlarmSubscriptionHandle
{
public string DiagnosticId { get; } = id;
}
}
@@ -0,0 +1,122 @@
using Shouldly;
using Xunit;
using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
using ZB.MOM.WW.OtOpcUa.Server.Phase7;
namespace ZB.MOM.WW.OtOpcUa.Server.Tests.Phase7;
/// <summary>
/// PR B.4 — pins the precedence order Phase7Composer uses to pick an
/// <see cref="IAlarmHistorianWriter"/>:
/// driver-provided > DI-registered > none. Driver wins so a future
/// GalaxyDriver-as-IAlarmHistorianWriter takes the write path directly,
/// preserving the v1 invariant where a driver that natively owns the
/// historian client doesn't bounce through the sidecar IPC.
/// </summary>
[Trait("Category", "Unit")]
public sealed class Phase7ComposerWriterSelectionTests
{
[Fact]
public async Task No_driver_no_injected_writer_returns_null()
{
await using var host = new DriverHost();
var writer = Phase7Composer.SelectAlarmHistorianWriter(host, injectedWriter: null, out var source);
writer.ShouldBeNull();
source.ShouldBeNull();
}
[Fact]
public async Task Injected_writer_only_is_selected()
{
await using var host = new DriverHost();
var injected = new RecordingWriter("from-di");
var writer = Phase7Composer.SelectAlarmHistorianWriter(host, injected, out var source);
writer.ShouldBeSameAs(injected);
source.ShouldStartWith("di:");
}
[Fact]
public async Task Driver_writer_wins_over_injected()
{
await using var host = new DriverHost();
var driver = new FakeDriverWithWriter("drv-1", "drv-out");
await host.RegisterAsync(driver, driverConfigJson: "{}", CancellationToken.None);
var injected = new RecordingWriter("from-di");
var writer = Phase7Composer.SelectAlarmHistorianWriter(host, injected, out var source);
writer.ShouldBeSameAs(driver);
source.ShouldBe("driver:drv-1");
}
[Fact]
public async Task First_driver_implementing_writer_wins()
{
await using var host = new DriverHost();
var driverNoWriter = new FakeDriverWithoutWriter("drv-1");
var driverWithWriter = new FakeDriverWithWriter("drv-2", "drv-out");
await host.RegisterAsync(driverNoWriter, "{}", CancellationToken.None);
await host.RegisterAsync(driverWithWriter, "{}", CancellationToken.None);
var writer = Phase7Composer.SelectAlarmHistorianWriter(host, injectedWriter: null, out var source);
writer.ShouldBeSameAs(driverWithWriter);
source.ShouldBe("driver:drv-2");
}
private sealed class RecordingWriter : IAlarmHistorianWriter
{
public string Tag { get; }
public RecordingWriter(string tag) { Tag = tag; }
public Task<IReadOnlyList<HistorianWriteOutcome>> WriteBatchAsync(
IReadOnlyList<AlarmHistorianEvent> batch, CancellationToken cancellationToken)
{
var outcomes = new HistorianWriteOutcome[batch.Count];
for (var i = 0; i < outcomes.Length; i++) outcomes[i] = HistorianWriteOutcome.Ack;
return Task.FromResult<IReadOnlyList<HistorianWriteOutcome>>(outcomes);
}
}
private sealed class FakeDriverWithoutWriter : IDriver
{
public FakeDriverWithoutWriter(string id) { DriverInstanceId = id; }
public string DriverInstanceId { get; }
public string DriverType => "FakeNoWriter";
public Task InitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ReinitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, DateTime.UtcNow, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
}
private sealed class FakeDriverWithWriter : IDriver, IAlarmHistorianWriter
{
private readonly RecordingWriter _writer;
public FakeDriverWithWriter(string id, string tag)
{
DriverInstanceId = id;
_writer = new RecordingWriter(tag);
}
public string DriverInstanceId { get; }
public string DriverType => "FakeWithWriter";
public Task InitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ReinitializeAsync(string c, CancellationToken ct) => Task.CompletedTask;
public Task ShutdownAsync(CancellationToken ct) => Task.CompletedTask;
public DriverHealth GetHealth() => new(DriverState.Healthy, DateTime.UtcNow, null);
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken ct) => Task.CompletedTask;
public Task<IReadOnlyList<HistorianWriteOutcome>> WriteBatchAsync(
IReadOnlyList<AlarmHistorianEvent> batch, CancellationToken cancellationToken)
=> _writer.WriteBatchAsync(batch, cancellationToken);
}
}