lmxopcua/docs/plans/2026-06-27-otopcua-historian-followups.md
Joseph Doherty 10a6ac6f3e
docs(historian-gateway): note FU-3 alias handling (review fix) in follow-up plan
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-27 00:57:14 -04:00

# OtOpcUa ↔ HistorianGateway — Follow-up & Deferred Items
**Status:** the 21-task integration (`feat/historian-gateway-backend`, Gitea PR
[#423](https://gitea.dohertylan.com/dohertj2/lmxopcua/pulls/423)) + the continuous-historization
ref-feed are complete and **live-validated** against `wonder-sql-vd03`. The offline suite is green;
the live `Category=LiveIntegration` suite is green (read ✅, write-persist ✅, alarm-send ✅,
alarm-readback ⏭ skip). This doc tracks everything deliberately deferred or surfaced during
validation, with the **owning repo** for each.
> **Execution update (2026-06-27 — this follow-up pass):**
> - **FU-1 — RESOLVED as a documented protocol limitation** (NOT a fixable gateway bug): the captured
> CM_EVENT event-send wire never carries `SourceName`, so `Source_Object` cannot be populated by the
> gateway. Recorded as `pending.md` **C4** + a CLAUDE.md note in the HistorianGateway repo (commit
> `174a4a9` on `fix/gateway-otopcua-followups`). The OtOpcUa live test stays skipped with the corrected
> reason. See FU-1 below for the (now-confirmed) root cause.
> - **FU-2 — ✅ DONE + live-validated** in HistorianGateway (`fix/gateway-otopcua-followups`, commits
> `150868c` + `1c2d11d`). The SQL live-write path converts UTC→server-local in-SQL via
> `DATEADD(MINUTE, DATEPART(TZOFFSET, SYSDATETIMEOFFSET()), @dt)`; an explicit-timestamp round-trip is
> now EXACT against the live historian (delta 00:00:00).
> - **FU-3 — ✅ DONE** in OtOpcUa (this branch, commit `111adc92`): `HistorizedTagRef(MuxRef, HistorianName)`
> carried through the sink/recorder; interest registered by mux ref, values written under the historian
> name. Recorder + applier tests green.
> - **FU-4 — ✅ DONE** in OtOpcUa (this branch, commit `b2276b5b`).
> - **FU-5** — still pre-existing/not-ours (tracked below). **FU-6** — still pending the merges.
**Live-validation harness recap (how to reproduce any of the live findings below):** run the
HistorianGateway locally against the live historian, then point the OtOpcUa live tests (or `grpcurl`)
at it. The gateway boots from env-var config (secrets from `~/.zshenv`):
```
ASPNETCORE_ENVIRONMENT=Development
Historian__Host=$HISTORIAN_GRPC_HOST Historian__Port=32565 Historian__GrpcUseTls=true
Historian__UserName=$HISTORIAN_USER Historian__Password=$HISTORIAN_PASSWORD
Historian__AllowUntrustedServerCertificate=true
Galaxy__ConnectionString=$GALAXY_SQL_CONNECTION
RuntimeDb__Enabled=true RuntimeDb__EventReadsEnabled=true
RuntimeDb__ConnectionString="Server=$HISTORIAN_GRPC_HOST;Database=Runtime;User Id=$HISTORIAN_SQL_USER;Password=$HISTORIAN_SQL_PASSWORD;TrustServerCertificate=true;Encrypt=false"
ApiKeys__Mode=Disabled
# dotnet run the Server → gRPC h2c on localhost:5221, HTTP on :5220 (/healthz, /health/ready)
```
OtOpcUa live tests then read `HISTGW_GATEWAY_ENDPOINT=http://localhost:5221` +
`HISTGW_GATEWAY_APIKEY=<any>` + `HISTGW_TEST_TAG`/`HISTGW_WRITE_SANDBOX_TAG`/`HISTGW_ALARM_SOURCE`.
Direct SQL: `Runtime.dbo.Events` is an **INSQL linked-server view that rejects untimed queries**, so
always include an `EventTimeUtc` range. Connect with `sqlcmd -S $HISTORIAN_GRPC_HOST -d Runtime -U $HISTORIAN_SQL_USER -C`
(password via `SQLCMDPASSWORD`).
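The "always include an `EventTimeUtc` range" rule can be enforced in a small query builder. The sketch below is illustrative only: `build_events_query` is a hypothetical helper (not part of either repo), the `?` placeholders assume an ODBC-style driver, and the half-open window is an assumption. Only the view name and the `EventTimeUtc`, `Type` and `Source_Object` columns come from this document.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def build_events_query(start_utc: datetime, end_utc: datetime,
                       source: Optional[str] = None) -> Tuple[str, List[object]]:
    """Return (sql, params) for a time-bounded read of Runtime.dbo.Events.

    Hypothetical helper: it refuses to build an untimed query, because the
    view rejects those.
    """
    if start_utc.tzinfo is None or end_utc.tzinfo is None:
        raise ValueError("pass timezone-aware UTC datetimes")
    if end_utc <= start_utc:
        raise ValueError("empty or inverted time window")

    # Half-open window [start, end) on EventTimeUtc (assumed convention).
    sql = ("SELECT EventTimeUtc, Type, Source_Object "
           "FROM Runtime.dbo.Events "
           "WHERE EventTimeUtc >= ? AND EventTimeUtc < ?")
    params: List[object] = [start_utc, end_utc]

    # Optional source filter, mirroring the Source_Object match described in FU-1.
    if source is not None:
        sql += " AND Source_Object = ?"
        params.append(source)
    return sql, params


window_start = datetime(2026, 6, 25, tzinfo=timezone.utc)
window_end = datetime(2026, 6, 27, tzinfo=timezone.utc)
sql, params = build_events_query(window_start, window_end, "TableAlarms_006")
```

Passing the window as parameters rather than formatting it into the SQL text keeps the time bounds mandatory and avoids quoting problems with source names.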
---
## Priority 1 — Gateway-side bugs that block OtOpcUa write/read use cases
**Owning repo: `~/Desktop/HistorianGateway` (HistorianGateway).** OtOpcUa code is correct for both;
these are gateway defects that gate the "write OtOpcUa's own data, read it back" use case.
### FU-1 — `SendEvent` does not populate `Source_Object` — ✅ RESOLVED as a documented protocol limitation (2026-06-27)
> **Outcome:** root-caused and confirmed **not fixable at the gateway** — the captured CM_EVENT event-send
> wire (`HistorianEventWriteProtocol.SerializeEventValueBlob`) serializes Namespace/Type/properties but
> **never `SourceName`** (the gateway threads it correctly; the wire drops it). `Source_Object` is a
> Galaxy-platform association for object-raised events. Documented as `pending.md` **C4** + a CLAUDE.md note
> in HistorianGateway; likely won't-fix (would need new wire-capture evidence in `histsdk` — vendored
> sources aren't hand-edited). The "Investigation/Proposed fix" below is retained for the record; option 1
> is now known to be infeasible.
**Symptom (live-proven):** OtOpcUa's `GatewayAlarmHistorianWriter.SendEvent` of an event with
`source_name="HistGW.LiveTest.AlarmSource"` **acks** and **lands in `Runtime.dbo.Events`** with the
correct `Type` (`LimitAlarm`) and `EventTimeUtc` (no shift) — but with **`Source_Object = NULL`** (and
all other `Source_*`/`Provider_*` columns null). The gateway's `SqlEventReader` filters
`WHERE Source_Object = @source`, so a source-filtered `ReadEvents` of a just-sent event returns 0.
**What works (so this is narrow, not "C2 won't-fix"):**
- Time-only `ReadEvents` (no source filter) returns events (50 in a 2-day window during validation).
- Source-filtered `ReadEvents` for a **real Galaxy event source** (`TableAlarms_006`) returns its
history (`System.Deploy`/`Undeploy`/`Alarm.Set`, each with `source_name` populated). So the SQL
reader + source filter are functional; only **ad-hoc SendEvents lack a `Source_Object`.**
- **Reading existing Galaxy alarm/event history by source already works** (the mxaccessgw read use
case). Only round-tripping OtOpcUa's *own* sends by source is blocked.
**Investigation (gateway repo):**
- Read the v8 event-send path: `RegisterCmEventTag` + the `ConnectionType=Event` send (CM_EVENT). Find
where the event's source/tag is set on the wire payload and whether the historian maps any send-side
field → the `Events.Source_Object` column. Start at the gateway `SendEvent` service + the vendored
`AVEVA.Historian.Client` event session (`HistorianEventSession`), and the
`event-session-reuse-spike` notes in `../histsdk/docs/reverse-engineering/`.
- Determine whether the historian's CM_EVENT API even *allows* setting a `Source_Object` for an event
not raised by a Galaxy object. If the source must be a registered event-tag/source name, decide how
OtOpcUa's `EquipmentPath` should map to it.
**Proposed fix (one of):**
1. If the send payload has a source/tag field that maps to `Source_Object`: populate it from the event's
`source_name` in the gateway `SendEvent` handler. (Preferred — makes write-back-by-source work.)
2. If the historian cannot carry a source for ad-hoc events: document it, and have the gateway's
`SqlEventReader` optionally match the source in a fallback column the send *does* populate (if any),
or expose a "read all events in window, filter client-side" mode. Update OtOpcUa's
`GatewayHistorianDataSource.ReadEventsAsync` defensive client-side source filter accordingly (it
currently drops events whose mapped `SourceName` ≠ requested source — which would also drop
source-less sends even if the server returned them).
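The client-side filter behaviour noted in option 2 can be illustrated with a minimal sketch. `EventSketch` and `filter_by_source` are hypothetical names used only for this example; this is not the C# code in `GatewayHistorianDataSource.ReadEventsAsync`, just a model of a strict "mapped source must equal requested source" filter, with `source_name=None` standing in for `Source_Object = NULL`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EventSketch:
    """Hypothetical stand-in for a mapped historian event."""
    event_type: str
    source_name: Optional[str]  # None models Source_Object = NULL


def filter_by_source(events: List[EventSketch], requested: str) -> List[EventSketch]:
    """Strict defensive filter: keep only events whose source equals the request."""
    return [e for e in events if e.source_name == requested]


events = [
    EventSketch("Alarm.Set", "TableAlarms_006"),  # Galaxy-raised, source populated
    EventSketch("LimitAlarm", None),              # ad-hoc send, no Source_Object
]

galaxy_hits = filter_by_source(events, "TableAlarms_006")
adhoc_hits = filter_by_source(events, "HistGW.LiveTest.AlarmSource")
```

Under this model the Galaxy-sourced event survives the filter while the ad-hoc send is dropped even if the server returns it, which is why a "read all events in window" server mode alone would not make the round-trip pass.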
**Acceptance:** an OtOpcUa `SendEvent(source=X)` is readable back via `ReadEvents(source=X)` within the
window. Then **un-skip** `Alarm_SendEvent_then_ReadEvents` in
`tests/Drivers/.../Live/GatewayLiveIntegrationTests.cs` (it currently `Assert.Skip`s on a 0-result with
the accurate reason).
### FU-2 — `WriteLiveValues` shifts an explicit timestamp by the local↔UTC offset (~+4h) — ✅ DONE + live-validated (2026-06-27)
> **Outcome:** fixed in HistorianGateway (`fix/gateway-otopcua-followups`). The SQL live-write path now
> converts UTC→server-local in-SQL via `DATEADD(MINUTE, DATEPART(TZOFFSET, SYSDATETIMEOFFSET()), @dt)` (a
> single atomic offset read). An explicit-timestamp round-trip (real SQL write → gateway UTC ReadRaw) is now
> EXACT against the live 2023 R2 historian (delta 00:00:00); offline unit test locks the exact conversion
> expression. The OtOpcUa live write test can now be tightened (see acceptance).
**Symptom (live-proven, reproduces via raw `grpcurl` — no OtOpcUa code involved):** a `WriteLiveValues`
with an **explicit** `timestamp=2026-06-27T03:45:00Z` lands in the historian at
`2026-06-27T07:45:00Z` (+4h = the deployment's local↔UTC delta). A **server-stamped** write (null
timestamp) lands correctly at the gateway's UTC now. The OtOpcUa value-writer sends correct UTC
(`Timestamp.FromDateTime(SpecifyKind(ts, Utc))`), so the shift is in the gateway's SQL write path.
**Impact:** the continuous-historization recorder writes the driver's **source** timestamp (explicit),
so historized values would carry timestamps offset by the host's UTC offset until fixed. (The OtOpcUa
live write test currently uses a ±12h tz-tolerant readback window to validate *persistence* around
this — see FU-2 acceptance.)
**Investigation (gateway repo):** `SqlLiveValueWriter` (the `aaAnalogTagInsert` + `INSERT INTO History`
path). Inspect which `History` DateTime column is written (local vs `*UTC`) and the conversion applied
to the incoming proto UTC `Timestamp`. The +4h (value lands *later* than supplied UTC) is consistent
with writing a UTC value into a **local** column that `ReadRaw` then converts local→UTC, on a server
whose offset is 4h (EDT). Compare against the **server-stamped** path (which is correct) to see what
conversion the explicit path is missing.
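The arithmetic behind that hypothesis can be checked with a short model. This is a sketch under stated assumptions, not the gateway's code: the server offset is fixed at -240 minutes (EDT), the stored column is treated as server-local, and the read path converts local→UTC. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed server offset from UTC in minutes (EDT). On the real server this is
# the kind of value DATEPART(TZOFFSET, SYSDATETIMEOFFSET()) reports.
SERVER_OFFSET_MINUTES = -240


def read_back_as_utc(stored_local: datetime) -> datetime:
    """Model the read path: the stored value is server-local, convert to UTC."""
    return stored_local - timedelta(minutes=SERVER_OFFSET_MINUTES)


def write_unconverted(ts_utc: datetime) -> datetime:
    """Model the defect: the UTC value is stored as-is in a local-time column."""
    return ts_utc


def write_converted(ts_utc: datetime) -> datetime:
    """Model the fix: shift UTC to server-local before the insert."""
    return ts_utc + timedelta(minutes=SERVER_OFFSET_MINUTES)


supplied = datetime(2026, 6, 27, 3, 45, 0)  # the explicit UTC timestamp from the symptom

shifted = read_back_as_utc(write_unconverted(supplied))
exact = read_back_as_utc(write_converted(supplied))

print(shifted)  # 2026-06-27 07:45:00, the observed +4h shift
print(exact)    # 2026-06-27 03:45:00, round-trips unchanged
```

The model reproduces the observed 03:45Z → 07:45Z shift and shows that adding the (negative) server offset before the insert cancels the read-side conversion.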
**Proposed fix:** convert the supplied UTC timestamp to the historian server's local time before the
`History` insert (or write the UTC-typed column), so an explicit UTC timestamp round-trips unchanged.
Add a gateway unit/live test: write explicit `T`, read back, assert the sample timestamp == `T`.
**Acceptance:** an explicit-timestamp `WriteLiveValues` reads back at the supplied UTC time. Then
**tighten** the OtOpcUa live write test (`Write_then_read_on_sandbox_tag`) back to a narrow recent
window anchored on the write time.
---
## Priority 2 — OtOpcUa-side follow-ups
**Owning repo: `~/Desktop/OtOpcUa` (this repo).**
### FU-3 — Continuous-historization `HistorianTagname` override edge case — ✅ DONE (2026-06-27, commit `111adc92`)
> **Outcome:** implemented the "carry both identifiers" fix below. A new `HistorizedTagRef(MuxRef,
> HistorianName)` record threads through `IHistorizedTagSubscriptionSink` → the recorder; the recorder keeps
> a **muxRef → SET-of-historian-names** map, registers/filters mux interest by `MuxRef` (= driver `FullName`)
> but writes under every `HistorianName` (override-or-FullName) sharing that ref. The applier resolves both.
> The set (not a single name) closes a code-review **Critical**: one driver ref can back several historized
> equipment tags via aliasing (identical machines sharing a register), each with its own override. A single
> fan-out must write ALL of them, not silently drop all but one. Tests: divergent-override,
> aliased-refs-each-get-the-value, remove-one-alias-keeps-the-ref, override-rename updates the write
> target without mux churn;
> applier feed tests assert the full pairs. Commits `111adc92` + `60695179` (review fix).
The `ContinuousHistorizationRecorder` registers `DependencyMuxActor` interest **by the resolved
historian name** (`HistorianTagname` override else `FullName`) — the same key the EnsureTags hook and
the writer use. The mux fans `DependencyValueChanged` **keyed by `FullReference`** (the driver's
published ref). In the **common case (no override)** historian-name == `FullReference`, so it's fully
consistent and works (live-validated path is the value writer; mux fan-out is the recorder's input).
**When a `HistorianTagname` override is set** (override ≠ `FullReference`), the recorder registers
interest under a key the mux never fans → that tag's values are never captured.
**Fix:** register mux interest by `FullReference` (the mux key) while writing to the historian
under the resolved historian name, i.e. carry both identifiers through `IHistorizedTagSubscriptionSink`
/ the recorder (a `(muxRef, historianName)` pair) instead of a single string. Add a recorder test with
a divergent override. **Low urgency** (overrides are uncommon); only matters for non-Galaxy historized
tags that set an explicit `HistorianTagname`.
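The "carry both identifiers" idea, including the muxRef → set-of-historian-names map, can be sketched as follows. This is a language-neutral illustration in Python, not the C# recorder: `RecorderSketch`, `update_refs`, `on_value_changed` and the tag names are hypothetical, and the register/unregister log stands in for `DependencyMuxActor` interest.

```python
from typing import Dict, Iterable, List, Set, Tuple

TagRef = Tuple[str, str]  # (mux_ref, historian_name); mirrors HistorizedTagRef


class RecorderSketch:
    """Hypothetical model: mux interest keyed by mux_ref, writes keyed by name."""

    def __init__(self) -> None:
        self._names: Dict[str, Set[str]] = {}        # mux_ref -> historian names
        self.mux_calls: List[Tuple[str, str]] = []   # ("register"|"unregister", mux_ref)

    def update_refs(self, added: Iterable[TagRef], removed: Iterable[TagRef]) -> None:
        before = set(self._names)
        # Apply additions first so a rename (remove old, add new) never empties
        # the set in between and therefore never churns mux interest.
        for mux_ref, name in added:
            self._names.setdefault(mux_ref, set()).add(name)
        for mux_ref, name in removed:
            names = self._names.get(mux_ref)
            if names is None:
                continue
            names.discard(name)
            if not names:
                del self._names[mux_ref]
        after = set(self._names)
        # Only touch the mux when a ref appears or disappears entirely.
        for ref in sorted(after - before):
            self.mux_calls.append(("register", ref))
        for ref in sorted(before - after):
            self.mux_calls.append(("unregister", ref))

    def on_value_changed(self, mux_ref: str, value: float) -> List[Tuple[str, float]]:
        """Return one write per historian name sharing this mux ref."""
        return [(name, value) for name in sorted(self._names.get(mux_ref, ()))]


rec = RecorderSketch()
# Two aliased equipment tags backed by one driver ref, each with its own name.
rec.update_refs([("PLC1.HR100", "Line1.Temp"), ("PLC1.HR100", "Line2.Temp")], [])
both = rec.on_value_changed("PLC1.HR100", 21.5)

# Removing one alias keeps the mux ref registered.
rec.update_refs([], [("PLC1.HR100", "Line2.Temp")])
# Renaming the override changes the write target without mux churn.
rec.update_refs([("PLC1.HR100", "Line1.TempC")], [("PLC1.HR100", "Line1.Temp")])
renamed = rec.on_value_changed("PLC1.HR100", 22.0)
```

Keeping a set per mux ref is what lets one value change fan out to every aliased historian name, and diffing the key sets before and after an update is one simple way to avoid re-registering interest on a rename.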
### FU-4 — `AlarmHistorianOptions.Validate()` `MaxAttempts<=0` test coverage (minor) — ✅ DONE (2026-06-27, commit `b2276b5b`)
T19 pruned the Wonderware-shaped fields and reworked `AlarmHistorianRegistrationTests`. The
`MaxAttempts <= 0` warning branch in `AlarmHistorianOptions.Validate()` is exercised in prod but not
covered by a test (the sibling warnings for `DrainIntervalSeconds`/`Capacity`/`DeadLetterRetentionDays`
are). Add a `Validate_warns_on_non_positive_max_attempts` case. Trivial.
### FU-5 — Pre-existing `Host.IntegrationTests` failure (NOT ours — track separately)
`EquipmentNamespaceMaterializationTests.Deploying_an_equipment_namespace_carries_the_signal_into_the_artifact`
fails (`Rejected` vs expected `Accepted`) on a **Modbus-only** namespace via `DraftValidator`/
`ConfigComposer` — untouched by this branch. **Verified failing identically on `master`** (via
`git stash`). Environment/pre-existing; out of scope for the historian work but worth a separate ticket.
---
## Priority 3 — Cross-repo propagation (after merges)
- **FU-6 — scadaproj index + agent memory.** When PR #423 merges (and the Plan 1 client PR), update
`../scadaproj/CLAUDE.md` (the HistorianGateway + OtOpcUa entries) and the agent memory notes
(`otopcua-historian-backend`, `scadaproj-umbrella`) to record: OtOpcUa now consumes
`ZB.MOM.WW.HistorianGateway.Client` as its historian backend; the Wonderware historian driver was
retired; the two gateway follow-ups (FU-1/FU-2). Per the CLAUDE.md cross-repo propagation rule.
---
## Already resolved this effort (for the record — do NOT redo)
- **Alarm SendEvent event-id bug** — `AlarmEventMapper` set the wire `Id` → gateway handler throws →
every alarm send `PermanentFail`. **Fixed** (`44644ddc`): leave `Id` unset, carry the id as an
`AlarmId` property. Live-validated (send acks).
- **Continuous-historization ref-feed gap** — recorder spawned with an empty ref set. **Closed**
(`2982cc4b`): `IHistorizedTagSubscriptionSink` + recorder `UpdateHistorizedRefs(added, removed)`
converges mux interest on each `AddressSpaceApplier.Apply()`.
- **Read path / use case 1** — live-validated PASS (ReadRaw through `GatewayHistorianDataSource`).
- **C2 mis-attribution** — the alarm readback-0 was NOT the "C2 server-gated event reads" limitation;
the SQL reader works (see FU-1).