lmxopcua/docs/plans/2026-06-27-otopcua-historian-followups.md
Joseph Doherty 00cc1da362 docs(historian-gateway): mark follow-up plan status — FU-1 documented-limitation, FU-2/3/4 done
Record the execution outcomes in the follow-up plan: FU-1 resolved as a documented
protocol limitation (gateway pending.md C4; not fixable without histsdk wire-capture
evidence), FU-2 done + live-validated (exact round-trip), FU-3 done (mux-ref vs
historian-name decoupled via HistorizedTagRef), FU-4 done. FU-5 (pre-existing Modbus
failure) and FU-6 (post-merge propagation) remain tracked.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-27 00:45:19 -04:00


OtOpcUa ↔ HistorianGateway — Follow-up & Deferred Items

Status: the 21-task integration (feat/historian-gateway-backend, Gitea PR #423) + the continuous-historization ref-feed are complete and live-validated against wonder-sql-vd03. The offline suite is green; the live Category=LiveIntegration suite is green (read: pass, write-persist: pass, alarm-send: pass, alarm-readback: skipped). This doc tracks everything deliberately deferred or surfaced during validation, with the owning repo for each.

Execution update (2026-06-27 — this follow-up pass):

  • FU-1 — RESOLVED as a documented protocol limitation (NOT a fixable gateway bug): the captured CM_EVENT event-send wire never carries SourceName, so Source_Object cannot be populated by the gateway. Recorded as pending.md C4 + a CLAUDE.md note in the HistorianGateway repo (commit 174a4a9 on fix/gateway-otopcua-followups). The OtOpcUa live test stays skipped with the corrected reason. See FU-1 below for the (now-confirmed) root cause.
  • FU-2 — DONE + live-validated in HistorianGateway (fix/gateway-otopcua-followups, commits 150868c + 1c2d11d). The SQL live-write path converts UTC→server-local in-SQL via DATEADD(MINUTE, DATEPART(TZOFFSET, SYSDATETIMEOFFSET()), @dt); an explicit-timestamp round-trip is now EXACT against the live historian (delta 00:00:00).
  • FU-3 — DONE in OtOpcUa (this branch, commit 111adc92): HistorizedTagRef(MuxRef, HistorianName) carried through the sink/recorder; interest registered by mux ref, values written under the historian name. Recorder + applier tests green.
  • FU-4 — DONE in OtOpcUa (this branch, commit b2276b5b).
  • FU-5 — still pre-existing/not-ours (tracked below). FU-6 — still pending the merges.

Live-validation harness recap (how to reproduce any of the live findings below): run the HistorianGateway locally against the live historian, then point the OtOpcUa live tests (or grpcurl) at it. The gateway boots from env-var config (secrets from ~/.zshenv):

ASPNETCORE_ENVIRONMENT=Development
Historian__Host=$HISTORIAN_GRPC_HOST  Historian__Port=32565  Historian__GrpcUseTls=true
Historian__UserName=$HISTORIAN_USER   Historian__Password=$HISTORIAN_PASSWORD
Historian__AllowUntrustedServerCertificate=true
Galaxy__ConnectionString=$GALAXY_SQL_CONNECTION
RuntimeDb__Enabled=true  RuntimeDb__EventReadsEnabled=true
RuntimeDb__ConnectionString="Server=$HISTORIAN_GRPC_HOST;Database=Runtime;User Id=$HISTORIAN_SQL_USER;Password=$HISTORIAN_SQL_PASSWORD;TrustServerCertificate=true;Encrypt=false"
ApiKeys__Mode=Disabled
# dotnet run the Server → gRPC h2c on localhost:5221, HTTP on :5220 (/healthz, /health/ready)

The OtOpcUa live tests then read HISTGW_GATEWAY_ENDPOINT=http://localhost:5221, HISTGW_GATEWAY_APIKEY=<any>, and HISTGW_TEST_TAG/HISTGW_WRITE_SANDBOX_TAG/HISTGW_ALARM_SOURCE. For direct SQL, connect with sqlcmd -S $HISTORIAN_GRPC_HOST -d Runtime -U $HISTORIAN_SQL_USER -C (password via SQLCMDPASSWORD). Runtime.dbo.Events is an INSQL linked-server view that rejects untimed queries, so always include an EventTimeUtc range.
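Because the Events view rejects untimed queries, it can help to make the time range impossible to forget when scripting ad-hoc reads. The following is a minimal Python sketch, not part of either repo: build_events_query is a hypothetical helper, the "?" placeholders assume a qmark-style driver, and only the column names (EventTimeUtc, Type, Source_Object) come from this document.

```python
from datetime import datetime, timezone


def build_events_query(start_utc, end_utc, source=None):
    """Return (sql, params) for a time-bounded read of Runtime.dbo.Events.

    Hypothetical helper: it refuses to build a query without an
    EventTimeUtc range, since the view rejects untimed queries.
    """
    if start_utc is None or end_utc is None:
        raise ValueError("an EventTimeUtc range is required")
    if start_utc.tzinfo is None or end_utc.tzinfo is None:
        raise ValueError("pass timezone-aware UTC datetimes")
    if start_utc >= end_utc:
        raise ValueError("start must be before end")

    sql = (
        "SELECT EventTimeUtc, Type, Source_Object "
        "FROM Runtime.dbo.Events "
        "WHERE EventTimeUtc >= ? AND EventTimeUtc < ?"
    )
    params = [start_utc, end_utc]
    if source is not None:
        # Optional source filter, same column the gateway reader filters on.
        sql += " AND Source_Object = ?"
        params.append(source)
    return sql, params


# Example: a two-day window, filtered to one Galaxy event source.
window_start = datetime(2026, 6, 26, tzinfo=timezone.utc)
window_end = datetime(2026, 6, 28, tzinfo=timezone.utc)
sql, params = build_events_query(window_start, window_end, "TableAlarms_006")
```

The half-open range (>= start, < end) is a design choice of the sketch so that adjacent windows do not double-count an event on the boundary.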


Priority 1 — Gateway-side bugs that block OtOpcUa write/read use cases

Owning repo: ~/Desktop/HistorianGateway (HistorianGateway). OtOpcUa code is correct for both; these are gateway defects that gate the "write OtOpcUa's own data, read it back" use case.

FU-1 — SendEvent does not populate Source_Object (RESOLVED as a documented protocol limitation, 2026-06-27)

Outcome: root-caused and confirmed not fixable at the gateway — the captured CM_EVENT event-send wire (HistorianEventWriteProtocol.SerializeEventValueBlob) serializes Namespace/Type/properties but never SourceName (the gateway threads it correctly; the wire drops it). Source_Object is a Galaxy-platform association for object-raised events. Documented as pending.md C4 + a CLAUDE.md note in HistorianGateway; likely won't-fix (would need new wire-capture evidence in histsdk — vendored sources aren't hand-edited). The "Investigation/Proposed fix" below is retained for the record; option 1 is now known to be infeasible.

Symptom (live-proven): OtOpcUa's GatewayAlarmHistorianWriter.SendEvent of an event with source_name="HistGW.LiveTest.AlarmSource" acks and lands in Runtime.dbo.Events with the correct Type (LimitAlarm) and EventTimeUtc (no shift) — but with Source_Object = NULL (and all other Source_*/Provider_* columns null). The gateway's SqlEventReader filters WHERE Source_Object = @source, so a source-filtered ReadEvents of a just-sent event returns 0.

What works (so this is narrow, not "C2 won't-fix"):

  • Time-only ReadEvents (no source filter) returns events (50 in a 2-day window during validation).
  • Source-filtered ReadEvents for a real Galaxy event source (TableAlarms_006) returns its history (System.Deploy/Undeploy/Alarm.Set, each with source_name populated). So the SQL reader + source filter are functional; only ad-hoc SendEvents lack a Source_Object.
  • Reading existing Galaxy alarm/event history by source already works (the mxaccessgw read use case). Only round-tripping OtOpcUa's own sends by source is blocked.
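The three observations above follow directly from SQL NULL semantics: an equality filter never matches a NULL Source_Object. The sketch below reproduces the pattern with an in-memory SQLite table standing in for Runtime.dbo.Events. The table shape, timestamps, and the read_events helper are assumptions for illustration only; the real view and the gateway's SqlEventReader are not modelled beyond the WHERE Source_Object = @source filter.

```python
import sqlite3

# In-memory stand-in for Runtime.dbo.Events (illustrative columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Events (EventTimeUtc TEXT, Type TEXT, Source_Object TEXT)"
)
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?)",
    [
        # A Galaxy-raised event: Source_Object is populated.
        ("2026-06-26T12:00:00", "Alarm.Set", "TableAlarms_006"),
        # An ad-hoc SendEvent: lands with Source_Object = NULL.
        ("2026-06-27T03:45:00", "LimitAlarm", None),
    ],
)

WINDOW = ("2026-06-26T00:00:00", "2026-06-28T00:00:00")


def read_events(source=None):
    """Time-bounded read with an optional equality filter on Source_Object."""
    sql = "SELECT Type FROM Events WHERE EventTimeUtc >= ? AND EventTimeUtc < ?"
    params = list(WINDOW)
    if source is not None:
        sql += " AND Source_Object = ?"  # NULL never satisfies '='
        params.append(source)
    sql += " ORDER BY EventTimeUtc"
    return [row[0] for row in conn.execute(sql, params)]


time_only = read_events()                            # both events
galaxy = read_events("TableAlarms_006")              # the Galaxy event only
adhoc = read_events("HistGW.LiveTest.AlarmSource")   # nothing: source is NULL
```

In this model the ad-hoc event is visible to the time-only read and invisible to any source-filtered read, which matches the live behavior described above.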

Investigation (gateway repo):

  • Read the v8 event-send path: RegisterCmEventTag + the ConnectionType=Event send (CM_EVENT). Find where the event's source/tag is set on the wire payload and whether the historian maps any send-side field → the Events.Source_Object column. Start at the gateway SendEvent service + the vendored AVEVA.Historian.Client event session (HistorianEventSession), and the event-session-reuse-spike notes in ../histsdk/docs/reverse-engineering/.
  • Determine whether the historian's CM_EVENT API even allows setting a Source_Object for an event not raised by a Galaxy object. If the source must be a registered event-tag/source name, decide how OtOpcUa's EquipmentPath should map to it.

Proposed fix (one of):

  1. If the send payload has a source/tag field that maps to Source_Object: populate it from the event's source_name in the gateway SendEvent handler. (Preferred — makes write-back-by-source work.)
  2. If the historian cannot carry a source for ad-hoc events: document it, and have the gateway's SqlEventReader optionally match the source in a fallback column the send does populate (if any), or expose a "read all events in window, filter client-side" mode. Update OtOpcUa's GatewayHistorianDataSource.ReadEventsAsync defensive client-side source filter accordingly (it currently drops events whose mapped SourceName ≠ requested source — which would also drop source-less sends even if the server returned them).
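The client-side filter behavior mentioned in option 2 can be sketched as follows. This is a hypothetical Python model, not the code in GatewayHistorianDataSource.ReadEventsAsync: the Event type, filter_by_source, and the keep_sourceless flag are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str
    source_name: Optional[str]  # None models a source-less ad-hoc send


def filter_by_source(events, requested, keep_sourceless=False):
    """Keep events whose source matches `requested`.

    With keep_sourceless=False this models the current defensive filter:
    source-less events are dropped even if the server returned them.
    keep_sourceless=True is a hypothetical lenient mode.
    """
    kept = []
    for ev in events:
        if ev.source_name == requested:
            kept.append(ev)
        elif keep_sourceless and not ev.source_name:
            kept.append(ev)
    return kept


returned = [
    Event("Alarm.Set", "TableAlarms_006"),
    Event("LimitAlarm", None),
]
strict = filter_by_source(returned, "HistGW.LiveTest.AlarmSource")
lenient = filter_by_source(
    returned, "HistGW.LiveTest.AlarmSource", keep_sourceless=True
)
```

Note the trade-off in the lenient mode: a source-less event cannot be attributed to any particular sender, so it would be returned for every requested source.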

Acceptance: an OtOpcUa SendEvent(source=X) is readable back via ReadEvents(source=X) within the window. Then un-skip Alarm_SendEvent_then_ReadEvents in tests/Drivers/.../Live/GatewayLiveIntegrationTests.cs (it currently Assert.Skips on a 0-result with the accurate reason).

FU-2 — WriteLiveValues shifts an explicit timestamp by the local↔UTC offset (~+4h) — DONE + live-validated (2026-06-27)

Outcome: fixed in HistorianGateway (fix/gateway-otopcua-followups). The SQL live-write path now converts UTC→server-local in-SQL via DATEADD(MINUTE, DATEPART(TZOFFSET, SYSDATETIMEOFFSET()), @dt) (a single atomic offset read). An explicit-timestamp round-trip (real SQL write → gateway UTC ReadRaw) is now EXACT against the live 2023 R2 historian (delta 00:00:00); offline unit test locks the exact conversion expression. The OtOpcUa live write test can now be tightened (see acceptance).

Symptom (live-proven, reproduces via raw grpcurl — no OtOpcUa code involved): a WriteLiveValues with an explicit timestamp=2026-06-27T03:45:00Z lands in the historian at 2026-06-27T07:45:00Z (+4h = the deployment's local↔UTC delta). A server-stamped write (null timestamp) lands correctly at the gateway's UTC now. The OtOpcUa value-writer sends correct UTC (Timestamp.FromDateTime(SpecifyKind(ts, Utc))), so the shift is in the gateway's SQL write path.

Impact: the continuous-historization recorder writes the driver's source timestamp (explicit), so historized values would carry timestamps offset by the host's UTC offset until fixed. (The OtOpcUa live write test currently uses a ±12h tz-tolerant readback window to validate persistence around this — see FU-2 acceptance.)

Investigation (gateway repo): SqlLiveValueWriter (the aaAnalogTagInsert + INSERT INTO History path). Inspect which History DateTime column is written (local vs *UTC) and the conversion applied to the incoming proto UTC Timestamp. The +4h (value lands later than supplied UTC) is consistent with writing a UTC value into a local column that ReadRaw then converts local→UTC, on a server whose offset is 4h (EDT). Compare against the server-stamped path (which is correct) to see what conversion the explicit path is missing.
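The arithmetic behind that hypothesis, and behind the DATEADD-based fix, can be checked with a small Python model. It assumes a fixed server offset of UTC-4 (EDT), for which DATEPART(TZOFFSET, ...) reports -240 minutes; the function names are illustrative and do not correspond to gateway code.

```python
from datetime import datetime, timedelta, timezone

# Assumed server offset: EDT, i.e. UTC-4 (-240 minutes).
EDT = timezone(timedelta(hours=-4))


def read_raw(stored_local):
    """Reader model: treat the stored wall-clock value as server-local,
    then convert it to UTC."""
    return stored_local.replace(tzinfo=EDT).astimezone(timezone.utc)


def write_buggy(ts_utc):
    """Suspected defect: the UTC wall-clock value is stored unconverted
    in a column that holds server-local time."""
    return ts_utc.replace(tzinfo=None)


def write_fixed(ts_utc, tz_offset_minutes=-240):
    """Model of DATEADD(MINUTE, <tz offset in minutes>, @dt):
    shift the UTC value to server-local before storing it."""
    return ts_utc.replace(tzinfo=None) + timedelta(minutes=tz_offset_minutes)


supplied = datetime(2026, 6, 27, 3, 45, tzinfo=timezone.utc)

buggy = read_raw(write_buggy(supplied))  # 2026-06-27 07:45 UTC, i.e. +4h
fixed = read_raw(write_fixed(supplied))  # 2026-06-27 03:45 UTC, exact
```

With these assumptions the unconverted write reads back four hours late, matching the observed 03:45Z to 07:45Z shift, while the offset-adjusted write round-trips with a zero delta.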

Proposed fix: convert the supplied UTC timestamp to the historian server's local time before the History insert (or write the UTC-typed column), so an explicit UTC timestamp round-trips unchanged. Add a gateway unit/live test: write explicit T, read back, assert the sample timestamp == T.

Acceptance: an explicit-timestamp WriteLiveValues reads back at the supplied UTC time. Then tighten the OtOpcUa live write test (Write_then_read_on_sandbox_tag) back to a narrow recent window anchored on the write time.


Priority 2 — OtOpcUa-side follow-ups

Owning repo: ~/Desktop/OtOpcUa (this repo).

FU-3 — Continuous-historization HistorianTagname override edge case — DONE (2026-06-27, commit 111adc92)

Outcome: implemented the "carry both identifiers" fix below. A new HistorizedTagRef(MuxRef, HistorianName) record threads through IHistorizedTagSubscriptionSink → the recorder; the recorder keeps a muxRef→historianName map, registers/filters mux interest by MuxRef (= driver FullName) but writes under HistorianName (override-or-FullName). The applier resolves both. Divergent-override + override-rename-no-churn recorder tests added; applier feed tests assert the full pairs.

The ContinuousHistorizationRecorder registers DependencyMuxActor interest by the resolved historian name (HistorianTagname override else FullName) — the same key the EnsureTags hook and the writer use. The mux fans DependencyValueChanged keyed by FullReference (the driver's published ref). In the common case (no override) historian-name == FullReference, so it's fully consistent and works (live-validated path is the value writer; mux fan-out is the recorder's input). When a HistorianTagname override is set (override ≠ FullReference), the recorder registers interest under a key the mux never fans → that tag's values are never captured. Fix options: register mux interest by FullReference (the mux key) while writing to the historian under the resolved historian name — i.e. carry both identifiers through IHistorizedTagSubscriptionSink / the recorder (a (muxRef, historianName) pair) instead of a single string. Add a recorder test with a divergent override. Low urgency (overrides are uncommon); only matters for non-Galaxy historized tags that set an explicit HistorianTagname.
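The "carry both identifiers" idea can be illustrated with a compact Python model. It is a sketch only: the Recorder class, its method names, and the tag names are hypothetical, and the real C# recorder, mux actor, and sink interfaces are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistorizedTagRef:
    mux_ref: str         # key the mux fans values under (driver FullName)
    historian_name: str  # name written to the historian (override or FullName)


class Recorder:
    """Toy recorder: interest is keyed by mux_ref, writes use historian_name."""

    def __init__(self):
        self._names = {}   # mux_ref -> historian_name
        self.written = []  # (historian_name, value) pairs, in arrival order

    def update_refs(self, added=(), removed=()):
        for ref in removed:
            self._names.pop(ref.mux_ref, None)
        for ref in added:
            self._names[ref.mux_ref] = ref.historian_name

    def interest(self):
        """Keys registered with the mux: always mux refs, never overrides."""
        return set(self._names)

    def on_value_changed(self, full_reference, value):
        name = self._names.get(full_reference)
        if name is None:
            return  # not a historized tag
        self.written.append((name, value))


recorder = Recorder()
recorder.update_refs(added=[
    HistorizedTagRef("Drv.Tag1", "Drv.Tag1"),          # no override
    HistorizedTagRef("Drv.Tag2", "Plant.Override2"),   # divergent override
])
recorder.on_value_changed("Drv.Tag2", 42)

# Renaming only the override leaves the mux interest set unchanged.
before = recorder.interest()
recorder.update_refs(added=[HistorizedTagRef("Drv.Tag2", "Plant.Renamed2")])
recorder.on_value_changed("Drv.Tag2", 43)
```

Had the recorder keyed its interest by "Plant.Override2", the value fanned under "Drv.Tag2" would never have been captured, which is the edge case described above.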

FU-4 — AlarmHistorianOptions.Validate() MaxAttempts<=0 test coverage (minor) — DONE (2026-06-27, commit b2276b5b)

T19 pruned the Wonderware-shaped fields and reworked AlarmHistorianRegistrationTests. The MaxAttempts <= 0 warning branch in AlarmHistorianOptions.Validate() is exercised in prod but not covered by a test (the sibling warnings for DrainIntervalSeconds/Capacity/DeadLetterRetentionDays are). Add a Validate_warns_on_non_positive_max_attempts case. Trivial.

FU-5 — Pre-existing Host.IntegrationTests failure (NOT ours — track separately)

EquipmentNamespaceMaterializationTests.Deploying_an_equipment_namespace_carries_the_signal_into_the_artifact fails (Rejected vs expected Accepted) on a Modbus-only namespace via DraftValidator/ConfigComposer, which this branch does not touch. Verified failing identically on master (via git stash). Environment/pre-existing; out of scope for the historian work but worth a separate ticket.


Priority 3 — Cross-repo propagation (after merges)

  • FU-6 — scadaproj index + agent memory. When PR #423 merges (and the Plan 1 client PR), update ../scadaproj/CLAUDE.md (the HistorianGateway + OtOpcUa entries) and the agent memory notes (otopcua-historian-backend, scadaproj-umbrella) to record: OtOpcUa now consumes ZB.MOM.WW.HistorianGateway.Client as its historian backend; the Wonderware historian driver was retired; the two gateway follow-ups (FU-1/FU-2). Per the CLAUDE.md cross-repo propagation rule.

Already resolved this effort (for the record — do NOT redo)

  • Alarm SendEvent event-id bug: AlarmEventMapper set the wire Id → gateway handler throws → every alarm send PermanentFail. Fixed (44644ddc): leave Id unset, carry the id as an AlarmId property. Live-validated (send acks).
  • Continuous-historization ref-feed gap — recorder spawned with an empty ref set. Closed (2982cc4b): IHistorizedTagSubscriptionSink + recorder UpdateHistorizedRefs(added, removed) converges mux interest on each AddressSpaceApplier.Apply().
  • Read path / use case 1 — live-validated PASS (ReadRaw through GatewayHistorianDataSource).
  • C2 mis-attribution — the alarm readback-0 was NOT the "C2 server-gated event reads" limitation; the SQL reader works (see FU-1).
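For the ref-feed item in the list above, the "converge on each Apply" behavior amounts to a set difference between the desired refs and the currently registered ones. The Python sketch below is an assumption about the shape of that logic, not a transcription of AddressSpaceApplier or the recorder; InterestTracker and apply are hypothetical names.

```python
class InterestTracker:
    """Toy stand-in for the recorder side of UpdateHistorizedRefs."""

    def __init__(self):
        self.refs = set()  # currently registered refs
        self.calls = []    # (added, removed) pairs received so far

    def update_historized_refs(self, added, removed):
        self.refs -= removed
        self.refs |= added
        self.calls.append((added, removed))


def apply(tracker, desired):
    """On each apply, send only the delta needed to reach `desired`."""
    added = desired - tracker.refs
    removed = tracker.refs - desired
    if added or removed:
        tracker.update_historized_refs(added, removed)


tracker = InterestTracker()  # starts empty, like the original gap
apply(tracker, {"A", "B"})   # first apply registers both refs
apply(tracker, {"B", "C"})   # adds C, removes A
apply(tracker, {"B", "C"})   # no change, so no update is sent
```

Sending deltas rather than the full set keeps repeated applies with an unchanged address space from churning the registered interest.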