Commit Graph

1029 Commits

Author SHA1 Message Date
Joseph Doherty 44644ddc7f fix(historian-gateway): alarm SendEvent must not set wire event Id (live-validated)
v2-ci / build (pull_request) Failing after 45s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (pull_request) Has been skipped
Live validation against wonder-sql-vd03 caught that the gateway's SendEvent handler
throws when the wire event carries a client-supplied Id — so every alarm send from
OtOpcUa failed (PermanentFail). AlarmEventMapper now leaves HistorianEvent.Id unset
(the historian assigns event identity) and preserves the alarm's id as an 'AlarmId'
property. With this, the live alarm send acks.

Also harden the env-gated live tests against two gateway/historian-side limitations
surfaced during validation (neither an OtOpcUa defect): the write readback uses a
timezone-tolerant window (an explicit-timestamp WriteLiveValues lands offset by the
deployment's local-vs-UTC delta — reproducible via raw grpcurl; OtOpcUa sends correct
UTC), and the alarm ReadEvents readback skips with a clear reason when the historian's
server-gated event reads (C2, won't-fix) return nothing. Read + write-persist +
alarm-send are all live-validated green; the alarm send-ack is split into its own test.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 23:48:52 -04:00
Joseph Doherty 2982cc4bb5 feat(historian-gateway): feed historized refs to the recorder on deploy (close continuous-historization ref-feed gap)
v2-ci / build (pull_request) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (pull_request) Has been skipped
The ContinuousHistorizationRecorder was spawned with an EMPTY historized-ref
set, so it registered interest in nothing and historized nothing. This feeds it
the currently-historized tag refs on every address-space deploy/redeploy so its
DependencyMuxActor interest converges to exactly the historized set (the same
refs the EnsureTags provisioning hook resolves: override-or-FullName).

Design — delta convergence (the plan is a pure DIFF):
- New seam IHistorizedTagSubscriptionSink (Core.Abstractions/Historian) with a
  Null no-op singleton, mirroring how IHistorianProvisioning decouples the T15
  hook. AddressSpaceApplier gains a DEFAULTED ctor param (Null sink) so all ~80
  existing call sites + the production site compile unchanged.
- Apply() only ever sees a plan diff (an incremental/surgical apply carries a
  delta, not the full set), so the applier feeds an add/remove DELTA computed
  from AddedEquipmentTags / RemovedEquipmentTags / ChangedEquipmentTags. The
  recorder keeps the full set and re-registers it. The feed is a single
  non-blocking Tell behind the sink, wrapped in try/catch so a faulting feed
  never blocks or breaks a deploy (same discipline as the provisioning hook).
- Recorder.UpdateHistorizedRefs(added, removed) converges the tracked set, then
  — only when it actually changed — sends ONE RegisterInterest with the full set
  (the mux's RegisterInterest is a full-REPLACE) or one UnregisterInterest when
  it drains to empty (the mux has no per-ref unregister). An unchanged delta is
  a no-op (no mux churn).
- DI: the recorder is now spawned BEFORE the applier so the adapter
  (ActorHistorizedTagSubscriptionSink) can wrap its IActorRef; the Null sink is
  used when continuous historization is off/unwired.

Tests: recorder convergence (add-from-empty, add+remove converge, idempotent,
drain-to-empty unregisters); applier feeds resolved added refs, removed+renamed
deltas, and survives a throwing sink. Build clean (0 warnings on touched
projects); Runtime/OpcUaServer/Gateway/AdminUI suites green.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 23:21:18 -04:00
Joseph Doherty 0b4b2e4cfd refactor(historian-gateway): retire Wonderware historian projects (gateway is sole backend)
The HistorianGateway driver is now the sole historian read/write+alarm backend, so the
Wonderware sidecar projects are dead code. Removes the 5 Wonderware projects (driver,
.Client, .Client.Contracts, + their 2 test projects) from the solution and tree, and fully
retires the vestigial 'Historian.Wonderware' driver type (UI/probe-only; it had no driver
factory): the Host probe registration, the AdminUI driver-config surface (driver page,
tag-config editor/model/validator entry, address picker/builder, driver-type catalog +
dropdown + edit-router entries), and their tests. Prunes the now-unused Wonderware
connection fields (Host/Port/UseTls/ServerCertThumbprint/SharedSecret) from
AlarmHistorianOptions (keeping Enabled + the SQLite store-and-forward knobs) and refreshes
the stale XML docs that named Wonderware as the production backend.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 19:25:21 -04:00
Joseph Doherty b32436902a test(historian-gateway): env-gated live validation vs wonder-sql-vd03 (read/write/alarm round-trips)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 19:01:36 -04:00
Joseph Doherty 2a5c717755 feat(historian-gateway): wire ContinuousHistorizationRecorder into DI + hosted lifecycle + meters
Bind ContinuousHistorizationOptions (Enabled/OutboxPath/CommitMode/
CommitIntervalMs/DrainBatchSize/DrainIntervalSeconds/Capacity/backoff) with a
warn-only Validate(); gated on Enabled AND the ServerHistorian gateway being
configured, the Host registers the durable FasterLogHistorizationOutbox (container
-disposed) + a gateway-backed GatewayHistorianValueWriter, and binds outbox
depth/dropped observable gauges on the central scraped meter. WithOtOpcUaRuntimeActors
spawns the recorder (over the same dependency-mux ref) when the options + writer +
outbox resolve, registering ContinuousHistorizationRecorderKey. Spawned with an EMPTY
historized-ref set: the deployed address space builds later, so ref population is a
documented follow-on (a later SetHistorizedRefs feed) — T18 wires the actor + outbox
+ writer + meters; the ref feed is the known remaining gap.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:47:20 -04:00
Joseph Doherty 97528c500f fix(historian-gateway): guard recorder outbox-append failures + retry-success test + Sender capture + mux deregister
I-1: Wrap the OnValueChangedAsync AppendAsync in try/catch so a durable-boundary
failure (e.g. a PerEntry fsync hitting disk-full/I-O error) can no longer propagate
out of the handler and trip Akka supervision into a restart loop. A canceled append
during shutdown returns quietly; any other exception increments a new
_outboxAppendFailures counter, logs a Warning (exception type name only), and drops
the value without recording it or nudging the drain. The counter is surfaced on
RecorderStatus (new OutboxAppendFailures field).

I-2: Strengthen Writer_failure_keeps_entry_for_retry to prove the drain actually ran
— assert the writer was invoked (the fake records even on Succeed=false) AND the
outbox stayed at 1 (RemoveAsync not called), via AwaitAssertAsync.

M-3: Capture Sender before the await in the GetStatus handler, then Tell the reply.

M-4: Add Retry_after_writer_failure_eventually_acks proving the retry -> success ->
ack path; FakeValueWriter gains a FailFirstN option + CallCount (Succeed behaviour
unchanged). Short minBackoff keeps it fast and deterministic (AwaitAssert, no sleep).

M-5: Deregister mux interest on PostStop via DependencyMuxActor.UnregisterInterest,
mirroring VirtualTagActor.PostStop, closing the dead-letter window before Terminated.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:34:19 -04:00
Joseph Doherty bbfbc7b215 feat(historian-gateway): ContinuousHistorizationRecorder actor (outbox->WriteLiveValues, backoff)
Continuous-historization engine for non-Galaxy driver tags. Registers
interest with the per-node DependencyMuxActor for the historized refs and
taps the VirtualTagActor.DependencyValueChanged values the mux fans:
coerce to numeric -> append to the durable IHistorizationOutbox (crash
boundary) -> off-thread drain writes batches through IHistorianValueWriter
and acks (FIFO-truncates) on success, backing off (exponential, capped) on
failure. Non-numeric values are dropped + metered (SQL analog path is
numeric-only).

- New seam IHistorianValueWriter + HistorizationValue in Core.Abstractions
  so Runtime stays free of the gRPC driver.
- GatewayHistorianValueWriter (driver) adapts IHistorianGatewayClient.
  WriteLiveValues: HistorizationValue -> HistorianLiveValue proto, WriteAck
  Success||Queued -> true; non-throwing (errors -> false for retry).
- Drain runs via PipeTo(Self) so the mailbox never blocks on the gateway
  write; appends awaited on the actor thread to stay serialized.

Adaptation vs plan: the mux fans DependencyValueChanged (TagId/Value/
TimestampUtc, no quality), not DriverInstanceActor.AttributeValuePublished,
so values are recorded Good-quality (192) by the same convention the
scripted-alarm host uses.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:18:34 -04:00
Joseph Doherty 8b4028de84 feat(historian-gateway): EnsureTags provisioning hook in AddressSpaceApplier (non-blocking)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:03:40 -04:00
Joseph Doherty 035bde0562 fix(historian-gateway): dispose alarm-write channel at shutdown + ServerHistorian startup diagnostic
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:55:44 -04:00
Joseph Doherty 22711444cc fix(historian-gateway): cancellation-safe alarm writer + dispose-safe outbox + provisioner polish + outbox tests
I-1: GatewayAlarmHistorianWriter no longer dead-letters events cancelled
mid-drain at shutdown. WriteBatchAsync short-circuits remaining events to
RetryPlease once cancellation is requested, and SendOneAsync catches
OperationCanceledException (when the token is cancelled) -> RetryPlease,
so in-flight events stay queued instead of being permanently dropped.

I-2: FasterLogHistorizationOutbox.Dispose now guards the awaited periodic
loop with a broad catch (Exception) after the OperationCanceledException
catch, so a non-Faster teardown fault (e.g. ObjectDisposedException) can
never escape Dispose.

M-1: GatewayTagProvisioner skips the empty EnsureTags round-trip when every
request is non-historizable (early return).

M-2: GatewayTagProvisioner handles plain shutdown cancellation quietly
(Debug, not Warning), counting the unsent batch as Failed, never throwing.

M-3/M-4: Added remove-last-entry (TailAddress truncation branch) and
FIFO implicit-ack (RemoveAsync acks up to and including the target)
durability tests, both reopen-and-survive.

M-5: Clarifying comment in RecoverState on the transient over-capacity
rebuild after a crash between append-commit and drop-truncation-commit.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:47:20 -04:00
Joseph Doherty 0be79219fc feat(historian-gateway): alarm-write cutover — AddAlarmHistorian drains to GatewayAlarmHistorianWriter
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:40:23 -04:00
Joseph Doherty 8559905e8a feat(historian-gateway): IHistorianProvisioning + GatewayTagProvisioner (EnsureTags, non-blocking)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:30:03 -04:00
Joseph Doherty d3081a659f feat(historian-gateway): GatewayAlarmHistorianWriter — SendEvent + gRPC->outcome mapping
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:27:03 -04:00
Joseph Doherty 555bd477f1 feat(historian-gateway): FasterLog historization outbox (PerEntry/Periodic, drop-oldest)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:20:06 -04:00
Joseph Doherty 1a6eb7efe6 test(historian-gateway): cover MaxTieClusterOverfetch warning + refresh AddServerHistorian doc
Addresses Task 9 review: add the enabled+nonpositive MaxTieClusterOverfetch warning
test; update the AddServerHistorian XML doc to describe the gateway-backed data source
(the alarm-path Wonderware doc stays until T13).

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:09:45 -04:00
Joseph Doherty 36f7c3c5bf feat(historian-gateway): read cutover — AddServerHistorian builds GatewayHistorianDataSource
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:07:59 -04:00
Joseph Doherty 1d5fa8230e fix(historian-gateway): Dispose() delegates to DisposeAsync() + sync-dispose test
Addresses T7/T8/T11 code-review minors: route the sync dispose through DisposeAsync
so a double Dispose()+DisposeAsync() stays a no-op; cover the sync path.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:54:23 -04:00
Joseph Doherty 718f1fdad2 feat(historian-gateway): reshape ServerHistorianOptions to gateway form (Endpoint/ApiKey/Tls)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:52:56 -04:00
Joseph Doherty 35aace7fdf feat(historian-gateway): ReadEventsAsync alarm-history via gateway ReadEvents (+ truncation signal)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:47:04 -04:00
Joseph Doherty 0a540d9f09 feat(historian-gateway): GetHealthSnapshot via Probe/GetConnectionStatus (counter discipline)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:45:40 -04:00
Joseph Doherty 1e93b2ebfb feat(historian-gateway): GatewayHistorianDataSource read paths (raw/processed/at-time)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:44:48 -04:00
Joseph Doherty c51ca2276b fix(historian-gateway): seam maxEvents XML doc + driver Platforms + ValueTask in fake
Addresses Task 1 code-review: document that ReadEventsAsync.maxEvents is enforced
client-side (no server cap in the wire contract); add Platforms=AnyCPU;x64 to match
sibling drivers; use ValueTask.CompletedTask in FakeHistorianGatewayClient.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:35:08 -04:00
Joseph Doherty a96e85f0e4 feat(historian-gateway): AlarmHistorianEvent->HistorianEvent mapper (SendEvent properties)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:32:38 -04:00
Joseph Doherty a54c7a9366 feat(historian-gateway): HistorianEvent->HistoricalEvent mapper (+ clamped severity)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:32:38 -04:00
Joseph Doherty c7296d7458 feat(historian-gateway): sample/aggregate->DataValueSnapshot + quality mapper (Wonderware parity)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:32:38 -04:00
Joseph Doherty 3226b87818 feat(historian-gateway): DriverDataType->HistorianDataType mapper + write-gap fallbacks (matrix-guarded)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:32:38 -04:00
Joseph Doherty c822a6b196 feat(historian-gateway): HistoryAggregateType->RetrievalMode mapper (matrix-guarded)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:32:38 -04:00
Joseph Doherty a98fc46d26 feat(historian-gateway): scaffold Gateway driver project + consume client package
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:18:50 -04:00
Joseph Doherty 04159fd716 test(otopcua): ConfigComposer->ParseComposition DeviceHost round-trip (follow-up E) 2026-06-26 15:39:40 -04:00
Joseph Doherty 0074f37a64 test(otopcua): tighten multi-device collapse assertion + clear warn-state on removal (follow-up E) 2026-06-26 15:16:59 -04:00
Joseph Doherty 50f08635ec feat(otopcua): multi-device-per-driver FixedTree partition (follow-up E) 2026-06-26 15:00:11 -04:00
Joseph Doherty 51721df563 refactor(otopcua): extract authored-only send helper + empty-authored dropped-path test (follow-up D) 2026-06-26 14:44:26 -04:00
Joseph Doherty 05c820795a perf(otopcua): one SetDesiredSubscriptions per driver per redeploy (follow-up D) 2026-06-26 14:30:16 -04:00
Joseph Doherty cde16063d9 test(otopcua): negative + convergence coverage for rebind re-trigger (follow-up C) 2026-06-26 14:18:01 -04:00
Joseph Doherty 533671487e feat(otopcua): re-trigger discovery on config-unchanged rebind (follow-up C) 2026-06-26 14:06:50 -04:00
Joseph Doherty adcd7b57c1 feat(otopcua): driver-level equipment resolution + per-equipment discovered-plan cache (follow-up E) 2026-06-26 13:33:21 -04:00
Joseph Doherty cb7ce7f171 feat(otopcua): EquipmentNode carries DriverInstanceId/DeviceId/DeviceHost (follow-up E projection) 2026-06-26 13:07:31 -04:00
Joseph Doherty e7d5ebe956 fix(otopcua): cancel pending rediscover timer on TriggerRediscovery + test hardening (follow-up C) 2026-06-26 12:57:08 -04:00
Joseph Doherty f7358bf4fd feat(otopcua): DriverInstanceActor.TriggerRediscovery message (follow-up C) 2026-06-26 12:45:26 -04:00
Joseph Doherty a1a655e6c9 test(otopcua): Once re-discovery reruns one pass per reconnect + comment tidy (follow-up B) 2026-06-26 12:38:52 -04:00
Joseph Doherty ce34816a50 feat(otopcua): DriverInstanceActor honors RediscoverPolicy (Never/Once/UntilStable) (follow-up B) 2026-06-26 12:32:28 -04:00
Joseph Doherty efbdaf853c feat(otopcua): set Modbus/S7/Galaxy re-discovery policy to Once + Once-branch test (follow-up B) 2026-06-26 12:26:28 -04:00
Joseph Doherty a378b572af feat(otopcua): add ITagDiscovery.RediscoverPolicy + per-driver assignments (follow-up B) 2026-06-26 12:18:44 -04:00
Joseph Doherty c2c368dcec feat(otopcua): make FixedTree re-discovery per-pass timeout injectable (follow-up A) 2026-06-26 12:12:47 -04:00
Joseph Doherty 25ccd25b6b test(otopcua): assert exact discovered NodeId in the e2e 2026-06-26 09:04:26 -04:00
Joseph Doherty 5104540e32 test(otopcua): cover discovered-node rebind drop + clarify re-apply scope 2026-06-26 09:01:01 -04:00
Joseph Doherty 1aa13ebd27 test(otopcua): end-to-end discovered-node injection + value flow 2026-06-26 08:46:28 -04:00
Joseph Doherty 0788bad145 feat(otopcua): re-inject discovered nodes after address-space rebuild 2026-06-26 08:36:52 -04:00
Joseph Doherty b1e4fba792 fix(otopcua): skip redundant discovered-node re-apply + guard tests 2026-06-26 08:28:42 -04:00
Joseph Doherty 21298ec1b2 fix(otopcua): resume discovery on actor context + bound/harden re-discovery 2026-06-26 08:19:12 -04:00