Commit Graph

168 Commits

Author SHA1 Message Date
Joseph Doherty 60695179ee fix(historian-gateway): historize EVERY aliased tag sharing a mux ref (FU-3 review — close silent-drop)
Code review found a residual silent-data-loss path: a single driver ref (mux
ref) can back SEVERAL historized equipment tags via aliasing (identical machines
sharing a register — DriverHostActor._nodeIdByDriverRef is a HashSet), each with
its own HistorianTagname. The muxRef->single-name map collapsed last-wins, so
under alias + divergent overrides only one historian tag got the value and the
rest were silently dropped — the exact failure class FU-3 exists to eliminate.

Model the fan as muxRef -> HashSet<historianName> and append ONE outbox entry per
name in OnValueChangedAsync (a per-name append failure drops only that name and
continues). Convergence removes/adds each (muxRef, name) pair individually from
the per-ref set, dropping the mux key only when its last name is removed — so
removing one alias leaves the shared ref fanning for the others with no mux churn.

Tests: aliased-refs-each-get-the-value (one fan → both historian names written),
removing-one-alias-keeps-the-ref-registered, and the override-rename test now
feeds a value post-rename to prove the write target actually moved to the new
name. Runtime 350/0, OpcUaServer 327/0; 0 warnings.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-27 00:56:17 -04:00
Joseph Doherty 111adc92b6 fix(historian-gateway): historize under the historian name, not the mux ref, when HistorianTagname overrides (FU-3)
The continuous-historization recorder conflated two identifiers into one string:
the dependency mux fans DependencyValueChanged keyed by the driver FullName
(the mux ref), but a value must be historized under the resolved historian name
(HistorianTagname override, else FullName). In the common no-override case the
two are equal, so it worked; with an override they diverge and the recorder
registered mux interest under a key the mux never fans — that tag's values were
never captured (and, had they been, would have been written under the mux ref).

Carry BOTH identifiers through the seam: a new HistorizedTagRef(MuxRef,
HistorianName) record on IHistorizedTagSubscriptionSink. The applier resolves
MuxRef = FullName and HistorianName = override-or-FullName. The recorder now
keeps a muxRef->historianName map: it registers/filters mux interest by MuxRef
but writes the outbox entry (and drains) under HistorianName. The convergence
handler re-registers the mux only when the registered key-set changes, so an
override-only rename (same FullName) updates the write target without mux churn.

Tests: a divergent-override recorder test (interest by mux ref, value written
under the override name, never the mux ref) + an override-rename no-churn test;
the applier feed tests now assert the full (mux ref, historian name) pairs.
Runtime 348/0, OpcUaServer 327/0; 0 warnings. Closes FU-3.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-27 00:43:28 -04:00
Joseph Doherty b2276b5b04 test(historian-gateway): cover AlarmHistorianOptions.Validate MaxAttempts<=0 warning (FU-4)
The MaxAttempts<=0 warning branch in AlarmHistorianOptions.Validate() was the
only one without a test (the sibling DrainIntervalSeconds/Capacity/
DeadLetterRetentionDays warnings are covered). Add the matching case. Closes FU-4.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-27 00:24:12 -04:00
Joseph Doherty 2982cc4bb5 feat(historian-gateway): feed historized refs to the recorder on deploy (close continuous-historization ref-feed gap)
v2-ci / build (pull_request) Failing after 39s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (pull_request) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (pull_request) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (pull_request) Has been skipped
The ContinuousHistorizationRecorder was spawned with an EMPTY historized-ref
set, so it registered interest in nothing and historized nothing. This feeds it
the currently-historized tag refs on every address-space deploy/redeploy so its
DependencyMuxActor interest converges to exactly the historized set (the same
refs the EnsureTags provisioning hook resolves: override-or-FullName).

Design — delta convergence (the plan is a pure DIFF):
- New seam IHistorizedTagSubscriptionSink (Core.Abstractions/Historian) with a
  Null no-op singleton, mirroring how IHistorianProvisioning decouples the T15
  hook. AddressSpaceApplier gains a DEFAULTED ctor param (Null sink) so all ~80
  existing call sites + the production site compile unchanged.
- Apply() only ever sees a plan diff (an incremental/surgical apply carries a
  delta, not the full set), so the applier feeds an add/remove DELTA computed
  from AddedEquipmentTags / RemovedEquipmentTags / ChangedEquipmentTags. The
  recorder keeps the full set and re-registers it. The feed is a single
  non-blocking Tell behind the sink, wrapped in try/catch so a faulting feed
  never blocks or breaks a deploy (same discipline as the provisioning hook).
- Recorder.UpdateHistorizedRefs(added, removed) converges the tracked set, then
  — only when it actually changed — sends ONE RegisterInterest with the full set
  (the mux's RegisterInterest is a full-REPLACE) or one UnregisterInterest when
  it drains to empty (the mux has no per-ref unregister). An unchanged delta is
  a no-op (no mux churn).
- DI: the recorder is now spawned BEFORE the applier so the adapter
  (ActorHistorizedTagSubscriptionSink) can wrap its IActorRef; the Null sink is
  used when continuous historization is off/unwired.

Tests: recorder convergence (add-from-empty, add+remove converge, idempotent,
drain-to-empty unregisters); applier feeds resolved added refs, removed+renamed
deltas, and survives a throwing sink. Build clean (0 warnings on touched
projects); Runtime/OpcUaServer/Gateway/AdminUI suites green.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 23:21:18 -04:00
Joseph Doherty 0b4b2e4cfd refactor(historian-gateway): retire Wonderware historian projects (gateway is sole backend)
The HistorianGateway driver is now the sole historian read/write+alarm backend, so the
Wonderware sidecar projects are dead code. Removes the 5 Wonderware projects (driver,
.Client, .Client.Contracts, + their 2 test projects) from the solution and tree, and fully
retires the vestigial 'Historian.Wonderware' driver type (UI/probe-only; it had no driver
factory): the Host probe registration, the AdminUI driver-config surface (driver page,
tag-config editor/model/validator entry, address picker/builder, driver-type catalog +
dropdown + edit-router entries), and their tests. Prunes the now-unused Wonderware
connection fields (Host/Port/UseTls/ServerCertThumbprint/SharedSecret) from
AlarmHistorianOptions (keeping Enabled + the SQLite store-and-forward knobs) and refreshes
the stale XML docs that named Wonderware as the production backend.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 19:25:21 -04:00
Joseph Doherty 2a5c717755 feat(historian-gateway): wire ContinuousHistorizationRecorder into DI + hosted lifecycle + meters
Bind ContinuousHistorizationOptions (Enabled/OutboxPath/CommitMode/
CommitIntervalMs/DrainBatchSize/DrainIntervalSeconds/Capacity/backoff) with a
warn-only Validate(); gated on Enabled AND the ServerHistorian gateway being
configured, the Host registers the durable FasterLogHistorizationOutbox (container
-disposed) + a gateway-backed GatewayHistorianValueWriter, and binds outbox
depth/dropped observable gauges on the central scraped meter. WithOtOpcUaRuntimeActors
spawns the recorder (over the same dependency-mux ref) when the options + writer +
outbox resolve, registering ContinuousHistorizationRecorderKey. Spawned with an EMPTY
historized-ref set: the deployed address space builds later, so ref population is a
documented follow-on (a later SetHistorizedRefs feed) — T18 wires the actor + outbox
+ writer + meters; the ref feed is the known remaining gap.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:47:20 -04:00
Joseph Doherty 97528c500f fix(historian-gateway): guard recorder outbox-append failures + retry-success test + Sender capture + mux deregister
I-1: Wrap the OnValueChangedAsync AppendAsync in try/catch so a durable-boundary
failure (e.g. a PerEntry fsync hitting disk-full/I-O error) can no longer propagate
out of the handler and trip Akka supervision into a restart loop. A canceled append
during shutdown returns quietly; any other exception increments a new
_outboxAppendFailures counter, logs a Warning (exception type name only), and drops
the value without recording it or nudging the drain. The counter is surfaced on
RecorderStatus (new OutboxAppendFailures field).

I-2: Strengthen Writer_failure_keeps_entry_for_retry to prove the drain actually ran
— assert the writer was invoked (the fake records even on Succeed=false) AND the
outbox stayed at 1 (RemoveAsync not called), via AwaitAssertAsync.

M-3: Capture Sender before the await in the GetStatus handler, then Tell the reply.

M-4: Add Retry_after_writer_failure_eventually_acks proving the retry -> success ->
ack path; FakeValueWriter gains a FailFirstN option + CallCount (Succeed behaviour
unchanged). Short minBackoff keeps it fast and deterministic (AwaitAssert, no sleep).

M-5: Deregister mux interest on PostStop via DependencyMuxActor.UnregisterInterest,
mirroring VirtualTagActor.PostStop, closing the dead-letter window before Terminated.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:34:19 -04:00
Joseph Doherty bbfbc7b215 feat(historian-gateway): ContinuousHistorizationRecorder actor (outbox->WriteLiveValues, backoff)
Continuous-historization engine for non-Galaxy driver tags. Registers
interest with the per-node DependencyMuxActor for the historized refs and
taps the VirtualTagActor.DependencyValueChanged values the mux fans:
coerce to numeric -> append to the durable IHistorizationOutbox (crash
boundary) -> off-thread drain writes batches through IHistorianValueWriter
and acks (FIFO-truncates) on success, backing off (exponential, capped) on
failure. Non-numeric values are dropped + metered (SQL analog path is
numeric-only).

- New seam IHistorianValueWriter + HistorizationValue in Core.Abstractions
  so Runtime stays free of the gRPC driver.
- GatewayHistorianValueWriter (driver) adapts IHistorianGatewayClient.
  WriteLiveValues: HistorizationValue -> HistorianLiveValue proto, WriteAck
  Success||Queued -> true; non-throwing (errors -> false for retry).
- Drain runs via PipeTo(Self) so the mailbox never blocks on the gateway
  write; appends awaited on the actor thread to stay serialized.

Adaptation vs plan: the mux fans DependencyValueChanged (TagId/Value/
TimestampUtc, no quality), not DriverInstanceActor.AttributeValuePublished,
so values are recorded Good-quality (192) by the same convention the
scripted-alarm host uses.

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 18:18:34 -04:00
Joseph Doherty 1a6eb7efe6 test(historian-gateway): cover MaxTieClusterOverfetch warning + refresh AddServerHistorian doc
Addresses Task 9 review: add the enabled+nonpositive MaxTieClusterOverfetch warning
test; update the AddServerHistorian XML doc to describe the gateway-backed data source
(the alarm-path Wonderware doc stays until T13).

Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 17:09:45 -04:00
Joseph Doherty 718f1fdad2 feat(historian-gateway): reshape ServerHistorianOptions to gateway form (Endpoint/ApiKey/Tls)
Claude-Session: https://claude.ai/code/session_012SDSQ3AcaXqPcBtDESBRii
2026-06-26 16:52:56 -04:00
Joseph Doherty 0074f37a64 test(otopcua): tighten multi-device collapse assertion + clear warn-state on removal (follow-up E) 2026-06-26 15:16:59 -04:00
Joseph Doherty 50f08635ec feat(otopcua): multi-device-per-driver FixedTree partition (follow-up E) 2026-06-26 15:00:11 -04:00
Joseph Doherty 51721df563 refactor(otopcua): extract authored-only send helper + empty-authored dropped-path test (follow-up D) 2026-06-26 14:44:26 -04:00
Joseph Doherty 05c820795a perf(otopcua): one SetDesiredSubscriptions per driver per redeploy (follow-up D) 2026-06-26 14:30:16 -04:00
Joseph Doherty cde16063d9 test(otopcua): negative + convergence coverage for rebind re-trigger (follow-up C) 2026-06-26 14:18:01 -04:00
Joseph Doherty 533671487e feat(otopcua): re-trigger discovery on config-unchanged rebind (follow-up C) 2026-06-26 14:06:50 -04:00
Joseph Doherty adcd7b57c1 feat(otopcua): driver-level equipment resolution + per-equipment discovered-plan cache (follow-up E) 2026-06-26 13:33:21 -04:00
Joseph Doherty cb7ce7f171 feat(otopcua): EquipmentNode carries DriverInstanceId/DeviceId/DeviceHost (follow-up E projection) 2026-06-26 13:07:31 -04:00
Joseph Doherty e7d5ebe956 fix(otopcua): cancel pending rediscover timer on TriggerRediscovery + test hardening (follow-up C) 2026-06-26 12:57:08 -04:00
Joseph Doherty f7358bf4fd feat(otopcua): DriverInstanceActor.TriggerRediscovery message (follow-up C) 2026-06-26 12:45:26 -04:00
Joseph Doherty a1a655e6c9 test(otopcua): Once re-discovery reruns one pass per reconnect + comment tidy (follow-up B) 2026-06-26 12:38:52 -04:00
Joseph Doherty ce34816a50 feat(otopcua): DriverInstanceActor honors RediscoverPolicy (Never/Once/UntilStable) (follow-up B) 2026-06-26 12:32:28 -04:00
Joseph Doherty c2c368dcec feat(otopcua): make FixedTree re-discovery per-pass timeout injectable (follow-up A) 2026-06-26 12:12:47 -04:00
Joseph Doherty 25ccd25b6b test(otopcua): assert exact discovered NodeId in the e2e 2026-06-26 09:04:26 -04:00
Joseph Doherty 5104540e32 test(otopcua): cover discovered-node rebind drop + clarify re-apply scope 2026-06-26 09:01:01 -04:00
Joseph Doherty 1aa13ebd27 test(otopcua): end-to-end discovered-node injection + value flow 2026-06-26 08:46:28 -04:00
Joseph Doherty 0788bad145 feat(otopcua): re-inject discovered nodes after address-space rebuild 2026-06-26 08:36:52 -04:00
Joseph Doherty b1e4fba792 fix(otopcua): skip redundant discovered-node re-apply + guard tests 2026-06-26 08:28:42 -04:00
Joseph Doherty 21298ec1b2 fix(otopcua): resume discovery on actor context + bound/harden re-discovery 2026-06-26 08:19:12 -04:00
Joseph Doherty b9b8d3d389 feat(otopcua): inject discovered nodes into the equipment projection on connect 2026-06-26 07:59:01 -04:00
Joseph Doherty cf6b1abf4c feat(otopcua): re-run driver discovery on reconnect 2026-06-26 07:44:28 -04:00
Joseph Doherty 51634cca38 feat(otopcua): driver-instance post-connect bounded re-discovery 2026-06-26 07:40:24 -04:00
Joseph Doherty ccf93fc029 feat(otopcua): OpcUaPublishActor handles discovered-node materialisation 2026-06-26 07:24:22 -04:00
Joseph Doherty 598cdfad5a feat(otopcua): applier pass to materialise discovered nodes idempotently 2026-06-26 07:16:36 -04:00
Joseph Doherty 93f7586590 fix(otopcua): guard root-level discovered var parent + tighten mapper 2026-06-26 06:59:34 -04:00
Joseph Doherty d7a0da5ea1 feat(otopcua): map discovered nodes under an equipment subfolder 2026-06-26 06:47:18 -04:00
Joseph Doherty da55c6913d feat(otopcua): capturing address-space builder for driver discovery 2026-06-26 06:39:18 -04:00
Joseph Doherty 23b42b424d fix(code-review): resolve OpcUaServer-001 — UNS Area/Line rename refreshes folder DisplayName
A rename-only deploy produced an IsEmpty plan that short-circuited before MaterialiseHierarchy,
leaving the OPC UA folder DisplayName stale. AddressSpacePlanner now diffs UnsAreas/UnsLines by
stable id into a RenamedFolders set (counted in IsEmpty); the applier refreshes the folder in
place via a new UpdateFolderDisplayName on ISurgicalAddressSpaceSink (forwarded through
DeferredAddressSpaceSink so it is NOT inert on driver hosts; falls back to rebuild when the sink
is non-surgical). DeploymentArtifact byte-parity untouched (rename rides the existing Name
round-trip). No EF migration, no serialized wire/proto contract change. +13 OpcUaServer tests, Runtime rebuild test.
2026-06-20 23:10:24 -04:00
Joseph Doherty 3512089c90 review(Runtime): record findings + fix artifact-decode type tolerance
Review at HEAD 7286d320. Runtime-002/006 (Medium): DeploymentArtifact lenient-parse
now degrades wrong-typed JSON fields to defaults/skipped-row instead of throwing (fails
the deploy) + regression tests; byte-parity with AddressSpaceComposer preserved. Runtime-001
(UNS rename) deferred cross-module (needs AddressSpacePlan rename signal + EnsureFolder
rename). 003/004/005 Won't-Fix.
2026-06-19 10:52:22 -04:00
Joseph Doherty de6ce147fc fix(runtime): drop OpcUaProbeResult in redundancy-topic subscribers (no dead-letter) 2026-06-19 00:32:03 -04:00
Joseph Doherty 40e8a23e7c refactor(opcuaserver): rename Phase7* address-space pipeline to AddressSpace*
v2-ci / build (push) Failing after 37s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
The OPC UA address-space build pipeline was named after a v2-roadmap
milestone number rather than its domain. Rename the family to describe
what it does (build/diff/apply the OPC UA address space):

  Phase7Composer          -> AddressSpaceComposer
  Phase7CompositionResult -> AddressSpaceComposition
  Phase7Planner           -> AddressSpacePlanner
  Phase7Plan              -> AddressSpacePlan
  Phase7Applier           -> AddressSpaceApplier
  Phase7ApplyOutcome      -> AddressSpaceApplyOutcome

The 9 Phase7*Tests suites follow suit; Phase7ScriptingEntitiesTests ->
ScriptingEntitiesTests (it tests the scripting migration, not the
pipeline). Log-message prefixes move to the new class names.

Pure mechanical rename, no behavioral change. EF migration classes/IDs
(AddPhase7ScriptingTables, ExtendComputeGenerationDiffWithPhase7) are
immutable and left untouched, as are historical design docs.

Build clean; OpcUaServer 261/261, Runtime 272/272, ScriptingEntities
12/12 green.
2026-06-18 19:16:28 -04:00
Joseph Doherty eb8a8dc19d test(runtime): cover disabled-array + zero-length in artifact parity round-trip (review) 2026-06-16 21:45:19 -04:00
Joseph Doherty 0a747c343d feat(runtime): decode IsArray/ArrayLength byte-parity in DeploymentArtifact 2026-06-16 21:40:22 -04:00
Joseph Doherty a792820283 feat(opcua): EnsureVariable array params (ValueRank=OneDimension + ArrayDimensions) 2026-06-16 21:16:07 -04:00
Joseph Doherty 23f353e79b test(adminui): close Phase 6 review test-gaps + Enterprise-delete warning 2026-06-16 17:10:40 -04:00
Joseph Doherty 6a8020e7e7 feat(adminui): native-alarm HistorizeToAveva opt-out 2026-06-16 16:27:31 -04:00
Joseph Doherty 93d9160dae feat(alarms): DriverHostActor routes native-condition acks to the owning driver [H6d] 2026-06-15 14:46:00 -04:00
Joseph Doherty 418663b359 feat(alarms): thread isNative through MaterialiseAlarmCondition; node manager tracks native conditions [H6a] 2026-06-15 14:13:30 -04:00
Joseph Doherty 4c78dcd358 feat(redundancy): wire dbHealth into OpcUaPublishActor + spawn PeerProbeSupervisor per node 2026-06-15 13:33:34 -04:00
Joseph Doherty 5a064e086d test(redundancy): lock in stale-Terminated guard + clarify OnTerminated (code-review) 2026-06-15 13:29:58 -04:00