Commit Graph

360 Commits

Author SHA1 Message Date
Joseph Doherty 0a747c343d feat(runtime): decode IsArray/ArrayLength byte-parity in DeploymentArtifact 2026-06-16 21:40:22 -04:00
Joseph Doherty 584e9f2aee test(opcua): applier forwards array params + overflow rows + doc fix (review)
Extends RecordingSink to capture isArray/arrayLength per EnsureVariable call,
adds two applier-level tests asserting the wire-through for array and scalar
plans, adds float/overflow InlineData rows to ExtractTagArray theory, and
corrects the ExtractTagArray XML-doc wording (null => unbounded ArrayDimensions=[0]).
2026-06-16 21:36:38 -04:00
Joseph Doherty 71cc417182 feat(opcua): EquipmentTagPlan IsArray/ArrayLength + composer ExtractTagArray + applier wire-in 2026-06-16 21:27:43 -04:00
Joseph Doherty 3172b7bdee test(opcua): cover null-arrayLength dimension + tighten scalar assertion (review) 2026-06-16 21:22:43 -04:00
Joseph Doherty a792820283 feat(opcua): EnsureVariable array params (ValueRank=OneDimension + ArrayDimensions) 2026-06-16 21:16:07 -04:00
Joseph Doherty a40c77de87 fix(adminui): flip remaining ModbusTcp test seeds + doc comments to Modbus (review) 2026-06-16 19:51:56 -04:00
Joseph Doherty 8b4675b1a5 fix(adminui): canonicalize Modbus driver-type string on "Modbus" (was ModbusTcp) 2026-06-16 19:39:41 -04:00
Joseph Doherty 23f353e79b test(adminui): close Phase 6 review test-gaps + Enterprise-delete warning 2026-06-16 17:10:40 -04:00
Joseph Doherty 069a5f3165 feat(adminui): Galaxy picker pre-fills native-alarm fields from IsAlarm 2026-06-16 16:55:05 -04:00
Joseph Doherty c00a547191 feat(adminui): isHistorized + historianTagname as first-class Tag fields 2026-06-16 16:44:46 -04:00
Joseph Doherty c98625fd9f feat(adminui): create-new-script from the inline virtual-tag panel 2026-06-16 16:44:14 -04:00
Joseph Doherty 526eebb3bb feat(adminui): UNS-tree delete for Cluster + Enterprise (refuse-if-children, no rowversion) 2026-06-16 16:35:07 -04:00
Joseph Doherty 6a8020e7e7 feat(adminui): native-alarm HistorizeToAveva opt-out 2026-06-16 16:27:31 -04:00
Joseph Doherty 72d414ada7 feat(adminui): typed TagConfig editors for OpcUaClient + Historian 2026-06-16 16:25:19 -04:00
Joseph Doherty 1164d423b6 fix(probe): Galaxy gRPC ping — drop invalid Retry, treat MxGatewayAuth exceptions as reachable (live /run)
v2-ci / build (push) Failing after 44s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Two bugs caught by live verification against the mxaccessgw at 10.100.0.48:5120:
- MaxAttempts=1 produced an invalid Polly RetryStrategyOptions -> the probe failed
  on every real gateway. Removed the Retry override (matches GalaxyDriver); fail-fast
  is already guaranteed by the TCP preflight + the per-call deadline.
- A rejected key surfaces as a typed MxGatewayAuthenticationException, not a raw
  RpcException, so 'auth-rejection = reachable' was bypassed. Catch the typed auth/
  authorization exceptions -> Ok=true.
Adds DriverProbeHandshakeE2eTests: direct-probe, skip-gated cross-protocol green/red
discrimination (Modbus, OpcUaClient, Galaxy + a local real OPC UA server).
2026-06-16 07:32:59 -04:00
Joseph Doherty 93d9160dae feat(alarms): DriverHostActor routes native-condition acks to the owning driver [H6d] 2026-06-15 14:46:00 -04:00
Joseph Doherty 87dd65b97a test(alarms): native ack wrong-role deny + tidy NativeAlarmAck doc (code-review) 2026-06-15 14:39:26 -04:00
Joseph Doherty a6d9de091b feat(alarms): native condition Acknowledge routes to NativeAlarmAckRouter with principal [H6c] 2026-06-15 14:33:58 -04:00
Joseph Doherty 328bd1b9ee feat(alarms): wire OnEnableDisable over OPC UA (AlarmAck-gated; native→BadNotSupported) [H4] 2026-06-15 14:24:19 -04:00
Joseph Doherty 226587d817 test(alarms): cover isNative rebuild/kind-flip lifecycle + Phase7Applier call-site (code-review) 2026-06-15 14:20:20 -04:00
Joseph Doherty 418663b359 feat(alarms): thread isNative through MaterialiseAlarmCondition; node manager tracks native conditions [H6a] 2026-06-15 14:13:30 -04:00
Joseph Doherty 4c78dcd358 feat(redundancy): wire dbHealth into OpcUaPublishActor + spawn PeerProbeSupervisor per node 2026-06-15 13:33:34 -04:00
Joseph Doherty 5a064e086d test(redundancy): lock in stale-Terminated guard + clarify OnTerminated (code-review) 2026-06-15 13:29:58 -04:00
Joseph Doherty f41e957e07 feat(redundancy): PeerProbeSupervisor maintains one peer OPC UA probe per driver peer 2026-06-15 13:22:38 -04:00
Joseph Doherty 37b32a5623 feat(redundancy): periodic HealthTick refreshes DB reachability via Ask/PipeTo 2026-06-15 13:15:26 -04:00
Joseph Doherty 5382eea9b5 test(redundancy): cover stale-probe-not-demoted branch + make _probeFreshnessWindow readonly (code-review) 2026-06-15 13:11:01 -04:00
Joseph Doherty cf278035d2 feat(redundancy): OpcUaProbeOk from peer-probes-me with freshness debounce 2026-06-15 13:04:41 -04:00
Joseph Doherty a9ff1a64b2 fix(redundancy): always publish first ServiceLevel (even 0) + log SafeSelfStatus failures (code-review) 2026-06-15 13:00:25 -04:00
Joseph Doherty 3e609a2b19 feat(redundancy): OpcUaPublishActor computes ServiceLevel via calculator (DB+stale+leader; legacy seam) 2026-06-15 12:51:32 -04:00
Joseph Doherty ff0f62db38 refactor(redundancy): move ServiceLevelCalculator to Core.Cluster (shared, Runtime-reachable) 2026-06-15 12:45:17 -04:00
Joseph Doherty 4501f12669 feat(vtags): wire IHistoryWriter through DriverHostActor (Null default; durable sink infra-gated) (H5d, stillpending §1) 2026-06-15 10:38:49 -04:00
Joseph Doherty 2f30c54dc1 test(vtags): thread-safe CapturingHistoryWriter + drop redundant wait (H5c review follow-up) 2026-06-15 10:33:14 -04:00
Joseph Doherty 0c6d4c5491 feat(vtags): forward historized vtag results to IHistoryWriter (H5c, stillpending §1) 2026-06-15 10:26:25 -04:00
Joseph Doherty 83d3b9f7be test(vtags): planner detects Historize-only toggle as a change + doc nit (H5a review follow-up) 2026-06-15 10:21:31 -04:00
Joseph Doherty 9c5a091395 feat(vtags): decode VirtualTag Historize from artifact, byte-parity with composer (H5b, stillpending §1) 2026-06-15 10:17:08 -04:00
Joseph Doherty fc8121cbf3 feat(vtags): carry VirtualTag.Historize onto EquipmentVirtualTagPlan (H5a, stillpending §1) 2026-06-15 10:17:05 -04:00
Joseph Doherty ebf2f1dd7a fix(vtags): prune _planByVtag on child termination + crash-then-change test (H1b review follow-up) 2026-06-15 10:12:11 -04:00
Joseph Doherty ada01e1af8 fix(vtags): respawn equipment virtualtag child on in-place plan change (H1b, stillpending §1) 2026-06-15 10:05:29 -04:00
Joseph Doherty 1dc713693a fix(deploy): count removed equipment tags/vtags in RemovedNodes (H1a review follow-up) 2026-06-15 10:01:37 -04:00
Joseph Doherty 1e95856b00 fix(deploy): rebuild address space on changed-only deploys (H1a, stillpending §1) 2026-06-15 09:57:40 -04:00
Joseph Doherty c9643f68ba fix(runtime): restart driver no longer throws 'actor name is not unique'
v2-ci / build (push) Failing after 42s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
HandleRestartDriver stopped + respawned the child within one synchronous
message handler, reusing the base actor name drv-<id>. Context.Stop is async
(the child processes its own stop on its own mailbox), so the old child was
ALWAYS still registered when the respawn ran — Context.ActorOf threw
InvalidActorNameException deterministically on every AdminUI Restart press,
crashing + restarting the host.

Fix: a monotonic _childSpawnGeneration counter (single-threaded actor) feeds a
-g<gen> suffix on every spawned child name, so a respawn can never collide with
the still-terminating predecessor. Children are tracked by the _children dict
(by IActorRef), never by actor path, so the suffix is invisible to callers.
This also closes the same-shaped latent race in the reconcile path (a removed-
then-readded instance, and a driver-type-change ToStop+ToSpawn in one plan).

Regression test RestartDriver_respawns_the_child_without_an_actor_name_collision
(verified: FAILS on the old code with the exact InvalidActorNameException,
PASSES with the fix). Runtime.Tests 238/238 green. Code-reviewed (approved).
2026-06-15 05:41:18 -04:00
Joseph Doherty bea0b482d4 fix(historian): address code review on Raw HistoryRead paging
C1 (critical): a boundary tie cluster larger than NumValuesPerNode could
silently truncate a resumed read to GoodNoData, permanently dropping the
un-emitted ties — the (timestamp, skip) cursor cannot advance past a single
timestamp the fixed-(start,end,cap) backend keeps re-returning. Now detected
and failed LOUDLY per node with BadHistoryOperationUnsupported + a log naming
the tag/timestamp/cap; documented in Historian.md with the larger-cap remedy.
Regression test Raw_tie_cluster_larger_than_page_fails_loudly_not_silently.

I3: build HistoryData before Save() so a projection failure can never orphan a
stored continuation cursor.

N1 (YAGNI): drop the never-produced HistoryReadKind enum + Processed-only
Aggregate/IntervalTicks fields from HistoryContinuationState — only Raw pages.

N3: ComputeResumeCursor guards its documented non-empty precondition.

I1: document InMemoryHistoryContinuationStore's eventual-consistency (test double).

Build clean, 182/182 OpcUaServer tests pass.
2026-06-15 05:15:07 -04:00
Joseph Doherty 94c3ca60fc feat(historian): server-side continuation-point paging for HistoryRead-Raw
The Wonderware historian backend is single-shot — it returns up to
NumValuesPerNode samples with a null continuation point — so paging is
synthesised server-side, time-based, for the only count-capped arm (Raw):

- A full page (count == NumValuesPerNode, NumValuesPerNode > 0) emits an
  opaque 16-byte continuation point and stores a resume cursor; a short page
  (or NumValuesPerNode == 0 "all values") emits none.
- A resume read takes the stored cursor, reads the next page from the boundary
  forward, and emits a fresh CP only if that page is also full.
- The resume cursor is tie-safe (HistoryPaging.ComputeResumeCursor /
  TrimBoundaryDuplicates): the next page resumes from the boundary timestamp
  INCLUSIVE and drops the head ties already returned, so samples sharing the
  boundary SourceTimestamp are neither duplicated nor skipped.

Continuation points are bound to the OPC UA session via the SDK's
ISession.SaveHistoryContinuationPoint / RestoreHistoryContinuationPoint store
(SessionHistoryContinuationStore) — capped by ServerConfiguration.
MaxHistoryContinuationPoints (default 100, oldest-evicted) and disposed on
session close. releaseContinuationPoints is honoured via an override of
HistoryReleaseContinuationPoints (the base dispatcher routes release-only reads
there, never to the per-details arms). An unknown / evicted / released point
resumes to BadContinuationPointInvalid.

Processed and AtTime stay single-shot: neither details type carries a client
count cap, so the single-shot backend returns the complete result in one read
and there is no "full page" signal to page on (spec-conformant). Modified-value
history remains out of scope.

The pure paging decisions + CP store contract are unit-tested via HistoryPaging
+ InMemoryHistoryContinuationStore; the full multi-page round trip is driven
end-to-end through the node manager with an in-memory store + a series-backed
fake historian (the in-process harness is session-less).
2026-06-15 03:02:48 -04:00
Joseph Doherty a5c0c82661 fix(opcua): address code review on write-outcome surfacing
v2-ci / build (push) Failing after 35s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
- A.1 (false-rejection safety): restrict the structural fail-fast's confident-mismatch check to
  the CLOSED set of built-in types ResolveBuiltInDataType emits (numeric families + Boolean/
  String/DateTime/ByteString). Any other expected type (Enumeration, Guid, …) now defers to the
  SDK, so a coercible write (Int32→Enumeration) is never false-rejected. + A7/A8 regression tests.
- C.1: guard BuildWriteFailureAuditEvent (under Lock) in try/catch like ReportAuditEvent, so a
  SetChildValue surprise is swallowed+logged, never thrown out of the fire-and-forget continuation.
2026-06-15 02:45:51 -04:00
Joseph Doherty bb59fd4e75 feat(opcua): surface failed inbound writes to clients (fail-fast, Bad blip, audit event)
Three deferred 'surface the failed write' enhancements on the write-outcome
self-correction path in OtOpcUaNodeManager:

- Item A: synchronous structural fail-fast. EvaluateEquipmentWriteStructure
  (pure static) rejects a structurally-invalid write INLINE (Bad sync) after
  the authz gate but before the optimistic dispatch, so the SDK never applies
  it. Null payload -> BadTypeMismatch; plus a confidence-gated cheap built-in
  type compatibility check (numeric widening + BaseDataType wildcard tolerated;
  uncertain cases defer to the SDK's own coercion).

- Item B: Bad-quality blip on device-write failure. On a revert,
  RevertOptimisticWriteIfNeeded first publishes the still-applied optimistic
  value with StatusCode BadDeviceFailure, then restores the prior value/status
  (both under the existing Lock). Documents the queue-coalescing caveat (a slow
  subscriber may see only the restored value -> the audit event is the reliable
  signal).

- Item C: Part 8 AuditWriteUpdateEvent on device-write failure. Builds an
  AuditWriteUpdateEventState (SourceNode=node, AttributeId=Value, OldValue=prior,
  NewValue=attempted, ClientUserId from the threaded identity, Message carries
  outcome.Reason) under Lock and reports it via Server.ReportEvent OUTSIDE Lock.
  Guarded so auditing-disabled / report failure never breaks the revert.

Threads the writing identity's user-id + node into the continuation. Adds 6
unit tests for EvaluateEquipmentWriteStructure. Build clean (0 warnings);
158/158 OpcUaServer.Tests green.
2026-06-15 02:38:57 -04:00
Joseph Doherty 7f313df7a6 fix(alarms): subscribe native alarms to un-gate the IAlarmSource feed
Phase B native alarms never fired end-to-end: GalaxyDriver suppresses OnAlarmEvent until
an alarm subscription exists (_alarmSubscriptions.Count > 0), but the runtime only attached
the OnAlarmEvent handler and never called SubscribeAlarmsAsync — so the central feed stayed
gated and no transition reached the Part 9 condition / /alerts. Unit tests passed because
they inject through the IAlarmSource seam directly; the deferred live /run surfaced it.

DriverHostActor computes per-driver alarm refs (alarm-bearing tags' FullNames) and hands them
via SetDesiredSubscriptions; DriverInstanceActor calls SubscribeAlarmsAsync for IAlarmSource
drivers on Connected entry and whenever alarm refs are pushed while Connected (the deploy path),
idempotent via a cached handle reset on detach so reconnect re-subscribes.
2026-06-15 00:42:43 -04:00
Joseph Doherty fa2388eabf test(alarms): assert reconnect-dropped native alarm does not dead-letter; tighten severity doc
v2-ci / build (push) Failing after 38s
v2-ci / unit-tests (tests/Core/ZB.MOM.WW.OtOpcUa.Cluster.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.ControlPlane.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Runtime.Tests) (push) Has been skipped
v2-ci / unit-tests (tests/Server/ZB.MOM.WW.OtOpcUa.Security.Tests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.Host.IntegrationTests) (push) Has been skipped
v2-ci / integration (tests/Server/ZB.MOM.WW.OtOpcUa.OpcUaServer.IntegrationTests) (push) Has been skipped
Add AllDeadLetters probe to Native_alarm_during_reconnect_is_dropped_not_forwarded so the
test genuinely guards the Reconnecting state's Receive<NativeAlarmRaised> drop handler —
removing that handler would now cause a dead-letter and fail the assertion (false-negative
gap closed). Reword the ScriptedAlarms.md severity-mapping note: "snaps on the first
transition" → "every transition maps … overriding the authored seed from the first
transition onward", clarifying that MapSeverity runs on every event, not just the first.
2026-06-14 22:56:18 -04:00
Joseph Doherty c03361de1b test(drivers): extract shared stub-driver harness (de-dup) 2026-06-14 22:49:26 -04:00
Joseph Doherty d8129e5ab7 test(alarms): fix reconnect-drop test to use mailbox-ordering approach 2026-06-14 22:48:36 -04:00
Joseph Doherty 51cda2c744 test(alarms): assert OperatorComment flows through ForwardNativeAlarm 2026-06-14 22:41:43 -04:00