Commit Graph

45 Commits

Author SHA1 Message Date
Joseph Doherty b5347faf44 fix(DV-2): clarify placeholder comment, stable MinValue timestamp, drain DCL probe in new tests
- Replace placeholder-loop comment with the double-render guard explanation
- Use _alarmTimestamps.GetValueOrDefault(binding, DateTimeOffset.MinValue) so the
  placeholder timestamp is stable/idempotent across snapshot calls (was UtcNow)
- Add dcl.ExpectMsg<SubscribeAlarmsRequest>() drain in Snapshot_QuietNativeBinding_EmitsPlaceholder
  and Snapshot_NativeBindingWithLiveCondition_NoPlaceholder to consume the DCL message
  the NativeAlarmActor sends at startup
2026-06-17 15:08:37 -04:00
Joseph Doherty 5d07ac24cb feat(debugview): DV-2 emit placeholder rows for quiet native alarm bindings
InstanceActor.BuildAlarmStatesSnapshot now adds an IsConfiguredPlaceholder
row per configured native source binding that currently has no live
condition, so the Debug View tree can show the binding node even when
quiet. A binding is "quiet" when no retained AlarmStateChanged carries its
NativeSourceCanonicalName (DV-1).

Kind derivation: reuses the exact nativeKind value already computed via
ResolveNativeKind(nativeSource.ConnectionName) at the NativeAlarmActor
creation site and stored in a new _nativeAlarmKinds dictionary -- the
accurate per-binding kind (NativeOpcUa vs NativeMxAccess), not the
NativeOpcUa default.

Tests: Snapshot_QuietNativeBinding_EmitsPlaceholder,
Snapshot_NativeBindingWithLiveCondition_NoPlaceholder.
2026-06-17 15:00:20 -04:00
Joseph Doherty 899ad6e106 feat(debugview): DV-1 native-binding linkage on AlarmStateChanged contract chain
Add two additive init-only fields to AlarmStateChanged so the Debug View can
nest live native conditions under their configured source-binding node:
  - NativeSourceCanonicalName (binding canonical name, e.g. "Motor1.MotorAlarms")
  - IsConfiguredPlaceholder (quiet-binding placeholder flag; default false)

Flow on BOTH cross-process paths:
  - Live: proto AlarmStateUpdate fields 22/23 -> StreamRelayActor packs ->
    SiteStreamGrpcClient unpacks (regenerated SiteStreamGrpc/Sitestream.cs).
  - Snapshot (Newtonsoft): record defaults carry through; no special handling.

NativeAlarmActor.Emit now stamps NativeSourceCanonicalName = _source.CanonicalName.
Additive-only: no existing positional constructor or wire frame changed.

Tests: StreamRelayActorTests round-trips both fields pack->unpack;
NativeAlarmActorTests asserts the emitted event carries the binding canonical name.
2026-06-17 14:52:03 -04:00
Joseph Doherty 0e989c867d feat(siteruntime): event-driven Attributes.WriteBatchAndWaitAsync (batched DCL write + trigger + existing WaitForAttribute waiter) + compile mirror 2026-06-17 12:13:02 -04:00
Joseph Doherty b88f04ec2d fix(siteruntime): normalize routed WaitForAttribute response value for cross-process transport 2026-06-17 11:10:17 -04:00
Joseph Doherty af54c8ad11 merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes 2026-06-17 09:28:15 -04:00
Joseph Doherty bf2f481bb4 fix(siteruntime): normalize routed script return value for cross-process transport
A routed inbound-API call (Route.To(inst).Call(script)) runs the script on
the Site and returns its value to Central inside RouteToCallResponse, which
crosses the Central<->Site PROCESS boundary. A script's natural
'return new { ... }' is a compiler-generated anonymous type that Akka's
cross-process serializer cannot reconstruct on the receiving node, so the
reply was silently dropped and the caller's Route.To().Call() Ask timed out
at 30s with 'Script execution timed out' -- even though the script completed
and all device writes committed.

DeploymentManagerActor.RouteInboundApiCall now projects the routed return
value to a plain CLR graph (Dictionary/List/string/long/double/bool/null)
via a JSON round-trip before placing it in RouteToCallResponse. The graph
round-trips the wire and re-serializes to the same JSON shape the inbound
API expects for the HTTP body / ReturnDefinition validation.

Diagnosed live: IpsenMESMoveIn writes committed + site_events showed the
IpsenMoveIn script completed in ~0.6s, yet the inbound POST returned 500 at
30s; Central's Akka serializer logged 'Writing value of type
<>f__AnonymousType0`1 as Json' at the timeout moment.

379/379 SiteRuntime tests green.
2026-06-17 09:19:12 -04:00
Joseph Doherty c482cac110 feat(siteruntime): unpack routed RouteToWaitForAttributeRequest into InstanceActor (spec §6 site half) 2026-06-17 09:10:08 -04:00
Joseph Doherty 61048a4ecf feat(siteruntime): WaitForAsync/WaitResult + quality-gated WaitAsync (spec §3, §4.2) 2026-06-17 09:05:12 -04:00
Joseph Doherty 04e97f4a87 fix(siteruntime): harden WaitAsync — no spurious match on quality republish, guard throwing predicate, Ask-timeout returns false 2026-06-17 08:44:03 -04:00
Joseph Doherty 75ffa09b8f feat(siteruntime): event-driven Attributes.WaitAsync attribute-change helper
Adds InstanceActor one-shot waiter registry (fast-path + change-match + scheduled
timeout self-eviction), threads per-script timeout token through ScriptRuntimeContext,
and exposes Attributes.WaitAsync(value|predicate, timeout). Replaces handshake busy-poll.
Implements spec docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md §3-§5;
§6 routed variant + WaitForAsync + quality-only mode deferred.
2026-06-17 08:25:06 -04:00
Joseph Doherty 20760014c2 feat(audit): M5.4 ParentExecutionId tag-cascade for alarm + nested calls (T4) 2026-06-16 21:42:14 -04:00
Joseph Doherty 0780c2e49e docs(m4.4): clear stale deferred/no-op markers for shipped features (relay, bundle-import audit, M5 redaction, audit drill-in, Transport CLI, traceability)
- SiteCallAudit/ServiceCollectionExtensions.cs: drop "still deferred" note on relay; point to SiteCallAuditActor where it lives
- Transport/Import/BundleImporter.cs: update "Only LoadAsync implemented" to reflect all three phases shipped
- SiteRuntime/Scripts/AuditingDbCommand.cs: replace two M5-deferred redaction comments with accurate references to AuditLogOptions.PerTargetOverrides
- SiteRuntime/Scripts/ScriptRuntimeContext.cs: replace "M5 will layer redaction" note with accurate description of shipped redactor
- CentralUI/AuditLogPage.razor.cs: replace "Bundle C wires… no-op seam" with accurate description of HandleRowSelected implementation
- docs/plans/2026-05-24-transport-design.md §13: update from "CLI Deferred / not built in v1" to reflect shipped BundleCommands.cs; update Open Questions entry
- docs/plans/2026-05-24-transport.md: convert Out-of-Scope "Do NOT build CLI" reminder to a factual note that it shipped
- docs/plans/2026-05-24-transport.md.tasks.json: flip all 30 tasks from pending → done (entire Transport feature shipped)
2026-06-16 20:30:29 -04:00
Joseph Doherty 64d6ac7288 refactor(siteruntime): M3.3 ValidateTrustModel delegates to shared ScriptAnalysis + compile-surface parity test 2026-06-16 19:37:50 -04:00
Joseph Doherty feeae1371e fix(multivalue): NJ-3/NJ-4/NJ-5 review fixes
- NJ-3: widen per-row catch to Exception (an STJ encode failure can't abort startup); drop dead null-guard already excluded by the SQL filter
- NJ-4: capture logger/instanceName in locals for the fire-and-forget normalize continuation (match the sibling pattern in this actor)
- NJ-5: emit a warn-log when a malformed List value is imported verbatim; thread an optional ILogger<BundleImporter> to the sync re-import site
2026-06-16 18:25:42 -04:00
Joseph Doherty 5841cec958 feat(siteruntime): normalize old-form List static overrides to native JSON on load 2026-06-16 17:49:21 -04:00
Joseph Doherty 94be5e813b fix(siteruntime): decode List value to typed array before DCL write (OPC UA array write path) 2026-06-16 16:48:28 -04:00
Joseph Doherty ad6bfc8af9 fix(siteruntime): reject SetStaticAttribute with malformed list value (no silent poison persist) 2026-06-16 15:59:30 -04:00
Joseph Doherty 7f97780bb3 feat(siteruntime): decode static List attributes to typed lists in InstanceActor (load/override/set) 2026-06-16 15:52:29 -04:00
Joseph Doherty 96e817a7e1 fix(siteruntime): MV-8 review fixes (construct list inside try; dictionary attr lookup; test hygiene) 2026-06-16 15:48:25 -04:00
Joseph Doherty 4765706e94 feat(dcl): coerce OPC UA array reads to typed list attributes; Bad quality on element mismatch 2026-06-16 15:39:19 -04:00
Joseph Doherty a1d464b50d fix(siteruntime): encode list attribute writes via AttributeValueCodec (was .ToString())
Replace value?.ToString() with AttributeValueCodec.Encode(value) in
AttributeAccessor indexer set and SetAsync, so a List<string>{"a","b"}
encodes to ["a","b"] instead of the garbage ToString representation.
Add using ZB.MOM.WW.ScadaBridge.Commons.Types. Tests verify the codec
contract (list→JSON array, scalar passthrough, null); full round-trip
through the accessor is not viable without a live Akka ActorSystem —
noted in-test with explanation.
2026-06-16 15:38:00 -04:00
Joseph Doherty e2b31a9fd2 fix(siteruntime): M2.12 review nits — observe logger fault + meaningful source fallback (#25)
Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a
faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task
firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful
"InstanceScript:{instanceName}" identifier (matching the site-event-log source convention).
Update the method doc-comment to state the best-effort contract explicitly. Pin the new
fallback value in the shape-precision test.
2026-06-16 06:26:00 -04:00
Joseph Doherty f08038db23 feat(siteruntime): M2.12 (#25) — emit script Error site event on recursion-limit violation
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor
param, defaulted null, all existing callers source-compatible). Add a single
private EmitRecursionLimitEventAsync helper that fires-and-forgets a
"script"/Error site event; called at both recursion guard sites (CallScript
at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor
threads the already-resolved siteEventLogger singleton into the context;
AlarmExecutionActor leaves it null (no siteEventLogger wired there).

Existing _logger.LogError + throw behaviour unchanged.

Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and
CallShared (ISiteEventLogger.LogEventAsync called once with category "script",
severity "Error"; null logger path does not throw).
2026-06-16 06:20:58 -04:00
Joseph Doherty dbf44b9e10 fix(siteruntime): M2.11 — unknown-instance debug snapshot returns InstanceNotFound=true (#24)
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor
previously returned an empty DebugViewSnapshot for unknown instances,
indistinguishable from a deployed-but-empty instance. Callers had no way
to differentiate "not deployed here" from "deployed, no data yet."

Approach — additive field on existing message contract:
  Added `bool InstanceNotFound = false` as an optional trailing parameter
  to DebugViewSnapshot (Commons). All existing positional constructor calls
  and serialized wire frames are unaffected (default = false). A dedicated
  new message type was considered but rejected: the ClusterClient channel
  and DebugStreamService TCS are already typed on DebugViewSnapshot, and a
  second reply union would require wider changes for zero additive-safety
  gain.

Changes:
  - Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
  - DeploymentManagerActor: set InstanceNotFound=true in both unknown-
    instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
  - DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to
    _onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened
  - DebugView.razor: check session.InitialSnapshot.InstanceNotFound after
    connect and show a clear "not deployed on this site" error toast
  - 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot,
    unknown→subscribe, known-empty→InstanceNotFound stays false
2026-06-16 06:08:21 -04:00
Joseph Doherty 3edef09f51 feat(runtime): per-script execution timeout overriding the global default (#9)
Spec promised a per-script timeout but only the global ScriptExecutionTimeoutSeconds
existed. Add nullable TemplateScript.ExecutionTimeoutSeconds threaded through EF +
flattening (ResolvedScript) to ScriptExecutionActor/AlarmExecutionActor, which use
perScript ?? global for the execution CTS. Includes the EF migration for the new column.
2026-06-15 14:40:38 -04:00
Joseph Doherty d05270640d fix(db): classify transient vs permanent SQL errors in Database.CachedWrite (#7)
CachedWrite buffered ALL write failures and retried forever, never returning a
synchronous failure to the script — permanent SQL errors (constraint/syntax/
permission) were treated as transient. Mirror the External-System API path:
attempt immediately, return Failed synchronously on permanent SQL errors (no
buffering), buffer only transient errors; the S&F retry path parks permanent
failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.
2026-06-15 13:53:15 -04:00
Joseph Doherty e5534fddca fix(siteeventlog): suppress snapshot-resync alarm re-emit + coverage + hardening (review) 2026-06-15 12:45:00 -04:00
Joseph Doherty e74c3aef23 feat(siteeventlog): emit script started/completed Info events (M1.8)
ScriptExecutionActor previously emitted only an Error 'script' event on failure.
It now also fire-and-forgets an Info 'script' event when execution starts (right
before RunAsync) and when it completes successfully — giving the operational log
the full started/completed/failed lifecycle. Uses the already-resolved
siteEventLogger; fire-and-forget so the event log can never block or fault the
script's own run.

Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory
(returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope()
reaches the logging hot path in tests instead of throwing into the catch.
2026-06-15 12:33:31 -04:00
Joseph Doherty 09b9e8f259 feat(siteeventlog): emit deployment + instance_lifecycle events (M1.6)
DeploymentManagerActor now fire-and-forgets a 'deployment' site operational
event on deploy/enable/disable/delete outcomes (Info on success, Error on
failure), source 'DeploymentManagerActor'. The disable/delete events are emitted
from the existing PipeTo continuations (safe: reads only the immutable
_serviceProvider and fire-and-forgets).

InstanceActor now emits an 'instance_lifecycle' Info event in PreStart (started)
and a new PostStop (stopped) — covering start/stop/enable/disable/redeploy/
failover transitions from the instance's own vantage point. Both actors already
hold _serviceProvider; no ctor change.

Resolution is optional and LogEventAsync is fire-and-forget so a logging failure
never affects the deployment pipeline or instance lifecycle.
2026-06-15 12:26:54 -04:00
Joseph Doherty a00e43c4f9 feat(siteeventlog): emit alarm-category events on alarm transitions (M1.5)
AlarmActor (computed) and NativeAlarmActor (native mirror) now fire-and-forget
an 'alarm' site operational event on every state transition:
- raise/activate: Error (priority/severity >= 700) or Warning
- clear/return-to-normal, ack, inter-band transition: Info

Both actors take a new optional IServiceProvider? ctor param (default null so
existing direct-construction tests still compile); InstanceActor passes its
_serviceProvider at the two Props.Create sites. Resolution is optional and the
LogEventAsync call is fire-and-forget, so a logging failure never affects alarm
evaluation. Rehydration replays are not re-logged.

Adds a capturing FakeSiteEventLogger test helper + SingleServiceProvider.
2026-06-15 12:23:04 -04:00
Joseph Doherty 6b0140dd62 fix(sitecallaudit): UpdatedAtUtc index + per-row pull resilience + UTC-convention + first-cycle test (review) 2026-06-15 10:47:25 -04:00
Joseph Doherty 963e3427da feat(sitecallaudit): PullSiteCalls reconciliation plumbing (store read + RPC + site handler + central client)
Site Call Audit (#22): build the documented periodic reconciliation PULL
self-heal path for the eventually-consistent central SiteCalls mirror, as a
dedicated PullSiteCalls gRPC RPC kept separate from the audit pull. This is the
pull PLUMBING only; the central reconciliation tick is a separate follow-up.

- IOperationTrackingStore.ReadChangedSinceAsync(sinceUtc, batchSize): inclusive
  UpdatedAtUtc cursor, oldest-first, batch-capped; SQLite impl projects tracking
  rows onto SiteCallOperational (Kind->Channel, TargetSummary->Target, SourceSite
  left empty - the store has no site-id column).
- sitestream.proto: rpc PullSiteCalls + PullSiteCallsRequest/Response, mirroring
  PullAuditEvents; regenerated checked-in SiteStreamGrpc/*.cs.
- SiteCallDtoMapper.ToDto(SiteCallOperational): inverse of FromDto for the handler.
- SiteStreamGrpcServer.PullSiteCalls handler + SetOperationTrackingStore seam;
  Host wires the seam alongside SetSiteAuditQueue (site roles only).
- Central IPullSiteCallsClient + GrpcPullSiteCallsClient (home: AuditLog/Central to
  reuse ISiteEnumerator; SiteCallAudit does not reference AuditLog). Re-stamps
  SourceSite from the dialed siteId; no-throw on tolerable transport faults;
  SpecifyKind (not ToUniversalTime) cursor handling. Central-only DI registration.

Tests: ReadChangedSinceAsync (4), PullSiteCalls handler (6), GrpcPullSiteCallsClient
(8). Full solution build 0 warnings/0 errors (TreatWarningsAsErrors).
2026-06-15 10:39:06 -04:00
Joseph Doherty eabf270d71 docs: complete XML doc coverage (returns, summaries, inheritdoc)
Resolve all 622 issues flagged by the enhanced CommentChecker: add missing
<returns> tags (incl. the standard phrasing on non-generic Task methods),
add missing <summary> tags, and replace misused/redundant <inheritdoc/> on
members that override or implement nothing with real documentation.
Documentation-only — no behavior change; solution builds clean.
2026-06-03 11:39:32 -04:00
Joseph Doherty db707bb0de feat(audit)!: ScadaBridge C3 — swap to canonical ZB.MOM.WW.Audit.AuditEvent across seams/emitters/DTO/redactor wiring; transitional 24-col storage shim (Task 2.5) 2026-06-02 12:37:50 -04:00
Joseph Doherty 6d318586d1 feat(siteruntime): InstanceActor spawns NativeAlarmActors + enriched alarm snapshot; clear native state on redeploy/undeploy 2026-05-31 02:06:39 -04:00
Joseph Doherty fda7ac9c50 feat(siteruntime): NativeAlarmActor mirrors source alarms (snapshot swap, retention, persistence) 2026-05-31 01:49:28 -04:00
Joseph Doherty 24fd7bee53 feat(siteruntime): site SQLite native_alarm_state store 2026-05-31 01:44:40 -04:00
Joseph Doherty b44a844152 feat(siteruntime): native alarm cap + retry options 2026-05-31 01:42:41 -04:00
Joseph Doherty bfd8b25108 fix(siteruntime): capture Self before Task.Run in artifact deploy; seed MxGateway connections
- DeploymentManagerActor.HandleDeployArtifacts read the Self property inside its
  Task.Run lambda (line dispatching ApplyArtifactDataConnectionsToDcl). Self is
  backed by the ambient ActorCell, null on a thread-pool thread, so it threw
  'no active ActorContext' — surfaced the first time a data connection is
  deployed via deploy-artifacts. Capture Self into a local first (as Sender
  already was).
- seed-sites.sh: create a shared MxGateway data connection (10.100.0.48:5120)
  on each site and deploy artifacts so the DCL establishes them.
- build.sh: nounset-safe empty-array expansion (bash 3.2).
2026-05-29 08:26:39 -04:00
Joseph Doherty 5461e4968e feat(dcl): register MxGateway protocol in factory + config flatten + options
DataConnectionFactory registers 'MxGateway' -> MxGatewayDataConnection over the
real client; AddDataConnectionLayer binds MxGatewayGlobalOptions; DeploymentManager
FlattenConnectionConfig gains an MxGateway arm using the typed serializer. Factory
test confirms Create("MxGateway") returns the adapter.
2026-05-29 07:58:51 -04:00
Joseph Doherty 9b7916bb2e refactor(browse): rename BrowseOpcUaNode* to protocol-agnostic BrowseNode*
Renames BrowseOpcUaNodeCommand/Result -> BrowseNodeCommand/Result and
CommunicationService.BrowseOpcUaNodeAsync -> BrowseNodeAsync across Commons,
Communication, SiteRuntime, DCL actors, and CentralUI. Wire manifest name
follows (BrowseOpcUaNode -> BrowseNode). Browse regression tests green.
2026-05-29 07:57:36 -04:00
Joseph Doherty 2a7dee4afa feat(centralui+dcl): Test Bindings popup — one-shot live read of bound tags
Adds a Test Bindings button to the Connection Bindings table on the Configure
Instance page that opens a modal showing the live current value of every bound
attribute. Reuses the routing path that the OPC UA tag browser landed on:

  Central:  TestBindingsDialog → IBindingTester → CommunicationService
            → ReadTagValuesCommand → SiteEnvelope (Ask)
  Site:     SiteCommunicationActor → DeploymentManagerActor singleton
            → DataConnectionManagerActor → child DataConnectionActor
            → _adapter.ReadBatchAsync

Split mirrors the browse handler:
  • Manager owns ConnectionNotFound (only it sees the per-site connection set).
  • Child owns ConnectionNotConnected (pre-call status check, never stash —
    read is interactive design-time), Timeout (OperationCanceledException),
    ServerError (any other exception). Per-tag failures from ReadBatchAsync
    become failure TagReadOutcomes without aborting the batch.

CentralUI:
  • IBindingTester / BindingTester — Design-role guard via HasClaim against
    JwtTokenService.RoleClaimType (not IsInRole — see c1e16cf), typed
    transport-failure translation.
  • TestBindingsDialog — ShowAsync(siteId, rows, instanceLabel) method-arg
    pattern (no Razor parameter race; see 2c138b6), groups rows by connection
    and issues one ReadAsync per connection in parallel, per-row error subline
    + per-connection banner, Refresh button re-issues the reads.
  • InstanceConfigure.razor — Test Bindings button next to Save Bindings,
    disabled when no testable rows. OPC UA only today (other protocols have
    no ReadTagValuesCommand wiring yet).

Tests:
  • Commons: ReadTagValuesCommand discovered by ManagementCommandRegistry.
  • DataConnectionLayer: unknown connection → ConnectionNotFound,
    not-connected adapter → ConnectionNotConnected (ReadBatchAsync NOT called),
    success-path mapping (Good/Bad + per-tag error), cancellation → Timeout.
  • CentralUI: register IBindingTester (and the previously-missing
    IOpcUaBrowseService) on the existing InstanceConfigureAuditDrillinTests
    Bunit container so the page renders cleanly with the new dialog.
2026-05-28 13:25:48 -04:00
Joseph Doherty f401a9ea0e fix(comm+site): route BrowseOpcUaNodeCommand via DeploymentManagerActor singleton 2026-05-28 12:51:45 -04:00
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00