Commit Graph

40 Commits

Author SHA1 Message Date
Joseph Doherty af54c8ad11 merge: integrate WaitAsync/M5-audit (parallel session) with galaxy array-write + inbound-timeout fixes 2026-06-17 09:28:15 -04:00
Joseph Doherty bf2f481bb4 fix(siteruntime): normalize routed script return value for cross-process transport
A routed inbound-API call (Route.To(inst).Call(script)) runs the script on
the Site and returns its value to Central inside RouteToCallResponse, which
crosses the Central<->Site PROCESS boundary. A script's natural
'return new { ... }' is a compiler-generated anonymous type that Akka's
cross-process serializer cannot reconstruct on the receiving node, so the
reply was silently dropped and the caller's Route.To().Call() Ask timed out
at 30s with 'Script execution timed out' -- even though the script completed
and all device writes committed.

DeploymentManagerActor.RouteInboundApiCall now projects the routed return
value to a plain CLR graph (Dictionary/List/string/long/double/bool/null)
via a JSON round-trip before placing it in RouteToCallResponse. The graph
round-trips the wire and re-serializes to the same JSON shape the inbound
API expects for the HTTP body / ReturnDefinition validation.

Diagnosed live: IpsenMESMoveIn writes committed + site_events showed the
IpsenMoveIn script completed in ~0.6s, yet the inbound POST returned 500 at
30s; Central's Akka serializer logged 'Writing value of type
<>f__AnonymousType0`1 as Json' at the timeout moment.

379/379 SiteRuntime tests green.
2026-06-17 09:19:12 -04:00
Joseph Doherty c482cac110 feat(siteruntime): unpack routed RouteToWaitForAttributeRequest into InstanceActor (spec §6 site half) 2026-06-17 09:10:08 -04:00
Joseph Doherty 61048a4ecf feat(siteruntime): WaitForAsync/WaitResult + quality-gated WaitAsync (spec §3, §4.2) 2026-06-17 09:05:12 -04:00
Joseph Doherty 04e97f4a87 fix(siteruntime): harden WaitAsync — no spurious match on quality republish, guard throwing predicate, Ask-timeout returns false 2026-06-17 08:44:03 -04:00
Joseph Doherty 75ffa09b8f feat(siteruntime): event-driven Attributes.WaitAsync attribute-change helper
Adds InstanceActor one-shot waiter registry (fast-path + change-match + scheduled
timeout self-eviction), threads per-script timeout token through ScriptRuntimeContext,
and exposes Attributes.WaitAsync(value|predicate, timeout). Replaces handshake busy-poll.
Implements spec docs/plans/2026-06-17-waitfor-attribute-change-helper-spec.md §3-§5;
§6 routed variant + WaitForAsync + quality-only mode deferred.
2026-06-17 08:25:06 -04:00
Joseph Doherty 20760014c2 feat(audit): M5.4 ParentExecutionId tag-cascade for alarm + nested calls (T4) 2026-06-16 21:42:14 -04:00
Joseph Doherty 0780c2e49e docs(m4.4): clear stale deferred/no-op markers for shipped features (relay, bundle-import audit, M5 redaction, audit drill-in, Transport CLI, traceability)
- SiteCallAudit/ServiceCollectionExtensions.cs: drop "still deferred" note on relay; point to SiteCallAuditActor where it lives
- Transport/Import/BundleImporter.cs: update "Only LoadAsync implemented" to reflect all three phases shipped
- SiteRuntime/Scripts/AuditingDbCommand.cs: replace two M5-deferred redaction comments with accurate references to AuditLogOptions.PerTargetOverrides
- SiteRuntime/Scripts/ScriptRuntimeContext.cs: replace "M5 will layer redaction" note with accurate description of shipped redactor
- CentralUI/AuditLogPage.razor.cs: replace "Bundle C wires… no-op seam" with accurate description of HandleRowSelected implementation
- docs/plans/2026-05-24-transport-design.md §13: update from "CLI Deferred / not built in v1" to reflect shipped BundleCommands.cs; update Open Questions entry
- docs/plans/2026-05-24-transport.md: convert Out-of-Scope "Do NOT build CLI" reminder to a factual note that it shipped
- docs/plans/2026-05-24-transport.md.tasks.json: flip all 30 tasks from pending → done (entire Transport feature shipped)
2026-06-16 20:30:29 -04:00
Joseph Doherty 64d6ac7288 refactor(siteruntime): M3.3 ValidateTrustModel delegates to shared ScriptAnalysis + compile-surface parity test 2026-06-16 19:37:50 -04:00
Joseph Doherty feeae1371e fix(multivalue): NJ-3/NJ-4/NJ-5 review fixes
- NJ-3: widen per-row catch to Exception (an STJ encode failure can't abort startup); drop dead null-guard already excluded by the SQL filter
- NJ-4: capture logger/instanceName in locals for the fire-and-forget normalize continuation (match the sibling pattern in this actor)
- NJ-5: emit a warn-log when a malformed List value is imported verbatim; thread an optional ILogger<BundleImporter> to the sync re-import site
2026-06-16 18:25:42 -04:00
Joseph Doherty 5841cec958 feat(siteruntime): normalize old-form List static overrides to native JSON on load 2026-06-16 17:49:21 -04:00
Joseph Doherty 94be5e813b fix(siteruntime): decode List value to typed array before DCL write (OPC UA array write path) 2026-06-16 16:48:28 -04:00
Joseph Doherty ad6bfc8af9 fix(siteruntime): reject SetStaticAttribute with malformed list value (no silent poison persist) 2026-06-16 15:59:30 -04:00
Joseph Doherty 7f97780bb3 feat(siteruntime): decode static List attributes to typed lists in InstanceActor (load/override/set) 2026-06-16 15:52:29 -04:00
Joseph Doherty 96e817a7e1 fix(siteruntime): MV-8 review fixes (construct list inside try; dictionary attr lookup; test hygiene) 2026-06-16 15:48:25 -04:00
Joseph Doherty 4765706e94 feat(dcl): coerce OPC UA array reads to typed list attributes; Bad quality on element mismatch 2026-06-16 15:39:19 -04:00
Joseph Doherty a1d464b50d fix(siteruntime): encode list attribute writes via AttributeValueCodec (was .ToString())
Replace value?.ToString() with AttributeValueCodec.Encode(value) in
AttributeAccessor indexer set and SetAsync, so a List<string>{"a","b"}
encodes to ["a","b"] instead of the garbage ToString representation.
Add using ZB.MOM.WW.ScadaBridge.Commons.Types. Tests verify the codec
contract (list→JSON array, scalar passthrough, null); full round-trip
through the accessor is not viable without a live Akka ActorSystem —
noted in-test with explanation.
2026-06-16 15:38:00 -04:00
Joseph Doherty e2b31a9fd2 fix(siteruntime): M2.12 review nits — observe logger fault + meaningful source fallback (#25)
Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a
faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task
firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful
"InstanceScript:{instanceName}" identifier (matching the site-event-log source convention).
Update the method doc-comment to state the best-effort contract explicitly. Pin the new
fallback value in the shape-precision test.
2026-06-16 06:26:00 -04:00
Joseph Doherty f08038db23 feat(siteruntime): M2.12 (#25) — emit script Error site event on recursion-limit violation
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor
param, defaulted null, all existing callers source-compatible). Add a single
private EmitRecursionLimitEventAsync helper that fires-and-forgets a
"script"/Error site event; called at both recursion guard sites (CallScript
at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor
threads the already-resolved siteEventLogger singleton into the context;
AlarmExecutionActor leaves it null (no siteEventLogger wired there).

Existing _logger.LogError + throw behaviour unchanged.

Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and
CallShared (ISiteEventLogger.LogEventAsync called once with category "script",
severity "Error"; null logger path does not throw).
2026-06-16 06:20:58 -04:00
Joseph Doherty dbf44b9e10 fix(siteruntime): M2.11 — unknown-instance debug snapshot returns InstanceNotFound=true (#24)
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor
previously returned an empty DebugViewSnapshot for unknown instances,
indistinguishable from a deployed-but-empty instance. Callers had no way
to differentiate "not deployed here" from "deployed, no data yet."

Approach — additive field on existing message contract:
  Added `bool InstanceNotFound = false` as an optional trailing parameter
  to DebugViewSnapshot (Commons). All existing positional constructor calls
  and serialized wire frames are unaffected (default = false). A dedicated
  new message type was considered but rejected: the ClusterClient channel
  and DebugStreamService TCS are already typed on DebugViewSnapshot, and a
  second reply union would require wider changes for zero additive-safety
  gain.

Changes:
  - Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
  - DeploymentManagerActor: set InstanceNotFound=true in both unknown-
    instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
  - DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to
    _onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened
  - DebugView.razor: check session.InitialSnapshot.InstanceNotFound after
    connect and show a clear "not deployed on this site" error toast
  - 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot,
    unknown→subscribe, known-empty→InstanceNotFound stays false
2026-06-16 06:08:21 -04:00
Joseph Doherty 3edef09f51 feat(runtime): per-script execution timeout overriding the global default (#9)
Spec promised a per-script timeout but only the global ScriptExecutionTimeoutSeconds
existed. Add nullable TemplateScript.ExecutionTimeoutSeconds threaded through EF +
flattening (ResolvedScript) to ScriptExecutionActor/AlarmExecutionActor, which use
perScript ?? global for the execution CTS. Includes the EF migration for the new column.
2026-06-15 14:40:38 -04:00
Joseph Doherty d05270640d fix(db): classify transient vs permanent SQL errors in Database.CachedWrite (#7)
CachedWrite buffered ALL write failures and retried forever, never returning a
synchronous failure to the script — permanent SQL errors (constraint/syntax/
permission) were treated as transient. Mirror the External-System API path:
attempt immediately, return Failed synchronously on permanent SQL errors (no
buffering), buffer only transient errors; the S&F retry path parks permanent
failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.
2026-06-15 13:53:15 -04:00
Joseph Doherty e5534fddca fix(siteeventlog): suppress snapshot-resync alarm re-emit + coverage + hardening (review) 2026-06-15 12:45:00 -04:00
Joseph Doherty e74c3aef23 feat(siteeventlog): emit script started/completed Info events (M1.8)
ScriptExecutionActor previously emitted only an Error 'script' event on failure.
It now also fire-and-forgets an Info 'script' event when execution starts (right
before RunAsync) and when it completes successfully — giving the operational log
the full started/completed/failed lifecycle. Uses the already-resolved
siteEventLogger; fire-and-forget so the event log can never block or fault the
script's own run.

Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory
(returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope()
reaches the logging hot path in tests instead of throwing into the catch.
2026-06-15 12:33:31 -04:00
Joseph Doherty 09b9e8f259 feat(siteeventlog): emit deployment + instance_lifecycle events (M1.6)
DeploymentManagerActor now fire-and-forgets a 'deployment' site operational
event on deploy/enable/disable/delete outcomes (Info on success, Error on
failure), source 'DeploymentManagerActor'. The disable/delete events are emitted
from the existing PipeTo continuations (safe: reads only the immutable
_serviceProvider and fire-and-forgets).

InstanceActor now emits an 'instance_lifecycle' Info event in PreStart (started)
and a new PostStop (stopped) — covering start/stop/enable/disable/redeploy/
failover transitions from the instance's own vantage point. Both actors already
hold _serviceProvider; no ctor change.

Resolution is optional and LogEventAsync is fire-and-forget so a logging failure
never affects the deployment pipeline or instance lifecycle.
2026-06-15 12:26:54 -04:00
Joseph Doherty a00e43c4f9 feat(siteeventlog): emit alarm-category events on alarm transitions (M1.5)
AlarmActor (computed) and NativeAlarmActor (native mirror) now fire-and-forget
an 'alarm' site operational event on every state transition:
- raise/activate: Error (priority/severity >= 700) or Warning
- clear/return-to-normal, ack, inter-band transition: Info

Both actors take a new optional IServiceProvider? ctor param (default null so
existing direct-construction tests still compile); InstanceActor passes its
_serviceProvider at the two Props.Create sites. Resolution is optional and the
LogEventAsync call is fire-and-forget, so a logging failure never affects alarm
evaluation. Rehydration replays are not re-logged.

Adds a capturing FakeSiteEventLogger test helper + SingleServiceProvider.
2026-06-15 12:23:04 -04:00
Joseph Doherty 6b0140dd62 fix(sitecallaudit): UpdatedAtUtc index + per-row pull resilience + UTC-convention + first-cycle test (review) 2026-06-15 10:47:25 -04:00
Joseph Doherty 963e3427da feat(sitecallaudit): PullSiteCalls reconciliation plumbing (store read + RPC + site handler + central client)
Site Call Audit (#22): build the documented periodic reconciliation PULL
self-heal path for the eventually-consistent central SiteCalls mirror, as a
dedicated PullSiteCalls gRPC RPC kept separate from the audit pull. This is the
pull PLUMBING only; the central reconciliation tick is a separate follow-up.

- IOperationTrackingStore.ReadChangedSinceAsync(sinceUtc, batchSize): inclusive
  UpdatedAtUtc cursor, oldest-first, batch-capped; SQLite impl projects tracking
  rows onto SiteCallOperational (Kind->Channel, TargetSummary->Target, SourceSite
  left empty - the store has no site-id column).
- sitestream.proto: rpc PullSiteCalls + PullSiteCallsRequest/Response, mirroring
  PullAuditEvents; regenerated checked-in SiteStreamGrpc/*.cs.
- SiteCallDtoMapper.ToDto(SiteCallOperational): inverse of FromDto for the handler.
- SiteStreamGrpcServer.PullSiteCalls handler + SetOperationTrackingStore seam;
  Host wires the seam alongside SetSiteAuditQueue (site roles only).
- Central IPullSiteCallsClient + GrpcPullSiteCallsClient (home: AuditLog/Central to
  reuse ISiteEnumerator; SiteCallAudit does not reference AuditLog). Re-stamps
  SourceSite from the dialed siteId; no-throw on tolerable transport faults;
  SpecifyKind (not ToUniversalTime) cursor handling. Central-only DI registration.

Tests: ReadChangedSinceAsync (4), PullSiteCalls handler (6), GrpcPullSiteCallsClient
(8). Full solution build 0 warnings/0 errors (TreatWarningsAsErrors).
2026-06-15 10:39:06 -04:00
Joseph Doherty eabf270d71 docs: complete XML doc coverage (returns, summaries, inheritdoc)
Resolve all 622 issues flagged by the enhanced CommentChecker: add missing
<returns> tags (incl. the standard phrasing on non-generic Task methods),
add missing <summary> tags, and replace misused/redundant <inheritdoc/> on
members that override or implement nothing with real documentation.
Documentation-only — no behavior change; solution builds clean.
2026-06-03 11:39:32 -04:00
Joseph Doherty db707bb0de feat(audit)!: ScadaBridge C3 — swap to canonical ZB.MOM.WW.Audit.AuditEvent across seams/emitters/DTO/redactor wiring; transitional 24-col storage shim (Task 2.5) 2026-06-02 12:37:50 -04:00
Joseph Doherty 6d318586d1 feat(siteruntime): InstanceActor spawns NativeAlarmActors + enriched alarm snapshot; clear native state on redeploy/undeploy 2026-05-31 02:06:39 -04:00
Joseph Doherty fda7ac9c50 feat(siteruntime): NativeAlarmActor mirrors source alarms (snapshot swap, retention, persistence) 2026-05-31 01:49:28 -04:00
Joseph Doherty 24fd7bee53 feat(siteruntime): site SQLite native_alarm_state store 2026-05-31 01:44:40 -04:00
Joseph Doherty b44a844152 feat(siteruntime): native alarm cap + retry options 2026-05-31 01:42:41 -04:00
Joseph Doherty bfd8b25108 fix(siteruntime): capture Self before Task.Run in artifact deploy; seed MxGateway connections
- DeploymentManagerActor.HandleDeployArtifacts read the Self property inside its
  Task.Run lambda (line dispatching ApplyArtifactDataConnectionsToDcl). Self is
  backed by the ambient ActorCell, null on a thread-pool thread, so it threw
  'no active ActorContext' — surfaced the first time a data connection is
  deployed via deploy-artifacts. Capture Self into a local first (as Sender
  already was).
- seed-sites.sh: create a shared MxGateway data connection (10.100.0.48:5120)
  on each site and deploy artifacts so the DCL establishes them.
- build.sh: nounset-safe empty-array expansion (bash 3.2).
2026-05-29 08:26:39 -04:00
Joseph Doherty 5461e4968e feat(dcl): register MxGateway protocol in factory + config flatten + options
DataConnectionFactory registers 'MxGateway' -> MxGatewayDataConnection over the
real client; AddDataConnectionLayer binds MxGatewayGlobalOptions; DeploymentManager
FlattenConnectionConfig gains an MxGateway arm using the typed serializer. Factory
test confirms Create("MxGateway") returns the adapter.
2026-05-29 07:58:51 -04:00
Joseph Doherty 9b7916bb2e refactor(browse): rename BrowseOpcUaNode* to protocol-agnostic BrowseNode*
Renames BrowseOpcUaNodeCommand/Result -> BrowseNodeCommand/Result and
CommunicationService.BrowseOpcUaNodeAsync -> BrowseNodeAsync across Commons,
Communication, SiteRuntime, DCL actors, and CentralUI. Wire manifest name
follows (BrowseOpcUaNode -> BrowseNode). Browse regression tests green.
2026-05-29 07:57:36 -04:00
Joseph Doherty 2a7dee4afa feat(centralui+dcl): Test Bindings popup — one-shot live read of bound tags
Adds a Test Bindings button to the Connection Bindings table on the Configure
Instance page that opens a modal showing the live current value of every bound
attribute. Reuses the routing path that the OPC UA tag browser landed on:

  Central:  TestBindingsDialog → IBindingTester → CommunicationService
            → ReadTagValuesCommand → SiteEnvelope (Ask)
  Site:     SiteCommunicationActor → DeploymentManagerActor singleton
            → DataConnectionManagerActor → child DataConnectionActor
            → _adapter.ReadBatchAsync

Split mirrors the browse handler:
  • Manager owns ConnectionNotFound (only it sees the per-site connection set).
  • Child owns ConnectionNotConnected (pre-call status check, never stash —
    read is interactive design-time), Timeout (OperationCanceledException),
    ServerError (any other exception). Per-tag failures from ReadBatchAsync
    become failure TagReadOutcomes without aborting the batch.

CentralUI:
  • IBindingTester / BindingTester — Design-role guard via HasClaim against
    JwtTokenService.RoleClaimType (not IsInRole — see c1e16cf), typed
    transport-failure translation.
  • TestBindingsDialog — ShowAsync(siteId, rows, instanceLabel) method-arg
    pattern (no Razor parameter race; see 2c138b6), groups rows by connection
    and issues one ReadAsync per connection in parallel, per-row error subline
    + per-connection banner, Refresh button re-issues the reads.
  • InstanceConfigure.razor — Test Bindings button next to Save Bindings,
    disabled when no testable rows. OPC UA only today (other protocols have
    no ReadTagValuesCommand wiring yet).

Tests:
  • Commons: ReadTagValuesCommand discovered by ManagementCommandRegistry.
  • DataConnectionLayer: unknown connection → ConnectionNotFound,
    not-connected adapter → ConnectionNotConnected (ReadBatchAsync NOT called),
    success-path mapping (Good/Bad + per-tag error), cancellation → Timeout.
  • CentralUI: register IBindingTester (and the previously-missing
    IOpcUaBrowseService) on the existing InstanceConfigureAuditDrillinTests
    Bunit container so the page renders cleanly with the new dialog.
2026-05-28 13:25:48 -04:00
Joseph Doherty f401a9ea0e fix(comm+site): route BrowseOpcUaNodeCommand via DeploymentManagerActor singleton 2026-05-28 12:51:45 -04:00
Joseph Doherty 7b0b9c7365 refactor: rename ScadaLink → ZB.MOM.WW.ScadaBridge (code + projects + namespaces)
Solution + 23 src projects + 26 test projects renamed; folders, csproj,
namespaces, and ScadaLinkDbContext/ScadaBridgeDbContext class updated.
ActorSystem "scadalink" → "scadabridge", Akka seed-node URLs migrated.
SQL roles/logins, LDAP domains, CLI command name, and CLI config dir
(~/.scadalink → ~/.scadabridge) also renamed.

Build green; 5 Host.Tests fail awaiting SQL login rename in next commit.
Pre-existing StaleTagMonitor timing flakes unchanged.

Rename script committed at tools/rename-to-scadabridge.sh.
2026-05-28 09:37:45 -04:00