ScadaBridge

Author	SHA1	Message	Date
Joseph Doherty	75919cec31	test(security): DL-3 review nits — assert OnValidatePrincipal on prod path + warning/doc polish	2026-06-16 08:52:28 -04:00
Joseph Doherty	e89604298d	feat(security): wire DisableLogin flag — auto-login scheme + startup warning	2026-06-16 08:47:19 -04:00
Joseph Doherty	0926ce4dda	test(security): DL-2 review nits — assert IsAuthenticated + clarify handler flag gating	2026-06-16 08:44:06 -04:00
Joseph Doherty	dcd445a380	feat(security): AutoLoginAuthenticationHandler — all-roles system-wide dev auto-login	2026-06-16 08:40:30 -04:00
Joseph Doherty	72691e5577	feat(security): AuthDisableLoginOptions + Roles.All for dev auto-login	2026-06-16 08:36:48 -04:00
Joseph Doherty	fddc69545f	fix(security): M2.19 review nits: idle/refresh config guard + adapter tests + dead-var/doc cleanup (#15)
- Add SecurityOptionsValidator (IValidateOptions<SecurityOptions>) enforcing RoleRefreshThresholdMinutes < IdleTimeoutMinutes; registered with ValidateOnStart in AddSecurity. Startup FAILS if threshold >= idle, so the invariant cannot be silently misconfigured away.
- Update SecurityOptions XML docs: the class-level summary distinguishes the JWT Bearer path (JwtSigningKey/JwtExpiryMinutes) from the Blazor cookie session path (IdleTimeoutMinutes/RoleRefreshThresholdMinutes); both time fields document the ~45-min effective idle window and the new cross-field constraint.
- Remove the dead jwtService variable from the /auth/login lambda in AuthEndpoints.cs (resolved but never used since login moved to SessionClaimBuilder).
- Extract an ApplyValidationResultAsync helper from OnValidatePrincipalAsync (pure decision-application step); add 3 adapter tests covering Reject → RejectPrincipal + SignOutAsync; Replace → ReplacePrincipal + ShouldRenew; Keep → no-op.
- Fix the inaccurate TryRefreshAsync comment (dropped "OR last-activity needs advancing"; the code only returns non-null when roleRefreshDue).
- Add InternalsVisibleTo for Security.Tests in Security.csproj.
- Add IsRoleRefreshDue tests: missing claim → due; unparsable claim → due; plus an integration test covering the full ValidateAsync path for a principal missing zb:lastrolerefresh (triggers refresh + re-stamps the anchor rather than keeping a stale principal forever).
- Add SecurityOptionsValidatorConfigGuardTests: default succeeds; equal fails; greater fails; boundary (idle-1) succeeds; wiring confirmed via the AddSecurity container.	2026-06-16 08:12:11 -04:00
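For illustration, the cross-field rule reduces to a single strict comparison. This is a Python sketch, not the project's C# SecurityOptionsValidator; the dataclass and function names are hypothetical, and the defaults (30 and 15 minutes) come from the M2.19 feature commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SecurityOptions:
    # Hypothetical mirror of the two cookie-session fields (defaults from M2.19).
    idle_timeout_minutes: int = 30
    role_refresh_threshold_minutes: int = 15


def validate_security_options(opts: SecurityOptions) -> List[str]:
    """Return failure messages; an empty list means the options are valid."""
    failures = []
    # The refresh threshold must be STRICTLY below the idle timeout.
    if opts.role_refresh_threshold_minutes >= opts.idle_timeout_minutes:
        failures.append(
            f"RoleRefreshThresholdMinutes ({opts.role_refresh_threshold_minutes}) "
            f"must be less than IdleTimeoutMinutes ({opts.idle_timeout_minutes})"
        )
    return failures
```

Registered with ValidateOnStart, a non-empty result aborts host startup instead of surfacing on the first request.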
Joseph Doherty	8fe7f46df6	feat(security): cookie session idle-timeout + LDAP-free role-mapping refresh (#15, M2.19)
Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external NuGet package) exposes ONLY AuthenticateAsync(username, password, ct), with no passwordless service-account group search. A live LDAP group re-query for an active session therefore requires a new library method and is OUT OF SCOPE (the external package cannot be modified).
Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals):
- /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a zb:lastrolerefresh anchor (login time, UTC), and also seeds the LastActivity idle anchor.
- SessionClaimBuilder: a single shared claim builder used by BOTH /auth/login AND the refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions).
- CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin CookieAuthenticationEvents.OnValidatePrincipal adapter:
  * idle-timeout: a session past IdleTimeoutMinutes (default 30) gets RejectPrincipal + SignOut; consistent with the cookie ExpireTimeSpan + SlidingExpiration window (same value).
  * role refresh WITHOUT LDAP: when the anchor is older than RoleRefreshThresholdMinutes (new option, default 15), the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt via the shared builder, the anchor advances, and the principal is replaced + the cookie renewed. Revoked DB mappings drop the user's roles mid-session.
  * fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws), mirroring the documented "LDAP failure: active sessions continue with current roles".
- Documented the residual limitation in Component-Security.md: central role-mapping/scope changes apply within ~15 min without LDAP; live directory group-membership changes are picked up only at next login (this needs a passwordless group search on the external ZB.MOM.WW.Auth.Ldap lib; tracked follow-up).
Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests cover idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold no-refresh, refresh-throws-keeps-session, and login/refresh claim parity.	2026-06-16 07:54:31 -04:00
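The validator's decision order (idle check first, then refresh-if-due, fail-soft on error) can be sketched as a pure function. Python stands in for the project's C#; Session, Decision and validate_session are hypothetical names, the DB-backed role mapper is passed in as a plain callable, and the boundary comparisons (`>` for idle, `<=` keeps a session within the refresh threshold) are assumptions.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Callable, FrozenSet, Iterable, Optional, Tuple


class Decision(Enum):
    KEEP = "keep"        # leave the principal untouched
    REPLACE = "replace"  # swap in a rebuilt principal and renew the cookie
    REJECT = "reject"    # reject the principal and sign out


@dataclass(frozen=True)
class Session:
    groups: Tuple[str, ...]                # raw LDAP groups stored at login (zb:group)
    roles: FrozenSet[str]                  # roles currently on the principal
    last_activity: datetime                # idle anchor
    last_role_refresh: Optional[datetime]  # zb:lastrolerefresh; None = missing/unparsable


def validate_session(
    session: Session,
    now: datetime,
    map_roles: Callable[[Tuple[str, ...]], Iterable[str]],
    idle_timeout: timedelta = timedelta(minutes=30),
    refresh_threshold: timedelta = timedelta(minutes=15),
) -> Tuple[Decision, Optional[Session]]:
    # 1. An idle session is rejected before anything else is considered.
    if now - session.last_activity > idle_timeout:
        return Decision.REJECT, None

    # 2. A missing anchor, or one older than the threshold, makes a refresh due.
    anchor = session.last_role_refresh
    if anchor is not None and now - anchor <= refresh_threshold:
        return Decision.KEEP, session

    # 3. Re-run the mapping on the STORED groups (no LDAP round-trip).
    try:
        roles = frozenset(map_roles(session.groups))
    except Exception:
        # Fail-soft: keep the current principal instead of signing the user out.
        return Decision.KEEP, session
    return Decision.REPLACE, replace(session, roles=roles, last_role_refresh=now)
```

Keeping the decision pure is what makes it testable with an injected clock; applying it (RejectPrincipal + SignOutAsync, ReplacePrincipal + ShouldRenew, or nothing) belongs in the thin OnValidatePrincipal adapter.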
Joseph Doherty	a0d9379a4f	fix(debug-stream): M2.18 review nits: thread-safe test mock + AlarmKey null-guard + rename stale test (#26)
- MockSiteStreamGrpcClient.SubscribeCalls and UnsubscribedCorrelationIds switched from bare List<T> to lock-guarded backing fields with snapshot accessors, eliminating the actor-thread/test-thread data race (matches the existing lock(events) pattern for ReceivedEvents).
- AttributeKey and AlarmKey null-guard each component with ?? string.Empty so a null SourceReference/AlarmName/etc. cannot silently collide with an empty-string component in the dedup dictionary.
- On_Snapshot_Opens_GrpcStream renamed to On_Snapshot_Does_Not_Open_Additional_GrpcStream; assertion updated to confirm exactly one subscribe (the PreStart stream-first open) with no second subscribe after snapshot delivery.
- _stopped ordering in the InstanceNotFound path moved after CleanupGrpc() for consistency with the DebugStreamTerminated and ReceiveTimeout handlers.	2026-06-16 07:41:41 -04:00
Joseph Doherty	d8519cb464	fix(debug-stream): stream-first lifecycle with replay/dedup (#26, M2.18)
Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no attribute/alarm event occurring during the snapshot-build + network-transit window is lost (#26).
Lifecycle change:
- PreStart now opens the gRPC subscription FIRST (alongside sending the SubscribeDebugViewRequest), so live events start flowing immediately.
- Phase model via a single _snapshotDelivered flag (mutated only on the actor thread). While buffering (snapshot not yet delivered), AttributeValueChanged/AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of being delivered. After snapshot + flush, the same handlers pass through directly.
- On DebugViewSnapshot: deliver the snapshot, then flush the buffer in arrival order with per-entity dedup, then set _snapshotDelivered=true (pass-through).
Dedup rule (exactly-once):
- Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName); alarms by (InstanceUniqueName, AlarmName, SourceReference) so native per-condition alarms are not conflated. Keys are joined with a NUL delimiter (declared as an escaped char constant; no raw NUL in source) so distinct identities never collide on a space within a name.
- Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for the same entity is already reflected -> DROP; strictly newer (>) -> DELIVER; entity absent from the snapshot -> DELIVER (genuine gap-window event).
Preserved paths:
- M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so the not-found path now tears it down (CleanupGrpc) + clears the buffer, does NOT enter pass-through, delivers the not-found snapshot, and stops cleanly.
- Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase flag: a mid-session reconnect resumes pass-through; a reconnect during the buffering phase stays buffering until the snapshot arrives.
- Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net unchanged. A duplicate/late snapshot after delivery is ignored defensively.
Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor class green: 23/23.	2026-06-16 07:33:51 -04:00
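The buffering phase and the dedup boundary can be sketched as a small state machine. This Python sketch assumes integer timestamps and covers attributes only (alarms would use an (InstanceUniqueName, AlarmName, SourceReference) key the same way); StreamFirstBridge and attribute_key are hypothetical names, not the actor's API.

```python
from dataclasses import dataclass
from typing import Callable, List

NUL = "\0"  # key delimiter that cannot appear inside a name


@dataclass(frozen=True)
class AttributeEvent:
    instance: str
    path: str
    name: str
    timestamp: int
    value: object


def attribute_key(e: AttributeEvent) -> str:
    # Identity = (instance, path, name); None components become "" before joining.
    return NUL.join((e.instance or "", e.path or "", e.name or ""))


class StreamFirstBridge:
    """Buffers live events until the snapshot arrives, then flushes with dedup."""

    def __init__(self, deliver: Callable[[object], None]):
        self._deliver = deliver
        self._snapshot_delivered = False
        self._buffer: List[AttributeEvent] = []

    def on_event(self, e: AttributeEvent) -> None:
        if self._snapshot_delivered:
            self._deliver(e)        # pass-through phase
        else:
            self._buffer.append(e)  # buffering phase, arrival order preserved

    def on_snapshot(self, snapshot: List[AttributeEvent]) -> None:
        if self._snapshot_delivered:
            return                  # late/duplicate snapshot: ignore
        self._deliver(("snapshot", snapshot))
        snapshot_ts = {attribute_key(e): e.timestamp for e in snapshot}
        for e in self._buffer:
            ts = snapshot_ts.get(attribute_key(e))
            # Absent from the snapshot -> gap-window event; strictly newer -> deliver;
            # otherwise (<=) the snapshot already reflects it -> drop.
            if ts is None or e.timestamp > ts:
                self._deliver(e)
        self._buffer.clear()
        self._snapshot_delivered = True
```

Because the flag flips only after the flush, an event can never be both flushed and passed through, which is what gives the exactly-once property.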
Joseph Doherty	c9244d8bda	fix(health): M2.16 review nit: real idempotency guard for SiteEventLog health bridge (#30)
AddSiteEventLogHealthMetricsBridge registered via AddHostedService(factory-lambda), which sets ImplementationFactory and leaves ImplementationType null. The prior ImplementationType == guard was therefore silently dead: a second call would spin up a second SiteEventLogFailureCountReporter. Fix: add a private SiteEventLogHealthMetricsBridgeMarker singleton and guard on its ServiceType instead.
Also corrects the cycle-path comment in both ServiceCollectionExtensions.cs and SiteEventLogFailureCountReporter.cs: StoreAndForward.csproj does reference SiteEventLogging.csproj, so the transitive path HealthMonitoring → StoreAndForward → SiteEventLogging is real, but adding a direct HealthMonitoring → SiteEventLogging reference would NOT create a cycle (SiteEventLogging has no back-edge to HealthMonitoring). The Func<long> seam is a coupling-avoidance measure, not a cycle-breaker.
Adds AddSiteEventLogHealthMetricsBridgeTests.AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService as a regression test (builds the provider and asserts exactly one reporter via GetServices<IHostedService>().OfType<T>()).	2026-06-16 07:22:35 -04:00
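A toy model of why the guard has to key on a marker service type rather than on ImplementationType. This Python sketch is hypothetical throughout (ServiceDescriptor, FailureCountReporter, add_bridge, _BridgeMarker); it mimics only the relevant property of a factory registration, namely that its implementation type is unknown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServiceDescriptor:
    service_type: object
    implementation_type: Optional[type] = None
    factory: Optional[Callable[[], object]] = None


class FailureCountReporter:
    """Hypothetical stand-in for the hosted reporter; polls a count delegate."""

    def __init__(self, read_count: Callable[[], int]):
        self.read_count = read_count


class _BridgeMarker:
    """Private marker type: its presence means "bridge already added"."""


def add_bridge(services: List[ServiceDescriptor], read_count: Callable[[], int]) -> List[ServiceDescriptor]:
    # Checking implementation_type would never fire: the factory registration leaves it None.
    if any(d.service_type is _BridgeMarker for d in services):
        return services
    services.append(ServiceDescriptor(_BridgeMarker, implementation_type=_BridgeMarker))
    services.append(ServiceDescriptor("IHostedService", factory=lambda: FailureCountReporter(read_count)))
    return services
```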
Joseph Doherty	d81f747434	feat(health): wire ISiteEventLogger.FailedWriteCount into SiteHealthReport (#30, M2.16)
Add SiteHealthReport.SiteEventLogWriteFailures (trailing optional long = 0, additive-only), ISiteHealthCollector.SetSiteEventLogWriteFailures (default no-op so existing fakes compile), and SiteEventLogFailureCountReporter (hosted service in HealthMonitoring, Func<long> delegate to avoid the HealthMonitoring → StoreAndForward → SiteEventLogging cycle). Registration helper AddSiteEventLogHealthMetricsBridge added to HealthMonitoring.ServiceCollectionExtensions; wired in SiteServiceRegistration after AddSiteEventLogging. Tests: SiteEventLogWriteFailuresMetricTests (4 collector tests) + SiteEventLogFailureCountReporterTests (2 poller tests) in HealthMonitoring.Tests. 79/79 HealthMonitoring.Tests green, 59/59 SiteEventLogging.Tests green, 0 warnings.	2026-06-16 07:14:54 -04:00
Joseph Doherty	e1ee37e508	fix(siteeventlog): gate EventLogPurge to active node via IClusterNodeProvider.SelfIsPrimary (#29, M2.15)	2026-06-16 07:02:26 -04:00
Joseph Doherty	6b1cb9e0e6	refactor(host)/test: M2.14 review nits: simplify probe cancellation + pre-cancelled-token test (#28)
- Remove the redundant linked CancellationTokenSource in ProbeAsync; pass the framework cancellationToken and ProbeTimeout directly to Ask (the two-CTS pattern was redundant because Ask already honours both the timeout and the token).
- Add EchoActor XML <remarks> explaining why no Receive<Identify> handler is needed (ActorBase answers Identify automatically).
- Add PreCancelledToken_ReportsUnhealthy_DoesNotThrow test: verifies the never-throws guarantee on the shutdown-race path (token already cancelled before CheckHealthAsync is invoked).	2026-06-16 06:54:28 -04:00
Joseph Doherty	253bec5a52	feat(host): readiness gates on required cluster singletons (#28, M2.14)
REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a readiness criterion, but /health/ready only checked database + akka-cluster. Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the Central-role AddHealthChecks() chain (so it is naturally role-scoped: site nodes never run it).
Probe: for each required central singleton, Ask its local ClusterSingletonProxy an Identify with a short bounded per-singleton timeout (~2s; probes run concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the timeout means the singleton is running and reachable through the proxy; a null subject or a timeout means unreachable → Unhealthy, naming the unreachable singleton(s). The check never throws (catch-all → Unhealthy) and resolves ActorSystem lazily from DI per probe (Unhealthy if Akka is not yet up).
Required-always set = the five singleton proxies created unconditionally in AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest, site-call-audit, audit-log-purge, site-audit-reconciliation. There are no feature/config-gated central singletons today; any future gated singleton is the "if applicable" case and must NOT be added to the required set.
Leadership-agnostic: the proxy reaches the singleton from either central node, so a ready standby still reports ready (readiness must not require cluster leadership; that is the Active tier's job). During a brief singleton handover the probe may time out and the node flaps to not-ready, which is correct (a node mid-handover is legitimately not fully ready); there are no retries, to keep the probe fast.
Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a TestKit ActorSystem: all proxies present + reachable → Healthy; one missing → Unhealthy naming it; ActorSystem absent → Unhealthy, no throw.
HealthCheckTests regression-guards the Ready tag + absence of the Active tag on the new check.	2026-06-16 06:49:18 -04:00
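The probe's shape (concurrent, individually time-boxed, never throwing, reporting which singletons failed) maps naturally onto asyncio. In this Python sketch each probe is a hypothetical coroutine standing in for the Identify Ask through a ClusterSingletonProxy, and passing None for the probe map models an ActorSystem that is not yet available; check_required_singletons is not the project's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple

# A probe resolves to the identity subject (any non-None object) or None if nothing answered.
Probe = Callable[[], Awaitable[Optional[object]]]


async def check_required_singletons(
    probes: Optional[Dict[str, Probe]], timeout: float = 2.0
) -> Tuple[bool, List[str]]:
    if probes is None:
        return False, ["actor system not available"]  # e.g. Akka not started yet

    async def probe_one(name: str, probe: Probe) -> Tuple[str, bool]:
        try:
            subject = await asyncio.wait_for(probe(), timeout)
            return name, subject is not None
        except Exception:  # timeout or any other failure -> unreachable; never raise
            return name, False

    # All probes run concurrently; gather preserves the input order in its results.
    results = await asyncio.gather(*(probe_one(n, p) for n, p in probes.items()))
    unreachable = [name for name, ok in results if not ok]
    return not unreachable, unreachable
```

Because the probes run concurrently, the worst-case check latency is roughly one timeout rather than one timeout per singleton.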
Joseph Doherty	722b8663c1	feat(dcl): populate obtainable NativeAlarmTransition fields from OPC UA and MxGateway (#27, M2.13)
OPC UA (RealOpcUaClient):
- Append 5 new SelectClauses at indices 13–17 (never renumber 0–12):
  - 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime
  - 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue
- New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo priority order, InvariantCulture-formatted; empty string for non-limit alarm types.
- HandleAlarmEvent reads the new indices with fields.Count > N guards; the hard minimum (6) is unchanged so base ConditionType events still process without the limit fields.
- Document unavailable-by-protocol fields (Category, Description, OperatorUser, CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent.
MxGateway (MxGatewayAlarmMapper):
- MapTransition: CurrentValue and LimitValue now populated via MxValueToString (uses MxValueExtensions.ToClrValue + InvariantCulture) from the OnAlarmTransitionEvent proto fields current_value/limit_value.
- MapSnapshot: same, populated from ActiveAlarmSnapshot.current_value/limit_value.
- MxValueToString helper (internal): null-safe MxValue → string conversion.
Tests (17 new, 40 total pass):
- OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case.
- MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string MxValue; absent fields yield empty strings.
- RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.	2026-06-16 06:37:19 -04:00
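The HiHi→Hi→Lo→LoLo priority in PickLimitValue is a first-non-null scan. A Python sketch with hypothetical parameter names; note that Python's number formatting is locale-independent, so it plays the role of InvariantCulture here, although exact digit strings can differ from .NET's (Python prints `100.0` where .NET prints `100`).

```python
from typing import Optional


def pick_limit_value(
    high_high: Optional[float] = None,
    high: Optional[float] = None,
    low: Optional[float] = None,
    low_low: Optional[float] = None,
) -> str:
    """Return the first non-null limit in HiHi -> Hi -> Lo -> LoLo order, or "" if none."""
    for limit in (high_high, high, low, low_low):
        if limit is not None:
            # str() never consults the OS locale, so "." is always the decimal separator.
            return str(limit)
    return ""  # non-limit alarm types carry no limit fields
```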
Joseph Doherty	e2b31a9fd2	fix(siteruntime): M2.12 review nits: observe logger fault + meaningful source fallback (#25)
Replace the bare task discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful "InstanceScript:{instanceName}" identifier (matching the site-event-log source convention). Update the method doc-comment to state the best-effort contract explicitly. Pin the new fallback value in the shape-precision test.	2026-06-16 06:26:00 -04:00
Joseph Doherty	f08038db23	feat(siteruntime): M2.12 (#25): emit script Error site event on recursion-limit violation
Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor param, defaulted null; all existing callers remain source-compatible). Add a single private EmitRecursionLimitEventAsync helper that fires and forgets a "script"/Error site event; it is called at both recursion guard sites (CallScript at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor threads the already-resolved siteEventLogger singleton into the context; AlarmExecutionActor leaves it null (no siteEventLogger wired there). Existing _logger.LogError + throw behaviour unchanged. Tests: RecursionLimitSiteEventTests (5 tests) cover both CallScript and CallShared (ISiteEventLogger.LogEventAsync called once with category "script", severity "Error"; the null-logger path does not throw).	2026-06-16 06:20:58 -04:00
Joseph Doherty	d160c7f694	test(communication): M2.11 review nits: bridge-actor not-found test + dead-letter comment + toast wording (#24)
- Add DebugStreamBridgeActorTests.On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Does_Not_Open_Stream_And_Terminates: asserts _onEvent receives the not-found snapshot, SubscribeCalls remains empty, and the actor terminates cleanly via Watch/ExpectTerminated.
- Add a comment in DebugStreamBridgeActor near Context.Stop(Self) explaining that the subsequent StopDebugStream Tell from DebugStreamService.StopStream produces a benign, expected dead-letter.
- Reword the not-found toast in DebugView.razor to "Instance not found on the selected site — check the deployment target." (accurate when the instance may be deployed to a different site).	2026-06-16 06:15:26 -04:00
Joseph Doherty	dbf44b9e10	fix(siteruntime): M2.11: unknown-instance debug snapshot returns InstanceNotFound=true (#24)
RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor previously returned an empty DebugViewSnapshot for unknown instances, indistinguishable from a deployed-but-empty instance. Callers had no way to differentiate "not deployed here" from "deployed, no data yet."
Approach: an additive field on the existing message contract. Added `bool InstanceNotFound = false` as an optional trailing parameter to DebugViewSnapshot (Commons). All existing positional constructor calls and serialized wire frames are unaffected (default = false). A dedicated new message type was considered but rejected: the ClusterClient channel and DebugStreamService TCS are already typed on DebugViewSnapshot, and a second reply union would require wider changes for zero additive-safety gain.
Changes:
- Commons/DebugViewSnapshot: add InstanceNotFound = false (additive)
- DeploymentManagerActor: set InstanceNotFound=true in both unknown-instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot)
- DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to _onEvent (resolves the TCS), then stop cleanly; no gRPC stream is opened
- DebugView.razor: check session.InitialSnapshot.InstanceNotFound after connect and show a clear "not deployed on this site" error toast
- 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot, unknown→subscribe, known-empty→InstanceNotFound stays false	2026-06-16 06:08:21 -04:00
Joseph Doherty	9cd62aa5b4	test(configdb): M2.10 review fix: catch bracketed AuditLog identifiers; document EF/multi-line scan limits (#18)
Extends the ContainsAuditLogMutation regex to match T-SQL bracketed forms ([AuditLog], [dbo].[AuditLog]) that SSMS-generated SQL produces; the prior optional-schema pattern only matched bare/dbo-prefixed names, silently missing these real violation forms.
Changes:
- Schema sub-pattern (?:dbo\.)? → (?:\[?dbo\]?\.)? (matches dbo. and [dbo].)
- Table sub-pattern AuditLog\b → \[?AuditLog\]?\b (matches AuditLog and [AuditLog])
- Pattern compiled as a static readonly Regex field for clarity/performance
- Adds 4 new planted-positive cases: UPDATE [dbo].[AuditLog], UPDATE [AuditLog], DELETE FROM [dbo].[AuditLog], DELETE FROM [AuditLog]
- Retains all existing negatives; adds a DELETE FROM [dbo].[Notifications] negative
- Fixes the misleading "reverse order" comment on the comment-prefix positive case
- Documents scan limitations (EF Core bulk methods; multi-line DML) in the class XML doc	2026-06-16 05:55:27 -04:00
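The updated patterns can be exercised directly. This Python translation assumes case-insensitive matching (the original guard treats lower-case variants as violations); the constant and function names are hypothetical, and the `(?:...)`, `\s`, `\b` and `\[` constructs behave the same way in .NET regex.

```python
import re

# Bracket-aware DML patterns: optional [dbo]. / dbo. schema, optional [..] around AuditLog.
AUDIT_LOG_MUTATION = re.compile(
    r"UPDATE\s+(?:\[?dbo\]?\.)?\[?AuditLog\]?\b"
    r"|DELETE\s+(?:FROM\s+)?(?:\[?dbo\]?\.)?\[?AuditLog\]?\b",
    re.IGNORECASE,
)


def contains_audit_log_mutation(line: str) -> bool:
    """True if the line contains UPDATE/DELETE DML that targets the AuditLog table."""
    return AUDIT_LOG_MUTATION.search(line) is not None
```

Because `\]?` is optional, `\b` can still succeed between `g` and `]`, which is why `[AuditLog]` followed by a space or end of line matches while `AuditLogArchive` does not.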
Joseph Doherty	e7b6fe33a4	test(configdb): guard test for AuditLog append-only invariant (M2.10, #18)
Adds AuditLogAppendOnlyGuardTests.cs to tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/, a code-level backstop for the DB-role DENY UPDATE / DENY DELETE control established in migration 20260602174346_CollapseAuditLogToCanonical.
The guard scans every non-Designer, non-Snapshot .cs file in the ConfigurationDatabase source tree and fails the test run if any line matches the DML-syntax pattern:
  UPDATE\s+(?:dbo\.)?AuditLog\b
  DELETE\s+(?:FROM\s+)?(?:dbo\.)?AuditLog\b
The tight DML-syntax pattern naturally excludes false positives without extra exclusion checks: DENY UPDATE ON dbo.AuditLog is not matched (UPDATE is followed by ON, not the table name); ALTER TABLE … SWITCH and TRUNCATE contain no UPDATE/DELETE keyword; comments with UPDATE/AuditLog in separate clauses are not matched.
Self-verifying unit tests (ContainsAuditLogMutation_) prove the helper:
- returns false on clean-source lines (INSERT, SELECT, DENY DDL, ALTER SWITCH, TRUNCATE, DELETE FROM Notifications);
- returns TRUE on planted violations (UPDATE AuditLog SET …, DELETE FROM dbo.AuditLog WHERE …, lower-case variants);
- returns false on the exact DENY/GRANT/partition-switch strings from the production migration files.
All 256 ConfigurationDatabase.Tests pass; solution builds 0 W / 0 E.	2026-06-16 05:49:51 -04:00
Joseph Doherty	76198b36e3	fix(host): add MachineDataDb startup validation for Central (reverts Host-008, M2.9 #17)
REQ-HOST-3/REQ-HOST-4 require a MachineDataDb connection string for Central nodes. The shipped docker appsettings (docker/central-node-a/appsettings.Central.json and central-node-b) already carry the key. Host-008 had removed the fail-fast Require because MachineDataDb had no consumer yet; this commit reverses that decision so a misconfigured or missing connection string is caught at startup with a clear error.
Changes:
- DatabaseOptions: add MachineDataDb property with XML doc comment
- StartupValidator: add .Require for ScadaBridge:Database:MachineDataDb inside the existing Central .When block, immediately after the ConfigurationDb Require
- StartupValidatorTests: rename Central_MissingMachineDataDb_PassesValidation -> FailsValidation and flip to Assert.Throws; update the comment to cite REQ-HOST-3/4, the shipped docker appsettings, and the Host-008 reversal; add MachineDataDb to ValidCentralConfig() so all other Central tests remain green
- CentralDbTestEnvironment: supply the ScadaBridge__Database__MachineDataDb env var (mirrors the ConfigurationDb pattern) so HostStartupTests, HealthCheckTests, and MetricsEndpointTests pass through the new Require
- CompositionRootTests, AkkaHostedServiceAuditWiringTests, ActorPathTests: set the ScadaBridge__Database__MachineDataDb env var alongside the pepper env var and clear it in Dispose, matching the existing pepper handling pattern
Build: 0 warnings, 0 errors. dotnet test Host.Tests: 233/233 passed.	2026-06-16 05:41:25 -04:00
Joseph Doherty	21b801b71f	test(template): M2.8 review nits: stale-binding comment + stale-ID & inert-check tests (#23)
Add code comments in ValidateConnectionBindingCompleteness explaining that the unbound-attribute branch also covers the silently-dropped stale-binding case (cross-reference FlatteningService.ApplyConnectionBindings), and that the `continue` skips the exists-at-site check for unbound attrs.
Add two new tests:
- FlatteningPipelineConnectionBindingTests: a stale DataConnectionId (999) not present in site connections → the flattener drops it silently → the validator reports a ConnectionBinding Error, IsValid false.
- ValidationServiceTests: enforce:true + siteConnectionNames:null on a properly-bound attribute → no ConnectionBinding error (the exists-at-site check stays inert when the site set is not supplied).	2026-06-16 05:34:56 -04:00
Joseph Doherty	7c14a69091	feat(#23): elevate connection-binding completeness to a deploy-gating Error (M2.8)
Pre-deployment validation only WARNED when a data-sourced attribute had no connection binding, so an instance with unresolved bindings still passed IsValid and could deploy. There was also no check that a binding resolves to a connection that actually exists at the target site.
- ValidationService.Validate gains an opt-in `enforceConnectionBindings` flag (default false) plus a `siteConnectionNames` set. Default-false keeps the template DESIGN-TIME path (ManagementActor.HandleValidateTemplate) non-blocking, since bindings are legitimately set later at instance/deploy time. The DEPLOY path (FlatteningPipeline) opts in (true) so:
  * a data-sourced attribute with no binding is now a deploy-gating Error;
  * a binding to a connection that does not exist on the target site is an Error.
  Static (non-data-sourced) attributes are never flagged.
- FlatteningPipeline computes the site-connection-names set from the loaded site data connections (mirroring M2.1's alarmCapableConnectionNames) and threads it in.
- Tests: TemplateEngine.Tests covers design-time warning / deploy-time error / static-ok / exists-at-site / non-existent-connection. The new FlatteningPipelineConnectionBindingTests prove the deploy path enforces it.
Mark M2.7 + M2.8 completed in the plan task tracker.	2026-06-16 05:28:06 -04:00
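The enforcement matrix described above fits in one loop. A Python sketch with hypothetical names (Attr, validate_bindings, is_valid); treating a stale binding as unbound follows the M2.8 review-nits commit, while skipping the exists-at-site check when enforcement is off is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Issue = Tuple[str, str, str]  # (severity, attribute name, message)


@dataclass
class Attr:
    name: str
    data_sourced: bool
    connection: Optional[str]  # bound connection name; None if unbound or stale (dropped)


def validate_bindings(attrs: List[Attr], enforce: bool = False,
                      site_connection_names: Optional[Set[str]] = None) -> List[Issue]:
    issues: List[Issue] = []
    for a in attrs:
        if not a.data_sourced:
            continue  # static attributes are never flagged
        if a.connection is None:
            issues.append(("Error" if enforce else "Warning", a.name, "no connection binding"))
            continue  # the exists-at-site check is skipped for unbound attributes
        # Inert unless enforcing AND the caller supplied the site's connection names.
        if enforce and site_connection_names is not None and a.connection not in site_connection_names:
            issues.append(("Error", a.name, f"connection '{a.connection}' not found on target site"))
    return issues


def is_valid(issues: List[Issue]) -> bool:
    return not any(severity == "Error" for severity, _, _ in issues)
```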
Joseph Doherty	a8e9e9952d	fix(template): M2.7 review nits: comment-aware arg tokenizer + stricter numeric-literal inference (#20/#21)
SplitCallArguments now skips C# line (`//`) and block (`/* */`) comments when tokenizing the argument list, so a comma inside a comment no longer produces a spurious arg-count mismatch. IsNumericLiteral now explicitly rejects tokens whose first non-sign character is `_` or a letter (e.g. `_2`), and restricts underscore digit separators to positions after at least one digit, preventing identifier-shaped tokens from being inferred as Integer/Float.	2026-06-16 05:21:23 -04:00
Joseph Doherty	958229e1f8	feat(template): SemanticValidator script-call return-type (#20) + argument-type (#21) checks (M2.7)
#20 return-type: when a CallScript/CallShared result is assigned directly into a typed local declaration (optionally awaited, optionally via an Instance./Scripts./Parent./Children["x"]. receiver), compare the LHS declared type against the target script's declared ReturnDefinition and flag clear cross-category mismatches (ReturnTypeMismatch). Previously BuildReturnMap was built but never read.
#21 argument-type: positional call arguments are now split (paren/brace/bracket + string-literal aware) and each literal-inferable argument is checked against the target's declared parameter type (ParameterMismatch), not just the count.
Conservative: only CLEAR primitive mismatches (String/Integer/Float/Boolean) are flagged; Integer<->Float widening is tolerated. Unknown/Object/List declarations, var/untyped/unused/expression-embedded assignments, and non-literal arguments (variables, member access, method/await chains, casts, object/array initializers, compound or concatenated expressions, interpolated strings) are never flagged. Inference limits are documented in code.
Adds 16 SemanticValidatorTests covering mismatch detection, correct-call pass, and the dynamic/unknown no-false-positive cases.	2026-06-16 05:11:40 -04:00
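The argument-type check combines three pieces: a nesting-, literal- and comment-aware splitter, literal-only type inference, and a conservative compatibility test. A Python sketch with hypothetical names; only plain `"..."` strings are recognized (verbatim and interpolated strings infer as unknown), numeric suffixes such as `1.5f` are not handled, and the numeric rules follow the stricter M2.7 follow-up (no leading `_`; in this sketch `_` must also sit between digits).

```python
import re
from typing import List, Optional

_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')                          # one plain string literal
_INTEGER = re.compile(r"[+-]?[0-9](?:_*[0-9])*")                    # "_" only between digits
_FLOAT = re.compile(r"[+-]?[0-9](?:_*[0-9])*\.[0-9](?:_*[0-9])*")
PRIMITIVES = {"String", "Integer", "Float", "Boolean"}


def split_call_arguments(text: str) -> List[str]:
    """Split top-level arguments on commas, ignoring commas nested in ()[]{},
    inside string/char literals, or inside // and /* */ comments (comments are dropped)."""
    args: List[str] = []
    current: List[str] = []
    depth, i, n = 0, 0, len(text)
    while i < n:
        c, nxt = text[i], text[i + 1 : i + 2]
        if c == "/" and nxt == "/":              # line comment: skip to end of line
            end = text.find("\n", i)
            i = n if end == -1 else end
            continue
        if c == "/" and nxt == "*":              # block comment: skip past "*/"
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if c in "\"'":                           # literal: copy verbatim, honouring \ escapes
            j = i + 1
            while j < n and text[j] != c:
                j += 2 if text[j] == "\\" else 1
            current.append(text[i : j + 1])
            i = j + 1
            continue
        if c in "([{":
            depth += 1
        elif c in ")]}":
            depth -= 1
        elif c == "," and depth == 0:            # top-level comma ends an argument
            args.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(c)
        i += 1
    tail = "".join(current).strip()
    if tail or args:
        args.append(tail)
    return args


def infer_literal_type(arg: str) -> Optional[str]:
    """Infer a primitive type only for bare literals; anything else is unknown (None)."""
    if _STRING.fullmatch(arg):
        return "String"
    if arg in ("true", "false"):
        return "Boolean"
    if _INTEGER.fullmatch(arg):
        return "Integer"
    if _FLOAT.fullmatch(arg):
        return "Float"
    return None


def is_clear_mismatch(declared: str, arg: str) -> bool:
    inferred = infer_literal_type(arg)
    if inferred is None or declared not in PRIMITIVES:
        return False                             # unknown on either side: never flag
    if {inferred, declared} == {"Integer", "Float"}:
        return False                             # Integer <-> Float widening tolerated
    return inferred != declared
```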
Joseph Doherty	411d0c043b	fix(inbound-api): M2.6 review nits: legacy required default, recursion depth guard, return-validator comment (#13)
- legacy flat-array "required":"false" (string) is now treated as optional (matches migration)
- depth ceiling (32) on InboundApiSchema Parse/Validate recursion guards against stack overflow from a deeply nested stored schema (Parse throws -> 400, Validate adds an error)
- DocOptions.MaxDepth=128 so the application-level structural guard fires before the System.Text.Json reader ceiling (each schema level = ~3 JSON reader levels)
- comment the intentional ParameterValidator/ReturnValueValidator early-return asymmetry
- note the intentional datetime->string legacy collapse in NormalizeType
- tests: legacy string-false optional, parse/validate depth ceiling, scalar return schema	2026-06-15 15:18:44 -04:00
Joseph Doherty	4b6187c853	feat(inbound-api): nested Object/List extended-type validation (#13)
Object/List parameters and return values were shape-validated only (object vs array), with no field-level/nested type checks, so type-wrong nested data passed inbound validation and failed only at script runtime. Add recursive type validation (declared Object field types, List element type, scalars at any depth) with path-qualified errors, symmetric across ParameterValidator and ReturnValueValidator.
Both validators now parse the canonical JSON Schema definition format (the Central UI / MigrateParametersToJsonSchema output) via a shared recursive engine, Commons.Types.InboundApi.InboundApiSchema, instead of expecting the legacy flat [{name,type}] array (the old validators could not even deserialize migrated rows). The legacy flat-array form is still accepted on read for transition safety.
Undeclared fields are rejected at every level (consistent with the existing top-level unexpected-parameter rejection); a present-but-null value satisfies any type, and only the absence of a required field is an error.	2026-06-15 15:04:28 -04:00
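The recursive engine can be sketched over a JSON-Schema-like dict. This Python version assumes lower-case JSON Schema type names and the standard `properties`/`required`/`items` keywords; `validate` is a hypothetical name, and MAX_DEPTH mirrors the 32-level ceiling from the M2.6 review-nits commit.

```python
from typing import Any, Dict, List, Optional

MAX_DEPTH = 32
SCALARS = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate(value: Any, schema: Dict[str, Any], path: str = "$",
             depth: int = 0, errors: Optional[List[str]] = None) -> List[str]:
    """Collect path-qualified type errors for value against schema."""
    errors = [] if errors is None else errors
    if depth > MAX_DEPTH:
        errors.append(f"{path}: nesting exceeds {MAX_DEPTH} levels")
        return errors
    if value is None:
        return errors                            # present-but-null satisfies any type
    kind = schema["type"]
    if kind == "object":
        if not isinstance(value, dict):
            errors.append(f"{path}: expected object")
            return errors
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field missing")
        for name, child in value.items():
            if name not in props:
                errors.append(f"{path}.{name}: undeclared field")
            else:
                validate(child, props[name], f"{path}.{name}", depth + 1, errors)
    elif kind == "array":
        if not isinstance(value, list):
            errors.append(f"{path}: expected array")
            return errors
        for index, item in enumerate(value):
            validate(item, schema["items"], f"{path}[{index}]", depth + 1, errors)
    else:
        # bool is a subclass of int in Python, so booleans are checked explicitly.
        if (kind == "boolean") != isinstance(value, bool) or not isinstance(value, SCALARS[kind]):
            errors.append(f"{path}: expected {kind}")
    return errors
```

Accumulating errors instead of stopping at the first one lets a single 400 response list every offending path.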
Joseph Doherty	3032faac0d	fix(template): preserve per-script ExecutionTimeoutSeconds across UI edits; add alarm fallback tests (#9)
The UI script editor has no ExecutionTimeoutSeconds control (authoring deferred), so a body edit silently cleared a timeout set via Transport import. Round-trip the loaded value so UI edits preserve it. Add the missing AlarmExecutionActor null/<=0 fallback tests for symmetry with ScriptExecutionActor.	2026-06-15 14:49:37 -04:00
Joseph Doherty	3edef09f51	feat(runtime): per-script execution timeout overriding the global default (#9)
The spec promised a per-script timeout, but only the global ScriptExecutionTimeoutSeconds existed. Add nullable TemplateScript.ExecutionTimeoutSeconds, threaded through EF + flattening (ResolvedScript) to ScriptExecutionActor/AlarmExecutionActor, which use perScript ?? global for the execution CTS. Includes the EF migration for the new column.	2026-06-15 14:40:38 -04:00
Joseph Doherty	00304a26e6	fix(dcl): resolve OPC UA alarm type NodeId to friendly name so conditionFilter works (#8)
HandleAlarmEvent set AlarmTypeName to the event-type NodeId string ("i=9341"), but the client-side conditionFilter gate (and the OPC UA WhereClause) use friendly type names, so a friendly-name filter built a correct server WhereClause yet the client gate dropped every event (zero alarms delivered). Resolve the event-type NodeId to its friendly name via an inverse of KnownConditionTypeIds (NodeId-string fallback for custom types) so both sides agree. Also fix a dead-code ternary in the SourceName derivation.	2026-06-15 14:25:35 -04:00
Joseph Doherty	8825df56be	fix(dcl): apply native-alarm conditionFilter (client-side gate + OPC UA WhereClause) (#8)
conditionFilter was plumbed end-to-end but applied nowhere, so a filtered source silently mirrored all conditions. Define the filter as a comma-separated, case-insensitive list of condition type names (blank = all); enforce it authoritatively client-side in DataConnectionActor routing (uniform across OPC UA + MxGateway) and, for OPC UA, additionally build a server-side EventFilter WhereClause as a bandwidth optimization.	2026-06-15 14:16:10 -04:00
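Filter parsing, the client-side gate, and the NodeId-to-friendly-name resolution from the follow-up fix can be sketched together. Python stands in for C#; the function names are hypothetical and the two table entries are illustrative (the namespace-0 ids of ConditionType and AlarmConditionType), not a copy of KnownConditionTypeIds.

```python
from typing import FrozenSet, Optional

KNOWN_CONDITION_TYPE_IDS = {
    "ConditionType": "i=2782",
    "AlarmConditionType": "i=2915",
}
FRIENDLY_NAME_BY_NODE_ID = {node_id: name for name, node_id in KNOWN_CONDITION_TYPE_IDS.items()}


def alarm_type_name(event_type_node_id: str) -> str:
    """Map an event-type NodeId to its friendly name; custom types keep the NodeId string."""
    return FRIENDLY_NAME_BY_NODE_ID.get(event_type_node_id, event_type_node_id)


def parse_condition_filter(raw: Optional[str]) -> Optional[FrozenSet[str]]:
    """Comma-separated, case-insensitive type names; blank means "all" (None)."""
    if raw is None:
        return None
    names = frozenset(part.strip().casefold() for part in raw.split(",") if part.strip())
    return names or None


def passes_filter(allowed: Optional[FrozenSet[str]], type_name: str) -> bool:
    return allowed is None or type_name.casefold() in allowed
```

The last test assertion reproduces the bug the follow-up fixed: an unresolved NodeId string can never match a friendly-name filter, so every event was dropped.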
Joseph Doherty	de375ff7ea	fix(db): classify non-SqlException DB outages as transient; propagate cancellation (#7)
ExecuteWriteAsync only caught SqlException, so a live outage surfacing as InvalidOperationException/SocketException/IOException/TimeoutException escaped unclassified and crashed the script actor instead of buffering. Mirror the HTTP path: propagate OperationCanceledException on cancellation, classify transport exceptions as transient (buffer + retry), and let unexpected exceptions propagate.	2026-06-15 14:03:25 -04:00
Joseph Doherty	d05270640d	fix(db): classify transient vs permanent SQL errors in Database.CachedWrite (#7)
CachedWrite buffered ALL write failures and retried forever, never returning a synchronous failure to the script, so permanent SQL errors (constraint/syntax/permission) were treated as transient. Mirror the External-System API path: attempt immediately, return Failed synchronously on permanent SQL errors (no buffering), and buffer only transient errors; the S&F retry path parks permanent failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.	2026-06-15 13:53:15 -04:00
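Taken together, the two #7 commits describe a three-way classification. In this Python sketch, SqlError is a hypothetical stand-in for SqlException, Python's OSError family stands in for the .NET transport exceptions (InvalidOperationException has no direct analogue and is omitted), the permanent error numbers are examples rather than SqlErrorClassifier's actual list, and defaulting unknown SQL numbers to transient is an assumption.

```python
import asyncio
from enum import Enum


class Outcome(Enum):
    PROPAGATE = "propagate"  # rethrow to the caller (cancellation or an unexpected bug)
    TRANSIENT = "transient"  # buffer in store-and-forward and retry later
    PERMANENT = "permanent"  # return Failed to the script now; never buffer


class SqlError(Exception):
    """Hypothetical stand-in for SqlException, carrying the server error number."""

    def __init__(self, number: int):
        super().__init__(f"SQL error {number}")
        self.number = number


# Example SQL Server numbers: syntax (102), permission denied (229),
# constraint conflict (547), duplicate key (2601, 2627).
PERMANENT_SQL_ERRORS = frozenset({102, 229, 547, 2601, 2627})


def classify(exc: BaseException) -> Outcome:
    if isinstance(exc, asyncio.CancelledError):
        return Outcome.PROPAGATE
    if isinstance(exc, SqlError):
        # Unknown numbers default to transient so a write is buffered rather than lost.
        return Outcome.PERMANENT if exc.number in PERMANENT_SQL_ERRORS else Outcome.TRANSIENT
    if isinstance(exc, OSError):  # covers ConnectionError, TimeoutError and socket errors
        return Outcome.TRANSIENT
    return Outcome.PROPAGATE
```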
Joseph Doherty	198770f578	fix(deploy): address M2.2 review nits — backup endpoint in diff summary + null-oldConfig test (#10 ) - FormatConnection now includes BackupConfigurationJson so a backup-only change no longer renders identical Before/After cells (covers all 4 ConnectionsEqual fields) - add ComputeConnectionsDiff(null, newConfig) first-deploy unit test	2026-06-15 13:41:39 -04:00
Joseph Doherty	e9a84ba220	feat(deploy): surface connection-level changes in the deployment diff (#10 ) ComputeConnectionsDiff existed with tests but was never called and ConfigurationDiff had no slot for it, so standalone connection endpoint/protocol/failover drift never appeared in the deployment diff (only per-attribute binding drift did). Add a ConnectionChanges slot, wire ComputeConnectionsDiff into ComputeDiff, and render the connection section in the deployment diff UI.	2026-06-15 13:36:40 -04:00
Joseph Doherty	41d828e38e	fix(deploy): address M2.1 review nits — comparer consistency + comments (#22 ) - connection-name capable-set comparer kept as StringComparer.Ordinal: FlatteningService and SemanticValidator use all-ordinal name-keyed dictionaries throughout; OrdinalIgnoreCase would be inconsistent with the rest of the binding-resolution path — added comment documenting this - IsAlarmCapable protocol-match confirmed consistent with DataConnectionFactory (both OrdinalIgnoreCase); added case-insensitive InlineData variants (OPCUA, opcua, mxgateway, MXGATEWAY) to lock the contract - clarified FlatteningPipeline comment: "filters connections by alarm-capable protocol, then collects their names" (was "maps from the protocol string") - added DataConnectionLayer/DataConnectionFactory.cs path reference to AlarmCapableProtocols sync-risk comment	2026-06-15 13:27:26 -04:00
Joseph Doherty	d6909207a8	fix(deploy): wire native-alarm-source capability validation into flattening pipeline (#22 ) FlatteningPipeline loaded data connections but never passed the alarm-capable connection set to SemanticValidator, so the native-alarm-source capability check (built but inert) never ran — a source bound to a non-alarm-capable connection deployed silently. Compute the capable set (IAlarmSubscribableConnection: OPC UA + MxGateway) and thread it through ValidationService to SemanticValidator.	2026-06-15 13:20:20 -04:00
Joseph Doherty	e5534fddca	fix(siteeventlog): suppress snapshot-resync alarm re-emit + coverage + hardening (review)	2026-06-15 12:45:00 -04:00
Joseph Doherty	e74c3aef23	feat(siteeventlog): emit script started/completed Info events (M1.8) ScriptExecutionActor previously emitted only an Error 'script' event on failure. It now also fire-and-forgets an Info 'script' event when execution starts (right before RunAsync) and when it completes successfully — giving the operational log the full started/completed/failed lifecycle. Uses the already-resolved siteEventLogger; fire-and-forget so the event log can never block or fault the script's own run. Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory (returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope() reaches the logging hot path in tests instead of throwing into the catch.	2026-06-15 12:33:31 -04:00
Joseph Doherty	d8b5dbb386	feat(siteeventlog): emit store_and_forward + notification events (M1.7) StoreAndForwardService gains an optional ISiteEventLogger? ctor param (default null so the many direct-construction tests still compile) and, when wired, mirrors its own buffer/retry/park activity onto site operational events via the existing OnActivity hook (which already isolates a throwing subscriber, so a failing event log can never be misclassified as a transient delivery failure): - store_and_forward (ExternalSystem / CachedDbWrite): queued/retried/delivered/parked. Warning on buffer/retry, Error on park, Info on retry-recovery; an immediate-success delivery is the hot path and is not logged. - notification (the site forward-to-central path): logged ONLY on forward FAILURE (buffered after the immediate forward threw) and on park, per the Component-SiteEventLogging spec — routine enqueue and forward-success are deliberately not logged (central's Notifications table is the audit record). Wired through AddStoreAndForward (resolves ISiteEventLogger optionally from DI); StoreAndForward project now references SiteEventLogging (acyclic: SiteEventLogging references only Commons). Also documents the 'notification' category on the ISiteEventLogger.LogEventAsync eventType param (folds in M1.8 doc fix).	2026-06-15 12:31:04 -04:00
Joseph Doherty	09b9e8f259	feat(siteeventlog): emit deployment + instance_lifecycle events (M1.6) DeploymentManagerActor now fire-and-forgets a 'deployment' site operational event on deploy/enable/disable/delete outcomes (Info on success, Error on failure), source 'DeploymentManagerActor'. The disable/delete events are emitted from the existing PipeTo continuations (safe: reads only the immutable _serviceProvider and fire-and-forgets). InstanceActor now emits an 'instance_lifecycle' Info event in PreStart (started) and a new PostStop (stopped) — covering start/stop/enable/disable/redeploy/failover transitions from the instance's own vantage point. Both actors already hold _serviceProvider; no ctor change. Resolution is optional and LogEventAsync is fire-and-forget so a logging failure never affects the deployment pipeline or instance lifecycle.	2026-06-15 12:26:54 -04:00
Joseph Doherty	a00e43c4f9	feat(siteeventlog): emit alarm-category events on alarm transitions (M1.5) AlarmActor (computed) and NativeAlarmActor (native mirror) now fire-and-forget an 'alarm' site operational event on every state transition: - raise/activate: Error (priority/severity >= 700) or Warning - clear/return-to-normal, ack, inter-band transition: Info Both actors take a new optional IServiceProvider? ctor param (default null so existing direct-construction tests still compile); InstanceActor passes its _serviceProvider at the two Props.Create sites. Resolution is optional and the LogEventAsync call is fire-and-forget, so a logging failure never affects alarm evaluation. Rehydration replays are not re-logged. Adds a capturing FakeSiteEventLogger test helper + SingleServiceProvider.	2026-06-15 12:23:04 -04:00
Joseph Doherty	f49ac51771	fix(sitecallaudit): async DI scope in tick paths + options clamp tests + cursor/retry docs (review)	2026-06-15 12:10:54 -04:00
Joseph Doherty	e675b34500	feat(sitecallaudit): daily terminal-row purge scheduler Add a daily purge tick to SiteCallAuditActor that drops terminal SiteCalls rows older than the retention window via ISiteCallAuditRepository.PurgeTerminalAsync. The threshold is computed each tick as UtcNow - RetentionDays so an operator who lowers RetentionDays sees it on the next purge without a restart. Mirrors AuditLogPurgeActor's daily cadence + continue-on-error posture: a purge fault is logged and swallowed so the central singleton stays alive and retries next tick. The purge timer is started in PreStart alongside the reconciliation timer and gates on the same collaborators (pull client + enumerator) being available — the repo-only test ctor injects neither, so neither background timer runs there. Options: PurgeInterval (default 24h, clamped >= 1 min so a zero config value can't spin the scheduler) + RetentionDays (default 365), plus a test-only override that bypasses the clamp for millisecond cadences. Tests (all in-memory, no live MSSQL): purge tick calls PurgeTerminalAsync with a UtcNow - RetentionDays threshold (non-default 30 days); default retention yields a 365-day threshold; a throwing repo does not kill the singleton (a second tick still arrives).	2026-06-15 12:03:49 -04:00
Joseph Doherty	e427b38fb3	feat(sitecallaudit): periodic reconciliation pull back-fills lost telemetry Add a periodic reconciliation tick to SiteCallAuditActor that, per site, pulls changed SiteCall rows since a per-site UpdatedAtUtc cursor and upserts them idempotently (monotonic UpsertAsync) — the documented self-heal for lost best-effort gRPC telemetry. Mirrors SiteAuditReconciliationActor's structure (per-site cursor, per-site try/catch failure isolation, advance cursor by max observed UpdatedAtUtc) minus the stalled-detection EventStream machinery. Dependency wiring: add an acyclic SiteCallAudit -> AuditLog project reference and resolve IPullSiteCallsClient + ISiteEnumerator (central-only singletons registered by AddAuditLogCentralReconciliationClient) from the IServiceProvider the production ctor already holds — no Host Props.Create change needed. The repo-only test ctor injects neither collaborator, so the tick is gated off there. A new public test ctor injects fake client + enumerator + repo so the tick is unit-testable in-memory (public, not internal: Akka's ActivatorProducer uses public-only reflection binding). Options: ReconciliationInterval (default 5 min, clamped >= 1s so a zero config value can't spin the scheduler) + ReconciliationBatchSize (default 500), plus a test-only override that bypasses the clamp for millisecond cadences. Tests (all in-memory, no live MSSQL): absent row is upserted on a tick; second tick advances the cursor past already-pulled rows; one failing site does not sink other sites; repo-only ctor does not start the tick.	2026-06-15 12:01:22 -04:00
Joseph Doherty	6b0140dd62	fix(sitecallaudit): UpdatedAtUtc index + per-row pull resilience + UTC-convention + first-cycle test (review)	2026-06-15 10:47:25 -04:00
Joseph Doherty	963e3427da	feat(sitecallaudit): PullSiteCalls reconciliation plumbing (store read + RPC + site handler + central client) Site Call Audit (#22): build the documented periodic reconciliation PULL self-heal path for the eventually-consistent central SiteCalls mirror, as a dedicated PullSiteCalls gRPC RPC kept separate from the audit pull. This is the pull PLUMBING only; the central reconciliation tick is a separate follow-up. - IOperationTrackingStore.ReadChangedSinceAsync(sinceUtc, batchSize): inclusive UpdatedAtUtc cursor, oldest-first, batch-capped; SQLite impl projects tracking rows onto SiteCallOperational (Kind->Channel, TargetSummary->Target, SourceSite left empty - the store has no site-id column). - sitestream.proto: rpc PullSiteCalls + PullSiteCallsRequest/Response, mirroring PullAuditEvents; regenerated checked-in SiteStreamGrpc/*.cs. - SiteCallDtoMapper.ToDto(SiteCallOperational): inverse of FromDto for the handler. - SiteStreamGrpcServer.PullSiteCalls handler + SetOperationTrackingStore seam; Host wires the seam alongside SetSiteAuditQueue (site roles only). - Central IPullSiteCallsClient + GrpcPullSiteCallsClient (home: AuditLog/Central to reuse ISiteEnumerator; SiteCallAudit does not reference AuditLog). Re-stamps SourceSite from the dialed siteId; no-throw on tolerable transport faults; SpecifyKind (not ToUniversalTime) cursor handling. Central-only DI registration. Tests: ReadChangedSinceAsync (4), PullSiteCalls handler (6), GrpcPullSiteCallsClient (8). Full solution build 0 warnings/0 errors (TreatWarningsAsErrors).	2026-06-15 10:39:06 -04:00
Joseph Doherty	c092e89fd1	fix(audit): robust central options binding + interval clamps + doc/contract fixes (review)	2026-06-15 10:11:49 -04:00
Joseph Doherty	36a08a4145	feat(audit): start purge + reconciliation singletons; production ISiteEnumerator	2026-06-15 10:00:44 -04:00

732 Commits