ScadaBridge

Author	SHA1	Message	Date
Joseph Doherty	56d6508a5b	docs(plan): implementation plan for dev disable-login flag (4 tasks)	2026-06-16 08:35:05 -04:00
Joseph Doherty	5cf2d1cb99	docs(plan): design for dev disable-login auto-login flag (port from OtOpcUa) Faithful copy (warning only, no env guard); custom AuthenticationHandler under the cookie scheme; reuses M2.19 SessionClaimBuilder for an all-roles system-wide principal.	2026-06-16 08:31:19 -04:00
Joseph Doherty	b2d8fd8a0a	Merge M2: stillpending.md Tier-2 correctness & behavioral gaps (#7,#8,#9,#10,#13,#15,#17,#18,#20,#21,#22,#23,#24,#25,#26,#27,#28,#29,#30,#31,#32) 20 tasks (M2.0-M2.19), each through its classification-driven review chain. Full-solution build green (0 warnings, TreatWarningsAsErrors). Per-task targeted suites all passed. Known pre-existing: 2 partition-purge E2E failures (follow-up #52).	2026-06-16 08:27:59 -04:00
Joseph Doherty	077770fe35	docs(plan): record M2.19 review-fix SHA; M2 (Tier-2) complete — all 20 tasks done	2026-06-16 08:12:49 -04:00
Joseph Doherty	fddc69545f	fix(security): M2.19 review nits — idle/refresh config guard + adapter tests + dead-var/doc cleanup (#15 ) - Add SecurityOptionsValidator (IValidateOptions<SecurityOptions>) enforcing RoleRefreshThresholdMinutes < IdleTimeoutMinutes; registered with ValidateOnStart in AddSecurity — startup FAILS if threshold >= idle, so the invariant cannot be silently misconfigured away. - Update SecurityOptions XML-docs: class-level summary distinguishes JWT Bearer path (JwtSigningKey/JwtExpiryMinutes) from Blazor cookie session path (IdleTimeoutMinutes/ RoleRefreshThresholdMinutes); both time fields document the ~45-min effective idle window and the new cross-field constraint. - Remove dead jwtService variable from /auth/login lambda in AuthEndpoints.cs (resolved but never used since login moved to SessionClaimBuilder). - Extract ApplyValidationResultAsync helper from OnValidatePrincipalAsync (pure decision-application step); add 3 adapter tests covering Reject → RejectPrincipal + SignOutAsync; Replace → ReplacePrincipal + ShouldRenew; Keep → no-op. - Fix inaccurate TryRefreshAsync comment (dropped "OR last-activity needs advancing" — the code only returns non-null when roleRefreshDue). - Add InternalsVisibleTo for Security.Tests in Security.csproj. - Add IsRoleRefreshDue tests: missing claim → due; unparsable claim → due; plus integration test covering the full ValidateAsync path for a principal missing zb:lastrolerefresh (triggers refresh + re-stamps anchor rather than keeping stale principal forever). - Add SecurityOptionsValidatorConfigGuardTests: default succeeds; equal fails; greater fails; boundary (idle-1) succeeds; wiring confirmed via AddSecurity container.	2026-06-16 08:12:11 -04:00
Joseph Doherty	c7916d79a8	chore(tasks): record M2.19 implementation commit SHA (`8fe7f46`)	2026-06-16 07:54:49 -04:00
Joseph Doherty	8fe7f46df6	feat(security): cookie session idle-timeout + LDAP-free role-mapping refresh (#15, M2.19) Spike outcome: the shared ILdapAuthService (ZB.MOM.WW.Auth.Abstractions, an external NuGet package) exposes ONLY AuthenticateAsync(username, password, ct) — no passwordless service-account group-search. A live LDAP group re-query for an active session therefore requires a new lib method and is OUT OF SCOPE (cannot modify the external package). Implemented the always-achievable layers (cookie-only; no embedded JWT for cookie principals): - /auth/login now stores the user's raw LDAP groups (one zb:group claim each) plus a zb:lastrolerefresh anchor (login time, UTC), seeding the LastActivity idle anchor too. - SessionClaimBuilder: single shared DRY claim-builder used by BOTH /auth/login AND the refresh path, so the two claim shapes cannot drift (canonical identity/role/scope claims with nameType/roleType pinned, plus the M2.19 group + refresh-anchor additions). - CookieSessionValidator (TimeProvider-injected, unit-testable) + a thin CookieAuthenticationEvents.OnValidatePrincipal adapter: * idle-timeout: a session past IdleTimeoutMinutes (default 30) is RejectPrincipal+SignOut; consistent with the cookie ExpireTimeSpan+SlidingExpiration window (same value). * role refresh WITHOUT LDAP: when older than RoleRefreshThresholdMinutes (new option, default 15) the DB-backed RoleMapper re-runs on the STORED groups, claims are rebuilt via the shared builder, the anchor advances, principal is replaced + cookie renewed. Revoked DB mappings drop the user's roles mid-session. * fail-soft: any refresh error KEEPS the existing principal (no sign-out, never throws) — mirrors the documented "LDAP failure: active sessions continue with current roles". - Documented residual limitation in Component-Security.md: central role-mapping/scope changes apply within ~15 min without LDAP; live directory group-membership changes are picked up only at next login (needs a passwordless group-search on the external ZB.MOM.WW.Auth.Ldap lib — tracked follow-up). Tests (Security.Tests, all green): CookieSessionValidatorTests + SessionClaimBuilderParityTests — idle reject/keep, LDAP-free remap-from-stored-groups, revoked-roles loss, sub-threshold no-refresh, refresh-throws-keeps-session, and login/refresh claim-parity.	2026-06-16 07:54:31 -04:00
Joseph Doherty	a0d9379a4f	fix(debug-stream): M2.18 review nits — thread-safe test mock + AlarmKey null-guard + rename stale test (#26) - MockSiteStreamGrpcClient.SubscribeCalls and UnsubscribedCorrelationIds switched from bare List<T> to lock-guarded backing fields with snapshot accessors, eliminating the actor-thread/test-thread data race (matches the existing lock(events) pattern for ReceivedEvents) - AttributeKey and AlarmKey null-guard each component with ?? string.Empty so a null SourceReference/AlarmName/etc. cannot silently collide with an empty-string component in the dedup dictionary - On_Snapshot_Opens_GrpcStream renamed to On_Snapshot_Does_Not_Open_Additional_GrpcStream; assertion updated to confirm exactly one subscribe (the PreStart stream-first open) with no second subscribe after snapshot delivery - _stopped ordering in InstanceNotFound path moved after CleanupGrpc() for consistency with DebugStreamTerminated and ReceiveTimeout handlers	2026-06-16 07:41:41 -04:00
Joseph Doherty	7210cdbcb5	docs: record M2.18 (#26) implementation commit SHA in M2 task tracker	2026-06-16 07:34:06 -04:00
Joseph Doherty	d8519cb464	fix(debug-stream): stream-first lifecycle with replay/dedup (#26, M2.18) Re-architect DebugStreamBridgeActor from snapshot-first to stream-first so no attribute/alarm event occurring during the snapshot-build + network-transit window is lost (#26). Lifecycle change: - PreStart now opens the gRPC subscription FIRST (alongside sending the SubscribeDebugViewRequest), so live events start flowing immediately. - Phase model via a single _snapshotDelivered flag (mutated only on the actor thread). While buffering (snapshot not yet delivered), AttributeValueChanged/ AlarmStateChanged are appended to an ordered _preSnapshotBuffer instead of being delivered. After snapshot+flush, the same handlers pass through directly. - On DebugViewSnapshot: deliver snapshot, then flush the buffer in arrival order with per-entity dedup, then set _snapshotDelivered=true (pass-through). Dedup rule (exactly-once): - Identity: attributes by (InstanceUniqueName, AttributePath, AttributeName); alarms by (InstanceUniqueName, AlarmName, SourceReference) so native per-condition alarms are not conflated. Keys joined with a NUL delimiter (declared as an escaped char constant; no raw NUL in source) so distinct identities never collide on a space within a name. - Boundary: a buffered event whose timestamp is <= the snapshot's timestamp for the same entity is already reflected -> DROP; strictly-newer (>) -> DELIVER; entity absent from the snapshot -> DELIVER (genuine gap-window event). Preserved paths: - M2.11 InstanceNotFound: with stream-first the gRPC stream is already open, so the not-found path now tears it down (CleanupGrpc) + clears the buffer, does NOT enter pass-through, delivers the not-found snapshot, and stops cleanly. - Reconnect (ReconnectGrpcStream -> OpenGrpcStream) does not touch the phase flag: a mid-session reconnect resumes pass-through; a reconnect during the buffering phase stays buffering until the snapshot arrives. - Communication-008 retry/stability/stop/terminate + ReceiveTimeout orphan net unchanged. Duplicate/late snapshot after delivery is ignored defensively. Tests: 10 new M2.18 tests (stream-first ordering, gap-window buffering, dedup drop/deliver for attrs + alarms, ordering, pass-through, InstanceNotFound teardown, reconnect-during-buffering, reconnect-after-snapshot) + revised the M2.11 not-found test to assert stream teardown. Full DebugStreamBridgeActor class green: 23/23.	2026-06-16 07:33:51 -04:00
Joseph Doherty	c1043569f6	docs(deployment): reconcile delete-from-NotDeployed — spec matrix now matches deliberate code (#31, M2.17) git blame shows commit `1d5465f3` deliberately added NotDeployed to CanDelete so an undeployed instance can have its orphan record fully removed. Code + tests already permit it; the spec matrix said 'No'. Per M2.17, reconcile doc→code (not the reverse): matrix now reads 'Delete from Not deployed = Yes (removes the orphan record)' with a note, and CanDelete carries a remark citing the rationale + origin commit.	2026-06-16 07:24:57 -04:00
Joseph Doherty	c9244d8bda	fix(health): M2.16 review nit — real idempotency guard for SiteEventLog health bridge (#30 ) AddSiteEventLogHealthMetricsBridge registered via AddHostedService(factory-lambda), which sets ImplementationFactory and leaves ImplementationType null. The prior ImplementationType == guard was therefore silently dead — a second call would spin up a second SiteEventLogFailureCountReporter. Fix: add a private SiteEventLogHealthMetricsBridgeMarker singleton and guard on its ServiceType instead. Also corrects the cycle-path comment in both ServiceCollectionExtensions.cs and SiteEventLogFailureCountReporter.cs: StoreAndForward.csproj does reference SiteEventLogging.csproj, so the transitive path HealthMonitoring → StoreAndForward → SiteEventLogging is real, but adding a direct HealthMonitoring → SiteEventLogging reference would NOT create a cycle (SiteEventLogging has no back-edge to HealthMonitoring). The Func<long> seam is a coupling-avoidance measure, not a cycle-breaker. Adds AddSiteEventLogHealthMetricsBridgeTests.AddSiteEventLogHealthMetricsBridge_IsIdempotent_DoesNotDoubleRegister_HostedService as a regression test (builds provider and asserts exactly one reporter via GetServices<IHostedService>().OfType<T>()).	2026-06-16 07:22:35 -04:00
Joseph Doherty	d81f747434	feat(health): wire ISiteEventLogger.FailedWriteCount into SiteHealthReport (#30 , M2.16) Add SiteHealthReport.SiteEventLogWriteFailures (trailing optional long = 0, additive-only), ISiteHealthCollector.SetSiteEventLogWriteFailures (default no-op so existing fakes compile), and SiteEventLogFailureCountReporter (hosted service in HealthMonitoring, Func<long> delegate to avoid the HealthMonitoring → StoreAndForward → SiteEventLogging cycle). Registration helper AddSiteEventLogHealthMetricsBridge added to HealthMonitoring.ServiceCollectionExtensions; wired in SiteServiceRegistration after AddSiteEventLogging. Tests: SiteEventLogWriteFailuresMetricTests (4 collector tests) + SiteEventLogFailureCountReporterTests (2 poller tests) in HealthMonitoring.Tests. 79/79 HealthMonitoring.Tests green, 59/59 SiteEventLogging.Tests green, 0 warnings.	2026-06-16 07:14:54 -04:00
Joseph Doherty	e1ee37e508	fix(siteeventlog): gate EventLogPurge to active node via IClusterNodeProvider.SelfIsPrimary (#29, M2.15)	2026-06-16 07:02:26 -04:00
Joseph Doherty	6b1cb9e0e6	refactor(host)/test: M2.14 review nits — simplify probe cancellation + pre-cancelled-token test (#28) - Remove redundant linked CancellationTokenSource in ProbeAsync; pass the framework cancellationToken and ProbeTimeout directly to Ask (the two-CTS pattern was redundant — Ask already honours both the timeout and the token). - Add EchoActor XML <remarks> explaining why no Receive<Identify> handler is needed (ActorBase answers Identify automatically). - Add PreCancelledToken_ReportsUnhealthy_DoesNotThrow test: verifies the never-throws guarantee on the shutdown-race path (token already cancelled before CheckHealthAsync is invoked).	2026-06-16 06:54:28 -04:00
Joseph Doherty	473429a202	docs: record M2.14 (#28) commit SHA in M2 task tracker	2026-06-16 06:49:28 -04:00
Joseph Doherty	253bec5a52	feat(host): readiness gates on required cluster singletons (#28, M2.14) REQ-HOST-4a lists "required cluster singletons running (if applicable)" as a readiness criterion, but /health/ready only checked database + akka-cluster. Add a third Ready-tagged check, RequiredSingletonsHealthCheck, registered in the Central-role AddHealthChecks() chain (so it is naturally role-scoped — site nodes never run it). Probe: for each required central singleton, Ask its local ClusterSingletonProxy an Identify with a short bounded per-singleton timeout (~2s, probes run concurrently via Task.WhenAll). A non-null ActorIdentity.Subject within the timeout means the singleton is running and reachable through the proxy; a null subject or a timeout means unreachable → Unhealthy, naming the unreachable singleton(s). The check never throws (catch-all → Unhealthy) and resolves ActorSystem lazily from DI per probe (Unhealthy if Akka not yet up). Required-always set = the five singleton proxies created unconditionally in AkkaHostedService.RegisterCentralActors: notification-outbox, audit-log-ingest, site-call-audit, audit-log-purge, site-audit-reconciliation. There are no feature/config-gated central singletons today; any future gated singleton is the "if applicable" case and must NOT be added to the required set. Leadership-agnostic: the proxy reaches the singleton from either central node, so a ready standby still reports ready (readiness must not require cluster leadership — that is the Active tier's job). During a brief singleton handover the probe may time out and the node flaps to not-ready, which is correct (a node mid-handover is legitimately not fully ready); no retries, to keep the probe fast. Tests (TDD): RequiredSingletonsHealthCheckTests exercises the probe against a TestKit ActorSystem — all proxies present+reachable → Healthy; one missing → Unhealthy naming it; ActorSystem absent → Unhealthy, no throw. HealthCheckTests regression-guards the Ready tag + absence of the Active tag on the new check.	2026-06-16 06:49:18 -04:00
Joseph Doherty	3945789970	docs(dcl): M2.13 review nits — OriginalRaiseTime ConditionRefresh/UTC caveats + Description-vs-Message note (#27)	2026-06-16 06:40:40 -04:00
Joseph Doherty	722b8663c1	feat(dcl): populate obtainable NativeAlarmTransition fields from OPC UA and MxGateway (#27 , M2.13) OPC UA (RealOpcUaClient): - Append 5 new SelectClauses at indices 13–17 (never renumber 0–12): - 13: AlarmConditionType/ActiveState/TransitionTime → OriginalRaiseTime - 14–17: LimitAlarmType HighHighLimit/HighLimit/LowLimit/LowLowLimit → LimitValue - New OpcUaAlarmMapper.PickLimitValue helper: first non-null in HiHi→Hi→Lo→LoLo priority order, InvariantCulture-formatted; empty string for non-limit alarm types. - HandleAlarmEvent reads new indices with fields.Count > N guards; hard minimum (6) unchanged so base ConditionType events still process without the limit fields. - Document unavailable-by-protocol fields (Category, Description, OperatorUser, CurrentValue) inline in BuildAlarmEventFilter and HandleAlarmEvent. MxGateway (MxGatewayAlarmMapper): - MapTransition: CurrentValue and LimitValue now populated via MxValueToString (uses MxValueExtensions.ToClrValue + InvariantCulture) from OnAlarmTransitionEvent proto fields current_value/limit_value. - MapSnapshot: same — populated from ActiveAlarmSnapshot.current_value/limit_value. - MxValueToString helper (internal): null-safe MxValue → string conversion. Tests (17 new, 40 total pass): - OpcUaAlarmMapperTests: PickLimitValue priority, InvariantCulture, all-null case. - MxGatewayAlarmMapperTests: CurrentValue/LimitValue populate from double/string MxValue; absent fields yield empty strings. - RealOpcUaClientAlarmFilterTests: index alignment assertions (count=18, per-index TypeDefinitionId+BrowsePath), regression guard on existing indices 0–12.	2026-06-16 06:37:19 -04:00
Joseph Doherty	e2b31a9fd2	fix(siteruntime): M2.12 review nits — observe logger fault + meaningful source fallback (#25) Replace bare task-discard with ContinueWith(OnlyOnFaulted|ExecuteSynchronously) so a faulted ISiteEventLogger is logged and swallowed rather than going to the unobserved-task firehose. Replace the "ScriptRuntimeContext" class-name fallback with the meaningful "InstanceScript:{instanceName}" identifier (matching the site-event-log source convention). Update the method doc-comment to state the best-effort contract explicitly. Pin the new fallback value in the shape-precision test.	2026-06-16 06:26:00 -04:00
Joseph Doherty	f08038db23	feat(siteruntime): M2.12 (#25) — emit script Error site event on recursion-limit violation Inject ISiteEventLogger into ScriptRuntimeContext (additive optional ctor param, defaulted null, all existing callers source-compatible). Add a single private EmitRecursionLimitEventAsync helper that fires-and-forgets a "script"/Error site event; called at both recursion guard sites (CallScript at ~:332 and ScriptCallHelper.CallShared at ~:499). ScriptExecutionActor threads the already-resolved siteEventLogger singleton into the context; AlarmExecutionActor leaves it null (no siteEventLogger wired there). Existing _logger.LogError + throw behaviour unchanged. Tests: RecursionLimitSiteEventTests — 5 tests covering both CallScript and CallShared (ISiteEventLogger.LogEventAsync called once with category "script", severity "Error"; null logger path does not throw).	2026-06-16 06:20:58 -04:00
Joseph Doherty	d160c7f694	test(communication): M2.11 review nits — bridge-actor not-found test + dead-letter comment + toast wording (#24) - Add DebugStreamBridgeActorTests: On_InstanceNotFound_Snapshot_Forwards_To_OnEvent_Does_Not_Open_Stream_And_Terminates — asserts _onEvent receives the not-found snapshot, SubscribeCalls remains empty, and the actor terminates cleanly via Watch/ExpectTerminated. - Add comment in DebugStreamBridgeActor near Context.Stop(Self) explaining that the subsequent StopDebugStream Tell from DebugStreamService.StopStream produces a benign expected dead-letter. - Reword not-found toast in DebugView.razor to "Instance not found on the selected site — check the deployment target." (accurate when the instance may be deployed to a different site).	2026-06-16 06:15:26 -04:00
Joseph Doherty	dbf44b9e10	fix(siteruntime): M2.11 — unknown-instance debug snapshot returns InstanceNotFound=true (#24 ) RouteDebugSnapshot and RouteDebugViewSubscribe on DeploymentManagerActor previously returned an empty DebugViewSnapshot for unknown instances, indistinguishable from a deployed-but-empty instance. Callers had no way to differentiate "not deployed here" from "deployed, no data yet." Approach — additive field on existing message contract: Added `bool InstanceNotFound = false` as an optional trailing parameter to DebugViewSnapshot (Commons). All existing positional constructor calls and serialized wire frames are unaffected (default = false). A dedicated new message type was considered but rejected: the ClusterClient channel and DebugStreamService TCS are already typed on DebugViewSnapshot, and a second reply union would require wider changes for zero additive-safety gain. Changes: - Commons/DebugViewSnapshot: add InstanceNotFound = false (additive) - DeploymentManagerActor: set InstanceNotFound=true in both unknown- instance branches (RouteDebugViewSubscribe, RouteDebugSnapshot) - DebugStreamBridgeActor: when snapshot.InstanceNotFound, forward it to _onEvent (resolves the TCS) then stop cleanly; no gRPC stream opened - DebugView.razor: check session.InitialSnapshot.InstanceNotFound after connect and show a clear "not deployed on this site" error toast - 3 new tests in DeploymentManagerActorTests covering: unknown→snapshot, unknown→subscribe, known-empty→InstanceNotFound stays false	2026-06-16 06:08:21 -04:00
Joseph Doherty	9cd62aa5b4	test(configdb): M2.10 review fix — catch bracketed AuditLog identifiers; document EF/multi-line scan limits (#18 ) Extends ContainsAuditLogMutation regex to match T-SQL bracketed forms ([AuditLog], [dbo].[AuditLog]) that SSMS-generated SQL produces; the prior optional-schema pattern only matched bare/dbo-prefixed names, silently missing these real violation forms. Changes: - Schema sub-pattern (?:dbo\.)? → (?:\[?dbo\]?\.)? (matches dbo. and [dbo].) - Table sub-pattern AuditLog\b → \[?AuditLog\]?\b (matches AuditLog and [AuditLog]) - Pattern compiled as static readonly Regex field for clarity/performance - Adds 4 new planted-positive cases: UPDATE [dbo].[AuditLog], UPDATE [AuditLog], DELETE FROM [dbo].[AuditLog], DELETE FROM [AuditLog] - Retains all existing negatives; adds DELETE FROM [dbo].[Notifications] negative - Fixes misleading "reverse order" comment on the comment-prefix positive case - Documents scan limitations (EF Core bulk methods; multi-line DML) in class XML doc	2026-06-16 05:55:27 -04:00
Joseph Doherty	e7b6fe33a4	test(configdb): guard test for AuditLog append-only invariant (M2.10, #18 ) Adds AuditLogAppendOnlyGuardTests.cs to tests/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase.Tests/ — a code-level backstop for the DB-role DENY UPDATE / DENY DELETE control established in migration 20260602174346_CollapseAuditLogToCanonical. The guard scans every non-Designer, non-Snapshot .cs file in the ConfigurationDatabase source tree and fails the test run if any line matches the DML-syntax pattern: UPDATE\s+(?:dbo\.)?AuditLog\b DELETE\s+(?:FROM\s+)?(?:dbo\.)?AuditLog\b The tight DML-syntax pattern naturally excludes false positives without extra exclusion checks: DENY UPDATE ON dbo.AuditLog is not matched (UPDATE is followed by ON, not the table name); ALTER TABLE … SWITCH and TRUNCATE contain no UPDATE/ DELETE keyword; comments with UPDATE/AuditLog in separate clauses are not matched. Self-verifying unit tests (ContainsAuditLogMutation_) prove the helper: - returns false on clean-source lines (INSERT, SELECT, DENY DDL, ALTER SWITCH, TRUNCATE, DELETE FROM Notifications); - returns TRUE on planted violations (UPDATE AuditLog SET …, DELETE FROM dbo.AuditLog WHERE …, lower-case variants); - returns false on the exact DENY/GRANT/partition-switch strings from the production migration files. All 256 ConfigurationDatabase.Tests pass; solution builds 0 W / 0 E.	2026-06-16 05:49:51 -04:00
Joseph Doherty	76198b36e3	fix(host): add MachineDataDb startup validation for Central (reverts Host-008, M2.9 #17 ) REQ-HOST-3/REQ-HOST-4 require a MachineDataDb connection string for Central nodes. The shipped docker appsettings (docker/central-node-a/appsettings.Central.json and central-node-b) already carry the key. Host-008 had removed the fail-fast Require because MachineDataDb had no consumer yet; this commit reverses that decision so a misconfigured or missing connection string is caught at startup with a clear error. Changes: - DatabaseOptions: add MachineDataDb property with XML doc comment - StartupValidator: add .Require for ScadaBridge:Database:MachineDataDb inside the existing Central .When block, immediately after the ConfigurationDb Require - StartupValidatorTests: rename Central_MissingMachineDataDb_PassesValidation -> FailsValidation and flip to Assert.Throws; update comment to cite REQ-HOST-3/4, shipped docker appsettings, and the Host-008 reversal; add MachineDataDb to ValidCentralConfig() so all other Central tests remain green - CentralDbTestEnvironment: supply ScadaBridge__Database__MachineDataDb env var (mirrors ConfigurationDb pattern) so HostStartupTests, HealthCheckTests, and MetricsEndpointTests pass through the new Require - CompositionRootTests, AkkaHostedServiceAuditWiringTests, ActorPathTests: set ScadaBridge__Database__MachineDataDb env var alongside the pepper env var and clear it in Dispose, matching the existing pepper handling pattern Build: 0 warnings, 0 errors. dotnet test Host.Tests: 233/233 passed.	2026-06-16 05:41:25 -04:00
Joseph Doherty	21b801b71f	test(template): M2.8 review nits — stale-binding comment + stale-ID & inert-check tests (#23 ) Add code comments in ValidateConnectionBindingCompleteness explaining that the unbound-attribute branch also covers the silently-dropped stale-binding case (cross-reference FlatteningService.ApplyConnectionBindings), and that the `continue` skips the exists-at-site check for unbound attrs. Add two new tests: - FlatteningPipelineConnectionBindingTests: stale DataConnectionId (999) not present in site connections → flattener drops it silently → validator reports ConnectionBinding Error, IsValid false. - ValidationServiceTests: enforce:true + siteConnectionNames:null on a properly-bound attribute → no ConnectionBinding error (exists-at-site check stays inert when site set is not supplied).	2026-06-16 05:34:56 -04:00
Joseph Doherty	3b79b896cf	chore: record M2.8 commit SHA in plan task tracker	2026-06-16 05:28:19 -04:00
Joseph Doherty	7c14a69091	feat(#23 ): elevate connection-binding completeness to a deploy-gating Error (M2.8) Pre-deployment validation only WARNED when a data-sourced attribute had no connection binding, so an instance with unresolved bindings still passed IsValid and could deploy. There was also no check that a binding resolves to a connection that actually exists at the target site. - ValidationService.Validate gains an opt-in `enforceConnectionBindings` flag (default false) plus a `siteConnectionNames` set. Default-false keeps the template DESIGN-TIME path (ManagementActor.HandleValidateTemplate) non-blocking, since bindings are legitimately set later at instance/deploy time. The DEPLOY path (FlatteningPipeline) opts in (true) so: * a data-sourced attribute with no binding is now a deploy-gating Error; * a binding to a connection that does not exist on the target site is an Error. Static (non-data-sourced) attributes are never flagged. - FlatteningPipeline computes the site-connection-names set from the loaded site data connections (mirroring M2.1's alarmCapableConnectionNames) and threads it in. - Tests: TemplateEngine.Tests covers design-time warning / deploy-time error / static-ok / exists-at-site / non-existent-connection. New FlatteningPipelineConnectionBindingTests proves the deploy path enforces it. Mark M2.7 + M2.8 completed in the plan task tracker.	2026-06-16 05:28:06 -04:00
Joseph Doherty	a8e9e9952d	fix(template): M2.7 review nits — comment-aware arg tokenizer + stricter numeric-literal inference (#20/#21) SplitCallArguments now skips C# line (`//`) and block (`/* */`) comments when tokenizing the argument list, so a comma inside a comment no longer produces a spurious arg-count mismatch. IsNumericLiteral now explicitly rejects tokens whose first non-sign character is `_` or a letter (e.g. `_2`), and restricts underscore digit-separators to positions after at least one digit, preventing identifier-shaped tokens from being inferred as Integer/Float.	2026-06-16 05:21:23 -04:00
Joseph Doherty	958229e1f8	feat(template): SemanticValidator script-call return-type (#20 ) + argument-type (#21 ) checks — M2.7 #20 return-type: when a CallScript/CallShared result is assigned directly into a typed local declaration (optionally awaited, optionally via an Instance./ Scripts./Parent./Children["x"]. receiver), compare the LHS declared type against the target script's declared ReturnDefinition and flag clear cross-category mismatches (ReturnTypeMismatch). Previously BuildReturnMap was built but never read. #21 argument-type: positional call arguments are now split (paren/brace/bracket + string-literal aware) and each literal-inferable argument is checked against the target's declared parameter type (ParameterMismatch), not just the count. Conservative — only CLEAR primitive mismatches (String/Integer/Float/Boolean) are flagged; Integer<->Float widening is tolerated. Unknown/Object/List declarations, var/untyped/unused/expression-embedded assignments, and non-literal arguments (variables, member access, method/await chains, casts, object/array initializers, compound or concatenated expressions, interpolated strings) are never flagged. Inference limits documented in code. Adds 16 SemanticValidatorTests covering mismatch detection, correct-call pass, and the dynamic/unknown no-false-positive cases.	2026-06-16 05:11:40 -04:00
Joseph Doherty	42d22766c7	docs(plan): mark M2.0-M2.6 complete in tasks.json; record commits + follow-ups	2026-06-15 15:20:04 -04:00
Joseph Doherty	411d0c043b	fix(inbound-api): M2.6 review nits — legacy required default, recursion depth guard, return-validator comment (#13 ) - legacy flat-array "required":"false" (string) now treated as optional (matches migration) - depth ceiling (32) on InboundApiSchema Parse/Validate recursion — guards against stack-overflow from a deeply-nested stored schema (Parse throws->400, Validate adds error) - DocOptions.MaxDepth=128 so the application-level structural guard fires before the System.Text.Json reader ceiling (each schema level = ~3 JSON reader levels) - comment the intentional ParameterValidator/ReturnValueValidator early-return asymmetry - note intentional datetime->string legacy collapse in NormalizeType - tests: legacy string-false optional, parse/validate depth ceiling, scalar return schema	2026-06-15 15:18:44 -04:00
Joseph Doherty	4b6187c853	feat(inbound-api): nested Object/List extended-type validation (#13 ) Object/List parameters and return values were shape-validated only (object vs array), with no field-level/nested type checks — type-wrong nested data passed inbound validation and failed only at script runtime. Add recursive type validation (declared Object field types, List element type, scalars at any depth) with path-qualified errors, symmetric across ParameterValidator and ReturnValueValidator. Both validators now parse the canonical JSON Schema definition format (the Central UI / MigrateParametersToJsonSchema output) via a shared recursive engine, Commons.Types.InboundApi.InboundApiSchema, instead of the legacy flat [{name,type}] array which they could not even deserialize from migrated rows. The legacy flat-array form is still accepted on read for transition safety. Undeclared fields are rejected at every level (consistent with the existing top-level unexpected-parameter rejection); a present-but-null value satisfies any type, only absence of a required field is an error.	2026-06-15 15:04:28 -04:00
Joseph Doherty	3032faac0d	fix(template): preserve per-script ExecutionTimeoutSeconds across UI edits; add alarm fallback tests (#9) The UI script editor has no ExecutionTimeoutSeconds control (authoring deferred), so a body edit silently cleared a timeout set via Transport import. Round-trip the loaded value so UI edits preserve it. Add the missing AlarmExecutionActor null/<=0 fallback tests for symmetry with ScriptExecutionActor.	2026-06-15 14:49:37 -04:00
Joseph Doherty	3edef09f51	feat(runtime): per-script execution timeout overriding the global default (#9) Spec promised a per-script timeout but only the global ScriptExecutionTimeoutSeconds existed. Add nullable TemplateScript.ExecutionTimeoutSeconds threaded through EF + flattening (ResolvedScript) to ScriptExecutionActor/AlarmExecutionActor, which use perScript ?? global for the execution CTS. Includes the EF migration for the new column.	2026-06-15 14:40:38 -04:00
Joseph Doherty	00304a26e6	fix(dcl): resolve OPC UA alarm type NodeId to friendly name so conditionFilter works (#8) HandleAlarmEvent set AlarmTypeName to the event-type NodeId string ("i=9341"), but the client-side conditionFilter gate (and the OPC UA WhereClause) use friendly type names — so a friendly-name filter built a correct server WhereClause yet the client gate dropped every event (zero alarms delivered). Resolve the event-type NodeId to its friendly name via an inverse of KnownConditionTypeIds (NodeId-string fallback for custom types) so both sides agree. Also fix a dead-code ternary in the SourceName derivation.	2026-06-15 14:25:35 -04:00
Joseph Doherty	8825df56be	fix(dcl): apply native-alarm conditionFilter (client-side gate + OPC UA WhereClause) (#8 ) conditionFilter was plumbed end-to-end but applied nowhere — a filtered source silently mirrored all conditions. Define the filter as a comma-separated, case-insensitive list of condition type names (blank = all); enforce it authoritatively client-side in DataConnectionActor routing (uniform across OPC UA + MxGateway) and, for OPC UA, additionally build a server-side EventFilter WhereClause as a bandwidth optimization.	2026-06-15 14:16:10 -04:00
Joseph Doherty	de375ff7ea	fix(db): classify non-SqlException DB outages as transient; propagate cancellation (#7 ) ExecuteWriteAsync only caught SqlException, so a live outage surfacing as InvalidOperationException/SocketException/IOException/TimeoutException escaped unclassified and crashed the script actor instead of buffering. Mirror the HTTP path: propagate OperationCanceledException on cancellation, classify transport exceptions as transient (buffer+retry), let unexpected exceptions propagate.	2026-06-15 14:03:25 -04:00
Joseph Doherty	d05270640d	fix(db): classify transient vs permanent SQL errors in Database.CachedWrite (#7 ) CachedWrite buffered ALL write failures and retried forever, never returning a synchronous failure to the script — permanent SQL errors (constraint/syntax/ permission) were treated as transient. Mirror the External-System API path: attempt immediately, return Failed synchronously on permanent SQL errors (no buffering), buffer only transient errors; the S&F retry path parks permanent failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.	2026-06-15 13:53:15 -04:00
Joseph Doherty	198770f578	fix(deploy): address M2.2 review nits — backup endpoint in diff summary + null-oldConfig test (#10 ) - FormatConnection now includes BackupConfigurationJson so a backup-only change no longer renders identical Before/After cells (covers all 4 ConnectionsEqual fields) - add ComputeConnectionsDiff(null, newConfig) first-deploy unit test	2026-06-15 13:41:39 -04:00
Joseph Doherty	e9a84ba220	feat(deploy): surface connection-level changes in the deployment diff (#10 ) ComputeConnectionsDiff existed with tests but was never called and ConfigurationDiff had no slot for it, so standalone connection endpoint/protocol/failover drift never appeared in the deployment diff (only per-attribute binding drift did). Add a ConnectionChanges slot, wire ComputeConnectionsDiff into ComputeDiff, and render the connection section in the deployment diff UI.	2026-06-15 13:36:40 -04:00
Joseph Doherty	41d828e38e	fix(deploy): address M2.1 review nits — comparer consistency + comments (#22 ) - connection-name capable-set comparer kept as StringComparer.Ordinal: FlatteningService and SemanticValidator use all-ordinal name-keyed dictionaries throughout; OrdinalIgnoreCase would be inconsistent with the rest of the binding-resolution path — added comment documenting this - IsAlarmCapable protocol-match confirmed consistent with DataConnectionFactory (both OrdinalIgnoreCase); added case-insensitive InlineData variants (OPCUA, opcua, mxgateway, MXGATEWAY) to lock the contract - clarified FlatteningPipeline comment: "filters connections by alarm-capable protocol, then collects their names" (was "maps from the protocol string") - added DataConnectionLayer/DataConnectionFactory.cs path reference to AlarmCapableProtocols sync-risk comment	2026-06-15 13:27:26 -04:00
Joseph Doherty	d6909207a8	fix(deploy): wire native-alarm-source capability validation into flattening pipeline (#22 ) FlatteningPipeline loaded data connections but never passed the alarm-capable connection set to SemanticValidator, so the native-alarm-source capability check (built but inert) never ran — a source bound to a non-alarm-capable connection deployed silently. Compute the capable set (IAlarmSubscribableConnection: OPC UA + MxGateway) and thread it through ValidationService to SemanticValidator.	2026-06-15 13:20:20 -04:00
Joseph Doherty	2fb608f1b5	fix(configdb): resync EF model snapshot to clear PendingModelChangesWarning (#32 ) The actual drift was NOT OccurredAtUtc's converter (a same-CLR-type DateTime->DateTime ValueConverter emits no snapshot annotation and never triggers PendingModelChangesWarning). The real pending change was a HasData seed row: SecurityConfiguration adds LdapGroupMapping Id=5 (SCADA-Viewers -> Viewer) but the model snapshot omitted it, so MsSqlMigrationFixture's MigrateAsync threw PendingModelChangesWarning and failed every fixture-backed AuditLog MSSQL test (~57). Generated via `dotnet ef migrations add`; Up/Down are seed-data DML only (InsertData/DeleteData of the single reference row) -- no schema DDL. The snapshot now carries the Id=5 seed and has-pending-model-changes is clean.	2026-06-15 13:13:22 -04:00
Joseph Doherty	28bc639786	docs(plan): M2 implementation plan — Tier-2 correctness/behavioral gaps 19 tasks (M2.0-M2.19) covering stillpending.md Tier-2 items #7,#8,#9,#10, #13,#17,#18,#20-#31, plus pre-existing EF model/snapshot drift (#32, lead item). Risk-first ordering; migration tasks serialized. Scope decisions recorded: #19 done in M1.8; #16 deferred to M8; #17 reverts Host-008 per design doc; #8 filter semantics defined; #15 LDAP re-query spike-gated.	2026-06-15 13:08:37 -04:00
Joseph Doherty	3d9f562368	Merge M1: stillpending.md Tier-1 runtime wiring Closes the Tier-1 silent gaps from the stillpending.md audit (#3-#6): - AuditLog 365-day purge actor + reconciliation self-heal now actually start and run on the central node (were dead code). - SiteCall reconciliation pull (new PullSiteCalls RPC + plumbing) + daily terminal-row purge scheduler. - Site Event Logging now emits all 5 previously-missing categories (alarm, deployment, instance_lifecycle, store_and_forward, notification, script started/completed). 14 commits, each implement->review->fix. Build 0/0; cluster verified healthy with the new singletons starting cleanly (bash docker/deploy.sh).	2026-06-15 12:53:25 -04:00
Joseph Doherty	e5534fddca	fix(siteeventlog): suppress snapshot-resync alarm re-emit + coverage + hardening (review)	2026-06-15 12:45:00 -04:00
Joseph Doherty	e74c3aef23	feat(siteeventlog): emit script started/completed Info events (M1.8) ScriptExecutionActor previously emitted only an Error 'script' event on failure. It now also fire-and-forgets an Info 'script' event when execution starts (right before RunAsync) and when it completes successfully — giving the operational log the full started/completed/failed lifecycle. Uses the already-resolved siteEventLogger; fire-and-forget so the event log can never block or fault the script's own run. Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory (returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope() reaches the logging hot path in tests instead of throwing into the catch.	2026-06-15 12:33:31 -04:00
Joseph Doherty	d8b5dbb386	feat(siteeventlog): emit store_and_forward + notification events (M1.7) StoreAndForwardService gains an optional ISiteEventLogger? ctor param (default null so the many direct-construction tests still compile) and, when wired, mirrors its own buffer/retry/park activity onto site operational events via the existing OnActivity hook (which already isolates a throwing subscriber, so a failing event log can never be misclassified as a transient delivery failure): - store_and_forward (ExternalSystem / CachedDbWrite): queued/retried/delivered/ parked. Warning on buffer/retry, Error on park, Info on retry-recovery; an immediate-success delivery is the hot path and is not logged. - notification (the site forward-to-central path): logged ONLY on forward FAILURE (buffered after the immediate forward threw) and on park, per the Component-SiteEventLogging spec — routine enqueue and forward-success are deliberately not logged (central's Notifications table is the audit record). Wired through AddStoreAndForward (resolves ISiteEventLogger optionally from DI); StoreAndForward project now references SiteEventLogging (acyclic: SiteEventLogging references only Commons). Also documents the 'notification' category on the ISiteEventLogger.LogEventAsync eventType param (folds in M1.8 doc fix).	2026-06-15 12:31:04 -04:00

1377 Commits