scadalink-design

Author	SHA1	Message	Date
Joseph Doherty	943c2ced39	feat(ui): Audit KPI tiles on Health dashboard (#23 M7) Adds three KPI tiles to the central Health dashboard for the Audit channel: volume (rows in the last hour), error rate (Failed/Parked/Discarded over total), and backlog (sum of SiteAuditBacklog.PendingCount across all sites). Repo + service: - IAuditLogRepository.GetKpiSnapshotAsync(window, nowUtc) — single aggregate SELECT over the trailing window returning total + error counts; nowUtc is optional for production callers and pinned by integration tests against the shared MSSQL fixture so the global counts are deterministic. - AuditLogQueryService.GetKpiSnapshotAsync() — composes the repo aggregate with a sum of SiteAuditBacklog.PendingCount read from ICentralHealthAggregator. - AuditLogKpiSnapshot record in Commons/Types/. UI: - New AuditKpiTiles Blazor component (Components/Health/) — three Bootstrap card-tiles, click navigates to /audit/log with the matching pre-filter. - Health.razor wires the tiles in alongside the existing Notification Outbox KPIs; LoadAuditKpis() runs on every 10s refresh tick and degrades to em dashes + inline error if the query fails. - AuditLogPage extended to parse ?status= so the error-rate tile drill-in (?status=Failed) auto-loads the grid. Tests: - AuditLogRepositoryTests: GetKpiSnapshotAsync mixed-status + empty-window cases against the MSSQL migration fixture. - AuditLogQueryServiceTests: forwarding + backlog composition; sites with null SiteAuditBacklog contribute zero. - AuditKpiTilesTests: 9 bUnit tests covering tile render, error-rate maths with safe zero-events handling, em-dash unavailable path, click-through navigation, and warning/danger border thresholds. - HealthPageTests: new Renders_AuditKpiTiles_WithValues plus IAuditLogQueryService stub registration in the constructor so existing outbox tests still pass. - AuditLogPageScaffoldTests: ?status=Failed auto-load + unknown status drop.	2026-05-20 20:43:57 -04:00
Joseph Doherty	e93f655ce4	feat(health): SiteAuditBacklog metric (count + age + bytes) (#23 M6)	2026-05-20 19:02:01 -04:00
Joseph Doherty	75b060e0a8	feat(auditlog): AuditLogPartitionMaintenanceService monthly roll-forward (#23 M6)	2026-05-20 18:51:43 -04:00
Joseph Doherty	660fdc4e93	feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6) Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition purge. On a configurable timer (default 24 hours) the actor: 1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for monthly boundaries whose latest OccurredAtUtc is older than DateTime.UtcNow - AuditLogOptions.RetentionDays. 2. For each eligible boundary calls SwitchOutPartitionAsync, which runs the drop-and-rebuild dance around UX_AuditLog_EventId. 3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on the actor-system EventStream so the Bundle E central health collector and ops surfaces can subscribe without coupling to this actor. Co-changes: * SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the switch via COUNT_BIG over the per-partition filter so the count reflects what the switch removed, not a post-purge scan of a table that no longer exists. All stub implementations updated. * AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for tests, Interval property resolving either. * AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs. Behavior: * Continue-on-error per boundary — one partition that throws does NOT abandon the rest of the tick. * DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor. * SupervisorStrategy Resume keeps the singleton alive across leaked exceptions. * EventStream capture BEFORE the first await — Context is unsafe after await in async receive handlers (same pattern as Sender-capture in AuditLogIngestActor.OnIngestAsync). Tests: * Tick_Fires_OnDailyInterval — visible timer side effect. * Tick_OldPartitions_SwitchedOut — both seeded boundaries purged. * Tick_NewerPartitions_Untouched — empty enumerator → no switches. 
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries RowsDeleted and DurationMs. * Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error. * Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window computed from UtcNow - RetentionDays. * EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit + MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged, Apr-2026 row kept, AuditLogPurgedEvent observed via probe.	2026-05-20 18:36:31 -04:00
Joseph Doherty	6069a20e0f	fix(configdb): replace SwitchOutPartitionAsync stub with drop-and-rebuild dance (#23 M6) Replaces M1's NotSupportedException stub with the production DROP-INDEX → CREATE-staging → SWITCH PARTITION → DROP-staging → CREATE-INDEX dance documented in alog.md §4. UX_AuditLog_EventId is intentionally non-aligned with ps_AuditLog_Month so single-column EventId uniqueness can be enforced cheaply for InsertIfNotExistsAsync; SQL Server rejects ALTER TABLE SWITCH while a non-aligned unique index is present, so the implementation drops it, switches the partition data into a GUID-suffixed staging table on [PRIMARY], drops staging (discarding the rows), and rebuilds the unique index, all inside an explicit transaction with a CATCH that guarantees the unique index is rebuilt regardless of failure point. Also adds GetPartitionBoundariesOlderThanAsync to IAuditLogRepository: a CROSS APPLY over sys.partition_range_values + per-partition MAX(OccurredAtUtc) to enumerate retention-eligible months for the M6 purge actor (next commit). Tests verify: * Old partition's rows are removed; other months untouched * UX_AuditLog_EventId is rebuilt after a successful switch * InsertIfNotExistsAsync's first-write-wins idempotency still holds after switch * On engineered SWITCH failure (inbound FK from a probe table), SqlException propagates AND UX_AuditLog_EventId is still present (CATCH branch ran) * GetPartitionBoundariesOlderThanAsync returns only boundaries whose partition's MAX(OccurredAtUtc) is strictly older than the threshold; empty partitions excluded	2026-05-20 18:20:55 -04:00
Joseph Doherty	640fd07454	feat(comms): site-side PullAuditEvents handler (#23 M6)	2026-05-20 17:58:43 -04:00
Joseph Doherty	23c0fd417e	feat(health): AuditRedactionFailure counter + bridge (#23 M5) Bundle C task M5-T7 — surface DefaultAuditPayloadFilter redactor over-redactions as a Site Health metric so a misconfigured / catastrophic regex shows up on /monitoring/health rather than disappearing into a NoOp sink. - SiteHealthReport: new 'AuditRedactionFailure' int field (defaulted to 0 for back-compat with existing producers/tests). - ISiteHealthCollector / SiteHealthCollector: new IncrementAuditRedactionFailure() — per-interval atomic counter with Interlocked, reset on CollectReport, mirroring the M2 Bundle G SiteAuditWriteFailures pattern. - HealthMetricsAuditRedactionFailureCounter: new bridge in ScadaLink.AuditLog.Site that forwards IAuditRedactionFailureCounter increments to ISiteHealthCollector — mirrors HealthMetricsAuditWriteFailureCounter one-for-one. - AddAuditLogHealthMetricsBridge: now ALSO Replaces the NoOpAuditRedactionFailureCounter binding with the health-metrics bridge, so a single AddAuditLogHealthMetricsBridge() call wires both the M2 Bundle G write-failure counter and the M5 Bundle C redaction-failure counter into the health report. Site-side only for M5 — the filter also runs on CentralAuditWriter and AuditLogIngestActor (where it just keeps the NoOp default), but a central-side health-metric surface for AuditRedactionFailure is deferred to M6 alongside the rest of the central health collector work. Tests: - AuditRedactionFailureMetricTests (HealthMonitoring) covers the SiteHealthCollector increment/report/reset shape (3 tests). - HealthMetricsAuditRedactionFailureCounterTests (AuditLog) covers the AuditLog → HealthMonitoring bridge (3 tests). - Existing CountCapturingHealthCollector stub in DeploymentManagerRedeployTests extended with the new no-op interface method. 
Verified: dotnet build clean, all 24 test projects green (the only Failed at first ScadaLink.SiteRuntime.Tests run was the known-flaky InstanceActorChildAttributeRaceTests; passes on re-run in isolation and full suite, unrelated to these changes).	2026-05-20 17:28:33 -04:00
Joseph Doherty	63eb1f4225	feat(snf): per-attempt and terminal cached-call lifecycle observer (#23 M3) Hook the store-and-forward retry loop so the audit pipeline can emit per-attempt + terminal telemetry under the original TrackedOperationId (Bundle E Tasks E4 + E5). New seam: * ICachedCallLifecycleObserver + CachedCallAttemptContext in Commons.Interfaces.Services. Outcome enum (Delivered / TransientFailure / PermanentFailure / ParkedMaxRetries) is S&F-vocabulary; the bridge living in ScadaLink.AuditLog (Bundle F) will map it to the AuditKind/AuditStatus pair when building the CachedCallTelemetry packet. * StoreAndForwardService gains an optional cachedCallObserver constructor parameter + siteId. RetryMessageAsync fires the observer exactly once per attempt with the appropriate outcome: - handler returns true -> Delivered - handler returns false -> PermanentFailure (and parks) - handler throws + retries remaining -> TransientFailure - handler throws + max retries hit -> ParkedMaxRetries (and parks) Hook is best-effort: a thrown observer is logged + swallowed so a failing audit pipeline can never be misclassified as a transient delivery failure or corrupt the retry-count bookkeeping (alog.md §7). Only cached-call categories (ExternalSystem, CachedDbWrite) generate notifications — Notification category has its own central-side audit pipeline (Notification Outbox / #21). Pre-M3 callers that didn't thread a TrackedOperationId into the S&F message id are silently skipped — the observer requires a parseable id by contract. New S&F callers stamp the id as messageId (Bundle E3). Bundle E tasks E4 + E5.	2026-05-20 14:52:34 -04:00
Joseph Doherty	42430dd10a	feat(siteruntime): ExternalSystem.CachedCall emits CachedSubmit telemetry (#23 M3) Rework ScriptRuntimeContext.ExternalSystem.CachedCall to fit the M3 combined-telemetry model: * Mints a fresh TrackedOperationId and emits one CachedSubmit packet via ICachedCallTelemetryForwarder BEFORE handing the call off — the SiteCalls row is materialised before the first delivery attempt so Tracking.Status(id) can observe a Submitted row even if immediate delivery resolves before the helper returns. * Threads the TrackedOperationId into IExternalSystemClient.CachedCallAsync as a new optional parameter (and into IDatabaseGateway.CachedWriteAsync for the Database mirror set up here for E6). The gateway uses the id as the StoreAndForward messageId so the retry loop (Tasks E4/E5) can recover it from StoreAndForwardMessage.Id. * Returns the TrackedOperationId rather than ExternalCallResult — the script's contract is now "get a tracking handle, observe outcome via Tracking.Status". Best-effort emission: a thrown forwarder is logged + swallowed; the original call still runs and the id is still returned. DatabaseHelper gets the matching siteId / sourceScript / forwarder fields and a parallel CachedSubmit emitter (Channel=DbOutbound) so Task E6's Database.CachedWrite mirror plugs in without further runtime wiring. New ICachedCallTelemetryForwarder seam in Commons.Interfaces.Services so SiteRuntime depends on Commons (existing arrow) rather than ScadaLink.AuditLog (would have introduced a new dependency). Bundle E task E3 (and helper-shape work for E6).	2026-05-20 14:48:05 -04:00
Joseph Doherty	0a97fff906	feat(auditlog): combined telemetry dual-write transaction (#23 M3)	2026-05-20 14:33:14 -04:00
Joseph Doherty	de110f8b42	feat(scaudit): SiteCallAuditActor minimum surface (#22 , #23 M3) Bundle C of Audit Log #23 M3. Adds the ScadaLink.SiteCallAudit project + matching tests project, mirroring the ScadaLink.AuditLog scaffolding pattern (net10.0, central package management, InternalsVisibleTo to the tests assembly). SiteCallAuditActor is the central singleton entry point for Site Call Audit (#22): it receives UpsertSiteCallCommand and persists the SiteCall via ISiteCallAuditRepository.UpsertAsync (monotonic, idempotent — out-of-order or duplicate updates are silent no-ops at the repo). Audit-write failures NEVER abort the user-facing action (CLAUDE.md): repository throws are caught + logged, the actor replies Accepted=false, and the singleton stays alive (Resume supervisor strategy as defence in depth). Two constructors mirror AuditLogIngestActor: - IServiceProvider production constructor resolves the scoped EF repository from a fresh DI scope per message. - ISiteCallAuditRepository test constructor injects a concrete repository so the TestKit tests exercise the real monotonic-upsert SQL end to end. UpsertSiteCallCommand + UpsertSiteCallReply live in ScadaLink.Commons (same home as IngestAuditEventsCommand) so Bundle D's gRPC server can construct them without taking a project reference on the actor's host project. AddSiteCallAudit() is a placeholder for symmetry with AddAuditLog / AddNotificationOutbox; Bundle F will populate it with the actor's Props factory + options bindings. Tests (Akka.TestKit.Xunit2 + MsSqlMigrationFixture via project ref to ScadaLink.ConfigurationDatabase.Tests, mirroring Bundle D2): - Receive_UpsertSiteCallCommand_Persists_Replies_Accepted - Receive_DuplicateUpsert_OlderStatus_NoOp_StillRepliesAccepted (idempotency) - Receive_RepoThrowsTransient_RepliesAccepted_False_ActorStaysAlive Reconciliation, KPIs, and the central->site Retry/Discard relay are deferred per CLAUDE.md scope discipline. 
ScadaLink.slnx updated to include both new projects. All 3 new tests pass against the running infra/mssql container; full suite (2683 tests across 27 projects) passes with no regressions.	2026-05-20 14:18:49 -04:00
Joseph Doherty	bedfa6b8f3	feat(configdb): ISiteCallAuditRepository + EF impl, monotonic upsert (#22 , #23 M3) Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls table introduced in B1+B2. UpsertAsync is insert-if-not-exists then monotonic-status update so out-of-order telemetry, duplicate gRPC packets, and reconciliation pulls all converge on the same row without rolling state backward. - src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs: UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync. - src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs + SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range) and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC), mirrored on M1's AuditLog* equivalents. - src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs: raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 — terminal statuses are mutually exclusive so e.g. Delivered cannot overwrite Parked). Duplicate-key violations (SQL 2601/2627) are swallowed at Debug, identical to AuditLogRepository's race-fix. QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate string.Compare against the value-converted TrackedOperationId column inside an expression tree. - ServiceCollectionExtensions wires the repository (scoped, after IAuditLogRepository). - 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/ Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert, monotonic advance, older-status no-op, same-status no-op, terminal-over-terminal no-op, 50-way concurrent-insert race produces exactly one row, Get known/unknown, filter by site, keyset paging no overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.	2026-05-20 14:10:24 -04:00
Joseph Doherty	3162286ade	feat(configdb): map SiteCall to SiteCalls table (#22 , #23 M3) Bundle B1 of Audit Log #23 M3: introduces the SiteCall entity + EF mapping for the central SiteCalls operational-state table. One row per TrackedOperationId, mirrored from sites via best-effort telemetry then periodic reconciliation; eventually-consistent mirror, not a dispatcher. - src/ScadaLink.Commons/Entities/Audit/SiteCall.cs: append-once record with required TrackedOperationId/Channel/Target/SourceSite/Status, monotonic status update at the repo layer. - src/ScadaLink.ConfigurationDatabase/Configurations/SiteCallEntityTypeConfiguration.cs: table SiteCalls, PK on TrackedOperationId (stored as varchar(36) via value conversion through the canonical 'D'-format GUID string — matches the wire shape used by gRPC + SQLite columns), two named indexes (IX_SiteCalls_Source_Created, IX_SiteCalls_Status_Updated). - ScadaLinkDbContext: DbSet<SiteCall> SiteCalls in the existing Audit section, after AuditLogs. - Tests in tests/ScadaLink.ConfigurationDatabase.Tests/Configurations/: table name, PK, value-conversion shape, index presence + ordering.	2026-05-20 14:04:17 -04:00
Joseph Doherty	e416b21dad	feat(commons): CachedCallTelemetry combined operational+audit packet (#23 M3)	2026-05-20 13:58:57 -04:00
Joseph Doherty	b86d7c61ab	feat(siteruntime): OperationTrackingStore site-local SQLite (#23 M3)	2026-05-20 13:51:09 -04:00
Joseph Doherty	1c38dd540f	feat(commons): TrackedOperationId strong type (#23 M3)	2026-05-20 13:47:40 -04:00
Joseph Doherty	dd3351da93	feat(health): SiteAuditWriteFailures counter + AuditLog bridge (#23 ) Bundle G of Audit Log #23 M2. Bridges the FallbackAuditWriter primary- failure counter into the Site Health Monitoring report payload so a sustained audit-write outage surfaces on /monitoring/health instead of disappearing into a NoOp sink. - SiteHealthReport: add SiteAuditWriteFailures (defaulted, additive). - ISiteHealthCollector + SiteHealthCollector: new IncrementSiteAuditWriteFailures() counter, per-interval reset semantics matching ScriptErrorCount / DeadLetterCount. - HealthMetricsAuditWriteFailureCounter: adapter forwarding IAuditWriteFailureCounter.Increment() to the collector. - AddAuditLogHealthMetricsBridge(): swaps the NoOp default registration for the real bridge; called from SiteServiceRegistration after AddSiteHealthMonitoring + AddAuditLog. - Existing host-wiring test updated: site composition now resolves HealthMetricsAuditWriteFailureCounter (not NoOp). Tests: HealthMonitoring 60 -> 63 (3 new), AuditLog 56 -> 59 (3 new), full solution green.	2026-05-20 13:22:25 -04:00
Joseph Doherty	87cae88f92	feat(auditlog): AuditLogIngestActor + gRPC handler (#23 )	2026-05-20 12:48:26 -04:00
Joseph Doherty	db32a149d3	feat(commons): add IAuditLogRepository + AuditLogQueryFilter + AuditLogPaging (#23 ) Append-only data-access surface for the central AuditLog table — three methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync (filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until M6 lands the non-aligned-index drop/rebuild dance for the partition switch). No Update, no row-delete; bulk purge is partition-only. Bundle D of the Audit Log #23 M1 Foundation plan.	2026-05-20 11:04:59 -04:00
Joseph Doherty	08743bc42d	feat(commons): add audit telemetry + pull message DTOs (#23 )	2026-05-20 09:57:39 -04:00
Joseph Doherty	8ac5ebe97e	feat(commons): add IAuditWriter and ICentralAuditWriter (#23 )	2026-05-20 09:56:49 -04:00
Joseph Doherty	e41a18ba7d	feat(commons): add AuditEvent record (#23 )	2026-05-20 09:56:11 -04:00
Joseph Doherty	f80eea375c	feat(commons): add Audit{Channel,Kind,Status,ForwardState} enums for #23	2026-05-20 09:55:13 -04:00
Joseph Doherty	adcab9dcfc	feat(notification-outbox): per-site KPI request/response message contracts	2026-05-19 05:33:37 -04:00
Joseph Doherty	67b86aa683	feat(notification-outbox): per-site KPI snapshot type + repository contract	2026-05-19 05:22:45 -04:00
Joseph Doherty	c8b5871782	fix(notification-outbox): re-align Central UI sandbox Notify API with production The script-analysis sandbox Notify surface was stale after the Notification Outbox change: SandboxNotifyTarget.Send returned Task<NotificationResult> and there was no Status method, while production NotifyTarget.Send returns Task<string> (a NotificationId) plus NotifyHelper.Status. A script that test-ran cleanly in the sandbox would not compile against the real site runtime. - Move the NotificationDeliveryStatus record from ScadaLink.SiteRuntime.Scripts into ScadaLink.Commons.Messages.Notification so both production and the CentralUI sandbox reference the exact same type (CentralUI does not, and should not, reference SiteRuntime). Production NotifyHelper.Status is otherwise untouched. - Rewrite SandboxNotifyHelper/SandboxNotifyTarget to be a signature-faithful no-op fake: Send returns Task<string> (a fake NotificationId), Status returns Task<NotificationDeliveryStatus>. Production now enqueues into the site S&F engine, which has no central-side equivalent in the sandbox, so the fake no longer carries an INotificationDeliveryService. - Add script-analysis tests proving a script using the new Notify shape both diagnoses clean and runs in the sandbox.	2026-05-19 03:44:34 -04:00
Joseph Doherty	77a05a8960	fix(notification-outbox): give KPI response a failure shape; log status-query faults	2026-05-19 01:55:46 -04:00
Joseph Doherty	c547f82957	feat(notification-outbox): add notification message and outbox query contracts	2026-05-19 01:13:36 -04:00
Joseph Doherty	07cd185368	refactor(notification-outbox): align outbox repository with cancellationToken convention	2026-05-19 01:05:52 -04:00
Joseph Doherty	2c59d59b61	feat(notification-outbox): add NotificationOutbox repository	2026-05-19 01:02:06 -04:00
Joseph Doherty	87ac9b8a4d	feat(notification-outbox): add Type field to NotificationList	2026-05-19 00:52:23 -04:00
Joseph Doherty	397a62677f	feat(notification-outbox): add Notification entity	2026-05-19 00:48:48 -04:00
Joseph Doherty	f9b942bb94	feat(notification-outbox): add NotificationType and NotificationStatus enums	2026-05-19 00:45:05 -04:00
Joseph Doherty	7da303d7bb	fix(configuration-database): resolve ConfigurationDatabase-012 — store inbound-API keys as HMAC-SHA256 hashes Inbound-API bearer credentials are no longer persisted in plaintext. ApiKey now holds a KeyHash (peppered HMAC-SHA256); the key is shown once at creation and only its hash is stored. Lookup and validation hash the presented candidate. Cross-module: Commons (ApiKey, ApiKeyHasher), ConfigurationDatabase (mapping + HashApiKeyValue migration), InboundAPI (ApiKeyValidator), ManagementService (key creation), CentralUI (ApiKeys.razor). Existing keys must be re-issued.	2026-05-17 05:42:52 -04:00
Joseph Doherty	a78c3bcb6f	fix(commons): resolve Commons-013,014 — integral JSON index handling, distinguish Malformed vs Legacy OPC UA config	2026-05-17 03:18:17 -04:00
Joseph Doherty	8dd74121c3	fix(inbound-api): resolve InboundAPI-012 — move ParameterDefinition POCO to ScadaLink.Commons (Types/InboundApi)	2026-05-17 00:04:56 -04:00
Joseph Doherty	a55502254e	fix(external-system-gateway): resolve ExternalSystemGateway-011 — name-keyed repository lookups replace fetch-all-then-filter on the call hot path	2026-05-17 00:02:45 -04:00
Joseph Doherty	b1f4251d75	fix(commons): resolve Commons-008 — replace ValueTuple in SetConnectionBindingsCommand with named ConnectionBinding record (CLI, ManagementService, TemplateEngine, CentralUI)	2026-05-16 23:54:31 -04:00
Joseph Doherty	c07f524ca4	fix(commons): resolve Commons-005..007,009..012 — OPC UA parse status, TryConvert correctness, Result null guard, invariant formatting, doc refresh	2026-05-16 22:04:21 -04:00
Joseph Doherty	3e7a3d7e31	fix(commons): resolve Commons-001..004 — stale-fire race, JsonDocument lifetime, GetNullable strictness, registry symmetry	2026-05-16 20:58:03 -04:00
Joseph Doherty	305b42ea6d	feat(template-engine): resolve TemplateEngine-002 — per-slot alarm override for derived templates Adds IsInherited/LockedInDerived to the TemplateAlarm entity (mirroring the attribute/script override model), an EF migration, base-alarm copy-on-derive, inherited-alarm flattening skip, and LockedInDerived override-rejection validation.	2026-05-16 20:12:24 -04:00
Joseph Doherty	bc548e1447	feat(deployment-manager): resolve DeploymentManager-006 — query site deployment state before redeploy and reconcile Adds DeploymentStateQuery request/response contracts (Commons), a site-side handler (SiteRuntime), a CommunicationService query method (Communication), and reconciliation in DeploymentService: when a prior record is InProgress or Failed-on-timeout, query the site; if it already holds the target revision hash mark the record Success without re-sending; on query failure fall through to a normal deploy (site-side stale-rejection is the safety net).	2026-05-16 20:12:24 -04:00
Joseph Doherty	199cdbe798	feat(triggers): add Expression to the script & alarm trigger codecs	2026-05-16 05:27:33 -04:00
Joseph Doherty	295150751f	feat(scripts): realign Test Run with runtime API, add anonymous-object calls and instance binding The Test Run sandbox and Monaco analysis modelled a script API that had drifted from the site runtime's ScriptGlobals, so real scripts failed to compile in Test Run. Realign both to the runtime surface (Instance/Scripts/ExternalSystem/Attributes/Children/Parent) and drop the duplicate ScriptHost stub so the two cannot diverge again. - Script calls (Scripts.CallShared, Instance.CallScript, Route.To().Call) accept an anonymous object instead of a hand-built dictionary, via a shared ScriptArgs normalizer; existing dictionary calls still compile. - Test Run can optionally bind to a deployed instance, so Instance/ Attributes/CallScript route to it cross-site; adds site-side RouteToGetAttributes/RouteToSetAttributes handlers. - Adds Test Run panels to the API method and template script editors. - Fixes the TestDatabaseQuery seed script, which queried a table that never existed. Also commits unrelated in-progress work already in the tree: the health monitoring report loop, site streaming changes, and the Admin/Design data-connection and SMTP page reorganization.	2026-05-16 03:37:56 -04:00
Joseph Doherty	1d5465f31c	fix(deployment): instance delete fully removes the record Deleting an instance only undeployed it from the site and set the state to NotDeployed, leaving an orphan record that could never be removed — the state-transition matrix rejected delete from NotDeployed. Delete now removes the instance record entirely (deployment history, snapshot, attribute/alarm overrides, and connection bindings go with it), and is permitted from any state.	2026-05-15 12:05:13 -04:00
Joseph Doherty	7bba48a14a	feat(ui/monitoring): redesign Parked Messages page with filters, drawer, and bulk actions Triage was painful on the old layout: a lone Site dropdown sat on a sparse row, errors were truncated mid-sentence with a per-row View/Hide toggle that on expand pushed an unwrapped <pre> through the table and shoved the Actions column off-screen, all rows looked the same regardless of age or attempt count, and OriginInstance — which tells you which instance produced the failure — wasn't displayed at all even though the data was on the entity. This pass: - Adds a real filter bar: Site, Category, Target system, Origin instance, Age window, free-text search. Category/Target/Origin/Age/Search filter the loaded page client-side; Site still drives the server query (and changing site now auto-queries — one fewer click). - Replaces the in-table expansion with an Offcanvas detail drawer. Clicking a row slides in a side panel with full message ID + copy, category label, origin, attempts, both timestamps in relative + absolute form, the complete error (pre-wrap, scrollable), and big Retry / Discard buttons. The table never overflows. - Stacks Target + Method into one column (target in semibold, method small/muted below) and surfaces Origin as a code-styled chip in a new column ("—" muted when null). - Severity left-border on each row, derived client-side from AttemptCount/MaxAttempts and age of the last attempt: red when retries are exhausted and last attempt was in the past hour, amber when exhausted but stale, muted grey otherwise. - Mini attempt progress bar under the n/max count, red when fully exhausted and amber while partial. - Relative timestamps ("5m ago", "1h ago", "2d ago") with absolute UTC on hover via the title attribute — applies in both the table and the drawer. - Bulk select: header checkbox selects the filtered set, per-row checkboxes. 
When ≥1 selected, a sticky action strip slides in below the filter bar offering Retry selected / Discard selected with the usual confirm dialog. Toast reports per-item success/failure counts. - Summary line next to the title: "N parked · K target systems · oldest Xh ago" (and "(showing M of N)" when filters are active). - ParkedMessageEntry contract extended additively with MaxAttempts, Category, and OriginInstance so the UI has the data it needs for severity, the category filter, and the new column. - Bumped page size from 25 to 50 to better match the dense layout.	2026-05-13 08:05:22 -04:00
Joseph Doherty	6f1f6b8467	fix(health): replicate site health reports between central nodes CentralHealthAggregator is a per-node hosted singleton, but site health reports flow through ClusterClient which round-robins each report to one central node only. The other node's aggregator never saw those reports and marked sites offline at the 60s threshold — sites constantly flapped between online and offline on the monitoring page. On receive, the active CentralCommunicationActor now republishes a SiteHealthReportReplica wrapper on a DistributedPubSub topic. Both central nodes subscribe to the topic and process replicas through a dedicated path that updates the local aggregator without re-broadcasting (avoids fan-out loops). The aggregator's existing sequence-number idempotency makes self-delivery a cheap no-op. DistributedPubSubExtensionProvider is now listed in the HOCON `akka.extensions` block so the mediator is initialised at cluster start, eliminating a race where the first Subscribe arrived before the extension was loaded.	2026-05-13 06:20:07 -04:00
Joseph Doherty	751248feb6	feat(alarms): HiLo trigger type with per-band level, hysteresis, messages, overrides Adds a new HiLo alarm trigger type with four configurable setpoints (LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority, deadband (for hysteresis), and operator message. The site runtime emits AlarmStateChanged with an AlarmLevel field so consumers can differentiate warning vs critical bands. Plumbing: - new AlarmLevel enum + AlarmStateChanged.Level/Message init properties - AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting - AlarmTriggerConfigCodec extracted from the editor for testability - sitestream.proto carries level + message over gRPC - SemanticValidator enforces numeric attribute, setpoint ordering, non-negative deadband - on-trigger scripts get an Alarm global (Name/Level/Priority/Message) so notification routing can branch by severity - per-instance InstanceAlarmOverride entity + EF migration + flattening step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary types whole-replace - DebugView shows a Level badge + per-band message tooltip - App.razor auto-reloads on permanent Blazor circuit failure - docker/regen-proto.sh automates the proto regen workflow (the linux/arm64 protoc segfault means generated files are checked in for now)	2026-05-13 03:23:32 -04:00
Joseph Doherty	5615f3d0c7	feat(templates): phase 1 — derived-template schema (additive) Phase 1 of the design at docs/plans/2026-05-12-derive-on-compose-design.md. Additive schema only — no behavior changes. Existing data and code paths continue to work; subsequent phases will start writing the new fields. Template gains: IsDerived true when this row was auto-created to back a composition slot OwnerCompositionId back-ref to the owning TemplateComposition (plain int, not an EF nav property — managed by TemplateService for cascade-delete) TemplateAttribute / TemplateScript each gain: IsInherited row copied from base and not yet overridden; changes to the base flow downward LockedInDerived on a base, blocks derived from overriding; enforced at the service layer in later phases EF Core migration AddDerivedTemplateFields adds four columns: Templates.IsDerived bit NOT NULL DEFAULT 0 Templates.OwnerCompositionId int NULL TemplateAttributes.IsInherited bit NOT NULL DEFAULT 0 TemplateAttributes.LockedInDerived bit NOT NULL DEFAULT 0 TemplateScripts.IsInherited bit NOT NULL DEFAULT 0 TemplateScripts.LockedInDerived bit NOT NULL DEFAULT 0 Existing rows get the defaults. Tests across SiteRuntime / TemplateEngine / CentralUI suites stay green (129 / 199 / 159). Next: phase 2 — wire AddCompositionAsync to derive on compose for new compositions. Old data still flows the direct-reference path until phase 3's migration script.	2026-05-12 08:16:24 -04:00
Joseph Doherty	0139c9ca83	refactor(scripts): scoped parent query + parent picker for multi-parent templates Two caveats from the script-scope rollout addressed: 1. ITemplateEngineRepository.GetTemplatesComposingAsync — a scoped query that returns only the templates referencing a given template via Compositions, eager-loaded with their Attributes / Scripts / Compositions. Replaces the GetAllTemplatesAsync + filter pattern in TemplateEdit so the Monaco metadata fetch doesn't pull the entire template catalog to find one parent. 2. Multi-parent picker. The previous implementation suppressed Parent assistance entirely when more than one template composes the open one. Now TemplateEdit collects every parent into _editorParents and renders a small `select` above the script editor when there are >1, letting the user choose which parent's metadata drives Parent.Attributes / Parent.CallScript completion + diagnostics. Single-parent templates skip the picker (no UI change). Zero parents (root template) hide the picker and surface no Parent assistance. Browser-verified on the Sensor Module template (composed by both Pump and Variable Speed Motor): picker shows both options, switching updates the editor's parent metadata immediately via the existing GetContext callback. Test counts unchanged (159 / 199); the new repo method is exercised end-to-end by the parent-picker browser path.	2026-05-12 06:00:02 -04:00


97 Commits