1338 Commits

Author SHA1 Message Date
Joseph Doherty d05270640d fix(db): classify transient vs permanent SQL errors in Database.CachedWrite (#7)
CachedWrite buffered ALL write failures and retried forever, never returning a
synchronous failure to the script — permanent SQL errors (constraint/syntax/
permission) were treated as transient. Mirror the External-System API path:
attempt immediately, return Failed synchronously on permanent SQL errors (no
buffering), buffer only transient errors; the S&F retry path parks permanent
failures instead of retrying forever. New SqlErrorClassifier + PermanentDatabaseException.
2026-06-15 13:53:15 -04:00
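The transient-versus-permanent split described in commit d05270640d can be sketched as follows. This is a minimal illustration in Python, not the repository's C# code: the `SqlError` type, the `cached_write` signature, the result strings, and the specific error-number table are all assumptions chosen for the example (the numbers are common SQL Server codes, but the real SqlErrorClassifier's rules are not shown in the commit message).

```python
class SqlError(Exception):
    """Hypothetical stand-in for a database exception carrying an error number."""

    def __init__(self, number: int):
        super().__init__(f"SQL error {number}")
        self.number = number


# Assumed examples of permanent SQL Server errors (constraint/syntax/permission).
PERMANENT_ERROR_NUMBERS = {
    102,   # incorrect syntax
    229,   # permission denied on object
    547,   # FOREIGN KEY / CHECK constraint conflict
    2601,  # duplicate key in unique index
    2627,  # PRIMARY KEY / UNIQUE constraint violation
}


def is_permanent(error_number: int) -> bool:
    """Anything not known to be permanent (timeouts, deadlocks, ...) counts as transient."""
    return error_number in PERMANENT_ERROR_NUMBERS


def cached_write(execute, buffer: list, statement: str) -> str:
    """Attempt immediately; fail synchronously on permanent errors, buffer transient ones."""
    try:
        execute(statement)
        return "Delivered"
    except SqlError as exc:
        if is_permanent(exc.number):
            return "Failed"        # returned to the caller, never buffered
        buffer.append(statement)   # retried later by the store-and-forward path
        return "Buffered"
```

Defaulting unknown errors to transient is itself a design assumption of this sketch; a stricter classifier could instead park anything it does not recognise.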
Joseph Doherty 198770f578 fix(deploy): address M2.2 review nits — backup endpoint in diff summary + null-oldConfig test (#10)
- FormatConnection now includes BackupConfigurationJson so a backup-only change
  no longer renders identical Before/After cells (covers all 4 ConnectionsEqual fields)
- add ComputeConnectionsDiff(null, newConfig) first-deploy unit test
2026-06-15 13:41:39 -04:00
Joseph Doherty e9a84ba220 feat(deploy): surface connection-level changes in the deployment diff (#10)
ComputeConnectionsDiff existed with tests but was never called and ConfigurationDiff
had no slot for it, so standalone connection endpoint/protocol/failover drift never
appeared in the deployment diff (only per-attribute binding drift did). Add a
ConnectionChanges slot, wire ComputeConnectionsDiff into ComputeDiff, and render the
connection section in the deployment diff UI.
2026-06-15 13:36:40 -04:00
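As a rough illustration of the connection-level diff that commits e9a84ba220 and 198770f578 describe, the sketch below compares two name-keyed connection maps and treats a missing old configuration as a first deploy. It is written in Python rather than the repository's C#, and the function name, tuple shape, and field names (`endpoint`, `protocol`) are hypothetical.

```python
def compute_connections_diff(old_connections, new_connections):
    """Return (name, kind, before, after) tuples for connections that differ.

    `old_connections` may be None on a first deploy, in which case every
    connection in `new_connections` is reported as "Added".
    """
    old = old_connections or {}
    changes = []
    for name in sorted(old.keys() | new_connections.keys()):
        before = old.get(name)
        after = new_connections.get(name)
        if before == after:
            continue  # all compared fields are equal, so nothing to show
        if before is None:
            kind = "Added"
        elif after is None:
            kind = "Removed"
        else:
            kind = "Modified"
        changes.append((name, kind, before, after))
    return changes
```

Because whole records are compared, a change to any single field (for example only the backup configuration) produces a "Modified" entry whose before and after values differ, which is the property the review fix in 198770f578 is after.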
Joseph Doherty 41d828e38e fix(deploy): address M2.1 review nits — comparer consistency + comments (#22)
- connection-name capable-set comparer kept as StringComparer.Ordinal:
  FlatteningService and SemanticValidator use all-ordinal name-keyed
  dictionaries throughout; OrdinalIgnoreCase would be inconsistent with
  the rest of the binding-resolution path — added comment documenting this
- IsAlarmCapable protocol-match confirmed consistent with DataConnectionFactory
  (both OrdinalIgnoreCase); added case-insensitive InlineData variants
  (OPCUA, opcua, mxgateway, MXGATEWAY) to lock the contract
- clarified FlatteningPipeline comment: "filters connections by alarm-capable
  protocol, then collects their names" (was "maps from the protocol string")
- added DataConnectionLayer/DataConnectionFactory.cs path reference to
  AlarmCapableProtocols sync-risk comment
2026-06-15 13:27:26 -04:00
Joseph Doherty d6909207a8 fix(deploy): wire native-alarm-source capability validation into flattening pipeline (#22)
FlatteningPipeline loaded data connections but never passed the alarm-capable
connection set to SemanticValidator, so the native-alarm-source capability check
(built but inert) never ran — a source bound to a non-alarm-capable connection
deployed silently. Compute the capable set (IAlarmSubscribableConnection: OPC UA
+ MxGateway) and thread it through ValidationService to SemanticValidator.
2026-06-15 13:20:20 -04:00
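The capability check wired in by d6909207a8, together with the comparer choices recorded in 41d828e38e, could look roughly like this. The sketch is Python, not the project's C#; the dictionary shapes and function names are assumptions, and `str.lower()` only approximates OrdinalIgnoreCase for ASCII protocol strings.

```python
# Protocol strings taken from the InlineData variants named in the commit message.
ALARM_CAPABLE_PROTOCOLS = {"opcua", "mxgateway"}


def is_alarm_capable(protocol: str) -> bool:
    """Protocol match is case-insensitive."""
    return protocol.lower() in ALARM_CAPABLE_PROTOCOLS


def alarm_capable_connection_names(connections) -> set:
    """Filter connections by alarm-capable protocol, then collect their names."""
    return {c["name"] for c in connections if is_alarm_capable(c["protocol"])}


def validate_native_alarm_sources(sources, capable_names) -> list:
    """Connection names are compared case-sensitively (ordinal), unlike protocols."""
    return [
        f"{s['source']}: connection '{s['connection']}' is not alarm-capable"
        for s in sources
        if s["connection"] not in capable_names
    ]
```

Keeping the name lookup case-sensitive while the protocol match is case-insensitive mirrors the split the commit message documents; mixing the two would let `plc1` and `Plc1` resolve differently in different stages.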
Joseph Doherty 2fb608f1b5 fix(configdb): resync EF model snapshot to clear PendingModelChangesWarning (#32)
The actual drift was NOT OccurredAtUtc's converter (a same-CLR-type
DateTime->DateTime ValueConverter emits no snapshot annotation and never
triggers PendingModelChangesWarning). The real pending change was a HasData
seed row: SecurityConfiguration adds LdapGroupMapping Id=5 (SCADA-Viewers ->
Viewer) but the model snapshot omitted it, so MsSqlMigrationFixture's
MigrateAsync threw PendingModelChangesWarning and failed every fixture-backed
AuditLog MSSQL test (~57).

Generated via `dotnet ef migrations add`; Up/Down are seed-data DML only
(InsertData/DeleteData of the single reference row) -- no schema DDL. The
snapshot now carries the Id=5 seed and has-pending-model-changes is clean.
2026-06-15 13:13:22 -04:00
Joseph Doherty 28bc639786 docs(plan): M2 implementation plan — Tier-2 correctness/behavioral gaps
19 tasks (M2.0-M2.19) covering stillpending.md Tier-2 items #7,#8,#9,#10,
#13,#17,#18,#20-#31, plus pre-existing EF model/snapshot drift (#32, lead item).
Risk-first ordering; migration tasks serialized. Scope decisions recorded:
#19 done in M1.8; #16 deferred to M8; #17 reverts Host-008 per design doc;
#8 filter semantics defined; #15 LDAP re-query spike-gated.
2026-06-15 13:08:37 -04:00
Joseph Doherty 3d9f562368 Merge M1: stillpending.md Tier-1 runtime wiring
Closes the Tier-1 silent gaps from the stillpending.md audit (#3-#6):
- AuditLog 365-day purge actor + reconciliation self-heal now actually
  start and run on the central node (were dead code).
- SiteCall reconciliation pull (new PullSiteCalls RPC + plumbing) + daily
  terminal-row purge scheduler.
- Site Event Logging now emits all 5 previously-missing categories
  (alarm, deployment, instance_lifecycle, store_and_forward, notification)
  plus script started/completed events.

14 commits, each implement->review->fix. Build 0/0; cluster verified
healthy with the new singletons starting cleanly (bash docker/deploy.sh).
2026-06-15 12:53:25 -04:00
Joseph Doherty e5534fddca fix(siteeventlog): suppress snapshot-resync alarm re-emit + coverage + hardening (review) 2026-06-15 12:45:00 -04:00
Joseph Doherty e74c3aef23 feat(siteeventlog): emit script started/completed Info events (M1.8)
ScriptExecutionActor previously emitted only an Error 'script' event on failure.
It now also fire-and-forgets an Info 'script' event when execution starts (right
before RunAsync) and when it completes successfully — giving the operational log
the full started/completed/failed lifecycle. Uses the already-resolved
siteEventLogger; fire-and-forget so the event log can never block or fault the
script's own run.

Extends the SingleServiceProvider test helper to also serve IServiceScopeFactory
(returning a self-scope) so ScriptExecutionActor's serviceProvider.CreateScope()
reaches the logging hot path in tests instead of throwing into the catch.
2026-06-15 12:33:31 -04:00
Joseph Doherty d8b5dbb386 feat(siteeventlog): emit store_and_forward + notification events (M1.7)
StoreAndForwardService gains an optional ISiteEventLogger? ctor param (default
null so the many direct-construction tests still compile) and, when wired,
mirrors its own buffer/retry/park activity onto site operational events via the
existing OnActivity hook (which already isolates a throwing subscriber, so a
failing event log can never be misclassified as a transient delivery failure):

- store_and_forward (ExternalSystem / CachedDbWrite): queued/retried/delivered/
  parked. Warning on buffer/retry, Error on park, Info on retry-recovery; an
  immediate-success delivery is the hot path and is not logged.
- notification (the site forward-to-central path): logged ONLY on forward
  FAILURE (buffered after the immediate forward threw) and on park, per the
  Component-SiteEventLogging spec — routine enqueue and forward-success are
  deliberately not logged (central's Notifications table is the audit record).

Wired through AddStoreAndForward (resolves ISiteEventLogger optionally from DI);
StoreAndForward project now references SiteEventLogging (acyclic: SiteEventLogging
references only Commons). Also documents the 'notification' category on the
ISiteEventLogger.LogEventAsync eventType param (folds in M1.8 doc fix).
2026-06-15 12:31:04 -04:00
Joseph Doherty 09b9e8f259 feat(siteeventlog): emit deployment + instance_lifecycle events (M1.6)
DeploymentManagerActor now fire-and-forgets a 'deployment' site operational
event on deploy/enable/disable/delete outcomes (Info on success, Error on
failure), source 'DeploymentManagerActor'. The disable/delete events are emitted
from the existing PipeTo continuations (safe: reads only the immutable
_serviceProvider and fire-and-forgets).

InstanceActor now emits an 'instance_lifecycle' Info event in PreStart (started)
and a new PostStop (stopped) — covering start/stop/enable/disable/redeploy/
failover transitions from the instance's own vantage point. Both actors already
hold _serviceProvider; no ctor change.

Resolution is optional and LogEventAsync is fire-and-forget so a logging failure
never affects the deployment pipeline or instance lifecycle.
2026-06-15 12:26:54 -04:00
Joseph Doherty a00e43c4f9 feat(siteeventlog): emit alarm-category events on alarm transitions (M1.5)
AlarmActor (computed) and NativeAlarmActor (native mirror) now fire-and-forget
an 'alarm' site operational event on every state transition:
- raise/activate: Error (priority/severity >= 700) or Warning
- clear/return-to-normal, ack, inter-band transition: Info

Both actors take a new optional IServiceProvider? ctor param (default null so
existing direct-construction tests still compile); InstanceActor passes its
_serviceProvider at the two Props.Create sites. Resolution is optional and the
LogEventAsync call is fire-and-forget, so a logging failure never affects alarm
evaluation. Rehydration replays are not re-logged.

Adds a capturing FakeSiteEventLogger test helper + SingleServiceProvider.
2026-06-15 12:23:04 -04:00
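The severity mapping stated in commit a00e43c4f9 is small enough to write out. The following Python sketch assumes hypothetical transition names; only the 700 threshold and the Error/Warning/Info levels come from the commit message.

```python
def alarm_event_severity(transition: str, priority: int) -> str:
    """Map an alarm state transition to a site event severity."""
    if transition in ("raise", "activate"):
        # High-priority raises are errors, everything else a warning.
        return "Error" if priority >= 700 else "Warning"
    # clear/return-to-normal, ack, and inter-band transitions are informational.
    return "Info"
```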
Joseph Doherty f49ac51771 fix(sitecallaudit): async DI scope in tick paths + options clamp tests + cursor/retry docs (review) 2026-06-15 12:10:54 -04:00
Joseph Doherty e675b34500 feat(sitecallaudit): daily terminal-row purge scheduler
Add a daily purge tick to SiteCallAuditActor that drops terminal SiteCalls
rows older than the retention window via ISiteCallAuditRepository.PurgeTerminalAsync.
The threshold is computed each tick as UtcNow - RetentionDays so an operator who
lowers RetentionDays sees it on the next purge without a restart. Mirrors
AuditLogPurgeActor's daily cadence + continue-on-error posture: a purge fault is
logged and swallowed so the central singleton stays alive and retries next tick.

The purge timer is started in PreStart alongside the reconciliation timer and
gates on the same collaborators (pull client + enumerator) being available — the
repo-only test ctor injects neither, so neither background timer runs there.

Options: PurgeInterval (default 24h, clamped >= 1 min so a zero config value
can't spin the scheduler) + RetentionDays (default 365), plus a test-only
override that bypasses the clamp for millisecond cadences.

Tests (all in-memory, no live MSSQL): purge tick calls PurgeTerminalAsync with a
UtcNow - RetentionDays threshold (non-default 30 days); default retention yields
a 365-day threshold; a throwing repo does not kill the singleton (a second tick
still arrives).
2026-06-15 12:03:49 -04:00
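The purge behaviour in commit e675b34500 (clamped interval, threshold recomputed every tick, faults swallowed) can be illustrated with a short Python sketch. The function names and callback-style parameters are hypothetical; the 1-minute floor and the UtcNow - RetentionDays formula are taken from the commit message.

```python
from datetime import datetime, timedelta, timezone

MIN_PURGE_INTERVAL = timedelta(minutes=1)


def effective_purge_interval(configured: timedelta) -> timedelta:
    """Clamp the configured interval so a zero value cannot spin the scheduler."""
    return max(configured, MIN_PURGE_INTERVAL)


def purge_threshold(now_utc: datetime, retention_days: int) -> datetime:
    """Rows older than this instant are eligible for purging."""
    return now_utc - timedelta(days=retention_days)


def purge_tick(purge_terminal, read_retention_days, now_utc, log_error) -> int:
    """Run one purge; RetentionDays is re-read on every tick, and failures are logged, not raised."""
    try:
        return purge_terminal(purge_threshold(now_utc, read_retention_days()))
    except Exception as exc:  # continue-on-error: the scheduler must survive a bad tick
        log_error(f"purge failed, will retry next tick: {exc}")
        return 0
```

Reading the retention value inside the tick, rather than capturing it at start-up, is what lets a lowered RetentionDays take effect on the next purge without a restart.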
Joseph Doherty e427b38fb3 feat(sitecallaudit): periodic reconciliation pull back-fills lost telemetry
Add a periodic reconciliation tick to SiteCallAuditActor that, per site,
pulls changed SiteCall rows since a per-site UpdatedAtUtc cursor and upserts
them idempotently (monotonic UpsertAsync) — the documented self-heal for lost
best-effort gRPC telemetry. Mirrors SiteAuditReconciliationActor's structure
(per-site cursor, per-site try/catch failure isolation, advance cursor by max
observed UpdatedAtUtc) minus the stalled-detection EventStream machinery.

Dependency wiring: add an acyclic SiteCallAudit -> AuditLog project reference
and resolve IPullSiteCallsClient + ISiteEnumerator (central-only singletons
registered by AddAuditLogCentralReconciliationClient) from the IServiceProvider
the production ctor already holds — no Host Props.Create change needed. The
repo-only test ctor injects neither collaborator, so the tick is gated off
there. A new public test ctor injects fake client + enumerator + repo so the
tick is unit-testable in-memory (public, not internal: Akka's ActivatorProducer
uses public-only reflection binding).

Options: ReconciliationInterval (default 5 min, clamped >= 1s so a zero config
value can't spin the scheduler) + ReconciliationBatchSize (default 500), plus a
test-only override that bypasses the clamp for millisecond cadences.

Tests (all in-memory, no live MSSQL): absent row is upserted on a tick; second
tick advances the cursor past already-pulled rows; one failing site does not
sink other sites; repo-only ctor does not start the tick.
2026-06-15 12:01:22 -04:00
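A simplified picture of the reconciliation tick from commit e427b38fb3, assuming an in-memory store: per-site cursor, per-site failure isolation, idempotent upsert, and cursor advance to the maximum observed UpdatedAtUtc. This is a Python sketch with hypothetical names and row shapes, and the "strictly newer wins" rule is an assumption about what "monotonic" means here.

```python
from datetime import datetime, timezone

EPOCH = datetime.min.replace(tzinfo=timezone.utc)


def upsert_monotonic(store: dict, row: dict) -> None:
    """Insert the row, or replace an existing one only if the incoming copy is strictly newer."""
    existing = store.get(row["id"])
    if existing is None or row["updated_at_utc"] > existing["updated_at_utc"]:
        store[row["id"]] = row


def reconcile_tick(sites, pull, store: dict, cursors: dict, batch_size: int = 500) -> list:
    """Pull changed rows per site; return the sites that failed this tick."""
    failed = []
    for site in sites:
        try:
            rows = pull(site, cursors.get(site, EPOCH), batch_size)
            for row in rows:
                upsert_monotonic(store, row)
            if rows:
                # Advance the cursor by the newest timestamp actually observed.
                cursors[site] = max(r["updated_at_utc"] for r in rows)
        except Exception:
            failed.append(site)  # one failing site must not sink the others
    return failed
```

Because the upsert is idempotent, re-pulling a row that sits exactly on an inclusive cursor is harmless: it is read again but changes nothing.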
Joseph Doherty 6b0140dd62 fix(sitecallaudit): UpdatedAtUtc index + per-row pull resilience + UTC-convention + first-cycle test (review) 2026-06-15 10:47:25 -04:00
Joseph Doherty 963e3427da feat(sitecallaudit): PullSiteCalls reconciliation plumbing (store read + RPC + site handler + central client)
Site Call Audit (#22): build the documented periodic reconciliation PULL
self-heal path for the eventually-consistent central SiteCalls mirror, as a
dedicated PullSiteCalls gRPC RPC kept separate from the audit pull. This is the
pull PLUMBING only; the central reconciliation tick is a separate follow-up.

- IOperationTrackingStore.ReadChangedSinceAsync(sinceUtc, batchSize): inclusive
  UpdatedAtUtc cursor, oldest-first, batch-capped; SQLite impl projects tracking
  rows onto SiteCallOperational (Kind->Channel, TargetSummary->Target, SourceSite
  left empty - the store has no site-id column).
- sitestream.proto: rpc PullSiteCalls + PullSiteCallsRequest/Response, mirroring
  PullAuditEvents; regenerated checked-in SiteStreamGrpc/*.cs.
- SiteCallDtoMapper.ToDto(SiteCallOperational): inverse of FromDto for the handler.
- SiteStreamGrpcServer.PullSiteCalls handler + SetOperationTrackingStore seam;
  Host wires the seam alongside SetSiteAuditQueue (site roles only).
- Central IPullSiteCallsClient + GrpcPullSiteCallsClient (home: AuditLog/Central to
  reuse ISiteEnumerator; SiteCallAudit does not reference AuditLog). Re-stamps
  SourceSite from the dialed siteId; no-throw on tolerable transport faults;
  SpecifyKind (not ToUniversalTime) cursor handling. Central-only DI registration.

Tests: ReadChangedSinceAsync (4), PullSiteCalls handler (6), GrpcPullSiteCallsClient
(8). Full solution build 0 warnings/0 errors (TreatWarningsAsErrors).
2026-06-15 10:39:06 -04:00
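Two details from commit 963e3427da lend themselves to a short sketch: the inclusive, oldest-first, batch-capped cursor read, and stamping a timestamp as UTC without shifting it. The Python below is only an analogy for the C# code: `replace(tzinfo=...)` plays the role of DateTime.SpecifyKind, whereas a conversion such as `astimezone` on a naive value would assume local time, much as ToUniversalTime does for an unspecified kind. The function names and row shape are hypothetical.

```python
from datetime import datetime, timezone


def read_changed_since(rows, since_utc: datetime, batch_size: int) -> list:
    """Inclusive cursor: rows with updated_at_utc >= since_utc, oldest first, capped."""
    changed = [r for r in rows if r["updated_at_utc"] >= since_utc]
    changed.sort(key=lambda r: r["updated_at_utc"])
    return changed[:batch_size]


def as_utc(value: datetime) -> datetime:
    """Label a naive timestamp as UTC, keeping its wall-clock value unchanged."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)
```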
Joseph Doherty c092e89fd1 fix(audit): robust central options binding + interval clamps + doc/contract fixes (review) 2026-06-15 10:11:49 -04:00
Joseph Doherty 36a08a4145 feat(audit): start purge + reconciliation singletons; production ISiteEnumerator 2026-06-15 10:00:44 -04:00
Joseph Doherty d03c2af9a1 fix(audit): race-safe channel cache + UTC-kind cursor handling in gRPC pull client (review) 2026-06-15 09:49:43 -04:00
Joseph Doherty 2adc5767da feat(audit): production gRPC IPullAuditEventsClient for site reconciliation 2026-06-15 09:41:13 -04:00
Joseph Doherty 9aa1259504 docs(plans): Phase 1 (M1-M4) implementation plan for stillpending.md
Bite-sized TDD plan. M1 (runtime wiring) fully detailed across 10 tasks
after verifying the purge/reconciliation actors already exist and only
need Host wiring + a gRPC pull client + event-logger injection. M2/M3/M4
as right-sized task inventories with files, classification, and AC.
Co-located .tasks.json for executing-plans resume.
2026-06-15 09:32:14 -04:00
Joseph Doherty f4707745bf docs(plans): completion roadmap for stillpending.md audit
Add the system-completion design doc (risk-first milestones M1-M10):
Phase 1 Stabilize (M1 runtime wiring, M2 correctness, M3 script trust
boundary, M4 doc reconciliation) then Phase 2 Expand (M5-M10 feature
epics). Scope = all Tier 1/2/4 + in-scope Tier 3 features; T12/T19
deferred to own brainstorm; deliberate anti-goals excluded. Also commit
the source audit (stillpending.md).
2026-06-15 09:27:00 -04:00
Joseph Doherty 4584b8e483 build: bump MxGateway driver (Client + Contracts) 0.1.0 → 0.1.1
The MxAccess Gateway .NET driver was republished at 0.1.1. Update both
ZB.MOM.WW.MxGateway.Client and ZB.MOM.WW.MxGateway.Contracts package
versions in central package management. Build is clean (0 errors/warnings),
API-compatible — no code changes required. Local docker cluster rebuilt
and redeployed (scadabridge:latest), all 8 nodes + Traefik healthy.
2026-06-15 05:54:03 -04:00
Joseph Doherty 68f911e634 docs: note alarmOverrides in GetInstanceDocumentAsync; mark template-alarm/override plan complete 2026-06-07 10:31:02 -04:00
Joseph Doherty 5bc8dbad31 test(playwright): Notification hygiene — scoped pager locator, next-enabled re-assert, role-mapping-delete doc note 2026-06-07 10:22:54 -04:00
Joseph Doherty d3adf8c2e4 test(playwright): SiteCalls hygiene — site-a seeds where grid-visible, scoped pager locator, next-enabled re-assert 2026-06-07 10:20:33 -04:00
Joseph Doherty f78086334f test(playwright): InstanceConfigure alarm-override set-priority/clear round-trip; drop stale TODO 2026-06-07 10:17:27 -04:00
Joseph Doherty c3d7d8a6a4 test(playwright): provision a HiLo alarm in InstanceConfigureFixture (via typed CLI flags) 2026-06-07 10:13:45 -04:00
Joseph Doherty bc8960779b feat(ui): add data-test hooks to InstanceConfigure alarm-override section 2026-06-07 10:10:50 -04:00
Joseph Doherty c84eb5aeef docs(cli): note intentional omission of HiLo per-setpoint priorities/deadbands/messages (review fix) 2026-06-07 10:06:58 -04:00
Joseph Doherty f0b144ebda test(playwright): CliRunner AddAlarm + alarm-override-delete helpers + round-trip (typed flags) 2026-06-07 10:05:32 -04:00
Joseph Doherty bbc3804d07 feat(cli): typed setpoint flags for template alarm add (serializes trigger-config JSON) 2026-06-07 10:02:51 -04:00
Joseph Doherty 9d7e69056a docs(plans): add template-alarm CLI + alarm-override coverage implementation plan 2026-06-07 10:00:48 -04:00
Joseph Doherty 475bfadacd docs(plans): design for template-alarm CLI ergonomics + alarm-override coverage 2026-06-07 09:53:34 -04:00
Joseph Doherty fdea9e0bde docs(plans): mark Wave 4 tasks complete 2026-06-07 04:33:16 -04:00
Joseph Doherty 70e84a7b79 test(playwright): seed inside try in Notification filter/modal tests for guaranteed cleanup (review fix) 2026-06-07 04:32:16 -04:00
Joseph Doherty b1d7497463 test(playwright): seed inside try in Notification stuck/pagination tests for guaranteed cleanup (review fix) 2026-06-07 04:23:33 -04:00
Joseph Doherty 99a69c1fba test(playwright): Notification Report stuck-only + pagination edge cases (Wave 4) 2026-06-07 04:18:04 -04:00
Joseph Doherty 5774b30d0d test(playwright): scope Notification detail-modal title selector to the open modal (review fix) 2026-06-07 04:15:25 -04:00
Joseph Doherty 42f38996a9 test(playwright): Notification Report filter-combo + detail-modal edge cases (Wave 4) 2026-06-07 04:12:02 -04:00
Joseph Doherty e36adf8acd test(playwright): gate Audit non-API cURL assertion on rendered drawer body (review fix) 2026-06-07 04:09:24 -04:00
Joseph Doherty 3b71ac220a test(playwright): Site Calls keyset pagination edge case (Wave 4) 2026-06-07 04:06:08 -04:00
Joseph Doherty f5535ad5c1 test(playwright): Audit Log non-API-no-cURL + drawer-close edge cases (Wave 4) 2026-06-07 04:02:01 -04:00
Joseph Doherty eea68b97f6 test(playwright): Site Calls status-filter + empty-state edge cases (Wave 4) 2026-06-07 03:58:52 -04:00
Joseph Doherty 79778e12b7 test(playwright): Audit Log filter-combination + empty-state edge cases (Wave 4) 2026-06-07 03:55:34 -04:00
Joseph Doherty 0efbb66bc3 test(playwright): LDAP missing-field + duplicate-group edge cases (Wave 4) 2026-06-07 03:50:05 -04:00
Joseph Doherty 8419eb0d86 test(playwright): Templates edit-attribute + delete-blocked-by-instance edge cases (Wave 4) 2026-06-07 03:47:04 -04:00
Joseph Doherty 3e57c6b054 test(playwright): drop inert defensive teardown in Sites dup-identifier test (review fix) 2026-06-07 03:43:49 -04:00