Commit Graph

20 Commits

Author SHA1 Message Date
Joseph Doherty e3519fdb39 feat(sitecallaudit): query, KPI and detail backend for the Site Calls page 2026-05-21 04:14:49 -04:00
Joseph Doherty 943c2ced39 feat(ui): Audit KPI tiles on Health dashboard (#23 M7)
Adds three KPI tiles to the central Health dashboard for the Audit channel:
volume (rows in the last hour), error rate (Failed/Parked/Discarded over
total), and backlog (sum of SiteAuditBacklog.PendingCount across all sites).

Repo + service:
- IAuditLogRepository.GetKpiSnapshotAsync(window, nowUtc) — single aggregate
  SELECT over the trailing window returning total + error counts; nowUtc is
  optional for production callers and pinned by integration tests against the
  shared MSSQL fixture so the global counts are deterministic.
- AuditLogQueryService.GetKpiSnapshotAsync() — composes the repo aggregate
  with a sum of SiteAuditBacklog.PendingCount read from ICentralHealthAggregator.
- AuditLogKpiSnapshot record in Commons/Types/.

UI:
- New AuditKpiTiles Blazor component (Components/Health/) — three Bootstrap
  card-tiles, click navigates to /audit/log with the matching pre-filter.
- Health.razor wires the tiles in alongside the existing Notification Outbox
  KPIs; LoadAuditKpis() runs on every 10s refresh tick and degrades to em
  dashes + inline error if the query fails.
- AuditLogPage extended to parse ?status= so the error-rate tile drill-in
  (?status=Failed) auto-loads the grid.

Tests:
- AuditLogRepositoryTests: GetKpiSnapshotAsync mixed-status + empty-window
  cases against the MSSQL migration fixture.
- AuditLogQueryServiceTests: forwarding + backlog composition; sites with
  null SiteAuditBacklog contribute zero.
- AuditKpiTilesTests: 9 bUnit tests covering tile render, error-rate maths
  with safe zero-events handling, em-dash unavailable path, click-through
  navigation, and warning/danger border thresholds.
- HealthPageTests: new Renders_AuditKpiTiles_WithValues plus IAuditLogQueryService
  stub registration in the constructor so existing outbox tests still pass.
- AuditLogPageScaffoldTests: ?status=Failed auto-load + unknown status drop.
2026-05-20 20:43:57 -04:00
Joseph Doherty 660fdc4e93 feat(auditlog): AuditLogPurgeActor daily partition-switch purge (#23 M6)
Central singleton (M6-T4 Bundle C) that drives the daily AuditLog partition
purge. On a configurable timer (default 24 hours) the actor:
  1. Queries IAuditLogRepository.GetPartitionBoundariesOlderThanAsync for
     monthly boundaries whose latest OccurredAtUtc is older than
     DateTime.UtcNow - AuditLogOptions.RetentionDays.
  2. For each eligible boundary calls SwitchOutPartitionAsync, which runs
     the drop-and-rebuild dance around UX_AuditLog_EventId.
  3. Publishes AuditLogPurgedEvent(boundary, rowsDeleted, durationMs) on
     the actor-system EventStream so the Bundle E central health collector
     and ops surfaces can subscribe without coupling to this actor.

Co-changes:
* SwitchOutPartitionAsync returns long (rows deleted) — sampled BEFORE the
  switch via COUNT_BIG over the per-partition  filter so the count
  reflects what the switch removed, not a post-purge scan of a table that
  no longer exists. All stub implementations updated.
* AuditLogPurgeOptions: IntervalHours (default 24), IntervalOverride for
  tests, Interval property resolving either.
* AuditLogPurgedEvent: record with MonthBoundary, RowsDeleted, DurationMs.

Behavior:
* Continue-on-error per boundary — one partition that throws does NOT
  abandon the rest of the tick.
* DI scope opened per tick (IAuditLogRepository is a SCOPED EF Core
  service); mirrors SiteAuditReconciliationActor and AuditLogIngestActor.
* SupervisorStrategy Resume keeps the singleton alive across leaked
  exceptions.
* EventStream capture BEFORE the first await — Context is unsafe after
  await in async receive handlers (same pattern as Sender-capture in
  AuditLogIngestActor.OnIngestAsync).

Tests:
* Tick_Fires_OnDailyInterval — visible timer side effect.
* Tick_OldPartitions_SwitchedOut — both seeded boundaries purged.
* Tick_NewerPartitions_Untouched — empty enumerator → no switches.
* Tick_PublishesPurgedEvent_WithRowCount — AuditLogPurgedEvent carries
  RowsDeleted and DurationMs.
* Tick_SwitchThrows_OtherPartitionsStillProcessed — continue-on-error.
* Threshold_UsesAuditLogOptionsRetentionDays — non-default 30-day window
  computed from UtcNow - RetentionDays.
* EndToEnd_RealPartition_RowsRemoved_PurgedEventPublished — TestKit +
  MsSqlMigrationFixture: real partitioned table, Jan-2026 row purged,
  Apr-2026 row kept, AuditLogPurgedEvent observed via probe.
2026-05-20 18:36:31 -04:00
Joseph Doherty 6069a20e0f fix(configdb): replace SwitchOutPartitionAsync stub with drop-and-rebuild dance (#23 M6)
Replaces M1's NotSupportedException stub with the production drop-DROP-INDEX
→ CREATE-staging → SWITCH PARTITION → DROP-staging → CREATE-INDEX dance
documented in alog.md §4. UX_AuditLog_EventId is intentionally non-aligned
with ps_AuditLog_Month so single-column EventId uniqueness can be enforced
cheaply for InsertIfNotExistsAsync; SQL Server rejects ALTER TABLE SWITCH
while a non-aligned unique index is present, so the implementation drops
it, switches the partition data into a GUID-suffixed staging table on
[PRIMARY], drops staging (discarding the rows), and rebuilds the unique
index — all inside an explicit transaction with a CATCH that guarantees
the unique index is rebuilt regardless of failure point.

Also adds GetPartitionBoundariesOlderThanAsync to IAuditLogRepository: a
CROSS APPLY over sys.partition_range_values + per-partition MAX(OccurredAtUtc)
to enumerate retention-eligible months for the M6 purge actor (next commit).

Tests verify:
* Old partition's rows are removed; other months untouched
* UX_AuditLog_EventId is rebuilt after a successful switch
* InsertIfNotExistsAsync's first-write-wins idempotency still holds after switch
* On engineered SWITCH failure (inbound FK from a probe table), SqlException
  propagates AND UX_AuditLog_EventId is still present (CATCH branch ran)
* GetPartitionBoundariesOlderThanAsync returns only boundaries whose partition's
  MAX(OccurredAtUtc) is strictly older than the threshold; empty partitions
  excluded
2026-05-20 18:20:55 -04:00
Joseph Doherty bedfa6b8f3 feat(configdb): ISiteCallAuditRepository + EF impl, monotonic upsert (#22, #23 M3)
Bundle B3 of Audit Log #23 M3: data-access layer for the central SiteCalls
table introduced in B1+B2. UpsertAsync is insert-if-not-exists then
monotonic-status update so out-of-order telemetry, duplicate gRPC packets,
and reconciliation pulls all converge on the same row without rolling
state backward.

- src/ScadaLink.Commons/Interfaces/Repositories/ISiteCallAuditRepository.cs:
  UpsertAsync (monotonic), GetAsync, QueryAsync, PurgeTerminalAsync.
- src/ScadaLink.Commons/Types/Audit/SiteCallQueryFilter.cs +
  SiteCallPaging.cs: filter (Channel/SourceSite/Status/Target/time range)
  and keyset paging cursor on (CreatedAtUtc DESC, TrackedOperationId DESC),
  mirrored on M1's AuditLog* equivalents.
- src/ScadaLink.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:
  raw-SQL InsertIfNotExists + conditional UPDATE with inline CASE rank
  compare (Submitted=0, Forwarded=1, Attempted/Skipped=2, terminal=3 —
  terminal statuses are mutually exclusive so e.g. Delivered cannot
  overwrite Parked). Duplicate-key violations (SQL 2601/2627) are
  swallowed at Debug, identical to AuditLogRepository's race-fix.
  QueryAsync uses FromSqlInterpolated because EF Core 10 cannot translate
  string.Compare against the value-converted TrackedOperationId column
  inside an expression tree.
- ServiceCollectionExtensions wires the repository (scoped, after
  IAuditLogRepository).
- 12 integration tests in tests/ScadaLink.ConfigurationDatabase.Tests/
  Repositories/ (MsSqlMigrationFixture + [SkippableFact]): fresh insert,
  monotonic advance, older-status no-op, same-status no-op,
  terminal-over-terminal no-op, 50-way concurrent-insert race produces
  exactly one row, Get known/unknown, filter by site, keyset paging no
  overlap, purge terminal-and-old, purge keeps non-terminal-and-recent.
2026-05-20 14:10:24 -04:00
Joseph Doherty db32a149d3 feat(commons): add IAuditLogRepository + AuditLogQueryFilter + AuditLogPaging (#23)
Append-only data-access surface for the central AuditLog table — three
methods: InsertIfNotExistsAsync (first-write-wins on EventId), QueryAsync
(filter + keyset paging on (OccurredAtUtc desc, EventId desc)), and
SwitchOutPartitionAsync (M1 honest contract — throws NotSupported until
M6 lands the non-aligned-index drop/rebuild dance for the partition
switch). No Update, no row-delete; bulk purge is partition-only.

Bundle D of the Audit Log #23 M1 Foundation plan.
2026-05-20 11:04:59 -04:00
Joseph Doherty 67b86aa683 feat(notification-outbox): per-site KPI snapshot type + repository contract 2026-05-19 05:22:45 -04:00
Joseph Doherty 07cd185368 refactor(notification-outbox): align outbox repository with cancellationToken convention 2026-05-19 01:05:52 -04:00
Joseph Doherty 2c59d59b61 feat(notification-outbox): add NotificationOutbox repository 2026-05-19 01:02:06 -04:00
Joseph Doherty a55502254e fix(external-system-gateway): resolve ExternalSystemGateway-011 — name-keyed repository lookups replace fetch-all-then-filter on the call hot path 2026-05-17 00:02:45 -04:00
Joseph Doherty 1d5465f31c fix(deployment): instance delete fully removes the record
Deleting an instance only undeployed it from the site and set the state
to NotDeployed, leaving an orphan record that could never be removed —
the state-transition matrix rejected delete from NotDeployed.

Delete now removes the instance record entirely (deployment history,
snapshot, attribute/alarm overrides, and connection bindings go with
it), and is permitted from any state.
2026-05-15 12:05:13 -04:00
Joseph Doherty 751248feb6 feat(alarms): HiLo trigger type with per-band level, hysteresis, messages, overrides
Adds a new HiLo alarm trigger type with four configurable setpoints
(LoLo / Lo / Hi / HiHi). Each setpoint carries an optional priority,
deadband (for hysteresis), and operator message. The site runtime emits
AlarmStateChanged with an AlarmLevel field so consumers can differentiate
warning vs critical bands.

Plumbing:
  - new AlarmLevel enum + AlarmStateChanged.Level/Message init properties
  - AlarmTriggerEditor (Blazor) gets a HiLo render with severity tinting
  - AlarmTriggerConfigCodec extracted from the editor for testability
  - sitestream.proto carries level + message over gRPC
  - SemanticValidator enforces numeric attribute, setpoint ordering,
    non-negative deadband
  - on-trigger scripts get an Alarm global (Name/Level/Priority/Message)
    so notification routing can branch by severity
  - per-instance InstanceAlarmOverride entity + EF migration + flattening
    step + CLI commands; HiLo overrides merge setpoint-by-setpoint, binary
    types whole-replace
  - DebugView shows a Level badge + per-band message tooltip
  - App.razor auto-reloads on permanent Blazor circuit failure
  - docker/regen-proto.sh automates the proto regen workflow (the linux/arm64
    protoc segfault means generated files are checked in for now)
2026-05-13 03:23:32 -04:00
Joseph Doherty 0139c9ca83 refactor(scripts): scoped parent query + parent picker for multi-parent templates
Two caveats from the script-scope rollout addressed:

1. ITemplateEngineRepository.GetTemplatesComposingAsync — a scoped
   query that returns only the templates referencing a given template
   via Compositions, eager-loaded with their Attributes / Scripts /
   Compositions. Replaces the GetAllTemplatesAsync + filter pattern
   in TemplateEdit so the Monaco metadata fetch doesn't pull the
   entire template catalog to find one parent.

2. Multi-parent picker. The previous implementation suppressed Parent
   assistance entirely when more than one template composes the open
   one. Now TemplateEdit collects every parent into _editorParents
   and renders a small `select` above the script editor when there
   are >1, letting the user choose which parent's metadata drives
   Parent.Attributes / Parent.CallScript completion + diagnostics.
   Single-parent templates skip the picker (no UI change). Zero
   parents (root template) hide the picker and surface no Parent
   assistance.

Browser-verified on the Sensor Module template (composed by both Pump
and Variable Speed Motor): picker shows both options, switching
updates the editor's parent metadata immediately via the existing
GetContext callback.

Test counts unchanged (159 / 199); the new repo method is exercised
end-to-end by the parent-picker browser path.
2026-05-12 06:00:02 -04:00
Joseph Doherty 4b1077d686 feat(repo): add TemplateFolder repository methods 2026-05-11 10:45:20 -04:00
Joseph Doherty 970d0a5cb3 refactor: simplify data connections from many-to-many site assignment to direct site ownership
Replace SiteDataConnectionAssignment join table with a direct SiteId FK on DataConnection,
simplifying the data model, repositories, UI, CLI, and deployment service.
2026-03-21 21:07:10 -04:00
Joseph Doherty 4879c4e01e Fix auth, Bootstrap, Blazor nav, LDAP, and deployment pipeline for working Central UI
Bootstrap served locally with absolute paths and <base href="/">.
LDAP auth uses search-then-bind with service account for GLAuth compatibility.
CookieAuthenticationStateProvider reads HttpContext.User instead of parsing JWT.
Login/logout forms opt out of Blazor enhanced nav (data-enhance="false").
Nav links use absolute paths; seed data includes Design/Deployment group mappings.
DataConnections page loads all connections (not just site-assigned).
Site appsettings configured for Test Plant A; Site registers with Central on startup.
DeploymentService resolves string site identifier for Akka routing.
Instances page gains Create Instance form.
2026-03-17 10:03:06 -04:00
Joseph Doherty 6ea38faa6f Phase 3C: Deployment pipeline & Store-and-Forward engine
Deployment Manager (WP-1–8, WP-16):
- DeploymentService: full pipeline (flatten→validate→send→track→audit)
- OperationLockManager: per-instance concurrency control
- StateTransitionValidator: Enabled/Disabled/NotDeployed transition matrix
- ArtifactDeploymentService: broadcast to all sites with per-site results
- Deployment identity (GUID + revision hash), idempotency, staleness detection
- Instance lifecycle commands (disable/enable/delete) with deduplication

Store-and-Forward (WP-9–15):
- StoreAndForwardStorage: SQLite persistence, 3 categories, no max buffer
- StoreAndForwardService: fixed-interval retry, transient-only buffering, parking
- ReplicationService: async best-effort to standby (fire-and-forget)
- Parked message management (query/retry/discard from central)
- Messages survive instance deletion, S&F drains on disable

620 tests pass (+79 new), zero warnings.
2026-03-16 21:27:18 -04:00
Joseph Doherty faef2d0de6 Phase 2 WP-1–13+23: Template Engine CRUD, composition, overrides, locking, collision detection, acyclicity
- WP-23: ITemplateEngineRepository full EF Core implementation
- WP-1: Template CRUD with deletion constraints (instances, children, compositions)
- WP-2–4: Attribute, alarm, script definitions with lock flags and override granularity
- WP-5: Shared script CRUD with syntax validation
- WP-6–7: Composition with recursive nesting and canonical naming
- WP-8–11: Override granularity, locking rules, inheritance/composition scope
- WP-12: Naming collision detection on canonical names (recursive)
- WP-13: Graph acyclicity (inheritance + composition cycles)
Core services: TemplateService, SharedScriptService, TemplateResolver,
LockEnforcer, CollisionDetector, CycleDetector. 358 tests pass.
2026-03-16 20:10:34 -04:00
Joseph Doherty cafb7d2006 Phase 1 WP-2–10: Repositories, audit service, security & auth (LDAP, JWT, roles, policies, data protection)
- WP-2: SecurityRepository + CentralUiRepository with audit log queries
- WP-3: AuditService with transactional guarantee (same SaveChangesAsync)
- WP-4: Optimistic concurrency tests (deployment records vs template last-write-wins)
- WP-5: Seed data (SCADA-Admins → Admin role mapping)
- WP-6: LdapAuthService (direct bind, TLS enforcement, group query)
- WP-7: JwtTokenService (HMAC-SHA256, 15-min refresh, 30-min idle timeout)
- WP-8: RoleMapper (LDAP groups → roles with site-scoped deployment)
- WP-9: Authorization policies (Admin/Design/Deployment + site scope handler)
- WP-10: Shared Data Protection keys via EF Core
141 tests pass, zero warnings.
2026-03-16 19:32:43 -04:00
Joseph Doherty 22e1eba58a Phase 0 WP-0.2–0.9: Implement Commons (types, entities, interfaces, messages, protocol, tests)
- WP-0.2: Namespace/folder skeleton (26 directories)
- WP-0.3: Shared data types (6 enums, RetryPolicy, Result<T>)
- WP-0.4: 24 domain entity POCOs across 10 domain areas
- WP-0.5: 7 repository interfaces with full CRUD signatures
- WP-0.6: IAuditService cross-cutting interface
- WP-0.7: 26 message contract records across 8 concern areas
- WP-0.8: IDataConnection protocol abstraction with batch ops
- WP-0.9: 8 architectural constraint enforcement tests
All 40 tests pass, zero warnings.
2026-03-16 18:48:24 -04:00