Merge branch 'feature/audit-log-m2-site-sync-pipeline': Audit Log #23 M2 Site Pipeline (sync-only)

M2 ships the first end-to-end audit emission. A script-initiated ExternalSystem.Call() produces one ApiOutbound/ApiCall row in the central AuditLog table via site SQLite hot-path + gRPC telemetry push + central ingest actor. Audit-write failures NEVER abort the script.

Shipped (13 commits):
- Race-fix + tiebreaker: InsertIfNotExistsAsync swallows duplicate-key races (SqlException 2601/2627); same-OccurredAt keyset test added.
- Site SQLite writer: SqliteAuditWriter (Channel<T> + background batch inserter, sub-ms enqueue) + RingBufferFallback (1024-cap drop-oldest) + FallbackAuditWriter composing primary+ring+failure counter.
- gRPC layer: IngestAuditEvents unary RPC + AuditEventDto on sitestream.proto; AuditEventMapper for AuditEvent <-> Dto round-trip (ForwardState site-only, IngestedAtUtc central-only).
- Actors: SiteAuditTelemetryActor (per-site, dedicated dispatcher, drain loop with 5s busy / 30s idle cadence); AuditLogIngestActor (central singleton, scope-per-message via IServiceProvider ctor for scoped repository, idempotent acks).
- Host wiring: cluster singleton + proxy on central, per-site actor bound to audit-telemetry-dispatcher (ForkJoin, 2 threads). NoOp ISiteStreamAuditClient registered as production default; real site->central gRPC client deferred to M6 (orthogonal to M3).
- ESG emission: ScriptRuntimeContext.ExternalSystem.Call wraps ExternalSystemClient.CallAsync; emits one ApiOutbound/ApiCall row per call with provenance from context (SourceSiteId/Instance/Script). Three nested fail-safe layers ensure audit failure never aborts script.
- Health metric: SiteAuditWriteFailures counter + Interlocked.Increment, exposed in SiteHealthReport; HealthMetricsAuditWriteFailureCounter bridge swaps the NoOp default when both AddHealthMonitoring + AddAuditLog are registered.
- E2E: component-level test using TestKit + MsSqlMigrationFixture + DirectActorSiteStreamAuditClient stub. Verifies push, retry, and duplicate-collapse in <15s.

Tests: full solution dotnet test ScadaLink.slnx green (one isolated SiteRuntime sandbox-timeout flake is pre-existing and not M2-related). ~80 net new tests across Commons.Tests / ConfigDb.Tests / Communication.Tests / HealthMonitoring.Tests / AuditLog.Tests / SiteRuntime.Tests / Host.Tests.

Strict invariants honored: infra/* never touched on any branch commit; no push to origin; explicit git add throughout; alog.md unchanged (vocabulary correction from M1 stands).
408
docs/plans/2026-05-20-auditlog-m2-site-sync-pipeline.md
Normal file
@@ -0,0 +1,408 @@
|
||||
# Audit Log #23 — M2 Site Pipeline (sync-only) Implementation Plan
|
||||
|
||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:subagent-driven-development (bundled cadence per `feedback_subagent_cadence`).
|
||||
|
||||
**Goal:** First end-to-end audit emission. A script-initiated `ExternalSystem.Call()` produces exactly one `ApiOutbound`/`ApiCall` row in the central `AuditLog` table via site SQLite hot-path + gRPC push telemetry + central ingest actor. Audit-write failures NEVER abort the script.
|
||||
|
||||
**Architecture (decisions locked):**
|
||||
- Provenance: **Wrap CallAsync in ScriptRuntimeContext** — IExternalSystemClient.CallAsync signature unchanged; ScriptRuntimeContext.ExternalSystem.Call captures instance/script/site and emits the AuditEvent via IAuditWriter.
|
||||
- Direction: **Push primary** — SiteAuditTelemetryActor batches Pending rows and pushes via a new `IngestAuditEvents` unary gRPC RPC on `sitestream.proto`. Pull (reconciliation) deferred to M6.
|
||||
- E2E: **Component-level test** via TestKit + MSSQL fixture; stubbed gRPC client forwards directly to the central ingest actor. No expansion of `ScadaLinkWebApplicationFactory`.
|
||||
- Site writer: **Mirror SiteEventLogger** — `Channel<PendingAuditEvent>` + background writer Task, giving a sub-ms enqueue on the hot path with durability once the background batch commits.
|
||||
|
||||
**M1 realities baked in:**
|
||||
- Enum vocabulary: `AuditKind.ApiCall` for sync API call; `AuditStatus.Delivered` for success, `AuditStatus.Failed` for HTTP non-2xx (permanent OR transient → both Failed for a sync call; cached path differs in M3). The "Status=Success/TransientFailure/PermanentFailure" wording in the roadmap is stale and must be replaced with the new vocabulary.
|
||||
- `AuditLogRepository.InsertIfNotExistsAsync` race window — M2 is the first concurrent writer; harden it before AuditLogIngestActor lands.
|
||||
- Keyset tiebreaker test gap from Bundle D — add a same-OccurredAt test in M2.
|
||||
- `MsSqlMigrationFixture` is reusable as-is; promote it to a `[CollectionDefinition]`-shared fixture only if multiple test classes need it (defer until actually needed).
|
||||
- `Xunit.SkippableFact` + `Skip.IfNot(_fixture.Available, _fixture.SkipReason)` for any MSSQL-dependent tests.
|
||||
- `ScadaLink.AuditLog/Site/` and `ScadaLink.AuditLog/Central/` and `ScadaLink.AuditLog/Telemetry/` subfolders. DI extension `AddAuditLog` is the registration point.
|
||||
|
||||
**Tech stack additions:**
|
||||
- `Microsoft.Data.Sqlite 10.0.7` (pinned).
|
||||
- `Akka.TestKit.Xunit2 1.5.62` (pinned).
|
||||
- `Grpc.Tools` already configured in `ScadaLink.Communication.csproj`.
|
||||
|
||||
---
|
||||
|
||||
## Bundles
|
||||
|
||||
- **Bundle A — Repo race-fix + tiebreaker test** (M1 realities catch-up).
|
||||
- **Bundle B — Site SQLite writer + fallback** (M2-T1, T2, T3, T4).
|
||||
- **Bundle C — gRPC proto + mapper** (M2-T5, T6).
|
||||
- **Bundle D — Telemetry actor + ingest actor + gRPC handler** (M2-T7, T8).
|
||||
- **Bundle E — Host wiring** (M2-T9).
|
||||
- **Bundle F — ESG emission via ScriptRuntimeContext wrapper** (M2-T10).
|
||||
- **Bundle G — Health metric SiteAuditWriteFailures** (M2-T11).
|
||||
- **Bundle H — Component-level integration test** (M2-T12).
|
||||
|
||||
Final cross-bundle reviewer pass, then merge + roadmap update.
|
||||
|
||||
---
|
||||
|
||||
## Bundle A — Repo race-fix + keyset tiebreaker test
|
||||
|
||||
### Task A1: Harden `InsertIfNotExistsAsync` against duplicate-key race
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ScadaLink.ConfigurationDatabase/Repositories/AuditLogRepository.cs:30-60` — wrap the `ExecuteSqlInterpolatedAsync` call in a `try/catch Microsoft.Data.SqlClient.SqlException` that swallows error numbers 2601 and 2627 (unique-index violation on `UX_AuditLog_EventId`) and logs at Debug. Other SqlExceptions rethrow.
|
||||
- Modify: `tests/ScadaLink.ConfigurationDatabase.Tests/Repositories/AuditLogRepositoryTests.cs` — add:
|
||||
- `InsertIfNotExistsAsync_ConcurrentDuplicateInserts_ProduceExactlyOneRow` — fire 50 parallel `InsertIfNotExistsAsync` calls with the same `EventId`, assert row count = 1 and no exception escapes.
|
||||
- `QueryAsync_Keyset_SameOccurredAtUtc_TiebreaksOnEventId` — Bundle D reviewer's deferred recommendation. Insert 4 rows with identical OccurredAtUtc but distinct EventIds; page through them with PageSize=2; assert no overlap, correct count, and that the second page's first row's EventId is strictly less than the first page's last row's EventId.
|
||||
|
||||
**Steps:**
|
||||
1. Write failing concurrency test.
|
||||
2. Run: expect either an escaping SqlException 2601/2627 or a row-count mismatch (more than one row for the same `EventId`).
|
||||
3. Add try/catch in the repo.
|
||||
4. Run: pass.
|
||||
5. Write failing keyset-tiebreaker test.
|
||||
6. Run: depending on EF Core 10's Guid.CompareTo translation, this may already pass — confirm.
|
||||
7. If passing, the test locks in the behavior; commit anyway.
|
||||
8. Commit: `fix(configdb): InsertIfNotExistsAsync swallows duplicate-key races + add keyset tiebreaker test (#23)`.
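
The duplicate-key rule from Task A1 can be sketched as below. This is a minimal, dependency-free illustration rather than the repository's actual code: `DuplicateKeyGuard`, `TryInsertAsync`, and the `errorNumberOf` delegate are hypothetical names. In the real repository the delegate would presumably read the number off the caught `SqlException`, for example `ex => (ex as SqlException)?.Number`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: swallow SQL Server duplicate-key errors, rethrow everything else.
public static class DuplicateKeyGuard
{
    // 2601 = duplicate row in a unique index, 2627 = unique/primary-key constraint violation.
    public static bool IsDuplicateKey(int sqlErrorNumber) =>
        sqlErrorNumber is 2601 or 2627;

    // Returns true when the insert ran, false when a concurrent writer won the race.
    public static async Task<bool> TryInsertAsync(
        Func<Task> insert,
        Func<Exception, int?> errorNumberOf)
    {
        try
        {
            await insert();
            return true;
        }
        catch (Exception ex) when (errorNumberOf(ex) is int number && IsDuplicateKey(number))
        {
            // First write wins; the losing insert is treated as a no-op.
            return false;
        }
    }
}
```

The exception filter keeps every other error number on its normal rethrow path, which is the behaviour the task asks for.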
|
||||
|
||||
**Bundle A acceptance:** All ConfigurationDatabase.Tests still green; 2 new tests pass.
|
||||
|
||||
---
|
||||
|
||||
## Bundle B — Site SQLite writer + fallback (mirror SiteEventLogger pattern)
|
||||
|
||||
### Task B1: `SqliteAuditWriter` — schema + connection bootstrap
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — implements `IAuditWriter` per Bundle A's signature (single `Task WriteAsync(AuditEvent evt, CancellationToken ct = default)`). Constructor takes `IOptions<SqliteAuditWriterOptions>` + `ILogger`. Single `SqliteConnection` opened at construction (`Data Source={path};Cache=Shared`). Sync `_writeLock` Monitor-pattern (mirrors `SiteEventLogger.cs:32`). Inline `InitializeSchema()` runs `PRAGMA auto_vacuum = INCREMENTAL` + `CREATE TABLE IF NOT EXISTS AuditLog (...)`.
|
||||
- Create: `src/ScadaLink.AuditLog/Site/SqliteAuditWriterOptions.cs` — `string DatabasePath = "auditlog.db"`, `int ChannelCapacity = 4096` (bounded; drop-oldest applies only to the Task B3 ring overflow, while the writer's pending channel is bounded as a safety net), `int BatchSize = 256`, `int FlushIntervalMs = 50`.
|
||||
- Create: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterSchemaTests.cs`.
|
||||
|
||||
**Schema (20 columns: 19 event columns plus the site-only `ForwardState`; `IngestedAtUtc` is central-only):**
|
||||
|
||||
```sql
CREATE TABLE IF NOT EXISTS AuditLog (
    EventId          TEXT    NOT NULL,
    OccurredAtUtc    TEXT    NOT NULL,
    Channel          TEXT    NOT NULL,
    Kind             TEXT    NOT NULL,
    CorrelationId    TEXT    NULL,
    SourceSiteId     TEXT    NULL,
    SourceInstanceId TEXT    NULL,
    SourceScript     TEXT    NULL,
    Actor            TEXT    NULL,
    Target           TEXT    NULL,
    Status           TEXT    NOT NULL,
    HttpStatus       INTEGER NULL,
    DurationMs       INTEGER NULL,
    ErrorMessage     TEXT    NULL,
    ErrorDetail      TEXT    NULL,
    RequestSummary   TEXT    NULL,
    ResponseSummary  TEXT    NULL,
    PayloadTruncated INTEGER NOT NULL,
    Extra            TEXT    NULL,
    ForwardState     TEXT    NOT NULL,
    PRIMARY KEY (EventId)
);
CREATE INDEX IF NOT EXISTS IX_SiteAuditLog_ForwardState_Occurred
    ON AuditLog (ForwardState, OccurredAtUtc);
```
|
||||
|
||||
**Tests:**
|
||||
1. `Opens_Creates_AuditLog_Table_With_All_Columns_And_PK`
|
||||
2. `Opens_Creates_IX_ForwardState_Occurred_Index`
|
||||
3. `PRAGMA_auto_vacuum_Is_INCREMENTAL`
|
||||
|
||||
**Steps:**
|
||||
1. Failing test asserts table + PK + 20 columns + index via `PRAGMA table_info(AuditLog)` + `PRAGMA index_list(AuditLog)`.
|
||||
2. Implement constructor + InitializeSchema with inline SQL.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): SqliteAuditWriter schema bootstrap (#23)`.
|
||||
|
||||
### Task B2: `SqliteAuditWriter` — Channel<T> + background writer for hot-path
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — add `Channel<PendingAuditEvent> _writeQueue` (bounded, `BoundedChannelFullMode.Wait`, default capacity 4096) and a background `Task ProcessWriteQueueAsync()` launched in the constructor. `WriteAsync` enqueues and returns the `Task` of the pending item's `TaskCompletionSource`. The loop reads up to `BatchSize` items, opens a transaction, INSERTs all events, commits, then completes the TCS for each.
|
||||
- Pattern mirrors `src/ScadaLink.SiteEventLogging/SiteEventLogger.cs:135-173`.
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Site/SqliteAuditWriterWriteTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `WriteAsync_FreshEvent_PersistsWithForwardStatePending` — write one event, query SQLite, assert row has `ForwardState='Pending'`.
|
||||
2. `WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions` — fire 1000 parallel WriteAsync, assert row count = 1000 and zero exceptions surface.
|
||||
3. `WriteAsync_LatencyP99_LessThan_5ms_For_Enqueue` — assert that the enqueue itself returns near-instantly (P99 under 5 ms, measured with a stopwatch around `Channel.Writer.WriteAsync`), and separately that the returned TCS `Task` completes within a reasonable time once awaited.
|
||||
4. `WriteAsync_DuplicateEventId_FirstWriteWins_NoException` — insert same EventId twice, assert one row only and no exception (the PRIMARY KEY violation is caught/swallowed in the writer loop).
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests for 1, 2, 4.
|
||||
2. Implement Channel + background loop + transactional batch INSERT.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): SqliteAuditWriter Channel-based hot-path write (#23)`.
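
The hot-path shape described in Task B2 (bounded channel, one background loop, per-item completion) could look roughly like the sketch below. It is an illustration under assumptions, not the shipped `SqliteAuditWriter`: `BatchingWriter<T>`, `Pending`, and the `persistBatch` delegate are hypothetical stand-ins, and the delegate takes the place of the transactional SQLite INSERT.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch: producers enqueue cheaply, one background loop persists in batches.
public sealed class BatchingWriter<T>
{
    private sealed record Pending(T Item, TaskCompletionSource Completion);

    private readonly Channel<Pending> _queue;
    private readonly Func<IReadOnlyList<T>, Task> _persistBatch;
    private readonly int _batchSize;
    private readonly Task _loop;

    public BatchingWriter(Func<IReadOnlyList<T>, Task> persistBatch, int capacity = 4096, int batchSize = 256)
    {
        _persistBatch = persistBatch;
        _batchSize = batchSize;
        _queue = Channel.CreateBounded<Pending>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait, // back-pressure instead of dropping
            SingleReader = true,
        });
        _loop = Task.Run(ProcessQueueAsync);
    }

    // Completes once the batch containing this item has been persisted.
    public async Task WriteAsync(T item, CancellationToken ct = default)
    {
        var pending = new Pending(item, new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously));
        await _queue.Writer.WriteAsync(pending, ct);
        await pending.Completion.Task;
    }

    // Stops accepting writes and waits for the loop to drain what is queued.
    public Task CompleteAsync()
    {
        _queue.Writer.Complete();
        return _loop;
    }

    private async Task ProcessQueueAsync()
    {
        var batch = new List<Pending>(_batchSize);
        while (await _queue.Reader.WaitToReadAsync())
        {
            batch.Clear();
            while (batch.Count < _batchSize && _queue.Reader.TryRead(out var pending))
                batch.Add(pending);
            try
            {
                await _persistBatch(batch.ConvertAll(p => p.Item));
                foreach (var p in batch) p.Completion.TrySetResult();
            }
            catch (Exception ex)
            {
                // Surface the failure to every caller in the batch; the loop keeps running.
                foreach (var p in batch) p.Completion.TrySetException(ex);
            }
        }
    }
}
```

Failing the whole batch on one error is a simplification; the plan's duplicate-`EventId` case (test 4) would need per-row handling inside the persist step.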
|
||||
|
||||
### Task B3: `RingBufferFallback`
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Site/RingBufferFallback.cs` — `Channel<AuditEvent>` bounded at 1024 with `BoundedChannelFullMode.DropOldest`. Exposes `bool TryEnqueue(AuditEvent)`, `IAsyncEnumerable<AuditEvent> DrainAsync(CancellationToken)`, and an event `RingBufferOverflowed` (callback for the health counter).
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Site/RingBufferFallbackTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflow` — invoke 1025 enqueues, assert the `RingBufferOverflowed` event fires exactly once and that the surviving 1024 events are the latest.
|
||||
2. `DrainAsync_Yields_FIFO_Then_Completes_When_Empty`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement using `Channel.CreateBounded<AuditEvent>(new BoundedChannelOptions(1024) { FullMode = BoundedChannelFullMode.DropOldest })`.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): RingBufferFallback with drop-oldest overflow (#23)`.
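
One way to observe drops, sketched under assumptions: since .NET 6, `Channel.CreateBounded<T>` has an overload that takes an `itemDropped` callback, which can drive the overflow event. `DropOldestRing<T>`, `Overflowed`, and `DrainAvailable` are hypothetical names, and the synchronous drain is a simplification of the plan's `IAsyncEnumerable` `DrainAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

// Hypothetical sketch of a drop-oldest ring with an overflow notification.
public sealed class DropOldestRing<T>
{
    private readonly Channel<T> _channel;

    // Raised once per dropped item.
    public event Action? Overflowed;

    public DropOldestRing(int capacity = 1024)
    {
        _channel = Channel.CreateBounded<T>(
            new BoundedChannelOptions(capacity) { FullMode = BoundedChannelFullMode.DropOldest },
            _ => Overflowed?.Invoke());
    }

    // With DropOldest the write succeeds unless the channel has been completed.
    public bool TryEnqueue(T item) => _channel.Writer.TryWrite(item);

    // Yields whatever is buffered right now, oldest first, then stops.
    public IEnumerable<T> DrainAvailable()
    {
        while (_channel.Reader.TryRead(out var item))
            yield return item;
    }
}
```

Because `TryWrite` still returns `true` when an older item is evicted, the callback is the only signal that data was lost, which is why the health counter hooks in there.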
|
||||
|
||||
### Task B4: `FallbackAuditWriter` — compose primary + ring
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs` — implements `IAuditWriter`. Constructor takes the primary `SqliteAuditWriter` + `RingBufferFallback` + `IAuditWriteFailureCounter` (lightweight DI'd interface, Bundle G implements it as `SiteAuditWriteFailures` counter on health metrics). On primary success: returns. On primary throw: increments counter, enqueues into ring (DropOldest), returns success. On the NEXT successful primary call (success after a failure window), drains the ring back through the primary.
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess`.
|
||||
2. `WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite`.
|
||||
3. `WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement; mock the primary with a `Func<AuditEvent, Task>` flip-switch failure.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): FallbackAuditWriter compose SQLite + ring (#23)`.
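
The composition rule in Task B4 (never surface a primary failure, buffer instead, replay on recovery) is sketched below. This is a single-caller illustration and not thread-safe; `FallbackWriter<T>`, `FailureCount`, and `Buffered` are hypothetical names, and a plain `Queue<T>` stands in for `RingBufferFallback`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: primary writer with a drop-oldest backlog. Not thread-safe.
public sealed class FallbackWriter<T>
{
    private readonly Func<T, Task> _primary;
    private readonly Queue<T> _ring = new();
    private readonly int _ringCapacity;
    private int _failureCount;

    public FallbackWriter(Func<T, Task> primary, int ringCapacity = 1024)
    {
        _primary = primary;
        _ringCapacity = ringCapacity;
    }

    public int FailureCount => Volatile.Read(ref _failureCount);
    public int Buffered => _ring.Count;

    public async Task WriteAsync(T item)
    {
        try
        {
            await _primary(item);
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _failureCount);
            if (_ring.Count == _ringCapacity) _ring.Dequeue(); // drop oldest
            _ring.Enqueue(item);
            return; // the caller never sees the failure
        }

        // Primary is healthy again: replay the backlog in FIFO order, stopping at the first failure.
        while (_ring.Count > 0)
        {
            try { await _primary(_ring.Peek()); }
            catch (Exception) { return; }
            _ring.Dequeue();
        }
    }
}
```

Peeking before dequeuing means an event leaves the backlog only after the primary has accepted it, so a second outage mid-replay does not lose rows.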
|
||||
|
||||
**Bundle B acceptance:** 4 tasks merged. `ScadaLink.AuditLog.Tests` adds ~12+ tests. No regressions.
|
||||
|
||||
---
|
||||
|
||||
## Bundle C — gRPC proto + mapper
|
||||
|
||||
### Task C1: Extend `sitestream.proto` with `IngestAuditEvents`
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ScadaLink.Communication/Protos/sitestream.proto` — add the messages and unary RPC. Use `google.protobuf.Timestamp` for `OccurredAtUtc`; encode enums as `string` (matches the EF mapping).
|
||||
|
||||
Proposed addition:
|
||||
```proto
// Needs these imports at the top of sitestream.proto if not already present:
//   import "google/protobuf/timestamp.proto";
//   import "google/protobuf/wrappers.proto";

message AuditEventDto {
  string event_id = 1;
  google.protobuf.Timestamp occurred_at_utc = 2;
  string channel = 3;
  string kind = 4;
  string correlation_id = 5;  // empty string when null
  string source_site_id = 6;
  string source_instance_id = 7;
  string source_script = 8;
  string actor = 9;
  string target = 10;
  string status = 11;
  google.protobuf.Int32Value http_status = 12;
  google.protobuf.Int32Value duration_ms = 13;
  string error_message = 14;
  string error_detail = 15;
  string request_summary = 16;
  string response_summary = 17;
  bool payload_truncated = 18;
  string extra = 19;
}

message AuditEventBatch { repeated AuditEventDto events = 1; }
message IngestAck { repeated string accepted_event_ids = 1; }

service SiteStreamService {
  // existing rpcs...
  rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck);
}
```
|
||||
|
||||
(Use `google.protobuf.Int32Value` to encode nullable ints; empty string semantics for nullable text fields.)
|
||||
|
||||
- Test: `tests/ScadaLink.Communication.Tests/Protos/AuditEventProtoTests.cs`.
|
||||
|
||||
**Steps:**
|
||||
1. Edit proto + rebuild (`dotnet build src/ScadaLink.Communication/`).
|
||||
2. Failing test round-trips an `AuditEventDto` through `ToByteArray()` and `Parser.ParseFrom()`; asserts all populated fields survive.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(comms): IngestAuditEvents RPC + AuditEventDto proto (#23)`.
|
||||
|
||||
### Task C2: `AuditEvent` ↔ `AuditEventDto` mapper
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs` — static `ToDto(AuditEvent)` and `FromDto(AuditEventDto)`. Handles nullable→empty-string, Timestamp↔DateTime UTC, enum↔string. ForwardState NOT carried in the proto (site-local only; central never sees it).
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Telemetry/AuditEventMapperTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Roundtrip_FullyPopulated_PreservesAllFields`.
|
||||
2. `Roundtrip_AllNullableFieldsNull_ProducesEmptyDtoFields`.
|
||||
3. `FromDto_EmptyOptionalString_BecomesNullProperty`.
|
||||
4. `ToDto_Sets_OccurredAtUtc_As_UtcTimestamp` — Round-trip with `DateTimeKind.Utc` preserved.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): AuditEvent ↔ proto mapper (#23)`.
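
The mapper's conversion conventions from Task C2 can be illustrated without the generated proto types. The sketch below is an assumption-laden stand-in: `WireConventions` and its method names are hypothetical, and only the null/empty-string, enum/string, and UTC-kind rules are shown. The UTC normalisation matters because `Google.Protobuf`'s `Timestamp.FromDateTime` rejects a `DateTime` whose `Kind` is not `Utc`.

```csharp
using System;

// Hypothetical helpers mirroring the mapper's conversion rules.
public static class WireConventions
{
    // Nullable text travels as an empty string on the wire.
    public static string ToWire(string? value) => value ?? string.Empty;

    // An empty wire string comes back as null, so "" and null are indistinguishable after a round trip.
    public static string? FromWire(string value) => string.IsNullOrEmpty(value) ? null : value;

    // Enums travel by name.
    public static string EnumToWire<TEnum>(TEnum value) where TEnum : struct, Enum => value.ToString();

    public static TEnum EnumFromWire<TEnum>(string value) where TEnum : struct, Enum => Enum.Parse<TEnum>(value);

    // Normalises a DateTime so its Kind is Utc before it is turned into a timestamp.
    public static DateTime AsUtc(DateTime value) => value.Kind switch
    {
        DateTimeKind.Utc => value,
        DateTimeKind.Local => value.ToUniversalTime(),
        _ => DateTime.SpecifyKind(value, DateTimeKind.Utc), // assumes unspecified values are already UTC
    };
}
```

Treating an unspecified `Kind` as UTC is a design assumption of this sketch; it only holds if every producer stamps `OccurredAtUtc` in UTC.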
|
||||
|
||||
**Bundle C acceptance:** Communication.Tests + AuditLog.Tests still green; proto rebuilds cleanly.
|
||||
|
||||
---
|
||||
|
||||
## Bundle D — SiteAuditTelemetryActor + AuditLogIngestActor + gRPC handler
|
||||
|
||||
### Task D1: `SiteAuditTelemetryActor` — drain loop
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs` — `ReceiveActor`. On `Drain`: queries `SqliteAuditWriter.ReadPendingAsync(BatchSize)`, calls `IngestAuditEventsAsync(batch)` on the gRPC client, and on ack flips the returned EventIds to `Forwarded` via `SqliteAuditWriter.MarkForwardedAsync(eventIds)`. Re-schedules the `Drain` self-tick: 5s if ≥1 row drained, 30s otherwise. On gRPC error: re-schedule in 5s; rows stay Pending.
|
||||
- Modify: `src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs` — add `ReadPendingAsync(int limit, CancellationToken)` returning `IReadOnlyList<AuditEvent>` (with ForwardState=Pending), and `MarkForwardedAsync(IReadOnlyList<Guid> eventIds, CancellationToken)`.
|
||||
- Create: `src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryOptions.cs` — `BatchSize=256`, `BusyIntervalSeconds=5`, `IdleIntervalSeconds=30`.
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Site/Telemetry/SiteAuditTelemetryActorTests.cs` using `TestKit` + NSubstitute-mocked gRPC client.
|
||||
|
||||
**Tests:**
|
||||
1. `Drain_With_50PendingRows_Sends_OneBatch_Of_50`.
|
||||
2. `Drain_Ack_Flips_Rows_To_Forwarded`.
|
||||
3. `Drain_GrpcThrows_Rows_StayPending_NextTick_Retries`.
|
||||
4. `Drain_Cadence_5s_AfterNonZero_30s_AfterZero` (via `TestScheduler`).
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): SiteAuditTelemetryActor drain loop (#23)`.
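
The re-scheduling rule in Task D1 reduces to a small pure function, which makes the cadence test easy to write without a scheduler. A minimal sketch, assuming hypothetical names (`DrainCadenceOptions`, `DrainCadence.NextDelay`) that only mirror the `BusyIntervalSeconds` / `IdleIntervalSeconds` defaults from the plan:

```csharp
using System;

// Hypothetical options record mirroring the plan's 5s busy / 30s idle defaults.
public sealed record DrainCadenceOptions(int BusyIntervalSeconds = 5, int IdleIntervalSeconds = 30);

public static class DrainCadence
{
    // Busy interval after a non-empty drain or a failed push (rows are still Pending);
    // idle interval when there was nothing to send.
    public static TimeSpan NextDelay(int rowsDrained, bool pushFailed, DrainCadenceOptions options) =>
        pushFailed || rowsDrained > 0
            ? TimeSpan.FromSeconds(options.BusyIntervalSeconds)
            : TimeSpan.FromSeconds(options.IdleIntervalSeconds);
}
```

Inside the actor, the result would feed whatever self-tick scheduling call the implementation uses.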
|
||||
|
||||
### Task D2: `AuditLogIngestActor` + gRPC server handler
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs` — `ReceiveActor` accepting `IngestAuditEventsCommand(IReadOnlyList<AuditEvent> events, IActorRef replyTo)`. For each event, calls `IAuditLogRepository.InsertIfNotExistsAsync` (which now swallows duplicates per Bundle A). Sets `IngestedAtUtc = DateTime.UtcNow` before insert (this is the central-side timestamp). Replies with `IngestAck(acceptedEventIds)` — by spec "accepted" includes already-existed rows (idempotent semantics).
|
||||
- Create: `src/ScadaLink.AuditLog/Central/IngestAuditEventsCommand.cs` (Akka message).
|
||||
- Create: `src/ScadaLink.AuditLog/Central/IngestAck.cs` (Akka reply).
|
||||
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — implement `public override async Task<IngestAck> IngestAuditEvents(AuditEventBatch request, ServerCallContext context)` — Ask the central `AuditLogIngestActor` proxy with the deserialized batch, await reply, return.
|
||||
- Modify: `src/ScadaLink.Communication/SiteStreamGrpc/SiteStreamGrpcServer.cs` — add a setter `SetAuditIngestActor(IActorRef)` mirroring how `SetNotificationOutbox` is wired (per recon: Notification Outbox proxy is handed in via `commService?.SetNotificationOutbox(outboxProxy)`).
|
||||
- Test: `tests/ScadaLink.AuditLog.Tests/Central/AuditLogIngestActorTests.cs`.
|
||||
- Test: `tests/ScadaLink.Communication.Tests/SiteStreamIngestAuditEventsTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Receive_BatchOf5_Calls_Repo_5Times_Acks_All`.
|
||||
2. `Receive_BatchWith_AlreadyExistingEvent_AcksAll_NoDoubleInsert` (idempotent).
|
||||
3. `Receive_RepoThrowsTransient_Replies_AckExcludingFailedEventIds_LogsError` (partial-failure semantics — what gets acked is what was persisted).
|
||||
4. `Receive_Sets_IngestedAtUtc_Before_Insert`.
|
||||
5. `gRPC_Handler_Routes_To_Actor_Returns_Reply`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement actor + gRPC handler.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(auditlog): AuditLogIngestActor + gRPC handler (#23)`.
|
||||
|
||||
**Bundle D acceptance:** New actor + gRPC handler tests all green.
|
||||
|
||||
---
|
||||
|
||||
## Bundle E — Host wiring (central singleton + site actor + dispatcher)
|
||||
|
||||
### Task E1: Register `AuditLogIngestActor` + `SiteAuditTelemetryActor` + dispatcher
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ScadaLink.Host/Actors/AkkaHostedService.cs` — mirror the Notification Outbox pattern (recon report's exact lines 272-295):
|
||||
- Central role: `AuditLogIngestActor` as `ClusterSingletonManager` (singleton name `"audit-log-ingest"`) + `ClusterSingletonProxy` (`"audit-log-ingest-proxy"`). Hand the proxy to `SiteStreamGrpcServer.SetAuditIngestActor(proxy)`.
|
||||
- Site role: `SiteAuditTelemetryActor` as a per-site actor (`actorSystem.ActorOf(Props.Create(...))`), bound to the dedicated dispatcher (below).
|
||||
- Modify: HOCON in `src/ScadaLink.Host/Configuration/` (the existing akka config file) — add:
|
||||
```
audit-telemetry-dispatcher {
  type = ForkJoinDispatcher
  throughput = 100
  dedicated-thread-pool { thread-count = 2 }
}
```
|
||||
Apply `.WithDispatcher("audit-telemetry-dispatcher")` to `SiteAuditTelemetryActor`'s Props.
|
||||
- Modify: `src/ScadaLink.AuditLog/ServiceCollectionExtensions.cs:AddAuditLog` — register the SqliteAuditWriter+RingBufferFallback+FallbackAuditWriter chain and the actor factories.
|
||||
- Test: `tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Central_Host_Starts_With_AuditLogIngest_Singleton_Healthy`.
|
||||
2. `Site_Host_Starts_With_SiteAuditTelemetry_Bound_To_DedicatedDispatcher`.
|
||||
3. `AuditWriter_Resolves_From_DI_To_FallbackAuditWriter`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests against current host (which doesn't wire audit).
|
||||
2. Implement wiring.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(host): register Audit Log #23 singletons with dedicated dispatcher`.
|
||||
|
||||
**Bundle E acceptance:** Host.Tests still green; 3 new tests pass.
|
||||
|
||||
---
|
||||
|
||||
## Bundle F — ESG audit emission via ScriptRuntimeContext wrapper
|
||||
|
||||
### Task F1: Wrap `ExternalSystem.Call` in `ScriptRuntimeContext` to emit audit
|
||||
|
||||
**Files:**
|
||||
- Modify: `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs` — find the existing `ExternalSystem.Call` method (or add one if scripts call through a dynamic API surface). Inside, after `_externalSystemClient.CallAsync(...)` returns OR throws, build the `AuditEvent` (channel=`ApiOutbound`, kind=`ApiCall`, status=`Delivered` for success, `Failed` for HTTP non-2xx or exception, populate `Target=$"{systemName}.{methodName}"`, `SourceSiteId={siteId}`, `SourceInstanceId={instanceName}`, `SourceScript={sourceScript}`, `DurationMs={stopwatch}`, `HttpStatus`, `ErrorMessage`). Call `_auditWriter.WriteAsync(evt)` inside a try/catch that swallows + logs at Warning + increments `SiteAuditWriteFailures` (via the same counter Bundle G defines). Re-throw the original ExternalSystem exception (if any) so the script sees its original error path unchanged.
|
||||
- Modify: `src/ScadaLink.SiteRuntime/Scripts/ScriptRuntimeContext.cs` constructor — inject `IAuditWriter`.
|
||||
- Modify: `src/ScadaLink.SiteRuntime/Actors/ScriptExecutionActor.cs` — resolve and pass `IAuditWriter` into the ScriptRuntimeContext.
|
||||
- Test: `tests/ScadaLink.SiteRuntime.Tests/Scripts/ExternalSystemCallAuditEmissionTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Call_Success_EmitsOneEvent_Channel_ApiOutbound_Kind_ApiCall_Status_Delivered`.
|
||||
2. `Call_HTTP500_EmitsEvent_Status_Failed_HttpStatus_500_ErrorMessage_Set`.
|
||||
3. `Call_HTTP400_EmitsEvent_Status_Failed_HttpStatus_400`.
|
||||
4. `Call_ClientThrows_NetworkError_EmitsEvent_Status_Failed_ErrorMessage_SetFromException`.
|
||||
5. `AuditWriter_Throws_Script_Call_Returns_Original_Result_Unchanged_Audit_Failure_Counter_Incremented`.
|
||||
6. `Provenance_Populated_FromContext` — SourceInstanceId, SourceScript, SourceSiteId all match the ScriptRuntimeContext's values.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement wrapper + provenance threading.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(siteruntime): ExternalSystem.Call emits Audit Log #23 event on every sync call`.
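
The fail-safe ordering in Task F1 (time the call, emit the audit event on both paths, never let an audit failure change what the script sees) can be sketched as a generic wrapper. This is an illustration under assumptions: `AuditedCall`, `InvokeAsync`, `emitAudit`, and `onAuditFailure` are hypothetical names, and the HTTP non-2xx case, which the plan maps to `Failed` without an exception, is left out.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical wrapper: audit emission is best-effort and never alters the call's outcome.
public static class AuditedCall
{
    public static async Task<TResult> InvokeAsync<TResult>(
        Func<Task<TResult>> call,
        Func<Exception?, long, Task> emitAudit, // (error or null, duration in ms)
        Action onAuditFailure)                  // e.g. bump the write-failure counter
    {
        var stopwatch = Stopwatch.StartNew();
        TResult result;
        try
        {
            result = await call();
        }
        catch (Exception callError)
        {
            await EmitSafelyAsync(emitAudit, callError, stopwatch.ElapsedMilliseconds, onAuditFailure);
            throw; // the script sees its original exception unchanged
        }

        await EmitSafelyAsync(emitAudit, null, stopwatch.ElapsedMilliseconds, onAuditFailure);
        return result;
    }

    private static async Task EmitSafelyAsync(
        Func<Exception?, long, Task> emitAudit, Exception? error, long durationMs, Action onAuditFailure)
    {
        try
        {
            await emitAudit(error, durationMs);
        }
        catch (Exception)
        {
            // Even the failure counter must not throw into the script.
            try { onAuditFailure(); } catch (Exception) { }
        }
    }
}
```

Using a bare `throw;` inside the catch block keeps the original stack trace, which matches the requirement that the script's error path stays unchanged.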
|
||||
|
||||
**Bundle F acceptance:** SiteRuntime.Tests still green; 6 new tests.
|
||||
|
||||
---
|
||||
|
||||
## Bundle G — Health metric `SiteAuditWriteFailures`
|
||||
|
||||
### Task G1: Counter + DI surface
|
||||
|
||||
**Files:**
|
||||
- Create: `src/ScadaLink.AuditLog/Site/IAuditWriteFailureCounter.cs` — `void Increment();`. Bundle B's `FallbackAuditWriter` already takes this.
|
||||
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthCollector.cs` — add `int _siteAuditWriteFailures` field + `IncrementSiteAuditWriteFailures()` method using `Interlocked.Increment`. Expose via a snapshot read.
|
||||
- Modify: `src/ScadaLink.HealthMonitoring/SiteHealthState.cs` — add `SiteAuditWriteFailures` property to the report payload.
|
||||
- Implementation: a small adapter class `HealthMetricsAuditWriteFailureCounter : IAuditWriteFailureCounter` registered in DI that bridges to `ISiteHealthCollector.IncrementSiteAuditWriteFailures()`.
|
||||
- Test: `tests/ScadaLink.HealthMonitoring.Tests/SiteAuditWriteFailuresMetricTests.cs`.
|
||||
|
||||
**Tests:**
|
||||
1. `Increment_Three_Times_Counter_Reports_3`.
|
||||
2. `Report_Payload_Includes_SiteAuditWriteFailures`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing tests.
|
||||
2. Implement counter + adapter + DI registration.
|
||||
3. Run: pass.
|
||||
4. Commit: `feat(health): SiteAuditWriteFailures counter (#23)`.
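
The counter in Task G1 is small enough to show in full. A minimal sketch, assuming hypothetical names (`IFailureCounter`, `InterlockedFailureCounter`, `Snapshot`) in place of the plan's `IAuditWriteFailureCounter` and the `SiteHealthCollector` field:

```csharp
using System.Threading;

// Hypothetical stand-in for the plan's failure-counter interface.
public interface IFailureCounter
{
    void Increment();
}

// Lock-free counter that is safe to bump from any thread.
public sealed class InterlockedFailureCounter : IFailureCounter
{
    private int _count;

    public void Increment() => Interlocked.Increment(ref _count);

    // Point-in-time read for the health report payload.
    public int Snapshot() => Volatile.Read(ref _count);
}
```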
|
||||
|
||||
**Bundle G acceptance:** HealthMonitoring.Tests still green; 2 new tests.
|
||||
|
||||
---
|
||||
|
||||
## Bundle H — Component-level integration test
|
||||
|
||||
### Task H1: End-to-end via TestKit + MSSQL fixture
|
||||
|
||||
**Files:**
|
||||
- Create: `tests/ScadaLink.AuditLog.Tests/Integration/SyncCallEmissionEndToEndTests.cs` — uses `MsSqlMigrationFixture` (the M1 reusable fixture; depend on `Xunit.SkippableFact`):
|
||||
- Brings up `SqliteAuditWriter` against `:memory:`.
|
||||
- Brings up `SiteAuditTelemetryActor` via TestKit.
|
||||
- Brings up `AuditLogIngestActor` via TestKit, configured with the MSSQL `IAuditLogRepository` from M1.
|
||||
- Stubs the gRPC client by overriding the actor's gRPC dependency with a direct `IActorRef`-backed mock that forwards `IngestAuditEvents` directly to the central actor.
|
||||
- Writes one `AuditEvent` via the FallbackAuditWriter.
|
||||
- Drives a `Drain` tick on the telemetry actor.
|
||||
- Asserts the row appears in the MS SQL `AuditLog` table within 5 seconds via `IAuditLogRepository.QueryAsync`.
|
||||
|
||||
**Steps:**
|
||||
1. Failing test (telemetry not yet wired).
|
||||
2. Wire the components together via the test harness.
|
||||
3. Run: pass.
|
||||
4. Commit: `test(auditlog): end-to-end sync-call emission via TestKit + MSSQL fixture (#23)`.
|
||||
|
||||
**Bundle H acceptance:** New test passes when MSSQL container is up; skips cleanly when down.
|
||||
|
||||
---
|
||||
|
||||
## Final cross-bundle review
|
||||
|
||||
After Bundles A–H, dispatch a final reviewer agent with the same template as M1's. Acceptance gate: full `dotnet test ScadaLink.slnx` green. Then merge `--no-ff` with summary; update M3–M8 with M2 realities; status paragraph; proceed to M3.
|
||||
153
src/ScadaLink.AuditLog/Central/AuditLogIngestActor.cs
Normal file
@@ -0,0 +1,153 @@
|
||||
using Akka.Actor;
|
||||
using Microsoft.Extensions.DependencyInjection;
|
||||
using Microsoft.Extensions.Logging;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Interfaces.Repositories;
|
||||
using ScadaLink.Commons.Messages.Audit;
|
||||
|
||||
namespace ScadaLink.AuditLog.Central;
|
||||
|
||||
/// <summary>
|
||||
/// Central-side singleton (per Bundle E wiring) that ingests batches of
|
||||
/// <see cref="AuditEvent"/> rows pushed from sites via the
|
||||
/// <c>IngestAuditEvents</c> gRPC RPC. Each row is stamped with the central-side
|
||||
/// <see cref="AuditEvent.IngestedAtUtc"/> and inserted idempotently via
|
||||
/// <see cref="IAuditLogRepository.InsertIfNotExistsAsync"/> — duplicates are
|
||||
/// silently swallowed (first-write-wins per Bundle A's hardening).
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// Idempotency is the contract: a row that already exists at central counts
|
||||
/// as "accepted" for the purposes of the reply, because the storage state is
|
||||
/// consistent and the site is free to flip its local row to <c>Forwarded</c>.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// Per Bundle D's brief, audit-write failures must NEVER abort the user-facing
|
||||
/// action. The actor wraps each repository call in its own try/catch so a
|
||||
/// single bad row cannot cause the rest of the batch to be lost; the actor's
|
||||
/// <see cref="SupervisorStrategy"/> uses <c>Resume</c> so a thrown exception
|
||||
/// inside <c>ReceiveAsync</c> does not restart the actor (which would also
|
||||
/// reset any in-flight state).
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// Two constructors exist for a deliberate reason: Bundle D's tests inject a
|
||||
/// concrete <see cref="IAuditLogRepository"/> against a per-test MSSQL fixture
|
||||
/// (the only way to verify the IngestedAtUtc stamp + duplicate-key idempotency
|
||||
/// end to end), while Bundle E's host wiring registers the actor as a cluster
|
||||
/// singleton and must therefore resolve the repository — which is a scoped EF
|
||||
/// Core service — from a fresh DI scope per message. Mirroring the Notification
|
||||
/// Outbox actor's pattern.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
public class AuditLogIngestActor : ReceiveActor
{
    private readonly IServiceProvider? _serviceProvider;
    private readonly IAuditLogRepository? _injectedRepository;
    private readonly ILogger<AuditLogIngestActor> _logger;

    /// <summary>
    /// Test-mode constructor: injects a concrete repository instance whose
    /// lifetime exceeds the test, so the actor reuses the same instance across
    /// every message. Used by Bundle D's MSSQL-backed TestKit fixture.
    /// </summary>
    public AuditLogIngestActor(
        IAuditLogRepository repository,
        ILogger<AuditLogIngestActor> logger)
    {
        ArgumentNullException.ThrowIfNull(repository);
        ArgumentNullException.ThrowIfNull(logger);

        _injectedRepository = repository;
        _logger = logger;

        ReceiveAsync<IngestAuditEventsCommand>(OnIngestAsync);
    }

    /// <summary>
    /// Production constructor: resolves <see cref="IAuditLogRepository"/> from
    /// a fresh DI scope per message because the repository is a scoped EF Core
    /// service registered by <c>AddConfigurationDatabase</c>. The actor itself
    /// is a long-lived cluster singleton, so it cannot hold a scope across
    /// messages.
    /// </summary>
    public AuditLogIngestActor(
        IServiceProvider serviceProvider,
        ILogger<AuditLogIngestActor> logger)
    {
        ArgumentNullException.ThrowIfNull(serviceProvider);
        ArgumentNullException.ThrowIfNull(logger);

        _serviceProvider = serviceProvider;
        _logger = logger;

        ReceiveAsync<IngestAuditEventsCommand>(OnIngestAsync);
    }

    /// <summary>
    /// Audit-write failures are best-effort by design (see alog.md §13): a
    /// thrown exception in the ingest pipeline must not crash the actor.
    /// Resume keeps the actor's state intact so the next batch is processed
    /// against the same repository instance.
    /// </summary>
    protected override SupervisorStrategy SupervisorStrategy()
    {
        // Resume on every exception, as documented above; the default decider
        // would restart on most exceptions instead.
        return new OneForOneStrategy(maxNrOfRetries: 0, withinTimeRange: TimeSpan.Zero, decider:
            Decider.From(Directive.Resume));
    }

    private async Task OnIngestAsync(IngestAuditEventsCommand cmd)
    {
        // Sender is captured before the first await: Akka resets Sender
        // between message dispatches, so a post-await Tell would go to
        // DeadLetters.
        var replyTo = Sender;
        var nowUtc = DateTime.UtcNow;
        var accepted = new List<Guid>(cmd.Events.Count);

        // Resolve the repository for the whole batch (one DbContext per
        // message, mirroring NotificationOutboxActor). The injected-repository
        // mode (Bundle D tests) skips the scope entirely.
        IServiceScope? scope = null;
        IAuditLogRepository repository;
        if (_injectedRepository is not null)
        {
            repository = _injectedRepository;
        }
        else
        {
            scope = _serviceProvider!.CreateScope();
            repository = scope.ServiceProvider.GetRequiredService<IAuditLogRepository>();
        }

        try
        {
            foreach (var evt in cmd.Events)
            {
                try
                {
                    // Stamp IngestedAtUtc here, not at the site. Bundle A's
                    // repository hardening already swallows duplicate-key races,
                    // so the same id arriving twice (site retry, reconciliation)
                    // is a silent no-op.
                    var ingested = evt with { IngestedAtUtc = nowUtc };
                    await repository.InsertIfNotExistsAsync(ingested).ConfigureAwait(false);
                    accepted.Add(evt.EventId);
                }
                catch (Exception ex)
                {
                    // Per-row catch: one bad row never sinks the whole batch.
                    // The row stays Pending at the site; the next drain retries.
                    _logger.LogError(ex,
                        "Failed to persist audit event {EventId} during batch ingest; row will be retried by the site.",
                        evt.EventId);
                }
            }
        }
        finally
        {
            scope?.Dispose();
        }

        replyTo.Tell(new IngestAuditEventsReply(accepted));
    }
}
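
The per-row acknowledgement contract above can be illustrated without Akka or SQL Server. The following is a minimal sketch, not the actor's actual code: `InMemoryAuditRepository` and `IngestSketch` are hypothetical names, and event ids stand in for full `AuditEvent` records. It shows that a duplicate id still counts as accepted while a failing row is simply left out of the reply.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for IAuditLogRepository (not the EF Core one).
public sealed class InMemoryAuditRepository
{
    private readonly HashSet<Guid> _rows = new();
    private readonly Guid _poison;

    // "poison" is an id whose insert always fails, to simulate one bad row.
    public InMemoryAuditRepository(Guid poison) => _poison = poison;

    public int RowCount => _rows.Count;

    // A duplicate id is a silent no-op, like the hardened InsertIfNotExistsAsync.
    public Task InsertIfNotExistsAsync(Guid eventId)
    {
        if (eventId == _poison)
        {
            throw new InvalidOperationException("simulated storage failure");
        }

        _rows.Add(eventId);
        return Task.CompletedTask;
    }
}

public static class IngestSketch
{
    // Per-row try/catch: a failing row is left out of the reply, the rest are acked.
    public static async Task<List<Guid>> IngestAsync(
        InMemoryAuditRepository repository, IReadOnlyList<Guid> eventIds)
    {
        var accepted = new List<Guid>(eventIds.Count);
        foreach (var id in eventIds)
        {
            try
            {
                await repository.InsertIfNotExistsAsync(id);
                accepted.Add(id);
            }
            catch (Exception)
            {
                // Not acked: the site keeps the row Pending and retries later.
            }
        }

        return accepted;
    }
}
```

Sending the same id in a second batch returns it in the accepted list again while the stored row count stays at one, which is what lets the site flip its local row to `Forwarded` after a retry.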
@@ -8,7 +8,12 @@
  </PropertyGroup>

  <ItemGroup>
    <!-- Bundle D D1: SiteAuditTelemetryActor + (D2) AuditLogIngestActor live
         in this project, so Akka is an explicit dependency. -->
    <PackageReference Include="Akka" />
    <PackageReference Include="Microsoft.Data.Sqlite" />
    <PackageReference Include="Microsoft.Extensions.DependencyInjection.Abstractions" />
    <PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
    <PackageReference Include="Microsoft.Extensions.Options" />
    <PackageReference Include="Microsoft.Extensions.Options.ConfigurationExtensions" />
  </ItemGroup>
@@ -19,6 +24,8 @@
         IAuditLogRepository is registered by ScadaLink.ConfigurationDatabase; the project
         reference is documented here so M2 writers + telemetry actors can depend on it. -->
    <ProjectReference Include="../ScadaLink.ConfigurationDatabase/ScadaLink.ConfigurationDatabase.csproj" />
    <!-- Communication carries the IngestAuditEvents proto + DTOs (#23 M2 site sync). -->
    <ProjectReference Include="../ScadaLink.Communication/ScadaLink.Communication.csproj" />
  </ItemGroup>

  <ItemGroup>

@@ -1,44 +1,139 @@
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ScadaLink.AuditLog.Configuration;
using ScadaLink.AuditLog.Site;
using ScadaLink.AuditLog.Site.Telemetry;
using ScadaLink.Commons.Interfaces.Services;

namespace ScadaLink.AuditLog;

/// <summary>
/// Composition root for the Audit Log (#23) component. M1 registers
/// <see cref="AuditLogOptions"/> and its validator; later milestones extend
/// this method to wire up writers, telemetry actors, and the central ingest
/// pipeline. Audit Log (#23) sits alongside Notification Outbox (#21) and
/// Site Call Audit (#22).
/// Composition root for the Audit Log (#23) component.
/// </summary>
/// <remarks>
/// <para>
/// M1 registered <see cref="AuditLogOptions"/> + the validator. M2 Bundle E
/// extends the surface with the site-side writer chain
/// (<see cref="SqliteAuditWriter"/> + <see cref="RingBufferFallback"/> +
/// <see cref="FallbackAuditWriter"/>) and the telemetry collaborators
/// (<see cref="ISiteAuditQueue"/>, <see cref="ISiteStreamAuditClient"/>,
/// <see cref="IAuditWriteFailureCounter"/>, <see cref="SiteAuditTelemetryOptions"/>,
/// <see cref="SqliteAuditWriterOptions"/>).
/// </para>
/// <para>
/// Audit Log (#23) sits alongside Notification Outbox (#21) and Site Call
/// Audit (#22). <c>IAuditLogRepository</c> is registered by
/// <c>ScadaLink.ConfigurationDatabase.ServiceCollectionExtensions.AddConfigurationDatabase</c>,
/// so the caller (the Host on the central node) must also call that.
/// </para>
/// </remarks>
public static class ServiceCollectionExtensions
{
    /// <summary>Configuration section bound to <see cref="AuditLogOptions"/>.</summary>
    public const string ConfigSectionName = "AuditLog";

    /// <summary>Configuration section bound to <see cref="SqliteAuditWriterOptions"/>.</summary>
    public const string SiteWriterSectionName = "AuditLog:SiteWriter";

    /// <summary>Configuration section bound to <see cref="SiteAuditTelemetryOptions"/>.</summary>
    public const string SiteTelemetrySectionName = "AuditLog:SiteTelemetry";

    /// <summary>
    /// Binds <see cref="AuditLogOptions"/> from the
    /// <see cref="ConfigSectionName"/> section of <paramref name="config"/>
    /// and registers <see cref="AuditLogOptionsValidator"/> so a misconfigured
    /// <c>AuditLog</c> section is rejected with a key-naming message when the
    /// options are first resolved (or at startup when consumers wire in
    /// <c>ValidateOnStart()</c>). M2+ will register writers, telemetry actors,
    /// and the central ingest pipeline here. <c>IAuditLogRepository</c> is
    /// registered by
    /// <c>ScadaLink.ConfigurationDatabase.ServiceCollectionExtensions.AddConfigurationDatabase</c>,
    /// so the caller (the Host on the central node) must also call that.
    /// Registers the Audit Log (#23) component services: options, the site
    /// SQLite writer chain (primary + ring fallback + failure-counter sink),
    /// and the site→central telemetry collaborators. Idempotent re-registration
    /// is not supported; call this exactly once per <see cref="IServiceCollection"/>.
    /// </summary>
    public static IServiceCollection AddAuditLog(this IServiceCollection services, IConfiguration config)
    {
        ArgumentNullException.ThrowIfNull(services);
        ArgumentNullException.ThrowIfNull(config);

        // M1: top-level AuditLogOptions + validator (redaction policy, payload caps, etc.).
        services.AddOptions<AuditLogOptions>()
            .Bind(config.GetSection(ConfigSectionName))
            .ValidateOnStart();
        services.AddSingleton<IValidateOptions<AuditLogOptions>, AuditLogOptionsValidator>();

        // M2 Bundle E: site writer + telemetry options bindings.
        // BindConfiguration is not used because the configuration root supplied
        // by the caller may not be the application root; we go through the
        // section explicitly so a partial IConfiguration (e.g. a test stub
        // anchored on the AuditLog section's parent) still works.
        services.AddOptions<SqliteAuditWriterOptions>()
            .Bind(config.GetSection(SiteWriterSectionName));
        services.AddOptions<SiteAuditTelemetryOptions>()
            .Bind(config.GetSection(SiteTelemetrySectionName));

        // SqliteAuditWriter is a singleton with a single owned SqliteConnection
        // and a background writer Task; multiple instances would race on the
        // same file. Registered concretely so the ISiteAuditQueue + IAuditWriter
        // forwards below resolve to the same instance: the actor must observe
        // the writes made via the hot-path interface.
        services.AddSingleton<SqliteAuditWriter>();
        services.AddSingleton<ISiteAuditQueue>(sp => sp.GetRequiredService<SqliteAuditWriter>());

        // RingBufferFallback: drop-oldest in-memory ring used by
        // FallbackAuditWriter when the primary SQLite writer throws. Default
        // capacity is fine for M2 (1024).
        services.AddSingleton<RingBufferFallback>();

        // IAuditWriteFailureCounter: NoOp default. Bundle G overrides this
        // binding with the real Site Health Monitoring counter. Registered
        // before FallbackAuditWriter so the factory can resolve it.
        services.AddSingleton<IAuditWriteFailureCounter, NoOpAuditWriteFailureCounter>();

        // The script-thread surface is FallbackAuditWriter (primary + ring +
        // counter), not the raw SqliteAuditWriter: primary failures must NEVER
        // abort the user-facing action.
        services.AddSingleton<IAuditWriter>(sp => new FallbackAuditWriter(
            primary: sp.GetRequiredService<SqliteAuditWriter>(),
            ring: sp.GetRequiredService<RingBufferFallback>(),
            failureCounter: sp.GetRequiredService<IAuditWriteFailureCounter>(),
            logger: sp.GetRequiredService<ILogger<FallbackAuditWriter>>()));

        // ISiteStreamAuditClient: NoOp default. M6's reconciliation work brings
        // the real gRPC-backed implementation (no site→central gRPC channel
        // exists today; sites talk to central via Akka ClusterClient only).
        // Bundle H's integration test substitutes a stub directly into the
        // SiteAuditTelemetryActor's Props.Create call.
        services.AddSingleton<ISiteStreamAuditClient, NoOpSiteStreamAuditClient>();

        return services;
    }

    /// <summary>
    /// Audit Log (#23) M2 Bundle G: swap the default
    /// <see cref="NoOpAuditWriteFailureCounter"/> registration for the real
    /// <see cref="HealthMetricsAuditWriteFailureCounter"/> bridge so the
    /// FallbackAuditWriter primary-failure counter surfaces in the site health
    /// report payload as <c>SiteHealthReport.SiteAuditWriteFailures</c>.
    /// </summary>
    /// <remarks>
    /// <para>
    /// Must be called AFTER both <see cref="AddAuditLog"/> (registers the
    /// NoOp default this method replaces) and
    /// <c>ScadaLink.HealthMonitoring.ServiceCollectionExtensions.AddHealthMonitoring</c>
    /// or <c>AddSiteHealthMonitoring</c> (registers the
    /// <see cref="ISiteHealthCollector"/> the bridge depends on). Resolving
    /// <see cref="IAuditWriteFailureCounter"/> without the latter throws
    /// <see cref="InvalidOperationException"/> at <c>GetRequiredService</c>
    /// time. This is by design, since a silent NoOp would mask a misconfiguration.
    /// </para>
    /// <para>
    /// Idempotent: calling twice replaces the descriptor each time without
    /// piling up registrations.
    /// </para>
    /// </remarks>
    public static IServiceCollection AddAuditLogHealthMetricsBridge(this IServiceCollection services)
    {
        ArgumentNullException.ThrowIfNull(services);

        services.Replace(
            ServiceDescriptor.Singleton<IAuditWriteFailureCounter, HealthMetricsAuditWriteFailureCounter>());
        return services;
    }
}
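
Two wiring techniques in this file are worth seeing in isolation: forwarding registrations that expose one concrete singleton through several interfaces, and `Replace` swapping a default descriptor rather than stacking a second one. The sketch below is a simplified illustration, assuming the `Microsoft.Extensions.DependencyInjection` package is referenced; every type name in it (`SqliteLikeWriter`, `IWriter`, `IQueue`, `IFailureCounter`, `NoOpCounter`, `CountingCounter`, `WiringSketch`) is a hypothetical stand-in, not one of the ScadaLink types.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins for the real audit types.
public interface IWriter { }
public interface IQueue { }
public sealed class SqliteLikeWriter : IWriter, IQueue { }

public interface IFailureCounter { void Increment(); }
public sealed class NoOpCounter : IFailureCounter
{
    public void Increment() { }
}
public sealed class CountingCounter : IFailureCounter
{
    public int Value { get; private set; }
    public void Increment() => Value++;
}

public static class WiringSketch
{
    public static ServiceProvider Build(bool withBridge)
    {
        var services = new ServiceCollection();

        // One concrete singleton, exposed through two interfaces via forwarding factories.
        services.AddSingleton<SqliteLikeWriter>();
        services.AddSingleton<IWriter>(sp => sp.GetRequiredService<SqliteLikeWriter>());
        services.AddSingleton<IQueue>(sp => sp.GetRequiredService<SqliteLikeWriter>());

        // NoOp default first; Replace removes that descriptor and adds the new one.
        services.AddSingleton<IFailureCounter, NoOpCounter>();
        if (withBridge)
        {
            services.Replace(ServiceDescriptor.Singleton<IFailureCounter, CountingCounter>());
        }

        return services.BuildServiceProvider();
    }
}
```

With `withBridge: true`, resolving `IFailureCounter` yields the counting implementation and only one registration exists for that service type; both interface lookups return the same `SqliteLikeWriter` instance in either mode.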

125
src/ScadaLink.AuditLog/Site/FallbackAuditWriter.cs
Normal file
@@ -0,0 +1,125 @@
using Microsoft.Extensions.Logging;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Interfaces.Services;

namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Composes the primary <see cref="SqliteAuditWriter"/> with a drop-oldest
/// <see cref="RingBufferFallback"/>. Audit writes are best-effort by contract
/// (see <see cref="IAuditWriter"/>): a primary failure must NEVER bubble out
/// to the calling script. Failed events are stashed in the ring; on the next
/// successful primary write the ring is drained back through the primary in
/// FIFO order.
/// </summary>
/// <remarks>
/// <para>
/// Each primary failure increments <see cref="IAuditWriteFailureCounter"/> so
/// Site Health Monitoring can surface a sustained outage as
/// <c>SiteAuditWriteFailures</c> (Bundle G).
/// </para>
/// <para>
/// Errors raised by the ring drain on recovery are logged and silently dropped
/// so we don't loop the failure mode: the trigger event itself succeeded, and
/// retrying the drain on the NEXT successful write is the recovery path.
/// </para>
/// </remarks>
public sealed class FallbackAuditWriter : IAuditWriter
{
    private readonly IAuditWriter _primary;
    private readonly RingBufferFallback _ring;
    private readonly IAuditWriteFailureCounter _failureCounter;
    private readonly ILogger<FallbackAuditWriter> _logger;
    private readonly SemaphoreSlim _drainGate = new(1, 1);

    public FallbackAuditWriter(
        IAuditWriter primary,
        RingBufferFallback ring,
        IAuditWriteFailureCounter failureCounter,
        ILogger<FallbackAuditWriter> logger)
    {
        _primary = primary ?? throw new ArgumentNullException(nameof(primary));
        _ring = ring ?? throw new ArgumentNullException(nameof(ring));
        _failureCounter = failureCounter ?? throw new ArgumentNullException(nameof(failureCounter));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    public async Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        ArgumentNullException.ThrowIfNull(evt);

        try
        {
            await _primary.WriteAsync(evt, ct).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            // Primary down: record the failure, stash in the ring, return
            // success to the caller. Audit-write failures NEVER abort the
            // user-facing action (alog.md §7). DO NOT attempt the ring drain
            // here: primary is throwing, draining would just scramble FIFO
            // order across re-enqueues.
            _failureCounter.Increment();
            _logger.LogWarning(ex,
                "Primary audit writer threw; routing EventId {EventId} to drop-oldest ring.",
                evt.EventId);
            _ring.TryEnqueue(evt);
            return;
        }

        // Primary succeeded: opportunistically drain anything that piled up
        // in the ring during the outage. Best-effort: a failure during the
        // drain re-enqueues the popped event and is logged; the next
        // successful write will retry. Drain order in the audit log is
        // therefore: <triggering event>, <backlog FIFO>.
        if (_ring.Count > 0)
        {
            await TryDrainRingAsync(ct).ConfigureAwait(false);
        }
    }

    private async Task TryDrainRingAsync(CancellationToken ct)
    {
        // Serialise drains so two concurrent recoveries don't double-replay.
        if (!await _drainGate.WaitAsync(0, ct).ConfigureAwait(false))
        {
            return;
        }

        try
        {
            // Pull only what is currently buffered; do NOT wait for new events.
            // We iterate with a snapshot of Count so we never starve under
            // concurrent enqueues.
            var pending = _ring.Count;
            for (var i = 0; i < pending; i++)
            {
                if (!_ring.TryDequeue(out var queued))
                {
                    break;
                }

                try
                {
                    await _primary.WriteAsync(queued, ct).ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    // Primary fell over again. Putting the event back at the
                    // head of the queue is impossible with Channel<T>, so route
                    // it to the tail (drop-oldest preserves the most-recent
                    // picture) and stop draining until the next successful write.
                    _failureCounter.Increment();
                    _logger.LogWarning(ex,
                        "Ring drain re-throw on EventId {EventId}; re-enqueuing.",
                        queued.EventId);
                    _ring.TryEnqueue(queued);
                    break;
                }
            }
        }
        finally
        {
            _drainGate.Release();
        }
    }
}
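
The write ordering described above (triggering event first, then the backlog in FIFO order) is easy to check with a small synchronous model. This is a sketch under simplifying assumptions, not the class in this diff: `FallbackSketch` is a hypothetical name, events are plain strings, the primary is a delegate that returns `false` on failure, and there is no concurrency, logging, or `Channel<T>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, synchronous miniature of the primary + ring + counter pattern.
public sealed class FallbackSketch
{
    private readonly Func<string, bool> _primary; // returns false when the write fails
    private readonly Queue<string> _ring = new();
    private readonly int _capacity;

    public int Failures { get; private set; }

    public FallbackSketch(Func<string, bool> primary, int capacity = 1024)
    {
        _primary = primary ?? throw new ArgumentNullException(nameof(primary));
        _capacity = capacity;
    }

    public void Write(string evt)
    {
        if (!_primary(evt))
        {
            // Primary down: count it, stash it, and return normally to the caller.
            Failures++;
            Stash(evt);
            return;
        }

        // Primary is healthy: replay only what was buffered at this moment, oldest first.
        var pending = _ring.Count;
        for (var i = 0; i < pending; i++)
        {
            var queued = _ring.Dequeue();
            if (!_primary(queued))
            {
                Failures++;
                Stash(queued); // goes to the tail; retried on the next successful write
                break;
            }
        }
    }

    private void Stash(string evt)
    {
        if (_ring.Count == _capacity)
        {
            _ring.Dequeue(); // drop-oldest keeps the most recent events
        }

        _ring.Enqueue(evt);
    }
}
```

Writing `"a"` and `"b"` while the primary is failing, then `"c"` after it recovers, persists the events in the order `c, a, b` and leaves the failure count at two.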
@@ -0,0 +1,33 @@
using ScadaLink.HealthMonitoring;

namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Audit Log (#23) M2 Bundle G: bridges <see cref="IAuditWriteFailureCounter"/>
/// (incremented by <see cref="FallbackAuditWriter"/> every time the primary
/// SQLite writer throws) into <see cref="ISiteHealthCollector"/> so the count
/// surfaces in the site health report payload as
/// <c>SiteHealthReport.SiteAuditWriteFailures</c>.
/// </summary>
/// <remarks>
/// <para>
/// Registered by <see cref="ServiceCollectionExtensions.AddAuditLogHealthMetricsBridge"/>;
/// callers must register <c>AddHealthMonitoring()</c> first so
/// <see cref="ISiteHealthCollector"/> resolves. The default
/// <see cref="ServiceCollectionExtensions.AddAuditLog"/> registration keeps
/// <see cref="NoOpAuditWriteFailureCounter"/> for nodes where Site Health
/// Monitoring is not wired (the silent-sink contract: audit write failures
/// must NEVER abort the user-facing action, alog.md §7).
/// </para>
/// </remarks>
public sealed class HealthMetricsAuditWriteFailureCounter : IAuditWriteFailureCounter
{
    private readonly ISiteHealthCollector _collector;

    public HealthMetricsAuditWriteFailureCounter(ISiteHealthCollector collector)
    {
        _collector = collector ?? throw new ArgumentNullException(nameof(collector));
    }

    /// <inheritdoc/>
    public void Increment() => _collector.IncrementSiteAuditWriteFailures();
}
14
src/ScadaLink.AuditLog/Site/IAuditWriteFailureCounter.cs
Normal file
@@ -0,0 +1,14 @@
namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Lightweight counter sink invoked by <see cref="FallbackAuditWriter"/> every
/// time the primary <see cref="SqliteAuditWriter"/> throws on an audit write.
/// Bundle G (M2-T11) implements this as a thread-safe Interlocked counter
/// bridged into the Site Health Monitoring report payload as
/// <c>SiteAuditWriteFailures</c>.
/// </summary>
public interface IAuditWriteFailureCounter
{
    /// <summary>Increment the audit-write failure counter by one.</summary>
    void Increment();
}
25
src/ScadaLink.AuditLog/Site/NoOpAuditWriteFailureCounter.cs
Normal file
@@ -0,0 +1,25 @@
namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Default <see cref="IAuditWriteFailureCounter"/> registered by
/// <see cref="ScadaLink.AuditLog.ServiceCollectionExtensions.AddAuditLog"/> on
/// every node. Bundle G replaces this binding with a real counter that bridges
/// into the Site Health Monitoring report payload as
/// <c>SiteAuditWriteFailures</c>; until then,
/// <see cref="FallbackAuditWriter"/> emits to a silent sink rather than NRE-ing
/// on a null collaborator.
/// </summary>
/// <remarks>
/// Audit-write failures must NEVER abort the user-facing action (alog.md §7),
/// so the counter is best-effort by contract. A NoOp default is the correct
/// safe fallback while the health metric is being wired in.
/// </remarks>
public sealed class NoOpAuditWriteFailureCounter : IAuditWriteFailureCounter
{
    /// <inheritdoc/>
    public void Increment()
    {
        // Intentionally empty. Bundle G overrides this binding with the real
        // counter once Site Health Monitoring is wired.
    }
}
115
src/ScadaLink.AuditLog/Site/RingBufferFallback.cs
Normal file
@@ -0,0 +1,115 @@
using System.Runtime.CompilerServices;
using System.Threading.Channels;
using ScadaLink.Commons.Entities.Audit;

namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Drop-oldest in-memory ring buffer used by <see cref="FallbackAuditWriter"/>
/// when the primary SQLite writer is throwing. Capacity is fixed at construction
/// (default 1024). When full, the oldest event is silently dropped to make room
/// for the newest, which preserves the most recent picture of activity in the
/// face of an extended SQLite outage, and <see cref="RingBufferOverflowed"/> is
/// raised so a health counter can record the loss.
/// </summary>
/// <remarks>
/// <para>
/// Backed by a <see cref="Channel{T}"/> with
/// <see cref="BoundedChannelFullMode.DropOldest"/>. The channel is not asked to
/// notify on drop, so this class reads <c>Reader.Count</c> just before each
/// enqueue: if the ring was already at capacity, the write displaced exactly
/// one event.
/// </para>
/// <para>
/// Per the M2 plan: the ring is the absolute-last-resort buffer for the
/// hot-path; it is NOT a substitute for the bounded
/// <see cref="SqliteAuditWriter"/> write queue.
/// </para>
/// </remarks>
public sealed class RingBufferFallback
{
    private readonly Channel<AuditEvent> _channel;
    private readonly int _capacity;

    /// <summary>
    /// Raised once each time a drop-oldest overflow occurs. Hooked by
    /// <see cref="FallbackAuditWriter"/>'s health counter wiring.
    /// </summary>
    public event Action? RingBufferOverflowed;

    public RingBufferFallback(int capacity = 1024)
    {
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity), "capacity must be > 0.");
        }

        _capacity = capacity;
        _channel = Channel.CreateBounded<AuditEvent>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
            SingleReader = true,
            SingleWriter = false,
        });
    }

    /// <summary>Current event count in the ring (for diagnostics/tests).</summary>
    public int Count => _channel.Reader.Count;

    /// <summary>
    /// Try to enqueue an event. Returns <see langword="true"/> on success (even
    /// when an overflow caused an older event to be dropped); returns
    /// <see langword="false"/> only when the ring has been
    /// <see cref="Complete"/>-d.
    /// </summary>
    public bool TryEnqueue(AuditEvent evt)
    {
        ArgumentNullException.ThrowIfNull(evt);

        // DropOldest TryWrite always succeeds unless the channel is completed.
        // Detect overflow by reading the count before the write: if the ring
        // was already at capacity, exactly one event was dropped to make room
        // for evt. The read is not atomic with the write, so the check is
        // approximate under concurrent writers.
        var beforeCount = _channel.Reader.Count;
        if (!_channel.Writer.TryWrite(evt))
        {
            return false;
        }

        if (beforeCount >= _capacity)
        {
            // The new event displaced an existing one.
            RingBufferOverflowed?.Invoke();
        }

        return true;
    }

    /// <summary>
    /// Drain the ring in FIFO order. Yields available events immediately and
    /// then completes when the channel is empty AND <see cref="Complete"/> has
    /// been called. Callers that only want to drain what's currently buffered
    /// must call <see cref="Complete"/> first.
    /// </summary>
    public async IAsyncEnumerable<AuditEvent> DrainAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var evt in _channel.Reader.ReadAllAsync(cancellationToken).ConfigureAwait(false))
        {
            yield return evt;
        }
    }

    /// <summary>
    /// Non-blocking single-item dequeue used by the
    /// <see cref="FallbackAuditWriter"/> recovery path. Returns
    /// <see langword="false"/> when the ring is empty.
    /// </summary>
    public bool TryDequeue(out AuditEvent evt) => _channel.Reader.TryRead(out evt!);

    /// <summary>
    /// Mark the ring as no-more-writes. <see cref="DrainAsync"/> will yield the
    /// remaining events and then complete.
    /// </summary>
    public void Complete() => _channel.Writer.TryComplete();
}
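
The drop-oldest behaviour and the count-before-write overflow check used by `TryEnqueue` can be reproduced with a bare `Channel<int>`. This is a minimal single-threaded sketch; `DropOldestSketch` and `Fill` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Threading.Channels;

public static class DropOldestSketch
{
    // Writes the values 1..count into a ring of the given capacity and returns
    // how many writes displaced an older item, plus the oldest surviving value.
    public static (int Dropped, int Oldest) Fill(int capacity, int count)
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
        });

        var dropped = 0;
        for (var value = 1; value <= count; value++)
        {
            // Single-threaded here, so "already full before the write" means
            // this write pushed exactly one older item out.
            var wasFull = channel.Reader.Count >= capacity;
            channel.Writer.TryWrite(value);
            if (wasFull)
            {
                dropped++;
            }
        }

        channel.Reader.TryRead(out var oldest);
        return (dropped, oldest);
    }
}
```

`Fill(3, 5)` reports two dropped items and an oldest survivor of `3`, since values 1 and 2 are displaced by 4 and 5. As a possible alternative to the count comparison, .NET 6 and later offer a `Channel.CreateBounded<T>(BoundedChannelOptions, Action<T>)` overload whose callback receives each dropped item; whether that fits this project's target framework is not something the diff states.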
479
src/ScadaLink.AuditLog/Site/SqliteAuditWriter.cs
Normal file
@@ -0,0 +1,479 @@
using System.Threading.Channels;
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ScadaLink.AuditLog.Site.Telemetry;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types.Enums;

namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Site-side SQLite hot-path writer for Audit Log (#23) events. Mirrors the
/// <see cref="ScadaLink.SiteEventLogging.SiteEventLogger"/> design: a single
/// owned <see cref="SqliteConnection"/> serialised behind a write lock, fed by a
/// bounded <see cref="Channel{T}"/> drained on a dedicated background writer
/// task, so script-thread callers never block on disk I/O.
/// </summary>
/// <remarks>
/// <para>
/// The schema is bootstrapped in the constructor (Bundle B-T1). The
/// Channel-based <see cref="WriteAsync"/> hot-path + Bundle D
/// <see cref="ReadPendingAsync"/> / <see cref="MarkForwardedAsync"/> support
/// surface are wired in Bundle B-T2.
/// </para>
/// <para>
/// Site rows always carry <see cref="AuditForwardState.Pending"/> on first
/// insert; the central row-shape's <c>IngestedAtUtc</c> column does NOT live in
/// the site SQLite schema; central stamps it on ingest.
/// </para>
/// </remarks>
public class SqliteAuditWriter : IAuditWriter, ISiteAuditQueue, IAsyncDisposable, IDisposable
{
    // Microsoft.Data.Sqlite reports a generic SQLITE_CONSTRAINT (error code 19)
    // on a PRIMARY KEY violation; the extended subcode 1555 (SQLITE_CONSTRAINT_PRIMARYKEY)
    // is exposed via SqliteException.SqliteExtendedErrorCode but isn't reliably
    // surfaced across all SQLite builds. We treat any constraint error on insert
    // as a duplicate-EventId race and swallow it (first-write-wins). The PRIMARY
    // KEY on EventId is the only uniqueness constraint on this table; note that
    // NOT NULL violations share the same base code.
    private const int SqliteErrorConstraint = 19;

    private readonly SqliteConnection _connection;
    private readonly SqliteAuditWriterOptions _options;
    private readonly ILogger<SqliteAuditWriter> _logger;
    private readonly object _writeLock = new();
    private readonly Channel<PendingAuditEvent> _writeQueue;
    private readonly Task _writerLoop;
    private bool _disposed;

    public SqliteAuditWriter(
        IOptions<SqliteAuditWriterOptions> options,
        ILogger<SqliteAuditWriter> logger,
        string? connectionStringOverride = null)
    {
        ArgumentNullException.ThrowIfNull(options);
        ArgumentNullException.ThrowIfNull(logger);

        _options = options.Value;
        _logger = logger;

        var connectionString = connectionStringOverride
            ?? $"Data Source={_options.DatabasePath};Cache=Shared";
        _connection = new SqliteConnection(connectionString);
        _connection.Open();

        InitializeSchema();

        _writeQueue = Channel.CreateBounded<PendingAuditEvent>(
            new BoundedChannelOptions(_options.ChannelCapacity)
            {
                // The hot-path enqueue must back-pressure if the background
                // writer falls behind; a higher-level fallback (Bundle B-T4)
                // handles truly catastrophic primary failure with a drop-oldest
                // ring buffer.
                FullMode = BoundedChannelFullMode.Wait,
                SingleReader = true,
                SingleWriter = false,
            });
        _writerLoop = Task.Run(ProcessWriteQueueAsync);
    }

    private void InitializeSchema()
    {
        // auto_vacuum must be set before any table is created for it to take
        // effect on a fresh database. INCREMENTAL lets a future
        // `PRAGMA incremental_vacuum` shrink the file after the 7-day retention
        // purge (see alog.md §10).
        using (var pragmaCmd = _connection.CreateCommand())
        {
            pragmaCmd.CommandText = "PRAGMA auto_vacuum = INCREMENTAL";
            pragmaCmd.ExecuteNonQuery();
        }

        using var cmd = _connection.CreateCommand();
        cmd.CommandText = """
            CREATE TABLE IF NOT EXISTS AuditLog (
                EventId TEXT NOT NULL,
                OccurredAtUtc TEXT NOT NULL,
                Channel TEXT NOT NULL,
                Kind TEXT NOT NULL,
                CorrelationId TEXT NULL,
                SourceSiteId TEXT NULL,
                SourceInstanceId TEXT NULL,
                SourceScript TEXT NULL,
                Actor TEXT NULL,
                Target TEXT NULL,
                Status TEXT NOT NULL,
                HttpStatus INTEGER NULL,
                DurationMs INTEGER NULL,
                ErrorMessage TEXT NULL,
                ErrorDetail TEXT NULL,
                RequestSummary TEXT NULL,
                ResponseSummary TEXT NULL,
                PayloadTruncated INTEGER NOT NULL,
                Extra TEXT NULL,
                ForwardState TEXT NOT NULL,
                PRIMARY KEY (EventId)
            );
            CREATE INDEX IF NOT EXISTS IX_SiteAuditLog_ForwardState_Occurred
                ON AuditLog (ForwardState, OccurredAtUtc);
            """;
        cmd.ExecuteNonQuery();
    }

    /// <summary>
    /// Enqueue an event for durable persistence. The returned <see cref="Task"/>
    /// completes once the event has been INSERTed (or, in the duplicate-EventId
    /// case, recognised as already present). It faults if the batch insert
    /// fails or the writer is disposed before the event is persisted. The
    /// enqueue side never blocks on disk I/O; it only awaits the bounded
    /// channel's back-pressure when the writer is briefly behind.
    /// </summary>
    public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
    {
        ArgumentNullException.ThrowIfNull(evt);

        // Site rows always carry a non-null ForwardState; central rows leave it
        // null. Force Pending on enqueue so callers can pass a bare AuditEvent
        // without thinking about site-vs-central provenance.
        var siteEvt = evt.ForwardState is null
            ? evt with { ForwardState = AuditForwardState.Pending }
            : evt;

        var pending = new PendingAuditEvent(siteEvt);

        // Fast path: TryWrite succeeds synchronously whenever the channel has
        // room. With FullMode=Wait, the slow path's Writer.WriteAsync awaits
        // room rather than throwing when full, which is exactly the hot-path
        // back-pressure semantics we want.
        if (!_writeQueue.Writer.TryWrite(pending))
        {
            // The channel is either completed (writer disposed) or at
            // capacity. Fall back to the async path, which honours the
            // FullMode=Wait policy.
            return WriteSlowPathAsync(pending, ct);
        }

        return pending.Completion.Task;
    }

    private async Task WriteSlowPathAsync(PendingAuditEvent pending, CancellationToken ct)
    {
        try
        {
            await _writeQueue.Writer.WriteAsync(pending, ct).ConfigureAwait(false);
        }
        catch (ChannelClosedException)
        {
            pending.Completion.TrySetException(
                new ObjectDisposedException(nameof(SqliteAuditWriter),
                    "Event could not be recorded: the audit writer has been disposed."));
        }

        await pending.Completion.Task.ConfigureAwait(false);
    }

    private async Task ProcessWriteQueueAsync()
    {
        var batch = new List<PendingAuditEvent>(_options.BatchSize);

        // ReadAllAsync completes when the channel is marked complete (Dispose).
        await foreach (var first in _writeQueue.Reader.ReadAllAsync().ConfigureAwait(false))
        {
            batch.Clear();
            batch.Add(first);

            // Pull additional ready events up to BatchSize. TryRead is
            // non-blocking and lets us amortise the transaction overhead
            // across a burst of concurrent enqueues.
            while (batch.Count < _options.BatchSize &&
                   _writeQueue.Reader.TryRead(out var next))
            {
                batch.Add(next);
            }

            FlushBatch(batch);
        }
    }

    private void FlushBatch(IReadOnlyList<PendingAuditEvent> batch)
    {
        lock (_writeLock)
        {
            if (_disposed)
            {
                foreach (var pending in batch)
                {
                    pending.Completion.TrySetException(
                        new ObjectDisposedException(nameof(SqliteAuditWriter),
                            "Event could not be recorded: the audit writer was disposed before the write completed."));
                }
                return;
            }

            SqliteTransaction? transaction = null;
            try
            {
                // Begin inside the try so a failure to open the transaction
                // faults the batch instead of escaping and killing the writer loop.
                transaction = _connection.BeginTransaction();

                using var cmd = _connection.CreateCommand();
                cmd.Transaction = transaction;
                cmd.CommandText = """
                    INSERT INTO AuditLog (
                        EventId, OccurredAtUtc, Channel, Kind, CorrelationId,
                        SourceSiteId, SourceInstanceId, SourceScript, Actor, Target,
                        Status, HttpStatus, DurationMs, ErrorMessage, ErrorDetail,
                        RequestSummary, ResponseSummary, PayloadTruncated, Extra, ForwardState
                    ) VALUES (
                        $EventId, $OccurredAtUtc, $Channel, $Kind, $CorrelationId,
                        $SourceSiteId, $SourceInstanceId, $SourceScript, $Actor, $Target,
                        $Status, $HttpStatus, $DurationMs, $ErrorMessage, $ErrorDetail,
                        $RequestSummary, $ResponseSummary, $PayloadTruncated, $Extra, $ForwardState
                    );
                    """;

                // Parameters are created once and rebound per row.
                var pEventId = cmd.Parameters.Add("$EventId", SqliteType.Text);
                var pOccurredAt = cmd.Parameters.Add("$OccurredAtUtc", SqliteType.Text);
                var pChannel = cmd.Parameters.Add("$Channel", SqliteType.Text);
                var pKind = cmd.Parameters.Add("$Kind", SqliteType.Text);
                var pCorrelationId = cmd.Parameters.Add("$CorrelationId", SqliteType.Text);
                var pSourceSiteId = cmd.Parameters.Add("$SourceSiteId", SqliteType.Text);
                var pSourceInstanceId = cmd.Parameters.Add("$SourceInstanceId", SqliteType.Text);
                var pSourceScript = cmd.Parameters.Add("$SourceScript", SqliteType.Text);
                var pActor = cmd.Parameters.Add("$Actor", SqliteType.Text);
                var pTarget = cmd.Parameters.Add("$Target", SqliteType.Text);
                var pStatus = cmd.Parameters.Add("$Status", SqliteType.Text);
                var pHttpStatus = cmd.Parameters.Add("$HttpStatus", SqliteType.Integer);
                var pDurationMs = cmd.Parameters.Add("$DurationMs", SqliteType.Integer);
                var pErrorMessage = cmd.Parameters.Add("$ErrorMessage", SqliteType.Text);
                var pErrorDetail = cmd.Parameters.Add("$ErrorDetail", SqliteType.Text);
                var pRequestSummary = cmd.Parameters.Add("$RequestSummary", SqliteType.Text);
                var pResponseSummary = cmd.Parameters.Add("$ResponseSummary", SqliteType.Text);
                var pPayloadTruncated = cmd.Parameters.Add("$PayloadTruncated", SqliteType.Integer);
                var pExtra = cmd.Parameters.Add("$Extra", SqliteType.Text);
                var pForwardState = cmd.Parameters.Add("$ForwardState", SqliteType.Text);

                foreach (var pending in batch)
                {
                    var e = pending.Event;
                    pEventId.Value = e.EventId.ToString();
                    pOccurredAt.Value = e.OccurredAtUtc.ToString("o");
                    pChannel.Value = e.Channel.ToString();
                    pKind.Value = e.Kind.ToString();
                    pCorrelationId.Value = (object?)e.CorrelationId?.ToString() ?? DBNull.Value;
                    pSourceSiteId.Value = (object?)e.SourceSiteId ?? DBNull.Value;
                    pSourceInstanceId.Value = (object?)e.SourceInstanceId ?? DBNull.Value;
                    pSourceScript.Value = (object?)e.SourceScript ?? DBNull.Value;
                    pActor.Value = (object?)e.Actor ?? DBNull.Value;
                    pTarget.Value = (object?)e.Target ?? DBNull.Value;
                    pStatus.Value = e.Status.ToString();
                    pHttpStatus.Value = (object?)e.HttpStatus ?? DBNull.Value;
                    pDurationMs.Value = (object?)e.DurationMs ?? DBNull.Value;
                    pErrorMessage.Value = (object?)e.ErrorMessage ?? DBNull.Value;
                    pErrorDetail.Value = (object?)e.ErrorDetail ?? DBNull.Value;
                    pRequestSummary.Value = (object?)e.RequestSummary ?? DBNull.Value;
                    pResponseSummary.Value = (object?)e.ResponseSummary ?? DBNull.Value;
                    pPayloadTruncated.Value = e.PayloadTruncated ? 1 : 0;
                    pExtra.Value = (object?)e.Extra ?? DBNull.Value;
                    pForwardState.Value = (e.ForwardState ?? AuditForwardState.Pending).ToString();

                    try
                    {
                        cmd.ExecuteNonQuery();
                    }
                    catch (SqliteException ex) when (ex.SqliteErrorCode == SqliteErrorConstraint)
                    {
                        // Duplicate EventId: first-write-wins (alog.md §11).
                        // Treat as success: the lifecycle event is durably
                        // recorded under the first writer's payload.
                        _logger.LogDebug(ex,
                            "Duplicate EventId {EventId} swallowed by SqliteAuditWriter",
                            e.EventId);
                    }
                }

                transaction.Commit();

                // Signal success only after the commit. Completing a row's task
                // before the commit would report it as durable even if a later
                // row in the same batch fails and the transaction rolls back.
                foreach (var pending in batch)
                {
                    pending.Completion.TrySetResult();
                }
            }
            catch (Exception ex)
            {
                try
                {
                    transaction?.Rollback();
                }
                catch (Exception rollbackEx)
                {
                    // A failed rollback must not prevent the pending events
                    // from being faulted below.
                    _logger.LogWarning(rollbackEx, "SqliteAuditWriter rollback failed after batch insert error");
                }

                _logger.LogError(ex, "SqliteAuditWriter batch insert failed; faulting {Count} pending events", batch.Count);
                foreach (var pending in batch)
                {
                    pending.Completion.TrySetException(ex);
                }
            }
            finally
            {
                transaction?.Dispose();
            }
        }
    }

    /// <summary>
    /// Returns up to <paramref name="limit"/> rows in <see cref="AuditForwardState.Pending"/>,
    /// oldest <see cref="AuditEvent.OccurredAtUtc"/> first, with <see cref="AuditEvent.EventId"/>
    /// as the deterministic tiebreaker. Called by Bundle D's site telemetry
    /// actor to build a batch for the gRPC push.
    /// </summary>
    public Task<IReadOnlyList<AuditEvent>> ReadPendingAsync(int limit, CancellationToken ct = default)
    {
        if (limit <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(limit), "limit must be > 0.");
        }

        // SqliteConnection is not thread-safe so we go through the same write
        // lock the batch INSERTer uses. The actor caller is single-threaded,
        // so contention is bounded.
        lock (_writeLock)
        {
            ObjectDisposedException.ThrowIf(_disposed, this);

            using var cmd = _connection.CreateCommand();
            cmd.CommandText = """
                SELECT EventId, OccurredAtUtc, Channel, Kind, CorrelationId,
                       SourceSiteId, SourceInstanceId, SourceScript, Actor, Target,
                       Status, HttpStatus, DurationMs, ErrorMessage, ErrorDetail,
                       RequestSummary, ResponseSummary, PayloadTruncated, Extra, ForwardState
                FROM AuditLog
                WHERE ForwardState = $pending
                ORDER BY OccurredAtUtc ASC, EventId ASC
                LIMIT $limit;
                """;
            cmd.Parameters.AddWithValue("$pending", AuditForwardState.Pending.ToString());
            cmd.Parameters.AddWithValue("$limit", limit);

            var rows = new List<AuditEvent>(Math.Min(limit, 256));
            using var reader = cmd.ExecuteReader();
            while (reader.Read())
            {
                rows.Add(MapRow(reader));
            }

            return Task.FromResult<IReadOnlyList<AuditEvent>>(rows);
        }
    }

    /// <summary>
    /// Flips the supplied EventIds from <see cref="AuditForwardState.Pending"/> to
    /// <see cref="AuditForwardState.Forwarded"/> in a single UPDATE. Non-existent
    /// or already-forwarded ids are no-ops.
    /// </summary>
    public Task MarkForwardedAsync(IReadOnlyList<Guid> eventIds, CancellationToken ct = default)
    {
        ArgumentNullException.ThrowIfNull(eventIds);
        if (eventIds.Count == 0)
        {
            return Task.CompletedTask;
        }

        lock (_writeLock)
        {
            ObjectDisposedException.ThrowIf(_disposed, this);

            using var cmd = _connection.CreateCommand();
            // Build a single IN (...) parameter list so we issue one UPDATE per
            // batch regardless of size. Each id is bound as its own parameter,
            // so no string concatenation of user data ever enters the SQL.
            var sb = new System.Text.StringBuilder();
            sb.Append("UPDATE AuditLog SET ForwardState = $forwarded WHERE EventId IN (");
            for (int i = 0; i < eventIds.Count; i++)
            {
                if (i > 0) sb.Append(',');
                var p = $"$id{i}";
                sb.Append(p);
                cmd.Parameters.AddWithValue(p, eventIds[i].ToString());
            }
            sb.Append(");");
            cmd.CommandText = sb.ToString();
            cmd.Parameters.AddWithValue("$forwarded", AuditForwardState.Forwarded.ToString());

            cmd.ExecuteNonQuery();
            return Task.CompletedTask;
        }
    }

    // Column ordinals follow the SELECT list in ReadPendingAsync.
    private static AuditEvent MapRow(SqliteDataReader reader)
    {
        return new AuditEvent
        {
            EventId = Guid.Parse(reader.GetString(0)),
            OccurredAtUtc = DateTime.Parse(reader.GetString(1),
                System.Globalization.CultureInfo.InvariantCulture,
                System.Globalization.DateTimeStyles.RoundtripKind),
            Channel = Enum.Parse<AuditChannel>(reader.GetString(2)),
            Kind = Enum.Parse<AuditKind>(reader.GetString(3)),
            CorrelationId = reader.IsDBNull(4) ? null : Guid.Parse(reader.GetString(4)),
            SourceSiteId = reader.IsDBNull(5) ? null : reader.GetString(5),
            SourceInstanceId = reader.IsDBNull(6) ? null : reader.GetString(6),
            SourceScript = reader.IsDBNull(7) ? null : reader.GetString(7),
            Actor = reader.IsDBNull(8) ? null : reader.GetString(8),
            Target = reader.IsDBNull(9) ? null : reader.GetString(9),
            Status = Enum.Parse<AuditStatus>(reader.GetString(10)),
            HttpStatus = reader.IsDBNull(11) ? null : reader.GetInt32(11),
            DurationMs = reader.IsDBNull(12) ? null : reader.GetInt32(12),
            ErrorMessage = reader.IsDBNull(13) ? null : reader.GetString(13),
            ErrorDetail = reader.IsDBNull(14) ? null : reader.GetString(14),
            RequestSummary = reader.IsDBNull(15) ? null : reader.GetString(15),
            ResponseSummary = reader.IsDBNull(16) ? null : reader.GetString(16),
            PayloadTruncated = reader.GetInt32(17) != 0,
            Extra = reader.IsDBNull(18) ? null : reader.GetString(18),
            ForwardState = Enum.Parse<AuditForwardState>(reader.GetString(19)),
        };
    }

    public void Dispose()
    {
        DisposeAsync().AsTask().GetAwaiter().GetResult();
    }

    public async ValueTask DisposeAsync()
    {
        Task? writerLoop;
        lock (_writeLock)
        {
            if (_disposed) return;
            // Stop accepting new events. _disposed is deliberately left false
            // here so the writer loop can still flush events that are already
            // queued; it is set only after the drain below finishes or times out.
            _writeQueue.Writer.TryComplete();
            writerLoop = _writerLoop;
        }

        // Wait outside the lock: the loop reacquires it for each batch.
        try
        {
            if (writerLoop is not null)
            {
                await writerLoop.WaitAsync(TimeSpan.FromSeconds(5)).ConfigureAwait(false);
            }
        }
        catch (TimeoutException)
        {
            _logger.LogWarning("SqliteAuditWriter writer loop did not drain within 5s of dispose.");
        }
        catch (Exception ex)
        {
            // The loop's per-batch try/catch already routed individual failures
            // to pending TCSes; a top-level fault here is unexpected.
            _logger.LogError(ex, "SqliteAuditWriter writer loop faulted during dispose.");
        }

        lock (_writeLock)
        {
            if (_disposed) return;
            // From here on, any FlushBatch that still runs faults its pending
            // events rather than touching the closed connection.
            _disposed = true;
            _connection.Dispose();
        }
    }

    /// <summary>An audit event awaiting persistence by the background writer.</summary>
    private sealed class PendingAuditEvent
    {
        public PendingAuditEvent(AuditEvent evt)
        {
            Event = evt;
            Completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        }

        public AuditEvent Event { get; }
        public TaskCompletionSource Completion { get; }
    }
}
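The writer above combines a bounded channel, an opportunistic batch drain, and per-event completion sources. The following is a minimal standalone sketch of that pattern, not the shipped implementation: the `Pending` record, the in-memory `persisted` list (standing in for the SQLite table), and the capacity and batch sizes are all illustrative assumptions. It completes each caller's task only after the whole batch has been "committed".

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded queue: producers wait (rather than drop) when the writer is behind.
var queue = Channel.CreateBounded<Pending>(new BoundedChannelOptions(8)
{
    FullMode = BoundedChannelFullMode.Wait,
    SingleReader = true,
});

var persisted = new List<string>(); // hypothetical stand-in for the SQLite table

var loop = Task.Run(async () =>
{
    var batch = new List<Pending>();
    await foreach (var first in queue.Reader.ReadAllAsync())
    {
        batch.Clear();
        batch.Add(first);

        // Opportunistically pull whatever else is already queued (max 4 per batch).
        while (batch.Count < 4 && queue.Reader.TryRead(out var next))
        {
            batch.Add(next);
        }

        // "Commit" the whole batch first...
        foreach (var p in batch) persisted.Add(p.Payload);
        // ...and only then release the waiting callers.
        foreach (var p in batch) p.Done.TrySetResult();
    }
});

var writes = new List<Task>();
for (var i = 0; i < 10; i++)
{
    var p = new Pending($"event-{i}",
        new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously));
    await queue.Writer.WriteAsync(p); // awaits room when the channel is full
    writes.Add(p.Done.Task);
}

await Task.WhenAll(writes);
queue.Writer.Complete(); // lets ReadAllAsync finish, ending the loop
await loop;

Console.WriteLine(persisted.Count); // prints 10

// Hypothetical simplification of PendingAuditEvent: payload plus completion source.
sealed record Pending(string Payload, TaskCompletionSource Done);
```

Completing the tasks after the commit step means a caller that awaits the write never observes success for a row that was later rolled back.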
27
src/ScadaLink.AuditLog/Site/SqliteAuditWriterOptions.cs
Normal file
@@ -0,0 +1,27 @@
namespace ScadaLink.AuditLog.Site;

/// <summary>
/// Options for the site-side SQLite hot-path audit writer.
/// Mirrors the ScadaLink.SiteEventLogging pattern: a single SQLite connection
/// fed by a background writer task draining a bounded
/// <see cref="System.Threading.Channels.Channel{T}"/> so script-thread enqueues
/// never block on disk I/O.
/// </summary>
public sealed class SqliteAuditWriterOptions
{
    /// <summary>SQLite database path (or in-memory URI for tests).</summary>
    public string DatabasePath { get; set; } = "auditlog.db";

    /// <summary>
    /// Capacity of the bounded write queue. Set high enough that ordinary
    /// script bursts never fill it; <see cref="System.Threading.Channels.BoundedChannelFullMode.Wait"/>
    /// applies when the writer falls behind.
    /// </summary>
    public int ChannelCapacity { get; set; } = 4096;

    /// <summary>Max number of pending events the writer drains in one transaction.</summary>
    public int BatchSize { get; set; } = 256;

    /// <summary>Soft flush interval the writer enforces when fewer than BatchSize events are queued.</summary>
    public int FlushIntervalMs { get; set; } = 50;
}
34
src/ScadaLink.AuditLog/Site/Telemetry/ISiteAuditQueue.cs
Normal file
@@ -0,0 +1,34 @@
using ScadaLink.Commons.Entities.Audit;

namespace ScadaLink.AuditLog.Site.Telemetry;

/// <summary>
/// Site-local audit-log queue surface consumed by <see cref="SiteAuditTelemetryActor"/>.
/// Extracted from <see cref="SqliteAuditWriter"/> so the telemetry actor can be
/// unit-tested against a stub without touching SQLite. <see cref="SqliteAuditWriter"/>
/// implements this interface; production wiring injects the same instance.
/// </summary>
/// <remarks>
/// Only the two methods the drain loop needs are exposed. The hot-path
/// <c>WriteAsync</c> stays on <see cref="Commons.Interfaces.Services.IAuditWriter"/>
/// (script-thread surface), separated by concern from the
/// telemetry-actor surface so each side can be mocked independently.
/// </remarks>
public interface ISiteAuditQueue
{
    /// <summary>
    /// Returns up to <paramref name="limit"/> rows currently in
    /// <see cref="ScadaLink.Commons.Types.Enums.AuditForwardState.Pending"/>,
    /// oldest first. Idempotent: repeated calls before
    /// <see cref="MarkForwardedAsync"/> will yield the same rows again.
    /// </summary>
    Task<IReadOnlyList<AuditEvent>> ReadPendingAsync(int limit, CancellationToken ct = default);

    /// <summary>
    /// Flips the supplied EventIds from
    /// <see cref="ScadaLink.Commons.Types.Enums.AuditForwardState.Pending"/> to
    /// <see cref="ScadaLink.Commons.Types.Enums.AuditForwardState.Forwarded"/>.
    /// Non-existent or already-forwarded ids are silent no-ops.
    /// </summary>
    Task MarkForwardedAsync(IReadOnlyList<Guid> eventIds, CancellationToken ct = default);
}
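The read-then-mark contract described by `ISiteAuditQueue` can be illustrated with a small in-memory fake. This is a sketch under stated assumptions only: `InMemoryQueue` and its synchronous `ReadPending` / `MarkForwarded` methods are hypothetical simplifications that track bare event ids rather than full `AuditEvent` rows, and the `Guid` tiebreaker uses .NET's `Guid` ordering, which need not match SQLite's text ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var queue = new InMemoryQueue();
var a = Guid.NewGuid();
var b = Guid.NewGuid();
queue.Add(a, new DateTime(2024, 1, 1, 0, 0, 1, DateTimeKind.Utc));
queue.Add(b, new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc)); // older, so read first

var first = queue.ReadPending(10);
var second = queue.ReadPending(10);               // nothing marked yet: same rows again
queue.MarkForwarded(new[] { b, Guid.NewGuid() }); // the unknown id is a silent no-op
var third = queue.ReadPending(10);

Console.WriteLine(first.SequenceEqual(second)); // prints True
Console.WriteLine(first[0] == b);               // prints True
Console.WriteLine(third.Count);                 // prints 1

// Hypothetical in-memory stand-in for the SQLite-backed queue.
sealed class InMemoryQueue
{
    private readonly Dictionary<Guid, (DateTime OccurredAtUtc, bool Forwarded)> _rows = new();

    public void Add(Guid id, DateTime occurredAtUtc) => _rows[id] = (occurredAtUtc, false);

    // Pending rows, oldest first, with the id as a deterministic tiebreaker.
    public IReadOnlyList<Guid> ReadPending(int limit) =>
        _rows.Where(r => !r.Value.Forwarded)
             .OrderBy(r => r.Value.OccurredAtUtc)
             .ThenBy(r => r.Key)
             .Take(limit)
             .Select(r => r.Key)
             .ToList();

    // Flip known ids to forwarded; ids that are not present are ignored.
    public void MarkForwarded(IEnumerable<Guid> ids)
    {
        foreach (var id in ids)
        {
            if (_rows.TryGetValue(id, out var row))
            {
                _rows[id] = (row.OccurredAtUtc, true);
            }
        }
    }
}
```

Because reads do not change state, a drain that fails between the read and the mark simply sees the same rows on its next attempt.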
@@ -0,0 +1,23 @@
using ScadaLink.Communication.Grpc;

namespace ScadaLink.AuditLog.Site.Telemetry;

/// <summary>
/// Mockable abstraction over the central site-stream gRPC client surface that
/// <see cref="SiteAuditTelemetryActor"/> uses to push <see cref="AuditEventBatch"/>
/// payloads. In M2 the registered default is <see cref="NoOpSiteStreamAuditClient"/>;
/// the real gRPC-backed implementation is deferred to M6. Unit tests
/// substitute via NSubstitute against this interface so the actor never needs
/// a live gRPC channel.
/// </summary>
public interface ISiteStreamAuditClient
{
    /// <summary>
    /// Pushes <paramref name="batch"/> to the central <c>IngestAuditEvents</c>
    /// RPC. The returned <see cref="IngestAck"/> carries the
    /// <c>accepted_event_ids</c> the actor will flip to
    /// <see cref="ScadaLink.Commons.Types.Enums.AuditForwardState.Forwarded"/>
    /// in the site SQLite queue.
    /// </summary>
    Task<IngestAck> IngestAuditEventsAsync(AuditEventBatch batch, CancellationToken ct);
}
@@ -0,0 +1,41 @@
using ScadaLink.Communication.Grpc;

namespace ScadaLink.AuditLog.Site.Telemetry;

/// <summary>
/// Default <see cref="ISiteStreamAuditClient"/> registered by
/// <see cref="ScadaLink.AuditLog.ServiceCollectionExtensions.AddAuditLog"/>.
/// Ships with M2 site-sync-pipeline wiring; the real gRPC-backed
/// implementation is deferred to M6 reconciliation, where a site→central gRPC
/// channel will be introduced. No such channel exists today: sites talk to
/// central exclusively via Akka ClusterClient, while the gRPC SiteStreamService
/// is hosted on the SITE side for central→site streaming.
/// </summary>
/// <remarks>
/// <para>
/// Returns an empty <see cref="IngestAck"/> so the
/// <see cref="SiteAuditTelemetryActor"/> doesn't flip any rows to
/// <c>Forwarded</c> when this NoOp is in effect. Bundle H's integration test
/// substitutes a stub client that routes directly to the central
/// <c>AuditLogIngestActor</c> in-process. Production wiring (M6) will replace
/// this binding with a real client.
/// </para>
/// <para>
/// Audit-write paths are best-effort by contract: a NoOp client keeps the
/// host running cleanly and is consistent with "audit-write failures never
/// abort the user-facing action".
/// </para>
/// </remarks>
public sealed class NoOpSiteStreamAuditClient : ISiteStreamAuditClient
{
    private static readonly IngestAck EmptyAck = new();

    /// <inheritdoc/>
    public Task<IngestAck> IngestAuditEventsAsync(AuditEventBatch batch, CancellationToken ct)
    {
        ArgumentNullException.ThrowIfNull(batch);
        // Empty ack: no EventIds will be flipped to Forwarded, so rows stay
        // Pending until M6's real client (or a Bundle H test stub) takes over.
        return Task.FromResult(EmptyAck);
    }
}
179
src/ScadaLink.AuditLog/Site/Telemetry/SiteAuditTelemetryActor.cs
Normal file
@@ -0,0 +1,179 @@
using Akka.Actor;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ScadaLink.AuditLog.Telemetry;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Communication.Grpc;

namespace ScadaLink.AuditLog.Site.Telemetry;

/// <summary>
/// Site-side actor that drains the local SQLite audit queue and pushes Pending
/// rows to central via the <c>IngestAuditEvents</c> gRPC RPC. On a successful
/// ack the matching EventIds flip to
/// <see cref="ScadaLink.Commons.Types.Enums.AuditForwardState.Forwarded"/>; on
/// a gRPC failure the rows stay Pending and the next drain retries.
/// </summary>
/// <remarks>
/// <para>
/// The drain self-tick is a private <c>Drain</c> message scheduled via the
/// actor system scheduler. The cadence is options-driven: <c>BusyIntervalSeconds</c>
/// when the previous drain found rows (or faulted, since we want quick recovery),
/// <c>IdleIntervalSeconds</c> when the queue was empty.
/// </para>
/// <para>
/// Both collaborators are injected as interfaces (<see cref="ISiteAuditQueue"/>
/// and <see cref="ISiteStreamAuditClient"/>) so unit tests substitute with
/// NSubstitute and never touch real SQLite or gRPC.
/// </para>
/// <para>
/// Per Bundle D's brief, audit-write paths must be fail-safe: a thrown
/// exception inside the actor MUST NOT crash it. The Drain handler wraps the
/// pipeline in a top-level try/catch that logs and re-schedules, and the
/// actor's <see cref="SupervisorStrategy"/> defaults to
/// <see cref="Akka.Actor.SupervisorStrategy.DefaultStrategy"/>'s Restart for
/// child actors. This actor has no children, so the catch is what matters.
/// </para>
/// </remarks>
public class SiteAuditTelemetryActor : ReceiveActor
{
    private readonly ISiteAuditQueue _queue;
    private readonly ISiteStreamAuditClient _client;
    private readonly SiteAuditTelemetryOptions _options;
    private readonly ILogger<SiteAuditTelemetryActor> _logger;
    private ICancelable? _pendingTick;

    public SiteAuditTelemetryActor(
        ISiteAuditQueue queue,
        ISiteStreamAuditClient client,
        IOptions<SiteAuditTelemetryOptions> options,
        ILogger<SiteAuditTelemetryActor> logger)
    {
        ArgumentNullException.ThrowIfNull(queue);
        ArgumentNullException.ThrowIfNull(client);
        ArgumentNullException.ThrowIfNull(options);
        ArgumentNullException.ThrowIfNull(logger);

        _queue = queue;
        _client = client;
        _options = options.Value;
        _logger = logger;

        ReceiveAsync<Drain>(_ => OnDrainAsync());
    }

    protected override void PreStart()
    {
        base.PreStart();
        // Initial tick fires on the busy interval so the actor starts polling
        // soon after host startup. A subsequent empty drain will move to the
        // idle interval naturally.
        ScheduleNext(TimeSpan.FromSeconds(_options.BusyIntervalSeconds));
    }

    protected override void PostStop()
    {
        _pendingTick?.Cancel();
        base.PostStop();
    }

    private async Task OnDrainAsync()
    {
        // Deliberately no ConfigureAwait(false) in this handler: continuations
        // must resume on the actor's task scheduler so that Context and Self
        // are still valid when ScheduleNext runs in the finally block.
        var nextDelay = TimeSpan.FromSeconds(_options.BusyIntervalSeconds);
        try
        {
            var pending = await _queue.ReadPendingAsync(_options.BatchSize, CancellationToken.None);
            if (pending.Count == 0)
            {
                // No rows: settle into the idle cadence until a later drain
                // finds rows again and switches back to the busy cadence.
                nextDelay = TimeSpan.FromSeconds(_options.IdleIntervalSeconds);
                return;
            }

            var batch = BuildBatch(pending);

            IngestAck ack;
            try
            {
                ack = await _client.IngestAuditEventsAsync(batch, CancellationToken.None);
            }
            catch (Exception ex)
            {
                // gRPC fault: leave the rows in Pending so the next drain
                // retries. Bundle D's brief: "On gRPC exception (any), log
                // Warning, schedule next Drain in BusyIntervalSeconds."
                _logger.LogWarning(ex,
                    "IngestAuditEvents push failed for {Count} pending events; will retry next drain.",
                    pending.Count);
                return;
            }

            var acceptedIds = ParseAcceptedIds(ack);
            if (acceptedIds.Count > 0)
            {
                await _queue.MarkForwardedAsync(acceptedIds, CancellationToken.None);
            }
        }
        catch (Exception ex)
        {
            // Catch-all so a SQLite hiccup or mapper bug never crashes the
            // actor. The next tick is still scheduled in the finally block.
            _logger.LogError(ex, "Unexpected error during audit-log telemetry drain.");
        }
        finally
        {
            ScheduleNext(nextDelay);
        }
    }

    private static AuditEventBatch BuildBatch(IReadOnlyList<AuditEvent> events)
    {
        var batch = new AuditEventBatch();
        foreach (var e in events)
        {
            batch.Events.Add(AuditEventMapper.ToDto(e));
        }
        return batch;
    }

    private static IReadOnlyList<Guid> ParseAcceptedIds(IngestAck ack)
    {
        if (ack.AcceptedEventIds.Count == 0)
        {
            return Array.Empty<Guid>();
        }

        var list = new List<Guid>(ack.AcceptedEventIds.Count);
        foreach (var raw in ack.AcceptedEventIds)
        {
            // Malformed ids are ignored. Central should never emit them, but
            // we refuse to crash the actor over a bad string.
            if (Guid.TryParse(raw, out var id))
            {
                list.Add(id);
            }
        }
        return list;
    }

    private void ScheduleNext(TimeSpan delay)
    {
        _pendingTick?.Cancel();
        _pendingTick = Context.System.Scheduler.ScheduleTellOnceCancelable(
            delay,
            Self,
            Drain.Instance,
            Self);
    }

    /// <summary>Self-tick message that triggers a drain cycle.</summary>
    private sealed class Drain
    {
        public static readonly Drain Instance = new();
        private Drain() { }
    }
}
@@ -0,0 +1,28 @@
namespace ScadaLink.AuditLog.Site.Telemetry;

/// <summary>
/// Tuning knobs for the site-side <see cref="SiteAuditTelemetryActor"/> drain
/// loop. Defaults mirror Bundle D's plan: drain every 5 s while rows are
/// flowing (busy), every 30 s when the queue is empty (idle).
/// </summary>
public sealed class SiteAuditTelemetryOptions
{
    /// <summary>
    /// Maximum number of <see cref="ScadaLink.Commons.Entities.Audit.AuditEvent"/>
    /// rows read from the site SQLite queue and pushed in a single gRPC batch.
    /// </summary>
    public int BatchSize { get; set; } = 256;

    /// <summary>
    /// Delay between drains when the previous drain found at least one Pending
    /// row OR the previous push faulted. Re-drain quickly to keep telemetry
    /// flowing and to retry transient gRPC errors.
    /// </summary>
    public int BusyIntervalSeconds { get; set; } = 5;

    /// <summary>
    /// Delay between drains when the previous drain found no Pending rows.
    /// Longer interval avoids hammering an idle SQLite + gRPC channel.
    /// </summary>
    public int IdleIntervalSeconds { get; set; } = 30;
}
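The busy/idle cadence these options describe reduces to one small decision. The sketch below restates it as a pure function so it can be checked in isolation; `NextDelay` and `CadenceOptions` are hypothetical names introduced for illustration (the actor itself computes the delay inline), with the defaults copied from the options above.

```csharp
using System;

var options = new CadenceOptions();

Console.WriteLine(NextDelay(rowsFound: 12, faulted: false, options).TotalSeconds); // prints 5
Console.WriteLine(NextDelay(rowsFound: 0, faulted: false, options).TotalSeconds);  // prints 30
Console.WriteLine(NextDelay(rowsFound: 0, faulted: true, options).TotalSeconds);   // prints 5

// Busy interval when rows were found or the drain faulted; idle interval otherwise.
static TimeSpan NextDelay(int rowsFound, bool faulted, CadenceOptions o) =>
    faulted || rowsFound > 0
        ? TimeSpan.FromSeconds(o.BusyIntervalSeconds)
        : TimeSpan.FromSeconds(o.IdleIntervalSeconds);

// Hypothetical mirror of the two interval settings on SiteAuditTelemetryOptions.
sealed class CadenceOptions
{
    public int BusyIntervalSeconds { get; set; } = 5;
    public int IdleIntervalSeconds { get; set; } = 30;
}
```

Treating a fault like a busy drain keeps the retry latency for transient push errors at the short interval.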
112
src/ScadaLink.AuditLog/Telemetry/AuditEventMapper.cs
Normal file
@@ -0,0 +1,112 @@
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.Communication.Grpc;
using Timestamp = Google.Protobuf.WellKnownTypes.Timestamp;

namespace ScadaLink.AuditLog.Telemetry;

/// <summary>
/// Bridges Audit Log (#23) rows between the in-process <see cref="AuditEvent"/> record
/// and the wire-format <see cref="AuditEventDto"/> exchanged over the
/// <c>IngestAuditEvents</c> RPC.
/// </summary>
/// <remarks>
/// <para><b>Lossy by design:</b> the proto contract intentionally omits two fields.</para>
/// <list type="bullet">
/// <item><see cref="AuditEvent.ForwardState"/>: site-local SQLite state, never travels.</item>
/// <item><see cref="AuditEvent.IngestedAtUtc"/>: central-set at ingest time, not at the site.</item>
/// </list>
/// <para>
/// String nullability convention: proto3 scalar strings cannot be absent, so nullable
/// .NET strings travel as empty strings on the wire and rehydrate as null. A genuinely
/// empty string therefore also comes back as null. Nullable integers use the
/// <c>Int32Value</c> wrapper so they preserve true null semantics.
/// </para>
/// </remarks>
public static class AuditEventMapper
{
    /// <summary>
    /// Projects an <see cref="AuditEvent"/> into its wire-format DTO. Null reference
    /// fields collapse to empty strings; null integer fields leave the wrapper unset.
    /// </summary>
    public static AuditEventDto ToDto(AuditEvent evt)
    {
        ArgumentNullException.ThrowIfNull(evt);

        var dto = new AuditEventDto
        {
            EventId = evt.EventId.ToString(),
            OccurredAtUtc = Timestamp.FromDateTime(EnsureUtc(evt.OccurredAtUtc)),
            Channel = evt.Channel.ToString(),
            Kind = evt.Kind.ToString(),
            CorrelationId = evt.CorrelationId?.ToString() ?? string.Empty,
            SourceSiteId = evt.SourceSiteId ?? string.Empty,
            SourceInstanceId = evt.SourceInstanceId ?? string.Empty,
            SourceScript = evt.SourceScript ?? string.Empty,
            Actor = evt.Actor ?? string.Empty,
            Target = evt.Target ?? string.Empty,
            Status = evt.Status.ToString(),
            ErrorMessage = evt.ErrorMessage ?? string.Empty,
            ErrorDetail = evt.ErrorDetail ?? string.Empty,
            RequestSummary = evt.RequestSummary ?? string.Empty,
            ResponseSummary = evt.ResponseSummary ?? string.Empty,
            PayloadTruncated = evt.PayloadTruncated,
            Extra = evt.Extra ?? string.Empty
        };

        if (evt.HttpStatus.HasValue)
        {
            dto.HttpStatus = evt.HttpStatus.Value;
        }

        if (evt.DurationMs.HasValue)
        {
            dto.DurationMs = evt.DurationMs.Value;
        }

        return dto;
    }

/// <summary>
|
||||
/// Reconstructs an <see cref="AuditEvent"/> from its wire-format DTO. Empty strings
|
||||
/// rehydrate as null reference values; absent integer wrappers stay null.
|
||||
/// <see cref="AuditEvent.ForwardState"/> and <see cref="AuditEvent.IngestedAtUtc"/>
|
||||
/// are intentionally left null — the central ingest actor sets the latter.
|
||||
/// </summary>
|
||||
public static AuditEvent FromDto(AuditEventDto dto)
|
||||
{
|
||||
ArgumentNullException.ThrowIfNull(dto);
|
||||
|
||||
return new AuditEvent
|
||||
{
|
||||
EventId = Guid.Parse(dto.EventId),
|
||||
OccurredAtUtc = DateTime.SpecifyKind(dto.OccurredAtUtc.ToDateTime(), DateTimeKind.Utc),
|
||||
IngestedAtUtc = null,
|
||||
Channel = Enum.Parse<AuditChannel>(dto.Channel),
|
||||
Kind = Enum.Parse<AuditKind>(dto.Kind),
|
||||
CorrelationId = NullIfEmpty(dto.CorrelationId) is { } cid ? Guid.Parse(cid) : null,
|
||||
SourceSiteId = NullIfEmpty(dto.SourceSiteId),
|
||||
SourceInstanceId = NullIfEmpty(dto.SourceInstanceId),
|
||||
SourceScript = NullIfEmpty(dto.SourceScript),
|
||||
Actor = NullIfEmpty(dto.Actor),
|
||||
Target = NullIfEmpty(dto.Target),
|
||||
Status = Enum.Parse<AuditStatus>(dto.Status),
|
||||
HttpStatus = dto.HttpStatus,
|
||||
DurationMs = dto.DurationMs,
|
||||
ErrorMessage = NullIfEmpty(dto.ErrorMessage),
|
||||
ErrorDetail = NullIfEmpty(dto.ErrorDetail),
|
||||
RequestSummary = NullIfEmpty(dto.RequestSummary),
|
||||
ResponseSummary = NullIfEmpty(dto.ResponseSummary),
|
||||
PayloadTruncated = dto.PayloadTruncated,
|
||||
Extra = NullIfEmpty(dto.Extra),
|
||||
ForwardState = null
|
||||
};
|
||||
}
|
||||
|
||||
private static string? NullIfEmpty(string? value) =>
|
||||
string.IsNullOrEmpty(value) ? null : value;
|
||||
|
||||
private static DateTime EnsureUtc(DateTime value) =>
|
||||
value.Kind == DateTimeKind.Utc
|
||||
? value
|
||||
: DateTime.SpecifyKind(value.ToUniversalTime(), DateTimeKind.Utc);
|
||||
}
|
||||
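The string convention in the mapper's remarks has one consequence that is easy to miss: after the round trip, a null and a genuinely empty string can no longer be told apart. The sketch below isolates that behaviour. It assumes nothing about the real `AuditEventMapper` beyond the `?? string.Empty` and `NullIfEmpty` pair shown above; `WireStrings` and its members are hypothetical names used only for illustration.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the string half of the mapper (not the real AuditEventMapper).
public static class WireStrings
{
    // Site side: proto3 scalar strings cannot be absent, so null is sent as "".
    public static string ToWire(string? value) => value ?? string.Empty;

    // Central side: "" is read back as null, mirroring NullIfEmpty.
    public static string? FromWire(string? wire) =>
        string.IsNullOrEmpty(wire) ? null : wire;

    // Convenience wrapper showing the whole trip in one call.
    public static string? RoundTrip(string? value) => FromWire(ToWire(value));
}
```

Under these assumptions `WireStrings.RoundTrip(null)` and `WireStrings.RoundTrip("")` both yield null, while a non-empty value such as `"alice"` comes back unchanged. A caller that needs to distinguish "empty" from "absent" would need a wrapper type on the wire, as the integer fields already use.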
@@ -0,0 +1,20 @@
using ScadaLink.Commons.Entities.Audit;

namespace ScadaLink.Commons.Messages.Audit;

/// <summary>
/// Akka message sent to the central <c>AuditLogIngestActor</c> (Audit Log #23,
/// M2 site-sync pipeline) carrying a batch of <see cref="AuditEvent"/> rows
/// decoded by the <c>SiteStreamGrpcServer</c> from a site's
/// <c>IngestAuditEvents</c> gRPC RPC. The actor stamps
/// <see cref="AuditEvent.IngestedAtUtc"/> and writes the rows idempotently to
/// the central <c>AuditLog</c> table.
/// </summary>
/// <remarks>
/// Lives in <c>ScadaLink.Commons</c> rather than <c>ScadaLink.AuditLog</c>
/// because the gRPC server in <c>ScadaLink.Communication</c> needs to construct
/// it, and <c>ScadaLink.AuditLog</c> already references
/// <c>ScadaLink.Communication</c> (the proto DTOs live there). Putting the
/// message in Commons avoids a project-reference cycle.
/// </remarks>
public sealed record IngestAuditEventsCommand(IReadOnlyList<AuditEvent> Events);
@@ -0,0 +1,11 @@
namespace ScadaLink.Commons.Messages.Audit;

/// <summary>
/// Reply from the central <c>AuditLogIngestActor</c> for an
/// <see cref="IngestAuditEventsCommand"/>. <see cref="AcceptedEventIds"/> lists
/// every row the actor considers durably persisted at central — including ids
/// that were already present before the call (first-write-wins idempotency).
/// The gRPC handler echoes these ids back over the wire as the <c>IngestAck</c>
/// the site uses to flip rows to <c>Forwarded</c>.
/// </summary>
public sealed record IngestAuditEventsReply(IReadOnlyList<Guid> AcceptedEventIds);
@@ -20,7 +20,12 @@ public record SiteHealthReport(
    IReadOnlyDictionary<string, string>? DataConnectionEndpoints = null,
    IReadOnlyDictionary<string, TagQualityCounts>? DataConnectionTagQuality = null,
    int ParkedMessageCount = 0,
    IReadOnlyList<NodeStatus>? ClusterNodes = null);
    IReadOnlyList<NodeStatus>? ClusterNodes = null,
    // Audit Log (#23) M2 Bundle G: per-interval count of FallbackAuditWriter
    // primary failures (SQLite throws routed to the drop-oldest ring). Surfaces
    // a sustained audit-write outage on /monitoring/health. Defaults to 0 so
    // existing producers / tests that don't construct the field stay valid.
    int SiteAuditWriteFailures = 0);

/// <summary>
/// Broadcast wrapper used between central nodes to keep per-node
@@ -4,6 +4,9 @@ using Akka.Actor;
using Grpc.Core;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Messages.Audit;
using ScadaLink.Commons.Types.Enums;
using GrpcStatus = Grpc.Core.Status;

namespace ScadaLink.Communication.Grpc;
@@ -23,6 +26,15 @@ public class SiteStreamGrpcServer : SiteStreamService.SiteStreamServiceBase
    private readonly TimeSpan _maxStreamLifetime;
    private volatile bool _ready;
    private long _actorCounter;
    // Audit Log (#23 M2): central-side ingest actor proxy. Set by the host
    // after the cluster singleton starts (see Bundle E wiring). When null the
    // IngestAuditEvents RPC replies with an empty IngestAck so sites can
    // safely retry — wiring-incomplete is treated as transient, never fatal.
    private IActorRef? _auditIngestActor;
    // Per Bundle D's brief — Ask timeout is 30 s. The ingest actor's repo
    // calls are sub-100 ms in steady state; a generous timeout absorbs a slow
    // MSSQL connection without surfacing as a gRPC failure on a healthy site.
    private static readonly TimeSpan AuditIngestAskTimeout = TimeSpan.FromSeconds(30);

    /// <summary>
    /// Test-only constructor — kept <c>internal</c> so the DI container sees a
@@ -76,6 +88,19 @@ public class SiteStreamGrpcServer : SiteStreamService.SiteStreamServiceBase
        _ready = true;
    }

    /// <summary>
    /// Hands the central-side <c>AuditLogIngestActor</c> proxy to the gRPC
    /// server so the <see cref="IngestAuditEvents"/> RPC can route incoming
    /// site batches. Audit Log (#23) M2 wiring point — mirrors the way
    /// <c>CommunicationService.SetNotificationOutbox</c> takes the Notification
    /// Outbox singleton proxy. Bundle E supplies the actor after the cluster
    /// singleton starts.
    /// </summary>
    public void SetAuditIngestActor(IActorRef proxy)
    {
        _auditIngestActor = proxy;
    }

    /// <summary>
    /// Number of currently active streaming subscriptions. Exposed for diagnostics.
    /// </summary>
@@ -168,6 +193,114 @@ public class SiteStreamGrpcServer : SiteStreamService.SiteStreamServiceBase
        }
    }

    /// <summary>
    /// Audit Log (#23) M2 site→central push RPC. Decodes a site batch into
    /// <see cref="AuditEvent"/> rows, Asks the central <c>AuditLogIngestActor</c>
    /// proxy to persist them, and echoes the accepted EventIds back so the site
    /// can flip its local rows to <c>Forwarded</c>.
    /// </summary>
    /// <remarks>
    /// <para>
    /// The DTO→entity conversion is inlined here (rather than calling the
    /// AuditLog mapper) to avoid a project-reference cycle:
    /// <c>ScadaLink.AuditLog</c> already references
    /// <c>ScadaLink.Communication</c>, so the gRPC server cannot reach back
    /// into AuditLog for its mapper. The shape mirrors
    /// <c>AuditEventMapper.FromDto</c> in <c>ScadaLink.AuditLog.Telemetry</c>;
    /// the two must evolve together.
    /// </para>
    /// <para>
    /// When <see cref="_auditIngestActor"/> is not yet wired (host startup
    /// race window), the RPC returns an empty <see cref="IngestAck"/> rather
    /// than failing — the site treats the missing ack as a transient outcome
    /// and retries on the next drain, which is the desired idempotent
    /// behaviour.
    /// </para>
    /// </remarks>
    public override async Task<IngestAck> IngestAuditEvents(
        AuditEventBatch request,
        ServerCallContext context)
    {
        // Empty batch is a no-op; reply immediately so the client moves on.
        if (request.Events.Count == 0)
        {
            return new IngestAck();
        }

        var actor = _auditIngestActor;
        if (actor is null)
        {
            // Wiring incomplete (host startup race). Sites treat an empty
            // ack as "nothing was acked, leave rows Pending, retry next
            // drain" — exactly the right behaviour during host bring-up.
            _logger.LogWarning(
                "IngestAuditEvents received {Count} events before SetAuditIngestActor was called; returning empty ack.",
                request.Events.Count);
            return new IngestAck();
        }

        // Inlined FromDto. Keep in sync with AuditEventMapper.FromDto in
        // ScadaLink.AuditLog.Telemetry — there is no shared mapper because
        // doing so would create a project-reference cycle (AuditLog → Communication).
        var entities = new List<AuditEvent>(request.Events.Count);
        foreach (var dto in request.Events)
        {
            entities.Add(new AuditEvent
            {
                EventId = Guid.Parse(dto.EventId),
                OccurredAtUtc = DateTime.SpecifyKind(dto.OccurredAtUtc.ToDateTime(), DateTimeKind.Utc),
                IngestedAtUtc = null,
                Channel = Enum.Parse<AuditChannel>(dto.Channel),
                Kind = Enum.Parse<AuditKind>(dto.Kind),
                CorrelationId = string.IsNullOrEmpty(dto.CorrelationId) ? null : Guid.Parse(dto.CorrelationId),
                SourceSiteId = NullIfEmpty(dto.SourceSiteId),
                SourceInstanceId = NullIfEmpty(dto.SourceInstanceId),
                SourceScript = NullIfEmpty(dto.SourceScript),
                Actor = NullIfEmpty(dto.Actor),
                Target = NullIfEmpty(dto.Target),
                Status = Enum.Parse<AuditStatus>(dto.Status),
                HttpStatus = dto.HttpStatus,
                DurationMs = dto.DurationMs,
                ErrorMessage = NullIfEmpty(dto.ErrorMessage),
                ErrorDetail = NullIfEmpty(dto.ErrorDetail),
                RequestSummary = NullIfEmpty(dto.RequestSummary),
                ResponseSummary = NullIfEmpty(dto.ResponseSummary),
                PayloadTruncated = dto.PayloadTruncated,
                Extra = NullIfEmpty(dto.Extra),
                ForwardState = null,
            });
        }

        var cmd = new IngestAuditEventsCommand(entities);
        IngestAuditEventsReply reply;
        try
        {
            reply = await actor.Ask<IngestAuditEventsReply>(
                cmd, AuditIngestAskTimeout, context.CancellationToken);
        }
        catch (Exception ex)
        {
            // Audit ingest is best-effort; failing this RPC at the gRPC layer
            // would surface as a transport error and force the site to retry
            // (which it would do anyway). Logging + an empty ack keeps the
            // semantics consistent with the "wiring incomplete" path above.
            _logger.LogError(ex,
                "AuditLogIngestActor Ask failed for batch of {Count} events; returning empty ack.",
                request.Events.Count);
            return new IngestAck();
        }

        var ack = new IngestAck();
        foreach (var id in reply.AcceptedEventIds)
        {
            ack.AcceptedEventIds.Add(id.ToString());
        }
        return ack;
    }

    private static string? NullIfEmpty(string? value) =>
        string.IsNullOrEmpty(value) ? null : value;

    /// <summary>
    /// Tracks a single active stream so cleanup only removes its own entry.
    /// </summary>
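The handler above relies on the site treating the ack as the only signal of success: ids listed in `IngestAck` flip to `Forwarded`, everything else stays `Pending` and is retried on the next drain. The site-side code is not part of this hunk, so the following is only a sketch of that contract under stated assumptions; `PendingRow`, `RowState` and `AckApplier` are hypothetical names, not types from the repository.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical site-local row state; the real ForwardState type is not shown in this diff.
public enum RowState { Pending, Forwarded }

public sealed class PendingRow
{
    public PendingRow(Guid eventId) => EventId = eventId;
    public Guid EventId { get; }
    public RowState State { get; set; } = RowState.Pending;
}

public static class AckApplier
{
    // Flips only the rows whose ids central acknowledged and returns how many flipped.
    // An empty ack (wiring incomplete, Ask timeout) therefore changes nothing,
    // which leaves the whole batch eligible for the next drain.
    public static int Apply(IEnumerable<PendingRow> batch, IEnumerable<string> acceptedEventIds)
    {
        var accepted = new HashSet<Guid>();
        foreach (var raw in acceptedEventIds)
        {
            // Ids travel as strings on the wire; ignore anything unparseable.
            if (Guid.TryParse(raw, out var id))
            {
                accepted.Add(id);
            }
        }

        var flipped = 0;
        foreach (var row in batch)
        {
            if (row.State == RowState.Pending && accepted.Contains(row.EventId))
            {
                row.State = RowState.Forwarded;
                flipped++;
            }
        }
        return flipped;
    }
}
```

Because central acknowledges ids that were already present (first-write-wins), replaying a batch after a lost ack is harmless in this model: the second attempt simply flips the rows that the first attempt had already persisted.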
@@ -3,9 +3,11 @@ option csharp_namespace = "ScadaLink.Communication.Grpc";
package sitestream;

import "google/protobuf/timestamp.proto";
import "google/protobuf/wrappers.proto"; // Int32Value

service SiteStreamService {
  rpc SubscribeInstance(InstanceStreamRequest) returns (stream SiteStreamEvent);
  rpc IngestAuditEvents(AuditEventBatch) returns (IngestAck);
}

message InstanceStreamRequest {
@@ -63,3 +65,31 @@ message AlarmStateUpdate {
  AlarmLevelEnum level = 6; // ALARM_LEVEL_NONE for binary trigger types; set by HiLo.
  string message = 7; // Optional per-band operator message; empty when unset.
}

// Audit Log (#23) telemetry: single lifecycle event ferried from a site SQLite
// hot-path row to central via IngestAuditEvents. Mirrors AuditEvent (Commons)
// minus the site-local ForwardState and the central IngestedAtUtc (set on ingest).
message AuditEventDto {
  string event_id = 1;
  google.protobuf.Timestamp occurred_at_utc = 2;
  string channel = 3;
  string kind = 4;
  string correlation_id = 5; // empty string represents null
  string source_site_id = 6;
  string source_instance_id = 7;
  string source_script = 8;
  string actor = 9;
  string target = 10;
  string status = 11;
  google.protobuf.Int32Value http_status = 12; // null when absent
  google.protobuf.Int32Value duration_ms = 13;
  string error_message = 14;
  string error_detail = 15;
  string request_summary = 16;
  string response_summary = 17;
  bool payload_truncated = 18;
  string extra = 19;
}

message AuditEventBatch { repeated AuditEventDto events = 1; }
message IngestAck { repeated string accepted_event_ids = 1; }
File diff suppressed because it is too large
@@ -49,6 +49,10 @@ namespace ScadaLink.Communication.Grpc {
    static readonly grpc::Marshaller<global::ScadaLink.Communication.Grpc.InstanceStreamRequest> __Marshaller_sitestream_InstanceStreamRequest = grpc::Marshallers.Create(__Helper_SerializeMessage, context => __Helper_DeserializeMessage(context, global::ScadaLink.Communication.Grpc.InstanceStreamRequest.Parser));
    [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
    static readonly grpc::Marshaller<global::ScadaLink.Communication.Grpc.SiteStreamEvent> __Marshaller_sitestream_SiteStreamEvent = grpc::Marshallers.Create(__Helper_SerializeMessage, context => __Helper_DeserializeMessage(context, global::ScadaLink.Communication.Grpc.SiteStreamEvent.Parser));
    [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
    static readonly grpc::Marshaller<global::ScadaLink.Communication.Grpc.AuditEventBatch> __Marshaller_sitestream_AuditEventBatch = grpc::Marshallers.Create(__Helper_SerializeMessage, context => __Helper_DeserializeMessage(context, global::ScadaLink.Communication.Grpc.AuditEventBatch.Parser));
    [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
    static readonly grpc::Marshaller<global::ScadaLink.Communication.Grpc.IngestAck> __Marshaller_sitestream_IngestAck = grpc::Marshallers.Create(__Helper_SerializeMessage, context => __Helper_DeserializeMessage(context, global::ScadaLink.Communication.Grpc.IngestAck.Parser));

    [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
    static readonly grpc::Method<global::ScadaLink.Communication.Grpc.InstanceStreamRequest, global::ScadaLink.Communication.Grpc.SiteStreamEvent> __Method_SubscribeInstance = new grpc::Method<global::ScadaLink.Communication.Grpc.InstanceStreamRequest, global::ScadaLink.Communication.Grpc.SiteStreamEvent>(
@@ -58,6 +62,14 @@ namespace ScadaLink.Communication.Grpc {
        __Marshaller_sitestream_InstanceStreamRequest,
        __Marshaller_sitestream_SiteStreamEvent);

    [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
    static readonly grpc::Method<global::ScadaLink.Communication.Grpc.AuditEventBatch, global::ScadaLink.Communication.Grpc.IngestAck> __Method_IngestAuditEvents = new grpc::Method<global::ScadaLink.Communication.Grpc.AuditEventBatch, global::ScadaLink.Communication.Grpc.IngestAck>(
        grpc::MethodType.Unary,
        __ServiceName,
        "IngestAuditEvents",
        __Marshaller_sitestream_AuditEventBatch,
        __Marshaller_sitestream_IngestAck);

    /// <summary>Service descriptor</summary>
    public static global::Google.Protobuf.Reflection.ServiceDescriptor Descriptor
    {
@@ -74,6 +86,12 @@ namespace ScadaLink.Communication.Grpc {
        throw new grpc::RpcException(new grpc::Status(grpc::StatusCode.Unimplemented, ""));
      }

      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      public virtual global::System.Threading.Tasks.Task<global::ScadaLink.Communication.Grpc.IngestAck> IngestAuditEvents(global::ScadaLink.Communication.Grpc.AuditEventBatch request, grpc::ServerCallContext context)
      {
        throw new grpc::RpcException(new grpc::Status(grpc::StatusCode.Unimplemented, ""));
      }

    }

    /// <summary>Client for SiteStreamService</summary>
@@ -113,6 +131,26 @@ namespace ScadaLink.Communication.Grpc {
      {
        return CallInvoker.AsyncServerStreamingCall(__Method_SubscribeInstance, null, options, request);
      }
      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      public virtual global::ScadaLink.Communication.Grpc.IngestAck IngestAuditEvents(global::ScadaLink.Communication.Grpc.AuditEventBatch request, grpc::Metadata headers = null, global::System.DateTime? deadline = null, global::System.Threading.CancellationToken cancellationToken = default(global::System.Threading.CancellationToken))
      {
        return IngestAuditEvents(request, new grpc::CallOptions(headers, deadline, cancellationToken));
      }
      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      public virtual global::ScadaLink.Communication.Grpc.IngestAck IngestAuditEvents(global::ScadaLink.Communication.Grpc.AuditEventBatch request, grpc::CallOptions options)
      {
        return CallInvoker.BlockingUnaryCall(__Method_IngestAuditEvents, null, options, request);
      }
      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      public virtual grpc::AsyncUnaryCall<global::ScadaLink.Communication.Grpc.IngestAck> IngestAuditEventsAsync(global::ScadaLink.Communication.Grpc.AuditEventBatch request, grpc::Metadata headers = null, global::System.DateTime? deadline = null, global::System.Threading.CancellationToken cancellationToken = default(global::System.Threading.CancellationToken))
      {
        return IngestAuditEventsAsync(request, new grpc::CallOptions(headers, deadline, cancellationToken));
      }
      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      public virtual grpc::AsyncUnaryCall<global::ScadaLink.Communication.Grpc.IngestAck> IngestAuditEventsAsync(global::ScadaLink.Communication.Grpc.AuditEventBatch request, grpc::CallOptions options)
      {
        return CallInvoker.AsyncUnaryCall(__Method_IngestAuditEvents, null, options, request);
      }
      /// <summary>Creates a new instance of client from given <c>ClientBaseConfiguration</c>.</summary>
      [global::System.CodeDom.Compiler.GeneratedCode("grpc_csharp_plugin", null)]
      protected override SiteStreamServiceClient NewInstance(ClientBaseConfiguration configuration)
@@ -127,7 +165,8 @@ namespace ScadaLink.Communication.Grpc {
    public static grpc::ServerServiceDefinition BindService(SiteStreamServiceBase serviceImpl)
    {
      return grpc::ServerServiceDefinition.CreateBuilder()
          .AddMethod(__Method_SubscribeInstance, serviceImpl.SubscribeInstance).Build();
          .AddMethod(__Method_SubscribeInstance, serviceImpl.SubscribeInstance)
          .AddMethod(__Method_IngestAuditEvents, serviceImpl.IngestAuditEvents).Build();
    }

    /// <summary>Register service method with a service binder with or without implementation. Useful when customizing the service binding logic.
@@ -138,6 +177,7 @@ namespace ScadaLink.Communication.Grpc {
    public static void BindService(grpc::ServiceBinderBase serviceBinder, SiteStreamServiceBase serviceImpl)
    {
      serviceBinder.AddMethod(__Method_SubscribeInstance, serviceImpl == null ? null : new grpc::ServerStreamingServerMethod<global::ScadaLink.Communication.Grpc.InstanceStreamRequest, global::ScadaLink.Communication.Grpc.SiteStreamEvent>(serviceImpl.SubscribeInstance));
      serviceBinder.AddMethod(__Method_IngestAuditEvents, serviceImpl == null ? null : new grpc::UnaryServerMethod<global::ScadaLink.Communication.Grpc.AuditEventBatch, global::ScadaLink.Communication.Grpc.IngestAck>(serviceImpl.IngestAuditEvents));
    }

  }
@@ -1,4 +1,7 @@
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Interfaces.Repositories;
using ScadaLink.Commons.Types.Audit;
@@ -12,11 +15,22 @@ namespace ScadaLink.ConfigurationDatabase.Repositories;
/// </summary>
public class AuditLogRepository : IAuditLogRepository
{
    private readonly ScadaLinkDbContext _context;
    // SQL Server error numbers for duplicate-key violations on
    // UX_AuditLog_EventId. 2601 is a unique-index violation; 2627 is a
    // primary-key/unique-constraint violation. The IF NOT EXISTS … INSERT
    // pattern has a check-then-act race window — two sessions can both pass
    // the EXISTS check and then both attempt the INSERT — and the loser
    // surfaces as one of these errors. Idempotency demands we swallow them.
    private const int SqlErrorUniqueIndexViolation = 2601;
    private const int SqlErrorPrimaryKeyViolation = 2627;

    public AuditLogRepository(ScadaLinkDbContext context)
    private readonly ScadaLinkDbContext _context;
    private readonly ILogger<AuditLogRepository> _logger;

    public AuditLogRepository(ScadaLinkDbContext context, ILogger<AuditLogRepository>? logger = null)
    {
        _context = context ?? throw new ArgumentNullException(nameof(context));
        _logger = logger ?? NullLogger<AuditLogRepository>.Instance;
    }

    /// <summary>
@@ -44,6 +58,8 @@ public class AuditLogRepository : IAuditLogRepository

        // FormattableString interpolation parameterises every value (no concatenation),
        // so this is safe against injection even for the string columns.
        try
        {
        await _context.Database.ExecuteSqlInterpolatedAsync(
            $@"IF NOT EXISTS (SELECT 1 FROM dbo.AuditLog WHERE EventId = {evt.EventId})
INSERT INTO dbo.AuditLog
@@ -58,6 +74,23 @@ VALUES
{evt.ResponseSummary}, {evt.PayloadTruncated}, {evt.Extra}, {forwardState});",
            ct);
        }
        catch (SqlException ex) when (
            ex.Number == SqlErrorUniqueIndexViolation
            || ex.Number == SqlErrorPrimaryKeyViolation)
        {
            // Two concurrent sessions both passed the IF NOT EXISTS check and
            // both attempted the INSERT — the loser raises 2601/2627 against
            // UX_AuditLog_EventId. First-write-wins idempotency is already the
            // documented contract for this method, so the race outcome is
            // semantically a no-op. Swallow at Debug; other SqlExceptions
            // bubble.
            _logger.LogDebug(
                ex,
                "InsertIfNotExistsAsync swallowed duplicate-key violation (error {SqlErrorNumber}) for EventId {EventId}; treating as no-op.",
                ex.Number,
                evt.EventId);
        }
    }

    /// <summary>
    /// Builds an <c>AsNoTracking</c> queryable over <see cref="AuditEvent"/>, applies
@@ -12,6 +12,13 @@ public interface ISiteHealthCollector
    void IncrementScriptError();
    void IncrementAlarmError();
    void IncrementDeadLetter();
    /// <summary>
    /// Audit Log (#23) Bundle G — increment the per-interval count of
    /// <c>FallbackAuditWriter</c> primary failures. Bridged from the
    /// <c>IAuditWriteFailureCounter</c> binding registered via
    /// <c>AddAuditLogHealthMetricsBridge()</c>.
    /// </summary>
    void IncrementSiteAuditWriteFailures();
    void UpdateConnectionHealth(string connectionName, ConnectionHealth health);
    void RemoveConnection(string connectionName);
    void UpdateTagResolution(string connectionName, int totalSubscribed, int successfullyResolved);
@@ -13,6 +13,7 @@ public class SiteHealthCollector : ISiteHealthCollector
    private int _scriptErrorCount;
    private int _alarmErrorCount;
    private int _deadLetterCount;
    private int _siteAuditWriteFailures;
    private readonly ConcurrentDictionary<string, ConnectionHealth> _connectionStatuses = new();
    private readonly ConcurrentDictionary<string, TagResolutionStatus> _tagResolutionCounts = new();
    private readonly ConcurrentDictionary<string, string> _connectionEndpoints = new();
@@ -61,6 +62,18 @@ public class SiteHealthCollector : ISiteHealthCollector
        Interlocked.Increment(ref _deadLetterCount);
    }

    /// <summary>
    /// Audit Log (#23) Bundle G — increment the per-interval count of
    /// <c>FallbackAuditWriter</c> primary failures. Bridged from the
    /// <c>IAuditWriteFailureCounter</c> binding registered via
    /// <c>AddAuditLogHealthMetricsBridge()</c>; reset every interval together
    /// with the other per-interval counters.
    /// </summary>
    public void IncrementSiteAuditWriteFailures()
    {
        Interlocked.Increment(ref _siteAuditWriteFailures);
    }

    /// <summary>
    /// Update the health status for a named data connection.
    /// Called by DCL when connection state changes.
@@ -144,6 +157,7 @@ public class SiteHealthCollector : ISiteHealthCollector
        var scriptErrors = Interlocked.Exchange(ref _scriptErrorCount, 0);
        var alarmErrors = Interlocked.Exchange(ref _alarmErrorCount, 0);
        var deadLetters = Interlocked.Exchange(ref _deadLetterCount, 0);
        var siteAuditWriteFailures = Interlocked.Exchange(ref _siteAuditWriteFailures, 0);

        // Snapshot current connection and tag resolution state
        var connectionStatuses = new Dictionary<string, ConnectionHealth>(_connectionStatuses);
@@ -175,6 +189,7 @@ public class SiteHealthCollector : ISiteHealthCollector
            DataConnectionEndpoints: connectionEndpoints,
            DataConnectionTagQuality: tagQuality,
            ParkedMessageCount: Interlocked.CompareExchange(ref _parkedMessageCount, 0, 0),
            ClusterNodes: _clusterNodes?.ToList());
            ClusterNodes: _clusterNodes?.ToList(),
            SiteAuditWriteFailures: siteAuditWriteFailures);
    }
}
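The per-interval semantics of `SiteAuditWriteFailures` come from pairing `Interlocked.Increment` on the write path with `Interlocked.Exchange(ref field, 0)` at report time, so no increment is lost or counted twice between two reports. A minimal sketch of that pattern in isolation, with `IntervalCounter` as a hypothetical name rather than a type from the repository:

```csharp
using System.Threading;

// Hypothetical, stripped-down version of one per-interval counter.
public sealed class IntervalCounter
{
    private int _count;

    // Called from any thread on the hot path; lock-free.
    public void Increment() => Interlocked.Increment(ref _count);

    // Atomically reads the current value and resets it to zero, so an
    // increment racing with the report lands in exactly one interval.
    public int SnapshotAndReset() => Interlocked.Exchange(ref _count, 0);
}
```

After three calls to `Increment()`, the first `SnapshotAndReset()` returns 3 and an immediate second call returns 0. A separate read followed by a write of zero would not give that guarantee, because an increment arriving between the two steps would be discarded.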
@@ -128,6 +128,13 @@ public class AkkaHostedService : IHostedService
        var rolesStr = string.Join(",", roles.Select(QuoteHocon));

        return $@"
audit-telemetry-dispatcher {{
  type = ForkJoinDispatcher
  throughput = 100
  dedicated-thread-pool {{
    thread-count = 2
  }}
}}
akka {{
  extensions = [
    ""Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubExtensionProvider, Akka.Cluster.Tools""
@@ -294,6 +301,47 @@ akka {{
        commService?.SetNotificationOutbox(outboxProxy);
        _logger.LogInformation("NotificationOutbox singleton created and registered with CentralCommunicationActor");

        // Audit Log (#23) — central singleton mirrors the Notification Outbox
        // pattern. The IngestAuditEvents gRPC handler lives on SiteStreamGrpcServer
        // (Communication.Grpc); a central node hosting that server (M6 reconciliation
        // path) hands the proxy in via SetAuditIngestActor below. When the gRPC
        // server is not registered (current central topology), the host still
        // brings the singleton up so a Bundle H in-process test (or a future
        // direct caller) can Ask the proxy without further wiring.
        // IAuditLogRepository is a SCOPED EF Core service, so the singleton
        // actor takes the root IServiceProvider and creates a fresh scope per
        // message (mirroring NotificationOutboxActor). Pre-resolving the
        // repository here would attempt to take a scoped service from the
        // root and fail under DI scope validation.
        var auditIngestLogger = _serviceProvider.GetRequiredService<ILoggerFactory>()
            .CreateLogger<ScadaLink.AuditLog.Central.AuditLogIngestActor>();

        var auditIngestSingletonProps = ClusterSingletonManager.Props(
            singletonProps: Props.Create(() => new ScadaLink.AuditLog.Central.AuditLogIngestActor(
                _serviceProvider,
                auditIngestLogger)),
            terminationMessage: PoisonPill.Instance,
            settings: ClusterSingletonManagerSettings.Create(_actorSystem!)
                .WithSingletonName("audit-log-ingest"));
        _actorSystem!.ActorOf(auditIngestSingletonProps, "audit-log-ingest-singleton");

        var auditIngestProxyProps = ClusterSingletonProxy.Props(
            singletonManagerPath: "/user/audit-log-ingest-singleton",
            settings: ClusterSingletonProxySettings.Create(_actorSystem)
                .WithSingletonName("audit-log-ingest"));
        var auditIngestProxy = _actorSystem.ActorOf(auditIngestProxyProps, "audit-log-ingest-proxy");

        // Hand the proxy to the SiteStreamGrpcServer (if registered on this node)
        // so the IngestAuditEvents RPC routes incoming site batches to the singleton.
        // The gRPC server is currently only registered on Site nodes; on a central
        // node this resolves to null and the wiring is a no-op until M6 (which
        // brings central-hosted gRPC + a real site→central client).
        var grpcServer = _serviceProvider.GetService<ScadaLink.Communication.Grpc.SiteStreamGrpcServer>();
        grpcServer?.SetAuditIngestActor(auditIngestProxy);
        _logger.LogInformation(
            "AuditLogIngestActor singleton created (gRPC server bound: {GrpcBound})",
            grpcServer is not null);

        _logger.LogInformation("Central actors registered. CentralCommunicationActor created.");
    }
@@ -504,6 +552,41 @@ akka {{
                contacts.Count, _nodeOptions.SiteId);
        }

        // Audit Log (#23) — site-side telemetry actor that drains the SQLite
        // Pending queue and pushes to central via IngestAuditEvents. Not a
        // cluster singleton: each site is its own cluster, and the actor reads
        // node-local SQLite (no replication). The Props are bound to the
        // dedicated audit-telemetry-dispatcher (defined in BuildHocon) so a
        // batch SQLite read + gRPC push never contend with the default
        // dispatcher used by hot-path actors.
        //
        // Per Bundle E's brief: the SiteAuditTelemetryActor takes its
        // collaborators through its constructor, so we resolve them from DI
        // and pass them in via Props.Create rather than relying on a future
        // FactoryProvider. This also lets the M6 follow-up swap the
        // NoOpSiteStreamAuditClient registration for the real gRPC client
        // without touching this site wiring.
        var siteAuditOptions = _serviceProvider
            .GetRequiredService<IOptions<ScadaLink.AuditLog.Site.Telemetry.SiteAuditTelemetryOptions>>();
        var siteAuditQueue = _serviceProvider
            .GetRequiredService<ScadaLink.AuditLog.Site.Telemetry.ISiteAuditQueue>();
        var siteAuditClient = _serviceProvider
            .GetRequiredService<ScadaLink.AuditLog.Site.Telemetry.ISiteStreamAuditClient>();
        var siteAuditLogger = _serviceProvider.GetRequiredService<ILoggerFactory>()
            .CreateLogger<ScadaLink.AuditLog.Site.Telemetry.SiteAuditTelemetryActor>();

        var siteAuditTelemetryProps = Props.Create(() =>
            new ScadaLink.AuditLog.Site.Telemetry.SiteAuditTelemetryActor(
                siteAuditQueue,
                siteAuditClient,
                siteAuditOptions,
                siteAuditLogger))
            .WithDispatcher("audit-telemetry-dispatcher");
        _actorSystem.ActorOf(siteAuditTelemetryProps, "site-audit-telemetry");
        _logger.LogInformation(
            "SiteAuditTelemetryActor created (dispatcher=audit-telemetry-dispatcher, client={ClientType})",
            siteAuditClient.GetType().Name);

        // Gate gRPC subscriptions until the actor system and SiteStreamManager are
        // initialized (REQ-HOST-7).
        //
@@ -1,5 +1,6 @@
|
||||
using HealthChecks.UI.Client;
|
||||
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
|
||||
using ScadaLink.AuditLog;
|
||||
using ScadaLink.CentralUI;
|
||||
using ScadaLink.ClusterInfrastructure;
|
||||
using ScadaLink.Communication;
|
||||
@@ -77,6 +78,10 @@ try
|
||||
// AddNotificationService() SMTP machinery above. AddNotificationOutbox binds
|
||||
// NotificationOutboxOptions via BindConfiguration, so no explicit Configure is needed.
|
||||
builder.Services.AddNotificationOutbox();
|
||||
// Audit Log (#23) — central node owns the AuditLogIngestActor singleton +
|
||||
// IAuditLogRepository. The site writer chain is still registered (lazy
|
||||
// singletons) but is never resolved on a central node.
|
||||
builder.Services.AddAuditLog(builder.Configuration);
|
||||
builder.Services.AddTemplateEngine();
|
||||
builder.Services.AddDeploymentManager();
|
||||
builder.Services.AddSecurity();
|
||||
|
||||
@@ -38,6 +38,7 @@
|
||||
<ProjectReference Include="../ScadaLink.ExternalSystemGateway/ScadaLink.ExternalSystemGateway.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.NotificationService/ScadaLink.NotificationService.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.NotificationOutbox/ScadaLink.NotificationOutbox.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.AuditLog/ScadaLink.AuditLog.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.CentralUI/ScadaLink.CentralUI.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.Security/ScadaLink.Security.csproj" />
|
||||
<ProjectReference Include="../ScadaLink.HealthMonitoring/ScadaLink.HealthMonitoring.csproj" />
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
using ScadaLink.AuditLog;
|
||||
using ScadaLink.ClusterInfrastructure;
|
||||
using ScadaLink.Communication;
|
||||
using ScadaLink.DataConnectionLayer;
|
||||
@@ -44,6 +45,19 @@ public static class SiteServiceRegistration
|
||||
services.AddStoreAndForward();
|
||||
services.AddSiteEventLogging();
|
||||
|
||||
// Audit Log (#23) — site-side hot-path writer + telemetry collaborators.
|
||||
// The SiteAuditTelemetryActor itself is registered by AkkaHostedService
|
||||
// in the site-role block; this call wires every DI dependency it (and
|
||||
// ScriptRuntimeContext, when Bundle F lands) reaches for.
|
||||
services.AddAuditLog(config);
|
||||
|
||||
// Audit Log (#23) M2 Bundle G — bridge FallbackAuditWriter primary
|
||||
// failures into the site health report payload as
|
||||
// SiteAuditWriteFailures. Must come AFTER both AddSiteHealthMonitoring
|
||||
// (registers ISiteHealthCollector) and AddAuditLog (registers the
|
||||
// NoOp default this call replaces).
|
||||
services.AddAuditLogHealthMetricsBridge();
|
||||
|
||||
// WP-13: Akka.NET bootstrap via hosted service
|
||||
services.AddSingleton<AkkaHostedService>();
|
||||
services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>());
|
||||
|
||||
@@ -101,6 +101,10 @@ public class ScriptExecutionActor : ReceiveActor
|
||||
// provider supplies the site id stamped on enqueued notifications.
|
||||
StoreAndForwardService? storeAndForward = null;
|
||||
var siteId = string.Empty;
|
||||
// Audit Log #23 (M2 Bundle F): the writer is a singleton (FallbackAuditWriter
|
||||
// composes the SQLite hot-path + drop-oldest ring); null in tests / hosts
|
||||
// that haven't called AddAuditLog, which the helper handles as a no-op.
|
||||
IAuditWriter? auditWriter = null;
|
||||
|
||||
if (serviceProvider != null)
|
||||
{
|
||||
@@ -110,6 +114,7 @@ public class ScriptExecutionActor : ReceiveActor
|
||||
storeAndForward = serviceScope.ServiceProvider.GetService<StoreAndForwardService>();
|
||||
siteId = serviceScope.ServiceProvider.GetService<ISiteIdentityProvider>()?.SiteId
|
||||
?? string.Empty;
|
||||
auditWriter = serviceScope.ServiceProvider.GetService<IAuditWriter>();
|
||||
}
|
||||
|
||||
var context = new ScriptRuntimeContext(
|
||||
@@ -128,7 +133,12 @@ public class ScriptExecutionActor : ReceiveActor
|
||||
siteId,
|
||||
// Notification Outbox (FU3): stamp the executing script onto outbound
|
||||
// notifications using the Site Event Logging "Source" convention.
|
||||
- sourceScript: $"ScriptActor:{scriptName}");
+ sourceScript: $"ScriptActor:{scriptName}",
+ // Audit Log #23 (M2 Bundle F): emit one ApiOutbound/ApiCall row per
+ // ExternalSystem.Call. Writer is best-effort; failures are logged
+ // and swallowed inside the helper so the script's call path is
+ // never aborted by an audit failure.
+ auditWriter: auditWriter);
|
||||
|
||||
var globals = new ScriptGlobals
|
||||
{
|
||||
|
||||
@@ -1,6 +1,9 @@
|
||||
using System.Diagnostics;
|
||||
using System.Text.Json;
|
||||
using System.Text.RegularExpressions;
|
||||
using Akka.Actor;
|
||||
using Microsoft.Extensions.Logging;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Interfaces.Services;
|
||||
using ScadaLink.Commons.Messages.Instance;
|
||||
using ScadaLink.Commons.Messages.Notification;
|
||||
@@ -75,6 +78,13 @@ public class ScriptRuntimeContext
|
||||
/// </summary>
|
||||
private readonly string? _sourceScript;
|
||||
|
||||
/// <summary>
|
||||
/// Audit Log #23: best-effort emitter for boundary-crossing actions executed
|
||||
/// by the script. Optional — when null the helpers degrade to a no-op audit
|
||||
/// path so tests / contexts that do not need the audit pipeline still work.
|
||||
/// </summary>
|
||||
private readonly IAuditWriter? _auditWriter;
|
||||
|
||||
public ScriptRuntimeContext(
|
||||
IActorRef instanceActor,
|
||||
IActorRef self,
|
||||
@@ -89,7 +99,8 @@ public class ScriptRuntimeContext
|
||||
StoreAndForwardService? storeAndForward = null,
|
||||
ICanTell? siteCommunicationActor = null,
|
||||
string siteId = "",
|
||||
- string? sourceScript = null)
+ string? sourceScript = null,
+ IAuditWriter? auditWriter = null)
|
||||
{
|
||||
_instanceActor = instanceActor;
|
||||
_self = self;
|
||||
@@ -105,6 +116,7 @@ public class ScriptRuntimeContext
|
||||
_siteCommunicationActor = siteCommunicationActor;
|
||||
_siteId = siteId;
|
||||
_sourceScript = sourceScript;
|
||||
_auditWriter = auditWriter;
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
@@ -204,7 +216,8 @@ public class ScriptRuntimeContext
|
||||
/// ExternalSystem.Call("systemName", "methodName", params)
|
||||
/// ExternalSystem.CachedCall("systemName", "methodName", params)
|
||||
/// </summary>
|
||||
- public ExternalSystemHelper ExternalSystem => new(_externalSystemClient, _instanceName, _logger);
+ public ExternalSystemHelper ExternalSystem => new(
+ _externalSystemClient, _instanceName, _logger, _auditWriter, _siteId, _sourceScript);
|
||||
|
||||
/// <summary>
|
||||
/// WP-13: Provides access to database operations.
|
||||
@@ -275,17 +288,41 @@ public class ScriptRuntimeContext
|
||||
/// <summary>
|
||||
/// WP-13: Helper for ExternalSystem.Call/CachedCall syntax.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// Audit Log #23 (M2 Bundle F): every <see cref="Call"/> invocation emits
|
||||
/// one <c>ApiOutbound</c>/<c>ApiCall</c> audit row via <see cref="IAuditWriter"/>.
|
||||
/// The audit emission is wrapped in a try/catch that swallows every exception
|
||||
/// — the audit pipeline is best-effort and must NEVER abort the script's
|
||||
/// outbound call (alog.md §7). The original <see cref="ExternalCallResult"/>
|
||||
/// (or the original thrown exception) flows back to the caller unchanged.
|
||||
/// </remarks>
|
||||
public class ExternalSystemHelper
|
||||
{
|
||||
private static readonly Regex HttpStatusRegex = new(
|
||||
@"HTTP\s+(?<code>\d{3})",
|
||||
RegexOptions.Compiled | RegexOptions.CultureInvariant);
|
||||
|
||||
private readonly IExternalSystemClient? _client;
|
||||
private readonly string _instanceName;
|
||||
private readonly ILogger _logger;
|
||||
private readonly IAuditWriter? _auditWriter;
|
||||
private readonly string _siteId;
|
||||
private readonly string? _sourceScript;
|
||||
|
||||
- internal ExternalSystemHelper(IExternalSystemClient? client, string instanceName, ILogger logger)
+ internal ExternalSystemHelper(
+ IExternalSystemClient? client,
+ string instanceName,
+ ILogger logger,
+ IAuditWriter? auditWriter = null,
+ string siteId = "",
+ string? sourceScript = null)
|
||||
{
|
||||
_client = client;
|
||||
_instanceName = instanceName;
|
||||
_logger = logger;
|
||||
_auditWriter = auditWriter;
|
||||
_siteId = siteId;
|
||||
_sourceScript = sourceScript;
|
||||
}
|
||||
|
||||
public async Task<ExternalCallResult> Call(
|
||||
@@ -297,7 +334,31 @@ public class ScriptRuntimeContext
|
||||
if (_client == null)
|
||||
throw new InvalidOperationException("External system client not available");
|
||||
|
||||
- return await _client.CallAsync(systemName, methodName, parameters, cancellationToken);
+ // Audit Log #23 (M2 Bundle F): wrap the outbound call so every
+ // attempt emits exactly one ApiOutbound/ApiCall row. The wrapper
+ // mirrors the existing call-site behaviour — the original result
+ // OR original exception flows back to the script untouched; the
+ // audit emission is best-effort.
+ var occurredAtUtc = DateTime.UtcNow;
+ var startTicks = Stopwatch.GetTimestamp();
+ ExternalCallResult? result = null;
+ Exception? thrown = null;
+ try
+ {
+ result = await _client.CallAsync(systemName, methodName, parameters, cancellationToken);
+ return result;
+ }
+ catch (Exception ex)
+ {
+ thrown = ex;
+ throw;
+ }
+ finally
+ {
+ var elapsedMs = (int)((Stopwatch.GetTimestamp() - startTicks)
+ * 1000d / Stopwatch.Frequency);
+ EmitCallAudit(systemName, methodName, occurredAtUtc, elapsedMs, result, thrown);
+ }
|
||||
}
|
||||
|
||||
public async Task<ExternalCallResult> CachedCall(
|
||||
@@ -311,6 +372,145 @@ public class ScriptRuntimeContext
|
||||
|
||||
return await _client.CachedCallAsync(systemName, methodName, parameters, _instanceName, cancellationToken);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Best-effort emission of one <c>ApiOutbound</c>/<c>ApiCall</c> audit
|
||||
/// row. Any exception thrown by the writer is logged and swallowed —
|
||||
/// audit-write failures must never abort the user-facing action.
|
||||
/// </summary>
|
||||
private void EmitCallAudit(
|
||||
string systemName,
|
||||
string methodName,
|
||||
DateTime occurredAtUtc,
|
||||
int durationMs,
|
||||
ExternalCallResult? result,
|
||||
Exception? thrown)
|
||||
{
|
||||
if (_auditWriter == null)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
AuditEvent evt;
|
||||
try
|
||||
{
|
||||
evt = BuildCallAuditEvent(systemName, methodName, occurredAtUtc, durationMs, result, thrown);
|
||||
}
|
||||
catch (Exception buildEx)
|
||||
{
|
||||
// Building the event itself must never propagate. This is a
|
||||
// defensive guard — populating a record from already-validated
|
||||
// values shouldn't throw, but we honour the alog.md §7
|
||||
// best-effort contract regardless.
|
||||
_logger.LogWarning(buildEx,
|
||||
"Failed to build Audit Log #23 event for {System}.{Method} — skipping emission",
|
||||
systemName, methodName);
|
||||
return;
|
||||
}
|
||||
|
||||
try
|
||||
{
|
||||
// Fire-and-forget so we never block the script on the audit
// writer; the writer itself is responsible for fast, durable
// enqueue (site SQLite hot-path). We DO observe failures via
// ContinueWith so a faulted write is logged rather than
// surfacing as an unobserved task exception.
|
||||
var writeTask = _auditWriter.WriteAsync(evt, CancellationToken.None);
|
||||
if (!writeTask.IsCompleted)
|
||||
{
|
||||
writeTask.ContinueWith(
|
||||
t => _logger.LogWarning(t.Exception,
|
||||
"Audit Log #23 write failed for EventId {EventId} ({System}.{Method})",
|
||||
evt.EventId, systemName, methodName),
|
||||
CancellationToken.None,
|
||||
TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
|
||||
TaskScheduler.Default);
|
||||
}
|
||||
else if (writeTask.IsFaulted)
|
||||
{
|
||||
_logger.LogWarning(writeTask.Exception,
|
||||
"Audit Log #23 write failed for EventId {EventId} ({System}.{Method})",
|
||||
evt.EventId, systemName, methodName);
|
||||
}
|
||||
}
|
||||
catch (Exception writeEx)
|
||||
{
|
||||
// Synchronous throw from WriteAsync (e.g. ArgumentNullException
|
||||
// before the writer's own try/catch). Swallow + log per the
|
||||
// alog.md §7 contract.
|
||||
_logger.LogWarning(writeEx,
|
||||
"Audit Log #23 write threw synchronously for EventId {EventId} ({System}.{Method})",
|
||||
evt.EventId, systemName, methodName);
|
||||
}
|
||||
}
|
||||
|
||||
private AuditEvent BuildCallAuditEvent(
|
||||
string systemName,
|
||||
string methodName,
|
||||
DateTime occurredAtUtc,
|
||||
int durationMs,
|
||||
ExternalCallResult? result,
|
||||
Exception? thrown)
|
||||
{
|
||||
// Status: Delivered on a Success result; Failed otherwise (the
|
||||
// ExternalSystemClient already maps HTTP non-2xx + transient
|
||||
// exceptions into Success=false on the result, or surfaces a raw
|
||||
// exception). M2 makes no distinction between transient + permanent
|
||||
// failure here — both manifest as Status.Failed on the sync path.
|
||||
var status = (thrown == null && result != null && result.Success)
|
||||
? AuditStatus.Delivered
|
||||
: AuditStatus.Failed;
|
||||
|
||||
string? errorMessage = null;
|
||||
string? errorDetail = null;
|
||||
int? httpStatus = null;
|
||||
|
||||
if (thrown != null)
|
||||
{
|
||||
errorMessage = thrown.Message;
|
||||
errorDetail = thrown.ToString();
|
||||
}
|
||||
else if (result != null && !result.Success)
|
||||
{
|
||||
errorMessage = result.ErrorMessage;
|
||||
// The ExternalSystemClient embeds the HTTP status code in the
|
||||
// error message as "HTTP {code}". Parse it back out so the
|
||||
// audit row carries the structured value.
|
||||
if (!string.IsNullOrEmpty(result.ErrorMessage))
|
||||
{
|
||||
var match = HttpStatusRegex.Match(result.ErrorMessage);
|
||||
if (match.Success
|
||||
&& int.TryParse(match.Groups["code"].Value, out var parsed))
|
||||
{
|
||||
httpStatus = parsed;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return new AuditEvent
|
||||
{
|
||||
EventId = Guid.NewGuid(),
|
||||
OccurredAtUtc = DateTime.SpecifyKind(occurredAtUtc, DateTimeKind.Utc),
|
||||
Channel = AuditChannel.ApiOutbound,
|
||||
Kind = AuditKind.ApiCall,
|
||||
CorrelationId = null,
|
||||
SourceSiteId = string.IsNullOrEmpty(_siteId) ? null : _siteId,
|
||||
SourceInstanceId = _instanceName,
|
||||
SourceScript = _sourceScript,
|
||||
Actor = null,
|
||||
Target = $"{systemName}.{methodName}",
|
||||
Status = status,
|
||||
HttpStatus = httpStatus,
|
||||
DurationMs = durationMs,
|
||||
ErrorMessage = errorMessage,
|
||||
ErrorDetail = errorDetail,
|
||||
RequestSummary = null,
|
||||
ResponseSummary = null,
|
||||
PayloadTruncated = false,
|
||||
Extra = null,
|
||||
ForwardState = AuditForwardState.Pending,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
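Reviewer sketch (not part of the commit): the `Call` wrapper above combines try/catch/finally with a best-effort audit hook. The minimal, self-contained version below shows the same shape under simplifying assumptions. `AuditedCallSketch`, `RunAsync` and the `emitAudit` delegate are hypothetical names, the result type is reduced to `string`, and the swallow happens in the `finally` block here, whereas the commit swallows inside `EmitCallAudit`.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AuditedCallSketch
{
    // Runs `call`, then reports (result, exception, elapsed ms) to `emitAudit`.
    // The caller observes exactly what `call` produced; audit failures are swallowed.
    public static async Task<string> RunAsync(
        Func<Task<string>> call,
        Action<string?, Exception?, int> emitAudit)
    {
        var startTicks = Stopwatch.GetTimestamp();
        string? result = null;
        Exception? thrown = null;
        try
        {
            result = await call();
            return result;
        }
        catch (Exception ex)
        {
            thrown = ex;
            throw; // rethrow so the original exception reaches the caller
        }
        finally
        {
            var elapsedMs = (int)((Stopwatch.GetTimestamp() - startTicks)
                * 1000d / Stopwatch.Frequency);
            try
            {
                emitAudit(result, thrown, elapsedMs);
            }
            catch (Exception)
            {
                // Best-effort: an audit failure must not replace the call's outcome.
            }
        }
    }
}
```

With this shape, a throwing `emitAudit` still leaves the awaited result (or the original exception) untouched, which is the contract the commit message describes as "audit-write failures NEVER abort the script".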
/// <summary>
|
||||
|
||||
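Reviewer sketch (not part of the commit): `BuildCallAuditEvent` recovers the HTTP status by matching `HTTP\s+(?<code>\d{3})` against the error message. The snippet below isolates that step so it can be tried on its own. `HttpStatusParsing` and `TryParseHttpStatus` are hypothetical names, and the sample messages in the usage note are made up for illustration; the exact text produced by `ExternalSystemClient` is not shown in this diff.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class HttpStatusParsing
{
    // Same pattern and options as HttpStatusRegex in ExternalSystemHelper.
    private static readonly Regex HttpStatusRegex = new(
        @"HTTP\s+(?<code>\d{3})",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Returns the first three-digit code following "HTTP", or null when the
    // message is empty or carries no such marker.
    public static int? TryParseHttpStatus(string? errorMessage)
    {
        if (string.IsNullOrEmpty(errorMessage))
        {
            return null;
        }

        var match = HttpStatusRegex.Match(errorMessage);
        return match.Success && int.TryParse(match.Groups["code"].Value, out var parsed)
            ? parsed
            : (int?)null;
    }
}
```

For example, `TryParseHttpStatus("HTTP 503 Service Unavailable")` yields 503, while a message without the marker, such as `"connection refused"`, yields null. Because the pattern is unanchored, any "HTTP" followed by whitespace and three digits anywhere in the message will match, which is worth keeping in mind if error text is ever free-form.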
@@ -1,28 +1,43 @@
|
||||
using Microsoft.Extensions.Configuration;
|
||||
using Microsoft.Extensions.DependencyInjection;
|
||||
using Microsoft.Extensions.Logging;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using Microsoft.Extensions.Options;
|
||||
using ScadaLink.AuditLog.Configuration;
|
||||
using ScadaLink.AuditLog.Site;
|
||||
using ScadaLink.AuditLog.Site.Telemetry;
|
||||
using ScadaLink.Commons.Interfaces.Services;
|
||||
using ScadaLink.HealthMonitoring;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests;
|
||||
|
||||
/// <summary>
|
||||
- /// Bundle E (M1) smoke tests for the Audit Log (#23) DI scaffold. Verifies
- /// <c>AddAuditLog</c> registers <see cref="AuditLogOptions"/> against the
- /// <c>AuditLog</c> configuration section. Bundle E ships only the scaffold;
- /// the validator + full options surface land in Task 9.
+ /// Bundle E (M2 Task E1) DI surface tests for <c>AddAuditLog</c>. M1 shipped
+ /// the options-only scaffold; M2 extends it with the site writer chain
+ /// (<see cref="SqliteAuditWriter"/> + <see cref="RingBufferFallback"/> +
+ /// <see cref="FallbackAuditWriter"/>) and the telemetry collaborators
+ /// (<see cref="ISiteAuditQueue"/>, <see cref="ISiteStreamAuditClient"/>,
+ /// <see cref="IAuditWriteFailureCounter"/>).
|
||||
/// </summary>
|
||||
public class AddAuditLogTests
|
||||
{
|
||||
- [Fact]
- public void AddAuditLog_RegistersAuditLogOptions()
+ private static ServiceProvider BuildProvider(IDictionary<string, string?>? settings = null)
  {
  var config = new ConfigurationBuilder()
- .AddInMemoryCollection(new Dictionary<string, string?>())
+ .AddInMemoryCollection(settings ?? new Dictionary<string, string?>())
  .Build();

  var services = new ServiceCollection();
+ services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
+ services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
  services.AddAuditLog(config);
- var provider = services.BuildServiceProvider();
+ return services.BuildServiceProvider();
+ }
+
+ [Fact]
+ public void AddAuditLog_RegistersAuditLogOptions()
+ {
+ using var provider = BuildProvider();
|
||||
|
||||
var opts = provider.GetService<IOptions<AuditLogOptions>>();
|
||||
|
||||
@@ -47,4 +62,182 @@ public class AddAuditLogTests
|
||||
Assert.Throws<ArgumentNullException>(
|
||||
() => services.AddAuditLog(null!));
|
||||
}
|
||||
|
||||
// -- Bundle E (M2 Task E1) ---------------------------------------------
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_SqliteAuditWriter_Singleton_FromDI()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
// In-memory database keeps the writer's owned connection portable
|
||||
// across tests; the per-instance Cache=Shared in the writer's
|
||||
// default connection string ensures no on-disk file is touched.
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var writer = provider.GetService<SqliteAuditWriter>();
|
||||
|
||||
Assert.NotNull(writer);
|
||||
// Singleton — same instance on a second resolve.
|
||||
Assert.Same(writer, provider.GetService<SqliteAuditWriter>());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_IAuditWriter_AsFallbackAuditWriter()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var writer = provider.GetService<IAuditWriter>();
|
||||
|
||||
Assert.NotNull(writer);
|
||||
Assert.IsType<FallbackAuditWriter>(writer);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_ISiteAuditQueue_AsSameInstance_As_SqliteAuditWriter()
|
||||
{
|
||||
// The telemetry actor reads from ISiteAuditQueue while scripts write
|
||||
// through IAuditWriter → SqliteAuditWriter. Both surfaces MUST resolve
|
||||
// to the same instance or pending rows will never be visible.
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var queue = provider.GetService<ISiteAuditQueue>();
|
||||
var writer = provider.GetService<SqliteAuditWriter>();
|
||||
|
||||
Assert.NotNull(queue);
|
||||
Assert.NotNull(writer);
|
||||
Assert.Same(writer, queue);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_RingBufferFallback_Singleton()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var ring = provider.GetService<RingBufferFallback>();
|
||||
Assert.NotNull(ring);
|
||||
Assert.Same(ring, provider.GetService<RingBufferFallback>());
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_AuditWriteFailureCounter_AsNoOpDefault()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var counter = provider.GetService<IAuditWriteFailureCounter>();
|
||||
Assert.NotNull(counter);
|
||||
Assert.IsType<NoOpAuditWriteFailureCounter>(counter);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Registers_SiteStreamAuditClient_AsNoOpDefault()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
});
|
||||
|
||||
var client = provider.GetService<ISiteStreamAuditClient>();
|
||||
Assert.NotNull(client);
|
||||
Assert.IsType<NoOpSiteStreamAuditClient>(client);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Options_Bind_RoundTrip_SqliteWriter()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = "/tmp/test-audit.db",
|
||||
["AuditLog:SiteWriter:ChannelCapacity"] = "8192",
|
||||
["AuditLog:SiteWriter:BatchSize"] = "128",
|
||||
["AuditLog:SiteWriter:FlushIntervalMs"] = "75",
|
||||
});
|
||||
|
||||
var opts = provider.GetRequiredService<IOptions<SqliteAuditWriterOptions>>().Value;
|
||||
Assert.Equal("/tmp/test-audit.db", opts.DatabasePath);
|
||||
Assert.Equal(8192, opts.ChannelCapacity);
|
||||
Assert.Equal(128, opts.BatchSize);
|
||||
Assert.Equal(75, opts.FlushIntervalMs);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLog_Options_Bind_RoundTrip_SiteTelemetry()
|
||||
{
|
||||
using var provider = BuildProvider(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteTelemetry:BatchSize"] = "512",
|
||||
["AuditLog:SiteTelemetry:BusyIntervalSeconds"] = "3",
|
||||
["AuditLog:SiteTelemetry:IdleIntervalSeconds"] = "60",
|
||||
});
|
||||
|
||||
var opts = provider.GetRequiredService<IOptions<SiteAuditTelemetryOptions>>().Value;
|
||||
Assert.Equal(512, opts.BatchSize);
|
||||
Assert.Equal(3, opts.BusyIntervalSeconds);
|
||||
Assert.Equal(60, opts.IdleIntervalSeconds);
|
||||
}
|
||||
|
||||
// -- Bundle G (M2 Task G1) Site Health Monitoring bridge ----------------
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLogHealthMetricsBridge_Swaps_FailureCounter_To_HealthMetrics_Implementation()
|
||||
{
|
||||
var config = new ConfigurationBuilder()
|
||||
.AddInMemoryCollection(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
})
|
||||
.Build();
|
||||
|
||||
var services = new ServiceCollection();
|
||||
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
|
||||
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
|
||||
services.AddAuditLog(config);
|
||||
// The bridge depends on ISiteHealthCollector; AddHealthMonitoring is
|
||||
// what registers it on the site (and the central self-host).
|
||||
services.AddHealthMonitoring();
|
||||
services.AddAuditLogHealthMetricsBridge();
|
||||
using var provider = services.BuildServiceProvider();
|
||||
|
||||
var counter = provider.GetRequiredService<IAuditWriteFailureCounter>();
|
||||
|
||||
Assert.IsType<HealthMetricsAuditWriteFailureCounter>(counter);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void AddAuditLogHealthMetricsBridge_Without_HealthMonitoring_Still_Resolves_But_Errors_On_Use()
|
||||
{
|
||||
// The bridge replaces the registration unconditionally. When
// ISiteHealthCollector is missing, resolving the counter throws at
// GetRequiredService time (not on first use, despite the test name).
// This documents the contract: callers must register
// AddHealthMonitoring() before the bridge.
|
||||
var config = new ConfigurationBuilder()
|
||||
.AddInMemoryCollection(new Dictionary<string, string?>
|
||||
{
|
||||
["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
|
||||
})
|
||||
.Build();
|
||||
|
||||
var services = new ServiceCollection();
|
||||
services.AddSingleton<ILoggerFactory, NullLoggerFactory>();
|
||||
services.AddSingleton(typeof(ILogger<>), typeof(NullLogger<>));
|
||||
services.AddAuditLog(config);
|
||||
services.AddAuditLogHealthMetricsBridge();
|
||||
using var provider = services.BuildServiceProvider();
|
||||
|
||||
Assert.Throws<InvalidOperationException>(
|
||||
() => provider.GetRequiredService<IAuditWriteFailureCounter>());
|
||||
}
|
||||
}
|
||||
|
||||
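Reviewer sketch (not part of the commit): the test `AddAuditLog_Registers_ISiteAuditQueue_AsSameInstance_As_SqliteAuditWriter` requires the concrete writer and the queue interface to resolve to one object. How `AddAuditLog` achieves this is not shown in this diff. One common way, assumed here, is a forwarding factory registration, the same shape `SiteServiceRegistration` uses for `AkkaHostedService`. `IQueueSketch`, `WriterSketch` and `ForwardingRegistrationSketch` are hypothetical stand-ins; the example needs the `Microsoft.Extensions.DependencyInjection` package.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins for ISiteAuditQueue and SqliteAuditWriter.
public interface IQueueSketch { }
public sealed class WriterSketch : IQueueSketch { }

public static class ForwardingRegistrationSketch
{
    public static ServiceProvider Build()
    {
        var services = new ServiceCollection();

        // Register the concrete type once as a singleton...
        services.AddSingleton<WriterSketch>();

        // ...then forward the interface to that same singleton, so both
        // service types hand out one shared instance.
        services.AddSingleton<IQueueSketch>(
            sp => sp.GetRequiredService<WriterSketch>());

        return services.BuildServiceProvider();
    }
}
```

Registering the interface with `AddSingleton<IQueueSketch, WriterSketch>()` instead would create a second, independent instance, which is the failure mode the test comment warns about ("pending rows will never be visible").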
@@ -0,0 +1,220 @@
|
||||
using Akka.Actor;
|
||||
using Akka.TestKit.Xunit2;
|
||||
using Microsoft.EntityFrameworkCore;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using ScadaLink.AuditLog.Central;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Interfaces.Repositories;
|
||||
using ScadaLink.Commons.Messages.Audit;
|
||||
using ScadaLink.Commons.Types.Audit;
|
||||
using ScadaLink.Commons.Types.Enums;
|
||||
using ScadaLink.ConfigurationDatabase;
|
||||
using ScadaLink.ConfigurationDatabase.Repositories;
|
||||
using ScadaLink.ConfigurationDatabase.Tests.Migrations;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Central;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle D D2 tests for <see cref="AuditLogIngestActor"/>. Uses the same
|
||||
/// <see cref="MsSqlMigrationFixture"/> as the M1 repository tests so the actor
|
||||
/// exercises real <see cref="AuditLogRepository.InsertIfNotExistsAsync"/>
|
||||
/// against a partitioned MSSQL schema (the only way to verify the
|
||||
/// IngestedAtUtc stamp + duplicate-key idempotency end to end).
|
||||
/// </summary>
|
||||
public class AuditLogIngestActorTests : TestKit, IClassFixture<MsSqlMigrationFixture>
|
||||
{
|
||||
private readonly MsSqlMigrationFixture _fixture;
|
||||
|
||||
public AuditLogIngestActorTests(MsSqlMigrationFixture fixture)
|
||||
{
|
||||
_fixture = fixture;
|
||||
}
|
||||
|
||||
private ScadaLinkDbContext CreateContext()
|
||||
{
|
||||
var options = new DbContextOptionsBuilder<ScadaLinkDbContext>()
|
||||
.UseSqlServer(_fixture.ConnectionString)
|
||||
.Options;
|
||||
return new ScadaLinkDbContext(options);
|
||||
}
|
||||
|
||||
private static string NewSiteId() =>
|
||||
"test-bundle-d2-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
|
||||
private static AuditEvent NewEvent(string siteId, Guid? id = null) => new()
|
||||
{
|
||||
EventId = id ?? Guid.NewGuid(),
|
||||
OccurredAtUtc = new DateTime(2026, 5, 20, 10, 0, 0, DateTimeKind.Utc),
|
||||
Channel = AuditChannel.ApiOutbound,
|
||||
Kind = AuditKind.ApiCall,
|
||||
Status = AuditStatus.Delivered,
|
||||
SourceSiteId = siteId,
|
||||
};
|
||||
|
||||
private IActorRef CreateActor(IAuditLogRepository repository) =>
|
||||
Sys.ActorOf(Props.Create(() => new AuditLogIngestActor(
|
||||
repository,
|
||||
NullLogger<AuditLogIngestActor>.Instance)));
|
||||
|
||||
[SkippableFact]
|
||||
public async Task Receive_BatchOf5_Calls_Repo_5Times_Acks_All_5()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
var events = Enumerable.Range(0, 5).Select(_ => NewEvent(siteId)).ToList();
|
||||
|
||||
await using var context = CreateContext();
|
||||
var repo = new AuditLogRepository(context);
|
||||
var actor = CreateActor(repo);
|
||||
|
||||
actor.Tell(new IngestAuditEventsCommand(events), TestActor);
|
||||
|
||||
var reply = ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
|
||||
Assert.Equal(5, reply.AcceptedEventIds.Count);
|
||||
Assert.True(events.Select(e => e.EventId).ToHashSet().SetEquals(reply.AcceptedEventIds.ToHashSet()));
|
||||
|
||||
// Verify rows landed in MSSQL.
|
||||
await using var readContext = CreateContext();
|
||||
var rows = await readContext.Set<AuditEvent>()
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.ToListAsync();
|
||||
Assert.Equal(5, rows.Count);
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task Receive_BatchWith_AlreadyExistingEvent_AcksAll_NoDoubleInsert()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
var pre = NewEvent(siteId);
|
||||
|
||||
// Pre-insert one event directly via the repo so the actor sees it
|
||||
// already present when it processes the batch.
|
||||
await using (var seedContext = CreateContext())
|
||||
{
|
||||
var seedRepo = new AuditLogRepository(seedContext);
|
||||
await seedRepo.InsertIfNotExistsAsync(pre);
|
||||
}
|
||||
|
||||
// Build the batch including the pre-existing event plus 2 new ones.
|
||||
var fresh1 = NewEvent(siteId);
|
||||
var fresh2 = NewEvent(siteId);
|
||||
var batch = new List<AuditEvent> { pre, fresh1, fresh2 };
|
||||
|
||||
await using var context = CreateContext();
|
||||
var repo = new AuditLogRepository(context);
|
||||
var actor = CreateActor(repo);
|
||||
|
||||
actor.Tell(new IngestAuditEventsCommand(batch), TestActor);
|
||||
|
||||
var reply = ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
|
||||
// All 3 acked under idempotent first-write-wins.
|
||||
Assert.Equal(3, reply.AcceptedEventIds.Count);
|
||||
|
||||
// Verify no double-insert.
|
||||
await using var readContext = CreateContext();
|
||||
var count = await readContext.Set<AuditEvent>()
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.CountAsync();
|
||||
Assert.Equal(3, count);
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task Receive_Sets_IngestedAtUtc_Before_Insert()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
var events = Enumerable.Range(0, 3).Select(_ => NewEvent(siteId)).ToList();
|
||||
|
||||
var before = DateTime.UtcNow.AddSeconds(-1);
|
||||
|
||||
await using var context = CreateContext();
|
||||
var repo = new AuditLogRepository(context);
|
||||
var actor = CreateActor(repo);
|
||||
|
||||
actor.Tell(new IngestAuditEventsCommand(events), TestActor);
|
||||
ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
|
||||
|
||||
var after = DateTime.UtcNow.AddSeconds(1);
|
||||
|
||||
await using var readContext = CreateContext();
|
||||
var rows = await readContext.Set<AuditEvent>()
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.ToListAsync();
|
||||
|
||||
Assert.Equal(3, rows.Count);
|
||||
Assert.All(rows, r =>
|
||||
{
|
||||
Assert.NotNull(r.IngestedAtUtc);
|
||||
Assert.InRange(r.IngestedAtUtc!.Value, before, after);
|
||||
});
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task Receive_RepoThrowsForOneEvent_Other4StillPersisted()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
var events = Enumerable.Range(0, 5).Select(_ => NewEvent(siteId)).ToList();
|
||||
var poisonId = events[2].EventId;
|
||||
|
||||
// Wrapper repo that throws only when the poison EventId is being
|
||||
// inserted. The four neighbours must still land in MSSQL.
|
||||
await using var context = CreateContext();
|
||||
var realRepo = new AuditLogRepository(context);
|
||||
var wrappedRepo = new ThrowingRepository(realRepo, poisonId);
|
||||
var actor = CreateActor(wrappedRepo);
|
||||
|
||||
actor.Tell(new IngestAuditEventsCommand(events), TestActor);
|
||||
var reply = ExpectMsg<IngestAuditEventsReply>(TimeSpan.FromSeconds(10));
|
||||
|
||||
// The actor catches the throw per-row, so 4 ids are accepted and 1 is
|
||||
// left out.
|
||||
Assert.Equal(4, reply.AcceptedEventIds.Count);
|
||||
Assert.DoesNotContain(poisonId, reply.AcceptedEventIds);
|
||||
|
||||
await using var readContext = CreateContext();
|
||||
var rows = await readContext.Set<AuditEvent>()
|
||||
.Where(e => e.SourceSiteId == siteId)
|
||||
.ToListAsync();
|
||||
Assert.Equal(4, rows.Count);
|
||||
Assert.DoesNotContain(rows, r => r.EventId == poisonId);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Tiny test double that delegates to a real repository but throws on a
|
||||
/// specified EventId. Used to verify per-row failure isolation: one bad
|
||||
/// row must not cause the rest of the batch to be lost.
|
||||
/// </summary>
|
||||
private sealed class ThrowingRepository : IAuditLogRepository
|
||||
{
|
||||
private readonly IAuditLogRepository _inner;
|
||||
private readonly Guid _poisonId;
|
||||
|
||||
public ThrowingRepository(IAuditLogRepository inner, Guid poisonId)
|
||||
{
|
||||
_inner = inner;
|
||||
_poisonId = poisonId;
|
||||
}
|
||||
|
||||
public Task InsertIfNotExistsAsync(AuditEvent evt, CancellationToken ct = default)
|
||||
{
|
||||
if (evt.EventId == _poisonId)
|
||||
{
|
||||
throw new InvalidOperationException("simulated repo failure for poison row");
|
||||
}
|
||||
return _inner.InsertIfNotExistsAsync(evt, ct);
|
||||
}
|
||||
|
||||
public Task<IReadOnlyList<AuditEvent>> QueryAsync(
|
||||
AuditLogQueryFilter filter, AuditLogPaging paging, CancellationToken ct = default) =>
|
||||
_inner.QueryAsync(filter, paging, ct);
|
||||
|
||||
public Task SwitchOutPartitionAsync(DateTime monthBoundary, CancellationToken ct = default) =>
|
||||
_inner.SwitchOutPartitionAsync(monthBoundary, ct);
|
||||
}
|
||||
}
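The per-row isolation that Receive_RepoThrowsForOneEvent_Other4StillPersisted checks can be sketched without Akka or MSSQL. This is only an illustration under stated assumptions, not the AuditLogIngestActor's actual code (which this part of the diff does not show): `InsertAsync`, `stored`, and the Guid-only batch are hypothetical stand-ins for the repository and the audit events.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins: a batch of five event ids, one of which is the poison row.
var poisonId = Guid.NewGuid();
var batch = new List<Guid> { Guid.NewGuid(), Guid.NewGuid(), poisonId, Guid.NewGuid(), Guid.NewGuid() };
var stored = new HashSet<Guid>();

// Stand-in for a repository insert that throws only for the poison id.
Task InsertAsync(Guid id)
{
    if (id == poisonId)
    {
        throw new InvalidOperationException("simulated repo failure for poison row");
    }
    stored.Add(id);
    return Task.CompletedTask;
}

// Each row gets its own try/catch, so one failing row cannot abort its neighbours.
var accepted = new List<Guid>();
foreach (var id in batch)
{
    try
    {
        await InsertAsync(id);
        accepted.Add(id); // only rows that were stored are acked
    }
    catch (Exception)
    {
        // A real implementation would presumably log here; the row is left un-acked.
    }
}

Console.WriteLine($"accepted={accepted.Count}, poisonAccepted={accepted.Contains(poisonId)}");
// prints "accepted=4, poisonAccepted=False"
```

Leaving the failed id out of the accepted list is what would let a sender keep that row pending and retry it later, while the other four are acknowledged.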
|
||||
@@ -0,0 +1,341 @@
|
||||
using Akka.Actor;
|
||||
using Akka.TestKit.Xunit2;
|
||||
using Microsoft.EntityFrameworkCore;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using Microsoft.Extensions.Options;
|
||||
using ScadaLink.AuditLog.Central;
|
||||
using ScadaLink.AuditLog.Site;
|
||||
using ScadaLink.AuditLog.Site.Telemetry;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Interfaces.Repositories;
|
||||
using ScadaLink.Commons.Messages.Audit;
|
||||
using ScadaLink.Commons.Types.Audit;
|
||||
using ScadaLink.Commons.Types.Enums;
|
||||
using ScadaLink.ConfigurationDatabase;
|
||||
using ScadaLink.ConfigurationDatabase.Repositories;
|
||||
using ScadaLink.ConfigurationDatabase.Tests.Migrations;
|
||||
using ScadaLink.Communication.Grpc;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Integration;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle H — end-to-end test wiring the full Audit Log #23 M2 sync-call pipeline:
|
||||
/// <see cref="FallbackAuditWriter"/> over a <see cref="SqliteAuditWriter"/> backed by
|
||||
/// an in-memory SQLite database; the <see cref="SiteAuditTelemetryActor"/> drains
|
||||
/// Pending rows and pushes them through a stub <see cref="ISiteStreamAuditClient"/>
|
||||
/// that forwards directly to the central <see cref="AuditLogIngestActor"/> backed
|
||||
/// by a real <see cref="AuditLogRepository"/> on the <see cref="MsSqlMigrationFixture"/>.
|
||||
/// </summary>
|
||||
/// <remarks>
|
||||
/// <para>
|
||||
/// This is a <b>component-level</b> integration test, not a full Akka-cluster
|
||||
/// test (per the M2 brainstorm decision). The stub gRPC client short-circuits
|
||||
/// the wire so we exercise the real telemetry actor, the real ingest actor, the
|
||||
/// real SQLite writer, and the real MSSQL repository — without standing up a
|
||||
/// Kestrel host or two-cluster topology.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// The site-side telemetry actor's <c>Drain</c> message is private; rather than
/// expose it, we drive the drain by setting <c>BusyIntervalSeconds = 1</c> so the
/// initial scheduled tick fires within a second of actor start. Tests then poll
/// with <see cref="TestKitBase.AwaitAssertAsync"/> until the central repository
/// observes the expected rows.
|
||||
/// </para>
|
||||
/// <para>
|
||||
/// Each test uses a unique <c>SourceSiteId</c> (Guid suffix) so that concurrent
/// tests sharing the per-fixture MSSQL database don't interfere with each other.
|
||||
/// </para>
|
||||
/// </remarks>
|
||||
public class SyncCallEmissionEndToEndTests : TestKit, IClassFixture<MsSqlMigrationFixture>
|
||||
{
|
||||
private readonly MsSqlMigrationFixture _fixture;
|
||||
|
||||
public SyncCallEmissionEndToEndTests(MsSqlMigrationFixture fixture)
|
||||
{
|
||||
_fixture = fixture;
|
||||
}
|
||||
|
||||
private static string NewSiteId() =>
|
||||
"test-bundle-h-" + Guid.NewGuid().ToString("N").Substring(0, 8);
|
||||
|
||||
private ScadaLinkDbContext CreateContext()
|
||||
{
|
||||
var options = new DbContextOptionsBuilder<ScadaLinkDbContext>()
|
||||
.UseSqlServer(_fixture.ConnectionString)
|
||||
.Options;
|
||||
return new ScadaLinkDbContext(options);
|
||||
}
|
||||
|
||||
private static AuditEvent NewEvent(string siteId, Guid? id = null) => new()
|
||||
{
|
||||
EventId = id ?? Guid.NewGuid(),
|
||||
OccurredAtUtc = DateTime.UtcNow,
|
||||
Channel = AuditChannel.ApiOutbound,
|
||||
Kind = AuditKind.ApiCall,
|
||||
Status = AuditStatus.Delivered,
|
||||
SourceSiteId = siteId,
|
||||
Target = "external-system-a/method",
|
||||
};
|
||||
|
||||
private static IOptions<SqliteAuditWriterOptions> InMemorySqliteOptions() =>
|
||||
Options.Create(new SqliteAuditWriterOptions
|
||||
{
|
||||
// A per-test unique database name with mode=memory + cache=shared gives
// each test its own in-memory database. SQLite discards a shared
// in-memory database when its last connection closes, so the database
// survives only while a connection to it stays open. DatabasePath is
// unused because CreateInMemorySqliteWriter passes a
// connectionStringOverride.
|
||||
DatabasePath = "ignored",
|
||||
BatchSize = 64,
|
||||
ChannelCapacity = 1024,
|
||||
});
|
||||
|
||||
private static SqliteAuditWriter CreateInMemorySqliteWriter() =>
|
||||
// The 3rd constructor argument is connectionStringOverride. A unique
// shared-cache in-memory URI keeps the schema scoped to this writer
// instance; it is torn down when the writer is disposed.
|
||||
new SqliteAuditWriter(
|
||||
InMemorySqliteOptions(),
|
||||
NullLogger<SqliteAuditWriter>.Instance,
|
||||
connectionStringOverride: $"Data Source=file:auditlog-h-{Guid.NewGuid():N}?mode=memory&cache=shared");
|
||||
|
||||
private static IOptions<SiteAuditTelemetryOptions> FastTelemetryOptions() =>
|
||||
Options.Create(new SiteAuditTelemetryOptions
|
||||
{
|
||||
BatchSize = 256,
|
||||
// 1s for both intervals so the initial scheduled tick fires fast
|
||||
// and any failure-driven re-tick also fires fast — without
|
||||
// requiring a public Drain message to be exposed.
|
||||
BusyIntervalSeconds = 1,
|
||||
IdleIntervalSeconds = 1,
|
||||
});
|
||||
|
||||
private IActorRef CreateIngestActor(IAuditLogRepository repo) =>
|
||||
Sys.ActorOf(Props.Create(() => new AuditLogIngestActor(
|
||||
repo,
|
||||
NullLogger<AuditLogIngestActor>.Instance)));
|
||||
|
||||
private IActorRef CreateTelemetryActor(
|
||||
ISiteAuditQueue queue,
|
||||
ISiteStreamAuditClient client) =>
|
||||
Sys.ActorOf(Props.Create(() => new SiteAuditTelemetryActor(
|
||||
queue,
|
||||
client,
|
||||
FastTelemetryOptions(),
|
||||
NullLogger<SiteAuditTelemetryActor>.Instance)));
|
||||
|
||||
[SkippableFact]
|
||||
public async Task EndToEnd_OneWrittenEvent_Reaches_Central_AuditLog_Within_Reasonable_Time()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
|
||||
// Real central wiring: repo + ingest actor.
|
||||
await using var ingestContext = CreateContext();
|
||||
var ingestRepo = new AuditLogRepository(ingestContext);
|
||||
var ingestActor = CreateIngestActor(ingestRepo);
|
||||
|
||||
// Real site wiring: SQLite (in-memory) + ring + fallback + telemetry.
|
||||
await using var sqliteWriter = CreateInMemorySqliteWriter();
|
||||
var ring = new RingBufferFallback();
|
||||
var fallback = new FallbackAuditWriter(
|
||||
sqliteWriter,
|
||||
ring,
|
||||
new NoOpAuditWriteFailureCounter(),
|
||||
NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
var stubClient = new DirectActorSiteStreamAuditClient(ingestActor);
|
||||
CreateTelemetryActor(sqliteWriter, stubClient);
|
||||
|
||||
// Act — one fresh event written via the FallbackAuditWriter hot-path.
|
||||
var evt = NewEvent(siteId);
|
||||
await fallback.WriteAsync(evt);
|
||||
|
||||
// Assert — the central AuditLog row materialises within a window that
|
||||
// covers initial tick (1s) + a generous slack for SQLite + the actor
|
||||
// round-trip + EF/MSSQL latency.
|
||||
await AwaitAssertAsync(async () =>
|
||||
{
|
||||
await using var readContext = CreateContext();
|
||||
var readRepo = new AuditLogRepository(readContext);
|
||||
var rows = await readRepo.QueryAsync(
|
||||
new AuditLogQueryFilter(SourceSiteId: siteId),
|
||||
new AuditLogPaging(PageSize: 10));
|
||||
Assert.Single(rows);
|
||||
Assert.Equal(evt.EventId, rows[0].EventId);
|
||||
// Central stamps IngestedAtUtc; site never sets it.
|
||||
Assert.NotNull(rows[0].IngestedAtUtc);
|
||||
}, TimeSpan.FromSeconds(15));
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task EndToEnd_GrpcStubError_RowStays_Pending_NextTick_Succeeds()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
|
||||
await using var ingestContext = CreateContext();
|
||||
var ingestRepo = new AuditLogRepository(ingestContext);
|
||||
var ingestActor = CreateIngestActor(ingestRepo);
|
||||
|
||||
await using var sqliteWriter = CreateInMemorySqliteWriter();
|
||||
var ring = new RingBufferFallback();
|
||||
var fallback = new FallbackAuditWriter(
|
||||
sqliteWriter,
|
||||
ring,
|
||||
new NoOpAuditWriteFailureCounter(),
|
||||
NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
// Stub fails the first push; subsequent calls flow through. The
|
||||
// telemetry actor's on-failure branch keeps rows in Pending state, so
|
||||
// the next tick re-reads them and tries again.
|
||||
var stubClient = new DirectActorSiteStreamAuditClient(ingestActor)
|
||||
{
|
||||
FailNextCallCount = 1,
|
||||
};
|
||||
CreateTelemetryActor(sqliteWriter, stubClient);
|
||||
|
||||
var evt = NewEvent(siteId);
|
||||
await fallback.WriteAsync(evt);
|
||||
|
||||
// Wait long enough for at least one failure-then-success cycle. With
|
||||
// both intervals = 1s the actor retries quickly; allow 15s for slow CI.
|
||||
await AwaitAssertAsync(async () =>
|
||||
{
|
||||
await using var readContext = CreateContext();
|
||||
var readRepo = new AuditLogRepository(readContext);
|
||||
var rows = await readRepo.QueryAsync(
|
||||
new AuditLogQueryFilter(SourceSiteId: siteId),
|
||||
new AuditLogPaging(PageSize: 10));
|
||||
Assert.Single(rows);
|
||||
Assert.Equal(evt.EventId, rows[0].EventId);
|
||||
}, TimeSpan.FromSeconds(15));
|
||||
|
||||
Assert.True(stubClient.CallCount >= 2,
|
||||
$"Expected at least one failed push + one successful push; saw {stubClient.CallCount} total client calls.");
|
||||
|
||||
// The site SQLite row must have flipped to Forwarded after the
|
||||
// successful retry. ReadPendingAsync only returns Pending rows; the
|
||||
// row should NOT show up there anymore.
|
||||
var stillPending = await sqliteWriter.ReadPendingAsync(64);
|
||||
Assert.DoesNotContain(stillPending, p => p.EventId == evt.EventId);
|
||||
}
|
||||
|
||||
[SkippableFact]
|
||||
public async Task EndToEnd_DuplicateSubmit_OnlyOneCentralRow()
|
||||
{
|
||||
Skip.IfNot(_fixture.Available, _fixture.SkipReason);
|
||||
|
||||
var siteId = NewSiteId();
|
||||
|
||||
await using var ingestContext = CreateContext();
|
||||
var ingestRepo = new AuditLogRepository(ingestContext);
|
||||
var ingestActor = CreateIngestActor(ingestRepo);
|
||||
|
||||
await using var sqliteWriter = CreateInMemorySqliteWriter();
|
||||
var ring = new RingBufferFallback();
|
||||
var fallback = new FallbackAuditWriter(
|
||||
sqliteWriter,
|
||||
ring,
|
||||
new NoOpAuditWriteFailureCounter(),
|
||||
NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
var stubClient = new DirectActorSiteStreamAuditClient(ingestActor);
|
||||
CreateTelemetryActor(sqliteWriter, stubClient);
|
||||
|
||||
// Both writes carry the SAME EventId. Site SQLite's PRIMARY KEY
|
||||
// constraint and the central repo's InsertIfNotExistsAsync both
|
||||
// enforce first-write-wins, so only one central row must materialise.
|
||||
var sharedId = Guid.NewGuid();
|
||||
var evt1 = NewEvent(siteId, sharedId);
|
||||
var evt2 = NewEvent(siteId, sharedId);
|
||||
|
||||
await fallback.WriteAsync(evt1);
|
||||
await fallback.WriteAsync(evt2);
|
||||
|
||||
await AwaitAssertAsync(async () =>
|
||||
{
|
||||
await using var readContext = CreateContext();
|
||||
var readRepo = new AuditLogRepository(readContext);
|
||||
var rows = await readRepo.QueryAsync(
|
||||
new AuditLogQueryFilter(SourceSiteId: siteId),
|
||||
new AuditLogPaging(PageSize: 10));
|
||||
Assert.Single(rows);
|
||||
Assert.Equal(sharedId, rows[0].EventId);
|
||||
}, TimeSpan.FromSeconds(15));
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Test double for <see cref="ISiteStreamAuditClient"/> that short-circuits
|
||||
/// the gRPC wire and forwards the batch directly to a central
|
||||
/// <see cref="AuditLogIngestActor"/> via Akka <see cref="Futures.Ask"/>. The
|
||||
/// Akka <see cref="IngestAuditEventsReply"/> is converted to the proto
|
||||
/// <see cref="IngestAck"/> that the telemetry actor expects.
|
||||
/// </summary>
|
||||
private sealed class DirectActorSiteStreamAuditClient : ISiteStreamAuditClient
|
||||
{
|
||||
private readonly IActorRef _ingestActor;
|
||||
private int _failsRemaining;
|
||||
private int _callCount;
|
||||
|
||||
public DirectActorSiteStreamAuditClient(IActorRef ingestActor)
|
||||
{
|
||||
_ingestActor = ingestActor ?? throw new ArgumentNullException(nameof(ingestActor));
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// When > 0, the next <c>FailNextCallCount</c> invocations of
|
||||
/// <see cref="IngestAuditEventsAsync"/> throw to simulate a gRPC error;
|
||||
/// after that count is exhausted, calls succeed normally.
|
||||
/// </summary>
|
||||
public int FailNextCallCount
|
||||
{
|
||||
get => _failsRemaining;
|
||||
set => _failsRemaining = value;
|
||||
}
|
||||
|
||||
public int CallCount => Volatile.Read(ref _callCount);
|
||||
|
||||
public async Task<IngestAck> IngestAuditEventsAsync(AuditEventBatch batch, CancellationToken ct)
|
||||
{
|
||||
Interlocked.Increment(ref _callCount);
|
||||
|
||||
// Atomically consume one of the queued failures, if any. This
|
||||
// lets the test arm a deterministic number of failures before the
|
||||
// stub recovers.
|
||||
if (Interlocked.Decrement(ref _failsRemaining) >= 0)
|
||||
{
|
||||
throw new InvalidOperationException("simulated gRPC failure for test");
|
||||
}
|
||||
|
||||
// Decrement under-ran into negative territory; clamp at -1 to keep
|
||||
// the field bounded even under many calls.
|
||||
Interlocked.Exchange(ref _failsRemaining, -1);
|
||||
|
||||
// Decode the proto batch back into AuditEvent records — this
|
||||
// mirrors what the production SiteStreamGrpcServer does before
|
||||
// dispatching to the ingest actor (see Bundle D's gRPC handler).
|
||||
var events = new List<AuditEvent>(batch.Events.Count);
|
||||
foreach (var dto in batch.Events)
|
||||
{
|
||||
events.Add(ScadaLink.AuditLog.Telemetry.AuditEventMapper.FromDto(dto));
|
||||
}
|
||||
|
||||
// Ask the central actor; the reply carries the accepted EventIds.
|
||||
var reply = await _ingestActor
|
||||
.Ask<IngestAuditEventsReply>(
|
||||
new IngestAuditEventsCommand(events),
|
||||
TimeSpan.FromSeconds(10))
|
||||
.ConfigureAwait(false);
|
||||
|
||||
var ack = new IngestAck();
|
||||
foreach (var id in reply.AcceptedEventIds)
|
||||
{
|
||||
ack.AcceptedEventIds.Add(id.ToString());
|
||||
}
|
||||
return ack;
|
||||
}
|
||||
}
|
||||
}
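The "arm N failures, then recover" counter used by DirectActorSiteStreamAuditClient can be looked at in isolation. The sketch below reproduces only the Interlocked.Decrement / Interlocked.Exchange logic; `failsRemaining` and `ShouldFail` are hypothetical names standing in for the stub's private field and the check at the top of IngestAuditEventsAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Number of upcoming calls that should fail (stand-in for the stub's _failsRemaining field).
var failsRemaining = 2;

bool ShouldFail()
{
    // Decrement returns the new value: 1 and 0 for the two armed failures, negative afterwards.
    if (Interlocked.Decrement(ref failsRemaining) >= 0)
    {
        return true;
    }

    // Clamp at -1 so repeated calls cannot walk the counter toward int.MinValue.
    Interlocked.Exchange(ref failsRemaining, -1);
    return false;
}

var outcomes = new List<bool>();
for (var i = 0; i < 5; i++)
{
    outcomes.Add(ShouldFail());
}

Console.WriteLine(string.Join(",", outcomes));
// prints "True,True,False,False,False"
```

One caveat of this pattern: the decrement and the clamp are two separate atomic operations, so a value written to the counter between them would be overwritten by the clamp. That is harmless when the count is set once before the first call, as the retry test does via the object initializer.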
|
||||
@@ -9,11 +9,30 @@
|
||||
</PropertyGroup>
|
||||
|
||||
<ItemGroup>
|
||||
<PackageReference Include="Akka.TestKit.Xunit2" />
|
||||
<PackageReference Include="coverlet.collector" />
|
||||
<!--
|
||||
Bundle D D2 needs Microsoft.Data.SqlClient for the MsSqlMigrationFixture
|
||||
(mirroring ScadaLink.ConfigurationDatabase.Tests). Pinning 6.1.1 here for
|
||||
the same reason: EF SqlServer 10.0.7 needs >= 6.1.1 but the central pin
|
||||
is 6.0.2 (production ExternalSystemGateway). Override is test-only.
|
||||
-->
|
||||
<PackageReference Include="Microsoft.Data.SqlClient" VersionOverride="6.1.1" />
|
||||
<PackageReference Include="Microsoft.Data.Sqlite" />
|
||||
<PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" />
|
||||
<PackageReference Include="Microsoft.Extensions.Configuration.Json" />
|
||||
<PackageReference Include="Microsoft.Extensions.DependencyInjection" />
|
||||
<PackageReference Include="Microsoft.Extensions.Logging.Abstractions" />
|
||||
<PackageReference Include="Microsoft.NET.Test.Sdk" />
|
||||
<PackageReference Include="NSubstitute" />
|
||||
<PackageReference Include="xunit" />
|
||||
<PackageReference Include="xunit.runner.visualstudio" />
|
||||
<!--
|
||||
SkippableFact pattern (xunit 2.9.x has no native Assert.Skip) — used by
|
||||
Bundle D D2 MSSQL-backed AuditLogIngestActor tests to report Skipped when
|
||||
the dev MSSQL container is not reachable.
|
||||
-->
|
||||
<PackageReference Include="Xunit.SkippableFact" />
|
||||
</ItemGroup>
|
||||
|
||||
<ItemGroup>
|
||||
@@ -22,6 +41,13 @@
|
||||
|
||||
<ItemGroup>
|
||||
<ProjectReference Include="../../src/ScadaLink.AuditLog/ScadaLink.AuditLog.csproj" />
|
||||
<!--
|
||||
D2: the AuditLogIngestActor tests use the real AuditLogRepository against
|
||||
a per-test MSSQL database via MsSqlMigrationFixture. The fixture lives in
|
||||
ScadaLink.ConfigurationDatabase.Tests; we reference that test project so
|
||||
the fixture + EF migrations come along without duplicating them.
|
||||
-->
|
||||
<ProjectReference Include="../ScadaLink.ConfigurationDatabase.Tests/ScadaLink.ConfigurationDatabase.Tests.csproj" />
|
||||
</ItemGroup>
|
||||
|
||||
</Project>
133
tests/ScadaLink.AuditLog.Tests/Site/FallbackAuditWriterTests.cs
Normal file
@@ -0,0 +1,133 @@
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using NSubstitute;
|
||||
using ScadaLink.AuditLog.Site;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Interfaces.Services;
|
||||
using ScadaLink.Commons.Types.Enums;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Site;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle B (M2-T4) tests for <see cref="FallbackAuditWriter"/> — composes the
|
||||
/// primary <see cref="SqliteAuditWriter"/>, the drop-oldest
|
||||
/// <see cref="RingBufferFallback"/>, and an
|
||||
/// <see cref="IAuditWriteFailureCounter"/> health counter.
|
||||
/// </summary>
|
||||
public class FallbackAuditWriterTests
|
||||
{
|
||||
private static AuditEvent NewEvent(string? target = null) => new()
|
||||
{
|
||||
EventId = Guid.NewGuid(),
|
||||
OccurredAtUtc = DateTime.UtcNow,
|
||||
Channel = AuditChannel.ApiOutbound,
|
||||
Kind = AuditKind.ApiCall,
|
||||
Status = AuditStatus.Delivered,
|
||||
Target = target,
|
||||
PayloadTruncated = false,
|
||||
ForwardState = AuditForwardState.Pending,
|
||||
};
|
||||
|
||||
/// <summary>Primary writer stub: every write fails while <see cref="FailNext"/> is true.</summary>
|
||||
private sealed class FlipSwitchPrimary : IAuditWriter
|
||||
{
|
||||
public bool FailNext { get; set; }
|
||||
public List<AuditEvent> Written { get; } = new();
|
||||
|
||||
public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
|
||||
{
|
||||
if (FailNext)
|
||||
{
|
||||
return Task.FromException(new InvalidOperationException("primary down"));
|
||||
}
|
||||
Written.Add(evt);
|
||||
return Task.CompletedTask;
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task WriteAsync_PrimaryThrows_EventLandsInRing_CallReturnsSuccess()
|
||||
{
|
||||
var primary = new FlipSwitchPrimary { FailNext = true };
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
var counter = Substitute.For<IAuditWriteFailureCounter>();
|
||||
|
||||
var fallback = new FallbackAuditWriter(primary, ring, counter, NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
var evt = NewEvent("doomed");
|
||||
// Must NOT throw — audit failures are always swallowed at this layer.
|
||||
await fallback.WriteAsync(evt);
|
||||
|
||||
Assert.Equal(1, ring.Count);
|
||||
counter.Received(1).Increment();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite()
|
||||
{
|
||||
var primary = new FlipSwitchPrimary { FailNext = true };
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
var counter = Substitute.For<IAuditWriteFailureCounter>();
|
||||
|
||||
var fallback = new FallbackAuditWriter(primary, ring, counter, NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
var failed = new[] { NewEvent("a"), NewEvent("b"), NewEvent("c") };
|
||||
foreach (var e in failed)
|
||||
{
|
||||
await fallback.WriteAsync(e);
|
||||
}
|
||||
|
||||
Assert.Equal(3, ring.Count);
|
||||
|
||||
// Primary recovers; the very next successful write should drain the
|
||||
// ring in FIFO order through the primary.
|
||||
primary.FailNext = false;
|
||||
var trigger = NewEvent("trigger");
|
||||
await fallback.WriteAsync(trigger);
|
||||
|
||||
Assert.Equal(0, ring.Count);
|
||||
// Order: the triggering event reaches the primary first (that's the
|
||||
// signal the primary has recovered), then the backlog drains in FIFO
|
||||
// submission order behind it.
|
||||
Assert.Equal(4, primary.Written.Count);
|
||||
Assert.Equal("trigger", primary.Written[0].Target);
|
||||
Assert.Equal("a", primary.Written[1].Target);
|
||||
Assert.Equal("b", primary.Written[2].Target);
|
||||
Assert.Equal("c", primary.Written[3].Target);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task WriteAsync_PrimaryAlwaysSucceeds_Ring_StaysEmpty()
|
||||
{
|
||||
var primary = new FlipSwitchPrimary();
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
var counter = Substitute.For<IAuditWriteFailureCounter>();
|
||||
|
||||
var fallback = new FallbackAuditWriter(primary, ring, counter, NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
for (int i = 0; i < 10; i++)
|
||||
{
|
||||
await fallback.WriteAsync(NewEvent());
|
||||
}
|
||||
|
||||
Assert.Equal(0, ring.Count);
|
||||
Assert.Equal(10, primary.Written.Count);
|
||||
counter.DidNotReceive().Increment();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task WriteAsync_FailureCounter_Incremented_Per_PrimaryFailure()
|
||||
{
|
||||
var primary = new FlipSwitchPrimary { FailNext = true };
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
var counter = Substitute.For<IAuditWriteFailureCounter>();
|
||||
|
||||
var fallback = new FallbackAuditWriter(primary, ring, counter, NullLogger<FallbackAuditWriter>.Instance);
|
||||
|
||||
for (int i = 0; i < 5; i++)
|
||||
{
|
||||
await fallback.WriteAsync(NewEvent());
|
||||
}
|
||||
|
||||
counter.Received(5).Increment();
|
||||
}
|
||||
}
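The behaviour these four tests pin down (swallow the primary failure, count it, park the event, and drain the backlog behind the first successful write) can be sketched as follows. This is a simplified, single-threaded illustration and not the FallbackAuditWriter implementation, which this section does not show; `Write`, `PrimaryWrite`, the string events, and the plain Queue are all hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;

var ring = new Queue<string>();    // stand-in for the ring buffer
var written = new List<string>();  // what the primary has persisted, in order
var primaryDown = true;
var failureCount = 0;              // stand-in for the failure counter

// Stand-in for the primary writer: throws while the flip-switch is set.
void PrimaryWrite(string evt)
{
    if (primaryDown)
    {
        throw new InvalidOperationException("primary down");
    }
    written.Add(evt);
}

// Never throws: a primary failure is counted and the event is parked in the ring.
void Write(string evt)
{
    try
    {
        PrimaryWrite(evt);
    }
    catch (Exception)
    {
        failureCount++;
        ring.Enqueue(evt);
        return;
    }

    // The write above succeeded, so the primary is back: drain the backlog in FIFO order.
    while (ring.Count > 0)
    {
        try
        {
            PrimaryWrite(ring.Peek());
            ring.Dequeue(); // remove only after the primary accepted the event
        }
        catch (Exception)
        {
            break; // primary failed again; keep the rest for a later attempt
        }
    }
}

foreach (var e in new[] { "a", "b", "c" })
{
    Write(e);
}
primaryDown = false;
Write("trigger");

Console.WriteLine(string.Join(",", written));
// prints "trigger,a,b,c"
```

The output order matches the assertion in WriteAsync_PrimaryRecovers_RingDrains_InFIFOOrder_OnNextWrite: the triggering event lands first, then the backlog in submission order.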
|
||||
@@ -0,0 +1,46 @@
|
||||
using NSubstitute;
|
||||
using ScadaLink.AuditLog.Site;
|
||||
using ScadaLink.HealthMonitoring;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Site;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle G (M2-T11) — the <see cref="HealthMetricsAuditWriteFailureCounter"/>
|
||||
/// adapter is the production binding for <see cref="IAuditWriteFailureCounter"/>
|
||||
/// on site nodes; it forwards every FallbackAuditWriter primary failure into
|
||||
/// the shared <see cref="ISiteHealthCollector"/> so the site health report
|
||||
/// surfaces the failure count as <c>SiteAuditWriteFailures</c>.
|
||||
/// </summary>
|
||||
public class HealthMetricsAuditWriteFailureCounterTests
|
||||
{
|
||||
[Fact]
|
||||
public void Increment_Routes_To_Collector_IncrementSiteAuditWriteFailures()
|
||||
{
|
||||
var collector = Substitute.For<ISiteHealthCollector>();
|
||||
var counter = new HealthMetricsAuditWriteFailureCounter(collector);
|
||||
|
||||
counter.Increment();
|
||||
|
||||
collector.Received(1).IncrementSiteAuditWriteFailures();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Increment_Multiple_Calls_Route_To_Collector_Each_Time()
|
||||
{
|
||||
var collector = Substitute.For<ISiteHealthCollector>();
|
||||
var counter = new HealthMetricsAuditWriteFailureCounter(collector);
|
||||
|
||||
counter.Increment();
|
||||
counter.Increment();
|
||||
counter.Increment();
|
||||
|
||||
collector.Received(3).IncrementSiteAuditWriteFailures();
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Construction_With_Null_Collector_Throws_ArgumentNullException()
|
||||
{
|
||||
Assert.Throws<ArgumentNullException>(
|
||||
() => new HealthMetricsAuditWriteFailureCounter(null!));
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,91 @@
|
||||
using ScadaLink.AuditLog.Site;
|
||||
using ScadaLink.Commons.Entities.Audit;
|
||||
using ScadaLink.Commons.Types.Enums;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Site;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle B (M2-T3) tests for <see cref="RingBufferFallback"/> — the
|
||||
/// drop-oldest fallback used by <see cref="FallbackAuditWriter"/> when the
|
||||
/// primary SQLite writer is throwing.
|
||||
/// </summary>
|
||||
public class RingBufferFallbackTests
|
||||
{
|
||||
private static AuditEvent NewEvent(string? target = null)
|
||||
{
|
||||
return new AuditEvent
|
||||
{
|
||||
EventId = Guid.NewGuid(),
|
||||
OccurredAtUtc = DateTime.UtcNow,
|
||||
Channel = AuditChannel.ApiOutbound,
|
||||
Kind = AuditKind.ApiCall,
|
||||
Status = AuditStatus.Delivered,
|
||||
Target = target,
|
||||
PayloadTruncated = false,
|
||||
ForwardState = AuditForwardState.Pending,
|
||||
};
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task Enqueue_1025_Into_1024Cap_Ring_DropsOldest_AndRaisesOverflowOnce()
|
||||
{
|
||||
var ring = new RingBufferFallback(capacity: 1024);
|
||||
var overflowCount = 0;
|
||||
ring.RingBufferOverflowed += () => Interlocked.Increment(ref overflowCount);
|
||||
|
||||
var events = Enumerable.Range(0, 1025).Select(i => NewEvent(target: i.ToString())).ToList();
|
||||
foreach (var e in events)
|
||||
{
|
||||
Assert.True(ring.TryEnqueue(e));
|
||||
}
|
||||
|
||||
Assert.Equal(1, overflowCount);
|
||||
|
||||
// The surviving 1024 are events[1] through events[1024] (the oldest, events[0], was dropped).
|
||||
var drained = new List<AuditEvent>();
|
||||
ring.Complete();
|
||||
await foreach (var e in ring.DrainAsync(CancellationToken.None))
|
||||
{
|
||||
drained.Add(e);
|
||||
}
|
||||
|
||||
Assert.Equal(1024, drained.Count);
|
||||
Assert.Equal("1", drained[0].Target);
|
||||
Assert.Equal("1024", drained[^1].Target);
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public async Task DrainAsync_Yields_FIFO_Then_Completes_When_Empty()
|
||||
{
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
var enqueued = Enumerable.Range(0, 5).Select(i => NewEvent(target: i.ToString())).ToList();
|
||||
foreach (var e in enqueued)
|
||||
{
|
||||
Assert.True(ring.TryEnqueue(e));
|
||||
}
|
||||
|
||||
ring.Complete();
|
||||
|
||||
var drained = new List<AuditEvent>();
|
||||
await foreach (var e in ring.DrainAsync(CancellationToken.None))
|
||||
{
|
||||
drained.Add(e);
|
||||
}
|
||||
|
||||
Assert.Equal(5, drained.Count);
|
||||
for (int i = 0; i < 5; i++)
|
||||
{
|
||||
Assert.Equal(i.ToString(), drained[i].Target);
|
||||
}
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void TryEnqueue_AllSucceeds_ReturnsTrue()
|
||||
{
|
||||
var ring = new RingBufferFallback(capacity: 16);
|
||||
for (int i = 0; i < 8; i++)
|
||||
{
|
||||
Assert.True(ring.TryEnqueue(NewEvent()));
|
||||
}
|
||||
}
|
||||
}
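Drop-oldest semantics like those asserted above are available out of the box in System.Threading.Channels. Whether RingBufferFallback is built this way is an assumption (its source is not part of this section); the sketch only demonstrates the technique with a capacity of 4 instead of 1024 and with ints instead of AuditEvent.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

// Bounded channel that evicts the oldest queued item when full.
// The itemDropped callback plays the role of an overflow notification.
var dropped = new List<int>();
var channel = Channel.CreateBounded<int>(
    new BoundedChannelOptions(capacity: 4) { FullMode = BoundedChannelFullMode.DropOldest },
    itemDropped: item => dropped.Add(item));

// Five writes into a capacity-4 channel: every TryWrite succeeds, the fifth evicts item 0.
for (var i = 0; i < 5; i++)
{
    channel.Writer.TryWrite(i);
}

// Completing the writer lets ReadAllAsync finish once the buffer is empty.
channel.Writer.Complete();

var survivors = new List<int>();
await foreach (var item in channel.Reader.ReadAllAsync())
{
    survivors.Add(item);
}

Console.WriteLine($"survivors={string.Join(",", survivors)} dropped={string.Join(",", dropped)}");
// prints "survivors=1,2,3,4 dropped=0"
```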
|
||||
@@ -0,0 +1,128 @@
|
||||
using Microsoft.Data.Sqlite;
|
||||
using Microsoft.Extensions.Logging.Abstractions;
|
||||
using Microsoft.Extensions.Options;
|
||||
using ScadaLink.AuditLog.Site;
|
||||
|
||||
namespace ScadaLink.AuditLog.Tests.Site;
|
||||
|
||||
/// <summary>
|
||||
/// Bundle B (M2-T1) schema-bootstrap tests for <see cref="SqliteAuditWriter"/>.
|
||||
/// Uses an in-memory shared-cache SQLite database so the same connection name
|
||||
/// reaches the same file-less db across both the writer and the verifier.
|
||||
/// </summary>
|
||||
public class SqliteAuditWriterSchemaTests
|
||||
{
|
||||
/// <summary>
|
||||
/// Each test uses a unique shared-cache in-memory database. The
/// <c>file:NAME?mode=memory&amp;cache=shared</c> URI lets two SqliteConnections see
/// the same in-memory store as long as both use the same Data Source name.
|
||||
/// </summary>
|
||||
private static (SqliteAuditWriter writer, string dataSource) CreateWriter(string testName)
|
||||
{
|
||||
var dataSource = $"file:{testName}-{Guid.NewGuid():N}?mode=memory&cache=shared";
|
||||
var options = new SqliteAuditWriterOptions
|
||||
{
|
||||
DatabasePath = dataSource,
|
||||
};
|
||||
// By default the writer builds "Data Source={path}" and appends Cache=Shared.
// Bypass that by passing the full connection string via connectionStringOverride.
|
||||
var writer = new SqliteAuditWriter(
|
||||
Options.Create(options),
|
||||
NullLogger<SqliteAuditWriter>.Instance,
|
||||
connectionStringOverride: $"Data Source={dataSource};Cache=Shared");
|
||||
return (writer, dataSource);
|
||||
}
|
||||
|
||||
private static SqliteConnection OpenVerifierConnection(string dataSource)
|
||||
{
|
||||
var connection = new SqliteConnection($"Data Source={dataSource};Cache=Shared");
|
||||
connection.Open();
|
||||
return connection;
|
||||
}
|
||||
|
||||
[Fact]
|
||||
public void Opens_Creates_AuditLog_Table_With_20Columns_And_PK_On_EventId()
|
||||
{
|
||||
var (writer, dataSource) = CreateWriter(nameof(Opens_Creates_AuditLog_Table_With_20Columns_And_PK_On_EventId));
|
||||
using (writer)
|
||||
{
|
||||
using var connection = OpenVerifierConnection(dataSource);
|
||||
using var cmd = connection.CreateCommand();
|
||||
cmd.CommandText = "PRAGMA table_info(AuditLog);";
|
||||
using var reader = cmd.ExecuteReader();
|
||||
|
||||
var columns = new List<(string Name, int Pk)>();
|
||||
while (reader.Read())
|
||||
{
|
||||
columns.Add((reader.GetString(1), reader.GetInt32(5)));
|
||||
}
|
||||
|
||||
Assert.Equal(20, columns.Count);
|
||||
|
||||
var expected = new[]
|
||||
{
|
||||
"EventId", "OccurredAtUtc", "Channel", "Kind", "CorrelationId",
|
||||
"SourceSiteId", "SourceInstanceId", "SourceScript", "Actor", "Target",
|
||||
"Status", "HttpStatus", "DurationMs", "ErrorMessage", "ErrorDetail",
|
||||
"RequestSummary", "ResponseSummary", "PayloadTruncated", "Extra",
|
||||
"ForwardState",
|
||||
};
|
||||
Assert.Equal(expected.OrderBy(n => n), columns.Select(c => c.Name).OrderBy(n => n));
|
||||
|
||||
// PK is EventId only.
|
||||
var pkColumns = columns.Where(c => c.Pk > 0).Select(c => c.Name).ToList();
|
||||
Assert.Single(pkColumns);
|
||||
Assert.Equal("EventId", pkColumns[0]);
|
||||
}
|
||||
}
|
||||
|
||||
    [Fact]
    public void Opens_Creates_IX_ForwardState_Occurred_Index()
    {
        var (writer, dataSource) = CreateWriter(nameof(Opens_Creates_IX_ForwardState_Occurred_Index));
        using (writer)
        {
            using var connection = OpenVerifierConnection(dataSource);
            using var cmd = connection.CreateCommand();
            cmd.CommandText = "PRAGMA index_list(AuditLog);";
            using var reader = cmd.ExecuteReader();

            var indexNames = new List<string>();
            while (reader.Read())
            {
                indexNames.Add(reader.GetString(1));
            }

            Assert.Contains("IX_SiteAuditLog_ForwardState_Occurred", indexNames);

            // Verify the index columns are ForwardState, OccurredAtUtc in that order.
            using var infoCmd = connection.CreateCommand();
            infoCmd.CommandText = "PRAGMA index_info(IX_SiteAuditLog_ForwardState_Occurred);";
            using var infoReader = infoCmd.ExecuteReader();

            var indexColumns = new List<string>();
            while (infoReader.Read())
            {
                indexColumns.Add(infoReader.GetString(2));
            }

            Assert.Equal(new[] { "ForwardState", "OccurredAtUtc" }, indexColumns);
        }
    }

    [Fact]
    public void PRAGMA_auto_vacuum_Is_INCREMENTAL()
    {
        var (writer, dataSource) = CreateWriter(nameof(PRAGMA_auto_vacuum_Is_INCREMENTAL));
        using (writer)
        {
            using var connection = OpenVerifierConnection(dataSource);
            using var cmd = connection.CreateCommand();
            cmd.CommandText = "PRAGMA auto_vacuum;";
            var value = Convert.ToInt32(cmd.ExecuteScalar());

            // INCREMENTAL = 2 (0 = NONE, 1 = FULL, 2 = INCREMENTAL).
            Assert.Equal(2, value);
        }
    }
}
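The three schema tests above fix the table's observable shape: 20 named columns, a primary key on `EventId` alone, an index named `IX_SiteAuditLog_ForwardState_Occurred` over `(ForwardState, OccurredAtUtc)`, and `auto_vacuum` in incremental mode. A minimal sketch of DDL that would satisfy those assertions might look like the following. The class name `SiteAuditSchemaSketch`, the column types, and the `NOT NULL` choices are assumptions made for illustration; the real writer's DDL is not shown in this diff.

```csharp
using System;
using System.Linq;

// Hypothetical sketch. Only the column names, the single-column primary key,
// the index name with its column order, and the auto_vacuum mode come from
// the test assertions; every SQL type below is an assumption.
public static class SiteAuditSchemaSketch
{
    public static readonly (string Name, string Type)[] Columns =
    {
        ("EventId", "TEXT NOT NULL PRIMARY KEY"),
        ("OccurredAtUtc", "TEXT NOT NULL"),
        ("Channel", "TEXT NOT NULL"),
        ("Kind", "TEXT NOT NULL"),
        ("CorrelationId", "TEXT"),
        ("SourceSiteId", "TEXT"),
        ("SourceInstanceId", "TEXT"),
        ("SourceScript", "TEXT"),
        ("Actor", "TEXT"),
        ("Target", "TEXT"),
        ("Status", "TEXT NOT NULL"),
        ("HttpStatus", "INTEGER"),
        ("DurationMs", "INTEGER"),
        ("ErrorMessage", "TEXT"),
        ("ErrorDetail", "TEXT"),
        ("RequestSummary", "TEXT"),
        ("ResponseSummary", "TEXT"),
        ("PayloadTruncated", "INTEGER NOT NULL"),
        ("Extra", "TEXT"),
        ("ForwardState", "TEXT NOT NULL"),
    };

    // SQLite applies a change of auto_vacuum mode only to a database that has
    // no tables yet (or after a full VACUUM), so this must run before the
    // CREATE TABLE statement.
    public const string AutoVacuumPragma = "PRAGMA auto_vacuum = INCREMENTAL;";

    public static string CreateTableSql() =>
        "CREATE TABLE IF NOT EXISTS AuditLog (\n    " +
        string.Join(",\n    ", Columns.Select(c => $"{c.Name} {c.Type}")) +
        "\n);";

    // Column order matters: the test reads PRAGMA index_info and expects
    // ForwardState first, OccurredAtUtc second.
    public const string CreateIndexSql =
        "CREATE INDEX IF NOT EXISTS IX_SiteAuditLog_ForwardState_Occurred " +
        "ON AuditLog (ForwardState, OccurredAtUtc);";
}
```

Executing `AutoVacuumPragma`, then `CreateTableSql()`, then `CreateIndexSql` on a fresh connection would produce a database on which `PRAGMA auto_vacuum` reports 2, the value the last test checks.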
@@ -0,0 +1,207 @@
using Microsoft.Data.Sqlite;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using ScadaLink.AuditLog.Site;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Types.Enums;

namespace ScadaLink.AuditLog.Tests.Site;

/// <summary>
/// Bundle B (M2-T2) hot-path tests for <see cref="SqliteAuditWriter"/>. Exercise
/// the Channel-based enqueue, the background writer's batch INSERTs, duplicate-
/// EventId swallowing, ForwardState defaulting, and the
/// <see cref="SqliteAuditWriter.ReadPendingAsync"/> /
/// <see cref="SqliteAuditWriter.MarkForwardedAsync"/> support surface that
/// Bundle D's telemetry actor will call.
/// </summary>
public class SqliteAuditWriterWriteTests
{
    private static (SqliteAuditWriter writer, string dataSource) CreateWriter(
        string testName,
        int? channelCapacity = null)
    {
        var dataSource = $"file:{testName}-{Guid.NewGuid():N}?mode=memory&cache=shared";
        var opts = new SqliteAuditWriterOptions { DatabasePath = dataSource };
        if (channelCapacity is int cap)
        {
            opts.ChannelCapacity = cap;
        }

        var writer = new SqliteAuditWriter(
            Options.Create(opts),
            NullLogger<SqliteAuditWriter>.Instance,
            connectionStringOverride: $"Data Source={dataSource};Cache=Shared");
        return (writer, dataSource);
    }

    private static SqliteConnection OpenVerifierConnection(string dataSource)
    {
        var connection = new SqliteConnection($"Data Source={dataSource};Cache=Shared");
        connection.Open();
        return connection;
    }

    private static AuditEvent NewEvent(Guid? id = null, DateTime? occurredAtUtc = null)
    {
        return new AuditEvent
        {
            EventId = id ?? Guid.NewGuid(),
            OccurredAtUtc = occurredAtUtc ?? DateTime.UtcNow,
            Channel = AuditChannel.ApiOutbound,
            Kind = AuditKind.ApiCall,
            Status = AuditStatus.Delivered,
            PayloadTruncated = false,
        };
    }

    [Fact]
    public async Task WriteAsync_FreshEvent_PersistsWithForwardStatePending()
    {
        var (writer, dataSource) = CreateWriter(nameof(WriteAsync_FreshEvent_PersistsWithForwardStatePending));
        await using var _ = writer;

        var evt = NewEvent();
        await writer.WriteAsync(evt);

        using var connection = OpenVerifierConnection(dataSource);
        using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT ForwardState FROM AuditLog WHERE EventId = $id;";
        cmd.Parameters.AddWithValue("$id", evt.EventId.ToString());
        var actual = cmd.ExecuteScalar() as string;

        Assert.Equal(AuditForwardState.Pending.ToString(), actual);
    }

    [Fact]
    public async Task WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions()
    {
        var (writer, dataSource) = CreateWriter(nameof(WriteAsync_Concurrent_1000Calls_All_Persist_NoExceptions));
        await using var _ = writer;

        var events = Enumerable.Range(0, 1000).Select(_ => NewEvent()).ToList();

        await Parallel.ForEachAsync(events, new ParallelOptions { MaxDegreeOfParallelism = 16 },
            async (evt, ct) => await writer.WriteAsync(evt, ct));

        using var connection = OpenVerifierConnection(dataSource);
        using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT COUNT(*) FROM AuditLog;";
        var count = Convert.ToInt64(cmd.ExecuteScalar());

        Assert.Equal(1000, count);
    }

    [Fact]
    public async Task WriteAsync_DuplicateEventId_FirstWriteWins_NoException()
    {
        var (writer, dataSource) = CreateWriter(nameof(WriteAsync_DuplicateEventId_FirstWriteWins_NoException));
        await using var _ = writer;

        var sharedId = Guid.NewGuid();
        var first = NewEvent(sharedId) with { Target = "first" };
        var second = NewEvent(sharedId) with { Target = "second" };

        await writer.WriteAsync(first);
        await writer.WriteAsync(second);

        using var connection = OpenVerifierConnection(dataSource);
        using var countCmd = connection.CreateCommand();
        countCmd.CommandText = "SELECT COUNT(*) FROM AuditLog WHERE EventId = $id;";
        countCmd.Parameters.AddWithValue("$id", sharedId.ToString());
        var count = Convert.ToInt64(countCmd.ExecuteScalar());

        Assert.Equal(1, count);

        using var targetCmd = connection.CreateCommand();
        targetCmd.CommandText = "SELECT Target FROM AuditLog WHERE EventId = $id;";
        targetCmd.Parameters.AddWithValue("$id", sharedId.ToString());
        Assert.Equal("first", targetCmd.ExecuteScalar() as string);
    }

    [Fact]
    public async Task WriteAsync_ForcesForwardStatePending_IfNull()
    {
        var (writer, dataSource) = CreateWriter(nameof(WriteAsync_ForcesForwardStatePending_IfNull));
        await using var _ = writer;

        var evt = NewEvent() with { ForwardState = null };
        await writer.WriteAsync(evt);

        using var connection = OpenVerifierConnection(dataSource);
        using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT ForwardState FROM AuditLog WHERE EventId = $id;";
        cmd.Parameters.AddWithValue("$id", evt.EventId.ToString());

        Assert.Equal(AuditForwardState.Pending.ToString(), cmd.ExecuteScalar() as string);
    }

    [Fact]
    public async Task ReadPendingAsync_Returns_OldestFirst_LimitedToN()
    {
        var (writer, _) = CreateWriter(nameof(ReadPendingAsync_Returns_OldestFirst_LimitedToN));
        await using var _writer = writer;

        var baseTime = new DateTime(2026, 5, 20, 12, 0, 0, DateTimeKind.Utc);
        var evts = new[]
        {
            NewEvent(occurredAtUtc: baseTime.AddSeconds(5)),
            NewEvent(occurredAtUtc: baseTime.AddSeconds(1)),
            NewEvent(occurredAtUtc: baseTime.AddSeconds(3)),
            NewEvent(occurredAtUtc: baseTime.AddSeconds(2)),
            NewEvent(occurredAtUtc: baseTime.AddSeconds(4)),
        };

        foreach (var e in evts)
        {
            await writer.WriteAsync(e);
        }

        var rows = await writer.ReadPendingAsync(limit: 3);

        Assert.Equal(3, rows.Count);
        Assert.Equal(baseTime.AddSeconds(1), rows[0].OccurredAtUtc);
        Assert.Equal(baseTime.AddSeconds(2), rows[1].OccurredAtUtc);
        Assert.Equal(baseTime.AddSeconds(3), rows[2].OccurredAtUtc);
    }

    [Fact]
    public async Task MarkForwardedAsync_FlipsRowsToForwarded()
    {
        var (writer, dataSource) = CreateWriter(nameof(MarkForwardedAsync_FlipsRowsToForwarded));
        await using var _ = writer;

        var ids = new[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid() };
        foreach (var id in ids)
        {
            await writer.WriteAsync(NewEvent(id));
        }

        await writer.MarkForwardedAsync(ids);

        using var connection = OpenVerifierConnection(dataSource);
        using var cmd = connection.CreateCommand();
        cmd.CommandText = "SELECT ForwardState, COUNT(*) FROM AuditLog GROUP BY ForwardState;";
        using var reader = cmd.ExecuteReader();
        var byState = new Dictionary<string, long>();
        while (reader.Read())
        {
            byState[reader.GetString(0)] = reader.GetInt64(1);
        }

        Assert.Equal(3, byState[AuditForwardState.Forwarded.ToString()]);
        Assert.False(byState.ContainsKey(AuditForwardState.Pending.ToString()));
    }

    [Fact]
    public async Task MarkForwardedAsync_NonExistentId_NoThrow()
    {
        var (writer, _) = CreateWriter(nameof(MarkForwardedAsync_NonExistentId_NoThrow));
        await using var _writer = writer;

        var phantomIds = new[] { Guid.NewGuid(), Guid.NewGuid() };

        await writer.MarkForwardedAsync(phantomIds);
        // No assertion needed: the call must complete without throwing.
    }
}
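The hot-path tests above rely on two behaviours: `WriteAsync` can be called from many threads at once, and a second write with the same `EventId` is silently dropped so the first write wins. A minimal sketch of that pattern, using a bounded `Channel<T>` with a single background reader, could look like the following. Everything here is an assumption for illustration: `BatchingWriterSketch` is a hypothetical name, a dictionary stands in for the SQLite table, and the sketch assumes `WriteAsync` completes only after the batch holding the event has been applied, which is what the tests' read-immediately-after-await pattern suggests but does not prove.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch of a Channel-backed batch writer. A Dictionary stands
// in for the SQLite AuditLog table so the example needs no database.
public sealed class BatchingWriterSketch : IAsyncDisposable
{
    private sealed record Item(Guid Id, string Target, TaskCompletionSource Done);

    private readonly Channel<Item> _channel;
    private readonly Dictionary<Guid, string> _store = new();
    private readonly int _maxBatch;
    private readonly Task _pump;

    public BatchingWriterSketch(int capacity = 1024, int maxBatch = 64)
    {
        _maxBatch = maxBatch;
        _channel = Channel.CreateBounded<Item>(
            new BoundedChannelOptions(capacity) { SingleReader = true });
        _pump = Task.Run(PumpAsync);
    }

    // Enqueue is cheap; the returned task completes once the background
    // loop has applied the batch containing this item.
    public async Task WriteAsync(Guid id, string target, CancellationToken ct = default)
    {
        var done = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        await _channel.Writer.WriteAsync(new Item(id, target, done), ct);
        await done.Task;
    }

    private async Task PumpAsync()
    {
        var reader = _channel.Reader;
        var batch = new List<Item>(_maxBatch);
        while (await reader.WaitToReadAsync())
        {
            batch.Clear();
            while (batch.Count < _maxBatch && reader.TryRead(out var item))
            {
                batch.Add(item);
            }

            lock (_store)
            {
                foreach (var item in batch)
                {
                    // First write wins. Comparable in effect to INSERT OR IGNORE;
                    // the real writer may instead catch the constraint violation.
                    _store.TryAdd(item.Id, item.Target);
                }
            }

            foreach (var item in batch)
            {
                item.Done.TrySetResult();
            }
        }
    }

    public int Count
    {
        get { lock (_store) { return _store.Count; } }
    }

    public string? Get(Guid id)
    {
        lock (_store) { return _store.TryGetValue(id, out var target) ? target : null; }
    }

    public async ValueTask DisposeAsync()
    {
        _channel.Writer.TryComplete();
        await _pump;
    }
}
```

Funnelling all writes through one reader means the store only ever sees one writer, which is why the 1000-call concurrency test can pass without per-call locking on the caller side.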
@@ -0,0 +1,235 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Google.Protobuf.WellKnownTypes;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using NSubstitute;
using NSubstitute.ExceptionExtensions;
using ScadaLink.AuditLog.Site.Telemetry;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.Communication.Grpc;

namespace ScadaLink.AuditLog.Tests.Site.Telemetry;

/// <summary>
/// Bundle D D1 tests for <see cref="SiteAuditTelemetryActor"/>. The actor drains
/// the site SQLite queue via <see cref="ISiteAuditQueue"/>, pushes batches via
/// <see cref="ISiteStreamAuditClient"/>, and flips ack'd rows to Forwarded.
/// Both collaborators are NSubstitute mocks so the tests never touch real
/// SQLite or gRPC.
/// </summary>
public class SiteAuditTelemetryActorTests : TestKit
{
    private readonly ISiteAuditQueue _queue = Substitute.For<ISiteAuditQueue>();
    private readonly ISiteStreamAuditClient _client = Substitute.For<ISiteStreamAuditClient>();

    /// <summary>
    /// Fast options so tests don't stall waiting for the scheduler. 1s busy /
    /// 2s idle still exercises the busy-vs-idle branching, but each test
    /// completes in under 5 s wall-clock.
    /// </summary>
    private static IOptions<SiteAuditTelemetryOptions> Opts(
        int batchSize = 256,
        int busySeconds = 1,
        int idleSeconds = 2) =>
        Options.Create(new SiteAuditTelemetryOptions
        {
            BatchSize = batchSize,
            BusyIntervalSeconds = busySeconds,
            IdleIntervalSeconds = idleSeconds,
        });

    private IActorRef CreateActor(IOptions<SiteAuditTelemetryOptions>? options = null) =>
        Sys.ActorOf(Props.Create(() => new SiteAuditTelemetryActor(
            _queue,
            _client,
            options ?? Opts(),
            NullLogger<SiteAuditTelemetryActor>.Instance)));

    private static AuditEvent NewEvent(Guid? id = null) => new()
    {
        EventId = id ?? Guid.NewGuid(),
        OccurredAtUtc = new DateTime(2026, 5, 20, 10, 0, 0, DateTimeKind.Utc),
        Channel = AuditChannel.ApiOutbound,
        Kind = AuditKind.ApiCall,
        Status = AuditStatus.Delivered,
        SourceSiteId = "site-1",
        ForwardState = AuditForwardState.Pending,
    };

    private static IngestAck AckAll(IReadOnlyList<AuditEvent> events)
    {
        var ack = new IngestAck();
        foreach (var e in events)
        {
            ack.AcceptedEventIds.Add(e.EventId.ToString());
        }
        return ack;
    }

    [Fact]
    public async Task Drain_With_50PendingRows_Sends_OneBatch_Of_50_Then_FlipsToForwarded()
    {
        // Arrange — 50 pending rows on the first read, then empty on subsequent
        // reads so the actor settles after one productive drain.
        var pending = Enumerable.Range(0, 50).Select(_ => NewEvent()).ToList();
        _queue.ReadPendingAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(
                Task.FromResult<IReadOnlyList<AuditEvent>>(pending),
                Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));

        AuditEventBatch? capturedBatch = null;
        _client.IngestAuditEventsAsync(Arg.Any<AuditEventBatch>(), Arg.Any<CancellationToken>())
            .Returns(call =>
            {
                capturedBatch = call.Arg<AuditEventBatch>();
                return Task.FromResult(AckAll(pending));
            });

        // Act
        CreateActor();

        // Assert — give the scheduler time to fire the initial Drain tick.
        await AwaitAssertAsync(async () =>
        {
            await _client.Received(1).IngestAuditEventsAsync(
                Arg.Any<AuditEventBatch>(), Arg.Any<CancellationToken>());
            await _queue.Received(1).MarkForwardedAsync(
                Arg.Is<IReadOnlyList<Guid>>(g => g.Count == 50), Arg.Any<CancellationToken>());
        }, TimeSpan.FromSeconds(5));

        Assert.NotNull(capturedBatch);
        Assert.Equal(50, capturedBatch!.Events.Count);

        var expected = pending.Select(e => e.EventId).ToHashSet();
        await _queue.Received(1).MarkForwardedAsync(
            Arg.Is<IReadOnlyList<Guid>>(g => g.ToHashSet().SetEquals(expected)),
            Arg.Any<CancellationToken>());
    }

    [Fact]
    public async Task Drain_GrpcThrows_RowsStayPending_NextDrainRetries()
    {
        // Arrange — first read returns 3 rows; the gRPC client throws on the
        // first push, then succeeds on the second. After the second push the
        // queue returns empty so the actor settles.
        var batch = Enumerable.Range(0, 3).Select(_ => NewEvent()).ToList();
        _queue.ReadPendingAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(
                Task.FromResult<IReadOnlyList<AuditEvent>>(batch),
                Task.FromResult<IReadOnlyList<AuditEvent>>(batch),
                Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));

        var calls = 0;
        _client.IngestAuditEventsAsync(Arg.Any<AuditEventBatch>(), Arg.Any<CancellationToken>())
            .Returns(_ =>
            {
                calls++;
                if (calls == 1)
                {
                    throw new InvalidOperationException("simulated gRPC failure");
                }
                return Task.FromResult(AckAll(batch));
            });

        // Act
        CreateActor();

        // Assert — eventually MarkForwardedAsync is called exactly once (after
        // the retry succeeded). The first failure must NOT have called
        // MarkForwardedAsync because the rows stay Pending.
        await AwaitAssertAsync(async () =>
        {
            await _queue.Received(1).MarkForwardedAsync(
                Arg.Any<IReadOnlyList<Guid>>(), Arg.Any<CancellationToken>());
        }, TimeSpan.FromSeconds(10));

        Assert.True(calls >= 2, $"Expected at least 2 client calls (1 failure + 1 retry); saw {calls}");
    }

    [Fact]
    public async Task Drain_ZeroPending_SchedulesAtIdleInterval_NoClientCall()
    {
        // Arrange — queue always empty.
        _queue.ReadPendingAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));

        // Idle interval = 2 s. Wait 3 s from actor creation, which covers the
        // first tick (scheduled at the 1 s busy interval on PreStart), and
        // assert the empty-queue branch did NOT push to the client.
        CreateActor(Opts(busySeconds: 1, idleSeconds: 2));

        // Allow the initial tick (~1 s) + a generous window for the idle re-tick.
        await Task.Delay(TimeSpan.FromSeconds(3));

        await _client.DidNotReceiveWithAnyArgs().IngestAuditEventsAsync(default!, default);

        // ReadPendingAsync was called at least once (initial tick), and at
        // most twice within the 3 s window (initial + one idle re-tick).
        var readCalls = _queue.ReceivedCalls()
            .Count(c => c.GetMethodInfo().Name == nameof(ISiteAuditQueue.ReadPendingAsync));
        Assert.InRange(readCalls, 1, 2);
    }

    [Fact]
    public async Task Drain_NonZeroPending_SchedulesAtBusyInterval()
    {
        // Arrange — every read returns 1 row. With busy=1s the actor should
        // re-drain quickly, producing multiple client calls inside a short
        // window.
        var single = new List<AuditEvent> { NewEvent() };
        _queue.ReadPendingAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult<IReadOnlyList<AuditEvent>>(single));

        _client.IngestAuditEventsAsync(Arg.Any<AuditEventBatch>(), Arg.Any<CancellationToken>())
            .Returns(call => Task.FromResult(AckAll(single)));

        CreateActor(Opts(busySeconds: 1, idleSeconds: 10));

        // 3-second window with busy=1s should fit at least 2 drains.
        await Task.Delay(TimeSpan.FromSeconds(3));

        var pushCalls = _client.ReceivedCalls()
            .Count(c => c.GetMethodInfo().Name == nameof(ISiteStreamAuditClient.IngestAuditEventsAsync));
        Assert.True(pushCalls >= 2,
            $"Expected ≥2 pushes within 3s when busy=1s; saw {pushCalls}");
    }

    [Fact]
    public async Task Drain_AcceptedEventIdsSubset_OnlyMarksAccepted()
    {
        // Arrange — 5 rows pushed, but the central ack only lists 3.
        var rows = Enumerable.Range(0, 5).Select(_ => NewEvent()).ToList();
        var ackedIds = rows.Take(3).Select(r => r.EventId).ToList();

        _queue.ReadPendingAsync(Arg.Any<int>(), Arg.Any<CancellationToken>())
            .Returns(
                Task.FromResult<IReadOnlyList<AuditEvent>>(rows),
                Task.FromResult<IReadOnlyList<AuditEvent>>(Array.Empty<AuditEvent>()));

        var partialAck = new IngestAck();
        foreach (var id in ackedIds)
        {
            partialAck.AcceptedEventIds.Add(id.ToString());
        }
        _client.IngestAuditEventsAsync(Arg.Any<AuditEventBatch>(), Arg.Any<CancellationToken>())
            .Returns(Task.FromResult(partialAck));

        // Act
        CreateActor();

        await AwaitAssertAsync(async () =>
        {
            await _queue.Received(1).MarkForwardedAsync(
                Arg.Any<IReadOnlyList<Guid>>(), Arg.Any<CancellationToken>());
        }, TimeSpan.FromSeconds(5));

        // Assert — exactly the 3 ack'd ids made it to MarkForwardedAsync, not
        // the other 2.
        var ackedSet = ackedIds.ToHashSet();
        await _queue.Received(1).MarkForwardedAsync(
            Arg.Is<IReadOnlyList<Guid>>(g => g.Count == 3 && g.ToHashSet().SetEquals(ackedSet)),
            Arg.Any<CancellationToken>());
    }
}
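The actor tests above exercise two decisions made on every drain: which cadence to schedule next (busy when rows were read, idle when the queue was empty) and which rows to flip to Forwarded (only the ids central acknowledged). As a rough sketch, those decisions can be written as two pure functions. `DrainPolicySketch` and both method names are hypothetical, and filtering acked ids against the ids actually sent is a defensive choice of this sketch rather than something the diff confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the two per-drain decisions, with no Akka dependency.
public static class DrainPolicySketch
{
    // Busy cadence when the last read returned rows, idle cadence otherwise.
    public static TimeSpan NextDelay(int rowsRead, TimeSpan busy, TimeSpan idle) =>
        rowsRead > 0 ? busy : idle;

    // Keeps only acknowledged ids that parse as GUIDs and were part of the
    // batch that was sent. Rows missing from the ack stay Pending and are
    // picked up again by a later drain.
    public static IReadOnlyList<Guid> IdsToMarkForwarded(
        IEnumerable<Guid> sentIds,
        IEnumerable<string> acceptedEventIds)
    {
        var sent = sentIds.ToHashSet();
        var result = new List<Guid>();
        foreach (var raw in acceptedEventIds)
        {
            if (Guid.TryParse(raw, out var id) && sent.Contains(id))
            {
                result.Add(id);
            }
        }
        return result;
    }
}
```

With the production cadence from the commit message, `NextDelay(0, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(30))` selects the 30 s idle interval, and any non-zero row count selects the 5 s busy interval.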
@@ -0,0 +1,224 @@
using Google.Protobuf.WellKnownTypes;
using ScadaLink.AuditLog.Telemetry;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.Communication.Grpc;

namespace ScadaLink.AuditLog.Tests.Telemetry;

/// <summary>
/// Round-trip + edge tests for the <see cref="AuditEventMapper"/> that bridges
/// <see cref="AuditEvent"/> (Commons) ↔ <see cref="AuditEventDto"/> (proto).
/// ForwardState is site-local and IngestedAtUtc is central-set, so neither survives
/// the proto round-trip.
/// </summary>
public class AuditEventMapperTests
{
    [Fact]
    public void ToDto_FromDto_Roundtrip_FullyPopulated_PreservesAllFields()
    {
        var occurredAt = new DateTime(2026, 5, 20, 10, 15, 30, 123, DateTimeKind.Utc);
        var ingestedAt = new DateTime(2026, 5, 20, 10, 15, 31, 0, DateTimeKind.Utc);
        var correlationId = Guid.NewGuid();
        var eventId = Guid.NewGuid();

        var original = new AuditEvent
        {
            EventId = eventId,
            OccurredAtUtc = occurredAt,
            IngestedAtUtc = ingestedAt,
            Channel = AuditChannel.ApiOutbound,
            Kind = AuditKind.ApiCallCached,
            CorrelationId = correlationId,
            SourceSiteId = "site-1",
            SourceInstanceId = "Pump01",
            SourceScript = "OnDemand",
            Actor = "design-key",
            Target = "weather-api",
            Status = AuditStatus.Forwarded,
            HttpStatus = 200,
            DurationMs = 42,
            ErrorMessage = "transient timeout",
            ErrorDetail = "stack-trace",
            RequestSummary = "GET /weather",
            ResponseSummary = "{ \"ok\": true }",
            PayloadTruncated = true,
            Extra = "{ \"retryCount\": 1 }",
            ForwardState = AuditForwardState.Pending
        };

        var dto = AuditEventMapper.ToDto(original);
        var roundTripped = AuditEventMapper.FromDto(dto);

        Assert.Equal(original.EventId, roundTripped.EventId);
        Assert.Equal(original.OccurredAtUtc, roundTripped.OccurredAtUtc);
        Assert.Equal(original.Channel, roundTripped.Channel);
        Assert.Equal(original.Kind, roundTripped.Kind);
        Assert.Equal(original.CorrelationId, roundTripped.CorrelationId);
        Assert.Equal(original.SourceSiteId, roundTripped.SourceSiteId);
        Assert.Equal(original.SourceInstanceId, roundTripped.SourceInstanceId);
        Assert.Equal(original.SourceScript, roundTripped.SourceScript);
        Assert.Equal(original.Actor, roundTripped.Actor);
        Assert.Equal(original.Target, roundTripped.Target);
        Assert.Equal(original.Status, roundTripped.Status);
        Assert.Equal(original.HttpStatus, roundTripped.HttpStatus);
        Assert.Equal(original.DurationMs, roundTripped.DurationMs);
        Assert.Equal(original.ErrorMessage, roundTripped.ErrorMessage);
        Assert.Equal(original.ErrorDetail, roundTripped.ErrorDetail);
        Assert.Equal(original.RequestSummary, roundTripped.RequestSummary);
        Assert.Equal(original.ResponseSummary, roundTripped.ResponseSummary);
        Assert.Equal(original.PayloadTruncated, roundTripped.PayloadTruncated);
        Assert.Equal(original.Extra, roundTripped.Extra);

        // ForwardState + IngestedAtUtc are NOT carried in the proto contract.
        Assert.Null(roundTripped.ForwardState);
        Assert.Null(roundTripped.IngestedAtUtc);
    }

    [Fact]
    public void ToDto_NullableStringFields_BecomeEmptyString()
    {
        var evt = new AuditEvent
        {
            EventId = Guid.NewGuid(),
            OccurredAtUtc = DateTime.UtcNow,
            Channel = AuditChannel.Notification,
            Kind = AuditKind.NotifySend,
            Status = AuditStatus.Submitted
            // all string? fields left null; CorrelationId null
        };

        var dto = AuditEventMapper.ToDto(evt);

        Assert.Equal(string.Empty, dto.CorrelationId);
        Assert.Equal(string.Empty, dto.SourceSiteId);
        Assert.Equal(string.Empty, dto.SourceInstanceId);
        Assert.Equal(string.Empty, dto.SourceScript);
        Assert.Equal(string.Empty, dto.Actor);
        Assert.Equal(string.Empty, dto.Target);
        Assert.Equal(string.Empty, dto.ErrorMessage);
        Assert.Equal(string.Empty, dto.ErrorDetail);
        Assert.Equal(string.Empty, dto.RequestSummary);
        Assert.Equal(string.Empty, dto.ResponseSummary);
        Assert.Equal(string.Empty, dto.Extra);
    }

    [Fact]
    public void FromDto_EmptyString_BecomesNullProperty()
    {
        var dto = new AuditEventDto
        {
            EventId = Guid.NewGuid().ToString(),
            OccurredAtUtc = Timestamp.FromDateTime(DateTime.UtcNow),
            Channel = nameof(AuditChannel.ApiOutbound),
            Kind = nameof(AuditKind.ApiCall),
            Status = nameof(AuditStatus.Submitted),
            CorrelationId = string.Empty,
            SourceSiteId = string.Empty,
            SourceInstanceId = string.Empty,
            SourceScript = string.Empty,
            Actor = string.Empty,
            Target = string.Empty,
            ErrorMessage = string.Empty,
            ErrorDetail = string.Empty,
            RequestSummary = string.Empty,
            ResponseSummary = string.Empty,
            Extra = string.Empty
        };

        var evt = AuditEventMapper.FromDto(dto);

        Assert.Null(evt.CorrelationId);
        Assert.Null(evt.SourceSiteId);
        Assert.Null(evt.SourceInstanceId);
        Assert.Null(evt.SourceScript);
        Assert.Null(evt.Actor);
        Assert.Null(evt.Target);
        Assert.Null(evt.ErrorMessage);
        Assert.Null(evt.ErrorDetail);
        Assert.Null(evt.RequestSummary);
        Assert.Null(evt.ResponseSummary);
        Assert.Null(evt.Extra);
    }

    [Fact]
    public void ToDto_OccurredAtUtc_PreservesUtcKind()
    {
        var occurredAt = new DateTime(2026, 5, 20, 8, 0, 0, DateTimeKind.Utc);
        var evt = new AuditEvent
        {
            EventId = Guid.NewGuid(),
            OccurredAtUtc = occurredAt,
            Channel = AuditChannel.DbOutbound,
            Kind = AuditKind.DbWrite,
            Status = AuditStatus.Delivered
        };

        var dto = AuditEventMapper.ToDto(evt);
        var roundTripped = AuditEventMapper.FromDto(dto);

        Assert.Equal(DateTimeKind.Utc, roundTripped.OccurredAtUtc.Kind);
        Assert.Equal(occurredAt, roundTripped.OccurredAtUtc);
    }

    [Fact]
    public void ToDto_NullableInt_BecomesNullInt32Value()
    {
        var evt = new AuditEvent
        {
            EventId = Guid.NewGuid(),
            OccurredAtUtc = DateTime.UtcNow,
            Channel = AuditChannel.Notification,
            Kind = AuditKind.NotifySend,
            Status = AuditStatus.Submitted,
            HttpStatus = null,
            DurationMs = null
        };

        var dto = AuditEventMapper.ToDto(evt);

        Assert.Null(dto.HttpStatus);
        Assert.Null(dto.DurationMs);
    }

    [Fact]
    public void FromDto_NullInt32Value_BecomesNullProperty()
    {
        var dto = new AuditEventDto
        {
            EventId = Guid.NewGuid().ToString(),
            OccurredAtUtc = Timestamp.FromDateTime(DateTime.UtcNow),
            Channel = nameof(AuditChannel.ApiInbound),
            Kind = nameof(AuditKind.InboundRequest),
            Status = nameof(AuditStatus.Delivered)
            // HttpStatus + DurationMs intentionally left absent
        };

        Assert.Null(dto.HttpStatus);
        Assert.Null(dto.DurationMs);

        var evt = AuditEventMapper.FromDto(dto);

        Assert.Null(evt.HttpStatus);
        Assert.Null(evt.DurationMs);
    }

    [Fact]
    public void ToDto_EnumValues_StoredAsStringNames()
    {
        var evt = new AuditEvent
        {
            EventId = Guid.NewGuid(),
            OccurredAtUtc = DateTime.UtcNow,
            Channel = AuditChannel.ApiOutbound,
            Kind = AuditKind.ApiCallCached,
            Status = AuditStatus.Parked
        };

        var dto = AuditEventMapper.ToDto(evt);

        Assert.Equal("ApiOutbound", dto.Channel);
        Assert.Equal("ApiCallCached", dto.Kind);
        Assert.Equal("Parked", dto.Status);
    }
}
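The mapper tests above encode three wire conventions: a null string (or null `CorrelationId`) becomes an empty string on the DTO, an empty string on the DTO becomes null on the entity, and enums travel as their member names. A small sketch of helpers that follow those conventions is shown below. `MapperConventionsSketch`, its method names, and the `SketchStatus` enum are hypothetical stand-ins; the real `AuditEventMapper` body is not part of this diff and may be structured differently.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for one of the audit enums.
public enum SketchStatus { Submitted, Delivered, Parked }

// Hypothetical helpers mirroring the conventions the mapper tests assert.
public static class MapperConventionsSketch
{
    // proto3 string fields cannot be null, so null is sent as "".
    public static string StringToWire(string? value) => value ?? string.Empty;

    // The inverse: "" on the wire means "not set" on the entity.
    public static string? StringFromWire(string value) =>
        string.IsNullOrEmpty(value) ? null : value;

    public static string GuidToWire(Guid? value) => value?.ToString() ?? string.Empty;

    public static Guid? GuidFromWire(string value) =>
        Guid.TryParse(value, out var parsed) ? parsed : (Guid?)null;

    // Enums are carried by member name rather than numeric value.
    public static string EnumToWire<TEnum>(TEnum value) where TEnum : struct, Enum =>
        value.ToString();

    public static TEnum EnumFromWire<TEnum>(string name) where TEnum : struct, Enum =>
        Enum.Parse<TEnum>(name);
}
```

One consequence of the empty-string convention is that a genuinely empty string on the entity cannot be distinguished from null after a round-trip; the tests above only assert the null and populated cases.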
@@ -0,0 +1,123 @@
using Google.Protobuf;
using Google.Protobuf.WellKnownTypes;
using ScadaLink.Communication.Grpc;

namespace ScadaLink.Communication.Tests.Protos;

/// <summary>
/// Wire-format round-trip tests for the Audit Log (#23) telemetry proto messages
/// (<see cref="AuditEventDto"/>, <see cref="AuditEventBatch"/>, <see cref="IngestAck"/>).
/// Locks the additive contract the site → central audit pipeline depends on.
/// </summary>
public class AuditEventProtoTests
{
    [Fact]
    public void AuditEventDto_RoundTrip_PreservesAllFields()
    {
        var occurredAt = Timestamp.FromDateTimeOffset(
            new DateTimeOffset(2026, 5, 20, 10, 15, 30, 123, TimeSpan.Zero));

        var original = new AuditEventDto
        {
            EventId = Guid.NewGuid().ToString(),
            OccurredAtUtc = occurredAt,
            Channel = "ApiOutbound",
            Kind = "ApiCall",
            CorrelationId = Guid.NewGuid().ToString(),
            SourceSiteId = "site-1",
            SourceInstanceId = "Pump01",
            SourceScript = "OnDemand",
            Actor = "design-key",
            Target = "weather-api",
            Status = "Delivered",
            HttpStatus = 200,
            DurationMs = 42,
            ErrorMessage = "no error",
            ErrorDetail = "stack",
            RequestSummary = "GET /weather?city=brisbane",
            ResponseSummary = "{ \"temp\": 22.5 }",
            PayloadTruncated = true,
            Extra = "{ \"retryCount\": 0 }"
        };

        var bytes = original.ToByteArray();
        var deserialized = AuditEventDto.Parser.ParseFrom(bytes);

        Assert.Equal(original.EventId, deserialized.EventId);
        Assert.Equal(original.OccurredAtUtc, deserialized.OccurredAtUtc);
        Assert.Equal(original.Channel, deserialized.Channel);
        Assert.Equal(original.Kind, deserialized.Kind);
        Assert.Equal(original.CorrelationId, deserialized.CorrelationId);
        Assert.Equal(original.SourceSiteId, deserialized.SourceSiteId);
        Assert.Equal(original.SourceInstanceId, deserialized.SourceInstanceId);
        Assert.Equal(original.SourceScript, deserialized.SourceScript);
        Assert.Equal(original.Actor, deserialized.Actor);
        Assert.Equal(original.Target, deserialized.Target);
        Assert.Equal(original.Status, deserialized.Status);
        Assert.Equal(original.HttpStatus, deserialized.HttpStatus);
        Assert.Equal(original.DurationMs, deserialized.DurationMs);
        Assert.Equal(original.ErrorMessage, deserialized.ErrorMessage);
        Assert.Equal(original.ErrorDetail, deserialized.ErrorDetail);
        Assert.Equal(original.RequestSummary, deserialized.RequestSummary);
        Assert.Equal(original.ResponseSummary, deserialized.ResponseSummary);
        Assert.Equal(original.PayloadTruncated, deserialized.PayloadTruncated);
        Assert.Equal(original.Extra, deserialized.Extra);
    }

    [Fact]
    public void AuditEventDto_NullableInt_AbsentByDefault_NotIncludedInWire()
    {
        // Int32Value fields (http_status, duration_ms) are wrapper-typed in proto;
        // when unset, the wrapper is absent, not serialized, and deserializes back to null.
        var original = new AuditEventDto
        {
            EventId = Guid.NewGuid().ToString(),
            OccurredAtUtc = Timestamp.FromDateTimeOffset(DateTimeOffset.UtcNow),
            Channel = "Notification",
            Kind = "NotifySend",
            Status = "Submitted"
        };

        Assert.Null(original.HttpStatus);
        Assert.Null(original.DurationMs);

        var bytes = original.ToByteArray();
        var deserialized = AuditEventDto.Parser.ParseFrom(bytes);

        Assert.Null(deserialized.HttpStatus);
        Assert.Null(deserialized.DurationMs);
    }

    [Fact]
    public void AuditEventBatch_Empty_RoundTrip_Yields_EmptyEvents()
    {
        var original = new AuditEventBatch();
        Assert.Empty(original.Events);

        var bytes = original.ToByteArray();
        var deserialized = AuditEventBatch.Parser.ParseFrom(bytes);

        Assert.Empty(deserialized.Events);
    }

    [Fact]
    public void IngestAck_PreservesAcceptedEventIds()
    {
        var id1 = Guid.NewGuid().ToString();
        var id2 = Guid.NewGuid().ToString();
        var id3 = Guid.NewGuid().ToString();

        var original = new IngestAck();
        original.AcceptedEventIds.Add(id1);
        original.AcceptedEventIds.Add(id2);
        original.AcceptedEventIds.Add(id3);

        var bytes = original.ToByteArray();
        var deserialized = IngestAck.Parser.ParseFrom(bytes);

        Assert.Equal(3, deserialized.AcceptedEventIds.Count);
        Assert.Equal(id1, deserialized.AcceptedEventIds[0]);
        Assert.Equal(id2, deserialized.AcceptedEventIds[1]);
        Assert.Equal(id3, deserialized.AcceptedEventIds[2]);
    }
}
@@ -0,0 +1,100 @@
using Akka.Actor;
using Akka.TestKit.Xunit2;
using Google.Protobuf.WellKnownTypes;
using Grpc.Core;
using Microsoft.Extensions.Logging.Abstractions;
using NSubstitute;
using ScadaLink.Commons.Messages.Audit;
using ScadaLink.Communication.Grpc;

namespace ScadaLink.Communication.Tests;

/// <summary>
/// Bundle D D2 tests for <see cref="SiteStreamGrpcServer.IngestAuditEvents"/>.
/// Verifies the DTO→entity→actor→ack round-trip through the gRPC handler.
/// A tiny <c>EchoIngestActor</c> stands in for the central
/// <c>AuditLogIngestActor</c>, replying with the EventIds it received so the
/// test asserts the wiring without depending on MSSQL.
/// </summary>
public class SiteStreamIngestAuditEventsTests : TestKit
{
    private readonly ISiteStreamSubscriber _subscriber = Substitute.For<ISiteStreamSubscriber>();

    private SiteStreamGrpcServer CreateServer() =>
        new(_subscriber, NullLogger<SiteStreamGrpcServer>.Instance);

    private static ServerCallContext NewContext(CancellationToken ct = default)
    {
        var context = Substitute.For<ServerCallContext>();
        context.CancellationToken.Returns(ct);
        return context;
    }

    private static AuditEventDto NewDto(Guid? id = null) => new()
    {
        EventId = (id ?? Guid.NewGuid()).ToString(),
        OccurredAtUtc = Timestamp.FromDateTime(
            DateTime.SpecifyKind(new DateTime(2026, 5, 20, 10, 0, 0), DateTimeKind.Utc)),
        Channel = "ApiOutbound",
        Kind = "ApiCall",
        Status = "Delivered",
        SourceSiteId = "site-1",
    };

    [Fact]
    public async Task IngestAuditEvents_With_AuditIngestActor_Routes_To_Actor_Returns_Reply()
    {
        // Arrange — a stub actor that echoes every received EventId back.
        var stubActor = Sys.ActorOf(Props.Create(() => new EchoIngestActor()));

        var server = CreateServer();
        server.SetAuditIngestActor(stubActor);

        // Build a 3-event batch.
        var dtos = Enumerable.Range(0, 3).Select(_ => NewDto()).ToList();
        var batch = new AuditEventBatch();
        batch.Events.AddRange(dtos);

        // Act
        var ack = await server.IngestAuditEvents(batch, NewContext());

        // Assert — every dto's id appears in the ack, demonstrating end-to-
        // end routing through the actor.
        Assert.Equal(3, ack.AcceptedEventIds.Count);
        var expectedIds = dtos.Select(d => d.EventId).ToHashSet();
        Assert.True(expectedIds.SetEquals(ack.AcceptedEventIds.ToHashSet()));
    }

    [Fact]
    public async Task IngestAuditEvents_NoActor_Wired_ReturnsEmptyAck()
    {
        var server = CreateServer();
        // Intentionally do NOT call SetAuditIngestActor — simulates host
        // startup race window.

        var batch = new AuditEventBatch();
        batch.Events.Add(NewDto());

        var ack = await server.IngestAuditEvents(batch, NewContext());

        Assert.Empty(ack.AcceptedEventIds);
    }

    /// <summary>
    /// Tiny ReceiveActor that echoes every EventId in an incoming
    /// <see cref="IngestAuditEventsCommand"/> back as an
    /// <see cref="IngestAuditEventsReply"/>. Stands in for the central
    /// AuditLogIngestActor so this test never touches MSSQL.
    /// </summary>
    private sealed class EchoIngestActor : ReceiveActor
    {
        public EchoIngestActor()
        {
            Receive<IngestAuditEventsCommand>(cmd =>
            {
                var ids = cmd.Events.Select(e => e.EventId).ToList();
                Sender.Tell(new IngestAuditEventsReply(ids));
            });
        }
    }
}
@@ -217,6 +217,98 @@ public class AuditLogRepositoryTests : IClassFixture<MsSqlMigrationFixture>
        Assert.Equal(t0.AddMinutes(0), page3[0].OccurredAtUtc);
    }

    [SkippableFact]
    public async Task InsertIfNotExistsAsync_ConcurrentDuplicateInserts_ProduceExactlyOneRow()
    {
        Skip.IfNot(_fixture.Available, _fixture.SkipReason);

        var siteId = NewSiteId();

        // Single event used by every parallel call — same EventId, same payload.
        // The repository's IF NOT EXISTS … INSERT pattern has a check-then-act
        // race window between sessions; under concurrent load SQL Server can
        // raise a unique-index violation (error 2601) on UX_AuditLog_EventId.
        // Bundle A's hardening swallows 2601/2627 so duplicates collapse silently.
        var evt = NewEvent(siteId, occurredAtUtc: new DateTime(2026, 5, 20, 12, 0, 0, DateTimeKind.Utc));

        // 50 parallel inserters, each with its own DbContext (DbContext is not
        // thread-safe). Parallel.ForEachAsync aggregates exceptions, so a single
        // unhandled 2601 from the repository would fail this test loudly.
        await Parallel.ForEachAsync(
            Enumerable.Range(0, 50),
            new ParallelOptions { MaxDegreeOfParallelism = 50 },
            async (_, ct) =>
            {
                await using var context = CreateContext();
                var repo = new AuditLogRepository(context);
                await repo.InsertIfNotExistsAsync(evt, ct);
            });

        await using var readContext = CreateContext();
        var count = await readContext.Set<AuditEvent>()
            .Where(e => e.SourceSiteId == siteId)
            .CountAsync();

        Assert.Equal(1, count);
    }

    [SkippableFact]
    public async Task QueryAsync_Keyset_SameOccurredAtUtc_TiebreaksOnEventId()
    {
        Skip.IfNot(_fixture.Available, _fixture.SkipReason);

        var siteId = NewSiteId();
        await using var context = CreateContext();
        var repo = new AuditLogRepository(context);

        // Four events all sharing the exact same OccurredAtUtc — the keyset
        // cursor must lean on the EventId tiebreaker (descending) to page
        // deterministically. Bundle D's reviewer flagged this as a deferred
        // verification because it depends on EF Core 10 translating
        // Guid.CompareTo against SQL Server's uniqueidentifier sort order.
        var occurredAt = new DateTime(2026, 5, 20, 13, 0, 0, DateTimeKind.Utc);

        // Build four distinct Guids; we don't care about the literal ordering
        // produced by Guid.CompareTo — only that paging is deterministic and
        // covers every row exactly once.
        var events = Enumerable.Range(0, 4)
            .Select(_ => NewEvent(siteId, occurredAtUtc: occurredAt))
            .ToList();

        foreach (var e in events)
        {
            await repo.InsertIfNotExistsAsync(e);
        }

        var filter = new AuditLogQueryFilter(SourceSiteId: siteId);

        var page1 = await repo.QueryAsync(filter, new AuditLogPaging(PageSize: 2));
        Assert.Equal(2, page1.Count);
        Assert.All(page1, r => Assert.Equal(occurredAt, r.OccurredAtUtc));

        var cursor = page1[^1];
        var page2 = await repo.QueryAsync(
            filter,
            new AuditLogPaging(
                PageSize: 2,
                AfterOccurredAtUtc: cursor.OccurredAtUtc,
                AfterEventId: cursor.EventId));

        Assert.Equal(2, page2.Count);
        Assert.All(page2, r => Assert.Equal(occurredAt, r.OccurredAtUtc));

        var page1Ids = page1.Select(r => r.EventId).ToHashSet();
        var page2Ids = page2.Select(r => r.EventId).ToHashSet();

        // No overlap between pages.
        Assert.Empty(page1Ids.Intersect(page2Ids));

        // Every inserted EventId appears in exactly one of the two pages.
        var allIds = page1Ids.Union(page2Ids).ToHashSet();
        Assert.Equal(4, allIds.Count);
        Assert.True(events.Select(e => e.EventId).ToHashSet().SetEquals(allIds));
    }

    [SkippableFact]
    public async Task SwitchOutPartitionAsync_ThrowsNotSupported_ForM1()
    {

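The keyset test in the hunk above relies on a two-column cursor: rows are ordered by `OccurredAtUtc` descending, and `EventId` descending breaks ties. The following is a minimal in-memory sketch of that predicate, not the repository's actual query. `Row` and `KeysetPaging` are hypothetical names introduced for illustration, and the in-memory `Guid.CompareTo` order may not match SQL Server's `uniqueidentifier` order, which is exactly why the test only asserts determinism and full coverage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; only the two cursor columns matter here.
public sealed record Row(DateTime OccurredAtUtc, Guid EventId);

public static class KeysetPaging
{
    // Returns the next page in (OccurredAtUtc DESC, EventId DESC) order.
    // The cursor is the last row of the previous page, or null for page one.
    public static List<Row> NextPage(IEnumerable<Row> rows, int pageSize, Row? after = null)
    {
        var ordered = rows
            .OrderByDescending(r => r.OccurredAtUtc)
            .ThenByDescending(r => r.EventId);

        if (after is null)
            return ordered.Take(pageSize).ToList();

        // Keep rows strictly "after" the cursor: an older timestamp, or the
        // same timestamp with a smaller EventId (the tiebreaker).
        return ordered
            .Where(r => r.OccurredAtUtc < after.OccurredAtUtc
                     || (r.OccurredAtUtc == after.OccurredAtUtc
                         && r.EventId.CompareTo(after.EventId) < 0))
            .Take(pageSize)
            .ToList();
    }
}
```

Without the `EventId` clause, a cursor on `OccurredAtUtc` alone would either skip or repeat rows whenever several events share the same timestamp.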
@@ -0,0 +1,52 @@
namespace ScadaLink.HealthMonitoring.Tests;

/// <summary>
/// Bundle G (M2-T11) regression coverage. The site-side Audit Log writer chain
/// (FallbackAuditWriter) increments <see cref="IAuditWriteFailureCounter"/>
/// every time the primary SQLite writer throws. Bundle G bridges that counter
/// into the Site Health Monitoring report payload as <c>SiteAuditWriteFailures</c>
/// so a sustained audit-write outage surfaces on /monitoring/health rather than
/// disappearing into a NoOp sink.
/// </summary>
public class SiteAuditWriteFailuresMetricTests
{
    private readonly SiteHealthCollector _collector = new();

    [Fact]
    public void Increment_Three_Times_Counter_Reports_3()
    {
        _collector.IncrementSiteAuditWriteFailures();
        _collector.IncrementSiteAuditWriteFailures();
        _collector.IncrementSiteAuditWriteFailures();

        var report = _collector.CollectReport("site-1");

        Assert.Equal(3, report.SiteAuditWriteFailures);
    }

    [Fact]
    public void Report_Payload_Includes_SiteAuditWriteFailures_AsZeroByDefault()
    {
        var report = _collector.CollectReport("site-1");

        Assert.Equal(0, report.SiteAuditWriteFailures);
    }

    /// <summary>
    /// Mirrors the existing per-interval reset semantics for ScriptErrorCount /
    /// AlarmEvaluationErrorCount / DeadLetterCount — SiteAuditWriteFailures is an
    /// interval count, not a running total.
    /// </summary>
    [Fact]
    public void CollectReport_Resets_SiteAuditWriteFailures()
    {
        _collector.IncrementSiteAuditWriteFailures();
        _collector.IncrementSiteAuditWriteFailures();

        var first = _collector.CollectReport("site-1");
        Assert.Equal(2, first.SiteAuditWriteFailures);

        var second = _collector.CollectReport("site-1");
        Assert.Equal(0, second.SiteAuditWriteFailures);
    }
}
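The tests above pin down interval semantics: each `CollectReport` returns the failures counted since the previous report and then starts again from zero. One way to get that behaviour without locks is sketched below. `IntervalCounter` is a hypothetical stand-in; the commit message only confirms that the real collector uses `Interlocked.Increment`, so resetting via `Interlocked.Exchange` is an assumption made for this sketch.

```csharp
using System.Threading;

// Hypothetical stand-in for the collector's failure counter.
public sealed class IntervalCounter
{
    private long _count;

    // Safe to call from any thread.
    public void Increment() => Interlocked.Increment(ref _count);

    // Atomically reads the current value and resets it to zero (assumed
    // approach), so an increment racing with the read lands in exactly
    // one interval rather than being lost or counted twice.
    public long CollectAndReset() => Interlocked.Exchange(ref _count, 0);
}
```

A separate read followed by a write of zero would leave a window in which a concurrent increment could be dropped; the single exchange closes that window.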
tests/ScadaLink.Host.Tests/AkkaHostedServiceAuditWiringTests.cs
@@ -0,0 +1,306 @@
using Akka.Configuration;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;
using ScadaLink.AuditLog;
using ScadaLink.AuditLog.Site;
using ScadaLink.AuditLog.Site.Telemetry;
using ScadaLink.ClusterInfrastructure;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.ConfigurationDatabase;
using ScadaLink.Host;
using ScadaLink.Host.Actors;

namespace ScadaLink.Host.Tests;

/// <summary>
/// Bundle E (M2 Task E1) — verifies the Audit Log (#23) DI surface is wired
/// into both composition roots and that the HOCON document emitted by
/// <see cref="AkkaHostedService.BuildHocon"/> includes the dedicated
/// <c>audit-telemetry-dispatcher</c> the site telemetry actor binds to.
/// </summary>
/// <remarks>
/// <para>
/// Full cluster bring-up is exercised by the existing
/// <see cref="CompositionRootTests"/> pattern — these tests reuse the same
/// <see cref="AkkaHostedServiceRemover"/> trick to short-circuit
/// <see cref="AkkaHostedService.StartAsync"/> so DI resolution is exercised
/// without the actor system actually being created.
/// </para>
/// </remarks>
public class AkkaHostedServiceAuditWiringHoconTests
{
    [Fact]
    public void BuildHocon_Emits_AuditTelemetryDispatcher_Block()
    {
        // Bundle E acceptance: the HOCON document the host parses must declare
        // the dedicated dispatcher the SiteAuditTelemetryActor binds to. A
        // missing dispatcher block would route the actor to the default
        // dispatcher and silently lose the isolation guarantee.
        var nodeOptions = new NodeOptions
        {
            Role = "Site",
            NodeHostname = "site-test-1",
            RemotingPort = 0,
            SiteId = "TestSite",
        };
        var clusterOptions = new ClusterOptions
        {
            SeedNodes = new List<string> { "akka.tcp://scadalink@localhost:2551" },
            SplitBrainResolverStrategy = "keep-oldest",
            MinNrOfMembers = 1,
            StableAfter = TimeSpan.FromSeconds(15),
            HeartbeatInterval = TimeSpan.FromSeconds(2),
            FailureDetectionThreshold = TimeSpan.FromSeconds(10),
        };

        var hocon = AkkaHostedService.BuildHocon(
            nodeOptions,
            clusterOptions,
            new[] { "Site", "site-TestSite" },
            TimeSpan.FromSeconds(5),
            TimeSpan.FromSeconds(15));

        var config = ConfigurationFactory.ParseString(hocon);

        // The dispatcher is declared at the root, so the lookup is by its
        // unqualified name. The HOCON parser must accept the block as a
        // standalone dispatcher definition the actor system can resolve.
        var dispatcherType = config.GetString("audit-telemetry-dispatcher.type");
        Assert.Equal("ForkJoinDispatcher", dispatcherType);

        var throughput = config.GetInt("audit-telemetry-dispatcher.throughput");
        Assert.Equal(100, throughput);

        var threadCount = config.GetInt("audit-telemetry-dispatcher.dedicated-thread-pool.thread-count");
        Assert.Equal(2, threadCount);
    }
}

/// <summary>
/// Verifies Audit Log (#23) services land in the Central composition root.
/// </summary>
public class CentralAuditWiringTests : IDisposable
{
    private readonly WebApplicationFactory<Program> _factory;
    private readonly string? _previousEnv;

    public CentralAuditWiringTests()
    {
        _previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
        Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");

        _factory = new WebApplicationFactory<Program>()
            .WithWebHostBuilder(builder =>
            {
                builder.ConfigureAppConfiguration((_, config) =>
                {
                    config.AddInMemoryCollection(new Dictionary<string, string?>
                    {
                        ["ScadaLink:Node:NodeHostname"] = "localhost",
                        ["ScadaLink:Node:RemotingPort"] = "0",
                        ["ScadaLink:Cluster:SeedNodes:0"] = "akka.tcp://scadalink@localhost:2551",
                        ["ScadaLink:Cluster:SeedNodes:1"] = "akka.tcp://scadalink@localhost:2552",
                        ["ScadaLink:Database:SkipMigrations"] = "true",
                        ["ScadaLink:Security:JwtSigningKey"] = "test-signing-key-must-be-at-least-32-chars-long!",
                        ["ScadaLink:Security:LdapServer"] = "localhost",
                        ["ScadaLink:Security:LdapPort"] = "3893",
                        ["ScadaLink:Security:LdapUseTls"] = "false",
                        ["ScadaLink:Security:AllowInsecureLdap"] = "true",
                        ["ScadaLink:Security:LdapSearchBase"] = "dc=scadalink,dc=local",
                        ["ScadaLink:InboundApi:ApiKeyPepper"] = "test-inbound-api-key-pepper-at-least-32-chars!",
                    });
                });
                builder.UseSetting("ScadaLink:Node:Role", "Central");
                builder.UseSetting("ScadaLink:Database:SkipMigrations", "true");
                builder.ConfigureServices(services =>
                {
                    var descriptorsToRemove = services
                        .Where(d =>
                            d.ServiceType == typeof(DbContextOptions<ScadaLinkDbContext>) ||
                            d.ServiceType == typeof(DbContextOptions) ||
                            d.ServiceType == typeof(ScadaLinkDbContext) ||
                            d.ServiceType.FullName?.Contains("EntityFrameworkCore") == true)
                        .ToList();
                    foreach (var d in descriptorsToRemove)
                        services.Remove(d);

                    services.AddDbContext<ScadaLinkDbContext>(options =>
                        options.UseInMemoryDatabase($"CentralAuditWiringTests_{Guid.NewGuid()}"));

                    AkkaHostedServiceRemover.RemoveAkkaHostedServiceOnly(services);
                });
            });

        _ = _factory.Server;
    }

    public void Dispose()
    {
        _factory.Dispose();
        Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", _previousEnv);
    }

    [Fact]
    public void Central_Resolves_IAuditWriter_AsFallbackAuditWriter()
    {
        // Central nodes still register the writer chain because AddAuditLog is
        // shared between roles — the registrations are lazy singletons and the
        // writer is never resolved on a central node in production. Asserting
        // it resolves here confirms the chain is intact and ready for the
        // future case where a central-only actor needs to emit audit events.
        var writer = _factory.Services.GetService<IAuditWriter>();
        Assert.NotNull(writer);
        Assert.IsType<FallbackAuditWriter>(writer);
    }

    [Fact]
    public void Central_Resolves_AuditLogOptions()
    {
        var opts = _factory.Services.GetService<IOptions<ScadaLink.AuditLog.Configuration.AuditLogOptions>>();
        Assert.NotNull(opts);
        Assert.NotNull(opts!.Value);
    }

    [Fact]
    public void Central_Resolves_SqliteAuditWriterOptions()
    {
        var opts = _factory.Services.GetService<IOptions<SqliteAuditWriterOptions>>();
        Assert.NotNull(opts);
        Assert.NotNull(opts!.Value);
    }

    [Fact]
    public void Central_Resolves_SiteAuditTelemetryOptions()
    {
        var opts = _factory.Services.GetService<IOptions<SiteAuditTelemetryOptions>>();
        Assert.NotNull(opts);
        Assert.NotNull(opts!.Value);
    }

    [Fact]
    public void Central_Resolves_ISiteStreamAuditClient_AsNoOpDefault()
    {
        var client = _factory.Services.GetService<ISiteStreamAuditClient>();
        Assert.NotNull(client);
        Assert.IsType<NoOpSiteStreamAuditClient>(client);
    }
}

/// <summary>
/// Verifies Audit Log (#23) services land in the Site composition root.
/// </summary>
public class SiteAuditWiringTests : IDisposable
{
    private readonly WebApplication _host;
    private readonly string _tempDbPath;

    public SiteAuditWiringTests()
    {
        _tempDbPath = Path.Combine(Path.GetTempPath(), $"scadalink_audit_wiring_{Guid.NewGuid()}.db");

        var builder = WebApplication.CreateBuilder();
        builder.Configuration.Sources.Clear();
        builder.Configuration.AddInMemoryCollection(new Dictionary<string, string?>
        {
            ["ScadaLink:Node:Role"] = "Site",
            ["ScadaLink:Node:NodeHostname"] = "test-site",
            ["ScadaLink:Node:SiteId"] = "TestSite",
            ["ScadaLink:Node:RemotingPort"] = "0",
            ["ScadaLink:Node:GrpcPort"] = "0",
            ["ScadaLink:Database:SiteDbPath"] = _tempDbPath,
            ["ScadaLink:Cluster:SeedNodes:0"] = "akka.tcp://scadalink@localhost:2551",
            ["ScadaLink:Cluster:SeedNodes:1"] = "akka.tcp://scadalink@localhost:2552",
            // SqliteAuditWriter would attempt to open a SQLite file when first
            // resolved; point it at an in-memory connection so the test doesn't
            // pollute the working directory.
            ["AuditLog:SiteWriter:DatabasePath"] = ":memory:",
        });

        builder.Services.AddGrpc();
        builder.Services.AddSingleton<ScadaLink.Communication.Grpc.SiteStreamGrpcServer>();
        SiteServiceRegistration.Configure(builder.Services, builder.Configuration);
        AkkaHostedServiceRemover.RemoveAkkaHostedServiceOnly(builder.Services);

        _host = builder.Build();
    }

    public void Dispose()
    {
        (_host as IDisposable)?.Dispose();
        try { File.Delete(_tempDbPath); } catch { /* best effort */ }
    }

    [Fact]
    public void Site_Resolves_IAuditWriter_AsFallbackAuditWriter()
    {
        var writer = _host.Services.GetService<IAuditWriter>();
        Assert.NotNull(writer);
        Assert.IsType<FallbackAuditWriter>(writer);
    }

    [Fact]
    public void Site_Resolves_SqliteAuditWriter_AsSingleton()
    {
        var a = _host.Services.GetService<SqliteAuditWriter>();
        var b = _host.Services.GetService<SqliteAuditWriter>();
        Assert.NotNull(a);
        Assert.NotNull(b);
        Assert.Same(a, b);
    }

    [Fact]
    public void Site_ISiteAuditQueue_AndSqliteAuditWriter_AreSameInstance()
    {
        // The telemetry actor reads from ISiteAuditQueue while ScriptRuntimeContext
        // writes through IAuditWriter → SqliteAuditWriter. If these don't resolve
        // to the same instance, pending rows are invisible to the actor.
        var queue = _host.Services.GetService<ISiteAuditQueue>();
        var writer = _host.Services.GetService<SqliteAuditWriter>();
        Assert.NotNull(queue);
        Assert.NotNull(writer);
        Assert.Same(writer, queue);
    }

    [Fact]
    public void Site_Resolves_RingBufferFallback()
    {
        var ring = _host.Services.GetService<RingBufferFallback>();
        Assert.NotNull(ring);
    }

    [Fact]
    public void Site_Resolves_IAuditWriteFailureCounter_AsHealthMetricsBridge()
    {
        // Bundle G (M2-T11): site composition root calls
        // AddAuditLogHealthMetricsBridge() after AddAuditLog + AddSiteHealthMonitoring,
        // which swaps the NoOp default for the real health-metrics bridge so
        // FallbackAuditWriter primary failures surface in the site health
        // report payload as SiteAuditWriteFailures.
        var counter = _host.Services.GetService<IAuditWriteFailureCounter>();
        Assert.NotNull(counter);
        Assert.IsType<HealthMetricsAuditWriteFailureCounter>(counter);
    }

    [Fact]
    public void Site_Resolves_ISiteStreamAuditClient_AsNoOpDefault()
    {
        var client = _host.Services.GetService<ISiteStreamAuditClient>();
        Assert.NotNull(client);
        Assert.IsType<NoOpSiteStreamAuditClient>(client);
    }

    [Fact]
    public void Site_Resolves_SiteAuditTelemetryOptions_WithDefaults()
    {
        var opts = _host.Services.GetService<IOptions<SiteAuditTelemetryOptions>>();
        Assert.NotNull(opts);
        Assert.Equal(256, opts!.Value.BatchSize);
        Assert.Equal(5, opts.Value.BusyIntervalSeconds);
        Assert.Equal(30, opts.Value.IdleIntervalSeconds);
    }
}
@@ -69,6 +69,7 @@ public class DeploymentManagerRedeployTests : TestKit, IDisposable
        public void IncrementScriptError() { }
        public void IncrementAlarmError() { }
        public void IncrementDeadLetter() { }
        public void IncrementSiteAuditWriteFailures() { }
        public void UpdateConnectionHealth(string connectionName, ConnectionHealth health) { }
        public void RemoveConnection(string connectionName) { }
        public void UpdateTagResolution(string connectionName, int totalSubscribed, int successfullyResolved) { }

@@ -0,0 +1,214 @@
using Microsoft.Extensions.Logging.Abstractions;
using Moq;
using ScadaLink.Commons.Entities.Audit;
using ScadaLink.Commons.Interfaces.Services;
using ScadaLink.Commons.Types;
using ScadaLink.Commons.Types.Enums;
using ScadaLink.SiteRuntime.Scripts;

namespace ScadaLink.SiteRuntime.Tests.Scripts;

/// <summary>
/// Audit Log #23 — M2 Bundle F (Task F1): every script-initiated
/// <c>ExternalSystem.Call</c> emits exactly one <c>ApiOutbound</c>/<c>ApiCall</c>
/// audit event via the wrapper inside
/// <see cref="ScriptRuntimeContext.ExternalSystemHelper"/>. The audit emission
/// is best-effort: a thrown <see cref="IAuditWriter.WriteAsync"/> must never
/// abort the script's call, and the original <see cref="ExternalCallResult"/>
/// (or original exception) must surface to the caller unchanged.
/// </summary>
public class ExternalSystemCallAuditEmissionTests
{
    /// <summary>
    /// In-memory <see cref="IAuditWriter"/> that records every event passed to
    /// <see cref="WriteAsync"/>. Optionally configurable to throw, simulating a
    /// catastrophic audit-writer failure that the wrapper must swallow.
    /// </summary>
    private sealed class CapturingAuditWriter : IAuditWriter
    {
        public List<AuditEvent> Events { get; } = new();
        public Exception? ThrowOnWrite { get; set; }

        public Task WriteAsync(AuditEvent evt, CancellationToken ct = default)
        {
            if (ThrowOnWrite != null)
            {
                return Task.FromException(ThrowOnWrite);
            }

            Events.Add(evt);
            return Task.CompletedTask;
        }
    }

    private const string SiteId = "site-77";
    private const string InstanceName = "Plant.Pump42";
    private const string SourceScript = "ScriptActor:CheckPressure";

    private static ScriptRuntimeContext.ExternalSystemHelper CreateHelper(
        IExternalSystemClient client,
        IAuditWriter? auditWriter)
    {
        return new ScriptRuntimeContext.ExternalSystemHelper(
            client,
            InstanceName,
            NullLogger.Instance,
            auditWriter,
            SiteId,
            SourceScript);
    }

    [Fact]
    public async Task Call_Success_EmitsOneEvent_Channel_ApiOutbound_Kind_ApiCall_Status_Delivered()
    {
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(true, "{}", null));
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        var result = await helper.Call("ERP", "GetOrder");

        Assert.True(result.Success);
        Assert.Single(writer.Events);
        var evt = writer.Events[0];
        Assert.Equal(AuditChannel.ApiOutbound, evt.Channel);
        Assert.Equal(AuditKind.ApiCall, evt.Kind);
        Assert.Equal(AuditStatus.Delivered, evt.Status);
        Assert.Equal("ERP.GetOrder", evt.Target);
        Assert.Equal(AuditForwardState.Pending, evt.ForwardState);
        Assert.Equal(DateTimeKind.Utc, evt.OccurredAtUtc.Kind);
        Assert.NotEqual(Guid.Empty, evt.EventId);
        Assert.False(evt.PayloadTruncated);
    }

    [Fact]
    public async Task Call_HTTP500_EmitsEvent_Status_Failed_HttpStatus_500_ErrorMessage_Set()
    {
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(false, null, "Transient error: HTTP 500 from ERP: Internal Server Error"));
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        var result = await helper.Call("ERP", "GetOrder");

        Assert.False(result.Success);
        Assert.Single(writer.Events);
        var evt = writer.Events[0];
        Assert.Equal(AuditStatus.Failed, evt.Status);
        Assert.Equal(500, evt.HttpStatus);
        Assert.False(string.IsNullOrEmpty(evt.ErrorMessage));
        Assert.Contains("500", evt.ErrorMessage);
    }

    [Fact]
    public async Task Call_HTTP400_EmitsEvent_Status_Failed_HttpStatus_400()
    {
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(false, null, "Permanent error: HTTP 400 from ERP: Bad Request"));
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        var result = await helper.Call("ERP", "GetOrder");

        Assert.False(result.Success);
        Assert.Single(writer.Events);
        var evt = writer.Events[0];
        Assert.Equal(AuditStatus.Failed, evt.Status);
        Assert.Equal(400, evt.HttpStatus);
    }

    [Fact]
    public async Task Call_ClientThrows_NetworkException_EmitsEvent_Status_Failed_ErrorMessage_FromException()
    {
        var client = new Mock<IExternalSystemClient>();
        var networkEx = new HttpRequestException("network down");
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ThrowsAsync(networkEx);
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        var thrown = await Assert.ThrowsAsync<HttpRequestException>(() => helper.Call("ERP", "GetOrder"));
        Assert.Same(networkEx, thrown);

        Assert.Single(writer.Events);
        var evt = writer.Events[0];
        Assert.Equal(AuditStatus.Failed, evt.Status);
        Assert.Null(evt.HttpStatus);
        Assert.Equal("network down", evt.ErrorMessage);
        Assert.NotNull(evt.ErrorDetail);
        Assert.Contains("HttpRequestException", evt.ErrorDetail);
    }

    [Fact]
    public async Task AuditWriter_Throws_Script_Call_Returns_Original_Result_Unchanged()
    {
        var client = new Mock<IExternalSystemClient>();
        var expected = new ExternalCallResult(true, "{\"v\":1}", null);
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(expected);
        var writer = new CapturingAuditWriter
        {
            ThrowOnWrite = new InvalidOperationException("audit writer down")
        };

        var helper = CreateHelper(client.Object, writer);
        var result = await helper.Call("ERP", "GetOrder");

        Assert.Same(expected, result);
        Assert.Empty(writer.Events);
    }

    [Fact]
    public async Task Provenance_Populated_FromContext()
    {
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "GetOrder", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .ReturnsAsync(new ExternalCallResult(true, null, null));
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        var beforeId = Guid.NewGuid();

        await helper.Call("ERP", "GetOrder");

        var evt = writer.Events[0];
        Assert.NotEqual(beforeId, evt.EventId);
        Assert.NotEqual(Guid.Empty, evt.EventId);
        Assert.Equal(SiteId, evt.SourceSiteId);
        Assert.Equal(InstanceName, evt.SourceInstanceId);
        Assert.Equal(SourceScript, evt.SourceScript);
        Assert.Null(evt.Actor);
        Assert.Null(evt.CorrelationId);
    }

    [Fact]
    public async Task DurationMs_Recorded_NonZero()
    {
        var client = new Mock<IExternalSystemClient>();
        client
            .Setup(c => c.CallAsync("ERP", "Slow", It.IsAny<IReadOnlyDictionary<string, object?>?>(), It.IsAny<CancellationToken>()))
            .Returns(async () =>
            {
                await Task.Delay(20);
                return new ExternalCallResult(true, null, null);
            });
        var writer = new CapturingAuditWriter();

        var helper = CreateHelper(client.Object, writer);
        await helper.Call("ERP", "Slow");

        var evt = writer.Events[0];
        Assert.NotNull(evt.DurationMs);
        Assert.True(evt.DurationMs >= 0, $"DurationMs={evt.DurationMs} should be >= 0");
        Assert.True(evt.DurationMs <= 5000, $"DurationMs={evt.DurationMs} should be <= 5000");
    }
}
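The tests above describe a best-effort contract: one audit record per call, the original result or exception reaches the caller unchanged, and a failing audit writer is swallowed. The sketch below shows one way such a wrapper could be shaped. It is not the code inside `ScriptRuntimeContext.ExternalSystemHelper`; `AuditRecord`, `BestEffortAudit`, and `CallWithAudit` are hypothetical names, and the delegate-based signature is an assumption chosen to keep the example self-contained.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical minimal record; the real AuditEvent carries many more fields.
public sealed record AuditRecord(string Target, bool Success, string? ErrorMessage, long DurationMs);

public static class BestEffortAudit
{
    // Runs the call, emits one audit record for either outcome, and never
    // lets an audit failure change what the caller observes.
    public static async Task<T> CallWithAudit<T>(
        string target,
        Func<Task<T>> call,
        Func<T, bool> isSuccess,
        Func<AuditRecord, Task> writeAudit)
    {
        var sw = Stopwatch.StartNew();
        T result;
        try
        {
            result = await call();
        }
        catch (Exception ex)
        {
            await TryWrite(writeAudit, new AuditRecord(target, false, ex.Message, sw.ElapsedMilliseconds));
            throw; // rethrow the original exception instance unchanged
        }

        await TryWrite(writeAudit, new AuditRecord(target, isSuccess(result), null, sw.ElapsedMilliseconds));
        return result;
    }

    // Swallows both synchronous throws and faulted tasks from the writer.
    private static async Task TryWrite(Func<AuditRecord, Task> writeAudit, AuditRecord record)
    {
        try { await writeAudit(record); }
        catch { /* audit failure must not abort the caller */ }
    }
}
```

Keeping the audit write outside the `try` that guards the call means a writer failure on the success path cannot be mistaken for a call failure and audited a second time.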