docs+code: close Theme 1 — 24 design-doc / XML-doc drift findings
Doc/XML-comment drift + small adherence fixes across 17 modules.

Highlights:
- Host-017: site CoordinatedShutdown ordering. SiteStreamGrpcServer gains
  CancelAllStreams() (refuse new streams, cancel active), wired into
  Program.cs site branch via ApplicationStopping.
- InboundAPI-021: ParentExecutionId now travels on RouteToGet/SetAttributes,
  symmetric with RouteToCallRequest; RouteHelper stamps from
  _parentExecutionId.
- ClusterInfra-012: ClusterOptionsValidator now requires both seed nodes.
- Comm-018: SiteCommunicationActor.HeartbeatMessage.IsActive derived from
  cluster leader check (was hardcoded true).
- DM-020: reconciliation audit row attributes the current user, not prior
  deployer.
- SEL-019: EventLogPurgeService early-exits on standby via active-node check.
- Plus comment/XML-doc accuracy fixes across AuditLog, ConfigurationDatabase,
  NotificationOutbox, SiteRuntime, SiteCallAudit; doc refreshes for
  Component-Commons / -ManagementService / -CLI / -ExternalSystemGateway /
  -HealthMonitoring / -Transport / -ConfigurationDatabase; CD-023 index-name
  doc alignment.

11 new regression tests (RouteHelper x4, SiteStreamGrpcServer x2,
ClusterOptionsValidator x1, SiteCommunicationActor x1, DeploymentService x1,
EventLogPurgeService x3). Build clean (0 warnings); InboundAPI/Communication/Host
suites all green. README regenerated: 112 open (was 136).
@@ -307,8 +307,8 @@ Configuration is resolved in the following priority order (highest wins):
- **Commons**: Message contracts (`Messages/Management/`) for command type definitions and registry.
- **System.CommandLine**: Command-line argument parsing.
- **Microsoft.AspNetCore.SignalR.Client**: SignalR client for the `debug stream` command's WebSocket connection.
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadalink audit` command group rides this same transport — there is no separate audit endpoint.
- **Audit Log (#23)**: The `scadalink audit query`, `audit export`, and `audit verify-chain` subcommands target the centralized Audit Log component's query/export/verify surfaces via the Management API. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side.
- **Management Service (#18)**: The CLI hits the central cluster via the existing HTTP Management API (`POST /management`), which dispatches to the ManagementActor. The `scadalink audit` command group rides a parallel REST surface on the same Host (`GET /api/audit/query` and `GET /api/audit/export`), sharing HTTP Basic Auth with `/management` but bypassing the actor for read-only, keyset-paged / streaming workloads.
- **Audit Log (#23)**: The `scadalink audit query` and `audit export` subcommands target the centralized Audit Log component's REST endpoints (`GET /api/audit/query`, `GET /api/audit/export`) on the Host's Management API surface; `audit verify-chain` rides `POST /management` until hash-chain verification ships. Permission checks (`OperationalAudit`, `AuditExport`) are enforced server-side by `AuditEndpoints`.
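As an illustration of the transport described above, the following Python sketch builds (but does not send) a `GET /api/audit/query` request carrying HTTP Basic Auth. It is not the CLI's actual C# code: the host name, the credentials, and the helper name `build_audit_query_request` are hypothetical, and `sourceSiteId` is the only filter name taken from the design docs.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_audit_query_request(base_url: str, username: str, password: str,
                              filters: dict) -> Request:
    """Build (without sending) a GET /api/audit/query request."""
    # HTTP Basic Auth header value: "Basic " + base64("user:password").
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/audit/query"
    query = urlencode(filters)
    if query:
        url = f"{url}?{query}"
    return Request(url, method="GET", headers={"Authorization": f"Basic {token}"})


# Hypothetical host and credentials, for illustration only.
req = build_audit_query_request(
    "https://central.example.invalid", "operator", "secret",
    {"sourceSiteId": "site-01"},
)
```

Because the permission checks are enforced server-side, a client like this only has to present credentials; it never decides on its own whether the caller may read audit rows.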
## Interactions
@@ -39,9 +39,11 @@ Commons must define shared primitive and utility types used across multiple comp
- **`TrackedOperationKind` enum**: ExternalCall, DatabaseWrite. Discriminates the two cached-call kinds carried by a tracked operation (notifications are tracked separately via the `NotificationType` enum).
- **`TrackedOperationStatus` enum**: Pending, Retrying, Delivered, Parked, Failed, Discarded. The unified lifecycle state shared by all tracked store-and-forward operations. This is the operation's externally-observable lifecycle status in the site-local tracking table (the status record); it is related to but distinct from the S&F buffer's own `StoreAndForwardMessageStatus`, which tracks a buffered message's retry state within the buffer (the retry mechanism). `Failed` (permanent failure) has no notification analogue — notifications use only the other five states (the `NotificationStatus` enum omits `Failed`).
- **`AuditChannel` enum**: ApiOutbound, DbOutbound, Notification, ApiInbound. Discriminates the script-trust-boundary channel that produced an `AuditEvent`. Owned by the Audit Log component.
- **`AuditKind` enum**: SyncCall, CachedEnqueued, CachedAttempt, CachedTerminal, SyncWrite, SyncRead, Enqueued, Attempt, Terminal, Completed. Channel-specific event kind — the valid `Kind` values for each `AuditChannel` are listed in the Audit Log component design (`Component-AuditLog.md`).
- **`AuditStatus` enum**: Success, TransientFailure, PermanentFailure, Enqueued, Retrying, Delivered, Parked, Discarded. Outcome of a single audit event row; superset of `TrackedOperationStatus` to also cover one-shot sync calls.
- **`AuditEvent`**: A record carrying every column of the central `AuditLog` row — `EventId` (GUID, idempotency key), `OccurredAtUtc`, `IngestedAtUtc`, `Channel` (`AuditChannel`), `Kind` (`AuditKind`), `CorrelationId`, `SourceSiteId`, `SourceInstanceId`, `SourceScript`, `Actor`, `Target`, `Status` (`AuditStatus`), `HttpStatus`, `DurationMs`, `ErrorMessage`, `ErrorDetail`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra` — plus a site-only `ForwardState` (`Pending` | `Forwarded` | `Reconciled`) used by the site SQLite write-buffer's telemetry/reconciliation loop. `IngestedAtUtc` is unset at the site and stamped on central ingest. See `Component-AuditLog.md` for the persistence schema and ingest semantics.
- **`AuditKind` enum**: ApiCall, ApiCallCached, DbWrite, DbWriteCached, NotifySend, NotifyDeliver, InboundRequest, InboundAuthFailure, CachedSubmit, CachedResolve. Channel-specific event kind — the valid `Kind` values for each `AuditChannel` are listed in the Audit Log component design (`Component-AuditLog.md`).
- **`AuditStatus` enum**: Submitted, Forwarded, Attempted, Delivered, Failed, Parked, Discarded, Skipped. Lifecycle status of an audit event row; cached operations transit Submitted → Forwarded → Attempted → Delivered/Parked/Discarded. `Skipped` covers short-circuited (e.g. dry-run) actions that should still be audited.
- **`AuditForwardState` enum**: Pending, Forwarded, Reconciled. Site-local SQLite flag governing the telemetry/reconciliation loop (set on a row but never sent to central).
- **`AuditEvent`**: A record carrying every column of the central `AuditLog` row — `EventId` (GUID, idempotency key), `OccurredAtUtc`, `IngestedAtUtc`, `Channel` (`AuditChannel`), `Kind` (`AuditKind`), `CorrelationId`, `ExecutionId`, `ParentExecutionId`, `SourceSiteId`, `SourceNode`, `SourceInstanceId`, `SourceScript`, `Actor`, `Target`, `Status` (`AuditStatus`), `HttpStatus`, `DurationMs`, `ErrorMessage`, `ErrorDetail`, `RequestSummary`, `ResponseSummary`, `PayloadTruncated`, `Extra` — plus a site-only `ForwardState` (`AuditForwardState`) used by the site SQLite write-buffer's telemetry/reconciliation loop. `IngestedAtUtc` is unset at the site and stamped on central ingest. See `Component-AuditLog.md` for the persistence schema and ingest semantics.
- **`SiteCall`**: A record carrying the central `SiteCalls` operational-mirror row — `TrackedOperationId`, `SourceSiteId`, `SourceNode`, `Kind`, `Target`, `Status`, `RetryCount`, key timestamps, and provenance — fed by site `CachedCallTelemetry` and the periodic reconciliation pull.
Types defined here must be immutable and thread-safe.
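The cached-operation lifecycle stated for `AuditStatus` (Submitted → Forwarded → Attempted → Delivered/Parked/Discarded) can be sketched as a small transition table. This is a Python illustration, not the Commons C# enum. It encodes only the transitions named in the text; treating every other pair as invalid, and giving `Failed` and `Skipped` no outgoing transitions, are assumptions.

```python
from enum import Enum


class AuditStatus(Enum):
    SUBMITTED = "Submitted"
    FORWARDED = "Forwarded"
    ATTEMPTED = "Attempted"
    DELIVERED = "Delivered"
    FAILED = "Failed"
    PARKED = "Parked"
    DISCARDED = "Discarded"
    SKIPPED = "Skipped"


# Only the chain named in the design text; anything else is treated as
# invalid here, which is an assumption of this sketch.
_NEXT = {
    AuditStatus.SUBMITTED: frozenset({AuditStatus.FORWARDED}),
    AuditStatus.FORWARDED: frozenset({AuditStatus.ATTEMPTED}),
    AuditStatus.ATTEMPTED: frozenset({
        AuditStatus.DELIVERED, AuditStatus.PARKED, AuditStatus.DISCARDED,
    }),
}


def can_transit(current: AuditStatus, nxt: AuditStatus) -> bool:
    """Return True when `nxt` directly follows `current` in the stated chain."""
    return nxt in _NEXT.get(current, frozenset())
```

The enum members and the frozen sets cannot be mutated after import, which mirrors the requirement that shared types be immutable and thread-safe.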
@@ -76,7 +78,7 @@ Entity classes are organized by domain area:
- **Inbound API**: `ApiKey`, `ApiMethod`.
- **Security**: `LdapGroupMapping`, `SiteScopeRule`.
- **Deployment**: `DeploymentRecord`, `SystemArtifactDeploymentRecord`, `DeployedConfigSnapshot`.
- **Audit**: `AuditLogEntry`.
- **Audit**: `AuditLogEntry` (configuration-change audit, owned by Configuration Database), `AuditEvent` (centralized Audit Log row, see REQ-COM-1), `SiteCall` (`SiteCalls` operational-mirror row).
The **`Notification`** entity is the persistence-ignorant POCO for a row of the central `Notifications` table — the durable notification queue owned by the Notification Outbox. It is a plain class with properties for `NotificationId` (GUID, the idempotency key), `Type` (`NotificationType` enum discriminator), `ListName`, `Subject`, `Body`, `TypeData` (a JSON string — the type-agnostic extensibility hook), `Status` (`NotificationStatus` enum), `RetryCount`, `LastError`, `ResolvedTargets`, the provenance fields `SourceSiteId` / `SourceInstanceId` / `SourceScript`, and the UTC timestamps `SiteEnqueuedAt`, `CreatedAt`, `LastAttemptAt`, `NextAttemptAt`, `DeliveredAt`. As with every entity class it has no EF dependency; the Configuration Database component supplies the Fluent API mapping, value conversions, and indexes. The `Type` and `Status` enums (`NotificationType`: `Email`, `Teams`, …; `NotificationStatus`: `Pending`, `Retrying`, `Delivered`, `Parked`, `Discarded`) are defined under `Types/Enums/` per REQ-COM-1.
@@ -92,6 +94,7 @@ Commons must define repository interfaces that consuming components use for data
- `INotificationRepository` — Notification lists (including the `Type` field), recipients, SMTP configuration.
- `INotificationOutboxRepository` — The `Notifications` table: insert-if-not-exists ingest on `NotificationId`, due-row polling (`Pending` rows and `Retrying` rows past `NextAttemptAt`), status transitions, KPI aggregate queries, and the bulk delete of terminal rows used by the daily purge job.
- `ISiteCallAuditRepository` — The `SiteCalls` table: insert-if-not-exists ingest on `TrackedOperationId`, upsert-on-newer-status from telemetry and reconciliation pulls, KPI aggregate queries, and the bulk delete of terminal rows used by the daily purge job.
- `IAuditLogRepository` — The central `AuditLog` table (Audit Log #23): insert-if-not-exists ingest on `EventId`, keyset-paged query, monthly partition switch-out and boundary inspection, KPI snapshots, recursive execution-tree walks, and distinct-source-node enumeration.
- `ISiteRepository` — Sites, data connections, and their site assignments.
- `ICentralUiRepository` — Read-oriented queries spanning multiple domain areas for display purposes.
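Two of the repository behaviours listed above, insert-if-not-exists ingest on `EventId` and keyset-paged querying, can be sketched with an in-memory SQLite table. This Python sketch stands in for the real repository (EF-based, in C#): the two-column table, the helper names `ingest` and `query_page`, and the `(OccurredAtUtc, EventId)` ascending sort key are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditLog ("
    " EventId TEXT PRIMARY KEY,"
    " OccurredAtUtc TEXT NOT NULL)"
)


def ingest(event_id: str, occurred_at_utc: str) -> bool:
    """Insert-if-not-exists on EventId; True only when a row was added."""
    cur = conn.execute(
        "INSERT INTO AuditLog (EventId, OccurredAtUtc) "
        "SELECT ?, ? WHERE NOT EXISTS "
        "(SELECT 1 FROM AuditLog WHERE EventId = ?)",
        (event_id, occurred_at_utc, event_id),
    )
    return cur.rowcount == 1


def query_page(after, page_size: int):
    """Keyset paging: resume strictly after the (OccurredAtUtc, EventId) cursor."""
    order = " ORDER BY OccurredAtUtc, EventId LIMIT ?"
    if after is None:
        sql = "SELECT EventId, OccurredAtUtc FROM AuditLog" + order
        return conn.execute(sql, (page_size,)).fetchall()
    ts, event_id = after
    sql = (
        "SELECT EventId, OccurredAtUtc FROM AuditLog "
        "WHERE OccurredAtUtc > ? OR (OccurredAtUtc = ? AND EventId > ?)" + order
    )
    return conn.execute(sql, (ts, ts, event_id, page_size)).fetchall()


T1, T2 = "2025-01-01T00:00:00Z", "2025-01-02T00:00:00Z"
# The last call re-delivers e1 and must not create a second row.
results = [ingest("e1", T1), ingest("e2", T1), ingest("e3", T2), ingest("e1", T1)]
page1 = query_page(None, 2)
page2 = query_page((page1[-1][1], page1[-1][0]), 2)
```

Keyset paging resumes from the last row seen rather than from an offset, so a page boundary stays stable even while new rows are being ingested.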
@@ -113,6 +116,13 @@ Commons must define service interfaces for cross-cutting concerns that multiple
- **`INotificationDeliveryService`**: Sends notifications to a named notification list, routing transient failures to store-and-forward. Implemented by the Notification Service, consumed by the script runtime context.
- **`IAuditWriter`**: Site-local hot-path interface for appending an `AuditEvent` to the site SQLite `AuditLog`: `Task WriteAsync(AuditEvent evt, CancellationToken ct)`. Single durable INSERT, `ForwardState = Pending`. Consumed by the script-trust-boundary call paths (External System Gateway, Database layer, Store-and-Forward Engine). Implementation lives in the Audit Log component.
- **`ICentralAuditWriter`**: Central direct-write interface for central-originated audit rows (Inbound API request completion, Notification Outbox dispatcher attempts/terminals): `Task WriteAsync(AuditEvent evt, CancellationToken ct)`, with insert-if-not-exists semantics on `EventId` so retried handlers cannot produce duplicates. Implementation lives in the Audit Log component.
- **`ISiteAuditQueue`**: Site-local queue handing off `AuditEvent` rows from the hot path to the gRPC telemetry forwarder. Implementation lives in the Audit Log component.
- **`ICachedCallLifecycleObserver`** / **`ICachedCallTelemetryForwarder`**: Bridge between the Store-and-Forward Engine's cached-call lifecycle transitions and the central `CachedCallTelemetry` packet (combined audit + operational state). Implementations live in the Audit Log component.
- **`INodeIdentityProvider`**: Resolves the writing node's `SourceNode` label (`node-a` / `node-b` / `central-a` / `central-b`) stamped on every audit row, notification, and site-call.
- **`IOperationTrackingStore`**: Site-local SQLite-backed status record store for tracked store-and-forward operations (`Tracking.Status(id)`).
- **`IPartitionMaintenance`**: Central monthly partition-switch / retention purge job hook used by the Audit Log partition maintenance service.
Bundle transport interfaces (`IBundleExporter`, `IBundleImporter`, `IBundleSessionStore`, `IAuditCorrelationContext`) live alongside the data types in `Interfaces/Transport/` and are owned by the Transport component (#24); they are defined in Commons so other components (Configuration Database for audit correlation, Central UI for the import workflow) can depend on the abstraction without taking a Transport dependency.
These interfaces are defined in Commons so that consuming components depend only on the abstraction, not on the implementing component.
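The dependency direction described here can be sketched in Python with a structural interface: the calling code knows only the writer abstraction, and any implementation can be swapped in. The real contracts are C# interfaces. The `AuditWriter` protocol, the in-memory writer, and `call_external_system` are hypothetical stand-ins; only the `ForwardState = Pending` stamping comes from the `IAuditWriter` description.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    target: str
    forward_state: Optional[str] = None  # site-only flag


class AuditWriter(Protocol):
    """Abstraction the caller depends on (stand-in for IAuditWriter)."""

    async def write(self, evt: AuditEvent) -> None: ...


class InMemorySiteAuditWriter:
    """Hypothetical implementation: one append per event, stamped Pending."""

    def __init__(self) -> None:
        self.rows: List[AuditEvent] = []

    async def write(self, evt: AuditEvent) -> None:
        self.rows.append(replace(evt, forward_state="Pending"))


async def call_external_system(writer: AuditWriter, target: str) -> None:
    # The call path sees only the abstraction, never the implementing component.
    await writer.write(AuditEvent(event_id="evt-1", target=target))


writer = InMemorySiteAuditWriter()
asyncio.run(call_external_system(writer, "erp/recipes"))
```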
@@ -159,19 +169,36 @@ ScadaLink.Commons/
│   ├── ValueFormatter.cs           # culture-invariant value-to-string helper
│   ├── DynamicJsonElement.cs       # dynamic JSON wrapper for scripts
│   ├── TrackedOperationId.cs       # tracked store-and-forward operation ID (GUID)
│   ├── AuditLogKpiSnapshot.cs      # central AuditLog KPI tile shape
│   ├── SiteAuditBacklogSnapshot.cs # per-site audit-forward backlog snapshot
│   ├── SiteCallOperational.cs      # SiteCalls operational-row projection
│   ├── TrackingStatusSnapshot.cs   # site-local Tracking.Status(id) projection
│   ├── Enums/                      # InstanceState, DeploymentStatus, AlarmState,
│   │                               # AlarmLevel, AlarmTriggerType, ConnectionHealth,
│   │                               # DataType, StoreAndForwardCategory,
│   │                               # StoreAndForwardMessageStatus,
│   │                               # NotificationType, NotificationStatus,
│   │                               # TrackedOperationKind, TrackedOperationStatus,
│   │                               # AuditChannel, AuditKind, AuditStatus
│   ├── Audit/                      # AuditEvent record (site + central audit row)
│   │                               # AuditChannel, AuditKind, AuditStatus,
│   │                               # AuditForwardState
│   ├── Audit/                      # AuditLogPaging, AuditLogQueryFilter,
│   │                               # AuditQueryParamParsers, ExecutionTreeNode,
│   │                               # SiteCallKpiSnapshot, SiteCallPaging,
│   │                               # SiteCallQueryFilter, SiteCallSiteKpiSnapshot
│   ├── DataConnections/            # OPC UA endpoint config value objects + enums
│   ├── Flattening/                 # FlattenedConfiguration, ConfigurationDiff,
│   │                               # DeploymentPackage, ValidationResult
│   ├── InboundApi/                 # ApiKeyHasher, ParameterDefinition
│   ├── Notifications/              # NotificationKpiSnapshot, NotificationOutboxFilter,
│   │                               # SiteNotificationKpiSnapshot
│   ├── Transport/                  # Transport bundle value objects: BundleManifest,
│   │                               # BundleSession, BundleSummary, EncryptionMetadata,
│   │                               # ExportSelection, ImportPreview, ImportResolution,
│   │                               # ImportResult, ManifestContentEntry
│   └── Scripts/                    # AlarmContext, ScriptScope
├── Interfaces/                     # Shared interfaces by concern
│   ├── IOperationTrackingStore.cs  # site-local tracked-operation status store
│   ├── IPartitionMaintenance.cs    # central partition-switch / retention purge hook
│   ├── Protocol/                   # REQ-COM-2: Protocol abstraction (IDataConnection, etc.)
│   ├── Repositories/               # REQ-COM-4: Per-component repository interfaces
│   │   ├── ITemplateEngineRepository.cs
@@ -182,16 +209,26 @@ ScadaLink.Commons/
│   │   ├── INotificationRepository.cs
│   │   ├── INotificationOutboxRepository.cs
│   │   ├── ISiteCallAuditRepository.cs
│   │   ├── IAuditLogRepository.cs
│   │   ├── ISiteRepository.cs
│   │   └── ICentralUiRepository.cs
│   └── Services/                   # REQ-COM-4a: Cross-cutting service interfaces
│       ├── IAuditService.cs
│       ├── IAuditWriter.cs
│       ├── ICentralAuditWriter.cs
│       ├── IDatabaseGateway.cs
│       ├── IExternalSystemClient.cs
│       ├── IInstanceLocator.cs
│       └── INotificationDeliveryService.cs
│   ├── Services/                   # REQ-COM-4a: Cross-cutting service interfaces
│   │   ├── IAuditService.cs
│   │   ├── IAuditWriter.cs
│   │   ├── ICentralAuditWriter.cs
│   │   ├── ISiteAuditQueue.cs
│   │   ├── ICachedCallLifecycleObserver.cs
│   │   ├── ICachedCallTelemetryForwarder.cs
│   │   ├── INodeIdentityProvider.cs
│   │   ├── IDatabaseGateway.cs
│   │   ├── IExternalSystemClient.cs
│   │   ├── IInstanceLocator.cs
│   │   └── INotificationDeliveryService.cs
│   └── Transport/                  # Bundle transport interfaces (Transport #24):
│       ├── IAuditCorrelationContext.cs
│       ├── IBundleExporter.cs
│       ├── IBundleImporter.cs
│       └── IBundleSessionStore.cs
├── Entities/                       # REQ-COM-3: Domain entity POCOs, by domain area
│   ├── Templates/                  # Template, TemplateAttribute, TemplateAlarm,
│   │                               # TemplateScript, TemplateComposition, TemplateFolder
@@ -207,7 +244,9 @@ ScadaLink.Commons/
│   ├── Deployment/                 # DeploymentRecord, SystemArtifactDeploymentRecord,
│   │                               # DeployedConfigSnapshot
│   ├── Scripts/                    # SharedScript
│   └── Audit/                      # AuditLogEntry
│   └── Audit/                      # AuditLogEntry (config-change audit),
│                                   # AuditEvent (centralized AuditLog row),
│                                   # SiteCall (SiteCalls operational mirror)
├── Messages/                       # REQ-COM-5: Cross-component message contracts, by concern
│   ├── Deployment/
│   ├── Lifecycle/
@@ -227,7 +266,13 @@ ScadaLink.Commons/
│   ├── InboundApi/                 # Route.To() request messages
│   ├── RemoteQuery/                # event-log and parked-message query messages,
│   │                               # parked-operation retry/discard commands
│   └── Management/                 # HTTP/ClusterClient management commands + registry
│   ├── Audit/                      # Audit Log (#23) + Site Call Audit (#22) ingest:
│   │                               # IngestAuditEventsCommand/Reply,
│   │                               # IngestCachedTelemetryCommand/Reply,
│   │                               # UpsertSiteCallCommand/Reply, SiteCallQueries,
│   │                               # SiteCallRelayMessages
│   └── Management/                 # HTTP/ClusterClient management commands + registry,
│                                   # including TransportCommands (Export/Preview/Import bundle)
├── Serialization/                  # OpcUaEndpointConfigSerializer (typed↔legacy JSON)
└── Validators/                     # OpcUaEndpointConfigValidator
```
@@ -61,7 +61,7 @@ The configuration database stores all central system data, organized by domain a
- **SiteCalls**: The central audit table for cached site calls — `ExternalSystem.CachedCall()` and `Database.CachedWrite()` — owned by the Site Call Audit component and a sibling of the `Notifications` table. One row per cached operation. Columns: `TrackedOperationId` (GUID, primary key — generated site-side at call time, used as the idempotency key), `SourceSite`, `Kind` (a `TrackedOperationKind` enum stored with values `ExternalCall` / `DatabaseWrite`), `TargetSummary` (external system + method for an `ExternalCall`, database connection name for a `DatabaseWrite`), `Status` (a `TrackedOperationStatus` enum stored with values `Pending`, `Retrying`, `Delivered`, `Parked`, `Failed`, `Discarded`), `RetryCount`, `LastError`, `Provenance` (source instance / script), `CreatedAtUtc`, `UpdatedAtUtc`, `TerminalAtUtc`. The table is populated **only** by Site Call Audit telemetry and reconciliation pulls — sites are the source of truth and the row is an eventually-consistent mirror, never written by a central dispatcher. Ingestion is **insert-if-not-exists** keyed on `TrackedOperationId`, then **upsert-on-newer-status**; the lifecycle is monotonic, so at-least-once and out-of-order telemetry are harmless. Indexed on `Status` and `SourceSite` for KPI computation and the Central UI query page. Terminal rows are removed by a daily purge job — see Scheduled Maintenance below. See Component-SiteCallAudit.md for the full lifecycle.
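The "insert-if-not-exists, then upsert-on-newer-status" rule can be sketched in Python as follows. The design text says the lifecycle is monotonic but does not give the ordering, so the `STATUS_RANK` values below are an assumption chosen for illustration, and a plain dict stands in for the `SiteCalls` table.

```python
# Assumed ordering: the design text states the lifecycle is monotonic but
# does not spell out the ranks used to decide what counts as "newer".
STATUS_RANK = {
    "Pending": 0,
    "Retrying": 1,
    "Parked": 2,
    "Delivered": 3,
    "Failed": 3,
    "Discarded": 3,
}


def apply_telemetry(table: dict, tracked_operation_id: str, status: str) -> bool:
    """Apply one telemetry packet; return True when the stored row changed."""
    current = table.get(tracked_operation_id)
    if current is None:
        # insert-if-not-exists keyed on TrackedOperationId
        table[tracked_operation_id] = status
        return True
    if STATUS_RANK[status] > STATUS_RANK[current]:
        # upsert-on-newer-status: only move forward in the lifecycle
        table[tracked_operation_id] = status
        return True
    # Duplicate or stale (out-of-order) packet: ignore it.
    return False


site_calls = {}
# Out-of-order, at-least-once delivery: the terminal status arrives first.
changes = [
    apply_telemetry(site_calls, "op-1", "Delivered"),
    apply_telemetry(site_calls, "op-1", "Retrying"),
    apply_telemetry(site_calls, "op-1", "Pending"),
    apply_telemetry(site_calls, "op-1", "Delivered"),
]
```

Because a stale packet can never overwrite a later status, replaying or reordering telemetry leaves the mirror row in the same final state.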
### Audit Log
- **AuditLog**: The central, append-only audit table owned by the Audit Log component — one row per script-trust-boundary lifecycle event across all channels (outbound API calls, outbound DB writes/reads, notifications, and inbound API requests). Sibling of the `Notifications` and `SiteCalls` tables but distinct: `AuditLog` is the immutable history that observes the other subsystems, not an operational state store. Columns: `EventId` (`uniqueidentifier` primary key — generated at the originator, used as the idempotency key), `OccurredAtUtc` (`datetime2`), `IngestedAtUtc` (`datetime2`), `Channel` (`varchar(32)` — `ApiOutbound` / `DbOutbound` / `Notification` / `ApiInbound`), `Kind` (`varchar(32)` — channel-specific event kind), `CorrelationId` (`uniqueidentifier` NULL — `TrackedOperationId` for cached calls, `NotificationId` for notifications, request-id for inbound API), `SourceSiteId` (`varchar(64)` NULL), `SourceInstanceId` (`varchar(128)` NULL), `SourceScript` (`varchar(128)` NULL), `Actor` (`varchar(128)` NULL), `Target` (`varchar(256)` NULL), `Status` (`varchar(32)` — outcome of *this event*: `Success`, `TransientFailure`, `PermanentFailure`, `Enqueued`, `Retrying`, `Delivered`, `Parked`, `Discarded`), `HttpStatus` (`int` NULL), `DurationMs` (`int` NULL), `ErrorMessage` (`nvarchar(1024)` NULL), `ErrorDetail` (`nvarchar(max)` NULL), `RequestSummary` (`nvarchar(max)` NULL — truncated request payload, headers redacted), `ResponseSummary` (`nvarchar(max)` NULL — truncated response payload), `PayloadTruncated` (`bit`), `Extra` (`nvarchar(max)` NULL — channel-specific JSON for fields not promoted to columns). 
Indexes: `IX_AuditLog_OccurredAtUtc` (primary time-range index for global scans), `IX_AuditLog_Site_Occurred (SourceSiteId, OccurredAtUtc)` (per-site filters), `IX_AuditLog_Correlation (CorrelationId)` (drilldown from a single operation), `IX_AuditLog_Channel_Status_Occurred (Channel, Status, OccurredAtUtc)` (KPI / dashboard tiles), and `IX_AuditLog_Target_Occurred (Target, OccurredAtUtc)` ("what did we send to system X"). The primary key on `EventId` enforces idempotency — central ingest is `INSERT … WHERE NOT EXISTS`, so at-least-once telemetry and reconciliation retries collapse to a single row. **Monthly partitioning** on `OccurredAtUtc` from day one via partition function `pf_AuditLog_Month` and partition scheme `ps_AuditLog_Month`, with a filegroup-per-month rollover so that retention purge is a partition switch rather than a row-level delete. The partition-maintenance job that rolls the scheme forward and switches expired partitions is owned by the Audit Log component, not this component. The table is populated only by Audit Log writers (site telemetry, central direct-write, reconciliation pulls); central ingest is **insert-if-not-exists** keyed on `EventId`. See Component-AuditLog.md for the full lifecycle, payload-capture policy, and ingestion paths.
- **AuditLog**: The central, append-only audit table owned by the Audit Log component — one row per script-trust-boundary lifecycle event across all channels (outbound API calls, outbound DB writes/reads, notifications, and inbound API requests). Sibling of the `Notifications` and `SiteCalls` tables but distinct: `AuditLog` is the immutable history that observes the other subsystems, not an operational state store. Columns: `EventId` (`uniqueidentifier` primary key — generated at the originator, used as the idempotency key), `OccurredAtUtc` (`datetime2`), `IngestedAtUtc` (`datetime2`), `Channel` (`varchar(32)` — `ApiOutbound` / `DbOutbound` / `Notification` / `ApiInbound`), `Kind` (`varchar(32)` — channel-specific event kind), `CorrelationId` (`uniqueidentifier` NULL — `TrackedOperationId` for cached calls, `NotificationId` for notifications, request-id for inbound API), `SourceSiteId` (`varchar(64)` NULL), `SourceInstanceId` (`varchar(128)` NULL), `SourceScript` (`varchar(128)` NULL), `Actor` (`varchar(128)` NULL), `Target` (`varchar(256)` NULL), `Status` (`varchar(32)` — outcome of *this event*: `Success`, `TransientFailure`, `PermanentFailure`, `Enqueued`, `Retrying`, `Delivered`, `Parked`, `Discarded`), `HttpStatus` (`int` NULL), `DurationMs` (`int` NULL), `ErrorMessage` (`nvarchar(1024)` NULL), `ErrorDetail` (`nvarchar(max)` NULL), `RequestSummary` (`nvarchar(max)` NULL — truncated request payload, headers redacted), `ResponseSummary` (`nvarchar(max)` NULL — truncated response payload), `PayloadTruncated` (`bit`), `Extra` (`nvarchar(max)` NULL — channel-specific JSON for fields not promoted to columns). 
Indexes: `IX_AuditLog_OccurredAtUtc` (primary time-range index for global scans), `IX_AuditLog_Site_Occurred (SourceSiteId, OccurredAtUtc)` (per-site filters), `IX_AuditLog_CorrelationId (CorrelationId)` (drilldown from a single operation), `IX_AuditLog_Channel_Status_Occurred (Channel, Status, OccurredAtUtc)` (KPI / dashboard tiles), and `IX_AuditLog_Target_Occurred (Target, OccurredAtUtc)` ("what did we send to system X"). The primary key on `EventId` enforces idempotency — central ingest is `INSERT … WHERE NOT EXISTS`, so at-least-once telemetry and reconciliation retries collapse to a single row. **Monthly partitioning** on `OccurredAtUtc` from day one via partition function `pf_AuditLog_Month` and partition scheme `ps_AuditLog_Month`, with a filegroup-per-month rollover so that retention purge is a partition switch rather than a row-level delete. The partition-maintenance job that rolls the scheme forward and switches expired partitions is owned by the Audit Log component, not this component. The table is populated only by Audit Log writers (site telemetry, central direct-write, reconciliation pulls); central ingest is **insert-if-not-exists** keyed on `EventId`. See Component-AuditLog.md for the full lifecycle, payload-capture policy, and ingestion paths.
### Inbound API
- **API Keys**: Key definitions (name/label, key value, enabled flag).
@@ -39,7 +39,7 @@ Each external system definition includes:
- **Retry Settings**: Max retry count, fixed time between retries (used by Store-and-Forward Engine for transient failures only).
- **Method Definitions**: List of available API methods, each with:
  - Method name.
  - **HTTP method**: GET, POST, PUT, or DELETE.
  - **HTTP method**: GET, POST, PUT, PATCH, or DELETE.
  - **Path**: Relative path appended to the base URL (e.g., `/recipes/{id}`).
  - Parameter definitions (name, type). Supports the extended type system (Boolean, Integer, Float, String, Object, List).
  - Return type definition. Supports the extended type system for complex response structures.
@@ -72,7 +72,7 @@ Each database connection definition includes:
All external system calls are **HTTP/REST** with **JSON** serialization:

- The ESG acts as an HTTP client. The external system definition provides the base URL; each method definition specifies the HTTP method and relative path.
- Request parameters are serialized as JSON in the request body (POST/PUT) or as query parameters (GET/DELETE).
- Request parameters are serialized as JSON in the request body (POST/PUT/PATCH) or as query parameters (GET/DELETE).
- Response bodies are deserialized from JSON into the method's defined return type.
- Credentials (API key header or Basic Auth header) are attached to every request per the system's authentication configuration.
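The serialization rule above (JSON body for POST/PUT/PATCH, query string for GET/DELETE) can be sketched in Python with the standard library. This is an illustration rather than the gateway's C# client: the base URL, the header name `X-Api-Key`, and the helper `build_request` are hypothetical, and path-placeholder substitution such as `{id}` is deliberately left out because the text does not describe it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BODY_METHODS = frozenset({"POST", "PUT", "PATCH"})
QUERY_METHODS = frozenset({"GET", "DELETE"})


def build_request(base_url: str, http_method: str, path: str,
                  params: dict, auth_headers: dict) -> Request:
    """Build (without sending) a request following the body/query split."""
    method = http_method.upper()
    url = base_url.rstrip("/") + path
    headers = dict(auth_headers)  # credentials are attached to every request
    if method in BODY_METHODS:
        headers["Content-Type"] = "application/json"
        body = json.dumps(params).encode("utf-8")
        return Request(url, data=body, method=method, headers=headers)
    if method in QUERY_METHODS:
        if params:
            url = f"{url}?{urlencode(params)}"
        return Request(url, method=method, headers=headers)
    raise ValueError(f"unsupported HTTP method: {http_method}")


# Hypothetical system definition, for illustration only.
patch_req = build_request("https://erp.example.invalid/api", "PATCH",
                          "/recipes", {"name": "A"}, {"X-Api-Key": "k"})
get_req = build_request("https://erp.example.invalid/api", "GET",
                        "/recipes", {"limit": 5}, {"X-Api-Key": "k"})
```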
@@ -36,8 +36,6 @@ Site clusters (metric collection and reporting). Central cluster (aggregation an
| Notification Outbox parked count | Notification Outbox (central) | Count of `Parked` notifications — central-computed, not site-reported |
| `SiteAuditBacklog` | Audit Log (site) | Count of `Pending` rows in the site-local `AuditLog` plus oldest-pending-age plus on-disk bytes. A configurable threshold drives a Health dashboard warning on the affected site tile. |
| `SiteAuditWriteFailures` | Audit Log (site) | Count of failed hot-path audit appends at the site since the last health report. |
| `SiteAuditTelemetryStalled` | Audit Log (site) | Boolean flag set when reconciliation reports a non-draining site-local audit backlog over two consecutive cycles. |
| `CentralAuditWriteFailures` | Audit Log (central) | Count of central direct-write audit failures (Inbound API middleware, Notification Outbox dispatcher, and any other central direct writers) since the last interval. |
| `AuditRedactionFailure` | Audit Log (central) | Count of payload redactor errors (over-redacted payloads, safety-net hit) since the last interval. |
## Reporting Protocol
@@ -86,10 +84,10 @@ Unlike the Notification Outbox, the Site Call Audit is **not a dispatcher** —
The Audit Log spans both sites (hot-path append + telemetry forward) and central (direct-write + ingest + redaction). Its operational health surfaces as three new dashboard tiles grouped under **Audit**:
- **Audit volume** — events/min landing in the central `AuditLog` table, shown global plus per-site sparkline; sourced from the Audit Log component on the active central node.
- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `CentralAuditWriteFailures` and `AuditRedactionFailure`.
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold or when `SiteAuditTelemetryStalled` is set.
- **Audit error rate** — percent of central `AuditLog` rows with `Status` other than `Success` / `Delivered` / `Enqueued` over a rolling 5-minute window. This is the operational error rate of audited operations (HTTP 5xx, transient failures, parked deliveries, etc.) — NOT the audit writer's own health. Audit-writer issues surface separately via `AuditRedactionFailure`.
- **Audit backlog** — global aggregate of `SiteAuditBacklog` across reporting sites (count of `Pending` site-local audit rows, oldest pending age, on-disk bytes); click drills into a per-site breakdown. The per-site tile surfaces a warning badge when its `SiteAuditBacklog` crosses the configurable threshold.
These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics arrive in the existing site health report; the central-scoped `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics are central-computed alongside the existing central KPIs.
These tiles are **point-in-time** like the Notification Outbox and Site Call Audit KPI tiles — no time-series store; consistent with Health Monitoring's "current status only" philosophy. The site-scoped `SiteAuditBacklog` / `SiteAuditWriteFailures` metrics arrive in the existing site health report; the central-scoped `AuditRedactionFailure` metric is central-computed alongside the existing central KPIs.
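The **Audit error rate** tile, as defined above, is the percentage of rows in a rolling 5-minute window whose `Status` is not `Success`, `Delivered`, or `Enqueued`. A minimal Python sketch of that arithmetic follows. Windowing on the row's occurrence time, the half-open window boundary, and returning 0.0 for an empty window are assumptions; the real tile is computed by the Audit Log component.

```python
from datetime import datetime, timedelta, timezone

OK_STATUSES = frozenset({"Success", "Delivered", "Enqueued"})


def audit_error_rate(rows, now, window=timedelta(minutes=5)) -> float:
    """Percent of rows inside the window whose status is not an OK status.

    `rows` is an iterable of (occurred_at_utc, status) pairs.
    """
    cutoff = now - window
    recent = [status for occurred, status in rows if cutoff < occurred <= now]
    if not recent:
        return 0.0  # assumption: an empty window reports 0%
    errors = sum(1 for status in recent if status not in OK_STATUSES)
    return 100.0 * errors / len(recent)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    (now - timedelta(minutes=1), "Success"),
    (now - timedelta(minutes=2), "Parked"),
    (now - timedelta(minutes=3), "Delivered"),
    (now - timedelta(minutes=4), "TransientFailure"),
    (now - timedelta(minutes=10), "PermanentFailure"),  # outside the window
]
rate = audit_error_rate(sample, now)
```

In this sample, two of the four rows inside the window are errors, so the tile would show 50%.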
## Central Storage
@@ -112,7 +110,7 @@ These tiles are **point-in-time** like the Notification Outbox and Site Call Aud
- **Cluster Infrastructure (site)**: Provides node role status.
- **Notification Outbox (central)**: Provides central-computed outbox KPIs — queue depth, stuck count, parked count — for the headline dashboard tiles.
- **Site Call Audit (central)**: Provides central-computed cached-call KPIs — buffered count, parked count, failed/delivered (last interval), oldest pending age, stuck count — for the headline dashboard tiles.
- **Audit Log (#23)**: Provides the site-reported `SiteAuditBacklog` / `SiteAuditWriteFailures` / `SiteAuditTelemetryStalled` metrics (via the site health report) and the central-computed `CentralAuditWriteFailures` / `AuditRedactionFailure` metrics, plus the central audit-row rate feeding the **Audit** dashboard tile group (Audit volume, Audit error rate, Audit backlog).
- **Audit Log (#23)**: Provides the site-reported `SiteAuditBacklog` / `SiteAuditWriteFailures` metrics (via the site health report) and the central-computed `AuditRedactionFailure` metric, plus the central audit-row rate feeding the **Audit** dashboard tile group (Audit volume, Audit error rate, Audit backlog).
## Interactions
@@ -74,6 +74,15 @@ Content-Type: application/json
The endpoint performs LDAP authentication and role resolution server-side, collapsing the CLI's previous two-step flow (ResolveRoles + actual command) into a single HTTP round-trip.
## HTTP Audit API
In addition to `/management`, the Management Service exposes a dedicated REST surface for the centralized Audit Log component (#23). These endpoints live in `AuditEndpoints.cs` and bypass the `ManagementActor` because the query/export workloads are read-only, keyset-paged, and stream large result sets:
- `GET /api/audit/query` — keyset-paged JSON query over the central `AuditLog` table. Authenticated via HTTP Basic Auth (shared with `/management`); gated on the `OperationalAudit` permission (Admin / Audit / AuditReadOnly roles).
- `GET /api/audit/export` — server-side streaming bulk export (CSV or JSONL) of the filtered rows. Gated on the `AuditExport` permission (Admin / Audit).
Both endpoints honour any site-scope rules attached to the caller's audit role by intersecting the caller-supplied `sourceSiteId` filter with the user's `PermittedSiteIds` (out-of-scope requests yield HTTP 403). Permission denial returns HTTP 403 with the same envelope shape used by `/management`.
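
The site-scope rule for the two audit endpoints can be sketched roughly as follows. This is an illustrative Python model, not the code in `AuditEndpoints.cs`: the function name, the use of `None` for an unscoped role, and the narrowing of an unfiltered request to the permitted sites are all assumptions. Only the intersection of `sourceSiteId` with `PermittedSiteIds` and the HTTP 403 outcome come from the text.

```python
def resolve_site_scope(requested_site_id, permitted_site_ids):
    """Return (http_status, site_ids_to_query) for an audit query or export.

    requested_site_id: the caller-supplied sourceSiteId filter, or None if absent.
    permitted_site_ids: the caller's PermittedSiteIds, or None for an unscoped
        role (treating None as "all sites" is an assumption of this sketch).
    A site_ids_to_query of None means "no site restriction".
    """
    if permitted_site_ids is None:
        wanted = None if requested_site_id is None else {requested_site_id}
        return 200, wanted
    permitted = set(permitted_site_ids)
    if requested_site_id is None:
        # Assumed behaviour: an unfiltered request is narrowed to the permitted sites.
        return 200, permitted
    if requested_site_id not in permitted:
        # Out-of-scope request: the text says this yields HTTP 403.
        return 403, set()
    return 200, {requested_site_id}


print(resolve_site_scope("site-a", ["site-a", "site-b"]))  # (200, {'site-a'})
print(resolve_site_scope("site-c", ["site-a", "site-b"]))  # (403, set())
```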
## Message Groups
### Templates
@@ -145,7 +154,13 @@ The endpoint performs LDAP authentication and role resolution server-side, colla
### Audit Log
- **QueryAuditLog**: Query audit log entries with filtering by entity type, user, date range, etc.
- **QueryAuditLog**: Legacy configuration-change audit query (filtered by entity type, user, date range, etc.) routed through `/management`. Gated to the `Admin` role; for the centralized Audit Log component (#23) it is superseded by the dedicated `/api/audit/*` REST endpoints described in the HTTP Audit API section.
### Transport (Bundle Import / Export)
- **ExportBundle**: Build an encrypted bundle (templates, system artifacts, central-only configuration). Gated to the `Design` role. Returns base64-encoded bundle bytes plus a byte count.
- **PreviewBundle**: Unlock and inspect a previously uploaded bundle session, returning the per-entity preview (adds, modifies, identicals, blockers). Gated to the `Admin` role.
- **ImportBundle**: Apply a previewed bundle with per-conflict resolutions inside a single audit-correlated session. Gated to the `Admin` role.
### Shared Scripts
@@ -206,7 +221,7 @@ The ManagementActor receives the following services and repositories via DI (inj
| Section | Options Class | Contents |
|---------|--------------|----------|
| `ScadaLink:ManagementService` | `ManagementServiceOptions` | (Reserved for future configuration — e.g., command timeout overrides) |
| `ScadaLink:ManagementService` | `ManagementServiceOptions` | `CommandTimeout` (`TimeSpan`, default 30 s) — Ask timeout the HTTP endpoint applies when forwarding to the `ManagementActor`. A non-positive configured value falls back to the 30 s default. |
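
The `CommandTimeout` fallback in the table above amounts to a small guard. A minimal Python sketch of that rule follows; the function name is hypothetical, and treating an unset value (`None`) the same as a non-positive one is an assumption of this example rather than something the text states.

```python
from datetime import timedelta

# Default Ask timeout stated in the options table.
DEFAULT_COMMAND_TIMEOUT = timedelta(seconds=30)


def effective_command_timeout(configured):
    """Return the timeout to apply when forwarding a command.

    A zero or negative configured value falls back to the 30 s default.
    None (nothing configured) is also mapped to the default; that case is
    an assumption of this sketch.
    """
    if configured is None or configured <= timedelta(0):
        return DEFAULT_COMMAND_TIMEOUT
    return configured


print(effective_command_timeout(timedelta(seconds=45)).total_seconds())  # 45.0
print(effective_command_timeout(timedelta(seconds=-5)).total_seconds())  # 30.0
```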
## Dependencies
@@ -76,7 +76,7 @@ Exactly one of `content.json` or `content.enc` is present.
}
```
The manifest is plaintext so the import wizard can preview bundle contents and source provenance before the user supplies a passphrase.
The manifest is plaintext so the import wizard can preview bundle contents and source provenance before the user supplies a passphrase. (Implementation note: `BundleImporter.LoadAsync` always parses the manifest *and* reads the content blob to verify the SHA-256 hash on every call, regardless of whether a passphrase is available — the "manifest peek" is conceptual rather than a cheap O(manifest) operation. For an encrypted bundle without a passphrase the call surfaces the encrypted-bundle prompt via the validated envelope, so the UI gets the manifest + provenance for free, but the cost is O(bundle-size) per `LoadAsync`. A future `ReadManifestAsync(Stream)` that skips the content read is a deferred optimisation.)
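
To see why the load cost is O(bundle-size) even when only the manifest is wanted, consider this Python sketch of the parse-then-verify pattern. It is not the `BundleImporter.LoadAsync` implementation: the function name, the `contentSha256` manifest field, and passing the manifest and content as separate byte strings are assumptions made for illustration. Hashing has to read every byte of the content blob, which is the cost the implementation note describes.

```python
import hashlib
import json


def load_bundle(manifest_bytes, content_bytes):
    """Parse the plaintext manifest, then verify the content blob's SHA-256.

    The manifest is returned only if the hash matches. `contentSha256` is a
    hypothetical field name; the real manifest schema may differ.
    """
    manifest = json.loads(manifest_bytes)
    expected = manifest["contentSha256"].lower()
    # Hashing reads the whole blob, so the cost grows with bundle size.
    actual = hashlib.sha256(content_bytes).hexdigest()
    if actual != expected:
        raise ValueError("content hash does not match manifest")
    return manifest


content = b'{"templates": []}'
manifest = json.dumps({"contentSha256": hashlib.sha256(content).hexdigest()}).encode()
print(sorted(load_bundle(manifest, content)))  # ['contentSha256']
```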
### `content.json` / `content.enc`