diff --git a/CLAUDE.md b/CLAUDE.md
index 9b6680e..425adcc 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -36,7 +36,7 @@ This project contains design documentation for a distributed SCADA system built
 - Use `git diff` to review changes before committing.
 - Commit related changes together with a descriptive message summarizing the design decision.

-## Current Component List (20 components)
+## Current Component List (21 components)

 1. Template Engine — Template modeling, inheritance, composition, validation, flattening, diffs.
 2. Deployment Manager — Central-side deployment pipeline, system-wide artifact deployment, instance lifecycle.
@@ -45,7 +45,7 @@ This project contains design documentation for a distributed SCADA system built
 5. Central–Site Communication — Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data), message patterns, debug streaming.
 6. Store-and-Forward Engine — Buffering, fixed-interval retry, parking, SQLite persistence, replication.
 7. External System Gateway — External system definitions, API method invocation, database connections.
-8. Notification Service — Notification lists, email delivery, store-and-forward integration.
+8. Notification Service — Central-only notification-list and SMTP definitions, per-type delivery adapters (sites no longer deliver notifications).
 9. Central UI — Web-based management interface, all workflows.
 10. Security & Auth — LDAP/AD authentication, role-based authorization, site-scoped permissions.
 11. Health Monitoring — Site health metrics collection and central reporting.
@@ -58,6 +58,7 @@ This project contains design documentation for a distributed SCADA system built
 18. Management Service — Akka.NET actor providing programmatic access to all admin operations, ClusterClientReceptionist registration.
 19. CLI — Command-line tool using HTTP Management API, System.CommandLine, JSON/table output.
 20. Traefik Proxy — Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover.
+21. Notification Outbox — Central component ingesting store-and-forwarded notifications, `Notifications` audit table, dispatcher loop, retry/parking, delivery KPIs.

 ## Key Design Decisions (for context across sessions)

@@ -88,6 +89,9 @@ This project contains design documentation for a distributed SCADA system built
 - Dual call modes: `ExternalSystem.Call()` (synchronous) and `ExternalSystem.CachedCall()` (store-and-forward on transient failure).
 - Error classification: HTTP 5xx/408/429/connection errors = transient; other 4xx = permanent (returned to script).
 - Notification Service: SMTP with OAuth2 Client Credentials (Microsoft 365) or Basic Auth. BCC delivery, plain text.
+- Notification delivery is central-only: sites store-and-forward notifications to the central cluster (target = central, not SMTP); sites never talk to SMTP. Notification lists and SMTP config are no longer deployed to sites; recipient resolution happens at central, at delivery time.
+- Notification lists carry a `Type` discriminator (`Email` now; `Teams` and others later). `Notify.To("list")` is type-agnostic; delivery is via per-type `INotificationDeliveryAdapter` (success/transient/permanent classification, same pattern as External System Gateway).
+- `Notify.Send` is async — returns a `NotificationId` (GUID, idempotency key) status handle immediately. `Notify.Status(notificationId)` returns a status record (status, retry count, last error, key timestamps); answered site-locally as `Forwarding` while still in the site S&F buffer, otherwise round-trips to central.
 - Inbound API: `POST /api/{methodName}`, `X-API-Key` header, flat JSON, extended type system (Object, List).

 ### Templates & Deployment
@@ -109,6 +113,7 @@ This project contains design documentation for a distributed SCADA system built
 - Async best-effort replication to standby (no ack wait).
 - Messages not cleared on instance deletion.
 - CachedCall idempotency is the caller's responsibility.
+- Notification Outbox: central `NotificationOutboxActor` singleton on the active central node — the first centrally-hosted outbox (S&F Engine remains site-only). Owns the durable `Notifications` table in central MS SQL — the single source of audit truth (one row per notification). Dispatcher loop polls due rows, resolves the list, delivers via the typed adapter; transient failures retry to `Parked`, permanent failures park immediately. `Notifications` table is type-agnostic via the `Type` discriminator; status lifecycle `Pending → Retrying → Delivered / Parked / Discarded` (plus site-local `Forwarding`, never persisted centrally). Site→central handoff is at-least-once with ack-after-persist and insert-if-not-exists on `NotificationId`. No Akka replication — MS SQL is the HA store; daily purge of terminal rows after a configurable window (default 365 days). Retry reuses central SMTP max-retry-count and fixed interval.

 ### Security & Auth
 - Authentication: direct LDAP bind (username/password), no Kerberos/NTLM. LDAPS/StartTLS required.
@@ -130,6 +135,7 @@ This project contains design documentation for a distributed SCADA system built
 - Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval.
 - Dead letter monitoring as a health metric.
 - Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search.
+- Notification Outbox KPIs are central-computed point-in-time from the `Notifications` table (global + per-source-site): queue depth, stuck count, parked count, delivered-last-interval, oldest-pending age. Stuck = `Pending`/`Retrying` older than a configurable age threshold (default 10 min) — display-only (KPI count + row badge), no escalation/alerting. Headline KPI tiles surface on the Health dashboard; a new Central UI Notification Outbox page offers a queryable list with Retry/Discard actions on parked notifications.

 ### Code Organization
 - Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database.
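A minimal Python sketch of the central outbox behavior described in the `Notification Outbox` bullet of this diff covers idempotent site-to-central ingest, a dispatcher pass that resolves the list at delivery time, and success/transient/permanent handling. It is not the Akka.NET implementation. Every name here (`OutboxSketch`, `NotificationRow`, `Outcome`, `ingest`, `dispatch_due`, `max_retries`, `retry_interval`) is hypothetical, and an in-memory dict stands in for the `Notifications` table in MS SQL, so "ack after persist" reduces to returning the ack after the dict write. The retry rule is also an assumption: a row parks when a transient failure occurs with `retry_count` already at `max_retries`, giving at most `1 + max_retries` attempts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Status(str, Enum):
    # Centrally persisted lifecycle; the site-local "Forwarding" state never reaches this table.
    PENDING = "Pending"
    RETRYING = "Retrying"
    DELIVERED = "Delivered"
    PARKED = "Parked"
    DISCARDED = "Discarded"


class Outcome(Enum):
    SUCCESS = "success"
    TRANSIENT = "transient"
    PERMANENT = "permanent"


# Hypothetical adapter shape (assumption): (recipients, body) -> (outcome, error message or None).
Adapter = Callable[[List[str], str], Tuple[Outcome, Optional[str]]]


@dataclass
class NotificationRow:
    notification_id: str  # GUID minted by Notify.Send; used as the idempotency key
    list_name: str
    body: str
    source_site: str
    created_at: datetime
    status: Status = Status.PENDING
    retry_count: int = 0
    next_attempt_at: Optional[datetime] = None
    last_error: Optional[str] = None


class OutboxSketch:
    def __init__(self, lists: Dict[str, Tuple[str, List[str]]],
                 adapters: Dict[str, Adapter], max_retries: int,
                 retry_interval: timedelta) -> None:
        self.rows: Dict[str, NotificationRow] = {}  # stands in for the Notifications table
        self.lists = lists          # list name -> (Type discriminator, recipients)
        self.adapters = adapters    # Type discriminator -> delivery adapter
        self.max_retries = max_retries
        self.retry_interval = retry_interval

    def ingest(self, row: NotificationRow) -> bool:
        """Site->central handoff: insert-if-not-exists on notification_id, then ack."""
        if row.notification_id not in self.rows:
            row.next_attempt_at = row.created_at
            self.rows[row.notification_id] = row
        # The ack is returned only after the row is stored; a redelivered duplicate is acked as a no-op.
        return True

    def dispatch_due(self, now: datetime) -> None:
        """One pass of the dispatcher loop over Pending/Retrying rows that are due."""
        for row in list(self.rows.values()):
            if row.status not in (Status.PENDING, Status.RETRYING):
                continue
            if row.next_attempt_at is None or row.next_attempt_at > now:
                continue
            entry = self.lists.get(row.list_name)  # recipients resolved at delivery time
            if entry is None:
                row.status, row.last_error = Status.PARKED, "unknown list"
                continue
            list_type, recipients = entry
            outcome, error = self.adapters[list_type](recipients, row.body)
            if outcome is Outcome.SUCCESS:
                row.status, row.last_error = Status.DELIVERED, None
            elif outcome is Outcome.PERMANENT or row.retry_count >= self.max_retries:
                # Permanent failures park immediately; transient ones park once retries are exhausted.
                row.status, row.last_error = Status.PARKED, error
            else:
                row.retry_count += 1
                row.status, row.last_error = Status.RETRYING, error
                row.next_attempt_at = now + self.retry_interval  # fixed retry interval
```

Because the ack follows the insert and the insert is keyed on `NotificationId`, a site that re-sends after a lost ack produces no duplicate row, which is what makes an at-least-once handoff safe to repeat.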
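The Outbox KPIs added to Health Monitoring in this diff are point-in-time aggregates over the `Notifications` rows, computed globally and per source site. The following is a sketch in Python under stated assumptions. `Row`, `kpis`, and `kpis_by_site` are hypothetical names. A row's age is measured from `created_at`. "Oldest-pending age" covers both `Pending` and `Retrying`. The window for "delivered-last-interval" is passed in as a parameter because the design does not fix its length. Stuck rows are only counted, matching the display-only rule (no alerting). A real implementation would more likely be a SQL aggregate over the table; this version iterates in memory for clarity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional

ACTIVE = {"Pending", "Retrying"}  # not yet terminal


@dataclass
class Row:
    source_site: str
    status: str
    created_at: datetime                    # assumption: age is measured from this timestamp
    delivered_at: Optional[datetime] = None


def kpis(rows: Iterable[Row], now: datetime, stuck_after: timedelta,
         interval: timedelta) -> dict:
    rows = list(rows)
    active = [r for r in rows if r.status in ACTIVE]
    oldest = min((r.created_at for r in active), default=None)
    return {
        "queue_depth": len(active),
        # Stuck = Pending/Retrying older than the threshold; counted only, never escalated.
        "stuck": sum(1 for r in active if now - r.created_at > stuck_after),
        "parked": sum(1 for r in rows if r.status == "Parked"),
        # Hypothetical window: delivered within (now - interval, now].
        "delivered_last_interval": sum(
            1 for r in rows
            if r.status == "Delivered" and r.delivered_at is not None
            and now - interval < r.delivered_at <= now),
        "oldest_pending_age": (now - oldest) if oldest is not None else None,
    }


def kpis_by_site(rows: Iterable[Row], now: datetime, stuck_after: timedelta,
                 interval: timedelta) -> Dict[str, dict]:
    by_site: Dict[str, List[Row]] = defaultdict(list)
    for r in rows:
        by_site[r.source_site].append(r)
    return {site: kpis(site_rows, now, stuck_after, interval)
            for site, site_rows in by_site.items()}
```

Calling `kpis` over all rows yields the global tiles, and `kpis_by_site` yields the per-source-site breakdown. With the default 10-minute threshold, `stuck_after` would be `timedelta(minutes=10)`.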