From bbfa0c515eedc4161e2e3ef315e4fb7f7d4c26bd Mon Sep 17 00:00:00 2001
From: Joseph Doherty
Date: Mon, 18 May 2026 22:57:45 -0400
Subject: [PATCH] docs(plans): fold refinement decisions into notification outbox design

Resolves the six open questions: host-level forward-retry config,
Notify.Status returns a status record, 10-min stuck threshold, a
site-local Forwarding state, site-side logging of forward failures only,
and point-in-time KPIs computed from the Notifications table.
---
 docs/plans/notif.md | 44 +++++++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/docs/plans/notif.md b/docs/plans/notif.md
index 88a5d18..82be56a 100644
--- a/docs/plans/notif.md
+++ b/docs/plans/notif.md
@@ -49,8 +49,10 @@ Central Notification Outbox actor (singleton, active central node)
 / retries exhausted → Parked
 ```
 
-`Notify.Status(notificationId)` round-trips site→central and reads the table. Before central
-has ingested the row, status reads as `Pending` (in transit).
+`Notify.Status(notificationId)` returns a small **status record** — status, retry count,
+last error, and key timestamps (enqueued, delivered). While the notification is still in the
+site S&F buffer the site answers the query **locally** (status `Forwarding`); once forwarded,
+the query round-trips to central and reads the `Notifications` table.
 
 ## Component design
 
@@ -81,7 +83,8 @@ Notifications and SMTP config are **no longer deployed to sites**. Sites never t
 Keeps its notification category, but the delivery *target* changes from SMTP to **central**.
 "Delivering" a buffered notification now means handing it to the Communication Layer for the
 central cluster and clearing it on central's ack. The site→central forward uses a fixed
-retry interval (host-level config, since it concerns reaching central, not any list).
+retry interval configured in the host `appsettings.json` — it concerns reaching the central
+cluster rather than any notification list.
 
 ## Typed notification lists
 
@@ -121,7 +124,10 @@ All timestamps are UTC.
 
 ### Status lifecycle
 
-- `Pending` — ingested, awaiting first dispatch.
+- `Forwarding` — in the site S&F buffer, not yet received by central. **Site-local only** —
+  never stored in the central `Notifications` table; reported by `Notify.Status` while the
+  site still holds the notification.
+- `Pending` — ingested by central, awaiting first dispatch.
 - `Retrying` — a transient failure occurred; `NextAttemptAt` schedules the next attempt.
 - `Delivered` — terminal, success.
 - `Parked` — terminal-not-delivered: a permanent failure, or retries exhausted. `LastError`
@@ -202,21 +208,29 @@ escalation or alerting, consistent with the current system-wide no-alerting poli
 | `Component-NotificationService.md` | Delivery moves central; lists gain a `Type`; no deploy-to-sites; async script API; delivery adapters. |
 | `Component-StoreAndForward.md` | Notification category retargeted from SMTP to central. |
 | `Component-HealthMonitoring.md` | Outbox KPIs added as central-computed headline metrics. |
+| `Component-SiteEventLogging.md` | New Notification event category — logs site→central forward failures and long-buffered notifications. |
 | `Component-CentralUI.md` | New Notification Outbox page. |
 | Central–Site Communication | New `NotificationSubmit` + ack message pair. |
 | Configuration Database / Commons | `Notifications` table, entity POCO, repository interface + implementation, EF migration, message contracts. |
 | `README.md` | Component table 20 → 21. |
 | `CLAUDE.md` | Component list 20 → 21; new key design decisions. |
 
-## Open questions for refinement
+## Refinement decisions (2026-05-18)
 
-- **Site→central forward retry config** — where the fixed forward-retry interval lives
-  (host appsettings vs a deployed setting).
-- **`Notify.Status` payload** — whether status queries also return retry count / last error
-  to scripts, or just the status enum.
-- **Stuck threshold default** — 10 minutes is a placeholder.
-- **Pre-ingest status** — confirm `Pending` is the right reading for a notification still
-  in the site S&F buffer (vs a distinct "Forwarding" state).
-- **Site-side diagnostics** — whether to keep a lightweight Site Event Logging entry for
-  "notification enqueued / forwarded," now that central holds the authoritative record.
-- **KPI history** — KPIs are currently point-in-time; whether any trend/history is wanted.
+- **Site→central forward retry config** — the fixed forward-retry interval lives in the host
+  `appsettings.json` (infrastructure config, not a deployed artifact).
+- **`Notify.Status` payload** — returns a status record: status, retry count, last error,
+  and key timestamps (enqueued, delivered).
+- **Stuck threshold default** — 10 minutes, configurable.
+- **Pre-ingest status** — a distinct site-local `Forwarding` state; the site answers
+  `Notify.Status` from its own S&F buffer without a round-trip to central.
+- **Site-side diagnostics** — Site Event Logging records site→central **forward failures**
+  and long-buffered notifications only, not routine enqueue/forward success events.
+- **KPI history** — point-in-time only, computed on demand from the `Notifications` table;
+  the ~1-year row retention answers historical questions directly, so no separate
+  time-series store is added.
+
+## Open questions
+
+None outstanding — the basic design is fully specified. The next step is an implementation
+plan against the cross-document impact table.