docs(requirements): add cached-call telemetry pattern to Communication
@@ -122,7 +122,7 @@ Keepalive settings are configurable via `CommunicationOptions`:
- Site event logs.
- Instance debug snapshots (attribute values and alarm states).
- Central can also send management commands:
- Retry or discard parked messages.
- Retry or discard parked messages and parked cached calls — central sends `RetryParkedOperation` / `DiscardParkedOperation` (keyed by `TrackedOperationId`) to the owning site, which applies the change to its S&F buffer and tracking table.
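A minimal sketch of how the owning site might apply these two commands, assuming an in-memory stand-in for the S&F buffer and tracking table. `SiteStore`, `apply_command`, and the choice of `Pending` as the post-retry status are hypothetical; only `RetryParkedOperation`, `DiscardParkedOperation`, and `TrackedOperationId` come from the text above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetryParkedOperation:
    tracked_operation_id: str


@dataclass(frozen=True)
class DiscardParkedOperation:
    tracked_operation_id: str


@dataclass
class SiteStore:
    """Hypothetical stand-in for the site's S&F buffer and tracking table."""
    buffer: dict = field(default_factory=dict)    # id -> payload awaiting delivery
    tracking: dict = field(default_factory=dict)  # id -> status string

    def apply_command(self, cmd) -> bool:
        """Apply a management command; return False if the id is not parked."""
        op_id = cmd.tracked_operation_id
        if self.tracking.get(op_id) != "Parked":
            return False  # unknown id or not parked: nothing to change
        if isinstance(cmd, RetryParkedOperation):
            # Assumption: a retried operation becomes eligible for delivery again.
            self.tracking[op_id] = "Pending"
        elif isinstance(cmd, DiscardParkedOperation):
            self.buffer.pop(op_id, None)  # drop the buffered payload
            self.tracking[op_id] = "Discarded"
        else:
            return False
        return True
```

Both the buffer and the tracking table are updated in one call so the two views of a parked operation cannot drift apart.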
### 9. Notification Submission (Site → Central)
- **Pattern**: Fire-and-forget with acknowledgment.
@@ -131,6 +131,14 @@ Keepalive settings are configurable via `CommunicationOptions`:
- The `NotificationId` GUID — generated at the site — is the **idempotency key**. The handoff is at-least-once: a re-sent submission after a lost ack is harmless because central's insert-if-not-exists treats the duplicate as a no-op.
- **Transport**: ClusterClient (site→central command/control), consistent with how other site→central messages are sent.
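The insert-if-not-exists handoff can be sketched as follows. This is an illustration only: it assumes a SQLite table keyed by `NotificationId`, and the `notifications` table, `open_store`, and `submit` are hypothetical names, not the project's actual persistence layer.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with NotificationId as the primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "notification_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def submit(conn: sqlite3.Connection, notification_id: str, body: str) -> bool:
    """Persist the row if it is absent, then acknowledge.

    A duplicate submission hits the primary key and is ignored, so the
    caller receives the same ack whether or not the row already existed.
    """
    with conn:  # commits on success, so the ack follows the persisted row
        conn.execute(
            "INSERT OR IGNORE INTO notifications (notification_id, body) "
            "VALUES (?, ?)",
            (notification_id, body),
        )
    return True  # the ack sent back to the site
```

Because the site re-sends with the same `NotificationId` after a lost ack, the second call changes nothing and still returns the ack.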
### 10. Cached Call Telemetry (Site → Central)
- **Pattern**: Fire-and-forget telemetry with a periodic reconciliation pull.
- The site **Store-and-Forward Engine** emits a `CachedCallTelemetry` message to central on **every** cached-call lifecycle transition (`Created`/`Pending` → `Retrying` → `Delivered`/`Parked`/`Failed`/`Discarded`). The message carries the `TrackedOperationId`, source site, kind, target summary, status, retry count, last error, key timestamps, and source provenance.
- Emission is **best-effort and at-least-once**, and ingestion is **idempotent on `TrackedOperationId`**: central's Site Call Audit component ingests with insert-if-not-exists, then upsert-on-newer-status, so a re-sent or out-of-order event is harmless.
- **Reconciliation pull**: because telemetry is best-effort, the central **Site Call Audit** component periodically — and on site reconnect — issues a `CachedCallReconcileRequest` to each site; the site replies with a `CachedCallReconcileResponse` carrying all tracking rows changed since a cursor. Any telemetry missed during a disconnect self-heals through this pull.
- Central audit is an **eventually-consistent mirror** — the site's operation tracking table remains the source of truth for cached-call status (`Tracking.Status(id)` is always answered site-locally).
- **Transport**: ClusterClient (site→central command/control), consistent with how other site→central messages are sent.
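The ingest and reconciliation rules above can be sketched together. The sketch assumes that "newer" is decided by a monotonically increasing `updated_at` value on each event and that the reconciliation cursor is the highest `updated_at` central has pulled from a site; both are assumptions, as are the names `SiteCallAudit`, `ingest`, `reconcile`, and `rows_changed_since`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedCallTelemetry:
    tracked_operation_id: str
    status: str
    updated_at: int  # assumed: site-side timestamp or change counter


def rows_changed_since(tracking_rows, cursor):
    """Site side: build the reconcile response from the tracking table."""
    return [row for row in tracking_rows if row.updated_at > cursor]


class SiteCallAudit:
    """Hypothetical central mirror of cached-call status."""

    def __init__(self):
        self.rows = {}     # TrackedOperationId -> latest known event
        self.cursors = {}  # site name -> highest updated_at pulled so far

    def ingest(self, event) -> bool:
        """Insert if absent, otherwise overwrite only with a newer event."""
        current = self.rows.get(event.tracked_operation_id)
        if current is None or event.updated_at > current.updated_at:
            self.rows[event.tracked_operation_id] = event
            return True
        return False  # duplicate or out-of-order event: no-op

    def reconcile(self, site, site_tracking_rows) -> int:
        """Pull rows changed since the cursor and feed them through ingest."""
        cursor = self.cursors.get(site, 0)
        changed = rows_changed_since(site_tracking_rows, cursor)
        for row in changed:
            self.ingest(row)
        if changed:
            self.cursors[site] = max(row.updated_at for row in changed)
        return len(changed)
```

Routing the reconciliation response through the same `ingest` path means a row that arrived by telemetry and again by pull is applied once, which is what lets the pull fill gaps without special cases.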
## Topology
```
@@ -182,6 +190,7 @@ Each request/response pattern has a default timeout that can be overridden in co
| 5. Recipe/Command Delivery | 30 seconds | Fire-and-forget with ack |
| 8. Remote Queries | 30 seconds | Querying parked messages or event logs |
| 9. Notification Submission | 30 seconds | Fire-and-forget with ack; central acks after persisting the row |
| 10. Cached Call Telemetry | 30 seconds | Reconciliation pull is request/response; telemetry emission itself is fire-and-forget |

Timeouts use the Akka.NET **ask pattern**. If no response is received within the timeout, the caller receives a timeout failure.
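As a language-neutral illustration of that behaviour, the sketch below uses Python's `asyncio.wait_for` as an analogue of a request with a timeout. It is not the Akka.NET API; `slow_site` and `query` are hypothetical names, and the short delays stand in for the 30-second defaults in the table.

```python
import asyncio


async def slow_site(delay_s: float) -> str:
    """Stand-in for a remote site that replies after delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "reply"


async def query(delay_s: float, timeout_s: float) -> str:
    """Wait for the reply, surfacing a timeout failure to the caller."""
    try:
        return await asyncio.wait_for(slow_site(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # No response arrived within the timeout window.
        return "timeout"
```

Running `asyncio.run(query(0.0, 0.5))` completes with the reply, while a site that takes longer than the timeout produces the failure branch instead of blocking the caller.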
@@ -237,6 +246,7 @@ The ManagementActor is registered at the well-known path `/user/management` on c
- **Site Runtime**: Receives deployments, lifecycle commands, and artifact updates. Provides debug view data.
- **Central UI**: Debug view requests and remote queries flow through communication.
- **Health Monitoring**: Receives periodic health reports from sites.
- **Store-and-Forward Engine (site)**: Parked message queries/commands are routed through communication.
- **Store-and-Forward Engine (site)**: Parked message queries/commands are routed through communication. Also emits `CachedCallTelemetry` and answers `CachedCallReconcileRequest` pulls, and receives relayed `RetryParkedOperation` / `DiscardParkedOperation` commands.
- **Site Call Audit (central)**: Receives cached-call telemetry and reconciliation responses; issues reconciliation pulls and relays parked-operation Retry/Discard commands to sites through communication.
- **Site Event Logging**: Event log queries are routed through communication.
- **Management Service**: The ManagementActor is registered with ClusterClientReceptionist on central nodes. The CLI communicates with the ManagementActor via ClusterClient, which is a separate channel from inter-cluster remoting.