diff --git a/Component-CentralUI.md b/Component-CentralUI.md
index 3f4b658..8b0b25f 100644
--- a/Component-CentralUI.md
+++ b/Component-CentralUI.md
@@ -8,6 +8,22 @@ The Central UI is a web-based management interface hosted on the central cluster

 Central cluster only. Sites have no user interface.

+## Technology
+
+- **Framework**: Blazor Server (ASP.NET Core). UI logic executes on the server; updates are pushed to the browser via SignalR.
+- Keeps the entire stack in C#/.NET, consistent with the rest of the system (Akka.NET, EF Core).
+- SignalR provides built-in support for real-time UI updates.
+
+## Real-Time Updates
+
+All real-time features use **server push via SignalR** (built into Blazor Server):
+
+- **Debug view**: Attribute value and alarm state changes streamed live from sites.
+- **Health dashboard**: Site status, connection health, error rates, and buffer depths update automatically when new health reports arrive.
+- **Deployment status**: Pending/in-progress/success/failed transitions push to the UI immediately.
+
+No manual refresh or polling is required for any of these features.
+
 ## Responsibilities

 - Provide authenticated access to all management workflows.
diff --git a/Component-DeploymentManager.md b/Component-DeploymentManager.md
index f4c945b..8ea2a08 100644
--- a/Component-DeploymentManager.md
+++ b/Component-DeploymentManager.md
@@ -41,6 +41,26 @@ Engineer (UI) → Deployment Manager (Central)
 └── 8. Update deployment status in config DB
 ```

+## Deployment Concurrency
+
+- **Same instance**: A deployment to an instance is **blocked** if a previous deployment to that instance is still in progress (waiting for site response). The UI shows the deployment in progress and rejects the second request. This prevents conflicting state at the site.
+- **Different instances**: Deployments to different instances can proceed **in parallel**, even at the same site. Each deployment tracks status independently. This supports the bulk "deploy all out-of-date instances" operation efficiently.
+
+## System-Wide Artifact Deployment Failure Handling
+
+When deploying artifacts (shared scripts, external system definitions, etc.) to all sites, each site reports success or failure **independently**:
+
+- The deployment status shows a per-site result matrix.
+- Successful sites are **not rolled back** if other sites fail.
+- The engineer can retry failed sites individually (e.g., when an offline site comes back online).
+- This is consistent with the hub-and-spoke independence model — one site's unavailability does not affect others.
+
+## Deployment Status Persistence
+
+- Only the **current deployment status** per instance is stored in the configuration database (pending, in-progress, success, failed).
+- No deployment history table — the audit log (via IAuditService) already captures every deployment action with who, what, when, and result.
+- The Deployment Manager uses current status to determine staleness (is this instance up-to-date?) and display deployment results in the UI.
+
 ## Deployment Scope

 - Deployment is performed at the **individual instance level**.
diff --git a/Component-SiteEventLogging.md b/Component-SiteEventLogging.md
index 854b7df..84e3937 100644
--- a/Component-SiteEventLogging.md
+++ b/Component-SiteEventLogging.md
@@ -41,7 +41,7 @@ Each event entry contains:

 - Events are stored in **local SQLite** on each site node.
 - Each node maintains its own event log (the active node generates events; the standby node generates minimal events related to replication).
-- **Retention**: 30 days. A background job automatically purges events older than 30 days.
+- **Retention**: 30 days. A **daily background job** runs on the active node and deletes all events older than 30 days. Hard delete — no archival.
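The 30-day hard delete described above reduces to a single SQL statement against the local SQLite database. A minimal sketch, in Python's `sqlite3` for brevity (the site itself runs .NET); the `events` table and its columns are invented for illustration, not the actual site schema:

```python
import sqlite3

# Illustrative schema only; the real event log's table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, timestamp TEXT, message TEXT)")
conn.execute("INSERT INTO events (timestamp, message) "
             "VALUES (datetime('now', '-45 days'), 'old event')")
conn.execute("INSERT INTO events (timestamp, message) "
             "VALUES (datetime('now', '-5 days'), 'recent event')")

# Daily purge job: hard-delete everything older than the 30-day retention window.
# No archival step; rows are simply gone.
deleted = conn.execute(
    "DELETE FROM events WHERE timestamp < datetime('now', '-30 days')").rowcount
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(deleted, remaining)  # → 1 1
```

Because SQLite stores `datetime()` results as sortable `YYYY-MM-DD HH:MM:SS` text, the cutoff comparison works as a plain string comparison.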

 ## Central Access

@@ -51,7 +51,9 @@ Each event entry contains:
   - Time range
   - Instance ID
   - Severity
-- The site processes the query locally and returns matching results to central.
+  - **Keyword search**: Free-text search on message and source fields (SQLite LIKE query). Useful for finding events by script name, alarm name, or error message across all instances.
+- Results are **paginated** with a configurable page size (default: 500 events). Each response includes a continuation token for fetching additional pages. This prevents broad queries from overwhelming the communication channel.
+- The site processes the query locally against SQLite and returns matching results to central.

 ## Dependencies

diff --git a/Component-StoreAndForward.md b/Component-StoreAndForward.md
index 2fb84db..bbb5834 100644
--- a/Component-StoreAndForward.md
+++ b/Component-StoreAndForward.md
@@ -60,9 +60,10 @@ There is **no maximum buffer size**. Messages accumulate in the buffer until del

 ## Persistence

 - Buffered messages are persisted to a **local SQLite database** on each site node.
-- The active node persists locally and forwards each buffer operation (add, remove, park) to the standby node via Akka.NET remoting.
+- The active node persists locally and forwards each buffer operation (add, remove, park) to the standby node **asynchronously** via Akka.NET remoting. The active node does not wait for standby acknowledgment — this avoids adding latency to every script that buffers a message.
 - The standby node applies the same operations to its own local SQLite database.
-- On failover, the new active node has a complete copy of the buffer and resumes delivery.
+- On failover, the new active node has a near-complete copy of the buffer. In rare cases, the most recent operations may not have been replicated (e.g., a message added or removed just before failover). This can result in a few **duplicate deliveries** (message delivered but remove not replicated) or a few **missed retries** (message added but not replicated). Both are acceptable trade-offs for the latency benefit.
+- On failover, the new active node resumes delivery from its local copy.

 ## Parked Message Management
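The duplicate-delivery trade-off accepted above can be shown with a toy model of the replication flow. This is a Python sketch for brevity (the real system replicates SQLite operations over Akka.NET remoting), and every name in it is invented: the active node applies each buffer operation locally, queues it for the standby without waiting, and a failover before the final operation replicates leaves the standby holding an already-delivered message.

```python
from collections import deque

# Toy in-memory stand-ins for the two nodes' local SQLite buffers.
active, standby = {}, {}
replication_queue = deque()  # async replication: ops wait here until "sent"

def buffer_op(op, msg_id, payload=None):
    # Active node applies the op locally, then replicates without waiting
    # for standby acknowledgment.
    if op == "add":
        active[msg_id] = payload
    elif op == "remove":
        active.pop(msg_id, None)
    replication_queue.append((op, msg_id, payload))

def replicate_one():
    # Standby node applies one replicated op to its own copy.
    op, msg_id, payload = replication_queue.popleft()
    if op == "add":
        standby[msg_id] = payload
    elif op == "remove":
        standby.pop(msg_id, None)

buffer_op("add", 1, "reading A")
replicate_one()              # the add reaches the standby
buffer_op("remove", 1)       # message delivered; remove queued but...
replication_queue.clear()    # ...failover strikes before it replicates

# The new active node (old standby) still holds message 1 and will deliver
# it again: the duplicate delivery the document accepts as a trade-off.
print(sorted(standby))  # → [1]
```

The mirror-image case, an `add` that never replicates, produces the missed retry: the message exists nowhere on the new active node, so it is never redelivered.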