# Component: Deployment Manager
## Purpose
The Deployment Manager orchestrates the process of deploying configurations from the central cluster to site clusters. It coordinates between the Template Engine (which produces flattened and validated configs), the Communication Layer (which delivers them), and tracks deployment status. It also manages system-wide artifact deployment and instance lifecycle commands (disable, enable, delete).
## Location
Central cluster only. The site-side deployment responsibilities (receiving configs, spawning Instance Actors) are handled by the Site Runtime component.
## Responsibilities
- Accept deployment requests from the Central UI for individual instances.
- Request flattened and validated configurations from the Template Engine.
- Request diffs between currently deployed and template-derived configurations from the Template Engine.
- Send flattened configurations to site clusters via the Communication Layer.
- Track deployment status (pending, in-progress, success, failed).
- Handle deployment failures gracefully: if a site is unreachable or the deployment fails, report the failure. Central performs no retry or buffering.
- Treat a deployment interrupted by a central failover as failed; it must be re-initiated.
- Deploy system-wide artifacts (shared scripts, external system definitions, database connection definitions, data connection definitions, notification lists, SMTP configuration) to all sites or to an individual site on explicit request.
- Send instance lifecycle commands (disable, enable, delete) to sites via the Communication Layer.
## Deployment Flow
```
Engineer (UI) → Deployment Manager (Central)
├── 1. Request validated + flattened config from Template Engine
│ (validation includes flattening, script compilation,
│ trigger references, connection binding completeness)
├── 2. If validation fails → return errors to UI, stop
├── 3. Send config to site via Communication Layer
│ │
│ ▼
│ Site Runtime (Deployment Manager Singleton)
│ ├── 4. Store new flattened config locally (SQLite)
│ ├── 5. Compile scripts at site
│ ├── 6. Create/update Instance Actor (with child Script + Alarm Actors)
│ └── 7. Report success/failure back to central
└── 8. Update deployment status in config DB
```
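The central-side steps of this flow (1 to 3 and 8) could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `deploy_instance`, `ValidationResult`, and the `validate`, `send_to_site`, and `set_status` callables are hypothetical stand-ins for the Template Engine, the Communication Layer, and the configuration database.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ValidationResult:
    # Hypothetical shape of what the Template Engine returns.
    config: Optional[dict] = None
    errors: List[str] = field(default_factory=list)


def deploy_instance(
    instance_id: str,
    validate: Callable[[str], ValidationResult],
    send_to_site: Callable[[str, dict], Tuple[bool, str]],
    set_status: Callable[[str, str], None],
) -> Tuple[str, List[str]]:
    """Run the central-side flow and return (outcome, error messages)."""
    result = validate(instance_id)              # steps 1-2
    if result.errors:
        return ("validation_failed", result.errors)

    set_status(instance_id, "in-progress")
    try:
        ok, detail = send_to_site(instance_id, result.config)  # steps 3-7
    except ConnectionError as exc:
        # Site unreachable: report the failure, no retry or buffering.
        ok, detail = False, str(exc)

    status = "success" if ok else "failed"
    set_status(instance_id, status)             # step 8
    return (status, [] if ok else [detail])
```

Passing the collaborators in as callables keeps the sketch self-contained; in the real component these would be calls into the other components.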
## Deployment Identity & Idempotency
- Every deployment is assigned a unique **deployment ID** and includes the flattened configuration's **revision hash** (from the Template Engine).
- Site-side apply is **idempotent on deployment ID** — if the same deployment is received twice (e.g., after a timeout where the site actually applied it), the site responds with "already applied" rather than re-applying.
- Sites **reject stale configurations** — if a deployment carries an older revision hash than what is already applied, the site rejects it and reports the current version.
- After a central failover or timeout, the Deployment Manager **queries the site for current deployment state** before allowing a re-deploy. This prevents duplicate application and out-of-order config changes.
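A site-side handler that follows these rules might look like the sketch below. Note one loud assumption: a hash by itself carries no ordering, so the sketch assumes each deployment also carries a monotonically increasing `revision` number used to detect stale configurations. That field, and all names here, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class SiteInstanceState:
    applied_deployment_ids: Set[str] = field(default_factory=set)
    current_revision: int = 0   # ASSUMPTION: orderable revision number
    current_hash: str = ""


def receive_deployment(
    state: SiteInstanceState,
    deployment_id: str,
    revision: int,
    revision_hash: str,
) -> Tuple[str, str]:
    """Return (outcome, revision hash now active at the site)."""
    if deployment_id in state.applied_deployment_ids:
        # Same deployment received twice: do not re-apply.
        return ("already_applied", state.current_hash)
    if revision < state.current_revision:
        # Older than what is running: reject and report current version.
        return ("rejected_stale", state.current_hash)
    state.applied_deployment_ids.add(deployment_id)
    state.current_revision = revision
    state.current_hash = revision_hash
    return ("applied", revision_hash)
```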
## Operation Concurrency
All mutating operations on a single instance (deploy, disable, enable, delete) share a **per-instance operation lock**:
- **Same instance**: Only one mutating operation can be in flight at a time. A second operation is rejected with an "operation in progress" error.
- **Different instances**: Operations on different instances can proceed **in parallel**, even at the same site. Each tracks its status independently, which lets the bulk "deploy all out-of-date instances" operation run efficiently.
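One way to picture the per-instance lock is a registry of in-flight operations keyed by instance ID. The sketch below is illustrative only; `InstanceOperationLocks` and `OperationInProgress` are hypothetical names, and the real component may use a different mechanism entirely.

```python
import threading
from typing import Dict


class OperationInProgress(Exception):
    """Raised when an instance already has a mutating operation in flight."""


class InstanceOperationLocks:
    def __init__(self) -> None:
        self._guard = threading.Lock()          # protects the registry itself
        self._in_flight: Dict[str, str] = {}    # instance_id -> operation

    def acquire(self, instance_id: str, operation: str) -> None:
        with self._guard:
            if instance_id in self._in_flight:
                running = self._in_flight[instance_id]
                raise OperationInProgress(
                    f"operation in progress on {instance_id}: {running}"
                )
            self._in_flight[instance_id] = operation

    def release(self, instance_id: str) -> None:
        with self._guard:
            self._in_flight.pop(instance_id, None)
```

Because the registry is keyed by instance, a deploy on one instance never blocks a deploy on another, which is what allows bulk deployments to fan out.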
### Allowed State Transitions
| Current State | Deploy | Disable | Enable | Delete |
|---------------|--------|---------|--------|--------|
| Enabled | Yes | Yes | No (already enabled) | Yes |
| Disabled | Yes (enables on apply) | No (already disabled) | Yes | Yes |
| Not deployed | Yes (initial deploy) | No | No | No |
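The table can be encoded directly as a lookup, which makes the check cheap to run before an operation is accepted. The state and operation strings below are hypothetical spellings chosen for the sketch.

```python
# Allowed operations per current state, transcribed from the table above.
ALLOWED_OPERATIONS = {
    "enabled": {"deploy", "disable", "delete"},
    "disabled": {"deploy", "enable", "delete"},
    "not_deployed": {"deploy"},
}


def is_allowed(current_state: str, operation: str) -> bool:
    """Return True if the operation is valid in the given state."""
    return operation in ALLOWED_OPERATIONS[current_state]
```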
## System-Wide Artifact Deployment Failure Handling
When deploying artifacts (shared scripts, external system definitions, etc.) to all sites, each site reports success or failure **independently**:
- The deployment status shows a per-site result matrix.
- Successful sites are **not rolled back** if other sites fail.
- The engineer can retry failed sites individually (e.g., when an offline site comes back online).
- This is consistent with the hub-and-spoke independence model — one site's unavailability does not affect others.
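As a rough sketch of this behaviour, each site can be attempted independently and its outcome recorded without affecting the others. The `send` callable is a hypothetical stand-in for the Communication Layer.

```python
from typing import Callable, Dict, Iterable, List


def deploy_artifacts_to_sites(
    sites: Iterable[str], send: Callable[[str], None]
) -> Dict[str, str]:
    """Attempt every site; one failure never rolls back or blocks another."""
    results: Dict[str, str] = {}
    for site in sites:
        try:
            send(site)
            results[site] = "success"
        except Exception as exc:  # record the failure and move on
            results[site] = f"failed: {exc}"
    return results


def failed_sites(results: Dict[str, str]) -> List[str]:
    """Sites the engineer may want to retry individually."""
    return [site for site, outcome in results.items() if outcome != "success"]
```

A retry is then just another call to `deploy_artifacts_to_sites` with the output of `failed_sites`.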
## Deployment Status Persistence
- Only the **current deployment status** per instance is stored in the configuration database (pending, in-progress, success, failed).
- There is no deployment history table; the audit log (via IAuditService) already captures every deployment action with who, what, when, and result.
- The Deployment Manager uses current status to determine staleness (is this instance up-to-date?) and display deployment results in the UI.
## Deployment Scope
- Deployment is performed at the **individual instance level**.
- The UI may provide convenience operations (e.g., "deploy all out-of-date instances at Site A"), but these decompose into individual instance deployments.
## Diff View
Before deploying, the Deployment Manager can request a diff from the Template Engine showing:
- **Added** attributes, alarms, or scripts (new in the template since last deploy).
- **Removed** members (removed from template since last deploy).
- **Changed** values (attribute values, alarm thresholds, script code that differ).
- **Connection binding changes** (data connection references that changed).
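The diff itself is produced by the Template Engine; the sketch below only illustrates the four categories. It assumes, purely for illustration, that a flattened configuration is a flat dictionary and that connection bindings use a `binding:` key prefix. Neither is specified by this document.

```python
BINDING_PREFIX = "binding:"  # hypothetical key convention


def diff_configs(deployed: dict, derived: dict) -> dict:
    """Compare the deployed config with the template-derived one."""
    common = deployed.keys() & derived.keys()
    changed = sorted(k for k in common if deployed[k] != derived[k])
    return {
        "added": sorted(derived.keys() - deployed.keys()),
        "removed": sorted(deployed.keys() - derived.keys()),
        "changed": [k for k in changed if not k.startswith(BINDING_PREFIX)],
        "binding_changes": [k for k in changed if k.startswith(BINDING_PREFIX)],
    }
```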
## Deployed vs. Template-Derived State
The system maintains two views per instance:
- **Deployed Configuration**: What is currently running at the site, as of the last successful deployment.
- **Template-Derived Configuration**: What the instance would look like if deployed now, based on the current state of its template hierarchy and instance overrides.
These are compared to determine staleness and generate diffs.
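A cheap staleness check can compare revision hashes rather than whole configurations. The sketch below assumes a SHA-256 hash over canonical JSON; the actual hashing scheme used by the Template Engine is not specified here, and both function names are hypothetical.

```python
import hashlib
import json
from typing import Optional


def revision_hash(config: dict) -> str:
    """Hash a config so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(deployed_hash: Optional[str], derived_config: dict) -> bool:
    """True if the site is not running the template-derived configuration."""
    return deployed_hash != revision_hash(derived_config)
```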
## Deployable Artifacts
Besides flattened instance configurations, the following system-wide artifacts can be deployed to a site:
- Shared scripts
- External system definitions
- Database connection definitions
- Data connection definitions
- Notification lists
- SMTP configuration
System-wide artifact deployment is a **separate action** from instance deployment, triggered explicitly by a user with the Deployment role. Artifacts can be deployed to all sites at once or to an individual site (per-site deployment via the Sites admin page).
## Site-Side Apply Atomicity
Applying a deployment at the site is **all-or-nothing per instance**:
- The site stores the new config, compiles all scripts, and creates/updates the Instance Actor as a single operation.
- If any step fails (e.g., script compilation), the entire deployment for that instance is rejected. The previous configuration remains active and unchanged.
- The site reports the specific failure reason (e.g., compilation error details) back to central.
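The all-or-nothing behaviour can be approximated by staging every step and swapping in the result only once all of them succeed. In this sketch Python's built-in `compile` stands in for the site's script compiler, and `active` is a hypothetical in-memory map of running configurations; the real site persists to SQLite and manages Instance Actors.

```python
from typing import Callable, Dict, Tuple


def compile_python(source: str):
    # Stand-in compiler; raises SyntaxError on invalid source.
    return compile(source, "<script>", "exec")


def apply_deployment(
    active: Dict[str, dict],
    instance_id: str,
    new_config: dict,
    compile_script: Callable[[str], object] = compile_python,
) -> Tuple[bool, str]:
    """Apply new_config for one instance, or leave the old one untouched."""
    try:
        # Stage: compile every script before touching the active config.
        compiled = {
            name: compile_script(src)
            for name, src in new_config.get("scripts", {}).items()
        }
    except SyntaxError as exc:
        # Reject the whole deployment and report the specific reason.
        return (False, f"compilation error: {exc}")
    # Commit: a single assignment replaces the previous configuration.
    active[instance_id] = {"config": new_config, "compiled": compiled}
    return (True, "")
```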
## System-Wide Artifact Version Compatibility
- Cross-site version skew for artifacts (shared scripts, external system definitions, data connection definitions, etc.) is **supported** — sites can temporarily run different artifact versions after a partial deployment.
- Artifacts are self-contained and site-independent. A site running an older version of shared scripts continues to operate correctly with its current instance configurations.
- The Central UI clearly indicates which sites have pending artifact updates so engineers can remediate.
## Instance Lifecycle Commands
The Deployment Manager sends the following commands to sites via the Communication Layer:
- **Disable**: Instructs the site to stop the Instance Actor's data subscriptions, script triggers, and alarm evaluation. The deployed configuration is retained for re-enablement.
- **Enable**: Instructs the site to re-activate a disabled instance.
- **Delete**: Instructs the site to remove the running configuration and destroy the Instance Actor and its children. Store-and-forward messages are not cleared. If the site is unreachable, the delete command **fails** — the central side does not mark the instance as deleted until the site confirms.
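The delete rule, where central state changes only after the site confirms, might be sketched like this. `send_command` is a hypothetical stand-in for the Communication Layer, and `central_instances` is an illustrative in-memory view of the configuration database.

```python
from typing import Callable, Dict


def delete_instance(
    instance_id: str,
    send_command: Callable[[str, str], bool],
    central_instances: Dict[str, dict],
) -> bool:
    """Return True only if the site confirmed the delete."""
    try:
        confirmed = send_command(instance_id, "delete")
    except ConnectionError:
        # Site unreachable: the command fails and central state is unchanged.
        return False
    if confirmed:
        central_instances[instance_id]["deleted"] = True
    return confirmed
```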
## Dependencies
- **Template Engine**: Produces flattened configurations, diffs, and validation results.
- **Communication Layer**: Delivers configurations and lifecycle commands to sites.
- **Configuration Database (MS SQL)**: Stores deployment status and deployed configuration snapshots.
- **Security & Auth**: Enforces Deployment role (with optional site scoping).
- **Configuration Database (via IAuditService)**: Logs all deployment actions, system-wide artifact deployments, and instance lifecycle changes.
## Interactions
- **Central UI**: Engineers trigger deployments, view diffs/status, manage instance lifecycle, and deploy system-wide artifacts.
- **Template Engine**: Provides resolved and validated configurations.
- **Site Runtime**: Receives and applies configurations and lifecycle commands.
- **Health Monitoring**: Deployment failures contribute to site health status.