Files
scadalink-design/docs/plans/phase-6-deployment-ops-ui.md
Joseph Doherty 021817930b Generate all 11 phase implementation plans with bullet-level requirement traceability
All phases (0-8) now have detailed implementation plans with:
- Bullet-level requirement extraction from HighLevelReqs sections
- Design constraint traceability (KDD + Component Design)
- Work packages with acceptance criteria mapped to every requirement
- Split-section ownership verified across phases
- Orphan checks (forward, reverse, negative) all passing
- Codex MCP (gpt-5.4) external verification completed per phase

Total: 7,549 lines across 11 plan documents, ~160 work packages,
~400 requirements traced, ~25 open questions logged for follow-up.
2026-03-16 15:34:54 -04:00

359 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 6: Deployment Operations & Troubleshooting UI
**Date**: 2026-03-16
**Status**: Plan complete
**Goal**: Complete the operational loop — deploy, diagnose, troubleshoot from central.
---
## Scope
**Components**: Central UI (deployment + troubleshooting workflows)
**Features**:
- Staleness indicators (revision hash comparison)
- Diff view (added/removed/changed)
- Deploy with pre-validation gating
- Deployment status tracking (live SignalR)
- System-wide artifact deployment with per-site status matrix
- Debug view (instance selection, snapshot + live stream via SignalR)
- Site event log viewer (remote query with filters, pagination, keyword search)
- Parked message management (query, retry, discard)
- Audit log viewer (query with filters)
---
## Prerequisites
| Phase | What must be complete |
|-------|-----------------------|
| Phase 1 | Central UI Blazor Server shell, login, route protection, Security & Auth, Configuration Database, IAuditService |
| Phase 2 | Template Engine: flattening, diff calculation, validation, revision hashing |
| Phase 3A | Cluster Infrastructure, Site Runtime Deployment Manager singleton |
| Phase 3B | Communication Layer (all 8 patterns), Health Monitoring, Site Event Logging, site-wide Akka stream |
| Phase 3C | Deployment Manager (full pipeline), Store-and-Forward Engine (full) |
| Phase 4 | Operator/Admin UI: health dashboard, instance list, deployment status view (basic) |
| Phase 5 | Design-time authoring UI (templates, instances, definitions) |
---
## Requirements Checklist
### Section 1.4 — Deployment Behavior (UI portion)
- [ ] `[1.4-1-ui]` Site applies config immediately upon receipt — deployment status reflects this (no confirmation step in UI)
- [ ] `[1.4-3-ui]` Site reports back success/failure — UI shows deployment result
- [ ] `[1.4-4-ui]` Pre-deployment validation runs before deployment — UI displays validation errors and blocks deployment
### Section 1.5 — System-Wide Artifact Deployment (UI portion)
- [ ] `[1.5-1-ui]` Changes not automatically propagated — UI shows separate "Deploy Artifacts" action
- [ ] `[1.5-2-ui]` Deployment requires explicit action by Deployment role — UI enforces role check
- [ ] `[1.5-3-ui]` Design role manages definitions; Deployment role triggers deployment — clear separation in UI
### Section 3.9 — Template Deployment & Change Propagation (UI portion)
- [ ] `[3.9-1-ui]` Template changes not automatically propagated — staleness indicators show which instances are out of date
- [ ] `[3.9-2-ui]` Two views: deployed vs. template-derived — UI enables comparison
- [ ] `[3.9-3-ui]` Deployment at individual instance level — UI provides per-instance deploy action
- [ ] `[3.9-4-ui]` Show differences between deployed and template-derived config — diff view
- [ ] `[3.9-5-ui]` No rollback — UI does not offer rollback action
### Section 5.4 — Parked Message Management (UI portion)
- [ ] `[5.4-1-ui]` Parked messages stored at site — UI queries sites remotely
- [ ] `[5.4-2-ui]` Central UI can query sites for parked messages — query UI
- [ ] `[5.4-3-ui]` Operators can retry or discard parked messages — action buttons
- [ ] `[5.4-4-ui]` Covers external system calls, notifications, and cached database writes — all three categories shown
### Section 8 — Central UI (deployment + troubleshooting workflows, Phase 6 owns)
- [ ] `[8-deploy-1]` Deployment: View diffs between deployed and current template-derived configurations
- [ ] `[8-deploy-2]` Deployment: Deploy updates to individual instances
- [ ] `[8-deploy-3]` Deployment: Filter instances by area
- [ ] `[8-deploy-4]` Deployment: Pre-deployment validation runs automatically — errors block deployment
- [ ] `[8-deploy-5]` System-Wide Artifact Deployment: explicitly deploy shared scripts, external system definitions, DB connection definitions, notification lists to all sites
- [ ] `[8-deploy-6]` Deployment Status Monitoring: Track deployment success/failure at site level
- [ ] `[8-deploy-7]` Parked Message Management: Query sites, view details, retry or discard
- [ ] `[8-deploy-8]` Site Event Log Viewer: Query and view operational event logs from sites
### Section 8.1 — Debug View
- [ ] `[8.1-1]` Subscribe-on-demand — central subscribes to site-wide Akka stream filtered by instance
- [ ] `[8.1-2]` Site provides initial snapshot of all current attribute values and alarm states
- [ ] `[8.1-3]` Attribute value stream: [InstanceUniqueName].[AttributePath].[AttributeName], value, quality, timestamp
- [ ] `[8.1-4]` Alarm state stream: [InstanceUniqueName].[AlarmName], state (active/normal), priority, timestamp
- [ ] `[8.1-5]` Stream continues until engineer closes debug view — central unsubscribes
- [ ] `[8.1-6]` No attribute/alarm selection — always shows all for the instance
- [ ] `[8.1-7]` No special concurrency limits required
### Section 10.110.3 — Audit Log (UI portion)
- [ ] `[10.1-ui]` Audit logs stored in config DB — UI queries config DB
- [ ] `[10.2-ui]` All system-modifying actions logged — viewer covers all categories
- [ ] `[10.3-ui]` Each entry: who, what (action, entity type, entity ID, entity name), when, state (JSON after-state) — UI displays all fields
- [ ] `[10.3-2-ui]` Change history reconstructed by comparing consecutive entries — UI shows before/after by comparing entries
### Section 12.3 — Central Access to Event Logs
- [ ] `[12.3-1]` Central UI can query site event logs remotely via Communication Layer
- [ ] `[12.3-2]` Queries support filtering by event type, time range, instance, severity, keyword search
- [ ] `[12.3-3]` Results are paginated (default 500 per page) with continuation token
---
## Design Constraints Checklist
| ID | Constraint | Source | Mapped WP |
|----|-----------|--------|-----------|
| KDD-ui-1 | Blazor Server (ASP.NET Core + SignalR), Bootstrap, clean corporate design | CLAUDE.md | All WPs |
| KDD-ui-2 | Real-time push for debug view, health dashboard, deployment status | CLAUDE.md | WP-4, WP-6 |
| KDD-deploy-5 | Flattened configs include revision hash for staleness detection | CLAUDE.md | WP-1 |
| KDD-deploy-9 | System-wide artifact version skew across sites supported | CLAUDE.md | WP-5 |
| KDD-deploy-11 | Optimistic concurrency on deployment status records | CLAUDE.md | WP-4 |
| CD-DM-1 | Diff shows added/removed/changed attributes, alarms, scripts, connection binding changes | Component-DeploymentManager | WP-2 |
| CD-DM-2 | Per-site result matrix for system-wide artifact deployment; successful sites not rolled back | Component-DeploymentManager | WP-5 |
| CD-DM-3 | Retry failed sites individually after system-wide artifact deployment | Component-DeploymentManager | WP-5 |
| CD-DM-4 | Central UI indicates which sites have pending artifact updates | Component-DeploymentManager | WP-5 |
| CD-COMM-1 | Debug streams lost on failover — must be re-opened by user | Component-Communication | WP-6 |
| CD-COMM-2 | Debug view: subscribe → snapshot → stream → unsubscribe pattern | Component-Communication | WP-6 |
| CD-SEL-1 | Event log queries paginated with continuation token (500/page default) | Component-SiteEventLogging | WP-7 |
| CD-SEL-2 | Keyword search on message and source fields (SQLite LIKE) | Component-SiteEventLogging | WP-7 |
| CD-SEL-3 | Event log filters: event type, time range, instance ID, severity | Component-SiteEventLogging | WP-7 |
| CD-SF-1 | Parked message details: target, payload, retry count, timestamps | Component-StoreAndForward | WP-8 |
| CD-AUD-1 | Audit log filter: user, entity type, action type, time range | Component-CentralUI | WP-9 |
| CD-AUD-2 | Before/after state by comparing consecutive entries | Component-CentralUI | WP-9 |
---
## Work Packages
### WP-1: Staleness Indicators (S)
**Description**: Show which instances have out-of-date deployed configurations by comparing revision hashes.
**Acceptance Criteria**:
- Instance list shows staleness indicator (e.g., icon/badge) when deployed revision hash differs from current template-derived revision hash (`[3.9-1-ui]`, `KDD-deploy-5`)
- Two views accessible: deployed configuration and template-derived configuration (`[3.9-2-ui]`)
- Staleness detection does not require a full diff — uses revision hash comparison only (`KDD-deploy-5`)
- Filter/sort by staleness state
**Complexity**: S
**Traces**: `[3.9-1-ui]`, `[3.9-2-ui]`, `[3.9-5-ui]`, KDD-deploy-5
---
### WP-2: Diff View (M)
**Description**: Display differences between the deployed configuration and the current template-derived configuration.
**Acceptance Criteria**:
- Diff view shows added, removed, and changed members (attributes, alarms, scripts) (`[3.9-4-ui]`, `[8-deploy-1]`, `CD-DM-1`)
- Connection binding changes shown in diff (`CD-DM-1`)
- Clear visual distinction between additions (new members), removals, and modifications
- Diff calculated on demand when user views it
**Complexity**: M
**Traces**: `[3.9-4-ui]`, `[8-deploy-1]`, CD-DM-1
---
### WP-3: Deploy with Pre-Validation Gating (M)
**Description**: Deploy action on individual instances that automatically runs pre-deployment validation and blocks on errors.
**Acceptance Criteria**:
- Deploy action available per instance (`[3.9-3-ui]`, `[8-deploy-2]`)
- Pre-deployment validation runs automatically before deployment is sent (`[1.4-4-ui]`, `[8-deploy-4]`)
- Validation errors displayed clearly and block the deployment
- Filter instances by site, area, template (`[8-deploy-3]`)
- Site applies config immediately — no confirmation step shown in UI for site side (`[1.4-1-ui]`)
- No rollback action offered (`[3.9-5-ui]`)
- Deployment role required
**Complexity**: M
**Traces**: `[1.4-1-ui]`, `[1.4-4-ui]`, `[3.9-3-ui]`, `[3.9-5-ui]`, `[8-deploy-2]`, `[8-deploy-3]`, `[8-deploy-4]`
---
### WP-4: Deployment Status Tracking (Live SignalR) (M)
**Description**: Real-time deployment status updates pushed to the UI via SignalR.
**Acceptance Criteria**:
- Deployment status (pending, in-progress, success, failed) updates in real-time via SignalR push (`KDD-ui-2`, `[8-deploy-6]`)
- Site reports success/failure — UI reflects result (`[1.4-3-ui]`)
- Optimistic concurrency on status records handled gracefully (`KDD-deploy-11`)
- Status shown per instance with timestamp
- No manual refresh required
**Complexity**: M
**Traces**: `[1.4-3-ui]`, `[8-deploy-6]`, KDD-ui-2, KDD-deploy-11
---
### WP-5: System-Wide Artifact Deployment with Per-Site Status Matrix (L)
**Description**: UI for deploying shared scripts, external system definitions, DB connection definitions, and notification lists to all sites.
**Acceptance Criteria**:
- Separate "Deploy Artifacts" action — not automatically triggered when definitions change (`[1.5-1-ui]`, `[8-deploy-5]`)
- Deployment role required (`[1.5-2-ui]`)
- Design role manages definitions; Deployment role triggers deployment — clear separation (`[1.5-3-ui]`)
- Per-site status matrix showing success/failure for each site (`CD-DM-2`)
- Successful sites not rolled back if others fail (`CD-DM-2`)
- Individual site retry for failed sites (`CD-DM-3`)
- UI indicates which sites have pending artifact updates (`CD-DM-4`)
- Cross-site version skew supported — display shows version status per site (`KDD-deploy-9`)
**Complexity**: L
**Traces**: `[1.5-1-ui]``[1.5-3-ui]`, `[8-deploy-5]`, KDD-deploy-9, CD-DM-2, CD-DM-3, CD-DM-4
---
### WP-6: Debug View (L)
**Description**: On-demand real-time view of a specific instance's attribute values and alarm states streamed via SignalR.
**Acceptance Criteria**:
- Select a deployed instance and open debug view (`[8.1-1]`)
- Initial snapshot of all current attribute values and alarm states received from site (`[8.1-2]`)
- Attribute value stream formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp (`[8.1-3]`)
- Alarm state stream formatted as `[InstanceUniqueName].[AlarmName]`, state, priority, timestamp (`[8.1-4]`)
- Live updates pushed via SignalR — no polling (`KDD-ui-2`)
- Stream continues until user closes the debug view; central unsubscribes on close (`[8.1-5]`)
- All attributes and alarms shown — no selection filtering (`[8.1-6]`)
- No concurrency limits enforced (`[8.1-7]`)
- On failover, debug stream is lost; user must re-open (`CD-COMM-1`)
- Subscribe → snapshot → stream → unsubscribe lifecycle (`CD-COMM-2`)
- Deployment role required
**Complexity**: L
**Traces**: `[8.1-1]``[8.1-7]`, KDD-ui-2, CD-COMM-1, CD-COMM-2
---
### WP-7: Site Event Log Viewer (M)
**Description**: UI for querying and viewing operational event logs from site clusters remotely.
**Acceptance Criteria**:
- Remote query to sites via Communication Layer (`[12.3-1]`, `[8-deploy-8]`)
- Filter by event type/category, time range, instance ID, severity (`CD-SEL-3`, `[12.3-2]`)
- Keyword search on message and source fields (`CD-SEL-2`, `[12.3-2]`)
- Paginated results with continuation token support (default 500/page) (`CD-SEL-1`, `[12.3-3]`)
- Display all event categories: script executions (start, complete, error), alarm events (activated, cleared, evaluation errors), deployment events (received, compiled, applied, failed), connection status changes, S&F activity (queued, delivered, retried, parked), instance lifecycle (enable, disable, delete)
- Deployment role required
**Complexity**: M
**Traces**: `[12.3-1]``[12.3-3]`, `[8-deploy-8]`, CD-SEL-1, CD-SEL-2, CD-SEL-3
---
### WP-8: Parked Message Management (M)
**Description**: UI for querying, viewing, retrying, and discarding parked messages at sites.
**Acceptance Criteria**:
- Query sites for parked messages remotely (`[5.4-1-ui]`, `[5.4-2-ui]`, `[8-deploy-7]`)
- View message details: target, payload, retry count, timestamps (`CD-SF-1`)
- All three message categories shown: external system calls, notifications, cached database writes (`[5.4-4-ui]`)
- Retry action moves message back to retry queue (`[5.4-3-ui]`)
- Discard action removes message permanently (`[5.4-3-ui]`)
- Deployment role required
**Complexity**: M
**Traces**: `[5.4-1-ui]``[5.4-4-ui]`, `[8-deploy-7]`, CD-SF-1
---
### WP-9: Audit Log Viewer (M)
**Description**: UI for querying the central audit log with filters.
**Acceptance Criteria**:
- Query audit log from configuration database (`[10.1-ui]`)
- All system-modifying action categories visible (`[10.2-ui]`)
- Each entry displays: who (user), what (action, entity type, entity ID, entity name), when (timestamp), state (JSON after-state) (`[10.3-ui]`)
- Filter by user, entity type, action type, time range (`CD-AUD-1`)
- Before/after state comparison by viewing consecutive entries for the same entity (`[10.3-2-ui]`, `CD-AUD-2`)
- Admin role required
**Complexity**: M
**Traces**: `[10.1-ui]``[10.3-2-ui]`, CD-AUD-1, CD-AUD-2
---
## Test Strategy
### Unit Tests
- Staleness indicator rendering based on revision hash comparison
- Diff view component rendering for added/removed/changed members
- Deployment status SignalR update handling
- Debug view snapshot rendering and stream update handling
- Event log filter building and pagination logic
- Parked message action button state logic
- Audit log filter building and entry rendering
### Integration Tests
- Deploy workflow: view diff → validate → deploy → track status via SignalR → verify success
- Deploy with validation failure → verify deployment blocked
- System-wide artifact deployment → verify per-site status matrix → retry failed site
- Debug view: open → receive snapshot → receive stream updates → close → verify unsubscribe
- Event log viewer: query with filters → paginate → verify results match
- Parked message: query → retry → verify message moves back to queue; query → discard → verify removed
- Audit log: query with filters → verify entries displayed with correct detail
### Negative Tests
- Attempt deploy on instance with validation errors → verify blocked
- No rollback action exists in UI → verify absent
- Non-Deployment user attempts deploy → verify access denied
- Non-Admin user attempts audit log viewer → verify access denied
- Debug view during failover → verify stream lost, user must re-open
- Query event log on unreachable site → verify graceful error
---
## Verification Gate
Phase 6 is complete when:
1. All 9 work packages pass acceptance criteria
2. All unit and integration tests pass
3. All negative tests verify prohibited behaviors
4. A Deployment user can perform a full operational loop: view stale instances → view diff → deploy → track live status → open debug view → view event logs → manage parked messages
5. An Admin user can query the audit log with filters and view change details
6. Real-time features (deployment status, debug view) work via SignalR without polling
7. System-wide artifact deployment shows per-site status matrix with retry capability
---
## Open Questions
No new questions discovered during Phase 6 plan generation.
---
## Split-Section Verification
| Section | Phase 6 Bullets | Other Phase(s) | Other Phase Bullets |
|---------|----------------|-----------------|---------------------|
| 1.4 | `[1.4-1-ui]`, `[1.4-3-ui]`, `[1.4-4-ui]` (UI) | Phase 3C | `[1.4-1]``[1.4-4]` backend pipeline |
| 1.5 | `[1.5-1-ui]``[1.5-3-ui]` (UI) | Phase 3C | Backend artifact deployment |
| 3.9 | `[3.9-1-ui]``[3.9-5-ui]`, `[3.9-6]` in Phase 5 | Phase 3C | Backend pipeline, status persistence |
| 5.4 | `[5.4-1-ui]``[5.4-4-ui]` (UI) | Phase 3C | Backend parked message storage and management |
| 8 | `[8-deploy-1]``[8-deploy-8]` | Phase 4, 5 | Admin/operator, design workflows |
| 8.1 | `[8.1-1]``[8.1-7]` (all) | — | No split (Phase 6 owns entirely) |
| 10.110.3 | `[10.1-ui]``[10.3-2-ui]` (viewer UI) | Phase 1 | Backend storage, IAuditService, transactional guarantee |
| 12.3 | `[12.3-1]``[12.3-3]` (all) | — | No split (Phase 6 owns entirely) |
---
## Orphan Check Result
**Forward check**: Every Requirements Checklist item and Design Constraints Checklist item maps to at least one work package with acceptance criteria that would fail if the requirement were not implemented. PASS.
**Reverse check**: Every work package traces back to at least one requirement or design constraint. No untraceable work. PASS.
**Split-section check**: All split sections verified above. Phase 6 covers UI presentation for deployment/operations workflows. Backend functionality is in Phase 3C (deployment pipeline, S&F) and Phase 1 (audit service). No unassigned bullets found. PASS.
**Negative requirement check**: The following negative requirements have explicit acceptance criteria:
- `[3.9-5-ui]` No rollback — verified in WP-3 (no rollback action offered)
- `[1.5-1-ui]` Not automatically propagated — verified in WP-5 (separate action required)
- `[8.1-7]` No concurrency limits — verified in WP-6
PASS.
**Codex MCP verification**: Skipped — external tool verification deferred.