# Phase 6: Deployment Operations & Troubleshooting UI

**Date**: 2026-03-16
**Status**: Plan complete
**Goal**: Complete the operational loop: deploy, diagnose, and troubleshoot from central.

---
## Scope

**Components**: Central UI (deployment + troubleshooting workflows)

**Features**:
- Staleness indicators (revision hash comparison)
- Diff view (added/removed/changed)
- Deploy with pre-validation gating
- Deployment status tracking (live SignalR)
- System-wide artifact deployment with per-site status matrix
- Debug view (instance selection, snapshot + live stream via SignalR)
- Site event log viewer (remote query with filters, pagination, keyword search)
- Parked message management (query, retry, discard)
- Audit log viewer (query with filters)

---
## Prerequisites

| Phase | What must be complete |
|-------|-----------------------|
| Phase 1 | Central UI Blazor Server shell, login, route protection, Security & Auth, Configuration Database, IAuditService |
| Phase 2 | Template Engine: flattening, diff calculation, validation, revision hashing |
| Phase 3A | Cluster Infrastructure, Site Runtime Deployment Manager singleton |
| Phase 3B | Communication Layer (all 8 patterns), Health Monitoring, Site Event Logging, site-wide Akka stream |
| Phase 3C | Deployment Manager (full pipeline), Store-and-Forward Engine (full) |
| Phase 4 | Operator/Admin UI: health dashboard, instance list, deployment status view (basic) |
| Phase 5 | Design-time authoring UI (templates, instances, definitions) |

---
## Requirements Checklist

### Section 1.4: Deployment Behavior (UI portion)
- [ ] `[1.4-1-ui]` Site applies config immediately upon receipt; deployment status reflects this (no confirmation step in UI)
- [ ] `[1.4-3-ui]` Site reports back success/failure; UI shows deployment result
- [ ] `[1.4-4-ui]` Pre-deployment validation runs before deployment; UI displays validation errors and blocks deployment

### Section 1.5: System-Wide Artifact Deployment (UI portion)
- [ ] `[1.5-1-ui]` Changes not automatically propagated; UI shows separate "Deploy Artifacts" action
- [ ] `[1.5-2-ui]` Deployment requires explicit action by Deployment role; UI enforces role check
- [ ] `[1.5-3-ui]` Design role manages definitions; Deployment role triggers deployment (clear separation in UI)

### Section 3.9: Template Deployment & Change Propagation (UI portion)
- [ ] `[3.9-1-ui]` Template changes not automatically propagated; staleness indicators show which instances are out of date
- [ ] `[3.9-2-ui]` Two views: deployed vs. template-derived; UI enables comparison
- [ ] `[3.9-3-ui]` Deployment at individual instance level; UI provides per-instance deploy action
- [ ] `[3.9-4-ui]` Show differences between deployed and template-derived config (diff view)
- [ ] `[3.9-5-ui]` No rollback; UI does not offer rollback action

### Section 5.4: Parked Message Management (UI portion)
- [ ] `[5.4-1-ui]` Parked messages stored at site; UI queries sites remotely
- [ ] `[5.4-2-ui]` Central UI can query sites for parked messages (query UI)
- [ ] `[5.4-3-ui]` Operators can retry or discard parked messages (action buttons)
- [ ] `[5.4-4-ui]` Covers external system calls, notifications, and cached database writes; all three categories shown

### Section 8: Central UI (deployment + troubleshooting workflows, Phase 6 owns)
- [ ] `[8-deploy-1]` Deployment: View diffs between deployed and current template-derived configurations
- [ ] `[8-deploy-2]` Deployment: Deploy updates to individual instances
- [ ] `[8-deploy-3]` Deployment: Filter instances by area
- [ ] `[8-deploy-4]` Deployment: Pre-deployment validation runs automatically; errors block deployment
- [ ] `[8-deploy-5]` System-Wide Artifact Deployment: explicitly deploy shared scripts, external system definitions, DB connection definitions, notification lists to all sites
- [ ] `[8-deploy-6]` Deployment Status Monitoring: Track deployment success/failure at site level
- [ ] `[8-deploy-7]` Parked Message Management: Query sites, view details, retry or discard
- [ ] `[8-deploy-8]` Site Event Log Viewer: Query and view operational event logs from sites

### Section 8.1: Debug View
- [ ] `[8.1-1]` Subscribe-on-demand: central subscribes to site-wide Akka stream filtered by instance
- [ ] `[8.1-2]` Site provides initial snapshot of all current attribute values and alarm states
- [ ] `[8.1-3]` Attribute value stream: `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp
- [ ] `[8.1-4]` Alarm state stream: `[InstanceUniqueName].[AlarmName]`, state (active/normal), priority, timestamp
- [ ] `[8.1-5]` Stream continues until engineer closes debug view; central then unsubscribes
- [ ] `[8.1-6]` No attribute/alarm selection; always shows all for the instance
- [ ] `[8.1-7]` No special concurrency limits required

### Section 10.1–10.3: Audit Log (UI portion)
- [ ] `[10.1-ui]` Audit logs stored in config DB; UI queries config DB
- [ ] `[10.2-ui]` All system-modifying actions logged; viewer covers all categories
- [ ] `[10.3-ui]` Each entry: who, what (action, entity type, entity ID, entity name), when, state (JSON after-state); UI displays all fields
- [ ] `[10.3-2-ui]` Change history reconstructed by comparing consecutive entries; UI shows before/after by comparing entries

### Section 12.3: Central Access to Event Logs
- [ ] `[12.3-1]` Central UI can query site event logs remotely via Communication Layer
- [ ] `[12.3-2]` Queries support filtering by event type, time range, instance, severity, keyword search
- [ ] `[12.3-3]` Results are paginated (default 500 per page) with continuation token

---
## Design Constraints Checklist

| ID | Constraint | Source | Mapped WP |
|----|-----------|--------|-----------|
| KDD-ui-1 | Blazor Server (ASP.NET Core + SignalR), Bootstrap, clean corporate design | CLAUDE.md | All WPs |
| KDD-ui-2 | Real-time push for debug view, health dashboard, deployment status | CLAUDE.md | WP-4, WP-6 |
| KDD-deploy-5 | Flattened configs include revision hash for staleness detection | CLAUDE.md | WP-1 |
| KDD-deploy-9 | System-wide artifact version skew across sites supported | CLAUDE.md | WP-5 |
| KDD-deploy-11 | Optimistic concurrency on deployment status records | CLAUDE.md | WP-4 |
| CD-DM-1 | Diff shows added/removed/changed attributes, alarms, scripts, connection binding changes | Component-DeploymentManager | WP-2 |
| CD-DM-2 | Per-site result matrix for system-wide artifact deployment; successful sites not rolled back | Component-DeploymentManager | WP-5 |
| CD-DM-3 | Retry failed sites individually after system-wide artifact deployment | Component-DeploymentManager | WP-5 |
| CD-DM-4 | Central UI indicates which sites have pending artifact updates | Component-DeploymentManager | WP-5 |
| CD-COMM-1 | Debug streams lost on failover; must be re-opened by user | Component-Communication | WP-6 |
| CD-COMM-2 | Debug view: subscribe → snapshot → stream → unsubscribe pattern | Component-Communication | WP-6 |
| CD-SEL-1 | Event log queries paginated with continuation token (500/page default) | Component-SiteEventLogging | WP-7 |
| CD-SEL-2 | Keyword search on message and source fields (SQLite LIKE) | Component-SiteEventLogging | WP-7 |
| CD-SEL-3 | Event log filters: event type, time range, instance ID, severity | Component-SiteEventLogging | WP-7 |
| CD-SF-1 | Parked message details: target, payload, retry count, timestamps | Component-StoreAndForward | WP-8 |
| CD-AUD-1 | Audit log filter: user, entity type, action type, time range | Component-CentralUI | WP-9 |
| CD-AUD-2 | Before/after state by comparing consecutive entries | Component-CentralUI | WP-9 |

---
## Work Packages

### WP-1: Staleness Indicators (S)

**Description**: Show which instances have out-of-date deployed configurations by comparing revision hashes.

**Acceptance Criteria**:
- Instance list shows staleness indicator (e.g., icon/badge) when deployed revision hash differs from current template-derived revision hash (`[3.9-1-ui]`, `KDD-deploy-5`)
- Two views accessible: deployed configuration and template-derived configuration (`[3.9-2-ui]`)
- Staleness detection does not require a full diff; it uses revision hash comparison only (`KDD-deploy-5`)
- Filter/sort by staleness state

**Complexity**: S
**Traces**: `[3.9-1-ui]`, `[3.9-2-ui]`, `[3.9-5-ui]`, KDD-deploy-5

---
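The hash-only staleness check described in WP-1 can be illustrated with a small sketch. It is written in Python purely for illustration and is not the Blazor implementation; `InstanceRow`, its fields, and `is_stale` are hypothetical names, and treating a never-deployed instance as "not stale" is an assumption the plan does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceRow:
    """Hypothetical row backing the instance list."""
    name: str
    deployed_hash: Optional[str]  # None if never deployed (assumption)
    current_hash: str             # hash of the current template-derived config


def is_stale(row: InstanceRow) -> bool:
    """Stale when a deployed revision exists and differs from the current one.

    Only the two hashes are compared; no diff is computed.
    """
    return row.deployed_hash is not None and row.deployed_hash != row.current_hash


rows = [
    InstanceRow("Pump-01", "a1f3", "a1f3"),
    InstanceRow("Pump-02", "a1f3", "9c0e"),
    InstanceRow("Pump-03", None, "9c0e"),
]
stale_names = [r.name for r in rows if is_stale(r)]
```

Because the check is a plain equality test, the same predicate can drive both the badge and the filter/sort criterion.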
### WP-2: Diff View (M)

**Description**: Display differences between the deployed configuration and the current template-derived configuration.

**Acceptance Criteria**:
- Diff view shows added, removed, and changed members (attributes, alarms, scripts) (`[3.9-4-ui]`, `[8-deploy-1]`, `CD-DM-1`)
- Connection binding changes shown in diff (`CD-DM-1`)
- Clear visual distinction between additions (new members), removals, and modifications
- Diff calculated on demand when user views it

**Complexity**: M
**Traces**: `[3.9-4-ui]`, `[8-deploy-1]`, CD-DM-1

---
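The added/removed/changed classification in WP-2 can be sketched as a comparison of two name-keyed maps. This is an illustrative Python sketch only: the actual diff calculation belongs to the Phase 2 Template Engine, and `diff_members` and the dictionary shape of a flattened configuration are assumptions made for the example.

```python
from typing import Any, Dict, List


def diff_members(deployed: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, List[str]]:
    """Classify member names as added, removed, or changed."""
    added = sorted(current.keys() - deployed.keys())
    removed = sorted(deployed.keys() - current.keys())
    changed = sorted(
        name for name in deployed.keys() & current.keys()
        if deployed[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical flattened members keyed by name.
deployed = {"Speed": {"type": "float"}, "HighTemp": {"limit": 80}}
current = {"Speed": {"type": "float"}, "HighTemp": {"limit": 85}, "Flow": {"type": "float"}}
result = diff_members(deployed, current)
```

The three result lists map directly onto the three visual treatments the acceptance criteria ask for.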
### WP-3: Deploy with Pre-Validation Gating (M)

**Description**: Deploy action on individual instances that automatically runs pre-deployment validation and blocks on errors.

**Acceptance Criteria**:
- Deploy action available per instance (`[3.9-3-ui]`, `[8-deploy-2]`)
- Pre-deployment validation runs automatically before deployment is sent (`[1.4-4-ui]`, `[8-deploy-4]`)
- Validation errors displayed clearly and block the deployment
- Filter instances by site, area, template (`[8-deploy-3]`)
- Site applies config immediately; no confirmation step shown in UI for site side (`[1.4-1-ui]`)
- No rollback action offered (`[3.9-5-ui]`)
- Deployment role required

**Complexity**: M
**Traces**: `[1.4-1-ui]`, `[1.4-4-ui]`, `[3.9-3-ui]`, `[3.9-5-ui]`, `[8-deploy-2]`, `[8-deploy-3]`, `[8-deploy-4]`

---
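The gating rule in WP-3 (validate first, send only when there are no errors) can be sketched as below. This is a minimal Python illustration under stated assumptions: `deploy_with_gate`, `DeployOutcome`, and the `validate` / `send` callables are hypothetical stand-ins for the Template Engine validation and the Deployment Manager call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DeployOutcome:
    sent: bool
    errors: List[str] = field(default_factory=list)


def deploy_with_gate(instance: str,
                     validate: Callable[[str], List[str]],
                     send: Callable[[str], None]) -> DeployOutcome:
    """Run validation; send the deployment only if no errors were found."""
    errors = validate(instance)
    if errors:
        # Blocked: the errors are returned for display, nothing is sent.
        return DeployOutcome(sent=False, errors=errors)
    send(instance)
    return DeployOutcome(sent=True)


sent_log: List[str] = []
blocked = deploy_with_gate("Pump-01", lambda _: ["Unbound connection: PLC1"], sent_log.append)
ok = deploy_with_gate("Pump-02", lambda _: [], sent_log.append)
```

Keeping the gate in one function makes the negative test ("deploy with validation failure is blocked") a simple assertion that `send` was never called.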
### WP-4: Deployment Status Tracking (Live SignalR) (M)

**Description**: Real-time deployment status updates pushed to the UI via SignalR.

**Acceptance Criteria**:
- Deployment status (pending, in-progress, success, failed) updates in real-time via SignalR push (`KDD-ui-2`, `[8-deploy-6]`)
- Site reports success/failure; UI reflects result (`[1.4-3-ui]`)
- Optimistic concurrency on status records handled gracefully (`KDD-deploy-11`)
- Status shown per instance with timestamp
- No manual refresh required

**Complexity**: M
**Traces**: `[1.4-3-ui]`, `[8-deploy-6]`, KDD-ui-2, KDD-deploy-11

---
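WP-4 asks for optimistic concurrency on status records to be "handled gracefully". One common reading is a version check on write followed by a re-read on conflict, sketched here in Python. The `StatusStore` class, its in-memory storage, and the integer `version` column are assumptions for illustration; the plan does not specify how the concurrency token is represented.

```python
from dataclasses import dataclass
from typing import Dict


class ConcurrencyConflict(Exception):
    """Raised when the record changed since it was read."""


@dataclass
class StatusRecord:
    status: str
    version: int


class StatusStore:
    """In-memory stand-in for the deployment status table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, StatusRecord] = {}

    def read(self, instance: str) -> StatusRecord:
        row = self._rows.setdefault(instance, StatusRecord("pending", 0))
        return StatusRecord(row.status, row.version)  # hand out a copy

    def update(self, instance: str, new_status: str, expected_version: int) -> None:
        row = self._rows[instance]
        if row.version != expected_version:
            raise ConcurrencyConflict(instance)
        self._rows[instance] = StatusRecord(new_status, row.version + 1)


store = StatusStore()
seen = store.read("Pump-01")
store.update("Pump-01", "in-progress", seen.version)  # version 0 -> 1
try:
    store.update("Pump-01", "failed", seen.version)   # stale version 0
    conflict = False
except ConcurrencyConflict:
    conflict = True
    latest = store.read("Pump-01")                     # graceful path: re-read
```

In the UI, the graceful path would typically mean refreshing the displayed status from the re-read record instead of surfacing an error.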
### WP-5: System-Wide Artifact Deployment with Per-Site Status Matrix (L)

**Description**: UI for deploying shared scripts, external system definitions, DB connection definitions, and notification lists to all sites.

**Acceptance Criteria**:
- Separate "Deploy Artifacts" action, not automatically triggered when definitions change (`[1.5-1-ui]`, `[8-deploy-5]`)
- Deployment role required (`[1.5-2-ui]`)
- Design role manages definitions; Deployment role triggers deployment (clear separation) (`[1.5-3-ui]`)
- Per-site status matrix showing success/failure for each site (`CD-DM-2`)
- Successful sites not rolled back if others fail (`CD-DM-2`)
- Individual site retry for failed sites (`CD-DM-3`)
- UI indicates which sites have pending artifact updates (`CD-DM-4`)
- Cross-site version skew supported; display shows version status per site (`KDD-deploy-9`)

**Complexity**: L
**Traces**: `[1.5-1-ui]`–`[1.5-3-ui]`, `[8-deploy-5]`, KDD-deploy-9, CD-DM-2, CD-DM-3, CD-DM-4

---
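The per-site matrix and retry behavior in WP-5 can be sketched as a map from site to result, where a retry touches only the failed entries. This Python sketch is illustrative; `deploy_artifacts`, `retry_failed`, and the `deploy_to_site` callable are hypothetical names standing in for the backend call.

```python
from typing import Callable, Dict, Iterable


def deploy_artifacts(sites: Iterable[str],
                     deploy_to_site: Callable[[str], bool]) -> Dict[str, str]:
    """Attempt every site independently and record a per-site result."""
    return {site: ("success" if deploy_to_site(site) else "failed") for site in sites}


def retry_failed(matrix: Dict[str, str],
                 deploy_to_site: Callable[[str], bool]) -> Dict[str, str]:
    """Retry only failed sites; successful sites are left untouched."""
    updated = dict(matrix)
    for site, status in matrix.items():
        if status == "failed":
            updated[site] = "success" if deploy_to_site(site) else "failed"
    return updated


unreachable = {"site-b"}
matrix = deploy_artifacts(["site-a", "site-b", "site-c"], lambda s: s not in unreachable)
unreachable.clear()  # site-b becomes reachable again
after_retry = retry_failed(matrix, lambda s: s not in unreachable)
```

Between the first attempt and the retry, `site-b` is on an older artifact version than the other sites, which is the version skew that `KDD-deploy-9` says must be supported.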
### WP-6: Debug View (L)

**Description**: On-demand real-time view of a specific instance's attribute values and alarm states streamed via SignalR.

**Acceptance Criteria**:
- Select a deployed instance and open debug view (`[8.1-1]`)
- Initial snapshot of all current attribute values and alarm states received from site (`[8.1-2]`)
- Attribute value stream formatted as `[InstanceUniqueName].[AttributePath].[AttributeName]`, value, quality, timestamp (`[8.1-3]`)
- Alarm state stream formatted as `[InstanceUniqueName].[AlarmName]`, state, priority, timestamp (`[8.1-4]`)
- Live updates pushed via SignalR, no polling (`KDD-ui-2`)
- Stream continues until user closes the debug view; central unsubscribes on close (`[8.1-5]`)
- All attributes and alarms shown, with no selection filtering (`[8.1-6]`)
- No concurrency limits enforced (`[8.1-7]`)
- On failover, debug stream is lost; user must re-open (`CD-COMM-1`)
- Subscribe → snapshot → stream → unsubscribe lifecycle (`CD-COMM-2`)
- Deployment role required

**Complexity**: L
**Traces**: `[8.1-1]`–`[8.1-7]`, KDD-ui-2, CD-COMM-1, CD-COMM-2

---
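The subscribe → snapshot → stream → unsubscribe lifecycle in WP-6 can be sketched with an in-memory stand-in for the site stream. This Python sketch does not use any Akka or SignalR API: `FakeSiteStream` and `DebugView` are hypothetical, the single handler per instance is a simplification, and updates are reduced to a path and a value (quality, priority, and timestamp are omitted).

```python
from typing import Callable, Dict, Tuple

Update = Tuple[str, object]  # (dotted path, value); hypothetical message shape


class FakeSiteStream:
    """Stand-in for the site-wide stream; not a real Akka or SignalR API."""

    def __init__(self, current: Dict[str, object]) -> None:
        self.current = dict(current)
        self.subscribers: Dict[str, Callable[[Update], None]] = {}

    def subscribe(self, instance: str,
                  on_update: Callable[[Update], None]) -> Dict[str, object]:
        """Register the handler and return the snapshot for that instance."""
        self.subscribers[instance] = on_update
        prefix = instance + "."
        return {k: v for k, v in self.current.items() if k.startswith(prefix)}

    def unsubscribe(self, instance: str) -> None:
        self.subscribers.pop(instance, None)

    def publish(self, path: str, value: object) -> None:
        self.current[path] = value
        handler = self.subscribers.get(path.split(".", 1)[0])
        if handler is not None:
            handler((path, value))


class DebugView:
    def __init__(self, site: FakeSiteStream, instance: str) -> None:
        self.site, self.instance = site, instance
        self.values: Dict[str, object] = {}

    def open(self) -> None:
        # Subscribe, then show the snapshot; later updates overwrite it.
        self.values = self.site.subscribe(self.instance, self._apply)

    def _apply(self, update: Update) -> None:
        path, value = update
        self.values[path] = value

    def close(self) -> None:
        self.site.unsubscribe(self.instance)


site = FakeSiteStream({"Pump-01.Motor.Speed": 10, "Pump-02.Motor.Speed": 5})
view = DebugView(site, "Pump-01")
view.open()
site.publish("Pump-01.Motor.Speed", 12)  # delivered to the open view
site.publish("Pump-02.Motor.Speed", 6)   # other instance, filtered out
view.close()
site.publish("Pump-01.Motor.Speed", 13)  # no longer delivered
```

A failover would correspond to the subscription disappearing without `close()` being called; per `CD-COMM-1` the view is not expected to recover on its own, so the user re-opens it.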
### WP-7: Site Event Log Viewer (M)

**Description**: UI for querying and viewing operational event logs from site clusters remotely.

**Acceptance Criteria**:
- Remote query to sites via Communication Layer (`[12.3-1]`, `[8-deploy-8]`)
- Filter by event type/category, time range, instance ID, severity (`CD-SEL-3`, `[12.3-2]`)
- Keyword search on message and source fields (`CD-SEL-2`, `[12.3-2]`)
- Paginated results with continuation token support (default 500/page) (`CD-SEL-1`, `[12.3-3]`)
- Display all event categories: script executions (start, complete, error), alarm events (activated, cleared, evaluation errors), deployment events (received, compiled, applied, failed), connection status changes, S&F activity (queued, delivered, retried, parked), instance lifecycle (enable, disable, delete)
- Deployment role required

**Complexity**: M
**Traces**: `[12.3-1]`–`[12.3-3]`, `[8-deploy-8]`, CD-SEL-1, CD-SEL-2, CD-SEL-3

---
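Continuation-token paging with a keyword filter, as required by WP-7, can be sketched as follows. This Python sketch makes several assumptions: the token is a plain offset (a real token would more likely be opaque), the case-insensitive substring match only approximates SQLite `LIKE`, and `query_page` is a hypothetical name. The page size is shrunk from the default 500 to keep the example small.

```python
from typing import Dict, List, Optional, Tuple

Event = Dict[str, str]


def query_page(events: List[Event], keyword: str, page_size: int = 500,
               token: Optional[int] = None) -> Tuple[List[Event], Optional[int]]:
    """Return one page of matches plus a token for the next page (None at the end)."""
    start = token or 0
    needle = keyword.lower()
    matches = [e for e in events
               if needle in e["message"].lower() or needle in e["source"].lower()]
    page = matches[start:start + page_size]
    next_token = start + page_size if start + page_size < len(matches) else None
    return page, next_token


events: List[Event] = [{"source": "ScriptEngine", "message": f"script error {i}"}
                       for i in range(5)]
events.append({"source": "Deployment", "message": "config applied"})

collected: List[Event] = []
token: Optional[int] = None
pages = 0
while True:
    page, token = query_page(events, "error", page_size=2, token=token)
    collected.extend(page)
    pages += 1
    if token is None:
        break
```

The caller never computes offsets itself; it only passes back whatever token the previous response returned, which is what allows the token format to change later.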
### WP-8: Parked Message Management (M)

**Description**: UI for querying, viewing, retrying, and discarding parked messages at sites.

**Acceptance Criteria**:
- Query sites for parked messages remotely (`[5.4-1-ui]`, `[5.4-2-ui]`, `[8-deploy-7]`)
- View message details: target, payload, retry count, timestamps (`CD-SF-1`)
- All three message categories shown: external system calls, notifications, cached database writes (`[5.4-4-ui]`)
- Retry action moves message back to retry queue (`[5.4-3-ui]`)
- Discard action removes message permanently (`[5.4-3-ui]`)
- Deployment role required

**Complexity**: M
**Traces**: `[5.4-1-ui]`–`[5.4-4-ui]`, `[8-deploy-7]`, CD-SF-1

---
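The retry and discard semantics in WP-8 can be sketched with two in-memory collections. This Python sketch is illustrative only: the real queues live in the site's Store-and-Forward Engine, and `SiteQueues`, `ParkedMessage`, and the category labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParkedMessage:
    message_id: str
    category: str     # "external-call", "notification", "db-write" (assumed labels)
    target: str
    retry_count: int


class SiteQueues:
    """In-memory stand-in for a site's parked store and retry queue."""

    def __init__(self, parked: List[ParkedMessage]) -> None:
        self.parked: Dict[str, ParkedMessage] = {m.message_id: m for m in parked}
        self.retry_queue: List[ParkedMessage] = []

    def retry(self, message_id: str) -> None:
        """Move a parked message back to the retry queue."""
        self.retry_queue.append(self.parked.pop(message_id))

    def discard(self, message_id: str) -> None:
        """Remove a parked message permanently."""
        del self.parked[message_id]


queues = SiteQueues([
    ParkedMessage("m1", "external-call", "ERP", 5),
    ParkedMessage("m2", "notification", "ops-list", 3),
    ParkedMessage("m3", "db-write", "HistorianDb", 8),
])
queues.retry("m1")
queues.discard("m2")
remaining = sorted(queues.parked)
```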
### WP-9: Audit Log Viewer (M)

**Description**: UI for querying the central audit log with filters.

**Acceptance Criteria**:
- Query audit log from configuration database (`[10.1-ui]`)
- All system-modifying action categories visible (`[10.2-ui]`)
- Each entry displays: who (user), what (action, entity type, entity ID, entity name), when (timestamp), state (JSON after-state) (`[10.3-ui]`)
- Filter by user, entity type, action type, time range (`CD-AUD-1`)
- Before/after state comparison by viewing consecutive entries for the same entity (`[10.3-2-ui]`, `CD-AUD-2`)
- Admin role required

**Complexity**: M
**Traces**: `[10.1-ui]`–`[10.3-2-ui]`, CD-AUD-1, CD-AUD-2

---
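Because each audit entry stores only the after-state, the "before" side in WP-9 has to come from the previous entry for the same entity. A minimal Python sketch of that lookup is shown below; `before_after` and the entry field names (`entity_type`, `entity_id`, `state`) are assumptions, and entries are assumed to be sorted oldest first.

```python
import json
from typing import List, Optional, Tuple


def before_after(entries: List[dict], index: int) -> Tuple[Optional[dict], dict]:
    """Return (before, after) for entries[index].

    'before' is the after-state of the closest earlier entry for the same
    entity, or None when this is the first entry for that entity.
    """
    entry = entries[index]
    key = (entry["entity_type"], entry["entity_id"])
    before = None
    for earlier in reversed(entries[:index]):
        if (earlier["entity_type"], earlier["entity_id"]) == key:
            before = json.loads(earlier["state"])
            break
    return before, json.loads(entry["state"])


entries = [
    {"entity_type": "Template", "entity_id": 7, "state": '{"name": "Pump", "version": 1}'},
    {"entity_type": "Site", "entity_id": 2, "state": '{"name": "North"}'},
    {"entity_type": "Template", "entity_id": 7, "state": '{"name": "Pump", "version": 2}'},
]
before, after = before_after(entries, 2)
first_before, _ = before_after(entries, 0)
```

Note that the unrelated `Site` entry between the two `Template` entries is skipped: "consecutive" here means consecutive for the same entity, not adjacent rows in the log.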
## Test Strategy

### Unit Tests
- Staleness indicator rendering based on revision hash comparison
- Diff view component rendering for added/removed/changed members
- Deployment status SignalR update handling
- Debug view snapshot rendering and stream update handling
- Event log filter building and pagination logic
- Parked message action button state logic
- Audit log filter building and entry rendering

### Integration Tests
- Deploy workflow: view diff → validate → deploy → track status via SignalR → verify success
- Deploy with validation failure → verify deployment blocked
- System-wide artifact deployment → verify per-site status matrix → retry failed site
- Debug view: open → receive snapshot → receive stream updates → close → verify unsubscribe
- Event log viewer: query with filters → paginate → verify results match
- Parked message: query → retry → verify message moves back to queue; query → discard → verify removed
- Audit log: query with filters → verify entries displayed with correct detail

### Negative Tests
- Attempt deploy on instance with validation errors → verify blocked
- No rollback action exists in UI → verify absent
- Non-Deployment user attempts deploy → verify access denied
- Non-Admin user attempts audit log viewer → verify access denied
- Debug view during failover → verify stream lost, user must re-open
- Query event log on unreachable site → verify graceful error

---
## Verification Gate

Phase 6 is complete when:
1. All 9 work packages pass acceptance criteria
2. All unit and integration tests pass
3. All negative tests verify prohibited behaviors
4. A Deployment user can perform a full operational loop: view stale instances → view diff → deploy → track live status → open debug view → view event logs → manage parked messages
5. An Admin user can query the audit log with filters and view change details
6. Real-time features (deployment status, debug view) work via SignalR without polling
7. System-wide artifact deployment shows per-site status matrix with retry capability

---
## Open Questions

No new questions discovered during Phase 6 plan generation.

---
## Split-Section Verification

| Section | Phase 6 Bullets | Other Phase(s) | Other Phase Bullets |
|---------|----------------|-----------------|---------------------|
| 1.4 | `[1.4-1-ui]`, `[1.4-3-ui]`, `[1.4-4-ui]` (UI) | Phase 3C | `[1.4-1]`–`[1.4-4]` backend pipeline |
| 1.5 | `[1.5-1-ui]`–`[1.5-3-ui]` (UI) | Phase 3C | Backend artifact deployment |
| 3.9 | `[3.9-1-ui]`–`[3.9-5-ui]` (UI) | Phase 3C, Phase 5 | Phase 3C: backend pipeline, status persistence; Phase 5: `[3.9-6]` |
| 5.4 | `[5.4-1-ui]`–`[5.4-4-ui]` (UI) | Phase 3C | Backend parked message storage and management |
| 8 | `[8-deploy-1]`–`[8-deploy-8]` | Phase 4, 5 | Admin/operator, design workflows |
| 8.1 | `[8.1-1]`–`[8.1-7]` (all) | None | No split (Phase 6 owns entirely) |
| 10.1–10.3 | `[10.1-ui]`–`[10.3-2-ui]` (viewer UI) | Phase 1 | Backend storage, IAuditService, transactional guarantee |
| 12.3 | `[12.3-1]`–`[12.3-3]` (all) | None | No split (Phase 6 owns entirely) |

---
## Orphan Check Result

**Forward check**: Every Requirements Checklist item and Design Constraints Checklist item maps to at least one work package with acceptance criteria that would fail if the requirement were not implemented. PASS.

**Reverse check**: Every work package traces back to at least one requirement or design constraint. No untraceable work. PASS.

**Split-section check**: All split sections verified above. Phase 6 covers UI presentation for deployment/operations workflows. Backend functionality is in Phase 3C (deployment pipeline, S&F) and Phase 1 (audit service). No unassigned bullets found. PASS.

**Negative requirement check**: The following negative requirements have explicit acceptance criteria:
- `[3.9-5-ui]` No rollback: verified in WP-3 (no rollback action offered)
- `[1.5-1-ui]` Not automatically propagated: verified in WP-5 (separate action required)
- `[8.1-7]` No concurrency limits: verified in WP-6

PASS.

**Codex MCP verification**: Skipped; external tool verification deferred.