# Implementation Plan Generation Guide
**Date**: 2026-03-16
**Purpose**: Master plan for generating detailed implementation plans for all ScadaLink components across phased delivery.
---
## Overview
This document defines the phased implementation strategy for the ScadaLink SCADA system. It is **not** the implementation plan itself — it is the plan for **generating** implementation plans. Each phase below will produce one or more detailed implementation plan documents in `docs/plans/`.
### Guiding Principles
1. **Each phase produces a testable, working increment** — no phase ends with unverifiable work.
2. **Dependencies are respected** — no component is built before its dependencies.
3. **Requirements traceability at bullet level** — every individual requirement (each bullet point, sub-bullet, and constraint) in docs/requirements/HighLevelReqs.md must map to at least one work package. Section-level mapping is insufficient — a section like "4.4 Script Capabilities" contains ~8 distinct requirements that may land in different phases. See `docs/plans/requirements-traceability.md` for the matrix.
4. **Design decision traceability** — the Key Design Decisions in CLAUDE.md and detailed design in docs/requirements/Component-*.md documents contain implementation constraints not present in docs/requirements/HighLevelReqs.md (e.g., Become/Stash pattern, staggered startup, Tell vs Ask conventions, forbidden script APIs). Each must trace to a work package.
5. **Split-section completeness** — when a HighLevelReqs section spans multiple phases, each phase's plan must explicitly list which bullets from that section it covers. The union across all phases must be the complete section with no gaps.
6. **Questions are tracked, not blocking** — any ambiguity discovered during plan generation is logged in `docs/plans/questions.md` and generation continues. Do not stop or wait for user input during plan generation.
7. **Codex MCP is best-effort** — if the Codex MCP tool is unavailable or errors during verification, note the skip in the plan document and continue. Do not block on external tool availability.
8. **Plans are broken into implementable work packages** — each phase is subdivided into epics, each epic into concrete tasks with acceptance criteria.
9. **Failover and resilience are validated early** — not deferred to a final hardening phase. Each runtime phase includes failover acceptance criteria.
10. **Persistence/recovery semantics are defined before actor design** — Akka.NET actor protocols depend on recovery behavior.
---
## Phase Structure
### Phase 0: Solution Skeleton & Delivery Guardrails
**Goal**: Establish a buildable, testable baseline before any domain work.
**Components**:
- Solution structure (all 17 component projects + test projects)
- Commons (REQ-COM-5b namespace/folder skeleton, REQ-COM-1 shared types, REQ-COM-6 no-business-logic constraint, REQ-COM-7 dependency constraints)
- Host (REQ-HOST-1 single binary, skeleton Program.cs, REQ-HOST-10 extension method convention)
- CI baseline (build, test, format)
**Testable Outcome**: Empty host boots by role from `appsettings.json`. Test pipeline runs. All projects compile with correct references.
**HighLevelReqs Coverage**: 13.1 (UTC timestamps baked into type system)
**Plan Document**: `docs/plans/phase-0-solution-skeleton.md`
**Sub-tasks**:
1. Create .NET solution with project structure matching component architecture
2. Implement Commons type system (REQ-COM-1: enums, Result<T>, RetryPolicy, UTC convention)
3. Implement Commons namespace/folder convention (REQ-COM-5b)
4. Implement Commons entity POCOs (REQ-COM-3) — classes with properties, organized by domain area
5. Implement Commons repository interfaces (REQ-COM-4) — interface signatures
6. Implement Commons cross-cutting interfaces (REQ-COM-4a: IAuditService)
7. Implement Commons message contracts (REQ-COM-5) — record types with versioning rules (REQ-COM-5a)
8. Implement Commons protocol abstraction (REQ-COM-2: IDataConnection interface)
9. Implement Host skeleton (REQ-HOST-1, REQ-HOST-2 role detection, REQ-HOST-10 extension method convention)
10. Implement per-component options classes (REQ-HOST-3 config binding)
11. Set up CI pipeline (build, test, format)
12. Create local dev topology documentation (central + site appsettings files)
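The role-driven boot in the testable outcome implies a configuration shape along these lines. This is an illustrative sketch only: the section and key names (`ScadaLink`, `NodeRole`, `SiteId`) are placeholders, not the final schema, which the per-component options classes in sub-task 10 will define.

```json
{
  "ScadaLink": {
    "NodeRole": "Site",
    "SiteId": "SITE-A"
  },
  "Serilog": {
    "MinimumLevel": { "Default": "Information" }
  }
}
```

A central node would carry `"NodeRole": "Central"` with no `SiteId`, and REQ-HOST-2 role detection selects which component registrations the host runs.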
---
### Phase 1: Central Platform Foundations
**Goal**: Central node can authenticate users, persist data, and host a web shell. Site-to-central trust model is established.
**Components**:
- Configuration Database (schema, DbContext, repos, IAuditService, migrations)
- Security & Auth (LDAP bind, JWT, roles, site scoping)
- Host (REQ-HOST-4 validation, REQ-HOST-4a readiness, REQ-HOST-5 Windows Service, REQ-HOST-6 Akka bootstrap, REQ-HOST-7 ASP.NET, REQ-HOST-8 logging, REQ-HOST-8a dead letters, REQ-HOST-9 shutdown)
- Central UI (Blazor Server shell, login, route protection)
**Testable Outcome**: User logs in via LDAP, receives JWT with correct role claims, sees an empty dashboard. Admin can manage LDAP group mappings. Audit entries persist. Central runs behind a load balancer. Akka.NET actor system boots with cluster configuration.
**HighLevelReqs Coverage**: 9.1–9.4, 10.1–10.4
**Plan Document**: `docs/plans/phase-1-central-foundations.md`
**Sub-tasks**:
1. Configuration Database: EF Core DbContext, Fluent API entity mappings, initial migration
2. Configuration Database: Repository implementations (ISecurityRepository, ICentralUiRepository)
3. Configuration Database: IAuditService with transactional guarantee (same-transaction writes)
4. Configuration Database: Optimistic concurrency on deployment status records
5. Configuration Database: Seed data (initial admin LDAP mapping)
6. Security & Auth: LDAP bind service (LDAPS/StartTLS required)
7. Security & Auth: JWT issuance, 15-min sliding refresh, 30-min idle timeout
8. Security & Auth: Role claim extraction from LDAP groups (Admin, Design, Deployment + site scoping)
9. Security & Auth: Authorization policies with site-scoped Deployment checks
10. Security & Auth: Shared Data Protection keys (config DB or shared config)
11. Host: Full startup validation (REQ-HOST-4)
12. Host: Readiness gating with `/health/ready` endpoint (REQ-HOST-4a)
13. Host: Akka.NET bootstrap — cluster, remoting, persistence (REQ-HOST-6)
14. Host: Serilog structured logging with SiteId/NodeHostname/NodeRole enrichment (REQ-HOST-8)
15. Host: Dead letter monitoring subscription (REQ-HOST-8a)
16. Host: CoordinatedShutdown wiring (REQ-HOST-9)
17. Host: Windows Service support (REQ-HOST-5)
18. Central UI: Blazor Server shell with SignalR
19. Central UI: Login/logout flow with JWT
20. Central UI: Role-aware navigation and route guards
21. Central UI: Failover behavior (SignalR reconnect, JWT survives, shared Data Protection keys)
22. Integration tests: Auth flow, audit logging, startup validation, readiness gating
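The two timeouts in sub-task 7 interact on every request: idle expiry is checked first, then sliding refresh. A minimal sketch of that decision, assuming the server tracks last-activity and token-issue timestamps per session (names and shapes are illustrative, not the final design):

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)     # session ends after 30 min of inactivity
SLIDING_REFRESH = timedelta(minutes=15)  # token is reissued while the user stays active

def session_action(now: datetime, last_activity: datetime, token_issued: datetime) -> str:
    """Decide what to do with a session on an incoming request (illustrative only)."""
    if now - last_activity >= IDLE_TIMEOUT:
        return "expire"   # idle too long: force re-login
    if now - token_issued >= SLIDING_REFRESH:
        return "refresh"  # active user: slide the token forward
    return "accept"

# Active 5 min ago with a 20-min-old token: refresh rather than expire
t0 = datetime(2026, 3, 16, 12, 0)
print(session_action(t0, t0 - timedelta(minutes=5), t0 - timedelta(minutes=20)))  # refresh
```

The net effect: an active user never sees a login prompt, while an abandoned session dies within 30 minutes regardless of token age.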
---
### Phase 2: Core Modeling, Validation & Deployment Contract
**Goal**: Template authoring data model, validation pipeline, and the compiled deployment artifact contract are functional. The output of this phase defines exactly what gets deployed to a site.
**Components**:
- Template Engine (full)
- Configuration Database (ITemplateEngineRepository, IDeploymentManagerRepository stubs)
**Testable Outcome**: Complex template trees can be authored, flattened, diffed, and validated programmatically. Revision hashes generated. The flattened configuration output format (the "deployment package") is stable and versioned. All validation rules enforced including semantic checks.
**HighLevelReqs Coverage**: 3.1–3.11, 4.1, 4.5
**Plan Document**: `docs/plans/phase-2-modeling-validation.md`
**Sub-tasks**:
1. Template CRUD with inheritance relationships
2. Composition (has-a) with recursive nesting
3. Path-qualified canonical naming for composed members
4. Attribute, alarm, script definitions with lock flags
5. Override granularity enforcement per entity type/field
6. Naming collision detection (recursive across composed modules)
7. Graph acyclicity enforcement (inheritance + composition)
8. Flattening algorithm (full resolution chain: Instance → Child → Parent → Composing → Composed)
9. Diff calculation (deployed vs. template-derived)
10. Revision hash generation for flattened output
11. **Deployment package contract**: Define the exact serialization format of a flattened configuration that will be sent to sites and stored in SQLite. This is the stable boundary between Template Engine, Deployment Manager, and Site Runtime.
12. Pre-deployment validation pipeline:
- Flattening success
- Naming collisions
- Script test compilation
- Semantic validation (call targets, arg types, return types, trigger operand types)
- Alarm/script trigger reference existence
- Data connection binding completeness
13. On-demand validation (same pipeline, no deployment trigger)
14. Shared script validation (syntax/structural only)
15. Template deletion constraint enforcement
16. Instance CRUD (create from template, overrides, area assignment, connection binding)
17. Site and data connection management (CRUD)
18. Area management (hierarchical CRUD)
19. Unit tests for flattening, validation, diff, collision detection, acyclicity
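Sub-tasks 8 and 10 can be sketched together: resolution is a first-match walk along the chain, and the revision hash is taken over a canonical serialization of the flattened output. An illustrative Python sketch under simplified assumptions (each level is a flat dict of attribute definitions; the real flattening also handles locks, composition paths, and override granularity):

```python
import hashlib
import json

def resolve(attribute: str, chain: list[dict]) -> object:
    """Walk Instance -> Child -> Parent -> Composing -> Composed; first definition wins."""
    for level in chain:
        if attribute in level:
            return level[attribute]
    raise KeyError(attribute)

def revision_hash(flattened: dict) -> str:
    """Stable hash over a canonical serialization, so identical flattened
    configs hash identically regardless of key insertion order."""
    canonical = json.dumps(flattened, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

chain = [
    {"Setpoint": 75},                  # instance override
    {"Setpoint": 50, "Units": "psi"},  # parent template
    {"AlarmDelay": 5},                 # composed module default
]
flattened = {a: resolve(a, chain) for a in ("Setpoint", "Units", "AlarmDelay")}
print(flattened)  # {'Setpoint': 75, 'Units': 'psi', 'AlarmDelay': 5}
```

The canonical-serialization step is what makes the hash usable for staleness detection (Phase 6 diff/staleness indicators compare exactly these hashes).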
---
### Phase 3A: Runtime Foundation & Persistence Model
**Goal**: Prove the Akka.NET cluster, singleton, and local persistence model work correctly — including failover.
**Components**:
- Cluster Infrastructure (full)
- Host (site-role Akka bootstrap)
- Site Runtime (Deployment Manager singleton skeleton, basic Instance Actor)
- Local SQLite persistence model (deployed config storage, static attribute overrides)
**Testable Outcome**: Two-node site cluster forms. Singleton starts on oldest node. Failover migrates singleton to surviving node. Singleton reads deployed configs from SQLite and recreates Instance Actors. Static attribute overrides persist across restart. `min-nr-of-members=1` verified. CoordinatedShutdown enables fast handover.
**HighLevelReqs Coverage**: 1.2 (failover), 1.1 (site responsibilities — partial)
**Plan Document**: `docs/plans/phase-3a-runtime-foundation.md`
**Sub-tasks**:
1. Cluster Infrastructure: Akka.NET cluster config with keep-oldest SBR, down-if-alone
2. Cluster Infrastructure: Both-as-seed, min-nr-of-members=1
3. Cluster Infrastructure: Failure detection timing (2s heartbeat, 10s threshold, 15s stable-after)
4. Cluster Infrastructure: Graceful shutdown / CoordinatedShutdown for fast singleton handover
5. Host: Site-role Akka bootstrap (generic Host, no Kestrel)
6. Site Runtime: Deployment Manager singleton (cluster singleton registration + proxy)
7. Site Runtime: Startup behavior — read SQLite, staggered Instance Actor creation
8. Site Runtime: Instance Actor skeleton — hold attribute state, publish to stream
9. Site Runtime: Supervision strategies per actor type
10. Site Runtime: Static attribute persistence to SQLite (write + load on startup)
11. Local persistence: SQLite schema for deployed configs and attribute overrides
12. **Failover acceptance tests**:
- Active node crash → singleton migrates to standby
- Graceful shutdown → fast singleton handover
- Both nodes down → first up forms cluster, rebuilds from SQLite
- Static attribute overrides survive failover
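Sub-tasks 1–3 translate into Akka.NET HOCON roughly as follows. An illustrative sketch: the actor system name, hostnames, and port are placeholders, and mapping the "10s threshold" onto `acceptable-heartbeat-pause` is an assumption to confirm against the cluster component design.

```hocon
akka.cluster {
  min-nr-of-members = 1
  seed-nodes = ["akka.tcp://scadalink@site-node-a:4053",
                "akka.tcp://scadalink@site-node-b:4053"]

  failure-detector {
    heartbeat-interval = 2s
    acceptable-heartbeat-pause = 10s   # "10s threshold" mapped here (assumption)
  }

  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
  split-brain-resolver {
    active-strategy = keep-oldest
    stable-after = 15s
    keep-oldest {
      down-if-alone = on
    }
  }
}
```

With `min-nr-of-members = 1` and both nodes as seeds, either node can form the cluster alone after a dual-node outage, which is what the "both nodes down" acceptance test exercises.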
---
### Phase 3B: Site I/O & Observability
**Goal**: Site can connect to equipment, collect data, evaluate scripts and alarms, and report health to central.
**Components**:
- Communication Layer (full)
- Data Connection Layer (IDataConnection, OPC UA adapter, connection actor)
- Site Runtime (full — scripts, alarms, shared scripts, stream)
- Health Monitoring (site-side collection + central-side aggregation)
- Site Event Logging (event recording, retention, remote query)
**Testable Outcome**: Site connects to OPC UA server, subscribes to tags, delivers values to Instance Actors. Scripts execute on triggers. Alarms evaluate and publish state. Health reports flow to central with sequence numbers. Event logs record and are queryable from central. Debug view streams live data.
**HighLevelReqs Coverage**: 2.2–2.5, 3.4.1, 4.2–4.4, 4.4.1, 4.5, 4.6, 8.1, 11.1–11.2, 12.1–12.3
**Plan Document**: `docs/plans/phase-3b-site-io-observability.md`
**Sub-tasks**:
1. Communication Layer: Message contracts with correlation IDs
2. Communication Layer: Per-pattern timeout configuration (120s deploy, 30s queries)
3. Communication Layer: Transport heartbeat config (2s interval, 10s threshold)
4. Communication Layer: All 8 message patterns implementation
5. Communication Layer: Application-level correlation for idempotency
6. Data Connection Layer: Connection actor with Become/Stash lifecycle
7. Data Connection Layer: OPC UA adapter implementing IDataConnection
8. Data Connection Layer: Auto-reconnect (fixed interval), immediate bad quality on disconnect
9. Data Connection Layer: Transparent re-subscribe on reconnection
10. Data Connection Layer: Write-back with synchronous error to script
11. Data Connection Layer: Tag path resolution with periodic retry
12. Data Connection Layer: Health reporting (connection status + tag resolution counts)
13. Site Runtime: Script Actor + Script Execution Actor (triggers, concurrent execution, dedicated dispatcher)
14. Site Runtime: Alarm Actor + Alarm Execution Actor (condition evaluation, state management)
15. Site Runtime: Shared script library (inline execution)
16. Site Runtime: Script Runtime API (GetAttribute, SetAttribute, CallScript, CallShared)
17. Site Runtime: Script trust model (forbidden APIs, execution timeout, constrained compilation)
18. Site Runtime: Recursion limit enforcement
19. Site Runtime: Tell vs Ask conventions (Tell for hot path, Ask for CallScript)
20. Site Runtime: Site-wide Akka stream with per-subscriber backpressure
21. Site Runtime: Concurrency serialization (Instance Actor serializes mutations)
22. Health Monitoring: Site-side metric collection (all metrics)
23. Health Monitoring: Periodic reporting with monotonic sequence numbers
24. Health Monitoring: Central-side aggregation, offline detection (60s threshold)
25. Health Monitoring: Dead letter count as metric
26. Site Event Logging: Event recording to SQLite
27. Site Event Logging: 30-day retention with daily purge + 1GB storage cap
28. Site Event Logging: Remote query with pagination (500/page) and keyword search
29. **Failover acceptance tests**: DCL reconnection after failover, health report continuity, stream recovery
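Sub-tasks 23–24 combine into a simple central-side rule: reject reports whose sequence number does not advance (duplicates or replays after failover), and declare a site offline once no accepted report arrives within 60s. A minimal sketch with assumed shapes (real reports carry full metric payloads):

```python
class SiteHealthTracker:
    """Tracks the last accepted health report per site (illustrative sketch)."""

    OFFLINE_AFTER = 60.0  # seconds without an accepted report

    def __init__(self):
        self._last = {}  # site_id -> (seq, received_at)

    def accept(self, site_id: str, seq: int, received_at: float) -> bool:
        prev = self._last.get(site_id)
        if prev and seq <= prev[0]:
            return False  # stale or duplicate: monotonic sequence numbers reject it
        self._last[site_id] = (seq, received_at)
        return True

    def is_offline(self, site_id: str, now: float) -> bool:
        prev = self._last.get(site_id)
        return prev is None or now - prev[1] > self.OFFLINE_AFTER
```

The monotonic sequence number is what keeps the failover acceptance test honest: reports replayed by a newly promoted standby cannot roll the central view backwards.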
---
### Phase 3C: Deployment Pipeline & Store-and-Forward
**Goal**: Complete the deploy-to-site pipeline end-to-end with resilience.
**Components**:
- Deployment Manager (full)
- Store-and-Forward Engine (full)
**Testable Outcome**: Central validates, flattens, and deploys an instance to a site. Site compiles scripts, creates actors, reports success. Deployment ID ensures idempotency. Per-instance operation lock works. Instance lifecycle commands (disable/enable/delete) work. Store-and-forward buffers messages on transient failure, retries, parks. Async replication to standby. Parked messages queryable from central.
**HighLevelReqs Coverage**: 1.3, 1.4, 1.5, 3.8.1, 3.9, 5.3, 5.4, 6.4
**Plan Document**: `docs/plans/phase-3c-deployment-store-forward.md`
**Sub-tasks**:
1. Deployment Manager: Deployment flow (validate → flatten → send → track)
2. Deployment Manager: Deployment ID + revision hash for idempotency
3. Deployment Manager: Site-side state query after timeout/failover (reconciliation)
4. Deployment Manager: Per-instance operation lock (all mutating commands)
5. Deployment Manager: State transition matrix (enabled/disabled/not-deployed × operations)
6. Deployment Manager: Site-side apply atomicity (all-or-nothing per instance)
7. Deployment Manager: Instance lifecycle commands (disable, enable, delete)
8. Deployment Manager: System-wide artifact deployment with per-site status
9. Deployment Manager: Artifact version compatibility (cross-site skew supported)
10. Deployment Manager: Deployment status persistence (current only + audit log)
11. Store-and-Forward Engine: Message buffering with SQLite persistence
12. Store-and-Forward Engine: Fixed-interval retry per source entity
13. Store-and-Forward Engine: Transient-only buffering (5xx/connection errors)
14. Store-and-Forward Engine: Parking after max retries
15. Store-and-Forward Engine: Async best-effort replication to standby
16. Store-and-Forward Engine: Parked message management (query, retry, discard from central)
17. Store-and-Forward Engine: Messages survive instance deletion
18. **End-to-end acceptance tests**: Deploy → run → failover → redeploy → lifecycle commands
19. **Resilience tests**: Mid-deploy failover, timeout + reconciliation, S&F buffer takeover on failover
---
### Phase 4: Minimal Operator/Admin UI
**Goal**: Operators can manage sites, monitor health, and control instance lifecycle from the browser.
**Components**:
- Central UI (admin + operator workflows)
**Testable Outcome**: Admin manages sites, data connections, areas, LDAP mappings, API keys. Operator sees health dashboard with live SignalR push, manages instance lifecycle, views deployment status.
**HighLevelReqs Coverage**: 8 (partial — admin and operator workflows), 7.2 (API key management)
**Plan Document**: `docs/plans/phase-4-operator-ui.md`
**Sub-tasks**:
1. Admin: Site management (CRUD)
2. Admin: Data connection management (define, assign to sites)
3. Admin: Area management (hierarchical CRUD)
4. Admin: LDAP group mapping management
5. Admin: API key management (create, enable/disable, delete)
6. Operator: Health monitoring dashboard (live push via SignalR)
7. Operator: Instance list with filtering (site, area, template, status)
8. Operator: Deployment status view (live push)
9. Operator: Instance actions (enable, disable, delete)
10. Shared UX: Authorization-aware navigation
11. Shared UX: Error/success notifications
12. Shared UX: Long-running action state indicators
---
### Phase 5: Design-Time UI & Authoring Workflows
**Goal**: Design users can author templates, scripts, and system definitions through the UI.
**Note**: This phase authors **metadata/definitions only** for External System Gateway, DB Connections, and Notification Service. The runtime execution of these integrations is Phase 7. UI for these definitions does not depend on Phase 7 runtime.
**Components**:
- Central UI (design workflows)
**Testable Outcome**: Full template authoring with hierarchy visualization, composition, lock indicators, on-demand validation. Shared script management. External system, DB connection, notification list definition management. Instance creation with per-attribute binding and bulk assignment.
**HighLevelReqs Coverage**: 8 (design workflows), 3.1–3.11 (UI for template operations), 5.1, 5.5, 6.1, 7.4
**Plan Document**: `docs/plans/phase-5-authoring-ui.md`
**Sub-tasks**:
1. Template authoring: CRUD, inheritance tree visualization
2. Template authoring: Composition management with collision detection feedback
3. Template authoring: Attribute/alarm/script editing with lock indicators
4. Template authoring: Inherited vs. local vs. overridden visual indicators
5. Template authoring: On-demand validation with actionable error display
6. Shared script management (CRUD, compilation check)
7. External system definition management (CRUD, method definitions) — metadata only
8. Database connection definition management (CRUD) — metadata only
9. Notification list management (CRUD, recipients, SMTP config) — metadata only
10. Inbound API method definition management (CRUD) — metadata only
11. Instance creation from template
12. Instance per-attribute data connection binding with bulk assignment
13. Instance attribute override editing
---
### Phase 6: Deployment Operations & Troubleshooting UI
**Goal**: Complete the operational loop — deploy, diagnose, troubleshoot from central.
**Components**:
- Central UI (deployment + troubleshooting workflows)
**Testable Outcome**: Full deployment workflow with diffs and validation gating. System-wide artifact deployment. Debug view with live streaming. Site event log viewer. Parked message management. Audit log viewer.
**HighLevelReqs Coverage**: 8 (remaining workflows), 8.1, 10.1–10.3, 12.3
**Plan Document**: `docs/plans/phase-6-deployment-ops-ui.md`
**Sub-tasks**:
1. Deployment: Staleness indicators (revision hash comparison)
2. Deployment: Diff view (added/removed/changed members)
3. Deployment: Deploy with pre-validation gating
4. Deployment: Deployment status tracking (live SignalR push)
5. System-wide artifact deployment with per-site status matrix
6. Debug view: Instance selection, snapshot + live stream via SignalR
7. Site event log viewer: Remote query with filters, pagination, keyword search
8. Parked message management: Query, retry, discard
9. Audit log viewer: Query with filters (user, entity type, action, time range)
---
### Phase 7: Integration Surfaces
**Goal**: External systems can call in and site scripts can call out.
**Components**:
- Inbound API (full runtime)
- External System Gateway (full site-side execution)
- Notification Service (full site-side delivery)
**Testable Outcome**: External systems call `POST /api/{method}` with X-API-Key auth. Site scripts use `ExternalSystem.Call()` / `CachedCall()` with HTTP/REST + JSON. Notifications sent via SMTP/OAuth2. Store-and-forward handles transient failures for CachedCall and notifications. Error classification works (transient → S&F, permanent → script).
**HighLevelReqs Coverage**: 5.1–5.6, 6.1–6.4, 7.1–7.5
**Plan Document**: `docs/plans/phase-7-integrations.md`
**Sub-tasks**:
1. Inbound API: ASP.NET endpoint registration, X-API-Key header auth
2. Inbound API: Method routing, parameter validation (extended type system)
3. Inbound API: Script execution engine on central
4. Inbound API: Route.To() for cross-site calls (via Communication Layer)
5. Inbound API: Batch attribute operations (GetAttributes/SetAttributes)
6. Inbound API: Error handling (401/403/400/500), per-method timeout
7. Inbound API: Failures-only logging
8. External System Gateway: HTTP/REST client with JSON serialization
9. External System Gateway: API key + Basic Auth outbound authentication
10. External System Gateway: Per-system timeout
11. External System Gateway: Dual call modes — Call() (synchronous) and CachedCall() (S&F on transient)
12. External System Gateway: Error classification (5xx/408/429 transient, other 4xx permanent)
13. External System Gateway: CachedCall idempotency documentation for callers
14. External System Gateway: Dedicated blocking I/O dispatcher for Script Execution Actors
15. External System Gateway: Database access — Connection() (ADO.NET pooling) and CachedWrite() (S&F)
16. Notification Service: SMTP client with OAuth2 Client Credentials (M365) + Basic Auth
17. Notification Service: Token lifecycle (fetch, cache, refresh on expiry)
18. Notification Service: BCC delivery, plain text, from address
19. Notification Service: Connection timeout + max concurrent connections
20. Notification Service: Error classification (SMTP 4xx transient, 5xx permanent)
21. Notification Service: Store-and-forward integration for transient failures
22. End-to-end tests: Inbound API → Route.To() → site script → ExternalSystem.Call() → response
23. End-to-end tests: Script → CachedCall → transient failure → S&F buffer → retry → success
24. End-to-end tests: Script → Notify.To().Send() → SMTP delivery (or S&F on failure)
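Note that the two classifications in sub-tasks 12 and 20 point in opposite directions: HTTP treats 5xx (plus 408/429) as transient and other 4xx as permanent, while SMTP treats 4xx as transient and 5xx as permanent. A sketch making both rules explicit:

```python
def classify_http(status: int) -> str:
    """External System Gateway rule: 5xx, 408, 429 are transient (eligible for
    S&F via CachedCall); other 4xx are permanent (returned to the script)."""
    if status >= 500 or status in (408, 429):
        return "transient"
    if status >= 400:
        return "permanent"
    return "success"

def classify_smtp(code: int) -> str:
    """Notification Service rule: SMTP 4xx means 'try again later' (transient),
    5xx means the server rejected the message (permanent)."""
    if 400 <= code < 500:
        return "transient"
    if code >= 500:
        return "permanent"
    return "success"

print(classify_http(429), classify_http(404), classify_smtp(451), classify_smtp(550))
# transient permanent transient permanent
```

Only the "transient" outcome routes into the Store-and-Forward Engine; both end-to-end tests 23 and 24 should cover each side of the split.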
---
### Phase 8: Production Readiness & Hardening
**Goal**: System is validated at target scale and ready for production deployment.
**Note**: This phase is for **comprehensive** resilience and scale testing. Basic failover testing is embedded in Phases 3A and 3C. This phase validates the full system under realistic conditions.
**Components**: Cross-cutting (all components)
**Testable Outcome**: Documented pass on failover, recovery, scale, and security targets. No workflow requires direct DB access. Operational documentation complete.
**HighLevelReqs Coverage**: 2.5 (scale verification), all sections (final verification pass)
**Plan Document**: `docs/plans/phase-8-production-readiness.md`
**Sub-tasks**:
1. Full-system failover testing: Central failover (JWT survival, SignalR reconnect, deployment state)
2. Full-system failover testing: Site failover (singleton migration, S&F takeover, DCL reconnect, alarm re-evaluation)
3. Full-system failover testing: Dual-node failure and automatic recovery (both central and site)
4. Load/performance testing at target scale (10 sites, 500 machines, 75 tags each)
5. Security hardening: LDAPS verification, JWT signing key rotation procedures, secrets management
6. Script sandboxing verification (forbidden API enforcement under adversarial conditions)
7. Recovery drills: Mid-deploy failover, communication drops during artifact deployment, site restarts
8. Observability validation: Structured log review, correlation ID coverage, health dashboard accuracy
9. Message contract compatibility testing (version skew scenarios between central and sites)
10. Installer/deployment packaging (Windows Service setup, appsettings templates)
11. Operational runbooks and documentation
---
## Delivery Milestones
| Milestone | Phases | Outcome |
|-----------|--------|---------|
| M1: Foundation | 0–1 | Central host, auth, DB, web shell, Akka bootstrap |
| M2: Core Modeling | 2 | Template modeling, validation, deployment package contract |
| M3: Site Runtime | 3A–3C | Full deployment pipeline, site execution, failover proven |
| M4: Operator UI | 4 | Admin/operator workflows for pilot operations |
| M5: Full Management | 5–6 | Complete central management and troubleshooting experience |
| M6: Integration-Ready | 7 | External systems can call in, site scripts can call out |
| M7: Production | 8 | Hardened, tested, documented for production |
---
## Plan Generation Procedure
For each phase, the implementation plan document must contain:
1. **Scope** — Components and features included
2. **Prerequisites** — Which phases/components must be complete
3. **Requirements Checklist** — A bullet-level checklist extracted from docs/requirements/HighLevelReqs.md for every section this phase covers (see Bullet-Level Extraction below). Each bullet is a checkbox that must map to a work package.
4. **Design Constraints Checklist** — Applicable constraints from CLAUDE.md Key Design Decisions and docs/requirements/Component-*.md documents, each mapped to a work package.
5. **Work Packages** — Numbered tasks with:
- Description
- Acceptance criteria (must cover every checklist bullet mapped to this work package)
- Estimated complexity (S/M/L)
- Requirements traced (HighLevelReqs bullet IDs + REQ-* IDs + design constraint refs)
6. **Test Strategy** — Unit, integration, and failover tests required
7. **Verification Gate** — What must pass before the phase is considered complete
8. **Open Questions** — Any ambiguities discovered, added to `questions.md`
### Bullet-Level Extraction
When a phase covers HighLevelReqs sections, the plan must decompose each section into its individual requirements:
- Each bullet point or sub-bullet is a separate requirement line item.
- Each sentence within a bullet that introduces a distinct constraint or behavior is a separate requirement.
- Negative requirements ("cannot", "does not", "no") are explicit line items — they become acceptance criteria that verify the behavior is correctly prohibited.
- For sections split across phases, each phase lists only its bullets. After all plans covering that section are generated, verify the union is complete.
**Example**: Section 4.4 "Script Capabilities" decomposes into:
- `[4.4-1]` Read attribute values (live + static)
- `[4.4-2]` Write attributes — data-sourced writes go to DCL, value updates on device confirm
- `[4.4-3]` Write attributes — static writes persist to SQLite, survive restart/failover, reset on redeploy
- `[4.4-4]` CallScript with ask pattern, concurrent execution
- `[4.4-5]` CallShared executes inline (no separate actor)
- `[4.4-6]` ExternalSystem.Call() synchronous
- `[4.4-7]` ExternalSystem.CachedCall() with S&F
- `[4.4-8]` Send notifications
- `[4.4-9]` Database.Connection() for raw ADO.NET access
- `[4.4-10]` Cannot access other instances' attributes or scripts
### Design Constraint Extraction
Each phase plan must also scan the following sources for implementation constraints relevant to its components:
1. **CLAUDE.md → Key Design Decisions**: Each bullet is a constraint. Tag with `[KDD-category-N]` (e.g., `[KDD-runtime-3]` for "Staggered Instance Actor startup on failover").
2. **Component-*.md documents**: Design details beyond what HighLevelReqs specifies (e.g., connection actor Become/Stash pattern, health report monotonic sequence numbers, 30s keep-alive interval). Tag with `[CD-ComponentName-N]`.
These are mapped to work packages and verified in acceptance criteria just like HighLevelReqs bullets.
### Generation Steps
1. Read the phase definition in this document
2. Read all referenced docs/requirements/Component-*.md documents
3. Read referenced docs/requirements/HighLevelReqs.md sections **line by line** — extract every bullet, sub-bullet, and constraint as a numbered requirement
4. Read CLAUDE.md Key Design Decisions — extract constraints relevant to this phase's components
5. Build the Requirements Checklist and Design Constraints Checklist
6. Break sub-tasks into concrete work packages with acceptance criteria, mapping every checklist item
7. Verify: every checklist item maps to at least one work package. Flag any orphans.
8. Identify test scenarios — negative requirements ("cannot", "does not") must have explicit test cases
9. Write the plan document to `docs/plans/phase-N-<name>.md`
10. Update `requirements-traceability.md` with bullet-level references
11. Log any questions to `questions.md`
### Post-Generation Verification (Orphan Check)
After writing a phase plan, perform this verification before considering it complete:
1. **Forward check**: Walk every item in the Requirements Checklist and Design Constraints Checklist. Confirm each maps to a work package with acceptance criteria that would fail if the requirement were not implemented.
2. **Reverse check**: Walk every work package. Confirm each traces back to at least one requirement or design constraint (no untraceable work).
3. **Split-section check**: For any HighLevelReqs section shared with another phase, list the bullets this phase does NOT cover and note which phase owns them. If a bullet is unowned, it's a gap — assign it.
4. **Negative requirement check**: Every "cannot", "does not", "no", "not" constraint has an acceptance criterion that verifies the prohibition (e.g., "Scripts cannot access other instances" → test that cross-instance access fails).
5. Record the verification result at the bottom of the plan document.
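Steps 1–2 are a bipartite coverage check and could even be mechanized over the plan's trace tables. An illustrative sketch; the dict shapes are assumptions about how the checklists and work-package traces get extracted:

```python
def orphan_check(requirements: set[str], work_packages: dict[str, set[str]]):
    """Forward: every requirement is traced by some work package.
    Reverse: every work package traces at least one requirement."""
    traced = set().union(*work_packages.values()) if work_packages else set()
    forward_orphans = requirements - traced  # requirements no WP covers
    reverse_orphans = {wp for wp, reqs in work_packages.items() if not reqs}
    return sorted(forward_orphans), sorted(reverse_orphans)

reqs = {"4.4-1", "4.4-2", "4.4-10"}
wps = {"WP-1": {"4.4-1", "4.4-2"}, "WP-2": set()}
print(orphan_check(reqs, wps))  # (['4.4-10'], ['WP-2'])
```

Either non-empty list blocks the phase plan: forward orphans get assigned to a work package, reverse orphans get a trace or get cut as untraceable work.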
### External Verification (Codex MCP)
After the orphan check passes, submit the plan to the Codex MCP tool (model: `gpt-5.4`) for independent review. This catches blind spots that self-review misses.
**Step 1 — Requirements coverage review**: Submit the following as a single Codex prompt:
- The complete phase plan document
- The full text of every docs/requirements/HighLevelReqs.md section this phase covers
- The full text of every docs/requirements/Component-*.md document referenced by this phase
- The relevant Key Design Decisions from CLAUDE.md
Ask Codex: *"Review this implementation plan against the provided requirements, component designs, and design constraints. Identify: (1) any requirement bullet, sub-bullet, or constraint from the source documents that is not covered by a work package or acceptance criterion in the plan, (2) any acceptance criterion that does not actually verify its linked requirement, (3) any contradictions between the plan and the source documents. List each finding with the specific source text and what is missing or wrong."*
**Step 2 — Negative requirement review**: Submit the plan's negative requirements and their acceptance criteria. Ask Codex: *"For each negative requirement ('cannot', 'does not', 'no'), evaluate whether the acceptance criterion would actually catch a violation. Flag any that are too weak or test the wrong thing."*
**Step 3 — Split-section gap review** (only for phases covering split sections): Submit this phase's bullet assignments alongside the other phase(s) that share the section. Ask Codex: *"Verify that the union of bullets assigned across these phases equals the complete section. Identify any bullets that are unassigned or double-assigned."*
**Handling findings**: If Codex identifies gaps, update the plan before finalizing. If a finding is a false positive (e.g., Codex misread the requirement), document why it was dismissed. Record the Codex review outcome (pass / pass with corrections / findings dismissed with rationale) at the bottom of the plan document alongside the orphan check result.
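The Step 1 submission is simply a concatenation of the plan, its source documents, and the fixed review question. A hypothetical helper (the question text is abridged from this section; the function names and label format are assumptions):

```python
from pathlib import Path

# Abridged from the Step 1 review question above; not the full prompt text.
REVIEW_QUESTION = (
    "Review this implementation plan against the provided requirements, "
    "component designs, and design constraints. Identify: (1) any requirement "
    "bullet, sub-bullet, or constraint from the source documents that is not "
    "covered by a work package or acceptance criterion in the plan, (2) any "
    "acceptance criterion that does not actually verify its linked requirement, "
    "(3) any contradictions between the plan and the source documents."
)

def assemble_prompt(documents):
    """documents: list of (label, text) pairs: the plan first, then every
    HighLevelReqs section, Component-* doc, and KDD excerpt this phase covers."""
    parts = [f"--- {label} ---\n{text}" for label, text in documents]
    parts.append(REVIEW_QUESTION)
    return "\n\n".join(parts)

def assemble_prompt_from_files(paths):
    """File-path convenience wrapper around assemble_prompt."""
    return assemble_prompt([(p, Path(p).read_text()) for p in paths])
```

Submitting everything as one prompt keeps the full requirement text in the same context as the plan, which is what lets the reviewer catch uncovered bullets rather than just summarizing.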
---
## Phase Execution Procedure
This section governs how implementation plans are executed. The goal is autonomous execution with built-in compliance checks — no user input required between starting a phase and completing it.
### Principles
1. **Work packages are executed in dependency order** — respect `blockedBy` relationships within the phase. Independent WPs may be parallelized.
2. **Each WP is self-verifying** — acceptance criteria are checked before moving on. A WP is not complete until all its criteria pass.
3. **Tests are written alongside code, not after** — unit tests for a WP are part of that WP, not a separate step.
4. **Commit at WP boundaries** — each completed WP (or logical group of small WPs) gets its own commit with a message referencing the WP number and phase.
5. **Questions are logged, not blocking** — if an ambiguity is discovered during implementation, log it to `questions.md`, make the best judgment, and continue. Do not stop for user input.
6. **Codex MCP is best-effort** — if unavailable during verification steps, note the skip and continue.
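Principle 1's dependency ordering is a topological sort over the `blockedBy` edges. A sketch using the standard library (the input shape is an assumption about how a plan's WP dependencies would be represented):

```python
from graphlib import TopologicalSorter

def execution_order(blocked_by):
    """Order work packages so each WP runs after everything it is blocked by.

    blocked_by: dict mapping WP id -> iterable of WP ids it is blockedBy.
    Raises graphlib.CycleError if the plan contains a dependency cycle,
    which would indicate a planning error to fix before execution.
    """
    return list(TopologicalSorter(blocked_by).static_order())
```

`TopologicalSorter` also exposes a `prepare()`/`get_ready()` protocol that yields WPs in independent batches, which maps directly onto the "independent WPs may be parallelized" allowance.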
### Per-Work-Package Execution
For each work package, follow this sequence:
```
┌─────────────────────────────────────────────────────┐
│ 1. READ the WP description and acceptance criteria │
│ 2. READ all traced requirements (HLR bullets, KDD, │
│ CD constraints) to understand intent │
│ 3. IMPLEMENT the WP │
│ - Write code │
│ - Write unit tests for acceptance criteria │
│ - Write negative tests for prohibition criteria │
│ 4. VERIFY acceptance criteria │
│ - Run tests: all must pass │
│ - Walk each acceptance criterion line by line │
│ - If a criterion cannot be verified yet (depends │
│ on a later WP), note it as "deferred to WP-N" │
│ 5. UPDATE the phase execution checklist │
│ - Mark WP as complete with date │
│ - Note any deferred criteria │
│ - Note any questions logged │
│ 6. COMMIT with message: "Phase N WP-M: <summary>" │
└─────────────────────────────────────────────────────┘
```
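The commit message convention in step 6 is mechanical enough to lint. A sketch of a validator that a `commit-msg` hook could call (the exact pattern, including the optional phase letter for 3A/3B/3C and comma-separated WP lists for grouped commits, is an assumption, not an established project rule):

```python
import re

# "Phase N WP-M: <summary>", allowing an optional phase letter (e.g. 3A)
# and a comma-separated WP list for logical groups of small WPs.
# This pattern is an illustrative assumption, not a project-mandated regex.
PATTERN = re.compile(r"^Phase \d+[A-Ca-c]? WP-\d+(, ?WP-\d+)*: .+")

def valid_commit_message(message):
    """True if the first line of the commit message follows the convention."""
    return bool(PATTERN.match(message.splitlines()[0]))
```

Wired into a `commit-msg` hook, this would reject commits that drift from the convention before they land.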
### Mid-Phase Compliance Check
After completing approximately half of a phase's work packages (or at any natural milestone), perform a lightweight compliance check:
1. **Build check**: The solution compiles with zero warnings (treat warnings as errors).
2. **Test check**: All tests written so far pass.
3. **Deferred criteria review**: Check if any previously deferred acceptance criteria can now be verified. If so, verify them and update the checklist.
4. **Traceability spot-check**: Pick 3 random requirements from the Requirements Checklist. Verify they have corresponding code and tests.
If any check fails, fix the issue before proceeding to the next WP.
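The traceability spot-check in step 4 can be sketched as a random sample checked against an index of IDs actually found in the codebase (how that index is built, e.g. by grepping test names and comments for requirement IDs, is an assumption, not an established convention):

```python
import random

def spot_check(requirement_ids, traced_index, sample_size=3, rng=random):
    """Sample requirement IDs from the checklist and flag any with no trace.

    traced_index: set of requirement IDs found in code/tests (e.g. collected
    by grepping for IDs in test names and comments); how it is built is an
    assumption of this sketch.
    """
    ids = sorted(requirement_ids)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    failures = [rid for rid in sample if rid not in traced_index]
    return sample, failures
```

A non-empty `failures` list means a sampled requirement has no visible implementation or test, which fails the compliance check.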
### Phase Completion Gate
After all work packages are complete, execute the verification gate from the plan document:
1. **Run the full test suite** — all unit, integration, negative, and failover tests must pass.
2. **Walk the verification gate checklist** — each item in the plan's "Verification Gate" section must pass. Record pass/fail per item.
3. **Walk the Requirements Checklist** — confirm every bullet has been implemented and tested. Check off each item.
4. **Walk the Design Constraints Checklist** — confirm every constraint has been respected. Check off each item.
5. **Check deferred criteria** — all previously deferred acceptance criteria must now be resolvable. Verify them.
6. **Run through the phase execution checklist** — all WPs must be marked complete with no open items.
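The "no open items" condition in steps 2-6 can be checked mechanically by scanning the checklist file for unchecked boxes. A simplified sketch (it assumes filled-in rows contain no remaining `[ ]`, so the unused half of a `[ ] Pass / [ ] Fail` pair should be struck or removed when a result is recorded):

```python
import re

# Matches an unchecked markdown task box anywhere in a line.
UNCHECKED = re.compile(r"\[ \]")

def open_items(checklist_markdown):
    """Return 1-based line numbers of rows that still contain an unchecked box.
    The gate's bookkeeping passes only when this list is empty."""
    return [i for i, line in enumerate(checklist_markdown.splitlines(), 1)
            if UNCHECKED.search(line)]
```

This does not replace walking the gate criteria themselves; it only catches checklist rows that were never filled in.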
### Post-Phase Codex MCP Review
After the completion gate passes, submit the implementation for external review:
**Step 1 — Implementation vs. Plan review**: Submit to Codex (model: `gpt-5.4`):
- The phase plan document
- A summary of what was implemented (file list, key decisions made)
- The phase execution checklist (showing all WPs complete)
- Any test output or coverage summary
Ask Codex: *"Review this implementation against the plan. For each work package, verify that the acceptance criteria described in the plan are reflected in the implementation summary. Identify: (1) any acceptance criterion that appears unimplemented, (2) any implementation that contradicts the plan, (3) any requirement from the plan's Requirements Checklist that lacks corresponding code or tests. List each finding with the specific plan text and what is missing."*
**Step 2 — Code quality review**: Ask Codex: *"Review the implementation for: (1) security vulnerabilities (OWASP top 10), (2) race conditions or thread safety issues, (3) missing error handling at system boundaries, (4) violations of the project's architectural constraints (no business logic in Commons, no EF in consuming components, Tell for hot-path/Ask for boundaries). List each finding with file and line reference."*
**Handling findings**: Fix valid findings. Dismiss false positives with rationale. Record the review outcome in the execution checklist.
### Final Steps
1. **Commit** any fixes from the post-phase review.
2. **Update `requirements-traceability.md`** — change status from "Plan generated" to "Implemented" for all sections, REQ-* IDs, and KDD/CD constraints covered by this phase.
3. **Push** to remote.
4. **Report** — output a summary of what was implemented, tests passing, and any questions logged.
---
## Phase Execution Checklist Template
Each phase creates a checklist file at `docs/plans/phase-N-checklist.md` when execution begins. This file tracks compliance as work packages are completed.
```markdown
# Phase N Execution Checklist
**Phase**: [phase name]
**Started**: [date]
**Completed**: [date or "in progress"]
---
## Work Package Status
| WP | Description | Status | Date | Deferred Criteria | Notes |
|----|-------------|--------|------|-------------------|-------|
| WP-1 | [description] | [ ] Pending | | | |
| WP-2 | [description] | [ ] Pending | | | |
| ... | | | | | |
## Mid-Phase Compliance Checks
| Check | Date | Result | Notes |
|-------|------|--------|-------|
| Build (zero warnings) | | [ ] Pass / [ ] Fail | |
| All tests pass | | [ ] Pass / [ ] Fail | |
| Deferred criteria review | | [ ] Pass / [ ] N/A | |
| Traceability spot-check (3 random) | | [ ] Pass / [ ] Fail | |
## Verification Gate
| # | Gate Criterion | Pass | Notes |
|---|---------------|------|-------|
| 1 | [from plan] | [ ] | |
| 2 | [from plan] | [ ] | |
| ... | | | |
## Requirements Checklist Verification
| ID | Requirement | Implemented | Tested | Notes |
|----|------------|-------------|--------|-------|
| [3.1-1] | [requirement text] | [ ] | [ ] | |
| [3.1-2] | [requirement text] | [ ] | [ ] | |
| ... | | | | |
## Design Constraints Checklist Verification
| ID | Constraint | Respected | Notes |
|----|-----------|-----------|-------|
| KDD-xxx-N | [constraint] | [ ] | |
| CD-xxx-N | [constraint] | [ ] | |
| ... | | | |
## Post-Phase Review
| Review Step | Date | Result | Findings |
|------------|------|--------|----------|
| Codex: Implementation vs. Plan | | [ ] Pass / [ ] Pass with corrections / [ ] Skipped | |
| Codex: Code quality | | [ ] Pass / [ ] Pass with corrections / [ ] Skipped | |
## Questions Logged During Execution
| # | Question | Logged to questions.md | Resolution |
|---|----------|----------------------|------------|
| | | | |
## Summary
- **Total WPs**: N
- **Tests written**: N unit, N integration, N negative
- **Tests passing**: N/N
- **Requirements verified**: N/N
- **Design constraints verified**: N/N
- **Questions logged**: N
- **Codex review**: [outcome]
```
When starting a phase, the executor:
1. Creates the checklist file from this template, populated with the phase's actual WPs, gate criteria, requirements, and constraints.
2. Updates the checklist as each WP is completed.
3. Fills in the verification sections during the completion gate.
4. Commits the final checklist alongside the last phase commit.
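Step 1 (instantiating the template) is amenable to a small generator once the phase's WPs and requirements are known. A sketch covering two of the template's sections; the remaining sections follow the same pattern (the function name and input shapes are hypothetical):

```python
from datetime import date

def instantiate_checklist(phase, wps, requirements):
    """Render a partial phase checklist from the template above.

    phase:        phase label, e.g. "1" or "3A"
    wps:          list of (wp_id, description) pairs
    requirements: list of (req_id, requirement_text) pairs
    Only the WP and Requirements tables are rendered here; the other
    template sections would be generated the same way.
    """
    lines = [
        f"# Phase {phase} Execution Checklist",
        f"**Started**: {date.today().isoformat()}",
        "## Work Package Status",
        "| WP | Description | Status | Date | Deferred Criteria | Notes |",
        "|----|-------------|--------|------|-------------------|-------|",
    ]
    lines += [f"| {wp} | {desc} | [ ] Pending | | | |" for wp, desc in wps]
    lines += [
        "## Requirements Checklist Verification",
        "| ID | Requirement | Implemented | Tested | Notes |",
        "|----|------------|-------------|--------|-------|",
    ]
    lines += [f"| {rid} | {text} | [ ] | [ ] | |" for rid, text in requirements]
    return "\n".join(lines)
```

Generating the file rather than hand-copying the template keeps the checklist's row set in lockstep with the plan's actual WPs, gate criteria, and requirement IDs.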
---
## File Index
| File | Purpose |
|------|---------|
| `docs/plans/generate_plans.md` | This document — master plan, generation procedure, and execution procedure |
| `docs/plans/requirements-traceability.md` | Matrix linking every requirement to its implementation phase |
| `docs/plans/questions.md` | Questions requiring follow-up before or during implementation |
| **Implementation Plans** | |
| `docs/plans/phase-0-solution-skeleton.md` | Generated |
| `docs/plans/phase-1-central-foundations.md` | Generated |
| `docs/plans/phase-2-modeling-validation.md` | Generated |
| `docs/plans/phase-3a-runtime-foundation.md` | Generated |
| `docs/plans/phase-3b-site-io-observability.md` | Generated |
| `docs/plans/phase-3c-deployment-store-forward.md` | Generated |
| `docs/plans/phase-4-operator-ui.md` | Generated |
| `docs/plans/phase-5-authoring-ui.md` | Generated |
| `docs/plans/phase-6-deployment-ops-ui.md` | Generated |
| `docs/plans/phase-7-integrations.md` | Generated |
| `docs/plans/phase-8-production-readiness.md` | Generated |
| **Execution Checklists** (created when each phase begins) | |
| `docs/plans/phase-0-checklist.md` | Phase 0 execution tracking |
| `docs/plans/phase-1-checklist.md` | Phase 1 execution tracking |
| `docs/plans/phase-2-checklist.md` | Phase 2 execution tracking |
| `docs/plans/phase-3a-checklist.md` | Phase 3A execution tracking |
| `docs/plans/phase-3b-checklist.md` | Phase 3B execution tracking |
| `docs/plans/phase-3c-checklist.md` | Phase 3C execution tracking |
| `docs/plans/phase-4-checklist.md` | Phase 4 execution tracking |
| `docs/plans/phase-5-checklist.md` | Phase 5 execution tracking |
| `docs/plans/phase-6-checklist.md` | Phase 6 execution tracking |
| `docs/plans/phase-7-checklist.md` | Phase 7 execution tracking |
| `docs/plans/phase-8-checklist.md` | Phase 8 execution tracking |