Create implementation plan generation framework

generate_plans.md: Master plan defining 11 phases (0, 1, 2, 3A, 3B, 3C, 4-8) with component assignments, sub-tasks, testable outcomes, and HighLevelReqs coverage. Phase 3 is split into 3A (runtime foundation + failover), 3B (site I/O + observability), and 3C (deployment pipeline + S&F) per Codex review. Failover testing is embedded in the runtime phases, not deferred to hardening.

requirements-traceability.md: Full matrix mapping all 54 HighLevelReqs sections and 22 REQ-* identifiers to implementation phases. Zero unmapped requirements.

questions.md: 15 open questions requiring follow-up before/during implementation (tooling, environments, team, integration targets).
2026-03-16 09:59:23 -04:00
parent 760eb38eac
commit 7a0bd0f701
3 changed files with 650 additions and 0 deletions
--- a/docs/plans/generate_plans.md
+++ b/docs/plans/generate_plans.md
@@ -0,0 +1,485 @@
+# Implementation Plan Generation Guide
+
+**Date**: 2026-03-16
+**Purpose**: Master plan for generating detailed implementation plans for all ScadaLink components across phased delivery.
+
+---
+
+## Overview
+
+This document defines the phased implementation strategy for the ScadaLink SCADA system. It is **not** the implementation plan itself — it is the plan for **generating** implementation plans. Each phase below will produce one or more detailed implementation plan documents in `docs/plans/`.
+
+### Guiding Principles
+
+1. **Each phase produces a testable, working increment** — no phase ends with unverifiable work.
+2. **Dependencies are respected** — no component is built before its dependencies.
+3. **Requirements traceability** — every HighLevelReqs section and REQ-* identifier must map to at least one phase. See `docs/plans/requirements-traceability.md` for the full matrix.
+4. **Questions are tracked** — any ambiguity discovered during plan generation is logged in `docs/plans/questions.md`.
+5. **Plans are broken into implementable work packages** — each phase is subdivided into epics, each epic into concrete tasks with acceptance criteria.
+6. **Failover and resilience are validated early** — not deferred to a final hardening phase. Each runtime phase includes failover acceptance criteria.
+7. **Persistence/recovery semantics are defined before actor design** — Akka.NET actor protocols depend on recovery behavior.
+
+---
+
+## Phase Structure
+
+### Phase 0: Solution Skeleton & Delivery Guardrails
+
+**Goal**: Establish a buildable, testable baseline before any domain work.
+
+**Components**:
+- Solution structure (all 17 component projects + test projects)
+- Commons (REQ-COM-5b namespace/folder skeleton, REQ-COM-1 shared types, REQ-COM-6 no-business-logic constraint, REQ-COM-7 dependency constraints)
+- Host (REQ-HOST-1 single binary, skeleton Program.cs, REQ-HOST-10 extension method convention)
+- CI baseline (build, test, format)
+
+**Testable Outcome**: Empty host boots by role from `appsettings.json`. Test pipeline runs. All projects compile with correct references.
+
+**HighLevelReqs Coverage**: 13.1 (UTC timestamps baked into type system)
+
+**Plan Document**: `docs/plans/phase-0-solution-skeleton.md`
+
+**Sub-tasks**:
+1. Create .NET solution with project structure matching component architecture
+2. Implement Commons type system (REQ-COM-1: enums, Result<T>, RetryPolicy, UTC convention)
+3. Implement Commons namespace/folder convention (REQ-COM-5b)
+4. Implement Commons entity POCOs (REQ-COM-3) — classes with properties, organized by domain area
+5. Implement Commons repository interfaces (REQ-COM-4) — interface signatures
+6. Implement Commons cross-cutting interfaces (REQ-COM-4a: IAuditService)
+7. Implement Commons message contracts (REQ-COM-5) — record types with versioning rules (REQ-COM-5a)
+8. Implement Commons protocol abstraction (REQ-COM-2: IDataConnection interface)
+9. Implement Host skeleton (REQ-HOST-1, REQ-HOST-2 role detection, REQ-HOST-10 extension method convention)
+10. Implement per-component options classes (REQ-HOST-3 config binding)
+11. Set up CI pipeline (build, test, format)
+12. Create local dev topology documentation (central + site appsettings files)
+
+---
+
+### Phase 1: Central Platform Foundations
+
+**Goal**: Central node can authenticate users, persist data, and host a web shell. Site-to-central trust model is established.
+
+**Components**:
+- Configuration Database (schema, DbContext, repos, IAuditService, migrations)
+- Security & Auth (LDAP bind, JWT, roles, site scoping)
+- Host (REQ-HOST-4 validation, REQ-HOST-4a readiness, REQ-HOST-5 Windows Service, REQ-HOST-6 Akka bootstrap, REQ-HOST-7 ASP.NET, REQ-HOST-8 logging, REQ-HOST-8a dead letters, REQ-HOST-9 shutdown)
+- Central UI (Blazor Server shell, login, route protection)
+
+**Testable Outcome**: User logs in via LDAP, receives JWT with correct role claims, sees an empty dashboard. Admin can manage LDAP group mappings. Audit entries persist. Central runs behind load balancer. Akka.NET actor system boots with cluster configuration.
+
+**HighLevelReqs Coverage**: 9.1–9.4, 10.1–10.4
+
+**Plan Document**: `docs/plans/phase-1-central-foundations.md`
+
+**Sub-tasks**:
+1. Configuration Database: EF Core DbContext, Fluent API entity mappings, initial migration
+2. Configuration Database: Repository implementations (ISecurityRepository, ICentralUiRepository)
+3. Configuration Database: IAuditService with transactional guarantee (same-transaction writes)
+4. Configuration Database: Optimistic concurrency on deployment status records
+5. Configuration Database: Seed data (initial admin LDAP mapping)
+6. Security & Auth: LDAP bind service (LDAPS/StartTLS required)
+7. Security & Auth: JWT issuance, 15-min sliding refresh, 30-min idle timeout
+8. Security & Auth: Role claim extraction from LDAP groups (Admin, Design, Deployment + site scoping)
+9. Security & Auth: Authorization policies with site-scoped Deployment checks
+10. Security & Auth: Shared Data Protection keys (config DB or shared config)
+11. Host: Full startup validation (REQ-HOST-4)
+12. Host: Readiness gating with `/health/ready` endpoint (REQ-HOST-4a)
+13. Host: Akka.NET bootstrap — cluster, remoting, persistence (REQ-HOST-6)
+14. Host: Serilog structured logging with SiteId/NodeHostname/NodeRole enrichment (REQ-HOST-8)
+15. Host: Dead letter monitoring subscription (REQ-HOST-8a)
+16. Host: CoordinatedShutdown wiring (REQ-HOST-9)
+17. Host: Windows Service support (REQ-HOST-5)
+18. Central UI: Blazor Server shell with SignalR
+19. Central UI: Login/logout flow with JWT
+20. Central UI: Role-aware navigation and route guards
+21. Central UI: Failover behavior (SignalR reconnect, JWT survives, shared Data Protection keys)
+22. Integration tests: Auth flow, audit logging, startup validation, readiness gating
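Sub-task 7 pairs a 15-minute sliding refresh with a 30-minute idle timeout. The sketch below shows one possible reading of that rule, written in Python as a neutral illustration rather than in the project's .NET stack. The names `Session` and `check_session`, and the choice to evaluate the idle check before the refresh check, are assumptions and do not come from the plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(minutes=15)  # sliding refresh window (sub-task 7)
IDLE_TIMEOUT = timedelta(minutes=30)    # idle timeout (sub-task 7)


@dataclass
class Session:
    issued_at: datetime      # when the current token was issued (UTC)
    last_activity: datetime  # last authenticated request (UTC)


def check_session(session: Session, now: datetime) -> str:
    """Return 'expired', 'refresh', or 'ok' for a request arriving at `now`."""
    if now - session.last_activity >= IDLE_TIMEOUT:
        return "expired"   # idle too long: require a new login
    if now - session.issued_at >= TOKEN_LIFETIME:
        return "refresh"   # user is active but the token is old: reissue it
    return "ok"


start = datetime(2026, 3, 16, 9, 0, tzinfo=timezone.utc)
active = Session(issued_at=start, last_activity=start + timedelta(minutes=10))
print(check_session(active, start + timedelta(minutes=20)))  # refresh
```

On a `refresh` result the caller would reissue the token and reset `issued_at`. Every accepted request would also update `last_activity`.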
+
+---
+
+### Phase 2: Core Modeling, Validation & Deployment Contract
+
+**Goal**: Template authoring data model, validation pipeline, and the compiled deployment artifact contract are functional. The output of this phase defines exactly what gets deployed to a site.
+
+**Components**:
+- Template Engine (full)
+- Configuration Database (ITemplateEngineRepository, IDeploymentManagerRepository stubs)
+
+**Testable Outcome**: Complex template trees can be authored, flattened, diffed, and validated programmatically. Revision hashes generated. The flattened configuration output format (the "deployment package") is stable and versioned. All validation rules enforced including semantic checks.
+
+**HighLevelReqs Coverage**: 3.1–3.11, 4.1, 4.5
+
+**Plan Document**: `docs/plans/phase-2-modeling-validation.md`
+
+**Sub-tasks**:
+1. Template CRUD with inheritance relationships
+2. Composition (has-a) with recursive nesting
+3. Path-qualified canonical naming for composed members
+4. Attribute, alarm, script definitions with lock flags
+5. Override granularity enforcement per entity type/field
+6. Naming collision detection (recursive across composed modules)
+7. Graph acyclicity enforcement (inheritance + composition)
+8. Flattening algorithm (full resolution chain: Instance → Child → Parent → Composing → Composed)
+9. Diff calculation (deployed vs. template-derived)
+10. Revision hash generation for flattened output
+11. **Deployment package contract**: Define the exact serialization format of a flattened configuration that will be sent to sites and stored in SQLite. This is the stable boundary between Template Engine, Deployment Manager, and Site Runtime.
+12. Pre-deployment validation pipeline:
+    - Flattening success
+    - Naming collisions
+    - Script test compilation
+    - Semantic validation (call targets, arg types, return types, trigger operand types)
+    - Alarm/script trigger reference existence
+    - Data connection binding completeness
+13. On-demand validation (same pipeline, no deployment trigger)
+14. Shared script validation (syntax/structural only)
+15. Template deletion constraint enforcement
+16. Instance CRUD (create from template, overrides, area assignment, connection binding)
+17. Site and data connection management (CRUD)
+18. Area management (hierarchical CRUD)
+19. Unit tests for flattening, validation, diff, collision detection, acyclicity
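Sub-task 7 requires the combined inheritance and composition graph to stay acyclic. A minimal sketch of such a check follows, assuming the relationships have already been merged into one adjacency map from template name to the templates it inherits from or composes. The function name `find_cycle` and the sample template names are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of template names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {node: WHITE for node in edges}
    for targets in edges.values():
        for target in targets:
            colour.setdefault(target, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            if colour[nxt] == GREY:
                # Back edge: the cycle is the path from nxt to here, closed.
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


graph = {"Pump": ["Motor"], "Motor": ["Sensor"], "Sensor": ["Pump"]}
print(find_cycle(graph))  # ['Pump', 'Motor', 'Sensor', 'Pump']
```

Returning the offending path instead of a boolean gives the validation pipeline something actionable to report.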
+
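Sub-tasks 9 and 10 call for a diff between deployed and template-derived configurations and a revision hash over the flattened output. The sketch below assumes a flattened configuration can be represented as a map from path-qualified member name to value, and that the hash is SHA-256 over key-sorted JSON. Both are illustrative assumptions. The real serialization format is what sub-task 11 is meant to define.

```python
import hashlib
import json
from typing import Any, Dict, List


def revision_hash(flattened: Dict[str, Any]) -> str:
    """Hash a flattened configuration independently of key order."""
    canonical = json.dumps(flattened, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_members(deployed: Dict[str, Any],
                 derived: Dict[str, Any]) -> Dict[str, List[str]]:
    """List members added, removed, or changed relative to what is deployed."""
    added = sorted(k for k in derived if k not in deployed)
    removed = sorted(k for k in deployed if k not in derived)
    changed = sorted(k for k in derived
                     if k in deployed and derived[k] != deployed[k])
    return {"added": added, "removed": removed, "changed": changed}


deployed = {"Pump.Speed": 10, "Pump.Motor.Temp": 5}
derived = {"Pump.Speed": 12, "Pump.Motor.Temp": 5, "Pump.Motor.Current": 0}
print(diff_members(deployed, derived))
print(revision_hash(deployed) == revision_hash(derived))  # False
```

Comparing the two hashes is a cheap staleness check, and the member-level diff is only needed when they differ.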
+---
+
+### Phase 3A: Runtime Foundation & Persistence Model
+
+**Goal**: Prove the Akka.NET cluster, singleton, and local persistence model work correctly — including failover.
+
+**Components**:
+- Cluster Infrastructure (full)
+- Host (site-role Akka bootstrap)
+- Site Runtime (Deployment Manager singleton skeleton, basic Instance Actor)
+- Local SQLite persistence model (deployed config storage, static attribute overrides)
+
+**Testable Outcome**: Two-node site cluster forms. Singleton starts on oldest node. Failover migrates singleton to surviving node. Singleton reads deployed configs from SQLite and recreates Instance Actors. Static attribute overrides persist across restart. `min-nr-of-members=1` verified. CoordinatedShutdown enables fast handover.
+
+**HighLevelReqs Coverage**: 1.2 (failover), 1.1 (site responsibilities — partial)
+
+**Plan Document**: `docs/plans/phase-3a-runtime-foundation.md`
+
+**Sub-tasks**:
+1. Cluster Infrastructure: Akka.NET cluster config with keep-oldest SBR, down-if-alone
+2. Cluster Infrastructure: Both-as-seed, min-nr-of-members=1
+3. Cluster Infrastructure: Failure detection timing (2s heartbeat, 10s threshold, 15s stable-after)
+4. Cluster Infrastructure: Graceful shutdown / CoordinatedShutdown for fast singleton handover
+5. Host: Site-role Akka bootstrap (generic Host, no Kestrel)
+6. Site Runtime: Deployment Manager singleton (cluster singleton registration + proxy)
+7. Site Runtime: Startup behavior — read SQLite, staggered Instance Actor creation
+8. Site Runtime: Instance Actor skeleton — hold attribute state, publish to stream
+9. Site Runtime: Supervision strategies per actor type
+10. Site Runtime: Static attribute persistence to SQLite (write + load on startup)
+11. Local persistence: SQLite schema for deployed configs and attribute overrides
+12. **Failover acceptance tests**:
+    - Active node crash → singleton migrates to standby
+    - Graceful shutdown → fast singleton handover
+    - Both nodes down → first up forms cluster, rebuilds from SQLite
+    - Static attribute overrides survive failover
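Sub-tasks 10 and 11 require static attribute overrides to be written to SQLite and reloaded on startup. The following Python sketch illustrates the write-then-reload cycle. The table name `attribute_overrides`, its columns, and the helper names are hypothetical, and the actual schema is left to the phase plan.

```python
import sqlite3
from typing import Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS attribute_overrides (
    instance_id TEXT NOT NULL,
    attribute   TEXT NOT NULL,
    value       TEXT NOT NULL,
    updated_utc TEXT NOT NULL,
    PRIMARY KEY (instance_id, attribute)
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the local store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_override(conn: sqlite3.Connection, instance_id: str,
                  attribute: str, value: str, updated_utc: str) -> None:
    """Insert or overwrite one override inside its own transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO attribute_overrides "
            "(instance_id, attribute, value, updated_utc) VALUES (?, ?, ?, ?)",
            (instance_id, attribute, value, updated_utc),
        )


def load_overrides(conn: sqlite3.Connection, instance_id: str) -> Dict[str, str]:
    """Return all persisted overrides for one instance, as read at startup."""
    rows = conn.execute(
        "SELECT attribute, value FROM attribute_overrides WHERE instance_id = ?",
        (instance_id,),
    )
    return dict(rows.fetchall())
```

A restart test can close the connection, reopen the same file, and assert that `load_overrides` returns what was saved. This single-file sketch does not show how the standby node obtains the same data after a failover.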
+
+---
+
+### Phase 3B: Site I/O & Observability
+
+**Goal**: Site can connect to equipment, collect data, evaluate scripts and alarms, and report health to central.
+
+**Components**:
+- Communication Layer (full)
+- Data Connection Layer (IDataConnection, OPC UA adapter, connection actor)
+- Site Runtime (full — scripts, alarms, shared scripts, stream)
+- Health Monitoring (site-side collection + central-side aggregation)
+- Site Event Logging (event recording, retention, remote query)
+
+**Testable Outcome**: Site connects to OPC UA server, subscribes to tags, delivers values to Instance Actors. Scripts execute on triggers. Alarms evaluate and publish state. Health reports flow to central with sequence numbers. Event logs record and are queryable from central. Debug view streams live data.
+
+**HighLevelReqs Coverage**: 2.2–2.5, 3.4.1, 4.2–4.4, 4.4.1, 4.5, 4.6, 8.1, 11.1–11.2, 12.1–12.3
+
+**Plan Document**: `docs/plans/phase-3b-site-io-observability.md`
+
+**Sub-tasks**:
+1. Communication Layer: Message contracts with correlation IDs
+2. Communication Layer: Per-pattern timeout configuration (120s deploy, 30s queries)
+3. Communication Layer: Transport heartbeat config (2s interval, 10s threshold)
+4. Communication Layer: All 8 message patterns implementation
+5. Communication Layer: Application-level correlation for idempotency
+6. Data Connection Layer: Connection actor with Become/Stash lifecycle
+7. Data Connection Layer: OPC UA adapter implementing IDataConnection
+8. Data Connection Layer: Auto-reconnect (fixed interval), immediate bad quality on disconnect
+9. Data Connection Layer: Transparent re-subscribe on reconnection
+10. Data Connection Layer: Write-back with synchronous error to script
+11. Data Connection Layer: Tag path resolution with periodic retry
+12. Data Connection Layer: Health reporting (connection status + tag resolution counts)
+13. Site Runtime: Script Actor + Script Execution Actor (triggers, concurrent execution, dedicated dispatcher)
+14. Site Runtime: Alarm Actor + Alarm Execution Actor (condition evaluation, state management)
+15. Site Runtime: Shared script library (inline execution)
+16. Site Runtime: Script Runtime API (GetAttribute, SetAttribute, CallScript, CallShared)
+17. Site Runtime: Script trust model (forbidden APIs, execution timeout, constrained compilation)
+18. Site Runtime: Recursion limit enforcement
+19. Site Runtime: Tell vs Ask conventions (Tell for hot path, Ask for CallScript)
+20. Site Runtime: Site-wide Akka stream with per-subscriber backpressure
+21. Site Runtime: Concurrency serialization (Instance Actor serializes mutations)
+22. Health Monitoring: Site-side metric collection (all metrics)
+23. Health Monitoring: Periodic reporting with monotonic sequence numbers
+24. Health Monitoring: Central-side aggregation, offline detection (60s threshold)
+25. Health Monitoring: Dead letter count as metric
+26. Site Event Logging: Event recording to SQLite
+27. Site Event Logging: 30-day retention with daily purge + 1GB storage cap
+28. Site Event Logging: Remote query with pagination (500/page) and keyword search
+29. **Failover acceptance tests**: DCL reconnection after failover, health report continuity, stream recovery
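Sub-tasks 23 and 24 describe health reports with monotonic sequence numbers and central-side offline detection at a 60-second threshold. One way the central aggregator could apply both rules is sketched here. The class and field names are assumptions. How sequence numbers behave across a site failover is not specified in this plan, so the sketch simply drops any report whose sequence is not newer than the last one seen.

```python
from dataclasses import dataclass, field
from typing import Dict

OFFLINE_THRESHOLD_S = 60.0  # offline detection threshold (sub-task 24)


@dataclass
class SiteHealth:
    last_sequence: int = -1
    last_seen: float = 0.0
    metrics: Dict[str, float] = field(default_factory=dict)


class HealthAggregator:
    def __init__(self) -> None:
        self.sites: Dict[str, SiteHealth] = {}

    def on_report(self, site_id: str, sequence: int,
                  metrics: Dict[str, float], now: float) -> bool:
        """Accept a report only if its sequence number is newer."""
        state = self.sites.setdefault(site_id, SiteHealth())
        if sequence <= state.last_sequence:
            return False  # stale or duplicate report: ignore it
        state.last_sequence = sequence
        state.last_seen = now
        state.metrics = dict(metrics)
        return True

    def is_offline(self, site_id: str, now: float) -> bool:
        """A site is offline if unknown or silent for longer than the threshold."""
        state = self.sites.get(site_id)
        return state is None or now - state.last_seen > OFFLINE_THRESHOLD_S
```

Passing `now` in explicitly keeps the logic deterministic, which makes the 60-second boundary easy to cover in a unit test.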
+
+---
+
+### Phase 3C: Deployment Pipeline & Store-and-Forward
+
+**Goal**: Complete the deploy-to-site pipeline end-to-end with resilience.
+
+**Components**:
+- Deployment Manager (full)
+- Store-and-Forward Engine (full)
+
+**Testable Outcome**: Central validates, flattens, and deploys an instance to a site. Site compiles scripts, creates actors, reports success. Deployment ID ensures idempotency. Per-instance operation lock works. Instance lifecycle commands (disable/enable/delete) work. Store-and-forward buffers messages on transient failure, retries, parks. Async replication to standby. Parked messages queryable from central.
+
+**HighLevelReqs Coverage**: 1.3, 1.4, 1.5, 3.8.1, 3.9, 5.3, 5.4, 6.4
+
+**Plan Document**: `docs/plans/phase-3c-deployment-store-forward.md`
+
+**Sub-tasks**:
+1. Deployment Manager: Deployment flow (validate → flatten → send → track)
+2. Deployment Manager: Deployment ID + revision hash for idempotency
+3. Deployment Manager: Site-side state query after timeout/failover (reconciliation)
+4. Deployment Manager: Per-instance operation lock (all mutating commands)
+5. Deployment Manager: State transition matrix (enabled/disabled/not-deployed × operations)
+6. Deployment Manager: Site-side apply atomicity (all-or-nothing per instance)
+7. Deployment Manager: Instance lifecycle commands (disable, enable, delete)
+8. Deployment Manager: System-wide artifact deployment with per-site status
+9. Deployment Manager: Artifact version compatibility (cross-site skew supported)
+10. Deployment Manager: Deployment status persistence (current only + audit log)
+11. Store-and-Forward Engine: Message buffering with SQLite persistence
+12. Store-and-Forward Engine: Fixed-interval retry per source entity
+13. Store-and-Forward Engine: Transient-only buffering (5xx/connection errors)
+14. Store-and-Forward Engine: Parking after max retries
+15. Store-and-Forward Engine: Async best-effort replication to standby
+16. Store-and-Forward Engine: Parked message management (query, retry, discard from central)
+17. Store-and-Forward Engine: Messages survive instance deletion
+18. **End-to-end acceptance tests**: Deploy → run → failover → redeploy → lifecycle commands
+19. **Resilience tests**: Mid-deploy failover, timeout + reconciliation, S&F buffer takeover on failover
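The Store-and-Forward sub-tasks (12 to 14) combine transient-only buffering, fixed-interval retry, and parking after a maximum number of retries. The in-memory sketch below shows how those three rules interact. The outcome strings, class names, and the decision to park a message that turns permanent during a retry are assumptions. SQLite persistence and standby replication are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BufferedMessage:
    payload: str
    attempts: int = 0
    next_attempt_at: float = 0.0


class StoreAndForward:
    def __init__(self, send: Callable[[str], str],
                 retry_interval_s: float, max_retries: int) -> None:
        self.send = send  # returns "ok", "transient", or "permanent"
        self.retry_interval_s = retry_interval_s
        self.max_retries = max_retries
        self.pending: List[BufferedMessage] = []
        self.parked: List[BufferedMessage] = []

    def submit(self, payload: str, now: float) -> str:
        """Try once; buffer only when the failure is transient."""
        outcome = self.send(payload)
        if outcome == "transient":
            self.pending.append(
                BufferedMessage(payload, 0, now + self.retry_interval_s))
            return "buffered"
        return "delivered" if outcome == "ok" else "failed"

    def retry_due(self, now: float) -> None:
        """Retry every message whose fixed interval has elapsed."""
        remaining: List[BufferedMessage] = []
        for msg in self.pending:
            if msg.next_attempt_at > now:
                remaining.append(msg)
                continue
            msg.attempts += 1
            outcome = self.send(msg.payload)
            if outcome == "ok":
                continue  # delivered: drop from the buffer
            if outcome == "permanent" or msg.attempts >= self.max_retries:
                self.parked.append(msg)  # give up and park for operators
            else:
                msg.next_attempt_at = now + self.retry_interval_s
                remaining.append(msg)
        self.pending = remaining
```

With an interval of 10 seconds and `max_retries=3`, a message that keeps failing transiently is parked on the third retry, 30 seconds after it was first buffered.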
+
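Sub-tasks 2 and 3 of the Deployment Manager rely on a deployment ID and revision hash for idempotency and for reconciliation after a timeout or failover. A hedged sketch of the site-side behaviour follows. `DeployCommand`, `SiteDeploymentState`, and the return values are hypothetical names, and `apply_count` stands in for the real work of compiling scripts and creating actors.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DeployCommand:
    deployment_id: str
    instance_id: str
    revision_hash: str


class SiteDeploymentState:
    def __init__(self) -> None:
        self.applied: Dict[str, DeployCommand] = {}  # instance_id -> last command
        self.apply_count = 0

    def handle(self, cmd: DeployCommand) -> str:
        """Apply a deployment once; a resend of the same ID is a no-op."""
        current = self.applied.get(cmd.instance_id)
        if current is not None and current.deployment_id == cmd.deployment_id:
            return "already-applied"
        self.apply_count += 1  # placeholder for compile + actor creation
        self.applied[cmd.instance_id] = cmd
        return "applied"

    def query(self, instance_id: str) -> Optional[Tuple[str, str]]:
        """Report (deployment_id, revision_hash) so central can reconcile."""
        current = self.applied.get(instance_id)
        if current is None:
            return None
        return (current.deployment_id, current.revision_hash)
```

After a timeout, central could call `query` and compare the returned pair with what it sent, then decide whether to resend or mark the deployment as succeeded.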
+---
+
+### Phase 4: Minimal Operator/Admin UI
+
+**Goal**: Admins can manage sites and related configuration, and operators can monitor health and control instance lifecycle from the browser.
+
+**Components**:
+- Central UI (admin + operator workflows)
+
+**Testable Outcome**: Admin manages sites, data connections, areas, LDAP mappings, API keys. Operator sees health dashboard with live SignalR push, manages instance lifecycle, views deployment status.
+
+**HighLevelReqs Coverage**: 8 (partial — admin and operator workflows), 7.2 (API key management)
+
+**Plan Document**: `docs/plans/phase-4-operator-ui.md`
+
+**Sub-tasks**:
+1. Admin: Site management (CRUD)
+2. Admin: Data connection management (define, assign to sites)
+3. Admin: Area management (hierarchical CRUD)
+4. Admin: LDAP group mapping management
+5. Admin: API key management (create, enable/disable, delete)
+6. Operator: Health monitoring dashboard (live push via SignalR)
+7. Operator: Instance list with filtering (site, area, template, status)
+8. Operator: Deployment status view (live push)
+9. Operator: Instance actions (enable, disable, delete)
+10. Shared UX: Authorization-aware navigation
+11. Shared UX: Error/success notifications
+12. Shared UX: Long-running action state indicators
+
+---
+
+### Phase 5: Design-Time UI & Authoring Workflows
+
+**Goal**: Design users can author templates, scripts, and system definitions through the UI.
+
+**Note**: This phase authors **metadata/definitions only** for External System Gateway, DB Connections, and Notification Service. The runtime execution of these integrations is Phase 7. UI for these definitions does not depend on Phase 7 runtime.
+
+**Components**:
+- Central UI (design workflows)
+
+**Testable Outcome**: Full template authoring with hierarchy visualization, composition, lock indicators, on-demand validation. Shared script management. External system, DB connection, notification list definition management. Instance creation with per-attribute binding and bulk assignment.
+
+**HighLevelReqs Coverage**: 8 (design workflows), 3.1–3.11 (UI for template operations), 5.1, 5.5, 6.1, 7.4
+
+**Plan Document**: `docs/plans/phase-5-authoring-ui.md`
+
+**Sub-tasks**:
+1. Template authoring: CRUD, inheritance tree visualization
+2. Template authoring: Composition management with collision detection feedback
+3. Template authoring: Attribute/alarm/script editing with lock indicators
+4. Template authoring: Inherited vs. local vs. overridden visual indicators
+5. Template authoring: On-demand validation with actionable error display
+6. Shared script management (CRUD, compilation check)
+7. External system definition management (CRUD, method definitions) — metadata only
+8. Database connection definition management (CRUD) — metadata only
+9. Notification list management (CRUD, recipients, SMTP config) — metadata only
+10. Inbound API method definition management (CRUD) — metadata only
+11. Instance creation from template
+12. Instance per-attribute data connection binding with bulk assignment
+13. Instance attribute override editing
+
+---
+
+### Phase 6: Deployment Operations & Troubleshooting UI
+
+**Goal**: Complete the operational loop — deploy, diagnose, troubleshoot from central.
+
+**Components**:
+- Central UI (deployment + troubleshooting workflows)
+
+**Testable Outcome**: Full deployment workflow with diffs and validation gating. System-wide artifact deployment. Debug view with live streaming. Site event log viewer. Parked message management. Audit log viewer.
+
+**HighLevelReqs Coverage**: 8 (remaining workflows), 8.1, 10.1–10.3, 12.3
+
+**Plan Document**: `docs/plans/phase-6-deployment-ops-ui.md`
+
+**Sub-tasks**:
+1. Deployment: Staleness indicators (revision hash comparison)
+2. Deployment: Diff view (added/removed/changed members)
+3. Deployment: Deploy with pre-validation gating
+4. Deployment: Deployment status tracking (live SignalR push)
+5. System-wide artifact deployment with per-site status matrix
+6. Debug view: Instance selection, snapshot + live stream via SignalR
+7. Site event log viewer: Remote query with filters, pagination, keyword search
+8. Parked message management: Query, retry, discard
+9. Audit log viewer: Query with filters (user, entity type, action, time range)
+
+---
+
+### Phase 7: Integration Surfaces
+
+**Goal**: External systems can call in and site scripts can call out.
+
+**Components**:
+- Inbound API (full runtime)
+- External System Gateway (full site-side execution)
+- Notification Service (full site-side delivery)
+
+**Testable Outcome**: External systems call `POST /api/{method}` with X-API-Key auth. Site scripts use `ExternalSystem.Call()` / `CachedCall()` with HTTP/REST + JSON. Notifications sent via SMTP/OAuth2. Store-and-forward handles transient failures for CachedCall and notifications. Error classification works (transient → S&F, permanent → script).
+
+**HighLevelReqs Coverage**: 5.1–5.6, 6.1–6.4, 7.1–7.5
+
+**Plan Document**: `docs/plans/phase-7-integrations.md`
+
+**Sub-tasks**:
+1. Inbound API: ASP.NET endpoint registration, X-API-Key header auth
+2. Inbound API: Method routing, parameter validation (extended type system)
+3. Inbound API: Script execution engine on central
+4. Inbound API: Route.To() for cross-site calls (via Communication Layer)
+5. Inbound API: Batch attribute operations (GetAttributes/SetAttributes)
+6. Inbound API: Error handling (401/403/400/500), per-method timeout
+7. Inbound API: Failures-only logging
+8. External System Gateway: HTTP/REST client with JSON serialization
+9. External System Gateway: API key + Basic Auth outbound authentication
+10. External System Gateway: Per-system timeout
+11. External System Gateway: Dual call modes — Call() (synchronous) and CachedCall() (S&F on transient)
+12. External System Gateway: Error classification (5xx/408/429 transient, other 4xx permanent)
+13. External System Gateway: CachedCall idempotency documentation for callers
+14. External System Gateway: Dedicated blocking I/O dispatcher for Script Execution Actors
+15. External System Gateway: Database access — Connection() (ADO.NET pooling) and CachedWrite() (S&F)
+16. Notification Service: SMTP client with OAuth2 Client Credentials (M365) + Basic Auth
+17. Notification Service: Token lifecycle (fetch, cache, refresh on expiry)
+18. Notification Service: BCC delivery, plain text, from address
+19. Notification Service: Connection timeout + max concurrent connections
+20. Notification Service: Error classification (SMTP 4xx transient, 5xx permanent)
+21. Notification Service: Store-and-forward integration for transient failures
+22. End-to-end tests: Inbound API → Route.To() → site script → ExternalSystem.Call() → response
+23. End-to-end tests: Script → CachedCall → transient failure → S&F buffer → retry → success
+24. End-to-end tests: Script → Notify.To().Send() → SMTP delivery (or S&F on failure)
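Sub-tasks 12 and 20 define the error classification that decides whether a failure goes to store-and-forward or back to the script. The rules are small enough to sketch directly. The function names and the `success` and `unclassified` labels are assumptions, and codes the plan does not mention are reported as `unclassified` instead of being guessed.

```python
def classify_http(status: int) -> str:
    """External System Gateway rule: 5xx/408/429 transient, other 4xx permanent."""
    if 500 <= status < 600 or status in (408, 429):
        return "transient"   # eligible for store-and-forward
    if 400 <= status < 500:
        return "permanent"   # returned to the calling script
    if 200 <= status < 300:
        return "success"
    return "unclassified"    # not covered by the plan


def classify_smtp(reply_code: int) -> str:
    """Notification Service rule: SMTP 4xx transient, 5xx permanent."""
    if 400 <= reply_code < 500:
        return "transient"
    if 500 <= reply_code < 600:
        return "permanent"
    if 200 <= reply_code < 300:
        return "success"
    return "unclassified"


print(classify_http(429), classify_http(404))   # transient permanent
print(classify_smtp(451), classify_smtp(550))   # transient permanent
```

The two protocols treat the same numeric ranges in opposite ways, so the classifiers are kept as separate functions.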
+
+---
+
+### Phase 8: Production Readiness & Hardening
+
+**Goal**: System is validated at target scale and ready for production deployment.
+
+**Note**: This phase is for **comprehensive** resilience and scale testing. Basic failover testing is embedded in Phases 3A, 3B, and 3C. This phase validates the full system under realistic conditions.
+
+**Components**: Cross-cutting (all components)
+
+**Testable Outcome**: Documented pass on failover, recovery, scale, and security targets. No workflow requires direct DB access. Operational documentation complete.
+
+**HighLevelReqs Coverage**: 2.5 (scale verification), all sections (final verification pass)
+
+**Plan Document**: `docs/plans/phase-8-production-readiness.md`
+
+**Sub-tasks**:
+1. Full-system failover testing: Central failover (JWT survival, SignalR reconnect, deployment state)
+2. Full-system failover testing: Site failover (singleton migration, S&F takeover, DCL reconnect, alarm re-evaluation)
+3. Full-system failover testing: Dual-node failure and automatic recovery (both central and site)
+4. Load/performance testing at target scale (10 sites, 500 machines, 75 tags each)
+5. Security hardening: LDAPS verification, JWT signing key rotation procedures, secrets management
+6. Script sandboxing verification (forbidden API enforcement under adversarial conditions)
+7. Recovery drills: Mid-deploy failover, communication drops during artifact deployment, site restarts
+8. Observability validation: Structured log review, correlation ID coverage, health dashboard accuracy
+9. Message contract compatibility testing (version skew scenarios between central and sites)
+10. Installer/deployment packaging (Windows Service setup, appsettings templates)
+11. Operational runbooks and documentation
+
+---
+
+## Delivery Milestones
+
+| Milestone | Phases | Outcome |
+|-----------|--------|---------|
+| M1: Foundation | 0–1 | Central host, auth, DB, web shell, Akka bootstrap |
+| M2: Core Modeling | 2 | Template modeling, validation, deployment package contract |
+| M3: Site Runtime | 3A–3C | Full deployment pipeline, site execution, failover proven |
+| M4: Operator UI | 4 | Admin/operator workflows for pilot operations |
+| M5: Full Management | 5–6 | Complete central management and troubleshooting experience |
+| M6: Integration-Ready | 7 | External systems can call in, site scripts can call out |
+| M7: Production | 8 | Hardened, tested, documented for production |
+
+---
+
+## Plan Generation Procedure
+
+For each phase, the implementation plan document must contain:
+
+1. **Scope** — Components and features included
+2. **Prerequisites** — Which phases/components must be complete
+3. **Work Packages** — Numbered tasks with:
+   - Description
+   - Acceptance criteria
+   - Estimated complexity (S/M/L)
+   - Requirements traced (HighLevelReqs section + REQ-* IDs)
+4. **Test Strategy** — Unit, integration, and failover tests required
+5. **Verification Gate** — What must pass before the phase is considered complete
+6. **Open Questions** — Any ambiguities discovered, added to `questions.md`
+
+### Generation Steps
+
+1. Read the phase definition in this document
+2. Read all referenced Component-*.md documents
+3. Read referenced HighLevelReqs.md sections
+4. Cross-reference `requirements-traceability.md` to ensure coverage
+5. Break sub-tasks into concrete work packages with acceptance criteria
+6. Identify test scenarios
+7. Write the plan document to `docs/plans/phase-N-<name>.md`
+8. Update `requirements-traceability.md` with plan references
+9. Log any questions to `questions.md`
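Generation step 4 and guiding principle 3 both depend on every requirement mapping to at least one phase. A coverage check of that kind could be automated along the lines below. The function name and the sample identifiers in the usage lines are illustrative assumptions, not the contents of `requirements-traceability.md`.

```python
from typing import Dict, Iterable, List, Set


def unmapped_requirements(all_requirements: Iterable[str],
                          phase_coverage: Dict[str, Set[str]]) -> List[str]:
    """Return requirement IDs that no phase claims to cover."""
    covered: Set[str] = set().union(*phase_coverage.values())
    return sorted(set(all_requirements) - covered)


# Hypothetical sample data for illustration only.
requirements = ["13.1", "REQ-COM-1", "REQ-HOST-1", "REQ-HOST-4"]
coverage = {
    "phase-0": {"13.1", "REQ-COM-1", "REQ-HOST-1"},
    "phase-1": set(),
}
print(unmapped_requirements(requirements, coverage))  # ['REQ-HOST-4']
```

An empty result corresponds to the "zero unmapped requirements" state that the traceability matrix is meant to maintain.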
+
+---
+
+## File Index
+
+| File | Purpose |
+|------|---------|
+| `docs/plans/generate_plans.md` | This document — master plan for generating implementation plans |
+| `docs/plans/requirements-traceability.md` | Matrix linking every requirement to its implementation phase |
+| `docs/plans/questions.md` | Questions requiring follow-up before or during implementation |
+| `docs/plans/phase-0-solution-skeleton.md` | (To be generated) |
+| `docs/plans/phase-1-central-foundations.md` | (To be generated) |
+| `docs/plans/phase-2-modeling-validation.md` | (To be generated) |
+| `docs/plans/phase-3a-runtime-foundation.md` | (To be generated) |
+| `docs/plans/phase-3b-site-io-observability.md` | (To be generated) |
+| `docs/plans/phase-3c-deployment-store-forward.md` | (To be generated) |
+| `docs/plans/phase-4-operator-ui.md` | (To be generated) |
+| `docs/plans/phase-5-authoring-ui.md` | (To be generated) |
+| `docs/plans/phase-6-deployment-ops-ui.md` | (To be generated) |
+| `docs/plans/phase-7-integrations.md` | (To be generated) |
+| `docs/plans/phase-8-production-readiness.md` | (To be generated) |