diff --git a/CLAUDE.md b/CLAUDE.md index ea948b7..8813894 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -119,7 +119,9 @@ There is no source code in this project — only design documentation in markdow - Automatic dual-node recovery from persistent storage. ### UI & Monitoring -- Central UI: Blazor Server (ASP.NET Core + SignalR). Real-time push for debug view, health dashboard, deployment status. +- Central UI: Blazor Server (ASP.NET Core + SignalR) with Bootstrap CSS. No third-party component frameworks (no Blazorise, MudBlazor, Radzen, etc.). Build custom Blazor components for tables, grids, forms, etc. +- UI design: Clean, corporate, internal-use aesthetic. Not flashy. Use the `frontend-design` skill when designing UI pages/components. +- Real-time push for debug view, health dashboard, deployment status. - Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts per interval. - Dead letter monitoring as a health metric. - Site Event Logging: 30-day retention, 1GB storage cap, daily purge, paginated queries with keyword search. diff --git a/Component-ClusterInfrastructure.md b/Component-ClusterInfrastructure.md index ea22d62..526ae5c 100644 --- a/Component-ClusterInfrastructure.md +++ b/Component-ClusterInfrastructure.md @@ -32,7 +32,7 @@ Both central and site clusters. - The Site Runtime Deployment Manager runs as an **Akka.NET cluster singleton** on the active node, owning the full Instance Actor hierarchy. - One standby node receives replicated store-and-forward data and is ready to take over. - Connected to local SQLite databases (store-and-forward buffer, event logs, deployed configurations). -- Connected to machines via data connections (OPC UA, custom protocol). +- Connected to machines via data connections (OPC UA, LmxProxy). 
## Failover Behavior diff --git a/Component-DataConnectionLayer.md b/Component-DataConnectionLayer.md index 3aeefea..8b117f7 100644 --- a/Component-DataConnectionLayer.md +++ b/Component-DataConnectionLayer.md @@ -10,7 +10,7 @@ Site clusters only. Central does not interact with machines directly. ## Responsibilities -- Manage data connections defined at the site level (OPC UA servers, custom protocol endpoints). +- Manage data connections defined at the site level (OPC UA servers, LmxProxy endpoints). - Establish and maintain connections to data sources based on deployed instance configurations. - Subscribe to tag paths as requested by Instance Actors (based on attribute data source references in the flattened configuration). - Deliver tag value updates to the requesting Instance Actors. @@ -19,7 +19,7 @@ Site clusters only. Central does not interact with machines directly. ## Common Interface -Both OPC UA and the custom protocol implement the same interface: +Both OPC UA and LmxProxy implement the same interface: ``` IDataConnection @@ -34,15 +34,65 @@ IDataConnection Additional protocols can be added by implementing this interface. 
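To make the extension point concrete, here is a minimal C# sketch of what the interface and its supporting types could look like. The member names come from the operation list above and the common value tuple from the design; the exact signatures, the `TagValue` record, and the enum shapes are assumptions for illustration, not the project's actual code.

```
// Sketch only — member names from the design; signatures are assumed.
public interface IDataConnection
{
    Task Connect();
    Task Disconnect();
    Task<Guid> Subscribe(string tagPath, Action<TagValue> callback);
    Task Unsubscribe(Guid subscriptionId);
    Task<TagValue> Read(string tagPath);
    Task Write(string tagPath, object? value);
    ConnectionStatus Status { get; }
}

// Common value tuple {value, quality, timestamp} shared by all protocols.
public sealed record TagValue(object? Value, Quality Quality, DateTime TimestampUtc);

public enum Quality { Good, Uncertain, Bad }
public enum ConnectionStatus { Connected, Disconnected, Reconnecting }
```

A new protocol adapter implements these members and plugs into the DCL without any changes to Instance Actors.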
+### Concrete Type Mappings + +| IDataConnection | OPC UA SDK | LmxProxy SDK (`LmxProxyClient`) | +|---|---|---| +| `Connect()` | OPC UA session establishment | `ConnectAsync()` → gRPC `ConnectRequest`, server returns `SessionId` | +| `Disconnect()` | Close OPC UA session | `DisconnectAsync()` → gRPC `DisconnectRequest` | +| `Subscribe(tagPath, callback)` | OPC UA Monitored Items | `SubscribeAsync(addresses, onUpdate)` → server-streaming gRPC (`IAsyncEnumerable`) | +| `Unsubscribe(id)` | Remove Monitored Item | `ISubscription.DisposeAsync()` (cancels streaming RPC) | +| `Read(tagPath)` | OPC UA Read | `ReadAsync(address)` → `Vtq` | +| `Write(tagPath, value)` | OPC UA Write | `WriteAsync(address, value)` | +| `Status` | OPC UA session state | `IsConnected` property + keep-alive heartbeat (30-second interval via `GetConnectionStateAsync`) | + +### Common Value Type + +Both protocols produce the same value tuple consumed by Instance Actors: + +| Concept | ScadaLink Design | LmxProxy SDK (`Vtq`) | +|---|---|---| +| Value container | `{value, quality, timestamp}` | `Vtq(Value, Timestamp, Quality)` — readonly record struct | +| Quality | good / bad / uncertain | `Quality` enum (byte, OPC UA compatible: Good=0xC0, Bad=0x00, Uncertain=0x40) | +| Timestamp | UTC | `DateTime` (UTC) | +| Value type | object | `object?` (parsed: double, bool, string) | + ## Supported Protocols ### OPC UA - Standard OPC UA client implementation. - Supports subscriptions (monitored items) and read/write operations. -### Custom Protocol -- Proprietary protocol adapter. -- Implements the same subscription-based model as OPC UA. +### LmxProxy (Custom Protocol) + +LmxProxy is a gRPC-based protocol for communicating with LMX data servers. An existing client SDK (`LmxProxyClient` NuGet package) provides a production-ready implementation. + +**Transport & Connection**: +- gRPC over HTTP/2, using protobuf-net code-first contracts (service: `scada.ScadaService`). +- Default port: **5050**. 
+- Session-based: `ConnectAsync` returns a `SessionId` used for all subsequent operations. +- Keep-alive: 30-second heartbeat via `GetConnectionStateAsync`. On failure, the client marks itself disconnected and disposes subscriptions. + +**Authentication & TLS**: +- API key-based authentication (sent in `ConnectRequest`). +- Full TLS support: TLS 1.2/1.3, mutual TLS (client cert + key in PEM), custom CA trust, self-signed cert allowance for dev. + +**Subscriptions**: +- Server-streaming gRPC (`IAsyncEnumerable`). +- Configurable sampling interval (default: 1000ms; 0 = on-change). +- Wire format: `VtqMessage { Tag, Value (string), TimestampUtcTicks (long), Quality (string: "Good"/"Uncertain"/"Bad") }`. +- Subscription disposed via `ISubscription.DisposeAsync()`. + +**Additional Capabilities (beyond IDataConnection)**: +- `ReadBatchAsync(addresses)` — bulk read in a single gRPC call. +- `WriteBatchAsync(values)` — bulk write in a single gRPC call. +- `WriteBatchAndWaitAsync(values, flagAddress, flagValue, responseAddress, responseValue, timeout)` — write-and-poll pattern for handshake protocols (default timeout: 30s, poll interval: 100ms). +- Built-in retry policy via Polly: exponential backoff (base delay × 2^attempt), configurable max attempts (default: 3), applied to reads. Transient errors: `Unavailable`, `DeadlineExceeded`, `ResourceExhausted`, `Aborted`. +- Operation metrics: count, errors, p95/p99 latency (ring buffer of last 1000 samples per operation). +- Correlation ID propagation for distributed tracing (configurable header name). +- DI integration: `AddLmxProxyClient(IConfiguration)` binds to `"LmxProxy"` config section in `appsettings.json`. + +**SDK Reference**: The client SDK source is at `LmxProxyClient` in the ScadaBridge repository. The DCL's LmxProxy adapter wraps this SDK behind the `IDataConnection` interface. 
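Putting the pieces above together, a hedged sketch of the adapter's happy path (method names are taken from the mapping tables; parameter and return types are paraphrased and may not match the real `LmxProxyClient` surface exactly; the tag address is hypothetical):

```
// Sketch — paraphrased from the SDK surface described above.
await client.ConnectAsync();                     // gRPC ConnectRequest; client holds the returned SessionId

// Server-streaming subscription; updates arrive as Vtq(Value, Timestamp, Quality).
ISubscription sub = await client.SubscribeAsync(
    new[] { "Line1/Oven/Temperature" },          // hypothetical tag address
    onUpdate: vtq => instanceActor.Tell(vtq));   // forward to the requesting Instance Actor

Vtq current = await client.ReadAsync("Line1/Oven/Temperature");  // Polly retry on transient gRPC errors

await sub.DisposeAsync();                        // cancels the streaming RPC
await client.DisconnectAsync();                  // gRPC DisconnectRequest
```

In a host, `AddLmxProxyClient(IConfiguration)` registers the client and binds the `"LmxProxy"` section of `appsettings.json`; the DCL adapter then exposes this client behind `IDataConnection`.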
## Subscription Management @@ -77,12 +127,14 @@ Each data connection is managed by a dedicated connection actor that uses the Ak This pattern ensures no messages are lost during connection transitions and is the standard Akka.NET approach for actors with I/O lifecycle dependencies. +**LmxProxy-specific notes**: The LmxProxy connection actor holds the `SessionId` returned by `ConnectAsync` and passes it to all subsequent operations. On entering the **Connected** state, the actor starts the 30-second keep-alive timer. Subscriptions use server-streaming gRPC — the actor processes the `IAsyncEnumerable` stream and forwards updates to Instance Actors. On keep-alive failure, the actor transitions to **Reconnecting** and the client automatically disposes active subscriptions. + ## Connection Lifecycle & Reconnection The DCL manages connection lifecycle automatically: 1. **Connection drop detection**: When a connection to a data source is lost, the DCL immediately pushes a value update with quality `bad` for **every tag subscribed on that connection**. Instance Actors and their downstream consumers (alarms, scripts checking quality) see the staleness immediately. -2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. +2. **Auto-reconnect with fixed interval**: The DCL retries the connection at a configurable fixed interval (e.g., every 5 seconds). The retry interval is defined **per data connection**. This is consistent with the fixed-interval retry philosophy used throughout the system. **Note on LmxProxy**: The LmxProxy SDK includes its own retry policy (exponential backoff via Polly) for individual operations (reads). 
The DCL's fixed-interval reconnect owns **connection-level** recovery (re-establishing the gRPC session after a keep-alive failure or disconnect). The SDK's retry policy handles **operation-level** transient failures within an active session. These are complementary — the DCL does not disable the SDK's retry policy. 3. **Connection state transitions**: The DCL tracks each connection's state as `connected`, `disconnected`, or `reconnecting`. All transitions are logged to Site Event Logging. 4. **Transparent re-subscribe**: On successful reconnection, the DCL automatically re-establishes all previously active subscriptions for that connection. Instance Actors require no action — they simply see quality return to `good` as fresh values arrive from restored subscriptions. diff --git a/HighLevelReqs.md b/HighLevelReqs.md index 5e980c3..b954b1c 100644 --- a/HighLevelReqs.md +++ b/HighLevelReqs.md @@ -54,7 +54,7 @@ - Store-and-forward buffers are persisted to a **local SQLite database on each node** and replicated between nodes via application-level replication (see 1.3). ### 2.4 Data Connection Protocols -- The system supports **OPC UA** and a **custom protocol**. +- The system supports **OPC UA** and **LmxProxy** (a gRPC-based custom protocol with an existing client SDK). - Both protocols implement a **common interface** supporting: connect, subscribe to tag paths, receive value updates, and write values. - Additional protocols can be added by implementing the common interface. - The Data Connection Layer is a **clean data pipe** — it publishes tag value updates to Instance Actors but performs no evaluation of triggers or alarm conditions. diff --git a/docs/plans/generate_plans.md b/docs/plans/generate_plans.md index e2891a0..a38ba41 100644 --- a/docs/plans/generate_plans.md +++ b/docs/plans/generate_plans.md @@ -13,11 +13,13 @@ This document defines the phased implementation strategy for the ScadaLink SCADA 1. 
**Each phase produces a testable, working increment** — no phase ends with unverifiable work. 2. **Dependencies are respected** — no component is built before its dependencies. -3. **Requirements traceability** — every HighLevelReqs section and REQ-* identifier must map to at least one phase. See `docs/plans/requirements-traceability.md` for the full matrix. -4. **Questions are tracked** — any ambiguity discovered during plan generation is logged in `docs/plans/questions.md`. -5. **Plans are broken into implementable work packages** — each phase is subdivided into epics, each epic into concrete tasks with acceptance criteria. -6. **Failover and resilience are validated early** — not deferred to a final hardening phase. Each runtime phase includes failover acceptance criteria. -7. **Persistence/recovery semantics are defined before actor design** — Akka.NET actor protocols depend on recovery behavior. +3. **Requirements traceability at bullet level** — every individual requirement (each bullet point, sub-bullet, and constraint) in HighLevelReqs.md must map to at least one work package. Section-level mapping is insufficient — a section like "4.4 Script Capabilities" contains ~10 distinct requirements that may land in different phases. See `docs/plans/requirements-traceability.md` for the matrix. +4. **Design decision traceability** — the Key Design Decisions in CLAUDE.md and detailed design in Component-*.md documents contain implementation constraints not present in HighLevelReqs.md (e.g., Become/Stash pattern, staggered startup, Tell vs Ask conventions, forbidden script APIs). Each must trace to a work package. +5. **Split-section completeness** — when a HighLevelReqs section spans multiple phases, each phase's plan must explicitly list which bullets from that section it covers. The union across all phases must be the complete section with no gaps. +6. **Questions are tracked** — any ambiguity discovered during plan generation is logged in `docs/plans/questions.md`. 
+7. **Plans are broken into implementable work packages** — each phase is subdivided into epics, each epic into concrete tasks with acceptance criteria. +8. **Failover and resilience are validated early** — not deferred to a final hardening phase. Each runtime phase includes failover acceptance criteria. +9. **Persistence/recovery semantics are defined before actor design** — Akka.NET actor protocols depend on recovery behavior. --- @@ -442,26 +444,70 @@ For each phase, the implementation plan document must contain: 1. **Scope** — Components and features included 2. **Prerequisites** — Which phases/components must be complete -3. **Work Packages** — Numbered tasks with: +3. **Requirements Checklist** — A bullet-level checklist extracted from HighLevelReqs.md for every section this phase covers (see Bullet-Level Extraction below). Each bullet is a checkbox that must map to a work package. +4. **Design Constraints Checklist** — Applicable constraints from CLAUDE.md Key Design Decisions and Component-*.md documents, each mapped to a work package. +5. **Work Packages** — Numbered tasks with: - Description - - Acceptance criteria + - Acceptance criteria (must cover every checklist bullet mapped to this work package) - Estimated complexity (S/M/L) - - Requirements traced (HighLevelReqs section + REQ-* IDs) -4. **Test Strategy** — Unit, integration, and failover tests required -5. **Verification Gate** — What must pass before the phase is considered complete -6. **Open Questions** — Any ambiguities discovered, added to `questions.md` + - Requirements traced (HighLevelReqs bullet IDs + REQ-* IDs + design constraint refs) +6. **Test Strategy** — Unit, integration, and failover tests required +7. **Verification Gate** — What must pass before the phase is considered complete +8. 
**Open Questions** — Any ambiguities discovered, added to `questions.md` + +### Bullet-Level Extraction + +When a phase covers HighLevelReqs sections, the plan must decompose each section into its individual requirements: + +- Each bullet point or sub-bullet is a separate requirement line item. +- Each sentence within a bullet that introduces a distinct constraint or behavior is a separate requirement. +- Negative requirements ("cannot", "does not", "no") are explicit line items — they become acceptance criteria that verify the behavior is correctly prohibited. +- For sections split across phases, each phase lists only its bullets. After all plans covering that section are generated, verify the union is complete. + +**Example**: Section 4.4 "Script Capabilities" decomposes into: +- `[4.4-1]` Read attribute values (live + static) +- `[4.4-2]` Write attributes — data-sourced writes go to DCL, value updates on device confirm +- `[4.4-3]` Write attributes — static writes persist to SQLite, survive restart/failover, reset on redeploy +- `[4.4-4]` CallScript with ask pattern, concurrent execution +- `[4.4-5]` CallShared executes inline (no separate actor) +- `[4.4-6]` ExternalSystem.Call() synchronous +- `[4.4-7]` ExternalSystem.CachedCall() with S&F +- `[4.4-8]` Send notifications +- `[4.4-9]` Database.Connection() for raw ADO.NET access +- `[4.4-10]` Cannot access other instances' attributes or scripts + +### Design Constraint Extraction + +Each phase plan must also scan the following sources for implementation constraints relevant to its components: + +1. **CLAUDE.md → Key Design Decisions**: Each bullet is a constraint. Tag with `[KDD-category-N]` (e.g., `[KDD-runtime-8]` for "Staggered Instance Actor startup on failover"). +2. **Component-*.md documents**: Design details beyond what HighLevelReqs specifies (e.g., connection actor Become/Stash pattern, health report monotonic sequence numbers, 30s keep-alive interval). Tag with `[CD-ComponentName-N]`. 
+ +These are mapped to work packages and verified in acceptance criteria just like HighLevelReqs bullets. ### Generation Steps 1. Read the phase definition in this document 2. Read all referenced Component-*.md documents -3. Read referenced HighLevelReqs.md sections -4. Cross-reference `requirements-traceability.md` to ensure coverage -5. Break sub-tasks into concrete work packages with acceptance criteria -6. Identify test scenarios -7. Write the plan document to `docs/plans/phase-N-.md` -8. Update `requirements-traceability.md` with plan references -9. Log any questions to `questions.md` +3. Read referenced HighLevelReqs.md sections **line by line** — extract every bullet, sub-bullet, and constraint as a numbered requirement +4. Read CLAUDE.md Key Design Decisions — extract constraints relevant to this phase's components +5. Build the Requirements Checklist and Design Constraints Checklist +6. Break sub-tasks into concrete work packages with acceptance criteria, mapping every checklist item +7. Verify: every checklist item maps to at least one work package. Flag any orphans. +8. Identify test scenarios — negative requirements ("cannot", "does not") must have explicit test cases +9. Write the plan document to `docs/plans/phase-N-.md` +10. Update `requirements-traceability.md` with bullet-level references +11. Log any questions to `questions.md` + +### Post-Generation Verification (Orphan Check) + +After writing a phase plan, perform this verification before considering it complete: + +1. **Forward check**: Walk every item in the Requirements Checklist and Design Constraints Checklist. Confirm each maps to a work package with acceptance criteria that would fail if the requirement were not implemented. +2. **Reverse check**: Walk every work package. Confirm each traces back to at least one requirement or design constraint (no untraceable work). +3. 
**Split-section check**: For any HighLevelReqs section shared with another phase, list the bullets this phase does NOT cover and note which phase owns them. If a bullet is unowned, it's a gap — assign it. +4. **Negative requirement check**: Every "cannot", "does not", "no", "not" constraint has an acceptance criterion that verifies the prohibition (e.g., "Scripts cannot access other instances" → test that cross-instance access fails). +5. Record the verification result at the bottom of the plan document. --- diff --git a/docs/plans/questions.md b/docs/plans/questions.md index 828eff7..6728924 100644 --- a/docs/plans/questions.md +++ b/docs/plans/questions.md @@ -6,26 +6,11 @@ ## Open Questions -### Phase 3: Site Execution - -| # | Question | Context | Impact | Status | -|---|----------|---------|--------|--------| -| Q9 | What is the custom protocol? Is there an existing specification or SDK? | The design mentions "custom protocol" as a second adapter alongside OPC UA. Need details to implement. | Phase 3. | Deferred — owner will provide details later | - ### Phase 7: Integrations | # | Question | Context | Impact | Status | |---|----------|---------|--------|--------| -| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | Affects External System Gateway and Inbound API test scenarios. | Phase 7. | Open | -| Q12 | What Microsoft 365 tenant/app registration is available for SMTP OAuth2 testing? | Affects Notification Service OAuth2 implementation. | Phase 7. | Open | - -### Cross-Phase - -| # | Question | Context | Impact | Status | -|---|----------|---------|--------|--------| -| Q13 | Who is the development team? (Size, experience with Akka.NET, availability) | Affects phase sizing, parallelization, and timeline estimates. | All phases. | Open | -| Q14 | Is there an existing deployment target environment for early pilot testing? | Affects Phase 4 (minimal UI for pilot operations) timing. | Phase 4+. 
| Open | -| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | HighLevelReqs 2.1 mentions it but no component owns it. | Phase 3 or later. | Open | +| Q12 | What Microsoft 365 tenant/app registration is available for SMTP OAuth2 testing? | Affects Notification Service OAuth2 implementation. | Phase 7. | Deferred — won't be known during development. Implement against Basic Auth first; OAuth2 tested when tenant available. | --- @@ -42,3 +27,8 @@ | Q7 | JWT signing key storage? | `appsettings.json` (per environment). | 2026-03-16 | | Q8 | OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See `infra/opcua/nodes.json` and `test_infra_opcua.md`. | 2026-03-16 | | Q10 | Target site hardware? | Windows Server 2022, 24 GB RAM, 1 TB drive, 16-core Xeon. | 2026-03-16 | +| Q9 | What is the custom protocol? Is there an existing specification or SDK? | LmxProxy — gRPC-based protocol (protobuf-net code-first, port 5050, API key auth). Client SDK: `LmxProxyClient` NuGet package. See Component-DataConnectionLayer.md for full API mapping and protocol details. | 2026-03-16 | +| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | REST API test server (`infra/restapi/`) provides simulated external endpoints for External System Gateway and Inbound API testing. No real MES/recipe system needed for initial phases. | 2026-03-16 | +| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope — Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables/data in `infra/mssql/machinedata_seed.sql`. | 2026-03-16 | +| Q13 | Who is the development team? | Solo developer with extensive Akka.NET experience and full availability. No parallelization constraints — phases are sequential. | 2026-03-16 | +| Q14 | Is there an existing deployment target environment for early pilot testing? 
| Yes — developer can test directly. No separate pilot environment needed. | 2026-03-16 | diff --git a/docs/plans/requirements-traceability.md b/docs/plans/requirements-traceability.md index 8cf5db9..f617fed 100644 --- a/docs/plans/requirements-traceability.md +++ b/docs/plans/requirements-traceability.md @@ -1,6 +1,10 @@ # Requirements Traceability Matrix -**Purpose**: Ensures every requirement from HighLevelReqs.md and every REQ-* identifier maps to at least one implementation phase. Updated as plan documents are generated. +**Purpose**: Ensures every requirement from HighLevelReqs.md, every REQ-* identifier, and every design constraint from CLAUDE.md and Component-*.md maps to at least one work package in an implementation phase plan. Updated as plan documents are generated. + +**Traceability levels**: +- **Section-level** (this document): Maps HighLevelReqs sections, REQ-* IDs, and design constraints to phases. Serves as the index. +- **Bullet-level** (phase plan documents): Each phase plan contains a Requirements Checklist that decomposes its sections into individual bullets with `[section-N]` IDs, each mapped to a work package. The bullet-level detail lives in the plan documents, not here — this matrix tracks which sections are assigned and their verification status. --- @@ -102,9 +106,160 @@ --- +## Design Constraints → Phase Mapping + +Design decisions from CLAUDE.md Key Design Decisions and Component-*.md documents that impose implementation constraints beyond what HighLevelReqs specifies. Each is tagged `[KDD-category-N]` (Key Design Decision) or `[CD-Component-N]` (Component Design). Bullet-level extraction happens in the phase plan documents. 
+ +### Architecture & Runtime + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-runtime-1 | Instance modeled as Akka actor (Instance Actor) — single source of truth for runtime state | CLAUDE.md | 3A | Pending | +| KDD-runtime-2 | Site Runtime actor hierarchy: Deployment Manager singleton → Instance Actors → Script Actors + Alarm Actors | CLAUDE.md | 3A, 3B | Pending | +| KDD-runtime-3 | Script Actors spawn short-lived Script Execution Actors on dedicated blocking I/O dispatcher | CLAUDE.md | 3B | Pending | +| KDD-runtime-4 | Alarm Actors are separate peer subsystem from scripts | CLAUDE.md | 3B | Pending | +| KDD-runtime-5 | Shared scripts execute inline as compiled code (no separate actors) | CLAUDE.md | 3B | Pending | +| KDD-runtime-6 | Site-wide Akka stream for attribute value and alarm state changes with per-subscriber buffering | CLAUDE.md | 3B | Pending | +| KDD-runtime-7 | Instance Actors serialize all state mutations; concurrent scripts produce interleaved side effects | CLAUDE.md | 3B | Pending | +| KDD-runtime-8 | Staggered Instance Actor startup on failover to prevent reconnection storms | CLAUDE.md | 3A | Pending | +| KDD-runtime-9 | Supervision: Resume for coordinator actors, Stop for short-lived execution actors | CLAUDE.md | 3A | Pending | + +### Data & Communication + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-data-1 | DCL connection actor uses Become/Stash pattern for lifecycle state machine | CLAUDE.md, Component-DCL | 3B | Pending | +| KDD-data-2 | DCL auto-reconnect at fixed interval; immediate bad quality on disconnect; transparent re-subscribe | CLAUDE.md, Component-DCL | 3B | Pending | +| KDD-data-3 | DCL write failures returned synchronously to calling script | CLAUDE.md, Component-DCL | 3B | Pending | +| KDD-data-4 | Tag path resolution retried periodically for devices still booting | CLAUDE.md, Component-DCL | 3B | 
Pending | +| KDD-data-5 | Static attribute writes persisted to local SQLite (survive restart/failover, reset on redeployment) | CLAUDE.md | 3A | Pending | +| KDD-data-6 | All timestamps are UTC throughout the system | CLAUDE.md | 0 | Pending | +| KDD-data-7 | Tell for hot-path internal communication; Ask reserved for system boundaries | CLAUDE.md | 3A, 3B | Pending | +| KDD-data-8 | Application-level correlation IDs on all request/response messages | CLAUDE.md | 3B | Pending | + +### External Integrations + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-ext-1 | External System Gateway: HTTP/REST only, JSON serialization, API key + Basic Auth | CLAUDE.md | 7 | Pending | +| KDD-ext-2 | Dual call modes: Call() synchronous and CachedCall() store-and-forward | CLAUDE.md | 7 | Pending | +| KDD-ext-3 | Error classification: HTTP 5xx/408/429/connection = transient; other 4xx = permanent | CLAUDE.md | 7 | Pending | +| KDD-ext-4 | Notification Service: SMTP with OAuth2 Client Credentials (M365) or Basic Auth. 
BCC delivery, plain text | CLAUDE.md | 7 | Pending | +| KDD-ext-5 | Inbound API: POST /api/{methodName}, X-API-Key header, flat JSON, extended type system | CLAUDE.md | 7 | Pending | + +### Templates & Deployment + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-deploy-1 | Pre-deployment validation includes semantic checks (call targets, argument types, trigger operand types) | CLAUDE.md | 2 | Pending | +| KDD-deploy-2 | Composed member addressing: [ModuleInstanceName].[MemberName] | CLAUDE.md | 2 | Pending | +| KDD-deploy-3 | Override granularity defined per entity type and per field | CLAUDE.md | 2 | Pending | +| KDD-deploy-4 | Template graph acyclicity enforced on save | CLAUDE.md | 2 | Pending | +| KDD-deploy-5 | Flattened configs include revision hash for staleness detection | CLAUDE.md | 2 | Pending | +| KDD-deploy-6 | Deployment identity: unique deployment ID + revision hash for idempotency | CLAUDE.md | 3C | Pending | +| KDD-deploy-7 | Per-instance operation lock covers all mutating commands | CLAUDE.md | 3C | Pending | +| KDD-deploy-8 | Site-side apply is all-or-nothing per instance | CLAUDE.md | 3C | Pending | +| KDD-deploy-9 | System-wide artifact version skew across sites is supported | CLAUDE.md | 3C | Pending | +| KDD-deploy-10 | Last-write-wins for concurrent template editing | CLAUDE.md | 2 | Pending | +| KDD-deploy-11 | Optimistic concurrency on deployment status records | CLAUDE.md | 3C | Pending | +| KDD-deploy-12 | Naming collisions in composed feature modules are design-time errors | CLAUDE.md | 2 | Pending | + +### Store-and-Forward + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-sf-1 | Fixed retry interval, no max buffer size. 
Only transient failures buffered | CLAUDE.md | 3C | Pending | +| KDD-sf-2 | Async best-effort replication to standby (no ack wait) | CLAUDE.md | 3C | Pending | +| KDD-sf-3 | Messages not cleared on instance deletion | CLAUDE.md | 3C | Pending | +| KDD-sf-4 | CachedCall idempotency is the caller's responsibility | CLAUDE.md | 7 | Pending | + +### Security & Auth + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-sec-1 | Authentication: direct LDAP bind, no Kerberos/NTLM. LDAPS/StartTLS required | CLAUDE.md | 1 | Pending | +| KDD-sec-2 | JWT: HMAC-SHA256 shared symmetric key, 15-min expiry with sliding refresh, 30-min idle timeout | CLAUDE.md | 1 | Pending | +| KDD-sec-3 | LDAP failure: new logins fail; active sessions continue with current roles | CLAUDE.md | 1 | Pending | +| KDD-sec-4 | Load balancer in front of central UI; JWT + shared Data Protection keys for failover | CLAUDE.md | 1 | Pending | + +### Cluster & Failover + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-cluster-1 | Keep-oldest SBR with down-if-alone=on, 15s stable-after | CLAUDE.md | 3A | Pending | +| KDD-cluster-2 | Both nodes are seed nodes. min-nr-of-members=1 | CLAUDE.md | 3A | Pending | +| KDD-cluster-3 | Failure detection: 2s heartbeat, 10s threshold. 
Total failover ~25s | CLAUDE.md | 3A | Pending | +| KDD-cluster-4 | CoordinatedShutdown for graceful singleton handover | CLAUDE.md | 3A | Pending | +| KDD-cluster-5 | Automatic dual-node recovery from persistent storage | CLAUDE.md | 3A | Pending | + +### UI & Monitoring + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-ui-1 | Central UI: Blazor Server (ASP.NET Core + SignalR) | CLAUDE.md | 1 | Pending | +| KDD-ui-2 | Real-time push for debug view, health dashboard, deployment status | CLAUDE.md | 3B, 6 | Pending | +| KDD-ui-3 | Health reports: 30s interval, 60s offline threshold, monotonic sequence numbers, raw error counts | CLAUDE.md | 3B | Pending | +| KDD-ui-4 | Dead letter monitoring as health metric | CLAUDE.md | 3B | Pending | +| KDD-ui-5 | Site Event Logging: 30-day retention, 1GB cap, daily purge, paginated queries with keyword search | CLAUDE.md | 3B | Pending | + +### Code Organization + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| KDD-code-1 | Entity classes are persistence-ignorant POCOs in Commons; EF mappings in Configuration Database | CLAUDE.md | 0, 1 | Pending | +| KDD-code-2 | Repository interfaces in Commons; implementations in Configuration Database | CLAUDE.md | 0, 1 | Pending | +| KDD-code-3 | Commons namespace hierarchy: Types/, Interfaces/, Entities/, Messages/ with domain area subfolders | CLAUDE.md | 0 | Pending | +| KDD-code-4 | Message contracts follow additive-only evolution rules | CLAUDE.md | 0 | Pending | +| KDD-code-5 | Per-component configuration via appsettings.json sections bound to options classes | CLAUDE.md | 0, 1 | Pending | +| KDD-code-6 | Options classes owned by component projects, not Commons | CLAUDE.md | 0 | Pending | +| KDD-code-7 | Host readiness gating: /health/ready endpoint, no traffic until operational | CLAUDE.md | 1 | Pending | +| KDD-code-8 | EF Core migrations: auto-apply in dev, manual SQL 
scripts for production | CLAUDE.md | 1 | Pending | +| KDD-code-9 | Script trust model: forbidden APIs (System.IO, Process, Threading, Reflection, raw network) | CLAUDE.md | 3B | Pending | + +### LmxProxy Protocol (Component Design) + +| ID | Constraint | Source | Phase(s) | Status | +|----|-----------|--------|----------|--------| +| CD-DCL-1 | LmxProxy: gRPC/HTTP/2 transport, protobuf-net code-first, port 5050 | Component-DCL | 3B | Pending | +| CD-DCL-2 | LmxProxy: API key auth, session-based (SessionId), 30s keep-alive heartbeat | Component-DCL | 3B | Pending | +| CD-DCL-3 | LmxProxy: Server-streaming gRPC for subscriptions, 1000ms default sampling | Component-DCL | 3B | Pending | +| CD-DCL-4 | LmxProxy: SDK retry policy (exponential backoff) complements DCL's fixed-interval reconnect | Component-DCL | 3B | Pending | +| CD-DCL-5 | LmxProxy: Batch read/write capabilities (ReadBatchAsync, WriteBatchAsync) | Component-DCL | 3B | Pending | +| CD-DCL-6 | LmxProxy: TLS 1.2/1.3, mutual TLS, self-signed for dev | Component-DCL | 3B | Pending | + +--- + +## Split-Section Tracking + +Sections that span multiple phases. When phase plans are generated, this table tracks which bullets each phase owns. 
**The union must equal the full section — no gaps.** + +| Section | Description | Phase Split | Bullet-Level Verified | +|---------|-------------|------------|----------------------| +| 1.2 | Failover | 3A (site failover mechanics), 8 (full-system validation) | Not yet | +| 1.4 | Deployment Behavior | 3C (pipeline), 6 (UI) | Not yet | +| 1.5 | System-Wide Artifact Deployment | 3C (backend), 6 (UI) | Not yet | +| 3.3 | Data Connections | 2 (model/binding), 3B (runtime) | Not yet | +| 3.8.1 | Instance Lifecycle | 3C (backend), 4 (UI) | Not yet | +| 3.9 | Deployment & Change Propagation | 3C (pipeline), 6 (UI) | Not yet | +| 3.10 | Areas | 2 (model), 4 (UI) | Not yet | +| 4.1 | Script Definitions | 2 (model), 3B (runtime) | Not yet | +| 4.4 | Script Capabilities | 3B (core: read/write/call), 7 (external/notify/DB) | Not yet | +| 5.1 | External System Definitions | 5 (UI), 7 (runtime) | Not yet | +| 5.3 | S&F for External Calls | 3C (engine), 7 (integration) | Not yet | +| 5.4 | Parked Message Management | 3C (backend), 6 (UI) | Not yet | +| 5.5 | Database Connections | 5 (UI), 7 (runtime) | Not yet | +| 6.1 | Notification Lists | 5 (UI), 7 (runtime) | Not yet | +| 7.4 | API Method Definitions | 5 (UI), 7 (runtime) | Not yet | +| 8 | Central UI | 4, 5, 6 (split by workflow type) | Not yet | + +--- + ## Coverage Verification **HighLevelReqs sections**: 54 sections mapped. **0 unmapped.** **REQ-* identifiers**: 22 identifiers mapped. **0 unmapped.** +**Design constraints (KDD-*)**: 61 constraints mapped. **0 unmapped.** +**Component design constraints (CD-*)**: 6 constraints mapped. **0 unmapped.** +**Split sections**: 16 identified. **0 bullet-level verified** (verified when phase plans are generated). -All requirements have at least one phase assignment. Coverage will be re-verified as each phase plan is generated. +All requirements and constraints have at least one phase assignment. 
Bullet-level verification occurs during phase plan generation — each plan document contains its own Requirements Checklist and Design Constraints Checklist with forward/reverse tracing to work packages.