scadalink-design/docs/plans/questions.md
Joseph Doherty 021817930b Generate all 11 phase implementation plans with bullet-level requirement traceability
All phases (0-8) now have detailed implementation plans with:
- Bullet-level requirement extraction from HighLevelReqs sections
- Design constraint traceability (KDD + Component Design)
- Work packages with acceptance criteria mapped to every requirement
- Split-section ownership verified across phases
- Orphan checks (forward, reverse, negative) all passing
- Codex MCP (gpt-5.4) external verification completed per phase

Total: 7,549 lines across 11 plan documents, ~160 work packages,
~400 requirements traced, ~25 open questions logged for follow-up.
2026-03-16 15:34:54 -04:00


Implementation Questions

Purpose: Track questions and ambiguities discovered during plan generation that require follow-up before or during implementation.


Open Questions

Phase 0: Solution Skeleton

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q16 | Should `Result<T>` use a OneOf-style library or be hand-rolled? | Affects COM-7-1 (minimal dependencies). A hand-rolled `Result<T>` keeps zero external dependencies. | Phase 0. | Recommend hand-rolled to maintain zero-dependency constraint. |
| Q17 | Should entity POCO properties be required (init-only) or settable? | EF Core Fluent API mapping may need settable properties. POCOs must be persistence-ignorant but still mappable by Phase 1. | Phase 0 / Phase 1 boundary. | Recommend `{ get; set; }` for EF compatibility, with constructor invariants for required fields. |
| Q18 | What `QualityCode` values should the protocol abstraction define? | OPC UA has a rich quality model (Good, Uncertain, Bad with subtypes). Need to decide on a simplified shared set. | Phase 0. | Recommend: Good, Bad, Uncertain as the minimal set, with room to extend. |
| Q19 | Should `IDataConnection` be `IAsyncDisposable` for connection cleanup? | Affects DCL connection actor lifecycle. | Phase 0 / Phase 3B boundary. | Recommend yes: add `IAsyncDisposable` to support proper cleanup. |
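
As an illustration of Q18, the three-value set could be derived from an OPC UA status code by inspecting only its severity bits. This is a minimal sketch in Python rather than the project's C#. The enum member names and the function `quality_from_opcua_status` are hypothetical. The one fact taken from OPC UA is that the two most significant bits of the 32-bit status code encode severity (00 Good, 01 Uncertain, 10 Bad).

```python
from enum import Enum


class QualityCode(Enum):
    """Minimal shared quality set recommended in Q18 (member names are assumptions)."""
    GOOD = "Good"
    UNCERTAIN = "Uncertain"
    BAD = "Bad"


def quality_from_opcua_status(status_code: int) -> QualityCode:
    """Collapse a 32-bit OPC UA StatusCode to the simplified set.

    Only the two most significant bits (severity) are inspected;
    all subtype information is deliberately dropped.
    """
    severity = (status_code >> 30) & 0b11
    if severity == 0b00:
        return QualityCode.GOOD
    if severity == 0b01:
        return QualityCode.UNCERTAIN
    # 0b10 is Bad; 0b11 is reserved and is treated as Bad here (an assumption).
    return QualityCode.BAD
```

Keeping the mapping in one function leaves room to extend the set later without touching callers, which matches the "room to extend" recommendation.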

Phase 1: Central Platform Foundations

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P1-1 | Should Data Protection keys be stored in the configuration database (via EF Core Data Protection key store) or on a shared filesystem path? | WP-10 requires both central nodes share Data Protection keys. DB storage is more portable; filesystem requires shared mount. | Implementation detail for WP-10. Either approach works. | Open: decide during implementation. Default to DB storage. |

Phase 2: Core Modeling, Validation & Deployment Contract

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P2-1 | What hashing algorithm should be used for revision hashes? | SHA-256 is the likely choice for determinism and collision resistance. | WP-16. Low risk: the algorithm can be changed without API impact. | Open: proceed with SHA-256 as default. |
| Q-P2-2 | What serialization format should the deployment package contract use? | JSON is most natural for .NET; MessagePack is more compact. Decision affects Site Runtime deserialization. | WP-17. Medium: the format must be stable once sites consume it. | Open: recommend JSON for debuggability; a binary format can be added later. |
| Q-P2-3 | How should script pre-compilation handle references to runtime APIs (`GetAttribute`, `SetAttribute`, etc.) that don't exist at compile time on central? | Scripts reference runtime APIs only available at site. Central needs stubs. | WP-18, WP-19. Must be addressed before script compilation validation works. | Open: implement compilation against a stub ScriptApi assembly. |
| Q-P2-4 | Should semantic validation for `CallShared` resolve against the shared script library at validation time, or against the deployed version at the target site? | Shared scripts may be modified between validation and deployment. | WP-19. Low risk if validation re-runs before deployment. | Open: validate against current library; document re-validation on deploy. |
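
The SHA-256 default from Q-P2-1 and the JSON recommendation from Q-P2-2 combine naturally: a revision hash is only deterministic if the bytes being hashed are canonical. The sketch below is in Python rather than the project's C#, and the function name `revision_hash` and the canonical form (sorted keys, compact separators, UTF-8) are assumptions for illustration, not something the plans specify.

```python
import hashlib
import json


def revision_hash(config: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of config.

    Sorting keys and fixing the separators makes the digest independent of
    dictionary insertion order and of incidental whitespace.
    """
    canonical = json.dumps(
        config,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two configurations that differ only in key order hash identically, while any value change produces a different digest. A real contract would also need to pin down number and date formatting, which this sketch does not address.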

Phase 3A: Runtime Foundation

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3A-1 | What is the optimal batch size and delay for staggered Instance Actor startup? | Component-SiteRuntime.md suggests 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to 20/100ms, make configurable. | Deferred: tune during Phase 3B when DCL is integrated. |
| Q-P3A-2 | Should the SQLite schema use a single database file or separate files per concern (configs, overrides, S&F, events)? | Single file is simpler. Separate files isolate concerns and allow independent backup/maintenance. | Schema design. | Recommend single file with separate tables. Simpler transaction management. Final decision during implementation. |
| Q-P3A-3 | Should Akka.Persistence (event sourcing / snapshotting) be used for the Deployment Manager singleton, or is direct SQLite access sufficient? | Akka.Persistence adds complexity (journal, snapshots) but provides built-in recovery. Direct SQLite is simpler for this use case. | Architecture. | Recommend direct SQLite: Deployment Manager recovery is a full read-all-configs-and-rebuild pattern, not event replay. |
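
The staggered startup in Q-P3A-1 amounts to starting instances in fixed-size batches with a pause between batches. The following is a minimal sketch in Python rather than Akka.NET, using the 20 / 100 ms defaults from the table. The names `start_staggered` and `start_instance` are hypothetical, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def start_staggered(
    instance_ids: Sequence[str],
    start_instance: Callable[[str], None],
    batch_size: int = 20,
    delay_seconds: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start instances batch by batch and return the number of batches.

    The delay is applied between batches only, never before the first one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch_count = 0
    for batch in batches(instance_ids, batch_size):
        if batch_count > 0:
            sleep(delay_seconds)
        for instance_id in batch:
            start_instance(instance_id)
        batch_count += 1
    return batch_count
```

With 45 instances and the defaults, this produces three batches (20, 20, 5) and two 100 ms pauses. Both values are parameters, in line with the "make configurable" note.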

Phase 3B: Site I/O & Observability

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3B-1 | What is the exact dedicated blocking I/O dispatcher configuration for Script Execution Actors? | KDD-runtime-3 says "dedicated blocking I/O dispatcher"; need Akka.NET HOCON config (thread pool size, throughput settings). | WP-15. Sensible defaults can be set; tuned in Phase 8. | Deferred: use Akka.NET default blocking-io-dispatcher config; tune during Phase 8 performance testing. |
| Q-P3B-2 | Should LmxProxy adapter expose `WriteBatchAndWaitAsync` (write-and-poll handshake) through `IDataConnection` or as a protocol-specific extension? | CD-DCL-5 lists `WriteBatchAndWaitAsync` but `IDataConnection` only defines simple Write. | WP-8. Does not block core functionality. | Deferred: expose as protocol-specific extension method; not part of `IDataConnection` core contract. |
| Q-P3B-3 | What is the Rate of Change alarm evaluation time window? | Section 3.4 says "changes faster than a defined threshold" but does not specify the time window (per-second? per-minute? configurable?). | WP-16. Needs a design decision for the evaluation algorithm. | Deferred: implement as configurable window (default: per-second rate). Document in alarm definition schema. |
| Q-P3B-4 | How does the health report sequence number behave across failover? | Sequence number is monotonic within a singleton lifecycle. After failover, the new singleton starts at 1. Central must handle this. | WP-27, WP-28. | Resolved in design: central accepts report when site is offline; for online sites, requires seq > last. On failover, site goes offline first (missed reports), so the reset is naturally handled. |
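
The acceptance rule recorded for Q-P3B-4 can be written down in a few lines. This is a sketch in Python rather than the project's C#. The type `SiteHealthState` and the functions `accept_report` and `mark_offline` are hypothetical names, and how central decides that a site has gone offline (missed reports) is outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class SiteHealthState:
    """What central remembers about one site (field names are assumptions)."""
    online: bool = False
    last_sequence: int = 0


def accept_report(state: SiteHealthState, sequence: int) -> bool:
    """Apply the rule: offline sites accept any report, online sites need seq > last.

    Returns True and updates the state when the report is accepted.
    """
    if state.online and sequence <= state.last_sequence:
        return False  # stale or duplicate report from the current singleton
    state.online = True
    state.last_sequence = sequence
    return True


def mark_offline(state: SiteHealthState) -> None:
    """Called when central has missed enough reports to consider the site offline."""
    state.online = False
```

After a failover the site is marked offline first, so the new singleton's report with sequence 1 is accepted even though the previous `last_sequence` was higher.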

Phase 3C: Deployment Pipeline & Store-and-Forward

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3C-1 | Should S&F retry timers be reset on failover or continue from the last known retry timestamp? | On failover, the new active node loads buffer from SQLite. Messages have `last_attempt_at` timestamps. Should retry timing continue relative to `last_attempt_at` or reset to "now"? | Affects retry behavior immediately after failover. Recommend: continue from `last_attempt_at` to avoid burst retries. | Open |
| Q-P3C-2 | What is the maximum number of parked messages returned in a single remote query? | Communication Layer pattern 8 uses 30s timeout. Very large parked message sets may need pagination. | Recommend: paginated query (e.g., 100 per page) consistent with Site Event Logging pagination pattern. | Open |
| Q-P3C-3 | Should the per-instance operation lock be in-memory (lost on central failover) or persisted? | In-memory is simpler and consistent with "in-progress deployments treated as failed on failover." Persisted lock could cause orphan locks. | Recommend: in-memory. On failover, all locks released. Site state query resolves any ambiguity. | Open |
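
The recommendation for Q-P3C-1, continuing from `last_attempt_at` instead of resetting to "now", comes down to how the next attempt time is computed when the buffer is reloaded. The sketch below is Python rather than the project's C#. The function names and the fixed `retry_interval` are assumptions, and treating a never-attempted message as immediately due is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_attempt_at(
    last_attempt_at: Optional[datetime],
    retry_interval: timedelta,
    now: datetime,
) -> datetime:
    """Compute when a buffered message should next be retried.

    The schedule is anchored on the persisted last_attempt_at, so a failover
    does not shift it. Messages that were never attempted are due at once.
    """
    if last_attempt_at is None:
        return now
    return last_attempt_at + retry_interval


def is_due(
    last_attempt_at: Optional[datetime],
    retry_interval: timedelta,
    now: datetime,
) -> bool:
    """True when the message should be retried at time now."""
    return next_attempt_at(last_attempt_at, retry_interval, now) <= now
```

With a 30-second interval, a message last tried 10 seconds before failover waits another 20 seconds on the new node, so messages loaded together are not all retried in one burst.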

Phase 4: Operator/Admin UI

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or allow user-provided values? | Component-InboundAPI.md says "key value" but does not specify generation. | Phase 4, WP-5. | Open: assume auto-generated with optional copy-to-clipboard; user can regenerate. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30s report interval? | Component-HealthMonitoring.md specifies 30s default interval. | Phase 4, WP-9. | Open: assume display updates on every report arrival (no UI-side polling); interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | HighLevelReqs 3.10 says "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open: assume cascade delete of child areas (if no instances assigned to any area in the subtree). |
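
The working assumption for Q-P4-3 (cascade delete, but only when no instance is assigned anywhere in the subtree) can be sketched as a subtree walk followed by a guard. This is Python rather than the project's C#, and the in-memory maps `parent_of` (area id to parent id) and `instance_area` (instance name to area id) are hypothetical stand-ins for the real entities.

```python
from typing import Dict, List, Optional, Set


def subtree_ids(area_id: int, parent_of: Dict[int, Optional[int]]) -> Set[int]:
    """Collect area_id and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for child, parent in parent_of.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    found: Set[int] = set()
    stack = [area_id]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(children.get(current, []))
    return found


def delete_area_cascade(
    area_id: int,
    parent_of: Dict[int, Optional[int]],
    instance_area: Dict[str, int],
) -> Set[int]:
    """Delete an area and its descendants, refusing if any instance is assigned.

    Returns the set of deleted area ids; parent_of is modified in place.
    """
    if area_id not in parent_of:
        raise KeyError(area_id)
    doomed = subtree_ids(area_id, parent_of)
    blocking = sorted(name for name, area in instance_area.items() if area in doomed)
    if blocking:
        raise ValueError(f"area subtree still has instances: {blocking}")
    for area in doomed:
        del parent_of[area]
    return doomed
```

The guard runs before any deletion, so a blocked request leaves the hierarchy untouched.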

Phase 7: Integrations

| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q12 | What Microsoft 365 tenant/app registration is available for SMTP OAuth2 testing? | Affects Notification Service OAuth2 implementation. | Phase 7. | Deferred: won't be known during development. Implement against Basic Auth first; OAuth2 tested when tenant available. |

Resolved Questions

| # | Question | Resolution | Date |
|---|---|---|---|
| Q1 | What .NET version should we target? | .NET 10 LTS (released November 2025, supported through 2028). | 2026-03-16 |
| Q2 | What Akka.NET version? | Latest stable 1.5.x (currently 1.5.62). | 2026-03-16 |
| Q3 | Monorepo or separate repos? | Single monorepo with SLNX solution file (.slnx, the new XML-based format default in .NET 10). | 2026-03-16 |
| Q4 | What CI/CD platform? | None for now. No CI/CD pipeline. | 2026-03-16 |
| Q5 | What LDAP server for dev/test? | GLAuth (lightweight LDAP) in Docker. See infra/glauth/config.toml and test_infra_ldap.md. | 2026-03-16 |
| Q6 | What MS SQL version and hosting? | SQL Server 2022 Developer Edition in Docker. See infra/docker-compose.yml and test_infra_db.md. | 2026-03-16 |
| Q7 | JWT signing key storage? | appsettings.json (per environment). | 2026-03-16 |
| Q8 | OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See infra/opcua/nodes.json and test_infra_opcua.md. | 2026-03-16 |
| Q10 | Target site hardware? | Windows Server 2022, 24 GB RAM, 1 TB drive, 16-core Xeon. | 2026-03-16 |
| Q9 | What is the custom protocol? Is there an existing specification or SDK? | LmxProxy, a gRPC-based protocol (protobuf-net code-first, port 5050, API key auth). Client SDK: LmxProxyClient NuGet package. See Component-DataConnectionLayer.md for full API mapping and protocol details. | 2026-03-16 |
| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | REST API test server (infra/restapi/) provides simulated external endpoints for External System Gateway and Inbound API testing. No real MES/recipe system needed for initial phases. | 2026-03-16 |
| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope: Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables/data in infra/mssql/machinedata_seed.sql. | 2026-03-16 |
| Q13 | Who is the development team? | Solo developer with extensive Akka.NET experience and full availability. No parallelization constraints; phases are sequential. | 2026-03-16 |
| Q14 | Is there an existing deployment target environment for early pilot testing? | Yes: developer can test directly. No separate pilot environment needed. | 2026-03-16 |