All phases (0-8) now have detailed implementation plans with:
- Bullet-level requirement extraction from HighLevelReqs sections
- Design constraint traceability (KDD + Component Design)
- Work packages with acceptance criteria mapped to every requirement
- Split-section ownership verified across phases
- Orphan checks (forward, reverse, negative) all passing
- Codex MCP (gpt-5.4) external verification completed per phase

Total: 7,549 lines across 11 plan documents, ~160 work packages, ~400 requirements traced, ~25 open questions logged for follow-up.
Implementation Questions
Purpose: Track questions and ambiguities discovered during plan generation that require follow-up before or during implementation.
Open Questions
Phase 0: Solution Skeleton
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q16 | Should Result<T> use a OneOf-style library or be hand-rolled? | Affects COM-7-1 (minimal dependencies). A hand-rolled Result<T> keeps zero external dependencies. | Phase 0. | Recommend hand-rolled to maintain the zero-dependency constraint. |
| Q17 | Should entity POCO properties be required (init-only) or settable? | EF Core Fluent API mapping may need settable properties. POCOs must be persistence-ignorant but still mappable by Phase 1. | Phase 0 / Phase 1 boundary. | Recommend { get; set; } for EF compatibility, with constructor invariants for required fields. |
| Q18 | What QualityCode values should the protocol abstraction define? | OPC UA has a rich quality model (Good, Uncertain, Bad with subtypes). Need to decide on a simplified shared set. | Phase 0. | Recommend: Good, Bad, Uncertain as the minimal set, with room to extend. |
| Q19 | Should IDataConnection be IAsyncDisposable for connection cleanup? | Affects DCL connection actor lifecycle. | Phase 0 / Phase 3B boundary. | Recommend yes: add IAsyncDisposable to support proper cleanup. |
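
Q16's hand-rolled option does not need to be large. Below is a minimal sketch of a dependency-free result type. It is written in Python only as a neutral illustration, since the project itself is .NET. The names `Result`, `ok`, `fail`, and `map` are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Hypothetical success-or-failure value with no third-party dependency."""
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def is_success(self) -> bool:
        # A result is successful exactly when no error message is attached.
        return self.error is None

    @staticmethod
    def ok(value: T) -> "Result[T]":
        return Result(value=value)

    @staticmethod
    def fail(error: str) -> "Result[T]":
        # Constructor invariant: a failure must explain itself.
        if not error:
            raise ValueError("a failure must carry a non-empty error message")
        return Result(error=error)

    def map(self, fn: Callable[[T], U]) -> "Result[U]":
        # Apply fn only on success; a failure propagates unchanged.
        if self.is_success:
            return Result.ok(fn(self.value))
        return Result(error=self.error)
```

A success flag, a value, and an error are enough for a first version. The same shape would likely translate to a small generic C# class or record, which keeps COM-7-1's zero-dependency constraint intact.
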
Phase 1: Central Platform Foundations
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P1-1 | Should Data Protection keys be stored in the configuration database (via the EF Core Data Protection key store) or on a shared filesystem path? | WP-10 requires both central nodes to share Data Protection keys. DB storage is more portable; a filesystem store requires a shared mount. | Implementation detail for WP-10. Either approach works. | Open; decide during implementation. Default to DB storage. |
Phase 2: Core Modeling, Validation & Deployment Contract
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P2-1 | What hashing algorithm should be used for revision hashes? | SHA-256 is the likely choice for determinism and collision resistance. | WP-16. Low risk: the algorithm can be changed without API impact. | Open; proceed with SHA-256 as the default. |
| Q-P2-2 | What serialization format should the deployment package contract use? | JSON is the most natural choice for .NET; MessagePack is more compact. The decision affects Site Runtime deserialization. | WP-17. Medium risk: the format must be stable once sites consume it. | Open; recommend JSON for debuggability, with a binary format possible later. |
| Q-P2-3 | How should script pre-compilation handle references to runtime APIs (GetAttribute, SetAttribute, etc.) that don't exist at compile time on central? | Scripts reference runtime APIs only available at site. Central needs stubs. | WP-18, WP-19. Must be addressed before script compilation validation works. | Open — implement compilation against a stub ScriptApi assembly. |
| Q-P2-4 | Should semantic validation for CallShared resolve against the shared script library at validation time, or against the version deployed at the target site? | Shared scripts may be modified between validation and deployment. | WP-19. Low risk if validation re-runs before deployment. | Open; validate against the current library and document re-validation on deploy. |
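
Q-P2-1 and Q-P2-2 interact. A SHA-256 revision hash is only stable if the same configuration always serializes to the same bytes. The following is a minimal Python sketch under one assumption, not a decided format: the package is JSON, and "canonical" means sorted keys, no insignificant whitespace, and UTF-8. `canonical_json` and `revision_hash` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any


def canonical_json(package: Any) -> bytes:
    # Sorted keys and compact separators make the bytes independent of
    # dict insertion order and of pretty-printing choices.
    text = json.dumps(package, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def revision_hash(package: Any) -> str:
    """Hex-encoded SHA-256 of the canonical JSON form (hypothetical helper)."""
    return hashlib.sha256(canonical_json(package)).hexdigest()
```

List order is preserved, so reordering a list changes the hash. Whether that is desirable for each list in the package is a design choice. Keeping the algorithm behind one function also matches Q-P2-1's note that it could later be changed without API impact.
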
Phase 3A: Runtime Foundation
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3A-1 | What is the optimal batch size and delay for staggered Instance Actor startup? | Component-SiteRuntime.md suggests batches of 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to batches of 20 with a 100 ms delay; make both configurable. | Deferred; tune during Phase 3B once the DCL is integrated. |
| Q-P3A-2 | Should the SQLite schema use a single database file or separate files per concern (configs, overrides, S&F, events)? | A single file is simpler. Separate files isolate concerns and allow independent backup/maintenance. | Schema design. | Recommend a single file with separate tables, for simpler transaction management. Final decision during implementation. |
| Q-P3A-3 | Should Akka.Persistence (event sourcing / snapshotting) be used for the Deployment Manager singleton, or is direct SQLite access sufficient? | Akka.Persistence adds complexity (journal, snapshots) but provides built-in recovery. Direct SQLite is simpler for this use case. | Architecture. | Recommend direct SQLite — Deployment Manager recovery is a full read-all-configs-and-rebuild pattern, not event replay. |
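
The staggered startup default in Q-P3A-1 can be sketched as a simple batch scheduler. This Python sketch uses asyncio as a stand-in for actor scheduling and is not the Akka.NET implementation. `start_instance` is a hypothetical callback representing Instance Actor creation. Batch size and delay are parameters so both stay configurable.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def staggered_start(
    instance_ids: Sequence[str],
    start_instance: Callable[[str], Awaitable[None]],  # hypothetical callback
    batch_size: int = 20,
    delay_s: float = 0.1,
) -> List[List[str]]:
    """Start instances in fixed-size batches, pausing between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [
        list(instance_ids[i:i + batch_size])
        for i in range(0, len(instance_ids), batch_size)
    ]
    for index, batch in enumerate(batches):
        # Start every instance in the current batch concurrently.
        await asyncio.gather(*(start_instance(iid) for iid in batch))
        # Pause between batches (not after the last) so data sources are not flooded.
        if index < len(batches) - 1:
            await asyncio.sleep(delay_s)
    return batches
```

With 45 instances and the defaults, this produces batches of 20, 20, and 5 separated by two 100 ms pauses.
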
Phase 3B: Site I/O & Observability
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3B-1 | What is the exact dedicated blocking I/O dispatcher configuration for Script Execution Actors? | KDD-runtime-3 says "dedicated blocking I/O dispatcher" — need Akka.NET HOCON config (thread pool size, throughput settings). | WP-15. Sensible defaults can be set; tuned in Phase 8. | Deferred — use Akka.NET default blocking-io-dispatcher config; tune during Phase 8 performance testing. |
| Q-P3B-2 | Should LmxProxy adapter expose WriteBatchAndWaitAsync (write-and-poll handshake) through IDataConnection or as a protocol-specific extension? | CD-DCL-5 lists WriteBatchAndWaitAsync but IDataConnection only defines simple Write. | WP-8. Does not block core functionality. | Deferred — expose as protocol-specific extension method; not part of IDataConnection core contract. |
| Q-P3B-3 | What is the Rate of Change alarm evaluation time window? | Section 3.4 says "changes faster than a defined threshold" but does not specify the time window (per-second? per-minute? configurable?). | WP-16. Needs a design decision for the evaluation algorithm. | Deferred — implement as configurable window (default: per-second rate). Document in alarm definition schema. |
| Q-P3B-4 | How does the health report sequence number behave across failover? | The sequence number is monotonic within a singleton lifecycle. After failover, the new singleton starts at 1. Central must handle this. | WP-27, WP-28. | Resolved in design: central accepts any report from a site it currently considers offline; for an online site it requires seq > last seq. On failover the site goes offline first (reports are missed), so the sequence reset is handled naturally. |
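
The Q-P3B-4 resolution reduces to a small acceptance rule on central. This Python sketch assumes central keeps a per-site online flag and the last accepted sequence number. `HealthReportTracker` and its methods are hypothetical. Offline detection from missed reports is assumed to happen elsewhere and then call `mark_offline`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SiteHealthState:
    online: bool = False
    last_seq: int = 0


@dataclass
class HealthReportTracker:
    """Hypothetical central-side tracker applying the Q-P3B-4 rule."""
    sites: Dict[str, SiteHealthState] = field(default_factory=dict)

    def accept(self, site_id: str, seq: int) -> bool:
        state = self.sites.setdefault(site_id, SiteHealthState())
        # Online site: require a strictly increasing sequence number,
        # which drops stale or duplicate reports.
        if state.online and seq <= state.last_seq:
            return False
        # Offline site: accept any sequence number, since a failed-over
        # singleton restarts at 1.
        state.online = True
        state.last_seq = seq
        return True

    def mark_offline(self, site_id: str) -> None:
        # Called by missed-report detection (not shown here).
        self.sites.setdefault(site_id, SiteHealthState()).online = False
```
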
Phase 3C: Deployment Pipeline & Store-and-Forward
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P3C-1 | Should S&F retry timers be reset on failover or continue from the last known retry timestamp? | On failover, the new active node loads the buffer from SQLite. Messages have last_attempt_at timestamps. Should retry timing continue relative to last_attempt_at or reset to "now"? | Affects retry behavior immediately after failover. Recommend: continue from last_attempt_at to avoid burst retries. | Open |
| Q-P3C-2 | What is the maximum number of parked messages returned in a single remote query? | Communication Layer pattern 8 uses a 30 s timeout. Very large parked message sets may need pagination. | Recommend: paginated query (e.g., 100 per page), consistent with the Site Event Logging pagination pattern. | Open |
| Q-P3C-3 | Should the per-instance operation lock be in-memory (lost on central failover) or persisted? | In-memory is simpler and consistent with "in-progress deployments treated as failed on failover." Persisted lock could cause orphan locks. | Recommend: in-memory. On failover, all locks released. Site state query resolves any ambiguity. | Open |
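
The two options in Q-P3C-1 differ in how the next retry time is computed when the new active node reloads the buffer. This Python sketch assumes a single fixed retry interval for every message, which the real policy may not use. `due_times_after_failover` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Dict


def due_times_after_failover(
    last_attempts: Dict[str, datetime],
    interval: timedelta,
    now: datetime,
    continue_schedule: bool = True,
) -> Dict[str, datetime]:
    """Next retry time per message once the buffer is reloaded (hypothetical helper)."""
    due: Dict[str, datetime] = {}
    for message_id, last_attempt_at in last_attempts.items():
        if continue_schedule:
            # Continue: honor last_attempt_at; anything already overdue is due now.
            due[message_id] = max(last_attempt_at + interval, now)
        else:
            # Reset: every message restarts its interval from the reload time.
            due[message_id] = now + interval
    return due
```

When the failover gap is shorter than the retry interval, continuing from last_attempt_at preserves the original spacing between messages. Resetting to "now" makes every message due at the same instant. Under the continue policy, messages that are already overdue at reload time still become due together, so a long outage may still call for some throttling.
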
Phase 4: Operator/Admin UI
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or allow user-provided values? | Component-InboundAPI.md says "key value" but does not specify generation. | Phase 4, WP-5. | Open — assume auto-generated with optional copy-to-clipboard; user can regenerate. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30s report interval? | Component-HealthMonitoring.md specifies 30s default interval. | Phase 4, WP-9. | Open — assume display updates on every report arrival (no UI-side polling); interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | HighLevelReqs 3.10 says "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open — assume cascade delete of child areas (if no instances assigned to any area in the subtree). |
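
Q-P4-3's working assumption is to cascade-delete child areas only when no instance is assigned anywhere in the subtree. That condition can be checked with a traversal before anything is deleted. This Python sketch assumes the area tree is acyclic and is held as a parent-to-children map, with instance assignments held as an area-to-count map. `collect_subtree` and `try_cascade_delete` are hypothetical names.

```python
from typing import Dict, List


def collect_subtree(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Root plus all descendant area ids, in depth-first pre-order."""
    result: List[str] = []
    stack = [root]
    while stack:
        area = stack.pop()
        result.append(area)
        stack.extend(children.get(area, []))
    return result


def try_cascade_delete(
    root: str,
    children: Dict[str, List[str]],
    instance_counts: Dict[str, int],
) -> List[str]:
    """Return the deletion order, or raise if any area in the subtree has instances."""
    subtree = collect_subtree(root, children)
    blocked = [a for a in subtree if instance_counts.get(a, 0) > 0]
    if blocked:
        raise ValueError(f"cannot delete {root!r}: instances assigned to {sorted(blocked)}")
    # Pre-order lists every parent before its descendants, so reversing it
    # deletes children first and no child is left referencing a removed parent.
    return list(reversed(subtree))
```
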
Phase 7: Integrations
| # | Question | Context | Impact | Status |
|---|---|---|---|---|
| Q12 | What Microsoft 365 tenant/app registration is available for SMTP OAuth2 testing? | Affects Notification Service OAuth2 implementation. | Phase 7. | Deferred — won't be known during development. Implement against Basic Auth first; OAuth2 tested when tenant available. |
Resolved Questions
| # | Question | Resolution | Date |
|---|---|---|---|
| Q1 | What .NET version should we target? | .NET 10 LTS (released November 2025, supported through 2028). | 2026-03-16 |
| Q2 | What Akka.NET version? | Latest stable 1.5.x (currently 1.5.62). | 2026-03-16 |
| Q3 | Monorepo or separate repos? | Single monorepo with an SLNX solution file (.slnx, the new XML-based solution format that is the default in .NET 10). | 2026-03-16 |
| Q4 | What CI/CD platform? | None for now. No CI/CD pipeline. | 2026-03-16 |
| Q5 | What LDAP server for dev/test? | GLAuth (lightweight LDAP) in Docker. See infra/glauth/config.toml and test_infra_ldap.md. | 2026-03-16 |
| Q6 | What MS SQL version and hosting? | SQL Server 2022 Developer Edition in Docker. See infra/docker-compose.yml and test_infra_db.md. | 2026-03-16 |
| Q7 | JWT signing key storage? | appsettings.json (per environment). | 2026-03-16 |
| Q8 | OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See infra/opcua/nodes.json and test_infra_opcua.md. | 2026-03-16 |
| Q10 | Target site hardware? | Windows Server 2022, 24 GB RAM, 1 TB drive, 16-core Xeon. | 2026-03-16 |
| Q9 | What is the custom protocol? Is there an existing specification or SDK? | LmxProxy, a gRPC-based protocol (protobuf-net code-first, port 5050, API key auth). Client SDK: LmxProxyClient NuGet package. See Component-DataConnectionLayer.md for the full API mapping and protocol details. | 2026-03-16 |
| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | REST API test server (infra/restapi/) provides simulated external endpoints for External System Gateway and Inbound API testing. No real MES/recipe system needed for initial phases. | 2026-03-16 |
| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope: the Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables/data in infra/mssql/machinedata_seed.sql. | 2026-03-16 |
| Q13 | Who is the development team? | Solo developer with extensive Akka.NET experience and full availability. No parallelization constraints — phases are sequential. | 2026-03-16 |
| Q14 | Is there an existing deployment target environment for early pilot testing? | Yes — developer can test directly. No separate pilot environment needed. | 2026-03-16 |