Joseph Doherty 021817930b Generate all 11 phase implementation plans with bullet-level requirement traceability

All phases (0-8) now have detailed implementation plans with:
- Bullet-level requirement extraction from HighLevelReqs sections
- Design constraint traceability (KDD + Component Design)
- Work packages with acceptance criteria mapped to every requirement
- Split-section ownership verified across phases
- Orphan checks (forward, reverse, negative) all passing
- Codex MCP (gpt-5.4) external verification completed per phase

Total: 7,549 lines across 11 plan documents, ~160 work packages,
~400 requirements traced, ~25 open questions logged for follow-up.

2026-03-16 15:34:54 -04:00

Implementation Questions

Purpose: Track questions and ambiguities discovered during plan generation that require follow-up before or during implementation.

Open Questions

Phase 0: Solution Skeleton

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q16 | Should `Result<T>` use a OneOf-style library or be hand-rolled? | Affects COM-7-1 (minimal dependencies). A hand-rolled `Result<T>` keeps zero external dependencies. | Phase 0. | Recommend hand-rolled to maintain the zero-dependency constraint. |
| Q17 | Should entity POCO properties be required (init-only) or settable? | EF Core Fluent API mapping may need settable properties. POCOs must be persistence-ignorant but still mappable by Phase 1. | Phase 0 / Phase 1 boundary. | Recommend `{ get; set; }` for EF compatibility, with constructor invariants for required fields. |
| Q18 | What `QualityCode` values should the protocol abstraction define? | OPC UA has a rich quality model (Good, Uncertain, Bad with subtypes). A simplified shared set is needed. | Phase 0. | Recommend Good, Bad, and Uncertain as the minimal set, with room to extend. |
| Q19 | Should `IDataConnection` be `IAsyncDisposable` for connection cleanup? | Affects the DCL connection actor lifecycle. | Phase 0 / Phase 3B boundary. | Recommend yes: add `IAsyncDisposable` to support proper cleanup. |
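The sketch below illustrates the Q16 and Q18 recommendations. It is written in Python for brevity, although the project itself targets .NET. It shows a hand-rolled result type built only from the standard library, and a `QualityCode` enum limited to Good, Uncertain, and Bad. The names `Ok`, `Err`, `map_result`, and `quality_from_status_name` are hypothetical. Collapsing OPC UA status names by their `Good`/`Uncertain`/`Bad` prefix is an assumed mapping rule, not a decided one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


class QualityCode(Enum):
    """Minimal shared quality set; more members can be added later."""
    GOOD = "Good"
    UNCERTAIN = "Uncertain"
    BAD = "Bad"


@dataclass(frozen=True)
class Ok(Generic[T]):
    """Success case of the hand-rolled result type."""
    value: T


@dataclass(frozen=True)
class Err:
    """Failure case carrying a human-readable reason."""
    error: str


# Hand-rolled result type: standard library only, no OneOf-style dependency.
Result = Union[Ok[T], Err]


def map_result(result: "Result[T]", fn: Callable[[T], U]) -> "Result[U]":
    """Apply fn to a success value; pass an Err through unchanged."""
    if isinstance(result, Ok):
        return Ok(fn(result.value))
    return result


def quality_from_status_name(name: str) -> "Result[QualityCode]":
    """Collapse an OPC UA-style status name (e.g. 'BadNodeIdUnknown') to the minimal set."""
    # Assumed rule: match on the Good/Uncertain/Bad prefix of the symbolic name.
    for code in QualityCode:
        if name.startswith(code.value):
            return Ok(code)
    return Err(f"unrecognized status name: {name!r}")
```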

Phase 1: Central Platform Foundations

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P1-1 | Should Data Protection keys be stored in the configuration database (via the EF Core Data Protection key store) or on a shared filesystem path? | WP-10 requires both central nodes to share Data Protection keys. DB storage is more portable; filesystem storage requires a shared mount. | Implementation detail for WP-10. Either approach works. | Open: decide during implementation. Default to DB storage. |

Phase 2: Core Modeling, Validation & Deployment Contract

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P2-1 | What hashing algorithm should be used for revision hashes? | SHA-256 is the likely choice for determinism and collision resistance. | WP-16. Low risk: the algorithm can be changed without API impact. | Open: proceed with SHA-256 as the default. |
| Q-P2-2 | What serialization format should the deployment package contract use? | JSON is the most natural choice for .NET; MessagePack is more compact. The decision affects Site Runtime deserialization. | WP-17. Medium risk: the format must be stable once sites consume it. | Open: recommend JSON for debuggability; a binary format can be added later. |
| Q-P2-3 | How should script pre-compilation handle references to runtime APIs (GetAttribute, SetAttribute, etc.) that don't exist at compile time on central? | Scripts reference runtime APIs that are only available at the site, so central needs stubs. | WP-18, WP-19. Must be addressed before script compilation validation works. | Open: implement compilation against a stub ScriptApi assembly. |
| Q-P2-4 | Should semantic validation for CallShared resolve against the shared script library at validation time, or against the version deployed at the target site? | Shared scripts may be modified between validation and deployment. | WP-19. Low risk if validation re-runs before deployment. | Open: validate against the current library; document re-validation on deploy. |
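Q-P2-1 and Q-P2-2 interact, because a revision hash is only stable if identical content always serializes to identical bytes. The sketch below is written in Python and assumes a JSON contract. It hashes a canonical form (sorted keys, no insignificant whitespace, UTF-8) with SHA-256. The function names and the package fields used in the test are hypothetical. A C# implementation would need its serializer configured to emit the same canonical bytes.

```python
import hashlib
import json
from typing import Any


def canonical_json(package: dict[str, Any]) -> bytes:
    """Serialize with sorted keys and compact separators so equal content yields equal bytes."""
    return json.dumps(
        package, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def revision_hash(package: dict[str, Any]) -> str:
    """SHA-256 hex digest of the canonical form (the Q-P2-1 default)."""
    return hashlib.sha256(canonical_json(package)).hexdigest()
```

Because keys are sorted before hashing, two packages that differ only in key order produce the same revision hash.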

Phase 3A: Runtime Foundation

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P3A-1 | What are the optimal batch size and delay for staggered Instance Actor startup? | Component-SiteRuntime.md suggests batches of 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to 20 instances per batch with a 100 ms delay, and make both configurable. | Deferred: tune during Phase 3B when the DCL is integrated. |
| Q-P3A-2 | Should the SQLite schema use a single database file or separate files per concern (configs, overrides, S&F, events)? | A single file is simpler. Separate files isolate concerns and allow independent backup and maintenance. | Schema design. | Recommend a single file with separate tables, for simpler transaction management. Final decision during implementation. |
| Q-P3A-3 | Should Akka.Persistence (event sourcing/snapshotting) be used for the Deployment Manager singleton, or is direct SQLite access sufficient? | Akka.Persistence adds complexity (journal, snapshots) but provides built-in recovery. Direct SQLite is simpler for this use case. | Architecture. | Recommend direct SQLite: Deployment Manager recovery follows a read-all-configs-and-rebuild pattern, not event replay. |
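The Q-P3A-1 defaults can be sketched in Python, with asyncio standing in for the actor system. Instances start in batches of `batch_size`, with a `delay_s` pause between batches, and both values are configurable. `staggered_start` and the `start_instance` callback are hypothetical names. In the Site Runtime, the pacing would more likely be driven by actor messages than by `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def staggered_start(
    instance_ids: Sequence[str],
    start_instance: Callable[[str], Awaitable[None]],
    batch_size: int = 20,
    delay_s: float = 0.1,
) -> None:
    """Start instances in fixed-size batches, pausing between batches (defaults: 20 / 100 ms)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for offset in range(0, len(instance_ids), batch_size):
        batch = instance_ids[offset:offset + batch_size]
        # Start the whole batch concurrently, then pause before the next batch
        # so the data source is not hit with every instance at once.
        await asyncio.gather(*(start_instance(i) for i in batch))
        if offset + batch_size < len(instance_ids):
            await asyncio.sleep(delay_s)
```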

Phase 3B: Site I/O & Observability

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P3B-1 | What is the exact configuration of the dedicated blocking I/O dispatcher for Script Execution Actors? | KDD-runtime-3 specifies a "dedicated blocking I/O dispatcher," which needs Akka.NET HOCON config (thread pool size, throughput settings). | WP-15. Sensible defaults can be set now and tuned in Phase 8. | Deferred: use the Akka.NET default blocking-io-dispatcher config; tune during Phase 8 performance testing. |
| Q-P3B-2 | Should the LmxProxy adapter expose WriteBatchAndWaitAsync (write-and-poll handshake) through IDataConnection or as a protocol-specific extension? | CD-DCL-5 lists WriteBatchAndWaitAsync, but IDataConnection defines only a simple Write. | WP-8. Does not block core functionality. | Deferred: expose it as a protocol-specific extension method, not as part of the IDataConnection core contract. |
| Q-P3B-3 | What is the time window for Rate of Change alarm evaluation? | Section 3.4 says "changes faster than a defined threshold" but does not specify the time window (per second, per minute, or configurable). | WP-16. Needs a design decision for the evaluation algorithm. | Deferred: implement as a configurable window (default: per-second rate). Document it in the alarm definition schema. |
| Q-P3B-4 | How does the health report sequence number behave across failover? | The sequence number is monotonic within a singleton lifecycle. After failover, the new singleton starts at 1, and central must handle this. | WP-27, WP-28. | Resolved in design: central accepts a report when the site is offline; for online sites, it requires seq > last. On failover, the site goes offline first (missed reports), so the reset is handled naturally. |
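The resolved Q-P3B-4 rule can be sketched in a few lines of Python. The sketch makes three assumptions: central keeps a hypothetical `SiteHealthTracker` per site, an accepted report marks the site online, and a separate missed-report timeout calls `mark_offline`. None of these names come from the design documents.

```python
from dataclasses import dataclass


@dataclass
class SiteHealthTracker:
    """Central-side view of one site's health report sequence (hypothetical)."""
    online: bool = False
    last_seq: int = 0

    def accept(self, seq: int) -> bool:
        """Return True if the report is accepted.

        Offline site: accept regardless of sequence number, which absorbs the
        reset to 1 after a failover. Online site: require seq > last_seq.
        """
        if not self.online or seq > self.last_seq:
            # Assumption: an accepted report marks the site online.
            self.online = True
            self.last_seq = seq
            return True
        return False

    def mark_offline(self) -> None:
        """Assumed to be called by a missed-report timeout elsewhere."""
        self.online = False
```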

Phase 3C: Deployment Pipeline & Store-and-Forward

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P3C-1 | Should S&F retry timers be reset on failover or continue from the last known retry timestamp? | On failover, the new active node loads the buffer from SQLite. Messages have `last_attempt_at` timestamps. Should retry timing continue relative to `last_attempt_at` or reset to "now"? | Affects retry behavior immediately after failover. Recommend continuing from `last_attempt_at` to avoid burst retries. | Open |
| Q-P3C-2 | What is the maximum number of parked messages returned in a single remote query? | Communication Layer pattern 8 uses a 30 s timeout. Very large parked-message sets may need pagination. | Recommend a paginated query (e.g., 100 per page), consistent with the Site Event Logging pagination pattern. | Open |
| Q-P3C-3 | Should the per-instance operation lock be in-memory (lost on central failover) or persisted? | In-memory is simpler and consistent with "in-progress deployments treated as failed on failover." A persisted lock could leave orphan locks. | Recommend in-memory. On failover, all locks are released, and a site state query resolves any ambiguity. | Open |
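For Q-P3C-1, continuing from `last_attempt_at` means each reloaded message keeps its own place in the retry schedule. Only messages whose interval has already elapsed are due at takeover, and the rest wait out the remainder of their interval. This Python sketch assumes a fixed retry interval, although the real policy may use backoff. `BufferedMessage`, `next_retry_at`, and `due_after_failover` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class BufferedMessage:
    message_id: str
    last_attempt_at: datetime  # as reloaded from the SQLite buffer on failover


def next_retry_at(msg: BufferedMessage, retry_interval: timedelta, now: datetime) -> datetime:
    """Continue the schedule from last_attempt_at, but never schedule in the past."""
    return max(now, msg.last_attempt_at + retry_interval)


def due_after_failover(
    messages: Iterable[BufferedMessage], retry_interval: timedelta, now: datetime
) -> list[str]:
    """IDs of messages whose retry interval has already elapsed at takeover time."""
    return [m.message_id for m in messages if next_retry_at(m, retry_interval, now) <= now]
```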

Phase 4: Operator/Admin UI

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or user-provided? | Component-InboundAPI.md refers to a "key value" but does not specify how it is generated. | Phase 4, WP-5. | Open: assume auto-generated with optional copy-to-clipboard; the user can regenerate it. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30 s report interval? | Component-HealthMonitoring.md specifies a 30 s default interval. | Phase 4, WP-9. | Open: assume the display updates on every report arrival (no UI-side polling); the interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | HighLevelReqs 3.10 mentions "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open: assume cascade delete of child areas, allowed only if no instances are assigned to any area in the subtree. |
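The Q-P4-3 assumption can be sketched as an in-memory tree walk: cascade delete is blocked if any area in the subtree has instances. In the real system this check would run against the configuration database. The `Area` class and the `subtree` and `delete_area` functions here are hypothetical Python stand-ins.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Area:
    name: str
    children: list["Area"] = field(default_factory=list)
    instance_count: int = 0  # instances assigned directly to this area


def subtree(area: Area) -> Iterator[Area]:
    """Yield the area and all of its descendants, depth-first."""
    yield area
    for child in area.children:
        yield from subtree(child)


def delete_area(parent: Area, target: Area) -> None:
    """Cascade-delete target with its children, only if no area in the subtree has instances."""
    blocking = [a.name for a in subtree(target) if a.instance_count > 0]
    if blocking:
        raise ValueError(f"cannot delete {target.name!r}: instances assigned to {blocking}")
    parent.children.remove(target)
```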

Phase 7: Integrations

| ID | Question | Context | Impact | Status |
|----|----------|---------|--------|--------|
| Q12 | What Microsoft 365 tenant/app registration is available for SMTP OAuth2 testing? | Affects the Notification Service OAuth2 implementation. | Phase 7. | Deferred: this won't be known during development. Implement against Basic Auth first; test OAuth2 when a tenant is available. |

Resolved Questions

| ID | Question | Resolution | Date |
|----|----------|------------|------|
| Q1 | What .NET version should we target? | .NET 10 LTS (released November 2025, supported through 2028). | 2026-03-16 |
| Q2 | What Akka.NET version? | Latest stable 1.5.x (currently 1.5.62). | 2026-03-16 |
| Q3 | Monorepo or separate repos? | A single monorepo with an SLNX solution file (`.slnx`, the new XML-based format that is the default in .NET 10). | 2026-03-16 |
| Q4 | What CI/CD platform? | None for now; there is no CI/CD pipeline. | 2026-03-16 |
| Q5 | What LDAP server for dev/test? | GLAuth (lightweight LDAP) in Docker. See `infra/glauth/config.toml` and `test_infra_ldap.md`. | 2026-03-16 |
| Q6 | What MS SQL version and hosting? | SQL Server 2022 Developer Edition in Docker. See `infra/docker-compose.yml` and `test_infra_db.md`. | 2026-03-16 |
| Q7 | Where should the JWT signing key be stored? | `appsettings.json` (per environment). | 2026-03-16 |
| Q8 | Which OPC UA server for dev/test? | Azure IoT OPC PLC simulator in Docker. See `infra/opcua/nodes.json` and `test_infra_opcua.md`. | 2026-03-16 |
| Q10 | Target site hardware? | Windows Server 2022, 24 GB RAM, 1 TB drive, 16-core Xeon. | 2026-03-16 |
| Q9 | What is the custom protocol? Is there an existing specification or SDK? | LmxProxy: a gRPC-based protocol (protobuf-net code-first, port 5050, API key auth). Client SDK: the `LmxProxyClient` NuGet package. See Component-DataConnectionLayer.md for the full API mapping and protocol details. | 2026-03-16 |
| Q11 | Are there specific external systems (MES, recipe manager) to integrate with for initial testing? | The REST API test server (`infra/restapi/`) provides simulated external endpoints for External System Gateway and Inbound API testing. No real MES or recipe system is needed for the initial phases. | 2026-03-16 |
| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope: the Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables and data in `infra/mssql/machinedata_seed.sql`. | 2026-03-16 |
| Q13 | Who is the development team? | A solo developer with extensive Akka.NET experience and full availability. There are no parallelization constraints; phases are sequential. | 2026-03-16 |
| Q14 | Is there an existing deployment target environment for early pilot testing? | Yes. The developer can test directly, so no separate pilot environment is needed. | 2026-03-16 |
