Phase 0 WP-0.10–0.12: Host skeleton, options classes, sample configs, and execution framework
- WP-0.10: Role-based Host startup (Central=WebApplication, Site=generic Host), 15 component AddXxx() extension methods, MapCentralUI/MapInboundAPI stubs
- WP-0.11: 12 per-component options classes with config binding
- WP-0.12: Sample appsettings for central and site topologies
- Add execution procedure and checklist template to generate_plans.md
- Add phase-0-checklist.md for execution tracking
- Resolve all 21 open questions from plan generation
- Update IDataConnection with batch ops and IAsyncDisposable

57 tests pass, zero warnings.
@@ -6,63 +6,6 @@
## Open Questions
### Phase 0: Solution Skeleton
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q16 | Should `Result<T>` use a OneOf-style library or be hand-rolled? | Affects COM-7-1 (minimal dependencies). A hand-rolled `Result<T>` keeps zero external dependencies. | Phase 0. | Recommend hand-rolled to maintain zero-dependency constraint. |
| Q17 | Should entity POCO properties be required (init-only) or settable? | EF Core Fluent API mapping may need settable properties. POCOs must be persistence-ignorant but still mappable by Phase 1. | Phase 0 / Phase 1 boundary. | Recommend `{ get; set; }` for EF compatibility, with constructor invariants for required fields. |
| Q18 | What `QualityCode` values should the protocol abstraction define? | OPC UA has a rich quality model (Good, Uncertain, Bad with subtypes). Need to decide on a simplified shared set. | Phase 0. | Recommend: Good, Bad, Uncertain as the minimal set, with room to extend. |
| Q19 | Should `IDataConnection` be `IAsyncDisposable` for connection cleanup? | Affects DCL connection actor lifecycle. | Phase 0 / Phase 3B boundary. | Recommend yes — add `IAsyncDisposable` to support proper cleanup. |
### Phase 1: Central Platform Foundations
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P1-1 | Should Data Protection keys be stored in the configuration database (via EF Core Data Protection key store) or on a shared filesystem path? | WP-10 requires both central nodes share Data Protection keys. DB storage is more portable; filesystem requires shared mount. | Implementation detail for WP-10. Either approach works. | Open — decide during implementation. Default to DB storage. |
### Phase 2: Core Modeling, Validation & Deployment Contract
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P2-1 | What hashing algorithm should be used for revision hashes? | SHA-256 is likely choice for determinism and collision resistance. | WP-16. Low risk — algorithm can be changed without API impact. | Open — proceed with SHA-256 as default. |
| Q-P2-2 | What serialization format for the deployment package contract? | JSON is most natural for .NET; MessagePack is more compact. Decision affects Site Runtime deserialization. | WP-17. Medium — format must be stable once sites consume it. | Open — recommend JSON for debuggability; can add binary format later. |
| Q-P2-3 | How should script pre-compilation handle references to runtime APIs (GetAttribute, SetAttribute, etc.) that don't exist at compile time on central? | Scripts reference runtime APIs only available at site. Central needs stubs. | WP-18, WP-19. Must be addressed before script compilation validation works. | Open — implement compilation against a stub ScriptApi assembly. |
| Q-P2-4 | Should semantic validation for CallShared resolve against shared script library at validation time, or deployed version at target site? | Shared scripts may be modified between validation and deployment. | WP-19. Low risk if validation re-runs before deployment. | Open — validate against current library; document re-validation on deploy. |
### Phase 3A: Runtime Foundation
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P3A-1 | What is the optimal batch size and delay for staggered Instance Actor startup? | Component-SiteRuntime.md suggests 20 with a "short delay." Actual values depend on OPC UA server capacity. | Performance tuning. Default to 20/100ms, make configurable. | Deferred — tune during Phase 3B when DCL is integrated. |
| Q-P3A-2 | Should the SQLite schema use a single database file or separate files per concern (configs, overrides, S&F, events)? | Single file is simpler. Separate files isolate concerns and allow independent backup/maintenance. | Schema design. | Recommend single file with separate tables. Simpler transaction management. Final decision during implementation. |
| Q-P3A-3 | Should Akka.Persistence (event sourcing / snapshotting) be used for the Deployment Manager singleton, or is direct SQLite access sufficient? | Akka.Persistence adds complexity (journal, snapshots) but provides built-in recovery. Direct SQLite is simpler for this use case. | Architecture. | Recommend direct SQLite — Deployment Manager recovery is a full read-all-configs-and-rebuild pattern, not event replay. |
### Phase 3B: Site I/O & Observability
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P3B-1 | What is the exact dedicated blocking I/O dispatcher configuration for Script Execution Actors? | KDD-runtime-3 says "dedicated blocking I/O dispatcher" — need Akka.NET HOCON config (thread pool size, throughput settings). | WP-15. Sensible defaults can be set; tuned in Phase 8. | Deferred — use Akka.NET default blocking-io-dispatcher config; tune during Phase 8 performance testing. |
| Q-P3B-2 | Should LmxProxy adapter expose WriteBatchAndWaitAsync (write-and-poll handshake) through IDataConnection or as a protocol-specific extension? | CD-DCL-5 lists WriteBatchAndWaitAsync but IDataConnection only defines simple Write. | WP-8. Does not block core functionality. | Deferred — expose as protocol-specific extension method; not part of IDataConnection core contract. |
| Q-P3B-3 | What is the Rate of Change alarm evaluation time window? | Section 3.4 says "changes faster than a defined threshold" but does not specify the time window (per-second? per-minute? configurable?). | WP-16. Needs a design decision for the evaluation algorithm. | Deferred — implement as configurable window (default: per-second rate). Document in alarm definition schema. |
| Q-P3B-4 | How does the health report sequence number behave across failover? | Sequence number is monotonic within a singleton lifecycle. After failover, the new singleton starts at 1. Central must handle this. | WP-27, WP-28. | Resolved in design — central accepts report when site is offline; for online sites, requires seq > last. On failover, site goes offline first (missed reports), so the reset is naturally handled. |
### Phase 3C: Deployment Pipeline & Store-and-Forward
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P3C-1 | Should S&F retry timers be reset on failover or continue from the last known retry timestamp? | On failover, the new active node loads buffer from SQLite. Messages have `last_attempt_at` timestamps. Should retry timing continue relative to `last_attempt_at` or reset to "now"? | Affects retry behavior immediately after failover. Recommend: continue from `last_attempt_at` to avoid burst retries. | Open |
| Q-P3C-2 | What is the maximum number of parked messages returned in a single remote query? | Communication Layer pattern 8 uses 30s timeout. Very large parked message sets may need pagination. | Recommend: paginated query (e.g., 100 per page) consistent with Site Event Logging pagination pattern. | Open |
| Q-P3C-3 | Should the per-instance operation lock be in-memory (lost on central failover) or persisted? | In-memory is simpler and consistent with "in-progress deployments treated as failed on failover." Persisted lock could cause orphan locks. | Recommend: in-memory. On failover, all locks released. Site state query resolves any ambiguity. | Open |
### Phase 4: Operator/Admin UI
| # | Question | Context | Impact | Status |
|---|----------|---------|--------|--------|
| Q-P4-1 | Should the API key value be auto-generated (GUID/random) or allow user-provided values? | Component-InboundAPI.md says "key value" but does not specify generation. | Phase 4, WP-5. | Open — assume auto-generated with optional copy-to-clipboard; user can regenerate. |
| Q-P4-2 | Should the health dashboard support configurable refresh intervals or always use the 30s report interval? | Component-HealthMonitoring.md specifies 30s default interval. | Phase 4, WP-9. | Open — assume display updates on every report arrival (no UI-side polling); interval is server-side config. |
| Q-P4-3 | Should area deletion cascade to child areas or require bottom-up deletion? | HighLevelReqs 3.10 says "parent-child relationships" but does not specify cascade behavior. | Phase 4, WP-3. | Open — assume cascade delete of child areas (if no instances assigned to any area in the subtree). |
### Phase 7: Integrations
| # | Question | Context | Impact | Status |
@@ -89,3 +32,25 @@
| Q15 | Should the Machine Data Database schema be designed in this project, or is it out of scope? | Out of scope — Machine Data Database is a pre-existing database at customer sites. Test infra seeds sample tables/data in `infra/mssql/machinedata_seed.sql`. | 2026-03-16 |
| Q13 | Who is the development team? | Solo developer with extensive Akka.NET experience and full availability. No parallelization constraints — phases are sequential. | 2026-03-16 |
| Q14 | Is there an existing deployment target environment for early pilot testing? | Yes — developer can test directly. No separate pilot environment needed. | 2026-03-16 |
| Q16 | Should `Result<T>` use a OneOf-style library or be hand-rolled? | Hand-rolled to maintain zero-dependency constraint (REQ-COM-7). | 2026-03-16 |
| Q17 | Should entity POCO properties be required (init-only) or settable? | `{ get; set; }` for EF compatibility, with constructor invariants for required fields. | 2026-03-16 |
| Q18 | What `QualityCode` values should the protocol abstraction define? | Good, Bad, Uncertain as the minimal set, with room to extend. | 2026-03-16 |
| Q19 | Should `IDataConnection` be `IAsyncDisposable` for connection cleanup? | Yes — add `IAsyncDisposable` to support proper cleanup in DCL connection actors. | 2026-03-16 |
| Q-P1-1 | Data Protection keys — DB or shared filesystem? | DB storage (EF Core Data Protection key store). More portable than shared filesystem mount. | 2026-03-16 |
| Q-P2-1 | What hashing algorithm for revision hashes? | SHA-256. | 2026-03-16 |
| Q-P2-2 | What serialization format for the deployment package contract? | JSON for debuggability. Can add binary format later if needed. | 2026-03-16 |
| Q-P2-3 | How should script pre-compilation handle references to runtime APIs? | Compile against a stub ScriptApi assembly at central. Site uses real implementation. | 2026-03-16 |
| Q-P2-4 | Semantic validation for CallShared — current library or deployed version? | Validate against current library; re-validate on deploy. | 2026-03-16 |
| Q-P3A-1 | Staggered Instance Actor startup batch size/delay? | Default 20 actors per batch, 100ms delay. Make configurable. Tune during Phase 3B/8. | 2026-03-16 |
| Q-P3A-2 | Single SQLite file or separate files per concern? | Single file with separate tables. Simpler transaction management. | 2026-03-16 |
| Q-P3A-3 | Akka.Persistence or direct SQLite for Deployment Manager singleton? | Direct SQLite. Recovery is full read-all-configs-and-rebuild, not event replay. | 2026-03-16 |
| Q-P3B-1 | Blocking I/O dispatcher config for Script Execution Actors? | Use Akka.NET default blocking-io-dispatcher config. Tune during Phase 8 performance testing. | 2026-03-16 |
| Q-P3B-2 | Should WriteBatchAndWaitAsync be on IDataConnection or protocol-specific? | Add to `IDataConnection` — both OPC UA and LmxProxy can implement it. | 2026-03-16 |
| Q-P3B-3 | Rate of Change alarm evaluation time window? | Configurable window, default per-second rate. Document in alarm definition schema. | 2026-03-16 |
| Q-P3B-4 | Health report sequence number across failover? | Resolved in design — offline detection handles the reset naturally. Central accepts lower seq after site goes offline/online. | 2026-03-16 |
| Q-P3C-1 | S&F retry timers on failover — reset or continue? | Continue from `last_attempt_at` to avoid burst retries. | 2026-03-16 |
| Q-P3C-2 | Max parked messages per remote query? | Paginated, 100 per page, consistent with Site Event Logging pattern. | 2026-03-16 |
| Q-P3C-3 | Per-instance operation lock — in-memory or persisted? | In-memory. Released on failover. Site state query resolves any ambiguity. | 2026-03-16 |
| Q-P4-1 | API key values — auto-generated or user-provided? | Auto-generated with copy-to-clipboard. User can regenerate. | 2026-03-16 |
| Q-P4-2 | Health dashboard refresh interval? | Updates on every report arrival (server push). No UI-side polling. | 2026-03-16 |
| Q-P4-3 | Area deletion — cascade or bottom-up? | Cascade delete child areas if no instances are assigned to any area in the subtree. | 2026-03-16 |
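
The Q-P3B-4 resolution can be read as a small acceptance rule on the central side: a report from an offline site is always accepted, while an online site must send a sequence number greater than the last one seen. The following is a minimal C# sketch of that rule only. The type and member names (`HealthReportTracker`, `MarkOffline`, `TryAccept`) are hypothetical and do not come from the codebase, and treating a never-seen site as offline is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: only the acceptance rule itself comes from Q-P3B-4.
public sealed class HealthReportTracker
{
    private sealed class SiteState
    {
        public bool Online;   // false until the first accepted report
        public long LastSeq;  // last accepted sequence number
    }

    private readonly Dictionary<string, SiteState> _sites = new();

    // Called when offline detection fires (for example, after missed reports).
    public void MarkOffline(string siteId)
    {
        if (_sites.TryGetValue(siteId, out var state))
            state.Online = false;
    }

    // Returns true if the report is accepted and recorded.
    public bool TryAccept(string siteId, long seq)
    {
        if (!_sites.TryGetValue(siteId, out var state))
        {
            // Assumption: a site with no history is treated as offline.
            state = new SiteState();
            _sites[siteId] = state;
        }

        // Online sites must advance the sequence; offline sites accept any value,
        // which covers the restart-at-1 case after a site failover.
        if (state.Online && seq <= state.LastSeq)
            return false;

        state.Online = true;
        state.LastSeq = seq;
        return true;
    }
}
```

With this shape, a duplicate or stale report from an online site is rejected, while a report with sequence 1 is accepted after `MarkOffline` has been called for that site, which matches the failover case described in the resolution.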