Bite-sized TDD plan. M1 (runtime wiring) fully detailed across 10 tasks after verifying the purge/reconciliation actors already exist and only need Host wiring + a gRPC pull client + event-logger injection. M2/M3/M4 as right-sized task inventories with files, classification, and AC. Co-located .tasks.json for executing-plans resume.

# stillpending.md Phase 1 (M1–M4) Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers-extended-cc:executing-plans to implement this plan task-by-task.

**Goal:** Stabilize ScadaBridge: make the Tier-1 silent gaps actually run, correct the Tier-2 behavioral divergences, and reconcile the Tier-4 doc↔code drift, per `docs/plans/2026-06-15-stillpending-completion-design.md`.

**Architecture:** Risk-first. M1 wires already-implemented-but-never-started central actors + fills the site event-log categories. M2 corrects behavioral gaps with targeted, test-first edits. M3 replaces the fake script "compiler" with a real Roslyn compile + semantic forbidden-API enforcement. M4 is doc-only reconciliation. Each task is independently shippable; spec + code + tests + deploy travel together (CLAUDE.md).

**Tech Stack:** C#/.NET 10, Akka.NET 1.5 (cluster singletons), EF Core 10 (MS SQL + SQLite), gRPC (sitestream.proto), Roslyn (`Microsoft.CodeAnalysis.CSharp.Scripting`, already a referenced package), xUnit + FluentAssertions + NSubstitute/Moq, Blazor Server.

**Build/test commands (repo-wide):**
- Build: `dotnet build ZB.MOM.WW.ScadaBridge.slnx`
- Test one project: `dotnet test tests/<Project>/<Project>.csproj`
- Filter one test: `dotnet test tests/<Project>/<Project>.csproj --filter "FullyQualifiedName~<TestName>"`
- Cluster runtime change: rebuild image with `bash docker/deploy.sh`

---

## Discovery (verified before planning)

- `AuditLogPurgeActor` (`src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/AuditLogPurgeActor.cs`) and `SiteAuditReconciliationActor` (`.../Central/SiteAuditReconciliationActor.cs`) are **fully implemented**: timers, cursors, stalled detection, per-row retry budget, error isolation. They are simply **never instantiated**. M1 wires them; it does not write them.
- `IPullAuditEventsClient` (`.../Central/IPullAuditEventsClient.cs`) has a documented NoOp default and no production gRPC implementation. The reconciliation actor runs harmlessly against the NoOp today.
- Central singletons are created in `AkkaHostedService.RegisterCentralActors` (`src/ZB.MOM.WW.ScadaBridge.Host/Actors/AkkaHostedService.cs:~592`), using the `ClusterSingletonManager` + `ClusterSingletonProxy` pattern (see AuditLogIngest `:460`, SiteCallAudit `:524`). This is the insertion point for M1.1/M1.2.
- `AddAuditLogCentralMaintenance` (`.../AuditLog/ServiceCollectionExtensions.cs:319`) already registers the partition-roll-*forward* hosted service + central health snapshot; it does NOT register the purge/reconciliation **actors** (those need the ActorSystem, so they're Host-wired, not DI).
- `ISiteEventLogger.LogEventAsync(eventType, severity, instanceId, source, message, details?)` (`.../SiteEventLogging/ISiteEventLogger.cs`) is the emit surface. Only `connection` + `script` (error-path) categories are emitted today.
- `SiteCallAuditActor` (`.../SiteCallAudit/SiteCallAuditActor.cs:24-34`) explicitly defers its reconciliation puller + purge scheduler; `SiteCallAuditRepository.PurgeTerminalAsync` (`.../ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs:213`) exists but is never invoked.

**Open risk flagged for M1.1:** whether `sitestream.proto` already exposes a server-streaming/unary `PullAuditEvents` RPC and a site-side handler reading `ISiteAuditQueue.ReadPendingSinceAsync`. Task M1.1 begins by confirming this; if absent, the proto + site handler are added first (sub-tasks M1.1a/b). Do not assume.

---

# Milestone M1: Runtime wiring (Tier 1 #3, #4, #5, #6)

Wire behavior that exists but never starts, and fill the event-log categories.

### Task M1.0: Confirm proto/site surface for audit pull (spike)

**Classification:** trivial (investigation; no production code)
**Estimated implement time:** ~4 min
**Parallelizable with:** M1.5, M1.6, M1.7, M1.8 (event-logging tasks)

**Files:**
- Read: `src/ZB.MOM.WW.ScadaBridge.Communication/**/sitestream.proto` (or wherever the proto lives; locate it with `find . -name "*.proto"`)
- Read: `src/ZB.MOM.WW.ScadaBridge.Commons/Interfaces/Services/ISiteAuditQueue.cs` (`ReadPendingSinceAsync`)
- Read: `src/ZB.MOM.WW.ScadaBridge.Commons/Messages/Integration/` (`PullAuditEventsResponse`)

**Step 1:** Run `grep -rn "PullAuditEvents\|rpc Pull" --include=*.proto --include=*.cs src` to determine whether a site-side pull RPC + handler exist.
**Step 2:** Record the finding at the top of M1.1 (one of: "RPC exists → client-only", or "RPC missing → add proto + site handler first"). This decides whether M1.1 is 1 task or 3.
**Step 3:** No commit (investigation only); update this plan's M1.1 scope note.

---

### Task M1.1: Production `IPullAuditEventsClient` (gRPC)

**Classification:** high-risk (data contract + cross-cluster gRPC)
**Estimated implement time:** ~5 min (client only; +2 tasks if proto/site handler missing, see M1.0)
**Parallelizable with:** M1.5–M1.8

**Files:**
- Create: `src/ZB.MOM.WW.ScadaBridge.AuditLog/Central/GrpcPullAuditEventsClient.cs`
- Modify: `src/ZB.MOM.WW.ScadaBridge.AuditLog/ServiceCollectionExtensions.cs` (replace the NoOp `IPullAuditEventsClient` binding inside the central path; likely a new `AddAuditLogCentralReconciliationClient` helper to keep the "every Add* is safe from any root" invariant)
- Test: `tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/Central/GrpcPullAuditEventsClientTests.cs`

**Approach:** Implement `PullAsync(siteId, sinceUtc, batchSize, ct)` by resolving the per-site channel via the existing `SiteStreamGrpcClientFactory` (mirror `SiteStreamGrpcClient`), calling the site `PullAuditEvents` RPC, and mapping the proto reply into `PullAuditEventsResponse` (`Events` oldest-first + `MoreAvailable`). MUST NOT throw on tolerable transport faults (connection refused, deadline exceeded): catch them and return an empty response (per the interface contract docstring).

**Step 1:** Write failing test: a `PullAsync` against a stubbed site channel returns mapped events ordered oldest-first and surfaces `MoreAvailable`; a connection-refused fault returns an empty response (no throw).
**Step 2:** Run: `dotnet test tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests/ZB.MOM.WW.ScadaBridge.AuditLog.Tests.csproj --filter "FullyQualifiedName~GrpcPullAuditEventsClient"` → FAIL.
**Step 3:** Implement the client (+ DI helper). If M1.0 found the RPC missing, do M1.1a (add `rpc PullAuditEvents` to the proto + regenerate) and M1.1b (site handler reading `ISiteAuditQueue.ReadPendingSinceAsync`) first.
**Step 4:** Run the filter → PASS; then full project test.
**Step 5:** Commit: `feat(audit): production gRPC IPullAuditEventsClient for site reconciliation`.
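
The "return empty, do not throw" rule in the M1.1 approach can be sketched without any gRPC dependency. This is a minimal illustration, not the planned `GrpcPullAuditEventsClient`: `AuditEventDto`, `PullResult`, and `TolerantPullClient` are hypothetical names, the transport call is injected as a delegate, and `HttpRequestException`/`TimeoutException` stand in for whatever the real gRPC stack raises on connection-refused and deadline-exceeded faults.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real contract types; names and shapes are assumptions.
public sealed record AuditEventDto(string Id, DateTime OccurredUtc);

public sealed record PullResult(IReadOnlyList<AuditEventDto> Events, bool MoreAvailable)
{
    public static readonly PullResult Empty = new(Array.Empty<AuditEventDto>(), false);
}

public sealed class TolerantPullClient
{
    // The transport call is injected so the sketch stays independent of gRPC packages.
    private readonly Func<string, DateTime, int, CancellationToken, Task<PullResult>> _transport;

    public TolerantPullClient(Func<string, DateTime, int, CancellationToken, Task<PullResult>> transport)
        => _transport = transport;

    public async Task<PullResult> PullAsync(string siteId, DateTime sinceUtc, int batchSize, CancellationToken ct)
    {
        try
        {
            var reply = await _transport(siteId, sinceUtc, batchSize, ct);

            // Enforce the oldest-first ordering that the response contract promises.
            var ordered = new List<AuditEventDto>(reply.Events);
            ordered.Sort((a, b) => a.OccurredUtc.CompareTo(b.OccurredUtc));
            return new PullResult(ordered, reply.MoreAvailable);
        }
        catch (Exception ex) when (ex is HttpRequestException or TimeoutException)
        {
            // Tolerable transport fault: report "nothing pulled" instead of throwing.
            // Caller cancellation (OperationCanceledException) is deliberately not caught.
            return PullResult.Empty;
        }
    }
}
```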

---

### Task M1.2: Wire `SiteAuditReconciliationActor` + `AuditLogPurgeActor` as central singletons

**Classification:** high-risk (actor model, cluster singleton, runtime)
**Estimated implement time:** ~5 min
**Parallelizable with:** M1.5–M1.8
**Depends on:** M1.1 (real client). The wiring also works against the NoOp, so M1.2 can land first and M1.1 then swaps the binding.

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/Actors/AkkaHostedService.cs` (in `RegisterCentralActors`, after the SiteCallAudit singleton block ~`:589`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.Host/Program.cs` (ensure `AddAuditLogCentralMaintenance` + the reconciliation client helper are called on the central path; confirm against the existing call site)
- Test: `tests/ZB.MOM.WW.ScadaBridge.Host.Tests/` (singleton-registration assertion) and/or `tests/ZB.MOM.WW.ScadaBridge.IntegrationTests/`

**Approach:** Mirror the SiteCallAudit singleton pattern (`:524-573`): build `ClusterSingletonManager.Props(Props.Create(() => new SiteAuditReconciliationActor(...)))` resolving `ISiteEnumerator`, `IPullAuditEventsClient`, root `IServiceProvider`, `IOptions<SiteAuditReconciliationOptions>`, logger from `_serviceProvider`; same for `AuditLogPurgeActor` (resolves `IServiceProvider`, `IOptions<AuditLogPurgeOptions>`, `IOptions<AuditLogOptions>`, logger). Register both on the central role with distinct singleton names (`site-audit-reconciliation`, `audit-log-purge`). Add CoordinatedShutdown graceful-stop hooks mirroring `:550-567`. Verify `SiteAuditReconciliationOptions`/`AuditLogPurgeOptions` are bound in the central composition root (add to `AddAuditLogCentralMaintenance` if not).

**Step 1:** Write failing test asserting the central host registers `audit-log-purge` and `site-audit-reconciliation` singleton managers (resolve actor selection or assert via a test probe in the integration harness).
**Step 2:** Run the test → FAIL.
**Step 3:** Implement the wiring; bind the options sections.
**Step 4:** Run test → PASS; `dotnet build ZB.MOM.WW.ScadaBridge.slnx`.
**Step 5:** Commit: `feat(audit): start AuditLogPurgeActor + SiteAuditReconciliationActor central singletons`.

---

### Task M1.3: Site Call Audit (periodic reconciliation pull)

**Classification:** high-risk (actor + EF cursor read + cross-cluster)
**Estimated implement time:** ~5 min (may split: cursor-read repo method vs actor scheduler)
**Parallelizable with:** M1.5–M1.8

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/SiteCallAuditActor.cs` (add a `ReconciliationTick` self-schedule + per-site pull, mirroring `SiteAuditReconciliationActor`)
- Modify: `src/ZB.MOM.WW.ScadaBridge.ConfigurationDatabase/Repositories/SiteCallAuditRepository.cs` (+ interface): add a "changed-since cursor" read if absent
- Modify: `SiteCallAuditOptions`: add reconciliation interval + batch size
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteCallAudit.Tests/`

**Approach:** Add a timer-driven reconciliation pull that asks each site for `SiteCallOperational` rows changed since a per-site cursor and upserts them idempotently (`UpsertAsync` is already monotonic). Reuse the gRPC pull surface from M1.1 if it can carry SiteCall operational state; otherwise add a sibling pull. **Sub-decision (record at implement time):** confirm whether `PullAuditEventsResponse` carries SiteCall operational rows (the audit found it does not); if not, extend the telemetry/pull contract additively or add a dedicated SiteCall pull RPC. Flag if this grows beyond ~300 LOC → split.
**Steps:** TDD as above (failing test: a row dropped from telemetry is back-filled on the next reconciliation tick) → implement → pass → build → commit `feat(sitecallaudit): periodic reconciliation pull back-fills lost telemetry`.
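
The combination of a per-site cursor and a monotonic upsert is what makes a repeated pull safe. The sketch below illustrates that idea in memory only; `SiteCallRow`, its `Version` field, and `ReconciliationStore` are hypothetical, since this plan does not show the real `SiteCallOperational` shape or how `UpsertAsync` decides which row is newer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape; the real SiteCallOperational fields are not shown in this plan.
public sealed record SiteCallRow(string CallId, long Version, DateTime ChangedUtc, string Status);

public sealed class ReconciliationStore
{
    private readonly Dictionary<string, SiteCallRow> _rows = new();
    private readonly Dictionary<string, DateTime> _cursors = new();

    // Cursor for a site, or DateTime.MinValue if the site has never been pulled.
    public DateTime CursorFor(string siteId)
        => _cursors.TryGetValue(siteId, out var cursor) ? cursor : DateTime.MinValue;

    // Monotonic upsert: an incoming row only wins if it is strictly newer than the stored one.
    public bool Upsert(SiteCallRow incoming)
    {
        if (_rows.TryGetValue(incoming.CallId, out var current) && current.Version >= incoming.Version)
            return false;
        _rows[incoming.CallId] = incoming;
        return true;
    }

    // Apply one pulled batch, then advance the site's cursor to the newest change seen.
    public int ApplyBatch(string siteId, IEnumerable<SiteCallRow> batch)
    {
        var applied = 0;
        var cursor = CursorFor(siteId);
        foreach (var row in batch)
        {
            if (Upsert(row)) applied++;
            if (row.ChangedUtc > cursor) cursor = row.ChangedUtc;
        }
        _cursors[siteId] = cursor;
        return applied;
    }
}
```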

---

### Task M1.4: Site Call Audit (daily terminal-row purge scheduler)

**Classification:** standard (actor scheduler + existing repo method)
**Estimated implement time:** ~3 min
**Parallelizable with:** M1.5–M1.8

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteCallAudit/SiteCallAuditActor.cs` (add a `PurgeTick` timer invoking `ISiteCallAuditRepository.PurgeTerminalAsync`)
- Modify: `SiteCallAuditOptions`: add purge interval + retention days (default 365)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteCallAudit.Tests/`

**Approach:** Mirror `AuditLogPurgeActor`'s `PurgeTick` cadence; resolve the scoped repo per tick; continue-on-error. **Step 1:** failing test: terminal rows older than retention are purged on tick. **Steps 2–5:** implement → pass → build → commit `feat(sitecallaudit): daily terminal-row purge scheduler`.
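
A rough sketch of what one purge tick decides, assuming retention is measured from a completion timestamp. `CallRow` and `TerminalPurge` are hypothetical, and applying continue-on-error per row (rather than per tick) is an assumption of this sketch; the real work would go through `PurgeTerminalAsync`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape used only for this illustration.
public sealed record CallRow(string Id, bool IsTerminal, DateTime CompletedUtc);

public static class TerminalPurge
{
    // Rows that completed before this instant are eligible for purge.
    public static DateTime Cutoff(DateTime nowUtc, int retentionDays) => nowUtc.AddDays(-retentionDays);

    // Removes terminal rows older than the cutoff. A failure on one row is recorded and
    // skipped so that the rest of the tick still runs (continue-on-error).
    public static int Purge(List<CallRow> rows, DateTime nowUtc, int retentionDays,
                            Action<CallRow> delete, List<string> errors)
    {
        var cutoff = Cutoff(nowUtc, retentionDays);
        var purged = 0;
        foreach (var row in rows.ToArray()) // iterate over a copy because we mutate the list
        {
            if (!row.IsTerminal || row.CompletedUtc >= cutoff) continue;
            try
            {
                delete(row);
                rows.Remove(row);
                purged++;
            }
            catch (Exception ex)
            {
                errors.Add($"{row.Id}: {ex.Message}");
            }
        }
        return purged;
    }
}
```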

---

### Task M1.5: Site Event Logging (Alarm events)

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** M1.0–M1.4, M1.6, M1.7, M1.8

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/AlarmActor.cs` and `.../NativeAlarmActor.cs` (inject `ISiteEventLogger`; emit `alarm` events on raise/clear/ack transitions)
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/`

**Approach:** Add the `ISiteEventLogger` dependency (confirm how SiteRuntime actors receive DI services: via `Props`/factory, mirror `DataConnectionActor`/`ScriptExecutionActor` which already log). Emit `LogEventAsync("alarm", severity, instanceId, source, message, details)` on state transitions. TDD: failing test asserts an alarm transition produces an `alarm` row → implement → pass → build → commit `feat(siteeventlog): emit alarm events`.

---

### Task M1.6: Site Event Logging (Deployment + Instance-lifecycle events)

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** M1.5, M1.7, M1.8

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/DeploymentManagerActor.cs` (+ InstanceActor if instance-lifecycle lives there): emit `deployment` + `instance_lifecycle` events on deploy/enable/disable/delete/apply.
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/` (or DeploymentManager.Tests)

**Approach + steps:** as M1.5; commit `feat(siteeventlog): emit deployment + instance-lifecycle events`.

---

### Task M1.7: Site Event Logging (Store-and-Forward + Notification events)

**Classification:** standard
**Estimated implement time:** ~4 min
**Parallelizable with:** M1.5, M1.6, M1.8

**Files:**
- Modify: Store-and-Forward engine (`src/ZB.MOM.WW.ScadaBridge.StoreAndForward/...StoreAndForwardService.cs`): emit `store_and_forward` events on buffer/retry/park.
- Modify: site notification forwarding path: emit `notification` events (note: delivery is central; the site event is "forwarded"/"queued").
- Test: relevant `*.Tests`

**Approach + steps:** as M1.5; **add `notification` to the `eventType` doc enum** in `ISiteEventLogger` (the XML doc currently lists 6 categories and omits `notification`). Commit `feat(siteeventlog): emit store-and-forward + notification events`.

---

### Task M1.8: Site Event Logging (script started/completed, Info)

**Classification:** small
**Estimated implement time:** ~3 min
**Parallelizable with:** M1.5–M1.7

**Files:**
- Modify: `src/ZB.MOM.WW.ScadaBridge.SiteRuntime/Actors/ScriptExecutionActor.cs:238-256`, `ScriptActor.cs:368`: add `Info` started/completed events alongside the existing `Error` events.
- Test: `tests/ZB.MOM.WW.ScadaBridge.SiteRuntime.Tests/`

**Steps:** failing test asserts a successful script run emits started + completed Info rows → implement → pass → build → commit `feat(siteeventlog): emit script started/completed events`.

---

### Task M1.9: M1 integration verification + redeploy

**Classification:** standard (no new logic; end-to-end proof)
**Estimated implement time:** ~5 min
**Parallelizable with:** none (gate for M1)
**Depends on:** M1.1–M1.8

**Steps:**
1. `dotnet build ZB.MOM.WW.ScadaBridge.slnx` → 0 errors.
2. `dotnet test` across `AuditLog.Tests`, `SiteCallAudit.Tests`, `SiteRuntime.Tests`, `IntegrationTests` → green (MSSQL-gated tests may skip; note skips).
3. `bash docker/deploy.sh`; confirm cluster healthy (`curl -s localhost:9000/health/ready`).
4. Spot-check logs for "Purged AuditLog partition" / reconciliation ticks / new event-log categories.
5. Commit (if any fixture cleanup): `test(m1): integration verification of audit wiring + event-log categories`.

---

# Milestone M2: Correctness & behavioral gaps (Tier 2)

> Each task is TDD (write failing behavior test → implement → pass → commit). Tasks are right-sized; where the exact edit spans a method, the implementer reads the cited file:line first (the `Files:` block is the contract). Full code is specified at implement time against the live file; do not pre-fabricate.

### Task M2.1: `Database.CachedWrite` (immediate attempt + synchronous permanent-`Failed`)
**Classification:** high-risk (trust-boundary behavior) · **~5 min** · **Parallelizable with:** M2.2–M2.13
**Files:** `src/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway/DatabaseGateway.cs:78-204`; test `tests/ZB.MOM.WW.ScadaBridge.ExternalSystemGateway.Tests/`.
**AC:** mirror `ExternalSystemClient.cs:100-161`: attempt immediately; a permanent SQL error throws `PermanentExternalSystemException`-equivalent synchronously (`Failed`), and only transient errors buffer. Failing test: a permanent SQL error returns `Failed` synchronously and is NOT enqueued.
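
The acceptance criterion above amounts to a three-way outcome. The following is an illustrative sketch only: `WriteOutcome`, the two exception types, and `CachedWriter` are hypothetical names, and the real gateway would classify SQL errors itself rather than receive pre-classified exceptions.

```csharp
using System;
using System.Collections.Generic;

public enum WriteOutcome { Succeeded, Buffered, Failed }

// Hypothetical fault taxonomy standing in for the gateway's own SQL error classification.
public sealed class TransientWriteException : Exception
{
    public TransientWriteException(string message) : base(message) { }
}

public sealed class PermanentWriteException : Exception
{
    public PermanentWriteException(string message) : base(message) { }
}

public sealed class CachedWriter
{
    private readonly Action<string> _execute;

    // Statements waiting for a later retry; only transient failures land here.
    public Queue<string> Buffer { get; } = new();

    public CachedWriter(Action<string> execute) => _execute = execute;

    public WriteOutcome Write(string statement)
    {
        try
        {
            _execute(statement); // attempt immediately, before any buffering
            return WriteOutcome.Succeeded;
        }
        catch (TransientWriteException)
        {
            Buffer.Enqueue(statement); // retried later by the store-and-forward path
            return WriteOutcome.Buffered;
        }
        catch (PermanentWriteException)
        {
            return WriteOutcome.Failed; // reported synchronously, never enqueued
        }
    }
}
```

Returning an outcome value rather than rethrowing is a simplification for the sketch; the plan's wording allows either as long as the caller learns of the permanent failure synchronously.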

### Task M2.2: Apply alarm `conditionFilter`
**Classification:** high-risk (DCL + OPC UA filter) · **~5 min** · **Parallelizable with:** others
**Files:** `RealOpcUaClient.cs:242,295` (build `WhereClause` from filter), `DataConnectionActor.cs:1482,1540-1554` (honor filter in routing), `MxGatewayDataConnection.cs:154-167`; tests in DCL test project.
**AC:** a non-null `conditionFilter` mirrors only matching conditions. Failing test: filtered subscription drops non-matching conditions.

### Task M2.3: Per-script execution timeout
**Classification:** standard · **~5 min** · split if model change is large
**Files:** `TemplateScript` entity (+ flattened config type), `SiteRuntimeOptions.cs:31`, `ScriptExecutionActor.cs:100`, `AlarmExecutionActor.cs:66`; tests.
**AC:** a per-script timeout overrides the global default; falls back when unset.
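
The override-with-fallback rule for M2.3 is small enough to state as a single function. `ScriptTimeouts.Effective` is a hypothetical helper, and treating a zero or negative per-script value as "unset" is an assumption of this sketch, not something the plan specifies.

```csharp
using System;

public static class ScriptTimeouts
{
    // A per-script value wins when it is set and positive; otherwise the global default applies.
    public static TimeSpan Effective(TimeSpan? perScript, TimeSpan globalDefault)
        => perScript is { } timeout && timeout > TimeSpan.Zero ? timeout : globalDefault;
}
```

Resolving the value in one place would let both execution actors named in the `Files:` line share the same fallback behavior.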

### Task M2.4: Connection-level diff surfaces in deployment diff
**Classification:** standard · **~5 min**
**Files:** `Commons/Types/Flattening/ConfigurationDiff.cs:7-24` (add `ConnectionChanges` slot + `HasChanges`), `DiffService.cs:45-53` (call `ComputeConnectionsDiff` from `ComputeDiff`), CentralUI diff component; tests.
**AC:** a connection endpoint/protocol/failover change appears in the diff and UI.
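
As a rough picture of what a connection-level diff has to report, the sketch below compares two sets of connections by name and relies on record value equality to detect a changed endpoint, protocol, or failover target. `ConnectionConfig`, `ConnectionChange`, and `ConnectionDiff` are hypothetical; the existing `ComputeConnectionsDiff` may be shaped quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real flattened connection type is not shown in this plan.
public sealed record ConnectionConfig(string Name, string Endpoint, string Protocol, string Failover);
public sealed record ConnectionChange(string Name, string Kind);

public static class ConnectionDiff
{
    public static IReadOnlyList<ConnectionChange> Compute(
        IEnumerable<ConnectionConfig> deployed, IEnumerable<ConnectionConfig> pending)
    {
        var before = deployed.ToDictionary(c => c.Name);
        var after = pending.ToDictionary(c => c.Name);
        var changes = new List<ConnectionChange>();

        // Walk the union of names in a stable order so the diff output is deterministic.
        foreach (var name in before.Keys.Union(after.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            var hadBefore = before.TryGetValue(name, out var oldConfig);
            var hasAfter = after.TryGetValue(name, out var newConfig);

            if (!hadBefore) changes.Add(new ConnectionChange(name, "Added"));
            else if (!hasAfter) changes.Add(new ConnectionChange(name, "Removed"));
            else if (oldConfig != newConfig) changes.Add(new ConnectionChange(name, "Modified")); // record value equality
        }
        return changes;
    }

    // Mirrors the idea of a HasChanges flag that also accounts for connection changes.
    public static bool HasChanges(IReadOnlyList<ConnectionChange> changes) => changes.Count > 0;
}
```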

### Task M2.5: `MachineDataDb` fail-fast
**Classification:** standard · **~3 min**
**Files:** `DatabaseOptions.cs:6-12` (add property), `StartupValidator.cs:60-61` (validate non-empty on central), DbContext only if consumed; tests.
**AC:** central node fails startup with a clear message when `MachineDataDb` is empty.

### Task M2.6: CI grep-guard against `UPDATE/DELETE … AuditLog`
**Classification:** small · **~3 min**
**Files:** new build script (e.g. `tools/check-auditlog-append-only.sh`) + a `.csproj`/CI hook or an MSBuild target; a test that the guard trips on a planted violation.
**AC:** build/CI fails if a data-layer file contains `UPDATE`/`DELETE` against `AuditLog`.
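
The plan proposes a shell script for this guard; the C# form below is only meant to illustrate what the matching pattern might need to cover (schema prefixes, bracketed identifiers, and similarly named tables). `AuditLogAppendOnlyGuard` and the regular expression are assumptions of this sketch, and a text match like this cannot see SQL assembled at runtime.

```csharp
using System.Text.RegularExpressions;

public static class AuditLogAppendOnlyGuard
{
    // Matches "UPDATE [schema.]AuditLog" or "DELETE [FROM] [schema.]AuditLog", with or without
    // square brackets. The trailing \b keeps names such as AuditLogArchive from matching.
    private static readonly Regex Violation = new(
        @"\b(UPDATE|DELETE(\s+FROM)?)\s+(\[?\w+\]?\.)?\[?AuditLog\]?\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool IsViolation(string sourceText) => Violation.IsMatch(sourceText);
}
```

A unit test that feeds a planted violation through `IsViolation` would play the role of the "guard trips on a planted violation" test named in the `Files:` line.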

### Task M2.7: LDAP periodic re-query on session refresh
**Classification:** high-risk (security/session) · **~5 min**
**Files:** `JwtTokenService` (`RefreshToken`/`ShouldRefresh`/`IsIdleTimedOut`: wire callers), or an `OnValidatePrincipal` cookie-revalidation hook in `Security/ServiceCollectionExtensions.cs`; tests with adversarial coverage.
**AC:** interactive session roles are re-queried from LDAP at the refresh boundary (never >15 min stale); LDAP failure leaves the active session on prior roles (per spec).
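
The two halves of this acceptance criterion (refresh at the boundary, keep prior roles on directory failure) can be sketched as a pure function. `SessionRoles` and `RoleRefresher` are hypothetical names and the LDAP query is injected as a delegate; the real hook would live in whichever of the two integration points the task selects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical session snapshot: the roles plus when they were last confirmed against LDAP.
public sealed record SessionRoles(IReadOnlyList<string> Roles, DateTime RefreshedUtc);

public static class RoleRefresher
{
    // Upper bound on role staleness taken from the acceptance criterion.
    public static readonly TimeSpan MaxStaleness = TimeSpan.FromMinutes(15);

    public static bool ShouldRefresh(SessionRoles session, DateTime nowUtc)
        => nowUtc - session.RefreshedUtc >= MaxStaleness;

    public static SessionRoles Refresh(SessionRoles current, DateTime nowUtc, Func<IReadOnlyList<string>> queryLdap)
    {
        if (!ShouldRefresh(current, nowUtc)) return current;
        try
        {
            return new SessionRoles(queryLdap(), nowUtc);
        }
        catch (Exception)
        {
            // Directory failure: keep the prior roles. The timestamp is left unchanged,
            // so the next request attempts the re-query again.
            return current;
        }
    }
}
```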

### Tasks M2.8–M2.13: low-severity batch (one task each, `small`/`standard`, mostly parallelizable)
- **M2.8** Return-type compatibility check: `SemanticValidator.cs:62-63,279-287` (wire the built `*ReturnMap` into a comparison).
- **M2.9** Argument *type* compatibility: `SemanticValidator.cs:251-266,390-425` (parse + compare arg types, not just count).
- **M2.10** Native-alarm-source capability validation wired into deploy pipeline: `SemanticValidator.cs:239-245`, `FlatteningPipeline.cs:93,115` (supply `alarmCapableConnectionNames`).
- **M2.11** Binding-completeness as deploy-gating Error (+ "name exists at site"): `ValidationService.cs:504-519`, `ValidationResult.cs:9`.
- **M2.12** Debug snapshot/subscribe error for unknown instance: `DeploymentManagerActor.cs:845-866` (return error reply, not empty snapshot).
- **M2.13** Misc: recursion-limit → site event log (`ScriptRuntimeContext.cs:302-305,464-466`); debug-stream ordering + timestamp-dedup replay (`DebugStreamBridgeActor.cs:89-103`); OPC UA transition field population (`RealOpcUaClient.cs:395-403`); readiness "required singletons" probe (`Program.cs:188-201`); register SiteEventLog active-node purge gate (`SiteEventLogging/ServiceCollectionExtensions.cs:33-37`); consume `FailedWriteCount` in Health Monitoring (`ISiteEventLogger.cs:32-40`); reconcile `StateTransitionValidator` delete-from-`NotDeployed` (`StateTransitionValidator.cs:38-39`). **Split each into its own task at execution time.** They are independent and parallelizable, and are grouped here only to keep the plan readable.

### Task M2.14: nested `Object`/`List` type validation (Inbound API, from M4 disposition)
**Classification:** standard · **~5 min**
**Files:** `ParameterValidator.cs:109-145`, `ReturnValueValidator.cs:18`; tests.
**AC:** nested field/element types are validated, not just `ValueKind`.
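
"Validated, not just `ValueKind`" implies a recursive walk over the JSON value. The sketch below shows one way that walk could look with `System.Text.Json`; `TypeSpec` and `NestedTypeValidator` are hypothetical, and the set of kinds is an assumption rather than the Inbound API's actual type vocabulary.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical schema node. Kind is one of "String", "Number", "Boolean", "Object", "List".
public sealed record TypeSpec(string Kind, Dictionary<string, TypeSpec> Fields = null, TypeSpec Element = null);

public static class NestedTypeValidator
{
    public static bool Matches(JsonElement value, TypeSpec spec)
    {
        switch (spec.Kind)
        {
            case "String":
                return value.ValueKind == JsonValueKind.String;
            case "Number":
                return value.ValueKind == JsonValueKind.Number;
            case "Boolean":
                return value.ValueKind is JsonValueKind.True or JsonValueKind.False;
            case "Object":
                if (value.ValueKind != JsonValueKind.Object) return false;
                if (spec.Fields == null) return true; // no declared fields: the kind check is all we can do
                foreach (var field in spec.Fields)
                {
                    // Every declared field must be present and match its own spec, recursively.
                    if (!value.TryGetProperty(field.Key, out var child) || !Matches(child, field.Value))
                        return false;
                }
                return true;
            case "List":
                if (value.ValueKind != JsonValueKind.Array) return false;
                if (spec.Element == null) return true; // untyped list: the kind check is all we can do
                foreach (var item in value.EnumerateArray())
                {
                    if (!Matches(item, spec.Element)) return false;
                }
                return true;
            default:
                return false;
        }
    }
}
```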

---

# Milestone M3: Script trust boundary (Tier 1 #1, #2)

### Task M3.1: Real Roslyn compile in `ScriptCompiler.TryCompile`
**Classification:** high-risk (validation gate, security) · **~5 min** (may split: compile vs diagnostics mapping)
**Files:** `src/ZB.MOM.WW.ScadaBridge.TemplateEngine/Validation/ScriptCompiler.cs:56-104`, `.csproj` (confirm `Microsoft.CodeAnalysis.CSharp.Scripting` ref; it's in `Directory.Packages.props`), `ValidationService.cs:128`; tests.
**AC:** semantically-invalid C# (undefined symbol, type error) fails validation with a useful diagnostic; valid scripts pass. Failing test: a script referencing an undefined symbol fails to validate.

### Task M3.2: Semantic forbidden-API enforcement (replace substring scan)
**Classification:** high-risk (security boundary) · **~5 min**
**Files:** `ScriptCompiler.cs:14-22,61-72`, coordinate with Site Runtime sandbox; tests with an adversarial corpus.
**AC:** alias / `using static` / `global::` paths to a forbidden API are detected via Roslyn symbol resolution and block deploy. Every script in the adversarial bypass corpus must be rejected at deploy.
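
To show why symbol resolution defeats the alias and `using static` bypasses that a substring scan misses, here is a minimal sketch built on the Roslyn compilation API. It is not the planned `ScriptCompiler` change: `ForbiddenApiScanner` and its deny-list are hypothetical, only method invocations are inspected (property access and object creation are not), and only the core library is referenced, so types from other assemblies would not resolve.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class ForbiddenApiScanner
{
    // Illustrative deny-list only; the plan does not enumerate the real one here.
    private static readonly HashSet<string> ForbiddenTypes = new() { "System.IO.File", "System.Environment" };

    public static IReadOnlyList<string> FindViolations(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);

        // Reference the assemblies that define the types we need to resolve.
        var references = new[] { typeof(object), typeof(System.IO.File) }
            .Select(type => type.Assembly.Location)
            .Distinct()
            .Select(path => MetadataReference.CreateFromFile(path));

        var compilation = CSharpCompilation.Create(
            "ScriptCheck",
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
        var model = compilation.GetSemanticModel(tree);

        // Resolve each call to its method symbol, then compare the declaring type by its
        // fully qualified name. Aliases and 'using static' resolve to the same symbol.
        return tree.GetRoot()
            .DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            .Select(call => model.GetSymbolInfo(call).Symbol as IMethodSymbol)
            .Where(method => method != null)
            .Select(method => method.ContainingType.ToDisplayString())
            .Where(ForbiddenTypes.Contains)
            .Distinct()
            .ToList();
    }
}
```

Under these assumptions, a script containing `using F = System.IO.File;` followed by `F.Delete("x")` would be reported as `System.IO.File`, even though the text `System.IO.File.Delete` never appears in the call.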

### Task M3.3: Real compile for shared scripts
**Classification:** standard · **~4 min**
**Files:** `SharedScriptService.cs:168-206`; tests. **AC:** shared scripts get the same semantic compile.

### Task M3.4: Fixture cleanup + verification
**Classification:** standard · **~5 min** · **Depends on:** M3.1–M3.3
**Steps:** real compile may flag latent-invalid scripts in existing templates/test fixtures; fix them, run `dotnet test` for TemplateEngine + integration, then commit.

---

# Milestone M4: Doc reconciliation (Tier 4, doc-only, parallelizable)

> All `trivial`/`small`, doc-only, no test impact (except M2.14 which is code, planned in M2). Each is one commit.

### Task M4.1: Config-DB / Commons spec re-architecture
**Files:** `docs/requirements/Component-ConfigurationDatabase.md` (collapsed `AuditLog` schema → canonical + `DetailsJson` + computed cols), `Component-Commons.md` (`AuditEvent`→`ZB.MOM.WW.Audit` package, `ApiKey` retirement, undocumented types/interfaces, `SiteCall` field names). **AC:** spec matches `AuditLogRow.cs` + migrations.

### Task M4.2: Inbound API + Security + Notification spec drift
**Files:** `Component-InboundAPI.md` (Bearer auth, fire-and-forget audit timing, type validation now built per M2.14), `Component-Security.md` (cookie-only session model, role names Administrator/Designer/Deployer/Viewer), `Component-NotificationService.md` / Commons (`NotificationType` is Email-only; Teams future), `AuditKind` vocabulary (`ApiInbound.Completed` → `InboundRequest`/`InboundAuthFailure`; `ExecuteReader`→`DbWrite`).

### Task M4.3: CLI docs
**Files:** `src/ZB.MOM.WW.ScadaBridge.CLI/README.md` + `docs/requirements/Component-CLI.md`: document the `bundle` group; fix README option drift (`api-key create --methods`, `--key-id` vs `--id`, `api-method create --script`, `db-connection` no `--provider`, `set-methods`, scope-rule/health/template option names, `audit query` options). **AC:** every documented command/option matches a registered one.

### Task M4.4: Clear stale "deferred"/"no-op" markers for shipped features
**Files:** comments in `SiteCallAudit/ServiceCollectionExtensions.cs:11-13` (relay shipped), `BundleImporter.cs:28-30`, `AuditingDbCommand.cs:466-467,512` + `ScriptRuntimeContext.cs:1808` (M5 redaction shipped), `AuditLogPage.razor.cs:16-18` (HandleRowSelected wired), Transport design doc §13 (CLI shipped), `requirements-traceability.md` "Pending" rows, and the stale `.tasks.json` notes. **AC:** no comment claims "deferred/not implemented" for a shipped feature.

---

## Native tasks & dependencies

M1 tasks are created as native tasks (CLI-visible). M2/M3/M4 remain under the existing umbrella tasks (#13/#14/#15) and are broken into per-item native tasks at the start of each milestone's execution (especially the M2.8–M2.13 split). Dependency edges: M1.9 ⟵ M1.1–M1.8; M5 (#16) ⟵ M1 (#12), already set.

## Notes / risks carried forward

- **M1.1 proto risk** (M1.0 resolves it): if the site pull RPC is missing, M1.1 grows by two sub-tasks (proto + site handler).
- **M1.3 contract risk:** `PullAuditEventsResponse` likely does not carry SiteCall operational state; an additive contract extension or a dedicated SiteCall pull may be needed. Confirm at implement time; split if >300 LOC.
- **M3.1 fixture risk:** real compile may surface latent-invalid existing scripts (budgeted as M3.4).
- Keep `git diff` review before each commit; rebuild the image (`bash docker/deploy.sh`) for any M1 runtime change before claiming done.