docs(tests): design for Playwright coverage expansion (7 audit recs)

Captures the 2026-06-05 coverage audit's gaps and the approved approach for closing them: ephemeral CLI-provisioned fixtures with outcome-tolerant asserts for the mutating suites (deploy lifecycle, retry/discard, transport import), UI CRUD round-trips, nav render hardening, a Health KPI load test, and a standardized skip-and-log policy. Next: writing-plans turns this into tasks.
2026-06-05 09:39:35 -04:00
parent d33617d65d
commit cb3b3bf373
1 changed files with 156 additions and 0 deletions
@@ -0,0 +1,156 @@
+# Playwright Coverage Expansion — Design
+
+**Date:** 2026-06-05
+**Status:** Approved (brainstorming complete) → ready for writing-plans
+**Component:** #9 Central UI — `tests/ZB.MOM.WW.ScadaBridge.CentralUI.PlaywrightTests`
+
+## Goal
+
+Close the functional-coverage gaps found in the 2026-06-05 Playwright coverage audit by
+implementing all 7 audit recommendations: add ~15–18 functional E2E tests, upgrade the
+shallow navigation tests, and standardize the skip policy — all against the live 8-node
+docker cluster, inside the existing xunit + `PlaywrightFixture` structure.
+
+## Background — the audit
+
+The suite (~68 tests) is **bimodal**: a deep, well-built audit/site-calls core wrapped in
+a thin shell of navigation + nav-visibility tests, with a large blind spot over the app's
+**mutating actions** (deploy, import, retry/discard, all CRUD writes).
+
+Key gaps the 7 recommendations target:
+
+1. **Topology instance lifecycle** (Deploy/Enable/Disable/Delete) — Akka-singleton relays,
+   the exact surface the recent report-page hang lived in — untested beyond URL change.
+2. **Parked-message / notification Retry/Discard** relays — untested (Site-Calls relay *is*
+   tested; it is the pattern to copy).
+3. **Transport Import → Apply** — bulk writes across all central config, Admin-only — zero
+   coverage.
+4. **Navigation tests assert URL only** — never that the destination rendered; a route
+   could 500 after navigation and stay green.
+5. **No Health-dashboard load test** — the page that fans out to three singleton `Ask`s
+   every 10s has no assertion its KPI tiles resolve vs. hang/degrade.
+6. **No successful persisted write through the UI** anywhere — the entire create/edit/delete
+   surface is functionally unverified end-to-end (`SiteCrudTests` only covers validation
+   failure; audit/site-calls "writes" are direct SQL seeds).
+7. **Silent coverage cliffs** — DB-dependent tests are inconsistent (`AuditLogPageTests`
+   *throw* when MSSQL is down; Site-Calls/grid tests *skip*), and skips aren't surfaced.
+
+## Decisions (settled during brainstorming)
+
+| # | Decision | Choice | Rationale |
+|---|---|---|---|
+| D1 | Mutation fidelity for state-changing tests | **Ephemeral fixtures + outcome-tolerant** | High fidelity, isolated; matches the existing `SiteCallsPageTests` real-relay test that asserts the round-trip happened (toast: `Applied`/`NotParked`/`SiteUnreachable`), not a deep cluster side-effect. |
+| D2 | How fixtures are created/torn down | **Shell out to the `scadabridge` CLI** | CLAUDE.md prefers the CLI for state setup; the CLI exposes every needed verb with `--format json` + clean `0`/`1` exit codes. |
+| D3 | Skip vs fail when DB/cluster unavailable | **Standardize on Skip + log** | Consistent `SkippableFact` everywhere; a logged skipped-summary prevents a downed dependency from masquerading as full green. Local dev without the cluster still gets green on the rest. |
+
+### CLI surface confirmed (host → `http://localhost:9000`, `multi-role`/`password`)
+
+- `site create --name --identifier [--description]`, `site delete --id`, `site list/get`,
+  `site area create --site-id --name`, `site area delete --id`, `site deploy-artifacts`.
+- `template create --name [--description] [--parent-id]`, `template delete --id`,
+  `template attribute add --template-id --name --data-type`, `template validate --id`.
+- `instance create --name --template-id --site-id [--area-id]`, `instance deploy --id`,
+  `instance enable --id`, `instance disable --id`, `instance delete --id`.
+- `bundle export --output --passphrase [--all|--include-dependencies]`, `bundle preview`,
+  `bundle import --input --passphrase --on-conflict`.
+
+## Design
+
+### Section 1 — Shared infrastructure (underpins recs 1, 2, 3, 6, 7)
+
+**`CliRunner`** (new helper):
+- Resolves the CLI via the built DLL of `src/ZB.MOM.WW.ScadaBridge.CLI` (invoked as
+  `dotnet <cli>.dll …`), falling back to a `scadabridge` on PATH. The test `.csproj` adds a
+  build-order `ProjectReference` to the CLI project so the binary always exists.
+- Fixed args: `--url http://localhost:9000 --username multi-role --password password --format json`.
+- `Task<JsonDocument> RunAsync(params string[] args)` — runs the subprocess, captures
+  stdout/stderr, throws on non-zero exit (stderr in the message), parses JSON stdout.
+- Typed helpers: `CreateSiteAsync`, `CreateAreaAsync`, `CreateTemplateAsync` +
+  `AddAttributeAsync` (so the template validates), `CreateInstanceAsync`,
+  `DeleteInstanceAsync` / `DeleteTemplateAsync` / `DeleteSiteAsync`,
+  `BundleExportAsync(path, templateId, passphrase)`.
+
+**Naming + teardown convention:** every provisioned entity is named `zztest-<8charguid>`
+(sorts last; unambiguous for `LIKE 'zztest%'` safety-net deletes). Teardown is best-effort
+(swallow errors), mirroring `AuditDataSeeder`/`SiteCallDataSeeder`.
+
+**`ClusterAvailability` gate + skip logging (rec 7):**
+- One shared probe (CLI `site list` succeeds *and* the existing DB `IsAvailableAsync`) →
+  `Skip.IfNot(...)` used uniformly across all DB/cluster tests.
+- Convert `AuditLogPageTests`' 11 throw-on-unavailable tests to `SkippableFact`.
+- A collection fixture's `DisposeAsync` writes one summary line
+  (`SKIPPED N tests — cluster/DB unavailable`) so skips are visible.
+
+### Section 2 — Mutating action suites (recs 1, 2, 3)
+
+**`DeploymentActionTests` (rec 1):** a `DeploymentFixture` (collection fixture) provisions
+**one** `zztest` site + valid template once; each test creates its **own** throwaway
+instance on it (cheap), acts via the Topology UI, deletes it:
+- `Deploy_Instance_ConfirmsAndShowsOutcome` — Topology → context-menu Deploy → confirm
+  dialog → assert exactly one outcome toast (tolerating `Deployed`/`SiteUnreachable`) and
+  the status badge transitions off "not deployed".
+- `Enable_Instance_ShowsOutcome`, `Disable_Instance_ShowsOutcome` — same shape.
+- `Delete_Instance_RemovesFromTree` — UI delete → confirm → node disappears.
+
+**`RetryDiscardActionTests` (rec 2):** reuse the existing direct-SQL seeders to seed a
+`Parked` row, then drive the UI relay outcome-tolerantly (the `SiteCallsPageTests` pattern):
+- Parked Messages: `Retry_ParkedMessage_ShowsOutcomeToast`,
+  `Discard_ParkedMessage_ShowsOutcomeToast` (seed a parked S&F message for a `zztest`
+  target on `site-a`).
+- Notification Report: `Retry_ParkedNotification_ShowsOutcome`,
+  `Discard_ParkedNotification_ShowsOutcome` (seed a parked `Notifications` row).
+- Each: confirm dialog → exactly one outcome toast (`Applied`/`NotParked`/`SiteUnreachable`);
+  best-effort row teardown.
+
+**`TransportImportTests` (rec 3):** round-trip via CLI export + UI import:
+1. CLI creates a `zztest` template → `bundle export --output /tmp/zztest-<guid>.bundle
+   --passphrase <p>` (1-template synthetic bundle).
+2. UI Import wizard: upload the exported file → passphrase → diff/resolve (Add) →
+   type-env-name confirm → **Apply**.
+3. Assert the result screen shows success, the imported template appears, and the audit
+   drill-in `?bundleImportId=` link is present.
+4. CLI deletes the imported template(s).
+- **Impl risk to de-risk first:** remote file upload — Playwright `SetInputFiles` streams the
+  host file to the container browser; fine for a tiny bundle, but the plan's first transport
+  step verifies the upload end-to-end before building the rest.
+
+### Section 3 — Happy-path CRUD round-trips (rec 6)
+
+Three pure-UI create→edit→delete tests (the suite currently verifies *no* successful
+persisted write). Each ends by deleting what it made; a `zztest`-name safety-net teardown
+guards mid-test failure:
+- **Site** (extend `SiteCrudTests`): create via `/admin/sites/create` (name + identifier +
+  node addresses) → appears in list → edit description → delete → gone.
+- **Template**: `/design/templates/create` → add an attribute on `/design/templates/{id}` →
+  delete.
+- **LDAP mapping**: `/admin/ldap-mappings/create` (group + role) → edit role → delete.
+
+### Section 4 — Shallow-coverage hardening (recs 4, 5)
+
+**Nav render assertions (rec 4):** upgrade the 16 `NavigationTests` theory cases from "URL
+changed" to *also* assert the destination's heading/content renders (a per-route
+expected-heading map). No new tests — a strengthened helper.
+
+**Health load test (rec 5):** `HealthDashboardTests.KpiTiles_ResolveToValues` — load
+`/monitoring/health`, assert the three KPI tile groups (Notification-Outbox, Site-Call,
+Audit) resolve to numeric values (not the em-dash degrade) within a timeout. A direct
+regression guard for the singleton-hang class of bug.
+
+## Verification
+
+- Each new suite runs green against the live cluster; the full run stays at **0 failed**,
+  with any skips logged.
+- Mutating tests leave **no residual** `zztest-*` entities — verified by a post-run
+  `site list` / `template list` check.
+- Build: `dotnet build` the test project (picks up the new CLI `ProjectReference`), then
+  `dotnet test`.
+
+## Scope guard (YAGNI)
+
+No new page-object framework, no CI wiring, no parallelization changes. Everything slots
+into the existing xunit + `PlaywrightFixture` structure.
+
+## Native tasks
+
+Brainstorming checklist tasks #51–#56 track this design through to writing-plans. The
+implementation plan (produced next by the writing-plans skill) will carry its own task set.