docs: add Playwright coverage-fill design (Tier 1-3 + edge sweep, 4 waves)

2026-06-06 11:23:59 -04:00
parent 667d141f1a
commit 58bf59a42d
1 changed files with 213 additions and 0 deletions
@@ -0,0 +1,213 @@
+# Playwright Coverage Fill — Design
+
+**Date:** 2026-06-06
+**Status:** Approved (brainstorming complete) → ready for writing-plans
+**Component:** #9 Central UI — `tests/ZB.MOM.WW.ScadaBridge.CentralUI.PlaywrightTests`
+**Predecessor:** [2026-06-05 Playwright Coverage Expansion](2026-06-05-playwright-coverage-expansion-design.md) (the first wave; this is the close-out)
+
+## Goal
+
+Close the remaining functional and edge-case gaps found in the 2026-06-06 coverage
+re-audit — every untested Tier 1–3 page plus a cross-cutting edge-case sweep on the
+already-covered pages — delivered as **4 risk-tiered waves** inside the existing
+xunit + `PlaywrightFixture` harness, against the live 8-node docker cluster.
+
+## Background — the re-audit
+
+After the first expansion (suite at 83 passing), a re-audit mapped all 38 routable pages
+against the test catalog. Findings:
+
+- ~14 of 38 pages have real functional coverage; the rest are nav-only (h4 heading) or
+  untested.
+- **Biggest blind spot:** `InstanceConfigure` (`/deployment/instances/{id}/configure`) —
+  the most complex mutating page (5 override subsystems, browse dialog, test-bindings) —
+  has zero coverage.
+- Edge cases are systematically uncovered across *all* pages: no duplicate-name validation,
+  no cancel flows, no empty states, no pagination, no filter-combination, no error/500
+  paths, no wrong-passphrase import.
+- The suite is happy-path + outcome-tolerant by design (a deliberate, defensible choice
+  that matches the relay reality) — but a page that renders-but-misbehaves on a non-default
+  input can still stay green.
+
+## Decisions (settled during brainstorming)
+
+| # | Decision | Choice | Rationale |
+|---|----------|--------|-----------|
+| D1 | How much to take on | **Everything** — Tier 1–3 functional gaps + the edge sweep | User chose full close-out; phased so it stays tractable. |
+| D2 | Structure | **Risk-tiered phased waves** (4) | Front-loads highest-risk surfaces; each wave is a shippable, green, zero-residue increment; mirrors the structure that worked for wave 1. |
+| D3 | Test selectors on hook-poor pages | **Add `data-test` hooks as needed** (minimal, additive, non-functional) | Deep assertions on the complex forms (esp. `InstanceConfigure`) are otherwise brittle; matches the convention the audited Audit/SiteCalls/Transport pages already follow. |
+| D4 | Real-time / streaming tests | **Drive real behavior, outcome-tolerant, generous timeouts, `SkippableFact`** | Highest value; managed flake risk on a handful of tests, with a documented downgrade-to-render-guard fallback. |
+
+## Key enabler — the CLI mirrors the whole surface
+
+The `scadabridge` CLI exposes create/list/delete (and read-back) for **every** entity the
+new suites need, including the entire `InstanceConfigure` surface:
+
+- `instance set-bindings | set-overrides | alarm-override set/delete/list |
+  native-alarm-source set/clear | set-area | diff | get`
+- `security api-key create/list/delete/update/set-methods`
+- `data-connection`, `db-connection`, `external-system` (+ `method`), `api-method`,
+  `shared-script`, `notification` (list/create/delete + `smtp`) create/list/delete
+
+→ New functional suites need **no DB seeders**. Fixtures are CLI-provisioned and
+persistence is verified by CLI read-back (`instance get`, `security api-key list`). DB
+seeding remains only in the existing Audit/SiteCalls/Notification *report* edge tests that
+require a `Parked` row (site SQLite isn't reachable, so those stay direct-SQL).
+
+## Wave structure
+
+| Wave | Theme | New suites | New fixtures |
+|------|-------|-----------|--------------|
+| **1** | Tier 1 — deepest mutating surfaces | `InstanceConfigureTests`, `ApiKeyCrudTests`, `TransportExportTests` (+ wrong-passphrase import negative) | `InstanceConfigureFixture`, `ApiSurfaceFixture` |
+| **2** | Tier 2 — real-time / relay | `DeploymentsRealtimeTests`, `TopologyAreaTests`, `DebugViewTests`, `ParkedMessagesActionTests`, Discard click-throughs (SiteCalls + Notification) | reuse `DeploymentFixture` / `InstanceConfigureFixture` |
+| **3** | Tier 3 — config CRUD breadth | `NotificationListCrudTests`, `SmtpConfigTests`, `NotificationKpisTests`, `DataConnectionCrudTests`, `ExternalSystemCrudTests`, `SharedScriptCrudTests`, `ApiMethodFormTests`, `EventLogsTests`, `AuditConfigurationTests` | small per-suite CLI helpers |
+| **4** | Cross-cutting edge sweep | extend `SiteCrudTests`, `TemplateCrudTests`, `LdapMappingCrudTests`, `AuditLogPageTests`, `SiteCallsPageTests`, `NotificationActionTests` | none |
+
+Each wave ends green with zero residue — a clean stop/ship point. Inside a wave, suites
+are independent (disjoint fixtures/files) → parallel-dispatchable. Rough size: W1 ≈ 3 files
+/~12 tests, W2 ≈ 5 files/~14, W3 ≈ 9 files/~20, W4 ≈ 6 extended files/~20. Total ≈
+**23 files, ~65 new cases.**
+
+## Shared infrastructure
+
+**CLI helper extensions** (`CliRunner.Helpers.cs`, same throw-vs-swallow split):
+- *Provision (throw):* `CreateDataConnectionAsync`, `CreateExternalSystemAsync`,
+  `CreateApiMethodAsync`, `CreateNotificationListAsync`, `CreateSharedScriptAsync`.
+- *Verify (read-back, throw):* `GetInstanceAsync(id)`, `ListApiKeysAsync`.
+- *Teardown (best-effort):* `DeleteApiKeyAsync`, `DeleteDataConnectionAsync`,
+  `DeleteExternalSystemAsync`, `DeleteNotificationListAsync`, `DeleteSharedScriptAsync` —
+  all keyed on `zztest-*`.
+
+**New fixtures** (`IAsyncLifetime` + `IClassFixture`, partial-init guard + `Available`
+flag, mirroring `DeploymentFixture`):
+- **`InstanceConfigureFixture`** — site-a: `zztest` template + attribute(s) + `zztest`
+  data-connection + an instance, **deployed**. Disposes instance → connection → template.
+- **`ApiSurfaceFixture`** — `zztest` external-system + one api-method, so the API-key form
+  renders method checkboxes. Disposes api-method + external-system; created keys deleted
+  per-test via `security api-key delete`.
+
+**`data-test` hooks** (minimal, additive, each its own tiny commit so the app diff is
+auditable/revertable): `InstanceConfigure.razor` subsystem anchors; row anchors on
+`EventLogs.razor` / `ConfigurationAuditLog.razor`; save/result anchors on a few Design
+forms. Only where no stable existing selector (id/aria-label/text) exists.
+
+**Reused as-is:** `ClusterAvailability` skip gate + `SkipSummaryReporter`;
+`PlaywrightFixture`; toast = `.toast` web-first `ToHaveCountAsync(1)`; confirm =
+`.modal-footer .btn-danger` (Delete) / `.btn-primary` (Confirm); `zztest-<kind>-<8hex>`
+naming; `PlaywrightDbConnection` (only Wave-4 report-page Parked-row edge tests).
+
+## Per-wave test shapes
+
+### Wave 1 — Tier 1 mutating surfaces
+
+**`InstanceConfigureTests`** (`InstanceConfigureFixture`):
+- *Bindings round-trip* — bulk "Assign all to" the zztest connection → Save Bindings →
+  toast; verify persisted via `GetInstanceAsync` (not just a toast).
+- *Attribute-override round-trip* — type text override → Save Overrides → toast →
+  `GetInstanceAsync` shows it.
+- *Area reassignment* — select area → Set Area → toast → `GetInstanceAsync` confirms.
+- *(⚠ feasibility)* Alarm-override Edit→Save→badge→Clear — needs a template attribute with
+  an alarm; resolve via `instance alarm-override set` precondition or downgrade to a
+  section-renders assertion. Other three tests don't depend on alarms.
+- *Edge* — `/deployment/instances/999999/configure` → `alert-danger` not-found.
+
+**`ApiKeyCrudTests`** (`ApiSurfaceFixture`; teardown `security api-key delete`):
+- Create→token reveal (`data-test="created-token"`, Copy works); Enable/Disable toast +
+  badge; Delete confirm `.btn-danger` → row gone; Edit (name disabled, toggle method).
+- Edges: empty name → validation; uncheck all methods → "select at least one".
+
+**`TransportExportTests`** + negative import:
+- Export — CLI-create zztest template → `/design/transport/export` → select → passphrase →
+  Export → assert the bundle download (filename/size).
+- Wrong-passphrase import — feed the exported bundle to the import wizard with a bad
+  passphrase → Unlock error state (passphrase input stays, error shown), no `diff-summary`.
+
+### Wave 2 — Tier 2 real-time / relay (drive behavior, tolerant, generous timeouts, `SkippableFact`)
+
+- **`DeploymentsRealtimeTests`** — on `/deployment/deployments`, CLI `instance deploy` a
+  fixture instance → status row appears within ~20s (SignalR push). Second: Pause → deploy
+  → row absent → Refresh → row present.
+- **`TopologyAreaTests`** — create area (toolbar + site context-menu); inline rename
+  (Enter commits + toast; Escape reverts); move area; move instance; *(⚠)* Diff dialog
+  opens for a deployed instance (tolerant).
+- **`DebugViewTests`** — select site-a + enabled instance → Connect → badge "Live" +
+  snapshot/table region renders (tolerant: rows *or* "Waiting for snapshot"). Disconnect →
+  tables gone, selects re-enabled.
+- **`ParkedMessagesActionTests`** — render+controls guard on the page's own Retry/Discard
+  button presence/enablement (site SQLite not seedable → no click-through here).
+- **Discard click-throughs** — add the symmetric Discard-clicked test to
+  `SiteCallsPageTests` and `NotificationActionTests` (Retry is already click-through-tested).
+
+### Wave 3 — Tier 3 config CRUD (breadth; CLI teardown by zztest name)
+
+- **`NotificationListCrudTests`** — create → add recipient → delete recipient (no confirm)
+  → delete list (confirm + "Deleted." toast).
+- **`SmtpConfigTests`** (RequireAdmin) — add (host+from required) → saved toast →
+  credentials "(stored)"; edit cancel/save.
+- **`NotificationKpisTests`** — load → 5 tiles resolve (or "—") → Refresh spinner.
+- **`DataConnectionCrudTests` / `ExternalSystemCrudTests` / `SharedScriptCrudTests`** —
+  create→(edit)→delete round-trips.
+- **`ApiMethodFormTests`** — create method (name+timeout, minimal Monaco script, default
+  schema) → Save → appears under external systems (Monaco interaction minimal/tolerant).
+- **`EventLogsTests`** — Search disabled until site selected; select site → Search; filter
+  by Severity; row expand/collapse; Load more appends.
+- **`AuditConfigurationTests`** — load `/audit/configuration`; search narrows; Prev disabled
+  on page 1; large-state modal open/close; copy-entity-id toast; `?bundleImportId=` chip
+  drill-in.
+
+### Wave 4 — cross-cutting edge sweep (⚠ verify each page's real validation first)
+
+- **Sites** — duplicate-identifier (assert whatever the app surfaces), cancel-from-edit,
+  invalid Akka/gRPC URL *if validated*.
+- **Templates** — duplicate-name, cancel, edit existing attribute, delete-when-instances-
+  exist (blocking).
+- **LDAP** — duplicate group name, missing-field validation.
+- **Audit Log** — filter combination (channel+site+time), empty-results-after-Apply, drawer
+  close (X/Escape), non-API row has no cURL button, pagination.
+- **Site Calls** — filter by status, empty state, keyset pagination (`site-calls-prev/next`
+  hooks exist, unused today).
+- **Notification Report** — filter combos, "Stuck only", detail modal open/close, pagination.
+
+## Error handling & verification strategy
+
+**Validation-behavior protocol (the ⚠ items).** Before asserting a specific failure mode,
+the implementer reads the page code-behind to learn what it actually does — inline
+`_formError` vs error toast vs DB-constraint bubble vs silent success — and asserts that
+reality. Where the app doesn't validate (relies on a DB constraint with no friendly
+message), the test asserts the real surfaced behavior and a code comment notes the gap.
+Same protocol resolves the alarm-override and Diff-dialog feasibility ⚠s.
+
+**Flake management (real-time/streaming).** Web-first assertions only (`Expect(...)` with
+explicit timeouts), never `WaitForTimeout` + read. Generous ceilings (~20s). Outcome-
+tolerant where the cluster's response isn't deterministic. All `SkippableFact` +
+`ClusterAvailability` gated. Documented fallback: if a real-time test is irreducibly flaky,
+downgrade *that test only* to a render+controls guard.
+
+**Teardown & residue.** Best-effort cleanup in `DisposeAsync`/`finally`, keyed on
+`zztest-*`. Each wave ends with the no-residue check (`site/template/instance/
+data-connection/api-key/notification list` via CLI + DB marker scan) → zero. Mutating tests
+that target real site-a leave it as found.
+
+**Per-wave gate.** Wave is "done" only when: new tests pass against the live cluster, the
+full suite stays at **0 failed** (skips logged), zero residue, and `dotnet build` is clean
+(`TreatWarningsAsErrors=true`). `data-test` additions verified not to alter rendered
+behavior.
+
+## Scope guard (YAGNI)
+
+No new page-object framework, no CI wiring, no parallelization/runner changes, no
+visual-regression/screenshot testing, no perf testing. New `data-test` attributes are the
+only app-code change and are purely additive. Everything slots into the existing structure.
+
+## Success criteria
+
+All 4 waves merged; ~23 files / ~65 new cases; every Tier 1–3 page from the audit has
+functional coverage; the edge sweep adds duplicate/cancel/empty/filter-combo/pagination
+assertions to the previously happy-path-only pages; suite green with logged skips; zero
+residue.
+
+## Native tasks
+
+Brainstorming checklist tasks #73–#78 track this design through to writing-plans. The
+implementation plan (produced next by the writing-plans skill) carries its own task set,
+one wave at a time.