Joseph Doherty f4707745bf docs(plans): completion roadmap for stillpending.md audit
Add the system-completion design doc (risk-first milestones M1-M10):
Phase 1 Stabilize (M1 runtime wiring, M2 correctness, M3 script trust
boundary, M4 doc reconciliation) then Phase 2 Expand (M5-M10 feature
epics). Scope = all Tier 1/2/4 + in-scope Tier 3 features; T12/T19
deferred to own brainstorm; deliberate anti-goals excluded. Also commit
the source audit (stillpending.md).
2026-06-15 09:27:00 -04:00

# ScadaBridge — Pending / Deferred / Partial / Missing Functionality Audit
**Date:** 2026-06-15
**Scope:** Full system — design specs (`docs/requirements/`), all of `src/`, the Central UI / CLI / Management Service surfaces, and the plan/checklist archive (`docs/plans/`).
**Method:** Five parallel read-only investigators, each verifying doc claims against actual code (file:line evidence). Top findings were independently corroborated by 2+ agents.
## Executive summary
The codebase is unusually clean: **zero real `TODO`/`FIXME` markers in `src/`**, and all 11 implementation phases self-report complete. Consequently the unfinished work does **not** announce itself; it hides in four forms:
1. **Silent gaps (Tier 1)** — documented as working, not marked deferred, but absent or inert in production.
2. **Partial / behaviorally-divergent functionality (Tier 2)** — real, but narrower or different from the spec.
3. **Intentional deferrals (Tier 3)** — knowingly punted, correctly documented, with extensible seams. *Not defects.*
4. **Doc↔code drift (Tier 4)** — code is fine; the specs describe a superseded architecture.
**Actionable risk is concentrated in Tier 1.** Recommended starting point: **#3 / #4 (wire the two never-started audit actors)** — highest impact, smallest blast radius.
---
## Tier 1 — Silent gaps: documented as working, but not actually running
These are the dangerous ones. Specs present them as live behavior, they are **not** marked deferred, yet the functionality is absent or inert.
| # | Gap | Where | Impact |
|---|-----|-------|--------|
| **1** | **Script "test-compilation" does no real compilation.** The headline pre-deploy gate ("scripts must compile without errors") is a brace-balance + forbidden-API *substring* scan. No Roslyn reference in the project. | `TemplateEngine/Validation/ScriptCompiler.cs:56-104`; used by `ValidationService.cs:128` | Semantically-broken C# passes validation and **deploys**. Found independently by 2 agents. |
| **2** | **Script forbidden-API gate is bypassable.** The script trust model's only design-time enforcement is the same substring scan — defeated by aliases / `using static` / `global::`. Self-documented as "SECURITY LIMITATION (TemplateEngine-006)". | `ScriptCompiler.cs:14-22,61-72`; `ValidationService.cs:346` | Security boundary is advisory only. 2 agents. |
| **3** | **Audit Log 365-day retention purge never starts.** `AuditLogPurgeActor` exists but has **zero** `ActorOf`/`Props.Create` callers; only the roll-*forward* partition service runs. | `AuditLog/.../AuditLogPurgeActor.cs:58`; `AkkaHostedService.cs:486` ("wired in a later bundle") | The documented purge does not run in production; `AuditLog` grows unbounded. |
| **4** | **Audit Log reconciliation self-heal never wired.** `IPullAuditEventsClient` has no implementation; `SiteAuditReconciliationActor` is never instantiated. | `IPullAuditEventsClient.cs:31`; `SiteAuditReconciliationActor.cs:68` | The documented "lost-telemetry fallback" doesn't exist; forward telemetry is the only path. |
| **5** | **Site Call Audit reconciliation pull + daily purge both missing.** The actor's own docstring admits it. `PurgeTerminalAsync` is implemented but never invoked. | `SiteCallAuditActor.cs:28-34`; `SiteCallAuditRepository.cs:213` | `SiteCalls` mirror has no self-heal and grows unbounded. |
| **6** | **Site Event Logging emits only 2 of 7 documented categories.** Only `connection` + `script` (error-path only). Alarm, Deployment, Store-and-Forward, Instance-Lifecycle, Notification events are never logged — `ISiteEventLogger` isn't injected into those subsystems. | `DataConnectionActor.cs`, `ScriptExecutionActor.cs` (only emitters); spec "Events Logged" §20-28 | Operational event log is materially incomplete vs spec. |
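
To make gaps #1 and #2 concrete, here is a minimal Python sketch of a brace-balance plus forbidden-substring gate. It is an illustration only, not the code in `ScriptCompiler.cs`: the deny-list entries, the function names, and the sample C# snippets are all assumptions. Under those assumptions, a `using static` import slips past the substring scan, and a type error passes because nothing is actually compiled.

```python
# Hypothetical deny-list; the real list in ScriptCompiler.cs is not shown in this audit.
FORBIDDEN = ("System.IO.File", "Process.Start")


def braces_balanced(source):
    """Return True when every '{' is closed by a later '}'."""
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def naive_validate(source):
    """Brace-balance check plus substring scan; returns a list of error strings."""
    errors = []
    if not braces_balanced(source):
        errors.append("unbalanced braces")
    for token in FORBIDDEN:
        if token in source:
            errors.append("forbidden API: " + token)
    return errors


# Illustrative C# snippets (assumptions, not taken from the codebase).
DIRECT = 'void Run() { System.Diagnostics.Process.Start("cmd.exe"); }'
VIA_USING_STATIC = (
    "using static System.Diagnostics.Process;\n"
    'void Run() { Start("cmd.exe"); }'
)
TYPE_ERROR = 'void Run() { int x = "not an int"; }'

# The direct call is caught; the other two pass the gate with no errors,
# although one reaches the same API and the other would not compile.
direct_errors = naive_validate(DIRECT)
bypass_errors = naive_validate(VIA_USING_STATIC)
type_errors = naive_validate(TYPE_ERROR)
```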
---
## Tier 2 — Partial / behaviorally-divergent functionality
Real, but narrower than the spec — wrong in a way that could surprise an operator or script author.
| # | Gap | Where |
|---|-----|-------|
| **7** | **`Database.CachedWrite` misclassifies permanent SQL errors as transient** → retries forever instead of failing fast to the script. The API path does it right; the DB path does not. No immediate attempt, no synchronous permanent-`Failed` return. | `DatabaseGateway.cs:78-204` (cf. `ExternalSystemClient.cs:100-161`) |
| **8** | **Alarm `conditionFilter` is plumbed end-to-end but applied nowhere** — set a filter on a native-alarm source and it silently mirrors *all* conditions. | `DataConnectionActor.cs:1482,1540-1554`; `RealOpcUaClient.cs:242,295`; `MxGatewayDataConnection.cs:154-167` |
| **9** | **Per-script execution timeout doesn't exist** — spec promises per-script; only a global `ScriptExecutionTimeoutSeconds`. No field in the template/flattened model to carry it. | `SiteRuntimeOptions.cs:31`; `ScriptExecutionActor.cs:100`; `AlarmExecutionActor.cs:66` |
| **10** | **Connection-level diffs never surface in the deployment diff**: `ComputeConnectionsDiff` is dead code (no callers); `ConfigurationDiff` has no slot for it. Per-attribute binding drift *is* caught; standalone connection endpoint (protocol/config/failover) diff is not. | `DiffService.cs:158-204`; `Commons/Types/Flattening/ConfigurationDiff.cs:7-24` |
| **11** | **Inbound API auth transport drift** — code uses `Authorization: Bearer sbk_<keyId>_<secret>`; doc says `X-API-Key` header. | `InboundAPI/.../EndpointExtensions.cs:83-90` |
| **12** | **Inbound API audit write is fire-and-forget *after* response flush** — doc says synchronous *before* flush. Row is still emitted (fail-soft), just non-blocking and after the body is forwarded. | `AuditWriteMiddleware.cs:195-212,281-290` |
| **13** | **Inbound `Object`/`List` extended types are shape-validated only** — no nested/field-level type validation, despite spec implying typed/nested validation. | `ParameterValidator.cs:109-145`; `ReturnValueValidator.cs:18` |
| **14** | **JWT-in-cookie session design not implemented**: `/auth/login` signs a plain `ClaimsPrincipal`; `GenerateToken` only used by the CLI `/auth/token` path; `ValidateToken` has no external callers. | `AuthEndpoints.cs:38,75-112,152`; `ServiceCollectionExtensions.cs:99-118` |
| **15** | **"Re-query LDAP every 15 min / roles never >15 min stale" not implemented for interactive sessions** — `JwtTokenService.RefreshToken`/`RecordActivity`/`ShouldRefresh`/`IsIdleTimedOut` have **zero** call sites; roles fixed until cookie expiry. The 15-min sliding + 30-min idle layers are collapsed into a single 30-min sliding cookie window. | `JwtTokenService.*` (no callers); `ServiceCollectionExtensions.cs:99-148` |
| **16** | **Transport stale-instance enumeration always returns empty**: `BundleImporter` returns `Array.Empty<int>()`; UI shows a generic warning with no count, link not filtered to stale instances. | `BundleImporter.cs:733`; `TransportImport.razor:347-388` |
| **17** | **`MachineDataDb` fail-fast requirement not enforced** — spec (REQ-HOST-3/4) requires central nodes to validate a non-empty `MachineDataDb` connection string. `DatabaseOptions` has only `ConfigurationDb`/`SiteDbPath`; validator never checks it; 0 `grep` hits in `src/`. Key lives only in docker appsettings as dead config. | `DatabaseOptions.cs:6-12`; `StartupValidator.cs:60-61` |
| **18** | **CI grep-guard against `UPDATE/DELETE … AuditLog` not in the repo** — spec claims a build-time grep that fails on data-layer mutations. DB-role DENY enforcement *is* present in migrations (so this is a backstop, not the only control), but the claimed code-level guard is absent. | spec `Component-AuditLog.md:335-336`, `Component-ConfigurationDatabase.md:297` |
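
Gap #7 describes what the DB path lacks: an immediate attempt, a synchronous `Failed` result for permanent errors, and buffering reserved for transient ones. The following Python sketch shows that classification in miniature. It is not the `DatabaseGateway.cs` logic: the two exception classes, the `"Delivered"` and `"Buffered"` result names, and the `execute` callable are hypothetical, and how real SQL error codes map onto the two classes is outside what this audit establishes.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical: a retry may succeed (timeout, dropped connection)."""


class PermanentError(Exception):
    """Hypothetical: a retry cannot succeed (constraint violation, invalid SQL)."""


def cached_write(execute, statement, retry_queue):
    """Attempt the write once, immediately, and buffer only transient failures.

    Returns "Delivered", "Buffered", or "Failed".
    """
    try:
        execute(statement)
        return "Delivered"
    except PermanentError:
        # Fail fast: report to the caller synchronously and never enqueue.
        return "Failed"
    except TransientError:
        retry_queue.append(statement)
        return "Buffered"


# Stand-in executors for the three outcomes.
def ok(statement):
    return None


def timeout(statement):
    raise TransientError("timeout")


def constraint_violation(statement):
    raise PermanentError("constraint violation")


queue = deque()
results = [
    cached_write(ok, "INSERT a", queue),
    cached_write(timeout, "INSERT b", queue),
    cached_write(constraint_violation, "INSERT c", queue),
]
```

Only the statement that hit a transient failure ends up in `queue`; the permanently failing one is returned to the caller instead of being retried indefinitely.
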
### Lower-severity Tier-2 / behavioral notes
| # | Gap | Where |
|---|-----|-------|
| 19 | **Script "started"/"completed" events not logged** (only failures, severity `Error`). | `ScriptExecutionActor.cs:239,256`; `ScriptActor.cs:369` |
| 20 | **Return-type compatibility check is dead scaffolding**: `BuildReturnMap` builds maps never read; no return-type comparison runs. | `SemanticValidator.cs:62-63,279-287` |
| 21 | **Argument *type* compatibility not checked** — only arg *count* (comma counting). | `SemanticValidator.cs:251-266,390-425` |
| 22 | **Native-alarm-source connection-capability validation never runs in deploy pipeline**: no production caller supplies the `alarmCapableConnectionNames` parameter. | `SemanticValidator.cs:30-33,239-245`; `FlatteningPipeline.cs:93,115` |
| 23 | **Connection-binding completeness is a non-blocking Warning, not deploy-gating Error**; "name exists at site" half missing. | `ValidationService.cs:504-519`; `ValidationResult.cs:9` |
| 24 | **Debug snapshot/subscribe for unknown instance returns empty snapshot, not error** — caller can't distinguish "not deployed" from "deployed but empty." | `DeploymentManagerActor.cs:845-866` |
| 25 | **Recursion-limit error logged to .NET `ILogger`, not the site event log** as spec requires. | `ScriptRuntimeContext.cs:302-305,464-466` |
| 26 | **Debug-stream snapshot/stream ordering reversed; no timestamp-dedup replay**: `PreStart` sends snapshot first, opens stream after; gap-window events lost (spec wants stream-first + replay/dedup). | `DebugStreamBridgeActor.cs:89-103,163-166` |
| 27 | **OPC UA native-alarm transition leaves several display fields empty** (Category/Description/OperatorUser/OriginalRaiseTime/CurrentValue/LimitValue) — partly by design. | `RealOpcUaClient.cs:395-403`; `MxGatewayAlarmMapper.cs:79-113` |
| 28 | **Readiness gate omits "required cluster singletons running" criterion** — covers membership + DB connectivity only (softened by spec's "(if applicable)"). | `Program.cs:188-201,314-317`; `AkkaClusterHealthCheck.cs:54` |
| 29 | **SiteEventLog active-node purge gate never registered**: `SiteEventLogActiveNodeCheck` not added to DI; purge defaults to `() => true`, runs on standby too (harmless, but documented restriction unenforced). | `SiteEventLogging/ServiceCollectionExtensions.cs:33-37`; `EventLogPurgeService.cs:61` |
| 30 | **`FailedWriteCount` metric exposed "for future Health Monitoring" but never consumed** — dangling metric. | `ISiteEventLogger.cs:32-40` |
| 31 | **`StateTransitionValidator` allows Delete from `NotDeployed`; spec matrix says No** (deliberate per code comment, contradicts doc). | `StateTransitionValidator.cs:38-39` |
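
The ordering problem in #26 can be shown with a small single-threaded Python simulation. It is a sketch under assumptions, not the `DebugStreamBridgeActor` code: `Source`, both `attach_*` functions, and the `between` hook (which stands in for events arriving during attachment) are hypothetical names. Snapshot-first loses the event published in the gap window. Stream-first buffers it, then replays the buffer over the snapshot and skips anything whose timestamp is not newer than what the snapshot already holds.

```python
class Source:
    """Hypothetical event source: keeps current values and pushes changes to subscribers."""

    def __init__(self):
        self.values = {}        # name -> (timestamp, value)
        self.subscribers = []

    def publish(self, ts, name, value):
        self.values[name] = (ts, value)
        for callback in self.subscribers:
            callback((ts, name, value))

    def snapshot(self):
        return dict(self.values)


def attach_snapshot_first(source, between=lambda: None):
    """Snapshot, then subscribe: anything published in between is missed."""
    view = source.snapshot()
    between()                                    # gap window
    buffered = []
    source.subscribers.append(buffered.append)
    return view


def attach_stream_first(source, between=lambda: None):
    """Subscribe, then snapshot, then replay the buffer with timestamp dedup."""
    buffered = []
    source.subscribers.append(buffered.append)   # 1. open the stream first
    between()                                    #    gap-window events are buffered
    view = source.snapshot()                     # 2. take the snapshot
    for ts, name, value in buffered:             # 3. replay, skipping stale entries
        if name not in view or ts > view[name][0]:
            view[name] = (ts, value)
    return view


old = Source()
old.publish(1, "temp", 20)
lossy_view = attach_snapshot_first(old, lambda: old.publish(2, "temp", 21))

new = Source()
new.publish(1, "temp", 20)
complete_view = attach_stream_first(new, lambda: new.publish(2, "temp", 21))
```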
---
## Tier 3 — Intentional deferrals (correctly documented — NOT defects)
Knowingly punted, with extensible seams and explicit doc notes. `[PERM]` = permanent / v-next; `[SLICE]` = deferred-to-a-later-slice with seam present.
**Centralized Audit Log (#23)**
- `[PERM]` Hash-chain tamper evidence (v1.x). `verify-chain` CLI is a no-op stub that prints "not enabled in this release". — `AuditCommands.cs:243-246`; `AuditVerifyChainHelpers.cs:6-8`
- `[PERM]` Parquet export/archival. Server returns HTTP `501`; CSV + JSONL implemented. — `AuditEndpoints.cs:188-194`; `AuditExportHelpers.cs:139-148`
- `[PERM]` Per-channel retention overrides. — `2026-05-20-audit-log-code-roadmap.md:16`
- `[PERM]` Tag-cascade for `ParentExecutionId` — only the inbound-API→routed-site bridge is built; trigger-driven runs pass `parentExecutionId = null`. — `ScriptActor.cs:404,429`; `2026-05-21-audit-parent-executionid-design.md:209`
- `[PERM]` ExecutionId/ParentExecutionId backfill on historical rows; SourceNode backfill on legacy rows; per-node stuck-count KPIs.
- `[PERM]` Structured/response-header response capture; inbound request-header capture; per-method opt-out; `AuditInboundCeilingHits` metric. — `2026-05-23-inbound-api-full-response-audit-design.md:113-127`
- *Uncertain:* CLI `audit tree` command (doc "maybe", not found in CLI).
**Notifications (#8 / #21)**
- `[SLICE]` Teams (and all non-Email) notification types — `INotificationDeliveryAdapter` seam exists, only `EmailNotificationDeliveryAdapter` implemented; `NotificationType` enum is Email-only. Missing-adapter path parks gracefully. — `NotificationType.cs:6-9`; `NotificationOutboxActor.cs:457-474`
- `[SLICE]` Central UI notification-list form has no `Type` selector (Email hard-coded). — `NotificationListForm.razor`
- `[PERM]` Historical/trend KPI charts (no time-series store).
**Native Alarms / MxGateway / OPC UA**
- `[PERM]` Native-alarm ack/shelve/suppress write-back; central alarm tables/history/journal; alarm-driven notifications/scripts — read-only by design. — `2026-05-29-native-alarms-design.md:201-206`
- `[SLICE]` Dedicated operator Alarm Summary page (DebugView only for now).
- `[PERM]` MxGateway secured writes (operator+verifier).
- `[SLICE]` OPC UA address-space search; `BrowseNext` paging. — `RealOpcUaClient.cs:574`
- `[PERM]` OPC UA type-info surfacing; bulk override import/CSV.
- `[SLICE]` OPC UA "Verify endpoint" connectivity button; cert-management UI.
**Transport (#24)**
- `[PERM]` Site-scoped / instance-scoped artifact transport (needs name-mapping subsystem).
- `[PERM]` Direct cluster-to-cluster pull; asymmetric bundle signing; differential/incremental bundles.
- `[PERM/SLICE]` Per-line/Myers diff for Modified artifacts (coarse line-count delta only). — `ArtifactDiff.cs:18-25`
**TreeView**
- `[SLICE/PERM]` R6 lazy-loading, R7 keyboard nav, R16 multi-select — spec marks all "(Deferred)". — `Component-TreeView.md:87-93,288-295`
**Templates / Data Connections / Triggers UI**
- `[SLICE]` Template tree search/filter; `[PERM]` folder drag-drop, sibling reorder, root context menu.
- `[PERM]` Move data connection between sites; `[SLICE]` connection live-status indicators (blocked on DCL state surfacing).
- `[SLICE]` Base-template versioning "update-derived" flow; multiple inheritance levels; `[PERM]` promote-derived-to-base, cross-tenant libraries.
- `[SLICE]` Strict expression-trigger analysis kind; `[PERM]` WhileTrue trigger mode for alarms.
- `[SLICE]` Schema-driven value-entry forms; schema hover/completion; `[PERM]` JSON Schema `$ref` reuse / template-level schema library.
**Cached-call tracking (#6 / #22)**
- `[SLICE]` CLI surface for site-local Retry/Discard of cached calls; `[PERM]` unified notifications+site-calls outbox page.
**UI audit backlog (`2026-05-12-ui-audit.md:536-554`)**
- `IDialogService` modal abstraction; design-tokens/CSS-vars; dark-mode/theming; shared pagination+filter component; accessibility pass; replacing SignalR debug-view streaming.
**Environment / tooling**
- `[PERM]` True air-gapped second environment (env2 shares MSSQL/LDAP/SMTP); 3rd/4th env; `--env` flag on `deploy.sh`.
- `[PERM]` Repo/folder rename (kept as ScadaBridge to preserve context).
- `[SLICE]` Playwright alarm-override UI coverage.
---
## Tier 4 — Doc↔code drift (code is fine; docs describe a superseded architecture)
Worth fixing for anyone relying on the docs as the spec.
**Config DB / Commons re-architecture not reflected in specs (High doc-impact):**
- `AuditLog` table collapsed to 10 canonical + `DetailsJson` + 6 PERSISTED `JSON_VALUE` computed cols; doc still lists ~24 typed columns (`Kind`, `HttpStatus`, `RequestSummary`, …). — migration `20260602174346_CollapseAuditLogToCanonical.cs`; `Entities/AuditLogRow.cs:54-136`
- `AuditEvent` moved out of Commons into the external `ZB.MOM.WW.Audit` NuGet package; doc (REQ-COM-1/3/5b) still describes it as a Commons type. — `Commons.csproj:11`
- `ApiKey` entity / API-key persistence retired to shared `ZB.MOM.WW.Auth.ApiKeys` SQLite store; doc still lists `ApiKey` + `ApprovedApiKeyIds`. — migration `20260602092753_RetireInboundApiKeyStore.cs`
**CLI docs drift (README is the stale doc; `Component-CLI.md` mostly matches code):**
- Entire `bundle` (Transport #24) command group is shipped + registered but documented in **neither** `Component-CLI.md` **nor** `CLI/README.md`. — `Program.cs:36`; `BundleCommands.cs:24-372`
- `security api-key create` requires undocumented `--methods` (Required); docs show only `--name`. — `SecurityCommands.cs:41-45`
- `security api-key update`/`delete` use `--key-id`; docs document `--id` (and an unwired `--name` on update). — `SecurityCommands.cs:60,71`
- `security api-key set-methods` subcommand exists in code, documented nowhere. — `SecurityCommands.cs:91-102`
- `api-method create` uses required `--script`; docs document `--code` + `--description` (neither exists). README is internally inconsistent (create=`--code`, update=`--script`). — `ApiMethodCommands.cs:57-62`
- `db-connection create`/`update` documented with `--provider`; code has no such option. — `DbConnectionCommands.cs:56-72`
- Widespread README option-name drift where `Component-CLI.md` already matches code (scope-rule `--mapping-id`, health `--site`/`--keyword`, template attribute `--value`/`--data-source`, template alarm `--trigger-type`/`--priority`/`--trigger-config`, composition delete `--id`, etc.).
- `audit query` doc lists `--page` (code is keyset-only `--all`); undocumented `--execution-id`/`--parent-execution-id` filters exist.
**Stale "deferred" markers for things that have actually SHIPPED:**
- Transport CLI (`bundle export/preview/import`) — design doc §13 said "deferred"; now implemented.
- `SourceNode` capture — `.tasks.json` shows all 21 tasks "pending"; fully implemented across Commons/AuditLog/NotificationOutbox/SiteCallOperational.
- Site Call Audit Retry/Discard relay — DI comment says deferred; implemented + wired (`SiteCallAuditActor.cs:150-156,450-505`; `AkkaHostedService.cs:580-589`).
- Bundle-import audit filter UI (Transport-012) — doc says deferred follow-up; shipped (`ConfigurationAuditLog.razor` `?bundleImportId=` filter).
- Redaction/payload-cap "deferred to M5" comments in Site Runtime — already shipped (`ScadaBridgeAuditRedactor`, `AuditLogOptions.DefaultCapBytes/ErrorCapBytes`).
- `AuditLogPage.HandleRowSelected` class comment says "no-op seam"; method is fully wired (opens drawer).
**Other doc/spec inconsistencies (code richer/different than doc):**
- Security role names: doc says Admin/Design/Deployment; code uses Administrator/Designer/Deployer/Viewer (canonicalized via migration).
- `SiteCall` entity field names diverge from doc (`Channel` not `Kind`, `SourceSite` not `SourceSiteId`, adds `HttpStatus`/`IngestedAtUtc`).
- `ExecuteReader` audited as `DbWrite` (read/write distinguished via `Extra` JSON `op`, not a distinct `AuditKind`).
- Inbound audit doc references `ApiInbound.Completed`; actual kinds are `InboundRequest`/`InboundAuthFailure`.
- `Teams` claimed present in `NotificationType` enum by Commons/ConfigDB docs; enum is Email-only.
- Commons under-documents shipped code: MxGateway endpoint serializer/validator/config, `Observability/ScadaBridgeTelemetry.cs`, `IInboundApiKeyAdmin`, `IAuditActorAccessor` — none in the doc folder map.
- `IHealthMonitoringRepository` listed in ConfigDB repo table but doesn't exist (doc annotated "future").
- `requirements-traceability.md` and many `.md.tasks.json` show "Pending" for shipped features — they track *plan generation*, not implementation; unreliable as a status source.
- `ExternalSystemForm` "Recent audit activity" drill-in omits `channel=ApiOutbound` and uses exact-match `target` instead of starts-with (sibling `ApiKeyForm` link is correct). — `ExternalSystemForm.razor:20-24`
---
## Code-level sweep — investigated and ruled out (false positives)
For completeness, items that *look* unfinished but are intentional:
- ~44 empty `catch` blocks — all have explanatory comments / intentional fallback (JSON parse → default; disposal-race `ObjectDisposedException`). None silently swallow real errors.
- `SiteNotificationRepository` / `SiteExternalSystemRepository` write methods throw `NotSupportedException` — by design (site config is read-only, managed via central deployment).
- `StubOpcUaClient` (canned data; `BrowseChildrenAsync` throws `NotImplementedException`) — dev/test-only; production wires `RealOpcUaClientFactory`/`RealMxGatewayClientFactory` (`DataConnectionFactory.cs:38-47`).
- `NoOpSiteStreamAuditClient`, `SandboxNotifyHelper`, sandbox host fakes — legitimate DI-default / test composition seams.
- `AddSecurityActors` / `AddTemplateEngineActors` "Phase 0 placeholder" registrations — intentional empty seams (actor wiring lives in Host).
- Migration `Down()` `NotSupportedException`, MxGateway/Bundle version-rejection `NotSupportedException`, `AuditWriteMiddleware` write-only-stream `NotSupportedException` — intentional guards.
- Management Service: 113 handlers; all wire-registered `Mgmt*` commands dispatched. The three "unhandled" (`ResolveRolesCommand` retired; `BrowseNodeCommand`/`ReadTagValuesCommand` routed direct-to-site) are intentional.
- Central UI: no stub/placeholder pages, no `NotImplementedException`, no "coming soon" banners, no no-op `@onclick`. `disabled=`/`placeholder=` usages are legitimate (loading guards, edit locks, HTML hints).
---
## Phase completeness (self-reported)
All 11 phases report Complete with passing verification gates:
- **Phase 0** Solution Skeleton — Complete (gate 11/11, 57 tests).
- **Phase 1** Central Foundations — Complete (gate 20/20, 186 pass + 1 live-LDAP skip).
- **Phase 2** Modeling & Validation — Complete (gate 9/9, 359 tests).
- **Phase 3A** Runtime Foundation — Complete (gate 13/13, 389/389).
- **Phase 3B** Site I/O & Observability — Complete (gate 11/11, 541 cumulative).
- **Phase 3C** Deployment & Store-Forward — Complete (terse checklist).
- **Phase 4** Operator UI — Complete (terse checklist).
- **Phase 5** Authoring UI — Complete (terse checklist).
- **Phase 6** Deployment & Ops UI — Complete (terse checklist; Codex external-review step skipped, best-effort).
- **Phase 7** Integrations — Complete (terse checklist; Q12 SMTP-OAuth2 is a test-env dependency).
- **Phase 8** Production Readiness — Complete (terse checklist).
The ~665 unchecked `- [ ]` items in phase *plan* docs are spec-traceability references (each dispositioned Pass / Out-of-scope in Forward/Reverse tables), a documentation style — not a TODO list.
**Operational (not code):** `docs/deployment/production-checklist.md` has ~60 unchecked install-time operator steps (env vars, connection strings, firewall ports 8081/636/587/1433, TLS certs, smoke tests).
---
## Confidence & caveats
- **High confidence** on Tier 1 — each item verified by reading the code (class/interface existence + absence of callers via grep); top items corroborated by 2+ independent agents.
- Terse Phase 3C through 8 checklists self-report "Complete / tests passing" with no per-gate breakdown; test counts for those phases were not independently re-run.
- Actual `src/` artifacts were treated as truth over `.tasks.json` status fields, which are demonstrably stale.
- Items marked *Uncertain* (e.g. `audit tree` CLI, per-channel retention) rest on doc text only.
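
The "absence of callers via grep" check used for Tier 1 can be approximated with a short script. The Python sketch below is one possible form of that check, not the tooling the investigators used: the function name and the in-memory `SOURCES` sample (including its file contents) are made up for illustration. It is a line-based heuristic, so a call split across several lines would be missed and needs a manual read.

```python
import re

# Lines that create an actor are assumed to mention ActorOf or Props.Create.
CREATION = re.compile(r"\b(ActorOf|Props\.Create)\b")


def find_actor_callers(actor_name, sources):
    """Return (path, line_number) pairs where actor_name shares a line with a creation call."""
    name = re.compile(r"\b" + re.escape(actor_name) + r"\b")
    hits = []
    for path, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if CREATION.search(line) and name.search(line):
                hits.append((path, number))
    return hits


# Hypothetical file contents, keyed by path.
SOURCES = {
    "Host.cs": (
        'system.ActorOf(Props.Create(() => new PartitionActor()), "partition");\n'
        "// AuditLogPurgeActor is wired in a later bundle\n"
    ),
    "AuditLogPurgeActor.cs": "public class AuditLogPurgeActor : ReceiveActor { }\n",
}

wired = find_actor_callers("PartitionActor", SOURCES)
never_started = find_actor_callers("AuditLogPurgeActor", SOURCES)
```

An empty result, as for `AuditLogPurgeActor` in this sample, is the signal that a class exists but nothing ever instantiates it.
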
## Recommended next steps
1. **Wire the two never-started audit actors** (#3, #4) — highest impact, smallest blast radius (DI/Host wiring + `IPullAuditEventsClient` impl).
2. **Site Call Audit reconciliation + purge** (#5) — same shape as #3/#4.
3. **Decide on script compilation/security** (#1, #2) — either implement the Roslyn gate or downgrade the spec's claims; currently the strongest functional + security gap.
4. **Site Event Logging categories** (#6) — inject `ISiteEventLogger` into the 5 missing subsystems.
5. **Reconcile Tier-4 doc drift** — update Config DB / Commons specs for the audit/auth re-architecture and the CLI docs for the `bundle` group + option names.