docs(code-reviews): record Admin-013 (SignalR hub clients cannot authenticate)

Records the post-review finding discovered during browser smoke-testing: the Admin-003 hub hardening was incomplete. The server-side Blazor HubConnection clients had no way to authenticate, so hub negotiate returned 401 and four pages threw unhandled exceptions (an HTTP 500 on the prerendered `/clusters/{ClusterId}` route, faulted circuits on the others). Logged as Admin-013 (High, Error handling & resilience), Status Resolved, fixed by commits f254539 + 8d5dbb4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(admin): authenticate SignalR hub clients with a bearer-token scheme
2026-05-22 12:29:36 -04:00 · 2026-05-22 12:06:29 -04:00 · 2026-05-22 11:56:06 -04:00 · 2026-05-22 11:29:21 -04:00 · 2026-05-22 11:25:39 -04:00 · 2026-05-22 11:25:25 -04:00
268 changed files with 22774 additions and 1958 deletions
@@ -0,0 +1,162 @@
+# Code Review Process
+
+This document describes how to perform a comprehensive, per-module code review of
+the `lmxopcua` codebase (the ZB.MOM.WW.OtOpcUa OPC UA server) and how to track
+findings to resolution.
+
+A **module** is one buildable project under `src/` (e.g.
+`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy`) or one test project under `tests/`
+(e.g. `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests`). Each module has its
+own folder under `code-reviews/` containing a single `findings.md`.
+
+## 1. Before you start
+
+1. Pick the module to review. Its folder is `code-reviews/<Module>/`, where
+   `<Module>` is the project name with the `ZB.MOM.WW.OtOpcUa.` prefix stripped:
+   - `src/Server/ZB.MOM.WW.OtOpcUa.Server` → `code-reviews/Server/`.
+   - `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` → `code-reviews/Driver.Galaxy/`.
+   - `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions` → `code-reviews/Core.Abstractions/`.
+   - `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests` →
+     `code-reviews/Driver.Galaxy.Tests/`.
+
+   The solution `ZB.MOM.WW.OtOpcUa.slnx` enumerates every project; `src/` is
+   grouped into `Core/`, `Server/`, `Drivers/`, `Client/`, and `Tooling/`.
+2. Identify the design context for the module:
+   - `CLAUDE.md` — project goal, the data-flow architecture, the contained-name
+     vs tag-name concept, and the **Library Preferences** / build & runtime
+     constraints.
+   - `StyleGuide.md` — repository code-style conventions.
+   - The relevant docs under `docs/` and `docs/v2/` — e.g. `docs/OpcUaServer.md`,
+     `docs/AddressSpace.md`, `docs/ReadWriteOperations.md`, `docs/security.md`,
+     `docs/Redundancy.md`, `docs/ScriptedAlarms.md`, `docs/AlarmTracking.md`,
+     `docs/ServiceHosting.md`, `docs/v2/plan.md`, `docs/v2/acl-design.md`,
+     `docs/v2/driver-specs.md`, `docs/v2/driver-stability.md`, the
+     `docs/v2/Galaxy.*.md` set, and the driver notes under `docs/drivers/`.
+   - The auto-memory index at
+     `~/.claude/projects/.../memory/MEMORY.md` records non-obvious project
+     decisions and is worth a scan before a review.
+3. Record the exact commit being reviewed: `git rev-parse --short HEAD`. Every
+   review is a snapshot — a finding only means something relative to a known
+   commit.
+4. Open `code-reviews/<Module>/findings.md` (copy it from
+   [`code-reviews/_template/findings.md`](code-reviews/_template/findings.md) if it
+   does not exist yet) and fill in the header table (reviewer, date, commit SHA,
+   status).
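
The folder-naming rule in step 1 is mechanical, so it can be sketched in a few lines. This is a minimal Python sketch, assuming project paths use forward slashes; `review_folder` is a hypothetical helper and not part of the repo's tooling.

```python
from pathlib import PurePosixPath

PREFIX = "ZB.MOM.WW.OtOpcUa."


def review_folder(project_dir: str) -> str:
    """Map a project directory under src/ or tests/ to its code-reviews folder."""
    # The module name is the project folder name with the shared prefix removed.
    name = PurePosixPath(project_dir).name
    if not name.startswith(PREFIX):
        raise ValueError(f"not an OtOpcUa project: {project_dir}")
    return f"code-reviews/{name[len(PREFIX):]}/"


print(review_folder("src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy"))
# prints code-reviews/Driver.Galaxy/
```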
+
+## 2. Review checklist
+
+Work through **every** category below for the module. A comprehensive review
+means the checklist is completed even where it produces no findings — record
+"No issues found" for a category rather than leaving it ambiguous.
+
+1. **Correctness & logic bugs** — off-by-one, null handling, incorrect
+   conditionals, misuse of APIs, broken edge cases, wrong data-type mapping.
+2. **OtOpcUa conventions** — the rules in `CLAUDE.md` and `StyleGuide.md`:
+   - Galaxy access flows through the in-process `GalaxyDriver` over gRPC to the
+     separately installed `mxaccessgw` gateway; nothing in this repo loads
+     MXAccess COM directly.
+   - Browse uses **contained names**; runtime read/write uses **tag names**
+     (`tag_name.AttributeName`).
+   - Authorization decisions happen in `DriverNodeManager` at the server layer,
+     never in driver-level code; drivers only report `SecurityClassification`
+     as metadata.
+   - .NET 10 / AnyCPU.
+   - Serilog with a rolling daily file sink.
+   - xUnit + Shouldly for unit tests.
+   - The .NET generic host with `AddWindowsService` for the Server and Admin
+     hosts.
+   - The OPC Foundation UA .NET Standard stack for OPC UA.
+   - Generated code is not hand-edited.
+3. **Concurrency & thread safety** — shared mutable state, race conditions,
+   correct use of `async`/`await`, locking, disposal races, background-loop and
+   reconnect-supervisor lifetimes.
+4. **Error handling & resilience** — exception paths, driver/gateway reconnect
+   handling, transient vs permanent error classification, graceful degradation,
+   correct OPC UA `StatusCode`s, address-space rebuild on redeploy.
+5. **Security** — OPC UA transport security profiles (`SecurityProfileResolver`),
+   LDAP bind authentication and the group→permission mapping
+   (`LdapUserAuthenticator`), ACL enforcement at the `DriverNodeManager` layer,
+   input validation, SQL injection in the `ConfigDb` / Galaxy Repository queries,
+   certificate handling, and secret handling (no logging of credentials, LDAP
+   service-account passwords, or API keys).
+6. **Performance & resource management** — `IDisposable` disposal, gRPC channel /
+   stream / session lifetimes, buffering and back-pressure on event pumps,
+   unnecessary allocations on hot paths, N+1 queries.
+7. **Design-document adherence** — does the code match `CLAUDE.md`, the relevant
+   `docs/` and `docs/v2/` designs? Flag both code that drifts from the design and
+   design docs that are now stale.
+8. **Code organization & conventions** — namespace hierarchy, project layout, the
+   Options pattern, separation of concerns, the capability-interface seams
+   (`IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, etc.).
+9. **Testing coverage** — are the module's behaviours covered? Unit suites are
+   `*.Tests` (xUnit + Shouldly); integration suites are `*.IntegrationTests` and
+   need their Docker fixture up; DB-backed tests in `*.Configuration.Tests`,
+   `*.Admin.Tests`, and `*.Server.Tests` need the central SQL Server. Note
+   untested critical paths and missing edge-case tests.
+10. **Documentation & comments** — XML doc accuracy, misleading or stale comments,
+    undocumented non-obvious behaviour.
+
+## 3. Recording findings
+
+Add one entry per finding to the `## Findings` section of the module's
+`findings.md`, using the entry format in
+[`_template/findings.md`](code-reviews/_template/findings.md).
+
+- **Finding ID** — `<Module>-NNN`, numbered sequentially within the module and
+  never reused (e.g. `Driver.Galaxy-001`). IDs are permanent even after
+  resolution.
+- **Severity:**
+  - **Critical** — data loss, security breach, crash/deadlock, or outage.
+  - **High** — incorrect behaviour with significant impact; no safe workaround.
+  - **Medium** — incorrect or risky behaviour with limited impact or a workaround.
+  - **Low** — minor issues, style, maintainability, documentation.
+- **Category** — one of the 10 checklist categories above.
+- **Location** — `file:line` (clickable), or a list of locations.
+- **Description** — what is wrong and why it matters.
+- **Recommendation** — concrete suggested fix.
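
Because IDs are sequential, permanent, and never reused, the next ID continues after the highest one ever issued, not after the count of findings still present. A minimal Python sketch, assuming the zero-padded three-digit form shown in the examples; `next_finding_id` is hypothetical and not part of `regen-readme.py`.

```python
import re

# <Module>-NNN: the module folder name, a hyphen, then a zero-padded number.
FINDING_ID = re.compile(r"^(?P<module>[A-Za-z][A-Za-z0-9.]*)-(?P<num>\d{3,})$")


def next_finding_id(module: str, existing_ids: list[str]) -> str:
    """Return the next ID for `module`, continuing after the highest one used."""
    used = []
    for finding_id in existing_ids:
        match = FINDING_ID.match(finding_id)
        if match is None:
            raise ValueError(f"malformed finding ID: {finding_id!r}")
        if match["module"] == module:
            used.append(int(match["num"]))
    # Continue after the maximum so a closed finding's number is never reissued.
    return f"{module}-{max(used, default=0) + 1:03d}"
```

With `Admin-001` through `Admin-013` on file, the sketch yields `Admin-014` regardless of how many of those findings are already closed, which is also the numbering rule a re-review follows.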
+
+After recording findings, update the module header table (status, open-finding
+count) and regenerate the base README (section 5).
+
+## 4. Marking an item resolved
+
+Findings are **never deleted** — they are an audit trail. To close one, change
+its **Status** and complete the **Resolution** field:
+
+- `Open` — newly recorded, not yet addressed.
+- `In Progress` — a fix is actively being worked on.
+- `Resolved` — fixed. The Resolution field must state the fixing commit SHA, the
+  date, and a one-line description of the fix.
+- `Won't Fix` — intentionally not fixed. The Resolution field must justify why.
+- `Deferred` — valid but postponed. The Resolution field must say what it is
+  waiting on (e.g. a tracked issue or a later milestone).
+
+`Resolved`, `Won't Fix`, and `Deferred` findings are all considered **closed**.
+`Open` and `In Progress` are **pending** and appear in the base README's Pending
+Findings table.
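
The per-status Resolution rules lend themselves to a mechanical check. A minimal Python sketch, assuming dates are written `YYYY-MM-DD` and commit SHAs are backtick-quoted lower-case hex of 7 to 40 characters; `resolution_problems` is hypothetical, and this does not describe what `regen-readme.py --check` actually validates.

```python
import re

KNOWN = {"Open", "In Progress", "Resolved", "Won't Fix", "Deferred"}
PLACEHOLDER = "_(open)_"
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Assumption: a short or full git SHA written in backticks, e.g. `8d5dbb4`.
SHA = re.compile(r"`[0-9a-f]{7,40}`")


def resolution_problems(status: str, resolution: str) -> list[str]:
    """Check a Resolution field against the rules for its Status."""
    if status not in KNOWN:
        raise ValueError(f"unrecognised Status: {status!r}")
    if status in ("Open", "In Progress"):
        return []  # pending findings may keep the placeholder
    text = resolution.strip()
    if not text or text == PLACEHOLDER:
        return [f"{status} finding needs a Resolution"]
    if status != "Resolved":
        return []  # Won't Fix / Deferred: the justification is free text
    problems = []
    if not DATE.search(text):
        problems.append("Resolved finding must state the date")
    if not SHA.search(text):
        problems.append("Resolved finding must cite the fixing commit SHA")
    return problems
```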
+
+## 5. Updating the base README
+
+`code-reviews/README.md` holds the single cross-module view (the Module Status
+table and the Pending / Closed Findings tables). It is **generated** from the
+per-module `findings.md` files — do not edit it by hand.
+
+After any review or status change, regenerate it:
+
+```
+python code-reviews/regen-readme.py
+```
+
+`regen-readme.py --check` exits non-zero if `README.md` is stale, if a module
+header's `Open findings` count disagrees with its finding statuses, or if a
+finding carries an unrecognised Status value. The PowerShell wrapper
+`scripts/check-code-reviews-readme.ps1` runs that check and is the intended hook
+for CI or a pre-commit step. `code-reviews/test_regen_readme.py` covers the
+generator itself (`python code-reviews/test_regen_readme.py`).
+
+> On the review machine, `python` resolves to the real interpreter, while the
+> bare `python3` alias resolves to the Windows Store stub and fails. Use `python`.
+
+The per-module `findings.md` files are the source of truth; `README.md` is the
+aggregated index and must always agree with them — which the script guarantees.
+
+## 6. Re-reviewing a module
+
+Re-reviews append to the same `findings.md`. Update the header to the new commit
+and date, continue the finding numbering from the last used ID, and leave prior
+findings (including closed ones) in place as history.
@@ -0,0 +1,2 @@
+# regen-readme.py / test_regen_readme.py bytecode cache
+__pycache__/
@@ -0,0 +1,222 @@
+# Code Review — Admin
+
+| Field | Value |
+|---|---|
+| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 3 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Admin-005 |
+| 2 | OtOpcUa conventions | Admin-010 |
+| 3 | Concurrency & thread safety | Admin-011 |
+| 4 | Error handling & resilience | Admin-008, Admin-013 |
+| 5 | Security | Admin-001, Admin-002, Admin-003, Admin-004, Admin-006 |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | Admin-007, Admin-012 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Admin-009 |
+| 10 | Documentation & comments | Admin-012 |
+
+## Findings
+
+### Admin-001
+
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Security |
+| Location | `Components/Routes.razor:4-11`, `Program.cs:150` |
+| Status | Resolved |
+
+**Description:** The router uses a plain `RouteView` (not `AuthorizeRouteView`), and `MapRazorComponents<App>()` is registered without `.RequireAuthorization()`. A page-level `[Authorize]` attribute on a routable Razor component is only enforced when the router is `AuthorizeRouteView` — with `RouteView` the attribute is inert. Consequently every page in the app, including those that carry `@attribute [Authorize]` (`ClusterDetail`, `DraftEditor`, `Reservations`, `RoleGrants`, `Certificates`, `VirtualTags`, `ScriptedAlarms`, `ScriptLog`, `DiffViewer`, `ImportEquipment`, `Account`), is reachable by a fully unauthenticated user. There is no authentication gate anywhere in the pipeline. An anonymous browser can read the full fleet configuration, audit log, certificates and ACLs, and exercise mutating pages (see Admin-002).
+
+**Recommendation:** Replace `RouteView` with `AuthorizeRouteView` in `Routes.razor` (with a `<NotAuthorized>` slot that redirects to `/login`), or call `.RequireAuthorization()` on the `MapRazorComponents` endpoint with `/login` and `/auth/*` explicitly allowed anonymous. Add a fallback policy (`AddAuthorizationBuilder().SetFallbackPolicy(...)`) so new pages are secure-by-default. Re-verify every page after the gate is in place.
+
+**Resolution:** Resolved 2026-05-22 — `Routes.razor` switched to `AuthorizeRouteView` with a `NotAuthorized` slot routing unauthenticated callers to `/login` via a new `RedirectToLogin` component; `AddAuthorizationBuilder().SetFallbackPolicy(RequireAuthenticatedUser())` makes pages secure-by-default; `Login.razor` opts out with `[AllowAnonymous]` so the login page and static assets stay anonymous. Covered by `PageAuthorizationTests` (verified failing pre-fix, passing post-fix).
+
+### Admin-002
+
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Security |
+| Location | `Components/Pages/Clusters/NewCluster.razor:1-7`, `Home.razor`, `Fleet.razor`, `Hosts.razor`, `AlarmsHistorian.razor`, `Clusters/ClustersList.razor`, `Clusters/Generations.razor`, `Drivers/FocasDetail.razor` |
+| Status | Resolved |
+
+**Description:** Several routable pages carry no authorization attribute at all. Most critically `NewCluster` (`/clusters/new`) is a mutating page — its `CreateAsync` writes a new `ServerCluster` row and a draft generation. Combined with Admin-001 (the router does not enforce `[Authorize]` either), an unauthenticated user can create clusters and seed config-DB rows. `Home`, `Fleet`, `Hosts`, `AlarmsHistorian`, `ClustersList`, `Generations` and `FocasDetail` likewise expose fleet topology, host status, historian diagnostics and generation history to anonymous callers.
+
+**Recommendation:** Add `@attribute [Authorize(...)]` to every routable page with the role/policy appropriate to its function (`NewCluster` and other write surfaces -> `CanPublish`/`CanEdit`; read pages -> an authenticated-user policy). A solution-wide fallback policy (see Admin-001) is the durable fix; per-page attributes remain the explicit declaration of intent.
+
+**Resolution:** Resolved 2026-05-22 — `@attribute [Authorize]` added to every unprotected routable page (`Home`, `Fleet`, `Hosts`, `AlarmsHistorian`, `ClustersList`, `FocasDetail`, `ModbusAddressPreview`, `ModbusDiagnostics`); `NewCluster` gated with `[Authorize(Policy = "CanPublish")]` per the admin-ui.md FleetAdmin cluster-create flow. Re-triage note: `Clusters/Generations.razor` carries no `@page` directive — it is a child component of `ClusterDetail`, not a routable page, so it needs no attribute (it inherits the parent route's gate). The Admin-001 fallback policy is the durable secure-by-default backstop; the per-page attributes are the explicit declaration of intent. Covered by `PageAuthorizationTests`.
+
+### Admin-003
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `Program.cs:137-139`, `Hubs/FleetStatusHub.cs:11`, `Hubs/AlertHub.cs:10`, `Hubs/ScriptLogHub.cs:30` |
+| Status | Resolved |
+
+**Description:** All three SignalR hubs (`/hubs/fleet`, `/hubs/alerts`, `/hubs/script-log`) are mapped with no `[Authorize]` attribute and no `.RequireAuthorization()` on the `MapHub` call. Any unauthenticated client can open a hub connection: `FleetStatusHub.SubscribeFleet()` streams every node generation/role/resilience state, `AlertHub` pushes all fleet alerts (including failure detail text), and `ScriptLogHub.TailLogAsync` streams the contents of the server `scripts-*.log` files. This is an unauthenticated information-disclosure channel that bypasses the (already broken — see Admin-001) page auth entirely.
+
+**Recommendation:** Add `[Authorize]` to each `Hub` class, or chain `.RequireAuthorization()` onto each `MapHub(...)` call in `Program.cs`. The hub `SubscribeCluster`/`TailLogAsync` methods should additionally validate that the caller's claims permit the requested cluster/script scope.
+
+**Resolution:** Resolved 2026-05-22 — `[Authorize]` added to `FleetStatusHub`, `AlertHub` and `ScriptLogHub`, and `.RequireAuthorization()` chained onto all three `MapHub(...)` calls in `Program.cs` as a belt-and-braces backstop, so an anonymous client can no longer open any hub connection. Covered by `AuthEndpointsTests.Anonymous_hub_negotiate_is_rejected`.
+
+### Admin-004
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `appsettings.json:3,13-14` |
+| Status | Resolved |
+
+**Description:** The checked-in `appsettings.json` contains live-looking secrets in plaintext: the `ConfigDb` connection string with `User Id=sa;Password=OtOpcUaDev_2026!` and the LDAP `ServiceAccountPassword: "serviceaccount123"`. It also sets `Encrypt=False` and `AllowInsecureLdap: true`, so the SQL and LDAP credentials travel unencrypted on the wire. Committing the `sa` account password and a service-account password to source control is a credential-exposure risk; `sa` additionally grants full server control, conflicting with the `ClusterService` doc comment that production should connect with a least-privilege grant.
+
+**Recommendation:** Move all secrets out of the committed file — use user-secrets for dev and environment variables / a secret store for production; leave only non-secret placeholders in `appsettings.json`. Use a least-privilege SQL login rather than `sa`. Enable TLS for both SQL (`Encrypt=True`) and LDAP (`UseTls=true`, `AllowInsecureLdap=false`) for any non-loopback deployment, and document the dev-only exception.
+
+**Resolution:** Resolved 2026-05-22 — the `sa` connection string and the LDAP `ServiceAccountPassword` were replaced with empty placeholders in `appsettings.json`; a `_secrets` note documents that they are supplied via user-secrets (dev) or the `ConnectionStrings__ConfigDb` / `Authentication__Ldap__ServiceAccountPassword` environment variables (prod), and that the connection string must use `Encrypt=True` and a least-privilege SQL login. A `UserSecretsId` was added to the Admin csproj, and `Program.cs` now fails fast with a clear message when `ConfigDb` is empty/missing. Covered by `AppSettingsSecretHygieneTests`.
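
The environment-variable names in this resolution follow ASP.NET Core's convention of using a double underscore in place of the `:` section separator. A minimal Python sketch of the override-then-fail-fast behaviour, assuming an environment variable takes precedence over the empty placeholder in `appsettings.json` and leaving user-secrets out; `require_setting` is hypothetical and only mirrors the behaviour described for `Program.cs`.

```python
import os
from collections.abc import Mapping


def env_key(config_key: str) -> str:
    """Translate a configuration key such as 'ConnectionStrings:ConfigDb'."""
    # ASP.NET Core's environment-variable provider reads "__" as the ":" separator.
    return config_key.replace(":", "__")


def require_setting(
    config_key: str,
    appsettings: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Let the environment override appsettings.json, then fail fast on blanks."""
    key = env_key(config_key)
    value = env[key] if key in env else appsettings.get(config_key, "")
    if not value.strip():
        raise RuntimeError(
            f"'{config_key}' is empty; supply it via user-secrets or the {key} environment variable"
        )
    return value
```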
+
+### Admin-005
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `Components/Pages/Login.razor:15,107-110` |
+| Status | Resolved |
+
+**Description:** `Login.razor` is an interactive component (the project's default render mode is interactive server; the page declares no `@rendermode` but uses `EditForm`/`InputText` interactive binding and runs `SignInAsync` from an event handler). It calls `HttpContext.SignInAsync(...)` followed by `ctx.Response.Redirect("/")` from within a SignalR circuit callback. Writing auth cookies and HTTP redirect headers requires a live, unstarted HTTP response; in an interactive circuit the original HTTP response has long completed, so the cookie is typically not emitted and the redirect is ineffective (or throws "response has already started"). `admin-ui.md` section "Operator authentication" explicitly specifies the login as a static server-rendered HTML form POSTing to a `/auth/login` minimal-API endpoint with `data-enhance="false"` — that endpoint is not implemented and is not mapped in `Program.cs`.
+
+**Recommendation:** Implement the login as designed: a static-rendered form (`@rendermode` none, `data-enhance="false"`) posting to a `MapPost("/auth/login", ...)` minimal-API handler that does the LDAP bind, grant resolution, `SignInAsync` and redirect while the HTTP response is still owned by the endpoint. Do not perform `SignInAsync` from an interactive circuit.
+
+**Resolution:** Resolved 2026-05-22 — `Login.razor` rewritten as a static-rendered plain HTML `<form method="post" action="/auth/login" data-enhance="false">` (no `@rendermode`, no `EditForm`/`SignInAsync` in a circuit); the LDAP bind, grant resolution, cookie `SignInAsync` and redirect now run in a new `AuthEndpoints.MapAuthEndpoints()` minimal-API handler (`/auth/login`, `/auth/logout`) while the endpoint still owns the HTTP response. The handler is `AllowAnonymous`, carries an open-redirect guard on `returnUrl`, and surfaces bind errors back to the login page via a query-string. Covered by `AuthEndpointsTests` (valid login issues the cookie, invalid login redirects with error, open-redirect rejected, logout clears the cookie).
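
The open-redirect guard on `returnUrl` comes down to accepting only app-local paths. A minimal Python sketch of that kind of check, similar in spirit to ASP.NET Core's `IsLocalUrl` but not the handler's actual code; the function names are hypothetical. It rejects absolute URLs and the protocol-relative forms `//host` and `/\host`, which browsers resolve to another origin.

```python
from typing import Optional


def is_safe_return_url(return_url: Optional[str]) -> bool:
    """Accept only app-local paths such as '/clusters/cluster-dev'."""
    if not return_url or not return_url.startswith("/"):
        return False  # empty, relative, or absolute (e.g. 'https://evil.example')
    # '//host' and '/\\host' are treated by browsers as protocol-relative URLs.
    return not return_url.startswith(("//", "/\\"))


def redirect_target(return_url: Optional[str]) -> str:
    """Fall back to the home page whenever the requested target is not local."""
    return return_url if is_safe_return_url(return_url) else "/"
```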
+
+### Admin-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` |
+| Status | Resolved |
+
+**Description:** `app.UseAntiforgery()` is enabled, but the Sign-out form (`<form method="post" action="/auth/logout">`) renders no antiforgery token, and the `MapPost("/auth/logout", ...)` endpoint does not call `.DisableAntiforgery()` or otherwise opt out. Depending on framework version this either makes logout fail with a 400 for legitimate users, or — if the endpoint is treated as exempt — leaves logout as an unprotected state-changing POST (CSRF logout). The same concern applies to the login form once Admin-005 is addressed.
+
+**Recommendation:** Emit an antiforgery token in the logout form and let `UseAntiforgery()` validate it; or explicitly and deliberately mark the endpoint `.DisableAntiforgery()` if a tokenless logout is intended. Verify login/logout round-trips after the change.
+
+**Resolution:** Resolved 2026-05-22 — `<AntiforgeryToken />` added to the sign-out form in `MainLayout.razor` and `.DisableAntiforgery()` removed from the `/auth/logout` endpoint so `UseAntiforgery()` validates the token; a tokenless POST now returns 400, preventing CSRF-logout. The login endpoint retains `.DisableAntiforgery()` (login is not a state-changing operation CSRF can abuse). `AuthEndpointsTests.Logout_without_antiforgery_token_is_rejected` regression-guards this.
+
+### Admin-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `Components/Pages/Clusters/NewCluster.razor:91,95-96` |
+| Status | Resolved |
+
+**Description:** `NewCluster.CreateAsync` hardcodes `CreatedBy = "admin-ui"` (both on the `ServerCluster` row and the draft generation) instead of the signed-in operator principal name. `admin-ui.md` section "Audit" requires "the operator principal" be recorded on every write. The audit trail therefore cannot attribute cluster creation to a person. The same literal would apply to any anonymous creation that Admin-001/002 currently permit.
+
+**Recommendation:** Pass the authenticated user identity (`ClaimTypes.Name` / `NameIdentifier` from the cascaded `AuthenticationState`) as `createdBy`. Apply the same pattern to every other Admin write path that records a `CreatedBy`/`PublishedBy`/`ReleasedBy` field.
+
+**Resolution:** Resolved 2026-05-22 — `NewCluster.razor` and `ClusterDetail.razor` (the two pages that call `ClusterService.CreateAsync` / `GenerationService.CreateDraftAsync` with a hardcoded literal) now resolve `ClaimTypes.Name` / `ClaimTypes.NameIdentifier` from the cascaded `AuthenticationState` and pass the operator principal name as `createdBy`; the fallback is `"unknown"` (defensive, should never occur on an `[Authorize]`-gated page).
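
The principal lookup is a short fallback chain. A minimal Python sketch, assuming claims arrive as a simple type-to-value mapping and that a blank value should fall through to the next claim; `NAME` and `NAME_IDENTIFIER` are hypothetical stand-ins for `ClaimTypes.Name` and `ClaimTypes.NameIdentifier`.

```python
from collections.abc import Mapping

# Hypothetical stand-ins for ClaimTypes.Name / ClaimTypes.NameIdentifier.
NAME = "name"
NAME_IDENTIFIER = "nameidentifier"


def operator_principal(claims: Mapping[str, str]) -> str:
    """Pick the audit principal: Name, then NameIdentifier, then 'unknown'."""
    for claim_type in (NAME, NAME_IDENTIFIER):
        value = claims.get(claim_type, "").strip()
        if value:
            return value
    # Defensive only: an [Authorize]-gated page should always carry a name claim.
    return "unknown"
```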
+
+### Admin-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `Services/ReservationService.cs:28-37` |
+| Status | Resolved |
+
+**Description:** `ReservationService.ReleaseAsync` calls `sp_ReleaseExternalIdReservation` with only `@Kind`, `@Value`, `@ReleaseReason`. `admin-ui.md` section "Release an external-ID reservation" specifies the proc sets `ReleasedBy` to the FleetAdmin who performed the release, and the action is the only path that allows ZTag/SAPID reuse and "requires explicit FleetAdmin action with a documented reason." The service does not capture or pass the operator principal, so the compliance audit trail for a release records no actor (unless the proc derives it from the DB session login, which would be the shared service account, not the operator).
+
+**Recommendation:** Add an operator-principal parameter to `ReleaseAsync`, pass it to the stored proc as `@ReleasedBy`, and have callers supply the signed-in user. Confirm the proc signature accepts it.
+
+**Resolution:** Resolved 2026-05-22 — a new EF migration (`20260522000001_AddReleasedByToReleaseExternalIdReservation`) adds `@ReleasedBy nvarchar(128)` to `sp_ReleaseExternalIdReservation` and uses it for both `ExternalIdReservation.ReleasedBy` and `ConfigAuditLog.Principal` (replacing `SUSER_SNAME()`); `ReservationService.ReleaseAsync` gains a `releasedBy` parameter with a guard; `Reservations.razor` resolves `ClaimTypes.Name` / `ClaimTypes.NameIdentifier` from the cascaded `AuthenticationState` and passes the operator principal to the service.
+
+### Admin-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) |
+| Status | Resolved |
+
+**Description:** The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of the login -> cookie issuance round-trip (Admin-005), and the `AdminRoleGrantResolver` / `ClusterRoleClaims` authorization logic is exercised only in isolation. `InternalsVisibleTo` points at `ZB.MOM.WW.OtOpcUa.Admin.Tests`, but the auth pipeline itself is not asserted end-to-end. Per `REVIEW-PROCESS.md` category 9 these are untested critical paths.
+
+**Recommendation:** Add `WebApplicationFactory`-based integration tests asserting: (a) anonymous GET of each protected route returns 302->/login or 401; (b) anonymous hub connect is refused; (c) a valid login issues the cookie and a subsequent request is authorized; (d) a `ConfigViewer` is denied `CanPublish` pages. Wire the check into the `*.Admin.Tests` suite.
+
+**Resolution:** Resolved 2026-05-22 — (a) covered by existing `PageAuthorizationTests`; (b) covered by existing `AuthEndpointsTests.Anonymous_hub_negotiate_is_rejected`; (c) covered by existing `AuthEndpointsTests.Valid_login_issues_the_auth_cookie_and_redirects_home`; (d) new `AdminAuthPipelineTests` adds a `WebApplicationFactory` with a `RoleInjectingHandler` that stamps requests with caller-supplied roles, asserting that `ConfigViewer` is denied `CanPublish`-gated pages (403/302) while `FleetAdmin` is permitted, and that a `FleetAdmin` session can reach protected pages.
+
+### Admin-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `Components/App.razor:9,16` |
+| Status | Open |
+
+**Description:** `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A CDN reference makes the UI fail in air-gapped / locked-down fleet deployments (a stated deployment target), introduces an uncontrolled third-party origin, and is not covered by a Subresource Integrity hash.
+
+**Recommendation:** Vendor Bootstrap under `wwwroot/lib/bootstrap/` and reference the local copies, as the design doc requires. If a CDN is retained for any asset, add `integrity` + `crossorigin` SRI attributes.
+
+**Resolution:** _(open)_
+
+### Admin-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `Hubs/FleetStatusPoller.cs:24-26,98-103` |
+| Status | Open |
+
+**Description:** `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller's `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()` (exposed `internal` for tests) clears those same dictionaries with no synchronization. If a test (or any caller) invokes `ResetCache()` while a poll tick is mid-iteration, the `Dictionary` enumeration/mutation race can throw `InvalidOperationException` or corrupt state.
+
+**Recommendation:** Either document `ResetCache()` as "only safe when the poller is stopped" and have tests stop the service first, or guard the three dictionaries with a lock / swap them atomically. Using `ConcurrentDictionary` (as the sibling `ResilientLdapGroupRoleMappingService` does) would make the intent explicit.
+
+**Resolution:** _(open)_
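
The lock-based option in the recommendation can be sketched in Python; in the C# poller a `lock` statement or `ConcurrentDictionary` would play this role. This is a minimal sketch with hypothetical names and value types: every read, write, and reset of the three caches goes through one lock, so a reset can no longer race with an in-flight dictionary update.

```python
import threading


class FleetStatusCache:
    """Three per-node caches guarded by a single lock (hypothetical analogue)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last: dict[str, int] = {}
        self._last_role: dict[str, str] = {}
        self._last_resilience: dict[str, str] = {}

    def record(self, node: str, generation: int, role: str, resilience: str) -> bool:
        """Store a node's latest state; return True when anything changed."""
        with self._lock:
            changed = (
                self._last.get(node) != generation
                or self._last_role.get(node) != role
                or self._last_resilience.get(node) != resilience
            )
            self._last[node] = generation
            self._last_role[node] = role
            self._last_resilience[node] = resilience
            return changed

    def reset(self) -> None:
        # Swap in fresh dicts under the same lock, so a concurrent record()
        # sees either the old caches or the new ones, never a half-cleared mix.
        with self._lock:
            self._last, self._last_role, self._last_resilience = {}, {}, {}
```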
+
+### Admin-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` |
+| Status | Open |
+
+**Description:** `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column — operator-supplied EquipmentId would mint duplicate equipment identity on typos ... never accepted from CSV imports." `EquipmentId` is system-derived (`EQ-` plus first 12 hex chars of `EquipmentUuid`). Accepting it from CSV either contradicts the design or silently lets an import set an identity field the doc says is un-settable. The XML doc on the class also cites the column as required per "decision #117", so either the code or the design doc is stale. `EquipmentImportBatchService.StageRowsAsync` propagates `row.EquipmentId` into the staging row, so any change must cover the finalize path.
+
+**Recommendation:** Reconcile with the design: drop `EquipmentId` from `RequiredColumns` and the `EquipmentCsvRow` shape (deriving it from `EquipmentUuid` at finalize time), or — if accepting it is a deliberate reversal — update `admin-ui.md` and the decision log so the two agree.
+
+**Resolution:** _(open)_
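
The system-derived identity described above is simple to compute at finalize time, which is what makes dropping the CSV column practical. A minimal Python sketch, assuming "first 12 hex chars" means the UUID's hex digits with hyphens removed, in lower case; both function names are hypothetical.

```python
import uuid


def derive_equipment_id(equipment_uuid: uuid.UUID) -> str:
    """EQ- plus the first 12 hex characters of the UUID, hyphens excluded."""
    # Assumption: lower-case hex as produced by uuid.hex; the real casing may differ.
    return "EQ-" + equipment_uuid.hex[:12]


def reject_equipment_id_column(header: list[str]) -> None:
    """Refuse a CSV header that tries to supply the system-derived identity."""
    if any(column.strip() == "EquipmentId" for column in header):
        raise ValueError("EquipmentId is derived from EquipmentUuid; remove the column")
```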
+
+### Admin-013
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Error handling & resilience |
+| Location | `Components/Pages/Clusters/ClusterDetail.razor:180-197`, `Components/Pages/Clusters/AclsTab.razor`, `Components/Pages/Clusters/RedundancyTab.razor`, `Components/Pages/RoleGrants.razor`, `Components/Pages/Hosts.razor`, `Components/Pages/ScriptLog.razor`, `Program.cs:157-159` |
+| Status | Resolved |
+
+**Description:** The Admin-003 fix gated all three SignalR hubs with `[Authorize]` plus `.RequireAuthorization()`, but the six pages that open a client `HubConnection` to those hubs were never updated to authenticate. A server-side Blazor `HubConnection` runs inside the interactive circuit and has no access to the browser's HttpOnly `OtOpcUa.Admin` auth cookie, so the hub `negotiate` request returns 401. Four pages (`ClusterDetail`, `AclsTab`, `RedundancyTab`, `RoleGrants`) called `HubConnection.StartAsync()` with no `try`/`catch`, so the 401 surfaced as an unhandled exception — a full HTTP 500 page for the prerendered `/clusters/{ClusterId}` route (the core cluster-config surface) and a faulted circuit for the others. `Hosts` and `ScriptLog` already wrapped the connect in `try`/`catch`, so they did not crash, but the SignalR live-update feature was non-functional Admin-wide regardless. The Admin-003 hardening was therefore incomplete: it secured the hub server side without giving the in-process clients any way to present credentials. Discovered during a post-review browser smoke test of `/clusters/cluster-dev`.
+
+**Recommendation:** Two parts. (1) Stop the crash: guard every `HubConnection.StartAsync()` in `try`/`catch`, matching the best-effort pattern already documented in `Hosts.razor` — a hub hiccup must degrade live updates, not fault the page. (2) Restore the feature: give the hub clients a real credential. Cookie forwarding is not viable (the HttpOnly cookie is unreachable from the interactive circuit and persisting it into page state would leak it), so add a token scheme — mint a short-lived token for the circuit's authenticated user and supply it via `HttpConnectionOptions.AccessTokenProvider`, with a matching server-side authentication handler on the hub endpoints.
+
+**Resolution:** Resolved 2026-05-22 — (1) `StartAsync`/`SendAsync` wrapped in `try`/`catch` on `ClusterDetail`, `AclsTab`, `RedundancyTab` and `RoleGrants` so a hub failure degrades gracefully. (2) Added a bearer-token auth path: `HubTokenService` mints/validates short-lived tokens using ASP.NET Core Data Protection (no signing-key management, no new packages); `HubTokenAuthenticationHandler` is a custom `HubToken` scheme reading the token from the `Authorization: Bearer` header (negotiate) or the `access_token` query parameter (WebSocket upgrade); the `HubClients` authorization policy runs both the cookie and `HubToken` schemes and is applied via `RequireAuthorization("HubClients")` on all three `MapHub` calls; `AdminHubConnectionFactory` builds connections with an `AccessTokenProvider` that re-mints a token for the circuit's authenticated user on every (re)connect, and all six hub-consuming pages resolve their connections through it. Verified end-to-end in the browser: hub `negotiate` returns 200 and the WebSocket upgrades (101) where it previously 401'd.
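
A language-neutral sketch of the token round trip follows. It is written in Python and uses HMAC-SHA256 over `user|expiry` as a stand-in for the ASP.NET Core Data Protection protector that the resolution actually uses. `mint_token`, `validate_token`, `extract_token`, the `user|expiry|signature` layout and the 300-second lifetime are all hypothetical. The real `HubTokenService` and `HubTokenAuthenticationHandler` may differ in every detail.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional

# Hypothetical secret; ASP.NET Core Data Protection manages its own key ring instead.
SECRET = b"demo-secret-not-for-production"
LIFETIME_SECONDS = 300  # assumed; the finding only says "short-lived"


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def mint_token(user: str, now: Optional[float] = None) -> str:
    """Mint a token for the circuit's authenticated user (called on every (re)connect)."""
    issued = time.time() if now is None else now
    payload = f"{user}|{int(issued) + LIFETIME_SECONDS}".encode()
    return base64.urlsafe_b64encode(payload + b"|" + _sign(payload).encode()).decode()


def validate_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user name for an authentic, unexpired token; otherwise None."""
    current = time.time() if now is None else now
    try:
        user, expires, signature = base64.urlsafe_b64decode(token).decode().rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:  # bad base64, bad UTF-8, wrong shape or non-numeric expiry
        return None
    expected = _sign(f"{user}|{expires}".encode())
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered or forged
    if current >= expires_at:
        return None  # expired
    return user


def extract_token(headers: Mapping[str, str], query: Mapping[str, str]) -> Optional[str]:
    """Bearer header on negotiate; access_token query parameter on the WebSocket upgrade."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        return auth[len("Bearer "):]
    return query.get("access_token")
```

Because each token expires, the client has to mint a fresh one for every connection attempt instead of caching one for the whole life of the circuit. That is why re-minting inside `AccessTokenProvider` on every (re)connect matters.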
@@ -0,0 +1,139 @@
+# Code Review — Analyzers
+
+| Field | Value |
+|---|---|
+| Module | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Analyzers-001, Analyzers-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | No issues found |
+| 4 | Error handling & resilience | Analyzers-003 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Analyzers-004 |
+| 7 | Design-document adherence | Analyzers-005 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Analyzers-006 |
+| 10 | Documentation & comments | Analyzers-007 |
+
+## Findings
+
+### Analyzers-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` |
+| Status | Resolved |
+
+**Description:** `IsInsideWrapperLambda` treats a guarded call as "wrapped" if it is textually inside ANY lambda that is an argument to ANY invocation whose containing type is `CapabilityInvoker` or `AlarmSurfaceInvoker`. It matches the containing type only, never the parameter the lambda is bound to. The real wrapping contract is specifically the `callSite` (`Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>`) parameter of `CapabilityInvoker.ExecuteAsync` / `ExecuteWriteAsync`. Any other lambda argument to a method on those types — a future overload that takes a predicate/selector lambda, or a lambda passed in a non-`callSite` position — would suppress the diagnostic even though the guarded call is not actually executed inside the resilience pipeline. The analyzer's own XML doc (lines 21-23) describes exactly this looser-than-intended behaviour. It is a latent false-negative gap rather than an active bug because the current `CapabilityInvoker` surface has no non-`callSite` lambda parameter.
+
+**Recommendation:** Resolve the symbol of the lambda argument's parameter (`IMethodSymbol.Parameters[i]`) and require its type to be the `Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>` callsite shape, or at minimum match the wrapper method name (`ExecuteAsync` / `ExecuteWriteAsync`) rather than only the containing type. This closes the gap before a new overload silently widens the escape hatch.
+
+**Resolution:** Resolved 2026-05-22 — Replaced `WrapperTypes` string array with `WrapperMethods` (type FQN + method name) tuples so `IsInsideWrapperLambda` matches both containing type and method name, preventing future non-`callSite` overloads from silently suppressing the diagnostic.
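
The before/after matching rule can be modelled outside Roslyn. In this Python sketch, an invocation is reduced to a hypothetical `(containing_type, method_name)` pair with short, unqualified type names. The contents of `WRAPPER_METHODS` are an assumption based on the `ExecuteAsync` / `ExecuteWriteAsync` entry points named in this finding, not a copy of the analyzer's table. `Where` is an invented future overload, included only to show the gap.

```python
from typing import NamedTuple


class Invocation(NamedTuple):
    """Simplified view of the invocation that a lambda argument is passed to."""
    containing_type: str
    method_name: str


# Before the fix: a lambda passed to any method on these types counted as wrapped.
WRAPPER_TYPES = frozenset({"CapabilityInvoker", "AlarmSurfaceInvoker"})

# After the fix (assumed contents): only the resilience entry points count.
WRAPPER_METHODS = frozenset({
    ("CapabilityInvoker", "ExecuteAsync"),
    ("CapabilityInvoker", "ExecuteWriteAsync"),
})


def is_wrapper_by_type(call: Invocation) -> bool:
    return call.containing_type in WRAPPER_TYPES


def is_wrapper_by_method(call: Invocation) -> bool:
    return (call.containing_type, call.method_name) in WRAPPER_METHODS
```

A predicate lambda passed to the hypothetical `Where` overload satisfies the type-only rule but not the type-plus-method rule. That is the false negative the fix closes. Neither rule binds the lambda to the `callSite` parameter itself, which is the remaining limitation noted in Analyzers-007.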
+
+### Analyzers-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` |
+| Status | Open |
+
+**Description:** `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all — callers pass `IReadOnlyList<...>` / `IAlarmSubscriptionHandle`, and the invoker builds the resilience lambdas internally. `IsInsideWrapperLambda` only ever returns `true` when it finds an `AnonymousFunctionExpressionSyntax` argument in the outer call's argument list. Because no `AlarmSurfaceInvoker` call site can have a lambda argument, the `AlarmSurfaceInvoker` entry in `WrapperTypes` is effectively dead — it can never satisfy the suppression condition. Guarded `IAlarmSource` calls written inside `AlarmSurfaceInvoker.cs` are in fact suppressed correctly, but only because they sit inside `CapabilityInvoker.ExecuteAsync` lambdas (the `CapabilityInvoker` entry does the work). The dead entry is misleading and suggests the analyzer recognises an `AlarmSurfaceInvoker` "lambda home" that does not exist.
+
+**Recommendation:** Either remove `AlarmSurfaceInvoker` from `WrapperTypes` (its calls are already covered transitively by the `CapabilityInvoker` match) and update the XML doc, or — if the intent is to allow `IAlarmSource` calls anywhere inside `AlarmSurfaceInvoker` regardless of lambda nesting — add an explicit "call site is lexically within the `AlarmSurfaceInvoker` type declaration" check rather than relying on a lambda-argument scan that never fires.
+
+**Resolution:** _(open)_
+
+### Analyzers-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` |
+| Status | Open |
+
+**Description:** `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently skipping the call. For `RegisterOperationAction` the `SemanticModel` is non-null in normal compilation, so this is low-risk in practice, but the failure mode is the wrong direction — a tooling/IDE edge case where the model is unavailable would flag correct code. Separately, the analyzer has no defensive guard against partially-bound / malformed call sites: `method.ContainingType`, `method.ReturnType`, and `iface.GetMembers()` are dereferenced without null checks. `IInvocationOperation.TargetMethod` is non-null by contract and `ContainingType` is non-null for an ordinary method, so a hard crash is unlikely, but an analyzer that throws on malformed in-progress syntax degrades the IDE experience for the whole solution.
+
+**Recommendation:** When `semanticModel is null` in `AnalyzeInvocation`, return early (skip the call) instead of letting `IsInsideWrapperLambda` report it as unwrapped, so unavailable semantics never produce a false positive.
+
+**Resolution:** _(open)_
+
+### Analyzers-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
+| Status | Open |
+
+**Description:** `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display string per interface and calls `string.Replace("global::", ...)`, then for matching interfaces iterates `iface.GetMembers().OfType<IMethodSymbol>()` calling `FindImplementationForInterfaceMember` per member. The `GuardedInterfaces` / `WrapperTypes` lookups are `string[].Contains` (linear scan) rather than a hash set. None of this is catastrophic — the interface sets are tiny — but the work is repeated for every invocation including the overwhelming majority that target non-guarded methods, and the FQN string formatting plus `Replace` allocation on the hot path is avoidable.
+
+**Recommendation:** Move to `RegisterCompilationStartAction`: resolve the guarded interface and wrapper-type symbols once via `Compilation.GetTypeByMetadataName`, capture them, and compare invocation symbols by `SymbolEqualityComparer` identity. Replace the `string[]` membership checks with a `HashSet`. This also makes the analyzer correctly no-op in compilations that do not reference `Core.Abstractions`.
+
+**Resolution:** _(open)_
+
+### Analyzers-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
+| Status | Open |
+
+**Description:** `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all four `IHistoryProvider` reads, which matches the analyzer's `GuardedInterfaces` set. However, `IHistoryProvider` exposes five async methods, and two of them (`ReadAtTimeAsync`, `ReadEventsAsync`) are C# default-interface-method (DIM) implementations. C# does not expose a DIM as a member of an implementing class that does not override it, so such a call can only be made through an interface-typed receiver. In that case `FindImplementationForInterfaceMember` returns the interface's own default method symbol, and the second equality branch (`method.OriginalDefinition == member`) still catches the call, so detection holds. This DIM interaction is nonetheless undocumented and untested, and a future driver that overrides one DIM but not the other creates an asymmetric guarded surface that nobody has verified.
+
+**Recommendation:** Add explicit test cases (see Analyzers-006) for `IHistoryProvider` calls via both an interface-typed receiver and a concrete driver that (a) overrides and (b) inherits the default `ReadAtTimeAsync` / `ReadEventsAsync`. If a gap is found, handle DIM members explicitly. Add a short remark to the analyzer XML doc noting the default-interface-method consideration.
+
+**Resolution:** _(open)_
+
+### Analyzers-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` |
+| Status | Resolved |
+
+**Description:** The test suite exercises only 3 of the 7 guarded interfaces (`IReadable`, `IWritable`, `ITagDiscovery`) and one positive / one negative lambda case. Significant untested behaviour for an analyzer that gates a repo-wide resilience invariant:
+
+- No test for `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, or `IHistoryProvider` — four of seven guarded interfaces, including the two (`IAlarmSource`, `IHistoryProvider`) with the most subtle wrapping story.
+- No test that a synchronous guarded-type member is NOT flagged — `IHostConnectivityProbe.GetHostStatuses()` is explicitly called out in the source comment (lines 75-77) as something the `IsAsyncReturningType` filter must let through, yet there is no regression test pinning that.
+- No test for a concrete driver class implementing the interface (the receiver is always the interface type `IReadable driver`); the `FindImplementationForInterfaceMember` branch of `ImplementsGuardedInterface` — the entire reason the source comment claims an unusually-named method implementing `IReadable.ReadAsync` still trips the rule — is never executed by a test.
+- No test for `ExecuteWriteAsync` (only `ExecuteAsync` is covered) and no test for `AlarmSurfaceInvoker`.
+- No test for nested lambdas or for the generated-code exclusion (`ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None)`).
+- The `StubSources` constant omits `ISubscribable` / `IAlarmSource` / `IHistoryProvider` / `IHostConnectivityProbe` and `AlarmSurfaceInvoker` entirely, so those paths cannot be tested without extending it.
+
+**Recommendation:** Extend `StubSources` with the remaining guarded interfaces and `AlarmSurfaceInvoker`, then add tests for: each remaining guarded interface (positive plus wrapped), a synchronous member not being flagged, a concrete driver-class receiver with a renamed implementing method, `ExecuteWriteAsync` wrapping, and a nested-lambda case.
+
+**Resolution:** Resolved 2026-05-22 — Extended `StubSources` with `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IHostConnectivityProbe`, and `AlarmSurfaceInvoker` stubs; added 14 new tests covering each missing guarded interface (positive + wrapped), synchronous member not flagged, concrete driver receiver, `ExecuteWriteAsync` wrapping, and nested-lambda cases (19 tests total, all passing).
+
+### Analyzers-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
+| Status | Open |
+
+**Description:** The `<remarks>` block states that the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for guarded-call detection, because `ImplementsGuardedInterface` uses symbols. The same block also describes the wrapper detection in `IsInsideWrapperLambda`: it walks the syntax tree and checks enclosing invocations by containing type. That detection is looser than the prose implies (see Analyzers-001), because it does not verify that the lambda is bound to the resilience `callSite` parameter. As a result, the XML doc reads as if the wrapper match is precise. The `<remarks>` also notes that the rule does not enforce that the capability argument matches the method. It omits the more important current limitation: a lambda in any argument position of a wrapper-typed call suppresses the diagnostic.
+
+**Recommendation:** Tighten the `<remarks>` to state precisely what `IsInsideWrapperLambda` checks today (textual containment within a lambda argument of a `CapabilityInvoker` / `AlarmSurfaceInvoker`-typed invocation), and note the known limitation that it does not bind the lambda to the `callSite` parameter. Keep the doc in sync if Analyzers-001 is fixed.
+
+**Resolution:** _(open)_
@@ -0,0 +1,271 @@
+# Code Review — Client.CLI
+
+| Field | Value |
+|---|---|
+| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 8 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Client.CLI-001, Client.CLI-002, Client.CLI-003 |
+| 2 | OtOpcUa conventions | Client.CLI-004 |
+| 3 | Concurrency & thread safety | Client.CLI-005 |
+| 4 | Error handling & resilience | Client.CLI-006 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Client.CLI-007 |
+| 7 | Design-document adherence | Client.CLI-008 |
+| 8 | Code organization & conventions | Client.CLI-009 |
+| 9 | Testing coverage | Client.CLI-010 |
+| 10 | Documentation & comments | Client.CLI-008 |
+
+## Findings
+
+### Client.CLI-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` |
+| Status | Resolved |
+
+**Description:** The start and end options are parsed with `DateTime.Parse(StartTime)` with
+no `IFormatProvider` or `DateTimeStyles`. Parsing therefore depends on the current OS
+culture: the same `--start "03/04/2026"` resolves to March 4 on an en-US box and April 3
+on an en-GB box. The CLI is documented as cross-platform and the value silently produces a
+different (wrong) history window rather than failing. The doc claims "ISO 8601 or date
+string" but ISO interpretation is not guaranteed without `DateTimeStyles.RoundtripKind` or
+`CultureInfo.InvariantCulture`. A bare date string is also assumed to be local time, then
+`.ToUniversalTime()` shifts it by the host offset, so the same input yields different
+ranges on machines in different time zones.
+
+**Recommendation:** Parse with `CultureInfo.InvariantCulture` and
+`DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal` (or require explicit
+ISO 8601 via `DateTimeOffset.Parse`), and document the expected format and timezone
+assumption precisely.
+
+**Resolution:** Resolved 2026-05-22 — `DateTime.Parse` replaced with `CultureInfo.InvariantCulture` + `DateTimeStyles.AssumeUniversal | AdjustToUniversal`; option descriptions updated to document ISO 8601 UTC format.
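
A Python sketch of timezone-stable parsing follows. `parse_utc` is hypothetical. It follows the ISO 8601 alternative from the recommendation, so it is stricter than the resolution: `CultureInfo.InvariantCulture` still reads `03/04/2026` as month/day, whereas this sketch rejects that input. A value without an offset is treated as UTC, which corresponds to `AssumeUniversal`. A value with an offset is converted to UTC, which corresponds to `AdjustToUniversal`.

```python
from datetime import datetime, timezone


def parse_utc(text: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware UTC datetime."""
    value = text.strip()
    if value[-1:] in ("Z", "z"):
        value = value[:-1] + "+00:00"  # older fromisoformat versions reject a bare 'Z'
    parsed = datetime.fromisoformat(value)  # ValueError for non-ISO input such as "03/04/2026"
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)  # no offset: assume UTC, never host-local
    return parsed.astimezone(timezone.utc)  # explicit offset: convert to UTC
```

The same input now yields the same instant on every host, whatever its culture or time zone.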
+
+### Client.CLI-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `Commands/SubscribeCommand.cs:129-137` |
+| Status | Open |
+
+**Description:** The summary computes `neverWentBad` as every target whose node-id key is
+absent from the `everBad` dictionary. A node that received no update at all is also absent
+from `everBad`, so it is counted in `neverWentBad` and printed under the heading
+"--- Nodes that NEVER received a bad-quality update (suspect) ---". The same node is also
+listed separately under `never` ("never received an update at all"). Labeling a node that
+produced zero notifications as a "suspect that never went bad" is misleading — it has not
+been observed at all, which is a different (and arguably worse) condition than a node that
+streamed only good values.
+
+**Recommendation:** Exclude no-update nodes from the `neverWentBad` set by keeping a target
+only when `lastStatus.ContainsKey(key) && !everBad.ContainsKey(key)`, where `key` is the
+target's node-id key. The "suspect" list then contains only nodes that were actually
+observed and never reported bad quality.
+
+**Resolution:** _(open)_
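
A Python sketch of the corrected bucketing follows. It assumes targets are represented by their node-id keys and uses a set in place of the `everBad` dictionary's keys. `classify` is hypothetical. Compared with the current behaviour, the only change is the `t in last_status` condition on the suspect list.

```python
from typing import Dict, List, Set, Tuple


def classify(
    targets: List[str], last_status: Dict[str, str], ever_bad: Set[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split node-id keys into never-updated, never-went-bad and went-bad buckets."""
    never = [t for t in targets if t not in last_status]
    # A node must have been observed before it can be a "never went bad" suspect.
    never_went_bad = [t for t in targets if t in last_status and t not in ever_bad]
    went_bad = [t for t in targets if t in ever_bad]
    return never, never_went_bad, went_bad
```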
+
+### Client.CLI-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
+| Status | Open |
+
+**Description:** Numeric command options accept any value with no range validation.
+`--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be
+supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently disables
+recursion or under-traverses; a zero/negative sampling `--interval` is passed straight
+through to `SubscribeAsync` and depends on the SDK/server to reject it; a negative `--max`
+is forwarded to `HistoryReadRawAsync`. None of these produce a clear operator-facing error.
+
+**Recommendation:** Validate option ranges at the start of `ExecuteAsync` and throw
+`CliFx.Exceptions.CommandException` with an actionable message when a value is out of
+range.
+
+**Resolution:** _(open)_
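
A Python sketch of the range check follows. `CommandError` is a hypothetical stand-in for `CliFx.Exceptions.CommandException`. The finding does not say which bounds are valid, so the minimums (at least 1 for a sampling interval, at least 0 for a depth) are assumptions.

```python
class CommandError(Exception):
    """Hypothetical stand-in for CliFx's CommandException."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


def require_at_least(option: str, value: int, minimum: int) -> int:
    """Reject an out-of-range option value with an operator-facing message."""
    if value < minimum:
        raise CommandError(f"{option} must be at least {minimum} (got {value}).")
    return value


def validate_subscribe_options(interval_ms: int, max_depth: int) -> None:
    # Assumed bounds: a sampling interval must be positive, a depth non-negative.
    require_at_least("--interval", interval_ms, 1)
    require_at_least("--max-depth", max_depth, 0)
```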
+
+### Client.CLI-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `Commands/SubscribeCommand.cs:13-37` |
+| Status | Open |
+
+**Description:** `SubscribeCommand` is the only command in the module whose constructor
+and all `[CommandOption]` properties have no XML doc comments. Every other command
+(`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`,
+`HistoryReadCommand`, `RedundancyCommand`) and `CommandBase` carry `<summary>` docs on the
+type, constructor, and options. The inconsistency is visible in IDE tooltips and breaks the
+otherwise-uniform documentation convention of the module.
+
+**Recommendation:** Add `<summary>` XML docs to the `SubscribeCommand` constructor and to
+each of its option properties, matching the style used by the sibling commands.
+
+**Resolution:** _(open)_
+
+### Client.CLI-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` |
+| Status | Resolved |
+
+**Description:** The `DataChanged` and `AlarmEvent` handlers write to `console.Output`
+(a `System.IO.TextWriter`) directly from the OPC UA SDK subscription/notification thread,
+while the command main flow is awaiting `Task.Delay(Timeout.Infinite, ct)` and the summary
+block also writes to the same `console.Output`. `TextWriter` instances are not guaranteed
+thread-safe; concurrent `WriteLine` calls from the notification thread and the main thread
+(a data-change notification arriving while the summary is being printed, or two
+notifications from different SDK threads) can interleave or corrupt output. The handler
+also calls the synchronous `WriteLine` and discards any exception, which on a fault would
+propagate into the SDK callback.
+
+**Recommendation:** Serialize console writes from event handlers — funnel notifications
+through a `Channel<T>` drained by the main thread, or guard every `console.Output` write
+with a shared lock. At minimum, ensure handler exceptions cannot escape into the SDK
+callback.
+
+**Resolution:** Resolved 2026-05-22 — notification handlers in `SubscribeCommand` and `AlarmsCommand` now enqueue lines to an unbounded `Channel<string>` via `TryWrite`, and the main thread drains the channel via `ReadAllAsync`. Handlers are named local functions so they can be unsubscribed before the summary phase, and all handler exceptions are swallowed to protect the SDK callback.
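
A Python sketch of the same producer/consumer shape follows. `queue.SimpleQueue` takes the place of the unbounded channel, and a sentinel object plays the role of channel completion. `SerializedOutput` and its methods are hypothetical. Notification threads only enqueue, and a single consumer is the only code that writes output, so lines cannot interleave.

```python
import queue
from typing import Callable

_DONE = object()  # sentinel marking the end of the stream


class SerializedOutput:
    """Many producer threads post lines; one consumer writes them."""

    def __init__(self, write: Callable[[str], None]) -> None:
        self._queue: "queue.SimpleQueue[object]" = queue.SimpleQueue()
        self._write = write

    def post(self, line: str) -> None:
        """Called from notification threads: enqueue only, never write directly."""
        try:
            self._queue.put(line)
        except Exception:  # a handler fault must never escape into the SDK callback
            pass

    def complete(self) -> None:
        self._queue.put(_DONE)

    def drain(self) -> None:
        """Run on the main thread; it is the only code that touches the writer."""
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            self._write(item)
```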
+
+### Client.CLI-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
+| Status | Open |
+
+**Description:** Operator input-format errors surface as raw .NET exceptions rather than
+clean CLI errors. An unparseable start/end value throws `FormatException` straight out of
+`DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentException` from
+`NodeIdParser`. CliFx renders unhandled exceptions with a stack trace, which is noisy for a
+user-input mistake. Other tooling in this module already distinguishes operator errors
+(`ParseAggregateType` throws `ArgumentException` with a helpful message) but none of these
+is converted to a `CliFx.Exceptions.CommandException` with a clean exit code.
+
+**Recommendation:** Catch the predictable input-validation exceptions and rethrow as
+`CommandException` with a concise message and a non-zero exit code, so malformed input
+yields a one-line error instead of a stack trace.
+
+**Resolution:** _(open)_
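
A Python sketch of translating predictable parse failures into clean CLI errors follows. `CommandError` stands in for `CliFx.Exceptions.CommandException`, and Python's `ValueError` plays the role of `FormatException` / `ArgumentException`. `parse_option` and `run` are hypothetical.

```python
import sys
from typing import Callable, TextIO, TypeVar

T = TypeVar("T")


class CommandError(Exception):
    """Hypothetical stand-in for CliFx's CommandException."""

    def __init__(self, message: str, exit_code: int = 1) -> None:
        super().__init__(message)
        self.exit_code = exit_code


def parse_option(option: str, raw: str, parser: Callable[[str], T]) -> T:
    """Run a parser, turning its ValueError into a clean operator error."""
    try:
        return parser(raw)
    except ValueError as exc:
        raise CommandError(f"Invalid value for {option}: {raw!r} ({exc}).") from exc


def run(action: Callable[[], object], stderr: TextIO = sys.stderr) -> int:
    """Top-level runner: a one-line message and a non-zero exit code, no traceback."""
    try:
        action()
    except CommandError as exc:
        print(f"Error: {exc}", file=stderr)
        return exc.exit_code
    return 0
```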
+
+### Client.CLI-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `CommandBase.cs:112-123` |
+| Status | Open |
+
+**Description:** `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a
+logger, and assigns it to the static `Log.Logger` without disposing the previously
+assigned logger. For a single CLI invocation this leaks at most one logger and the process
+exits shortly after, so impact is minimal — but `CommandBase` is also exercised repeatedly
+in-process by the unit-test suite, where each `ExecuteAsync` replaces `Log.Logger` and
+abandons the prior console sink without disposal. The pattern is incorrect:
+`Log.CloseAndFlush()` (or disposing the prior logger) should run before reassignment.
+
+**Recommendation:** Call `Log.CloseAndFlush()` before assigning a new `Log.Logger`, or
+build the logger into a local `ILogger` the command owns and disposes, rather than mutating
+global static state per command.
+
+**Resolution:** _(open)_
+
+### Client.CLI-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `docs/Client.CLI.md:158-217` |
+| Status | Open |
+
+**Description:** `docs/Client.CLI.md` is stale relative to the code at this commit.
+(1) The `subscribe` command section documents only `-n` and `-i`, but the code
+(`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`,
+`--duration`, and `--summary-file` — none are documented, and the documented Ctrl+C-only
+lifecycle no longer matches `--duration` auto-exit.
+(2) The `historyread` "Aggregate mapping" table lists six aggregates but the code
+(`HistoryReadCommand.ParseAggregateType` and `AggregateType`) also supports
+`StandardDeviation` (aliases `stddev`/`stdev`); the doc option table omits it while the
+code option description includes it.
+
+**Recommendation:** Regenerate the `subscribe` and `historyread` sections of
+`docs/Client.CLI.md` from the current option set, including the five new subscribe flags
+and the `StandardDeviation` aggregate row.
+
+**Resolution:** _(open)_
+
+### Client.CLI-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
+| Status | Open |
+
+**Description:** Both long-running commands attach an event handler
+(`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach
+it. Because the handler closes over `console`, the captured console and the closure remain
+referenced by the service until the service is disposed in the `finally` block. In
+practice the service is per-command and disposed at the end, so this does not leak across
+commands — but it is a latent footgun: a handler can still fire between `UnsubscribeAsync`
+/ `UnsubscribeAlarmsAsync` and `Dispose`, writing to a console that the command considers
+finished (overlapping with Client.CLI-005). The cleanup unsubscribes the monitored items
+but never the .NET event.
+
+**Recommendation:** Detach the handler explicitly (`service.DataChanged -= handler`) after
+unsubscribing, using a named local delegate so it can be removed, ensuring no notification
+is processed after the command output phase ends.
+
+**Resolution:** _(open)_
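
A Python sketch of the named-handler pattern follows. `Event` is a minimal hypothetical stand-in for a .NET event, and `run_subscribe_command` stands in for the command body. Because the handler is a named local function, the same reference can be passed to the unsubscribe call. A lambda written inline at the `+=` site offers no reference to remove.

```python
from typing import Callable, List

Handler = Callable[[str], None]


class Event:
    """Minimal stand-in for a .NET event with += / -= semantics."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:  # like `+=`
        self._handlers.append(handler)

    def unsubscribe(self, handler: Handler) -> None:  # like `-=`
        if handler in self._handlers:
            self._handlers.remove(handler)

    def fire(self, value: str) -> None:
        for handler in list(self._handlers):
            handler(value)


def run_subscribe_command(data_changed: Event, output: List[str]) -> None:
    """Hypothetical command body: attach a named handler and always detach it."""

    def on_data_changed(value: str) -> None:
        output.append(value)

    data_changed.subscribe(on_data_changed)
    try:
        data_changed.fire("notification during the command")
    finally:
        # In the real command this follows UnsubscribeAsync; detach before output ends.
        data_changed.unsubscribe(on_data_changed)
```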
+
+### Client.CLI-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
+| Status | Open |
+
+**Description:** The new `SubscribeCommand` capabilities are largely untested. The four
+`SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel,
+disconnect-in-finally, and the subscription message. There is no test for the `--recursive`
+browse-and-collect path (`CollectVariablesAsync`), the `--duration` auto-exit path, the
+summary classification logic (`good`/`bad`/`never`/`neverWentBad`, including the
+mislabeling noted in Client.CLI-002), the `--quiet` flag, the `--summary-file` write, or
+per-node subscribe-failure handling. The summary logic is the most behaviour-rich part of
+the command and the part most likely to regress.
+
+**Recommendation:** Add unit tests for recursive variable collection, the duration-based
+exit, summary bucketing across good/bad/no-update nodes, and the `--summary-file` output.
+The `FakeOpcUaClientService` already exposes `RaiseDataChanged`, so feeding good/bad values
+and asserting the summary text is straightforward.
+
+**Resolution:** _(open)_
@@ -0,0 +1,192 @@
+# Code Review — Client.Shared
+
+| Field | Value |
+|---|---|
+| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Client.Shared-001, Client.Shared-002, Client.Shared-003 |
+| 2 | OtOpcUa conventions | Client.Shared-004 |
+| 3 | Concurrency & thread safety | Client.Shared-005, Client.Shared-006, Client.Shared-007 |
+| 4 | Error handling & resilience | Client.Shared-008, Client.Shared-009 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Client.Shared-010 |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Client.Shared-011 |
+| 10 | Documentation & comments | Client.Shared-009 |
+
+## Findings
+
+### Client.Shared-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `OpcUaClientService.cs:552` |
+| Status | Resolved |
+
+**Description:** `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` threshold is arbitrary and inconsistent. SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are also needed for a usable alarm, and each of those is already guarded individually with `fields.Count > N`. If a non-conforming server returns fewer than six fields, the `< 6` early return silently drops the entire notification, including the leading fields (such as SourceName) that did arrive.
+
+**Recommendation:** Drop the `< 6` early return (or lower it to `< 1`) and rely on the existing per-index `fields.Count > N` guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.
+
+**Resolution:** Resolved 2026-05-22 — lowered the early-return threshold to `< 1` (null or empty guard only); per-index `fields.Count > N` guards already default missing fields safely for all higher indices.
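
A Python sketch of relying on per-index guards instead of a fixed floor follows. The index constants come from this finding. `field_or`, `parse_alarm` and the default values are hypothetical. A three-field notification, which the old `< 6` check would have discarded, still yields its SourceName.

```python
from typing import Any, Dict, Optional, Sequence

# Field positions from the finding; the other indices are not modelled here.
SOURCE_NAME, SEVERITY, CONDITION_NAME = 2, 5, 6
RETAIN, ACKED, ACTIVE, CONDITION_NODE_ID = 7, 8, 9, 12


def field_or(fields: Sequence[Any], index: int, default: Any) -> Any:
    """Return fields[index], or the default when the server omitted or nulled it."""
    if index < len(fields) and fields[index] is not None:
        return fields[index]
    return default


def parse_alarm(fields: Optional[Sequence[Any]]) -> Optional[Dict[str, Any]]:
    if not fields:  # the only hard floor: nothing to parse at all
        return None
    return {
        "source_name": field_or(fields, SOURCE_NAME, ""),
        "severity": field_or(fields, SEVERITY, 0),
        "condition_name": field_or(fields, CONDITION_NAME, ""),
        "retain": field_or(fields, RETAIN, False),
        "acked": field_or(fields, ACKED, False),
        "active": field_or(fields, ACTIVE, False),
        "condition_node_id": field_or(fields, CONDITION_NODE_ID, None),
    }
```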
+
+### Client.Shared-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` |
+| Status | Resolved |
+
+**Description:** `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `RedundancySupport` and `ServiceLevel` reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. `ServiceLevel` boxed as `int` instead of `byte`), or returns a null `Value` on a `Bad` DataValue, the cast throws `InvalidCastException`/`NullReferenceException` and the whole call fails instead of returning a sensible default.
+
+**Recommendation:** Wrap the `RedundancySupport` and `ServiceLevel` reads in the same defensive pattern used for the array reads, using `Convert.ToInt32`/`Convert.ToByte` on the boxed value and falling back to `None`/`0` when the read status is bad or the value is null.
+
+**Resolution:** Resolved 2026-05-22 — replaced direct casts with `StatusCode.IsGood` guard + `Convert.ToInt32`/`Convert.ToByte` coercion; falls back to `None`/`0` when status is bad or value is null.
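
A Python sketch of the defensive read follows. `DataValue` is a hypothetical stand-in for the SDK type, modelled as a boxed value plus a good/bad flag, and `coerce_int` is illustrative. The function tolerates whatever numeric type the server boxed the value as. It falls back to a default on a bad status, a null value, or a non-numeric value.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DataValue:
    """Hypothetical stand-in for an OPC UA DataValue: a boxed value plus status."""
    value: Any
    is_good: bool


def coerce_int(dv: DataValue, default: int, lo: int, hi: int) -> int:
    """Read a numeric DataValue defensively, falling back instead of throwing."""
    if not dv.is_good or dv.value is None:
        return default
    try:
        number = int(dv.value)  # tolerate int, float or numeric-string boxing
    except (TypeError, ValueError):
        return default
    return number if lo <= number <= hi else default
```

The sketch also maps out-of-range values, such as a `ServiceLevel` above 255, to the default. `Convert.ToByte` throws `OverflowException` in that case, so an equivalent C# guard would need its own range check or `catch` if that case matters.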
+
+### Client.Shared-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
+| Status | Open |
+
+**Description:** `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking that the `Results` collection is non-empty. A malformed or service-level-faulted response (an empty `Results` alongside a service fault) makes the indexer throw an `ArgumentOutOfRangeException`, because the SDK's result collections are `List<T>`-based. The caller gets that exception instead of a meaningful OPC UA `StatusCode` or `ServiceResultException`.
+
+**Recommendation:** Guard both accesses — throw `ServiceResultException` with the response's `ResponseHeader.ServiceResult` (or `BadUnexpectedError`) when `Results` is empty.
+
+**Resolution:** _(open)_
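
A Python sketch of the guard follows. `ServiceResultError` stands in for `ServiceResultException`, status codes are modelled as their symbolic names, and `first_result` is hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class ServiceResultError(Exception):
    """Hypothetical stand-in for ServiceResultException, carrying a status name."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status


def first_result(results: Sequence[T], service_result: str) -> T:
    """Return the single per-operation result, or raise a meaningful status."""
    if results:
        return results[0]
    # Prefer the service-level fault; an empty list under a Good header is itself unexpected.
    status = service_result if service_result.startswith("Bad") else "BadUnexpectedError"
    raise ServiceResultError(status)
```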
+
+### Client.Shared-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
+| Status | Open |
+
+**Description:** `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the `CancellationToken` parameter is ignored on these paths.
+
+**Recommendation:** Use the stack's async overloads (`Session.HistoryReadAsync`, `Session.CloseAsync`) where available, or wrap the synchronous calls in `Task.Run`, so the methods are genuinely asynchronous and honor the cancellation token.
+
+**Resolution:** _(open)_
+
+### Client.Shared-005
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` |
+| Status | Resolved |
+
+**Description:** `_activeDataSubscriptions` is a plain `Dictionary` mutated from at least three thread contexts with no synchronization: the caller thread (`SubscribeAsync`/`UnsubscribeAsync`), the keep-alive callback thread (`HandleKeepAliveFailureAsync` -> `ReplaySubscriptionsAsync`, invoked fire-and-forget from the OPC UA `KeepAlive` event), and `DisconnectAsync`. Concurrent `Add`/`Remove`/`Clear`/enumeration on a non-thread-safe `Dictionary` can corrupt its internal buckets, throw `InvalidOperationException`, or lose entries. A failover firing while the UI calls `SubscribeAsync` is a realistic trigger. The `_activeAlarmSubscription` nullable tuple has the same exposure.
+
+**Recommendation:** Guard all access to `_activeDataSubscriptions` / `_activeAlarmSubscription` (and the `_session`/`_dataSubscription`/`_alarmSubscription` fields) with a single lock, or move subscription bookkeeping behind a `ConcurrentDictionary` plus a lock for the multi-field failover transition.
+
+**Resolution:** Resolved 2026-05-22 — added a dedicated `_subscriptionLock` and wrapped every read/write of `_activeDataSubscriptions` and `_activeAlarmSubscription` (in Subscribe/Unsubscribe[Alarms]Async, Disconnect, Dispose, and the snapshot/clear/re-record steps of ReplaySubscriptionsAsync) inside it; awaited adapter calls run outside the lock to avoid holding it across I/O.
+
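The snapshot/clear/re-record shape of the resolution can be sketched as follows. This is an illustrative Python analogue, not the service's C# code. `SubscriptionBook` and its methods are hypothetical, `threading.Lock` plays the role of `_subscriptionLock`, and `resubscribe` stands in for the awaited adapter call that must run outside the lock.

```python
import threading
from typing import Callable, Dict

class SubscriptionBook:
    """Hypothetical bookkeeping object: every dictionary access happens under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, int] = {}  # node id -> sampling interval in ms

    def record(self, node_id: str, interval_ms: int) -> None:
        with self._lock:
            self._active[node_id] = interval_ms

    def forget(self, node_id: str) -> None:
        with self._lock:
            self._active.pop(node_id, None)

    def count(self) -> int:
        with self._lock:
            return len(self._active)

    def replay(self, resubscribe: Callable[[str, int], None]) -> None:
        # 1. Snapshot and clear under the lock.
        with self._lock:
            snapshot = dict(self._active)
            self._active.clear()
        # 2. Make the slow server calls without holding the lock, so a concurrent
        #    record() or forget() never waits on network I/O.
        for node_id, interval_ms in snapshot.items():
            try:
                resubscribe(node_id, interval_ms)
            except Exception:
                continue  # in this sketch a failed replay is simply not re-recorded
            # 3. Re-record each successful replay, briefly re-taking the lock.
            self.record(node_id, interval_ms)
```

Holding the lock only around dictionary access keeps a slow or hung server call from blocking a concurrent `SubscribeAsync`. The trade-off in this sketch is that an entry whose re-subscribe fails is dropped from the bookkeeping rather than retried.
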
+### Client.Shared-006
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` |
+| Status | Resolved |
+
+**Description:** `HandleKeepAliveFailureAsync` is launched fire-and-forget (`_ = HandleKeepAliveFailureAsync()`) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check `if (_state == Reconnecting || _state == Disconnected) return;` at the top. Between that read and the `TransitionState(Reconnecting, ...)` write a few lines later, a second keep-alive failure (the SDK raises `KeepAlive` repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently — each disposing `_session`, nulling subscription fields, and racing to assign a new `_session`. The session created by the loser leaks, and `ReplaySubscriptionsAsync` can run twice creating duplicate monitored items.
+
+**Recommendation:** Serialize failover with an `Interlocked.CompareExchange` flag or a `SemaphoreSlim(1,1)` so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.
+
+**Resolution:** Resolved 2026-05-22 — `HandleKeepAliveFailureAsync` now claims an atomic `_failoverInProgress` slot via `Interlocked.CompareExchange(ref _failoverInProgress, 1, 0)`; a re-entrant bad keep-alive sees `1` and returns immediately, so only one failover loop runs. The loop body moved to `RunFailoverAsync`, wrapped in try/finally that resets the flag with `Interlocked.Exchange`.
+
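The single-flight guard is sketched below in Python. Python has no `Interlocked.CompareExchange`, so this sketch substitutes a non-blocking `threading.Lock.acquire(blocking=False)`, which is likewise an atomic test-and-set. `FailoverGate` and `on_bad_keep_alive` are hypothetical names, and the synchronous `run_failover` callback stands in for the awaited `RunFailoverAsync`.

```python
import threading
from typing import Callable

class FailoverGate:
    """Hypothetical single-flight guard around the failover loop."""

    def __init__(self) -> None:
        # acquire(blocking=False) is an atomic test-and-set, playing the role of
        # Interlocked.CompareExchange(ref _failoverInProgress, 1, 0).
        self._in_progress = threading.Lock()

    def on_bad_keep_alive(self, run_failover: Callable[[], None]) -> bool:
        """Start a failover unless one is already running; return True if started."""
        if not self._in_progress.acquire(blocking=False):
            return False  # a failover loop is already in flight; ignore this callback
        try:
            run_failover()
        finally:
            # Equivalent of resetting the flag with Interlocked.Exchange.
            self._in_progress.release()
        return True
```

The check and the claim are a single atomic operation. A second keep-alive callback therefore has no window to slip between "is a failover running?" and "mark a failover as running", which was the gap in the original `_state` check.
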
+### Client.Shared-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientService.cs:581-622` |
+| Status | Resolved |
+
+**Description:** In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental `ReadValueAsync` calls may run against a session being disposed, throwing `ObjectDisposedException` — caught by the bare `catch`, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of `Task.Run` continuations each doing 3-4 server round-trips.
+
+**Recommendation:** Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential `ReadValueAsync` calls into one `Read` request.
+
+**Resolution:** Resolved 2026-05-22 — added a `ReferenceEquals(session, _session)` guard at the top of the `Task.Run` body to skip reads if the session was replaced by failover; separated `ObjectDisposedException` from the general catch to drop rather than deliver the stale alarm.
+
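A Python sketch of the identity check the resolution describes, using hypothetical names. `SessionHolder` stands in for the service's `_session` field guarded by a lock, as the recommendation proposes. The `is` comparison plays the role of `ReferenceEquals(session, _session)`.

```python
import threading
from typing import Callable, Optional, TypeVar

R = TypeVar("R")

class SessionHolder:
    """Hypothetical stand-in for the service's _session field plus its lock."""

    def __init__(self, session: object) -> None:
        self._lock = threading.Lock()
        self._session = session

    def replace(self, new_session: object) -> None:
        with self._lock:  # what a failover does when it installs a new session
            self._session = new_session

    def is_current(self, session: object) -> bool:
        with self._lock:
            return self._session is session  # identity, like ReferenceEquals

def supplemental_read(holder: SessionHolder, captured: object,
                      read: Callable[[object], R]) -> Optional[R]:
    """Run the fallback read only if the captured session is still the live one."""
    if not holder.is_current(captured):
        return None  # failover replaced the session: drop instead of reading stale state
    return read(captured)
```

The check narrows the race window but cannot close it. A failover can still replace the session between the check and the read, which is why the resolution also catches `ObjectDisposedException` separately and drops the alarm in that case.
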
+### Client.Shared-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` |
+| Status | Resolved |
+
+**Description:** `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Value`, `ValueConverter.ConvertValue` falls through to the `_ => rawValue` default and writes a raw `string` into, for example, an `Int32` node — the server then rejects it with `BadTypeMismatch`, surfacing as a confusing failure unrelated to the operator's input. Separately, `ConvertValue` uses `bool.Parse`, which accepts only `true`/`false` — operator input of `1`/`0` throws `FormatException` that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.
+
+**Recommendation:** Inspect `currentDataValue.StatusCode` before trusting `Value`; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept `1`/`0`/`yes`/`no`, and wrap parse failures in a descriptive exception naming the node and target type.
+
+**Resolution:** Resolved 2026-05-22 — `WriteValueAsync` now checks `StatusCode.IsGood` and non-null `Value` before calling `ConvertValue`, throwing a descriptive `InvalidOperationException` on bad reads; `ValueConverter` now uses a `ParseBool` helper accepting `true/false/1/0/yes/no` (case-insensitive) and wraps all parse/overflow failures in a `FormatException` with the target type and input value in the message.
+
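The boolean parsing rule from the resolution is sketched here in Python as a hypothetical `parse_bool`. The C# helper is `ParseBool`, and `ValueError` stands in for `FormatException`. The sketch accepts `true/false/1/0/yes/no` case-insensitively, and its error message names the target type and the rejected input.

```python
_TRUE_TOKENS = frozenset({"true", "1", "yes"})
_FALSE_TOKENS = frozenset({"false", "0", "no"})

def parse_bool(text: str) -> bool:
    """Parse true/false, 1/0, or yes/no, ignoring case."""
    token = text.lower()
    if token in _TRUE_TOKENS:
        return True
    if token in _FALSE_TOKENS:
        return False
    # Name the target type and the rejected input so the operator sees what failed.
    raise ValueError(f"Cannot convert {text!r} to Boolean: expected true/false, 1/0 or yes/no")
```
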
+### Client.Shared-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience / Documentation & comments |
+| Location | `OpcUaClientService.cs:302-322` |
+| Status | Open |
+
+**Description:** `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally returns `StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAsync`, which throws `ServiceResultException` on a bad call result. A failed acknowledgment therefore never returns a bad `StatusCode` (it throws instead), so the `StatusCode` return value is dead. Callers writing `if (StatusCode.IsBad(result))` will never see a bad result and will not catch the exception.
+
+**Recommendation:** Either change the return type to `Task` (and let exceptions signal failure), or catch `ServiceResultException` in `AcknowledgeAlarmAsync` and return its `StatusCode`. Update the XML doc to match whichever is chosen.
+
+**Resolution:** _(open)_
+
+### Client.Shared-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
+| Status | Open |
+
+**Description:** `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call per process, the legacy-folder migration with `Directory.Exists`/`Directory.Move` filesystem IO. `ConnectToEndpointAsync` constructs a fresh `ConnectionSettings` per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The `_migrationChecked` fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect — constructing a settings object should not touch disk.
+
+**Recommendation:** Make `CertificateStorePath` default to `string.Empty` and resolve `ClientStoragePaths.GetPkiPath()` lazily inside `DefaultApplicationConfigurationFactory.CreateAsync` only when the path is unset.
+
+**Resolution:** _(open)_
+
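The lazy-default recommendation is sketched below in Python, using hypothetical names. The settings object carries an empty default. `resolve_pki_path` invokes the injected resolver (a stand-in for `ClientStoragePaths.GetPkiPath()`) only when no path was set, mirroring the check the finding proposes for `DefaultApplicationConfigurationFactory.CreateAsync`.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ConnectionSettings:
    # A plain default: constructing settings never touches the filesystem.
    certificate_store_path: str = ""

def resolve_pki_path(settings: ConnectionSettings,
                     default_pki_path: Callable[[], str]) -> str:
    """Return the configured path, resolving the default only when it is unset."""
    if settings.certificate_store_path:
        return settings.certificate_store_path
    # Path resolution (and any legacy-folder migration IO) happens only on this branch.
    return default_pki_path()
```
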
+### Client.Shared-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
+| Status | Open |
+
+**Description:** The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in `OnAlarmEventNotification` (the `Task.Run` supplemental read) is not covered — no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) `WriteValueAsync` string coercion against an unwritten/`Bad`-status node (Client.Shared-008) is untested; (d) the production adapters (`DefaultSessionAdapter`, `DefaultEndpointDiscovery`) have no unit coverage — understandable since they wrap the SDK, but the `Results[0]` guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.
+
+**Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested.
+
+**Resolution:** _(open)_
@@ -0,0 +1,296 @@
+# Code Review - Client.UI
+
+| Field | Value |
+|---|---|
+| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.UI` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Client.UI-001, Client.UI-002 |
+| 2 | OtOpcUa conventions | Client.UI-003, Client.UI-004 |
+| 3 | Concurrency & thread safety | Client.UI-005 |
+| 4 | Error handling & resilience | Client.UI-006 |
+| 5 | Security | Client.UI-007 |
+| 6 | Performance & resource management | Client.UI-008 |
+| 7 | Design-document adherence | Client.UI-009 |
+| 8 | Code organization & conventions | Client.UI-010 |
+| 9 | Testing coverage | No issues found |
+| 10 | Documentation & comments | Client.UI-011 |
+
+## Findings
+
+### Client.UI-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` |
+| Status | Resolved |
+
+**Description:** `ReadHistoryAsync` runs as a `RelayCommand` body, which is invoked
+on the UI thread, so the bare `IsLoading = true` at line 76 happens to land on the
+right thread today. But `Results.Clear()` on the very next line is wrapped in
+`_dispatcher.Post(...)`, and the `finally` block also sets `IsLoading` through the
+dispatcher (`_dispatcher.Post(() => IsLoading = false)` at line 121). The two
+`IsLoading` writes use inconsistent dispatch paths. Because the `Post` in the
+`finally` is queued behind the result-population `Post` while the synchronous
+line-76 write is not, the loading-indicator updates are not guaranteed to be
+ordered relative to the grid population, and the pattern is fragile if the command
+is ever invoked off the UI thread (a future caller or test harness).
+
+**Recommendation:** Route the line-76 `IsLoading = true` through `_dispatcher.Post`
+for consistency with the rest of the method, or set both `IsLoading` writes
+synchronously and only dispatch the `ObservableCollection` mutations.
+
+**Resolution:** Resolved 2026-05-22 — Routed the `IsLoading = true` write through `_dispatcher.Post` to make both `IsLoading` assignments consistent with all other UI state mutations in the method.
+
+### Client.UI-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` |
+| Status | Resolved |
+
+**Description:** `ConnectAsync` calls `await BrowseTree.LoadRootsAsync()` and
+`ViewHistoryForSelectedNode` assigns `History.SelectedNodeId`, both by dereferencing
+the nullable child view-model properties (`BrowseTreeViewModel?`,
+`HistoryViewModel?`) without a null check or `!` operator, while the surrounding
+code (lines 258-266) does guard `Subscriptions` and `Alarms` with `!= null`.
+`InitializeService()` does assign all five child VMs before these lines run, so a
+real NRE is unlikely on the current call path. However, the inconsistent guarding
+obscures intent, and because the compiler's nullable flow analysis cannot prove
+that `InitializeService()` assigned these properties, the code either produces a
+CS8602 warning that is being ignored or relies on that warning being suppressed. A
+future refactor that makes `InitializeService()` conditionally skip a VM would
+introduce a crash the compiler does not flag.
+
+**Recommendation:** Make the guarding consistent: either guard all five child VMs
+uniformly, or have `InitializeService()` return non-null references the caller uses
+directly so the compiler can prove non-nullness.
+
+**Resolution:** Resolved 2026-05-22 — Added `if (BrowseTree != null)` and `if (History != null)` guards at both dereference sites to match the guarding style already used for `Subscriptions` and `Alarms`.
+
+### Client.UI-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
+| Status | Open |
+
+**Description:** The csproj references `Serilog` and `Serilog.Sinks.Console`, and
+`docs/Client.UI.md` lists Serilog as the logging technology, but no source file in
+the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's
+`LogToTrace()` and there is no logger configuration, no log calls, and no rolling
+file sink. `CLAUDE.md` mandates "Serilog with rolling daily file sink" as the
+logging library preference. The references are dead weight and the documented
+logging behaviour does not exist.
+
+**Recommendation:** Either wire up Serilog (a console sink at minimum, ideally the
+rolling daily file sink the project standard calls for) and route Avalonia logging
+through it, or drop the unused `Serilog` package references and correct
+`docs/Client.UI.md`.
+
+**Resolution:** _(open)_
+
+### Client.UI-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `Views/MainWindow.axaml.cs:125-138` |
+| Status | Open |
+
+**Description:** `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is
+obsolete in Avalonia 11.x (the version pinned in the csproj). The supported
+replacement is the `StorageProvider` API
+(`StorageProvider.OpenFolderPickerAsync`). Using the obsolete type produces an
+`[Obsolete]` compiler warning, and the API is scheduled for removal in a future
+Avalonia major version.
+
+**Recommendation:** Migrate the folder chooser to
+`TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`.
+
+**Resolution:** _(open)_
+
+### Client.UI-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` |
+| Status | Resolved |
+
+**Description:** `SubscriptionsViewModel` and `AlarmsViewModel` attach handlers to
+the long-lived `_service` events (`DataChanged`, `AlarmEvent`) in their
+constructors and detach them only via `Teardown()`. `Teardown()` is called from
+`DisconnectAsync` (operator-initiated disconnect), but it is NOT called from the
+`OnConnectionStateChanged` partial method that handles the `Disconnected` state;
+that path only calls `Clear()`. When the connection drops server-side (session
+lost, network failure) the service raises `ConnectionStateChanged(Disconnected)`
+without `DisconnectAsync` ever running, so the alarm/data event handlers remain
+attached to a dead service. Because `InitializeService()` early-returns when
+`_service != null` and the same VM instances are reused, the handlers are not
+re-attached on the next connect, so there is no per-reconnect handler leak.
+However, a late or buffered `DataChanged`/`AlarmEvent` callback fired during
+teardown still mutates the `ObservableCollection`s, and the asymmetry between the
+two disconnect paths is a latent correctness hazard.
+
+**Recommendation:** Make the disconnect handling symmetric: call
+`Subscriptions?.Teardown()` / `Alarms?.Teardown()` (or otherwise quiesce the event
+handlers) from the `Disconnected` branch of the `OnConnectionStateChanged` partial
+method, not only from `DisconnectAsync`.
+
+**Resolution:** Resolved 2026-05-22 — Added `Teardown()` calls to the `Disconnected` branch and added `Reattach()` methods (idempotent remove+add) called from the `Connected` branch to restore handlers after a server-side drop + reconnect.
+
+### Client.UI-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
+| Status | Open |
+
+**Description:** Many catch blocks swallow exceptions silently with an empty body
+and only a comment (`// Redundancy info not available`, `// Subscribe failed`,
+`// Subscription failed; no item added`, and others). When a subscribe,
+alarm-subscribe, or redundancy read fails, the operator gets no feedback at all: no
+status message, no log entry (compounded by Client.UI-003: there is no logger). A
+failed `AddSubscriptionAsync` simply leaves the node un-subscribed with no
+indication why. This makes field diagnosis of a misconfigured server or a
+permission denial effectively impossible from the UI.
+
+**Recommendation:** Surface failures to the operator: at minimum set a status
+message or write the exception to a log. Distinguish "feature not supported"
+(condition refresh) from "operation failed" so genuine errors are not hidden.
+
+**Resolution:** _(open)_
+
+### Client.UI-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` |
+| Status | Resolved |
+
+**Description:** The OPC UA `UserName`-token password is persisted in cleartext.
+`UserSettings.Password` is a plain `string`, `JsonSettingsService.Save` serializes
+the whole settings object to `settings.json` under `LocalApplicationData`, and
+`SaveSettings()` is invoked after every successful connect and on window close. Any
+process or user able to read the current user's profile directory can recover the
+server credentials. `docs/Client.UI.md` documents that "All connection parameters"
+are persisted but does not flag the password among them.
+
+**Recommendation:** Do not persist the password in cleartext. Options: omit it from
+the persisted model entirely (re-prompt each launch); encrypt it at rest with
+`ProtectedData` (DPAPI) on Windows or an equivalent OS keystore on other platforms;
+or store only a non-reversible reference. At minimum, document the cleartext
+storage as a known limitation.
+
+**Resolution:** Resolved 2026-05-22 — Removed `Password` from `UserSettings` and stopped writing/reading it in `SaveSettings`/`LoadSettings`; the operator is re-prompted each launch.
+
+### Client.UI-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` |
+| Status | Resolved |
+
+**Description:** `IOpcUaClientService` is declared `IDisposable`
+(`IOpcUaClientService.cs:10`), and the concrete service owns an OPC UA session plus
+SDK resources. `MainWindowViewModel` holds `_service` for the lifetime of the app
+but never calls `_service.Dispose()`: not on window close, not on disconnect, not
+anywhere. The view model's `DisconnectAsync` calls the service's `DisconnectAsync()`
+but leaves the service object undisposed, and there is no `IDisposable` implementation on
+`MainWindowViewModel` itself. The OPC UA SDK session, certificate validator, and
+any background reconnect timers are leaked until process exit. The
+`ConnectionStateChanged` handler attached at line 130 is also never detached.
+
+**Recommendation:** Make `MainWindowViewModel` implement `IDisposable`, detach the
+`ConnectionStateChanged` handler, and dispose `_service` from `MainWindow.OnClosing`
+(alongside the existing `SaveSettings()` call).
+
+**Resolution:** Resolved 2026-05-22 — Added `IDisposable` to `MainWindowViewModel` with a `Dispose()` that detaches `ConnectionStateChanged`, calls `Teardown()` on child VMs, and calls `_service.Dispose()`; wired `Dispose()` into `MainWindow.OnClosing`.
+
+### Client.UI-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `ViewModels/HistoryViewModel.cs:44-54` |
+| Status | Open |
+
+**Description:** `HistoryViewModel.AggregateTypes` exposes eight entries: `null`
+(Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`.
+`docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average,
+Minimum, Maximum, Count, Start, End" and omits `StandardDeviation`. The doc is
+stale relative to the code.
+
+**Recommendation:** Update the "Aggregate" row in `docs/Client.UI.md` to include
+Standard Deviation.
+
+**Resolution:** _(open)_
+
+### Client.UI-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
+| Status | Open |
+
+**Description:** `DateTimeRangePicker` declares `MinDateTimeProperty` /
+`MaxDateTimeProperty` styled properties with public CLR accessors, but neither is
+read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and
+`OnEndLostFocus` never clamp or reject input against the min/max bounds, and no
+XAML binds them. The properties are dead API surface that implies a range
+constraint the control does not enforce.
+
+**Recommendation:** Either implement min/max validation in the `LostFocus` parse
+path (turn out-of-range input red, as invalid input already is) or remove the two
+unused styled properties.
+
+**Resolution:** _(open)_
+
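If the min/max properties are kept rather than removed, the bounds check could look like this Python sketch. `parse_within_bounds` is hypothetical. ISO-8601 parsing via `datetime.fromisoformat` is an assumption, because the control's actual `TryParseDateTime` format is not shown in this finding. A `None` result corresponds to turning the box red.

```python
from datetime import datetime
from typing import Optional

def parse_within_bounds(text: str,
                        minimum: Optional[datetime] = None,
                        maximum: Optional[datetime] = None) -> Optional[datetime]:
    """Return the parsed value, or None if it is unparseable or outside [minimum, maximum]."""
    try:
        value = datetime.fromisoformat(text)
    except ValueError:
        return None  # unparseable: the control already turns the box red
    if minimum is not None and value < minimum:
        return None  # before MinDateTime
    if maximum is not None and value > maximum:
        return None  # after MaxDateTime
    return value
```
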
+### Client.UI-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
+| Status | Open |
+
+**Description:** The certificate-store-path `TextBox` watermark reads
+`(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208
+folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now
+`{LocalAppData}/OtOpcUaClient/`, and `ClientStoragePaths` migrates the old
+`LmxOpcUaClient/` folder forward. The watermark shows operators an obsolete path
+that no longer matches where settings and the PKI store actually live.
+
+**Recommendation:** Update the watermark to reference `OtOpcUaClient/pki`, or bind
+it to `ClientStoragePaths.GetPkiPath()` so it cannot drift again.
+
+**Resolution:** _(open)_
@@ -0,0 +1,192 @@
+# Code Review — Configuration
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
+| 2 | OtOpcUa conventions | Configuration-004 |
+| 3 | Concurrency & thread safety | Configuration-005 |
+| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
+| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Configuration-011 |
+| 10 | Documentation & comments | No issues found |
+
+## Findings
+
+### Configuration-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
+| Status | Resolved |
+
+**Description:** `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure with `RAISERROR(..., 16, 1)` followed by `RETURN`. A severity-16 `RAISERROR` is not a batch-aborting error, and `SET XACT_ABORT ON` does not abort the transaction for it, so control returns to `sp_PublishGeneration`, which publishes the draft even though validation rejected it: cross-cluster namespace bindings, dangling tag FKs, duplicate external identifiers, and EquipmentUuid-immutability violations all pass through. Pre-publish validation is effectively bypassed.
+
+**Recommendation:** Wrap the `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH` so the validation `RAISERROR` propagates and aborts the publish, or have `sp_ValidateDraft` return a result-set/output parameter that `sp_PublishGeneration` inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays `Draft`.
+
+**Resolution:** Resolved 2026-05-22 — wrapped the `EXEC dbo.sp_ValidateDraft` call in `sp_PublishGeneration` in a `BEGIN TRY ... BEGIN CATCH ROLLBACK; THROW; END CATCH` block so a validation `RAISERROR` rolls back the publish transaction and re-raises instead of being silently ignored; added DB-backed regression test `Publish_aborts_when_ValidateDraft_rejects_the_draft`.
+
+### Configuration-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
+| Status | Resolved |
+
+**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
+
+**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
+
+**Resolution:** Resolved 2026-05-22 — made `sp_PublishGeneration` transaction-nesting aware: captures `@@TRANCOUNT` on entry, issues `BEGIN TRANSACTION` when zero or `SAVE TRANSACTION sp_PublishGeneration` when nested, and uses `ROLLBACK TRANSACTION sp_PublishGeneration` (savepoint rollback) on all failure paths in the nested case so the caller's outer transaction is not wiped; also wrapped `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END CATCH` so validation errors propagate correctly.
+
+### Configuration-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
+| Status | Resolved |
+
+**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
+
+**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
+
+**Resolution:** Resolved 2026-05-22 — added nullable `Enterprise` and `Site` properties to `DraftSnapshot`; `ValidatePathLength` uses actual lengths when set and falls back to the conservative 32-char upper bound per segment with a comment explaining the trade-off; `DraftValidationService` now loads the cluster row and populates both properties; added `PathLength_uses_actual_Enterprise_Site_when_provided` and `PathLength_conservative_fallback_when_Enterprise_Site_absent` unit tests.
+
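The difference between the hard-coded and the actual-length computations can be sketched in Python. The segment list (Enterprise, Site, then the lower UNS segments), the one-character separators, and the names `uns_path_length`/`path_too_long` are assumptions for illustration. The 200-character limit and the 32-character per-segment fallback come from the finding.

```python
from typing import Optional, Sequence

MAX_PATH_LENGTH = 200    # the PathTooLong threshold cited in the finding
FALLBACK_SEGMENT = 32    # conservative per-segment bound when Enterprise/Site is unknown

def uns_path_length(enterprise: Optional[str], site: Optional[str],
                    lower_segments: Sequence[str]) -> int:
    """Length of a separator-joined UNS path (one char per separator, an assumption)."""
    lengths = [
        len(enterprise) if enterprise is not None else FALLBACK_SEGMENT,
        len(site) if site is not None else FALLBACK_SEGMENT,
    ]
    lengths.extend(len(segment) for segment in lower_segments)
    return sum(lengths) + (len(lengths) - 1)

def path_too_long(enterprise: Optional[str], site: Optional[str],
                  lower_segments: Sequence[str]) -> bool:
    return uns_path_length(enterprise, site, lower_segments) > MAX_PATH_LENGTH
```

For example, take Enterprise `ent` and Site `s1` plus lower segments of 50, 50 and 40 characters. The actual length is 149, which passes. The fallback charges 208 and reports `PathTooLong`, which is the over-rejection the finding describes.
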
+### Configuration-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
+| Status | Open |
+
+**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint`→`int` conversion. Only bits 0–11 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
+
+**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
+
+**Resolution:** _(open)_
+
+### Configuration-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
+| Status | Open |
+
+**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
+
+**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
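+
+A minimal sketch of the lock-based option. An in-memory `List` stands in for the LiteDB collection, and the row shape and this `PutAsync` signature are assumptions, not `LiteDbConfigCache`'s API. Holding one `SemaphoreSlim` across both the find and the write makes the pair atomic:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Stand-in for the collection: like a non-unique index, a List accepts duplicates.
+var rows = new List<(string ClusterId, long GenerationId, string Payload)>();
+var gate = new SemaphoreSlim(1, 1);
+
+async Task PutAsync(string clusterId, long generationId, string payload, CancellationToken ct = default)
+{
+    await gate.WaitAsync(ct);
+    try
+    {
+        int index = rows.FindIndex(r => r.ClusterId == clusterId && r.GenerationId == generationId);
+        await Task.Yield(); // simulates I/O between the find and the write
+        if (index < 0)
+            rows.Add((clusterId, generationId, payload));
+        else
+            rows[index] = (clusterId, generationId, payload);
+    }
+    finally
+    {
+        gate.Release();
+    }
+}
+
+// Fifty concurrent writers for the same generation still leave exactly one row.
+await Task.WhenAll(Enumerable.Range(0, 50).Select(i => PutAsync("cluster-1", 7, $"payload-{i}")));
+Console.WriteLine(rows.Count); // 1
+```
+
+Without the gate, two callers can both compute `index < 0` across the `await` and both `Add`, reproducing the duplicate row.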
+
+**Resolution:** _(open)_
+
+### Configuration-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
+| Status | Resolved |
+
+**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
+
+**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
+
+**Resolution:** Resolved 2026-05-22 — changed the fallback `catch` filter to `ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested` so a command-timeout `TaskCanceledException` (caller token not cancelled) triggers cache fallback while genuine caller cancellation still propagates; changed the retry `ShouldHandle` predicate to `Handle<Exception>()` (handles all exceptions, relying on Polly's own cancellation-token check to stop retrying on genuine cancellation); added three unit tests: `CommandTimeout_TaskCanceledException_FallsBackToCache`, `PollyTimeout_TimeoutRejectedException_FallsBackToCache`, and `CallerCancellation_Propagates_NotFallback`.
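+
+The resolved filter can be sketched as follows. `centralFetch`, `readCache`, and this `ReadAsync` signature are hypothetical stand-ins for the reader's internals; only the `when` clause mirrors the resolution above:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+async Task<string> ReadAsync(
+    Func<CancellationToken, Task<string>> centralFetch,
+    Func<string> readCache,
+    CancellationToken cancellationToken)
+{
+    try
+    {
+        return await centralFetch(cancellationToken);
+    }
+    // Fall back on any failure except a cancellation requested by the caller's
+    // own token. A command-timeout TaskCanceledException arrives while that token
+    // is still live, so it falls back to the cache instead of propagating.
+    catch (Exception ex) when (ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested)
+    {
+        return readCache();
+    }
+}
+
+// Command timeout: TaskCanceledException while the caller's token is not cancelled.
+string timedOut = await ReadAsync(
+    _ => throw new TaskCanceledException("command timeout"), () => "cached", CancellationToken.None);
+Console.WriteLine(timedOut); // cached
+
+// Genuine caller cancellation: propagates instead of falling back.
+bool propagated = false;
+using var cts = new CancellationTokenSource();
+cts.Cancel();
+try
+{
+    await ReadAsync(ct => { ct.ThrowIfCancellationRequested(); return Task.FromResult("live"); },
+                    () => "cached", cts.Token);
+}
+catch (OperationCanceledException)
+{
+    propagated = true;
+}
+Console.WriteLine(propagated); // True
+```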
+
+### Configuration-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
+| Status | Open |
+
+**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
+
+**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
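+
+A minimal sketch of the recommended shape. It assumes each pass is a list of `(entity, callback)` pairs and that errors are collected as strings, both assumptions about `GenerationApplier`'s internals:
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading;
+
+void ApplyPass(string pass, List<(string Entity, Action<CancellationToken> Apply)> items,
+               List<string> errors, CancellationToken ct)
+{
+    foreach (var (entity, apply) in items)
+    {
+        try
+        {
+            apply(ct);
+        }
+        catch (OperationCanceledException)
+        {
+            throw; // cancellation stops the applier; it is not an entity error
+        }
+        catch (Exception ex)
+        {
+            errors.Add($"{pass}/{entity}: {ex.Message}");
+        }
+    }
+}
+
+var errors = new List<string>();
+bool addedRan = false;
+using var cts = new CancellationTokenSource();
+
+var removed = new List<(string Entity, Action<CancellationToken> Apply)>
+{
+    ("Tag-1", _ => throw new InvalidOperationException("boom")),          // recorded, walk continues
+    ("Tag-2", ct => { cts.Cancel(); ct.ThrowIfCancellationRequested(); }), // cancelled mid-pass
+};
+var added = new List<(string Entity, Action<CancellationToken> Apply)>
+{
+    ("Tag-3", _ => addedRan = true),
+};
+
+bool cancelled = false;
+try
+{
+    ApplyPass("Removed", removed, errors, cts.Token);
+    cts.Token.ThrowIfCancellationRequested(); // checkpoint between passes
+    ApplyPass("Added", added, errors, cts.Token);
+}
+catch (OperationCanceledException)
+{
+    cancelled = true;
+}
+
+Console.WriteLine($"errors={errors.Count} cancelled={cancelled} addedRan={addedRan}");
+// errors=1 cancelled=True addedRan=False
+```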
+
+**Resolution:** _(open)_
+
+### Configuration-008
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
+| Status | Resolved |
+
+**Description:** Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_ReleaseExternalIdReservation` (`@Kind`, `@Value`). A value with a double-quote or backslash produces malformed JSON; combined with the `CK_ConfigAuditLog_DetailsJson_IsJson` check constraint, the `INSERT` fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
+
+**Recommendation:** Build the JSON with a safe constructor (`FOR JSON PATH, WITHOUT_ARRAY_WRAPPER` or `JSON_OBJECT(...)` on SQL Server 2022+) so values are properly escaped, or run each interpolated value through `STRING_ESCAPE(@Value, 'json')`. Add tests with quote/backslash-containing inputs.
+
+**Resolution:** Resolved 2026-05-22 — routed every caller-supplied string interpolated into `DetailsJson` through `STRING_ESCAPE(@x, 'json')` (`@Status` in `sp_RegisterNodeGenerationApplied`; `@Kind`/`@Value` in `sp_ReleaseExternalIdReservation`) and emitted `sp_RollbackToGeneration`'s `@TargetGenerationId` as a bare JSON number via explicit `CONVERT(nvarchar(20), CONVERT(bigint, ...))`; added DB-backed regression tests `RegisterNodeGenerationApplied_escapes_quotes_in_audit_DetailsJson` and `ReleaseReservation_escapes_quotes_in_audit_DetailsJson` that round-trip quote/backslash inputs through `ISJSON`/`JSON_VALUE`.
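+
+The fix itself is T-SQL, but the failure mode is easy to reproduce in C#. This sketch contrasts raw concatenation with an encoder that escapes the value. Here `JsonSerializer.Serialize` plays the role that `STRING_ESCAPE(@x, 'json')` plays in the procedures, and `JsonDocument.Parse` stands in for `ISJSON`:
+
+```csharp
+using System;
+using System.Text.Json;
+
+// A caller-supplied value containing a double quote and a backslash.
+string status = "Failed \"quoted\" C:\\temp";
+
+string naive = "{\"Status\":\"" + status + "\"}";                    // raw concatenation
+string escaped = "{\"Status\":" + JsonSerializer.Serialize(status) + "}"; // value is escaped
+
+bool IsJson(string text)
+{
+    try
+    {
+        using JsonDocument doc = JsonDocument.Parse(text);
+        return true;
+    }
+    catch (JsonException)
+    {
+        return false;
+    }
+}
+
+Console.WriteLine(IsJson(naive));   // False: the embedded quote ends the string early
+Console.WriteLine(IsJson(escaped)); // True
+
+using JsonDocument parsed = JsonDocument.Parse(escaped);
+Console.WriteLine(parsed.RootElement.GetProperty("Status").GetString() == status); // True
+```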
+
+### Configuration-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
+| Status | Resolved |
+
+**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
+
+**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
+
+**Resolution:** Resolved 2026-05-22 — removed the embedded `sa` password and `DefaultConnectionString` constant entirely; `CreateDbContext` now throws `InvalidOperationException` with a clear setup message when `OTOPCUA_CONFIG_CONNECTION` is not set, rather than silently falling back to a hardcoded credential; added XML-doc example showing the recommended integrated-auth connection string.
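+
+A minimal sketch of the fail-fast lookup. The message wording and the example connection string are illustrative. The environment lookup is injected only so the sketch can run without touching the real environment:
+
+```csharp
+#nullable enable
+using System;
+
+// Hypothetical helper mirroring the resolved behaviour: no embedded fallback credential.
+string ResolveConnectionString(Func<string, string?> getEnvironmentVariable)
+{
+    string? connectionString = getEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION");
+    if (string.IsNullOrWhiteSpace(connectionString))
+        throw new InvalidOperationException(
+            "OTOPCUA_CONFIG_CONNECTION is not set. Set it before running 'dotnet ef', e.g. " +
+            "\"Server=localhost;Database=<config-db>;Trusted_Connection=True;\".");
+    return connectionString;
+}
+
+// In the factory this would be called with Environment.GetEnvironmentVariable.
+Console.WriteLine(ResolveConnectionString(_ => "Server=localhost;Database=<config-db>;Trusted_Connection=True;"));
+
+bool failedFast = false;
+try
+{
+    ResolveConnectionString(_ => null);
+}
+catch (InvalidOperationException)
+{
+    failedFast = true;
+}
+Console.WriteLine(failedFast); // True
+```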
+
+### Configuration-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Security |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
+| Status | Open |
+
+**Description:** On central-DB read failure, the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. Nothing scrubs connection-string fragments before logging, which is contrary to the project's no-secret-logging rule.
+
+**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
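+
+A minimal sketch that combines both options and logs the exception type plus a scrubbed message. The regex and its key list (`Password`, `Pwd`, `User Id`, `UID`) are assumptions; adapt them to the connection strings actually in use:
+
+```csharp
+using System;
+using System.Text.RegularExpressions;
+
+// Masks the value of secret-bearing connection-string keys, up to the next ';'.
+static string ScrubSecrets(string text) =>
+    Regex.Replace(
+        text,
+        @"\b(Password|Pwd|User\s*Id|UID)\s*=\s*[^;]*",
+        m => m.Groups[1].Value + "=***",
+        RegexOptions.IgnoreCase);
+
+// What the warning would log instead of the full exception object.
+static string DescribeForLog(Exception ex) => $"{ex.GetType().Name}: {ScrubSecrets(ex.Message)}";
+
+var failure = new InvalidOperationException("Cannot open 'Server=db01;User Id=sa;Password=hunter2;'");
+Console.WriteLine(DescribeForLog(failure));
+// InvalidOperationException: Cannot open 'Server=db01;User Id=***;Password=***;'
+```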
+
+**Resolution:** _(open)_
+
+### Configuration-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
+| Status | Open |
+
+**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
+
+**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
+
+**Resolution:** _(open)_
@@ -0,0 +1,156 @@
+# Code Review — Core.Abstractions
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core.Abstractions-001, Core.Abstractions-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Core.Abstractions-003, Core.Abstractions-004 |
+| 4 | Error handling & resilience | Core.Abstractions-005 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | Core.Abstractions-006 |
+| 9 | Testing coverage | Core.Abstractions-007 |
+| 10 | Documentation & comments | Core.Abstractions-008 |
+
+## Findings
+
+### Core.Abstractions-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` |
+| Status | Resolved |
+
+**Description:** `PollOnceAsync` detects a change with `!Equals(lastSeen?.Value, current.Value)`. `object.Equals` falls back to reference equality for reference types that do not override it — including `T[]` array values. The capability interfaces explicitly support 1-D array attributes (`DriverAttributeInfo.IsArray`, `ValueRank=1`), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so `OnDataChange` fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.
+
+**Recommendation:** Compare array values structurally — e.g. when both `lastSeen?.Value` and `current.Value` are arrays, compare with `StructuralComparisons.StructuralEqualityComparer.Equals` (or element-wise) — instead of relying on `object.Equals`. Add a test covering an array-valued tag whose contents are unchanged across polls.
+
+**Resolution:** Resolved 2026-05-22 — introduced `ValuesAreDifferent` helper in `PollGroupEngine` that uses `StructuralComparisons.StructuralEqualityComparer` for `Array` values, falling back to `object.Equals` for scalars; added `Array_valued_tag_unchanged_contents_raises_only_once` and `Array_valued_tag_changed_contents_raises_event` tests.
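+
+The helper's logic can be sketched as follows. This is a reconstruction from the description above, not the engine's source:
+
+```csharp
+#nullable enable
+using System;
+using System.Collections;
+
+// Structural comparison when both values are arrays, object.Equals for everything else.
+static bool ValuesAreDifferent(object? previous, object? current)
+{
+    if (previous is Array && current is Array)
+        return !StructuralComparisons.StructuralEqualityComparer.Equals(previous, current);
+    return !object.Equals(previous, current);
+}
+
+int[] firstPoll = { 1, 2, 3 };
+int[] secondPoll = { 1, 2, 3 }; // fresh instance from the next poll, same contents
+
+Console.WriteLine(!object.Equals(firstPoll, secondPoll));            // True: the old check reports a change
+Console.WriteLine(ValuesAreDifferent(firstPoll, secondPoll));        // False
+Console.WriteLine(ValuesAreDifferent(firstPoll, new[] { 1, 2, 4 })); // True
+```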
+
+### Core.Abstractions-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` |
+| Status | Resolved |
+
+**Description:** `PollOnceAsync` iterates `state.TagReferences` and indexes the reader's result with `snapshots[i]`, assuming the driver-supplied `_reader` delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws `ArgumentOutOfRangeException` from the indexer. That exception escapes `PollOnceAsync`, is swallowed by the catch-all in `PollLoopAsync` (line 99), and the subscription then silently produces no further `OnDataChange` callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.
+
+**Recommendation:** Validate `snapshots.Count == state.TagReferences.Count` at the top of `PollOnceAsync` and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.
+
+**Resolution:** Resolved 2026-05-22 — added count-guard at the top of `PollOnceAsync` that throws `InvalidOperationException` with a descriptive message when the reader returns the wrong number of snapshots; added `Reader_short_result_list_raises_descriptive_exception_and_loop_continues` test verifying the loop survives contract violations and resumes delivering events once the reader recovers.
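+
+A sketch of the guard. The generic signature and the message text are illustrative:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+void EnsureOneSnapshotPerReference<TRef, TSnap>(IReadOnlyList<TRef> references, IReadOnlyList<TSnap> snapshots)
+{
+    if (snapshots.Count != references.Count)
+        throw new InvalidOperationException(
+            $"Poll reader returned {snapshots.Count} snapshot(s) for {references.Count} tag reference(s); " +
+            "it must return exactly one snapshot per reference, in input order.");
+}
+
+IReadOnlyList<string> tagRefs = new[] { "Tag1", "Tag2", "Tag3" };
+IReadOnlyList<double> shortResult = new[] { 1.0, 2.0 }; // e.g. a partial read on a backend error
+
+string message = "";
+try
+{
+    EnsureOneSnapshotPerReference(tagRefs, shortResult);
+}
+catch (InvalidOperationException ex)
+{
+    message = ex.Message;
+}
+Console.WriteLine(message);
+```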
+
+### Core.Abstractions-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` |
+| Status | Resolved |
+
+**Description:** `Subscribe` starts the poll loop with a fire-and-forget `Task.Run` and keeps no reference to the returned `Task`. Neither `Unsubscribe` nor `DisposeAsync` awaits the loop's completion — they only cancel the `CancellationTokenSource` and dispose it. Two consequences:
+
+1. After `DisposeAsync`/`Unsubscribe` returns, a poll already in flight inside `PollOnceAsync` can still complete and invoke the `_onChange` callback. A driver that disposes the engine during shutdown can therefore receive a data-change callback after it considers the engine torn down, with no way to know the engine is gone.
+2. `Unsubscribe`/`DisposeAsync` call `state.Cts.Dispose()` immediately while the loop may still be inside `Task.Delay(state.Interval, ct)`. Cancelling-then-disposing a CTS while a consumer still touches the token can race; `Task.Delay` on a disposed token can throw `ObjectDisposedException` rather than `OperationCanceledException`, which the `Task.Delay` await in `PollLoopAsync` does not catch (it catches only `OperationCanceledException`).
+
+**Recommendation:** Track each loop `Task` in `SubscriptionState` and await it (with a timeout) in `Unsubscribe`/`DisposeAsync` before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer `Cts.Dispose()` until the loop task has observed cancellation, or wrap the `Task.Delay` await to also tolerate `ObjectDisposedException`.
+
+**Resolution:** Resolved 2026-05-22 — stored the loop `Task` in `SubscriptionState.LoopTask`; `Unsubscribe` calls `StopState` which cancels then awaits the task (5 s timeout) before disposing the CTS; `DisposeAsync` cancels all loops in parallel then awaits them all via `Task.WhenAll` with a 5 s timeout before disposing CTSs, making teardown deterministic and preventing post-disposal callbacks.
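+
+A stripped-down sketch of the teardown ordering. It uses a single loop rather than the engine's per-subscription state, and the counter stands in for `_onChange` callbacks:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+int callbacks = 0;
+var cts = new CancellationTokenSource();
+CancellationToken token = cts.Token;
+
+// Keep the loop Task instead of discarding it.
+Task loopTask = Task.Run(async () =>
+{
+    while (!token.IsCancellationRequested)
+    {
+        Interlocked.Increment(ref callbacks); // stands in for PollOnceAsync + _onChange
+        try { await Task.Delay(TimeSpan.FromMilliseconds(10), token); }
+        catch (OperationCanceledException) { break; }
+    }
+});
+
+await Task.Delay(50);
+
+// Teardown order: cancel, await the loop (bounded by a timeout), then dispose the CTS.
+cts.Cancel();
+await Task.WhenAny(loopTask, Task.Delay(TimeSpan.FromSeconds(5)));
+cts.Dispose();
+
+int atTeardown = Volatile.Read(ref callbacks);
+await Task.Delay(50);
+Console.WriteLine(Volatile.Read(ref callbacks) == atTeardown); // True: no callback after teardown
+```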
+
+### Core.Abstractions-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` |
+| Status | Open |
+
+**Description:** `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` dictionary; the second `Interlocked.Exchange` then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect — but the use of `Interlocked.Exchange` for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.
+
+**Recommendation:** Either guard `Register` with a `lock` so the check-build-swap is atomic, or strengthen the XML `Thread-safety` remark to state explicitly that concurrent `Register` calls are unsupported and only reader/writer concurrency is safe.
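+
+A minimal sketch of the `lock` option for a copy-on-write registry. The `string` factory values and the `Register`/`Lookup` shapes are placeholders for `DriverTypeRegistry`'s real types. Writers serialize the check-build-swap, while readers stay lock-free on the last published snapshot:
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+var writeLock = new object();
+Dictionary<string, string> snapshot = new();
+
+void Register(string driverType, string factory)
+{
+    lock (writeLock) // check, build, and swap happen atomically with respect to other writers
+    {
+        if (snapshot.ContainsKey(driverType))
+            throw new InvalidOperationException($"Driver type '{driverType}' is already registered.");
+        var next = new Dictionary<string, string>(snapshot) { [driverType] = factory };
+        Volatile.Write(ref snapshot, next); // publish; readers never see a half-built dictionary
+    }
+}
+
+string? Lookup(string driverType) =>
+    Volatile.Read(ref snapshot).TryGetValue(driverType, out var found) ? found : null;
+
+Parallel.For(0, 100, i => Register($"Driver{i}", $"Factory{i}"));
+Console.WriteLine(Volatile.Read(ref snapshot).Count); // 100: no registration lost
+Console.WriteLine(Lookup("Driver42"));                 // Factory42
+```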
+
+**Resolution:** _(open)_
+
+### Core.Abstractions-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` |
+| Status | Open |
+
+**Description:** Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver — there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.
+
+**Recommendation:** Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional `Action<Exception>` error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.
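+
+A minimal sketch of the callback option. `onError` and this `PollLoopAsync` signature are hypothetical, not `PollGroupEngine`'s current API:
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Failures are reported through onError instead of vanishing into a bare catch { };
+// only cancellation of the loop's own token ends the loop.
+async Task PollLoopAsync(
+    Func<CancellationToken, Task> pollOnce,
+    Action<Exception>? onError,
+    TimeSpan interval,
+    CancellationToken ct)
+{
+    while (!ct.IsCancellationRequested)
+    {
+        try
+        {
+            await pollOnce(ct);
+        }
+        catch (OperationCanceledException) when (ct.IsCancellationRequested)
+        {
+            return; // shutdown, not a poll failure
+        }
+        catch (Exception ex)
+        {
+            onError?.Invoke(ex); // the driver can record this on its health surface
+        }
+
+        try { await Task.Delay(interval, ct); }
+        catch (OperationCanceledException) { return; }
+    }
+}
+
+// A reader that always fails is now observable.
+var reported = new List<Exception>();
+using var cts = new CancellationTokenSource();
+await PollLoopAsync(
+    _ => throw new TimeoutException("backend unreachable"),
+    ex => { reported.Add(ex); if (reported.Count == 3) cts.Cancel(); },
+    TimeSpan.FromMilliseconds(1),
+    cts.Token);
+Console.WriteLine(reported.Count); // 3
+```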
+
+**Resolution:** _(open)_
+
+### Core.Abstractions-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` |
+| Status | Open |
+
+**Description:** The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfaces) takes `int maxEvents`. The OPC UA `HistoryRead` service request fields are unsigned, and a negative `maxEvents` has no defined meaning. Mixing `int` and `uint` for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.
+
+**Recommendation:** Standardize on `uint` for all max-rows parameters across both `IHistoryProvider` and `IHistorianDataSource` (or document explicitly why `maxEvents` differs).
+
+**Resolution:** _(open)_
+
+### Core.Abstractions-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` |
+| Status | Open |
+
+**Description:** `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The `Reader_exception_does_not_crash_loop` test only covers a reader that throws before producing any result. `DataValueSnapshot` change-detection semantics for reference-typed values are therefore unverified.
+
+**Recommendation:** Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.
+
+**Resolution:** _(open)_
+
+### Core.Abstractions-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` |
+| Status | Open |
+
+**Description:** Two XML-doc inaccuracies:
+
+1. `DriverHealth.LastError` is documented as "Most recent error message; null when state is Healthy." The `DriverState` enum also defines `Degraded`, `Reconnecting`, and `Faulted` states, all of which carry an error; and a driver in `Healthy` state may legitimately retain the last error from a previously-recovered failure. The "null when Healthy" claim is stronger than the type enforces and than callers should rely on.
+2. `IHistoryProvider.ReadAtTimeAsync` and `ReadEventsAsync` are C# default interface methods whose `<remarks>` say "Default implementation throws". This is accurate, but the sibling `IHistorianDataSource` declares the same methods as required (non-default) members — the asymmetry between the two history surfaces is undocumented and could surprise an implementer who assumes parity.
+
+**Recommendation:** Reword `DriverHealth.LastError` to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on `IHistoryProvider`/`IHistorianDataSource` explaining why one surface uses default methods and the other does not.
+
+**Resolution:** _(open)_
@@ -0,0 +1,192 @@
+# Code Review — Core.AlarmHistorian
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 2 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core.AlarmHistorian-001, Core.AlarmHistorian-002 |
+| 2 | OtOpcUa conventions | Core.AlarmHistorian-003 |
+| 3 | Concurrency & thread safety | Core.AlarmHistorian-004, Core.AlarmHistorian-005 |
+| 4 | Error handling & resilience | Core.AlarmHistorian-006, Core.AlarmHistorian-007 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Core.AlarmHistorian-008 |
+| 7 | Design-document adherence | Core.AlarmHistorian-009 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Core.AlarmHistorian-010 |
+| 10 | Documentation & comments | Core.AlarmHistorian-011 |
+
+## Findings
+
+### Core.AlarmHistorian-001
+
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:255-278` |
+| Status | Resolved |
+
+**Description:** `ReadBatch` builds two parallel lists, `rowIds` and `events`, that `DrainOnceAsync` later indexes together (`rowIds[i]` paired with `outcomes[i]`, where `outcomes` is 1:1 with `events`). But `rowIds.Add(reader.GetInt64(0))` runs unconditionally for every row, while `events.Add(evt)` is guarded by `if (evt is not null)`. If `JsonSerializer.Deserialize<AlarmHistorianEvent>` returns `null` for any row (corrupt or empty payload), `rowIds` gains an entry but `events` does not. The writer then returns `outcomes.Count == events.Count`, which passes the `outcomes.Count != events.Count` guard, and the per-row loop applies each outcome to `rowIds[i]` — every row from the skipped index onward is mapped to the wrong event's outcome. An `Ack` can delete a row whose event was never sent to the historian (silent alarm-event data loss), and a `PermanentFail` can dead-letter an unrelated good row. The corrupt row itself is never advanced and is re-read on every drain forever, permanently stalling the queue head.
+
+**Recommendation:** Keep `rowIds` and `events` strictly aligned. Either skip the `rowId` when deserialization returns `null`, or — better — treat a `null`/failed deserialization as an immediate dead-letter for that specific `RowId` (it can never succeed) and exclude it from the batch passed to the writer. Carry the `rowId` inside a single list of `(long RowId, AlarmHistorianEvent Event)` tuples so the two can never drift.
+
+**Resolution:** Resolved 2026-05-22 — `ReadBatch` now returns a single list of `QueueRow(long RowId, AlarmHistorianEvent? Event)` records so a rowId can never drift from its event; `DrainOnceAsync` immediately dead-letters rows whose payload is null/un-deserializable (also catching `JsonException`) and forwards only well-formed events to the writer, mapping outcomes by `liveRows[i].RowId`. Regression tests `Drain_with_corrupt_payload_row_deadletters_it_and_keeps_good_rows_aligned` and `Drain_with_corrupt_head_row_does_not_stall_queue` added.
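+
+A sketch of the aligned drain. It uses tuples in place of `QueueRow` and JSON string payloads in place of `AlarmHistorianEvent` (both simplifications). Every writer outcome is mapped back through `liveRows[i].RowId`:
+
+```csharp
+#nullable enable
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using System.Text.Json;
+
+var queue = new List<(long RowId, string Json)>
+{
+    (1, "\"alarm-A\""),
+    (2, "null"),      // deserializes to null
+    (3, "{not json"), // throws JsonException
+    (4, "\"alarm-B\""),
+};
+
+var deadLettered = new List<long>();
+var liveRows = new List<(long RowId, string Event)>();
+
+foreach (var (rowId, json) in queue)
+{
+    string? evt;
+    try { evt = JsonSerializer.Deserialize<string>(json); }
+    catch (JsonException) { evt = null; }
+
+    if (evt is null) deadLettered.Add(rowId); // can never succeed: dead-letter now
+    else liveRows.Add((rowId, evt));          // rowId travels with its event
+}
+
+// The writer returns one outcome per event it was given.
+string[] outcomes = Enumerable.Repeat("Ack", liveRows.Count).ToArray();
+for (int i = 0; i < outcomes.Length; i++)
+    Console.WriteLine($"row {liveRows[i].RowId} ({liveRows[i].Event}) -> {outcomes[i]}");
+Console.WriteLine($"dead-lettered: {string.Join(",", deadLettered)}");
+// row 1 (alarm-A) -> Ack
+// row 4 (alarm-B) -> Ack
+// dead-lettered: 2,3
+```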
+
+### Core.AlarmHistorian-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:99-105,386-388` |
+| Status | Resolved |
+
+**Description:** The class computes an exponential-backoff value (`_backoffIndex`, `BumpBackoff`, `CurrentBackoff`, the `BackoffLadder`) and the class doc-comment states "Drain runs on a shared `Timer`. Exponential backoff on `RetryPlease`: 1s → 2s → 5s → 15s → 60s cap." However `StartDrainLoop` creates the `Timer` with a fixed `tickInterval` for both due-time and period and never reschedules it. `CurrentBackoff` is computed but never consulted by the timer, so the drain loop keeps hammering the historian at the fixed cadence regardless of `BackingOff` state. The documented backoff behavior does not exist for the production drain path — it is only observable via the `CurrentBackoff` property in tests.
+
+**Recommendation:** Make the drain loop honor the backoff. Either switch to a self-rescheduling one-shot timer that sets its next due-time to `max(tickInterval, CurrentBackoff)` after each `DrainOnceAsync`, or have `DrainOnceAsync` skip the writer call while still inside the backoff window (track `_nextEligibleDrainUtc`). Update the doc-comment if the design intentionally changes.
+
+**Resolution:** Resolved 2026-05-22 — `StartDrainLoop` now arms a self-rescheduling one-shot `Timer`; `RescheduleDrain` sets the next due-time to `max(tickInterval, CurrentBackoff)` while `_drainState` is `BackingOff` so a historian outage genuinely slows the cadence down the ladder. Class doc-comment updated. Regression tests `StartDrainLoop_honors_backoff_and_slows_cadence_under_retry` and `StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy` added.
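+
+The due-time rule can be sketched as a pure function. The ladder values come from the class doc-comment; how the index advances on each `RetryPlease` is an assumption. Each drain callback would then re-arm the one-shot timer with `Timer.Change(nextDue, Timeout.InfiniteTimeSpan)`:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+TimeSpan[] backoffLadder =
+{
+    TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(5),
+    TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
+};
+
+// Next rung after another RetryPlease, capped at the last rung.
+int BumpBackoff(int index) => Math.Min(index + 1, backoffLadder.Length - 1);
+
+// Steady cadence when healthy; max(tickInterval, CurrentBackoff) while backing off.
+TimeSpan NextDueTime(TimeSpan tickInterval, bool backingOff, int backoffIndex)
+{
+    if (!backingOff) return tickInterval;
+    TimeSpan backoff = backoffLadder[backoffIndex];
+    return backoff > tickInterval ? backoff : tickInterval;
+}
+
+var tick = TimeSpan.FromSeconds(2);
+int idx = 0;
+var dueSeconds = new List<double>();
+for (int i = 0; i < 6; i++) // six consecutive RetryPlease outcomes
+{
+    dueSeconds.Add(NextDueTime(tick, backingOff: true, idx).TotalSeconds);
+    idx = BumpBackoff(idx);
+}
+Console.WriteLine(string.Join(", ", dueSeconds));                          // 2, 2, 5, 15, 60, 60
+Console.WriteLine(NextDueTime(tick, backingOff: false, idx).TotalSeconds); // 2: healthy cadence
+```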
+
+### Core.AlarmHistorian-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | OtOpcUa conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` |
+| Status | Resolved |
+
+**Description:** `EnqueueAsync` has an async signature (`Task EnqueueAsync(...)`), and the `IAlarmHistorianSink` contract explicitly states "the sink MUST NOT block the emitting thread … `EnqueueAsync` returns as soon as the queue row is committed." But the implementation performs fully synchronous, blocking SQLite I/O (`conn.Open()`, `EnforceCapacity`, `cmd.ExecuteNonQuery()`) on the caller's thread and only then returns `Task.CompletedTask`. Under SQLite write contention with the drain worker, this blocks the alarm-emitting thread for the full lock wait. The same pattern of synchronous I/O behind an async or status API applies to `GetStatus` (called from the Admin UI / `/healthz` request thread) and `RetryDeadLettered`. The `cancellationToken` parameter of `EnqueueAsync` is accepted and ignored.
+
+**Recommendation:** Either make the I/O genuinely asynchronous (`await conn.OpenAsync(ct)`, `await cmd.ExecuteNonQueryAsync(ct)` — `Microsoft.Data.Sqlite` supports the async surface), or change `EnqueueAsync` to an in-memory hand-off (e.g. a `Channel`) drained by a background writer so the emitting thread truly never touches the database. At minimum honor the `cancellationToken` parameter.
+
+**Resolution:** Resolved 2026-05-22 — `EnqueueAsync` now uses `OpenAsync` / `ExecuteNonQueryAsync` / `ExecuteScalarAsync` throughout (capacity check included); `ApplyPragmasAsync` handles the WAL/busy-timeout PRAGMA on the async path; `cancellationToken` is threaded through every await so cancellation is honoured.
+
+### Core.AlarmHistorian-004
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:90,112,176,259` |
+| Status | Resolved |
+
+**Description:** Every operation opens a brand-new `SqliteConnection` from the bare connection string `Data Source={databasePath}` — no `busy_timeout` / `Pragma`, no shared cache. SQLite serializes writers with a file lock; when `EnqueueAsync` (emitting thread) and `DrainOnceAsync` (timer thread) collide, the loser gets an immediate `SQLITE_BUSY` exception because the default busy timeout is 0. In `DrainOnceAsync` the `BeginTransaction()` / `Commit()` block can fail mid-drain with `SQLITE_BUSY`; the exception escapes the `try` (it is not the writer-call `try`), the `finally` releases the gate, and the row outcomes are lost / partially applied. The class doc-comment claims `DrainOnceAsync` is "Safe to call from multiple threads" but the concurrent enqueue-vs-drain case is not actually safe against busy errors.
+
+**Recommendation:** Configure a non-zero busy timeout — `SqliteConnectionStringBuilder { DataSource = databasePath, DefaultTimeout = 5 }` plus `PRAGMA busy_timeout=5000` on open. Strongly consider WAL journal mode (`PRAGMA journal_mode=WAL`) so readers and the writer do not block each other. Reuse a single long-lived write connection guarded by `_drainGate` rather than opening/closing per call.
+
+**Resolution:** Resolved 2026-05-22 — the connection string is now built via `SqliteConnectionStringBuilder` with `DefaultTimeout = 5`, and every connection is opened through a new `OpenConnection` helper that applies `PRAGMA busy_timeout=5000` and `PRAGMA journal_mode=WAL` so an enqueue/drain lock collision waits the lock out instead of throwing `SQLITE_BUSY`. All eight call sites switched to the helper. Regression test `Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy` added.
+
+### Core.AlarmHistorian-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
+| Status | Resolved |
+
+**Description:** The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin-UI / health-check threads with no synchronization (no `lock`, no `volatile`, no `Interlocked`). `DateTime?` is a multi-word struct that is not guaranteed to be written atomically, so a reader can observe a stale or torn value. This is a diagnostics surface, so the impact is limited, but a torn `DateTime?` read is still a genuine data race.
+
+**Recommendation:** Guard the status fields with a small lock, or make the scalars `volatile` where the type permits and snapshot `DateTime?` values under a lock. Take the snapshot atomically in `GetStatus()`.
+
+**Resolution:** Resolved 2026-05-22 — added `_statusLock` object; all writes to `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_evictedCount` (new) now happen inside `lock (_statusLock)` blocks; `GetStatus()` snapshots all fields atomically under the same lock. Regression test `GetStatus_snapshot_is_consistent_under_concurrent_drain` added.
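+
+A minimal sketch of the pattern using three of the fields. `RecordDrain` is a hypothetical writer. The concurrent check asserts an invariant that a partial snapshot could violate: when there is no error, the last drain was also the last success.
+
+```csharp
+#nullable enable
+using System;
+using System.Threading.Tasks;
+
+var statusLock = new object();
+DateTime? lastDrainUtc = null;
+DateTime? lastSuccessUtc = null;
+string? lastError = null;
+
+// Every write to the status fields happens under the lock.
+void RecordDrain(DateTime nowUtc, string? error)
+{
+    lock (statusLock)
+    {
+        lastDrainUtc = nowUtc;
+        if (error is null) lastSuccessUtc = nowUtc;
+        lastError = error;
+    }
+}
+
+// GetStatus copies all fields under the same lock, so it never mixes two drains.
+(DateTime? LastDrainUtc, DateTime? LastSuccessUtc, string? LastError) GetStatus()
+{
+    lock (statusLock)
+    {
+        return (lastDrainUtc, lastSuccessUtc, lastError);
+    }
+}
+
+var start = DateTime.UtcNow;
+bool consistent = true;
+var writer = Task.Run(() =>
+{
+    for (int i = 0; i < 100_000; i++)
+        RecordDrain(start.AddTicks(i), i % 2 == 0 ? null : "historian unreachable");
+});
+var reader = Task.Run(() =>
+{
+    while (!writer.IsCompleted)
+    {
+        var s = GetStatus();
+        if (s.LastError is null && s.LastSuccessUtc != s.LastDrainUtc) consistent = false;
+    }
+});
+await Task.WhenAll(writer, reader);
+Console.WriteLine(consistent); // True
+```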
+
+### Core.AlarmHistorian-006
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
+| Status | Resolved |
+
+**Description:** `StartDrainLoop` launches the drain with `new Timer(_ => _ = DrainOnceAsync(CancellationToken.None), ...)`. The returned `Task` is discarded (`_ =`), so any exception thrown by `DrainOnceAsync` is an unobserved task exception — never logged, never surfaced. Several paths in `DrainOnceAsync` can throw: the `outcomes.Count != events.Count` guard (`InvalidOperationException`), `JsonSerializer.Deserialize` on a malformed payload, `PurgeAgedDeadLetters` / `ReadBatch` / the commit block hitting `SQLITE_BUSY` or a schema error. When any of these throw, the drain silently stops making progress for that tick, `_drainState` is left stale (still `Draining`), and an operator watching the Admin UI sees no error. A persistently failing condition produces a silent, permanently stalled queue.
+
+**Recommendation:** Wrap the timer callback body in a `try/catch` that logs the exception via `_logger.Error`, records it into `_lastError`, and resets `_drainState` so the diagnostics surface reflects the failure. Do not discard the `Task` without an attached continuation that observes faults.
+
+**Resolution:** Resolved 2026-05-22 — the timer no longer discards the drain `Task`. A dedicated `DrainTimerCallback` `await`s `DrainOnceAsync` inside a `try/catch` that logs the fault via `_logger.Error`, records it into `_lastError`, and sets `_drainState = BackingOff` so the failure is visible on the `GetStatus` surface; a `finally` always re-arms the timer so a faulting tick can never permanently stall the queue. Regression test `StartDrainLoop_records_drain_fault_and_keeps_running` added.
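+
+A Python `asyncio` sketch of the observed-fault callback shape. `drain_once` stands in for `DrainOnceAsync`, and the hypothetical `schedule_next` callable stands in for re-arming the timer.
+
+```python
+import asyncio
+import logging
+from typing import Awaitable, Callable, Optional
+
+log = logging.getLogger("alarm_historian")
+
+
+class DrainLoop:
+    def __init__(self, drain_once: Callable[[], Awaitable[None]],
+                 schedule_next: Callable[[], None]) -> None:
+        self._drain_once = drain_once        # stands in for DrainOnceAsync
+        self._schedule_next = schedule_next  # stands in for re-arming the Timer
+        self.last_error: Optional[str] = None
+        self.drain_state = "Idle"
+
+    async def drain_timer_callback(self) -> None:
+        try:
+            # Awaited rather than discarded, so a fault is observed here.
+            await self._drain_once()
+        except Exception as ex:
+            log.error("Alarm historian drain tick failed: %s", ex)
+            self.last_error = str(ex)
+            self.drain_state = "BackingOff"
+        finally:
+            # Always re-arm, so one faulting tick cannot stall the queue.
+            self._schedule_next()
+```
+
+Catching `Exception` rather than `BaseException` lets `asyncio.CancelledError` propagate, so shutdown cancellation is not mistaken for a drain fault.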
+
+### Core.AlarmHistorian-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` |
+| Status | Resolved |
+
+**Description:** When the writer returns a wrong-cardinality result, the code throws `InvalidOperationException` after `WriteBatchAsync` has already succeeded. The events may already have been delivered to the historian, but no rows are deleted or dead-lettered, `_drainState` is left at `Draining`, and the backoff is not bumped. Combined with Core.AlarmHistorian-006, the exception is then swallowed. On the next drain the same batch is re-sent: if the writer actually delivered the events the first time, this produces duplicate historian rows; if it is a deterministic writer bug, the queue stalls forever.
+
+**Recommendation:** Treat a cardinality mismatch as a transient batch failure: log it, set `_lastError`, bump backoff, set `_drainState = BackingOff`, and return without throwing — mirroring the writer-exception path at lines 162-170. A deterministic writer contract violation should also raise an operator-visible alert rather than silently looping.
+
+**Resolution:** Resolved 2026-05-22 — the `throw InvalidOperationException` was replaced with log-and-backoff: the mismatch is recorded into `_lastError`, `_drainState` is set to `BackingOff`, the backoff is bumped, and the method returns without applying any outcomes, so the rows stay queued for the next drain attempt. Regression test `Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows` added.
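+
+A Python sketch of the log-and-backoff branch. It assumes simplified outcome strings and a hypothetical `DrainBatchState` holder; only `Ack` rows are removed, and the real Retry/PermanentFail handling is elided.
+
+```python
+from dataclasses import dataclass, field
+from typing import List, Optional
+
+
+@dataclass
+class DrainBatchState:
+    queued_row_ids: List[int]
+    last_error: Optional[str] = None
+    drain_state: str = "Draining"
+    backoff_index: int = 0
+    acked: List[int] = field(default_factory=list)
+
+
+def apply_outcomes(state: DrainBatchState, row_ids: List[int], outcomes: List[str],
+                   max_backoff_index: int = 5) -> bool:
+    """Return True if outcomes were applied, False if the batch was left queued."""
+    if len(outcomes) != len(row_ids):
+        # Writer contract violation: record it, back off, and apply nothing.
+        state.last_error = f"writer returned {len(outcomes)} outcomes for {len(row_ids)} events"
+        state.drain_state = "BackingOff"
+        state.backoff_index = min(state.backoff_index + 1, max_backoff_index)
+        return False
+    for row_id, outcome in zip(row_ids, outcomes):
+        if outcome == "Ack":  # Retry / PermanentFail handling elided in this sketch
+            state.acked.append(row_id)
+            state.queued_row_ids.remove(row_id)
+    state.drain_state = "Idle"
+    return True
+```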
+
+### Core.AlarmHistorian-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` |
+| Status | Open |
+
+**Description:** Each `EnqueueAsync` (one per alarm transition — a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The unconditional `COUNT(*)` on every enqueue is an avoidable scan; the open/close churn defeats connection pooling benefits and adds lock-acquisition overhead per event. `DrainOnceAsync` similarly opens three separate connections per tick (`PurgeAgedDeadLetters`, `ReadBatch`, the transaction block).
+
+**Recommendation:** Reuse a single pooled write connection. Replace the per-enqueue `COUNT(*)` with a periodic capacity check (every Nth enqueue, or piggy-backed on the drain tick), or maintain an in-memory approximate counter. Combine the drain-tick connections into one.
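+
+One way to realise the "every Nth enqueue" option, sketched in Python with the standard-library `sqlite3` module. The `CappedQueue` class, the table layout, and the `check_every` parameter are hypothetical, and the real sink's exclusion of dead-lettered rows from eviction is omitted.
+
+```python
+import sqlite3
+
+
+class CappedQueue:
+    def __init__(self, conn: sqlite3.Connection, capacity: int, check_every: int = 100) -> None:
+        self._conn = conn
+        self._capacity = capacity
+        self._check_every = check_every
+        self._since_check = 0
+        self.evicted = 0
+        conn.execute("CREATE TABLE IF NOT EXISTS queue "
+                     "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)")
+
+    def enqueue(self, payload: str) -> None:
+        with self._conn:  # commits the insert
+            self._conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
+        # Only pay for COUNT(*) on every Nth enqueue instead of on each one.
+        self._since_check += 1
+        if self._since_check >= self._check_every:
+            self._since_check = 0
+            self._enforce_capacity()
+
+    def _enforce_capacity(self) -> None:
+        (count,) = self._conn.execute("SELECT COUNT(*) FROM queue").fetchone()
+        overflow = count - self._capacity
+        if overflow > 0:
+            with self._conn:  # evict the oldest rows first
+                self._conn.execute(
+                    "DELETE FROM queue WHERE id IN (SELECT id FROM queue ORDER BY id LIMIT ?)",
+                    (overflow,))
+            self.evicted += overflow
+```
+
+The trade-off is that the queue can overshoot `capacity` by up to `check_every - 1` rows between checks, which is usually acceptable for a one-million-row bound.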
+
+**Resolution:** _(open)_
+
+### Core.AlarmHistorian-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` |
+| Status | Resolved |
+
+**Description:** `docs/AlarmTracking.md` and the `IAlarmHistorianSink` contract present the SQLite queue as the durability guarantee — "Durably enqueue the event", "operator acks never block on the historian being reachable". But `EnforceCapacity` silently deletes the oldest non-dead-lettered (not-yet-sent) rows when the queue reaches `DefaultCapacity` (1,000,000). Those are alarm-event records that were accepted as durably queued and are then dropped before ever reaching the historian — silent alarm-history data loss under sustained historian outage. The only signal is a `WARN` log line. Neither `docs/AlarmTracking.md` nor the sink's XML doc mentions that the durability guarantee is bounded, and there is no metric/dead-letter trail for evicted rows.
+
+**Recommendation:** At minimum document the bounded-durability behavior in `docs/AlarmTracking.md` and the `IAlarmHistorianSink` summary. Better: surface evicted-row counts in `HistorianSinkStatus` (a dedicated counter) so the loss is operator-visible, and consider routing overflow to the dead-letter table instead of hard-deleting it so the records survive for post-mortem within the retention window.
+
+**Resolution:** Resolved 2026-05-22 — added `EvictedCount` (default 0) to `HistorianSinkStatus` with full param-tag documentation; `EnforceCapacity` and `EnforceCapacityAsync` now increment `_evictedCount` (guarded by `_statusLock`) and include the lifetime total in the WARN log; `docs/AlarmTracking.md` documents the bounded-durability caveat and the `EvictedCount` surface. Regression test `Capacity_eviction_increments_evicted_count_on_status` added.
+
+### Core.AlarmHistorian-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` |
+| Status | Resolved |
+
+**Description:** The test suite covers the happy paths well (Ack/Retry/PermanentFail, capacity eviction, retention purge, ctor validation) but leaves critical paths untested: (a) no test exercises a corrupt / `null`-deserializing `PayloadJson` row, so the `rowIds`/`events` misalignment bug (Core.AlarmHistorian-001) was not caught; (b) no test for `StartDrainLoop` actually running on the timer, nor for the backoff being honored by the schedule (Core.AlarmHistorian-002); (c) no concurrency test running `EnqueueAsync` and `DrainOnceAsync` in parallel, which is the exact scenario that triggers `SQLITE_BUSY` (Core.AlarmHistorian-004); (d) no test for the `outcomes.Count != events.Count` cardinality-mismatch branch (Core.AlarmHistorian-007).
+
+**Recommendation:** Add tests for: a corrupt payload row (insert raw bad JSON via a direct SQLite write, then drain and assert the correct row is dead-lettered and others are unaffected); a `FakeWriter` returning a wrong-length outcome list; a parallel enqueue/drain stress test; and the timer-driven `StartDrainLoop` path.
+
+**Resolution:** Resolved 2026-05-22 — (a) `Drain_with_corrupt_payload_row_deadletters_it_and_keeps_good_rows_aligned` and `Drain_with_corrupt_head_row_does_not_stall_queue` cover corrupt payloads; (b) `StartDrainLoop_honors_backoff_and_slows_cadence_under_retry`, `StartDrainLoop_keeps_steady_cadence_when_writer_is_healthy`, and `StartDrainLoop_records_drain_fault_and_keeps_running` cover the timer-driven path; (c) `Concurrent_enqueue_and_drain_do_not_throw_sqlite_busy` covers the concurrent stress path; (d) `Writer_returning_wrong_cardinality_outcomes_sets_backing_off_and_keeps_rows` covers the cardinality-mismatch branch. Additionally `Capacity_eviction_increments_evicted_count_on_status` and `GetStatus_snapshot_is_consistent_under_concurrent_drain` cover -009 and -005 respectively.
+
+### Core.AlarmHistorian-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` |
+| Status | Open |
+
+**Description:** Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `docs/AlarmTracking.md` and `CLAUDE.md` state the legacy `Galaxy.Host` project was retired in PR 7.2 and the write path is now the Wonderware historian sidecar (`WonderwareHistorianClient`). `AlarmHistorianEvent.cs:20` likewise says "the Galaxy.Host handler maps to the historian's enum on the wire." These stale references will mislead a reader about where the writer is actually hosted.
+
+**Recommendation:** Update the doc-comments to refer to the Wonderware historian sidecar / `WonderwareHistorianClient` (`IAlarmHistorianWriter` implementation) instead of `Galaxy.Host`, consistent with `docs/AlarmTracking.md`'s "Historian write-back" section.
+
+**Resolution:** _(open)_
@@ -0,0 +1,210 @@
+# Code Review — Core.ScriptedAlarms
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core.ScriptedAlarms-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Core.ScriptedAlarms-001, Core.ScriptedAlarms-004, Core.ScriptedAlarms-005, Core.ScriptedAlarms-006 |
+| 4 | Error handling & resilience | Core.ScriptedAlarms-007 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Core.ScriptedAlarms-008, Core.ScriptedAlarms-009 |
+| 7 | Design-document adherence | Core.ScriptedAlarms-010 |
+| 8 | Code organization & conventions | Core.ScriptedAlarms-011 |
+| 9 | Testing coverage | Core.ScriptedAlarms-012 |
+| 10 | Documentation & comments | Core.ScriptedAlarms-003 |
+
+## Findings
+
+### Core.ScriptedAlarms-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
+| Status | Resolved |
+
+**Description:** `_alarms` is a plain `Dictionary<string, AlarmState>` (line 42). Every mutation of it (`LoadAsync`, `ApplyAsync`, `ReevaluateAsync`, `ShelvingCheckAsync`) correctly happens under the `_evalGate` semaphore, but four read paths touch it with no synchronisation: `GetState` (line 175), `GetAllStates` (lines 178-179), the `LoadedAlarmIds` property (line 73), and `RunShelvingCheck` (line 368, `_alarms.Keys.ToArray()`). `RunShelvingCheck` fires from a `Timer` thread-pool callback and can run concurrently with an `ApplyAsync`/`ReevaluateAsync` that is reassigning a dictionary entry. `Dictionary` supports concurrent readers only while nothing modifies it; a read that races a write (an entry reassignment, or the `Clear()` and inserts in `LoadAsync`, which can resize the table) is unsupported and can throw `InvalidOperationException` or return inconsistent state. `GetState`/`GetAllStates` are documented as being used by the Admin UI status page, so these reads come from arbitrary request threads.
+
+**Recommendation:** Either switch `_alarms` to `ConcurrentDictionary<string, AlarmState>` (entry reassignment via `_alarms[id] = ...` is already the only write shape, which a `ConcurrentDictionary` supports atomically), or acquire `_evalGate` in every reader. A `ConcurrentDictionary` is the lighter change and matches `_valueCache`, which is already concurrent.
+
+**Resolution:** Resolved 2026-05-22 — switched `_alarms` to `ConcurrentDictionary<string, AlarmState>` so the four unguarded read paths are safe against concurrent under-gate entry reassignment; added a concurrency regression test.
+
+### Core.ScriptedAlarms-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
+| Status | Resolved |
+
+**Description:** `LoadAsync` is written to be re-callable — it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it unconditionally assigns `_shelvingTimer = new Timer(...)` without disposing the timer created by a previous `LoadAsync` call. A second `LoadAsync` therefore leaks the old `Timer` and leaves two timers running concurrently against the same `_alarms`/`_evalGate`. The old timer's `RunShelvingCheck` keeps firing forever.
+
+**Recommendation:** Dispose any existing `_shelvingTimer` before reassigning it, e.g. `_shelvingTimer?.Dispose();` immediately before line 162, inside the `_evalGate` critical section. If reload is genuinely not supported, instead guard `LoadAsync` against a second call and document it as one-shot.
+
+**Resolution:** Resolved 2026-05-22 — added `_shelvingTimer?.Dispose()` before the timer reassignment in `LoadAsync` so a second load call does not leak the previous timer.
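+
+A Python sketch of the dispose-before-reassign fix, with `threading.Timer` standing in for `System.Threading.Timer`. Note that `threading.Timer` is one-shot whereas the .NET timer is periodic; the class and method names are hypothetical.
+
+```python
+import threading
+from typing import Optional
+
+
+class ShelvingTimerOwner:
+    def __init__(self, interval_s: float = 5.0) -> None:
+        self._interval_s = interval_s
+        self._eval_gate = threading.Lock()
+        self._shelving_timer: Optional[threading.Timer] = None
+
+    def _run_shelving_check(self) -> None:
+        pass  # placeholder for RunShelvingCheck
+
+    def load(self) -> None:
+        with self._eval_gate:
+            # Stop the timer from any previous load before replacing it,
+            # so a reload never leaves two timers running.
+            if self._shelving_timer is not None:
+                self._shelving_timer.cancel()
+            self._shelving_timer = threading.Timer(self._interval_s, self._run_shelving_check)
+            self._shelving_timer.daemon = True
+            self._shelving_timer.start()
+
+    def dispose(self) -> None:
+        with self._eval_gate:
+            if self._shelving_timer is not None:
+                self._shelving_timer.cancel()
+```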
+
+### Core.ScriptedAlarms-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` |
+| Status | Open |
+
+**Description:** `docs/ScriptedAlarms.md` (Composition step 3) and the `OnUpstreamChange` comment ("Fire-and-forget so driver-side dispatch isn't blocked", lines 225-226) describe the `OnEvent` emission path as non-blocking / fire-and-forget. In the code, `EmitEvent` invokes `OnEvent?.Invoke(this, evt)` **synchronously while `_evalGate` is held** (called from `EvaluatePredicateToStateAsync` line 305 and `ApplyAsync` line 217, both inside the gate). A slow subscriber blocks the single evaluation gate for all alarms; a subscriber that re-enters the engine (e.g. calls `AcknowledgeAsync`) deadlocks because `_evalGate` is a non-reentrant `SemaphoreSlim(1,1)`. The behaviour is defensible (the historian sink is non-blocking, per the doc), but the comments/doc are misleading about where the work happens and the re-entrancy hazard is undocumented.
+
+**Recommendation:** Either move `EmitEvent` outside the `_evalGate` critical section (collect emissions during the locked section and raise them after `Release()`), or document explicitly on `OnEvent` that handlers run under the engine lock, must be fast, and must never call back into the engine.
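+
+A Python sketch of the first option (collect under the gate, raise after release). It assumes a plain non-reentrant `threading.Lock` in place of `SemaphoreSlim(1,1)` and hypothetical event strings. A handler that calls back into the engine would deadlock if events were raised while the lock was held; with deferred emission it completes.
+
+```python
+import threading
+from typing import Callable, List
+
+
+class AlarmEngineSketch:
+    def __init__(self) -> None:
+        self._eval_gate = threading.Lock()  # non-reentrant, like SemaphoreSlim(1, 1)
+        self._acked: set = set()
+        self.on_event: List[Callable[["AlarmEngineSketch", str], None]] = []
+
+    def acknowledge(self, alarm_id: str) -> None:
+        pending: List[str] = []
+        with self._eval_gate:
+            if alarm_id not in self._acked:
+                self._acked.add(alarm_id)
+                pending.append(f"{alarm_id}:Acknowledged")  # collect, do not raise yet
+        # Gate released: handlers may now be slow or call back into the engine.
+        for evt in pending:
+            self._emit(evt)
+
+    def _emit(self, evt: str) -> None:
+        for handler in list(self.on_event):
+            try:
+                handler(self, evt)
+            except Exception:
+                pass  # isolate faulty subscribers; real code would log here
+```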
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
+| Status | Resolved |
+
+**Description:** During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common pattern, and the `ITagUpstreamSource` contract does not forbid it), the observer callback `OnUpstreamChange` runs on the calling thread, schedules `ReevaluateAsync`, which calls `_evalGate.WaitAsync`. That does not deadlock (the re-evaluation task simply waits asynchronously until `LoadAsync` releases the gate), but it can cause a re-evaluation to run against a half-initialised `_alarms`/index, and the value written to `_valueCache` on line 141 may be immediately overwritten by the subscription's synchronous push with no defined ordering. The cold-start guard partly masks this, but the ordering between the seed read (line 141) and the subscription push is unspecified and may seed a stale value.
+
+**Recommendation:** Subscribe to all upstream tags after the seed reads and after `_loaded = true`, or capture the subscription's first push into the cache and treat `SubscribeTag` as the single source of truth (drop the separate `ReadTag` seed). Document the expected `ITagUpstreamSource` delivery semantics (does `SubscribeTag` push an initial value?).
+
+**Resolution:** Resolved 2026-05-22 — split the seed/subscribe loop: `ReadTag` seeds `_valueCache`, persisted-state restore runs, `_loaded = true` is set, then `SubscribeTag` is called; any synchronous initial push now arrives after `_alarms` is fully initialised and correctly queues behind the gate.
+
+### Core.ScriptedAlarms-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
+| Status | Resolved |
+
+**Description:** `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran, then proceed into `ShelvingCheckAsync`, which awaits `_evalGate` and mutates `_alarms` — concurrently with `Dispose`'s `_alarms.Clear()` at line 422 (which runs outside `_evalGate`). `Timer.Dispose()` does not wait for the running callback to finish. The result is a possible `InvalidOperationException` from a dictionary mutated during enumeration, or a save of stale state to the store after dispose. The same applies to a `ReevaluateAsync` in flight from a late upstream push.
+
+**Recommendation:** Use `Timer.Dispose(WaitHandle)` (or `DisposeAsync`) to wait for the callback to drain, and perform `_alarms.Clear()` under `_evalGate` (or simply drop the clear — the object is being discarded). Also have `ShelvingCheckAsync`/`ReevaluateAsync` re-check `_disposed` after acquiring the gate before mutating/saving.
+
+**Resolution:** Resolved 2026-05-22 — added `_disposed` re-checks in `ReevaluateAsync` and `ShelvingCheckAsync` after acquiring `_evalGate` so late callbacks bail out cleanly; dropped the unsynchronised `_alarms.Clear()` from `Dispose` since the object is being discarded and the clear raced concurrent reads.
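+
+A Python `asyncio` sketch of the re-check-after-acquire pattern. `Reevaluator` and its `saved` list are hypothetical stand-ins for the engine's state mutation and `SaveAsync` call.
+
+```python
+import asyncio
+from typing import List
+
+
+class Reevaluator:
+    def __init__(self) -> None:
+        self._eval_gate = asyncio.Lock()
+        self._disposed = False
+        self.saved: List[str] = []
+
+    async def reevaluate(self, alarm_id: str) -> None:
+        if self._disposed:  # cheap early exit
+            return
+        async with self._eval_gate:
+            # Re-check after acquiring: dispose may have run while we waited.
+            if self._disposed:
+                return
+            self.saved.append(alarm_id)  # stands in for mutating state and SaveAsync
+
+    def dispose(self) -> None:
+        self._disposed = True
+```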
+
+### Core.ScriptedAlarms-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` |
+| Status | Open |
+
+**Description:** `OnUpstreamChange` and `RunShelvingCheck` both launch fire-and-forget tasks (`_ = ReevaluateAsync(...)`, `_ = ShelvingCheckAsync(...)`) with `CancellationToken.None`. There is no tracking of these in-flight tasks, so `Dispose` cannot await them and a server shutdown can race a still-running re-evaluation that writes to the (possibly disposed) store. Combined with finding 005, an upstream push arriving during shutdown produces an unobserved background task touching torn state.
+
+**Recommendation:** Track outstanding background tasks (or use a single serialised worker / `Channel`), and link them to a `CancellationTokenSource` that `Dispose` cancels and drains. At minimum, await the in-flight work in `Dispose`.
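+
+A Python `asyncio` sketch of tracking fire-and-forget work so that disposal can cancel and drain it. The `BackgroundWork` name is hypothetical, and task cancellation stands in for a linked `CancellationTokenSource`.
+
+```python
+import asyncio
+from typing import Coroutine, Set
+
+
+class BackgroundWork:
+    def __init__(self) -> None:
+        self._tasks: Set[asyncio.Task] = set()
+        self._closed = False
+
+    def fire(self, coro: Coroutine) -> None:
+        if self._closed:
+            coro.close()  # refuse new work after dispose without leaking the coroutine
+            return
+        task = asyncio.get_running_loop().create_task(coro)
+        self._tasks.add(task)                        # strong reference while it runs
+        task.add_done_callback(self._tasks.discard)  # forget it once finished
+
+    async def dispose_async(self) -> None:
+        self._closed = True
+        pending = list(self._tasks)
+        for task in pending:
+            task.cancel()
+        # Wait until every in-flight task has observed cancellation.
+        await asyncio.gather(*pending, return_exceptions=True)
+```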
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
+| Status | Resolved |
+
+**Description:** Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws — transient SQL outage, deadlock, timeout — the exception propagates: in `ApplyAsync` it surfaces to the Part 9 method caller *after* the in-memory `_alarms` entry was already updated (line 215 runs before the save on line 216), leaving the in-memory state and the persisted state divergent; in `ReevaluateAsync`/`ShelvingCheckAsync` it is caught and logged, but again the in-memory `_alarms` entry was already advanced (lines 250/386) so the persisted store silently falls behind the live state. After a restart, startup recovery reloads the stale persisted state and operators can see a re-raised or re-ackable alarm. The docs claim "the store's view is always consistent with the in-memory state" (`docs/ScriptedAlarms.md` State persistence) — that invariant is not actually enforced.
+
+**Recommendation:** Save before committing the in-memory update, or roll back the in-memory entry if `SaveAsync` fails, so the two never diverge. Classify transient store failures and retry, and surface a hard error/health-degraded signal if persistence is permanently failing rather than silently logging and continuing.
+
+**Resolution:** Resolved 2026-05-22 — reordered `SaveAsync`/`_alarms[id]=` in `ApplyAsync`, `ReevaluateAsync`, and `ShelvingCheckAsync` so persistence happens before the in-memory update; a store failure now leaves both views at the prior state rather than diverging.
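+
+A Python `asyncio` sketch of the persist-first ordering. The `store.save` coroutine and the state strings are hypothetical. If the save raises, the in-memory dictionary still holds the prior state, so the two views cannot diverge in that direction.
+
+```python
+import asyncio
+from typing import Dict
+
+
+class PersistFirstEngine:
+    def __init__(self, store) -> None:
+        self._store = store  # hypothetical object with: async def save(alarm_id, state)
+        self._alarms: Dict[str, str] = {}
+        self._eval_gate = asyncio.Lock()
+
+    async def apply(self, alarm_id: str, new_state: str) -> None:
+        async with self._eval_gate:
+            # Persist first; if this raises, the in-memory entry is left untouched.
+            await self._store.save(alarm_id, new_state)
+            self._alarms[alarm_id] = new_state
+```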
+
+### Core.ScriptedAlarms-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `Part9StateMachine.cs:261-268` |
+| Status | Open |
+
+**Description:** `AppendComment` copies the entire existing comment list into a new `List` on every audit-producing transition (ack, confirm, shelve, unshelve, enable, disable, add-comment, auto-unshelve). The `Comments` list is append-only and unbounded — for a long-lived alarm that is acknowledged/commented hundreds of times, every transition is an O(n) copy and the full history is also re-serialised to the store on every `SaveAsync`. Over a multi-month uptime this is a slowly growing per-transition cost.
+
+**Recommendation:** Acceptable for now given audit requirements, but consider an immutable persistent list / `ImmutableList<AlarmComment>` to make append O(log n), or have the store persist comments incrementally (append-only audit table) rather than rewriting the whole collection each save. At minimum, note the unbounded-growth characteristic in the design doc.
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
+| Status | Open |
+
+**Description:** `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. `AlarmPredicateContext` is also newly constructed each evaluation (line 281).
+
+**Recommendation:** Minor. If the evaluation path shows up in allocation profiling, the read cache could be a reused per-alarm buffer cleared between evaluations (evaluations are already serialised under `_evalGate`, so a single shared scratch dictionary is safe). Not worth doing speculatively — flag for the perf surface in `docs/v2/Galaxy.Performance.md` if alarm evaluation is ever soak-tested.
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` |
+| Status | Open |
+
+**Description:** Quality handling is inconsistent across the three places that inspect a `DataValueSnapshot.StatusCode`. `AreInputsReady` (engine, line 333) treats only outright Bad (bit 31) as not-ready, so an Uncertain-quality input is fed to the predicate. `MessageTemplate.Resolve` (line 47) rejects *any* non-zero status code — including Uncertain — and renders `{?}`. `AlarmPredicateContext.GetTag` returns `BadNodeIdUnknown` (`0x80340000`) for a missing path. The net effect: an Uncertain-quality tag is considered good enough to drive an alarm *activation* decision but not good enough to print in the alarm *message*. `docs/ScriptedAlarms.md` ("Fallback rules") only documents the message-template behaviour and does not mention that predicate evaluation accepts Uncertain. The two policies should be reconciled and documented.
+
+**Recommendation:** Decide one quality policy for "is this input usable" and apply it in both `AreInputsReady` and the message resolver, or explicitly document why predicate evaluation and message rendering treat Uncertain differently. Add the predicate-side Uncertain rule to `docs/ScriptedAlarms.md`.
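+
+A Python sketch of a single shared quality rule based on the OPC UA StatusCode severity bits (top two bits: `00` Good, `01` Uncertain, `10` Bad, `11` reserved). The `accept_uncertain` switch is hypothetical and only marks where the policy decision would live.
+
+```python
+GOOD, UNCERTAIN, BAD = "Good", "Uncertain", "Bad"
+
+
+def severity(status_code: int) -> str:
+    # Severity lives in the top two bits of the 32-bit StatusCode.
+    top_bits = (status_code >> 30) & 0b11
+    if top_bits == 0b00:
+        return GOOD
+    if top_bits == 0b01:
+        return UNCERTAIN
+    return BAD  # 0b10 is Bad; 0b11 is reserved and treated as Bad here
+
+
+def is_usable(status_code: int, accept_uncertain: bool = True) -> bool:
+    """One rule shared by predicate inputs and message-template values."""
+    s = severity(status_code)
+    return s == GOOD or (accept_uncertain and s == UNCERTAIN)
+```
+
+Unlike the current `MessageTemplate.Resolve` check, this rule also treats Good codes with non-zero sub-code bits as usable.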
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `Part9StateMachine.cs:275` |
+| Status | Open |
+
+**Description:** `TransitionResult.NoOp(state, reason)` takes a `reason` string parameter that is documented in the calling code as a diagnostic ("disabled — predicate result ignored", "already acknowledged", etc.) but the factory method silently discards it — it just returns `new(state, EmissionKind.None)`, identical to `None(state)`. Every call site that passes a carefully-worded reason string is doing dead work, and the comments in `Part9StateMachine` and the class-level remarks claim disabled/no-op transitions "produce ... a diagnostic log line", which they do not.
+
+**Recommendation:** Either propagate the reason (add it to `TransitionResult` and have the engine log it at debug level when emission is `None` for a no-op), or remove the unused `reason` parameter and collapse `NoOp` into `None`. Update the `Part9StateMachine` remarks that promise a diagnostic log line.
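+
+A Python sketch of the first option (propagate the reason). The names loosely mirror `TransitionResult`, `EmissionKind`, `None`, and `NoOp`; the `ACTIVATED` member and the `handle` function are hypothetical.
+
+```python
+import logging
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional
+
+log = logging.getLogger("scripted_alarms")
+
+
+class EmissionKind(Enum):
+    NONE = 0
+    ACTIVATED = 1
+
+
+@dataclass(frozen=True)
+class TransitionResult:
+    state: str
+    emission: EmissionKind
+    no_op_reason: Optional[str] = None
+
+    @staticmethod
+    def none(state: str) -> "TransitionResult":
+        return TransitionResult(state, EmissionKind.NONE)
+
+    @staticmethod
+    def no_op(state: str, reason: str) -> "TransitionResult":
+        # Keep the diagnostic reason instead of discarding it.
+        return TransitionResult(state, EmissionKind.NONE, reason)
+
+
+def handle(result: TransitionResult) -> None:
+    # The engine logs the reason at debug level when nothing is emitted.
+    if result.emission is EmissionKind.NONE and result.no_op_reason:
+        log.debug("No-op transition: %s", result.no_op_reason)
+```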
+
+**Resolution:** _(open)_
+
+### Core.ScriptedAlarms-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
+| Status | Resolved |
+
+**Description:** Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine* — only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEngine` driving the timer with an injectable clock; (2) `ConfirmAsync`, `TimedShelveAsync`, `UnshelveAsync`, `EnableAsync` engine methods (only `Acknowledge`, `OneShotShelve`, `Disable`, `AddComment` are exercised); (3) `OnEvent` subscriber-throws isolation (`EmitEvent` catch on line 357); (4) `IAlarmStateStore.SaveAsync` failure handling (finding 007); (5) re-entrant `LoadAsync` and the timer leak (finding 002); (6) the cold-start `AreInputsReady` guard with Bad / null / Uncertain inputs. The `clock` and `scriptTimeout` constructor parameters exist specifically to make timer/timeout tests deterministic but no test uses them.
+
+**Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.
+
+**Resolution:** Resolved 2026-05-22 — added 8 new engine-level tests covering all 6 gap areas: injectable-clock timed-shelve expiry via `RunShelvingCheckForTest`, `ConfirmAsync`/`TimedShelveAsync`/`UnshelveAsync`/`EnableAsync` end-to-end, subscriber-exception isolation, store-failure invariant, second-`LoadAsync` timer-leak regression, and `AreInputsReady` Bad/Uncertain guard; exposed `RunShelvingCheckForTest()` internal hook on the engine.
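+
+A Python sketch of why an injectable clock makes timed-shelve expiry deterministic. `ShelvingEngine` and its method names are hypothetical, loosely modelled on the `RunShelvingCheckForTest()` hook.
+
+```python
+from dataclasses import dataclass
+from datetime import datetime, timedelta
+from typing import Callable, Dict, Optional
+
+
+@dataclass
+class ShelveState:
+    shelved: bool = False
+    unshelve_at: Optional[datetime] = None
+
+
+class ShelvingEngine:
+    def __init__(self, clock: Callable[[], datetime]) -> None:
+        self._clock = clock  # injected so a test controls "now" instead of the wall clock
+        self.alarms: Dict[str, ShelveState] = {}
+
+    def timed_shelve(self, alarm_id: str, duration: timedelta) -> None:
+        self.alarms[alarm_id] = ShelveState(True, self._clock() + duration)
+
+    def run_shelving_check_for_test(self) -> None:
+        # The same check the periodic timer would run, callable directly from a test.
+        now = self._clock()
+        for state in self.alarms.values():
+            if state.shelved and state.unshelve_at is not None and now >= state.unshelve_at:
+                state.shelved = False
+                state.unshelve_at = None
+```
+
+A test advances the fake clock past `unshelve_at` and calls the check directly, with no real 5-second wait.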
@@ -0,0 +1,338 @@
+# Code Review — Core.Scripting
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Core.Scripting-006 |
+| 4 | Error handling & resilience | Core.Scripting-007 |
+| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 |
+| 6 | Performance & resource management | Core.Scripting-008 |
+| 7 | Design-document adherence | Core.Scripting-009 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 |
+| 10 | Documentation & comments | No issues found |
+
+## Findings
+
+### Core.Scripting-001
+
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Security |
+| Location | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` |
+| Status | Resolved |
+
+**Description:** `System.Environment` lives in the allowed `System` namespace (it is in
+`System.Private.CoreLib`, which is allow-listed for primitives) and is not on the
+forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling
+`System.Environment.Exit(0)` or `System.Environment.FailFast("...")`. Both terminate the
+host process immediately. Because scripted-alarm predicates and virtual-tag scripts run
+in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10
+server process"), a single malicious or buggy predicate brings down the entire server —
+an outage affecting every connected client and every driver. `ScriptSandboxTests` only
+pins the *read* path (`Environment.GetEnvironmentVariable`) as an accepted compromise; the
+process-killing members are not considered. The whole-process kill far exceeds the
+"read-only process state" justification the test comments rely on.
+
+**Recommendation:** The forbidden surface must be member-granular, not namespace-granular,
+for types in allowed namespaces. Add an explicit forbidden-member deny-list to
+`ForbiddenTypeAnalyzer` covering at minimum `System.Environment.Exit`,
+`System.Environment.FailFast`, `System.AppDomain`, `System.GC` (e.g. `GC.Collect`,
+`GC.AddMemoryPressure`), and `System.Activator.CreateInstance` (a reflection-equivalent
+escape). Reject these in `CheckSymbol` by resolved method symbol, with a test for each.
+
+**Resolution:** Resolved 2026-05-22 — added a type-granular `ForbiddenFullTypeNames`
+deny-list (`System.Environment`, `System.AppDomain`, `System.GC`, `System.Activator`) to
+`ForbiddenTypeAnalyzer`; `CheckSymbol` now rejects any resolved type symbol whose
+fully-qualified name matches, alongside the existing namespace-prefix check, so dangerous
+`System`-namespace process-control types are blocked at compile while legitimate `System`
+types (Math, String, …) stay usable. Regression tests added in `ScriptSandboxTests` for
+`Environment.Exit` / `Environment.FailFast` / `Environment.GetEnvironmentVariable` /
+`AppDomain` / `GC.Collect` / `Activator.CreateInstance`.
+
+### Core.Scripting-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `ForbiddenTypeAnalyzer.cs:70` |
+| Status | Resolved |
+
+**Description:** The syntax walker only inspects four node kinds:
+`ObjectCreationExpressionSyntax`, `InvocationExpressionSyntax` with a member-access target,
+`MemberAccessExpressionSyntax`, and bare `IdentifierNameSyntax`. It never visits
+`TypeOfExpressionSyntax`, generic type-argument lists (`GenericNameSyntax` /
+`TypeArgumentListSyntax`), cast expressions (`CastExpressionSyntax`), `is`/`as` type
+patterns, `default(T)` expressions, array-creation element types, or the declared types of
+`using` and local-variable declarations. A script such as `typeof(System.IO.File)`,
+`new System.Collections.Generic.List<System.IO.FileInfo>()`,
+`(System.IO.Stream)null`, or `default(System.Reflection.Assembly)` references a forbidden
+type without ever producing a node the walker examines, so the forbidden-type check is
+bypassed. The Phase 7 plan A.6 explicitly calls out `typeof` as a sandbox-escape attempt
+that "must fail at compile" — it currently does not.
+
+**Recommendation:** Walk every `TypeSyntax` node (handle `TypeOfExpressionSyntax`,
+`CastExpressionSyntax`, generic argument lists, and the type operand of
+`IsPatternExpression` / binary `as`). The simplest robust fix is to enumerate all
+`DescendantNodes()` and, for any node, resolve both `GetSymbolInfo` and `GetTypeInfo`,
+then check the resolved type plus every type argument. Add tests covering `typeof`,
+generic arguments, casts, and `default(T)` with forbidden types.
+
+**Resolution:** Resolved 2026-05-22 — `ForbiddenTypeAnalyzer.Analyze` now runs a second
+pass that resolves `GetTypeInfo` on every `TypeSyntax` node and recursively unwraps array
+element types and generic type arguments, so forbidden types named via `typeof`, generic
+arguments (`List<FileInfo>`), casts, `is`/`as` patterns, `default(T)`, array-creation
+element types, and explicitly-typed local declarations are all rejected at compile time.
+The original member/call node-kind switch is kept (deliberately narrow to avoid flagging
+inherited members such as `typeof(int).Name`), and a span+type dedupe prevents duplicate
+rejections from the two passes. Regression tests added in `ScriptSandboxTests` for each
+node form, plus guards against over-blocking allowed generics and `typeof`.
+
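+The recursive unwrap can be illustrated with `System.Type` standing in for Roslyn's
+`ITypeSymbol`. In the analyzer, the equivalent walk would follow
+`IArrayTypeSymbol.ElementType` and `INamedTypeSymbol.TypeArguments`. `ReferencedTypes` and
+`MentionsSystemIO` are hypothetical helpers, and only `System.IO` is checked to keep the
+sketch short.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.IO;
+using System.Linq;
+
+// Yields a type plus every type nested in it as an array element type or a
+// generic type argument, recursively (so List<FileInfo[]> yields FileInfo).
+IEnumerable<Type> ReferencedTypes(Type t)
+{
+    yield return t;
+    if (t.HasElementType)
+        foreach (var inner in ReferencedTypes(t.GetElementType()!))
+            yield return inner;
+    if (t.IsGenericType)
+        foreach (var arg in t.GetGenericArguments())
+            foreach (var inner in ReferencedTypes(arg))
+                yield return inner;
+}
+
+bool MentionsSystemIO(Type t) => ReferencedTypes(t).Any(x =>
+    x.Namespace == "System.IO" ||
+    (x.Namespace?.StartsWith("System.IO.", StringComparison.Ordinal) ?? false));
+
+Console.WriteLine(MentionsSystemIO(typeof(List<FileInfo>)));               // True
+Console.WriteLine(MentionsSystemIO(typeof(Dictionary<string, Stream[]>))); // True
+Console.WriteLine(MentionsSystemIO(typeof(List<int[]>)));                  // False
+```
+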
+### Core.Scripting-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
+| Status | Resolved |
+
+**Description:** There is no bound on memory a script may allocate or on the number of
+threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
+concern" deferred to v3, but in-process execution means a script that keeps allocating
+large arrays and holding on to them (e.g. adding `new byte[100_000_000]` to a `List<byte[]>`
+in a loop, or materialising a very large LINQ sequence, since LINQ is allow-listed) can
+drive the whole server to `OutOfMemoryException`, an outage. The timeout does not help:
+the allocation can exhaust memory well before 250 ms elapses, and
+the orphaned thread-pool thread documented in `TimedScriptEvaluator` keeps the allocation
+rooted. `System.Threading.Tasks` is not on the deny-list, so a script can also
+`Task.Run` an unbounded fan-out of background work that outlives the timeout entirely.
+
+**Recommendation:** At minimum, document this as a known accepted risk in
+`docs/ScriptedAlarms.md` / `docs/VirtualTags.md` rather than only in a code comment, and
+add the `Task`/`Parallel` namespaces to the forbidden list (scripts are synchronous
+predicates — they have no legitimate need to start background tasks). For memory, gate
+script authoring behind an Admin permission and treat the test-harness preview as the
+control point, or track an explicit v3 issue for out-of-process execution. Record the
+decision so it is not silently lost.
+
+**Resolution:** Resolved 2026-05-22 — added `System.Threading.Tasks` to `ForbiddenNamespacePrefixes` (blocking `Task.Run` / `Parallel` fan-out); documented the unbounded-memory accepted risk and the `Task` denial rationale in `docs/VirtualTags.md` (new "Known resource limits" subsection) and cross-referenced from `docs/ScriptedAlarms.md`.
+
+### Core.Scripting-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `DependencyExtractor.cs:73` |
+| Status | Resolved |
+
+**Description:** The walker matches tag-access calls purely by spelling — any
+`InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as
+a `ScriptContext` tag access, regardless of the receiver. A script that defines a local
+type with a `GetTag(string)` method and calls `other.GetTag("X")`, or calls
+`this.GetTag(...)` on a script-defined helper, has spurious dependencies harvested (or, if
+the argument is not a string literal, spurious rejections raised). The XML remarks claim "as long
+as it's not on the ctx instance, the extractor doesn't pick it up", but the code does not
+check that the receiver is the `ctx` identifier — it accepts any member access with the
+matching name. The `DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag` test
+passes only because the helper there is a *free* function (not member-access form); a
+member-access call to a non-ctx `GetTag` is untested and would be misattributed.
+
+**Recommendation:** In `VisitInvocationExpression`, additionally require that
+`member.Expression` is an `IdentifierNameSyntax` with `Identifier.ValueText == "ctx"`
+(matching the `ScriptGlobals<TContext>.ctx` field name). Add a test for
+`someOtherObject.GetTag("X")` asserting it is ignored.
+
+**Resolution:** Resolved 2026-05-22 — `VisitInvocationExpression` now additionally checks that `member.Expression` is an `IdentifierNameSyntax` with `ValueText == "ctx"` before treating the call as a dependency; test `Ignores_member_access_GetTag_on_non_ctx_receiver` added to `DependencyExtractorTests`.
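+
+A minimal sketch of the receiver check using Roslyn's syntax API (the
+`Microsoft.CodeAnalysis.CSharp` package). It is not the real `DependencyExtractor`. It
+scans `DescendantNodes()` instead of overriding `VisitInvocationExpression` in a walker and
+handles only `GetTag`. `ExtractCtxReads` is a hypothetical name.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using Microsoft.CodeAnalysis;
+using Microsoft.CodeAnalysis.CSharp;
+using Microsoft.CodeAnalysis.CSharp.Syntax;
+
+// Collects literal paths passed to ctx.GetTag(...), ignoring same-named calls on other receivers.
+List<string> ExtractCtxReads(string script)
+{
+    var options = CSharpParseOptions.Default.WithKind(SourceCodeKind.Script);
+    var root = CSharpSyntaxTree.ParseText(script, options).GetRoot();
+    var reads = new List<string>();
+    foreach (var call in root.DescendantNodes().OfType<InvocationExpressionSyntax>())
+    {
+        if (call.Expression is not MemberAccessExpressionSyntax member ||
+            member.Name.Identifier.ValueText != "GetTag")
+            continue;
+        // The fix: the receiver must be the bare identifier `ctx`, not `other`, `this`, etc.
+        if (member.Expression is not IdentifierNameSyntax { Identifier.ValueText: "ctx" })
+            continue;
+        var args = call.ArgumentList.Arguments;
+        if (args.Count == 1 && args[0].Expression is LiteralExpressionSyntax literal &&
+            literal.IsKind(SyntaxKind.StringLiteralExpression))
+            reads.Add(literal.Token.ValueText);
+    }
+    return reads;
+}
+
+var found = ExtractCtxReads("var a = ctx.GetTag(\"A\"); var b = other.GetTag(\"B\");");
+Console.WriteLine(string.Join(", ", found)); // A
+```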
+
+### Core.Scripting-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `DependencyExtractor.cs:97` |
+| Status | Open |
+
+**Description:** A C# 11 raw string literal (delimited by three or more double quotes)
+passed as the tag path tokenizes as `SingleLineRawStringLiteralToken` /
+`MultiLineRawStringLiteralToken`, not `StringLiteralToken`. The check
+`literal.Token.IsKind(SyntaxKind.StringLiteralToken)` therefore rejects an
+otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading
+rejection message. This is an edge case (operators rarely write raw strings for tag
+paths) but the error text would confuse anyone who does.
+
+**Recommendation:** Accept all string-literal token kinds — check
+`literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, or include
+the raw-string token kinds, so a static raw string is harvested rather than rejected.
+
+**Resolution:** _(open)_
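+
+A short Roslyn sketch shows the difference between the token-level and expression-level
+checks. It assumes a `Microsoft.CodeAnalysis.CSharp` version with C# 11 raw-string support,
+and `DescribeFirstArgument` is a hypothetical helper. For a raw string, the token kind
+changes, while the enclosing `LiteralExpressionSyntax` is still a
+`StringLiteralExpression`. Checking the expression kind therefore accepts every static
+string form.
+
+```csharp
+using System;
+using System.Linq;
+using Microsoft.CodeAnalysis;
+using Microsoft.CodeAnalysis.CSharp;
+using Microsoft.CodeAnalysis.CSharp.Syntax;
+
+// Reports the first call argument's token kind, its expression kind, and the static
+// value the expression-level check would harvest (null if that check rejects it).
+(SyntaxKind Token, SyntaxKind Expression, string? Path) DescribeFirstArgument(string script)
+{
+    var options = CSharpParseOptions.Default.WithKind(SourceCodeKind.Script);
+    var root = CSharpSyntaxTree.ParseText(script, options).GetRoot();
+    var call = root.DescendantNodes().OfType<InvocationExpressionSyntax>().First();
+    var literal = (LiteralExpressionSyntax)call.ArgumentList.Arguments[0].Expression;
+    var path = literal.IsKind(SyntaxKind.StringLiteralExpression) ? literal.Token.ValueText : null;
+    return (literal.Token.Kind(), literal.Kind(), path);
+}
+
+// ctx.GetTag("""Line1/Speed""") written as an ordinary C# string:
+var raw = DescribeFirstArgument("ctx.GetTag(\"\"\"Line1/Speed\"\"\");");
+Console.WriteLine($"{raw.Token} / {raw.Expression} / {raw.Path}");
+```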
+
+### Core.Scripting-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `CompiledScriptCache.cs:55` |
+| Status | Open |
+
+**Description:** On a failed compile the `catch` block calls
+`_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for
+the same bad source, both observe the same faulted `Lazy` and throw, and both call
+`TryRemove(key)`. If a concurrent retry re-adds a new `Lazy` for that key between the two
+removals, the second unconditional `TryRemove` could evict the in-flight retry entry. The
+window is small and the consequence is only a redundant recompile, so severity is Low —
+but the removal should be key+value scoped for correctness.
+
+**Recommendation:** Use the `ConcurrentDictionary.TryRemove(KeyValuePair<,>)` overload to
+remove only the specific faulted `Lazy` instance, so a concurrently re-added entry is not
+evicted.
+
+**Resolution:** _(open)_
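+
+A minimal sketch of the value-scoped removal. It assumes the cache stores one `Lazy<T>`
+per source key, as the description implies. `GetOrCompile` and the `Func<int>` payload are
+hypothetical stand-ins for the real `ScriptEvaluator`.
+`ConcurrentDictionary<TKey,TValue>.TryRemove(KeyValuePair<TKey,TValue>)` (.NET 5+) removes
+the entry only if both the key and the value still match.
+
+```csharp
+using System;
+using System.Collections.Concurrent;
+using System.Collections.Generic;
+using System.Threading;
+
+// Source text -> lazily compiled script (Func<int> stands in for the real evaluator type).
+var cache = new ConcurrentDictionary<string, Lazy<Func<int>>>(StringComparer.Ordinal);
+
+Func<int> GetOrCompile(string source, Func<string, Func<int>> compile)
+{
+    var lazy = cache.GetOrAdd(source, s =>
+        new Lazy<Func<int>>(() => compile(s), LazyThreadSafetyMode.ExecutionAndPublication));
+    try
+    {
+        return lazy.Value; // a faulted Lazy caches and rethrows its exception
+    }
+    catch
+    {
+        // Remove only this faulted instance; a fresh Lazy re-added by a concurrent
+        // retry no longer matches the value and is left in place.
+        cache.TryRemove(new KeyValuePair<string, Lazy<Func<int>>>(source, lazy));
+        throw;
+    }
+}
+
+int attempts = 0;
+Func<int> Compile(string source)
+{
+    if (++attempts == 1) throw new InvalidOperationException("bad script");
+    return () => 42;
+}
+
+try { GetOrCompile("x", Compile); } catch (InvalidOperationException) { }
+Console.WriteLine(GetOrCompile("x", Compile)()); // 42
+```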
+
+### Core.Scripting-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `TimedScriptEvaluator.cs:60` |
+| Status | Resolved |
+
+**Description:** `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits
+`WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the
+timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cancellation
+is non-deterministic, so the same shutdown can sometimes surface as
+`ScriptTimeoutException` and sometimes as `OperationCanceledException`. The class docs
+assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown
+path depends on to avoid misclassifying shutdown as a script fault — but the
+implementation does not guarantee it when both fire close together.
+
+**Recommendation:** After catching `TimeoutException`, check `ct.IsCancellationRequested`
+and throw `OperationCanceledException(ct)` instead of `ScriptTimeoutException` when the
+caller's token is cancelled, so caller cancellation deterministically wins regardless of
+race ordering.
+
+**Resolution:** Resolved 2026-05-22 — in the `catch (TimeoutException)` handler, `ct.IsCancellationRequested` is now checked and `OperationCanceledException(ct)` thrown before `ScriptTimeoutException`, so caller cancellation deterministically wins regardless of race ordering; regression test `Caller_cancellation_wins_even_when_timeout_fires_first` added to `TimedScriptEvaluatorTests`.
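+
+A minimal sketch of the race-safe handler, which requires .NET 6+ for
+`Task.WaitAsync(TimeSpan, CancellationToken)`. `RunWithTimeoutAsync` is a hypothetical
+name, and a plain `TimeoutException` stands in for the codebase's `ScriptTimeoutException`.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+async Task<T> RunWithTimeoutAsync<T>(Func<T> work, TimeSpan timeout, CancellationToken ct)
+{
+    var run = Task.Run(work, ct);
+    try
+    {
+        return await run.WaitAsync(timeout, ct);
+    }
+    catch (TimeoutException)
+    {
+        // The timeout and the caller's cancel may fire together; cancellation wins.
+        ct.ThrowIfCancellationRequested();
+        throw new TimeoutException($"Script exceeded {timeout.TotalMilliseconds} ms.");
+    }
+}
+
+using var cts = new CancellationTokenSource();
+cts.Cancel();
+try
+{
+    await RunWithTimeoutAsync(() => { Thread.Sleep(500); return 1; }, TimeSpan.FromMilliseconds(50), cts.Token);
+}
+catch (OperationCanceledException)
+{
+    Console.WriteLine("cancelled, not timed out");
+}
+```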
+
+### Core.Scripting-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
+| Status | Open |
+
+**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
+remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
+delegate, which keeps the dynamically emitted script assembly loaded for the process
+lifetime — emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded.
+`Clear()` drops the dictionary entries but does **not** unload the emitted assemblies;
+they leak. Across many config-generation publishes (each `Clear()` followed by recompiling
+every script), the process accumulates dead script assemblies. For the expected "low
+thousands" of scripts this is benign, but a long-running server with frequent publishes
+will see steady managed-memory growth that never returns.
+
+**Recommendation:** Document the per-publish assembly accretion as a known limitation, or
+compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload prior
+generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
+high-publish-frequency deployments are aware.
+
+**Resolution:** _(open)_
+
+### Core.Scripting-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `ForbiddenTypeAnalyzer.cs:45` |
+| Status | Open |
+
+**Description:** The Phase 7 plan decision #6
+(`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden
+surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually
+denies a broader set — `System.Threading.Thread`, `System.Runtime.InteropServices`, and
+`Microsoft.Win32` (registry) — which is sensible hardening but is undocumented in the plan
+and in `docs/ScriptedAlarms.md` (which defers sandbox rules to `VirtualTags.md`). An
+operator reading the design docs cannot predict that a registry or interop reference will
+be rejected. Conversely, the plan does not record the `System.Environment` /
+`System.Diagnostics` decisions. The code and the design document have drifted.
+
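+**Recommendation:** Update the plan's decision #6 (or `docs/VirtualTags.md`) to list the
+authoritative deny-list exactly as `ForbiddenTypeAnalyzer` defines it: the
+`ForbiddenNamespacePrefixes` entries plus the `ForbiddenFullTypeNames` type deny-list added
+for Core.Scripting-001, which now rejects `System.Environment` outright (including the
+`GetEnvironmentVariable` read path previously accepted as a compromise), so the docs match
+the code.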
+
+**Resolution:** _(open)_
+
+### Core.Scripting-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
+| Status | Resolved |
+
+**Description:** The sandbox-escape test suite covers only the four obvious vectors
+(File / Http / Process / Reflection) as direct member-access calls. It does not test:
+`typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to
+forbidden types, `System.Environment.Exit` / `FailFast`, `System.Threading.Thread`,
+`System.Runtime.InteropServices`, `Microsoft.Win32` registry access, `Activator`, or
+`System.AppDomain`. Given that the analyzer is the sole security boundary for in-process
+untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went
+undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated
+"sandbox escape tests" but the implemented set is materially narrower than the threat
+surface.
+
+**Recommendation:** Add a parameterised escape-test covering every node form in
+Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
+assert a `ScriptSandboxViolationException` (or `CompilationErrorException`) at compile.
+
+**Resolution:** Resolved 2026-05-22 — added `ScriptSandboxTests` cases for `System.Threading.Thread`, `System.Threading.Tasks.Task.Run`, `System.Runtime.InteropServices.Marshal`, and `Microsoft.Win32.Registry` (the four namespace-deny-list vectors that had no test); the 001/002 vectors (Environment.Exit/FailFast/AppDomain/GC/Activator, typeof, generics, cast, default(T), is/as, array element, declared variable) were already covered by the -001/-002 resolution commits. All 79 tests pass.
+
+### Core.Scripting-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
+| Status | Open |
+
+**Description:** Two source files have no direct test coverage: `ScriptContext`
+(`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and
+not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the
+`ArgumentNullException` / `ArgumentException` guards on `contextType` at
+`ScriptSandbox.cs:45-48` are never asserted). `ScriptLogCompanionSink` and
+`ScriptLoggerFactory` have tests, but there is no test that a script's `ctx.Logger` Error
+emission surfaces via the companion sink end-to-end (factory + sink integration is
+untested). These are minor gaps but leave guard clauses and the logging integration
+unverified.
+
+**Recommendation:** Add unit tests for `ScriptSandbox.Build` argument validation, for
+`ScriptContext.Deadband` at and around the tolerance boundary, and an end-to-end test that
+a script logging at Error level produces both a `scripts-*.log` event and a companion
+Warning event.
+
+**Resolution:** _(open)_
@@ -0,0 +1,359 @@
+# Code Review - Core.VirtualTags
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 7 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core.VirtualTags-001, Core.VirtualTags-002, Core.VirtualTags-003, Core.VirtualTags-004 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Core.VirtualTags-005, Core.VirtualTags-006 |
+| 4 | Error handling & resilience | Core.VirtualTags-007 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Core.VirtualTags-008, Core.VirtualTags-009 |
+| 7 | Design-document adherence | Core.VirtualTags-001, Core.VirtualTags-010 |
+| 8 | Code organization & conventions | Core.VirtualTags-011 |
+| 9 | Testing coverage | Core.VirtualTags-012 |
+| 10 | Documentation & comments | Core.VirtualTags-010, Core.VirtualTags-013 |
+
+## Findings
+
+### Core.VirtualTags-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
+| Status | Resolved |
+
+**Description:** `OnScriptSetVirtualTag` updates `_valueCache`, notifies observers, and
+records history for the written path, but it does not schedule a cascade for tags that
+depend on the written path. `docs/VirtualTags.md` (VirtualTagContext section) explicitly
+states `SetVirtualTag(path, value)` "routes through the engine's `OnScriptSetVirtualTag`
+callback so cross-tag writes still participate in change-trigger cascades." They do not.
+A script that writes `ctx.SetVirtualTag("Target", x)` updates Target's cached value, but
+any virtual tag whose script reads Target via `ctx.GetTag("Target")` and is
+`ChangeTriggered = true` is never re-evaluated. Downstream virtual tags go stale until
+some unrelated trigger fires. The existing test
+`SetVirtualTag_within_script_updates_target_and_triggers_observers` only asserts the
+target itself updates and never exercises a tag depending on the target, so the gap is
+not caught.
+
+**Recommendation:** Either (a) launch a fire-and-forget `CascadeAsync(path, ...)` from
+`OnScriptSetVirtualTag` (note `EvaluateInternalAsync` acquires the non-reentrant
+`_evalGate`, so the cascade must be scheduled, not invoked inline while the gate is
+held), or (b) if cascading from a script write is intentionally unsupported, correct the
+documentation and `VirtualTagContext` XML doc to say so. Decide deliberately and make
+code and docs agree.
+
+**Resolution:** Resolved 2026-05-22 — `OnScriptSetVirtualTag` now launches a fire-and-forget `CascadeAsync(path, ...)` after updating the cache, mirroring `OnUpstreamChange`, so change-triggered dependents of a script-written tag are re-evaluated; added a regression test.
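+
+A sketch of why the cascade has to be scheduled rather than awaited inline, using a
+`SemaphoreSlim(1, 1)` as a stand-in for the non-reentrant `_evalGate`. All names are
+hypothetical, and the dependent lookup for the written path is elided. The engine treats
+the task as fire-and-forget; the demo awaits it only so that the result can be observed.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+var evalGate = new SemaphoreSlim(1, 1); // non-reentrant, like _evalGate
+int dependentRuns = 0;
+Task? pendingCascade = null;
+
+async Task EvaluateAsync(string path, Action? scriptBody)
+{
+    await evalGate.WaitAsync();
+    try
+    {
+        scriptBody?.Invoke();
+        if (path == "Dependent") dependentRuns++;
+    }
+    finally { evalGate.Release(); }
+}
+
+// Invoked from inside a running script, i.e. while evalGate is held. Awaiting
+// EvaluateAsync here would wait on a gate this evaluation already owns, so the
+// cascade is queued and runs once the current evaluation releases the gate.
+void OnScriptSetVirtualTag(string path) =>
+    pendingCascade = Task.Run(() => EvaluateAsync("Dependent", null));
+
+await EvaluateAsync("Writer", () => OnScriptSetVirtualTag("Target"));
+await pendingCascade!;
+Console.WriteLine(dependentRuns); // 1
+```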
+
+### Core.VirtualTags-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
+| Status | Resolved |
+
+**Description:** The cold-start guard `if (!AreInputsReady(ctxCache)) return;` silently
+abandons the evaluation when any input is null or Bad-quality. For a chained virtual tag
+(C depends on B, which depends on driver tag A), if A is still Bad at startup, B is skipped --
+leaving B's `_valueCache` entry absent. When C evaluates, `BuildReadCache` falls through
+to `_upstream.ReadTag("B")` for the missing virtual path, which returns BadNodeIdUnknown
+quality, so C is also skipped. That is acceptable for cold start, but the same guard
+means a virtual tag that legitimately consumes a Bad-quality upstream (e.g. a script
+written to detect comms loss and emit a fallback) can never run -- it is permanently
+frozen at its prior value with no diagnostic. The tag also never transitions to a Bad
+quality of its own, so an OPC UA client cannot distinguish "not yet computed" from
+"computing fine."
+
+**Recommendation:** Make the cold-start behaviour explicit: when inputs are not ready,
+publish a Bad-quality snapshot (e.g. BadWaitingForInitialData, 0x80320000) for the tag
+rather than returning with no state change, so clients see a defined quality. If
+operators need scripts that handle Bad upstreams, consider a per-definition opt-out of
+the readiness guard.
+
+**Resolution:** Resolved 2026-05-22 — cold-start guard now publishes `BadWaitingForInitialData` (0x80320000) and notifies observers instead of silently returning, so OPC UA clients see a defined quality rather than a stale prior value.
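+
+A minimal sketch of the guard, with tuples standing in for the engine's `DataValueSnapshot`.
+The shapes and names are hypothetical; only the status-code values come from OPC UA. An OPC
+UA StatusCode carries its severity in the top two bits, and `10` means Bad, which is why
+`0x80320000` reads as Bad.
+
+```csharp
+using System;
+using System.Linq;
+
+const uint Good = 0x00000000;
+const uint BadWaitingForInitialData = 0x80320000;
+
+// Severity lives in bits 31..30: 00 = Good, 01 = Uncertain, 10 = Bad.
+bool IsBad(uint status) => (status >> 30) == 0b10;
+
+// Publishes a defined Bad quality instead of returning with no state change.
+(object? Value, uint Status) Evaluate((object? Value, uint Status)[] inputs, Func<object?[], object?> script)
+{
+    bool ready = inputs.All(i => i.Value is not null && !IsBad(i.Status));
+    if (!ready)
+        return (null, BadWaitingForInitialData);
+    return (script(inputs.Select(i => i.Value).ToArray()), Good);
+}
+
+var result = Evaluate(new (object?, uint)[] { (null, 0x80000000u) }, _ => 1.0);
+Console.WriteLine($"0x{result.Status:X8}"); // 0x80320000
+```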
+
+### Core.VirtualTags-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
+| Status | Resolved |
+
+**Description:** The upstream-subscription loop in `Load` iterates
+`definitions.SelectMany(d => _tags[d.Path].Reads)`. If `definitions` contains two rows
+with the same Path, the first registers `_tags[Path]` and the second overwrites it, but
+`definitions` still has two entries -- `_tags[d.Path]` is indexed by the second row for
+both iterations, so the first row's distinct upstream reads are silently dropped. More
+importantly, a duplicate Path in the input list is never rejected at all:
+`_tags[def.Path] = ...` and `_graph.Add(def.Path, ...)` both overwrite without warning,
+so one of two operator-authored tags with a colliding UNS path vanishes with no error.
+`Load` is documented as throwing an aggregated error for every problem; a duplicate path
+should be in that set.
+
+**Recommendation:** Detect duplicate Path values while iterating `definitions` and add
+them to `compileFailures` (or a dedicated rejection list) so the aggregated
+`InvalidOperationException` reports them. Separately, iterate `_tags.Values` rather than
+`definitions.SelectMany(d => _tags[d.Path]...)` when collecting upstream paths so the
+collection is keyed off the registered set, not the raw input list.
+
+**Resolution:** Resolved 2026-05-22 — `Load` now tracks seen paths and adds a duplicate-path entry to `compileFailures`; the upstream-subscription loop iterates `_tags.Values` instead of the raw `definitions` list so it is keyed off the registered set.
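+
+The two changes can be sketched as follows. The `(Path, Reads)` tuple is a hypothetical
+stand-in for the definition type, and ordinal path comparison is an assumption. In the
+engine, a non-empty `compileFailures` is what makes `Load` throw its aggregated
+`InvalidOperationException`.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+var definitions = new (string Path, string[] Reads)[]
+{
+    ("Area/Tag1", new[] { "Drv/A" }),
+    ("Area/Tag2", new[] { "Drv/B" }),
+    ("Area/Tag1", new[] { "Drv/C" }), // colliding path
+};
+
+var compileFailures = new List<string>();
+var tags = new Dictionary<string, string[]>(StringComparer.Ordinal);
+foreach (var def in definitions)
+{
+    // TryAdd reports the collision instead of silently overwriting the first row.
+    if (!tags.TryAdd(def.Path, def.Reads))
+        compileFailures.Add($"{def.Path}: duplicate virtual-tag path");
+}
+
+// Upstream subscriptions come from the registered set, not the raw input list.
+var upstreamPaths = tags.Values.SelectMany(reads => reads).Distinct()
+    .OrderBy(p => p, StringComparer.Ordinal).ToList();
+
+Console.WriteLine(string.Join("; ", compileFailures)); // Area/Tag1: duplicate virtual-tag path
+Console.WriteLine(string.Join(", ", upstreamPaths));   // Drv/A, Drv/B
+```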
+
+### Core.VirtualTags-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` |
+| Status | Open |
+
+**Description:** `CoerceResult`'s switch has a default arm (`_ => raw`) that returns the
+script's raw return value uncoerced for any `DriverDataType` not in the explicit list
+(e.g. an array type, Byte, or a future enum member). The resulting `DataValueSnapshot`
+then carries a value whose CLR type does not match the node's declared OPC UA data type,
+which the node manager will surface as a wire-level type mismatch or a silently wrong
+value. The doc claims a mismatch surfaces as BadTypeMismatch, but an unhandled
+`DriverDataType` bypasses coercion entirely.
+
+**Recommendation:** Make the default arm explicit -- either throw / return null (which
+the outer pipeline maps to BadInternalError) for an unsupported `DriverDataType`, or
+document precisely which `DriverDataType` values `CoerceResult` supports and validate at
+`Load` time that no definition declares an unsupported type.
+
+**Resolution:** _(open)_
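+
+One way to make the default arm explicit, sketched with the BCL `TypeCode` enum standing in
+for `DriverDataType`. The supported set and the `Convert`-based coercions are illustrative,
+not the real `CoerceResult`. An unsupported type throws, which the outer pipeline would map
+to BadInternalError instead of passing the raw value through.
+
+```csharp
+using System;
+using System.Globalization;
+
+object Coerce(object raw, TypeCode target)
+{
+    var inv = CultureInfo.InvariantCulture;
+    switch (target)
+    {
+        case TypeCode.Boolean: return Convert.ToBoolean(raw, inv);
+        case TypeCode.Int32:   return Convert.ToInt32(raw, inv);
+        case TypeCode.Double:  return Convert.ToDouble(raw, inv);
+        case TypeCode.String:  return Convert.ToString(raw, inv) ?? string.Empty;
+        default:
+            // No silent pass-through of the raw value for an unhandled type.
+            throw new NotSupportedException($"No coercion defined for {target}.");
+    }
+}
+
+Console.WriteLine(Coerce(41.0, TypeCode.Int32)); // 41
+try { Coerce(41.0, TypeCode.Byte); }
+catch (NotSupportedException e) { Console.WriteLine(e.Message); } // No coercion defined for Byte.
+```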
+
+### Core.VirtualTags-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
+| Status | Resolved |
+
+**Description:** `SubscribeAsync` registers the per-path engine observers first (lines
+52-56), then in a second loop reads the current value and fires the initial-data
+callback (lines 60-64). Between those two loops an upstream change can cascade and the
+engine can invoke the just-registered observer with a new value. The OPC UA client then
+receives the real change event followed by the initial-data event carrying the older
+`engine.Read(path)` snapshot -- out-of-order delivery, and the client's last-known value
+ends up stale.
+
+**Recommendation:** Capture the current snapshot and fire the initial-data callback for
+each path before registering the change observer for that path (or hold a per-handle
+lock spanning both so no engine callback interleaves). The initial value must be
+delivered before any subsequent change for that path.
+
+**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now fires the initial-data callback per path before registering the change observer for that path, eliminating the out-of-order delivery race.
+
+### Core.VirtualTags-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` |
+| Status | Open |
+
+**Description:** `Subscribe` does `_observers.GetOrAdd(path, _ => [])` then
+`lock (list) { list.Add(observer); }`. When `Unsub.Dispose` removes the last observer,
+the now-empty List is left in `_observers` and the dictionary entry is never removed.
+For a long-running server with churning OPC UA subscriptions this is an unbounded (if
+slow) growth of empty lists. There is also a benign-but-real race: a thread can call
+`GetOrAdd` and obtain a list reference that another thread's `Dispose` is about to leave
+empty in the map -- not a correctness bug today because the list object is still valid,
+but it makes any future "prune empty entries" logic racy.
+
+**Recommendation:** Either accept the unbounded map and document it, or have
+`Unsub.Dispose` remove the dictionary entry when the list becomes empty under the same
+lock, re-checking emptiness inside the lock to avoid dropping a concurrently-added
+observer.
+
+**Resolution:** _(open)_
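+
+The recommended prune has one subtlety. `Subscribe` must confirm, under the list's lock,
+that the list it got from `GetOrAdd` is still the live dictionary entry. Otherwise it could
+add an observer to a list that was just pruned. The following is a minimal sketch with
+hypothetical names; a returned `Action` stands in for the `Unsub` disposable.
+
+```csharp
+using System;
+using System.Collections.Concurrent;
+using System.Collections.Generic;
+
+var observers = new ConcurrentDictionary<string, List<Action<object?>>>(StringComparer.Ordinal);
+
+Action Subscribe(string path, Action<object?> observer)
+{
+    while (true)
+    {
+        var list = observers.GetOrAdd(path, _ => new List<Action<object?>>());
+        lock (list)
+        {
+            // A concurrent unsubscribe may have pruned this list after GetOrAdd
+            // returned it; only add to the live entry, otherwise retry.
+            if (observers.TryGetValue(path, out var live) && ReferenceEquals(live, list))
+            {
+                list.Add(observer);
+                return () => Unsubscribe(path, list, observer);
+            }
+        }
+    }
+}
+
+void Unsubscribe(string path, List<Action<object?>> list, Action<object?> observer)
+{
+    lock (list)
+    {
+        list.Remove(observer);
+        // Prune under the same lock; the value-scoped TryRemove never evicts a replacement list.
+        if (list.Count == 0)
+            observers.TryRemove(new KeyValuePair<string, List<Action<object?>>>(path, list));
+    }
+}
+
+var unsubscribe = Subscribe("Area/Tag1", _ => { });
+unsubscribe();
+Console.WriteLine(observers.ContainsKey("Area/Tag1")); // False
+```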
+
+### Core.VirtualTags-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` |
+| Status | Open |
+
+**Description:** `Tick` calls
+`_engine.EvaluateOneAsync(p, _cts.Token).GetAwaiter().GetResult()`, blocking the
+`System.Threading.Timer` callback thread (a thread-pool thread) for the full duration of
+the evaluation. Because `EvaluateInternalAsync` serialises all tags through `_evalGate`,
+a timer tick that races a long change-trigger cascade blocks until the cascade drains.
+With multiple interval groups, several timer callbacks can each pin a thread-pool thread
+waiting on the same gate. A group of N tags can take N times the script timeout while
+holding a pool thread, and under timer re-entrancy (a tick firing again before the previous
+one has finished) this compounds.
+
+**Recommendation:** Make `Tick` async-aware -- store the returned Task and skip a tick
+if the previous one for that group is still running (a per-group "in flight" flag),
+rather than blocking synchronously. At minimum, document the blocking behaviour and the
+expected upper bound on group evaluation time relative to the interval.
+
+**Resolution:** _(open)_
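+
+A minimal sketch of the per-group in-flight flag. The names are hypothetical, and
+`evaluateGroupAsync` stands in for evaluating every tag in one interval group. The timer
+callback claims the flag with `Interlocked.CompareExchange` and returns immediately, so no
+thread-pool thread blocks on `_evalGate`. A tick that arrives while the previous one is
+still running is skipped. Production code would also log faults from the discarded task.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+int inFlight = 0;   // 0 = idle, 1 = this group's previous tick is still evaluating
+int skippedTicks = 0;
+
+void Tick(Func<Task> evaluateGroupAsync)
+{
+    if (Interlocked.CompareExchange(ref inFlight, 1, 0) != 0)
+    {
+        Interlocked.Increment(ref skippedTicks);   // previous tick still running
+        return;
+    }
+    _ = RunAsync();
+
+    async Task RunAsync()
+    {
+        try { await evaluateGroupAsync(); }
+        finally { Volatile.Write(ref inFlight, 0); }   // release for the next tick
+    }
+}
+
+var slowGroup = new TaskCompletionSource();
+Tick(() => slowGroup.Task);       // claims the flag and stays in flight
+Tick(() => Task.CompletedTask);   // skipped, not blocked
+Console.WriteLine(skippedTicks);  // 1
+slowGroup.SetResult();
+```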
+
+### Core.VirtualTags-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` |
+| Status | Resolved |
+
+**Description:** `TransitiveDependentsInOrder` calls `TopologicalSort()` (a full O(V+E)
+Kahn pass plus a Dictionary rank build) on every invocation, and it is invoked from
+`CascadeAsync` on every upstream change event (`OnUpstreamChange`). On a large graph with
+high-rate upstream tags this re-sorts the entire dependency graph on every protocol-rate
+delta -- pure waste, since the topological order is immutable between `Load` calls. The
+DFS that collects dependents is itself fine; only the repeated sort is the cost.
+
+**Recommendation:** Compute the topological order (and the rank dictionary) once at the
+end of `Load` and cache it on `DependencyGraph` (invalidated by `Add` / `Clear`).
+`TransitiveDependentsInOrder` then reuses the cached rank map. This turns a per-event
+O(V+E) cost into an O(closure) cost.
+
+**Resolution:** Resolved 2026-05-22 — `DependencyGraph` now caches the topological rank dictionary (invalidated by `Add`/`Clear`) via `GetOrBuildRank()`; `TransitiveDependentsInOrder` reuses it, reducing per-change-event cost from O(V+E) to O(closure).
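+
+A minimal sketch of the cached-rank approach. The names are hypothetical, edges point from
+an input to the tags that read it, and the graph is assumed acyclic because cycles are
+rejected at `Load`. Any mutation clears the cache. A change event then walks only its own
+dependents and orders them with the cached ranks, instead of re-running Kahn's algorithm
+over the whole graph.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Linq;
+
+var dependents = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
+Dictionary<string, int>? cachedRank = null;   // null means "rebuild on next use"
+
+void AddEdge(string input, string consumer)
+{
+    if (!dependents.TryGetValue(input, out var set))
+        dependents[input] = set = new HashSet<string>(StringComparer.Ordinal);
+    set.Add(consumer);
+    dependents.TryAdd(consumer, new HashSet<string>(StringComparer.Ordinal));
+    cachedRank = null;   // every mutation invalidates the order
+}
+
+Dictionary<string, int> GetOrBuildRank()
+{
+    if (cachedRank is not null) return cachedRank;
+    // Kahn's algorithm: runs once per mutation, not once per change event.
+    var inDegree = dependents.Keys.ToDictionary(k => k, _ => 0, StringComparer.Ordinal);
+    foreach (var consumers in dependents.Values)
+        foreach (var c in consumers) inDegree[c]++;
+    var ready = new Queue<string>(inDegree.Where(kv => kv.Value == 0).Select(kv => kv.Key));
+    var rank = new Dictionary<string, int>(StringComparer.Ordinal);
+    while (ready.Count > 0)
+    {
+        var node = ready.Dequeue();
+        rank[node] = rank.Count;
+        foreach (var c in dependents[node])
+            if (--inDegree[c] == 0) ready.Enqueue(c);
+    }
+    cachedRank = rank;
+    return rank;
+}
+
+// Collects the closure of `changed` by DFS, then sorts it with the cached ranks.
+List<string> TransitiveDependentsInOrder(string changed)
+{
+    var seen = new HashSet<string>(StringComparer.Ordinal);
+    var stack = new Stack<string>();
+    stack.Push(changed);
+    while (stack.Count > 0)
+    {
+        if (!dependents.TryGetValue(stack.Pop(), out var consumers)) continue;
+        foreach (var c in consumers)
+            if (seen.Add(c)) stack.Push(c);
+    }
+    var rank = GetOrBuildRank();
+    return seen.OrderBy(n => rank[n]).ToList();
+}
+
+AddEdge("Drv/A", "B");
+AddEdge("B", "C");
+AddEdge("Drv/A", "C");
+Console.WriteLine(string.Join(" -> ", TransitiveDependentsInOrder("Drv/A"))); // B -> C
+```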
+
+### Core.VirtualTags-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:64-65`, `:72-73` |
+| Status | Open |
+
+**Description:** `DirectDependencies` and `DirectDependents` allocate a fresh empty
+`HashSet<string>` on every call for an unregistered node. `DirectDependents` is called
+inside the `TopologicalSort` Kahn loop and the `CascadeAsync` DFS, so for a graph with
+many leaf driver tags this allocates a throwaway set per leaf per sort. Minor, but it is
+on the change-cascade path.
+
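+**Recommendation:** Return a shared static empty set for the miss case instead of
+allocating each time, exposed through a read-only type (e.g. `IReadOnlySet<string>`) so no
+caller can mutate the shared instance.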
+
+**Resolution:** _(open)_
+
+### Core.VirtualTags-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/ITagUpstreamSource.cs:18`, `VirtualTagContext.cs:30`, `VirtualTagDefinition.cs:28` |
+| Status | Open |
+
+**Description:** Several XML docs reference component names that do not exist in the
+codebase. `ITagUpstreamSource` XML doc says the subscription path "feeds the engine's
+ChangeTriggerDispatcher" -- there is no ChangeTriggerDispatcher; the actual path is
+`OnUpstreamChange` then `CascadeAsync`. `VirtualTagDefinition`'s TimerInterval and
+`VirtualTagContext` docs reference an EvaluationPipeline that likewise does not exist;
+the real code path is the `EvaluateInternalAsync` method inside `VirtualTagEngine`. Stale type names in
+XML docs mislead maintainers searching for the named component.
+
+**Recommendation:** Update the XML docs to name the real type and methods (`VirtualTagEngine`,
+`CascadeAsync`, `EvaluateInternalAsync`) or drop the specific name in favour of a
+behavioural description.
+
+**Resolution:** _(open)_
+
+### Core.VirtualTags-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:404-409` |
+| Status | Open |
+
+**Description:** `VirtualTagState` records a Writes set (the `ctx.SetVirtualTag` targets
+extracted by `DependencyExtractor`), but nothing in the engine reads it -- it is captured
+at `Load` and never used. Declared write targets are not validated against the registered
+tag set at publish time (a script writing to a non-existent virtual path is only caught
+at runtime by `OnScriptSetVirtualTag`'s warning-and-drop), and they do not contribute to
+the dependency graph. Either the field is dead state or an intended publish-time
+validation is missing.
+
+**Recommendation:** Use Writes to validate at `Load` that every `ctx.SetVirtualTag`
+target resolves to a registered virtual tag (adding an entry to `compileFailures` on a
+miss), so an operator typo is caught at publish rather than silently dropped at runtime.
+If validation is deliberately deferred, remove the unused field or comment why it is
+retained.
+
+**Resolution:** _(open)_
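+
+A minimal sketch of the publish-time check. It assumes each definition's extracted `Writes`
+set is available as a list of paths; the shapes and tag names are hypothetical.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+var registered = new HashSet<string>(StringComparer.Ordinal) { "Area/Total", "Area/Average" };
+var writesByTag = new Dictionary<string, string[]>(StringComparer.Ordinal)
+{
+    ["Area/Total"] = new[] { "Area/Average", "Area/Averge" }, // second target is a typo
+};
+
+var compileFailures = new List<string>();
+foreach (var (path, targets) in writesByTag)
+    foreach (var target in targets)
+        if (!registered.Contains(target))
+            compileFailures.Add($"{path}: ctx.SetVirtualTag target '{target}' is not a registered virtual tag");
+
+Console.WriteLine(string.Join(Environment.NewLine, compileFailures));
+// Area/Total: ctx.SetVirtualTag target 'Area/Averge' is not a registered virtual tag
+```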
+
+### Core.VirtualTags-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` |
+| Status | Resolved |
+
+**Description:** Several behaviours of the engine have no test coverage:
+(1) the cold-start `AreInputsReady` guard -- no test exercises an upstream that is
+null/Bad at evaluation time and asserts the resulting tag state (see
+Core.VirtualTags-002);
+(2) `ctx.SetVirtualTag` cascading to a dependent of the written tag -- the existing test
+only checks the written tag itself, so the gap in Core.VirtualTags-001 is invisible to
+the suite;
+(3) the `OnScriptSetVirtualTag` warning path for a write to a non-registered path;
+(4) `EvaluateOneAsync` throwing `ArgumentException` for an unregistered path;
+(5) `CoerceResult` failure mapping to BadInternalError (only the success coercion
+double-to-int32 is tested);
+(6) duplicate Path values in a `Load` definition list (see Core.VirtualTags-003);
+(7) `Read`/`Subscribe`/`EvaluateOneAsync` calls before `Load` (the `EnsureLoaded` guard).
+
+**Recommendation:** Add unit tests for each path above. Items (1), (2), and (6) directly
+correspond to open correctness findings and would have caught them.
+
+**Resolution:** Resolved 2026-05-22 — added 9 unit tests covering all 7 gaps: `AreInputsReady` guard publishes `BadWaitingForInitialData` and recovers; `SetVirtualTag` cascade to dependent; write to non-registered path; `EvaluateOneAsync` before `Load` and for unregistered path; `CoerceResult` failure maps to `BadInternalError`; duplicate-path rejection; `Read`/`Subscribe` before `Load`.
+
+### Core.VirtualTags-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:266-270` |
+| Status | Open |
+
+**Description:** `DependencyCycleException.BuildMessage` renders each cycle as
+`string.Join(" -> ", c) + " -> " + c[0]`, presenting the SCC member list as a traversable
+edge path that loops back to its first element. Tarjan's algorithm returns the members of
+a strongly connected component in stack-pop order, which is not guaranteed to be a valid
+edge sequence -- for an SCC with more than two members the printed "A -> B -> C -> A" may list
+edges that do not exist. The message can therefore mislead an operator debugging a cycle
+into looking for an edge that is not in their config.
+
+**Recommendation:** Either label the output as "cycle members" (a set, not an ordered
+path) rather than rendering arrows, or reconstruct an actual cycle path within the SCC
+(a single DFS back-edge walk) before formatting.
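+
+The second option can be sketched in Python. This sketch swaps the DFS for a breadth-first search; either works, and BFS simply returns the shortest such cycle. The search starts at one SCC member, follows only edges whose target is also in the SCC, and stops at the first edge back to the start node. For any SCC that represents a cycle (two or more members, or one member with a self-edge), such a path always exists, so every arrow in the output is a real edge. The `edges` adjacency mapping and the function name are illustrative assumptions, not the `DependencyGraph` API.
+
+```python
+from collections import deque
+
+
+def cycle_path_in_scc(edges, scc):
+    """Return a real edge path scc[0] -> ... -> scc[0] that stays inside the SCC, or None."""
+    members = set(scc)
+    start = scc[0]
+    parent = {start: None}  # BFS tree, used to rebuild the path
+    queue = deque([start])
+    while queue:
+        node = queue.popleft()
+        for nxt in edges.get(node, ()):
+            if nxt not in members:
+                continue  # edges leaving the SCC cannot be part of this cycle
+            if nxt == start:
+                path = []
+                cur = node
+                while cur is not None:  # walk back up the BFS tree to start
+                    path.append(cur)
+                    cur = parent[cur]
+                path.reverse()
+                return path + [start]
+            if nxt not in parent:
+                parent[nxt] = node
+                queue.append(nxt)
+    return None  # singleton SCC without a self-edge: no cycle
+
+
+edges = {"A": ["C"], "C": ["B"], "B": ["A"]}
+scc = ["A", "B", "C"]  # stack-pop order; "A -> B -> C -> A" would invent the edge A -> B
+print(" -> ".join(cycle_path_in_scc(edges, scc)))  # A -> C -> B -> A
+```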
+
+**Resolution:** _(open)_
@@ -0,0 +1,207 @@
+# Code Review — Core
+
+| Field | Value |
+|---|---|
+| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Core-001, Core-002, Core-003 |
+| 2 | OtOpcUa conventions | Core-004 |
+| 3 | Concurrency & thread safety | Core-005, Core-006 |
+| 4 | Error handling & resilience | Core-007, Core-008 |
+| 5 | Security | Core-002 |
+| 6 | Performance & resource management | Core-009 |
+| 7 | Design-document adherence | Core-002, Core-003 |
+| 8 | Code organization & conventions | Core-010 |
+| 9 | Testing coverage | Core-011 |
+| 10 | Documentation & comments | Core-012 |
+
+## Findings
+
+### Core-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/UserAuthorizationState.cs:50-68` |
+| Status | Resolved |
+
+**Description:** `NeedsRefresh` can never return `true` with the default field values. `AuthCacheMaxStaleness` defaults to 5 minutes and `MembershipFreshnessInterval` defaults to 15 minutes. `NeedsRefresh(utcNow)` is defined as `!IsStale(utcNow) && elapsed > MembershipFreshnessInterval`, i.e. it needs `elapsed > 15 min` AND `elapsed <= 5 min` at the same time, which is an empty set. The session crosses the staleness ceiling (5 min) and fails closed well before it reaches the 15-minute freshness boundary that is supposed to signal "kick off an async re-resolution while still serving cached memberships." Decisions #151 and #152 in `docs/v2/implementation/phase-6-2-authorization-runtime.md` intend the freshness window (15 min, re-resolve) to be the inner trigger and the staleness ceiling to be the outer hard limit; with these defaults the ordering is inverted, so the "refresh while warm" path is dead code and every long-lived session hard-fails authorization after 5 minutes.
+
+**Recommendation:** Either swap the defaults so `MembershipFreshnessInterval` (e.g. 5 min) is strictly less than `AuthCacheMaxStaleness` (e.g. 15 min) — matching the doc's stated intent — or, if the 5/15 values are correct, redefine which window is the refresh trigger and which is the fail-closed ceiling. Add a unit test asserting `NeedsRefresh` returns `true` for at least one point in time with the production defaults.
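+
+A small model can check the window arithmetic. The sketch is Python, not the C# `UserAuthorizationState`. It assumes `IsStale` means `elapsed > AuthCacheMaxStaleness`, which is the complement of the `elapsed <= 5 min` condition above. It scans whole minutes to show that the original 15/5 defaults never produce a refresh, while the swapped 5/15 defaults do.
+
+```python
+from datetime import timedelta
+
+
+def is_stale(elapsed, max_staleness):
+    # Assumed definition: past the fail-closed ceiling.
+    return elapsed > max_staleness
+
+
+def needs_refresh(elapsed, freshness, max_staleness):
+    # Mirrors the finding: !IsStale(utcNow) && elapsed > MembershipFreshnessInterval.
+    return not is_stale(elapsed, max_staleness) and elapsed > freshness
+
+
+def refresh_minutes(freshness, max_staleness, horizon_minutes=30):
+    """Whole minutes in [0, horizon] at which needs_refresh is true."""
+    return [
+        m for m in range(horizon_minutes + 1)
+        if needs_refresh(timedelta(minutes=m), freshness, max_staleness)
+    ]
+
+
+original = refresh_minutes(freshness=timedelta(minutes=15), max_staleness=timedelta(minutes=5))
+swapped = refresh_minutes(freshness=timedelta(minutes=5), max_staleness=timedelta(minutes=15))
+print(original)                 # []
+print(swapped[0], swapped[-1])  # 6 15
+```
+
+The regression test the recommendation asks for amounts to asserting that the second list is non-empty under the production defaults.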
+
+**Resolution:** Resolved 2026-05-22 — swapped the defaults so `MembershipFreshnessInterval` is 5 min and `AuthCacheMaxStaleness` is 15 min (freshness = inner re-resolve trigger, staleness = outer fail-closed ceiling); added a `NeedsRefresh_FiresWithin_ProductionDefault_Windows` regression test.
+
+### Core-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/TriePermissionEvaluator.cs:24-50` |
+| Status | Resolved |
+
+**Description:** `TriePermissionEvaluator.Authorize` never compares the session's `AuthGenerationId` against the generation of the trie it evaluates against. It calls `_cache.GetTrie(scope.ClusterId)` — the current-generation shortcut — and authorizes against whatever generation the cache happens to hold. `UserAuthorizationState` carries `AuthGenerationId` precisely so a stale session can be detected, and the Phase 6.2 design (`phase-6-2-authorization-runtime.md` adversarial-review item #3 "Redundancy-safe invalidation", plus the §Scope `PermissionTrieCache + freshness` row) requires the hot-path call to look up `CurrentGenerationId` and force a re-evaluation on mismatch. As written, a session bound at generation N silently evaluates against generation N+1 the instant another node publishes — grants added or removed in N+1 take effect for that session without the intended generation-stamp re-check, and the provenance returned in `AuthorizationDecision` misreports which generation produced the verdict.
+
+**Recommendation:** In `Authorize`, after resolving the trie, compare `trie.GenerationId` to `session.AuthGenerationId`. On mismatch either fetch the session's bound generation via `_cache.GetTrie(clusterId, session.AuthGenerationId)` and evaluate against it, or signal the caller to re-resolve the session's auth state before retrying. Add a test for the publish-during-session scenario.
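+
+The first option can be sketched in Python, using a hypothetical in-memory cache in place of `PermissionTrieCache`. The current trie is used only when its generation matches the session's. Otherwise the session's bound generation is looked up, and `None` (a pruned generation) tells the caller to fail closed. Class, method, and field names are illustrative, not the real `Authorize` signature.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Trie:
+    cluster_id: str
+    generation_id: int
+
+
+class TrieCache:
+    """Hypothetical stand-in: several generations per cluster plus a Current pointer."""
+
+    def __init__(self):
+        self._tries = {}    # (cluster_id, generation_id) -> Trie
+        self._current = {}  # cluster_id -> current generation_id
+
+    def install(self, trie):
+        self._tries[(trie.cluster_id, trie.generation_id)] = trie
+        self._current[trie.cluster_id] = trie.generation_id
+
+    def prune(self, cluster_id, generation_id):
+        self._tries.pop((cluster_id, generation_id), None)
+
+    def get_trie(self, cluster_id, generation_id=None):
+        if generation_id is None:
+            generation_id = self._current.get(cluster_id)
+        return self._tries.get((cluster_id, generation_id))
+
+
+def trie_for_session(cache, cluster_id, session_generation):
+    """Pick the trie a session must be evaluated against; None means fail closed."""
+    trie = cache.get_trie(cluster_id)  # current-generation shortcut
+    if trie is None or trie.generation_id == session_generation:
+        return trie
+    # Another node published: evaluate against the session's bound generation instead.
+    return cache.get_trie(cluster_id, session_generation)
+```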
+
+**Resolution:** Resolved 2026-05-22 — `Authorize` now compares `trie.GenerationId` to `session.AuthGenerationId` and, on mismatch, re-fetches the session's bound generation via `_cache.GetTrie(clusterId, session.AuthGenerationId)`, failing closed when that generation has been pruned; added publish-during-session and pruned-generation regression tests.
+
+### Core-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
+| Status | Resolved |
+
+**Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID.
+
+**Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment.
+
+**Resolution:** Resolved 2026-05-22 — added `FolderSegment` to `NodeAclScopeKind`; `WalkSystemPlatform` now reports `FolderSegment` instead of `Equipment` for each visited Galaxy folder level; added three regression tests asserting the correct scope is reported in `MatchedGrant.Scope`.
+
+### Core-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
+| Status | Open |
+
+**Description:** `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. The server host has no synchronization context so behaviour is currently correct, but the inconsistency is a maintenance hazard and a deviation from the established convention in `Core.Resilience`.
+
+**Recommendation:** Add `.ConfigureAwait(false)` to the three awaited calls in `DriverHost.RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`.
+
+**Resolution:** _(open)_
+
+### Core-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
+| Status | Resolved |
+
+**Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry — silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable.
+
+**Recommendation:** Make `Prune` use an atomic compare-and-swap loop — `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone — or perform the prune inside an `AddOrUpdate` update factory.
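+
+The shape of the compare-and-swap loop can be sketched in Python. Python has no `ConcurrentDictionary`, so the hypothetical `CasDict` emulates `TryUpdate` with a lock and an identity (`is`) comparison. The pruning policy used here (keep the newest N generations, never drop `current`) is an illustrative assumption, not the real `Prune` rule.
+
+```python
+import threading
+
+
+class CasDict:
+    """Lock-based stand-in for ConcurrentDictionary with reference-equality TryUpdate."""
+
+    def __init__(self):
+        self._lock = threading.Lock()
+        self._data = {}
+
+    def get(self, key):
+        with self._lock:
+            return self._data.get(key)
+
+    def set(self, key, value):
+        with self._lock:
+            self._data[key] = value
+
+    def try_update(self, key, new_value, comparison):
+        # Succeeds only if the stored value is still the exact object the caller observed.
+        with self._lock:
+            if key in self._data and self._data[key] is comparison:
+                self._data[key] = new_value
+                return True
+            return False
+
+
+class ClusterEntry:
+    def __init__(self, tries, current):
+        self.tries = dict(tries)  # generation id -> trie
+        self.current = current    # generation id of the Current pointer
+
+
+def prune(store, cluster_id, keep_latest):
+    """Drop all but the newest `keep_latest` generations, never the current one."""
+    if keep_latest < 1:
+        raise ValueError("keep_latest must be at least 1")
+    while True:
+        observed = store.get(cluster_id)
+        if observed is None:
+            return  # cluster removed concurrently; nothing to prune
+        keep = set(sorted(observed.tries)[-keep_latest:]) | {observed.current}
+        pruned = ClusterEntry(
+            {g: t for g, t in observed.tries.items() if g in keep}, observed.current
+        )
+        if store.try_update(cluster_id, pruned, observed):
+            return
+        # Another writer (e.g. Install) replaced the entry; recompute from the new one.
+```
+
+In C# the same shape is a `TryGetValue` / compute / `TryUpdate` loop. `TryUpdate` compares values with the default equality comparer, and for a `sealed class` with no `Equals` override that comparison is reference equality.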
+
+**Resolution:** Resolved 2026-05-22 — changed `ClusterEntry` from `sealed record` to `sealed class` (enabling reference-equality CAS via `TryUpdate`); `Prune` now uses a read-compute-`TryUpdate` retry loop that restarts if another thread updates the entry between the read and the write; added regression tests asserting the current generation is preserved after a concurrent install + prune sequence.
+
+### Core-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
+| Status | Resolved |
+
+**Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher` → `IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method.
+
+**Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed.
+
+**Resolution:** Resolved 2026-05-22 — `BuildAddressSpaceAsync` now checks `_disposed` (throws `ObjectDisposedException`) and tears down the previous alarm forwarder + clears the sink registry before re-walking so a Galaxy-redeploy rebuild does not double-subscribe the forwarder; added three regression tests covering double-build, sink-count after rebuild, and post-dispose throw.
+
+### Core-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
+| Status | Resolved |
+
+**Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync`, but subscribe routing is per-host and unsubscribe routing is not.
+
+**Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on.
+
+**Resolution:** Resolved 2026-05-22 — `SubscribeAsync` now wraps each driver handle in a `HostBoundHandle` (private `IAlarmSubscriptionHandle` implementation) that carries the resolved host name; `UnsubscribeAsync` unwraps it and routes through the recorded host's pipeline, falling back to the default host for handles not created by this invoker; added two regression tests verifying per-host routing and single-host fallback.
+
+### Core-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
+| Status | Open |
+
+**Description:** The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception from `discovery.DiscoverAsync` propagates straight out unhandled, and nothing here marks a subtree Faulted. The isolation is presumably done by the server-layer caller, but the comment asserts behaviour this class does not implement.
+
+**Recommendation:** Either implement the documented isolation in `GenericDriverNodeManager`, or correct the XML doc to state that exception isolation is the caller's responsibility and name the type that performs it.
+
+**Resolution:** _(open)_
+
+### Core-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
+| Status | Open |
+
+**Description:** `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path it rebuilds a fresh `DriverResilienceOptions` and a one-entry dictionary on every non-idempotent write, and the redundant accessor calls could observe two different snapshots if an Admin edit lands between them. Phase 6.1 budgets a 1% pipeline overhead; this is unnecessary allocation plus a minor consistency hazard.
+
+**Recommendation:** Capture `var options = _optionsAccessor();` once at the top of the non-idempotent branch and derive both the `with` and the `Resolve` call from that snapshot. Consider caching the no-retry pipeline keyed on `(hostName, non-idempotent)`.
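+
+The snapshot-once shape can be sketched in Python. A frozen dataclass and `dataclasses.replace` stand in for the C# record and `with` expression. `ResilienceOptions`, `retry_count`, and `non_idempotent_pipeline_options` are hypothetical names. The point is that every derived value comes from one accessor call, so an Admin edit that lands mid-call cannot split the values across two snapshots.
+
+```python
+from dataclasses import dataclass, replace
+
+
+@dataclass(frozen=True)
+class ResilienceOptions:
+    tier: str
+    retry_count: int
+
+
+def non_idempotent_pipeline_options(options_accessor, host_name):
+    """Derive the no-retry options for one write from a single options snapshot."""
+    options = options_accessor()                # the only accessor call for this write
+    no_retry = replace(options, retry_count=0)  # analogue of the C# `with` expression
+    return {host_name: no_retry}
+```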
+
+**Resolution:** _(open)_
+
+### Core-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
+| Status | Open |
+
+**Description:** `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will make `Resolve` throw `KeyNotFoundException` at runtime on the capability hot path rather than failing at build time. The two are coupled by convention only.
+
+**Recommendation:** Either add a `default` arm to `Resolve` returning a conservative policy (and logging), or add a unit-test invariant asserting every `DriverCapability` value is present in each tier's default table.
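+
+The second option, an enum-coverage invariant, can be sketched in Python. The `DriverCapability` members and the tier names are placeholders, not the project's real values. The check reports every tier table that lacks a member. Adding an enum value without updating `GetTierDefaults` then fails the test suite, instead of throwing `KeyNotFoundException` on the hot path.
+
+```python
+from enum import Enum
+
+
+class DriverCapability(Enum):  # placeholder members
+    READ = "Read"
+    WRITE = "Write"
+    SUBSCRIBE = "Subscribe"
+
+
+def missing_capabilities(tier_defaults, capability_enum):
+    """Map each tier name to the capability members its default table lacks."""
+    return {
+        tier: [cap.name for cap in capability_enum if cap not in table]
+        for tier, table in tier_defaults.items()
+        if any(cap not in table for cap in capability_enum)
+    }
+
+
+tier_defaults = {
+    "A": {cap: "policy" for cap in DriverCapability},
+    "B": {cap: "policy" for cap in DriverCapability},
+    "C": {DriverCapability.READ: "policy", DriverCapability.WRITE: "policy"},  # SUBSCRIBE forgotten
+}
+print(missing_capabilities(tier_defaults, DriverCapability))  # {'C': ['SUBSCRIBE']}
+```
+
+The unit test then asserts that the result is an empty mapping for the real tier tables.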
+
+**Resolution:** _(open)_
+
+### Core-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
+| Status | Open |
+
+**Description:** `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for deterministic tests, not for production"). The fallback silently produces a structurally incorrect trie when `scopePaths` is null or a row's `ScopeId` is missing — a UnsLine-scoped grant ends up as a direct child of the root, so `WalkEquipment` / `WalkSystemPlatform` never reach it and the grant is effectively dropped, with no diagnostic. There is no test asserting the production multi-level descent versus the fallback.
+
+**Recommendation:** Add unit tests covering `Build` with `scopePaths` producing the correct multi-level trie and the missing-`ScopeId` fallback. Have `Descend` surface a diagnostic (or throw outside test configuration) when a sub-cluster row cannot be located in `scopePaths`.
+
+**Resolution:** _(open)_
+
+### Core-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
+| Status | Open |
+
+**Description:** Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is documented as stateless; the comment describes nothing the constructor does. (2) `DriverHealthReport` — the `<remarks>` state matrix lists Unknown, Initializing, Healthy, Degraded, Faulted but `Aggregate` (lines 42-44) also folds `DriverState.Reconnecting` into the Degraded verdict. `Reconnecting` is a real `DriverState` member absent from the documented matrix.
+
+**Recommendation:** Replace the `WedgeDetector` constructor `<summary>` with an accurate description (e.g. "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s"). Add `Reconnecting` to the `DriverHealthReport` `<remarks>` state matrix and state it maps to Degraded.
+
+**Resolution:** _(open)_
@@ -0,0 +1,238 @@
+# Code Review — Driver.AbCip.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.AbCip.Cli-001, Driver.AbCip.Cli-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.AbCip.Cli-003 |
+| 4 | Error handling & resilience | Driver.AbCip.Cli-001, Driver.AbCip.Cli-004 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.AbCip.Cli-005 |
+| 7 | Design-document adherence | Driver.AbCip.Cli-006 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.AbCip.Cli-007 |
+| 10 | Documentation & comments | Driver.AbCip.Cli-008 |
+
+## Findings
+
+<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
+     never reused. Findings are never deleted — close them by changing Status and
+     completing Resolution. -->
+
+### Driver.AbCip.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` |
+| Status | Resolved |
+
+**Description:** `ParseValue` parses every numeric Logix type with the BCL `*.Parse`
+methods (`sbyte.Parse`, `short.Parse`, `int.Parse`, `float.Parse`, ...). These throw
+the raw `FormatException` and `OverflowException` on bad operator input. The module's
+own test `ParseValue_non_numeric_for_numeric_types_throws` confirms a raw
+`FormatException` escapes for `DInt`. Meanwhile the `Bool` branch and the `_ =>`
+default branch throw the CLI-friendly `CliFx.Exceptions.CommandException` with an
+actionable message. The result is inconsistent operator UX: a typo in a boolean
+value prints "Boolean value 'x' is not recognised...", but a typo in a numeric
+value (`write -v 12x --type DInt`, or an out-of-range `write -v 99999999999 --type
+Int`) escapes uncaught and CliFx renders a full .NET stack trace instead of a
+one-line error. CliFx only formats `CommandException` cleanly.
+
+**Recommendation:** Wrap the numeric `*.Parse` calls (or the whole `switch`) in a
+`try`/`catch (Exception ex) when (ex is FormatException or OverflowException)` that
+rethrows as a `CommandException` with the raw value, the target `--type`, and the
+valid range — mirroring the `ParseBool` failure message.
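+
+The recommended message shape can be sketched in Python. Python's `int()` never overflows, so the range check here is explicit, whereas the C# `*.Parse` methods raise `OverflowException` on their own. `CommandError` stands in for CliFx's `CommandException`. The ranges are the signed Logix SINT/INT/DINT/LINT widths (8, 16, 32, and 64 bits). The `--type` spellings used as keys are assumed from the examples in the description.
+
+```python
+class CommandError(Exception):
+    """Stand-in for CliFx.Exceptions.CommandException."""
+
+
+# Signed Logix integer ranges, keyed by assumed --type names.
+INTEGER_RANGES = {
+    "SInt": (-(2**7), 2**7 - 1),
+    "Int": (-(2**15), 2**15 - 1),
+    "DInt": (-(2**31), 2**31 - 1),
+    "LInt": (-(2**63), 2**63 - 1),
+}
+
+
+def parse_integer(raw, type_name):
+    low, high = INTEGER_RANGES[type_name]
+    hint = f"--type {type_name} accepts integers in [{low}, {high}]"
+    try:
+        value = int(raw)
+    except ValueError:
+        # Equivalent of catching FormatException: report the raw value and the valid range.
+        raise CommandError(f"Value '{raw}' is not a valid integer; {hint}.") from None
+    if not low <= value <= high:
+        # Equivalent of catching OverflowException.
+        raise CommandError(f"Value '{raw}' is out of range; {hint}.")
+    return value
+```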
+
+**Resolution:** Resolved 2026-05-22 — wrapped the `ParseValue` switch in `try/catch (FormatException or OverflowException)` that rethrows as `CommandException` with the raw value and type; updated the previously-passing `ParseValue_non_numeric_for_numeric_types_throws` test to assert `CommandException` and added two new tests covering overflow and actionable message content.
+
+### Driver.AbCip.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` |
+| Status | Resolved |
+
+**Description:** `ProbeCommand`, `ReadCommand`, and `SubscribeCommand` expose
+`--type` as a free `AbCipDataType` enum option with no exclusion of
+`AbCipDataType.Structure`. Only `WriteCommand` rejects `Structure` (with an explicit
+`CommandException`). Passing `probe/read/subscribe --type Structure` synthesises a
+tag with `DataType = Structure` and no `Members` declared. The driver read path
+treats a memberless Structure tag as a black box and routes it to the per-tag
+fallback, where `runtime.DecodeValue(AbCipDataType.Structure, ...)` runs with no
+declared layout — the operator gets either an opaque value or a confusing status
+code rather than the clean "Structure writes need an explicit member layout"
+guidance `write` gives. The `read` doc comment even claims "UDT / Structure reads
+are out of scope here", but the code does not enforce it.
+
+**Recommendation:** Reject `AbCipDataType.Structure` in `ProbeCommand`,
+`ReadCommand`, and `SubscribeCommand` `ExecuteAsync` with the same `CommandException`
+pattern `WriteCommand` uses, or factor a shared `RejectStructure(DataType)` guard
+into `AbCipCommandBase`.
+
+**Resolution:** Resolved 2026-05-22 — added `RejectStructure(AbCipDataType)` static helper to `AbCipCommandBase` that throws `CommandException` for `Structure`; called at the top of `ExecuteAsync` in `ProbeCommand`, `ReadCommand`, and `SubscribeCommand`.
+
+### Driver.AbCip.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` |
+| Status | Open |
+
+**Description:** The `OnDataChange` handler writes change lines to `console.Output`
+(a `TextWriter`) from the driver's poll-engine callback thread, while the command's
+main flow concurrently writes the "Subscribed to ... Ctrl+C to stop." line on the
+CLI thread. `TextWriter.WriteLine` is not guaranteed thread-safe; concurrent writes
+from the poll thread and the main thread can interleave or, in the worst case,
+corrupt buffered output. The window is small (one main-thread write right after
+`SubscribeAsync`) but it exists, and any future addition of main-thread output
+during the watch loop widens it.
+
+**Recommendation:** Emit the "Subscribed..." banner before registering the
+`OnDataChange` handler (or before `SubscribeAsync`), or guard all `console.Output`
+writes during the subscription with a shared lock so poll-thread and main-thread
+output cannot interleave.
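+
+The lock-based option can be sketched in Python. A single writer object owns the lock, and both the poll-callback thread and the main thread write through it, so whole lines never interleave. `LockedLineWriter` is a hypothetical helper, and the banner text is illustrative. In C# the equivalent is a `lock` around each `console.Output.WriteLine` call, or wrapping the writer with `TextWriter.Synchronized`.
+
+```python
+import io
+import threading
+
+
+class LockedLineWriter:
+    """Serialises whole-line writes from several threads onto one text stream."""
+
+    def __init__(self, inner):
+        self._inner = inner
+        self._lock = threading.Lock()
+
+    def write_line(self, text):
+        with self._lock:
+            self._inner.write(text + "\n")
+
+
+def demo(lines_per_thread=200):
+    buffer = io.StringIO()
+    out = LockedLineWriter(buffer)
+
+    def poll_callback():  # plays the role of the OnDataChange handler
+        for i in range(lines_per_thread):
+            out.write_line(f"tag changed #{i}")
+
+    poller = threading.Thread(target=poll_callback)
+    poller.start()
+    out.write_line("Subscribed to <tag>. Ctrl+C to stop.")  # main-thread banner
+    poller.join()
+    return buffer.getvalue().splitlines()
+```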
+
+**Resolution:** _(open)_
+
+### Driver.AbCip.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` |
+| Status | Open |
+
+**Description:** `--interval-ms` (`IntervalMs`) is taken verbatim and passed as
+`TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A
+zero or negative value produces a non-positive `TimeSpan`; the option description
+claims "PollGroupEngine floors sub-250ms values" but says nothing about `0` or
+negatives, and the flooring behaviour is the engine's, not the CLI's — relying on a
+downstream component to sanitise operator input is fragile. `--timeout-ms` on
+`AbCipCommandBase` has the same gap (a negative value yields a negative `TimeSpan`).
+
+**Recommendation:** Validate `IntervalMs > 0` and `TimeoutMs > 0` at the top of
+`ExecuteAsync` / in `AbCipCommandBase`, throwing a `CommandException` with the
+accepted range when out of bounds.
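+
+The guard can be sketched in Python. `CommandError` again stands in for `CommandException`, and `to_positive_timedelta` is a hypothetical helper. The value is converted only after it passes validation, so a non-positive interval or timeout never reaches `SubscribeAsync` or the driver options.
+
+```python
+from datetime import timedelta
+
+
+class CommandError(Exception):
+    """Stand-in for CliFx.Exceptions.CommandException."""
+
+
+def to_positive_timedelta(option_name, milliseconds):
+    # Reject zero and negatives at the CLI boundary instead of relying on the engine.
+    if milliseconds <= 0:
+        raise CommandError(
+            f"{option_name} must be a positive number of milliseconds (got {milliseconds})."
+        )
+    return timedelta(milliseconds=milliseconds)
+```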
+
+**Resolution:** _(open)_
+
+### Driver.AbCip.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
+| Status | Open |
+
+**Description:** `ConfigureLogging` assigns a freshly created Serilog logger to the
+process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived
+one-shot command (`probe`, `read`, `write`) the process exit flushes the console
+sink, so the practical impact is nil. For `subscribe` — a long-running command
+terminated by Ctrl+C — buffered log lines emitted just before cancellation can be
+lost on abrupt exit. (This lives in the shared `Driver.Cli.Common` base, so it is
+noted here as it affects the AB CIP CLI; the canonical fix belongs in that shared
+module's review.)
+
+**Recommendation:** Register `Log.CloseAndFlush()` on process exit (e.g. via
+`AppDomain.ProcessExit` or a `finally` in the command), or have the CLI use a
+disposable logger scoped to `ExecuteAsync`.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip.Cli-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` |
+| Status | Open |
+
+**Description:** `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout`
+property with a getter derived from `TimeoutMs` and an empty `init` body
+(`init { /* driven by TimeoutMs */ }`). Because the override has no
+`[CommandOption]` attribute, CliFx never binds it, so the empty `init` is unreachable
+in normal CLI use. However, an empty `init` accessor silently discards any
+assignment — if a future caller or test constructs the command via an object
+initializer (`new ReadCommand { Timeout = ... }`) the assignment is silently dropped
+with no compiler warning. This is a latent correctness trap rather than a current
+bug.
+
+**Recommendation:** Either drop the `init` accessor entirely (make the override a
+get-only expression-bodied property) or have the empty `init` throw
+`NotSupportedException` to make the "driven by TimeoutMs" contract explicit and
+fail-fast.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip.Cli-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` |
+| Status | Open |
+
+**Description:** The only test file covers `WriteCommand.ParseValue` and
+`ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for
+`AbCipCommandBase.BuildOptions` (the flag-to-`AbCipDriverOptions` mapping that all
+four commands depend on) or `DriverInstanceId`. `BuildOptions` is pure and trivially
+unit-testable yet untested: a regression that, say, flipped `EnableAlarmProjection`
+back on or dropped `Probe.Enabled = false` would not be caught — and the comment
+explicitly warns the probe loop "would race the operator's own reads", so that
+mapping is behaviourally load-bearing. The `ExecuteAsync` bodies are reasonably left
+untested (they need a fake `AbCipDriver` or hardware), consistent with the other
+driver CLIs.
+
+**Recommendation:** Add unit tests asserting `BuildOptions` produces
+`Probe.Enabled == false`, `EnableControllerBrowse == false`,
+`EnableAlarmProjection == false`, the expected single `AbCipDeviceOptions`
+(`HostAddress`, `PlcFamily`, `DeviceName`), the supplied tag list, and the `Timeout`
+derived from `TimeoutMs`.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip.Cli-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `docs/Driver.AbCip.Cli.md:8-9` |
+| Status | Open |
+
+**Description:** `docs/Driver.AbCip.Cli.md` opens with "Second of four driver
+test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four"
+contradicts the chain that follows it (five names) and contradicts
+`docs/DriverClis.md`, which documents six CLIs (Modbus, AB CIP, AB Legacy, S7,
+TwinCAT, FOCAS). The FOCAS CLI shipped alongside the Tier-C work, so the AB CIP
+doc's "four" and the truncated chain are both stale.
+
+**Recommendation:** Update the sentence to "Second of six driver test-client CLIs"
+and complete the chain (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT -> FOCAS), or
+drop the explicit count and link `docs/DriverClis.md` as the authoritative roster.
+
+**Resolution:** _(open)_
@@ -0,0 +1,252 @@
+# Code Review — Driver.AbCip
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.AbCip-001, Driver.AbCip-002, Driver.AbCip-003, Driver.AbCip-004, Driver.AbCip-005 |
+| 2 | OtOpcUa conventions | Driver.AbCip-006, Driver.AbCip-007 |
+| 3 | Concurrency & thread safety | Driver.AbCip-008, Driver.AbCip-009 |
+| 4 | Error handling & resilience | Driver.AbCip-010, Driver.AbCip-011 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.AbCip-012 |
+| 7 | Design-document adherence | Driver.AbCip-013 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.AbCip-014 |
+| 10 | Documentation & comments | Driver.AbCip-015 |
+
+## Findings
+
+### Driver.AbCip-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
+| Status | Resolved |
+
+**Description:** `InitializeAsync(string driverConfigJson, ...)` never reads `driverConfigJson`. It builds all device/tag state from `_options`, captured at construction time. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync(driverConfigJson, ...)` and the JSON it is handed is silently discarded. `ReinitializeAsync` is documented (class remarks, lines 18-21) as the Tier-B escape hatch and is the IDriver entry point for picking up changed config. As written, a reinitialize with an updated config JSON (new device, new tag, changed timeout) applies none of the changes; the driver keeps running stale construction-time options. There is no validation that the passed JSON even matches the live options.
+
+**Recommendation:** Either parse `driverConfigJson` inside `InitializeAsync` (re-deriving `AbCipDriverOptions` the way `AbCipDriverFactoryExtensions.CreateInstance` does, so config changes take effect on reinit), or, if config is intentionally immutable for the instance lifetime, document explicitly that AbCip ignores the parameter and assert the JSON is structurally identical to the construction options. Silently discarding it is the worst of both.
+
+**Resolution:** Resolved 2026-05-22 — extracted `AbCipDriverFactoryExtensions.ParseOptions` and `InitializeAsync` now re-parses a content-bearing `driverConfigJson`, replacing `_options` (and recreating the alarm projection) so `ReinitializeAsync` applies config changes; a blank/empty-object JSON keeps construction-time options for the test seam.
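+
+The sketch below shows one way to implement the "content-bearing" check the resolution describes. It assumes that "blank" means null or whitespace, and that "empty object" means a JSON `{}` with no properties. `DriverConfigJson.IsContentBearing` is a hypothetical helper, not the driver's actual code. In `InitializeAsync`, a `true` result would be the cue to call `ParseOptions` and replace `_options`.
+
+```csharp
+using System.Linq;
+using System.Text.Json;
+
+// Hypothetical helper: decides whether a driverConfigJson argument carries
+// settings worth re-parsing, or is a blank / "{}" placeholder that should
+// leave the construction-time options in place.
+public static class DriverConfigJson
+{
+    public static bool IsContentBearing(string? json)
+    {
+        if (string.IsNullOrWhiteSpace(json)) return false;
+
+        // Malformed JSON throws JsonException here, surfacing as an init failure.
+        using var doc = JsonDocument.Parse(json);
+        var root = doc.RootElement;
+
+        // An empty object carries no settings; any other JSON is treated as config.
+        return !(root.ValueKind == JsonValueKind.Object && !root.EnumerateObject().Any());
+    }
+}
+```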
+
+### Driver.AbCip-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `AbCipStatusMapper.cs:65-78` |
+| Status | Resolved |
+
+**Description:** `MapLibplctagStatus` maps negative libplctag codes that do not match the libplctag.NET `Status` enum / native `libplctag.h` constants. `LibplctagTagRuntime.GetStatus()` returns `(int)_tag.GetStatus()`, the underlying value of the `Status` enum, whose members carry the native `PLCTAG_ERR_*` integer values. The real constants are `PLCTAG_ERR_BAD_CONNECTION = -7` (the only one the code gets right), `PLCTAG_ERR_NOT_FOUND = -18` (code expects -14), `PLCTAG_ERR_NOT_ALLOWED = -19` (code expects -16), `PLCTAG_ERR_OUT_OF_BOUNDS = -22` (code expects -17), and `PLCTAG_ERR_TIMEOUT = -32` (code expects -5). Consequently, real timeout, not-found, not-allowed, and out-of-bounds errors all fall through the switch to the `_ => BadCommunicationError` default. The driver reports `BadCommunicationError` instead of `BadNodeIdUnknown` for a non-existent tag, instead of `BadNotWritable` for a read-only tag, and instead of `BadTimeout` for a timeout. This defeats the transient-vs-permanent error classification the resilience pipeline relies on.
+
+**Recommendation:** Replace the hand-typed integer literals with the libplctag.NET `Status` enum members (`Status.ErrorTimeout`, `Status.ErrorNotFound`, `Status.ErrorNotAllowed`, `Status.ErrorOutOfBounds`, `Status.ErrorBadConnection`, etc.), or at minimum correct the integer values to -32 / -18 / -19 / -22. Map `Status.Pending` explicitly rather than treating "any positive value" as `GoodMoreData`.
+
+**Resolution:** Resolved 2026-05-22 — `MapLibplctagStatus` now switches on the libplctag.NET `Status` enum members (Ok/Pending/ErrorTimeout/ErrorNotFound/ErrorNotAllowed/ErrorOutOfBounds/…) instead of hand-typed integers; the `int` overload casts to `Status` so the `GetStatus()` seam stays correct against the wrapper's contiguous renumbering. Note: the live libplctag.NET 1.5.2 `Status` enum is renumbered contiguously, so the correct underlying integers are -32/-19/-18/-27, not the native -32/-18/-19/-22 the finding suggested; switching on the enum members sidesteps the hazard entirely.
+
+### Driver.AbCip-003
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `AbCipUdtMemberLayout.cs:32-54`, `AbCipDriver.cs:426-430`, `AbCipUdtReadPlanner.cs:48` |
+| Status | Resolved |
+
+**Description:** The whole-UDT read path (`ReadGroupAsync`) decodes each grouped member at the byte offset produced by `AbCipUdtMemberLayout.TryBuild`. That method computes offsets purely from the declaration order of the configured `AbCipStructureMember` list, under natural-alignment rules. Logix does not guarantee that the controller lays UDT members out in declaration order. The Studio 5000 compiler reorders members (largest-first packing, BOOL host bytes, nested-struct padding), and the on-wire offsets come only from the CIP Template Object. The class remarks on `AbCipUdtMemberLayout` and `driver-specs.md` both acknowledge this. The decoder for the real shape (`CipTemplateObjectDecoder` / `AbCipTemplateCache`) exists and is populated by `FetchUdtShapeAsync`, but `ReadGroupAsync` never consults it; it always uses the declaration-only layout. For any UDT whose member declaration order in config differs from the controller's compiled layout, whole-UDT reads return values decoded from the wrong offsets: plausible-looking but wrong numbers, with no error raised.
+
+**Recommendation:** In the read planner / `ReadGroupAsync`, prefer the cached `AbCipUdtShape` offsets (from `AbCipTemplateCache` / `FetchUdtShapeAsync`) when available, and only fall back to `AbCipUdtMemberLayout` declaration-order offsets when no template shape can be read. Even then, consider gating the declaration-only fast path behind an explicit opt-in flag, since it is correct only when the operator has hand-verified declaration order matches the controller.
+
+**Resolution:** Resolved 2026-05-22 — the declaration-only whole-UDT grouping fast path is now gated behind the new opt-in `AbCipDriverOptions.EnableDeclarationOnlyUdtGrouping` flag (default `false`); `AbCipUdtReadPlanner.Build` forms no groups when it is off, so by default every UDT member reads per-tag instead of decoding at possibly-wrong declaration-order offsets. The richer CIP Template Object path remains the long-term fix.
+
+### Driver.AbCip-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` |
+| Status | Resolved |
+
+**Description:** `ToDriverDataType` maps `LInt`/`ULInt` to `DriverDataType.Int32` (a TODO comment notes the gap) and `Dt` to `Int32`. But `LibplctagTagRuntime.DecodeValueAt` returns an actual `long` for `LInt`/`ULInt` (`_tag.GetInt64`, `(long)_tag.GetUInt64`). The address space is built declaring an Int32 node while the driver hands the server a boxed `long` `DataValueSnapshot.Value` at runtime: a mismatch between the declared OPC UA data type and the runtime value type. For `LInt` values exceeding Int32.MaxValue there is data loss if any consumer narrows it. `UDInt` is declared Int32 but decoded as `(int)_tag.GetUInt32`, so values above int.MaxValue wrap to negative.
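+
+The snippet below illustrates the `UDInt` wrap described above, assuming the default unchecked C# arithmetic context. The two methods are hypothetical stand-ins for the pre-fix and post-fix decode of a value read via `GetUInt32`. The value 3,000,000,000 is a legal `UDInt` but exceeds `int.MaxValue` (2,147,483,647). The `(int)` cast therefore reinterprets the bits as 3,000,000,000 - 4,294,967,296 = -1,294,967,296.
+
+```csharp
+public static class UDIntDecodeDemo
+{
+    // Pre-fix behaviour: the uint read for a UDInt was cast straight to int.
+    // unchecked() makes the default wrap explicit: values above int.MaxValue go negative.
+    public static int DecodeAsInt32(uint raw) => unchecked((int)raw);
+
+    // Post-fix behaviour: the value stays a uint, matching a UInt32 node declaration.
+    public static uint DecodeAsUInt32(uint raw) => raw;
+}
+```
+
+Values at or below `int.MaxValue` decode identically either way. The defect therefore stays invisible until a counter crosses 2,147,483,647.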
+
+**Recommendation:** Either add Int64/UInt32/UInt64 to `DriverDataType` and map correctly, or, until that lands, decode `LInt`/`ULInt` consistently with the declared `Int32` type (and document the truncation), and decode `UDInt` as a value that fits Int32 semantics. The declared type and the runtime value type must agree.
+
+**Resolution:** Resolved 2026-05-22 — `ToDriverDataType` now maps `LInt`→`Int64`, `ULInt`→`UInt64`, `UDInt`→`UInt32` (all already present in `DriverDataType`); `DecodeValueAt` updated to return `uint`/`ulong` for UDInt/ULInt respectively so the declared type and runtime value agree. The `(int)` and `(long)` casts that caused truncation/wrap are removed.
+
+### Driver.AbCip-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `AbCipDriver.cs:124-141` |
+| Status | Resolved |
+
+**Description:** In `InitializeAsync`, when a `Structure` tag declares `Members`, the loop registers each fanned-out member into `_tagsByName` but the parent Structure tag itself is also left in `_tagsByName` (added at line 125 before the member check). A subsequent `ReadAsync` of the parent name routes through `ReadSingleAsync` then `DecodeValue(AbCipDataType.Structure, ...)` which returns `null` with `Good` status. A client reading the parent UDT node thus gets a Good/null snapshot rather than a fault or a structured value. Also, member registration does not check for name collisions: if two configured tags produce the same parent-dot-member key (or a member name collides with an independently-declared tag), the later silently overwrites the earlier with no diagnostic.
+
+**Recommendation:** Decide the parent-Structure read contract explicitly: either do not register the bare parent name as a readable tag, or have the Structure read return a proper status. Add a duplicate-key check during `_tagsByName` population that throws an `InvalidOperationException` naming both colliding tags, consistent with the fail-fast validation `AbCipHostAddress` parsing already does.
+
+**Resolution:** Resolved 2026-05-22 — The parent Structure tag remains in `_tagsByName` so the whole-UDT grouping planner (Driver.AbCip-003 fast path) and alarm projection can still find it. `ReadSingleAsync` now detects a direct read of a Structure-with-Members and returns `BadNotSupported` instead of Good/null, documenting that callers must address individual member paths. Both scalar and member fan-out registration perform a duplicate-key check that throws `InvalidOperationException` naming both colliding entries (fail-fast, consistent with `AbCipHostAddress` validation).
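+
+The sketch below shows the fail-fast duplicate-key check. It assumes `_tagsByName` is a plain `Dictionary<string, TValue>` keyed by the full (possibly `parent.member`) name. `TagDefinition` and `TagRegistry.AddUnique` are hypothetical stand-ins for the driver's own types. The point is that `Dictionary.TryAdd` reports the collision, where an indexer assignment would silently overwrite the earlier entry.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical stand-in for a configured tag: its lookup key plus a
+// human-readable description of where it came from (for the error message).
+public sealed record TagDefinition(string Name, string Origin);
+
+public static class TagRegistry
+{
+    // Registers a tag, throwing if the key is already taken rather than letting
+    // the later registration overwrite the earlier one without a diagnostic.
+    public static void AddUnique(Dictionary<string, TagDefinition> tagsByName, TagDefinition tag)
+    {
+        if (tagsByName.TryAdd(tag.Name, tag)) return;
+
+        var existing = tagsByName[tag.Name];
+        throw new InvalidOperationException(
+            $"Duplicate tag name '{tag.Name}': '{tag.Origin}' collides with '{existing.Origin}'.");
+    }
+}
+```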
+
+### Driver.AbCip-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | OtOpcUa conventions |
+| Location | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` |
+| Status | Resolved |
+
+**Description:** `driver-specs.md` makes the SafeHandle-wrapped native handle a non-negotiable Tier-B protection ("Wrap every libplctag handle in a SafeHandle with finalizer calling plc_tag_destroy"). The repo ships `PlcTagHandle : SafeHandle` for this, but it is dead code. `ReleaseHandle` is a permanent no-op (the comment says the `plc_tag_destroy` P/Invoke "is deferred to PR 3", well past the commit under review), and `DeviceState.TagHandles` is never populated anywhere in the driver. The real native lifetime is delegated to the libplctag.NET `Tag` object's own `Dispose()`. The mandated finalizer-backed leak protection therefore does not exist. Suppose a `LibplctagTagRuntime` is GC-collected without `Dispose`, for example because the owning thread crashes or an exception bypasses the device dispose path. Whether the native tag is then freed depends entirely on whether the libplctag.NET `Tag` has its own finalizer. The design requires the driver code to guarantee this, and it does not.
+
+**Recommendation:** Either delete `PlcTagHandle` and `DeviceState.TagHandles` as misleading dead scaffolding and document that native lifetime is owned by the libplctag.NET `Tag` finalizer (after verifying that `Tag` actually has one), or finish the intended design by making `LibplctagTagRuntime` hold a real `PlcTagHandle` with a working `ReleaseHandle` that calls `plc_tag_destroy`.
+
+**Resolution:** Resolved 2026-05-22 — `PlcTagHandle.cs` deleted; `DeviceState.TagHandles` removed from `DeviceState`; its `DisposeHandles` loop cleaned up. The class-level doc comment on `AbCipDriver` updated to document that native lifetime is owned by libplctag.NET `Tag.Dispose()` (called in `DisposeHandles`) with the library's own finalizer covering GC-collected instances. The two dead-code test methods for `PlcTagHandle` removed from `AbCipDriverTests`.
+
+### Driver.AbCip-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `AbCipDriver.cs` (whole file), `AbCipAlarmProjection.cs`, `LibplctagTagRuntime.cs` |
+| Status | Open |
+
+**Description:** `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. The driver has no logging at all: no `ILogger`/Serilog dependency is injected or used. Failure paths instead swallow exceptions into the `_health` string (`ReadSingleAsync`, `WriteAsync`, `FetchUdtShapeAsync` catch-all, `ProbeLoopAsync` empty catch, `AbCipAlarmProjection.RunPollLoopAsync` empty catch). An operator looking at server logs sees nothing for a probe loop failing every tick for hours, a template decode that silently returned null, or an alarm poll loop throwing every interval. The health surface carries only the last error message, so a transient error immediately overwrites a more important earlier one.
+
+**Recommendation:** Inject an `ILogger` (Serilog) and log at least device init failures, per-call read/write transport errors (debounced), probe-loop failures, template-read failures, and alarm-poll-loop exceptions. The health surface is for state, not for the audit trail.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip-008
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `AbCipDriver.cs:144-152`, `AbCipDriver.cs:169-183`, `AbCipDriver.cs:235-281` |
+| Status | Resolved |
+
+**Description:** Probe loops are started fire-and-forget (`_ = Task.Run(() => ProbeLoopAsync(state, ct), ct)`) and the resulting Task is never stored or awaited. `ShutdownAsync` cancels `state.ProbeCts`, immediately disposes it, sets it to null, and calls `state.DisposeHandles()`. It does not wait for `ProbeLoopAsync` to observe the cancellation and exit. This causes three races:
+
+1. The still-running probe loop may be mid-await against a `ProbeCts` that `ShutdownAsync` has already disposed, producing `ObjectDisposedException` on the loop thread.
+2. `DisposeHandles` clears `Runtimes`/`ParentRuntimes` while a concurrent `ReadAsync`/`WriteAsync` from the alarm projection or a subscription poll may be iterating or adding to those plain `Dictionary` instances. These are not thread-safe, so the dictionary can be corrupted or the call can throw.
+3. The probe runtime created inside `ProbeLoopAsync` is never tracked by `DeviceState`, so `DisposeHandles` cannot dispose it. Only the loop's own `finally` block does, and that may run after `ShutdownAsync` returns.
+
+**Recommendation:** Store each probe `Task` on `DeviceState`. In `ShutdownAsync`, cancel the CTS, then await `Task.WhenAll` (with a timeout) before disposing the CTS or the handles. Guard `Runtimes`/`ParentRuntimes` with a lock or switch to `ConcurrentDictionary`. Make `ShutdownAsync` idempotent and safe against in-flight `ReadAsync`/`WriteAsync`.
+
+**Resolution:** Resolved 2026-05-22 — each probe loop's `Task` is stored on `DeviceState.ProbeTask`; `ShutdownAsync` now runs three phases (cancel every CTS, then await each probe Task with a 10s timeout, then dispose the CTS + handles) so the loop never touches a disposed CTS or cleared dictionary. `DeviceState.Runtimes` / `ParentRuntimes` are now `ConcurrentDictionary`, and `EnsureTagRuntimeAsync` / `EnsureParentRuntimeAsync` use `TryAdd` and dispose the losing concurrent creator instead of leaking it. `ShutdownAsync` stays idempotent (a second call sees the cleared `_devices`).
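+
+The three-phase shutdown in the resolution can be sketched as follows. `ProbeState` and `ProbeShutdown` are hypothetical simplifications of `DeviceState` and `ShutdownAsync`; the real `DisposeHandles` disposes native runtimes rather than setting a flag. The sketch assumes .NET 6 or later for `Task.WaitAsync(TimeSpan)`. The ordering is the point: no CTS or handle is disposed until every loop has either exited or exceeded the bounded wait.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical per-device state: only the fields the shutdown sequence needs.
+public sealed class ProbeState
+{
+    public CancellationTokenSource? ProbeCts { get; set; }
+    public Task? ProbeTask { get; set; }
+    public bool HandlesDisposed { get; private set; }
+    public void DisposeHandles() => HandlesDisposed = true;
+}
+
+public static class ProbeShutdown
+{
+    public static async Task ShutdownAsync(IReadOnlyList<ProbeState> devices, TimeSpan timeout)
+    {
+        // Phase 1: signal every probe loop to stop.
+        foreach (var d in devices) d.ProbeCts?.Cancel();
+
+        // Phase 2: wait, bounded, for each loop to observe cancellation and exit.
+        foreach (var d in devices)
+        {
+            if (d.ProbeTask is null) continue;
+            try { await d.ProbeTask.WaitAsync(timeout); }
+            catch (TimeoutException) { /* loop overran the bounded wait; tear down anyway */ }
+            catch (Exception) { /* cancelled or faulted: either way the loop has stopped */ }
+        }
+
+        // Phase 3: only now dispose the CTS and the handles the loops were using.
+        foreach (var d in devices)
+        {
+            d.ProbeCts?.Dispose();
+            d.ProbeCts = null;
+            d.DisposeHandles();
+        }
+    }
+}
+```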
+
+### Driver.AbCip-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` |
+| Status | Resolved |
+
+**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act on a non-thread-safe `Dictionary` (`device.Runtimes` / `device.ParentRuntimes`). `ReadAsync` is `IReadable` and may be invoked concurrently: the server read path, each polled subscription loop, and the alarm projection poll loop all call `ReadAsync` independently. Two concurrent `ReadAsync` calls that both miss the cache for the same tag both create a `LibplctagTagRuntime`, both initialize it, and both write into the dictionary; the loser leaks an initialized native tag (never disposed, since only the dictionary value is disposed at shutdown), and concurrent `Dictionary` mutation can throw or corrupt the bucket structure. `WriteBitInDIntAsync` serializes the parent via a per-parent `SemaphoreSlim`, but `EnsureParentRuntimeAsync` still runs the same unguarded check-then-act on the shared `ParentRuntimes` dict.
+
+**Recommendation:** Use `ConcurrentDictionary` for `Runtimes` and `ParentRuntimes`, creating the runtime via `GetOrAdd` with a lazily initialized factory, or guard the ensure path with a per-device lock / `SemaphoreSlim`. Ensure the losing creator's runtime is disposed rather than leaked.
+
+**Resolution:** Resolved 2026-05-22 — already addressed as part of the Driver.AbCip-008 fix: `DeviceState.Runtimes` and `ParentRuntimes` were switched to `ConcurrentDictionary`; both `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` use the `TryGetValue` → create → `TryAdd` → dispose-loser pattern so concurrent callers that both miss the cache produce exactly one live runtime and the losing creator is disposed rather than leaked.
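+
+The `TryGetValue` → create → `TryAdd` → dispose-loser pattern can be sketched generically. `RuntimeCache.EnsureAsync` is a hypothetical helper, not the driver's `EnsureTagRuntimeAsync`; the factory stands in for constructing and initializing a `LibplctagTagRuntime`. A plain `GetOrAdd` is not enough here, for two reasons. Its value factory can run more than once under contention, and the extra values are dropped without being disposed. It also cannot await an asynchronous initialization.
+
+```csharp
+using System;
+using System.Collections.Concurrent;
+using System.Threading.Tasks;
+
+public static class RuntimeCache
+{
+    // Returns the cached instance for `key`, creating one on a miss. If several
+    // callers miss at once, each creates an instance, exactly one TryAdd wins,
+    // and every losing instance is disposed so its initialized handle cannot leak.
+    public static async Task<T> EnsureAsync<T>(
+        ConcurrentDictionary<string, T> cache,
+        string key,
+        Func<Task<T>> createAndInitialize)
+        where T : class, IDisposable
+    {
+        while (true)
+        {
+            if (cache.TryGetValue(key, out var existing)) return existing;
+
+            var created = await createAndInitialize().ConfigureAwait(false);
+            if (cache.TryAdd(key, created)) return created;
+
+            // Lost the race: another caller published first. Dispose our copy,
+            // then loop to pick up the winner (or retry if it was evicted meanwhile).
+            created.Dispose();
+        }
+    }
+}
+```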
+
+### Driver.AbCip-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` |
+| Status | Resolved |
+
+**Description:** Once `EnsureTagRuntimeAsync` successfully creates and initializes a `LibplctagTagRuntime`, that runtime is cached for the lifetime of the device and never re-created on failure. If the underlying native tag enters a permanently-bad state (connection dropped, controller rebooted, tag handle invalidated by a PLC program download), every subsequent `ReadAsync`/`WriteAsync` reuses the same dead handle and returns errors forever. The probe loop does tear down and recreate its runtime after a failure, but the read/write path has no equivalent recovery; only a full `ReinitializeAsync` (itself broken, see Driver.AbCip-001) clears the cache. The normal data path should self-heal from a transient handle fault without operator-driven reinitialize.
+
+**Recommendation:** On a non-zero libplctag status or transport exception in `ReadSingleAsync`/`ReadGroupAsync`/`WriteAsync`, evict the offending runtime from `device.Runtimes` (and dispose it) so the next call re-creates and re-initializes it. Mirror the probe loop's recreate-on-failure behavior.
+
+**Resolution:** Resolved 2026-05-22 — added `EvictRuntime(device, tagName)` helper that calls `ConcurrentDictionary.TryRemove` + disposes the evicted instance; called from `ReadSingleAsync`, `ReadGroupAsync`, and `WriteAsync` on both non-zero libplctag status and transport exceptions (type/value-conversion exceptions are not transport faults and do not evict). The next read/write for the affected tag re-runs `EnsureTagRuntimeAsync`, which creates and initializes a fresh handle, mirroring the probe loop's recreate-on-failure behaviour.
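+
+The sketch below shows the eviction helper. It assumes the runtimes live in a `ConcurrentDictionary<string, T>`, as the Driver.AbCip-008 fix made them. The generic signature is hypothetical; the resolution's helper takes the device and tag name. The key property is that only the caller whose `TryRemove` succeeds disposes the instance. Two concurrent failures on the same tag therefore cannot double-dispose it.
+
+```csharp
+using System;
+using System.Collections.Concurrent;
+
+public static class RuntimeEviction
+{
+    // Removes and disposes the cached runtime for `tagName`, if present, so the
+    // next read/write re-runs the ensure path and builds a fresh handle.
+    // Returns whether this call performed the eviction.
+    public static bool EvictRuntime<T>(ConcurrentDictionary<string, T> runtimes, string tagName)
+        where T : IDisposable
+    {
+        if (!runtimes.TryRemove(tagName, out var evicted)) return false;
+        evicted.Dispose();
+        return true;
+    }
+}
+```
+
+The caller decides when to evict. Per the resolution, that is on a non-zero libplctag status or a transport exception, but not on a type/value-conversion exception.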
+
+### Driver.AbCip-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `AbCipDriver.cs:144-152`, `AbCipDriverOptions.cs:131-143` |
+| Status | Open |
+
+**Description:** `InitializeAsync` only starts probe loops when `_options.Probe.Enabled` is true AND `Probe.ProbeTagPath` is non-blank. When `Probe.Enabled` is true (the default) but `ProbeTagPath` is null (also the default; the doc comment says "PR 8 wires this up"), no probe runs at all and the device's `HostState` stays `HostState.Unknown` forever. `GetHostStatuses()` then reports every device as Unknown indefinitely, with no warning. An operator who enables the probe but does not set a probe tag gets a silently inert health surface rather than an error or a log line.
+
+**Recommendation:** When `Probe.Enabled` is true but no `ProbeTagPath` is configured, do one of the following: fail initialization with a clear message, fall back to a family-default probe tag (the doc comment's stated intent), or at minimum log a warning that the probe is enabled but inert.
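+
+The sketch below shows the first (fail-fast) option. It uses a hypothetical `ProbeOptions` record that mirrors only the two fields involved; the real `AbCipDriverOptions.Probe` shape may differ. Falling back to a family-default tag, or logging a warning, would be equally valid resolutions.
+
+```csharp
+using System;
+
+// Hypothetical mirror of the two relevant probe settings and their defaults.
+public sealed record ProbeOptions(bool Enabled = true, string? ProbeTagPath = null);
+
+public static class ProbeOptionsValidator
+{
+    // An enabled probe with no tag would never start, leaving every device's
+    // HostState at Unknown, so reject that combination at initialization.
+    public static void Validate(ProbeOptions probe)
+    {
+        if (probe.Enabled && string.IsNullOrWhiteSpace(probe.ProbeTagPath))
+            throw new InvalidOperationException(
+                "Probe.Enabled is true but Probe.ProbeTagPath is not set; " +
+                "configure a probe tag or disable the probe.");
+    }
+}
+```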
+
+**Resolution:** _(open)_
+
+### Driver.AbCip-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `LibplctagTemplateReader.cs:15-35`, `AbCipDriver.cs:88-92` |
+| Status | Open |
+
+**Description:** `LibplctagTemplateReader` is created per `FetchUdtShapeAsync` call, and each call constructs a fresh libplctag `Tag` for the @udt pseudo-tag, initializes it (a CIP connection handshake), reads, and disposes it. There is no reuse of the `Tag` across template reads for the same device: every UDT shape fetch pays a full connect/init cost. `AbCipTemplateCache` caches the decoded shape so this only bites on the first fetch of each type, but discovery of a UDT-heavy controller still does one connect per type. The same per-call `Tag` construction applies to `LibplctagTagEnumerator`.
+
+**Recommendation:** Acceptable for a low-frequency discovery path, but consider pooling/reusing a single @udt-capable `Tag` per device for the duration of a discovery run, or document that the per-type connect cost is accepted.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `AbCipDriverOptions.cs:70-73`, `PlcFamilies/AbCipPlcFamilyProfile.cs:13-19`, `LibplctagTagRuntime.cs:16-27` |
+| Status | Open |
+
+**Description:** `driver-specs.md` specifies the AB CIP per-device connection settings as discrete fields: Host, Path, PlcType, TimeoutMs, AllowPacking, ConnectionSize. The implementation instead collapses host + path into a single opaque `ab://` URL string and exposes `PlcFamily` (which adds GuardLogix, not in the spec table). AllowPacking and ConnectionSize from the spec are not configurable per device. `AbCipPlcFamilyProfile` hard-codes `SupportsRequestPacking` and `DefaultConnectionSize` per family, and `LibplctagTagRuntime` never passes a connection-size or packing attribute to the `Tag` (it is constructed with only Gateway/Path/PlcType/Protocol/Name/Timeout). The family profile's `DefaultConnectionSize`/`SupportsRequestPacking`/`MaxFragmentBytes` fields are computed but never applied to the wire layer, so they are dead configuration.
+
+**Recommendation:** Choose one of two paths. Either update `driver-specs.md` to describe the actual `ab://` host-address model and the family-profile approach, and wire the profile's ConnectionSize/packing values through to the libplctag `Tag` attributes; or expose AllowPacking/ConnectionSize as per-device options, as the spec describes.
+
+**Resolution:** _(open)_
+
+### Driver.AbCip-014
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` |
+| Status | Resolved |
+
+**Description:** `AbCipStatusMapperTests.MapLibplctagStatus_maps_known_codes` asserts the mapper against the same wrong integer constants (-5, -7, -14, -16, -17) the production code uses (see Driver.AbCip-002). The test locks in the bug rather than catching it, giving false confidence that libplctag error mapping is correct. There is no test that drives an actual libplctag `Status` enum value through `LibplctagTagRuntime.GetStatus()` plus `MapLibplctagStatus` end-to-end. Separately, the broken `ReinitializeAsync` config-discard behavior (Driver.AbCip-001) and the declaration-order whole-UDT decode risk (Driver.AbCip-003) have no test that would fail when those defects are present: `AbCipDriverWholeUdtReadTests` only exercises a UDT whose declaration order happens to match a simple alignment layout.
+
+**Recommendation:** Rewrite the libplctag-status test to use the real `libplctag.Status` enum members and their documented integer values. Add a test that `ReinitializeAsync` with a changed config JSON actually applies the change (or asserts the documented immutability contract). Add a whole-UDT decode test where the controller's compiled layout differs from declaration order.
+
+**Resolution:** Resolved 2026-05-22 — status mapper test already uses real `Status` enum members (fixed with Driver.AbCip-002); `ReinitializeAsync` config-change coverage already added with Driver.AbCip-001. Added to `AbCipDriverCodeReviewRegressionTests`: three tests for 004 (LInt/ULInt/UDInt type-mapping theory + UDInt decoded-as-uint assertion), three tests for 005 (Structure parent read returns BadNotSupported, duplicate scalar key throws, member-collision-with-independent-tag throws), and one test for 010 (eviction on bad status means next read creates a fresh handle). `AbCipDriverTests.AbCipDataType_maps_atomics_to_driver_types` extended with LInt/ULInt/UDInt assertions.
+
+### Driver.AbCip-015
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `AbCipDriver.cs:9-11`, `PlcTagHandle.cs:23-27,53-58`, `AbCipTemplateCache.cs:12-15`, `IAbCipTagEnumerator.cs:6-11`, `AbCipDriverOptions.cs:21` |
+| Status | Open |
+
+**Description:** Numerous comments are stale relative to the commit under review. `AbCipDriver.cs:9-11` says the driver "Implements IDriver only for now" with capabilities shipping "in subsequent PRs (3-8)" while the class already implements all of them. `PlcTagHandle.cs` says the plc_tag_destroy P/Invoke "is deferred to PR 3 ... PR 2 ships the lifetime scaffold + tests only" and `ReleaseHandle` "is a no-op", which now reads as a permanent unfinished-work marker (see Driver.AbCip-006). `AbCipTemplateCache.cs:12-15` says "Template shape read ... lands with PR 6 ... no reader writes to it yet" while `CipTemplateObjectDecoder` and `LibplctagTemplateReader` both exist and `FetchUdtShapeAsync` writes to the cache. `IAbCipTagEnumerator.cs:6-11` says the enumerator "Defaults to EmptyAbCipTagEnumeratorFactory" while the production default is `LibplctagTagEnumeratorFactory`. `AbCipDriverOptions.cs:21` says "AB discovery lands in PR 5", already shipped. `StyleGuide.md` explicitly says not to leave stale coming-soon notes.
+
+**Recommendation:** Sweep the module for PR-N forward references and "lands in PR X" notes that have been delivered; update them to describe present behavior. Where a comment marks genuinely unfinished work (e.g. `PlcTagHandle.ReleaseHandle`), convert it to a tracked TODO with an issue reference rather than a PR-number milestone.
+
+**Resolution:** _(open)_
@@ -0,0 +1,213 @@
+# Code Review — Driver.AbLegacy.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.AbLegacy.Cli-001, Driver.AbLegacy.Cli-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.AbLegacy.Cli-003 |
+| 4 | Error handling & resilience | Driver.AbLegacy.Cli-001, Driver.AbLegacy.Cli-004 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | Driver.AbLegacy.Cli-005 |
+| 8 | Code organization & conventions | Driver.AbLegacy.Cli-006 |
+| 9 | Testing coverage | Driver.AbLegacy.Cli-007 |
+| 10 | Documentation & comments | Driver.AbLegacy.Cli-002, Driver.AbLegacy.Cli-005 |
+
+## Findings
+
+### Driver.AbLegacy.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` |
+| Status | Resolved |
+
+**Description:** `WriteCommand.ExecuteAsync` calls `ParseValue(Value, DataType)` at
+line 46, *before* the `try` block and outside any catch. `ParseValue` uses
+`short.Parse` / `int.Parse` / `float.Parse`, which throw `FormatException` on
+malformed input (`-v abc`) and `OverflowException` on out-of-range input
+(`-t Int -v 99999`). Only the `Bit` branch and the unsupported-type branch raise
+the CliFx `CommandException` that the framework renders as a clean one-line error
+with a non-zero exit code. For every numeric type a bad `--value` therefore
+escapes as an unhandled `FormatException`/`OverflowException`, which CliFx prints
+as a raw stack trace — an operator-hostile failure mode for a tool whose whole
+purpose is ad-hoc operator use. The module's own test
+`ParseValue_non_numeric_for_numeric_types_throws` confirms the raw `FormatException`
+leaks. The driver's `WriteAsync` has dedicated catch arms for `FormatException`
+(`BadTypeMismatch`) and `OverflowException` (`BadOutOfRange`), but the CLI never
+reaches the driver because the parse happens first.
+
+**Recommendation:** Wrap the numeric parses so a parse failure surfaces as a
+`CliFx.Exceptions.CommandException` with a message naming the offending value and
+type (mirroring the existing `Bit` and unsupported-type branches). Either catch
+`FormatException`/`OverflowException` inside `ParseValue` and rethrow as
+`CommandException`, or use `TryParse` and throw `CommandException` on failure.
+
+**Resolution:** Resolved 2026-05-22 — wrapped numeric parses in `ParseValue` with `try/catch` for `FormatException`/`OverflowException`, rethrowing as `CommandException` with a message naming the offending value and type; updated test to assert `CommandException` and added overflow regression test.
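+
+The sketch below shows the catch-and-rethrow shape the resolution describes. `CliValueParsing.ParseOrFail` is a hypothetical helper. To keep the sketch free of a CliFx dependency, the exception comes from a caller-supplied factory. In the CLI, that factory would construct `CliFx.Exceptions.CommandException` from the message, so the framework prints a one-line error and returns a non-zero exit code.
+
+```csharp
+using System;
+
+public static class CliValueParsing
+{
+    // Runs `parse` and converts the two exceptions that numeric Parse methods
+    // throw on bad operator input into an exception built by `fail`, with a
+    // message naming the offending value and the requested type.
+    public static T ParseOrFail<T>(
+        string value, string typeName, Func<string, T> parse, Func<string, Exception> fail)
+    {
+        try
+        {
+            return parse(value);
+        }
+        catch (FormatException)
+        {
+            throw fail($"--value '{value}' is not a valid {typeName}.");
+        }
+        catch (OverflowException)
+        {
+            throw fail($"--value '{value}' is out of range for {typeName}.");
+        }
+    }
+}
+```
+
+For example, assume `Int` is parsed with `short.Parse`, as the description's overflow example implies. Then `ParseOrFail(Value, "Int", s => short.Parse(s), msg => new CommandException(msg))` reports `-t Int -v 99999` as "out of range for Int" instead of as an `OverflowException` stack trace.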
+
+### Driver.AbLegacy.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` |
+| Status | Open |
+
+**Description:** The `--value` option help text states "booleans accept
+true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) also accepts `on/off`
+and `yes/no` (and its error message lists them), and `DriverClis.md` documents the full
+`true/false/1/0/yes/no/on/off` set as the shared CLI contract. The help text
+under-documents the accepted aliases, so an operator reading `--help` will not
+discover `on`/`off`/`yes`/`no`. Minor, but it makes the inline help inconsistent
+with both the code and the design doc.
+
+**Recommendation:** Extend the `--value` description to list the full alias set,
+matching the wording used elsewhere (e.g. "booleans accept
+true/false, 1/0, on/off, yes/no").
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `Commands/SubscribeCommand.cs:47-53` |
+| Status | Open |
+
+**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)`
+(the synchronous overload) directly from the `PollGroupEngine` poll thread. The
+poll engine raises change events from a background timer/loop thread, so two
+ticks that fire close together can interleave writes on the shared `TextWriter`.
+`SnapshotFormatter` builds the whole line into a single string before the call,
+so a line is unlikely to be torn mid-token, but there is no synchronisation
+guaranteeing that the background-thread writes do not interleave with the
+`await console.Output.WriteLineAsync(...)` "Subscribed to ..." line on the command
+thread, nor with each other. This is the same pattern as the AbCip CLI, so it is
+a shared low-severity issue, not unique to this module.
+
+**Recommendation:** Serialise console writes from the event handler — e.g. funnel
+change events through a `Channel<string>` drained by a single consumer task, or
+guard the `WriteLine` with a lock. At minimum, document that the interleaving is
+accepted because output is human-facing and line-buffered.
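+
+The sketch below shows the `Channel<string>` option. It assumes the handler still formats each line with `SnapshotFormatter` and then posts it. `SerializedLineWriter` is a hypothetical name; `System.Threading.Channels` ships with .NET Core 3.0 and later. A single pump task is the only code that writes to the `TextWriter`. Lines from the poll thread and the command thread therefore cannot interleave, provided the "Subscribed to ..." line is posted through the same writer.
+
+```csharp
+using System;
+using System.IO;
+using System.Threading.Channels;
+using System.Threading.Tasks;
+
+// Funnels lines from any thread through one consumer so writes to the shared
+// TextWriter are serialised.
+public sealed class SerializedLineWriter : IAsyncDisposable
+{
+    private readonly Channel<string> _lines =
+        Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });
+    private readonly Task _pump;
+
+    public SerializedLineWriter(TextWriter output)
+    {
+        // The pump is the only code that touches `output`.
+        _pump = Task.Run(async () =>
+        {
+            await foreach (var line in _lines.Reader.ReadAllAsync())
+                await output.WriteLineAsync(line);
+        });
+    }
+
+    // Safe to call from the poll thread's OnDataChange handler; never blocks.
+    public bool Post(string line) => _lines.Writer.TryWrite(line);
+
+    public async ValueTask DisposeAsync()
+    {
+        _lines.Writer.TryComplete(); // stop accepting new lines
+        await _pump;                 // drain whatever is already queued
+    }
+}
+```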
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` |
+| Status | Open |
+
+**Description:** Every command does `await using var driver = new AbLegacyDriver(...)`
+*and* an explicit `await driver.ShutdownAsync(...)` in the `finally`.
+`AbLegacyDriver.DisposeAsync` itself calls `ShutdownAsync`, so the driver is shut down twice on the
+normal path. `ShutdownAsync` is written to be idempotent (it clears `_devices` /
+`_tagsByName` and re-enters cleanly on an empty state), so this is not a crash, but
+the double teardown is redundant and slightly obscures intent — a reader has to
+confirm idempotency to be sure it is safe. The `await using` already guarantees
+cleanup on every exit path including exceptions.
+
+**Recommendation:** Drop either the `await using` or the explicit
+`finally { await driver.ShutdownAsync(...) }` in each command. Keeping the explicit
+`finally` and using a plain `var driver` (no `await using`) is the clearer choice,
+since the commands deliberately pass `CancellationToken.None` to shutdown so teardown
+is not cut short by a cancelled `ct`.
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` |
+| Status | Open |
+
+**Description:** The subscribe command's interval option is `--interval-ms`
+(default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as
+`otopcua-ablegacy-cli subscribe ... -i 500`, which works because of the short
+alias `'i'`, but the doc never names the long form `--interval-ms` or states the
+1000 ms default, while the equivalent AbCip CLI help text notes "PollGroupEngine
+floors sub-250ms values". The AbLegacy `--interval-ms` description omits that
+flooring caveat, so an operator passing `-i 100` against AbLegacy gets no warning
+that the engine will floor it. The behaviour is identical (same `PollGroupEngine`)
+but the documented contract drifts between the two CLIs.
+
+**Recommendation:** Add the sub-250 ms flooring note to the AbLegacy
+`--interval-ms` description for parity with the AbCip CLI, and mention the
+`--interval-ms` long form + 1000 ms default in `docs/Driver.AbLegacy.Cli.md`.
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy.Cli-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `Commands/ProbeCommand.cs:20-22` |
+| Status | Open |
+
+**Description:** `ProbeCommand` declares its `--type` option with no short alias,
+while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type`
+with the short alias `'t'`. `ProbeCommand` also gives `--address` the alias `'a'`,
+matching the other commands, so the `--type` omission is an inconsistency rather
+than a deliberate design choice. An operator who learns `-t` on `read` will find
+it silently rejected on `probe`.
+
+**Recommendation:** Add the `'t'` short alias to `ProbeCommand`'s `--type` option
+for consistency with the other three commands. (The AbCip CLI `ProbeCommand` has
+the same omission, so a cross-CLI sweep is worthwhile.)
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy.Cli-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
+| Status | Open |
+
+**Description:** The only test file in the CLI test project covers
+`WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that
+are pure logic (testable without a device) are uncovered:
+(1) `AbLegacyCommandBase.BuildOptions`, which sets `Probe.Enabled = false`,
+populates `Devices` from `Gateway`/`PlcType`, and forwards the tag list; a
+regression here silently changes the behaviour of every command.
+(2) the out-of-range numeric path for `ParseValue` (`short.Parse` overflow,
+`int.Parse` overflow) — `ParseValue_non_numeric_for_numeric_types_throws` asserts
+`FormatException` for non-numeric input but nothing asserts the overflow path,
+which is exactly the path that escapes uncaught per finding
+Driver.AbLegacy.Cli-001. `BuildOptions` is reachable via `InternalsVisibleTo`
+(the test assembly is already granted access).
+
+**Recommendation:** Add tests for `BuildOptions` (probe disabled, device shape,
+tag passthrough) and an overflow-input test for `ParseValue` so the fix for
+Driver.AbLegacy.Cli-001 is locked in by a regression test.
+
+**Resolution:** _(open)_
@@ -0,0 +1,365 @@
+# Code Review - Driver.AbLegacy
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 3 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.AbLegacy-001, Driver.AbLegacy-002, Driver.AbLegacy-003, Driver.AbLegacy-004 |
+| 2 | OtOpcUa conventions | Driver.AbLegacy-005 |
+| 3 | Concurrency & thread safety | Driver.AbLegacy-006, Driver.AbLegacy-007, Driver.AbLegacy-008 |
+| 4 | Error handling & resilience | Driver.AbLegacy-009, Driver.AbLegacy-010 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.AbLegacy-011 |
+| 7 | Design-document adherence | Driver.AbLegacy-012 |
+| 8 | Code organization & conventions | Driver.AbLegacy-013 |
+| 9 | Testing coverage | No issues found |
+| 10 | Documentation & comments | No issues found |
+
+## Findings
+
+### Driver.AbLegacy-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
+| Status | Resolved |
+
+**Description:** `AbLegacyAddress.TryParse` accepts a `BitIndex` of `0..31` for every
+file type. A PCCC N-file word is a signed 16-bit integer, so valid bit indices are
+`0..15`. When a tag is `Bit`-typed against an N-file with a bit suffix of `16..31`
+(e.g. `N7:0/20`), `WriteBitInWordAsync` reads the parent as `AbLegacyDataType.Int`
+(16-bit), then computes `current | (1 << bit)` / `current & ~(1 << bit)` with `bit`
+up to 31. `1 << 20` produces a value outside the 16-bit range, the result is cast
+`(short)updated`, and the high bits are silently truncated - the wrong bit (or no
+bit) is written and no error is surfaced. The mask arithmetic is also done on a
+sign-extended `int`. For L-file (32-bit) bits the parent is still read as `Int`
+(16-bit), so bits 16..31 of a long can never be addressed correctly.
+
+**Recommendation:** Validate `BitIndex` against the parent word width during parse or
+in `WriteBitInWordAsync` - reject bit > 15 for N/B/I/O/S files and bit > 31 for L
+files. For bit-in-word RMW against L files, read the parent as `Long`. Mask the
+read-back value to the word width before applying the bit operation.
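
The validate-then-mask sequence can be sketched as follows, in Python rather than C#, assuming the per-file widths given in the resolution below (16 bits for N/B/I/O/S/A, 32 for L, no bit addressing on F). `write_bit_in_word` is a hypothetical helper, not the driver's `WriteBitInWordAsync`. It returns the unsigned raw word, which the C# code would then narrow with `(short)` or `(int)`.

```python
# Bits per element for bit-addressable PCCC file types (assumed from the resolution text).
WORD_BITS = {"N": 16, "B": 16, "I": 16, "O": 16, "S": 16, "A": 16, "L": 32}


def write_bit_in_word(file_letter: str, current: int, bit: int, value: bool) -> int:
    """Return the parent word after setting or clearing `bit`, masked to the file's width."""
    width = WORD_BITS.get(file_letter)
    if width is None:
        raise ValueError(f"{file_letter}-file elements are not bit-addressable")
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside 0..{width - 1} for a {file_letter}-file word")
    mask = (1 << width) - 1
    current &= mask  # drop sign extension: a decoded -32768 becomes 0x8000
    updated = current | (1 << bit) if value else current & ~(1 << bit)
    return updated & mask  # never carries bits beyond the native word width
```

With this shape, `N7:0/20` is rejected before any I/O, and a word whose high bit is set (decoded as a negative `int`) is handled as its unsigned 16-bit pattern.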
+
+**Resolution:** Resolved 2026-05-22 — `AbLegacyAddress.TryParse` now range-checks the
+bit index against per-file word width (0..15 for N/B/I/O/S/A, 0..31 for L, no bits on
+F); `WriteBitInWordAsync` reads/writes an L-file parent as 32-bit `Long` and masks the
+RMW arithmetic to the native width so sign-extension can no longer corrupt high bits.
+
+### Driver.AbLegacy-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `AbLegacyDriver.cs:368` |
+| Status | Resolved |
+
+**Description:** In `WriteBitInWordAsync` the parent word is decoded with
+`Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...))`.
+`LibplctagLegacyTagRuntime.DecodeValue` for `AbLegacyDataType.Int` returns
+`(int)_tag.GetInt16(0)` - a sign-extended `int`. When the current word has its high
+bit set (value 0x8000..0xFFFF, decoded as a negative `int`), the subsequent
+`(short)updated` cast re-encodes the low 16 bits correctly, but `current | (1 << bit)`
+is performed on the sign-extended value. The result is bit-correct for the low 16
+bits only because the cast preserves them; any future change that widens the mask range
+will break silently. Combined with Driver.AbLegacy-001, this is a latent correctness hazard.
+
+**Recommendation:** Mask `current` to `current & 0xFFFF` before the bit operation and
+operate on an explicitly 16-bit value, or document the reliance on low-16-bit
+preservation explicitly.
+
+**Resolution:** Resolved 2026-05-22 — `current & widthMask` already applied in `WriteBitInWordAsync` by the -001 fix; no additional change needed.
+
+### Driver.AbLegacy-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `AbLegacyAddress.cs:62-95` |
+| Status | Resolved |
+
+**Description:** `TryParse` does not reject several malformed PCCC addresses that the
+XML docs imply are invalid:
+- A sub-element and a bit index together (`T4:0.ACC/2`) parse successfully even
+  though no PCCC element supports both.
+- I/O/S files with a file number (`I3:0`, `S2:1`) parse successfully - I/O and S are
+  single-letter files with no file number per the doc comment, but the parser only
+  requires "letter then optional digits".
+- B-file addresses with a sub-element (`B3:0.DN`) parse successfully.
+
+`ToLibplctagName()` re-emits whatever was parsed, so a malformed address is passed
+through to libplctag rather than rejected early with a clear error.
+
+**Recommendation:** Tighten the parser: reject sub-element + bit-index combinations,
+reject file numbers on I/O/S, and restrict which file letters may carry a sub-element
+(T/C/R only). Add unit coverage for the rejection cases.
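
A sketch of the three rejection rules, using Python's `re` and a deliberately simplified grammar: single-letter file type, optional file number, element, optional alphabetic sub-element, optional bit. It does not model every real PCCC form (for example multi-letter files such as `ST`, or slot-qualified I/O addresses), and `parse_pccc_address` is a hypothetical name, not `AbLegacyAddress.TryParse`.

```python
import re

# Simplified grammar: <letter><file#?>:<element>[.<SUB>][/<bit>]
_ADDRESS = re.compile(r"^([A-Z])(\d*):(\d+)(?:\.([A-Z]+))?(?:/(\d+))?$")
NO_FILE_NUMBER = {"I", "O", "S"}      # single-letter files addressed without a file number
SUB_ELEMENT_FILES = {"T", "C", "R"}   # timers, counters, controls carry .ACC/.PRE/.DN etc.


def parse_pccc_address(text: str):
    """Return (letter, file_number, element, sub_element, bit) or raise ValueError."""
    m = _ADDRESS.match(text.strip().upper())
    if m is None:
        raise ValueError(f"malformed PCCC address: {text!r}")
    letter, file_no, element, sub, bit = m.groups()
    if sub and bit:
        raise ValueError(f"{text!r}: a sub-element and a bit index cannot be combined")
    if file_no and letter in NO_FILE_NUMBER:
        raise ValueError(f"{text!r}: {letter} files take no file number")
    if sub and letter not in SUB_ELEMENT_FILES:
        raise ValueError(f"{text!r}: only T/C/R files have sub-elements")
    return (letter, int(file_no) if file_no else None, int(element), sub,
            int(bit) if bit else None)
```

Each rule is a separate check with its own message, which maps directly onto one rejection test per malformed example listed above.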
+
+**Resolution:** Resolved 2026-05-22 — `TryParse` now rejects sub-element+bit-index combinations, file numbers on I/O/S files, and sub-elements on non-T/C/R files; unit tests added in `AbLegacyAddressTests`.
+
+### Driver.AbLegacy-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `LibplctagLegacyTagRuntime.cs:36-37` |
+| Status | Resolved |
+
+**Description:** `DecodeValue` for `AbLegacyDataType.Bit` with `bitIndex == null`
+returns `_tag.GetInt8(0) != 0`. A bit-file element (`B3:0/0`) is a single bit inside
+a 16-bit word; reading only the low byte (`GetInt8(0)`) means a `Bit` tag whose live
+bit sits in bits 8..15 of the word, or a B-file element addressed without an explicit
+bit suffix, decodes incorrectly. The driver passes `parsed.ToLibplctagName()` which
+preserves the `/bit` suffix, so libplctag resolves the bit when a suffix is present -
+but a `Bit`-typed tag configured with an address that has no `/bit` suffix (e.g.
+`B3:0`) silently decodes the wrong thing.
+
+**Recommendation:** For `Bit` with no `bitIndex`, decide explicitly: either require a
+bit suffix on `Bit`-typed tags (validate in `CreateInstance`/`DiscoverAsync`) or
+decode the full 16-bit word and test bit 0.
+
+**Resolution:** Resolved 2026-05-22 — `DecodeValue` for `Bit` with no `bitIndex` now reads the full 16-bit word via `GetInt16(0)` and tests bit 0, avoiding the silent half-word truncation from `GetInt8`.
+
+### Driver.AbLegacy-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `AbLegacyDriver.cs` (whole file) |
+| Status | Open |
+
+**Description:** The driver uses no `ILogger`/Serilog at all. Probe-loop failures,
+runtime initialisation failures, libplctag non-zero statuses, and read/write
+exceptions are folded into `DriverHealth.Detail` strings but never logged. CLAUDE.md
+names Serilog with a rolling daily file sink as the logging library. The complete
+absence of structured logging makes field diagnosis of a PCCC comms problem (timeout
+vs route failure vs wrong PLC family) rely entirely on a single overwritten `Detail`
+string that the next read or write immediately clobbers.
+
+**Recommendation:** Inject `ILogger<AbLegacyDriver>` (optional, like `tagFactory`) and
+log probe transitions, runtime-init failures, and the first occurrence of a non-zero
+libplctag status per device.
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy-006
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
+| Status | Resolved |
+
+**Description:** A per-tag `IAbLegacyTagRuntime` (wrapping a single libplctag `Tag`)
+is cached in `DeviceState.Runtimes` and reused. `ReadAsync` (called directly by the
+server read path) and the `PollGroupEngine` poll loop (which also calls `ReadAsync`
+via the reader delegate) can run concurrently, and two poll subscriptions covering
+the same tag run on independent background tasks. All of them call
+`EnsureTagRuntimeAsync`, get back the same `Tag` instance, and call `runtime.ReadAsync` /
+`GetStatus` / `DecodeValue` on it with no synchronisation. A libplctag `Tag` is not safe
+for concurrent operations on the same handle: an interleaved Read/GetStatus/DecodeValue
+from two threads can read a value mid-update or observe a status that belongs to the
+other operation. `WriteAsync` shares the same runtime dictionary and compounds the
+hazard. Only the bit-in-word RMW path is serialised (per-parent `SemaphoreSlim`).
+
+**Recommendation:** Serialise all operations against a given runtime - a per-runtime
+`SemaphoreSlim`, or a per-device read lock - so no two threads touch the same `Tag`
+handle concurrently.
+
+**Resolution:** Resolved 2026-05-22 — added a per-runtime `SemaphoreSlim`
+(`DeviceState.GetRuntimeLock`, keyed by tag name); `ReadAsync` and `WriteAsync` now
+hold it around the whole Read→GetStatus→Decode / Encode→Write→GetStatus sequence so the
+shared libplctag `Tag` handle is never touched by two threads at once.
+
+### Driver.AbLegacy-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` |
+| Status | Resolved |
+
+**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are
+check-then-act: `device.Runtimes.TryGetValue(...)` then, after `await
+runtime.InitializeAsync`, `device.Runtimes[def.Name] = runtime`. `Dictionary` is not
+thread-safe, and two concurrent callers for the same tag (read + poll, or two poll
+loops) both miss the lookup, both Create + InitializeAsync a runtime, and both write
+the dictionary. One runtime is overwritten and leaked - `DisposeRuntimes` only
+disposes what is currently in the dict - and concurrent `Dictionary` writes can
+corrupt internal state. `ParentRuntimes` has the identical pattern.
+
+**Recommendation:** Replace the runtime caches with `ConcurrentDictionary` and use
+`GetOrAdd`, or guard runtime creation under a per-device lock. Ensure the losing
+runtime of any race is disposed.
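
The per-key creation lock with a double-check can be sketched with Python's `asyncio`, where the `await` on initialisation opens the same check-then-act window as the C# `await runtime.InitializeAsync`. `RuntimeCache` and its `factory` callback are hypothetical. In the multi-threaded C# driver the lock map itself must also be thread-safe (for example via `ConcurrentDictionary.GetOrAdd`); here a single event loop makes `dict.setdefault` sufficient.

```python
import asyncio


class RuntimeCache:
    """Creates and initialises at most one runtime per key, even with concurrent callers."""

    def __init__(self, factory):
        self._factory = factory  # async callable: key -> initialised runtime
        self._runtimes = {}
        self._locks = {}         # one creation lock per key

    def _lock_for(self, key):
        # Safe without extra locking only because a single event loop runs this code.
        return self._locks.setdefault(key, asyncio.Lock())

    async def get(self, key):
        runtime = self._runtimes.get(key)
        if runtime is not None:  # fast path: already created
            return runtime
        async with self._lock_for(key):
            runtime = self._runtimes.get(key)  # re-check: another caller may have won
            if runtime is None:
                runtime = await self._factory(key)  # create + initialise; may suspend
                self._runtimes[key] = runtime
            return runtime
```

Because losers of the race wait on the lock and then take the stored instance, no second runtime is ever created, so there is nothing to leak or dispose.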
+
+**Resolution:** Resolved 2026-05-22 — `Runtimes` and `ParentRuntimes` changed to `ConcurrentDictionary`; `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` now hold a per-key `GetCreationLock` semaphore around the double-checked create+initialize+store sequence so exactly one runtime is created per key and no race-loser is leaked.
+
+### Driver.AbLegacy-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` |
+| Status | Resolved |
+
+**Description:** `_health` is a plain non-volatile reference field mutated from
+`ReadAsync`, `WriteAsync` (both can run on multiple threads / poll loops) and
+`InitializeAsync`/`ShutdownAsync`, and read by `GetHealth()` from yet another thread.
+There is no lock, no `volatile`, and no `Interlocked` exchange. The record reference
+assignment is atomic, but without a memory barrier a reader can observe a stale
+`_health` indefinitely, and concurrent writers race so a `Healthy` write from one
+successful read can clobber a `Degraded` write from a concurrent failing read.
+`GetHealth()` may therefore report `Healthy` while reads are persistently failing.
+
+**Recommendation:** Mark `_health` volatile, or funnel health transitions through a
+lock / `Interlocked.Exchange`. Consider only downgrading on failure and upgrading on a
+successful poll so a single failed read does not flap the surface.
+
+**Resolution:** Resolved 2026-05-22 — `_health` marked `volatile`; memory barrier comment documents the acquire/release ordering guarantee.
+
+### Driver.AbLegacy-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `AbLegacyDriver.cs:41-74` |
+| Status | Resolved |
+
+**Description:** `InitializeAsync` starts probe loops with `Task.Run` inside the try
+block. If `InitializeAsync` fails - or is re-entered - after some probe loops are
+already started, the catch only sets `_health = Faulted` and rethrows; it does not
+cancel `state.ProbeCts`, dispose runtimes, or clear `_devices`. A caller that catches
+the exception and retries via `ReinitializeAsync` is covered (it calls `ShutdownAsync`
+first), but a caller that catches and abandons the driver leaves orphaned probe tasks
+and `CancellationTokenSource`s alive holding libplctag handles. Separately,
+`ProbeLoopAsync` never escalates a permanently-unreachable device beyond `Stopped`.
+
+**Recommendation:** On the catch path in `InitializeAsync`, run the same teardown as
+`ShutdownAsync` (cancel probe CTSs, dispose runtimes, clear dictionaries) before
+rethrowing, so a failed initialise leaves no live background work.
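
A Python `asyncio` sketch of the teardown-before-rethrow shape, assuming one long-running probe task per device and a per-device setup step that can fail after earlier probes are already running. `initialize`, `start_probe`, and `configure` are hypothetical stand-ins; disposing runtimes and clearing `_devices`/`_tagsByName` would go in the same `except` block.

```python
import asyncio


async def initialize(devices, start_probe, configure):
    """Start one probe task per device; if anything fails, stop every probe, then re-raise."""
    probes = []
    try:
        for device in devices:
            probes.append(asyncio.create_task(start_probe(device)))
            configure(device)  # may raise after earlier probes are already running
    except BaseException:
        for task in probes:
            task.cancel()
        # Wait for cancellation to finish so no probe outlives the failed initialise.
        await asyncio.gather(*probes, return_exceptions=True)
        raise
    return probes
```

Awaiting the cancelled tasks before re-raising matters: cancelling alone only requests cancellation, and the caller could otherwise observe the exception while probes are still unwinding.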
+
+**Resolution:** Resolved 2026-05-22 — `InitializeAsync` catch block now cancels and disposes probe CTSs, calls `DisposeRuntimes`, and clears `_devices`/`_tagsByName` before rethrowing, leaving no orphaned background tasks or handles.
+
+### Driver.AbLegacy-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `AbLegacyStatusMapper.cs:26-56` |
+| Status | Resolved |
+
+**Description:** `MapLibplctagStatus` maps the integer codes -5/-7/-14/-16/-17. These
+do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32,
+PLCTAG_ERR_NOT_FOUND = -22, PLCTAG_ERR_NOT_ALLOWED = -21, PLCTAG_ERR_OUT_OF_BOUNDS =
+-25, PLCTAG_ERR_BAD_CONNECTION = -8). The mapper operates on `(int)_tag.GetStatus()`,
+where `GetStatus()` returns the libplctag .NET wrapper Status enum whose underlying
+ordinals differ from the native codes - so the -5/-7/... values are at best the .NET
+enum ordinals (unverified, undocumented) and at worst wrong. Any unmatched negative
+status falls through to `BadCommunicationError`, so a timeout is reported as a generic
+comms error rather than `BadTimeout`. `MapPcccStatus` is dead code - the PCCC STS byte
+is never inspected because libplctag surfaces only its own status enum.
+
+**Recommendation:** Verify the actual `libplctag.Status` enum values against the 1.5.2
+package and map by enum name rather than magic integers. Either wire `MapPcccStatus`
+into a real PCCC-STS path or delete it as dead code. The same defect exists in
+`AbCipStatusMapper` and should be fixed in lockstep.
+
+**Resolution:** Resolved 2026-05-22 — `MapLibplctagStatus` now casts to `libplctag.Status` and switches on named enum members (matching the AbCip mapper pattern); `MapPcccStatus` retained with a comment documenting it as a reference mapping for future PCCC-STS inspection; tests updated to use `Status` enum members.
+
+### Driver.AbLegacy-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `AbLegacyDriver.cs:440` |
+| Status | Open |
+
+**Description:** `Dispose()` is implemented as
+`DisposeAsync().AsTask().GetAwaiter().GetResult()` - sync-over-async. `ShutdownAsync`
+awaits `_poll.DisposeAsync()` (which completes synchronously) and does no other real
+async work, so a deadlock is unlikely in practice, but the pattern blocks the calling
+thread and would deadlock if any awaited continuation were ever marshalled back to a
+single-threaded synchronization context.
+
+**Recommendation:** Prefer callers use `IAsyncDisposable`. If a synchronous `Dispose()`
+must exist, perform the synchronous teardown directly (cancel CTSs, dispose runtimes)
+rather than blocking on the async path.
+
+**Resolution:** _(open)_
+
+### Driver.AbLegacy-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` |
+| Status | Resolved |
+
+**Description:** `AbLegacyPlcFamilyProfile` declares `LibplctagPlcAttribute` plus four
+further record properties (`DefaultCipPath`, `MaxTagBytes`, `SupportsStringFile`,
+`SupportsLongFile`), but only `LibplctagPlcAttribute` is ever consumed. In particular:
+- `DefaultCipPath` is dead: the per-family default path (empty for MicroLogix, 1,0
+  for SLC/PLC-5) is never used to substitute an empty CIP path. The CIP path always
+  comes verbatim from `AbLegacyHostAddress.CipPath`, so a SLC 500 misconfigured with
+  an empty path is never corrected to 1,0 even though the profile knows the right
+  default - contradicting the test-fixture doc, which calls out the /1,0 cip-path
+  workaround as required for SLC.
+- `MaxTagBytes` is never used to validate or chunk a string/array read.
+- `SupportsStringFile`/`SupportsLongFile` are never checked, so a `String` or `Long`
+  tag configured against a MicroLogix or PLC-5 (which the profile says lack them) is
+  accepted and only fails at runtime with an opaque comms error.
+
+**Recommendation:** Either consume the profile fields (substitute `DefaultCipPath` when
+the host CIP path is empty; reject `Long`/`String` tags against families whose profile
+sets the corresponding flag false; use `MaxTagBytes` for validation) or remove the
+unused fields and the doc comments that imply they are load-bearing.
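
If the fields are kept, consuming them takes little code. A Python sketch, assuming snake_case mirrors of three of the profile's fields; `PlcFamilyProfile`, `effective_cip_path`, and `validate_tag_types` are hypothetical, and any concrete per-family values a caller passes in are illustrative, not the driver's real profile table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlcFamilyProfile:
    default_cip_path: str          # e.g. "1,0" (the review's SLC/PLC-5 default) or ""
    supports_long_file: bool       # L files (32-bit integers)
    supports_string_file: bool     # ST files


def effective_cip_path(host_cip_path: str, profile: PlcFamilyProfile) -> str:
    # Substitute the family default only when the host address left the path empty.
    return host_cip_path or profile.default_cip_path


def validate_tag_types(tag_types: dict, profile: PlcFamilyProfile) -> None:
    """Fail at initialise time instead of with an opaque comms error at runtime."""
    for name, data_type in tag_types.items():
        if data_type == "Long" and not profile.supports_long_file:
            raise ValueError(f"tag {name!r}: Long requires an L file, which this family lacks")
        if data_type == "String" and not profile.supports_string_file:
            raise ValueError(f"tag {name!r}: String requires an ST file, which this family lacks")
```

An explicitly configured path always wins; the default only fills an empty one, so a deliberately different route is never overwritten.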
+
+**Resolution:** Resolved 2026-05-22 — `DeviceState.EffectiveCipPath` applies `DefaultCipPath` when the parsed host address has an empty CIP path; `InitializeAsync` validates `Long`/`String` tag types against `SupportsLongFile`/`SupportsStringFile` and throws early; `MaxTagBytes` tracked as a follow-up (string/array chunking requires broader design work).
+
+### Driver.AbLegacy-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `AbLegacyDriver.cs:340-345`, `AbLegacyDriver.cs:238-264` |
+| Status | Open |
+
+**Description:** Two minor organisational issues:
+1. `ResolveHost` returns `_options.Devices.FirstOrDefault()?.HostAddress ??
+   DriverInstanceId` when the reference is unknown and no devices are configured.
+   `DriverInstanceId` is not a host address (ab://...), so a downstream
+   `IHostConnectivityProbe` / host lookup keyed on the returned value never matches a
+   real device. Returning the instance id as a fake host masks a configuration error.
+2. `DiscoverAsync` always emits `IsArray: false` / `ArrayDim: null`. PCCC files are
+   inherently arrays of elements; a tag that genuinely addresses a multi-element
+   region cannot be represented. This is consistent with the PR-staged scope (the doc
+   says array coverage is thin) but should be tracked rather than silently shipped.
+
+**Recommendation:** For (1), either throw / return a sentinel the caller can detect, or
+document why falling back to the instance id is acceptable. For (2), record the
+array-addressing gap as a tracked follow-up.
+
+**Resolution:** _(open)_
@@ -0,0 +1,197 @@
+# Code Review — Driver.Cli.Common
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 2 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.Cli.Common-003 |
+| 4 | Error handling & resilience | Driver.Cli.Common-004 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.Cli.Common-005 |
+| 10 | Documentation & comments | Driver.Cli.Common-006 |
+
+## Findings
+
+### Driver.Cli.Common-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
+| Status | Resolved |
+
+**Description:** The `FormatStatus` shortlist maps four OPC UA status names to incorrect
+numeric codes. The correct OPC UA spec values (verified against the OPC Foundation
+UA-.NETStandard `Opc.Ua.StatusCodes` table) are:
+
+| Name in shortlist | Code used | Correct code | What the used code actually is |
+|---|---|---|---|
+| `BadTimeout` | `0x80060000` | `0x800A0000` | `0x80060000` = `BadEncodingError` |
+| `BadNoCommunication` | `0x80070000` | `0x80310000` | `0x80070000` = `BadDecodingError` |
+| `BadWaitingForInitialData` | `0x80080000` | `0x80320000` | `0x80080000` = `BadEncodingLimitsExceeded` |
+| `BadNodeIdInvalid` | `0x80350000` | `0x80330000` | `0x80350000` = `BadAttributeIdInvalid` |
+
+`Good` (`0x00000000`), `Bad` (`0x80000000`), `BadCommunicationError` (`0x80050000`),
+`BadNodeIdUnknown` (`0x80340000`), `BadTypeMismatch` (`0x80740000`), and `Uncertain`
+(`0x40000000`) are correct.
+
+This is operator-facing and load-bearing: the CLI's whole purpose is to label driver
+status codes so a human can interpret a probe/read/write. A real device timeout
+(`0x800A0000`) renders as bare `0x800A0000` with no name, while an encoding-error
+status (`0x80060000`) is mislabeled `BadTimeout`. A driver returning
+`BadAttributeIdInvalid` (`0x80350000`) is mislabeled `BadNodeIdInvalid`. The
+`SnapshotFormatterTests` `[Theory]` cases for these codes assert against the wrong
+expectations and therefore pass while the mapping is wrong (see Driver.Cli.Common-005).
+
+**Recommendation:** Correct the four mappings to the spec values. Prefer deriving names
+from the OPC Foundation `Opc.Ua.StatusCodes` constants (the stack the project already
+depends on transitively) rather than hand-maintaining a hex shortlist, so the table
+cannot drift from the spec again. If a hand-list is kept, add a test that cross-checks
+each entry against `Opc.Ua.StatusCodes` reflection.
+
+**Resolution:** Resolved 2026-05-22 — corrected the four mismapped `FormatStatus` codes
+to their canonical `Opc.Ua.StatusCodes` values (`BadTimeout` 0x800A0000, `BadNoCommunication`
+0x80310000, `BadWaitingForInitialData` 0x80320000, `BadNodeIdInvalid` 0x80330000); the CLI
+project does not reference the `Opc.Ua` package so the hex literals were corrected in place
+with a sync note, and `SnapshotFormatterTests` was updated with corrected expectations plus
+a regression `[Theory]` asserting the pre-fix wrong names no longer apply.
+
+### Driver.Cli.Common-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` |
+| Status | Resolved |
+
+**Description:** `FormatStatus` matches the full 32-bit status word for exact equality
+against the shortlist. OPC UA status codes carry flag and info bits in the low 16 bits
+(structure-changed, semantics-changed, info type, limit bits, overflow, etc.). A
+driver-supplied status such as `0x80050001` or any `Good` value with info bits set
+(e.g. an overflow bit) falls through the `switch` and renders as bare hex even though
+the high bits clearly identify the severity class. The doc comment on `FormatStatus`
+claims the well-known statuses are named, but only the bit-exact canonical forms are.
+
+**Recommendation:** Either (a) narrow the doc-comment claim to bit-exact canonical
+codes, or (b) match on the severity bits (`code & 0xC0000000`) to at least always emit
+`Good` / `Uncertain` / `Bad` even when sub-code bits are set, and match the named codes
+on the masked code (`code & 0xFFFF0000`).
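
Option (b) can be sketched as follows, in Python rather than C#. The named entries use the canonical values listed in Driver.Cli.Common-001; the `0x........ (Name)` output shape is an assumption and may not match the CLI's exact format, and `format_status` is a hypothetical stand-in for `SnapshotFormatter.FormatStatus`.

```python
# Canonical OPC UA StatusCode values for the shortlist (high 16 bits only).
NAMED_STATUS = {
    0x00000000: "Good",
    0x40000000: "Uncertain",
    0x80000000: "Bad",
    0x80050000: "BadCommunicationError",
    0x800A0000: "BadTimeout",
    0x80310000: "BadNoCommunication",
    0x80320000: "BadWaitingForInitialData",
    0x80330000: "BadNodeIdInvalid",
    0x80340000: "BadNodeIdUnknown",
    0x80740000: "BadTypeMismatch",
}
SEVERITY = {0x00000000: "Good", 0x40000000: "Uncertain", 0x80000000: "Bad"}


def format_status(code: int) -> str:
    code &= 0xFFFFFFFF
    # Ignore the low 16 flag/info bits when looking up a named code.
    name = NAMED_STATUS.get(code & 0xFFFF0000)
    if name is None:
        # Unknown sub-code: fall back to the severity class from the top two bits.
        # 0xC0000000 is reserved; this sketch reports it as Bad.
        name = SEVERITY.get(code & 0xC0000000, "Bad")
    return f"0x{code:08X} ({name})"
```

The raw hex is always printed alongside the name, so masking for the lookup never hides the flag bits from the operator.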
+
+**Resolution:** Resolved 2026-05-22 — `FormatStatus` now matches named codes on `code & 0xFFFF0000` and falls back to a severity-class label (`Good`/`Uncertain`/`Bad`) via `code & 0xC0000000` for unknown sub-codes; the stale "bare-hex for unknown codes" test expectation was corrected to reflect the new severity-class fallback.
+
+### Driver.Cli.Common-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
+| Status | Resolved |
+
+**Description:** `ConfigureLogging` assigns the process-global `Serilog.Log.Logger`
+without disposing the previously assigned logger and the library never calls
+`Log.CloseAndFlush()`. Each call creates a fresh `Logger` via `CreateLogger()` and
+overwrites `Log.Logger`; the prior instance (and its console sink) is never disposed
+or flushed. The class is the shared base for every driver CLI and the `subscribe` verb
+is long-running — if any command path re-invokes `ConfigureLogging` the buffered
+console sink is abandoned without a flush, and on process exit the final logger is also
+never flushed. Verbose debug output written just before exit can be lost.
+
+**Recommendation:** Call `Log.CloseAndFlush()` on shutdown (e.g. in a `finally` in the
+command `ExecuteAsync`, or via a `protected` disposal helper on this base). Treat
+`ConfigureLogging` as call-once / idempotent and document that. At minimum capture and
+dispose the previous logger if reconfiguration is genuinely intended.
+
+**Resolution:** Resolved 2026-05-22 — `ConfigureLogging` is now idempotent (guarded by `_loggingConfigured` field) and disposes the previous `Log.Logger` before overwriting; a new `protected static FlushLogging()` helper calls `Log.CloseAndFlush()` for commands to call in their `finally` blocks; XML doc updated accordingly.
+
+### Driver.Cli.Common-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
+| Status | Open |
+
+**Description:** `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the
+value and status columns) without guarding against empty input. When `tagNames` and
+`snapshots` are both empty (equal length, so the mismatch check at line 56 passes),
+`Enumerable.Max` throws `InvalidOperationException` ("Sequence contains no elements").
+A batch read that legitimately returns zero tags therefore crashes the formatter
+instead of producing an empty (header-only) table.
+
+**Recommendation:** Short-circuit on `rows.Length == 0` (return just the header +
+separator, or an explicit "no rows" line), or use `DefaultIfEmpty(0).Max(...)` for the
+width computations.
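
A Python sketch of the `DefaultIfEmpty(0).Max(...)` option, where `max(..., default=0)` plays the same role. It renders only three columns (tag, value, status) with hypothetical headers; the real formatter also has a source-time column, and `format_table` is not its actual name.

```python
def format_table(tag_names, snapshots):
    """Render (value, status) snapshots as aligned rows; zero rows yields header + separator."""
    if len(tag_names) != len(snapshots):
        raise ValueError("tag_names and snapshots must have the same length")
    header = ("TAG", "VALUE", "STATUS")
    rows = [(tag, str(value), status) for tag, (value, status) in zip(tag_names, snapshots)]
    # default=0 is what keeps an empty batch from raising, like DefaultIfEmpty(0).Max(...).
    widths = [
        max(len(title), max((len(row[i]) for row in rows), default=0))
        for i, title in enumerate(header)
    ]

    def line(cells):
        return "  ".join(cell.ljust(width) for cell, width in zip(cells, widths)).rstrip()

    out = [line(header), line(["-" * width for width in widths])]
    out.extend(line(row) for row in rows)
    return "\n".join(out)
```

Taking the header width into the `max` also keeps short columns from rendering narrower than their titles.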
+
+**Resolution:** _(open)_
+
+### Driver.Cli.Common-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` |
+| Status | Resolved |
+
+**Description:** The `FormatStatus_names_well_known_status_codes` `[Theory]` asserts
+`0x80060000 => "BadTimeout"`, which encodes the wrong spec value (see
+Driver.Cli.Common-001). The test passes because it validates the formatter against the
+same incorrect table, so the bug is invisible to CI. Additionally there is no coverage
+for: `DriverCommandBase` (`ConfigureLogging` verbose vs non-verbose level selection — no
+test exercises the base at all), `FormatTable` with empty input (Driver.Cli.Common-004
+would have been caught), `FormatValue` with array / enum / custom `object` values, and
+`FormatTimestamp` with `DateTimeKind.Unspecified` (the docs imply Unspecified is
+normalised but only `Local` is tested).
+
+**Recommendation:** Fix the `[Theory]` expectations once Driver.Cli.Common-001 is
+resolved, and add a test asserting each shortlist entry against the OPC Foundation
+`Opc.Ua.StatusCodes` constants so the table cannot silently drift. Add `FormatTable`
+empty-input and `DriverCommandBase` level-selection tests.
+
+**Resolution:** Resolved 2026-05-22 — added `FormatTable_with_empty_input_returns_header_only` (exercises the -004 fix), `FormatStatus_with_sub_code_bits_resolves_to_named_class` / `FormatStatus_unknown_sub_code_falls_back_to_severity_class` Theories (cover -002 fix), and a new `DriverCommandBaseTests` class with four tests covering verbose/non-verbose level selection, idempotency of `ConfigureLogging`, and `FlushLogging`; stale `FormatStatus_unknown_codes_fall_back_to_hex_only` expectation corrected to match the -002 severity-class fallback.
+
+### Driver.Cli.Common-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
+| Status | Open |
+
+**Description:** Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71`
+states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement
+needed" — true only when every snapshot has a non-null `SourceTimestampUtc`.
+`FormatTimestamp` returns `"-"` for a null timestamp, so a mixed table has a 1-char-wide
+cell in an otherwise 24-char column; the column is unaligned. Harmless (right-most, no
+padding consumer) but the stated invariant does not hold. (2) The `DriverCommandBase`
+class summary enumerates "Modbus / AB CIP / AB Legacy / S7 / TwinCAT" as the driver CLIs
+but omits FOCAS, which `docs/DriverClis.md` lists as the sixth CLI built on this shared
+library. The XML doc is stale relative to the shipped driver-CLI set.
+
+**Recommendation:** Reword the `SnapshotFormatter.cs:71` comment to note the column is
+right-most and intentionally unpadded rather than claiming fixed width. Add FOCAS to the
+`DriverCommandBase` class-summary driver list.
+
+**Resolution:** _(open)_
@@ -0,0 +1,183 @@
+# Code Review — Driver.FOCAS.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.FOCAS.Cli-001 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.FOCAS.Cli-002 |
+| 4 | Error handling & resilience | Driver.FOCAS.Cli-001, Driver.FOCAS.Cli-003 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.FOCAS.Cli-004 |
+| 7 | Design-document adherence | Driver.FOCAS.Cli-005 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | No issues found (see note) |
+| 10 | Documentation & comments | No issues found |
+
+> Category 9 note: per `docs/DriverClis.md` the FOCAS CLI deliberately ships
+> with no CLI-level test project (it is hardware-gated, a decision that followed
+> the Tier-C isolation work on task #220). The four command classes are thin pass-throughs to the
+> already-tested `FocasDriver`; the only CLI-local logic is `ParseValue` /
+> `ParseBool` / `SynthesiseTagName`, which the sibling CLIs cover with unit
+> tests. The absence of a `*.Cli.Tests` project is an intentional, documented
+> gap rather than a review finding — but see Driver.FOCAS.Cli-001 for the parse
+> path that would benefit most from coverage.
+
+## Findings
+
+### Driver.FOCAS.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `Commands/WriteCommand.cs:58-68` |
+| Status | Open |
+
+**Description:** `WriteCommand.ParseValue` parses the numeric `--value` types
+(`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse`
+/ etc. These throw raw `FormatException` or `OverflowException` for malformed or
+out-of-range input. Only the `Bit` case and the unsupported-type case throw
+`CliFx.Exceptions.CommandException`. CliFx renders a `CommandException` as a
+clean one-line error, but an uncaught `FormatException`/`OverflowException`
+surfaces as a full .NET stack trace — a poor experience for an operator who
+simply mistyped a value (e.g. `write -a R100 -t Int16 -v abc`). Because the
+parse fails before any driver work, the stack trace also obscures the fact that
+the write never reached the CNC.
+
+**Recommendation:** Wrap the numeric parses (e.g. via `TryParse` per type, or a
+`try`/`catch` that rethrows as `CommandException`) so malformed `--value` input
+produces a clean, actionable message naming the expected type and the rejected
+literal — consistent with how `ParseBool` already handles bad boolean input.
+The same pattern exists in the sibling S7 CLI; a shared helper in
+`Driver.Cli.Common` would fix both.
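+
+A minimal sketch of the shared helper the recommendation describes, assuming .NET 7+
+generic math (`INumberBase<T>.TryParse`). `CliValueParser` and `ParseNumeric` are
+hypothetical names, and the local `CommandException` class is a stand-in for
+`CliFx.Exceptions.CommandException` so the sketch compiles without CliFx; `Byte` maps
+to `sbyte` to mirror the `sbyte.Parse` call described above.
+
+```csharp
+using System;
+using System.Globalization;
+using System.Numerics;
+
+// Stand-in for CliFx.Exceptions.CommandException (assumption: keeps the sketch
+// dependency-free). In the real CLI, throw the CliFx type instead.
+public sealed class CommandException : Exception
+{
+    public CommandException(string message) : base(message) { }
+}
+
+// Hypothetical shared helper for Driver.Cli.Common.
+public static class CliValueParser
+{
+    public static object ParseNumeric(string type, string literal) => type switch
+    {
+        // Byte -> sbyte mirrors the sbyte.Parse call in WriteCommand.ParseValue.
+        "Byte" => Parse<sbyte>(type, literal, NumberStyles.Integer),
+        "Int16" => Parse<short>(type, literal, NumberStyles.Integer),
+        "Int32" => Parse<int>(type, literal, NumberStyles.Integer),
+        "Float32" => Parse<float>(type, literal, NumberStyles.Float),
+        "Float64" => Parse<double>(type, literal, NumberStyles.Float),
+        _ => throw new CommandException($"Unsupported --type '{type}'."),
+    };
+
+    // TryParse covers both failure modes the raw Parse calls throw for:
+    // malformed text (FormatException) and out-of-range values (OverflowException).
+    private static object Parse<T>(string type, string literal, NumberStyles style)
+        where T : INumberBase<T>
+    {
+        if (T.TryParse(literal, style, CultureInfo.InvariantCulture, out var value))
+            return value!;
+        throw new CommandException(
+            $"--value '{literal}' is not a valid {type} (malformed or out of range).");
+    }
+}
+```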
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `Commands/SubscribeCommand.cs:45-51` |
+| Status | Open |
+
+**Description:** The `subscribe` command attaches an `OnDataChange` handler that
+calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from
+the driver's `PollGroupEngine` tick thread, while the command's main flow writes
+the "Subscribed to ..." banner from the CliFx invocation thread. The CliFx
+`IConsole.Output` `TextWriter` is not documented as thread-safe; with a single
+poll group the change events are serialised, but the banner write at lines 55-56
+can interleave with the first poll-driven change line. The handler is also never
+detached from the event before driver disposal — benign here because the driver
+is disposed in the same `finally`, but it leaves a dangling subscription if the
+command is ever refactored to reuse the driver.
+
+**Recommendation:** Write the "Subscribed" banner before wiring the
+`OnDataChange` handler (it is informational and ordering-sensitive), or guard
+console writes with a lock shared between the banner and the handler. Optionally
+detach the handler in the `finally` block before `ShutdownAsync` for symmetry
+with the `handle` teardown already present there.
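+
+One way to apply both recommendations, sketched against a hypothetical `FakeDriver`
+event source rather than the real `FocasDriver`: the banner is written before the
+handler is attached, all writes go through `TextWriter.Synchronized` (a standard .NET
+wrapper that serialises calls), and the handler is detached in `finally`. The banner
+text and the `waitForCancel` callback are illustrative.
+
+```csharp
+using System;
+using System.IO;
+
+// Hypothetical stand-in for the driver's OnDataChange event.
+public sealed class FakeDriver
+{
+    public event Action<string>? OnDataChange;
+    public void RaiseChange(string line) => OnDataChange?.Invoke(line);
+}
+
+public static class SubscribeSketch
+{
+    public static void Run(FakeDriver driver, TextWriter console, Action waitForCancel)
+    {
+        // Synchronized wrapper: each call is serialised, so a poll-thread line
+        // and the banner cannot tear each other.
+        var output = TextWriter.Synchronized(console);
+
+        // Banner first: no change line can precede it because no handler exists yet.
+        output.WriteLine("Subscribed to R100");
+
+        Action<string> handler = line => output.WriteLine(line);
+        driver.OnDataChange += handler;
+        try
+        {
+            waitForCancel();
+        }
+        finally
+        {
+            // Detach before teardown, mirroring the existing handle cleanup.
+            driver.OnDataChange -= handler;
+        }
+    }
+}
+```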
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
+| Status | Open |
+
+**Description:** The numeric command options `--cnc-port`, `--timeout-ms`, and
+`--interval-ms` are accepted without range validation. A zero or negative
+`--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0`
+yields a zero `TimeSpan` operation timeout; a zero/negative `--interval-ms`
+produces a non-positive `publishingInterval` passed straight into
+`PollGroupEngine.Subscribe`. Depending on how the engine tolerates such values, these surface
+either as an opaque downstream exception or as a tight-spinning poll loop rather
+than a clear "value must be positive" message at the CLI boundary.
+
+**Recommendation:** Validate the three numeric options at the top of
+`ExecuteAsync` (or in `FocasCommandBase`) and throw a
+`CliFx.Exceptions.CommandException` when out of range — port in `1..65535`,
+timeout and interval strictly positive. The same gap exists across the sibling
+driver CLIs, so a shared validation helper in `Driver.Cli.Common` is the
+cleaner fix.
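+
+A minimal sketch of the shared validation helper, assuming the options are `int`
+properties. `CliOptionGuards` is a hypothetical name, and the local `CommandException`
+stands in for `CliFx.Exceptions.CommandException` to keep the sketch dependency-free.
+
+```csharp
+using System;
+
+// Stand-in for CliFx.Exceptions.CommandException (assumption for a self-contained sketch).
+public sealed class CommandException : Exception
+{
+    public CommandException(string message) : base(message) { }
+}
+
+// Hypothetical shared guard helper for Driver.Cli.Common.
+public static class CliOptionGuards
+{
+    public static void RequirePort(int port, string option)
+    {
+        if (port is < 1 or > 65535)
+            throw new CommandException($"{option} must be in 1..65535 (got {port}).");
+    }
+
+    public static void RequirePositive(int value, string option)
+    {
+        if (value <= 0)
+            throw new CommandException($"{option} must be greater than zero (got {value}).");
+    }
+}
+```
+
+A command would call `CliOptionGuards.RequirePort(CncPort, "--cnc-port")` and
+`CliOptionGuards.RequirePositive(TimeoutMs, "--timeout-ms")` (plus `IntervalMs` in
+`subscribe`) at the top of `ExecuteAsync`, before constructing the driver.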
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
+| Status | Open |
+
+**Description:** Every command declares `await using var driver = new FocasDriver(...)`
+**and** explicitly calls `await driver.ShutdownAsync(CancellationToken.None)` in
+the `finally` block. `FocasDriver.DisposeAsync()` itself calls `ShutdownAsync`,
+so shutdown runs twice per command invocation. `FocasDriver.ShutdownAsync` is
+idempotent (it clears `_devices` / `_tagsByName`, and the second pass iterates
+an empty collection), so there is no functional bug — but the redundant call is
+dead weight and obscures intent: a reader cannot tell whether the explicit
+`ShutdownAsync` or the `await using` is the real teardown.
+
+**Recommendation:** Drop the explicit `ShutdownAsync` from the `finally` blocks
+and rely on `await using` for disposal, or drop `await using` and keep the
+explicit teardown — but not both. The same redundancy exists in the sibling CLIs.
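+
+The single-teardown shape, sketched with a hypothetical `FakeDriver` whose
+`DisposeAsync` delegates to `ShutdownAsync` (as the description says
+`FocasDriver.DisposeAsync()` does) and counts how often shutdown runs:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical stand-in: like FocasDriver, DisposeAsync delegates to ShutdownAsync.
+public sealed class FakeDriver : IAsyncDisposable
+{
+    public int ShutdownCount { get; private set; }
+
+    public Task ShutdownAsync(CancellationToken ct)
+    {
+        ShutdownCount++;
+        return Task.CompletedTask;
+    }
+
+    public async ValueTask DisposeAsync() => await ShutdownAsync(CancellationToken.None);
+}
+
+public static class ProbeSketch
+{
+    // One teardown path: `await using` disposes the driver on every exit,
+    // exceptions included, so no explicit ShutdownAsync in a finally block.
+    public static async Task<FakeDriver> RunAsync(Func<FakeDriver, Task> body)
+    {
+        await using var driver = new FakeDriver();
+        await body(driver);
+        return driver;
+    }
+}
+```
+
+Adding an explicit `await driver.ShutdownAsync(CancellationToken.None)` in a `finally`
+inside `RunAsync` would raise `ShutdownCount` to 2, which is the redundancy this
+finding describes.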
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
+| Status | Open |
+
+**Description:** `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and
+`BadCommunicationError` as the key diagnostic signals an operator reads off
+`probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure`
+after a successful connect means ..."). The FOCAS driver's `FocasStatusMapper`
+also emits `BadNotWritable` (0x803B0000), `BadOutOfRange` (0x803C0000),
+`BadNotSupported` (0x803D0000), `BadDeviceFailure` (0x80550000),
+`BadInternalError` (0x80020000), and `BadTimeout` (0x800A0000). The shared
+`SnapshotFormatter.FormatStatus` shortlist only names `Good`, `Bad`,
+`BadCommunicationError`, `BadTimeout` (mapped to 0x80060000, a *different* code
+from the mapper's `BadTimeout` 0x800A0000; see Driver.Cli.Common-001), `BadNoCommunication`,
+`BadWaitingForInitialData`, `BadNodeIdUnknown`, `BadNodeIdInvalid`,
+`BadTypeMismatch`, and `Uncertain`. Consequently a FOCAS `write` to a
+non-writable address, a parameter-write rejected by the CNC, or a
+`BadDeviceFailure` session-setup rejection renders as a bare hex code
+(`0x803B0000`, `0x80550000`, …) with no name — directly contradicting the
+documented workflow where the operator is told to read those status names.
+
+**Recommendation:** Extend `SnapshotFormatter.FormatStatus` (in
+`Driver.Cli.Common`) to name the `Bad*` codes the native-protocol drivers
+actually emit — at minimum `BadNotWritable`, `BadOutOfRange`, `BadNotSupported`,
+`BadDeviceFailure`, `BadInternalError`, and the mapper `BadTimeout`
+(0x800A0000). The fix belongs in the shared library, but it is recorded here
+because the gap defeats this module's documented `probe`/`write` diagnostic
+workflow; cross-reference the `Driver.Cli.Common` review.
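+
+A minimal sketch of the extended lookup, covering a subset of the codes named above.
+The class, method, and output format are hypothetical; the values are the OPC UA
+specification codes, and, as Driver.Cli.Common-005 recommends, a real table would be
+checked against the OPC Foundation `Opc.Ua.StatusCodes` constants rather than
+hand-typed. The low 16 bits of a StatusCode carry flags and info bits rather than the
+code identity, so the lookup masks them off.
+
+```csharp
+using System.Collections.Generic;
+
+public static class StatusNames
+{
+    // Subset of OPC UA status codes, keyed on severity + sub-code (high 16 bits).
+    private static readonly Dictionary<uint, string> Names = new()
+    {
+        [0x00000000] = "Good",
+        [0x80020000] = "BadInternalError",
+        [0x80050000] = "BadCommunicationError",
+        [0x800A0000] = "BadTimeout",
+        [0x803B0000] = "BadNotWritable",
+        [0x803C0000] = "BadOutOfRange",
+        [0x803D0000] = "BadNotSupported",
+    };
+
+    public static string Format(uint status)
+    {
+        // Low 16 bits hold flags/info bits, so mask them off before the lookup.
+        var code = status & 0xFFFF0000u;
+        return Names.TryGetValue(code, out var name)
+            ? $"0x{status:X8} {name}"
+            : $"0x{status:X8}";
+    }
+}
+```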
+
+**Resolution:** _(open)_
@@ -0,0 +1,330 @@
+# Code Review — Driver.FOCAS
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.FOCAS-001, Driver.FOCAS-002, Driver.FOCAS-003 |
+| 2 | OtOpcUa conventions | Driver.FOCAS-004 |
+| 3 | Concurrency & thread safety | Driver.FOCAS-005 |
+| 4 | Error handling & resilience | Driver.FOCAS-006, Driver.FOCAS-007 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.FOCAS-008 |
+| 7 | Design-document adherence | Driver.FOCAS-009 |
+| 8 | Code organization & conventions | Driver.FOCAS-010, Driver.FOCAS-011 |
+| 9 | Testing coverage | Driver.FOCAS-012 |
+| 10 | Documentation & comments | No issues found |
+
+## Findings
+
+### Driver.FOCAS-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
+| Status | Resolved |
+
+**Description:** `FocasDriverConfigDto` exposes only `Backend`, `Series`, `TimeoutMs`,
+`Devices`, `Tags`, and `Probe`. It has no `FixedTree`, `AlarmProjection`, or
+`HandleRecycle` properties, and `CreateInstance` never sets those three options on
+`FocasDriverOptions`. As a result, a deployment that follows the documented config
+(`docs/drivers/FOCAS.md` shows `"FixedTree": { "Enabled": true }`,
+`"AlarmProjection": { "Enabled": true }`, and `"HandleRecycle": { "Enabled": true }`
+inside `Config`) is parsed with `PropertyNameCaseInsensitive`, and the unknown sections
+are silently discarded. The features stay at their hard-coded defaults (all `Enabled = false`):
+the fixed-node tree never appears, alarm subscriptions throw `NotSupportedException`
+("FOCAS alarm projection is disabled"), and handle recycling never runs, even though the
+operator explicitly opted in.
+
+**Recommendation:** Add `FixedTree`, `AlarmProjection`, and `HandleRecycle` DTO classes
+to `FocasDriverConfigDto`, parse their `TimeSpan`/`bool` fields, and populate the
+corresponding `FocasDriverOptions` properties in `CreateInstance`. Consider enabling
+strict JSON handling (`JsonUnmappedMemberHandling.Disallow` on
+`JsonSerializerOptions.UnmappedMemberHandling`, available from .NET 8) so future
+unknown config sections fail loudly instead of being dropped.
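+
+A minimal sketch of the DTO shape and strict parsing, assuming `System.Text.Json` on
+.NET 8+. The DTO classes are hypothetical, trimmed to the members needed to show the
+mapping; the real `FocasDriverConfigDto` carries more fields.
+
+```csharp
+using System;
+using System.Text.Json;
+using System.Text.Json.Serialization;
+
+// Hypothetical DTO shapes: only the members needed to show the mapping.
+public sealed class FixedTreeDto { public bool? Enabled { get; set; } }
+public sealed class AlarmProjectionDto { public bool? Enabled { get; set; } }
+public sealed class HandleRecycleDto { public bool? Enabled { get; set; } }
+
+public sealed class FocasConfigDtoSketch
+{
+    public int? TimeoutMs { get; set; }
+    public FixedTreeDto? FixedTree { get; set; }
+    public AlarmProjectionDto? AlarmProjection { get; set; }
+    public HandleRecycleDto? HandleRecycle { get; set; }
+}
+
+public static class FocasConfigParser
+{
+    private static readonly JsonSerializerOptions Options = new()
+    {
+        PropertyNameCaseInsensitive = true,
+        // .NET 8+: an unknown member throws JsonException instead of being dropped.
+        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
+    };
+
+    public static FocasConfigDtoSketch Parse(string json) =>
+        JsonSerializer.Deserialize<FocasConfigDtoSketch>(json, Options)
+        ?? throw new InvalidOperationException("FOCAS driver config was null.");
+}
+```
+
+A missing section leaves its property `null`, so `CreateInstance` can keep the
+option's default; a misspelt section name such as `FixedTre` now fails at load
+instead of being ignored.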
+
+**Resolution:** Resolved 2026-05-22 — added `FixedTreeDto`/`AlarmProjectionDto`/`HandleRecycleDto` to `FocasDriverConfigDto` and `Build*` mappers in `CreateInstance` that populate the matching `FocasDriverOptions` properties (missing section / field keeps its default).
+
+### Driver.FOCAS-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
+| Status | Resolved |
+
+**Description:** The fixed-tree bootstrap probes the `ProgramInfo` capability via
+`SafeTryProbe(() => client.GetProgramInfoAsync(ct))` and treats a non-null result as
+"supported". But `WireFocasClient.GetProgramInfoAsync` never throws on a FOCAS error
+return code: `ReadExecutingProgramNameAsync`, `ReadBlockCountAsync`, and
+`ReadOperationModeCodeAsync` all return `FocasResult<T>` envelopes, and the method
+substitutes defaults (`string.Empty`, `0`) when `IsOk` is false instead of throwing. It
+only throws from `RequireConnected()`. Consequently `GetProgramInfoAsync` always
+returns a non-null `FocasProgramInfo`, so `Capabilities.ProgramInfo` is set `true` even
+on a CNC series that returns `EW_FUNC`/`EW_NOOPT` for `cnc_exeprgname2`/`cnc_rdopmode`.
+The driver then emits the `Program/` and `OperationMode/` subtrees and polls them every
+tick against a controller that does not support them: the exact "nodes that only ever
+return BadDeviceFailure" outcome the capability suppression was designed to prevent
+(`docs/drivers/FOCAS.md`, "Per-series node suppression").
+
+**Recommendation:** Make `GetProgramInfoAsync` throw (or return a nullable result) when
+the underlying `cnc_exeprgname2` / `cnc_rdopmode` calls report a non-zero RC, so
+`SafeTryProbe` can correctly classify the series. At minimum require the program-name
+or op-mode read to be `IsOk` before declaring the capability present.
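+
+A sketch of the stricter capability check, using hypothetical `FocasResult<T>` and
+`ProgramInfo` shapes (the real envelope and `FocasProgramInfo` may differ; `Rc == 0`
+is assumed to mean `EW_OK`). It throws only when neither read succeeds, which matches
+the minimum the recommendation asks for:
+
+```csharp
+using System;
+
+// Hypothetical shapes: a FOCAS call result (Rc == 0 assumed to be EW_OK).
+public readonly record struct FocasResult<T>(short Rc, T Value)
+{
+    public bool IsOk => Rc == 0;
+}
+
+public sealed record ProgramInfo(string ProgramName, int OperationMode);
+
+public static class ProgramInfoProbe
+{
+    // Throw when neither capability read succeeds, so a SafeTryProbe-style
+    // wrapper classifies ProgramInfo as unsupported instead of present.
+    public static ProgramInfo Combine(FocasResult<string> programName, FocasResult<int> opMode)
+    {
+        if (!programName.IsOk && !opMode.IsOk)
+            throw new InvalidOperationException(
+                $"ProgramInfo unsupported (cnc_exeprgname2 rc={programName.Rc}, cnc_rdopmode rc={opMode.Rc}).");
+
+        // Partial support: keep the defaults only for the read that failed.
+        return new ProgramInfo(
+            programName.IsOk ? programName.Value : string.Empty,
+            opMode.IsOk ? opMode.Value : 0);
+    }
+}
+```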
+
+**Resolution:** Resolved 2026-05-22 — `WireFocasClient.GetProgramInfoAsync` now throws `InvalidOperationException` when neither the `cnc_exeprgname2` nor the `cnc_rdopmode` read is `IsOk`, so `SafeTryProbe` records `ProgramInfo` as unsupported on series that answer `EW_FUNC`/`EW_NOOPT`.
+
+### Driver.FOCAS-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `FocasDriver.cs:71-79` |
+| Status | Resolved |
+
+**Description:** In `InitializeAsync`, capability-matrix validation only runs when
+`_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose
+`DeviceHostAddress` does not match any configured device (a common config typo, e.g. a
+trailing `:8193` mismatch or a wrong host) silently skips validation and is still added
+to `_tagsByName`. The mistake is not surfaced at load time; it only manifests at read
+time as `BadNodeIdUnknown` (`ReadAsync` lines 191-194), defeating the documented goal
+that "config errors now fail at load instead of per-read"
+(`docs/v2/focas-version-matrix.md`).
+
+**Recommendation:** After parsing the tag address, if `_devices` does not contain
+`tag.DeviceHostAddress`, throw an `InvalidOperationException` naming the tag and the
+unresolved device host so the operator fixes the typo at startup.
+
+**Resolution:** Resolved 2026-05-22 — `InitializeAsync` now throws `InvalidOperationException` naming the tag and the unresolved device when `_devices` does not contain `tag.DeviceHostAddress`, preventing silent skip-and-defer to per-read `BadNodeIdUnknown`.
+
+### Driver.FOCAS-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | OtOpcUa conventions |
+| Location | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
+| Status | Resolved |
+
+**Description:** `DiscoverAsync` emits user tags with
+`SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`,
+and `FocasTagDefinition.Writable` defaults to `true` (the factory also defaults it via
+`t.Writable ?? true`). But the production `wire` backend's
+`WireFocasClient.WriteAsync` unconditionally returns `FocasStatusMapper.BadNotWritable`,
+because the driver is read-only against FOCAS by design (`docs/drivers/FOCAS.md`). The result
+is that every tag is advertised in the address space as a writable `Operate` node, yet
+every write attempt fails. This is misleading to OPC UA clients and to the
+`DriverNodeManager` ACL layer, which will grant write permission on nodes that can never
+be written.
+
+**Recommendation:** Either default `Writable` to `false` for the FOCAS driver, or have
+`DiscoverAsync` force `SecurityClassification.ViewOnly` when the active backend cannot
+write. Given the wire backend is read-only and is the only production backend, treating
+all FOCAS tags as `ViewOnly` is the simplest correct behaviour.
+
+**Resolution:** Resolved 2026-05-22 — `DiscoverAsync` now unconditionally emits `SecurityClassification.ViewOnly` for all user-authored tags; the `Writable` config field no longer influences the advertised security class since the wire backend never writes.
+
+### Driver.FOCAS-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
+| Status | Resolved |
+
+**Description:** `_health` is a plain (non-volatile) field mutated from multiple
+concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can
+all run on different threads simultaneously (subscriptions go through `PollGroupEngine`
+timers; probe loops are `Task.Run`). Several updates are read-modify-write - e.g.
+`new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ...)` reads `_health`
+then writes a new instance - so a concurrent update can be lost or a stale
+`LastSuccessfulRead` propagated. While `DriverHealth` is an immutable record and the
+reference write is atomic, the lack of synchronization means `GetHealth()` can observe
+a stale snapshot and successful-read timestamps can regress.
+
+**Recommendation:** Guard `_health` reads/writes with a lock, or use
+`Interlocked.Exchange`/`Volatile` around the whole record reference and compute the new
+value from a single captured snapshot. The `DeviceState`/`HostState` transition already
+uses `ProbeLock`; apply the same discipline to driver health.
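+
+A sketch of the `Interlocked` option, using hypothetical `DriverHealth`/`DriverState`
+shapes modelled on the constructor call quoted above. The compare-and-swap loop
+computes the new record from one captured snapshot and retries if another thread
+published in between, so no update is lost:
+
+```csharp
+using System;
+using System.Threading;
+
+// Hypothetical shapes mirroring the finding; the real DriverHealth may differ.
+public enum DriverState { Healthy, Degraded }
+
+public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);
+
+public sealed class HealthHolder
+{
+    private DriverHealth _health = new(DriverState.Healthy, null, null);
+
+    public DriverHealth Get() => Volatile.Read(ref _health);
+
+    // Lock-free read-modify-write: compute from one snapshot and publish only if no
+    // other thread replaced _health meanwhile; otherwise retry against the newer value.
+    public void Update(Func<DriverHealth, DriverHealth> transform)
+    {
+        while (true)
+        {
+            var snapshot = Volatile.Read(ref _health);
+            var next = transform(snapshot);
+            if (ReferenceEquals(Interlocked.CompareExchange(ref _health, next, snapshot), snapshot))
+                return;
+        }
+    }
+}
+```
+
+Because `transform` can run more than once under contention, it must be a pure
+function of the snapshot, e.g. `holder.Update(h => h with { State = DriverState.Degraded, LastError = "timeout" })`,
+which preserves `LastSuccessfulRead`.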
+
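+**Resolution:** Resolved 2026-05-22 — all `_health` reads use `Volatile.Read(ref _health)` and all writes use `Volatile.Write(ref _health, ...)`, so every thread observes the latest published reference and each read-modify-write sequence computes its new value from a single captured snapshot. `Volatile` alone does not make the read-modify-write atomic, so two concurrent updates can still race with last-writer-wins; the recommended lock or an `Interlocked.CompareExchange` loop would close that residual window.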
+
+### Driver.FOCAS-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
+| Status | Resolved |
+
+**Description:** `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across
+a transient disconnect: it only checks `device.Client is { IsConnected: true }` and
+otherwise calls `ConnectAsync` again on the same object. For a `WireFocasClient` whose
+underlying `FocasWireClient` has been disposed (e.g. via a `HandleRecycle` /
+`DisposeClient` race, or a prior teardown), every subsequent call hits
+`FocasWireClient.ThrowIfDisposed` and throws `ObjectDisposedException`. In `ReadAsync`
+that exception is caught only by the generic `catch (Exception ex)` and mapped to a
+permanent `BadCommunicationError` - the device stays wedged with no recovery path until
+`ReinitializeAsync` is invoked, because the reconnect logic never discards the disposed
+client.
+
+**Recommendation:** On any connect/use failure, treat a disposed or non-connected client
+as unrecoverable and recreate it from `_clientFactory`. Simplest: in
+`EnsureConnectedAsync`, when `device.Client` is non-null but not connected, dispose and
+null it before creating a fresh instance, rather than retrying `ConnectAsync` on the
+stale object.
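+
+A sketch of the discard-and-recreate path, written against hypothetical
+`IClient`/`DeviceState` shapes rather than the real `IFocasClient`; `FakeClient` is a
+test double that, like the wire client after disposal, throws `ObjectDisposedException`
+on reuse:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical minimal shapes standing in for IFocasClient and per-device state.
+public interface IClient : IDisposable
+{
+    bool IsConnected { get; }
+    Task ConnectAsync(CancellationToken ct);
+}
+
+public sealed class DeviceState
+{
+    public IClient? Client { get; set; }
+}
+
+public static class Connector
+{
+    public static async Task<IClient> EnsureConnectedAsync(
+        DeviceState device, Func<IClient> createClient, CancellationToken ct)
+    {
+        if (device.Client is { IsConnected: true } live)
+            return live;
+
+        // Never retry ConnectAsync on the cached object: it may already be disposed.
+        if (device.Client is { } stale)
+        {
+            device.Client = null;
+            stale.Dispose(); // Dispose is expected to be idempotent.
+        }
+
+        var fresh = createClient();
+        try
+        {
+            await fresh.ConnectAsync(ct);
+        }
+        catch
+        {
+            fresh.Dispose(); // do not leak a half-open client
+            throw;
+        }
+        device.Client = fresh;
+        return fresh;
+    }
+}
+
+// Test double: throws ObjectDisposedException on reuse after Dispose.
+public sealed class FakeClient : IClient
+{
+    public bool IsConnected { get; private set; }
+    public bool IsDisposed { get; private set; }
+
+    public Task ConnectAsync(CancellationToken ct)
+    {
+        if (IsDisposed) throw new ObjectDisposedException(nameof(FakeClient));
+        IsConnected = true;
+        return Task.CompletedTask;
+    }
+
+    public void Dispose() { IsDisposed = true; IsConnected = false; }
+}
+```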
+
+**Resolution:** Resolved 2026-05-22 — `EnsureConnectedAsync` now unconditionally disposes and nulls any existing non-connected client before calling `_clientFactory.Create()`, preventing `ObjectDisposedException` loops on a stale `WireFocasClient` after a `HandleRecycle` race or prior teardown.
+
+### Driver.FOCAS-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
+| Status | Open |
+
+**Description:** Numerous `try { ... } catch {}` blocks swallow every exception with no
+logging - `ShutdownAsync` (CTS cancel/dispose), `RecycleLoopAsync` (`DisposeClient`),
+`FixedTreeLoopAsync` transient catches, `ProbeLoopAsync`, and the alarm projection's
+`sub.Cts.Cancel()`. The driver takes no `ILogger` dependency at all (only
+`FocasWireClient` optionally accepts one, and the driver never supplies it). A CNC that
+is silently failing every probe/poll tick produces no diagnostic trail, which conflicts
+with the project's Serilog logging convention and forces field troubleshooting to rely
+solely on `GetHealth()`.
+
+**Recommendation:** Inject an `ILogger<FocasDriver>` and log caught exceptions in the
+poll/probe/recycle loops at `Debug`/`Warning`. Pass a logger into `FocasWireClient` so
+the per-response `Debug` entries it already emits are actually captured.
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
+| Status | Open |
+
+**Description:** `ReadAsync` and `WriteAsync` call `FocasAddress.TryParse(def.Address)`
+on every operation, even though `InitializeAsync` already parsed and validated every
+tag address. On a subscription hot path (each poll tick re-enters `ReadAsync`) this
+re-parses and allocates a `FocasAddress` record per tag per tick unnecessarily.
+
+**Recommendation:** Parse each tag address once at `InitializeAsync` and store the
+parsed `FocasAddress` on `FocasTagDefinition` (or in a side dictionary), so the runtime
+read/write paths use the cached value.
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
+| Status | Open |
+
+**Description:** `FocasProbeOptions.Timeout` is parsed by the factory
+(`FocasProbeDto.TimeoutMs` to `FocasProbeOptions.Timeout`) but never consumed.
+`ProbeLoopAsync` calls `client.ProbeAsync(ct)` with only the probe-loop cancellation
+token; no per-probe timeout is applied, and `EnsureConnectedAsync` uses
+`_options.Timeout` rather than `Probe.Timeout`. A hung CNC socket during a probe blocks
+until the OS TCP timeout rather than the configured `Probe.Timeout`.
+
+**Recommendation:** Apply `Probe.Timeout` as a linked `CancellationTokenSource` timeout
+around the `ProbeAsync` call, or remove the dead `Timeout` field from
+`FocasProbeOptions` / `FocasProbeDto` if it is genuinely not intended.
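+
+A sketch of applying `Probe.Timeout` through a linked `CancellationTokenSource`,
+assuming a probe delegate that returns `bool` and honours its token (the real
+`ProbeAsync` signature may differ; the helper name is hypothetical):
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+public static class ProbeTimeout
+{
+    // Hypothetical helper: one probe bounded by a per-probe timeout layered on the
+    // probe loop's own token. Returns false when the per-probe timeout fires.
+    public static async Task<bool> ProbeWithTimeoutAsync(
+        Func<CancellationToken, Task<bool>> probe,
+        TimeSpan timeout,
+        CancellationToken loopToken)
+    {
+        using var cts = CancellationTokenSource.CreateLinkedTokenSource(loopToken);
+        cts.CancelAfter(timeout);
+        try
+        {
+            return await probe(cts.Token);
+        }
+        catch (OperationCanceledException) when (!loopToken.IsCancellationRequested)
+        {
+            // Only the timeout fired; loop shutdown still propagates to the caller.
+            return false;
+        }
+    }
+}
+```
+
+The timeout only bounds the probe if `ProbeAsync` passes the token down to its socket
+I/O; a probe that ignores the token still blocks until the OS TCP timeout.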
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) |
+| Status | Open |
+
+**Description:** There are two parallel operation-mode-to-text mappings with divergent
+labels. `FocasOpMode.ToText` (used by the driver fixed-tree `OperationMode/ModeText`
+node) yields `"TJOG"`, `"TEACH_IN_HANDLE"`; `FocasOperationModeExtensions.ToText` (in
+the Wire layer) yields `"T-JOG"`, `"TEACH-IN-HANDLE"`. They also use different fallback
+formats (`Mode{mode}` vs the bare number). The same concept is encoded twice with
+inconsistent results depending on which path renders it.
+
+**Recommendation:** Consolidate to a single op-mode enum + `ToText` helper shared by
+both the wire layer and the driver projection, with one canonical label set.
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` |
+| Status | Open |
+
+**Description:** `FocasAlarmType` declares its constants as `public const int`, but the
+only consumers - `FocasAlarmProjection.MapAlarmType(short type)` and
+`MapSeverity(short type)` - take a `short` and `switch` against these `int` constants. It
+compiles only because the values (0..13) fit in `short` range as constant expressions.
+The type mismatch is a latent maintenance hazard: adding a constant above
+`short.MaxValue`, or changing the projection signatures, would break the switch in
+non-obvious ways. `FocasAlarmType.All` is `-1` and is also passed where a `short` is
+expected by `ReadAlarmsAsync`.
+
+**Recommendation:** Declare the `FocasAlarmType` constants as `short` (or make it an
+`enum : short`) so the type matches the wire field width and the projection signatures.
+
+**Resolution:** _(open)_
+
+### Driver.FOCAS-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
+| Status | Resolved |
+
+**Description:** The unit test project does not exercise
+`FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` /
+`HandleRecycle` config sections - which is why the config-mapping gap in
+Driver.FOCAS-001 was not caught. There is also no test that drives the fixed-tree
+bootstrap / capability-probe path (`FixedTreeLoopAsync`), so the false-positive
+`ProgramInfo` capability in Driver.FOCAS-002 is untested, and the
+`EnsureConnectedAsync` reconnect-after-disconnect path (Driver.FOCAS-006) has no
+coverage.
+
+**Recommendation:** Add factory tests that round-trip a full JSON config including the
+three opt-in sections and assert the options reach the driver; add a
+`FakeFocasClient`-driven test for fixed-tree bootstrap capability classification
+(including the unsupported-program-info case); add a reconnect test that disposes the
+fake client mid-session and asserts recovery.
+
+**Resolution:** Resolved 2026-05-22 — Added `FocasDriverMediumFindingsTests.cs` covering: unknown-DeviceHostAddress init throw (003), ViewOnly enforcement for all tags (004), Volatile `_health` under concurrent reads (005), reconnect-after-external-dispose recovery (006), and a factory full-round-trip test for all three opt-in config sections (012).
@@ -0,0 +1,237 @@
+# Code Review — Driver.Galaxy
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 4 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Galaxy-001, Driver.Galaxy-002, Driver.Galaxy-003, Driver.Galaxy-004 |
+| 2 | OtOpcUa conventions | Driver.Galaxy-005 |
+| 3 | Concurrency & thread safety | Driver.Galaxy-006, Driver.Galaxy-007 |
+| 4 | Error handling & resilience | Driver.Galaxy-001, Driver.Galaxy-008, Driver.Galaxy-009 |
+| 5 | Security | Driver.Galaxy-010 |
+| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012 |
+| 7 | Design-document adherence | Driver.Galaxy-013 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.Galaxy-014 |
+| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013 |
+
+## Findings
+
+### Driver.Galaxy-001
+
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Error handling & resilience |
+| Location | `Runtime/EventPump.cs:128`, `GalaxyDriver.cs:222` |
+| Status | Resolved |
+
+**Description:** The `ReconnectSupervisor` is constructed in `BuildProductionRuntimeAsync` and exposes `ReportTransportFailure(Exception)` as the only entry point that starts the reopen -> replay recovery loop. Nothing in the driver ever calls `ReportTransportFailure` (a repo-wide search finds only the declaration). When the gateway `StreamEvents` stream faults, `EventPump.RunAsync` catches the exception, logs "reconnect supervisor (PR 4.5) handles restart", completes the channel, and exits — but the supervisor is never told. The result: a transient gateway transport drop permanently kills the event stream. Data-change notifications stop, no reconnect/replay runs, and `GetHealth()` keeps reporting `Healthy` because `_supervisor.IsDegraded` stays false. This is a production outage with no self-recovery.
+
+**Recommendation:** Wire the EventPump (and any gateway RPC that observes a transport fault) to call `_supervisor.ReportTransportFailure(ex)`. The simplest path: give `EventPump` a fault callback (or expose a `StreamFaulted` event) that `GalaxyDriver` subscribes to and forwards to the supervisor. The supervisor's `ReopenAsync`/`ReplayAsync` must also restart the EventPump itself (see Driver.Galaxy-008).
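+
+A simplified sketch of the fault-callback wiring, assuming the pump consumes an
+`IAsyncEnumerable` (the real `EventPump` reads the gateway `StreamEvents` call into a
+channel; `EventPumpSketch` and `FaultAfterOne` are hypothetical):
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.IO;
+using System.Runtime.CompilerServices;
+using System.Threading;
+using System.Threading.Tasks;
+
+public sealed class EventPumpSketch
+{
+    private readonly Func<CancellationToken, IAsyncEnumerable<string>> _openStream;
+    private readonly Action<Exception>? _onStreamFault;
+
+    public EventPumpSketch(
+        Func<CancellationToken, IAsyncEnumerable<string>> openStream,
+        Action<Exception>? onStreamFault)
+    {
+        _openStream = openStream;
+        _onStreamFault = onStreamFault;
+    }
+
+    public int Delivered { get; private set; }
+
+    public async Task RunAsync(CancellationToken ct)
+    {
+        try
+        {
+            await foreach (var evt in _openStream(ct))
+                Delivered++; // the real pump forwards evt to its channel
+        }
+        catch (OperationCanceledException) when (ct.IsCancellationRequested)
+        {
+            // Normal shutdown: not a transport fault.
+        }
+        catch (Exception ex)
+        {
+            // Report instead of only logging, so the supervisor runs reopen -> replay.
+            _onStreamFault?.Invoke(ex);
+        }
+    }
+
+    // Demo stream: one event, then a fault that mimics a dropped transport.
+    public static async IAsyncEnumerable<string> FaultAfterOne(
+        [EnumeratorCancellation] CancellationToken ct = default)
+    {
+        yield return "evt-1";
+        await Task.Yield();
+        throw new IOException("gateway transport dropped");
+    }
+}
+```
+
+In `GalaxyDriver`, the callback would forward to the supervisor, e.g.
+`ex => _supervisor.ReportTransportFailure(ex)`; a caller-requested cancellation is
+deliberately not reported as a fault.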
+
+**Resolution:** Resolved 2026-05-22 — added an optional `onStreamFault` callback to `EventPump`; `RunAsync`'s stream-fault catch block now invokes it, and `GalaxyDriver.EnsureEventPumpStarted` wires it to `OnEventPumpStreamFault`, which forwards the cause to `ReconnectSupervisor.ReportTransportFailure`, so a transient gateway transport drop now drives reopen → replay. Regression coverage in `EventPumpStreamFaultTests`. Note: the EventPump itself is still not restarted on reconnect — that pump-restart gap remains tracked under Driver.Galaxy-008.
+
+### Driver.Galaxy-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
+| Status | Resolved |
+
+**Description:** `DataTypeMap.Map` maps Galaxy `mx_data_type` codes to six `DriverDataType` values (Boolean, Int32, Float32, Float64, String, DateTime) — there is no `Int64` arm. Yet `MxValueDecoder` and `MxValueEncoder` both fully support Int64 (`MxValue.Int64Value`, `Int64Array`), and the decoder's own XML doc claims "the seven Galaxy data types ... (Boolean, Int32, Int64, Float32, Float64, String, DateTime)". Any Galaxy attribute whose `mx_data_type` is the Int64 code (or any code > 5) falls through the `_ => DriverDataType.String` default. The address-space node is then created as a `String` variable while runtime reads decode an `Int64` boxed value — a type mismatch that produces wrong OPC UA `DataType`/`ValueRank` metadata and likely fails value coercion at the server node layer.
+
+**Recommendation:** Confirm the Galaxy `mx_data_type` integer code for 64-bit integers and add the explicit arm to `DataTypeMap.Map`. If the wire format genuinely has no Int64 type, correct the `MxValueDecoder`/`MxValueEncoder` doc comments instead. Either way the encoder/decoder and the type map must agree.
+
+**Resolution:** Resolved 2026-05-22 — added `6 => DriverDataType.Int64` to `DataTypeMap.Map`, extending the contiguous 0..5 scheme so the type map covers the same seven Galaxy data types `MxValueDecoder`/`MxValueEncoder` already decode/encode; Int64 attributes now build as Int64 nodes instead of falling through to the String default. Regression coverage in `DataTypeMapTests`.
+
+### Driver.Galaxy-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `Runtime/StatusCodeMap.cs:86` |
+| Status | Resolved |
+
+**Description:** `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return code rather than a boolean-as-int, then `Success != 0` is exactly the failure condition and the mapper inverts it — every failed write/read would report `Good`. The field name is ambiguous and the rest of the file (`Detail`, `RawDetectedBy`, and `Hresult` used elsewhere) treats `0` as success. `GatewayGalaxyAlarmAcknowledger.cs:62` uses the opposite convention for the sibling field (`reply.Hresult != 0` means failure).
+
+**Recommendation:** Verify the semantics of `MxStatusProxy.Success` against the gateway proto contract. If it is a success-boolean encoded as int, add a code comment pinning that; if it is an HRESULT, invert the check to `status.Success == 0 => Good`.
+
+**Resolution:** Resolved 2026-05-22 — replaced `status.Success != 0` with `status.IsSuccess()` (the `MxStatusProxyExtensions` helper that checks both `success != 0` AND `category == Ok`); the proto contract explicitly documents that `success` is not a boolean and that clients must branch on `category`. Regression coverage updated in `StatusCodeMapTests` with a `SuccessNonZeroButCategoryNotOk_IsNotGood` assertion pinning the fix.
+
+### Driver.Galaxy-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `GalaxyDriver.cs:901` |
+| Status | Resolved |
+
+**Description:** `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts the code right by 30 bits (`StatusCode >> 30`) and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQualityByte`/`FromMxStatus`, so this is a lossy round-trip: it collapses every specific code back to the three category bytes (192/64/0). That happens to satisfy `PerPlatformProbeWatcher.DecodeState` (which only checks `qualityByte < 192`), so the bug is currently benign, but the mapping is fragile and undocumented except for one inline comment. A future edit to the `StatusCodeMap` constants or to the shift width would silently desync the probe-health decode with no test guarding it.
+
+**Recommendation:** Route the probe path off the original quality information rather than reverse-engineering it from a `StatusCode`. Either carry the raw quality byte on `DataValueSnapshot`, or add a `StatusCodeMap.ToQualityCategoryByte(uint)` helper with unit tests so the mapping lives in one place next to its inverse.
+
+**Resolution:** Resolved 2026-05-22 — added a `StatusCodeMap.ToQualityCategoryByte(uint)` helper that maps the top two bits of the OPC UA StatusCode to the OPC DA category byte (Good=192, Uncertain=64, Bad=0); `GalaxyDriver.OnPumpDataChange` now calls this helper instead of inlining the shift+switch, so the mapping lives next to its inverse. Unit tests in `StatusCodeMapTests` cover all three category buckets and the round-trip invariant.
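+
+Below is a minimal sketch of the category-byte helper and a category-level inverse. It assumes only two standard layouts. The first is the OPC UA StatusCode, whose severity sits in bits 31..30 (`00` Good, `01` Uncertain, `10` Bad, `11` reserved). The second is the OPC DA quality bits 7..6. The class name `QualityCategorySketch` is hypothetical, and so is the choice to treat the reserved and unused patterns as Bad. This is not the shipped `StatusCodeMap` code.
+
+```csharp
+using System;
+
+// Hypothetical sketch class; not the shipped StatusCodeMap.
+public static class QualityCategorySketch
+{
+    // OPC UA StatusCode severity lives in bits 31..30.
+    private const uint SeverityMask = 0xC0000000;
+
+    // OPC DA quality category bytes.
+    public const byte DaGood = 192;     // 0xC0
+    public const byte DaUncertain = 64; // 0x40
+    public const byte DaBad = 0;        // 0x00
+
+    // Collapses any OPC UA StatusCode to its OPC DA category byte.
+    public static byte ToQualityCategoryByte(uint statusCode) =>
+        (statusCode & SeverityMask) switch
+        {
+            0x00000000u => DaGood,
+            0x40000000u => DaUncertain,
+            _ => DaBad, // 0x80000000 (Bad) and the reserved 0xC0000000 pattern
+        };
+
+    // Category-level inverse, driven by the OPC DA quality bits 7..6.
+    public static uint FromQualityCategoryByte(byte quality) =>
+        (quality & 0xC0) switch
+        {
+            0xC0 => 0x00000000u, // Good
+            0x40 => 0x40000000u, // Uncertain
+            _ => 0x80000000u,    // Bad (and the unused 0x80 pattern)
+        };
+}
+```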
+
+### Driver.Galaxy-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `Runtime/EventPump.cs:81-88` |
+| Status | Open |
+
+**Description:** The `BoundedChannelOptions` comment states "Newest-dropped policy: when full, the producer's TryWrite returns false ... We do this manually rather than relying on `BoundedChannelFullMode.DropWrite`" — but the option is then set to `FullMode = BoundedChannelFullMode.Wait`. With `Wait`, `TryWrite` returning `false` on a full channel is correct behaviour, so the code works, but the comment naming the mode and the actual mode disagree, which is confusing for a maintainer deciding whether the policy is `Wait`, `DropWrite`, or `DropNewest`.
+
+**Recommendation:** Either reword the comment to say "we use `Wait` mode but never call the awaitable `WriteAsync` — `TryWrite` gives us synchronous newest-dropped semantics", or switch to `BoundedChannelFullMode.DropWrite` and keep the manual drop count. Make the comment and the mode consistent.
+
+**Resolution:** _(open)_
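+
+The first option in the recommendation above can be sketched as follows. The sketch assumes an `int` payload and uses a hypothetical `DropCountingPump` wrapper rather than the real `EventPump`. The channel stays in `BoundedChannelFullMode.Wait` and the producer only calls the synchronous `TryWrite`. A `false` return is counted as a dropped newest event.
+
+```csharp
+using System.Threading;
+using System.Threading.Channels;
+
+// Hypothetical wrapper illustrating Wait mode + TryWrite = newest-dropped.
+public sealed class DropCountingPump
+{
+    private readonly Channel<int> _channel;
+    private long _dropped;
+
+    public DropCountingPump(int capacity)
+    {
+        // Wait mode is deliberate: the producer never calls the awaitable WriteAsync,
+        // so nothing ever waits. TryWrite just returns false when the buffer is full.
+        _channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity)
+        {
+            FullMode = BoundedChannelFullMode.Wait,
+            SingleReader = true,
+        });
+    }
+
+    public long DroppedCount => Interlocked.Read(ref _dropped);
+
+    public ChannelReader<int> Reader => _channel.Reader;
+
+    // Producer side (e.g. a gw stream callback): never blocks, never awaits.
+    public void Publish(int value)
+    {
+        if (!_channel.Writer.TryWrite(value))
+            Interlocked.Increment(ref _dropped); // the incoming (newest) event is lost
+    }
+}
+```
+
+The mode matters for the counter. Under `BoundedChannelFullMode.DropWrite`, `TryWrite` discards the item but still returns `true`, so a drop count driven by its return value would stay at zero. Switching modes would therefore also mean counting drops through the `itemDropped` callback overload of `Channel.CreateBounded`, which is available on newer .NET versions.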
+
+### Driver.Galaxy-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `GalaxyDriver.cs:848-861` |
+| Status | Resolved |
+
+**Description:** `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are active, the handle attached to a given `AlarmEventArgs` can change arbitrarily between transitions. The XML doc acknowledges "we still only fire the event once" but the downstream `AlarmConditionService` correlates transitions to the originating subscription via this handle; a non-deterministic owner can misroute unsubscribe bookkeeping or per-subscription state.
+
+**Recommendation:** If alarm transitions genuinely fan out to all subscriptions, raise `OnAlarmEvent` once per active handle (or document that the handle is a non-correlating sentinel and have the server stop relying on it). If a single owner is required, make the choice deterministic (e.g. the earliest-created handle) and stable.
+
+**Resolution:** Resolved 2026-05-22 — changed `_alarmSubscriptions` from `HashSet<GalaxyAlarmSubscriptionHandle>` to `List<GalaxyAlarmSubscriptionHandle>` so insertion order is preserved; `OnAlarmFeedTransition` now picks `[0]` (earliest-registered handle) instead of `First()` on a HashSet, making the owner selection deterministic and stable across mutations. Server routing uses `SourceNodeId` not the handle, so every active subscriber sees the same transition regardless of which handle is attached.
+
+### Driver.Galaxy-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `GalaxyDriver.cs:937-968` |
+| Status | Resolved |
+
+**Description:** `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteAsync` that passed its `ObjectDisposedException.ThrowIf` check at entry can then dereference `_subscriber`/`_dataWriter` whose backing `GalaxyMxSession` is being disposed mid-call, producing `ObjectDisposedException`/`NullReferenceException` from deep inside the gw client rather than a clean failure. `Dispose` also blocks the caller on `GetAwaiter().GetResult()` of several async disposals, risking a deadlock if invoked from a thread-pool-starved context.
+
+**Recommendation:** Gate capability entry points so they cannot start new gw work once `_disposed` is set (e.g. a `CancellationTokenSource` linked into every call, cancelled first in `Dispose`). Consider implementing `IAsyncDisposable` so the async sub-component disposals do not block on `GetResult()`.
+
+**Resolution:** Resolved 2026-05-22 — added `IAsyncDisposable` to `GalaxyDriver` and implemented `DisposeAsync()` as the primary disposal path that awaits each async sub-component (EventPump, AlarmFeed, MxSession, MxClient, RepositoryClient) without blocking; `Dispose()` delegates to `DisposeAsync().AsTask().GetAwaiter().GetResult()` for `using`-statement compatibility. The sync blocking-on-GetResult anti-pattern in the previous Dispose body is eliminated on the hot path. Note: the `CancellationTokenSource` gate for concurrent capability entry was not added — the existing `ObjectDisposedException.ThrowIf(_disposed, this)` guards at capability entry points already provide the fast-fail, and a separate CTS would add complexity without solving the TOCTOU window noted in the finding; that window is benign in practice (the sub-component's own disposed check catches it).
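+
+The disposal shape the resolution describes can be sketched as follows. The sketch assumes every sub-component implements `IAsyncDisposable` and is disposed in registration order. `AsyncDisposeSketch` and `EnsureNotDisposed` are hypothetical names. An `Interlocked.Exchange` flag makes a second `Dispose`/`DisposeAsync` call a no-op.
+
+```csharp
+using System;
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical owner type; components stand in for EventPump, AlarmFeed, MxSession, etc.
+public sealed class AsyncDisposeSketch : IDisposable, IAsyncDisposable
+{
+    private readonly List<IAsyncDisposable> _components;
+    private int _disposed; // 0 = live, 1 = disposed
+
+    public AsyncDisposeSketch(IEnumerable<IAsyncDisposable> components) =>
+        _components = new List<IAsyncDisposable>(components);
+
+    public bool IsDisposed => Volatile.Read(ref _disposed) != 0;
+
+    // Capability entry points call this first (the fast-fail guard).
+    public void EnsureNotDisposed()
+    {
+        if (IsDisposed) throw new ObjectDisposedException(nameof(AsyncDisposeSketch));
+    }
+
+    // Primary path: awaits each component without blocking a thread; idempotent.
+    public async ValueTask DisposeAsync()
+    {
+        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
+        foreach (var component in _components)
+            await component.DisposeAsync().ConfigureAwait(false);
+    }
+
+    // Sync shim for using-statements; it still blocks, so await using is preferred.
+    public void Dispose() => DisposeAsync().AsTask().GetAwaiter().GetResult();
+}
+```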
+
+### Driver.Galaxy-008
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Error handling & resilience |
+| Location | `GalaxyDriver.cs:264-276`, `Runtime/EventPump.cs:97-103` |
+| Status | Resolved |
+
+**Description:** Even if Driver.Galaxy-001 is fixed and the supervisor's `ReplayAsync` runs, recovery is incomplete. `ReplayAsync` re-issues `SubscribeBulkAsync` for the tracked tags, but the `EventPump` background loop that consumes `StreamEvents` is not restarted. After a stream fault `EventPump.RunAsync` exits and `_channel` is completed; `EventPump.Start()` is a no-op (`if (_loop is not null) return`) because `_loop` is a completed-but-non-null task. So a replayed subscription has no consumer — values are subscribed on the gw but never reach `OnDataChange`. Additionally `ReplayAsync` never re-registers the new item handles the gw returns into `SubscriptionRegistry`; the old stale item handles remain, so even with a live pump the fan-out reverse-map would miss the post-reconnect handles.
+
+**Recommendation:** On reconnect, dispose and recreate the `EventPump` (or make it restartable), and have `ReplayAsync` update `SubscriptionRegistry` bindings with the new item handles returned by the post-reconnect `SubscribeBulkAsync`. Add an integration/parity test that drops the stream mid-subscription and asserts `OnDataChange` resumes.
+
+**Resolution:** Resolved 2026-05-22 — `ReplayAsync` now calls a new `RestartEventPumpForReplay` (disposes the faulted pump, recreates and restarts a fresh one) and re-issues `SubscribeBulkAsync` per subscription, then `SubscriptionRegistry.Rebind` swaps each subscription's stale pre-reconnect item handles for the post-reconnect handles so the fan-out reverse map dispatches to the live pump. New `SubscriptionRegistry.SnapshotEntries`/`Rebind` APIs back the per-subscription replay. Regression coverage in `SubscriptionRegistryTests` (Rebind/SnapshotEntries) and `EventPumpStreamFaultTests.FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch`.
+
+### Driver.Galaxy-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `GalaxyDriver.cs:354-371` |
+| Status | Resolved |
+
+**Description:** `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); the discard masks that programming error. Separately, `StartDeployWatcher` builds an `_ownedRepositoryClient` purely for the watcher when discovery has not run yet — if `DiscoverAsync` later runs, `BuildDefaultHierarchySource` overwrites `_ownedRepositoryClient` with a second client, leaking the first (only the latest reference is disposed in `Dispose`).
+
+**Recommendation:** Await `StartAsync` (it completes synchronously after scheduling) or at least observe its result. Reuse a single `GalaxyRepositoryClient` across the deploy watcher and the hierarchy source instead of letting `BuildDefaultHierarchySource` clobber the field — guard the assignment or build the client once in `InitializeAsync`.
+
+**Resolution:** Resolved 2026-05-22 — (a) replaced `_ = _deployWatcher.StartAsync(...)` discard with an explicit variable + `IsFaulted` check so any synchronous throw from `StartAsync` (e.g. called-twice `InvalidOperationException`) propagates rather than being silently swallowed; (b) changed both `StartDeployWatcher` and `BuildDefaultHierarchySource` to use `_ownedRepositoryClient ??=` so a client built by the watcher is reused by discovery instead of being overwritten and leaked — only one `GalaxyRepositoryClient` instance is now created and disposed.
+
+### Driver.Galaxy-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Security |
+| Location | `GalaxyDriver.cs:311-341` |
+| Status | Open |
+
+**Description:** `ResolveApiKey` supports an `env:`/`file:` indirection and otherwise treats the config string as the literal API key ("Anything else — used as the literal API key. Convenient for dev"). `GalaxyGatewayOptions`' own XML doc claims "the API key never appears in cleartext config". The literal-key fallback silently permits a plaintext API key in the `DriverConfig` JSON column of the central config DB, contradicting the documented contract. There is no warning logged when the literal path is taken.
+
+**Recommendation:** Log a startup warning when `ResolveApiKey` falls through to the literal arm so an operator who accidentally committed a cleartext key sees it, and update the `GalaxyGatewayOptions` doc comment so it no longer over-promises. Consider gating the literal arm behind an explicit `dev:`-style prefix so a cleartext key cannot be used by accident.
+
+**Resolution:** _(open)_
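+
+Below is a sketch of the stricter variant of the recommendation, under stated assumptions. The `env:`/`file:` arms follow the finding's description. The `dev:` prefix, the `Action<string>` warning sink (standing in for an `ILogger`) and the `ApiKeyResolverSketch` name are hypothetical. A bare literal is rejected outright. The softer alternative is to return it and emit the same warning.
+
+```csharp
+using System;
+using System.IO;
+
+// Hypothetical resolver; the dev: prefix is an assumption, not an existing option.
+public static class ApiKeyResolverSketch
+{
+    public static string Resolve(string configured, Action<string> warn)
+    {
+        if (string.IsNullOrWhiteSpace(configured))
+            throw new InvalidOperationException("Galaxy gateway API key is not configured.");
+
+        if (configured.StartsWith("env:", StringComparison.Ordinal))
+        {
+            var name = configured.Substring("env:".Length);
+            return Environment.GetEnvironmentVariable(name)
+                ?? throw new InvalidOperationException($"Environment variable '{name}' is not set.");
+        }
+
+        if (configured.StartsWith("file:", StringComparison.Ordinal))
+            return File.ReadAllText(configured.Substring("file:".Length)).Trim();
+
+        // Explicit opt-in for a cleartext key; warn without echoing the key itself.
+        if (configured.StartsWith("dev:", StringComparison.Ordinal))
+        {
+            warn("Galaxy gateway API key is stored in cleartext config (dev: prefix).");
+            return configured.Substring("dev:".Length);
+        }
+
+        // Anything else is most likely an accidentally committed key: refuse it.
+        throw new InvalidOperationException(
+            "Galaxy gateway API key must use an env:, file: or dev: prefix.");
+    }
+}
+```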
+
+### Driver.Galaxy-011
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `GalaxyDriver.cs:411` |
+| Status | Resolved |
+
+**Description:** `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryFootprint` is consumed by the server's status/health surface to gauge cache-flush pressure; a constant `0` makes the Galaxy driver invisible to that mechanism, so a 50k-tag subscription set never registers as memory pressure and `FlushOptionalCachesAsync` (also a no-op) is never meaningfully triggered.
+
+**Recommendation:** Return a real estimate derived from `SubscriptionRegistry.TrackedSubscriptionCount`/`TrackedItemHandleCount` (and the EventPump channel occupancy), or document explicitly why the Galaxy driver opts out of footprint reporting. Remove the stale "PR 4.4 sets this" comment.
+
+**Resolution:** Resolved 2026-05-22 — replaced the constant `0` with a live estimate derived from `SubscriptionRegistry.TrackedItemHandleCount` (64 bytes/handle) and `TrackedSubscriptionCount` (256 bytes/subscription); returns 0 when no subscriptions are active and grows with the registry. The stale "PR 4.4 sets this" comment is removed. Regression coverage in `GalaxyDriverInfrastructureTests`.
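+
+A worked example of the resolution's per-entry costs (64 bytes per item handle, 256 bytes per subscription): a 50k-tag set spread over an illustrative 100 subscriptions reports 50,000 × 64 + 100 × 256 = 3,225,600 bytes, roughly 3.1 MiB. The helper below is a hypothetical restatement of that arithmetic, not the shipped `GetMemoryFootprint` body.
+
+```csharp
+// Hypothetical restatement of the estimate described in the resolution.
+public static class FootprintSketch
+{
+    public const long BytesPerItemHandle = 64;
+    public const long BytesPerSubscription = 256;
+
+    // Zero when nothing is subscribed; grows linearly with the registry.
+    public static long Estimate(int trackedItemHandles, int trackedSubscriptions) =>
+        trackedItemHandles * BytesPerItemHandle + trackedSubscriptions * BytesPerSubscription;
+}
+```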
+
+### Driver.Galaxy-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` |
+| Status | Open |
+
+**Description:** Several hot paths use linear scans where a keyed lookup would do, which makes them quadratic at scale. `SubscriptionRegistry.ResolveSubscribers` does `entry.Bindings.FirstOrDefault(b => b.ItemHandle == itemHandle)`, a linear scan of the whole binding list for every event dispatch. At 50k tags that is a 50k-element scan per event on the 1 Hz fan-out path, or O(n^2) per full update cycle. `GalaxyDriver.SubscribeAsync` and `ReadViaSubscribeOnceAsync` correlate results to references with `results.FirstOrDefault(r => string.Equals(...))` inside a `for` loop over all references, which is O(n^2) over the subscribe batch. `SubscriptionRegistry.Remove` rebuilds a `ConcurrentBag` from a LINQ filter on every unsubscribe.
+
+**Recommendation:** Index `SubscriptionEntry` bindings by item handle (a `Dictionary<int, string>` per entry) so `ResolveSubscribers` is O(1) per subscriber. Project the `SubscribeResult` list into a `Dictionary<string, SubscribeResult>` (OrdinalIgnoreCase) once before the correlation loop. These matter on the documented 50k-tag soak path.
+
+**Resolution:** _(open)_
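+
+The indexing the recommendation above describes might look like this. `SubscriptionEntrySketch`, `SubscribeResultSketch` and `RegistryIndexSketch` are hypothetical stand-ins for the registry and gw result types. The point is the shape: one `Dictionary<int, string>` per entry keyed by item handle, and one case-insensitive lookup built per subscribe batch.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical stand-in for a gw subscribe result.
+public sealed record SubscribeResultSketch(string Reference, int ItemHandle);
+
+// Hypothetical stand-in for a SubscriptionRegistry entry.
+public sealed class SubscriptionEntrySketch
+{
+    public SubscriptionEntrySketch(long subscriptionId) => SubscriptionId = subscriptionId;
+
+    public long SubscriptionId { get; }
+
+    // Item handle -> full tag reference; replaces Bindings.FirstOrDefault(...).
+    public Dictionary<int, string> BindingsByHandle { get; } = new();
+}
+
+public static class RegistryIndexSketch
+{
+    // One hash lookup per entry instead of a scan over every binding.
+    public static List<(long SubscriptionId, string Reference)> ResolveSubscribers(
+        IEnumerable<SubscriptionEntrySketch> entries, int itemHandle)
+    {
+        var hits = new List<(long SubscriptionId, string Reference)>();
+        foreach (var entry in entries)
+        {
+            if (entry.BindingsByHandle.TryGetValue(itemHandle, out var reference))
+                hits.Add((entry.SubscriptionId, reference));
+        }
+        return hits;
+    }
+
+    // Builds the lookup once per batch, then correlates each reference in O(1).
+    public static SubscribeResultSketch?[] Correlate(
+        IReadOnlyList<string> references, IEnumerable<SubscribeResultSketch> results)
+    {
+        var byReference = new Dictionary<string, SubscribeResultSketch>(StringComparer.OrdinalIgnoreCase);
+        foreach (var result in results)
+            byReference.TryAdd(result.Reference, result); // first result wins, like FirstOrDefault
+
+        var correlated = new SubscribeResultSketch?[references.Count];
+        for (int i = 0; i < references.Count; i++)
+            correlated[i] = byReference.TryGetValue(references[i], out var match) ? match : null;
+        return correlated;
+    }
+}
+```
+
+`TryAdd` keeps the first result for a duplicated reference, which preserves the behaviour of the existing `FirstOrDefault` correlation. `ToDictionary` would throw on such a duplicate instead.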
+
+### Driver.Galaxy-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` |
+| Status | Open |
+
+**Description:** Multiple doc comments are stale relative to the shipped code. `GalaxyDriver`'s class summary still describes the file as "the project skeleton with `IDriver` bodies that wire to a future `IGalaxyGatewayClient` abstraction. Capability interfaces ... land in PRs 4.1-4.7" and references the legacy `GalaxyProxyDriver` coexisting "until PR 7.2" — but PR 7.2 already deleted the legacy Galaxy projects and the capability interfaces are all implemented. `ReinitializeAsync` is still a stub ("for the skeleton we just refresh health") that ignores `driverConfigJson` entirely — a config reapply silently does nothing. `GalaxyReconnectOptions.ReplayOnSessionLost` is defined and documented but never read anywhere in the driver (`ReplayAsync` always replays).
+
+**Recommendation:** Refresh the `GalaxyDriver` class and `ReinitializeAsync` doc comments to describe the shipped state, implement or explicitly reject `ReinitializeAsync` config reapply, and either honour `ReplayOnSessionLost` or remove it from `GalaxyReconnectOptions`.
+
+**Resolution:** _(open)_
+
+### Driver.Galaxy-014
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
+| Status | Resolved |
+
+**Description:** The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires `ReportTransportFailure` (Driver.Galaxy-001) there can be no test asserting that an `EventPump` stream fault actually drives recovery, which is exactly the test that would have caught the Critical finding. Similarly there appears to be no test that a post-reconnect `ReplayAsync` re-registers new item handles and that `OnDataChange` resumes (Driver.Galaxy-008). The `StatusCodeMap.FromMxStatus` `Success`-flag semantics (Driver.Galaxy-003) and the `DataTypeMap` Int64 gap (Driver.Galaxy-002) are also the kind of behaviour a focused unit test would pin.
+
+**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.
+
+**Resolution:** Resolved 2026-05-22 — added `GalaxyDriverInfrastructureTests` covering `GetMemoryFootprint` (Driver.Galaxy-011) and `IAsyncDisposable` (Driver.Galaxy-007); (a) stream-fault → supervisor reopen → EventPump restart → `OnDataChange` resumes is covered by `EventPumpStreamFaultTests.StreamFault_DrivesReconnectSupervisorReopenReplay` and `FaultedPump_IsNotRestartableInPlace_ButAFreshPumpResumesDispatch` (landed with Driver.Galaxy-001/008 resolution); (b) post-reconnect `ReplayAsync` rebinds handles is covered by `SubscriptionRegistryTests.Rebind_*` suite; (c) `StatusCodeMap.FromMxStatus` success/failure rows are covered by `StatusCodeMapTests.FromMxStatus_SuccessNonZeroAndCategoryOk_IsGood` and `FromMxStatus_SuccessNonZeroButCategoryNotOk_IsNotGood` (landed with Driver.Galaxy-003); (d) `DataTypeMap` for all seven mx_data_type codes including Int64 is covered by `DataTypeMapTests` (landed with Driver.Galaxy-002).
@@ -0,0 +1,294 @@
+# Code Review — Driver.Historian.Wonderware.Client
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Historian.Wonderware.Client-001, Driver.Historian.Wonderware.Client-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.Historian.Wonderware.Client-003, Driver.Historian.Wonderware.Client-004 |
+| 4 | Error handling & resilience | Driver.Historian.Wonderware.Client-005, Driver.Historian.Wonderware.Client-006 |
+| 5 | Security | Driver.Historian.Wonderware.Client-007, Driver.Historian.Wonderware.Client-008 |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | No issues found |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.Historian.Wonderware.Client-009 |
+| 10 | Documentation & comments | Driver.Historian.Wonderware.Client-010 |
+
+## Findings
+
+### Driver.Historian.Wonderware.Client-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `WonderwareHistorianClient.cs:98-113` |
+| Status | Resolved |
+
+**Description:** `ReadAtTimeAsync` violates the explicit `IHistorianDataSource.ReadAtTimeAsync`
+contract. The interface XML doc states: the returned list MUST be the same length and
+order as `timestampsUtc`, and gaps are returned as Bad-quality snapshots. The client passes
+`reply.Samples` straight through `ToSnapshots` with no check that the sidecar returned
+exactly one sample per requested timestamp, nor that the order matches. If the sidecar
+returns fewer/more samples (e.g. it drops boundary-less timestamps), the OPC UA
+HistoryReadAtTime service receives a result that the spec-compliant caller expects to
+index positionally against the request timestamps, silently misaligning values with
+timestamps. The matching `ReadAtTimeAsync_PreservesTimestampOrder` test only passes because
+the fake echoes the request verbatim; it never exercises a short/reordered reply.
+
+**Recommendation:** After receiving the reply, reconcile `reply.Samples` against
+`timestampsUtc` by timestamp: build the result array at `timestampsUtc.Count`, fill matched
+entries, and emit a Bad-quality (`0x80000000`) snapshot for any requested timestamp the
+sidecar did not return. Alternatively assert `reply.Samples.Length == timestampsUtc.Count`
+and fail loudly. Add a test where the fake returns a partial/reordered sample set.
+
+**Resolution:** Resolved 2026-05-22 — `ReadAtTimeAsync` now reconciles the sidecar reply against the requested timestamps via a new `AlignAtTimeSnapshots` helper: it indexes returned samples by timestamp ticks, builds the result at `timestampsUtc.Count` in request order, and emits a Bad-quality (`0x80000000`) snapshot for any requested timestamp the sidecar did not return; added the `ReadAtTimeAsync_PartialAndReorderedReply_AlignsByTimestamp_AndFillsGapsAsBad` regression test.
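+
+The reconciliation the resolution describes can be sketched as follows. `SnapshotSketch` and `AtTimeAlignmentSketch` are hypothetical; the real `AlignAtTimeSnapshots` works on the module's own snapshot type. Samples are matched on exact `DateTime.Ticks`, so the sketch assumes the sidecar echoes the requested instants without rounding.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical snapshot shape; the real snapshot type carries more fields.
+public sealed record SnapshotSketch(object? Value, uint StatusCode, DateTime TimestampUtc);
+
+public static class AtTimeAlignmentSketch
+{
+    public const uint BadStatus = 0x80000000; // generic Bad, as in the finding
+
+    // Returns exactly one snapshot per requested timestamp, in request order.
+    public static SnapshotSketch[] Align(
+        IReadOnlyList<DateTime> timestampsUtc, IEnumerable<SnapshotSketch> returned)
+    {
+        // Index the sidecar samples by tick count; the first sample per instant wins.
+        var byTicks = new Dictionary<long, SnapshotSketch>();
+        foreach (var sample in returned)
+            byTicks.TryAdd(sample.TimestampUtc.Ticks, sample);
+
+        var aligned = new SnapshotSketch[timestampsUtc.Count];
+        for (int i = 0; i < aligned.Length; i++)
+        {
+            var ts = timestampsUtc[i];
+            aligned[i] = byTicks.TryGetValue(ts.Ticks, out var hit)
+                ? hit
+                : new SnapshotSketch(null, BadStatus, ts); // gap: Bad at the requested time
+        }
+        return aligned;
+    }
+}
+```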
+
+### Driver.Historian.Wonderware.Client-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
+| Status | Resolved |
+
+**Description:** `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`.
+`HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and
+the drain worker is documented to move the event to the dead-letter table on
+`PermanentFail`. The client maps the sidecar `WriteAlarmEventsReply.PerEventOk` bool array
+to only `Ack`/`RetryPlease`, and the whole-call-failure and catch paths also only emit
+`RetryPlease`. A malformed alarm event the sidecar can never persist (unrecoverable SDK
+error on that specific row) therefore retries forever, blocking the head of the
+store-and-forward queue and never dead-lettering. The wire contract
+(`WriteAlarmEventsReply`) carries no per-event permanent/transient distinction, so the
+limitation is structural.
+
+**Recommendation:** Extend the wire contract: replace `bool[] PerEventOk` with a
+per-event status enum (Ack/Retry/Permanent), coordinated as an additive change on both
+sidecar and client per the Contracts.cs versioning rules, so unrecoverable events can be
+dead-lettered. Until then, document explicitly that this writer never produces
+`PermanentFail` and that poison events retry indefinitely.
+
+**Resolution:** Resolved 2026-05-22 — extending the wire contract (replacing `bool[] PerEventOk` with a per-event status enum) requires a coordinated change to the .NET 4.8 sidecar; instead, added a `<remarks>` XML doc block on `WriteBatchAsync` explicitly stating that `PermanentFail` is never returned, that poison events retry indefinitely until the drain worker's own retry-count limit fires, and that the protocol extension is a tracked follow-up; also added inline `// NOTE` comments in both the success and catch paths.
+
+### Driver.Historian.Wonderware.Client-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
+| Status | Open |
+
+**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
+read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
+(`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under
+`_healthLock`. The two synchronization mechanisms do not compose: an `Interlocked`
+increment is not ordered against `lock`-protected reads, so a snapshot can observe a
+`_totalQueries` value inconsistent with the lock-protected counters. The window is small
+and the counters are advisory, but the mixed model is a latent hazard.
+
+**Recommendation:** Pick one mechanism. Simplest: move the `_totalQueries++` into the
+`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
+`RecordFailure`) so all six health fields share a single lock.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware.Client-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `WonderwareHistorianClient.cs:203-267` |
+| Status | Open |
+
+**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
+separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
+caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256),
+decrementing `_totalSuccesses` and incrementing `_totalFailures`. Between those two locked
+regions a concurrent `GetHealthSnapshot` can observe a transient state in which the operation
+is counted as a success but not yet as a failure (`_totalSuccesses` inflated,
+`_consecutiveFailures` still 0). The undo-a-success/record-a-failure dance is also fragile:
+if a future change adds an early return or exception between `RecordSuccess` and
+`ThrowIfFailed`, the success is never reversed.
+
+**Recommendation:** Classify the call once: do not call `RecordSuccess` until the
+sidecar-level `Success` flag has been checked, or pass the reply success/error into a
+single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
+counters under one lock acquisition.
+
+**Resolution:** _(open)_
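+
+Below is a minimal sketch of the single-lock recommendation. It also covers Driver.Historian.Wonderware.Client-003, because the query is counted inside the same lock. `RecordOutcome` follows the signature proposed above; the class name `HistorianHealthSketch` is hypothetical and the field names mirror the findings. Counting and classification happen in one critical section, so every snapshot satisfies `Queries == Successes + Failures`.
+
+```csharp
+using System;
+
+// Hypothetical health tracker; one lock guards every counter.
+public sealed class HistorianHealthSketch
+{
+    private readonly object _healthLock = new();
+    private long _totalQueries, _totalSuccesses, _totalFailures, _consecutiveFailures;
+    private string? _lastError;
+
+    // Counts and classifies the call atomically: no success is ever reclassified later.
+    public void RecordOutcome(bool transportOk, bool sidecarOk, string? error)
+    {
+        lock (_healthLock)
+        {
+            _totalQueries++;
+            if (transportOk && sidecarOk)
+            {
+                _totalSuccesses++;
+                _consecutiveFailures = 0;
+            }
+            else
+            {
+                _totalFailures++;
+                _consecutiveFailures++;
+                _lastError = error;
+            }
+        }
+    }
+
+    public (long Queries, long Successes, long Failures, long Consecutive, string? LastError) Snapshot()
+    {
+        lock (_healthLock)
+            return (_totalQueries, _totalSuccesses, _totalFailures, _consecutiveFailures, _lastError);
+    }
+}
+```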
+
+### Driver.Historian.Wonderware.Client-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `Ipc/FrameReader.cs:31-32` |
+| Status | Resolved |
+
+**Description:** After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind
+byte with the synchronous, blocking `_stream.ReadByte()` and ignores the
+`CancellationToken`. On a `NamedPipeClientStream` with `PipeOptions.Asynchronous`, a
+synchronous `ReadByte()` blocks the calling thread until a byte arrives or the pipe
+closes. If the sidecar sends a length prefix and then stalls (slow/hung peer), the call
+hangs on a thread-pool thread and the `EffectiveCallTimeout` linked token in
+`PipeChannel.InvokeAsync` cannot interrupt it, because a blocking synchronous read never
+observes the token. This defeats the documented cap on a single read/write call once connected and can
+wedge the single-in-flight call gate.
+
+**Recommendation:** Read the kind byte asynchronously and cancellably: extend the length
+prefix read to 5 bytes, or do a second `ReadExactAsync(new byte[1], ct)`. This makes the
+whole frame read honor the call-timeout token and matches the async style of the rest of
+the reader.
+
+**Resolution:** Resolved 2026-05-22 — replaced the synchronous, non-cancellable `_stream.ReadByte()` for the kind byte with an async `ReadExactAsync(new byte[1], ct)` call so the full frame read honours the call-timeout token and cannot wedge the channel on a stalled peer.
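+
+Below is a sketch of a cancellable header read. It assumes a 4-byte little-endian length prefix followed by the 1-byte kind; the endianness is an assumption and should be matched to `Framing.cs`. `FrameHeaderSketch` and its `ReadExactAsync` loop are hypothetical stand-ins for the module's own helper. Every wait goes through `Stream.ReadAsync` with the caller's token. On a pipe opened with `PipeOptions.Asynchronous`, that lets the call timeout end a stalled read.
+
+```csharp
+using System;
+using System.Buffers.Binary;
+using System.IO;
+using System.Threading;
+using System.Threading.Tasks;
+
+public static class FrameHeaderSketch
+{
+    // Loops ReadAsync until the buffer is full; every wait observes ct.
+    public static async Task ReadExactAsync(Stream stream, Memory<byte> buffer, CancellationToken ct)
+    {
+        int total = 0;
+        while (total < buffer.Length)
+        {
+            int read = await stream.ReadAsync(buffer.Slice(total), ct).ConfigureAwait(false);
+            if (read == 0)
+                throw new EndOfStreamException("Pipe closed mid-frame.");
+            total += read;
+        }
+    }
+
+    // Reads the length prefix and the kind byte in one cancellable 5-byte read.
+    public static async Task<(int Length, byte Kind)> ReadHeaderAsync(Stream stream, CancellationToken ct)
+    {
+        var header = new byte[5];
+        await ReadExactAsync(stream, header, ct).ConfigureAwait(false);
+        int length = BinaryPrimitives.ReadInt32LittleEndian(header.AsSpan(0, 4)); // assumed LE
+        return (length, header[4]);
+    }
+}
+```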
+
+### Driver.Historian.Wonderware.Client-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
+| Status | Open |
+
+**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
+otherwise propagates. `WonderwareHistorianClientOptions` exposes `ReconnectInitialBackoff` and
+`ReconnectMaxBackoff` and documents them as exponential backoff between reconnects, but
+neither field is referenced anywhere in the module: the
+single retry reconnects immediately with no delay. A sidecar that is restarting will
+reject or refuse the immediate reconnect, the call fails, and there is no backoff before
+the next caller-driven attempt. Either the backoff belongs in the channel and is missing,
+or the options are dead config that misleads operators.
+
+**Recommendation:** Either implement the documented exponential backoff in the reconnect
+path, or remove the two unused option fields and their XML docs and state plainly that
+retry/backoff is owned by the caller (the alarm drain worker / history router).
+
+**Resolution:** _(open)_
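+
+If the backoff is kept in the channel, the documented exponential schedule could be computed as shown below. `ReconnectBackoffSketch.DelayFor` is hypothetical, and the 250 ms / 5 s values in the example are illustrative, not the module's defaults. The delay doubles from `ReconnectInitialBackoff` per attempt and is clamped at `ReconnectMaxBackoff`.
+
+```csharp
+using System;
+
+public static class ReconnectBackoffSketch
+{
+    // Delay before 0-based reconnect attempt `attempt`: initial * 2^attempt, capped at max.
+    public static TimeSpan DelayFor(int attempt, TimeSpan initial, TimeSpan max)
+    {
+        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
+        // Clamp the exponent; the Math.Min below bounds the result to max anyway.
+        double factor = Math.Pow(2, Math.Min(attempt, 30));
+        double ticks = Math.Min(initial.Ticks * factor, max.Ticks);
+        return TimeSpan.FromTicks((long)ticks);
+    }
+}
+```
+
+With a 250 ms initial delay and a 5 s cap, attempts 0 through 6 wait 250, 500, 1000, 2000, 4000, 5000 and 5000 ms.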
+
+### Driver.Historian.Wonderware.Client-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `WonderwareHistorianClient.cs:276` |
+| Status | Resolved |
+
+**Description:** `ToSnapshots` deserializes peer-supplied bytes with
+`MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, an untyped deserialization in
+which the wire payload, not the caller, decides the runtime type of each value. The
+client treats the pipe peer as untrusted elsewhere (16 MiB frame cap stated to protect
+the receiver from a hostile or buggy peer, shared-secret Hello). Typeless deserialization
+of bytes that originate from the historian database widens the trust surface. The
+MessagePack standard resolver is primitive-only by default so the practical blast radius
+is limited, but this is the pattern called out by the two suppressed MessagePack
+advisories on this project (see finding 008).
+
+**Recommendation:** Confirm the serializer options here use the default (non-typeless)
+resolver and that no `TypelessContractlessStandardResolver` is in play; if so, document
+that. Prefer round-tripping the value as a constrained set of known primitive types rather
+than `object`, and validate `ValueBytes.Length` against a sane per-sample cap before
+deserializing.
+
+**Resolution:** Resolved 2026-05-22 — added `DeserializeSampleValue()` helper that enforces a 64 KiB per-sample `ValueBytes` cap before deserialization and documents that the default `StandardResolver` (primitive-only, no `TypelessContractlessStandardResolver`) is in use; both `ToSnapshots` and `AlignAtTimeSnapshots` now route through the helper; added inline XML comments to the two `NuGetAuditSuppress` entries in the csproj stating the advisory title, why it does not apply to this usage, and the revisit trigger.
+
+### Driver.Historian.Wonderware.Client-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Security |
+| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
+| Status | Open |
+
+**Description:** The csproj suppresses two NuGet audit advisories
+(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
+with no inline comment recording why the suppression is safe, who reviewed it, or when it
+should be revisited. Blanket `NuGetAuditSuppress` entries silence the very signal that
+would flag the next related CVE. Combined with finding 007 (typeless deserialization), an
+unexplained MessagePack advisory suppression is a maintainability and audit-trail gap.
+
+**Recommendation:** Add an XML comment next to each `NuGetAuditSuppress` stating the
+advisory title, why it does not apply to this module's usage, and a revisit trigger. Track a
+follow-up to upgrade `MessagePack` once a patched version is available so the suppressions
+can be dropped.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware.Client-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
+| Status | Resolved |
+
+**Description:** The suite covers happy paths, server errors, a bad secret, a single
+reconnect, and health counters, but several critical paths are untested:
+(1) `ReadAtTimeAsync` with a partial or reordered sidecar reply, the contract-alignment case
+from finding 001 (the existing test only echoes the request);
+(2) the `WriteBatchAsync` catch branch (a transport or deserialization throw during a
+write), which must return `RetryPlease` for every event;
+(3) the `InvokeAsync` path where the second attempt also fails (the test only proves a
+successful reconnect, never a reconnect that fails again and propagates);
+(4) the `CallTimeout` path: no test asserts that a stalled sidecar produces a timed-out
+`OperationCanceledException`;
+(5) `MapAggregate` throwing `NotSupportedException` for `HistoryAggregateType.Total`;
+(6) the `InvalidDataException` path when the sidecar replies with an unexpected
+`MessageKind`. The byte-equality / round-trip parity test that the Contracts.cs and
+Framing.cs comments repeatedly promise is not present in this test project.
+
+**Recommendation:** Add tests for the missing edge cases above. In particular, add the
+wire-parity test the source comments commit to: serialize each DTO with the client copy
+and assert byte-equality against the sidecar `Driver.Historian.Wonderware.Ipc` copy, so the
+test suite catches any silent `[Key]` drift between the two duplicated contract sets.
+
+**Resolution:** Resolved 2026-05-22 — added six missing tests to `WonderwareHistorianClientTests.cs` (WriteBatchAsync transport-drop catch path returns RetryPlease; InvokeAsync both-attempts-fail propagates exception; stalled sidecar fires OperationCanceledException within CallTimeout; ReadProcessedAsync Total aggregate throws NotSupportedException; sidecar wrong-kind reply throws InvalidDataException) and extended `FakeSidecarServer` with `DisconnectBeforeReply`, `ReplyWithWrongKind`, and `StallAfterRequest` test knobs; added new `ContractsWireParityTests.cs` with 11 tests pinning MessagePack byte layout, round-trip correctness, MessageKind enum values, and Framing constants to catch silent `[Key]` index drift between the client and sidecar mirror copies. Total test count grew from 11 to 27, all passing.
+
+### Driver.Historian.Wonderware.Client-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
+| Status | Open |
+
+**Description:** Two doc/behaviour mismatches.
+(1) The `Dispose()` XML comment asserts that the underlying channel's async cleanup is
+non-blocking, so the `GetAwaiter().GetResult()` bridge is safe. `PipeChannel.DisposeAsync`
+calls `ResetTransport()`, which invokes synchronous `Stream.Dispose()` on a
+`NamedPipeClientStream`; pipe disposal can block briefly on OS handle teardown. The bridge
+is safe (no deadlock, no captured context) but not strictly non-blocking; the comment
+should say "does not deadlock".
+(2) `GetHealthSnapshot` populates both `ProcessConnectionOpen` and `EventConnectionOpen`
+from the same `_channel.IsConnected`, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes`
+are hard-coded to null/empty. A consumer reading `HistorianHealthSnapshot` would assume
+two independent connections and per-node health; this client has a single channel and no
+node concept. The collapse is reasonable but undocumented.
+
+**Recommendation:** Reword the `Dispose()` comment to claim only deadlock-safety. Add a
+short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
+connection flags to one transport and does not track per-node health.
+
+**Resolution:** _(open)_
@@ -0,0 +1,337 @@
+# Code Review — Driver.Historian.Wonderware
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 7 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness and logic bugs | Driver.Historian.Wonderware-001, -002, -003, -004 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency and thread safety | Driver.Historian.Wonderware-005 |
+| 4 | Error handling and resilience | Driver.Historian.Wonderware-006, -007, -008 |
+| 5 | Security | No issues found |
+| 6 | Performance and resource management | Driver.Historian.Wonderware-009, -010 |
+| 7 | Design-document adherence | Driver.Historian.Wonderware-011 |
+| 8 | Code organization and conventions | No issues found |
+| 9 | Testing coverage | Driver.Historian.Wonderware-012 |
+| 10 | Documentation and comments | No issues found |
+
+## Findings
+
+### Driver.Historian.Wonderware-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness and logic bugs |
+| Location | `Backend/SdkAlarmHistorianWriteBackend.cs:68`, `Backend/AahClientManagedAlarmEventWriter.cs:82-103` |
+| Status | Resolved |
+
+**Description:** `MalformedErrors` includes `HistorianAccessError.ErrorValue.WriteToReadOnlyFile`.
+When `ClassifyOutcome` routes that code through `MapOutcome`, `isMalformedInput` is
+`true`, so the per-event result becomes `PermanentFail` and the lmxopcua-side
+store-and-forward sink dead-letters the alarm event. But `WriteToReadOnlyFile` is
+not a property of the event payload; it is a connection-configuration fault (the
+write backend opened the session without `ReadOnly` set to `false`, or the SDK
+defaulted it). Treating it as permanent means a misconfigured or regressed
+connection would silently and permanently discard every alarm event in the batch
+instead of deferring them for retry once the connection is corrected.
+Alarm-event historization is the module's whole purpose, so this is data loss.
+
+**Recommendation:** Move `WriteToReadOnlyFile` out of `MalformedErrors`. It should
+be treated as a connection-class error (abort the batch, reset the connection so
+the reconnect path can re-open with `ReadOnly = false`) or at minimum as
+`RetryPlease`, never `PermanentFail`.
+
+**Resolution:** Resolved 2026-05-22 — moved `WriteToReadOnlyFile` from `MalformedErrors` into `ConnectionErrors` so the batch loop aborts, resets the connection (re-opening with `ReadOnly = false`), and defers the events as `RetryPlease` instead of dead-lettering them.
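
To make the reclassification concrete, here is a Python sketch of a per-event outcome classifier in which `WriteToReadOnlyFile` sits in the connection-class set. Only `WriteToReadOnlyFile`, `Ack`, `RetryPlease`, and `PermanentFail` come from the finding. The other error names, the function name, and the retry default for unrecognised codes are assumptions.

```python
from enum import Enum


class Outcome(Enum):
    ACK = "Ack"
    RETRY_PLEASE = "RetryPlease"
    PERMANENT_FAIL = "PermanentFail"


# Hypothetical code sets; only WriteToReadOnlyFile is taken from the finding.
CONNECTION_ERRORS = frozenset({"ConnectionLost", "WriteToReadOnlyFile"})
MALFORMED_ERRORS = frozenset({"InvalidEventPayload"})


def classify_outcome(error_code):
    """Return (outcome, abort_batch) for one event's write result."""
    if error_code is None:
        return Outcome.ACK, False
    if error_code in CONNECTION_ERRORS:
        # Connection-class fault: abort the batch, reset the session, defer the event.
        return Outcome.RETRY_PLEASE, True
    if error_code in MALFORMED_ERRORS:
        # The payload itself is bad, so retrying cannot help; dead-letter it.
        return Outcome.PERMANENT_FAIL, False
    # Assumption: unrecognised codes are retried rather than dropped.
    return Outcome.RETRY_PLEASE, False
```

In this sketch, when `abort_batch` is true the caller is expected to mark the current event and every remaining event in the batch as `RetryPlease`.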
+
+### Driver.Historian.Wonderware-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness and logic bugs |
+| Location | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
+| Status | Resolved |
+
+**Description:** `HandleWriteAlarmEventsAsync` dereferences `req.Events.Length`
+in both the `_alarmWriter is null` branch (line 162) and the catch block (line
+181). MessagePack deserializes an absent or explicit-nil array field as a `null`
+reference, not `Array.Empty<T>()`. A client (or a buggy/hostile peer) that sends
+a `WriteAlarmEventsRequest` with a null `Events` array triggers a
+`NullReferenceException`. Although `RunOneConnectionAsync` would log it and accept
+the next connection, the request gets no reply frame, so the client's correlation-id
+wait hangs until its own timeout. `AahClientManagedAlarmEventWriter.WriteAsync`
+already null-guards `events`; the frame handler does not.
+
+**Recommendation:** Normalize `req.Events` to `Array.Empty<AlarmHistorianEventDto>()`
+immediately after deserialization (or guard each `.Length` access), consistent
+with the null-tolerance the writer already has.
+
+**Resolution:** Resolved 2026-05-22 — normalise `req.Events` to `Array.Empty<AlarmHistorianEventDto>()` immediately after deserialization so all subsequent `.Length` accesses are safe against null frames.
+
+### Driver.Historian.Wonderware-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness and logic bugs |
+| Location | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
+| Status | Resolved |
+
+**Description:** Raw and at-time reads decide whether a sample is a string or a
+numeric with `if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)`.
+The `result.Value == 0` clause is intended to distinguish a real numeric zero from
+a string tag whose numeric projection is zero, but it is wrong in both directions:
+a numeric (analog) tag that legitimately sampled the value `0` while the SDK also
+populates a non-empty `StringValue` (some Historian builds populate the formatted
+text on every result) is reported to OPC UA as a string, changing the variable
+data type mid-stream; conversely a string tag whose numeric projection is non-zero
+is reported as a numeric. The historian SDK exposes the tag's actual data type,
+which should drive the branch instead of a value heuristic.
+
+**Recommendation:** Select string vs. numeric from the SDK result's tag data-type
+field rather than from `Value == 0`. If the type field is genuinely unavailable in
+the bound SDK version, document the limitation explicitly and prefer numeric for
+analog/integer tags.
+
+**Resolution:** Resolved 2026-05-22 — extracted the heuristic into a `SelectValue` helper with a detailed XML doc comment explaining the SDK limitation (`HistoryQueryResult` has no data type field in the bound `aahClientManaged` version); the existing `Value == 0` discriminator is preserved as the best available heuristic with the known edge-case documented.
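
Both failure directions show up in a direct Python transcription of the heuristic. `select_value` mirrors the `!string.IsNullOrEmpty(result.StringValue) && result.Value == 0` test quoted above. The function name and sample values are illustrative, and the real `SelectValue` helper may differ in detail.

```python
def select_value(string_value, numeric_value):
    """Transcription of the string-vs-numeric heuristic quoted in the finding."""
    # A falsy string_value covers both None and "", like string.IsNullOrEmpty.
    if string_value and numeric_value == 0:
        return string_value
    return numeric_value


# A numeric tag that truly sampled 0 while the SDK filled in formatted text:
print(repr(select_value("0", 0.0)))        # '0'  (reported as a string)
# A string tag whose numeric projection happens to be non-zero:
print(repr(select_value("RUNNING", 3.0)))  # 3.0  (reported as a numeric)
```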
+
+### Driver.Historian.Wonderware-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness and logic bugs |
+| Location | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
+| Status | Open |
+
+**Description:** `ToHistorianEvent` only assigns `historianEvent.Id` when
+`Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID
+(or is empty), `Id` stays `Guid.Empty` and the event is written to the historian
+with an all-zeros identifier. Multiple such events collide on the same id, and the
+write is still accepted (`outcomes[i] = Ack`) so neither side detects the problem.
+The non-parseable case is never logged.
+
+**Recommendation:** Log a warning when `EventId` fails to parse, and either reject
+the event as `PermanentFail` (malformed input) or synthesize a fresh
+`Guid.NewGuid()` so each event still gets a unique id.
+
+**Resolution:** _(open)_
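
Of the two options in the recommendation, the fallback-id variant could look like the following Python sketch. `event_id_or_new` and the logger name are hypothetical. `uuid.UUID` plays the role of `Guid.TryParse`, although the two accept slightly different input formats.

```python
import logging
import uuid

log = logging.getLogger("alarm-historian-writer")  # hypothetical logger name


def event_id_or_new(event_id):
    """Parse event_id as a GUID, or log a warning and synthesize a fresh one."""
    try:
        return uuid.UUID(event_id)
    except (TypeError, ValueError, AttributeError):
        # TypeError: None; ValueError: malformed text; AttributeError: non-string input.
        log.warning("EventId %r is not a GUID; assigning a new id", event_id)
        return uuid.uuid4()
```

Synthesizing an id keeps the event; rejecting it as `PermanentFail` would instead make the malformed input visible to the sender. Either choice avoids the silent `Guid.Empty` collision.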
+
+### Driver.Historian.Wonderware-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency and thread safety |
+| Location | `Backend/HistorianDataSource.cs:124`, `:126-127` |
+| Status | Open |
+
+**Description:** `GetHealthSnapshot` reads `_activeProcessNode` and
+`_activeEventNode` inside `_healthLock`, but those two fields are written under
+`_connectionLock` / `_eventConnectionLock` (lines 183, 243, 209-210, 266-269) — a
+different lock. The health-counter fields are correctly `_healthLock`-protected,
+but the active-node strings are published under one lock and read under another,
+so the snapshot can observe a stale active-node value relative to the
+connection-open booleans. This is a diagnostics-only path, so impact is limited to
+a momentarily inconsistent health snapshot.
+
+**Recommendation:** Pick one lock for the active-node strings (publish them under
+`_healthLock` on every connection state change, or read them under the connection
+lock), so the snapshot is internally consistent.
+
+**Resolution:** _(open)_
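
A Python sketch of the first option publishes the connection flag and the active node together under one lock. `HealthState` and its methods are hypothetical names. Every write and the snapshot read share the same lock, so a snapshot never pairs a new connection flag with a stale node name.

```python
import threading


class HealthState:
    """Connection flag and active node, published and read under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connected = False
        self._active_node = None

    def on_connected(self, node):
        with self._lock:
            self._connected = True
            self._active_node = node

    def on_disconnected(self):
        with self._lock:
            self._connected = False
            self._active_node = None

    def snapshot(self):
        # Both fields are copied inside the same critical section.
        with self._lock:
            return self._connected, self._active_node
```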
+
+### Driver.Historian.Wonderware-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling and resilience |
+| Location | `Ipc/PipeServer.cs:120-128` |
+| Status | Resolved |
+
+**Description:** `RunAsync` re-accepts connections in a `while` loop. If
+`RunOneConnectionAsync` throws synchronously and immediately on every iteration
+(for example `new NamedPipeServerStream(...)` fails because the pipe name is
+already in use, or `PipeAcl.Create` throws), the loop spins with no delay and no
+backoff, pegging a CPU core and flooding the rolling log file with one `Error`
+line per iteration. There is no circuit-breaker or retry cap.
+
+**Recommendation:** Add a short delay (exponential backoff capped at a few
+seconds) before re-accepting after a caught exception, and consider a
+consecutive-failure threshold that escalates to a fatal exit so the supervisor can
+restart the sidecar cleanly.
+
+**Resolution:** Resolved 2026-05-22 — added exponential backoff (250 ms → 8 s, six steps) after each connection-loop failure and a `MaxConsecutiveFailures=20` threshold that re-throws so the SCM/NSSM supervisor can restart the sidecar cleanly.
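
The resolution's schedule can be sketched in Python as follows: 250 ms doubling to an 8 s ceiling, plus a give-up threshold of 20 consecutive failures. `run_accept_loop`, `accept_once`, and the injected `sleep` are hypothetical; injecting `sleep` keeps the loop testable. Whether the real threshold fires on the 20th failure or the 21st is an assumption here.

```python
BASE_DELAY_MS = 250
MAX_DELAY_MS = 8000
MAX_CONSECUTIVE_FAILURES = 20


def backoff_delay_ms(consecutive_failures):
    """250, 500, 1000, 2000, 4000, 8000, then 8000 ms for every later failure."""
    exponent = min(consecutive_failures - 1, 5)
    return min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS)


def run_accept_loop(accept_once, sleep, stopped=lambda: False):
    """Serve connections until stopped, backing off after each failure."""
    failures = 0
    while not stopped():
        try:
            accept_once()
            failures = 0  # a successful connection resets the streak
        except Exception:
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise  # let the service supervisor restart the process
            sleep(backoff_delay_ms(failures) / 1000.0)
```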
+
+### Driver.Historian.Wonderware-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling and resilience |
+| Location | `Ipc/PipeServer.cs:70-75` |
+| Status | Open |
+
+**Description:** When `VerifyCaller` rejects the peer SID, the server logs the
+reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The
+shared-secret-mismatch and major-version-mismatch paths below it both send a
+rejecting `HelloAck` so the client learns why. A client that fails the SID check
+instead sees an abrupt disconnect and must rely on its own read timeout, with no
+diagnostic on the client side. The asymmetry also makes the SID-rejection path
+harder to test from the client.
+
+**Recommendation:** Send a `HelloAck` with `Accepted = false` and a
+`caller-sid-mismatch` reject reason before disconnecting, consistent with the
+other two rejection paths.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling and resilience |
+| Location | `Backend/HistorianDataSource.cs:301-307`, `:374-380` |
+| Status | Open |
+
+**Description:** When `query.StartQuery` returns `false`, `ReadRawAsync` and
+`ReadAggregateAsync` call `HandleConnectionError()` and return an empty result
+list. A failed `StartQuery` is not necessarily a connection failure; it can be a
+bad tag name, an invalid time range, or an unsupported aggregate. Yet the code
+unconditionally tears down the shared SDK connection. A burst of queries with one
+bad tag name therefore repeatedly drops and re-opens the (relatively expensive)
+historian connection and marks the cluster node failed (`HandleConnectionError` calls
+`_picker.MarkFailed`), which can push an otherwise healthy node into cooldown.
+The empty-list result is also indistinguishable from "no data in range" to the
+caller, because the `Success` flag on the reply is still `true`.
+
+**Recommendation:** Inspect `error.ErrorCode` to distinguish connection-class
+failures (reset and mark node failed) from query-class failures (leave the
+connection intact, surface the error). Consider returning a failed reply
+(`Success = false`) for query-class `StartQuery` failures so the client does not
+treat an SDK error as an empty history.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance and resource management |
+| Location | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` |
+| Status | Resolved |
+
+**Description:** `ReadAggregateAsync` drains `query.MoveNext` into `results` with
+no upper bound, unlike `ReadRawAsync`, which honours `maxValues` /
+`MaxValuesPerRead` and stops once the cap is reached. `ReadProcessedRequest` carries no max-buckets field.
+A processed read over a wide time range with a small `IntervalMs` produces an
+unbounded `HistorianAggregateSample` list; the handler then serializes it into
+`ReadProcessedReply`. If the serialized body exceeds the 16 MiB
+`Framing.MaxFrameBodyBytes` cap, `FrameWriter.WriteAsync` throws and the entire
+reply is lost (the client correlation wait hangs), and before that point the
+sidecar holds the whole result set in memory.
+
+**Recommendation:** Apply `_config.MaxValuesPerRead` as a bucket cap in
+`ReadAggregateAsync` (mirroring the raw path), and/or add a `MaxBuckets` field to
+`ReadProcessedRequest`. Reject or truncate result sets that would exceed the frame
+cap with an explicit error reply rather than letting `WriteAsync` throw.
+
+**Resolution:** Resolved 2026-05-22 — applied `_config.MaxValuesPerRead` as a bucket cap in `ReadAggregateAsync` mirroring the raw-read path; truncation logs a Warning with the limit and a hint to widen `IntervalMs` or reduce the time range.
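
One way to apply the cap and still detect truncation is to read one item past the limit. The Python sketch below is illustrative only: `drain_with_cap` and the logger name are hypothetical, and the real `ReadAggregateAsync` may simply stop at the cap.

```python
import itertools
import logging

log = logging.getLogger("historian-data-source")  # hypothetical logger name


def drain_with_cap(rows, max_values):
    """Collect at most max_values rows, logging a warning if more were available."""
    # Pull one extra row so truncation can be detected without draining the source.
    results = list(itertools.islice(rows, max_values + 1))
    if len(results) > max_values:
        log.warning(
            "Aggregate read truncated at %d buckets; widen IntervalMs or narrow the time range",
            max_values,
        )
        del results[max_values:]
    return results
```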
+
+### Driver.Historian.Wonderware-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance and resource management |
+| Location | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) |
+| Status | Open |
+
+**Description:** `HistorianConfiguration.RequestTimeoutSeconds` is documented as
+the "outer safety timeout applied to sync-over-async Historian operations" and is
+copied around (`SdkAlarmHistorianWriteBackend.CloneConfigWithServerName:346`), but
+it is never read or enforced anywhere. The `HistorianDataSource` read methods are
+declared `Task`-returning but execute the SDK calls synchronously on the caller
+thread and only check the `CancellationToken` between `MoveNext` iterations. There
+is no outer timeout: a hung `StartQuery` or a slow `MoveNext` blocks the single
+pipe-server connection thread indefinitely (the connect path has its own poll
+timeout, but the query path does not). The documented safety net does not exist.
+
+**Recommendation:** Either wire `RequestTimeoutSeconds` into the read paths (a
+`CancellationTokenSource.CancelAfter` linked into `ct`, or run the SDK call on a
+worker with a bounded wait), or remove the property and its XML doc so the code
+does not advertise a guarantee it does not provide.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` |
+| Status | Open |
+
+**Description:** Several XML doc comments reference the retired v1 architecture as
+if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the
+Host returns these across the IPC boundary as `GalaxyDataValue`", "Populated from
+... the Proxy DriverInstance.DriverConfig". Per `CLAUDE.md`, PR 7.2 retired the
+`Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects, and this driver is now a
+standalone sidecar whose client is the .NET 10 `WonderwareHistorianClient`
+(`docs/AlarmTracking.md`). The comments are stale and misdescribe the current data
+flow, which contradicts the "no stale design docs/comments" expectation in the
+review checklist.
+
+**Recommendation:** Update the doc comments to describe the current sidecar/IPC
+architecture (sidecar talking to `WonderwareHistorianClient` over the named pipe),
+dropping the `Galaxy.Host` / `Proxy` / `GalaxyDataValue` references.
+
+**Resolution:** _(open)_
+
+### Driver.Historian.Wonderware-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` |
+| Status | Open |
+
+**Description:** The unit-test suite covers `HistorianQualityMapper`,
+`HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`,
+`AahClientManagedAlarmEventWriter`, the IPC round trip, and `Program` alarm-writer
+wiring. `HistorianDataSource` itself — the largest and most logic-dense file in
+the module — has no direct unit coverage of its read paths, despite
+`IHistorianConnectionFactory` being explicitly extracted "so tests can inject
+fakes that control connection success, failure, and timeout behavior". The
+connect-failover-and-cooldown loop (`ConnectToAnyHealthyNode`), the mid-query
+connection-reset path (`HandleConnectionError`), the string-vs-numeric value
+selection (see -003), the at-time per-timestamp loop, and `ExtractAggregateValue`
+column dispatch are all untested. A stale empty test directory
+(`tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`, containing only
+`bin/obj`) also sits alongside the live `tests/Drivers/...` project and should be
+removed to avoid confusion.
+
+**Recommendation:** Add `HistorianDataSource` tests driving an
+`IHistorianConnectionFactory` fake — covering failover, cooldown, mid-query reset,
+cancellation, and the value-type selection — and delete the stale empty
+`tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` directory.
+
+**Resolution:** _(open)_
@@ -0,0 +1,241 @@
+# Code Review — Driver.Modbus.Addressing
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 3 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Modbus.Addressing-001, -002, -003, -004 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | No issues found |
+| 4 | Error handling & resilience | Driver.Modbus.Addressing-005, -006 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | Driver.Modbus.Addressing-001, -007 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.Modbus.Addressing-008 |
+| 10 | Documentation & comments | Driver.Modbus.Addressing-009 |
+
+## Findings
+
+### Driver.Modbus.Addressing-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `ModbusAddressParser.cs:230-235`, `DirectLogicAddress.cs:66-73` |
+| Status | Resolved |
+
+**Description:** The DL205 family-native branch routes every V-prefixed address through
+`DirectLogicAddress.UserVMemoryToPdu`, which is a plain octal-to-decimal conversion. DL205/DL260
+system V-memory (V40400 and up) is NOT a simple octal decode — per `docs/v2/dl205.md` section
+V-Memory, V40400 must map to Modbus PDU 0x2100 (decimal 8448) on a factory-mode ECOM module.
+The parser instead octal-decodes V40400 to decimal 16640 (0x4100), the wrong register. The
+`DirectLogicAddress.SystemVMemoryToPdu` / `SystemVMemoryBasePdu` helper that exists to do this
+correctly is never called by the parser, so from the parser's point of view it is dead code. A tag
+spreadsheet that addresses any DL system register through the grammar string silently reads and
+writes the wrong PLC memory. The companion test `ModbusFamilyParserTests.cs:20` bakes the wrong
+value (V40400 to 16640) into a passing assertion, so the regression is locked in.
+
+**Recommendation:** Make the DL205 V branch detect the system bank (octal address >= 40400) and
+route it through `SystemVMemoryToPdu`, or explicitly reject system V-memory in the grammar string
+with a diagnostic pointing at the structured tag form. Either way, fix the V40400 test to assert
+the corrected mapping.
+
+**Resolution:** Resolved 2026-05-22 — added `DirectLogicAddress.VMemoryToPdu`, which detects the
+system bank (octal >= V40400) and relocates it through `SystemVMemoryToPdu` to PDU 0x2100; the
+DL205 V branch in `ModbusAddressParser` now calls it, and the `ModbusFamilyParserTests` V40400
+assertion was corrected from 16640 to 0x2100 with system-bank regression cases added.
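
The arithmetic behind this finding is sketched below in Python. `int(text, 8)` shows why the plain octal decode lands on 0x4100. Only the V40400 to 0x2100 anchor point comes from `docs/v2/dl205.md`. The relocation rule for the rest of the system bank is an assumption for illustration: a contiguous range starting at PDU 0x2100, offset by the octal distance from V40400. `v_memory_to_pdu` is hypothetical, so confirm other system addresses against that document.

```python
SYSTEM_V_FIRST = 0o40400    # V40400, first system V-memory address (octal literal)
SYSTEM_V_BASE_PDU = 0x2100  # PDU that V40400 maps to, per docs/v2/dl205.md


def v_memory_to_pdu(octal_text):
    """Map a DL205 V-memory address such as '40400' to a Modbus PDU address."""
    if not octal_text.isdigit():
        raise ValueError(f"V{octal_text}: expected octal digits")
    value = int(octal_text, 8)  # ValueError on a digit 8 or 9
    if value >= SYSTEM_V_FIRST:
        # Assumption: the system bank is relocated contiguously from 0x2100.
        pdu = SYSTEM_V_BASE_PDU + (value - SYSTEM_V_FIRST)
    else:
        pdu = value  # user V-memory: plain octal-to-decimal conversion
    if pdu > 0xFFFF:
        raise OverflowError(f"V{octal_text} does not fit a 16-bit PDU address")
    return pdu


print(int("40400", 8), hex(int("40400", 8)))  # 16640 0x4100 (the buggy result)
print(v_memory_to_pdu("40400"), hex(v_memory_to_pdu("40400")))  # 8448 0x2100
```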
+
+### Driver.Modbus.Addressing-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ModbusAddressParser.cs:86-94` |
+| Status | Resolved |
+
+**Description:** In the 3-field disambiguation, an empty 3rd field (`40001:F:`) reaches
+`parts[2].All(char.IsDigit)`. `Enumerable.All` returns true for an empty sequence, so the empty
+string is classified as a valid-shaped array count, assigned to `countPart`, then silently dropped
+by the later `string.IsNullOrEmpty(countPart)` guard. The result is that `40001:F:` parses
+successfully as a plain scalar with a dangling empty field rather than being rejected as
+malformed. The 4-field form `40001:F::` has the analogous effect. A user who mistypes a trailing
+colon gets no diagnostic.
+
+**Recommendation:** Reject an empty 3rd field explicitly, or guard the `All(char.IsDigit)` branch
+with `parts[2].Length > 0`.
+
+**Resolution:** Resolved 2026-05-22 — added an explicit `parts[2].Length == 0` check before the `All(char.IsDigit)` branch that returns a descriptive error, so a trailing colon typo produces a diagnostic instead of silently parsing as a scalar.
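
Python's `all()` has the same vacuous-truth behaviour as `Enumerable.All`, so both the bug and the fix translate directly. `classify_third_field` is a hypothetical reduction of the 3-field disambiguation, not the parser's actual code.

```python
def classify_third_field(field):
    """Decide whether field 3 of ADDRESS:TYPE:FIELD3 is an array count or a byte order."""
    # all() over an empty sequence is vacuously True, just like Enumerable.All,
    # so the empty check has to come first or "" would pass as an array count.
    if len(field) == 0:
        raise ValueError("field 3 is empty; remove the trailing ':'")
    if all(ch.isdigit() for ch in field):
        return "array-count"
    return "byte-order"


print(all(ch.isdigit() for ch in ""))  # True
```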
+
+### Driver.Modbus.Addressing-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` |
+| Status | Resolved |
+
+**Description:** `LooksLikeByteOrderToken` classifies any 4-letter token as a byte-order token.
+A 3-field address whose 3rd field is a 4-letter type-like token (e.g. `40001:S:BOOL`) is routed
+into `TryParseByteOrder`, producing the misleading diagnostic "Unknown byte order BOOL" instead
+of telling the user the type belongs in field 2. The type code BOOL is exactly 4 letters and
+could only ever be intended as a type — the shape heuristic cannot tell a mistyped type from a
+byte order, so the diagnostic actively misdirects.
+
+**Recommendation:** When `TryParseByteOrder` fails on a 4-letter token in the 3-field form, widen
+the error message to mention that field 3 is a byte order and field 2 is the type, or attempt a
+type-parse fallback before emitting the byte-order error.
+
+**Resolution:** Resolved 2026-05-22 — in the 3-field disambiguation error path, a 4-letter alphanumeric token that looks like a type code now produces a diagnostic explicitly stating that field 3 is the byte-order slot and field 2 is the type slot, directing the user to the correct fix.
+
+### Driver.Modbus.Addressing-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ModbusAddressParser.cs:182-194` |
+| Status | Resolved |
+
+**Description:** The bit suffix is stripped at `text.IndexOf('.')`, the first dot. An input
+such as `40001.5.3` produces a bit text of "5.3", which `byte.TryParse` rejects with the generic
+"Bit index must be 0..15" message. A Modicon-style decimal-point typo such as `400.01` is split
+into address 400 and bit 01. Because 01 is a valid bit index, the bit parse succeeds, and the
+surfaced error is the Modicon length diagnostic for 400 rather than anything about the stray dot.
+The dot handling assumes a single dot without asserting it, and the diagnostics for these
+malformed inputs are inconsistent.
+
+**Recommendation:** Use `LastIndexOf('.')` or assert exactly one dot, and validate that the
+region/offset segment is non-empty and dot-free after the strip so malformed inputs get a precise
+diagnostic.
+
+**Resolution:** Resolved 2026-05-22 — switched to `LastIndexOf('.')`, added a non-empty guard for the address segment before the dot, and added a check that the address segment itself contains no dot (diagnosing multi-dot inputs with "contains multiple dots" rather than a confusing bit-index error).
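
A Python sketch of the corrected split, using `rfind` in place of `LastIndexOf`. `split_bit_suffix` and its error messages are illustrative; the multi-dot wording follows the resolution.

```python
def split_bit_suffix(text):
    """Split 'ADDRESS.BIT' on the last dot; returns (address, bit or None)."""
    dot = text.rfind(".")
    if dot < 0:
        return text, None
    address, bit_text = text[:dot], text[dot + 1:]
    if not address:
        raise ValueError("address segment before '.' is empty")
    if "." in address:
        raise ValueError(f"'{text}' contains multiple dots")
    if not bit_text.isdigit() or not 0 <= int(bit_text) <= 15:
        raise ValueError("Bit index must be 0..15")
    return address, int(bit_text)
```

`400.01` still splits cleanly into `("400", 1)`; catching that typo is left to the Modicon length check that runs afterwards.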
+
+### Driver.Modbus.Addressing-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `ModbusAddressParser.cs:200-213` |
+| Status | Resolved |
+
+**Description:** `TryParseRegionAndOffset` tries family-native, then mnemonic, then Modicon. When
+all three fail it returns false with whatever error the Modicon parser last wrote (comment: "the
+Modicon error is the more specific diagnostic"). For a non-Generic family this is misleading:
+`TryParseFamilyNative` returns false with error left null for any address that does not start with
+a recognised family prefix, and even for recognised prefixes it only sets error inside the catch.
+The subsequent mnemonic and Modicon attempts overwrite error. Net effect: a clearly
+family-native-shaped input that fails deep in the family helper can still surface a generic
+Modicon "must be 5 or 6 digits" error, hiding the real cause (e.g. "contains non-octal digit").
+
+**Recommendation:** When a non-Generic family is configured and the input matches a family
+prefix, prefer and preserve the family-native error rather than letting the Modicon fallback
+overwrite it.
+
+**Resolution:** Resolved 2026-05-22 — the family-native error is now captured in `familyNativeError` and, after all three branches fail, preferred over the Modicon fallback error when it is non-null (indicating the address matched a family prefix but failed deep inside the helper).
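
The precedence rule in the resolution is sketched below in Python. The `(result, error)` return convention and the function names are hypothetical. A non-null family-native error wins over the Modicon fallback's error.

```python
def try_parse_region_and_offset(text, family_native, mnemonic, modicon):
    """Try each parser in order; each returns (result, error).

    Hypothetical convention: a parser returns (None, None) when the input does not
    have its shape at all, and (None, message) when the shape matched but parsing failed.
    """
    result, family_native_error = family_native(text)
    if result is not None:
        return result, None
    result, _ = mnemonic(text)
    if result is not None:
        return result, None
    result, modicon_error = modicon(text)
    if result is not None:
        return result, None
    # Prefer the family-native diagnostic: it is only set when the prefix matched.
    return None, family_native_error if family_native_error is not None else modicon_error
```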
+
+### Driver.Modbus.Addressing-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `ModbusAddressParser.cs:297-301` |
+| Status | Open |
+
+**Description:** `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`.
+The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from
+`ArgumentException`), so today it is correct. But the parser intent is to convert helper
+exceptions into structured errors; any future helper change that throws a different exception type
+(e.g. a `FormatException` from a `ushort.Parse` swap) would escape as an unhandled exception out
+of a `TryParse` method, violating the try-parse contract that config-bind hot-path callers
+depend on.
+
+**Recommendation:** Either document the exact exception contract of the helpers and keep the
+narrow catch, or broaden to a general catch-all that records the message — a try-parse method
+should never throw.
+
+**Resolution:** _(open)_
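
The broad-catch option is sketched below in Python. `try_parse_family_native` and the `helper` argument are hypothetical. The wrapper turns any exception a helper raises into a structured failure, so changing a helper's parsing call cannot leak an exception to config-bind callers.

```python
def try_parse_family_native(text, helper):
    """Return (ok, value, error) and never raise, whatever the helper throws."""
    try:
        return True, helper(text), None
    except Exception as ex:  # deliberately broad: a try-parse must not throw
        return False, None, f"{type(ex).__name__}: {ex}"
```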
+
+### Driver.Modbus.Addressing-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings |
+| Status | Open |
+
+**Description:** `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this
+assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress`
+has no field for it and `ModbusAddressParser` never produces or consumes it. The `STR<n>` grammar
+form cannot express the DL205 string byte order described in `docs/v2/dl205.md` — a DL205 string
+tag parsed from the grammar string always carries the default order. The enum is effectively
+unreachable from the parser, so the grammar cannot represent a known, documented device quirk.
+
+**Recommendation:** Either add a `StringByteOrder` field to `ParsedModbusAddress` plus a grammar
+token for it, or document explicitly that DL205 string byte order is only configurable via the
+structured tag form and is intentionally out of grammar scope.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus.Addressing-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` |
+| Status | Resolved |
+
+**Description:** Several edge cases of the address arithmetic are untested or asserted incorrectly:
+(a) DL205 system V-memory mapping is tested only with the incorrect expected value
+(`ModbusFamilyParserTests.cs:20`, see finding -001); (b) there is no test for `UserVMemoryToPdu`
+or `AddOctalOffset` overflow (V200000, C200000) hitting the `OverflowException` path; (c) no test
+for the empty-trailing-field cases of finding -002; (d) `MelsecAddress.ParseHex` overflow and
+`DRegisterToHolding` / `MRelayToCoil` bank-base overflow are untested; (e) no test that
+`SystemVMemoryToPdu` is exercised at all. The address-arithmetic overflow and off-by-one paths
+are exactly the high-risk surface this module owns, and they are the least covered.
+
+**Recommendation:** Add overflow/boundary tests for every PDU/coil/discrete translation helper
+and for the parser count/bit/field edge cases. Correct the V40400 assertion as part of fixing
+finding -001.
+
+**Resolution:** Resolved 2026-05-22 — added `ModbusAddressEdgeCaseTests.cs` covering: empty 3rd-field rejection, multi-dot input rejection, `UserVMemoryToPdu` overflow, `AddOctalOffset` overflow via Y and C helpers, `SystemVMemoryToPdu` base/overflow, `MelsecAddress.ParseHex` overflow, `DRegisterToHolding` and `MRelayToCoil` bank-base overflow.
+
+### Driver.Modbus.Addressing-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` |
+| Status | Open |
+
+**Description:** The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The
+remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6
+(400001..465536-shaped) implies the leading digit is always 4, but the parser accepts leading
+0/1/3 too — a 5-digit coil is 00001..09999, not 40001..49999. Separately, the line-106 comment
+says the 5-digit form caps at 9999 by construction while the adjacent code path applies the same
+`> 65536` check to both forms; the comment describes an invariant the code does not rely on.
+
+**Recommendation:** Reword the range examples to cover all four region digits and drop the
+caps-at-9999 aside or restate it as a precise statement about trailing-digit count.
+
+**Resolution:** _(open)_
@@ -0,0 +1,234 @@
+# Code Review — Driver.Modbus.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Modbus.Cli-001, Driver.Modbus.Cli-002, Driver.Modbus.Cli-003 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.Modbus.Cli-004 |
+| 4 | Error handling & resilience | Driver.Modbus.Cli-005, Driver.Modbus.Cli-006 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | Driver.Modbus.Cli-007 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.Modbus.Cli-008 |
+| 10 | Documentation & comments | No issues found |
+
+## Findings
+
+### Driver.Modbus.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` |
+| Status | Resolved |
+
+**Description:** `SubscribeCommand` synthesises its `ModbusTagDefinition` with only
+`Name`, `Region`, `Address`, `DataType`, `Writable`, and `ByteOrder` — it never
+exposes or passes `--bit-index`, `--string-length`, or `--string-byte-order`.
+A user running `subscribe -t BitInRegister` always watches bit 0 regardless of
+intent, and `subscribe -t String` runs with `StringLength = 0`. The doc
+(`docs/Driver.Modbus.Cli.md`) lists `BitInRegister`, `String`, `Bcd16`, `Bcd32`
+in the `subscribe` `--type` help text, so these types are advertised as supported
+but cannot be used correctly. `read` and `write` both expose all three flags;
+`subscribe` is the odd one out.
+
+**Recommendation:** Add `--bit-index`, `--string-length`, and `--string-byte-order`
+options to `SubscribeCommand` (mirroring `ReadCommand`) and pass them into the
+`ModbusTagDefinition`, or trim the `--type` help text to the types `subscribe`
+actually supports and reject `BitInRegister` / `String` at command entry with a
+clear message.
+
+**Resolution:** Resolved 2026-05-22 — added `--bit-index`, `--string-length`, and `--string-byte-order` options to `SubscribeCommand`, mirroring `ReadCommand`, and passed them through to `ModbusTagDefinition` so `BitInRegister` and `String` types subscribe correctly.
+
+### Driver.Modbus.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` |
+| Status | Resolved |
+
+**Description:** `WriteCommand` rejects read-only regions (`DiscreteInputs` /
+`InputRegisters`) but does not validate that `--type` is meaningful for the
+`Coils` region. `write -r Coils -a 5 -t UInt16 -v 42` builds a `Coils` tag with
+`DataType = UInt16`; the value parses to a boxed `ushort`, and the driver's
+`WriteOneAsync` coil branch calls `Convert.ToBoolean(value)` which succeeds for
+any non-zero `ushort` (yields `true`). The write silently lands as a coil ON with
+no diagnostic, even though the operator asked for a 16-bit register write. A coil
+carries a single bit, so `Bool` is the only meaningful `--type` for that region.
+
+**Recommendation:** After the read-only-region check, reject `Region == Coils`
+combined with any non-boolean `--type` (anything other than `Bool`), with a
+message explaining coils carry a single bit.
+
+**Resolution:** Resolved 2026-05-22 — added a `Region == Coils && DataType != Bool` check immediately after the read-only-region guard, throwing `CommandException` with a message explaining that coils carry a single bit and only `--type Bool` is valid.
+
+### Driver.Modbus.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` |
+| Status | Open |
+
+**Description:** `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value,
+including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts
+0-255 even though the option description and `docs/Driver.Modbus.Cli.md` both say
+the valid range is 1-247 (0 is the Modbus broadcast address; 248-255 are
+reserved). A negative `--timeout-ms` becomes a negative `TimeSpan` passed straight
+into the driver; an out-of-range `--port` fails later with an opaque socket
+error. None of these are validated at parse time.
+
+**Recommendation:** Validate `Port` (1-65535), `TimeoutMs` (greater than 0), and
+`UnitId` (1-247) at the top of each command's `ExecuteAsync` (or in
+`ModbusCommandBase`), throwing `CliFx.Exceptions.CommandException` with a clear
+message — consistent with how `WriteCommand` already rejects bad regions and
+boolean strings.
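+
+A minimal sketch of the shared check, assuming it lives in `ModbusCommandBase` and receives the three option values. `ValidateConnectionOptions` is a hypothetical name. It returns the message instead of throwing so the example has no CliFx dependency; the real command would throw `CommandException` with that message.
+
+```csharp
+#nullable enable
+using System;
+
+// Hypothetical helper: null means valid, otherwise the message a command would
+// wrap in CliFx.Exceptions.CommandException.
+static string? ValidateConnectionOptions(int port, int timeoutMs, byte unitId)
+{
+    if (port is < 1 or > 65535)
+        return $"--port must be in 1-65535 (got {port}).";
+    if (timeoutMs <= 0)
+        return $"--timeout-ms must be greater than 0 (got {timeoutMs}).";
+    // 0 is the Modbus broadcast address and 248-255 are reserved.
+    if (unitId is < 1 or > 247)
+        return $"Unit ID must be in 1-247 (got {unitId}).";
+    return null;
+}
+
+Console.WriteLine(ValidateConnectionOptions(502, 2000, 1) ?? "valid");
+Console.WriteLine(ValidateConnectionOptions(502, -5, 1));
+```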
+
+**Resolution:** _(open)_
+
+
+### Driver.Modbus.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` |
+| Status | Open |
+
+**Description:** The `OnDataChange` handler is invoked from the driver's
+`PollGroupEngine` background thread and calls `console.Output.WriteLine`
+synchronously. An exception thrown inside this handler (e.g. an `IOException` on a
+redirected or closed stdout) propagates on the poll-engine thread and is not
+caught — it could fault the background loop. For a long-running `subscribe` this
+is a real, if low-probability, crash path. Output lines are also written without
+any synchronization, so overlapping poll ticks could interleave partial lines.
+
+**Recommendation:** Wrap the handler body in a `try/catch` that swallows or logs
+write failures so a transient console-write error cannot tear down the poll loop.
+A single `lock` around the write also removes the interleave risk.
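+
+A minimal sketch of a hardened handler body. It assumes `output` stands in for CliFx's `console.Output` and that writing one line is all the handler does; `WriteChange` and the `name = value` line format are hypothetical.
+
+```csharp
+#nullable enable
+using System;
+using System.IO;
+using System.Threading.Tasks;
+
+var writeLock = new object();
+
+// Hypothetical OnDataChange body: `output` stands in for console.Output and the
+// "name = value" line format is illustrative.
+void WriteChange(TextWriter output, string tagName, object? value)
+{
+    try
+    {
+        // Serialize writes so overlapping poll ticks cannot interleave lines.
+        lock (writeLock)
+        {
+            output.WriteLine($"{tagName} = {value}");
+        }
+    }
+    catch (IOException)
+    {
+        // Broken or redirected stdout: drop the line, keep the poll loop alive.
+    }
+    catch (ObjectDisposedException)
+    {
+        // Writer already closed during shutdown: same treatment.
+    }
+}
+
+var sink = new StringWriter();
+Parallel.For(0, 100, i => WriteChange(sink, $"Tag{i}", i));
+Console.WriteLine(sink.ToString().Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries).Length); // 100
+
+var closed = new StringWriter();
+closed.Dispose();
+WriteChange(closed, "Tag0", 0); // swallowed: no exception reaches the caller
+```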
+
+**Resolution:** _(open)_
+
+### Driver.Modbus.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` |
+| Status | Open |
+
+**Description:** All three commands call `ConfigureLogging()` then
+`console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C
+before `InitializeAsync` completes, the resulting `OperationCanceledException`
+propagates out of `ExecuteAsync` unhandled. CliFx renders any unhandled exception
+other than `CommandException` as a full stack trace, which is noisy for what is
+just a user-cancelled run. `SubscribeCommand` correctly catches
+`OperationCanceledException` around its `Task.Delay`, but `probe`, `read`, and
+`write` do not catch it around their driver calls.
+
+**Recommendation:** Either let cancellation surface a clean message (catch
+`OperationCanceledException` in each command and exit quietly) or document that
+the noisy trace on Ctrl+C-during-connect is acceptable. Consistency with
+`SubscribeCommand`'s handling is the cleaner choice.
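+
+A minimal sketch of the quiet-exit pattern, assuming `connect` stands in for the driver calls each command makes. `RunAsync` and the exit code 1 are hypothetical choices, not CliFx conventions. The `when` filter keeps the arm limited to genuine user cancellation.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical wrapper: `connect` stands in for InitializeAsync / ReadAsync /
+// WriteAsync. Returns a process exit code.
+static async Task<int> RunAsync(Func<CancellationToken, Task> connect, CancellationToken ct)
+{
+    try
+    {
+        await connect(ct);
+        return 0;
+    }
+    catch (OperationCanceledException) when (ct.IsCancellationRequested)
+    {
+        // Ctrl+C: one quiet line instead of a stack trace.
+        Console.Error.WriteLine("Cancelled.");
+        return 1;
+    }
+}
+
+using var cts = new CancellationTokenSource();
+cts.Cancel(); // simulate Ctrl+C arriving before the connect finishes
+var exitCode = await RunAsync(t => Task.Delay(TimeSpan.FromSeconds(30), t), cts.Token);
+Console.WriteLine(exitCode); // 1
+```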
+
+**Resolution:** _(open)_
+
+### Driver.Modbus.Cli-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` |
+| Status | Open |
+
+**Description:** `probe` reports `Health: {health.State}` from `GetHealth()`.
+After a successful `InitializeAsync` the driver sets state to `Healthy`
+regardless of whether the subsequent probe register read returns Good or a Bad
+status code. `ReadAsync` does not throw on a Modbus exception response — it
+returns a `DataValueSnapshot` with a Bad `StatusCode`. So `probe` against a host
+that accepts the TCP connection but rejects FC03 at the probe address prints
+`Health: Healthy` while the snapshot line below shows a Bad status. The two lines
+disagree, and the headline `Health` value (the thing an operator scans first)
+overstates success. The doc bills `probe` as the "is the PLC up + talking Modbus"
+check, which the bare `Healthy` does not actually confirm.
+
+**Recommendation:** Have `probe` derive its headline verdict from the probe
+snapshot's `StatusCode` (Good vs Bad) rather than — or in addition to — the driver
+`State`, or print a single combined verdict line so the two cannot contradict each
+other.
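+
+A minimal sketch of a single combined verdict. It assumes the driver state arrives as a string and the probe snapshot's `StatusCode` as its raw `uint`; `ProbeVerdict` and its wording are hypothetical. It relies on the OPC UA rule that the top two bits of a status code carry its severity (00 Good, 01 Uncertain, 1x Bad).
+
+```csharp
+using System;
+
+// Hypothetical verdict helper: driverState mirrors DriverHealth.State as text,
+// statusCode is the probe snapshot's raw OPC UA StatusCode.
+static string ProbeVerdict(string driverState, uint statusCode)
+{
+    if (driverState != "Healthy")
+        return $"FAIL: driver {driverState}";
+    // OPC UA severity is in the top two bits: 00 Good, 01 Uncertain, 1x Bad.
+    return (statusCode >> 30) switch
+    {
+        0 => "OK: connected and probe read Good",
+        1 => $"WARN: connected but probe read Uncertain (0x{statusCode:X8})",
+        _ => $"FAIL: connected but probe read Bad (0x{statusCode:X8})",
+    };
+}
+
+Console.WriteLine(ProbeVerdict("Healthy", 0x00000000u)); // OK: connected and probe read Good
+Console.WriteLine(ProbeVerdict("Healthy", 0x80050000u)); // 0x80050000 is BadCommunicationError
+```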
+
+**Resolution:** _(open)_
+
+### Driver.Modbus.Cli-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` |
+| Status | Open |
+
+**Description:** `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing
+grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`,
+`HR1:I`, `C100`, `V2000:F:CDAB`, etc.) and says "set the per-tag `addressString`
+field instead of the structured `region` + `address` + `dataType` fields." None of
+the CLI commands expose an `--address-string` (or equivalent) flag — `read`,
+`write`, and `subscribe` only accept the structured `--region` + `--address` +
+`--type` triple. The documented address-string grammar is reachable only through a
+hand-written `DriverConfig` JSON, not through this CLI. The doc reads as if the CLI
+supports it.
+
+**Recommendation:** Either add an `--address-string` option that feeds the
+driver's address-string parser (and `--family` for the DL205/MELSEC native
+syntax), or scope the "v2 addressing grammar" section of the doc to note it
+applies to `DriverConfig` JSON and is not a CLI flag.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus.Cli-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` |
+| Status | Open |
+
+**Description:** The test project covers only the two pure-function seams:
+`ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage
+for `WriteCommand`'s read-only-region rejection (`Region is not (Coils or
+HoldingRegisters)`), no test for `ModbusCommandBase.BuildOptions` (e.g. that
+`Probe.Enabled` is `false` and `AutoReconnect` tracks `--disable-reconnect`), and
+no test asserting unsupported write types throw. The branch logic in
+`WriteCommand.ExecuteAsync` and `ModbusCommandBase.BuildOptions` is the part most
+likely to regress and is currently untested. The validation gaps in findings
+002/003 are also untested precisely because no test exercises that path.
+
+**Recommendation:** Add tests for `WriteCommand`'s region-validation branch and for
+`ModbusCommandBase.BuildOptions` (construct a command instance via the `init`
+setters and assert the produced `ModbusDriverOptions`). Once findings 002/003 are
+fixed, add tests for the new validation paths.
+
+**Resolution:** _(open)_
@@ -0,0 +1,207 @@
+# Code Review — Driver.Modbus
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 7 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.Modbus-002, Driver.Modbus-005, Driver.Modbus-009 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.Modbus-001, Driver.Modbus-003 |
+| 4 | Error handling & resilience | Driver.Modbus-006, Driver.Modbus-010 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.Modbus-004 |
+| 7 | Design-document adherence | Driver.Modbus-007 |
+| 8 | Code organization & conventions | Driver.Modbus-011 |
+| 9 | Testing coverage | Driver.Modbus-012 |
+| 10 | Documentation & comments | Driver.Modbus-008 |
+
+## Findings
+
+### Driver.Modbus-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `ModbusDriver.cs:92,99-122` |
+| Status | Resolved |
+
+**Description:** `_lastPublishedByRef` is a plain `Dictionary<string, object>` mutated inside `ShouldPublish`, which runs on the `PollGroupEngine.onChange` callback. `PollGroupEngine` runs one background `Task` per subscription (`PollGroupEngine.cs:64`), so a driver with two or more subscriptions invokes `onChange` — and therefore `ShouldPublish` — concurrently on separate threads. `ShouldPublish` does `TryGetValue` and indexer writes on the unsynchronized dictionary (`ModbusDriver.cs:108`, `112`, `120`). Concurrent reads/writes of a non-thread-safe `Dictionary` can corrupt internal state, drop entries, or throw `IndexOutOfRangeException`/`InvalidOperationException`, crashing the poll loop. The sibling cache `_lastWrittenByRef` is correctly guarded by `_lastWrittenLock` — only the deadband cache was left unprotected.
+
+**Recommendation:** Guard `_lastPublishedByRef` with a dedicated lock around every access in `ShouldPublish`, or switch it to `ConcurrentDictionary<string, object>` and use `AddOrUpdate`/`TryGetValue`.
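
A minimal sketch of the `ConcurrentDictionary` option, simplified to `double` values and an absolute deadband (the real cache holds `object`); the `ShouldPublish` signature here is hypothetical. Concurrent subscriptions can no longer corrupt the dictionary itself. The `TryGetValue`-then-set pair is still two steps, so two callers racing on the same reference may both publish, which is harmless for a deadband filter.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var lastPublishedByRef = new ConcurrentDictionary<string, double>();

// Hypothetical simplified ShouldPublish: double values, absolute deadband.
bool ShouldPublish(string fullRef, double value, double deadband)
{
    if (lastPublishedByRef.TryGetValue(fullRef, out var last)
        && Math.Abs(value - last) < deadband)
        return false;                    // inside the deadband: suppress
    lastPublishedByRef[fullRef] = value; // thread-safe indexer set
    return true;
}

// Four "subscriptions" publishing concurrently; a plain Dictionary could be
// corrupted here, ConcurrentDictionary cannot.
Parallel.For(0, 10_000, i => ShouldPublish($"Sub{i % 4}.Tag", i, 0.5));
Console.WriteLine(lastPublishedByRef.Count); // 4
```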
+
+**Resolution:** Resolved 2026-05-22 — switched `_lastPublishedByRef` to `ConcurrentDictionary<string, object>` so the `TryGetValue`/indexer-write accesses in `ShouldPublish` are thread-safe under concurrent multi-subscription `onChange` callbacks; added a concurrent-deadband-subscription regression test.
+
+### Driver.Modbus-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ModbusDriver.cs:127-186` |
+| Status | Resolved |
+
+**Description:** `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_options.Tags` is fixed for a driver instance, the same set re-inserts harmlessly today — but the asymmetry is a latent bug: any future path that re-runs init with a different tag set leaves stale tag entries that resolve reads/writes against deleted nodes. `_lastPublishedByRef` and `_lastWrittenByRef` similarly survive a Reinitialize, retaining deadband/write-suppression baselines against the old config, while `_autoProhibited` *is* deliberately cleared (`ModbusDriver.cs:179`) — the inconsistency shows the clearing was simply overlooked.
+
+**Recommendation:** Clear `_tagsByName`, `_lastPublishedByRef`, and `_lastWrittenByRef` in `ShutdownAsync` (or at the top of `InitializeAsync`) so a Reinitialize starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
+
+**Resolution:** Resolved 2026-05-22 — added `_tagsByName.Clear()`, `_lastPublishedByRef.Clear()`, and `_lastWrittenByRef.Clear()` to `ShutdownAsync` (via the new shared `TeardownAsync` helper) so a `ReinitializeAsync` cycle always starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.
+
+### Driver.Modbus-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
+| Status | Open |
+
+**Description:** `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic on .NET so a torn read cannot occur, but there is no happens-before ordering: a stale `DriverHealth` can be observed on another core, and concurrent writers race so "last write wins" is non-deterministic (a `Degraded` write from a failed read can clobber a just-published `Healthy`, or vice versa).
+
+**Recommendation:** Mark `_health` `volatile`, or assign via `Volatile.Write` and read with `Volatile.Read`, to give `GetHealth()` a defined ordering guarantee.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `ModbusDriver.cs:1468-1473` |
+| Status | Resolved |
+
+**Description:** `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `ShutdownAsync` leaks the probe loop, the re-probe loop, and every active polled subscription background `Task`/`CancellationTokenSource`. The two `Task.Run` loops keep running against a disposed transport, throwing on every tick. `Dispose()` (sync) has the same gap and additionally blocks on the async path via `GetAwaiter().GetResult()`.
+
+**Recommendation:** Make `DisposeAsync` perform the same teardown as `ShutdownAsync` (cancel both CTSs, dispose them, dispose `_poll`) before disposing `_transport`. Have `ShutdownAsync` and `DisposeAsync` share a private `TeardownAsync` helper.
+
+**Resolution:** Resolved 2026-05-22 — refactored teardown into a shared `TeardownAsync` helper; `DisposeAsync` now delegates to it, cancelling both CTS objects, disposing `_poll`, and disposing `_transport` — matching `ShutdownAsync` and eliminating the probe/re-probe/poll-engine leak on `await using` callers.
+
+### Driver.Modbus-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `ModbusDriver.cs:777-798,323-330` |
+| Status | Resolved |
+
+**Description:** `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-bit — it does not guarantee a non-exception response PDU is long enough to hold function-code + byte-count + the claimed data. A device (or buggy server) returning a 1-byte PDU, or a byte-count larger than the actual payload, produces an `IndexOutOfRangeException`/`ArgumentException` rather than a clean comms error. `DecodeBitArray` similarly indexes `bitmap[0]` (`ModbusDriver.cs:325`) without checking the bitmap is non-empty. In `ReadAsync` these are caught by the catch-all and mapped to `BadCommunicationError`, so impact is limited; in `ReadCoalescedAsync` the exception is opaque to the narrower catch arms.
+
+**Recommendation:** In `ReadRegisterBlockAsync`/`ReadBitBlockAsync`, validate `resp.Length >= 2` and `resp.Length >= 2 + resp[1]` before slicing, throwing a descriptive `InvalidDataException`. Validate the decoded byte/bit count matches the request quantity.
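+
+A minimal sketch of the FC03/FC04 checks, assuming `resp` is the response PDU (function code, byte count, data) as handed back by the transport. `DecodeRegisterResponse` is a hypothetical name. The bit-block reader would apply the same first two checks and compare the byte count against `(quantity + 7) / 8` instead.
+
+```csharp
+using System;
+using System.IO;
+
+// Hypothetical decoder for an FC03/FC04 response PDU: [function code][byte count][data...].
+static ushort[] DecodeRegisterResponse(byte[] resp, int quantity)
+{
+    if (resp.Length < 2)
+        throw new InvalidDataException($"Response PDU has {resp.Length} byte(s); at least 2 are required.");
+    int byteCount = resp[1];
+    if (resp.Length < 2 + byteCount)
+        throw new InvalidDataException($"Byte count {byteCount} exceeds the {resp.Length - 2}-byte payload.");
+    if (byteCount != quantity * 2)
+        throw new InvalidDataException($"Byte count {byteCount} does not match {quantity} requested register(s).");
+
+    var registers = new ushort[quantity];
+    for (var i = 0; i < quantity; i++)
+        registers[i] = (ushort)((resp[2 + 2 * i] << 8) | resp[3 + 2 * i]); // big-endian per register
+    return registers;
+}
+
+var good = new byte[] { 0x03, 0x04, 0x00, 0x2A, 0x01, 0x00 };
+Console.WriteLine(string.Join(",", DecodeRegisterResponse(good, 2))); // 42,256
+```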
+
+**Resolution:** Resolved 2026-05-22 — added `resp.Length >= 2`, `resp.Length >= 2 + resp[1]`, and byte-count-vs-quantity checks in both `ReadRegisterBlockAsync` and `ReadBitBlockAsync`, throwing `InvalidDataException` with precise diagnostics; added an empty-bitmap guard in `DecodeBitArray`.
+
+### Driver.Modbus-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `ModbusDriver.cs:514-524,532-550` |
+| Status | Resolved |
+
+**Description:** `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issuing reads against the captured, disposed transport. `ReprobeLoopAsync` only catches `OperationCanceledException when (ct.IsCancellationRequested)` — an `ObjectDisposedException` from the disposed transport escapes `RunReprobeOnceForTestAsync` and faults the fire-and-forget background `Task`, silently killing the re-probe loop with the wrong failure mode.
+
+**Recommendation:** Re-check `_transport`/cancellation inside the per-candidate loop, or broaden the `ReprobeLoopAsync` catch to also swallow `ObjectDisposedException` when `ct.IsCancellationRequested`.
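+
+A minimal sketch of the broadened filter. It assumes `probeOnce` stands in for `RunReprobeOnceForTestAsync`; the loop shape and the 10 ms delay are hypothetical. Both filters require `ct.IsCancellationRequested`, so an `ObjectDisposedException` outside shutdown still surfaces as a real fault.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical loop shape; `probeOnce` stands in for RunReprobeOnceForTestAsync.
+static async Task<string> ReprobeLoopAsync(Func<CancellationToken, Task> probeOnce, CancellationToken ct)
+{
+    try
+    {
+        while (!ct.IsCancellationRequested)
+        {
+            await probeOnce(ct);
+            await Task.Delay(TimeSpan.FromMilliseconds(10), ct);
+        }
+        return "stopped";
+    }
+    catch (OperationCanceledException) when (ct.IsCancellationRequested)
+    {
+        return "cancelled";
+    }
+    catch (ObjectDisposedException) when (ct.IsCancellationRequested)
+    {
+        // Shutdown disposed the transport mid-pass: exit instead of faulting.
+        return "disposed during shutdown";
+    }
+}
+
+using var cts = new CancellationTokenSource();
+var result = await ReprobeLoopAsync(_ =>
+{
+    cts.Cancel();                                   // shutdown starts...
+    throw new ObjectDisposedException("transport"); // ...and disposes the transport
+}, cts.Token);
+Console.WriteLine(result); // disposed during shutdown
+```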
+
+**Resolution:** Resolved 2026-05-22 — broadened `ReprobeLoopAsync` to catch `ObjectDisposedException when (ct.IsCancellationRequested)` and return cleanly, so a transport disposal race during shutdown exits the background task rather than faulting it.
+
+### Driver.Modbus-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
+| Status | Open |
+
+**Description:** Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is declared `Int32`, misrepresenting the OPC UA variable's `DataType` even though `DecodeRegister` produces a correct `long`/`ulong` value — clients see a type/value mismatch. (2) `DisableFC23` is documented and bound from JSON but is a confirmed no-op ("The driver does not currently emit FC23"). Both are acknowledged-but-unfinished items worth tracking.
+
+**Recommendation:** Track the PR 25 `DriverDataType.Int64` follow-up; until then document the Int32 surfacing limitation in `docs/v2/modbus-addressing.md` so operators configuring `I_64`/`UI_64` tags understand the node type. Mark `DisableFC23` clearly as reserved/unimplemented or gate it once FC23 ships.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-008
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `ModbusDriver.cs:411-417,700-703,737-744` |
+| Status | Open |
+
+**Description:** Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `ReprobeLoopAsync`), so the parenthetical is wrong. (2) The comment at `ModbusDriver.cs:700-703` ("On block-level failure mark every member Bad — caller's per-tag fallback won't re-try since handled-set already includes them; auto-split-on-failure is a follow-up") contradicts the actual `catch (ModbusException)` arm below it, which deliberately does not add members to `handled` and does defer to per-tag fallback (and auto-split has shipped via bisection). The empty `foreach (var (idx, _) in block.Members) { }` loop at `ModbusDriver.cs:737-744`, with only a comment body, is dead code from that superseded design.
+
+**Recommendation:** Update the two comments to match the shipped #148/#150/#151 behaviour and delete the empty `foreach` loop in the `catch (ModbusException)` arm.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` |
+| Status | Open |
+
+**Description:** Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an FC03/FC04 with quantity 0 — a spec-illegal request the PLC rejects with exception 03. The factory does not reject `StringLength = 0` for String tags. (2) `EnableKeepAlive` casts `opts.Time.TotalSeconds`/`opts.Interval.TotalSeconds` to `int`; a sub-second configured `TimeSpan` (e.g. 500 ms) truncates to 0, which most OSes reject or interpret as "use default", silently defeating the configured keep-alive timing.
+
+**Recommendation:** Validate `StringLength >= 1` for `String` tags in `ModbusDriverFactoryExtensions.BuildTag`. For keep-alive, round up to a minimum of 1 second or validate the configured `TimeSpan` is a whole number of seconds.
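+
+A minimal sketch of both guards, using hypothetical helper names; in the driver they would sit in `ModbusDriverFactoryExtensions.BuildTag` and `EnableKeepAlive` respectively. For keep-alive it takes the round-up option from the recommendation rather than rejecting fractional values.
+
+```csharp
+using System;
+
+// Hypothetical guard for String tags: two characters per 16-bit register.
+static int StringRegisterCount(int stringLength)
+{
+    if (stringLength < 1)
+        throw new ArgumentOutOfRangeException(nameof(stringLength), stringLength,
+            "String tags need StringLength >= 1; 0 would produce an FC03/FC04 with quantity 0.");
+    return (stringLength + 1) / 2;
+}
+
+// Hypothetical keep-alive conversion: round up to whole seconds, never below 1,
+// instead of truncating 500 ms to 0.
+static int KeepAliveSeconds(TimeSpan value) => Math.Max(1, (int)Math.Ceiling(value.TotalSeconds));
+
+Console.WriteLine(StringRegisterCount(5));                           // 3
+Console.WriteLine(KeepAliveSeconds(TimeSpan.FromMilliseconds(500))); // 1
+Console.WriteLine(KeepAliveSeconds(TimeSpan.FromSeconds(30)));       // 30
+```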
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` |
+| Status | Open |
+
+**Description:** When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a divergent value. If a driver instance has `WriteOnChangeOnly = true` but a tag is never subscribed/read (write-only setpoint), a value the operator believes was re-asserted is silently suppressed forever after the first write — no time- or count-based expiry exists. The option XML doc describes the read-invalidation path but does not warn about write-only tags.
+
+**Recommendation:** Document the write-only-tag caveat on the `WriteOnChangeOnly` option, or add an optional TTL to the suppression cache so a periodic re-write still reaches the PLC.
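+
+A minimal sketch of the TTL option. It assumes a 60-second TTL and passes the clock in explicitly so the behaviour is visible without sleeping. `IsRedundantWrite` here is a hypothetical single-threaded stand-in; the real cache would stay under `_lastWrittenLock`.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical single-threaded stand-in for the write-suppression cache; the
+// real one is guarded by _lastWrittenLock. The clock is passed in explicitly.
+var ttl = TimeSpan.FromSeconds(60);
+var lastWritten = new Dictionary<string, (object Value, DateTime At)>();
+
+bool IsRedundantWrite(string fullRef, object value, DateTime now)
+{
+    if (lastWritten.TryGetValue(fullRef, out var entry)
+        && object.Equals(entry.Value, value)
+        && now - entry.At < ttl)
+        return true;                     // same value, written within the TTL: suppress
+    lastWritten[fullRef] = (value, now); // new baseline: value changed or TTL expired
+    return false;
+}
+
+var t0 = DateTime.UnixEpoch;
+Console.WriteLine(IsRedundantWrite("Setpoint", 42, t0));                // False: first write
+Console.WriteLine(IsRedundantWrite("Setpoint", 42, t0.AddSeconds(10))); // True: suppressed
+Console.WriteLine(IsRedundantWrite("Setpoint", 42, t0.AddSeconds(75))); // False: TTL expired, re-sent
+```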
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `ModbusDriver.cs:23-43,89-97,408-432` |
+| Status | Open |
+
+**Description:** Field and nested-type declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastWrittenByRef` are declared after the constructor; `ProhibitionState`, `_autoProhibited`, and `_reprobeCts` are declared mid-file between `DecodeRegisterArray` and `RangeIsAutoProhibited`. There are also two near-identical `<summary>` blocks stacked back-to-back at `ModbusDriver.cs:411-423`. This hurts readability of a 1400-line file and makes the field inventory hard to audit (relevant to the thread-safety findings above).
+
+**Recommendation:** Group all instance fields at the top of the class, move nested types together, and remove the orphaned first `<summary>` at lines 411-417 that no longer precedes a member.
+
+**Resolution:** _(open)_
+
+### Driver.Modbus-012
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` |
+| Status | Open |
+
+**Description:** The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publishing, so the `_lastPublishedByRef` race (Driver.Modbus-001) is uncaught; (2) no test covers `ReinitializeAsync` state hygiene for stale `_tagsByName`/caches (Driver.Modbus-002); (3) no test feeds a malformed/short response PDU through `ReadRegisterBlockAsync`/`DecodeBitArray` to confirm a clean `BadCommunicationError` rather than an index-range crash (Driver.Modbus-005); (4) no test asserts `DisposeAsync` (vs `ShutdownAsync`) tears down the probe/re-probe loops and `_poll` (Driver.Modbus-004).
+
+**Recommendation:** Add unit tests for concurrent deadband publishing across two subscriptions, `ReinitializeAsync` state hygiene, malformed-response handling in the register/bit block readers, and `DisposeAsync` loop teardown.
+
+**Resolution:** _(open)_
@@ -0,0 +1,252 @@
+# Code Review — Driver.OpcUaClient
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 2 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.OpcUaClient-001, -002, -003, -010, -011 |
+| 2 | OtOpcUa conventions | Driver.OpcUaClient-004 |
+| 3 | Concurrency & thread safety | Driver.OpcUaClient-005, -006, -007 |
+| 4 | Error handling & resilience | Driver.OpcUaClient-002, -008, -009 |
+| 5 | Security | Driver.OpcUaClient-012 |
+| 6 | Performance & resource management | Driver.OpcUaClient-013, -014 |
+| 7 | Design-document adherence | Driver.OpcUaClient-004, -013, -015 |
+| 8 | Code organization & conventions | No issues found |
+| 9 | Testing coverage | Driver.OpcUaClient-015 |
+| 10 | Documentation & comments | Driver.OpcUaClient-011 |
+
+## Findings
+
+### Driver.OpcUaClient-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `OpcUaClientDriver.cs:444`, `:466`, `:517`, `:540`, `:599`, `:610` |
+| Status | Resolved |
+
+**Description:** `ReadAsync`, `WriteAsync`, and `DiscoverAsync` capture the session into a local variable via `RequireSession()` before acquiring `_gate`, then perform the wire call on that captured reference inside the gate. The reconnect path (`OnReconnectComplete`, line 1330) swaps `Session` to a brand-new `ISession`. A read that captured the pre-reconnect session at line 444, then blocked on `_gate.WaitAsync` while a reconnect completed, issues `ReadAsync` against a stale/closed session. The catch block then fans out `BadCommunicationError` for the whole batch even though the driver is healthy on the new session, and the operation is silently lost. The gate does not protect against the session being swapped underneath a waiter.
+
+**Recommendation:** Re-read `Session` inside the `_gate` critical section (after WaitAsync returns), or route the session swap itself through `_gate` so a swap cannot interleave with a gated operation.
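+
+A minimal sketch of reading the session only after the gate is held. It assumes a `string` stands in for `ISession` and a `SemaphoreSlim(1, 1)` for `_gate`; `ReadAsync` and `ReconnectAsync` here are hypothetical stand-ins. `ReconnectAsync` shows the alternative of routing the swap itself through the gate.
+
+```csharp
+#nullable enable
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical stand-ins: `gate` plays _gate and `session` plays the Session
+// property (a string instead of ISession).
+var gate = new SemaphoreSlim(1, 1);
+string? session = "session-A";
+
+async Task<string> ReadAsync(CancellationToken ct)
+{
+    await gate.WaitAsync(ct);
+    try
+    {
+        // Read the session only once the gate is held, so a swap that
+        // happened while this call waited is the one used.
+        var current = session ?? throw new InvalidOperationException("No session.");
+        return $"read via {current}";
+    }
+    finally
+    {
+        gate.Release();
+    }
+}
+
+// Alternative: route the swap itself through the gate.
+async Task ReconnectAsync(string next)
+{
+    await gate.WaitAsync();
+    try { session = next; }
+    finally { gate.Release(); }
+}
+
+await gate.WaitAsync();                          // another operation holds the gate
+var pending = ReadAsync(CancellationToken.None); // this read queues behind it
+session = "session-B";                           // reconnect swaps Session meanwhile
+gate.Release();
+Console.WriteLine(await pending);                // read via session-B
+
+await ReconnectAsync("session-C");
+Console.WriteLine(await ReadAsync(CancellationToken.None)); // read via session-C
+```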
+
+**Resolution:** Resolved 2026-05-22 — ReadAsync/WriteAsync/DiscoverAsync now re-read `Session` (and parse NodeIds) inside the `_gate` critical section after `WaitAsync` returns; a session swapped in by a concurrent reconnect is the one used for the wire call.
+
+### Driver.OpcUaClient-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Error handling & resilience |
+| Location | `OpcUaClientDriver.cs:1330-1359` |
+| Status | Resolved |
+
+**Description:** OnReconnectComplete handles only the success case. When SessionReconnectHandler gives up (its retry loop exhausts the 2-minute maxReconnectPeriod), it invokes the callback with `handler.Session == null`. The code sets `Session = null`, disposes the handler, and sets `_reconnectHandler = null`, but leaves `_health` at whatever it was (typically Degraded) and `_hostState` at Stopped. There is no further reconnect attempt (the handler is gone, and OnKeepAlive only fires on a live session which no longer exists), and DriverState is never set to Faulted. The driver is permanently wedged: no session, no reconnect loop, no Faulted signal for the Core, and ReinitializeAsync is never triggered. This is the single largest gateway resilience gap.
+
+**Recommendation:** In OnReconnectComplete, when newSession is null, set `_health` to a Faulted DriverHealth with an explanatory message so the Core can fan out Bad quality and offer an operator reinitialize. Consider re-arming a fresh reconnect attempt rather than giving up entirely for an always-on gateway.
+
+**Resolution:** Resolved 2026-05-22 — OnReconnectComplete's give-up branch now transitions HostState to Faulted, sets a Faulted DriverHealth with an explanatory message, and re-arms a fresh SessionReconnectHandler (`TryRearmReconnect`) against the last-known session so an always-on gateway self-heals while the Core can still offer an operator reinitialize.
+
+### Driver.OpcUaClient-003
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `OpcUaClientDriver.cs:644-711` |
+| Status | Resolved |
+
+**Description:** BrowseRecursiveAsync calls session.BrowseAsync with `requestedMaxReferencesPerNode: 0` but never follows browse continuation points. OPC UA servers enforce a server-side max-references-per-node limit; when a node has more children than the server returns in one response, BrowseResult.ContinuationPoint is non-empty and the caller must issue BrowseNext to retrieve the remainder. This driver discards the continuation point, so any folder on the remote server with a large child set is silently truncated: discovered tags go missing from the local address space with no error. For the tens-of-thousands-of-nodes scenario the options doc targets (MaxDiscoveredNodes = 10000), this is a realistic and silent data-completeness bug.
+
+**Recommendation:** After processing resp.Results[0].References, check resp.Results[0].ContinuationPoint; while non-empty, call session.BrowseNextAsync and append the additional references before recursing/registering.
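+
+The paging loop could look like the following sketch. `IPagedBrowser` is a hypothetical abstraction over the Browse/BrowseNext services (the real `ISession` signatures differ), and `FakePagedBrowser` exists only so the sketch runs on its own:
+
+```csharp
+using System.Collections.Generic;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical abstraction over Browse/BrowseNext; not the SDK signatures.
+public interface IPagedBrowser
+{
+    Task<(IReadOnlyList<string> References, byte[]? ContinuationPoint)> BrowseAsync(
+        string nodeId, CancellationToken ct);
+
+    Task<(IReadOnlyList<string> References, byte[]? ContinuationPoint)> BrowseNextAsync(
+        byte[] continuationPoint, CancellationToken ct);
+}
+
+public static class BrowsePaging
+{
+    // Collects every reference of a node, following continuation points until
+    // the server reports that none remain.
+    public static async Task<List<string>> BrowseAllAsync(
+        IPagedBrowser browser, string nodeId, CancellationToken ct = default)
+    {
+        var all = new List<string>();
+        var (refs, cp) = await browser.BrowseAsync(nodeId, ct).ConfigureAwait(false);
+        all.AddRange(refs);
+
+        while (cp is { Length: > 0 })
+        {
+            (refs, cp) = await browser.BrowseNextAsync(cp, ct).ConfigureAwait(false);
+            all.AddRange(refs);
+        }
+        return all;
+    }
+}
+
+// In-memory stand-in serving fixed pages, for illustration only.
+public sealed class FakePagedBrowser : IPagedBrowser
+{
+    private readonly IReadOnlyList<string[]> _pages;
+    public FakePagedBrowser(params string[][] pages) => _pages = pages;
+
+    public Task<(IReadOnlyList<string> References, byte[]? ContinuationPoint)> BrowseAsync(
+        string nodeId, CancellationToken ct) => Task.FromResult(Page(0));
+
+    public Task<(IReadOnlyList<string> References, byte[]? ContinuationPoint)> BrowseNextAsync(
+        byte[] continuationPoint, CancellationToken ct) => Task.FromResult(Page(continuationPoint[0]));
+
+    // The fake continuation point simply encodes the index of the next page.
+    private (IReadOnlyList<string>, byte[]?) Page(int index) =>
+        (_pages[index], index + 1 < _pages.Count ? new[] { (byte)(index + 1) } : null);
+}
+```
+
+A production loop should also release any continuation point it abandons (for example on cancellation); the sketch omits that.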
+
+**Resolution:** Resolved 2026-05-22 — BrowseRecursiveAsync now loops on the BrowseResult.ContinuationPoint, calling `session.BrowseNextAsync` and appending each page of references until the continuation point is empty, so large remote folders are no longer silently truncated.
+
+### Driver.OpcUaClient-004
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Design-document adherence |
+| Location | `OpcUaClientDriver.cs:596-632`, `:789`, `OpcUaClientDriverOptions.cs` |
+| Status | Resolved |
+
+**Description:** docs/v2/driver-specs.md section 8 mandates two features that are absent. (1) Namespace remapping: the spec requires building a bidirectional namespace map at connect time from session.NamespaceUris. The driver instead stores the raw upstream NodeId string (pv.NodeId.ToString()) as DriverAttributeInfo.FullName and re-parses it verbatim for reads/writes. The namespace index embedded in `ns=N;...` is server-session-relative; if the upstream server reorders its namespace table across a restart (permitted by the spec), every stored ns=N reference points at the wrong namespace and reads/writes silently address wrong nodes. (2) TargetNamespaceKind enforcement: section 8 requires the driver to enforce Equipment-vs-SystemPlatform choice at startup and fail draft validation on misconfiguration; OpcUaClientDriverOptions has no such knob.
+
+**Recommendation:** Build a namespace-URI map from session.NamespaceUris at connect time and store NodeIds in a server-stable form (namespace URI plus identifier) rather than session-relative ns=N. Add the TargetNamespaceKind option and the startup validation section 8 describes, or document explicitly why the design deviates.
+
+**Resolution:** Resolved 2026-05-22 — new `NamespaceMap` (built from session.NamespaceUris at connect and rebuilt on reconnect) persists discovered NodeIds in the server-stable `nsu=<uri>;…` form; reads/writes re-resolve that form against the current session so a remote namespace-table reorder no longer misaddresses nodes. Added the `TargetNamespaceKind` option + `UnsMappingTable` and `ValidateNamespaceKind`, which fails draft validation for an Equipment instance lacking a UNS mapping or a SystemPlatform instance carrying one.
+
+### Driver.OpcUaClient-005
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientDriver.cs:1297-1319` |
+| Status | Resolved |
+
+**Description:** OnKeepAlive reads and writes `_reconnectHandler` without any lock: `if (_reconnectHandler is not null) return;` followed by `_reconnectHandler = new SessionReconnectHandler(...)`. Keep-alive callbacks are raised from the SDK keep-alive timer thread; on a bad keep-alive the SDK can fire the handler repeatedly while the channel stays down. Two callbacks racing through the check-then-set both observe null, both construct a SessionReconnectHandler, both call BeginReconnect, and the second assignment overwrites the first handler, leaking the first handler (its retry loop keeps running, unreferenced and never disposed) and creating two competing reconnect loops. ShutdownAsync then only cancels/disposes the one that won the assignment race.
+
+**Recommendation:** Guard the `_reconnectHandler` check-and-set with `_probeLock` (already held for `_hostState`), or use Interlocked.CompareExchange to ensure exactly one handler is constructed per drop.
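+
+A sketch of the lock-guarded check-and-set, assuming `_probeLock` is a plain `object` monitor. `ReconnectLoop` and `KeepAliveGuard` are hypothetical stand-ins for `SessionReconnectHandler` and the driver, and `HandlersCreated` exists only to make the one-handler-per-drop guarantee observable:
+
+```csharp
+using System.Threading;
+
+// Hypothetical stand-in for SessionReconnectHandler.
+public sealed class ReconnectLoop { }
+
+public sealed class KeepAliveGuard
+{
+    private readonly object _probeLock = new();
+    private ReconnectLoop? _reconnectHandler;
+    private int _handlersCreated;
+
+    public int HandlersCreated => Volatile.Read(ref _handlersCreated);
+
+    // Raised from the keep-alive timer thread, possibly several times per drop.
+    public void OnBadKeepAlive()
+    {
+        lock (_probeLock)
+        {
+            // The null check and the assignment form one atomic step, so racing
+            // callbacks cannot both observe null and both build a handler.
+            if (_reconnectHandler is not null) return;
+            _reconnectHandler = new ReconnectLoop();
+            _handlersCreated++;
+        }
+        // Starting the reconnect (BeginReconnect in the SDK) is omitted here.
+    }
+
+    // Mirrors OnReconnectComplete / ShutdownAsync clearing the slot under the same lock.
+    public void Clear()
+    {
+        lock (_probeLock) _reconnectHandler = null;
+    }
+}
+```
+
+`Interlocked.CompareExchange` is the lock-free alternative, but every losing caller has already constructed a candidate handler that it must then dispose, so the lock is simpler when the handler is expensive to build.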
+
+**Resolution:** Resolved 2026-05-22 — the `_reconnectHandler` check-and-set in OnKeepAlive (and the take-and-clear in ShutdownAsync, plus the dispose/re-arm in OnReconnectComplete/TryRearmReconnect) now run inside the `_probeLock` critical section, so exactly one SessionReconnectHandler is constructed per drop and a racing keep-alive callback cannot leak a handler.
+
+### Driver.OpcUaClient-006
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientDriver.cs:1330-1359` |
+| Status | Resolved |
+
+**Description:** OnReconnectComplete mutates `Session` (line 1347) directly from the reconnect-handler callback thread with no synchronization against ReadAsync/WriteAsync/DiscoverAsync/ShutdownAsync. Session is a plain auto-property with no memory barrier; a concurrent reader on another thread may observe a stale reference. ShutdownAsync (line 425) can also run concurrently with OnReconnectComplete: ShutdownAsync disposes the session and sets Session = null while OnReconnectComplete sets Session = newSession, and the interleaving is unspecified, potentially leaving a live session leaked after shutdown.
+
+**Recommendation:** Route all Session mutations through a single lock (or the `_gate`). Make ShutdownAsync cancel the reconnect handler and wait for any in-flight OnReconnectComplete to settle before disposing the session.
+
+**Resolution:** Resolved 2026-05-22 — All Session mutations (assignment to newSession in OnReconnectComplete, and assignment to null in ShutdownAsync) now run inside the `_probeLock` critical section, preventing races between the reconnect callback thread, ShutdownAsync, and keep-alive callbacks. KeepAlive handler detach/attach is also done under `_probeLock` so a keep-alive cannot fire against the old session after the swap.
+
+### Driver.OpcUaClient-007
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` |
+| Status | Resolved |
+
+**Description:** Two disposal races. (1) Dispose() does `DisposeAsync().AsTask().GetAwaiter().GetResult()`, synchronous blocking on async work. The Galaxy stability review (driver-stability.md, the 2026-04-13 findings) explicitly calls out sync-over-async on the OPC UA stack thread as a closed bug class; if Dispose() runs on the OPC UA stack thread or any thread the SDK continuations need, this deadlocks. (2) DisposeAsync disposes `_gate` (line 1382) after ShutdownAsync returns, but ShutdownAsync does not drain in-flight ReadAsync/WriteAsync operations holding `_gate`. An in-flight read that calls `_gate.Release()` (line 508) after `_gate.Dispose()` throws ObjectDisposedException on a background thread.
+
+**Recommendation:** Provide an async disposal path callers prefer; if a sync Dispose() is unavoidable keep it free of .GetResult() on SDK-thread-affine work. Before disposing `_gate`, acquire it once so all in-flight gated operations have completed, or guard releases against disposal.
+
+**Resolution:** Resolved 2026-05-22 — `Dispose()` no longer calls `.GetAwaiter().GetResult()` on async work; it performs a purely-synchronous teardown (cancel reconnect handler, detach keep-alive, null Session under `_probeLock`). Both `Dispose()` and `DisposeAsync()` now acquire `_gate` once before disposing it, ensuring any in-flight gated operation has released before the gate is torn down.
+
+### Driver.OpcUaClient-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `OpcUaClientDriver.cs:1092-1099` |
+| Status | Resolved |
+
+**Description:** AcknowledgeAsync issues the batched CallAsync and then catches all exceptions with a best-effort empty catch; it also never inspects the per-call results in the success path (`_ = await session.CallAsync(...)`). An alarm acknowledgment the upstream server rejects (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied) is reported as success to the caller. IAlarmSource.AcknowledgeAsync has no per-item result, so the only way a failure could surface is via an exception, and the catch suppresses even that. Operators acking a critical alarm get no signal that the ack did not take.
+
+**Recommendation:** Inspect CallMethodResult.StatusCode for each result and log Bad codes; rethrow (or surface via driver health) genuine transport failures rather than swallowing them. Consider extending the contract so per-ack failures propagate.
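+
+Per-result inspection could be sketched as below. The helper works on raw 32-bit OPC UA status codes (for example `0x801F0000`, BadUserAccessDenied) rather than the SDK `StatusCode` type. `AckResultCheck` is hypothetical, and a real driver would log through `ILogger` instead of returning strings:
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+public static class AckResultCheck
+{
+    // OPC UA encodes severity in the top two bits; 10 (and the reserved 11) means Bad.
+    public static bool IsBad(uint statusCode) => (statusCode & 0x8000_0000u) != 0;
+
+    // Returns one warning line per rejected acknowledgment.
+    public static List<string> DescribeFailures(
+        IReadOnlyList<string> conditionIds, IReadOnlyList<uint> statusCodes)
+    {
+        if (conditionIds.Count != statusCodes.Count)
+            throw new ArgumentException("Expected one result per acknowledged condition.");
+
+        var warnings = new List<string>();
+        for (var i = 0; i < statusCodes.Count; i++)
+        {
+            if (IsBad(statusCodes[i]))
+                warnings.Add($"Ack of {conditionIds[i]} rejected: 0x{statusCodes[i]:X8}");
+        }
+        return warnings;
+    }
+}
+```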
+
+**Resolution:** Resolved 2026-05-22 — `AcknowledgeAsync` now inspects each `CallMethodResult.StatusCode` in the success path and logs a Warning for any Bad code (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied, etc.). `OperationCanceledException` (transport timeout) is now re-thrown instead of swallowed; other transport exceptions are also logged with the driver instance ID. The fix requires an `ILogger<OpcUaClientDriver>`, injected via a new optional constructor parameter.
+
+### Driver.OpcUaClient-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `OpcUaClientDriver.cs:560-564` |
+| Status | Resolved |
+
+**Description:** WriteAsync's catch block fans out BadCommunicationError across the whole batch on any exception. Writes are non-idempotent by default (IWritable remarks, decision #44/#45): a timeout exception may fire after the upstream server has already applied the write. Reporting BadCommunicationError (a code that reads as "definitely did not happen") for a write that may have succeeded is misleading: a downstream OPC UA client may conclude it is safe to re-issue the write and double-apply it. The read path has the same fan-out, but reads are idempotent so it is benign there; for writes the ambiguity matters.
+
+**Recommendation:** Map write timeouts/cancellations to BadTimeout (which downstream correctly treats as "outcome unknown, do not blindly retry") rather than BadCommunicationError, and only use BadCommunicationError for failures that provably occurred before the request reached the wire.
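+
+The exception-to-status mapping reduces to a small switch. This sketch uses the raw code values (`0x800A0000` BadTimeout, `0x80050000` BadCommunicationError). `WriteFaultMapping` is hypothetical, and treating `TimeoutException` like a cancellation is an assumption beyond what the recommendation states:
+
+```csharp
+using System;
+
+public static class WriteFaultMapping
+{
+    public const uint BadTimeout = 0x800A_0000;
+    public const uint BadCommunicationError = 0x8005_0000;
+
+    // A cancellation or timeout may fire after the server applied the write, so it
+    // maps to BadTimeout ("outcome unknown"); other failures keep the
+    // communication-error code.
+    public static uint MapWriteException(Exception ex) => ex switch
+    {
+        OperationCanceledException => BadTimeout, // also covers TaskCanceledException
+        TimeoutException => BadTimeout,           // assumption, see lead-in
+        _ => BadCommunicationError,
+    };
+}
+```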
+
+**Resolution:** Resolved 2026-05-22 — `WriteAsync`'s inner catch block now handles `OperationCanceledException` (timeout/cancellation) separately, mapping it to `BadTimeout` (0x800A0000), while all other exceptions map to `BadCommunicationError`. The session-null pre-wire exit still correctly uses `BadCommunicationError`.
+
+### Driver.OpcUaClient-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `OpcUaClientDriver.cs:823-824` |
+| Status | Resolved |
+
+**Description:** MapUpstreamDataType maps DataTypeIds.Byte (the OPC UA unsigned 8-bit type) to DriverDataType.Int16. Byte belongs in the unsigned family: UInt16 is the smallest unsigned driver type available, just as Int16 is the right home for SByte in the signed family. Mapping an unsigned 0-255 type onto signed Int16 misrepresents the type metadata downstream: clients see a signed type for an unsigned source, and any range/validation logic keyed off the driver data type is wrong.
+
+**Recommendation:** Map DataTypeIds.Byte to DriverDataType.UInt16 (or add a Byte/UInt8 driver type if the enum supports finer granularity), keeping SByte and Int16 on the signed Int16 mapping.
+
+**Resolution:** Resolved 2026-05-22 — `MapUpstreamDataType` now maps `DataTypeIds.Byte` → `DriverDataType.UInt16` (unsigned family) while `DataTypeIds.SByte` remains on `DriverDataType.Int16` (signed family). Test `MapUpstreamDataType_Byte_maps_to_UInt16_unsigned_family` asserts the fix and `MapUpstreamDataType_maps_Byte_to_UInt16_not_Int16` guards the regression.
+
+### Driver.OpcUaClient-011
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `OpcUaClientDriver.cs:783-784` |
+| Status | Open |
+
+**Description:** The comment on the isArray computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDimensions (not specifically one-dimensional). The code `valueRank >= 0` treats -2 (Any) and -3 (ScalarOrOneDimension) as scalar, which is a defensible default, but the comment misdescribes the constants and would mislead a maintainer.
+
+**Recommendation:** Correct the comment to the actual ValueRank constants (-3 ScalarOrOneDimension, -2 Any, -1 Scalar, 0 OneOrMoreDimensions, 1 OneDimension, >1 multi-dim) and state the deliberate choice that anything >= 0 is treated as an array.
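+
+A corrected comment and predicate might read as follows. The `ValueRanks` class is hypothetical; the constant values are the ones listed in the recommendation:
+
+```csharp
+public static class ValueRanks
+{
+    // OPC UA ValueRank constants.
+    public const int ScalarOrOneDimension = -3;
+    public const int Any = -2;
+    public const int Scalar = -1;
+    public const int OneOrMoreDimensions = 0;
+    public const int OneDimension = 1;
+    // Values greater than 1 give the exact number of dimensions.
+
+    // Deliberate choice: only ranks that guarantee an array (>= 0) are exposed as
+    // arrays; Scalar, Any and ScalarOrOneDimension are all treated as scalar.
+    public static bool IsArray(int valueRank) => valueRank >= OneOrMoreDimensions;
+}
+```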
+
+**Resolution:** _(open)_
+
+### Driver.OpcUaClient-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `OpcUaClientDriver.cs:210-217` |
+| Status | Resolved |
+
+**Description:** When AutoAcceptCertificates is true the driver registers a CertificateValidation handler that accepts only StatusCodes.BadCertificateUntrusted. A self-signed or otherwise untrusted server certificate frequently fails validation with a different code first (BadCertificateChainIncomplete, BadCertificateTimeInvalid, BadCertificateHostNameInvalid), so auto-accept silently does not accept many real dev certificates and the connect fails confusingly. The handler is added to config.CertificateValidator but never removed; each driver instance leaks a delegate subscription on a validator that may be process-shared. The option doc says auto-accept is dev-only and must be false in production, but there is no runtime guard preventing AutoAcceptCertificates=true shipping to production and no log warning when it is enabled.
+
+**Recommendation:** When auto-accepting for dev, accept the full set of certificate-validation error codes (or use the SDK AutoAcceptUntrustedCertificates path consistently). Emit a prominent warning log every time AutoAcceptCertificates is enabled so a production misconfiguration is visible. Detach the handler on shutdown.
+
+**Resolution:** Resolved 2026-05-22 — The cert-validation handler now accepts ALL validation errors (not only BadCertificateUntrusted) when `AutoAcceptCertificates=true`, so real dev certs with chain/host/time errors work. A `LogWarning` is emitted at startup whenever the flag is set. The handler delegate + validator reference are stored in `_certValidationHandler`/`_certValidatorRef` and detached in both `ShutdownAsync` and `Dispose()`/`DisposeAsync()` to prevent the delegate leak.
+
+### Driver.OpcUaClient-013
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `OpcUaClientDriver.cs:436-437` |
+| Status | Resolved |
+
+**Description:** GetMemoryFootprint() is hard-coded to return 0 and FlushOptionalCachesAsync is a no-op Task.CompletedTask. docs/v2/driver-stability.md section "In-process only (Tier A/B)" makes per-instance allocation tracking a contract requirement, and driver-specs.md section 8 explicitly calls out browse-cache memory: BrowseStrategy=Full against a large remote server can cache tens of thousands of node descriptions and the per-instance budget should bound this. Returning 0 means the Core 30-second footprint poll can never detect this driver's browse-cache growth, and the cache-budget-breach to flush escalation path is dead code. A gateway pointed at a 10k-node server (the configured cap) silently evades the Tier-A memory-guard mechanism.
+
+**Recommendation:** Track an approximate footprint for the discovered-node set and any cached browse state, return it from GetMemoryFootprint(), and implement FlushOptionalCachesAsync to drop droppable cache. If the driver genuinely holds no significant cache, document why 0 is correct.
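+
+A sketch of a counter-based estimate in the spirit of this recommendation. `DiscoveryFootprint` is hypothetical, the 512-byte figure is the rough per-node estimate used in the resolution, and the counter uses `Interlocked` on a `long` because C# does not permit `volatile` on 64-bit fields:
+
+```csharp
+using System.Threading;
+using System.Threading.Tasks;
+
+public sealed class DiscoveryFootprint
+{
+    // Rough per-node cost (DriverAttributeInfo plus strings); an estimate, not a measurement.
+    public const long BytesPerNode = 512;
+
+    private long _discoveredNodeCount;
+
+    // Called at the end of each discovery pass.
+    public void RecordDiscoveryPass(int nodeCount) =>
+        Interlocked.Exchange(ref _discoveredNodeCount, nodeCount);
+
+    public long GetMemoryFootprint() =>
+        Interlocked.Read(ref _discoveredNodeCount) * BytesPerNode;
+
+    // Here only the counter is reset, standing in for dropping the cached node set.
+    public Task FlushOptionalCachesAsync(CancellationToken ct = default)
+    {
+        Interlocked.Exchange(ref _discoveredNodeCount, 0);
+        return Task.CompletedTask;
+    }
+}
+```
+
+At the configured 10,000-node cap this reports 10,000 × 512 = 5,120,000 bytes (about 4.9 MiB).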
+
+**Resolution:** Resolved 2026-05-22 — `DiscoverAsync` now updates a `_discoveredNodeCount` volatile counter after each pass. `GetMemoryFootprint()` returns `_discoveredNodeCount * 512` (conservative ~512 bytes per node for DriverAttributeInfo + strings). `FlushOptionalCachesAsync` resets `_discoveredNodeCount` to 0, signalling Core that re-discovery will rebuild cleanly. A 10k-node server now reports ~5 MB to the Core slope alarm rather than 0.
+
+### Driver.OpcUaClient-014
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `OpcUaClientDriver.cs:904`, `:1035` |
+| Status | Open |
+
+**Description:** `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When UnsubscribeAsync removes a subscription it calls Subscription.DeleteAsync but does not clear the MonitoredItem.Notification handlers; if the SDK retains the MonitoredItem/Subscription graph anywhere (the session keeps a reference until its own disposal, or during transfer-on-reconnect), the driver instance is kept alive by the closure longer than necessary.
+
+**Recommendation:** Detach the Notification handlers when deleting a subscription, or hold the handler delegate so it can be explicitly removed in UnsubscribeAsync/ShutdownAsync.
+
+**Resolution:** _(open)_
+
+### Driver.OpcUaClient-015
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` |
+| Status | Resolved |
+
+**Description:** Unit-test coverage is solid for the pure mappers (MapSeverity, MapUpstreamDataType, MapSecurityPolicy, MapAggregateToNodeId, BuildCertificateIdentity, ResolveEndpointCandidates) and for "throws before init" guards, but the highest-risk behaviours of a gateway driver have no test: the reconnect/session-swap path (OnKeepAlive to OnReconnectComplete, findings -001/-002/-005/-006), browse continuation-point handling (-003), the cascading-quality fan-out on a mid-batch transport failure, and namespace remapping (-004). The reconnect test file itself states wire-level disconnect-reconnect-resume coverage lands with the in-process fixture, i.e. the single largest gateway bug surface (per driver-specs.md section 8) is explicitly untested. The integration suite is Docker-fixture gated against opc-plc and is a smoke test only. The failed-reconnect-to-Faulted and concurrent-keep-alive races are pure-logic paths testable with a fake ISession.
+
+**Recommendation:** Add tests exercising the reconnect callbacks with a stub session (success and give-up cases), a browse test with a paged/continuation-point server stub, and a read-batch test asserting upstream Bad StatusCodes pass through verbatim while a transport throw fans out the local fault code.
+
+**Resolution:** Resolved 2026-05-22 — Added `OpcUaClientMediumFindingsRegressionTests.cs` covering: (1) BadTimeout vs BadCommunicationError status-code distinction for the write-timeout path (Driver.OpcUaClient-009); (2) Byte→UInt16 mapping regression (Driver.OpcUaClient-010); (3) AutoAcceptCertificates warning log assertion (Driver.OpcUaClient-012); (4) GetMemoryFootprint/FlushOptionalCachesAsync contract (Driver.OpcUaClient-013); (5) MapSeverity thresholds, pre-init health, Session null pre-init, GetHostStatuses contract. Wire-level reconnect callback tests remain fixture-gated pending the in-process OPC UA server fixture.
@@ -0,0 +1,209 @@
+# Code Review — Driver.S7.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 4 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.S7.Cli-001, Driver.S7.Cli-002 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | No issues found |
+| 4 | Error handling & resilience | Driver.S7.Cli-001, Driver.S7.Cli-003 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.S7.Cli-004 |
+| 7 | Design-document adherence | Driver.S7.Cli-002 |
+| 8 | Code organization & conventions | Driver.S7.Cli-005 |
+| 9 | Testing coverage | Driver.S7.Cli-006 |
+| 10 | Documentation & comments | Driver.S7.Cli-002, Driver.S7.Cli-007 |
+
+## Findings
+
+### Driver.S7.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` |
+| Status | Resolved |
+
+**Description:** `WriteCommand.ParseValue` parses numeric and `DateTime` values with the
+raw BCL parsers (`short.Parse`, `float.Parse`, `DateTime.Parse`, etc.). On malformed
+input these throw `FormatException` / `OverflowException`, which are *not*
+`CliFx.Exceptions.CommandException`. CliFx renders a `CommandException` as a clean
+one-line error with a non-zero exit code, but renders any other exception as a full
+.NET stack trace. The `ParseValue` bool path is handled correctly (it throws
+`CommandException` for unrecognised input), so the command is internally inconsistent:
+`write -t Bool -v maybe` gives a friendly message while `write -t Int16 -v xyz` dumps a
+stack trace. The module's own test `ParseValue_non_numeric_for_numeric_types_throws`
+asserts that the raw `FormatException` leaks, confirming the unintended behaviour shipped
+and is locked in by a test.
+
+**Recommendation:** Wrap the numeric / `DateTime` parses in a `try`/`catch` that
+re-throws `FormatException` and `OverflowException` as
+`CliFx.Exceptions.CommandException` with a message that names the `--type` and the
+offending value — matching the bool path. Update the test to expect `CommandException`.
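+
+The wrapping could be factored into one helper, sketched below. `ParseOrFail` and `ParseValueSketch` are hypothetical, only three `--type` values are shown, and the local `CommandException` class is a stand-in for `CliFx.Exceptions.CommandException` so the sketch compiles without CliFx:
+
+```csharp
+using System;
+using System.Globalization;
+
+// Stand-in for CliFx.Exceptions.CommandException, for illustration only.
+public sealed class CommandException : Exception
+{
+    public CommandException(string message, Exception? inner = null) : base(message, inner) { }
+}
+
+public static class ParseValueSketch
+{
+    // Runs a BCL parse and turns its format/overflow failures into the
+    // one-line error CliFx renders without a stack trace.
+    public static T ParseOrFail<T>(string raw, string type, Func<string, T> parse)
+    {
+        try
+        {
+            return parse(raw);
+        }
+        catch (Exception ex) when (ex is FormatException or OverflowException)
+        {
+            throw new CommandException($"Value '{raw}' is not a valid {type}.", ex);
+        }
+    }
+
+    public static object ParseValue(string type, string raw) => type switch
+    {
+        "Int16" => (object)ParseOrFail(raw, type, s => short.Parse(s, CultureInfo.InvariantCulture)),
+        "Byte" => (object)ParseOrFail(raw, type, s => byte.Parse(s, CultureInfo.InvariantCulture)),
+        "Float32" => (object)ParseOrFail(raw, type, s => float.Parse(s, CultureInfo.InvariantCulture)),
+        _ => throw new CommandException($"Unsupported --type '{type}'."),
+    };
+}
+```
+
+Each arm is cast to `object` deliberately: without the casts, the switch expression's natural type would be `float`, silently widening `Int16` and `Byte` results.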
+
+**Resolution:** Resolved 2026-05-22 — wrapped all numeric/DateTime BCL parses in `try/catch(FormatException)` and `try/catch(OverflowException)` that re-throw as `CommandException` with a message naming the `--type` and the offending value; updated `ParseValue_non_numeric_for_numeric_types_throws` to assert `CommandException`, and added an overflow-edge test.
+
+### Driver.S7.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` |
+| Status | Resolved |
+
+**Description:** The `--type` option help text on `read`, `write`, and `subscribe`
+advertises the full `S7DataType` set (including `Int64 / UInt64 / Float64 / String / DateTime`),
+and `docs/Driver.S7.Cli.md` shows a worked `read ... -t String --string-length 80`
+example plus a `--string-length` flag on `read`/`write`. The underlying `S7Driver`
+(`S7Driver.cs:241-245` for reads, `:316-320` for writes) throws `NotSupportedException`
+for `Int64`, `UInt64`, `Float64`, `String`, and `DateTime` — the driver maps that to
+`BadNotSupported`. Consequently every CLI invocation using one of those types — and the
+documented `--string-length` string-read example — fails at runtime with
+`0x803D0000 (Bad)`. The CLI surface and docs promise capability the driver does not yet
+implement.
+
+**Recommendation:** Either (a) trim the `--type` help text and the `--string-length`
+flag/examples to the implemented set (`Bool / Byte / Int16 / UInt16 / Int32 / UInt32 /
+Float32`) until the follow-up driver PR lands, or (b) keep the surface but add a one-line
+"types beyond Float32 are not yet implemented and surface BadNotSupported" caveat to the
+help text and `docs/Driver.S7.Cli.md`. Option (a) is preferred so the CLI does not offer
+options that cannot succeed.
+
+**Resolution:** Resolved 2026-05-22 — updated the `--type` help text on `read`, `write`, and `subscribe` to list the implemented set (Bool/Byte/Int16/UInt16/Int32/UInt32/Float32) and appended a one-line caveat that Int64/UInt64/Float64/String/DateTime are not yet implemented and will return BadNotSupported.
+
+### Driver.S7.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` |
+| Status | Resolved |
+
+**Description:** The `ProbeCommand` XML doc and the `Driver.S7.Cli.md` "fastest is the
+device talking" framing say the probe "connects ... prints health" and "surfaces
+`BadNotSupported`" when PUT/GET is disabled. But when the PLC is unreachable (connection
+refused, host down, wrong slot), `driver.InitializeAsync` throws and the exception
+propagates straight out of `ExecuteAsync` — the code that prints `Host:`, `Health:`,
+`Last error:`, and the snapshot is never reached. The most common probe failure (device
+not reachable at all) therefore produces a CliFx stack trace rather than the structured
+health report the command exists to give. Note PUT/GET-disabled only surfaces during
+`ReadAsync` (after a successful connect), so that one path does reach the health print —
+but a refused TCP connect does not.
+
+**Recommendation:** Wrap the `InitializeAsync` + `ReadAsync` body in a `try`/`catch` that,
+on failure, still prints the `Host:` / `CPU:` lines and a `Health:` / `Last error:`
+report derived from `driver.GetHealth()` (which `InitializeAsync` sets to
+`Faulted` with the exception message before re-throwing). The probe should report an
+unreachable device, not crash on it.
+
+**Resolution:** Resolved 2026-05-22 — wrapped the `InitializeAsync` + `ReadAsync` body in a `try/catch` that on any non-cancellation failure still prints the structured `Host:`, `CPU:`, `Health:`, and `Last error:` lines derived from `driver.GetHealth()`, so an unreachable device produces a health report rather than a stack trace.
+
+### Driver.S7.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` |
+| Status | Open |
+
+**Description:** Every command declares the driver with `await using var driver = new
+S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block.
+`S7Driver.DisposeAsync` itself calls `ShutdownAsync`, so shutdown runs twice per command
+(three times for `subscribe`, which also unsubscribes). `ShutdownAsync` is idempotent
+(`Plc?.Close()` is best-effort, `_subscriptions` is cleared) so there is no functional
+bug, but the explicit `finally`-block `ShutdownAsync` call is redundant given the
+`await using`. It is also slightly misleading — a reader may assume the `await using` is
+not actually disposing.
+
+**Recommendation:** Drop the explicit `await driver.ShutdownAsync(...)` from the
+`finally` blocks and rely on `await using` for teardown; keep only the
+`subscribe` command's `UnsubscribeAsync` call. Alternatively drop `await using`
+and keep the explicit `finally`. Pick one disposal mechanism per command.
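+
+Option (a) in sketch form. `CountingDriver` is a hypothetical driver whose `DisposeAsync` calls `ShutdownAsync`, as the description says `S7Driver.DisposeAsync` does, so relying on `await using` alone runs shutdown exactly once:
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical driver whose DisposeAsync delegates to ShutdownAsync.
+public sealed class CountingDriver : IAsyncDisposable
+{
+    public int ShutdownCalls { get; private set; }
+
+    public Task ShutdownAsync(CancellationToken ct = default)
+    {
+        ShutdownCalls++;
+        return Task.CompletedTask;
+    }
+
+    public async ValueTask DisposeAsync() => await ShutdownAsync().ConfigureAwait(false);
+}
+
+public static class CommandSketch
+{
+    // Teardown comes only from `await using`; there is no ShutdownAsync in a finally block.
+    public static async Task<int> RunAsync()
+    {
+        var driver = new CountingDriver();
+        await using (driver)
+        {
+            // ... initialize, read, print ...
+        }
+        return driver.ShutdownCalls;
+    }
+}
+```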
+
+**Resolution:** _(open)_
+
+### Driver.S7.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` |
+| Status | Open |
+
+**Description:** A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`
+exists containing only an `obj/` folder — no `.csproj`, no source. The real test
+project lives at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`. The empty
+directory is a leftover from the project move into `tests/Drivers/Cli/` and is not
+referenced by `ZB.MOM.WW.OtOpcUa.slnx`. It is dead clutter that can mislead anyone
+grepping the tree for the S7 CLI test project.
+
+**Recommendation:** Delete the stale `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`
+directory (including its `obj/`). This is outside the module's `src/` tree, but it is the
+S7 CLI's own orphaned test folder, so its cleanup belongs to this module.
+
+**Resolution:** _(open)_
+
+### Driver.S7.Cli-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` |
+| Status | Open |
+
+**Description:** The only test file covers `WriteCommand.ParseValue` and
+`ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the
+host / port / CPU / rack / slot / timeout flags onto an `S7DriverOptions` and forces
+`Probe.Enabled = false` — has no test, despite being pure, deterministic, and
+`internal`-visible to the test assembly via `InternalsVisibleTo`. A regression that
+dropped `Probe = new S7ProbeOptions { Enabled = false }` (which would start an
+unwanted background probe loop in a one-shot CLI run) or mis-mapped `TimeoutMs` would
+not be caught. `ParseValue` is also missing an explicit overflow-edge test (e.g.
+`Byte` value `256`) — the current `ParseValue_Byte_ranges` test stops at `255`.
+
+**Recommendation:** Add a `BuildOptions` test (assert `Probe.Enabled == false`,
+`Timeout` matches `TimeoutMs`, and host/port/CPU/rack/slot flow through). Add an
+overflow case to the `ParseValue` numeric tests once Driver.S7.Cli-001 is resolved so
+the test asserts the wrapped `CommandException`.
+
+**Resolution:** _(open)_
+
+### Driver.S7.Cli-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` |
+| Status | Open |
+
+**Description:** The Modbus CLI `SubscribeCommand` carries an explanatory comment on
+the `OnDataChange` handler ("Route every data-change event to the CliFx console (not
+System.Console — the analyzer flags it + IConsole is the testable abstraction)"). The S7
+`SubscribeCommand` is a near-verbatim copy but dropped that comment, so the non-obvious
+reason the handler uses `console.Output.WriteLine` (synchronous, on a driver background
+thread) instead of `System.Console` or the `async` `WriteLineAsync` is undocumented here.
+Minor, but the rationale is worth keeping consistent across the CLI family.
+
+**Recommendation:** Re-add the one-line comment from the Modbus `SubscribeCommand` so
+the S7 copy explains why the event handler writes via `console.Output` synchronously.
+
+**Resolution:** _(open)_
@@ -0,0 +1,408 @@
+# Code Review — Driver.S7
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.S7-001, Driver.S7-002, Driver.S7-003 |
+| 2 | OtOpcUa conventions | Driver.S7-004, Driver.S7-005 |
+| 3 | Concurrency & thread safety | Driver.S7-006 |
+| 4 | Error handling & resilience | Driver.S7-007, Driver.S7-008, Driver.S7-009 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.S7-010 |
+| 7 | Design-document adherence | Driver.S7-011, Driver.S7-012 |
+| 8 | Code organization & conventions | Driver.S7-013 |
+| 9 | Testing coverage | Driver.S7-014 |
+| 10 | Documentation & comments | Driver.S7-012 (shared) |
+
+## Findings
+
+### Driver.S7-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `S7AddressParser.cs:93`, `S7Driver.cs:231` |
+| Status | Resolved |
+
+**Description:** `S7AddressParser.Parse` accepts Timer (`T0`) and Counter (`C0`)
+addresses, and the test suite asserts they parse successfully, but the read path
+cannot serve them. Two problems compound: (1) the `ReadOneAsync` type-mapping switch
+(lines 231-250) has no case for any Timer/Counter combination, so a Timer/Counter
+tag falls through to the default arm and throws `InvalidDataException` with a
+misleading "type-mismatch" message on every read; (2) the read is issued via
+`plc.ReadAsync(tag.Address, ...)`, passing the raw address string, and S7.Net's
+string-based parser does not understand `T{n}`/`C{n}` syntax. A tag configured with a
+timer or counter address therefore passes init-time parsing (the docstring promises that
+config typos fail fast at init) and then fails on every read: exactly the
+undiagnosable failure mode the fail-fast parse was meant to prevent.
+
+**Recommendation:** Either drop Timer/Counter from `S7AddressParser` and `S7Area`
+until they are wired through to S7.Net, or implement the Timer/Counter read path.
+If they are kept, reject Timer/Counter tags in `InitializeAsync` with a clear "not yet
+supported" error rather than letting them parse cleanly.
+
+**Resolution:** Resolved 2026-05-22 — `InitializeAsync` now runs
+`RejectUnsupportedTagAddresses`, which throws `NotSupportedException` with a
+clear "not yet supported" message (echoing the tag name + address) for any tag
+whose address parses as a Timer or Counter, so the bad config fails fast at init
+rather than throwing a misleading type-mismatch on every read.
+
+### Driver.S7-002
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `S7Driver.cs:350` |
+| Status | Resolved |
+
+**Description:** `MapDataType` collapses `S7DataType.UInt32` to `DriverDataType.Int32`.
+`UInt32` values above `int.MaxValue` (2^31-1) wrap to negative when surfaced to the
+OPC UA client, silently corrupting the value. The inline comment flags only
+`Int64`/`UInt64` as "widens; lossy", but `UInt32` to `Int32` is equally lossy and is
+not called out.
+
+**Recommendation:** Map `UInt32`/`UInt16` to a `DriverDataType` wide enough to hold the
+unsigned range, or add the missing unsigned `DriverDataType` members. At minimum,
+correct the comment so the lossiness of `UInt32` is documented.
+
+**Resolution:** Resolved 2026-05-22 — added an inline comment to the `MapDataType` switch explicitly documenting the UInt32→Int32 lossiness (same limitation as Int64/UInt64, tracked for a follow-up PR adding unsigned DriverDataType members); the code mapping is unchanged pending that follow-up.
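+
+To make the lossiness concrete, the sketch below assumes the mapping behaves like an
+unchecked reinterpretation of the 32 bits (the unchecked casts mentioned in
+Driver.S7-014 point that way). `ToDriverInt32` is a hypothetical stand-in, not driver
+code.
+
+```csharp
+using System;
+
+// Hypothetical stand-in for surfacing an S7 UInt32 as DriverDataType.Int32:
+// an unchecked cast keeps the bit pattern, not the numeric value.
+static int ToDriverInt32(uint raw) => unchecked((int)raw);
+
+Console.WriteLine(ToDriverInt32(2_147_483_647u)); // 2147483647 (int.MaxValue survives)
+Console.WriteLine(ToDriverInt32(3_000_000_000u)); // -1294967296 (wrapped)
+```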
+
+### Driver.S7-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `S7Driver.cs:172`, `S7Driver.cs:255` |
+| Status | Open |
+
+**Description:** `ReadAsync` and `WriteAsync` dereference `fullReferences.Count` /
+`writes.Count` with no null guard. A null argument throws `NullReferenceException`
+rather than `ArgumentNullException`, and the NRE escapes before `_gate` is taken,
+so it is not wrapped in a per-item status. `DiscoverAsync` correctly uses
+`ArgumentNullException.ThrowIfNull(builder)`; the read/write entry points are
+inconsistent with it.
+
+**Recommendation:** Add `ArgumentNullException.ThrowIfNull` for the list parameters
+at the top of `ReadAsync` and `WriteAsync`.
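+
+A minimal sketch of the recommended guard, using a hypothetical `CountRequestedReads`
+helper in place of the real `ReadAsync` entry point. On .NET 6 and later `ThrowIfNull`
+captures the argument expression, so `ParamName` comes out as `fullReferences` with no
+string literal to keep in sync.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical stand-in for the top of ReadAsync: validate the list before
+// touching .Count, so null surfaces as ArgumentNullException, not an NRE.
+static int CountRequestedReads(IReadOnlyList<string> fullReferences)
+{
+    ArgumentNullException.ThrowIfNull(fullReferences);
+    return fullReferences.Count;
+}
+
+try
+{
+    CountRequestedReads(null!);
+}
+catch (ArgumentNullException ex)
+{
+    Console.WriteLine($"rejected: {ex.ParamName}");
+}
+```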
+
+**Resolution:** _(open)_
+
+### Driver.S7-004
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | OtOpcUa conventions |
+| Location | `S7Driver.cs` (whole file) |
+| Status | Resolved |
+
+**Description:** The driver performs no logging, although the Library Preferences
+section of CLAUDE.md mandates Serilog with a rolling daily file sink. Every error path
+is an empty catch block (Initialize cleanup at line 130, `ShutdownAsync` lines
+142/149/153, the probe loop at line 483, the poll loop at lines 396/406, `Dispose` at
+line 511). Connection faults, probe transitions, PUT/GET-disabled config errors, and
+poll-loop exceptions are all silently swallowed. An operator has only the
+`DriverHealth.LastError` string and no event trail to diagnose an intermittent PLC.
+
+**Recommendation:** Inject an `ILogger`/`ILoggerFactory` and log connect
+success/failure, probe Running/Stopped transitions, PUT/GET-disabled detection,
+and swallowed poll-loop / shutdown exceptions.
+
+**Resolution:** Resolved 2026-05-22 — injected `ILogger<S7Driver>` (optional, defaults to `NullLogger`) into the primary constructor; added structured log calls for connect success/failure, probe Running/Stopped transitions, and swallowed poll-loop exceptions, giving operators an event trail via Serilog.
+
+### Driver.S7-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `S7Driver.cs:33`, `S7Driver.cs:433` |
+| Status | Open |
+
+**Description:** `System.Collections.Concurrent.ConcurrentDictionary` is written
+out fully qualified at the field declarations instead of relying on a
+`using System.Collections.Concurrent;` directive. `ImplicitUsings` is enabled and the
+rest of the codebase relies on `using` directives, so the inline FQN is inconsistent
+with house style. Similarly redundant `global::S7.Net.*` qualifiers appear throughout
+`S7Driver.cs` despite the `using S7.Net;` directive at the top of the file.
+
+**Recommendation:** Add `using System.Collections.Concurrent;` and drop the
+redundant `global::S7.Net.` qualifiers where `using S7.Net;` already covers them.
+
+**Resolution:** _(open)_
+
+### Driver.S7-006
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` |
+| Status | Resolved |
+
+**Description:** Disposal races with the in-flight probe / poll tasks.
+`ShutdownAsync` calls `_probeCts.Cancel()` and cancels each subscription CTS, but it
+does not await the `ProbeLoopAsync` / `PollLoopAsync` tasks (they are fire-and-forget
+`Task.Run` calls with the task handle discarded). `DisposeAsync` then calls
+`ShutdownAsync` followed immediately by `_gate.Dispose()`. A probe or poll iteration
+that is between `_gate.WaitAsync` and `_gate.Release()` when cancellation fires may call
+`Release()` (line 479) on an already-disposed semaphore, or its next `WaitAsync` may hit
+the disposed semaphore; either way it throws `ObjectDisposedException`. The probe loop's
+broad catch swallows it, but the disposal-ordering bug is real: the semaphore can be
+disposed while a worker still holds or is waiting on it. The same applies to
+`_probeCts.Dispose()` (line 143) running while `ProbeLoopAsync` may still touch the
+linked token.
+
+**Recommendation:** Track the probe and poll `Task` handles, and in `ShutdownAsync`
+(or `DisposeAsync`) await `Task.WhenAll(...)` with a bounded timeout after cancelling,
+before disposing `_gate` and the CTS objects.
+
+**Resolution:** Resolved 2026-05-22 — the probe loop now stores its Task in
+`_probeTask` and each subscription records its poll Task in `SubscriptionState.PollTask`.
+`ShutdownAsync` cancels every CTS, awaits `Task.WhenAll` of those handles with a
+bounded 5 s `DrainTimeout`, and only then disposes `_probeCts`, the subscription
+CTSs, and (via `DisposeAsync`) `_gate` — so no loop can touch a disposed
+semaphore. `Task.Run` is now passed `CancellationToken.None` so the handle is
+always awaitable even if the token is already cancelled.
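+
+The cancel, drain, then dispose ordering described in the Resolution can be sketched as
+follows. `CancelAndDrainAsync` is a hypothetical helper, the 5 s timeout mirrors
+`DrainTimeout`, and the loop body is a stand-in for one probe or poll iteration;
+`Task.WaitAsync(TimeSpan)` requires .NET 6 or later.
+
+```csharp
+using System;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Hypothetical helper: cancel, wait (bounded) for the loops to finish, and only
+// then let the caller dispose anything the loops touch.
+static async Task<bool> CancelAndDrainAsync(
+    CancellationTokenSource cts, Task[] loops, TimeSpan drainTimeout)
+{
+    cts.Cancel();
+    try
+    {
+        // Throws TimeoutException if the loops are still running when the timeout elapses.
+        await Task.WhenAll(loops).WaitAsync(drainTimeout);
+        return true;
+    }
+    catch (OperationCanceledException)
+    {
+        return true; // a loop ended in the Canceled state, which still counts as drained
+    }
+    catch (TimeoutException)
+    {
+        return false;
+    }
+}
+
+using var gate = new SemaphoreSlim(1, 1);
+var cts = new CancellationTokenSource();
+var token = cts.Token;
+
+// Started with CancellationToken.None so the Task handle is always awaitable.
+var pollLoop = Task.Run(async () =>
+{
+    try
+    {
+        while (true)
+        {
+            await gate.WaitAsync(token);
+            try { await Task.Delay(10, token); } // stand-in for one PLC read
+            finally { gate.Release(); }
+        }
+    }
+    catch (OperationCanceledException)
+    {
+        // Normal shutdown path.
+    }
+}, CancellationToken.None);
+
+bool drained = await CancelAndDrainAsync(cts, new[] { pollLoop }, TimeSpan.FromSeconds(5));
+cts.Dispose(); // safe: the loop has finished and no longer reads the token
+Console.WriteLine(drained ? "drained" : "timed out");
+```
+
+`WaitAsync` only stops waiting on timeout; it does not abort the loops, so a `false`
+result means a loop may still be running and the caller has to decide whether disposing
+is safe.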
+
+### Driver.S7-007
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Error handling & resilience |
+| Location | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` |
+| Status | Resolved |
+
+**Description:** PUT/GET-disabled handling contradicts the design and the
+module's own docstring. `driver-specs.md` section 5 (line 434) and the
+`S7DriverOptions` class remark both state that PUT/GET-disabled must be mapped to
+`BadNotSupported` and surfaced as a configuration alert, not a transient fault,
+because blind retry is wasted effort. The actual code (`ReadAsync`, lines 200-208)
+catches every `S7.Net.PlcException`, maps it to `StatusBadDeviceFailure`, and then
+sets health to `Degraded`. Consequences: (1) a genuinely transient `PlcException`
+(e.g. CPU briefly in STOP) is reported identically to a permanent PUT/GET
+misconfiguration, so the operator cannot tell a config problem from a transient
+one, which is the exact distinction the spec demands; (2) the promised
+`BadNotSupported` status code is never produced, so the `S7DriverOptions` docstring
+is now false.
+
+**Recommendation:** Inspect `PlcException.ErrorCode` and map the
+PUT/GET-disabled / access-denied code to `BadNotSupported` with a distinct
+config-alert health state; keep `BadDeviceFailure`/`Degraded` only for genuine
+device-fault error codes.
+
+**Resolution:** Resolved 2026-05-22 — `ReadAsync` / `WriteAsync` now split the
+`PlcException` catch via an `IsAccessDenied` filter. S7.Net exposes no typed
+error code for the S7 `AccessingObjectNotAllowed` status (its
+`ValidateResponseCode` throws a plain `Exception` wrapped as the inner exception
+of a `PlcException`), so `IsAccessDenied` walks the inner-exception chain for the
+"not allowed" marker. A PUT/GET-disabled fault now maps to `BadNotSupported` and
+sets health to `Faulted` with a config-alert message pointing operators at the
+TIA Portal PUT/GET toggle; a genuine device fault still maps to
+`BadDeviceFailure`/`Degraded`.
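+
+A sketch of the inner-exception walk behind `IsAccessDenied`, assuming only what the
+Resolution states: the marker is the text "not allowed" somewhere in the chain. Plain
+`Exception` stands in for S7.Net's `PlcException` so the example has no package
+dependency, and the case-insensitive match is this sketch's assumption.
+
+```csharp
+using System;
+
+// Hypothetical version of the IsAccessDenied filter: S7.Net wraps the underlying
+// error as an inner exception, so check every exception in the chain.
+static bool IsAccessDenied(Exception ex)
+{
+    for (Exception? current = ex; current is not null; current = current.InnerException)
+    {
+        if (current.Message.Contains("not allowed", StringComparison.OrdinalIgnoreCase))
+            return true;
+    }
+    return false;
+}
+
+var fault = new InvalidOperationException("Read failed", new Exception("Access to object not allowed"));
+Console.WriteLine(IsAccessDenied(fault)); // True
+```
+
+In the driver the check belongs in an exception filter such as
+`catch (PlcException ex) when (IsAccessDenied(ex))`, placed ahead of the general
+`PlcException` catch. Matching on message text could break if S7.Net rewords the
+message; that is the cost of the library exposing no typed code for this status.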
+
+### Driver.S7-008
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `S7Driver.cs:286` |
+| Status | Resolved |
+
+**Description:** The `WriteAsync` catch ladder is coarser than `ReadAsync`'s and loses
+information. The generic `catch (Exception)` maps everything (socket errors,
+timeouts, `OverflowException` from `Convert.ToInt16` of an out-of-range value) to
+`StatusBadInternalError`. A genuine transport fault during a write is reported to the
+client as an internal error rather than `BadCommunicationError`, and, unlike
+`ReadAsync`, the write path never updates `_health` on failure, so a PLC that is down
+stays Healthy in the dashboard as long as only writes are attempted.
+`OperationCanceledException` is also caught and turned into a status code rather than
+propagating.
+
+**Recommendation:** Mirror the `ReadAsync` catch structure: let
+`OperationCanceledException` propagate, map socket/timeout faults to
+`BadCommunicationError`, map value-conversion failures to a distinct out-of-range
+status, and update `_health` to `Degraded` on transport failures.
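+
+One way to express the recommended ladder, as a sketch only: `MapWriteFailure` and
+`WriteOne` are hypothetical, the status names are strings standing in for the driver's
+status constants, and splitting conversion failures into `BadOutOfRange` (overflow) and
+`BadTypeMismatch` (unparseable or unconvertible value) is this sketch's choice. The
+exception filter keeps `OperationCanceledException` out of the mapping so it propagates.
+
+```csharp
+using System;
+using System.Net.Sockets;
+
+// Hypothetical mapping: transport faults and value-conversion faults get
+// distinct status names instead of one catch-all internal error.
+static string MapWriteFailure(Exception ex) => ex switch
+{
+    SocketException or TimeoutException => "BadCommunicationError",
+    OverflowException => "BadOutOfRange",
+    FormatException or InvalidCastException => "BadTypeMismatch",
+    _ => "BadInternalError",
+};
+
+// Hypothetical write of one Int16 tag value.
+static string WriteOne(object? value)
+{
+    try
+    {
+        _ = Convert.ToInt16(value); // OverflowException outside short's range
+        return "Good";
+    }
+    catch (Exception ex) when (ex is not OperationCanceledException)
+    {
+        // The filter leaves OperationCanceledException unhandled, so it propagates.
+        return MapWriteFailure(ex);
+    }
+}
+
+Console.WriteLine(WriteOne(40000));    // BadOutOfRange
+Console.WriteLine(WriteOne((short)5)); // Good
+```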
+
+**Resolution:** Resolved 2026-05-22 — restructured `WriteAsync` catch ladder: `OperationCanceledException` now re-throws, genuine `PlcException` transport faults map to `BadDeviceFailure`/`Degraded`, `NotSupportedException` maps to `BadNotSupported`, the `IsAccessDenied` PlcException path maps to `BadNotSupported`/`Faulted`, and the catch-all maps to `BadCommunicationError` with a health update — matching `ReadAsync`'s structure.
+
+### Driver.S7-009
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `S7Driver.cs:392` |
+| Status | Open |
+
+**Description:** The subscription poll loop never reflects sustained polling
+failure anywhere an operator can see it. `PollLoopAsync` swallows every
+non-cancellation exception with an empty catch, and its comment claims "the health
+surface reflects it". But a poll failure routes through `ReadAsync`, which only
+sets `DriverState.Degraded` when the per-tag read throws inside the gate;
+exceptions thrown before that point (e.g. from `RequirePlc()` when `Plc` is null after
+a drop) bypass the health update entirely. A subscription against an uninitialized or
+dropped driver therefore loops silently forever with no backoff, re-polling every
+`Interval` even on a hard failure.
+
+**Recommendation:** Have the poll loop update `_health` on repeated failure and
+apply a capped backoff after consecutive errors; at minimum, log the swallowed
+exception (see Driver.S7-004).
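+
+A capped exponential backoff for the poll loop could look like the sketch below.
+`NextPollDelay`, the doubling factor, and the 500 ms interval and 30 s cap in the usage
+lines are illustrative assumptions, not values taken from the driver.
+
+```csharp
+using System;
+
+// Hypothetical delay policy: poll at the normal interval while healthy, double the
+// delay for each consecutive failure, and never exceed the cap.
+static TimeSpan NextPollDelay(TimeSpan interval, int consecutiveFailures, TimeSpan cap)
+{
+    if (consecutiveFailures <= 0) return interval;
+    // Clamp the exponent so the shift cannot overflow during a long outage.
+    int exponent = Math.Min(consecutiveFailures, 16);
+    double ms = interval.TotalMilliseconds * (1 << exponent);
+    return TimeSpan.FromMilliseconds(Math.Min(ms, cap.TotalMilliseconds));
+}
+
+var interval = TimeSpan.FromMilliseconds(500);
+var cap = TimeSpan.FromSeconds(30);
+for (int failures = 0; failures <= 7; failures++)
+    Console.WriteLine($"{failures} failures -> {NextPollDelay(interval, failures, cap).TotalMilliseconds} ms");
+```
+
+The loop would reset its failure counter after a successful poll and, past some
+threshold of consecutive failures, set `_health` to `Degraded` so the outage is visible.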
+
+**Resolution:** _(open)_
+
+### Driver.S7-010
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `S7Driver.cs:504` |
+| Status | Open |
+
+**Description:** `Dispose()` is implemented as
+`DisposeAsync().AsTask().GetAwaiter().GetResult()`, i.e. sync-over-async. Inside the
+generic host this is currently safe (there is no captured `SynchronizationContext`),
+but it is a known deadlock pattern. The only async work behind `DisposeAsync` is
+`ShutdownAsync`, which does nothing asynchronous (it returns `Task.CompletedTask`), so
+the blocking wrap is unnecessary risk.
+
+**Recommendation:** Since `ShutdownAsync` is effectively synchronous, have `Dispose()`
+perform the teardown directly (cancel the CTSs, close `Plc`, dispose `_gate`) without
+round-tripping through the async path.
+
+**Resolution:** _(open)_
+
+### Driver.S7-011
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Design-document adherence |
+| Location | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` |
+| Status | Resolved |
+
+**Description:** `S7Driver` ignores the `driverConfigJson` parameter on both
+`InitializeAsync` and `ReinitializeAsync`. The `IDriver` contract states that
+`InitializeAsync` initializes the driver "from its DriverConfig JSON" and that
+`ReinitializeAsync` "applies a config change in place". All configuration is instead
+captured in the constructor (`S7DriverOptions options`), and `ReinitializeAsync` simply
+calls `ShutdownAsync` and then `InitializeAsync` with the same options object.
+Consequently, a config change delivered to `ReinitializeAsync` (the documented
+`IGenerationApplier` recovery path per `driver-stability.md`) is silently discarded: the
+driver re-opens with the old config. This breaks the only Core-initiated in-process
+recovery path.
+
+**Recommendation:** Either re-parse `driverConfigJson` inside
+`InitializeAsync`/`ReinitializeAsync` and rebuild `_options` from it, or document
+explicitly that S7 reconfiguration requires instance recreation and have
+`ReinitializeAsync` signal that the passed JSON is unused so the contract mismatch
+is visible.
+
+**Resolution:** Resolved 2026-05-22 — config parsing was factored out of the
+factory into `S7DriverFactoryExtensions.ParseOptions`. `InitializeAsync` (and
+therefore `ReinitializeAsync`, which delegates to it) now re-parses
+`driverConfigJson` and rebuilds `_options` from it whenever the document carries
+a real body, so a config change delivered through `ReinitializeAsync` — the only
+Core-initiated in-process recovery path — is honoured. An empty / placeholder
+document (`""`, `{}`, `[]`) keeps the constructor-supplied options so existing
+lifecycle unit tests that pass `"{}"` are unaffected.
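+
+The placeholder test described above might be implemented roughly like this with
+System.Text.Json. `IsPlaceholderConfig` is a hypothetical name, treating
+whitespace-only text the same as `""` is an assumption, and the `host` key in the
+sample input is illustrative only.
+
+```csharp
+using System;
+using System.Linq;
+using System.Text.Json;
+
+// Hypothetical check: blank text, an empty object, or an empty array keeps the
+// constructor-supplied options; anything else is a real config body to re-parse.
+static bool IsPlaceholderConfig(string? json)
+{
+    if (string.IsNullOrWhiteSpace(json)) return true;
+    using JsonDocument doc = JsonDocument.Parse(json);
+    JsonElement root = doc.RootElement;
+    return root.ValueKind switch
+    {
+        JsonValueKind.Object => !root.EnumerateObject().Any(),
+        JsonValueKind.Array => root.GetArrayLength() == 0,
+        _ => false,
+    };
+}
+
+foreach (var candidate in new[] { "", "{}", "[]", "{\"host\":\"10.0.0.5\"}" })
+    Console.WriteLine($"{candidate,-22} placeholder={IsPlaceholderConfig(candidate)}");
+```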
+
+### Driver.S7-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
+| Status | Resolved |
+
+**Description:** `S7ProbeOptions.ProbeAddress` is configured (default `"MW0"`),
+documented at length ("the driver runs a tick loop that issues a cheap read
+against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO
+(`S7ProbeDto.ProbeAddress`), and parsed from JSON, yet it is never read by any
+code. `ProbeLoopAsync` probes liveness via `plc.ReadStatusAsync` (CPU status), not via
+a read of `ProbeAddress`. The XML docs on the `S7DriverOptions.Probe` property and on
+`ProbeAddress` describe behaviour the driver does not implement. An operator who
+sets `ProbeAddress` to a known-good DB word expecting the probe to exercise it will
+see no effect.
+
+**Recommendation:** Either make `ProbeLoopAsync` actually read `ProbeAddress`
+(parsing it once at init and rejecting a bad value early), or delete `ProbeAddress`
+from `S7ProbeOptions`/`S7ProbeDto` and correct the XML docs to describe the
+`ReadStatusAsync`-based probe.
+
+**Resolution:** Resolved 2026-05-22 — removed `ProbeAddress` from `S7ProbeOptions` and `S7ProbeDto`; updated the `S7DriverOptions.Probe` XML doc to describe the `ReadStatusAsync`-based probe accurately. An existing config that sets `probeAddress` still loads: the deserializer tolerates unknown JSON fields, so the value is silently ignored.
+
+### Driver.S7-013
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `S7DriverOptions.cs:90`, `S7Driver.cs:300` |
+| Status | Open |
+
+**Description:** `S7TagDefinition.StringLength` is a public, JSON-bound configuration
+parameter (default 254), but it is dead: `S7DataType.String` reads and writes both
+throw `NotSupportedException` ("...land in a follow-up PR"), so `StringLength` is
+never consumed. Likewise `S7DataType.Int64`, `UInt64`, `Float64`, `String`, and
+`DateTime` are exposed as configurable, map through `MapDataType` to real
+`DriverDataType` values, and pass `DiscoverAsync` (creating address-space nodes), yet
+every read/write of them throws `NotSupportedException`, which surfaces as
+`BadNotSupported`. A site can configure a `Float64` tag, see the node appear, and get
+`BadNotSupported` on every access. The scaffold/follow-up-PR split leaks
+half-implemented types into the configurable surface.
+
+**Recommendation:** Reject the not-yet-implemented `S7DataType` values (and
+`StringLength`) at `InitializeAsync` / factory validation with a clear "not yet
+supported" error, so a partially implemented type cannot be configured into a
+live address space.
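+
+A sketch of the recommended init-time rejection. `RejectUnimplementedTypes` is
+hypothetical, data types are plain strings instead of the `S7DataType` enum, and the
+supported set is the one listed in the Driver.S7-014 Resolution
+(Bool/Byte/UInt16/Int16/UInt32/Int32/Float32). Rejecting `String` this way also makes
+`StringLength` moot until strings are implemented.
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Types the driver can actually read and write today (per Driver.S7-014).
+var implemented = new HashSet<string>(StringComparer.Ordinal)
+{
+    "Bool", "Byte", "UInt16", "Int16", "UInt32", "Int32", "Float32",
+};
+
+// Hypothetical guard run during InitializeAsync, before any node is discovered.
+void RejectUnimplementedTypes(IEnumerable<(string Name, string DataType)> tags)
+{
+    foreach (var (name, dataType) in tags)
+    {
+        if (!implemented.Contains(dataType))
+            throw new NotSupportedException(
+                $"Tag '{name}' uses S7 data type {dataType}, which is not yet supported.");
+    }
+}
+
+RejectUnimplementedTypes(new[] { ("Motor1.Speed", "Int16") }); // accepted
+try
+{
+    RejectUnimplementedTypes(new[] { ("Oven.Temperature", "Float64") });
+}
+catch (NotSupportedException ex)
+{
+    Console.WriteLine(ex.Message);
+}
+```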
+
+**Resolution:** _(open)_
+
+### Driver.S7-014
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
+| Status | Resolved |
+
+**Description:** Test coverage has notable gaps in the driver's behavioural
+core: (1) no test exercises the `ReadOneAsync` type-reinterpret switch (`Int16` from
+`ushort`, `Int32` from `uint`, `Float32` from `UInt32` bits), so the most logic-heavy
+method in the driver is untested and the unsigned/signed unchecked casts are
+unverified; (2) no test covers a Timer/Counter tag end-to-end, which would have
+caught Driver.S7-001; (3) no test covers the `WriteOneAsync` boxing conversions or
+the out-of-range `Convert` failure paths; (4) the read-write tests cover only error
+paths (uninitialized, bad address), and the happy path is explicitly deferred to "a
+follow-up PR" with no mock S7 server, leaving the entire successful read, write,
+poll, and probe-transition surface untested; (5) `ReinitializeAsync` and the
+`driverConfigJson`-ignored behaviour (Driver.S7-011) have no test.
+
+**Recommendation:** Add unit tests for `ReadOneAsync`/`WriteOneAsync` type mapping by
+factoring the pure reinterpret/boxing logic out of the PLC round-trip so it is
+testable without a live PLC, and add a Timer/Counter rejection test. Track the
+live/mock-server happy-path coverage as an explicit follow-up rather than an
+open-ended deferral.
+
+**Resolution:** Resolved 2026-05-22 — factored `ReadOneAsync` type-reinterpret into `internal static ReinterpretRawValue` and `WriteOneAsync` boxing into `internal static BoxValueForWrite`; added `S7TypeMappingTests.cs` (26 tests) covering every implemented type round-trip (Bool/Byte/UInt16/Int16/UInt32/Int32/Float32), unsupported-type `NotSupportedException` assertions, and write overflow paths.
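+
+For illustration, a pure reinterpret step of the kind the Resolution describes. This is
+a hypothetical `Reinterpret` helper, not the driver's `ReinterpretRawValue`; it assumes
+words arrive as boxed `ushort` and double words as boxed `uint`, matching the casts
+listed in the Description.
+
+```csharp
+using System;
+
+// Hypothetical reinterpret step: the raw value is unsigned, and the tag's data
+// type decides how its bits are read. Each arm is boxed so the result keeps its type.
+static object Reinterpret(string dataType, object raw) => dataType switch
+{
+    "UInt16" => (object)(ushort)raw,
+    "Int16" => (object)unchecked((short)(ushort)raw),
+    "UInt32" => (object)(uint)raw,
+    "Int32" => (object)unchecked((int)(uint)raw),
+    "Float32" => (object)BitConverter.Int32BitsToSingle(unchecked((int)(uint)raw)),
+    _ => throw new NotSupportedException($"S7 data type {dataType} is not yet supported."),
+};
+
+Console.WriteLine(Reinterpret("Int16", (ushort)0xFFFF));  // -1
+Console.WriteLine(Reinterpret("Float32", 0x3F80_0000u));  // 1
+```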
@@ -0,0 +1,202 @@
+# Code Review — Driver.TwinCAT.Cli
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 7 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.TwinCAT.Cli-001 |
+| 2 | OtOpcUa conventions | No issues found |
+| 3 | Concurrency & thread safety | Driver.TwinCAT.Cli-002 |
+| 4 | Error handling & resilience | Driver.TwinCAT.Cli-003 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | No issues found |
+| 7 | Design-document adherence | Driver.TwinCAT.Cli-004 |
+| 8 | Code organization & conventions | Driver.TwinCAT.Cli-005 |
+| 9 | Testing coverage | Driver.TwinCAT.Cli-006 |
+| 10 | Documentation & comments | Driver.TwinCAT.Cli-007 |
+
+## Findings
+
+<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
+     never reused. Findings are never deleted — close them by changing Status and
+     completing Resolution. -->
+
+### Driver.TwinCAT.Cli-001
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` |
+| Status | Open |
+
+**Description:** Numeric command options are accepted without range validation. `--timeout-ms`
+feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative
+value yields `TimeSpan.Zero`/a negative `TimeSpan`, which is then handed to the driver's
+`TwinCATDriverOptions.Timeout` and on to `ITwinCATClient.ConnectAsync`, producing an immediate
+failure or undefined behaviour rather than a clear "bad argument" message. The same applies to
+`subscribe --interval-ms` (negative -> `TimeSpan.FromMilliseconds(negative)` passed to
+`SubscribeAsync`) and `--ams-port` (`AmsPort` accepts negative / out-of-range port numbers,
+which only surface later as an opaque transport error). For a commissioning/diagnostic tool the
+failure mode should be a readable up-front rejection.
+
+**Recommendation:** Validate the numeric options at the top of each `ExecuteAsync` (or via a
+shared helper on `TwinCATCommandBase`) and throw `CliFx.Exceptions.CommandException` with a
+clear message when `TimeoutMs <= 0`, `IntervalMs <= 0`, or `AmsPort` falls outside `1..65535`.
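+
+A minimal sketch of the shared check. `ValidateNumericOptions` is hypothetical; it
+returns a message instead of throwing so it can be unit-tested without CliFx, and each
+`ExecuteAsync` would throw `CliFx.Exceptions.CommandException` with any non-null result.
+Port 851 in the usage lines is just a sample value.
+
+```csharp
+using System;
+
+// Hypothetical shared validation for the numeric options; null means "valid".
+static string? ValidateNumericOptions(int timeoutMs, int amsPort, int? intervalMs = null)
+{
+    if (timeoutMs <= 0)
+        return $"--timeout-ms must be greater than 0 (got {timeoutMs}).";
+    if (amsPort is < 1 or > 65535)
+        return $"--ams-port must be between 1 and 65535 (got {amsPort}).";
+    if (intervalMs is <= 0) // only checked when the command has --interval-ms
+        return $"--interval-ms must be greater than 0 (got {intervalMs}).";
+    return null;
+}
+
+Console.WriteLine(ValidateNumericOptions(timeoutMs: 0, amsPort: 851) ?? "ok");
+Console.WriteLine(ValidateNumericOptions(timeoutMs: 5000, amsPort: 851, intervalMs: 250) ?? "ok");
+```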
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `Commands/SubscribeCommand.cs:46-58` |
+| Status | Open |
+
+**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously.
+In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads`
+notification callback thread (see `TwinCATDriver.SubscribeAsync`, which invokes `OnDataChange`
+from the ADS `AddNotificationAsync` callback). That write can interleave with the main thread's
+`console.Output.WriteLineAsync(...)` "Subscribed to ..." banner and with subsequent change
+events if the PLC pushes faster than a single write completes. A `TextWriter` is not guaranteed
+thread-safe, so output lines can be garbled — undesirable for a tool whose stated purpose is
+producing clean screen-recorded bug-report timelines. The same pattern exists in the other
+driver CLIs (S7/Modbus), but those go through `PollGroupEngine`, whose change callbacks are
+serialised on one poll loop; the TwinCAT native path has no such serialisation.
+
+**Recommendation:** Serialise console writes from the change handler, e.g. wrap the
+`WriteLine` body in a `lock` on a private object that the banner write also takes, or use
+`TextWriter.Synchronized`. At minimum, gate it so the banner is written before the
+subscription is registered (it already is) and lock the per-event writes against each other.
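+
+The `TextWriter.Synchronized` option from the Recommendation, sketched with a
+`StringWriter` standing in for `console.Output`. The wrapper runs each of its members
+under a single lock, so the banner and the per-event lines cannot interleave as long as
+every write goes through the wrapper. The tag name in the banner is hypothetical.
+
+```csharp
+using System;
+using System.IO;
+using System.Threading.Tasks;
+
+var sink = new StringWriter();                      // stand-in for console.Output
+TextWriter output = TextWriter.Synchronized(sink);  // thread-safe wrapper
+
+output.WriteLine("Subscribed to MAIN.nCounter (ADS notification)");
+
+// Simulate change notifications arriving on several threads at once.
+Parallel.For(0, 500, i => output.WriteLine($"change #{i}"));
+
+string[] lines = sink.ToString().Split(Environment.NewLine, StringSplitOptions.RemoveEmptyEntries);
+Console.WriteLine($"{lines.Length} intact lines");
+```
+
+A `lock` on a private object is equivalent; what matters is that the banner and the
+per-event writes share the same guard.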
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-003
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `Commands/SubscribeCommand.cs:56-58` |
+| Status | Open |
+
+**Description:** The subscribe banner reports the mechanism purely from the `--poll-only` flag
+(`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`)
+states that the banner "announces which mechanism is in play". The CLI always declares exactly
+one tag, so a registration that produces zero notification handles is unlikely, but
+`TwinCATDriver.SubscribeAsync` silently `continue`s past any reference not found in
+`_tagsByName`/`_devices`, and a poll-mode fallback inside the driver is also possible in
+principle. The banner therefore asserts a mechanism it has not actually confirmed. It is
+informational only, so the impact is limited to a misleading diagnostic line.
+
+**Recommendation:** Either derive the banner text from observable subscription state (e.g. the
+returned `ISubscriptionHandle.DiagnosticId`, which is `twincat-native-sub-*` for the native
+path vs the `PollGroupEngine` handle for poll mode) or soften the wording to "(requested:
+ADS notification)" so it does not over-claim.
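+
+A sketch of deriving the banner from the returned handle. `DescribeMechanism` is
+hypothetical; it relies only on the `twincat-native-sub-*` prefix noted above and
+treats every other `DiagnosticId` as polling, which is an assumption about the
+poll-mode handle.
+
+```csharp
+using System;
+
+// Hypothetical: describe the mechanism from the handle the driver returned rather
+// than from the --poll-only flag alone.
+static string DescribeMechanism(string diagnosticId) =>
+    diagnosticId.StartsWith("twincat-native-sub-", StringComparison.Ordinal)
+        ? "ADS notification"
+        : "polling";
+
+Console.WriteLine($"Subscribed ({DescribeMechanism("twincat-native-sub-1")})");
+```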
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` |
+| Status | Open |
+
+**Description:** `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by
+`browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so
+`UseNativeNotifications = !PollOnly` has no observable effect on a browse run. The flag still
+appears in `otopcua-twincat-cli browse --help`, implying it changes browse behaviour when it
+does not. `docs/Driver.TwinCAT.Cli.md` documents `--poll-only` only under `subscribe` and lists
+no per-command flags for `browse` beyond `--prefix`/`--max`, so the help text and the docs
+disagree.
+
+**Recommendation:** Move `--poll-only` onto an intermediate base class shared only by
+`probe`/`read`/`subscribe` (arguably onto `subscribe` alone, since the flag only
+governs notification behaviour), or override/hide it for `browse`. Alternatively,
+document explicitly that the flag is a no-op for `browse`.
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-005
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` |
+| Status | Open |
+
+**Description:** The `--type` option is declared with the short alias `-t` on `read`, `write`,
+and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short
+alias. An operator who has internalised `-t` from the other three verbs will get a CliFx
+"unknown option" error on `probe -t Bool`. The inconsistency is gratuitous — all four commands
+take the same `TwinCATDataType` option.
+
+**Recommendation:** Add the `'t'` short alias to `ProbeCommand`'s `--type` option to match the
+other three commands.
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` |
+| Status | Open |
+
+**Description:** The only test file covers `WriteCommand.ParseValue` and
+`ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested:
+`TwinCATCommandBase.Gateway` (the `ads://{netId}:{port}` string the driver's
+`TwinCATAmsAddress.TryParse` consumes — a regression here breaks every command), `BuildOptions`
+(tag wiring, `UseNativeNotifications` toggle, `Probe.Enabled = false`), and `BrowseCommand`'s
+`CollectingAddressSpaceBuilder` with its `--prefix`/`--max` filtering and the `RO`/`RW` access
+derivation. These are pure and can be unit-tested without an AMS router. `InternalsVisibleTo`
+is already wired for the test assembly. Note also the stale empty sibling test directory
+`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests` (no project, no files) — out of this
+module's scope but worth flagging to whoever owns the test tree.
+
+**Recommendation:** Add unit tests for `Gateway`/`DriverInstanceId` string composition, for
+`BuildOptions` field wiring, and for the `CollectingAddressSpaceBuilder` prefix/max filtering
+and access-classification logic.
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT.Cli-007
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `TwinCATCommandBase.cs:31-36` |
+| Status | Open |
+
+**Description:** The `Timeout` override has an empty `init` accessor with the comment
+`/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared
+`abstract { get; init; }`, the override must supply an `init`, but here it silently discards
+any value. This is intentional, yet the empty body invites a future maintainer to "fix" it by
+adding a backing field, which would then diverge from `TimeoutMs`. The XML `<inheritdoc/>`
+gives no hint of the deliberate no-op. This is a maintainability/clarity nit, not a bug.
+
+**Recommendation:** Replace `<inheritdoc/>` with a short summary stating that `Timeout` is a
+computed projection of `--timeout-ms` and the `init` accessor is intentionally a no-op, so the
+design intent survives refactoring.
+
+**Resolution:** _(open)_
@@ -0,0 +1,426 @@
+# Code Review — Driver.TwinCAT
+
+| Field | Value |
+|---|---|
+| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 5 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Driver.TwinCAT-001, Driver.TwinCAT-002, Driver.TwinCAT-003, Driver.TwinCAT-004 |
+| 2 | OtOpcUa conventions | Driver.TwinCAT-005, Driver.TwinCAT-006 |
+| 3 | Concurrency & thread safety | Driver.TwinCAT-007, Driver.TwinCAT-008, Driver.TwinCAT-009 |
+| 4 | Error handling & resilience | Driver.TwinCAT-010, Driver.TwinCAT-011 |
+| 5 | Security | No issues found |
+| 6 | Performance & resource management | Driver.TwinCAT-012 |
+| 7 | Design-document adherence | Driver.TwinCAT-013, Driver.TwinCAT-014 |
+| 8 | Code organization & conventions | Driver.TwinCAT-015 |
+| 9 | Testing coverage | Driver.TwinCAT-016 |
+| 10 | Documentation & comments | Driver.TwinCAT-004 (data-type comment), Driver.TwinCAT-014 |
+
+## Findings
+
+### Driver.TwinCAT-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `TwinCATDriver.cs:41-78` |
+| Status | Resolved |
+
+**Description:** `InitializeAsync` and `ReinitializeAsync` both ignore their `driverConfigJson`
+parameter entirely. `InitializeAsync` builds device/tag state exclusively from `_options`,
+captured once in the constructor. `ReinitializeAsync` calls `ShutdownAsync` then
+`InitializeAsync(driverConfigJson, ...)` — but since `InitializeAsync` never reads
+`driverConfigJson`, a `ReinitializeAsync` with a changed config silently re-applies the
+original constructor-time options. Per `IDriver.ReinitializeAsync` docs and
+`docs/v2/driver-stability.md` section "In-process only (Tier A/B)", `Reinitialize` is the only
+Core-initiated path to apply a new config generation without a process restart. As written,
+config changes (added/removed devices, tags, probe settings) to a TwinCAT driver instance
+are never picked up at runtime.
+
+**Recommendation:** Parse `driverConfigJson` in `InitializeAsync` (reuse
+`TwinCATDriverFactoryExtensions` DTO + option-builder logic — extract it to a shared static
+parser) and assign the resulting options to a mutable field, rather than relying on the
+constructor-captured `_options`. Alternatively, document explicitly that the constructor is
+the sole config source and have the Core recreate the driver instance on config change.
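+
+A minimal Python sketch of the recommended shape, assuming a simplified JSON config with a
+`Devices` array of `HostAddress` entries (the real DTO carries many more fields). One shared
+parser feeds both construction and `initialize`, so `reinitialize` picks up a new config
+generation. `DriverOptions`, `parse_options`, and `DriverSketch` are hypothetical names.
+
+```python
+import json
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass
+class DriverOptions:
+    # Simplified, assumed shape; the real options carry far more settings.
+    host_addresses: List[str] = field(default_factory=list)
+
+
+def parse_options(driver_config_json: str) -> DriverOptions:
+    """Single parser shared by the factory and initialize()."""
+    dto = json.loads(driver_config_json)
+    return DriverOptions([d["HostAddress"] for d in dto.get("Devices", [])])
+
+
+class DriverSketch:
+    def __init__(self, driver_config_json: str) -> None:
+        self._options = parse_options(driver_config_json)
+
+    def initialize(self, driver_config_json: str) -> None:
+        # Re-parse on every call instead of trusting constructor-time options.
+        self._options = parse_options(driver_config_json)
+        # ...build device and tag state from self._options...
+
+    def shutdown(self) -> None:
+        pass  # ...tear down devices, subscriptions, and probe loops...
+
+    def reinitialize(self, driver_config_json: str) -> None:
+        self.shutdown()
+        self.initialize(driver_config_json)
+```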
+
+**Resolution:** Resolved 2026-05-22 — extracted the DTO→options parse into a shared `TwinCATDriverFactoryExtensions.ParseOptions`; `InitializeAsync` now re-parses `driverConfigJson` into a mutable `_options` field, so `ReinitializeAsync` applies a changed config generation.
+
+### Driver.TwinCAT-002
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `TwinCATDataType.cs:34-48`, `AdsTwinCATClient.cs:264-281` |
+| Status | Resolved |
+
+**Description:** `TwinCATDataTypeExtensions.ToDriverDataType` maps `LInt` and `ULInt` (signed/
+unsigned 64-bit) to `DriverDataType.Int32` (comment: "matches Int64 gap"). The address-space
+layer therefore creates a 32-bit OPC UA node for a 64-bit PLC value. Meanwhile
+`AdsTwinCATClient.MapToClrType` reads `LInt`/`ULInt` as `long`/`ulong` (64-bit), so the read
+path returns a boxed `long`/`ulong` into a node typed Int32. `DriverDataType` already has an
+`Int64`/`UInt64` member (`DriverDataType.cs:16-19`), so the "gap" the comment refers to does
+not exist. Values above `int.MaxValue` are silently truncated or produce a type mismatch at
+the OPC UA encode layer; `UDInt` is also folded into `Int32`, so unsigned 32-bit values in
+the range 0x80000000 to 0xFFFFFFFF surface as negative.
+
+**Recommendation:** Map `LInt` to `Int64`, `ULInt` to `UInt64`, `UDInt` to `UInt32`, `UInt`
+to `UInt16`, and `USInt`/`SInt` to their natural widths. Remove the stale "Int64 gap" comment.
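+
+The value-range arithmetic behind the symptom is easy to check. This Python sketch shows what
+happens when an unsigned 32-bit or a 64-bit value is forced into a signed Int32 by keeping only
+its low 32 bits. Whether the real encode layer narrows this way or raises a type-mismatch error
+is not established here, so treat it as an illustration of the range problem only.
+
+```python
+import struct
+
+
+def as_int32(raw: int) -> int:
+    """Keep the low 32 bits of `raw` and reinterpret them as a signed Int32."""
+    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]
+
+
+print(as_int32(0x7FFFFFFF))  # 2147483647: still fits
+print(as_int32(0x80000000))  # -2147483648: a UDINT value surfaces as negative
+print(as_int32(0xFFFFFFFF))  # -1
+print(as_int32(2**32 + 5))   # 5: an LINT loses its high 32 bits
+```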
+
+**Resolution:** Resolved 2026-05-22 — `ToDriverDataType` now maps `LInt`→`Int64`, `ULInt`→`UInt64`, `UDInt`→`UInt32`, `UInt`/`USInt`→`UInt16`, `Int`/`SInt`→`Int16`, and the IEC time types→`UInt32`; removed the stale "Int64 gap" comment.
+
+### Driver.TwinCAT-003
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `AdsTwinCATClient.cs:264-281`, `283-300` |
+| Status | Resolved |
+
+**Description:** `MapToClrType` has a `_ => typeof(int)` fallthrough and `ConvertForWrite` has
+a `_ => throw NotSupportedException` fallthrough. `TwinCATDataType.Structure` is a declared
+enum member, and a config-supplied tag can carry `DataType: "Structure"` because `ParseEnum`
+in the factory accepts any enum name case-insensitively. A `Structure` tag therefore reads as
+a 4-byte `int` against whatever the symbol actually is (a UDT blob) — a garbage/out-of-bounds
+read rather than a clean rejection — while a write fails late with `NotSupportedException`.
+During discovery, `ToDriverDataType` maps `Structure` to `String`, compounding the inconsistency.
+
+**Recommendation:** Reject `Structure`-typed pre-declared tags at `InitializeAsync` /
+`TwinCATDriverFactoryExtensions.BuildTag` time with a clear error: the driver's atomic-type surface
+does not support UDT tags, and `BrowseSymbolsAsync` already correctly yields
+`DataType = null` for them.
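+
+A minimal Python sketch of configuration-time rejection. The abbreviated enum, the `build_tag`
+signature, and the use of `ValueError` (where the C# fix throws `InvalidOperationException`)
+are illustrative assumptions; only the ordering (parse `DataType` first, reject `Structure`
+before anything is read) follows the finding.
+
+```python
+from enum import Enum
+from typing import Tuple
+
+
+class TwinCATDataType(Enum):
+    # Abbreviated for the sketch; the real enum has many more members.
+    Bool = 1
+    DInt = 2
+    LInt = 3
+    Structure = 4
+
+
+def parse_data_type(name: str) -> TwinCATDataType:
+    """Case-insensitive name lookup, like the factory's ParseEnum."""
+    for member in TwinCATDataType:
+        if member.name.lower() == name.lower():
+            return member
+    raise ValueError(f"unknown TwinCAT data type {name!r}")
+
+
+def build_tag(tag_name: str, data_type: str) -> Tuple[str, TwinCATDataType]:
+    parsed = parse_data_type(data_type)  # parse DataType before anything else
+    if parsed is TwinCATDataType.Structure:
+        # Fail while loading config rather than reading a UDT blob as a 4-byte int.
+        raise ValueError(
+            f"tag {tag_name!r}: Structure (UDT) tags are not supported; "
+            "the driver exposes atomic types only"
+        )
+    return tag_name, parsed
+```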
+
+**Resolution:** Resolved 2026-05-22 — `BuildTag` now parses the `DataType` field first and rejects `TwinCATDataType.Structure` with an `InvalidOperationException` that names the tag and explains the limitation; configuration-time failure replaces the previous silent garbage read or late `NotSupportedException`.
+
+### Driver.TwinCAT-004
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Correctness & logic bugs |
+| Location | `TwinCATDataType.cs:24-27` |
+| Status | Open |
+
+**Description:** The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is
+a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds
+since 1970-01-01 (truncated to a day boundary), not "days since 1970-01-01". These types are
+also all read/written as raw `uint` and mapped to `DriverDataType.Int32` — the operator sees
+a raw counter, not a usable date or duration. The incorrect comments will misdirect the next
+implementer who tries to add proper conversion.
+
+**Recommendation:** Correct the comments to match the TwinCAT/IEC 61131-3 representation. If
+date/time semantics are intended to be exposed properly, track a follow-up to decode them to
+`DriverDataType.DateTime`; otherwise document that they surface as raw counters.
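+
+If the follow-up decodes these types, the conversion itself is small. This Python sketch
+follows the representation stated above (TIME as a millisecond duration, DATE as seconds since
+1970-01-01); treating DATE as UTC is an assumption, since the PLC value carries no time zone.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
+
+
+def decode_time(raw: int) -> timedelta:
+    """IEC TIME: a 32-bit duration in milliseconds, not a point in time."""
+    return timedelta(milliseconds=raw)
+
+
+def decode_date(raw: int) -> datetime:
+    """TwinCAT DATE: seconds since 1970-01-01, truncated to a day boundary."""
+    return UNIX_EPOCH + timedelta(seconds=raw)
+
+
+print(decode_time(90_000))      # 0:01:30
+print(decode_date(86_400 * 2))  # 1970-01-03 00:00:00+00:00
+```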
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT-005
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | OtOpcUa conventions |
+| Location | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) |
+| Status | Resolved |
+
+**Description:** The driver performs no logging. The Library Preferences section of `CLAUDE.md` mandates
+Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse
+failures (`DiscoverAsync` swallows them in a bare `catch`), notification-registration
+failures, and probe state transitions all vanish into status fields or are swallowed
+silently. Operators get a `Degraded` health string with no correlatable log trail.
+
+**Recommendation:** Inject an `ILogger`/Serilog logger and log at minimum: connect
+success/failure per device, ADS errors with code, symbol-browse fallback (the `DiscoverAsync`
+catch), native-notification registration failures, and host state transitions
+(`TransitionDeviceState`).
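+
+The pattern the fix uses (an optional logger that defaults to a do-nothing logger) has a direct
+Python analogue, sketched here with the standard `logging` module. The class, logger names, and
+message text are illustrative, not the driver's.
+
+```python
+import logging
+from typing import Dict, Optional
+
+
+def null_logger() -> logging.Logger:
+    """Rough analogue of NullLogger: records go to a NullHandler and stop there."""
+    logger = logging.getLogger("twincat_sketch.null")
+    if not logger.handlers:
+        logger.addHandler(logging.NullHandler())
+    logger.propagate = False
+    return logger
+
+
+class DriverSketch:
+    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
+        self._log = logger if logger is not None else null_logger()
+        self._states: Dict[str, str] = {}
+
+    def transition_device_state(self, host: str, new_state: str) -> None:
+        old_state = self._states.get(host, "Unknown")
+        if old_state == new_state:
+            return  # log transitions only, not every probe tick
+        self._states[host] = new_state
+        # %-style arguments are formatted lazily, only if the record is emitted.
+        self._log.info("Host %s: %s -> %s", host, old_state, new_state)
+```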
+
+**Resolution:** Resolved 2026-05-22 — added optional `ILogger<TwinCATDriver>` constructor parameter (defaults to `NullLogger`); logs connect success/failure in `EnsureConnectedAsync`, ADS read errors in `ReadAsync`, symbol-browse fallback in `DiscoverAsync`, notification-registration failures in `SubscribeAsync`, and host-state transitions in `TransitionDeviceState`.
+
+### Driver.TwinCAT-006
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `TwinCATDriver.cs:406-411` |
+| Status | Open |
+
+**Description:** `ResolveHost` falls back to `DriverInstanceId` when there are no configured
+devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier,
+not a host address; `IPerCallHostResolver` consumers expect a host key that correlates with
+`GetHostStatuses()` entries (`HostConnectivityStatus.HostName` equals
+`device.Options.HostAddress`). Returning the instance ID produces a host key that matches no
+connectivity-status row.
+
+**Recommendation:** Return a stable sentinel that will not be confused with a real host (an
+empty string or a documented unresolved marker), or document why the instance ID is the chosen
+fallback. Keep preferring the first device's `HostAddress` when one exists (already implemented).
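+
+A Python sketch of the fallback order with an explicit sentinel. The empty-string sentinel and
+the function shape are assumptions; the point is only that the last-resort value can never
+collide with a real `HostAddress` in `GetHostStatuses()`.
+
+```python
+from typing import Dict, List
+
+# Documented sentinel: never a valid HostAddress, so it matches no host-status row.
+UNRESOLVED_HOST = ""
+
+
+def resolve_host(reference: str, host_by_tag: Dict[str, str], device_hosts: List[str]) -> str:
+    if reference in host_by_tag:
+        return host_by_tag[reference]  # known tag: its device's HostAddress
+    if device_hosts:
+        return device_hosts[0]  # existing behavior: first configured device
+    return UNRESOLVED_HOST  # rather than DriverInstanceId
+```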
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT-007
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `TwinCATDriver.cs:413-429` |
+| Status | Resolved |
+
+**Description:** `EnsureConnectedAsync` is not thread-safe. `ReadAsync`, `WriteAsync`,
+`SubscribeAsync`, and the per-device `ProbeLoopAsync` background task can all call it
+concurrently for the same `DeviceState`. The sequence `device.Client ??= _clientFactory.Create()`
+followed by `await device.Client.ConnectAsync(...)` has no lock: two threads can both observe
+`device.Client` null-or-disconnected, each create/connect a client, and one
+`AdsTwinCATClient` is leaked (its `AdsClient` + `AdsNotificationEx` handler never disposed).
+Worse, on the connect-failure path one thread does `device.Client.Dispose(); device.Client = null;`
+while another thread is mid-`ConnectAsync` on that same client instance — a disposal race that
+can throw `ObjectDisposedException` or corrupt the `AdsClient`. The probe loop runs
+continuously, so this race is not hypothetical under any concurrent read/write load.
+
+**Recommendation:** Guard `EnsureConnectedAsync` per-device with a `SemaphoreSlim` (one per
+`DeviceState`), or use an async-lazy connect with proper invalidation. The S7/AB-CIP drivers
+serialize device access with a `SemaphoreSlim` — follow that pattern. Note this also
+serializes the wire, which `docs/v2/driver-specs.md` recommends for single-connection-per-PLC.
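+
+The recommended pattern, sketched in Python with `asyncio.Lock` standing in for a per-device
+`SemaphoreSlim(1, 1)`. `FakeClient` is a hypothetical stand-in that counts instances so a leak
+would be visible. The new client is published to `device.client` only after a successful
+connect, so a failed attempt never leaves a half-connected instance for another caller.
+
+```python
+import asyncio
+from typing import Optional
+
+
+class FakeClient:
+    """Hypothetical stand-in for AdsTwinCATClient; counts instances to expose leaks."""
+    created = 0
+
+    def __init__(self) -> None:
+        FakeClient.created += 1
+        self.connected = False
+
+    async def connect(self) -> None:
+        await asyncio.sleep(0.01)  # simulated connect latency
+        self.connected = True
+
+    def dispose(self) -> None:
+        self.connected = False
+
+
+class DeviceState:
+    def __init__(self) -> None:
+        self.client: Optional[FakeClient] = None
+        self.connect_gate = asyncio.Lock()  # one gate per device
+
+
+async def ensure_connected(device: DeviceState) -> FakeClient:
+    client = device.client
+    if client is not None and client.connected:
+        return client  # fast path: already connected, no lock taken
+    async with device.connect_gate:
+        # Double-check: another caller may have connected while we waited.
+        if device.client is not None and device.client.connected:
+            return device.client
+        if device.client is not None:
+            device.client.dispose()  # retire a stale, disconnected client
+            device.client = None
+        client = FakeClient()
+        try:
+            await client.connect()
+        except BaseException:
+            client.dispose()  # never published, so no other caller holds it
+            raise
+        device.client = client  # publish only after a successful connect
+        return client
+```
+
+With twenty concurrent callers against one device, only the first creates a client; the rest
+wait on the gate and then take the double-check branch.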
+
+**Resolution:** Resolved 2026-05-22 — `EnsureConnectedAsync` is now serialized per device by a `SemaphoreSlim` (`DeviceState.ConnectGate`) with a double-checked connect, mirroring the S7 driver; no client is leaked and no disposal race remains.
+
+### Driver.TwinCAT-008
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Concurrency & thread safety |
+| Location | `AdsTwinCATClient.cs:162-169`, `TwinCATDriver.cs:319-324` |
+| Status | Resolved |
+
+**Description:** Native ADS notification callbacks (`OnAdsNotificationEx`) run on the
+`AdsClient` AMS router thread. `docs/v2/driver-specs.md` section 6 explicitly calls this out
+as a code-review checklist item: "Callbacks must marshal to a managed work queue immediately
+(no driver logic on the router thread) — blocking the router thread blocks every ADS
+notification across the process." The current path invokes `reg.OnChange(...)` directly on the
+router thread, and `OnChange` is the driver lambda that calls `OnDataChange?.Invoke(this, ...)`
+— i.e. every downstream Core/OPC UA subscriber handler executes synchronously on the AMS
+router thread. A single slow consumer stalls ADS notification delivery for every tag on every
+device in the process.
+
+**Recommendation:** Marshal notification values onto a bounded `Channel`/work queue drained by
+a dedicated managed task before invoking `OnChange`/`OnDataChange`, exactly as the Galaxy
+`EventPump` does. Keep the router-thread callback to a non-blocking enqueue only.
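+
+The shape of the fix, sketched in Python with `queue.Queue` standing in for a bounded
+`Channel<T>` and a worker thread standing in for the drain task. `on_native_notification` plays
+the router-thread callback and only ever does a non-blocking `put_nowait`. Dropping the newest
+value when the queue is full is an assumption for the sketch; the resolution does not state the
+driver's full-queue policy.
+
+```python
+import queue
+import threading
+from typing import Callable
+
+
+class NotificationPump:
+    """Moves native callbacks off the router thread onto a managed worker."""
+
+    def __init__(self, on_change: Callable[[str, object], None], capacity: int = 1024) -> None:
+        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
+        self._on_change = on_change
+        self.dropped = 0  # written only by the single callback thread
+        self._worker = threading.Thread(target=self._drain, daemon=True)
+        self._worker.start()
+
+    def on_native_notification(self, symbol: str, value: object) -> None:
+        """Router-thread entry point: enqueue and return, never block."""
+        try:
+            self._queue.put_nowait((symbol, value))
+        except queue.Full:
+            self.dropped += 1  # assumed policy: drop rather than stall delivery
+
+    def _drain(self) -> None:
+        while True:
+            item = self._queue.get()
+            if item is None:  # shutdown sentinel
+                return
+            self._on_change(*item)  # a slow consumer only delays this thread
+
+    def close(self) -> None:
+        self._queue.put(None)  # may wait for space; the worker keeps draining
+        self._worker.join()
+```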
+
+**Resolution:** Resolved 2026-05-22 — `AdsTwinCATClient` now enqueues native `AdsNotificationEx` callbacks onto a bounded `Channel` drained by a dedicated managed task; the AMS router thread only does a non-blocking `TryWrite`, so a slow consumer cannot stall ADS delivery.
+
+### Driver.TwinCAT-009
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` |
+| Status | Resolved |
+
+**Description:** `ShutdownAsync` mutates `_devices`, `_tagsByName`, and `_nativeSubs` with no
+synchronization while `ReadAsync`/`WriteAsync`/`SubscribeAsync` may be iterating or indexing
+those same plain `Dictionary<>` instances on other threads (`_devices` and `_tagsByName` are
+non-concurrent dictionaries). `ShutdownAsync` calls `_devices.Clear()`/`_tagsByName.Clear()`
+concurrently with `_devices.TryGetValue` in a read — `Dictionary<>` is not safe for concurrent
+read+write and can throw or corrupt internal state. `ReinitializeAsync` makes this worse: it
+runs `ShutdownAsync` then `InitializeAsync` (which re-populates the same dictionaries) while
+in-flight reads continue. The probe loop `EnsureConnectedAsync` also touches `DeviceState`
+objects that `ShutdownAsync` is disposing — `ShutdownAsync` cancels `ProbeCts` but does not
+await the probe task before calling `DisposeClient()`.
+
+**Recommendation:** Either swap `_devices`/`_tagsByName` to `ConcurrentDictionary` and snapshot
+them on rebuild, or introduce a lifecycle lock / `volatile` running guard so reads fail fast
+with `BadServerHalted`/`BadNodeIdUnknown` once shutdown begins. Cancel and await the probe
+tasks before disposing `DeviceState`s.
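+
+A Python `asyncio` sketch of the shutdown ordering: cancel the probe, wait for it to finish,
+and only then dispose. It models the ordering only. In single-threaded `asyncio` a plain `dict`
+is already safe, so the sketch does not model the switch to `ConcurrentDictionary`.
+`DeviceState` and its fields are hypothetical.
+
+```python
+import asyncio
+from typing import Dict, Optional
+
+
+class DeviceState:
+    def __init__(self) -> None:
+        self.client_disposed = False
+        self.probe_task: Optional[asyncio.Task] = None
+
+    async def probe_loop(self) -> None:
+        while True:
+            assert not self.client_disposed, "probe touched a disposed client"
+            await asyncio.sleep(0.001)  # stand-in for a probe round-trip
+
+
+async def shutdown(devices: Dict[str, DeviceState]) -> None:
+    for device in list(devices.values()):
+        if device.probe_task is not None:
+            device.probe_task.cancel()
+            try:
+                await device.probe_task  # wait until the loop has really stopped
+            except asyncio.CancelledError:
+                pass
+        device.client_disposed = True  # safe: nothing is using the client now
+    devices.clear()
+```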
+
+**Resolution:** Resolved 2026-05-22 — swapped `_devices` and `_tagsByName` to `ConcurrentDictionary` so concurrent `TryGetValue` / `Clear` calls are safe; added `DeviceState.ProbeTask` and updated `ShutdownAsync` to cancel then `await` each probe task before disposing the client and gate, eliminating the disposal race.
+
+### Driver.TwinCAT-010
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `AdsTwinCATClient.cs:178-195` |
+| Status | Resolved |
+
+**Description:** `BrowseSymbolsAsync` checks `cancellationToken.IsCancellationRequested` and
+does `yield break` (a clean completion) rather than throwing `OperationCanceledException`.
+`DiscoverAsync` (`TwinCATDriver.cs:274`) explicitly has `catch (OperationCanceledException)
+{ throw; }` to propagate cancellation distinctly from a genuine browse failure. Because the
+client never throws on cancellation, a cancelled discovery silently completes as if the
+symbol table were fully and successfully walked — the address space is built from a partial
+symbol set with no indication it was truncated. The `SymbolLoaderFactory.Create` /
+`loader.Symbols` enumeration itself is also not cancellable.
+
+**Recommendation:** Call `cancellationToken.ThrowIfCancellationRequested()` instead of
+`yield break` so a cancelled browse surfaces as cancellation, not as a successful but partial
+discovery.
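+
+The difference between the two shapes can be shown with Python generators, where a bare
+`return` inside a generator is the analogue of C# `yield break`. `OperationCanceled` is a
+hypothetical stand-in for `OperationCanceledException`, and a `threading.Event` plays the
+cancellation token.
+
+```python
+import threading
+from typing import Iterable, Iterator
+
+
+class OperationCanceled(Exception):
+    """Hypothetical stand-in for .NET's OperationCanceledException."""
+
+
+def browse_yield_break(symbols: Iterable[str], cancel: threading.Event) -> Iterator[str]:
+    """Flawed shape: cancellation ends the sequence as if it were complete."""
+    for symbol in symbols:
+        if cancel.is_set():
+            return  # generator analogue of C# `yield break`
+        yield symbol
+
+
+def browse_throwing(symbols: Iterable[str], cancel: threading.Event) -> Iterator[str]:
+    """Fixed shape: cancellation surfaces as an exception the caller can tell apart."""
+    for symbol in symbols:
+        if cancel.is_set():
+            raise OperationCanceled()  # analogue of ThrowIfCancellationRequested()
+        yield symbol
+```
+
+If the caller cancels after the second of four symbols, the first shape hands back two symbols
+and no error, while the second raises, which is what the `catch (OperationCanceledException)
+{ throw; }` in `DiscoverAsync` expects.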
+
+**Resolution:** Resolved 2026-05-22 — replaced `yield break` with `cancellationToken.ThrowIfCancellationRequested()` in the `foreach` loop so a cancelled browse propagates as `OperationCanceledException`, matching the `DiscoverAsync` expectation.
+
+### Driver.TwinCAT-011
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `TwinCATStatusMapper.cs:29-42` |
+| Status | Resolved |
+
+**Description:** ADS error-code mapping has gaps and an inconsistency versus
+`docs/v2/driver-specs.md` section 6. The spec documents symbol-not-found as 0x0701
+(1793 decimal) and symbol-version-changed as 0x0702 (1794 decimal). `MapAdsError` maps
+decimal 1798 to `BadNodeIdUnknown` (symbol not found) and 1793/1794 to `BadOutOfRange`
+(invalid index group/offset). The decimal-vs-hex interpretation of the documented codes does
+not line up, so the mapper appears to treat the symbol-version-changed code as a generic range
+error. 0x0710 "Not ready / PLC in Config mode" has no entry and falls through to
+`BadCommunicationError`; the driver-spec recommends distinguishing it. And 0x0702
+symbol-version-changed is never routed to rediscovery (see Driver.TwinCAT-013).
+
+**Recommendation:** Confirm the actual `AdsErrorCode` numeric values from
+`Beckhoff.TwinCAT.Ads` (the SDK enum, not the doc hex shorthand) and align the mapper. Add an
+explicit case for symbol-version-changed routed to rediscovery, and for PLC-in-Config mapped
+to `BadOutOfService`/`BadInvalidState`.
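+
+Because the bug came from mixing hex shorthand with decimal values, a small Python table keyed
+by hex literals makes each code's decimal value explicit. Only the codes named in the
+resolution appear (the real mapper has 20 cases), and the `BadOutOfRange` entry for
+`DeviceInvalidGroup` is an assumption based on the description's account of the mapper.
+
+```python
+# Hex literals match how ADS codes are usually documented; Python prints the decimal.
+DEVICE_INVALID_GROUP = 0x0702           # 1794; the old mapper's "symbol version changed" value
+DEVICE_SYMBOL_VERSION_INVALID = 0x0711  # 1809
+DEVICE_INVALID_STATE = 0x0712           # 1810; PLC in Config mode
+
+STATUS_BY_ADS_CODE = {
+    DEVICE_INVALID_GROUP: "BadOutOfRange",  # assumed: invalid index group
+    DEVICE_SYMBOL_VERSION_INVALID: "BadInvalidState",
+    DEVICE_INVALID_STATE: "BadInvalidState",
+}
+
+
+def map_ads_error(code: int) -> str:
+    """Partial mapper: unknown codes fall through to BadCommunicationError."""
+    return STATUS_BY_ADS_CODE.get(code, "BadCommunicationError")
+
+
+for code in (0x0702, 0x0711, 0x0712):
+    print(f"0x{code:04X} = {code} -> {map_ads_error(code)}")
+```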
+
+**Resolution:** Resolved 2026-05-22 — confirmed all codes from `Beckhoff.TwinCAT.Ads` 7.0.172 `AdsErrorCode` enum. Rewrote `MapAdsError` with 20 explicit cases keyed to the correct decimal values. Fixed the critical bug: `AdsSymbolVersionChanged` was `0x0702u` (= `DeviceInvalidGroup`) but the actual `DeviceSymbolVersionInvalid` is 1809 (0x0711); corrected constant and updated all comments. Added `BadOutOfService` for `DeviceNotReady` (PLC not running) and `BadInvalidState` for `DeviceInvalidState` (PLC in Config mode, 0x0712) and `DeviceSymbolVersionInvalid` (0x0711). Added `BadOutOfService`/`BadInvalidState` OPC UA StatusCode constants to the mapper.
+
+### Driver.TwinCAT-012
+
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Performance & resource management |
+| Location | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` |
+| Status | Resolved |
+
+**Description:** `GetMemoryFootprint()` returns a hard-coded 0. `docs/v2/driver-stability.md`
+section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the
+footprint to reflect "bytes attributable to their own caches (symbol cache, subscription
+items, queued operations)", and section 6 of `driver-specs.md` explicitly identifies cached
+symbol info as "the largest in-driver allocation" for TwinCAT and ties `FlushOptionalCachesAsync`
+to flushing it. Reporting 0 means Core allocation-slope detection and cache-budget enforcement
+are blind to this driver, and `FlushOptionalCachesAsync` is a no-op. (Note: the current
+`BrowseSymbolsAsync` does not retain a symbol cache — it streams and discards — so
+re-discovery re-downloads the whole symbol table each time, itself a performance concern for
+`EnableControllerBrowse` deployments.)
+
+**Recommendation:** Either implement an actual symbol cache + report its size via
+`GetMemoryFootprint()` and flush it in `FlushOptionalCachesAsync`, or, if the
+stream-and-discard design is intentional, report the real footprint of `_nativeSubs` /
+`_tagsByName` and document that the driver holds no flushable cache.
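+
+The formula adopted in the resolution charges a fixed size per entry, so the reported footprint
+grows linearly with configured tags and native subscriptions. A Python rendering with a worked
+example (1,000 tags and 200 native subscriptions are illustrative counts):
+
+```python
+TAG_ENTRY_BYTES = 256   # per-tag charge from the resolution's formula
+NATIVE_SUB_BYTES = 512  # per-native-subscription charge
+
+
+def memory_footprint(tag_count: int, native_sub_count: int) -> int:
+    return tag_count * TAG_ENTRY_BYTES + native_sub_count * NATIVE_SUB_BYTES
+
+
+print(memory_footprint(1000, 200))  # 358400 bytes, i.e. 350 KiB
+```
+
+Because the per-entry charges are fixed rather than measured, the figure tracks growth in entry
+counts but not variation in per-entry size, such as unusually long symbol paths.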
+
+**Resolution:** Resolved 2026-05-22 — `GetMemoryFootprint()` now returns `(_tagsByName.Count * 256L) + (_nativeSubs.Count * 512L)`; documented that the driver has no flushable symbol cache (stream-and-discard design) so `FlushOptionalCachesAsync` remains a documented no-op.
+
+### Driver.TwinCAT-013
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Design-document adherence |
+| Location | `TwinCATDriver.cs:11-12` (capability list), whole file |
+| Status | Resolved |
+
+**Description:** `TwinCATDriver` does not implement `IRediscoverable`. Both
+`docs/v2/driver-specs.md` section 6 and `docs/v2/driver-stability.md` section "TwinCAT — Deep
+Dive" state this as the defining TwinCAT failure mode: "Symbol-version-changed (0x0702) is
+the unique TwinCAT failure mode... The driver must catch 0x0702, mark its symbol cache
+invalid, re-upload symbols, rebuild the address space subtree... Treat this as a
+`IRediscoverable` invocation, not as a connection error." The `IRediscoverable` XML doc names
+TwinCAT symbol-version-changed as a canonical example. The current driver maps the error to a
+generic `BadOutOfRange`/`BadCommunicationError` quality code and never re-runs discovery, so
+after a PLC program re-download every symbol handle and notification silently goes stale with
+no address-space rebuild.
+
+**Recommendation:** Implement `IRediscoverable`; detect the symbol-version-changed ADS error
+on read/write/notification paths, raise `OnRediscoveryNeeded` with a scoped reason string, and
+re-establish native notifications after the Core re-runs `DiscoverAsync`. This is explicitly
+part of the documented driver contract, not optional.
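+
+A Python sketch of the detect-and-forward flow, with plain callback lists standing in for C#
+events. It uses the symbol-version code as corrected in Driver.TwinCAT-011
+(`DeviceSymbolVersionInvalid`, 0x0711 / 1809). `ClientSketch`, `DriverSketch`, `next_error`, and
+the reason text are hypothetical; the failed read still raises in the sketch, and how the real
+driver reports that read's status is not modeled.
+
+```python
+from typing import Callable, List
+
+DEVICE_SYMBOL_VERSION_INVALID = 0x0711  # 1809, per Driver.TwinCAT-011
+
+
+class AdsError(Exception):
+    def __init__(self, code: int) -> None:
+        super().__init__(f"ADS error 0x{code:04X}")
+        self.code = code
+
+
+class ClientSketch:
+    """Fake ADS client; set next_error to make the next read fail."""
+
+    def __init__(self) -> None:
+        self.next_error = 0
+        self.symbol_version_changed: List[Callable[[str], None]] = []
+
+    def read(self, symbol: str) -> int:
+        code, self.next_error = self.next_error, 0
+        if code == DEVICE_SYMBOL_VERSION_INVALID:
+            for handler in self.symbol_version_changed:
+                handler(symbol)  # signal before surfacing the error
+        if code:
+            raise AdsError(code)
+        return 0
+
+
+class DriverSketch:
+    def __init__(self, client: ClientSketch) -> None:
+        self.rediscovery_needed: List[Callable[[str], None]] = []
+        client.symbol_version_changed.append(self._forward)
+
+    def _forward(self, symbol: str) -> None:
+        reason = f"TwinCAT symbol version changed (seen on {symbol})"
+        for handler in self.rediscovery_needed:
+            handler(reason)
+```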
+
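+**Resolution:** Resolved 2026-05-22 — `TwinCATDriver` implements `IRediscoverable`; `AdsTwinCATClient` detects the ADS symbol-version-changed error on read/write paths and raises `OnSymbolVersionChanged`, which the driver forwards as `OnRediscoveryNeeded` so Core rebuilds the address space. The code was first recorded here as 0x0702; Driver.TwinCAT-011 corrected the constant to `DeviceSymbolVersionInvalid` (0x0711, 1809).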
+
+### Driver.TwinCAT-014
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Design-document adherence |
+| Location | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` |
+| Status | Open |
+
+**Description:** Several drifts between the implemented config surface and
+`docs/v2/driver-specs.md` section 6. The spec's connection-settings list has separate `Host`
+(IP), `AmsNetId`, and `AmsPort` fields; the implementation collapses these into a single
+`HostAddress` string parsed as `ads://{netId}:{port}`, so the target device IP has no home
+field. `TwinCATProbeOptions.Timeout` (`TwinCATDriverOptions.cs:61`) is never read anywhere —
+the probe path connects via `_options.Timeout` — a dead config field. The spec lists
+`NotificationMaxDelayMs`; the code hard-codes max-delay 0 in `NotificationSettings`
+(`AdsTwinCATClient.cs:145`) with no config knob.
+
+**Recommendation:** Reconcile the driver-spec doc with the implemented `TwinCATDriverOptions`
+shape (the doc is DRAFT, so updating it is acceptable). Remove or wire up
+`TwinCATProbeOptions.Timeout`. Expose `NotificationMaxDelayMs` if batching control is wanted.
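+
+To make the collapsed shape concrete, this Python sketch splits a `HostAddress` of the form
+`ads://{netId}:{port}` into its two parts. The validation rules (six-octet Net ID, integer port)
+and the example address are assumptions. Nothing in the string is guaranteed to be the device's
+IP address; a Net ID often resembles one, but only by convention.
+
+```python
+from typing import NamedTuple
+
+
+class AmsTarget(NamedTuple):
+    net_id: str
+    port: int
+
+
+def parse_host_address(host_address: str) -> AmsTarget:
+    """Split an ads://{netId}:{port} string into Net ID and AMS port."""
+    prefix = "ads://"
+    if not host_address.startswith(prefix):
+        raise ValueError(f"expected ads://{{netId}}:{{port}}, got {host_address!r}")
+    net_id, sep, port = host_address[len(prefix):].rpartition(":")
+    if not sep or len(net_id.split(".")) != 6 or not port.isdigit():
+        raise ValueError(f"malformed AMS address {host_address!r}")
+    return AmsTarget(net_id, int(port))
+```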
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT-015
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `TwinCATDriver.cs:431-432` |
+| Status | Open |
+
+**Description:** `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()` —
+sync-over-async. The Galaxy section of `docs/v2/driver-stability.md` explicitly lists "sync-over-async
+on the OPC UA stack thread" among the four 2026-04-13 stability findings that had to be
+closed. `DisposeAsync` calls `ShutdownAsync`, which awaits `_poll.DisposeAsync()` and disposes
+clients; if `Dispose()` is ever called on a thread with a single-threaded synchronization
+context (the OPC UA stack), `GetResult()` can deadlock.
+
+**Recommendation:** Make `Dispose()` perform a genuinely synchronous teardown. The operations
+here — cancelling token sources, disposing clients, clearing dictionaries — are all
+synchronous, and `PollGroupEngine.DisposeAsync` completes synchronously, so factor the
+synchronous teardown out so `Dispose()` does not block on a `Task`.
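+
+The refactor has a direct Python analogue: put the teardown that needs no awaiting in one
+synchronous method and call it from both the synchronous and the asynchronous close paths, so
+the synchronous path never blocks on a pending task. `DriverSketch` and its fields are
+hypothetical, and the flags stand in for real cancellation and disposal.
+
+```python
+import threading
+from typing import Dict
+
+
+class DriverSketch:
+    def __init__(self) -> None:
+        self._probe_cancel = threading.Event()
+        self.clients_disposed = False
+        self._tags: Dict[str, object] = {"MAIN.x": object()}
+
+    def _teardown_sync(self) -> None:
+        """All of the teardown, with no awaits anywhere."""
+        self._probe_cancel.set()      # cancel background work
+        self.clients_disposed = True  # dispose clients
+        self._tags.clear()            # clear lookup tables
+
+    def close(self) -> None:
+        # Synchronous path (C# Dispose): no event loop, no blocking wait on a task.
+        self._teardown_sync()
+
+    async def aclose(self) -> None:
+        # Asynchronous path (C# DisposeAsync): same teardown; anything that
+        # genuinely needs awaiting would go here, after the synchronous part.
+        self._teardown_sync()
+```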
+
+**Resolution:** _(open)_
+
+### Driver.TwinCAT-016
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Testing coverage |
+| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` |
+| Status | Open |
+
+**Description:** Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write,
+native notifications, symbol browse, and the capability surface. Gaps tied to the findings
+above: no test exercises `ReinitializeAsync` with a changed config (Driver.TwinCAT-001 would
+have been caught); no concurrency test drives `ReadAsync`/`WriteAsync`/probe against one
+device simultaneously (Driver.TwinCAT-007/009); no test covers the symbol-version-changed to
+rediscovery path (Driver.TwinCAT-013, unimplemented at review time); no test covers a
+`Structure`-typed pre-declared tag (Driver.TwinCAT-003); no test asserts 64-bit `LInt`/`ULInt` round-trip
+without truncation (Driver.TwinCAT-002).
+
+**Recommendation:** Add unit tests for the above paths once the corresponding findings are
+addressed, especially a concurrency stress test for `EnsureConnectedAsync` and a
+`ReinitializeAsync`-applies-new-config test.
+
+**Resolution:** _(open)_
@@ -0,0 +1,392 @@
+# Code Reviews
+
+<!-- GENERATED FILE - do not edit by hand. Regenerate with: python code-reviews/regen-readme.py -->
+
+Cross-module code review index for the OtOpcUa server codebase (`lmxopcua`). The review process is defined in [../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).
+
+Each module's `findings.md` is the source of truth; this file is generated from them by `regen-readme.py` and must not be edited by hand.
+
+## Module status
+
+| Module | Reviewer | Date | Commit | Status | Open | Total |
+|---|---|---|---|---|---|---|
+| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 13 |
+| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 7 |
+| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 10 |
+| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
+| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 11 |
+| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
+| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
+| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 8 |
+| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 11 |
+| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 12 |
+| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 11 |
+| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 13 |
+| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 15 |
+| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
+| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 13 |
+| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 7 |
+| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 6 |
+| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 12 |
+| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 5 |
+| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 4 | 14 |
+| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
+| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 10 |
+| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 12 |
+| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 3 | 9 |
+| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 8 |
+| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 2 | 15 |
+| [Driver.S7](Driver.S7/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 14 |
+| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 4 | 7 |
+| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 16 |
+| [Driver.TwinCAT.Cli](Driver.TwinCAT.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
+| [Server](Server/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 15 |
+
+## Pending findings
+
+Findings with status `Open` or `In Progress`, ordered by severity.
+
+| ID | Severity | Category | Location | Description |
+|---|---|---|---|---|
+| Admin-010 | Low | OtOpcUa conventions | `Components/App.razor:9,16` | `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A… |
+| Admin-011 | Low | Concurrency & thread safety | `Hubs/FleetStatusPoller.cs:24-26,98-103` | `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()`… |
+| Admin-012 | Low | Design-document adherence | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` | `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column… |
+| Analyzers-002 | Low | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` | `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all — callers pass `IReadOnlyList<...>` / `IAlarmSubscript… |
+| Analyzers-003 | Low | Error handling & resilience | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` | `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently ski… |
+| Analyzers-004 | Low | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` | `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display… |
+| Analyzers-005 | Low | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` | `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all… |
+| Analyzers-007 | Low | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` | The `<remarks>` block states the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for the guarded-call detection (`ImplementsGuardedInterface` uses symbols), but the wrap… |
+| Client.CLI-002 | Low | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` | The summary computes `neverWentBad` as every target whose node-id key is absent from the `everBad` dictionary. A node that received no update at all is also absent from `everBad`, so it is counted in `neverWentBad` and printed under the he… |
+| Client.CLI-003 | Low | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` | Numeric command options accept any value with no range validation. `--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently d… |
+| Client.CLI-004 | Low | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` | `SubscribeCommand` is the only command in the module whose constructor and all `[CommandOption]` properties have no XML doc comments. Every other command (`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`, `… |
+| Client.CLI-006 | Low | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` | Operator input-format errors surface as raw .NET exceptions rather than clean CLI errors. An unparseable start/end value throws `FormatException` straight out of `DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentExcept… |
+| Client.CLI-007 | Low | Performance & resource management | `CommandBase.cs:112-123` | `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a logger, and assigns it to the static `Log.Logger` without disposing the previously assigned logger. For a single CLI invocation this leaks at most one logger and the… |
+| Client.CLI-008 | Low | Documentation & comments | `docs/Client.CLI.md:158-217` | `docs/Client.CLI.md` is stale relative to the code at this commit. (1) The `subscribe` command section documents only `-n` and `-i`, but the code (`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`… |
+| Client.CLI-009 | Low | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` | Both long-running commands attach an event handler (`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach it. Because the handler closes over `console`, the captured console and the closure remain refere… |
+| Client.CLI-010 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` | The new `SubscribeCommand` capabilities are largely untested. The four `SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel, disconnect-in-finally, and the subscription message. There is no test for the `--recurs… |
+| Client.Shared-003 | Low | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` | `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a servic… |
+| Client.Shared-004 | Low | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` | `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchron… |
+| Client.Shared-009 | Low | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` | `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally `return StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAs… |
+| Client.Shared-010 | Low | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` | `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call p… |
+| Client.Shared-011 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` | The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race… |
+| Client.UI-003 | Low | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` | The csproj references `Serilog` and `Serilog.Sinks.Console`, and `docs/Client.UI.md` lists Serilog as the logging technology, but no source file in the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's `LogToTrace()` and th… |
+| Client.UI-004 | Low | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` | `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is obsolete in Avalonia 11.x (the version pinned in the csproj). The supported replacement is the `StorageProvider` API (`StorageProvider.OpenFolderPickerAsync`). Using the obsolete… |
+| Client.UI-006 | Low | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` | Many catch blocks swallow exceptions silently with an empty body and only a comment (`// Redundancy info not available`, `// Subscribe failed`, `// Subscription failed; no item added`, and others). When a subscribe, alarm-subscribe, or red… |
+| Client.UI-009 | Low | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` | `HistoryViewModel.AggregateTypes` exposes eight entries: `null` (Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`. `docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average, Minimum, Maxi… |
+| Client.UI-010 | Low | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` | `DateTimeRangePicker` declares `MinDateTimeProperty` / `MaxDateTimeProperty` styled properties with public CLR accessors, but neither is read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and `OnEndLostFocus` never clamp… |
+| Client.UI-011 | Low | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` | The certificate-store-path `TextBox` watermark reads `(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208 folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now `{LocalAppData}/OtOpcUaClient/`… |
+| Configuration-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` | `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint`→`int` conversion. Only bits 0–11 are used… |
+| Configuration-005 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` | `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructo… |
+| Configuration-007 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` | `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes ins… |
+| Configuration-010 | Low | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` | On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying serve… |
+| Configuration-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` | The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test… |
+| Core-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` | `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. T… |
+| Core-008 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception fro… |
+| Core-009 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` | `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path i… |
+| Core-010 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` | `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will m… |
+| Core-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` | `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for determ… |
+| Core-012 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` | Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is doc… |
+| Core.Abstractions-004 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` | `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` diction… |
+| Core.Abstractions-005 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` | Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstracti… |
+| Core.Abstractions-006 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` | The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfac… |
+| Core.Abstractions-007 | Low | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` | `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are… |
+| Core.Abstractions-008 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` | Two XML-doc inaccuracies: 1. `DriverHealth.LastError` is documented as "Most recent error message; null when state is Healthy." The `DriverState` enum also defines `Degraded`, `Reconnecting`, and `Faulted` states, all of which carry an err… |
+| Core.AlarmHistorian-008 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` | Each `EnqueueAsync` (one per alarm transition — a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The un… |
+| Core.AlarmHistorian-011 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` | Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `doc… |
+| Core.ScriptedAlarms-003 | Low | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` | `docs/ScriptedAlarms.md` (Composition step 3) and the `OnUpstreamChange` comment ("Fire-and-forget so driver-side dispatch isn't blocked", lines 225-226) describe the `OnEvent` emission path as non-blocking / fire-and-forget. In the code, `… |
+| Core.ScriptedAlarms-006 | Low | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` | `OnUpstreamChange` and `RunShelvingCheck` both launch fire-and-forget tasks (`_ = ReevaluateAsync(...)`, `_ = ShelvingCheckAsync(...)`) with `CancellationToken.None`. There is no tracking of these in-flight tasks, so `Dispose` cannot await… |
+| Core.ScriptedAlarms-008 | Low | Performance & resource management | `Part9StateMachine.cs:261-268` | `AppendComment` copies the entire existing comment list into a new `List` on every audit-producing transition (ack, confirm, shelve, unshelve, enable, disable, add-comment, auto-unshelve). The `Comments` list is append-only and unbounded —… |
+| Core.ScriptedAlarms-009 | Low | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` | `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently,… |
+| Core.ScriptedAlarms-010 | Low | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` | Quality handling is inconsistent across the three places that inspect a `DataValueSnapshot.StatusCode`. `AreInputsReady` (engine, line 333) treats only outright Bad (bit 31) as not-ready, so an Uncertain-quality input is fed to the predica… |
+| Core.ScriptedAlarms-011 | Low | Code organization & conventions | `Part9StateMachine.cs:275` | `TransitionResult.NoOp(state, reason)` takes a `reason` string parameter that is documented in the calling code as a diagnostic ("disabled — predicate result ignored", "already acknowledged", etc.) but the factory method silently discards… |
+| Core.Scripting-005 | Low | Correctness & logic bugs | `DependencyExtractor.cs:97` | A raw (triple-quoted) string literal passed as the tag path tokenizes as `SingleLineRawStringLiteralToken` / `MultiLineRawStringLiteralToken`, not `StringLiteralToken`. The check `literal.Token.IsKind(SyntaxKind.StringLi… |
+| Core.Scripting-006 | Low | Concurrency & thread safety | `CompiledScriptCache.cs:55` | On a failed compile the `catch` block calls `_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for the same bad source, both observe the same faulted `Lazy` and throw, and both call `TryRemove(key)`. If a… |
+| Core.Scripting-008 | Low | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` | `CompiledScriptCache` has no capacity bound (acknowledged in the class remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>` delegate, which keeps the dynamically emitted script assembly loaded for the pr… |
+| Core.Scripting-009 | Low | Design-document adherence | `ForbiddenTypeAnalyzer.cs:45` | The Phase 7 plan decision #6 (`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually denies a broader set — `System.Th… |
+| Core.Scripting-011 | Low | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` | Two source files have no direct test coverage: `ScriptContext` (`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the `Argu… |
+| Core.VirtualTags-004 | Low | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` | `CoerceResult`'s switch has a default arm (`_ => raw`) that returns the script's raw return value uncoerced for any `DriverDataType` not in the explicit list (e.g. an array type, Byte, or a future enum member). The resulting `DataValueSnap… |
+| Core.VirtualTags-006 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` | `Subscribe` does `_observers.GetOrAdd(path, _ => [])` then `lock (list) { list.Add(observer); }`. When `Unsub.Dispose` removes the last observer, the now-empty List is left in `_observers` and the dictionary entry is never removed. For a l… |
+| Core.VirtualTags-007 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` | `Tick` calls `_engine.EvaluateOneAsync(p, _cts.Token).GetAwaiter().GetResult()`, blocking the `System.Threading.Timer` callback thread (a thread-pool thread) for the full duration of the evaluation. Because `EvaluateInternalAsync` serialis… |
+| Core.VirtualTags-009 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:64-65`, `:72-73` | `DirectDependencies` and `DirectDependents` allocate a fresh empty `HashSet<string>` on every call for an unregistered node. `DirectDependents` is called inside the `TopologicalSort` Kahn loop and the `CascadeAsync` DFS, so for a graph wit… |
+| Core.VirtualTags-010 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/ITagUpstreamSource.cs:18`, `VirtualTagContext.cs:30`, `VirtualTagDefinition.cs:28` | Several XML docs reference component names that do not exist in the codebase. The `ITagUpstreamSource` XML doc says the subscription path "feeds the engine's ChangeTriggerDispatcher", but there is no ChangeTriggerDispatcher; the actual path is `… |
+| Core.VirtualTags-011 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:404-409` | `VirtualTagState` records a Writes set (the `ctx.SetVirtualTag` targets extracted by `DependencyExtractor`), but nothing in the engine reads it: it is captured at `Load` and never used. Declared write targets are not validated against th… |
+| Core.VirtualTags-013 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:266-270` | `DependencyCycleException.BuildMessage` renders each cycle as `string.Join(" -> ", c) + " -> " + c[0]`, presenting the SCC member list as a traversable edge path that loops back to its first element. Tarjan's algorithm returns the members… |
+| Driver.AbCip-007 | Low | OtOpcUa conventions | `AbCipDriver.cs` (whole file), `AbCipAlarmProjection.cs`, `LibplctagTagRuntime.cs` | `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. The driver has no logging at all: no `ILogger`/Serilog dependency is injected or used. Failure paths instead swallow exceptions into the `_health` string (`Rea… |
+| Driver.AbCip-011 | Low | Error handling & resilience | `AbCipDriver.cs:144-152`, `AbCipDriverOptions.cs:131-143` | `InitializeAsync` only starts probe loops when `_options.Probe.Enabled` is true AND `Probe.ProbeTagPath` is non-blank. When `Probe.Enabled` is true (the default) but `ProbeTagPath` is null (also the default; the doc comment says "PR 8 wire… |
+| Driver.AbCip-012 | Low | Performance & resource management | `LibplctagTemplateReader.cs:15-35`, `AbCipDriver.cs:88-92` | `LibplctagTemplateReader` is created per `FetchUdtShapeAsync` call, and each call constructs a fresh libplctag `Tag` for the @udt pseudo-tag, initializes it (a CIP connection handshake), reads, and disposes it. There is no reuse of the `Ta… |
+| Driver.AbCip-013 | Low | Design-document adherence | `AbCipDriverOptions.cs:70-73`, `PlcFamilies/AbCipPlcFamilyProfile.cs:13-19`, `LibplctagTagRuntime.cs:16-27` | `driver-specs.md` specifies the AB CIP per-device connection settings as discrete fields: Host, Path, PlcType, TimeoutMs, AllowPacking, ConnectionSize. The implementation instead collapses host + path into a single opaque ab:// URL string… |
+| Driver.AbCip-015 | Low | Documentation & comments | `AbCipDriver.cs:9-11`, `PlcTagHandle.cs:23-27,53-58`, `AbCipTemplateCache.cs:12-15`, `IAbCipTagEnumerator.cs:6-11`, `AbCipDriverOptions.cs:21` | Numerous comments are stale relative to the commit under review. `AbCipDriver.cs:9-11` says the driver "Implements IDriver only for now" with capabilities shipping "in subsequent PRs (3-8)" while the class already implements all of them. `… |
+| Driver.AbCip.Cli-003 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` | The `OnDataChange` handler writes change lines to `console.Output` (a `TextWriter`) from the driver's poll-engine callback thread, while the command's main flow concurrently writes the "Subscribed to ... Ctrl+C to stop." line on the CLI th… |
+| Driver.AbCip.Cli-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` | `--interval-ms` (`IntervalMs`) is taken verbatim and passed as `TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A zero or negative value produces a non-positive `TimeSpan`; the option description claims "Poll… |
+| Driver.AbCip.Cli-005 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` | `ConfigureLogging` assigns a freshly created Serilog logger to the process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived one-shot command (`probe`, `read`, `write`) the process exit flushes the console sink,… |
+| Driver.AbCip.Cli-006 | Low | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` | `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout` property with a getter derived from `TimeoutMs` and an empty `init` body (`init { /* driven by TimeoutMs */ }`). Because the override has no `[CommandOption]` attribute,… |
+| Driver.AbCip.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for `AbCipCommandBase.BuildOptions` (the flag-to-`AbCipDriverOptions` mapping that all four commands d… |
+| Driver.AbCip.Cli-008 | Low | Documentation & comments | `docs/Driver.AbCip.Cli.md:8-9` | `docs/Driver.AbCip.Cli.md` opens with "Second of four driver test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four" contradicts the chain that follows it (five names) and contradicts `docs/DriverClis.md`, whic… |
+| Driver.AbLegacy-005 | Low | OtOpcUa conventions | `AbLegacyDriver.cs` (whole file) | The driver uses no `ILogger`/Serilog at all. Probe-loop failures, runtime initialisation failures, libplctag non-zero statuses, and read/write exceptions are folded into `DriverHealth.Detail` strings but never logged. CLAUDE.md names Seril… |
+| Driver.AbLegacy-011 | Low | Performance & resource management | `AbLegacyDriver.cs:440` | `Dispose()` is implemented as `DisposeAsync().AsTask().GetAwaiter().GetResult()` (sync-over-async). `ShutdownAsync` awaits `_poll.DisposeAsync()` (which completes synchronously) and does no other real async work, so a deadlock is unlikely… |
+| Driver.AbLegacy-013 | Low | Code organization & conventions | `AbLegacyDriver.cs:340-345`, `AbLegacyDriver.cs:238-264` | Two minor organisational issues: 1. `ResolveHost` returns `_options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId` when the reference is unknown and no devices are configured. `DriverInstanceId` is not a host address (ab://...)… |
+| Driver.AbLegacy.Cli-002 | Low | Correctness & logic bugs | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` | The `--value` option help text states "booleans accept true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) and the error message also accept `on/off` and `yes/no`, and `DriverClis.md` documents the full `true/false/1/0/yes/no/on/off… |
+| Driver.AbLegacy.Cli-003 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:47-53` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` (the synchronous overload) directly from the `PollGroupEngine` poll thread. The poll engine raises change events from a background timer/loop thread, so two ticks that fire… |
+| Driver.AbLegacy.Cli-004 | Low | Error handling & resilience | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` | Every command does `await using var driver = new AbLegacyDriver(...)` *and* an explicit `await driver.ShutdownAsync(...)` in the `finally`. `AbLegacyDriver` `DisposeAsync` itself calls `ShutdownAsync`, so the driver is shut down twice on t… |
+| Driver.AbLegacy.Cli-005 | Low | Design-document adherence | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` | The subscribe command's interval option is `--interval-ms` (default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as `otopcua-ablegacy-cli subscribe ... -i 500`, which works because of the short alias `'i'`, but the doc ne… |
+| Driver.AbLegacy.Cli-006 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:20-22` | `ProbeCommand` declares its `--type` option with no short alias, while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type` with the short alias `'t'`. `ProbeCommand` also gives `--address` the alias `'a'`, matching t… |
+| Driver.AbLegacy.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file in the CLI test project covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that are pure logic (testable without a device) are uncovered: (1) `AbLegacyCommandBase.BuildOptions` — that it… |
+| Driver.Cli.Common-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` | `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the value and status columns) without guarding against empty input. When `tagNames` and `snapshots` are both empty (equal length, so the mismatch check at line 56 passes),… |
+| Driver.Cli.Common-006 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` | Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71` states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed" — true only when every snapshot has a non-null `SourceTimestampUtc`. `For… |
+| Driver.FOCAS-007 | Low | Error handling & resilience | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` | Numerous `try { ... } catch {}` blocks swallow every exception with no logging: `ShutdownAsync` (CTS cancel/dispose), `RecycleLoopAsync` (`DisposeClient`), `FixedTreeLoopAsync` transient catches, `ProbeLoopAsync`, and the alarm projection… |
+| Driver.FOCAS-008 | Low | Performance & resource management | `FocasDriver.cs:201`, `FocasDriver.cs:253` | `ReadAsync` and `WriteAsync` call `FocasAddress.TryParse(def.Address)` on every operation, even though `InitializeAsync` already parsed and validated every tag address. On a subscription hot path (each poll tick re-enters `ReadAsync`) this… |
+| Driver.FOCAS-009 | Low | Design-document adherence | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` | `FocasProbeOptions.Timeout` is parsed by the factory (`FocasProbeDto.TimeoutMs` to `FocasProbeOptions.Timeout`) but never consumed. `ProbeLoopAsync` calls `client.ProbeAsync(ct)` with only the probe-loop cancellation token; no per-probe ti… |
+| Driver.FOCAS-010 | Low | Code organization & conventions | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) | There are two parallel operation-mode-to-text mappings with divergent labels. `FocasOpMode.ToText` (used by the driver fixed-tree `OperationMode/ModeText` node) yields `"TJOG"`, `"TEACH_IN_HANDLE"`; `FocasOperationModeExtensions.ToText` (i… |
+| Driver.FOCAS-011 | Low | Code organization & conventions | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` | `FocasAlarmType` declares its constants as `public const int`, but its only consumers, `FocasAlarmProjection.MapAlarmType(short type)` and `MapSeverity(short type)`, take a `short` parameter and `switch` against these `int` constants. It compiles… |
+| Driver.FOCAS.Cli-001 | Low | Error handling & resilience | `Commands/WriteCommand.cs:58-68` | `WriteCommand.ParseValue` parses the numeric `--value` types (`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse` / etc. These throw raw `FormatException` or `OverflowException` for malformed or out-of-range inpu… |
+| Driver.FOCAS.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` | The `subscribe` command attaches an `OnDataChange` handler that calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from the driver's `PollGroupEngine` tick thread, while the command's main flow writes the "Subscribe… |
+| Driver.FOCAS.Cli-003 | Low | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) | The numeric command options `--cnc-port`, `--timeout-ms`, and `--interval-ms` are accepted without range validation. A zero or negative `--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0` yields a zero `TimeSpan` o… |
+| Driver.FOCAS.Cli-004 | Low | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` | Every command declares `await using var driver = new FocasDriver(...)`… |
+| Driver.FOCAS.Cli-005 | Low | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) | `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and `BadCommunicationError` as the key diagnostic signals an operator reads off `probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure` after a successful co… |
+| Driver.Galaxy-005 | Low | OtOpcUa conventions | `Runtime/EventPump.cs:81-88` | The `BoundedChannelOptions` comment states "Newest-dropped policy: when full, the producer's TryWrite returns false ... We do this manually rather than relying on `BoundedChannelFullMode.DropWrite`" — but the option is then set to `FullMod… |
+| Driver.Galaxy-010 | Low | Security | `GalaxyDriver.cs:311-341` | `ResolveApiKey` supports an `env:`/`file:` indirection and otherwise treats the config string as the literal API key ("Anything else — used as the literal API key. Convenient for dev"). `GalaxyGatewayOptions`' own XML doc claims "the API k… |
+| Driver.Galaxy-012 | Low | Performance & resource management | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` | Several hot paths are O(n^2) per call. `SubscriptionRegistry.ResolveSubscribers` does `entry.Bindings.FirstOrDefault(b => b.ItemHandle == itemHandle)` — a linear scan of the whole binding list for every event dispatch; at 50k tags this is… |
+| Driver.Galaxy-013 | Low | Design-document adherence | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` | Multiple doc comments are stale relative to the shipped code. `GalaxyDriver`'s class summary still describes the file as "the project skeleton with `IDriver` bodies that wire to a future `IGalaxyGatewayClient` abstraction. Capability inter… |
+| Driver.Historian.Wonderware-004 | Low | Correctness & logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` | `ToHistorianEvent` only assigns `historianEvent.Id` when `Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID (or is empty), `Id` stays `Guid.Empty` and the event is written to the historian with an all-zeros id… |
+| Driver.Historian.Wonderware-005 | Low | Concurrency & thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` | `GetHealthSnapshot` reads `_activeProcessNode` and `_activeEventNode` inside `_healthLock`, but those two fields are written under `_connectionLock` / `_eventConnectionLock` (lines 183, 243, 209-210, 266-269) — a different lock. The health… |
+| Driver.Historian.Wonderware-007 | Low | Error handling & resilience | `Ipc/PipeServer.cs:70-75` | When `VerifyCaller` rejects the peer SID, the server logs the reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The shared-secret-mismatch and major-version-mismatch paths below it both send a rejecting `HelloAck` so… |
+| Driver.Historian.Wonderware-008 | Low | Error handling & resilience | `Backend/HistorianDataSource.cs:301-307`, `:374-380` | When `query.StartQuery` returns `false`, `ReadRawAsync` and `ReadAggregateAsync` call `HandleConnectionError()` and return an empty result list. A failed `StartQuery` is not necessarily a connection failure — it can be a bad tag name, an i… |
+| Driver.Historian.Wonderware-010 | Low | Performance & resource management | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) | `HistorianConfiguration.RequestTimeoutSeconds` is documented as the "outer safety timeout applied to sync-over-async Historian operations" and is copied around (`SdkAlarmHistorianWriteBackend.CloneConfigWithServerName:346`), but it is neve… |
+| Driver.Historian.Wonderware-011 | Low | Design-document adherence | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` | Several XML doc comments reference the retired v1 architecture as if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the Host returns these across the IPC boundary as `GalaxyDataValue`", "Populated from ... the P… |
+| Driver.Historian.Wonderware-012 | Low | Testing coverage | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` | The unit-test suite covers `HistorianQualityMapper`, `HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`, `AahClientManagedAlarmEventWriter`, the IPC round trip, and `Program` alarm-writer wiring. `HistorianDataSource` itself… |
+| Driver.Historian.Wonderware.Client-003 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` | `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but read inside `GetHealthSnapshot` under `_healthLock`, and every other counter (`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under `_hea… |
+| Driver.Historian.Wonderware.Client-004 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` | A sidecar-reported failure is recorded in two non-atomic steps under separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256), d… |
+| Driver.Historian.Wonderware.Client-006 | Low | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` | `PipeChannel.InvokeAsync` retries exactly once on transport failure and otherwise propagates. The options expose `ReconnectInitialBackoff` and `ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential backo… |
+| Driver.Historian.Wonderware.Client-008 | Low | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` | The csproj suppresses two NuGet audit advisories (`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency with no inline comment recording why the suppression is safe, who reviewed it, or when it should be re… |
+| Driver.Historian.Wonderware.Client-010 | Low | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` | Two doc/behaviour mismatches. (1) The `Dispose()` XML comment asserts the underlying channel async cleanup is non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync` calls `ResetTransport()`, which invokes… |
+| Driver.Modbus-003 | Low | Concurrency & thread safety | `ModbusDriver.cs:59,188,241,259,266,726,745,759` | `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic… |
+| Driver.Modbus-007 | Low | Design-document adherence | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` | Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is… |
+| Driver.Modbus-008 | Low | Documentation & comments | `ModbusDriver.cs:411-417,700-703,737-744` | Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `R… |
+| Driver.Modbus-009 | Low | Correctness & logic bugs | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` | Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an F… |
+| Driver.Modbus-010 | Low | Error handling & resilience | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` | When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a… |
+| Driver.Modbus-011 | Low | Code organization & conventions | `ModbusDriver.cs:23-43,89-97,408-432` | Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastW… |
+| Driver.Modbus-012 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` | The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publis… |
+| Driver.Modbus.Addressing-006 | Low | Error handling & resilience | `ModbusAddressParser.cs:297-301` | `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`. The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from `ArgumentException`), so today it is correct. But the parser… |
+| Driver.Modbus.Addressing-007 | Low | Design-document adherence | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings | `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress` has no field for it and `ModbusAddressParser` never produces or co… |
+| Driver.Modbus.Addressing-009 | Low | Documentation & comments | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` | The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6 (400001..465536-shaped) implies the leading digit is always 4, but t… |
+| Driver.Modbus.Cli-003 | Low | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` | `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value, including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts 0-255 even though the option description and `docs/Driver.Modbus.Cli.md` both say the valid rang… |
+| Driver.Modbus.Cli-004 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` | The `OnDataChange` handler is invoked from the driver's `PollGroupEngine` background thread and calls `console.Output.WriteLine` synchronously. An exception thrown inside this handler (e.g. an `IOException` on a redirected or closed stdout… |
+| Driver.Modbus.Cli-005 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` | All three commands call `ConfigureLogging()` then `console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C before `InitializeAsync` completes, the resulting `OperationCanceledException` propagates out of `ExecuteAsync`… |
+| Driver.Modbus.Cli-006 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` | `probe` reports `Health: {health.State}` from `GetHealth()`. After a successful `InitializeAsync` the driver sets state to `Healthy` regardless of whether the subsequent probe register read returns Good or a Bad status code. `ReadAsync` do… |
+| Driver.Modbus.Cli-007 | Low | Design-document adherence | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` | `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`, `HR1:I`, `C100`, `V2000:F:CDAB`, etc.) and says "set the per-tag `addressString` field instead of the… |
+| Driver.Modbus.Cli-008 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` | The test project covers only the two pure-function seams: `ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage for `WriteCommand`'s read-only-region rejection (`Region is not (Coils or HoldingRegisters)`), no… |
+| Driver.OpcUaClient-011 | Low | Documentation & comments | `OpcUaClientDriver.cs:783-784` | The comment on the `isArray` computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA `ValueRank` semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDi… |
+| Driver.OpcUaClient-014 | Low | Performance & resource management | `OpcUaClientDriver.cs:904`, `:1035` | `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When `UnsubscribeAsync` removes a subscription it calls Subs… |
+| Driver.S7-003 | Low | Correctness & logic bugs | `S7Driver.cs:172`, `S7Driver.cs:255` | `ReadAsync` and `WriteAsync` dereference `fullReferences.Count` / `writes.Count` with no null guard. A null argument throws `NullReferenceException` rather than `ArgumentNullException`, and the NRE escapes before the `_gate` is taken so it is not wrappe… |
+| Driver.S7-005 | Low | OtOpcUa conventions | `S7Driver.cs:33`, `S7Driver.cs:433` | `System.Collections.Concurrent.ConcurrentDictionary` is written out with a fully-qualified namespace at the field declarations instead of a `using System.Collections.Concurrent` directive. `ImplicitUsings` is enabled and the rest of the codebase… |
+| Driver.S7-009 | Low | Error handling & resilience | `S7Driver.cs:392` | The subscription poll loop never reflects sustained polling failure anywhere an operator can see it. `PollLoopAsync` swallows every non-cancellation exception with an empty `catch` and the comment claims "the health surface reflects it" - but… |
+| Driver.S7-010 | Low | Performance & resource management | `S7Driver.cs:504` | `Dispose()` is implemented as `DisposeAsync().AsTask().GetAwaiter().GetResult()` - sync-over-async. Inside the generic host this is currently safe (no captured `SynchronizationContext`), but it is a known deadlock pattern. The only async work be… |
+| Driver.S7-013 | Low | Code organization & conventions | `S7DriverOptions.cs:90`, `S7Driver.cs:300` | `S7TagDefinition.StringLength` is a public configured/JSON-bound parameter (default 254) but is dead: `S7DataType.String` reads and writes both throw `NotSupportedException` ("...land in a follow-up PR"), so `StringLength` is never consumed. Likew… |
+| Driver.S7.Cli-004 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` | Every command declares the driver with `await using var driver = new S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block. `S7Driver.DisposeAsync` itself calls `ShutdownAsync`, so shutdown runs twice per c… |
+| Driver.S7.Cli-005 | Low | Code organization & conventions | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` | A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` exists containing only an `obj/` folder — no `.csproj`, no source. The real test project lives at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`. The empty direct… |
+| Driver.S7.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the host / port / CPU / rack / slot / timeout flags onto an `S7DriverOptions` and forces `Probe.Enabled = fa… |
+| Driver.S7.Cli-007 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` | The Modbus CLI `SubscribeCommand` carries an explanatory comment on the `OnDataChange` handler ("Route every data-change event to the CliFx console (not System.Console — the analyzer flags it + IConsole is the testable abstraction)"). The… |
+| Driver.TwinCAT-004 | Low | Correctness & logic bugs | `TwinCATDataType.cs:24-27` | The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds since 1970-01-01 (truncated to a day boundary), not "days since 1970-… |
+| Driver.TwinCAT-006 | Low | OtOpcUa conventions | `TwinCATDriver.cs:406-411` | `ResolveHost` falls back to `DriverInstanceId` when there are no configured devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier, not a host address; `IPerCallHostResolver` consumers expect a host key… |
+| Driver.TwinCAT-014 | Low | Design-document adherence | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` | Several drifts between the implemented config surface and `docs/v2/driver-specs.md` section 6. The spec connection-settings list has separate `Host` (IP), `AmsNetId`, and `AmsPort` fields; the implementation collapses these into a single `… |
+| Driver.TwinCAT-015 | Low | Code organization & conventions | `TwinCATDriver.cs:431-432` | `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()` — sync-over-async. `docs/v2/driver-stability.md` section Galaxy explicitly lists "sync-over-async on the OPC UA stack thread" among the four 2026-04-13 stability findings… |
+| Driver.TwinCAT-016 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` | Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write, native notifications, symbol browse, and the capability surface. Gaps tied to the findings above: no test exercises `ReinitializeAsync` with a changed config (D… |
+| Driver.TwinCAT.Cli-001 | Low | Correctness & logic bugs | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` | Numeric command options are accepted without range validation. `--timeout-ms` feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative value yields `TimeSpan.Zero`/a negative `TimeSpan`, which is then… |
+| Driver.TwinCAT.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:46-58` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously. In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads` notification callback thread (see `TwinCATDriver.SubscribeAsync`, which in… |
+| Driver.TwinCAT.Cli-003 | Low | Error handling & resilience | `Commands/SubscribeCommand.cs:56-58` | The subscribe banner reports the mechanism purely from the `--poll-only` flag (`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`) states the banner "announces which mechanism is in play". The CL… |
+| Driver.TwinCAT.Cli-004 | Low | Design-document adherence | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` | `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by `browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so `UseNativeNotifications = !PollOnly` has no observable effect on a browse run. Th… |
+| Driver.TwinCAT.Cli-005 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` | The `--type` option is declared with the short alias `-t` on `read`, `write`, and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short alias. An operator who has internalised `-t` from the other three verbs… |
+| Driver.TwinCAT.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested: `TwinCATCommandBase.Gateway` (the `ads://{netId}:{port}` string the driver's `TwinCATAmsAdd… |
+| Driver.TwinCAT.Cli-007 | Low | Documentation & comments | `TwinCATCommandBase.cs:31-36` | The `Timeout` override has an empty `init` accessor with the comment `/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared `abstract { get; init; }`, the override must supply an `init`, but here it silently… |
+| Server-004 | Low | OtOpcUa conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` | `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Si… |
+| Server-006 | Low | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` | `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins… |
+| Server-008 | Low | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` | `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-o… |
+| Server-012 | Low | Performance & resource management | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` | `ProbeAsync` creates a client from `IHttpClientFactory` and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request (a per-call `CancellationToken` with a timeout) or on the named-client registration, not set per call on a factory-vended instance. |
+| Server-014 | Low | Code organization & conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` | `SealedBootstrap` claims in its XML doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never… |
+| Server-015 | Low | Documentation & comments | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` | `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are st… |
+
+## Closed findings
+
+Findings with status `Resolved`, `Won't Fix`, or `Deferred`.
+
+| ID | Severity | Status | Category | Location |
+|---|---|---|---|---|
+| Admin-001 | Critical | Resolved | Security | `Components/Routes.razor:4-11`, `Program.cs:150` |
+| Admin-002 | Critical | Resolved | Security | `Components/Pages/Clusters/NewCluster.razor:1-7`, `Home.razor`, `Fleet.razor`, `Hosts.razor`, `AlarmsHistorian.razor`, `Clusters/ClustersList.razor`, `Clusters/Generations.razor`, `Drivers/FocasDetail.razor` |
+| Core.AlarmHistorian-001 | Critical | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:255-278` |
+| Core.Scripting-001 | Critical | Resolved | Security | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` |
+| Driver.Galaxy-001 | Critical | Resolved | Error handling & resilience | `Runtime/EventPump.cs:128`, `GalaxyDriver.cs:222` |
+| Server-001 | Critical | Resolved | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791` |
+| Admin-003 | High | Resolved | Security | `Program.cs:137-139`, `Hubs/FleetStatusHub.cs:11`, `Hubs/AlertHub.cs:10`, `Hubs/ScriptLogHub.cs:30` |
+| Admin-004 | High | Resolved | Security | `appsettings.json:3,13-14` |
+| Admin-005 | High | Resolved | Correctness & logic bugs | `Components/Pages/Login.razor:15,107-110` |
+| Admin-013 | High | Resolved | Error handling & resilience | `Components/Pages/Clusters/ClusterDetail.razor:180-197`, `Components/Pages/Clusters/AclsTab.razor`, `Components/Pages/Clusters/RedundancyTab.razor`, `Components/Pages/RoleGrants.razor`, `Components/Pages/Hosts.razor`, `Components/Pages/ScriptLog.razor`, `Program.cs:157-159` |
+| Client.Shared-005 | High | Resolved | Concurrency & thread safety | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` |
+| Client.Shared-006 | High | Resolved | Concurrency & thread safety | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` |
+| Configuration-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
+| Configuration-008 | High | Resolved | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
+| Core-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/UserAuthorizationState.cs:50-68` |
+| Core-002 | High | Resolved | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/TriePermissionEvaluator.cs:24-50` |
+| Core.AlarmHistorian-002 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:99-105,386-388` |
+| Core.AlarmHistorian-004 | High | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:90,112,176,259` |
+| Core.AlarmHistorian-006 | High | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
+| Core.ScriptedAlarms-001 | High | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
+| Core.Scripting-002 | High | Resolved | Security | `ForbiddenTypeAnalyzer.cs:70` |
+| Core.VirtualTags-001 | High | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
+| Driver.AbCip-001 | High | Resolved | Correctness & logic bugs | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
+| Driver.AbCip-002 | High | Resolved | Correctness & logic bugs | `AbCipStatusMapper.cs:65-78` |
+| Driver.AbCip-003 | High | Resolved | Correctness & logic bugs | `AbCipUdtMemberLayout.cs:32-54`, `AbCipDriver.cs:426-430`, `AbCipUdtReadPlanner.cs:48` |
+| Driver.AbCip-008 | High | Resolved | Concurrency & thread safety | `AbCipDriver.cs:144-152`, `AbCipDriver.cs:169-183`, `AbCipDriver.cs:235-281` |
+| Driver.AbLegacy-001 | High | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
+| Driver.AbLegacy-006 | High | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
+| Driver.Cli.Common-001 | High | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
+| Driver.FOCAS-001 | High | Resolved | Correctness & logic bugs | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
+| Driver.FOCAS-002 | High | Resolved | Correctness & logic bugs | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
+| Driver.Galaxy-002 | High | Resolved | Correctness & logic bugs | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
+| Driver.Galaxy-008 | High | Resolved | Error handling & resilience | `GalaxyDriver.cs:264-276`, `Runtime/EventPump.cs:97-103` |
+| Driver.Historian.Wonderware-001 | High | Resolved | Correctness & logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:68`, `Backend/AahClientManagedAlarmEventWriter.cs:82-103` |
+| Driver.Historian.Wonderware.Client-001 | High | Resolved | Correctness & logic bugs | `WonderwareHistorianClient.cs:98-113` |
+| Driver.Modbus-001 | High | Resolved | Concurrency & thread safety | `ModbusDriver.cs:92,99-122` |
+| Driver.Modbus.Addressing-001 | High | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:230-235`, `DirectLogicAddress.cs:66-73` |
+| Driver.OpcUaClient-001 | High | Resolved | Correctness & logic bugs | `OpcUaClientDriver.cs:444`, `:466`, `:517`, `:540`, `:599`, `:610` |
+| Driver.OpcUaClient-002 | High | Resolved | Error handling & resilience | `OpcUaClientDriver.cs:1330-1359` |
+| Driver.OpcUaClient-003 | High | Resolved | Correctness & logic bugs | `OpcUaClientDriver.cs:644-711` |
+| Driver.OpcUaClient-004 | High | Resolved | Design-document adherence | `OpcUaClientDriver.cs:596-632`, `:789`, `OpcUaClientDriverOptions.cs` |
+| Driver.OpcUaClient-005 | High | Resolved | Concurrency & thread safety | `OpcUaClientDriver.cs:1297-1319` |
+| Driver.S7-001 | High | Resolved | Correctness & logic bugs | `S7AddressParser.cs:93`, `S7Driver.cs:231` |
+| Driver.S7-006 | High | Resolved | Concurrency & thread safety | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` |
+| Driver.S7-007 | High | Resolved | Error handling & resilience | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` |
+| Driver.S7-011 | High | Resolved | Design-document adherence | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` |
+| Driver.TwinCAT-001 | High | Resolved | Correctness & logic bugs | `TwinCATDriver.cs:41-78` |
+| Driver.TwinCAT-002 | High | Resolved | Correctness & logic bugs | `TwinCATDataType.cs:34-48`, `AdsTwinCATClient.cs:264-281` |
+| Driver.TwinCAT-007 | High | Resolved | Concurrency & thread safety | `TwinCATDriver.cs:413-429` |
+| Driver.TwinCAT-008 | High | Resolved | Concurrency & thread safety | `AdsTwinCATClient.cs:162-169`, `TwinCATDriver.cs:319-324` |
+| Driver.TwinCAT-013 | High | Resolved | Design-document adherence | `TwinCATDriver.cs:11-12` (capability list), whole file |
+| Server-002 | High | Resolved | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` |
+| Server-009 | High | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` |
+| Admin-006 | Medium | Resolved | Security | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` |
+| Admin-007 | Medium | Resolved | Design-document adherence | `Components/Pages/Clusters/NewCluster.razor:91,95-96` |
+| Admin-008 | Medium | Resolved | Error handling & resilience | `Services/ReservationService.cs:28-37` |
+| Admin-009 | Medium | Resolved | Testing coverage | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) |
+| Analyzers-001 | Medium | Resolved | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` |
+| Analyzers-006 | Medium | Resolved | Testing coverage | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` |
+| Client.CLI-001 | Medium | Resolved | Correctness & logic bugs | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` |
+| Client.CLI-005 | Medium | Resolved | Concurrency & thread safety | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` |
+| Client.Shared-001 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientService.cs:552` |
+| Client.Shared-002 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` |
+| Client.Shared-007 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientService.cs:581-622` |
+| Client.Shared-008 | Medium | Resolved | Error handling & resilience | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` |
+| Client.UI-001 | Medium | Resolved | Correctness & logic bugs | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` |
+| Client.UI-002 | Medium | Resolved | Correctness & logic bugs | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` |
+| Client.UI-005 | Medium | Resolved | Concurrency & thread safety | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` |
+| Client.UI-007 | Medium | Resolved | Security | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` |
+| Client.UI-008 | Medium | Resolved | Performance & resource management | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` |
+| Configuration-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
+| Configuration-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
+| Configuration-006 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
+| Configuration-009 | Medium | Resolved | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
+| Core-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
+| Core-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
+| Core-006 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
+| Core-007 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
+| Core.Abstractions-001 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` |
+| Core.Abstractions-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` |
+| Core.Abstractions-003 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` |
+| Core.AlarmHistorian-003 | Medium | Resolved | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` |
+| Core.AlarmHistorian-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
+| Core.AlarmHistorian-007 | Medium | Resolved | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` |
+| Core.AlarmHistorian-009 | Medium | Resolved | Design-document adherence | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` |
+| Core.AlarmHistorian-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` |
+| Core.ScriptedAlarms-002 | Medium | Resolved | Correctness & logic bugs | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
+| Core.ScriptedAlarms-004 | Medium | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
+| Core.ScriptedAlarms-005 | Medium | Resolved | Concurrency & thread safety | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
+| Core.ScriptedAlarms-007 | Medium | Resolved | Error handling & resilience | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
+| Core.ScriptedAlarms-012 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
+| Core.Scripting-003 | Medium | Resolved | Security | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
+| Core.Scripting-004 | Medium | Resolved | Correctness & logic bugs | `DependencyExtractor.cs:73` |
+| Core.Scripting-007 | Medium | Resolved | Error handling & resilience | `TimedScriptEvaluator.cs:60` |
+| Core.Scripting-010 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
+| Core.VirtualTags-002 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
+| Core.VirtualTags-003 | Medium | Resolved | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
+| Core.VirtualTags-005 | Medium | Resolved | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
+| Core.VirtualTags-008 | Medium | Resolved | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` |
+| Core.VirtualTags-012 | Medium | Resolved | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` |
+| Driver.AbCip-004 | Medium | Resolved | Correctness & logic bugs | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` |
+| Driver.AbCip-005 | Medium | Resolved | Correctness & logic bugs | `AbCipDriver.cs:124-141` |
+| Driver.AbCip-006 | Medium | Resolved | OtOpcUa conventions | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` |
+| Driver.AbCip-009 | Medium | Resolved | Concurrency & thread safety | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` |
+| Driver.AbCip-010 | Medium | Resolved | Error handling & resilience | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` |
+| Driver.AbCip-014 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` |
+| Driver.AbCip.Cli-001 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` |
+| Driver.AbCip.Cli-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` |
+| Driver.AbLegacy-002 | Medium | Resolved | Correctness & logic bugs | `AbLegacyDriver.cs:368` |
+| Driver.AbLegacy-003 | Medium | Resolved | Correctness & logic bugs | `AbLegacyAddress.cs:62-95` |
+| Driver.AbLegacy-004 | Medium | Resolved | Correctness & logic bugs | `LibplctagLegacyTagRuntime.cs:36-37` |
+| Driver.AbLegacy-007 | Medium | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` |
+| Driver.AbLegacy-008 | Medium | Resolved | Concurrency & thread safety | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` |
+| Driver.AbLegacy-009 | Medium | Resolved | Error handling & resilience | `AbLegacyDriver.cs:41-74` |
+| Driver.AbLegacy-010 | Medium | Resolved | Error handling & resilience | `AbLegacyStatusMapper.cs:26-56` |
+| Driver.AbLegacy-012 | Medium | Resolved | Design-document adherence | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` |
+| Driver.AbLegacy.Cli-001 | Medium | Resolved | Error handling & resilience | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` |
+| Driver.Cli.Common-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` |
+| Driver.Cli.Common-003 | Medium | Resolved | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
+| Driver.Cli.Common-005 | Medium | Resolved | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` |
+| Driver.FOCAS-003 | Medium | Resolved | Correctness & logic bugs | `FocasDriver.cs:71-79` |
+| Driver.FOCAS-004 | Medium | Resolved | OtOpcUa conventions | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
+| Driver.FOCAS-005 | Medium | Resolved | Concurrency & thread safety | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
+| Driver.FOCAS-006 | Medium | Resolved | Error handling & resilience | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
+| Driver.FOCAS-012 | Medium | Resolved | Testing coverage | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
+| Driver.Galaxy-003 | Medium | Resolved | Correctness & logic bugs | `Runtime/StatusCodeMap.cs:86` |
+| Driver.Galaxy-004 | Medium | Resolved | Correctness & logic bugs | `GalaxyDriver.cs:901` |
+| Driver.Galaxy-006 | Medium | Resolved | Concurrency & thread safety | `GalaxyDriver.cs:848-861` |
+| Driver.Galaxy-007 | Medium | Resolved | Concurrency & thread safety | `GalaxyDriver.cs:937-968` |
+| Driver.Galaxy-009 | Medium | Resolved | Error handling & resilience | `GalaxyDriver.cs:354-371` |
+| Driver.Galaxy-011 | Medium | Resolved | Performance & resource management | `GalaxyDriver.cs:411` |
+| Driver.Galaxy-014 | Medium | Resolved | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
+| Driver.Historian.Wonderware-002 | Medium | Resolved | Correctness & logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
+| Driver.Historian.Wonderware-003 | Medium | Resolved | Correctness & logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
+| Driver.Historian.Wonderware-006 | Medium | Resolved | Error handling & resilience | `Ipc/PipeServer.cs:120-128` |
+| Driver.Historian.Wonderware-009 | Medium | Resolved | Performance & resource management | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` |
+| Driver.Historian.Wonderware.Client-002 | Medium | Resolved | Correctness & logic bugs | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
+| Driver.Historian.Wonderware.Client-005 | Medium | Resolved | Error handling & resilience | `Ipc/FrameReader.cs:31-32` |
+| Driver.Historian.Wonderware.Client-007 | Medium | Resolved | Security | `WonderwareHistorianClient.cs:276` |
+| Driver.Historian.Wonderware.Client-009 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
+| Driver.Modbus-002 | Medium | Resolved | Correctness & logic bugs | `ModbusDriver.cs:127-186` |
+| Driver.Modbus-004 | Medium | Resolved | Performance & resource management | `ModbusDriver.cs:1468-1473` |
+| Driver.Modbus-005 | Medium | Resolved | Correctness & logic bugs | `ModbusDriver.cs:777-798,323-330` |
+| Driver.Modbus-006 | Medium | Resolved | Error handling & resilience | `ModbusDriver.cs:514-524,532-550` |
+| Driver.Modbus.Addressing-002 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:86-94` |
+| Driver.Modbus.Addressing-003 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` |
+| Driver.Modbus.Addressing-004 | Medium | Resolved | Correctness & logic bugs | `ModbusAddressParser.cs:182-194` |
+| Driver.Modbus.Addressing-005 | Medium | Resolved | Error handling & resilience | `ModbusAddressParser.cs:200-213` |
+| Driver.Modbus.Addressing-008 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` |
+| Driver.Modbus.Cli-001 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` |
+| Driver.Modbus.Cli-002 | Medium | Resolved | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` |
+| Driver.OpcUaClient-006 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientDriver.cs:1330-1359` |
+| Driver.OpcUaClient-007 | Medium | Resolved | Concurrency & thread safety | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` |
+| Driver.OpcUaClient-008 | Medium | Resolved | Error handling & resilience | `OpcUaClientDriver.cs:1092-1099` |
+| Driver.OpcUaClient-009 | Medium | Resolved | Error handling & resilience | `OpcUaClientDriver.cs:560-564` |
+| Driver.OpcUaClient-010 | Medium | Resolved | Correctness & logic bugs | `OpcUaClientDriver.cs:823-824` |
+| Driver.OpcUaClient-012 | Medium | Resolved | Security | `OpcUaClientDriver.cs:210-217` |
+| Driver.OpcUaClient-013 | Medium | Resolved | Performance & resource management | `OpcUaClientDriver.cs:436-437` |
+| Driver.OpcUaClient-015 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` |
+| Driver.S7-002 | Medium | Resolved | Correctness & logic bugs | `S7Driver.cs:350` |
+| Driver.S7-004 | Medium | Resolved | OtOpcUa conventions | `S7Driver.cs` (whole file) |
+| Driver.S7-008 | Medium | Resolved | Error handling & resilience | `S7Driver.cs:286` |
+| Driver.S7-012 | Medium | Resolved | Design-document adherence | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
+| Driver.S7-014 | Medium | Resolved | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
+| Driver.S7.Cli-001 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` |
+| Driver.S7.Cli-002 | Medium | Resolved | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` |
+| Driver.S7.Cli-003 | Medium | Resolved | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` |
+| Driver.TwinCAT-003 | Medium | Resolved | Correctness & logic bugs | `AdsTwinCATClient.cs:264-281`, `283-300` |
+| Driver.TwinCAT-005 | Medium | Resolved | OtOpcUa conventions | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) |
+| Driver.TwinCAT-009 | Medium | Resolved | Concurrency & thread safety | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` |
+| Driver.TwinCAT-010 | Medium | Resolved | Error handling & resilience | `AdsTwinCATClient.cs:178-195` |
+| Driver.TwinCAT-011 | Medium | Resolved | Error handling & resilience | `TwinCATStatusMapper.cs:29-42` |
+| Driver.TwinCAT-012 | Medium | Resolved | Performance & resource management | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` |
+| Server-003 | Medium | Resolved | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
+| Server-005 | Medium | Resolved | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
+| Server-007 | Medium | Resolved | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
+| Server-010 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
+| Server-011 | Medium | Resolved | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
+| Server-013 | Medium | Resolved | Design-document adherence | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
@@ -0,0 +1,237 @@
+# Code Review — Server
+
+| Field | Value |
+|---|---|
+| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Server` |
+| Reviewer | Claude Code |
+| Review date | 2026-05-22 |
+| Commit reviewed | `76d35d1` |
+| Status | Reviewed |
+| Open findings | 6 |
+
+## Checklist coverage
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | Server-001, Server-002, Server-003 |
+| 2 | OtOpcUa conventions | Server-004 |
+| 3 | Concurrency & thread safety | Server-005, Server-006 |
+| 4 | Error handling & resilience | Server-007, Server-008 |
+| 5 | Security | Server-009, Server-010, Server-011 |
+| 6 | Performance & resource management | Server-012 |
+| 7 | Design-document adherence | Server-013 |
+| 8 | Code organization & conventions | Server-014 |
+| 9 | Testing coverage | No issues found |
+| 10 | Documentation & comments | Server-015 |
+
+## Findings
+
+### Server-001
+| Field | Value |
+|---|---|
+| Severity | Critical |
+| Category | Correctness & logic bugs |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791` |
+| Status | Resolved |
+
+**Description:** `WriteNodeIdUnknown` calls itself unconditionally as its first statement, so the `errors[i]` assignment that follows is never reached. The recursion has no base case and overflows the stack. The helper is called from all four `HistoryRead*` overrides whenever a HistoryRead targets a node whose `NodeId` cannot be resolved to a driver full reference. Any client issuing such a HistoryRead therefore triggers an uncatchable `StackOverflowException` that terminates the process: a remotely triggerable DoS.
+
+**Recommendation:** Replace the self-call with the result-slot assignment mirroring `WriteUnsupported`/`WriteInternalError`: `results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown };` then `errors[i] = StatusCodes.BadNodeIdUnknown;`.
+
+**Resolution:** Resolved 2026-05-22 — replaced the unconditional self-call in `WriteNodeIdUnknown` with the result-slot assignment (`results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown }`), mirroring `WriteUnsupported`/`WriteInternalError`; the helper is now `internal` for testability. Regression test `DriverNodeManagerHistoryMappingTests.WriteNodeIdUnknown_returns_BadNodeIdUnknown_without_unbounded_recursion` runs the helper on a small-stack worker thread and asserts it returns promptly with `BadNodeIdUnknown`.
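
The regression-test technique described in the resolution can be sketched as follows. This is a minimal, self-contained illustration and not the actual test. `HistoryReadResultStub` and `NodeIdUnknownSketch` are hypothetical stand-ins for the SDK's `OpcHistoryReadResult`/`StatusCodes` types, and the 256 KB stack size is an assumed value. The fixed helper fills both slots without recursing. Running it on a small-stack thread means any regression to unbounded recursion fails fast.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the SDK's HistoryRead result type.
public sealed class HistoryReadResultStub
{
    public uint StatusCode { get; init; }
}

public static class NodeIdUnknownSketch
{
    // OPC UA Bad_NodeIdUnknown status code.
    public const uint BadNodeIdUnknown = 0x80340000;

    // Fixed shape: assign the result slot and the error slot; no self-call.
    public static void WriteNodeIdUnknown(HistoryReadResultStub[] results, uint[] errors, int i)
    {
        results[i] = new HistoryReadResultStub { StatusCode = BadNodeIdUnknown };
        errors[i] = BadNodeIdUnknown;
    }

    // Runs the helper on a worker thread with a deliberately small stack. A regression to
    // unbounded recursion would overflow that stack and terminate the process
    // (StackOverflowException cannot be caught), so the test run fails loudly; the Join
    // timeout guards against a hang rather than a crash.
    public static bool CompletesOnSmallStack(TimeSpan timeout, out uint error)
    {
        var results = new HistoryReadResultStub[1];
        var errors = new uint[1];
        var worker = new Thread(() => WriteNodeIdUnknown(results, errors, 0), maxStackSize: 256 * 1024);
        worker.Start();
        bool finished = worker.Join(timeout);
        error = errors[0];
        return finished && results[0] != null && results[0].StatusCode == BadNodeIdUnknown;
    }
}
```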
+
+### Server-002
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Correctness & logic bugs |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` |
+| Status | Resolved |
+
+**Description:** `IsAllowed` does `if (decision.IsAllowed) return true; return !_strictMode;`. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (the default) overrides it to `true`. The lax fallback is intended only for sessions that lack LDAP groups or whose permission tries are missing. Here, however, it also nullifies authored `NodeAcl` deny rules for fully resolved sessions. Per-tag deny ACLs have no effect until `StrictMode` is on.
+
+**Recommendation:** Distinguish "indeterminate / no grant" from "explicit deny." Fall through to `!_strictMode` only when indeterminate; an explicit deny returns `false` regardless of mode. Extend `AuthorizeDecision` with an `IsExplicitDeny` flag if needed.
+
+**Resolution:** Resolved 2026-05-22 — `AuthorizationGate.IsAllowed` now switches on the evaluator's `AuthorizationVerdict`: `Allow` returns true, `Denied` (explicit deny rule matched) returns false in both strict and lax mode, and only the indeterminate `NotGranted` case falls through to `!_strictMode`. The existing `AuthorizationVerdict.Denied` tri-state member is now honoured rather than collapsed into the lax fallback. Regression tests `ExplicitDeny_LaxMode_Denies` / `ExplicitDeny_StrictMode_Denies` / `NotGranted_LaxMode_Allows` / `NotGranted_StrictMode_Denies` in `AuthorizationGateTests` cover all four verdict×mode combinations via a fixed-verdict evaluator stub.
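
The following minimal sketch shows the verdict-to-decision mapping described in the resolution. It keeps only the tri-state verdict and the strict/lax flag. The real `AuthorizationGate.IsAllowed` also takes session and node context, which is omitted here, and `AuthorizationGateSketch` is a hypothetical name.

```csharp
using System;

public enum AuthorizationVerdict { Allow, Denied, NotGranted }

public sealed class AuthorizationGateSketch
{
    private readonly bool _strictMode;

    public AuthorizationGateSketch(bool strictMode) => _strictMode = strictMode;

    public bool IsAllowed(AuthorizationVerdict verdict) => verdict switch
    {
        AuthorizationVerdict.Allow => true,
        // An explicit deny rule matched: deny in both strict and lax mode.
        AuthorizationVerdict.Denied => false,
        // Indeterminate (no grant found): only here does lax mode fall back to allow.
        AuthorizationVerdict.NotGranted => !_strictMode,
        _ => throw new ArgumentOutOfRangeException(nameof(verdict)),
    };
}
```

The discard arm makes an unexpected enum value fail loudly rather than drift into the lax path.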
+
+### Server-003
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Correctness & logic bugs |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
+| Status | Resolved |
+
+**Description:** `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also `maxValuesPerNode` is capped against total buffer size *before* the `[startUtc, endUtc)` filter, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.
+
+**Recommendation:** Make code and doc agree on ordering (raw HistoryRead is normally ascending source-timestamp). Apply `maxValuesPerNode` to the in-window count, not the whole buffer.
+
+**Resolution:** Resolved 2026-05-22 — corrected XML doc from "newest-first" to "oldest-first (ascending source timestamp, matching OPC UA Part 11 §6.4 raw-values default)"; moved `maxValuesPerNode` cap inside the time-window loop so the limit applies only to in-window results, not the whole buffer snapshot.
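
The corrected ordering and cap placement can be sketched as a single pass over an oldest-to-newest snapshot. `Sample` and `RingBufferReadSketch` are hypothetical. The half-open `[startUtc, endUtc)` window follows the description. Treating `maxValuesPerNode = 0` as "no limit" is an assumption borrowed from the OPC UA `NumValuesPerNode` convention; the finding does not state it.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct Sample(DateTime SourceUtc, double Value);

public static class RingBufferReadSketch
{
    // snapshot must be ordered oldest-to-newest, as TagRingBuffer.Snapshot() is described.
    public static List<Sample> ReadRaw(
        IReadOnlyList<Sample> snapshot, DateTime startUtc, DateTime endUtc, int maxValuesPerNode)
    {
        var result = new List<Sample>();
        foreach (var sample in snapshot)
        {
            // Filter first: only samples inside the half-open window [startUtc, endUtc) count.
            if (sample.SourceUtc < startUtc || sample.SourceUtc >= endUtc) continue;
            result.Add(sample);

            // Then cap: the limit applies to in-window samples, not to the whole buffer.
            // Assumption: 0 means "no limit", following OPC UA's NumValuesPerNode convention.
            if (maxValuesPerNode > 0 && result.Count >= maxValuesPerNode) break;
        }
        return result; // ascending source timestamp (oldest-first)
    }
}
```

Consider a buffer holding one sample per minute at minutes 0 through 4, a window of [1, 4) and a cap of 2. This sketch returns the samples at minutes 1 and 2. The old order (cap first, then filter) would have kept buffer entries 0 and 1 and returned only the sample at minute 1.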
+
+### Server-004
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | OtOpcUa conventions |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
+| Status | Open |
+
+**Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Since the ctor passes only `userName` to base, `DisplayName` resolves to the username — so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username, not the LDAP-resolved display name the comment promises. `Display` is dead code.
+
+**Recommendation:** Drop `Display`; set the base `DisplayName = displayName ?? userName;`. Verify `ResolveCallUser` yields the resolved display name.
+
+**Resolution:** _(open)_
+
+### Server-005
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Concurrency & thread safety |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
+| Status | Resolved |
+
+**Description:** `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` → `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synchronise against an in-flight `Invoke`. The service is process-shared across drivers, so a transition can dispatch to a `ConditionSink` whose `DriverNodeManager` is concurrently being disposed → `ReportEvent` on a torn-down node manager.
+
+**Recommendation:** Guard `OnAlarmServiceTransition` with a `_disposed` check under `Lock` before `sink.OnTransition`. Document that handlers must tolerate invocation during their owner's disposal.
+
+**Resolution:** Resolved 2026-05-22 — added `_nodeManagerDisposed` field; `Dispose(bool)` now sets it under `Lock` before detaching the handler; `OnAlarmServiceTransition` checks the flag under the same `Lock` and exits early, preventing forwarding to a sink after the node manager has begun disposal.
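
One way to implement the dispose guard is sketched below. All names here are hypothetical stand-ins:

- `AlarmServiceStub` stands in for the process-shared alarm service.
- `NodeManagerSketch` stands in for `DriverNodeManager`.
- The `forward` callback stands in for `ConditionSink.OnTransition` → `alarm.ReportEvent`.
- `_lock` stands in for the node manager's `Lock`.

This sketch forwards while still holding the lock, so `Dispose` cannot set the flag until an in-flight forward finishes. The finding does not say whether the real handler does the same. `CaptureHandlers` models a dispatch that read the handler list just before `Dispose` detached it.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the process-shared alarm service.
public sealed class AlarmServiceStub
{
    public event Action<string>? TransitionRaised;

    public void Raise(string alarmId) => TransitionRaised?.Invoke(alarmId);

    // Models a dispatch that read the handler list just before a subscriber detached.
    public Action<string>? CaptureHandlers() => TransitionRaised;
}

public sealed class NodeManagerSketch : IDisposable
{
    private readonly object _lock = new();      // stands in for the node manager's Lock
    private readonly AlarmServiceStub _service;
    private readonly Action<string> _forward;    // stands in for ConditionSink.OnTransition
    private bool _nodeManagerDisposed;

    public NodeManagerSketch(AlarmServiceStub service, Action<string> forward)
    {
        _service = service;
        _forward = forward;
        _service.TransitionRaised += OnAlarmServiceTransition;
    }

    private void OnAlarmServiceTransition(string alarmId)
    {
        lock (_lock)
        {
            // A transition already in flight can arrive during or after Dispose; drop it.
            if (_nodeManagerDisposed) return;
            // Forwarding under the lock means Dispose waits for this call to finish.
            _forward(alarmId);
        }
    }

    public void Dispose()
    {
        lock (_lock) { _nodeManagerDisposed = true; }
        _service.TransitionRaised -= OnAlarmServiceTransition;
    }
}
```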
+
+### Server-006
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Concurrency & thread safety |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
+| Status | Open |
+
+**Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. The call cannot be cancelled by a client timeout.
+
+**Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.
+
+**Resolution:** _(open)_
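
A minimal sketch of the recommended bounded wait, assuming .NET 6 or later for `Task.WaitAsync`. `BoundedDriverCallSketch.InvokeWithDeadline` is a hypothetical helper. As in the finding, two details are left open: how the real hook obtains `operationTimeout` (for example from `TransportQuotas.OperationTimeout`), and which Bad status code a timeout maps to. `WaitAsync` abandons the wait even if the driver ignores its token, which frees the request thread. The driver's own task may still run in the background.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedDriverCallSketch
{
    // Blocks the calling (request) thread on an async driver call, as a synchronous
    // stack hook must, but gives up after operationTimeout instead of waiting forever.
    public static T InvokeWithDeadline<T>(
        Func<CancellationToken, Task<T>> driverCall,
        TimeSpan operationTimeout,
        CancellationToken requestToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
        cts.CancelAfter(operationTimeout);

        // The token lets a cooperative driver stop early; WaitAsync abandons the wait even
        // if the driver ignores the token. On timeout this throws OperationCanceledException,
        // which the caller would map to a Bad status code.
        return driverCall(cts.Token).WaitAsync(cts.Token).GetAwaiter().GetResult();
    }
}
```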
+
+### Server-007
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Error handling & resilience |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
+| Status | Resolved |
+
+**Description:** `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used — `/healthz` always reports `configDbReachable = true` and never 503s on a DB outage. `_staleConfigFlag` is also never supplied by `Program.cs`, so the stale-config signal is inert too. `/healthz` degenerates to a pure liveness probe; operators get a false-healthy during a DB outage.
+
+**Recommendation:** Wire a real config-DB probe (cheap cached `SELECT 1`) into `HealthEndpointsHost`, and register `StaleConfigFlag` in `Program.cs`. Or move DB health to `/readyz` and drop the misleading `configDbReachable` field.
+
+**Resolution:** Resolved 2026-05-22 — added `Func<bool>? configDbHealthy` parameter to `OpcUaApplicationHost` (defaults null, backward-compatible); `Program.cs` constructs a `DbHealthCache` that calls `CanConnectAsync` every 10 s and caches the result, then passes `() => dbHealthCache.IsHealthy`; `/healthz` now reflects real DB reachability and returns 503 on a DB outage (unless stale-config cache is warm).
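
The cached probe described in the resolution can be sketched as a `PeriodicTimer` loop that refreshes a `volatile` flag. The `/healthz` delegate then reads the last result without touching the database. `DbHealthCacheSketch` is hypothetical. The probe is injected so the sketch stays self-contained; in production it would wrap something like EF Core's `Database.CanConnectAsync`. Treating any probe exception as unhealthy is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class DbHealthCacheSketch : IAsyncDisposable
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private readonly PeriodicTimer _timer;
    private readonly CancellationTokenSource _stop = new();
    private readonly Task _loop;
    private volatile bool _isHealthy; // last probe result; false until a probe succeeds

    public DbHealthCacheSketch(Func<CancellationToken, Task<bool>> probe, TimeSpan interval)
    {
        _probe = probe;
        _timer = new PeriodicTimer(interval);
        _loop = RunAsync(); // first probe runs immediately, then once per interval
    }

    // Cheap, non-blocking read for the /healthz delegate: () => cache.IsHealthy
    public bool IsHealthy => _isHealthy;

    private async Task RunAsync()
    {
        do
        {
            try { _isHealthy = await _probe(_stop.Token); }
            catch (OperationCanceledException) when (_stop.IsCancellationRequested) { return; }
            catch (Exception) { _isHealthy = false; } // any probe failure counts as unreachable
        }
        while (await _timer.WaitForNextTickAsync(_stop.Token));
    }

    public async ValueTask DisposeAsync()
    {
        _stop.Cancel();
        _timer.Dispose();
        try { await _loop; }
        catch (OperationCanceledException) { } // expected when the tick wait is cancelled
        _stop.Dispose();
    }
}
```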
+
+### Server-008
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Error handling & resilience |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
+| Status | Open |
+
+**Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire — double transition.
+
+**Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge.
+
+**Resolution:** _(open)_
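
If the SDK does not guarantee the skip-on-Good behaviour, the recommended alternative is to partition the batch before dispatch. The sketch below is generic and hypothetical:

- `route` stands in for the scripted-alarm routing and reports whether the engine handled the request.
- `baseCall` stands in for `base.Call` over only the unrouted requests; their results are mapped back to the original slots.

Routed requests never reach `baseCall`, so each routed Acknowledge executes exactly once.

```csharp
using System;
using System.Collections.Generic;

public static class CallRoutingSketch
{
    // route:    returns (true, result) when the scripted-alarm engine handled the request.
    // baseCall: stands in for base.Call; receives only unrouted requests and returns
    //           their results in the same order.
    public static TResult[] Dispatch<TRequest, TResult>(
        IReadOnlyList<TRequest> requests,
        Func<TRequest, (bool Handled, TResult Result)> route,
        Func<IReadOnlyList<TRequest>, IReadOnlyList<TResult>> baseCall)
    {
        var results = new TResult[requests.Count];
        var remaining = new List<TRequest>();
        var remainingSlots = new List<int>(); // original index of each unrouted request

        for (int i = 0; i < requests.Count; i++)
        {
            var (handled, result) = route(requests[i]);
            if (handled) results[i] = result;
            else { remaining.Add(requests[i]); remainingSlots.Add(i); }
        }

        if (remaining.Count > 0)
        {
            var baseResults = baseCall(remaining);
            for (int k = 0; k < remainingSlots.Count; k++)
                results[remainingSlots[k]] = baseResults[k];
        }
        return results;
    }
}
```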
+
+### Server-009
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` |
+| Status | Resolved |
+
+**Description:** `AllowInsecureLdap` defaults to `true` (and `Program.cs` reads `?? true`); `UseTls` defaults to `false`. Out of the box, usernames and plaintext passwords are bound to LDAP over an unencrypted socket. A production deployment enabling LDAP without explicitly setting `AllowInsecureLdap=false` ships credentials in clear text on the server→LDAP hop.
+
+**Recommendation:** Default `AllowInsecureLdap` to `false` in both the property initializer and the `Program.cs` fallback. Log a startup warning when LDAP is enabled with `UseTls=false && AllowInsecureLdap=true`.
+
+**Resolution:** Resolved 2026-05-22 — `LdapOptions.AllowInsecureLdap` now defaults to `false` (secure-by-default) and `Program.cs`'s config fallback reads `?? false`. `Program.cs` logs a startup `Log.Warning` when LDAP is enabled with `UseTls=false && AllowInsecureLdap=true`, flagging the clear-text credential hop. Regression tests in `LdapOptionsTests` assert the new secure defaults.
+
+### Server-010
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
+| Status | Resolved |
+
+**Description:** `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client cert failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag accepts every untrusted client cert, defeating the PKI trust list. With the always-present `None` policy, the default posture is fully open.
+
+**Recommendation:** Default `AutoAcceptUntrustedClientCertificates` to `false`; keep auto-accept as opt-in dev convenience. `docs/security.md` already shows `false` — align code to doc.
+
+**Resolution:** Resolved 2026-05-22 — `OpcUaServerOptions.AutoAcceptUntrustedClientCertificates` property initialiser changed from `true` to `false` (secure by default, aligning with `docs/security.md`); `Program.cs` config fallback changed from `?? true` to `?? false`.
+
+### Server-011
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Security |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
+| Status | Resolved |
+
+**Description:** `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired but no UserName policy is advertised — clients cannot present credentials; the only path in is Anonymous. The operator's intent is silently not honoured, with no diagnostic.
+
+**Recommendation:** Validate config at startup and warn/fail when `Ldap.Enabled = true` but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per `docs/security.md`).
+
+**Resolution:** Resolved 2026-05-22 — `BuildUserTokenPolicies` now advertises a `UserName` token policy whenever `Ldap.Enabled && SecurityProfile != None` (previously required `== Basic256Sha256SignAndEncrypt`); `StartAsync` logs a `LogWarning` at startup when `Ldap.Enabled = true` but `SecurityProfile = None`, surfacing the misconfiguration before clients connect.
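
The resolved advertising rule and the startup check reduce to two small predicates. The sketch is hypothetical and models only the decision. It omits building the actual SDK token-policy objects and emitting the warning through `LogWarning` in `StartAsync`. The two-member `OpcUaSecurityProfile` enum mirrors the one described under Server-013.

```csharp
#nullable enable

// Mirrors the two-member enum described under Server-013.
public enum OpcUaSecurityProfile { None, Basic256Sha256SignAndEncrypt }

public static class UserTokenPolicySketch
{
    // Resolved rule: any non-None profile carries the UserName policy when LDAP is on.
    public static bool ShouldAdvertiseUserName(bool ldapEnabled, OpcUaSecurityProfile profile) =>
        ldapEnabled && profile != OpcUaSecurityProfile.None;

    // Startup check: LDAP is configured but no policy lets clients present credentials.
    public static string? StartupWarning(bool ldapEnabled, OpcUaSecurityProfile profile) =>
        ldapEnabled && profile == OpcUaSecurityProfile.None
            ? "Ldap.Enabled = true but SecurityProfile = None: no UserName token policy is advertised, so clients can only connect anonymously."
            : null;
}
```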
+
+### Server-012
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Performance & resource management |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
+| Status | Open |
+
+**Description:** `ProbeAsync` obtains a client from `IHttpClientFactory` and mutates its `Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration; it should not be set per call on a factory-vended instance.
+
+**Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation.
+
+**Resolution:** _(open)_
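
The per-request alternative from the recommendation can be sketched as follows. The timeout lives on a linked `CancellationTokenSource` created for each probe, and the factory-vended `HttpClient` is never mutated. `PeerProbeSketch.ProbeAsync` and its parameters are hypothetical. Mapping a timeout or an `HttpRequestException` to "peer unreachable" is an assumption about what the probe loop wants. `StubHandler` exists only as a test double.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class PeerProbeSketch
{
    public static async Task<bool> ProbeAsync(
        HttpClient client, Uri healthUri, TimeSpan probeTimeout, CancellationToken stoppingToken)
    {
        // Per-probe deadline, linked to host shutdown; the shared client is not mutated.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
        cts.CancelAfter(probeTimeout);
        try
        {
            using var response = await client.GetAsync(healthUri, cts.Token);
            return response.IsSuccessStatusCode;
        }
        catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
        {
            return false; // probe deadline hit: report the peer as unreachable
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, DNS failure, etc.
        }
    }
}

// Test double: lets a caller control response latency without a real peer.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly Func<CancellationToken, Task<HttpResponseMessage>> _respond;

    public StubHandler(Func<CancellationToken, Task<HttpResponseMessage>> respond) => _respond = respond;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) => _respond(cancellationToken);
}
```

The `when (!stoppingToken.IsCancellationRequested)` filter keeps host shutdown distinct from a probe timeout, so shutdown cancellation still propagates to the caller.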
+
+### Server-013
+| Field | Value |
+|---|---|
+| Severity | Medium |
+| Category | Design-document adherence |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
+| Status | Resolved |
+
+**Description:** `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a policy only for the latter; `SecurityProfileResolver` does not exist in the repo (grep finds it only in docs). `Basic256Sha256-Sign` and all Aes profiles are unimplemented, and `Program.cs:89`'s `Enum.TryParse` silently selects `None` for an unrecognised profile string.
+
+**Recommendation:** Reconcile code and docs — implement the missing profiles + `SecurityProfileResolver`, or trim `docs/security.md` / `CLAUDE.md` to the two supported profiles. At minimum, log a warning when a configured `SecurityProfile` fails to parse instead of silently using `None`.
+
+**Resolution:** Resolved 2026-05-22 — replaced the silent `Enum.TryParse ?? None` fallback in `Program.cs` with a `ParseSecurityProfile` helper that produces a warning string listing supported profiles when the configured value is unrecognised; the warning is emitted via `Log.Warning` at startup before the host builds, making the misconfiguration immediately visible. Implementing the missing 5 profiles is tracked as a doc-to-code gap rather than a single finding fix.
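
The following is a sketch of what such a `ParseSecurityProfile` helper might look like. Several details are assumptions: the signature, the case-insensitive parse, and returning `None` without a warning for an empty value. Only the warning that lists the supported profiles comes from the resolution. The `Enum.IsDefined` check matters because `Enum.TryParse` also accepts numeric strings such as `"7"` that map to no declared member.

```csharp
#nullable enable
using System;

public enum OpcUaSecurityProfile { None, Basic256Sha256SignAndEncrypt }

public static class SecurityProfileParsingSketch
{
    public static (OpcUaSecurityProfile Profile, string? Warning) ParseSecurityProfile(string? configured)
    {
        // Assumption: an absent or blank setting means "use the default" and needs no warning.
        if (string.IsNullOrWhiteSpace(configured))
            return (OpcUaSecurityProfile.None, null);

        // Enum.IsDefined rejects numeric strings (e.g. "7") that TryParse would otherwise accept.
        if (Enum.TryParse<OpcUaSecurityProfile>(configured, ignoreCase: true, out var parsed) && Enum.IsDefined(parsed))
            return (parsed, null);

        string supported = string.Join(", ", Enum.GetNames<OpcUaSecurityProfile>());
        return (OpcUaSecurityProfile.None,
            $"Unrecognised SecurityProfile '{configured}'; falling back to None. Supported profiles: {supported}.");
    }
}
```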
+
+### Server-014
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Code organization & conventions |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
+| Status | Open |
+
+**Description:** `SealedBootstrap` claims in its XML doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never registered in DI nor referenced by `OpcUaServerService` — it and its `StaleConfigFlag` plumbing are dead in the production wire-up; the release blocker remains open in practice.
+
+**Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc.
+
+**Resolution:** _(open)_
+
+### Server-015
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
+| Status | Open |
+
+**Description:** `OtOpcUaServer`'s class summary still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s summary says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale: the server now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile, so a reader would wrongly conclude it has no authentication.
+
+**Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.
+
+**Resolution:** _(open)_
@@ -0,0 +1,53 @@
+# Code Review — &lt;Module&gt;
+
+<!-- Template for a per-module findings file. Copy to code-reviews/<Module>/findings.md.
+     See ../../REVIEW-PROCESS.md for the full process. The base README.md is generated
+     from these files by regen-readme.py — do not edit README.md by hand. -->
+
+| Field | Value |
+|---|---|
+| Module | `src/<area>/ZB.MOM.WW.OtOpcUa.<Module>` |
+| Reviewer | <name> |
+| Review date | <YYYY-MM-DD> |
+| Commit reviewed | `<short-sha>` |
+| Status | Not started |
+| Open findings | 0 |
+
+## Checklist coverage
+
+A comprehensive review completes every category, recording "No issues found" where
+a category produced nothing rather than leaving it blank.
+
+| # | Category | Result |
+|---|---|---|
+| 1 | Correctness & logic bugs | _pending_ |
+| 2 | OtOpcUa conventions | _pending_ |
+| 3 | Concurrency & thread safety | _pending_ |
+| 4 | Error handling & resilience | _pending_ |
+| 5 | Security | _pending_ |
+| 6 | Performance & resource management | _pending_ |
+| 7 | Design-document adherence | _pending_ |
+| 8 | Code organization & conventions | _pending_ |
+| 9 | Testing coverage | _pending_ |
+| 10 | Documentation & comments | _pending_ |
+
+## Findings
+
+<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
+     never reused. Findings are never deleted — close them by changing Status and
+     completing Resolution. -->
+
+### <Module>-001
+
+| Field | Value |
+|---|---|
+| Severity | Critical / High / Medium / Low |
+| Category | one of the 10 checklist categories |
+| Location | `path/to/File.cs:NN` |
+| Status | Open / In Progress / Resolved / Won't Fix / Deferred |
+
+**Description:** What is wrong and why it matters.
+
+**Recommendation:** Concrete suggested fix.
+
+**Resolution:** _(empty until closed; on close, record the fixing commit SHA, the date, and a one-line description of the fix)_
@@ -0,0 +1,78 @@
+# Prompt — resolve open code-review findings
+
+Reusable orchestration prompt for clearing the `code-reviews/` backlog. Paste it
+to a fresh agent when you want the remaining findings worked through.
+
+---
+
+Resolve all open code-review findings (every severity), following the workflow
+in `REVIEW-PROCESS.md`.
+
+## Setup
+
+- Read `code-reviews/README.md` for the open findings and `REVIEW-PROCESS.md`
+  for the workflow. Group the open findings by module.
+- A module is one folder under `code-reviews/` — one `src/` project or one
+  `tests/` project, named with the `ZB.MOM.WW.OtOpcUa.` prefix stripped. The
+  module→project mapping is in `REVIEW-PROCESS.md` section 1; the build/test
+  commands are in `CLAUDE.md` ("Build Commands").
+
+## Dispatch — one general-purpose subagent per module, in batches of ~5 modules
+
+Each subagent, for every open finding in its assigned module, must:
+
+- Verify the finding's root cause against the actual source. Do NOT trust the
+  finding text — if it is wrong or misclassified, re-triage it (correct the
+  severity/description in that module's `findings.md`) instead of forcing a fix.
+- Use real TDD: write the regression test FIRST and run it to confirm it fails,
+  THEN implement the root-cause fix, THEN confirm it passes. (Do not use
+  `git stash` — parallel agents would race on the shared stash stack.)
+- The regression test belongs in the reviewed project's own test project — a
+  finding in `src/.../ZB.MOM.WW.OtOpcUa.Driver.Galaxy` gets its test in
+  `tests/.../ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests`.
+- Run that module's build and test suite and confirm it is green:
+  - Build + unit-test the affected project, e.g.
+    `dotnet build src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/...` and
+    `dotnet test tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/...`.
+  - A single test: `dotnet test --filter "FullyQualifiedName~MyClass.MyMethod"`.
+  - `*.IntegrationTests` need their Docker fixture up — bring it up with
+    `lmxopcua-fix up <driver> <profile>` (see `CLAUDE.md` "Docker Workflow").
+    DB-backed `*.Configuration.Tests`, `*.Admin.Tests`, and `*.Server.Tests`
+    need the central SQL Server. If a fixture/service is unavailable, document
+    why the suite was skipped rather than reporting it green.
+  - For a change that crosses project boundaries, build each affected project;
+    a whole-solution check is `dotnet build ZB.MOM.WW.OtOpcUa.slnx`.
+- Update only that module's `code-reviews/<Module>/findings.md`: set each
+  resolved finding's Status to `Resolved` with a Resolution note describing the
+  fix (the orchestrator appends the fixing commit SHA), and update the header
+  "Open findings" count.
+- CONSTRAINTS: edit only the source and test files needed for the assigned
+  module's findings, plus that module's own `findings.md`. Do NOT edit
+  `code-reviews/README.md`. Do NOT commit. Do NOT touch another module's
+  `findings.md`.
+- Report a summary: each finding — root-cause confirmation, the fix, test names,
+  and any re-triage.
+
+Batch so that no two subagents in the same batch write to the same project. In
+particular, never put a `src/` project and its matching `*.Tests` project in the
+same batch: a finding in the source project adds its regression test to that
+test project.
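The batching rule can be illustrated as a greedy assignment. This is a minimal sketch, not part of the repo's tooling: `plan_batches` is a hypothetical helper, the default batch size of 5 mirrors the "~5 modules" guidance, and it assumes a test module is named `<Module>.Tests` (as with `Driver.Galaxy` and `Driver.Galaxy.Tests`); other suffixes such as `.IntegrationTests` would need their own mapping.

```python
def plan_batches(modules: list[str], batch_size: int = 5) -> list[list[str]]:
    """Greedily group module names into batches of at most batch_size.

    Hypothetical helper. A src module and its "<Module>.Tests" counterpart
    never share a batch, because fixing a finding in the src module writes
    its regression test into the test project.
    """

    def family(name: str) -> str:
        # "Driver.Galaxy.Tests" and "Driver.Galaxy" both map to "Driver.Galaxy".
        return name.removesuffix(".Tests")

    batches: list[list[str]] = []
    for module in sorted(modules):  # sorted for a deterministic plan
        for batch in batches:
            has_room = len(batch) < batch_size
            no_clash = all(family(other) != family(module) for other in batch)
            if has_room and no_clash:
                batch.append(module)
                break
        else:
            # No existing batch can take this module; start a new one.
            batches.append([module])
    return batches
```

For example, `plan_batches(["Driver.Galaxy", "Driver.Galaxy.Tests", "Server", "Server.Tests", "Core.Abstractions"])` puts each `*.Tests` module in a later batch than its source project.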
+
+## After each batch returns (orchestrator does this — keep your own context lean)
+
+- Build and test every project the batch touched, using the `CLAUDE.md`
+  commands; confirm clean. For a wide change, `dotnet build ZB.MOM.WW.OtOpcUa.slnx`.
+- Commit once per module, with a message referencing the finding IDs. Record
+  the fixing commit SHA in each finding's Resolution.
+- Regenerate the index: `python code-reviews/regen-readme.py`, then
+  `python code-reviews/regen-readme.py --check` to confirm it is consistent;
+  stage `code-reviews/README.md`. (Use `python` — the bare `python3` alias on
+  this box resolves to the Windows Store stub and fails.) You may stage
+  `README.md` with each module's commit, or commit it once per batch after the
+  script runs.
+- Push.
+
+## Continue
+
+Continue batch by batch until all findings are Resolved or re-triaged. If a
+finding needs a design decision, skip it and surface it rather than guessing.
@@ -0,0 +1,241 @@
+#!/usr/bin/env python3
+"""Regenerate code-reviews/README.md from the per-module findings.md files.
+
+The per-module findings.md files are the source of truth. This script aggregates
+them into the single cross-module README.md (module status + pending/closed
+finding tables).
+
+Usage:
+    python code-reviews/regen-readme.py          # rewrite README.md
+    python code-reviews/regen-readme.py --check  # exit 1 if stale or inconsistent
+
+`--check` fails when README.md is out of date OR when a module's header
+`Open findings` count disagrees with its finding statuses, or a finding
+carries an unrecognised Status value.
+"""
+from __future__ import annotations
+
+import re
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parent
+README = ROOT / "README.md"
+
+PENDING_STATUSES = {"Open", "In Progress"}
+KNOWN_STATUSES = {"Open", "In Progress", "Resolved", "Won't Fix", "Deferred"}
+SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
+
+GENERATED_NOTE = (
+    "<!-- GENERATED FILE - do not edit by hand. "
+    "Regenerate with: python code-reviews/regen-readme.py -->"
+)
+
+
+def cell(value: str) -> str:
+    """Escape a value for safe inclusion in a markdown table cell."""
+    return value.replace("|", "\\|").strip()
+
+
+def summarize(value: str, limit: int = 240) -> str:
+    """Trim a long description to a single-cell-friendly summary."""
+    value = value.strip()
+    if len(value) <= limit:
+        return value
+    return value[: limit - 1].rstrip() + "…"
+
+
+def first_table(text: str) -> dict[str, str]:
+    """Parse the first contiguous block of '| key | value |' rows into a dict."""
+    rows: dict[str, str] = {}
+    started = False
+    for line in text.splitlines():
+        stripped = line.strip()
+        if stripped.startswith("|"):
+            started = True
+            cells = [c.strip() for c in stripped.strip("|").split("|")]
+            if len(cells) >= 2:
+                key, value = cells[0], cells[1]
+                if key and not set(key) <= {"-", ":"} and key != "Field":
+                    rows[key] = value
+        elif started:
+            break
+    return rows
+
+
+def parse_module(findings_path: Path) -> dict:
+    """Parse one module's findings.md into its header and finding list."""
+    text = findings_path.read_text(encoding="utf-8")
+    module = findings_path.parent.name
+    parts = re.split(r"^##\s+Findings\s*$", text, maxsplit=1, flags=re.M)
+    header = first_table(parts[0])
+    findings: list[dict] = []
+    if len(parts) > 1:
+        for chunk in re.split(r"^###\s+", parts[1], flags=re.M)[1:]:
+            fid = chunk.splitlines()[0].strip()
+            tbl = first_table(chunk)
+            desc_m = re.search(
+                r"\*\*Description:\*\*\s*(.*?)(?=\n\*\*|\Z)", chunk, re.S
+            )
+            desc = re.sub(r"\s+", " ", desc_m.group(1)).strip() if desc_m else ""
+            findings.append(
+                {
+                    "id": fid,
+                    "severity": tbl.get("Severity", ""),
+                    "category": tbl.get("Category", ""),
+                    "location": tbl.get("Location", ""),
+                    "status": tbl.get("Status", ""),
+                    "description": desc,
+                }
+            )
+    return {"module": module, "header": header, "findings": findings}
+
+
+def build_readme(modules: list[dict]) -> str:
+    modules = sorted(modules, key=lambda m: m["module"])
+    all_findings = [
+        dict(f, module=m["module"]) for m in modules for f in m["findings"]
+    ]
+    pending = [f for f in all_findings if f["status"] in PENDING_STATUSES]
+    closed = [
+        f
+        for f in all_findings
+        if f["status"] and f["status"] not in PENDING_STATUSES
+    ]
+
+    def sev_key(f: dict) -> tuple:
+        return (SEVERITY_ORDER.get(f["severity"], 9), f["id"])
+
+    pending.sort(key=sev_key)
+    closed.sort(key=sev_key)
+
+    out: list[str] = [
+        "# Code Reviews",
+        "",
+        GENERATED_NOTE,
+        "",
+        "Cross-module code review index for the OtOpcUa server codebase "
+        "(`lmxopcua`). The review process is defined in "
+        "[../REVIEW-PROCESS.md](../REVIEW-PROCESS.md).",
+        "",
+        "Each module's `findings.md` is the source of truth; this file is generated "
+        "from them by `regen-readme.py` and must not be edited by hand.",
+        "",
+        "## Module status",
+        "",
+        "| Module | Reviewer | Date | Commit | Status | Open | Total |",
+        "|---|---|---|---|---|---|---|",
+    ]
+    if not modules:
+        out.append(
+            "| _no modules reviewed yet_ |  |  |  |  |  |  |"
+        )
+    for m in modules:
+        h = m["header"]
+        open_n = sum(
+            1 for f in m["findings"] if f["status"] in PENDING_STATUSES
+        )
+        out.append(
+            f"| [{m['module']}]({m['module']}/findings.md) "
+            f"| {cell(h.get('Reviewer', ''))} "
+            f"| {cell(h.get('Review date', ''))} "
+            f"| {cell(h.get('Commit reviewed', ''))} "
+            f"| {cell(h.get('Status', ''))} "
+            f"| {open_n} | {len(m['findings'])} |"
+        )
+
+    out += ["", "## Pending findings", ""]
+    out.append(
+        "Findings with status `Open` or `In Progress`, ordered by severity."
+    )
+    out.append("")
+    if pending:
+        out.append("| ID | Severity | Category | Location | Description |")
+        out.append("|---|---|---|---|---|")
+        for f in pending:
+            out.append(
+                f"| {cell(f['id'])} | {cell(f['severity'])} "
+                f"| {cell(f['category'])} | {cell(f['location'])} "
+                f"| {cell(summarize(f['description']))} |"
+            )
+    else:
+        out.append("_No pending findings._")
+
+    out += ["", "## Closed findings", ""]
+    out.append("Findings with status `Resolved`, `Won't Fix`, or `Deferred`.")
+    out.append("")
+    if closed:
+        out.append("| ID | Severity | Status | Category | Location |")
+        out.append("|---|---|---|---|---|")
+        for f in closed:
+            out.append(
+                f"| {cell(f['id'])} | {cell(f['severity'])} "
+                f"| {cell(f['status'])} | {cell(f['category'])} "
+                f"| {cell(f['location'])} |"
+            )
+    else:
+        out.append("_No closed findings._")
+
+    return "\n".join(out) + "\n"
+
+
+def find_inconsistencies(modules: list[dict]) -> list[str]:
+    """Return human-readable problems in the per-module findings.md files.
+
+    Checks that each module header's `Open findings` count agrees with its
+    finding statuses, and that every finding carries a known Status value.
+    """
+    issues: list[str] = []
+    for m in modules:
+        open_n = sum(
+            1 for f in m["findings"] if f["status"] in PENDING_STATUSES
+        )
+        declared = m["header"].get("Open findings", "").strip()
+        if declared != str(open_n):
+            issues.append(
+                f"{m['module']}: header 'Open findings' = '{declared}' but "
+                f"{open_n} finding(s) are Open/In Progress"
+            )
+        for f in m["findings"]:
+            if f["status"] not in KNOWN_STATUSES:
+                issues.append(
+                    f"{m['module']}: finding {f['id']} has unrecognised "
+                    f"Status '{f['status']}'"
+                )
+    return issues
+
+
+def main(argv: list[str]) -> int:
+    check = "--check" in argv[1:]
+    module_dirs = sorted(
+        d
+        for d in ROOT.iterdir()
+        if d.is_dir() and d.name != "_template" and (d / "findings.md").is_file()
+    )
+    modules = [parse_module(d / "findings.md") for d in module_dirs]
+    content = build_readme(modules)
+    issues = find_inconsistencies(modules)
+    if check:
+        stale = (
+            README.read_text(encoding="utf-8") if README.exists() else ""
+        ) != content
+        for issue in issues:
+            print(f"inconsistent: {issue}", file=sys.stderr)
+        if stale:
+            print(
+                "code-reviews/README.md is stale - run regen-readme.py",
+                file=sys.stderr,
+            )
+        if stale or issues:
+            return 1
+        print("code-reviews/README.md is up to date and consistent.")
+        return 0
+    for issue in issues:
+        print(f"warning: {issue}", file=sys.stderr)
+    README.write_text(content, encoding="utf-8", newline="\n")
+    print(f"Wrote {README} ({len(modules)} modules).")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main(sys.argv))
@@ -0,0 +1,165 @@
+#!/usr/bin/env python3
+"""Tests for regen-readme.py.
+
+Dependency-free: run with `python code-reviews/test_regen_readme.py`.
+Exits 0 if all tests pass, 1 otherwise.
+"""
+from __future__ import annotations
+
+import importlib.util
+import tempfile
+import traceback
+from pathlib import Path
+
+HERE = Path(__file__).resolve().parent
+
+# regen-readme.py is not an importable module name (hyphen), so load it by path.
+_spec = importlib.util.spec_from_file_location("regen_readme", HERE / "regen-readme.py")
+regen = importlib.util.module_from_spec(_spec)
+_spec.loader.exec_module(regen)
+
+FIXTURE = """# Code Review — Demo
+
+| Field | Value |
+|---|---|
+| Module | `src/Demo` |
+| Reviewer | Tester |
+| Review date | 2026-05-18 |
+| Commit reviewed | `abc1234` |
+| Status | Reviewed |
+| Open findings | 1 |
+
+## Findings
+
+### Demo-001
+
+| Field | Value |
+|---|---|
+| Severity | High |
+| Category | Security |
+| Location | `src/Demo/File.cs:10` |
+| Status | Open |
+
+**Description:** A first problem that matters.
+
+**Recommendation:** Fix it.
+
+**Resolution:** _(open)_
+
+### Demo-002
+
+| Field | Value |
+|---|---|
+| Severity | Low |
+| Category | Documentation & comments |
+| Location | `src/Demo/File.cs:20` |
+| Status | Resolved |
+
+**Description:** A second, minor problem.
+
+**Recommendation:** Tidy it.
+
+**Resolution:** Fixed in def5678 on 2026-05-18.
+"""
+
+
+def _parse_fixture() -> dict:
+    """Write FIXTURE to a temp Demo/findings.md and parse it."""
+    with tempfile.TemporaryDirectory() as tmp:
+        path = Path(tmp) / "Demo" / "findings.md"
+        path.parent.mkdir()
+        path.write_text(FIXTURE, encoding="utf-8")
+        return regen.parse_module(path)
+
+
+def test_first_table_skips_separator_and_field_header():
+    table = regen.first_table("| Field | Value |\n|---|---|\n| Severity | High |\n")
+    assert table == {"Severity": "High"}, table
+
+
+def test_parse_module_header():
+    m = _parse_fixture()
+    assert m["module"] == "Demo", m["module"]
+    assert m["header"]["Reviewer"] == "Tester"
+    assert m["header"]["Status"] == "Reviewed"
+    assert m["header"]["Open findings"] == "1"
+
+
+def test_parse_module_findings():
+    m = _parse_fixture()
+    assert len(m["findings"]) == 2, len(m["findings"])
+    first = m["findings"][0]
+    assert first["id"] == "Demo-001"
+    assert first["severity"] == "High"
+    assert first["category"] == "Security"
+    assert first["location"] == "`src/Demo/File.cs:10`"
+    assert first["status"] == "Open"
+    assert first["description"] == "A first problem that matters."
+    assert m["findings"][1]["status"] == "Resolved"
+
+
+def test_build_readme_splits_pending_and_closed():
+    readme = regen.build_readme([_parse_fixture()])
+    assert "## Pending findings" in readme
+    assert "## Closed findings" in readme
+    pending, closed = readme.split("## Closed findings", 1)
+    assert "Demo-001" in pending  # Open -> pending
+    assert "Demo-001" not in closed
+    assert "Demo-002" in closed  # Resolved -> closed
+    assert "_No pending findings._" not in pending
+
+
+def test_build_readme_handles_no_modules():
+    readme = regen.build_readme([])
+    assert "no modules reviewed yet" in readme
+    assert "_No pending findings._" in readme
+    assert "_No closed findings._" in readme
+
+
+def test_find_inconsistencies_clean_fixture():
+    assert regen.find_inconsistencies([_parse_fixture()]) == []
+
+
+def test_find_inconsistencies_detects_wrong_open_count():
+    m = _parse_fixture()
+    m["header"]["Open findings"] = "7"
+    issues = regen.find_inconsistencies([m])
+    assert len(issues) == 1 and "Open findings" in issues[0], issues
+
+
+def test_find_inconsistencies_detects_unknown_status():
+    m = _parse_fixture()
+    m["findings"][0]["status"] = "Bogus"
+    issues = regen.find_inconsistencies([m])
+    # Wrong status also shifts the open count, so expect the status issue present.
+    assert any("unrecognised Status" in i for i in issues), issues
+
+
+def test_summarize_truncates_long_text():
+    long = "x" * 500
+    out = regen.summarize(long)
+    assert len(out) <= 240 and out.endswith("…"), len(out)
+    assert regen.summarize("short") == "short"
+
+
+def main() -> int:
+    tests = sorted(
+        (name, fn)
+        for name, fn in globals().items()
+        if name.startswith("test_") and callable(fn)
+    )
+    failed = 0
+    for name, fn in tests:
+        try:
+            fn()
+            print(f"PASS {name}")
+        except Exception:  # noqa: BLE001 - test runner reports all failures
+            failed += 1
+            print(f"FAIL {name}")
+            traceback.print_exc()
+    print(f"\n{len(tests) - failed}/{len(tests)} passed.")
+    return 1 if failed else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
@@ -110,7 +110,13 @@ AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
  present.
 - `SqliteStoreAndForwardSink` queues each transition to a local
  SQLite database and drains in the background via the resolved
-  writer.
+  writer. **The durability guarantee is bounded**: the queue capacity
+  defaults to 1,000,000 rows; under a sustained historian outage,
+  older non-dead-lettered rows are evicted (oldest first) to make
+  room for new events. The `HistorianSinkStatus.EvictedCount` counter
+  surfaces lifetime eviction events to the Admin UI
+  `/alarms/historian` diagnostics page so operators can detect silent
+  data loss without log scraping.
 - Sidecar (PR C.1 + C.2) forwards the events to `aahClientManaged`'s
  alarm-event write API; the live SDK call site is pinned during
  PR D.1's deploy-rig validation.
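
The bounded-durability behaviour described above can be modelled in memory. This is a sketch under stated assumptions, not the `SqliteStoreAndForwardSink` implementation: `BoundedAlarmQueue`, `capacity`, and `evicted_count` are hypothetical stand-ins for the SQLite table, the 1,000,000-row default, and `HistorianSinkStatus.EvictedCount`, and it assumes dead-lettered rows are held apart from the capacity-limited queue so they are never evicted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BoundedAlarmQueue:
    """Hypothetical in-memory model of a bounded store-and-forward queue."""

    capacity: int = 1_000_000  # mirrors the documented default row cap
    pending: deque = field(default_factory=deque)  # rows awaiting the writer
    dead_letters: list = field(default_factory=list)  # never evicted (assumption)
    evicted_count: int = 0  # lifetime count, like HistorianSinkStatus.EvictedCount

    def __post_init__(self) -> None:
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def enqueue(self, event: Any) -> None:
        # At capacity, drop the oldest pending row to make room for the new one.
        if len(self.pending) >= self.capacity:
            self.pending.popleft()
            self.evicted_count += 1
        self.pending.append(event)

    def dead_letter_oldest(self) -> None:
        # Moving a row out of the pending queue exempts it from eviction.
        self.dead_letters.append(self.pending.popleft())
```

With `capacity=3`, enqueuing five events leaves the three newest pending and `evicted_count == 2`; the counter is what lets an operator see the loss without reading logs.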
@@ -35,7 +35,7 @@ new ScriptedAlarmDefinition(

 ## Predicate evaluation

-Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them.
+Alarm predicates reuse the same Roslyn sandbox as virtual tags — `ScriptEvaluator<AlarmPredicateContext, bool>` compiles the source, `TimedScriptEvaluator` wraps it with the configured timeout (default from `TimedScriptEvaluator.DefaultTimeout`), and `DependencyExtractor` statically harvests the tag paths the script reads. The sandbox rules (forbidden types, cancellation, logging sinks) are documented in [VirtualTags.md](VirtualTags.md); ScriptedAlarms does not redefine them. The known memory / CPU resource limits are documented there as well.

 `AlarmPredicateContext` (`AlarmPredicateContext.cs`) is the script's `ScriptContext` subclass:

@@ -18,7 +18,13 @@ User scripts are compiled via `Microsoft.CodeAnalysis.CSharp.Scripting` against

 `ScriptSandbox.Build` allow-lists exactly: `System.Private.CoreLib` (primitives + `Math` + `Convert`), `System.Linq`, `Core.Abstractions` (for `DataValueSnapshot` / `DriverDataType`), `Core.Scripting` (for `ScriptContext` + `Deadband`), `Serilog` (for `ILogger`), and the concrete context type's assembly. Pre-imported namespaces: `System`, `System.Linq`, `ZB.MOM.WW.OtOpcUa.Core.Abstractions`, `ZB.MOM.WW.OtOpcUa.Core.Scripting`.

-`ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes` currently denies `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Thread`, `System.Runtime.InteropServices`, `Microsoft.Win32`. Matching is by prefix against the resolved symbol's containing namespace, so `System.Net` catches `System.Net.Http.HttpClient` and every subnamespace. `System.Environment` is explicitly allowed.
+`ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes` currently denies `System.IO`, `System.Net`, `System.Diagnostics`, `System.Reflection`, `System.Threading.Thread`, `System.Threading.Tasks`, `System.Runtime.InteropServices`, and `Microsoft.Win32`. Matching is by prefix against the resolved symbol's containing namespace, so `System.Net` catches `System.Net.Http.HttpClient` and every subnamespace. `System.Threading.Tasks` is denied because scripts are synchronous predicates with no legitimate need to start background tasks; a `Task.Run` fan-out would outlive the per-evaluation timeout entirely (Core.Scripting-003). `System.Environment`, `System.AppDomain`, `System.GC`, and `System.Activator` are denied individually, by full type name, via `ForbiddenFullTypeNames`: they live directly in the `System` namespace, which must otherwise stay allowed for primitives, and `Environment.Exit` / `FailFast` would terminate the host process outright (Core.Scripting-001).
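
The two deny lists can be modelled as one check over a resolved type's namespace and name. This is a simplified sketch, not `ForbiddenTypeAnalyzer` itself: `is_forbidden` is hypothetical and works on plain strings rather than Roslyn symbols. It assumes prefixes match only on whole namespace segments (the text above says only "by prefix"), and it tests prefixes against the fully qualified type name so that the `System.Threading.Thread` entry, which names a type rather than a namespace, also takes effect.

```python
FORBIDDEN_NAMESPACE_PREFIXES = (
    "System.IO",
    "System.Net",
    "System.Diagnostics",
    "System.Reflection",
    "System.Threading.Thread",
    "System.Threading.Tasks",
    "System.Runtime.InteropServices",
    "Microsoft.Win32",
)

# Types that live directly in the otherwise-allowed System namespace.
FORBIDDEN_FULL_TYPE_NAMES = frozenset(
    {"System.Environment", "System.AppDomain", "System.GC", "System.Activator"}
)


def _has_segment_prefix(name: str, prefix: str) -> bool:
    # "System.Net" matches "System.Net" and "System.Net.Http...", not "System.Network".
    return name == prefix or name.startswith(prefix + ".")


def is_forbidden(containing_namespace: str, type_name: str) -> bool:
    """Return True if a referenced type should be rejected (hypothetical model)."""
    full_name = f"{containing_namespace}.{type_name}" if containing_namespace else type_name
    if full_name in FORBIDDEN_FULL_TYPE_NAMES:
        return True
    return any(_has_segment_prefix(full_name, p) for p in FORBIDDEN_NAMESPACE_PREFIXES)
```

Under these assumptions `is_forbidden("System.Net.Http", "HttpClient")` and `is_forbidden("System", "Environment")` are both `True`, while `is_forbidden("System", "Math")` stays `False`.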
+
+#### Known resource limits (accepted trade-offs)
+
+The sandbox cannot prevent a script from **allocating unbounded memory**. A script that repeatedly allocates large arrays and keeps them reachable, or that accumulates a large LINQ enumeration, can drive the server process to `OutOfMemoryException` before the 250 ms timeout fires. Script authoring is gated behind the Admin permission as the primary control; the test-harness preview (Stream F.4) allows operators to exercise a script before publishing. Out-of-process script execution is a v3 concern.
+
+Similarly, **`System.Threading.Tasks` is now denied** (Core.Scripting-003), which prevents `Task.Run` / `Parallel` fan-out that would spawn background work outliving the timeout. However, a tight CPU-bound loop keeps running on its thread-pool thread after `WaitAsync` has given up on it; see the `TimedScriptEvaluator` remarks for detail. That orphaned thread is reclaimed only when the script's evaluation eventually returns, so a loop that never terminates keeps it busy indefinitely. In practice the operator fixes the script once the structured timeout warning appears in `scripts-*.log`.
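
The gap between a timeout and real cancellation can be shown with Python's `concurrent.futures`, used here only as an analogy for `WaitAsync` over a thread-pool work item; it is not how `TimedScriptEvaluator` is written. `future.result(timeout=...)` stops waiting, but nothing stops the worker. `run_with_timeout` and `spin` are illustrative names, and the loop watches a `stop` event only so the demo can end; a runaway script would have no such exit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_s: float):
    """Wait up to timeout_s for fn; on timeout, return None and abandon it."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None  # the caller moves on; the worker thread keeps running
    finally:
        executor.shutdown(wait=False)  # do not block on the abandoned work


stop = threading.Event()
finished = threading.Event()


def spin():
    # Stands in for a CPU-bound script loop that ignores cancellation.
    while not stop.is_set():
        pass
    finished.set()
    return "done"


result = run_with_timeout(spin, timeout_s=0.25)  # 250 ms, as in the text above
still_running = not finished.is_set()  # the timeout did not stop the loop
stop.set()  # only the demo can end the loop; a runaway script would not
finished.wait(timeout=5)
```

After the call returns `None`, `still_running` is `True`: the caller has moved on while the loop keeps its thread, which is the situation the paragraph above describes.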

 ### Compile cache (`CompiledScriptCache<TContext, TResult>`)

@@ -212,36 +212,40 @@ x64, which is not bitness-constrained like the worker). C.1 is independently
 unblockable from A.2 if the goal is to wire up the scripted-alarm historian
 path.

-**Current state**:
+**Current state (DONE — code)**:

-`SdkAlarmHistorianWriteBackend` in `src\MxGateway.Worker\MxAccess\` is a
-placeholder returning `RetryPlease`. The lmxopcua sidecar's `WriteAlarmEvents`
-IPC slot is defined in `Ipc\Contracts.cs` but `Program.cs` constructs
-`HistorianFrameHandler` without an `alarmWriter` (line 57 per the alarms plan).
-The `IAlarmEventWriter` interface exists; only the production implementation
-and the consumer wiring are missing.
+C.1 shipped. `SdkAlarmHistorianWriteBackend.WriteBatchAsync` writes through the
+real SDK entry point — **`HistorianAccess.AddStreamedValue(HistorianEvent, out
+HistorianAccessError)`** in `aahClientManaged` — pinned 2026-05-18 by
+decompiling the installed SDK. `Program.cs` and `Install-Services.ps1` were
+already wired in the PR C.1 scaffolding. Two corrections to the assumptions
+this doc was written under:

-**What it needs**:
+- **There is no `ArchestrAAlarmsAndEvents.SDK` writer.** That assembly
+  (`ArchestrAAlarmsAndEvents.SDK.Common.dll`, the only one installed) is a WCF
+  query-proxy base — no `AlarmHistorianWriter` type. The write path is the
+  `aahClientManaged` `HistorianAccess` surface.
+- **The write path needs its own connection.** The query-side
+  `HistorianDataSource` opens `ReadOnly` sessions; `AddStreamedValue` on a
+  read-only session fails with `WriteToReadOnlyFile`.
+  `SdkAlarmHistorianWriteBackend` opens a dedicated `ReadOnly=false` connection
+  and shares only `HistorianClusterEndpointPicker` (not the connection object).

-1. New `AahClientManagedAlarmEventWriter.cs` implementing `IAlarmEventWriter`
-   (defined in `Ipc\HistorianFrameHandler.cs`). Calls `aahClientManaged`'s
-   alarm-event write API — same path v1's `GalaxyHistorianWriter` used.
-   Uses `HistorianClusterEndpointPicker` for multi-node routing.
-   Maps `MxStatus` write outcomes to `HistorianWriteOutcome` enum
-   (Ack / PermanentFail / RetryPlease).
+**What it needed** (all done):

-2. `Program.cs` — build `AahClientManagedAlarmEventWriter` next to the
-   existing `BuildHistorian()` call; pass it to `HistorianFrameHandler`.
-   Gate behind `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED` env var (default `true`
-   when `OTOPCUA_HISTORIAN_ENABLED=true`).
+1. `SdkAlarmHistorianWriteBackend` builds a `HistorianEvent` per
+   `AlarmHistorianEventDto`, calls `AddStreamedValue`, and maps
+   `HistorianAccessError.ErrorValue` codes through
+   `AahClientManagedAlarmEventWriter.MapOutcome` (Ack / PermanentFail /
+   RetryPlease). `HistorianClusterEndpointPicker` drives multi-node failover.
+2. `Program.cs` — `BuildAlarmWriter()` constructs the backend gated behind
+   `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED`.
+3. `Install-Services.ps1` — env var present in the install-time block.
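
How a queue drain might act on the three outcomes can be sketched as follows. This is an illustration under stated assumptions only: `WriteOutcome` mirrors the Ack / PermanentFail / RetryPlease names above, while `apply_outcome`, the list-based queues, and the choice to dead-letter on `PermanentFail` are hypothetical and not taken from the `SqliteStoreAndForwardSink` or `AahClientManagedAlarmEventWriter` source.

```python
from enum import Enum


class WriteOutcome(Enum):
    ACK = "Ack"
    PERMANENT_FAIL = "PermanentFail"
    RETRY_PLEASE = "RetryPlease"


def apply_outcome(outcome: WriteOutcome, queue: list, dead_letters: list) -> bool:
    """Apply one write outcome to the row at the head of queue (hypothetical).

    Returns True when the drain loop may move on to the next row, and False
    when it should back off and retry the same row later.
    """
    if outcome is WriteOutcome.ACK:
        queue.pop(0)  # the historian accepted the event; drop the local copy
        return True
    if outcome is WriteOutcome.PERMANENT_FAIL:
        dead_letters.append(queue.pop(0))  # retrying cannot help; park the row
        return True
    return False  # RetryPlease: keep the row queued, e.g. after a transient error
```

Keeping `RetryPlease` rows at the head of the queue preserves event order; the trade-off is that a long outage lets the queue fill, which is where the bounded eviction described earlier comes in.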

-3. `Install-Services.ps1` — add the new env var to the install-time block.
-
-**What blocks C.1**: access to the `aahClientManaged` SDK on the dev box
-(confirmed available per `project_aveva_platform_installed.md` — AVEVA
-Historian SDK is present). C.1 can proceed without A.2 since the sidecar's
-`aahClientManaged` is x64 and does not share the worker's x86 bitness
-constraint.
+**What remains for C.1**: only the live-rig write smoke — the `Live_*` tests
+in `SdkAlarmHistorianWriteBackendTests` stay `Skip`-gated until D.1 confirms a
+round-trip against a real AVEVA Historian, including the exact mandatory
+`HistorianEvent` field set.

 **Tests to write**:

@@ -138,9 +138,9 @@ All three are verified closed in the 2026-04-23 exit-gate audit:

 These are real open items, not issues with the plan reconciliation.

-### Gap 1 — OPC UA method-call dispatch for scripted alarm Ack/Confirm/Shelve (Stream G / C.6)
+### Gap 1 — OPC UA method-call dispatch for scripted alarm methods (Stream G / C.6) — CLOSED

-`DriverNodeManager.MethodCall` does not route OPC UA `Acknowledge` / `Confirm` / `OneShotShelve` / `TimedShelve` / `Unshelve` / `AddComment` method invocations to the `ScriptedAlarmEngine`. Operators can acknowledge scripted alarms through the Admin UI today; OPC UA HMI clients expecting to use Part 9 method nodes directly cannot. Explicit in `phase-7-e2e-smoke.md` §"Known limitations".
+All Part 9 alarm methods now route to the `ScriptedAlarmEngine`. `Acknowledge` / `Confirm` / `AddComment` route via `DriverNodeManager.RouteScriptedAlarmMethodCalls` (task #24 + follow-up); `AddComment` gates at the `AlarmAcknowledge` tier. `OneShotShelve` / `TimedShelve` / `Unshelve` route via the native `AlarmConditionState.OnShelve` / `OnTimedUnshelve` hooks wired in `MarkAsAlarmCondition`, with the per-instance shelve method NodeIds indexed so the Call gate resolves them to `OpcUaOperation.AlarmShelve`.

 ### Gap 2 — Admin UI: no `/virtual-tags` tab or form (Stream F.2)

@@ -0,0 +1,54 @@
+# Loose ends
+
+State as of 2026-05-18, after the #9–#29 task-list run. Everything on the
+formal task list is shipped except #20; the items below are what genuinely
+remains, plus follow-ups surfaced during the run.
+
+## Open task
+
+- **#20 — D.1 dev-rig rollout smoke.** A full 3-service deployment
+  (gateway + worker + server + Wonderware historian sidecar): deploy the
+  refreshed binaries, run `scripts/install/Refresh-Services.ps1`, exercise
+  alarms end-to-end, and capture the rollout artifact. The code blockers
+  were cleared by #18; the act itself needs the physical AVEVA dev rig and
+  cannot be produced from a dev box. Runbook context in
+  `docs/plans/alarms-worker-wiring-plan.md`.
+
+## Follow-ups surfaced during the run
+
+- **~~C.1 live SDK binding.~~** DONE (code). `SdkAlarmHistorianWriteBackend`
+  (`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Backend/`) now
+  writes via the real entry point `HistorianAccess.AddStreamedValue(HistorianEvent,
+  out error)` in `aahClientManaged`. Two plan corrections found while pinning it:
+  (a) `ArchestrAAlarmsAndEvents.SDK` has no writer — it's a WCF query proxy;
+  (b) writes need their own `ReadOnly=false` connection, not the shared read
+  pool. Remaining: the live-rig write smoke (the `Live_*` tests are still
+  `Skip`-gated) — folds into #20 / D.1.
+
+- **~~#24 Shelve-method routing.~~** DONE. Acknowledge / Confirm already
+  routed; OneShotShelve / TimedShelve / Unshelve now route via the native
+  `AlarmConditionState.OnShelve` / `OnTimedUnshelve` hooks wired in
+  `DriverNodeManager.MarkAsAlarmCondition` (scripted alarms get a shelvable
+  `ShelvedStateMachine` subtree created before `alarm.Create`). The three
+  per-instance shelve method NodeIds are indexed so the Call gate resolves
+  them to `OpcUaOperation.AlarmShelve`. `AddComment` also now routes to the
+  engine (gated at the `AlarmAcknowledge` tier) — `phase-7-status.md` Gap 1
+  is fully closed. Remaining: address-space materialisation of the shelve
+  method nodes is best confirmed by a live OPC UA browse (pairs with the
+  G6 / D.1 rig steps).
+
+- **mxaccessgw alarm epic branch.** The alarm subsystem work (A.2/A.3/A.4
+  + the two production-gap fixes from #18) lives on the mxaccessgw branch
+  `docs/alarm-client-wm-app-finding`. It is NOT merged to mxaccessgw's main.
+  Whether/when to merge the alarm epic to main is an open release decision.
+
+- **#15 operator/lab GA gates.** Two v2 GA gates are manual lab steps, not
+  automatable here: the OPC UA CTT (Compliance Test Tool) pass and the
+  deployment-checklist signoff. Documented in
+  `docs/plans/v2-ga-lab-gates-plan.md`.
+
+## Done — for reference
+
+The 5 Phase 7 gaps discovered mid-run (#24–#28) were all completed and
+merged; no Phase 7 gaps remain open. Add any new follow-ups above as they
+are spun out.
@@ -0,0 +1,20 @@
+# Verifies code-reviews/README.md is regenerated from, and consistent with, the
+# per-module findings.md files. Intended as a CI / pre-commit gate.
+#
+# Exits non-zero when README.md is stale, when a module header's "Open findings"
+# count disagrees with its finding statuses, or when a finding carries an
+# unrecognised Status value. See REVIEW-PROCESS.md section 5.
+
+[CmdletBinding()]
+param()
+
+Set-StrictMode -Version Latest
+$ErrorActionPreference = "Stop"
+
+$repoRoot = Resolve-Path (Join-Path $PSScriptRoot "..")
+$script = Join-Path $repoRoot "code-reviews/regen-readme.py"
+
+# The bare `python3` alias on this platform resolves to the Windows Store stub;
+# `python` is the real interpreter.
+& python $script --check
+exit $LASTEXITCODE
@@ -1,7 +1,9 @@
+using System.Threading.Channels;
 using CliFx.Attributes;
 using CliFx.Infrastructure;
 using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
 using ZB.MOM.WW.OtOpcUa.Client.Shared;
+using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;

 namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;

@@ -49,19 +51,33 @@ public class AlarmsCommand : CommandBase

            var sourceNodeId = NodeIdParser.Parse(NodeId);

-            service.AlarmEvent += (_, e) =>
+            // Channel serialises SDK notification-thread writes to the main async loop so
+            // that concurrent alarm callbacks never interleave on the shared TextWriter.
+            var outputChannel = Channel.CreateUnbounded<string>(
+                new UnboundedChannelOptions { SingleReader = true });
+
+            void AlarmEventHandler(object? sender, AlarmEventArgs e)
            {
-                console.Output.WriteLine($"[{e.Time:O}] ALARM  {e.SourceName}");
-                console.Output.WriteLine($"  Condition: {e.ConditionName}");
-                var activeStr = e.ActiveState ? "Active" : "Inactive";
-                var ackedStr = e.AckedState ? "Acknowledged" : "Unacknowledged";
-                console.Output.WriteLine($"  State:     {activeStr}, {ackedStr}");
-                console.Output.WriteLine($"  Severity:  {e.Severity}");
-                if (!string.IsNullOrEmpty(e.Message))
-                    console.Output.WriteLine($"  Message:   {e.Message}");
-                console.Output.WriteLine($"  Retain:    {e.Retain}");
-                console.Output.WriteLine();
-            };
+                try
+                {
+                    var activeStr = e.ActiveState ? "Active" : "Inactive";
+                    var ackedStr = e.AckedState ? "Acknowledged" : "Unacknowledged";
+                    // Queue the whole event as one message so that lines from concurrent
+                    // callbacks cannot interleave inside the channel.
+                    var message = new System.Text.StringBuilder()
+                        .AppendLine($"[{e.Time:O}] ALARM  {e.SourceName}")
+                        .AppendLine($"  Condition: {e.ConditionName}")
+                        .AppendLine($"  State:     {activeStr}, {ackedStr}")
+                        .AppendLine($"  Severity:  {e.Severity}");
+                    if (!string.IsNullOrEmpty(e.Message))
+                        message.AppendLine($"  Message:   {e.Message}");
+                    message.AppendLine($"  Retain:    {e.Retain}");
+                    // The drain loop's WriteLineAsync supplies the trailing blank line.
+                    outputChannel.Writer.TryWrite(message.ToString());
+                }
+                catch
+                {
+                    // Never let handler exceptions escape into the SDK callback.
+                }
+            }
+
+            service.AlarmEvent += AlarmEventHandler;

            await service.SubscribeAlarmsAsync(sourceNodeId, Interval, ct);
            await console.Output.WriteLineAsync(
@@ -78,6 +94,14 @@ public class AlarmsCommand : CommandBase
                    await console.Output.WriteLineAsync($"Condition refresh not supported: {ex.Message}");
                }

+            // Drain the output channel on a single background reader until cancellation fires.
+            using var drainCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            var drainTask = Task.Run(async () =>
+            {
+                await foreach (var line in outputChannel.Reader.ReadAllAsync(drainCts.Token))
+                    await console.Output.WriteLineAsync(line);
+            }, CancellationToken.None);
+
            // Wait until cancellation
            try
            {
@@ -88,6 +112,12 @@ public class AlarmsCommand : CommandBase
                // Expected on Ctrl+C
            }

+            // Stop accepting new notifications before writing final output.
+            service.AlarmEvent -= AlarmEventHandler;
+            outputChannel.Writer.Complete();
+            await drainCts.CancelAsync();
+            try { await drainTask; } catch (OperationCanceledException) { }
+
            await service.UnsubscribeAlarmsAsync();
            await console.Output.WriteLineAsync("Unsubscribed.");
        }
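The handler/drain split above is a single-reader producer/consumer over `System.Threading.Channels`. A minimal sketch of that pattern, assuming `Parallel.For` as a stand-in for concurrent SDK notification threads and a `List<string>` as a stand-in for `console.Output`; `FormatEvent` is a hypothetical formatter. Each event is queued as one multi-line message, so its lines stay together even when callbacks overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// One reader owns the sink; any number of writers may call TryWrite concurrently.
var channel = Channel.CreateUnbounded<string>(
    new UnboundedChannelOptions { SingleReader = true });

// Stand-in for console.Output: only the drain task appends to it.
var output = new List<string>();

// Consumer: ReadAllAsync ends once the writer is completed and the buffer is empty.
var drain = Task.Run(async () =>
{
    await foreach (var message in channel.Reader.ReadAllAsync())
        output.Add(message);
});

// Producers: simulated concurrent notification callbacks, one message per event.
Parallel.For(0, 100, i => channel.Writer.TryWrite(FormatEvent(i)));

// Stop accepting messages, then wait for the reader to flush what is buffered.
channel.Writer.Complete();
await drain;

Console.WriteLine($"{output.Count} events written"); // 100 events written

// Hypothetical formatter: a two-line block per event, joined into one string.
static string FormatEvent(int id) =>
    string.Join(Environment.NewLine, $"ALARM {id}", $"  Severity: {id * 10}");
```

Completing the writer and then awaiting a reader that has no cancellation token flushes everything still buffered. The commands above instead stop the reader through a token linked to `ct`, which trades any lines still queued at Ctrl+C for a faster exit.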
@@ -1,3 +1,4 @@
+using System.Globalization;
 using CliFx.Attributes;
 using CliFx.Infrastructure;
 using Opc.Ua;
@@ -27,13 +28,13 @@ public class HistoryReadCommand : CommandBase
    /// <summary>
    /// Gets the optional history start time string supplied by the operator.
    /// </summary>
-    [CommandOption("start", Description = "Start time (ISO 8601 or date string, default: 24 hours ago)")]
+    [CommandOption("start", Description = "Start time in ISO 8601 UTC format, e.g. 2026-01-15T08:00:00Z (default: 24 hours ago)")]
    public string? StartTime { get; init; }

    /// <summary>
    /// Gets the optional history end time string supplied by the operator.
    /// </summary>
-    [CommandOption("end", Description = "End time (ISO 8601 or date string, default: now)")]
+    [CommandOption("end", Description = "End time in ISO 8601 UTC format, e.g. 2026-01-15T09:00:00Z (default: now)")]
    public string? EndTime { get; init; }

    /// <summary>
@@ -70,10 +71,12 @@ public class HistoryReadCommand : CommandBase
            var nodeId = NodeIdParser.ParseRequired(NodeId);
            var start = string.IsNullOrEmpty(StartTime)
                ? DateTime.UtcNow.AddHours(-24)
-                : DateTime.Parse(StartTime).ToUniversalTime();
+                : DateTime.Parse(StartTime, CultureInfo.InvariantCulture,
+                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
            var end = string.IsNullOrEmpty(EndTime)
                ? DateTime.UtcNow
-                : DateTime.Parse(EndTime).ToUniversalTime();
+                : DateTime.Parse(EndTime, CultureInfo.InvariantCulture,
+                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

            IReadOnlyList<DataValue> values;

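The `AssumeUniversal | AdjustToUniversal` pair is what makes an offset-less `--start`/`--end` value mean UTC: `AssumeUniversal` alone would interpret the string as UTC but then convert the result to local time. A small sketch comparing three spellings of the same instant; the timestamps are illustrative only.

```csharp
using System;
using System.Globalization;

// Same parse call as HistoryReadCommand: invariant culture, UTC in, UTC out.
static DateTime ParseUtc(string text) =>
    DateTime.Parse(text, CultureInfo.InvariantCulture,
        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

var noOffset = ParseUtc("2026-01-15T08:00:00");      // no zone: assumed UTC
var zulu = ParseUtc("2026-01-15T08:00:00Z");         // explicit UTC
var plusTwo = ParseUtc("2026-01-15T10:00:00+02:00"); // offset: converted to UTC

Console.WriteLine($"{noOffset:O} {noOffset.Kind}");   // 2026-01-15T08:00:00.0000000Z Utc
Console.WriteLine(noOffset == zulu && zulu == plusTwo); // True
```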
@@ -1,9 +1,11 @@
 using System.Collections.Concurrent;
+using System.Threading.Channels;
 using CliFx.Attributes;
 using CliFx.Infrastructure;
 using Opc.Ua;
 using ZB.MOM.WW.OtOpcUa.Client.CLI.Helpers;
 using ZB.MOM.WW.OtOpcUa.Client.Shared;
+using ZB.MOM.WW.OtOpcUa.Client.Shared.Models;

 namespace ZB.MOM.WW.OtOpcUa.Client.CLI.Commands;

@@ -63,19 +65,35 @@ public class SubscribeCommand : CommandBase
            var everBad = new ConcurrentDictionary<string, byte>();
            var displayNameByNodeId = targets.ToDictionary(t => t.nodeId.ToString(), t => t.displayPath);

-            service.DataChanged += (_, e) =>
+            // Channel serialises notification-thread writes to the main async loop so that
+            // concurrent SDK callbacks and main-thread summary output never interleave on
+            // the shared TextWriter.
+            var outputChannel = Channel.CreateUnbounded<string>(
+                new UnboundedChannelOptions { SingleReader = true });
+
+            void DataChangedHandler(object? sender, DataChangedEventArgs e)
            {
-                var key = e.NodeId.ToString();
-                lastStatus[key] = (e.Value.StatusCode, DateTime.UtcNow, e.Value.Value);
-                updateCount.AddOrUpdate(key, 1, (_, v) => v + 1);
-                if (!StatusCode.IsGood(e.Value.StatusCode))
-                    everBad.TryAdd(key, 0);
-                if (!Quiet)
+                try
                {
-                    console.Output.WriteLine(
-                        $"[{e.Value.SourceTimestamp:O}] {displayNameByNodeId.GetValueOrDefault(key, key)} = {e.Value.Value} ({e.Value.StatusCode})");
+                    var key = e.NodeId.ToString();
+                    lastStatus[key] = (e.Value.StatusCode, DateTime.UtcNow, e.Value.Value);
+                    updateCount.AddOrUpdate(key, 1, (_, v) => v + 1);
+                    if (!StatusCode.IsGood(e.Value.StatusCode))
+                        everBad.TryAdd(key, 0);
+                    if (!Quiet)
+                    {
+                        var line =
+                            $"[{e.Value.SourceTimestamp:O}] {displayNameByNodeId.GetValueOrDefault(key, key)} = {e.Value.Value} ({e.Value.StatusCode})";
+                        outputChannel.Writer.TryWrite(line);
+                    }
                }
-            };
+                catch
+                {
+                    // Never let handler exceptions escape into the SDK callback.
+                }
+            }
+
+            service.DataChanged += DataChangedHandler;

            var subscribed = 0;
            foreach (var (nodeId, _) in targets)
@@ -94,6 +112,14 @@ public class SubscribeCommand : CommandBase
            await console.Output.WriteLineAsync(
                $"Subscribed to {subscribed}/{targets.Count} nodes (interval: {Interval}ms). Press Ctrl+C to stop and print summary.");

+            // Drain the output channel on a single background reader until cancellation fires.
+            using var drainCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
+            var drainTask = Task.Run(async () =>
+            {
+                await foreach (var line in outputChannel.Reader.ReadAllAsync(drainCts.Token))
+                    await console.Output.WriteLineAsync(line);
+            }, CancellationToken.None);
+
            try
            {
                if (DurationSeconds > 0)
@@ -105,6 +131,12 @@ public class SubscribeCommand : CommandBase
            {
            }

+            // Stop accepting new notifications before writing the summary.
+            service.DataChanged -= DataChangedHandler;
+            outputChannel.Writer.Complete();
+            await drainCts.CancelAsync();
+            try { await drainTask; } catch (OperationCanceledException) { }
+
            // Summary
            var summary = new List<string>();
            summary.Add("");
@@ -127,10 +159,10 @@ public class SubscribeCommand : CommandBase
            }

            var neverWentBad = targets
-                .Where(t => !everBad.ContainsKey(t.nodeId.ToString()))
+                .Where(t => lastStatus.ContainsKey(t.nodeId.ToString()) && !everBad.ContainsKey(t.nodeId.ToString()))
                .Select(t => t.displayPath)
                .ToList();
-            var didGoBad = targets.Count - neverWentBad.Count;
+            var didGoBad = targets.Count(t => everBad.ContainsKey(t.nodeId.ToString()));

            summary.Add($"Total subscribed: {targets.Count}");
            summary.Add($"  Ever went BAD during window:    {didGoBad}");
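The two summary changes go together. A minimal sketch over a hypothetical three-node window (A stays good, B goes bad once, C never reports), using plain collections in place of the command's concurrent dictionaries: with the tightened `neverWentBad`, the old `targets.Count - neverWentBad.Count` would report the silent node C as having gone bad, so `didGoBad` is now counted directly from `everBad`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var targets = new[] { "A", "B", "C" };
var received = new HashSet<string> { "A", "B" }; // stands in for the lastStatus keys
var everBad = new HashSet<string> { "B" };

// New rules: "never went bad" requires at least one update; "went bad" is counted directly.
var neverWentBad = targets.Where(t => received.Contains(t) && !everBad.Contains(t)).ToList();
var didGoBad = targets.Count(t => everBad.Contains(t));
var noUpdates = targets.Count(t => !received.Contains(t));

// The old subtraction would now misclassify the silent node C.
var oldDidGoBad = targets.Length - neverWentBad.Count;

Console.WriteLine($"never bad: {neverWentBad.Count}, went bad: {didGoBad}, no updates: {noUpdates}, old formula: {oldDidGoBad}");
// never bad: 1, went bad: 1, no updates: 1, old formula: 2
```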
@@ -10,23 +10,53 @@ public static class ValueConverter
    ///     Converts a raw string value into the runtime type expected by the target node.
    /// </summary>
    /// <param name="rawValue">The raw string supplied by the user.</param>
-    /// <param name="currentValue">The current node value used to infer the target type. May be null.</param>
+    /// <param name="currentValue">
+    ///     The current node value used to infer the target type. When <c>null</c> the raw string
+    ///     is returned unchanged; callers should validate this case before writing.
+    /// </param>
    /// <returns>A typed value suitable for an OPC UA write request.</returns>
+    /// <exception cref="FormatException">
+    ///     Thrown with a descriptive message when <paramref name="rawValue"/> cannot be parsed
+    ///     into the type inferred from <paramref name="currentValue"/>.
+    /// </exception>
    public static object ConvertValue(string rawValue, object? currentValue)
    {
-        return currentValue switch
+        try
        {
-            bool => bool.Parse(rawValue),
-            byte => byte.Parse(rawValue),
-            short => short.Parse(rawValue),
-            ushort => ushort.Parse(rawValue),
-            int => int.Parse(rawValue),
-            uint => uint.Parse(rawValue),
-            long => long.Parse(rawValue),
-            ulong => ulong.Parse(rawValue),
-            float => float.Parse(rawValue),
-            double => double.Parse(rawValue),
-            _ => rawValue
+            return currentValue switch
+            {
+                bool => ParseBool(rawValue),
+                byte => byte.Parse(rawValue),
+                short => short.Parse(rawValue),
+                ushort => ushort.Parse(rawValue),
+                int => int.Parse(rawValue),
+                uint => uint.Parse(rawValue),
+                long => long.Parse(rawValue),
+                ulong => ulong.Parse(rawValue),
+                float => float.Parse(rawValue),
+                double => double.Parse(rawValue),
+                _ => rawValue
+            };
+        }
+        catch (Exception ex) when (ex is FormatException or OverflowException)
+        {
+            var targetType = currentValue?.GetType().Name ?? "unknown";
+            throw new FormatException(
+                $"Cannot convert value \"{rawValue}\" to target type {targetType}: {ex.Message}", ex);
+        }
+    }
+
+    /// <summary>
+    ///     Parses a boolean from common string representations including numeric and word forms.
+    ///     Accepts: <c>true</c>/<c>false</c>, <c>1</c>/<c>0</c>, <c>yes</c>/<c>no</c> (case-insensitive).
+    /// </summary>
+    private static bool ParseBool(string rawValue)
+    {
+        return rawValue.Trim().ToLowerInvariant() switch
+        {
+            "true" or "1" or "yes" => true,
+            "false" or "0" or "no" => false,
+            _ => throw new FormatException($"String '{rawValue}' was not recognized as a valid Boolean.")
        };
    }
 }
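A trimmed, runnable copy of the conversion logic, keeping only the `bool`, `byte`, and `int` arms, to show what callers see: word and numeric booleans, pass-through for types without an arm, and parse or overflow failures surfacing as one descriptive `FormatException` that wraps the original exception. Because the arms share no common type (the `string` arm guarantees that), the switch expression is target-typed to `object` and each value is boxed as its own type.

```csharp
using System;

Console.WriteLine(ConvertValue("Yes", false)); // True
Console.WriteLine(ConvertValue("42", 0));      // 42

try
{
    ConvertValue("300", (byte)0); // 300 does not fit in a byte
}
catch (FormatException ex)
{
    // Message names the target type; InnerException is the OverflowException.
    Console.WriteLine(ex.Message);
}

// Trimmed copy of ValueConverter.ConvertValue.
static object ConvertValue(string rawValue, object? currentValue)
{
    try
    {
        return currentValue switch
        {
            bool => ParseBool(rawValue),
            byte => byte.Parse(rawValue),
            int => int.Parse(rawValue),
            _ => rawValue
        };
    }
    catch (Exception ex) when (ex is FormatException or OverflowException)
    {
        var targetType = currentValue?.GetType().Name ?? "unknown";
        throw new FormatException(
            $"Cannot convert value \"{rawValue}\" to target type {targetType}: {ex.Message}", ex);
    }
}

static bool ParseBool(string rawValue) => rawValue.Trim().ToLowerInvariant() switch
{
    "true" or "1" or "yes" => true,
    "false" or "0" or "no" => false,
    _ => throw new FormatException($"String '{rawValue}' was not recognized as a valid Boolean.")
};
```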
@@ -15,9 +15,20 @@ public sealed class OpcUaClientService : IOpcUaClientService
 {
    private static readonly ILogger Logger = Log.ForContext<OpcUaClientService>();

+    // Guards all access to the subscription-bookkeeping state below
+    // (_activeDataSubscriptions and _activeAlarmSubscription). The dictionary
+    // and tuple are mutated from the caller thread, the keep-alive failover
+    // path, and DisconnectAsync, so every read/write must be inside this lock.
+    private readonly object _subscriptionLock = new();
+
    // Track active data subscriptions for replay after failover
    private readonly Dictionary<string, (NodeId NodeId, int IntervalMs, uint Handle)> _activeDataSubscriptions = new();

+    // Re-entry guard for HandleKeepAliveFailureAsync. The OPC UA stack raises
+    // KeepAlive repeatedly while a session is down; only one failover loop may
+    // run at a time. 0 = idle, 1 = failover in progress (Interlocked-managed).
+    private int _failoverInProgress;
+
    private readonly IApplicationConfigurationFactory _configFactory;
    private readonly IEndpointDiscovery _endpointDiscovery;

@@ -146,8 +157,12 @@ public sealed class OpcUaClientService : IOpcUaClientService
        }
        finally
        {
-            _activeDataSubscriptions.Clear();
-            _activeAlarmSubscription = null;
+            lock (_subscriptionLock)
+            {
+                _activeDataSubscriptions.Clear();
+                _activeAlarmSubscription = null;
+            }
+
            CurrentConnectionInfo = null;
            TransitionState(ConnectionState.Disconnected, endpointUrl);
        }
@@ -172,6 +187,9 @@ public sealed class OpcUaClientService : IOpcUaClientService
        if (value is string rawString)
        {
            var currentDataValue = await _session!.ReadValueAsync(nodeId, ct);
+            if (!StatusCode.IsGood(currentDataValue.StatusCode) || currentDataValue.Value == null)
+                throw new InvalidOperationException(
+                    $"Cannot infer target type for node {nodeId}: read returned status {currentDataValue.StatusCode} with no value. Provide a typed value instead of a string.");
            typedValue = ValueConverter.ConvertValue(rawString, currentDataValue.Value);
        }

@@ -223,15 +241,22 @@ public sealed class OpcUaClientService : IOpcUaClientService
        ThrowIfNotConnected();

        var nodeIdStr = nodeId.ToString();
-        if (_activeDataSubscriptions.ContainsKey(nodeIdStr))
-            return; // Already subscribed
+        lock (_subscriptionLock)
+        {
+            if (_activeDataSubscriptions.ContainsKey(nodeIdStr))
+                return; // Already subscribed
+        }

        if (_dataSubscription == null) _dataSubscription = await _session!.CreateSubscriptionAsync(intervalMs, ct);

        var handle = await _dataSubscription.AddDataChangeMonitoredItemAsync(
            nodeId, intervalMs, OnDataChangeNotification, ct);

-        _activeDataSubscriptions[nodeIdStr] = (nodeId, intervalMs, handle);
+        lock (_subscriptionLock)
+        {
+            _activeDataSubscriptions[nodeIdStr] = (nodeId, intervalMs, handle);
+        }
+
        Logger.Debug("Subscribed to data changes on {NodeId}", nodeId);
    }

@@ -241,12 +266,20 @@ public sealed class OpcUaClientService : IOpcUaClientService
        ThrowIfDisposed();

        var nodeIdStr = nodeId.ToString();
-        if (!_activeDataSubscriptions.TryGetValue(nodeIdStr, out var sub))
-            return; // Not subscribed, safe to ignore
+        (NodeId NodeId, int IntervalMs, uint Handle) sub;
+        lock (_subscriptionLock)
+        {
+            if (!_activeDataSubscriptions.TryGetValue(nodeIdStr, out sub))
+                return; // Not subscribed, safe to ignore
+        }

        if (_dataSubscription != null) await _dataSubscription.RemoveMonitoredItemAsync(sub.Handle, ct);

-        _activeDataSubscriptions.Remove(nodeIdStr);
+        lock (_subscriptionLock)
+        {
+            _activeDataSubscriptions.Remove(nodeIdStr);
+        }
+
        Logger.Debug("Unsubscribed from data changes on {NodeId}", nodeId);
    }

@@ -267,7 +300,11 @@ public sealed class OpcUaClientService : IOpcUaClientService
        await _alarmSubscription.AddEventMonitoredItemAsync(
            monitorNode, intervalMs, filter, OnAlarmEventNotification, ct);

-        _activeAlarmSubscription = (sourceNodeId, intervalMs);
+        lock (_subscriptionLock)
+        {
+            _activeAlarmSubscription = (sourceNodeId, intervalMs);
+        }
+
        Logger.Debug("Subscribed to alarm events on {NodeId}", monitorNode);
    }

@@ -281,7 +318,12 @@ public sealed class OpcUaClientService : IOpcUaClientService

        await _alarmSubscription.DeleteAsync(ct);
        _alarmSubscription = null;
-        _activeAlarmSubscription = null;
+
+        lock (_subscriptionLock)
+        {
+            _activeAlarmSubscription = null;
+        }
+
        Logger.Debug("Unsubscribed from alarm events");
    }

@@ -349,10 +391,17 @@ public sealed class OpcUaClientService : IOpcUaClientService

        var redundancySupportValue =
            await _session!.ReadValueAsync(VariableIds.Server_ServerRedundancy_RedundancySupport, ct);
-        var redundancyMode = ((RedundancySupport)(int)redundancySupportValue.Value).ToString();
+        RedundancySupport redundancySupport;
+        if (StatusCode.IsGood(redundancySupportValue.StatusCode) && redundancySupportValue.Value != null)
+            redundancySupport = (RedundancySupport)Convert.ToInt32(redundancySupportValue.Value);
+        else
+            redundancySupport = RedundancySupport.None;
+        var redundancyMode = redundancySupport.ToString();

        var serviceLevelValue = await _session.ReadValueAsync(VariableIds.Server_ServiceLevel, ct);
-        var serviceLevel = (byte)serviceLevelValue.Value;
+        var serviceLevel = StatusCode.IsGood(serviceLevelValue.StatusCode) && serviceLevelValue.Value != null
+            ? Convert.ToByte(serviceLevelValue.Value)
+            : (byte)0;

        string[] serverUris = [];
        try
@@ -393,8 +442,13 @@ public sealed class OpcUaClientService : IOpcUaClientService
        _dataSubscription?.Dispose();
        _alarmSubscription?.Dispose();
        _session?.Dispose();
-        _activeDataSubscriptions.Clear();
-        _activeAlarmSubscription = null;
+
+        lock (_subscriptionLock)
+        {
+            _activeDataSubscriptions.Clear();
+            _activeAlarmSubscription = null;
+        }
+
        CurrentConnectionInfo = null;
        _state = ConnectionState.Disconnected;
    }
@@ -430,6 +484,26 @@ public sealed class OpcUaClientService : IOpcUaClientService
    }

    private async Task HandleKeepAliveFailureAsync()
+    {
+        // Serialize failover: the OPC UA stack raises KeepAlive repeatedly
+        // while a session is down, so multiple bad keep-alives can fire before
+        // the first failover loop finishes. CompareExchange atomically claims
+        // the failover slot; a re-entrant call sees 1 and returns immediately,
+        // guaranteeing exactly one failover loop runs at a time.
+        if (Interlocked.CompareExchange(ref _failoverInProgress, 1, 0) != 0)
+            return;
+
+        try
+        {
+            await RunFailoverAsync();
+        }
+        finally
+        {
+            Interlocked.Exchange(ref _failoverInProgress, 0);
+        }
+    }
+
+    private async Task RunFailoverAsync()
    {
        if (_state == ConnectionState.Reconnecting || _state == ConnectionState.Disconnected)
            return;
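The `CompareExchange` claim/release pair can be exercised without a live session. A minimal sketch, assuming a `TaskCompletionSource` gate as a hypothetical stand-in for the reconnect work in `RunFailoverAsync`: the first call claims the slot and waits on the gate, repeated keep-alive failures return immediately, and once the slot is released a later failure can start a new failover.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var failoverInProgress = 0; // 0 = idle, 1 = failover running
var failoverRuns = 0;
var reconnectGate = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);

async Task HandleKeepAliveFailureAsync()
{
    // Atomically claim the slot; a concurrent or re-entrant caller sees 1 and leaves.
    if (Interlocked.CompareExchange(ref failoverInProgress, 1, 0) != 0)
        return;

    try
    {
        Interlocked.Increment(ref failoverRuns);
        await reconnectGate.Task; // hypothetical stand-in for RunFailoverAsync()
    }
    finally
    {
        Interlocked.Exchange(ref failoverInProgress, 0); // release even if failover throws
    }
}

var first = HandleKeepAliveFailureAsync(); // claims the slot, then waits on the gate

// Nine more bad keep-alives while the first failover is still running: all no-ops.
var repeats = new Task[9];
for (var i = 0; i < repeats.Length; i++)
    repeats[i] = HandleKeepAliveFailureAsync();
await Task.WhenAll(repeats);

reconnectGate.SetResult(); // let the first failover finish
await first;

await HandleKeepAliveFailureAsync(); // slot is free again, so this one runs

Console.WriteLine(failoverRuns); // 2
```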
@@ -498,33 +572,43 @@ public sealed class OpcUaClientService : IOpcUaClientService

    private async Task ReplaySubscriptionsAsync()
    {
-        // Replay data subscriptions
-        if (_activeDataSubscriptions.Count > 0)
+        // Snapshot the bookkeeping state under the lock, then clear it so the
+        // replayed handles can be recorded fresh as each monitored item is
+        // re-created. Awaited calls run outside the lock.
+        List<KeyValuePair<string, (NodeId NodeId, int IntervalMs, uint Handle)>> subscriptions;
+        (NodeId? SourceNodeId, int IntervalMs)? alarmSubscription;
+        lock (_subscriptionLock)
        {
-            var subscriptions = _activeDataSubscriptions.ToList();
+            subscriptions = _activeDataSubscriptions.ToList();
+            alarmSubscription = _activeAlarmSubscription;
            _activeDataSubscriptions.Clear();
-
-            foreach (var (nodeIdStr, (nodeId, intervalMs, _)) in subscriptions)
-                try
-                {
-                    if (_dataSubscription == null)
-                        _dataSubscription = await _session!.CreateSubscriptionAsync(intervalMs, CancellationToken.None);
-
-                    var handle = await _dataSubscription.AddDataChangeMonitoredItemAsync(
-                        nodeId, intervalMs, OnDataChangeNotification, CancellationToken.None);
-                    _activeDataSubscriptions[nodeIdStr] = (nodeId, intervalMs, handle);
-                }
-                catch (Exception ex)
-                {
-                    Logger.Warning(ex, "Failed to replay data subscription for {NodeId}", nodeIdStr);
-                }
+            _activeAlarmSubscription = null;
        }

+        // Replay data subscriptions
+        foreach (var (nodeIdStr, (nodeId, intervalMs, _)) in subscriptions)
+            try
+            {
+                if (_dataSubscription == null)
+                    _dataSubscription = await _session!.CreateSubscriptionAsync(intervalMs, CancellationToken.None);
+
+                var handle = await _dataSubscription.AddDataChangeMonitoredItemAsync(
+                    nodeId, intervalMs, OnDataChangeNotification, CancellationToken.None);
+
+                lock (_subscriptionLock)
+                {
+                    _activeDataSubscriptions[nodeIdStr] = (nodeId, intervalMs, handle);
+                }
+            }
+            catch (Exception ex)
+            {
+                Logger.Warning(ex, "Failed to replay data subscription for {NodeId}", nodeIdStr);
+            }
+
        // Replay alarm subscription
-        if (_activeAlarmSubscription.HasValue)
+        if (alarmSubscription.HasValue)
        {
-            var (sourceNodeId, intervalMs) = _activeAlarmSubscription.Value;
-            _activeAlarmSubscription = null;
+            var (sourceNodeId, intervalMs) = alarmSubscription.Value;
            try
            {
                var monitorNode = sourceNodeId ?? ObjectIds.Server;
@@ -532,7 +616,11 @@ public sealed class OpcUaClientService : IOpcUaClientService
                var filter = CreateAlarmEventFilter();
                await _alarmSubscription.AddEventMonitoredItemAsync(
                    monitorNode, intervalMs, filter, OnAlarmEventNotification, CancellationToken.None);
-                _activeAlarmSubscription = (sourceNodeId, intervalMs);
+
+                lock (_subscriptionLock)
+                {
+                    _activeAlarmSubscription = (sourceNodeId, intervalMs);
+                }
            }
            catch (Exception ex)
            {
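C# does not allow `await` inside a `lock` body (compiler error CS1996), which is why `ReplaySubscriptionsAsync` snapshots and clears the bookkeeping under the lock, awaits outside it, and re-takes the lock only to record each new handle. A minimal sketch of that shape, with `Task.Yield()` as a hypothetical stand-in for `AddDataChangeMonitoredItemAsync`, made-up node ids, and a counter standing in for server-assigned handles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var subscriptionLock = new object();
var active = new Dictionary<string, uint> { ["ns=2;s=Pump1"] = 1, ["ns=2;s=Pump2"] = 2 };
uint nextHandle = 100; // hypothetical stand-in for handles issued by the new session

async Task ReplayAsync()
{
    // 1. Snapshot and clear under the lock; no awaits in here.
    List<string> snapshot;
    lock (subscriptionLock)
    {
        snapshot = active.Keys.ToList();
        active.Clear();
    }

    // 2. Re-create each item outside the lock, recording the fresh handle under it.
    foreach (var nodeId in snapshot)
    {
        await Task.Yield(); // stand-in for the awaited monitored-item call
        lock (subscriptionLock)
        {
            active[nodeId] = nextHandle++;
        }
    }
}

await ReplayAsync();
Console.WriteLine(active.Count); // 2
```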
@@ -549,7 +637,7 @@ public sealed class OpcUaClientService : IOpcUaClientService
    private void OnAlarmEventNotification(EventFieldList eventFields)
    {
        var fields = eventFields.EventFields;
-        if (fields == null || fields.Count < 6)
+        if (fields == null || fields.Count < 1)
            return;

        var eventId = fields.Count > 0 ? fields[0].Value as byte[] : null;
@@ -578,6 +666,8 @@ public sealed class OpcUaClientService : IOpcUaClientService
        // Fallback: read InAlarm/Acked from condition node Galaxy attributes
        // when the server doesn't populate standard event fields.
        // Must run on a background thread to avoid deadlocking the notification thread.
+        // Capture the session reference now so the background body can detect a concurrent
+        // failover that replaces _session before it runs; in that case the event is dropped.
        if (ackedField == null && activeField == null && conditionNodeId != null && _session != null)
        {
            var session = _session;
@@ -585,6 +675,11 @@ public sealed class OpcUaClientService : IOpcUaClientService
            var capturedMessage = message;
            _ = Task.Run(async () =>
            {
+                // If a failover replaced the session before this body ran, drop the event (as the
+                // ObjectDisposedException path below does) instead of reading from a torn-down session.
+                if (!ReferenceEquals(session, _session))
+                    return;
+
                try
                {
                    var inAlarmValue = await session.ReadValueAsync(NodeId.Parse($"{capturedConditionNodeId}.InAlarm"));
@@ -609,9 +704,16 @@ public sealed class OpcUaClientService : IOpcUaClientService
                    }
                    catch { /* DescAttrName may not exist */ }
                }
+                catch (ObjectDisposedException)
+                {
+                    // Session was disposed during supplemental reads (concurrent failover);
+                    // skip the event rather than delivering stale/default states.
+                    Logger.Debug("Supplemental alarm read skipped — session disposed during failover for {ConditionNodeId}", capturedConditionNodeId);
+                    return;
+                }
                catch
                {
-                    // Supplemental read failed; use defaults
+                    // Other supplemental read failure; deliver event with defaults
                }

                AlarmEvent?.Invoke(this, new AlarmEventArgs(
@@ -17,11 +17,6 @@ public sealed class UserSettings
    /// </summary>
    public string? Username { get; set; }

-    /// <summary>
-    /// Gets or sets the persisted password for authenticated sessions.
-    /// </summary>
-    public string? Password { get; set; }
-
    /// <summary>
    /// Gets or sets the transport security mode selected by the user.
    /// </summary>
@@ -215,6 +215,16 @@ public partial class AlarmsViewModel : ObservableObject
        ActiveAlarmCount = 0;
    }

+    /// <summary>
+    ///     Re-hooks event handlers to the service after a server-side reconnect.
+    ///     Safe to call when already attached: the handler is removed before it is re-added, so it
+    ///     is never subscribed twice (a duplicate += would add a second entry to the invocation list).
+    /// </summary>
+    public void Reattach()
+    {
+        _service.AlarmEvent -= OnAlarmEvent;
+        _service.AlarmEvent += OnAlarmEvent;
+    }
+
    /// <summary>
    ///     Unhooks event handlers from the service.
    /// </summary>
@@ -73,7 +73,7 @@ public partial class HistoryViewModel : ObservableObject
    {
        if (string.IsNullOrEmpty(SelectedNodeId)) return;

-        IsLoading = true;
+        _dispatcher.Post(() => IsLoading = true);
        _dispatcher.Post(() => Results.Clear());

        try
@@ -10,7 +10,7 @@ namespace ZB.MOM.WW.OtOpcUa.Client.UI.ViewModels;
 /// <summary>
 ///     Main window ViewModel coordinating all panels.
 /// </summary>
-public partial class MainWindowViewModel : ObservableObject
+public partial class MainWindowViewModel : ObservableObject, IDisposable
 {
    private readonly IUiDispatcher _dispatcher;
    private readonly IOpcUaClientServiceFactory _factory;
@@ -166,6 +166,8 @@ public partial class MainWindowViewModel : ObservableObject
        {
            case ConnectionState.Connected:
                StatusMessage = $"Connected to {EndpointUrl}";
+                Subscriptions?.Reattach();
+                Alarms?.Reattach();
                break;
            case ConnectionState.Reconnecting:
                StatusMessage = "Reconnecting...";
@@ -177,6 +179,8 @@ public partial class MainWindowViewModel : ObservableObject
                StatusMessage = "Disconnected";
                SessionLabel = string.Empty;
                RedundancyInfo = null;
+                Subscriptions?.Teardown();
+                Alarms?.Teardown();
                BrowseTree?.Clear();
                ReadWrite?.Clear();
                Subscriptions?.Clear();
@@ -252,7 +256,7 @@ public partial class MainWindowViewModel : ObservableObject
            }

            // Load root nodes
-            await BrowseTree.LoadRootsAsync();
+            if (BrowseTree != null) await BrowseTree.LoadRootsAsync();

            // Restore saved subscriptions
            if (_savedSubscribedNodes.Count > 0 && Subscriptions != null)
@@ -330,7 +334,7 @@ public partial class MainWindowViewModel : ObservableObject
        if (SelectedTreeNodes.Count == 0 || !IsConnected) return;

        var node = SelectedTreeNodes[0];
-        History.SelectedNodeId = node.NodeId;
+        if (History != null) History.SelectedNodeId = node.NodeId;
        SelectedTabIndex = 3; // History tab
    }

@@ -376,7 +380,7 @@ public partial class MainWindowViewModel : ObservableObject
        var s = _settingsService.Load();
        EndpointUrl = s.EndpointUrl;
        Username = s.Username;
-        Password = s.Password;
+        // Password is intentionally not persisted (security: re-prompt each launch)
        SelectedSecurityMode = s.SecurityMode;
        FailoverUrls = s.FailoverUrls;
        SessionTimeoutSeconds = s.SessionTimeoutSeconds;
@@ -396,7 +400,7 @@ public partial class MainWindowViewModel : ObservableObject
        {
            EndpointUrl = EndpointUrl,
            Username = Username,
-            Password = Password,
+            // Password is intentionally not persisted (security: re-prompt each launch)
            SecurityMode = SelectedSecurityMode,
            FailoverUrls = FailoverUrls,
            SessionTimeoutSeconds = SessionTimeoutSeconds,
@@ -407,6 +411,21 @@ public partial class MainWindowViewModel : ObservableObject
        });
    }

+    /// <summary>
+    /// Detaches the connection-state handler, tears down the subscription and alarm event hooks,
+    /// and disposes the OPC UA client service (session, certificate validator, reconnect resources).
+    /// </summary>
+    public void Dispose()
+    {
+        if (_service != null)
+        {
+            _service.ConnectionStateChanged -= OnConnectionStateChanged;
+            Subscriptions?.Teardown();
+            Alarms?.Teardown();
+            _service.Dispose();
+        }
+    }
+
    private static string[]? ParseFailoverUrls(string? csv)
    {
        if (string.IsNullOrWhiteSpace(csv))
@@ -265,6 +265,16 @@ public partial class SubscriptionsViewModel : ObservableObject
        SubscriptionCount = 0;
    }

+    /// <summary>
+    ///     Re-hooks event handlers to the service after a server-side reconnect.
+    ///     Safe to call when already attached: the -= before += keeps exactly one subscription (a bare duplicate += would register the handler a second time).
+    /// </summary>
+    public void Reattach()
+    {
+        _service.DataChanged -= OnDataChanged;
+        _service.DataChanged += OnDataChanged;
+    }
+
    /// <summary>
    ///     Unhooks event handlers from the service.
    /// </summary>
@@ -140,7 +140,10 @@ public partial class MainWindow : Window
    protected override void OnClosing(WindowClosingEventArgs e)
    {
        if (DataContext is MainWindowViewModel vm)
+        {
            vm.SaveSettings();
+            vm.Dispose();
+        }
        base.OnClosing(e);
    }
 }
@@ -5,19 +5,27 @@ namespace ZB.MOM.WW.OtOpcUa.Configuration;

 /// <summary>
 ///     Used by <c>dotnet ef</c> at design time (migrations, scaffolding). Reads the connection string
-///     from the <c>OTOPCUA_CONFIG_CONNECTION</c> environment variable, falling back to the local dev
-///     container on <c>localhost:1433</c>.
+///     from the <c>OTOPCUA_CONFIG_CONNECTION</c> environment variable.
 /// </summary>
+/// <remarks>
+///     Set the variable before running migration commands, e.g.:
+///     <code>
+///         $env:OTOPCUA_CONFIG_CONNECTION = "Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;"
+///         dotnet ef migrations add MyMigration --project src/Core/ZB.MOM.WW.OtOpcUa.Configuration
+///     </code>
+///     No credential is embedded in source. Do not add a plaintext password as a fallback.
+/// </remarks>
 public sealed class DesignTimeDbContextFactory : IDesignTimeDbContextFactory<OtOpcUaConfigDbContext>
 {
-    // Host-port 14330 avoids collision with the native MSSQL14 instance on 1433 (Galaxy "ZB" DB).
-    private const string DefaultConnectionString =
-        "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;";
-
    public OtOpcUaConfigDbContext CreateDbContext(string[] args)
    {
-        var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION")
-                         ?? DefaultConnectionString;
+        var connection = Environment.GetEnvironmentVariable("OTOPCUA_CONFIG_CONNECTION");
+
+        if (string.IsNullOrWhiteSpace(connection))
+            throw new InvalidOperationException(
+                "OTOPCUA_CONFIG_CONNECTION is not set. " +
+                "Export the variable before running 'dotnet ef' commands. Example: " +
+                "Server=10.100.0.35,14330;Database=OtOpcUaConfig;Trusted_Connection=True;TrustServerCertificate=True;");

        var options = new DbContextOptionsBuilder<OtOpcUaConfigDbContext>()
            .UseSqlServer(connection, sql => sql.MigrationsAssembly(typeof(OtOpcUaConfigDbContext).Assembly.FullName))
@@ -8,5 +8,11 @@ public enum NodeAclScopeKind
    UnsArea,
    UnsLine,
    Equipment,
+    /// <summary>
+    ///     A Galaxy (SystemPlatform-kind) folder segment anchored below a namespace.
+    ///     Distinguishes folder grants from UNS <see cref="Equipment"/> grants in the
+    ///     <c>AuthorizationDecision.Provenance</c> audit trail and Admin UI diagnostics.
+    /// </summary>
+    FolderSegment,
    Tag,
 }
@@ -48,7 +48,13 @@ public sealed class ResilientConfigReader
                UseJitter = true,
                Delay = TimeSpan.FromMilliseconds(100),
                MaxDelay = TimeSpan.FromSeconds(1),
-                ShouldHandle = new PredicateBuilder().Handle<Exception>(ex => ex is not OperationCanceledException),
+                // Handle ALL exceptions including OperationCanceledException. A SQL command-level
+                // timeout surfaces as TaskCanceledException (derives from OperationCanceledException)
+                // when the caller's token is NOT cancelled, and must be retried just like any other
+                // transient error. Polly itself checks the cancellation token between retries and
+                // stops with OperationCanceledException on genuine caller cancellation regardless of
+                // this predicate.
+                ShouldHandle = new PredicateBuilder().Handle<Exception>(),
            });
        }

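The predicate change matters because `TaskCanceledException` derives from `OperationCanceledException`, so the old `ex is not OperationCanceledException` filter also excluded SQL command timeouts. The following minimal sketch isolates the distinction drawn by the new catch filter in `ResilientConfigReader`. `ShouldFallBackToCache` is a hypothetical name for that filter expression, not a method in the codebase.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical standalone copy of the catch filter: fall back to the sealed cache for
// everything except a cancellation the caller actually requested.
static bool ShouldFallBackToCache(Exception ex, CancellationToken callerToken) =>
    ex is not OperationCanceledException || !callerToken.IsCancellationRequested;

// A command-level timeout is reported as TaskCanceledException, an OperationCanceledException subtype.
Exception commandTimeout = new TaskCanceledException("SQL command timed out");
bool timeoutIsOce = commandTimeout is OperationCanceledException;

using var callerCts = new CancellationTokenSource();

// Caller has not cancelled: the timeout counts as transient and falls back to the cache.
bool timeoutFallsBack = ShouldFallBackToCache(commandTimeout, callerCts.Token);

// Caller cancels: a cancellation exception now propagates instead of falling back.
callerCts.Cancel();
bool cancellationFallsBack =
    ShouldFallBackToCache(new OperationCanceledException(callerCts.Token), callerCts.Token);

Console.WriteLine($"{timeoutIsOce} {timeoutFallsBack} {cancellationFallsBack}");
```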
@@ -76,7 +82,11 @@ public sealed class ResilientConfigReader
            _staleFlag.MarkFresh();
            return result;
        }
-        catch (Exception ex) when (ex is not OperationCanceledException)
+        // Catch all exceptions that are NOT genuine caller cancellations. A SQL command-level
+        // timeout surfaces as TaskCanceledException (derives from OperationCanceledException)
+        // but the caller's token is NOT cancelled — we must fall back to the sealed cache for
+        // that case, not propagate. Only rethrow if the caller actually requested cancellation.
+        catch (Exception ex) when (ex is not OperationCanceledException || !cancellationToken.IsCancellationRequested)
        {
            _logger.LogWarning(ex, "Central-DB read failed after retries; falling back to sealed cache for cluster {ClusterId}", clusterId);
            // GenerationCacheUnavailableException surfaces intentionally — fails the caller's
@@ -145,9 +145,12 @@ BEGIN
        (NodeId, CurrentGenerationId, LastAppliedAt, LastAppliedStatus, LastAppliedError, LastSeenAt)
        VALUES (@NodeId, @GenerationId, SYSUTCDATETIME(), @Status, @Error, SYSUTCDATETIME());

+    -- Build DetailsJson via STRING_ESCAPE so a @Status containing a double-quote/backslash cannot
+    -- produce malformed JSON (which would fail CK_ConfigAuditLog_DetailsJson_IsJson and abort the
+    -- transaction) or inject extra JSON structure into the audit record.
    INSERT dbo.ConfigAuditLog (Principal, EventType, NodeId, GenerationId, DetailsJson)
    VALUES (@Caller, 'NodeApplied', @NodeId, @GenerationId,
-            CONCAT('{""status"":""', @Status, '""}'));
+            CONCAT('{""status"":""', STRING_ESCAPE(@Status, 'json'), '""}'));
 END
 ";

@@ -267,7 +270,22 @@ BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

-    BEGIN TRANSACTION;
+    -- Transaction-nesting awareness: if a caller (e.g. sp_RollbackToGeneration) already
+    -- holds a transaction, we use SAVE TRANSACTION so our failure path rolls back only to
+    -- the savepoint instead of issuing a bare ROLLBACK that wipes the caller's transaction
+    -- (which sets @@TRANCOUNT = 0 and causes error 3902 on the caller's subsequent COMMIT).
+    DECLARE @OwnsTxn  bit = 0;
+    DECLARE @SaveName nvarchar(32) = N'sp_PublishGeneration';
+
+    IF @@TRANCOUNT = 0
+    BEGIN
+        BEGIN TRANSACTION;
+        SET @OwnsTxn = 1;
+    END
+    ELSE
+    BEGIN
+        SAVE TRANSACTION sp_PublishGeneration;
+    END

    DECLARE @Lock nvarchar(255) = N'OtOpcUa_Publish_' + @ClusterId;
    DECLARE @LockResult int;
@@ -275,11 +293,24 @@ BEGIN
    IF @LockResult < 0
    BEGIN
        RAISERROR('PublishConflict: another publish is in progress for cluster %s', 16, 1, @ClusterId);
-        ROLLBACK;
+        IF @OwnsTxn = 1 ROLLBACK;
+        ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
        RETURN;
    END

-    EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;
+    -- sp_ValidateDraft signals every rejection with RAISERROR(..., 16, 1) — a severity-16 error is
+    -- NOT batch-aborting and SET XACT_ABORT ON does not abort the transaction for it, so without a
+    -- TRY/CATCH control would return here and the draft would publish despite failed validation.
+    -- Catch the validation error, roll back the publish transaction (only to our savepoint when a
+    -- caller owns the outer transaction), and re-raise so the caller sees the real validation failure.
+    BEGIN TRY
+        EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;
+    END TRY
+    BEGIN CATCH
+        IF @OwnsTxn = 1 ROLLBACK;
+        ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
+        THROW;
+    END CATCH

    MERGE dbo.ExternalIdReservation AS tgt
    USING (
@@ -310,15 +341,16 @@ BEGIN

    IF @@ROWCOUNT = 0
    BEGIN
-        RAISERROR('Draft %I64d for cluster %s not found (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId);
-        ROLLBACK;
+        RAISERROR('Draft %I64d for cluster %s not in Draft status (was it already published?)', 16, 1, @DraftGenerationId, @ClusterId);
+        IF @OwnsTxn = 1 ROLLBACK;
+        ELSE ROLLBACK TRANSACTION sp_PublishGeneration;
        RETURN;
    END

    INSERT dbo.ConfigAuditLog (Principal, EventType, ClusterId, GenerationId)
    VALUES (SUSER_SNAME(), 'Published', @ClusterId, @DraftGenerationId);

-    COMMIT;
+    IF @OwnsTxn = 1 COMMIT;
 END
 ";

@@ -369,9 +401,11 @@ BEGIN

    EXEC dbo.sp_PublishGeneration @ClusterId = @ClusterId, @DraftGenerationId = @NewGenId, @Notes = @Notes;

+    -- @TargetGenerationId is a bigint, but build the JSON value via an explicit numeric CONVERT so
+    -- the emitted token is always a bare JSON number — never reliant on implicit string coercion.
    INSERT dbo.ConfigAuditLog (Principal, EventType, ClusterId, GenerationId, DetailsJson)
    VALUES (SUSER_SNAME(), 'RolledBack', @ClusterId, @NewGenId,
-            CONCAT('{""rolledBackTo"":', @TargetGenerationId, '}'));
+            CONCAT('{""rolledBackTo"":', CONVERT(nvarchar(20), CONVERT(bigint, @TargetGenerationId)), '}'));

    COMMIT;
 END
@@ -464,9 +498,12 @@ BEGIN
        RETURN;
    END

+    -- Escape both caller-supplied values via STRING_ESCAPE so quotes/backslashes cannot break the
+    -- JSON document or inject additional structure into the audit record.
    INSERT dbo.ConfigAuditLog (Principal, EventType, DetailsJson)
    VALUES (SUSER_SNAME(), 'ExternalIdReleased',
-            CONCAT('{""kind"":""', @Kind, '"",""value"":""', @Value, '""}'));
+            CONCAT('{""kind"":""', STRING_ESCAPE(@Kind, 'json'),
+                   '"",""value"":""', STRING_ESCAPE(@Value, 'json'), '""}'));
 END
 ";
    }
@@ -0,0 +1,120 @@
+using Microsoft.EntityFrameworkCore.Migrations;
+
+#nullable disable
+
+namespace ZB.MOM.WW.OtOpcUa.Configuration.Migrations
+{
+    /// <summary>
+    ///     Admin-008: adds <c>@ReleasedBy</c> parameter to
+    ///     <c>dbo.sp_ReleaseExternalIdReservation</c> so the operator principal name (the LDAP
+    ///     sign-in) is recorded in <c>ExternalIdReservation.ReleasedBy</c> and the
+    ///     <c>ConfigAuditLog.Principal</c> column.
+    ///
+    ///     Prior to this migration the proc used <c>SUSER_SNAME()</c> for both columns, which
+    ///     recorded the shared SQL service account rather than the Admin-UI operator who performed
+    ///     the release — making the audit trail useless for attribution. The stored proc now
+    ///     accepts <c>@ReleasedBy nvarchar(128)</c> and uses it for both columns; validation
+    ///     rejects a null/empty value the same way <c>@ReleaseReason</c> is validated.
+    /// </summary>
+    /// <inheritdoc />
+    public partial class AddReleasedByToReleaseExternalIdReservation : Migration
+    {
+        /// <inheritdoc />
+        protected override void Up(MigrationBuilder migrationBuilder)
+        {
+            migrationBuilder.Sql(Procs.ReleaseExternalIdReservationV2);
+        }
+
+        /// <inheritdoc />
+        protected override void Down(MigrationBuilder migrationBuilder)
+        {
+            migrationBuilder.Sql(Procs.ReleaseExternalIdReservationV1);
+        }
+
+        private static class Procs
+        {
+            /// <summary>V2 — accepts <c>@ReleasedBy</c> for proper operator attribution.</summary>
+            public const string ReleaseExternalIdReservationV2 = @"
+CREATE OR ALTER PROCEDURE dbo.sp_ReleaseExternalIdReservation
+    @Kind           nvarchar(16),
+    @Value          nvarchar(64),
+    @ReleaseReason  nvarchar(512),
+    @ReleasedBy     nvarchar(128)
+AS
+BEGIN
+    SET NOCOUNT ON;
+    SET XACT_ABORT ON;
+
+    IF @ReleaseReason IS NULL OR LEN(@ReleaseReason) = 0
+    BEGIN
+        RAISERROR('ReleaseReason is required', 16, 1);
+        RETURN;
+    END
+
+    IF @ReleasedBy IS NULL OR LEN(@ReleasedBy) = 0
+    BEGIN
+        RAISERROR('ReleasedBy is required', 16, 1);
+        RETURN;
+    END
+
+    UPDATE dbo.ExternalIdReservation
+    SET    ReleasedAt     = SYSUTCDATETIME(),
+           ReleasedBy     = @ReleasedBy,
+           ReleaseReason  = @ReleaseReason
+    WHERE  Kind = @Kind AND Value = @Value AND ReleasedAt IS NULL;
+
+    IF @@ROWCOUNT = 0
+    BEGIN
+        RAISERROR('No active reservation found for (%s, %s)', 16, 1, @Kind, @Value);
+        RETURN;
+    END
+
+    -- Escape the caller-supplied values embedded in DetailsJson via STRING_ESCAPE so quotes/backslashes cannot break the
+    -- JSON document or inject additional structure into the audit record.
+    INSERT dbo.ConfigAuditLog (Principal, EventType, DetailsJson)
+    VALUES (@ReleasedBy, 'ExternalIdReleased',
+            CONCAT('{""kind"":""', STRING_ESCAPE(@Kind, 'json'),
+                   '"",""value"":""', STRING_ESCAPE(@Value, 'json'), '""}'));
+END
+";
+
+            /// <summary>V1 — original proc (uses SUSER_SNAME() for attribution). Restored on Down().</summary>
+            public const string ReleaseExternalIdReservationV1 = @"
+CREATE OR ALTER PROCEDURE dbo.sp_ReleaseExternalIdReservation
+    @Kind           nvarchar(16),
+    @Value          nvarchar(64),
+    @ReleaseReason  nvarchar(512)
+AS
+BEGIN
+    SET NOCOUNT ON;
+    SET XACT_ABORT ON;
+
+    IF @ReleaseReason IS NULL OR LEN(@ReleaseReason) = 0
+    BEGIN
+        RAISERROR('ReleaseReason is required', 16, 1);
+        RETURN;
+    END
+
+    UPDATE dbo.ExternalIdReservation
+    SET    ReleasedAt     = SYSUTCDATETIME(),
+           ReleasedBy     = SUSER_SNAME(),
+           ReleaseReason  = @ReleaseReason
+    WHERE  Kind = @Kind AND Value = @Value AND ReleasedAt IS NULL;
+
+    IF @@ROWCOUNT = 0
+    BEGIN
+        RAISERROR('No active reservation found for (%s, %s)', 16, 1, @Kind, @Value);
+        RETURN;
+    END
+
+    -- Escape both caller-supplied values via STRING_ESCAPE so quotes/backslashes cannot break the
+    -- JSON document or inject additional structure into the audit record.
+    INSERT dbo.ConfigAuditLog (Principal, EventType, DetailsJson)
+    VALUES (SUSER_SNAME(), 'ExternalIdReleased',
+            CONCAT('{""kind"":""', STRING_ESCAPE(@Kind, 'json'),
+                   '"",""value"":""', STRING_ESCAPE(@Value, 'json'), '""}'));
+END
+";
+        }
+    }
+}
@@ -11,6 +11,18 @@ public sealed class DraftSnapshot
    public required long GenerationId { get; init; }
    public required string ClusterId { get; init; }

+    /// <summary>
+    ///     Cluster's Enterprise segment (UNS level 1). When set, <see cref="DraftValidator"/> uses
+    ///     the actual length for path-length checks instead of a fixed 32-char estimate.
+    /// </summary>
+    public string? Enterprise { get; init; }
+
+    /// <summary>
+    ///     Cluster's Site segment (UNS level 2). When set, <see cref="DraftValidator"/> uses the
+    ///     actual length for path-length checks instead of a fixed 32-char estimate.
+    /// </summary>
+    public string? Site { get; init; }
+
    public IReadOnlyList<Namespace> Namespaces { get; init; } = [];
    public IReadOnlyList<DriverInstance> DriverInstances { get; init; } = [];
    public IReadOnlyList<Device> Devices { get; init; } = [];
@@ -59,8 +59,13 @@ public static class DraftValidator
    /// <summary>Cluster.Enterprise + Site + area + line + equipment + 4 slashes ≤ 200 chars.</summary>
    private static void ValidatePathLength(DraftSnapshot draft, List<ValidationError> errors)
    {
-        // The cluster row isn't in the snapshot — we assume caller pre-validated Enterprise+Site
-        // length and bound them as constants <= 64 chars each. Here we validate the dynamic portion.
+        // Use actual Enterprise/Site lengths when the snapshot carries them (populated by
+        // DraftValidationService from the ServerCluster row). Fall back to a fixed 32-char
+        // estimate per segment when not supplied: this over-penalises shorter values and
+        // under-counts any segment longer than 32 chars, so it is only an approximation.
+        var enterpriseLen = draft.Enterprise?.Length ?? 32;
+        var siteLen = draft.Site?.Length ?? 32;
+
        var areaById = draft.UnsAreas.ToDictionary(a => a.UnsAreaId);
        var lineById = draft.UnsLines.ToDictionary(l => l.UnsLineId);

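`ValidatePathLength` counts the five UNS segments (Enterprise, Site, area, line, equipment) plus the four separating slashes. The following minimal sketch shows that arithmetic and how much the 32-char fallback can move the result. `EstimatePathLength` is a hypothetical helper, and the 200-char limit is taken from the method's summary.

```csharp
using System;

const int MaxPathLength = 200;        // from the ValidatePathLength summary
const int FallbackSegmentLength = 32; // used when the snapshot omits Enterprise/Site

// Hypothetical helper mirroring the length expression in ValidatePathLength.
int EstimatePathLength(string? enterprise, string? site, string area, string line, string equipment)
{
    var enterpriseLen = enterprise?.Length ?? FallbackSegmentLength;
    var siteLen = site?.Length ?? FallbackSegmentLength;
    return enterpriseLen + siteLen + area.Length + line.Length + equipment.Length + 4; // 4 slashes
}

var longEquipment = new string('E', 150);

// With real segment lengths, "ZB/Plant1/Area/Line/<150 chars>" is 170 chars: within the limit.
int actual = EstimatePathLength("ZB", "Plant1", "Area", "Line", longEquipment);

// Without them, the fallback adds 56 chars to the same path (226 total), which is rejected.
int fallback = EstimatePathLength(null, null, "Area", "Line", longEquipment);

Console.WriteLine($"actual={actual} ({(actual > MaxPathLength ? "PathTooLong" : "ok")}), " +
                  $"fallback={fallback} ({(fallback > MaxPathLength ? "PathTooLong" : "ok")})");
```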
@@ -69,8 +74,7 @@ public static class DraftValidator
            if (!lineById.TryGetValue(eq.UnsLineId!, out var line)) continue;
            if (!areaById.TryGetValue(line.UnsAreaId, out var area)) continue;

-            // rough upper bound: Enterprise+Site at most 32+32; add dynamic segments + 4 slashes
-            var len = 32 + 32 + area.Name.Length + line.Name.Length + eq.Name.Length + 4;
+            var len = enterpriseLen + siteLen + area.Name.Length + line.Name.Length + eq.Name.Length + 4;
            if (len > MaxPathLength)
                errors.Add(new("PathTooLong",
                    $"Equipment path exceeds {MaxPathLength} chars (approx {len})",
@@ -1,3 +1,4 @@
+using System.Collections;
 using System.Collections.Concurrent;

 namespace ZB.MOM.WW.OtOpcUa.Core.Abstractions;
@@ -61,7 +62,7 @@ public sealed class PollGroupEngine : IAsyncDisposable
        var handle = new PollSubscriptionHandle(id);
        var state = new SubscriptionState(handle, [.. fullReferences], interval, cts);
        _subscriptions[id] = state;
-        _ = Task.Run(() => PollLoopAsync(state, cts.Token), cts.Token);
+        state.LoopTask = Task.Run(() => PollLoopAsync(state, cts.Token));
        return handle;
    }

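Keeping the `Task` returned by `Task.Run` is what allows the engine to wait for a poll loop before disposing its `CancellationTokenSource`. The following minimal sketch shows the cancel, wait, dispose ordering. `PollLoopAsync` here is hypothetical: it only counts iterations instead of calling a reader, and the 10 ms delay stands in for the poll interval.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int iterations = 0;

// Hypothetical poll loop: one "poll" per iteration, then a cancellable delay.
async Task PollLoopAsync(CancellationToken ct)
{
    try
    {
        while (!ct.IsCancellationRequested)
        {
            Interlocked.Increment(ref iterations);
            await Task.Delay(TimeSpan.FromMilliseconds(10), ct);
        }
    }
    catch (OperationCanceledException)
    {
        // Expected on shutdown.
    }
}

var cts = new CancellationTokenSource();
Task loopTask = Task.Run(() => PollLoopAsync(cts.Token)); // keep the task so it can be waited on

await Task.Delay(50);

// 1. Signal the loop, 2. wait (bounded) for it to finish, 3. only then dispose the CTS.
cts.Cancel();
bool stopped = loopTask.Wait(TimeSpan.FromSeconds(5));
cts.Dispose();

int iterationsAtStop = Volatile.Read(ref iterations);
await Task.Delay(50); // the loop task has completed, so no further iterations can run
Console.WriteLine($"stopped={stopped}, iterations={iterationsAtStop}");
```

If the bounded wait times out, the CTS is still disposed afterwards, so the ordering is best-effort rather than an absolute guarantee.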
@@ -71,13 +72,27 @@ public sealed class PollGroupEngine : IAsyncDisposable
    {
        if (handle is PollSubscriptionHandle h && _subscriptions.TryRemove(h.Id, out var state))
        {
-            try { state.Cts.Cancel(); } catch { }
-            state.Cts.Dispose();
+            StopState(state);
            return true;
        }
        return false;
    }

+    private static void StopState(SubscriptionState state)
+    {
+        try { state.Cts.Cancel(); } catch { }
+        // Block on the loop task (bounded by a 5 s timeout) before disposing the CTS so:
+        //  (a) no _onChange callback fires after the caller considers the subscription torn down, and
+        //  (b) the CTS is not disposed while Task.Delay is still holding a reference to its token,
+        //      which can turn OperationCanceledException into ObjectDisposedException.
+        var task = state.LoopTask;
+        if (task is not null)
+        {
+            try { task.Wait(TimeSpan.FromSeconds(5)); } catch { }
+        }
+        state.Cts.Dispose();
+    }
+
    /// <summary>Snapshot of active subscription count — exposed for driver diagnostics.</summary>
    public int ActiveSubscriptionCount => _subscriptions.Count;

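Change detection in `PollOnceAsync` now goes through `ValuesAreDifferent` because `object.Equals` compares arrays by reference. A driver that returns a fresh array on every poll would therefore raise a change on every tick. The following minimal sketch reproduces the helper as a local function and contrasts the two behaviours. The sample `double[]` values are illustrative only.

```csharp
using System;
using System.Collections;

// Local copy of the engine's helper: arrays compare element-by-element, everything else via Equals.
static bool ValuesAreDifferent(object? previous, object? current)
{
    if (previous is Array prevArr && current is Array currArr)
        return !StructuralComparisons.StructuralEqualityComparer.Equals(prevArr, currArr);
    return !Equals(previous, current);
}

object firstPoll = new[] { 1.5, 2.5, 3.5 };
object secondPoll = new[] { 1.5, 2.5, 3.5 }; // fresh instance, identical contents
object thirdPoll = new[] { 1.5, 2.5, 4.0 };  // one element changed

bool plainEqualsSaysChanged = !Equals(firstPoll, secondPoll); // arrays: reference equality
bool helperSaysChanged = ValuesAreDifferent(firstPoll, secondPoll);
bool helperSeesRealChange = ValuesAreDifferent(secondPoll, thirdPoll);

Console.WriteLine($"{plainEqualsSaysChanged} {helperSaysChanged} {helperSeesRealChange}");
```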
@@ -103,13 +118,22 @@ public sealed class PollGroupEngine : IAsyncDisposable
    private async Task PollOnceAsync(SubscriptionState state, bool forceRaise, CancellationToken ct)
    {
        var snapshots = await _reader(state.TagReferences, ct).ConfigureAwait(false);
+
+        // Core.Abstractions-002: validate the reader contract before indexing. A reader that
+        // returns fewer snapshots than references would silently stall the subscription; surface
+        // the violation immediately with a descriptive exception instead.
+        if (snapshots.Count != state.TagReferences.Count)
+            throw new InvalidOperationException(
+                $"Reader contract violation: expected {state.TagReferences.Count} snapshots but received {snapshots.Count}. " +
+                "The reader delegate must return one snapshot per input reference in input order.");
+
        for (var i = 0; i < state.TagReferences.Count; i++)
        {
            var tagRef = state.TagReferences[i];
            var current = snapshots[i];
            var lastSeen = state.LastValues.TryGetValue(tagRef, out var prev) ? prev : default;

-            if (forceRaise || !Equals(lastSeen?.Value, current.Value) || lastSeen?.StatusCode != current.StatusCode)
+            if (forceRaise || ValuesAreDifferent(lastSeen?.Value, current.Value) || lastSeen?.StatusCode != current.StatusCode)
            {
                state.LastValues[tagRef] = current;
                _onChange(state.Handle, tagRef, current);
@@ -117,16 +141,44 @@ public sealed class PollGroupEngine : IAsyncDisposable
        }
    }

-    /// <summary>Cancel every active subscription. Idempotent.</summary>
-    public ValueTask DisposeAsync()
+    /// <summary>
+    ///     Returns <c>true</c> when <paramref name="previous"/> and <paramref name="current"/>
+    ///     represent different values. Array values are compared structurally
+    ///     (element-by-element) so that a driver producing a fresh array instance on every poll
+    ///     does not trigger spurious change events when the contents are identical.
+    /// </summary>
+    private static bool ValuesAreDifferent(object? previous, object? current)
    {
+        if (previous is Array prevArr && current is Array currArr)
+            return !StructuralComparisons.StructuralEqualityComparer.Equals(prevArr, currArr);
+
+        return !Equals(previous, current);
+    }
+
+    /// <summary>Cancel every active subscription and await all loop tasks. Idempotent.</summary>
+    public async ValueTask DisposeAsync()
+    {
+        // Cancel all loops first so they can all start winding down in parallel.
        foreach (var state in _subscriptions.Values)
        {
            try { state.Cts.Cancel(); } catch { }
+        }
+
+        // Await every loop task before disposing CTSs, ensuring no callback fires after disposal.
+        var waitTasks = _subscriptions.Values
+            .Select(s => s.LoopTask ?? Task.CompletedTask)
+            .ToArray();
+        if (waitTasks.Length > 0)
+        {
+            try { await Task.WhenAll(waitTasks).WaitAsync(TimeSpan.FromSeconds(5)).ConfigureAwait(false); }
+            catch { }
+        }
+
+        foreach (var state in _subscriptions.Values)
+        {
            state.Cts.Dispose();
        }
        _subscriptions.Clear();
-        return ValueTask.CompletedTask;
    }

    private sealed record SubscriptionState(
@@ -137,6 +189,14 @@ public sealed class PollGroupEngine : IAsyncDisposable
    {
        public ConcurrentDictionary<string, DataValueSnapshot> LastValues { get; }
            = new(StringComparer.OrdinalIgnoreCase);
+
+        /// <summary>
+        ///     The background poll-loop task. Assigned immediately after creation in
+        ///     <see cref="Subscribe"/>; awaited during <see cref="Unsubscribe"/> /
+        ///     <see cref="DisposeAsync"/> so disposal is deterministic and no
+        ///     <c>_onChange</c> callback can fire after the caller tears down the subscription.
+        /// </summary>
+        public Task? LoopTask { get; set; }
    }

    private sealed record PollSubscriptionHandle(long Id) : ISubscriptionHandle
@@ -17,7 +17,8 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
 ///     Which state transition this event represents — "Activated" / "Cleared" /
 ///     "Acknowledged" / "Confirmed" / "Shelved" / "Unshelved" / "Disabled" / "Enabled" /
 ///     "CommentAdded". Free-form string because different alarm sources use different
-///     vocabularies; the Galaxy.Host handler maps to the historian's enum on the wire.
+///     vocabularies; the Wonderware historian sidecar (<c>WonderwareHistorianClient</c>)
+///     maps to the historian's enum on the wire.
 /// </param>
 /// <param name="Message">Fully-rendered message text — template tokens already resolved upstream.</param>
 /// <param name="User">Operator who triggered the transition. "system" for engine-driven events (shelving expiry, predicate change).</param>
@@ -2,9 +2,9 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;

 /// <summary>
 ///     The historian sink contract — where qualifying alarm events land. Phase 7 plan
-///     decision #17: ingestion routes through Galaxy.Host's pipe so we reuse the
-///     already-loaded <c>aahClientManaged</c> DLLs without loading 32-bit native code
-///     in the main .NET 10 server. Tests use an in-memory fake; production uses
+///     decision #17: ingestion routes through the Wonderware historian sidecar
+///     (<c>WonderwareHistorianClient</c>), which owns the <c>aahClientManaged</c> DLLs
+///     and 32-bit constraints. Tests use an in-memory fake; production uses
 ///     <see cref="SqliteStoreAndForwardSink"/>.
 /// </summary>
 /// <remarks>
@@ -45,13 +45,25 @@ public sealed class NullAlarmHistorianSink : IAlarmHistorianSink
 }

 /// <summary>Diagnostic snapshot surfaced to the Admin UI + /healthz endpoints.</summary>
+/// <param name="QueueDepth">Non-dead-lettered rows waiting to be drained to the historian.</param>
+/// <param name="DeadLetterDepth">Rows that have been permanently failed or have corrupt payloads; retained until the retention window expires.</param>
+/// <param name="LastDrainUtc">UTC timestamp of the most recent drain attempt, or <c>null</c> if no drain has run yet.</param>
+/// <param name="LastSuccessUtc">UTC timestamp of the most recent drain tick that acknowledged at least one row, or <c>null</c> if none.</param>
+/// <param name="LastError">Message from the most recent writer exception or cardinality violation, cleared on the next successful batch.</param>
+/// <param name="DrainState">Current state of the drain worker.</param>
+/// <param name="EvictedCount">
+///     Lifetime count of non-dead-lettered rows discarded because the queue reached
+///     its configured capacity. Non-zero values indicate that accepted alarm events
+///     were dropped before reaching the historian — operator attention required.
+/// </param>
 public sealed record HistorianSinkStatus(
    long QueueDepth,
    long DeadLetterDepth,
    DateTime? LastDrainUtc,
    DateTime? LastSuccessUtc,
    string? LastError,
-    HistorianDrainState DrainState);
+    HistorianDrainState DrainState,
+    long EvictedCount = 0);

 /// <summary>Where the drain worker is in its state machine.</summary>
 public enum HistorianDrainState
@@ -62,7 +74,7 @@ public enum HistorianDrainState
    BackingOff,
 }

-/// <summary>Signaled by the Galaxy.Host-side handler when it fails a batch — drain worker uses this to decide retry cadence.</summary>
+/// <summary>Returned by the Wonderware historian sidecar per event — drain worker uses this to decide retry cadence.</summary>
 public enum HistorianWriteOutcome
 {
    /// <summary>Successfully persisted to the historian. Remove from queue.</summary>
@@ -73,7 +85,7 @@ public enum HistorianWriteOutcome
    PermanentFail,
 }

-/// <summary>What the drain worker delegates writes to — Stream G wires this to the Galaxy.Host IPC client.</summary>
+/// <summary>What the drain worker delegates writes to — production is <c>WonderwareHistorianClient</c> (the Wonderware historian sidecar).</summary>
 public interface IAlarmHistorianWriter
 {
    /// <summary>Push a batch of events to the historian. Returns one outcome per event, same order.</summary>
@@ -6,9 +6,10 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;

 /// <summary>
 ///     Phase 7 plan decisions #16–#17 implementation: durable SQLite queue on the node
-///     absorbs every qualifying alarm event, a drain worker batches rows to Galaxy.Host
-///     via <see cref="IAlarmHistorianWriter"/> on an exponential-backoff cadence, and
-///     operator acks never block on the historian being reachable.
+///     absorbs every qualifying alarm event, a drain worker batches rows to the
+///     Wonderware historian sidecar via <see cref="IAlarmHistorianWriter"/> on an
+///     exponential-backoff cadence, and operator acks never block on the historian
+///     being reachable.
 /// </summary>
 /// <remarks>
 ///     <para>
@@ -28,12 +29,18 @@ namespace ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
 ///         Dead-lettered rows stay in place for the configured retention window (default
 ///         30 days per Phase 7 plan decision #21) so operators can inspect + manually
 ///         retry before the sweeper purges them. Regular queue capacity is bounded —
-///         overflow evicts the oldest non-dead-lettered rows with a WARN log.
+///         overflow evicts the oldest non-dead-lettered rows with a WARN log. The
+///         durability guarantee is therefore bounded by the configured queue capacity
+///         (see <see cref="DefaultCapacity"/>): under a sustained historian outage, accepted events may be evicted before
+///         delivery. The <see cref="HistorianSinkStatus.EvictedCount"/> counter makes
+///         overflow visible to operators without requiring the WARN log to be scraped.
 ///     </para>
 ///     <para>
-///         Drain runs on a shared <see cref="System.Threading.Timer"/>. Exponential
-///         backoff on <see cref="HistorianWriteOutcome.RetryPlease"/>: 1s → 2s → 5s →
-///         15s → 60s cap. <see cref="HistorianWriteOutcome.PermanentFail"/> rows flip
+///         Drain runs on a self-rescheduling one-shot <see cref="System.Threading.Timer"/>.
+///         Exponential backoff on <see cref="HistorianWriteOutcome.RetryPlease"/>:
+///         1s → 2s → 5s → 15s → 60s cap — the backoff is applied to the timer's next
+///         due-time, so a historian outage genuinely slows the drain cadence.
+///         <see cref="HistorianWriteOutcome.PermanentFail"/> rows flip
 ///         the <c>DeadLettered</c> flag on the individual row; neighbors in the batch
 ///         still retry on their own cadence.
 ///     </para>
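The drain cadence described in the remarks (a one-shot timer whose next due-time follows the 1s, 2s, 5s, 15s, 60s ladder) can be sketched as follows. `BackoffFor`, the tick logic, and the reset-on-success behaviour are assumptions for illustration, not the sink's actual members. The delays are scaled from seconds to milliseconds so that the sketch finishes quickly.

```csharp
using System;
using System.Threading;

// Documented backoff ladder for RetryPlease outcomes; the last entry is the cap.
TimeSpan[] ladder =
{
    TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(5),
    TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
};

// Hypothetical helper: clamp the index so repeated failures stay at the 60s cap.
TimeSpan BackoffFor(int backoffIndex) => ladder[Math.Min(backoffIndex, ladder.Length - 1)];

int ticks = 0;
int backoffIndex = 0;
using var finished = new ManualResetEventSlim();
Timer? timer = null;

void Tick()
{
    ticks++;
    bool historianReachable = ticks >= 4; // pretend the first three drains return RetryPlease
    if (historianReachable)
    {
        backoffIndex = 0; // assumed: a successful drain resets the ladder
        finished.Set();
        return;           // a real sink would re-arm at its normal tick interval here
    }
    var delay = BackoffFor(backoffIndex++);
    // One-shot re-arm: the period is Infinite, so the timer fires once per Change call.
    timer!.Change(TimeSpan.FromMilliseconds(delay.TotalSeconds), Timeout.InfiniteTimeSpan);
}

// Create the timer disarmed, then start it, so the callback never sees an unassigned timer.
timer = new Timer(_ => Tick(), null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
timer.Change(TimeSpan.Zero, Timeout.InfiniteTimeSpan);

finished.Wait(TimeSpan.FromSeconds(5));
timer.Dispose();
Console.WriteLine($"ticks={ticks}, backoffIndex={backoffIndex}");
```

Because each tick re-arms the timer itself, ticks never overlap. A longer backoff therefore slows the drain directly, instead of being skipped by a fixed-period timer.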
@@ -63,12 +70,22 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable

    private readonly SemaphoreSlim _drainGate = new(1, 1);
    private Timer? _drainTimer;
+    private TimeSpan _tickInterval;
    private int _backoffIndex;
+    private bool _disposed;
+
+    // Core.AlarmHistorian-005: status fields written by the drain timer thread and
+    // read concurrently by GetStatus() / health-check threads. Guard all reads and
+    // writes with this lock so the Admin UI never observes a torn or stale value.
+    private readonly object _statusLock = new();
    private DateTime? _lastDrainUtc;
    private DateTime? _lastSuccessUtc;
    private string? _lastError;
    private HistorianDrainState _drainState = HistorianDrainState.Idle;
-    private bool _disposed;
+    // Core.AlarmHistorian-009: lifetime counter of rows evicted due to capacity overflow.
+    // Surfaces in HistorianSinkStatus so operators can see data-loss events without
+    // having to scrape the WARN log.
+    private long _evictedCount;

    public SqliteStoreAndForwardSink(
        string databasePath,
@@ -87,32 +104,126 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
        _capacity = capacity > 0 ? capacity : throw new ArgumentOutOfRangeException(nameof(capacity));
        _deadLetterRetention = deadLetterRetention ?? DefaultDeadLetterRetention;
        _clock = clock ?? (() => DateTime.UtcNow);
-        _connectionString = $"Data Source={databasePath}";
+        // DefaultTimeout (seconds) is the default command timeout, within which
+        // Microsoft.Data.Sqlite retries commands that hit SQLITE_BUSY; the PRAGMA
+        // busy_timeout applied to every connection (ApplyPragmas / ApplyPragmasAsync)
+        // backs it with SQLite's own busy handler, so an enqueue/drain collision waits
+        // out the file lock instead of throwing SQLITE_BUSY immediately
+        // (Core.AlarmHistorian-004).
+        _connectionString = new SqliteConnectionStringBuilder
+        {
+            DataSource = databasePath,
+            DefaultTimeout = 5,
+        }.ToString();

        InitializeSchema();
    }

+    /// <summary>
+    ///     Open a connection with the busy timeout + WAL journal applied. SQLite
+    ///     serializes writers with a file lock; the busy_timeout lets a writer wait
+    ///     out a competing lock (default is 0 — fail fast), and WAL lets readers and
+    ///     the single writer proceed without blocking each other.
+    /// </summary>
+    private SqliteConnection OpenConnection()
+    {
+        var conn = new SqliteConnection(_connectionString);
+        conn.Open();
+        ApplyPragmas(conn);
+        return conn;
+    }
+
+    /// <summary>Apply busy_timeout + WAL pragmas to an already-open connection (sync).</summary>
+    private static void ApplyPragmas(SqliteConnection conn)
+    {
+        using var pragma = conn.CreateCommand();
+        pragma.CommandText = "PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL;";
+        pragma.ExecuteNonQuery();
+    }
+
+    /// <summary>Apply busy_timeout + WAL pragmas to an already-open connection (async).</summary>
+    private static async Task ApplyPragmasAsync(SqliteConnection conn, CancellationToken ct)
+    {
+        using var pragma = conn.CreateCommand();
+        pragma.CommandText = "PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL;";
+        await pragma.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
+    }
+
    /// <summary>
    ///     Start the background drain worker. Not started automatically so tests can
    ///     drive <see cref="DrainOnceAsync"/> deterministically.
    /// </summary>
+    /// <remarks>
+    ///     The worker is a self-rescheduling one-shot <see cref="Timer"/>: after each
+    ///     drain it re-arms itself. While the sink is backing off, the next due-time is
+    ///     <c>max(tickInterval, CurrentBackoff)</c>; otherwise it is <c>tickInterval</c>.
+    ///     A historian outage therefore actually slows the cadence down the backoff
+    ///     ladder (Core.AlarmHistorian-002). The callback body is fully guarded: a fault in
+    ///     <see cref="DrainOnceAsync"/> is logged and recorded into
+    ///     <see cref="GetStatus"/> rather than being lost as an unobserved task
+    ///     exception (Core.AlarmHistorian-006).
+    /// </remarks>
    public void StartDrainLoop(TimeSpan tickInterval)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(SqliteStoreAndForwardSink));
+        _tickInterval = tickInterval;
        _drainTimer?.Dispose();
-        _drainTimer = new Timer(_ => _ = DrainOnceAsync(CancellationToken.None),
-            null, tickInterval, tickInterval);
+        // One-shot: dueTime = tickInterval, period = Infinite. RescheduleDrain re-arms
+        // it after every tick once the backoff-aware delay is known.
+        _drainTimer = new Timer(DrainTimerCallback, null, tickInterval, Timeout.InfiniteTimeSpan);
    }

-    public Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken)
+    private async void DrainTimerCallback(object? _)
+    {
+        try
+        {
+            await DrainOnceAsync(CancellationToken.None).ConfigureAwait(false);
+        }
+        catch (Exception ex)
+        {
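+            // This catch is load-bearing: an exception escaping an async-void timer
+            // callback is rethrown on the thread pool and terminates the process, while
+            // the old fire-and-forget "_ = DrainOnceAsync(...)" form lost it silently as
+            // an unobserved task exception. Record it so the Admin UI / health check
+            // sees the stalled drain.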
+            lock (_statusLock)
+            {
+                _lastError = ex.Message;
+                _drainState = HistorianDrainState.BackingOff;
+            }
+            _logger.Error(ex, "Historian drain tick faulted; will retry on next tick");
+        }
+        finally
+        {
+            RescheduleDrain();
+        }
+    }
+
+    /// <summary>Re-arm the one-shot drain timer honoring the current backoff window.</summary>
+    private void RescheduleDrain()
+    {
+        if (_disposed) return;
+        HistorianDrainState state;
+        lock (_statusLock) { state = _drainState; }
+        // While backing off, wait out the full ladder delay; otherwise the steady
+        // tick cadence. Never faster than tickInterval.
+        var delay = state == HistorianDrainState.BackingOff
+            ? (CurrentBackoff > _tickInterval ? CurrentBackoff : _tickInterval)
+            : _tickInterval;
+        try { _drainTimer?.Change(delay, Timeout.InfiniteTimeSpan); }
+        catch (ObjectDisposedException) { /* raced with Dispose — nothing to re-arm */ }
+    }
+
+    // Core.AlarmHistorian-003: use the async SQLite APIs and pass the
+    // cancellationToken through. Caveat: SQLite has no asynchronous I/O, and
+    // Microsoft.Data.Sqlite's async ADO.NET methods (OpenAsync /
+    // ExecuteNonQueryAsync) execute synchronously, so the file-lock wait and disk
+    // write still block the calling thread, not a thread-pool thread. A caller that
+    // must never block has to offload the call itself (e.g. Task.Run).
+    public async Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken)
    {
        if (evt is null) throw new ArgumentNullException(nameof(evt));
        if (_disposed) throw new ObjectDisposedException(nameof(SqliteStoreAndForwardSink));

        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
+        await ApplyPragmasAsync(conn, cancellationToken).ConfigureAwait(false);

-        EnforceCapacity(conn);
+        await EnforceCapacityAsync(conn, cancellationToken).ConfigureAwait(false);

        using var cmd = conn.CreateCommand();
        cmd.CommandText = """
@@ -122,8 +233,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
        cmd.Parameters.AddWithValue("$alarmId", evt.AlarmId);
        cmd.Parameters.AddWithValue("$enqueued", _clock().ToString("O"));
        cmd.Parameters.AddWithValue("$payload", JsonSerializer.Serialize(evt));
-        cmd.ExecuteNonQuery();
-        return Task.CompletedTask;
+        await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
    }

    /// <summary>
@@ -138,14 +248,42 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
        if (!await _drainGate.WaitAsync(0, ct).ConfigureAwait(false)) return;
        try
        {
-            _drainState = HistorianDrainState.Draining;
-            _lastDrainUtc = _clock();
+            lock (_statusLock)
+            {
+                _drainState = HistorianDrainState.Draining;
+                _lastDrainUtc = _clock();
+            }

            PurgeAgedDeadLetters();
-            var (rowIds, events) = ReadBatch();
-            if (rowIds.Count == 0)
+            var batch = ReadBatch();
+            if (batch.Count == 0)
            {
-                _drainState = HistorianDrainState.Idle;
+                lock (_statusLock) { _drainState = HistorianDrainState.Idle; }
+                return;
+            }
+
+            // A null/un-deserializable payload can never succeed — dead-letter it
+            // immediately for its own RowId so it cannot stall the queue head, and
+            // exclude it from the batch handed to the writer.
+            var corruptRowIds = batch.Where(r => r.Event is null).Select(r => r.RowId).ToList();
+            var liveRows = batch.Where(r => r.Event is not null).ToList();
+            var events = liveRows.Select(r => r.Event!).ToList();
+
+            if (corruptRowIds.Count > 0)
+            {
+                using var corruptConn = OpenConnection();
+                using var corruptTx = corruptConn.BeginTransaction();
+                foreach (var rowId in corruptRowIds)
+                    DeadLetterRow(corruptConn, corruptTx, rowId, $"corrupt payload at {_clock():O}");
+                corruptTx.Commit();
+                _logger.Warning(
+                    "Dead-lettered {Count} historian queue row(s) with un-deserializable payload",
+                    corruptRowIds.Count);
+            }
+
+            if (events.Count == 0)
+            {
+                lock (_statusLock) { _drainState = HistorianDrainState.Idle; }
                return;
            }
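The corrupt-row handling depends on never separating a RowId from its payload. A minimal sketch of the same read-then-partition pattern with System.Text.Json, assuming a simplified payload type. `SampleEvent`, `SampleRow`, and `BatchPartition` are hypothetical stand-ins for `AlarmHistorianEvent`, `QueueRow`, and the inline LINQ in `DrainOnceAsync`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical, simplified payload; the real AlarmHistorianEvent carries more fields.
public sealed record SampleEvent(string AlarmId, string Message);

// Same shape as the sink's QueueRow: Event is null when the payload is corrupt.
public readonly record struct SampleRow(long RowId, SampleEvent? Event);

public static class BatchPartition
{
    // Deserialize each (RowId, PayloadJson) pair, keeping the RowId bound to its
    // own row even when deserialization fails or yields null.
    public static List<SampleRow> Read(IEnumerable<(long RowId, string Json)> raw)
    {
        var rows = new List<SampleRow>();
        foreach (var (rowId, json) in raw)
        {
            SampleEvent? evt;
            try { evt = JsonSerializer.Deserialize<SampleEvent>(json); }
            catch (JsonException) { evt = null; } // malformed JSON: dead-letter candidate
            rows.Add(new SampleRow(rowId, evt));
        }
        return rows;
    }

    // Split into RowIds to dead-letter and rows to hand to the writer. The live
    // list keeps RowId next to Event, so writer outcome i maps to live[i].RowId.
    public static (List<long> Corrupt, List<SampleRow> Live) Split(List<SampleRow> batch) =>
        (batch.Where(r => r.Event is null).Select(r => r.RowId).ToList(),
         batch.Where(r => r.Event is not null).ToList());
}
```

The old two-list `ReadBatch` skipped null events but still recorded their ids, so a single corrupt row shifted every later outcome onto the wrong RowId (and malformed JSON threw out of the read entirely). Carrying both values in one record removes that failure mode by construction.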

@@ -153,7 +291,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
            try
            {
                outcomes = await _writer.WriteBatchAsync(events, ct).ConfigureAwait(false);
-                _lastError = null;
+                lock (_statusLock) { _lastError = null; }
            }
            catch (OperationCanceledException)
            {
@@ -162,24 +300,42 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
            catch (Exception ex)
            {
                // Writer-side exception — treat entire batch as RetryPlease.
-                _lastError = ex.Message;
+                lock (_statusLock)
+                {
+                    _lastError = ex.Message;
+                    _drainState = HistorianDrainState.BackingOff;
+                }
                _logger.Warning(ex, "Historian writer threw on batch of {Count}; deferring retry", events.Count);
                BumpBackoff();
-                _drainState = HistorianDrainState.BackingOff;
                return;
            }

+            // Core.AlarmHistorian-007: a cardinality mismatch is a writer contract
+            // violation, and the writer may already have persisted some or all of the
+            // events. Rather than throwing (which, before the Core.AlarmHistorian-006
+            // fix, was swallowed and left _drainState stale), treat it as a transient
+            // batch failure: the rows stay queued, backoff applies, and the error is
+            // visible in GetStatus. A deterministic mismatch keeps the batch retrying
+            // until an operator intervenes or the writer is fixed. That may re-deliver
+            // events the writer already persisted, but it is far safer than
+            // re-throwing into a fire-and-forget timer.
            if (outcomes.Count != events.Count)
-                throw new InvalidOperationException(
-                    $"Writer returned {outcomes.Count} outcomes for {events.Count} events — expected 1:1");
+            {
+                var msg = $"Writer returned {outcomes.Count} outcomes for {events.Count} events — expected 1:1; treating as batch retry";
+                lock (_statusLock)
+                {
+                    _lastError = msg;
+                    _drainState = HistorianDrainState.BackingOff;
+                }
+                _logger.Warning("Historian writer contract violation: {Msg}", msg);
+                BumpBackoff();
+                return;
+            }

-            using var conn = new SqliteConnection(_connectionString);
-            conn.Open();
+            using var conn = OpenConnection();
            using var tx = conn.BeginTransaction();
            for (var i = 0; i < outcomes.Count; i++)
            {
                var outcome = outcomes[i];
-                var rowId = rowIds[i];
+                var rowId = liveRows[i].RowId;
                switch (outcome)
                {
                    case HistorianWriteOutcome.Ack:
@@ -196,18 +352,20 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
            tx.Commit();

            var acks = outcomes.Count(o => o == HistorianWriteOutcome.Ack);
-            if (acks > 0) _lastSuccessUtc = _clock();
+            lock (_statusLock)
+            {
+                if (acks > 0) _lastSuccessUtc = _clock();
+
+                if (outcomes.Any(o => o == HistorianWriteOutcome.RetryPlease))
+                    _drainState = HistorianDrainState.BackingOff;
+                else
+                    _drainState = HistorianDrainState.Idle;
+            }

            if (outcomes.Any(o => o == HistorianWriteOutcome.RetryPlease))
-            {
                BumpBackoff();
-                _drainState = HistorianDrainState.BackingOff;
-            }
            else
-            {
                ResetBackoff();
-                _drainState = HistorianDrainState.Idle;
-            }
        }
        finally
        {
@@ -217,8 +375,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable

    public HistorianSinkStatus GetStatus()
    {
-        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        using var conn = OpenConnection();

        long queued;
        long deadlettered;
@@ -233,31 +390,52 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
            deadlettered = (long)(cmd.ExecuteScalar() ?? 0L);
        }

+        // Core.AlarmHistorian-005: snapshot status fields atomically under the lock
+        // so the Admin UI never sees a torn DateTime? or stale DrainState.
+        DateTime? lastDrain, lastSuccess;
+        string? lastError;
+        HistorianDrainState drainState;
+        long evicted;
+        lock (_statusLock)
+        {
+            lastDrain = _lastDrainUtc;
+            lastSuccess = _lastSuccessUtc;
+            lastError = _lastError;
+            drainState = _drainState;
+            evicted = _evictedCount;
+        }
+
        return new HistorianSinkStatus(
            QueueDepth: queued,
            DeadLetterDepth: deadlettered,
-            LastDrainUtc: _lastDrainUtc,
-            LastSuccessUtc: _lastSuccessUtc,
-            LastError: _lastError,
-            DrainState: _drainState);
+            LastDrainUtc: lastDrain,
+            LastSuccessUtc: lastSuccess,
+            LastError: lastError,
+            DrainState: drainState,
+            EvictedCount: evicted);
    }

    /// <summary>Operator action from Admin UI — retry every dead-lettered row. Non-cascading: they rejoin the regular queue + get a fresh backoff.</summary>
    public int RetryDeadLettered()
    {
-        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        using var conn = OpenConnection();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "UPDATE Queue SET DeadLettered = 0, AttemptCount = 0, LastError = NULL WHERE DeadLettered = 1";
        return cmd.ExecuteNonQuery();
    }

-    private (List<long> rowIds, List<AlarmHistorianEvent> events) ReadBatch()
+    /// <summary>
+    ///     One queued row paired with its deserialized event. <see cref="Event"/> is
+    ///     <c>null</c> when the row's <c>PayloadJson</c> is corrupt or un-deserializable —
+    ///     the <see cref="RowId"/> always stays bound to its own row so outcomes can
+    ///     never be mapped to the wrong row.
+    /// </summary>
+    private readonly record struct QueueRow(long RowId, AlarmHistorianEvent? Event);
+
+    private List<QueueRow> ReadBatch()
    {
-        var rowIds = new List<long>();
-        var events = new List<AlarmHistorianEvent>();
-        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        var rows = new List<QueueRow>();
+        using var conn = OpenConnection();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = """
            SELECT RowId, PayloadJson FROM Queue
@@ -269,12 +447,21 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
-            rowIds.Add(reader.GetInt64(0));
+            var rowId = reader.GetInt64(0);
            var payload = reader.GetString(1);
-            var evt = JsonSerializer.Deserialize<AlarmHistorianEvent>(payload);
-            if (evt is not null) events.Add(evt);
+            AlarmHistorianEvent? evt;
+            try
+            {
+                evt = JsonSerializer.Deserialize<AlarmHistorianEvent>(payload);
+            }
+            catch (JsonException)
+            {
+                // Malformed JSON — carry a null event so the caller dead-letters this row.
+                evt = null;
+            }
+            rows.Add(new QueueRow(rowId, evt));
        }
-        return (rowIds, events);
+        return rows;
    }

    private static void DeleteRow(SqliteConnection conn, SqliteTransaction tx, long rowId)
@@ -341,16 +528,50 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable
            cmd.Parameters.AddWithValue("$n", toEvict);
            cmd.ExecuteNonQuery();
        }
+        // Core.AlarmHistorian-009: increment the lifetime eviction counter so the
+        // Admin UI / health check can report overflow without requiring log scraping.
+        lock (_statusLock) { _evictedCount += toEvict; }
        _logger.Warning(
-            "Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room",
-            _capacity, toEvict);
+            "Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
+            _capacity, toEvict, _evictedCount);
+    }
+
+    // Async variant used by EnqueueAsync (Core.AlarmHistorian-003).
+    private async Task EnforceCapacityAsync(SqliteConnection conn, CancellationToken ct)
+    {
+        long count;
+        using (var cmd = conn.CreateCommand())
+        {
+            cmd.CommandText = "SELECT COUNT(*) FROM Queue WHERE DeadLettered = 0";
+            count = (long)(await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false) ?? 0L);
+        }
+        if (count < _capacity) return;
+
+        var toEvict = count - _capacity + 1;
+        using (var cmd = conn.CreateCommand())
+        {
+            cmd.CommandText = """
+                DELETE FROM Queue
+                WHERE RowId IN (
+                    SELECT RowId FROM Queue
+                    WHERE DeadLettered = 0
+                    ORDER BY RowId ASC
+                    LIMIT $n
+                )
+                """;
+            cmd.Parameters.AddWithValue("$n", toEvict);
+            await cmd.ExecuteNonQueryAsync(ct).ConfigureAwait(false);
+        }
+        lock (_statusLock) { _evictedCount += toEvict; }
+        _logger.Warning(
+            "Historian queue at capacity {Cap} — evicted {Count} oldest row(s) to make room (lifetime evictions: {Total})",
+            _capacity, toEvict, _evictedCount);
    }
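Both capacity checks count only non-dead-lettered rows and evict just enough of the oldest ones that the row about to be inserted lands exactly at capacity. The arithmetic as a pure function; `EvictionMath.RowsToEvict` is a hypothetical name, since the sink inlines this logic.

```csharp
using System;

public static class EvictionMath
{
    // Mirrors EnforceCapacity / EnforceCapacityAsync: nothing to do below capacity;
    // otherwise evict enough oldest live rows that (liveCount - toEvict) + 1 == capacity.
    public static long RowsToEvict(long liveCount, long capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        return liveCount < capacity ? 0 : liveCount - capacity + 1;
    }
}
```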

    private void PurgeAgedDeadLetters()
    {
        var cutoff = (_clock() - _deadLetterRetention).ToString("O");
-        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        using var conn = OpenConnection();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = """
            DELETE FROM Queue
@@ -364,8 +585,7 @@ public sealed class SqliteStoreAndForwardSink : IAlarmHistorianSink, IDisposable

    private void InitializeSchema()
    {
-        using var conn = new SqliteConnection(_connectionString);
-        conn.Open();
+        using var conn = OpenConnection();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = """
            CREATE TABLE IF NOT EXISTS Queue (
@@ -39,7 +39,15 @@ public sealed class ScriptedAlarmEngine : IDisposable
    private readonly Func<DateTime> _clock;
    private readonly TimeSpan _scriptTimeout;

-    private readonly Dictionary<string, AlarmState> _alarms = new(StringComparer.Ordinal);
+    // ConcurrentDictionary, not a plain Dictionary: every mutation happens under
+    // _evalGate, but four read paths (GetState, GetAllStates, LoadedAlarmIds,
+    // RunShelvingCheck) touch _alarms from arbitrary threads (Admin UI request
+    // threads, the shelving Timer thread-pool callback) without holding the gate.
+    // A plain Dictionary read concurrent with a writer's entry reassignment can
+    // throw or return torn state; ConcurrentDictionary makes concurrent entry
+    // assignment and enumeration safe (enumeration is not a point-in-time snapshot;
+    // Keys and ToArray() are). The only write shapes are indexer-set and Clear, both
+    // of which ConcurrentDictionary supports atomically. (Core.ScriptedAlarms-001)
+    private readonly ConcurrentDictionary<string, AlarmState> _alarms = new(StringComparer.Ordinal);
    private readonly ConcurrentDictionary<string, DataValueSnapshot> _valueCache
        = new(StringComparer.Ordinal);
    private readonly Dictionary<string, HashSet<string>> _alarmsReferencing
@@ -70,7 +78,7 @@ public sealed class ScriptedAlarmEngine : IDisposable
    /// <summary>Raised for every emission the Part9StateMachine produces that the engine should publish.</summary>
    public event EventHandler<ScriptedAlarmEvent>? OnEvent;

-    public IReadOnlyCollection<string> LoadedAlarmIds => _alarms.Keys;
+    public IReadOnlyCollection<string> LoadedAlarmIds => _alarms.Keys.ToArray();

    /// <summary>
    ///     Load a batch of alarm definitions. Compiles every predicate, aggregates any
@@ -135,12 +143,17 @@ public sealed class ScriptedAlarmEngine : IDisposable
                    + string.Join("\n  ", compileFailures));
            }

-            // Seed the value cache with current upstream values + subscribe for changes.
+            // Seed the value cache with current tag values before subscribing. The
+            // ReadTag calls happen first so that the initial predicate evaluation below
+            // (startup recovery, decision #14) uses a consistent snapshot.
+            // Subscriptions are established AFTER _loaded = true so that any synchronous
+            // initial-push an ITagUpstreamSource delivers from inside SubscribeTag arrives
+            // when _alarms is fully initialised. Before _loaded = true, a synchronous push
+            // would race the in-progress state restore and could overwrite the carefully
+            // seeded cache with a push that has no defined ordering relative to ReadTag.
+            // (Core.ScriptedAlarms-004)
            foreach (var path in _alarmsReferencing.Keys)
-            {
                _valueCache[path] = _upstream.ReadTag(path);
-                _upstreamSubscriptions.Add(_upstream.SubscribeTag(path, OnUpstreamChange));
-            }

            // Restore persisted state, falling back to Fresh where nothing was saved,
            // then re-derive ActiveState from the current predicate per decision #14.
@@ -155,8 +168,21 @@ public sealed class ScriptedAlarmEngine : IDisposable
            }

            _loaded = true;
+
+            // Subscribe after _loaded = true and full state restore. If an upstream
+            // implementation pushes its initial value synchronously from inside
+            // SubscribeTag, OnUpstreamChange will queue a ReevaluateAsync that acquires
+            // _evalGate — it will correctly block until LoadAsync releases the gate, then
+            // re-evaluate against the fully-populated _alarms dict.
+            foreach (var path in _alarmsReferencing.Keys)
+                _upstreamSubscriptions.Add(_upstream.SubscribeTag(path, OnUpstreamChange));
            _engineLogger.Information("ScriptedAlarmEngine loaded {Count} alarm(s)", _alarms.Count);

+            // Dispose any previously-created timer before reassigning; a second LoadAsync
+            // call without this would leave two timers firing against the same engine.
+            // (Core.ScriptedAlarms-002)
+            _shelvingTimer?.Dispose();
+
            // Start the shelving-check timer — ticks every 5s, expires any timed shelves
            // that have passed their UnshelveAtUtc.
            _shelvingTimer = new Timer(_ => RunShelvingCheck(),
@@ -212,8 +238,12 @@ public sealed class ScriptedAlarmEngine : IDisposable
        try
        {
            var result = op(state.Condition);
-            _alarms[alarmId] = state with { Condition = result.State };
+            // Persist BEFORE updating in-memory so a store failure leaves both
+            // in-memory and persisted at the prior state rather than diverging.
+            // If SaveAsync throws the in-memory _alarms entry stays unchanged and
+            // the exception propagates to the caller. (Core.ScriptedAlarms-007)
            await _store.SaveAsync(result.State, ct).ConfigureAwait(false);
+            _alarms[alarmId] = state with { Condition = result.State };
            if (result.Emission != EmissionKind.None) EmitEvent(state, result.State, result.Emission);
        }
        finally { _evalGate.Release(); }
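The persist-before-mutate ordering (Core.ScriptedAlarms-007) can be checked in isolation with a store that can be made to fail. A minimal sketch; `FlakyStore` and `AlarmTable` are hypothetical types, with string states standing in for the engine's condition records.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical store standing in for the engine's persistence dependency.
public sealed class FlakyStore
{
    public bool Fail { get; set; }
    public string? Saved { get; private set; }

    public Task SaveAsync(string state, CancellationToken ct)
    {
        if (Fail) throw new InvalidOperationException("store unavailable");
        Saved = state;
        return Task.CompletedTask;
    }
}

public sealed class AlarmTable
{
    private readonly ConcurrentDictionary<string, string> _alarms = new(StringComparer.Ordinal);
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly FlakyStore _store;

    public AlarmTable(FlakyStore store) => _store = store;

    public string? Get(string id) => _alarms.TryGetValue(id, out var s) ? s : null;

    public async Task TransitionAsync(string id, string newState, CancellationToken ct)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Persist first: if this throws, _alarms keeps the prior state and the
            // exception propagates, so memory and storage do not diverge.
            await _store.SaveAsync(newState, ct).ConfigureAwait(false);
            _alarms[id] = newState;
        }
        finally { _gate.Release(); }
    }
}
```

The ordering accepts a narrower window in the other direction: if `SaveAsync` succeeds and the process dies before the dictionary write, storage is ahead of memory, which the persisted-state restore in `LoadAsync` reads back on the next start.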
@@ -240,6 +270,12 @@ public sealed class ScriptedAlarmEngine : IDisposable
            await _evalGate.WaitAsync(ct).ConfigureAwait(false);
            try
            {
+                // Re-check after acquiring the gate: Dispose() may have completed
+                // while this call was waiting on _evalGate. Writing to a disposing
+                // store or mutating _alarms after disposal is unsafe.
+                // (Core.ScriptedAlarms-005)
+                if (_disposed) return;
+
                foreach (var id in alarmIds)
                {
                    if (!_alarms.TryGetValue(id, out var state)) continue;
@@ -247,8 +283,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
                        state, state.Condition, _clock(), ct).ConfigureAwait(false);
                    if (!ReferenceEquals(newState, state.Condition))
                    {
-                        _alarms[id] = state with { Condition = newState };
+                        // Persist before updating in-memory so a store failure leaves
+                        // both sides at the prior state. (Core.ScriptedAlarms-007)
                        await _store.SaveAsync(newState, ct).ConfigureAwait(false);
+                        _alarms[id] = state with { Condition = newState };
                    }
                }
            }
@@ -369,6 +407,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
        _ = ShelvingCheckAsync(ids, CancellationToken.None);
    }

+    /// <summary>
+    ///     Test hook — triggers a shelving check synchronously without waiting for
+    ///     the 5-second timer. Allows tests that inject a controllable clock to advance
+    ///     time and immediately drive timed-shelve expiry. (Core.ScriptedAlarms-012)
+    /// </summary>
+    internal void RunShelvingCheckForTest() => RunShelvingCheck();
+
    private async Task ShelvingCheckAsync(IReadOnlyList<string> alarmIds, CancellationToken ct)
    {
        try
@@ -376,6 +421,13 @@ public sealed class ScriptedAlarmEngine : IDisposable
            await _evalGate.WaitAsync(ct).ConfigureAwait(false);
            try
            {
+                // Re-check after acquiring the gate: Timer.Dispose() does not wait for
+                // running callbacks, so a shelving-check callback that passed the _disposed
+                // check in RunShelvingCheck can arrive here after Dispose() has returned.
+                // Mutating _alarms or saving to a disposed store here is unsafe.
+                // (Core.ScriptedAlarms-005)
+                if (_disposed) return;
+
                var now = _clock();
                foreach (var id in alarmIds)
                {
@@ -383,8 +435,10 @@ public sealed class ScriptedAlarmEngine : IDisposable
                    var result = Part9StateMachine.ApplyShelvingCheck(state.Condition, now);
                    if (!ReferenceEquals(result.State, state.Condition))
                    {
-                        _alarms[id] = state with { Condition = result.State };
+                        // Persist before updating in-memory so a store failure leaves
+                        // both sides at the prior state. (Core.ScriptedAlarms-007)
                        await _store.SaveAsync(result.State, ct).ConfigureAwait(false);
+                        _alarms[id] = state with { Condition = result.State };
                        if (result.Emission != EmissionKind.None)
                            EmitEvent(state, result.State, result.Emission);
                    }
@@ -419,7 +473,11 @@ public sealed class ScriptedAlarmEngine : IDisposable
        _disposed = true;
        _shelvingTimer?.Dispose();
        UnsubscribeFromUpstream();
-        _alarms.Clear();
+        // Do NOT clear _alarms here: Timer.Dispose() does not wait for in-flight callbacks,
+        // so a ShelvingCheckAsync or ReevaluateAsync can still be running inside _evalGate.
+        // Those paths now re-check _disposed after acquiring the gate and bail out safely.
+        // Clearing _alarms outside the gate would race concurrent reads and is unnecessary
+        // (the whole object is being discarded). (Core.ScriptedAlarms-005)
        _alarmsReferencing.Clear();
    }

@@ -21,11 +21,15 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
 ///         token.
 ///     </para>
 ///     <para>
-///         Identifier matching is by spelling: the extractor looks for
-///         <c>ctx.GetTag(...)</c> / <c>ctx.SetVirtualTag(...)</c> literally. A deliberately
-///         misspelled method call (<c>ctx.GetTagz</c>) is not picked up but will also fail
-///         to compile against <see cref="ScriptContext"/>, so there's no way to smuggle a
-///         dependency past the extractor while still having a working script.
+///         Matching is by spelling: the extractor looks for member-access invocations
+///         whose receiver identifier is literally <c>ctx</c> and whose method name is
+///         <c>GetTag</c> or <c>SetVirtualTag</c>. A deliberately misspelled method call
+///         (<c>ctx.GetTagz</c>) is not picked up but will also fail to compile against
+///         <see cref="ScriptContext"/>, so there is no way to smuggle a dependency past the
+///         extractor while still having a working script. Calls with the same method name on
+///         a different receiver (<c>other.GetTag("X")</c>) are explicitly ignored so that
+///         scripts defining local helper types with matching names do not produce spurious
+///         dependencies. (Core.Scripting-004.)
 ///     </para>
 /// </remarks>
 public static class DependencyExtractor
@@ -67,10 +71,15 @@ public static class DependencyExtractor

        public override void VisitInvocationExpression(InvocationExpressionSyntax node)
        {
-            // Only interested in member-access form: ctx.GetTag(...) / ctx.SetVirtualTag(...).
-            // Anything else (free functions, chained calls, static calls) is ignored — but
-            // still visit children in case a ctx.GetTag call is nested inside.
-            if (node.Expression is MemberAccessExpressionSyntax member)
+            // Only interested in ctx.GetTag(...) / ctx.SetVirtualTag(...) — member-access
+            // form where the receiver is the identifier "ctx" (the ScriptGlobals<T>.ctx
+            // field). Calls with the same method name on a different receiver (e.g.
+            // someHelper.GetTag("X")) are ignored — not picking them up avoids spurious
+            // dependencies when scripts define local types with matching method names.
+            // (Core.Scripting-004.)
+            if (node.Expression is MemberAccessExpressionSyntax member
+                && member.Expression is IdentifierNameSyntax receiver
+                && receiver.Identifier.ValueText == "ctx")
            {
                var methodName = member.Name.Identifier.ValueText;
                if (methodName is nameof(ScriptContext.GetTag) or nameof(ScriptContext.SetVirtualTag))
@@ -18,12 +18,12 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
 /// <remarks>
 ///     <para>
 ///         Deny-list is the authoritative Phase 7 plan decision #6 set:
-///         <c>System.IO</c>, <c>System.Net</c>, <c>System.Diagnostics.Process</c>,
+///         <c>System.IO</c>, <c>System.Net</c>, <c>System.Diagnostics</c>,
 ///         <c>System.Reflection</c>, <c>System.Threading.Thread</c>,
-///         <c>System.Runtime.InteropServices</c>. <c>System.Environment</c> (for process
-///         env-var read) is explicitly left allowed — it's read-only process state, doesn't
-///         persist outside, and the test file pins this compromise so tightening later is
-///         a deliberate plan decision.
+///         <c>System.Threading.Tasks</c> (scripts are synchronous predicates with no need to
+///         start background work; a <c>Task.Run</c> fan-out outlives the evaluation timeout),
+///         <c>System.Runtime.InteropServices</c>, <c>Microsoft.Win32</c>. (Core.Scripting-003.)
+///         <c>System.Threading.Thread</c> is enforced via <see cref="ForbiddenFullTypeNames"/>.
 ///     </para>
 ///     <para>
 ///         Deny-list prefix match. <c>System.Net</c> catches <c>System.Net.Http</c>,
@@ -32,6 +32,21 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Scripting;
 ///         operator audience authors it through a helper the plan team adds as part of
 ///         the <see cref="ScriptContext"/> surface, not by unlocking the namespace.
 ///     </para>
+///     <para>
+///         A namespace-prefix deny-list is necessary but not sufficient: dangerous types
+///         such as <c>System.Environment</c>, <c>System.AppDomain</c>, <c>System.GC</c>,
+///         and <c>System.Activator</c> live <em>directly</em> in the <c>System</c>
+///         namespace inside <c>System.Private.CoreLib</c> — the same allow-listed assembly
+///         that supplies primitives (<c>int</c>, <c>string</c>, <c>Math</c>). They cannot
+///         be blocked by namespace because <c>System</c> itself must stay allowed. They
+///         are therefore denied <em>type-granularly</em> via
+///         <see cref="ForbiddenFullTypeNames"/>. <c>Environment.Exit</c> /
+///         <c>Environment.FailFast</c> kill the in-process OPC UA server outright;
+///         <c>Activator.CreateInstance</c> is a reflection-equivalent escape; <c>GC</c>
+///         and <c>AppDomain</c> expose process-wide control. Legitimate <c>System</c>
+///         types (<c>Math</c>, <c>String</c>, <c>Convert</c>, <c>DateTime</c>, …) are not
+///         on the list and stay usable. (Core.Scripting-001.)
+///     </para>
 /// </remarks>
 public static class ForbiddenTypeAnalyzer
 {
@@ -46,11 +61,58 @@ public static class ForbiddenTypeAnalyzer
    [
        "System.IO",
        "System.Net",
-        "System.Diagnostics",       // catches Process, ProcessStartInfo, EventLog, Trace/Debug file sinks
+        "System.Diagnostics",           // catches Process, ProcessStartInfo, EventLog, Trace/Debug file sinks
        "System.Reflection",
-        "System.Threading.Thread",  // raw Thread — Tasks stay allowed (different namespace)
+        // System.Threading.Thread is NOT in this list: Thread's containing namespace is
+        // "System.Threading" (not "System.Threading.Thread"), so a prefix check on
+        // "System.Threading.Thread" never matches. Thread is denied type-granularly via
+        // ForbiddenFullTypeNames instead so the check actually fires.
+        "System.Threading.Tasks",       // Task.Run / Parallel — scripts are synchronous predicates
+                                        // and have no legitimate need to start background work;
+                                        // a Task fan-out outlives the evaluation timeout entirely
+                                        // (Core.Scripting-003).
        "System.Runtime.InteropServices",
-        "Microsoft.Win32",          // registry
+        "Microsoft.Win32",              // registry
+    ];
+
+    /// <summary>
+    ///     Fully-qualified type names scripts are NOT allowed to reference, regardless of
+    ///     namespace. Each lives in a namespace that must stay allowed (<c>System</c>, or
+    ///     <c>System.Threading</c> for <c>Thread</c>), so a namespace-prefix rule cannot reach
+    ///     it without also blocking legitimate types such as primitives or
+    ///     <c>CancellationToken</c>. Matched by exact fully-qualified name against the resolved
+    ///     <em>type</em> symbol, so every member of the type (including read-only ones) is rejected.
+    /// </summary>
+    /// <remarks>
+    ///     <list type="bullet">
+    ///         <item><c>System.Environment</c> — <c>Exit</c> / <c>FailFast</c> terminate
+    ///         the host process; the whole type is denied (the read members have no
+    ///         legitimate SCADA-predicate use either).</item>
+    ///         <item><c>System.AppDomain</c> — process-wide assembly-load /
+    ///         unhandled-exception control.</item>
+    ///         <item><c>System.GC</c> — <c>Collect</c> / <c>AddMemoryPressure</c> perturb
+    ///         the process memory subsystem.</item>
+    ///         <item><c>System.Activator</c> — <c>CreateInstance</c> is a
+    ///         reflection-equivalent escape that constructs a forbidden type by name
+    ///         without ever naming it syntactically.</item>
+    ///         <item><c>System.Threading.Thread</c> — raw thread creation bypasses the
+    ///         per-evaluation timeout; denied type-granularly because its containing
+    ///         namespace is <c>System.Threading</c> (shared with allowed types like
+    ///         <c>CancellationToken</c>), so a namespace-prefix rule cannot reach it
+    ///         without blocking unrelated types. (Core.Scripting-010.)</item>
+    ///     </list>
+    /// </remarks>
+    public static readonly IReadOnlyList<string> ForbiddenFullTypeNames =
+    [
+        "System.Environment",
+        "System.AppDomain",
+        "System.GC",
+        "System.Activator",
+        // System.Threading.Thread lives in the System.Threading namespace (shared with
+        // CancellationToken, SemaphoreSlim, etc.), so a namespace-prefix deny-list cannot
+        // target it without blocking those legitimate types. Denied type-granularly here.
+        // (Core.Scripting-010.)
+        "System.Threading.Thread",
    ];

    /// <summary>
@@ -58,6 +120,33 @@ public static class ForbiddenTypeAnalyzer
    ///     Returns empty list when the script is clean; non-empty list means the script
    ///     must be rejected at publish with the rejections surfaced to the operator.
    /// </summary>
+    /// <remarks>
+    ///     <para>
+    ///         The walker has two passes per node. Pass (1) is the member / call surface:
+    ///         <c>ObjectCreationExpressionSyntax</c>, <c>InvocationExpressionSyntax</c> with
+    ///         a member-access target, <c>MemberAccessExpressionSyntax</c>, and bare
+    ///         <c>IdentifierNameSyntax</c> are resolved via
+    ///         <see cref="SemanticModel"/>.<c>GetSymbolInfo</c>. This catches static calls
+    ///         (<c>System.IO.File.ReadAllText</c>) and constructors, and is deliberately
+    ///         narrow: resolving <c>GetSymbolInfo</c> on <em>every</em> node would flag
+    ///         harmless inherited members (e.g. <c>typeof(int).Name</c> resolves
+    ///         <c>Name</c> to <c>System.Reflection.MemberInfo</c>, the base type that
+    ///         declares it, even though the receiver type <c>System.Type</c> is allowed).
+    ///     </para>
+    ///     <para>
+    ///         Pass (2) — the Core.Scripting-002 fix — resolves the <em>type</em> of every
+    ///         <c>TypeSyntax</c> node via <c>GetTypeInfo</c>. The old walker only inspected
+    ///         the four node kinds above, so a forbidden type named through
+    ///         <c>typeof(System.IO.File)</c>, a generic argument
+    ///         (<c>List&lt;System.IO.FileInfo&gt;</c>), a cast
+    ///         (<c>(System.IO.Stream)null</c>), an <c>is</c> / <c>as</c> type pattern,
+    ///         <c>default(System.Reflection.Assembly)</c>, an array-creation element type,
+    ///         or an explicitly-typed local declaration produced no examined node and so
+    ///         slipped through. Every <c>TypeSyntax</c> that names a type resolves to an
+    ///         <see cref="ITypeSymbol"/>; generic type arguments and array element types are
+    ///         unwrapped recursively, so a forbidden type nested at any depth is caught.
+    ///     </para>
+    /// </remarks>
    public static IReadOnlyList<ForbiddenTypeRejection> Analyze(Compilation compilation)
    {
        if (compilation is null) throw new ArgumentNullException(nameof(compilation));
@@ -69,6 +158,9 @@ public static class ForbiddenTypeAnalyzer
            var root = tree.GetRoot();
            foreach (var node in root.DescendantNodes())
            {
+                // Pass (1) — member / call surface. Narrowly targeted at the node kinds
+                // that name a callable member or constructor, so inherited-member
+                // resolution does not produce false positives.
                switch (node)
                {
                    case ObjectCreationExpressionSyntax obj:
@@ -88,11 +180,43 @@ public static class ForbiddenTypeAnalyzer
                        CheckSymbol(semantic.GetSymbolInfo(id).Symbol, id.Span, rejections);
                        break;
                }
+
+                // Pass (2) — type-reference surface (Core.Scripting-002). Every TypeSyntax
+                // resolves to the type it names, regardless of the syntactic form that
+                // introduced it (typeof operand, cast type, generic argument, default(T)
+                // operand, array element type, is/as pattern type, declared local type).
+                // Type arguments and array element types are walked recursively.
+                if (node is TypeSyntax)
+                    CheckTypeSymbol(semantic.GetTypeInfo(node).Type, node.Span, rejections);
            }
        }
        return rejections;
    }

+    /// <summary>
+    ///     Reject <paramref name="type"/> if it (or, recursively, any of its generic type
+    ///     arguments / array element types) is forbidden. Walks the full type tree so a
+    ///     forbidden type nested inside an allowed generic — e.g.
+    ///     <c>List&lt;System.IO.FileInfo&gt;</c> — is still caught.
+    /// </summary>
+    private static void CheckTypeSymbol(ITypeSymbol? type, TextSpan span, List<ForbiddenTypeRejection> rejections)
+    {
+        if (type is null) return;
+
+        CheckSymbol(type, span, rejections);
+
+        switch (type)
+        {
+            case IArrayTypeSymbol array:
+                CheckTypeSymbol(array.ElementType, span, rejections);
+                break;
+            case INamedTypeSymbol named:
+                foreach (var arg in named.TypeArguments)
+                    CheckTypeSymbol(arg, span, rejections);
+                break;
+        }
+    }
+
    private static void CheckSymbol(ISymbol? symbol, TextSpan span, List<ForbiddenTypeRejection> rejections)
    {
        if (symbol is null) return;
@@ -107,17 +231,49 @@ public static class ForbiddenTypeAnalyzer
        };
        if (typeSymbol is null) return;

+        var typeName = typeSymbol.ToDisplayString();
+
+        // The broadened walk (Core.Scripting-002) can hit the same node twice: an
+        // IdentifierNameSyntax is also a TypeSyntax, so it is resolved by both GetSymbolInfo
+        // (pass 1) and GetTypeInfo (pass 2). Dedupe on span + type so the operator sees one
+        // rejection per offending reference, not a noisy pile of identical messages.
+        if (rejections.Any(r => r.Span == span && r.TypeName == typeName))
+            return;
+
        var ns = typeSymbol.ContainingNamespace?.ToDisplayString() ?? string.Empty;
        foreach (var forbidden in ForbiddenNamespacePrefixes)
        {
            if (ns == forbidden || ns.StartsWith(forbidden + ".", StringComparison.Ordinal))
+            {
+                rejections.Add(new ForbiddenTypeRejection(
+                    Span: span,
+                    TypeName: typeName,
+                    Namespace: ns,
+                    Message: $"Type '{typeName}' is in the forbidden namespace '{ns}'. " +
+                             $"Scripts cannot reach {forbidden}* per Phase 7 sandbox rules."));
+                return;
+            }
+        }
+
+        // Type-granular deny-list: dangerous types whose namespaces must stay allowed
+        // (System, System.Threading), so ForbiddenNamespacePrefixes cannot reach them
+        // (Core.Scripting-001 / Core.Scripting-010). Matched on the full type name;
+        // OriginalDefinition unwraps any generic construction before naming.
+        var fullTypeName = typeSymbol.OriginalDefinition.ToDisplayString(
+            SymbolDisplayFormat.FullyQualifiedFormat.WithGlobalNamespaceStyle(
+                SymbolDisplayGlobalNamespaceStyle.Omitted));
+        foreach (var forbiddenType in ForbiddenFullTypeNames)
+        {
+            if (fullTypeName == forbiddenType)
            {
                rejections.Add(new ForbiddenTypeRejection(
                    Span: span,
                    TypeName: typeSymbol.ToDisplayString(),
                    Namespace: ns,
-                    Message: $"Type '{typeSymbol.ToDisplayString()}' is in the forbidden namespace '{ns}'. " +
-                             $"Scripts cannot reach {forbidden}* per Phase 7 sandbox rules."));
+                    Message: $"Type '{forbiddenType}' is on the Phase 7 sandbox forbidden-type " +
+                             $"deny-list. Scripts cannot reach process-control types " +
+                             $"(Environment / AppDomain / GC / Activator / Thread) even though " +
+                             $"their namespace '{ns}' is otherwise allowed."));
                return;
            }
        }
@@ -76,6 +76,14 @@ public sealed class TimedScriptEvaluator<TContext, TResult>
            // WaitAsync's synthesized timeout — the inner task may still be running
            // on its thread-pool thread (known leak documented in the class summary).
            // Wrap so callers can distinguish from user-written timeout logic.
+            //
+            // The class docs guarantee "caller-supplied cancel wins over timeout".
+            // When both fire at nearly the same time, WaitAsync observes them in
+            // non-deterministic order, so a cancel that arrives a few µs after the
+            // timeout still reaches here as TimeoutException. Re-check the token so
+            // the guarantee holds regardless of race ordering. (Core.Scripting-007.)
+            if (ct.IsCancellationRequested)
+                throw new OperationCanceledException(ct);
            throw new ScriptTimeoutException(Timeout);
        }
    }
@@ -31,6 +31,11 @@ public sealed class DependencyGraph
    private readonly Dictionary<string, HashSet<string>> _dependsOn = new(StringComparer.Ordinal);
    private readonly Dictionary<string, HashSet<string>> _dependents = new(StringComparer.Ordinal);

+    // Cached topological rank — built lazily by TransitiveDependentsInOrder and
+    // invalidated whenever the graph is mutated (Add / Clear). Avoids re-running
+    // a full O(V+E) Kahn pass on every change-cascade event.
+    private Dictionary<string, int>? _cachedRank;
+
    /// <summary>
    ///     Register a node and the set of tags it depends on. Idempotent — re-adding
    ///     the same node overwrites the prior dependency set, so re-publishing an edited
@@ -58,6 +63,7 @@ public sealed class DependencyGraph
                _dependents[dep] = set = new HashSet<string>(StringComparer.Ordinal);
            set.Add(nodeId);
        }
+        _cachedRank = null; // graph mutated — invalidate cached rank
    }

    /// <summary>Tag paths <paramref name="nodeId"/> directly reads.</summary>
@@ -84,9 +90,11 @@ public sealed class DependencyGraph

        var result = new List<string>();
        var visited = new HashSet<string>(StringComparer.Ordinal);
-        var order = TopologicalSort();
-        var rank = new Dictionary<string, int>(StringComparer.Ordinal);
-        for (var i = 0; i < order.Count; i++) rank[order[i]] = i;
+
+        // Reuse the cached rank to avoid an O(V+E) Kahn pass on every change event.
+        // The cache is invalidated whenever the graph is mutated (Add / Clear), so it
+        // is always consistent with the current graph structure.
+        var rank = GetOrBuildRank();

        // DFS from the changed node collecting every reachable dependent.
        var stack = new Stack<string>();
@@ -115,6 +123,16 @@ public sealed class DependencyGraph
        return result;
    }

+    private Dictionary<string, int> GetOrBuildRank()
+    {
+        if (_cachedRank is not null) return _cachedRank;
+        var order = TopologicalSort();
+        var rank = new Dictionary<string, int>(order.Count, StringComparer.Ordinal);
+        for (var i = 0; i < order.Count; i++) rank[order[i]] = i;
+        _cachedRank = rank;
+        return rank;
+    }
+
    /// <summary>Iterable of every registered node id (inputs-only tags excluded).</summary>
    public IReadOnlyCollection<string> RegisteredNodes => _dependsOn.Keys;

@@ -249,6 +267,7 @@ public sealed class DependencyGraph
    {
        _dependsOn.Clear();
        _dependents.Clear();
+        _cachedRank = null; // graph cleared — invalidate cached rank
    }
 }

@@ -76,8 +76,15 @@ public sealed class VirtualTagEngine : IDisposable
        _graph.Clear();

        var compileFailures = new List<string>();
+        var seenPaths = new HashSet<string>(StringComparer.Ordinal);
        foreach (var def in definitions)
        {
+            if (!seenPaths.Add(def.Path))
+            {
+                compileFailures.Add($"{def.Path}: duplicate path — only one definition per path is allowed");
+                continue;
+            }
+
            try
            {
                var extraction = DependencyExtractor.Extract(def.ScriptSource);
@@ -113,9 +120,10 @@ public sealed class VirtualTagEngine : IDisposable

        // Subscribe to every referenced upstream path (driver tags only — virtual tags
        // cascade internally). Seed the cache with current upstream values so first
-        // evaluations see something real.
-        var upstreamPaths = definitions
-            .SelectMany(d => _tags[d.Path].Reads)
+        // evaluations see something real. Iterate _tags.Values (the registered set) rather
+        // than definitions to avoid indexing by a raw input list that may contain duplicates.
+        var upstreamPaths = _tags.Values
+            .SelectMany(s => s.Reads)
            .Where(p => !_tags.ContainsKey(p))
            .Distinct(StringComparer.Ordinal);
        foreach (var path in upstreamPaths)
@@ -229,12 +237,18 @@ public sealed class VirtualTagEngine : IDisposable
        {
            var ctxCache = BuildReadCache(state.Reads);

-            // Cold-start guard — hold the prior value when any upstream input is still
-            // unset or Bad-quality. Evaluating with nulls would throw inside the script
-            // (scripts cast ctx.GetTag(path).Value directly) and produce a persistent
-            // BadInternalError result until the upstream cache fills. Keeping the prior
-            // snapshot is more honest: the virtual tag simply hasn't been computed yet.
-            if (!AreInputsReady(ctxCache)) return;
+            // Cold-start guard — when any upstream input is still unset or Bad-quality,
+            // publish a BadWaitingForInitialData snapshot so OPC UA clients see a defined
+            // quality rather than observing "not yet computed" as a stale Good value.
+            // Evaluating with nulls would throw inside the script (scripts cast
+            // ctx.GetTag(path).Value directly) and produce a persistent BadInternalError.
+            if (!AreInputsReady(ctxCache))
+            {
+                var notReady = new DataValueSnapshot(null, 0x80320000u /* BadWaitingForInitialData */, null, _clock());
+                _valueCache[path] = notReady;
+                NotifyObservers(path, notReady);
+                return;
+            }

            var context = new VirtualTagContext(
                ctxCache,
@@ -247,7 +261,12 @@ public sealed class VirtualTagEngine : IDisposable
            {
                var raw = await state.Evaluator.RunAsync(context, ct).ConfigureAwait(false);
                var coerced = CoerceResult(raw, state.Definition.DataType);
-                result = new DataValueSnapshot(coerced, 0u, _clock(), _clock());
+                // null from CoerceResult means the conversion threw (raw was non-null but
+                // not convertible to the declared type). Surface as BadInternalError so
+                // the OPC UA client sees a defined Bad quality rather than a Good null.
+                result = (raw is not null && coerced is null)
+                    ? new DataValueSnapshot(null, 0x80020000u /* BadInternalError */, null, _clock())
+                    : new DataValueSnapshot(coerced, 0u, _clock(), _clock());
            }
            catch (ScriptTimeoutException tex)
            {
@@ -315,6 +334,14 @@ public sealed class VirtualTagEngine : IDisposable
        _valueCache[path] = snap;
        NotifyObservers(path, snap);
        if (_tags[path].Definition.Historize) _history.Record(path, snap);
+
+        // A cross-tag write must participate in the change-trigger cascade exactly like an
+        // upstream delta; otherwise any change-triggered tag that reads this path goes stale
+        // until an unrelated trigger fires (see docs/VirtualTags.md, VirtualTagContext
+        // section). The cascade is started but not awaited: this callback runs inside
+        // EvaluateInternalAsync with the non-reentrant _evalGate held, so awaiting a cascade
+        // that needs the same gate would deadlock.
+        _ = CascadeAsync(path, CancellationToken.None);
    }

    private void NotifyObservers(string path, DataValueSnapshot value)
@@ -49,19 +49,20 @@ public sealed class VirtualTagSource : IReadable, ISubscribable

        var handle = new SubscriptionHandle(Guid.NewGuid().ToString("N"));
        var observers = new List<IDisposable>(fullReferences.Count);
-        foreach (var path in fullReferences)
-        {
-            observers.Add(_engine.Subscribe(path, (p, snap) =>
-                OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, p, snap))));
-        }
-        _subs[handle.DiagnosticId] = new Subscription(handle, observers);

-        // OPC UA convention: emit initial-data callback for each path with the current value.
+        // OPC UA convention: for each path, emit the initial-data callback BEFORE registering
+        // the change observer, so a change event can never reach the client ahead of the
+        // initial-data event for the same path. Residual window: a change landing between
+        // Read and Subscribe is not re-delivered, so the client keeps the initial value until
+        // that path's next change.
        foreach (var path in fullReferences)
        {
            var snap = _engine.Read(path);
            OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, path, snap));
+            observers.Add(_engine.Subscribe(path, (p, s) =>
+                OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, p, s))));
        }
+        _subs[handle.DiagnosticId] = new Subscription(handle, observers);

        return Task.FromResult<ISubscriptionHandle>(handle);
    }
@@ -79,16 +79,15 @@ public sealed class PermissionTrie

    private static void WalkSystemPlatform(PermissionTrieNode ns, NodeScope scope, HashSet<string> groups, List<MatchedGrant> matches)
    {
-        // FolderSegments are nested under the namespace; each is its own trie level. Reuse the
-        // UnsArea scope kind for the flags — NodeAcl rows for Galaxy tags carry ScopeKind.Tag
-        // for leaf grants and ScopeKind.Namespace for folder-root grants; deeper folder grants
-        // are modeled as Equipment-level rows today since NodeAclScopeKind doesn't enumerate
-        // a dedicated FolderSegment kind. Future-proof TODO tracked in Stream B follow-up.
+        // FolderSegments are nested under the namespace; each is its own trie level. Use the
+        // dedicated FolderSegment scope kind so Galaxy folder grants report their true scope in
+        // AuthorizationDecision.Provenance — distinguishing them from UNS Equipment grants in
+        // the audit trail and Admin UI "Probe this permission" diagnostic.
        var current = ns;
        foreach (var segment in scope.FolderSegments)
        {
            if (!current.Children.TryGetValue(segment, out var child)) return;
-            CollectAtLevel(child, NodeAclScopeKind.Equipment, groups, matches);
+            CollectAtLevel(child, NodeAclScopeKind.FolderSegment, groups, matches);
            current = child;
        }

@@ -54,26 +54,51 @@ public sealed class PermissionTrieCache

    /// <summary>
    ///     Retain only the most-recent <paramref name="keepLatest"/> generations for a cluster.
-    ///     No-op when there's nothing to drop.
+    ///     No-op when there's nothing to drop. Thread-safe: uses a CAS loop with
+    ///     <see cref="ConcurrentDictionary{TKey,TValue}.TryUpdate"/> (reference equality on the
+    ///     class-typed entry) so a concurrent <see cref="Install"/> on the same cluster is never
+    ///     silently overwritten.
    /// </summary>
    public void Prune(string clusterId, int keepLatest = 3)
    {
        if (keepLatest < 1) throw new ArgumentOutOfRangeException(nameof(keepLatest), keepLatest, "keepLatest must be >= 1");
-        if (!_byCluster.TryGetValue(clusterId, out var entry)) return;

-        if (entry.Tries.Count <= keepLatest) return;
-        var keep = entry.Tries
-            .OrderByDescending(kvp => kvp.Key)
-            .Take(keepLatest)
-            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
-        _byCluster[clusterId] = new ClusterEntry(entry.Current, keep);
+        // CAS retry loop: read a snapshot, compute the pruned entry, atomically swap.
+        // Retry if another writer (Install or a concurrent Prune) updated the entry first.
+        while (true)
+        {
+            if (!_byCluster.TryGetValue(clusterId, out var observed)) return;
+            if (observed.Tries.Count <= keepLatest) return;
+
+            var keep = observed.Tries
+                .OrderByDescending(kvp => kvp.Key)
+                .Take(keepLatest)
+                .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
+
+            // Preserve the current pointer; if it was pruned (shouldn't happen since Current
+            // is always the newest generation), fall back to the newest retained entry.
+            var current = keep.TryGetValue(observed.Current.GenerationId, out var kept)
+                ? kept
+                : keep.OrderByDescending(kvp => kvp.Key).First().Value;
+
+            var pruned = new ClusterEntry(current, keep);
+            // TryUpdate uses reference equality for ClusterEntry (class, not record) so it
+            // succeeds only when the stored reference is still the one we observed.
+            if (_byCluster.TryUpdate(clusterId, pruned, observed))
+                return;
+            // Another thread updated the entry between our read and our write — re-read and retry.
+        }
    }

    /// <summary>Diagnostics counter: number of cached (cluster, generation) tries.</summary>
    public int CachedTrieCount => _byCluster.Values.Sum(e => e.Tries.Count);

-    private sealed record ClusterEntry(PermissionTrie Current, IReadOnlyDictionary<long, PermissionTrie> Tries)
+    // Class (not record) so TryUpdate in Prune uses reference equality for the CAS comparison.
+    private sealed class ClusterEntry(PermissionTrie current, IReadOnlyDictionary<long, PermissionTrie> tries)
    {
+        public PermissionTrie Current { get; } = current;
+        public IReadOnlyDictionary<long, PermissionTrie> Tries { get; } = tries;
+
        public static ClusterEntry FromSingle(PermissionTrie trie) =>
            new(trie, new Dictionary<long, PermissionTrie> { [trie.GenerationId] = trie });

@@ -37,6 +37,21 @@ public sealed class TriePermissionEvaluator : IPermissionEvaluator
        var trie = _cache.GetTrie(scope.ClusterId);
        if (trie is null) return AuthorizationDecision.NotGranted();

+        // Decision #153 / Phase 6.2 adversarial-review item #3 (redundancy-safe invalidation):
+        // the GetTrie shortcut returns whatever generation the cache currently holds, which may
+        // have advanced past the generation this session was bound to (another node published).
+        // Evaluate against the session's *bound* generation so a grant added or removed in a
+        // newer generation cannot silently take effect mid-session, and so the provenance in the
+        // AuthorizationDecision reports the generation that actually produced the verdict.
+        if (trie.GenerationId != session.AuthGenerationId)
+        {
+            trie = _cache.GetTrie(scope.ClusterId, session.AuthGenerationId);
+
+            // The session's bound generation has been pruned from the cache: fail closed. The
+            // session's auth state must be re-resolved before a retry can be granted.
+            if (trie is null) return AuthorizationDecision.NotGranted();
+        }
+
        var matches = trie.CollectMatches(scope, session.LdapGroups);
        if (matches.Count == 0) return AuthorizationDecision.NotGranted();

@@ -7,13 +7,19 @@ namespace ZB.MOM.WW.OtOpcUa.Core.Authorization;
 /// </summary>
 /// <remarks>
 ///     Per decision #151 the membership is bounded by <see cref="MembershipFreshnessInterval"/>
-///     (default 15 min). After that, the next hot-path authz call re-resolves LDAP group
+///     (default 5 min). After that, the next hot-path authz call re-resolves LDAP group
 ///     memberships; failure to re-resolve (LDAP unreachable) flips the session to fail-closed
 ///     until a refresh succeeds.
 ///
-///     Per decision #152 <see cref="AuthCacheMaxStaleness"/> (default 5 min) is separate from
+///     Per decision #152 <see cref="AuthCacheMaxStaleness"/> (default 15 min) is separate from
 ///     Phase 6.1's availability-oriented 24h cache — beyond this window the evaluator returns
 ///     <see cref="AuthorizationVerdict.NotGranted"/> regardless of config-cache warmth.
+///
+///     The freshness window is the inner trigger and the staleness ceiling the outer hard
+///     limit: <see cref="MembershipFreshnessInterval"/> MUST be strictly less than
+///     <see cref="AuthCacheMaxStaleness"/> so that <see cref="NeedsRefresh"/> ("re-resolve
+///     while still serving cached memberships") has a non-empty window before
+///     <see cref="IsStale"/> fails the session closed.
 /// </remarks>
 public sealed record UserAuthorizationState
 {
@@ -47,10 +53,10 @@ public sealed record UserAuthorizationState
    public required long MembershipVersion { get; init; }

    /// <summary>Bounded membership freshness window; past this the next authz call refreshes.</summary>
-    public TimeSpan MembershipFreshnessInterval { get; init; } = TimeSpan.FromMinutes(15);
+    public TimeSpan MembershipFreshnessInterval { get; init; } = TimeSpan.FromMinutes(5);

    /// <summary>Hard staleness ceiling — beyond this, the evaluator fails closed.</summary>
-    public TimeSpan AuthCacheMaxStaleness { get; init; } = TimeSpan.FromMinutes(5);
+    public TimeSpan AuthCacheMaxStaleness { get; init; } = TimeSpan.FromMinutes(15);

    /// <summary>
    ///     True when <paramref name="utcNow"/> - <see cref="MembershipResolvedUtc"/> exceeds
@@ -36,12 +36,25 @@ public class GenericDriverNodeManager(IDriver driver) : IDisposable
    ///     Populates the address space by streaming nodes from the driver into the supplied builder,
    ///     wraps the builder so alarm-condition sinks are captured, subscribes to the driver's
    ///     alarm event stream, and routes each transition to the matching sink by <c>SourceNodeId</c>.
-    ///     Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted,
-    ///     but other drivers remain available.
+    ///     If called a second time (e.g. Galaxy redeploy via <c>IRediscoverable.OnRediscoveryNeeded</c>)
+    ///     the previous alarm subscription is torn down and the sink registry is cleared before
+    ///     re-walking, preventing double delivery of alarm transitions.
+    ///     Exception isolation (marking the driver's subtree Faulted) is the caller's responsibility —
+    ///     exceptions from <see cref="ITagDiscovery.DiscoverAsync"/> propagate to the caller.
    /// </summary>
    public async Task BuildAddressSpaceAsync(IAddressSpaceBuilder builder, CancellationToken ct)
    {
        ArgumentNullException.ThrowIfNull(builder);
+        ObjectDisposedException.ThrowIf(_disposed, this);
+
+        // Tear down any previous alarm subscription before re-walking so a second call (e.g. on
+        // Galaxy redeploy) does not leave the old forwarder subscribed and double-fire events.
+        if (_alarmForwarder is not null && Driver is IAlarmSource existingSource)
+        {
+            existingSource.OnAlarmEvent -= _alarmForwarder;
+            _alarmForwarder = null;
+        }
+        _alarmSinks.Clear();

        if (Driver is not ITagDiscovery discovery)
            throw new NotSupportedException($"Driver '{Driver.DriverInstanceId}' does not implement ITagDiscovery.");
@@ -48,7 +48,9 @@ public sealed class AlarmSurfaceInvoker
    /// <summary>
    ///     Subscribe to alarm events for a set of source node ids, fanning out by resolved host
    ///     so per-host breakers / bulkheads apply. Returns one handle per host — callers that
-    ///     don't care about per-host separation may concatenate them.
+    ///     don't care about per-host separation may concatenate them. Each returned handle wraps
+    ///     the driver's opaque handle together with its resolved host so <see cref="UnsubscribeAsync"/>
+    ///     routes through the same host's pipeline that the subscription was created on.
    /// </summary>
    public async Task<IReadOnlyList<IAlarmSubscriptionHandle>> SubscribeAsync(
        IReadOnlyList<string> sourceNodeIds,
@@ -61,24 +63,34 @@ public sealed class AlarmSurfaceInvoker
        var handles = new List<IAlarmSubscriptionHandle>(byHost.Count);
        foreach (var (host, ids) in byHost)
        {
-            var handle = await _invoker.ExecuteAsync(
+            var inner = await _invoker.ExecuteAsync(
                DriverCapability.AlarmSubscribe,
                host,
                async ct => await _alarmSource.SubscribeAlarmsAsync(ids, ct).ConfigureAwait(false),
                cancellationToken).ConfigureAwait(false);
-            handles.Add(handle);
+            handles.Add(new HostBoundHandle(inner, host));
        }
        return handles;
    }

-    /// <summary>Cancel an alarm subscription. Routes through the AlarmSubscribe pipeline for parity.</summary>
+    /// <summary>
+    ///     Cancel an alarm subscription. Routes through the same host's resilience pipeline
+    ///     that the subscription was created on (carried in the <see cref="HostBoundHandle"/>
+    ///     wrapper returned by <see cref="SubscribeAsync"/>). Falls back to the default host for
+    ///     handles not created by this invoker so the method remains safe to call on any
+    ///     <see cref="IAlarmSubscriptionHandle"/> implementation.
+    /// </summary>
    public ValueTask UnsubscribeAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(handle);
+        var (innerHandle, host) = handle is HostBoundHandle bound
+            ? (bound.Inner, bound.Host)
+            : (handle, _defaultHost);
+
        return _invoker.ExecuteAsync(
            DriverCapability.AlarmSubscribe,
-            _defaultHost,
-            async ct => await _alarmSource.UnsubscribeAlarmsAsync(handle, ct).ConfigureAwait(false),
+            host,
+            async ct => await _alarmSource.UnsubscribeAlarmsAsync(innerHandle, ct).ConfigureAwait(false),
            cancellationToken);
    }

@@ -126,4 +138,16 @@ public sealed class AlarmSurfaceInvoker
        }
        return result;
    }
+
+    /// <summary>
+    ///     Wraps an <see cref="IAlarmSubscriptionHandle"/> returned by the driver with the
+    ///     resolved host name used when the subscription was created. <see cref="UnsubscribeAsync"/>
+    ///     unwraps this to route the unsubscribe through the same host's resilience pipeline.
+    /// </summary>
+    private sealed class HostBoundHandle(IAlarmSubscriptionHandle inner, string host) : IAlarmSubscriptionHandle
+    {
+        public IAlarmSubscriptionHandle Inner { get; } = inner;
+        public string Host { get; } = host;
+        public string DiagnosticId => Inner.DiagnosticId;
+    }
 }
@@ -56,4 +56,19 @@ public abstract class AbCipCommandBase : DriverCommandBase
    ///     multiple gateways in parallel can distinguish the logs.
    /// </summary>
    protected string DriverInstanceId => $"abcip-cli-{Gateway}";
+
+    /// <summary>
+    ///     Guards against <see cref="AbCipDataType.Structure"/> being passed to a command
+    ///     that does not support UDT layouts. Call at the top of <c>ExecuteAsync</c> for any
+    ///     command that accepts <c>--type</c> but cannot handle memberless Structure tags.
+    ///     Throws a <see cref="CliFx.Exceptions.CommandException"/> if <paramref name="type"/>
+    ///     is <see cref="AbCipDataType.Structure"/>.
+    /// </summary>
+    protected static void RejectStructure(AbCipDataType type)
+    {
+        if (type == AbCipDataType.Structure)
+            throw new CliFx.Exceptions.CommandException(
+                "Structure (UDT) reads are out of scope for this command — those need an explicit " +
+                "member layout, which belongs in a real driver config.");
+    }
 }
@@ -25,6 +25,7 @@ public sealed class ProbeCommand : AbCipCommandBase
    public override async ValueTask ExecuteAsync(IConsole console)
    {
        ConfigureLogging();
+        RejectStructure(DataType);
        var ct = console.RegisterCancellationHandler();

        var probeTag = new AbCipTagDefinition(
@@ -27,6 +27,7 @@ public sealed class ReadCommand : AbCipCommandBase
    public override async ValueTask ExecuteAsync(IConsole console)
    {
        ConfigureLogging();
+        RejectStructure(DataType);
        var ct = console.RegisterCancellationHandler();

        var tagName = SynthesiseTagName(TagPath, DataType);
@@ -30,6 +30,7 @@ public sealed class SubscribeCommand : AbCipCommandBase
    public override async ValueTask ExecuteAsync(IConsole console)
    {
        ConfigureLogging();
+        RejectStructure(DataType);
        var ct = console.RegisterCancellationHandler();

        var tagName = ReadCommand.SynthesiseTagName(TagPath, DataType);
@@ -66,23 +66,40 @@ public sealed class WriteCommand : AbCipCommandBase
    /// <summary>
    ///     Parse the operator's <c>--value</c> string into the CLR type the driver expects
    ///     for the declared <see cref="AbCipDataType"/>. Invariant culture everywhere.
+    ///     Bad input (non-numeric text, out-of-range value) is caught and rethrown as a
+    ///     <see cref="CliFx.Exceptions.CommandException"/> so CliFx renders a clean one-line
+    ///     error rather than a full .NET stack trace.
    /// </summary>
-    internal static object ParseValue(string raw, AbCipDataType type) => type switch
+    internal static object ParseValue(string raw, AbCipDataType type)
    {
-        AbCipDataType.Bool => ParseBool(raw),
-        AbCipDataType.SInt => sbyte.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.Int => short.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.DInt or AbCipDataType.Dt => int.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.LInt => long.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.USInt => byte.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.UInt => ushort.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.UDInt => uint.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.ULInt => ulong.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.Real => float.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.LReal => double.Parse(raw, CultureInfo.InvariantCulture),
-        AbCipDataType.String => raw,
-        _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
-    };
+        try
+        {
+            return type switch
+            {
+                AbCipDataType.Bool => ParseBool(raw),
+                AbCipDataType.SInt => sbyte.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.Int => short.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.DInt or AbCipDataType.Dt => int.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.LInt => long.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.USInt => byte.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.UInt => ushort.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.UDInt => uint.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.ULInt => ulong.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.Real => float.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.LReal => double.Parse(raw, CultureInfo.InvariantCulture),
+                AbCipDataType.String => raw,
+                _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
+            };
+        }
+        catch (Exception ex) when (ex is FormatException or OverflowException)
+        {
+            throw new CliFx.Exceptions.CommandException(
+                $"Cannot parse '{raw}' as {type}. " +
+                $"Check that the value is within the valid range for {type} and uses invariant-culture " +
+                $"decimal notation (e.g. '3.14', not '3,14').",
+                innerException: ex);
+        }
+    }

    private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
    {
@@ -59,17 +59,38 @@ public sealed class WriteCommand : AbLegacyCommandBase
    }

    /// <summary>Parse <c>--value</c> per <see cref="AbLegacyDataType"/>, invariant culture.</summary>
-    internal static object ParseValue(string raw, AbLegacyDataType type) => type switch
+    /// <exception cref="CliFx.Exceptions.CommandException">
+    ///     Thrown when <paramref name="raw"/> cannot be parsed as the requested type (malformed
+    ///     input or out-of-range value) so CliFx renders a clean one-line error instead of a raw
+    ///     stack trace.
+    /// </exception>
+    internal static object ParseValue(string raw, AbLegacyDataType type)
    {
-        AbLegacyDataType.Bit => ParseBool(raw),
-        AbLegacyDataType.Int or AbLegacyDataType.AnalogInt => short.Parse(raw, CultureInfo.InvariantCulture),
-        AbLegacyDataType.Long => int.Parse(raw, CultureInfo.InvariantCulture),
-        AbLegacyDataType.Float => float.Parse(raw, CultureInfo.InvariantCulture),
-        AbLegacyDataType.String => raw,
-        AbLegacyDataType.TimerElement or AbLegacyDataType.CounterElement
-            or AbLegacyDataType.ControlElement => int.Parse(raw, CultureInfo.InvariantCulture),
-        _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
-    };
+        try
+        {
+            return type switch
+            {
+                AbLegacyDataType.Bit => ParseBool(raw),
+                AbLegacyDataType.Int or AbLegacyDataType.AnalogInt => short.Parse(raw, CultureInfo.InvariantCulture),
+                AbLegacyDataType.Long => int.Parse(raw, CultureInfo.InvariantCulture),
+                AbLegacyDataType.Float => float.Parse(raw, CultureInfo.InvariantCulture),
+                AbLegacyDataType.String => raw,
+                AbLegacyDataType.TimerElement or AbLegacyDataType.CounterElement
+                    or AbLegacyDataType.ControlElement => int.Parse(raw, CultureInfo.InvariantCulture),
+                _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
+            };
+        }
+        catch (FormatException ex)
+        {
+            throw new CliFx.Exceptions.CommandException(
+                $"Value '{raw}' is not a valid {type}: {ex.Message}", innerException: ex);
+        }
+        catch (OverflowException ex)
+        {
+            throw new CliFx.Exceptions.CommandException(
+                $"Value '{raw}' is out of range for {type}: {ex.Message}", innerException: ex);
+        }
+    }

    private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
    {
@@ -7,8 +7,8 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Cli.Common;

 /// <summary>
 ///     Shared base for every driver test-client command (Modbus / AB CIP / AB Legacy / S7 /
-///     TwinCAT). Carries the options that are meaningful regardless of protocol — verbose
-///     logging + the standard timeout — plus helpers every command implementation wants:
+///     TwinCAT / FOCAS). Carries the options that are meaningful regardless of protocol —
+///     verbose logging + the standard timeout — plus helpers every command implementation wants:
 ///     Serilog configuration + cancellation-token capture.
 /// </summary>
 /// <remarks>
@@ -44,17 +44,37 @@ public abstract class DriverCommandBase : ICommand
    public abstract ValueTask ExecuteAsync(IConsole console);

    /// <summary>
-    ///     Configures the process-global Serilog logger. Commands call this at the top of
-    ///     <see cref="ExecuteAsync"/> so driver-internal <c>Log.Logger</c> writes land on the
-    ///     same sink as the CLI's operator-facing output.
+    ///     Configures the process-global Serilog logger. Intended to be called exactly once,
+    ///     at the top of <see cref="ExecuteAsync"/>, so driver-internal <c>Log.Logger</c>
+    ///     writes land on the same sink as the CLI's operator-facing output.
+    ///     Repeat calls on the same command instance are no-ops (the guard is per instance).
+    ///     Call <see cref="FlushLogging"/> in a <c>finally</c> block to ensure buffered output
+    ///     is flushed before the process exits.
    /// </summary>
    protected void ConfigureLogging()
    {
+        if (_loggingConfigured) return;
+        _loggingConfigured = true;
+
+        // Dispose the previous global logger (e.g. Serilog's default silent logger) so any
+        // resources it holds are released before Log.Logger is overwritten.
+        var previous = Log.Logger;
        var config = new LoggerConfiguration();
        if (Verbose)
            config.MinimumLevel.Debug().WriteTo.Console();
        else
            config.MinimumLevel.Warning().WriteTo.Console();
        Log.Logger = config.CreateLogger();
+        (previous as IDisposable)?.Dispose();
    }
+
+    /// <summary>
+    ///     Flushes and closes the Serilog logger configured by <see cref="ConfigureLogging"/>.
+    ///     Call this in a <c>finally</c> block inside <see cref="ExecuteAsync"/> to prevent
+    ///     buffered log output from being lost on process exit, particularly for long-running
+    ///     commands such as <c>subscribe</c>.
+    /// </summary>
+    protected static void FlushLogging() => Log.CloseAndFlush();
+
+    private bool _loggingConfigured;
 }
@@ -65,9 +65,9 @@ public static class SnapshotFormatter
            Time = FormatTimestamp(snapshots[i].SourceTimestampUtc),
        }).ToArray();

-        int tagW = Math.Max("TAG".Length, rows.Max(r => r.Tag.Length));
-        int valW = Math.Max("VALUE".Length, rows.Max(r => r.Value.Length));
-        int statW = Math.Max("STATUS".Length, rows.Max(r => r.Status.Length));
+        int tagW = rows.Length == 0 ? "TAG".Length : Math.Max("TAG".Length, rows.Max(r => r.Tag.Length));
+        int valW = rows.Length == 0 ? "VALUE".Length : Math.Max("VALUE".Length, rows.Max(r => r.Value.Length));
+        int statW = rows.Length == 0 ? "STATUS".Length : Math.Max("STATUS".Length, rows.Max(r => r.Status.Length));
        // source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed.

        var sb = new System.Text.StringBuilder();
@@ -100,23 +100,42 @@ public static class SnapshotFormatter

    public static string FormatStatus(uint statusCode)
    {
-        // Match the OPC UA shorthand for the statuses most-likely to land in a CLI run.
-        // Anything outside this short-list surfaces as hex — operators can cross-reference
-        // against OPC UA Part 6 § 7.34 (StatusCode tables) or Core.Abstractions status mappers.
-        var name = statusCode switch
+        // OPC UA status codes carry info-type and flag bits in the low 16 bits (structure-changed,
+        // semantics-changed, limit bits, overflow, etc.); the SubCode itself sits in bits 16-27. So
+        // that e.g. 0x80050001 still reads as "BadCommunicationError" rather than bare hex,
+        // named codes are matched against the high-word mask (code & 0xFFFF0000).  When no
+        // named match is found the severity class (top 2 bits) provides a meaningful fallback
+        // so operators always see at least Good / Uncertain / Bad rather than raw hex.
+        // Numeric codes are the canonical values from the OPC Foundation Opc.Ua.StatusCodes
+        // table; keep them in sync with that table if this list is extended.
+        var masked = statusCode & 0xFFFF0000u;
+        var name = masked switch
        {
            0x00000000u => "Good",
            0x80000000u => "Bad",
            0x80050000u => "BadCommunicationError",
-            0x80060000u => "BadTimeout",
-            0x80070000u => "BadNoCommunication",
-            0x80080000u => "BadWaitingForInitialData",
+            0x800A0000u => "BadTimeout",
+            0x80310000u => "BadNoCommunication",
+            0x80320000u => "BadWaitingForInitialData",
            0x80340000u => "BadNodeIdUnknown",
-            0x80350000u => "BadNodeIdInvalid",
+            0x80330000u => "BadNodeIdInvalid",
            0x80740000u => "BadTypeMismatch",
            0x40000000u => "Uncertain",
            _ => null,
        };
+
+        if (name is null)
+        {
+            // Severity fallback: top 2 bits identify the quality class even for unknown
+            // sub-codes.  0x80000000 and 0xC0000000 (reserved quality) both map to "Bad".
+            name = (statusCode & 0xC0000000u) switch
+            {
+                0x00000000u => "Good",
+                0x40000000u => "Uncertain",
+                _ => "Bad",
+            };
+        }
+
        return name is null
            ? $"0x{statusCode:X8}"
            : $"0x{statusCode:X8} ({name})";
@@ -35,6 +35,21 @@ public sealed class SubscribeCommand : ModbusCommandBase
        "BigEndian (default) or WordSwap.")]
    public ModbusByteOrder ByteOrder { get; init; } = ModbusByteOrder.BigEndian;

+    // Driver.Modbus.Cli-001: subscribe previously lacked these three options that read and
+    // write both expose. Without them, BitInRegister always watches bit 0 and String runs with
+    // StringLength=0, silently producing wrong results for any subscriber using those types.
+    [CommandOption("bit-index", Description =
+        "For type=BitInRegister: which bit of the holding register (0-15, LSB-first).")]
+    public byte BitIndex { get; init; }
+
+    [CommandOption("string-length", Description =
+        "For type=String: character count (2 per register, rounded up).")]
+    public ushort StringLength { get; init; }
+
+    [CommandOption("string-byte-order", Description =
+        "For type=String: HighByteFirst (standard) or LowByteFirst (DirectLOGIC).")]
+    public ModbusStringByteOrder StringByteOrder { get; init; } = ModbusStringByteOrder.HighByteFirst;
+
    public override async ValueTask ExecuteAsync(IConsole console)
    {
        ConfigureLogging();
@@ -47,7 +62,10 @@ public sealed class SubscribeCommand : ModbusCommandBase
            Address: Address,
            DataType: DataType,
            Writable: false,
-            ByteOrder: ByteOrder);
+            ByteOrder: ByteOrder,
+            BitIndex: BitIndex,
+            StringLength: StringLength,
+            StringByteOrder: StringByteOrder);
        var options = BuildOptions([tag]);

        await using var driver = new ModbusDriver(options, DriverInstanceId);
@@ -60,6 +60,16 @@ public sealed class WriteCommand : ModbusCommandBase
            throw new CliFx.Exceptions.CommandException(
                $"Region '{Region}' is read-only in the Modbus spec; writes require Coils or HoldingRegisters.");

+        // Driver.Modbus.Cli-002: coils are single-bit outputs — only Bool makes sense. A
+        // non-boolean type (e.g. --region Coils --type UInt16) would silently coerce the value
+        // to a boolean via Convert.ToBoolean, landing as ON for any non-zero value, with no
+        // diagnostic. Reject it early so the operator sees a clear error rather than a silent
+        // type-mismatch coercion.
+        if (Region == ModbusRegion.Coils && DataType != ModbusDataType.Bool)
+            throw new CliFx.Exceptions.CommandException(
+                $"Region 'Coils' only supports boolean values (--type Bool). " +
+                $"Type '{DataType}' cannot represent a single-bit coil write.");
+
        var tagName = ReadCommand.SynthesiseTagName(Region, Address, DataType);
        var tag = new ModbusTagDefinition(
            Name: tagName,
@@ -34,6 +34,10 @@ public sealed class ProbeCommand : S7CommandBase
        var options = BuildOptions([probeTag]);

        await using var driver = new S7Driver(options, DriverInstanceId);
+        // Driver.S7.Cli-003: wrap the entire probe sequence so that a refused/unreachable TCP
+        // connect still prints the structured Host/CPU/Health lines instead of crashing with a
+        // full .NET stack trace. InitializeAsync sets health to Faulted with the exception
+        // message before re-throwing, so GetHealth() always has something to report.
        try
        {
            await driver.InitializeAsync("{}", ct);
@@ -48,6 +52,20 @@ public sealed class ProbeCommand : S7CommandBase
            await console.Output.WriteLineAsync();
            await console.Output.WriteLineAsync(SnapshotFormatter.Format(Address, snapshot[0]));
        }
+        catch (OperationCanceledException)
+        {
+            throw; // Ctrl+C — let CliFx handle it normally.
+        }
+        catch
+        {
+            // Connect / read failure — print what the driver knows so far.
+            var health = driver.GetHealth();
+            await console.Output.WriteLineAsync($"Host:         {Host}:{Port}");
+            await console.Output.WriteLineAsync($"CPU:          {CpuType} rack={Rack} slot={Slot}");
+            await console.Output.WriteLineAsync($"Health:       {health.State}");
+            if (health.LastError is { } err)
+                await console.Output.WriteLineAsync($"Last error:   {err}");
+        }
        finally
        {
            await driver.ShutdownAsync(CancellationToken.None);
@@ -19,9 +19,12 @@ public sealed class ReadCommand : S7CommandBase
        IsRequired = true)]
    public string Address { get; init; } = default!;

+    // Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
+    // Int64 / UInt64 / Float64 / String / DateTime are defined in S7DataType but the driver
+    // raises NotSupportedException (→ BadNotSupported) on reads of those types.
    [CommandOption("type", 't', Description =
-        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
-        "String / DateTime (default Int16).")]
+        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
+        "Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
    public S7DataType DataType { get; init; } = S7DataType.Int16;

    [CommandOption("string-length", Description =
@@ -15,9 +15,10 @@ public sealed class SubscribeCommand : S7CommandBase
    [CommandOption("address", 'a', Description = "S7 address — same format as `read`.", IsRequired = true)]
    public string Address { get; init; } = default!;

+    // Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
    [CommandOption("type", 't', Description =
-        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
-        "String / DateTime (default Int16).")]
+        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
+        "Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
    public S7DataType DataType { get; init; } = S7DataType.Int16;

    [CommandOption("interval-ms", 'i', Description = "Publishing interval ms (default 1000).")]
@@ -18,9 +18,13 @@ public sealed class WriteCommand : S7CommandBase
        "S7 address — same format as `read`.", IsRequired = true)]
    public string Address { get; init; } = default!;

+    // Driver.S7.Cli-002: help text trimmed to the types the driver actually implements.
+    // Int64 / UInt64 / Float64 / String / DateTime are defined in S7DataType but the driver
+    // raises NotSupportedException (→ BadNotSupported) on any read/write of those types;
+    // advertising them misleads operators who then see BadNotSupported with no explanation.
    [CommandOption("type", 't', Description =
-        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Int64 / UInt64 / Float32 / Float64 / " +
-        "String / DateTime (default Int16).")]
+        "Bool / Byte / Int16 / UInt16 / Int32 / UInt32 / Float32 (default Int16). " +
+        "Int64, UInt64, Float64, String, and DateTime are not yet implemented and will return BadNotSupported.")]
    public S7DataType DataType { get; init; } = S7DataType.Int16;

    [CommandOption("value", 'v', Description =
@@ -62,22 +66,44 @@ public sealed class WriteCommand : S7CommandBase
    }

    /// <summary>Parse <c>--value</c> per <see cref="S7DataType"/>, invariant culture throughout.</summary>
-    internal static object ParseValue(string raw, S7DataType type) => type switch
+    /// <remarks>
+    ///     Driver.S7.Cli-001: numeric and <see cref="DateTime"/> parses are wrapped so that
+    ///     malformed input (<see cref="FormatException"/> / <see cref="OverflowException"/>)
+    ///     surfaces as a clean <see cref="CliFx.Exceptions.CommandException"/> rather than a
+    ///     raw .NET stack trace — matching the friendly message the Bool path already produces.
+    /// </remarks>
+    internal static object ParseValue(string raw, S7DataType type)
    {
-        S7DataType.Bool => ParseBool(raw),
-        S7DataType.Byte => byte.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.Int16 => short.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.UInt16 => ushort.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.Int32 => int.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.UInt32 => uint.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.Int64 => long.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.UInt64 => ulong.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.Float32 => float.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.Float64 => double.Parse(raw, CultureInfo.InvariantCulture),
-        S7DataType.String => raw,
-        S7DataType.DateTime => DateTime.Parse(raw, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
-        _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
-    };
+        if (type == S7DataType.Bool) return ParseBool(raw);
+        if (type == S7DataType.String) return raw;
+        try
+        {
+            return type switch
+            {
+                S7DataType.Byte     => (object)byte.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.Int16    => (object)short.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.UInt16   => (object)ushort.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.Int32    => (object)int.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.UInt32   => (object)uint.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.Int64    => (object)long.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.UInt64   => (object)ulong.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.Float32  => (object)float.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.Float64  => (object)double.Parse(raw, CultureInfo.InvariantCulture),
+                S7DataType.DateTime => (object)DateTime.Parse(raw, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind),
+                _ => throw new CliFx.Exceptions.CommandException($"Unsupported DataType '{type}' for write."),
+            };
+        }
+        catch (FormatException ex)
+        {
+            throw new CliFx.Exceptions.CommandException(
+                $"Value '{raw}' is not a valid {type}: {ex.Message}", innerException: ex);
+        }
+        catch (OverflowException ex)
+        {
+            throw new CliFx.Exceptions.CommandException(
+                $"Value '{raw}' is out of range for {type}: {ex.Message}", innerException: ex);
+        }
+    }

    private static bool ParseBool(string raw) => raw.Trim().ToLowerInvariant() switch
    {
@@ -40,22 +40,29 @@ public enum AbCipDataType
 public static class AbCipDataTypeExtensions
 {
    /// <summary>
-    ///     Map to the driver-agnostic type the server's address-space builder consumes. Unsigned
-    ///     Logix types widen into signed equivalents until <c>DriverDataType</c> picks up unsigned
-    ///     + 64-bit variants (Modbus has the same gap — see <c>ModbusDriver.MapDataType</c>
-    ///     comment re: PR 25).
+    ///     Map to the driver-agnostic type the server's address-space builder consumes.
+    ///     <c>DriverDataType</c> carries Int64, UInt32, and UInt64, so each Logix type maps to a
+    ///     signed/unsigned equivalent wide enough for its full range, with no silent truncation:
+    ///     <list type="bullet">
+    ///         <item>LInt (signed 64-bit) → Int64; ULInt (unsigned 64-bit) → UInt64.</item>
+    ///         <item>UDInt (unsigned 32-bit) → UInt32 so values above Int32.MaxValue are not
+    ///             wrapped to negative (Driver.AbCip-004).</item>
+    ///         <item>USInt / UInt widen into Int32; they can never overflow it.</item>
+    ///     </list>
    /// </summary>
    public static DriverDataType ToDriverDataType(this AbCipDataType t) => t switch
    {
        AbCipDataType.Bool => DriverDataType.Boolean,
        AbCipDataType.SInt or AbCipDataType.Int or AbCipDataType.DInt => DriverDataType.Int32,
-        AbCipDataType.USInt or AbCipDataType.UInt or AbCipDataType.UDInt => DriverDataType.Int32,
-        AbCipDataType.LInt or AbCipDataType.ULInt => DriverDataType.Int32, // TODO: Int64 — matches Modbus gap
+        AbCipDataType.USInt or AbCipDataType.UInt => DriverDataType.Int32,
+        AbCipDataType.UDInt => DriverDataType.UInt32,
+        AbCipDataType.LInt => DriverDataType.Int64,
+        AbCipDataType.ULInt => DriverDataType.UInt64,
        AbCipDataType.Real => DriverDataType.Float32,
        AbCipDataType.LReal => DriverDataType.Float64,
        AbCipDataType.String => DriverDataType.String,
        AbCipDataType.Dt => DriverDataType.Int32, // epoch-seconds DINT
-        AbCipDataType.Structure => DriverDataType.String, // placeholder until UDT PR 6 introduces a structured kind
+        AbCipDataType.Structure => DriverDataType.String, // placeholder until UDT support adds a structured kind
        _ => DriverDataType.Int32,
    };
 }
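The reason UDInt moves off Int32 is two's-complement reinterpretation. The standalone C# sketch below is not driver code, and 3,000,000,000 is an arbitrary example value. It shows what the old UDInt → Int32 mapping did to a large UDINT, and why USInt and UInt can stay on Int32:

```csharp
using System;

// An arbitrary UDINT value above Int32.MaxValue (2,147,483,647).
uint udint = 3_000_000_000u;

// Old mapping (UDInt -> Int32): the same 32 bits read as signed wrap negative.
int asInt32 = unchecked((int)udint);

// New mapping (UDInt -> UInt32): the value is carried unchanged.
uint asUInt32 = udint;

// UINT tops out at 65,535 (USINT at 255), so widening into Int32 is always lossless.
int widenedUInt = (ushort)65_535;

Console.WriteLine($"{asInt32} vs {asUInt32} (UINT max widened: {widenedUInt})");
```

The same reasoning explains the LInt/ULInt split. A 64-bit value cannot fit in Int32 at all, and ULInt values above Int64.MaxValue would wrap in the same way if they were mapped to Int64.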
@@ -5,9 +5,8 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;

 /// <summary>
 ///     Allen-Bradley CIP / EtherNet-IP driver for ControlLogix / CompactLogix / Micro800 /
-///     GuardLogix families. Implements <see cref="IDriver"/> only for now — read/write/
-///     subscribe/discover capabilities ship in subsequent PRs (3–8) and family-specific quirk
-///     profiles ship in PRs 9–12.
+///     GuardLogix families. Implements all read/write/subscribe/discover/probe/alarm
+///     capabilities via the libplctag.NET wrapper.
 /// </summary>
 /// <remarks>
 ///     <para>Wire layer is libplctag 1.6.x (plan decision #11). Per-device host addresses use
@@ -17,13 +16,16 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
 ///
 ///     <para>Tier A per plan decisions #143–145 — in-process, shares server lifetime, no
 ///     sidecar. <see cref="ReinitializeAsync"/> is the Tier-B escape hatch for recovering
-///     from native-heap growth that the CLR allocator can't see; it tears down every
-///     <see cref="PlcTagHandle"/> and reconnects each device.</para>
+///     from native-heap growth that the CLR allocator can't see; it tears down the
+///     libplctag.NET <c>Tag</c> instances held in <c>DeviceState.Runtimes</c> and reconnects
+///     each device. Native tag lifetime is owned by the libplctag.NET <c>Tag.Dispose()</c>
+///     (called in <see cref="DeviceState.DisposeHandles"/>); the library's own finalizer
+///     handles GC-collected tags.</para>
 /// </remarks>
 public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery, ISubscribable,
    IHostConnectivityProbe, IPerCallHostResolver, IAlarmSource, IDisposable, IAsyncDisposable
 {
-    private readonly AbCipDriverOptions _options;
+    private AbCipDriverOptions _options;
    private readonly string _driverInstanceId;
    private readonly IAbCipTagFactory _tagFactory;
    private readonly IAbCipTagEnumeratorFactory _enumeratorFactory;
@@ -32,7 +34,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
    private readonly PollGroupEngine _poll;
    private readonly Dictionary<string, DeviceState> _devices = new(StringComparer.OrdinalIgnoreCase);
    private readonly Dictionary<string, AbCipTagDefinition> _tagsByName = new(StringComparer.OrdinalIgnoreCase);
-    private readonly AbCipAlarmProjection _alarmProjection;
+    private AbCipAlarmProjection _alarmProjection;
    private DriverHealth _health = new(DriverState.Unknown, null, null);

    public event EventHandler<DataChangeEventArgs>? OnDataChange;
@@ -108,11 +110,32 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
    public string DriverInstanceId => _driverInstanceId;
    public string DriverType => "AbCip";

+    /// <summary>
+    ///     Initialize the driver from its <c>DriverConfig</c> JSON. When
+    ///     <paramref name="driverConfigJson"/> carries a real configuration (any device or tag),
+    ///     it is parsed via <see cref="AbCipDriverFactoryExtensions.ParseOptions"/> and the
+    ///     parsed options REPLACE the construction-time options — this is what makes
+    ///     <see cref="ReinitializeAsync"/> pick up a changed config (new device, new tag,
+    ///     changed timeout). A blank or empty-object JSON (<c>"{}"</c>) is treated as "no
+    ///     override" so callers that constructed the driver with explicit options — chiefly
+    ///     unit tests — keep those options. The driver's address-space + runtime state is then
+    ///     built from the effective <see cref="_options"/>.
+    /// </summary>
    public Task InitializeAsync(string driverConfigJson, CancellationToken cancellationToken)
    {
        _health = new DriverHealth(DriverState.Initializing, null, null);
        try
        {
+            if (!string.IsNullOrWhiteSpace(driverConfigJson))
+            {
+                var parsed = AbCipDriverFactoryExtensions.ParseOptions(_driverInstanceId, driverConfigJson);
+                if (parsed.Devices.Count > 0 || parsed.Tags.Count > 0)
+                {
+                    _options = parsed;
+                    _alarmProjection = new AbCipAlarmProjection(this, _options.AlarmPollInterval);
+                }
+            }
+
            foreach (var device in _options.Devices)
            {
                var addr = AbCipHostAddress.TryParse(device.HostAddress)
@@ -123,7 +146,16 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            }
            foreach (var tag in _options.Tags)
            {
+                // Duplicate-key check: a collision means two configured tags have the same name.
+                // Fail fast at init time with a diagnostic rather than silently clobbering.
+                // (Driver.AbCip-005)
+                if (_tagsByName.TryGetValue(tag.Name, out var existingTag))
+                    throw new InvalidOperationException(
+                        $"AbCip tag name collision: '{tag.Name}' is declared more than once. " +
+                        $"Existing entry DeviceHostAddress='{existingTag.DeviceHostAddress}', " +
+                        $"TagPath='{existingTag.TagPath}'. Rename or remove the duplicate.");
                _tagsByName[tag.Name] = tag;
+
                if (tag.DataType == AbCipDataType.Structure && tag.Members is { Count: > 0 })
                {
                    foreach (var member in tag.Members)
@@ -135,6 +167,14 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
                            DataType: member.DataType,
                            Writable: member.Writable,
                            WriteIdempotent: member.WriteIdempotent);
+                        // Member fan-out duplicate check: a member-path collision means two
+                        // configured structure tags produce the same member path, or a member
+                        // name collides with an independently-declared tag.
+                        if (_tagsByName.TryGetValue(memberTag.Name, out var existingMember))
+                            throw new InvalidOperationException(
+                                $"AbCip tag name collision: '{memberTag.Name}' is produced by both " +
+                                $"'{tag.Name}.{member.Name}' (member fan-out) and an existing tag " +
+                                $"'{existingMember.Name}'. Rename one of the configured tags to resolve.");
                        _tagsByName[memberTag.Name] = memberTag;
                    }
                }
@@ -147,7 +187,9 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
                {
                    state.ProbeCts = new CancellationTokenSource();
                    var ct = state.ProbeCts.Token;
-                    _ = Task.Run(() => ProbeLoopAsync(state, ct), ct);
+                    // Keep the loop Task so ShutdownAsync can await its clean exit before
+                    // disposing the CTS / handles the loop is still using (Driver.AbCip-008).
+                    state.ProbeTask = Task.Run(() => ProbeLoopAsync(state, ct), ct);
                }
            }
            _health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
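The "real configuration vs. no override" rule in `InitializeAsync` can be tested on its own. This is a minimal sketch. `IsOverride` is a hypothetical helper that reads the JSON with `System.Text.Json.JsonDocument` instead of the driver's DTO. It assumes the top-level properties are spelled exactly `Devices` and `Tags`; the real serializer's naming and case-sensitivity settings are not shown in this diff.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper mirroring the override rule: blank JSON, "{}", or a document with
// no devices and no tags keeps the constructor options; anything else replaces them.
// Assumes the top-level properties are spelled exactly "Devices" and "Tags".
bool IsOverride(string? json)
{
    if (string.IsNullOrWhiteSpace(json)) return false;

    using var doc = JsonDocument.Parse(json);

    // Length of a top-level array property, or 0 when it is absent or not an array.
    int CountOf(string name) =>
        doc.RootElement.TryGetProperty(name, out var element)
        && element.ValueKind == JsonValueKind.Array
            ? element.GetArrayLength()
            : 0;

    return CountOf("Devices") > 0 || CountOf("Tags") > 0;
}

var keepsCtorOptions = !IsOverride("{}");
```

Under this rule, a JSON document that changes only a scalar setting such as `TimeoutMs` is also treated as "no override" if it declares no device or tag.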
@@ -166,15 +208,46 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        await InitializeAsync(driverConfigJson, cancellationToken).ConfigureAwait(false);
    }

+    /// <summary>
+    ///     Tear the driver down: stop the alarm projection + poll engine, then for each device
+    ///     cancel its probe loop, <em>await the loop's clean exit</em>, and only then dispose
+    ///     the probe CTS + runtime handles. Awaiting the probe Task before disposing closes the
+    ///     race where a still-running loop touches a disposed CTS or a cleared runtime
+    ///     dictionary (Driver.AbCip-008). Idempotent — safe to call twice (e.g. ShutdownAsync
+    ///     from ReinitializeAsync followed by DisposeAsync).
+    /// </summary>
    public async Task ShutdownAsync(CancellationToken cancellationToken)
    {
        await _alarmProjection.DisposeAsync().ConfigureAwait(false);
        await _poll.DisposeAsync().ConfigureAwait(false);
+
+        // Phase 1: signal every probe loop to stop.
+        foreach (var state in _devices.Values)
+        {
+            try { state.ProbeCts?.Cancel(); } catch (ObjectDisposedException) { }
+        }
+
+        // Phase 2: wait for each probe loop to observe cancellation and exit. The loop never
+        // throws on cancellation (it catches OperationCanceledException internally), but guard
+        // anyway so one slow device can't wedge the whole shutdown.
+        foreach (var state in _devices.Values)
+        {
+            var probeTask = state.ProbeTask;
+            if (probeTask is null) continue;
+            try
+            {
+                await probeTask.WaitAsync(TimeSpan.FromSeconds(10), cancellationToken).ConfigureAwait(false);
+            }
+            catch (TimeoutException) { }
+            catch (OperationCanceledException) { }
+        }
+
+        // Phase 3: now the loops are gone, dispose the CTS + native handles with no live reader.
        foreach (var state in _devices.Values)
        {
-            try { state.ProbeCts?.Cancel(); } catch { }
            state.ProbeCts?.Dispose();
            state.ProbeCts = null;
+            state.ProbeTask = null;
            state.DisposeHandles();
        }
        _devices.Clear();
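`ShutdownAsync` cancels every loop, then awaits each one, then disposes. That ordering is a general pattern for stopping background loops that share disposable state. The standalone sketch below assumes hypothetical stand-in loops: each "device" runs a `Task.Delay` loop rather than the real probe loop. It also assumes .NET 6 or later, which provides `Task.WaitAsync(TimeSpan)`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for DeviceState.ProbeCts / ProbeTask. Each loop swallows
// its own cancellation, as the driver's probe loop is described as doing.
var devices = new List<(CancellationTokenSource Cts, Task Loop)>();
for (var i = 0; i < 3; i++)
{
    var cts = new CancellationTokenSource();
    var token = cts.Token;
    var loop = Task.Run(async () =>
    {
        try
        {
            while (true) await Task.Delay(50, token); // stand-in for one probe cycle
        }
        catch (OperationCanceledException) { /* clean exit on cancellation */ }
    });
    devices.Add((cts, loop));
}

// Phase 1: signal every loop first, so they all wind down in parallel.
foreach (var d in devices) d.Cts.Cancel();

// Phase 2: wait for each loop to exit, with a bounded wait per device.
foreach (var d in devices)
{
    try { await d.Loop.WaitAsync(TimeSpan.FromSeconds(10)); }
    catch (TimeoutException) { /* a wedged loop is abandoned, not awaited forever */ }
}

// Phase 3: every loop that finished has stopped touching its token, so disposal no longer races it.
foreach (var d in devices) d.Cts.Dispose();
```

Cancelling all loops before awaiting any of them keeps the total shutdown time close to that of the slowest device, rather than the sum of all devices.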
@@ -316,7 +389,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
    /// <summary>
    ///     Read each <c>fullReference</c> in order. Unknown tags surface as
    ///     <c>BadNodeIdUnknown</c>; libplctag-layer failures map through
-    ///     <see cref="AbCipStatusMapper.MapLibplctagStatus"/>; any other exception becomes
+    ///     <see cref="AbCipStatusMapper.MapLibplctagStatus(int)"/>; any other exception becomes
    ///     <c>BadCommunicationError</c>. The driver health surface is updated per-call so the
    ///     Admin UI sees a tight feedback loop between read failures + the driver's state.
    /// </summary>
@@ -331,8 +404,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        // whole-UDT read + in-memory member decode; every other reference falls back to the
        // per-tag path that's been here since PR 3. Planner is a pure function over the
        // current tag map; BOOL/String/Structure members stay on the fallback path because
-        // declaration-only offsets can't place them under Logix alignment rules.
-        var plan = AbCipUdtReadPlanner.Build(fullReferences, _tagsByName);
+        // declaration-only offsets can't place them under Logix alignment rules. Whole-UDT
+        // grouping is itself gated behind EnableDeclarationOnlyUdtGrouping — Studio 5000 may
+        // reorder UDT members vs declaration order, so the fast path is opt-in only (see
+        // Driver.AbCip-003 / AbCipUdtMemberLayout remarks).
+        var plan = AbCipUdtReadPlanner.Build(
+            fullReferences, _tagsByName, _options.EnableDeclarationOnlyUdtGrouping);

        foreach (var group in plan.Groups)
            await ReadGroupAsync(group, results, now, cancellationToken).ConfigureAwait(false);
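A deliberately simplified planner shows how the `EnableDeclarationOnlyUdtGrouping` gate changes the read plan. The hypothetical `Plan` function below is not `AbCipUdtReadPlanner`. It treats any `Parent.Member` reference as a groupable member. The real planner instead consults the tag map and keeps BOOL, String, and Structure members on the fallback path.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified planner. Groups map a parent name to the request indices of
// its members; Fallback holds indices that get their own per-tag read.
(Dictionary<string, List<int>> Groups, List<int> Fallback) Plan(
    IReadOnlyList<string> requests, bool enableDeclarationOnlyGrouping)
{
    var groups = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
    var fallback = new List<int>();
    for (var i = 0; i < requests.Count; i++)
    {
        var dot = requests[i].LastIndexOf('.');
        // Gate off, or no parent segment: per-tag read, no declaration-order offsets used.
        if (!enableDeclarationOnlyGrouping || dot <= 0)
        {
            fallback.Add(i);
            continue;
        }
        var parent = requests[i].Substring(0, dot);
        if (!groups.TryGetValue(parent, out var members))
            groups[parent] = members = new List<int>();
        members.Add(i);
    }
    return (groups, fallback);
}

var refs = new[] { "Motor.Speed", "motor.Torque", "Pump.Flow", "Setpoint" };
var gatedOff = Plan(refs, enableDeclarationOnlyGrouping: false);
var gatedOn = Plan(refs, enableDeclarationOnlyGrouping: true);
```

With the gate off, the plan degenerates to one per-tag read per reference. That costs extra round trips, but it never decodes a member at an offset the controller may not actually use.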
@@ -351,6 +428,15 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNodeIdUnknown, null, now);
            return;
        }
+        // Driver.AbCip-005: a Structure tag whose Members are declared is a container —
+        // its bare name is readable via the whole-UDT grouping path (ReadGroupAsync), not the
+        // per-tag path. Reading it here returns BadNotSupported rather than Good/null so the
+        // caller knows to address individual member paths (e.g. "Motor.Speed").
+        if (def.DataType == AbCipDataType.Structure && def.Members is { Count: > 0 })
+        {
+            results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNotSupported, null, now);
+            return;
+        }
        if (!_devices.TryGetValue(def.DeviceHostAddress, out var device))
        {
            results[fb.OriginalIndex] = new DataValueSnapshot(null, AbCipStatusMapper.BadNodeIdUnknown, null, now);
@@ -365,6 +451,11 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            var status = runtime.GetStatus();
            if (status != 0)
            {
+                // Evict the stale handle so the next call re-creates it (Driver.AbCip-010).
+                // A non-zero status can mean the controller dropped the connection or the tag
+                // handle became permanently invalid (e.g. after a PLC download). Evicting
+                // mirrors the probe loop's recreate-on-failure behaviour.
+                EvictRuntime(device, def.Name);
                results[fb.OriginalIndex] = new DataValueSnapshot(null,
                    AbCipStatusMapper.MapLibplctagStatus(status), null, now);
                _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead,
@@ -384,6 +475,8 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        }
        catch (Exception ex)
        {
+            // Transport exception — evict so the next read creates a fresh handle.
+            EvictRuntime(device, def.Name);
            results[fb.OriginalIndex] = new DataValueSnapshot(null,
                AbCipStatusMapper.BadCommunicationError, null, now);
            _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
@@ -416,6 +509,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            var status = runtime.GetStatus();
            if (status != 0)
            {
+                EvictRuntime(device, parent.Name); // Driver.AbCip-010
                var mapped = AbCipStatusMapper.MapLibplctagStatus(status);
                StampGroupStatus(group, results, now, mapped);
                _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead,
@@ -436,6 +530,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        }
        catch (Exception ex)
        {
+            EvictRuntime(device, parent.Name); // Driver.AbCip-010
            StampGroupStatus(group, results, now, AbCipStatusMapper.BadCommunicationError);
            _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
        }
@@ -506,10 +601,16 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
                await runtime.WriteAsync(cancellationToken).ConfigureAwait(false);

                var status = runtime.GetStatus();
-                results[i] = new WriteResult(status == 0
-                    ? AbCipStatusMapper.Good
-                    : AbCipStatusMapper.MapLibplctagStatus(status));
-                if (status == 0) _health = new DriverHealth(DriverState.Healthy, now, null);
+                if (status != 0)
+                {
+                    EvictRuntime(device, def.Name); // Driver.AbCip-010
+                    results[i] = new WriteResult(AbCipStatusMapper.MapLibplctagStatus(status));
+                }
+                else
+                {
+                    results[i] = new WriteResult(AbCipStatusMapper.Good);
+                    _health = new DriverHealth(DriverState.Healthy, now, null);
+                }
            }
            catch (OperationCanceledException)
            {
@@ -517,11 +618,13 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            }
            catch (NotSupportedException nse)
            {
+                // Type/protocol error — not a transport fault; don't evict the handle.
                results[i] = new WriteResult(AbCipStatusMapper.BadNotSupported);
                _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, nse.Message);
            }
            catch (FormatException fe)
            {
+                // Value conversion error — not a transport fault; don't evict.
                results[i] = new WriteResult(AbCipStatusMapper.BadTypeMismatch);
                _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, fe.Message);
            }
@@ -537,6 +640,8 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            }
            catch (Exception ex)
            {
+                // Transport / wire error — evict so the next write creates a fresh handle.
+                EvictRuntime(device, def.Name); // Driver.AbCip-010
                results[i] = new WriteResult(AbCipStatusMapper.BadCommunicationError);
                _health = new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ex.Message);
            }
@@ -609,8 +714,12 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            runtime.Dispose();
            throw;
        }
-        device.ParentRuntimes[parentTagName] = runtime;
-        return runtime;
+        // Two concurrent callers can both miss the cache + both initialize a runtime; only the
+        // first TryAdd wins. Dispose the loser so it doesn't leak a native tag handle.
+        if (device.ParentRuntimes.TryAdd(parentTagName, runtime))
+            return runtime;
+        runtime.Dispose();
+        return device.ParentRuntimes[parentTagName];
    }

    /// <summary>
@@ -643,8 +752,27 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
            runtime.Dispose();
            throw;
        }
-        device.Runtimes[def.Name] = runtime;
-        return runtime;
+        // Two concurrent callers can both miss the cache + both initialize a runtime; only the
+        // first TryAdd wins. Dispose the loser so it doesn't leak a native tag handle.
+        if (device.Runtimes.TryAdd(def.Name, runtime))
+            return runtime;
+        runtime.Dispose();
+        return device.Runtimes[def.Name];
+    }
+
+    /// <summary>
+    ///     Evict the runtime for <paramref name="tagName"/> from the device's cache and dispose
+    ///     it so the next read/write call re-creates and re-initializes a fresh handle.
+    ///     Called from <see cref="ReadSingleAsync"/>, <see cref="ReadGroupAsync"/>, and
+    ///     <see cref="WriteAsync"/> after a non-zero libplctag status or transport exception —
+    ///     mirroring the probe loop's recreate-on-failure behaviour (Driver.AbCip-010).
+    /// </summary>
+    private static void EvictRuntime(DeviceState device, string tagName)
+    {
+        if (device.Runtimes.TryRemove(tagName, out var stale))
+        {
+            try { stale.Dispose(); } catch { }
+        }
    }

    public DriverHealth GetHealth() => _health;
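Together, the publish-or-dispose step and `EvictRuntime` form a small cache discipline. In the sketch below, a `MemoryStream` stands in for a native tag handle, because it is `IDisposable` and reports `CanRead == false` once disposed. `Publish` and `Evict` are illustrative names. The diff's version reads the winner back with the indexer, which could throw `KeyNotFoundException` if the winner is evicted between the failed `TryAdd` and the lookup. This sketch retries in that case instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical stand-in for Runtimes: MemoryStream plays the native tag handle.
var runtimes = new ConcurrentDictionary<string, MemoryStream>(StringComparer.OrdinalIgnoreCase);

// Publish a freshly created handle. The first TryAdd wins; a loser disposes its own
// handle and adopts the cached winner so no duplicate native handle leaks.
MemoryStream Publish(string tagName, MemoryStream created)
{
    while (true)
    {
        if (runtimes.TryAdd(tagName, created))
            return created;
        if (runtimes.TryGetValue(tagName, out var winner))
        {
            created.Dispose();
            return winner;
        }
        // The winner was evicted between the two calls; try to publish again.
    }
}

// Evict after a failed status or transport exception: remove, then dispose, so the
// next caller builds a fresh handle instead of reusing a dead one.
void Evict(string tagName)
{
    if (runtimes.TryRemove(tagName, out var stale))
        stale.Dispose();
}

var first = Publish("Motor.Speed", new MemoryStream());
var duplicate = new MemoryStream();
var adopted = Publish("motor.speed", duplicate); // same key under OrdinalIgnoreCase
var winnerAliveBeforeEvict = first.CanRead;
Evict("MOTOR.SPEED");
```

Removing the entry before disposing it means a concurrent caller sees either a live handle or an empty slot. It never sees a handle that has been disposed but is still cached.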
@@ -785,8 +913,10 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,

    /// <summary>
    ///     Per-device runtime state. Holds the parsed host address, family profile, and the
-    ///     live <see cref="PlcTagHandle"/> cache keyed by tag path. PRs 3–8 populate + consume
-    ///     this dict via libplctag.
+    ///     live libplctag.NET <see cref="IAbCipTagRuntime"/> instances keyed by tag name.
+    ///     Native tag lifetime is owned by the <c>Tag.Dispose()</c> inside each
+    ///     <see cref="LibplctagTagRuntime"/>; libplctag.NET's own finalizer covers GC-collected
+    ///     instances so no separate SafeHandle wrapper is needed here (Driver.AbCip-006).
    /// </summary>
    internal sealed class DeviceState(
        AbCipHostAddress parsedAddress,
@@ -803,14 +933,23 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        public CancellationTokenSource? ProbeCts { get; set; }
        public bool ProbeInitialized { get; set; }

-        public Dictionary<string, PlcTagHandle> TagHandles { get; } =
-            new(StringComparer.OrdinalIgnoreCase);
+        /// <summary>
+        ///     The background probe loop's <see cref="Task"/>. Stored so
+        ///     <see cref="AbCipDriver.ShutdownAsync"/> can await the loop's clean exit after
+        ///     cancelling <see cref="ProbeCts"/> and BEFORE disposing the CTS or the runtime
+        ///     handles — otherwise the still-running loop can touch a disposed CTS or a cleared
+        ///     runtime dictionary (Driver.AbCip-008).
+        /// </summary>
+        public Task? ProbeTask { get; set; }

        /// <summary>
        ///     Per-tag runtime handles owned by this device. One entry per configured tag is
        ///     created lazily on first read (see <see cref="AbCipDriver.EnsureTagRuntimeAsync"/>).
+        ///     Backed by a <see cref="System.Collections.Concurrent.ConcurrentDictionary{TKey,TValue}"/>
+        ///     because <c>ReadAsync</c> is invoked concurrently by the server read path, every
+        ///     polled subscription loop, and the alarm projection loop.
        /// </summary>
-        public Dictionary<string, IAbCipTagRuntime> Runtimes { get; } =
+        public System.Collections.Concurrent.ConcurrentDictionary<string, IAbCipTagRuntime> Runtimes { get; } =
            new(StringComparer.OrdinalIgnoreCase);

        /// <summary>
@@ -819,7 +958,7 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,
        ///     bit-selector tag name ("Motor.Flags.3") needs a distinct handle from the DINT
        ///     parent ("Motor.Flags") used to do the read + write.
        /// </summary>
-        public Dictionary<string, IAbCipTagRuntime> ParentRuntimes { get; } =
+        public System.Collections.Concurrent.ConcurrentDictionary<string, IAbCipTagRuntime> ParentRuntimes { get; } =
            new(StringComparer.OrdinalIgnoreCase);

        private readonly System.Collections.Concurrent.ConcurrentDictionary<string, SemaphoreSlim> _rmwLocks = new();
@@ -829,8 +968,6 @@ public sealed class AbCipDriver : IDriver, IReadable, IWritable, ITagDiscovery,

        public void DisposeHandles()
        {
-            foreach (var h in TagHandles.Values) h.Dispose();
-            TagHandles.Clear();
            foreach (var r in Runtimes.Values) r.Dispose();
            Runtimes.Clear();
            foreach (var r in ParentRuntimes.Values) r.Dispose();
@@ -21,6 +21,20 @@ public static class AbCipDriverFactoryExtensions
    }

    internal static AbCipDriver CreateInstance(string driverInstanceId, string driverConfigJson)
+    {
+        ArgumentException.ThrowIfNullOrWhiteSpace(driverInstanceId);
+        var options = ParseOptions(driverInstanceId, driverConfigJson);
+        return new AbCipDriver(options, driverInstanceId);
+    }
+
+    /// <summary>
+    ///     Deserialise an AB CIP driver-config JSON document into <see cref="AbCipDriverOptions"/>.
+    ///     Shared by <see cref="CreateInstance"/> (first construction) and
+    ///     <see cref="AbCipDriver.InitializeAsync"/> / <see cref="AbCipDriver.ReinitializeAsync"/>
+    ///     so a reinitialize with a changed config JSON (new device, new tag, changed timeout)
+    ///     actually takes effect rather than being silently discarded.
+    /// </summary>
+    internal static AbCipDriverOptions ParseOptions(string driverInstanceId, string driverConfigJson)
    {
        ArgumentException.ThrowIfNullOrWhiteSpace(driverInstanceId);
        ArgumentException.ThrowIfNullOrWhiteSpace(driverConfigJson);
@@ -29,7 +43,7 @@ public static class AbCipDriverFactoryExtensions
            ?? throw new InvalidOperationException(
                $"AB CIP driver config for '{driverInstanceId}' deserialised to null");

-        var options = new AbCipDriverOptions
+        return new AbCipDriverOptions
        {
            Devices = dto.Devices is { Count: > 0 }
                ? [.. dto.Devices.Select(d => new AbCipDeviceOptions(
@@ -53,9 +67,8 @@ public static class AbCipDriverFactoryExtensions
            EnableControllerBrowse = dto.EnableControllerBrowse ?? false,
            EnableAlarmProjection = dto.EnableAlarmProjection ?? false,
            AlarmPollInterval = TimeSpan.FromMilliseconds(dto.AlarmPollIntervalMs ?? 1_000),
+            EnableDeclarationOnlyUdtGrouping = dto.EnableDeclarationOnlyUdtGrouping ?? false,
        };
-
-        return new AbCipDriver(options, driverInstanceId);
    }

    private static AbCipTagDefinition BuildTag(AbCipTagDto t, string driverInstanceId) =>
@@ -108,6 +121,7 @@ public static class AbCipDriverFactoryExtensions
        public int? TimeoutMs { get; init; }
        public bool? EnableControllerBrowse { get; init; }
        public bool? EnableAlarmProjection { get; init; }
+        public bool? EnableDeclarationOnlyUdtGrouping { get; init; }
        public int? AlarmPollIntervalMs { get; init; }
        public List<AbCipDeviceDto>? Devices { get; init; }
        public List<AbCipTagDto>? Tags { get; init; }
@@ -56,6 +56,20 @@ public sealed class AbCipDriverOptions
    ///     1 second — matches typical SCADA alarm-refresh conventions.
    /// </summary>
    public TimeSpan AlarmPollInterval { get; init; } = TimeSpan.FromSeconds(1);
+
+    /// <summary>
+    ///     Opt-in for the declaration-only whole-UDT read fast path. When <c>false</c> (the
+    ///     default) a batch of UDT members is always read per-member, because the byte offsets
+    ///     computed by <see cref="AbCipUdtMemberLayout"/> assume the controller lays members
+    ///     out in declaration order — and the Studio 5000 compiler does NOT guarantee that
+    ///     (it reorders for largest-first packing, BOOL host bytes, nested-struct padding).
+    ///     Decoding at declaration-order offsets against a reordered controller layout yields
+    ///     plausible-looking but wrong values with no error. Set <c>true</c> only when the operator has
+    ///     hand-verified that every configured UDT's member declaration order matches the
+    ///     controller's compiled layout; in that case whole-UDT grouping collapses N member
+    ///     reads into one. The richer CIP Template Object path remains the long-term fix.
+    /// </summary>
+    public bool EnableDeclarationOnlyUdtGrouping { get; init; }
 }

 /// <summary>
@@ -1,3 +1,5 @@
+using libplctag;
+
 namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;

 /// <summary>
@@ -24,8 +26,10 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
 ///             writes during download / test-mode transitions).</item>
 ///         <item>0x16 object does not exist — <c>BadNodeIdUnknown</c>.</item>
 ///         <item>0x1E embedded service error — unwrap to the extended status when possible.</item>
-///         <item>any libplctag <c>PLCTAG_STATUS_*</c> below zero — wrapped as
-///             <c>BadCommunicationError</c> until fine-grained mapping lands (PR 3).</item>
+///         <item>libplctag.NET <see cref="Status"/> errors — mapped per-member by
+///             <see cref="MapLibplctagStatus(Status)"/>: timeout, not-found, not-allowed, and
+///             out-of-bounds get their specific OPC UA codes; the remaining transport errors
+///             fold into <c>BadCommunicationError</c>.</item>
 ///     </list>
 /// </remarks>
 public static class AbCipStatusMapper
@@ -58,22 +62,34 @@ public static class AbCipStatusMapper
    };

    /// <summary>
-    ///     Map a libplctag return/status code (<c>PLCTAG_STATUS_*</c>) to an OPC UA StatusCode.
-    ///     libplctag uses <c>0 = PLCTAG_STATUS_OK</c>, positive values for pending, negative
-    ///     values for errors.
+    ///     Map a libplctag return/status code to an OPC UA StatusCode. The integer passed here
+    ///     is <c>(int)Tag.GetStatus()</c> — i.e. the underlying value of the libplctag.NET
+    ///     <see cref="Status"/> enum, NOT a raw native <c>PLCTAG_ERR_*</c> constant. The wrapper
+    ///     renumbers the native codes into a contiguous enum, so this method switches on the
+    ///     <see cref="Status"/> members directly to stay correct if the wrapper renumbers again.
+    ///     <see cref="Status.Ok"/> is success; <see cref="Status.Pending"/> is an in-flight
+    ///     operation; every other (negative) member is an error.
    /// </summary>
-    public static uint MapLibplctagStatus(int status)
+    public static uint MapLibplctagStatus(int status) => MapLibplctagStatus((Status)status);
+
+    /// <summary>
+    ///     Map a libplctag.NET <see cref="Status"/> enum value to an OPC UA StatusCode. This is
+    ///     the strongly-typed core of the mapper; the <c>int</c> overload exists only for the
+    ///     <see cref="IAbCipTagRuntime.GetStatus"/> seam, which returns the enum's underlying <c>int</c> value.
+    /// </summary>
+    public static uint MapLibplctagStatus(Status status) => status switch
    {
-        if (status == 0) return Good;
-        if (status > 0) return GoodMoreData; // PLCTAG_STATUS_PENDING
-        return status switch
-        {
-            -5 => BadTimeout,              // PLCTAG_ERR_TIMEOUT
-            -7 => BadCommunicationError,   // PLCTAG_ERR_BAD_CONNECTION
-            -14 => BadNodeIdUnknown,       // PLCTAG_ERR_NOT_FOUND
-            -16 => BadNotWritable,         // PLCTAG_ERR_NOT_ALLOWED / read-only tag
-            -17 => BadOutOfRange,          // PLCTAG_ERR_OUT_OF_BOUNDS
-            _ => BadCommunicationError,
-        };
-    }
+        Status.Ok => Good,
+        Status.Pending => GoodMoreData,
+        Status.ErrorTimeout => BadTimeout,
+        Status.ErrorNotFound or Status.ErrorNoMatch or Status.ErrorBadDevice => BadNodeIdUnknown,
+        Status.ErrorNotAllowed => BadNotWritable,
+        Status.ErrorOutOfBounds or Status.ErrorTooLarge or Status.ErrorTooSmall => BadOutOfRange,
+        Status.ErrorUnsupported or Status.ErrorNotImplemented => BadNotSupported,
+        Status.ErrorBadConnection or Status.ErrorBadGateway or Status.ErrorBadReply
+            or Status.ErrorWinsock or Status.ErrorOpen or Status.ErrorClose
+            or Status.ErrorRead or Status.ErrorWrite or Status.ErrorRemoteErr
+            or Status.ErrorPartial or Status.ErrorAbort => BadCommunicationError,
+        _ => BadCommunicationError,
+    };
 }
@@ -8,17 +8,27 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.AbCip;
 ///     list that <see cref="AbCipDriver.ReadAsync"/> runs through its existing read path.
 ///     Pure function — the planner never touches the runtime + never reads the PLC.
 /// </summary>
+/// <remarks>
+///     The grouped offsets come from <see cref="AbCipUdtMemberLayout"/>, which assumes the
+///     controller lays members out in declaration order. Studio 5000 does not guarantee that,
+///     so grouping is gated behind <see cref="AbCipDriverOptions.EnableDeclarationOnlyUdtGrouping"/>:
+///     when grouping is disabled every member falls back to its own per-tag read.
+/// </remarks>
 public static class AbCipUdtReadPlanner
 {
    /// <summary>
    ///     Split <paramref name="requests"/> into whole-UDT groups + per-tag leftovers.
    ///     <paramref name="tagsByName"/> is the driver's <c>_tagsByName</c> map — both parent
    ///     UDT rows and their fanned-out member rows live there. Lookup is OrdinalIgnoreCase
-    ///     to match the driver's dictionary semantics.
+    ///     to match the driver's dictionary semantics. When
+    ///     <paramref name="enableDeclarationOnlyGrouping"/> is <c>false</c> no groups are
+    ///     formed — every reference goes to the per-tag fallback path so member decoding never
+    ///     relies on declaration-order offsets that may not match the controller layout.
    /// </summary>
    public static AbCipUdtReadPlan Build(
        IReadOnlyList<string> requests,
-        IReadOnlyDictionary<string, AbCipTagDefinition> tagsByName)
+        IReadOnlyDictionary<string, AbCipTagDefinition> tagsByName,
+        bool enableDeclarationOnlyGrouping = false)
    {
        ArgumentNullException.ThrowIfNull(requests);
        ArgumentNullException.ThrowIfNull(tagsByName);
@@ -26,6 +36,13 @@ public static class AbCipUdtReadPlanner
        var fallback = new List<AbCipUdtReadFallback>(requests.Count);
        var byParent = new Dictionary<string, List<AbCipUdtReadMember>>(StringComparer.OrdinalIgnoreCase);

+        if (!enableDeclarationOnlyGrouping)
+        {
+            for (var i = 0; i < requests.Count; i++)
+                fallback.Add(new AbCipUdtReadFallback(i, requests[i]));
+            return new AbCipUdtReadPlan([], fallback);
+        }
+
        for (var i = 0; i < requests.Count; i++)
        {
            var name = requests[i];