docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category checklist in REVIEW-PROCESS.md. Each module gets its own findings.md; code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally; a HistoryRead on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView) plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts; Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt payload misapplies outcomes, causing silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md section 4. Review only; no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
207
code-reviews/Admin/findings.md
Normal file
@@ -0,0 +1,207 @@
# Code Review — Admin

| Field | Value |
|---|---|
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Admin-005 |
| 2 | OtOpcUa conventions | Admin-010 |
| 3 | Concurrency & thread safety | Admin-011 |
| 4 | Error handling & resilience | Admin-008 |
| 5 | Security | Admin-001, Admin-002, Admin-003, Admin-004, Admin-006 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Admin-007, Admin-012 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Admin-009 |
| 10 | Documentation & comments | Admin-012 |

## Findings

### Admin-001

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `Components/Routes.razor:4-11`, `Program.cs:150` |
| Status | Open |

**Description:** The router uses a plain `RouteView` (not `AuthorizeRouteView`), and `MapRazorComponents<App>()` is registered without `.RequireAuthorization()`. A page-level `[Authorize]` attribute on a routable Razor component is only enforced when the router is `AuthorizeRouteView`; with `RouteView` the attribute is inert. Consequently every page in the app, including those that carry `@attribute [Authorize]` (`ClusterDetail`, `DraftEditor`, `Reservations`, `RoleGrants`, `Certificates`, `VirtualTags`, `ScriptedAlarms`, `ScriptLog`, `DiffViewer`, `ImportEquipment`, `Account`), is reachable by a fully unauthenticated user. There is no authentication gate anywhere in the pipeline. An anonymous browser can read the full fleet configuration, audit log, certificates and ACLs, and exercise mutating pages (see Admin-002).

**Recommendation:** Replace `RouteView` with `AuthorizeRouteView` in `Routes.razor` (with a `<NotAuthorized>` slot that redirects to `/login`), or call `.RequireAuthorization()` on the `MapRazorComponents` endpoint with `/login` and `/auth/*` explicitly allowed anonymous. Add a fallback policy (`AddAuthorizationBuilder().SetFallbackPolicy(...)`) so new pages are secure-by-default. Re-verify every page after the gate is in place.

**Resolution:** _(open)_
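
The gate recommended for Admin-001 could be sketched roughly as below. This is a hedged fragment, not the module's actual `Program.cs`: it assumes a .NET 8 Blazor Web App built with the Web SDK's implicit usings, cookie authentication, and that `App` is the project's root component. The scheme choice and middleware order are assumptions.

```csharp
// Program.cs (sketch): secure-by-default authorization gate.
// Assumes Web SDK implicit usings; App is the project's root component (not defined here).
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authorization;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie(options => options.LoginPath = "/login"); // login path named in the finding

// Fallback policy: endpoints that declare no authorization metadata
// still require an authenticated user.
builder.Services.AddAuthorizationBuilder()
    .SetFallbackPolicy(new AuthorizationPolicyBuilder()
        .RequireAuthenticatedUser()
        .Build());

builder.Services.AddRazorComponents().AddInteractiveServerComponents();

var app = builder.Build();

app.UseAuthentication();
app.UseAuthorization();
app.UseAntiforgery();

// Endpoint-level gate for every routable component.
app.MapRazorComponents<App>()
    .AddInteractiveServerRenderMode()
    .RequireAuthorization();

app.Run();
```

With a gate like this in place, the login page and the `/auth/*` endpoints would need an explicit anonymous opt-out (`[AllowAnonymous]` on the page, `.AllowAnonymous()` on the endpoints), otherwise unauthenticated users cannot reach the sign-in form.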

### Admin-002

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `Components/Pages/Clusters/NewCluster.razor:1-7`, `Home.razor`, `Fleet.razor`, `Hosts.razor`, `AlarmsHistorian.razor`, `Clusters/ClustersList.razor`, `Clusters/Generations.razor`, `Drivers/FocasDetail.razor` |
| Status | Open |

**Description:** Several routable pages carry no authorization attribute at all. Most critically `NewCluster` (`/clusters/new`) is a mutating page: its `CreateAsync` writes a new `ServerCluster` row and a draft generation. Combined with Admin-001 (the router does not enforce `[Authorize]` either), an unauthenticated user can create clusters and seed config-DB rows. `Home`, `Fleet`, `Hosts`, `AlarmsHistorian`, `ClustersList`, `Generations` and `FocasDetail` likewise expose fleet topology, host status, historian diagnostics and generation history to anonymous callers.

**Recommendation:** Add `@attribute [Authorize(...)]` to every routable page with the role/policy appropriate to its function (`NewCluster` and other write surfaces -> `CanPublish`/`CanEdit`; read pages -> an authenticated-user policy). A solution-wide fallback policy (see Admin-001) is the durable fix; per-page attributes remain the explicit declaration of intent.

**Resolution:** _(open)_

### Admin-003

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `Program.cs:137-139`, `Hubs/FleetStatusHub.cs:11`, `Hubs/AlertHub.cs:10`, `Hubs/ScriptLogHub.cs:30` |
| Status | Open |

**Description:** All three SignalR hubs (`/hubs/fleet`, `/hubs/alerts`, `/hubs/script-log`) are mapped with no `[Authorize]` attribute and no `.RequireAuthorization()` on the `MapHub` call. Any unauthenticated client can open a hub connection: `FleetStatusHub.SubscribeFleet()` streams every node generation/role/resilience state, `AlertHub` pushes all fleet alerts (including failure detail text), and `ScriptLogHub.TailLogAsync` streams the contents of the server `scripts-*.log` files. This is an unauthenticated information-disclosure channel that bypasses the (already broken; see Admin-001) page auth entirely.

**Recommendation:** Add `[Authorize]` to each `Hub` class, or chain `.RequireAuthorization()` onto each `MapHub(...)` call in `Program.cs`. The hub `SubscribeCluster`/`TailLogAsync` methods should additionally validate that the caller's claims permit the requested cluster/script scope.

**Resolution:** _(open)_
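
A minimal sketch of both options from the Admin-003 recommendation, plus a per-call scope check. The hub below is a stand-in for the real `FleetStatusHub`; the `"cluster"` claim type, the group naming, and the cookie scheme are hypothetical and chosen only for illustration. It assumes the Web SDK's implicit usings.

```csharp
// Program.cs (sketch): require authentication on a SignalR hub.
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.SignalR;

var builder = WebApplication.CreateBuilder(args);
builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie();
builder.Services.AddAuthorization();
builder.Services.AddSignalR();

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();

// Option 2: endpoint convention on the MapHub call.
app.MapHub<FleetStatusHub>("/hubs/fleet").RequireAuthorization();

app.Run();

// Option 1: attribute on the hub class itself (stand-in for the real hub).
[Authorize]
public sealed class FleetStatusHub : Hub
{
    public async Task SubscribeCluster(string clusterId)
    {
        // Hypothetical scope check: the caller must hold a claim for this cluster.
        if (Context.User?.HasClaim("cluster", clusterId) != true)
        {
            throw new HubException("Not permitted for this cluster.");
        }

        await Groups.AddToGroupAsync(Context.ConnectionId, $"cluster:{clusterId}");
    }
}
```

Either option alone closes the anonymous connection; applying both is redundant but harmless, and the attribute keeps the requirement visible next to the hub code.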

### Admin-004

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `appsettings.json:3,13-14` |
| Status | Open |

**Description:** The checked-in `appsettings.json` contains live-looking secrets in plaintext: the `ConfigDb` connection string with `User Id=sa;Password=OtOpcUaDev_2026!` and the LDAP `ServiceAccountPassword: "serviceaccount123"`. It also sets `Encrypt=False` and `AllowInsecureLdap: true`, so the SQL and LDAP credentials travel unencrypted on the wire. Committing the `sa` account password and a service-account password to source control is a credential-exposure risk; `sa` additionally grants full server control, conflicting with the `ClusterService` doc comment that production should connect with a least-privilege grant.

**Recommendation:** Move all secrets out of the committed file: use user-secrets for dev and environment variables / a secret store for production; leave only non-secret placeholders in `appsettings.json`. Use a least-privilege SQL login rather than `sa`. Enable TLS for both SQL (`Encrypt=True`) and LDAP (`UseTls=true`, `AllowInsecureLdap=false`) for any non-loopback deployment, and document the dev-only exception.

**Resolution:** _(open)_

### Admin-005

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `Components/Pages/Login.razor:15,107-110` |
| Status | Open |

**Description:** `Login.razor` is an interactive component (the project default render mode is interactive server; the page declares no `@rendermode` but uses `EditForm`/`InputText` interactive binding and runs `SignInAsync` from an event handler). It calls `HttpContext.SignInAsync(...)` followed by `ctx.Response.Redirect("/")` from within a SignalR circuit callback. Writing auth cookies and HTTP redirect headers requires a live, unstarted HTTP response; in an interactive circuit the original HTTP response has long completed, so the cookie is typically not emitted and the redirect is ineffective (or throws "response has already started"). `admin-ui.md` section "Operator authentication" explicitly specifies the login as a static server-rendered HTML form POSTing to a `/auth/login` minimal-API endpoint with `data-enhance="false"`; that endpoint is not implemented and is not mapped in `Program.cs`.

**Recommendation:** Implement the login as designed: a static-rendered form (`@rendermode` none, `data-enhance="false"`) posting to a `MapPost("/auth/login", ...)` minimal-API handler that does the LDAP bind, grant resolution, `SignInAsync` and redirect while the HTTP response is still owned by the endpoint. Do not perform `SignInAsync` from an interactive circuit.

**Resolution:** _(open)_
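
The endpoint shape described in the Admin-005 recommendation might look like the sketch below. `IOperatorAuthenticator` and `RejectAllAuthenticator` are hypothetical placeholders for the real LDAP bind and grant resolution, the `/login?error=1` redirect target is invented for illustration, and the Web SDK's implicit usings are assumed.

```csharp
// Program.cs (sketch): sign in from an HTTP endpoint, not from an interactive circuit.
using System.Security.Claims;
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Mvc;

var builder = WebApplication.CreateBuilder(args);
builder.Services
    .AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
    .AddCookie(options => options.LoginPath = "/login");
builder.Services.AddAuthorization();
builder.Services.AddAntiforgery();
builder.Services.AddSingleton<IOperatorAuthenticator, RejectAllAuthenticator>();

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();
app.UseAntiforgery(); // form-bound endpoints expect an antiforgery token in the POST

app.MapPost("/auth/login", async (
    HttpContext ctx,
    IOperatorAuthenticator authenticator,
    [FromForm] string username,
    [FromForm] string password) =>
{
    if (!await authenticator.ValidateAsync(username, password))
    {
        return Results.LocalRedirect("/login?error=1");
    }

    var identity = new ClaimsIdentity(
        new[] { new Claim(ClaimTypes.Name, username) },
        CookieAuthenticationDefaults.AuthenticationScheme);

    // The endpoint still owns the HTTP response here, so the cookie can be written.
    await ctx.SignInAsync(
        CookieAuthenticationDefaults.AuthenticationScheme,
        new ClaimsPrincipal(identity));

    return Results.LocalRedirect("/");
}).AllowAnonymous();

app.Run();

// Hypothetical abstraction over the LDAP bind; not part of the reviewed module.
public interface IOperatorAuthenticator
{
    Task<bool> ValidateAsync(string username, string password);
}

// Placeholder implementation so the sketch starts; it rejects every login.
public sealed class RejectAllAuthenticator : IOperatorAuthenticator
{
    public Task<bool> ValidateAsync(string username, string password) => Task.FromResult(false);
}
```

Because the handler binds form fields, the static login form would need to carry an antiforgery token, which ties this change to Admin-006.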

### Admin-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` |
| Status | Open |

**Description:** `app.UseAntiforgery()` is enabled, but the Sign-out form (`<form method="post" action="/auth/logout">`) renders no antiforgery token, and the `MapPost("/auth/logout", ...)` endpoint does not call `.DisableAntiforgery()` or otherwise opt out. Depending on framework version this either makes logout fail with a 400 for legitimate users, or, if the endpoint is treated as exempt, leaves logout as an unprotected state-changing POST (CSRF logout). The same concern applies to the login form once Admin-005 is addressed.

**Recommendation:** Emit an antiforgery token in the logout form and let `UseAntiforgery()` validate it; or explicitly and deliberately mark the endpoint `.DisableAntiforgery()` if a tokenless logout is intended. Verify login/logout round-trips after the change.

**Resolution:** _(open)_

### Admin-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `Components/Pages/Clusters/NewCluster.razor:91,95-96` |
| Status | Open |

**Description:** `NewCluster.CreateAsync` hardcodes `CreatedBy = "admin-ui"` (both on the `ServerCluster` row and the draft generation) instead of the signed-in operator principal name. `admin-ui.md` section "Audit" requires "the operator principal" be recorded on every write. The audit trail therefore cannot attribute cluster creation to a person. The same literal would apply to any anonymous creation that Admin-001/002 currently permit.

**Recommendation:** Pass the authenticated user identity (`ClaimTypes.Name` / `NameIdentifier` from the cascaded `AuthenticationState`) as `createdBy`. Apply the same pattern to every other Admin write path that records a `CreatedBy`/`PublishedBy`/`ReleasedBy` field.

**Resolution:** _(open)_
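
As a rough illustration of the Admin-007 recommendation, a small helper could turn the signed-in principal into the audit name. `OperatorIdentity` is a hypothetical helper, not an existing type in the module; in a component, the `ClaimsPrincipal` would come from the cascaded `AuthenticationState`.

```csharp
using System;
using System.Security.Claims;

// Hypothetical helper: resolves the audit name for a write operation.
public static class OperatorIdentity
{
    public static string Resolve(ClaimsPrincipal user)
    {
        // Refuse to attribute a write to an anonymous caller.
        if (user.Identity?.IsAuthenticated != true)
        {
            throw new InvalidOperationException("An authenticated operator is required.");
        }

        // Prefer the display name, fall back to the stable identifier.
        return user.FindFirst(ClaimTypes.Name)?.Value
            ?? user.FindFirst(ClaimTypes.NameIdentifier)?.Value
            ?? throw new InvalidOperationException("The principal carries no name claim.");
    }
}
```

Throwing for an unauthenticated principal, rather than falling back to a literal such as `"admin-ui"`, keeps anonymous writes from silently entering the audit trail.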

### Admin-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Services/ReservationService.cs:28-37` |
| Status | Open |

**Description:** `ReservationService.ReleaseAsync` calls `sp_ReleaseExternalIdReservation` with only `@Kind`, `@Value`, `@ReleaseReason`. `admin-ui.md` section "Release an external-ID reservation" specifies the proc sets `ReleasedBy` to the FleetAdmin who performed the release, and the action is the only path that allows ZTag/SAPID reuse and "requires explicit FleetAdmin action with a documented reason." The service does not capture or pass the operator principal, so the compliance audit trail for a release records no actor (unless the proc derives it from the DB session login, which would be the shared service account, not the operator).

**Recommendation:** Add an operator-principal parameter to `ReleaseAsync`, pass it to the stored proc as `@ReleasedBy`, and have callers supply the signed-in user. Confirm the proc signature accepts it.

**Resolution:** _(open)_
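
One possible shape for the Admin-008 change is sketched below using plain ADO.NET. This is an assumption-heavy illustration: the real service may call the proc through EF Core or another data layer, `ReservationReleaseSketch` is a hypothetical class, and whether the proc accepts `@ReleasedBy` still has to be confirmed as the recommendation notes.

```csharp
using System;
using System.Data;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Hypothetical stand-in for ReservationService; only the release path is shown.
public sealed class ReservationReleaseSketch(string connectionString)
{
    public async Task ReleaseAsync(
        string kind,
        string value,
        string releaseReason,
        string releasedBy, // signed-in operator, supplied by the caller
        CancellationToken ct = default)
    {
        // A release without an actor must fail rather than write an unattributed row.
        ArgumentException.ThrowIfNullOrWhiteSpace(releasedBy);

        await using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync(ct);

        await using var command = new SqlCommand("sp_ReleaseExternalIdReservation", connection)
        {
            CommandType = CommandType.StoredProcedure,
        };
        command.Parameters.AddWithValue("@Kind", kind);
        command.Parameters.AddWithValue("@Value", value);
        command.Parameters.AddWithValue("@ReleaseReason", releaseReason);
        command.Parameters.AddWithValue("@ReleasedBy", releasedBy); // assumed proc parameter

        await command.ExecuteNonQueryAsync(ct);
    }
}
```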

### Admin-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) |
| Status | Open |

**Description:** The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of the login -> cookie issuance round-trip (Admin-005), and the `AdminRoleGrantResolver` / `ClusterRoleClaims` authorization logic is exercised only in isolation. `InternalsVisibleTo` points at `ZB.MOM.WW.OtOpcUa.Admin.Tests`, but the auth pipeline itself is not asserted end-to-end. Per `REVIEW-PROCESS.md` category 9 these are untested critical paths.

**Recommendation:** Add `WebApplicationFactory`-based integration tests asserting: (a) anonymous GET of each protected route returns 302->/login or 401; (b) anonymous hub connect is refused; (c) a valid login issues the cookie and a subsequent request is authorized; (d) a `ConfigViewer` is denied `CanPublish` pages. Wire the check into the `*.Admin.Tests` suite.

**Resolution:** _(open)_
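
Items (a) and (b) of the Admin-009 recommendation could start from a sketch like this. It assumes xUnit, the `Microsoft.AspNetCore.Mvc.Testing` package, and that the Admin `Program` class is visible to the test project; the class and test names are hypothetical, while `/clusters/new` and `/hubs/fleet` are taken from the findings above.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

public sealed class AnonymousAccessTests : IClassFixture<WebApplicationFactory<Program>>
{
    private readonly HttpClient _client;

    public AnonymousAccessTests(WebApplicationFactory<Program> factory)
    {
        // Do not follow redirects, so the challenge itself can be asserted.
        _client = factory.CreateClient(new WebApplicationFactoryClientOptions
        {
            AllowAutoRedirect = false,
        });
    }

    [Theory]
    [InlineData("/clusters/new")]
    public async Task Anonymous_page_request_is_challenged(string path)
    {
        using var response = await _client.GetAsync(path);

        Assert.Contains(
            response.StatusCode,
            new[] { HttpStatusCode.Redirect, HttpStatusCode.Unauthorized });

        if (response.StatusCode == HttpStatusCode.Redirect)
        {
            Assert.Contains("/login", response.Headers.Location?.ToString() ?? string.Empty);
        }
    }

    [Fact]
    public async Task Anonymous_hub_negotiate_is_refused()
    {
        // SignalR clients begin with a negotiate POST; it should not succeed anonymously.
        using var response = await _client.PostAsync(
            "/hubs/fleet/negotiate?negotiateVersion=1", content: null);

        Assert.NotEqual(HttpStatusCode.OK, response.StatusCode);
    }
}
```

Extending the `InlineData` list to every routable page would turn the per-page re-verification asked for in Admin-001 into a regression test.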

### Admin-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Components/App.razor:9,16` |
| Status | Open |

**Description:** `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A CDN reference makes the UI fail in air-gapped / locked-down fleet deployments (a stated deployment target), introduces an uncontrolled third-party origin, and is not covered by a Subresource Integrity hash.

**Recommendation:** Vendor Bootstrap under `wwwroot/lib/bootstrap/` and reference the local copies, as the design doc requires. If a CDN is retained for any asset, add `integrity` + `crossorigin` SRI attributes.

**Resolution:** _(open)_

### Admin-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Hubs/FleetStatusPoller.cs:24-26,98-103` |
| Status | Open |

**Description:** `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller's `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()` (exposed `internal` for tests) clears those same dictionaries with no synchronization. If a test (or any caller) invokes `ResetCache()` while a poll tick is mid-iteration, the `Dictionary` enumeration/mutation race can throw `InvalidOperationException` or corrupt state.

**Recommendation:** Either document `ResetCache()` as "only safe when the poller is stopped" and have tests stop the service first, or guard the three dictionaries with a lock / swap them atomically. Using `ConcurrentDictionary` (as the sibling `ResilientLdapGroupRoleMappingService` does) would make the intent explicit.

**Resolution:** _(open)_
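
The lock-based option from the Admin-011 recommendation can be illustrated with a small self-contained cache. `PollerCacheSketch`, its key and value types, and its method names are hypothetical; the real poller tracks three dictionaries with types not shown in this review.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the poller's change-detection cache.
public sealed class PollerCacheSketch
{
    private readonly object _gate = new();
    private readonly Dictionary<string, long> _last = new();

    // Returns true when the generation differs from the cached one (or is new).
    public bool TryUpdate(string nodeId, long generation)
    {
        lock (_gate)
        {
            if (_last.TryGetValue(nodeId, out var previous) && previous == generation)
            {
                return false;
            }

            _last[nodeId] = generation;
            return true;
        }
    }

    // Safe to call while a poll tick is running: it takes the same lock.
    public void Reset()
    {
        lock (_gate)
        {
            _last.Clear();
        }
    }
}
```

A single lock around all three dictionaries keeps them mutually consistent, which separate `ConcurrentDictionary` instances would not guarantee on their own.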

### Admin-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` |
| Status | Open |

**Description:** `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column — operator-supplied EquipmentId would mint duplicate equipment identity on typos ... never accepted from CSV imports." `EquipmentId` is system-derived (`EQ-` plus first 12 hex chars of `EquipmentUuid`). Accepting it from CSV either contradicts the design or silently lets an import set an identity field the doc says is un-settable. The XML doc on the class also cites the column as required per "decision #117", so either the code or the design doc is stale. `EquipmentImportBatchService.StageRowsAsync` propagates `row.EquipmentId` into the staging row, so any change must cover the finalize path.

**Recommendation:** Reconcile with the design: drop `EquipmentId` from `RequiredColumns` and the `EquipmentCsvRow` shape (deriving it from `EquipmentUuid` at finalize time), or, if accepting it is a deliberate reversal, update `admin-ui.md` and the decision log so the two agree.

**Resolution:** _(open)_
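
The derivation rule quoted in Admin-012 (`EQ-` plus the first 12 hex characters of `EquipmentUuid`) might be expressed as below. `EquipmentIdentity` is a hypothetical helper, and the lowercase hex casing is an assumption; the design doc excerpt does not state the casing.

```csharp
using System;

// Hypothetical helper: derives the system-owned EquipmentId from the UUID.
public static class EquipmentIdentity
{
    public static string DeriveEquipmentId(Guid equipmentUuid)
    {
        // "N" formats the GUID as 32 hex digits without hyphens (lowercase).
        var hex = equipmentUuid.ToString("N");
        return "EQ-" + hex[..12];
    }
}
```

Deriving the value at finalize time means a CSV row never has to carry it, which removes the typo risk the design doc describes.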
139
code-reviews/Analyzers/findings.md
Normal file
@@ -0,0 +1,139 @@
# Code Review — Analyzers

| Field | Value |
|---|---|
| Module | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Analyzers-001, Analyzers-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | Analyzers-003 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Analyzers-004 |
| 7 | Design-document adherence | Analyzers-005 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Analyzers-006 |
| 10 | Documentation & comments | Analyzers-007 |

## Findings

### Analyzers-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` |
| Status | Open |

**Description:** `IsInsideWrapperLambda` treats a guarded call as "wrapped" if it is textually inside ANY lambda that is an argument to ANY invocation whose containing type is `CapabilityInvoker` or `AlarmSurfaceInvoker`. It matches the containing type only, never the parameter the lambda is bound to. The real wrapping contract is specifically the `callSite` (`Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>`) parameter of `CapabilityInvoker.ExecuteAsync` / `ExecuteWriteAsync`. Any other lambda argument to a method on those types (a future overload that takes a predicate/selector lambda, or a lambda passed in a non-`callSite` position) would suppress the diagnostic even though the guarded call is not actually executed inside the resilience pipeline. The analyzer's own XML doc (lines 21-23) describes exactly this looser-than-intended behaviour. It is a latent false-negative gap rather than an active bug because the current `CapabilityInvoker` surface has no non-`callSite` lambda parameter.

**Recommendation:** Resolve the symbol of the lambda argument's parameter (`IMethodSymbol.Parameters[i]`) and require its type to be the `Func<CancellationToken, ValueTask>` / `Func<CancellationToken, ValueTask<T>>` callsite shape, or at minimum match the wrapper method name (`ExecuteAsync` / `ExecuteWriteAsync`) rather than only the containing type. This closes the gap before a new overload silently widens the escape hatch.

**Resolution:** _(open)_

### Analyzers-002

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` |
| Status | Open |

**Description:** `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all; callers pass `IReadOnlyList<...>` / `IAlarmSubscriptionHandle`, and the invoker builds the resilience lambdas internally. `IsInsideWrapperLambda` only ever returns `true` when it finds an `AnonymousFunctionExpressionSyntax` argument in the outer call's argument list. Because no `AlarmSurfaceInvoker` call site can have a lambda argument, the `AlarmSurfaceInvoker` entry in `WrapperTypes` is effectively dead: it can never satisfy the suppression condition. Guarded `IAlarmSource` calls written inside `AlarmSurfaceInvoker.cs` are in fact suppressed correctly, but only because they sit inside `CapabilityInvoker.ExecuteAsync` lambdas (the `CapabilityInvoker` entry does the work). The dead entry is misleading and suggests the analyzer recognises an `AlarmSurfaceInvoker` "lambda home" that does not exist.

**Recommendation:** Either remove `AlarmSurfaceInvoker` from `WrapperTypes` (its calls are already covered transitively by the `CapabilityInvoker` match) and update the XML doc, or, if the intent is to allow `IAlarmSource` calls anywhere inside `AlarmSurfaceInvoker` regardless of lambda nesting, add an explicit "call site is lexically within the `AlarmSurfaceInvoker` type declaration" check rather than relying on a lambda-argument scan that never fires.

**Resolution:** _(open)_

### Analyzers-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` |
| Status | Open |

**Description:** `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently skipping the call. For `RegisterOperationAction` the `SemanticModel` is non-null in normal compilation, so this is low-risk in practice, but the failure mode is the wrong direction: a tooling/IDE edge case where the model is unavailable would flag correct code. Separately, the analyzer has no defensive guard against partially-bound / malformed call sites: `method.ContainingType`, `method.ReturnType`, and `iface.GetMembers()` are dereferenced without null checks. `IInvocationOperation.TargetMethod` is non-null by contract and `ContainingType` is non-null for an ordinary method, so a hard crash is unlikely, but an analyzer that throws on malformed in-progress syntax degrades the IDE experience for the whole solution.

**Recommendation:** When `semanticModel is null` in `AnalyzeInvocation`, return early (skip the call) instead of letting `IsInsideWrapperLambda` report it as unwrapped, so unavailable semantics never produce a false positive.

**Resolution:** _(open)_

### Analyzers-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` |
| Status | Open |

**Description:** `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display string per interface and calls `string.Replace("global::", ...)`, then for matching interfaces iterates `iface.GetMembers().OfType<IMethodSymbol>()` calling `FindImplementationForInterfaceMember` per member. The `GuardedInterfaces` / `WrapperTypes` lookups are `string[].Contains` (linear scan) rather than a hash set. None of this is catastrophic (the interface sets are tiny), but the work is repeated for every invocation including the overwhelming majority that target non-guarded methods, and the FQN string formatting plus `Replace` allocation on the hot path is avoidable.

**Recommendation:** Move to `RegisterCompilationStartAction`: resolve the guarded interface and wrapper-type symbols once via `Compilation.GetTypeByMetadataName`, capture them, and compare invocation symbols by `SymbolEqualityComparer` identity. Replace the `string[]` membership checks with a `HashSet`. This also makes the analyzer correctly no-op in compilations that do not reference `Core.Abstractions`.

**Resolution:** _(open)_
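
The restructuring proposed in Analyzers-004 might look roughly like the sketch below. It is not the real analyzer: the class name, diagnostic ID, and the fully-qualified interface names are hypothetical, and the existing async-return filter and wrapper-lambda check are deliberately left out, so as written it reports every guarded call. It also includes the early return on a missing semantic model from Analyzers-003.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class GuardedCallSketchAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical metadata names; the real namespaces are not given in this review.
    private static readonly string[] GuardedInterfaceNames =
    {
        "ZB.MOM.WW.OtOpcUa.Core.Abstractions.IReadable",
        "ZB.MOM.WW.OtOpcUa.Core.Abstractions.IWritable",
    };

    private static readonly DiagnosticDescriptor Rule = new(
        id: "SKETCH001",
        title: "Guarded capability call",
        messageFormat: "Call to '{0}' targets a guarded capability interface",
        category: "Resilience",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        context.RegisterCompilationStartAction(start =>
        {
            // Resolve the guarded symbols once per compilation.
            var guarded = new HashSet<INamedTypeSymbol>(SymbolEqualityComparer.Default);
            foreach (var name in GuardedInterfaceNames)
            {
                if (start.Compilation.GetTypeByMetadataName(name) is { } symbol)
                {
                    guarded.Add(symbol);
                }
            }

            // Nothing to guard in this compilation: register no per-invocation work.
            if (guarded.Count == 0) return;

            start.RegisterOperationAction(ctx =>
            {
                // Semantics unavailable: skip rather than report a false positive.
                if (ctx.Operation.SemanticModel is null) return;

                var invocation = (IInvocationOperation)ctx.Operation;
                if (!IsGuarded(invocation.TargetMethod, guarded)) return;

                // The real analyzer's async-return and wrapper-lambda checks would go here.
                ctx.ReportDiagnostic(Diagnostic.Create(
                    Rule, invocation.Syntax.GetLocation(), invocation.TargetMethod.Name));
            }, OperationKind.Invocation);
        });
    }

    private static bool IsGuarded(IMethodSymbol method, HashSet<INamedTypeSymbol> guarded)
    {
        var type = method.ContainingType;
        if (type is null) return false;

        // Interface-typed receiver: the method is declared on a guarded interface.
        if (guarded.Contains(type.OriginalDefinition)) return true;

        // Concrete receiver: the method implements a guarded interface member.
        foreach (var iface in type.AllInterfaces)
        {
            if (!guarded.Contains(iface.OriginalDefinition)) continue;
            if (iface.GetMembers().OfType<IMethodSymbol>().Any(member =>
                    SymbolEqualityComparer.Default.Equals(
                        type.FindImplementationForInterfaceMember(member), method)))
            {
                return true;
            }
        }

        return false;
    }
}
```

The set is only read after the compilation-start callback finishes, so sharing it across concurrently executing operation actions should be safe.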

### Analyzers-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` |
| Status | Open |

**Description:** `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all four `IHistoryProvider` reads, matching the analyzer's `GuardedInterfaces` set. However `IHistoryProvider` exposes five async methods, and two of them (`ReadAtTimeAsync`, `ReadEventsAsync`) are C# default-interface-method implementations. When a driver does not override a DIM and a caller invokes it through a concrete driver reference, `FindImplementationForInterfaceMember` returns the interface's own default method symbol; the second equality branch (`method.OriginalDefinition` == `member`) still catches the interface-typed-receiver case, so detection holds, but this DIM interaction is undocumented and untested, and a future driver that overrides one DIM but not the other creates an asymmetric guarded surface that nobody has verified.

**Recommendation:** Add explicit test cases (see Analyzers-006) for `IHistoryProvider` calls via both an interface-typed receiver and a concrete driver that (a) overrides and (b) inherits the default `ReadAtTimeAsync` / `ReadEventsAsync`. If a gap is found, handle DIM members explicitly. Add a short remark to the analyzer XML doc noting the default-interface-method consideration.

**Resolution:** _(open)_

### Analyzers-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` |
| Status | Open |

**Description:** The test suite exercises only 3 of the 7 guarded interfaces (`IReadable`, `IWritable`, `ITagDiscovery`) and one positive / one negative lambda case. Significant untested behaviour for an analyzer that gates a repo-wide resilience invariant:

- No test for `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, or `IHistoryProvider` (four of seven guarded interfaces, including the two, `IAlarmSource` and `IHistoryProvider`, with the most subtle wrapping story).
- No test that a synchronous guarded-type member is NOT flagged: `IHostConnectivityProbe.GetHostStatuses()` is explicitly called out in the source comment (lines 75-77) as something the `IsAsyncReturningType` filter must let through, yet there is no regression test pinning that.
- No test for a concrete driver class implementing the interface (the receiver is always the interface type `IReadable driver`); the `FindImplementationForInterfaceMember` branch of `ImplementsGuardedInterface` (the entire reason the source comment claims an unusually-named method implementing `IReadable.ReadAsync` still trips the rule) is never executed by a test.
- No test for `ExecuteWriteAsync` (only `ExecuteAsync` is covered) and no test for `AlarmSurfaceInvoker`.
- No test for nested lambdas or for the generated-code exclusion (`ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None)`).
- The `StubSources` constant omits `ISubscribable` / `IAlarmSource` / `IHistoryProvider` / `IHostConnectivityProbe` and `AlarmSurfaceInvoker` entirely, so those paths cannot be tested without extending it.

**Recommendation:** Extend `StubSources` with the remaining guarded interfaces and `AlarmSurfaceInvoker`, then add tests for: each remaining guarded interface (positive plus wrapped), a synchronous member not being flagged, a concrete driver-class receiver with a renamed implementing method, `ExecuteWriteAsync` wrapping, and a nested-lambda case.

**Resolution:** _(open)_

### Analyzers-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` |
| Status | Open |

**Description:** The `<remarks>` block states the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for the guarded-call detection (`ImplementsGuardedInterface` uses symbols), but the wrapper detection in `IsInsideWrapperLambda` is described in the same block as walking the syntax tree and checking enclosing invocations by containing type, and that detection is in fact looser than the prose implies (see Analyzers-001): it does not verify the lambda is bound to the resilience `callSite` parameter. The XML doc reads as if the wrapper match is precise. The `<remarks>` also notes the rule does not enforce the capability argument matches the method, but omits the more important current limitation: that a lambda in any argument position of a wrapper-typed call suppresses the diagnostic.

**Recommendation:** Tighten the `<remarks>` to state precisely what `IsInsideWrapperLambda` checks today (textual containment within a lambda argument of a `CapabilityInvoker` / `AlarmSurfaceInvoker`-typed invocation), and note the known limitation that it does not bind the lambda to the `callSite` parameter. Keep the doc in sync if Analyzers-001 is fixed.

**Resolution:** _(open)_
271
code-reviews/Client.CLI/findings.md
Normal file
@@ -0,0 +1,271 @@
# Code Review — Client.CLI

| Field | Value |
|---|---|
| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.CLI-001, Client.CLI-002, Client.CLI-003 |
| 2 | OtOpcUa conventions | Client.CLI-004 |
| 3 | Concurrency & thread safety | Client.CLI-005 |
| 4 | Error handling & resilience | Client.CLI-006 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Client.CLI-007 |
| 7 | Design-document adherence | Client.CLI-008 |
| 8 | Code organization & conventions | Client.CLI-009 |
| 9 | Testing coverage | Client.CLI-010 |
| 10 | Documentation & comments | Client.CLI-008 |

## Findings
|
||||
|
||||
### Client.CLI-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` |
| Status | Open |

**Description:** The start and end options are parsed with `DateTime.Parse(StartTime)` with
no `IFormatProvider` or `DateTimeStyles`. Parsing therefore depends on the current OS
culture: the same `--start "03/04/2026"` resolves to March 4 on an en-US box and April 3
on an en-GB box. The CLI is documented as cross-platform, yet an ambiguous value silently
produces a different (wrong) history window rather than failing. The doc claims "ISO 8601
or date string", but culture-independent ISO interpretation is not guaranteed without
`DateTimeStyles.RoundtripKind` or `CultureInfo.InvariantCulture`. A value with no offset is
also treated as local time: `.ToUniversalTime()` shifts it by the host offset, so the same
input yields different ranges on machines in different time zones.

**Recommendation:** Parse with `CultureInfo.InvariantCulture` and
`DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal` (or require explicit
ISO 8601 via `DateTimeOffset.Parse`), and document the expected format and timezone
assumption precisely.
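
A minimal sketch of the culture-independent parsing recommended above, assuming the option value arrives as a plain string. `ParseUtc` is a hypothetical helper name, not a method in the module. It shows that the invariant culture always reads `03/04/2026` month-first and that values carrying an explicit offset are normalized to UTC.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses a --start/--end value the same way on every host.
static DateTime ParseUtc(string text) =>
    DateTime.Parse(
        text,
        CultureInfo.InvariantCulture,
        // Input without an offset is treated as UTC; the result always has Kind=Utc.
        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

var iso = ParseUtc("2026-03-04T10:00:00Z");
var withOffset = ParseUtc("2026-03-04T10:00:00+02:00"); // normalized to 08:00 UTC
var bareDate = ParseUtc("03/04/2026");                  // month-first under the invariant culture

Console.WriteLine($"{iso:O} | {withOffset:O} | {bareDate:O}");
```

Requiring strict ISO 8601 (for example through `DateTime.ParseExact` with a fixed format list) would reject `03/04/2026` outright, which may be the safer choice for an operator tool.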

**Resolution:** _(open)_

### Client.CLI-002

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/SubscribeCommand.cs:129-137` |
| Status | Open |

**Description:** The summary computes `neverWentBad` as every target whose node-id key is
absent from the `everBad` dictionary. A node that received no update at all is also absent
from `everBad`, so it is counted in `neverWentBad` and printed under the heading
"--- Nodes that NEVER received a bad-quality update (suspect) ---". The same node is also
listed separately under `never` ("never received an update at all"). Labeling a node that
produced zero notifications as a "suspect that never went bad" is misleading: it has not
been observed at all, which is a different (and arguably worse) condition than a node that
streamed only good values.

**Recommendation:** Exclude no-update nodes from the `neverWentBad` set, e.g.
`targets.Where(t => lastStatus.ContainsKey(key) && !everBad.ContainsKey(key))` (where `key`
is the node-id key derived from `t`), so the "suspect" list only contains nodes that were
actually observed and never reported bad quality.
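
The corrected bucketing can be sketched as below. The dictionary shapes and the use of the node-id string itself as the key are assumptions for illustration; the real `SubscribeCommand` fields may be typed differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes: A streamed only good values, B went bad once, C was never observed.
var targets = new[] { "ns=2;s=A", "ns=2;s=B", "ns=2;s=C" };
var lastStatus = new Dictionary<string, string>
{
    ["ns=2;s=A"] = "Good",
    ["ns=2;s=B"] = "Bad",
};
var everBad = new Dictionary<string, bool> { ["ns=2;s=B"] = true };

// Nodes that produced no notification at all.
var never = targets.Where(t => !lastStatus.ContainsKey(t)).ToList();

// Nodes that were observed at least once and never reported bad quality.
var neverWentBad = targets
    .Where(t => lastStatus.ContainsKey(t) && !everBad.ContainsKey(t))
    .ToList();

Console.WriteLine($"never: {string.Join(", ", never)}");
Console.WriteLine($"neverWentBad: {string.Join(", ", neverWentBad)}");
```

With the extra `lastStatus.ContainsKey` check the two lists are disjoint, so a silent node is reported once, under `never`.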

**Resolution:** _(open)_

### Client.CLI-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` |
| Status | Open |

**Description:** Numeric command options accept any value with no range validation.
`--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be
supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently disables
recursion or under-traverses; a zero/negative sampling `--interval` is passed straight
through to `SubscribeAsync` and depends on the SDK/server to reject it; a negative `--max`
is forwarded to `HistoryReadRawAsync`. None of these produce a clear operator-facing error.

**Recommendation:** Validate option ranges at the start of `ExecuteAsync` and throw
`CliFx.Exceptions.CommandException` with an actionable message when a value is out of
range.
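
One way to centralize the range checks is a small validation helper, sketched here without a CliFx dependency so it runs on its own. `TryValidateMinimum` and the chosen minimums are hypothetical; in the command, a failed check would be surfaced by throwing `CommandException` with the returned message.

```csharp
using System;

// Hypothetical helper: returns false and a one-line, operator-facing message
// when the option value is below the allowed minimum.
static bool TryValidateMinimum(string option, int value, int minimum, out string error)
{
    error = value >= minimum
        ? string.Empty
        : $"{option} must be at least {minimum}, but was {value}.";
    return error.Length == 0;
}

var depthOk = TryValidateMinimum("--depth", -1, 0, out var depthError);
var intervalOk = TryValidateMinimum("--interval", 1000, 1, out var intervalError);

// In ExecuteAsync the failing case would become: throw new CommandException(depthError);
Console.WriteLine(depthOk ? "depth ok" : depthError);
Console.WriteLine(intervalOk ? "interval ok" : intervalError);
```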

**Resolution:** _(open)_

### Client.CLI-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Commands/SubscribeCommand.cs:13-37` |
| Status | Open |

**Description:** `SubscribeCommand` is the only command in the module whose constructor
and all `[CommandOption]` properties have no XML doc comments. Every other command
(`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`,
`HistoryReadCommand`, `RedundancyCommand`) and `CommandBase` carry `<summary>` docs on the
type, constructor, and options. The inconsistency is visible in IDE tooltips and breaks the
otherwise-uniform documentation convention of the module.

**Recommendation:** Add `<summary>` XML docs to the `SubscribeCommand` constructor and to
each of its option properties, matching the style used by the sibling commands.

**Resolution:** _(open)_

### Client.CLI-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` |
| Status | Open |

**Description:** The `DataChanged` and `AlarmEvent` handlers write to `console.Output`
(a `System.IO.TextWriter`) directly from the OPC UA SDK subscription/notification thread,
while the command main flow is awaiting `Task.Delay(Timeout.Infinite, ct)` and the summary
block also writes to the same `console.Output`. `TextWriter` instances are not guaranteed
thread-safe; concurrent `WriteLine` calls from the notification thread and the main thread
(a data-change notification arriving while the summary is being printed, or two
notifications from different SDK threads) can interleave or corrupt output. The handler
also calls the synchronous `WriteLine` without catching exceptions, so a fault in the write
would propagate into the SDK callback.

**Recommendation:** Serialize console writes from event handlers: funnel notifications
through a `Channel<T>` drained by the main thread, or guard every `console.Output` write
with a shared lock. At minimum, ensure handler exceptions cannot escape into the SDK
callback.
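
The `Channel<T>` option can be sketched as follows. This is an illustration under assumptions, not the module's code: a `StringWriter` stands in for `console.Output`, `OnNotification` stands in for the event handlers, and four thread-pool tasks simulate SDK notification threads.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

var output = new StringWriter(); // stand-in for console.Output
var channel = Channel.CreateUnbounded<string>(
    new UnboundedChannelOptions { SingleReader = true });
var written = 0;

// The only code that touches the writer: a single consumer draining the channel.
var drain = Task.Run(async () =>
{
    await foreach (var line in channel.Reader.ReadAllAsync())
    {
        output.WriteLine(line);
        written++;
    }
});

// Stand-in for the DataChanged/AlarmEvent handler: enqueue only, never throw.
void OnNotification(string line)
{
    try { channel.Writer.TryWrite(line); }
    catch (Exception) { /* keep faults out of the SDK callback thread */ }
}

// Simulate notifications arriving concurrently from several threads.
var producers = Enumerable.Range(0, 4)
    .Select(p => Task.Run(() =>
    {
        for (var i = 0; i < 100; i++) OnNotification($"thread {p} update {i}");
    }))
    .ToArray();

await Task.WhenAll(producers);
channel.Writer.Complete(); // lets ReadAllAsync finish once the queue is empty
await drain;

Console.WriteLine($"{written} lines written by a single consumer");
```

Because the summary would be printed by the same consumer after the channel completes, it cannot interleave with a late notification.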

**Resolution:** _(open)_

### Client.CLI-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` |
| Status | Open |

**Description:** Operator input-format errors surface as raw .NET exceptions rather than
clean CLI errors. An unparseable start/end value throws `FormatException` straight out of
`DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentException` from
`NodeIdParser`. CliFx renders unhandled exceptions with a stack trace, which is noisy for a
user-input mistake. Other tooling in this module already distinguishes operator errors
(`ParseAggregateType` throws `ArgumentException` with a helpful message) but none of these
is converted to a `CliFx.Exceptions.CommandException` with a clean exit code.

**Recommendation:** Catch the predictable input-validation exceptions and rethrow as
`CommandException` with a concise message and a non-zero exit code, so malformed input
yields a one-line error instead of a stack trace.

**Resolution:** _(open)_

### Client.CLI-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `CommandBase.cs:112-123` |
| Status | Open |

**Description:** `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a
logger, and assigns it to the static `Log.Logger` without disposing the previously
assigned logger. For a single CLI invocation this leaks at most one logger and the process
exits shortly after, so impact is minimal, but `CommandBase` is also exercised repeatedly
in-process by the unit-test suite, where each `ExecuteAsync` replaces `Log.Logger` and
abandons the prior console sink without disposal. The pattern is incorrect:
`Log.CloseAndFlush()` (or disposing the prior logger) should run before reassignment.

**Recommendation:** Call `Log.CloseAndFlush()` before assigning a new `Log.Logger`, or
build the logger into a local `ILogger` the command owns and disposes, rather than mutating
global static state per command.

**Resolution:** _(open)_

### Client.CLI-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `docs/Client.CLI.md:158-217` |
| Status | Open |

**Description:** `docs/Client.CLI.md` is stale relative to the code at this commit.
(1) The `subscribe` command section documents only `-n` and `-i`, but the code
(`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`,
`--duration`, and `--summary-file`; none are documented, and the documented Ctrl+C-only
lifecycle no longer matches `--duration` auto-exit.
(2) The `historyread` "Aggregate mapping" table lists six aggregates but the code
(`HistoryReadCommand.ParseAggregateType` and `AggregateType`) also supports
`StandardDeviation` (aliases `stddev`/`stdev`); the doc option table omits it while the
code option description includes it.

**Recommendation:** Regenerate the `subscribe` and `historyread` sections of
`docs/Client.CLI.md` from the current option set, including the five new subscribe flags
and the `StandardDeviation` aggregate row.

**Resolution:** _(open)_

### Client.CLI-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` |
| Status | Open |

**Description:** Both long-running commands attach an event handler
(`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach
it. Because the handler closes over `console`, the captured console and the closure remain
referenced by the service until the service is disposed in the `finally` block. In
practice the service is per-command and disposed at the end, so this does not leak across
commands, but it is a latent footgun: a handler can still fire between `UnsubscribeAsync`
/ `UnsubscribeAlarmsAsync` and `Dispose`, writing to a console that the command considers
finished (overlapping with Client.CLI-005). The cleanup unsubscribes the monitored items
but never the .NET event.

**Recommendation:** Detach the handler explicitly (`service.DataChanged -= handler`) after
unsubscribing, using a named local delegate so it can be removed, ensuring no notification
is processed after the command output phase ends.
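
The named-delegate pattern looks roughly like this. To keep the sketch self-contained, a local multicast delegate (`dataChanged`) stands in for the service's `DataChanged` event, and a list stands in for console output; both are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the service event; starts with a no-op subscriber so it is never null.
Action<string> dataChanged = _ => { };
var printed = new List<string>(); // stand-in for console output

// A named delegate can be removed later; an inline lambda cannot.
Action<string> handler = value => printed.Add(value);

dataChanged += handler;
dataChanged("first");   // delivered while the command is still in its output phase

dataChanged -= handler; // detach after unsubscribing the monitored items
dataChanged("second");  // a late notification no longer reaches the handler

Console.WriteLine(string.Join(", ", printed));
```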

**Resolution:** _(open)_

### Client.CLI-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` |
| Status | Open |

**Description:** The new `SubscribeCommand` capabilities are largely untested. The four
`SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel,
disconnect-in-finally, and the subscription message. There is no test for the `--recursive`
browse-and-collect path (`CollectVariablesAsync`), the `--duration` auto-exit path, the
summary classification logic (`good`/`bad`/`never`/`neverWentBad`, including the
mislabeling noted in Client.CLI-002), the `--quiet` flag, the `--summary-file` write, or
per-node subscribe-failure handling. The summary logic is the most behaviour-rich part of
the command and the part most likely to regress.

**Recommendation:** Add unit tests for recursive variable collection, the duration-based
exit, summary bucketing across good/bad/no-update nodes, and the `--summary-file` output.
The `FakeOpcUaClientService` already exposes `RaiseDataChanged`, so feeding good/bad values
and asserting the summary text is straightforward.

**Resolution:** _(open)_

192
code-reviews/Client.Shared/findings.md
Normal file
@@ -0,0 +1,192 @@
# Code Review — Client.Shared

| Field | Value |
|---|---|
| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.Shared` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.Shared-001, Client.Shared-002, Client.Shared-003 |
| 2 | OtOpcUa conventions | Client.Shared-004 |
| 3 | Concurrency & thread safety | Client.Shared-005, Client.Shared-006, Client.Shared-007 |
| 4 | Error handling & resilience | Client.Shared-008, Client.Shared-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Client.Shared-010 |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Client.Shared-011 |
| 10 | Documentation & comments | Client.Shared-009 |

## Findings

### Client.Shared-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OpcUaClientService.cs:552` |
| Status | Open |

**Description:** `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` threshold is arbitrary and inconsistent: SourceName is index 2 and Severity index 5, but ConditionName (6), Retain (7), Acked/Active (8/9) and ConditionNodeId (12) are all needed for a usable alarm and are each guarded individually with `fields.Count > N`. A non-conforming server that returns a truncated list (or fewer fields than requested) makes the `< 6` early return silently drop the entire notification, including the EventId/SourceName/Severity that are present.

**Recommendation:** Drop the `< 6` early return (or lower it to `< 1`) and rely on the existing per-index `fields.Count > N` guards, which already default missing fields safely. If a hard floor is wanted, document why 6 and not 13.
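
As an illustration of relying on per-index guards alone, the sketch below reads fields from a truncated list. `FieldOrDefault` is a hypothetical helper and plain `object` values stand in for the SDK's variant fields; only the SourceName (2), Severity (5), and Retain (7) indices are taken from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the field at `index` when present and of the
// expected type, otherwise the supplied fallback.
static T FieldOrDefault<T>(IReadOnlyList<object> fields, int index, T fallback) =>
    (index < fields.Count && fields[index] is T typed) ? typed : fallback;

// Truncated notification: only the first three requested fields came back.
var fields = new List<object> { new byte[] { 1, 2, 3 }, "placeholder", "Tank01" };

var sourceName = FieldOrDefault(fields, 2, string.Empty); // present
var severity = FieldOrDefault<ushort>(fields, 5, 0);      // missing, defaults to 0
var retain = FieldOrDefault(fields, 7, false);            // missing, defaults to false

Console.WriteLine($"{sourceName} severity={severity} retain={retain}");
```

The notification is still delivered with the fields that did arrive, rather than being dropped by a count threshold.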

**Resolution:** _(open)_

### Client.Shared-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` |
| Status | Open |

**Description:** `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `RedundancySupport` and `ServiceLevel` reads are not wrapped in try/catch. If the server returns the value boxed as a different numeric type than expected (e.g. `ServiceLevel` boxed as `int` instead of `byte`), or returns a null `Value` on a `Bad` DataValue, the cast throws `InvalidCastException`/`NullReferenceException` and the whole call fails instead of returning a sensible default.

**Recommendation:** Wrap the `RedundancySupport` and `ServiceLevel` reads in the same defensive pattern used for the array reads, using `Convert.ToInt32`/`Convert.ToByte` on the boxed value and falling back to `None`/`0` when the read status is bad or the value is null.
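
The difference between unboxing and `Convert` can be shown in isolation. `ToServiceLevel` is a hypothetical helper; the status-code check mentioned in the recommendation is omitted because it depends on SDK types.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts whatever numeric type the server boxed into a
// byte, falling back to 0 when the value is missing or does not fit.
static byte ToServiceLevel(object value)
{
    if (value is null) return 0;
    try
    {
        return Convert.ToByte(value, CultureInfo.InvariantCulture);
    }
    catch (Exception ex) when (ex is InvalidCastException
                               || ex is OverflowException
                               || ex is FormatException)
    {
        return 0;
    }
}

object boxedInt = 200; // ServiceLevel delivered as Int32 instead of Byte

var viaConvert = ToServiceLevel(boxedInt);

// A direct unboxing cast only succeeds when the boxed type is exactly byte.
var unboxThrows = false;
try { _ = (byte)boxedInt; }
catch (InvalidCastException) { unboxThrows = true; }

Console.WriteLine($"Convert: {viaConvert}, direct cast throws: {unboxThrows}");
```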

**Resolution:** _(open)_

### Client.Shared-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` |
| Status | Open |

**Description:** `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a service fault) produces an out-of-range exception from the indexer (`ArgumentOutOfRangeException` for the SDK's `List<T>`-derived result collections) rather than a meaningful OPC UA `StatusCode` or `ServiceResultException`.

**Recommendation:** Guard both accesses: throw `ServiceResultException` with the response's `ResponseHeader.ServiceResult` (or `BadUnexpectedError`) when `Results` is empty.

**Resolution:** _(open)_

### Client.Shared-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` |
| Status | Open |

**Description:** `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchronous service round-trip on the caller's thread; for the UI this blocks the dispatcher thread. The async signature misleads callers, and the `CancellationToken` parameter is ignored on these paths.

**Recommendation:** Use the stack's async overloads (`Session.HistoryReadAsync`, `Session.CloseAsync`) where available, or wrap the synchronous calls in `Task.Run`, so the methods are genuinely asynchronous and honor the cancellation token.

**Resolution:** _(open)_

### Client.Shared-005

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` |
| Status | Open |

**Description:** `_activeDataSubscriptions` is a plain `Dictionary` mutated from at least three thread contexts with no synchronization: the caller thread (`SubscribeAsync`/`UnsubscribeAsync`), the keep-alive callback thread (`HandleKeepAliveFailureAsync` -> `ReplaySubscriptionsAsync`, invoked fire-and-forget from the OPC UA `KeepAlive` event), and `DisconnectAsync`. Concurrent `Add`/`Remove`/`Clear`/enumeration on a non-thread-safe `Dictionary` can corrupt its internal buckets, throw `InvalidOperationException`, or lose entries. A failover firing while the UI calls `SubscribeAsync` is a realistic trigger. The `_activeAlarmSubscription` nullable tuple has the same exposure.

**Recommendation:** Guard all access to `_activeDataSubscriptions` / `_activeAlarmSubscription` (and the `_session`/`_dataSubscription`/`_alarmSubscription` fields) with a single lock, or move subscription bookkeeping behind a `ConcurrentDictionary` plus a lock for the multi-field failover transition.
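
A minimal sketch of the single-lock option, assuming the bookkeeping maps a node id to a sampling interval. The names `gate`, `Track`, `Untrack`, and `Snapshot` are illustrative and do not come from `OpcUaClientService`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var gate = new object();
var activeSubscriptions = new Dictionary<string, int>(); // node id -> interval (ms)

// Every read and write of the dictionary goes through the same lock.
void Track(string nodeId, int intervalMs)
{
    lock (gate) { activeSubscriptions[nodeId] = intervalMs; }
}

bool Untrack(string nodeId)
{
    lock (gate) { return activeSubscriptions.Remove(nodeId); }
}

// Replay iterates over a copy, so no server call runs while the lock is held.
KeyValuePair<string, int>[] Snapshot()
{
    lock (gate) { return activeSubscriptions.ToArray(); }
}

// Simulate subscribe and unsubscribe calls arriving from many threads.
Parallel.For(0, 1000, i => Track($"ns=2;s=Node{i}", 500));
Parallel.For(0, 500, i => Untrack($"ns=2;s=Node{i}"));

var remaining = Snapshot();
Console.WriteLine($"{remaining.Length} subscriptions still tracked");
```

Taking a snapshot for replay keeps the critical section short, which matters because replay performs network round-trips.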

**Resolution:** _(open)_

### Client.Shared-006

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` |
| Status | Open |

**Description:** `HandleKeepAliveFailureAsync` is launched fire-and-forget (`_ = HandleKeepAliveFailureAsync()`) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check `if (_state == Reconnecting || _state == Disconnected) return;` at the top. Between that read and the `TransitionState(Reconnecting, ...)` write a few lines later, a second keep-alive failure (the SDK raises `KeepAlive` repeatedly while a session is down) can pass the same guard, and two failover loops run concurrently, each disposing `_session`, nulling subscription fields, and racing to assign a new `_session`. The session created by the loser leaks, and `ReplaySubscriptionsAsync` can run twice, creating duplicate monitored items.

**Recommendation:** Serialize failover with an `Interlocked.CompareExchange` flag or a `SemaphoreSlim(1,1)` so only one failover loop runs at a time; subsequent keep-alive failures during an in-flight failover should be ignored. Make the state transition atomic with the re-entry guard.
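
The `Interlocked.CompareExchange` guard can be sketched on its own as below. `TryBeginFailover` and `EndFailover` are hypothetical names; in the service, the failover loop would run between them inside a `try`/`finally`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var failoverInProgress = 0; // 0 = idle, 1 = a failover loop is running

// Atomically claims the failover slot; only one caller can win until it is released.
bool TryBeginFailover() =>
    Interlocked.CompareExchange(ref failoverInProgress, 1, 0) == 0;

// Releases the slot; call from a finally block once the loop has finished.
void EndFailover() => Volatile.Write(ref failoverInProgress, 0);

// Simulate a burst of keep-alive failures arriving on many threads at once.
var winners = 0;
Parallel.For(0, 100, _ =>
{
    if (TryBeginFailover()) Interlocked.Increment(ref winners);
});

var blockedWhileRunning = !TryBeginFailover(); // still claimed by the winner
EndFailover();
var allowedAfterRelease = TryBeginFailover();  // a later outage can fail over again

Console.WriteLine($"winners={winners}, blocked={blockedWhileRunning}, reopened={allowedAfterRelease}");
```

Unlike the state check it replaces, the read and the write happen in one atomic step, so a second callback cannot slip in between them.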

**Resolution:** _(open)_

### Client.Shared-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `OpcUaClientService.cs:581-622` |
| Status | Open |

**Description:** In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replaced by a concurrent failover (see Client.Shared-006), the supplemental `ReadValueAsync` calls may run against a session being disposed, throwing `ObjectDisposedException`. The bare `catch` swallows it, after which the alarm is delivered with default (false/MinValue) states, silently misreporting it as inactive/unacknowledged. The notification callback also has no back-pressure: a burst of alarm events spawns an unbounded number of `Task.Run` continuations each doing 3-4 server round-trips.

**Recommendation:** Capture the session under the same lock proposed in Client.Shared-005 and skip the supplemental read if the session has changed or is disposed. Consider batching the four sequential `ReadValueAsync` calls into one `Read` request.

**Resolution:** _(open)_

### Client.Shared-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` |
| Status | Open |

**Description:** `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Value`, `ValueConverter.ConvertValue` falls through to the `_ => rawValue` default and writes a raw `string` into, for example, an `Int32` node; the server then rejects it with `BadTypeMismatch`, surfacing as a confusing failure unrelated to the operator's input. Separately, `ConvertValue` uses `bool.Parse`, which accepts only `true`/`false`; operator input of `1`/`0` throws a `FormatException` that propagates raw to the caller. The read-before-write also doubles the round-trip cost of every string write.

**Recommendation:** Inspect `currentDataValue.StatusCode` before trusting `Value`; when the type cannot be inferred, surface a clear error rather than writing a mistyped value. Make boolean parsing accept `1`/`0`/`yes`/`no`, and wrap parse failures in a descriptive exception naming the node and target type.
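
The lenient boolean parsing could look like the sketch below. `ParseOperatorBool` is a hypothetical helper name and the accepted spellings are taken from the recommendation; naming the node in the error message is left to the caller.

```csharp
using System;

// Hypothetical helper: accepts the spellings an operator is likely to type and
// fails with a message that lists what is allowed.
static bool ParseOperatorBool(string raw)
{
    switch (raw.Trim().ToLowerInvariant())
    {
        case "true":
        case "1":
        case "yes":
            return true;
        case "false":
        case "0":
        case "no":
            return false;
        default:
            throw new FormatException(
                $"'{raw}' is not a recognised boolean; expected true/false, 1/0, or yes/no.");
    }
}

var fromDigit = ParseOperatorBool("1");
var fromWord = ParseOperatorBool(" No ");

var rejected = false;
try { ParseOperatorBool("maybe"); }
catch (FormatException) { rejected = true; }

Console.WriteLine($"1 -> {fromDigit}, ' No ' -> {fromWord}, 'maybe' rejected: {rejected}");
```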

**Resolution:** _(open)_

### Client.Shared-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience / Documentation & comments |
| Location | `OpcUaClientService.cs:302-322` |
| Status | Open |

**Description:** `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally returns `StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAsync`, which throws `ServiceResultException` on a bad call result. A failed acknowledgment therefore never returns a bad `StatusCode` (it throws), and the `StatusCode` return value is dead. Callers writing `if (StatusCode.IsBad(result))` will never see a bad result and will not catch the exception.

**Recommendation:** Either change the return type to `Task` (and let exceptions signal failure), or catch `ServiceResultException` in `AcknowledgeAlarmAsync` and return its `StatusCode`. Update the XML doc to match whichever is chosen.

**Resolution:** _(open)_

### Client.Shared-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` |
| Status | Open |

**Description:** `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call per process, the legacy-folder migration with `Directory.Exists`/`Directory.Move` filesystem IO. `ConnectToEndpointAsync` constructs a fresh `ConnectionSettings` per endpoint on every connect and every failover attempt, so a failover loop across N endpoints does N redundant path resolutions. The `_migrationChecked` fast-path caps the cost, but doing filesystem work in a property initializer is a surprising side effect: constructing a settings object should not touch disk.

**Recommendation:** Make `CertificateStorePath` default to `string.Empty` and resolve `ClientStoragePaths.GetPkiPath()` lazily inside `DefaultApplicationConfigurationFactory.CreateAsync` only when the path is unset.

**Resolution:** _(open)_

### Client.Shared-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` |
| Status | Open |

**Description:** The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race (Client.Shared-005) or re-entrant keep-alive failures (Client.Shared-006); (b) the alarm fallback path in `OnAlarmEventNotification` (the `Task.Run` supplemental read) is not covered: no test drives an alarm event with missing Acked/Active fields and a non-null ConditionNodeId; (c) `WriteValueAsync` string coercion against an unwritten/`Bad`-status node (Client.Shared-008) is untested; (d) the production adapters (`DefaultSessionAdapter`, `DefaultEndpointDiscovery`) have no unit coverage, which is understandable since they wrap the SDK, but the `Results[0]` guard gap (Client.Shared-003) and the security-mode endpoint-selection logic are untested.

**Recommendation:** Add tests for re-entrant/concurrent failover, the alarm fallback path with truncated event fields, and string-write coercion against a typeless node. Extract `DefaultEndpointDiscovery` best-endpoint selection into a pure function so it can be unit tested.

**Resolution:** _(open)_

296
code-reviews/Client.UI/findings.md
Normal file
@@ -0,0 +1,296 @@
# Code Review - Client.UI

| Field | Value |
|---|---|
| Module | `src/Client/ZB.MOM.WW.OtOpcUa.Client.UI` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Client.UI-001, Client.UI-002 |
| 2 | OtOpcUa conventions | Client.UI-003, Client.UI-004 |
| 3 | Concurrency & thread safety | Client.UI-005 |
| 4 | Error handling & resilience | Client.UI-006 |
| 5 | Security | Client.UI-007 |
| 6 | Performance & resource management | Client.UI-008 |
| 7 | Design-document adherence | Client.UI-009 |
| 8 | Code organization & conventions | Client.UI-010 |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | Client.UI-011 |

## Findings

### Client.UI-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` |
| Status | Open |

**Description:** `ReadHistoryAsync` runs as a `RelayCommand` body, which is invoked
on the UI thread, so the bare `IsLoading = true` at line 76 happens to land on the
right thread today. But `Results.Clear()` on the very next line is wrapped in
`_dispatcher.Post(...)`, and the `finally` block also sets `IsLoading` through the
dispatcher (`_dispatcher.Post(() => IsLoading = false)` at line 121). The two
`IsLoading` writes use inconsistent dispatch paths. Because the `Post` in the
`finally` is queued behind the result-population `Post` while the synchronous
line-76 write is not, the loading-indicator updates are not guaranteed to be
ordered relative to the grid population, and the pattern is fragile if the command
is ever invoked off the UI thread (a future caller or test harness).

**Recommendation:** Route the line-76 `IsLoading = true` through `_dispatcher.Post`
for consistency with the rest of the method, or set both `IsLoading` writes
synchronously and only dispatch the `ObservableCollection` mutations.

**Resolution:** _(open)_

### Client.UI-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` |
| Status | Open |

**Description:** `ConnectAsync` calls `await BrowseTree.LoadRootsAsync()` and
`ViewHistoryForSelectedNode` calls `History.SelectedNodeId = ...` by dereferencing
the nullable child view-model properties (`BrowseTreeViewModel?`,
`HistoryViewModel?`) without a null check or `!` operator, while the surrounding
code (lines 258-266) does guard `Subscriptions` and `Alarms` with `!= null`.
`InitializeService()` does assign all five child VMs before these lines run, so a
real NRE is unlikely on the current call path. The inconsistent guarding still masks
intent, and the compiler's nullable flow analysis cannot prove that
`InitializeService()` set the properties, so the code either produces a CS8602 warning
that is being ignored or relies on warnings being suppressed. A future refactor that
makes `InitializeService()` conditionally skip a VM would introduce a silent crash.

**Recommendation:** Make the guarding consistent: either guard all five child VMs
uniformly, or have `InitializeService()` return non-null references the caller uses
directly so the compiler can prove non-nullness.

**Resolution:** _(open)_

### Client.UI-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The csproj references `Serilog` and `Serilog.Sinks.Console`, and
|
||||
`docs/Client.UI.md` lists Serilog as the logging technology, but no source file in
|
||||
the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's
|
||||
`LogToTrace()` and there is no logger configuration, no log calls, and no rolling
|
||||
file sink. `CLAUDE.md` mandates "Serilog with rolling daily file sink" as the
|
||||
logging library preference. The references are dead weight and the documented
|
||||
logging behaviour does not exist.
|
||||
|
||||
**Recommendation:** Either wire up Serilog (a console sink at minimum, ideally the
|
||||
rolling daily file sink the project standard calls for) and route Avalonia logging
|
||||
through it, or drop the unused `Serilog` package references and correct
|
||||
`docs/Client.UI.md`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `Views/MainWindow.axaml.cs:125-138` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is
|
||||
obsolete in Avalonia 11.x (the version pinned in the csproj). The supported
|
||||
replacement is the `StorageProvider` API
|
||||
(`StorageProvider.OpenFolderPickerAsync`). Using the obsolete type produces a
|
||||
compiler obsoletion warning and the API is scheduled for removal in a future
|
||||
Avalonia major version.
|
||||
|
||||
**Recommendation:** Migrate the folder chooser to
|
||||
`TopLevel.GetTopLevel(this).StorageProvider.OpenFolderPickerAsync(...)`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `SubscriptionsViewModel` and `AlarmsViewModel` attach handlers to
|
||||
the long-lived `_service` events (`DataChanged`, `AlarmEvent`) in their
|
||||
constructors and detach them only via `Teardown()`. `Teardown()` is called from
|
||||
`DisconnectAsync` (operator-initiated disconnect), but it is NOT called from the
|
||||
`OnConnectionStateChanged` partial method that handles the `Disconnected` state;
|
||||
that path only calls `Clear()`. When the connection drops server-side (session
|
||||
lost, network failure) the service raises `ConnectionStateChanged(Disconnected)`
|
||||
without `DisconnectAsync` ever running, so the alarm/data event handlers remain
|
||||
attached to a dead service. They are not re-attached on the next connect because
|
||||
`InitializeService()` early-returns when `_service != null` and the same VM
|
||||
instances are reused, so there is no handler leak per reconnect, but a late or
|
||||
buffered `DataChanged`/`AlarmEvent` callback fired during teardown will still mutate
|
||||
`ObservableCollection`s, and the asymmetry between the two disconnect paths is a
|
||||
latent correctness hazard.
|
||||
|
||||
**Recommendation:** Make the disconnect handling symmetric: call
|
||||
`Subscriptions?.Teardown()` / `Alarms?.Teardown()` (or otherwise quiesce the event
|
||||
handlers) from the `Disconnected` branch of the `OnConnectionStateChanged` partial
|
||||
method, not only from `DisconnectAsync`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-006
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Many catch blocks swallow exceptions silently with an empty body
|
||||
and only a comment (`// Redundancy info not available`, `// Subscribe failed`,
|
||||
`// Subscription failed; no item added`, and others). When a subscribe,
|
||||
alarm-subscribe, or redundancy read fails, the operator gets no feedback at all: no
|
||||
status message, no log entry (compounded by Client.UI-003: there is no logger). A
|
||||
failed `AddSubscriptionAsync` simply leaves the node un-subscribed with no
|
||||
indication why. This makes field diagnosis of a misconfigured server or a
|
||||
permission denial effectively impossible from the UI.
|
||||
|
||||
**Recommendation:** Surface failures to the operator: at minimum set a status
|
||||
message or write the exception to a log. Distinguish "feature not supported"
|
||||
(condition refresh) from "operation failed" so genuine errors are not hidden.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-007
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Location | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The OPC UA `UserName`-token password is persisted in cleartext.
|
||||
`UserSettings.Password` is a plain `string`, `JsonSettingsService.Save` serializes
|
||||
the whole settings object to `settings.json` under `LocalApplicationData`, and
|
||||
`SaveSettings()` is invoked after every successful connect and on window close. Any
|
||||
process or user able to read the current user's profile directory can recover the
|
||||
server credentials. `docs/Client.UI.md` documents that "All connection parameters"
|
||||
are persisted but does not flag the password among them.
|
||||
|
||||
**Recommendation:** Do not persist the password in cleartext. Options: omit it from
|
||||
the persisted model entirely (re-prompt each launch); encrypt it at rest with
|
||||
`ProtectedData` (DPAPI) on Windows or an equivalent OS keystore on other platforms;
|
||||
or store only a non-reversible reference. At minimum, document the cleartext
|
||||
storage as a known limitation.
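
The first option (omit the password from the persisted model) can be sketched as follows. This is a minimal illustration, assuming the settings are serialized with `System.Text.Json`; `PersistedSettings` and `SettingsSerializer` are hypothetical names, not the module's actual `UserSettings` / `JsonSettingsService` types.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical settings model; the real UserSettings shape is not shown in this review.
public sealed class PersistedSettings
{
    public string? EndpointUrl { get; set; }
    public string? UserName { get; set; }

    // Kept in memory for the current session only; never written to settings.json.
    [JsonIgnore]
    public string? Password { get; set; }
}

public static class SettingsSerializer
{
    // Serializes every property except those marked [JsonIgnore].
    public static string ToJson(PersistedSettings settings) =>
        JsonSerializer.Serialize(settings);
}
```

With this shape the operator is re-prompted for the password on each launch; encrypting at rest is the alternative when re-prompting is not acceptable.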
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `IOpcUaClientService` is declared `IDisposable`
|
||||
(`IOpcUaClientService.cs:10`), and the concrete service owns an OPC UA session plus
|
||||
SDK resources. `MainWindowViewModel` holds `_service` for the lifetime of the app
|
||||
but never calls `_service.Dispose()`: not on window close, not on disconnect, not
|
||||
anywhere. `DisconnectAsync` calls `DisconnectAsync()` on the service but leaves the
|
||||
object undisposed, and there is no `IDisposable` implementation on
|
||||
`MainWindowViewModel` itself. The OPC UA SDK session, certificate validator, and
|
||||
any background reconnect timers are leaked until process exit. The
|
||||
`ConnectionStateChanged` handler attached at line 130 is also never detached.
|
||||
|
||||
**Recommendation:** Make `MainWindowViewModel` implement `IDisposable`, detach the
|
||||
`ConnectionStateChanged` handler, and dispose `_service` from `MainWindow.OnClosing`
|
||||
(alongside the existing `SaveSettings()` call).
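
A rough sketch of the disposal pattern, under stated assumptions: `IClientService`, `ShellViewModel`, and `FakeClientService` are hypothetical stand-ins for `IOpcUaClientService`, `MainWindowViewModel`, and a test double, and the event signature is simplified to `EventHandler`.

```csharp
using System;

// Hypothetical stand-in for IOpcUaClientService (simplified event signature).
public interface IClientService : IDisposable
{
    event EventHandler? ConnectionStateChanged;
}

// Hypothetical stand-in for MainWindowViewModel.
public sealed class ShellViewModel : IDisposable
{
    private readonly IClientService _service;
    private bool _disposed;

    public ShellViewModel(IClientService service)
    {
        _service = service;
        _service.ConnectionStateChanged += OnConnectionStateChanged;
    }

    private void OnConnectionStateChanged(object? sender, EventArgs e)
    {
        // UI state updates would go here.
    }

    public void Dispose()
    {
        if (_disposed) return; // safe to call more than once
        _disposed = true;

        // Detach first so no callback arrives on a half-disposed view-model.
        _service.ConnectionStateChanged -= OnConnectionStateChanged;
        _service.Dispose();
    }
}

// Test double used only to exercise the sketch.
public sealed class FakeClientService : IClientService
{
    public event EventHandler? ConnectionStateChanged;
    public bool IsDisposed { get; private set; }
    public int SubscriberCount => ConnectionStateChanged?.GetInvocationList().Length ?? 0;
    public void Dispose() => IsDisposed = true;
}
```

The window's closing handler would then call `Dispose()` on the view-model after saving settings.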
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-009
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `ViewModels/HistoryViewModel.cs:44-54` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `HistoryViewModel.AggregateTypes` exposes eight entries: `null`
|
||||
(Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`.
|
||||
`docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average,
|
||||
Minimum, Maximum, Count, Start, End" and omits `StandardDeviation`. The doc is
|
||||
stale relative to the code.
|
||||
|
||||
**Recommendation:** Update the "Aggregate" row in `docs/Client.UI.md` to include
|
||||
Standard Deviation.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-010
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `DateTimeRangePicker` declares `MinDateTimeProperty` /
|
||||
`MaxDateTimeProperty` styled properties with public CLR accessors, but neither is
|
||||
read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and
|
||||
`OnEndLostFocus` never clamp or reject input against the min/max bounds, and no
|
||||
XAML binds them. The properties are dead API surface that implies a range
|
||||
constraint the control does not enforce.
|
||||
|
||||
**Recommendation:** Either implement min/max validation in the `LostFocus` parse
|
||||
path (turn out-of-range input red, as invalid input already is) or remove the two
|
||||
unused styled properties.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Client.UI-011
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The certificate-store-path `TextBox` watermark reads
|
||||
`(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208
|
||||
folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now
|
||||
`{LocalAppData}/OtOpcUaClient/`, and `ClientStoragePaths` migrates the old
|
||||
`LmxOpcUaClient/` folder forward. The watermark shows operators an obsolete path
|
||||
that no longer matches where settings and the PKI store actually live.
|
||||
|
||||
**Recommendation:** Update the watermark to reference `OtOpcUaClient/pki`, or bind
|
||||
it to `ClientStoragePaths.GetPkiPath()` so it cannot drift again.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
192
code-reviews/Configuration/findings.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# Code Review — Configuration
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 11 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Configuration-001, Configuration-002, Configuration-003 |
|
||||
| 2 | OtOpcUa conventions | Configuration-004 |
|
||||
| 3 | Concurrency & thread safety | Configuration-005 |
|
||||
| 4 | Error handling & resilience | Configuration-006, Configuration-007 |
|
||||
| 5 | Security | Configuration-008, Configuration-009, Configuration-010 |
|
||||
| 6 | Performance & resource management | No issues found |
|
||||
| 7 | Design-document adherence | No issues found |
|
||||
| 8 | Code organization & conventions | No issues found |
|
||||
| 9 | Testing coverage | Configuration-011 |
|
||||
| 10 | Documentation & comments | No issues found |
|
||||
|
||||
## Findings
|
||||
|
||||
### Configuration-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure with `RAISERROR(..., 16, 1)` followed by `RETURN`. A severity-16 `RAISERROR` is not a batch-aborting error and `SET XACT_ABORT ON` does not abort the transaction for it, so control returns to `sp_PublishGeneration`, which publishes the draft even though validation rejected it (cross-cluster namespace binding, dangling tag FKs, duplicate external identifiers, EquipmentUuid immutability all pass through). Pre-publish validation is effectively bypassed.
|
||||
|
||||
**Recommendation:** Wrap the `EXEC dbo.sp_ValidateDraft` in `BEGIN TRY ... END TRY BEGIN CATCH ROLLBACK; THROW; END CATCH` so the validation `RAISERROR` propagates and aborts the publish, or have `sp_ValidateDraft` return a result-set/output parameter that `sp_PublishGeneration` inspects and explicitly rolls back on. Add a regression test that publishes a draft with a known violation and asserts it stays `Draft`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `ROLLBACK`. A bare `ROLLBACK` rolls back to the outermost transaction and sets `@@TRANCOUNT` to 0, so when `sp_RollbackToGeneration` later reaches its own `COMMIT` it runs with no open transaction and raises error 3902. The rollback clone is silently discarded and the caller sees a confusing secondary error instead of the real publish failure.
|
||||
|
||||
**Recommendation:** Make `sp_PublishGeneration` transaction-nesting aware: capture `@@TRANCOUNT` on entry, only `BEGIN TRANSACTION` when zero (otherwise `SAVE TRANSACTION`), and only `COMMIT`/`ROLLBACK` the level it owns. Alternatively factor the publish body into an inner proc that assumes an ambient transaction.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up to 64 unused chars, so a legitimately under-200-char path can fail `PathTooLong`. The check also silently `continue`s when an equipment's `UnsLineId`/`UnsAreaId` does not resolve, so an orphaned-line path is never length-checked.
|
||||
|
||||
**Recommendation:** Pass the actual `Enterprise` and `Site` strings into the validator (e.g. on `DraftSnapshot`, or as parameters alongside `ValidateClusterTopology`) and compute the real length. If the cluster row cannot be supplied, document the check as a conservative upper bound or emit a lower-severity warning rather than a hard error.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint`→`int` conversion. Only bits 0–11 are used today, but the underlying-type/storage-type mismatch is a latent trap: a future bit-31 flag yields a `uint` value that overflows `int` and the conversion round-trip would corrupt it.
|
||||
|
||||
**Recommendation:** Change the enum underlying type to `int` (consistent with the docs and the conversion). No high bit is in use, so this is the smaller change.
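
The hazard can be illustrated in isolation. The sketch below uses a hypothetical `SampleFlags` enum (not the real `NodePermissions`) and plain C# casts; which of the two behaviours the EF Core value converter actually exhibits is not established by this review.

```csharp
using System;

public static class FlagStorageDemo
{
    // Hypothetical enum with a bit-31 member, mirroring the uint underlying type.
    [Flags]
    public enum SampleFlags : uint
    {
        None = 0,
        Low = 1u << 0,
        High = 1u << 31,
    }

    // Unchecked narrowing: bit 31 becomes the sign bit, so the stored int is negative.
    public static int ToStoredUnchecked(SampleFlags flags) => unchecked((int)(uint)flags);

    // Checked narrowing: throws OverflowException for any value above int.MaxValue.
    public static int ToStoredChecked(SampleFlags flags) => checked((int)(uint)flags);
}
```

Either outcome (a negative stored value or a runtime exception) is avoided if the enum's underlying type matches the storage type.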
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructor's `EnsureIndex` calls are non-unique, so the storage layer does not prevent the duplicate, and `PruneOldGenerationsAsync`'s `keepLatest` accounting is then off.
|
||||
|
||||
**Recommendation:** Declare a unique index on `(ClusterId, GenerationId)` and treat the duplicate-key exception as an idempotent no-op, or guard `PutAsync` with an instance `SemaphoreSlim`/lock. Document the concurrency contract on `ILocalConfigCache`.
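
The `SemaphoreSlim` option might look like the sketch below. It is a simplified illustration: `GuardedCache` is a hypothetical in-memory stand-in, and the list replaces the LiteDB collection so the example stays self-contained.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the LiteDB-backed cache.
public sealed class GuardedCache
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly List<(string ClusterId, long GenerationId, string Payload)> _rows = new();

    public async Task PutAsync(
        string clusterId, long generationId, string payload, CancellationToken ct = default)
    {
        // Serialize find-then-insert/update so two writers cannot both see "not found".
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            var index = _rows.FindIndex(
                r => r.ClusterId == clusterId && r.GenerationId == generationId);
            if (index >= 0)
                _rows[index] = (clusterId, generationId, payload);
            else
                _rows.Add((clusterId, generationId, payload));
        }
        finally
        {
            _gate.Release();
        }
    }

    // Read only after all writers have completed; not itself synchronized.
    public int Count => _rows.Count;
}
```

An instance-level gate only protects writers that share the same cache object; a unique index is still the stronger guarantee if more than one instance can open the same file.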
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-006
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagates instead of falling back to the sealed cache — the opposite of the documented "fallback on any exception including timeout". The retry `ShouldHandle` predicate has the same shape, so command-timeout cancellations are also not retried consistently.
|
||||
|
||||
**Recommendation:** Distinguish caller cancellation from command-timeout cancellation explicitly: inspect `cancellationToken.IsCancellationRequested` to decide whether an `OperationCanceledException` is a genuine cancel (rethrow) or a timeout (fall back). Add unit tests for both a `TimeoutRejectedException` path and a command-timeout `TaskCanceledException` path asserting cache fallback occurs.
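
A minimal sketch of that distinction, assuming a simplified reader shape: `FallbackReader`, `centralFetch`, and `readFromCache` are illustrative names, and the retry pipeline is left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FallbackReader
{
    public static async Task<T> ReadWithFallbackAsync<T>(
        Func<CancellationToken, Task<T>> centralFetch,
        Func<T> readFromCache,
        CancellationToken cancellationToken)
    {
        try
        {
            return await centralFetch(cancellationToken).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // The caller asked to stop: propagate, do not fall back.
            throw;
        }
        catch (Exception)
        {
            // Everything else, including a TaskCanceledException raised by a command
            // timeout while the caller's token is still live, falls back to the cache.
            return readFromCache();
        }
    }
}
```

The exception filter keys on the caller's token rather than on the exception type, which is what separates a genuine cancel from a timeout that merely surfaces as a cancellation exception.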
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-007
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes instead of stopping. It also masks fatal exceptions. The applier continues applying Added/Modified passes even after a Removed callback failed, leaving a partially-applied runtime state.
|
||||
|
||||
**Recommendation:** Rethrow `OperationCanceledException` rather than recording it as an entity error; call `ct.ThrowIfCancellationRequested()` between passes. Document or enforce whether a failed Removed pass should abort before the Added/Modified passes run.
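
One way the per-callback handling could look, as a sketch only: `ApplyPassSketch.ApplyPassAsync` and its parameters are hypothetical and do not reproduce the real `ApplyPass` signature.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ApplyPassSketch
{
    public static async Task ApplyPassAsync<T>(
        IEnumerable<T> items,
        Func<T, CancellationToken, Task> callback,
        List<string> errors,
        CancellationToken ct)
    {
        foreach (var item in items)
        {
            // Stop before starting more work once cancellation is requested.
            ct.ThrowIfCancellationRequested();
            try
            {
                await callback(item, ct).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                throw; // cancellation is not an entity error
            }
            catch (Exception ex)
            {
                // Ordinary failures are still recorded per entity.
                errors.Add($"{item}: {ex.Message}");
            }
        }
    }
}
```

Whether a non-empty `errors` list after the Removed pass should stop the Added/Modified passes remains a policy decision for the caller.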
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Security |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_ReleaseExternalIdReservation` (`@Kind`, `@Value`). A value with a double-quote or backslash produces malformed JSON; combined with the `CK_ConfigAuditLog_DetailsJson_IsJson` check constraint, the `INSERT` fails the constraint and aborts the surrounding publish/rollback transaction (denial of operation). It is also a JSON-injection vector that can silently rewrite the audit record's shape.
|
||||
|
||||
**Recommendation:** Build the JSON with a safe constructor (`FOR JSON PATH, WITHOUT_ARRAY_WRAPPER` or `JSON_OBJECT(...)` on SQL Server 2022+) so values are properly escaped, or run each interpolated value through `STRING_ESCAPE(@Value, 'json')`. Add tests with quote/backslash-containing inputs.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-009
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords and, if live for the shared dev SQL Server, grants `sa` to anyone with repo access. `TrustServerCertificate=True` plus `Encrypt=False` additionally disables transport protection for that connection.
|
||||
|
||||
**Recommendation:** Drop the embedded credential. Fall back to integrated auth (`Trusted_Connection=True`) or fail fast with a message instructing the developer to set `OTOPCUA_CONFIG_CONNECTION`. Rotate the dev `sa` password if this value is live.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-010
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Security |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying server/credential context) is logged verbatim. Connection-string fragments are not scrubbed before logging, which violates the project's no-secret-logging rule.
|
||||
|
||||
**Recommendation:** Log `ex.GetType().Name` and `ex.Message` for SQL failures rather than the full exception, or run exception messages through a connection-string scrubber before they reach the log sink.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Configuration-011
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test asserts the Added/Modified passes still run or that cancellation aborts; (b) `ValidatePathLength`'s constant 32+32 approximation — no test exercises a long Enterprise/Site. The publish-bypasses-validation bug (Configuration-001) is also untested against the live SQL fixture.
|
||||
|
||||
**Recommendation:** Add `GenerationApplierTests` cases for a throwing callback (assert error recorded, assert cancellation propagates) and a `DraftValidatorTests` path-length boundary case. Add a `StoredProceduresTests` case that publishes an invalid draft and asserts it stays `Draft`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
156
code-reviews/Core.Abstractions/findings.md
Normal file
@@ -0,0 +1,156 @@
|
||||
# Code Review — Core.Abstractions
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 8 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
|
||||
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Core.Abstractions-001, Core.Abstractions-002 |
|
||||
| 2 | OtOpcUa conventions | No issues found |
|
||||
| 3 | Concurrency & thread safety | Core.Abstractions-003, Core.Abstractions-004 |
|
||||
| 4 | Error handling & resilience | Core.Abstractions-005 |
|
||||
| 5 | Security | No issues found |
|
||||
| 6 | Performance & resource management | No issues found |
|
||||
| 7 | Design-document adherence | No issues found |
|
||||
| 8 | Code organization & conventions | Core.Abstractions-006 |
|
||||
| 9 | Testing coverage | Core.Abstractions-007 |
|
||||
| 10 | Documentation & comments | Core.Abstractions-008 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Core.Abstractions-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `PollOnceAsync` detects a change with `!Equals(lastSeen?.Value, current.Value)`. `object.Equals` falls back to reference equality for reference types that do not override it — including `T[]` array values. The capability interfaces explicitly support 1-D array attributes (`DriverAttributeInfo.IsArray`, `ValueRank=1`), and a driver's batch reader produces a fresh array instance on every poll. As a result every poll of an array-valued tag is treated as a change, so `OnDataChange` fires on every tick regardless of whether the array contents actually changed. This produces spurious data-change notifications and unnecessary OPC UA monitored-item publishes for any poll-based driver (Modbus, S7, AB CIP, FOCAS) that exposes array tags.
|
||||
|
||||
**Recommendation:** Compare array values structurally — e.g. when both `lastSeen?.Value` and `current.Value` are arrays, compare with `StructuralComparisons.StructuralEqualityComparer.Equals` (or element-wise) — instead of relying on `object.Equals`. Add a test covering an array-valued tag whose contents are unchanged across polls.
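
A minimal sketch of the structural comparison, assuming 1-D arrays only (the case this finding describes). `SnapshotValueComparer.ValuesEqual` is a hypothetical helper name, not an existing member of `PollGroupEngine`.

```csharp
using System;
using System.Collections;

public static class SnapshotValueComparer
{
    // Returns true when two polled values should be treated as unchanged.
    public static bool ValuesEqual(object? previous, object? current)
    {
        if (previous is Array && current is Array)
        {
            // Element-wise comparison: a fresh array instance with the same
            // contents compares equal. Intended for 1-D arrays only.
            return StructuralComparisons.StructuralEqualityComparer.Equals(previous, current);
        }

        // Scalars and other types keep the existing object.Equals semantics.
        return Equals(previous, current);
    }
}
```

The change check would then read `!SnapshotValueComparer.ValuesEqual(lastSeen?.Value, current.Value)`, so unchanged array contents no longer count as a change.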
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Core.Abstractions-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `PollOnceAsync` iterates `state.TagReferences` and indexes the reader's result with `snapshots[i]`, assuming the driver-supplied `_reader` delegate returns exactly one snapshot per input reference in input order. The contract is documented (ctor XML doc: "snapshots MUST be returned in the same order as the input references"), but it is never validated. A reader that returns a shorter list — a plausible driver bug, or a partial result on a backend error — throws `ArgumentOutOfRangeException` from the indexer. That exception escapes `PollOnceAsync`, is swallowed by the catch-all in `PollLoopAsync` (line 99), and the subscription then silently produces no further `OnDataChange` callbacks for the rest of its lifetime with no diagnostic. The failure mode is a permanently stalled subscription that looks healthy.
|
||||
|
||||
**Recommendation:** Validate `snapshots.Count == state.TagReferences.Count` at the top of `PollOnceAsync` and throw a descriptive exception (or skip the tick with a logged diagnostic) so the contract violation is visible rather than silently degrading. Consider surfacing repeated reader-contract failures through a callback the driver can route to its health surface.
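
The guard itself is small. The sketch below is illustrative: `ReaderContract.EnsureOnePerReference` is a hypothetical helper, and tag references are assumed to be strings.

```csharp
using System;
using System.Collections.Generic;

public static class ReaderContract
{
    // Throws a descriptive exception when the reader breaks the
    // one-snapshot-per-reference contract, instead of failing later in an indexer.
    public static void EnsureOnePerReference<TSnapshot>(
        IReadOnlyList<string> tagReferences,
        IReadOnlyList<TSnapshot>? snapshots)
    {
        if (snapshots is null)
        {
            throw new InvalidOperationException(
                $"Reader returned null for {tagReferences.Count} tag reference(s).");
        }

        if (snapshots.Count != tagReferences.Count)
        {
            throw new InvalidOperationException(
                $"Reader returned {snapshots.Count} snapshot(s) for " +
                $"{tagReferences.Count} tag reference(s); the counts must match.");
        }
    }
}
```

A descriptive exception only helps if something observes it, so this guard is most useful together with an error callback such as the one recommended in Core.Abstractions-005.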
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Core.Abstractions-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `Subscribe` starts the poll loop with a fire-and-forget `Task.Run` and keeps no reference to the returned `Task`. Neither `Unsubscribe` nor `DisposeAsync` awaits the loop's completion — they only cancel the `CancellationTokenSource` and dispose it. Two consequences:
|
||||
|
||||
1. After `DisposeAsync`/`Unsubscribe` returns, a poll already in flight inside `PollOnceAsync` can still complete and invoke the `_onChange` callback. A driver that disposes the engine during shutdown can therefore receive a data-change callback after it considers the engine torn down, with no way to know the engine is gone.
|
||||
2. `Unsubscribe`/`DisposeAsync` call `state.Cts.Dispose()` immediately while the loop may still be inside `Task.Delay(state.Interval, ct)`. Cancelling-then-disposing a CTS while a consumer still touches the token can race; `Task.Delay` on a disposed token can throw `ObjectDisposedException` rather than `OperationCanceledException`, which the `Task.Delay` await in `PollLoopAsync` does not catch (it catches only `OperationCanceledException`).
|
||||
|
||||
**Recommendation:** Track each loop `Task` in `SubscriptionState` and await it (with a timeout) in `Unsubscribe`/`DisposeAsync` before disposing the CTS, so disposal is deterministic and no callback can fire after teardown. At minimum, defer `Cts.Dispose()` until the loop task has observed cancellation, or wrap the `Task.Delay` await to also tolerate `ObjectDisposedException`.
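
A sketch of deterministic teardown for a single loop, under stated assumptions: `PollLoopHandle` is a hypothetical type standing in for one `SubscriptionState`, the five-second timeout is arbitrary, and `Task.WaitAsync` requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one subscription's poll loop.
public sealed class PollLoopHandle : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private readonly Task _loop;
    private int _disposed;

    public PollLoopHandle(Func<CancellationToken, Task> pollOnce, TimeSpan interval)
    {
        var token = _cts.Token; // capture the token once, before the loop starts
        _loop = Task.Run(() => RunAsync(pollOnce, interval, token));
    }

    private static async Task RunAsync(
        Func<CancellationToken, Task> pollOnce, TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (true)
            {
                ct.ThrowIfCancellationRequested();
                await pollOnce(ct).ConfigureAwait(false);
                await Task.Delay(interval, ct).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Normal shutdown path.
        }
        // Note: any other exception from pollOnce faults the loop task and
        // resurfaces from DisposeAsync; this sketch does not handle poll errors.
    }

    public async ValueTask DisposeAsync()
    {
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;

        _cts.Cancel();
        try
        {
            // Wait for the loop to observe cancellation before disposing the CTS.
            await _loop.WaitAsync(TimeSpan.FromSeconds(5)).ConfigureAwait(false);
        }
        catch (TimeoutException)
        {
            return; // loop still running: leave the CTS undisposed rather than race it
        }

        _cts.Dispose();
    }
}
```

Once `DisposeAsync` returns without timing out, the loop has exited, so no further `pollOnce` invocation (and therefore no change callback) can occur.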

**Resolution:** _(open)_

### Core.Abstractions-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` |
| Status | Open |

**Description:** `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` dictionary; the second `Interlocked.Exchange` then wins and silently discards the first registration, defeating the documented "registered only once" guarantee. The class XML doc states registration happens single-threaded at startup, so this is not a live defect, but the use of `Interlocked.Exchange` for the swap implies the type is fully thread-safe for writers, which it is not. The mismatch between the implementation's apparent intent and its actual guarantee is a maintenance hazard.

**Recommendation:** Either guard `Register` with a `lock` so the check-build-swap is atomic, or strengthen the XML `Thread-safety` remark to state explicitly that concurrent `Register` calls are unsupported and only reader/writer concurrency is safe.
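
The `lock` option could be sketched as follows. The key and value types, the comparer, and the `TryGet` reader are assumptions for illustration; the real registry's members may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of a copy-on-write registry with serialized writers.
public sealed class DriverTypeRegistrySketch
{
    private readonly object _writeLock = new();
    private Dictionary<string, Type> _snapshot = new(StringComparer.OrdinalIgnoreCase);

    public void Register(string typeName, Type driverType)
    {
        // The lock makes check + build + swap a single atomic step for writers.
        lock (_writeLock)
        {
            var snapshot = _snapshot;
            if (snapshot.ContainsKey(typeName))
                throw new InvalidOperationException($"Driver type '{typeName}' is already registered.");

            var next = new Dictionary<string, Type>(snapshot, snapshot.Comparer)
            {
                [typeName] = driverType,
            };
            Volatile.Write(ref _snapshot, next);
        }
    }

    // Readers stay lock-free: they only ever see a fully built snapshot.
    public bool TryGet(string typeName, out Type? driverType) =>
        Volatile.Read(ref _snapshot).TryGetValue(typeName, out driverType);
}
```

Readers pay nothing for the lock, so the cost is limited to startup-time registration.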

**Resolution:** _(open)_

### Core.Abstractions-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` |
| Status | Open |

**Description:** Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstractions-002). The XML remark says "transient poll error — loop continues, driver health surface logs it", but the engine never actually notifies the driver: there is no callback or event for a caught exception, so the driver's health surface has nothing to log. A persistently failing reader produces a silently spinning loop with zero observability from inside this module.

**Recommendation:** Narrow the catch to the exception types a reader is expected to throw (or at least exclude obviously-fatal ones), and add an optional `Action<Exception>` error callback (or raise an event) so the owning driver can record poll failures on its health surface as the doc claims happens.
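
One possible shape for the narrowed catch plus error callback is sketched below. The `RunAsync` signature and the choice of which exception types count as fatal are assumptions; the real `PollLoopAsync` has its own parameters.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollLoopSketch
{
    // Hypothetical poll loop: programmer errors escape, everything else is
    // reported through onError and the loop keeps going.
    public static async Task RunAsync(
        Func<CancellationToken, Task> pollOnce,
        TimeSpan interval,
        Action<Exception>? onError,
        CancellationToken ct)
    {
        while (true)
        {
            try
            {
                await pollOnce(ct).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                return;
            }
            catch (Exception ex) when (ex is not (NullReferenceException or ArgumentOutOfRangeException))
            {
                // Surface the failure so the owning driver can record it.
                onError?.Invoke(ex);
            }

            try
            {
                await Task.Delay(interval, ct).ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                return;
            }
        }
    }
}
```

Keeping the delay outside the poll `try` means a failing reader still waits one interval between attempts instead of spinning.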

**Resolution:** _(open)_

### Core.Abstractions-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` |
| Status | Open |

**Description:** The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfaces) takes `int maxEvents`. The OPC UA `HistoryRead` service request fields are unsigned, and a negative `maxEvents` has no defined meaning. Mixing `int` and `uint` for the same parameter role across sibling methods forces every caller and implementer to reason about the inconsistency and risks accidental sign issues at the boundary.

**Recommendation:** Standardize on `uint` for all max-rows parameters across both `IHistoryProvider` and `IHistorianDataSource` (or document explicitly why `maxEvents` differs).

**Resolution:** _(open)_

### Core.Abstractions-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` |
| Status | Open |

**Description:** `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are unchanged across polls (would catch Core.Abstractions-001), and (b) no test exercises a reader that returns a snapshot list shorter than the input references (would catch Core.Abstractions-002). The `Reader_exception_does_not_crash_loop` test only covers a reader that throws before producing any result. `DataValueSnapshot` change-detection semantics for reference-typed values are therefore unverified.

**Recommendation:** Add tests for the unchanged-array case and the short-result-list case once Core.Abstractions-001/002 are addressed, so the intended contract is locked down.

**Resolution:** _(open)_

### Core.Abstractions-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` |
| Status | Open |

**Description:** Two XML-doc inaccuracies:

1. `DriverHealth.LastError` is documented as "Most recent error message; null when state is Healthy." The `DriverState` enum also defines `Degraded`, `Reconnecting`, and `Faulted` states, all of which carry an error; and a driver in `Healthy` state may legitimately retain the last error from a previously-recovered failure. The "null when Healthy" claim is stronger than the type enforces and than callers should rely on.
2. `IHistoryProvider.ReadAtTimeAsync` and `ReadEventsAsync` are C# default interface methods whose `<remarks>` say "Default implementation throws". This is accurate, but the sibling `IHistorianDataSource` declares the same methods as required (non-default) members; the asymmetry between the two history surfaces is undocumented and could surprise an implementer who assumes parity.

**Recommendation:** Reword `DriverHealth.LastError` to "Most recent error message; may be null when no error has been recorded" without tying nullness to a specific state. Add a one-line note on `IHistoryProvider`/`IHistorianDataSource` explaining why one surface uses default methods and the other does not.

**Resolution:** _(open)_

192
code-reviews/Core.AlarmHistorian/findings.md
Normal file
@@ -0,0 +1,192 @@
# Code Review — Core.AlarmHistorian

| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.AlarmHistorian-001, Core.AlarmHistorian-002 |
| 2 | OtOpcUa conventions | Core.AlarmHistorian-003 |
| 3 | Concurrency & thread safety | Core.AlarmHistorian-004, Core.AlarmHistorian-005 |
| 4 | Error handling & resilience | Core.AlarmHistorian-006, Core.AlarmHistorian-007 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Core.AlarmHistorian-008 |
| 7 | Design-document adherence | Core.AlarmHistorian-009 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Core.AlarmHistorian-010 |
| 10 | Documentation & comments | Core.AlarmHistorian-011 |

## Findings

### Core.AlarmHistorian-001

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:255-278` |
| Status | Open |

**Description:** `ReadBatch` builds two parallel lists, `rowIds` and `events`, that `DrainOnceAsync` later indexes together (`rowIds[i]` paired with `outcomes[i]`, where `outcomes` is 1:1 with `events`). But `rowIds.Add(reader.GetInt64(0))` runs unconditionally for every row, while `events.Add(evt)` is guarded by `if (evt is not null)`. If `JsonSerializer.Deserialize<AlarmHistorianEvent>` returns `null` for any row (corrupt or empty payload), `rowIds` gains an entry but `events` does not. The writer then returns an `outcomes` list whose count equals `events.Count`, which passes the `outcomes.Count != events.Count` guard, and the per-row loop applies each outcome to `rowIds[i]`; every row from the skipped index onward is mapped to the wrong event's outcome. An `Ack` can delete a row whose event was never sent to the historian (silent alarm-event data loss), and a `PermanentFail` can dead-letter an unrelated good row. The corrupt row itself is never advanced and is re-read on every drain forever, permanently stalling the queue head.

**Recommendation:** Keep `rowIds` and `events` strictly aligned. Either skip the `rowId` when deserialization returns `null`, or, better, treat a `null`/failed deserialization as an immediate dead-letter for that specific `RowId` (it can never succeed) and exclude it from the batch passed to the writer. Carry the `rowId` inside a single list of `(long RowId, AlarmHistorianEvent Event)` tuples so the two can never drift.
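
The single-list approach could be sketched like this. `AlarmEventSketch` is a hypothetical stand-in for `AlarmHistorianEvent`, and `Split` takes already-read rows rather than a `SqliteDataReader` so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for AlarmHistorianEvent.
public sealed record AlarmEventSketch(string AlarmId, string Message);

public static class BatchReaderSketch
{
    // Pairs every row id with its event in one list, so an outcome can only ever
    // be applied to the row that produced the event. Rows that cannot be
    // deserialized are returned separately for immediate dead-lettering.
    public static (List<(long RowId, AlarmEventSketch Event)> Batch, List<long> Corrupt) Split(
        IEnumerable<(long RowId, string PayloadJson)> rows)
    {
        var batch = new List<(long RowId, AlarmEventSketch Event)>();
        var corrupt = new List<long>();

        foreach (var (rowId, json) in rows)
        {
            AlarmEventSketch? evt = null;
            try
            {
                evt = JsonSerializer.Deserialize<AlarmEventSketch>(json);
            }
            catch (JsonException)
            {
                // Malformed payload: handled as corrupt below.
            }

            if (evt is null) corrupt.Add(rowId);
            else batch.Add((rowId, evt));
        }

        return (batch, corrupt);
    }
}
```

The writer then receives only `Batch`, and the outcome loop iterates the same list, so the index mismatch described above cannot occur.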

**Resolution:** _(open)_

### Core.AlarmHistorian-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:99-105,386-388` |
| Status | Open |

**Description:** The class computes an exponential-backoff value (`_backoffIndex`, `BumpBackoff`, `CurrentBackoff`, the `BackoffLadder`) and the class doc-comment states "Drain runs on a shared `Timer`. Exponential backoff on `RetryPlease`: 1s → 2s → 5s → 15s → 60s cap." However `StartDrainLoop` creates the `Timer` with a fixed `tickInterval` for both due-time and period and never reschedules it. `CurrentBackoff` is computed but never consulted by the timer, so the drain loop keeps hammering the historian at the fixed cadence regardless of `BackingOff` state. The documented backoff behavior does not exist for the production drain path; it is only observable via the `CurrentBackoff` property in tests.

**Recommendation:** Make the drain loop honor the backoff. Either switch to a self-rescheduling one-shot timer that sets its next due-time to `max(tickInterval, CurrentBackoff)` after each `DrainOnceAsync`, or have `DrainOnceAsync` skip the writer call while still inside the backoff window (track `_nextEligibleDrainUtc`). Update the doc-comment if the design intentionally changes.
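
A self-rescheduling one-shot timer along those lines might look like the sketch below. The class name, the `Func<..., Task<bool>>` drain delegate (where `false` stands in for a `RetryPlease` outcome), and the public `NextDelay` helper are assumptions for illustration; only the ladder values come from the doc-comment quoted above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; not the actual SqliteStoreAndForwardSink code.
public sealed class BackoffDrainLoopSketch : IDisposable
{
    // Ladder from the class doc-comment: 1s, 2s, 5s, 15s, 60s cap.
    private static readonly TimeSpan[] BackoffLadder =
    {
        TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(5),
        TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60),
    };

    private readonly Func<CancellationToken, Task<bool>> _drainOnce; // false = retry requested
    private readonly TimeSpan _tickInterval;
    private readonly Timer _timer;
    private int _backoffIndex = -1;

    public BackoffDrainLoopSketch(Func<CancellationToken, Task<bool>> drainOnce, TimeSpan tickInterval)
    {
        _drainOnce = drainOnce;
        _tickInterval = tickInterval;
        // Created disarmed and non-periodic; each tick re-arms it exactly once.
        _timer = new Timer(OnTick, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    public void Start() => _timer.Change(_tickInterval, Timeout.InfiniteTimeSpan);

    // Next due-time is max(tickInterval, current backoff); success resets the ladder.
    public TimeSpan NextDelay(bool lastDrainSucceeded)
    {
        if (lastDrainSucceeded)
        {
            _backoffIndex = -1;
            return _tickInterval;
        }

        _backoffIndex = Math.Min(_backoffIndex + 1, BackoffLadder.Length - 1);
        var backoff = BackoffLadder[_backoffIndex];
        return backoff > _tickInterval ? backoff : _tickInterval;
    }

    private async void OnTick(object? state)
    {
        var succeeded = false;
        try
        {
            succeeded = await _drainOnce(CancellationToken.None).ConfigureAwait(false);
        }
        catch (Exception)
        {
            // A real implementation would log here; the sketch treats it as a failed drain.
        }

        try
        {
            _timer.Change(NextDelay(succeeded), Timeout.InfiniteTimeSpan);
        }
        catch (ObjectDisposedException)
        {
            // Disposed while a drain was in flight; nothing left to schedule.
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

Because the timer is one-shot, ticks cannot overlap, which also removes the need to guard `_backoffIndex` in this sketch.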

**Resolution:** _(open)_

### Core.AlarmHistorian-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` |
| Status | Open |

**Description:** `EnqueueAsync` is declared `async`-shaped (`Task EnqueueAsync(...)`) and the `IAlarmHistorianSink` contract explicitly states "the sink MUST NOT block the emitting thread … `EnqueueAsync` returns as soon as the queue row is committed." But the implementation does fully synchronous, blocking SQLite I/O (`conn.Open()`, `EnforceCapacity`, `cmd.ExecuteNonQuery()`) on the caller's thread and only then returns `Task.CompletedTask`. Under SQLite write contention with the drain worker this blocks the alarm-emitting thread for the full lock-wait. The same synchronous-work-behind-an-async-or-status-API pattern applies to `GetStatus` (called from the Admin UI / `/healthz` request thread) and `RetryDeadLettered`. The `cancellationToken` parameter of `EnqueueAsync` is accepted and ignored.

**Recommendation:** Move the database work off the emitting thread. Switching to `await conn.OpenAsync(ct)` / `await cmd.ExecuteNonQueryAsync(ct)` is not sufficient on its own: `Microsoft.Data.Sqlite` exposes the async ADO.NET surface but executes it synchronously, because SQLite has no asynchronous I/O. Prefer changing `EnqueueAsync` to an in-memory hand-off (e.g. a `Channel`) drained by a background writer so the emitting thread truly never touches the database. At minimum honor the `cancellationToken` parameter.
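
The `Channel` hand-off could be sketched as below. Everything here is hypothetical (class name, capacity handling, the `persist` delegate standing in for the SQLite insert), and one trade-off should be stated plainly: with an in-memory hand-off, `EnqueueAsync` returns before the row is committed, so the contract wording would need to change or the channel would need its own durability story.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical sketch of an in-memory hand-off in front of the SQLite queue.
public sealed class ChannelEnqueueSketch<TEvent>
{
    private readonly Channel<TEvent> _channel;
    private readonly Func<TEvent, CancellationToken, Task> _persist;

    public ChannelEnqueueSketch(int capacity, Func<TEvent, CancellationToken, Task> persist)
    {
        _channel = Channel.CreateBounded<TEvent>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait, // back-pressure instead of dropping
        });
        _persist = persist;
    }

    // Called on the alarm-emitting thread; never touches the database and
    // honors the caller's cancellation token.
    public ValueTask EnqueueAsync(TEvent evt, CancellationToken ct) =>
        _channel.Writer.WriteAsync(evt, ct);

    public void Complete() => _channel.Writer.Complete();

    // Runs on a background task; this is the only place blocking I/O happens.
    public async Task RunWriterAsync(CancellationToken ct)
    {
        await foreach (var evt in _channel.Reader.ReadAllAsync(ct).ConfigureAwait(false))
        {
            await _persist(evt, ct).ConfigureAwait(false);
        }
    }
}
```

In the real sink the `persist` delegate would perform the insert and capacity check on the writer task, keeping all SQLite access on one thread.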

**Resolution:** _(open)_

### Core.AlarmHistorian-004

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:90,112,176,259` |
| Status | Open |

**Description:** Every operation opens a brand-new `SqliteConnection` from the bare connection string `Data Source={databasePath}`, with no `busy_timeout` pragma and no shared cache. SQLite serializes writers with a file lock; when `EnqueueAsync` (emitting thread) and `DrainOnceAsync` (timer thread) collide, the loser gets an immediate `SQLITE_BUSY` exception because the default busy timeout is 0. In `DrainOnceAsync` the `BeginTransaction()` / `Commit()` block can fail mid-drain with `SQLITE_BUSY`; the exception escapes the `try` (it is not the writer-call `try`), the `finally` releases the gate, and the row outcomes are lost / partially applied. The class doc-comment claims `DrainOnceAsync` is "Safe to call from multiple threads" but the concurrent enqueue-vs-drain case is not actually safe against busy errors.

**Recommendation:** Configure a non-zero busy timeout: `SqliteConnectionStringBuilder { DataSource = databasePath, DefaultTimeout = 5 }` plus `PRAGMA busy_timeout=5000` on open. Strongly consider WAL journal mode (`PRAGMA journal_mode=WAL`) so readers and the writer do not block each other. Reuse a single long-lived write connection guarded by `_drainGate` rather than opening/closing per call.

**Resolution:** _(open)_

### Core.AlarmHistorian-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` |
| Status | Open |

**Description:** The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin-UI / health-check threads with no memory barrier (no `lock`, no `volatile`, no `Interlocked`). `DateTime?` is not guaranteed to be written atomically, and the reader can observe a stale or torn value. This is a diagnostics surface so the impact is limited, but a torn `DateTime?` read can surface a value that was never written.

**Recommendation:** Guard the status fields with a small lock, or make the scalars `volatile` where the type permits and snapshot `DateTime?` values under a lock. Take the snapshot atomically in `GetStatus()`.
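
A lock-guarded snapshot could be as small as the sketch below. The record shape and the `RecordDrain` method are assumptions for illustration (the real `HistorianSinkStatus` carries more fields), and `record struct` requires C# 10.

```csharp
using System;

// Hypothetical immutable view of the drain status.
public readonly record struct DrainStatusSnapshot(
    DateTime? LastDrainUtc,
    DateTime? LastSuccessUtc,
    string? LastError,
    int BackoffIndex);

public sealed class DrainStatusSketch
{
    private readonly object _sync = new();
    private DrainStatusSnapshot _current;

    // Called by the drain thread; all fields change together under one lock.
    public void RecordDrain(DateTime utcNow, bool success, string? error)
    {
        lock (_sync)
        {
            _current = new DrainStatusSnapshot(
                LastDrainUtc: utcNow,
                LastSuccessUtc: success ? utcNow : _current.LastSuccessUtc,
                LastError: success ? null : error,
                BackoffIndex: success ? 0 : _current.BackoffIndex + 1);
        }
    }

    // Called by Admin UI / health-check threads; returns a consistent copy.
    public DrainStatusSnapshot GetStatus()
    {
        lock (_sync)
        {
            return _current;
        }
    }
}
```

Readers get a value copy, so nothing they hold can change underneath them after `GetStatus()` returns.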

**Resolution:** _(open)_

### Core.AlarmHistorian-006

| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` |
| Status | Open |

**Description:** `StartDrainLoop` launches the drain with `new Timer(_ => _ = DrainOnceAsync(CancellationToken.None), ...)`. The returned `Task` is discarded (`_ =`), so any exception thrown by `DrainOnceAsync` is an unobserved task exception: never logged, never surfaced. Several paths in `DrainOnceAsync` can throw: the `outcomes.Count != events.Count` guard (`InvalidOperationException`), `JsonSerializer.Deserialize` on a malformed payload, `PurgeAgedDeadLetters` / `ReadBatch` / the commit block hitting `SQLITE_BUSY` or a schema error. When any of these throw, the drain silently stops making progress for that tick, `_drainState` is left stale (still `Draining`), and an operator watching the Admin UI sees no error. A persistently failing condition produces a silent, permanently stalled queue.

**Recommendation:** Wrap the timer callback body in a `try/catch` that logs the exception via `_logger.Error`, records it into `_lastError`, and resets `_drainState` so the diagnostics surface reflects the failure. Do not discard the `Task` without an attached continuation that observes faults.
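
One way to make the discarded task observe its own faults is a small wrapper like the sketch below. `RunObservedAsync` and its `onFault` callback are hypothetical; in the real sink the callback would log, set `_lastError`, and reset `_drainState`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ObservedDrainSketch
{
    // Awaits the drain and routes any exception to onFault, so the task returned
    // here never faults and can be safely discarded by a timer callback.
    public static async Task RunObservedAsync(
        Func<CancellationToken, Task> drainOnce,
        Action<Exception> onFault)
    {
        try
        {
            await drainOnce(CancellationToken.None).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            onFault(ex);
        }
    }

    // Assumed wiring, mirroring the timer construction quoted in the finding:
    //   new Timer(_ => _ = RunObservedAsync(DrainOnceAsync, RecordDrainFault), ...)
    // where RecordDrainFault is a hypothetical method on the sink.
}
```

The `try` also covers exceptions thrown synchronously by `drainOnce` before its first await.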

**Resolution:** _(open)_

### Core.AlarmHistorian-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` |
| Status | Open |

**Description:** When the writer returns a wrong-cardinality result, the code throws `InvalidOperationException` after `WriteBatchAsync` has already succeeded. The events were potentially delivered to the historian, but no rows are deleted or dead-lettered, `_drainState` is left at `Draining`, and the backoff is not bumped. Combined with Core.AlarmHistorian-006 the exception is then swallowed. On the next drain the same batch is re-sent: if the writer actually delivered the events the first time, this produces duplicate historian rows; if it is a deterministic writer bug the queue stalls forever.

**Recommendation:** Treat a cardinality mismatch as a transient batch failure: log it, set `_lastError`, bump backoff, set `_drainState = BackingOff`, and return without throwing, mirroring the writer-exception path at lines 162-170. A deterministic writer contract violation should also raise an operator-visible alert rather than silently looping.

**Resolution:** _(open)_

### Core.AlarmHistorian-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` |
| Status | Open |

**Description:** Each `EnqueueAsync` (one per alarm transition, a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The unconditional `COUNT(*)` on every enqueue is an avoidable scan; the open/close churn defeats connection pooling benefits and adds lock-acquisition overhead per event. `DrainOnceAsync` similarly opens three separate connections per tick (`PurgeAgedDeadLetters`, `ReadBatch`, the transaction block).

**Recommendation:** Reuse a single pooled write connection. Replace the per-enqueue `COUNT(*)` with a periodic capacity check (every Nth enqueue, or piggy-backed on the drain tick), or maintain an in-memory approximate counter. Combine the drain-tick connections into one.

**Resolution:** _(open)_

### Core.AlarmHistorian-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` |
| Status | Open |

**Description:** `docs/AlarmTracking.md` and the `IAlarmHistorianSink` contract present the SQLite queue as the durability guarantee: "Durably enqueue the event", "operator acks never block on the historian being reachable". But `EnforceCapacity` silently deletes the oldest non-dead-lettered (not-yet-sent) rows when the queue reaches `DefaultCapacity` (1,000,000). Those are alarm-event records that were accepted as durably queued and are then dropped before ever reaching the historian, which is silent alarm-history data loss under sustained historian outage. The only signal is a `WARN` log line. Neither `docs/AlarmTracking.md` nor the sink's XML doc mentions that the durability guarantee is bounded, and there is no metric/dead-letter trail for evicted rows.

**Recommendation:** At minimum document the bounded-durability behavior in `docs/AlarmTracking.md` and the `IAlarmHistorianSink` summary. Better: surface evicted-row counts in `HistorianSinkStatus` (a dedicated counter) so the loss is operator-visible, and consider routing overflow to the dead-letter table instead of hard-deleting it so the records survive for post-mortem within the retention window.

**Resolution:** _(open)_

### Core.AlarmHistorian-010

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` |
| Status | Open |

**Description:** The test suite covers the happy paths well (Ack/Retry/PermanentFail, capacity eviction, retention purge, ctor validation) but leaves critical paths untested: (a) no test exercises a corrupt / `null`-deserializing `PayloadJson` row, so the `rowIds`/`events` misalignment bug (Core.AlarmHistorian-001) was not caught; (b) no test for `StartDrainLoop` actually running on the timer, nor for the backoff being honored by the schedule (Core.AlarmHistorian-002); (c) no concurrency test running `EnqueueAsync` and `DrainOnceAsync` in parallel, which is the exact scenario that triggers `SQLITE_BUSY` (Core.AlarmHistorian-004); (d) no test for the `outcomes.Count != events.Count` cardinality-mismatch branch (Core.AlarmHistorian-007).

**Recommendation:** Add tests for: a corrupt payload row (insert raw bad JSON via a direct SQLite write, then drain and assert the correct row is dead-lettered and others are unaffected); a `FakeWriter` returning a wrong-length outcome list; a parallel enqueue/drain stress test; and the timer-driven `StartDrainLoop` path.

**Resolution:** _(open)_

### Core.AlarmHistorian-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` |
| Status | Open |

**Description:** Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `docs/AlarmTracking.md` and `CLAUDE.md` state the legacy `Galaxy.Host` project was retired in PR 7.2 and the write path is now the Wonderware historian sidecar (`WonderwareHistorianClient`). `AlarmHistorianEvent.cs:20` likewise says "the Galaxy.Host handler maps to the historian's enum on the wire." These stale references will mislead a reader about where the writer is actually hosted.

**Recommendation:** Update the doc-comments to refer to the Wonderware historian sidecar / `WonderwareHistorianClient` (`IAlarmHistorianWriter` implementation) instead of `Galaxy.Host`, consistent with `docs/AlarmTracking.md`'s "Historian write-back" section.

**Resolution:** _(open)_

210
code-reviews/Core.ScriptedAlarms/findings.md
Normal file
@@ -0,0 +1,210 @@
# Code Review — Core.ScriptedAlarms

| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.ScriptedAlarms-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Core.ScriptedAlarms-001, Core.ScriptedAlarms-004, Core.ScriptedAlarms-005, Core.ScriptedAlarms-006 |
| 4 | Error handling & resilience | Core.ScriptedAlarms-007 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Core.ScriptedAlarms-008, Core.ScriptedAlarms-009 |
| 7 | Design-document adherence | Core.ScriptedAlarms-010 |
| 8 | Code organization & conventions | Core.ScriptedAlarms-011 |
| 9 | Testing coverage | Core.ScriptedAlarms-012 |
| 10 | Documentation & comments | Core.ScriptedAlarms-003 |

## Findings

### Core.ScriptedAlarms-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` |
| Status | Open |

**Description:** `_alarms` is a plain `Dictionary<string, AlarmState>` (line 42). Every mutation of it (`LoadAsync`, `ApplyAsync`, `ReevaluateAsync`, `ShelvingCheckAsync`) correctly happens under the `_evalGate` semaphore, but four read paths touch it with no synchronisation: `GetState` (line 175), `GetAllStates` (lines 178-179), the `LoadedAlarmIds` property (line 73), and `RunShelvingCheck` (line 368, `_alarms.Keys.ToArray()`). `RunShelvingCheck` fires from a `Timer` thread-pool callback and can run concurrently with an `ApplyAsync`/`ReevaluateAsync` that is reassigning a dictionary entry. `Dictionary` is not safe for concurrent read while another thread writes; even a value reassignment can be observed mid-rehash and throw `InvalidOperationException` or return torn state. `GetState`/`GetAllStates` are documented as being used by the Admin UI status page, so these reads come from arbitrary request threads.

**Recommendation:** Either switch `_alarms` to `ConcurrentDictionary<string, AlarmState>` (entry reassignment via `_alarms[id] = ...` is already the only write shape, which a `ConcurrentDictionary` supports atomically), or acquire `_evalGate` in every reader. A `ConcurrentDictionary` is the lighter change and matches `_valueCache`, which is already concurrent.
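
The `ConcurrentDictionary` option could be sketched as follows. `AlarmStateSketch` is a hypothetical stand-in for `AlarmState`, and the member names mirror the read paths listed above rather than the engine's actual signatures.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for AlarmState.
public sealed record AlarmStateSketch(string AlarmId, bool Active);

public sealed class AlarmStateTableSketch
{
    private readonly ConcurrentDictionary<string, AlarmStateSketch> _alarms = new(StringComparer.Ordinal);

    // Writers (still serialized by the evaluation gate in the real engine)
    // replace whole entries; the indexer set is atomic per key.
    public void Set(AlarmStateSketch state) => _alarms[state.AlarmId] = state;

    // Readers need no gate: each call sees either the old or the new entry.
    public AlarmStateSketch? GetState(string alarmId) =>
        _alarms.TryGetValue(alarmId, out var state) ? state : null;

    // Values and Keys return point-in-time copies, safe to enumerate.
    public IReadOnlyCollection<AlarmStateSketch> GetAllStates() => _alarms.Values.ToArray();

    public string[] LoadedAlarmIds => _alarms.Keys.ToArray();
}
```

This works without further locking because the stored state objects are replaced rather than mutated in place; if `AlarmState` were mutable, readers would still need a consistent copy.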

**Resolution:** _(open)_

### Core.ScriptedAlarms-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` |
| Status | Open |

**Description:** `LoadAsync` is written to be re-callable: it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it unconditionally assigns `_shelvingTimer = new Timer(...)` without disposing the timer created by a previous `LoadAsync` call. A second `LoadAsync` therefore leaks the old `Timer` and leaves two timers running concurrently against the same `_alarms`/`_evalGate`. The old timer's `RunShelvingCheck` keeps firing forever.

**Recommendation:** Dispose any existing `_shelvingTimer` before reassigning it, e.g. `_shelvingTimer?.Dispose();` immediately before line 162, inside the `_evalGate` critical section. If reload is genuinely not supported, instead guard `LoadAsync` against a second call and document it as one-shot.

**Resolution:** _(open)_

### Core.ScriptedAlarms-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` |
| Status | Open |

**Description:** `docs/ScriptedAlarms.md` (Composition step 3) and the `OnUpstreamChange` comment ("Fire-and-forget so driver-side dispatch isn't blocked", lines 225-226) describe the `OnEvent` emission path as non-blocking / fire-and-forget. In the code, `EmitEvent` invokes `OnEvent?.Invoke(this, evt)` **synchronously while `_evalGate` is held** (called from `EvaluatePredicateToStateAsync` line 305 and `ApplyAsync` line 217, both inside the gate). A slow subscriber blocks the single evaluation gate for all alarms; a subscriber that re-enters the engine (e.g. calls `AcknowledgeAsync`) deadlocks because `_evalGate` is a non-reentrant `SemaphoreSlim(1,1)`. The behaviour is defensible (the historian sink is non-blocking, per the doc), but the comments/doc are misleading about where the work happens and the re-entrancy hazard is undocumented.

**Recommendation:** Either move `EmitEvent` outside the `_evalGate` critical section (collect emissions during the locked section and raise them after `Release()`), or document explicitly on `OnEvent` that handlers run under the engine lock, must be fast, and must never call back into the engine.
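
The collect-then-raise option might look like the sketch below. The class, the `string` event payload, and the `mutate` delegate are placeholders for the engine's real evaluation code; only the gate shape (`SemaphoreSlim(1,1)`) comes from the finding.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: events are queued under the gate and raised after release.
public sealed class DeferredEmitSketch
{
    private readonly SemaphoreSlim _evalGate = new(1, 1);

    public event EventHandler<string>? OnEvent;

    // mutate stands in for the state transition; it appends any events it
    // wants emitted to the pending list instead of raising them directly.
    public async Task ApplyAsync(Func<List<string>, Task> mutate, CancellationToken ct)
    {
        var pending = new List<string>();

        await _evalGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            await mutate(pending).ConfigureAwait(false);
        }
        finally
        {
            _evalGate.Release();
        }

        // Handlers run with the gate released, so a slow handler no longer blocks
        // evaluation and a handler that calls back into the engine cannot deadlock.
        foreach (var evt in pending)
        {
            OnEvent?.Invoke(this, evt);
        }
    }
}
```

The trade-off is ordering: once emission happens outside the gate, events from two concurrent operations can interleave, so subscribers that need strict ordering would require a sequence number or a single dispatch queue.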

**Resolution:** _(open)_

### Core.ScriptedAlarms-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` |
| Status | Open |

**Description:** During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common pattern, and the `ITagUpstreamSource` contract does not forbid it), the observer callback `OnUpstreamChange` runs on the calling thread, schedules `ReevaluateAsync`, which calls `_evalGate.WaitAsync`. That does not deadlock (the reevaluation task simply blocks until `LoadAsync` releases the gate), but it can cause a re-evaluation to run against a half-initialised `_alarms`/index, and the value written to `_valueCache` on line 141 may be immediately overwritten by the subscription's synchronous push with no defined ordering. The cold-start guard partly masks this, but the ordering between the seed read (line 141) and the subscription push is unspecified and may seed a stale value.

**Recommendation:** Subscribe to all upstream tags after the seed reads and after `_loaded = true`, or capture the subscription's first push into the cache and treat `SubscribeTag` as the single source of truth (drop the separate `ReadTag` seed). Document the expected `ITagUpstreamSource` delivery semantics (does `SubscribeTag` push an initial value?).

**Resolution:** _(open)_

### Core.ScriptedAlarms-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` |
| Status | Open |

**Description:** `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran, then proceed into `ShelvingCheckAsync`, which awaits `_evalGate` and mutates `_alarms`, concurrently with `Dispose`'s `_alarms.Clear()` at line 422 (which runs outside `_evalGate`). `Timer.Dispose()` does not wait for the running callback to finish. The result is a possible `InvalidOperationException` from a dictionary mutated during enumeration, or a save of stale state to the store after dispose. The same applies to a `ReevaluateAsync` in flight from a late upstream push.

**Recommendation:** Use `Timer.Dispose(WaitHandle)` (or `DisposeAsync`) to wait for the callback to drain, and perform `_alarms.Clear()` under `_evalGate` (or simply drop the clear, since the object is being discarded). Also have `ShelvingCheckAsync`/`ReevaluateAsync` re-check `_disposed` after acquiring the gate before mutating/saving.
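
A sketch of that disposal order, using `Timer.DisposeAsync` (.NET Core 3.0 and later), is shown below. The class is hypothetical and its callback is synchronous to keep the example short; if the real callback starts fire-and-forget async work, `DisposeAsync` only waits for the synchronous part, so that work would still need separate tracking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a timer owner with a deterministic shutdown.
public sealed class ShelvingCheckOwnerSketch : IAsyncDisposable
{
    private readonly SemaphoreSlim _evalGate = new(1, 1);
    private readonly Dictionary<string, int> _alarms = new();
    private readonly Timer _shelvingTimer;
    private volatile bool _disposed;
    private int _checksRun;

    public ShelvingCheckOwnerSketch(TimeSpan period)
    {
        _shelvingTimer = new Timer(RunShelvingCheck, null, period, period);
    }

    public int ChecksRun => Volatile.Read(ref _checksRun);

    private void RunShelvingCheck(object? state)
    {
        if (_disposed) return;

        _evalGate.Wait();
        try
        {
            // Re-check after acquiring the gate: Dispose may have started meanwhile.
            if (_disposed) return;
            _alarms["last-check"] = Environment.TickCount;
            Interlocked.Increment(ref _checksRun);
        }
        finally
        {
            _evalGate.Release();
        }
    }

    public async ValueTask DisposeAsync()
    {
        _disposed = true;

        // Completes only after every queued or running callback has finished.
        await _shelvingTimer.DisposeAsync().ConfigureAwait(false);

        // Clear under the gate so no late mutation can overlap with it.
        await _evalGate.WaitAsync().ConfigureAwait(false);
        try
        {
            _alarms.Clear();
        }
        finally
        {
            _evalGate.Release();
        }
    }
}
```

After `DisposeAsync` returns, `ChecksRun` no longer changes, which is the property a shutdown test can assert on.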

**Resolution:** _(open)_

### Core.ScriptedAlarms-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` |
| Status | Open |

**Description:** `OnUpstreamChange` and `RunShelvingCheck` both launch fire-and-forget tasks (`_ = ReevaluateAsync(...)`, `_ = ShelvingCheckAsync(...)`) with `CancellationToken.None`. There is no tracking of these in-flight tasks, so `Dispose` cannot await them and a server shutdown can race a still-running re-evaluation that writes to the (possibly disposed) store. Combined with Core.ScriptedAlarms-005, an upstream push arriving during shutdown produces an unobserved background task touching torn state.

**Recommendation:** Track outstanding background tasks (or use a single serialised worker / `Channel`), and link them to a `CancellationTokenSource` that `Dispose` cancels and drains. At minimum, await the in-flight work in `Dispose`.
|
||||
|
||||
**Resolution:** _(open)_

### Core.ScriptedAlarms-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` |
| Status | Open |

**Description:** Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws (transient SQL outage, deadlock, timeout), the exception propagates: in `ApplyAsync` it surfaces to the Part 9 method caller *after* the in-memory `_alarms` entry was already updated (line 215 runs before the save on line 216), leaving the in-memory state and the persisted state divergent; in `ReevaluateAsync`/`ShelvingCheckAsync` it is caught and logged, but again the in-memory `_alarms` entry was already advanced (lines 250/386) so the persisted store silently falls behind the live state. After a restart, startup recovery reloads the stale persisted state and operators can see a re-raised or re-ackable alarm. The docs claim "the store's view is always consistent with the in-memory state" (`docs/ScriptedAlarms.md` State persistence); that invariant is not actually enforced.

**Recommendation:** Save before committing the in-memory update, or roll back the in-memory entry if `SaveAsync` fails, so the two never diverge. Classify transient store failures and retry, and surface a hard error/health-degraded signal if persistence is permanently failing rather than silently logging and continuing.

**Resolution:** _(open)_
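
One way to read the "save before committing" option is sketched below. Every type here (`AlarmStateSketch`, `IAlarmStateStoreSketch`, `PersistFirstEngineSketch`, `FailingStoreSketch`) is hypothetical and much simpler than the real engine and `IAlarmStateStore`; the sketch only illustrates the ordering, and leaves out retry classification and health signalling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified state shape and store contract.
public sealed record AlarmStateSketch(string AlarmId, bool Acked);

public interface IAlarmStateStoreSketch
{
    Task SaveAsync(AlarmStateSketch state, CancellationToken ct);
}

public sealed class PersistFirstEngineSketch
{
    private readonly ConcurrentDictionary<string, AlarmStateSketch> _alarms = new();
    private readonly IAlarmStateStoreSketch _store;

    public PersistFirstEngineSketch(IAlarmStateStoreSketch store) => _store = store;

    public bool IsAcked(string id) => _alarms.TryGetValue(id, out var s) && s.Acked;

    public async Task ApplyAsync(AlarmStateSketch next, CancellationToken ct)
    {
        // Persist first: if SaveAsync throws, the in-memory entry is untouched,
        // so the live state never runs ahead of the store.
        await _store.SaveAsync(next, ct).ConfigureAwait(false);
        _alarms[next.AlarmId] = next;
    }
}

// Test double that can be switched into a failing mode.
public sealed class FailingStoreSketch : IAlarmStateStoreSketch
{
    public bool Fail { get; set; }

    public Task SaveAsync(AlarmStateSketch state, CancellationToken ct) =>
        Fail
            ? Task.FromException(new InvalidOperationException("store unavailable"))
            : Task.CompletedTask;
}
```

The trade-off in this ordering is that a slow store delays the in-memory transition; the rollback variant named in the recommendation avoids that at the cost of a brief window where readers see state that may be reverted.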

### Core.ScriptedAlarms-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `Part9StateMachine.cs:261-268` |
| Status | Open |

**Description:** `AppendComment` copies the entire existing comment list into a new `List` on every audit-producing transition (ack, confirm, shelve, unshelve, enable, disable, add-comment, auto-unshelve). The `Comments` list is append-only and unbounded. For a long-lived alarm that is acknowledged/commented hundreds of times, every transition is an O(n) copy and the full history is also re-serialised to the store on every `SaveAsync`. Over a multi-month uptime this is a slowly growing per-transition cost.

**Recommendation:** Acceptable for now given audit requirements, but consider an immutable persistent list / `ImmutableList<AlarmComment>` to make append O(log n), or have the store persist comments incrementally (append-only audit table) rather than rewriting the whole collection each save. At minimum, note the unbounded-growth characteristic in the design doc.

**Resolution:** _(open)_

### Core.ScriptedAlarms-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` |
| Status | Open |

**Description:** `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently, this is a steady stream of short-lived dictionary allocations on the hot path. `AlarmPredicateContext` is also newly constructed each evaluation (line 281).

**Recommendation:** Minor. If the evaluation path shows up in allocation profiling, the read cache could be a reused per-alarm buffer cleared between evaluations (evaluations are already serialised under `_evalGate`, so a single shared scratch dictionary is safe). Not worth doing speculatively; flag it for the perf surface in `docs/v2/Galaxy.Performance.md` if alarm evaluation is ever soak-tested.

**Resolution:** _(open)_

### Core.ScriptedAlarms-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` |
| Status | Open |

**Description:** Quality handling is inconsistent across the three places that inspect a `DataValueSnapshot.StatusCode`. `AreInputsReady` (engine, line 333) treats only outright Bad (bit 31) as not-ready, so an Uncertain-quality input is fed to the predicate. `MessageTemplate.Resolve` (line 47) rejects *any* non-zero status code (including Uncertain) and renders `{?}`. `AlarmPredicateContext.GetTag` returns `BadNodeIdUnknown` (`0x80340000`) for a missing path. The net effect: an Uncertain-quality tag is considered good enough to drive an alarm *activation* decision but not good enough to print in the alarm *message*. `docs/ScriptedAlarms.md` ("Fallback rules") only documents the message-template behaviour and does not mention that predicate evaluation accepts Uncertain. The two policies should be reconciled and documented.

**Recommendation:** Decide one quality policy for "is this input usable" and apply it in both `AreInputsReady` and the message resolver, or explicitly document why predicate evaluation and message rendering treat Uncertain differently. Add the predicate-side Uncertain rule to `docs/ScriptedAlarms.md`.

**Resolution:** _(open)_
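
A single shared "is this input usable" helper, as the recommendation suggests, might be sketched like this. The names `QualityPolicy` and `InputQualitySketch` are hypothetical; the only fact relied on is that an OPC UA status code carries its severity in the top two bits (00 Good, 01 Uncertain, 10 Bad). Note that the `GoodOnly` policy below accepts every Good-severity code, which is slightly looser than the "any non-zero code" test the finding attributes to `MessageTemplate.Resolve`.

```csharp
using System;

// Hypothetical policy switch: which severities count as usable input.
public enum QualityPolicy
{
    AcceptUncertain, // Good and Uncertain are usable
    GoodOnly         // only Good severity is usable
}

public static class InputQualitySketch
{
    // Severity is encoded in the top two bits of the 32-bit status code.
    public static bool IsBad(uint statusCode) =>
        (statusCode & 0x80000000u) != 0;

    public static bool IsUncertain(uint statusCode) =>
        (statusCode & 0xC0000000u) == 0x40000000u;

    // One decision point that both the readiness guard and the message
    // resolver could call, so the two can no longer drift apart.
    public static bool IsUsable(uint statusCode, QualityPolicy policy) =>
        !IsBad(statusCode) &&
        (policy == QualityPolicy.AcceptUncertain || !IsUncertain(statusCode));
}
```

Whichever policy is chosen, routing both call sites through one function makes the choice visible in a single place and easy to document.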

### Core.ScriptedAlarms-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Part9StateMachine.cs:275` |
| Status | Open |

**Description:** `TransitionResult.NoOp(state, reason)` takes a `reason` string parameter that is documented in the calling code as a diagnostic ("disabled — predicate result ignored", "already acknowledged", etc.) but the factory method silently discards it: it just returns `new(state, EmissionKind.None)`, identical to `None(state)`. Every call site that passes a carefully-worded reason string is doing dead work, and the comments in `Part9StateMachine` and the class-level remarks claim disabled/no-op transitions "produce ... a diagnostic log line", which they do not.

**Recommendation:** Either propagate the reason (add it to `TransitionResult` and have the engine log it at debug level when emission is `None` for a no-op), or remove the unused `reason` parameter and collapse `NoOp` into `None`. Update the `Part9StateMachine` remarks that promise a diagnostic log line.

**Resolution:** _(open)_

### Core.ScriptedAlarms-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` |
| Status | Open |

**Description:** Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine*: only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEngine` driving the timer with an injectable clock; (2) `ConfirmAsync`, `TimedShelveAsync`, `UnshelveAsync`, `EnableAsync` engine methods (only `Acknowledge`, `OneShotShelve`, `Disable`, `AddComment` are exercised); (3) `OnEvent` subscriber-throws isolation (`EmitEvent` catch on line 357); (4) `IAlarmStateStore.SaveAsync` failure handling (finding 007); (5) re-entrant `LoadAsync` and the timer leak (finding 002); (6) the cold-start `AreInputsReady` guard with Bad / null / Uncertain inputs. The `clock` and `scriptTimeout` constructor parameters exist specifically to make timer/timeout tests deterministic but no test uses them.

**Recommendation:** Add engine-level tests that inject a controllable `Func<DateTime>` clock to drive `RunShelvingCheck`, cover the remaining Part 9 engine methods end-to-end, assert subscriber-exception isolation, and add a store-failure fake to lock in the chosen persistence-failure semantics from finding 007.

**Resolution:** _(open)_
323
code-reviews/Core.Scripting/findings.md
Normal file
@@ -0,0 +1,323 @@
# Code Review — Core.Scripting

| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 11 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.Scripting-004, Core.Scripting-005 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Core.Scripting-006 |
| 4 | Error handling & resilience | Core.Scripting-007 |
| 5 | Security | Core.Scripting-001, Core.Scripting-002, Core.Scripting-003 |
| 6 | Performance & resource management | Core.Scripting-008 |
| 7 | Design-document adherence | Core.Scripting-009 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Core.Scripting-010, Core.Scripting-011 |
| 10 | Documentation & comments | No issues found |

## Findings

### Core.Scripting-001

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` |
| Status | Open |

**Description:** `System.Environment` lives in the allowed `System` namespace (it is in
`System.Private.CoreLib`, which is allow-listed for primitives) and is not on the
forbidden-namespace deny-list. Nothing prevents an operator-authored script from calling
`System.Environment.Exit(0)` or `System.Environment.FailFast("...")`. Both terminate the
host process immediately. Because scripted-alarm predicates and virtual-tag scripts run
in-process in the main OPC UA server (decision: "Scripting engine runs in the main .NET 10
server process"), a single malicious or buggy predicate brings down the entire server,
an outage affecting every connected client and every driver. `ScriptSandboxTests` only
pins the *read* path (`Environment.GetEnvironmentVariable`) as an accepted compromise; the
process-killing members are not considered. The whole-process kill far exceeds the
"read-only process state" justification the test comments rely on.

**Recommendation:** The forbidden surface must be member-granular, not namespace-granular,
for types in allowed namespaces. Add an explicit forbidden-member deny-list to
`ForbiddenTypeAnalyzer` covering at minimum `System.Environment.Exit`,
`System.Environment.FailFast`, `System.AppDomain`, `System.GC` (e.g. `GC.Collect`,
`GC.AddMemoryPressure`), and `System.Activator.CreateInstance` (a reflection-equivalent
escape). Reject these in `CheckSymbol` by resolved method symbol, with a test for each.

**Resolution:** _(open)_

### Core.Scripting-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `ForbiddenTypeAnalyzer.cs:70` |
| Status | Open |

**Description:** The syntax walker only inspects four node kinds:
`ObjectCreationExpressionSyntax`, `InvocationExpressionSyntax` with a member-access target,
`MemberAccessExpressionSyntax`, and bare `IdentifierNameSyntax`. It never visits
`TypeOfExpressionSyntax`, generic type-argument lists (`GenericNameSyntax` /
`TypeArgumentListSyntax`), cast expressions (`CastExpressionSyntax`), `is`/`as` type
patterns, `default(T)` expressions, array-creation element types, or `using`/local
declared types. A script such as `typeof(System.IO.File)`,
`new System.Collections.Generic.List<System.IO.FileInfo>()`,
`(System.IO.Stream)null`, or `default(System.Reflection.Assembly)` references a forbidden
type without ever producing a node the walker examines, so the forbidden-type check is
bypassed. The Phase 7 plan A.6 explicitly calls out `typeof` as a sandbox-escape attempt
that "must fail at compile"; it currently does not.

**Recommendation:** Walk every `TypeSyntax` node (handle `TypeOfExpressionSyntax`,
`CastExpressionSyntax`, generic argument lists, and the type operand of
`IsPatternExpression` / binary `as`). The simplest robust fix is to enumerate all
`DescendantNodes()` and, for any node, resolve both `GetSymbolInfo` and `GetTypeInfo`,
then check the resolved type plus every type argument. Add tests covering `typeof`,
generic arguments, casts, and `default(T)` with forbidden types.

**Resolution:** _(open)_

### Core.Scripting-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` |
| Status | Open |

**Description:** There is no bound on memory a script may allocate or on the number of
threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget
concern" deferred to v3, but in-process execution means a script doing
`new byte[int.MaxValue]` repeatedly (or `Enumerable.Range(0,int.MaxValue).ToList()`, since
LINQ is allow-listed) can drive the whole server to `OutOfMemoryException`, an outage. The
timeout does not help: the allocation can exhaust memory well before 250ms elapses, and
the orphaned thread-pool thread documented in `TimedScriptEvaluator` keeps the allocation
rooted. `System.Threading.Tasks` is not on the deny-list, so a script can also
`Task.Run` an unbounded fan-out of background work that outlives the timeout entirely.

**Recommendation:** At minimum, document this as a known accepted risk in
`docs/ScriptedAlarms.md` / `docs/VirtualTags.md` rather than only in a code comment, and
add the `Task`/`Parallel` namespaces to the forbidden list (scripts are synchronous
predicates and have no legitimate need to start background tasks). For memory, gate
script authoring behind an Admin permission and treat the test-harness preview as the
control point, or track an explicit v3 issue for out-of-process execution. Record the
decision so it is not silently lost.

**Resolution:** _(open)_

### Core.Scripting-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `DependencyExtractor.cs:73` |
| Status | Open |

**Description:** The walker matches tag-access calls purely by spelling: any
`InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as
a `ScriptContext` tag access, regardless of the receiver. A script that defines a local
type with a `GetTag(string)` method and calls `other.GetTag("X")`, or calls
`this.GetTag(...)` on a script-defined helper, has spurious dependencies harvested (or, if
the path argument is not a literal, spurious rejections raised). The XML remarks claim "as
long as it's not on the ctx instance, the extractor doesn't pick it up", but the code does
not check that the receiver is the `ctx` identifier; it accepts any member access with the
matching name. The `DependencyExtractorTests.Ignores_non_ctx_method_named_GetTag` test
passes only because the helper there is a *free* function (not member-access form); a
member-access call to a non-ctx `GetTag` is untested and would be misattributed.

**Recommendation:** In `VisitInvocationExpression`, additionally require that
`member.Expression` is an `IdentifierNameSyntax` with `Identifier.ValueText == "ctx"`
(matching the `ScriptGlobals<TContext>.ctx` field name). Add a test for
`someOtherObject.GetTag("X")` asserting it is ignored.

**Resolution:** _(open)_

### Core.Scripting-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `DependencyExtractor.cs:97` |
| Status | Open |

**Description:** A raw string literal (triple-quoted) passed as the tag path tokenizes as
`SingleLineRawStringLiteralToken` / `MultiLineRawStringLiteralToken`, not
`StringLiteralToken`. The check
`literal.Token.IsKind(SyntaxKind.StringLiteralToken)` therefore rejects an
otherwise-static raw-string path as a non-literal "dynamic path", producing a misleading
rejection message. This is an edge case (operators rarely write raw strings for tag
paths) but the error text would confuse anyone who does.

**Recommendation:** Accept all string-literal token kinds: check
`literal.IsKind(SyntaxKind.StringLiteralExpression)` on the expression node, or include
the raw-string token kinds, so a static raw string is harvested rather than rejected.

**Resolution:** _(open)_

### Core.Scripting-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `CompiledScriptCache.cs:55` |
| Status | Open |

**Description:** On a failed compile the `catch` block calls
`_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for
the same bad source, both observe the same faulted `Lazy` and throw, and both call
`TryRemove(key)`. If a concurrent retry re-adds a new `Lazy` for that key between the two
removals, the second unconditional `TryRemove` could evict the in-flight retry entry. The
window is small and the consequence is only a redundant recompile, so severity is Low,
but the removal should be key+value scoped for correctness.

**Recommendation:** Use the `ConcurrentDictionary.TryRemove(KeyValuePair<,>)` overload to
remove only the specific faulted `Lazy` instance, so a concurrently re-added entry is not
evicted.

**Resolution:** _(open)_
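
The key+value scoped removal recommended above can be sketched as follows. `ScopedEvictionCacheSketch` is a hypothetical, much-reduced stand-in for `CompiledScriptCache` (it caches an `int` instead of a compiled script); the part that matters is the `TryRemove(KeyValuePair<,>)` call, which removes the entry only if the stored value is still the same `Lazy` instance that faulted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical reduced cache: the "compiled script" is just an int here.
public sealed class ScopedEvictionCacheSketch
{
    private readonly ConcurrentDictionary<string, Lazy<int>> _cache = new();

    public int Count => _cache.Count;

    public int GetOrCompile(string source, Func<string, int> compile)
    {
        var lazy = _cache.GetOrAdd(source, s => new Lazy<int>(() => compile(s)));
        try
        {
            return lazy.Value;
        }
        catch
        {
            // Remove only this faulted instance. If another thread has already
            // replaced it with a fresh Lazy under the same key, that entry stays.
            _cache.TryRemove(new KeyValuePair<string, Lazy<int>>(source, lazy));
            throw;
        }
    }
}
```

Because `Lazy<T>` does not override `Equals`, the value comparison inside `TryRemove` is a reference comparison, which is exactly the "this specific instance" check the finding asks for.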

### Core.Scripting-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `TimedScriptEvaluator.cs:60` |
| Status | Open |

**Description:** `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits
`WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the
timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cancellation
is non-deterministic, so the same shutdown can sometimes surface as
`ScriptTimeoutException` and sometimes as `OperationCanceledException`. The class docs
assert "the caller's cancel wins" as a hard guarantee that the virtual-tag engine shutdown
path depends on to avoid misclassifying shutdown as a script fault, but the
implementation does not guarantee it when both fire close together.

**Recommendation:** After catching `TimeoutException`, check `ct.IsCancellationRequested`
and throw `OperationCanceledException(ct)` instead of `ScriptTimeoutException` when the
caller's token is cancelled, so caller cancellation deterministically wins regardless of
race ordering.

**Resolution:** _(open)_
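
A minimal sketch of the "caller's cancel wins" check is shown below. `TimedRunSketch` and `ScriptTimeoutExceptionSketch` are hypothetical stand-ins for `TimedScriptEvaluator` and `ScriptTimeoutException`, and the script is modelled as a plain `Func<T>`; the sketch assumes the `Task.WaitAsync(TimeSpan, CancellationToken)` overload, which throws `TimeoutException` when the timeout elapses first.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the module's ScriptTimeoutException.
public sealed class ScriptTimeoutExceptionSketch : Exception
{
    public ScriptTimeoutExceptionSketch(TimeSpan timeout)
        : base($"Script exceeded its {timeout.TotalMilliseconds} ms budget.") { }
}

public static class TimedRunSketch
{
    public static async Task<T> RunAsync<T>(
        Func<T> script, TimeSpan timeout, CancellationToken ct)
    {
        try
        {
            return await Task.Run(script, ct)
                .WaitAsync(timeout, ct)
                .ConfigureAwait(false);
        }
        catch (TimeoutException)
        {
            // If the caller's token is also cancelled, report cancellation so a
            // shutdown is never misclassified as a script fault.
            ct.ThrowIfCancellationRequested();
            throw new ScriptTimeoutExceptionSketch(timeout);
        }
    }
}
```

`ThrowIfCancellationRequested` throws an `OperationCanceledException` carrying `ct`, so callers that filter on the token see the same exception shape as an ordinary cancellation.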

### Core.Scripting-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` |
| Status | Open |

**Description:** `CompiledScriptCache` has no capacity bound (acknowledged in the class
remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>`
delegate, which keeps the dynamically emitted script assembly loaded for the process
lifetime: emitted assemblies in the default `AssemblyLoadContext` cannot be unloaded.
`Clear()` drops the dictionary entries but does **not** unload the emitted assemblies;
they leak. Across many config-generation publishes (each `Clear()` followed by recompiling
every script), the process accumulates dead script assemblies. For the expected "low
thousands" of scripts this is benign, but a long-running server with frequent publishes
will see steady managed-memory growth that never returns.

**Recommendation:** Document the per-publish assembly accretion as a known limitation, or
compile scripts into a collectible `AssemblyLoadContext` so `Clear()` can unload prior
generations. At minimum add a note to `docs/ScriptedAlarms.md` so operators with
high-publish-frequency deployments are aware.

**Resolution:** _(open)_

### Core.Scripting-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `ForbiddenTypeAnalyzer.cs:45` |
| Status | Open |

**Description:** The Phase 7 plan decision #6
(`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden
surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually
denies a broader set (`System.Threading.Thread`, `System.Runtime.InteropServices`, and
`Microsoft.Win32` for the registry), which is sensible hardening but is undocumented in the
plan and in `docs/ScriptedAlarms.md` (which defers sandbox rules to `VirtualTags.md`). An
operator reading the design docs cannot predict that a registry or interop reference will
be rejected. Conversely the plan does not record the `System.Environment` /
`System.Diagnostics` decisions. The code and the design document have drifted.

**Recommendation:** Update the plan's decision #6 (or `docs/VirtualTags.md`) to list the
authoritative deny-list exactly as `ForbiddenTypeAnalyzer.ForbiddenNamespacePrefixes`
defines it, including the `System.Environment` allowed-compromise, so the docs match the
code.

**Resolution:** _(open)_

### Core.Scripting-010

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` |
| Status | Open |

**Description:** The sandbox-escape test suite covers only the four obvious vectors
(File / Http / Process / Reflection) as direct member-access calls. It does not test:
`typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to
forbidden types, `System.Environment.Exit` / `FailFast`, `System.Threading.Thread`,
`System.Runtime.InteropServices`, `Microsoft.Win32` registry access, `Activator`, or
`System.AppDomain`. Given that the analyzer is the sole security boundary for in-process
untrusted-script execution, the gaps in Core.Scripting-001 and Core.Scripting-002 went
undetected precisely because no test exercises those forms. The Phase 7 plan A.6 mandated
"sandbox escape tests" but the implemented set is materially narrower than the threat
surface.

**Recommendation:** Add a parameterised escape-test covering every node form in
Core.Scripting-002 and every forbidden namespace/member in Core.Scripting-001. Each must
assert a `ScriptSandboxViolationException` (or `CompilationErrorException`) at compile.

**Resolution:** _(open)_

### Core.Scripting-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` |
| Status | Open |

**Description:** Two source files have no direct test coverage: `ScriptContext`
(`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and
not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the
`ArgumentNullException` / `ArgumentException` guards on `contextType` at
`ScriptSandbox.cs:45-48` are never asserted). `ScriptLogCompanionSink` and
`ScriptLoggerFactory` have tests, but there is no test that a script's `ctx.Logger` Error
emission surfaces via the companion sink end-to-end (factory + sink integration is
untested). These are minor gaps but leave guard clauses and the logging integration
unverified.

**Recommendation:** Add unit tests for `ScriptSandbox.Build` argument validation, for
`ScriptContext.Deadband` at and around the tolerance boundary, and an end-to-end test that
a script logging at Error level produces both a `scripts-*.log` event and a companion
Warning event.

**Resolution:** _(open)_
359
code-reviews/Core.VirtualTags/findings.md
Normal file
@@ -0,0 +1,359 @@
# Code Review - Core.VirtualTags

| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 13 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core.VirtualTags-001, Core.VirtualTags-002, Core.VirtualTags-003, Core.VirtualTags-004 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Core.VirtualTags-005, Core.VirtualTags-006 |
| 4 | Error handling & resilience | Core.VirtualTags-007 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Core.VirtualTags-008, Core.VirtualTags-009 |
| 7 | Design-document adherence | Core.VirtualTags-001, Core.VirtualTags-010 |
| 8 | Code organization & conventions | Core.VirtualTags-011 |
| 9 | Testing coverage | Core.VirtualTags-012 |
| 10 | Documentation & comments | Core.VirtualTags-010, Core.VirtualTags-013 |

## Findings

### Core.VirtualTags-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` |
| Status | Open |

**Description:** `OnScriptSetVirtualTag` updates `_valueCache`, notifies observers, and
records history for the written path, but it does not schedule a cascade for tags that
depend on the written path. `docs/VirtualTags.md` (VirtualTagContext section) explicitly
states `SetVirtualTag(path, value)` "routes through the engine's `OnScriptSetVirtualTag`
callback so cross-tag writes still participate in change-trigger cascades." They do not.
A script that writes `ctx.SetVirtualTag("Target", x)` updates Target's cached value, but
any virtual tag whose script reads Target via `ctx.GetTag("Target")` and is
`ChangeTriggered = true` is never re-evaluated. Downstream virtual tags go stale until
some unrelated trigger fires. The existing test
`SetVirtualTag_within_script_updates_target_and_triggers_observers` only asserts the
target itself updates and never exercises a tag depending on the target, so the gap is
not caught.

**Recommendation:** Either (a) launch a fire-and-forget `CascadeAsync(path, ...)` from
`OnScriptSetVirtualTag` (note `EvaluateInternalAsync` acquires the non-reentrant
`_evalGate`, so the cascade must be scheduled, not invoked inline while the gate is
held), or (b) if cascading from a script write is intentionally unsupported, correct the
documentation and `VirtualTagContext` XML doc to say so. Decide deliberately and make
code and docs agree.

**Resolution:** _(open)_

### Core.VirtualTags-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` |
| Status | Open |

**Description:** The cold-start guard `if (!AreInputsReady(ctxCache)) return;` silently
abandons the evaluation when any input is null or Bad-quality. For a chained virtual tag
(C depends on B depends on driver tag A), if A is still Bad at startup, B is skipped --
leaving B's `_valueCache` entry absent. When C evaluates, `BuildReadCache` falls through
to `_upstream.ReadTag("B")` for the missing virtual path, which returns BadNodeIdUnknown
quality, so C is also skipped. That is acceptable for cold start, but the same guard
means a virtual tag that legitimately consumes a Bad-quality upstream (e.g. a script
written to detect comms loss and emit a fallback) can never run -- it is permanently
frozen at its prior value with no diagnostic. The tag also never transitions to a Bad
quality of its own, so an OPC UA client cannot distinguish "not yet computed" from
"computing fine."

**Recommendation:** Make the cold-start behaviour explicit: when inputs are not ready,
publish a Bad-quality snapshot (e.g. BadWaitingForInitialData, 0x80320000) for the tag
rather than returning with no state change, so clients see a defined quality. If
operators need scripts that handle Bad upstreams, consider a per-definition opt-out of
the readiness guard.

**Resolution:** _(open)_

### Core.VirtualTags-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` |
| Status | Open |

**Description:** The upstream-subscription loop in `Load` iterates
`definitions.SelectMany(d => _tags[d.Path].Reads)`. If `definitions` contains two rows
with the same Path, the first registers `_tags[Path]` and the second overwrites it, but
`definitions` still has two entries -- `_tags[d.Path]` is indexed by the second row for
both iterations, so the first row's distinct upstream reads are silently dropped. More
importantly, a duplicate Path in the input list is never rejected at all:
`_tags[def.Path] = ...` and `_graph.Add(def.Path, ...)` both overwrite without warning,
so one of two operator-authored tags with a colliding UNS path vanishes with no error.
`Load` is documented as throwing an aggregated error for every problem; a duplicate path
should be in that set.

**Recommendation:** Detect duplicate Path values while iterating `definitions` and add
them to `compileFailures` (or a dedicated rejection list) so the aggregated
`InvalidOperationException` reports them. Separately, iterate `_tags.Values` rather than
`definitions.SelectMany(d => _tags[d.Path]...)` when collecting upstream paths so the
collection is keyed off the registered set, not the raw input list.

**Resolution:** _(open)_
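
A minimal sketch of the duplicate-path check recommended for Core.VirtualTags-003. `TagDefinition` and `DuplicatePathCheck` are hypothetical stand-ins, not the module's real types, and ordinal (case-sensitive) path comparison is an assumption; the real `Load` would append these messages to `compileFailures`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real definition type; only Path matters here.
public sealed record TagDefinition(string Path, string Script);

public static class DuplicatePathCheck
{
    // Returns one failure message per Path that appears more than once.
    // Assumption: paths are compared ordinally (case-sensitive).
    public static IReadOnlyList<string> FindDuplicatePaths(IEnumerable<TagDefinition> definitions)
    {
        return definitions
            .GroupBy(d => d.Path, StringComparer.Ordinal)
            .Where(g => g.Count() > 1)
            .Select(g => $"Duplicate virtual tag path '{g.Key}' ({g.Count()} definitions)")
            .ToList();
    }
}
```

Running the check before any `_tags[def.Path] = ...` assignment keeps the rejection independent of which duplicate row would otherwise win.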

### Core.VirtualTags-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` |
| Status | Open |

**Description:** `CoerceResult`'s switch has a default arm (`_ => raw`) that returns the
script's raw return value uncoerced for any `DriverDataType` not in the explicit list
(e.g. an array type, Byte, or a future enum member). The resulting `DataValueSnapshot`
then carries a value whose CLR type does not match the node's declared OPC UA data type,
which the node manager will surface as a wire-level type mismatch or a silently wrong
value. The doc claims a mismatch surfaces as BadTypeMismatch, but an unhandled
`DriverDataType` bypasses coercion entirely.

**Recommendation:** Make the default arm explicit -- either throw / return null (which
the outer pipeline maps to BadInternalError) for an unsupported `DriverDataType`, or
document precisely which `DriverDataType` values `CoerceResult` supports and validate at
`Load` time that no definition declares an unsupported type.

**Resolution:** _(open)_
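
One way the explicit default arm from Core.VirtualTags-004 could look, as a sketch only. The enum below is a hypothetical subset of the real `DriverDataType`, `CoercionSketch` is an illustrative name, and the choice to throw (rather than return null) assumes the outer pipeline maps the exception to BadInternalError as the finding describes.

```csharp
using System;

// Hypothetical subset of the real DriverDataType enum, for illustration only.
public enum DriverDataType { Boolean, Int32, Float64, String, Byte }

public static class CoercionSketch
{
    // Each arm is cast to object so the switch expression never settles on a
    // common numeric type and silently widens the result before boxing.
    public static object Coerce(object raw, DriverDataType target) => target switch
    {
        DriverDataType.Boolean => (object)Convert.ToBoolean(raw),
        DriverDataType.Int32 => (object)Convert.ToInt32(raw),
        DriverDataType.Float64 => (object)Convert.ToDouble(raw),
        DriverDataType.String => (object)(Convert.ToString(raw) ?? string.Empty),
        // No silent pass-through: an unsupported type is a visible failure.
        _ => throw new NotSupportedException($"CoerceResult does not support {target}"),
    };
}
```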

### Core.VirtualTags-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` |
| Status | Open |

**Description:** `SubscribeAsync` registers the per-path engine observers first (lines
52-56), then in a second loop reads the current value and fires the initial-data
callback (lines 60-64). Between those two loops an upstream change can cascade and the
engine can invoke the just-registered observer with a new value. The OPC UA client then
receives the real change event followed by the initial-data event carrying the older
`engine.Read(path)` snapshot -- out-of-order delivery, and the client's last-known value
ends up stale.

**Recommendation:** Capture the current snapshot and fire the initial-data callback for
each path before registering the change observer for that path (or hold a per-handle
lock spanning both so no engine callback interleaves). The initial value must be
delivered before any subsequent change for that path.

**Resolution:** _(open)_

### Core.VirtualTags-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` |
| Status | Open |

**Description:** `Subscribe` does `_observers.GetOrAdd(path, _ => [])` then
`lock (list) { list.Add(observer); }`. When `Unsub.Dispose` removes the last observer,
the now-empty List is left in `_observers` and the dictionary entry is never removed.
For a long-running server with churning OPC UA subscriptions this is an unbounded (if
slow) growth of empty lists. There is also a benign-but-real race: a thread can call
`GetOrAdd` and obtain a list reference that another thread's `Dispose` is about to leave
empty in the map -- not a correctness bug today because the list object is still valid,
but it makes any future "prune empty entries" logic racy.

**Recommendation:** Either accept the unbounded map and document it, or have
`Unsub.Dispose` remove the dictionary entry when the list becomes empty under the same
lock, re-checking emptiness inside the lock to avoid dropping a concurrently-added
observer.

**Resolution:** _(open)_
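
The pruning option in Core.VirtualTags-006 is easy to get subtly wrong, so here is a hedged sketch of one race-free shape. `ObserverRegistry<T>` is a hypothetical class, not the engine's real code; it assumes .NET 5 or later for `ConcurrentDictionary.TryRemove(KeyValuePair<,>)`. The key point is that `Subscribe` re-checks, under the list lock, that the list it obtained is still the one mapped for the path.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class ObserverRegistry<T>
{
    private readonly ConcurrentDictionary<string, List<Action<T>>> _observers = new();

    public int PathCount => _observers.Count;

    public IDisposable Subscribe(string path, Action<T> observer)
    {
        while (true)
        {
            var list = _observers.GetOrAdd(path, _ => new List<Action<T>>());
            lock (list)
            {
                // A concurrent Dispose may have pruned this list after GetOrAdd
                // returned it; only add if it is still the mapped instance.
                if (_observers.TryGetValue(path, out var current) && ReferenceEquals(current, list))
                {
                    list.Add(observer);
                    return new Unsub(this, path, list, observer);
                }
            }
            // Lost the race with a prune: loop and obtain a fresh list.
        }
    }

    private sealed class Unsub : IDisposable
    {
        private readonly ObserverRegistry<T> _owner;
        private readonly string _path;
        private readonly List<Action<T>> _list;
        private readonly Action<T> _observer;
        private bool _disposed;

        public Unsub(ObserverRegistry<T> owner, string path, List<Action<T>> list, Action<T> observer)
        {
            _owner = owner; _path = path; _list = list; _observer = observer;
        }

        public void Dispose()
        {
            lock (_list)
            {
                if (_disposed) return;
                _disposed = true;
                _list.Remove(_observer);
                // Remove the entry only if it still maps to this (now empty) list.
                if (_list.Count == 0)
                    _owner._observers.TryRemove(new KeyValuePair<string, List<Action<T>>>(_path, _list));
            }
        }
    }
}
```

Because both the add and the prune happen under the same per-list lock, an observer can never be added to a list that has already been detached from the map.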

### Core.VirtualTags-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` |
| Status | Open |

**Description:** `Tick` calls
`_engine.EvaluateOneAsync(p, _cts.Token).GetAwaiter().GetResult()`, blocking the
`System.Threading.Timer` callback thread (a thread-pool thread) for the full duration of
the evaluation. Because `EvaluateInternalAsync` serialises all tags through `_evalGate`,
a timer tick that races a long change-trigger cascade blocks until the cascade drains.
With multiple interval groups, several timer callbacks can each pin a thread-pool thread
waiting on the same gate. A group of N tags can take N times the script timeout while
holding a pool thread, and under timer re-entrancy (a tick firing again before the prior
finished) this compounds.

**Recommendation:** Make `Tick` async-aware -- store the returned Task and skip a tick
if the previous one for that group is still running (a per-group "in flight" flag),
rather than blocking synchronously. At minimum, document the blocking behaviour and the
expected upper bound on group evaluation time relative to the interval.

**Resolution:** _(open)_
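
The per-group "in flight" flag from Core.VirtualTags-007 could be sketched as follows. `NonOverlappingTicker` is a hypothetical helper, not part of the module; the delegate stands in for evaluating one interval group, and a real timer callback would also need to log faults from the returned task.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class NonOverlappingTicker
{
    private readonly Func<CancellationToken, Task> _evaluateGroup;
    private int _inFlight; // 0 = idle, 1 = an evaluation is running
    private int _skipped;

    public NonOverlappingTicker(Func<CancellationToken, Task> evaluateGroup) =>
        _evaluateGroup = evaluateGroup;

    public int SkippedTicks => Volatile.Read(ref _skipped);

    // Returns the running evaluation, or a completed task if this tick was skipped.
    public Task Tick(CancellationToken ct)
    {
        // Atomically claim the slot; if it was already taken, skip this tick.
        if (Interlocked.CompareExchange(ref _inFlight, 1, 0) != 0)
        {
            Interlocked.Increment(ref _skipped);
            return Task.CompletedTask;
        }
        return RunAsync(ct);
    }

    private async Task RunAsync(CancellationToken ct)
    {
        try
        {
            await _evaluateGroup(ct).ConfigureAwait(false);
        }
        finally
        {
            // Release the slot even when the evaluation throws or is cancelled.
            Volatile.Write(ref _inFlight, 0);
        }
    }
}
```

No thread-pool thread is held while the evaluation waits, and a slow group shows up as a skipped-tick count instead of a pile of blocked callbacks.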

### Core.VirtualTags-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` |
| Status | Open |

**Description:** `TransitiveDependentsInOrder` calls `TopologicalSort()` (a full O(V+E)
Kahn pass plus a Dictionary rank build) on every invocation, and it is invoked from
`CascadeAsync` on every upstream change event (`OnUpstreamChange`). On a large graph with
high-rate upstream tags this re-sorts the entire dependency graph on every protocol-rate
delta -- pure waste, since the topological order is immutable between `Load` calls. The
DFS that collects dependents is itself fine; only the repeated sort is the cost.

**Recommendation:** Compute the topological order (and the rank dictionary) once at the
end of `Load` and cache it on `DependencyGraph` (invalidated by `Add` / `Clear`).
`TransitiveDependentsInOrder` then reuses the cached rank map. This turns a per-event
O(V+E) cost into an O(closure) cost.

**Resolution:** _(open)_
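
To illustrate the caching recommended in Core.VirtualTags-008, here is a small self-contained sketch. `RankedGraph` is a hypothetical simplification of `DependencyGraph` (it is not thread-safe and builds the rank lazily rather than at the end of `Load`); `SortCount` exists only to make the saving observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class RankedGraph
{
    // node -> nodes that directly depend on it
    private readonly Dictionary<string, HashSet<string>> _dependents = new(StringComparer.Ordinal);
    private Dictionary<string, int>? _rank; // cached topological rank; null means stale

    public int SortCount { get; private set; }

    public void Add(string node, IEnumerable<string> dependsOn)
    {
        Ensure(node);
        foreach (var dep in dependsOn) Ensure(dep).Add(node);
        _rank = null; // any structural change invalidates the cache
    }

    public IReadOnlyList<string> TransitiveDependentsInOrder(string changed)
    {
        var rank = _rank ??= BuildRank(); // full sort only when the cache is stale
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var stack = new Stack<string>();
        stack.Push(changed);
        while (stack.Count > 0)
        {
            if (!_dependents.TryGetValue(stack.Pop(), out var direct)) continue;
            foreach (var d in direct)
                if (seen.Add(d)) stack.Push(d);
        }
        return seen.OrderBy(n => rank[n]).ToList(); // cost scales with the closure only
    }

    private HashSet<string> Ensure(string node)
    {
        if (!_dependents.TryGetValue(node, out var set))
            _dependents[node] = set = new HashSet<string>(StringComparer.Ordinal);
        return set;
    }

    // Kahn's algorithm: O(V+E), run once per graph version.
    private Dictionary<string, int> BuildRank()
    {
        SortCount++;
        var inDegree = _dependents.Keys.ToDictionary(k => k, _ => 0, StringComparer.Ordinal);
        foreach (var direct in _dependents.Values)
            foreach (var d in direct) inDegree[d]++;
        var ready = new Queue<string>(inDegree.Where(kv => kv.Value == 0).Select(kv => kv.Key));
        var rank = new Dictionary<string, int>(StringComparer.Ordinal);
        while (ready.Count > 0)
        {
            var n = ready.Dequeue();
            rank[n] = rank.Count;
            foreach (var d in _dependents[n])
                if (--inDegree[d] == 0) ready.Enqueue(d);
        }
        if (rank.Count != _dependents.Count)
            throw new InvalidOperationException("Dependency cycle detected");
        return rank;
    }
}
```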

### Core.VirtualTags-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:64-65`, `:72-73` |
| Status | Open |

**Description:** `DirectDependencies` and `DirectDependents` allocate a fresh empty
`HashSet<string>` on every call for an unregistered node. `DirectDependents` is called
inside the `TopologicalSort` Kahn loop and the `CascadeAsync` DFS, so for a graph with
many leaf driver tags this allocates a throwaway set per leaf per sort. Minor, but it is
on the change-cascade path.

**Recommendation:** Return a shared static empty set for the miss case instead of
allocating each time.

**Resolution:** _(open)_

### Core.VirtualTags-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/ITagUpstreamSource.cs:18`, `VirtualTagContext.cs:30`, `VirtualTagDefinition.cs:28` |
| Status | Open |

**Description:** Several XML docs reference component names that do not exist in the
codebase. `ITagUpstreamSource` XML doc says the subscription path "feeds the engine's
ChangeTriggerDispatcher" -- there is no ChangeTriggerDispatcher; the actual path is
`OnUpstreamChange` then `CascadeAsync`. `VirtualTagDefinition`'s TimerInterval and
`VirtualTagContext` docs reference an EvaluationPipeline that likewise does not exist;
the real logic is the `EvaluateInternalAsync` method inside `VirtualTagEngine`. Stale
type names in XML docs mislead maintainers searching for the named component.

**Recommendation:** Update the XML docs to name the real type and members
(`VirtualTagEngine`, `CascadeAsync`, `EvaluateInternalAsync`) or drop the specific name
in favour of a behavioural description.

**Resolution:** _(open)_

### Core.VirtualTags-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:404-409` |
| Status | Open |

**Description:** `VirtualTagState` records a Writes set (the `ctx.SetVirtualTag` targets
extracted by `DependencyExtractor`), but nothing in the engine reads it -- it is captured
at `Load` and never used. Declared write targets are not validated against the registered
tag set at publish time (a script writing to a non-existent virtual path is only caught
at runtime by `OnScriptSetVirtualTag`'s warning-and-drop), and they do not contribute to
the dependency graph. Either the field is dead state or an intended publish-time
validation is missing.

**Recommendation:** Use Writes to validate at `Load` that every `ctx.SetVirtualTag`
target resolves to a registered virtual tag (adding an entry to `compileFailures` on a
miss), so an operator typo is caught at publish rather than silently dropped at runtime.
If validation is deliberately deferred, remove the unused field or comment why it is
retained.

**Resolution:** _(open)_

### Core.VirtualTags-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` |
| Status | Open |

**Description:** Several behaviours of the engine have no test coverage:
(1) the cold-start `AreInputsReady` guard -- no test exercises an upstream that is
null/Bad at evaluation time and asserts the resulting tag state (see
Core.VirtualTags-002);
(2) `ctx.SetVirtualTag` cascading to a dependent of the written tag -- the existing test
only checks the written tag itself, so the gap in Core.VirtualTags-001 is invisible to
the suite;
(3) the `OnScriptSetVirtualTag` warning path for a write to a non-registered path;
(4) `EvaluateOneAsync` throwing `ArgumentException` for an unregistered path;
(5) `CoerceResult` failure mapping to BadInternalError (only the success coercion
double-to-int32 is tested);
(6) duplicate Path values in a `Load` definition list (see Core.VirtualTags-003);
(7) `Read`/`Subscribe`/`EvaluateOneAsync` calls before `Load` (the `EnsureLoaded` guard).

**Recommendation:** Add unit tests for each path above. Items (1), (2), and (6) directly
correspond to open correctness findings and would have caught them.

**Resolution:** _(open)_

### Core.VirtualTags-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:266-270` |
| Status | Open |

**Description:** `DependencyCycleException.BuildMessage` renders each cycle as
`string.Join(" -> ", c) + " -> " + c[0]`, presenting the SCC member list as a traversable
edge path that loops back to its first element. Tarjan's algorithm returns the members of
a strongly-connected component in stack-pop order, which is not guaranteed to be a valid
edge sequence -- for an SCC larger than 2 nodes the printed "A -> B -> C -> A" may list
edges that do not exist. The message can therefore mislead an operator debugging a cycle
into looking for an edge that is not in their config.

**Recommendation:** Either label the output as "cycle members" (a set, not an ordered
path) rather than rendering arrows, or reconstruct an actual cycle path within the SCC
(a single DFS back-edge walk) before formatting.

**Resolution:** _(open)_
207
code-reviews/Core/findings.md
Normal file
@@ -0,0 +1,207 @@
# Code Review — Core

| Field | Value |
|---|---|
| Module | `src/Core/ZB.MOM.WW.OtOpcUa.Core` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Core-001, Core-002, Core-003 |
| 2 | OtOpcUa conventions | Core-004 |
| 3 | Concurrency & thread safety | Core-005, Core-006 |
| 4 | Error handling & resilience | Core-007, Core-008 |
| 5 | Security | Core-002 |
| 6 | Performance & resource management | Core-009 |
| 7 | Design-document adherence | Core-002, Core-003 |
| 8 | Code organization & conventions | Core-010 |
| 9 | Testing coverage | Core-011 |
| 10 | Documentation & comments | Core-012 |

## Findings

### Core-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/UserAuthorizationState.cs:50-68` |
| Status | Open |

**Description:** `NeedsRefresh` can never return `true` with the default field values. `AuthCacheMaxStaleness` defaults to 5 minutes and `MembershipFreshnessInterval` defaults to 15 minutes. `NeedsRefresh(utcNow)` is defined as `!IsStale(utcNow) && elapsed > MembershipFreshnessInterval`, i.e. it needs `elapsed > 15 min` AND `elapsed <= 5 min` simultaneously -- an empty set. The session crosses the staleness ceiling (5 min) and fails closed long before it ever reaches the 15-minute freshness boundary that is supposed to signal "kick off an async re-resolution while still serving cached memberships." Decision #151 / #152 in `docs/v2/implementation/phase-6-2-authorization-runtime.md` intends the freshness window (15 min, re-resolve) to be the inner trigger and the staleness ceiling to be the outer hard limit; with these defaults the ordering is inverted, so the "refresh while warm" path is dead code and every long-lived session hard-fails authorization after 5 minutes.

**Recommendation:** Either swap the defaults so `MembershipFreshnessInterval` (e.g. 5 min) is strictly less than `AuthCacheMaxStaleness` (e.g. 15 min) -- matching the doc's stated intent -- or, if the 5/15 values are correct, redefine which window is the refresh trigger and which is the fail-closed ceiling. Add a unit test asserting `NeedsRefresh` returns `true` for at least one point in time with the production defaults.

**Resolution:** _(open)_
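
The empty-set argument in Core-001 can be checked mechanically. The sketch below is a simplified model, not the real `UserAuthorizationState`: it takes the elapsed time directly instead of deriving it from `utcNow`, and `AuthWindowsSketch` and `NeedsRefreshIsReachable` are hypothetical names. It assumes `IsStale` means `elapsed > AuthCacheMaxStaleness`, as the description implies.

```csharp
using System;

public sealed class AuthWindowsSketch
{
    // Defaults as reported in the finding.
    public TimeSpan AuthCacheMaxStaleness { get; init; } = TimeSpan.FromMinutes(5);
    public TimeSpan MembershipFreshnessInterval { get; init; } = TimeSpan.FromMinutes(15);

    public bool IsStale(TimeSpan elapsed) => elapsed > AuthCacheMaxStaleness;

    public bool NeedsRefresh(TimeSpan elapsed) =>
        !IsStale(elapsed) && elapsed > MembershipFreshnessInterval;

    // Scans elapsed times from zero to the horizon in one-second steps and
    // reports whether NeedsRefresh is ever true.
    public bool NeedsRefreshIsReachable(TimeSpan horizon)
    {
        for (var t = TimeSpan.Zero; t <= horizon; t += TimeSpan.FromSeconds(1))
            if (NeedsRefresh(t)) return true;
        return false;
    }
}
```

With the 5/15 defaults the scan finds no reachable point; swapping the two values (freshness 5 min, staleness 15 min) opens the intended refresh window between the two boundaries. The same scan is a reasonable shape for the unit test the recommendation asks for.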

### Core-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/TriePermissionEvaluator.cs:24-50` |
| Status | Open |

**Description:** `TriePermissionEvaluator.Authorize` never compares the session's `AuthGenerationId` against the generation of the trie it evaluates against. It calls `_cache.GetTrie(scope.ClusterId)` -- the current-generation shortcut -- and authorizes against whatever generation the cache happens to hold. `UserAuthorizationState` carries `AuthGenerationId` precisely so a stale session can be detected, and the Phase 6.2 design (`phase-6-2-authorization-runtime.md` adversarial-review item #3 "Redundancy-safe invalidation", plus the §Scope `PermissionTrieCache + freshness` row) requires the hot-path call to look up `CurrentGenerationId` and force a re-evaluation on mismatch. As written, a session bound at generation N silently evaluates against generation N+1 the instant another node publishes -- grants added or removed in N+1 take effect for that session without the intended generation-stamp re-check, and the provenance returned in `AuthorizationDecision` misreports which generation produced the verdict.

**Recommendation:** In `Authorize`, after resolving the trie, compare `trie.GenerationId` to `session.AuthGenerationId`. On mismatch either fetch the session's bound generation via `_cache.GetTrie(clusterId, session.AuthGenerationId)` and evaluate against it, or signal the caller to re-resolve the session's auth state before retrying. Add a test for the publish-during-session scenario.

**Resolution:** _(open)_

### Core-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` |
| Status | Open |

**Description:** `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffected, but the `MatchedGrant.Scope` carried in `AuthorizationDecision.Provenance` is wrong for Galaxy nodes: a grant anchored at a namespace-root folder and a grant anchored at a deep sub-folder both report `Equipment`, and a namespace-level grant is indistinguishable from a folder-level grant in the audit trail and the Admin UI "Probe this permission" diagnostic. The Phase 6.2 design (adversarial-review item #6) calls for a dedicated `FolderSegment` scope level. The current code is a known shortcut but references only an untracked "TODO" with no issue ID.

**Recommendation:** Add a `FolderSegment` member to `NodeAclScopeKind` and use it in `WalkSystemPlatform` and `PermissionTrieBuilder` so Galaxy folder grants report their true scope. If the enum change is deferred, file a tracked issue and reference its ID in the code comment.

**Resolution:** _(open)_

### Core-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` |
| Status | Open |

**Description:** `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. The server host has no synchronization context so behaviour is currently correct, but the inconsistency is a maintenance hazard and a deviation from the established convention in `Core.Resilience`.

**Recommendation:** Add `.ConfigureAwait(false)` to the three awaited calls in `DriverHost.RegisterAsync`, `UnregisterAsync`, and `DisposeAsync`.

**Resolution:** _(open)_

### Core-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` |
| Status | Open |

**Description:** `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `Install` and the indexer write in `Prune` race: `Prune` can read an entry, `Install` then adds a newer generation via `AddOrUpdate`, and `Prune`'s unconditional indexer write then overwrites the entry -- silently dropping the just-installed newest generation and its `Current` pointer. The class is documented as a process-singleton accessed on the hot path while publishes install new tries, so the race is reachable.

**Recommendation:** Make `Prune` use an atomic compare-and-swap loop -- `_byCluster.TryUpdate(clusterId, prunedEntry, observedEntry)` retried until it succeeds or the key is gone -- or perform the prune inside an `AddOrUpdate` update factory.

**Resolution:** _(open)_
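
A hedged sketch of the compare-and-swap loop recommended for Core-005. `GenerationCache` and this shape of `ClusterEntry` are hypothetical simplifications of `PermissionTrieCache` (generations are plain `long` values, with no tries attached); only `ConcurrentDictionary.TryUpdate` and `AddOrUpdate` are real APIs. `TryUpdate` succeeds only if the stored value still equals the observed one, so a concurrent `Install` forces a retry instead of being overwritten.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical immutable entry: the current generation plus all retained generations.
public sealed record ClusterEntry(long Current, ImmutableSortedSet<long> Generations);

public sealed class GenerationCache
{
    private readonly ConcurrentDictionary<string, ClusterEntry> _byCluster = new();

    public void Install(string clusterId, long generation) =>
        _byCluster.AddOrUpdate(
            clusterId,
            _ => new ClusterEntry(generation, ImmutableSortedSet.Create(generation)),
            (_, e) => new ClusterEntry(Math.Max(e.Current, generation), e.Generations.Add(generation)));

    public void Prune(string clusterId, int keepLatest)
    {
        if (keepLatest < 1) throw new ArgumentOutOfRangeException(nameof(keepLatest));

        // Retry until the swap succeeds, nothing needs pruning, or the key is gone.
        while (_byCluster.TryGetValue(clusterId, out var observed))
        {
            if (observed.Generations.Count <= keepLatest) return;

            var kept = observed.Generations
                .Skip(observed.Generations.Count - keepLatest) // sorted ascending: keep the newest
                .ToImmutableSortedSet();
            var pruned = observed with { Generations = kept };

            // Fails if Install replaced the entry since the read above.
            if (_byCluster.TryUpdate(clusterId, pruned, observed)) return;
        }
    }

    public ClusterEntry? Get(string clusterId) =>
        _byCluster.TryGetValue(clusterId, out var e) ? e : null;
}
```

The entry is immutable, so every `Install` produces a new set instance and the comparison inside `TryUpdate` reliably detects the interleaving described in the finding.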

### Core-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |

**Description:** `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (still subscribed, never unsubscribed because `Dispose` only removes the field's current value). Every alarm transition would then be delivered to its sink twice. The address-space rebuild path on Galaxy redeploy (`DeployWatcher` → `IRediscoverable.OnRediscoveryNeeded` → server rebuilds the address space) is exactly the scenario where a node manager could legitimately be re-walked. There is also no check of the `_disposed` flag at the top of the method.

**Recommendation:** Either guard `BuildAddressSpaceAsync` so a second call throws `InvalidOperationException` (document it single-shot), or unsubscribe the previous `_alarmForwarder` and clear `_alarmSinks` before re-walking. Also check `_disposed` and throw `ObjectDisposedException` if already disposed.

**Resolution:** _(open)_

### Core-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` |
| Status | Open |

**Description:** `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and carries no host association, so an unsubscribe for a subscription created against host B runs through host A's resilience pipeline. In a multi-host driver this charges the wrong host's circuit breaker / bulkhead and, if host A is open while host B is healthy, can spuriously block a valid unsubscribe. The XML doc claims it routes "for parity" with `SubscribeAsync` but subscribe is per-host and unsubscribe is not.

**Recommendation:** Carry the resolved host on the `IAlarmSubscriptionHandle` (or in a handle→host map kept by `AlarmSurfaceInvoker`) so `UnsubscribeAsync` routes through the same host's pipeline the subscription was created on.

**Resolution:** _(open)_

### Core-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` |
| Status | Open |

**Description:** The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 -- the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception from `discovery.DiscoverAsync` propagates straight out unhandled, and nothing here marks a subtree Faulted. The isolation is presumably done by the server-layer caller, but the comment asserts behaviour this class does not implement.

**Recommendation:** Either implement the documented isolation in `GenericDriverNodeManager`, or correct the XML doc to state that exception isolation is the caller's responsibility and name the type that performs it.

**Resolution:** _(open)_

### Core-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` |
| Status | Open |

**Description:** `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path it rebuilds a fresh `DriverResilienceOptions` and a one-entry dictionary on every non-idempotent write, and the redundant accessor calls could observe two different snapshots if an Admin edit lands between them. Phase 6.1 budgets a 1% pipeline overhead; this is unnecessary allocation plus a minor consistency hazard.

**Recommendation:** Capture `var options = _optionsAccessor();` once at the top of the non-idempotent branch and derive both the `with` and the `Resolve` call from that snapshot. Consider caching the no-retry pipeline keyed on `(hostName, non-idempotent)`.

**Resolution:** _(open)_

### Core-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` |
| Status | Open |

**Description:** `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will make `Resolve` throw `KeyNotFoundException` at runtime on the capability hot path rather than failing at build time. The two are coupled by convention only.

**Recommendation:** Either add a fallback to `Resolve` that returns a conservative policy (and logs the miss), or add a unit-test invariant asserting every `DriverCapability` value is present in each tier's default table.

**Resolution:** _(open)_
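
The unit-test invariant suggested in Core-010 might be built on a helper like the one below. This is a sketch under assumptions: `Capability` is a hypothetical stand-in for `DriverCapability`, `TierTableInvariant` is an illustrative name, and `Enum.GetValues<TEnum>()` requires .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real DriverCapability enum.
public enum Capability { Read, Write, Subscribe, Discover }

public static class TierTableInvariant
{
    // Returns every enum member that has no entry in the given defaults table.
    public static IReadOnlyList<TEnum> MissingKeys<TEnum, TValue>(
        IReadOnlyDictionary<TEnum, TValue> table)
        where TEnum : struct, Enum
    {
        return Enum.GetValues<TEnum>()
            .Where(value => !table.ContainsKey(value))
            .ToList();
    }
}
```

A test would call `MissingKeys` once per tier table and assert the result is empty, so adding an enum member without updating all three tables fails in CI rather than on the hot path.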

### Core-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` |
| Status | Open |

**Description:** `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for deterministic tests, not for production"). The fallback silently produces a structurally incorrect trie when `scopePaths` is null or a row's `ScopeId` is missing -- a UnsLine-scoped grant ends up as a direct child of the root, so `WalkEquipment` / `WalkSystemPlatform` never reach it and the grant is effectively dropped, with no diagnostic. There is no test asserting the production multi-level descent versus the fallback.

**Recommendation:** Add unit tests covering `Build` with `scopePaths` producing the correct multi-level trie and the missing-`ScopeId` fallback. Have `Descend` surface a diagnostic (or throw outside test configuration) when a sub-cluster row cannot be located in `scopePaths`.

**Resolution:** _(open)_

### Core-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` |
| Status | Open |

**Description:** Two stale doc comments. (1) `WedgeDetector` -- the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is documented as stateless; the comment describes nothing the constructor does. (2) `DriverHealthReport` -- the `<remarks>` state matrix lists Unknown, Initializing, Healthy, Degraded, Faulted but `Aggregate` (lines 42-44) also folds `DriverState.Reconnecting` into the Degraded verdict. `Reconnecting` is a real `DriverState` member absent from the documented matrix.

**Recommendation:** Replace the `WedgeDetector` constructor `<summary>` with an accurate description (e.g. "Construct with the wedge-detection threshold; values below 60 s clamp to 60 s"). Add `Reconnecting` to the `DriverHealthReport` `<remarks>` state matrix and state it maps to Degraded.

**Resolution:** _(open)_
238
code-reviews/Driver.AbCip.Cli/findings.md
Normal file
@@ -0,0 +1,238 @@
# Code Review — Driver.AbCip.Cli

| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.AbCip.Cli-001, Driver.AbCip.Cli-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.AbCip.Cli-003 |
| 4 | Error handling & resilience | Driver.AbCip.Cli-001, Driver.AbCip.Cli-004 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.AbCip.Cli-005 |
| 7 | Design-document adherence | Driver.AbCip.Cli-006 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.AbCip.Cli-007 |
| 10 | Documentation & comments | Driver.AbCip.Cli-008 |
|
||||
|
||||
## Findings
|
||||
|
||||
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
|
||||
never reused. Findings are never deleted — close them by changing Status and
|
||||
completing Resolution. -->
|
||||
|
||||
### Driver.AbCip.Cli-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ParseValue` parses every numeric Logix type with the BCL `*.Parse`
|
||||
methods (`sbyte.Parse`, `short.Parse`, `int.Parse`, `float.Parse`, ...). These throw
|
||||
the raw `FormatException` and `OverflowException` on bad operator input. The module's
|
||||
own test `ParseValue_non_numeric_for_numeric_types_throws` confirms a raw
|
||||
`FormatException` escapes for `DInt`. Meanwhile the `Bool` branch and the `_ =>`
|
||||
default branch throw the CLI-friendly `CliFx.Exceptions.CommandException` with an
|
||||
actionable message. The result is inconsistent operator UX: a typo in a boolean
|
||||
value prints "Boolean value 'x' is not recognised...", but a typo in a numeric
|
||||
value (`write -v 12x --type DInt`, or an out-of-range `write -v 99999999999 --type
|
||||
Int`) escapes uncaught and CliFx renders a full .NET stack trace instead of a
|
||||
one-line error. CliFx only formats `CommandException` cleanly.
|
||||
|
||||
**Recommendation:** Wrap the numeric `*.Parse` calls (or the whole `switch`) in a
|
||||
`try`/`catch (Exception ex) when (ex is FormatException or OverflowException)` that
|
||||
rethrows as a `CommandException` with the raw value, the target `--type`, and the
|
||||
valid range — mirroring the `ParseBool` failure message.
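
A minimal sketch of the wrapper described above, not the module's actual code. `CommandException` is a local stand-in for `CliFx.Exceptions.CommandException` so the block compiles on its own; `ValueParser`, `ParseNumeric`, and the enum subset are hypothetical names, and the invariant-culture choice is an assumption.

```csharp
using System;
using System.Globalization;

// Local stand-in for CliFx.Exceptions.CommandException (assumption for this sketch).
public sealed class CommandException : Exception
{
    public CommandException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical subset of the driver's AbCipDataType enum.
public enum AbCipDataType { SInt, Int, DInt, Real }

public static class ValueParser
{
    public static object ParseNumeric(string raw, AbCipDataType type)
    {
        try
        {
            // Each arm is cast to object so the switch does not widen every result to float.
            return type switch
            {
                AbCipDataType.SInt => (object)sbyte.Parse(raw, CultureInfo.InvariantCulture),
                AbCipDataType.Int => (object)short.Parse(raw, CultureInfo.InvariantCulture),
                AbCipDataType.DInt => (object)int.Parse(raw, CultureInfo.InvariantCulture),
                AbCipDataType.Real => (object)float.Parse(raw, CultureInfo.InvariantCulture),
                _ => throw new ArgumentOutOfRangeException(nameof(type)),
            };
        }
        catch (Exception ex) when (ex is FormatException or OverflowException)
        {
            // Rethrow as the CLI-friendly exception type with the raw value and target type.
            throw new CommandException($"Value '{raw}' is not a valid {type}.", ex);
        }
    }
}
```

The exception filter keeps unrelated failures (for example the `ArgumentOutOfRangeException` from the default arm) untouched, so only operator-input errors are rewritten.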
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ProbeCommand`, `ReadCommand`, and `SubscribeCommand` expose
|
||||
`--type` as a free `AbCipDataType` enum option with no exclusion of
|
||||
`AbCipDataType.Structure`. Only `WriteCommand` rejects `Structure` (with an explicit
|
||||
`CommandException`). Passing `probe/read/subscribe --type Structure` synthesises a
|
||||
tag with `DataType = Structure` and no `Members` declared. The driver read path
|
||||
treats a memberless Structure tag as a black box and routes it to the per-tag
|
||||
fallback, where `runtime.DecodeValue(AbCipDataType.Structure, ...)` runs with no
|
||||
declared layout — the operator gets either an opaque value or a confusing status
|
||||
code rather than the clean "Structure writes need an explicit member layout"
|
||||
guidance `write` gives. The `read` doc comment even claims "UDT / Structure reads
|
||||
are out of scope here", but the code does not enforce it.
|
||||
|
||||
**Recommendation:** Reject `AbCipDataType.Structure` in `ProbeCommand`,
|
||||
`ReadCommand`, and `SubscribeCommand` `ExecuteAsync` with the same `CommandException`
|
||||
pattern `WriteCommand` uses, or factor a shared `RejectStructure(DataType)` guard
|
||||
into `AbCipCommandBase`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The `OnDataChange` handler writes change lines to `console.Output`
|
||||
(a `TextWriter`) from the driver's poll-engine callback thread, while the command's
|
||||
main flow concurrently writes the "Subscribed to ... Ctrl+C to stop." line on the
|
||||
CLI thread. `TextWriter.WriteLine` is not guaranteed thread-safe; concurrent writes
|
||||
from the poll thread and the main thread can interleave or, in the worst case,
|
||||
corrupt buffered output. The window is small (one main-thread write right after
|
||||
`SubscribeAsync`) but it exists, and any future addition of main-thread output
|
||||
during the watch loop widens it.
|
||||
|
||||
**Recommendation:** Emit the "Subscribed..." banner before registering the
|
||||
`OnDataChange` handler (or before `SubscribeAsync`), or guard all `console.Output`
|
||||
writes during the subscription with a shared lock so poll-thread and main-thread
|
||||
output cannot interleave.
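
One way to get the shared-lock behaviour is the BCL's `TextWriter.Synchronized` wrapper, sketched below under the assumption that both threads write through the same wrapped instance. The `StringWriter` stands in for `console.Output`, and `SynchronizedOutputDemo` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SynchronizedOutputDemo
{
    public static string Run()
    {
        var buffer = new StringWriter();

        // TextWriter.Synchronized returns a wrapper that serializes each Write/WriteLine call.
        TextWriter output = TextWriter.Synchronized(buffer);

        // Stand-in for the poll-engine callback thread.
        Task poll = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++)
                output.WriteLine($"change {i}");
        });

        // Stand-in for the main CLI flow writing its banner concurrently.
        output.WriteLine("Subscribed. Ctrl+C to stop.");

        poll.Wait();
        return buffer.ToString();
    }
}
```

The wrapper only protects writes that go through it; any code that still holds the unwrapped writer bypasses the lock.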
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`; `AbCipCommandBase.cs:26-34` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `--interval-ms` (`IntervalMs`) is taken verbatim and passed as
|
||||
`TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A
|
||||
zero or negative value produces a non-positive `TimeSpan`; the option description
|
||||
claims "PollGroupEngine floors sub-250ms values" but says nothing about `0` or
|
||||
negatives, and the flooring behaviour is the engine's, not the CLI's — relying on a
|
||||
downstream component to sanitise operator input is fragile. `--timeout-ms` on
|
||||
`AbCipCommandBase` has the same gap (a negative value yields a negative `TimeSpan`).
|
||||
|
||||
**Recommendation:** Validate `IntervalMs > 0` and `TimeoutMs > 0` at the top of
|
||||
`ExecuteAsync` / in `AbCipCommandBase`, throwing a `CommandException` with the
|
||||
accepted range when out of bounds.
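
A small sketch of such a guard, assuming the options are `int` milliseconds. `OptionGuards` and `RequirePositiveMs` are hypothetical names, and `CommandException` is again a local stand-in for the CliFx type.

```csharp
using System;

// Local stand-in for CliFx.Exceptions.CommandException (assumption for this sketch).
public sealed class CommandException : Exception
{
    public CommandException(string message) : base(message) { }
}

public static class OptionGuards
{
    // Converts a millisecond option to a TimeSpan, rejecting zero and negative values.
    public static TimeSpan RequirePositiveMs(int value, string optionName)
    {
        if (value <= 0)
        {
            throw new CommandException(
                $"{optionName} must be a positive number of milliseconds (got {value}).");
        }
        return TimeSpan.FromMilliseconds(value);
    }
}
```

Called at the top of `ExecuteAsync`, for example `OptionGuards.RequirePositiveMs(IntervalMs, "--interval-ms")`, this keeps the check in the CLI rather than relying on the engine's flooring.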
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ConfigureLogging` assigns a freshly created Serilog logger to the
|
||||
process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived
|
||||
one-shot command (`probe`, `read`, `write`) the process exit flushes the console
|
||||
sink, so the practical impact is nil. For `subscribe` — a long-running command
|
||||
terminated by Ctrl+C — buffered log lines emitted just before cancellation can be
|
||||
lost on abrupt exit. (This lives in the shared `Driver.Cli.Common` base, so it is
|
||||
noted here as it affects the AB CIP CLI; the canonical fix belongs in that shared
|
||||
module's review.)
|
||||
|
||||
**Recommendation:** Register `Log.CloseAndFlush()` on process exit (e.g. via
|
||||
`AppDomain.ProcessExit` or a `finally` in the command), or have the CLI use a
|
||||
disposable logger scoped to `ExecuteAsync`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-006
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout`
|
||||
property with a getter derived from `TimeoutMs` and an empty `init` body
|
||||
(`init { /* driven by TimeoutMs */ }`). Because the override has no
|
||||
`[CommandOption]` attribute, CliFx never binds it, so the empty `init` is unreachable
|
||||
in normal CLI use. However, an empty `init` accessor silently discards any
|
||||
assignment — if a future caller or test constructs the command via an object
|
||||
initializer (`new ReadCommand { Timeout = ... }`) the assignment is silently dropped
|
||||
with no compiler warning. This is a latent correctness trap rather than a current
|
||||
bug.
|
||||
|
||||
**Recommendation:** Either drop the `init` accessor entirely (make the override a
|
||||
get-only expression-bodied property) or have the empty `init` throw
|
||||
`NotSupportedException` to make the "driven by TimeoutMs" contract explicit and
|
||||
fail-fast.
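
A sketch of the fail-fast variant, assuming the base property is declared `{ get; init; }` (in which case a concrete override has to supply both accessors). The class shapes and the 2000 ms default are illustrative, not taken from the module.

```csharp
using System;

// Simplified stand-in for the shared base class (assumed shape).
public abstract class DriverCommandBase
{
    public abstract TimeSpan Timeout { get; init; }
}

public class ReadCommand : DriverCommandBase
{
    // Hypothetical default; the real option default is not shown in this review.
    public int TimeoutMs { get; init; } = 2000;

    public override TimeSpan Timeout
    {
        get => TimeSpan.FromMilliseconds(TimeoutMs);
        // Fail fast instead of silently discarding the assignment.
        init => throw new NotSupportedException(
            "Timeout is derived from TimeoutMs; set TimeoutMs instead.");
    }
}
```

With this shape, `new ReadCommand { Timeout = ... }` throws at construction time instead of quietly keeping the `TimeoutMs`-derived value.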
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-007
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The only test file covers `WriteCommand.ParseValue` and
|
||||
`ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for
|
||||
`AbCipCommandBase.BuildOptions` (the flag-to-`AbCipDriverOptions` mapping that all
|
||||
four commands depend on) or `DriverInstanceId`. `BuildOptions` is pure and trivially
|
||||
unit-testable yet untested: a regression that, say, flipped `EnableAlarmProjection`
|
||||
back on or dropped `Probe.Enabled = false` would not be caught — and the comment
|
||||
explicitly warns the probe loop "would race the operator's own reads", so that
|
||||
mapping is behaviourally load-bearing. The `ExecuteAsync` bodies are reasonably left
|
||||
untested (they need a fake `AbCipDriver` or hardware), consistent with the other
|
||||
driver CLIs.
|
||||
|
||||
**Recommendation:** Add unit tests asserting `BuildOptions` produces
|
||||
`Probe.Enabled == false`, `EnableControllerBrowse == false`,
|
||||
`EnableAlarmProjection == false`, the expected single `AbCipDeviceOptions`
|
||||
(`HostAddress`, `PlcFamily`, `DeviceName`), the supplied tag list, and the `Timeout`
|
||||
derived from `TimeoutMs`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip.Cli-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `docs/Driver.AbCip.Cli.md:8-9` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `docs/Driver.AbCip.Cli.md` opens with "Second of four driver
|
||||
test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four"
|
||||
contradicts the chain that follows it (five names) and contradicts
|
||||
`docs/DriverClis.md`, which documents six CLIs (Modbus, AB CIP, AB Legacy, S7,
|
||||
TwinCAT, FOCAS). The FOCAS CLI shipped alongside the Tier-C work, so the AB CIP
|
||||
doc's "four" and the truncated chain are both stale.
|
||||
|
||||
**Recommendation:** Update the sentence to "Second of six driver test-client CLIs"
|
||||
and complete the chain (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT -> FOCAS), or
|
||||
drop the explicit count and link `docs/DriverClis.md` as the authoritative roster.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
252
code-reviews/Driver.AbCip/findings.md
Normal file
@@ -0,0 +1,252 @@
|
||||
# Code Review — Driver.AbCip
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 15 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Driver.AbCip-001, Driver.AbCip-002, Driver.AbCip-003, Driver.AbCip-004, Driver.AbCip-005 |
|
||||
| 2 | OtOpcUa conventions | Driver.AbCip-006, Driver.AbCip-007 |
|
||||
| 3 | Concurrency & thread safety | Driver.AbCip-008, Driver.AbCip-009 |
|
||||
| 4 | Error handling & resilience | Driver.AbCip-010, Driver.AbCip-011 |
|
||||
| 5 | Security | No issues found |
|
||||
| 6 | Performance & resource management | Driver.AbCip-012 |
|
||||
| 7 | Design-document adherence | Driver.AbCip-013 |
|
||||
| 8 | Code organization & conventions | No issues found |
|
||||
| 9 | Testing coverage | Driver.AbCip-014 |
|
||||
| 10 | Documentation & comments | Driver.AbCip-015 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.AbCip-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `InitializeAsync(string driverConfigJson, ...)` never reads `driverConfigJson`. It builds all device/tag state from `_options`, captured at construction time. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync(driverConfigJson, ...)` and the JSON it is handed is silently discarded. `ReinitializeAsync` is documented (class remarks, lines 18-21) as the Tier-B escape hatch and is the IDriver entry point for picking up changed config. As written, a reinitialize with an updated config JSON (new device, new tag, changed timeout) applies none of the changes; the driver keeps running stale construction-time options. There is no validation that the passed JSON even matches the live options.
|
||||
|
||||
**Recommendation:** Either parse `driverConfigJson` inside `InitializeAsync` (re-deriving `AbCipDriverOptions` the way `AbCipDriverFactoryExtensions.CreateInstance` does, so config changes take effect on reinit), or, if config is intentionally immutable for the instance lifetime, document explicitly that AbCip ignores the parameter and assert the JSON is structurally identical to the construction options. Silently discarding it is the worst of both options: callers believe the new config was applied, and nothing tells them otherwise.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `AbCipStatusMapper.cs:65-78` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `MapLibplctagStatus` maps negative libplctag codes that do not match the libplctag.NET `Status` enum / native `libplctag.h` constants. `LibplctagTagRuntime.GetStatus()` returns `(int)_tag.GetStatus()`, the underlying value of the `Status` enum, whose members carry the native `PLCTAG_ERR_*` integer values. The real constants are `PLCTAG_ERR_BAD_CONNECTION = -7` (the only one the code gets right), `PLCTAG_ERR_NOT_FOUND = -18` (code expects -14), `PLCTAG_ERR_NOT_ALLOWED = -19` (code expects -16), `PLCTAG_ERR_OUT_OF_BOUNDS = -22` (code expects -17), `PLCTAG_ERR_TIMEOUT = -32` (code expects -5). Consequently a real timeout, not-found, not-allowed, or out-of-bounds error all fall through the switch to the `_ => BadCommunicationError` default. The driver reports `BadCommunicationError` for a non-existent tag instead of `BadNodeIdUnknown`, for a read-only tag instead of `BadNotWritable`, and for a timeout instead of `BadTimeout`. This defeats the transient-vs-permanent error classification the resilience pipeline relies on.
|
||||
|
||||
**Recommendation:** Replace the hand-typed integer literals with the libplctag.NET `Status` enum members (Status.ErrorTimeout, Status.ErrorNotFound, Status.ErrorNotAllowed, Status.ErrorOutOfBounds, Status.ErrorBadConnection, etc.), or at minimum correct the integer values to -32 / -18 / -19 / -22. Map Status.Pending explicitly rather than treating "any positive value" as GoodMoreData.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `AbCipUdtMemberLayout.cs:32-54`, `AbCipDriver.cs:426-430`, `AbCipUdtReadPlanner.cs:48` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The whole-UDT read path (`ReadGroupAsync`) decodes each grouped member at the byte offset produced by `AbCipUdtMemberLayout.TryBuild`, which computes offsets purely from declaration order of the configured `AbCipStructureMember` list under natural-alignment rules. Logix does not guarantee that the controller lays UDT members out in declaration order: the Studio 5000 compiler reorders members (largest-first packing, BOOL host bytes, nested-struct padding) and the on-wire offsets only come from the CIP Template Object. The class remarks on `AbCipUdtMemberLayout` and `driver-specs.md` both acknowledge this. The decoder for the real shape (`CipTemplateObjectDecoder` / `AbCipTemplateCache`) exists and is populated by `FetchUdtShapeAsync`, but `ReadGroupAsync` never consults it: it always uses the declaration-only layout. For any UDT whose member declaration order in config differs from the controller's compiled layout, whole-UDT reads return values decoded from the wrong offsets: plausible-looking but wrong numbers, with no error raised.
|
||||
|
||||
**Recommendation:** In the read planner / `ReadGroupAsync`, prefer the cached `AbCipUdtShape` offsets (from `AbCipTemplateCache` / `FetchUdtShapeAsync`) when available, and only fall back to `AbCipUdtMemberLayout` declaration-order offsets when no template shape can be read. Even then, consider gating the declaration-only fast path behind an explicit opt-in flag, since it is correct only when the operator has hand-verified that the declaration order matches the controller.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `ToDriverDataType` maps `LInt`/`ULInt` to `DriverDataType.Int32` (a TODO comment notes the gap) and `Dt` to `Int32`. But `LibplctagTagRuntime.DecodeValueAt` returns an actual `long` for `LInt`/`ULInt` (`_tag.GetInt64`, `(long)_tag.GetUInt64`). The address space is built declaring an Int32 node while the driver hands the server a boxed `long` `DataValueSnapshot.Value` at runtime: a mismatch between the declared OPC UA data type and the runtime value type. For `LInt` values exceeding Int32.MaxValue there is data loss if any consumer narrows it. `UDInt` is declared Int32 but decoded as `(int)_tag.GetUInt32`, so values above int.MaxValue wrap to negative.
|
||||
|
||||
**Recommendation:** Either add Int64/UInt32/UInt64 to `DriverDataType` and map correctly, or, until that lands, decode `LInt`/`ULInt` consistently with the declared `Int32` type (and document the truncation), and decode `UDInt` as a value that fits Int32 semantics. The declared type and the runtime value type must agree.
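
To make the `UDInt` wrap concrete, the sketch below reproduces the cast shape described above on a plain `uint`. `UDIntNarrowingDemo` and its method names are hypothetical; the widening alternative assumes a 64-bit (or unsigned 32-bit) declared type becomes available.

```csharp
using System;

public static class UDIntNarrowingDemo
{
    // Same shape as the cast in the finding: the 32 bits are reinterpreted as signed.
    public static int NarrowUnchecked(uint raw) => unchecked((int)raw);

    // Lossless alternative: widen to long so values above int.MaxValue survive.
    public static long Widen(uint raw) => raw;
}
```

For a raw value of 3,000,000,000 the narrowing cast yields 3,000,000,000 - 2^32 = -1,294,967,296, while the widened value stays 3,000,000,000.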
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `AbCipDriver.cs:124-141` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** In `InitializeAsync`, when a `Structure` tag declares `Members`, the loop registers each fanned-out member into `_tagsByName` but the parent Structure tag itself is also left in `_tagsByName` (added at line 125 before the member check). A subsequent `ReadAsync` of the parent name routes through `ReadSingleAsync` then `DecodeValue(AbCipDataType.Structure, ...)` which returns `null` with `Good` status. A client reading the parent UDT node thus gets a Good/null snapshot rather than a fault or a structured value. Also, member registration does not check for name collisions: if two configured tags produce the same parent-dot-member key (or a member name collides with an independently-declared tag), the later silently overwrites the earlier with no diagnostic.
|
||||
|
||||
**Recommendation:** Decide the parent-Structure read contract explicitly: either do not register the bare parent name as a readable tag, or have the Structure read return a proper status. Add a duplicate-key check during `_tagsByName` population that throws an `InvalidOperationException` naming both colliding tags, consistent with the fail-fast validation `AbCipHostAddress` parsing already does.
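
A sketch of the duplicate-key check, under the assumption that the lookup is a plain `Dictionary` keyed by tag name. `TagDefinition`, `TagRegistry`, and the ordinal comparer are illustrative choices, not the driver's actual types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tag definition for this sketch.
public sealed record TagDefinition(string Name, string TagPath);

public static class TagRegistry
{
    public static Dictionary<string, TagDefinition> Build(IEnumerable<TagDefinition> tags)
    {
        var byName = new Dictionary<string, TagDefinition>(StringComparer.Ordinal);
        foreach (var tag in tags)
        {
            // TryAdd leaves the existing entry in place, so both colliding tags can be named.
            if (!byName.TryAdd(tag.Name, tag))
            {
                var existing = byName[tag.Name];
                throw new InvalidOperationException(
                    $"Duplicate tag key '{tag.Name}': '{existing.TagPath}' collides with '{tag.TagPath}'.");
            }
        }
        return byName;
    }
}
```

Using `TryAdd` rather than the indexer is what turns the silent overwrite into a diagnosable failure at initialization.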
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-006
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `driver-specs.md` makes the SafeHandle-wrapped native handle a non-negotiable Tier-B protection ("Wrap every libplctag handle in a SafeHandle with finalizer calling plc_tag_destroy"). The repo ships `PlcTagHandle : SafeHandle` for this, but it is dead code: `ReleaseHandle` is a permanent no-op (the comment says the `plc_tag_destroy` P/Invoke "is deferred to PR 3", well past the commit under review), and `DeviceState.TagHandles` is never populated anywhere in the driver. The real native lifetime is delegated to the libplctag.NET `Tag` object's own `Dispose()`. The mandated finalizer-backed leak protection therefore does not exist: if a `LibplctagTagRuntime` is garbage-collected without `Dispose` (owning thread crashes, exception bypasses the device dispose path), whether the native tag is freed depends entirely on whether the libplctag.NET `Tag` has its own finalizer. The driver code itself does not guarantee this, although the design requires it to.
|
||||
|
||||
**Recommendation:** Either delete `PlcTagHandle` and `DeviceState.TagHandles` as misleading dead scaffolding and document that native lifetime is owned by the libplctag.NET `Tag` finalizer (verifying that `Tag` actually has one), or finish the intended design by making `LibplctagTagRuntime` hold a real `PlcTagHandle` with a working `ReleaseHandle` that calls `plc_tag_destroy`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-007
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | OtOpcUa conventions |
|
||||
| Location | `AbCipDriver.cs` (whole file), `AbCipAlarmProjection.cs`, `LibplctagTagRuntime.cs` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. The driver has no logging at all: no `ILogger`/Serilog dependency is injected or used. Failure paths instead swallow exceptions into the `_health` string (`ReadSingleAsync`, `WriteAsync`, `FetchUdtShapeAsync` catch-all, `ProbeLoopAsync` empty catch, `AbCipAlarmProjection.RunPollLoopAsync` empty catch). An operator looking at server logs sees nothing for a probe loop failing every tick for hours, a template decode that silently returned null, or an alarm poll loop throwing every interval. The health surface carries only the last error message, so a transient error immediately overwrites a more important earlier one.
|
||||
|
||||
**Recommendation:** Inject an `ILogger` (Serilog) and log at least device init failures, per-call read/write transport errors (debounced), probe-loop failures, template-read failures, and alarm-poll-loop exceptions. The health surface is for state, not for the audit trail.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `AbCipDriver.cs:144-152`, `AbCipDriver.cs:169-183`, `AbCipDriver.cs:235-281` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Probe loops are started fire-and-forget (`_ = Task.Run(() => ProbeLoopAsync(state, ct), ct)`) and the resulting Task is never stored or awaited. `ShutdownAsync` cancels `state.ProbeCts`, then immediately disposes it, sets it null, and calls `state.DisposeHandles()` without waiting for `ProbeLoopAsync` to observe the cancellation and exit. Races: (1) the still-running probe loop may be mid-await against a `ProbeCts` that `ShutdownAsync` has already disposed, producing `ObjectDisposedException` on the loop thread; (2) `DisposeHandles` clears `Runtimes`/`ParentRuntimes` while a concurrent `ReadAsync`/`WriteAsync` from the alarm projection or a subscription poll could be iterating or adding to those plain `Dictionary` instances (not thread-safe), corrupting the dictionary or throwing; (3) the probe runtime created inside `ProbeLoopAsync` is never tracked by `DeviceState`, so `DisposeHandles` cannot dispose it; only the loop's own `finally` does, which may run after `ShutdownAsync` returns.
|
||||
|
||||
**Recommendation:** Store each probe Task on `DeviceState`; in `ShutdownAsync` cancel the CTS, then await Task.WhenAll (with a timeout) before disposing the CTS or the handles. Guard `Runtimes`/`ParentRuntimes` with a lock or switch to `ConcurrentDictionary`. Make `ShutdownAsync` idempotent and safe against in-flight `ReadAsync`/`WriteAsync`.
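
The ordering recommended above (cancel, await with a timeout, then dispose) can be sketched as follows. `ProbeHost`, the 10 ms tick, and the `Ticks` counter are illustrative stand-ins rather than the driver's `DeviceState`; `Task.WaitAsync(TimeSpan)` assumes .NET 6 or later.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ProbeHost
{
    private CancellationTokenSource? _cts;
    private Task? _probeTask;
    public int Ticks;

    public void Start()
    {
        _cts = new CancellationTokenSource();
        var token = _cts.Token;
        // Keep the Task so shutdown can await it instead of fire-and-forget.
        _probeTask = Task.Run(() => ProbeLoopAsync(token));
    }

    private async Task ProbeLoopAsync(CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                Interlocked.Increment(ref Ticks);
                await Task.Delay(10, ct);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path.
        }
    }

    public async Task ShutdownAsync(TimeSpan drainTimeout)
    {
        // Exchange makes a second call a no-op (idempotent shutdown).
        var cts = Interlocked.Exchange(ref _cts, null);
        if (cts is null) return;

        cts.Cancel();
        if (_probeTask is { } task)
        {
            try { await task.WaitAsync(drainTimeout); }
            catch (TimeoutException) { /* loop did not drain in time; continue teardown */ }
        }
        // Dispose only after the loop has been given the chance to exit.
        cts.Dispose();
    }
}
```

Handles owned by the device would be disposed after the same await, so the loop can no longer touch them once teardown proceeds.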
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-009
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act on a non-thread-safe `Dictionary` (`device.Runtimes` / `device.ParentRuntimes`). `ReadAsync` is `IReadable` and may be invoked concurrently: the server read path, each polled subscription loop, and the alarm projection poll loop all call `ReadAsync` independently. Two concurrent `ReadAsync` calls that both miss the cache for the same tag both create a `LibplctagTagRuntime`, both initialize it, and both write into the dictionary; the loser leaks an initialized native tag (never disposed, since only the dictionary value is disposed at shutdown), and concurrent `Dictionary` mutation can throw or corrupt the bucket structure. `WriteBitInDIntAsync` serializes the parent via a per-parent `SemaphoreSlim`, but `EnsureParentRuntimeAsync` still runs the same unguarded check-then-act on the shared `ParentRuntimes` dict.
|
||||
|
||||
**Recommendation:** Use `ConcurrentDictionary` for `Runtimes` and `ParentRuntimes`, creating the runtime via `GetOrAdd` with a lazily-initialized factory, or guard the ensure path with a per-device lock / `SemaphoreSlim`. Ensure the losing creator's runtime is disposed rather than leaked.
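
A sketch of the `GetOrAdd`-with-lazy-factory shape, assuming one runtime per tag name. `FakeRuntime` is a hypothetical stand-in for `LibplctagTagRuntime`, and `RuntimeCache` is an illustrative name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the native-backed tag runtime.
public sealed class FakeRuntime : IDisposable
{
    public static int Created;
    public FakeRuntime() => Interlocked.Increment(ref Created);
    public Task InitializeAsync() => Task.Delay(20);
    public void Dispose() { }
}

public sealed class RuntimeCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<FakeRuntime>>> _runtimes = new();

    public Task<FakeRuntime> EnsureAsync(string tagName)
    {
        // GetOrAdd may run the factory more than once under contention, but only one
        // Lazy is stored, and only that Lazy's Value ever creates a runtime.
        var lazy = _runtimes.GetOrAdd(
            tagName,
            _ => new Lazy<Task<FakeRuntime>>(
                () => CreateAsync(),
                LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }

    private static async Task<FakeRuntime> CreateAsync()
    {
        var runtime = new FakeRuntime();
        await runtime.InitializeAsync();
        return runtime;
    }
}
```

Because the losing callers never construct a runtime, there is nothing to leak. The trade-off is that a failed initialization leaves a faulted task cached until something removes the entry.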
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-010
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Once `EnsureTagRuntimeAsync` successfully creates and initializes a `LibplctagTagRuntime`, that runtime is cached for the lifetime of the device and never re-created on failure. If the underlying native tag enters a permanently-bad state (connection dropped, controller rebooted, tag handle invalidated by a PLC program download), every subsequent `ReadAsync`/`WriteAsync` reuses the same dead handle and returns errors forever. The probe loop does tear down and recreate its runtime after a failure, but the read/write path has no equivalent recovery; only a full `ReinitializeAsync` (itself broken, see Driver.AbCip-001) clears the cache. The normal data path should self-heal from a transient handle fault without operator-driven reinitialize.
|
||||
|
||||
**Recommendation:** On a non-zero libplctag status or transport exception in `ReadSingleAsync`/`ReadGroupAsync`/`WriteAsync`, evict the offending runtime from `device.Runtimes` (and dispose it) so the next call re-creates and re-initializes it. Mirror the probe loop's recreate-on-failure behavior.
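
A sketch of evict-on-failure, assuming a concurrent cache and treating any negative status as an error (positive values such as a pending status are left alone here, which is an assumption of this sketch). `FakeRuntime` and `SelfHealingCache` are hypothetical names; `TryRemove(KeyValuePair)` assumes .NET 5 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the native-backed tag runtime.
public sealed class FakeRuntime : IDisposable
{
    public bool Disposed { get; private set; }
    public int Status { get; set; } // 0 = OK, negative = error
    public void Dispose() => Disposed = true;
}

public sealed class SelfHealingCache
{
    private readonly ConcurrentDictionary<string, FakeRuntime> _runtimes = new();

    public FakeRuntime GetOrCreate(string tag) =>
        _runtimes.GetOrAdd(tag, _ => new FakeRuntime());

    public void EvictIfFaulted(string tag, FakeRuntime runtime)
    {
        if (runtime.Status >= 0) return;

        // Remove only if the cached entry is still the instance that failed, so a
        // concurrent caller's freshly created replacement is not thrown away.
        if (_runtimes.TryRemove(new KeyValuePair<string, FakeRuntime>(tag, runtime)))
            runtime.Dispose();
    }
}
```

The next `GetOrCreate` after an eviction builds a fresh runtime, which is the self-healing behaviour the data path currently lacks.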
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-011
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `AbCipDriver.cs:144-152`, `AbCipDriverOptions.cs:131-143` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `InitializeAsync` only starts probe loops when `_options.Probe.Enabled` is true AND `Probe.ProbeTagPath` is non-blank. When `Probe.Enabled` is true (the default) but `ProbeTagPath` is null (also the default; the doc comment says "PR 8 wires this up"), no probe runs at all and each device's `HostState` stays `HostState.Unknown` forever. `GetHostStatuses()` then reports every device as Unknown indefinitely with no warning. An operator who enables the probe but does not set a probe tag gets a silently inert health surface rather than an error or a log line.
|
||||
|
||||
**Recommendation:** When `Probe.Enabled` is true but no `ProbeTagPath` is configured, either fail initialization with a clear message, fall back to a family-default probe tag (the doc comment's stated intent), or at minimum log a warning that the probe is enabled but inert.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-012
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `LibplctagTemplateReader.cs:15-35`, `AbCipDriver.cs:88-92` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `LibplctagTemplateReader` is created per `FetchUdtShapeAsync` call, and each call constructs a fresh libplctag `Tag` for the @udt pseudo-tag, initializes it (a CIP connection handshake), reads, and disposes it. There is no reuse of the `Tag` across template reads for the same device: every UDT shape fetch pays a full connect/init cost. `AbCipTemplateCache` caches the decoded shape so this only bites on the first fetch of each type, but discovery of a UDT-heavy controller still does one connect per type. The same per-call `Tag` construction applies to `LibplctagTagEnumerator`.
|
||||
|
||||
**Recommendation:** Acceptable for a low-frequency discovery path, but consider pooling/reusing a single @udt-capable `Tag` per device for the duration of a discovery run, or document that the per-type connect cost is accepted.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-013
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `AbCipDriverOptions.cs:70-73`, `PlcFamilies/AbCipPlcFamilyProfile.cs:13-19`, `LibplctagTagRuntime.cs:16-27` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `driver-specs.md` specifies the AB CIP per-device connection settings as discrete fields: Host, Path, PlcType, TimeoutMs, AllowPacking, ConnectionSize. The implementation instead collapses host + path into a single opaque ab:// URL string and exposes `PlcFamily` (which adds GuardLogix, not in the spec table). AllowPacking and ConnectionSize from the spec are not configurable per device: `AbCipPlcFamilyProfile` hard-codes `SupportsRequestPacking` and `DefaultConnectionSize` per family, and `LibplctagTagRuntime` never passes a connection-size or packing attribute to the `Tag` (it is constructed with only Gateway/Path/PlcType/Protocol/Name/Timeout). The family profile's `DefaultConnectionSize`/`SupportsRequestPacking`/`MaxFragmentBytes` fields are computed but never applied to the wire layer, so they are dead configuration.
|
||||
|
||||
**Recommendation:** Either update `driver-specs.md` to describe the actual ab:// host-address model and the family-profile approach, and wire the profile's ConnectionSize/packing values through to the libplctag `Tag` attributes; or expose AllowPacking/ConnectionSize as per-device options per the spec.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.AbCip-014

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` |
| Status | Open |

**Description:** `AbCipStatusMapperTests.MapLibplctagStatus_maps_known_codes` asserts the mapper against the same wrong integer constants (-5, -7, -14, -16, -17) the production code uses (see Driver.AbCip-002). The test locks in the bug rather than catching it, giving false confidence that libplctag error mapping is correct. There is no test that drives an actual libplctag `Status` enum value through `LibplctagTagRuntime.GetStatus()` plus `MapLibplctagStatus` end-to-end. Separately, the broken `ReinitializeAsync` config-discard behavior (Driver.AbCip-001) and the declaration-order whole-UDT decode risk (Driver.AbCip-003) have no test that would fail when those defects are present: `AbCipDriverWholeUdtReadTests` only exercises a UDT whose declaration order happens to match a simple alignment layout.

**Recommendation:** Rewrite the libplctag-status test to use the real `libplctag.Status` enum members and their documented integer values. Add a test that `ReinitializeAsync` with a changed config JSON actually applies the change (or asserts the documented immutability contract). Add a whole-UDT decode test where the controller's compiled layout differs from declaration order.

**Resolution:** _(open)_

### Driver.AbCip-015

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `AbCipDriver.cs:9-11`, `PlcTagHandle.cs:23-27,53-58`, `AbCipTemplateCache.cs:12-15`, `IAbCipTagEnumerator.cs:6-11`, `AbCipDriverOptions.cs:21` |
| Status | Open |

**Description:** Numerous comments are stale relative to the commit under review. `AbCipDriver.cs:9-11` says the driver "Implements IDriver only for now" with capabilities shipping "in subsequent PRs (3-8)" while the class already implements all of them. `PlcTagHandle.cs` says the plc_tag_destroy P/Invoke "is deferred to PR 3 ... PR 2 ships the lifetime scaffold + tests only" and `ReleaseHandle` "is a no-op", which now reads as a permanent unfinished-work marker (see Driver.AbCip-006). `AbCipTemplateCache.cs:12-15` says "Template shape read ... lands with PR 6 ... no reader writes to it yet" while `CipTemplateObjectDecoder` and `LibplctagTemplateReader` both exist and `FetchUdtShapeAsync` writes to the cache. `IAbCipTagEnumerator.cs:6-11` says the enumerator "Defaults to EmptyAbCipTagEnumeratorFactory" while the production default is `LibplctagTagEnumeratorFactory`. `AbCipDriverOptions.cs:21` says "AB discovery lands in PR 5", which has already shipped. `StyleGuide.md` explicitly says not to leave stale coming-soon notes.

**Recommendation:** Sweep the module for PR-N forward references and "lands in PR X" notes that have been delivered; update them to describe present behavior. Where a comment marks genuinely unfinished work (e.g. `PlcTagHandle.ReleaseHandle`), convert it to a tracked TODO with an issue reference rather than a PR-number milestone.

**Resolution:** _(open)_

213
code-reviews/Driver.AbLegacy.Cli/findings.md
Normal file
@@ -0,0 +1,213 @@
# Code Review — Driver.AbLegacy.Cli

| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.AbLegacy.Cli-001, Driver.AbLegacy.Cli-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.AbLegacy.Cli-003 |
| 4 | Error handling & resilience | Driver.AbLegacy.Cli-001, Driver.AbLegacy.Cli-004 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Driver.AbLegacy.Cli-005 |
| 8 | Code organization & conventions | Driver.AbLegacy.Cli-006 |
| 9 | Testing coverage | Driver.AbLegacy.Cli-007 |
| 10 | Documentation & comments | Driver.AbLegacy.Cli-002, Driver.AbLegacy.Cli-005 |

## Findings

### Driver.AbLegacy.Cli-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` |
| Status | Open |

**Description:** `WriteCommand.ExecuteAsync` calls `ParseValue(Value, DataType)` at
line 46, *before* the `try` block and outside any catch. `ParseValue` uses
`short.Parse` / `int.Parse` / `float.Parse`, which throw `FormatException` on
malformed input (`-v abc`) and `OverflowException` on out-of-range input
(`-t Int -v 99999`). Only the `Bit` branch and the unsupported-type branch raise
the CliFx `CommandException` that the framework renders as a clean one-line error
with a non-zero exit code. For every numeric type a bad `--value` therefore
escapes as an unhandled `FormatException`/`OverflowException`, which CliFx prints
as a raw stack trace. That is an operator-hostile failure mode for a tool whose
whole purpose is ad-hoc operator use. The module's own test
`ParseValue_non_numeric_for_numeric_types_throws` confirms that the raw
`FormatException` leaks. The driver's `WriteAsync` has dedicated catch arms for
`FormatException` (`BadTypeMismatch`) and `OverflowException` (`BadOutOfRange`),
but the CLI never reaches the driver because the parse happens first.

**Recommendation:** Wrap the numeric parses so a parse failure surfaces as a
`CliFx.Exceptions.CommandException` with a message naming the offending value and
type (mirroring the existing `Bit` and unsupported-type branches). Either catch
`FormatException`/`OverflowException` inside `ParseValue` and rethrow as
`CommandException`, or use `TryParse` and throw `CommandException` on failure.

**Resolution:** _(open)_

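A minimal sketch of the `TryParse` variant recommended in Driver.AbLegacy.Cli-001, not
the module's actual code. It assumes `Int` maps to `short`, `Long` to `int`, and
`Float` to `float`. `CliInputException`, `LegacyDataType`, and `ValueParser` are
hypothetical stand-ins so the block compiles without the CliFx package; the real
command would throw `CliFx.Exceptions.CommandException` instead.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for CliFx.Exceptions.CommandException so this sketch
// compiles on its own; the real command would throw the CliFx type.
public sealed class CliInputException : Exception
{
    public CliInputException(string message) : base(message) { }
}

// Assumed shape of the CLI's data-type enum; only numeric members are shown.
public enum LegacyDataType { Int, Long, Float }

public static class ValueParser
{
    // Parses a --value string and turns every parse failure into one clean error.
    public static object ParseValue(string raw, LegacyDataType type)
    {
        var inv = CultureInfo.InvariantCulture;
        switch (type)
        {
            case LegacyDataType.Int:
                if (short.TryParse(raw, NumberStyles.Integer, inv, out var s)) return s;
                break;
            case LegacyDataType.Long:
                if (int.TryParse(raw, NumberStyles.Integer, inv, out var i)) return i;
                break;
            case LegacyDataType.Float:
                if (float.TryParse(raw, NumberStyles.Float, inv, out var f)) return f;
                break;
        }

        // Reached for malformed and out-of-range input alike.
        throw new CliInputException(
            $"Value '{raw}' is not a valid {type} (malformed or out of range).");
    }
}
```

Because `TryParse` never throws, the malformed and overflow cases collapse into a
single error path, which also makes the regression test asked for in
Driver.AbLegacy.Cli-007 straightforward to write.
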
### Driver.AbLegacy.Cli-002

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` |
| Status | Open |

**Description:** The `--value` option help text states "booleans accept
true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) and the error message
also accept `on/off` and `yes/no`, and `DriverClis.md` documents the full
`true/false/1/0/yes/no/on/off` set as the shared CLI contract. The help text
under-documents the accepted aliases, so an operator reading `--help` will not
discover `on`/`off`/`yes`/`no`. Minor, but it makes the inline help inconsistent
with both the code and the design doc.

**Recommendation:** Extend the `--value` description to list the full alias set,
matching the wording used elsewhere (e.g. "booleans accept
true/false, 1/0, on/off, yes/no").

**Resolution:** _(open)_

### Driver.AbLegacy.Cli-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:47-53` |
| Status | Open |

**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)`
(the synchronous overload) directly from the `PollGroupEngine` poll thread. The
poll engine raises change events from a background timer/loop thread, so two
ticks that fire close together can interleave writes on the shared `TextWriter`.
`SnapshotFormatter` builds the whole line into a single string before the call,
so a line is unlikely to be torn mid-token, but there is no synchronisation
guaranteeing that the background-thread writes do not interleave with the
`await console.Output.WriteLineAsync(...)` "Subscribed to ..." line on the command
thread, nor with each other. This is the same pattern as the AbCip CLI, so it is
a shared low-severity issue, not unique to this module.

**Recommendation:** Serialise console writes from the event handler, e.g. funnel
change events through a `Channel<string>` drained by a single consumer task, or
guard the `WriteLine` with a lock. At minimum, document that the interleaving is
accepted because output is human-facing and line-buffered.

**Resolution:** _(open)_

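One way the `Channel<string>` option in Driver.AbLegacy.Cli-003 could look, as a
sketch only. `SerializedLineWriter` is a hypothetical helper, not a type in the
module, and it takes a plain `TextWriter` rather than the CliFx console so the
block stays self-contained.

```csharp
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

// Funnels lines from any number of producer threads through one consumer task,
// so only a single thread ever touches the underlying TextWriter.
public sealed class SerializedLineWriter : IAsyncDisposable
{
    private readonly Channel<string> _lines =
        Channel.CreateUnbounded<string>(new UnboundedChannelOptions { SingleReader = true });
    private readonly Task _pump;

    public SerializedLineWriter(TextWriter output)
    {
        // The pump is the only code that writes to 'output'.
        _pump = Task.Run(async () =>
        {
            await foreach (var line in _lines.Reader.ReadAllAsync())
                await output.WriteLineAsync(line);
        });
    }

    // Safe to call from any thread, e.g. a poll-engine change handler.
    // Returns false once the writer has been completed.
    public bool Enqueue(string line) => _lines.Writer.TryWrite(line);

    public async ValueTask DisposeAsync()
    {
        _lines.Writer.TryComplete();   // stop accepting new lines
        await _pump;                   // drain everything already queued
    }
}
```

In this shape the change handler would call `Enqueue(line)` instead of writing
directly, and the command would dispose the writer before shutting the driver down
so queued lines are flushed first.
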
### Driver.AbLegacy.Cli-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` |
| Status | Open |

**Description:** Every command does `await using var driver = new AbLegacyDriver(...)`
*and* an explicit `await driver.ShutdownAsync(...)` in the `finally`.
`AbLegacyDriver.DisposeAsync` itself calls `ShutdownAsync`, so the driver is shut
down twice on the normal path. `ShutdownAsync` is written to be idempotent (it
clears `_devices` / `_tagsByName` and re-enters cleanly on an empty state), so this
is not a crash, but the double teardown is redundant and slightly obscures intent:
a reader has to confirm idempotency to be sure it is safe. The `await using` already
guarantees cleanup on every exit path including exceptions.

**Recommendation:** Drop either the `await using` or the explicit
`finally { await driver.ShutdownAsync(...) }` in each command. Keeping the explicit
`finally` and using a plain `var driver` (no `await using`) is the clearer choice,
since the commands deliberately pass `CancellationToken.None` to shutdown so teardown
is not cut short by a cancelled `ct`.

**Resolution:** _(open)_

### Driver.AbLegacy.Cli-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` |
| Status | Open |

**Description:** The subscribe command's interval option is `--interval-ms`
(default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as
`otopcua-ablegacy-cli subscribe ... -i 500`, which works because of the short
alias `'i'`, but the doc never names the long form `--interval-ms` or states the
1000 ms default. The equivalent AbCip CLI help text notes "PollGroupEngine
floors sub-250ms values", whereas the AbLegacy `--interval-ms` description omits
that flooring caveat, so an operator passing `-i 100` against AbLegacy gets no
warning that the engine will floor it. The behaviour is identical (same
`PollGroupEngine`) but the documented contract drifts between the two CLIs.

**Recommendation:** Add the sub-250 ms flooring note to the AbLegacy
`--interval-ms` description for parity with the AbCip CLI, and mention the
`--interval-ms` long form + 1000 ms default in `docs/Driver.AbLegacy.Cli.md`.

**Resolution:** _(open)_

### Driver.AbLegacy.Cli-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/ProbeCommand.cs:20-22` |
| Status | Open |

**Description:** `ProbeCommand` declares its `--type` option with no short alias,
while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type`
with the short alias `'t'`. `ProbeCommand` also gives `--address` the alias `'a'`,
matching the other commands, so the `--type` omission is an inconsistency rather
than a deliberate design choice. An operator who learns `-t` on `read` will find
it rejected on `probe`.

**Recommendation:** Add the `'t'` short alias to the `ProbeCommand` `--type` option
for consistency with the other three commands. (The AbCip CLI `ProbeCommand` has
the same omission, so a cross-CLI sweep is worthwhile.)

**Resolution:** _(open)_

### Driver.AbLegacy.Cli-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |

**Description:** The only test file in the CLI test project covers
`WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that
are pure logic (testable without a device) are uncovered:

1. `AbLegacyCommandBase.BuildOptions`: that it sets `Probe.Enabled = false`,
   populates `Devices` from `Gateway`/`PlcType`, and forwards the tag list. A
   regression here silently changes every command's behaviour.
2. The out-of-range numeric path for `ParseValue` (`short.Parse` overflow,
   `int.Parse` overflow): `ParseValue_non_numeric_for_numeric_types_throws` asserts
   `FormatException` for non-numeric input but nothing asserts the overflow path,
   which is exactly the path that escapes uncaught per finding
   Driver.AbLegacy.Cli-001.

`BuildOptions` is reachable via `InternalsVisibleTo` (the test assembly is already
granted access).

**Recommendation:** Add tests for `BuildOptions` (probe disabled, device shape,
tag passthrough) and an overflow-input test for `ParseValue` so the fix for
Driver.AbLegacy.Cli-001 is locked in by a regression test.

**Resolution:** _(open)_

359
code-reviews/Driver.AbLegacy/findings.md
Normal file
@@ -0,0 +1,359 @@
# Code Review - Driver.AbLegacy

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 13 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.AbLegacy-001, Driver.AbLegacy-002, Driver.AbLegacy-003, Driver.AbLegacy-004 |
| 2 | OtOpcUa conventions | Driver.AbLegacy-005 |
| 3 | Concurrency & thread safety | Driver.AbLegacy-006, Driver.AbLegacy-007, Driver.AbLegacy-008 |
| 4 | Error handling & resilience | Driver.AbLegacy-009, Driver.AbLegacy-010 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.AbLegacy-011 |
| 7 | Design-document adherence | Driver.AbLegacy-012 |
| 8 | Code organization & conventions | Driver.AbLegacy-013 |
| 9 | Testing coverage | No issues found |
| 10 | Documentation & comments | No issues found |

## Findings

### Driver.AbLegacy-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` |
| Status | Open |

**Description:** `AbLegacyAddress.TryParse` accepts a `BitIndex` of `0..31` for every
file type. A PCCC N-file word is a signed 16-bit integer, so valid bit indices are
`0..15`. When a tag is `Bit`-typed against an N-file with a bit suffix of `16..31`
(e.g. `N7:0/20`), `WriteBitInWordAsync` reads the parent as `AbLegacyDataType.Int`
(16-bit), then computes `current | (1 << bit)` / `current & ~(1 << bit)` with `bit`
up to 31. `1 << 20` produces a value outside the 16-bit range, the result is cast
`(short)updated`, and the high bits are silently truncated - the wrong bit (or no
bit) is written and no error is surfaced. The mask arithmetic is also done on a
sign-extended `int`. For L-file (32-bit) bits the parent is still read as `Int`
(16-bit), so bits 16..31 of a long can never be addressed correctly.

**Recommendation:** Validate `BitIndex` against the parent word width during parse or
in `WriteBitInWordAsync` - reject bit > 15 for N/B/I/O/S files and bit > 31 for L
files. For bit-in-word RMW against L files, read the parent as `Long`. Mask the
read-back value to the word width before applying the bit operation.

**Resolution:** _(open)_

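A small sketch of the width-aware bit-index check recommended in Driver.AbLegacy-001.
`PcccBitRules` and its members are hypothetical names, and the width table assumes
only what the finding states: L files are 32-bit and the other bit-addressable
files are 16-bit.

```csharp
using System;

public static class PcccBitRules
{
    // Word width in bits for a PCCC file letter. Assumption taken from the finding:
    // L files are 32-bit, every other bit-addressable file is 16-bit.
    public static int WordWidth(char fileLetter) =>
        char.ToUpperInvariant(fileLetter) == 'L' ? 32 : 16;

    // True when bitIndex addresses a real bit inside the parent word.
    public static bool IsValidBitIndex(char fileLetter, int bitIndex) =>
        bitIndex >= 0 && bitIndex < WordWidth(fileLetter);

    // Fails loudly at parse/config time instead of letting a bad index
    // reach the read-modify-write path and be truncated silently.
    public static void EnsureValid(string address, char fileLetter, int bitIndex)
    {
        if (!IsValidBitIndex(fileLetter, bitIndex))
            throw new FormatException(
                $"Bit index {bitIndex} in '{address}' exceeds the " +
                $"{WordWidth(fileLetter)}-bit word of file '{fileLetter}'.");
    }
}
```

With a check of this kind in the parser, `N7:0/20` would be rejected up front while
an L-file address with bit 20 would still be accepted.
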
### Driver.AbLegacy-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbLegacyDriver.cs:368` |
| Status | Open |

**Description:** In `WriteBitInWordAsync` the parent word is decoded with
`Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...))`.
`LibplctagLegacyTagRuntime.DecodeValue` for `AbLegacyDataType.Int` returns
`(int)_tag.GetInt16(0)` - a sign-extended `int`. When the current word has its high
bit set (value 0x8000..0xFFFF, decoded as a negative `int`), the subsequent
`(short)updated` cast re-encodes the low 16 bits correctly, but `current | (1 << bit)`
is performed on the sign-extended value. The result is bit-correct for the low 16
bits only because the cast preserves them; any future change to widen the mask range
will break silently. Combined with Driver.AbLegacy-001 this is a latent correctness
hazard.

**Recommendation:** Mask `current` to `current & 0xFFFF` before the bit operation and
operate on an explicitly 16-bit value, or document the reliance on low-16-bit
preservation explicitly.

**Resolution:** _(open)_

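The masking recommended in Driver.AbLegacy-002 could be isolated in a helper along
these lines. This is a sketch under the assumption that the parent word arrives as
a sign-extended `int`, as the finding describes; `Word16.WithBit` is a hypothetical
name, not an existing driver method.

```csharp
using System;

public static class Word16
{
    // Sets or clears one bit of a 16-bit PCCC word that arrived sign-extended in an int.
    public static short WithBit(int signExtendedWord, int bit, bool value)
    {
        if (bit < 0 || bit > 15)
            throw new ArgumentOutOfRangeException(nameof(bit), "16-bit words have bits 0..15.");

        int word = signExtendedWord & 0xFFFF;   // discard the sign-extension bits
        int mask = 1 << bit;
        int updated = value ? (word | mask) : (word & ~mask);

        // unchecked: 0x8000..0xFFFF must wrap to a negative short rather than throw.
        return unchecked((short)(updated & 0xFFFF));
    }
}
```

Working on an explicitly masked value means correctness no longer depends on the
final `(short)` cast happening to preserve the low 16 bits.
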
### Driver.AbLegacy-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AbLegacyAddress.cs:62-95` |
| Status | Open |

**Description:** `TryParse` does not reject several malformed PCCC addresses that the
XML docs imply are invalid:

- A sub-element and a bit index together (`T4:0.ACC/2`) parse successfully even
  though no PCCC element supports both.
- I/O/S files with a file number (`I3:0`, `S2:1`) parse successfully - I/O and S are
  single-letter files with no file number per the doc comment, but the parser only
  requires "letter then optional digits".
- B-file addresses with a sub-element (`B3:0.DN`) parse successfully.

`ToLibplctagName()` re-emits whatever was parsed, so a malformed address is passed
through to libplctag rather than rejected early with a clear error.

**Recommendation:** Tighten the parser: reject sub-element + bit-index combinations,
reject file numbers on I/O/S, and restrict which file letters may carry a sub-element
(T/C/R only). Add unit coverage for the rejection cases.

**Resolution:** _(open)_

### Driver.AbLegacy-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `LibplctagLegacyTagRuntime.cs:36-37` |
| Status | Open |

**Description:** `DecodeValue` for `AbLegacyDataType.Bit` with `bitIndex == null`
returns `_tag.GetInt8(0) != 0`. A bit-file element (`B3:0/0`) is a single bit inside
a 16-bit word; reading only the low byte (`GetInt8(0)`) means a `Bit` tag whose live
bit sits in bits 8..15 of the word, or a B-file element addressed without an explicit
bit suffix, decodes incorrectly. The driver passes `parsed.ToLibplctagName()` which
preserves the `/bit` suffix, so libplctag resolves the bit when a suffix is present -
but a `Bit`-typed tag configured with an address that has no `/bit` suffix (e.g.
`B3:0`) silently decodes the wrong thing.

**Recommendation:** For `Bit` with no `bitIndex`, decide explicitly: either require a
bit suffix on `Bit`-typed tags (validate in `CreateInstance`/`DiscoverAsync`) or
decode the full 16-bit word and test bit 0.

**Resolution:** _(open)_

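The second option in Driver.AbLegacy-004 (decode the full word, then test one bit)
might be sketched as follows. `BitDecode.FromWord` is a hypothetical helper; it
assumes the caller has already read the element as a 16-bit value, and it treats a
missing bit suffix as bit 0, which is only one of the two policies the
recommendation offers.

```csharp
using System;

public static class BitDecode
{
    // Tests one bit of a full 16-bit word. A missing bit suffix is treated as bit 0.
    public static bool FromWord(short word, int? bitIndex = null)
    {
        int bit = bitIndex ?? 0;
        if (bit < 0 || bit > 15)
            throw new ArgumentOutOfRangeException(nameof(bitIndex), "Expected 0..15.");

        // The shift happens on the int-promoted value; masking with 1 isolates the bit.
        return ((word >> bit) & 1) != 0;
    }
}
```

For a word of `0x0100`, a low-byte-only decode sees zero, whereas this helper
reports bit 8 as set and bit 0 as clear.
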
### Driver.AbLegacy-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `AbLegacyDriver.cs` (whole file) |
| Status | Open |

**Description:** The driver uses no `ILogger`/Serilog at all. Probe-loop failures,
runtime initialisation failures, libplctag non-zero statuses, and read/write
exceptions are folded into `DriverHealth.Detail` strings but never logged. CLAUDE.md
names Serilog with a rolling daily file sink as the logging library. The complete
absence of structured logging makes field diagnosis of a PCCC comms problem (timeout
vs route failure vs wrong PLC family) rely entirely on a single overwritten `Detail`
string that the next read or write immediately clobbers.

**Recommendation:** Inject `ILogger<AbLegacyDriver>` (optional, like `tagFactory`) and
log probe transitions, runtime-init failures, and the first occurrence of a non-zero
libplctag status per device.

**Resolution:** _(open)_

### Driver.AbLegacy-006

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` |
| Status | Open |

**Description:** A per-tag `IAbLegacyTagRuntime` (wrapping a single libplctag `Tag`)
is cached in `DeviceState.Runtimes` and reused. `ReadAsync` (called directly by the
server read path) and the `PollGroupEngine` poll loop (which also calls `ReadAsync`
via the reader delegate) can run concurrently, and two poll subscriptions covering
the same tag run on independent background tasks. All of them resolve to the same
`Tag` instance through `EnsureTagRuntimeAsync` and call `runtime.ReadAsync` /
`GetStatus` / `DecodeValue` with no synchronisation. A libplctag `Tag` is not safe
for concurrent operations on the same handle: an interleaved Read/GetStatus/DecodeValue
from two threads can read a value mid-update or observe a status that belongs to the
other operation. `WriteAsync` shares the same runtime dictionary and compounds the
hazard. Only the bit-in-word RMW path is serialised (per-parent `SemaphoreSlim`).

**Recommendation:** Serialise all operations against a given runtime - a per-runtime
`SemaphoreSlim`, or a per-device read lock - so no two threads touch the same `Tag`
handle concurrently.

**Resolution:** _(open)_

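A sketch of the per-runtime `SemaphoreSlim` option from Driver.AbLegacy-006.
`SerializedRuntime<T>` is a hypothetical wrapper, not a type in the driver; the
point it illustrates is that the read, status check, and decode have to run inside
one critical section rather than being locked individually.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Wraps a handle that is not thread-safe so only one operation runs against it at a time.
public sealed class SerializedRuntime<T> : IDisposable
{
    private readonly T _inner;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public SerializedRuntime(T inner) => _inner = inner;

    // Runs a whole read/status/decode sequence as a single critical section.
    public async Task<TResult> RunAsync<TResult>(
        Func<T, CancellationToken, Task<TResult>> operation,
        CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            return await operation(_inner, ct).ConfigureAwait(false);
        }
        finally
        {
            _gate.Release();   // always released, even when the operation throws
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

A per-runtime gate keeps unrelated tags independent; a per-device lock would be
simpler but would serialise every tag on that device.
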
### Driver.AbLegacy-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` |
| Status | Open |

**Description:** `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are
check-then-act: `device.Runtimes.TryGetValue(...)` then, after
`await runtime.InitializeAsync`, `device.Runtimes[def.Name] = runtime`. `Dictionary`
is not thread-safe, and two concurrent callers for the same tag (read + poll, or two
poll loops) both miss the lookup, both Create + InitializeAsync a runtime, and both
write the dictionary. One runtime is overwritten and leaked - `DisposeRuntimes` only
disposes what is currently in the dict - and concurrent `Dictionary` writes can
corrupt internal state. `ParentRuntimes` has the identical pattern.

**Recommendation:** Replace the runtime caches with `ConcurrentDictionary` and use
`GetOrAdd`, or guard runtime creation under a per-device lock. Ensure the losing
runtime of any race is disposed.

**Resolution:** _(open)_

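One possible shape for the `ConcurrentDictionary` option in Driver.AbLegacy-007,
as a sketch. `RuntimeCache<TRuntime>` is a hypothetical class. It stores
`Lazy<Task<TRuntime>>` values, a variation on plain `GetOrAdd` chosen here so that
the create-and-initialise delegate runs only once per key and there is no losing
runtime to dispose.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Creates and initialises each runtime at most once per key, even under concurrent callers.
public sealed class RuntimeCache<TRuntime> where TRuntime : class
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TRuntime>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<TRuntime>>>();
    private readonly Func<string, Task<TRuntime>> _createAndInitialize;

    public RuntimeCache(Func<string, Task<TRuntime>> createAndInitialize) =>
        _createAndInitialize = createAndInitialize;

    public Task<TRuntime> GetOrCreateAsync(string tagName)
    {
        // GetOrAdd may construct several Lazy wrappers in a race, but only the one
        // that is stored is ever evaluated, so the factory runs once per key.
        var lazy = _entries.GetOrAdd(
            tagName,
            name => new Lazy<Task<TRuntime>>(() => _createAndInitialize(name)));
        return lazy.Value;
    }

    // Removes an entry (for example one whose initialisation faulted) so a later
    // call can retry instead of observing the cached failure.
    public bool Evict(string tagName) => _entries.TryRemove(tagName, out _);
}
```

A caveat of this design: a faulted initialisation task stays cached until `Evict`
is called, so the caller has to evict on failure if retries are expected.
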
### Driver.AbLegacy-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` |
| Status | Open |

**Description:** `_health` is a plain non-volatile reference field mutated from
`ReadAsync`, `WriteAsync` (both can run on multiple threads / poll loops) and
`InitializeAsync`/`ShutdownAsync`, and read by `GetHealth()` from yet another thread.
There is no lock, no `volatile`, and no `Interlocked` exchange. The record reference
assignment is atomic, but without a memory barrier a reader can observe a stale
`_health` indefinitely, and concurrent writers race so a `Healthy` write from one
successful read can clobber a `Degraded` write from a concurrent failing read.
`GetHealth()` may therefore report `Healthy` while reads are persistently failing.

**Recommendation:** Mark `_health` volatile, or funnel health transitions through a
lock / `Interlocked.Exchange`. Consider only downgrading on failure and upgrading on a
successful poll so a single failed read does not flap the surface.

**Resolution:** _(open)_

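A sketch of how the health field in Driver.AbLegacy-008 could be published safely.
`HealthCell`, `HealthSnapshot`, and `HealthState` are hypothetical stand-ins for the
driver's own health types, and the timestamp-based `TrySetIfNewer` policy is one
assumption about how to stop an older result from overwriting a newer one; it is
not the only way to implement the recommendation.

```csharp
using System;
using System.Threading;

public enum HealthState { Healthy, Degraded, Faulted }

// Immutable snapshot; a stand-in for the record-style health type in the finding.
public sealed record HealthSnapshot(HealthState State, string Detail, DateTime TimestampUtc);

public sealed class HealthCell
{
    private HealthSnapshot _current = new(HealthState.Healthy, "", DateTime.UtcNow);

    // Volatile.Read prevents the reader from working with a cached, stale reference.
    public HealthSnapshot Get() => Volatile.Read(ref _current);

    // Unconditional publish with a full memory fence.
    public void Set(HealthSnapshot next) => Interlocked.Exchange(ref _current, next);

    // Publishes 'next' only if no snapshot with a later timestamp is already stored,
    // so a slow, older result cannot overwrite a more recent one.
    public bool TrySetIfNewer(HealthSnapshot next)
    {
        while (true)
        {
            var seen = Volatile.Read(ref _current);
            if (next.TimestampUtc < seen.TimestampUtc) return false;
            if (ReferenceEquals(Interlocked.CompareExchange(ref _current, next, seen), seen))
                return true;   // swap succeeded; otherwise another writer won, so retry
        }
    }
}
```
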
### Driver.AbLegacy-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AbLegacyDriver.cs:41-74` |
| Status | Open |

**Description:** `InitializeAsync` starts probe loops with `Task.Run` inside the try
block. If `InitializeAsync` fails - or is re-entered - after some probe loops are
already started, the catch only sets `_health = Faulted` and rethrows; it does not
cancel `state.ProbeCts`, dispose runtimes, or clear `_devices`. A caller that catches
the exception and retries via `ReinitializeAsync` is covered (it calls `ShutdownAsync`
first), but a caller that catches and abandons the driver leaves orphaned probe tasks
and `CancellationTokenSource`s alive holding libplctag handles. Separately,
`ProbeLoopAsync` never escalates a permanently-unreachable device beyond `Stopped`.

**Recommendation:** On the catch path in `InitializeAsync`, run the same teardown as
`ShutdownAsync` (cancel probe CTSs, dispose runtimes, clear dictionaries) before
rethrowing, so a failed initialise leaves no live background work.

**Resolution:** _(open)_

### Driver.AbLegacy-010

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AbLegacyStatusMapper.cs:26-56` |
| Status | Open |

**Description:** `MapLibplctagStatus` maps the integer codes -5/-7/-14/-16/-17. These
do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32,
PLCTAG_ERR_NOT_FOUND = -19, PLCTAG_ERR_NOT_ALLOWED = -18, PLCTAG_ERR_OUT_OF_BOUNDS =
-27, PLCTAG_ERR_BAD_CONNECTION = -3). The mapper operates on `(int)_tag.GetStatus()`,
where `GetStatus()` returns the libplctag .NET wrapper Status enum, whose underlying
values have not been verified against the native codes - so the -5/-7/... values are
at best the .NET enum values (unverified, undocumented) and at worst wrong. Any
unmatched negative status falls through to `BadCommunicationError`, so a timeout is
reported as a generic comms error rather than `BadTimeout`. `MapPcccStatus` is dead
code - the PCCC STS byte is never inspected because libplctag surfaces only its own
status enum.

**Recommendation:** Verify the actual `libplctag.Status` enum values against the 1.5.2
package and map by enum name rather than magic integers. Either wire `MapPcccStatus`
into a real PCCC-STS path or delete it as dead code. The same defect exists in
`AbCipStatusMapper` and should be fixed in lockstep.

**Resolution:** _(open)_

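Mapping by enum member rather than by integer, as Driver.AbLegacy-010 recommends,
could be sketched like this. `PlcStatus` is a hypothetical stand-in for the wrapper's
status enum: its member names are assumptions and its numeric values are deliberately
left unspecified, because the finding's point is that they need verifying. The choice
of OPC UA status code for each member is likewise illustrative.

```csharp
using System;

// Stand-in for the libplctag .NET status enum. Member names are assumptions and the
// numeric values are intentionally not specified here.
public enum PlcStatus
{
    Ok,
    ErrorTimeout,
    ErrorNotFound,
    ErrorNotAllowed,
    ErrorOutOfBounds,
    ErrorBadConnection,
}

public static class StatusMapper
{
    // OPC UA status codes used by this sketch.
    public const uint Good = 0x00000000;
    public const uint BadCommunicationError = 0x80050000;
    public const uint BadTimeout = 0x800A0000;
    public const uint BadNodeIdUnknown = 0x80340000;
    public const uint BadNotWritable = 0x803B0000;
    public const uint BadOutOfRange = 0x803C0000;

    // Switching on named members keeps the mapping independent of the underlying integers.
    public static uint Map(PlcStatus status) => status switch
    {
        PlcStatus.Ok => Good,
        PlcStatus.ErrorTimeout => BadTimeout,
        PlcStatus.ErrorNotFound => BadNodeIdUnknown,
        PlcStatus.ErrorNotAllowed => BadNotWritable,
        PlcStatus.ErrorOutOfBounds => BadOutOfRange,
        _ => BadCommunicationError,   // bad connection and any unrecognised status
    };
}
```

A test written against the named members, rather than against copied integers, would
also address the locked-in-bug problem described for the AbCip mapper tests.
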
### Driver.AbLegacy-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `AbLegacyDriver.cs:440` |
| Status | Open |

**Description:** `Dispose()` is implemented as
`DisposeAsync().AsTask().GetAwaiter().GetResult()` - sync-over-async. `ShutdownAsync`
awaits `_poll.DisposeAsync()` (which completes synchronously) and does no other real
async work, so a deadlock is unlikely in practice, but the pattern blocks the calling
thread and would deadlock if any awaited continuation were ever marshalled back to a
single-threaded synchronization context.

**Recommendation:** Prefer callers use `IAsyncDisposable`. If a synchronous `Dispose()`
must exist, perform the synchronous teardown directly (cancel CTSs, dispose runtimes)
rather than blocking on the async path.

**Resolution:** _(open)_

### Driver.AbLegacy-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` |
| Status | Open |

**Description:** `AbLegacyPlcFamilyProfile` declares four record properties besides
`LibplctagPlcAttribute` - `DefaultCipPath`, `MaxTagBytes`, `SupportsStringFile`,
`SupportsLongFile` - but only `LibplctagPlcAttribute` is ever consumed. In particular:

- `DefaultCipPath` is dead: the per-family default path (empty for MicroLogix, `1,0`
  for SLC/PLC-5) is never used to substitute an empty CIP path. The CIP path always
  comes verbatim from `AbLegacyHostAddress.CipPath`, so a SLC 500 misconfigured with
  an empty path is never corrected to `1,0` even though the profile knows the right
  default - contradicting the test-fixture doc, which calls out the /1,0 cip-path
  workaround as required for SLC.
- `MaxTagBytes` is never used to validate or chunk a string/array read.
- `SupportsStringFile`/`SupportsLongFile` are never checked, so a `String` or `Long`
  tag configured against a MicroLogix or PLC-5 (which the profile says lack them) is
  accepted and only fails at runtime with an opaque comms error.

**Recommendation:** Either consume the profile fields (substitute `DefaultCipPath` when
the host CIP path is empty; reject `Long`/`String` tags against families whose profile
sets the corresponding flag false; use `MaxTagBytes` for validation) or remove the
unused fields and the doc comments that imply they are load-bearing.

**Resolution:** _(open)_

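If the profile fields in Driver.AbLegacy-012 are kept, consuming them could look
roughly like the sketch below. `FamilyProfile`, `TagType`, and `ProfileRules` are
hypothetical stand-ins that mirror only the property names given in the finding;
the error messages and the null-means-valid convention are assumptions.

```csharp
using System;

// Illustrative copy of the profile shape named in the finding.
public sealed record FamilyProfile(
    string DefaultCipPath, int MaxTagBytes, bool SupportsStringFile, bool SupportsLongFile);

public enum TagType { Int, Long, String }

public static class ProfileRules
{
    // Uses the family default only when the configured CIP path is empty.
    public static string ResolveCipPath(string configuredPath, FamilyProfile profile) =>
        string.IsNullOrWhiteSpace(configuredPath) ? profile.DefaultCipPath : configuredPath;

    // Returns null when the tag is acceptable, otherwise a configuration-time message,
    // so an unsupported type is rejected before it becomes an opaque comms error.
    public static string Validate(string tagName, TagType type, FamilyProfile profile) =>
        type switch
        {
            TagType.Long when !profile.SupportsLongFile =>
                $"Tag '{tagName}': this PLC family does not support Long files.",
            TagType.String when !profile.SupportsStringFile =>
                $"Tag '{tagName}': this PLC family does not support String files.",
            _ => null,
        };
}
```
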
### Driver.AbLegacy-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `AbLegacyDriver.cs:340-345`, `AbLegacyDriver.cs:238-264` |
| Status | Open |

**Description:** Two minor organisational issues:

1. `ResolveHost` returns
   `_options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId`
   when the reference is unknown and no devices are configured.
   `DriverInstanceId` is not a host address (ab://...), so a downstream
   `IHostConnectivityProbe` / host lookup keyed on the returned value never matches a
   real device. Returning the instance id as a fake host masks a configuration error.
2. `DiscoverAsync` always emits `IsArray: false` / `ArrayDim: null`. PCCC files are
   inherently arrays of elements; a tag that genuinely addresses a multi-element
   region cannot be represented. This is consistent with the PR-staged scope (the doc
   says array coverage is thin) but should be tracked rather than silently shipped.

**Recommendation:** For (1), either throw / return a sentinel the caller can detect, or
document why falling back to the instance id is acceptable. For (2), record the
array-addressing gap as a tracked follow-up.

**Resolution:** _(open)_

192
code-reviews/Driver.Cli.Common/findings.md
Normal file
@@ -0,0 +1,192 @@
|
||||
# Code Review — Driver.Cli.Common
|
||||
|
||||
| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 6 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
|
||||
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Cli.Common-001, Driver.Cli.Common-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Cli.Common-003 |
| 4 | Error handling & resilience | Driver.Cli.Common-004 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Cli.Common-005 |
| 10 | Documentation & comments | Driver.Cli.Common-006 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.Cli.Common-001
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` |
| Status | Open |
|
||||
|
||||
**Description:** The `FormatStatus` shortlist maps four OPC UA status names to incorrect
numeric codes. The correct OPC UA spec values (verified against the OPC Foundation
UA-.NETStandard `Opc.Ua.StatusCodes` table) are:

| Name in shortlist | Code used | Correct code | What the used code actually is |
|---|---|---|---|
| `BadTimeout` | `0x80060000` | `0x800A0000` | `BadEncodingError` |
| `BadNoCommunication` | `0x80070000` | `0x80310000` | `BadDecodingError` |
| `BadWaitingForInitialData` | `0x80080000` | `0x80320000` | `BadEncodingLimitsExceeded` |
| `BadNodeIdInvalid` | `0x80350000` | `0x80330000` | `BadAttributeIdInvalid` |

`Good` (`0x00000000`), `Bad` (`0x80000000`), `BadCommunicationError` (`0x80050000`),
`BadNodeIdUnknown` (`0x80340000`), `BadTypeMismatch` (`0x80740000`), and `Uncertain`
(`0x40000000`) are correct.

This is operator-facing and load-bearing: the CLI's whole purpose is to label driver
status codes so a human can interpret a probe/read/write. A real device timeout
(`0x800A0000`) renders as bare `0x800A0000` with no name, while an encoding-error
status (`0x80060000`) is mislabeled `BadTimeout`. A driver returning
`BadAttributeIdInvalid` (`0x80350000`) is mislabeled `BadNodeIdInvalid`. The
`SnapshotFormatterTests` `[Theory]` cases for these codes assert against the wrong
expectations and therefore pass while the mapping is wrong (see Driver.Cli.Common-005).
|
||||
|
||||
**Recommendation:** Correct the four mappings to the spec values. Prefer deriving names
|
||||
from the OPC Foundation `Opc.Ua.StatusCodes` constants (the stack the project already
|
||||
depends on transitively) rather than hand-maintaining a hex shortlist, so the table
|
||||
cannot drift from the spec again. If a hand-list is kept, add a test that cross-checks
|
||||
each entry against `Opc.Ua.StatusCodes` reflection.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Cli.Common-002
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` |
| Status | Open |
|
||||
|
||||
**Description:** `FormatStatus` matches the full 32-bit status word for exact equality
|
||||
against the shortlist. OPC UA status codes carry sub-code/flag bits in the low 16 bits
|
||||
(info type, structure-changed, semantics-changed, limit bits, overflow, etc.). A
|
||||
driver-supplied status such as `0x80050001` or any `Good` value with info bits set
|
||||
(e.g. an overflow bit) falls through the `switch` and renders as bare hex even though
|
||||
the high bits clearly identify the severity class. The doc comment on `FormatStatus`
|
||||
claims the well-known statuses are named, but only the bit-exact canonical forms are.
|
||||
|
||||
**Recommendation:** Either (a) narrow the doc-comment claim to bit-exact canonical
|
||||
codes, or (b) match on the severity bits (`code & 0xC0000000`) to at least always emit
|
||||
`Good` / `Uncertain` / `Bad` even when sub-code bits are set, and match the named codes
|
||||
on the masked code (`code & 0xFFFF0000`).
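
Option (b) could look roughly like the sketch below. `StatusNames.Format` and its output format are hypothetical (the real `FormatStatus` signature is not shown in this review); the numeric codes are the standard OPC UA values:

```csharp
using System.Collections.Generic;

public static class StatusNames
{
    // Standard OPC UA status codes; the code occupies the high 16 bits.
    private static readonly Dictionary<uint, string> Named = new()
    {
        [0x00000000] = "Good",
        [0x40000000] = "Uncertain",
        [0x80000000] = "Bad",
        [0x80050000] = "BadCommunicationError",
        [0x800A0000] = "BadTimeout",
        [0x80310000] = "BadNoCommunication",
        [0x80320000] = "BadWaitingForInitialData",
        [0x80330000] = "BadNodeIdInvalid",
        [0x80340000] = "BadNodeIdUnknown",
        [0x80740000] = "BadTypeMismatch",
    };

    // Hypothetical formatter: the output shape is an assumption for illustration.
    public static string Format(uint code)
    {
        // Ignore the low 16 info/flag bits when looking up a name.
        uint masked = code & 0xFFFF0000;
        if (Named.TryGetValue(masked, out var name))
            return $"0x{code:X8} ({name})";

        // Unknown code: still report the severity class from the top two bits.
        string severity = (code & 0xC0000000) switch
        {
            0x00000000 => "Good",
            0x40000000 => "Uncertain",
            _ => "Bad", // 0x80000000, plus the reserved 0xC0000000 pattern
        };
        return $"0x{code:X8} ({severity})";
    }
}
```

The full 32-bit word is still printed, so the sub-code bits stay visible to the operator while the name lookup ignores them.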
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Cli.Common-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` |
| Status | Open |
|
||||
|
||||
**Description:** `ConfigureLogging` assigns the process-global `Serilog.Log.Logger`
|
||||
without disposing the previously assigned logger, and the library never calls
|
||||
`Log.CloseAndFlush()`. Each call creates a fresh `Logger` via `CreateLogger()` and
|
||||
overwrites `Log.Logger`; the prior instance (and its console sink) is never disposed
|
||||
or flushed. The class is the shared base for every driver CLI and the `subscribe` verb
|
||||
is long-running — if any command path re-invokes `ConfigureLogging` the buffered
|
||||
console sink is abandoned without a flush, and on process exit the final logger is also
|
||||
never flushed. Verbose debug output written just before exit can be lost.
|
||||
|
||||
**Recommendation:** Call `Log.CloseAndFlush()` on shutdown (e.g. in a `finally` in the
|
||||
command `ExecuteAsync`, or via a `protected` disposal helper on this base). Treat
|
||||
`ConfigureLogging` as call-once / idempotent and document that. At minimum capture and
|
||||
dispose the previous logger if reconfiguration is genuinely intended.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Cli.Common-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` |
| Status | Open |
|
||||
|
||||
**Description:** `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the
|
||||
value and status columns) without guarding against empty input. When `tagNames` and
|
||||
`snapshots` are both empty (equal length, so the mismatch check at line 56 passes),
|
||||
`Enumerable.Max` throws `InvalidOperationException` ("Sequence contains no elements").
|
||||
A batch read that legitimately returns zero tags therefore crashes the formatter
|
||||
instead of producing an empty (header-only) table.
|
||||
|
||||
**Recommendation:** Short-circuit on `rows.Length == 0` (return just the header +
|
||||
separator, or an explicit "no rows" line), or use `DefaultIfEmpty(0).Max(...)` for the
|
||||
width computations.
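
The `DefaultIfEmpty(0)` variant can be sketched as a small width helper. `ColumnWidths.For` is a hypothetical name, and clamping to the header width is an assumption about how the table is laid out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnWidths
{
    // Widest cell in the column, never narrower than the header.
    // DefaultIfEmpty(0) keeps Max() from throwing when there are no rows.
    public static int For(IEnumerable<string> cells, string header) =>
        Math.Max(header.Length, cells.Select(c => c.Length).DefaultIfEmpty(0).Max());
}
```

With zero rows the helper returns the header width, so the formatter can still emit a header-only table.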
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Cli.Common-005
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` |
| Status | Open |
|
||||
|
||||
**Description:** The `FormatStatus_names_well_known_status_codes` `[Theory]` asserts
|
||||
`0x80060000 => "BadTimeout"`, which encodes the wrong spec value (see
|
||||
Driver.Cli.Common-001). The test passes because it validates the formatter against the
|
||||
same incorrect table, so the bug is invisible to CI. Additionally there is no coverage
|
||||
for: `DriverCommandBase` (`ConfigureLogging` verbose vs non-verbose level selection — no
|
||||
test exercises the base at all), `FormatTable` with empty input (Driver.Cli.Common-004
|
||||
would have been caught), `FormatValue` with array / enum / custom `object` values, and
|
||||
`FormatTimestamp` with `DateTimeKind.Unspecified` (the docs imply Unspecified is
|
||||
normalised but only `Local` is tested).
|
||||
|
||||
**Recommendation:** Fix the `[Theory]` expectations once Driver.Cli.Common-001 is
|
||||
resolved, and add a test asserting each shortlist entry against the OPC Foundation
|
||||
`Opc.Ua.StatusCodes` constants so the table cannot silently drift. Add `FormatTable`
|
||||
empty-input and `DriverCommandBase` level-selection tests.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Cli.Common-006
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` |
| Status | Open |
|
||||
|
||||
**Description:** Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71`
|
||||
states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement
|
||||
needed" — true only when every snapshot has a non-null `SourceTimestampUtc`.
|
||||
`FormatTimestamp` returns `"-"` for a null timestamp, so a mixed table has a 1-char-wide
|
||||
cell in an otherwise 24-char column; the column is unaligned. Harmless (right-most, no
|
||||
padding consumer) but the stated invariant does not hold. (2) The `DriverCommandBase`
|
||||
class summary enumerates "Modbus / AB CIP / AB Legacy / S7 / TwinCAT" as the driver CLIs
|
||||
but omits FOCAS, which `docs/DriverClis.md` lists as the sixth CLI built on this shared
|
||||
library. The XML doc is stale relative to the shipped driver-CLI set.
|
||||
|
||||
**Recommendation:** Reword the `SnapshotFormatter.cs:71` comment to note the column is
|
||||
right-most and intentionally unpadded rather than claiming fixed width. Add FOCAS to the
|
||||
`DriverCommandBase` class-summary driver list.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
183
code-reviews/Driver.FOCAS.Cli/findings.md
Normal file
@@ -0,0 +1,183 @@
|
||||
# Code Review — Driver.FOCAS.Cli
|
||||
|
||||
| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Cli` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 5 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
|
||||
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.FOCAS.Cli-001 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.FOCAS.Cli-002 |
| 4 | Error handling & resilience | Driver.FOCAS.Cli-001, Driver.FOCAS.Cli-003 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.FOCAS.Cli-004 |
| 7 | Design-document adherence | Driver.FOCAS.Cli-005 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | No issues found (see note) |
| 10 | Documentation & comments | No issues found |
|
||||
|
||||
> Category 9 note: per `docs/DriverClis.md` the FOCAS CLI deliberately ships
> with no CLI-level test project (hardware-gated, followed the Tier-C isolation
> work on task #220). The four command classes are thin pass-throughs to the
> already-tested `FocasDriver`; the only CLI-local logic is `ParseValue` /
> `ParseBool` / `SynthesiseTagName`, which the sibling CLIs cover with unit
> tests. The absence of a `*.Cli.Tests` project is an intentional, documented
> gap rather than a review finding, but see Driver.FOCAS.Cli-001 for the parse
> path that would benefit most from coverage.
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.FOCAS.Cli-001
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/WriteCommand.cs:58-68` |
| Status | Open |
|
||||
|
||||
**Description:** `WriteCommand.ParseValue` parses the numeric `--value` types
|
||||
(`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse`
|
||||
/ etc. These throw raw `FormatException` or `OverflowException` for malformed or
|
||||
out-of-range input. Only the `Bit` case and the unsupported-type case throw
|
||||
`CliFx.Exceptions.CommandException`. CliFx renders a `CommandException` as a
|
||||
clean one-line error, but an uncaught `FormatException`/`OverflowException`
|
||||
surfaces as a full .NET stack trace — a poor experience for an operator who
|
||||
simply mistyped a value (e.g. `write -a R100 -t Int16 -v abc`). The parse
|
||||
failure occurs before any driver work, so the redundant stack trace also
|
||||
obscures that the write never reached the CNC.
|
||||
|
||||
**Recommendation:** Wrap the numeric parses (e.g. via `TryParse` per type, or a
|
||||
`try`/`catch` that rethrows as `CommandException`) so malformed `--value` input
|
||||
produces a clean, actionable message naming the expected type and the rejected
|
||||
literal — consistent with how `ParseBool` already handles bad boolean input.
|
||||
The same pattern exists in the sibling S7 CLI; a shared helper in
|
||||
`Driver.Cli.Common` would fix both.
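
A shared helper along these lines might look like the following sketch. `CliInputException` is a local stand-in for `CliFx.Exceptions.CommandException` so the example compiles without the CliFx package; `ValueParser` and the invariant-culture choice are assumptions:

```csharp
using System;
using System.Globalization;

// Stand-in for CliFx.Exceptions.CommandException (assumption, for self-containment).
public sealed class CliInputException : Exception
{
    public CliInputException(string message) : base(message) { }
}

public static class ValueParser
{
    // TryParse reports both malformed and out-of-range input as a failed parse,
    // so one clean message covers FormatException and OverflowException cases.
    public static short ParseInt16(string raw) =>
        short.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out var value)
            ? value
            : throw new CliInputException(
                $"--value '{raw}' is not a valid Int16 (expected {short.MinValue}..{short.MaxValue}).");

    public static float ParseFloat32(string raw) =>
        float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
            ? value
            : throw new CliInputException($"--value '{raw}' is not a valid Float32.");
}
```

In the real command the stand-in exception would be replaced by `CommandException`, which CliFx renders as a one-line error.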
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS.Cli-002
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:45-51` |
| Status | Open |
|
||||
|
||||
**Description:** The `subscribe` command attaches an `OnDataChange` handler that
|
||||
calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from
|
||||
the driver's `PollGroupEngine` tick thread, while the command's main flow writes
|
||||
the "Subscribed to ..." banner from the CliFx invocation thread. The CliFx
|
||||
`IConsole.Output` `TextWriter` is not documented as thread-safe; with a single
|
||||
poll group the change events are serialised, but the banner write at line 55-56
|
||||
can interleave with the first poll-driven change line. The handler is also never
|
||||
detached from the event before driver disposal — benign here because the driver
|
||||
is disposed in the same `finally`, but it leaves a dangling subscription if the
|
||||
command is ever refactored to reuse the driver.
|
||||
|
||||
**Recommendation:** Write the "Subscribed" banner before wiring the
|
||||
`OnDataChange` handler (it is informational and ordering-sensitive), or guard
|
||||
console writes with a lock shared between the banner and the handler. Optionally
|
||||
detach the handler in the `finally` block before `ShutdownAsync` for symmetry
|
||||
with the `handle` teardown already present there.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS.Cli-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) |
| Status | Open |
|
||||
|
||||
**Description:** The numeric command options `--cnc-port`, `--timeout-ms`, and
|
||||
`--interval-ms` are accepted without range validation. A zero or negative
|
||||
`--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0`
|
||||
yields a zero `TimeSpan` operation timeout; a zero/negative `--interval-ms`
|
||||
produces a non-positive `publishingInterval` passed straight into
|
||||
`PollGroupEngine.Subscribe`. Depending on the engine's tolerance, these surface
|
||||
either as an opaque downstream exception or as a tight-spinning poll loop rather
|
||||
than a clear "value must be positive" message at the CLI boundary.
|
||||
|
||||
**Recommendation:** Validate the three numeric options at the top of
|
||||
`ExecuteAsync` (or in `FocasCommandBase`) and throw a
|
||||
`CliFx.Exceptions.CommandException` when out of range — port in `1..65535`,
|
||||
timeout and interval strictly positive. The same gap exists across the sibling
|
||||
driver CLIs, so a shared validation helper in `Driver.Cli.Common` is the
|
||||
cleaner fix.
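
The boundary checks could be sketched as below. `OptionGuards` is a hypothetical helper; it throws `ArgumentOutOfRangeException` only to stay dependency-free, whereas a real CliFx command would throw `CommandException` as recommended above:

```csharp
using System;

public static class OptionGuards
{
    // Valid TCP ports are 1..65535.
    public static void RequirePort(int port)
    {
        if (port < 1 || port > 65535)
            throw new ArgumentOutOfRangeException(nameof(port), port, "--cnc-port must be in 1..65535.");
    }

    // Timeouts and polling intervals must be strictly positive.
    public static void RequirePositive(int value, string optionName)
    {
        if (value <= 0)
            throw new ArgumentOutOfRangeException(optionName, value, $"{optionName} must be strictly positive.");
    }
}
```

Calling these at the top of `ExecuteAsync` rejects bad input before any driver object is constructed.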
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS.Cli-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` |
| Status | Open |
|
||||
|
||||
**Description:** Every command declares `await using var driver = new FocasDriver(...)`
|
||||
**and** explicitly calls `await driver.ShutdownAsync(CancellationToken.None)` in
|
||||
the `finally` block. `FocasDriver.DisposeAsync()` itself calls `ShutdownAsync`,
|
||||
so shutdown runs twice per command invocation. `FocasDriver.ShutdownAsync` is
|
||||
idempotent (it clears `_devices` / `_tagsByName`, and the second pass iterates
|
||||
an empty collection), so there is no functional bug — but the redundant call is
|
||||
dead weight and obscures intent: a reader cannot tell whether the explicit
|
||||
`ShutdownAsync` or the `await using` is the real teardown.
|
||||
|
||||
**Recommendation:** Drop the explicit `ShutdownAsync` from the `finally` blocks
|
||||
and rely on `await using` for disposal, or drop `await using` and keep the
|
||||
explicit teardown — but not both. The same redundancy exists in the sibling CLIs.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS.Cli-005
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) |
| Status | Open |
|
||||
|
||||
**Description:** `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and
|
||||
`BadCommunicationError` as the key diagnostic signals an operator reads off
|
||||
`probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure`
|
||||
after a successful connect means ..."). The FOCAS driver's `FocasStatusMapper`
|
||||
also emits `BadNotWritable` (0x803B0000), `BadOutOfRange` (0x803C0000),
|
||||
`BadNotSupported` (0x803D0000), `BadDeviceFailure` (0x80550000),
|
||||
`BadInternalError` (0x80020000), and `BadTimeout` (0x800A0000). The shared
|
||||
`SnapshotFormatter.FormatStatus` shortlist only names `Good`, `Bad`,
|
||||
`BadCommunicationError`, `BadTimeout` (0x80060000 — note this is a *different*
|
||||
code than the mapper's `BadTimeout` 0x800A0000), `BadNoCommunication`,
|
||||
`BadWaitingForInitialData`, `BadNodeIdUnknown`, `BadNodeIdInvalid`,
|
||||
`BadTypeMismatch`, and `Uncertain`. Consequently a FOCAS `write` to a
|
||||
non-writable address, a parameter-write rejected by the CNC, or a
|
||||
`BadDeviceFailure` session-setup rejection renders as a bare hex code
|
||||
(`0x803B0000`, `0x80550000`, …) with no name — directly contradicting the
|
||||
documented workflow where the operator is told to read those status names.
|
||||
|
||||
**Recommendation:** Extend `SnapshotFormatter.FormatStatus` (in
|
||||
`Driver.Cli.Common`) to name the `Bad*` codes the native-protocol drivers
|
||||
actually emit — at minimum `BadNotWritable`, `BadOutOfRange`, `BadNotSupported`,
|
||||
`BadDeviceFailure`, `BadInternalError`, and the mapper's `BadTimeout`
|
||||
(0x800A0000). The fix belongs in the shared library, but it is recorded here
|
||||
because the gap defeats this module's documented `probe`/`write` diagnostic
|
||||
workflow; cross-reference the `Driver.Cli.Common` review.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
330
code-reviews/Driver.FOCAS/findings.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Code Review — Driver.FOCAS
|
||||
|
||||
| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.FOCAS` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
|
||||
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.FOCAS-001, Driver.FOCAS-002, Driver.FOCAS-003 |
| 2 | OtOpcUa conventions | Driver.FOCAS-004 |
| 3 | Concurrency & thread safety | Driver.FOCAS-005 |
| 4 | Error handling & resilience | Driver.FOCAS-006, Driver.FOCAS-007 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.FOCAS-008 |
| 7 | Design-document adherence | Driver.FOCAS-009 |
| 8 | Code organization & conventions | Driver.FOCAS-010, Driver.FOCAS-011 |
| 9 | Testing coverage | Driver.FOCAS-012 |
| 10 | Documentation & comments | No issues found |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.FOCAS-001
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` |
| Status | Open |
|
||||
|
||||
**Description:** `FocasDriverConfigDto` exposes only `Backend`, `Series`, `TimeoutMs`,
|
||||
`Devices`, `Tags`, and `Probe`. It has no `FixedTree`, `AlarmProjection`, or
|
||||
`HandleRecycle` properties, and `CreateInstance` never sets those three options on
|
||||
`FocasDriverOptions`. As a result, a deployment that follows the documented config -
|
||||
`docs/drivers/FOCAS.md` shows `"FixedTree": { "Enabled": true }`,
|
||||
`"AlarmProjection": { "Enabled": true }`, and `"HandleRecycle": { "Enabled": true }`
|
||||
inside `Config` - is parsed with `PropertyNameCaseInsensitive` and the unknown sections
|
||||
are discarded. The features stay at their hard-coded defaults (all `Enabled = false`).
|
||||
The fixed-node tree never appears, alarm subscriptions throw `NotSupportedException`
|
||||
("FOCAS alarm projection is disabled"), and handle recycling never runs - despite the
|
||||
operator explicitly opting in.
|
||||
|
||||
**Recommendation:** Add `FixedTree`, `AlarmProjection`, and `HandleRecycle` DTO classes
|
||||
to `FocasDriverConfigDto`, parse their `TimeSpan`/`bool` fields, and populate the
|
||||
corresponding `FocasDriverOptions` properties in `CreateInstance`. Consider enabling
|
||||
strict JSON handling (`JsonUnmappedMemberHandling.Disallow`) so future unknown config
|
||||
sections fail loudly instead of being dropped.
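
As a rough illustration of the strict-parsing suggestion, the sketch below uses `System.Text.Json` on .NET 8 or later, where `JsonSerializerOptions.UnmappedMemberHandling` is available. `DriverConfigDto`, `FeatureToggleDto`, and `ConfigParser` are simplified, hypothetical shapes, not the real `FocasDriverConfigDto`:

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical, simplified DTOs; the real config has more members.
public sealed class FeatureToggleDto
{
    public bool Enabled { get; set; }
}

public sealed class DriverConfigDto
{
    public int TimeoutMs { get; set; }
    public FeatureToggleDto? FixedTree { get; set; }
    public FeatureToggleDto? AlarmProjection { get; set; }
    public FeatureToggleDto? HandleRecycle { get; set; }
}

public static class ConfigParser
{
    private static readonly JsonSerializerOptions Strict = new()
    {
        PropertyNameCaseInsensitive = true,
        // Unknown JSON properties raise JsonException instead of being dropped (.NET 8+).
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow,
    };

    public static DriverConfigDto Parse(string json) =>
        JsonSerializer.Deserialize<DriverConfigDto>(json, Strict)
        ?? throw new JsonException("Config deserialized to null.");
}
```

With this setting a misspelled or unmapped section fails at load time, which would have surfaced the missing `FixedTree` / `AlarmProjection` / `HandleRecycle` mappings immediately.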
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS-002
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` |
| Status | Open |
|
||||
|
||||
**Description:** The fixed-tree bootstrap probes the `ProgramInfo` capability via
|
||||
`SafeTryProbe(() => client.GetProgramInfoAsync(ct))` and treats a non-null result as
|
||||
"supported". But `WireFocasClient.GetProgramInfoAsync` never throws on a FOCAS error
|
||||
return code: `ReadExecutingProgramNameAsync`, `ReadBlockCountAsync`, and
|
||||
`ReadOperationModeCodeAsync` all return `FocasResult<T>` envelopes, and the method
|
||||
substitutes defaults (`string.Empty`, `0`) when `IsOk` is false instead of throwing. It
|
||||
only throws from `RequireConnected()`. Consequently `GetProgramInfoAsync` always
|
||||
returns a non-null `FocasProgramInfo`, so `Capabilities.ProgramInfo` is set `true` even
|
||||
on a CNC series that returns `EW_FUNC`/`EW_NOOPT` for `cnc_exeprgname2`/`cnc_rdopmode`.
|
||||
The driver then emits the `Program/` and `OperationMode/` subtrees and polls them every
|
||||
tick against a controller that does not support them - the exact "nodes that only ever
|
||||
return BadDeviceFailure" outcome the capability suppression was designed to prevent
|
||||
(`docs/drivers/FOCAS.md`, "Per-series node suppression").
|
||||
|
||||
**Recommendation:** Make `GetProgramInfoAsync` throw (or return a nullable result) when
|
||||
the underlying `cnc_exeprgname2` / `cnc_rdopmode` calls report a non-zero RC, so
|
||||
`SafeTryProbe` can correctly classify the series. At minimum require the program-name
|
||||
or op-mode read to be `IsOk` before declaring the capability present.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `FocasDriver.cs:71-79` |
| Status | Open |
|
||||
|
||||
**Description:** In `InitializeAsync`, capability-matrix validation only runs when
|
||||
`_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose
|
||||
`DeviceHostAddress` does not match any configured device (a common config typo, e.g. a
|
||||
trailing `:8193` mismatch or a wrong host) silently skips validation and is still added
|
||||
to `_tagsByName`. The mistake is not surfaced at load time - it only manifests at read
|
||||
time as `BadNodeIdUnknown` (`ReadAsync` lines 191-194), defeating the documented goal
|
||||
that "config errors now fail at load instead of per-read"
|
||||
(`docs/v2/focas-version-matrix.md`).
|
||||
|
||||
**Recommendation:** After parsing the tag address, if `_devices` does not contain
|
||||
`tag.DeviceHostAddress`, throw an `InvalidOperationException` naming the tag and the
|
||||
unresolved device host so the operator fixes the typo at startup.
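
A minimal sketch of that load-time check, using hypothetical `TagDef` and `TagBinding` types in place of the driver's own tag and device collections:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the driver's tag definition.
public sealed record TagDef(string Name, string DeviceHostAddress);

public static class TagBinding
{
    // Fail at load time when a tag points at a device that is not configured.
    public static void EnsureDevicesResolve(IEnumerable<TagDef> tags, ICollection<string> deviceHosts)
    {
        foreach (var tag in tags)
        {
            if (!deviceHosts.Contains(tag.DeviceHostAddress))
                throw new InvalidOperationException(
                    $"Tag '{tag.Name}' references device '{tag.DeviceHostAddress}', which is not a configured device.");
        }
    }
}
```

Whether host matching should be case-sensitive is left to the comparer of the collection passed in; that choice is not stated in the review.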
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` |
| Status | Open |
|
||||
|
||||
**Description:** `DiscoverAsync` emits user tags with
|
||||
`SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`,
|
||||
and `FocasTagDefinition.Writable` defaults to `true` (also defaulted to `true` in the
|
||||
factory - `t.Writable ?? true`). But the production `wire` backend's
|
||||
`WireFocasClient.WriteAsync` unconditionally returns `FocasStatusMapper.BadNotWritable`:
the driver is read-only against FOCAS by design (`docs/drivers/FOCAS.md`). The result
|
||||
is that every tag is advertised in the address space as a writable `Operate` node, yet
|
||||
every write attempt fails. This is misleading to OPC UA clients and to the
|
||||
`DriverNodeManager` ACL layer, which will grant write permission on nodes that can never
|
||||
be written.
|
||||
|
||||
**Recommendation:** Either default `Writable` to `false` for the FOCAS driver, or have
|
||||
`DiscoverAsync` force `SecurityClassification.ViewOnly` when the active backend cannot
|
||||
write. Given the wire backend is read-only and is the only production backend, treating
|
||||
all FOCAS tags as `ViewOnly` is the simplest correct behaviour.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS-005
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` |
| Status | Open |
|
||||
|
||||
**Description:** `_health` is a plain (non-volatile) field mutated from multiple
|
||||
concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can
|
||||
all run on different threads simultaneously (subscriptions go through `PollGroupEngine`
|
||||
timers; probe loops are `Task.Run`). Several updates are read-modify-write - e.g.
|
||||
`new DriverHealth(DriverState.Degraded, _health.LastSuccessfulRead, ...)` reads `_health`
|
||||
then writes a new instance - so a concurrent update can be lost or a stale
|
||||
`LastSuccessfulRead` propagated. While `DriverHealth` is an immutable record and the
|
||||
reference write is atomic, the lack of synchronization means `GetHealth()` can observe
|
||||
torn-in-time state and successful-read timestamps can regress.
|
||||
|
||||
**Recommendation:** Guard `_health` reads/writes with a lock, or use
|
||||
`Interlocked.Exchange`/`Volatile` around the whole record reference and compute the new
|
||||
value from a single captured snapshot. The `DeviceState`/`HostState` transition already
|
||||
uses `ProbeLock`; apply the same discipline to driver health.
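
The lock-free variant can be sketched with a compare-and-swap loop over the immutable record. `HealthCell` and the three-field `DriverHealth` shape are assumptions for illustration; the real record may carry different members:

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical, simplified versions of the driver's health types.
public enum DriverState { Healthy, Degraded }

public sealed record DriverHealth(DriverState State, DateTime? LastSuccessfulRead, string? LastError);

public sealed class HealthCell
{
    private DriverHealth _health = new(DriverState.Healthy, null, null);

    public DriverHealth Current => Volatile.Read(ref _health);

    // Compute the next value from one captured snapshot and publish it only if
    // no other thread replaced the reference in the meantime; otherwise retry.
    public DriverHealth Update(Func<DriverHealth, DriverHealth> update)
    {
        while (true)
        {
            var snapshot = Volatile.Read(ref _health);
            var next = update(snapshot);
            // ReferenceEquals, because records compare by value with ==.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _health, next, snapshot), snapshot))
                return next;
        }
    }
}
```

A degraded transition then becomes `cell.Update(h => h with { State = DriverState.Degraded, LastError = "..." })`, which keeps whatever `LastSuccessfulRead` is current at publish time rather than a stale copy.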
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.FOCAS-006
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` |
| Status | Open |
|
||||
|
||||
**Description:** `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across
|
||||
a transient disconnect: it only checks `device.Client is { IsConnected: true }` and
|
||||
otherwise calls `ConnectAsync` again on the same object. For a `WireFocasClient` whose
|
||||
underlying `FocasWireClient` has been disposed (e.g. via a `HandleRecycle` /
|
||||
`DisposeClient` race, or a prior teardown), every subsequent call hits
|
||||
`FocasWireClient.ThrowIfDisposed` and throws `ObjectDisposedException`. In `ReadAsync`
|
||||
that exception is caught only by the generic `catch (Exception ex)` and mapped to a
|
||||
permanent `BadCommunicationError` - the device stays wedged with no recovery path until
|
||||
`ReinitializeAsync` is invoked, because the reconnect logic never discards the disposed
|
||||
client.
|
||||
|
||||
**Recommendation:** On any connect/use failure, treat a disposed or non-connected client
|
||||
as unrecoverable and recreate it from `_clientFactory`. Simplest: in
|
||||
`EnsureConnectedAsync`, when `device.Client` is non-null but not connected, dispose and
|
||||
null it before creating a fresh instance, rather than retrying `ConnectAsync` on the
|
||||
stale object.
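
The discard-and-recreate approach could look roughly like the sketch below. The `IFocasClient` interface is reduced to the two members the example needs, and `DeviceState`, `Reconnect`, and `FakeFocasClient` are hypothetical helpers written for this illustration only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal shapes; the real IFocasClient has more members.
public interface IFocasClient : IDisposable
{
    bool IsConnected { get; }
    Task ConnectAsync(CancellationToken ct);
}

public sealed class DeviceState
{
    public IFocasClient? Client { get; set; }
}

public static class Reconnect
{
    // Reuses only a client that reports IsConnected; anything else is
    // disposed and replaced with a fresh instance from the factory.
    public static async Task<IFocasClient> EnsureConnectedAsync(
        DeviceState device, Func<IFocasClient> clientFactory, CancellationToken ct)
    {
        if (device.Client is { IsConnected: true } live)
            return live;

        if (device.Client is { } stale)
        {
            device.Client = null;
            try { stale.Dispose(); } catch (Exception) { /* best effort; log in real code */ }
        }

        var fresh = clientFactory();
        try
        {
            await fresh.ConnectAsync(ct).ConfigureAwait(false);
        }
        catch
        {
            fresh.Dispose(); // never cache a client that failed to connect
            throw;
        }
        device.Client = fresh;
        return fresh;
    }
}

// Test double used only to exercise the sketch.
public sealed class FakeFocasClient : IFocasClient
{
    public bool IsConnected { get; private set; }
    public bool Disposed { get; private set; }

    public Task ConnectAsync(CancellationToken ct)
    {
        if (Disposed) throw new ObjectDisposedException(nameof(FakeFocasClient));
        IsConnected = true;
        return Task.CompletedTask;
    }

    public void Dispose() { Disposed = true; IsConnected = false; }
}
```

Because a stale client is never asked to reconnect, a disposed instance cannot wedge the device: the next call simply builds a new one.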

**Resolution:** _(open)_

### Driver.FOCAS-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` |
| Status | Open |

**Description:** Numerous `try { ... } catch {}` blocks swallow every exception with no
logging - `ShutdownAsync` (CTS cancel/dispose), `RecycleLoopAsync` (`DisposeClient`),
`FixedTreeLoopAsync` transient catches, `ProbeLoopAsync`, and the alarm projection's
`sub.Cts.Cancel()`. The driver takes no `ILogger` dependency at all (only
`FocasWireClient` optionally accepts one, and the driver never supplies it). A CNC that
is silently failing every probe/poll tick produces no diagnostic trail, which conflicts
with the project's Serilog logging convention and forces field troubleshooting to rely
solely on `GetHealth()`.

**Recommendation:** Inject an `ILogger<FocasDriver>` and log caught exceptions in the
poll/probe/recycle loops at `Debug`/`Warning`. Pass a logger into `FocasWireClient` so
the per-response `Debug` entries it already emits are actually captured.

**Resolution:** _(open)_

### Driver.FOCAS-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `FocasDriver.cs:201`, `FocasDriver.cs:253` |
| Status | Open |

**Description:** `ReadAsync` and `WriteAsync` call `FocasAddress.TryParse(def.Address)`
on every operation, even though `InitializeAsync` already parsed and validated every
tag address. On a subscription hot path (each poll tick re-enters `ReadAsync`) this
re-parses and allocates a `FocasAddress` record per tag per tick unnecessarily.

**Recommendation:** Parse each tag address once at `InitializeAsync` and store the
parsed `FocasAddress` on `FocasTagDefinition` (or in a side dictionary), so the runtime
read/write paths use the cached value.
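
The side-dictionary variant might be sketched as follows. The `FocasAddress` type here is a deliberately simplified stand-in with an invented `Area:Number` text format and a `ParseCalls` counter that exists only to make the effect observable; `AddressCache` is likewise a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified address type; the real FocasAddress parser is richer.
public sealed record FocasAddress(string Area, int Number)
{
    public static int ParseCalls; // instrumentation for this example only

    public static FocasAddress? TryParse(string text)
    {
        ParseCalls++;
        var parts = text.Split(':');
        return parts.Length == 2 && int.TryParse(parts[1], out var number)
            ? new FocasAddress(parts[0], number)
            : null;
    }
}

public sealed class AddressCache
{
    private readonly Dictionary<string, FocasAddress> _parsed = new(StringComparer.Ordinal);

    // Called once per tag at initialization; rejects invalid addresses up front.
    public void Add(string tagName, string address) =>
        _parsed[tagName] = FocasAddress.TryParse(address)
            ?? throw new FormatException($"Invalid FOCAS address '{address}' for tag '{tagName}'.");

    // Hot path: a dictionary lookup, with no parsing and no allocation.
    public bool TryGet(string tagName, out FocasAddress? address) =>
        _parsed.TryGetValue(tagName, out address);
}
```

Storing the parsed value directly on the tag definition would avoid even the dictionary lookup, at the cost of making the definition type mutable or adding a constructor parameter.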

**Resolution:** _(open)_

### Driver.FOCAS-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` |
| Status | Open |

**Description:** `FocasProbeOptions.Timeout` is parsed by the factory
(`FocasProbeDto.TimeoutMs` to `FocasProbeOptions.Timeout`) but never consumed.
`ProbeLoopAsync` calls `client.ProbeAsync(ct)` with only the probe-loop cancellation
token; no per-probe timeout is applied, and `EnsureConnectedAsync` uses
`_options.Timeout` rather than `Probe.Timeout`. A hung CNC socket during a probe blocks
until the OS TCP timeout rather than the configured `Probe.Timeout`.

**Recommendation:** Apply `Probe.Timeout` as a linked `CancellationTokenSource` timeout
around the `ProbeAsync` call, or remove the dead `Timeout` field from
`FocasProbeOptions` / `FocasProbeDto` if it is genuinely not intended.
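
As an illustration of the linked-token option, the helper below wraps an arbitrary probe delegate. `ProbeTimeout.ProbeWithTimeoutAsync` is a hypothetical helper, and the sketch assumes the underlying probe call actually honours its cancellation token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProbeTimeout
{
    // Runs `probe` under a token that fires on either the loop's cancellation
    // or the per-probe timeout. Returns false on timeout; a loop shutdown still
    // surfaces as OperationCanceledException.
    public static async Task<bool> ProbeWithTimeoutAsync(
        Func<CancellationToken, Task<bool>> probe, TimeSpan timeout, CancellationToken loopToken)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(loopToken);
        cts.CancelAfter(timeout);
        try
        {
            return await probe(cts.Token).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (!loopToken.IsCancellationRequested)
        {
            return false; // the per-probe timeout elapsed, not a shutdown
        }
    }
}
```

Distinguishing the two cancellation sources matters: a timeout should count as a failed probe, while a shutdown should unwind the loop without touching device health.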

**Resolution:** _(open)_

### Driver.FOCAS-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) |
| Status | Open |

**Description:** There are two parallel operation-mode-to-text mappings with divergent
labels. `FocasOpMode.ToText` (used by the driver fixed-tree `OperationMode/ModeText`
node) yields `"TJOG"`, `"TEACH_IN_HANDLE"`; `FocasOperationModeExtensions.ToText` (in
the Wire layer) yields `"T-JOG"`, `"TEACH-IN-HANDLE"`. They also use different fallback
formats (`Mode{mode}` vs the bare number). The same concept is encoded twice with
inconsistent results depending on which path renders it.

**Recommendation:** Consolidate to a single op-mode enum + `ToText` helper shared by
both the wire layer and the driver projection, with one canonical label set.

**Resolution:** _(open)_

### Driver.FOCAS-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` |
| Status | Open |

**Description:** `FocasAlarmType` declares its constants as `public const int`, but the
only consumers - `FocasAlarmProjection.MapAlarmType(short type)` and
`MapSeverity(short type)` - take a `short` and `switch` against these `int` constants. It
compiles only because the values (0..13) fit in `short` range as constant expressions.
The type mismatch is a latent maintenance hazard: adding a constant above
`short.MaxValue`, or changing the projection signatures, would break the switch in
non-obvious ways. `FocasAlarmType.All` is `-1` and is also passed where a `short` is
expected by `ReadAlarmsAsync`.

**Recommendation:** Declare the `FocasAlarmType` constants as `short` (or make it an
`enum : short`) so the type matches the wire field width and the projection signatures.

**Resolution:** _(open)_

### Driver.FOCAS-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) |
| Status | Open |

**Description:** The unit test project does not exercise
`FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` /
`HandleRecycle` config sections - which is why the config-mapping gap in
Driver.FOCAS-001 was not caught. There is also no test that drives the fixed-tree
bootstrap / capability-probe path (`FixedTreeLoopAsync`), so the false-positive
`ProgramInfo` capability in Driver.FOCAS-002 is untested, and the
`EnsureConnectedAsync` reconnect-after-disconnect path (Driver.FOCAS-006) has no
coverage.

**Recommendation:** Add factory tests that round-trip a full JSON config including the
three opt-in sections and assert the options reach the driver; add a
`FakeFocasClient`-driven test for fixed-tree bootstrap capability classification
(including the unsupported-program-info case); add a reconnect test that disposes the
fake client mid-session and asserts recovery.

**Resolution:** _(open)_

237
code-reviews/Driver.Galaxy/findings.md
Normal file
@@ -0,0 +1,237 @@
# Code Review — Driver.Galaxy

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 14 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Galaxy-001, Driver.Galaxy-002, Driver.Galaxy-003, Driver.Galaxy-004 |
| 2 | OtOpcUa conventions | Driver.Galaxy-005 |
| 3 | Concurrency & thread safety | Driver.Galaxy-006, Driver.Galaxy-007 |
| 4 | Error handling & resilience | Driver.Galaxy-001, Driver.Galaxy-008, Driver.Galaxy-009 |
| 5 | Security | Driver.Galaxy-010 |
| 6 | Performance & resource management | Driver.Galaxy-011, Driver.Galaxy-012 |
| 7 | Design-document adherence | Driver.Galaxy-013 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Galaxy-014 |
| 10 | Documentation & comments | Driver.Galaxy-005, Driver.Galaxy-013 |

## Findings

### Driver.Galaxy-001

| Field | Value |
|---|---|
| Severity | Critical |
| Category | Error handling & resilience |
| Location | `Runtime/EventPump.cs:128`, `GalaxyDriver.cs:222` |
| Status | Open |

**Description:** The `ReconnectSupervisor` is constructed in `BuildProductionRuntimeAsync` and exposes `ReportTransportFailure(Exception)` as the only entry point that starts the reopen -> replay recovery loop. Nothing in the driver ever calls `ReportTransportFailure` (a repo-wide search finds only the declaration). When the gateway `StreamEvents` stream faults, `EventPump.RunAsync` catches the exception, logs "reconnect supervisor (PR 4.5) handles restart", completes the channel, and exits — but the supervisor is never told. The result: a transient gateway transport drop permanently kills the event stream. Data-change notifications stop, no reconnect/replay runs, and `GetHealth()` keeps reporting `Healthy` because `_supervisor.IsDegraded` stays false. This is a production outage with no self-recovery.

**Recommendation:** Wire the EventPump (and any gw RPC that observes a transport fault) to call `_supervisor.ReportTransportFailure(ex)`. The simplest path: give `EventPump` a fault callback (or expose a `StreamFaulted` event) that `GalaxyDriver` subscribes to and forwards to the supervisor. The supervisor's `ReopenAsync`/`ReplayAsync` must also restart the EventPump itself (see Driver.Galaxy-008).
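
The fault-callback wiring could be sketched as below. Both classes are heavily reduced stand-ins: the real `EventPump` consumes a gateway stream and the real `ReconnectSupervisor` runs the reopen/replay loop, whereas here the stream is an injected delegate and the supervisor only counts reports. The constructor shape and `ReportedFailures` are assumptions made for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reductions of the real types, just enough to show the wiring.
public sealed class ReconnectSupervisor
{
    private int _reported;

    public int ReportedFailures => Volatile.Read(ref _reported);
    public bool IsDegraded => ReportedFailures > 0;

    // In the real supervisor this would start the reopen -> replay loop.
    public void ReportTransportFailure(Exception ex) => Interlocked.Increment(ref _reported);
}

public sealed class EventPump
{
    private readonly Func<CancellationToken, Task> _consumeStream;
    private readonly Action<Exception> _onStreamFaulted;

    public EventPump(Func<CancellationToken, Task> consumeStream, Action<Exception> onStreamFaulted)
    {
        _consumeStream = consumeStream;
        _onStreamFaulted = onStreamFaulted;
    }

    public async Task RunAsync(CancellationToken ct)
    {
        try
        {
            await _consumeStream(ct).ConfigureAwait(false);
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // Normal shutdown: not a transport fault.
        }
        catch (Exception ex)
        {
            _onStreamFaulted(ex); // tell the owner instead of only logging
        }
    }
}
```

The driver would then construct the pump with `supervisor.ReportTransportFailure` as the callback, which also gives a unit test a direct seam for asserting that a stream fault reaches the supervisor.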

**Resolution:** _(open)_

### Driver.Galaxy-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` |
| Status | Open |

**Description:** `DataTypeMap.Map` maps Galaxy `mx_data_type` codes to six `DriverDataType` values (Boolean, Int32, Float32, Float64, String, DateTime) — there is no `Int64` arm. Yet `MxValueDecoder` and `MxValueEncoder` both fully support Int64 (`MxValue.Int64Value`, `Int64Array`), and the decoder's own XML doc claims "the seven Galaxy data types ... (Boolean, Int32, Int64, Float32, Float64, String, DateTime)". Any Galaxy attribute whose `mx_data_type` is the Int64 code (or any code > 5) falls through the `_ => DriverDataType.String` default. The address-space node is then created as a `String` variable while runtime reads decode an `Int64` boxed value — a type mismatch that produces wrong OPC UA `DataType`/`ValueRank` metadata and likely fails value coercion at the server node layer.

**Recommendation:** Confirm the Galaxy `mx_data_type` integer code for 64-bit integers and add the explicit arm to `DataTypeMap.Map`. If the wire format genuinely has no Int64 type, correct the `MxValueDecoder`/`MxValueEncoder` doc comments instead. Either way the encoder/decoder and the type map must agree.

**Resolution:** _(open)_

### Driver.Galaxy-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `Runtime/StatusCodeMap.cs:86` |
| Status | Open |

**Description:** `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return code rather than a boolean-as-int, then `Success != 0` is exactly the failure condition and the mapper inverts it — every failed write/read would report `Good`. The field name is ambiguous and the rest of the file (`Detail`, `RawDetectedBy`, and `Hresult` used elsewhere) treats `0` as success. `GatewayGalaxyAlarmAcknowledger.cs:62` uses the opposite convention for the sibling field (`reply.Hresult != 0` means failure).

**Recommendation:** Verify the semantics of `MxStatusProxy.Success` against the gateway proto contract. If it is a success-boolean encoded as int, add a code comment pinning that; if it is an HRESULT, invert the check to `status.Success == 0 => Good`.

**Resolution:** _(open)_

### Driver.Galaxy-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `GalaxyDriver.cs:901` |
| Status | Open |

**Description:** `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts `StatusCode >> 30` and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQualityByte`/`FromMxStatus`, so this is a lossy round-trip — it collapses every specific code back to the three category bytes (192/64/0). That happens to satisfy `PerPlatformProbeWatcher.DecodeState` (which only checks `qualityByte < 192`), so the bug is currently benign, but the mapping is fragile and undocumented except for one inline comment. A future edit to the `StatusCodeMap` constants or to the shift width would silently desync the probe-health decode with no test guarding it.

**Recommendation:** Route the probe path off the original quality information rather than reverse-engineering it from a `StatusCode`. Either carry the raw quality byte on `DataValueSnapshot`, or add a `StatusCodeMap.ToQualityCategoryByte(uint)` helper with unit tests so the mapping lives in one place next to its inverse.
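
A sketch of the helper variant, reproducing the `0->192, 1->64, _->0` mapping described above. The class name `StatusCodeMapSketch` and the three constants are illustrative; in the real code the method would sit on `StatusCodeMap` itself.

```csharp
public static class StatusCodeMapSketch
{
    public const byte QualityGood = 192;
    public const byte QualityUncertain = 64;
    public const byte QualityBad = 0;

    // OPC UA encodes severity in the top two bits of the 32-bit StatusCode:
    // 00 = Good, 01 = Uncertain, 10 = Bad (11 is reserved and treated as Bad here).
    public static byte ToQualityCategoryByte(uint statusCode) => (statusCode >> 30) switch
    {
        0 => QualityGood,
        1 => QualityUncertain,
        _ => QualityBad
    };
}
```

With the shift width and the three category bytes in one named place, a unit test over a handful of representative status codes is enough to pin the contract that `PerPlatformProbeWatcher.DecodeState` relies on.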

**Resolution:** _(open)_

### Driver.Galaxy-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `Runtime/EventPump.cs:81-88` |
| Status | Open |

**Description:** The `BoundedChannelOptions` comment states "Newest-dropped policy: when full, the producer's TryWrite returns false ... We do this manually rather than relying on `BoundedChannelFullMode.DropWrite`" — but the option is then set to `FullMode = BoundedChannelFullMode.Wait`. With `Wait`, `TryWrite` returning `false` on a full channel is correct behaviour, so the code works, but the comment naming the mode and the actual mode disagree, which is confusing for a maintainer deciding whether the policy is `Wait`, `DropWrite`, or `DropNewest`.

**Recommendation:** Either reword the comment to say "we use `Wait` mode but never call the awaitable `WriteAsync`; `TryWrite` gives us synchronous newest-dropped semantics", or switch to `BoundedChannelFullMode.DropWrite`. Note that under `DropWrite`, `TryWrite` reports success even when the item is discarded, so the manual drop count would have to move from the `TryWrite` return value to the `itemDropped` callback. Make the comment and the mode consistent.
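
For reference, the `Wait`-mode-plus-`TryWrite` pattern can be isolated as in the sketch below. `DroppingWriter<T>` and `Offer` are hypothetical names; the channel calls are the standard `System.Threading.Channels` API.

```csharp
using System.Threading;
using System.Threading.Channels;

public sealed class DroppingWriter<T>
{
    private readonly Channel<T> _channel;
    private long _dropped;

    public DroppingWriter(int capacity)
    {
        // Wait mode only affects the awaitable WriteAsync/WaitToWriteAsync path.
        // Because this class calls TryWrite exclusively, nothing ever waits:
        // a full channel makes TryWrite return false immediately.
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait,
            SingleReader = true
        });
    }

    public ChannelReader<T> Reader => _channel.Reader;

    public long Dropped => Interlocked.Read(ref _dropped);

    // Drops the newest item (the one being offered) when the channel is full.
    public bool Offer(T item)
    {
        if (_channel.Writer.TryWrite(item)) return true;
        Interlocked.Increment(ref _dropped);
        return false;
    }
}
```

Wrapping the policy in a small type like this keeps the explanatory comment next to the single line it describes, which is the maintainability concern the finding raises.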

**Resolution:** _(open)_

### Driver.Galaxy-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:848-861` |
| Status | Open |

**Description:** `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are active, the handle attached to a given `AlarmEventArgs` can change arbitrarily between transitions. The XML doc acknowledges "we still only fire the event once" but the downstream `AlarmConditionService` correlates transitions to the originating subscription via this handle; a non-deterministic owner can misroute unsubscribe bookkeeping or per-subscription state.

**Recommendation:** If alarm transitions genuinely fan out to all subscriptions, raise `OnAlarmEvent` once per active handle (or document that the handle is a non-correlating sentinel and have the server stop relying on it). If a single owner is required, make the choice deterministic (e.g. the earliest-created handle) and stable.

**Resolution:** _(open)_

### Driver.Galaxy-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `GalaxyDriver.cs:937-968` |
| Status | Open |

**Description:** `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteAsync` that passed its `ObjectDisposedException.ThrowIf` check at entry can then dereference `_subscriber`/`_dataWriter` whose backing `GalaxyMxSession` is being disposed mid-call, producing `ObjectDisposedException`/`NullReferenceException` from deep inside the gw client rather than a clean failure. `Dispose` also blocks the caller on `GetAwaiter().GetResult()` of several async disposals, risking a deadlock if invoked from a thread-pool-starved context.

**Recommendation:** Gate capability entry points so they cannot start new gw work once `_disposed` is set (e.g. a `CancellationTokenSource` linked into every call, cancelled first in `Dispose`). Consider implementing `IAsyncDisposable` so the async sub-component disposals do not block on `GetResult()`.

**Resolution:** _(open)_

### Driver.Galaxy-008

| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | `GalaxyDriver.cs:264-276`, `Runtime/EventPump.cs:97-103` |
| Status | Open |

**Description:** Even if Driver.Galaxy-001 is fixed and the supervisor's `ReplayAsync` runs, recovery is incomplete. `ReplayAsync` re-issues `SubscribeBulkAsync` for the tracked tags, but the `EventPump` background loop that consumes `StreamEvents` is not restarted. After a stream fault `EventPump.RunAsync` exits and `_channel` is completed; `EventPump.Start()` is a no-op (`if (_loop is not null) return`) because `_loop` is a completed-but-non-null task. So a replayed subscription has no consumer — values are subscribed on the gw but never reach `OnDataChange`. Additionally `ReplayAsync` never re-registers the new item handles the gw returns into `SubscriptionRegistry`; the old stale item handles remain, so even with a live pump the fan-out reverse-map would miss the post-reconnect handles.

**Recommendation:** On reconnect, dispose and recreate the `EventPump` (or make it restartable), and have `ReplayAsync` update `SubscriptionRegistry` bindings with the new item handles returned by the post-reconnect `SubscribeBulkAsync`. Add an integration/parity test that drops the stream mid-subscription and asserts `OnDataChange` resumes.

**Resolution:** _(open)_

### Driver.Galaxy-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `GalaxyDriver.cs:354-371` |
| Status | Open |

**Description:** `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); the discard masks that programming error. Separately, `StartDeployWatcher` builds an `_ownedRepositoryClient` purely for the watcher when discovery has not run yet — if `DiscoverAsync` later runs, `BuildDefaultHierarchySource` overwrites `_ownedRepositoryClient` with a second client, leaking the first (only the latest reference is disposed in `Dispose`).

**Recommendation:** Await `StartAsync` (it completes synchronously after scheduling) or at least observe its result. Reuse a single `GalaxyRepositoryClient` across the deploy watcher and the hierarchy source instead of letting `BuildDefaultHierarchySource` clobber the field: guard the assignment or build the client once in `InitializeAsync`.

**Resolution:** _(open)_

### Driver.Galaxy-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `GalaxyDriver.cs:311-341` |
| Status | Open |

**Description:** `ResolveApiKey` supports an `env:`/`file:` indirection and otherwise treats the config string as the literal API key ("Anything else — used as the literal API key. Convenient for dev"). `GalaxyGatewayOptions`' own XML doc claims "the API key never appears in cleartext config". The literal-key fallback silently permits a plaintext API key in the `DriverConfig` JSON column of the central config DB, contradicting the documented contract. There is no warning logged when the literal path is taken.

**Recommendation:** Log a startup warning when `ResolveApiKey` falls through to the literal arm so an operator who accidentally committed a cleartext key sees it, and update the `GalaxyGatewayOptions` doc comment so it no longer over-promises. Consider gating the literal arm behind an explicit `dev:`-style prefix so a cleartext key cannot be used by accident.
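
A minimal sketch of the warning-on-literal behaviour. The `env:` and `file:` prefixes come from the finding; the `ApiKeyResolver` class, its `warn` delegate (a stand-in for the project's logger), and the exact error handling are assumptions for illustration and do not mirror the real `ResolveApiKey` body.

```csharp
using System;
using System.IO;

public static class ApiKeyResolver
{
    // `warn` stands in for a logger call.
    public static string Resolve(string configured, Action<string> warn)
    {
        if (configured.StartsWith("env:", StringComparison.Ordinal))
        {
            var name = configured.Substring(4);
            return Environment.GetEnvironmentVariable(name)
                ?? throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }

        if (configured.StartsWith("file:", StringComparison.Ordinal))
            return File.ReadAllText(configured.Substring(5)).Trim();

        // Literal arm: still accepted, but no longer silent. The key itself is
        // deliberately kept out of the message.
        warn("Galaxy gateway API key is a cleartext literal in driver config; prefer env: or file:.");
        return configured;
    }
}
```

Requiring an explicit `dev:` prefix instead would turn the warning into a hard failure for unprefixed values, which is stricter but would break any existing config that relies on the literal form.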

**Resolution:** _(open)_

### Driver.Galaxy-011

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `GalaxyDriver.cs:411` |
| Status | Open |

**Description:** `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryFootprint` is consumed by the server's status/health surface to gauge cache-flush pressure; a constant `0` makes the Galaxy driver invisible to that mechanism, so a 50k-tag subscription set never registers as memory pressure and `FlushOptionalCachesAsync` (also a no-op) is never meaningfully triggered.

**Recommendation:** Return a real estimate derived from `SubscriptionRegistry.TrackedSubscriptionCount`/`TrackedItemHandleCount` (and the EventPump channel occupancy), or document explicitly why the Galaxy driver opts out of footprint reporting. Remove the stale "PR 4.4 sets this" comment.

**Resolution:** _(open)_

### Driver.Galaxy-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` |
| Status | Open |

**Description:** Several hot paths do avoidable linear or quadratic work. `SubscriptionRegistry.ResolveSubscribers` does `entry.Bindings.FirstOrDefault(b => b.ItemHandle == itemHandle)`, a linear scan of the whole binding list for every event dispatch; at 50k tags this is a 50k-element scan per event on the 1 Hz fan-out path. `GalaxyDriver.SubscribeAsync` and `ReadViaSubscribeOnceAsync` correlate results to references with `results.FirstOrDefault(r => string.Equals(...))` inside a `for` loop over all references, which is O(n^2) over the subscribe batch. `SubscriptionRegistry.Remove` rebuilds a `ConcurrentBag` from a LINQ filter on every unsubscribe.

**Recommendation:** Index `SubscriptionEntry` bindings by item handle (a `Dictionary<int, string>` per entry) so `ResolveSubscribers` is O(1) per subscriber. Project the `SubscribeResult` list into a `Dictionary<string, SubscribeResult>` (OrdinalIgnoreCase) once before the correlation loop. These matter on the documented 50k-tag soak path.
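
The correlation half of this recommendation might look like the sketch below. `SubscribeResult` is reduced to two hypothetical properties (`TagAddress`, `ItemHandle`), and `Correlation.Correlate` is an invented helper, so the real field names may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result shape; only the fields the correlation needs.
public sealed record SubscribeResult(string TagAddress, int ItemHandle);

public static class Correlation
{
    // Builds the lookup once (O(n)), then resolves each reference in O(1),
    // instead of a FirstOrDefault scan per reference (O(n^2) overall).
    public static List<SubscribeResult?> Correlate(
        IReadOnlyList<string> references, IReadOnlyList<SubscribeResult> results)
    {
        var byAddress = new Dictionary<string, SubscribeResult>(StringComparer.OrdinalIgnoreCase);
        foreach (var result in results)
            byAddress.TryAdd(result.TagAddress, result); // first match wins, like FirstOrDefault

        var correlated = new List<SubscribeResult?>(references.Count);
        foreach (var reference in references)
            correlated.Add(byAddress.TryGetValue(reference, out var hit) ? hit : null);
        return correlated;
    }
}
```

`TryAdd` keeps the first result for a duplicated address, which preserves the behaviour of the original `FirstOrDefault` lookup when the gateway returns the same tag twice.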

**Resolution:** _(open)_

### Driver.Galaxy-013

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` |
| Status | Open |

**Description:** Multiple doc comments are stale relative to the shipped code. `GalaxyDriver`'s class summary still describes the file as "the project skeleton with `IDriver` bodies that wire to a future `IGalaxyGatewayClient` abstraction. Capability interfaces ... land in PRs 4.1-4.7" and references the legacy `GalaxyProxyDriver` coexisting "until PR 7.2" — but PR 7.2 already deleted the legacy Galaxy projects and the capability interfaces are all implemented. `ReinitializeAsync` is still a stub ("for the skeleton we just refresh health") that ignores `driverConfigJson` entirely — a config reapply silently does nothing. `GalaxyReconnectOptions.ReplayOnSessionLost` is defined and documented but never read anywhere in the driver (`ReplayAsync` always replays).

**Recommendation:** Refresh the `GalaxyDriver` class and `ReinitializeAsync` doc comments to describe the shipped state, implement or explicitly reject `ReinitializeAsync` config reapply, and either honour `ReplayOnSessionLost` or remove it from `GalaxyReconnectOptions`.

**Resolution:** _(open)_

### Driver.Galaxy-014

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) |
| Status | Open |

**Description:** The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires `ReportTransportFailure` (Driver.Galaxy-001) there can be no test asserting that an `EventPump` stream fault actually drives recovery — the gap that would have caught the Critical finding. Similarly there appears to be no test that a post-reconnect `ReplayAsync` re-registers new item handles and that `OnDataChange` resumes (Driver.Galaxy-008). The `StatusCodeMap.FromMxStatus` `Success`-flag semantics (Driver.Galaxy-003) and the `DataTypeMap` Int64 gap (Driver.Galaxy-002) are also the kind of behaviour a focused unit test would pin.

**Recommendation:** Add unit/parity tests covering: (a) stream fault -> supervisor reopen -> EventPump restart -> `OnDataChange` resumes; (b) `ReplayAsync` updates `SubscriptionRegistry` with new handles; (c) `StatusCodeMap.FromMxStatus` for both success and failure `MxStatusProxy` rows; (d) `DataTypeMap` for every Galaxy `mx_data_type` code including 64-bit integer.

**Resolution:** _(open)_

294
code-reviews/Driver.Historian.Wonderware.Client/findings.md
Normal file
@@ -0,0 +1,294 @@
# Code Review — Driver.Historian.Wonderware.Client

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 10 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Historian.Wonderware.Client-001, Driver.Historian.Wonderware.Client-002 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Historian.Wonderware.Client-003, Driver.Historian.Wonderware.Client-004 |
| 4 | Error handling & resilience | Driver.Historian.Wonderware.Client-005, Driver.Historian.Wonderware.Client-006 |
| 5 | Security | Driver.Historian.Wonderware.Client-007, Driver.Historian.Wonderware.Client-008 |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | No issues found |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Historian.Wonderware.Client-009 |
| 10 | Documentation & comments | Driver.Historian.Wonderware.Client-010 |

## Findings

### Driver.Historian.Wonderware.Client-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `WonderwareHistorianClient.cs:98-113` |
| Status | Open |

**Description:** `ReadAtTimeAsync` violates the explicit `IHistorianDataSource.ReadAtTimeAsync`
contract. The interface XML doc states: the returned list MUST be the same length and
order as `timestampsUtc`, and gaps are returned as Bad-quality snapshots. The client passes
`reply.Samples` straight through `ToSnapshots` with no check that the sidecar returned
exactly one sample per requested timestamp, nor that the order matches. If the sidecar
returns fewer/more samples (e.g. it drops boundary-less timestamps), the OPC UA
HistoryReadAtTime service receives a result that the spec-compliant caller expects to
index positionally against the request timestamps, silently misaligning values with
timestamps. The matching `ReadAtTimeAsync_PreservesTimestampOrder` test only passes because
the fake echoes the request verbatim; it never exercises a short/reordered reply.

**Recommendation:** After receiving the reply, reconcile `reply.Samples` against
`timestampsUtc` by timestamp: build the result array at `timestampsUtc.Count`, fill matched
entries, and emit a Bad-quality (`0x80000000`) snapshot for any requested timestamp the
sidecar did not return. Alternatively assert `reply.Samples.Length == timestampsUtc.Count`
and fail loudly. Add a test where the fake returns a partial/reordered sample set.
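
The reconcile-by-timestamp option can be sketched as follows. `HistorianSample` is a hypothetical three-field reduction of the sidecar sample type and `ReadAtTimeReconciler` is an invented helper; the sketch also assumes the sidecar echoes requested timestamps with exact tick equality, which would need confirming against the real wire contract.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reduction of the sidecar sample / snapshot types.
public sealed record HistorianSample(DateTime TimestampUtc, object? Value, uint StatusCode);

public static class ReadAtTimeReconciler
{
    public const uint Bad = 0x80000000;

    // Returns exactly one entry per requested timestamp, in request order.
    // Timestamps the sidecar did not return become Bad-quality placeholders;
    // samples for timestamps that were never requested are ignored.
    public static IReadOnlyList<HistorianSample> Reconcile(
        IReadOnlyList<DateTime> timestampsUtc, IEnumerable<HistorianSample> replySamples)
    {
        var byTimestamp = new Dictionary<DateTime, HistorianSample>();
        foreach (var sample in replySamples)
            byTimestamp[sample.TimestampUtc] = sample;

        var result = new HistorianSample[timestampsUtc.Count];
        for (var i = 0; i < timestampsUtc.Count; i++)
        {
            var requested = timestampsUtc[i];
            result[i] = byTimestamp.TryGetValue(requested, out var hit)
                ? hit
                : new HistorianSample(requested, null, Bad);
        }
        return result;
    }
}
```

If the sidecar can round or shift timestamps, exact matching would turn every sample into a Bad placeholder, so in that case the strict length assertion may be the safer first step.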
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Historian.Wonderware.Client-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` |
| Status | Open |

**Description:** `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`.
`HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and
the drain worker is documented to move the event to the dead-letter table on
`PermanentFail`. The client maps the sidecar `WriteAlarmEventsReply.PerEventOk` bool array
to only `Ack`/`RetryPlease`, and the whole-call-failure and catch paths also only emit
`RetryPlease`. A malformed alarm event the sidecar can never persist (unrecoverable SDK
error on that specific row) therefore retries forever, blocking the head of the
store-and-forward queue and never dead-lettering. The wire contract
(`WriteAlarmEventsReply`) carries no per-event permanent/transient distinction, so the
limitation is structural.

**Recommendation:** Extend the wire contract: replace `bool[] PerEventOk` with a
per-event status enum (Ack/Retry/Permanent), coordinated as an additive change on both
sidecar and client per the Contracts.cs versioning rules, so unrecoverable events can be
dead-lettered. Until then, document explicitly that this writer never produces
`PermanentFail` and that poison events retry indefinitely.
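
One possible shape for the client-side mapping once the contract carries a per-event status. `PerEventStatus` and `OutcomeMapper` are hypothetical names, and the local `HistorianWriteOutcome` enum only mirrors the three states named above. In this sketch an unrecognized status value maps to `RetryPlease`, so a value introduced by a newer sidecar is retried rather than dropped.

```csharp
using System;

// Hypothetical wire enum replacing bool[] PerEventOk.
public enum PerEventStatus : byte { Ack = 0, Retry = 1, Permanent = 2 }

// Local mirror of the three outcomes described in the finding.
public enum HistorianWriteOutcome { Ack, RetryPlease, PermanentFail }

public static class OutcomeMapper
{
    public static HistorianWriteOutcome Map(PerEventStatus status) => status switch
    {
        PerEventStatus.Ack => HistorianWriteOutcome.Ack,
        PerEventStatus.Permanent => HistorianWriteOutcome.PermanentFail,
        // Retry, plus any value this client does not know yet.
        _ => HistorianWriteOutcome.RetryPlease,
    };
}
```

Defaulting unknown values to retry is a design choice of the sketch: it trades a possible stuck event for the guarantee that a contract mismatch alone does not dead-letter data.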

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |

**Description:** `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but
read inside `GetHealthSnapshot` under `_healthLock`, and every other counter
(`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under
`_healthLock`. The two synchronization mechanisms do not compose: an `Interlocked`
increment is not ordered against `lock`-protected reads, so a snapshot can observe a
`_totalQueries` value inconsistent with the lock-protected counters. The window is small
and the counters are advisory, but the mixed model is a latent hazard.

**Recommendation:** Pick one mechanism. Simplest: move the `_totalQueries++` into the
`_healthLock` block (a new `RecordQuery()` helper, or fold it into `RecordSuccess`/
`RecordFailure`) so all six health fields share a single lock.

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `WonderwareHistorianClient.cs:203-267` |
| Status | Open |

**Description:** A sidecar-reported failure is recorded in two non-atomic steps under
separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the
caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256),
decrementing `_totalSuccesses` and incrementing `_totalFailures`. Between those two locked
regions a concurrent `GetHealthSnapshot` can observe a transient state where the operation
counts as both a success and not-yet-a-failure (`_totalSuccesses` inflated,
`_consecutiveFailures` still 0). The undo-a-success/record-a-failure dance is also fragile:
if a future change adds an early return or exception between `RecordSuccess` and
`ThrowIfFailed`, the success is never reversed.

**Recommendation:** Classify the call once: do not call `RecordSuccess` until the
sidecar-level `Success` flag has been checked, or pass the reply success/error into a
single `RecordOutcome(bool transportOk, bool sidecarOk, string? error)` that updates all
counters under one lock acquisition.
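
A sketch of the single-classification approach, using the `RecordOutcome` signature suggested above. The `HealthCounters` class, its `Snapshot` method, and the `_lastError` field are assumptions made for illustration; only the counter names come from the findings. Every field is written and read under the same lock, so a snapshot cannot observe a half-recorded call.

```csharp
#nullable enable
using System;

public sealed class HealthCounters
{
    private readonly object _healthLock = new();
    private long _totalQueries;
    private long _totalSuccesses;
    private long _totalFailures;
    private int _consecutiveFailures;
    private string? _lastError;

    // Called once per call, after both the transport result and the sidecar Success flag are known.
    public void RecordOutcome(bool transportOk, bool sidecarOk, string? error)
    {
        lock (_healthLock)
        {
            _totalQueries++;
            if (transportOk && sidecarOk)
            {
                _totalSuccesses++;
                _consecutiveFailures = 0;
            }
            else
            {
                _totalFailures++;
                _consecutiveFailures++;
                _lastError = error;
            }
        }
    }

    // Reads all counters under the same lock that guards the writes.
    public (long Queries, long Successes, long Failures, int Consecutive, string? LastError) Snapshot()
    {
        lock (_healthLock)
        {
            return (_totalQueries, _totalSuccesses, _totalFailures, _consecutiveFailures, _lastError);
        }
    }
}
```

Because there is no separate success-then-reclassify step, an early return between the transport call and the sidecar check cannot leave a success counted by mistake. The same structure would also address the mixed `Interlocked`/`lock` model described in finding 003.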

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `Ipc/FrameReader.cs:31-32` |
| Status | Open |

**Description:** After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind
byte with the synchronous, blocking `_stream.ReadByte()` and ignores the
`CancellationToken`. On a `NamedPipeClientStream` with `PipeOptions.Asynchronous`, a
synchronous `ReadByte()` blocks the calling thread until a byte arrives or the pipe
closes. If the sidecar sends a length prefix and then stalls (slow/hung peer), the call
hangs on a thread-pool thread and the `EffectiveCallTimeout` linked token in
`PipeChannel.InvokeAsync` cannot interrupt it, because a synchronous read never observes
the token; cancellation only takes effect in awaited calls that are given it. This
defeats the documented cap on a single read/write call once connected and can wedge the
single-in-flight call gate.

**Recommendation:** Read the kind byte asynchronously and cancellably: extend the length
prefix read to 5 bytes, or do a second `ReadExactAsync(new byte[1], ct)`. This makes the
whole frame read honor the call-timeout token and matches the async style of the rest of
the reader.
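
A sketch of the 5-byte variant. It uses the framework's `Stream.ReadExactlyAsync` (.NET 7 and later) rather than the module's own `ReadExactAsync` helper, whose signature is not shown here. The `FrameHeaderReader` name and the big-endian length prefix are assumptions; the real byte order should be taken from Framing.cs.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FrameHeaderReader
{
    // Reads the 4-byte length prefix and the 1-byte kind in one cancellable read.
    public static async Task<(int Length, byte Kind)> ReadHeaderAsync(Stream stream, CancellationToken ct)
    {
        var header = new byte[5];

        // Throws EndOfStreamException if the peer closes before 5 bytes arrive,
        // and OperationCanceledException if ct fires while waiting.
        await stream.ReadExactlyAsync(header, ct).ConfigureAwait(false);

        // Assumption: big-endian length prefix.
        var length = BinaryPrimitives.ReadInt32BigEndian(header);
        return (length, header[4]);
    }
}
```

With this shape there is no point in the header read where the calling thread blocks without the call-timeout token being able to cancel it.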

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` |
| Status | Open |

**Description:** `PipeChannel.InvokeAsync` retries exactly once on transport failure and
otherwise propagates. The options expose `ReconnectInitialBackoff` and
`ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential
backoff between reconnects, but neither field is referenced anywhere in the module: the
single retry reconnects immediately with no delay. A sidecar that is restarting will
reject or refuse the immediate reconnect, the call fails, and there is no backoff before
the next caller-driven attempt. Either the backoff belongs in the channel and is missing,
or the options are dead config that misleads operators.

**Recommendation:** Either implement the documented exponential backoff in the reconnect
path, or remove the two unused option fields and their XML docs and state plainly that
retry/backoff is owned by the caller (the alarm drain worker / history router).
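
If the backoff is implemented in the channel, the delay calculation could be as small as the sketch below. `ReconnectBackoff.DelayFor` is a hypothetical helper; its `initial` and `max` parameters are meant to be fed from `ReconnectInitialBackoff` and `ReconnectMaxBackoff`. Jitter is left out for brevity.

```csharp
using System;

public static class ReconnectBackoff
{
    // Delay before reconnect attempt `attempt` (0-based): initial * 2^attempt, capped at max.
    public static TimeSpan DelayFor(int attempt, TimeSpan initial, TimeSpan max)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Clamp the exponent so the multiplication stays finite for very large attempt counts.
        var factor = Math.Pow(2, Math.Min(attempt, 30));
        var milliseconds = initial.TotalMilliseconds * factor;

        return milliseconds >= max.TotalMilliseconds
            ? max
            : TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

The reconnect path would then await `Task.Delay(ReconnectBackoff.DelayFor(attempt, initial, max), ct)` before each retry and reset `attempt` to zero after a successful Hello.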

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `WonderwareHistorianClient.cs:276` |
| Status | Open |

**Description:** `ToSnapshots` deserializes peer-supplied bytes with
`MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, typeless MessagePack
deserialization. The `object` overload resolves runtime types from the wire payload. The
client treats the pipe peer as untrusted elsewhere (16 MiB frame cap stated to protect
the receiver from a hostile or buggy peer, shared-secret Hello). Typeless deserialization
of bytes that originate from the historian database widens the trust surface. The
MessagePack standard resolver is primitive-only by default so the practical blast radius
is limited, but this is the pattern called out by the two suppressed MessagePack
advisories on this project (see finding 008).

**Recommendation:** Confirm the serializer options here use the default (non-typeless)
resolver and that no `TypelessContractlessStandardResolver` is in play; if so, document
that. Prefer round-tripping the value as a constrained set of known primitive types rather
than `object`, and validate `ValueBytes.Length` against a sane per-sample cap before
deserializing.

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Security |
| Location | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` |
| Status | Open |

**Description:** The csproj suppresses two NuGet audit advisories
(`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency
with no inline comment recording why the suppression is safe, who reviewed it, or when it
should be revisited. Blanket `NuGetAuditSuppress` entries silence the very signal that
would flag the next related CVE. Combined with finding 007 (typeless deserialization), an
unexplained MessagePack advisory suppression is a maintainability and audit-trail gap.

**Recommendation:** Add an XML comment next to each `NuGetAuditSuppress` stating the
advisory title, why it does not apply to this module's usage, and a revisit trigger. Track
a follow-up to upgrade `MessagePack` once a patched version is available so the
suppressions can be dropped.

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` |
| Status | Open |

**Description:** The suite covers happy paths, server-error, bad-secret, a single
reconnect and health counters, but several critical paths are untested:
(1) `ReadAtTimeAsync` with a partial/reordered sidecar reply, the contract-alignment case
from finding 001 (the existing test only echoes the request);
(2) the `WriteBatchAsync` catch branch, a transport/deserialization throw during a write,
which must return `RetryPlease` for every event;
(3) `InvokeAsync` second-attempt-also-fails path (the test only proves a successful
reconnect, never a reconnect that fails again and propagates);
(4) the `CallTimeout` path, no test asserts that a stalled sidecar produces a timed-out
`OperationCanceledException`;
(5) `MapAggregate` for `HistoryAggregateType.Total` throwing `NotSupportedException`;
(6) the `InvalidDataException` path when the sidecar replies with an unexpected
`MessageKind`. The byte-equality / round-trip parity test the Contracts.cs and Framing.cs
comments repeatedly promise is not present in this test project.

**Recommendation:** Add the missing-edge-case tests above. In particular add the
wire-parity test the source comments commit to: serialize each DTO with the client copy
and assert byte-equality against the sidecar `Driver.Historian.Wonderware.Ipc` copy, so a
silent `[Key]` drift between the two duplicated contract sets is caught at build time.

**Resolution:** _(open)_

### Driver.Historian.Wonderware.Client-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` |
| Status | Open |

**Description:** Two doc/behaviour mismatches.
(1) The `Dispose()` XML comment asserts the underlying channel's async cleanup is
non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync`
calls `ResetTransport()`, which invokes synchronous `Stream.Dispose()` on a
`NamedPipeClientStream`; pipe disposal can block briefly on OS handle teardown. The bridge
is safe (no deadlock, no captured context) but not strictly non-blocking; the comment
should say "does not deadlock".
(2) `GetHealthSnapshot` populates both `ProcessConnectionOpen` and `EventConnectionOpen`
from the same `_channel.IsConnected`, and `ActiveProcessNode`/`ActiveEventNode`/`Nodes`
are hard-coded to null/empty. A consumer reading `HistorianHealthSnapshot` would assume
two independent connections and per-node health; this client has a single channel and no
node concept. The collapse is reasonable but undocumented.

**Recommendation:** Reword the `Dispose()` comment to claim only deadlock-safety. Add a
short remark on `GetHealthSnapshot` explaining that the single-channel client maps both
connection flags to one transport and does not track per-node health.

**Resolution:** _(open)_
337
code-reviews/Driver.Historian.Wonderware/findings.md
Normal file
@@ -0,0 +1,337 @@
# Code Review — Driver.Historian.Wonderware

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness and logic bugs | Driver.Historian.Wonderware-001, -002, -003, -004 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency and thread safety | Driver.Historian.Wonderware-005 |
| 4 | Error handling and resilience | Driver.Historian.Wonderware-006, -007, -008 |
| 5 | Security | No issues found |
| 6 | Performance and resource management | Driver.Historian.Wonderware-009, -010 |
| 7 | Design-document adherence | Driver.Historian.Wonderware-011 |
| 8 | Code organization and conventions | No issues found |
| 9 | Testing coverage | Driver.Historian.Wonderware-012 |
| 10 | Documentation and comments | No issues found |

## Findings

### Driver.Historian.Wonderware-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness and logic bugs |
| Location | `Backend/SdkAlarmHistorianWriteBackend.cs:68`, `Backend/AahClientManagedAlarmEventWriter.cs:82-103` |
| Status | Open |

**Description:** `MalformedErrors` includes `HistorianAccessError.ErrorValue.WriteToReadOnlyFile`.
When `ClassifyOutcome` routes that code through `MapOutcome`, `isMalformedInput` is
`true`, so the per-event result becomes `PermanentFail` and the lmxopcua-side
store-and-forward sink dead-letters the alarm event. But `WriteToReadOnlyFile` is
not a property of the event payload; it is a connection-configuration fault (the
write backend opened the session without `ReadOnly` set to `false`, or the SDK
defaulted it). Treating it as permanent means a misconfigured or regressed
connection would silently and permanently discard every alarm event in the batch
instead of deferring them for retry once the connection is corrected.
Alarm-event historization is the module's whole purpose, so this is data loss.

**Recommendation:** Move `WriteToReadOnlyFile` out of `MalformedErrors`. It should
be treated as a connection-class error (abort the batch, reset the connection so
the reconnect path can re-open with `ReadOnly = false`) or at minimum as
`RetryPlease`, never `PermanentFail`.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness and logic bugs |
| Location | `Ipc/HistorianFrameHandler.cs:162`, `:181` |
| Status | Open |

**Description:** `HandleWriteAlarmEventsAsync` dereferences `req.Events.Length`
in both the `_alarmWriter is null` branch (line 162) and the catch block (line
181). MessagePack deserializes an absent or explicit-nil array field as a `null`
reference, not `Array.Empty<T>()`. A client (or a buggy/hostile peer) that sends
a `WriteAlarmEventsRequest` with a null `Events` array triggers a
`NullReferenceException`. Although `RunOneConnectionAsync` would log it and accept
the next connection, the request gets no reply frame, so the client's correlation-id
wait hangs until its own timeout. `AahClientManagedAlarmEventWriter.WriteAsync`
already null-guards `events`; the frame handler does not.

**Recommendation:** Normalize `req.Events` to `Array.Empty<AlarmHistorianEventDto>()`
immediately after deserialization (or guard each `.Length` access), consistent
with the null-tolerance the writer already has.
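
The normalization is a one-liner. The sketch below uses stripped-down stand-ins for the two DTOs (the real contract classes carry MessagePack attributes and more fields), and `RequestNormalizer` is a hypothetical helper name.

```csharp
using System;

// Minimal stand-ins for the contract types; only the null handling matters here.
public sealed class AlarmHistorianEventDto
{
    public string EventId { get; set; }
}

public sealed class WriteAlarmEventsRequest
{
    public AlarmHistorianEventDto[] Events { get; set; }
}

public static class RequestNormalizer
{
    // Returns the request's events, or an empty array when the peer sent nil or omitted the field.
    public static AlarmHistorianEventDto[] EventsOrEmpty(WriteAlarmEventsRequest req)
        => req.Events ?? Array.Empty<AlarmHistorianEventDto>();
}
```

Calling this once right after deserialization, and using the returned array everywhere the handler currently reads `req.Events.Length`, keeps both the null-writer branch and the catch block able to build a reply frame.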

**Resolution:** _(open)_

### Driver.Historian.Wonderware-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness and logic bugs |
| Location | `Backend/HistorianDataSource.cs:320-323`, `:457-460` |
| Status | Open |

**Description:** Raw and at-time reads decide whether a sample is a string or a
numeric with `if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)`.
The `result.Value == 0` clause is intended to distinguish a real numeric zero from
a string tag whose numeric projection is zero, but it is wrong in both directions:
a numeric (analog) tag that legitimately sampled the value `0` while the SDK also
populates a non-empty `StringValue` (some Historian builds populate the formatted
text on every result) is reported to OPC UA as a string, changing the variable's
data type mid-stream; conversely a string tag whose numeric projection is non-zero
is reported as a numeric. The historian SDK exposes the tag's actual data type,
which should drive the branch instead of a value heuristic.

**Recommendation:** Select string vs. numeric from the SDK result's tag-data-type
field rather than from `Value == 0`. If the type field is genuinely unavailable in
the bound SDK version, document the limitation explicitly and prefer numeric for
analog/integer tags.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness and logic bugs |
| Location | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` |
| Status | Open |

**Description:** `ToHistorianEvent` only assigns `historianEvent.Id` when
`Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID
(or is empty), `Id` stays `Guid.Empty` and the event is written to the historian
with an all-zeros identifier. Multiple such events collide on the same id, and the
write is still accepted (`outcomes[i] = Ack`) so neither side detects the problem.
The non-parseable case is never logged.

**Recommendation:** Log a warning when `EventId` fails to parse, and either reject
the event as `PermanentFail` (malformed input) or synthesize a fresh
`Guid.NewGuid()` so each event still gets a unique id.
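
A small sketch of the parse check, with a hypothetical `EventIdParser` helper. Treating a well-formed but all-zeros GUID as invalid is an extra assumption of this sketch, made because it would collide in the same way as the unparsed case.

```csharp
using System;

public static class EventIdParser
{
    // False when the id is null, empty, not a GUID, or the all-zeros GUID.
    // The caller decides whether to log and reject, or to substitute Guid.NewGuid().
    public static bool TryGetEventId(string eventId, out Guid id)
        => Guid.TryParse(eventId, out id) && id != Guid.Empty;
}
```

In `ToHistorianEvent` the `false` branch is where the warning would be logged before the event is either marked `PermanentFail` or given a fresh id.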

**Resolution:** _(open)_

### Driver.Historian.Wonderware-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency and thread safety |
| Location | `Backend/HistorianDataSource.cs:124`, `:126-127` |
| Status | Open |

**Description:** `GetHealthSnapshot` reads `_activeProcessNode` and
`_activeEventNode` inside `_healthLock`, but those two fields are written under
`_connectionLock` / `_eventConnectionLock` (lines 183, 243, 209-210, 266-269), which
is a different lock. The health-counter fields are correctly `_healthLock`-protected,
but the active-node strings are published under one lock and read under another,
so the snapshot can observe a stale active-node value relative to the
connection-open booleans. This is a diagnostics-only path, so impact is limited to
a momentarily inconsistent health snapshot.

**Recommendation:** Pick one lock for the active-node strings (publish them under
`_healthLock` on every connection state change, or read them under the connection
lock), so the snapshot is internally consistent.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling and resilience |
| Location | `Ipc/PipeServer.cs:120-128` |
| Status | Open |

**Description:** `RunAsync` re-accepts connections in a `while` loop. If
`RunOneConnectionAsync` throws synchronously and immediately on every iteration
(for example `new NamedPipeServerStream(...)` fails because the pipe name is
already in use, or `PipeAcl.Create` throws), the loop spins with no delay and no
backoff, pegging a CPU core and flooding the rolling log file with one `Error`
line per iteration. There is no circuit-breaker or retry cap.

**Recommendation:** Add a short delay (exponential backoff capped at a few
seconds) before re-accepting after a caught exception, and consider a
consecutive-failure threshold that escalates to a fatal exit so the supervisor can
restart the sidecar cleanly.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling and resilience |
| Location | `Ipc/PipeServer.cs:70-75` |
| Status | Open |

**Description:** When `VerifyCaller` rejects the peer SID, the server logs the
reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The
shared-secret-mismatch and major-version-mismatch paths below it both send a
rejecting `HelloAck` so the client learns why. A client that fails the SID check
instead sees an abrupt disconnect and must rely on its own read timeout, with no
diagnostic on the client side. The asymmetry also makes the SID-rejection path
harder to test from the client.

**Recommendation:** Send a `HelloAck` with `Accepted = false` and a
`caller-sid-mismatch` reject reason before disconnecting, consistent with the
other two rejection paths.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling and resilience |
| Location | `Backend/HistorianDataSource.cs:301-307`, `:374-380` |
| Status | Open |

**Description:** When `query.StartQuery` returns `false`, `ReadRawAsync` and
`ReadAggregateAsync` call `HandleConnectionError()` and return an empty result
list. A failed `StartQuery` is not necessarily a connection failure (it can be a
bad tag name, an invalid time range, or an unsupported aggregate), yet the code
unconditionally tears down the shared SDK connection. A burst of queries with one
bad tag name therefore repeatedly drops and re-opens the (relatively expensive)
historian connection. It also marks the cluster node failed, because
`HandleConnectionError` calls `_picker.MarkFailed`, which can push an otherwise
healthy node into cooldown. The empty-list result is also indistinguishable from
"no data in range" to the caller, because the `Success` flag on the reply will
still be `true`.

**Recommendation:** Inspect `error.ErrorCode` to distinguish connection-class
failures (reset and mark node failed) from query-class failures (leave the
connection intact, surface the error). Consider returning a failed reply
(`Success = false`) for query-class `StartQuery` failures so the client does not
treat an SDK error as an empty history.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance and resource management |
| Location | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` |
| Status | Open |

**Description:** `ReadAggregateAsync` drains `query.MoveNext` into `results` with
no upper bound, unlike `ReadRawAsync`, which honours `maxValues` /
`MaxValuesPerRead` and breaks. `ReadProcessedRequest` carries no max-buckets field.
A processed read over a wide time range with a small `IntervalMs` produces an
unbounded `HistorianAggregateSample` list; the handler then serializes it into
`ReadProcessedReply`. If the serialized body exceeds the 16 MiB
`Framing.MaxFrameBodyBytes` cap, `FrameWriter.WriteAsync` throws and the entire
reply is lost (the client's correlation wait hangs), and before that point the
sidecar holds the whole result set in memory.

**Recommendation:** Apply `_config.MaxValuesPerRead` as a bucket cap in
`ReadAggregateAsync` (mirroring the raw path), and/or add a `MaxBuckets` field to
`ReadProcessedRequest`. Reject or truncate result sets that would exceed the frame
cap with an explicit error reply rather than letting `WriteAsync` throw.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-010

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance and resource management |
| Location | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) |
| Status | Open |

**Description:** `HistorianConfiguration.RequestTimeoutSeconds` is documented as
the "outer safety timeout applied to sync-over-async Historian operations" and is
copied around (`SdkAlarmHistorianWriteBackend.CloneConfigWithServerName:346`), but
it is never read or enforced anywhere. The `HistorianDataSource` read methods are
declared `Task`-returning but execute the SDK calls synchronously on the caller
thread and only check the `CancellationToken` between `MoveNext` iterations. There
is no outer timeout: a hung `StartQuery` or a slow `MoveNext` blocks the single
pipe-server connection thread indefinitely (the connect path has its own poll
timeout, but the query path does not). The documented safety net does not exist.

**Recommendation:** Either wire `RequestTimeoutSeconds` into the read paths (a
`CancellationTokenSource.CancelAfter` linked into `ct`, or run the SDK call on a
worker with a bounded wait), or remove the property and its XML doc so the code
does not advertise a guarantee it does not provide.
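
A sketch of the "worker with a bounded wait" option, using a hypothetical `BoundedCall.Run` helper. It only bounds how long the caller waits: a blocking SDK call cannot be aborted this way, so after a timeout the worker keeps running until the SDK returns. A real implementation would presumably also reset the shared historian connection at that point.

```csharp
using System;
using System.Threading.Tasks;

public static class BoundedCall
{
    // Runs a blocking call on a thread-pool worker and stops waiting after `timeout`.
    public static T Run<T>(Func<T> blockingCall, TimeSpan timeout)
    {
        var task = Task.Run(blockingCall);

        // WaitAny returns -1 on timeout and does not throw if the task faulted.
        if (Task.WaitAny(new Task[] { task }, timeout) == -1)
            throw new TimeoutException($"Historian call exceeded {timeout.TotalSeconds:0.###} s");

        // Rethrows the original exception (not an AggregateException) if the call failed.
        return task.GetAwaiter().GetResult();
    }
}
```

The read paths could wrap `StartQuery` and each `MoveNext` batch in `BoundedCall.Run(..., TimeSpan.FromSeconds(_config.RequestTimeoutSeconds))`, which would give the documented setting an actual effect.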

**Resolution:** _(open)_

### Driver.Historian.Wonderware-011

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` |
| Status | Open |

**Description:** Several XML doc comments reference the retired v1 architecture as
if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the
Host returns these across the IPC boundary as `GalaxyDataValue`", "Populated from
... the Proxy DriverInstance.DriverConfig". Per `CLAUDE.md`, PR 7.2 retired the
`Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects, and this driver is now a
standalone sidecar whose client is the .NET 10 `WonderwareHistorianClient`
(`docs/AlarmTracking.md`). The comments are stale and misdescribe the current data
flow, which contradicts the "no stale design docs/comments" expectation in the
review checklist.

**Recommendation:** Update the doc comments to describe the current sidecar/IPC
architecture (sidecar talking to `WonderwareHistorianClient` over the named pipe),
dropping the `Galaxy.Host` / `Proxy` / `GalaxyDataValue` references.

**Resolution:** _(open)_

### Driver.Historian.Wonderware-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` |
| Status | Open |

**Description:** The unit-test suite covers `HistorianQualityMapper`,
`HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`,
`AahClientManagedAlarmEventWriter`, the IPC round trip, and `Program` alarm-writer
wiring. `HistorianDataSource` itself, the largest and most logic-dense file in
the module, has no direct unit coverage of its read paths, despite
`IHistorianConnectionFactory` being explicitly extracted "so tests can inject
fakes that control connection success, failure, and timeout behavior". The
connect-failover-and-cooldown loop (`ConnectToAnyHealthyNode`), the mid-query
connection-reset path (`HandleConnectionError`), the string-vs-numeric value
selection (see -003), the at-time per-timestamp loop, and `ExtractAggregateValue`
column dispatch are all untested. A stale empty test directory
(`tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/`, containing only
`bin/obj`) also sits alongside the live `tests/Drivers/...` project and should be
removed to avoid confusion.

**Recommendation:** Add `HistorianDataSource` tests driving an
`IHistorianConnectionFactory` fake (covering failover, cooldown, mid-query reset,
cancellation, and the value-type selection), and delete the stale empty
`tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` directory.

**Resolution:** _(open)_
238
code-reviews/Driver.Modbus.Addressing/findings.md
Normal file
@@ -0,0 +1,238 @@
# Code Review — Driver.Modbus.Addressing

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 9 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Modbus.Addressing-001, -002, -003, -004 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | No issues found |
| 4 | Error handling & resilience | Driver.Modbus.Addressing-005, -006 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Driver.Modbus.Addressing-001, -007 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Modbus.Addressing-008 |
| 10 | Documentation & comments | Driver.Modbus.Addressing-009 |

## Findings

### Driver.Modbus.Addressing-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:230-235`, `DirectLogicAddress.cs:66-73` |
| Status | Open |

**Description:** The DL205 family-native branch routes every V-prefixed address through
`DirectLogicAddress.UserVMemoryToPdu`, which is a plain octal-to-decimal conversion. DL205/DL260
system V-memory (V40400 and up) is NOT a simple octal decode: per `docs/v2/dl205.md` section
V-Memory, V40400 must map to Modbus PDU 0x2100 (decimal 8448) on a factory-mode ECOM module.
The parser instead octal-decodes V40400 to decimal 16640 (0x4100), the wrong register. The
`DirectLogicAddress.SystemVMemoryToPdu` / `SystemVMemoryBasePdu` helper that exists to do this
correctly is never called by the parser; it is dead code from the parser's point of view. A tag
spreadsheet that addresses any DL system register through the grammar string silently reads and
writes the wrong PLC memory. The companion test `ModbusFamilyParserTests.cs:20` bakes the wrong
value (V40400 to 16640) into a passing assertion, so the regression is locked in.

**Recommendation:** Make the DL205 V branch detect the system bank (octal address >= 40400) and
route it through `SystemVMemoryToPdu`, or explicitly reject system V-memory in the grammar string
with a diagnostic pointing at the structured tag form. Either way, fix the V40400 test to assert
the corrected mapping.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
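The first recommendation in Driver.Modbus.Addressing-001 could be sketched roughly as below. This
is a hedged illustration, not the module's code: `Dl205VMemorySketch` and `VAddressToPdu` are
hypothetical names, and the linear offset applied above V40400 is an assumption. The finding only
states the V40400 anchor point (PDU 0x2100), so the remaining offsets should be checked against
`docs/v2/dl205.md`.

```csharp
using System;

// Hypothetical sketch: the names and the linear system-bank offset are assumptions.
public static class Dl205VMemorySketch
{
    // Octal 40400 is decimal 16640 (0x4100): the first system V-memory address.
    private const int SystemBankStart = 16640;

    // PDU address that V40400 maps to on a factory-mode ECOM module, per the finding.
    private const int SystemBankBasePdu = 0x2100;

    public static ushort VAddressToPdu(string address)
    {
        if (string.IsNullOrEmpty(address) || address.Length < 2
            || char.ToUpperInvariant(address[0]) != 'V')
            throw new ArgumentException("Expected 'V' followed by octal digits.", nameof(address));

        // Decode the octal digits by hand so that 8 and 9 are rejected explicitly.
        int decoded = 0;
        for (int i = 1; i < address.Length; i++)
        {
            char c = address[i];
            if (c < '0' || c > '7')
                throw new ArgumentException($"'{address}' contains non-octal digit '{c}'.", nameof(address));
            decoded = checked(decoded * 8 + (c - '0'));
        }

        // User V-memory is a plain octal decode; the system bank is rebased (assumed linear).
        int pdu = decoded >= SystemBankStart
            ? SystemBankBasePdu + (decoded - SystemBankStart)
            : decoded;

        if (pdu > ushort.MaxValue)
            throw new OverflowException($"'{address}' maps outside the 16-bit PDU address space.");
        return (ushort)pdu;
    }
}
```

Under these assumptions `VAddressToPdu("V2000")` yields 1024, and `VAddressToPdu("V40400")` yields
0x2100 (8448) rather than the plain octal decode 16640.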
### Driver.Modbus.Addressing-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:86-94` |
| Status | Open |

**Description:** In the 3-field disambiguation, an empty 3rd field (`40001:F:`) reaches
`parts[2].All(char.IsDigit)`. `Enumerable.All` returns true for an empty sequence, so the empty
string is classified as a valid-shaped array count, assigned to `countPart`, then silently dropped
by the later `string.IsNullOrEmpty(countPart)` guard. The result is that `40001:F:` parses
successfully as a plain scalar with a dangling empty field rather than being rejected as
malformed. The 4-field form `40001:F::` has the analogous effect. A user who mistypes a trailing
colon gets no diagnostic.

**Recommendation:** Reject an empty 3rd field explicitly, or guard the `All(char.IsDigit)` branch
with `parts[2].Length > 0`.

**Resolution:** _(open)_

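The vacuous-truth behaviour behind Driver.Modbus.Addressing-002 can be shown in isolation. The
class and method names below are hypothetical; only `Enumerable.All` and `char.IsDigit` come from
the finding.

```csharp
using System.Linq;

// Hypothetical sketch; not the parser's actual code.
public static class ArrayCountFieldSketch
{
    // Mirrors the shape check described in the finding: All() is true for an empty sequence.
    public static bool LooksLikeCountUnguarded(string field) => field.All(char.IsDigit);

    // Guarded variant: an empty field is never treated as a count.
    public static bool LooksLikeCount(string field) => field.Length > 0 && field.All(char.IsDigit);
}
```

`LooksLikeCountUnguarded("")` returns `true`, which is how the empty third field slips through,
while `LooksLikeCount("")` returns `false` and `LooksLikeCount("10")` returns `true`.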
### Driver.Modbus.Addressing-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` |
| Status | Open |

**Description:** `LooksLikeByteOrderToken` classifies any 4-letter token as a byte-order token.
A 3-field address whose 3rd field is a 4-letter type-like token (e.g. `40001:S:BOOL`) is routed
into `TryParseByteOrder`, producing the misleading diagnostic "Unknown byte order BOOL" instead
of telling the user the type belongs in field 2. The type code BOOL is exactly 4 letters and
could only ever be intended as a type; the shape heuristic cannot tell a mistyped type from a
byte order, so the diagnostic actively misdirects.

**Recommendation:** When `TryParseByteOrder` fails on a 4-letter token in the 3-field form, widen
the error message to mention that field 3 is a byte order and field 2 is the type, or attempt a
type-parse fallback before emitting the byte-order error.

**Resolution:** _(open)_

### Driver.Modbus.Addressing-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusAddressParser.cs:182-194` |
| Status | Open |

**Description:** The bit suffix is stripped using `text.IndexOf('.')`, which finds the first dot.
An input such as `40001.5.3` produces a bit text of "5.3", rejected by `byte.TryParse` with the
generic "Bit index must be 0..15" message. A Modicon-style decimal-point typo like `400.01` is
silently treated as region/offset 400 plus bit 01; 400 then fails Modicon length validation, so
the surfaced error is the Modicon length diagnostic rather than a bit-index diagnostic, because
the bit was parsed first and 01 is a valid bit. The dot-handling assumes a single dot without
asserting it, and the diagnostics for these malformed inputs are inconsistent.

**Recommendation:** Use `LastIndexOf('.')` or assert exactly one dot, and validate that the
region/offset segment is non-empty and dot-free after the strip so malformed inputs get a precise
diagnostic.

**Resolution:** _(open)_

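A minimal sketch of the "assert exactly one dot" option from Driver.Modbus.Addressing-004 follows.
`BitSuffixSketch.TrySplit` and its error strings are hypothetical, not the parser's actual
members; the 0..15 bit range is taken from the diagnostic quoted in the finding.

```csharp
using System.Globalization;

// Hypothetical sketch of a stricter bit-suffix split; not the parser's actual code.
public static class BitSuffixSketch
{
    public static bool TrySplit(string text, out string address, out byte? bit, out string error)
    {
        address = text;
        bit = null;
        error = "";

        int dot = text.IndexOf('.');
        if (dot < 0)
            return true; // no bit suffix present

        if (dot != text.LastIndexOf('.'))
        {
            error = $"'{text}' contains more than one '.'; expected <address>.<bit>.";
            return false;
        }

        address = text.Substring(0, dot);
        string bitText = text.Substring(dot + 1);

        if (address.Length == 0)
        {
            error = $"'{text}' has no address before the '.'.";
            return false;
        }

        // NumberStyles.None accepts digits only (no sign, whitespace, or separators).
        if (!byte.TryParse(bitText, NumberStyles.None, CultureInfo.InvariantCulture, out byte parsed)
            || parsed > 15)
        {
            error = $"Bit index '{bitText}' must be 0..15.";
            return false;
        }

        bit = parsed;
        return true;
    }
}
```

With this shape `40001.5.3` fails with a multiple-dot diagnostic instead of a bit-range one. An
input like `400.01` still splits into address `400` and bit 1, so rejecting it remains the job of
the region/offset validation that runs afterwards.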
### Driver.Modbus.Addressing-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ModbusAddressParser.cs:200-213` |
| Status | Open |

**Description:** `TryParseRegionAndOffset` tries family-native, then mnemonic, then Modicon. When
all three fail it returns false with whatever error the Modicon parser last wrote (comment: "the
Modicon error is the more specific diagnostic"). For a non-Generic family this is misleading:
`TryParseFamilyNative` returns false with `error` left null for any address that does not start
with a recognised family prefix, and even for recognised prefixes it only sets `error` inside the
catch. The subsequent mnemonic and Modicon attempts overwrite `error`. Net effect: a clearly
family-native-shaped input that fails deep in the family helper can still surface a generic
Modicon "must be 5 or 6 digits" error, hiding the real cause (e.g. "contains non-octal digit").

**Recommendation:** When a non-Generic family is configured and the input matches a family
prefix, prefer and preserve the family-native error rather than letting the Modicon fallback
overwrite it.

**Resolution:** _(open)_

### Driver.Modbus.Addressing-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `ModbusAddressParser.cs:297-301` |
| Status | Open |

**Description:** `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`.
The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from
`ArgumentException`), so today it is correct. But the parser's intent is to convert helper
exceptions into structured errors; any future helper change that throws a different exception type
(e.g. a `FormatException` from a `ushort.Parse` swap) would escape as an unhandled exception out
of a `TryParse` method, violating the try-parse contract that config-bind hot-path callers
depend on.

**Recommendation:** Either document the exact exception contract of the helpers and keep the
narrow catch, or broaden to a general catch-all that records the message; a try-parse method
should never throw.

**Resolution:** _(open)_

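The "broaden the catch" option in Driver.Modbus.Addressing-006 might look like the sketch below.
`TryParseContractSketch.TryTranslate` is a hypothetical wrapper, and the `helper` delegate merely
stands in for the family-native translation helpers; the real method signatures are not shown in
this review.

```csharp
using System;

// Hypothetical wrapper; the helper delegate stands in for the family-native helpers.
public static class TryParseContractSketch
{
    public static bool TryTranslate(Func<string, ushort> helper, string text,
                                    out ushort pdu, out string error)
    {
        try
        {
            pdu = helper(text);
            error = "";
            return true;
        }
        catch (Exception ex)
        {
            // Broad catch: whatever the helper throws becomes a structured error.
            pdu = 0;
            error = $"{ex.GetType().Name}: {ex.Message}";
            return false;
        }
    }
}
```

For example, `TryTranslate(s => ushort.Parse(s), "12x", out _, out var error)` returns `false`
with an error that starts with `FormatException`, instead of letting the exception escape. The
trade-off is that a broad catch can also mask programming errors, which is why the finding offers
documenting the narrow contract as the alternative.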
### Driver.Modbus.Addressing-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings |
| Status | Open |

**Description:** `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this
assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress`
has no field for it and `ModbusAddressParser` never produces or consumes it. The `STR<n>` grammar
form cannot express the DL205 string byte order described in `docs/v2/dl205.md`: a DL205 string
tag parsed from the grammar string always carries the default order. The enum is effectively
unreachable from the parser, so the grammar cannot represent a known, documented device quirk.

**Recommendation:** Either add a `StringByteOrder` field to `ParsedModbusAddress` plus a grammar
token for it, or document explicitly that DL205 string byte order is only configurable via the
structured tag form and is intentionally out of grammar scope.

**Resolution:** _(open)_

### Driver.Modbus.Addressing-008

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` |
| Status | Open |

**Description:** Several edge cases of the address arithmetic are untested or asserted wrong:
(a) DL205 system V-memory mapping is tested only with the incorrect expected value
(`ModbusFamilyParserTests.cs:20`, see finding -001); (b) there is no test for `UserVMemoryToPdu`
or `AddOctalOffset` overflow (V200000, C200000) hitting the `OverflowException` path; (c) no test
for the empty-trailing-field cases of finding -002; (d) `MelsecAddress.ParseHex` overflow and
`DRegisterToHolding` / `MRelayToCoil` bank-base overflow are untested; (e) no test that
`SystemVMemoryToPdu` is exercised at all. The address-arithmetic overflow and off-by-one paths
are exactly the high-risk surface this module owns, and they are the least covered.

**Recommendation:** Add overflow/boundary tests for every PDU/coil/discrete translation helper
and for the parser count/bit/field edge cases. Correct the V40400 assertion as part of fixing
finding -001.

**Resolution:** _(open)_

### Driver.Modbus.Addressing-009

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` |
| Status | Open |

**Description:** The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The
remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6
(400001..465536-shaped) implies the leading digit is always 4, but the parser accepts leading
0/1/3 too; for example, a 5-digit coil is 00001..09999, not 40001..49999. Separately, the line-106
comment says the 5-digit form caps at 9999 by construction while the adjacent code path applies
the same `> 65536` check to both forms; the comment describes an invariant the code does not rely
on.

**Recommendation:** Reword the range examples to cover all four region digits and drop the
caps-at-9999 aside or restate it as a precise statement about trailing-digit count.

**Resolution:** _(open)_

234
code-reviews/Driver.Modbus.Cli/findings.md
Normal file
@@ -0,0 +1,234 @@
# Code Review — Driver.Modbus.Cli

| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 8 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Modbus.Cli-001, Driver.Modbus.Cli-002, Driver.Modbus.Cli-003 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Modbus.Cli-004 |
| 4 | Error handling & resilience | Driver.Modbus.Cli-005, Driver.Modbus.Cli-006 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Driver.Modbus.Cli-007 |
| 8 | Code organization & conventions | No issues found |
| 9 | Testing coverage | Driver.Modbus.Cli-008 |
| 10 | Documentation & comments | No issues found |

## Findings

### Driver.Modbus.Cli-001

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` |
| Status | Open |

**Description:** `SubscribeCommand` synthesises its `ModbusTagDefinition` with only
`Name`, `Region`, `Address`, `DataType`, `Writable`, and `ByteOrder`; it never
exposes or passes `--bit-index`, `--string-length`, or `--string-byte-order`.
A user running `subscribe -t BitInRegister` always watches bit 0 regardless of
intent, and `subscribe -t String` runs with `StringLength = 0`. The doc
(`docs/Driver.Modbus.Cli.md`) lists `BitInRegister`, `String`, `Bcd16`, `Bcd32`
in the `subscribe` `--type` help text, so these types are advertised as supported
but cannot be used correctly. `read` and `write` both expose all three flags;
`subscribe` is the odd one out.

**Recommendation:** Add `--bit-index`, `--string-length`, and `--string-byte-order`
options to `SubscribeCommand` (mirroring `ReadCommand`) and pass them into the
`ModbusTagDefinition`, or trim the `--type` help text to the types `subscribe`
actually supports and reject `BitInRegister` / `String` at command entry with a
clear message.

**Resolution:** _(open)_

### Driver.Modbus.Cli-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` |
| Status | Open |

**Description:** `WriteCommand` rejects read-only regions (`DiscreteInputs` /
`InputRegisters`) but does not validate that `--type` is meaningful for the
`Coils` region. `write -r Coils -a 5 -t UInt16 -v 42` builds a `Coils` tag with
`DataType = UInt16`; the value parses to a boxed `ushort`, and the driver's
`WriteOneAsync` coil branch calls `Convert.ToBoolean(value)` which succeeds for
any non-zero `ushort` (yields `true`). The write silently lands as a coil ON with
no diagnostic, even though the operator asked for a 16-bit register write. A coil
region only supports `Bool`-style boolean values.

**Recommendation:** After the read-only-region check, reject `Region == Coils`
combined with any non-boolean `--type` (anything other than `Bool`), with a
message explaining coils carry a single bit.

**Resolution:** _(open)_

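The guard recommended in Driver.Modbus.Cli-002 could be sketched as follows. The enums and the
`CoilWriteGuardSketch` class are hypothetical stand-ins for the CLI's own types, and a plain
`ArgumentException` is used only to keep the sketch free of external dependencies; the real
command would presumably surface the failure through its existing CLI error mechanism.

```csharp
using System;

// Hypothetical stand-ins for the CLI's region and data-type enums.
public enum SketchRegion { Coils, DiscreteInputs, HoldingRegisters, InputRegisters }
public enum SketchDataType { Bool, Int16, UInt16, Float32 }

public static class CoilWriteGuardSketch
{
    // Throws when a multi-bit type is paired with the single-bit Coils region.
    public static void EnsureTypeFitsRegion(SketchRegion region, SketchDataType type)
    {
        if (region == SketchRegion.Coils && type != SketchDataType.Bool)
            throw new ArgumentException(
                $"Region Coils carries a single bit, so --type must be Bool (got {type}).");
    }
}
```

The guard matters because the silent conversion is real: `Convert.ToBoolean((object)(ushort)42)`
evaluates to `true`, so without the check a `UInt16` value of 42 lands as a coil ON.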
### Driver.Modbus.Cli-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` |
| Status | Open |

**Description:** `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value,
including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts
0-255 even though the option description and `docs/Driver.Modbus.Cli.md` both say
the valid range is 1-247 (0 is the Modbus broadcast address; 248-255 are
reserved). A negative `--timeout-ms` becomes a negative `TimeSpan` passed straight
into the driver; an out-of-range `--port` fails later with an opaque socket
error. None of these are validated at parse time.

**Recommendation:** Validate `Port` (1-65535), `TimeoutMs` (greater than 0), and
`UnitId` (1-247) at the top of each command's `ExecuteAsync` (or in
`ModbusCommandBase`), throwing `CliFx.Exceptions.CommandException` with a clear
message, consistent with how `WriteCommand` already rejects bad regions and
boolean strings.

**Resolution:** _(open)_

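The range checks recommended in Driver.Modbus.Cli-003 are simple enough to sketch directly.
`ConnectionOptionGuardSketch` is a hypothetical helper, and it throws
`ArgumentOutOfRangeException` only so the sketch has no package dependency; per the
recommendation, the CLI itself would throw `CliFx.Exceptions.CommandException` instead.

```csharp
using System;

// Hypothetical helper; the real CLI would throw CliFx's CommandException instead.
public static class ConnectionOptionGuardSketch
{
    public static void Validate(int port, int timeoutMs, byte unitId)
    {
        if (port < 1 || port > 65535)
            throw new ArgumentOutOfRangeException(nameof(port), port, "Port must be 1-65535.");

        if (timeoutMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(timeoutMs), timeoutMs,
                "TimeoutMs must be greater than 0.");

        // 0 is the Modbus broadcast address and 248-255 are reserved.
        if (unitId < 1 || unitId > 247)
            throw new ArgumentOutOfRangeException(nameof(unitId), unitId, "UnitId must be 1-247.");
    }
}
```

Calling `Validate` once at the top of `ExecuteAsync` (or from `ModbusCommandBase`) would turn a
late, opaque socket error into an immediate message that names the offending option.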
### Driver.Modbus.Cli-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` |
| Status | Open |

**Description:** The `OnDataChange` handler is invoked from the driver's
`PollGroupEngine` background thread and calls `console.Output.WriteLine`
synchronously. An exception thrown inside this handler (e.g. an `IOException` on a
redirected or closed stdout) propagates on the poll-engine thread and is not
caught, so it could fault the background loop. For a long-running `subscribe` this
is a real, if low-probability, crash path. Output lines are also written without
any synchronization, so overlapping poll ticks could interleave partial lines.

**Recommendation:** Wrap the handler body in a `try/catch` that swallows or logs
write failures so a transient console-write error cannot tear down the poll loop.
A single `lock` around the write also removes the interleave risk.

**Resolution:** _(open)_

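Both parts of the Driver.Modbus.Cli-004 recommendation (a `try/catch` plus a single `lock`) fit in
a small wrapper. `SafeLineWriterSketch` is a hypothetical class written against a plain
`TextWriter`; how it would be wired to `console.Output` in `SubscribeCommand` is not shown in this
review and is left as an assumption.

```csharp
using System;
using System.IO;

// Hypothetical wrapper around whatever TextWriter the command writes to.
public sealed class SafeLineWriterSketch
{
    private readonly object _gate = new object();
    private readonly TextWriter _output;

    public SafeLineWriterSketch(TextWriter output)
    {
        _output = output ?? throw new ArgumentNullException(nameof(output));
    }

    // Number of writes that failed; only updated while holding the lock.
    public int FailedWrites { get; private set; }

    public void WriteLine(string line)
    {
        lock (_gate) // one writer at a time, so lines cannot interleave
        {
            try
            {
                _output.WriteLine(line);
            }
            catch (IOException)
            {
                FailedWrites++; // swallow: a console failure must not fault the poll loop
            }
            catch (ObjectDisposedException)
            {
                FailedWrites++;
            }
        }
    }
}
```

Counting failures rather than discarding them silently leaves room to report them on exit; a
logger call would serve the same purpose.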
### Driver.Modbus.Cli-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` |
| Status | Open |

**Description:** All three commands call `ConfigureLogging()` then
`console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C
before `InitializeAsync` completes, the resulting `OperationCanceledException`
propagates out of `ExecuteAsync` unhandled. CliFx renders any unhandled exception
that is not a `CommandException` as a full stack trace, which is noisy for what is
just a user-cancelled run. `SubscribeCommand` correctly catches
`OperationCanceledException` around its `Task.Delay`, but the connect/read/write
commands do not catch it around their driver calls.

**Recommendation:** Either let cancellation surface a clean message (catch
`OperationCanceledException` in each command and exit quietly) or document that
the noisy trace on Ctrl+C-during-connect is acceptable. Consistency with
`SubscribeCommand`'s handling is the cleaner choice.

**Resolution:** _(open)_

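The "exit quietly" option in Driver.Modbus.Cli-005 can be sketched with a small helper.
`QuietCancellationSketch.RunAsync` is hypothetical, and the `work` delegate stands in for the
command's driver calls; nothing here is taken from the CLI's actual code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; `work` stands in for InitializeAsync plus the read/write call.
public static class QuietCancellationSketch
{
    // Returns true when the work completed, false when the user cancelled it.
    public static async Task<bool> RunAsync(Func<CancellationToken, Task> work, CancellationToken ct)
    {
        try
        {
            await work(ct);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // TaskCanceledException derives from OperationCanceledException, so it lands here too.
            return false;
        }
    }
}
```

The exception filter keeps the catch narrow: a cancellation that was not requested through `ct`
(for example an internal timeout) still propagates as an error.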
### Driver.Modbus.Cli-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` |
| Status | Open |

**Description:** `probe` reports `Health: {health.State}` from `GetHealth()`.
After a successful `InitializeAsync` the driver sets state to `Healthy`
regardless of whether the subsequent probe register read returns Good or a Bad
status code. `ReadAsync` does not throw on a Modbus exception response; instead it
returns a `DataValueSnapshot` with a Bad `StatusCode`. So `probe` against a host
that accepts the TCP connection but rejects FC03 at the probe address prints
`Health: Healthy` while the snapshot line below shows a Bad status. The two lines
disagree, and the headline `Health` value (the thing an operator scans first)
overstates success. The doc bills `probe` as the "is the PLC up + talking Modbus"
check, which the bare `Healthy` does not actually confirm.

**Recommendation:** Have `probe` derive its headline verdict from the probe
snapshot's `StatusCode` (Good vs Bad) rather than (or in addition to) the driver
`State`, or print a single combined verdict line so the two cannot contradict each
other.

**Resolution:** _(open)_

### Driver.Modbus.Cli-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` |
| Status | Open |

**Description:** `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing
grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`,
`HR1:I`, `C100`, `V2000:F:CDAB`, etc.) and says "set the per-tag `addressString`
field instead of the structured `region` + `address` + `dataType` fields." None of
the CLI commands expose an `--address-string` (or equivalent) flag: `read`,
`write`, and `subscribe` only accept the structured `--region` + `--address` +
`--type` triple. The documented address-string grammar is reachable only through a
hand-written `DriverConfig` JSON, not through this CLI. The doc reads as if the CLI
supports it.

**Recommendation:** Either add an `--address-string` option that feeds the
driver's address-string parser (and `--family` for the DL205/MELSEC native
syntax), or scope the "v2 addressing grammar" section of the doc to note it
applies to `DriverConfig` JSON and is not a CLI flag.

**Resolution:** _(open)_

### Driver.Modbus.Cli-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` |
| Status | Open |

**Description:** The test project covers only the two pure-function seams:
`ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage
for `WriteCommand`'s read-only-region rejection
(`Region is not (Coils or HoldingRegisters)`), no test for
`ModbusCommandBase.BuildOptions` (e.g. that `Probe.Enabled` is `false` and
`AutoReconnect` tracks `--disable-reconnect`), and no test asserting unsupported
write types throw. The branch logic in `WriteCommand.ExecuteAsync` and
`ModbusCommandBase.BuildOptions` is the part most likely to regress and is
currently untested. The validation gaps in findings 002/003 are also untested
precisely because no test exercises that path.

**Recommendation:** Add tests for `WriteCommand`'s region-validation branch and for
`ModbusCommandBase.BuildOptions` (construct a command instance via the `init`
setters and assert the produced `ModbusDriverOptions`). Once findings 002/003 are
fixed, add tests for the new validation paths.

**Resolution:** _(open)_

207
code-reviews/Driver.Modbus/findings.md
Normal file
@@ -0,0 +1,207 @@
# Code Review — Driver.Modbus

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 12 |

## Checklist coverage

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.Modbus-002, Driver.Modbus-005, Driver.Modbus-009 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.Modbus-001, Driver.Modbus-003 |
| 4 | Error handling & resilience | Driver.Modbus-006, Driver.Modbus-010 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.Modbus-004 |
| 7 | Design-document adherence | Driver.Modbus-007 |
| 8 | Code organization & conventions | Driver.Modbus-011 |
| 9 | Testing coverage | Driver.Modbus-012 |
| 10 | Documentation & comments | Driver.Modbus-008 |

## Findings

### Driver.Modbus-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `ModbusDriver.cs:92,99-122` |
| Status | Open |

**Description:** `_lastPublishedByRef` is a plain `Dictionary<string, object>` mutated inside `ShouldPublish`, which runs on the `PollGroupEngine.onChange` callback. `PollGroupEngine` runs one background `Task` per subscription (`PollGroupEngine.cs:64`), so a driver with two or more subscriptions invokes `onChange`, and therefore `ShouldPublish`, concurrently on separate threads. `ShouldPublish` does `TryGetValue` and indexer writes on the unsynchronized dictionary (`ModbusDriver.cs:108`, `112`, `120`). Concurrent reads/writes of a non-thread-safe `Dictionary` can corrupt internal state, drop entries, or throw `IndexOutOfRangeException`/`InvalidOperationException`, crashing the poll loop. The sibling cache `_lastWrittenByRef` is correctly guarded by `_lastWrittenLock`; only the deadband cache was left unprotected.

**Recommendation:** Guard `_lastPublishedByRef` with a dedicated lock around every access in `ShouldPublish`, or switch it to `ConcurrentDictionary<string, object>` and use `AddOrUpdate`/`TryGetValue`.

**Resolution:** _(open)_

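The dedicated-lock option from Driver.Modbus-001 is sketched below. `DeadbandCacheSketch` is a hypothetical, simplified stand-in: it stores `double` values and applies a single absolute deadband, whereas the driver's cache holds `object` values and its actual comparison rules are not shown in this review.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the driver's deadband cache.
public sealed class DeadbandCacheSketch
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, double> _lastPublished = new Dictionary<string, double>();
    private readonly double _deadband;

    public DeadbandCacheSketch(double deadband)
    {
        _deadband = deadband;
    }

    // Safe to call from several poll tasks at once: every dictionary access is under the lock.
    public bool ShouldPublish(string reference, double value)
    {
        lock (_gate)
        {
            if (_lastPublished.TryGetValue(reference, out double last)
                && Math.Abs(value - last) < _deadband)
            {
                return false; // change is inside the deadband; keep the old baseline
            }

            _lastPublished[reference] = value;
            return true;
        }
    }
}
```

A lock keeps the read-compare-write sequence atomic per call. A `ConcurrentDictionary` would also prevent corruption, but the compare and the update would then need `AddOrUpdate` (or a retry loop) to stay consistent with each other.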
### Driver.Modbus-002

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusDriver.cs:127-186` |
| Status | Open |

**Description:** `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_options.Tags` is fixed for a driver instance, the same set re-inserts harmlessly today, but the asymmetry is a latent bug: any future path that re-runs init with a different tag set leaves stale tag entries that resolve reads/writes against deleted nodes. `_lastPublishedByRef` and `_lastWrittenByRef` similarly survive a Reinitialize, retaining deadband/write-suppression baselines against the old config, while `_autoProhibited` *is* deliberately cleared (`ModbusDriver.cs:179`); the inconsistency suggests the clearing was simply overlooked.

**Recommendation:** Clear `_tagsByName`, `_lastPublishedByRef`, and `_lastWrittenByRef` in `ShutdownAsync` (or at the top of `InitializeAsync`) so a Reinitialize starts from a clean state, consistent with the existing `_autoProhibited.Clear()`.

**Resolution:** _(open)_

### Driver.Modbus-003

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `ModbusDriver.cs:59,188,241,259,266,726,745,759` |
| Status | Open |

**Description:** `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic on .NET so a torn read cannot occur, but there is no happens-before ordering: a stale `DriverHealth` can be observed on another core, and concurrent writers race so "last write wins" is non-deterministic (a `Degraded` write from a failed read can clobber a just-published `Healthy`, or vice versa).

**Recommendation:** Mark `_health` `volatile`, or assign via `Volatile.Write` and read with `Volatile.Read`, to give `GetHealth()` a defined ordering guarantee.

**Resolution:** _(open)_

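The `Volatile.Write` / `Volatile.Read` option from Driver.Modbus-003 could look like the sketch below. `HealthSnapshotSketch` and `HealthHolderSketch` are hypothetical types standing in for `DriverHealth` and the driver itself.

```csharp
using System.Threading;

// Hypothetical stand-in for the driver's DriverHealth type.
public sealed class HealthSnapshotSketch
{
    public HealthSnapshotSketch(string state)
    {
        State = state;
    }

    public string State { get; }
}

public sealed class HealthHolderSketch
{
    private HealthSnapshotSketch _health = new HealthSnapshotSketch("Unknown");

    // Release semantics: the snapshot's contents are visible before the new reference is.
    public void Publish(HealthSnapshotSketch next) => Volatile.Write(ref _health, next);

    // Acquire semantics: later reads cannot be reordered ahead of this one.
    public HealthSnapshotSketch GetHealth() => Volatile.Read(ref _health);
}
```

This addresses visibility and ordering only. Concurrent writers still race, so which of a near-simultaneous `Degraded` and `Healthy` ends up stored remains "last write wins"; making that deterministic would need a lock or a compare-and-swap policy, which goes beyond what the recommendation asks for.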
### Driver.Modbus-004

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `ModbusDriver.cs:1468-1473` |
| Status | Open |

**Description:** `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `ShutdownAsync` leaks the probe loop, the re-probe loop, and every active polled subscription background `Task`/`CancellationTokenSource`. The two `Task.Run` loops keep running against a disposed transport, throwing on every tick. `Dispose()` (sync) has the same gap and additionally blocks on the async path via `GetAwaiter().GetResult()`.

**Recommendation:** Make `DisposeAsync` perform the same teardown as `ShutdownAsync` (cancel both CTSs, dispose them, dispose `_poll`) before disposing `_transport`. Have `ShutdownAsync` and `DisposeAsync` share a private `TeardownAsync` helper.

**Resolution:** _(open)_

### Driver.Modbus-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `ModbusDriver.cs:777-798,323-330` |
| Status | Open |

**Description:** `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-bit; it does not guarantee a non-exception response PDU is long enough to hold function-code + byte-count + the claimed data. A device (or buggy server) returning a 1-byte PDU, or a byte-count larger than the actual payload, produces an `IndexOutOfRangeException`/`ArgumentException` rather than a clean comms error. `DecodeBitArray` similarly indexes `bitmap[0]` (`ModbusDriver.cs:325`) without checking the bitmap is non-empty. In `ReadAsync` these are caught by the catch-all and mapped to `BadCommunicationError`, so impact is limited; in `ReadCoalescedAsync` the exception is opaque to the narrower catch arms.

**Recommendation:** In `ReadRegisterBlockAsync`/`ReadBitBlockAsync`, validate `resp.Length >= 2` and `resp.Length >= 2 + resp[1]` before slicing, throwing a descriptive `InvalidDataException`. Validate the decoded byte/bit count matches the request quantity.

**Resolution:** _(open)_

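The bounds checks recommended in Driver.Modbus-005 can be sketched as a small guard. `ReadResponseGuardSketch.ExtractData` is a hypothetical helper; it assumes the response PDU layout described in the finding (function code at index 0, byte count at index 1, data from index 2) and leaves the caller to compute the expected byte count from the request quantity.

```csharp
using System;
using System.IO;

// Hypothetical guard; assumes resp = [function code][byte count][data...].
public static class ReadResponseGuardSketch
{
    public static byte[] ExtractData(byte[] resp, int expectedByteCount)
    {
        if (resp is null)
            throw new ArgumentNullException(nameof(resp));

        if (resp.Length < 2)
            throw new InvalidDataException(
                $"Response PDU is {resp.Length} byte(s); need at least function code and byte count.");

        int byteCount = resp[1];
        if (resp.Length < 2 + byteCount)
            throw new InvalidDataException(
                $"Response claims {byteCount} data byte(s) but carries only {resp.Length - 2}.");

        if (byteCount != expectedByteCount)
            throw new InvalidDataException(
                $"Response carries {byteCount} data byte(s); the request expected {expectedByteCount}.");

        var data = new byte[byteCount];
        Buffer.BlockCopy(resp, 2, data, 0, byteCount);
        return data;
    }
}
```

For a register read of N registers the caller would pass `2 * N` as `expectedByteCount`; for a bit read of N bits it would pass `(N + 7) / 8`.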
### Driver.Modbus-006

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `ModbusDriver.cs:514-524,532-550` |
| Status | Open |

**Description:** `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issuing reads against the captured, disposed transport. `ReprobeLoopAsync` only catches `OperationCanceledException when (ct.IsCancellationRequested)`, so an `ObjectDisposedException` from the disposed transport escapes `RunReprobeOnceForTestAsync` and faults the fire-and-forget background `Task`, silently killing the re-probe loop with the wrong failure mode.

**Recommendation:** Re-check `_transport`/cancellation inside the per-candidate loop, or broaden the `ReprobeLoopAsync` catch to also swallow `ObjectDisposedException` when `ct.IsCancellationRequested`.

**Resolution:** _(open)_

### Driver.Modbus-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` |
| Status | Open |

**Description:** Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is declared `Int32`, misrepresenting the OPC UA variable's `DataType` even though `DecodeRegister` produces a correct `long`/`ulong` value, so clients see a type/value mismatch. (2) `DisableFC23` is documented and bound from JSON but is a confirmed no-op ("The driver does not currently emit FC23"). Both are acknowledged-but-unfinished items worth tracking.

**Recommendation:** Track the PR 25 `DriverDataType.Int64` follow-up; until then document the Int32 surfacing limitation in `docs/v2/modbus-addressing.md` so operators configuring `I_64`/`UI_64` tags understand the node type. Mark `DisableFC23` clearly as reserved/unimplemented or gate it once FC23 ships.

**Resolution:** _(open)_

### Driver.Modbus-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `ModbusDriver.cs:411-417,700-703,737-744` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `ReprobeLoopAsync`), so the parenthetical is wrong. (2) The comment at `ModbusDriver.cs:700-703` ("On block-level failure mark every member Bad — caller's per-tag fallback won't re-try since handled-set already includes them; auto-split-on-failure is a follow-up") contradicts the actual `catch (ModbusException)` arm below it, which deliberately does not add members to `handled` and does defer to per-tag fallback (and auto-split has shipped via bisection). The empty `foreach (var (idx, _) in block.Members) { }` loop at `ModbusDriver.cs:737-744`, with only a comment body, is dead code from that superseded design.
|
||||
|
||||
**Recommendation:** Update the two comments to match the shipped #148/#150/#151 behaviour and delete the empty `foreach` loop in the `catch (ModbusException)` arm.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Modbus-009
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an FC03/FC04 with quantity 0 — a spec-illegal request the PLC rejects with exception 03. The factory does not reject `StringLength = 0` for String tags. (2) `EnableKeepAlive` casts `opts.Time.TotalSeconds`/`opts.Interval.TotalSeconds` to `int`; a sub-second configured `TimeSpan` (e.g. 500 ms) truncates to 0, which most OSes reject or interpret as "use default", silently defeating the configured keep-alive timing.
|
||||
|
||||
**Recommendation:** Validate `StringLength >= 1` for `String` tags in `ModbusDriverFactoryExtensions.BuildTag`. For keep-alive, round up to a minimum of 1 second or validate the configured `TimeSpan` is a whole number of seconds.
|
||||
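Both guards are small. The sketch below uses two hypothetical helpers, `StringRegisterCount` and `KeepAliveSeconds`; the real checks would live in `ModbusDriverFactoryExtensions.BuildTag` and `EnableKeepAlive`, whose exact shapes are not shown in this review.

```csharp
using System;

Console.WriteLine(KeepAliveSeconds(TimeSpan.FromMilliseconds(500))); // 1, not 0
Console.WriteLine(StringRegisterCount(5));                           // 3

try { StringRegisterCount(0); }
catch (ArgumentOutOfRangeException ex) { Console.WriteLine(ex.ParamName); } // stringLength

// Registers needed for a string packed two characters per 16-bit register.
// Rejecting lengths below 1 prevents a quantity-0 FC03/FC04 request.
static int StringRegisterCount(int stringLength)
{
    if (stringLength < 1)
        throw new ArgumentOutOfRangeException(nameof(stringLength), "String tags need StringLength >= 1.");
    return (stringLength + 1) / 2;
}

// Whole seconds for a keep-alive socket option: round up, and never return less than 1.
static int KeepAliveSeconds(TimeSpan configured) =>
    Math.Max(1, (int)Math.Ceiling(configured.TotalSeconds));
```

Rounding up is one of the two options named above; rejecting non-whole-second values at configuration time would be the stricter alternative.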
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Modbus-010
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a divergent value. If a driver instance has `WriteOnChangeOnly = true` but a tag is never subscribed/read (write-only setpoint), a value the operator believes was re-asserted is silently suppressed forever after the first write — no time- or count-based expiry exists. The option XML doc describes the read-invalidation path but does not warn about write-only tags.
|
||||
|
||||
**Recommendation:** Document the write-only-tag caveat on the `WriteOnChangeOnly` option, or add an optional TTL to the suppression cache so a periodic re-write still reaches the PLC.
|
||||
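If the TTL route is taken, the suppression cache could look something like the sketch below. `WriteSuppressionCache` is a hypothetical type, not the driver's `_lastWrittenByRef` implementation; the clock is injected so expiry can be tested, and the class is deliberately not thread-safe to keep the example short.

```csharp
using System;
using System.Collections.Generic;

var now = DateTimeOffset.UnixEpoch;
var cache = new WriteSuppressionCache(TimeSpan.FromSeconds(30), () => now);

Console.WriteLine(cache.IsRedundant("Setpoint", 42)); // False: first write goes to the wire
cache.Record("Setpoint", 42);
Console.WriteLine(cache.IsRedundant("Setpoint", 42)); // True: suppressed within the TTL
now += TimeSpan.FromSeconds(31);
Console.WriteLine(cache.IsRedundant("Setpoint", 42)); // False: entry expired, so the re-write reaches the PLC

sealed class WriteSuppressionCache
{
    private readonly Dictionary<string, (object Value, DateTimeOffset WrittenAt)> _last = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;

    public WriteSuppressionCache(TimeSpan ttl, Func<DateTimeOffset> clock)
    {
        _ttl = ttl;
        _clock = clock;
    }

    // Call after a write has actually been sent.
    public void Record(string tagRef, object value) => _last[tagRef] = (value, _clock());

    // A write is redundant only if the same value was sent recently enough.
    public bool IsRedundant(string tagRef, object value) =>
        _last.TryGetValue(tagRef, out var entry)
        && _clock() - entry.WrittenAt < _ttl
        && Equals(entry.Value, value);
}
```

With a TTL in place, a write-only setpoint is re-asserted at most one TTL after the operator asks for it, without relying on a read to invalidate the baseline.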
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Modbus-011
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Code organization & conventions |
|
||||
| Location | `ModbusDriver.cs:23-43,89-97,408-432` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastWrittenByRef` are declared after the constructor; `ProhibitionState`, `_autoProhibited`, and `_reprobeCts` are declared mid-file between `DecodeRegisterArray` and `RangeIsAutoProhibited`. There are also two near-identical `<summary>` blocks stacked back-to-back at `ModbusDriver.cs:411-423`. This hurts readability of a 1400-line file and makes the field inventory hard to audit (relevant to the thread-safety findings above).
|
||||
|
||||
**Recommendation:** Group all instance fields at the top of the class, move nested types together, and remove the orphaned first `<summary>` at lines 411-417 that no longer precedes a member.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.Modbus-012
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Testing coverage |
|
||||
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publishing, so the `_lastPublishedByRef` race (Driver.Modbus-001) is uncaught; (2) no test covers `ReinitializeAsync` state hygiene for stale `_tagsByName`/caches (Driver.Modbus-002); (3) no test feeds a malformed/short response PDU through `ReadRegisterBlockAsync`/`DecodeBitArray` to confirm a clean `BadCommunicationError` rather than an index-range crash (Driver.Modbus-005); (4) no test asserts `DisposeAsync` (vs `ShutdownAsync`) tears down the probe/re-probe loops and `_poll` (Driver.Modbus-004).
|
||||
|
||||
**Recommendation:** Add unit tests for concurrent deadband publishing across two subscriptions, `ReinitializeAsync` state hygiene, malformed-response handling in the register/bit block readers, and `DisposeAsync` loop teardown.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
252
code-reviews/Driver.OpcUaClient/findings.md
Normal file
@@ -0,0 +1,252 @@
|
||||
# Code Review — Driver.OpcUaClient
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 15 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Driver.OpcUaClient-001, -002, -003, -010, -011 |
|
||||
| 2 | OtOpcUa conventions | Driver.OpcUaClient-004 |
|
||||
| 3 | Concurrency & thread safety | Driver.OpcUaClient-005, -006, -007 |
|
||||
| 4 | Error handling & resilience | Driver.OpcUaClient-002, -008, -009 |
|
||||
| 5 | Security | Driver.OpcUaClient-012 |
|
||||
| 6 | Performance & resource management | Driver.OpcUaClient-013, -014 |
|
||||
| 7 | Design-document adherence | Driver.OpcUaClient-004, -013, -015 |
|
||||
| 8 | Code organization & conventions | No issues found |
|
||||
| 9 | Testing coverage | Driver.OpcUaClient-015 |
|
||||
| 10 | Documentation & comments | Driver.OpcUaClient-011 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.OpcUaClient-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `OpcUaClientDriver.cs:444`, `:466`, `:517`, `:540`, `:599`, `:610` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** ReadAsync, WriteAsync, and DiscoverAsync capture the session into a local variable via RequireSession() before acquiring `_gate`, then perform the wire call on that captured reference inside the gate. The reconnect path (OnReconnectComplete, line 1330) swaps `Session` to a brand-new ISession. A read that captured the pre-reconnect session at line 444, then blocked on `_gate.WaitAsync` while a reconnect completed, issues ReadAsync against a stale/closed session. The catch block then fans out BadCommunicationError for the whole batch even though the driver is healthy on the new session, and the operation is silently lost. The gate does not protect against the session being swapped underneath a waiter.
|
||||
|
||||
**Recommendation:** Re-read `Session` inside the `_gate` critical section (after WaitAsync returns), or route the session swap itself through `_gate` so a swap cannot interleave with a gated operation.
|
||||
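A minimal sketch of the first option follows. `GatedDriver` and `FakeSession` are hypothetical stand-ins for the driver and `ISession`; the point is only the ordering, where the session reference is resolved after the gate is held.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var driver = new GatedDriver(new FakeSession("old"));

await driver.Gate.WaitAsync();              // another operation holds the gate
var pendingRead = driver.ReadAsync();       // this read now waits on the gate
driver.SwapSession(new FakeSession("new")); // reconnect completes while the read waits
driver.Gate.Release();

Console.WriteLine(await pendingRead);       // new

sealed record FakeSession(string Name);

sealed class GatedDriver
{
    private FakeSession _session;
    public SemaphoreSlim Gate { get; } = new(1, 1);

    public GatedDriver(FakeSession session) => _session = session;

    public void SwapSession(FakeSession next) => Volatile.Write(ref _session, next);

    public async Task<string> ReadAsync()
    {
        await Gate.WaitAsync();
        try
        {
            // Resolve the session only once the gate is held, so a swap that
            // happened while waiting is observed.
            var session = Volatile.Read(ref _session)
                ?? throw new InvalidOperationException("Driver is not connected.");
            return session.Name; // stands in for the wire call
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

This closes the stale-capture window but still allows a swap during the wire call itself; routing the swap through the gate, the second option, would close that as well.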
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `OpcUaClientDriver.cs:1330-1359` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** OnReconnectComplete handles only the success case. When SessionReconnectHandler gives up (its retry loop exhausts the 2-minute maxReconnectPeriod), it invokes the callback with `handler.Session == null`. The code sets `Session = null`, disposes the handler, and sets `_reconnectHandler = null`, but leaves `_health` at whatever it was (typically Degraded) and `_hostState` at Stopped. There is no further reconnect attempt (the handler is gone, and OnKeepAlive only fires on a live session which no longer exists), and DriverState is never set to Faulted. The driver is permanently wedged: no session, no reconnect loop, no Faulted signal for the Core, and ReinitializeAsync is never triggered. This is the single largest gateway resilience gap.
|
||||
|
||||
**Recommendation:** In OnReconnectComplete, when newSession is null, set `_health` to a Faulted DriverHealth with an explanatory message so the Core can fan out Bad quality and offer an operator reinitialize. Consider re-arming a fresh reconnect attempt rather than giving up entirely for an always-on gateway.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-003
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `OpcUaClientDriver.cs:644-711` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** BrowseRecursiveAsync calls session.BrowseAsync with `requestedMaxReferencesPerNode: 0` but never follows browse continuation points. OPC UA servers enforce a server-side max-references-per-node limit; when a node has more children than the server returns in one response, BrowseResult.ContinuationPoint is non-empty and the caller must issue BrowseNext to retrieve the remainder. This driver discards the continuation point, so any folder on the remote server with a large child set is silently truncated: discovered tags go missing from the local address space with no error. For the tens-of-thousands-of-nodes scenario the options doc targets (MaxDiscoveredNodes = 10000), this is a realistic and silent data-completeness bug.
|
||||
|
||||
**Recommendation:** After processing resp.Results[0].References, check resp.Results[0].ContinuationPoint; while non-empty, call session.BrowseNextAsync and append the additional references before recursing/registering.
|
||||
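The paging loop can be sketched independently of the SDK. `IBrowseService`, `BrowsePage`, and `PagedFakeServer` below are hypothetical abstractions over the Browse / BrowseNext service pair, not the OPC UA SDK's types or signatures; the fake server returns at most 10 references per call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var children = Enumerable.Range(1, 25).Select(i => $"Tag{i}").ToList();
var server = new PagedFakeServer(children, pageSize: 10);

Console.WriteLine(BrowseAll(server, "Folder").Count); // 25: all three pages collected

// Collects every reference for one node by following continuation points.
static List<string> BrowseAll(IBrowseService service, string nodeId)
{
    var page = service.Browse(nodeId);
    var collected = new List<string>(page.References);
    while (page.ContinuationPoint.Length > 0)
    {
        page = service.BrowseNext(page.ContinuationPoint);
        collected.AddRange(page.References);
    }
    return collected;
}

sealed record BrowsePage(IReadOnlyList<string> References, byte[] ContinuationPoint);

interface IBrowseService
{
    BrowsePage Browse(string nodeId);
    BrowsePage BrowseNext(byte[] continuationPoint);
}

// Fake server that encodes the next offset in the continuation point.
sealed class PagedFakeServer : IBrowseService
{
    private readonly List<string> _children;
    private readonly int _pageSize;

    public PagedFakeServer(List<string> children, int pageSize)
    {
        _children = children;
        _pageSize = pageSize;
    }

    public BrowsePage Browse(string nodeId) => Page(0);

    public BrowsePage BrowseNext(byte[] continuationPoint) =>
        Page(BitConverter.ToInt32(continuationPoint, 0));

    private BrowsePage Page(int offset)
    {
        var refs = _children.Skip(offset).Take(_pageSize).ToList();
        var next = offset + refs.Count;
        var cp = next < _children.Count ? BitConverter.GetBytes(next) : Array.Empty<byte>();
        return new BrowsePage(refs, cp);
    }
}
```

A stub like `PagedFakeServer` is also the kind of paged server double a unit test for this finding would need.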
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-004
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `OpcUaClientDriver.cs:596-632`, `:789`, `OpcUaClientDriverOptions.cs` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** docs/v2/driver-specs.md section 8 mandates two features that are absent. (1) Namespace remapping: the spec requires building a bidirectional namespace map at connect time from session.NamespaceUris. The driver instead stores the raw upstream NodeId string (pv.NodeId.ToString()) as DriverAttributeInfo.FullName and re-parses it verbatim for reads/writes. The namespace index embedded in `ns=N;...` is server-session-relative; if the upstream server reorders its namespace table across a restart (which OPC UA permits), every stored ns=N reference points at the wrong namespace and reads/writes silently address wrong nodes. (2) TargetNamespaceKind enforcement: section 8 requires the driver to enforce Equipment-vs-SystemPlatform choice at startup and fail draft validation on misconfiguration; OpcUaClientDriverOptions has no such knob.
|
||||
|
||||
**Recommendation:** Build a namespace-URI map from session.NamespaceUris at connect time and store NodeIds in a server-stable form (namespace URI plus identifier) rather than session-relative ns=N. Add the TargetNamespaceKind option and the startup validation section 8 describes, or document explicitly why the design deviates.
|
||||
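To illustrate the remapping, the sketch below converts a session-relative `ns=N;...` string to a URI-qualified `nsu=<uri>;...` form and back against a reordered namespace table. `ToStable` and `ToSessionRelative` are hypothetical helpers and the URIs are invented; the sketch also assumes namespace URIs contain no `;`, which a real implementation would have to escape or handle through the SDK's own NodeId types.

```csharp
using System;

// Namespace tables reported by the upstream server before and after a restart.
string[] before = { "http://opcfoundation.org/UA/", "urn:plant:line1", "urn:plant:line2" };
string[] after  = { "http://opcfoundation.org/UA/", "urn:plant:line2", "urn:plant:line1" };

var stable = ToStable("ns=1;s=Pump.Speed", before);
Console.WriteLine(stable);                           // nsu=urn:plant:line1;s=Pump.Speed
Console.WriteLine(ToSessionRelative(stable, after)); // ns=2;s=Pump.Speed

// Replace the session-relative index with the namespace URI it refers to.
static string ToStable(string nodeId, string[] namespaceUris)
{
    if (!nodeId.StartsWith("ns=", StringComparison.Ordinal)) return nodeId; // no index: namespace 0
    var sep = nodeId.IndexOf(';');
    var index = int.Parse(nodeId.Substring(3, sep - 3));
    return $"nsu={namespaceUris[index]}{nodeId.Substring(sep)}";
}

// Resolve the URI against the current session's table to get a usable index.
static string ToSessionRelative(string stableId, string[] namespaceUris)
{
    if (!stableId.StartsWith("nsu=", StringComparison.Ordinal)) return stableId;
    var sep = stableId.IndexOf(';');
    var uri = stableId.Substring(4, sep - 4);
    var index = Array.IndexOf(namespaceUris, uri);
    if (index < 0)
        throw new InvalidOperationException($"Namespace '{uri}' is not in the server's table.");
    return $"ns={index}{stableId.Substring(sep)}";
}
```

Storing the `nsu=` form means a reordered table changes only the resolved index, while an upstream namespace that has disappeared fails loudly instead of addressing the wrong node.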
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-005
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | High |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `OpcUaClientDriver.cs:1297-1319` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** OnKeepAlive reads and writes `_reconnectHandler` without any lock: `if (_reconnectHandler is not null) return;` followed by `_reconnectHandler = new SessionReconnectHandler(...)`. Keep-alive callbacks are raised from the SDK keep-alive timer thread; on a bad keep-alive the SDK can fire the handler repeatedly while the channel stays down. Two callbacks racing through the check-then-set both observe null, both construct a SessionReconnectHandler, both call BeginReconnect, and the second assignment overwrites the first handler, leaking the first handler (its retry loop keeps running, unreferenced and never disposed) and creating two competing reconnect loops. ShutdownAsync then only cancels/disposes the one that won the assignment race.
|
||||
|
||||
**Recommendation:** Guard the `_reconnectHandler` check-and-set with `_probeLock` (already held for `_hostState`), or use Interlocked.CompareExchange to ensure exactly one handler is constructed per drop.
|
||||
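The `Interlocked.CompareExchange` variant can be sketched as below. `ReconnectGuard` and `FakeReconnectHandler` are hypothetical stand-ins for the driver and `SessionReconnectHandler`, and a counter stands in for the `BeginReconnect` call.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var guard = new ReconnectGuard();

// Simulate many bad keep-alive callbacks racing on thread-pool threads.
await Task.WhenAll(Enumerable.Range(0, 16).Select(_ => Task.Run(() => guard.OnBadKeepAlive())));

Console.WriteLine(guard.ReconnectsStarted); // 1

sealed class FakeReconnectHandler : IDisposable
{
    public void Dispose() { }
}

sealed class ReconnectGuard
{
    private FakeReconnectHandler _handler;
    private int _reconnectsStarted;

    public int ReconnectsStarted => Volatile.Read(ref _reconnectsStarted);

    public void OnBadKeepAlive()
    {
        var candidate = new FakeReconnectHandler();

        // Install the candidate only if no handler is present yet.
        if (Interlocked.CompareExchange(ref _handler, candidate, null) is not null)
        {
            candidate.Dispose(); // lost the race: another callback owns the reconnect
            return;
        }

        Interlocked.Increment(ref _reconnectsStarted); // stands in for BeginReconnect
    }
}
```

The losing callbacks construct and immediately dispose a handler, which is wasteful but harmless; taking `_probeLock` around the check-and-set avoids the extra allocation at the cost of holding a lock on the keep-alive thread.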
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-006
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `OpcUaClientDriver.cs:1330-1359` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** OnReconnectComplete mutates `Session` (line 1347) directly from the reconnect-handler callback thread with no synchronization against ReadAsync/WriteAsync/DiscoverAsync/ShutdownAsync. Session is a plain auto-property with no memory barrier; a concurrent reader on another thread may observe a stale reference. ShutdownAsync (line 425) can also run concurrently with OnReconnectComplete: ShutdownAsync disposes the session and sets Session = null while OnReconnectComplete sets Session = newSession, and the interleaving is unspecified, potentially leaving a live session leaked after shutdown.
|
||||
|
||||
**Recommendation:** Route all Session mutations through a single lock (or the `_gate`). Make ShutdownAsync cancel the reconnect handler and wait for any in-flight OnReconnectComplete to settle before disposing the session.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-007
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Concurrency & thread safety |
|
||||
| Location | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Two disposal races. (1) Dispose() does `DisposeAsync().AsTask().GetAwaiter().GetResult()`, synchronous blocking on async work. The Galaxy stability review (driver-stability.md, the 2026-04-13 findings) explicitly calls out sync-over-async on the OPC UA stack thread as a closed bug class; if Dispose() runs on the OPC UA stack thread or any thread the SDK continuations need, this deadlocks. (2) DisposeAsync disposes `_gate` (line 1382) after ShutdownAsync returns, but ShutdownAsync does not drain in-flight ReadAsync/WriteAsync operations holding `_gate`. An in-flight read that calls `_gate.Release()` (line 508) after `_gate.Dispose()` throws ObjectDisposedException on a background thread.
|
||||
|
||||
**Recommendation:** Provide an async disposal path callers prefer; if a sync Dispose() is unavoidable keep it free of .GetResult() on SDK-thread-affine work. Before disposing `_gate`, acquire it once so all in-flight gated operations have completed, or guard releases against disposal.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-008
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `OpcUaClientDriver.cs:1092-1099` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** AcknowledgeAsync issues the batched CallAsync and then catches all exceptions with a best-effort empty catch; it also never inspects the per-call results in the success path (`_ = await session.CallAsync(...)`). An alarm acknowledgment the upstream server rejects (BadConditionAlreadyAcked, BadNodeIdUnknown, BadUserAccessDenied) is reported as success to the caller. IAlarmSource.AcknowledgeAsync has no per-item result, so the only way a failure could surface is via an exception, and the catch suppresses even that. Operators acking a critical alarm get no signal that the ack did not take.
|
||||
|
||||
**Recommendation:** Inspect CallMethodResult.StatusCode for each result and log Bad codes; rethrow (or surface via driver health) genuine transport failures rather than swallowing them. Consider extending the contract so per-ack failures propagate.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-009
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `OpcUaClientDriver.cs:560-564` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** WriteAsync's catch block fans out BadCommunicationError across the whole batch on any exception. Writes are non-idempotent by default (IWritable remarks, decision #44/#45): a timeout exception may fire after the upstream server already applied the write. Reporting BadCommunicationError (a code that reads as "definitely did not happen") for a write that may have succeeded is misleading; the OPC UA client downstream may safely re-issue and double-apply. The read path has the same fan-out but reads are idempotent so it is benign there; for writes the ambiguity matters.
|
||||
|
||||
**Recommendation:** Map write timeouts/cancellations to BadTimeout (which downstream correctly treats as "outcome unknown, do not blindly retry") rather than BadCommunicationError, and only use BadCommunicationError for failures that provably occurred before the request reached the wire.
|
||||
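One possible shape for the mapping is sketched below. `MapWriteFailure` and its `requestSent` flag are hypothetical; the two numeric values are the standard OPC UA codes for `BadTimeout` and `BadCommunicationError`. Treating every post-send failure as "outcome unknown" goes slightly beyond the recommendation and is an assumption of this sketch.

```csharp
using System;

Console.WriteLine($"0x{MapWriteFailure(new TimeoutException(), requestSent: true):X8}");                     // 0x800A0000
Console.WriteLine($"0x{MapWriteFailure(new InvalidOperationException("no session"), requestSent: false):X8}"); // 0x80050000

// Chooses the status code to fan out across a write batch that threw.
static uint MapWriteFailure(Exception ex, bool requestSent)
{
    const uint BadCommunicationError = 0x80050000;
    const uint BadTimeout = 0x800A0000;

    // Timeout or cancellation: the server may already have applied the write.
    if (ex is TimeoutException or OperationCanceledException) return BadTimeout;

    // Failed before anything was sent: the write definitely did not happen.
    if (!requestSent) return BadCommunicationError;

    // Sent, then failed some other way: still ambiguous, so do not claim a definite failure.
    return BadTimeout;
}
```

The read path can keep its existing fan-out, since re-issuing a read is harmless.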
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-010
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `OpcUaClientDriver.cs:823-824` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** MapUpstreamDataType maps DataTypeIds.Byte (the OPC UA unsigned 8-bit type) to DriverDataType.Int16. Byte should map to an unsigned driver type (UInt16 is the smallest unsigned available, matching how SByte belongs with the signed family). Mapping an unsigned 0-255 type onto signed Int16 misrepresents the type metadata downstream: clients see a signed type for an unsigned source, and any range/validation logic keyed off the driver data type is wrong. SByte correctly belongs with Int16; Byte does not.
|
||||
|
||||
**Recommendation:** Map DataTypeIds.Byte to DriverDataType.UInt16 (or add a Byte/UInt8 driver type if the enum supports finer granularity), keeping SByte and Int16 on the signed Int16 mapping.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-011
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Documentation & comments |
|
||||
| Location | `OpcUaClientDriver.cs:783-784` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The comment on the isArray computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDimensions (not specifically one-dimensional). The code `valueRank >= 0` treats -2 (Any) and -3 (ScalarOrOneDimension) as scalar, which is a defensible default, but the comment misdescribes the constants and would mislead a maintainer.
|
||||
|
||||
**Recommendation:** Correct the comment to the actual ValueRank constants (-3 ScalarOrOneDimension, -2 Any, -1 Scalar, 0 OneOrMoreDimensions, 1 OneDimension, >1 multi-dim) and state the deliberate choice that anything >= 0 is treated as an array.
|
||||
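For reference, the constants and the array decision can be written out as a small runnable table. `IsArray` and `Describe` are hypothetical helper names; the rank meanings are the ones listed in the recommendation.

```csharp
using System;

foreach (var rank in new[] { -3, -2, -1, 0, 1, 2 })
    Console.WriteLine($"{rank,2}  {Describe(rank),-22} array={IsArray(rank)}");

// Deliberate choice: only ranks >= 0 are surfaced as arrays, so -2 (Any) and
// -3 (ScalarOrOneDimension) fall back to scalar.
static bool IsArray(int valueRank) => valueRank >= 0;

static string Describe(int valueRank) => valueRank switch
{
    -3 => "ScalarOrOneDimension",
    -2 => "Any",
    -1 => "Scalar",
    0 => "OneOrMoreDimensions",
    1 => "OneDimension",
    > 1 => $"{valueRank} dimensions",
    _ => "invalid",
};
```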
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-012
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Security |
|
||||
| Location | `OpcUaClientDriver.cs:210-217` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** When AutoAcceptCertificates is true the driver registers a CertificateValidation handler that accepts only StatusCodes.BadCertificateUntrusted. A self-signed or otherwise untrusted server certificate frequently fails validation with a different code first (BadCertificateChainIncomplete, BadCertificateTimeInvalid, BadCertificateHostNameInvalid), so auto-accept silently does not accept many real dev certificates and the connect fails confusingly. The handler is added to config.CertificateValidator but never removed; each driver instance leaks a delegate subscription on a validator that may be process-shared. The option doc says auto-accept is dev-only and must be false in production, but there is no runtime guard preventing AutoAcceptCertificates=true shipping to production and no log warning when it is enabled.
|
||||
|
||||
**Recommendation:** When auto-accepting for dev, accept the full set of certificate-validation error codes (or use the SDK AutoAcceptUntrustedCertificates path consistently). Emit a prominent warning log every time AutoAcceptCertificates is enabled so a production misconfiguration is visible. Detach the handler on shutdown.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-013
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `OpcUaClientDriver.cs:436-437` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** GetMemoryFootprint() is hard-coded to return 0 and FlushOptionalCachesAsync is a no-op Task.CompletedTask. docs/v2/driver-stability.md section "In-process only (Tier A/B)" makes per-instance allocation tracking a contract requirement, and driver-specs.md section 8 explicitly calls out browse-cache memory: BrowseStrategy=Full against a large remote server can cache tens of thousands of node descriptions and the per-instance budget should bound this. Returning 0 means the Core's 30-second footprint poll can never detect this driver's browse-cache growth, and the cache-budget-breach-to-flush escalation path is dead code. A gateway pointed at a 10k-node server (the configured cap) silently evades the Tier-A memory-guard mechanism.
|
||||
|
||||
**Recommendation:** Track an approximate footprint for the discovered-node set and any cached browse state, return it from GetMemoryFootprint(), and implement FlushOptionalCachesAsync to drop droppable cache. If the driver genuinely holds no significant cache, document why 0 is correct.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-014
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Low |
|
||||
| Category | Performance & resource management |
|
||||
| Location | `OpcUaClientDriver.cs:904`, `:1035` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When UnsubscribeAsync removes a subscription it calls Subscription.DeleteAsync but does not clear the MonitoredItem.Notification handlers; if the SDK retains the MonitoredItem/Subscription graph anywhere (the session keeps a reference until its own disposal, or during transfer-on-reconnect), the driver instance is kept alive by the closure longer than necessary.
|
||||
|
||||
**Recommendation:** Detach the Notification handlers when deleting a subscription, or hold the handler delegate so it can be explicitly removed in UnsubscribeAsync/ShutdownAsync.
|
||||
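Holding the delegate is what makes removal possible, because `-=` needs the same delegate instance that `+=` received. The sketch below shows the pattern with hypothetical types (`FakeMonitoredItem`, `TrackedSubscription`) in place of the SDK's `MonitoredItem` and `Subscription`.

```csharp
using System;
using System.Collections.Generic;

var item = new FakeMonitoredItem();
var subscription = new TrackedSubscription();
subscription.Attach(item, value => Console.WriteLine($"changed: {value}"));

item.Raise(1);                          // changed: 1
subscription.DetachAll();
item.Raise(2);                          // no output: the handler was removed
Console.WriteLine(item.HasSubscribers); // False

sealed class FakeMonitoredItem
{
    public event Action<int> Notification;
    public bool HasSubscribers => Notification is not null;
    public void Raise(int value) => Notification?.Invoke(value);
}

sealed class TrackedSubscription
{
    // Keep each delegate so the exact same instance can be removed later.
    private readonly List<(FakeMonitoredItem Item, Action<int> Handler)> _attached = new();

    public void Attach(FakeMonitoredItem item, Action<int> handler)
    {
        item.Notification += handler;
        _attached.Add((item, handler));
    }

    // Call from the unsubscribe/shutdown path before deleting the subscription.
    public void DetachAll()
    {
        foreach (var (item, handler) in _attached)
            item.Notification -= handler;
        _attached.Clear();
    }
}
```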
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.OpcUaClient-015
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Testing coverage |
|
||||
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** Unit-test coverage is solid for the pure mappers (MapSeverity, MapUpstreamDataType, MapSecurityPolicy, MapAggregateToNodeId, BuildCertificateIdentity, ResolveEndpointCandidates) and for "throws before init" guards, but the highest-risk behaviours of a gateway driver have no test: the reconnect/session-swap path (OnKeepAlive to OnReconnectComplete, findings -001/-002/-005/-006), browse continuation-point handling (-003), the cascading-quality fan-out on a mid-batch transport failure, and namespace remapping (-004). The reconnect test file itself states wire-level disconnect-reconnect-resume coverage lands with the in-process fixture, i.e. the single largest gateway bug surface (per driver-specs.md section 8) is explicitly untested. The integration suite is Docker-fixture gated against opc-plc and is a smoke test only. The failed-reconnect-to-Faulted and concurrent-keep-alive races are pure-logic paths testable with a fake ISession.
|
||||
|
||||
**Recommendation:** Add tests exercising the reconnect callbacks with a stub session (success and give-up cases), a browse test with a paged/continuation-point server stub, and a read-batch test asserting upstream Bad StatusCodes pass through verbatim while a transport throw fans out the local fault code.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
209
code-reviews/Driver.S7.Cli/findings.md
Normal file
@@ -0,0 +1,209 @@
|
||||
# Code Review — Driver.S7.Cli
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 7 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
|
||||
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Driver.S7.Cli-001, Driver.S7.Cli-002 |
|
||||
| 2 | OtOpcUa conventions | No issues found |
|
||||
| 3 | Concurrency & thread safety | No issues found |
|
||||
| 4 | Error handling & resilience | Driver.S7.Cli-001, Driver.S7.Cli-003 |
|
||||
| 5 | Security | No issues found |
|
||||
| 6 | Performance & resource management | Driver.S7.Cli-004 |
|
||||
| 7 | Design-document adherence | Driver.S7.Cli-002 |
|
||||
| 8 | Code organization & conventions | Driver.S7.Cli-005 |
|
||||
| 9 | Testing coverage | Driver.S7.Cli-006 |
|
||||
| 10 | Documentation & comments | Driver.S7.Cli-002, Driver.S7.Cli-007 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.S7.Cli-001
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Error handling & resilience |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `WriteCommand.ParseValue` parses numeric and `DateTime` values with the
|
||||
raw BCL parsers (`short.Parse`, `float.Parse`, `DateTime.Parse`, etc.). On malformed
|
||||
input these throw `FormatException` / `OverflowException`, which are *not*
|
||||
`CliFx.Exceptions.CommandException`. CliFx renders a `CommandException` as a clean
|
||||
one-line error with a non-zero exit code, but renders any other exception as a full
|
||||
.NET stack trace. The `ParseValue` bool path is handled correctly (it throws
|
||||
`CommandException` for unrecognised input), so the command is internally inconsistent:
|
||||
`write -t Bool -v maybe` gives a friendly message while `write -t Int16 -v xyz` dumps a
|
||||
stack trace. The module's own test `ParseValue_non_numeric_for_numeric_types_throws`
|
||||
asserts the raw `FormatException` leaks, confirming the behaviour is unintended-but-shipped.
|
||||
|
||||
**Recommendation:** Wrap the numeric / `DateTime` parses in a `try`/`catch` that
|
||||
re-throws `FormatException` and `OverflowException` as
|
||||
`CliFx.Exceptions.CommandException` with a message that names the `--type` and the
|
||||
offending value — matching the bool path. Update the test to expect `CommandException`.
|
||||
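A sketch of the wrapping follows. To stay runnable without the CliFx package it throws a hypothetical `CliUsageException`; in the real command that would be `CliFx.Exceptions.CommandException`. The string-keyed switch is also a simplification of whatever `ParseValue` actually switches on.

```csharp
using System;
using System.Globalization;

Console.WriteLine(ParseValue("Int16", "123")); // 123

try { ParseValue("Int16", "xyz"); }
catch (CliUsageException ex) { Console.WriteLine(ex.Message); } // Value 'xyz' is not a valid Int16.

static object ParseValue(string type, string raw)
{
    try
    {
        // Each arm is cast to object so the switch does not widen every result to float.
        return type switch
        {
            "Int16" => (object)short.Parse(raw, CultureInfo.InvariantCulture),
            "Int32" => (object)int.Parse(raw, CultureInfo.InvariantCulture),
            "Float32" => (object)float.Parse(raw, CultureInfo.InvariantCulture),
            _ => throw new CliUsageException($"Unsupported --type '{type}'."),
        };
    }
    catch (Exception ex) when (ex is FormatException or OverflowException)
    {
        // Name both the type and the offending value, matching the bool path.
        throw new CliUsageException($"Value '{raw}' is not a valid {type}.", ex);
    }
}

// Stand-in for CliFx.Exceptions.CommandException so the sketch has no package dependency.
sealed class CliUsageException : Exception
{
    public CliUsageException(string message, Exception inner = null) : base(message, inner) { }
}
```

The exception filter leaves every other exception type untouched, so genuine bugs in the command still produce a stack trace.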
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-002
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Medium |
|
||||
| Category | Design-document adherence |
|
||||
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** The `--type` option help text on `read`, `write`, and `subscribe`
|
||||
advertises the full `S7DataType` set (`Int64 / UInt64 / Float64 / String / DateTime`),
|
||||
and `docs/Driver.S7.Cli.md` shows a worked `read ... -t String --string-length 80`
|
||||
example plus a `--string-length` flag on `read`/`write`. The underlying `S7Driver`
|
||||
(`S7Driver.cs:241-245` for reads, `:316-320` for writes) throws `NotSupportedException`
|
||||
for `Int64`, `UInt64`, `Float64`, `String`, and `DateTime` — the driver maps that to
|
||||
`BadNotSupported`. Consequently every CLI invocation using one of those types — and the
|
||||
documented `--string-length` string-read example — fails at runtime with
|
||||
`0x803D0000 (Bad)`. The CLI surface and docs promise capability the driver does not yet
|
||||
implement.
|
||||
|
||||
**Recommendation:** Either (a) trim the `--type` help text and the `--string-length`
|
||||
flag/examples to the implemented set (`Bool / Byte / Int16 / UInt16 / Int32 / UInt32 /
|
||||
Float32`) until the follow-up driver PR lands, or (b) keep the surface but add a one-line
|
||||
"types beyond Float32 are not yet implemented and surface BadNotSupported" caveat to the
|
||||
help text and `docs/Driver.S7.Cli.md`. Option (a) is preferred so the CLI does not offer
|
||||
options that cannot succeed.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` |
| Status | Open |
|
||||
|
||||
**Description:** The `ProbeCommand` XML doc and the `Driver.S7.Cli.md` "fastest is the
device talking" framing say the probe "connects ... prints health" and "surfaces
`BadNotSupported`" when PUT/GET is disabled. But when the PLC is unreachable (connection
refused, host down, wrong slot), `driver.InitializeAsync` throws and the exception
propagates straight out of `ExecuteAsync`, so the code that prints `Host:`, `Health:`,
`Last error:`, and the snapshot is never reached. The most common probe failure (device
not reachable at all) therefore produces a CliFx stack trace rather than the structured
health report the command exists to give. Note that PUT/GET-disabled only surfaces during
`ReadAsync` (after a successful connect), so that one path does reach the health print,
but a refused TCP connect does not.
|
||||
|
||||
**Recommendation:** Wrap the `InitializeAsync` + `ReadAsync` body in a `try`/`catch` that,
|
||||
on failure, still prints the `Host:` / `CPU:` lines and a `Health:` / `Last error:`
|
||||
report derived from `driver.GetHealth()` (which `InitializeAsync` sets to
|
||||
`Faulted` with the exception message before re-throwing). The probe should report an
|
||||
unreachable device, not crash on it.
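
A minimal sketch of the recommended shape, under stated assumptions: `ProbeReport` and its `connect` delegate are hypothetical stand-ins for `ProbeCommand` and `driver.InitializeAsync`, and the report is reduced to three lines. It only illustrates catching the connect failure and still printing the health report.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch, not the actual ProbeCommand: `connect` stands in for
// driver.InitializeAsync, and the report lines mirror the ones named above.
public static class ProbeReport
{
    public static async Task<bool> RunAsync(TextWriter output, string host, Func<Task> connect)
    {
        string? lastError = null;
        try
        {
            await connect();
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Unreachable device: keep the reason and fall through to the report.
            lastError = ex.Message;
        }

        await output.WriteLineAsync($"Host:       {host}");
        await output.WriteLineAsync($"Health:     {(lastError is null ? "Healthy" : "Faulted")}");
        if (lastError is not null)
        {
            await output.WriteLineAsync($"Last error: {lastError}");
        }

        // True when the device answered; callers can map this to an exit code.
        return lastError is null;
    }
}
```

Cancellation is deliberately excluded from the catch so Ctrl+C still aborts the command rather than being reported as a faulted device.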
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` |
| Status | Open |
|
||||
|
||||
**Description:** Every command declares the driver with `await using var driver = new
|
||||
S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block.
|
||||
`S7Driver.DisposeAsync` itself calls `ShutdownAsync`, so shutdown runs twice per command
|
||||
(three times for `subscribe`, which also unsubscribes). `ShutdownAsync` is idempotent
|
||||
(`Plc?.Close()` is best-effort, `_subscriptions` is cleared) so there is no functional
|
||||
bug, but the explicit `finally`-block `ShutdownAsync` call is redundant given the
|
||||
`await using`. It is also slightly misleading — a reader may assume the `await using` is
|
||||
not actually disposing.
|
||||
|
||||
**Recommendation:** Drop the explicit `await driver.ShutdownAsync(...)` from the
`finally` blocks and rely on `await using` for teardown; keep only the
`subscribe` command's `UnsubscribeAsync` call. Alternatively, drop `await using`
and keep the explicit `finally`. Pick one disposal mechanism per command.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-005
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` |
| Status | Open |
|
||||
|
||||
**Description:** A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`
|
||||
exists containing only an `obj/` folder — no `.csproj`, no source. The real test
|
||||
project lives at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`. The empty
|
||||
directory is a leftover from the project move into `tests/Drivers/Cli/` and is not
|
||||
referenced by `ZB.MOM.WW.OtOpcUa.slnx`. It is dead clutter that can mislead anyone
|
||||
grepping the tree for the S7 CLI test project.
|
||||
|
||||
**Recommendation:** Delete the stale `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`
directory (including its `obj/`). This is outside the module's `src/` tree but is the
S7 CLI's own orphaned test folder, so it belongs to this module's cleanup.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-006
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |
|
||||
|
||||
**Description:** The only test file covers `WriteCommand.ParseValue` and
|
||||
`ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the
|
||||
host / port / CPU / rack / slot / timeout flags onto an `S7DriverOptions` and forces
|
||||
`Probe.Enabled = false` — has no test, despite being pure, deterministic, and
|
||||
`internal`-visible to the test assembly via `InternalsVisibleTo`. A regression that
|
||||
dropped `Probe = new S7ProbeOptions { Enabled = false }` (which would start an
|
||||
unwanted background probe loop in a one-shot CLI run) or mis-mapped `TimeoutMs` would
|
||||
not be caught. `ParseValue` is also missing an explicit overflow-edge test (e.g.
|
||||
`Byte` value `256`) — the current `ParseValue_Byte_ranges` test stops at `255`.
|
||||
|
||||
**Recommendation:** Add a `BuildOptions` test (assert `Probe.Enabled == false`,
|
||||
`Timeout` matches `TimeoutMs`, and host/port/CPU/rack/slot flow through). Add an
|
||||
overflow case to the `ParseValue` numeric tests once Driver.S7.Cli-001 is resolved so
|
||||
the test asserts the wrapped `CommandException`.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7.Cli-007
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` |
| Status | Open |
|
||||
|
||||
**Description:** The Modbus CLI `SubscribeCommand` carries an explanatory comment on
|
||||
the `OnDataChange` handler ("Route every data-change event to the CliFx console (not
|
||||
System.Console — the analyzer flags it + IConsole is the testable abstraction)"). The S7
|
||||
`SubscribeCommand` is a near-verbatim copy but dropped that comment, so the non-obvious
|
||||
reason the handler uses `console.Output.WriteLine` (synchronous, on a driver background
|
||||
thread) instead of `System.Console` or the `async` `WriteLineAsync` is undocumented here.
|
||||
Minor, but the rationale is worth keeping consistent across the CLI family.
|
||||
|
||||
**Recommendation:** Re-add the one-line comment from the Modbus `SubscribeCommand` so
|
||||
the S7 copy explains why the event handler writes via `console.Output` synchronously.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
383
code-reviews/Driver.S7/findings.md
Normal file
@@ -0,0 +1,383 @@
|
||||
# Code Review — Driver.S7
|
||||
|
||||
| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 14 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.S7-001, Driver.S7-002, Driver.S7-003 |
| 2 | OtOpcUa conventions | Driver.S7-004, Driver.S7-005 |
| 3 | Concurrency & thread safety | Driver.S7-006 |
| 4 | Error handling & resilience | Driver.S7-007, Driver.S7-008, Driver.S7-009 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.S7-010 |
| 7 | Design-document adherence | Driver.S7-011, Driver.S7-012 |
| 8 | Code organization & conventions | Driver.S7-013 |
| 9 | Testing coverage | Driver.S7-014 |
| 10 | Documentation & comments | Driver.S7-012 (shared) |
|
||||
|
||||
## Findings
|
||||
|
||||
### Driver.S7-001
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `S7AddressParser.cs:93`, `S7Driver.cs:231` |
| Status | Open |
|
||||
|
||||
**Description:** S7AddressParser.Parse accepts Timer (T0) and Counter (C0)
addresses and the test suite asserts they parse successfully, but the read path
cannot serve them. Two problems compound: (1) the ReadOneAsync type-mapping switch
(lines 231-250) has no case for any Timer/Counter combination, so a Timer/Counter
tag falls through to the default arm and throws InvalidDataException with a
misleading "type-mismatch" message on every read; (2) the read is issued via
plc.ReadAsync(tag.Address, ...), passing the raw address string, and the S7.Net
string-based parser does not understand T{n}/C{n} syntax. A tag configured with a
timer or counter address passes init-time parsing (the docstring promises config
typos fail fast at init) and then fails on every read - exactly the
un-diagnosable failure mode the fail-fast parse was meant to prevent.
|
||||
|
||||
**Recommendation:** Either drop Timer/Counter from S7AddressParser and S7Area
|
||||
until they are wired through to S7.Net, or implement the Timer/Counter read path.
|
||||
If kept, reject Timer/Counter tags at InitializeAsync with a clear "not yet
|
||||
supported" error rather than letting them parse clean.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-002
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:350` |
| Status | Open |
|
||||
|
||||
**Description:** MapDataType collapses S7DataType.UInt32 to DriverDataType.Int32.
|
||||
UInt32 values above int.MaxValue (2^31-1) wrap to negative when surfaced to the
|
||||
OPC UA client, silently corrupting the value. The inline comment only flags
|
||||
Int64/UInt64 as "widens; lossy" but UInt32 to Int32 is equally lossy and is not
|
||||
called out.
|
||||
|
||||
**Recommendation:** Map UInt32/UInt16 to a DriverDataType wide enough to hold the
|
||||
unsigned range, or add the missing unsigned DriverDataType members. At minimum
|
||||
correct the comment so the lossiness of UInt32 is documented.
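
The wrap is ordinary two's-complement reinterpretation. The sketch below is illustrative only: `UInt32Widening` is a hypothetical helper, not code from S7Driver.cs, and it simply contrasts the lossy cast with a lossless widening to a 64-bit signed value.

```csharp
using System;

// Hypothetical helper, not part of the driver: contrasts the lossy
// UInt32 -> Int32 reinterpretation with a lossless widening to Int64.
public static class UInt32Widening
{
    // Reinterprets the 32 bits as signed; values above int.MaxValue go negative.
    public static int AsInt32Lossy(uint raw) => unchecked((int)raw);

    // Widens to 64 bits, so the full unsigned 32-bit range is preserved.
    public static long AsInt64Lossless(uint raw) => raw;
}
```

For a raw value of 3,000,000,000 the lossy cast yields 3,000,000,000 - 2^32 = -1,294,967,296, while the widened value stays 3,000,000,000.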
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `S7Driver.cs:172`, `S7Driver.cs:255` |
| Status | Open |
|
||||
|
||||
**Description:** ReadAsync and WriteAsync dereference fullReferences.Count /
|
||||
writes.Count with no null guard. A null argument throws NullReferenceException
|
||||
rather than ArgumentNullException, and the NRE escapes before the _gate is taken
|
||||
so it is not wrapped in a per-item status. DiscoverAsync correctly uses
|
||||
ArgumentNullException.ThrowIfNull(builder); the read/write entry points are
|
||||
inconsistent with it.
|
||||
|
||||
**Recommendation:** Add ArgumentNullException.ThrowIfNull for the list parameters
|
||||
at the top of ReadAsync and WriteAsync.
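
A small sketch of the guard, assuming .NET 6 or later. `NullGuardSketch.CountReferences` is a hypothetical stand-in for the read entry point; only the first statement is the point.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the driver's read entry point; only the guard matters.
public static class NullGuardSketch
{
    public static int CountReferences(IReadOnlyList<string> fullReferences)
    {
        // Throws ArgumentNullException naming the parameter, instead of a bare
        // NullReferenceException from the .Count dereference below.
        ArgumentNullException.ThrowIfNull(fullReferences);
        return fullReferences.Count;
    }
}
```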
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs` (whole file) |
| Status | Open |
|
||||
|
||||
**Description:** The driver performs no logging. CLAUDE.md Library Preferences
|
||||
mandate Serilog with a rolling daily file sink. Every error path is an empty
|
||||
catch block (Initialize cleanup line 130, ShutdownAsync lines 142/149/153,
|
||||
ProbeLoop line 483, PollLoop lines 396/406, Dispose line 511). Connection faults,
|
||||
probe transitions, PUT/GET-disabled config errors, and poll-loop exceptions are
|
||||
all silently swallowed. An operator has only the DriverHealth.LastError string
|
||||
and no event trail to diagnose an intermittent PLC.
|
||||
|
||||
**Recommendation:** Inject an ILogger/ILoggerFactory and log connect
|
||||
success/failure, probe Running/Stopped transitions, PUT/GET-disabled detection,
|
||||
and swallowed poll-loop / shutdown exceptions.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-005
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `S7Driver.cs:33`, `S7Driver.cs:433` |
| Status | Open |
|
||||
|
||||
**Description:** System.Collections.Concurrent.ConcurrentDictionary is written
|
||||
out with a fully-qualified namespace at the field declarations instead of a
|
||||
using System.Collections.Concurrent directive. ImplicitUsings is enabled and the
|
||||
rest of the codebase relies on using directives; the inline FQN is inconsistent
|
||||
with house style. Similar redundant global::S7.Net.* qualifiers appear throughout
|
||||
S7Driver.cs despite the file-top using S7.Net.
|
||||
|
||||
**Recommendation:** Add using System.Collections.Concurrent and drop the
|
||||
redundant global::S7.Net. qualifiers where using S7.Net already covers them.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-006
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` |
| Status | Open |
|
||||
|
||||
**Description:** Disposal races with the in-flight probe / poll tasks.
ShutdownAsync calls _probeCts.Cancel() and cancels each subscription CTS, but it
does not await the ProbeLoopAsync / PollLoopAsync tasks (they are fire-and-forget
Task.Run with the task handle discarded). DisposeAsync then calls ShutdownAsync
followed immediately by _gate.Dispose(). A probe or poll iteration that is
between _gate.WaitAsync and _gate.Release() when cancellation fires will then call
Release() (line 479) on a disposed semaphore, and an iteration still waiting in
WaitAsync will observe the disposal; either way the result is
ObjectDisposedException. The probe loop's broad catch swallows it, but the
disposal-ordering bug is real: the semaphore can be disposed while a worker still
holds or is waiting on it. The same applies to _probeCts.Dispose() (line 143)
running while ProbeLoopAsync may still touch the linked token.
|
||||
|
||||
**Recommendation:** Track the probe and poll Task handles, and in ShutdownAsync
|
||||
(or DisposeAsync) await Task.WhenAll(...) with a bounded timeout after cancelling,
|
||||
before disposing _gate and the CTS objects.
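
One possible shape for that ordering, as a sketch under stated assumptions: `WorkerOwner` is a hypothetical class rather than the real S7Driver, the loop body is empty, and the five-second bound is an arbitrary illustrative value. `Task.WaitAsync(TimeSpan)` requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker owner, not the real S7Driver: it keeps the Task handles
// so teardown can wait for the loops to exit before disposing shared state.
public sealed class WorkerOwner : IAsyncDisposable
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly CancellationTokenSource _cts = new();
    private readonly List<Task> _workers = new();

    public bool AllWorkersCompleted => _workers.TrueForAll(t => t.IsCompleted);

    public void Start(TimeSpan interval) =>
        _workers.Add(Task.Run(() => LoopAsync(interval, _cts.Token)));

    private async Task LoopAsync(TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                await _gate.WaitAsync(ct);
                try { /* the PLC round-trip would go here */ }
                finally { _gate.Release(); }
                await Task.Delay(interval, ct);
            }
        }
        catch (OperationCanceledException) { /* normal shutdown path */ }
    }

    public async ValueTask DisposeAsync()
    {
        _cts.Cancel();
        try
        {
            // Bounded wait: let every loop leave the gate before it is disposed.
            await Task.WhenAll(_workers).WaitAsync(TimeSpan.FromSeconds(5));
        }
        catch (TimeoutException) { /* a worker is stuck; dispose anyway */ }
        _gate.Dispose();
        _cts.Dispose();
    }
}
```

The key design choice is that `_gate` and the CTS are disposed only after the tracked tasks have finished or the bounded wait has expired, so a worker cannot touch a disposed semaphore on the normal path.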
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-007
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` |
| Status | Open |
|
||||
|
||||
**Description:** PUT/GET-disabled handling contradicts the design and the
module's own docstring. driver-specs.md section 5 (line 434) and the
S7DriverOptions class remark both state PUT/GET-disabled must be mapped to
BadNotSupported and surfaced as a configuration alert, not a transient fault,
because blind retry is wasted effort. The actual code (ReadAsync, lines 200-208)
catches every S7.Net.PlcException and maps it to StatusBadDeviceFailure, then
sets health to Degraded. Consequences: (1) a genuinely transient PlcException
(e.g. CPU briefly in STOP) is reported identically to a permanent PUT/GET
misconfiguration - the operator cannot tell a config problem from a transient
one, which is the exact distinction the spec demands; (2) the promised
BadNotSupported status code is never produced, so the S7DriverOptions docstring
is now false.
|
||||
|
||||
**Recommendation:** Inspect PlcException.ErrorCode and map the
|
||||
PUT/GET-disabled / access-denied code to BadNotSupported with a distinct
|
||||
config-alert health state; keep BadDeviceFailure/Degraded only for genuine
|
||||
device-fault error codes.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-008
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:286` |
| Status | Open |
|
||||
|
||||
**Description:** The WriteAsync catch ladder is coarser than the one in ReadAsync
and loses information. The generic catch (Exception) maps everything - socket errors,
timeouts, OverflowException from Convert.ToInt16 of an out-of-range value,
FormatException or InvalidCastException from a wrongly-typed value - to
StatusBadInternalError. A genuine transport fault during a write is reported to the
client as an internal error rather than BadCommunicationError, and unlike ReadAsync
the write path never updates _health on failure, so a PLC that is down stays Healthy
in the dashboard as long as only writes are attempted. OperationCanceledException is
also caught and turned into a status code rather than propagating.
|
||||
|
||||
**Recommendation:** Mirror the ReadAsync catch structure: let
|
||||
OperationCanceledException propagate, map socket/timeout faults to
|
||||
BadCommunicationError, map value-conversion failures to a distinct out-of-range
|
||||
status, and update _health to Degraded on transport failures.
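
A hedged sketch of such a catch structure. `WriteStatus` and `WriteFaultMapper` are hypothetical names, the enum members only echo the status names used in this finding, and the choice of `BadTypeMismatch` for format/cast failures is an assumption rather than something the driver or spec prescribes.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical status names; the real driver uses OPC UA status-code constants.
public enum WriteStatus { BadCommunicationError, BadOutOfRange, BadTypeMismatch, BadInternalError }

public static class WriteFaultMapper
{
    // Classifies a non-cancellation failure and says whether health should degrade.
    public static (WriteStatus Status, bool DegradeHealth) Map(Exception ex) => ex switch
    {
        SocketException or IOException or TimeoutException => (WriteStatus.BadCommunicationError, true),
        OverflowException => (WriteStatus.BadOutOfRange, false),
        FormatException or InvalidCastException => (WriteStatus.BadTypeMismatch, false),
        _ => (WriteStatus.BadInternalError, false),
    };

    // Runs one write; returns null on success or the mapped status on failure.
    public static WriteStatus? Execute(Action write, Action onTransportFault)
    {
        try
        {
            write();
            return null;
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation propagates instead of becoming a status code
        }
        catch (Exception ex)
        {
            var (status, degrade) = Map(ex);
            if (degrade) onTransportFault(); // e.g. set health to Degraded
            return status;
        }
    }
}
```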
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-009
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `S7Driver.cs:392` |
| Status | Open |
|
||||
|
||||
**Description:** The subscription poll loop never reflects sustained polling
|
||||
failure anywhere an operator can see it. PollLoopAsync swallows every
|
||||
non-cancellation exception with an empty catch and the comment claims "the health
|
||||
surface reflects it" - but a poll failure routes through ReadAsync, which only
|
||||
sets DriverState.Degraded when the per-tag read throws inside the gate;
|
||||
exceptions thrown before that (e.g. RequirePlc() when Plc is null after a drop)
|
||||
bypass the health update entirely. A subscription against an uninitialized or
|
||||
dropped driver loops forever silently, with no backoff - re-polling every
|
||||
Interval indefinitely on a hard failure.
|
||||
|
||||
**Recommendation:** Have the poll loop update _health on repeated failure and
|
||||
apply a capped backoff after consecutive errors; at minimum log the swallowed
|
||||
exception (see Driver.S7-004).
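
As a sketch of what a capped backoff could look like, assuming doubling per consecutive failure and a cap no smaller than the interval. `PollBackoff.NextDelay` is a hypothetical helper, not existing driver code.

```csharp
using System;

// Hypothetical helper: delay before the next poll after N consecutive failures.
public static class PollBackoff
{
    public static TimeSpan NextDelay(TimeSpan interval, int consecutiveFailures, TimeSpan cap)
    {
        if (consecutiveFailures <= 0) return interval;

        // Double per failure; clamp the exponent so the multiplier stays finite.
        var factor = Math.Pow(2, Math.Min(consecutiveFailures, 30));
        var ticks = Math.Min(interval.Ticks * factor, cap.Ticks);
        return TimeSpan.FromTicks((long)ticks);
    }
}
```

With a 1-second interval and a 30-second cap, the delays after 0 to 5 consecutive failures are 1, 2, 4, 8, 16, and 30 seconds; a successful poll would reset the failure count to zero.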
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-010
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `S7Driver.cs:504` |
| Status | Open |
|
||||
|
||||
**Description:** Dispose() is implemented as
|
||||
DisposeAsync().AsTask().GetAwaiter().GetResult() - sync-over-async. Inside the
|
||||
generic host this is currently safe (no captured SynchronizationContext), but it
|
||||
is a known deadlock pattern. The only async work behind DisposeAsync is
|
||||
ShutdownAsync, which does nothing async (returns Task.CompletedTask). The
|
||||
blocking wrap is unnecessary risk.
|
||||
|
||||
**Recommendation:** Since ShutdownAsync is effectively synchronous, have Dispose()
|
||||
perform the teardown directly (cancel CTSs, close Plc, dispose _gate) without
|
||||
round-tripping through the async path.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-011
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` |
| Status | Open |
|
||||
|
||||
**Description:** S7Driver ignores the driverConfigJson parameter on both
|
||||
InitializeAsync and ReinitializeAsync. The IDriver contract states InitializeAsync
|
||||
initializes the driver "from its DriverConfig JSON" and ReinitializeAsync "applies
|
||||
a config change in place". All configuration is instead captured in the
|
||||
constructor (S7DriverOptions options), and ReinitializeAsync simply calls
|
||||
ShutdownAsync then InitializeAsync with the same options object. Consequently a
|
||||
config change delivered to ReinitializeAsync (the documented IGenerationApplier
|
||||
recovery path per driver-stability.md) is silently discarded - the driver
|
||||
re-opens with the old config. This breaks the only Core-initiated in-process
|
||||
recovery path.
|
||||
|
||||
**Recommendation:** Either re-parse driverConfigJson inside
|
||||
InitializeAsync/ReinitializeAsync and rebuild _options from it, or document
|
||||
explicitly that S7 reconfiguration requires instance recreation and have
|
||||
ReinitializeAsync signal that the passed JSON is unused so the contract mismatch
|
||||
is visible.
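
A sketch of the first option, assuming System.Text.Json. `S7OptionsSketch` is a hypothetical, trimmed-down options shape with assumed member names and defaults; the real S7DriverOptions and its JSON contract may differ.

```csharp
using System;
using System.Text.Json;

// Hypothetical, trimmed-down options shape; the real S7DriverOptions differs.
public sealed class S7OptionsSketch
{
    public string Host { get; init; } = "";
    public int Port { get; init; } = 102;
    public int TimeoutMs { get; init; } = 5000;
}

public static class ConfigReparse
{
    private static readonly JsonSerializerOptions JsonOptions =
        new() { PropertyNameCaseInsensitive = true };

    // Builds fresh options from the JSON passed to Initialize/Reinitialize
    // instead of reusing whatever the constructor captured.
    public static S7OptionsSketch Parse(string driverConfigJson)
    {
        ArgumentNullException.ThrowIfNull(driverConfigJson);
        return JsonSerializer.Deserialize<S7OptionsSketch>(driverConfigJson, JsonOptions)
               ?? throw new JsonException("driverConfigJson deserialized to null.");
    }
}
```

A reinitialize path built on this would call `Parse` on the incoming JSON, replace the stored options with the result, and only then shut down and reconnect, so a changed host or timeout actually takes effect.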
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-012
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `S7DriverOptions.cs:59`, `S7Driver.cs:457` |
| Status | Open |
|
||||
|
||||
**Description:** S7ProbeOptions.ProbeAddress is configured (default "MW0"),
|
||||
documented at length ("the driver runs a tick loop that issues a cheap read
|
||||
against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO
|
||||
(S7ProbeDto.ProbeAddress), and parsed from JSON - but it is never read by any
|
||||
code. ProbeLoopAsync probes liveness via plc.ReadStatusAsync (CPU status), not via
|
||||
a read of ProbeAddress. The XML doc on the S7DriverOptions.Probe property and on
|
||||
ProbeAddress describes behaviour the driver does not implement. An operator who
|
||||
sets ProbeAddress to a known-good DB word expecting the probe to exercise it will
|
||||
see no effect.
|
||||
|
||||
**Recommendation:** Either make ProbeLoopAsync actually read ProbeAddress
|
||||
(parsing it once at init and rejecting a bad value early), or delete ProbeAddress
|
||||
from S7ProbeOptions/S7ProbeDto and correct the XML docs to describe the
|
||||
ReadStatusAsync-based probe.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-013
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `S7DriverOptions.cs:90`, `S7Driver.cs:300` |
| Status | Open |
|
||||
|
||||
**Description:** S7TagDefinition.StringLength is a public configured/JSON-bound
parameter (default 254) but is dead: S7DataType.String reads and writes both
throw NotSupportedException ("...land in a follow-up PR"), so StringLength is
never consumed. Likewise S7DataType.Int64, UInt64, Float64, String, and DateTime
are exposed as configurable, are mapped by MapDataType to real DriverDataType
values, and pass DiscoverAsync (creating address-space nodes), yet every
read/write of them throws NotSupportedException, becoming BadNotSupported. A site
can configure a Float64 tag, see the node appear, and get BadNotSupported on
every access. The scaffold/follow-up-PR split leaks half-implemented types into
the configurable surface.
|
||||
|
||||
**Recommendation:** Reject the not-yet-implemented S7DataType values (and
|
||||
StringLength) at InitializeAsync / factory validation with a clear "not yet
|
||||
supported" error, so a partially-implemented type cannot be configured into a
|
||||
live address space.
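
A minimal sketch of such an init-time check. `S7DataTypeSketch` is a stand-in enum that mirrors the type names listed in these findings, and `TagTypeValidator` is a hypothetical helper; neither is the driver's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in enum mirroring the data-type names listed in the findings.
public enum S7DataTypeSketch
{
    Bool, Byte, Int16, UInt16, Int32, UInt32, Float32,
    Int64, UInt64, Float64, String, DateTime
}

public static class TagTypeValidator
{
    private static readonly HashSet<S7DataTypeSketch> NotYetImplemented = new()
    {
        S7DataTypeSketch.Int64, S7DataTypeSketch.UInt64, S7DataTypeSketch.Float64,
        S7DataTypeSketch.String, S7DataTypeSketch.DateTime,
    };

    // Throws once at init time, listing every offending tag, instead of
    // letting each read or write fail later.
    public static void EnsureSupported(IEnumerable<(string Name, S7DataTypeSketch Type)> tags)
    {
        var unsupported = tags
            .Where(t => NotYetImplemented.Contains(t.Type))
            .Select(t => $"{t.Name} ({t.Type})")
            .ToList();

        if (unsupported.Count > 0)
        {
            throw new NotSupportedException(
                "S7 data types not yet supported: " + string.Join(", ", unsupported));
        }
    }
}
```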
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.S7-014
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Medium |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` |
| Status | Open |
|
||||
|
||||
**Description:** Test coverage has notable gaps for the driver's behavioural
core: (1) no test exercises the ReadOneAsync type-reinterpret switch (Int16 from
ushort, Int32 from uint, Float32 from UInt32 bits) - the most logic-heavy method
in the driver is untested, and the unsigned/signed unchecked casts are
unverified; (2) no test covers a Timer/Counter tag end-to-end, which would have
caught Driver.S7-001; (3) no test covers WriteOneAsync boxing conversions or
the out-of-range Convert failure paths; (4) the read-write tests only cover error
paths (uninitialized, bad address) - the happy path is explicitly deferred to "a
follow-up PR" with no mock S7 server, leaving the entire successful read, write,
poll, and probe-transition surface untested; (5) ReinitializeAsync and the
driverConfigJson-ignored behaviour (Driver.S7-011) have no test.
|
||||
|
||||
**Recommendation:** Add unit tests for ReadOneAsync/WriteOneAsync type mapping by
|
||||
factoring the pure reinterpret/boxing logic out of the PLC round-trip so it is
|
||||
testable without a live PLC, and add a Timer/Counter rejection test. Track the
|
||||
live/mock-server happy-path coverage as an explicit follow-up rather than an
|
||||
open-ended deferral.
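
To illustrate the factoring, here is a sketch of what a pure reinterpret helper might look like. `S7Reinterpret` is a hypothetical name, and the raw input widths are taken from the casts named in the Description rather than from the driver source.

```csharp
using System;

// Hypothetical extraction of the reinterpret logic so it can be unit-tested
// without a PLC. Each method is a pure bit-level reinterpretation.
public static class S7Reinterpret
{
    // 16 raw bits read as unsigned, surfaced as signed.
    public static short ToInt16(ushort raw) => unchecked((short)raw);

    // 32 raw bits read as unsigned, surfaced as signed.
    public static int ToInt32(uint raw) => unchecked((int)raw);

    // 32 raw bits interpreted as an IEEE 754 single-precision float.
    public static float ToFloat32(uint rawBits) =>
        BitConverter.Int32BitsToSingle(unchecked((int)rawBits));
}
```

With the logic isolated like this, edge cases such as `0xFFFF` mapping to `-1` or the bit pattern `0x3F800000` mapping to `1.0f` become one-line assertions.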
|
||||
|
||||
**Resolution:** _(open)_
|
||||
202
code-reviews/Driver.TwinCAT.Cli/findings.md
Normal file
@@ -0,0 +1,202 @@
|
||||
# Code Review — Driver.TwinCAT.Cli
|
||||
|
||||
| Field | Value |
|---|---|
| Module | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 7 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.
|
||||
|
||||
| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.TwinCAT.Cli-001 |
| 2 | OtOpcUa conventions | No issues found |
| 3 | Concurrency & thread safety | Driver.TwinCAT.Cli-002 |
| 4 | Error handling & resilience | Driver.TwinCAT.Cli-003 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | No issues found |
| 7 | Design-document adherence | Driver.TwinCAT.Cli-004 |
| 8 | Code organization & conventions | Driver.TwinCAT.Cli-005 |
| 9 | Testing coverage | Driver.TwinCAT.Cli-006 |
| 10 | Documentation & comments | Driver.TwinCAT.Cli-007 |
|
||||
|
||||
## Findings
|
||||
|
||||
<!-- One ### entry per finding. IDs are <Module>-NNN, sequential within the module,
|
||||
never reused. Findings are never deleted — close them by changing Status and
|
||||
completing Resolution. -->
|
||||
|
||||
### Driver.TwinCAT.Cli-001
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` |
| Status | Open |
|
||||
|
||||
**Description:** Numeric command options are accepted without range validation. `--timeout-ms`
|
||||
feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative
|
||||
value yields `TimeSpan.Zero`/a negative `TimeSpan`, which is then handed to the driver's
|
||||
`TwinCATDriverOptions.Timeout` and on to `ITwinCATClient.ConnectAsync`, producing an immediate
|
||||
failure or undefined behaviour rather than a clear "bad argument" message. The same applies to
|
||||
`subscribe --interval-ms` (negative -> `TimeSpan.FromMilliseconds(negative)` passed to
|
||||
`SubscribeAsync`) and `--ams-port` (`AmsPort` accepts negative / out-of-range port numbers,
|
||||
which only surface later as an opaque transport error). For a commissioning/diagnostic tool the
|
||||
failure mode should be a readable up-front rejection.
|
||||
|
||||
**Recommendation:** Validate the numeric options at the top of each `ExecuteAsync` (or via a
|
||||
shared helper on `TwinCATCommandBase`) and throw `CliFx.Exceptions.CommandException` with a
|
||||
clear message when `TimeoutMs <= 0`, `IntervalMs <= 0`, or `AmsPort` falls outside `1..65535`.
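
A sketch of the shared check, kept free of external packages. `NumericOptionGuard` is a hypothetical helper, and it throws `ArgumentOutOfRangeException` purely so the example is self-contained; in the CLI the same conditions would raise `CliFx.Exceptions.CommandException` as recommended.

```csharp
using System;

// Hypothetical validation helper. The CLI itself would throw
// CliFx.Exceptions.CommandException; ArgumentOutOfRangeException is used
// here only to avoid an external package reference in the sketch.
public static class NumericOptionGuard
{
    public static void Validate(int timeoutMs, int intervalMs, int amsPort)
    {
        if (timeoutMs <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(timeoutMs), timeoutMs, "--timeout-ms must be greater than 0.");

        if (intervalMs <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(intervalMs), intervalMs, "--interval-ms must be greater than 0.");

        if (amsPort is < 1 or > 65535)
            throw new ArgumentOutOfRangeException(
                nameof(amsPort), amsPort, "--ams-port must be in the range 1..65535.");
    }
}
```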
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.TwinCAT.Cli-002
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `Commands/SubscribeCommand.cs:46-58` |
| Status | Open |
|
||||
|
||||
**Description:** The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously.
|
||||
In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads`
|
||||
notification callback thread (see `TwinCATDriver.SubscribeAsync`, which invokes `OnDataChange`
|
||||
from the ADS `AddNotificationAsync` callback). That write can interleave with the main thread's
|
||||
`console.Output.WriteLineAsync(...)` "Subscribed to ..." banner and with subsequent change
|
||||
events if the PLC pushes faster than a single write completes. A `TextWriter` is not guaranteed
|
||||
thread-safe, so output lines can be garbled — undesirable for a tool whose stated purpose is
|
||||
producing clean screen-recorded bug-report timelines. The same pattern exists in the other
|
||||
driver CLIs (S7/Modbus), but those go through `PollGroupEngine`, whose change callbacks are
|
||||
serialised on one poll loop; the TwinCAT native path has no such serialisation.
|
||||
|
||||
**Recommendation:** Serialise console writes from the change handler, e.g. wrap the
|
||||
`WriteLine` body in a `lock` on a private object that the banner write also takes, or use
|
||||
`TextWriter.Synchronized`. At minimum, gate it so the banner is written before the
|
||||
subscription is registered (it already is) and lock the per-event writes against each other.
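
The locking option could be sketched as follows. `SerializedLineWriter` is a hypothetical wrapper around whatever `TextWriter` the console exposes; it is not part of CliFx or the CLI.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: every writer thread funnels through one lock, so a
// line written from a notification callback cannot interleave with another.
public sealed class SerializedLineWriter
{
    private readonly TextWriter _inner;
    private readonly object _sync = new();

    public SerializedLineWriter(TextWriter inner) => _inner = inner;

    public void WriteLine(string line)
    {
        lock (_sync)
        {
            _inner.WriteLine(line);
        }
    }
}
```

Routing both the banner and the per-event lines through the same wrapper instance gives them a single lock; `TextWriter.Synchronized` achieves a similar effect without a custom type.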
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.TwinCAT.Cli-003
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `Commands/SubscribeCommand.cs:56-58` |
| Status | Open |
|
||||
|
||||
**Description:** The subscribe banner reports the mechanism purely from the `--poll-only` flag
(`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`)
states the banner "announces which mechanism is in play". The CLI always declares exactly one
tag, so a registration that produces zero notification handles is unlikely, but
`TwinCATDriver.SubscribeAsync` silently `continue`s past any reference not found in
`_tagsByName`/`_devices`, and a poll-mode fallback inside the driver is also possible in
principle. The banner therefore asserts a mechanism it has not actually confirmed. It is
informational only, so the impact is limited to a misleading diagnostic line.
|
||||
|
||||
**Recommendation:** Either derive the banner text from observable subscription state (e.g. the
|
||||
returned `ISubscriptionHandle.DiagnosticId`, which is `twincat-native-sub-*` for the native
|
||||
path vs the `PollGroupEngine` handle for poll mode) or soften the wording to "(requested:
|
||||
ADS notification)" so it does not over-claim.
|
||||
|
||||
**Resolution:** _(open)_
|
||||
|
||||
### Driver.TwinCAT.Cli-004
|
||||
|
||||
| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` |
| Status | Open |
|
||||
|
||||
**Description:** `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by
|
||||
`browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so
|
||||
`UseNativeNotifications = !PollOnly` has no observable effect on a browse run. The flag still
|
||||
appears in `otopcua-twincat-cli browse --help`, implying it changes browse behaviour when it
|
||||
does not. `docs/Driver.TwinCAT.Cli.md` documents `--poll-only` only under `subscribe` and lists
|
||||
no per-command flags for `browse` beyond `--prefix`/`--max`, so the help text and the docs
|
||||
disagree.
|
||||
|
||||
**Recommendation:** Move `--poll-only` (and arguably the notification-only relevance of the
|
||||
flag) onto an intermediate base shared by only `probe`/`read`/`subscribe`, or override/hide it
|
||||
for `browse`. Alternatively document explicitly that the flag is a no-op for `browse`.

**Resolution:** _(open)_

### Driver.TwinCAT.Cli-005

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` |
| Status | Open |

**Description:** The `--type` option is declared with the short alias `-t` on `read`, `write`,
and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short
alias. An operator who has internalised `-t` from the other three verbs will get a CliFx
"unknown option" error on `probe -t Bool`. The inconsistency is gratuitous — all four commands
take the same `TwinCATDataType` option.

**Recommendation:** Add the `'t'` short alias to `ProbeCommand`'s `--type` option to match the
other three commands.

**Resolution:** _(open)_

### Driver.TwinCAT.Cli-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` |
| Status | Open |

**Description:** The only test file covers `WriteCommand.ParseValue` and
`ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested:
`TwinCATCommandBase.Gateway` (the `ads://{netId}:{port}` string the driver's
`TwinCATAmsAddress.TryParse` consumes — a regression here breaks every command), `BuildOptions`
(tag wiring, `UseNativeNotifications` toggle, `Probe.Enabled = false`), and `BrowseCommand`'s
`CollectingAddressSpaceBuilder` with its `--prefix`/`--max` filtering and the `RO`/`RW` access
derivation. These are pure and can be unit-tested without an AMS router. `InternalsVisibleTo`
is already wired for the test assembly. Note also the stale empty sibling test directory
`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests` (no project, no files) — out of this
module's scope but worth flagging to whoever owns the test tree.

**Recommendation:** Add unit tests for `Gateway`/`DriverInstanceId` string composition, for
`BuildOptions` field wiring, and for the `CollectingAddressSpaceBuilder` prefix/max filtering
and access-classification logic.

**Resolution:** _(open)_

### Driver.TwinCAT.Cli-007

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `TwinCATCommandBase.cs:31-36` |
| Status | Open |

**Description:** The `Timeout` override has an empty `init` accessor with the comment
`/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared
`abstract { get; init; }`, the override must supply an `init`, but here it silently discards
any value. This is intentional, yet the empty body invites a future maintainer to "fix" it by
adding a backing field, which would then diverge from `TimeoutMs`. The XML `<inheritdoc/>`
gives no hint of the deliberate no-op. This is a maintainability/clarity nit, not a bug.

**Recommendation:** Replace `<inheritdoc/>` with a short summary stating that `Timeout` is a
computed projection of `--timeout-ms` and the `init` accessor is intentionally a no-op, so the
design intent survives refactoring.
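
One possible shape for that summary, sketched against reduced stand-ins for the two base
classes. The `...Sketch` type names and the 2000 ms default are assumptions for illustration;
only the `abstract { get; init; }` shape and the `TimeoutMs` relationship come from the
finding.

```csharp
using System;

// Hypothetical reduction of DriverCommandBase: only the Timeout shape matters here.
public abstract class DriverCommandBaseSketch
{
    public abstract TimeSpan Timeout { get; init; }
}

public class TwinCATCommandBaseSketch : DriverCommandBaseSketch
{
    // Stand-in for the --timeout-ms option; the default value is an assumption.
    public int TimeoutMs { get; init; } = 2000;

    /// <summary>
    /// Computed projection of <c>--timeout-ms</c>. The <c>init</c> accessor exists only
    /// because the base property is declared <c>abstract { get; init; }</c>; it is
    /// intentionally a no-op. Do not add a backing field: it would diverge from
    /// <see cref="TimeoutMs"/>.
    /// </summary>
    public override TimeSpan Timeout
    {
        get => TimeSpan.FromMilliseconds(TimeoutMs);
        init { /* intentionally discarded: TimeoutMs is the single source of truth */ }
    }
}
```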

**Resolution:** _(open)_
426
code-reviews/Driver.TwinCAT/findings.md
Normal file
@@ -0,0 +1,426 @@
# Code Review — Driver.TwinCAT

| Field | Value |
|---|---|
| Module | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT` |
| Reviewer | Claude Code |
| Review date | 2026-05-22 |
| Commit reviewed | `76d35d1` |
| Status | Reviewed |
| Open findings | 16 |

## Checklist coverage

A comprehensive review completes every category, recording "No issues found" where
a category produced nothing rather than leaving it blank.

| # | Category | Result |
|---|---|---|
| 1 | Correctness & logic bugs | Driver.TwinCAT-001, Driver.TwinCAT-002, Driver.TwinCAT-003, Driver.TwinCAT-004 |
| 2 | OtOpcUa conventions | Driver.TwinCAT-005, Driver.TwinCAT-006 |
| 3 | Concurrency & thread safety | Driver.TwinCAT-007, Driver.TwinCAT-008, Driver.TwinCAT-009 |
| 4 | Error handling & resilience | Driver.TwinCAT-010, Driver.TwinCAT-011 |
| 5 | Security | No issues found |
| 6 | Performance & resource management | Driver.TwinCAT-012 |
| 7 | Design-document adherence | Driver.TwinCAT-013, Driver.TwinCAT-014 |
| 8 | Code organization & conventions | Driver.TwinCAT-015 |
| 9 | Testing coverage | Driver.TwinCAT-016 |
| 10 | Documentation & comments | Driver.TwinCAT-004 (data-type comment), Driver.TwinCAT-014 |

## Findings

### Driver.TwinCAT-001

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `TwinCATDriver.cs:41-78` |
| Status | Open |

**Description:** `InitializeAsync` and `ReinitializeAsync` both ignore their `driverConfigJson`
parameter entirely. `InitializeAsync` builds device/tag state exclusively from `_options`,
captured once in the constructor. `ReinitializeAsync` calls `ShutdownAsync` then
`InitializeAsync(driverConfigJson, ...)` — but since `InitializeAsync` never reads
`driverConfigJson`, a `ReinitializeAsync` with a changed config silently re-applies the
original constructor-time options. Per `IDriver.ReinitializeAsync` docs and
`docs/v2/driver-stability.md` section "In-process only (Tier A/B)", `Reinitialize` is the only
Core-initiated path to apply a new config generation without a process restart. As written,
config changes (added/removed devices, tags, probe settings) to a TwinCAT driver instance
are never picked up at runtime.

**Recommendation:** Parse `driverConfigJson` in `InitializeAsync` (reuse
`TwinCATDriverFactoryExtensions` DTO + option-builder logic — extract it to a shared static
parser) and assign the resulting options to a mutable field, rather than relying on the
constructor-captured `_options`. Alternatively, document explicitly that the constructor is
the sole config source and have the Core recreate the driver instance on config change.
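
A rough sketch of the first option, assuming the config can be deserialized with
`System.Text.Json`. `TwinCATConfigSketch` and `ReinitializableDriverSketch` are hypothetical
stand-ins; the real DTO and option-builder logic live in `TwinCATDriverFactoryExtensions` and
will differ in shape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, heavily reduced config DTO (the real one has devices, tags, probe settings).
public sealed record TwinCATConfigSketch(List<string> Devices, List<string> Tags);

public sealed class ReinitializableDriverSketch
{
    private static readonly JsonSerializerOptions JsonOptions =
        new() { PropertyNameCaseInsensitive = true };

    // Mutable: replaced on every (re)initialization instead of captured once.
    private volatile TwinCATConfigSketch _options;

    public ReinitializableDriverSketch(TwinCATConfigSketch initial) => _options = initial;

    public TwinCATConfigSketch CurrentOptions => _options;

    public Task InitializeAsync(string driverConfigJson, CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        if (!string.IsNullOrWhiteSpace(driverConfigJson))
        {
            // The supplied JSON wins over the constructor-time options.
            _options = JsonSerializer.Deserialize<TwinCATConfigSketch>(driverConfigJson, JsonOptions)
                ?? throw new InvalidOperationException("Driver config JSON deserialized to null.");
        }
        // ... build device/tag state from _options here ...
        return Task.CompletedTask;
    }

    public async Task ReinitializeAsync(string driverConfigJson, CancellationToken ct)
    {
        // The real driver would run ShutdownAsync first.
        await InitializeAsync(driverConfigJson, ct);
    }
}
```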

**Resolution:** _(open)_

### Driver.TwinCAT-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `TwinCATDataType.cs:34-48`, `AdsTwinCATClient.cs:264-281` |
| Status | Open |

**Description:** `TwinCATDataTypeExtensions.ToDriverDataType` maps `LInt` and `ULInt`
(signed/unsigned 64-bit) to `DriverDataType.Int32` (comment: "matches Int64 gap"). The
address-space layer therefore creates a 32-bit OPC UA node for a 64-bit PLC value. Meanwhile
`AdsTwinCATClient.MapToClrType` reads `LInt`/`ULInt` as `long`/`ulong` (64-bit), so the read
path returns a boxed `long`/`ulong` into a node typed Int32. `DriverDataType` already has an
`Int64`/`UInt64` member (`DriverDataType.cs:16-19`), so the "gap" the comment refers to does
not exist. Values above `int.MaxValue` are silently truncated or produce a type mismatch at
the OPC UA encode layer; `UDInt` is also folded into `Int32`, so unsigned 32-bit values in
the range 0x80000000 to 0xFFFFFFFF surface as negative.

**Recommendation:** Map `LInt` to `Int64`, `ULInt` to `UInt64`, `UDInt` to `UInt32`, `UInt`
to `UInt16`, and `USInt`/`SInt` to their natural widths. Remove the stale "Int64 gap" comment.
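
The width-preserving mapping might look roughly like the following. Both enums are
hypothetical reductions: the finding confirms only that `DriverDataType` has `Int64`/`UInt64`,
so the presence of the 8-, 16-, and unsigned 32-bit members is an assumption to check against
`DriverDataType.cs`.

```csharp
using System;

// Hypothetical subset of TwinCATDataType (IEC 61131-3 integer types only).
public enum PlcTypeSketch { Bool, SInt, USInt, Int, UInt, DInt, UDInt, LInt, ULInt }

// Hypothetical subset of DriverDataType; members other than Int64/UInt64 are assumed.
public enum DriverTypeSketch { Boolean, SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64 }

public static class PlcTypeMappingSketch
{
    // Each IEC integer type maps to the driver type of the same width and signedness.
    public static DriverTypeSketch ToDriverType(this PlcTypeSketch type) => type switch
    {
        PlcTypeSketch.Bool  => DriverTypeSketch.Boolean,
        PlcTypeSketch.SInt  => DriverTypeSketch.SByte,   // 8-bit signed
        PlcTypeSketch.USInt => DriverTypeSketch.Byte,    // 8-bit unsigned
        PlcTypeSketch.Int   => DriverTypeSketch.Int16,
        PlcTypeSketch.UInt  => DriverTypeSketch.UInt16,
        PlcTypeSketch.DInt  => DriverTypeSketch.Int32,
        PlcTypeSketch.UDInt => DriverTypeSketch.UInt32,
        PlcTypeSketch.LInt  => DriverTypeSketch.Int64,
        PlcTypeSketch.ULInt => DriverTypeSketch.UInt64,
        // Fail loudly instead of silently falling back to a 32-bit type.
        _ => throw new ArgumentOutOfRangeException(nameof(type), type, "Unsupported PLC type"),
    };
}
```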

**Resolution:** _(open)_

### Driver.TwinCAT-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `AdsTwinCATClient.cs:264-281`, `283-300` |
| Status | Open |

**Description:** `MapToClrType` has a `_ => typeof(int)` fallthrough and `ConvertForWrite` has
a `_ => throw NotSupportedException` fallthrough. `TwinCATDataType.Structure` is a declared
enum member, and a config-supplied tag can carry `DataType: "Structure"` because `ParseEnum`
in the factory accepts any enum name case-insensitively. A `Structure` tag therefore reads as
a 4-byte `int` against whatever the symbol actually is (a UDT blob) — a garbage/out-of-bounds
read rather than a clean rejection — while a write fails late with `NotSupportedException`.
Discovery's `ToDriverDataType` maps `Structure` to `String`, compounding the inconsistency.

**Recommendation:** Reject `Structure`-typed pre-declared tags at `InitializeAsync` /
`TwinCATDriverFactoryExtensions.BuildTag` time with a clear error — the driver's atomic
surface does not support UDT tags, and `BrowseSymbolsAsync` already correctly yields
`DataType = null` for them.

**Resolution:** _(open)_

### Driver.TwinCAT-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | Correctness & logic bugs |
| Location | `TwinCATDataType.cs:24-27` |
| Status | Open |

**Description:** The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is
a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds
since 1970-01-01 (truncated to a day boundary), not "days since 1970-01-01". These types are
also all read/written as raw `uint` and mapped to `DriverDataType.Int32` — the operator sees
a raw counter, not a usable date/duration. The inaccurate comments will mislead the next
implementer who tries to add proper conversion.

**Recommendation:** Correct the comments to match the TwinCAT/IEC 61131-3 representation. If
date/time semantics are intended to be exposed properly, track a follow-up to decode them to
`DriverDataType.DateTime`; otherwise document that they surface as raw counters.

**Resolution:** _(open)_

### Driver.TwinCAT-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | OtOpcUa conventions |
| Location | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) |
| Status | Open |

**Description:** The driver performs no logging. `CLAUDE.md` Library Preferences mandate
Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse
failures (`DiscoverAsync` swallows them in a bare `catch`), notification-registration
failures, and probe state transitions all vanish into status fields or are swallowed
silently. Operators get a `Degraded` health string with no correlatable log trail.

**Recommendation:** Inject an `ILogger`/Serilog logger and log at minimum: connect
success/failure per device, ADS errors with code, symbol-browse fallback (the `DiscoverAsync`
catch), native-notification registration failures, and host state transitions
(`TransitionDeviceState`).

**Resolution:** _(open)_

### Driver.TwinCAT-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `TwinCATDriver.cs:406-411` |
| Status | Open |

**Description:** `ResolveHost` falls back to `DriverInstanceId` when there are no configured
devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier,
not a host address; `IPerCallHostResolver` consumers expect a host key that correlates with
`GetHostStatuses()` entries (`HostConnectivityStatus.HostName` equals
`device.Options.HostAddress`). Returning the instance ID produces a host key that matches no
connectivity-status row.

**Recommendation:** Return a stable sentinel that will not be confused with a real host (an
empty string or a documented unresolved marker), or document why the instance ID is the chosen
fallback. Prefer the first device's `HostAddress` only when one exists (already done).

**Resolution:** _(open)_

### Driver.TwinCAT-007

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `TwinCATDriver.cs:413-429` |
| Status | Open |

**Description:** `EnsureConnectedAsync` is not thread-safe. `ReadAsync`, `WriteAsync`,
`SubscribeAsync`, and the per-device `ProbeLoopAsync` background task can all call it
concurrently for the same `DeviceState`. The sequence `device.Client ??= _clientFactory.Create()`
followed by `await device.Client.ConnectAsync(...)` has no lock: two threads can both observe
`device.Client` null-or-disconnected, each create/connect a client, and one
`AdsTwinCATClient` is leaked (its `AdsClient` + `AdsNotificationEx` handler never disposed).
Worse, on the connect-failure path one thread does `device.Client.Dispose(); device.Client = null;`
while another thread is mid-`ConnectAsync` on that same client instance — a disposal race that
can throw `ObjectDisposedException` or corrupt the `AdsClient`. The probe loop runs
continuously, so this race is not hypothetical under any concurrent read/write load.

**Recommendation:** Guard `EnsureConnectedAsync` per-device with a `SemaphoreSlim` (one per
`DeviceState`), or use an async-lazy connect with proper invalidation. The S7/AB-CIP drivers
serialize device access with a `SemaphoreSlim` — follow that pattern. Note this also
serializes the wire, which `docs/v2/driver-specs.md` recommends for single-connection-per-PLC.
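
A minimal sketch of the per-device `SemaphoreSlim` guard. `IClientSketch`,
`DeviceStateSketch`, and `ConnectionGuardSketch` are hypothetical stand-ins for the driver's
own types, and the factory is reduced to a `Func<>`; the S7/AB-CIP drivers' actual pattern may
differ in detail.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical client abstraction standing in for the driver's ADS client wrapper.
public interface IClientSketch : IDisposable
{
    bool IsConnected { get; }
    Task ConnectAsync(CancellationToken ct);
}

public sealed class DeviceStateSketch
{
    // One gate per device: connects to different devices do not block each other.
    public SemaphoreSlim ConnectGate { get; } = new(1, 1);
    public IClientSketch? Client { get; set; }
}

public sealed class ConnectionGuardSketch
{
    private readonly Func<IClientSketch> _clientFactory;

    public ConnectionGuardSketch(Func<IClientSketch> clientFactory) => _clientFactory = clientFactory;

    public async Task<IClientSketch> EnsureConnectedAsync(DeviceStateSketch device, CancellationToken ct)
    {
        await device.ConnectGate.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            // Re-check under the gate: another caller may have connected while we waited.
            if (device.Client is { IsConnected: true } existing) return existing;

            var client = device.Client ?? _clientFactory();
            try
            {
                await client.ConnectAsync(ct).ConfigureAwait(false);
            }
            catch
            {
                // Dispose happens under the gate, so no other caller can be mid-connect on it.
                client.Dispose();
                device.Client = null;
                throw;
            }

            device.Client = client;
            return client;
        }
        finally
        {
            device.ConnectGate.Release();
        }
    }
}
```

With this shape, concurrent callers for the same device wait on the gate, and only the first
one creates and connects a client.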

**Resolution:** _(open)_

### Driver.TwinCAT-008

| Field | Value |
|---|---|
| Severity | High |
| Category | Concurrency & thread safety |
| Location | `AdsTwinCATClient.cs:162-169`, `TwinCATDriver.cs:319-324` |
| Status | Open |

**Description:** Native ADS notification callbacks (`OnAdsNotificationEx`) run on the
`AdsClient` AMS router thread. `docs/v2/driver-specs.md` section 6 explicitly calls this out
as a code-review checklist item: "Callbacks must marshal to a managed work queue immediately
(no driver logic on the router thread) — blocking the router thread blocks every ADS
notification across the process." The current path invokes `reg.OnChange(...)` directly on the
router thread, and `OnChange` is the driver lambda that calls `OnDataChange?.Invoke(this, ...)`
— i.e. every downstream Core/OPC UA subscriber handler executes synchronously on the AMS
router thread. A single slow consumer stalls ADS notification delivery for every tag on every
device in the process.

**Recommendation:** Marshal notification values onto a bounded `Channel`/work queue drained by
a dedicated managed task before invoking `OnChange`/`OnDataChange`, exactly as the Galaxy
`EventPump` does. Keep the router-thread callback to a non-blocking enqueue only.
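
The enqueue-and-drain pattern could be sketched as below using `System.Threading.Channels`.
`NotificationPumpSketch` and `TagChange` are hypothetical names, and the capacity of 1024 and
the drop-oldest overflow policy are assumptions; the Galaxy `EventPump` may make different
choices.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical payload for one notification.
public readonly record struct TagChange(string Tag, object? Value, DateTime TimestampUtc);

public sealed class NotificationPumpSketch : IAsyncDisposable
{
    private readonly Channel<TagChange> _queue;
    private readonly Task _drain;

    public NotificationPumpSketch(Action<TagChange> onChange, int capacity = 1024)
    {
        _queue = Channel.CreateBounded<TagChange>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            // Assumed policy: under sustained overload, drop the oldest queued value.
            FullMode = BoundedChannelFullMode.DropOldest,
        });

        // Dedicated managed task: subscriber handlers run here, not on the router thread.
        _drain = Task.Run(async () =>
        {
            await foreach (var change in _queue.Reader.ReadAllAsync())
            {
                try { onChange(change); }
                catch { /* a real driver would log the handler failure */ }
            }
        });
    }

    // Called from the router-thread callback: a non-blocking enqueue only.
    public bool Enqueue(TagChange change) => _queue.Writer.TryWrite(change);

    public async ValueTask DisposeAsync()
    {
        _queue.Writer.TryComplete();          // stop accepting new items
        await _drain.ConfigureAwait(false);   // let queued items drain
    }
}
```

The ADS callback would then reduce to a single `Enqueue` call, so a slow subscriber delays
only the drain task.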

**Resolution:** _(open)_

### Driver.TwinCAT-009

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` |
| Status | Open |

**Description:** `ShutdownAsync` mutates `_devices`, `_tagsByName`, and `_nativeSubs` with no
synchronization while `ReadAsync`/`WriteAsync`/`SubscribeAsync` may be iterating or indexing
those same plain `Dictionary<>` instances on other threads (`_devices` and `_tagsByName` are
non-concurrent dictionaries). `ShutdownAsync` calls `_devices.Clear()`/`_tagsByName.Clear()`
concurrently with `_devices.TryGetValue` in a read — `Dictionary<>` is not safe for concurrent
read+write and can throw or corrupt internal state. `ReinitializeAsync` makes this worse: it
runs `ShutdownAsync` then `InitializeAsync` (which re-populates the same dictionaries) while
in-flight reads continue. The probe loop `EnsureConnectedAsync` also touches `DeviceState`
objects that `ShutdownAsync` is disposing — `ShutdownAsync` cancels `ProbeCts` but does not
await the probe task before calling `DisposeClient()`.

**Recommendation:** Either swap `_devices`/`_tagsByName` to `ConcurrentDictionary` and snapshot
them on rebuild, or introduce a lifecycle lock / `volatile` running guard so reads fail fast
with `BadServerHalted`/`BadNodeIdUnknown` once shutdown begins. Cancel and await the probe
tasks before disposing `DeviceState`s.

**Resolution:** _(open)_

### Driver.TwinCAT-010

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `AdsTwinCATClient.cs:178-195` |
| Status | Open |

**Description:** `BrowseSymbolsAsync` checks `cancellationToken.IsCancellationRequested` and
does `yield break` (a clean completion) rather than throwing `OperationCanceledException`.
`DiscoverAsync` (`TwinCATDriver.cs:274`) explicitly has
`catch (OperationCanceledException) { throw; }` to propagate cancellation distinctly from a
genuine browse failure. Because the client never throws on cancellation, a cancelled discovery
silently completes as if the symbol table were fully and successfully walked — the address
space is built from a partial symbol set with no indication it was truncated. The
`SymbolLoaderFactory.Create` / `loader.Symbols` enumeration itself is also not cancellable.

**Recommendation:** Call `cancellationToken.ThrowIfCancellationRequested()` instead of
`yield break` so a cancelled browse surfaces as cancellation, not as a successful but partial
discovery.
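
A small sketch of the difference inside an async iterator. `SymbolBrowseSketch` enumerates a
plain list of names as a stand-in for the real symbol loader; only the cancellation handling
is the point.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class SymbolBrowseSketch
{
    // Hypothetical stand-in for BrowseSymbolsAsync: symbols come from a plain sequence.
    public static async IAsyncEnumerable<string> BrowseSymbolsAsync(
        IEnumerable<string> symbols,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        foreach (var symbol in symbols)
        {
            // Throw instead of `yield break`, so the consumer can tell
            // "cancelled part-way" from "walked the whole table".
            cancellationToken.ThrowIfCancellationRequested();
            yield return symbol;
            await Task.Yield();
        }
    }
}
```

A consumer that wraps the `await foreach` in `catch (OperationCanceledException) { throw; }`
then sees the cancellation rather than a short but apparently complete symbol list.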

**Resolution:** _(open)_

### Driver.TwinCAT-011

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `TwinCATStatusMapper.cs:29-42` |
| Status | Open |

**Description:** ADS error-code mapping has gaps and an inconsistency versus
`docs/v2/driver-specs.md` section 6. The spec documents symbol-not-found as 0x0701
(1793 decimal) and symbol-version-changed as 0x0702 (1794 decimal). `MapAdsError` maps
decimal 1798 to `BadNodeIdUnknown` (symbol not found) and 1793/1794 to `BadOutOfRange`
(invalid index group/offset). The decimal-vs-hex interpretation of the documented codes does
not line up, so the mapper appears to treat the symbol-version-changed code as a generic range
error. 0x0710 "Not ready / PLC in Config mode" has no entry and falls through to
`BadCommunicationError`; the driver-spec recommends distinguishing it. And 0x0702
symbol-version-changed is never routed to rediscovery (see Driver.TwinCAT-013).

**Recommendation:** Confirm the actual `AdsErrorCode` numeric values from
`Beckhoff.TwinCAT.Ads` (the SDK enum, not the doc hex shorthand) and align the mapper. Add an
explicit case for symbol-version-changed routed to rediscovery, and for PLC-in-Config mapped
to `BadOutOfService`/`BadInvalidState`.

**Resolution:** _(open)_

### Driver.TwinCAT-012

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Performance & resource management |
| Location | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` |
| Status | Open |

**Description:** `GetMemoryFootprint()` returns a hard-coded 0. `docs/v2/driver-stability.md`
section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the
footprint to reflect "bytes attributable to their own caches (symbol cache, subscription
items, queued operations)", and section 6 of `driver-specs.md` explicitly identifies cached
symbol info as "the largest in-driver allocation" for TwinCAT and ties `FlushOptionalCachesAsync`
to flushing it. Reporting 0 means Core allocation-slope detection and cache-budget enforcement
are blind to this driver, and `FlushOptionalCachesAsync` is a no-op. (Note: the current
`BrowseSymbolsAsync` does not retain a symbol cache — it streams and discards — so
re-discovery re-downloads the whole symbol table each time, itself a performance concern for
`EnableControllerBrowse` deployments.)

**Recommendation:** Either implement an actual symbol cache + report its size via
`GetMemoryFootprint()` and flush it in `FlushOptionalCachesAsync`, or, if the
stream-and-discard design is intentional, report the real footprint of `_nativeSubs` /
`_tagsByName` and document that the driver holds no flushable cache.

**Resolution:** _(open)_

### Driver.TwinCAT-013

| Field | Value |
|---|---|
| Severity | High |
| Category | Design-document adherence |
| Location | `TwinCATDriver.cs:11-12` (capability list), whole file |
| Status | Open |

**Description:** `TwinCATDriver` does not implement `IRediscoverable`. Both
`docs/v2/driver-specs.md` section 6 and `docs/v2/driver-stability.md` section "TwinCAT — Deep
Dive" state this as the defining TwinCAT failure mode: "Symbol-version-changed (0x0702) is
the unique TwinCAT failure mode... The driver must catch 0x0702, mark its symbol cache
invalid, re-upload symbols, rebuild the address space subtree... Treat this as a
`IRediscoverable` invocation, not as a connection error." The `IRediscoverable` XML doc names
TwinCAT symbol-version-changed as a canonical example. The current driver maps the error to a
generic `BadOutOfRange`/`BadCommunicationError` quality code and never re-runs discovery, so
after a PLC program re-download every symbol handle and notification silently goes stale with
no address-space rebuild.

**Recommendation:** Implement `IRediscoverable`; detect the symbol-version-changed ADS error
on read/write/notification paths, raise `OnRediscoveryNeeded` with a scoped reason string, and
re-establish native notifications after the Core re-runs `DiscoverAsync`. This is explicitly
part of the documented driver contract, not optional.

**Resolution:** _(open)_

### Driver.TwinCAT-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | Design-document adherence |
| Location | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` |
| Status | Open |

**Description:** Several drifts between the implemented config surface and
`docs/v2/driver-specs.md` section 6. The spec connection-settings list has separate `Host`
(IP), `AmsNetId`, and `AmsPort` fields; the implementation collapses these into a single
`HostAddress` string parsed as `ads://{netId}:{port}`, so the target device IP has no home
field. `TwinCATProbeOptions.Timeout` (`TwinCATDriverOptions.cs:61`) is never read anywhere —
the probe path connects via `_options.Timeout` — a dead config field. The spec lists
`NotificationMaxDelayMs`; the code hard-codes max-delay 0 in `NotificationSettings`
(`AdsTwinCATClient.cs:145`) with no config knob.

**Recommendation:** Reconcile the driver-spec doc with the implemented `TwinCATDriverOptions`
shape (the doc is DRAFT, so updating it is acceptable). Remove or wire up
`TwinCATProbeOptions.Timeout`. Expose `NotificationMaxDelayMs` if batching control is wanted.

**Resolution:** _(open)_

### Driver.TwinCAT-015

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `TwinCATDriver.cs:431-432` |
| Status | Open |

**Description:** `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()` —
sync-over-async. The Galaxy section of `docs/v2/driver-stability.md` explicitly lists
"sync-over-async on the OPC UA stack thread" among the four 2026-04-13 stability findings that
had to be closed. `DisposeAsync` calls `ShutdownAsync`, which awaits `_poll.DisposeAsync()` and
disposes clients; if `Dispose()` is ever called on a thread with a single-threaded
synchronization context (the OPC UA stack), `GetResult()` can deadlock.

**Recommendation:** Make `Dispose()` perform a genuinely synchronous teardown. The operations
here — cancelling token sources, disposing clients, clearing dictionaries — are all
synchronous, and `PollGroupEngine.DisposeAsync` completes synchronously, so the synchronous
teardown can be factored into a shared helper and `Dispose()` need not block on a `Task`.
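
One way to factor this, sketched with hypothetical stand-ins for the driver's state
(`TeardownSketch`, `Track`, and the two lists are illustrative). It assumes, as the finding
states, that every teardown step is synchronous; any genuinely asynchronous step would have to
stay in `DisposeAsync` only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class TeardownSketch : IDisposable, IAsyncDisposable
{
    private readonly List<CancellationTokenSource> _probeTokens = new();
    private readonly List<IDisposable> _clients = new();
    private int _disposed;

    // Hypothetical registration helper so the sketch has something to tear down.
    public void Track(CancellationTokenSource probeCts, IDisposable client)
    {
        _probeTokens.Add(probeCts);
        _clients.Add(client);
    }

    // Shared synchronous teardown: cancel probes, dispose clients, clear state.
    private void TeardownCore()
    {
        foreach (var cts in _probeTokens) { cts.Cancel(); cts.Dispose(); }
        foreach (var client in _clients) client.Dispose();
        _probeTokens.Clear();
        _clients.Clear();
    }

    public void Dispose()
    {
        // No Task is created or awaited here, so there is nothing to block on.
        if (Interlocked.Exchange(ref _disposed, 1) == 0) TeardownCore();
    }

    public ValueTask DisposeAsync()
    {
        Dispose();
        return default; // an already-completed ValueTask
    }
}
```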

**Resolution:** _(open)_

### Driver.TwinCAT-016

| Field | Value |
|---|---|
| Severity | Low |
| Category | Testing coverage |
| Location | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` |
| Status | Open |

**Description:** Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write,
native notifications, symbol browse, and the capability surface. Gaps tied to the findings
above: no test exercises `ReinitializeAsync` with a changed config (Driver.TwinCAT-001 would
have been caught); no concurrency test drives `ReadAsync`/`WriteAsync`/probe against one
device simultaneously (Driver.TwinCAT-007/009); no test covers the symbol-version-changed to
rediscovery path (Driver.TwinCAT-013, currently unimplemented); no test covers a
`Structure`-typed pre-declared tag (Driver.TwinCAT-003); no test asserts 64-bit `LInt`/`ULInt`
round-trip without truncation (Driver.TwinCAT-002).

**Recommendation:** Add unit tests for the above paths once the corresponding findings are
addressed, especially a concurrency stress test for `EnsureConnectedAsync` and a
`ReinitializeAsync`-applies-new-config test.

**Resolution:** _(open)_
@@ -10,13 +10,378 @@ Each module's `findings.md` is the source of truth; this file is generated from
|
||||
|
||||
| Module | Reviewer | Date | Commit | Status | Open | Total |
|
||||
|---|---|---|---|---|---|---|
|
||||
| _no modules reviewed yet_ | | | | | | |
|
||||
| [Admin](Admin/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Analyzers](Analyzers/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
|
||||
| [Client.CLI](Client.CLI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 10 |
|
||||
| [Client.Shared](Client.Shared/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
|
||||
| [Client.UI](Client.UI/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
|
||||
| [Configuration](Configuration/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
|
||||
| [Core](Core/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Core.Abstractions](Core.Abstractions/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
|
||||
| [Core.AlarmHistorian](Core.AlarmHistorian/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
|
||||
| [Core.ScriptedAlarms](Core.ScriptedAlarms/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Core.Scripting](Core.Scripting/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 11 | 11 |
|
||||
| [Core.VirtualTags](Core.VirtualTags/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 13 | 13 |
|
||||
| [Driver.AbCip](Driver.AbCip/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 15 | 15 |
|
||||
| [Driver.AbCip.Cli](Driver.AbCip.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
|
||||
| [Driver.AbLegacy](Driver.AbLegacy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 13 | 13 |
|
||||
| [Driver.AbLegacy.Cli](Driver.AbLegacy.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
|
||||
| [Driver.Cli.Common](Driver.Cli.Common/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 6 | 6 |
|
||||
| [Driver.FOCAS](Driver.FOCAS/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Driver.FOCAS.Cli](Driver.FOCAS.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 5 | 5 |
|
||||
| [Driver.Galaxy](Driver.Galaxy/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 14 | 14 |
|
||||
| [Driver.Historian.Wonderware](Driver.Historian.Wonderware/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Driver.Historian.Wonderware.Client](Driver.Historian.Wonderware.Client/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 10 | 10 |
|
||||
| [Driver.Modbus](Driver.Modbus/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 12 | 12 |
|
||||
| [Driver.Modbus.Addressing](Driver.Modbus.Addressing/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 9 | 9 |
|
||||
| [Driver.Modbus.Cli](Driver.Modbus.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 8 | 8 |
|
||||
| [Driver.OpcUaClient](Driver.OpcUaClient/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 15 | 15 |
|
||||
| [Driver.S7](Driver.S7/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 14 | 14 |
|
||||
| [Driver.S7.Cli](Driver.S7.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
|
||||
| [Driver.TwinCAT](Driver.TwinCAT/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 16 | 16 |
|
||||
| [Driver.TwinCAT.Cli](Driver.TwinCAT.Cli/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 7 | 7 |
|
||||
| [Server](Server/findings.md) | Claude Code | 2026-05-22 | `76d35d1` | Reviewed | 15 | 15 |

## Pending findings

Findings with status `Open` or `In Progress`, ordered by severity.

| ID | Severity | Category | Location | Description |
|---|---|---|---|---|
| Admin-001 | Critical | Security | `Components/Routes.razor:4-11`, `Program.cs:150` | The router uses a plain `RouteView` (not `AuthorizeRouteView`), and `MapRazorComponents<App>()` is registered without `.RequireAuthorization()`. A page-level `[Authorize]` attribute on a routable Razor component is only enforced when the r… |
| Admin-002 | Critical | Security | `Components/Pages/Clusters/NewCluster.razor:1-7`, `Home.razor`, `Fleet.razor`, `Hosts.razor`, `AlarmsHistorian.razor`, `Clusters/ClustersList.razor`, `Clusters/Generations.razor`, `Drivers/FocasDetail.razor` | Several routable pages carry no authorization attribute at all. Most critically `NewCluster` (`/clusters/new`) is a mutating page — its `CreateAsync` writes a new `ServerCluster` row and a draft generation. Combined with Admin-001 (the rou… |
| Core.AlarmHistorian-001 | Critical | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:255-278` | `ReadBatch` builds two parallel lists, `rowIds` and `events`, that `DrainOnceAsync` later indexes together (`rowIds[i]` paired with `outcomes[i]`, where `outcomes` is 1:1 with `events`). But `rowIds.Add(reader.GetInt64(0))` runs unconditio… |
| Core.Scripting-001 | Critical | Security | `ForbiddenTypeAnalyzer.cs:45`, `ScriptSandbox.cs:54` | `System.Environment` lives in the allowed `System` namespace (it is in `System.Private.CoreLib`, which is allow-listed for primitives) and is not on the forbidden-namespace deny-list. Nothing prevents an operator-authored script from calli… |
| Driver.Galaxy-001 | Critical | Error handling & resilience | `Runtime/EventPump.cs:128`, `GalaxyDriver.cs:222` | The `ReconnectSupervisor` is constructed in `BuildProductionRuntimeAsync` and exposes `ReportTransportFailure(Exception)` as the only entry point that starts the reopen -> replay recovery loop. Nothing in the driver ever calls `ReportTrans… |
| Server-001 | Critical | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791` | `WriteNodeIdUnknown` calls itself unconditionally as its first statement, then sets `errors[i]`. Unbounded recursion with no base case overflows the stack. Called from all four `HistoryRead*` overrides whenever a HistoryRead targets a node… |
| Admin-003 | High | Security | `Program.cs:137-139`, `Hubs/FleetStatusHub.cs:11`, `Hubs/AlertHub.cs:10`, `Hubs/ScriptLogHub.cs:30` | All three SignalR hubs (`/hubs/fleet`, `/hubs/alerts`, `/hubs/script-log`) are mapped with no `[Authorize]` attribute and no `.RequireAuthorization()` on the `MapHub` call. Any unauthenticated client can open a hub connection: `FleetStatus… |
| Admin-004 | High | Security | `appsettings.json:3,13-14` | The checked-in `appsettings.json` contains live-looking secrets in plaintext: the `ConfigDb` connection string with `User Id=sa;Password=OtOpcUaDev_2026!` and the LDAP `ServiceAccountPassword: "serviceaccount123"`. It also sets `Encrypt=Fa… |
| Admin-005 | High | Correctness & logic bugs | `Components/Pages/Login.razor:15,107-110` | `Login.razor` is an interactive component (the project default render mode is interactive server; the page declares no `@rendermode` but uses `EditForm`/`InputText` interactive binding and runs `SignInAsync` from an event handler). It call… |
| Client.Shared-005 | High | Concurrency & thread safety | `OpcUaClientService.cs:19`, `OpcUaClientService.cs:226-249`, `OpcUaClientService.cs:499-521` | `_activeDataSubscriptions` is a plain `Dictionary` mutated from at least three thread contexts with no synchronization: the caller thread (`SubscribeAsync`/`UnsubscribeAsync`), the keep-alive callback thread (`HandleKeepAliveFailureAsync`… |
| Client.Shared-006 | High | Concurrency & thread safety | `OpcUaClientService.cs:97-100`, `OpcUaClientService.cs:432-497` | `HandleKeepAliveFailureAsync` is launched fire-and-forget (`_ = HandleKeepAliveFailureAsync()`) from every bad keep-alive callback. The only guard against re-entry is the non-atomic check `if (_state == Reconnecting \|\| _state == Disconnect… |
| Configuration-001 | High | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:282` | `sp_PublishGeneration` invokes `EXEC dbo.sp_ValidateDraft @DraftGenerationId = @DraftGenerationId;` and then continues unconditionally to the reservation MERGE and the `Status='Published'` update. `sp_ValidateDraft` signals every failure w… |
| Configuration-008 | High | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:150`, `:373`, `:468` | Three stored procedures build `ConfigAuditLog.DetailsJson` by raw string concatenation of caller-supplied `nvarchar` parameters: `sp_RegisterNodeGenerationApplied` (`@Status`), `sp_RollbackToGeneration` (`@TargetGenerationId`), `sp_Release… |
| Core-001 | High | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/UserAuthorizationState.cs:50-68` | `NeedsRefresh` can never return `true` with the default field values. `AuthCacheMaxStaleness` defaults to 5 minutes and `MembershipFreshnessInterval` defaults to 15 minutes. `NeedsRefresh(utcNow)` is defined as `!IsStale(utcNow) && elapsed… |
| Core-002 | High | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/TriePermissionEvaluator.cs:24-50` | `TriePermissionEvaluator.Authorize` never compares the session's `AuthGenerationId` against the generation of the trie it evaluates against. It calls `_cache.GetTrie(scope.ClusterId)` — the current-generation shortcut — and authorizes agai… |
| Core.AlarmHistorian-002 | High | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:99-105,386-388` | The class computes an exponential-backoff value (`_backoffIndex`, `BumpBackoff`, `CurrentBackoff`, the `BackoffLadder`) and the class doc-comment states "Drain runs on a shared `Timer`. Exponential backoff on `RetryPlease`: 1s → 2s → 5s →… |
| Core.AlarmHistorian-004 | High | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:90,112,176,259` | Every operation opens a brand-new `SqliteConnection` from the bare connection string `Data Source={databasePath}` — no `busy_timeout` / `Pragma`, no shared cache. SQLite serializes writers with a file lock; when `EnqueueAsync` (emitting th… |
| Core.AlarmHistorian-006 | High | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:103,135-216` | `StartDrainLoop` launches the drain with `new Timer(_ => _ = DrainOnceAsync(CancellationToken.None), ...)`. The returned `Task` is discarded (`_ =`), so any exception thrown by `DrainOnceAsync` is an unobserved task exception — never logge… |
| Core.ScriptedAlarms-001 | High | Concurrency & thread safety | `ScriptedAlarmEngine.cs:175`, `ScriptedAlarmEngine.cs:178`, `ScriptedAlarmEngine.cs:73`, `ScriptedAlarmEngine.cs:368` | `_alarms` is a plain `Dictionary<string, AlarmState>` (line 42). Every mutation of it (`LoadAsync`, `ApplyAsync`, `ReevaluateAsync`, `ShelvingCheckAsync`) correctly happens under the `_evalGate` semaphore, but four read paths touch it with… |
| Core.Scripting-002 | High | Security | `ForbiddenTypeAnalyzer.cs:70` | The syntax walker only inspects four node kinds: `ObjectCreationExpressionSyntax`, `InvocationExpressionSyntax` with a member-access target, `MemberAccessExpressionSyntax`, and bare `IdentifierNameSyntax`. It never visits `TypeOfExpression… |
| Core.VirtualTags-001 | High | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:306` | `OnScriptSetVirtualTag` updates `_valueCache`, notifies observers, and records history for the written path, but it does not schedule a cascade for tags that depend on the written path. `docs/VirtualTags.md` (VirtualTagContext section) exp… |
| Driver.AbCip-001 | High | Correctness & logic bugs | `AbCipDriver.cs:111`, `AbCipDriver.cs:163-167` | `InitializeAsync(string driverConfigJson, ...)` never reads `driverConfigJson`. It builds all device/tag state from `_options`, captured at construction time. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync(driverConfigJson… |
| Driver.AbCip-002 | High | Correctness & logic bugs | `AbCipStatusMapper.cs:65-78` | `MapLibplctagStatus` maps negative libplctag codes that do not match the libplctag.NET `Status` enum / native `libplctag.h` constants. `LibplctagTagRuntime.GetStatus()` returns `(int)_tag.GetStatus()`, the underlying value of the `Status`… |
| Driver.AbCip-003 | High | Correctness & logic bugs | `AbCipUdtMemberLayout.cs:32-54`, `AbCipDriver.cs:426-430`, `AbCipUdtReadPlanner.cs:48` | The whole-UDT read path (`ReadGroupAsync`) decodes each grouped member at the byte offset produced by `AbCipUdtMemberLayout.TryBuild`, which computes offsets purely from declaration order of the configured `AbCipStructureMember` list under… |
| Driver.AbCip-008 | High | Concurrency & thread safety | `AbCipDriver.cs:144-152`, `AbCipDriver.cs:169-183`, `AbCipDriver.cs:235-281` | Probe loops are started fire-and-forget (`_ = Task.Run(() => ProbeLoopAsync(state, ct), ct)`) and the resulting Task is never stored or awaited. `ShutdownAsync` cancels `state.ProbeCts`, then immediately disposes it, sets it null, and call… |
| Driver.AbLegacy-001 | High | Correctness & logic bugs | `AbLegacyAddress.cs:54`, `AbLegacyDriver.cs:368-374` | `AbLegacyAddress.TryParse` accepts a `BitIndex` of `0..31` for every file type. A PCCC N-file word is a signed 16-bit integer, so valid bit indices are `0..15`. When a tag is `Bit`-typed against an N-file with a bit suffix of `16..31` (e.g… |
| Driver.AbLegacy-006 | High | Concurrency & thread safety | `AbLegacyDriver.cs:107-158`, `AbLegacyDriver.cs:162-234`, `LibplctagLegacyTagRuntime.cs` | A per-tag `IAbLegacyTagRuntime` (wrapping a single libplctag `Tag`) is cached in `DeviceState.Runtimes` and reused. `ReadAsync` (called directly by the server read path) and the `PollGroupEngine` poll loop (which also calls `ReadAsync` via… |
| Driver.Cli.Common-001 | High | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:106-119` | The `FormatStatus` shortlist maps four OPC UA status names to incorrect numeric codes. The correct OPC UA spec values (verified against the OPC Foundation UA-.NETStandard `Opc.Ua.StatusCodes` table) are: \| Name in shortlist \| Code used \| C… |
| Driver.FOCAS-001 | High | Correctness & logic bugs | `FocasDriverFactoryExtensions.cs:54-86`, `FocasDriverFactoryExtensions.cs:132-140` | `FocasDriverConfigDto` exposes only `Backend`, `Series`, `TimeoutMs`, `Devices`, `Tags`, and `Probe`. It has no `FixedTree`, `AlarmProjection`, or `HandleRecycle` properties, and `CreateInstance` never sets those three options on `FocasDri… |
| Driver.FOCAS-002 | High | Correctness & logic bugs | `WireFocasClient.cs:164-179`, `FocasDriver.cs:513`, `FocasDriver.cs:593` | The fixed-tree bootstrap probes the `ProgramInfo` capability via `SafeTryProbe(() => client.GetProgramInfoAsync(ct))` and treats a non-null result as "supported". But `WireFocasClient.GetProgramInfoAsync` never throws on a FOCAS error retu… |
| Driver.Galaxy-002 | High | Correctness & logic bugs | `Browse/DataTypeMap.cs:13`, `Runtime/MxValueDecoder.cs:9` | `DataTypeMap.Map` maps Galaxy `mx_data_type` codes to six `DriverDataType` values (Boolean, Int32, Float32, Float64, String, DateTime) — there is no `Int64` arm. Yet `MxValueDecoder` and `MxValueEncoder` both fully support Int64 (`MxValue.… |
| Driver.Galaxy-008 | High | Error handling & resilience | `GalaxyDriver.cs:264-276`, `Runtime/EventPump.cs:97-103` | Even if Driver.Galaxy-001 is fixed and the supervisor's `ReplayAsync` runs, recovery is incomplete. `ReplayAsync` re-issues `SubscribeBulkAsync` for the tracked tags, but the `EventPump` background loop that consumes `StreamEvents` is not… |
| Driver.Historian.Wonderware-001 | High | Correctness & logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:68`, `Backend/AahClientManagedAlarmEventWriter.cs:82-103` | `MalformedErrors` includes `HistorianAccessError.ErrorValue.WriteToReadOnlyFile`. When `ClassifyOutcome` routes that code through `MapOutcome`, `isMalformedInput` is `true`, so the per-event result becomes `PermanentFail` and the lmxopcua-… |
| Driver.Historian.Wonderware.Client-001 | High | Correctness & logic bugs | `WonderwareHistorianClient.cs:98-113` | `ReadAtTimeAsync` violates the explicit `IHistorianDataSource.ReadAtTimeAsync` contract. The interface XML doc states: the returned list MUST be the same length and order as `timestampsUtc`, and gaps are returned as Bad-quality snapshots.… |
| Driver.Modbus-001 | High | Concurrency & thread safety | `ModbusDriver.cs:92,99-122` | `_lastPublishedByRef` is a plain `Dictionary<string, object>` mutated inside `ShouldPublish`, which runs on the `PollGroupEngine.onChange` callback. `PollGroupEngine` runs one background `Task` per subscription (`PollGroupEngine.cs:64`), s… |
| Driver.Modbus.Addressing-001 | High | Correctness & logic bugs | `ModbusAddressParser.cs:230-235`, `DirectLogicAddress.cs:66-73` | The DL205 family-native branch routes every V-prefixed address through `DirectLogicAddress.UserVMemoryToPdu`, which is a plain octal-to-decimal conversion. DL205/DL260 system V-memory (V40400 and up) is NOT a simple octal decode — per `doc… |
| Driver.OpcUaClient-001 | High | Correctness & logic bugs | `OpcUaClientDriver.cs:444`, `:466`, `:517`, `:540`, `:599`, `:610` | ReadAsync, WriteAsync, and DiscoverAsync capture the session into a local variable via RequireSession() before acquiring `_gate`, then perform the wire call on that captured reference inside the gate. The reconnect path (OnReconnectComplet… |
| Driver.OpcUaClient-002 | High | Error handling & resilience | `OpcUaClientDriver.cs:1330-1359` | OnReconnectComplete handles only the success case. When SessionReconnectHandler gives up (its retry loop exhausts the 2-minute maxReconnectPeriod), it invokes the callback with `handler.Session == null`. The code sets `Session = null`, dis… |
| Driver.OpcUaClient-003 | High | Correctness & logic bugs | `OpcUaClientDriver.cs:644-711` | BrowseRecursiveAsync calls session.BrowseAsync with `requestedMaxReferencesPerNode: 0` but never follows browse continuation points. OPC UA servers enforce a server-side max-references-per-node limit; when a node has more children than the… |
| Driver.OpcUaClient-004 | High | Design-document adherence | `OpcUaClientDriver.cs:596-632`, `:789`, `OpcUaClientDriverOptions.cs` | docs/v2/driver-specs.md section 8 mandates two features that are absent. (1) Namespace remapping: the spec requires building a bidirectional namespace map at connect time from session.NamespaceUris. The driver instead stores the raw upstre… |
| Driver.OpcUaClient-005 | High | Concurrency & thread safety | `OpcUaClientDriver.cs:1297-1319` | OnKeepAlive reads and writes `_reconnectHandler` without any lock: `if (_reconnectHandler is not null) return;` followed by `_reconnectHandler = new SessionReconnectHandler(...)`. Keep-alive callbacks are raised from the SDK keep-alive tim… |
| Driver.S7-001 | High | Correctness & logic bugs | `S7AddressParser.cs:93`, `S7Driver.cs:231` | S7AddressParser.Parse accepts Timer (T0) and Counter (C0) addresses and the test suite asserts they parse successfully, but the read path cannot serve them. Two problems compound: (1) ReadOneAsync type-mapping switch (lines 231-250) has no… |
| Driver.S7-006 | High | Concurrency & thread safety | `S7Driver.cs:140`, `S7Driver.cs:457`, `S7Driver.cs:506` | Disposal races with the in-flight probe / poll tasks. ShutdownAsync calls _probeCts.Cancel() and cancels each subscription CTS, but it does not await the ProbeLoopAsync / PollLoopAsync tasks (they are fire-and-forget Task.Run with the task… |
| Driver.S7-007 | High | Error handling & resilience | `S7Driver.cs:200`, `S7DriverOptions.cs:13`, `docs/v2/driver-specs.md:434` | PUT/GET-disabled handling contradicts the design and the module's own docstring. driver-specs.md section 5 (line 434) and the S7DriverOptions class remark both state PUT/GET-disabled must be mapped to BadNotSupported and surfaced as a config… |
| Driver.S7-011 | High | Design-document adherence | `S7Driver.cs:82`, `S7Driver.cs:134`, `IDriver.cs:24` | S7Driver ignores the driverConfigJson parameter on both InitializeAsync and ReinitializeAsync. The IDriver contract states InitializeAsync initializes the driver "from its DriverConfig JSON" and ReinitializeAsync "applies a config change i… |
| Driver.TwinCAT-001 | High | Correctness & logic bugs | `TwinCATDriver.cs:41-78` | `InitializeAsync` and `ReinitializeAsync` both ignore their `driverConfigJson` parameter entirely. `InitializeAsync` builds device/tag state exclusively from `_options`, captured once in the constructor. `ReinitializeAsync` calls `Shutdown… |
| Driver.TwinCAT-002 | High | Correctness & logic bugs | `TwinCATDataType.cs:34-48`, `AdsTwinCATClient.cs:264-281` | `TwinCATDataTypeExtensions.ToDriverDataType` maps `LInt` and `ULInt` (signed/unsigned 64-bit) to `DriverDataType.Int32` (comment: "matches Int64 gap"). The address-space layer therefore creates a 32-bit OPC UA node for a 64-bit PLC value.… |
| Driver.TwinCAT-007 | High | Concurrency & thread safety | `TwinCATDriver.cs:413-429` | `EnsureConnectedAsync` is not thread-safe. `ReadAsync`, `WriteAsync`, `SubscribeAsync`, and the per-device `ProbeLoopAsync` background task can all call it concurrently for the same `DeviceState`. The sequence `device.Client ??= _clientFac… |
| Driver.TwinCAT-008 | High | Concurrency & thread safety | `AdsTwinCATClient.cs:162-169`, `TwinCATDriver.cs:319-324` | Native ADS notification callbacks (`OnAdsNotificationEx`) run on the `AdsClient` AMS router thread. `docs/v2/driver-specs.md` section 6 explicitly calls this out as a code-review checklist item: "Callbacks must marshal to a managed work qu… |
| Driver.TwinCAT-013 | High | Design-document adherence | `TwinCATDriver.cs:11-12` (capability list), whole file | `TwinCATDriver` does not implement `IRediscoverable`. Both `docs/v2/driver-specs.md` section 6 and `docs/v2/driver-stability.md` section "TwinCAT — Deep Dive" state this as the defining TwinCAT failure mode: "Symbol-version-changed (0x0702… |
| Server-002 | High | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` | `IsAllowed` does `if (decision.IsAllowed) return true; return !_strictMode;`. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (default) overrides it to `true`. The lax fallback is intended o… |
| Server-009 | High | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` | `AllowInsecureLdap` defaults to `true` (and `Program.cs` reads `?? true`); `UseTls` defaults to `false`. Out of the box, usernames and plaintext passwords are bound to LDAP over an unencrypted socket. A production deployment enabling LDAP… |
| Admin-006 | Medium | Security | `Components/Layout/MainLayout.razor:47-49`, `Program.cs:129,131-135` | `app.UseAntiforgery()` is enabled, but the Sign-out form (`<form method="post" action="/auth/logout">`) renders no antiforgery token, and the `MapPost("/auth/logout", ...)` endpoint does not call `.DisableAntiforgery()` or otherwise opt ou… |
| Admin-007 | Medium | Design-document adherence | `Components/Pages/Clusters/NewCluster.razor:91,95-96` | `NewCluster.CreateAsync` hardcodes `CreatedBy = "admin-ui"` (both on the `ServerCluster` row and the draft generation) instead of the signed-in operator principal name. `admin-ui.md` section "Audit" requires "the operator principal" be rec… |
| Admin-008 | Medium | Error handling & resilience | `Services/ReservationService.cs:28-37` | `ReservationService.ReleaseAsync` calls `sp_ReleaseExternalIdReservation` with only `@Kind`, `@Value`, `@ReleaseReason`. `admin-ui.md` section "Release an external-ID reservation" specifies the proc sets `ReleasedBy` to the FleetAdmin who… |
| Admin-009 | Medium | Testing coverage | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` (whole module) | The module's most security-critical behaviours have no enforced test coverage at the boundary that matters. There is no test that an unauthenticated request to a page or hub is rejected (which would have caught Admin-001/002/003), no test of… |
| Analyzers-001 | Medium | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:135-139` | `IsInsideWrapperLambda` treats a guarded call as "wrapped" if it is textually inside ANY lambda that is an argument to ANY invocation whose containing type is `CapabilityInvoker` or `AlarmSurfaceInvoker`. It matches the containing type onl… |
| Analyzers-006 | Medium | Testing coverage | `tests/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers.Tests/UnwrappedCapabilityCallAnalyzerTests.cs` | The test suite exercises only 3 of the 7 guarded interfaces (`IReadable`, `IWritable`, `ITagDiscovery`) and one positive / one negative lambda case. Significant untested behaviour for an analyzer that gates a repo-wide resilience invariant… |
| Client.CLI-001 | Medium | Correctness & logic bugs | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76` | The start and end options are parsed with `DateTime.Parse(StartTime)` with no `IFormatProvider` or `DateTimeStyles`. Parsing therefore depends on the current OS culture: the same `--start "03/04/2026"` resolves to March 4 on an en-US box a… |
| Client.CLI-005 | Medium | Concurrency & thread safety | `Commands/SubscribeCommand.cs:66-78`, `Commands/AlarmsCommand.cs:52-64` | The `DataChanged` and `AlarmEvent` handlers write to `console.Output` (a `System.IO.TextWriter`) directly from the OPC UA SDK subscription/notification thread, while the command main flow is awaiting `Task.Delay(Timeout.Infinite, ct)` and… |
| Client.Shared-001 | Medium | Correctness & logic bugs | `OpcUaClientService.cs:552` | `OnAlarmEventNotification` returns early when `eventFields.EventFields` has fewer than 6 entries. The event filter built by `CreateAlarmEventFilter` always registers 13 select clauses, so a conforming server returns 13 fields. The `< 6` th… |
| Client.Shared-002 | Medium | Correctness & logic bugs | `OpcUaClientService.cs:351-355`, `OpcUaClientService.cs:373` | `GetRedundancyInfoAsync` performs unguarded unboxing casts on values read from the server: `(int)redundancySupportValue.Value` and `(byte)serviceLevelValue.Value`. Unlike the `ServerUriArray`/`ServerArray` reads below them, the `Redundancy… |
| Client.Shared-007 | Medium | Concurrency & thread safety | `OpcUaClientService.cs:581-622` | In the alarm fallback path, the `Task.Run` closure mutates the captured locals `activeState`, `ackedState`, `time`, and `capturedMessage`, then reads them when invoking `AlarmEvent`. Because the captured `_session` reference can be replace… |
| Client.Shared-008 | Medium | Error handling & resilience | `OpcUaClientService.cs:170-180`, `Helpers/ValueConverter.cs:15-31` | `WriteValueAsync` coerces a string input to the target type by reading the node's current value and inferring the type from `currentDataValue.Value`. When the node has never been written, or the read returns a `Bad` status with a null `Val… |
| Client.UI-001 | Medium | Correctness & logic bugs | `ViewModels/HistoryViewModel.cs:76`, `ViewModels/HistoryViewModel.cs:77` | `ReadHistoryAsync` runs as a `RelayCommand` body, which is invoked on the UI thread, so the bare `IsLoading = true` at line 76 happens to land on the right thread today. But `Results.Clear()` on the very next line is wrapped in `_dispatche… |
| Client.UI-002 | Medium | Correctness & logic bugs | `ViewModels/MainWindowViewModel.cs:255`, `ViewModels/MainWindowViewModel.cs:333` | `ConnectAsync` calls `await BrowseTree.LoadRootsAsync()` and `ViewHistoryForSelectedNode` calls `History.SelectedNodeId = ...` by dereferencing the nullable child view-model properties (`BrowseTreeViewModel?`, `HistoryViewModel?`) without… |
| Client.UI-005 | Medium | Concurrency & thread safety | `ViewModels/MainWindowViewModel.cs:286-304`, `ViewModels/MainWindowViewModel.cs:155-189` | `SubscriptionsViewModel` and `AlarmsViewModel` attach handlers to the long-lived `_service` events (`DataChanged`, `AlarmEvent`) in their constructors and detach them only via `Teardown()`. `Teardown()` is called from `DisconnectAsync` (op… |
| Client.UI-007 | Medium | Security | `Services/UserSettings.cs:22-23`, `Services/JsonSettingsService.cs:38-50`, `ViewModels/MainWindowViewModel.cs:393-408` | The OPC UA `UserName`-token password is persisted in cleartext. `UserSettings.Password` is a plain `string`, `JsonSettingsService.Save` serializes the whole settings object to `settings.json` under `LocalApplicationData`, and `SaveSettings… |
| Client.UI-008 | Medium | Performance & resource management | `ViewModels/MainWindowViewModel.cs:18`, `ViewModels/MainWindowViewModel.cs:125-148`, `App.axaml.cs:18-32` | `IOpcUaClientService` is declared `IDisposable` (`IOpcUaClientService.cs:10`), and the concrete service owns an OPC UA session plus SDK resources. `MainWindowViewModel` holds `_service` for the lifetime of the app but never calls `_service… |
| Configuration-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Migrations/20260417215224_StoredProcedures.cs:325` | `sp_RollbackToGeneration` opens its own `BEGIN TRANSACTION`, clones rows into a new Draft, then `EXEC dbo.sp_PublishGeneration`, which itself runs `BEGIN TRANSACTION` (nesting `@@TRANCOUNT` to 2) and on its failure paths executes a bare `R… |
| Configuration-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:73` | `ValidatePathLength` computes path length with hard-coded constants — it always charges 64 chars for Enterprise+Site (`32 + 32 + ...`) regardless of the cluster's actual values. This over-rejects: a short Enterprise/Site is penalised by up… |
| Configuration-006 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:79` | The fallback `catch` filters on `ex is not OperationCanceledException`. A SQL command timeout surfaced by ADO.NET as a `TaskCanceledException` (derives from `OperationCanceledException`) is then treated as caller cancellation and propagate… |
| Configuration-009 | Medium | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/DesignTimeDbContextFactory.cs:14` | `DefaultConnectionString` embeds a plaintext `sa` password with `User Id=sa` directly in source, checked into the repository. Although used only at design time (`dotnet ef`), a checked-in `sa` credential normalises committing DB passwords… |
| Core-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrie.cs:80-98` | `WalkSystemPlatform` records every Galaxy folder-segment grant with `NodeAclScopeKind.Equipment` (see the comment at lines 82-86) because `NodeAclScopeKind` has no `FolderSegment` member. The functional union of permission flags is unaffec… |
| Core-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieCache.cs:59-70` | `Prune` mutates the `ConcurrentDictionary` with a plain indexer assignment (`_byCluster[clusterId] = new ClusterEntry(...)`) after a separate `TryGetValue` read. If `Install` runs concurrently for the same cluster, the `AddOrUpdate` in `In… |
| Core-006 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | `BuildAddressSpaceAsync` is not guarded against being called more than once. A second call subscribes a second `_alarmForwarder` to `IAlarmSource.OnAlarmEvent` and overwrites the `_alarmForwarder` field, so the first delegate is leaked (st… |
| Core-007 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs:75-83` | `UnsubscribeAsync` always routes through `_defaultHost`, even when an `IPerCallHostResolver` is wired and the original `SubscribeAsync` fanned the subscription out to a non-default host. The `IAlarmSubscriptionHandle` is opaque here and ca… |
| Core.Abstractions-001 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:112` | `PollOnceAsync` detects a change with `!Equals(lastSeen?.Value, current.Value)`. `object.Equals` falls back to reference equality for reference types that do not override it — including `T[]` array values. The capability interfaces explici… |
| Core.Abstractions-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:105-109` | `PollOnceAsync` iterates `state.TagReferences` and indexes the reader's result with `snapshots[i]`, assuming the driver-supplied `_reader` delegate returns exactly one snapshot per input reference in input order. The contract is documented… |
| Core.Abstractions-003 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:64,121-130` | `Subscribe` starts the poll loop with a fire-and-forget `Task.Run` and keeps no reference to the returned `Task`. Neither `Unsubscribe` nor `DisposeAsync` awaits the loop's completion — they only cancel the `CancellationTokenSource` and di… |
| Core.AlarmHistorian-003 | Medium | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,218-243,246-253` | `EnqueueAsync` is declared `async`-shaped (`Task EnqueueAsync(...)`) and the `IAlarmHistorianSink` contract explicitly states "the sink MUST NOT block the emitting thread … `EnqueueAsync` returns as soon as the queue row is committed." But… |
| Core.AlarmHistorian-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:66-71,141-143,199,386-388` | The mutable status fields `_lastDrainUtc`, `_lastSuccessUtc`, `_lastError`, `_drainState`, and `_backoffIndex` are written by the drain timer thread inside `DrainOnceAsync` and read concurrently by `GetStatus()` / `CurrentBackoff` on Admin… |
| Core.AlarmHistorian-007 | Medium | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:172-174` | When the writer returns a wrong-cardinality result, the code throws `InvalidOperationException` after `WriteBatchAsync` has already succeeded. The events were potentially delivered to the historian, but no rows are deleted or dead-lettered… |
| Core.AlarmHistorian-009 | Medium | Design-document adherence | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:317-347` | `docs/AlarmTracking.md` and the `IAlarmHistorianSink` contract present the SQLite queue as the durability guarantee — "Durably enqueue the event", "operator acks never block on the historian being reachable". But `EnforceCapacity` silently… |
| Core.AlarmHistorian-010 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.Tests/SqliteStoreAndForwardSinkTests.cs` | The test suite covers the happy paths well (Ack/Retry/PermanentFail, capacity eviction, retention purge, ctor validation) but leaves critical paths untested: (a) no test exercises a corrupt / `null`-deserializing `PayloadJson` row, so the… |
|
||||
| Core.ScriptedAlarms-002 | Medium | Correctness & logic bugs | `ScriptedAlarmEngine.cs:162`, `ScriptedAlarmEngine.cs:90` | `LoadAsync` is written to be re-callable — it begins by calling `UnsubscribeFromUpstream()`, `_alarms.Clear()`, and `_alarmsReferencing.Clear()` (lines 90-92), which only makes sense if a reload is supported. But at line 162 it uncondition… |
|
||||
| Core.ScriptedAlarms-004 | Medium | Concurrency & thread safety | `ScriptedAlarmEngine.cs:138-143`, `ScriptedAlarmEngine.cs:227-234` | During `LoadAsync`, `_upstream.SubscribeTag(path, OnUpstreamChange)` is called inside the `_evalGate` critical section (line 142). If an upstream implementation delivers an initial value synchronously from inside `SubscribeTag` (a common p… |
|
||||
| Core.ScriptedAlarms-005 | Medium | Concurrency & thread safety | `ScriptedAlarmEngine.cs:365-369`, `ScriptedAlarmEngine.cs:416-424` | `Dispose` sets `_disposed = true`, disposes `_shelvingTimer`, and clears `_alarms`. A `RunShelvingCheck` callback already in flight on a thread-pool thread can have passed its `if (_disposed) return;` check (line 367) before `Dispose` ran,… |
|
||||
| Core.ScriptedAlarms-007 | Medium | Error handling & resilience | `ScriptedAlarmEngine.cs:216`, `ScriptedAlarmEngine.cs:251`, `ScriptedAlarmEngine.cs:154`, `ScriptedAlarmEngine.cs:387` | Every state mutation calls `await _store.SaveAsync(...)` and relies on it succeeding. If the production SQL-backed `IAlarmStateStore` (Stream E) throws — transient SQL outage, deadlock, timeout — the exception propagates: in `ApplyAsync` i… |
|
||||
| Core.ScriptedAlarms-012 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms.Tests/ScriptedAlarmEngineTests.cs` | Several engine behaviours central to the module have no test coverage: (1) the 5-second shelving timer / timed-shelve auto-expiry through the *engine* — only the pure `Part9StateMachine.ApplyShelvingCheck` is tested, never `ScriptedAlarmEn… |
|
||||
| Core.Scripting-003 | Medium | Security | `TimedScriptEvaluator.cs:9`, `ScriptSandbox.cs:30` | There is no bound on memory a script may allocate or on the number of threads/tasks a script may spawn. The class docs acknowledge unbounded memory as "a budget concern" deferred to v3, but in-process execution means a script doing `new by… |
|
||||
| Core.Scripting-004 | Medium | Correctness & logic bugs | `DependencyExtractor.cs:73` | The walker matches tag-access calls purely by spelling — any `InvocationExpressionSyntax` whose member name is `GetTag` or `SetVirtualTag` is treated as a `ScriptContext` tag access, regardless of the receiver. A script that defines a loca… |
|
||||
| Core.Scripting-007 | Medium | Error handling & resilience | `TimedScriptEvaluator.cs:60` | `RunAsync` wraps the inner run in `Task.Run(...)` and then awaits `WaitAsync(Timeout, ct)`. If the caller-supplied `ct` cancels at roughly the same time the timeout elapses, the order in which `WaitAsync` observes the timeout vs. the cance… |
|
||||
| Core.Scripting-010 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/ScriptSandboxTests.cs:54` | The sandbox-escape test suite covers only the four obvious vectors (File / Http / Process / Reflection) as direct member-access calls. It does not test: `typeof(forbidden)`, generic type arguments (`List<FileInfo>`), cast expressions to fo… |
|
||||
| Core.VirtualTags-002 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:237` | The cold-start guard `if (!AreInputsReady(ctxCache)) return;` silently abandons the evaluation when any input is null or Bad-quality. For a chained virtual tag (C depends on B depends on driver tag A), if A is still Bad at startup, B is sk… |
|
||||
| Core.VirtualTags-003 | Medium | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:117-120` | The upstream-subscription loop in `Load` iterates `definitions.SelectMany(d => _tags[d.Path].Reads)`. If `definitions` contains two rows with the same Path, the first registers `_tags[Path]` and the second overwrites it, but `definitions`… |
|
||||
| Core.VirtualTags-005 | Medium | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagSource.cs:50-64` | `SubscribeAsync` registers the per-path engine observers first (lines 52-56), then in a second loop reads the current value and fires the initial-data callback (lines 60-64). Between those two loops an upstream change can cascade and the e… |
|
||||
| Core.VirtualTags-008 | Medium | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:81-115` | `TransitiveDependentsInOrder` calls `TopologicalSort()` (a full O(V+E) Kahn pass plus a Dictionary rank build) on every invocation, and it is invoked from `CascadeAsync` on every upstream change event (`OnUpstreamChange`). On a large graph… |
|
||||
| Core.VirtualTags-012 | Medium | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags.Tests/` | Several behaviours of the engine have no test coverage: (1) the cold-start `AreInputsReady` guard -- no test exercises an upstream that is null/Bad at evaluation time and asserts the resulting tag state (see Core.VirtualTags-002); (2) `ctx… |
|
||||
| Driver.AbCip-004 | Medium | Correctness & logic bugs | `AbCipDataType.cs:51-58`, `LibplctagTagRuntime.cs:47-49,53` | `ToDriverDataType` maps `LInt`/`ULInt` to `DriverDataType.Int32` (a TODO comment notes the gap) and `Dt` to `Int32`. But `LibplctagTagRuntime.DecodeValueAt` returns an actual `long` for `LInt`/`ULInt` (`_tag.GetInt64`, `(long)_tag.GetUInt6… |
|
||||
| Driver.AbCip-005 | Medium | Correctness & logic bugs | `AbCipDriver.cs:124-141` | In `InitializeAsync`, when a `Structure` tag declares `Members`, the loop registers each fanned-out member into `_tagsByName` but the parent Structure tag itself is also left in `_tagsByName` (added at line 125 before the member check). A… |
|
||||
| Driver.AbCip-006 | Medium | OtOpcUa conventions | `PlcTagHandle.cs:28-59`, `AbCipDriver.cs:806-807,832-833`, `LibplctagTagRuntime.cs:117` | `driver-specs.md` makes the SafeHandle-wrapped native handle a non-negotiable Tier-B protection ("Wrap every libplctag handle in a SafeHandle with finalizer calling plc_tag_destroy"). The repo ships `PlcTagHandle : SafeHandle` for this, bu… |
|
||||
| Driver.AbCip-009 | Medium | Concurrency & thread safety | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:591-614` | `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act on a non-thread-safe `Dictionary` (`device.Runtimes` / `device.ParentRuntimes`). `ReadAsync` is `IReadable` and may be invoked concurrently: the server read path, ea… |
|
||||
| Driver.AbCip-010 | Medium | Error handling & resilience | `AbCipDriver.cs:621-648`, `AbCipDriver.cs:346-391` | Once `EnsureTagRuntimeAsync` successfully creates and initializes a `LibplctagTagRuntime`, that runtime is cached for the lifetime of the device and never re-created on failure. If the underlying native tag enters a permanently-bad state (… |
|
||||
| Driver.AbCip-014 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipStatusMapperTests.cs:28-40` | `AbCipStatusMapperTests.MapLibplctagStatus_maps_known_codes` asserts the mapper against the same wrong integer constants (-5, -7, -14, -16, -17) the production code uses (see Driver.AbCip-002). The test locks in the bug rather than catchin… |
|
||||
| Driver.AbCip.Cli-001 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/WriteCommand.cs:70-85` | `ParseValue` parses every numeric Logix type with the BCL `*.Parse` methods (`sbyte.Parse`, `short.Parse`, `int.Parse`, `float.Parse`, ...). These throw the raw `FormatException` and `OverflowException` on bad operator input. The module's… |
|
||||
| Driver.AbCip.Cli-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/ProbeCommand.cs:21-23`; `Commands/ReadCommand.cs:24-25`; `Commands/SubscribeCommand.cs:20-22` | `ProbeCommand`, `ReadCommand`, and `SubscribeCommand` expose `--type` as a free `AbCipDataType` enum option with no exclusion of `AbCipDataType.Structure`. Only `WriteCommand` rejects `Structure` (with an explicit `CommandException`). Pass… |
|
||||
| Driver.AbLegacy-002 | Medium | Correctness & logic bugs | `AbLegacyDriver.cs:368` | In `WriteBitInWordAsync` the parent word is decoded with `Convert.ToInt32(parentRuntime.DecodeValue(AbLegacyDataType.Int, ...))`. `LibplctagLegacyTagRuntime.DecodeValue` for `AbLegacyDataType.Int` returns `(int)_tag.GetInt16(0)` - a sign-e… |
|
||||
| Driver.AbLegacy-003 | Medium | Correctness & logic bugs | `AbLegacyAddress.cs:62-95` | `TryParse` does not reject several malformed PCCC addresses that the XML docs imply are invalid: - A sub-element and a bit index together (`T4:0.ACC/2`) parse successfully even though no PCCC element supports both. - I/O/S files with a fil… |
|
||||
| Driver.AbLegacy-004 | Medium | Correctness & logic bugs | `LibplctagLegacyTagRuntime.cs:36-37` | `DecodeValue` for `AbLegacyDataType.Bit` with `bitIndex == null` returns `_tag.GetInt8(0) != 0`. A bit-file element (`B3:0/0`) is a single bit inside a 16-bit word; reading only the low byte (`GetInt8(0)`) means a `Bit` tag whose live bit… |
|
||||
| Driver.AbLegacy-007 | Medium | Concurrency & thread safety | `AbLegacyDriver.cs:411-438`, `AbLegacyDriver.cs:386-409` | `EnsureTagRuntimeAsync` and `EnsureParentRuntimeAsync` are check-then-act: `device.Runtimes.TryGetValue(...)` then, after `await runtime.InitializeAsync`, `device.Runtimes[def.Name] = runtime`. `Dictionary` is not thread-safe, and two conc… |
|
||||
| Driver.AbLegacy-008 | Medium | Concurrency & thread safety | `AbLegacyDriver.cs:21`, `AbLegacyDriver.cs:138-146`, `AbLegacyDriver.cs:216-229` | `_health` is a plain non-volatile reference field mutated from `ReadAsync`, `WriteAsync` (both can run on multiple threads / poll loops) and `InitializeAsync`/`ShutdownAsync`, and read by `GetHealth()` from yet another thread. There is no… |
|
||||
| Driver.AbLegacy-009 | Medium | Error handling & resilience | `AbLegacyDriver.cs:41-74` | `InitializeAsync` starts probe loops with `Task.Run` inside the try block. If `InitializeAsync` fails - or is re-entered - after some probe loops are already started, the catch only sets `_health = Faulted` and rethrows; it does not cancel… |
|
||||
| Driver.AbLegacy-010 | Medium | Error handling & resilience | `AbLegacyStatusMapper.cs:26-56` | `MapLibplctagStatus` maps the integer codes -5/-7/-14/-16/-17. These do not match the native libplctag PLCTAG_ERR_* constants (PLCTAG_ERR_TIMEOUT = -32, PLCTAG_ERR_NOT_FOUND = -19, PLCTAG_ERR_NOT_ALLOWED = -18, PLCTAG_ERR_OUT_OF_BOUNDS = -… |
|
||||
| Driver.AbLegacy-012 | Medium | Design-document adherence | `PlcFamilies/AbLegacyPlcFamilyProfile.cs:7-54`, `AbLegacyDriver.cs:48-52` | `AbLegacyPlcFamilyProfile` declares four record properties - `DefaultCipPath`, `MaxTagBytes`, `SupportsStringFile`, `SupportsLongFile` - and only `LibplctagPlcAttribute` is ever consumed. In particular: - `DefaultCipPath` is dead: the per-… |
|
||||
| Driver.AbLegacy.Cli-001 | Medium | Error handling & resilience | `Commands/WriteCommand.cs:46`, `Commands/WriteCommand.cs:62-72` | `WriteCommand.ExecuteAsync` calls `ParseValue(Value, DataType)` at line 46, *before* the `try` block and outside any catch. `ParseValue` uses `short.Parse` / `int.Parse` / `float.Parse`, which throw `FormatException` on malformed input (`-… |
|
||||
| Driver.Cli.Common-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:101-122` | `FormatStatus` matches the full 32-bit status word for exact equality against the shortlist. OPC UA status codes carry sub-code/flag bits in the low 16 bits (info type, structure-changed, semantics-changed, limit bits, overflow, etc.). A d… |
|
||||
| Driver.Cli.Common-003 | Medium | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` | `ConfigureLogging` assigns the process-global `Serilog.Log.Logger` without disposing the previously assigned logger and the library never calls `Log.CloseAndFlush()`. Each call creates a fresh `Logger` via `CreateLogger()` and overwrites `… |
|
||||
| Driver.Cli.Common-005 | Medium | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/SnapshotFormatterTests.cs:27-37` | The `FormatStatus_names_well_known_status_codes` `[Theory]` asserts `0x80060000 => "BadTimeout"`, which encodes the wrong spec value (see Driver.Cli.Common-001). The test passes because it validates the formatter against the same incorrect… |
|
||||
| Driver.FOCAS-003 | Medium | Correctness & logic bugs | `FocasDriver.cs:71-79` | In `InitializeAsync`, capability-matrix validation only runs when `_devices.TryGetValue(tag.DeviceHostAddress, out var device)` succeeds. A tag whose `DeviceHostAddress` does not match any configured device (a common config typo, e.g. a tr… |
|
||||
| Driver.FOCAS-004 | Medium | OtOpcUa conventions | `FocasDriver.cs:374-379`, `WireFocasClient.cs:48-50` | `DiscoverAsync` emits user tags with `SecurityClass = tag.Writable ? SecurityClassification.Operate : SecurityClassification.ViewOnly`, and `FocasTagDefinition.Writable` defaults to `true` (also defaulted to `true` in the factory - `t.Writ… |
|
||||
| Driver.FOCAS-005 | Medium | Concurrency & thread safety | `FocasDriver.cs:28`, `FocasDriver.cs:206-215`, `FocasDriver.cs:261`, `FocasDriver.cs:274` | `_health` is a plain (non-volatile) field mutated from multiple concurrent contexts - `ReadAsync`, `WriteAsync`, and the per-device `ProbeLoopAsync` can all run on different threads simultaneously (subscriptions go through `PollGroupEngine… |
|
||||
| Driver.FOCAS-006 | Medium | Error handling & resilience | `FocasDriver.cs:859-874`, `WireFocasClient.cs:22-31` | `EnsureConnectedAsync` reuses the cached `IFocasClient` instance across a transient disconnect: it only checks `device.Client is { IsConnected: true }` and otherwise calls `ConnectAsync` again on the same object. For a `WireFocasClient` wh… |
|
||||
| Driver.FOCAS-012 | Medium | Testing coverage | `FocasDriverFactoryExtensions.cs`, `FocasDriver.cs:495-629` (`FixedTreeLoopAsync`) | The unit test project does not exercise `FocasDriverFactoryExtensions.CreateInstance` with `FixedTree` / `AlarmProjection` / `HandleRecycle` config sections - which is why the config-mapping gap in Driver.FOCAS-001 was not caught. There is… |
|
||||
| Driver.Galaxy-003 | Medium | Correctness & logic bugs | `Runtime/StatusCodeMap.cs:86` | `FromMxStatus` returns `Good` whenever `status.Success != 0`. The intent (per the surrounding comment "Honors the success flag") is that a non-zero `Success` means success. But if `MxStatusProxy.Success` is itself a native HRESULT/return c… |
|
||||
| Driver.Galaxy-004 | Medium | Correctness & logic bugs | `GalaxyDriver.cs:901` | `OnPumpDataChange` reconstructs a raw OPC DA quality byte from an OPC UA `StatusCode` for the probe watcher: it shifts `StatusCode >> 30` and maps `0->192, 1->64, _->0`. The `StatusCode` was itself produced upstream by `StatusCodeMap.FromQ… |
|
||||
| Driver.Galaxy-006 | Medium | Concurrency & thread safety | `GalaxyDriver.cs:848-861` | `OnAlarmFeedTransition` picks the "owner" handle with `_alarmSubscriptions.First()` under `_alarmHandlersLock`. `HashSet<T>.First()` enumeration order is unspecified and unstable across mutations — when multiple alarm subscriptions are act… |
|
||||
| Driver.Galaxy-007 | Medium | Concurrency & thread safety | `GalaxyDriver.cs:937-968` | `Dispose()` is not synchronized against the capability methods. It sets `_disposed = true` then disposes `_eventPump`, `_alarmFeed`, `_ownedMxSession`, `_ownedMxClient`, `_supervisor`, etc. A concurrent `SubscribeAsync`/`ReadAsync`/`WriteA… |
|
||||
| Driver.Galaxy-009 | Medium | Error handling & resilience | `GalaxyDriver.cs:354-371` | `StartDeployWatcher` launches the watch loop with `_ = _deployWatcher.StartAsync(CancellationToken.None)` — a fire-and-forget with a discarded `Task`. `StartAsync` can throw synchronously (`InvalidOperationException` if already started); t… |
|
||||
| Driver.Galaxy-011 | Medium | Performance & resource management | `GalaxyDriver.cs:411` | `GetMemoryFootprint()` unconditionally returns `0` with a comment "PR 4.4 sets this from SubscriptionRegistry size" — PR 4.4 has shipped (the registry exists and is used) but the method was never updated. `IHostConnectivityProbe.GetMemoryF… |
|
||||
| Driver.Galaxy-014 | Medium | Testing coverage | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy` (module-wide) | The reconnect/recovery path is the module's highest-risk surface and is effectively untested at the integration seam. The `ReconnectSupervisor` has a clean test seam (injectable `reopen`/`replay`/`backoffDelay`), but because nothing wires… |
|
||||
| Driver.Historian.Wonderware-002 | Medium | Correctness & logic bugs | `Ipc/HistorianFrameHandler.cs:162`, `:181` | `HandleWriteAlarmEventsAsync` dereferences `req.Events.Length` in both the `_alarmWriter is null` branch (line 162) and the catch block (line 181). MessagePack deserializes an absent or explicit-nil array field as a `null` reference, not `… |
|
||||
| Driver.Historian.Wonderware-003 | Medium | Correctness & logic bugs | `Backend/HistorianDataSource.cs:320-323`, `:457-460` | Raw and at-time reads decide whether a sample is a string or a numeric with `if (!string.IsNullOrEmpty(result.StringValue) && result.Value == 0)`. The `result.Value == 0` clause is intended to distinguish a real numeric zero from a string… |
|
||||
| Driver.Historian.Wonderware-006 | Medium | Error handling & resilience | `Ipc/PipeServer.cs:120-128` | `RunAsync` re-accepts connections in a `while` loop. If `RunOneConnectionAsync` throws synchronously and immediately on every iteration (for example `new NamedPipeServerStream(...)` fails because the pipe name is already in use, or `PipeAc… |
|
||||
| Driver.Historian.Wonderware-009 | Medium | Performance & resource management | `Backend/HistorianDataSource.cs:382-395`, `Ipc/Contracts.cs:85-99` | `ReadAggregateAsync` drains `query.MoveNext` into `results` with no upper bound, unlike `ReadRawAsync`, which honours `maxValues` / `MaxValuesPerRead` and breaks. `ReadProcessedRequest` carries no max-buckets field. A processed read over a… |
|
||||
| Driver.Historian.Wonderware.Client-002 | Medium | Correctness & logic bugs | `WonderwareHistorianClient.cs:154-199`, `IAlarmHistorianSink.cs:66-74` | `WriteBatchAsync` can never return `HistorianWriteOutcome.PermanentFail`. `HistorianWriteOutcome` defines three states (`Ack`, `RetryPlease`, `PermanentFail`) and the drain worker is documented to move the event to the dead-letter table on… |
|
||||
| Driver.Historian.Wonderware.Client-005 | Medium | Error handling & resilience | `Ipc/FrameReader.cs:31-32` | After reading the 4-byte length prefix, `ReadFrameAsync` reads the kind byte with the synchronous, blocking `_stream.ReadByte()` and ignores the `CancellationToken`. On a `NamedPipeClientStream` with `PipeOptions.Asynchronous`, a synchrono… |
|
||||
| Driver.Historian.Wonderware.Client-007 | Medium | Security | `WonderwareHistorianClient.cs:276` | `ToSnapshots` deserializes peer-supplied bytes with `MessagePackSerializer.Deserialize<object>(dto.ValueBytes)`, typeless MessagePack deserialization. The `object` overload resolves runtime types from the wire payload. The client treats th… |
|
||||
| Driver.Historian.Wonderware.Client-009 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/WonderwareHistorianClientTests.cs` | The suite covers happy paths, server-error, bad-secret, a single reconnect and health counters, but several critical paths are untested: (1) `ReadAtTimeAsync` with a partial/reordered sidecar reply, the contract-alignment case from finding… |
|
||||
| Driver.Modbus-002 | Medium | Correctness & logic bugs | `ModbusDriver.cs:127-186` | `ShutdownAsync` never clears `_tagsByName`, and `InitializeAsync` repopulates it with `_tagsByName[t.Name] = t` (`ModbusDriver.cs:134`) without clearing first. `ReinitializeAsync` calls `ShutdownAsync` then `InitializeAsync`. Because `_opt… |
|
||||
| Driver.Modbus-004 | Medium | Performance & resource management | `ModbusDriver.cs:1468-1473` | `DisposeAsync()` only disposes `_transport`. Unlike `ShutdownAsync`, it does not cancel/dispose `_probeCts` or `_reprobeCts`, nor dispose `_poll` (the `PollGroupEngine`). A caller that uses `await using` or `using` without first calling `S… |
|
||||
| Driver.Modbus-005 | Medium | Correctness & logic bugs | `ModbusDriver.cs:777-798,323-330` | `ReadRegisterBlockAsync` and `ReadBitBlockAsync` index `resp[1]` and call `Buffer.BlockCopy(resp, 2, ..., resp[1])` with no bounds validation. `ModbusTcpTransport.SendOnceAsync` validates only the MBAP length field and the exception high-b… |
|
||||
| Driver.Modbus-006 | Medium | Error handling & resilience | `ModbusDriver.cs:514-524,532-550` | `RunReprobeOnceForTestAsync` reads `_transport` once at the top (`var transport = _transport ?? throw ...`). If `ShutdownAsync` runs (setting `_transport = null` and disposing it) while a re-probe pass is mid-iteration, the loop keeps issu… |
|
||||
| Driver.Modbus.Addressing-002 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:86-94` | In the 3-field disambiguation, an empty 3rd field (`40001:F:`) reaches `parts[2].All(char.IsDigit)`. `Enumerable.All` returns true for an empty sequence, so the empty string is classified as a valid-shaped array count, assigned to `countPa… |
|
||||
| Driver.Modbus.Addressing-003 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:405-406`, `ModbusAddressParser.cs:128` | `LooksLikeByteOrderToken` classifies any 4-letter token as a byte-order token. A 3-field address whose 3rd field is a 4-letter type-like token (e.g. `40001:S:BOOL`) is routed into `TryParseByteOrder`, producing the misleading diagnostic "U… |
|
||||
| Driver.Modbus.Addressing-004 | Medium | Correctness & logic bugs | `ModbusAddressParser.cs:182-194` | The bit suffix is stripped using `text.IndexOf('.')` — the first dot. An input such as `40001.5.3` produces a bit text of "5.3", rejected by `byte.TryParse` with the generic "Bit index must be 0..15" message. A Modicon-style decimal-point… |
|
||||
| Driver.Modbus.Addressing-005 | Medium | Error handling & resilience | `ModbusAddressParser.cs:200-213` | `TryParseRegionAndOffset` tries family-native, then mnemonic, then Modicon. When all three fail it returns false with whatever error the Modicon parser last wrote (comment: "the Modicon error is the more specific diagnostic"). For a non-Ge… |
|
||||
| Driver.Modbus.Addressing-008 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/` | Several edge cases of the address arithmetic are untested or asserted wrong: (a) DL205 system V-memory mapping is tested only with the incorrect expected value (`ModbusFamilyParserTests.cs:20`, see finding -001); (b) there is no test for `… |
|
||||
| Driver.Modbus.Cli-001 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:43-51` | `SubscribeCommand` synthesises its `ModbusTagDefinition` with only `Name`, `Region`, `Address`, `DataType`, `Writable`, and `ByteOrder` — it never exposes or passes `--bit-index`, `--string-length`, or `--string-byte-order`. A user running… |
|
||||
| Driver.Modbus.Cli-002 | Medium | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/WriteCommand.cs:54-89` | `WriteCommand` rejects read-only regions (`DiscreteInputs` / `InputRegisters`) but does not validate that `--type` is meaningful for the `Coils` region. `write -r Coils -a 5 -t UInt16 -v 42` builds a `Coils` tag with `DataType = UInt16`; t… |
|
||||
| Driver.OpcUaClient-006 | Medium | Concurrency & thread safety | `OpcUaClientDriver.cs:1330-1359` | `OnReconnectComplete` mutates `Session` (line 1347) directly from the reconnect-handler callback thread with no synchronization against `ReadAsync`/`WriteAsync`/`DiscoverAsync`/`ShutdownAsync`. `Session` is a plain auto-property with no memory barrier… |
|
||||
| Driver.OpcUaClient-007 | Medium | Concurrency & thread safety | `OpcUaClientDriver.cs:1374`, `:1376-1383`, `:508` | Two disposal races. (1) `Dispose()` does `DisposeAsync().AsTask().GetAwaiter().GetResult()`, blocking synchronously on async work. The Galaxy stability review (`driver-stability.md`, the 2026-04-13 findings) explicitly calls out sync-over-async… |
|
||||
| Driver.OpcUaClient-008 | Medium | Error handling & resilience | `OpcUaClientDriver.cs:1092-1099` | `AcknowledgeAsync` issues the batched `CallAsync` and then catches all exceptions with a best-effort empty `catch`; it also never inspects the per-call results in the success path (`_ = await session.CallAsync(...)`). An alarm acknowledgment the… |
|
||||
| Driver.OpcUaClient-009 | Medium | Error handling & resilience | `OpcUaClientDriver.cs:560-564` | The `catch` block in `WriteAsync` fans out `BadCommunicationError` across the whole batch on any exception. Writes are non-idempotent by default (`IWritable` remarks, decision #44/#45): a timeout exception may fire after the upstream server already app… |
|
||||
| Driver.OpcUaClient-010 | Medium | Correctness & logic bugs | `OpcUaClientDriver.cs:823-824` | `MapUpstreamDataType` maps `DataTypeIds.Byte` (the OPC UA unsigned 8-bit type) to `DriverDataType.Int16`. `Byte` should map to an unsigned driver type (`UInt16` is the smallest unsigned available, matching how `SByte` belongs with the signed family).… |
|
||||
| Driver.OpcUaClient-012 | Medium | Security | `OpcUaClientDriver.cs:210-217` | When `AutoAcceptCertificates` is true the driver registers a `CertificateValidation` handler that accepts only `StatusCodes.BadCertificateUntrusted`. A self-signed or otherwise untrusted server certificate frequently fails validation with a diff… |
|
||||
| Driver.OpcUaClient-013 | Medium | Performance & resource management | `OpcUaClientDriver.cs:436-437` | `GetMemoryFootprint()` is hard-coded to return 0 and `FlushOptionalCachesAsync` is a no-op `Task.CompletedTask`. `docs/v2/driver-stability.md` section "In-process only (Tier A/B)" makes per-instance allocation tracking a contract requirement, and… |
|
||||
| Driver.OpcUaClient-015 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/*`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientSmokeTests.cs` | Unit-test coverage is solid for the pure mappers (`MapSeverity`, `MapUpstreamDataType`, `MapSecurityPolicy`, `MapAggregateToNodeId`, `BuildCertificateIdentity`, `ResolveEndpointCandidates`) and for "throws before init" guards, but the highest-risk beh… |
|
||||
| Driver.S7-002 | Medium | Correctness & logic bugs | `S7Driver.cs:350` | `MapDataType` collapses `S7DataType.UInt32` to `DriverDataType.Int32`. `UInt32` values above `int.MaxValue` (2^31-1) wrap to negative when surfaced to the OPC UA client, silently corrupting the value. The inline comment only flags `Int64`/`UInt64` as "w… |
|
||||
| Driver.S7-004 | Medium | OtOpcUa conventions | `S7Driver.cs` (whole file) | The driver performs no logging. `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. Every error path is an empty `catch` block (Initialize cleanup line 130, `ShutdownAsync` lines 142/149/153, `ProbeLoop` line 483, PollL… |
|
||||
| Driver.S7-008 | Medium | Error handling & resilience | `S7Driver.cs:286` | The `WriteAsync` catch ladder is coarser than the one in `ReadAsync` and loses information. The generic `catch (Exception)` maps everything - socket errors, timeouts, `OverflowException` from `Convert.ToInt16` of an out-of-range value, `NullReferenceException` from… |
|
||||
| Driver.S7-012 | Medium | Design-document adherence | `S7DriverOptions.cs:59`, `S7Driver.cs:457` | `S7ProbeOptions.ProbeAddress` is configured (default `"MW0"`), documented at length ("the driver runs a tick loop that issues a cheap read against S7ProbeOptions.ProbeAddress"), surfaced in the factory DTO (`S7ProbeDto.ProbeAddress`), and parsed… |
|
||||
| Driver.S7-014 | Medium | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.S7.Tests/` | Test coverage has notable gaps in the driver's behavioural core: (1) no test exercises the `ReadOneAsync` type-reinterpret switch (`Int16` from `ushort`, `Int32` from `uint`, `Float32` from `UInt32` bits) - the most logic-heavy method in the driver is un… |
|
||||
| Driver.S7.Cli-001 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/WriteCommand.cs:65-80` | `WriteCommand.ParseValue` parses numeric and `DateTime` values with the raw BCL parsers (`short.Parse`, `float.Parse`, `DateTime.Parse`, etc.). On malformed input these throw `FormatException` / `OverflowException`, which are *not* `CliFx.… |
|
||||
| Driver.S7.Cli-002 | Medium | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ReadCommand.cs:22-29`, `Commands/WriteCommand.cs:21-33`, `Commands/SubscribeCommand.cs:18-21`; `docs/Driver.S7.Cli.md:70-73,80-81` | The `--type` option help text on `read`, `write`, and `subscribe` advertises the full `S7DataType` set (`Int64 / UInt64 / Float64 / String / DateTime`), and `docs/Driver.S7.Cli.md` shows a worked `read ... -t String --string-length 80` exa… |
|
||||
| Driver.S7.Cli-003 | Medium | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:38-50` | The `ProbeCommand` XML doc and the `Driver.S7.Cli.md` "fastest is the device talking" framing say the probe "connects ... prints health" and "surfaces `BadNotSupported`" when PUT/GET is disabled. But when the PLC is unreachable (connection ref… |
|
||||
| Driver.TwinCAT-003 | Medium | Correctness & logic bugs | `AdsTwinCATClient.cs:264-281`, `283-300` | `MapToClrType` has a `_ => typeof(int)` fallthrough and `ConvertForWrite` has a `_ => throw NotSupportedException` fallthrough. `TwinCATDataType.Structure` is a declared enum member, and a config-supplied tag can carry `DataType: "Structur… |
|
||||
| Driver.TwinCAT-005 | Medium | OtOpcUa conventions | `TwinCATDriver.cs` (whole file), `AdsTwinCATClient.cs` (whole file) | The driver performs no logging. `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. Connect failures, ADS error codes, symbol-browse failures (`DiscoverAsync` swallows them in a bare `catch`), notification-regis… |
|
||||
| Driver.TwinCAT-009 | Medium | Concurrency & thread safety | `TwinCATDriver.cs:80-99`, `41-72`, `366-388` | `ShutdownAsync` mutates `_devices`, `_tagsByName`, and `_nativeSubs` with no synchronization while `ReadAsync`/`WriteAsync`/`SubscribeAsync` may be iterating or indexing those same plain `Dictionary<>` instances on other threads (`_devices… |
|
||||
| Driver.TwinCAT-010 | Medium | Error handling & resilience | `AdsTwinCATClient.cs:178-195` | `BrowseSymbolsAsync` checks `cancellationToken.IsCancellationRequested` and does `yield break` (a clean completion) rather than throwing `OperationCanceledException`. `DiscoverAsync` (`TwinCATDriver.cs:274`) explicitly has `catch (Operatio… |
|
||||
| Driver.TwinCAT-011 | Medium | Error handling & resilience | `TwinCATStatusMapper.cs:29-42` | ADS error-code mapping has gaps and an inconsistency versus `docs/v2/driver-specs.md` section 6. The spec documents symbol-not-found as 0x0701 (1793 decimal) and symbol-version-changed as 0x0702 (1794 decimal). `MapAdsError` maps decimal 1… |
|
||||
| Driver.TwinCAT-012 | Medium | Performance & resource management | `TwinCATDriver.cs:102`, `AdsTwinCATClient.cs:178-195` | `GetMemoryFootprint()` returns a hard-coded 0. `docs/v2/driver-stability.md` section "In-process only (Tier A/B) — driver-instance allocation tracking" requires the footprint to reflect "bytes attributable to their own caches (symbol cache… |
|
||||
| Server-003 | Medium | Correctness & logic bugs | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` | `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order — so results are oldest-first. Also `maxValuesPerNode` is capped against total buffer size *before* t… |
|
||||
| Server-005 | Medium | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` | `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` → `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synch… |
|
||||
| Server-007 | Medium | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` | `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used — `/healthz` always reports `configDbReachable = true` and never 503s on a DB outage. `_staleConfigFlag` is also never supplied by `Pr… |
|
||||
| Server-010 | Medium | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` | `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client cert failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag… |
|
||||
| Server-011 | Medium | Security | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` | `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired… |
|
||||
| Server-013 | Medium | Design-document adherence | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` | `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a po… |
|
||||
| Admin-010 | Low | OtOpcUa conventions | `Components/App.razor:9,16` | `App.razor` loads Bootstrap CSS and JS from the `cdn.jsdelivr.net` CDN. `admin-ui.md` section "Tech Stack" specifies "Bootstrap 5 vendored under `wwwroot/lib/bootstrap/`" precisely so the Admin app has no third-party runtime dependency. A… |
|
||||
| Admin-011 | Low | Concurrency & thread safety | `Hubs/FleetStatusPoller.cs:24-26,98-103` | `FleetStatusPoller` keeps three plain `Dictionary<>` fields (`_last`, `_lastRole`, `_lastResilience`) mutated from `PollOnceAsync`. The poller `ExecuteAsync` loop is single-threaded so the steady-state poll path is safe, but `ResetCache()`… |
|
||||
| Admin-012 | Low | Design-document adherence | `Services/EquipmentCsvImporter.cs:18-19,33-37,229,232` | `EquipmentCsvImporter` declares `EquipmentId` as a required CSV column and parses it into a `required` field. `admin-ui.md` section "Equipment CSV import" (revised after adversarial review finding #4) is explicit: "No `EquipmentId` column… |
|
||||
| Analyzers-002 | Low | Correctness & logic bugs | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:46-50,130` | `AlarmSurfaceInvoker` is listed in `WrapperTypes`, but `AlarmSurfaceInvoker`'s public methods (`SubscribeAsync`, `UnsubscribeAsync`, `AcknowledgeAsync`) take no lambda arguments at all — callers pass `IReadOnlyList<...>` / `IAlarmSubscript… |
|
||||
| Analyzers-003 | Low | Error handling & resilience | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:80,114-116` | `IsInsideWrapperLambda` is passed `context.Operation.SemanticModel` and returns `false` when that model is `null`. A `false` return means "not wrapped", so a null semantic model produces a false-positive diagnostic rather than silently ski… |
|
||||
| Analyzers-004 | Low | Performance & resource management | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:95-112` | `ImplementsGuardedInterface` runs on every invocation operation in the compilation (every keystroke in the IDE). For each candidate it allocates via `AllInterfaces.Concat(new[] { method.ContainingType })`, builds a fully-qualified display… |
|
||||
| Analyzers-005 | Low | Design-document adherence | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:33-43` | `CapabilityInvoker`'s XML doc (`src/Core/.../Resilience/CapabilityInvoker.cs:15-17`) enumerates the routed capability surface as `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, and all… |
|
||||
| Analyzers-007 | Low | Documentation & comments | `src/Tooling/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs:21-26` | The `<remarks>` block states the analyzer "matches by receiver-interface identity using Roslyn's semantic model, not by method name". This is accurate for the guarded-call detection (`ImplementsGuardedInterface` uses symbols), but the wrap… |
|
||||
| Client.CLI-002 | Low | Correctness & logic bugs | `Commands/SubscribeCommand.cs:129-137` | The summary computes `neverWentBad` as every target whose node-id key is absent from the `everBad` dictionary. A node that received no update at all is also absent from `everBad`, so it is counted in `neverWentBad` and printed under the he… |
|
||||
| Client.CLI-003 | Low | Correctness & logic bugs | `Commands/BrowseCommand.cs:29-30`, `Commands/SubscribeCommand.cs:20-27`, `Commands/AlarmsCommand.cs:28-29`, `Commands/HistoryReadCommand.cs:42-43` | Numeric command options accept any value with no range validation. `--depth`, `--interval`, `--max-depth`, `--max`, and the history `--interval` can all be supplied as `0` or a negative number. A negative `--depth`/`--max-depth` silently d… |
|
||||
| Client.CLI-004 | Low | OtOpcUa conventions | `Commands/SubscribeCommand.cs:13-37` | `SubscribeCommand` is the only command in the module whose constructor and all `[CommandOption]` properties have no XML doc comments. Every other command (`ConnectCommand`, `ReadCommand`, `WriteCommand`, `BrowseCommand`, `AlarmsCommand`, `… |
|
||||
| Client.CLI-006 | Low | Error handling & resilience | `Commands/HistoryReadCommand.cs:73`, `Commands/HistoryReadCommand.cs:76`, `Helpers/NodeIdParser.cs:39` | Operator input-format errors surface as raw .NET exceptions rather than clean CLI errors. An unparseable start/end value throws `FormatException` straight out of `DateTime.Parse`; an invalid node id throws `FormatException`/`ArgumentExcept… |
|
||||
| Client.CLI-007 | Low | Performance & resource management | `CommandBase.cs:112-123` | `ConfigureLogging` builds a new Serilog `LoggerConfiguration`, creates a logger, and assigns it to the static `Log.Logger` without disposing the previously assigned logger. For a single CLI invocation this leaks at most one logger and the… |
|
||||
| Client.CLI-008 | Low | Documentation & comments | `docs/Client.CLI.md:158-217` | `docs/Client.CLI.md` is stale relative to the code at this commit. (1) The `subscribe` command section documents only `-n` and `-i`, but the code (`SubscribeCommand`) also exposes `-r/--recursive`, `--max-depth`, `-q/--quiet`, `--duration`… |
|
||||
| Client.CLI-009 | Low | Code organization & conventions | `Commands/SubscribeCommand.cs:66-165`, `Commands/AlarmsCommand.cs:52-91` | Both long-running commands attach an event handler (`service.DataChanged += ...`, `service.AlarmEvent += ...`) with a lambda and never detach it. Because the handler closes over `console`, the captured console and the closure remain refere… |
|
||||
| Client.CLI-010 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.CLI.Tests/SubscribeCommandTests.cs` | The new `SubscribeCommand` capabilities are largely untested. The four `SubscribeCommandTests` cover only single-node subscribe, unsubscribe-on-cancel, disconnect-in-finally, and the subscription message. There is no test for the `--recurs… |
|
||||
| Client.Shared-003 | Low | Correctness & logic bugs | `Adapters/DefaultSessionAdapter.cs:76`, `Adapters/DefaultSessionAdapter.cs:273` | `WriteValueAsync` returns `response.Results[0]` and `CallMethodAsync` reads `result.Results[0]` without first checking the `Results` collection is non-empty. A malformed or service-level-faulted response (empty `Results` alongside a servic… |
|
||||
| Client.Shared-004 | Low | OtOpcUa conventions | `Adapters/DefaultSessionAdapter.cs:228`, `Adapters/DefaultSessionAdapter.cs:121`, `Adapters/DefaultSessionAdapter.cs:172` | `CloseAsync`, `HistoryReadRawAsync`, and `HistoryReadAggregateAsync` are declared `async Task` but call the synchronous `Session.Close()` / `Session.HistoryRead(...)` APIs and contain no `await`. The history methods run a blocking synchron… |
|
||||
| Client.Shared-009 | Low | Error handling & resilience / Documentation & comments | `OpcUaClientService.cs:302-322` | `AcknowledgeAlarmAsync` is typed `Task<StatusCode>` and its XML doc implies the returned code reports the ack outcome, but the method unconditionally returns `StatusCodes.Good`. The actual failure path is `DefaultSessionAdapter.CallMethodAs… |
|
||||
| Client.Shared-010 | Low | Performance & resource management | `Models/ConnectionSettings.cs:48`, `OpcUaClientService.cs:408-417` | `ConnectionSettings.CertificateStorePath` is initialized to `ClientStoragePaths.GetPkiPath()` as a property initializer, so every `ConnectionSettings` instantiation runs `Environment.GetFolderPath` + `Path.Combine` and, on the first call p… |
|
||||
| Client.Shared-011 | Low | Testing coverage | `tests/Client/ZB.MOM.WW.OtOpcUa.Client.Shared.Tests/OpcUaClientServiceTests.cs` | The test suite is solid for the happy paths, connection lifecycle, and single-failover behavior. Gaps relative to the findings above: (a) no test exercises concurrent `SubscribeAsync`/failover to expose the `_activeDataSubscriptions` race… |
|
||||
| Client.UI-003 | Low | OtOpcUa conventions | `ZB.MOM.WW.OtOpcUa.Client.UI.csproj:20-21`, `Program.cs:14-20` | The csproj references `Serilog` and `Serilog.Sinks.Console`, and `docs/Client.UI.md` lists Serilog as the logging technology, but no source file in the module uses Serilog. `Program.BuildAvaloniaApp()` uses Avalonia's `LogToTrace()` and th… |
|
||||
| Client.UI-004 | Low | OtOpcUa conventions | `Views/MainWindow.axaml.cs:125-138` | `OnBrowseCertPathClicked` uses `OpenFolderDialog`, which is obsolete in Avalonia 11.x (the version pinned in the csproj). The supported replacement is the `StorageProvider` API (`StorageProvider.OpenFolderPickerAsync`). Using the obsolete… |
|
||||
| Client.UI-006 | Low | Error handling & resilience | `ViewModels/MainWindowViewModel.cs:244-252`, `ViewModels/AlarmsViewModel.cs:88-112`, `ViewModels/SubscriptionsViewModel.cs:79-94` | Many catch blocks swallow exceptions silently with an empty body and only a comment (`// Redundancy info not available`, `// Subscribe failed`, `// Subscription failed; no item added`, and others). When a subscribe, alarm-subscribe, or red… |
|
||||
| Client.UI-009 | Low | Design-document adherence | `ViewModels/HistoryViewModel.cs:44-54` | `HistoryViewModel.AggregateTypes` exposes eight entries: `null` (Raw) plus Average, Minimum, Maximum, Count, Start, End, and `StandardDeviation`. `docs/Client.UI.md` ("Query Options" table) lists only "Raw (default), Average, Minimum, Maxi… |
|
||||
| Client.UI-010 | Low | Code organization & conventions | `Controls/DateTimeRangePicker.axaml.cs:33-37`, `Controls/DateTimeRangePicker.axaml.cs:70-80` | `DateTimeRangePicker` declares `MinDateTimeProperty` / `MaxDateTimeProperty` styled properties with public CLR accessors, but neither is read anywhere in the control. `TryParseDateTime`, `OnStartLostFocus`, and `OnEndLostFocus` never clamp… |
|
||||
| Client.UI-011 | Low | Documentation & comments | `Views/MainWindow.axaml:81`, `Services/JsonSettingsService.cs:11-15` | The certificate-store-path `TextBox` watermark reads `(default: AppData/LmxOpcUaClient/pki)`, referencing the legacy pre-task-#208 folder name. Per `CLAUDE.md` / `docs/Client.UI.md` the canonical path is now `{LocalAppData}/OtOpcUaClient/`… |
|
||||
| Configuration-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Enums/NodePermissions.cs:8`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/OtOpcUaConfigDbContext.cs:417` | `NodePermissions` is declared `[Flags] enum ... : uint`, while its XML doc and `NodeAcl.PermissionFlags`' doc both say "stored as int", and `ConfigureNodeAcl` uses `HasConversion<int>()` — a `uint`→`int` conversion. Only bits 0–11 are used… |
|
||||
| Configuration-005 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/LiteDbConfigCache.cs:50` | `PutAsync` performs a non-atomic find-then-insert/update. Two concurrent `PutAsync` calls for the same `(ClusterId, GenerationId)` can both observe `existing is null` and both `Insert`, producing two rows for one generation. The constructo… |
|
||||
| Configuration-007 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:44` | `ApplyPass` wraps each callback in `catch (Exception ex)`. This swallows `OperationCanceledException` — a cancellation during a callback is recorded as just another entity error string and the applier keeps walking the remaining passes ins… |
|
||||
| Configuration-010 | Low | Security | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/LocalCache/ResilientConfigReader.cs:81` | On central-DB read failure the warning log records the full exception object. Callers pass arbitrary `centralFetch` delegates; if any delegate closes over a connection string, an exception thrown from it (or a `SqlException` carrying serve… |
|
||||
| Configuration-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Apply/GenerationApplier.cs:7`, `src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Validation/DraftValidator.cs:60` | The companion test project covers the cache, schema compliance, stored procedures, and `DraftValidator` well, but two flagged behaviours are not pinned: (a) `GenerationApplier` ordering/cancellation when a Removed callback fails — no test… |
|
||||
| Core-004 | Low | OtOpcUa conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Hosting/DriverHost.cs:55,72,87` | `DriverHost` is a library type whose async calls (`driver.InitializeAsync`, `driver.ShutdownAsync`) do not use `ConfigureAwait(false)`, whereas the sibling `CapabilityInvoker` and `AlarmSurfaceInvoker` in the same module consistently do. T… |
|
||||
| Core-008 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs:42-64` | The XML summary of `BuildAddressSpaceAsync` states "Driver exceptions are isolated per decision #12 — the driver's subtree is marked Faulted, but other drivers remain available." The method body contains no such isolation: an exception fro… |
|
||||
| Core-009 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/CapabilityInvoker.cs:121-128` | `ExecuteWriteAsync` calls `_optionsAccessor()` three times for a single non-idempotent write (once for the `with` expression, once inside the dictionary initializer for `.Resolve(...)`, plus the discarded base). On the per-write hot path i… |
|
||||
| Core-010 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Resilience/DriverResilienceOptions.cs:45-52` | `DriverResilienceOptions.Resolve` indexes the tier-default dictionary directly (`defaults[capability]`) with no fallback. Any future addition to `DriverCapability` that is not also added to all three tier tables in `GetTierDefaults` will m… |
|
||||
| Core-011 | Low | Testing coverage | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Authorization/PermissionTrieBuilder.cs:58-75` | `PermissionTrieBuilder.Descend` has a two-branch behaviour: with a `scopePaths` lookup it descends the real hierarchy; without one it falls back to placing every non-cluster row directly under the root keyed by `ScopeId` ("works for determ… |
|
||||
| Core-012 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core/Stability/WedgeDetector.cs:26`, `src/Core/ZB.MOM.WW.OtOpcUa.Core/Observability/DriverHealthReport.cs:11-22` | Two stale doc comments. (1) `WedgeDetector` — the `<summary>` above the constructor reads "Whether the driver reported itself `DriverState.Healthy` at construction." The constructor takes only a `TimeSpan threshold` and the detector is doc… |
|
||||
| Core.Abstractions-004 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverTypeRegistry.cs:23-40` | `Register` performs a check-then-act sequence (`snapshot.ContainsKey` then build `next` then `Interlocked.Exchange`) that is not atomic. Two threads registering concurrently can both pass the duplicate check and both build a `next` diction… |
|
||||
| Core.Abstractions-005 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/PollGroupEngine.cs:90,99` | Both the initial-poll and steady-state catch blocks use a bare `catch { }` that swallows every exception type, including non-transient programmer errors such as `NullReferenceException` and `ArgumentOutOfRangeException` (see Core.Abstracti… |
|
||||
| Core.Abstractions-006 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:63,84-86`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/Historian/IHistorianDataSource.cs:30,63` | The two history-read surfaces use inconsistent integer types for the same "maximum rows" concept. `IHistoryProvider.ReadRawAsync` and `IHistorianDataSource.ReadRawAsync` take `uint maxValuesPerNode`, but `ReadEventsAsync` (on both interfac… |
|
||||
| Core.Abstractions-007 | Low | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions.Tests/PollGroupEngineTests.cs` | `PollGroupEngine` is the only behavioural (non-DTO) type in the module and its tests, while solid for the happy paths, miss two paths that this review identifies as defect-prone: (a) no test exercises an array-valued tag whose contents are… |
|
||||
| Core.Abstractions-008 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverHealth.cs:9`, `src/Core/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IHistoryProvider.cs:39-43,65-69` | Two XML-doc inaccuracies: 1. `DriverHealth.LastError` is documented as "Most recent error message; null when state is Healthy." The `DriverState` enum also defines `Degraded`, `Reconnecting`, and `Faulted` states, all of which carry an err… |
|
||||
| Core.AlarmHistorian-008 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs:107-127,255-278` | Each `EnqueueAsync` (one per alarm transition — a hot path on a busy plant) opens a connection, runs `EnforceCapacity` (a `COUNT(*)` over the queue table on every single enqueue), serializes JSON, inserts, and closes the connection. The un… |
|
||||
| Core.AlarmHistorian-011 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs:5-9,76`, `AlarmHistorianEvent.cs:20` | Several doc-comments reference the retired v1 architecture. The `IAlarmHistorianSink` summary says ingestion "routes through Galaxy.Host's pipe" and `IAlarmHistorianWriter` says "Stream G wires this to the Galaxy.Host IPC client", but `doc… |
|
||||
| Core.ScriptedAlarms-003 | Low | Documentation & comments | `ScriptedAlarmEngine.cs:343`, `docs/ScriptedAlarms.md:107` | `docs/ScriptedAlarms.md` (Composition step 3) and the `OnUpstreamChange` comment ("Fire-and-forget so driver-side dispatch isn't blocked", lines 225-226) describe the `OnEvent` emission path as non-blocking / fire-and-forget. In the code, `… |
|
||||
| Core.ScriptedAlarms-006 | Low | Concurrency & thread safety | `ScriptedAlarmEngine.cs:232`, `ScriptedAlarmEngine.cs:369` | `OnUpstreamChange` and `RunShelvingCheck` both launch fire-and-forget tasks (`_ = ReevaluateAsync(...)`, `_ = ShelvingCheckAsync(...)`) with `CancellationToken.None`. There is no tracking of these in-flight tasks, so `Dispose` cannot await… |
|
||||
| Core.ScriptedAlarms-008 | Low | Performance & resource management | `Part9StateMachine.cs:261-268` | `AppendComment` copies the entire existing comment list into a new `List` on every audit-producing transition (ack, confirm, shelve, unshelve, enable, disable, add-comment, auto-unshelve). The `Comments` list is append-only and unbounded —… |
|
||||
| Core.ScriptedAlarms-009 | Low | Performance & resource management | `ScriptedAlarmEngine.cs:309-315`, `ScriptedAlarmEngine.cs:271` | `BuildReadCache` allocates a fresh `Dictionary<string, DataValueSnapshot>` on every predicate evaluation, i.e. on every upstream tag change for every referencing alarm. On a busy line where many tags feeding many alarms change frequently,… |
|
||||
| Core.ScriptedAlarms-010 | Low | Design-document adherence | `ScriptedAlarmEngine.cs:325-336`, `AlarmPredicateContext.cs:33-40`, `MessageTemplate.cs:47` | Quality handling is inconsistent across the three places that inspect a `DataValueSnapshot.StatusCode`. `AreInputsReady` (engine, line 333) treats only outright Bad (bit 31) as not-ready, so an Uncertain-quality input is fed to the predica… |
|
||||
| Core.ScriptedAlarms-011 | Low | Code organization & conventions | `Part9StateMachine.cs:275` | `TransitionResult.NoOp(state, reason)` takes a `reason` string parameter that is documented in the calling code as a diagnostic ("disabled — predicate result ignored", "already acknowledged", etc.) but the factory method silently discards… |
|
||||
| Core.Scripting-005 | Low | Correctness & logic bugs | `DependencyExtractor.cs:97` | A raw string literal token passed as the tag path (a raw triple-quote literal) tokenizes as `SingleLineRawStringLiteralToken` / `MultiLineRawStringLiteralToken`, not `StringLiteralToken`. The check `literal.Token.IsKind(SyntaxKind.StringLi… |
|
||||
| Core.Scripting-006 | Low | Concurrency & thread safety | `CompiledScriptCache.cs:55` | On a failed compile the `catch` block calls `_cache.TryRemove(key, out _)` without a value comparison. If two threads race a miss for the same bad source, both observe the same faulted `Lazy` and throw, and both call `TryRemove(key)`. If a… |
|
||||
| Core.Scripting-008 | Low | Performance & resource management | `CompiledScriptCache.cs:34`, `ScriptEvaluator.cs:34` | `CompiledScriptCache` has no capacity bound (acknowledged in the class remarks) and no eviction. Each cached `ScriptEvaluator` holds a Roslyn `ScriptRunner<T>` delegate, which keeps the dynamically emitted script assembly loaded for the pr… |
|
||||
| Core.Scripting-009 | Low | Design-document adherence | `ForbiddenTypeAnalyzer.cs:45` | The Phase 7 plan decision #6 (`docs/v2/implementation/phase-7-scripting-and-alarming.md`) enumerates the forbidden surface as "No HttpClient / File / Process / reflection". `ForbiddenTypeAnalyzer` actually denies a broader set — `System.Th… |
|
||||
| Core.Scripting-011 | Low | Testing coverage | `tests/Core/ZB.MOM.WW.OtOpcUa.Core.Scripting.Tests/` | Two source files have no direct test coverage: `ScriptContext` (`Deadband` static helper is exercised only indirectly through `ScriptSandboxTests`, and not for its boundary `tolerance` behaviour) and `ScriptSandbox.Build` itself (the `Argu… |
|
||||
| Core.VirtualTags-004 | Low | Correctness & logic bugs | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:349` | `CoerceResult`'s switch has a default arm (`_ => raw`) that returns the script's raw return value uncoerced for any `DriverDataType` not in the explicit list (e.g. an array type, Byte, or a future enum member). The resulting `DataValueSnap… |
|
||||
| Core.VirtualTags-006 | Low | Concurrency & thread safety | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:177-182`, `:395-401` | `Subscribe` does `_observers.GetOrAdd(path, _ => [])` then `lock (list) { list.Add(observer); }`. When `Unsub.Dispose` removes the last observer, the now-empty List is left in `_observers` and the dictionary entry is never removed. For a l… |
|
||||
| Core.VirtualTags-007 | Low | Error handling & resilience | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/TimerTriggerScheduler.cs:58` | `Tick` calls `_engine.EvaluateOneAsync(p, _cts.Token).GetAwaiter().GetResult()`, blocking the `System.Threading.Timer` callback thread (a thread-pool thread) for the full duration of the evaluation. Because `EvaluateInternalAsync` serialis… |
|
||||
| Core.VirtualTags-009 | Low | Performance & resource management | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:64-65`, `:72-73` | `DirectDependencies` and `DirectDependents` allocate a fresh empty `HashSet<string>` on every call for an unregistered node. `DirectDependents` is called inside the `TopologicalSort` Kahn loop and the `CascadeAsync` DFS, so for a graph wit… |
|
||||
| Core.VirtualTags-010 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/ITagUpstreamSource.cs:18`, `VirtualTagContext.cs:30`, `VirtualTagDefinition.cs:28` | Several XML docs reference component names that do not exist in the codebase. The `ITagUpstreamSource` XML doc says the subscription path "feeds the engine's ChangeTriggerDispatcher", but there is no `ChangeTriggerDispatcher`; the actual path is `… |
|
||||
| Core.VirtualTags-011 | Low | Code organization & conventions | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/VirtualTagEngine.cs:404-409` | `VirtualTagState` records a `Writes` set (the `ctx.SetVirtualTag` targets extracted by `DependencyExtractor`), but nothing in the engine reads it: it is captured at `Load` and never used. Declared write targets are not validated against th… |
|
||||
| Core.VirtualTags-013 | Low | Documentation & comments | `src/Core/ZB.MOM.WW.OtOpcUa.Core.VirtualTags/DependencyGraph.cs:266-270` | `DependencyCycleException.BuildMessage` renders each cycle as `string.Join(" -> ", c) + " -> " + c[0]`, presenting the SCC member list as a traversable edge path that loops back to its first element. Tarjan's algorithm returns the members… |
|
||||
| Driver.AbCip-007 | Low | OtOpcUa conventions | `AbCipDriver.cs` (whole file), `AbCipAlarmProjection.cs`, `LibplctagTagRuntime.cs` | `CLAUDE.md` Library Preferences mandate Serilog with a rolling daily file sink. The driver has no logging at all: no `ILogger`/Serilog dependency is injected or used. Failure paths instead swallow exceptions into the `_health` string (`Rea… |
|
||||
| Driver.AbCip-011 | Low | Error handling & resilience | `AbCipDriver.cs:144-152`, `AbCipDriverOptions.cs:131-143` | `InitializeAsync` only starts probe loops when `_options.Probe.Enabled` is true AND `Probe.ProbeTagPath` is non-blank. When `Probe.Enabled` is true (the default) but `ProbeTagPath` is null (also the default; the doc comment says "PR 8 wire… |
|
||||
| Driver.AbCip-012 | Low | Performance & resource management | `LibplctagTemplateReader.cs:15-35`, `AbCipDriver.cs:88-92` | `LibplctagTemplateReader` is created per `FetchUdtShapeAsync` call, and each call constructs a fresh libplctag `Tag` for the @udt pseudo-tag, initializes it (a CIP connection handshake), reads, and disposes it. There is no reuse of the `Ta… |
|
||||
| Driver.AbCip-013 | Low | Design-document adherence | `AbCipDriverOptions.cs:70-73`, `PlcFamilies/AbCipPlcFamilyProfile.cs:13-19`, `LibplctagTagRuntime.cs:16-27` | `driver-specs.md` specifies the AB CIP per-device connection settings as discrete fields: Host, Path, PlcType, TimeoutMs, AllowPacking, ConnectionSize. The implementation instead collapses host + path into a single opaque ab:// URL string… |
|
||||
| Driver.AbCip-015 | Low | Documentation & comments | `AbCipDriver.cs:9-11`, `PlcTagHandle.cs:23-27,53-58`, `AbCipTemplateCache.cs:12-15`, `IAbCipTagEnumerator.cs:6-11`, `AbCipDriverOptions.cs:21` | Numerous comments are stale relative to the commit under review. `AbCipDriver.cs:9-11` says the driver "Implements IDriver only for now" with capabilities shipping "in subsequent PRs (3-8)" while the class already implements all of them. `… |
|
||||
| Driver.AbCip.Cli-003 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:50-56,60-61` | The `OnDataChange` handler writes change lines to `console.Output` (a `TextWriter`) from the driver's poll-engine callback thread, while the command's main flow concurrently writes the "Subscribed to ... Ctrl+C to stop." line on the CLI th… |
|
||||
| Driver.AbCip.Cli-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/Commands/SubscribeCommand.cs:28,58`, `AbCipCommandBase.cs:26-34` | `--interval-ms` (`IntervalMs`) is taken verbatim and passed as `TimeSpan.FromMilliseconds(IntervalMs)` to `SubscribeAsync` with no validation. A zero or negative value produces a non-positive `TimeSpan`; the option description claims "Poll… |
|
||||
| Driver.AbCip.Cli-005 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:51-59` | `ConfigureLogging` assigns a freshly created Serilog logger to the process-global `Log.Logger` but never calls `Log.CloseAndFlush()`. For a short-lived one-shot command (`probe`, `read`, `write`) the process exit flushes the console sink,… |
|
||||
| Driver.AbCip.Cli-006 | Low | Design-document adherence | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/AbCipCommandBase.cs:29-34` | `AbCipCommandBase` overrides the abstract `DriverCommandBase.Timeout` property with a getter derived from `TimeoutMs` and an empty `init` body (`init { /* driven by TimeoutMs */ }`). Because the override has no `[CommandOption]` attribute,… |
|
||||
| Driver.AbCip.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName` — both pure static helpers. There is no coverage for `AbCipCommandBase.BuildOptions` (the flag-to-`AbCipDriverOptions` mapping that all four commands d… |
|
||||
| Driver.AbCip.Cli-008 | Low | Documentation & comments | `docs/Driver.AbCip.Cli.md:8-9` | `docs/Driver.AbCip.Cli.md` opens with "Second of four driver test-client CLIs (Modbus -> AB CIP -> AB Legacy -> S7 -> TwinCAT)." The count "four" contradicts the chain that follows it (five names) and contradicts `docs/DriverClis.md`, whic… |
|
||||
| Driver.AbLegacy-005 | Low | OtOpcUa conventions | `AbLegacyDriver.cs` (whole file) | The driver uses no `ILogger`/Serilog at all. Probe-loop failures, runtime initialisation failures, libplctag non-zero statuses, and read/write exceptions are folded into `DriverHealth.Detail` strings but never logged. `CLAUDE.md` names Seril… |
|
||||
| Driver.AbLegacy-011 | Low | Performance & resource management | `AbLegacyDriver.cs:440` | `Dispose()` is implemented as `DisposeAsync().AsTask().GetAwaiter().GetResult()`, i.e. sync-over-async. `ShutdownAsync` awaits `_poll.DisposeAsync()` (which completes synchronously) and does no other real async work, so a deadlock is unlikely… |
|
||||
| Driver.AbLegacy-013 | Low | Code organization & conventions | `AbLegacyDriver.cs:340-345`, `AbLegacyDriver.cs:238-264` | Two minor organisational issues: 1. `ResolveHost` returns `_options.Devices.FirstOrDefault()?.HostAddress ?? DriverInstanceId` when the reference is unknown and no devices are configured. `DriverInstanceId` is not a host address (ab://...)… |
|
||||
| Driver.AbLegacy.Cli-002 | Low | Correctness & logic bugs | `Commands/WriteCommand.cs:27-29`, `Program.cs:6-9` | The `--value` option help text states "booleans accept true/false/1/0", but `ParseBool` (`WriteCommand.cs:74-80`) and the error message also accept `on/off` and `yes/no`, and `DriverClis.md` documents the full `true/false/1/0/yes/no/on/off… |
|
||||
| Driver.AbLegacy.Cli-003 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:47-53` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` (the synchronous overload) directly from the `PollGroupEngine` poll thread. The poll engine raises change events from a background timer/loop thread, so two ticks that fire… |
|
||||
| Driver.AbLegacy.Cli-004 | Low | Error handling & resilience | `Commands/ProbeCommand.cs:37-56`, `Commands/ReadCommand.cs:39-50`, `Commands/WriteCommand.cs:48-59`, `Commands/SubscribeCommand.cs:41-76` | Every command does `await using var driver = new AbLegacyDriver(...)` *and* an explicit `await driver.ShutdownAsync(...)` in the `finally`. `AbLegacyDriver` `DisposeAsync` itself calls `ShutdownAsync`, so the driver is shut down twice on t… |
|
||||
| Driver.AbLegacy.Cli-005 | Low | Design-document adherence | `Commands/SubscribeCommand.cs:23-25`, `docs/Driver.AbLegacy.Cli.md:94-96` | The subscribe command interval option is `--interval-ms` (default 1000). `docs/Driver.AbLegacy.Cli.md` shows the subscribe example as `otopcua-ablegacy-cli subscribe ... -i 500`, which works because of the short alias `'i'`, but the doc ne… |
|
||||
| Driver.AbLegacy.Cli-006 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:20-22` | `ProbeCommand` declares its `--type` option with no short alias, while `ReadCommand`, `WriteCommand`, and `SubscribeCommand` all declare `--type` with the short alias `'t'`. `ProbeCommand` also gives `--address` the alias `'a'`, matching t… |
|
||||
| Driver.AbLegacy.Cli-007 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file in the CLI test project covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Two behaviours that are pure logic (testable without a device) are uncovered: (1) `AbLegacyCommandBase.BuildOptions` — that it… |
|
||||
| Driver.Cli.Common-004 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:68-70` | `FormatTable` calls `rows.Max(r => r.Tag.Length)` (and the same for the value and status columns) without guarding against empty input. When `tagNames` and `snapshots` are both empty (equal length, so the mismatch check at line 56 passes),… |
|
||||
| Driver.Cli.Common-006 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/SnapshotFormatter.cs:71`, `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common/DriverCommandBase.cs:9` | Two minor doc inaccuracies. (1) The comment at `SnapshotFormatter.cs:71` states the "source-time column is fixed-width (ISO-8601 to ms) so no max-measurement needed" — true only when every snapshot has a non-null `SourceTimestampUtc`. `For… |
|
||||
| Driver.FOCAS-007 | Low | Error handling & resilience | `FocasDriver.cs:140-148`, `FocasDriver.cs:478-484`, `FocasDriver.cs:529-533`, `FocasAlarmProjection.cs:61-63` | Numerous `try { ... } catch {}` blocks swallow every exception with no logging - `ShutdownAsync` (CTS cancel/dispose), `RecycleLoopAsync` (`DisposeClient`), `FixedTreeLoopAsync` transient catches, `ProbeLoopAsync`, and the alarm projection… |
|
||||
| Driver.FOCAS-008 | Low | Performance & resource management | `FocasDriver.cs:201`, `FocasDriver.cs:253` | `ReadAsync` and `WriteAsync` call `FocasAddress.TryParse(def.Address)` on every operation, even though `InitializeAsync` already parsed and validated every tag address. On a subscription hot path (each poll tick re-enters `ReadAsync`) this… |
|
||||
| Driver.FOCAS-009 | Low | Design-document adherence | `FocasDriverOptions.cs:110-115`, `FocasDriver.cs:468-486`, `FocasDriverFactoryExtensions.cs:75-80` | `FocasProbeOptions.Timeout` is parsed by the factory (`FocasProbeDto.TimeoutMs` to `FocasProbeOptions.Timeout`) but never consumed. `ProbeLoopAsync` calls `client.ProbeAsync(ct)` with only the probe-loop cancellation token; no per-probe ti… |
|
||||
| Driver.FOCAS-010 | Low | Code organization & conventions | `IFocasClient.cs:210-227` (`FocasOpMode`), `FocasConstants.cs:42-78` (`FocasOperationMode`) | There are two parallel operation-mode-to-text mappings with divergent labels. `FocasOpMode.ToText` (used by the driver fixed-tree `OperationMode/ModeText` node) yields `"TJOG"`, `"TEACH_IN_HANDLE"`; `FocasOperationModeExtensions.ToText` (i… |
|
||||
| Driver.FOCAS-011 | Low | Code organization & conventions | `IFocasClient.cs:275-287` (`FocasAlarmType`), `FocasAlarmProjection.cs:149-175` | `FocasAlarmType` declares its constants as `public const int`, but the only consumers - `FocasAlarmProjection.MapAlarmType(short type)` and `MapSeverity(short type)` - take a `short` and `switch` against these `int` constants. It compiles… |
|
||||
| Driver.FOCAS.Cli-001 | Low | Error handling & resilience | `Commands/WriteCommand.cs:58-68` | `WriteCommand.ParseValue` parses the numeric `--value` types (`Byte`/`Int16`/`Int32`/`Float32`/`Float64`) with `sbyte.Parse` / `short.Parse` / etc. These throw raw `FormatException` or `OverflowException` for malformed or out-of-range inpu… |
|
||||
| Driver.FOCAS.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:45-51` | The `subscribe` command attaches an `OnDataChange` handler that calls the synchronous `console.Output.WriteLine`. `OnDataChange` is raised from the driver's `PollGroupEngine` tick thread, while the command's main flow writes the "Subscribe… |
|
||||
| Driver.FOCAS.Cli-003 | Low | Error handling & resilience | `FocasCommandBase.cs:19` (`CncPort`), `FocasCommandBase.cs:27` (`TimeoutMs`), `Commands/SubscribeCommand.cs:23` (`IntervalMs`) | The numeric command options `--cnc-port`, `--timeout-ms`, and `--interval-ms` are accepted without range validation. A zero or negative `--cnc-port` produces an invalid `focas://host:<n>` string; `--timeout-ms 0` yields a zero `TimeSpan` o… |
|
||||
| Driver.FOCAS.Cli-004 | Low | Performance & resource management | `Commands/ProbeCommand.cs:37,54`; `Commands/ReadCommand.cs:37,46`; `Commands/WriteCommand.cs:45,54`; `Commands/SubscribeCommand.cs:39,73` | Every command declares `await using var driver = new FocasDriver(...)` |
|
||||
| Driver.FOCAS.Cli-005 | Low | Design-document adherence | `Commands/WriteCommand.cs:50`, `Commands/ProbeCommand.cs:50` (via `SnapshotFormatter.FormatStatus`) | `docs/Driver.FOCAS.Cli.md` documents `BadDeviceFailure` and `BadCommunicationError` as the key diagnostic signals an operator reads off `probe` / `write` output ("A `BadCommunicationError` means ... `BadDeviceFailure` after a successful co… |
|
||||
| Driver.Galaxy-005 | Low | OtOpcUa conventions | `Runtime/EventPump.cs:81-88` | The `BoundedChannelOptions` comment states "Newest-dropped policy: when full, the producer's TryWrite returns false ... We do this manually rather than relying on `BoundedChannelFullMode.DropWrite`" — but the option is then set to `FullMod… |
|
||||
| Driver.Galaxy-010 | Low | Security | `GalaxyDriver.cs:311-341` | `ResolveApiKey` supports an `env:`/`file:` indirection and otherwise treats the config string as the literal API key ("Anything else — used as the literal API key. Convenient for dev"). `GalaxyGatewayOptions`' own XML doc claims "the API k… |
|
||||
| Driver.Galaxy-012 | Low | Performance & resource management | `Runtime/SubscriptionRegistry.cs:65-67`, `GalaxyDriver.cs:538`, `GalaxyDriver.cs:675` | Several hot paths are O(n^2) per call. `SubscriptionRegistry.ResolveSubscribers` does `entry.Bindings.FirstOrDefault(b => b.ItemHandle == itemHandle)` — a linear scan of the whole binding list for every event dispatch; at 50k tags this is… |
|
||||
| Driver.Galaxy-013 | Low | Design-document adherence | `GalaxyDriver.cs:14-27`, `GalaxyDriver.cs:374-382`, `Config/GalaxyDriverOptions.cs:84-86` | Multiple doc comments are stale relative to the shipped code. `GalaxyDriver`'s class summary still describes the file as "the project skeleton with `IDriver` bodies that wire to a future `IGalaxyGatewayClient` abstraction. Capability inter… |
|
||||
| Driver.Historian.Wonderware-004 | Low | Correctness and logic bugs | `Backend/SdkAlarmHistorianWriteBackend.cs:198-201` | `ToHistorianEvent` only assigns `historianEvent.Id` when `Guid.TryParse(dto.EventId, ...)` succeeds. If `EventId` is not a parseable GUID (or is empty), `Id` stays `Guid.Empty` and the event is written to the historian with an all-zeros id… |
|
||||
| Driver.Historian.Wonderware-005 | Low | Concurrency and thread safety | `Backend/HistorianDataSource.cs:124`, `:126-127` | `GetHealthSnapshot` reads `_activeProcessNode` and `_activeEventNode` inside `_healthLock`, but those two fields are written under `_connectionLock` / `_eventConnectionLock` (lines 183, 243, 209-210, 266-269) — a different lock. The health… |
|
||||
| Driver.Historian.Wonderware-007 | Low | Error handling and resilience | `Ipc/PipeServer.cs:70-75` | When `VerifyCaller` rejects the peer SID, the server logs the reason and calls `_current.Disconnect()` with no `HelloAck` frame sent. The shared-secret-mismatch and major-version-mismatch paths below it both send a rejecting `HelloAck` so… |
|
||||
| Driver.Historian.Wonderware-008 | Low | Error handling and resilience | `Backend/HistorianDataSource.cs:301-307`, `:374-380` | When `query.StartQuery` returns `false`, `ReadRawAsync` and `ReadAggregateAsync` call `HandleConnectionError()` and return an empty result list. A failed `StartQuery` is not necessarily a connection failure — it can be a bad tag name, an i… |
|
||||
| Driver.Historian.Wonderware-010 | Low | Performance and resource management | `Backend/HistorianConfiguration.cs:32-36`, `Backend/HistorianDataSource.cs` (all read methods) | `HistorianConfiguration.RequestTimeoutSeconds` is documented as the "outer safety timeout applied to sync-over-async Historian operations" and is copied around (`SdkAlarmHistorianWriteBackend.CloneConfigWithServerName:346`), but it is neve… |
|
||||
| Driver.Historian.Wonderware-011 | Low | Design-document adherence | `Backend/HistorianDataSource.cs:9-12`, `Backend/IHistorianDataSource.cs:9-11`, `Backend/HistorianSample.cs:7-9`, `Backend/HistorianConfiguration.cs:7-9` | Several XML doc comments reference the retired v1 architecture as if it were current: "inside Galaxy.Host", "the Proxy maps returned samples", "the Host returns these across the IPC boundary as `GalaxyDataValue`", "Populated from ... the P… |
|
||||
| Driver.Historian.Wonderware-012 | Low | Testing coverage | `Backend/HistorianDataSource.cs`, `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/` | The unit-test suite covers `HistorianQualityMapper`, `HistorianClusterEndpointPicker`, `SdkAlarmHistorianWriteBackend`, `AahClientManagedAlarmEventWriter`, the IPC round trip, and `Program` alarm-writer wiring. `HistorianDataSource` itself… |
|
||||
| Driver.Historian.Wonderware.Client-003 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:207`, `WonderwareHistorianClient.cs:132-150` | `_totalQueries` is mutated with `Interlocked.Increment` in `Invoke`, but read inside `GetHealthSnapshot` under `_healthLock`, and every other counter (`_totalSuccesses`, `_totalFailures`, `_consecutiveFailures`) is mutated only under `_hea… |
|
||||
| Driver.Historian.Wonderware.Client-004 | Low | Concurrency & thread safety | `WonderwareHistorianClient.cs:203-267` | A sidecar-reported failure is recorded in two non-atomic steps under separate lock acquisitions: `Invoke` calls `RecordSuccess()` (line 211) and then the caller calls `ThrowIfFailed` which calls `ReclassifySuccessAsFailure()` (line 256), d… |
|
||||
| Driver.Historian.Wonderware.Client-006 | Low | Error handling & resilience | `Internal/PipeChannel.cs:96-107`, `WonderwareHistorianClientOptions.cs:11-12` | `PipeChannel.InvokeAsync` retries exactly once on transport failure and otherwise propagates. The options expose `ReconnectInitialBackoff` and `ReconnectMaxBackoff` and `WonderwareHistorianClientOptions` documents them as exponential backo… |
|
||||
| Driver.Historian.Wonderware.Client-008 | Low | Security | `ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj:29-32` | The csproj suppresses two NuGet audit advisories (`GHSA-37gx-xxp4-5rgx`, `GHSA-w3x6-4m5h-cxqf`) for the `MessagePack` 2.5.187 dependency with no inline comment recording why the suppression is safe, who reviewed it, or when it should be re… |
|
||||
| Driver.Historian.Wonderware.Client-010 | Low | Documentation & comments | `WonderwareHistorianClient.cs:355-361`, `WonderwareHistorianClient.cs:132-150` | Two doc/behaviour mismatches. (1) The `Dispose()` XML comment asserts the underlying channel async cleanup is non-blocking so the `GetAwaiter()/GetResult()` bridge is safe. `PipeChannel.DisposeAsync` calls `ResetTransport()`, which invokes… |
|
||||
| Driver.Modbus-003 | Low | Concurrency & thread safety | `ModbusDriver.cs:59,188,241,259,266,726,745,759` | `_health` is a non-`volatile` reference field written from multiple threads (concurrent `ReadAsync` callers, the coalesced-read path, `WriteAsync` indirectly, and `ProbeLoopAsync`) and read by `GetHealth()`. Reference assignment is atomic… |
|
||||
| Driver.Modbus-007 | Low | Design-document adherence | `ModbusDriver.cs:1392`, `ModbusDriverOptions.cs:74-80` | Two design-vs-code drifts. (1) `MapDataType` maps `Int64`/`UInt64` to `DriverDataType.Int32` with the inline comment "widening to Int32 loses precision; PR 25 adds Int64 to DriverDataType". The address-space node for a 64-bit Modbus tag is… |
|
||||
| Driver.Modbus-008 | Low | Documentation & comments | `ModbusDriver.cs:411-417,700-703,737-744` | Stale/misleading comments. (1) The `<summary>` block at `ModbusDriver.cs:411-417` says auto-prohibited ranges are "Cleared by ReinitializeAsync ... or by an explicit re-probe API (not yet shipped)" — the re-probe loop has shipped (#151, `R… |
|
||||
| Driver.Modbus-009 | Low | Correctness & logic bugs | `ModbusDriver.cs:1160-1167`, `ModbusTcpTransport.cs:94-95` | Two edge cases. (1) `RegisterCount` for `ModbusDataType.String` computes `(tag.StringLength + 1) / 2`; a tag configured with `StringLength = 0` yields a register count of 0, flowing into `ReadOneAsync` as `totalRegs = 0` and producing an F… |
|
||||
| Driver.Modbus-010 | Low | Error handling & resilience | `ModbusDriver.cs:864-868`, `ModbusDriverOptions.cs:116-125` | When `WriteOnChangeOnly` is enabled and `IsRedundantWrite` returns true, `WriteAsync` returns `WriteResult(0u)` (Good) without touching the wire. The suppression baseline (`_lastWrittenByRef`) is only invalidated by a *read* that returns a… |
|
||||
| Driver.Modbus-011 | Low | Code organization & conventions | `ModbusDriver.cs:23-43,89-97,408-432` | Field and member declarations are interleaved with methods throughout `ModbusDriver`. `ResolveHost` (a public method) is the first member of the class, followed by `BuildSlaveHostName`, then a block of fields; `_lastPublishedByRef`/`_lastW… |
|
||||
| Driver.Modbus-012 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/` | The unit suite is broad (coalescing, bisection, auto-recovery, byte order, arrays, BCD, RMW, caps, multi-unit, probe, reconnect, subscription). Gaps relative to the findings above: (1) no test exercises concurrent multi-subscription publis… |
|
||||
| Driver.Modbus.Addressing-006 | Low | Error handling & resilience | `ModbusAddressParser.cs:297-301` | `TryParseFamilyNative` catches only `ArgumentException` and `OverflowException`. The current helpers throw only those (including `ArgumentOutOfRangeException`, which derives from `ArgumentException`), so today it is correct. But the parser… |
|
||||
| Driver.Modbus.Addressing-007 | Low | Design-document adherence | `ModbusDataType.cs:91-95`, `docs/v2/dl205.md` section Strings | `ModbusStringByteOrder` (HighByteFirst / LowByteFirst) is defined in this assembly and documented as the DL205 low-byte-first string-packing knob, but `ParsedModbusAddress` has no field for it and `ModbusAddressParser` never produces or co… |
|
||||
| Driver.Modbus.Addressing-009 | Low | Documentation & comments | `ModbusModiconAddress.cs:55-64`, `ModbusModiconAddress.cs:104-110` | The comments on `ModbusModiconAddress.TryParse` are slightly inaccurate. The remark that 5-digit Modicon is always exactly 5 chars (40001..49999) and 6-digit is exactly 6 (400001..465536-shaped) implies the leading digit is always 4, but t… |
|
||||
| Driver.Modbus.Cli-003 | Low | Correctness & logic bugs | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/ModbusCommandBase.cs:14-24` | `Port` (`int`) and `TimeoutMs` (`int`) accept any 32-bit value, including negatives and ports above 65535. `UnitId` is a `byte`, so it accepts 0-255 even though the option description and `docs/Driver.Modbus.Cli.md` both say the valid rang… |
|
||||
| Driver.Modbus.Cli-004 | Low | Concurrency & thread safety | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/SubscribeCommand.cs:61-67` | The `OnDataChange` handler is invoked from the driver's `PollGroupEngine` background thread and calls `console.Output.WriteLine` synchronously. An exception thrown inside this handler (e.g. an `IOException` on a redirected or closed stdout… |
|
||||
| Driver.Modbus.Cli-005 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:21-54`; `Commands/ReadCommand.cs:46-75`; `Commands/WriteCommand.cs:54-89` | All three commands call `ConfigureLogging()` then `console.RegisterCancellationHandler()`, but if the operator presses Ctrl+C before `InitializeAsync` completes, the resulting `OperationCanceledException` propagates out of `ExecuteAsync`… |
|
||||
| Driver.Modbus.Cli-006 | Low | Error handling & resilience | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ProbeCommand.cs:35-53` | `probe` reports `Health: {health.State}` from `GetHealth()`. After a successful `InitializeAsync` the driver sets state to `Healthy` regardless of whether the subsequent probe register read returns Good or a Bad status code. `ReadAsync` do… |
|
||||
| Driver.Modbus.Cli-007 | Low | Design-document adherence | `docs/Driver.Modbus.Cli.md:124-156`; `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli/Commands/ReadCommand.cs` | `docs/Driver.Modbus.Cli.md` devotes a whole "v2 addressing grammar" section to the industry-standard tag-address strings (`40001:F:CDAB`, `HR1:I`, `C100`, `V2000:F:CDAB`, etc.) and says "set the per-tag `addressString` field instead of the… |
|
||||
| Driver.Modbus.Cli-008 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/` | The test project covers only the two pure-function seams: `ReadCommand.SynthesiseTagName` and `WriteCommand.ParseValue`. There is no coverage for `WriteCommand`'s read-only-region rejection (`Region is not (Coils or HoldingRegisters)`), no… |
|
||||
| Driver.OpcUaClient-011 | Low | Documentation & comments | `OpcUaClientDriver.cs:783-784` | The comment on the isArray computation states "-1 = scalar; 1+ = array dimensions; 0 = one-dimensional array". This is inaccurate against OPC UA ValueRank semantics: -3 is ScalarOrOneDimension, -2 is Any, -1 is Scalar, and 0 is OneOrMoreDi… |
|
||||
| Driver.OpcUaClient-014 | Low | Performance & resource management | `OpcUaClientDriver.cs:904`, `:1035` | `MonitoredItem.Notification += (mi, args) => ...` (and the alarm-event equivalent) attaches a closure-capturing lambda to each monitored item's event. The lambda is never detached. When UnsubscribeAsync removes a subscription it calls Subs… |
|
||||
| Driver.S7-003 | Low | Correctness & logic bugs | `S7Driver.cs:172`, `S7Driver.cs:255` | ReadAsync and WriteAsync dereference fullReferences.Count / writes.Count with no null guard. A null argument throws NullReferenceException rather than ArgumentNullException, and the NRE escapes before the _gate is taken so it is not wrappe… |
|
||||
| Driver.S7-005 | Low | OtOpcUa conventions | `S7Driver.cs:33`, `S7Driver.cs:433` | System.Collections.Concurrent.ConcurrentDictionary is written out with a fully-qualified namespace at the field declarations instead of a using System.Collections.Concurrent directive. ImplicitUsings is enabled and the rest of the codebase… |
|
||||
| Driver.S7-009 | Low | Error handling & resilience | `S7Driver.cs:392` | The subscription poll loop never reflects sustained polling failure anywhere an operator can see it. PollLoopAsync swallows every non-cancellation exception with an empty catch and the comment claims "the health surface reflects it" - but… |
|
||||
| Driver.S7-010 | Low | Performance & resource management | `S7Driver.cs:504` | Dispose() is implemented as DisposeAsync().AsTask().GetAwaiter().GetResult() - sync-over-async. Inside the generic host this is currently safe (no captured SynchronizationContext), but it is a known deadlock pattern. The only async work be… |
|
||||
| Driver.S7-013 | Low | Code organization & conventions | `S7DriverOptions.cs:90`, `S7Driver.cs:300` | S7TagDefinition.StringLength is a public configured/JSON-bound parameter (default 254) but is dead: S7DataType.String reads and writes both throw NotSupportedException ("...land in a follow-up PR"), so StringLength is never consumed. Likew… |
|
||||
| Driver.S7.Cli-004 | Low | Performance & resource management | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/ProbeCommand.cs:36,53`, `Commands/ReadCommand.cs:45,54`, `Commands/WriteCommand.cs:51,60`, `Commands/SubscribeCommand.cs:39,73` | Every command declares the driver with `await using var driver = new S7Driver(...)` and *also* calls `await driver.ShutdownAsync(...)` in a `finally` block. `S7Driver.DisposeAsync` itself calls `ShutdownAsync`, so shutdown runs twice per c… |
|
||||
| Driver.S7.Cli-005 | Low | Code organization & conventions | `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` | A stale directory `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/` exists containing only an `obj/` folder — no `.csproj`, no source. The real test project lives at `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/`. The empty direct… |
|
||||
| Driver.S7.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. `S7CommandBase.BuildOptions` — which maps the host / port / CPU / rack / slot / timeout flags onto an `S7DriverOptions` and forces `Probe.Enabled = fa… |
|
||||
| Driver.S7.Cli-007 | Low | Documentation & comments | `src/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli/Commands/SubscribeCommand.cs:45-51` | The Modbus CLI `SubscribeCommand` carries an explanatory comment on the `OnDataChange` handler ("Route every data-change event to the CliFx console (not System.Console — the analyzer flags it + IConsole is the testable abstraction)"). The… |
|
||||
| Driver.TwinCAT-004 | Low | Correctness & logic bugs | `TwinCATDataType.cs:24-27` | The inline comments for the IEC time types are inaccurate. TwinCAT `TIME` is a duration (32-bit, milliseconds) — not "ms since epoch of day". `DATE` is stored as seconds since 1970-01-01 (truncated to a day boundary), not "days since 1970-… |
|
||||
| Driver.TwinCAT-006 | Low | OtOpcUa conventions | `TwinCATDriver.cs:406-411` | `ResolveHost` falls back to `DriverInstanceId` when there are no configured devices and the reference is unknown. `DriverInstanceId` is a logical config-DB identifier, not a host address; `IPerCallHostResolver` consumers expect a host key… |
|
||||
| Driver.TwinCAT-014 | Low | Design-document adherence | `TwinCATDriverOptions.cs:41-43`, `TwinCATDriverOptions.cs:57-62`, `AdsTwinCATClient.cs:145` | Several drifts between the implemented config surface and `docs/v2/driver-specs.md` section 6. The spec connection-settings list has separate `Host` (IP), `AmsNetId`, and `AmsPort` fields; the implementation collapses these into a single `… |
|
||||
| Driver.TwinCAT-015 | Low | Code organization & conventions | `TwinCATDriver.cs:431-432` | `Dispose()` runs `DisposeAsync().AsTask().GetAwaiter().GetResult()` — sync-over-async. `docs/v2/driver-stability.md` section Galaxy explicitly lists "sync-over-async on the OPC UA stack thread" among the four 2026-04-13 stability findings… |
|
||||
| Driver.TwinCAT-016 | Low | Testing coverage | `tests/Drivers/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` | Unit coverage exists for AMS-address parsing, symbol-path parsing, read/write, native notifications, symbol browse, and the capability surface. Gaps tied to the findings above: no test exercises `ReinitializeAsync` with a changed config (D… |
|
||||
| Driver.TwinCAT.Cli-001 | Low | Correctness & logic bugs | `TwinCATCommandBase.cs:23-24`, `Commands/SubscribeCommand.cs:23-24`, `Commands/BrowseCommand.cs:21-24` | Numeric command options are accepted without range validation. `--timeout-ms` feeds `Timeout => TimeSpan.FromMilliseconds(TimeoutMs)`; passing `--timeout-ms 0` or a negative value yields `TimeSpan.Zero`/a negative `TimeSpan`, which is then… |
|
||||
| Driver.TwinCAT.Cli-002 | Low | Concurrency & thread safety | `Commands/SubscribeCommand.cs:46-58` | The `OnDataChange` handler calls `console.Output.WriteLine(line)` synchronously. In native ADS-notification mode the event is raised from the `Beckhoff.TwinCAT.Ads` notification callback thread (see `TwinCATDriver.SubscribeAsync`, which in… |
|
||||
| Driver.TwinCAT.Cli-003 | Low | Error handling & resilience | `Commands/SubscribeCommand.cs:56-58` | The subscribe banner reports the mechanism purely from the `--poll-only` flag (`var mode = PollOnly ? "polling" : "ADS notification"`). The doc (`docs/Driver.TwinCAT.Cli.md`) states the banner "announces which mechanism is in play". The CL… |
|
||||
| Driver.TwinCAT.Cli-004 | Low | Design-document adherence | `TwinCATCommandBase.cs:26-29`, `Commands/BrowseCommand.cs` | `--poll-only` is declared on `TwinCATCommandBase`, so it is inherited by `browse`. `BrowseCommand` only ever calls `DiscoverAsync` — it never subscribes — so `UseNativeNotifications = !PollOnly` has no observable effect on a browse run. Th… |
|
||||
| Driver.TwinCAT.Cli-005 | Low | Code organization & conventions | `Commands/ProbeCommand.cs:23`, `Commands/ReadCommand.cs:20`, `Commands/WriteCommand.cs:20`, `Commands/SubscribeCommand.cs:18` | The `--type` option is declared with the short alias `-t` on `read`, `write`, and `subscribe`, but `ProbeCommand` declares `[CommandOption("type", ...)]` with no short alias. An operator who has internalised `-t` from the other three verbs… |
|
||||
| Driver.TwinCAT.Cli-006 | Low | Testing coverage | `tests/Drivers/Cli/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli.Tests/WriteCommandParseValueTests.cs` | The only test file covers `WriteCommand.ParseValue` and `ReadCommand.SynthesiseTagName`. Other deterministic, router-independent logic is untested: `TwinCATCommandBase.Gateway` (the `ads://{netId}:{port}` string the driver's `TwinCATAmsAdd… |
|
||||
| Driver.TwinCAT.Cli-007 | Low | Documentation & comments | `TwinCATCommandBase.cs:31-36` | The `Timeout` override has an empty `init` accessor with the comment `/* driven by TimeoutMs */`. Because the base `DriverCommandBase.Timeout` is declared `abstract { get; init; }`, the override must supply an `init`, but here it silently… |
|
||||
| Server-004 | Low | OtOpcUa conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` | `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Si… |
|
||||
| Server-006 | Low | Concurrency & thread safety | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` | `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads/writes into a stalled driver pins… |
|
||||
| Server-008 | Low | Error handling & resilience | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` | `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-o… |
|
||||
| Server-012 | Low | Performance & resource management | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` | `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration, not set per call on a factory-vended instance. |
|
||||
| Server-014 | Low | Code organization & conventions | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` | `SealedBootstrap` claims in its xml-doc to "close release blocker #2" by consuming the generation-sealed cache + resilient reader + stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is never… |
|
||||
| Server-015 | Low | Documentation & comments | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` | `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." `OpcUaServerOptions`'s says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are st… |
|
||||
|
||||
## Closed findings
|
||||
|
||||
|
||||
237
code-reviews/Server/findings.md
Normal file
@@ -0,0 +1,237 @@
|
||||
# Code Review — Server
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Module | `src/Server/ZB.MOM.WW.OtOpcUa.Server` |
|
||||
| Reviewer | Claude Code |
|
||||
| Review date | 2026-05-22 |
|
||||
| Commit reviewed | `76d35d1` |
|
||||
| Status | Reviewed |
|
||||
| Open findings | 15 |
|
||||
|
||||
## Checklist coverage
|
||||
|
||||
| # | Category | Result |
|
||||
|---|---|---|
|
||||
| 1 | Correctness & logic bugs | Server-001, Server-002, Server-003 |
|
||||
| 2 | OtOpcUa conventions | Server-004 |
|
||||
| 3 | Concurrency & thread safety | Server-005, Server-006 |
|
||||
| 4 | Error handling & resilience | Server-007, Server-008 |
|
||||
| 5 | Security | Server-009, Server-010, Server-011 |
|
||||
| 6 | Performance & resource management | Server-012 |
|
||||
| 7 | Design-document adherence | Server-013 |
|
||||
| 8 | Code organization & conventions | Server-014 |
|
||||
| 9 | Testing coverage | No issues found |
|
||||
| 10 | Documentation & comments | Server-015 |
|
||||
|
||||
## Findings
|
||||
|
||||
### Server-001
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Severity | Critical |
|
||||
| Category | Correctness & logic bugs |
|
||||
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:1791` |
|
||||
| Status | Open |
|
||||
|
||||
**Description:** `WriteNodeIdUnknown` calls itself unconditionally as its first statement and only afterwards sets `errors[i]`, so the recursion has no base case and overflows the stack. The method is called from all four `HistoryRead*` overrides whenever a HistoryRead targets a node whose `NodeId` cannot be resolved to a driver full reference. Any client issuing such a HistoryRead triggers an uncatchable `StackOverflowException` that terminates the process, which makes this a remotely triggerable DoS.
|
||||
|
||||
**Recommendation:** Replace the self-call with the result-slot assignment mirroring `WriteUnsupported`/`WriteInternalError`: `results[i] = new OpcHistoryReadResult { StatusCode = StatusCodes.BadNodeIdUnknown };` then `errors[i] = StatusCodes.BadNodeIdUnknown;`.
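A minimal sketch of the shape this recommendation describes, assuming simplified stand-in types. `HistoryResultSlot`, `HistoryReadSketch`, and the plain `uint[]` error array are hypothetical and exist only so the example is self-contained; the real method works against the OPC UA stack's own result and `ServiceResult` types.

```csharp
// Hypothetical stand-in for the stack's per-node history result type.
public sealed class HistoryResultSlot
{
    public uint StatusCode { get; init; }
}

public static class HistoryReadSketch
{
    // OPC UA Bad_NodeIdUnknown status code.
    public const uint BadNodeIdUnknown = 0x80340000;

    // Non-recursive helper: fill the result slot and the error slot, then return.
    // There is no self-call, so an unresolvable node costs one assignment pair.
    public static void WriteNodeIdUnknown(HistoryResultSlot[] results, uint[] errors, int i)
    {
        results[i] = new HistoryResultSlot { StatusCode = BadNodeIdUnknown };
        errors[i] = BadNodeIdUnknown;
    }
}
```

A regression test that issues a HistoryRead against an unresolvable `NodeId` and asserts on both slots would likely have caught the original self-call.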

**Resolution:** _(open)_

### Server-002

| Field | Value |
|---|---|
| Severity | High |
| Category | Correctness & logic bugs |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/AuthorizationGate.cs:60-63` |
| Status | Open |

**Description:** `IsAllowed` does `if (decision.IsAllowed) return true; return !_strictMode;`. When a session carries resolved LDAP groups and the evaluator returns an explicit deny, lax mode (the default) overrides it to `true`. The lax fallback is intended only for sessions lacking LDAP groups or missing tries, but here it also nullifies authored `NodeAcl` deny rules for fully resolved sessions. Per-tag deny ACLs have no effect until `StrictMode` is on.

**Recommendation:** Distinguish "indeterminate / no grant" from "explicit deny." Fall through to `!_strictMode` only when the decision is indeterminate; an explicit deny returns `false` regardless of mode. Extend `AuthorizeDecision` with an `IsExplicitDeny` flag if needed.

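One way to express the three-way decision is sketched below. `AuthorizeVerdict` and `AuthorizationGateSketch` are hypothetical names used for illustration; the real `AuthorizeDecision` type may carry the same information as an `IsExplicitDeny` flag instead of an enum.

```csharp
using System;

// Hypothetical tri-state outcome of the ACL evaluator.
public enum AuthorizeVerdict { Allow, ExplicitDeny, Indeterminate }

public sealed class AuthorizationGateSketch
{
    private readonly bool _strictMode;

    public AuthorizationGateSketch(bool strictMode) => _strictMode = strictMode;

    public bool IsAllowed(AuthorizeVerdict verdict) => verdict switch
    {
        AuthorizeVerdict.Allow => true,
        // An authored deny wins in both strict and lax mode.
        AuthorizeVerdict.ExplicitDeny => false,
        // Only "no information" falls back to the lax default.
        _ => !_strictMode,
    };
}
```

Under this shape, lax mode still admits sessions with no resolved groups, but it can no longer turn an authored deny into an allow.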

**Resolution:** _(open)_

### Server-003

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Correctness & logic bugs |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Phase7/RingBufferHistoryWriter.cs:96-119` |
| Status | Open |

**Description:** `ReadRawAsync`'s XML doc claims "newest-first," but `TagRingBuffer.Snapshot()` returns oldest-to-newest and the loop preserves that order, so results are oldest-first. In addition, `maxValuesPerNode` is capped against the total buffer size *before* the `[startUtc, endUtc)` filter is applied, so a paged read returns the oldest in-window samples, contradicting the doc and usual HistoryRead expectations.

**Recommendation:** Make code and doc agree on ordering (raw HistoryRead is normally ascending by source timestamp). Apply `maxValuesPerNode` to the in-window count, not the whole buffer.

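The filter-then-cap order can be sketched as follows. `Sample` and `RingBufferReadSketch` are hypothetical stand-ins for the real buffer types, and treating `maxValuesPerNode == 0` as "no cap" is an assumption made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one buffered value.
public sealed record Sample(DateTime SourceTimestampUtc, double Value);

public static class RingBufferReadSketch
{
    // Input is assumed oldest-to-newest, as Snapshot() is described in the finding.
    public static IReadOnlyList<Sample> ReadRaw(
        IEnumerable<Sample> snapshotOldestFirst,
        DateTime startUtc,
        DateTime endUtc,
        int maxValuesPerNode)
    {
        // 1. Restrict to the half-open window [startUtc, endUtc).
        var inWindow = snapshotOldestFirst
            .Where(s => s.SourceTimestampUtc >= startUtc && s.SourceTimestampUtc < endUtc);

        // 2. Only then apply the cap, so it counts in-window samples.
        if (maxValuesPerNode > 0)
        {
            inWindow = inWindow.Take(maxValuesPerNode);
        }

        // Result stays ascending by source timestamp.
        return inWindow.ToList();
    }
}
```

Whichever ordering is chosen, the XML doc on `ReadRawAsync` should state it explicitly so the two cannot drift apart again.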

**Resolution:** _(open)_

### Server-004

| Field | Value |
|---|---|
| Severity | Low |
| Category | OtOpcUa conventions |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:187-200` |
| Status | Open |

**Description:** `RoleBasedIdentity` declares its own `Display` property, but the base `UserIdentity` already has a settable `DisplayName`. `DriverNodeManager.ResolveCallUser`/`RouteScriptedAlarmMethodCalls` read the base `DisplayName`, never `Display`. Because the constructor passes only `userName` to the base class, `DisplayName` resolves to the username, so scripted-alarm Ack/Confirm/Shelve audit entries record the raw username rather than the LDAP-resolved display name the comment promises. `Display` is dead code.

**Recommendation:** Drop `Display` and set the base `DisplayName = displayName ?? userName;`. Verify that `ResolveCallUser` then yields the resolved display name.

**Resolution:** _(open)_

### Server-005

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Concurrency & thread safety |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs:166`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:303-311` |
| Status | Open |

**Description:** `OnValueChanged` raises `TransitionRaised` on the value-change thread; the subscriber `OnAlarmServiceTransition` drives `ConditionSink.OnTransition` → `alarm.ReportEvent`. `DriverNodeManager.Dispose` detaches the handler but does not synchronise against an in-flight `Invoke`. The service is process-shared across drivers, so a transition can dispatch to a `ConditionSink` whose `DriverNodeManager` is concurrently being disposed, resulting in `ReportEvent` running on a torn-down node manager.

**Recommendation:** Guard `OnAlarmServiceTransition` with a `_disposed` check under `Lock` before calling `sink.OnTransition`. Document that handlers must tolerate invocation during their owner's disposal.

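The guard can be sketched in isolation as below. `NodeManagerSketch` and its `Action<string>` callback are hypothetical simplifications; the real handler signature, the real `Lock` object, and the `ConditionSink` call are not reproduced here.

```csharp
using System;

public sealed class NodeManagerSketch : IDisposable
{
    private readonly object _lock = new();
    private readonly Action<string> _reportEvent; // stand-in for sink.OnTransition
    private bool _disposed;

    public NodeManagerSketch(Action<string> reportEvent) => _reportEvent = reportEvent;

    // Event handler: may be invoked from the value-change thread at any time.
    public void OnAlarmServiceTransition(string transition)
    {
        lock (_lock)
        {
            // A late transition after Dispose is dropped instead of
            // touching a torn-down node manager.
            if (_disposed) return;
            _reportEvent(transition);
        }
    }

    public void Dispose()
    {
        // Taking the same lock makes Dispose wait for an in-flight transition.
        lock (_lock)
        {
            _disposed = true;
        }
    }
}
```

Holding the lock across the callback is the simplest way to close the race, at the cost of serialising transitions with disposal; that trade-off would need checking against how the real `Lock` is used elsewhere in the node manager.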

**Resolution:** _(open)_

### Server-006

| Field | Value |
|---|---|
| Severity | Low |
| Category | Concurrency & thread safety |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:478-482, 1342-1348` |
| Status | Open |

**Description:** `OnReadValue`/`OnWriteValue` are synchronous stack hooks that block on async driver calls via `.GetAwaiter().GetResult()` with `CancellationToken.None`. With `MaxRequestThreadCount = 100`, a burst of reads or writes into a stalled driver pins request threads for the full pipeline timeout, exhausting the pool and stalling unrelated sessions. A client timeout cannot cancel the call.

**Recommendation:** Derive a `CancellationToken` from the `OperationContext` / `TransportQuotas.OperationTimeout` so a stuck driver call is abandoned. Longer term, use the stack's async service overrides if available.

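A minimal sketch of bounding the blocking call with a timeout-derived token. `BoundedDriverCall.Run` is a hypothetical helper; how the timeout is actually obtained from `OperationContext` or `TransportQuotas.OperationTimeout` is not shown and would depend on the pinned SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedDriverCall
{
    // Runs an async driver call synchronously, but with a token that is
    // cancelled once operationTimeout elapses.
    public static T Run<T>(Func<CancellationToken, Task<T>> driverCall, TimeSpan operationTimeout)
    {
        using var cts = new CancellationTokenSource(operationTimeout);
        // Still blocks the request thread, but only until the driver
        // observes cancellation and the task completes.
        return driverCall(cts.Token).GetAwaiter().GetResult();
    }
}
```

This only helps if the driver honours the token; a driver that ignores cancellation would still pin the thread, which is one reason the async service overrides remain the better long-term option.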

**Resolution:** _(open)_

### Server-007

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Error handling & resilience |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:179-183` |
| Status | Open |

**Description:** `HealthEndpointsHost` is built without a `configDbHealthy` delegate, so the default `() => true` is used: `/healthz` always reports `configDbReachable = true` and never returns 503 on a DB outage. `_staleConfigFlag` is also never supplied by `Program.cs`, so the stale-config signal is inert too. `/healthz` degenerates to a pure liveness probe, and operators see a false-healthy status during a DB outage.

**Recommendation:** Wire a real config-DB probe (a cheap, cached `SELECT 1`) into `HealthEndpointsHost`, and register `StaleConfigFlag` in `Program.cs`. Alternatively, move DB health to `/readyz` and drop the misleading `configDbReachable` field.

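A cached probe of the kind suggested could look roughly like this. `CachedHealthProbe` is a hypothetical helper: the inner `Func<bool>` stands in for whatever executes `SELECT 1` against the config DB, and the injected clock exists only to make the cache testable.

```csharp
using System;

public sealed class CachedHealthProbe
{
    private readonly Func<bool> _probe;       // e.g. runs SELECT 1 (not shown)
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _utcNow;
    private readonly object _gate = new();
    private DateTime _lastCheckedUtc = DateTime.MinValue;
    private bool _lastResult;

    public CachedHealthProbe(Func<bool> probe, TimeSpan ttl, Func<DateTime> utcNow)
    {
        _probe = probe;
        _ttl = ttl;
        _utcNow = utcNow;
    }

    // Suitable as the configDbHealthy delegate: cheap on most calls,
    // re-probes at most once per TTL.
    public bool IsHealthy()
    {
        lock (_gate)
        {
            var now = _utcNow();
            if (now - _lastCheckedUtc >= _ttl)
            {
                try { _lastResult = _probe(); }
                catch { _lastResult = false; } // a throwing probe counts as unhealthy
                _lastCheckedUtc = now;
            }
            return _lastResult;
        }
    }
}
```

The instance's `IsHealthy` method group could then be passed wherever `HealthEndpointsHost` accepts its `configDbHealthy` delegate, assuming that parameter is a `Func<bool>` as the default `() => true` suggests.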

**Resolution:** _(open)_

### Server-008

| Field | Value |
|---|---|
| Severity | Low |
| Category | Error handling & resilience |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs:736` |
| Status | Open |

**Description:** `RouteScriptedAlarmMethodCalls` marks a handled slot by setting `errors[i] = ServiceResult.Good`, assuming `base.Call` skips non-null *Good* error slots. The stack and `GateCallMethodRequests` only ever pre-populate *Bad* slots; the skip-on-Good assumption is not a guaranteed SDK contract. If `base.Call` re-dispatches, the engine method and the stack's built-in Part 9 handler both fire, producing a double transition.

**Recommendation:** Verify against the pinned SDK whether `base.Call` skips Good-pre-populated slots. If it does not, exclude routed slots from `methodsToCall` before `base.Call`. Add a test asserting exactly-once engine transition for a routed Acknowledge.

**Resolution:** _(open)_

### Server-009

| Field | Value |
|---|---|
| Severity | High |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Security/LdapOptions.cs:44`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:74` |
| Status | Open |

**Description:** `AllowInsecureLdap` defaults to `true` (and `Program.cs` reads `?? true`); `UseTls` defaults to `false`. Out of the box, usernames and plaintext passwords are sent in LDAP binds over an unencrypted socket. A production deployment that enables LDAP without explicitly setting `AllowInsecureLdap=false` ships credentials in clear text on the server→LDAP hop.

**Recommendation:** Default `AllowInsecureLdap` to `false` in both the property initializer and the `Program.cs` fallback. Log a startup warning when LDAP is enabled with `UseTls=false && AllowInsecureLdap=true`.

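The secure default and the startup check might be sketched like this. `LdapOptionsSketch` and `LdapStartupChecks` are hypothetical; the real `LdapOptions` has more properties, and the warning would go through the server's logger rather than being returned as a string.

```csharp
#nullable enable
using System;

// Reduced stand-in for LdapOptions with the recommended defaults.
public sealed class LdapOptionsSketch
{
    public bool Enabled { get; set; }
    public bool UseTls { get; set; }
    public bool AllowInsecureLdap { get; set; } = false; // secure by default
}

public static class LdapStartupChecks
{
    // Returns a warning message when credentials would travel in clear text,
    // or null when the configuration is acceptable.
    public static string? InsecureLdapWarning(LdapOptionsSketch options)
    {
        if (options.Enabled && !options.UseTls && options.AllowInsecureLdap)
        {
            return "LDAP is enabled without TLS: bind credentials are sent unencrypted.";
        }
        return null;
    }
}
```

The `Program.cs` fallback would need the matching change (`?? false`), otherwise a missing configuration key would still resolve to the insecure value.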

**Resolution:** _(open)_

### Server-010

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:59`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:284-291` |
| Status | Open |

**Description:** `AutoAcceptUntrustedClientCertificates` defaults to `true` (`Program.cs` reads `?? true`). `BuildConfiguration` wires a handler that accepts any client certificate failing with `BadCertificateUntrusted`. A deployment that forgets to flip the flag accepts every untrusted client certificate, defeating the PKI trust list. Combined with the always-present `None` policy, the default posture is fully open.

**Recommendation:** Default `AutoAcceptUntrustedClientCertificates` to `false` and keep auto-accept as an opt-in development convenience. `docs/security.md` already shows `false`; align the code with the doc.

**Resolution:** _(open)_

### Server-011

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Security |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:322-346` |
| Status | Open |

**Description:** `BuildUserTokenPolicies` advertises a `UserName` token policy only when `SecurityProfile == Basic256Sha256SignAndEncrypt && Ldap.Enabled`. With the default `SecurityProfile = None` and `Ldap.Enabled = true`, the LDAP authenticator is wired but no UserName policy is advertised, so clients cannot present credentials and the only way in is Anonymous. The operator's intent is silently not honoured, and no diagnostic is emitted.

**Recommendation:** Validate the configuration at startup and warn or fail when `Ldap.Enabled = true` but no UserName policy is advertised. Allow UserName tokens on any non-None profile (they are stack-encrypted regardless, per `docs/security.md`).

**Resolution:** _(open)_

### Server-012

| Field | Value |
|---|---|
| Severity | Low |
| Category | Performance & resource management |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/Hosting/PeerHttpProbeLoop.cs:78-79` |
| Status | Open |

**Description:** `ProbeAsync` creates an `IHttpClientFactory` client and mutates `client.Timeout` on every 2-second probe tick. The timeout belongs on the request or on the named-client registration; it should not be set per call on a factory-vended instance.

**Recommendation:** Configure the timeout once via `AddHttpClient(HttpClientName).ConfigureHttpClient(...)`, or use a per-request linked `CancellationTokenSource(_options.HttpProbeTimeout)`; drop the per-call `client.Timeout` mutation.

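The per-request variant can be sketched with standard `System.Net.Http` and `System.Threading` APIs. `PeerProbeSketch` and its parameter names are hypothetical, and treating any failure as "peer unhealthy" is an assumption about the probe's semantics.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class PeerProbeSketch
{
    public static async Task<bool> ProbeAsync(
        HttpClient client,
        Uri url,
        TimeSpan probeTimeout,
        CancellationToken stoppingToken)
    {
        // The timeout lives on this request only; client.Timeout is never mutated.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken);
        cts.CancelAfter(probeTimeout);

        try
        {
            using var response = await client.GetAsync(url, cts.Token).ConfigureAwait(false);
            return response.IsSuccessStatusCode;
        }
        catch (OperationCanceledException) when (!stoppingToken.IsCancellationRequested)
        {
            return false; // the probe timed out; host shutdown is not swallowed
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, DNS failure, and similar
        }
    }
}
```

The exception filter keeps a probe timeout distinct from host shutdown: when `stoppingToken` itself is cancelled, the cancellation propagates to the caller so the loop can exit.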

**Resolution:** _(open)_

### Server-013

| Field | Value |
|---|---|
| Severity | Medium |
| Category | Design-document adherence |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:9-19`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaApplicationHost.cs:296-346`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/Program.cs:89` |
| Status | Open |

**Description:** `docs/security.md` documents 7 transport security profiles and `CLAUDE.md` references a `SecurityProfileResolver`. The code's `OpcUaSecurityProfile` enum has only `None` and `Basic256Sha256SignAndEncrypt`; `BuildSecurityPolicies` adds a policy only for the latter; `SecurityProfileResolver` does not exist in the repo (grep finds it only in docs). `Basic256Sha256-Sign` and all Aes profiles are unimplemented, and the `Enum.TryParse` at `Program.cs:89` silently selects `None` for an unrecognised profile string.

**Recommendation:** Reconcile code and docs: either implement the missing profiles and `SecurityProfileResolver`, or trim `docs/security.md` / `CLAUDE.md` to the two supported profiles. At minimum, log a warning when a configured `SecurityProfile` fails to parse instead of silently using `None`.

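The "at minimum" step could be sketched as below. The enum mirrors the two members named in the finding; `SecurityProfileParsing.Parse` and the `warn` callback are hypothetical, and falling back to `None` after warning (rather than failing startup) is one of the two options the recommendation leaves open.

```csharp
#nullable enable
using System;

public enum OpcUaSecurityProfile { None, Basic256Sha256SignAndEncrypt }

public static class SecurityProfileParsing
{
    public static OpcUaSecurityProfile Parse(string? configured, Action<string> warn)
    {
        // Nothing configured: the default profile applies without a warning.
        if (string.IsNullOrWhiteSpace(configured))
        {
            return OpcUaSecurityProfile.None;
        }

        // IsDefined rejects numeric strings such as "7", which TryParse accepts.
        if (Enum.TryParse<OpcUaSecurityProfile>(configured, ignoreCase: true, out var profile)
            && Enum.IsDefined(typeof(OpcUaSecurityProfile), profile))
        {
            return profile;
        }

        warn($"Unrecognised SecurityProfile '{configured}'; falling back to None.");
        return OpcUaSecurityProfile.None;
    }
}
```

A stricter variant would throw instead of warning, so that a typo in a production configuration cannot silently downgrade transport security.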

**Resolution:** _(open)_

### Server-014

| Field | Value |
|---|---|
| Severity | Low |
| Category | Code organization & conventions |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/SealedBootstrap.cs` |
| Status | Open |

**Description:** `SealedBootstrap` claims in its XML doc to "close release blocker #2" by consuming the generation-sealed cache, the resilient reader, and the stale-config flag, but `Program.cs` registers and uses `NodeBootstrap` instead. `SealedBootstrap` is neither registered in DI nor referenced by `OpcUaServerService`. It and its `StaleConfigFlag` plumbing are dead in the production wire-up, so the release blocker remains open in practice.

**Recommendation:** Either register `SealedBootstrap` (with `GenerationSealedCache`/`ResilientConfigReader`/`StaleConfigFlag`) and wire `StaleConfigFlag` into the health host, or delete `SealedBootstrap` and correct the release-readiness doc.

**Resolution:** _(open)_

### Server-015

| Field | Value |
|---|---|
| Severity | Low |
| Category | Documentation & comments |
| Location | `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OtOpcUaServer.cs:16-21`, `src/Server/ZB.MOM.WW.OtOpcUa.Server/OpcUa/OpcUaServerOptions.cs:21-26` |
| Status | Open |

**Description:** `OtOpcUaServer`'s class doc still says "PR 16 minimum-viable scope ... no security ... LDAP + security profiles are deferred." The class doc on `OpcUaServerOptions` says "PR 17 minimum-viable scope: no LDAP, no security profiles beyond None." Both are stale: the server now does LDAP UserName auth, anonymous-role mapping, and a configurable security profile. A reader would wrongly conclude the server has no authentication.

**Recommendation:** Update both class summaries to describe current behaviour and drop the "deferred to a future PR" language.

**Resolution:** _(open)_
