Commit Graph

162 Commits

Author SHA1 Message Date
Joseph Doherty
abbf49141c fix(core): resolve High code-review findings (Core-001, Core-002)
Core-001: swap the authorization-cache defaults so
MembershipFreshnessInterval (5 min, inner re-resolve trigger) is
strictly less than AuthCacheMaxStaleness (15 min, fail-closed
ceiling), so NeedsRefresh's warm-refresh path is reachable.

Core-002: TriePermissionEvaluator.Authorize now compares the trie's
GenerationId against the session's AuthGenerationId and re-fetches the
session's bound generation on mismatch, failing closed when that
generation has been pruned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:13:01 -04:00
Joseph Doherty
ee51878c08 fix(configuration): resolve High code-review findings (Configuration-001, Configuration-008)
Configuration-001: wrap the EXEC dbo.sp_ValidateDraft call in
sp_PublishGeneration in a BEGIN TRY/CATCH ROLLBACK; THROW block so a
validation RAISERROR aborts the publish instead of being ignored.

Configuration-008: route caller-supplied strings interpolated into
ConfigAuditLog.DetailsJson through STRING_ESCAPE(@x, 'json') and emit
sp_RollbackToGeneration's @TargetGenerationId as a bare JSON number,
closing the JSON-injection / denial-of-operation vector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:12:00 -04:00
Joseph Doherty
adf794f791 fix(server): resolve High code-review findings (Server-002, Server-009)
Server-002 — AuthorizationGate lax-mode no longer overrides explicit deny.
IsAllowed now switches on the evaluator's AuthorizationVerdict: Allow -> true,
Denied (an authored deny rule matched) -> false in BOTH strict and lax mode,
and only the indeterminate NotGranted case falls through to !_strictMode.
Previously `if (decision.IsAllowed) return true; return !_strictMode;` let lax
mode (the default) nullify authored NodeAcl deny rules for fully-resolved
sessions. The tri-state AuthorizationVerdict.Denied member is now honoured.

Server-009 — LDAP is secure-by-default. LdapOptions.AllowInsecureLdap now
defaults to false (was true) and Program.cs's config fallback reads `?? false`
(was `?? true`), so an LDAP-enabled deployment will not bind credentials over
an unencrypted socket unless an operator explicitly opts in. Program.cs also
logs a startup warning when LDAP is enabled with UseTls=false and
AllowInsecureLdap=true, flagging the clear-text server->LDAP credential hop.

Regression tests: AuthorizationGateTests covers all four verdict x mode
combinations via a fixed-verdict evaluator stub; new LdapOptionsTests asserts
the secure defaults. Both Server and Server.Tests build clean; the 15 targeted
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:11:06 -04:00
Joseph Doherty
7bb21c2aa2 fix(scripting): resolve High code-review finding (Core.Scripting-002)
The ForbiddenTypeAnalyzer syntax walker only inspected four node kinds
(ObjectCreation, Invocation-with-member-access, MemberAccess, bare
Identifier), so a forbidden type named through typeof, a generic type
argument, a cast, an is/as type pattern, default(T), an array-creation
element type, or an explicitly-typed local declaration produced no
examined node and bypassed the sandbox check.

Analyze now runs a second pass that resolves GetTypeInfo on every
TypeSyntax node and recursively unwraps array element types and generic
type arguments, so forbidden types nested at any depth are rejected at
compile. The original member/call node-kind switch is kept deliberately
narrow (rather than resolving GetSymbolInfo on every node) to avoid
flagging harmless inherited members such as typeof(int).Name, whose Name
property is declared by System.Reflection.MemberInfo. A span+type dedupe
keeps the two passes from emitting duplicate rejections.

Regression tests added in ScriptSandboxTests cover typeof, generic type
arguments, casts, default(T), is/as patterns, array element types, and
typed local declarations with forbidden types, plus over-block guards
asserting allowed generics and typeof still compile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:08:08 -04:00
Joseph Doherty
8c7c605478 docs(code-reviews): regenerate index — 6 Critical findings resolved
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:40 -04:00
Joseph Doherty
4df8737c86 fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.

This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:33 -04:00
Joseph Doherty
796871c210 fix(alarm-historian): keep queue rows aligned to events on drain (Core.AlarmHistorian-001)
ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every
row but events.Add was guarded by `if (evt is not null)`. A corrupt /
null-deserializing payload desynced the lists, so DrainOnceAsync applied
each outcome to the wrong RowId — an Ack could delete an un-sent event
(silent alarm-event data loss) and the corrupt row stalled the queue
head forever.

ReadBatch now returns a single list of QueueRow(long RowId,
AlarmHistorianEvent? Event) records so a rowId can never drift from its
event; deserialization is wrapped to yield null on JsonException.
DrainOnceAsync immediately dead-letters rows whose payload is
null/un-deserializable and forwards only well-formed events to the
writer, mapping outcomes by RowId.

Regression tests cover a corrupt row mid-batch and at the queue head.
Core.AlarmHistorian suite: 16/16 pass.

Resolves code-review finding Core.AlarmHistorian-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:20 -04:00
Joseph Doherty
cfb9ff1032 fix(scripting): block dangerous System types in the script sandbox (Core.Scripting-001)
ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.

Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.

Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.

Resolves code-review finding Core.Scripting-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:08 -04:00
Joseph Doherty
973730d0eb fix(admin): enforce authentication on all Admin UI routes (Admin-001/002)
Admin-001: Routes.razor used a plain RouteView, so the page-level
[Authorize] attributes on 11 pages were inert — every page, including
mutating ones, was reachable fully unauthenticated.
Admin-002: several pages (e.g. NewCluster, which writes config rows)
carried no auth attribute at all.

- Routes.razor: RouteView → AuthorizeRouteView with NotAuthorized /
  Authorizing slots; add RedirectToLogin component.
- Program.cs: SetFallbackPolicy(RequireAuthenticatedUser) — secure by
  default for new pages/endpoints.
- Login.razor: [AllowAnonymous] so login stays reachable; login page,
  /auth/* endpoints and static assets remain anonymous.
- Add [Authorize] to the previously un-gated pages; NewCluster gated to
  the CanPublish (FleetAdmin) policy.

Regression tests in PageAuthorizationTests pin that anonymous requests
to protected/mutating routes are rejected and that login + static
assets stay anonymously reachable. Admin test suite: 210/210 pass.

Resolves code-review findings Admin-001 and Admin-002 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:58 -04:00
Joseph Doherty
571066130b fix(server): stop WriteNodeIdUnknown infinite recursion (Server-001)
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).

Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.

Resolves code-review finding Server-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:44 -04:00
Joseph Doherty
8568f5cd85 docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:20:27 -04:00
Joseph Doherty
76d35d1b9f chore: add per-module code review process and tracking infra
Adapts the code-review procedure, folder layout, template, and tooling
from the sibling mxaccessgw repo to lmxopcua.

- REVIEW-PROCESS.md: per-module review workflow — a module is one src/
  or tests/ project (ZB.MOM.WW.OtOpcUa. prefix stripped); 10-category
  checklist; finding IDs/severities/statuses; re-review rules.
- code-reviews/_template/findings.md: per-module findings template.
- code-reviews/regen-readme.py: generates the cross-module README.md
  index from the per-module findings.md files; --check gates staleness
  and consistency.
- code-reviews/test_regen_readme.py: dependency-free generator tests.
- code-reviews/prompt.md: orchestration prompt for clearing the backlog.
- code-reviews/README.md: generated index (no modules reviewed yet).
- scripts/check-code-reviews-readme.ps1: CI / pre-commit check wrapper.

Adapted to this repo: ZB.MOM.WW.OtOpcUa module naming, OtOpcUa
conventions checklist (in-process GalaxyDriver + mxaccessgw,
contained-name vs tag-name, ACL at DriverNodeManager), single .NET
solution build/test commands, and the lmxopcua design docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 04:08:47 -04:00