Commit Graph

895 Commits

Author SHA1 Message Date
Joseph Doherty
adf794f791 fix(server): resolve High code-review findings (Server-002, Server-009)
Server-002 — AuthorizationGate lax-mode no longer overrides explicit deny.
IsAllowed now switches on the evaluator's AuthorizationVerdict: Allow -> true,
Denied (an authored deny rule matched) -> false in BOTH strict and lax mode,
and only the indeterminate NotGranted case falls through to !_strictMode.
Previously `if (decision.IsAllowed) return true; return !_strictMode;` let lax
mode (the default) nullify authored NodeAcl deny rules for fully-resolved
sessions. The tri-state AuthorizationVerdict.Denied member is now honoured.

Server-009 — LDAP is secure-by-default. LdapOptions.AllowInsecureLdap now
defaults to false (was true) and Program.cs's config fallback reads `?? false`
(was `?? true`), so an LDAP-enabled deployment will not bind credentials over
an unencrypted socket unless an operator explicitly opts in. Program.cs also
logs a startup warning when LDAP is enabled with UseTls=false and
AllowInsecureLdap=true, flagging the clear-text server->LDAP credential hop.

Regression tests: AuthorizationGateTests covers all four verdict x mode
combinations via a fixed-verdict evaluator stub; new LdapOptionsTests asserts
the secure defaults. Both Server and Server.Tests build clean; the 15 targeted
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:11:06 -04:00
Joseph Doherty
7bb21c2aa2 fix(scripting): resolve High code-review finding (Core.Scripting-002)
The ForbiddenTypeAnalyzer syntax walker only inspected four node kinds
(ObjectCreation, Invocation-with-member-access, MemberAccess, bare
Identifier), so a forbidden type named through typeof, a generic type
argument, a cast, an is/as type pattern, default(T), an array-creation
element type, or an explicitly-typed local declaration produced no
examined node and bypassed the sandbox check.

Analyze now runs a second pass that resolves GetTypeInfo on every
TypeSyntax node and recursively unwraps array element types and generic
type arguments, so forbidden types nested at any depth are rejected at
compile. The original member/call node-kind switch is kept deliberately
narrow (rather than resolving GetSymbolInfo on every node) to avoid
flagging harmless inherited members such as typeof(int).Name, whose Name
property is declared by System.Reflection.MemberInfo. A span+type dedupe
keeps the two passes from emitting duplicate rejections.

Regression tests added in ScriptSandboxTests cover typeof, generic type
arguments, casts, default(T), is/as patterns, array element types, and
typed local declarations with forbidden types, plus over-block guards
asserting allowed generics and typeof still compile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:08:08 -04:00
Joseph Doherty
8c7c605478 docs(code-reviews): regenerate index — 6 Critical findings resolved
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:40 -04:00
Joseph Doherty
4df8737c86 fix(driver-galaxy): wire event-stream faults to the reconnect supervisor (Driver.Galaxy-001)
The ReconnectSupervisor was constructed but its trigger
ReportTransportFailure was never called. When the gateway StreamEvents
stream faulted, EventPump just logged and exited — the supervisor was
never notified, so a transient gateway drop permanently stopped
data-change notifications while GetHealth() still reported Healthy.

EventPump gains an optional onStreamFault callback invoked from its
stream-fault catch block (not on clean shutdown). GalaxyDriver wires it
to ReconnectSupervisor.ReportTransportFailure so a transport drop drives
reopen → replay.

This is the minimal fix for -001; the pump-restart-on-reopen gap remains
tracked as Driver.Galaxy-008. Regression tests cover the callback being
invoked on fault, the end-to-end supervisor reopen/replay, and that a
clean shutdown does not fire it. Driver.Galaxy suite: 206/206 pass.

Resolves code-review finding Driver.Galaxy-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:33 -04:00
Joseph Doherty
796871c210 fix(alarm-historian): keep queue rows aligned to events on drain (Core.AlarmHistorian-001)
ReadBatch built parallel rowIds / events lists: rowIds.Add ran for every
row but events.Add was guarded by `if (evt is not null)`. A corrupt /
null-deserializing payload desynced the lists, so DrainOnceAsync applied
each outcome to the wrong RowId — an Ack could delete an un-sent event
(silent alarm-event data loss) and the corrupt row stalled the queue
head forever.

ReadBatch now returns a single list of QueueRow(long RowId,
AlarmHistorianEvent? Event) records so a rowId can never drift from its
event; deserialization is wrapped to yield null on JsonException.
DrainOnceAsync immediately dead-letters rows whose payload is
null/un-deserializable and forwards only well-formed events to the
writer, mapping outcomes by RowId.

Regression tests cover a corrupt row mid-batch and at the queue head.
Core.AlarmHistorian suite: 16/16 pass.

Resolves code-review finding Core.AlarmHistorian-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:20 -04:00
Joseph Doherty
cfb9ff1032 fix(scripting): block dangerous System types in the script sandbox (Core.Scripting-001)
ForbiddenTypeAnalyzer used only a namespace-prefix deny-list. System.Environment,
System.AppDomain, System.GC and System.Activator live directly in the System
namespace, which must stay allowed for primitives (Math, String, ...), so they
were never caught — an operator-authored predicate could call
System.Environment.Exit(0) and terminate the in-process OPC UA server.

Add a type-granular deny-list (ForbiddenFullTypeNames) checked by
fully-qualified type name after the namespace-prefix check; legitimate System
types are unaffected.

Regression tests assert scripts referencing Environment/AppDomain/GC/Activator
are rejected at analysis time. Core.Scripting suite: 68/68 pass.

Resolves code-review finding Core.Scripting-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:54:08 -04:00
Joseph Doherty
973730d0eb fix(admin): enforce authentication on all Admin UI routes (Admin-001/002)
Admin-001: Routes.razor used a plain RouteView, so the page-level
[Authorize] attributes on 11 pages were inert — every page, including
mutating ones, was reachable fully unauthenticated.
Admin-002: several pages (e.g. NewCluster, which writes config rows)
carried no auth attribute at all.

- Routes.razor: RouteView → AuthorizeRouteView with NotAuthorized /
  Authorizing slots; add RedirectToLogin component.
- Program.cs: SetFallbackPolicy(RequireAuthenticatedUser) — secure by
  default for new pages/endpoints.
- Login.razor: [AllowAnonymous] so login stays reachable; login page,
  /auth/* endpoints and static assets remain anonymous.
- Add [Authorize] to the previously un-gated pages; NewCluster gated to
  the CanPublish (FleetAdmin) policy.

Regression tests in PageAuthorizationTests pin that anonymous requests
to protected/mutating routes are rejected and that login + static
assets stay anonymously reachable. Admin test suite: 210/210 pass.

Resolves code-review findings Admin-001 and Admin-002 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:58 -04:00
Joseph Doherty
571066130b fix(server): stop WriteNodeIdUnknown infinite recursion (Server-001)
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).

Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.

Resolves code-review finding Server-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:44 -04:00
Joseph Doherty
8568f5cd85 docs(code-reviews): comprehensive per-module review pass at 76d35d1
Reviewed all 31 src/ production projects against the 10-category
checklist in REVIEW-PROCESS.md. Each module gets its own findings.md;
code-reviews/README.md is regenerated from them.

334 findings: 6 Critical, 46 High, 126 Medium, 156 Low.

Critical findings:
- Server-001: WriteNodeIdUnknown recurses unconditionally — a HistoryRead
  on an unresolvable node crashes the process (remote DoS).
- Admin-001/002: app-wide auth bypass (RouteView not AuthorizeRouteView)
  plus unauthenticated mutating routes.
- Core.Scripting-001: System.Environment reachable from operator scripts;
  Environment.Exit() terminates the server.
- Core.AlarmHistorian-001: rowIds/events parallel-list desync on a corrupt
  payload misapplies outcomes — silent alarm-event data loss.
- Driver.Galaxy-001: ReconnectSupervisor is built but never triggered, so
  a transient gateway drop permanently kills the event stream.

All findings are Status=Open; resolution is tracked per REVIEW-PROCESS.md
section 4. Review only — no source code changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:20:27 -04:00
Joseph Doherty
76d35d1b9f chore: add per-module code review process and tracking infra
Adapts the code-review procedure, folder layout, template, and tooling
from the sibling mxaccessgw repo to lmxopcua.

- REVIEW-PROCESS.md: per-module review workflow — a module is one src/
  or tests/ project (ZB.MOM.WW.OtOpcUa. prefix stripped); 10-category
  checklist; finding IDs/severities/statuses; re-review rules.
- code-reviews/_template/findings.md: per-module findings template.
- code-reviews/regen-readme.py: generates the cross-module README.md
  index from the per-module findings.md files; --check gates staleness
  and consistency.
- code-reviews/test_regen_readme.py: dependency-free generator tests.
- code-reviews/prompt.md: orchestration prompt for clearing the backlog.
- code-reviews/README.md: generated index (no modules reviewed yet).
- scripts/check-code-reviews-readme.ps1: CI / pre-commit check wrapper.

Adapted to this repo: ZB.MOM.WW.OtOpcUa module naming, OtOpcUa
conventions checklist (in-process GalaxyDriver + mxaccessgw,
contained-name vs tag-name, ACL at DriverNodeManager), single .NET
solution build/test commands, and the lmxopcua design docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 04:08:47 -04:00
Joseph Doherty
27a8d05b7c feat(driver-galaxy): consume the gateway's session-less alarm model
The mxaccessgw updated alarms to a session-less central monitor:
AcknowledgeAlarm dropped SessionId and alarm transitions now come from
the session-less StreamAlarms feed instead of the per-session worker
StreamEvents stream. The GalaxyDriver no longer compiled against the
updated client.

- GatewayGalaxyAlarmAcknowledger: session-less rewrite — no GalaxyMxSession;
  outcome read from ProtocolStatus (throw) and Hresult (warn).
- New IGalaxyAlarmFeed seam + GatewayGalaxyAlarmFeed: background consumer
  of StreamAlarms that decodes the active-alarm snapshot plus live
  transitions into GalaxyAlarmTransition and reopens the stream on
  transport faults.
- EventPump: drop the dead per-session OnAlarmTransition path; the
  per-session stream no longer carries alarms.
- GalaxyDriver: bridge the feed onto IAlarmSource.OnAlarmEvent; the feed
  starts on SubscribeAlarmsAsync, independent of data subscriptions.
- Tests: replace EventPumpAlarmTests with GatewayGalaxyAlarmFeedTests;
  move the driver alarm-source tests onto the IGalaxyAlarmFeed seam.

Browse needed no change — GatewayGalaxyHierarchySource consumes the
unchanged DiscoverHierarchy contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 03:59:36 -04:00
Joseph Doherty
cd2306db66 feat(historian-sidecar): live aahClientManaged alarm-event write path (C.1)
SdkAlarmHistorianWriteBackend.WriteBatchAsync replaces the RetryPlease
placeholder with the real entry point — HistorianAccess.AddStreamedValue
(HistorianEvent, out HistorianAccessError) in aahClientManaged, pinned by
decompiling the installed SDK.

The write path opens its own ReadOnly=false connection: the query-side
HistorianDataSource opens ReadOnly sessions and AddStreamedValue fails on
those with WriteToReadOnlyFile. IHistorianConnectionFactory gains a readOnly
parameter (default true, query path unchanged); BuildConnectionArgs is
extracted as a pure helper. HistorianClusterEndpointPicker is shared for
node failover; connection-class errors abort the batch as RetryPlease and
reset the connection, malformed-input codes map to PermanentFail.

Tests: connection-unavailable batch deferral, ClassifyOutcome error-code
table, BuildConnectionArgs read-vs-write shaping (80 pass, 2 rig-skipped).
Live_* round-trip tests stay Skip-gated for the D.1 rollout smoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:08:32 -04:00
Joseph Doherty
419eda256b feat(server): route OPC UA Part 9 AddComment to ScriptedAlarmEngine
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.

Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 09:43:03 -04:00
Joseph Doherty
c5915700bd feat(server): route OPC UA Part 9 shelve methods to ScriptedAlarmEngine (#24)
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.

The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.

Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 09:31:30 -04:00
Joseph Doherty
56bb1ceaf5 Merge branch 'feat/alarm-historian-c1-writer'
Task #19 (alarms C.1) — complete PR C.1 test coverage for the Wonderware
historian alarm-event writer (100/1000-event batching + cluster-failover).
The C.1 code structure was already on master; the live aahClientManaged
SDK binding in SdkAlarmHistorianWriteBackend remains rig-gated (D.1).
2026-05-18 06:27:27 -04:00
Joseph Doherty
8a51842e89 test(historian-sidecar): complete PR C.1 test coverage for AahClientManagedAlarmEventWriter
Add the spec-required 100/1000-event batching tests and cluster-failover
tests that were missing from the existing C.1 suite:

- AahClientManagedAlarmEventWriterTests: add Large_batch_all_ack_returns_all_true
  (batchSize 100 + 1000) and Large_batch_alternating_outcomes_are_positionally_correct
  (batchSize 100 + 1000) to satisfy the "1 / 100 / 1000 events" spec requirement;
  add Backend_retry_then_succeed_simulates_cluster_failover to cover the
  RetryPlease-then-Ack sequence at the IPC layer (unit-level stand-in for the
  rig-gated live cluster-failover path).

- SdkAlarmHistorianWriteBackendTests (new file): unit tests that pin the
  placeholder backend's RetryPlease-for-every-slot contract (preserves queued
  events while D.1 is unresolved); plus two Skip("rig-required") integration
  tests covering the live SDK single-event roundtrip and cluster failover via
  HistorianClusterEndpointPicker — remove the Skip in PR D.1.

Feasibility note: aahClientManaged.dll IS present in lib/ and referenced in
the csproj; the SDK call site is isolated behind IAlarmHistorianWriteBackend
in SdkAlarmHistorianWriteBackend.WriteBatchAsync (single method, D.1 seam).
The full AahClientManagedAlarmEventWriter implementation was already complete.

Build: 0 errors, 0 warnings.
Tests: 64 passed, 2 skipped (rig-gated), 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 06:25:11 -04:00
Joseph Doherty
fea2b34e9a Merge branch 'feat/wave4-phase7'
Wave 4 — Phase 7 gap closure + flaky-test stabilization:
- #24 route OPC UA Part 9 Acknowledge/Confirm to ScriptedAlarmEngine
- #25/#26/#27 /virtual-tags + /scripted-alarms pages + /script-log viewer
- #28 RingBufferHistoryWriter — production IHistoryWriter for virtual tags
- #29 stabilize 3 flaky compliance-gate tests (root-cause fixes)
2026-05-18 06:02:11 -04:00
Joseph Doherty
392b219233 fix(tests): stabilize three flaky tests under parallel full-solution load
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.

#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.

#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:59:00 -04:00
Joseph Doherty
41f133a337 feat(admin-ui): add /virtual-tags, /scripted-alarms, and /script-log pages (tasks #25, #26, #27)
Gap 2 (#25): VirtualTagsTab.razor + /virtual-tags global page — list/create/toggle
virtual tags per draft generation with DataType, Script, trigger, Historize, Enabled
fields. Tab wired into DraftEditor.

Gap 3 (#26): ScriptedAlarmsTab.razor + /scripted-alarms global page — list/create
scripted alarms with AlarmType, Severity, MessageTemplate, PredicateScript,
HistorizeToAveva, Retain. SeverityBand helper shows Low/Medium/High/Critical label.
Tab wired into DraftEditor.

Gap 4 (#27): ScriptLogHub (SignalR IAsyncEnumerable stream) tails scripts-*.log with
optional ScriptName filter; ScriptLog.razor provides Start/Stop/Clear controls plus
level filter dropdown. Hub registered at /hubs/script-log in Program.cs.

Nav rail gains a "Scripting" eyebrow with entries for all three pages.
19 new unit tests for ScriptLogHub parse/filter/tail helpers (Category=Unit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:59 -04:00
Joseph Doherty
bc8ff7a5fe feat(phase7): wire RingBufferHistoryWriter as production IHistoryWriter for virtual tags (Gap 5)
Closes Phase 7 Gap 5: VirtualTagEngine called IHistoryWriter.Record per evaluation
when Historize=true but Phase7EngineComposer always passed NullHistoryWriter, so
virtual-tag history was computed but never persisted.

The fix:
- New RingBufferHistoryWriter implements both IHistoryWriter (write port for the
  evaluation pipeline) and IHistorianDataSource (read port for IHistoryRouter so
  OPC UA HistoryRead on virtual-tag nodes resolves here). Maintains one bounded
  ring buffer (1000 samples, configurable) per tag path; Record() is O(1) and
  never blocks evaluation.
- Phase7EngineComposer.Compose now accepts IHistoryRouter? and, when any
  VirtualTagDefinition.Historize=true, creates a RingBufferHistoryWriter, passes
  it to VirtualTagEngine as historyWriter, adds it to the disposables list, and
  registers it under the "virtual:" prefix in the router for HistoryRead dispatch.
- Phase7Composer accepts IHistoryRouter? from DI (already registered as singleton
  in Program.cs) and threads it through to Phase7EngineComposer.Compose.
- NullHistoryWriter remains as fallback when no tags request historization.
- 16 new unit tests in RingBufferHistoryWriterTests.cs cover ring-buffer semantics,
  eviction, per-tag isolation, ReadRawAsync windowing, IHistorianDataSource stubs,
  router registration, and the Historize=false / null-router fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
ca149ce907 feat(phase7): route OPC UA Part 9 Acknowledge/Confirm methods to ScriptedAlarmEngine (task #24)
Gap 1 of phase-7-status.md. Intercepts AcknowledgeableConditionType_Acknowledge and
AcknowledgeableConditionType_Confirm calls in DriverNodeManager.Call and dispatches
them to ScriptedAlarmEngine so OPC UA HMI clients can acknowledge/confirm scripted alarms
in addition to the existing Admin UI path. Shelve methods deferred (per-instance NodeIds,
not well-known type MethodIds — follow-up task). AlarmEngine is now exposed through
Phase7ComposedSources so the server wire-up passes it to every DriverNodeManager. 13 new
unit tests cover dispatch kernel, identity fallback, batch handling, and error paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
1913bda6b8 Merge branch 'fix/phase-6-1-stale-compliance-check' 2026-05-18 05:25:11 -04:00
Joseph Doherty
fa965ede3d fix(compliance): drop stale Galaxy.Proxy assertions from phase 6.1 gate
phase-6-1-compliance.ps1 asserted CircuitBreaker.cs / Backoff.cs exist
under src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Supervisor/ — but the
Galaxy.Proxy project was retired in PR 7.2 with the legacy in-process
Galaxy stack. Circuit-breaker + backoff resilience is now the Core
pipeline (DriverResiliencePipelineBuilder, per-device-keyed), which the
same gate already asserts and passes. The two stale checks always fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:25:11 -04:00
Joseph Doherty
7b3b6580b3 Merge branch 'fix/test-fixture-sql-host'
Fix DB-test fixture defaults missed in the 2026-04-28 SQL-host migration.
2026-05-18 05:12:21 -04:00
Joseph Doherty
41da84293a fix(tests): point DB-test fixture defaults at the migrated SQL host
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).

Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:12:20 -04:00
Joseph Doherty
16a87b08f3 docs: add four planning runbooks for Phase 6.3 interop, v2 GA gates, live-hardware validation, and alarms worker wiring
Produces docs/plans/ entries for tasks #13, #15, #16, and #17-#20:
- phase-6-3-redundancy-interop-plan.md: automation boundary analysis,
  concrete test matrix (A/B/C blocks), and a step-by-step cutover
  runbook for the deferred Stream F client interop work
- v2-ga-lab-gates-plan.md: exact gate list with command, pass criterion,
  and owner for each of the nine v2 GA exit criteria
- live-hardware-validation-runbooks.md: one runbook per driver (FOCAS
  CNC smoke #54, AB CIP live-boot, TwinCAT wire-live) with preconditions,
  procedure, expected results, and recording template
- alarms-worker-wiring-plan.md: focused plan for A.2/A.3-A.4/C.1/D.1
  worker wiring in the mxaccessgw sibling repo, documenting the
  discovered AVEVA API surface, the architectural decision that blocks
  A.2, the dependency order, and what each item needs to unblock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:53:36 -04:00
Joseph Doherty
da8a3e46f7 Merge branch 'feat/tasks-12-14-22-23'
Wave 2 of the task-list run:
- #12 harden the Phase 6.2 deferred authz gates (28 new tests; gates already shipped in fb6dd34)
- #14 Phase 6.4 Stream B.5 — five-identifier ranked equipment search
- #22 docs/v2/phase-7-status.md — Phase 7 reconciliation (Phase 7 ~complete, 5 gaps)
- #23 retire docs/v2/lmx-followups.md
2026-05-18 04:42:39 -04:00
Joseph Doherty
09af8d2830 docs: add Phase 7 status reconciliation document
Audits every Phase 7 plan stream (A-H) against the repo, confirms the
exit gate is fully closed, and records the five genuine remaining gaps:
OPC UA method-call dispatch for alarm Ack/Confirm/Shelve, the
/virtual-tags and /scripted-alarms Admin UI tabs, the script log viewer,
and the missing production IHistoryWriter for virtual-tag historization.
Also notes that docs/v2/v2-release-readiness.md carries a stale "out of
scope" label — Phase 7 shipped completely after that doc was last updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:40:32 -04:00
Joseph Doherty
6c78027b5a docs: retire docs/v2/lmx-followups.md (all items DONE, pre-PR-7.2 arch)
Every item in lmx-followups.md was marked DONE and rooted in the
retired GalaxyProxyDriver / OtOpcUaGalaxyHost named-pipe architecture
deleted in PR 7.2. No live or unique content remains: v2-release-readiness.md
is the canonical open-work tracker. Remove the file and drop the
now-dead link from docs/README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
bb1854b2f8 feat(admin): add five-identifier ranked equipment search (Phase 6.4 Stream B.5)
Implements the missing Stream B.5 search from the Phase 6.4 plan:
- EquipmentService.SearchAsync scopes to a cluster, scores hits across
  ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid (decision #117):
  exact = 100, prefix = 50, fuzzy (opt-in) = 20; published generation
  outranks draft on equal scores per spec.
- EquipmentSearchHit record carries Score + MatchedField for badge display.
- EquipmentTab.razor gains a search panel with per-row matched-field chips
  (green exact, amber prefix, grey fuzzy) and fuzzy opt-in checkbox.
- 14 new unit tests in EquipmentSearchTests.cs (Category=Unit) cover exact,
  prefix, fuzzy, case-insensitivity, tie-break, cross-cluster isolation, and
  maxResults cap; all 148 admin unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
70d7166a39 test(server): harden deferred authz gates — task #12 Browse/Subscribe/Call/AlarmAck
Add DeferredGateHardeningTests (28 unit tests) covering the Phase 6.2
compliance-checklist gaps left by the per-gate unit suites that shipped
with the gate implementations:

- Lax-mode fall-through for CreateMonitoredItems and Call gates (null
  identity and identity-without-LDAP-groups both skip denial in lax mode,
  consistent with BrowseGatingTests.Lax_mode_null_identity)
- Flag isolation: Subscribe-only grant does NOT imply Read; Read-only
  grant does NOT imply Subscribe; HistoryRead-only grant does NOT imply
  Read and vice versa (Phase 6.2 compliance: "HistoryRead uses its own flag")
- Alarm-bit isolation: AlarmAcknowledge alone does not grant AlarmConfirm
  or AlarmShelve; Browse alone does not grant AlarmAcknowledge
- AlarmShelve falls through to OpcUaOperation.Call in MapCallOperation
  (documents the ShelvedStateMachine per-instance NodeId limitation noted
  in the implementation, with the follow-up path: MethodCall grant covers it)
- Complete OpcUaOperation→NodePermissions mapping coverage for all deferred
  operations (Browse, CreateMonitoredItems, TransferSubscriptions, Call,
  AlarmAcknowledge, AlarmConfirm, AlarmShelve) — both positive and
  wrong-bit negative cases
- Multi-group union for deferred gates (grp-browse ∪ grp-ack gives both
  Browse and AlarmAcknowledge without leaking Read or Call)

Build: 0 errors on Server.csproj (verified against main repo build which
carries the gRPC-generated Galaxy driver artifacts the isolated worktree
lacks — that pre-existing gap is unrelated to these changes).
Test count: 247 → 275 (+28 unit, 0 failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:57 -04:00
Joseph Doherty
6968872e5d Merge branch 'feat/tasks-9-10-11-admin-hardening'
Wave 1 of the task-list run — three Admin hardening tasks:
- #9  resilient LDAP role-grant reads (ResilientLdapGroupRoleMappingService)
- #10 InteractiveServer render mode on 13 interactive pages + hub-URL fixes
- #11 ZTag/SAPID reservation pre-check in equipment CSV import (task #197)
2026-05-18 04:25:50 -04:00
Joseph Doherty
020c30f9a6 feat(admin): add ZTag/SAPID reservation pre-check to equipment CSV import (task #197)
ApplyReservationPreCheckAsync on EquipmentImportBatchService queries active
ExternalIdReservation rows in a single round-trip at parse time; rows whose ZTag
or SAPID is claimed by a different EquipmentUuid are moved from AcceptedRows to
RejectedRows with a descriptive reason. ImportEquipment.razor calls the check
after EquipmentCsvImporter.Parse so conflicts appear in the preview before the
operator clicks Stage + Finalise. Updated notice banner to reflect the pre-check
is now live; 6 new unit tests cover conflict, no-conflict, same-UUID, released-
reservation, and empty-input paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
a8dabc47f9 fix(admin): add InteractiveServer render mode to remaining interactive pages
ModbusAddressPreview (@bind on dropdowns + child ModbusAddressEditor with @oninput),
ModbusDiagnostics (@onclick Refresh), and NewCluster (EditForm with Nav.NavigateTo on submit)
were missed in the first pass — all three require interactivity but had no @rendermode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
43291d7fdd fix(admin): add InteractiveServer render mode to all interactive Blazor pages; fix wrong hub URLs
Eight pages were using @onclick handlers, Timers, or HubConnections but had no @rendermode,
causing interactivity to be silently dead under static SSR. Added @rendermode RenderMode.InteractiveServer
(with the required @using Microsoft.AspNetCore.Components.Web) to: AlarmsHistorian, Certificates,
Fleet, Home, Hosts, Reservations, DraftEditor, and ImportEquipment.

Also fixed two hub URL bugs: AclsTab and RedundancyTab were connecting to the non-existent
/hubs/fleet-status path; corrected to /hubs/fleet which matches the MapHub<FleetStatusHub>
call in Program.cs. Build: 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
75b91ebb97 feat(admin): wrap LdapGroupRoleMappingService in Phase 6.1-style resilience pipeline (Phase 6.2 Stream A.2)
Add ResilientLdapGroupRoleMappingService — a singleton decorator that wraps the
hot-path GetByGroupsAsync call in a Polly pipeline (timeout 2s → retry 3× jittered
→ fallback to in-memory sealed snapshot) so a transient Config DB outage at
Admin sign-in falls back to the last-known-good mapping set rather than denying
every login. The static LdapOptions.GroupToRole bootstrap dictionary in
AdminRoleGrantResolver remains the lock-out-proof floor regardless of DB state.

DI wiring uses keyed services: LdapGroupRoleMappingService (EF, scoped) is
registered under key "LdapGroupRoleMappingService.Inner"; the resilient singleton
decorator is the primary ILdapGroupRoleMappingService binding. The singleton
avoids the captive-dependency anti-pattern by using IServiceScopeFactory to open
a short-lived scope for each DB call.

Write methods (CreateAsync, DeleteAsync, ListAllAsync) pass through unchanged —
resilience is read-path only per Phase 6.1 design decision.

15 new unit tests cover: DB success/failure/retry paths, snapshot sealing and
per-group-set isolation, order-independent cache key normalisation, cancellation
propagation, and pass-through method routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:22 -04:00
Joseph Doherty
412cdec9b1 Merge branch 'docs/reservations'
Add docs/Reservations.md documenting the external-ID reservation flow.
2026-05-18 03:48:27 -04:00
Joseph Doherty
b90718013e docs: add Reservations.md — external-ID reservation flow
Document the ZTag / SAPID external-ID reservation subsystem: what a
reservation is, why it sits outside the generation flow (decision #124),
the ExternalIdReservation table, the lifecycle (author → publish
precheck → publish-time MERGE → FleetAdmin release), and the
/reservations Admin page. Linked from the docs README Operational table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:48:21 -04:00
Joseph Doherty
4a2e993a95 Merge branch 'chore/admin-trim-page-notices'
Trim explanatory notices from the role-grants and certificates pages.
2026-05-18 03:34:20 -04:00
Joseph Doherty
adbbb5e7d0 chore(admin): trim explanatory notices from role-grants and certificates
- Role grants: drop the page notice describing the LDAP-group → role
  mapping semantics; this is moving to the user instructions.
- Certificates: drop the trailing "operators should retry the rejected
  client's connection" note from the trust notice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:34:13 -04:00
Joseph Doherty
a8ef73dcb5 Merge branch 'feat/ldap-role-grants-signin'
Consume DB-backed LDAP role grants at Admin sign-in, with fleet-wide
and cluster-scoped roles, and fix the role-grants page interactivity.
2026-05-18 03:18:27 -04:00
Joseph Doherty
22fd314694 fix(admin): make the role-grants page interactive
The role-grants page is the authoring surface for LdapGroupRoleMapping
rows, but it had no @rendermode — so it rendered as static SSR and its
@onclick handlers (Add grant, Revoke) never fired. App.razor's <Routes/>
sets no global render mode; only ClusterDetail opted in.

- Add @rendermode RenderMode.InteractiveServer.
- Fix the SignalR hub URL: the page connected to /hubs/fleet-status,
  but FleetStatusHub is mapped at /hubs/fleet. Static SSR masked this
  (OnAfterRenderAsync never ran); enabling interactivity surfaced the
  404 that terminated the circuit.

Verified in-browser: Add grant opens the form, a cluster-scoped grant
saves and lists, Revoke removes it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:17:56 -04:00
Joseph Doherty
8adb83afee feat(admin): consume LDAP role grants at sign-in, incl. cluster scoping
The role-grants page authored LdapGroupRoleMapping rows but nothing
consumed them — sign-in only read the static appsettings GroupToRole
dictionary. Wire the DB-backed grants into the auth path.

- AdminRoleGrantResolver merges the static bootstrap dictionary (always
  fleet-wide, lock-out-proof) with DB grants; system-wide rows fold into
  fleet roles, cluster-scoped rows become (cluster, role) grants.
- Login emits a ClaimTypes.Role claim per fleet role and a cluster_role
  claim per cluster-scoped grant; lock-out check spans both scopes.
- ClusterRoleClaims + ClaimsPrincipal extensions resolve the effective
  role for a cluster (highest of fleet-wide and cluster-scoped).
- ClusterAuthorizeView gates cluster pages: ClusterDetail (view +
  ConfigEditor draft actions), DraftEditor (ConfigEditor / FleetAdmin
  publish), DiffViewer (ConfigViewer), ImportEquipment (ConfigEditor).
- RoleGrants page is now FleetAdmin-only; Account surfaces fleet-wide
  and cluster-scoped grants separately.

Control-plane only — decision #150 holds, NodeAcl is untouched.

Tests: AdminRoleGrantResolverTests + ClusterRoleClaimsTests (22).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:09:06 -04:00
Joseph Doherty
1e04796953 Merge branch 'feat/admin-technical-light-design'
Restyle the Admin web UI with the technical-light design system,
and fix the LDAP sign-in path so it actually authenticates against
GLAuth (form binding, service-account DN, user-search attribute).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:49:35 -04:00
Joseph Doherty
5f5bfe1ea5 fix: make Admin LDAP sign-in work against GLAuth
Three bugs blocked sign-in entirely:

- Login.razor is static-SSR but its form model lacked
  [SupplyParameterFromForm], so the posted username/password never
  bound — SignInAsync saw empty fields and bailed before LDAP was
  contacted. Annotate the model; seed it in OnInitialized since
  BL0008 forbids an initializer on a [SupplyParameterFromForm]
  property.
- appsettings.json ServiceAccountDn used ou=svcaccts, which GLAuth
  reads as a (non-existent) group — the service-account bind failed
  with "Group not found". Use cn=serviceaccount,dc=lmxopcua,dc=local.
- LdapAuthService resolved the user DN by searching (uid=...), but
  GLAuth keys users by cn. Add an LdapOptions.UserNameAttribute knob
  (default cn for GLAuth; set sAMAccountName for Active Directory)
  and use it for the search filter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:48:00 -04:00
Joseph Doherty
482d5f5637 feat: restyle Admin UI with the technical-light design system
Adopt the technical-light design system across the Admin web UI:

- Vendor theme.css + IBM Plex woff2 fonts into wwwroot; include
  theme.css globally after Bootstrap.
- Rebuild MainLayout: top app-bar (brand mark, breadcrumb, connection
  pill) + hairline-ruled side rail with accent-bordered active link.
- Convert all 33 pages to the component catalog — tables to
  panel + data-table (num/mono columns), KPI cards to agg-grid,
  detail blocks to metric-card/kv rows, badges to chips, alerts to
  panel notice, headings to page-title/panel-head, .rise reveals.
- Buttons/forms stay on Bootstrap; theme.css restyles them via
  --bs-* overrides. View-specific layout lives in app.css; all
  colour/type comes from theme.css tokens.

Also fix a pre-existing /fleet 500: the node-state query ordered on
a property of a constructed FleetNodeRow record, which EF Core
cannot translate. Order the join's columns before projecting.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 02:20:09 -04:00
Joseph Doherty
31b9468102 Merge branch 'fix/admin-configdb-host'
fix: point Admin ConfigDb at the shared SQL host (10.100.0.35,14330).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 06:27:07 -04:00
Joseph Doherty
cf024c8150 fix: point Admin ConfigDb at the shared SQL host
The Admin appsettings.json still carried Server=localhost,14330 — a
straggler from before the 2026-04-28 Docker migration that moved SQL
Server onto the shared Linux host. Every other checked-in appsettings
was rewritten then; this one was missed, so the Admin web UI returned
HTTP 500 on every page (SqlException, connection timeout). Repoint it
at 10.100.0.35,14330 to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 05:04:14 -04:00
Joseph Doherty
0aee14686b Merge branch 'chore/solution-module-folders'
Organize the solution into module folders (Core/Server/Drivers/Client/
Tooling) on disk and in ZB.MOM.WW.OtOpcUa.slnx, with all .csproj, script,
and docs path references updated to match. Build green; unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:28:05 -04:00
Joseph Doherty
4e1751e1a4 docs: correct CLAUDE.md test commands for per-module test layout
The Build Commands block referenced tests/ZB.MOM.WW.OtOpcUa.Tests and
.IntegrationTests, which never existed in v2. Replace with the actual
per-module layout under tests/<module>/ and note which suites need
Docker fixtures or the central SQL Server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:11:36 -04:00