Commit Graph

17 Commits

Author SHA1 Message Date
Joseph Doherty
3de688f8d6 fix(admin): resolve High code-review findings (Admin-003, Admin-004, Admin-005)
Admin-003 — SignalR hubs were anonymously reachable: an unauthenticated
client could open /hubs/fleet, /hubs/alerts and /hubs/script-log and
stream fleet state, alert detail text and server script-log contents.
Added [Authorize] to FleetStatusHub, AlertHub and ScriptLogHub, and
chained .RequireAuthorization() onto all three MapHub() calls as a
belt-and-braces backstop.

Admin-004 — appsettings.json committed live-looking secrets (the `sa`
ConfigDb password and the LDAP ServiceAccountPassword) in plaintext.
Replaced both with empty placeholders sourced from user-secrets (dev) or
the ConnectionStrings__ConfigDb / Authentication__Ldap__ServiceAccountPassword
environment variables (prod); added a UserSecretsId to the Admin csproj
and a fail-fast guard in Program.cs when ConfigDb is empty/missing.

Admin-005 — Login.razor performed SignInAsync from an interactive Blazor
circuit, where the original HTTP response has long completed so the auth
cookie was not emitted. Rewrote it as a static-rendered plain HTML form
(data-enhance="false") posting to a new AuthEndpoints.MapAuthEndpoints()
minimal-API handler (/auth/login, /auth/logout) that does the LDAP bind,
grant resolution, cookie SignInAsync and redirect while the endpoint
still owns the response. Includes an open-redirect guard on returnUrl.

Added xUnit + Shouldly regression tests: AuthEndpointsTests (login cookie
issuance, failed-bind redirect, open-redirect rejection, logout, anonymous
hub negotiate rejection) and AppSettingsSecretHygieneTests (no committed
secrets). All 26 auth-related tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:27:38 -04:00
Joseph Doherty
adf794f791 fix(server): resolve High code-review findings (Server-002, Server-009)
Server-002 — AuthorizationGate lax-mode no longer overrides explicit deny.
IsAllowed now switches on the evaluator's AuthorizationVerdict: Allow -> true,
Denied (an authored deny rule matched) -> false in BOTH strict and lax mode,
and only the indeterminate NotGranted case falls through to !_strictMode.
Previously `if (decision.IsAllowed) return true; return !_strictMode;` let lax
mode (the default) nullify authored NodeAcl deny rules for fully-resolved
sessions. The tri-state AuthorizationVerdict.Denied member is now honoured.

Server-009 — LDAP is secure-by-default. LdapOptions.AllowInsecureLdap now
defaults to false (was true) and Program.cs's config fallback reads `?? false`
(was `?? true`), so an LDAP-enabled deployment will not bind credentials over
an unencrypted socket unless an operator explicitly opts in. Program.cs also
logs a startup warning when LDAP is enabled with UseTls=false and
AllowInsecureLdap=true, flagging the clear-text server->LDAP credential hop.

Regression tests: AuthorizationGateTests covers all four verdict x mode
combinations via a fixed-verdict evaluator stub; new LdapOptionsTests asserts
the secure defaults. Both Server and Server.Tests build clean; the 15 targeted
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 06:11:06 -04:00
Joseph Doherty
973730d0eb fix(admin): enforce authentication on all Admin UI routes (Admin-001/002)
Admin-001: Routes.razor used a plain RouteView, so the page-level
[Authorize] attributes on 11 pages were inert — every page, including
mutating ones, was reachable fully unauthenticated.
Admin-002: several pages (e.g. NewCluster, which writes config rows)
carried no auth attribute at all.

- Routes.razor: RouteView → AuthorizeRouteView with NotAuthorized /
  Authorizing slots; add RedirectToLogin component.
- Program.cs: SetFallbackPolicy(RequireAuthenticatedUser) — secure by
  default for new pages/endpoints.
- Login.razor: [AllowAnonymous] so login stays reachable; login page,
  /auth/* endpoints and static assets remain anonymous.
- Add [Authorize] to the previously un-gated pages; NewCluster gated to
  the CanPublish (FleetAdmin) policy.

Regression tests in PageAuthorizationTests pin that anonymous requests
to protected/mutating routes are rejected and that login + static
assets stay anonymously reachable. Admin test suite: 210/210 pass.

Resolves code-review findings Admin-001 and Admin-002 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:58 -04:00
Joseph Doherty
571066130b fix(server): stop WriteNodeIdUnknown infinite recursion (Server-001)
WriteNodeIdUnknown called itself unconditionally as its first statement
— unbounded recursion with no base case → StackOverflowException, an
uncatchable process crash reachable by any client issuing a HistoryRead
on an unresolvable NodeId (remote DoS).

Replace the self-call with the result-slot assignment, mirroring
WriteUnsupported / WriteInternalError. The helper is now internal so the
regression test can pin the StatusCode without a server fixture.

Resolves code-review finding Server-001 (Critical).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 05:53:44 -04:00
Joseph Doherty
419eda256b feat(server): route OPC UA Part 9 AddComment to ScriptedAlarmEngine
RouteScriptedAlarmMethodCalls now handles ConditionType.AddComment
alongside Acknowledge/Confirm, dispatching to engine.AddCommentAsync.
An empty comment is rejected by the Part 9 state machine and surfaced
as BadInvalidArgument. MapCallOperation gates AddComment at the
AlarmAcknowledge tier — there is no dedicated AddComment permission bit.

Closes phase-7-status.md Gap 1: all Part 9 alarm methods now route to
the engine. Adds 3 unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 09:43:03 -04:00
Joseph Doherty
c5915700bd feat(server): route OPC UA Part 9 shelve methods to ScriptedAlarmEngine (#24)
OneShotShelve / TimedShelve / Unshelve now reach the ScriptedAlarmEngine.
Scripted-alarm condition nodes get a ShelvedStateMachine subtree created
before alarm.Create so the stack wires each shelve method's dispatch
handler; AlarmConditionState.OnShelve / OnTimedUnshelve route to the
engine and mirror the result onto the OPC UA node via SetShelvingState.

The three per-instance shelve method NodeIds are indexed so the Call gate
resolves them to OpcUaOperation.AlarmShelve instead of falling through to
generic Call. Engine dispatch is split into the node-free InvokeEngineShelve
so the routing decision is unit-testable.

Adds 9 unit tests; updates phase-7-status.md Gap 1 (only AddComment remains
unwired) and the #24 entry in looseends.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 09:31:30 -04:00
Joseph Doherty
392b219233 fix(tests): stabilize three flaky tests under parallel full-solution load
#1 EventPumpBoundedChannelTests.Tags_metrics_with_client_name_for_multi_driver_hosts:
Replace fixed Task.Delay(100) with a poll-until-condition loop (5 s
timeout, 25 ms poll) so the test waits until the galaxy.events.received
measurement for galaxy.client=Driver-X actually lands in the listener.
Also adds lock(captured) in the MeterListener callback and at all reads,
since Counter.Add() fires the callback on the RunAsync background thread.

#2 VirtualTagEngineTests.Upstream_change_triggers_cascade_through_two_levels:
After waiting for B=15.0, also await WaitForConditionAsync for C=30.0
before asserting C. The cascade runs B then C sequentially under the
_evalGate semaphore; the prior code could read C while its evaluation
had not yet acquired the gate.

#3 ThreeUserInteropMatrixTests.Admin_Resolves_All_Five_Groups_From_LDAP:
Wrap the AuthenticateAsync call in a 15 s linked CancellationTokenSource
with one retry so transient GLAuth latency spikes under parallel test
load do not cause a CancellationToken expiry before the LDAP bind/search
complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:59:00 -04:00
Joseph Doherty
41f133a337 feat(admin-ui): add /virtual-tags, /scripted-alarms, and /script-log pages (tasks #25, #26, #27)
Gap 2 (#25): VirtualTagsTab.razor + /virtual-tags global page — list/create/toggle
virtual tags per draft generation with DataType, Script, trigger, Historize, Enabled
fields. Tab wired into DraftEditor.

Gap 3 (#26): ScriptedAlarmsTab.razor + /scripted-alarms global page — list/create
scripted alarms with AlarmType, Severity, MessageTemplate, PredicateScript,
HistorizeToAveva, Retain. SeverityBand helper shows Low/Medium/High/Critical label.
Tab wired into DraftEditor.

Gap 4 (#27): ScriptLogHub (SignalR IAsyncEnumerable stream) tails scripts-*.log with
optional ScriptName filter; ScriptLog.razor provides Start/Stop/Clear controls plus
level filter dropdown. Hub registered at /hubs/script-log in Program.cs.

Nav rail gains a "Scripting" eyebrow with entries for all three pages.
19 new unit tests for ScriptLogHub parse/filter/tail helpers (Category=Unit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:59 -04:00
Joseph Doherty
bc8ff7a5fe feat(phase7): wire RingBufferHistoryWriter as production IHistoryWriter for virtual tags (Gap 5)
Closes Phase 7 Gap 5: VirtualTagEngine called IHistoryWriter.Record per evaluation
when Historize=true but Phase7EngineComposer always passed NullHistoryWriter, so
virtual-tag history was computed but never persisted.

The fix:
- New RingBufferHistoryWriter implements both IHistoryWriter (write port for the
  evaluation pipeline) and IHistorianDataSource (read port for IHistoryRouter so
  OPC UA HistoryRead on virtual-tag nodes resolves here). Maintains one bounded
  ring buffer (1000 samples, configurable) per tag path; Record() is O(1) and
  never blocks evaluation.
- Phase7EngineComposer.Compose now accepts IHistoryRouter? and, when any
  VirtualTagDefinition.Historize=true, creates a RingBufferHistoryWriter, passes
  it to VirtualTagEngine as historyWriter, adds it to the disposables list, and
  registers it under the "virtual:" prefix in the router for HistoryRead dispatch.
- Phase7Composer accepts IHistoryRouter? from DI (already registered as singleton
  in Program.cs) and threads it through to Phase7EngineComposer.Compose.
- NullHistoryWriter remains as fallback when no tags request historization.
- 16 new unit tests in RingBufferHistoryWriterTests.cs cover ring-buffer semantics,
  eviction, per-tag isolation, ReadRawAsync windowing, IHistorianDataSource stubs,
  router registration, and the Historize=false / null-router fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
ca149ce907 feat(phase7): route OPC UA Part 9 Acknowledge/Confirm methods to ScriptedAlarmEngine (task #24)
Gap 1 of phase-7-status.md. Intercepts AcknowledgeableConditionType_Acknowledge and
AcknowledgeableConditionType_Confirm calls in DriverNodeManager.Call and dispatches
them to ScriptedAlarmEngine so OPC UA HMI clients can acknowledge/confirm scripted alarms
in addition to the existing Admin UI path. Shelve methods deferred (per-instance NodeIds,
not well-known type MethodIds — follow-up task). AlarmEngine is now exposed through
Phase7ComposedSources so the server wire-up passes it to every DriverNodeManager. 13 new
unit tests cover dispatch kernel, identity fallback, batch handling, and error paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:58:50 -04:00
Joseph Doherty
41da84293a fix(tests): point DB-test fixture defaults at the migrated SQL host
Four DB-backed test fixtures still defaulted DefaultServer to
localhost,14330 — missed in the 2026-04-28 Docker migration that moved
SQL Server off this VM onto the shared host 10.100.0.35. With no SQL on
localhost, all 31 DB-backed tests failed with connection timeouts,
which in turn failed the Phase 6 compliance gate (phase-6-all.ps1).

Updated SchemaComplianceFixture, HostStatusPublisherTests,
FleetStatusPollerTests, and AdminServicesIntegrationTests to default to
10.100.0.35,14330 (still overridable via OTOPCUA_CONFIG_TEST_SERVER).
Verified: Configuration.Tests 91 pass, HostStatusPublisher 4 pass,
FleetStatusPoller + AdminServicesIntegration 5 pass — all 31 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 05:12:20 -04:00
Joseph Doherty
bb1854b2f8 feat(admin): add five-identifier ranked equipment search (Phase 6.4 Stream B.5)
Implements the missing Stream B.5 search from the Phase 6.4 plan:
- EquipmentService.SearchAsync scopes to a cluster, scores hits across
  ZTag / MachineCode / SAPID / EquipmentId / EquipmentUuid (decision #117):
  exact = 100, prefix = 50, fuzzy (opt-in) = 20; published generation
  outranks draft on equal scores per spec.
- EquipmentSearchHit record carries Score + MatchedField for badge display.
- EquipmentTab.razor gains a search panel with per-row matched-field chips
  (green exact, amber prefix, grey fuzzy) and fuzzy opt-in checkbox.
- 14 new unit tests in EquipmentSearchTests.cs (Category=Unit) cover exact,
  prefix, fuzzy, case-insensitivity, tie-break, cross-cluster isolation, and
  maxResults cap; all 148 admin unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:58 -04:00
Joseph Doherty
70d7166a39 test(server): harden deferred authz gates — task #12 Browse/Subscribe/Call/AlarmAck
Add DeferredGateHardeningTests (28 unit tests) covering the Phase 6.2
compliance-checklist gaps left by the per-gate unit suites that shipped
with the gate implementations:

- Lax-mode fall-through for CreateMonitoredItems and Call gates (null
  identity and identity-without-LDAP-groups both skip denial in lax mode,
  consistent with BrowseGatingTests.Lax_mode_null_identity)
- Flag isolation: Subscribe-only grant does NOT imply Read; Read-only
  grant does NOT imply Subscribe; HistoryRead-only grant does NOT imply
  Read and vice versa (Phase 6.2 compliance: "HistoryRead uses its own flag")
- Alarm-bit isolation: AlarmAcknowledge alone does not grant AlarmConfirm
  or AlarmShelve; Browse alone does not grant AlarmAcknowledge
- AlarmShelve falls through to OpcUaOperation.Call in MapCallOperation
  (documents the ShelvedStateMachine per-instance NodeId limitation noted
  in the implementation, with the follow-up path: MethodCall grant covers it)
- Complete OpcUaOperation→NodePermissions mapping coverage for all deferred
  operations (Browse, CreateMonitoredItems, TransferSubscriptions, Call,
  AlarmAcknowledge, AlarmConfirm, AlarmShelve) — both positive and
  wrong-bit negative cases
- Multi-group union for deferred gates (grp-browse ∪ grp-ack gives both
  Browse and AlarmAcknowledge without leaking Read or Call)

Build: 0 errors on Server.csproj (verified against main repo build which
carries the gRPC-generated Galaxy driver artifacts the isolated worktree
lacks — that pre-existing gap is unrelated to these changes).
Test count: 247 → 275 (+28 unit, 0 failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:39:57 -04:00
Joseph Doherty
020c30f9a6 feat(admin): add ZTag/SAPID reservation pre-check to equipment CSV import (task #197)
ApplyReservationPreCheckAsync on EquipmentImportBatchService queries active
ExternalIdReservation rows in a single round-trip at parse time; rows whose ZTag
or SAPID is claimed by a different EquipmentUuid are moved from AcceptedRows to
RejectedRows with a descriptive reason. ImportEquipment.razor calls the check
after EquipmentCsvImporter.Parse so conflicts appear in the preview before the
operator clicks Stage + Finalise. Updated notice banner to reflect the pre-check
is now live; 6 new unit tests cover conflict, no-conflict, same-UUID, released-
reservation, and empty-input paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:29 -04:00
Joseph Doherty
75b91ebb97 feat(admin): wrap LdapGroupRoleMappingService in Phase 6.1-style resilience pipeline (Phase 6.2 Stream A.2)
Add ResilientLdapGroupRoleMappingService — a singleton decorator that wraps the
hot-path GetByGroupsAsync call in a Polly pipeline (timeout 2s → retry 3× jittered
→ fallback to in-memory sealed snapshot) so a transient Config DB outage at
Admin sign-in falls back to the last-known-good mapping set rather than denying
every login. The static LdapOptions.GroupToRole bootstrap dictionary in
AdminRoleGrantResolver remains the lock-out-proof floor regardless of DB state.

DI wiring uses keyed services: LdapGroupRoleMappingService (EF, scoped) is
registered under key "LdapGroupRoleMappingService.Inner"; the resilient singleton
decorator is the primary ILdapGroupRoleMappingService binding. The singleton
avoids the captive-dependency anti-pattern by using IServiceScopeFactory to open
a short-lived scope for each DB call.

Write methods (CreateAsync, DeleteAsync, ListAllAsync) pass through unchanged —
resilience is read-path only per Phase 6.1 design decision.

15 new unit tests cover: DB success/failure/retry paths, snapshot sealing and
per-group-set isolation, order-independent cache key normalisation, cancellation
propagation, and pass-through method routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 04:24:22 -04:00
Joseph Doherty
8adb83afee feat(admin): consume LDAP role grants at sign-in, incl. cluster scoping
The role-grants page authored LdapGroupRoleMapping rows but nothing
consumed them — sign-in only read the static appsettings GroupToRole
dictionary. Wire the DB-backed grants into the auth path.

- AdminRoleGrantResolver merges the static bootstrap dictionary (always
  fleet-wide, lock-out-proof) with DB grants; system-wide rows fold into
  fleet roles, cluster-scoped rows become (cluster, role) grants.
- Login emits a ClaimTypes.Role claim per fleet role and a cluster_role
  claim per cluster-scoped grant; lock-out check spans both scopes.
- ClusterRoleClaims + ClaimsPrincipal extensions resolve the effective
  role for a cluster (highest of fleet-wide and cluster-scoped).
- ClusterAuthorizeView gates cluster pages: ClusterDetail (view +
  ConfigEditor draft actions), DraftEditor (ConfigEditor / FleetAdmin
  publish), DiffViewer (ConfigViewer), ImportEquipment (ConfigEditor).
- RoleGrants page is now FleetAdmin-only; Account surfaces fleet-wide
  and cluster-scoped grants separately.

Control-plane only — decision #150 holds, NodeAcl is untouched.

Tests: AdminRoleGrantResolverTests + ClusterRoleClaimsTests (22).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 03:09:06 -04:00
Joseph Doherty
a25593a9c6 chore: organize solution into module folders (Core/Server/Drivers/Client/Tooling)
Group all 69 projects into category subfolders under src/ and tests/ so the
Rider Solution Explorer mirrors the module structure. Folders: Core, Server,
Drivers (with a nested Driver CLIs subfolder), Client, Tooling.

- Move every project folder on disk with git mv (history preserved as renames).
- Recompute relative paths in 57 .csproj files: cross-category ProjectReferences,
  the lib/ HintPath+None refs in Driver.Historian.Wonderware, and the external
  mxaccessgw refs in Driver.Galaxy and its test project.
- Rebuild ZB.MOM.WW.OtOpcUa.slnx with nested solution folders.
- Re-prefix project paths in functional scripts (e2e, compliance, smoke SQL,
  integration, install).

Build green (0 errors); unit tests pass. Docs left for a separate pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:55:28 -04:00