Joseph Doherty
ecc2389ca8
AclsTab Probe-this-permission — first of three #196 slices. New /clusters/{ClusterId}/draft/{GenerationId} ACLs-tab gains a probe card above the grant table so operators can ask the trie "if cn=X asks for permission Y on node Z, would it be granted, and which rows contributed?" without shell-ing into the DB. Service thinly wraps the same PermissionTrieBuilder + PermissionTrie.CollectMatches call path the Server's dispatch layer uses at request time, so a probe answer is by construction identical to what the live server would decide. New PermissionProbeService.ProbeAsync(generationId, ldapGroup, NodeScope, requiredFlags) — loads the target generation's NodeAcl rows filtered to the cluster (critical: without the cluster filter, cross-cluster grants leak into the probe which tested false-positive in the unit suite), builds a trie, CollectMatches against the supplied scope + [ldapGroup], ORs the matched-grant flags into Effective, compares to Required. Returns PermissionProbeResult(Granted, Required, Effective, Matches) — Matches carries LdapGroup + Scope + PermissionFlags per matched row so the UI can render the contribution chain. Zero side effects + no audit rows — a failing probe is a question, not a denial. AclsTab.razor gains the probe card at the top (before the New-grant form + grant table): six inputs for ldap group + every NodeScope level (NamespaceId → UnsAreaId → UnsLineId → EquipmentId → TagId — blank fields become null so the trie walks only as deep as the operator specified), a NodePermissions dropdown filtered to skip None, Probe button, green Granted / red Denied badge + Required/Effective bitmask display, and (when matches exist) a small table showing which LdapGroup matched at which level with which flags. Admin csproj adds ProjectReference to Core — the trie + NodeScope live there + were previously Server-only. Five new PermissionProbeServiceTests covering: cluster-level row grants a namespace-level read; no-group-match denies with empty Effective; matching group but insufficient flags (Browse+Read vs WriteOperate required) denies with correct Effective bitmask; cross-cluster grants stay isolated (c2's WriteOperate does NOT leak into c1's probe); generation isolation (gen1's Read-only does NOT let gen2's WriteOperate-requiring probe pass). Admin.Tests 92/92 passing (was 87, +5). Admin builds 0 errors. Remaining #196 slices — SignalR invalidation + draft-diff ACL section — ship in follow-up PRs so the review surface per PR stays tight.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-20 00:28:17 -04:00
Joseph Doherty
1f3343e61f
OpenTelemetry redundancy metrics + RoleChanged SignalR push. Closes instrumentation + live-push slices of task #198 ; the exporter wiring (OTLP vs Prometheus package decision) is split to new task #201 because the collector/scrape-endpoint choice is a fleet-ops decision that deserves its own PR rather than hardcoded here. New RedundancyMetrics class (Singleton-registered in DI) owning a System.Diagnostics.Metrics.Meter("ZB.MOM.WW.OtOpcUa.Redundancy", "1.0.0"). Three ObservableGauge instruments — otopcua.redundancy.primary_count / secondary_count / stale_count — all tagged by cluster.id, populated by SetClusterCounts(clusterId, primary, secondary, stale) which the poller calls at the tail of every tick; ObservableGauge callbacks snapshot the last value set under a lock so the reader (OTel collector, dotnet-counters) sees consistent tuples. One Counter — otopcua.redundancy.role_transition — tagged cluster.id, node.id, from_role, to_role; ideal for tracking "how often does Cluster-X failover" + "which node transitions most" aggregate queries. In-box Metrics API means zero NuGet dep here — the exporter PR adds OpenTelemetry.Extensions.Hosting + OpenTelemetry.Exporter.OpenTelemetryProtocol or OpenTelemetry.Exporter.Prometheus.AspNetCore to actually ship the data somewhere. FleetStatusPoller extended with role-change detection. Its PollOnceAsync now pulls ClusterNode rows alongside the existing ClusterNodeGenerationState scan, and a new PollRolesAsync walks every node comparing RedundancyRole to the _lastRole cache. On change: records the transition to RedundancyMetrics + emits a RoleChanged SignalR message to both FleetStatusHub.GroupName(cluster) + FleetStatusHub.FleetGroup so cluster-scoped + fleet-wide subscribers both see it. First observation per node is a bootstrap (cache fill) + NOT a transition — avoids spurious churn on service startup or pod restart. UpdateClusterGauges groups nodes by cluster + sets the three gauge values, using ClusterNodeService.StaleThreshold (shared 30s convention) for staleness so the /hosts page + the gauge agree. RoleChangedMessage record lives alongside NodeStateChangedMessage in FleetStatusPoller.cs. RedundancyTab.razor subscribes to the fleet-status hub on first parameters-set, filters RoleChanged events to the current cluster, reloads the node list + paints a blue info banner ("Role changed on node-a: Primary → Secondary at HH:mm:ss UTC") so operators see the transition without needing to poll-refresh the page. IAsyncDisposable closes the connection on tab swap-away. Two new RedundancyMetricsTests covering RecordRoleTransition tag emission (cluster.id + node.id + from_role + to_role all flow through the MeterListener callback) + ObservableGauge snapshot for two clusters (assert primary_count=1 for c1, stale_count=1 for c2). Existing FleetStatusPollerTests ctor-line updated to pass a RedundancyMetrics instance; all tests still pass. Full Admin.Tests suite 87/87 passing (was 85, +2). Admin project builds 0 errors. Task #201 captures the exporter-wiring follow-up — OpenTelemetry.Extensions.Hosting + OTLP vs Prometheus + /metrics endpoint decision, driven by fleet-ops infra direction.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 23:16:09 -04:00
Joseph Doherty
c0751fdda5
ExternalIdReservation merge inside FinaliseBatchAsync. Closes task #197 . The FinaliseBatch docstring called this out as a narrower follow-up pending a concurrent-insert test matrix, and the CSV import UI PR ( #163 ) noted that operators would see raw DbUpdate UNIQUE-constraint messages on ZTag/SAPID collision until this landed. Now every finalised-batch row reserves ZTag + SAPID in the same EF transaction as the Equipment inserts, so either both commit atomically or neither does. New MergeReservation helper handles the four outcomes per (Kind, Value) pair: (1) value empty/whitespace → skip the reservation entirely (operator left the optional identifier blank); (2) active reservation exists for same EquipmentUuid → bump LastPublishedAt + reuse (re-finalising a batch against the same equipment must be idempotent, e.g. a retry after a transient DB blip); (3) active reservation exists for a DIFFERENT EquipmentUuid → throw ExternalIdReservationConflictException with the conflicting UUID + originating cluster + first-published timestamp so operator sees exactly who owns the value + where to resolve it (release via sp_ReleaseExternalIdReservation or pick a new ZTag); (4) no active reservation → create a fresh row with FirstPublishedBy = batch.CreatedBy + FirstPublishedAt = transaction time. Pre-commit overlap scan uses one round-trip (WHERE Kind+Value IN the batch's distinct sets, filtered to ReleasedAt IS NULL so explicitly-released values can be re-issued per decision #124 ) + caches the results in a Dictionary keyed on (Kind, value.ToLowerInvariant()) for O(1) lookup during the row loop. Race-safety catch: if another finalise commits between our cache-load + our SaveChanges, SQL Server surfaces a 2601/2627 unique-index violation against UX_ExternalIdReservation_KindValue_Active — IsReservationUniquenessViolation walks the inner-exception chain for that specific signature + rethrows as ExternalIdReservationConflictException so the UI shows a clean message instead of a raw DbUpdateException. The index-name match means unrelated filtered-unique violations (future indices) don't get mis-classified. Test-fixture Row() helper updated to generate unique SAPID per row (sap-{ZTag}) — the prior shared SAPID="sap" worked only because reservations didn't exist; two rows sharing a SAPID under different EquipmentUuids now collide as intended by decision #124 's fleet-wide uniqueness rule. Four new tests: (a) finalise creates both ZTag + SAPID reservations with expected Kind + Value; (b) re-finalising same EquipmentUuid's ZTag from a different batch does not create a duplicate (LastPublishedAt refresh only); (c) different EquipmentUuid claiming the same ZTag throws ExternalIdReservationConflictException with the ZTag value in the message + Equipment row for the second batch is NOT inserted (transaction rolled back cleanly); (d) row with empty ZTag + empty SAPID skips reservation entirely. Full Admin.Tests suite 85/85 passing (was 81 before this PR, +4). Admin project builds 0 errors. Note: the InMemory EF provider doesn't enforce filtered-unique indices, so the IsReservationUniquenessViolation catch is exercised only in the SQL Server integration path — the in-memory tests cover the cache-level conflict detection in MergeReservation instead, which is the first line of defence + catches the same-batch + published-vs-staged cases. The DbUpdate catch protects only the last-second race where two concurrent transactions both passed the cache check.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 23:02:31 -04:00
Joseph Doherty
5ee510dc1a
UnsTab native HTML5 drag/drop + 409 concurrent-edit modal + optimistic-concurrency commit path. Closes UI slice of task #153 (Phase 6.4 Stream A UI follow-up). Playwright E2E smoke is split into new task #199 — Playwright install + WebApplicationFactory + seeded-DB harness is genuinely its own infra-setup PR. Native HTML5 attributes (draggable, @ondragstart, @ondragover, @ondragleave, @ondrop) deliberately over MudBlazor per the task title — no MudBlazor ever joins this project. Two new service methods on UnsService land the data layer the existing UnsImpactAnalyzer assumed but which didn't actually exist: (1) LoadSnapshotAsync(generationId) — walks UnsAreas + UnsLines + per-line equipment counts + builds a UnsTreeSnapshot including a 16-char SHA-256 revision token computed deterministically over the sorted (kind, id, parent, name, notes) tuple-set so it's stable across processes + changes whenever any row is added / modified / deleted; (2) MoveLineAsync(generationId, expectedToken, lineId, targetAreaId) — re-parents one line inside the same draft under an EF transaction, recomputes the current revision token from freshly-loaded rows, and throws DraftRevisionConflictException when the caller-supplied token no longer matches. Token mismatch means another operator mutated the draft between preview + commit + the move rolls back rather than clobbering their work. No-op same-area drop is a silent return. Cross-generation move is prevented by the generationId filter on the transaction reads. UnsTab.razor gains draggable="true" on every line row with @ondragstart capturing the LineId into _dragLineId, and every area row is a drop target (@ondragover with :preventDefault so the browser accepts drops, @ondrop kicking off OnLineDroppedAsync). Drop path loads a fresh snapshot, builds a UnsMoveOperation(Kind=LineMove, source/target cluster matching because cross-cluster is decision-#82 rejected), runs UnsImpactAnalyzer.Analyze + shows a Bootstrap modal rendered inline in the component — modal shows HumanReadableSummary + equipment/tag counts + any CascadeWarnings list. Confirm button calls MoveLineAsync with the snapshot's RevisionToken; DraftRevisionConflictException surfaces a separate red-header "Draft changed — refresh required" modal with a Reload button that re-fetches areas + lines from the DB. New DraftRevisionConflictException in UnsService.cs, co-located with the service that throws it. Five new UnsServiceMoveTests covering LoadSnapshotAsync (areas + lines + equipment counts), RevisionToken stability between two reads, RevisionToken changes on AddLineAsync, MoveLineAsync happy path reparents the line in the DB, MoveLineAsync with stale token throws DraftRevisionConflictException + leaves the DB unchanged. Admin suite 81/81 passing (was 76, +5). Admin project builds 0 errors. Task #199 captures the deferred Playwright E2E smoke — drag a line onto a different area in a real browser, assert preview modal contents, click Confirm, assert the line row shows the new area. That PR stands up a new tests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests project with Playwright + WebApplicationFactory + seeded InMemory DbContext.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 22:30:48 -04:00
Joseph Doherty
13d5a7968b
Admin RedundancyTab — per-cluster read-only topology view. Closes the UI slice of task #149 (Phase 6.3 Stream E — Admin UI RedundancyTab + OpenTelemetry metrics + SignalR); the OpenTelemetry metrics + RoleChanged SignalR push are split into new follow-up task #198 because each is a structural add that deserves its own test matrix + NuGet-dep decision rather than riding this UI PR. New /clusters/{ClusterId} Redundancy tab slotted between ACLs and Audit in the existing ClusterDetail tab bar. Shows each ClusterNode row in the cluster with columns Node / Role (Primary green, Secondary blue, Standalone primary-blue badge) / Host / OPC UA port / ServiceLevel base / ApplicationUri (text-break so the long urn: doesn't blow out the table) / Enabled badge / Last seen (relative age via the same FormatAge helper as Hosts.razor, with a yellow "Stale" chip once LastSeenAt crosses the 30s threshold shared with HostStatusService.StaleThreshold — a missed heartbeat plus clock-skew buffer). Four summary cards above the table — total Nodes, Primary count, Secondary count, Stale count. Two guard-rail alerts: (a) red "No Primary or Standalone" when the cluster has no authoritative write target (all rows are Secondaries — read-only until one is promoted by the server-side RedundancyCoordinator apply-lease flow); (b) red "Split-brain" when >1 Primary exists — apply-lease enforcement at the coordinator level should have made this impossible, so the alert implies a hand-edited DB row + an investigation. New ClusterNodeService with ListByClusterAsync (ordered by ServiceLevelBase descending so Primary rows with higher base float to the top) + a static IsStale predicate matching HostStatusService's 30s convention. DI-registered alongside the existing scoped services in Program.cs. Writes (role swap, enable/disable) are deliberately absent from the service — they go through the RedundancyCoordinator apply-lease flow on the server side + direct DB mutation from Admin would race with it. New ClusterNodeServiceTests covering IsStale across null/recent/old LastSeenAt + ListByClusterAsync ordering + cluster filter. 4/4 new tests passing; full Admin.Tests suite 76/76 (was 72 before this PR, +4). Admin project builds 0 errors. Task #198 captures the deferred work: (1) OpenTelemetry Meter for primary/secondary/stale counts + role_transition counter with from/to/node tags + OTLP exporter config; (2) RoleChanged SignalR push — extend FleetStatusPoller to detect RedundancyRole changes on ClusterNode rows + emit a RoleChanged hub message so the RedundancyTab refreshes instantly instead of on-page-load polling.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 22:14:25 -04:00
Joseph Doherty
ad131932d3
Phase 6.4 Stream B.2-B.4 server-side — EquipmentImportBatch staging + FinaliseBatch transaction
...
Closes the server-side/data-layer piece of Phase 6.4 Stream B.2-B.4. The
CSV-import preview + modal UI (Stream B.3/B.5) still belongs to the Admin
UI follow-up — this PR owns the staging tables + atomic finalise alone.
Configuration:
- New EquipmentImportBatch entity (Id, ClusterId, CreatedBy, CreatedAtUtc,
RowsStaged/Accepted/Rejected, FinalisedAtUtc?). Composite index on
(CreatedBy, FinalisedAtUtc) powers the Admin preview modal's "my open
batches" query.
- New EquipmentImportRow entity — one row per CSV row, 8 required columns
from decision #117 + 9 optional from decision #139 + IsAccepted flag +
RejectReason. FK to EquipmentImportBatch with cascade delete so
DropBatch collapses the whole tree.
- EF migration 20260419_..._AddEquipmentImportBatch.
- SchemaComplianceTests expected tables list gains the two new tables.
Admin.Services.EquipmentImportBatchService:
- CreateBatchAsync — new header row, caller-supplied ClusterId + CreatedBy.
- StageRowsAsync(batchId, acceptedRows, rejectedRows) — bulk-inserts the
parsed CSV rows into staging. Rejected rows carry LineNumberInFile +
RejectReason for the preview modal. Throws when the batch is finalised.
- DropBatchAsync — removes batch + cascaded rows. Throws when the batch
was already finalised (rollback via staging is not a time machine).
- FinaliseBatchAsync(batchId, generationId, driverInstanceId, unsLineId) —
atomic apply. Opens an EF transaction when the provider supports it
(SQL Server in prod; InMemory in tests skips the tx), bulk-inserts
every accepted staging row into Equipment, stamps
EquipmentImportBatch.FinalisedAtUtc, commits. Failure rolls back so
Equipment never partially mutates. Idempotent-under-double-call:
second finalise throws ImportBatchAlreadyFinalisedException.
- ListByUserAsync(createdBy, includeFinalised) — the Admin preview modal's
backing query. OrderByDescending on CreatedAtUtc so the most-recent
batch shows first.
- Two exception types: ImportBatchNotFoundException +
ImportBatchAlreadyFinalisedException.
ExternalIdReservation merging (ZTag + SAPID fleet-wide uniqueness) is NOT
done here — a narrower follow-up wires it once the concurrent-insert test
matrix is green.
Tests (10 new EquipmentImportBatchServiceTests, all pass):
- CreateBatch populates Id + CreatedAtUtc + zero-ed counters.
- StageRows accepted + rejected both persist; counters advance.
- DropBatch cascades row delete.
- DropBatch after finalise throws.
- Finalise translates accepted staging rows → Equipment under the target
GenerationId + DriverInstanceId + UnsLineId.
- Finalise twice throws.
- Finalise of unknown batch throws.
- Stage after finalise throws.
- ListByUserAsync filters by creator + finalised flag.
- Drop of unknown batch is a no-op (idempotent rollback).
Full solution dotnet test: 1235 passing (was 1225, +10). Pre-existing
Client.CLI Subscribe flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 14:55:39 -04:00
Joseph Doherty
560a961cca
Phase 6.4 Stream A + B data layer — UnsImpactAnalyzer + EquipmentCsvImporter (parser)
...
Ships the pure-logic data layer of Phase 6.4. Blazor UI pieces
(UnsTab drag/drop, CSV import modal, preview table, FinaliseImportBatch txn,
staging tables) are deferred to visual-compliance follow-ups (tasks #153 ,
#155 , #157 ).
Admin.Services additions:
- UnsImpactAnalyzer.Analyze(snapshot, move) — pure-function, no I/O. Three
move variants: LineMove, AreaRename, LineMerge. Returns UnsImpactPreview
with AffectedEquipmentCount + AffectedTagCount + CascadeWarnings +
RevisionToken + HumanReadableSummary the Admin UI shows in the confirm
modal. Cross-cluster moves rejected with CrossClusterMoveRejectedException
per decision #82 . Missing source/target throws UnsMoveValidationException.
Surfaces sibling-line same-name ambiguity as a cascade warning.
- DraftRevisionToken — opaque revision fingerprint. Preview captures the
token; Confirm compares it. The 409-concurrent-edit UX plumbs through on
the Razor-page follow-up (task #153 ). Matches(other) is null-safe.
- UnsTreeSnapshot + UnsAreaSummary + UnsLineSummary — snapshot shape the
caller hands to the analyzer. Tests build them in-memory without a DB.
- EquipmentCsvImporter.Parse(csvText) — RFC 4180 CSV parser per decision #95 .
Version-marker contract: line 1 must be "# OtOpcUaCsv v1" (future shapes
bump the version). Required columns from decision #117 + optional columns
from decision #139 . Rejects unknown columns, duplicate column names,
blank required fields, duplicate ZTags within the file. Quoted-field
handling supports embedded commas + escaped "" quotes. Returns
EquipmentCsvParseResult { AcceptedRows, RejectedRows } so the preview
modal renders accept/reject counts without re-parsing.
Tests (22 new, all pass):
- UnsImpactAnalyzerTests (9): line move counts equipment + tags; cross-
cluster throws; unknown source/target throws validation; ambiguous same-
name target raises warning; area rename sums across lines; line merge
cross-area warns; same-area merge no warning; DraftRevisionToken matches
semantics.
- EquipmentCsvImporterTests (13): empty file throws; missing version marker;
missing required column; unknown column; duplicate column; valid single
row round-trips; optional columns populate when present; blank required
field rejects row; duplicate ZTag rejects second; RFC 4180 quoted fields
with commas + escaped quotes; mismatched column count rejects; blank
lines between rows ignored; required + optional column constants match
decisions #117 + #139 exactly.
Full solution dotnet test: 1159 passing (Phase 6.3 = 1137, Phase 6.4 A+B
data = +22). Pre-existing Client.CLI Subscribe flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 10:09:47 -04:00
Joseph Doherty
3b8280f08a
Phase 6.2 Stream D (data layer) — ValidatedNodeAclAuthoringService with write-time invariants
...
Ships the non-UI piece of Stream D: a draft-aware write surface over NodeAcl
that enforces the Phase 6.2 plan's scope-uniqueness + grant-shape invariants.
Blazor UI pieces (RoleGrantsTab + AclsTab refresh + SignalR invalidation +
visual-compliance reviewer signoff) are deferred to the Phase 6.1-style
follow-up task.
Admin.Services:
- ValidatedNodeAclAuthoringService — alongside existing NodeAclService (raw
CRUD, kept for read + revoke paths). GrantAsync enforces:
* Permissions != None (decision #129 — additive only, no empty grants).
* Cluster scope has null ScopeId.
* Sub-cluster scope requires a populated ScopeId.
* No duplicate (GenerationId, ClusterId, LdapGroup, ScopeKind, ScopeId)
tuple — operator updates the row instead of inserting a duplicate.
UpdatePermissionsAsync also rejects None (operator revokes via NodeAclService).
Violations throw InvalidNodeAclGrantException.
Tests (10 new in Admin.Tests/ValidatedNodeAclAuthoringServiceTests):
- Grant rejects None permissions.
- Grant rejects Cluster-scope with ScopeId / sub-cluster without ScopeId.
- Grant succeeds on well-formed row.
- Grant rejects duplicate (group, scope) in same draft.
- Grant allows same group at different scope.
- Grant allows same (group, scope) in different draft.
- UpdatePermissions rejects None.
- UpdatePermissions round-trips new flags + notes.
- UpdatePermissions on unknown rowid throws.
Microsoft.EntityFrameworkCore.InMemory 10.0.0 added to Admin.Tests csproj.
Full solution dotnet test: 1097 passing (was 1087, +10). Phase 6.2 total is
now 1087+10 = 1097; baseline 906 → +191 net across Phase 6.1 (all streams) +
Phase 6.2 (Streams A, B, C foundation, D data layer).
Stream D follow-up task tracks: RoleGrantsTab CRUD over LdapGroupRoleMapping,
AclsTab write-through + Probe-this-permission diagnostic, draft-diff ACL
section, SignalR PermissionTrieCache invalidation push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-19 09:39:06 -04:00
Joseph Doherty
ed88835d34
Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
...
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-18 14:37:55 -04:00
Joseph Doherty
18f93d72bb
Phase 1 LDAP auth + SignalR real-time — closes the last two open Admin UI TODOs. LDAP: Admin/Security/ gets SecurityOptions (bound from appsettings.json Authentication:Ldap), LdapAuthResult record, ILdapAuthService + LdapAuthService ported from scadalink-design's LdapAuthService (TLS guard, search-then-bind when a service account is configured, direct-bind fallback, service-account re-bind after user bind so attribute lookup uses the service principal's read rights, LdapException-to-friendly-message translation, OperationCanceledException pass-through), RoleMapper (pure function: case-insensitive group-name match against LdapOptions.GroupToRole, returns the distinct set of mapped Admin roles). EscapeLdapFilter escapes the five LDAP filter control chars (\, *, (, ), \0); ExtractFirstRdnValue pulls the value portion of a DN's leading RDN for memberOf parsing; ExtractOuSegment added as a GLAuth-specific fallback when the directory doesn't populate memberOf but does embed ou=PrimaryGroup into user DNs (actual GLAuth config in C:\publish\glauth\glauth.cfg uses nameformat=cn, groupformat=ou — direct bind is enough). Login page rewritten: EditForm → ILdapAuthService.AuthenticateAsync → cookie sign-in with claims (Name = displayName, NameIdentifier = username, Role for each mapped role, ldap_group for each raw group); failed bind shows the service's error; empty-role-map returns an explicit "no Admin role mapped" message rather than silently succeeding. appsettings.json gains an Authentication:Ldap section with dev-GLAuth defaults (localhost:3893, UseTls=false, AllowInsecureLdap=true for dev, GroupToRole maps GLAuth's ReadOnly/WriteOperate/AlarmAck → ConfigViewer/ConfigEditor/FleetAdmin). SignalR: two hubs + a BackgroundService poller. FleetStatusHub routes per-cluster NodeStateChanged pushes (SubscribeCluster/UnsubscribeCluster on connection; FleetGroup for dashboard-wide) with a typed NodeStateChangedMessage payload. AlertHub auto-subscribes every connection to the AllAlertsGroup and exposes AcknowledgeAsync (ack persistence deferred to v2.1). FleetStatusPoller (IHostedService, 5s default cadence) scans ClusterNodeGenerationState joined with ClusterNode, caches the prior snapshot per NodeId, pushes NodeStateChanged on any delta, raises AlertMessage("apply-failed") on transition INTO Failed (sticky — the hub client acks later). Program.cs registers HttpContextAccessor (sign-in needs it), SignalR, LdapOptions + ILdapAuthService, the poller as hosted service, and maps /hubs/fleet + /hubs/alerts endpoints. ClusterDetail adds @rendermode RenderMode.InteractiveServer, @implements IAsyncDisposable, and a HubConnectionBuilder subscription that calls LoadAsync() on each NodeStateChanged for its cluster so the "current published" card refreshes without a page reload; a dismissable "Live update" info banner surfaces the most recent event. Microsoft.AspNetCore.SignalR.Client 10.0.0 + Novell.Directory.Ldap.NETStandard 3.6.0 added. Tests: 13 new — RoleMapperTests (single group, case-insensitive match, multi-group distinct-roles, unknown-group ignored, empty-map); LdapAuthServiceTests (EscapeLdapFilter with 4 inputs, ExtractFirstRdnValue with 4 inputs — all via reflection against internals); LdapLiveBindTests (skip when localhost:3893 unreachable; valid-credentials-bind-succeeds; wrong-password-fails-with-recognizable-error; empty-username-rejected-before-hitting-directory); FleetStatusPollerTests (throwaway DB, seeds cluster+node+generation+apply-state, runs PollOnceAsync, asserts NodeStateChanged hit the recorder; second test seeds a Failed state and asserts AlertRaised fired) — backed by RecordingHubContext/RecordingHubClients/RecordingClientProxy that capture SendCoreAsync invocations while throwing NotImplementedException for the IHubClients methods the poller doesn't call (fail-fast if evolution adds new dependencies). InternalsVisibleTo added so the test project can call FleetStatusPoller.PollOnceAsync directly. Full solution 946 pass / 1 pre-existing Phase 0 baseline failure.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-17 22:28:49 -04:00
Joseph Doherty
7a5b535cd6
Phase 1 Stream E Admin UI — finish Blazor pages so operators can run the draft → publish → rollback workflow end-to-end without hand-executing SQL. Adds eight new scoped services that wrap the Configuration stored procs + managed validators: EquipmentService (CRUD with auto-derived EquipmentId per decision #125 ), UnsService (areas + lines), NamespaceService, DriverInstanceService (generic JSON DriverConfig editor per decision #94 — per-driver schema validation lands in each driver's phase), NodeAclService (grant + revoke with bundled-preset permission sets; full per-flag editor + bulk-grant + permission simulator deferred to v2.1), ReservationService (fleet-wide active + released reservation inspector + FleetAdmin-only sp_ReleaseExternalIdReservation wrapper with required-reason invariant), DraftValidationService (hydrates a DraftSnapshot from the draft's rows plus prior-cluster Equipment + active reservations, runs the managed DraftValidator to surface every rule in one pass for inline validation panel), AuditLogService (recent ConfigAuditLog reader). Pages: /clusters list with create-new shortcut; /clusters/new wizard that creates the cluster row + initial empty draft in one go; /clusters/{id} detail with 8 tabs (Overview / Generations / Equipment / UNS Structure / Namespaces / Drivers / ACLs / Audit) — tabs that write always target the active draft, published generations stay read-only; /clusters/{id}/draft/{gen} editor with live validation panel (errors list with stable code + message + context; publish button disabled while any error exists) and tab-embedded sub-components; /clusters/{id}/draft/{gen}/diff three-column view backed by sp_ComputeGenerationDiff with Added/Removed/Modified badges; Generations tab with per-row rollback action wired to sp_RollbackToGeneration; /reservations FleetAdmin-only page (CanPublish policy) with active + released lists and a modal release dialog that enforces non-empty reason and round-trips through sp_ReleaseExternalIdReservation; /login scaffold with stub credential accept + FleetAdmin-role cookie issuance (real LDAP bind via the ScadaLink-parity LdapAuthService is deferred until live GLAuth integration — marked in the login view and in the Phase 1 partial-exit TODO). Layout: sidebar gets Overview / Clusters / Reservations + AuthorizeView with signed-in username + roles + sign-out POST to /auth/logout; cascading authentication state registered for <AuthorizeView> to work in RenderMode.InteractiveServer. Integration testing: AdminServicesIntegrationTests creates a throwaway per-run database (same pattern as the Configuration test fixture), applies all three migrations, and exercises (1) create-cluster → add-namespace+UNS+driver+equipment → validate (expects zero errors) → publish (expects Published status) → rollback (expects one new Published + at least one Superseded); (2) cross-cluster namespace binding draft → validates to BadCrossClusterNamespaceBinding per decision #122 . Old flat Components/Pages/Clusters.razor moved to Components/Pages/Clusters/ClustersList.razor so the Clusters folder can host tab sub-components without the razor generator creating a type-and-namespace collision. Dev appsettings.json connection string switched from Integrated Security to sa auth to match the otopcua-mssql container on port 14330 (remapped from 1433 to coexist with the native MSSQL14 Galaxy ZB instance). Browser smoke test completed: home page, clusters list, new-cluster form, cluster detail with a seeded row, reservations (redirected to login for anon user) all return 200 / 302-to-login as expected; full solution 928 pass / 1 pre-existing Phase 0 baseline failure. Phase 1 Stream E items explicitly deferred with TODOs: CSV import for Equipment, SignalR FleetStatusHub + AlertHub real-time push, bulk-grant workflow, permission-simulator trie, merge-equipment draft, AppServer-via-OI-Gateway end-to-end smoke test (decision #142 ), and the real LDAP bind replacing the Login page stub.
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-17 21:52:42 -04:00
Joseph Doherty
01fd90c178
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124 , sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122 ), reservation pre-flight, EquipmentId derivation (decision #125 ), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79 ); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28 ) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142 ), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com >
2026-04-17 21:35:25 -04:00