560a961cca5b4e8657b8321b044a264700dc2fbf
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0fcdfc7546 |
Phase 6.2 Stream A — LdapGroupRoleMapping entity + EF migration + CRUD service
Stream A.1-A.2 per docs/v2/implementation/phase-6-2-authorization-runtime.md.
Seed-data migration (A.3) is a separate follow-up once production LDAP group
DNs are finalised; until then CRUD via the Admin UI handles the fleet set up.
Configuration:
- New AdminRole enum {ConfigViewer, ConfigEditor, FleetAdmin} — string-stored.
- New LdapGroupRoleMapping entity with Id (surrogate PK), LdapGroup (512 chars),
Role (AdminRole enum), ClusterId (nullable, FK to ServerCluster), IsSystemWide,
CreatedAtUtc, Notes.
- EF config: UX_LdapGroupRoleMapping_Group_Cluster unique index on
(LdapGroup, ClusterId) + IX_LdapGroupRoleMapping_Group hot-path index on
LdapGroup for sign-in lookups. Cluster FK cascades on cluster delete.
- Migration 20260419_..._AddLdapGroupRoleMapping generated via `dotnet ef`.
Configuration.Services:
- ILdapGroupRoleMappingService — CRUD surface. Declared as control-plane only
per decision #150; the OPC UA data-path evaluator must NOT depend on this
interface (Phase 6.2 compliance check on control/data-plane separation).
GetByGroupsAsync is the hot-path sign-in lookup.
- LdapGroupRoleMappingService (EF Core impl) enforces the write-time invariant
"exactly one of (ClusterId populated, IsSystemWide=true)" and surfaces
InvalidLdapGroupRoleMappingException on violation. Create auto-populates Id
+ CreatedAtUtc when omitted.
Tests (9 new, all pass) in Configuration.Tests:
- Create sets Id + CreatedAtUtc.
- Create rejects empty LdapGroup.
- Create rejects IsSystemWide=true with populated ClusterId.
- Create rejects IsSystemWide=false with null ClusterId.
- GetByGroupsAsync returns matching rows only.
- GetByGroupsAsync with empty input returns empty (no full-table scan).
- ListAllAsync orders by group then cluster.
- Delete removes the target row.
- Delete of unknown id is a no-op.
Microsoft.EntityFrameworkCore.InMemory 10.0.0 added to Configuration.Tests for
the service-level tests (schema-compliance tests still use the live SQL
fixture).
SchemaComplianceTests updated to expect the new LdapGroupRoleMapping table.
Full solution dotnet test: 1051 passing (baseline 906, Phase 6.1 shipped at
1042, Phase 6.2 Stream A adds 9 = 1051). Pre-existing Client.CLI Subscribe
flake unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cbcaf6593a |
Phase 6.1 Stream E (data layer) — DriverInstanceResilienceStatus entity + DriverResilienceStatusTracker + EF migration
Ships the data + runtime layer of Stream E. The SignalR hub and Blazor /hosts page refresh (E.2-E.3) are follow-up work paired with the visual-compliance review per Phase 6.4 patterns — documented as a deferred follow-up below. Configuration: - New entity DriverInstanceResilienceStatus with: DriverInstanceId, HostName (composite PK), LastCircuitBreakerOpenUtc, ConsecutiveFailures, CurrentBulkheadDepth, LastRecycleUtc, BaselineFootprintBytes, CurrentFootprintBytes, LastSampledUtc. - Separate from DriverHostStatus (per-host connectivity view) so a Running host that has tripped its breaker or is nearing its memory ceiling shows up distinctly on Admin /hosts. Admin page left-joins both for display. - OtOpcUaConfigDbContext + Fluent-API config + IX_DriverResilience_LastSampled index for the stale-sample filter query. - EF migration: 20260419124034_AddDriverInstanceResilienceStatus. Core.Resilience: - DriverResilienceStatusTracker — process-singleton in-memory tracker keyed on (DriverInstanceId, HostName). CapabilityInvoker + MemoryTracking + MemoryRecycle callers record failure/success/breaker-open/recycle/footprint events; a HostedService (Stream E.2 follow-up) samples this tracker every 5 s and persists to the DB. Pure in-memory keeps tests fast + the core free of EF/SQL dependencies. Tests: - DriverResilienceStatusTrackerTests (9 new, all pass): tryget-before-write returns null; failures accumulate; success resets; breaker/recycle/footprint fields populate; per-host isolation; snapshot returns all pairs; concurrent writes don't lose counts. - SchemaComplianceTests: expected-tables list updated to include the new DriverInstanceResilienceStatus table. Full solution dotnet test: 1042 passing (baseline 906, +136 for Phase 6.1 so far across Streams A/B/C/D/E.1). Pre-existing Client.CLI Subscribe flake unchanged. Deferred to follow-up PR (E.2/E.3): - ResilienceStatusPublisher HostedService that samples DriverResilienceStatusTracker every 5 s + upserts DriverInstanceResilienceStatus rows. - Admin FleetStatusHub SignalR hub pushing LastCircuitBreakerOpenUtc / CurrentBulkheadDepth / LastRecycleUtc on change. - Admin /hosts Blazor column additions (red badge when ConsecutiveFailures > breakerThreshold / 2). Visual-compliance reviewer signoff alongside Phase 6.4 admin-ui patterns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8464e3f376 |
Phase 3 PR 33 — DriverHostStatus entity + EF migration (data-layer for LMX #7). New DriverHostStatus entity with composite key (NodeId, DriverInstanceId, HostName) persists each server node's per-host connectivity view — one row per (server node, driver instance, probe-reported host), which means a redundant 2-node cluster with one Galaxy driver reporting 3 platforms produces 6 rows because each server node owns its own runtime view of the shared host topology, not 3. Fields: NodeId (64), DriverInstanceId (64), HostName (256 — fits Galaxy FQDNs and Modbus host:port strings), State (DriverHostState enum — Unknown/Running/Stopped/Faulted, persisted as nvarchar(16) via HasConversion<string> so DBAs inspecting the table see readable state names not ordinals), StateChangedUtc + LastSeenUtc (datetime2(3) — StateChangedUtc tracks actual transitions while LastSeenUtc advances on every publisher heartbeat so the Admin UI can flag stale rows from a crashed Server independent of State), Detail (nullable 1024 — exception message from the driver's probe when Faulted, null otherwise).
DriverHostState enum lives in Configuration.Enums/ rather than reusing Core.Abstractions.HostState so the Configuration project stays free of driver-runtime dependencies (it's referenced by both the Admin process and the Server process, so pulling in the driver-abstractions assembly to every Admin build would be unnecessary weight). The server-side publisher hosted service (follow-up PR 34) will translate HostStatusChangedEventArgs.NewState to this enum on every transition. No foreign key to ClusterNode — a Server may start reporting host status before its ClusterNode row exists (first-boot bootstrap), and we'd rather keep the status row than drop it. The Admin-side service that renders the dashboard will left-join on NodeId when presenting. Two indexes declared: IX_DriverHostStatus_Node drives the per-cluster drill-down (Admin UI joins ClusterNode on ClusterId to pick which NodeIds to fetch), IX_DriverHostStatus_LastSeen drives the stale-row query (now - LastSeen > threshold). EF migration AddDriverHostStatus creates the table + PK + both indexes. Model snapshot updated. SchemaComplianceTests expected-tables list extended. DriverHostStatusTests (3 new cases, category SchemaCompliance, uses the shared fixture DB): composite key allows same (host, driver) across different nodes AND same (node, host) across different drivers — both real-world cases the publisher needs to support; upsert-in-place pattern (fetch-by-composite-PK, mutate, save) produces one row not two — the pattern the publisher will use; State enum persists as string not int — reading the DB via ADO.NET returns 'Faulted' not '3'. Configuration.Tests SchemaCompliance suite: 10 pass / 0 fail (7 prior + 3 new). Configuration build clean. No Server or Admin code changes yet — publisher + /hosts page are PR 34. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
01fd90c178 |
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |