Files
lmxopcua/tests
Joseph Doherty cde018aec1 Phase 3 PR 52 -- Modbus exception-code -> OPC UA StatusCode translation. Before this PR every server-side Modbus exception AND every transport-layer failure collapsed to BadInternalError (0x80020000) in the driver's Read/Write results, making field diagnosis 'is this a tag misconfig or a driver bug?' impossible from the OPC UA client side. PR 52 adds a MapModbusExceptionToStatus helper that translates per spec: 01 Illegal Function -> BadNotSupported (0x803D0000); 02 Illegal Data Address -> BadOutOfRange (0x803C0000); 03 Illegal Data Value -> BadOutOfRange; 04 Server Failure -> BadDeviceFailure (0x80550000); 05/06 Acknowledge/Busy -> BadDeviceFailure; 0A/0B Gateway -> BadCommunicationError (0x80050000); unknown -> BadInternalError fallback. Non-Modbus failures (socket drop, timeout, malformed frame) in ReadAsync are now distinguished from tag-level faults: they map to BadCommunicationError so operators check network/PLC reachability rather than tag definitions. Why per-DL205: docs/v2/dl205.md documents DL205/DL260 returning only codes 01-04 with specific triggers -- exception 04 specifically means 'CPU in PROGRAM mode during a protected write', which is operator-recoverable by switching the CPU to RUN; surfacing it as BadDeviceFailure (not BadInternalError) makes the fix obvious. Changes in ModbusDriver: Read catch-chain now ModbusException first (-> mapper), generic Exception second (-> BadCommunicationError); Write catch-chain same pattern but generic Exception stays BadInternalError because write failures can legitimately come from EncodeRegister (out-of-range value) which is a driver-layer fault. Unit tests: MapModbusExceptionToStatus theory exercising every code in the table including the 0xFF fallback; Read_surface_exception_02_as_BadOutOfRange with an ExceptionRaisingTransport that forces code 02; Write_surface_exception_04_as_BadDeviceFailure for CPU-mode faults; Read_non_modbus_failure_maps_to_BadCommunicationError with a NonModbusFailureTransport that raises EndOfStreamException. 115/115 Modbus.Tests pass. Integration test: DL205ExceptionCodeTests.DL205_FC03_at_unmapped_register_returns_BadOutOfRange reads HR[16383] which is beyond the seeded uint16 cells on the dl205.json profile; pymodbus returns exception 02 and the driver surfaces BadOutOfRange. 11/11 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205.
2026-04-18 22:28:37 -04:00
..
Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
2026-04-18 14:37:55 -04:00
Phase 3 PR 15 — alarm-condition contract in IAddressSpaceBuilder + wire OnAlarmEvent through GenericDriverNodeManager. IAddressSpaceBuilder.IVariableHandle gains MarkAsAlarmCondition(AlarmConditionInfo) which returns an IAlarmConditionSink. AlarmConditionInfo carries SourceName/InitialSeverity/InitialDescription. Concrete address-space builders (the upcoming PR 16 OPC UA server backend) materialize a sibling AlarmConditionState node on the first call; the sink receives every lifecycle transition the generic node manager forwards. GenericDriverNodeManager gains a CapturingBuilder wrapper that transparently wraps every Folder/Variable call — the wrapper observes MarkAsAlarmCondition calls without participating in materialization, captures the resulting IAlarmConditionSink into an internal source-node-id → sink ConcurrentDictionary keyed by IVariableHandle.FullReference. After DiscoverAsync completes, if the driver implements IAlarmSource the node manager subscribes to OnAlarmEvent and routes every AlarmEventArgs to the sink registered for args.SourceNodeId — unknown source ids are dropped silently (may belong to another driver or to a variable the builder chose not to flag). Dispose unsubscribes the forwarder to prevent dangling invocation-list references across node-manager rebuilds. GalaxyProxyDriver.DiscoverAsync now calls handle.MarkAsAlarmCondition(new AlarmConditionInfo(fullName, AlarmSeverity.Medium, null)) on every attr.IsAlarm=true variable — severity seed is Medium because the live Priority byte arrives through the subsequent GalaxyAlarmEvent stream (which PR 14's GalaxyAlarmTracker now emits); the Admin UI sees the severity update on the first transition. RecordingAddressSpaceBuilder in Driver.Galaxy.E2E gains a RecordedAlarmCondition list + a RecordingSink implementation that captures AlarmEventArgs for test assertion — the E2E parity suite can now verify alarm-condition registration shape in addition to folder/variable shape. Tests (4 new GenericDriverNodeManagerTests): Alarm_events_are_routed_to_the_sink_registered_for_the_matching_source_node_id — 2 alarms registered (Tank.HiHi + Heater.OverTemp), driver raises an event for Tank.HiHi, the Tank.HiHi sink captures the payload, the Heater.OverTemp sink does not (tag-scoped fan-out, not broadcast); Non_alarm_variables_do_not_register_sinks — plain Tank.Level in the same discover is not in TrackedAlarmSources; Unknown_source_node_id_is_dropped_silently — a transition for Unknown.Source doesn't reach any sink + no exception; Dispose_unsubscribes_from_OnAlarmEvent — post-dispose, a transition for a previously-registered tag is no-op because the forwarder detached. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Core.Tests') added to Core csproj so TrackedAlarmSources internal property is visible to the test. Full solution: 0 errors, 152 unit tests pass (8 Core + 14 Proxy + 14 Admin + 24 Configuration + 6 Shared + 84 Galaxy.Host + 2 Server). PR 16 will implement the concrete OPC UA address-space builder that materializes AlarmConditionState from this contract.
2026-04-18 07:51:35 -04:00
Phase 3 PR 15 — alarm-condition contract in IAddressSpaceBuilder + wire OnAlarmEvent through GenericDriverNodeManager. IAddressSpaceBuilder.IVariableHandle gains MarkAsAlarmCondition(AlarmConditionInfo) which returns an IAlarmConditionSink. AlarmConditionInfo carries SourceName/InitialSeverity/InitialDescription. Concrete address-space builders (the upcoming PR 16 OPC UA server backend) materialize a sibling AlarmConditionState node on the first call; the sink receives every lifecycle transition the generic node manager forwards. GenericDriverNodeManager gains a CapturingBuilder wrapper that transparently wraps every Folder/Variable call — the wrapper observes MarkAsAlarmCondition calls without participating in materialization, captures the resulting IAlarmConditionSink into an internal source-node-id → sink ConcurrentDictionary keyed by IVariableHandle.FullReference. After DiscoverAsync completes, if the driver implements IAlarmSource the node manager subscribes to OnAlarmEvent and routes every AlarmEventArgs to the sink registered for args.SourceNodeId — unknown source ids are dropped silently (may belong to another driver or to a variable the builder chose not to flag). Dispose unsubscribes the forwarder to prevent dangling invocation-list references across node-manager rebuilds. GalaxyProxyDriver.DiscoverAsync now calls handle.MarkAsAlarmCondition(new AlarmConditionInfo(fullName, AlarmSeverity.Medium, null)) on every attr.IsAlarm=true variable — severity seed is Medium because the live Priority byte arrives through the subsequent GalaxyAlarmEvent stream (which PR 14's GalaxyAlarmTracker now emits); the Admin UI sees the severity update on the first transition. RecordingAddressSpaceBuilder in Driver.Galaxy.E2E gains a RecordedAlarmCondition list + a RecordingSink implementation that captures AlarmEventArgs for test assertion — the E2E parity suite can now verify alarm-condition registration shape in addition to folder/variable shape. Tests (4 new GenericDriverNodeManagerTests): Alarm_events_are_routed_to_the_sink_registered_for_the_matching_source_node_id — 2 alarms registered (Tank.HiHi + Heater.OverTemp), driver raises an event for Tank.HiHi, the Tank.HiHi sink captures the payload, the Heater.OverTemp sink does not (tag-scoped fan-out, not broadcast); Non_alarm_variables_do_not_register_sinks — plain Tank.Level in the same discover is not in TrackedAlarmSources; Unknown_source_node_id_is_dropped_silently — a transition for Unknown.Source doesn't reach any sink + no exception; Dispose_unsubscribes_from_OnAlarmEvent — post-dispose, a transition for a previously-registered tag is no-op because the forwarder detached. InternalsVisibleTo('ZB.MOM.WW.OtOpcUa.Core.Tests') added to Core csproj so TrackedAlarmSources internal property is visible to the test. Full solution: 0 errors, 152 unit tests pass (8 Core + 14 Proxy + 14 Admin + 24 Configuration + 6 Shared + 84 Galaxy.Host + 2 Server). PR 16 will implement the concrete OPC UA address-space builder that materializes AlarmConditionState from this contract.
2026-04-18 07:51:35 -04:00
Phase 1 Streams B–E scaffold + Phase 2 Streams A–C scaffold — 8 new projects with ~70 new tests, all green alongside the 494 v1 IntegrationTests baseline (parity preserved: no v1 tests broken; legacy OtOpcUa.Host untouched). Phase 1 finish: Configuration project (16 entities + 10 enums + DbContext + DesignTimeDbContextFactory + InitialSchema/StoredProcedures/AuthorizationGrants migrations — 8 procs including sp_PublishGeneration with MERGE on ExternalIdReservation per decision #124, sp_RollbackToGeneration cloning rows into a new published generation, sp_ValidateDraft with cross-cluster-namespace + EquipmentUuid-immutability + ZTag/SAPID reservation pre-flight, sp_ComputeGenerationDiff with CHECKSUM-based row signature — plus OtOpcUaNode/OtOpcUaAdmin SQL roles with EXECUTE grants scoped to per-principal-class proc sets and DENY UPDATE/DELETE/INSERT/SELECT on dbo schema); managed DraftValidator covering UNS segment regex, path length, EquipmentUuid immutability across generations, same-cluster namespace binding (decision #122), reservation pre-flight, EquipmentId derivation (decision #125), driver↔namespace compatibility — returning every failing rule in one pass; LiteDB local cache with round-trip + ring pruning + corruption-fast-fail; GenerationApplier with per-entity Added/Removed/Modified diff and dependency-ordered callbacks (namespace → driver → device → equipment → poll-group → tag, Removed before Added); Core project with GenericDriverNodeManager (scaffold for the Phase 2 Galaxy port) and DriverHost lifecycle registry; Server project using Microsoft.Extensions.Hosting BackgroundService replacing TopShelf, with NodeBootstrap that falls back to LiteDB cache when the central DB is unreachable (decision #79); Admin project scaffolded as Blazor Server with Bootstrap 5 sidebar layout, cookie auth, three admin roles (ConfigViewer/ConfigEditor/FleetAdmin), Cluster + Generation services fronting the stored procs. Phase 2 scaffold: Driver.Galaxy.Shared (netstandard2.0) with full MessagePack IPC contract surface — Hello version negotiation, Open/CloseSession, Heartbeat, DiscoverHierarchy + GalaxyObjectInfo/GalaxyAttributeInfo, Read/WriteValues, Subscribe/Unsubscribe/OnDataChange, AlarmSubscribe/Event/Ack, HistoryRead, HostConnectivityStatus, Recycle — plus length-prefixed framing (decision #28) with a 16 MiB cap and thread-safe FrameWriter/FrameReader; Driver.Galaxy.Host (net48) implementing the Tier C cross-cutting protections from driver-stability.md — strict PipeAcl (allow configured server SID only, explicit deny on LocalSystem + Administrators), PipeServer with caller-SID verification via pipe.RunAsClient + WindowsIdentity.GetCurrent and per-process shared-secret Hello, Galaxy-specific MemoryWatchdog (warn at max(1.5×baseline, +200 MB), soft-recycle at max(2×baseline, +200 MB), hard ceiling 1.5 GB, slope ≥5 MB/min over 30-min rolling window), RecyclePolicy (1 soft recycle per hour cap + 03:00 local daily scheduled), PostMortemMmf (1000-entry ring buffer in %ProgramData%\OtOpcUa\driver-postmortem\galaxy.mmf, survives hard crash, readable cross-process), MxAccessHandle : SafeHandle (ReleaseHandle loops Marshal.ReleaseComObject until refcount=0 then calls optional unregister callback), StaPump with responsiveness probe (BlockingCollection dispatcher for Phase 1 — real Win32 GetMessage/DispatchMessage pump slots in with the same semantics when the Galaxy code lift happens), IsExternalInit shim for init setters on .NET 4.8; Driver.Galaxy.Proxy (net10) implementing IDriver + ITagDiscovery forwarding over the IPC channel with MX data-type and security-classification mapping, plus Supervisor pieces — Backoff (5s → 15s → 60s capped, reset-on-stable-run), CircuitBreaker (3 crashes per 5 min opens; 1h → 4h → manual cooldown escalation; sticky alert doesn't auto-clear), HeartbeatMonitor (2s cadence, 3 consecutive misses = host dead per driver-stability.md). Infrastructure: docker SQL Server remapped to host port 14330 to coexist with the native MSSQL14 Galaxy ZB DB instance on 1433; NuGetAuditSuppress applied per-project for two System.Security.Cryptography.Xml advisories that only reach via EF Core Design with PrivateAssets=all (fix ships in 11.0.0-preview); .slnx gains 14 project registrations. Deferred with explicit TODOs in docs/v2/implementation/phase-2-partial-exit-evidence.md: Phase 1 Stream E Admin UI pages (Generations listing + draft-diff-publish, Equipment CRUD with OPC 40010 fields, UNS Areas/Lines tabs, ACLs + permission simulator, Generic JSON config editor, SignalR real-time, Release-Reservation + Merge-Equipment workflows, LDAP login page, AppServer smoke test per decision #142), Phase 2 Stream D (Galaxy MXAccess code lift out of legacy OtOpcUa.Host, dual-service installer, appsettings → DriverConfig migration script, legacy Host deletion — blocked by parity), Phase 2 Stream E (v1 IntegrationTests against v2 topology, Client.CLI walkthrough diff, four 2026-04-13 stability findings regression tests, adversarial review — requires live MXAccess runtime).
2026-04-17 21:35:25 -04:00
Phase 3 PR 52 -- Modbus exception-code -> OPC UA StatusCode translation. Before this PR every server-side Modbus exception AND every transport-layer failure collapsed to BadInternalError (0x80020000) in the driver's Read/Write results, making field diagnosis 'is this a tag misconfig or a driver bug?' impossible from the OPC UA client side. PR 52 adds a MapModbusExceptionToStatus helper that translates per spec: 01 Illegal Function -> BadNotSupported (0x803D0000); 02 Illegal Data Address -> BadOutOfRange (0x803C0000); 03 Illegal Data Value -> BadOutOfRange; 04 Server Failure -> BadDeviceFailure (0x80550000); 05/06 Acknowledge/Busy -> BadDeviceFailure; 0A/0B Gateway -> BadCommunicationError (0x80050000); unknown -> BadInternalError fallback. Non-Modbus failures (socket drop, timeout, malformed frame) in ReadAsync are now distinguished from tag-level faults: they map to BadCommunicationError so operators check network/PLC reachability rather than tag definitions. Why per-DL205: docs/v2/dl205.md documents DL205/DL260 returning only codes 01-04 with specific triggers -- exception 04 specifically means 'CPU in PROGRAM mode during a protected write', which is operator-recoverable by switching the CPU to RUN; surfacing it as BadDeviceFailure (not BadInternalError) makes the fix obvious. Changes in ModbusDriver: Read catch-chain now ModbusException first (-> mapper), generic Exception second (-> BadCommunicationError); Write catch-chain same pattern but generic Exception stays BadInternalError because write failures can legitimately come from EncodeRegister (out-of-range value) which is a driver-layer fault. Unit tests: MapModbusExceptionToStatus theory exercising every code in the table including the 0xFF fallback; Read_surface_exception_02_as_BadOutOfRange with an ExceptionRaisingTransport that forces code 02; Write_surface_exception_04_as_BadDeviceFailure for CPU-mode faults; Read_non_modbus_failure_maps_to_BadCommunicationError with a NonModbusFailureTransport that raises EndOfStreamException. 115/115 Modbus.Tests pass. Integration test: DL205ExceptionCodeTests.DL205_FC03_at_unmapped_register_returns_BadOutOfRange reads HR[16383] which is beyond the seeded uint16 cells on the dl205.json profile; pymodbus returns exception 02 and the driver surfaces BadOutOfRange. 11/11 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205.
2026-04-18 22:28:37 -04:00
Phase 3 PR 52 -- Modbus exception-code -> OPC UA StatusCode translation. Before this PR every server-side Modbus exception AND every transport-layer failure collapsed to BadInternalError (0x80020000) in the driver's Read/Write results, making field diagnosis 'is this a tag misconfig or a driver bug?' impossible from the OPC UA client side. PR 52 adds a MapModbusExceptionToStatus helper that translates per spec: 01 Illegal Function -> BadNotSupported (0x803D0000); 02 Illegal Data Address -> BadOutOfRange (0x803C0000); 03 Illegal Data Value -> BadOutOfRange; 04 Server Failure -> BadDeviceFailure (0x80550000); 05/06 Acknowledge/Busy -> BadDeviceFailure; 0A/0B Gateway -> BadCommunicationError (0x80050000); unknown -> BadInternalError fallback. Non-Modbus failures (socket drop, timeout, malformed frame) in ReadAsync are now distinguished from tag-level faults: they map to BadCommunicationError so operators check network/PLC reachability rather than tag definitions. Why per-DL205: docs/v2/dl205.md documents DL205/DL260 returning only codes 01-04 with specific triggers -- exception 04 specifically means 'CPU in PROGRAM mode during a protected write', which is operator-recoverable by switching the CPU to RUN; surfacing it as BadDeviceFailure (not BadInternalError) makes the fix obvious. Changes in ModbusDriver: Read catch-chain now ModbusException first (-> mapper), generic Exception second (-> BadCommunicationError); Write catch-chain same pattern but generic Exception stays BadInternalError because write failures can legitimately come from EncodeRegister (out-of-range value) which is a driver-layer fault. Unit tests: MapModbusExceptionToStatus theory exercising every code in the table including the 0xFF fallback; Read_surface_exception_02_as_BadOutOfRange with an ExceptionRaisingTransport that forces code 02; Write_surface_exception_04_as_BadDeviceFailure for CPU-mode faults; Read_non_modbus_failure_maps_to_BadCommunicationError with a NonModbusFailureTransport that raises EndOfStreamException. 115/115 Modbus.Tests pass. Integration test: DL205ExceptionCodeTests.DL205_FC03_at_unmapped_register_returns_BadOutOfRange reads HR[16383] which is beyond the seeded uint16 cells on the dl205.json profile; pymodbus returns exception 02 and the driver surfaces BadOutOfRange. 11/11 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205.
2026-04-18 22:28:37 -04:00