Commit Graph

180 Commits

Author SHA1 Message Date
Joseph Doherty
cde018aec1 Phase 3 PR 52 -- Modbus exception-code -> OPC UA StatusCode translation. Before this PR every server-side Modbus exception AND every transport-layer failure collapsed to BadInternalError (0x80020000) in the driver's Read/Write results, making field diagnosis 'is this a tag misconfig or a driver bug?' impossible from the OPC UA client side. PR 52 adds a MapModbusExceptionToStatus helper that translates per spec: 01 Illegal Function -> BadNotSupported (0x803D0000); 02 Illegal Data Address -> BadOutOfRange (0x803C0000); 03 Illegal Data Value -> BadOutOfRange; 04 Server Failure -> BadDeviceFailure (0x80550000); 05/06 Acknowledge/Busy -> BadDeviceFailure; 0A/0B Gateway -> BadCommunicationError (0x80050000); unknown -> BadInternalError fallback. Non-Modbus failures (socket drop, timeout, malformed frame) in ReadAsync are now distinguished from tag-level faults: they map to BadCommunicationError so operators check network/PLC reachability rather than tag definitions. Why per-DL205: docs/v2/dl205.md documents DL205/DL260 returning only codes 01-04 with specific triggers -- exception 04 specifically means 'CPU in PROGRAM mode during a protected write', which is operator-recoverable by switching the CPU to RUN; surfacing it as BadDeviceFailure (not BadInternalError) makes the fix obvious. Changes in ModbusDriver: Read catch-chain now ModbusException first (-> mapper), generic Exception second (-> BadCommunicationError); Write catch-chain same pattern but generic Exception stays BadInternalError because write failures can legitimately come from EncodeRegister (out-of-range value) which is a driver-layer fault. Unit tests: MapModbusExceptionToStatus theory exercising every code in the table including the 0xFF fallback; Read_surface_exception_02_as_BadOutOfRange with an ExceptionRaisingTransport that forces code 02; Write_surface_exception_04_as_BadDeviceFailure for CPU-mode faults; Read_non_modbus_failure_maps_to_BadCommunicationError with a NonModbusFailureTransport that raises EndOfStreamException. 115/115 Modbus.Tests pass. Integration test: DL205ExceptionCodeTests.DL205_FC03_at_unmapped_register_returns_BadOutOfRange reads HR[16383] which is beyond the seeded uint16 cells on the dl205.json profile; pymodbus returns exception 02 and the driver surfaces BadOutOfRange. 11/11 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205. 2026-04-18 22:28:37 -04:00
Joseph Doherty
9892a0253d Phase 3 PR 51 -- DL260 X-input FC02 discrete-input mapping end-to-end test. Integration test DL205XInputTests reads FC02 at the DirectLogicAddress.XInputToDiscrete-resolved address and asserts two behaviors against the dl205.json pymodbus profile: (1) X20 octal (=decimal 16 = Modbus DI 16) reads ON, proving the helper correctly octal-parses the trailing number and adds it to the 0 base; (2) X21 octal reads OFF (not exception) -- per docs/v2/dl205.md §I/O-mapping, 'reading a non-populated X input returns zero, not an exception' on DL260, because the CPU sizes the discrete-input table to the configured I/O not the installed hardware. Pymodbus models this by returning the default 0 value for any DI bit in the configured 'di size' range that wasn't explicitly seeded, matching real DL260 behaviour. Test uses X20 rather than X0 to sidestep a shared-blocks conflict: pymodbus places FC01/FC02 bit-address 0..15 into cell 0, but cell 0 is already uint16-typed (V0 marker = 0xCAFE) per the register-zero quirk test, and shared-blocks semantics allow only one type per cell. X20 octal = DI 16 lands in cell 1 which is free, so both the V0 quirk AND the X-input quirk can coexist in one profile. dl205.json: bits cell 1 seeded value=9 (bits 0 and 3 set -> X20, X23 octal = ON), write-range extended to include cell 1 (though X-inputs are read-only; the write-range entry is required by pymodbus for ANY cell referenced in a bits section even if only reads are expected -- pymodbus validates write-access uniformly). 10/10 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205. No driver code changes -- the XInputToDiscrete helper + FC02 read path already landed in PRs 50 and 21 respectively. This PR closes the integration-test gap that docs/v2/dl205.md called out under test name DL205_Xinput_unpopulated_reads_as_zero. 2026-04-18 22:25:13 -04:00
Joseph Doherty
b5464f11ee Phase 3 PR 50 -- DL260 bit-memory address helpers (Y/C/X/SP) + live coil integration tests. Adds four new static helpers to DirectLogicAddress covering every discrete-memory bank on the DL260: YOutputToCoil (Y0=coil 2048), CRelayToCoil (C0=coil 3072), XInputToDiscrete (X0=DI 0), SpecialToDiscrete (SP0=DI 1024). Each helper takes the DirectLOGIC ladder-logic address (e.g. 'Y0', 'Y17', 'C1777') and adds the octal-decoded offset to the bank's Modbus base per the DL260 user manual's I/O-configuration chapter table. Uses the same 'octal-walk + reject 8/9' pattern as UserVMemoryToPdu so misaligned addresses fail loudly with a clear ArgumentException rather than silently hitting the wrong coil. Fixes a pymodbus-config bug surfaced during integration-test validation: dl205.json had bits entries at cell indices 2048 / 3072 / 4000, but pymodbus's ModbusSimulatorContext.validate divides bit addresses by 16 before indexing into the shared cell array -- so Modbus coil 2048 reads cell 128, not cell 2048. The sim was returning Illegal Data Address (exception 02) for every bit read in the Y/C/scratch range. Moved bits entries to cells 128 (Y bank marker = 0b101 for Y0=ON, Y1=OFF, Y2=ON), 192 (C bank marker = 0b101 for C0/C1/C2), 250 (scratch cell covering coils 4000..4015). write list updated to the correct cell addresses. Unit tests: YOutputToCoil theory sweep (Y0->2048, Y1->2049, Y7->2055, Y10->2056 octal-to-decimal, Y17->2063, Y777->2559 top of DL260 Y range), CRelayToCoil theory (C0->3072 through C1777->4095), XInputToDiscrete theory, SpecialToDiscrete theory (with case-insensitive 'SP' prefix). Bit_address_rejects_non_octal_digits (Y8/C9/X18), Bit_address_rejects_empty, accepts_lowercase_prefix, accepts_bare_octal_without_prefix. 48/48 Modbus.Tests pass. Integration tests: DL205CoilMappingTests with three facts -- DL260_Y0_maps_to_coil_2048 (FC01 at Y0 returns ON), DL260_C0_maps_to_coil_3072 (FC01 at C0 returns ON), DL260_scratch_Crelay_supports_write_then_read (FC05 write + FC01 read round-trip at coil 4000 proves the DL-mapped coil bank is fully read/write capable end-to-end). 9/9 DL205 integration tests pass against the pymodbus dl205 profile with MODBUS_SIM_PROFILE=dl205. Caller opts into the helpers per tag the same way as PR 47's V-memory helper -- pass DirectLogicAddress.YOutputToCoil("Y0") as the ModbusTagDefinition Address; no driver-wide DL-family flag. PR 51 adds the X-input read-side integration test (there's nothing to write since X-inputs are FC02 discrete inputs, read-only); PR 52 exception-code translation; PR 53 transport reconnect-on-drop since DL260 doesn't send TCP keepalives. 2026-04-18 22:22:42 -04:00
dae29f14c8 Merge pull request 'Phase 3 PR 49 -- Per-device FC03/FC16 register caps with auto-chunking' (#48) from phase-3-pr49-dl205-fc-caps into v2 2026-04-18 22:13:46 -04:00
f306793e36 Merge pull request 'Phase 3 PR 48 -- DL205 CDAB float word order end-to-end test' (#47) from phase-3-pr48-dl205-cdab-float into v2 2026-04-18 22:13:39 -04:00
9e61873cc0 Merge pull request 'Phase 3 PR 47 -- DL205 V-memory octal-address helper' (#46) from phase-3-pr47-dl205-vmemory into v2 2026-04-18 22:13:32 -04:00
1a60470d4a Merge pull request 'Phase 3 PR 46 -- DL205 BCD decoder' (#45) from phase-3-pr46-dl205-bcd into v2 2026-04-18 22:13:24 -04:00
635f67bb02 Merge pull request 'Phase 3 PR 45 -- DL205 string byte-order quirk' (#44) from phase-3-pr45-dl205-string-byte-order into v2 2026-04-18 22:12:15 -04:00
Joseph Doherty
a3f2f95344 Phase 3 PR 49 -- Per-device FC03/FC16 register caps with auto-chunking. Adds MaxRegistersPerRead (default 125, spec max) + MaxRegistersPerWrite (default 123, spec max) to ModbusDriverOptions. Reads that exceed the cap automatically split into consecutive FC03 requests: the driver dispatches chunks of [cap] regs at incrementing addresses, copies each response into an assembled byte[] buffer, and hands the full payload to DecodeRegister. From the caller's view a 240-char string read against a cap-100 device is still one Read() call returning one string -- the chunking is invisible, the wire shows N requests of cap-sized quantity plus one tail chunk. Writes are NOT auto-chunked. Splitting an FC16 across two transactions would lose atomicity -- mid-split crash leaves half the value written, which is strictly worse than rejecting upfront. Instead, writes exceeding MaxRegistersPerWrite throw InvalidOperationException with a message naming the tag + cap + the caller's escape hatch (shorten StringLength or split into multiple tags). The driver catches the exception internally and surfaces it to IWritable as BadInternalError so the caller pattern stays symmetric with other failure modes. Per-family cap cheat-sheet (documented in xml-doc on the option): Modbus-TCP spec = 125 read / 123 write, AutomationDirect DL205/DL260 = 128 read / 100 write (128 exceeds spec byte-count capacity so in practice 125 is the working ceiling), Mitsubishi Q/FX3U = 64 / 64, Omron CJ/CS = 125 / 123. Not all PLCs reject over-cap requests cleanly -- some drop the connection silently -- so having the cap enforced client-side prevents the hard-to-diagnose 'driver just stopped' failure mode. Unit tests: Read_within_cap_issues_single_FC03_request (control: no unnecessary chunking), Read_above_cap_splits_into_two_FC03_requests (120 regs / cap 100 -> 100+20, asserts exact per-chunk (Address,Quantity) and end-to-end payload continuity starting with register[100] high byte = 'A'), Read_cap_honors_Mitsubishi_lower_cap_of_64 (100 regs / cap 64 -> 64+36), Write_exceeding_cap_throws_instead_of_splitting (110 regs / cap 100 -> status != 0 AND Fc16Requests.Count == 0 to prove nothing was sent), Write_within_cap_proceeds_normally (control: cap honored on short writes too). Tests use a new RecordingTransport that captures the (Address, Quantity) tuple of every FC03/FC16 request so the chunk layout is directly assertable -- the existing FakeTransport does not expose request history. 103/103 Modbus.Tests pass; 6/6 DL205 integration tests still pass against the live pymodbus dl205 profile with MODBUS_SIM_PROFILE=dl205. 2026-04-18 21:58:49 -04:00
Joseph Doherty
463c5a4320 Phase 3 PR 48 -- DL205 CDAB word order for Float32 end-to-end test. The driver has supported ModbusByteOrder.WordSwap (CDAB) since PR 24 for all multi-register types -- the underlying word-swap code path was already there. PR 48 closes the loop with an integration test that validates it end-to-end against the dl205 pymodbus profile: HR[1056..1057] stores IEEE-754 1.5f with the low word at the lower address (0x0000 at HR[1056], 0x3FC0 at HR[1057]). Reading with WordSwap returns 1.5f; reading with BigEndian returns a tiny denormal (~5.74e-41) -- a silent "value is 0" bug that typically surfaces in the field only when an operator notices a setpoint readout stuck at 0 while the PLC display shows the real value. Test asserts both: WordSwap==1.5f AND BigEndian!=1.5f, proving the flag is not a no-op. No driver code changes -- the word-swap normalization at NormalizeWordOrder() has handled Float32/Int32/UInt32 correctly since PR 24 and the unit test suite already covers it (Int32_WordSwap_decodes_CDAB_layout + Float32 equivalent). This PR exists primarily to lock in the integration-level validation so future refactors of the codec don't silently break DL205/DL260 floats. 6/6 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205. 2026-04-18 21:51:15 -04:00
Joseph Doherty
2b5222f5db Phase 3 PR 47 -- DL205 V-memory octal-address helper. Adds DirectLogicAddress static class with two entry points: UserVMemoryToPdu(string) parses a DirectLOGIC V-address (V-prefixed or bare, whitespace tolerated) as OCTAL and returns the 0-based Modbus PDU address. V2000 octal = decimal 1024 = PDU 0x0400, which is the canonical start of the user V-memory bank on DL205/DL260. SystemVMemoryBasePdu + SystemVMemoryToPdu(ushort offset) handle the system bank (V40400 and up) which does NOT follow the simple octal-to-decimal formula -- the CPU relocates the system bank to PDU 0x2100 in H2-ECOM100 absolute mode. A naive caller converting 40400 octal would land at PDU 0x4100 (decimal 16640) and miss the system registers entirely; the helper routes the correct 0x2100 base. Why this matters: DirectLOGIC operators think in OCTAL (the ladder-logic editor, the Productivity/Do-more UI, every AutomationDirect manual addresses V-memory octally) while the Modbus wire is DECIMAL. Integrators routinely copy V-addresses from the PLC documentation into client configs and read garbage because they treated V2000 as decimal 2000 (HR[2000] = 0 in the dl205 sim, zero in most PLCs). The helper makes the translation explicit per the D2-USER-M appendix + H2-ECOM-M \u00A76.5 references cited in docs/v2/dl205.md. Unit tests: UserVMemoryToPdu_converts_octal_V_prefix (V0, V1, V7, V10, V2000, V7777, V10000, V17777 -- the exact sweep documented in dl205.md), UserVMemoryToPdu_accepts_bare_or_prefixed_or_padded (case + whitespace tolerance), UserVMemoryToPdu_rejects_non_octal_digits (V8/V19/V2009 must throw ArgumentException with 'octal' in the message -- .NET has no base-8 int.Parse so we hand-walk digits to catch 8/9 instead of silently accepting them), UserVMemoryToPdu_rejects_empty_input, UserVMemoryToPdu_overflow_rejected (200000 octal = 0x10000 overflows ushort), SystemVMemoryBasePdu_is_0x2100_for_V40400, SystemVMemoryToPdu_offsets_within_bank, SystemVMemoryToPdu_rejects_overflow. 23/23 Modbus.Tests pass. Integration tests against dl205.json pymodbus profile: DL205_V2000_user_memory_resolves_to_PDU_0x0400_marker (reads HR[0x0400]=0x2000), DL205_V40400_system_memory_resolves_to_PDU_0x2100_marker (reads HR[0x2100]=0x4040). 5/5 DL205 integration tests pass. Caller opts into the helper per tag by calling DirectLogicAddress.UserVMemoryToPdu("V2000") as the ModbusTagDefinition Address -- no driver-wide "DL205 mode" flag needed, because users mix DL and non-DL tags in a single driver instance all the time. 2026-04-18 21:49:58 -04:00
Joseph Doherty
8248b126ce Phase 3 PR 46 -- DL205 BCD decoder (binary-coded-decimal numeric encoding). Adds ModbusDataType.Bcd16 and Bcd32 to the driver. Bcd16 is 1 register wide, Bcd32 is 2 registers wide; Bcd32 respects ModbusByteOrder (BigEndian/WordSwap) the same way Int32 does so the CDAB-style families (including DL205/DL260 themselves) can be configured. DecodeRegister uses the new internal DecodeBcd helper: walks each nibble from MSB to LSB, multiplies the running result by 10, adds the nibble as a decimal digit. Explicitly rejects nibbles > 9 with InvalidDataException -- hardware sometimes produces garbage during write-in-progress transitions and silently returning wrong numeric values would quietly corrupt the caller's data. EncodeRegister's new EncodeBcd inverts the operation (mod/div by 10 nibble-by-nibble) with an up-front overflow check against 10^nibbles-1. Why this matters for DL205/DL260: AutomationDirect DirectLOGIC uses BCD as the default numeric encoding for timers, counters, and operator-display numerics (not binary). A plain Int16 read of register 0x1234 returns 4660; the BCD path returns 1234. The two differ enough that silently defaulting to Int16 would give wildly wrong HMI values -- the caller must opt in to Bcd16/Bcd32 per tag. Unit tests: DecodeBcd (theory: 0,1,9,10,1234,9999), DecodeBcd_rejects_nibbles_above_nine, EncodeBcd (theory), Bcd16_decodes_DL205_register_1234_as_decimal_1234 (control: same bytes as Int16 decode to 4660), Bcd16_encode_round_trips_with_decode, Bcd16_encode_rejects_out_of_range_values, Bcd32_decodes_8_digits_big_endian, Bcd32_word_swap_handles_CDAB_layout, Bcd32_encode_round_trips_with_decode, Bcd_RegisterCount_matches_underlying_width. 66/66 Modbus.Tests pass. Integration test: DL205BcdQuirkTests.DL205_BCD16_decodes_HR1072_as_decimal_1234 against dl205.json pymodbus profile (HR[1072]=0x1234). Asserts Bcd16 decode=1234 AND Int16 decode=0x1234 on the same wire bytes to prove the paths are distinct. 3/3 DL205 integration tests pass with MODBUS_SIM_PROFILE=dl205. 2026-04-18 21:46:25 -04:00
Joseph Doherty
cd19022d19 Phase 3 PR 45 -- DL205 string byte-order quirk (low-byte-first ASCII packing). Adds ModbusStringByteOrder enum {HighByteFirst, LowByteFirst} + StringByteOrder field on ModbusTagDefinition (default HighByteFirst, the standard Modbus convention). DecodeRegister + EncodeRegister String branches now respect per-tag byte order. Under LowByteFirst each register packs the first char in the low byte instead of the high byte -- the AutomationDirect DirectLOGIC DL205/DL260/DL350 family's headline string quirk. Without the flag the driver decodes 'eHllo' garbage from HR[1040..1042] even though wire bytes are identical. Unit tests: String_LowByteFirst_decodes_DL205_packed_Hello (5 chars across 3 regs with nul pad), String_LowByteFirst_decode_truncates_at_first_nul, String_LowByteFirst_encode_round_trips_with_decode (asserts exact DL205-documented byte sequence {0x65,0x48,0x6C,0x6C,0x00,0x6F} + symmetric encode->decode), String_HighByteFirst_and_LowByteFirst_differ_on_same_wire (control: same wire, different flag => different decode). 56/56 Modbus.Tests pass. Integration test: DL205StringQuirkTests.DL205_string_low_byte_first_decodes_Hello_from_HR1040 against the dl205.json pymodbus profile; reads HR[1040..1042] with both flags on the same tag map and asserts LowByteFirst='Hello' + HighByteFirst!='Hello'. Gated on MODBUS_SIM_PROFILE=dl205 since the standard profile doesn't seed HR[1040..1042]. Verified 2/2 integration tests pass against running pymodbus dl205 simulator. Baseline for PR 46 (BCD decoder), PR 47 (V-memory octal helper), PR 48 (CDAB float order), PR 49 (FC03/FC16 per-device caps) -- each lands its own DL205_<behavior> test class in tests/.../DL205/. 2026-04-18 21:43:32 -04:00
5ee9acb255 Merge pull request 'Phase 3 PR 44 -- pymodbus validation + IPv4-explicit transport bugfix' (#43) from phase-3-pr44-pymodbus-validation-fixes into v2 2026-04-18 21:39:24 -04:00
Joseph Doherty
02fccbc762 Phase 3 PR 43 — followup commit: validate pymodbus simulator end-to-end + fix three real bugs surfaced by running it. winget-installed Python 3.12.10 + pip-installed pymodbus[simulator]==3.13.0 on the dev box; both profiles boot cleanly, the integration-suite smoke test passes against either profile.
Three substantive issues caught + fixed during the validation pass:
1. pymodbus rejects unknown keys at device-list / setup level. My PR 43 commit had `_layout_note`, `_uint16_layout`, `_bits_layout`, `_write_note` device-level JSON-comment fields that crashed pymodbus startup with `INVALID key in setup`. Removed all device-level _* fields. Inline `_quirk` keys WITHIN individual register entries are tolerated by pymodbus 3.13.0 — kept those in dl205.json since they document the byte math per quirk and the README + git history aren't enough context for a hand-author reading raw integer values. Documented the constraint in the top-level _comment of each profile.
2. pymodbus rejects sweeping `write` ranges that include any cell not assigned a type. My initial standard.json had `write: [[0, 2047]]` but only seeded HR[0..31] + HR[100] + HR[200..209] + bits[1024..1109] — pymodbus blew up on cell 32 (gap between HR[31] and HR[100]). Fixed by listing per-block write ranges that exactly mirror the seeded ranges. Same fix in dl205.json (was `[[0, 16383]]`).
3. pymodbus simulator stores all 4 standard Modbus tables in ONE underlying cell array — each cell can only be typed once (BITS or UINT16, not both). My initial standard.json had `bits[0..31]` AND `uint16[0..31]` overlapping at the same addresses; pymodbus crashed with `ERROR "uint16" <Cell> used`. Fixed by relocating coils to address 1024+, well clear of the uint16 entries at 0..209. Documented the layout constraint in the standard.json top-level _comment.
Substantive driver bug fixed: ModbusTcpTransport.ConnectAsync was using `new TcpClient()` (default constructor — dual-stack, IPv6 first) then `ConnectAsync(host, port)` with the user's hostname. .NET's TcpClient default-resolves "localhost" to ::1 first, fails to connect to pymodbus (which binds 0.0.0.0 IPv4-only), and only then retries IPv4 — the failure surfaces as the entire ConnectAsync timeout (2s by default) before the IPv4 attempt even starts. PR 30's smoke test silently SKIPPED because the fixture's TCP probe hit the same dual-stack ordering and timed out. Both fixed: ModbusSimulatorFixture probe now resolves Dns.GetHostAddresses, prefers AddressFamily.InterNetwork, dials IPv4 explicitly. ModbusTcpTransport does the same — resolves first, prefers IPv4, falls back to whatever Dns returns (handles IPv6-only hosts in the future). This is a real production-readiness fix because most Modbus PLCs are IPv4-only — a generic dual-stack TcpClient would burn the entire connect timeout against any IPv4-only PLC, masquerading as a connection failure when the PLC is actually fine.
Smoke-test address shifted HR[100] -> HR[200]. Standard.json's HR[100] is the auto-incrementing register that drives subscribe-and-receive tests, so write-then-read against it would race the increment. HR[200] is the first cell of a writable scratch range present in BOTH simulator profiles. DL205Profile.cs xml-doc updated to explain the shift; tag name "DL205_Smoke_HReg100" -> "Smoke_HReg200" + smoke test references updated. dl205.json gains a matching scratch HR[200..209] range so the smoke test runs identically against either profile.
Validation matrix:
- standard.json boot: clean (TCP 5020 listening within ~3s of pymodbus.simulator launch).
- dl205.json boot: clean.
- pymodbus client direct FC06 to HR[200]=1234 + FC03 read: round-trip OK.
- raw-bytes PowerShell TcpClient FC06 + 12-byte response: matches FC06 spec (echo of address + value).
- DL205SmokeTest against standard.json: 1/1 pass (was failing as 'BadInternalError' due to the dual-stack timeout + tag-name typo — both fixed).
- DL205SmokeTest against dl205.json: 1/1 pass.
- Modbus.Tests Unit suite: 52/52 pass — dual-stack transport fix is non-breaking.
- Solution build clean.
Memory + future-PR setup: pymodbus install + activation pattern is now bullet-pointed at the top of Pymodbus/README.md so future PRs (the per-quirk DL205_<behavior> tests in PR 44+) don't have to repeat the trial-and-error of getting the simulator + integration tests cooperating. The three bugs above are documented inline in the JSON profiles + ModbusTcpTransport so they don't bite again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:14:02 -04:00
faeab34541 Merge pull request 'Phase 3 PR 43 — Swap ModbusPal to pymodbus for the integration-test simulator' (#42) from phase-3-pr43-pymodbus-swap into v2 2026-04-18 20:52:46 -04:00
Joseph Doherty
a05b84858d Phase 3 PR 43 — Swap ModbusPal to pymodbus for the integration-test simulator. Replaces the .xmpp profiles shipped in PR 42 with pymodbus 3.13.0 ModbusSimulatorServer JSON configs in tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/Pymodbus/. Substantive reasons for the swap (rationale block in the test-plan doc): ModbusPal 1.6b is abandoned (last release ~2019), Java GUI-only with no headless mode in the official JAR, and only exposes 2 of the 4 standard Modbus tables (holding_registers + coils — no input_registers, no discrete_inputs). pymodbus is current stable, pure Python CLI (pip install pymodbus[simulator]==3.13.0), exposes all four tables, has built-in declarative actions (increment / random / timestamp / uptime) for dynamic registers, supports custom Python actions for anything more complex, and ships an optional aiohttp-based web UI / REST API for live inspection. Pip-installable on Windows; sidesteps the privileged-port admin requirement by defaulting to TCP 5020.
ModbusSimulatorFixture default port bumped from 502 to 5020 to match the pymodbus convention. Override via MODBUS_SIM_ENDPOINT for a real PLC on its native 502. Skip-message updated to point at the new Pymodbus\serve.ps1 wrapper instead of 'start ModbusPal'. csproj <None Update> rule swapped from ModbusPal/** to Pymodbus/** so the new JSON profiles + serve.ps1 + README copy to test-output as PreserveNewest.
standard.json — generic Modbus TCP server, slave id 1, port 5020, shared blocks=false (independent coils + HR address spaces, more textbook-PLC-like). HR[0..31] seeded with address-as-value via per-register uint16 entries, HR[100] auto-increments via the built-in increment action with parameters minval=0/maxval=65535 (drives subscribe-and-receive integration tests so they have a register that ticks without a write — pymodbus's increment ticks per-access not wall-clock, which is good enough for a 250ms-poll test), HR[200..209] scratch range left at 0 for write tests, coils 0..31 alternating, coils 100..109 scratch. write list covers 0..1023 so any test address is mutable.
dl205.json — AutomationDirect DirectLOGIC DL205/DL260 quirk simulator, slave id 1, port 5020, shared blocks=true (matches DL series memory model where coils/DI/HR overlay the same word address space). Each quirky register seeded with the pre-computed raw uint16 value documented in docs/v2/dl205.md, with an inline _quirk JSON-comment naming the behavior so future-me reading the file knows why HR[1040]=25928 means 'H' lo / 'e' hi (the user's headline string-byte-order finding). Encoded quirks: V0 marker at HR[0]=0xCAFE; V2000 at HR[1024]=0x2000; V40400 at HR[8448]=0x4040; 'Hello' string at HR[1040..1042] first-char-low-byte; Float32 1.5f at HR[1056..1057] in CDAB word order (low word first); BCD register at HR[1072]=0x1234; FC03-128-cap block at HR[1280..1407]; Y0/C0 coil markers at 2048/3072; scratch C-relays at 4000..4007.
serve.ps1 wrapper — pwsh script with a -Profile {standard|dl205} parameter switch. Validates pymodbus.simulator is on PATH (clearer message than the raw CommandNotFoundException), validates the profile JSON exists, builds the right --modbus_server/--modbus_device/--json_file/--http_port arg list, and execs pymodbus.simulator in the foreground. -HttpPort 0 disables the web UI. Foreground exec lets the operator Ctrl+C to stop without an extra control script.
README.md fully rewritten for pymodbus: install command (pip install 'pymodbus[simulator]==3.13.0' — pinned for reproducibility, [simulator] extra pulls aiohttp), per-profile reference tables, the same DL205 quirk → register table from PR 42 but adjusted for pymodbus paths, what's-NEW-vs-ModbusPal section (all four tables, raw uint16 seeding, declarative actions, custom Python action modules, headless, web UI, maintained), trade-offs section (float32-as-two-uint16s for explicit CDAB control, increment ticks per-access not wall-clock, shared-blocks mode for DL205 vs separate for Standard), file-format quick reference for hand-authoring more profiles. References pinned to the pymodbus readthedocs simulator/config + REST API pages.
docs/v2/modbus-test-plan.md harness section rewritten with the swap rationale; PR-history list updated to mark PR 42 SUPERSEDED by PR 43 and call out PR 44+ as the per-quirk implementation track. Test-conventions bullet about 'don't depend on ModbusPal state between tests' generalized to 'don't depend on simulator state' and a note added that pymodbus's REST API can reset state between facts if a test ever needs it.
DL205Profile.cs and DL205SmokeTests.cs xml-doc updated to reference pymodbus / dl205.json instead of ModbusPal / DL205.xmpp.
Functional validation deferred — Python isn't installed on this dev box (winget search returned no matches for Python.Python.3 exact). JSON parses structurally (PowerShell ConvertFrom-Json clean on both files), build clean, .json + serve.ps1 + README all copy to test-output as expected. User installs pymodbus when they want to actually run the simulator end-to-end; if pymodbus rejects the config the README's reference link to pymodbus's simulator/config schema doc is the right next stop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 20:35:26 -04:00
c59ac9e52d Merge pull request 'Phase 3 PR 42 — ModbusPal simulator profiles for Standard + DL205/DL260' (#41) from phase-3-pr42-modbuspal-profiles into v2 2026-04-18 20:12:39 -04:00
Joseph Doherty
02a0e8efd1 Phase 3 PR 42 — ModbusPal simulator profiles for Standard Modbus + DL205/DL260 quirks. Two hand-authored .xmpp profiles in tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/ModbusPal/ that integration tests load via the GUI to drive the suite without a real PLC. Both well-formed XML (verified via PowerShell [xml] cast); both copied to test-output as PreserveNewest content per the existing csproj rule.
Standard.xmpp — generic Modbus TCP server on port 502, slave id 1. HR[0..31] seeded with address-as-value (HR[5]=5 — easy mental map for diagnostics), HR[100] auto-incrementing via a 1Hz LinearGenerator binding (drives subscribe-and-receive integration tests so they have a register that actually changes without a write), HR[200..209] scratch range for write-roundtrip tests, coils 0..31 alternating on/off, coils 100..109 scratch. The Tick automation runs 0..65535 over 60s looping; bound to HR[100] via Binding_SINT16 — slow enough that a 250ms-poll integration test sees discrete jumps, fast enough that a 5s subscribe test sees several change notifications.
DL205.xmpp — AutomationDirect DirectLOGIC DL205/DL260 quirk simulator on port 502, slave id 1, modeling the behaviors documented in docs/v2/dl205.md as concrete register values so DL205 integration tests can assert each quirk WITHOUT a live PLC. Per-quirk encoding: V0 marker at HR[0]=0xCAFE proves register 0 is valid (rejects-register-0 rumour disproved); V2000 marker at HR[1024]=0x2000 proves V-memory octal-to-decimal mapping; V40400 marker at HR[8448]=0x4040 proves V40400→PDU 0x2100 (NOT register 0, contrary to the widespread shorthand); 'Hello' string at HR[1040..1042] packed first-char-low-byte (HR[1040]=0x6548 = 'H' lo + 'e' hi, HR[1041]=0x6C6C, HR[1042]=0x006F) — the headline string-byte-order quirk the user flagged; Float32 1.5f at HR[1056..1057] in CDAB word order (low word first: 0, then 0x3FC0); BCD register at HR[1072]=0x1234 representing decimal 1234 in BCD nibbles (NOT binary 0x04D2); 128-register block at HR[1280..1407] for FC03-128-cap testing; Y0 marker at coil 2048, C0 marker at coil 3072, scratch C-coils at 4000..4007 for write tests.
Critical limitation flagged inline + in README: ModbusPal 1.6b CANNOT represent the DL205 quirks semantically — it has no string binding, no BCD binding, no arbitrary-byte-layout binding (only SINT16/SINT32/FLOAT32 with word-order). So every DL205 quirk is encoded as a pre-computed raw 16-bit integer with the math worked out in inline comments above each register. Becomes unreadable past ~50 quirky registers; the README's 'alternatives' section recommends switching to pymodbus when that threshold approaches (pymodbus's ModbusSimulatorServer has first-class headless + scriptable callbacks for byte-level layouts).
Other ModbusPal 1.6b limitations called out in README: only holding_registers + coils sections in the official build (no input_registers / discrete_inputs — DL260 X-input markers can't be encoded faithfully here, FC02/FC04 tests wait for a fork or pymodbus); abandoned project (last release 1.6b, active forks at SCADA-LTS/ModbusPal, ControlThings-io/modbuspal, mrhenrike/ModbusPalEnhanced); no headless mode in the official JAR (-loadFile / -hide flags only in source-built forks); CVE-2018-10832 XXE on .xmpp import (don't import untrusted profiles — the in-repo ones are author-controlled).
README.md updated with: per-profile description tables, getting-started (download jar + java -jar + GUI File>Load>Run), MODBUS_SIM_ENDPOINT env-var override doc, two reference tables documenting which HR / coil address encodes which DL205 quirk + which test name asserts it (the same DL205_<behavior> naming convention from docs/v2/modbus-test-plan.md), 4-row alternatives comparison (pymodbus / diagslave / ModbusMechanic / ModRSsim2) for when ModbusPal can no longer carry the load, and a quick-reference XML format table at the bottom for future-me hand-authoring more profiles.
Pure documentation + test-asset PR — no code changes. The integration tests that consume these profiles (the actual DL205_<behavior> facts) land one at a time in PR 43+ as user validates each quirk via ModbusPal on the bench.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 20:05:20 -04:00
7009483d16 Merge pull request 'Phase 3 PR 41 — Document AutomationDirect DL205 / DL260 Modbus quirks' (#40) from phase-3-pr41-dl205-quirks-doc into v2 2026-04-18 19:52:20 -04:00
Joseph Doherty
9de96554dc Phase 3 PR 41 — Document AutomationDirect DL205 / DL260 Modbus quirks. Adds docs/v2/dl205.md (~300 lines, 8 H2 sections, primary-source citations) covering every place the DL205/DL260 family diverges from textbook Modbus or has non-obvious behavior a generic client gets wrong. Replaces the placeholder _pending_ list in modbus-test-plan.md with a confirmed-behaviors table that doubles as the integration-test roadmap.
The user explicitly flagged that DL205/DL260 strings don't follow Modbus convention; research turned up that and a lot more. Headline findings:
String packing — TWO chars per V-memory register but the FIRST char is in the LOW byte (opposite of the big-endian Modbus convention generic drivers default to). 'Hello' in V2000 reads back as 'eHll o\0' on a textbook decoder. Kepware's DirectLogic driver exposes a per-tag 'String Byte Order = Low/High' toggle specifically for this; we'll need the same. Null-terminated, no length prefix, no dedicated KSTR address space — strings live wherever ladder allocates them in V-memory.
V-memory addressing — DirectLOGIC's native V-memory is OCTAL (V2000, V40400) but Modbus is decimal. The CPU translates: V2000 octal = decimal 1024 = Modbus PDU 0x0400. The widespread 'V40400 = register 0' shorthand is wrong on modern firmware (that was DL05/DL06 relative mode); on H2-ECOM100 absolute mode (factory default) V40400 = PDU 0x2100. We'd surface this with an address-format helper in the device profile so operators write V2000 instead of computing 1024 by hand.
Word order CDAB for all 32-bit values — DL205 and DL260 agree, ECOM modules don't re-swap. Already supported via ModbusByteOrder.WordSwap; just needs to be the default in the DL205 profile.
BCD-as-default numeric storage — bit one I didn't expect. DirectLOGIC stores 'V2000 = 1234' as 0x1234 on the wire (BCD nibbles), not as 0x04D2 (decimal 1234). IEEE 754 Float32 only works when ladder used the explicit R type (LDR/OUTR instructions). We need a new decoder mode for BCD-encoded registers — current code assumes binary integers.
FC quantity caps — FC03/04 cap at 128 (above spec's 125 — Bonus territory, current code already respects 125), FC16 caps at 100 (BELOW spec's 123 — important bulk-write batching gotcha). Quantity overrun returns exception 03 IllegalDataValue.
Coil/discrete mappings — DL260: X0->discrete input 0, Y0->coil 2048, C0->coil 3072. SP specials at discrete input 1024-1535 RO. These are CPU-wired constants and cannot be remapped; need to be hardcoded in the DL205/DL260 device profile.
Register 0 — accepted on DL205/DL260 with ECOM in absolute mode, contrary to the widespread internet claim that 'DirectLOGIC rejects register 0'. That rumour was an older DL05/DL06 relative-mode artefact. Our ModbusProbeOptions.ProbeAddress default of 0 is therefore safe for DL205/DL260.
Exception codes — only the standard 01-04. Write-to-protected-bit returns 02 on newer firmware, 04 on older (firmware-transition revision unconfirmed); driver should map both to BadNotWritable. No proprietary exception codes.
Behavioral oddities — H2-ECOM100 accepts MAX 4 simultaneous TCP connections (5th refused at TCP accept). No TCP keepalive (intermediate NAT/firewall drops idle sockets after 2-5 min — periodic probe required). No mid-stream resync on malformed MBAP — driver must reconnect + replay. TxId-drop-under-load forum rumour is unconfirmed; our single-flight + TxId-match guard handles it either way.
Each H2 section ends with the integration-test names we'd ship per the modbus-test-plan.md DL205_<behavior> convention — twelve named test slots ready for PR 42+ to fill in one at a time. References (8) cited inline, primarily D2-USER-M, HA-ECOM-M, and the Kepware DirectLogic Ethernet driver manual which documents these vendor quirks explicitly because they have to cope with them.
modbus-test-plan.md DL205 section rewritten as a priority-ordered table with three columns (quirk / driver impact / test name), pointing the reader at dl205.md for the full reference. Operator-reported items separated into a tail subsection so future-me knows which behaviors are documented vs reproduced-on-hardware.
Pure documentation PR — no code changes. The actual driver work (string-byte-order option, BCD decoder mode, V-memory address helper, FC16 cap-per-device-family, multi-client TCP handling) lands one PR per quirk in PR 42+ as ModbusPal validation completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:49:35 -04:00
af35fac0ef Merge pull request 'Phase 3 PR 40 — LiveStack write + subscribe tests against TestMachine_001' (#39) from phase-3-pr40-livestack-write-subscribe into v2 2026-04-18 19:41:55 -04:00
Joseph Doherty
aa8834a231 Phase 3 PR 40 — LiveStackSmokeTests: write-roundtrip + subscribe-receives-OnDataChange against the live Galaxy. Finishes LMX #5 by exercising the IWritable + ISubscribable capability paths end-to-end through the Proxy → OtOpcUaGalaxyHost service → MXAccess → real Galaxy.
Two new facts target DelmiaReceiver_001.TestAttribute — the writable Boolean UDA on the TestMachine_001 hierarchy in this dev Galaxy. The user nominated TestMachine_001 (the deployed test-target object) as a scratch surface for live testing; ZB query showed DelmiaReceiver_001 carries one dynamic_attribute named TestAttribute (mx_data_type=1=Boolean, lock_type=0=writable, security_classification=1=Operate). Naming makes the intent obvious — the attribute exists for exactly this kind of integration testing — and Boolean keeps the assertions simple (invert, write, read back).
Write_then_read_roundtrips_a_writable_Boolean_attribute_on_TestMachine_001: reads the current value as the baseline (Galaxy may return Uncertain quality until the Engine has scanned the attribute at least once — we don't read into a typed bool until Status is Good), inverts it, writes via IWritable, then polls reads in a 5s loop until either the new value comes back or the budget expires. The scan-window poll (rather than a single read after a fixed delay) accommodates Galaxy's variable scan latency on a fresh service start. Restore-on-finally writes the original value back so re-running the test doesn't accumulate a flipped TestAttribute on the dev box (Galaxy holds UDA values across runs since they're deployed). Best-effort restore — swallows exceptions so a failure in restore doesn't mask the primary assertion.
Subscribe_fires_OnDataChange_with_initial_value_then_again_after_a_write: subscribes to the same attribute with a 250ms publishing interval, captures every OnDataChange notification onto a thread-safe ConcurrentQueue (MXAccess advisory fires on its own thread per Galaxy's COM apartment model — must not block it), waits up to 5s for the initial-value callback (per ISubscribable's contract: 'driver MAY fire OnDataChange immediately with the current value'), records the queue depth as a baseline, writes the toggled value, waits up to 8s for at least one MORE notification, then searches the queue tail for the notification carrying the toggled value (initial value may appear multiple times before the write commits — looking at the tail finds the post-write delta even if the queue grew during the wait window). Unsubscribes on finally + restores baseline.
Both tests use Convert.ToBoolean(value ?? false) to defensively handle the Boxed-vs-typed quirk in MessagePack-deserialized Galaxy values — depending on the wire encoding the Boolean might come back as System.Boolean or System.Object boxing one. Convert.ToBoolean handles both. Same pattern in OnReadValue's existing usage.
WaitForAsync helper does the loop+budget pattern shared by both tests.
PR 40 is the code side of LMX #5's final two deferred facts. To actually run them green requires re-executing from a normal (non-admin) PowerShell — the elevated-shell skip from PR 39 fires correctly under bash + sc.exe-context (verified). lmx-followups.md #5 updated to note the new facts + the run command + the one remaining genuine follow-up (alarm-condition fact when an alarm-flagged attribute is deployed on TestMachine_001).
Test posture from elevated bash: 7 LiveStackSmokeTests facts discovered (was 5; +2 new), all skip cleanly with the elevation message. Build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:38:34 -04:00
976e73e051 Merge pull request 'Phase 3 PR 39 — LiveStackFixture skip-with-reason for elevated shells' (#38) from phase-3-pr39-elevated-shell-skip into v2 2026-04-18 19:31:30 -04:00
Joseph Doherty
8fb3dbe53b Phase 3 PR 39 — LiveStackFixture pre-flight detect for elevated shell. The OtOpcUaGalaxyHost named-pipe ACL allows the configured SID but explicitly DENIES Administrators per decision #76 / PipeAcl.cs (production-hardening — keeps an admin shell on a deployed box from connecting to the IPC channel without going through the configured service principal). A test process running with a high-integrity elevated token carries the Administrators group in its security context regardless of whose user it 'is', so the deny rule trumps the user's allow and the pipe connect returns UnauthorizedAccessException at the prerequisite-probe stage. Functionally correct but operationally confusing — when this hit during the PR 38 install workflow it took five steps to diagnose ('the user IS in the allow list, why is the pipe denying access?'). The pre-existing ParityFixture (PR 18) already documents this with an explicit early-skip; LiveStackFixture (PR 37) didn't.
PR 39 closes the gap. New IsElevatedAdministratorOnWindows static helper (Windows-only via RuntimeInformation.IsOSPlatform; non-Windows hosts return false and let the prerequisite probe own the skip-with-reason path) checks WindowsPrincipal.IsInRole(WindowsBuiltInRole.Administrator) on the current process token. When true, InitializeAsync short-circuits to a SkipReason that names the cause directly: 'elevated token's Admins group membership trumps the allow rule — re-run from a NORMAL (non-admin) PowerShell window'. Catches and swallows any probe-side exception so a Win32 oddity can't crash the test fixture; failed probe falls through to the regular prerequisite path.
The check fires BEFORE AvevaPrerequisites.CheckAllAsync runs because the prereq probe's own pipe connect hits the same admin-deny and surfaces UnauthorizedAccessException with no context. Short-circuiting earlier saves the 10-second probe + produces a single actionable line.
Tests — verified manually from an elevated bash session against the just-installed OtOpcUaGalaxyHost service: skip message reads 'Test host is running with elevated (Administrators) privileges, but the OtOpcUaGalaxyHost named-pipe ACL explicitly denies Administrators per the IPC security design (decision #76 / PipeAcl.cs). Re-run from a NORMAL (non-admin) PowerShell window — even when your user is already in the pipe's allow list, the elevated token's Admins group membership trumps the allow rule.' Proxy.Tests Unit: 17 pass / 0 fail (unchanged — fixture change is non-breaking; existing tests don't run as admin in normal CI flow). Build clean.
Bonus: gitignored .local/ directory (a previous direct commit on local v2 that I'm now landing here) so per-install secrets like the Galaxy.Host shared-secret file don't leak into the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:17:43 -04:00
Joseph Doherty
a61e637411 Gitignore .local/ directory for dev-only secrets like the Galaxy.Host shared secret. Created during the PR 38 / install-services workflow to keep per-install secrets out of the repo. 2026-04-18 19:15:13 -04:00
e4885aadd0 Merge pull request 'Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish)' (#37) from phase-3-pr38-historyread-servicehandler into v2 2026-04-18 17:53:24 -04:00
Joseph Doherty
52a29100b1 Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish). Wires the OPC UA HistoryRead service through CustomNodeManager2's four protected per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents — each dispatching to the driver's IHistoryProvider capability (PR 35 for ReadAtTime + ReadEvents on top of PR 19-era ReadRaw + ReadProcessed). Was the last missing piece of the end-to-end HistoryRead path: PR 10 + PR 11 shipped the Galaxy.Host IPC contracts, PR 35 surfaced them on IHistoryProvider + GalaxyProxyDriver, but no server-side handler bridged OPC UA HistoryRead service requests onto the capability interface. Now it does.
Per-kind override shape: each hook receives the pre-filtered nodesToProcess list (NodeHandles for nodes this manager claimed), iterates them, resolves handle.NodeId.Identifier to the driver-side full reference string, and dispatches to the right IHistoryProvider method. Write back into the outer results + errors slots at handle.Index (not the local loop counter — nodesToProcess is a filtered subset of nodesToRead, so indexing by the loop counter lands in the wrong slot for mixed-manager batches). WriteResult helper sets both results[i] AND errors[i]; this matters because MasterNodeManager merges them and leaving errors[i] at its default (BadHistoryOperationUnsupported) overrides a Good result with Unsupported on the wire — this was the subtle failure mode that masked a correctly-constructed HistoryData response during debugging. Failure-isolation per node: NotSupportedException from a driver that doesn't implement a particular HistoryProvider method translates to BadHistoryOperationUnsupported in that slot; generic exceptions log and surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch continues unconditionally.
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average / Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum. Null for anything else (e.g. TimeAverage, Interpolative) so the handler surfaces BadAggregateNotSupported at the batch level — per Part 13, one unsupported aggregate means the whole request fails since ReadProcessedDetails carries one aggregate list for all nodes. BuildHistoryData wraps driver DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject; BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the canonical BaseEventType field list (EventId, SourceName, Message, Severity, Time, ReceiveTime — the order OPC UA clients that didn't customize the SelectClause expect). ToDataValue preserves null SourceTimestamp (Galaxy historian rows often carry only ServerTimestamp) — synthesizing a SourceTimestamp would lie about actual sample time.
Two address-space changes were required to make the stack dispatch reach the per-kind hooks at all: (1) historized variables get AccessLevels.HistoryRead added to their AccessLevel byte — the base's early-gate check on (variable.AccessLevel & HistoryRead != 0) was rejecting requests before our override ever ran; (2) the driver-root folder gets EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it (the conventional pattern for alarm-history browse against a driver-owned object). Document the 'set both bits' requirement inline since it's not obvious from the surface API.
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (service-layer per-node result) collides with Core.Abstractions.HistoryReadResult (driver-side samples + continuation point) by type name; the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult' keeps the override signatures unambiguous and the test project applies the mirror pattern for its stub driver impl.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases): MapAggregate translates each supported aggregate NodeId via reflection-backed theory (guards against the stack renaming AggregateFunction_* constants); returns null for unsupported NodeIds (TimeAverage) and null input; BuildHistoryData wraps samples with correct DataValues + SourceTimestamp preservation; BuildHistoryEvent emits the 6-element BaseEventType field list in canonical order (regression guard for a future 'respect the client's SelectClauses' change); null SourceName / Message translate to empty-string Variants (nullable-Variant refactor trap); ToDataValue preserves StatusCode + both timestamps; ToDataValue leaves SourceTimestamp at default when the snapshot omits it. HistoryReadIntegrationTests (5 new Category=Integration): drives a real OPC UA client Session.HistoryRead against a fake HistoryDriver through the running server. Covers raw round-trip (verifies per-node DataValue ordering + values); processed with Average aggregate (captures the driver's received aggregate + interval, asserting MapAggregate routed correctly); unsupported aggregate (TimeAverage → BadAggregateNotSupported); at-time (forwards the per-timestamp list); events (BaseEventType field list shape, SelectClauses populated to satisfy the stack's filter validator). Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit deferred items: continuation-point plumbing (driver returns null today so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents (clients with custom field selections get the canonical BaseEventType layout today).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:50:23 -04:00
19bcf20fbe Merge pull request 'Phase 3 PR 37 — End-to-end live-stack Galaxy smoke test' (#36) from phase-3-pr37-live-stack-smoke into v2 2026-04-18 16:56:50 -04:00
Joseph Doherty
8adc8f5ab8 Phase 3 PR 37 — End-to-end live-stack Galaxy smoke test. Closes the code side of LMX follow-up #5; once OtOpcUaGalaxyHost is installed + started on the dev box, the suite exercises the full topology GalaxyProxyDriver in-process → named-pipe IPC → running OtOpcUaGalaxyHost Windows service → MxAccessGalaxyBackend → live MXAccess runtime → real deployed Galaxy objects. Never spawns the Host process itself — connects to the already-running service per project_galaxy_host_service.md, which is the only way to exercise the production COM-apartment + service-account + pipe-ACL configuration.
LiveStackConfig resolves the pipe name + per-install shared secret from two sources in order: OTOPCUA_GALAXY_PIPE + OTOPCUA_GALAXY_SECRET env vars first (for CI / benchwork overrides), then the service's per-process Environment registry values under HKLM\SYSTEM\CurrentControlSet\Services\OtOpcUaGalaxyHost (what Install-Services.ps1 writes at install time). Registry read requires the test host to run elevated on most boxes — the skip message says so explicitly so operators see the right remediation. Hard-coded secrets are deliberately avoided: the installer generates 32 fresh random bytes per install, a committed secret would diverge from production the moment the service is re-installed.
LiveStackFixture is an IAsyncLifetime that (1) runs AvevaPrerequisites.CheckAllAsync with CheckGalaxyHostPipe=true + CheckHistorian=false — produces a structured PrerequisiteReport whose SkipReason is the exact operator-facing 'here's what you need to fix' text, (2) resolves LiveStackConfig and surfaces a clear skip when the secret isn't discoverable, (3) instantiates GalaxyProxyDriver + calls InitializeAsync (the IPC handshake), capturing a skip with the exception detail + common-cause hints (secret mismatch, SID not in pipe ACL, Host's backend couldn't connect to ZB) rather than letting a NullRef cascade through every subsequent test. SkipIfUnavailable() translates the captured SkipReason into Assert.Skip at the top of every fact so tests read as cleanly-skipped with a visible reason, not silently-passed or crashed.
LiveStackSmokeTests (5 facts, Collection=LiveStack, Category=LiveGalaxy): Fixture_initialized_successfully (cheapest possible end-to-end assertion — if this passes, the IPC handshake worked); Driver_reports_Healthy_after_IPC_handshake (DriverHealth.State post-connect); DiscoverAsync_returns_at_least_one_variable_from_live_galaxy (captures every Variable() call from DiscoverAsync via CapturingAddressSpaceBuilder and asserts > 0 — zero here usually means the Host couldn't read ZB, the skip message names OTOPCUA_GALAXY_ZB_CONN to check); GetHostStatuses_reports_at_least_one_platform (IHostConnectivityProbe surface — zero means the probe loop hasn't fired or no Platform is deployed locally); Can_read_a_discovered_variable_from_live_galaxy (reads the first discovered attribute's full reference, asserts status != BadInternalError — Galaxy's Uncertain-quality-until-first-Engine-scan is intentionally NOT treated as failure since it depends on runtime state that varies across test runs). Read-only by design; writes need an agreed scratch tag to avoid mutating a process-critical attribute — deferred to a follow-up PR that reuses this fixture.
CapturingAddressSpaceBuilder is a minimal IAddressSpaceBuilder that flattens every Variable() call into a list so tests can inspect what discovery produced without booting the full OPC UA node-manager stack; alarm annotation + property calls are no-ops. Scoped private to the test class.
Galaxy.Proxy.Tests csproj gains a ProjectReference to Driver.Galaxy.TestSupport (PR 36) for AvevaPrerequisites. The NU1702 warning about the Host project being net48-referenced-by-net10 is pre-existing from the HostSubprocessParityTests — Proxy.Tests only needs the Host EXE path for that parity scenario, not type surface.
Test run on THIS machine (OtOpcUaGalaxyHost not yet installed): Skipped! Failed 0, Passed 0, Skipped 5 — each skip message includes the full prerequisites report pointing at the missing service. Once the service is installed + started (scripts\install\Install-Services.ps1), the 5 facts will execute against live Galaxy. Proxy.Tests Unit: 17 pass / 0 fail (unchanged — new tests are Category=LiveGalaxy, separate suite). Full Proxy build clean. Memory already captures the 'live tests run via already-running service, don't spawn' convention (project_galaxy_host_service.md).
lmx-followups.md #5 updated: status is 'IN PROGRESS' across PRs 36 + 37 with the explicit remaining work (install + start services, subscribe-and-receive, write round-trip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:49:51 -04:00
261869d84e Merge pull request 'Phase 3 PR 36 — AVEVA prerequisites test-support library' (#35) from phase-3-pr36-aveva-prerequisites into v2 2026-04-18 16:44:41 -04:00
Joseph Doherty
08c90d19fd Phase 3 PR 36 — AVEVA prerequisites test-support library. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport multi-targeted class library (net10.0 + net48 so both the modern and the MXAccess-COM x86 test projects can consume it) that probes every piece of the AVEVA System Platform + OtOpcUa stack a live-Galaxy test depends on and returns a structured PrerequisiteReport. Closes the gap where live-smoke tests silently returned 'unreachable' without telling operators which specific piece failed.
AvevaPrerequisites.CheckAllAsync walks eight probe categories producing PrerequisiteCheck rows each with Name (e.g. 'service:aaBootstrap', 'sql:ZB', 'com:LMXProxy', 'registry:ArchestrA.Framework'), Category (AvevaCoreService / AvevaSoftService / AvevaInstall / MxAccessCom / GalaxyRepository / AvevaHistorian / OtOpcUaService / Environment), Status (Pass / Warn / Fail / Skip), and operator-facing Detail message. Report aggregates them: IsLivetestReady (no Fails anywhere) and IsAvevaSideReady (AVEVA-side categories pass, our v2 services can be absent while still considering the environment AVEVA-ready) so different test tiers can use the right threshold.
Individual probes: ServiceProbe.Check queries the Windows Service Control Manager via System.ServiceProcess.ServiceController — treats DemandStart+Stopped as Warn (NmxSvc is DemandStart by design; master pulls it up) but AutoStart+Stopped as Fail; not-installed is Fail for hard-required services, Warn for soft ones; non-Windows hosts get Skip; transitional states like StartPending get Warn with a 'try again' hint. RegistryProbe reads HKLM\SOFTWARE\WOW6432Node\ArchestrA\{Framework,Framework\Platform,MSIInstall} — Framework key presence + populated InstallPath/RootPath values mean System Platform installed; PfeConfigOptions in the Platform subkey (format 'PlatformId=N,EngineId=N,...') indicates a Platform has been deployed from the IDE (PlatformId=0 means never deployed — MXAccess will connect but every subscription will be Bad quality); RebootRequired='True' under MSIInstall surfaces as a loud warn since post-patch behavior is undefined. MxAccessComProbe resolves the LMXProxy.LMXProxyServer ProgID → CLSID → HKLM\SOFTWARE\Classes\WOW6432Node\CLSID\{guid}\InprocServer32, verifying the registered file exists on disk (catches the orphan-registry case where a previous uninstall left the ProgID registered but the DLL is gone — distinguishes it from the 'totally not installed' case by message); also emits a Warn when the test process is 64-bit (MXAccess COM activation fails with REGDB_E_CLASSNOTREG 0x80040154 regardless of registration, so seeing this warning tells operators why the activation would fail even on a fully-installed machine). SqlProbe tests Galaxy Repository via Microsoft.Data.SqlClient using the Windows-auth localhost connection string the repo code defaults to — distinguishes 'SQL Server unreachable' (connection fails) from 'ZB database does not exist' (SELECT DB_ID('ZB') returns null) because they have different remediation paths (sc.exe start MSSQLSERVER vs. restore from .cab backup); a secondary CheckDeployedObjectCountAsync query on 'gobject WHERE deployed_version > 0' warns when the count is zero because discovery smoke tests will return empty hierarchies. NamedPipeProbe opens a 2s NamedPipeClientStream against OtOpcUaGalaxyHost's pipe ('OtOpcUaGalaxy' per the installer default) — pipe accepting a connection proves the Host service is listening; disconnects immediately so we don't consume a session slot.
Service lists kept as internal static data so tests can inspect + override: CoreServices (aaBootstrap + aaGR + NmxSvc + MSSQLSERVER — hard fail if missing), SoftServices (aaLogger + aaUserValidator + aaGlobalDataCacheMonitorSvr — warn only; stack runs without them but diagnostics/auth are degraded), HistorianServices (aahClientAccessPoint + aahGateway — opt-in via Options.CheckHistorian, only matters for HistoryRead IPC paths), OtOpcUaServices (our OtOpcUaGalaxyHost hard-required for end-to-end live tests + OtOpcUa warn + GLAuth warn). Narrower entry points CheckRepositoryOnlyAsync and CheckGalaxyHostPipeOnlyAsync for tests that only care about specific subsystems — avoid paying the full probe cost on every GalaxyRepositoryLiveSmokeTests fact.
Multi-targeting mechanics: System.ServiceProcess.ServiceController + Microsoft.Win32.Registry are NuGet packages on net10 but in-box BCL references on net48; csproj conditions Package vs Reference by TargetFramework. Microsoft.Data.SqlClient v6 supports both frameworks so single PackageReference. Net48Polyfills.cs provides IsExternalInit shim (records/init-only setters) and SupportedOSPlatformAttribute stub so the same Probe sources compile on both frameworks without per-callsite preprocessor guards — lets Roslyn's platform-compatibility analyzer stay useful on net10 without breaking net48 builds.
Existing GalaxyRepositoryLiveSmokeTests updated to delegate its skip decision to AvevaPrerequisites.CheckRepositoryOnlyAsync (legacy ZbReachableAsync kept as a compatibility adapter so the in-test 'if (!await ZbReachableAsync()) return;' pattern keeps working while the surrounding fixtures gradually migrate to Assert.Skip-with-reason). Slnx file registers the new project.
Tests — AvevaPrerequisitesLiveTests (8 new Integration cases, Category=LiveGalaxy): the helper correctly reports Framework install (registry pass), aaBootstrap Running (service pass), aaGR Running (service pass), MxAccess COM registered (com pass), ZB database reachable (sql pass), deployed-object count > 0 (warn-upgraded-to-pass because this box has 49 objects deployed), the AVEVA side is ready even when our own services (OtOpcUaGalaxyHost) aren't installed yet (IsAvevaSideReady=true), and the helper emits rows for OtOpcUaGalaxyHost + OtOpcUa + GLAuth even when not installed (regression guard — nobody can accidentally ship a check that omits our own services). Full Galaxy.Host.Tests Category=LiveGalaxy suite: 13 pass (5 prior smoke + 8 new prerequisites). Full solution build clean, 0 errors.
What's NOT in this PR: end-to-end Galaxy stack smoke (Proxy → Host pipe → MXAccess → real Galaxy tag). That's the next PR — this one is the gate the end-to-end smoke will call first to produce actionable skip messages instead of silent returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:36:13 -04:00
5cc120d836 Merge pull request 'Phase 3 PR 35 — IHistoryProvider gains ReadAtTime + ReadEvents; Proxy implements both' (#34) from phase-3-pr35-history-readtime-readevents into v2 2026-04-18 16:12:43 -04:00
Joseph Doherty
bf329b05d8 Phase 3 PR 35 — IHistoryProvider gains ReadAtTimeAsync + ReadEventsAsync; GalaxyProxyDriver implements both. Extends Core.Abstractions.IHistoryProvider with two new methods that round out the OPC UA Part 11 HistoryRead surface (HistoryReadAtTime + HistoryReadEvents are the last two modes not covered by the PR 19-era ReadRawAsync + ReadProcessedAsync) and wires GalaxyProxyDriver to call the existing PR-10/PR-11 IPC contracts the Host already implements.
Interface additions use C# default interface implementations that throw NotSupportedException — existing IHistoryProvider implementations keep compiling, only drivers whose backend carries the relevant capability override. This matches the 'capabilities are optional per driver' design already used by IHistoryProvider.ReadProcessedAsync's docs (Modbus / OPC UA Client drivers never had an event historian and the default-throw path lets callers see BadHistoryOperationUnsupported naturally). New HistoricalEvent record models one historian row (EventId, SourceName, EventTimeUtc + ReceivedTimeUtc — process vs historian-persist timestamps, Message, Severity mapped to OPC UA's 1-1000 range); HistoricalEventsResult pairs the event list with a continuation-point token for future batching. Both live in Core.Abstractions so downstream (Proxy, Host, Server) reference a single domain shape — no Shared-contract leak into the driver-facing interface.
GalaxyProxyDriver.ReadAtTimeAsync maps the domain DateTime[] to Unix-ms longs, calls CallAsync on the existing MessageKind.HistoryReadAtTimeRequest, and trusts the Host's one-sample-per-requested-timestamp contract (the Host pads with bad-quality snapshots for timestamps it can't interpolate; re-aligning on the Proxy side would duplicate the Host's interpolation policy logic). ReadEventsAsync does the same for HistoryReadEventsRequest; ToHistoricalEvent translates GalaxyHistoricalEvent (MessagePack-annotated, Unix-ms) to the domain record, explicitly tagging DateTimeKind.Utc on both timestamp fields so downstream serializers (JSON, OPC UA types) don't apply an unexpected local-time offset.
Tests — HistoricalEventMappingTests (3 new Proxy.Tests unit cases): every field maps correctly from wire to domain; null SourceName and null DisplayText preserve through the mapping (system events without a source come out with null so callers can distinguish them from alarm events); both timestamps come out as DateTimeKind.Utc (regression guard against a future refactor using DateTime.FromFileTimeUtc or similar that defaults to Unspecified). Driver.Galaxy.Proxy.Tests Unit suite: 17 pass / 0 fail (14 prior + 3 new). Full solution build clean, 0 errors.
Scope exclusions — DriverNodeManager HistoryRead service-handler wiring (on the OPC UA Server side, where HistoryReadAtTime and HistoryReadEvents service requests land) and the full-loop integration test (OPC UA client → server → IPC → Host → HistorianDataSource → back) are deferred to a focused follow-up PR. The capability surface is the load-bearing change; wiring the service handlers is mechanical in comparison and worth its own PR for reviewability. docs/v2/lmx-followups.md #1 updated with the split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:08:27 -04:00
2584379e75 Merge pull request 'Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin)' (#33) from phase-3-pr34-host-status-publisher-page into v2 2026-04-18 16:04:20 -04:00
Joseph Doherty
ef2a810b2d Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record).
Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:51:55 -04:00
a7764e50f3 Merge pull request 'Phase 3 PR 33 — DriverHostStatus entity + migration (LMX #7 data layer)' (#32) from phase-3-pr33-driverhoststatus-entity into v2 2026-04-18 15:43:37 -04:00
Joseph Doherty
8464e3f376 Phase 3 PR 33 — DriverHostStatus entity + EF migration (data-layer for LMX #7). New DriverHostStatus entity with composite key (NodeId, DriverInstanceId, HostName) persists each server node's per-host connectivity view — one row per (server node, driver instance, probe-reported host), which means a redundant 2-node cluster with one Galaxy driver reporting 3 platforms produces 6 rows because each server node owns its own runtime view of the shared host topology, not 3. Fields: NodeId (64), DriverInstanceId (64), HostName (256 — fits Galaxy FQDNs and Modbus host:port strings), State (DriverHostState enum — Unknown/Running/Stopped/Faulted, persisted as nvarchar(16) via HasConversion<string> so DBAs inspecting the table see readable state names not ordinals), StateChangedUtc + LastSeenUtc (datetime2(3) — StateChangedUtc tracks actual transitions while LastSeenUtc advances on every publisher heartbeat so the Admin UI can flag stale rows from a crashed Server independent of State), Detail (nullable 1024 — exception message from the driver's probe when Faulted, null otherwise).
DriverHostState enum lives in Configuration.Enums/ rather than reusing Core.Abstractions.HostState so the Configuration project stays free of driver-runtime dependencies (it's referenced by both the Admin process and the Server process, so pulling in the driver-abstractions assembly to every Admin build would be unnecessary weight). The server-side publisher hosted service (follow-up PR 34) will translate HostStatusChangedEventArgs.NewState to this enum on every transition.
No foreign key to ClusterNode — a Server may start reporting host status before its ClusterNode row exists (first-boot bootstrap), and we'd rather keep the status row than drop it. The Admin-side service that renders the dashboard will left-join on NodeId when presenting. Two indexes declared: IX_DriverHostStatus_Node drives the per-cluster drill-down (Admin UI joins ClusterNode on ClusterId to pick which NodeIds to fetch), IX_DriverHostStatus_LastSeen drives the stale-row query (now - LastSeen > threshold).
EF migration AddDriverHostStatus creates the table + PK + both indexes. Model snapshot updated. SchemaComplianceTests expected-tables list extended. DriverHostStatusTests (3 new cases, category SchemaCompliance, uses the shared fixture DB): composite key allows same (host, driver) across different nodes AND same (node, host) across different drivers — both real-world cases the publisher needs to support; upsert-in-place pattern (fetch-by-composite-PK, mutate, save) produces one row not two — the pattern the publisher will use; State enum persists as string not int — reading the DB via ADO.NET returns 'Faulted' not '3'.
Configuration.Tests SchemaCompliance suite: 10 pass / 0 fail (7 prior + 3 new). Configuration build clean. No Server or Admin code changes yet — publisher + /hosts page are PR 34.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:38:41 -04:00
a9357600e7 Merge pull request 'Phase 3 PR 32 — Multi-driver integration test' (#31) from phase-3-pr32-multi-driver-integration into v2 2026-04-18 15:34:16 -04:00
Joseph Doherty
2f00c74bbb Phase 3 PR 32 — Multi-driver integration test. Closes LMX follow-up #6 with Server.Tests/MultipleDriverInstancesIntegrationTests.cs: registers two StubDriver instances (alpha + beta) with distinct DriverInstanceIds on one DriverHost, boots the full OpcUaApplicationHost, and exercises three behaviors end-to-end via a real OPC UA client session. (1) Each driver's namespace URI resolves to a distinct index in the client's NamespaceUris (alpha → urn:OtOpcUa:alpha, beta → urn:OtOpcUa:beta) — proves DriverNodeManager's namespaceUris-per-driver base-ctor wiring actually lands two separate INodeManager registrations. (2) Browsing one subtree returns only that driver's folder; the other driver's folder does NOT leak into the wrong subtree. This is the test that catches a cross-driver routing regression the v1 single-driver code path couldn't surface — if a future refactor flattens both drivers into a shared namespace, the 'shouldNotContain' assertion fails cleanly. (3) Reads route to the owning driver by namespace — alpha's ReadAsync returns 42 while beta's returns 99; a misroute would surface as 99 showing up on an alpha node id or vice versa. StubDriver is parameterized on (DriverInstanceId, folderName, readValue) so the same class constructs both instances without copy-paste.
No production code changes — pure additive test. Server.Tests Integration: 3 new tests pass; existing OpcUaServerIntegrationTests stays green (single-driver case still exercised there). Full Server.Tests Unit still 43 / 0. Deferred: multi-driver alarm-event case (two drivers each raising a GalaxyAlarmEvent, assert each condition lands on its owning instance's condition node) — needs a stub IAlarmSource and is worth its own focused PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:29:49 -04:00
5d5e1f9650 Merge pull request 'Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility' (#30) from phase-3-pr31-live-ldap-ad-compat into v2 2026-04-18 15:27:54 -04:00
Joseph Doherty
4886a5783f Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
Active Directory compatibility. LdapOptions xml-doc expanded with a cheat-sheet covering Server (DC FQDN), Port 389 vs 636, UseTls=true under AD LDAP-signing enforcement, dedicated read-only service account DN, sAMAccountName vs userPrincipalName vs cn trade-offs, memberOf DN shape (CN=Group,OU=...,DC=... with the CN= RDN stripped to become the GroupToRole key), and the explicit 'nested groups NOT expanded' call-out (LDAP_MATCHING_RULE_IN_CHAIN / tokenGroups is a future authenticator enhancement, not a config change). docs/security.md §'Active Directory configuration' adds a complete appsettings.json snippet with realistic AD group names (OPCUA-Operators → WriteOperate, OPCUA-Engineers → WriteConfigure, OPCUA-AlarmAck → AlarmAck, OPCUA-Tuners → WriteTune), LDAPS port 636, TLS on, insecure-LDAP off, and operator-facing notes on each field. LdapUserAuthenticatorAdCompatTests (5 unit guards): ExtractFirstRdnValue parses AD-style 'CN=OPCUA-Operators,OU=...,DC=...' DNs correctly (case-preserving — operators' GroupToRole keys stay readable); also handles mixed case and spaces in group names ('Domain Users'); also works against the OpenLDAP ou=<group>,ou=groups shape (GLAuth) so one extractor tolerates both memberOf formats common in the field; EscapeLdapFilter escapes the RFC 4515 injection set (\, *, (, ), \0) so a malicious login like 'admin)(cn=*' can't break out of the filter; default UserNameAttribute regression guard.
Test posture — Server.Tests Unit: 43 pass / 0 fail (38 prior + 5 new AD-compat guards). Server.Tests LiveLdap category: 6 pass / 0 fail against running GLAuth (would skip cleanly without). Server build clean, 0 errors, 0 warnings.
Deferred: the session-identity end-to-end check (drive a full OPC UA UserName session, then read a 'whoami' node to verify the role landed on RoleBasedIdentity). That needs a test-only address-space node and is scoped for a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:23:22 -04:00
d70a2e0077 Merge pull request 'Phase 3 PR 30 — Modbus integration-test project scaffold + DL205 smoke test' (#29) from phase-3-pr30-modbus-integration-scaffold into v2 2026-04-18 15:08:45 -04:00
Joseph Doherty
cb7b81a87a Phase 3 PR 30 — Modbus integration-test project scaffold. New tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests project is the harness modbus-test-plan.md called for: a skip-when-unreachable fixture that TCP-probes a Modbus simulator endpoint (MODBUS_SIM_ENDPOINT, default localhost:502) once per test session, a DL205 device profile stub (single writable holding register at address 100, probe disabled to avoid racing with assertions), and one happy-path smoke test that initializes the real ModbusDriver + real ModbusTcpTransport, writes a known Int16 value, reads it back, and asserts status=0 + value round-trip. No DL205 quirk assertions yet — those land one-per-PR as the user validates each behavior in ModbusPal (word order for 32-bit, register-zero access, coil addressing base, max registers per FC03, response framing under load, exception code on protected-bit coil write).
ModbusSimulatorFixture is a collection fixture so the 2s TCP probe runs once per run, not per test; SkipReason gets a clear operator-facing message ('start ModbusPal or override MODBUS_SIM_ENDPOINT'). Tests call Assert.Skip(sim.SkipReason) rather than silently returning — matches the test-plan convention and reads cleanly in CI logs. DL205Profile.BuildOptions deliberately disables the background probe loop since integration tests drive reads explicitly and the probe would race with assertions. Tag naming uses the DL205_ prefix so filter 'DisplayName~DL205' surfaces device-specific failures at a glance.
Project references: xunit.v3 + Shouldly + Microsoft.NET.Test.Sdk + xunit.runner.visualstudio (matches the existing Driver.Modbus.Tests unit project), project ref to src/Driver.Modbus. Registered in ZB.MOM.WW.OtOpcUa.slnx under tests/. ModbusPal/README.md documents the dev loop (install ModbusPal jar, load profile, start simulator, dotnet test), explains MODBUS_SIM_ENDPOINT override for real-PLC benchwork, and flags DL205.xmpp as the first profile to add in a follow-up PR.
dotnet test run against the scaffold (no simulator running) skips cleanly: 0 failed, 0 passed, 1 skipped, with the SkipReason surfaced. dotnet build clean (0 warnings, 0 errors). Updated docs/v2/modbus-test-plan.md to mark the scaffold PR done and renumbered future PRs from 'PR 27+' to 'PR 31+' to stay in sync with the actual PR chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:02:39 -04:00
901d2b8019 Merge pull request 'Phase 3 PR 29 — Account/session page with roles + capabilities' (#28) from phase-3-pr29-account-page into v2 2026-04-18 14:46:45 -04:00
Joseph Doherty
d5fa1f450e Phase 3 PR 29 — Account/session page expanding the minimal sidebar role display into a dedicated /account route. Shows the authenticated operator's identity (username from ClaimTypes.NameIdentifier, display name from ClaimTypes.Name), their Admin roles as badges (from ClaimTypes.Role), the raw LDAP groups that mapped to those roles (from the 'ldap_group' claim added by Login.razor at sign-in), and a capability table listing each Admin capability with its required role and a Yes/No badge showing whether this session has it. Capability list mirrors the Program.cs authorization policies + each page's [Authorize] attribute so operators can self-service check whether their session has access without trial-and-error navigation — capabilities covered: view clusters + fleet status (all roles), edit configuration drafts (ConfigEditor or FleetAdmin per CanEdit policy), publish generations (FleetAdmin per CanPublish policy), manage certificate trust (FleetAdmin per PR 28 Certificates page attribute), manage external-ID reservations (ConfigEditor or FleetAdmin per Reservations page attribute).
Sidebar's 'Signed in as' line now wraps the display name in a link to /account so the existing sidebar-compact view becomes the entry point for the fuller page — keeps the sign-out button where it was for muscle memory, just adds the detail page one click away. Page is gated with [Authorize] (any authenticated admin) rather than a specific role — the capability table deliberately works for every signed-in user so they can see what they DON'T have access to, which helps them file the right ticket with their LDAP admin instead of getting a plain Access Denied when navigating blindly.
Capability → required-role table is defined as a private readonly record list in the page rather than pulled from a service because it's a UI-presentation concern, not runtime policy state — the runtime policy IS Program.cs's AddAuthorizationBuilder + each page's [Authorize] attribute, and this table just mirrors it for operator readability. Comment on the list reminds future-me to extend it when a new policy or [Authorize] page lands. No behavior change if roles are empty, but the page surfaces a hint ('Sign-in would have been blocked, so if you're seeing this, the session claim is likely stale') that nudges the operator toward signing out + back in.
No new tests added — the page is pure display over claims; its only logic is the 'has-capability' Any-overlap check which is exactly what ASP.NET's [Authorize(Roles=...)] does in-framework, and duplicating that in a unit test would test the framework rather than our code. Admin.Tests Unit stays 23 pass / 0 fail. Admin build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:43:35 -04:00
6fdaee3a71 Merge pull request 'Phase 3 PR 28 — Admin UI cert-trust management page' (#27) from phase-3-pr28-cert-trust into v2 2026-04-18 14:42:52 -04:00
Joseph Doherty
ed88835d34 Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:37:55 -04:00
5389d4d22d Phase 3 PR 27 � Fleet status dashboard page (#26) 2026-04-18 14:07:16 -04:00
Joseph Doherty
b5f8661e98 Phase 3 PR 27 — Fleet status dashboard page. New /fleet route shows per-node apply state (ClusterNodeGenerationState joined with ClusterNode for the ClusterId) in a sortable table with summary cards for Total / Applied / Stale / Failed node counts. Stale detection: LastSeenAt older than 30s triggers a table-warning row class + yellow count card. Failed rows get table-danger + red card. Badge classes per LastAppliedStatus: Applied=bg-success, Failed=bg-danger, Applying=bg-info, unknown=bg-secondary. Timestamps rendered as relative-age strings ('42s ago', '15m ago', '3h ago', then absolute date for >24h). Error column is truncated to 320px with the full message in a tooltip so the table stays readable on wide fleets. Initial data load on OnInitializedAsync; auto-refresh every 5s via a Timer that calls InvokeAsync(RefreshAsync) — matches the FleetStatusPoller's 5s cadence so the dashboard sees the most recent state without polling ahead of the broadcaster. A Refresh button also kicks a manual reload; _refreshing gate prevents double-runs when the timer fires during an in-flight query. IServiceScopeFactory (matches FleetStatusPoller's pattern) creates a fresh DI scope per refresh so the per-page DbContext can't race the timer with the render thread; no new DI registrations needed. Live SignalR hub push is deliberately deferred to a follow-up PR — the existing FleetStatusHub + NodeStateChangedMessage already works for external JavaScript clients; wiring an in-process Blazor Server consumer adds HubConnectionBuilder plumbing that's worth its own focused change. Sidebar link added to MainLayout between Overview and Clusters. Full Admin.Tests Unit suite 14 pass / 0 fail — unchanged, no tests regressed. Full Admin build clean (0 errors, 0 warnings). Closes the 'no per-driver dashboard' gap from lmx-followups item #7 at the fleet level; per-host (platform/engine/Modbus PLC) granularity still needs a dedicated page that consumes IHostConnectivityProbe.GetHostStatuses from the Server process — that's the live-SignalR follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:12:31 -04:00