Commit Graph

165 Commits

Author SHA1 Message Date
Joseph Doherty
02fccbc762 Phase 3 PR 43 — followup commit: validate pymodbus simulator end-to-end + fix three real bugs surfaced by running it. winget-installed Python 3.12.10 + pip-installed pymodbus[simulator]==3.13.0 on the dev box; both profiles boot cleanly, the integration-suite smoke test passes against either profile.
Three substantive issues caught + fixed during the validation pass:
1. pymodbus rejects unknown keys at device-list / setup level. My PR 43 commit had `_layout_note`, `_uint16_layout`, `_bits_layout`, `_write_note` device-level JSON-comment fields that crashed pymodbus startup with `INVALID key in setup`. Removed all device-level _* fields. Inline `_quirk` keys WITHIN individual register entries are tolerated by pymodbus 3.13.0 — kept those in dl205.json since they document the byte math per quirk and the README + git history aren't enough context for a hand-author reading raw integer values. Documented the constraint in the top-level _comment of each profile.
2. pymodbus rejects sweeping `write` ranges that include any cell not assigned a type. My initial standard.json had `write: [[0, 2047]]` but only seeded HR[0..31] + HR[100] + HR[200..209] + bits[1024..1109] — pymodbus blew up on cell 32 (gap between HR[31] and HR[100]). Fixed by listing per-block write ranges that exactly mirror the seeded ranges. Same fix in dl205.json (was `[[0, 16383]]`).
3. pymodbus simulator stores all 4 standard Modbus tables in ONE underlying cell array — each cell can only be typed once (BITS or UINT16, not both). My initial standard.json had `bits[0..31]` AND `uint16[0..31]` overlapping at the same addresses; pymodbus crashed with `ERROR "uint16" <Cell> used`. Fixed by relocating coils to address 1024+, well clear of the uint16 entries at 0..209. Documented the layout constraint in the standard.json top-level _comment.
Substantive driver bug fixed: ModbusTcpTransport.ConnectAsync was using `new TcpClient()` (default constructor — dual-stack, IPv6 first) then `ConnectAsync(host, port)` with the user's hostname. .NET's TcpClient default-resolves "localhost" to ::1 first, fails to connect to pymodbus (which binds 0.0.0.0 IPv4-only), and only then retries IPv4 — the failure surfaces as the entire ConnectAsync timeout (2s by default) before the IPv4 attempt even starts. PR 30's smoke test silently SKIPPED because the fixture's TCP probe hit the same dual-stack ordering and timed out. Both fixed: ModbusSimulatorFixture probe now resolves Dns.GetHostAddresses, prefers AddressFamily.InterNetwork, dials IPv4 explicitly. ModbusTcpTransport does the same — resolves first, prefers IPv4, falls back to whatever Dns returns (handles IPv6-only hosts in the future). This is a real production-readiness fix because most Modbus PLCs are IPv4-only — a generic dual-stack TcpClient would burn the entire connect timeout against any IPv4-only PLC, masquerading as a connection failure when the PLC is actually fine.
Smoke-test address shifted HR[100] -> HR[200]. Standard.json's HR[100] is the auto-incrementing register that drives subscribe-and-receive tests, so write-then-read against it would race the increment. HR[200] is the first cell of a writable scratch range present in BOTH simulator profiles. DL205Profile.cs xml-doc updated to explain the shift; tag name "DL205_Smoke_HReg100" -> "Smoke_HReg200" + smoke test references updated. dl205.json gains a matching scratch HR[200..209] range so the smoke test runs identically against either profile.
Validation matrix:
- standard.json boot: clean (TCP 5020 listening within ~3s of pymodbus.simulator launch).
- dl205.json boot: clean.
- pymodbus client direct FC06 to HR[200]=1234 + FC03 read: round-trip OK.
- raw-bytes PowerShell TcpClient FC06 + 12-byte response: matches FC06 spec (echo of address + value).
- DL205SmokeTest against standard.json: 1/1 pass (was failing as 'BadInternalError' due to the dual-stack timeout + tag-name typo — both fixed).
- DL205SmokeTest against dl205.json: 1/1 pass.
- Modbus.Tests Unit suite: 52/52 pass — dual-stack transport fix is non-breaking.
- Solution build clean.
Memory + future-PR setup: pymodbus install + activation pattern is now bullet-pointed at the top of Pymodbus/README.md so future PRs (the per-quirk DL205_<behavior> tests in PR 44+) don't have to repeat the trial-and-error of getting the simulator + integration tests cooperating. The three bugs above are documented inline in the JSON profiles + ModbusTcpTransport so they don't bite again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:14:02 -04:00
Joseph Doherty
a05b84858d Phase 3 PR 43 — Swap ModbusPal to pymodbus for the integration-test simulator. Replaces the .xmpp profiles shipped in PR 42 with pymodbus 3.13.0 ModbusSimulatorServer JSON configs in tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/Pymodbus/. Substantive reasons for the swap (rationale block in the test-plan doc): ModbusPal 1.6b is abandoned (last release ~2019), Java GUI-only with no headless mode in the official JAR, and only exposes 2 of the 4 standard Modbus tables (holding_registers + coils — no input_registers, no discrete_inputs). pymodbus is current stable, pure Python CLI (pip install pymodbus[simulator]==3.13.0), exposes all four tables, has built-in declarative actions (increment / random / timestamp / uptime) for dynamic registers, supports custom Python actions for anything more complex, and ships an optional aiohttp-based web UI / REST API for live inspection. Pip-installable on Windows; sidesteps the privileged-port admin requirement by defaulting to TCP 5020.
ModbusSimulatorFixture default port bumped from 502 to 5020 to match the pymodbus convention. Override via MODBUS_SIM_ENDPOINT for a real PLC on its native 502. Skip-message updated to point at the new Pymodbus\serve.ps1 wrapper instead of 'start ModbusPal'. csproj <None Update> rule swapped from ModbusPal/** to Pymodbus/** so the new JSON profiles + serve.ps1 + README copy to test-output as PreserveNewest.
standard.json — generic Modbus TCP server, slave id 1, port 5020, shared blocks=false (independent coils + HR address spaces, more textbook-PLC-like). HR[0..31] seeded with address-as-value via per-register uint16 entries, HR[100] auto-increments via the built-in increment action with parameters minval=0/maxval=65535 (drives subscribe-and-receive integration tests so they have a register that ticks without a write — pymodbus's increment ticks per-access not wall-clock, which is good enough for a 250ms-poll test), HR[200..209] scratch range left at 0 for write tests, coils 0..31 alternating, coils 100..109 scratch. write list covers 0..1023 so any test address is mutable.
dl205.json — AutomationDirect DirectLOGIC DL205/DL260 quirk simulator, slave id 1, port 5020, shared blocks=true (matches DL series memory model where coils/DI/HR overlay the same word address space). Each quirky register seeded with the pre-computed raw uint16 value documented in docs/v2/dl205.md, with an inline _quirk JSON-comment naming the behavior so future-me reading the file knows why HR[1040]=25928 means 'H' lo / 'e' hi (the user's headline string-byte-order finding). Encoded quirks: V0 marker at HR[0]=0xCAFE; V2000 at HR[1024]=0x2000; V40400 at HR[8448]=0x4040; 'Hello' string at HR[1040..1042] first-char-low-byte; Float32 1.5f at HR[1056..1057] in CDAB word order (low word first); BCD register at HR[1072]=0x1234; FC03-128-cap block at HR[1280..1407]; Y0/C0 coil markers at 2048/3072; scratch C-relays at 4000..4007.
serve.ps1 wrapper — pwsh script with a -Profile {standard|dl205} parameter switch. Validates pymodbus.simulator is on PATH (clearer message than the raw CommandNotFoundException), validates the profile JSON exists, builds the right --modbus_server/--modbus_device/--json_file/--http_port arg list, and execs pymodbus.simulator in the foreground. -HttpPort 0 disables the web UI. Foreground exec lets the operator Ctrl+C to stop without an extra control script.
README.md fully rewritten for pymodbus: install command (pip install 'pymodbus[simulator]==3.13.0' — pinned for reproducibility, [simulator] extra pulls aiohttp), per-profile reference tables, the same DL205 quirk → register table from PR 42 but adjusted for pymodbus paths, what's-NEW-vs-ModbusPal section (all four tables, raw uint16 seeding, declarative actions, custom Python action modules, headless, web UI, maintained), trade-offs section (float32-as-two-uint16s for explicit CDAB control, increment ticks per-access not wall-clock, shared-blocks mode for DL205 vs separate for Standard), file-format quick reference for hand-authoring more profiles. References pinned to the pymodbus readthedocs simulator/config + REST API pages.
docs/v2/modbus-test-plan.md harness section rewritten with the swap rationale; PR-history list updated to mark PR 42 SUPERSEDED by PR 43 and call out PR 44+ as the per-quirk implementation track. Test-conventions bullet about 'don't depend on ModbusPal state between tests' generalized to 'don't depend on simulator state' and a note added that pymodbus's REST API can reset state between facts if a test ever needs it.
DL205Profile.cs and DL205SmokeTests.cs xml-doc updated to reference pymodbus / dl205.json instead of ModbusPal / DL205.xmpp.
Functional validation deferred — Python isn't installed on this dev box (winget search returned no matches for Python.Python.3 exact). JSON parses structurally (PowerShell ConvertFrom-Json clean on both files), build clean, .json + serve.ps1 + README all copy to test-output as expected. User installs pymodbus when they want to actually run the simulator end-to-end; if pymodbus rejects the config the README's reference link to pymodbus's simulator/config schema doc is the right next stop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 20:35:26 -04:00
c59ac9e52d Merge pull request 'Phase 3 PR 42 — ModbusPal simulator profiles for Standard + DL205/DL260' (#41) from phase-3-pr42-modbuspal-profiles into v2 2026-04-18 20:12:39 -04:00
Joseph Doherty
02a0e8efd1 Phase 3 PR 42 — ModbusPal simulator profiles for Standard Modbus + DL205/DL260 quirks. Two hand-authored .xmpp profiles in tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/ModbusPal/ that integration tests load via the GUI to drive the suite without a real PLC. Both well-formed XML (verified via PowerShell [xml] cast); both copied to test-output as PreserveNewest content per the existing csproj rule.
Standard.xmpp — generic Modbus TCP server on port 502, slave id 1. HR[0..31] seeded with address-as-value (HR[5]=5 — easy mental map for diagnostics), HR[100] auto-incrementing via a 1Hz LinearGenerator binding (drives subscribe-and-receive integration tests so they have a register that actually changes without a write), HR[200..209] scratch range for write-roundtrip tests, coils 0..31 alternating on/off, coils 100..109 scratch. The Tick automation runs 0..65535 over 60s looping; bound to HR[100] via Binding_SINT16 — slow enough that a 250ms-poll integration test sees discrete jumps, fast enough that a 5s subscribe test sees several change notifications.
DL205.xmpp — AutomationDirect DirectLOGIC DL205/DL260 quirk simulator on port 502, slave id 1, modeling the behaviors documented in docs/v2/dl205.md as concrete register values so DL205 integration tests can assert each quirk WITHOUT a live PLC. Per-quirk encoding: V0 marker at HR[0]=0xCAFE proves register 0 is valid (rejects-register-0 rumour disproved); V2000 marker at HR[1024]=0x2000 proves V-memory octal-to-decimal mapping; V40400 marker at HR[8448]=0x4040 proves V40400→PDU 0x2100 (NOT register 0, contrary to the widespread shorthand); 'Hello' string at HR[1040..1042] packed first-char-low-byte (HR[1040]=0x6548 = 'H' lo + 'e' hi, HR[1041]=0x6C6C, HR[1042]=0x006F) — the headline string-byte-order quirk the user flagged; Float32 1.5f at HR[1056..1057] in CDAB word order (low word first: 0, then 0x3FC0); BCD register at HR[1072]=0x1234 representing decimal 1234 in BCD nibbles (NOT binary 0x04D2); 128-register block at HR[1280..1407] for FC03-128-cap testing; Y0 marker at coil 2048, C0 marker at coil 3072, scratch C-coils at 4000..4007 for write tests.
Critical limitation flagged inline + in README: ModbusPal 1.6b CANNOT represent the DL205 quirks semantically — it has no string binding, no BCD binding, no arbitrary-byte-layout binding (only SINT16/SINT32/FLOAT32 with word-order). So every DL205 quirk is encoded as a pre-computed raw 16-bit integer with the math worked out in inline comments above each register. Becomes unreadable past ~50 quirky registers; the README's 'alternatives' section recommends switching to pymodbus when that threshold approaches (pymodbus's ModbusSimulatorServer has first-class headless + scriptable callbacks for byte-level layouts).
Other ModbusPal 1.6b limitations called out in README: only holding_registers + coils sections in the official build (no input_registers / discrete_inputs — DL260 X-input markers can't be encoded faithfully here, FC02/FC04 tests wait for a fork or pymodbus); abandoned project (last release 1.6b, active forks at SCADA-LTS/ModbusPal, ControlThings-io/modbuspal, mrhenrike/ModbusPalEnhanced); no headless mode in the official JAR (-loadFile / -hide flags only in source-built forks); CVE-2018-10832 XXE on .xmpp import (don't import untrusted profiles — the in-repo ones are author-controlled).
README.md updated with: per-profile description tables, getting-started (download jar + java -jar + GUI File>Load>Run), MODBUS_SIM_ENDPOINT env-var override doc, two reference tables documenting which HR / coil address encodes which DL205 quirk + which test name asserts it (the same DL205_<behavior> naming convention from docs/v2/modbus-test-plan.md), 4-row alternatives comparison (pymodbus / diagslave / ModbusMechanic / ModRSsim2) for when ModbusPal can no longer carry the load, and a quick-reference XML format table at the bottom for future-me hand-authoring more profiles.
Pure documentation + test-asset PR — no code changes. The integration tests that consume these profiles (the actual DL205_<behavior> facts) land one at a time in PR 43+ as user validates each quirk via ModbusPal on the bench.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 20:05:20 -04:00
7009483d16 Merge pull request 'Phase 3 PR 41 — Document AutomationDirect DL205 / DL260 Modbus quirks' (#40) from phase-3-pr41-dl205-quirks-doc into v2 2026-04-18 19:52:20 -04:00
Joseph Doherty
9de96554dc Phase 3 PR 41 — Document AutomationDirect DL205 / DL260 Modbus quirks. Adds docs/v2/dl205.md (~300 lines, 8 H2 sections, primary-source citations) covering every place the DL205/DL260 family diverges from textbook Modbus or has non-obvious behavior a generic client gets wrong. Replaces the placeholder _pending_ list in modbus-test-plan.md with a confirmed-behaviors table that doubles as the integration-test roadmap.
The user explicitly flagged that DL205/DL260 strings don't follow Modbus convention; research turned up that and a lot more. Headline findings:
String packing — TWO chars per V-memory register but the FIRST char is in the LOW byte (opposite of the big-endian Modbus convention generic drivers default to). 'Hello' in V2000 reads back as 'eHll o\0' on a textbook decoder. Kepware's DirectLogic driver exposes a per-tag 'String Byte Order = Low/High' toggle specifically for this; we'll need the same. Null-terminated, no length prefix, no dedicated KSTR address space — strings live wherever ladder allocates them in V-memory.
V-memory addressing — DirectLOGIC's native V-memory is OCTAL (V2000, V40400) but Modbus is decimal. The CPU translates: V2000 octal = decimal 1024 = Modbus PDU 0x0400. The widespread 'V40400 = register 0' shorthand is wrong on modern firmware (that was DL05/DL06 relative mode); on H2-ECOM100 absolute mode (factory default) V40400 = PDU 0x2100. We'd surface this with an address-format helper in the device profile so operators write V2000 instead of computing 1024 by hand.
Word order CDAB for all 32-bit values — DL205 and DL260 agree, ECOM modules don't re-swap. Already supported via ModbusByteOrder.WordSwap; just needs to be the default in the DL205 profile.
BCD-as-default numeric storage — bit one I didn't expect. DirectLOGIC stores 'V2000 = 1234' as 0x1234 on the wire (BCD nibbles), not as 0x04D2 (decimal 1234). IEEE 754 Float32 only works when ladder used the explicit R type (LDR/OUTR instructions). We need a new decoder mode for BCD-encoded registers — current code assumes binary integers.
FC quantity caps — FC03/04 cap at 128 (above spec's 125 — Bonus territory, current code already respects 125), FC16 caps at 100 (BELOW spec's 123 — important bulk-write batching gotcha). Quantity overrun returns exception 03 IllegalDataValue.
Coil/discrete mappings — DL260: X0->discrete input 0, Y0->coil 2048, C0->coil 3072. SP specials at discrete input 1024-1535 RO. These are CPU-wired constants and cannot be remapped; need to be hardcoded in the DL205/DL260 device profile.
Register 0 — accepted on DL205/DL260 with ECOM in absolute mode, contrary to the widespread internet claim that 'DirectLOGIC rejects register 0'. That rumour was an older DL05/DL06 relative-mode artefact. Our ModbusProbeOptions.ProbeAddress default of 0 is therefore safe for DL205/DL260.
Exception codes — only the standard 01-04. Write-to-protected-bit returns 02 on newer firmware, 04 on older (firmware-transition revision unconfirmed); driver should map both to BadNotWritable. No proprietary exception codes.
Behavioral oddities — H2-ECOM100 accepts MAX 4 simultaneous TCP connections (5th refused at TCP accept). No TCP keepalive (intermediate NAT/firewall drops idle sockets after 2-5 min — periodic probe required). No mid-stream resync on malformed MBAP — driver must reconnect + replay. TxId-drop-under-load forum rumour is unconfirmed; our single-flight + TxId-match guard handles it either way.
Each H2 section ends with the integration-test names we'd ship per the modbus-test-plan.md DL205_<behavior> convention — twelve named test slots ready for PR 42+ to fill in one at a time. References (8) cited inline, primarily D2-USER-M, HA-ECOM-M, and the Kepware DirectLogic Ethernet driver manual which documents these vendor quirks explicitly because they have to cope with them.
modbus-test-plan.md DL205 section rewritten as a priority-ordered table with three columns (quirk / driver impact / test name), pointing the reader at dl205.md for the full reference. Operator-reported items separated into a tail subsection so future-me knows which behaviors are documented vs reproduced-on-hardware.
Pure documentation PR — no code changes. The actual driver work (string-byte-order option, BCD decoder mode, V-memory address helper, FC16 cap-per-device-family, multi-client TCP handling) lands one PR per quirk in PR 42+ as ModbusPal validation completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:49:35 -04:00
af35fac0ef Merge pull request 'Phase 3 PR 40 — LiveStack write + subscribe tests against TestMachine_001' (#39) from phase-3-pr40-livestack-write-subscribe into v2 2026-04-18 19:41:55 -04:00
Joseph Doherty
aa8834a231 Phase 3 PR 40 — LiveStackSmokeTests: write-roundtrip + subscribe-receives-OnDataChange against the live Galaxy. Finishes LMX #5 by exercising the IWritable + ISubscribable capability paths end-to-end through the Proxy → OtOpcUaGalaxyHost service → MXAccess → real Galaxy.
Two new facts target DelmiaReceiver_001.TestAttribute — the writable Boolean UDA on the TestMachine_001 hierarchy in this dev Galaxy. The user nominated TestMachine_001 (the deployed test-target object) as a scratch surface for live testing; ZB query showed DelmiaReceiver_001 carries one dynamic_attribute named TestAttribute (mx_data_type=1=Boolean, lock_type=0=writable, security_classification=1=Operate). Naming makes the intent obvious — the attribute exists for exactly this kind of integration testing — and Boolean keeps the assertions simple (invert, write, read back).
Write_then_read_roundtrips_a_writable_Boolean_attribute_on_TestMachine_001: reads the current value as the baseline (Galaxy may return Uncertain quality until the Engine has scanned the attribute at least once — we don't read into a typed bool until Status is Good), inverts it, writes via IWritable, then polls reads in a 5s loop until either the new value comes back or the budget expires. The scan-window poll (rather than a single read after a fixed delay) accommodates Galaxy's variable scan latency on a fresh service start. Restore-on-finally writes the original value back so re-running the test doesn't accumulate a flipped TestAttribute on the dev box (Galaxy holds UDA values across runs since they're deployed). Best-effort restore — swallows exceptions so a failure in restore doesn't mask the primary assertion.
Subscribe_fires_OnDataChange_with_initial_value_then_again_after_a_write: subscribes to the same attribute with a 250ms publishing interval, captures every OnDataChange notification onto a thread-safe ConcurrentQueue (MXAccess advisory fires on its own thread per Galaxy's COM apartment model — must not block it), waits up to 5s for the initial-value callback (per ISubscribable's contract: 'driver MAY fire OnDataChange immediately with the current value'), records the queue depth as a baseline, writes the toggled value, waits up to 8s for at least one MORE notification, then searches the queue tail for the notification carrying the toggled value (initial value may appear multiple times before the write commits — looking at the tail finds the post-write delta even if the queue grew during the wait window). Unsubscribes on finally + restores baseline.
Both tests use Convert.ToBoolean(value ?? false) to defensively handle the Boxed-vs-typed quirk in MessagePack-deserialized Galaxy values — depending on the wire encoding the Boolean might come back as System.Boolean or System.Object boxing one. Convert.ToBoolean handles both. Same pattern in OnReadValue's existing usage.
WaitForAsync helper does the loop+budget pattern shared by both tests.
PR 40 is the code side of LMX #5's final two deferred facts. To actually run them green requires re-executing from a normal (non-admin) PowerShell — the elevated-shell skip from PR 39 fires correctly under bash + sc.exe-context (verified). lmx-followups.md #5 updated to note the new facts + the run command + the one remaining genuine follow-up (alarm-condition fact when an alarm-flagged attribute is deployed on TestMachine_001).
Test posture from elevated bash: 7 LiveStackSmokeTests facts discovered (was 5; +2 new), all skip cleanly with the elevation message. Build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:38:34 -04:00
976e73e051 Merge pull request 'Phase 3 PR 39 — LiveStackFixture skip-with-reason for elevated shells' (#38) from phase-3-pr39-elevated-shell-skip into v2 2026-04-18 19:31:30 -04:00
Joseph Doherty
8fb3dbe53b Phase 3 PR 39 — LiveStackFixture pre-flight detect for elevated shell. The OtOpcUaGalaxyHost named-pipe ACL allows the configured SID but explicitly DENIES Administrators per decision #76 / PipeAcl.cs (production-hardening — keeps an admin shell on a deployed box from connecting to the IPC channel without going through the configured service principal). A test process running with a high-integrity elevated token carries the Administrators group in its security context regardless of whose user it 'is', so the deny rule trumps the user's allow and the pipe connect returns UnauthorizedAccessException at the prerequisite-probe stage. Functionally correct but operationally confusing — when this hit during the PR 38 install workflow it took five steps to diagnose ('the user IS in the allow list, why is the pipe denying access?'). The pre-existing ParityFixture (PR 18) already documents this with an explicit early-skip; LiveStackFixture (PR 37) didn't.
PR 39 closes the gap. New IsElevatedAdministratorOnWindows static helper (Windows-only via RuntimeInformation.IsOSPlatform; non-Windows hosts return false and let the prerequisite probe own the skip-with-reason path) checks WindowsPrincipal.IsInRole(WindowsBuiltInRole.Administrator) on the current process token. When true, InitializeAsync short-circuits to a SkipReason that names the cause directly: 'elevated token's Admins group membership trumps the allow rule — re-run from a NORMAL (non-admin) PowerShell window'. Catches and swallows any probe-side exception so a Win32 oddity can't crash the test fixture; failed probe falls through to the regular prerequisite path.
The check fires BEFORE AvevaPrerequisites.CheckAllAsync runs because the prereq probe's own pipe connect hits the same admin-deny and surfaces UnauthorizedAccessException with no context. Short-circuiting earlier saves the 10-second probe + produces a single actionable line.
Tests — verified manually from an elevated bash session against the just-installed OtOpcUaGalaxyHost service: skip message reads 'Test host is running with elevated (Administrators) privileges, but the OtOpcUaGalaxyHost named-pipe ACL explicitly denies Administrators per the IPC security design (decision #76 / PipeAcl.cs). Re-run from a NORMAL (non-admin) PowerShell window — even when your user is already in the pipe's allow list, the elevated token's Admins group membership trumps the allow rule.' Proxy.Tests Unit: 17 pass / 0 fail (unchanged — fixture change is non-breaking; existing tests don't run as admin in normal CI flow). Build clean.
Bonus: gitignored .local/ directory (a previous direct commit on local v2 that I'm now landing here) so per-install secrets like the Galaxy.Host shared-secret file don't leak into the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:17:43 -04:00
Joseph Doherty
a61e637411 Gitignore .local/ directory for dev-only secrets like the Galaxy.Host shared secret. Created during the PR 38 / install-services workflow to keep per-install secrets out of the repo. 2026-04-18 19:15:13 -04:00
e4885aadd0 Merge pull request 'Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish)' (#37) from phase-3-pr38-historyread-servicehandler into v2 2026-04-18 17:53:24 -04:00
Joseph Doherty
52a29100b1 Phase 3 PR 38 — DriverNodeManager HistoryRead override (LMX #1 finish). Wires the OPC UA HistoryRead service through CustomNodeManager2's four protected per-kind hooks — HistoryReadRawModified / HistoryReadProcessed / HistoryReadAtTime / HistoryReadEvents — each dispatching to the driver's IHistoryProvider capability (PR 35 for ReadAtTime + ReadEvents on top of PR 19-era ReadRaw + ReadProcessed). Was the last missing piece of the end-to-end HistoryRead path: PR 10 + PR 11 shipped the Galaxy.Host IPC contracts, PR 35 surfaced them on IHistoryProvider + GalaxyProxyDriver, but no server-side handler bridged OPC UA HistoryRead service requests onto the capability interface. Now it does.
Per-kind override shape: each hook receives the pre-filtered nodesToProcess list (NodeHandles for nodes this manager claimed), iterates them, resolves handle.NodeId.Identifier to the driver-side full reference string, and dispatches to the right IHistoryProvider method. Write back into the outer results + errors slots at handle.Index (not the local loop counter — nodesToProcess is a filtered subset of nodesToRead, so indexing by the loop counter lands in the wrong slot for mixed-manager batches). WriteResult helper sets both results[i] AND errors[i]; this matters because MasterNodeManager merges them and leaving errors[i] at its default (BadHistoryOperationUnsupported) overrides a Good result with Unsupported on the wire — this was the subtle failure mode that masked a correctly-constructed HistoryData response during debugging. Failure-isolation per node: NotSupportedException from a driver that doesn't implement a particular HistoryProvider method translates to BadHistoryOperationUnsupported in that slot; generic exceptions log and surface BadInternalError; unresolvable NodeIds get BadNodeIdUnknown. The batch continues unconditionally.
Aggregate mapping: MapAggregate translates ObjectIds.AggregateFunction_Average / Minimum / Maximum / Total / Count to the driver's HistoryAggregateType enum. Null for anything else (e.g. TimeAverage, Interpolative) so the handler surfaces BadAggregateNotSupported at the batch level — per Part 13, one unsupported aggregate means the whole request fails since ReadProcessedDetails carries one aggregate list for all nodes. BuildHistoryData wraps driver DataValueSnapshots as Opc.Ua.HistoryData in an ExtensionObject; BuildHistoryEvent wraps HistoricalEvents as Opc.Ua.HistoryEvent with the canonical BaseEventType field list (EventId, SourceName, Message, Severity, Time, ReceiveTime — the order OPC UA clients that didn't customize the SelectClause expect). ToDataValue preserves null SourceTimestamp (Galaxy historian rows often carry only ServerTimestamp) — synthesizing a SourceTimestamp would lie about actual sample time.
Two address-space changes were required to make the stack dispatch reach the per-kind hooks at all: (1) historized variables get AccessLevels.HistoryRead added to their AccessLevel byte — the base's early-gate check on (variable.AccessLevel & HistoryRead != 0) was rejecting requests before our override ever ran; (2) the driver-root folder gets EventNotifiers.HistoryRead | SubscribeToEvents so HistoryReadEvents can target it (the conventional pattern for alarm-history browse against a driver-owned object). Document the 'set both bits' requirement inline since it's not obvious from the surface API.
OpcHistoryReadResult alias: Opc.Ua.HistoryReadResult (service-layer per-node result) collides with Core.Abstractions.HistoryReadResult (driver-side samples + continuation point) by type name; the alias 'using OpcHistoryReadResult = Opc.Ua.HistoryReadResult' keeps the override signatures unambiguous and the test project applies the mirror pattern for its stub driver impl.
Tests — DriverNodeManagerHistoryMappingTests (12 new Category=Unit cases): MapAggregate translates each supported aggregate NodeId via reflection-backed theory (guards against the stack renaming AggregateFunction_* constants); returns null for unsupported NodeIds (TimeAverage) and null input; BuildHistoryData wraps samples with correct DataValues + SourceTimestamp preservation; BuildHistoryEvent emits the 6-element BaseEventType field list in canonical order (regression guard for a future 'respect the client's SelectClauses' change); null SourceName / Message translate to empty-string Variants (nullable-Variant refactor trap); ToDataValue preserves StatusCode + both timestamps; ToDataValue leaves SourceTimestamp at default when the snapshot omits it. HistoryReadIntegrationTests (5 new Category=Integration): drives a real OPC UA client Session.HistoryRead against a fake HistoryDriver through the running server. Covers raw round-trip (verifies per-node DataValue ordering + values); processed with Average aggregate (captures the driver's received aggregate + interval, asserting MapAggregate routed correctly); unsupported aggregate (TimeAverage → BadAggregateNotSupported); at-time (forwards the per-timestamp list); events (BaseEventType field list shape, SelectClauses populated to satisfy the stack's filter validator). Server.Tests Unit: 55 pass / 0 fail (43 prior + 12 new mapping). Server.Tests Integration: 14 pass / 0 fail (9 prior + 5 new history). Full solution build clean, 0 errors.
lmx-followups.md #1 updated to 'DONE (PRs 35 + 38)' with two explicit deferred items: continuation-point plumbing (driver returns null today so pass-through is fine) and per-SelectClause evaluation in HistoryReadEvents (clients with custom field selections get the canonical BaseEventType layout today).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:50:23 -04:00
19bcf20fbe Merge pull request 'Phase 3 PR 37 — End-to-end live-stack Galaxy smoke test' (#36) from phase-3-pr37-live-stack-smoke into v2 2026-04-18 16:56:50 -04:00
Joseph Doherty
8adc8f5ab8 Phase 3 PR 37 — End-to-end live-stack Galaxy smoke test. Closes the code side of LMX follow-up #5; once OtOpcUaGalaxyHost is installed + started on the dev box, the suite exercises the full topology GalaxyProxyDriver in-process → named-pipe IPC → running OtOpcUaGalaxyHost Windows service → MxAccessGalaxyBackend → live MXAccess runtime → real deployed Galaxy objects. Never spawns the Host process itself — connects to the already-running service per project_galaxy_host_service.md, which is the only way to exercise the production COM-apartment + service-account + pipe-ACL configuration.
LiveStackConfig resolves the pipe name + per-install shared secret from two sources in order: OTOPCUA_GALAXY_PIPE + OTOPCUA_GALAXY_SECRET env vars first (for CI / benchwork overrides), then the service's per-process Environment registry values under HKLM\SYSTEM\CurrentControlSet\Services\OtOpcUaGalaxyHost (what Install-Services.ps1 writes at install time). Registry read requires the test host to run elevated on most boxes — the skip message says so explicitly so operators see the right remediation. Hard-coded secrets are deliberately avoided: the installer generates 32 fresh random bytes per install, a committed secret would diverge from production the moment the service is re-installed.
LiveStackFixture is an IAsyncLifetime that (1) runs AvevaPrerequisites.CheckAllAsync with CheckGalaxyHostPipe=true + CheckHistorian=false — produces a structured PrerequisiteReport whose SkipReason is the exact operator-facing 'here's what you need to fix' text, (2) resolves LiveStackConfig and surfaces a clear skip when the secret isn't discoverable, (3) instantiates GalaxyProxyDriver + calls InitializeAsync (the IPC handshake), capturing a skip with the exception detail + common-cause hints (secret mismatch, SID not in pipe ACL, Host's backend couldn't connect to ZB) rather than letting a NullRef cascade through every subsequent test. SkipIfUnavailable() translates the captured SkipReason into Assert.Skip at the top of every fact so tests read as cleanly-skipped with a visible reason, not silently-passed or crashed.
LiveStackSmokeTests (5 facts, Collection=LiveStack, Category=LiveGalaxy): Fixture_initialized_successfully (cheapest possible end-to-end assertion — if this passes, the IPC handshake worked); Driver_reports_Healthy_after_IPC_handshake (DriverHealth.State post-connect); DiscoverAsync_returns_at_least_one_variable_from_live_galaxy (captures every Variable() call from DiscoverAsync via CapturingAddressSpaceBuilder and asserts > 0 — zero here usually means the Host couldn't read ZB, the skip message names OTOPCUA_GALAXY_ZB_CONN to check); GetHostStatuses_reports_at_least_one_platform (IHostConnectivityProbe surface — zero means the probe loop hasn't fired or no Platform is deployed locally); Can_read_a_discovered_variable_from_live_galaxy (reads the first discovered attribute's full reference, asserts status != BadInternalError — Galaxy's Uncertain-quality-until-first-Engine-scan is intentionally NOT treated as failure since it depends on runtime state that varies across test runs). Read-only by design; writes need an agreed scratch tag to avoid mutating a process-critical attribute — deferred to a follow-up PR that reuses this fixture.
CapturingAddressSpaceBuilder is a minimal IAddressSpaceBuilder that flattens every Variable() call into a list so tests can inspect what discovery produced without booting the full OPC UA node-manager stack; alarm annotation + property calls are no-ops. Scoped private to the test class.
Galaxy.Proxy.Tests csproj gains a ProjectReference to Driver.Galaxy.TestSupport (PR 36) for AvevaPrerequisites. The NU1702 warning about the Host project being net48-referenced-by-net10 is pre-existing from the HostSubprocessParityTests — Proxy.Tests only needs the Host EXE path for that parity scenario, not type surface.
Test run on THIS machine (OtOpcUaGalaxyHost not yet installed): Skipped! Failed 0, Passed 0, Skipped 5 — each skip message includes the full prerequisites report pointing at the missing service. Once the service is installed + started (scripts\install\Install-Services.ps1), the 5 facts will execute against live Galaxy. Proxy.Tests Unit: 17 pass / 0 fail (unchanged — new tests are Category=LiveGalaxy, separate suite). Full Proxy build clean. Memory already captures the 'live tests run via already-running service, don't spawn' convention (project_galaxy_host_service.md).
lmx-followups.md #5 updated: status is 'IN PROGRESS' across PRs 36 + 37 with the explicit remaining work (install + start services, subscribe-and-receive, write round-trip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:49:51 -04:00
261869d84e Merge pull request 'Phase 3 PR 36 — AVEVA prerequisites test-support library' (#35) from phase-3-pr36-aveva-prerequisites into v2 2026-04-18 16:44:41 -04:00
Joseph Doherty
08c90d19fd Phase 3 PR 36 — AVEVA prerequisites test-support library. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport multi-targeted class library (net10.0 + net48 so both the modern and the MXAccess-COM x86 test projects can consume it) that probes every piece of the AVEVA System Platform + OtOpcUa stack a live-Galaxy test depends on and returns a structured PrerequisiteReport. Closes the gap where live-smoke tests silently returned 'unreachable' without telling operators which specific piece failed.
AvevaPrerequisites.CheckAllAsync walks eight probe categories producing PrerequisiteCheck rows each with Name (e.g. 'service:aaBootstrap', 'sql:ZB', 'com:LMXProxy', 'registry:ArchestrA.Framework'), Category (AvevaCoreService / AvevaSoftService / AvevaInstall / MxAccessCom / GalaxyRepository / AvevaHistorian / OtOpcUaService / Environment), Status (Pass / Warn / Fail / Skip), and operator-facing Detail message. Report aggregates them: IsLivetestReady (no Fails anywhere) and IsAvevaSideReady (AVEVA-side categories pass, our v2 services can be absent while still considering the environment AVEVA-ready) so different test tiers can use the right threshold.
Individual probes: ServiceProbe.Check queries the Windows Service Control Manager via System.ServiceProcess.ServiceController — treats DemandStart+Stopped as Warn (NmxSvc is DemandStart by design; master pulls it up) but AutoStart+Stopped as Fail; not-installed is Fail for hard-required services, Warn for soft ones; non-Windows hosts get Skip; transitional states like StartPending get Warn with a 'try again' hint. RegistryProbe reads HKLM\SOFTWARE\WOW6432Node\ArchestrA\{Framework,Framework\Platform,MSIInstall} — Framework key presence + populated InstallPath/RootPath values mean System Platform installed; PfeConfigOptions in the Platform subkey (format 'PlatformId=N,EngineId=N,...') indicates a Platform has been deployed from the IDE (PlatformId=0 means never deployed — MXAccess will connect but every subscription will be Bad quality); RebootRequired='True' under MSIInstall surfaces as a loud warn since post-patch behavior is undefined. MxAccessComProbe resolves the LMXProxy.LMXProxyServer ProgID → CLSID → HKLM\SOFTWARE\Classes\WOW6432Node\CLSID\{guid}\InprocServer32, verifying the registered file exists on disk (catches the orphan-registry case where a previous uninstall left the ProgID registered but the DLL is gone — distinguishes it from the 'totally not installed' case by message); also emits a Warn when the test process is 64-bit (MXAccess COM activation fails with REGDB_E_CLASSNOTREG 0x80040154 regardless of registration, so seeing this warning tells operators why the activation would fail even on a fully-installed machine). SqlProbe tests Galaxy Repository via Microsoft.Data.SqlClient using the Windows-auth localhost connection string the repo code defaults to — distinguishes 'SQL Server unreachable' (connection fails) from 'ZB database does not exist' (SELECT DB_ID('ZB') returns null) because they have different remediation paths (sc.exe start MSSQLSERVER vs. restore from .cab backup); a secondary CheckDeployedObjectCountAsync query on 'gobject WHERE deployed_version > 0' warns when the count is zero because discovery smoke tests will return empty hierarchies. NamedPipeProbe opens a 2s NamedPipeClientStream against OtOpcUaGalaxyHost's pipe ('OtOpcUaGalaxy' per the installer default) — pipe accepting a connection proves the Host service is listening; disconnects immediately so we don't consume a session slot.
Service lists kept as internal static data so tests can inspect + override: CoreServices (aaBootstrap + aaGR + NmxSvc + MSSQLSERVER — hard fail if missing), SoftServices (aaLogger + aaUserValidator + aaGlobalDataCacheMonitorSvr — warn only; stack runs without them but diagnostics/auth are degraded), HistorianServices (aahClientAccessPoint + aahGateway — opt-in via Options.CheckHistorian, only matters for HistoryRead IPC paths), OtOpcUaServices (our OtOpcUaGalaxyHost hard-required for end-to-end live tests + OtOpcUa warn + GLAuth warn). Narrower entry points CheckRepositoryOnlyAsync and CheckGalaxyHostPipeOnlyAsync for tests that only care about specific subsystems — avoid paying the full probe cost on every GalaxyRepositoryLiveSmokeTests fact.
Multi-targeting mechanics: System.ServiceProcess.ServiceController + Microsoft.Win32.Registry are NuGet packages on net10 but in-box BCL references on net48; csproj conditions Package vs Reference by TargetFramework. Microsoft.Data.SqlClient v6 supports both frameworks so single PackageReference. Net48Polyfills.cs provides IsExternalInit shim (records/init-only setters) and SupportedOSPlatformAttribute stub so the same Probe sources compile on both frameworks without per-callsite preprocessor guards — lets Roslyn's platform-compatibility analyzer stay useful on net10 without breaking net48 builds.
Existing GalaxyRepositoryLiveSmokeTests updated to delegate its skip decision to AvevaPrerequisites.CheckRepositoryOnlyAsync (legacy ZbReachableAsync kept as a compatibility adapter so the in-test 'if (!await ZbReachableAsync()) return;' pattern keeps working while the surrounding fixtures gradually migrate to Assert.Skip-with-reason). Slnx file registers the new project.
Tests — AvevaPrerequisitesLiveTests (8 new Integration cases, Category=LiveGalaxy): the helper correctly reports Framework install (registry pass), aaBootstrap Running (service pass), aaGR Running (service pass), MxAccess COM registered (com pass), ZB database reachable (sql pass), deployed-object count > 0 (warn-upgraded-to-pass because this box has 49 objects deployed), the AVEVA side is ready even when our own services (OtOpcUaGalaxyHost) aren't installed yet (IsAvevaSideReady=true), and the helper emits rows for OtOpcUaGalaxyHost + OtOpcUa + GLAuth even when not installed (regression guard — nobody can accidentally ship a check that omits our own services). Full Galaxy.Host.Tests Category=LiveGalaxy suite: 13 pass (5 prior smoke + 8 new prerequisites). Full solution build clean, 0 errors.
What's NOT in this PR: end-to-end Galaxy stack smoke (Proxy → Host pipe → MXAccess → real Galaxy tag). That's the next PR — this one is the gate the end-to-end smoke will call first to produce actionable skip messages instead of silent returns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:36:13 -04:00
5cc120d836 Merge pull request 'Phase 3 PR 35 — IHistoryProvider gains ReadAtTime + ReadEvents; Proxy implements both' (#34) from phase-3-pr35-history-readtime-readevents into v2 2026-04-18 16:12:43 -04:00
Joseph Doherty
bf329b05d8 Phase 3 PR 35 — IHistoryProvider gains ReadAtTimeAsync + ReadEventsAsync; GalaxyProxyDriver implements both. Extends Core.Abstractions.IHistoryProvider with two new methods that round out the OPC UA Part 11 HistoryRead surface (HistoryReadAtTime + HistoryReadEvents are the last two modes not covered by the PR 19-era ReadRawAsync + ReadProcessedAsync) and wires GalaxyProxyDriver to call the existing PR-10/PR-11 IPC contracts the Host already implements.
Interface additions use C# default interface implementations that throw NotSupportedException — existing IHistoryProvider implementations keep compiling, only drivers whose backend carries the relevant capability override. This matches the 'capabilities are optional per driver' design already used by IHistoryProvider.ReadProcessedAsync's docs (Modbus / OPC UA Client drivers never had an event historian and the default-throw path lets callers see BadHistoryOperationUnsupported naturally). New HistoricalEvent record models one historian row (EventId, SourceName, EventTimeUtc + ReceivedTimeUtc — process vs historian-persist timestamps, Message, Severity mapped to OPC UA's 1-1000 range); HistoricalEventsResult pairs the event list with a continuation-point token for future batching. Both live in Core.Abstractions so downstream (Proxy, Host, Server) reference a single domain shape — no Shared-contract leak into the driver-facing interface.
GalaxyProxyDriver.ReadAtTimeAsync maps the domain DateTime[] to Unix-ms longs, calls CallAsync on the existing MessageKind.HistoryReadAtTimeRequest, and trusts the Host's one-sample-per-requested-timestamp contract (the Host pads with bad-quality snapshots for timestamps it can't interpolate; re-aligning on the Proxy side would duplicate the Host's interpolation policy logic). ReadEventsAsync does the same for HistoryReadEventsRequest; ToHistoricalEvent translates GalaxyHistoricalEvent (MessagePack-annotated, Unix-ms) to the domain record, explicitly tagging DateTimeKind.Utc on both timestamp fields so downstream serializers (JSON, OPC UA types) don't apply an unexpected local-time offset.
Tests — HistoricalEventMappingTests (3 new Proxy.Tests unit cases): every field maps correctly from wire to domain; null SourceName and null DisplayText preserve through the mapping (system events without a source come out with null so callers can distinguish them from alarm events); both timestamps come out as DateTimeKind.Utc (regression guard against a future refactor using DateTime.FromFileTimeUtc or similar that defaults to Unspecified). Driver.Galaxy.Proxy.Tests Unit suite: 17 pass / 0 fail (14 prior + 3 new). Full solution build clean, 0 errors.
Scope exclusions — DriverNodeManager HistoryRead service-handler wiring (on the OPC UA Server side, where HistoryReadAtTime and HistoryReadEvents service requests land) and the full-loop integration test (OPC UA client → server → IPC → Host → HistorianDataSource → back) are deferred to a focused follow-up PR. The capability surface is the load-bearing change; wiring the service handlers is mechanical in comparison and worth its own PR for reviewability. docs/v2/lmx-followups.md #1 updated with the split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:08:27 -04:00
2584379e75 Merge pull request 'Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin)' (#33) from phase-3-pr34-host-status-publisher-page into v2 2026-04-18 16:04:20 -04:00
Joseph Doherty
ef2a810b2d Phase 3 PR 34 — Host-status publisher (Server) + /hosts drill-down page (Admin). Closes LMX follow-up #7 by wiring together the data layer from PR 33. Server.HostStatusPublisher is a BackgroundService that walks every driver registered in DriverHost every 10 seconds, skips drivers that don't implement IHostConnectivityProbe, calls GetHostStatuses() on each probe-capable driver, and upserts one DriverHostStatus row per (NodeId, DriverInstanceId, HostName) into the central config DB. Upsert path: SingleOrDefaultAsync on the composite PK; if no row exists, Add a new one; if a row exists, LastSeenUtc advances unconditionally (heartbeat) and State + StateChangedUtc update only on transitions so Admin UI can distinguish 'still reporting, still Running' from 'freshly transitioned to Running'. MapState translates Core.Abstractions.HostState to Configuration.Enums.DriverHostState (intentional duplicate enum — Configuration project stays free of driver-runtime deps per PR 33's choice). If a driver's GetHostStatuses throws, log warning and skip that driver this tick — never take down the Server on a publisher failure. If the DB is unreachable, log warning + retry next heartbeat (no buffering — next tick's current-state snapshot is more useful than replaying stale transitions after a long outage). 2-second startup delay so NodeBootstrap's RegisterAsync calls land before the first publish tick, then tick runs immediately so a freshly-started Server surfaces its host topology in the Admin UI without waiting a full interval.
Polling chosen over event-driven for initial scope: simpler, matches Admin UI consumer cadence, avoids DriverHost lifecycle-event plumbing that doesn't exist today. Event-driven push for sub-heartbeat latency is a straightforward follow-up.
Admin.Services.HostStatusService left-joins DriverHostStatus against ClusterNode on NodeId so rows persist even when the ClusterNode entry doesn't exist yet (first-boot bootstrap case). StaleThreshold = 30s — covers one missed publisher heartbeat plus a generous buffer for clock skew and GC pauses. Admin Components/Pages/Hosts.razor — FleetAdmin-visible page grouped by cluster (handles the '(unassigned)' case for rows without a matching ClusterNode). Four summary cards (Hosts / Running / Stale / Faulted); per-cluster table with Node / Driver / Host / State + Stale-badge / Last-transition / Last-seen / Detail columns; 10s auto-refresh via IServiceScopeFactory timer pattern matching FleetStatusPoller + Fleet dashboard (PR 27). Row-class highlighting: Faulted → table-danger, Stale → table-warning, else default. State badge maps DriverHostState enum to bootstrap color classes. Sidebar link added between 'Fleet status' and 'Clusters'.
Server csproj adds Microsoft.EntityFrameworkCore.SqlServer 10.0.0 + registers OtOpcUaConfigDbContext in Program.cs scoped via NodeOptions.ConfigDbConnectionString (no Admin-style manual SQL raw — the DbContext is the only access path, keeps migrations owner-of-record).
Tests — HostStatusPublisherTests (4 new Integration cases, uses per-run throwaway DB matching the FleetStatusPollerTests pattern): publisher upserts one row per host from each probe-capable driver and skips non-probe drivers; second tick advances LastSeenUtc without creating duplicate rows (upsert pattern verified end-to-end); state change between ticks updates State AND StateChangedUtc (datetime2(3) rounds to millisecond precision so comparison uses 1ms tolerance — documented inline); MapState translates every HostState enum member. Server.Tests Integration: 4 new tests pass. Admin build clean, Admin.Tests Unit still 23 / 0. docs/v2/lmx-followups.md item #7 marked DONE with three explicit deferred items (event-driven push, failure-count column, SignalR fan-out).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:51:55 -04:00
a7764e50f3 Merge pull request 'Phase 3 PR 33 — DriverHostStatus entity + migration (LMX #7 data layer)' (#32) from phase-3-pr33-driverhoststatus-entity into v2 2026-04-18 15:43:37 -04:00
Joseph Doherty
8464e3f376 Phase 3 PR 33 — DriverHostStatus entity + EF migration (data-layer for LMX #7). New DriverHostStatus entity with composite key (NodeId, DriverInstanceId, HostName) persists each server node's per-host connectivity view — one row per (server node, driver instance, probe-reported host), which means a redundant 2-node cluster with one Galaxy driver reporting 3 platforms produces 6 rows because each server node owns its own runtime view of the shared host topology, not 3. Fields: NodeId (64), DriverInstanceId (64), HostName (256 — fits Galaxy FQDNs and Modbus host:port strings), State (DriverHostState enum — Unknown/Running/Stopped/Faulted, persisted as nvarchar(16) via HasConversion<string> so DBAs inspecting the table see readable state names not ordinals), StateChangedUtc + LastSeenUtc (datetime2(3) — StateChangedUtc tracks actual transitions while LastSeenUtc advances on every publisher heartbeat so the Admin UI can flag stale rows from a crashed Server independent of State), Detail (nullable 1024 — exception message from the driver's probe when Faulted, null otherwise).
DriverHostState enum lives in Configuration.Enums/ rather than reusing Core.Abstractions.HostState so the Configuration project stays free of driver-runtime dependencies (it's referenced by both the Admin process and the Server process, so pulling in the driver-abstractions assembly to every Admin build would be unnecessary weight). The server-side publisher hosted service (follow-up PR 34) will translate HostStatusChangedEventArgs.NewState to this enum on every transition.
No foreign key to ClusterNode — a Server may start reporting host status before its ClusterNode row exists (first-boot bootstrap), and we'd rather keep the status row than drop it. The Admin-side service that renders the dashboard will left-join on NodeId when presenting. Two indexes declared: IX_DriverHostStatus_Node drives the per-cluster drill-down (Admin UI joins ClusterNode on ClusterId to pick which NodeIds to fetch), IX_DriverHostStatus_LastSeen drives the stale-row query (now - LastSeen > threshold).
EF migration AddDriverHostStatus creates the table + PK + both indexes. Model snapshot updated. SchemaComplianceTests expected-tables list extended. DriverHostStatusTests (3 new cases, category SchemaCompliance, uses the shared fixture DB): composite key allows same (host, driver) across different nodes AND same (node, host) across different drivers — both real-world cases the publisher needs to support; upsert-in-place pattern (fetch-by-composite-PK, mutate, save) produces one row not two — the pattern the publisher will use; State enum persists as string not int — reading the DB via ADO.NET returns 'Faulted' not '3'.
Configuration.Tests SchemaCompliance suite: 10 pass / 0 fail (7 prior + 3 new). Configuration build clean. No Server or Admin code changes yet — publisher + /hosts page are PR 34.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:38:41 -04:00
a9357600e7 Merge pull request 'Phase 3 PR 32 — Multi-driver integration test' (#31) from phase-3-pr32-multi-driver-integration into v2 2026-04-18 15:34:16 -04:00
Joseph Doherty
2f00c74bbb Phase 3 PR 32 — Multi-driver integration test. Closes LMX follow-up #6 with Server.Tests/MultipleDriverInstancesIntegrationTests.cs: registers two StubDriver instances (alpha + beta) with distinct DriverInstanceIds on one DriverHost, boots the full OpcUaApplicationHost, and exercises three behaviors end-to-end via a real OPC UA client session. (1) Each driver's namespace URI resolves to a distinct index in the client's NamespaceUris (alpha → urn:OtOpcUa:alpha, beta → urn:OtOpcUa:beta) — proves DriverNodeManager's namespaceUris-per-driver base-ctor wiring actually lands two separate INodeManager registrations. (2) Browsing one subtree returns only that driver's folder; the other driver's folder does NOT leak into the wrong subtree. This is the test that catches a cross-driver routing regression the v1 single-driver code path couldn't surface — if a future refactor flattens both drivers into a shared namespace, the 'shouldNotContain' assertion fails cleanly. (3) Reads route to the owning driver by namespace — alpha's ReadAsync returns 42 while beta's returns 99; a misroute would surface as 99 showing up on an alpha node id or vice versa. StubDriver is parameterized on (DriverInstanceId, folderName, readValue) so the same class constructs both instances without copy-paste.
No production code changes — pure additive test. Server.Tests Integration: 3 new tests pass; existing OpcUaServerIntegrationTests stays green (single-driver case still exercised there). Full Server.Tests Unit still 43 / 0. Deferred: multi-driver alarm-event case (two drivers each raising a GalaxyAlarmEvent, assert each condition lands on its owning instance's condition node) — needs a stub IAlarmSource and is worth its own focused PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:29:49 -04:00
5d5e1f9650 Merge pull request 'Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility' (#30) from phase-3-pr31-live-ldap-ad-compat into v2 2026-04-18 15:27:54 -04:00
Joseph Doherty
4886a5783f Phase 3 PR 31 — Live-LDAP integration test + Active Directory compatibility. Closes LMX follow-up #4 with 6 live-bind tests in Server.Tests/LdapUserAuthenticatorLiveTests.cs against the dev GLAuth instance at localhost:3893 (skipped cleanly when unreachable via Assert.Skip + a clear SkipReason — matches the GalaxyRepositoryLiveSmokeTests pattern). Coverage: valid credentials bind + surface DisplayName; wrong password fails; unknown user fails; empty credentials fail pre-flight without touching the directory; writeop user's memberOf maps through GroupToRole to WriteOperate (the exact string WriteAuthzPolicy.IsAllowed expects); admin user surfaces all four mapped roles (WriteOperate + WriteTune + WriteConfigure + AlarmAck) proving memberOf parsing doesn't stop after the first match. While wiring this up, the authenticator's hard-coded user-lookup filter 'uid=<name>' didn't match GLAuth (which keys users by cn and doesn't populate uid) — AND it doesn't match Active Directory either, which uses sAMAccountName. Added UserNameAttribute to LdapOptions (default 'uid' for RFC 2307 backcompat) so deployments override to 'cn' / 'sAMAccountName' / 'userPrincipalName' as the directory requires; authenticator filter now interpolates the configured attribute. The default stays 'uid' so existing test fixtures and OpenLDAP installs keep working without a config change — a regression guard in LdapUserAuthenticatorAdCompatTests.LdapOptions_default_UserNameAttribute_is_uid_for_rfc2307_compat pins this so a future 'helpful' default change can't silently break anyone.
Active Directory compatibility. LdapOptions xml-doc expanded with a cheat-sheet covering Server (DC FQDN), Port 389 vs 636, UseTls=true under AD LDAP-signing enforcement, dedicated read-only service account DN, sAMAccountName vs userPrincipalName vs cn trade-offs, memberOf DN shape (CN=Group,OU=...,DC=... with the CN= RDN stripped to become the GroupToRole key), and the explicit 'nested groups NOT expanded' call-out (LDAP_MATCHING_RULE_IN_CHAIN / tokenGroups is a future authenticator enhancement, not a config change). docs/security.md §'Active Directory configuration' adds a complete appsettings.json snippet with realistic AD group names (OPCUA-Operators → WriteOperate, OPCUA-Engineers → WriteConfigure, OPCUA-AlarmAck → AlarmAck, OPCUA-Tuners → WriteTune), LDAPS port 636, TLS on, insecure-LDAP off, and operator-facing notes on each field. LdapUserAuthenticatorAdCompatTests (5 unit guards): ExtractFirstRdnValue parses AD-style 'CN=OPCUA-Operators,OU=...,DC=...' DNs correctly (case-preserving — operators' GroupToRole keys stay readable); also handles mixed case and spaces in group names ('Domain Users'); also works against the OpenLDAP ou=<group>,ou=groups shape (GLAuth) so one extractor tolerates both memberOf formats common in the field; EscapeLdapFilter escapes the RFC 4515 injection set (\, *, (, ), \0) so a malicious login like 'admin)(cn=*' can't break out of the filter; default UserNameAttribute regression guard.
Test posture — Server.Tests Unit: 43 pass / 0 fail (38 prior + 5 new AD-compat guards). Server.Tests LiveLdap category: 6 pass / 0 fail against running GLAuth (would skip cleanly without). Server build clean, 0 errors, 0 warnings.
Deferred: the session-identity end-to-end check (drive a full OPC UA UserName session, then read a 'whoami' node to verify the role landed on RoleBasedIdentity). That needs a test-only address-space node and is scoped for a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:23:22 -04:00
d70a2e0077 Merge pull request 'Phase 3 PR 30 — Modbus integration-test project scaffold + DL205 smoke test' (#29) from phase-3-pr30-modbus-integration-scaffold into v2 2026-04-18 15:08:45 -04:00
Joseph Doherty
cb7b81a87a Phase 3 PR 30 — Modbus integration-test project scaffold. New tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests project is the harness modbus-test-plan.md called for: a skip-when-unreachable fixture that TCP-probes a Modbus simulator endpoint (MODBUS_SIM_ENDPOINT, default localhost:502) once per test session, a DL205 device profile stub (single writable holding register at address 100, probe disabled to avoid racing with assertions), and one happy-path smoke test that initializes the real ModbusDriver + real ModbusTcpTransport, writes a known Int16 value, reads it back, and asserts status=0 + value round-trip. No DL205 quirk assertions yet — those land one-per-PR as the user validates each behavior in ModbusPal (word order for 32-bit, register-zero access, coil addressing base, max registers per FC03, response framing under load, exception code on protected-bit coil write).
ModbusSimulatorFixture is a collection fixture so the 2s TCP probe runs once per run, not per test; SkipReason gets a clear operator-facing message ('start ModbusPal or override MODBUS_SIM_ENDPOINT'). Tests call Assert.Skip(sim.SkipReason) rather than silently returning — matches the test-plan convention and reads cleanly in CI logs. DL205Profile.BuildOptions deliberately disables the background probe loop since integration tests drive reads explicitly and the probe would race with assertions. Tag naming uses the DL205_ prefix so filter 'DisplayName~DL205' surfaces device-specific failures at a glance.
Project references: xunit.v3 + Shouldly + Microsoft.NET.Test.Sdk + xunit.runner.visualstudio (matches the existing Driver.Modbus.Tests unit project), project ref to src/Driver.Modbus. Registered in ZB.MOM.WW.OtOpcUa.slnx under tests/. ModbusPal/README.md documents the dev loop (install ModbusPal jar, load profile, start simulator, dotnet test), explains MODBUS_SIM_ENDPOINT override for real-PLC benchwork, and flags DL205.xmpp as the first profile to add in a follow-up PR.
dotnet test run against the scaffold (no simulator running) skips cleanly: 0 failed, 0 passed, 1 skipped, with the SkipReason surfaced. dotnet build clean (0 warnings, 0 errors). Updated docs/v2/modbus-test-plan.md to mark the scaffold PR done and renumbered future PRs from 'PR 27+' to 'PR 31+' to stay in sync with the actual PR chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:02:39 -04:00
901d2b8019 Merge pull request 'Phase 3 PR 29 — Account/session page with roles + capabilities' (#28) from phase-3-pr29-account-page into v2 2026-04-18 14:46:45 -04:00
Joseph Doherty
d5fa1f450e Phase 3 PR 29 — Account/session page expanding the minimal sidebar role display into a dedicated /account route. Shows the authenticated operator's identity (username from ClaimTypes.NameIdentifier, display name from ClaimTypes.Name), their Admin roles as badges (from ClaimTypes.Role), the raw LDAP groups that mapped to those roles (from the 'ldap_group' claim added by Login.razor at sign-in), and a capability table listing each Admin capability with its required role and a Yes/No badge showing whether this session has it. Capability list mirrors the Program.cs authorization policies + each page's [Authorize] attribute so operators can self-service check whether their session has access without trial-and-error navigation — capabilities covered: view clusters + fleet status (all roles), edit configuration drafts (ConfigEditor or FleetAdmin per CanEdit policy), publish generations (FleetAdmin per CanPublish policy), manage certificate trust (FleetAdmin per PR 28 Certificates page attribute), manage external-ID reservations (ConfigEditor or FleetAdmin per Reservations page attribute).
Sidebar's 'Signed in as' line now wraps the display name in a link to /account so the existing sidebar-compact view becomes the entry point for the fuller page — keeps the sign-out button where it was for muscle memory, just adds the detail page one click away. Page is gated with [Authorize] (any authenticated admin) rather than a specific role — the capability table deliberately works for every signed-in user so they can see what they DON'T have access to, which helps them file the right ticket with their LDAP admin instead of getting a plain Access Denied when navigating blindly.
Capability → required-role table is defined as a private readonly record list in the page rather than pulled from a service because it's a UI-presentation concern, not runtime policy state — the runtime policy IS Program.cs's AddAuthorizationBuilder + each page's [Authorize] attribute, and this table just mirrors it for operator readability. Comment on the list reminds future-me to extend it when a new policy or [Authorize] page lands. No behavior change if roles are empty, but the page surfaces a hint ('Sign-in would have been blocked, so if you're seeing this, the session claim is likely stale') that nudges the operator toward signing out + back in.
No new tests added — the page is pure display over claims; its only logic is the 'has-capability' Any-overlap check which is exactly what ASP.NET's [Authorize(Roles=...)] does in-framework, and duplicating that in a unit test would test the framework rather than our code. Admin.Tests Unit stays 23 pass / 0 fail. Admin build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:43:35 -04:00
6fdaee3a71 Merge pull request 'Phase 3 PR 28 — Admin UI cert-trust management page' (#27) from phase-3-pr28-cert-trust into v2 2026-04-18 14:42:52 -04:00
Joseph Doherty
ed88835d34 Phase 3 PR 28 — Admin UI cert-trust management page. New /certificates route (FleetAdmin-only) surfaces the OPC UA server's PKI store rejected + trusted certs and gives operators Trust / Delete / Revoke actions so rejected client certs can be promoted without touching disk. CertTrustService reads $PkiStoreRoot/{rejected,trusted}/certs/*.der files directly via X509CertificateLoader — no Opc.Ua dependency in the Admin project, which keeps the Admin host runnable on a machine that doesn't have the full Server install locally (only needs the shared PKI directory reachable; typical deployment has Admin + Server side-by-side on the same box and PkiStoreRoot defaults match so a plain-vanilla install needs no override). CertTrustOptions bound from the Admin's 'CertTrust:PkiStoreRoot' section, default %ProgramData%\OtOpcUa\pki (matches OpcUaServerOptions.PkiStoreRoot default). Trust action moves the .der from rejected/certs/ to trusted/certs/ via File.Move(overwrite:true) — idempotent, tolerates a concurrent operator doing the same move. Delete wipes the file. Revoke removes from trusted/certs/ (Opc.Ua re-reads the Directory store on each new client handshake, so no explicit reload signal is needed; operators retry the rejected connection after trusting). Thumbprint matching is case-insensitive because X509Certificate2.Thumbprint is upper-case hex but operators copy-paste from logs that sometimes lowercase it. Malformed files in the store are logged + skipped — a single bad .der can't take the whole management page offline. Missing store directories produce empty lists rather than exceptions so a pristine install (Server never run yet, no rejected/trusted dirs yet) doesn't crash the page.
Razor page layout: two tables (Rejected / Trusted) with Subject / Issuer / Thumbprint / Valid-window / Actions columns, status banner after each action with success or warning kind ('file missing' = another admin handled it), FleetAdmin-only via [Authorize(Roles=AdminRoles.FleetAdmin)]. Each action invokes LogActionAsync which Serilog-logs the authenticated admin user + thumbprint + action for an audit trail — DB-level ConfigAuditLog persistence is deferred because its schema is cluster-scoped and cert actions are cluster-agnostic; Serilog + CertTrustService's filesystem-op info logs give the forensic trail in the meantime. Sidebar link added to MainLayout between Reservations and the future Account page.
Tests — CertTrustServiceTests (9 new unit cases): ListRejected parses Subject + Thumbprint + store kind from a self-signed test cert written into rejected/certs/; rejected and trusted stores are kept separate; TrustRejected moves the file and the Rejected list is empty afterwards; TrustRejected with a thumbprint not in rejected returns false without touching trusted; DeleteRejected removes the file; UntrustCert removes from trusted only; thumbprint match is case-insensitive (operator UX); missing store directories produce empty lists instead of throwing DirectoryNotFoundException (pristine-install tolerance); a junk .der in the store is logged + skipped and the valid certs still surface (one bad file doesn't break the page). Full Admin.Tests Unit suite: 23 pass / 0 fail (14 prior + 9 new). Full Admin build clean — 0 errors, 0 warnings.
lmx-followups.md #3 marked DONE with a cross-reference to this PR and a note that flipping AutoAcceptUntrustedClientCertificates to false as the production default is a deployment-config follow-up, not a code gap — the Admin UI is now ready to be the trust gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:37:55 -04:00
5389d4d22d Phase 3 PR 27 � Fleet status dashboard page (#26) 2026-04-18 14:07:16 -04:00
Joseph Doherty
b5f8661e98 Phase 3 PR 27 — Fleet status dashboard page. New /fleet route shows per-node apply state (ClusterNodeGenerationState joined with ClusterNode for the ClusterId) in a sortable table with summary cards for Total / Applied / Stale / Failed node counts. Stale detection: LastSeenAt older than 30s triggers a table-warning row class + yellow count card. Failed rows get table-danger + red card. Badge classes per LastAppliedStatus: Applied=bg-success, Failed=bg-danger, Applying=bg-info, unknown=bg-secondary. Timestamps rendered as relative-age strings ('42s ago', '15m ago', '3h ago', then absolute date for >24h). Error column is truncated to 320px with the full message in a tooltip so the table stays readable on wide fleets. Initial data load on OnInitializedAsync; auto-refresh every 5s via a Timer that calls InvokeAsync(RefreshAsync) — matches the FleetStatusPoller's 5s cadence so the dashboard sees the most recent state without polling ahead of the broadcaster. A Refresh button also kicks a manual reload; _refreshing gate prevents double-runs when the timer fires during an in-flight query. IServiceScopeFactory (matches FleetStatusPoller's pattern) creates a fresh DI scope per refresh so the per-page DbContext can't race the timer with the render thread; no new DI registrations needed. Live SignalR hub push is deliberately deferred to a follow-up PR — the existing FleetStatusHub + NodeStateChangedMessage already works for external JavaScript clients; wiring an in-process Blazor Server consumer adds HubConnectionBuilder plumbing that's worth its own focused change. Sidebar link added to MainLayout between Overview and Clusters. Full Admin.Tests Unit suite 14 pass / 0 fail — unchanged, no tests regressed. Full Admin build clean (0 errors, 0 warnings). Closes the 'no per-driver dashboard' gap from lmx-followups item #7 at the fleet level; per-host (platform/engine/Modbus PLC) granularity still needs a dedicated page that consumes IHostConnectivityProbe.GetHostStatuses from the Server process — that's the live-SignalR follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:12:31 -04:00
4058b88784 Merge pull request 'Phase 3 PR 26 — server-layer write authorization by role' (#25) from phase-3-pr26-server-write-authz into v2 2026-04-18 13:04:35 -04:00
Joseph Doherty
6b04a85f86 Phase 3 PR 26 — server-layer write authorization gating by role. Per the user's ACL-at-server-layer directive (saved as feedback_acl_at_server_layer.md in memory), write authorization is enforced in DriverNodeManager.OnWriteValue and never delegated to the driver or to driver-specific auth (the v1 Galaxy-provided security path is explicitly not part of v2 — drivers report SecurityClassification as discovery metadata only). New WriteAuthzPolicy static class in Server/Security/ maps SecurityClassification → required role per the table documented in docs/Configuration.md: FreeAccess = no role required (anonymous sessions can write), Operate + SecuredWrite = WriteOperate, Tune = WriteTune, VerifiedWrite + Configure = WriteConfigure, ViewOnly = deny regardless of roles. Role matching is case-insensitive and role requirements do NOT cascade — a session with WriteConfigure can write Configure attributes but needs WriteOperate separately to write Operate attributes; this is deliberate so escalation is an explicit LDAP group assignment, not a hierarchy the policy silently grants. DriverNodeManager gains a _securityByFullRef Dictionary populated during Variable() registration (parallel to the existing _variablesByFullRef) so OnWriteValue can look up the classification in O(1) on the hot path. OnWriteValue casts the session's context.UserIdentity to the new IRoleBearer interface (implemented by OtOpcUaServer.RoleBasedIdentity from PR 19) — empty Roles collection when the session is anonymous; the same WriteAuthzPolicy.IsAllowed check then either short-circuits true (FreeAccess), false (ViewOnly), or walks the roles list looking for the required one. On deny, OnWriteValue logs 'Write denied for {FullRef}: classification=X userRoles=[...]' at Information level (readable trail for operator complaints) and returns BadUserAccessDenied without touching IWritable.WriteAsync — drivers never see a request we'd have refused. IRoleBearer kept as a minimal server-side interface rather than reusing some abstraction from Core.Abstractions because the concept is OPC-UA-session-scoped and doesn't generalize (the driver side has no notion of a user session). Tests — WriteAuthzPolicyTests (17 new cases): FreeAccess allows write with empty role set + arbitrary roles; ViewOnly denies write even with every role; Operate requires WriteOperate; role match is case-insensitive; Operate denies empty role set + wrong role; SecuredWrite shares Operate's requirement; Tune requires WriteTune; Tune denies WriteOperate-only (asserts roles don't cascade — this is the test that catches a future regression where someone 'helpfully' adds a role-escalation table); Configure requires WriteConfigure; VerifiedWrite shares Configure's requirement; multi-role session allowed when any role matches; unrelated roles denied; RequiredRole theory covering all 5 classified-and-mapped rows + null for FreeAccess/ViewOnly special cases. lmx-followups.md follow-up #2 marked DONE with a back-reference to this PR and the memory note. Full Server.Tests Unit suite: 38 pass / 0 fail (17 new WriteAuthz + 14 SecurityConfiguration from PR 19 + 2 NodeBootstrap + 5 others). Server.Tests Integration (Category=Integration) 2 pass — existing PR 17 anonymous-endpoint smoke tests stay green since the read path doesn't hit OnWriteValue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:01:01 -04:00
cd8691280a Merge pull request 'Phase 3 PR 25 — Modbus test plan + DL205 quirk catalog' (#24) from phase-3-pr25-modbus-test-plan into v2 2026-04-18 12:49:19 -04:00
Joseph Doherty
77d09bf64e Phase 3 PR 25 — modbus-test-plan.md: integration-test playbook with per-device quirk catalog. ModbusPal is the chosen simulator; AutomationDirect DL205 is the first target device class with 6 pending quirks to document and cover with named tests (word order for 32-bit values, register-zero access policy, coil addressing base, maximum registers per FC03, response framing under sustained load, exception code on protected-bit coil write). Each quirk placeholder has a proposed test name so the user's validation work translates directly into integration tests. Test conventions section codifies the named-per-quirk pattern, skip-when-unreachable guard, real ModbusTcpTransport usage, and inter-test isolation. Sets up the harness-and-catalog structure future device families (Allen-Bradley Micrologix, Siemens S7-1200 Modbus gateway, Schneider M340, whatever the user hits) will slot into — same per-device catalog shape, cross-device patterns section for recurring quirks that can get promoted into driver defaults. Next concrete PRs proposed: PR 26 for the integration test project scaffold + DL205 profile + fixture with skip-guard + one smoke test, PR 27+ for the individual confirmed quirks one-per-PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:45:21 -04:00
163c821e74 Merge pull request 'Phase 3 PR 24 — Modbus PLC data type extensions' (#23) from phase-3-pr24-modbus-types into v2 2026-04-18 12:32:55 -04:00
Joseph Doherty
eea31dcc4e Phase 3 PR 24 — Modbus PLC data type extensions. Extends ModbusDataType beyond the textbook Int16/UInt16/Int32/UInt32/Float32 set with Int64/UInt64/Float64 (4-register types), BitInRegister (single bit within a holding register, BitIndex 0-15 LSB-first), and String (ASCII packed 2 chars per register with StringLength-driven sizing). Adds ModbusByteOrder enum on ModbusTagDefinition covering the two word-orderings that matter in the real PLC population: BigEndian (ABCD — Modbus TCP standard, Schneider PLCs that follow it strictly) and WordSwap (CDAB — Siemens S7 family, several Allen-Bradley series, some Modicon families). NormalizeWordOrder helper reverses word pairs in-place for 32-bit values and reverses all four words for 64-bit values (keeps bytes big-endian within each register, which is universal; swaps only the word positions). Internal codec surface switched from (bytes, ModbusDataType) pairs to (bytes, ModbusTagDefinition) because the tag carries the ByteOrder + BitIndex + StringLength context the codec needs; RegisterCount similarly takes the tag so strings can compute ceil(StringLength/2). DriverDataType mapping in MapDataType extended to cover the new logical types — Int64/UInt64 widen to Int32 (PR 25 follow-up: extend DriverDataType enum with Int64 to avoid precision loss), Float64 maps to DriverDataType.Float64, String maps to DriverDataType.String, BitInRegister surfaces as Boolean, all other mappings preserved. BitInRegister writes throw a deliberate InvalidOperationException with a 'read-modify-write' hint — to atomically flip a single bit the driver needs to FC03 the register, OR/AND in the bit, then FC06 it back; that's a separate PR because the bit-modify atomicity story needs a per-register mutex and optional compare-and-write semantics. Everything else (decoder paths for both byte orders, Int64/UInt64/Float64 encode + decode, bit-index extraction across both register halves, String nul-truncation on decode, String nul-padding on encode) ships here. Tests (21 new ModbusDataTypeTests): RegisterCount_returns_correct_register_count_per_type theory (10 rows covering every numeric type); RegisterCount_for_String_rounds_up_to_register_pair theory (5 rows including the 0-char edge case that returns 0 registers); Int32_BigEndian_decodes_ABCD_layout + Int32_WordSwap_decodes_CDAB_layout + Float32_WordSwap_encode_decode_roundtrips (covers the two most-common 32-bit orderings); Int64_BigEndian_roundtrips + UInt64_WordSwap_reverses_four_words (word-swap on 64-bit reverses the four-word layout explicitly, with the test computing the expected wire shape by hand rather than trusting the implementation) + Float64_roundtrips_under_word_swap (3.14159265358979 survives the round-trip with 1e-12 tolerance); BitInRegister_extracts_bit_at_index theory (6 rows including LSB, MSB, and arbitrary bits in a multi-bit mask); BitInRegister_write_is_not_supported_in_PR24 (asserts the exception message steers the reader to the 'read-modify-write' follow-up); String_decodes_ASCII_packed_two_chars_per_register (decodes 'HELLO!' from 3 packed registers with the 'HELLO!'u8 test-only UTF-8 literal which happens to equal the ASCII bytes for this ASCII input); String_decode_truncates_at_first_nul ('Hi' padded with nuls reads back as 'Hi'); String_encode_nul_pads_remaining_bytes (short input writes remaining bytes as 0). Full solution: 0 errors, 217 unit + integration tests pass (22 + 30 new Modbus = 52 Modbus total, 165 pre-existing). ModbusDriver capability footprint now matches the most common industrial PLC workloads — Siemens S7 + Allen-Bradley + Modicon all supported via ByteOrder config without driver forks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:27:12 -04:00
8a692d4ba8 Merge pull request 'Phase 3 PR 23 — Modbus IHostConnectivityProbe' (#22) from phase-3-pr23-modbus-probe into v2 2026-04-18 12:23:04 -04:00
Joseph Doherty
268b12edec Phase 3 PR 23 — Modbus IHostConnectivityProbe. ModbusDriver now implements 6 of 8 capability interfaces (adds IHostConnectivityProbe alongside IDriver + ITagDiscovery + IReadable + IWritable + ISubscribable from the earlier PRs). Background probe loop kicks off in InitializeAsync when ModbusProbeOptions.Enabled is true, sends a cheap FC03 read-1-register at ProbeAddress (default 0) every Interval (default 5s) with a per-tick Timeout (default 2s), and tracks the single Modbus endpoint's state in the HostState machine. Initial state = Unknown; first successful probe transitions to Running; any transport/timeout failure transitions to Stopped; recovery transitions back to Running. OnHostStatusChanged fires exactly on transitions (not on repeat successes — prevents event-spam on a healthy connection). HostName format is 'host:port' so the Admin UI can display the endpoint uniformly with Galaxy platforms/engines in the fleet status dashboard. GetHostStatuses returns a single-item list with the current state + last-change timestamp (Modbus driver talks to exactly one endpoint per instance — operators spin up multiple driver instances for multi-PLC deployments). ShutdownAsync cancels the probe CTS before tearing down the transport so the loop can't log a spurious Stopped after intentional shutdown (OperationCanceledException caught separately from the 'real' transport errors). ModbusDriverOptions extended with ModbusProbeOptions sub-record (Enabled default true, Interval 5s, Timeout 2s, ProbeAddress ushort for PLCs that have register-0 policies; most PLCs tolerate an FC03 at 0 but some industrial gateways lock it). Tests (7 new ModbusProbeTests): Initial_state_is_Unknown_before_first_probe_tick (probe disabled, state stays Unknown, HostName formatted); First_successful_probe_transitions_to_Running (enabled, waits for probe count + event queue, asserts Unknown → Running with correct OldState/NewState); Transport_failure_transitions_to_Stopped (flip fake.Reachable = false mid-run, wait for state diff); Recovery_transitions_Stopped_back_to_Running (up → down → up, asserts ≥ 3 transitions); Repeated_successful_probes_do_not_generate_duplicate_Running_events (several hundred ms of stable probes, count stays at 1); Disabled_probe_stays_Unknown_and_fires_no_events (safety guard when operator wants to disable probing); Shutdown_stops_the_probe_loop (probe count captured at shutdown, delay 400ms, assert ≤ 1 extra to tolerate the narrow race where an in-flight tick completes after shutdown — the contract is 'no new ticks scheduled' not 'instantaneous freeze'). FlappyTransport fake exposes a volatile Reachable flag so tests can toggle the PLC availability mid-run, + ProbeCount counter so tests can assert the loop actually issued requests. WaitForStateAsync helper polls GetHostStatuses up to a deadline; tolerates scheduler jitter on slow CI runners. Full solution: 0 errors, 202 unit + integration tests pass (22 Modbus + 180 pre-existing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:12:00 -04:00
edce1be742 Merge pull request 'Phase 3 PR 22 — Modbus ISubscribable via polling overlay' (#21) from phase-3-pr22-modbus-subscribe into v2 2026-04-18 12:07:51 -04:00
Joseph Doherty
18b3e24710 Phase 3 PR 22 — Modbus ISubscribable via polling overlay. Modbus has no push model at the wire protocol (unlike MXAccess's OnDataChange callback or OPC UA's own Subscription service), so the driver layers a per-subscription polling loop on top of the existing IReadable path: SubscribeAsync returns an opaque ModbusSubscriptionHandle, starts a background Task.Run that sleeps for the requested publishing interval (floored to 100ms so a misconfigured sub-10ms request doesn't hammer the PLC), reads every subscribed tag through the same FC01/03/04 path the one-shot ReadAsync uses, diffs the returned DataValueSnapshot against the last known value per tag, and raises OnDataChange exactly when (a) it's the first poll (initial-data push per OPC UA Part 4 convention) or (b) boxed value changed or (c) StatusCode changed — stable values don't generate event traffic after the initial push, matching the v1 MXAccess OnDataChange shape. SubscriptionState record holds the handle + tag list + interval + per-subscription CancellationTokenSource + ConcurrentDictionary<string,DataValueSnapshot> LastValues; UnsubscribeAsync removes the state from _subscriptions and cancels the CTS, stopping the polling loop cleanly. Multiple overlapping subscriptions each get their own polling Task so a slow PLC on one subscription can't stall the others. ShutdownAsync cancels every active subscription CTS before tearing down the transport so the driver doesn't leave orphaned polling tasks pumping requests against a disposed socket. Transient poll errors are swallowed inside the loop (the loop continues to the next tick) — the driver's health surface reflects the last-known Degraded state from the underlying ReadAsync path. OperationCanceledException is caught separately to exit the loop silently on unsubscribe/shutdown. Tests (6 new ModbusSubscriptionTests): Initial_poll_raises_OnDataChange_for_every_subscribed_tag asserts the initial-data push fires once per tag in the subscribe call (2 tags → 2 events with FullReference='Level' and FullReference='Temp'); Unchanged_values_do_not_raise_after_initial_poll lets the loop run ~5 cycles at 100ms with a stable register value, asserts only the initial push fires (no event spam on stable tags); Value_change_between_polls_raises_OnDataChange mutates the fake register bank between poll ticks and asserts a second event fires with the new value (verified via e.Snapshot.Value.ShouldBe((short)200)); Unsubscribe_stops_the_polling_loop captures the event count right after UnsubscribeAsync, mutates a register that would have triggered a change if polling continued, asserts the count stays the same after 400ms; SubscribeAsync_floors_intervals_below_100ms passes a 10ms interval + asserts only 1 event fires across 300ms (if the floor weren't enforced we'd see 30 — the test asserts the floor semantically by counting events on stable data); Multiple_subscriptions_fire_independently creates two subs on different tags, unsubscribes only one, mutates the other's tag, asserts only the surviving sub emits while the unsubscribed one stays at its pre-unsubscribe count. FakeTransport in this test file is scoped to FC03 only since that's all the subscription path exercises — keeps the test doubles minimal and the failure modes obvious. WaitForCountAsync helper polls a ConcurrentQueue up to a deadline, makes the tests tolerant of scheduler jitter on slow CI runners. Full solution: 0 errors, 195 tests pass (6 new subscription + 9 existing Modbus + 180 pre-existing). ModbusDriver now implements IDriver + ITagDiscovery + IReadable + IWritable + ISubscribable — five of the eight capability interfaces. IAlarmSource + IHistoryProvider remain unimplemented because Modbus has no wire-level alarm or history semantics; IHostConnectivityProbe is a plausible future addition (treat transport disconnect as a Stopped signal) but needs the socket-level connection-state tracking plumbed through IModbusTransport which is its own PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:03:39 -04:00
f6a12dafe9 Merge pull request 'Phase 3 PR 21 — Modbus TCP driver (first native-protocol greenfield)' (#20) from phase-3-pr21-modbus-driver into v2 2026-04-18 11:58:20 -04:00
Joseph Doherty
058c3dddd3 Phase 3 PR 21 — Modbus TCP driver: first native-protocol greenfield for v2. New src/Driver.Modbus project with ModbusDriver implementing IDriver + ITagDiscovery + IReadable + IWritable. Validates the driver-agnostic abstractions (IAddressSpaceBuilder, DriverAttributeInfo, DataValueSnapshot, WriteRequest/WriteResult) generalize beyond Galaxy — nothing Galaxy-specific is used here. ModbusDriverOptions carries Host/Port/UnitId/Timeout + a pre-declared tag list (Modbus has no discovery protocol — tags are configuration). IModbusTransport abstracts the socket layer so tests swap in-memory fakes; concrete ModbusTcpTransport speaks the MBAP ADU (TxId + Protocol=0 + Length + UnitId + PDU) over TcpClient, serializes requests through a semaphore for single-flight in-order responses, validates the response TxId matches, surfaces server exception PDUs as ModbusException with function code + exception code. DiscoverAsync streams one folder per driver with a BaseDataVariable per tag + DriverAttributeInfo that flags writable tags as SecurityClassification.Operate vs ViewOnly for read-only regions. ReadAsync routes per-tag by ModbusRegion: FC01 for Coils, FC02 for DiscreteInputs, FC03 for HoldingRegisters, FC04 for InputRegisters; register values decoded through System.Buffers.Binary.BinaryPrimitives (BigEndian for single-register Int16/UInt16 + two-register Int32/UInt32/Float32 per standard modbus word-swap conventions). WriteAsync uses FC05 (Write Single Coil with 0xFF00/0x0000 encoding) for booleans, FC06 (Write Single Register) for 16-bit types, FC16 (Write Multiple Registers) for 32-bit types. Unknown tag → BadNodeIdUnknown; write to InputRegister or DiscreteInput or Writable=false tag → BadNotWritable; exception during transport → BadInternalError + driver health Degraded. Subscriptions + Historian + Alarms deliberately out of scope — Modbus has no push model (subscribe would be a polling overlay, additive PR) and no history/alarm semantics at the protocol level. Tests (9 new ModbusDriverTests): InitializeAsync connects + populates the tag map + sets health=Healthy; Read Int16 from HoldingRegister returns BigEndian value; Read Float32 spans two registers BigEndian (IEEE 754 single for 25.5f round-trips exactly); Read Coil returns boolean from the bit-packed response; unknown tag name returns BadNodeIdUnknown without an exception; Write UInt16 round-trips via FC06; Write Float32 uses FC16 (two-register write verified by decoding back through the fake register bank); Write to InputRegister returns BadNotWritable; Discover streams one folder + one variable per tag with correct DriverDataType mapping (Int16/Int32→Int32, UInt16/UInt32→Int32, Float32→Float32, Bool→Boolean). FakeTransport simulates a 256-register/256-coil bank + implements the 7 function codes the driver uses. slnx updated with the new src + tests entries. Full solution post-add: 0 errors, 189 tests pass (9 Modbus + 180 pre-existing). IDriver abstraction validated against a fundamentally different protocol — Modbus TCP has no AlarmExtension, no ScanState, no IPC boundary, no historian, no LDAP — and the same builder/reader/writer contract plugged straight in. Future PRs on this driver: ISubscribable via a polling loop, IHostConnectivityProbe for dead-device detection, PLC-specific data-type extensions (Int64/BCD/string-in-registers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:55:21 -04:00
52791952dd Merge pull request 'Phase 3 PR 20 — lmx-followups.md' (#19) from phase-3-pr20-lmx-followups into v2 2026-04-18 11:50:38 -04:00
Joseph Doherty
860deb8e0d Phase 3 PR 20 — lmx-followups.md: track remaining Galaxy-bridge tasks after PR 19 (HistoryReadAtTime/Events Proxy wiring, write-gating by role, Admin UI cert trust, live-LDAP integration test, full-stack Galaxy smoke, multi-driver test, per-host dashboard). Documents what each item depends on, the shipped surface it builds on, and the minimal to-do so a future session can pick any one off in isolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:43:15 -04:00
f5e7173de3 Merge pull request 'Phase 3 PR 19 — LDAP user identity + Basic256Sha256 security profile' (#18) from phase-3-pr19-ldap-security into v2 2026-04-18 11:36:18 -04:00