Files
lmxopcua/docs/v2
Joseph Doherty 9de96554dc Phase 3 PR 41 — Document AutomationDirect DL205 / DL260 Modbus quirks. Adds docs/v2/dl205.md (~300 lines, 8 H2 sections, primary-source citations) covering every place the DL205/DL260 family diverges from textbook Modbus or has non-obvious behavior a generic client gets wrong. Replaces the placeholder _pending_ list in modbus-test-plan.md with a confirmed-behaviors table that doubles as the integration-test roadmap.
The user explicitly flagged that DL205/DL260 strings don't follow Modbus convention; research turned up that and a lot more. Headline findings:
String packing — TWO chars per V-memory register but the FIRST char is in the LOW byte (opposite of the big-endian Modbus convention generic drivers default to). 'Hello' in V2000 reads back as 'eHll o\0' on a textbook decoder. Kepware's DirectLogic driver exposes a per-tag 'String Byte Order = Low/High' toggle specifically for this; we'll need the same. Null-terminated, no length prefix, no dedicated KSTR address space — strings live wherever ladder allocates them in V-memory.
V-memory addressing — DirectLOGIC's native V-memory is OCTAL (V2000, V40400) but Modbus is decimal. The CPU translates: V2000 octal = decimal 1024 = Modbus PDU 0x0400. The widespread 'V40400 = register 0' shorthand is wrong on modern firmware (that was DL05/DL06 relative mode); on H2-ECOM100 absolute mode (factory default) V40400 = PDU 0x2100. We'd surface this with an address-format helper in the device profile so operators write V2000 instead of computing 1024 by hand.
Word order CDAB for all 32-bit values — DL205 and DL260 agree, ECOM modules don't re-swap. Already supported via ModbusByteOrder.WordSwap; just needs to be the default in the DL205 profile.
BCD-as-default numeric storage — bit one I didn't expect. DirectLOGIC stores 'V2000 = 1234' as 0x1234 on the wire (BCD nibbles), not as 0x04D2 (decimal 1234). IEEE 754 Float32 only works when ladder used the explicit R type (LDR/OUTR instructions). We need a new decoder mode for BCD-encoded registers — current code assumes binary integers.
FC quantity caps — FC03/04 cap at 128 (above spec's 125 — Bonus territory, current code already respects 125), FC16 caps at 100 (BELOW spec's 123 — important bulk-write batching gotcha). Quantity overrun returns exception 03 IllegalDataValue.
Coil/discrete mappings — DL260: X0->discrete input 0, Y0->coil 2048, C0->coil 3072. SP specials at discrete input 1024-1535 RO. These are CPU-wired constants and cannot be remapped; need to be hardcoded in the DL205/DL260 device profile.
Register 0 — accepted on DL205/DL260 with ECOM in absolute mode, contrary to the widespread internet claim that 'DirectLOGIC rejects register 0'. That rumour was an older DL05/DL06 relative-mode artefact. Our ModbusProbeOptions.ProbeAddress default of 0 is therefore safe for DL205/DL260.
Exception codes — only the standard 01-04. Write-to-protected-bit returns 02 on newer firmware, 04 on older (firmware-transition revision unconfirmed); driver should map both to BadNotWritable. No proprietary exception codes.
Behavioral oddities — H2-ECOM100 accepts MAX 4 simultaneous TCP connections (5th refused at TCP accept). No TCP keepalive (intermediate NAT/firewall drops idle sockets after 2-5 min — periodic probe required). No mid-stream resync on malformed MBAP — driver must reconnect + replay. TxId-drop-under-load forum rumour is unconfirmed; our single-flight + TxId-match guard handles it either way.
Each H2 section ends with the integration-test names we'd ship per the modbus-test-plan.md DL205_<behavior> convention — twelve named test slots ready for PR 42+ to fill in one at a time. References (8) cited inline, primarily D2-USER-M, HA-ECOM-M, and the Kepware DirectLogic Ethernet driver manual which documents these vendor quirks explicitly because they have to cope with them.
modbus-test-plan.md DL205 section rewritten as a priority-ordered table with three columns (quirk / driver impact / test name), pointing the reader at dl205.md for the full reference. Operator-reported items separated into a tail subsection so future-me knows which behaviors are documented vs reproduced-on-hardware.
Pure documentation PR — no code changes. The actual driver work (string-byte-order option, BCD decoder mode, V-memory address helper, FC16 cap-per-device-family, multi-client TCP handling) lands one PR per quirk in PR 42+ as ModbusPal validation completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:49:35 -04:00
..
Phase 2 PR 4 — close the 4 open high/medium MXAccess findings from exit-gate-phase-2-final.md. High 1 (ReadAsync subscription-leak on cancel): the one-shot read now wraps subscribe→first-OnDataChange→unsubscribe in try/finally so the per-tag callback is always detached, and if the read installed the underlying MXAccess subscription itself (the prior _addressToHandle key was absent) it tears it down on the way out — no leaked probe item handles when the caller cancels or times out. High 2 (no reconnect loop): MxAccessClient gets a MxAccessClientOptions {AutoReconnect, MonitorInterval=5s, StaleThreshold=60s} + a background MonitorLoopAsync started at first ConnectAsync. The loop wakes every MonitorInterval, checks _lastObservedActivityUtc (bumped by every OnDataChange callback), and if stale probes the proxy with a no-op COM AddItem("$Heartbeat") on the StaPump; if the probe throws or returns false, the loop reconnects-with-replay — Unregister (best-effort), Register, snapshot _addressToHandle.Keys + clear, re-AddItem every previously-active subscription, ConnectionStateChanged events fire for the false→true transition, ReconnectCount bumps. Medium 3 (subscriptions don't push frames back to Proxy): IGalaxyBackend gains OnDataChange/OnAlarmEvent/OnHostStatusChanged events; new IFrameHandler.AttachConnection(FrameWriter) is called per-connection by PipeServer after Hello + the returned IDisposable disposes at connection close; GalaxyFrameHandler.ConnectionSink subscribes the events for the connection lifetime, fire-and-forget pushes them as MessageKind.OnDataChangeNotification / AlarmEvent / RuntimeStatusChange frames through the writer, swallows ObjectDisposedException for the dispose race, and unsubscribes in Dispose to prevent leaked invocation list refs across reconnects. MxAccessGalaxyBackend's existing SubscribeAsync (which previously discarded values via a (_, __) => {} callback) now wires OnTagValueChanged that fans out per-tag value changes to every subscription ID listening (one MXAccess subscription, multi-fan-out — _refToSubs reverse map). UnsubscribeAsync also reverse-walks the map to only call mx.UnsubscribeAsync when the LAST sub for a tag drops. Stub + DbBacked backends declare the events with #pragma warning disable CS0067 because they never raise them but must satisfy the interface (treat-warnings-as-errors would otherwise fail). Medium 4 (WriteValuesAsync doesn't await OnWriteComplete): MxAccessClient.WriteAsync rewritten to return Task<bool> via the v1-style TaskCompletionSource-keyed-by-item-handle pattern in _pendingWrites — adds the TCS before the Write call, awaits it with a configurable timeout (default 5s), removes the TCS in finally, returns true only when OnWriteComplete reported success. MxAccessGalaxyBackend.WriteValuesAsync now reports per-tag Bad_InternalError ("MXAccess runtime reported write failure") when the bool returns false, instead of false-positive Good. PipeServer's IFrameHandler interface adds the AttachConnection(FrameWriter):IDisposable method + a public NoopAttachment nested class (net48 doesn't support default interface methods so the empty-attach is exposed for stub implementations). StubFrameHandler returns IFrameHandler.NoopAttachment.Instance. RunOneConnectionAsync calls AttachConnection after HelloAck and usings the returned disposable so it disposes at the connection scope's finally. ConnectionStateChanged event added on MxAccessClient (caller-facing diagnostics for false→true reconnect transitions). docs/v2/implementation/pr-4-body.md is the Gitea web-UI paste-in for opening PR 4 once pushed; includes 2 new low-priority adversarial findings (probe item-handle leak; replay-loop silently swallows per-subscription failures) flagged as follow-ups not PR 4 blockers. Full solution 460 pass / 7 skip (E2E on admin shell) / 1 pre-existing Phase 0 baseline. No regressions vs PR 2's baseline.
2026-04-18 01:12:09 -04:00
Doc — record that this dev box (DESKTOP-6JL3KKO) hosts the full AVEVA stack required for the LmxOpcUa Phase 2 breakout, removing the "needs live MXAccess runtime" environmental blocker that the partial-exit evidence cited as gating Streams D + E. Inventory verified via Get-Service: 27 ArchestrA / Wonderware / AVEVA services running including aaBootstrap, aaGR (Galaxy Repository), aaLogger, aaUserValidator, aaPim, ArchestrADataStore, AsbServiceManager, AutoBuild_Service; the full Historian set (aahClientAccessPoint, aahGateway, aahInSight, aahSearchIndexer, aahSupervisor, InSQLStorage, InSQLConfiguration, InSQLEventSystem, InSQLIndexing, InSQLIOServer, InSQLManualStorage, InSQLSystemDriver, HistorianSearch-x64); slssvc (Wonderware SuiteLink); MXAccess COM DLL at C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll plus the matching .tlb files; OI-Gateway install at C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\ — which means the Phase 1 Task E.10 AppServer-via-OI-Gateway smoke test (decision #142) is *also* runnable on the same box, not blocked on a separate AVEVA test machine as the original deferral assumed. dev-environment.md inventory row for "Dev Galaxy" now lists every service and file path; status flips to "Fully available — Phase 2 lift unblocked"; the GLAuth row also fills out v2.4.0 actual install details (direct-bind cn={user},dc=lmxopcua,dc=local; users readonly/writeop/writetune/writeconfig/alarmack/admin/serviceaccount; running under NSSM service GLAuth; current GroupToRole mapping ReadOnly→ConfigViewer / WriteOperate→ConfigEditor / AlarmAck→FleetAdmin) and notes the v2-rebrand to dc=otopcua,dc=local is a future cosmetic change. phase-2-partial-exit-evidence.md status header gains "runtime now in place"; an Update 2026-04-17 callout enumerates the same service inventory and concludes "no environmental blocker remains"; the next-session checklist's first step changes from "stand up dev Galaxy" to "verify the local AVEVA stack is still green (Get-Service aaGR, aaBootstrap, slssvc → Running) and the Galaxy ZB repository is reachable" with a new step 9 calling out that the AppServer-via-OI-Gateway smoke test should now be folded in opportunistically. plan.md §"4. Galaxy/MXAccess as Out-of-Process Driver" gains a "Dev environment for the LmxOpcUa breakout" paragraph documenting which physical machine has the runtime so the planning doc no longer reads as if AVEVA capability were a future logistical concern. No source / test changes.
2026-04-17 22:42:15 -04:00
Harden v2 design against the four findings from the 2026-04-17 Codex adversarial review of the db schema and admin UI: (1) DriverInstance.NamespaceId now enforces a same-cluster invariant in three layers (sp_ValidateDraft cross-table check using the new UX_Namespace_Generation_LogicalId_Cluster composite index, server-side namespace-selection API scoping that prevents bypass via crafted requests, and audit-log entries on cross-cluster attempts) so a draft for cluster A can no longer bind to cluster B's namespace and leak its URI into A's endpoint; (2) the Namespace table moves from cluster-level to generation-versioned with append-only logical-ID identity and locked NamespaceUri/Kind across generations so admins can no longer disable a namespace that a published driver depends on outside the publish/diff/rollback flow, the cluster-create workflow opens an initial draft containing the default namespaces instead of writing namespace rows directly, and the Admin UI Namespaces tab becomes hybrid (read-only over published, click-to-edit opens draft) like the UNS Structure tab; (3) ZTag/SAPID fleet-wide uniqueness moves from per-generation indexes (which silently allow rollback or re-enable to reintroduce duplicates) into a new ExternalIdReservation table that sits outside generation versioning, with sp_PublishGeneration reserving atomically via MERGE under transaction lock so a different EquipmentUuid attempting the same active value rolls the whole publish back, an FleetAdmin-only sp_ReleaseExternalIdReservation as the only path to free a value for reuse with audit trail, and a corresponding Release-reservation operator workflow in the Admin UI; (4) Equipment.EquipmentId is now system-generated as 'EQ-' + first 12 hex chars of EquipmentUuid, never operator-supplied or editable, removed from the Equipment CSV import schema entirely (rows match by EquipmentUuid for updates or create new equipment with auto-generated identifiers when no UUID is supplied), with a new Merge-or-Rebind-equipment operator workflow handling the rare case where two UUIDs need to be reconciled — closing the corruption path where typos and bulk-import renames were minting duplicate identities and breaking downstream UUID-keyed lineage. New decisions #122-125 with explicit "supersedes" notes for the earlier #107 (cluster-level namespace) and #116 (operator-set EquipmentId) frames they revise.
2026-04-17 11:08:58 -04:00
Doc — record that this dev box (DESKTOP-6JL3KKO) hosts the full AVEVA stack required for the LmxOpcUa Phase 2 breakout, removing the "needs live MXAccess runtime" environmental blocker that the partial-exit evidence cited as gating Streams D + E. Inventory verified via Get-Service: 27 ArchestrA / Wonderware / AVEVA services running including aaBootstrap, aaGR (Galaxy Repository), aaLogger, aaUserValidator, aaPim, ArchestrADataStore, AsbServiceManager, AutoBuild_Service; the full Historian set (aahClientAccessPoint, aahGateway, aahInSight, aahSearchIndexer, aahSupervisor, InSQLStorage, InSQLConfiguration, InSQLEventSystem, InSQLIndexing, InSQLIOServer, InSQLManualStorage, InSQLSystemDriver, HistorianSearch-x64); slssvc (Wonderware SuiteLink); MXAccess COM DLL at C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll plus the matching .tlb files; OI-Gateway install at C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\ — which means the Phase 1 Task E.10 AppServer-via-OI-Gateway smoke test (decision #142) is *also* runnable on the same box, not blocked on a separate AVEVA test machine as the original deferral assumed. dev-environment.md inventory row for "Dev Galaxy" now lists every service and file path; status flips to "Fully available — Phase 2 lift unblocked"; the GLAuth row also fills out v2.4.0 actual install details (direct-bind cn={user},dc=lmxopcua,dc=local; users readonly/writeop/writetune/writeconfig/alarmack/admin/serviceaccount; running under NSSM service GLAuth; current GroupToRole mapping ReadOnly→ConfigViewer / WriteOperate→ConfigEditor / AlarmAck→FleetAdmin) and notes the v2-rebrand to dc=otopcua,dc=local is a future cosmetic change. phase-2-partial-exit-evidence.md status header gains "runtime now in place"; an Update 2026-04-17 callout enumerates the same service inventory and concludes "no environmental blocker remains"; the next-session checklist's first step changes from "stand up dev Galaxy" to "verify the local AVEVA stack is still green (Get-Service aaGR, aaBootstrap, slssvc → Running) and the Galaxy ZB repository is reachable" with a new step 9 calling out that the AppServer-via-OI-Gateway smoke test should now be folded in opportunistically. plan.md §"4. Galaxy/MXAccess as Out-of-Process Driver" gains a "Dev environment for the LmxOpcUa breakout" paragraph documenting which physical machine has the runtime so the planning doc no longer reads as if AVEVA capability were a future logistical concern. No source / test changes.
2026-04-17 22:42:15 -04:00
Phase 2 Stream D Option B — archive v1 surface + new Driver.Galaxy.E2E parity suite. Non-destructive intermediate state: the v1 OtOpcUa.Host + Historian.Aveva + Tests + IntegrationTests projects all still build (494 v1 unit + 6 v1 integration tests still pass when run explicitly), but solution-level dotnet test ZB.MOM.WW.OtOpcUa.slnx now skips them via IsTestProject=false on the test projects + archive-status PropertyGroup comments on the src projects. The destructive deletion is reserved for Phase 2 PR 3 with explicit operator review per CLAUDE.md "only use destructive operations when truly the best approach". tests/ZB.MOM.WW.OtOpcUa.Tests/ renamed via git mv to tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/; csproj <AssemblyName> kept as the original ZB.MOM.WW.OtOpcUa.Tests so v1 OtOpcUa.Host's [InternalsVisibleTo("ZB.MOM.WW.OtOpcUa.Tests")] still matches and the project rebuilds clean. tests/ZB.MOM.WW.OtOpcUa.IntegrationTests gets <IsTestProject>false</IsTestProject>. src/ZB.MOM.WW.OtOpcUa.Host + src/ZB.MOM.WW.OtOpcUa.Historian.Aveva get PropertyGroup archive-status comments documenting they're functionally superseded but kept in-build because cascading dependencies (Historian.Aveva → Host; IntegrationTests → Host) make a single-PR deletion high blast-radius. New tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ project (.NET 10) with ParityFixture that spawns OtOpcUa.Driver.Galaxy.Host.exe (net48 x86) as a Process.Start subprocess with OTOPCUA_GALAXY_BACKEND=db env vars, awaits 2s for the PipeServer to bind, then exposes a connected GalaxyProxyDriver; skips on non-Windows / Administrator shells (PipeAcl denies admins per decision #76) / ZB unreachable / Host EXE not built — each skip carries a SkipReason string the test method reads via Assert.Skip(SkipReason). RecordingAddressSpaceBuilder captures every Folder/Variable/AddProperty registration so parity tests can assert on the same shape v1 LmxNodeManager produced. HierarchyParityTests (3) — Discover returns gobjects with attributes; attribute full references match the tag.attribute Galaxy reference grammar; HistoryExtension flag flows through correctly. StabilityFindingsRegressionTests (4) — one test per 2026-04-13 stability finding from commits c76ab8f and 7310925: phantom probe subscription doesn't corrupt unrelated host status; HostStatusChangedEventArgs structurally carries a specific HostName + OldState + NewState (event signature mathematically prevents the v1 cross-host quality-clear bug); all GalaxyProxyDriver capability methods return Task or Task<T> (sync-over-async would deadlock OPC UA stack thread); AcknowledgeAsync completes before returning (no fire-and-forget background work that could race shutdown). Solution test count: 470 pass / 7 skip (E2E on admin shell) / 1 pre-existing Phase 0 baseline. Run archived suites explicitly: dotnet test tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive (494 pass) + dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests (6 pass). docs/v2/V1_ARCHIVE_STATUS.md inventories every archived surface with run-it-explicitly instructions + a 10-step deletion plan for PR 3 + rollback procedure (git revert restores all four projects). docs/v2/implementation/exit-gate-phase-2-final.md supersedes the two partial-exit docs with the per-stream status table (A/B/C/D/E all addressed, D split across PR 2/3 per safety protocol), the test count breakdown, fresh adversarial review of PR 2 deltas (4 new findings: medium IsTestProject=false safety net loss, medium structural-vs-behavioral stability tests, low backend=db default, low Process.Start env inheritance), the 8 carried-forward findings from exit-gate-phase-2.md, the recommended PR order (1 → 2 → 3 → 4). docs/v2/implementation/pr-2-body.md is the Gitea web-UI paste-in for opening PR 2 once pushed.
2026-04-18 00:56:21 -04:00