Files
lmxopcua/docs/v2
Joseph Doherty 4695a5c88e Phase 6 — Draft 4 implementation plans covering v2 unimplemented features + adversarial review + adjustments. After drivers were paused per user direction, audited the v2 plan for features documented-but-unshipped and identified four coherent tracks that had no implementation plan at all. Each plan follows the docs/v2/implementation/phase-*.md template (DRAFT status, branch name, Stream A-E task breakdown, Compliance Checks, Risks, Completion Checklist). docs/v2/implementation/phase-6-1-resilience-and-observability.md (243 lines) covers Polly resilience pipelines wired to every capability interface, Tier A/B/C runtime enforcement (memory watchdog generalized beyond Galaxy, scheduled recycle per decision #67, wedge detection), health endpoints on :4841, structured Serilog with correlation IDs, LiteDB local-cache fallback per decision #36. phase-6-2-authorization-runtime.md (145 lines) wires ACL enforcement on every OPC UA Read/Write/Subscribe/Call path + LDAP-group-to-admin-role grants per decisions #105 and #129 -- runtime permission-trie evaluator over the 6-level Cluster/Namespace/UnsArea/UnsLine/Equipment/Tag hierarchy, per-session cache invalidated on generation-apply + LDAP-cache expiry. phase-6-3-redundancy-runtime.md (165 lines) lands the non-transparent warm/hot redundancy runtime per decisions #79-85: dynamic ServiceLevel node, ServerUriArray peer broadcast, mid-apply dip via sp_PublishGeneration hook, operator-driven role transition (no auto-election -- plan remains explicit about what's out of scope). phase-6-4-admin-ui-completion.md (178 lines) closes Phase 1 Stream E completion-checklist items that never landed: UNS drag-reorder + impact preview, Equipment CSV import, 5-identifier search, draft-diff viewer enhancements, OPC 40010 _base Identification field exposure per decisions #138-139. Each plan then got a Codex adversarial-review pass (codex mcp tool, read-only sandbox, synchronous). Reviews explicitly targeted decision-log conflicts, API-shape assumptions, unbounded blast radius, under-specified state transitions, and testing holes. Appended 'Adversarial Review — 2026-04-19' section to each plan with numbered findings (severity / finding / why-it-matters / adjustment accepted). Review surfaced real substantive issues that the initial drafts glossed over: Phase 6.1 auto-retry conflicting with decisions #44-45 no-auto-write-retry rule; Phase 6.1 per-driver-instance pipeline breaking decision #35's per-device isolation; Phase 6.1 recycle/watchdog at Tier A/B breaching decisions #73-74 Tier-C-only constraint; Phase 6.2 conflating control-plane LdapGroupRoleMapping with data-plane ACL grants; Phase 6.2 missing Browse enforcement entirely; Phase 6.2 subscription re-authorization policy unresolved between create-time-only and per-publish; Phase 6.3 ServiceLevel=0 colliding with OPC UA Part 5 Maintenance semantics; Phase 6.3 ServerUriArray excluding self (spec-bug); Phase 6.3 apply-window counter race on cancellation; Phase 6.3 client cutover for Kepware/Aveva OI Gateway is unverified hearsay; Phase 6.4 stale UNS impact preview overwriting concurrent draft edits; Phase 6.4 identifier contract drifting from admin-ui.md canonical set (ZTag/MachineCode/SAPID/EquipmentId/EquipmentUuid, not ZTag/SAPID/UniqueId/Alias1/Alias2); Phase 6.4 CSV import atomicity internally contradictory (single txn vs chunked inserts); Phase 6.4 OPC 40010 field list not matching decision #139. Every finding has an adjustment in the plan doc -- plans are meant to be executable from the next session with the critique already baked in rather than a clean draft that would run into the same issues at implementation time. Codex thread IDs cited in each plan's review section for reproducibility. Pure documentation PR -- no code changes. Plans are DRAFT status; each becomes its own implementation phase with its own entry-gate + exit-gate when business prioritizes.
2026-04-19 03:15:00 -04:00
..
Phase 6 — Draft 4 implementation plans covering v2 unimplemented features + adversarial review + adjustments. After drivers were paused per user direction, audited the v2 plan for features documented-but-unshipped and identified four coherent tracks that had no implementation plan at all. Each plan follows the docs/v2/implementation/phase-*.md template (DRAFT status, branch name, Stream A-E task breakdown, Compliance Checks, Risks, Completion Checklist). docs/v2/implementation/phase-6-1-resilience-and-observability.md (243 lines) covers Polly resilience pipelines wired to every capability interface, Tier A/B/C runtime enforcement (memory watchdog generalized beyond Galaxy, scheduled recycle per decision #67, wedge detection), health endpoints on :4841, structured Serilog with correlation IDs, LiteDB local-cache fallback per decision #36. phase-6-2-authorization-runtime.md (145 lines) wires ACL enforcement on every OPC UA Read/Write/Subscribe/Call path + LDAP-group-to-admin-role grants per decisions #105 and #129 -- runtime permission-trie evaluator over the 6-level Cluster/Namespace/UnsArea/UnsLine/Equipment/Tag hierarchy, per-session cache invalidated on generation-apply + LDAP-cache expiry. phase-6-3-redundancy-runtime.md (165 lines) lands the non-transparent warm/hot redundancy runtime per decisions #79-85: dynamic ServiceLevel node, ServerUriArray peer broadcast, mid-apply dip via sp_PublishGeneration hook, operator-driven role transition (no auto-election -- plan remains explicit about what's out of scope). phase-6-4-admin-ui-completion.md (178 lines) closes Phase 1 Stream E completion-checklist items that never landed: UNS drag-reorder + impact preview, Equipment CSV import, 5-identifier search, draft-diff viewer enhancements, OPC 40010 _base Identification field exposure per decisions #138-139. Each plan then got a Codex adversarial-review pass (codex mcp tool, read-only sandbox, synchronous). Reviews explicitly targeted decision-log conflicts, API-shape assumptions, unbounded blast radius, under-specified state transitions, and testing holes. Appended 'Adversarial Review — 2026-04-19' section to each plan with numbered findings (severity / finding / why-it-matters / adjustment accepted). Review surfaced real substantive issues that the initial drafts glossed over: Phase 6.1 auto-retry conflicting with decisions #44-45 no-auto-write-retry rule; Phase 6.1 per-driver-instance pipeline breaking decision #35's per-device isolation; Phase 6.1 recycle/watchdog at Tier A/B breaching decisions #73-74 Tier-C-only constraint; Phase 6.2 conflating control-plane LdapGroupRoleMapping with data-plane ACL grants; Phase 6.2 missing Browse enforcement entirely; Phase 6.2 subscription re-authorization policy unresolved between create-time-only and per-publish; Phase 6.3 ServiceLevel=0 colliding with OPC UA Part 5 Maintenance semantics; Phase 6.3 ServerUriArray excluding self (spec-bug); Phase 6.3 apply-window counter race on cancellation; Phase 6.3 client cutover for Kepware/Aveva OI Gateway is unverified hearsay; Phase 6.4 stale UNS impact preview overwriting concurrent draft edits; Phase 6.4 identifier contract drifting from admin-ui.md canonical set (ZTag/MachineCode/SAPID/EquipmentId/EquipmentUuid, not ZTag/SAPID/UniqueId/Alias1/Alias2); Phase 6.4 CSV import atomicity internally contradictory (single txn vs chunked inserts); Phase 6.4 OPC 40010 field list not matching decision #139. Every finding has an adjustment in the plan doc -- plans are meant to be executable from the next session with the critique already baked in rather than a clean draft that would run into the same issues at implementation time. Codex thread IDs cited in each plan's review section for reproducibility. Pure documentation PR -- no code changes. Plans are DRAFT status; each becomes its own implementation phase with its own entry-gate + exit-gate when business prioritizes.
2026-04-19 03:15:00 -04:00
Doc — record that this dev box (DESKTOP-6JL3KKO) hosts the full AVEVA stack required for the LmxOpcUa Phase 2 breakout, removing the "needs live MXAccess runtime" environmental blocker that the partial-exit evidence cited as gating Streams D + E. Inventory verified via Get-Service: 27 ArchestrA / Wonderware / AVEVA services running including aaBootstrap, aaGR (Galaxy Repository), aaLogger, aaUserValidator, aaPim, ArchestrADataStore, AsbServiceManager, AutoBuild_Service; the full Historian set (aahClientAccessPoint, aahGateway, aahInSight, aahSearchIndexer, aahSupervisor, InSQLStorage, InSQLConfiguration, InSQLEventSystem, InSQLIndexing, InSQLIOServer, InSQLManualStorage, InSQLSystemDriver, HistorianSearch-x64); slssvc (Wonderware SuiteLink); MXAccess COM DLL at C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll plus the matching .tlb files; OI-Gateway install at C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\ — which means the Phase 1 Task E.10 AppServer-via-OI-Gateway smoke test (decision #142) is *also* runnable on the same box, not blocked on a separate AVEVA test machine as the original deferral assumed. dev-environment.md inventory row for "Dev Galaxy" now lists every service and file path; status flips to "Fully available — Phase 2 lift unblocked"; the GLAuth row also fills out v2.4.0 actual install details (direct-bind cn={user},dc=lmxopcua,dc=local; users readonly/writeop/writetune/writeconfig/alarmack/admin/serviceaccount; running under NSSM service GLAuth; current GroupToRole mapping ReadOnly→ConfigViewer / WriteOperate→ConfigEditor / AlarmAck→FleetAdmin) and notes the v2-rebrand to dc=otopcua,dc=local is a future cosmetic change. phase-2-partial-exit-evidence.md status header gains "runtime now in place"; an Update 2026-04-17 callout enumerates the same service inventory and concludes "no environmental blocker remains"; the next-session checklist's first step changes from "stand up dev Galaxy" to "verify the local AVEVA stack is still green (Get-Service aaGR, aaBootstrap, slssvc → Running) and the Galaxy ZB repository is reachable" with a new step 9 calling out that the AppServer-via-OI-Gateway smoke test should now be folded in opportunistically. plan.md §"4. Galaxy/MXAccess as Out-of-Process Driver" gains a "Dev environment for the LmxOpcUa breakout" paragraph documenting which physical machine has the runtime so the planning doc no longer reads as if AVEVA capability were a future logistical concern. No source / test changes.
2026-04-17 22:42:15 -04:00
Harden v2 design against the four findings from the 2026-04-17 Codex adversarial review of the db schema and admin UI: (1) DriverInstance.NamespaceId now enforces a same-cluster invariant in three layers (sp_ValidateDraft cross-table check using the new UX_Namespace_Generation_LogicalId_Cluster composite index, server-side namespace-selection API scoping that prevents bypass via crafted requests, and audit-log entries on cross-cluster attempts) so a draft for cluster A can no longer bind to cluster B's namespace and leak its URI into A's endpoint; (2) the Namespace table moves from cluster-level to generation-versioned with append-only logical-ID identity and locked NamespaceUri/Kind across generations so admins can no longer disable a namespace that a published driver depends on outside the publish/diff/rollback flow, the cluster-create workflow opens an initial draft containing the default namespaces instead of writing namespace rows directly, and the Admin UI Namespaces tab becomes hybrid (read-only over published, click-to-edit opens draft) like the UNS Structure tab; (3) ZTag/SAPID fleet-wide uniqueness moves from per-generation indexes (which silently allow rollback or re-enable to reintroduce duplicates) into a new ExternalIdReservation table that sits outside generation versioning, with sp_PublishGeneration reserving atomically via MERGE under transaction lock so a different EquipmentUuid attempting the same active value rolls the whole publish back, an FleetAdmin-only sp_ReleaseExternalIdReservation as the only path to free a value for reuse with audit trail, and a corresponding Release-reservation operator workflow in the Admin UI; (4) Equipment.EquipmentId is now system-generated as 'EQ-' + first 12 hex chars of EquipmentUuid, never operator-supplied or editable, removed from the Equipment CSV import schema entirely (rows match by EquipmentUuid for updates or create new equipment with auto-generated identifiers when no UUID is supplied), with a new Merge-or-Rebind-equipment operator workflow handling the rare case where two UUIDs need to be reconciled — closing the corruption path where typos and bulk-import renames were minting duplicate identities and breaking downstream UUID-keyed lineage. New decisions #122-125 with explicit "supersedes" notes for the earlier #107 (cluster-level namespace) and #116 (operator-set EquipmentId) frames they revise.
2026-04-17 11:08:58 -04:00
Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research document. 451-line doc at docs/v2/mitsubishi.md mirroring the docs/v2/dl205.md template for the MELSEC family (Q-series + QJ71MT91, L-series + LJ71MT91, iQ-R + RJ71EN71, iQ-R built-in Ethernet, iQ-F FX5U built-in, FX3U + FX3U-ENET / FX3U-ENET-P502, FX3GE built-in). Like Siemens S7, MELSEC Modbus is a patchwork of per-site-configured add-on modules rather than a fixed firmware stack, but the MELSEC-specific traps are different enough to warrant their own document. Key findings worth flagging for the PR 58+ implementation track: (1) MODULE NAMING TRAP -- QJ71MB91 is SERIAL RTU, not TCP. The Q-series TCP module is QJ71MT91. Driver docs + config UI should surface this clearly because the confusion costs operators hours when they try to connect to an RS-232 module via Ethernet. (2) NO CANONICAL MAPPING -- every MELSEC Modbus site has a unique 'Modbus Device Assignment Parameter' block of up to 16 assignments (each binding a MELSEC device range like D0..D1023 to a Modbus-address range); the driver must treat the mapping as runtime config, not device-family profile. (3) X/Y BASE DEPENDS ON FAMILY -- Q/L/iQ-R use HEX notation for X/Y (X20 = decimal 32), FX/iQ-F use OCTAL (X20 = decimal 16, same as DL260); iQ-F has a GX Works3 project toggle that can flip this. Single biggest off-by-N source in MELSEC driver code -- driver address helper must take a family selector. (4) Word order CDAB across Q/L/iQ-R/iQ-F by default (CPU-level, not module-level) -- no user-configurable swap on the server side. FX5U's SWAP instruction is for CLIENT mode only. Driver Mitsubishi profile default must be ByteOrder.WordSwap, matching DL260 but OPPOSITE of Siemens S7. (5) D-registers are BINARY by default (opposite of DL205's BCD-by-default). FNC 18 BCD / FNC 19 BIN instructions confirm binary-by-default in the ladder. Caller must explicitly opt-in to Bcd16/Bcd32 tags when the ladder stores BCD, same pattern as DL205 but the default is inverted. (6) FX5U FIRMWARE GATE -- needs firmware >= 1.060 for native Modbus TCP server; older firmware is client-only. Surface a clear capability error on connect. (7) FX3U PORT 502 SPLIT -- the standard FX3U-ENET cannot bind port 502 (lower port range restricted on the firmware); only FX3U-ENET-P502 can. FX3U-ENET-ADP has no Modbus at all and is a common operator mis-purchase -- driver should surface 'module does not support Modbus' as a distinct error, not 'connection refused'. (8) QJ71MT91 does NOT support FC22 (Mask Write) or FC23 (Read-Write Multiple). iQ-R and iQ-F do. Driver bulk-read optimization must gate on module capability. (9) MAX CONNECTIONS -- 16 simultaneous on Q/L/iQ-R, 8 on FX5U and FX3U-ENET. (10) STOP-mode writes -- configurable on Q/L/iQ-R/iQ-F (default = accept writes even in STOP), always rejected with exception 04 on FX3U-ENET. Per-model test differentiation section names the tests Mitsubishi_QJ71MT91_*, Mitsubishi_FX5U_*, Mitsubishi_FX3U_ENET_*, with a shared Mitsubishi_Common_* fixture for CDAB-word-order + binary-not-BCD + standard-exception-codes tests. 17 cited references including primary Mitsubishi manuals (SH-080446 for QJ71MT91, JY997D56101 for FX5, SH-081259 for iQ-R Ethernet, JY997D18101 for FX3U-ENET) plus Ignition / Kepware / Fernhill / HMS third-party driver release notes. Three unconfirmed rumours flagged explicitly: iQ-R RJ71EN71 early firmware rumoured ABCD word order (no primary source), QJ71MT91 firmware < 2010-05 FC15 odd-byte-count truncation (forum report only), FX3U-ENET firmware < 1.14 out-of-order TxId echoes under load (unreproducible on bench). Pure documentation PR -- no code, no tests. Per-quirk implementation lands in PRs 58+. Research conducted 2026-04-18.
2026-04-18 22:51:28 -04:00
Doc — record that this dev box (DESKTOP-6JL3KKO) hosts the full AVEVA stack required for the LmxOpcUa Phase 2 breakout, removing the "needs live MXAccess runtime" environmental blocker that the partial-exit evidence cited as gating Streams D + E. Inventory verified via Get-Service: 27 ArchestrA / Wonderware / AVEVA services running including aaBootstrap, aaGR (Galaxy Repository), aaLogger, aaUserValidator, aaPim, ArchestrADataStore, AsbServiceManager, AutoBuild_Service; the full Historian set (aahClientAccessPoint, aahGateway, aahInSight, aahSearchIndexer, aahSupervisor, InSQLStorage, InSQLConfiguration, InSQLEventSystem, InSQLIndexing, InSQLIOServer, InSQLManualStorage, InSQLSystemDriver, HistorianSearch-x64); slssvc (Wonderware SuiteLink); MXAccess COM DLL at C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll plus the matching .tlb files; OI-Gateway install at C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\ — which means the Phase 1 Task E.10 AppServer-via-OI-Gateway smoke test (decision #142) is *also* runnable on the same box, not blocked on a separate AVEVA test machine as the original deferral assumed. dev-environment.md inventory row for "Dev Galaxy" now lists every service and file path; status flips to "Fully available — Phase 2 lift unblocked"; the GLAuth row also fills out v2.4.0 actual install details (direct-bind cn={user},dc=lmxopcua,dc=local; users readonly/writeop/writetune/writeconfig/alarmack/admin/serviceaccount; running under NSSM service GLAuth; current GroupToRole mapping ReadOnly→ConfigViewer / WriteOperate→ConfigEditor / AlarmAck→FleetAdmin) and notes the v2-rebrand to dc=otopcua,dc=local is a future cosmetic change. phase-2-partial-exit-evidence.md status header gains "runtime now in place"; an Update 2026-04-17 callout enumerates the same service inventory and concludes "no environmental blocker remains"; the next-session checklist's first step changes from "stand up dev Galaxy" to "verify the local AVEVA stack is still green (Get-Service aaGR, aaBootstrap, slssvc → Running) and the Galaxy ZB repository is reachable" with a new step 9 calling out that the AppServer-via-OI-Gateway smoke test should now be folded in opportunistically. plan.md §"4. Galaxy/MXAccess as Out-of-Process Driver" gains a "Dev environment for the LmxOpcUa breakout" paragraph documenting which physical machine has the runtime so the planning doc no longer reads as if AVEVA capability were a future logistical concern. No source / test changes.
2026-04-17 22:42:15 -04:00
Phase 3 PR 54 -- Siemens S7 Modbus TCP quirks research document. 485-line doc at docs/v2/s7.md mirroring the docs/v2/dl205.md template for the Siemens SIMATIC S7 family (S7-1200 / S7-1500 / S7-300 / S7-400 / ET 200SP / CP 343-1 / CP 443-1 / CP 343-1 Lean / MODBUSPN). Siemens S7 is fundamentally different from DL260: there is no fixed Modbus memory map baked into firmware -- every deployment runs MB_SERVER (S7-1200/1500/ET 200SP), MODBUSCP (S7-300/400 + CP), or MODBUSPN (S7-300/400 PN) library blocks wired up to user DBs via the MB_HOLD_REG / ADDR parameters. The driver's job is therefore to handle per-site CONFIG rather than per-family QUIRKS, and the doc makes that explicit. Key findings worth flagging for the PR 56+ implementation track: (1) S7 has no fixed memory map -- must accept per-site DriverConfig, cannot assume vendor-standard layout. (2) MB_SERVER requires NON-optimized DBs in TIA Portal; optimized DBs cause the library to return STATUS 0x8383 on every access -- the single most common S7 Modbus deployment bug in the field. (3) Word order is ABCD by default (big-endian bytes + big-endian words) across all Siemens S7 Modbus paths, which is the OPPOSITE of DL260 CDAB -- the Modbus driver's S7 profile default must be ByteOrder.BigEndian, not WordSwap. (4) MB_SERVER listens on ONE port per FB instance; multi-client support requires running MB_SERVER on 502 / 503 / 504 / ... simultaneously -- most clients assume port 502 multiplexes, which is wrong on S7. (5) CP 343-1 Lean is SERVER-ONLY and requires the separate 2XV9450-1MB00 MODBUS TCP CP library license; client mode calls return immediate error on Lean. (6) MB_SERVER does NOT filter Unit ID, accepts any value. Means the driver can't use Unit ID to detect 'direct vs gateway' topology. (7) FC23 Read-Write Multiple, FC22 Mask Write, FC20/21 File Records, FC43 Device Identification all return exception 01 Illegal Function on every S7 variant -- the driver MUST NOT attempt bulk-read optimisation via FC23 when talking to S7. (8) STOP-mode read/write behaviour is non-deterministic across firmware bands: reads may return cached data (library internal buffer), writes may succeed-silently or return exception 04 depending on CPU firmware version -- flagged as 'driver treats both as unavailable, do not distinguish'. Unconfirmed rumours flagged separately: 'V2.0+ reverses float byte order' claim (cited but not reproduced), STOP-mode caching location (folklore, no primary source). Per-model test differentiation section names the tests as S7_<model>_<behavior> matching the DL205 template convention (e.g. S7_1200_MB_SERVER_requires_non_optimized_DB, S7_343_1_Lean_rejects_client_mode, S7_FC23_returns_IllegalFunction). 31 cited references across the Siemens Industry Online Support entry-ID system (68011496 for MB_SERVER FAQ, etc.), TIA Portal library manuals, and three third-party driver vendor release notes (Kepware, Ignition, FactoryTalk). This is a pure documentation PR -- no code, no tests, no csproj changes. Per-quirk implementation lands in PRs 56+. Research conducted 2026-04-18 against latest publicly-available Siemens documentation; STOP-mode behaviour and MB_SERVER versioning specifically cross-checked against Siemens forum answers from 2024-2025.
2026-04-18 22:50:51 -04:00
Phase 2 PR 61 -- Close V1_ARCHIVE_STATUS.md; Phase 2 Streams D + E done. Purely a documentation-closure PR. The v1 archive deletion itself happened across earlier PRs: PR 2 on phase-2-stream-d archive-marked the four v1 projects (IsTestProject=false so dotnet test slnx bypassed them); Phase 3 PR 18 deleted the archived project source trees. What remained on disk was stale bin/obj residue from pre-deletion builds -- git never tracked those, so removing them from the working tree is cosmetic only (no source-file diff in this PR). What this PR actually changes: V1_ARCHIVE_STATUS.md is rewritten from 'Deletion plan (Phase 2 PR 3)' pre-work prose to a CLOSED retrospective that (a) lists all five v1 directories as deleted with check-marks (src/OtOpcUa.Host, src/Historian.Aveva, tests/Historian.Aveva.Tests, tests/Tests.v1Archive, tests/IntegrationTests), (b) names the parity-bar tests that now fill the role the 494 v1 tests originally held (Driver.Galaxy.E2E cross-FX subprocess parity + stability-findings regression, per-component *.Tests projects, Driver.Modbus.IntegrationTests, LiveStack/ smoke tests), and (c) gives the closure timeline connecting PR 2 -> Phase 3 PR 18 -> this PR 61. Also added the Modbus TCP driver family as parity coverage that didn't exist in v1 (DL205 + S7-1500 + Mitsubishi MELSEC via pymodbus sim). Stream D (retire legacy Host) has been effectively done since Phase 3 PR 18; Stream E (parity validation) is done since PR 2 landed the Driver.Galaxy.E2E project with HostSubprocessParityTests + HierarchyParityTests + StabilityFindingsRegressionTests. This PR exists to definitively close the two pending Phase 2 tasks on the task list and give future-me (or anyone picking up Phase 2 retrospectives) a single 'what actually happened' doc instead of a 'what we plan to do' prose that didn't match reality. dotnet build ZB.MOM.WW.OtOpcUa.slnx: 0 errors, 200 warnings (all xunit1051 cancellation-token analyzer advisories, unchanged from v2 tip). No test regressions -- no source code changed.
2026-04-18 23:20:54 -04:00