Compare commits

..

184 Commits

Author SHA1 Message Date
dohertj2 746c5decfe Merge pull request '[s7] S7 — Optimized DB / S7Plus (decision PR)' (#408) from auto/s7/PR-S7-F into auto/driver-gaps 2026-04-26 11:25:37 -04:00
Joseph Doherty b217ca61ce Auto: s7-f — Optimized DB / S7Plus decision (Track 1+3 docs-only)
Closes #304
2026-04-26 11:22:40 -04:00
dohertj2 26708b6609 Merge pull request '[twincat] TwinCAT — IAlarmSource via TC3 EventLogger' (#407) from auto/twincat/5.1 into auto/driver-gaps 2026-04-26 11:16:07 -04:00
Joseph Doherty c88e0b6bed Auto: twincat-5.1 — IAlarmSource via TC3 EventLogger (gated, scaffold)
Closes #316
2026-04-26 11:13:24 -04:00
dohertj2 3babfb8a99 Merge pull request '[s7] S7 — PLC password / protection-level handling' (#406) from auto/s7/PR-S7-E2 into auto/driver-gaps 2026-04-26 10:53:52 -04:00
Joseph Doherty 30c3b10c94 Auto: s7-e2 — PLC password / protection-level handling
Closes #303
2026-04-26 10:51:07 -04:00
dohertj2 e0f3d1c925 Merge pull request '[s7] S7 — CPU diagnostic buffer / SZL reads' (#405) from auto/s7/PR-S7-E1 into auto/driver-gaps 2026-04-26 10:33:35 -04:00
Joseph Doherty 108f69d198 Auto: s7-e1 — CPU diagnostic buffer / SZL reads
Closes #302
2026-04-26 10:30:43 -04:00
dohertj2 f7e0d9a9e7 Merge pull request '[opcuaclient] OpcUaClient — ServerUriArray redundant failover' (#404) from auto/opcuaclient/14 into auto/driver-gaps 2026-04-26 10:07:47 -04:00
Joseph Doherty 705c98ad98 Auto: opcuaclient-14 — ServerUriArray redundant failover
Closes #286
2026-04-26 10:05:05 -04:00
dohertj2 35d733d73b Merge pull request '[opcuaclient] OpcUaClient — Full Aggregate function set' (#403) from auto/opcuaclient/13 into auto/driver-gaps 2026-04-26 09:49:14 -04:00
Joseph Doherty 0adc5adb59 Auto: opcuaclient-13 — Part 13 aggregate catalog mapping
Closes #285
2026-04-26 09:46:33 -04:00
dohertj2 7cbc566db9 Merge pull request '[opcuaclient] OpcUaClient — IHistoryProvider.ReadEventsAsync interface fix + impl' (#402) from auto/opcuaclient/12 into auto/driver-gaps 2026-04-26 09:32:22 -04:00
Joseph Doherty c36903d6a0 Auto: opcuaclient-12 — IHistoryProvider.ReadEventsAsync EventFilter spec + impl
Adds a filter-aware overload of IHistoryProvider.ReadEventsAsync that carries
EventFilter SelectClauses + WhereClause, and implements it on the OPC UA
Client driver via Session.HistoryReadAsync + ReadEventDetails.

The change is additive (default-impl returns NotSupportedException) so the
existing Galaxy.Proxy.GalaxyProxyDriver implementation keeps compiling
against the fixed-field overload — no cross-driver refactor required.

* Core.Abstractions: new EventHistoryRequest / SimpleAttributeSpec /
  ContentFilterSpec records mirror the OPC UA wire shape transport-neutrally.
  HistoricalEventBatch / HistoricalEventRow carry an open-ended Fields bag
  keyed by SimpleAttributeSpec.FieldName so server-side dispatch can re-align
  with the client's wire-side SelectClause order.
* OpcUaClient driver: new ReadEventsAsync(fullReference, EventHistoryRequest, ct)
  builds an EventFilter, calls Session.HistoryReadAsync, and unwraps
  HistoryEvent.Events into HistoricalEventBatch rows. Default SelectClause
  set matches BuildHistoryEvent on the server side. ContentFilter bytes are
  decoded through the live session's MessageContext (passthrough — the
  driver does not evaluate filters).
* Unit tests: 7 new tests cover SelectClause translation, default-clause
  fallback, malformed where-clause swallowing, uninitialized-driver guard,
  null-request guard, and IHistoryProvider default fallback.
* Integration scaffold: build-only [Fact] gated on opc-plc --alm; flips to
  green when the fixture image is upgraded.
* Docs: HistoryRead Events section in docs/drivers/OpcUaClient.md plus a
  cross-link from Client.CLI.md historyread page.
* E2E: -HistoryEvents switch on scripts/e2e/test-opcuaclient.ps1 confirms
  the gateway round-trips HistoryReadEvents without
  BadHistoryOperationUnsupported (gated; defaults to skip).

Closes #284
2026-04-26 09:29:40 -04:00
dohertj2 2ee61c0999 Merge pull request '[focas] FOCAS — Cycle time per part / last cycle delta' (#401) from auto/focas/F5-a into auto/driver-gaps 2026-04-26 09:14:06 -04:00
Joseph Doherty e3d7c65f61 Auto: focas-f5a — cycle time per part / last cycle delta
Closes #272
2026-04-26 09:11:21 -04:00
dohertj2 45770e8d90 Merge pull request '[ablegacy] AbLegacy — DH+ via 1756-DHRIO bridging' (#400) from auto/ablegacy/13 into auto/driver-gaps 2026-04-26 08:59:02 -04:00
Joseph Doherty 399257377b Auto: ablegacy-13 — DH+ via 1756-DHRIO bridging validation
Closes #256
2026-04-26 08:56:23 -04:00
dohertj2 08a4db2952 Merge pull request '[ablegacy] AbLegacy — Auto-demote on comm failure' (#399) from auto/ablegacy/12 into auto/driver-gaps 2026-04-26 08:47:59 -04:00
Joseph Doherty 1e3053c0d8 Auto: ablegacy-12 — auto-demote on comm failure
Closes #255
2026-04-26 08:44:53 -04:00
dohertj2 8ee65a75d2 Merge pull request '[abcip] AbCip — IPerCallHostResolver failover routing' (#398) from auto/abcip/5.2 into auto/driver-gaps 2026-04-26 08:16:21 -04:00
Joseph Doherty 9e157fc8a4 Auto: abcip-5.2 — HSBY failover routing in ResolveHost
Closes #243
2026-04-26 08:13:41 -04:00
dohertj2 258ce8e937 Merge pull request '[abcip] AbCip — HSBY paired-IP probing' (#397) from auto/abcip/5.1 into auto/driver-gaps 2026-04-26 07:54:37 -04:00
Joseph Doherty 561b0f9ea9 Auto: abcip-5.1 — HSBY paired-IP role probing
Closes #242
2026-04-26 07:51:44 -04:00
dohertj2 349aa5c6f4 Merge pull request '[twincat] TwinCAT — Nested UDT browse via online type walker' (#396) from auto/twincat/4.1 into auto/driver-gaps 2026-04-26 07:31:40 -04:00
Joseph Doherty 0444cb699d Auto: twincat-4.1 — nested UDT browse via online type walker
Closes #315
2026-04-26 07:28:52 -04:00
dohertj2 da6e19d07d Merge pull request '[s7] S7 — Instance-DB / FB parameter access' (#395) from auto/s7/PR-S7-D3 into auto/driver-gaps 2026-04-26 07:07:17 -04:00
Joseph Doherty baf1d65875 Auto: s7-d3 — instance-DB / FB parameter resolution
Closes #301
2026-04-26 07:04:40 -04:00
dohertj2 c9e28b881e Merge pull request '[s7] S7 — UDT / STRUCT / nested-DB handling' (#394) from auto/s7/PR-S7-D2 into auto/driver-gaps 2026-04-26 06:53:12 -04:00
Joseph Doherty 5f8d84db43 Auto: s7-d2 — UDT / STRUCT / nested-DB fan-out
Closes #300
2026-04-26 06:50:26 -04:00
dohertj2 7e62a1158f Merge pull request '[s7] S7 — Symbol-table / TIA Portal export browse' (#393) from auto/s7/PR-S7-D1 into auto/driver-gaps 2026-04-26 06:35:00 -04:00
Joseph Doherty a908dff7b5 Auto: s7-d1 — TIA Portal CSV + STEP 7 Classic AWL symbol import
Closes #299
2026-04-26 06:32:18 -04:00
dohertj2 ac3fd45cc6 Merge pull request '[opcuaclient] OpcUaClient — Reverse Connect' (#392) from auto/opcuaclient/11 into auto/driver-gaps 2026-04-26 06:11:11 -04:00
Joseph Doherty 5c72deb839 Auto: opcuaclient-11 — reverse connect (server-initiated)
Closes #283
2026-04-26 06:08:30 -04:00
dohertj2 9a3bc08e1c Merge pull request '[focas] FOCAS — Password / unlock parameter' (#391) from auto/focas/F4-d into auto/driver-gaps 2026-04-26 05:50:13 -04:00
Joseph Doherty 86f3fc2733 Auto: focas-f4d — password / unlock parameter
Closes #271
2026-04-26 05:45:13 -04:00
dohertj2 d676b4056d Merge pull request '[focas] FOCAS — pmc_wrpmcrng' (#390) from auto/focas/F4-c into auto/driver-gaps 2026-04-26 05:18:48 -04:00
Joseph Doherty 54c09d4d5d Auto: focas-f4c — pmc_wrpmcrng with bit-level RMW
Closes #270
2026-04-26 05:15:52 -04:00
dohertj2 0c967af645 Merge pull request '[focas] FOCAS — cnc_wrmacro + cnc_wrparam' (#389) from auto/focas/F4-b into auto/driver-gaps 2026-04-26 04:57:15 -04:00
Joseph Doherty f48f31cfc7 Auto: focas-f4b — cnc_wrmacro + cnc_wrparam writes
Closes #269
2026-04-26 04:54:28 -04:00
dohertj2 71af554497 Merge pull request '[focas] FOCAS — Write infrastructure + per-tag opt-in' (#388) from auto/focas/F4-a into auto/driver-gaps 2026-04-26 04:35:27 -04:00
Joseph Doherty 1bfe8fba0e Auto: focas-f4a — write infrastructure + per-tag opt-in
Closes #268
2026-04-26 04:32:43 -04:00
dohertj2 6f1657b1c0 Merge pull request '[ablegacy] AbLegacy — RSLogix 500/PLC-5 symbol import' (#387) from auto/ablegacy/11 into auto/driver-gaps 2026-04-26 04:16:00 -04:00
Joseph Doherty 4e8df38bb2 Auto: ablegacy-11 — RSLogix 500/PLC-5 CSV symbol import
Closes #254
2026-04-26 04:13:13 -04:00
dohertj2 4fdeef7a6c Merge pull request '[ablegacy] AbLegacy — Diagnostic counters as tags' (#386) from auto/ablegacy/10 into auto/driver-gaps 2026-04-26 03:53:28 -04:00
Joseph Doherty 42472b5549 Auto: ablegacy-10 — diagnostic counters as tags
Closes #253
2026-04-26 03:50:47 -04:00
dohertj2 14876ea210 Merge pull request '[ablegacy] AbLegacy — Per-device timeout / retry overrides' (#385) from auto/ablegacy/9 into auto/driver-gaps 2026-04-26 03:35:25 -04:00
Joseph Doherty c292dcc1db Auto: ablegacy-9 — per-device timeout / retry overrides
Closes #252
2026-04-26 03:32:45 -04:00
dohertj2 4ff1537d8a Merge pull request '[abcip] AbCip — _RefreshTagDb writeable system tag' (#384) from auto/abcip/4.4 into auto/driver-gaps 2026-04-26 03:19:13 -04:00
Joseph Doherty e0e5e04e48 Auto: abcip-4.4 — _RefreshTagDb writeable system tag
Closes #241
2026-04-26 03:16:28 -04:00
dohertj2 e46e4de31f Merge pull request '[abcip] AbCip — Diagnostic / system tags as browseable variables' (#383) from auto/abcip/4.3 into auto/driver-gaps 2026-04-26 02:58:40 -04:00
Joseph Doherty 901a5b9b21 Auto: abcip-4.3 — diagnostic / system tags as browseable variables
Closes #240
2026-04-26 02:55:56 -04:00
dohertj2 9c108cd00a Merge pull request '[abcip] AbCip — Write deadband / write-on-change' (#382) from auto/abcip/4.2 into auto/driver-gaps 2026-04-26 02:34:27 -04:00
Joseph Doherty da9936f7f0 Auto: abcip-4.2 — write deadband / write-on-change
Closes #239
2026-04-26 02:31:50 -04:00
dohertj2 9202ebe5ef Merge pull request '[abcip] AbCip — Per-tag scan rate / scan group bucketing' (#381) from auto/abcip/4.1 into auto/driver-gaps 2026-04-26 02:18:25 -04:00
Joseph Doherty b45713622f Auto: abcip-4.1 — per-tag scan rate / scan group bucketing
Closes #238
2026-04-26 02:15:50 -04:00
dohertj2 e5c38a5a0e Merge pull request '[twincat] TwinCAT — Cycle-time / jitter / PLC-state diagnostics' (#380) from auto/twincat/3.2 into auto/driver-gaps 2026-04-26 02:02:36 -04:00
Joseph Doherty 24a3cda56a Auto: twincat-3.2 — cycle-time / jitter / PLC-state diagnostics
Closes #314
2026-04-26 01:59:56 -04:00
dohertj2 30e39a752a Merge pull request '[twincat] TwinCAT — Per-tag MaxDelay tuning' (#379) from auto/twincat/3.1 into auto/driver-gaps 2026-04-26 01:47:44 -04:00
Joseph Doherty fb57717f6f Auto: twincat-3.1 — per-tag MaxDelay tuning
Closes #313
2026-04-26 01:45:12 -04:00
dohertj2 621de94126 Merge pull request '[s7] S7 — Pre-flight PUT/GET enablement test' (#378) from auto/s7/PR-S7-C5 into auto/driver-gaps 2026-04-26 01:34:27 -04:00
Joseph Doherty 64a11ef285 Auto: s7-c5 — pre-flight PUT/GET enablement test
Closes #298
2026-04-26 01:31:48 -04:00
dohertj2 4bc8aa2478 Merge pull request '[s7] S7 — Deadband / on-change with thresholds' (#377) from auto/s7/PR-S7-C4 into auto/driver-gaps 2026-04-26 01:17:33 -04:00
Joseph Doherty 06b39a28fa Auto: s7-c4 — deadband / on-change with thresholds
Closes #297
2026-04-26 01:14:59 -04:00
dohertj2 8909302929 Merge pull request '[s7] S7 — Per-tag scan group / publish rate' (#376) from auto/s7/PR-S7-C3 into auto/driver-gaps 2026-04-26 01:05:39 -04:00
Joseph Doherty 162c82b8d9 Auto: s7-c3 — per-tag scan group / publish rate
Closes #296
2026-04-26 01:03:00 -04:00
dohertj2 ca3d4bf581 Merge pull request '[s7] S7 — TSAP / Connection Type selector' (#375) from auto/s7/PR-S7-C2 into auto/driver-gaps 2026-04-26 00:51:44 -04:00
Joseph Doherty 3b98e4d366 Auto: s7-c2 — TSAP / Connection Type selector
Closes #295
2026-04-26 00:49:10 -04:00
dohertj2 bcf83bf39b Merge pull request '[s7] S7 — PDU size negotiation surfaced' (#374) from auto/s7/PR-S7-C1 into auto/driver-gaps 2026-04-26 00:38:24 -04:00
Joseph Doherty 6540bbe1ef Auto: s7-c1 — surface negotiated PDU size via DriverHealth.Diagnostics
Closes #294
2026-04-26 00:35:49 -04:00
dohertj2 f469cf7e0d Merge pull request '[opcuaclient] OpcUaClient — Auto re-import on ModelChangeEvent' (#373) from auto/opcuaclient/10 into auto/driver-gaps 2026-04-26 00:27:00 -04:00
Joseph Doherty ab3ed6b6a3 Auto: opcuaclient-10 — auto re-import on ModelChangeEvent
Closes #282
2026-04-26 00:24:24 -04:00
dohertj2 eed5857aa9 Merge pull request '[focas] FOCAS — cnc_rdalmhistry alarm-history extension' (#372) from auto/focas/F3-a into auto/driver-gaps 2026-04-26 00:10:36 -04:00
Joseph Doherty 7f9d6a778e Auto: focas-f3a — cnc_rdalmhistry alarm-history extension
Adds FocasAlarmProjection with two modes (ActiveOnly default, ActivePlusHistory)
that polls cnc_rdalmhistry on connect + on a configurable cadence (5 min default,
HistoryDepth=100 capped at 250). Emits historic events via IAlarmSource with
SourceTimestampUtc set from the CNC's reported timestamp; dedup keyed on
(OccurrenceTime, AlarmNumber, AlarmType). Ships the ODBALMHIS packed-buffer
decoder + encoder in Wire/FocasAlarmHistoryDecoder.cs and threads
ReadAlarmHistoryAsync through IFocasClient (default no-op so existing transport
variants stay back-compat). FocasDriver now implements IAlarmSource.

13 new unit tests cover: mode switch, dedup, distinct-timestamp emission,
type-as-key behaviour, OccurrenceTime passthrough (not Now), HistoryDepth
clamp/fallback, and decoder round-trip. All 341 FOCAS unit tests still pass.

Docs: docs/drivers/FOCAS.md (new), docs/v2/focas-deployment.md (new),
docs/v2/implementation/focas-wire-protocol.md (new),
docs/v2/implementation/focas-simulator-plan.md (new),
docs/drivers/FOCAS-Test-Fixture.md (alarm-history bullet appended).

Closes #267
2026-04-26 00:07:59 -04:00
dohertj2 1922b93bd5 Merge pull request '[ablegacy] AbLegacy — Per-tag deadband / change filter' (#371) from auto/ablegacy/8 into auto/driver-gaps 2026-04-25 23:52:42 -04:00
Joseph Doherty eb5286148e Auto: ablegacy-8 — per-tag deadband / change filter
Closes #251
2026-04-25 23:50:07 -04:00
dohertj2 69069aa3be Merge pull request '[ablegacy] AbLegacy — Array contiguous block addressing' (#370) from auto/ablegacy/7 into auto/driver-gaps 2026-04-25 23:38:40 -04:00
Joseph Doherty c689ac58b1 Auto: ablegacy-7 — array contiguous block addressing
Closes #250
2026-04-25 23:36:01 -04:00
dohertj2 05528bf71c Merge pull request '[abcip] AbCip — Logical-blocking / non-blocking strategy selector' (#369) from auto/abcip/3.3 into auto/driver-gaps 2026-04-25 23:18:47 -04:00
Joseph Doherty 01f4ee6b53 Auto: abcip-3.3 — read-strategy selector (WholeUdt / MultiPacket / Auto)
Closes #237
2026-04-25 23:16:06 -04:00
dohertj2 8a8dc1ee5a Merge pull request '[abcip] AbCip — Symbolic vs logical (instance-ID) addressing toggle' (#368) from auto/abcip/3.2 into auto/driver-gaps 2026-04-25 23:01:13 -04:00
Joseph Doherty 0c6a0d6e50 Auto: abcip-3.2 — symbolic vs logical addressing toggle
Closes #236
2026-04-25 22:58:33 -04:00
dohertj2 73ff10b595 Merge pull request '[abcip] AbCip — Configurable CIP Connection Size per device' (#367) from auto/abcip/3.1 into auto/driver-gaps 2026-04-25 22:41:51 -04:00
Joseph Doherty f6c26db609 Auto: abcip-3.1 — configurable CIP connection size per device
Closes #235
2026-04-25 22:39:05 -04:00
dohertj2 7cbddd4b4a Merge pull request '[twincat] TwinCAT — Symbol-version invalidation listener' (#366) from auto/twincat/2.3 into auto/driver-gaps 2026-04-25 22:18:47 -04:00
Joseph Doherty 4098d72bbb Auto: twincat-2.3 — symbol-version invalidation listener
Closes #312
2026-04-25 22:16:05 -04:00
dohertj2 569001364f Merge pull request '[twincat] TwinCAT — Handle-based access with caching' (#365) from auto/twincat/2.2 into auto/driver-gaps 2026-04-25 22:06:01 -04:00
Joseph Doherty b67eb6c8d0 Auto: twincat-2.2 — handle-based access with caching
Closes #311
2026-04-25 22:03:20 -04:00
dohertj2 4a071b6d5a Merge pull request '[twincat] TwinCAT — ADS Sum-read / Sum-write' (#364) from auto/twincat/2.1 into auto/driver-gaps 2026-04-25 21:46:21 -04:00
Joseph Doherty 931049b5a7 Auto: twincat-2.1 — ADS Sum-read / Sum-write
Closes #310
2026-04-25 21:43:32 -04:00
dohertj2 fa2fbb404d Merge pull request '[s7] S7 — Block-read coalescing for contiguous DBs' (#363) from auto/s7/PR-S7-B2 into auto/driver-gaps 2026-04-25 21:25:56 -04:00
Joseph Doherty 17faf76ea7 Auto: s7-b2 — block-read coalescing for contiguous DBs
Closes #293
2026-04-25 21:23:06 -04:00
dohertj2 5432c49364 Merge pull request '[s7] S7 — Multi-variable PDU packing' (#362) from auto/s7/PR-S7-B1 into auto/driver-gaps 2026-04-25 21:06:55 -04:00
Joseph Doherty d7633fe36f Auto: s7-b1 — multi-variable PDU packing
Replaces the per-tag Plc.ReadAsync loop in S7Driver.ReadAsync with a
batched ReadMultipleVarsAsync path. Scalar fixed-width tags (Bool, Byte,
Int16/UInt16, Int32/UInt32, Float32, Float64) are bin-packed into ≤18-item
batches at the default 240-byte PDU using S7.Net.Types.DataItem; arrays,
strings, dates, 64-bit ints, and UDT-shaped types stay on the legacy
ReadOneAsync path. On batch-level failure each tag in the batch falls
back to ReadOneAsync so good tags still produce values and the offender
gets its per-item StatusCode (BadDeviceFailure / BadCommunicationError).

100 scalar reads now coalesce into ≤6 PDU round-trips instead of 100.

Closes #292
2026-04-25 21:04:32 -04:00
dohertj2 69d9a6fbb5 Merge pull request '[opcuaclient] OpcUaClient — Method node mirroring + Call passthrough' (#361) from auto/opcuaclient/9 into auto/driver-gaps 2026-04-25 20:55:09 -04:00
Joseph Doherty 07abee5f6d Auto: opcuaclient-9 — method node mirroring + Call passthrough
Adds the 9th capability interface (IMethodInvoker) so the OPC UA Client
driver can mirror upstream OPC UA Method nodes into the local address
space and forward Call invocations as Session.CallAsync. Method-bearing
servers (e.g. ProgramStateMachine, Acknowledge / Confirm methods, custom
control surfaces) now show up downstream instead of being silently
filtered out.

- Core.Abstractions: IMethodInvoker + MethodCallResult; default no-op
  IAddressSpaceBuilder.RegisterMethodNode + MirroredMethodNodeInfo +
  MethodArgumentInfo. Default impls keep tag-based drivers and existing
  builders compiling without forced overrides.
- OpcUaClientDriver: BrowseRecursiveAsync now lifts the Method node-class
  filter; for each method it walks HasProperty to pick up InputArguments
  + OutputArguments and decodes the Argument extension objects into
  MethodArgumentInfo. Status-codes pass through verbatim (cascading
  quality), local NodeId-parse + lost-session faults surface as
  BadNodeIdInvalid / BadCommunicationError.
- 7 new unit tests cover the capability surface, the DTO shapes, and the
  back-compat default no-op for RegisterMethodNode. Suite green at
  160/160.

Server-side OnCallMethod dispatch (wiring local MethodNode handlers to
IMethodInvoker.CallMethodAsync inside DriverNodeManager) is deferred to
a follow-up — the driver-side surface + browse mirror ship cleanly here.

Closes #281

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 20:52:39 -04:00
dohertj2 0f3abed4c7 Merge pull request '[opcuaclient] OpcUaClient — Type definition mirroring' (#360) from auto/opcuaclient/8 into auto/driver-gaps 2026-04-25 20:40:49 -04:00
Joseph Doherty cc21281cbb Auto: opcuaclient-8 — type definition mirroring
Adds an opt-in pass-3 walk of the upstream TypesFolder (i=86) so the OPC UA
Client driver can mirror upstream type definitions into the local address
space. Honours the curation rules from PR-7 (#359). Structural mirror only —
binary-encoding priming via LoadDataTypeSystem is tracked as a follow-up
because that helper was removed from the public ISession surface in
OPCFoundation.NetStandard 1.5.378+.

- IAddressSpaceBuilder.RegisterTypeNode (default no-op for back-compat)
- MirroredTypeNodeInfo + MirroredTypeKind in Core.Abstractions
- OpcUaClientDriverOptions.MirrorTypeDefinitions (default false)
- DiscoverAsync pass-3: FetchTypeTreeAsync + recursive HasSubtype walk per
  branch (ObjectTypes/VariableTypes/DataTypes/ReferenceTypes), best-effort
  IsAbstract read, IncludePaths/ExcludePaths still applied
- 6 new unit tests; all 153 OpcUaClient unit tests pass

Closes #280
2026-04-25 20:38:17 -04:00
dohertj2 5e164dc965 Merge pull request '[opcuaclient] OpcUaClient — Selective import + namespace remap' (#359) from auto/opcuaclient/7 into auto/driver-gaps 2026-04-25 20:23:14 -04:00
Joseph Doherty 02d1c85190 Auto: opcuaclient-7 — selective import + namespace remap
Adds OpcUaClientCurationOptions on OpcUaClientDriverOptions.Curation with:
- IncludePaths/ExcludePaths globs (* and ? only) matched against the
  slash-joined BrowsePath segments collected during BrowseRecursiveAsync.
  Empty Include = include all so existing deployments are unaffected;
  Exclude wins over Include. Pruned folders are skipped wholesale, so
  descendants don't reach the wire.
- NamespaceRemap (URI -> URI) applied to DriverAttributeInfo.FullName when
  registering variables. Index-0 / standard nodes round-trip unchanged;
  remapped nodes serialise via ExpandedNodeId so downstream clients see
  the local-side URI.
- RootAlias replaces the hard-coded "Remote" folder name at the top of
  the mirrored tree.

Closes #279
2026-04-25 20:20:47 -04:00
dohertj2 1d3e9a3237 Merge pull request '[opcuaclient] OpcUaClient — Discovery URL FindServers' (#358) from auto/opcuaclient/6 into auto/driver-gaps 2026-04-25 20:13:21 -04:00
Joseph Doherty 0f509fbd3a Auto: opcuaclient-6 — Discovery URL FindServers
Adds optional `DiscoveryUrl` knob to OpcUaClientDriverOptions. When set,
the driver runs `DiscoveryClient.CreateAsync` + `FindServersAsync` +
`GetEndpointsAsync` against that URL during InitializeAsync and prepends
the discovered endpoint URLs (filtered to matching SecurityPolicy +
SecurityMode) to the failover candidate list. De-duplicates URLs that
appear in both discovered and static lists (case-insensitive). Discovery
failures are non-fatal — falls back to statically configured candidates.

The doc comment notes that FindServers requires SecurityMode=None on the
discovery channel per OPC UA spec, even when the data channel uses Sign
or SignAndEncrypt.

Closes #278

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 20:10:59 -04:00
dohertj2 e879b3ae90 Merge pull request '[focas] FOCAS — Bulk PMC range read coalescing' (#357) from auto/focas/F2-d into auto/driver-gaps 2026-04-25 20:04:36 -04:00
Joseph Doherty 4d3ee47235 Auto: focas-f2d — PMC range coalescing
Closes #266
2026-04-25 20:02:10 -04:00
dohertj2 9ebe5bd523 Merge pull request '[focas] FOCAS — PMC F/G letters for 16i' (#356) from auto/focas/F2-c into auto/driver-gaps 2026-04-25 19:51:39 -04:00
Joseph Doherty 63099115bf Auto: focas-f2c — PMC F/G for 16i
Capability-matrix correctness fix: real 16i ladders use F (CNC->PMC) and
G (PMC->CNC) signal groups for handshakes, but PmcLetters(Sixteen_i) was
returning {X,Y,R,D} only. Widen to {X,Y,F,G,R,D}; M/C/E/A/K/T remain
0i-F / 30i-only. Updated the matching test row.

Closes #265
2026-04-25 19:49:25 -04:00
dohertj2 7042b11f34 Merge pull request '[focas] FOCAS — Multi-path/multi-channel CNC' (#355) from auto/focas/F2-b into auto/driver-gaps 2026-04-25 19:45:27 -04:00
Joseph Doherty 2f3eeecd17 Auto: focas-f2b — multi-path/multi-channel CNC
Adds optional `@N` path suffix to FocasAddress (PARAM:1815@2, R100@3.0,
MACRO:500@2, DIAG:280@2/1) with PathId defaulting to 1 for back-compat.
Per-device PathCount is discovered via cnc_rdpathnum at first connect and
cached on DeviceState; reads with PathId>PathCount return BadOutOfRange.
The driver issues cnc_setpath before each non-default-path read and
tracks LastSetPath so repeat reads on the same path skip the wire call.

Closes #264
2026-04-25 19:42:58 -04:00
dohertj2 3b82f4f5fb Merge pull request '[focas] FOCAS — DIAG: address scheme' (#354) from auto/focas/F2-a into auto/driver-gaps 2026-04-25 19:34:08 -04:00
Joseph Doherty 451b37a632 Auto: focas-f2a — DIAG: address scheme
New FocasAreaKind.Diagnostic parsed from DIAG:nnn (whole-CNC) and
DIAG:nnn/axis (per-axis), validated against a per-series
FocasCapabilityMatrix.DiagnosticRange table (16i: 0-499; 0i-F family:
0-999; 30i/31i/32i: 0-1023; Power Motion i: 0-255; Unknown: permissive
per existing matrix convention).

IFocasClient gains ReadDiagnosticAsync(diagNumber, axisOrZero, type,
ct) with a default returning BadNotSupported so older transport
variants degrade gracefully. FwlibFocasClient implements it via a new
cnc_rddiag P/Invoke that reuses the IODBPSD struct (same shape as
cnc_rdparam). FocasDriver.ReadAsync dispatches Diagnostic addresses
through the new path; non-Diagnostic kinds keep the existing
ReadAsync route unchanged.

Tests: parser positives (DIAG:1031, DIAG:280/2, case-insensitive,
zero, axis-8) + negatives (malformed, axis>31), capability matrix
boundaries per series, driver-level dispatch verifying axis index
threads through, init-time rejection on out-of-range, and
BadNotSupported fallback when the wire client doesn't override the
default. 266/266 pass in Driver.FOCAS.Tests.

Closes #263

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:31:49 -04:00
dohertj2 6743d51db8 Merge pull request '[ablegacy] AbLegacy — ST string verification + length guard' (#353) from auto/ablegacy/6 into auto/driver-gaps 2026-04-25 19:21:13 -04:00
Joseph Doherty 0044603902 Auto: ablegacy-6 — ST string verification + length guard
Verifies libplctag's GetString/SetString round-trips ST file strings (1-word
length prefix + 82 ASCII bytes) end-to-end through the driver, and adds a
client-side length guard so over-82-char writes return BadOutOfRange instead
of being silently truncated by libplctag.

- LibplctagLegacyTagRuntime.EncodeValue: throws ArgumentOutOfRangeException
  for >82-char String writes (StFileMaxStringLength constant).
- AbLegacyDriver.WriteAsync: catches ArgumentOutOfRangeException and maps to
  BadOutOfRange.
- AbLegacyStringEncodingTests: 16 unit tests covering empty / 41-char /
  82-char / embedded-NUL / non-ASCII reads + writes; over-length writes
  return BadOutOfRange and never call WriteAsync; both Slc500 and Plc5
  family paths exercised.

Closes #249

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:18:55 -04:00
dohertj2 2fc71d288e Merge pull request '[ablegacy] AbLegacy — PD/MG/PLS/BT structure files' (#352) from auto/ablegacy/5 into auto/driver-gaps 2026-04-25 19:11:11 -04:00
Joseph Doherty 286ab3ba41 Auto: ablegacy-5 — PD/MG/PLS/BT structure files
Adds PD (PID), MG (Message), PLS (Programmable Limit Switch) and BT
(Block Transfer) file types to the PCCC parser. New AbLegacyDataType
enum members (PidElement / MessageElement / PlsElement /
BlockTransferElement) plus per-type sub-element catalogue:

  - PD: SP/PV/CV/KP/KI/KD/MAXS/MINS/DB/OUT as Float32; EN/DN/MO/PE/
        AUTO/MAN/SP_VAL/SP_LL/SP_HL as Boolean (word-0 status bits).
  - MG: RBE/MS/SIZE/LEN as Int32; EN/EW/ER/DN/ST/CO/NR/TO as Boolean.
  - PLS: LEN as Int32 (bit table varies per PLC).
  - BT: RLEN/DLEN as Int32; EN/ST/DN/ER/CO/EW/TO/NR as Boolean.

Per-family flags on AbLegacyPlcFamilyProfile gate availability:

  - PD/MG: SLC500 + PLC-5 (operator + status bits both present).
  - PLS/BT: PLC-5 only (chassis-IO block transfer is PLC-5-specific).
  - MicroLogix + LogixPccc: rejected — no legacy file-letter form.

Status-bit indices match Rockwell DTAM / 1747-RM001 / 1785-6.5.12:
PD word 0 bits 0-8, MG/BT word 0 bits 8-15. PLC-set status bits
(PE/DN/SP_*; ST/DN/ER/CO/EW/NR/TO) are surfaced as ViewOnly via
IsPlcSetStatusBit, matching the Timer/Counter/Control pattern from
ablegacy-3.

LibplctagLegacyTagRuntime decodes PD non-bit members as Float32 and
MG/BT/PLS non-bit members as Int32; status bits route through GetBit
with the bit-position encoded by the driver via StatusBitIndex.

Tests: parser positive cases per family + negative cases per family,
catalogue + bit-index + read-only-bit assertions.

Closes #248
2026-04-25 19:08:51 -04:00
dohertj2 5ca2ad83cd Merge pull request '[abcip] AbCip — AOI input/output handling' (#351) from auto/abcip/2.6 into auto/driver-gaps 2026-04-25 19:01:03 -04:00
Joseph Doherty e3c0750f7d Auto: abcip-2.6 — AOI input/output handling
AOI-aware browse paths: AOI instances now fan out under directional
sub-folders (Inputs/, Outputs/, InOut/) instead of a flat layout. The
sub-folders only appear when at least one member carries a non-Local
AoiQualifier, so plain UDT tags keep the pre-2.6 flat structure.

- Add AoiQualifier enum (Local / Input / Output / InOut) + new property
  on AbCipStructureMember (defaults to Local).
- L5K parser learns ADD_ON_INSTRUCTION_DEFINITION blocks; PARAMETER
  entries' Usage attribute flows through L5kMember.Usage.
- L5X parser captures the Usage attribute on <Parameter> elements.
- L5kIngest maps Usage strings (Input/Output/InOut) to AoiQualifier;
  null + unknown values map to Local.
- AbCipDriver.DiscoverAsync groups directional members under
  Inputs / Outputs / InOut sub-folders when any member is non-Local.
- Tests for L5K AOI block parsing, L5X Usage capture, ingest mapping
  (both formats), and AOI-vs-plain UDT discovery fan-out.

Closes #234
2026-04-25 18:58:49 -04:00
dohertj2 177d75784b Merge pull request '[abcip] AbCip — Online tag-DB refresh trigger' (#350) from auto/abcip/2.5 into auto/driver-gaps 2026-04-25 18:48:14 -04:00
Joseph Doherty 6e244e0c01 Auto: abcip-2.5 — online tag-DB refresh trigger
Add IDriverControl capability interface in Core.Abstractions with a
RebrowseAsync(IAddressSpaceBuilder, CancellationToken) hook so operators
can force a controller-side @tags re-walk without restarting the driver.

AbCipDriver now implements IDriverControl. RebrowseAsync clears the UDT
template cache (so stale shapes from a pre-download program don't
survive) then runs the same enumerator + builder fan-out as
DiscoverAsync, serialised against concurrent discovery / rebrowse via
a new SemaphoreSlim.

Driver.AbCip.Cli ships a `rebrowse` subcommand mirroring the existing
probe / read shape: connects to a single gateway, runs RebrowseAsync
against an in-memory builder, and prints discovered tag names so
operators can sanity-check the controller's symbol table from a shell.

Tests cover: two consecutive RebrowseAsync calls bump the enumerator's
Create / Enumerate counters once each, discovered tags reach the
supplied builder, the template cache is dropped on rebrowse, and the
driver exposes IDriverControl. 313 AbCip unit tests + 17 CLI tests +
37 Core.Abstractions tests pass.

Closes #233
2026-04-25 18:45:48 -04:00
dohertj2 27878d0faf Merge pull request '[abcip] AbCip — CSV tag import/export' (#349) from auto/abcip/2.4 into auto/driver-gaps 2026-04-25 18:36:17 -04:00
Joseph Doherty 08d8a104bb Auto: abcip-2.4 — CSV tag import/export
CsvTagImporter / CsvTagExporter parse and emit Kepware-format AB CIP tag
CSVs (Tag Name, Address, Data Type, Respect Data Type, Client Access,
Scan Rate, Description, Scaling). Import maps Tag Name → AbCipTagDefinition.Name,
Address → TagPath, Data Type → DataType, Description → Description,
Client Access → Writable. Skips blank rows + ;/# section markers; honours
column reordering via header lookup; RFC-4180-ish quoting.

CsvImports collection on AbCipDriverOptions mirrors L5kImports/L5xImports
and is consumed by InitializeAsync (declared > L5K > L5X > CSV precedence).

CLI tag-export command dumps the merged tag table from a driver-options JSON
to a Kepware CSV — runs the same import-merge precedence the driver uses but
without contacting any PLC.

Tests cover R/W mapping, blank-row skip, quoted comma, escaped quote, name
prefix, unknown-type fall-through, header reordering, and a load → export →
reparse round-trip.

Closes #232
2026-04-25 18:33:55 -04:00
dohertj2 7ee0cbc3f4 Merge pull request '[abcip] AbCip — Descriptions to OPC UA Description' (#348) from auto/abcip/2.3 into auto/driver-gaps 2026-04-25 18:25:59 -04:00
Joseph Doherty e5299cda5a Auto: abcip-2.3 — descriptions to OPC UA Description
Threads tag/UDT-member descriptions captured by the L5K (#346) and L5X
(#347) parsers through AbCipTagDefinition + AbCipStructureMember into
DriverAttributeInfo, so the address-space builder sets the OPC UA
Description attribute on each Variable node. L5kMember and L5xParser
also now capture per-member descriptions (via the (Description := "...")
attribute block on L5K and the <Description> child on L5X), and
L5kIngest forwards them. DriverNodeManager surfaces
DriverAttributeInfo.Description as the Variable's Description property.

Description is added as a trailing optional parameter on
DriverAttributeInfo (default null) so every other driver continues
to construct the record unchanged.

Closes #231
2026-04-25 18:23:31 -04:00
dohertj2 e5b192fcb3 Merge pull request '[abcip] AbCip — L5X (XML) parser + ingest' (#347) from auto/abcip/2.2 into auto/driver-gaps 2026-04-25 18:13:14 -04:00
Joseph Doherty cfcaf5c1d3 Auto: abcip-2.2 — L5X (XML) parser + ingest
Adds Import/L5xParser.cs that consumes Studio 5000 L5X (XML) controller
exports via System.Xml.XPath and produces the same L5kDocument bundle as
L5kParser, so L5kIngest handles both formats interchangeably.

- Controller-scope and program-scope <Tag> elements with Name, DataType,
  TagType, ExternalAccess, AliasFor, and <Description> child.
- <DataType>/<Members>/<Member> with Hidden BOOL-host (ZZZZZZZZZZ*) skip.
- AddOnInstructionDefinitions surfaced as L5kDataType entries so AOI-typed
  tags pick up a member layout the same way UDT-typed tags do; hidden
  EnableIn/EnableOut parameters skipped. Full directional Input/Output/InOut
  modelling stays deferred to PR 2.6.

AbCipDriverOptions gains parallel L5xImports collection (mirrors
L5kImports field-for-field). InitializeAsync funnels both through one
shared MergeImport helper that differs only in the parser delegate.

Tests: 8 L5X fixtures cover controller- and program-scope tags, alias skip,
UDT layout fan-out, AOI-typed tag, ZZZZZZZZZZ host skip, hidden AOI param
skip, missing-ExternalAccess default, and an empty-controller no-throw.

Closes #230
2026-04-25 18:10:53 -04:00
dohertj2 2731318c81 Merge pull request '[abcip] AbCip — L5K parser + ingest' (#346) from auto/abcip/2.1 into auto/driver-gaps 2026-04-25 18:03:31 -04:00
Joseph Doherty 86407e6ca2 Auto: abcip-2.1 — L5K parser + ingest
Pure-text parser for Studio 5000 L5K controller exports. Recognises
TAG/END_TAG, DATATYPE/END_DATATYPE, and PROGRAM/END_PROGRAM blocks,
strips (* ... *) comments, and tolerates multi-line entries + unknown
sections (CONFIG, MOTION_GROUP, etc.). Output records — L5kTag,
L5kDataType, L5kMember — feed L5kIngest which converts to
AbCipTagDefinition + AbCipStructureMember. Alias tags and
ExternalAccess=None tags are skipped per Kepware precedent.

AbCipDriverOptions gains an L5kImports collection
(AbCipL5kImportOptions records — file path or inline text + per-import
device + name prefix). InitializeAsync merges the imports into the
declared Tags map, with declared tags winning on Name conflicts so
operators can override import results without editing the L5K source.

Tests cover controller-scope TAG, program-scope TAG, alias-tag flag,
DATATYPE with member array dims, comment stripping, unknown-section
skipping, multi-line entries, and the full ingest path including
ExternalAccess=None / ReadOnly / UDT-typed tag fanout.

Closes #229

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 18:01:08 -04:00
dohertj2 2266dd9ad5 Merge pull request '[twincat] TwinCAT — ENUM and ALIAS at discovery' (#345) from auto/twincat/1.5 into auto/driver-gaps 2026-04-25 17:51:08 -04:00
Joseph Doherty 0df14ab94a Auto: twincat-1.5 — ENUM/ALIAS discovery
Resolve TwinCAT symbol data types via the IDataType chain instead of a
flat name match. ALIAS chains walk BaseType recursively (depth-capped at
16 against pathological cycles); ENUM surfaces its underlying integer
base type. POINTER / REFERENCE / INTERFACE / UNION / STRUCT / ARRAY / FB
remain explicitly out of scope and surface as null.

Closes #309
2026-04-25 17:48:45 -04:00
dohertj2 448a97d67f Merge pull request '[twincat] TwinCAT — Whole-array reads' (#344) from auto/twincat/1.4 into auto/driver-gaps 2026-04-25 17:38:38 -04:00
Joseph Doherty b699052324 Auto: twincat-1.4 — whole-array reads
Surface int[]? ArrayDimensions on TwinCATTagDefinition + thread it through
ITwinCATClient.ReadValueAsync / WriteValueAsync. When non-null + non-empty,
AdsTwinCATClient issues a single ADS read against the symbol with
clrType.MakeArrayType() and returns the flat 1-D CLR Array; for IEC TIME /
DATE / DT / TOD element types we project per-element to the native
TimeSpan / DateTime so consumers see consistent types regardless of rank.

DiscoverAsync surfaces IsArray=true + ArrayDim=product(dims) onto
DriverAttributeInfo via a new ResolveArrayShape helper. Multi-dim shapes
flatten to the product on the wire — DriverAttributeInfo.ArrayDim is
single-uint today and the OPC UA layer reflects rank via its own metadata.

Native ADS notification subscriptions skip whole-array tags so the OPC UA
layer falls through to a polled snapshot — the per-element AdsNotificationEx
callback shape doesn't fit a flat array. Whole-array WRITES are out of
scope for this PR — AdsTwinCATClient.WriteValueAsync returns
BadNotSupported when ArrayDimensions is set.

Tests: TwinCATArrayReadTests covers ResolveArrayShape (null / empty /
single-dim / multi-dim flatten / non-positive defensive), DiscoverAsync
emitting IsArray + ArrayDim for declared array tags, single-dim + multi-dim
fake-client read fan-out, and the BadNotSupported gate on whole-array
writes. Existing 137 unit tests still pass — total now 143.

Closes #308
2026-04-25 17:36:15 -04:00
dohertj2 e6a55add20 Merge pull request '[twincat] TwinCAT — Bit-indexed BOOL writes (RMW)' (#343) from auto/twincat/1.3 into auto/driver-gaps 2026-04-25 17:25:21 -04:00
Joseph Doherty fcf89618cd Auto: twincat-1.3 — bit-indexed BOOL writes (RMW)
Replace the NotSupportedException at AdsTwinCATClient.WriteValueAsync
for bit-indexed BOOL writes with a read-modify-write path:

  1. Strip the trailing .N selector from the symbol path.
  2. Read the parent as UDINT.
  3. Set or clear bit N via the standard mask.
  4. Write the parent back.

Concurrent bit writers against the same parent serialise through a
per-parent SemaphoreSlim cached in a ConcurrentDictionary (never
removed — bounded by writable-bit-tag cardinality). Mirrors the AbCip /
Modbus / FOCAS bit-RMW pattern shipped in #181 pass 1.

The path-stripping (TryGetParentSymbolPath) and mask helper (ApplyBit)
are exposed as internal statics so tests can pin the pure logic without
needing a real ADS target. The FakeTwinCATClient mirrors the same RMW
semantics so driver-level round-trip tests assert the parent-word state.

Closes #307
2026-04-25 17:22:59 -04:00
dohertj2 f83c467647 Merge pull request '[twincat] TwinCAT — Native UA TIME/DATE/DT/TOD' (#342) from auto/twincat/1.2 into auto/driver-gaps 2026-04-25 17:16:39 -04:00
Joseph Doherty 80b2d7f8c3 Auto: twincat-1.2 — native UA TIME/DATE/DT/TOD
IEC 61131-3 TIME/TOD now surface as TimeSpan (UA Duration); DATE/DT
surface as DateTime (UTC). The wire form stays UDINT — AdsTwinCATClient
post-processes raw values in ReadValueAsync and OnAdsNotificationEx,
and accepts native CLR types in ConvertForWrite. Added Duration to
DriverDataType (back-compat: existing switches default to BaseDataType
for unknown enum values) and mapped it to DataTypeIds.Duration in
DriverNodeManager.

Closes #306
2026-04-25 17:14:12 -04:00
dohertj2 8286255ae5 Merge pull request '[twincat] TwinCAT — Int64 fidelity for LINT/ULINT' (#341) from auto/twincat/1.1 into auto/driver-gaps 2026-04-25 17:06:55 -04:00
Joseph Doherty 615ab25680 Auto: twincat-1.1 — Int64 fidelity for LINT/ULINT
Map LInt/ULInt to DriverDataType.Int64/UInt64 instead of truncating
to Int32. AdsTwinCATClient.MapToClrType already returns long/ulong
so the wire-level read returns the correct boxed types.

Closes #305
2026-04-25 17:04:43 -04:00
dohertj2 545cc74ec8 Merge pull request '[s7] S7 — LOGO!/S7-200 V-memory parser' (#340) from auto/s7/PR-S7-A5 into auto/driver-gaps 2026-04-25 17:00:59 -04:00
Joseph Doherty e5122c546b Auto: s7-a5 — LOGO!/S7-200 V-memory parser
Add CPU-aware overload S7AddressParser.Parse(string, CpuType?) that
accepts the V area letter for S7-200 / S7-200 Smart / LOGO! 0BA8 and
maps it to DataBlock DB1. V is rejected on S7-300/400/1200/1500 and on
the legacy CPU-agnostic Parse(string) overload. Width suffixes mirror
M/I/Q (VB/VW/VD/V0.0). S7Driver passes _options.CpuType so live tag
config picks up family-aware parsing.

Tests cover S7200/S7200Smart/Logo0BA8 positive cases, modern-family
rejection, and CPU-agnostic rejection.

Closes #291
2026-04-25 16:58:34 -04:00
dohertj2 6737edbad2 Merge pull request '[s7] S7 — Array tags (ValueRank=1)' (#339) from auto/s7/PR-S7-A4 into auto/driver-gaps 2026-04-25 16:51:34 -04:00
Joseph Doherty ce98c2ada3 Auto: s7-a4 — array tags (ValueRank=1)
- S7TagDefinition gets optional ElementCount; >1 marks the tag as a 1-D array.
- ReadOneAsync / WriteOneAsync: one byte-range Read/WriteBytesAsync covering
  N × elementBytes, sliced/packed client-side via the existing big-endian scalar
  codecs and S7DateTimeCodec.
- DiscoverAsync surfaces IsArray=true and ArrayDim=ElementCount → ValueRank=1.
- Init-time validation (now ahead of TCP open) caps ElementCount at 8000 and
  rejects unsupported element types: STRING/WSTRING/CHAR/WCHAR (variable-width)
  and BOOL (packed-bit layout) — both follow-ups.
- Supported element types: Byte, Int16/UInt16, Int32/UInt32, Int64/UInt64,
  Float32, Float64, Date, Time, TimeOfDay.

Closes #290
2026-04-25 16:49:02 -04:00
dohertj2 676eebd5e4 Merge pull request '[s7] S7 — DTL/DT/S5TIME/TIME/TOD/DATE codecs' (#338) from auto/s7/PR-S7-A3 into auto/driver-gaps 2026-04-25 16:40:02 -04:00
Joseph Doherty 2b66cec582 Auto: s7-a3 — DTL/DT/S5TIME/TIME/TOD/DATE codecs
Adds S7DateTimeCodec static class implementing the six Siemens S7 date/time
wire formats:

  - DTL (12 bytes): UInt16 BE year + month/day/dow/h/m/s + UInt32 BE nanos
  - DATE_AND_TIME (8 bytes BCD): yy/mm/dd/hh/mm/ss + 3-digit BCD ms + dow
  - S5TIME (16 bits): 2-bit timebase + 3-digit BCD count → TimeSpan
  - TIME (Int32 ms BE, signed) → TimeSpan, allows negative durations
  - TOD (UInt32 ms BE, 0..86399999) → TimeSpan since midnight
  - DATE (UInt16 BE days since 1990-01-01) → DateTime

Mirrors the S7StringCodec pattern from PR-S7-A2 — codecs operate on raw byte
spans so each format can be locked with golden-byte unit tests without a
live PLC. New S7DataType members (Dtl, DateAndTime, S5Time, Time, TimeOfDay,
Date) are wired into S7Driver.ReadOneAsync/WriteOneAsync via byte-level
ReadBytesAsync/WriteBytesAsync calls — S7.Net's string-keyed Read/Write
overloads have no syntax for these widths.

Uninitialized PLC buffers (all-zero year+month for DTL/DT) reject as
InvalidDataException → BadOutOfRange to operators, rather than decoding as
year-0001 garbage.

S5TIME / TIME / TOD surface as Int32 ms (DriverDataType has no Duration);
DTL / DT / DATE surface as DriverDataType.DateTime.

Test coverage: 30 new golden-vector + round-trip + rejection tests,
including the all-zero buffer rejection paths and BCD-nibble validation.
Build clean, 115/115 S7 tests pass.

Closes #289

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:37:39 -04:00
dohertj2 b751c1c096 Merge pull request '[s7] S7 — STRING/WSTRING/CHAR/WCHAR' (#337) from auto/s7/PR-S7-A2 into auto/driver-gaps 2026-04-25 16:28:22 -04:00
Joseph Doherty 316f820eff Auto: s7-a2 — STRING/WSTRING/CHAR/WCHAR
Closes the NotSupportedException cliff for S7 string-shaped types.

- S7DataType gains WString, Char, WChar members alongside the existing
  String entry.
- New S7StringCodec encodes/decodes the four wire formats:
    STRING  : 2-byte header (max-len + actual-len bytes) + N ASCII bytes
              -> total 2 + max_len.
    WSTRING : 4-byte header (max-len + actual-len UInt16 BE) + N×2
              UTF-16BE bytes -> total 4 + 2 × max_len.
    CHAR    : 1 ASCII byte (rejects non-ASCII on encode).
    WCHAR   : 2 UTF-16BE bytes.
  Header-bug clamp: actualLen > maxLen is silently clamped on read so
  firmware quirks don't walk past the wire buffer; rejected on write
  to avoid silent truncation.
- S7Driver.ReadOneAsync / WriteOneAsync issue ReadBytesAsync /
  WriteBytesAsync against the parsed Area / DbNumber / ByteOffset and
  honour S7TagDefinition.StringLength (default 254 = S7 STRING max).
- MapDataType returns DriverDataType.String for the three new enum
  members so OPC UA discovery surfaces them as scalar strings.

Tests: 21 new cases on S7StringCodec covering golden-byte vectors,
encode/decode round-trips, the firmware-bug header-clamp, ASCII-only
guard on CHAR, and the StringLength default. 85/85 passing.

Closes #288
2026-04-25 16:26:05 -04:00
dohertj2 38eb909f69 Merge pull request '[s7] S7 — 64-bit scalar types (LInt/ULInt/LReal/LWord)' (#336) from auto/s7/PR-S7-A1 into auto/driver-gaps 2026-04-25 16:18:40 -04:00
Joseph Doherty d1699af609 Auto: s7-a1 — 64-bit scalar types
Closes the NotSupportedException cliff for S7 Float64/Int64/UInt64.

- S7Size enum gains LWord (8 bytes); parser accepts DBLD/DBL on data
  blocks and LD on M/I/Q (e.g. DB1.DBLD0, DB1.DBL8, MLD0, ILD8, QLD16).
- S7Driver.ReadOneAsync / WriteOneAsync issue ReadBytesAsync /
  WriteBytesAsync for 64-bit types and convert big-endian via
  System.Buffers.Binary.BinaryPrimitives. S7's wire format is BE.
- Internal MapArea(S7Area) helper translates to S7.Net DataType.
- MapDataType now surfaces native DriverDataType for Int16/UInt16/
  UInt32/Int64/UInt64 instead of collapsing them all to Int32.

Tests: parser theories cover DBLD/DBL/MLD/ILD/QLD; discovery test
asserts the 64-bit DriverDataType mapping. 64/64 passing.

Closes #287
2026-04-25 16:16:23 -04:00
dohertj2 c6c694b69e Merge pull request '[opcuaclient] OpcUaClient — CRL/revocation handling' (#335) from auto/opcuaclient/5 into auto/driver-gaps 2026-04-25 16:08:21 -04:00
Joseph Doherty 4a3860ae92 Auto: opcuaclient-5 — CRL/revocation handling
Adds explicit revoked-vs-untrusted distinction to the OpcUaClient driver's
server-cert validation hook, plus three new knobs on a new
OpcUaCertificateValidationOptions sub-record:

  RejectSHA1SignedCertificates  (default true — SHA-1 is OPC UA spec-deprecated;
                                 this is a deliberately tighter default)
  RejectUnknownRevocationStatus (default false — keeps brownfield deployments
                                 without CRL infrastructure working)
  MinimumCertificateKeySize     (default 2048)

The validator hook now runs whether or not AutoAcceptCertificates is set:
revoked / issuer-revoked certs are always rejected with a distinct
"REVOKED" log line; SHA-1 + small-key certs are rejected per policy;
unknown-revocation gates on the new flag; untrusted still honours
AutoAccept.

Decision pipeline factored into a static EvaluateCertificateValidation
helper with a CertificateValidationDecision record so unit tests cover
all branches without needing to spin up an SDK CertificateValidator.

CRL files themselves: the OPC UA SDK reads them automatically from the
crl/ subdir of each cert store — no driver-side wiring needed.
Documented on the new options record.

Tests (12 new) cover defaults, every branch of the decision pipeline,
SHA-1 detection (custom X509SignatureGenerator since .NET 10's
CreateSelfSigned refuses SHA-1), and key-size detection. All 127
OpcUaClient unit tests still pass.

Closes #277

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:05:50 -04:00
dohertj2 d57e24a7fa Merge pull request '[opcuaclient] OpcUaClient — Diagnostics counters' (#334) from auto/opcuaclient/4 into auto/driver-gaps 2026-04-25 15:56:21 -04:00
Joseph Doherty bb1ab47b68 Auto: opcuaclient-4 — diagnostics counters
Per-driver counters surfaced via DriverHealth.Diagnostics for the
driver-diagnostics RPC. New OpcUaClientDiagnostics tracks
PublishRequestCount, NotificationCount, NotificationsPerSecond (5s-half-life
EWMA), MissingPublishRequestCount, DroppedNotificationCount,
SessionResetCount and LastReconnectUtcTicks via Interlocked on the hot path.

DriverHealth gains an optional IReadOnlyDictionary<string,double>?
Diagnostics parameter (defaulted null for back-compat with the seven other
drivers' constructors). OpcUaClientDriver wires Session.Notification +
Session.PublishError on connect and on reconnect-complete (recording a
session-reset there); GetHealth snapshots the counters on every poll so the
RPC sees fresh values without a tick source.

Tests: 11 new OpcUaClientDiagnosticsTests cover counter increments, EWMA
convergence, snapshot shape, GetHealth integration, and DriverHealth
back-compat. Full OpcUaClient.Tests 115/115 green.

Closes #276

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 15:53:57 -04:00
dohertj2 a04ba2af7a Merge pull request '[opcuaclient] OpcUaClient — Honor server OperationLimits' (#333) from auto/opcuaclient/3 into auto/driver-gaps 2026-04-25 15:41:17 -04:00
Joseph Doherty 494fdf2358 Auto: opcuaclient-3 — honor server OperationLimits
Closes #275
2026-04-25 15:38:55 -04:00
dohertj2 9f1e033e83 Merge pull request '[opcuaclient] OpcUaClient — Per-tag advanced subscription tuning incl. deadband' (#332) from auto/opcuaclient/2 into auto/driver-gaps 2026-04-25 15:27:51 -04:00
Joseph Doherty fae00749ca Auto: opcuaclient-2 — per-tag advanced subscription tuning
Closes #274
2026-04-25 15:25:20 -04:00
dohertj2 bf200e813e Merge pull request '[opcuaclient] OpcUaClient — Per-subscription tuning' (#331) from auto/opcuaclient/1 into auto/driver-gaps 2026-04-25 15:11:23 -04:00
Joseph Doherty 7209364c35 Auto: opcuaclient-1 — per-subscription tuning
Closes #273
2026-04-25 15:09:08 -04:00
dohertj2 8314c273e7 Merge pull request '[focas] FOCAS — Figure scaling + diagnostics' (#330) from auto/focas/F1-f into auto/driver-gaps 2026-04-25 15:04:05 -04:00
Joseph Doherty 1abf743a9f Auto: focas-f1f — figure scaling + diagnostics
Closes #262
2026-04-25 15:01:37 -04:00
dohertj2 63a79791cd Merge pull request '[focas] FOCAS — Operator messages + block text' (#329) from auto/focas/F1-e into auto/driver-gaps 2026-04-25 14:51:46 -04:00
Joseph Doherty cc757855e6 Auto: focas-f1e — operator messages + block text
Closes #261
2026-04-25 14:49:11 -04:00
dohertj2 84913638b1 Merge pull request '[focas] FOCAS — Tool number + work coordinate offsets' (#328) from auto/focas/F1-d into auto/driver-gaps 2026-04-25 14:40:17 -04:00
Joseph Doherty 9ec92a9082 Auto: focas-f1d — Tool number + work coordinate offsets
Closes #260
2026-04-25 14:37:51 -04:00
dohertj2 49fc23adc6 Merge pull request '[focas] FOCAS — Modal codes + overrides' (#327) from auto/focas/F1-c into auto/driver-gaps 2026-04-25 14:29:12 -04:00
Joseph Doherty 3c2c4f29ea Auto: focas-f1c — Modal codes + overrides
Closes #259

Adds Modal/ + Override/ fixed-tree subfolders per FOCAS device, mirroring the
pattern established by Status/ (#257) and Production/ (#258): cached snapshots
refreshed on the probe tick, served from cache on read, no extra wire traffic
on top of user-driven tag reads.

Modal/ surfaces the four universally-present aux modal codes M/S/T/B from
cnc_modal(type=100..103) as Int16. **G-group decoding (groups 1..21) is deferred
to a follow-up** — the FWLIB ODBMDL union differs per series + group and the
issue body explicitly permits this scoping. Adds the cnc_modal P/Invoke +
ODBMDL struct + a generic int16 cnc_rdparam helper so the follow-up can add
G-groups without further wire-level scaffolding.

Override/ surfaces Feed/Rapid/Spindle/Jog from cnc_rdparam at MTB-specific
parameter numbers (FocasDeviceOptions.OverrideParameters; defaults to 30i:
6010/6011/6014/6015). Per-field nullable params let a deployment hide overrides
their MTB doesn't wire up; passing OverrideParameters=null suppresses the entire
Override/ subfolder for that device.

6 unit tests cover discovery shape, omitted Override folder when unconfigured,
partial Override field selection, cached-snapshot reads (Modal + Override),
BadCommunicationError before first refresh, and the FwlibFocasClient
disconnected short-circuit.
2026-04-25 14:26:48 -04:00
dohertj2 ae7cc15178 Merge pull request '[focas] FOCAS — Parts count + cycle time' (#326) from auto/focas/F1-b into auto/driver-gaps 2026-04-25 14:17:17 -04:00
Joseph Doherty 3d9697b918 Auto: focas-f1b — parts count + cycle time
Closes #258
2026-04-25 14:14:54 -04:00
dohertj2 329e222aa2 Merge pull request '[focas] FOCAS — ODBST status flags as fixed-tree nodes' (#325) from auto/focas/F1-a into auto/driver-gaps 2026-04-25 14:07:42 -04:00
Joseph Doherty 551494d223 Auto: focas-f1a — ODBST status flags as fixed-tree nodes
Closes #257
2026-04-25 14:05:12 -04:00
dohertj2 5b4925e61a Merge pull request '[ablegacy] AbLegacy — Indirect/indexed addressing parser' (#324) from auto/ablegacy/4 into auto/driver-gaps 2026-04-25 13:53:28 -04:00
Joseph Doherty 4ff4cc5899 Auto: ablegacy-4 — indirect/indexed addressing parser
Closes #247
2026-04-25 13:51:03 -04:00
dohertj2 b95eaacc05 Merge pull request '[ablegacy] AbLegacy — Sub-element bit semantics' (#323) from auto/ablegacy/3 into auto/driver-gaps 2026-04-25 13:44:21 -04:00
Joseph Doherty c89f5bb3b9 Auto: ablegacy-3 — sub-element bit semantics
Closes #246
2026-04-25 13:41:52 -04:00
dohertj2 07235d3b66 Merge pull request '[ablegacy] AbLegacy — MicroLogix function-file letters' (#322) from auto/ablegacy/2 into auto/driver-gaps 2026-04-25 13:34:46 -04:00
Joseph Doherty f2bc36349e Auto: ablegacy-2 — MicroLogix function-file letters
Closes #245
2026-04-25 13:32:23 -04:00
dohertj2 ccf2e3a9c0 Merge pull request '[ablegacy] AbLegacy — PLC-5 octal I/O addressing' (#321) from auto/ablegacy/1 into auto/driver-gaps 2026-04-25 13:26:54 -04:00
Joseph Doherty 8f7265186d Auto: ablegacy-1 — PLC-5 octal I/O addressing
Closes #244
2026-04-25 13:25:22 -04:00
dohertj2 651d6c005c Merge pull request '[abcip] AbCip — CIP multi-tag write packing' (#320) from auto/abcip/1.4 into auto/driver-gaps 2026-04-25 13:16:49 -04:00
Joseph Doherty 36b2929780 Auto: abcip-1.4 — CIP multi-tag write packing
Group writes by device through new AbCipMultiWritePlanner; for families that
support CIP request packing (ControlLogix / CompactLogix / GuardLogix) the
packable writes for one device are dispatched concurrently so libplctag's
native scheduler can coalesce them onto one Multi-Service Packet (0x0A).
Micro800 keeps SupportsRequestPacking=false and falls back to per-tag
sequential writes. BOOL-within-DINT writes are excluded from packing and
continue to go through the per-parent RMW semaphore so two concurrent bit
writes against the same DINT cannot lose one another's update.

The libplctag .NET wrapper does not expose a Multi-Service Packet construction
API at the per-Tag surface (each Tag is one CIP service), so this PR uses
client-side coalescing — concurrent Task.WhenAll dispatch per device — rather
than building raw CIP frames. The native libplctag scheduler does pack
concurrent same-connection writes when the family allows it, which gives the
round-trip reduction #228 calls for without ballooning the diff.

Per-tag StatusCodes preserve caller order across success, transport failure,
non-writable tags, unknown references, and unknown devices, including in
mixed concurrent batches.

Closes #228
2026-04-25 13:14:28 -04:00
dohertj2 345ac97c43 Merge pull request '[abcip] AbCip — Array-slice read addressing Tag[0..N]' (#319) from auto/abcip/1.3 into auto/driver-gaps 2026-04-25 13:06:11 -04:00
Joseph Doherty 767ac4aec5 Auto: abcip-1.3 — array-slice read addressing
Closes #227
2026-04-25 13:03:45 -04:00
dohertj2 29edd835a3 Merge pull request '[abcip] AbCip — STRINGnn variant decoding' (#318) from auto/abcip/1.2 into auto/driver-gaps 2026-04-25 12:55:34 -04:00
Joseph Doherty d78a471e90 Auto: abcip-1.2 — STRINGnn variant decoding
Closes #226

Adds nullable StringLength to AbCipTagDefinition + AbCipStructureMember
so STRING_20 / STRING_40 / STRING_80 UDT variants decode against the
right DATA-array capacity. The configured length threads through a new
StringMaxCapacity field on AbCipTagCreateParams and lands on the
libplctag Tag.StringMaxCapacity attribute (verified property on
libplctag 1.5.2). Null leaves libplctag's default 82-byte STRING in
place for back-compat. Driver gates on DataType == String so a stray
StringLength on a DINT tag doesn't reshape that buffer. UDT member
fan-out copies StringLength from the AbCipStructureMember onto the
synthesised member tag definition.

Tests: 4 new in AbCipDriverReadTests covering threaded StringMaxCapacity,
the null back-compat path, the non-String gate, and the UDT-member fan-out.
2026-04-25 12:53:20 -04:00
dohertj2 1d9e40236b Merge pull request '[abcip] AbCip — LINT/ULINT 64-bit fidelity' (#317) from auto/abcip/1.1 into auto/driver-gaps 2026-04-25 12:47:17 -04:00
Joseph Doherty 2e6228a243 Auto: abcip-1.1 — LINT/ULINT 64-bit fidelity
Closes #225
2026-04-25 12:44:43 -04:00
990 changed files with 82690 additions and 48001 deletions
-3
View File
@@ -37,6 +37,3 @@ src/ZB.MOM.WW.OtOpcUa.Server/config_cache.db
# E2E sidecar config — NodeIds are specific to each dev's local seed (see scripts/e2e/README.md)
scripts/e2e/e2e-config.json
config_cache*.db
# Client CLI/UI runtime scratch (last-connected endpoint cache)
session.dat
+39 -72
View File
@@ -4,38 +4,15 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Goal
Build an OPC UA server (.NET 10) that exposes AVEVA System Platform
(Wonderware) Galaxy tags. The server mirrors the Galaxy object
hierarchy as an OPC UA address space, translating between
contained-name browse paths and tag-name runtime references. Galaxy
access flows through the in-process `GalaxyDriver`
(`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`) talking gRPC to a separately
installed **mxaccessgw** gateway process. The gateway owns the
MXAccess COM bitness constraint (its worker is x86 net48); everything
in this repo is .NET 10. PR 7.2 retired the legacy in-process
`Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the
`OtOpcUaGalaxyHost` Windows service.
See `docs/v2/Galaxy.Performance.md` for the runtime perf surface
(tracing, metrics, soak harness).
Build an OPC UA server on .NET Framework 4.8 (32-bit) that exposes AVEVA System Platform (Wonderware) Galaxy tags via the MXAccess toolkit. The server mirrors the Galaxy object hierarchy as an OPC UA address space, translating between contained-name browse paths and tag-name runtime references.
## Architecture Overview
### Data Flow
1. **Galaxy Repository DB (ZB)** — SQL Server database holding the
deployed object hierarchy and attribute definitions. The
mxaccessgw's `GalaxyRepositoryClient` queries it via gRPC; the
driver consumes the materialised hierarchy through
`IGalaxyHierarchySource`.
2. **MXAccess (via mxaccessgw)** — Live read/write/subscribe over a
gRPC session. The gateway owns the COM apartment + STA pump
server-side; the driver speaks `MxCommand` / `MxEvent` protos
exclusively.
3. **OPC UA Server** — Exposes the hierarchy as browse nodes and
attributes as variable nodes. Clients browse via contained names
but reads/writes are translated to `tag_name.AttributeName` format
for MXAccess.
1. **Galaxy Repository DB (ZB)** — SQL Server database holding the deployed object hierarchy and attribute definitions. Queried at startup and on change detection to build/rebuild the OPC UA address space.
2. **MXAccess COM API** — Runtime data access layer. Subscribes to Galaxy tag attributes for live read/write. Requires a dedicated STA thread with a Win32 message pump for COM callbacks.
3. **OPC UA Server** — Exposes the hierarchy as browse nodes and attributes as variable nodes. Clients browse via contained names but reads/writes are translated to `tag_name.AttributeName` format for MXAccess.
### Key Concept: Contained Name vs Tag Name
@@ -45,17 +22,43 @@ Galaxy objects have two names:
Example: browsing `TestMachine_001/DelmiaReceiver/DownloadPath` translates to MXAccess reference `DelmiaReceiver_001.DownloadPath`.
See `gr/layout.md` for the full mapping and target OPC UA structure.
### Data Type Mapping
Galaxy `mx_data_type` values map to OPC UA types (Boolean, Int32, Float, Double, String, DateTime, etc.). Array attributes use ValueRank=1 with ArrayDimensions from the Galaxy attribute definition. The driver-side mapping lives in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`.
Galaxy `mx_data_type` values map to OPC UA types (Boolean, Int32, Float, Double, String, DateTime, etc.). Array attributes use ValueRank=1 with ArrayDimensions from the Galaxy attribute definition. Full mapping in `gr/data_type_mapping.md`.
### Change Detection
`DeployWatcher` (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DeployWatcher.cs`) polls the gateway's deploy-event signal and raises `IRediscoverable.OnRediscoveryNeeded` when the Galaxy redeploys. The server's `DriverHost` consumes the signal and rebuilds the address space.
Poll `galaxy.time_of_last_deploy` in the ZB database to detect redeployments, then rebuild the address space. See `gr/build_layout_plan.md` for the step-by-step plan.
## mxaccessgw
## Reference Implementation
The gateway lives in a sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`. See `docs/v2/Galaxy.ParityRig.md` for the gw setup recipe (build, API key provisioning via `apikey create-key`, env-var overrides for HTTP/2 cleartext + worker path). The gw's MXAccess Toolkit reference (its `gateway.md`) is the canonical MxAccess API doc; the standalone `mxaccess_documentation.md` previously kept in this repo retired in PR 7.3.
An existing MXAccess client implementation is at:
`C:\Users\dohertj2\Desktop\scadalink-design\lmxproxy\src\ZB.MOM.WW.LmxProxy.Host`
Key patterns from that codebase:
- **StaComThread** — Dedicated STA thread with Win32 message pump (`GetMessage`/`DispatchMessage` loop). All MXAccess COM objects must be created and called on this thread. Uses `PostThreadMessage(WM_APP)` to marshal work items.
- **LMXProxyServer COM object** — `Register(clientName)` returns a connection handle. `AddItem(handle, address)` + `AdviseSupervisory(handle, itemHandle)` for subscriptions. `OnDataChange`/`OnWriteComplete` events for callbacks.
- **Reconnect** — Stored subscriptions are replayed after reconnect. A probe tag subscription monitors connection health.
- **COM cleanup** — `Marshal.ReleaseComObject()` on disconnect. Event handlers must be unwired before unregister.
## MXAccess Documentation
`mxaccess_documentation.md` in the project root contains the full ArchestrA MXAccess Toolkit User's Guide. Key API: `ArchestrA.MxAccess` namespace, `LMXProxyServer` class. The toolkit DLLs are in `Program Files (x86)\ArchestrA\Framework\bin`.
## Galaxy Repository Database
Connection: `sqlcmd -S localhost -d ZB -E` (Windows Auth). See `gr/connectioninfo.md`.
The `gr/` folder contains:
- `queries/` — SQL for hierarchy extraction, attribute lookup, and change detection
- `ddl/tables/` and `ddl/views/` — Schema definitions
- `schema.md` — Full table/view reference
- `build_layout_plan.md` — Step-by-step plan for building the OPC UA address space from DB queries
- `gr/CLAUDE.md` — Detailed guidance for working within the `gr/` subfolder
Key tables: `gobject` (hierarchy/deployment), `template_definition` (object categories), `dynamic_attribute` (user-defined attributes), `primitive_instance` (primitive-to-attribute links), `galaxy` (change detection).
## Build Commands
@@ -68,48 +71,11 @@ dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests # integration tests
dotnet test --filter "FullyQualifiedName~MyTestClass.MyMethod" # single test
```
## Docker Workflow (driver fixtures + central SQL Server)
> **Migrated 2026-04-28**: Docker config + host moved off this dev VM (DESKTOP-6JL3KKO) onto the shared Linux Docker host (`DOCKER`, 10.100.0.35) so the dev VM could shed WSL2/Hyper-V and have its GPU re-attached via ESXi passthrough. Docker Desktop is no longer installed here. All checked-in `appsettings.json` defaults, fixture-class default endpoints, and `e2e-config.sample.json` were rewritten to target `10.100.0.35`. The driver fixture compose files under `tests/.../Docker/docker-compose.yml` now carry a `project: lmxopcua` label on every service. See `docs/v2/dev-environment.md` for the full rewrite (header dated 2026-04-28).
Docker workloads run on a shared Linux host at **`10.100.0.35`** — not on this VM. Stacks live at `/opt/otopcua-<driver>/` on the host and carry the `project=lmxopcua` label so they're discoverable via `docker ps --filter label=project=lmxopcua`.
**`docker -H ssh://...` does NOT work from this VM.** Windows OpenSSH ↔ docker.exe stdio bridging hangs (`docker system dial-stdio` runs server-side but no API data flows). Use the helper below — it SSHes into the docker host and runs `docker compose` server-side.
**Use `lmxopcua-fix.ps1` (in `~/bin`) to control fixtures from this VM:**
```powershell
lmxopcua-fix ls # list all lmxopcua-tagged containers on the host
lmxopcua-fix up modbus standard # bring a profile up
lmxopcua-fix up abcip controllogix
lmxopcua-fix up s7 s7_1500
lmxopcua-fix up opcuaclient # single-service stack, no profile arg
lmxopcua-fix down modbus # tear stack down
lmxopcua-fix logs modbus
lmxopcua-fix sync modbus # rsync this repo's tests/.../Docker/ → /opt/otopcua-modbus/
```
**`sync` is the deployment step.** When you edit a fixture's compose file or Dockerfile under `tests/.../Docker/`, run `lmxopcua-fix sync <driver>` to push the changes to the docker host before bringing the stack up. The repo files are the source of truth; `/opt/otopcua-<driver>/` is a mirrored deployment.
**Endpoints (defaults already point at the docker host):**
- SQL Server (always-on): `10.100.0.35,14330` — used by `appsettings.json` for `ConfigDb`.
- Modbus: `10.100.0.35:5020` (`MODBUS_SIM_ENDPOINT`)
- AB CIP: `10.100.0.35:44818` (`AB_SERVER_ENDPOINT`)
- S7: `10.100.0.35:1102` (`S7_SIM_ENDPOINT`)
- OPC UA reference (opc-plc): `opc.tcp://10.100.0.35:50000` (`OPCUA_SIM_ENDPOINT`)
Override any endpoint via the env var to point at a real PLC. The local OtOpcUa server runs on this VM at `opc.tcp://localhost:4840`**that's not on the docker host**.
See `docs/v2/dev-environment.md` for the full inventory and rationale.
## Build & Runtime Constraints
- Language: C#, .NET 10, AnyCPU. The MXAccess COM bitness constraint
is owned by the mxaccessgw worker (x86 net48), not by anything in
this repo.
- The gateway's MXAccess worker requires a deployed ArchestrA Platform
on the machine running the gateway. The OtOpcUa server itself does
not.
- Language: C#, .NET Framework 4.8, **x86 (32-bit)** platform target — required for MXAccess COM interop
- MXAccess requires a deployed ArchestrA Platform on the machine running the server
- COM apartment: MXAccess objects must live on an STA thread with a message pump
## Transport Security
@@ -117,7 +83,7 @@ The server supports configurable OPC UA transport security via the `Security` se
## Redundancy
The server supports non-transparent warm/hot redundancy via the `Redundancy` section in `appsettings.json`. Two instances share the same Galaxy DB and the same mxaccessgw (under distinct `MxAccess.ClientName` values) but have unique `ApplicationUri` values. Each exposes `RedundancySupport`, `ServerUriArray`, and a dynamic `ServiceLevel` based on role and runtime health. The primary advertises a higher ServiceLevel than the secondary. See `docs/Redundancy.md` for the full guide.
The server supports non-transparent warm/hot redundancy via the `Redundancy` section in `appsettings.json`. Two instances share the same Galaxy DB and MXAccess runtime but have unique `ApplicationUri` values. Each exposes `RedundancySupport`, `ServerUriArray`, and a dynamic `ServiceLevel` based on role and runtime health. The primary advertises a higher ServiceLevel than the secondary. See `docs/Redundancy.md` for the full guide.
## LDAP Authentication
@@ -128,6 +94,7 @@ The server uses LDAP-based user authentication via the `Authentication.Ldap` sec
- **Logging**: Serilog with rolling daily file sink
- **Unit tests**: xUnit + Shouldly for assertions
- **Service hosting (Server, Admin)**: .NET generic host with `AddWindowsService` (decision #30 — replaced TopShelf in v2; see `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs`)
- **Service hosting (Galaxy.Host)**: plain console app wrapped by NSSM (`.NET Framework 4.8 x86` — required by MXAccess COM bitness)
- **OPC UA**: OPC Foundation UA .NET Standard stack (https://github.com/opcfoundation/ua-.netstandard) — NuGet: `OPCFoundation.NetStandard.Opc.Ua.Server`
## OPC UA .NET Standard Documentation
+168 -83
View File
@@ -1,115 +1,200 @@
# OtOpcUa
# LmxOpcUa
OPC UA server (.NET 10 AnyCPU) that exposes a fleet of industrial drivers as a single OPC UA address space. Drivers ship in-process for AVEVA System Platform Galaxy (via the sibling `mxaccessgw` repo), Modbus TCP, Siemens S7, Allen-Bradley CIP (ControlLogix / CompactLogix), Allen-Bradley Legacy (SLC 500 / MicroLogix), Beckhoff TwinCAT (ADS), FANUC FOCAS, and OPC UA Client (gateway).
A cross-platform client stack (.NET 10) — shared library, CLI, and Avalonia desktop app — connects to any OPC UA server.
OPC UA server and cross-platform client tools for AVEVA System Platform (Wonderware) Galaxy. The server exposes Galaxy tags via MXAccess as an OPC UA address space. The client stack provides a shared library, CLI tool, and Avalonia desktop application for browsing, reading/writing, subscriptions, alarms, and historical data.
## Architecture
```
OPC UA Clients (CLI, Desktop UI, 3rd-party)
|
v
+-------------------------------------+
| OtOpcUa.Server (.NET 10 AnyCPU) |
| address space + capability fan-out|
+-------------------------------------+
| | | | | | | |
Galaxy Modbus S7 AbCip AbLeg TwinCAT FOCAS OpcUaClient
|
v
mxaccessgw (sibling repo, gRPC)
|
v
MXAccess COM (x86 worker, on AVEVA box)
OPC UA Clients
(CLI, Desktop UI, 3rd-party)
|
v
+-----------------+ +------------------+ +-----------------+
| Galaxy Repo DB |---->| OPC UA Server |<--->| MXAccess Client |
| (SQL Server) | | (address space) | | (STA + COM) |
+-----------------+ +------------------+ +-----------------+
| |
+-------+--------+ +---------+---------+
| Status Dashboard| | Historian Runtime |
| (HTTP/JSON) | | (SQL Server) |
+----------------+ +-------------------+
```
Galaxy is the only driver with an external runtime: it speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`) which owns the MXAccess COM apartment and the x86/STA bitness constraint server-side. Everything in this repo is platform-agnostic .NET 10.
## Contained Name vs Tag Name
## Prerequisites
| Browse Path (contained names) | Runtime Reference (tag name) |
|-------------------------------|------------------------------|
| `TestMachine_001/DelmiaReceiver/DownloadPath` | `DelmiaReceiver_001.DownloadPath` |
| `TestMachine_001/MESReceiver/MoveInBatchID` | `MESReceiver_001.MoveInBatchID` |
- .NET 10 SDK (server, drivers, clients all target .NET 10)
- SQL Server reachable for the central config DB
- For Galaxy specifically: a running `mxaccessgw` deployment — see [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md)
- For Wonderware Historian read-back: optional `OtOpcUaWonderwareHistorian` sidecar — see [docs/ServiceHosting.md](docs/ServiceHosting.md)
---
## Quick Start
## Server
The OPC UA server runs on .NET Framework 4.8 (x86) and bridges the Galaxy runtime to OPC UA clients.
### Server Prerequisites
- .NET Framework 4.8 SDK
- AVEVA System Platform with ArchestrA Framework installed
- Galaxy repository database (SQL Server, Windows Auth)
- MXAccess COM registered (`LMXProxy.LMXProxyServer`)
- Wonderware Historian (optional, for historical data access)
- Windows (required for COM interop and MXAccess)
### Build and Run Server
```bash
dotnet restore ZB.MOM.WW.OtOpcUa.slnx
dotnet build ZB.MOM.WW.OtOpcUa.slnx
dotnet test ZB.MOM.WW.OtOpcUa.slnx
# Run the server in dev (foreground)
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
dotnet restore ZB.MOM.WW.LmxOpcUa.slnx
dotnet build src/ZB.MOM.WW.LmxOpcUa.Host
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Host
```
The server starts on `opc.tcp://localhost:4840` with the `None` security profile. Configure `Security.Profiles` in `src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt`. See [docs/security.md](docs/security.md).
The server starts on `opc.tcp://localhost:4840/LmxOpcUa` with the `None` security profile by default. Configure `Security.Profiles` in `appsettings.json` to enable `Basic256Sha256-Sign` or `Basic256Sha256-SignAndEncrypt` for transport security. See [Security Guide](docs/security.md).
## Install as Windows Services
Production deployment is driven by `scripts/install/Install-Services.ps1`, which registers the `OtOpcUa` server service (and optionally the `OtOpcUaWonderwareHistorian` sidecar) under a chosen service account. Galaxy support requires a separately installed `mxaccessgw` — neither this repo nor the install script provisions it.
```powershell
.\scripts\install\Install-Services.ps1 `
-InstallRoot 'C:\Program Files\OtOpcUa' `
-ServiceAccount 'DOMAIN\svc-otopcua'
```
Add `-InstallWonderwareHistorian` for the historian sidecar. See the script header and [docs/ServiceHosting.md](docs/ServiceHosting.md) for full options.
## Client CLI
### Install as Windows Service
```bash
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840 -r -d 3
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode"
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -v 42
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840 -n "ns=2;s=SomeNode" -i 500
cd src/ZB.MOM.WW.LmxOpcUa.Host/bin/Debug/net48
ZB.MOM.WW.LmxOpcUa.Host.exe install
ZB.MOM.WW.LmxOpcUa.Host.exe start
```
See [docs/Client.CLI.md](docs/Client.CLI.md) and [docs/Client.UI.md](docs/Client.UI.md).
**Service logon requirement:** The service must run under a Windows account that has access to the AVEVA Galaxy and Historian. The default `LocalSystem` account can connect to MXAccess and SQL Server but **cannot authenticate with the Historian SDK** (HCAP). Configure the service to "Log on as" a domain or local user that is a recognized ArchestrA platform user. This can be set in `services.msc` or during install with `ZB.MOM.WW.LmxOpcUa.Host.exe install -username DOMAIN\user -password ***`.
### Run Server Tests
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.IntegrationTests
```
---
## Client Stack
The client stack is cross-platform (.NET 10) and consists of three projects sharing a common `IOpcUaClientService` abstraction. No AVEVA software or COM is required — the clients connect to any OPC UA server.
### Client Prerequisites
- .NET 10 SDK
- No platform-specific dependencies (runs on Windows, macOS, Linux)
### Build All Clients
```bash
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.Shared
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.CLI
dotnet build src/ZB.MOM.WW.LmxOpcUa.Client.UI
```
### Run Client Tests
```bash
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests
dotnet test tests/ZB.MOM.WW.LmxOpcUa.Client.UI.Tests
```
### Client CLI
```bash
# Connect
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- connect -u opc.tcp://localhost:4840/LmxOpcUa
# Browse Galaxy hierarchy
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=ZB" -r -d 5
# Read a tag
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.MachineID"
# Write a tag
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- write -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestString" -v "Hello"
# Subscribe to changes
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- subscribe -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestChildObject.TestInt" -i 500
# Read historical data
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- historyread -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001.TestHistoryValue" --start "2026-03-25" --end "2026-03-30"
# Subscribe to alarm events
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- alarms -u opc.tcp://localhost:4840/LmxOpcUa -n "ns=3;s=TestMachine_001" --refresh
# Query redundancy state
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.CLI -- redundancy -u opc.tcp://localhost:4840/LmxOpcUa
```
### Client UI
```bash
dotnet run --project src/ZB.MOM.WW.LmxOpcUa.Client.UI
```
The desktop application provides browse tree, subscriptions, alarm monitoring, history reads, and write dialogs. See [Client UI Documentation](docs/Client.UI.md) for details.
---
## Project Structure
```
src/
ZB.MOM.WW.LmxOpcUa.Host/ OPC UA server (.NET Framework 4.8, x86)
Configuration/ Config binding and validation
Domain/ Interfaces, DTOs, enums, mappers
Historian/ Wonderware Historian data source
Metrics/ Performance tracking (rolling P95)
MxAccess/ STA thread, COM interop, subscriptions
GalaxyRepository/ SQL queries, change detection
OpcUa/ Server, node manager, address space, alarms, diff
Status/ HTTP dashboard, health checks
ZB.MOM.WW.LmxOpcUa.Client.Shared/ Shared OPC UA client library (.NET 10)
ZB.MOM.WW.LmxOpcUa.Client.CLI/ Command-line client (.NET 10)
ZB.MOM.WW.LmxOpcUa.Client.UI/ Avalonia desktop client (.NET 10)
tests/
ZB.MOM.WW.LmxOpcUa.Tests/ Server unit + integration tests
ZB.MOM.WW.LmxOpcUa.IntegrationTests/ Server integration tests (live DB)
ZB.MOM.WW.LmxOpcUa.Client.Shared.Tests/ Shared library tests
ZB.MOM.WW.LmxOpcUa.Client.CLI.Tests/ CLI command tests
ZB.MOM.WW.LmxOpcUa.Client.UI.Tests/ UI ViewModel + headless tests
gr/ Galaxy repository docs, SQL queries, schema
```
## Documentation
### Architecture deep-dives
### Server
| Topic | Doc |
| Component | Description |
|---|---|
| OPC UA server composition, namespace fan-out, Polly invoker | [docs/OpcUaServer.md](docs/OpcUaServer.md) |
| Address space layout | [docs/AddressSpace.md](docs/AddressSpace.md) |
| Read / Write dispatch (driver vs virtual vs scripted-alarm) | [docs/ReadWriteOperations.md](docs/ReadWriteOperations.md) |
| Incremental sync (driver-backend rediscovery + config publishes) | [docs/IncrementalSync.md](docs/IncrementalSync.md) |
| Service hosting (Server + Admin + optional historian sidecar) | [docs/ServiceHosting.md](docs/ServiceHosting.md) |
| Security (transport, LDAP, certificates) | [docs/security.md](docs/security.md) |
| Redundancy | [docs/Redundancy.md](docs/Redundancy.md) |
| Status dashboard | [docs/StatusDashboard.md](docs/StatusDashboard.md) |
| [OPC UA Server](docs/OpcUaServer.md) | Endpoint, sessions, security policy, server lifecycle |
| [Address Space](docs/AddressSpace.md) | Hierarchy nodes, variable nodes, primitive grouping, NodeId scheme |
| [Galaxy Repository](docs/GalaxyRepository.md) | SQL queries, deployed package chain, change detection |
| [MXAccess Bridge](docs/MxAccessBridge.md) | STA thread, COM interop, subscriptions, reconnection |
| [Data Type Mapping](docs/DataTypeMapping.md) | Galaxy to OPC UA types, arrays, security classification |
| [Read/Write Operations](docs/ReadWriteOperations.md) | Value reads, writes, access level enforcement, array element writes |
| [Subscriptions](docs/Subscriptions.md) | Ref-counted MXAccess subscriptions, data change dispatch |
| [Alarm Tracking](docs/AlarmTracking.md) | AlarmConditionState nodes, InAlarm monitoring, event reporting |
| [Historical Data Access](docs/HistoricalDataAccess.md) | Historian data source, HistoryReadRaw, HistoryReadProcessed |
| [Incremental Sync](docs/IncrementalSync.md) | Diff computation, subtree teardown/rebuild, subscription preservation |
| [Configuration](docs/Configuration.md) | appsettings.json binding, feature flags, validation |
| [Status Dashboard](docs/StatusDashboard.md) | HTTP server, health checks, metrics reporting |
| [Service Hosting](docs/ServiceHosting.md) | TopShelf, startup/shutdown sequence, error handling |
| [Security](docs/security.md) | Transport security profiles, certificate trust, production hardening |
| [Redundancy](docs/Redundancy.md) | Non-transparent warm/hot redundancy, ServiceLevel, paired deployment |
### Drivers
### Client
| Topic | Doc |
| Component | Description |
|---|---|
| Driver specs (per-driver capability surface, config, addressing) | [docs/v2/driver-specs.md](docs/v2/driver-specs.md) |
| Galaxy driver | [docs/drivers/Galaxy.md](docs/drivers/Galaxy.md) |
| Modbus / S7 / AbCip / AbLegacy / TwinCAT / FOCAS / OpcUaClient | [docs/drivers/](docs/drivers/) |
| Galaxy parity rig (mxaccessgw setup) | [docs/v2/Galaxy.ParityRig.md](docs/v2/Galaxy.ParityRig.md) |
| Galaxy performance + tracing | [docs/v2/Galaxy.Performance.md](docs/v2/Galaxy.Performance.md) |
| [Client CLI](docs/Client.CLI.md) | Connect, browse, read, write, subscribe, historyread, alarms, redundancy commands |
| [Client UI](docs/Client.UI.md) | Avalonia desktop client: browse, subscribe, alarms, history, write values |
### Clients
### Reference
| Topic | Doc |
|---|---|
| Client CLI | [docs/Client.CLI.md](docs/Client.CLI.md) |
| Client UI (Avalonia desktop) | [docs/Client.UI.md](docs/Client.UI.md) |
### v1 archive
The original v1 in-process MXAccess docs (Galaxy.Host topology,
Configuration env vars, AlarmTracking, DataTypeMapping,
HistoricalDataAccess, Subscriptions, etc.) are preserved under
[docs/v1/](docs/v1/) — historical reference only. PR 7.2 retired the
v1 architecture on 2026-04-30; current state is documented in the
sections above.
- [Galaxy Repository Queries](gr/CLAUDE.md) — SQL queries for hierarchy, attributes, and change detection
- [Data Type Mapping](gr/data_type_mapping.md) — Galaxy to OPC UA type mapping with security classification
## License
+12 -9
View File
@@ -9,16 +9,17 @@
<Project Path="src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Server/ZB.MOM.WW.OtOpcUa.Server.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Admin/ZB.MOM.WW.OtOpcUa.Admin.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Modbus/ZB.MOM.WW.OtOpcUa.Driver.Modbus.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.S7/ZB.MOM.WW.OtOpcUa.Driver.S7.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/ZB.MOM.WW.OtOpcUa.Driver.AbCip.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Client.Shared/ZB.MOM.WW.OtOpcUa.Client.Shared.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Client.CLI/ZB.MOM.WW.OtOpcUa.Client.CLI.csproj"/>
@@ -43,11 +44,12 @@
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Server.Tests/ZB.MOM.WW.OtOpcUa.Server.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Admin.Tests/ZB.MOM.WW.OtOpcUa.Admin.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Addressing.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.IntegrationTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests/ZB.MOM.WW.OtOpcUa.Driver.Cli.Common.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests/ZB.MOM.WW.OtOpcUa.Driver.Modbus.Cli.Tests.csproj"/>
@@ -63,7 +65,8 @@
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared.Tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Shared.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host.Tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests.csproj"/>
+1
View File
@@ -0,0 +1 @@
{"title":"Phase 3 PR 54 -- Siemens S7 Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/s7.md` (485 lines) covering Siemens SIMATIC S7 family Modbus TCP behavior. Mirrors the `docs/v2/dl205.md` template for future per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **No fixed memory map** — every S7 Modbus server is user-wired via `MB_SERVER`/`MODBUSCP`/`MODBUSPN` library blocks. Driver must accept per-site config, not assume a vendor layout.\n- **MB_SERVER requires non-optimized DBs** (STATUS `0x8383` if optimized). Most common field bug.\n- **Word order default = ABCD** (opposite of DL260). Driver's S7 profile default must be `ByteOrder.BigEndian`, not `WordSwap`.\n- **One port per MB_SERVER instance** — multi-client requires parallel FBs on 503/504/… Most clients assume port 502 multiplexes (wrong on S7).\n- **CP 343-1 Lean is server-only**, requires the `2XV9450-1MB00` license.\n- **FC20/21/22/23/43 all return Illegal Function** on every S7 variant — driver must not attempt FC23 bulk-read optimization for S7.\n- **STOP-mode behavior non-deterministic** across firmware bands — treat both read/write STOP-mode responses as unavailable.\n\nTwo items flagged as unconfirmed rumour (V2.0+ float byte-order claim, STOP-mode caching location).\n\nNo code, no tests — implementation lands in PRs 56+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 31 citations present\n- [x] Section structure matches dl205.md template","head":"phase-3-pr54-s7-research-doc","base":"v2"}
+1
View File
@@ -0,0 +1 @@
{"title":"Phase 3 PR 55 -- Mitsubishi MELSEC Modbus TCP quirks research doc","body":"## Summary\n\nAdds `docs/v2/mitsubishi.md` (451 lines) covering MELSEC Q/L/iQ-R/iQ-F/FX3U Modbus TCP behavior. Mirrors `docs/v2/dl205.md` template for per-quirk implementation PRs.\n\n## Key findings for the implementation track\n\n- **Module naming trap** — `QJ71MB91` is SERIAL RTU, not TCP. TCP module is `QJ71MT91`. Surface clearly in driver docs.\n- **No canonical mapping** — per-site 'Modbus Device Assignment Parameter' block (up to 16 entries). Treat mapping as runtime config.\n- **X/Y hex vs octal depends on family** — Q/L/iQ-R use HEX (X20 = decimal 32); FX/iQ-F use OCTAL (X20 = decimal 16). Helper must take a family selector.\n- **Word order CDAB default** across all MELSEC families (opposite of Siemens S7). Driver Mitsubishi profile default: `ByteOrder.WordSwap`.\n- **D-registers binary by default** (opposite of DL205's BCD default). Caller opts in to `Bcd16`/`Bcd32` when ladder uses BCD.\n- **FX5U needs firmware ≥ 1.060** for Modbus TCP server — older is client-only.\n- **FX3U-ENET vs FX3U-ENET-P502 vs FX3U-ENET-ADP** — only the middle one binds port 502; the last has no Modbus at all. Common operator mis-purchase.\n- **QJ71MT91 does NOT support FC22 / FC23** — iQ-R / iQ-F do. Bulk-read optimization must gate on capability.\n- **STOP-mode writes configurable** on Q/L/iQ-R/iQ-F (default accept), always rejected on FX3U-ENET.\n\nThree unconfirmed rumours flagged separately.\n\nNo code, no tests — implementation lands in PRs 58+.\n\n## Test plan\n- [x] Doc renders as markdown\n- [x] 17 citations present\n- [x] Per-model test naming matrix included (`Mitsubishi_QJ71MT91_*`, `Mitsubishi_FX5U_*`, `Mitsubishi_FX3U_ENET_*`, shared `Mitsubishi_Common_*`)","head":"phase-3-pr55-mitsubishi-research-doc","base":"v2"}
+106 -107
View File
@@ -1,129 +1,128 @@
# Alarm tracking — v2 final architecture
# Alarm Tracking
This document describes how OtOpcUa surfaces alarms to OPC UA Part 9
clients after the **alarms-over-gateway** epic
([docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md))
landed. The v1 architecture (Galaxy.Host's COM-side `GalaxyAlarmTracker`)
is preserved at [docs/v1/AlarmTracking.md](v1/AlarmTracking.md) for
historical reference.
Alarm surfacing is an optional driver capability exposed via `IAlarmSource` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`). Drivers whose backends have an alarm concept implement it — today: Galaxy (MXAccess alarms), FOCAS (CNC alarms), OPC UA Client (A&C events from the upstream server). Modbus / S7 / AB CIP / AB Legacy / TwinCAT do not implement the interface and the feature is simply absent from their subtrees.
## Three alarm sources, one OPC UA Part 9 surface
## IAlarmSource surface
| Source | Driver capability | Path |
|----------------------------------|--------------------------|------|
| **Galaxy MxAccess (driver-native)** | `GalaxyDriver : IAlarmSource` | gateway → worker → MxAccess alarm sink → `MX_EVENT_FAMILY_ON_ALARM_TRANSITION``EventPump` → driver `OnAlarmEvent``AlarmConditionService` |
| **Galaxy sub-attribute fallback** | `IWritable` writes to `$Alarm*` sub-attributes | gateway data subscription → driver `OnDataChange``DriverNodeManager` ConditionSink → `AlarmConditionService` |
| **Scripted alarms** | `Phase7EngineComposer` | server-side script evaluator → `Phase7EngineComposer.RouteToHistorianAsync` + `AlarmConditionService` |
```csharp
Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken);
Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken);
Task AcknowledgeAsync(IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements,
CancellationToken cancellationToken);
event EventHandler<AlarmEventArgs>? OnAlarmEvent;
```
All three converge on `AlarmConditionService` (`src/ZB.MOM.WW.OtOpcUa.Server/Alarms/AlarmConditionService.cs`),
which owns the OPC UA Part 9 state machine and dispatches transitions
to the OPC UA condition node managers. Driver-native transitions take
precedence over sub-attribute synthesis when both arrive for the same
condition — the dedup logic prefers the richer driver-native record
because it carries the full operator + raise-time + category metadata
that the value-driven path collapses.
The driver fires `OnAlarmEvent` for every transition (`Active`, `Acknowledged`, `Inactive`) with an `AlarmEventArgs` carrying the source node id, condition id, alarm type, message, severity (`AlarmSeverity` enum), and source timestamp.
## Galaxy driver path (driver-native)
## AlarmSurfaceInvoker
Restored in PR B.2 of the epic. `GalaxyDriver` implements
`IAlarmSource` with these surfaces:
`AlarmSurfaceInvoker` (`src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs`) wraps the three mutating surfaces through `CapabilityInvoker`:
- `SubscribeAlarmsAsync(sourceNodeIds)` → returns a sentinel handle.
The driver doesn't multiplex per source-node-id today; every
active handle observes the gateway's alarm-event stream. The
server-side `AlarmConditionService` filters by source-node before
raising the OPC UA condition.
- `UnsubscribeAlarmsAsync(handle)` → symmetric handle removal.
- `AcknowledgeAsync(requests)` → routes one gateway RPC per
acknowledgement through `IGalaxyAlarmAcknowledger`. Production
uses `GatewayGalaxyAlarmAcknowledger` calling
`MxGatewayClient.AcknowledgeAlarmAsync` (PR E.2 SDK method).
- `OnAlarmEvent` → bridges `EventPump.OnAlarmTransition` (PR B.1)
onto `AlarmEventArgs`. Suppressed when no alarm subscription is
active so untracked transitions don't leak through.
- `SubscribeAlarmsAsync` / `UnsubscribeAlarmsAsync` run through the `DriverCapability.AlarmSubscribe` pipeline — retries apply under the tier configuration.
- `AcknowledgeAsync` runs through `DriverCapability.AlarmAcknowledge` which does NOT retry per decision #143. A timed-out ack may have already registered at the plant floor; replay would silently double-acknowledge.
The proto contract carries the rich payload — alarm full reference,
source-object reference, alarm-type-name, transition kind (Raise /
Acknowledge / Clear / Retrigger), severity (raw MxAccess scale),
original raise timestamp, transition timestamp, operator user,
operator comment, alarm category, description. `MxAccessSeverityMapper`
(PR B.1) translates the raw severity onto the four-bucket
`AlarmSeverity` ladder — boundaries match v1's `GalaxyAlarmTracker`
so customers see no surprise re-classification.
Multi-host fan-out: when the driver implements `IPerCallHostResolver`, each source node id is resolved individually and batches are grouped by host so a dead PLC inside a multi-device driver doesn't poison sibling breakers. Single-host drivers fall back to `IDriver.DriverInstanceId` as the pipeline-key host.
The richer fields surface on `Core.Abstractions.AlarmEventArgs` via
the optional properties added in PR E.7 (`OperatorComment`,
`OriginalRaiseTimestampUtc`, `AlarmCategory`). Consumers that don't
need them are unaffected; consumers that do (Client.UI, Client.CLI
verbose mode) read the new fields when present.
## Condition-node creation via CapturingBuilder
## Galaxy sub-attribute fallback
Alarm-condition nodes are materialized at address-space build time. During `GenericDriverNodeManager.BuildAddressSpaceAsync` the builder is wrapped in a `CapturingBuilder` that observes every `Variable()` call. When a driver calls `IVariableHandle.MarkAsAlarmCondition(AlarmConditionInfo)` on a returned handle, the server-side `DriverNodeManager.VariableHandle` creates a sibling `AlarmConditionState` node and returns an `IAlarmConditionSink`. The wrapper stores the sink in `_alarmSinks` keyed by the variable's full reference, then `GenericDriverNodeManager` registers a forwarder on `IAlarmSource.OnAlarmEvent` that routes each push to the matching sink by `SourceNodeId`. Unknown source ids are dropped silently — they may belong to another driver.
For Galaxy templates without `$Alarm*` extensions, the value-driven
path stays in place: `DriverNodeManager` registers an
`AlarmConditionState` per Galaxy variable that bears alarm-bearing
sub-attributes (`InAlarm`, `Acked`, `Priority`, `Description`),
subscribes to those sub-attributes, and synthesizes Part 9 transitions
when the values change. This path operated as the only Galaxy alarm
path between PR 7.2 and the alarms-over-gateway epic; it remains the
fallback today.
The `AlarmConditionState` layout matches OPC UA Part 9:
When both paths report the same condition,
`AlarmConditionService.AlarmConditionState` keeps the
driver-native record and discards the duplicate sub-attribute
synthesis. Driver-native transitions are richer (carry operator
comment + original raise time) and arrive lower-latency (no
publishing-interval delay on the sub-attribute reads), so they win
the dedup.
- `SourceNode` → the originating variable
- `SourceName` / `ConditionName` → from `AlarmConditionInfo.SourceName`
- Initial state: enabled, inactive, acknowledged, severity per `InitialSeverity`, retain false
- `HasCondition` references wire the source variable ↔ the condition node bidirectionally
## Acknowledge routing
Drivers flag alarm-bearing variables at discovery time via `DriverAttributeInfo.IsAlarm = true`. The Galaxy driver, for example, sets this on attributes that have an `AlarmExtension` primitive in the Galaxy repository DB; FOCAS sets it on the CNC alarm register.
`DriverNodeManager` picks the acknowledger when registering each
condition (PR B.3 logic):
## State transitions
- Driver implements `IAlarmSource`
`DriverAlarmSourceAcknowledger` routes the operator comment
through `IAlarmSource.AcknowledgeAsync` via the existing
`AlarmSurfaceInvoker` (Phase 6.1 resilience pipeline; no-retry
per decision #143). End-to-end operator-comment fidelity is
preserved.
- Driver doesn't implement `IAlarmSource`
`DriverWritableAcknowledger` writes the comment into the
`AckMsgWriteRef` sub-attribute via `IWritable.WriteAsync`. Same
resilience pipeline; collapses the comment into a single string
write at the wire level.
`ConditionSink.OnTransition` runs under the node manager's `Lock` and maps the `AlarmEventArgs.AlarmType` string to Part 9 state:
The OPC UA Part 9 `AlarmConditionState.OnAcknowledge` delegate
already validates the session's `AlarmAck` role before dispatching,
so the gateway-side ack RPC only sees authenticated, authorised
calls.
| AlarmType | Action |
|---|---|
| `Active` | `SetActiveState(true)`, `SetAcknowledgedState(false)`, `Retain = true` |
| `Acknowledged` | `SetAcknowledgedState(true)` |
| `Inactive` | `SetActiveState(false)`; `Retain = false` once both inactive and acknowledged |
## Historian write-back (non-Galaxy alarms)
Severity is remapped: `AlarmSeverity.Low/Medium/High/Critical` → OPC UA numeric 250 / 500 / 700 / 900. `Message.Value` is set from `AlarmEventArgs.Message` on every transition. `ClearChangeMasks(true)` and `ReportEvent(condition)` fire the OPC UA event notification for clients subscribed to any ancestor notifier.
Scripted alarms (and any future non-Galaxy `IAlarmSource` like
AB CIP ALMD) route to AVEVA Historian via the Wonderware sidecar:
## Acknowledge dispatch
- `Phase7Composer.ResolveHistorianSink` resolves an
`IAlarmHistorianWriter` from either a driver that natively
implements it or the DI-registered `WonderwareHistorianClient`
(the sidecar IPC client). Driver-provided wins when both are
present.
- `SqliteStoreAndForwardSink` queues each transition to a local
SQLite database and drains in the background via the resolved
writer.
- Sidecar (PR C.1 + C.2) forwards the events to `aahClientManaged`'s
alarm-event write API; the live SDK call site is pinned during
PR D.1's deploy-rig validation.
Alarm acknowledgement initiated by an OPC UA client flows:
Galaxy-native alarms with `$Alarm*` extensions reach AVEVA Historian
directly via System Platform's `HistorizeToAveva` toggle on the
alarm primitive — no involvement from OtOpcUa. This sidecar path is
exclusively for non-Galaxy alarm producers.
1. The SDK invokes the `AlarmConditionState.OnAcknowledge` method delegate.
2. The handler checks the session's roles for `AlarmAck` — drivers never see a request the session wasn't entitled to make.
3. `AlarmSurfaceInvoker.AcknowledgeAsync` is called with the source / condition / comment tuple. The invoker groups by host and runs each batch through the no-retry `AlarmAcknowledge` pipeline.
## Cross-references
Drivers return normally for success or throw to signal the ack failed at the backend.
- Plan: [docs/plans/alarms-over-gateway.md](plans/alarms-over-gateway.md)
- v1 archive: [docs/v1/AlarmTracking.md](v1/AlarmTracking.md)
- Galaxy driver: [docs/drivers/Galaxy.md](drivers/Galaxy.md)
- Phase 7 scripting + alarming: [docs/v2/implementation/phase-7-scripting-and-alarming.md](v2/implementation/phase-7-scripting-and-alarming.md)
- Security + ACL: [docs/Security.md](Security.md)
## EventNotifier propagation
Drivers that want hierarchical alarm subscriptions propagate `EventNotifier.SubscribeToEvents` up the containment chain during discovery — the Galaxy driver flips the flag on every ancestor of an alarm-bearing object up to the driver root, mirroring v1 behavior. Clients subscribed at the driver root, a mid-level folder, or the `Objects/` root see alarm events from every descendant with an `AlarmConditionState` sibling. The driver-root `FolderState` is created in `DriverNodeManager.CreateAddressSpace` with `EventNotifier = SubscribeToEvents | HistoryRead` so alarm event subscriptions and alarm history both have a single natural target.
## ConditionRefresh
The OPC UA `ConditionRefresh` service queues the current state of every retained condition back to the requesting monitored items. `DriverNodeManager` iterates the node manager's `AlarmConditionState` collection and queues each condition whose `Retain.Value == true` — matching the Part 9 requirement.
## Alarm historian sink
Distinct from the live `IAlarmSource` stream and the Part 9 `AlarmConditionState` materialization above, qualifying alarm transitions are **also** persisted to a durable event log for downstream AVEVA Historian ingestion. This is a separate subsystem from the `IHistoryProvider` capability used by `HistoryReadEvents` (see [HistoricalDataAccess.md](HistoricalDataAccess.md#alarm-event-history-vs-ihistoryprovider)): the sink is a *producer* path (server → Historian) that runs independently of any client HistoryRead call.
### `IAlarmHistorianSink`
`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs` defines the intake contract:
```csharp
Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken);
HistorianSinkStatus GetStatus();
```
`EnqueueAsync` is fire-and-forget from the producer's perspective — it must never block the emitting thread. The event payload (`AlarmHistorianEvent` — same file) is source-agnostic: `AlarmId`, `EquipmentPath`, `AlarmName`, `AlarmTypeName` (Part 9 subtype name), `Severity`, `EventKind` (free-form transition string — `Activated` / `Cleared` / `Acknowledged` / `Confirmed` / `Shelved` / …), `Message`, `User`, `Comment`, `TimestampUtc`.
The sink scope is defined to span every alarm source (plan decision #15: scripted, Galaxy-native, AB CIP ALMD, any future `IAlarmSource`), gated per-alarm by a `HistorizeToAveva` toggle on the producer. Today only `Phase7EngineComposer.RouteToHistorianAsync` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs`) is wired — it subscribes to `ScriptedAlarmEngine.OnEvent` and marshals each emission into `AlarmHistorianEvent`. Galaxy-native alarms continue to reach AVEVA Historian via the driver's direct `aahClientManaged` path and do not flow through the sink; the AB CIP ALMD path remains unwired pending a producer-side integration.
### `SqliteStoreAndForwardSink`
Default production implementation (`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs`). A local SQLite queue absorbs every `EnqueueAsync` synchronously; a background `Timer` drains batches asynchronously to an `IAlarmHistorianWriter` so operator actions are never blocked on historian reachability.
Queue schema (single table `Queue`): `RowId PK autoincrement`, `AlarmId`, `EnqueuedUtc`, `PayloadJson` (serialized `AlarmHistorianEvent`), `AttemptCount`, `LastAttemptUtc`, `LastError`, `DeadLettered` (bool), plus `IX_Queue_Drain (DeadLettered, RowId)`. Default capacity `1_000_000` non-dead-lettered rows; oldest rows evict with a WARN log past the cap.
Drain cadence: `StartDrainLoop(tickInterval)` arms a periodic timer. `DrainOnceAsync` reads up to `batchSize` rows (default 100) in `RowId` order and forwards them through `IAlarmHistorianWriter.WriteBatchAsync`, which returns one `HistorianWriteOutcome` per row:
| Outcome | Action |
|---|---|
| `Ack` | Row deleted. |
| `PermanentFail` | Row flipped to `DeadLettered = 1` with reason. Peers in the batch retry independently. |
| `RetryPlease` | `AttemptCount` bumped; row stays queued. Drain worker enters `BackingOff`. |
Writer-side exceptions treat the whole batch as `RetryPlease`.
Backoff ladder on `RetryPlease` (hard-coded): 1s → 2s → 5s → 15s → 60s cap. Reset to 0 on any batch with no retries. `CurrentBackoff` exposes the current step for instrumentation; the drain timer itself fires on `tickInterval`, so the ladder governs write cadence rather than timer period.
Dead-letter retention defaults to 30 days (plan decision #21). `PurgeAgedDeadLetters` runs each drain pass and deletes rows whose `LastAttemptUtc` is past the cutoff. `RetryDeadLettered()` is an operator action that clears `DeadLettered` + resets `AttemptCount` on every dead-lettered row so they rejoin the main queue.
### Composition and writer resolution
`Phase7Composer.ResolveHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs`) scans the registered drivers for one that implements `IAlarmHistorianWriter`. Today that is `GalaxyProxyDriver` via `GalaxyHistorianWriter` (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyHistorianWriter.cs`), which forwards batches over the Galaxy.Host pipe to the `aahClientManaged` alarm schema. When a writer is found, a `SqliteStoreAndForwardSink` is instantiated against `%ProgramData%/OtOpcUa/alarm-historian-queue.db` with a 2 s drain tick and the writer attached. When no driver provides a writer the fallback is the DI-registered `NullAlarmHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs`), which silently discards and reports `HistorianDrainState.Disabled`.
### Status and observability
`GetStatus()` returns `HistorianSinkStatus(QueueDepth, DeadLetterDepth, LastDrainUtc, LastSuccessUtc, LastError, DrainState)` — two `COUNT(*)` scalars plus last-drain telemetry. `DrainState` is one of `Disabled` / `Idle` / `Draining` / `BackingOff`.
The Admin UI `/alarms/historian` page surfaces this through `HistorianDiagnosticsService` (`src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs`), which also exposes `TryRetryDeadLettered` — it calls through to `SqliteStoreAndForwardSink.RetryDeadLettered` when the live sink is the SQLite implementation and returns 0 otherwise.
## Key source files
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs` — capability contract + `AlarmEventArgs`
- `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs` — per-host fan-out + no-retry ack
- `src/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs``CapturingBuilder` + alarm forwarder
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs``VariableHandle.MarkAsAlarmCondition` + `ConditionSink`
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs` — Galaxy-specific alarm-event production
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs` — historian sink intake contract + `AlarmHistorianEvent` + `HistorianSinkStatus` + `IAlarmHistorianWriter`
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs` — durable queue + drain worker + backoff ladder + dead-letter retention
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs``RouteToHistorianAsync` wires scripted-alarm emissions into the sink
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs``ResolveHistorianSink` selects `SqliteStoreAndForwardSink` vs `NullAlarmHistorianSink`
- `src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs` — Admin UI `/alarms/historian` status + retry-dead-lettered operator action
+33 -6
View File
@@ -183,19 +183,46 @@ otopcua-cli historyread -u opc.tcp://localhost:4840/OtOpcUa \
| `--start` | Start time, ISO 8601 or date string (default: 24 hours ago) |
| `--end` | End time, ISO 8601 or date string (default: now) |
| `--max` | Maximum number of values (default: 1000) |
| `--aggregate` | Aggregate function: Average, Minimum, Maximum, Count, Start, End |
| `--aggregate` | Aggregate function name (see catalog below). Case-insensitive. |
| `--interval` | Processing interval in milliseconds for aggregates (default: 3600000) |
#### Aggregate mapping
The CLI accepts the seven aggregates listed below — these are the
human-driven set the operator typically asks for from the command line.
| Name | OPC UA Node ID |
|------|---------------|
| `Average` | `AggregateFunction_Average` |
| `Minimum` | `AggregateFunction_Minimum` |
| `Maximum` | `AggregateFunction_Maximum` |
| `Average` (or `avg`) | `AggregateFunction_Average` |
| `Minimum` (or `min`) | `AggregateFunction_Minimum` |
| `Maximum` (or `max`) | `AggregateFunction_Maximum` |
| `Count` | `AggregateFunction_Count` |
| `Start` | `AggregateFunction_Start` |
| `End` | `AggregateFunction_End` |
| `Start` (or `first`) | `AggregateFunction_Start` |
| `End` (or `last`) | `AggregateFunction_End` |
| `StandardDeviation` (or `stddev` / `stdev`) | `AggregateFunction_StandardDeviationSample` |
The driver-side `IHistoryProvider.ReadProcessedAsync` API (used by the
OtOpcUa server's HistoryRead facade) supports the full OPC UA Part 13 §5
catalog — ~30 aggregates including `TimeAverage`, `Interpolative`, `Range`,
`PercentGood`, `Delta`, etc. See
[`docs/drivers/OpcUaClient.md`](drivers/OpcUaClient.md#historyread-aggregates-part-13-catalog)
for the full list. Adding a new CLI shorthand is a one-line change in
`HistoryReadCommand.ParseAggregateType` — file an issue if you need one
exposed.
#### Event-mode coverage
Drivers that implement the filter-aware
`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)`
overload (currently the OPC UA Client gateway driver — Galaxy keeps the
fixed-field fallback) honour `EventFilter` SelectClauses and a `WhereClause`
when the server-side history facade forwards them. The CLI does not yet
expose a dedicated `--events` flag — clients that need filter-aware event
history call `HistoryReadEvents` through their own SDK; the CLI's
`historyread` command stays focused on the data-history (Raw / Processed /
AtTime) path. Adding `--events` is tracked as a follow-up — the wire path
on the driver side is in place (see
[`docs/drivers/OpcUaClient.md`](drivers/OpcUaClient.md#historyread-events)).
### alarms
+109
View File
@@ -20,6 +20,8 @@ dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli -- --help
| `-g` / `--gateway` | **required** | Canonical `ab://host[:port]/cip-path` |
| `-f` / `--family` | `ControlLogix` | ControlLogix / CompactLogix / Micro800 / GuardLogix |
| `--timeout-ms` | `5000` | Per-operation timeout |
| `--addressing-mode` | `Auto` | `Auto` / `Symbolic` / `Logical` — see [AbCip-Performance §Addressing mode](drivers/AbCip-Performance.md#addressing-mode). `Logical` against Micro800 silently falls back to Symbolic with a warning. |
| `--partner` | _(unset)_ | PR abcip-5.1 — partner gateway URI for a ControlLogix HSBY pair (e.g. `ab://10.0.0.6/1,0`). When set, the driver runs a second role-probe loop against the partner and the [`hsby-status`](#hsby-status--which-chassis-is-active-now) command can surface which chassis is currently Active. See [AbCip-HSBY.md](drivers/AbCip-HSBY.md) for the full guide. |
| `--verbose` | off | Serilog debug output |
Family ↔ CIP-path cheat sheet:
@@ -55,6 +57,21 @@ otopcua-abcip-cli read -g ab://10.0.0.5/1,0 -t "Recipe[3]" --type Real
otopcua-abcip-cli read -g ab://10.0.0.5/1,0 -t "Motor01.Speed" --type Real
```
#### Diagnostic / system tags
PR abcip-4.3 exposes five read-only diagnostic variables per device under
`AbCip/<device>/_System/` in the OPC UA address space (see
[AbCip-Operability §System tags](drivers/AbCip-Operability.md#system-tags--_system-folder)
for the full table). These are not reachable through the AB CIP CLI — they
live on the OPC UA server side, not the libplctag wire — so to read one,
point the **OPC UA client** CLI at the running OtOpcUa server:
```powershell
# Read _ConnectionStatus for one device through the OPC UA server
otopcua-client-cli read -u opc.tcp://localhost:4840 \
-n "ns=2;s=AbCip/ab://10.0.0.5/1,0/_System/_ConnectionStatus"
```
### `write` — single Logix tag
Same shape as `read` plus `-v`. Values parse per `--type` using invariant
@@ -73,6 +90,65 @@ otopcua-abcip-cli write -g ab://10.0.0.5/1,0 -t StartCommand --type Bool -v true
otopcua-abcip-cli subscribe -g ab://10.0.0.5/1,0 -t Motor01_Speed --type Real -i 500
```
### `hsby-status` — which chassis is Active now?
PR abcip-5.1 — read the role tag (`WallClockTime.SyncStatus` by default,
`S:34` for legacy SLC500 / PLC-5 fronts) on a ControlLogix HSBY pair and
print which chassis is currently Active. Requires `--partner`.
```powershell
otopcua-abcip-cli hsby-status -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0
# Custom role tag (legacy fronts) and more samples
otopcua-abcip-cli hsby-status -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0 \
--role-tag S:34 --samples 5
```
| Flag | Default | Purpose |
|---|---|---|
| `--role-tag` | `WallClockTime.SyncStatus` | Address of the role tag. Use `S:34` for SLC500 / PLC-5. |
| `--samples` | `3` | Number of role-probe ticks to wait for before printing. |
The output prints the resolved roles + the address of whichever chassis the
driver currently considers Active. PR abcip-5.1 only **reports** the role —
PR abcip-5.2 will land the routing change so reads / writes flow to the
Active chassis automatically.
See [AbCip-HSBY.md](drivers/AbCip-HSBY.md) for the role-tag detection matrix
+ active-resolution rules + the feature-flag gate.
### `rebrowse` — force a controller-side `@tags` re-walk
PR abcip-2.5 (issue #233) added `RebrowseAsync` to drop the cached UDT
template shapes and re-run the symbol-table enumerator without restarting
the driver. The CLI variant builds a transient driver against the supplied
gateway, runs the rebrowse, and prints the freshly discovered tag names —
useful after a controller program-download to confirm the new tags are
visible on the wire before wiring them through the OtOpcUa server.
```powershell
otopcua-abcip-cli rebrowse -g ab://10.0.0.5/1,0
```
## Refreshing the tag DB
Two operator-facing surfaces drive the same `RebrowseAsync` plumbing — pick
the one that matches your context:
| Surface | When to use | Command |
|---|---|---|
| **CLI `rebrowse`** | Off-server validation. Spins up a transient driver against the gateway, prints the discovered tag list, no shared state with the live OtOpcUa server. | `otopcua-abcip-cli rebrowse -g ab://10.0.0.5/1,0` |
| **OPC UA write to `_RefreshTagDb`** | Production / Admin-UI button (PR abcip-4.4). Forces the **live** driver to re-walk + clear its template cache. The `AbCip.RefreshTriggers` driver-diagnostics counter increments per truthy write. | `otopcua-client-cli write -u opc.tcp://localhost:4840 -n "ns=2;s=AbCip/ab://10.0.0.5/1,0/_System/_RefreshTagDb" -v true --type Boolean` |
Read-back semantics: `_RefreshTagDb` always reads back as `false` (Kepware-
style "latches to idle the moment the dispatch returns") so a subscribed
client sees a stable shape regardless of how many refreshes have fired.
Falsy / unparseable writes are no-ops that still report `Good` so a UI
template that resets the trigger flag after firing it doesn't see a phantom
error. See
[AbCip-Operability §System tags](drivers/AbCip-Operability.md#refreshing-the-tag-db-via-opc-ua-write)
for the full semantics + the diagnostics counter wiring.
## Typical workflows
- **"Is the PLC reachable?"** → `probe`.
@@ -81,3 +157,36 @@ otopcua-abcip-cli subscribe -g ab://10.0.0.5/1,0 -t Motor01_Speed --type Real -i
- **"Is this GuardLogix safety tag writable from non-safety?"** → `write` and
read the status code — safety tags surface `BadNotWritable` / CIP errors,
non-safety tags surface `Good`.
- **"Did my program download show up in the address space?"** → `rebrowse`
(off-server) or write `true` to the live server's `_RefreshTagDb` system
tag (in-server, PR abcip-4.4) — both drop the template cache + force a
fresh `@tags` walk.
## Connection Size
PR abcip-3.1 introduced a per-device `ConnectionSize` override on the driver
side (`AbCipDeviceOptions.ConnectionSize`, range `500..4002`). The CLI does
not expose a flag for it — every CLI invocation uses the family-default
Connection Size (4002 / 504 / 488 depending on `--family`). When a Forward
Open is rejected with a CIP error like `0x01/0x113` ("connection request
size invalid"), the symptom is almost always a **mismatch between the chosen
family default and the controller firmware**:
- **v19-and-earlier ControlLogix** caps at 504 — pick `--family CompactLogix`
on the CLI to fall back to that narrower default.
- **5069-L1/L2/L3 CompactLogix** narrow-buffer parts also cap at 504, which
is the family default already.
- **FW20+ ControlLogix** accepts the full 4002.
For the warning *"AbCip device 'X' family 'Y' uses a narrow-buffer profile
(default ConnectionSize Z); the configured ConnectionSize N exceeds the
511-byte legacy-firmware cap..."* see
[`docs/drivers/AbCip-Performance.md`](drivers/AbCip-Performance.md) — that
warning is fired by the driver host, not the CLI.
## Related operability knobs
- [`docs/drivers/AbCip-Operability.md`](drivers/AbCip-Operability.md) — Phase 4
per-tag knobs (per-tag scan rate, deadband, etc). The CLI does not expose
these knobs directly; they're set in driver config JSON and consumed by the
driver at subscribe time.
+193 -1
View File
@@ -19,7 +19,11 @@ dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli -- --help
|---|---|---|
| `-g` / `--gateway` | **required** | Canonical `ab://host[:port]/cip-path` |
| `-P` / `--plc-type` | `Slc500` | Slc500 / MicroLogix / Plc5 / LogixPccc |
| `--timeout-ms` | `5000` | Per-operation timeout |
| `--timeout-ms` | `5000` | Per-operation timeout — see precedence note below |
| `--retries` | `0` | Retry count on transient `BadCommunicationError` (PR 9 / #252) |
| `--demote-failure-threshold` | `3` | **PR ablegacy-12 / #255** — consecutive comm failures before the device is auto-demoted |
| `--demote-for-ms` | `30000` | **PR ablegacy-12 / #255** — auto-demote cool-down window in ms |
| `--no-demote` | off | **PR ablegacy-12 / #255** — disable auto-demote entirely (counters still tick) |
| `--verbose` | off | Serilog debug output |
Family ↔ CIP-path cheat sheet:
@@ -28,6 +32,52 @@ Family ↔ CIP-path cheat sheet:
with no backplane
- **LogixPccc** — `1,0` (Logix controller accessed via the PCCC compatibility
layer; rare)
- **PLC-5 via 1756-DHRIO bridge** — `1,<slot>,2,<station-octal>` (PLC-5 only).
See [drivers/AbLegacy-DH-Bridging.md](drivers/AbLegacy-DH-Bridging.md) for
the full DH+ syntax, octal-station reference (00..77 = 0..63), and manual
hardware smoke procedure.
#### DHRIO worked example (PR ablegacy-13 / #256)
PLC-5 on DH+ node 7 (octal 07), DHRIO module in chassis slot 3,
EtherNet/IP gateway 192.168.1.10:
```powershell
otopcua-ablegacy-cli read `
-g ab://192.168.1.10/1,3,2,07 `
-P Plc5 -a N7:10 -t Int
```
The parser validates `1,<slot>,2,<station>`: port-1 must be the backplane,
slot must be 0..16, port-3 must be `2` (DH+), station must be octal 0..77 (so
`80`, `90`, etc. are rejected). Combining a DH+ bridge path with a non-PLC-5
family at startup throws `InvalidOperationException("DHRIO bridging is
PLC-5-only")`.
### Per-device timeout / retry tuning (#252, PR 9)
The CLI's `--timeout-ms` is the **driver-wide default** when launched as a
one-shot test client. In production (server-side, multi-device deployment)
each `AbLegacyDeviceOptions` row carries its own optional `Timeout` /
`Retries` that override the driver-wide value.
Precedence (highest → lowest): per-device override → driver-wide default →
hard-coded fallback (2000 ms / 0 retries).
Tuning cheat sheet — start here, measure, then trim:
| Family | Recommended `Timeout` | Notes |
|---|---|---|
| SLC 5/01 (RS-232 / DH+ bridge) | **5000 ms** | Slowest of the bunch; serial round-trip plus DH+ hop |
| SLC 5/02 / 5/03 (DH+) | 3000 ms | Bridged Ethernet → DH+ adds ~1 s |
| **SLC 5/04 / 5/05** (Ethernet) | **2000 ms** | Fastest of the SLC family — direct EIP/PCCC |
| MicroLogix 1100 / 1400 | **3000 ms** | Single-CPU, slow scan; no backplane |
| PLC-5 (Ethernet I/F) | 2500 ms | Comparable to SLC 5/05 over EIP |
| LogixPccc compat layer | 2000 ms | Logix CPU is fast; PCCC layer is the floor |
A small `--retries 1` (or `2` for slow chassis) is generally safe — the retry
loop only fires on transient `BadCommunicationError`; terminal errors
(`BadNodeIdUnknown`, `BadTypeMismatch`, …) surface on the first attempt.
## PCCC address primer
@@ -58,6 +108,37 @@ otopcua-ablegacy-cli probe -g ab://192.168.1.20/1,0
otopcua-ablegacy-cli probe -g ab://192.168.1.30/ -P MicroLogix -a S:0
```
`probe` output (PR ablegacy-12 / #255) reports both `Health` (driver health
state) and `Host state`. The latter is sourced from `IHostConnectivityProbe`
and surfaces `Demoted` when the auto-demote threshold has tripped — a fast
visual signal that the CLI is short-circuiting future reads against this
device until the cool-down expires:
```text
Gateway: ab://192.168.1.20/1,0
PLC type: Slc500
Health: Degraded
Host state: Demoted
Last error: libplctag status -33 reading N7:0
```
### Auto-demote knobs
```powershell
# Trip after just one comm failure, hold for 60s.
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a N7:0 -t Int `
--demote-failure-threshold 1 --demote-for-ms 60000
# Opt out of auto-demote — stresses the link without short-circuiting.
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a N7:0 -t Int --no-demote
```
The CLI is a one-shot test client — auto-demote primarily matters in the
server-side multi-device deployment, where a single demoted PLC can no
longer block reads against its healthy peers. Use the CLI flags to
reproduce a flapping-link scenario locally before tuning the server-side
`appsettings.json` `Demote` block.
### `read`
```powershell
@@ -75,8 +156,17 @@ otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a L19:0 -t Long
# Timer ACC
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a T4:0.ACC -t TimerElement
# Diagnostic counter (PR ablegacy-10 / #253). The seven _Diagnostics/<name>
# addresses live alongside user tags — short-circuit serves them straight from
# the in-process counter store, so no PCCC frame is sent to the PLC.
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 --address _Diagnostics/RequestCount
```
The diagnostic surface auto-emits per device — no config required. See
`docs/drivers/AbLegacy-Diagnostics.md` for the full counter table + reset
semantics + collision-rejection rules.
### `write`
```powershell
@@ -95,6 +185,108 @@ PLC-managed — use with caution.
otopcua-ablegacy-cli subscribe -g ab://192.168.1.20/1,0 -a N7:10 -t Int -i 500
```
#### Deadband
PR 8 — per-tag absolute / percent change filter on top of the polled subscription. The driver
caches the last *published* value per tag and suppresses `OnDataChange` notifications until the
new sample crosses the configured threshold.
| Flag | Effect |
|---|---|
| `--deadband-absolute <value>` | Suppress until `|new - prev| >= value`. |
| `--deadband-percent <value>` | Suppress until `|new - prev| >= |prev * value / 100|`. `prev == 0` always publishes (avoids div-by-zero). |
Booleans bypass the filter entirely (every transition publishes); strings + status changes
always publish; first-seen always publishes; both flags set → either passing triggers a
publish (Kepware-style logical OR).
```powershell
# Float — drop sub-0.5 jitter from the noisy load-cell address.
otopcua-ablegacy-cli subscribe -g ab://192.168.1.20/1,0 -a F8:0 -t Float -i 500 `
--deadband-absolute 0.5
# Integer — only fire on >= 5% deviation from the last reported value.
otopcua-ablegacy-cli subscribe -g ab://192.168.1.20/1,0 -a N7:10 -t Int -i 500 `
--deadband-percent 5
```
## Array reads
PR 7 — one PCCC frame can carry up to ~120 words. Address an array tag with either the
Rockwell-native `,N` suffix or the libplctag-native `[N]` suffix on the word number; both
forms canonicalise to `[N]` when the driver hands the tag to libplctag, and the parser
caps `N` at 120.
```powershell
# Rockwell `,N` form — "10 consecutive words starting at N7:0"
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a "N7:0,10" -t Int
# libplctag `[N]` form — same wire result
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a "N7:0[10]" -t Int
# Float / Long arrays — same suffix syntax, narrower frame ceiling on Float (~60 elements)
# and Long (~60 elements) because each element is 4 bytes vs Int's 2.
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a "F8:0,4" -t Float
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a "L19:0,4" -t Long
# --array-length override — pin the element count from config rather than the address
# suffix. Wins over the parsed `,N` / `[N]` value when both are set; useful for keeping the
# address string compact while bumping the element count from a tags config file.
otopcua-ablegacy-cli read -g ab://192.168.1.20/1,0 -a "N7:0" --array-length 10 -t Int
```
Array tags reject sub-element references (`T4:0,5.ACC`) and bit suffixes (`N7:0,10/3`) at
parse time — both combinations are semantically meaningless against a contiguous block.
For `B`-files the Rockwell convention is "one BOOL per word, not per bit": `B3:0,10`
returns `bool[10]` (one per word's non-zero state), not `bool[160]`.
### `import-rslogix`
ablegacy-11 / [#254](https://github.com/dohertj2/lmxopcua/issues/254) — bulk-import RSLogix
500 / 5 CSV symbol exports into an `appsettings.json` tag fragment. Avoids hand-typing every
`N7:0` / `F8:12` / `B3:0/5` row of a several-hundred-tag PLC. Binary `.RSS` / `.RSP` project
files are out of scope; export to CSV first.
```powershell
# Default: emit JSON fragment to stdout
otopcua-ablegacy-cli import-rslogix `
--file C:\plc\plc-export.csv `
--device ab://192.168.1.20/1,0
# Write the fragment to a file + print a summary line to stdout
otopcua-ablegacy-cli import-rslogix `
--file C:\plc\plc-export.csv `
--device ab://192.168.1.20/1,0 `
--output tags.json
# Filter by Scope column — only import Local:1 program-scoped tags
otopcua-ablegacy-cli import-rslogix `
--file C:\plc\plc-export.csv `
--device ab://192.168.1.20/1,0 `
--scope Local:1
# Summary mode — one-line counter for CI / health checks
otopcua-ablegacy-cli import-rslogix `
--file C:\plc\plc-export.csv `
--device ab://192.168.1.20/1,0 `
--emit summary
```
| Flag | Default | Purpose |
|---|---|---|
| `-f` / `--file` | **required** | RSLogix CSV path |
| `-d` / `--device` | **required** | `ab://host[:port]/cip-path` every imported tag binds to |
| `--emit` | `appsettings-fragment` | `appsettings-fragment` (JSON) or `summary` (one-line counter) |
| `-o` / `--output` | stdout | Optional output file path |
| `--scope` | none | Scope filter — `Global` / `Local:N` (case-insensitive); empty Scope counts as Global |
| `--max-rows` | unlimited | Defensive cap on rows imported |
| `--strict` | off | Fail-fast on first malformed row (default permissive: skip + log) |
See [drivers/AbLegacy-RSLogix-Import.md](drivers/AbLegacy-RSLogix-Import.md) for the full
column reference, file-letter → `AbLegacyDataType` mapping, and the API surface
(`IRsLogixImporter`, `AbLegacyDriverOptions.AddRsLogixImport`).
## Known caveat — ab_server upstream gap
The integration-fixture `ab_server` Docker container accepts TCP but its PCCC
+81 -18
View File
@@ -5,22 +5,27 @@ protocol. Uses the **same** `FocasDriver` the OtOpcUa server does — PMC R/G/F
file registers, axis bits, parameters, and macro variables — all through
`FocasAddressParser` syntax.
Sixth of the driver test-client CLIs.
Sixth of the driver test-client CLIs, added alongside the Tier-C isolation
work tracked in task #220.
## Architecture note
FOCAS is an in-process driver. The pure-managed `WireFocasClient`
speaks the FOCAS2 binary protocol directly over TCP:8193, removing the
Tier-C process-isolation split that the historical P/Invoke + out-of-
process Host arrangement required. The CLI loads `FocasDriver` with
`WireFocasClientFactory` and talks to the CNC without any native
components.
FOCAS is a Tier-C driver: `Fwlib32.dll` is a proprietary 32-bit Fanuc library
with a documented habit of crashing its hosting process on network errors.
The target runtime deployment splits the driver into an in-process
`FocasProxyDriver` (.NET 10 x64) and an out-of-process `Driver.FOCAS.Host`
(.NET 4.8 x86 Windows service) that owns the DLL — see
[v2/implementation/focas-isolation-plan.md](v2/implementation/focas-isolation-plan.md)
and
[v2/implementation/phase-6-1-resilience-and-observability.md](v2/implementation/phase-6-1-resilience-and-observability.md)
for topology + supervisor / respawn / back-pressure design.
A dev-friendly mock is available — start
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/docker-compose.yml`
and point `--cnc-host` at `localhost` for end-to-end CLI exercises
without a real CNC. See
[drivers/FOCAS-Test-Fixture.md](drivers/FOCAS-Test-Fixture.md).
The CLI skips the proxy and loads `FocasDriver` directly (via
`FwlibFocasClientFactory`, which P/Invokes `Fwlib32.dll` in the CLI's own
process). There is **no public simulator** for FOCAS; a meaningful probe
requires a real CNC + a licensed `Fwlib32.dll` on `PATH` (or next to the
executable). On a dev box without the DLL, every wire call surfaces as
`BadCommunicationError` — still useful as a "CLI wire-up is correct" signal.
## Build + run
@@ -46,6 +51,7 @@ Every command accepts:
| `-p` / `--cnc-port` | `8193` | FOCAS TCP port (FOCAS-over-EIP default) |
| `-s` / `--series` | `Unknown` | CNC series — `Unknown` / `Zero_i_D` / `Zero_i_F` / `Zero_i_MF` / `Zero_i_TF` / `Sixteen_i` / `Thirty_i` / `ThirtyOne_i` / `ThirtyTwo_i` / `PowerMotion_i` |
| `--timeout-ms` | `2000` | Per-operation timeout |
| `--cnc-password` | (none) | **F4-d (issue #271)** — optional CNC connection-level password emitted via `cnc_wrunlockparam` on connect. Required only by controllers that gate parameter writes / selected reads behind a password switch (16i + some 30i firmwares with parameter-protect on). **PASSWORD INVARIANT: never logged.** The CLI's Serilog config does not destructure this flag and `FocasDeviceOptions.ToString` redacts the value. See [`v2/focas-deployment.md`](v2/focas-deployment.md) § "FOCAS password handling". |
| `--verbose` | off | Serilog debug output |
## Addressing
@@ -105,16 +111,74 @@ Values parse per `--type` with invariant culture. Booleans accept
```powershell
otopcua-focas-cli write -h 192.168.1.50 -a R100 -t Int16 -v 42
otopcua-focas-cli write -h 192.168.1.50 -a G50.3 -t Bit -v on
otopcua-focas-cli write -h 192.168.1.50 -a MACRO:500 -t Float64 -v 3.14
# MACRO: write — recipe / setpoint surface (server-side WriteOperate ACL)
otopcua-focas-cli write -h 192.168.1.50 -a MACRO:500 -t Int32 -v 42
# PARAM: write — commissioning surface (server-side WriteConfigure ACL,
# CNC must be in MDI mode + parameter-write switch enabled, else EW_PASSWD
# surfaces as BadUserAccessDenied)
otopcua-focas-cli write -h 192.168.1.50 -a PARAM:1815 -t Int32 -v 100
```
> **WARNING — `write -a G50.3 -t Bit -v on` is a read-modify-write.**
> The wire call `pmc_wrpmcrng` is byte-addressed; the driver reads the
> parent byte at `G50` first, sets bit 3, and writes the byte back. Other
> bits in `G50` that the ladder is concurrently updating may be clobbered
> by the byte we read a millisecond ago. Coordinate via a ladder-side
> handshake when this matters. **PMC writes also bypass the ladder's
> normal MDI-mode protection** — a misdirected bit can move motion or
> latch a feedhold the moment it lands. Verify e-stop is live and the
> machine is in JOG mode before issuing the first PMC write of a
> session. See [`docs/drivers/FOCAS.md`](drivers/FOCAS.md) "PMC bit-write
> read-modify-write semantics" for the full RMW flow.
PMC G/R writes land on a running machine — be careful which file you hit.
Parameter writes may require the CNC to be in MDI mode with the
parameter-write switch enabled.
#### Server-enforced ACL — issue #269, plan PR F4-b
When the same write flows through the OtOpcUa server (rather than the CLI's
direct-to-CNC path), the server-layer ACL gates by tag kind:
- `PARAM:` writes require **`WriteConfigure`** group membership — heavier
ACL because a misdirected parameter write can put the CNC in a bad
state.
- `MACRO:` writes require **`WriteOperate`** — matches the standard HMI
recipe / setpoint surface.
- PMC R/G/F writes require **`WriteOperate`**.
The classification is declared by the FOCAS driver per tag and enforced by
`DriverNodeManager`; the driver itself never inspects user identity. See
[`docs/security.md`](security.md) for the full LDAP-group → permission
mapping, [`docs/v2/acl-design.md`](v2/acl-design.md) for the design, and
[`docs/v2/focas-deployment.md`](v2/focas-deployment.md) "Write safety" for
the operator pre-check runbook (MDI mode, parameter-write switch).
**Writes are non-idempotent by default** — a timeout after the CNC already
applied the write will NOT auto-retry (plan decisions #44 + #45).
#### Server-side `Writes` enforcement (issue #268 F4-a + #269 F4-b + #270 F4-c)
The OtOpcUa server gates every FOCAS write behind multiple independent
opt-ins: `FocasDriverOptions.Writes.Enabled` (driver-level master switch),
`Writes.AllowParameter` (PARAM kill switch — F4-b), `Writes.AllowMacro`
(MACRO kill switch — F4-b), `Writes.AllowPmc` (PMC kill switch — F4-c),
and `FocasTagDefinition.Writable` (per-tag). All default `false`; any one
off short-circuits the server-side `WriteAsync` to `BadNotWritable` before
the wire client is touched. See [`docs/drivers/FOCAS.md`](drivers/FOCAS.md)
"Writes (opt-in, off by default)" subsection +
[`docs/v2/decisions.md`](v2/decisions.md) for the decision record.
**The CLI bypasses the server-side flag.** `otopcua-focas-cli write` is a
per-invocation operator tool — it sets `Writes.Enabled = true` locally for
the lifetime of one process and creates the synthesised tag with
`Writable = true`. This is intentional: the CLI is the operator's
direct-to-CNC fallback, not a long-lived process bound to the central
config DB. Configuring the server still requires both opt-ins to be set
explicitly in the DriverInstance JSON.
### `subscribe` — watch an address until Ctrl+C
FOCAS has no push model; the shared `PollGroupEngine` handles the tick
@@ -147,8 +211,7 @@ fails.
**"Why did this macro flip?"** → `subscribe` to the macro, let the
operator reproduce the cycle, watch the HH:mm:ss.fff timeline.
**"Can I reach the CNC on TCP:8193?"** → `probe` against any host. A
`BadCommunicationError` means the wire client couldn't open a socket
(firewall / wrong host / FOCAS Ethernet option unlicensed on the CNC).
`BadDeviceFailure` after a successful connect means the CNC is rejecting
the session setup — check the CNC's FOCAS option and password settings.
**"Is the Fwlib32 DLL wired up?"** → `probe` against any host. A
`DllNotFoundException` surfacing as `BadCommunicationError` with a
matching `Last error` line means the driver is loading but the DLL is
missing; anything else means a transport-layer problem.
-34
View File
@@ -119,37 +119,3 @@ address.
**"What's the right byte order for this family?"** → `read` with
`--byte-order BigEndian`, then with `--byte-order WordSwap`. The one that
gives plausible values is the correct one for that device.
## v2 addressing grammar
The driver accepts the industry-standard tag-address grammar so you can
paste tag spreadsheets from Wonderware / Kepware / Ignition without
per-row manual translation. Full reference + grammar rules:
[`docs/v2/modbus-addressing.md`](v2/modbus-addressing.md).
Quick examples:
```
40001 HoldingRegisters[0], Int16
400001 same, 6-digit form
40001:F Float32
40001:F:CDAB Float32 word-swapped
40001:STR20 20-char ASCII string
40001:S:5 Int16[5] array (3-field shorthand)
40001:F:CDAB:10 Float32[10] with explicit word-swap (4-field strict)
40001.5 bit 5 of HR[0]
HR1:I Int32 via mnemonic region prefix (matches Wonderware)
C100 Coil 100 (mnemonic, 1-based)
V2000:F:CDAB DL205 V-memory at PDU 1024 + Float32 + word-swap (Family=DL205)
D100:I MELSEC D-register 100, Int32 (Family=MELSEC)
```
**Type-code reminder** (post-#146): `:I` is **Int32** (matches Wonderware
DASMBTCP + Ignition `HRI`). The explicit Int16 code is `:S`. Bare HR/IR
with no type still defaults to Int16. Pre-#146 codes `:DI` / `:L` /
`:UDI` / `:UL` / `:LI` / `:ULI` / `:LBCD` are removed; configs that use
them get a clear "Unknown type code" diagnostic at parse time.
In `DriverConfig` JSON, set the per-tag `addressString` field instead of
the structured `region` + `address` + `dataType` fields. Both styles can
coexist within one driver instance.
+134
View File
@@ -22,6 +22,11 @@ dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.S7.Cli -- --help
| `--rack` | `0` | Hardware rack (S7-400 distributed setups only) |
| `--slot` | `0` | CPU slot (S7-300 = 2, S7-400 = 2 or 3, S7-1200/1500 = 0) |
| `--timeout-ms` | `5000` | Per-operation timeout |
| `--tsap-mode` | `Auto` | ISO-on-TCP connection class: `Auto` / `Pg` / `Op` / `S7Basic` / `Other`. Hardened S7-1500 / ET 200SP CPUs may require `Op` or `S7Basic`. See [s7.md TSAP / Connection Type](v2/s7.md#tsap--connection-type). |
| `--local-tsap` | (unset) | Optional 16-bit local TSAP override (e.g. `0x0200`). Required when `--tsap-mode Other`; wins over class default under Pg/Op/S7Basic. |
| `--remote-tsap` | (unset) | Optional 16-bit remote TSAP override. Required when `--tsap-mode Other`; wins over class default under Pg/Op/S7Basic. |
| `--password` | (unset) | Connection-level password sent right after `OpenAsync`. Used by hardened S7-300/400 (protection levels 1-3) and S7-1200/1500 (TIA Portal *Connection Mechanism* gate). Never logged. NB: S7netplus 0.20 doesn't expose `SendPassword`; the CLI prints a one-line warning and continues. See [s7.md "PLC password / protection levels"](v2/s7.md#plc-password--protection-levels). |
| `--protection-level` | `Auto` | Declarative hint: `Auto` / `None` / `Level1` / `Level2` / `Level3` (S7-300/400) / `ConnectionMechanism` (S7-1200/1500). Diagnostic only — the wire-side unlock is driven by `--password`. |
| `--verbose` | off | Serilog debug output |
## PUT/GET must be enabled
@@ -31,6 +36,28 @@ Enable it in TIA Portal: *Device config → Protection & Security → Connection
mechanisms → "Permit access with PUT/GET communication from remote partner"*.
Without it the CLI's first read will surface `BadNotSupported`.
### Pre-flight PUT/GET enablement (PR-S7-C5)
The driver issues a tiny 2-byte read against `Probe.ProbeAddress` (default
`MW0`) immediately after `OpenAsync` and **fails `InitializeAsync` with a
typed `S7PutGetDisabledException`** when the PLC rejects the read with the
wire-level "function not allowed" response. The exception message names the
exact TIA Portal toggle to flip — operators see the configuration fix at
init time, not after the first per-tag read produces `BadDeviceFailure`.
Two opt-out knobs on the JSON `Probe` block:
- `ProbeAddress` — set to `""` (empty string) to skip the pre-flight read
entirely. Useful when no fingerprint address has been wired.
- `SkipPreflight` — set to `true` to defer the check to runtime while
keeping the background liveness loop. Per-tag reads still surface
`BadDeviceFailure` until PUT/GET is enabled, but Init succeeds and the
driver becomes visible in the Admin UI.
See [s7.md "Pre-flight PUT/GET enablement"](v2/s7.md#pre-flight-putget-enablement)
for the full rationale, classifier behaviour, and the wire-level
`ErrorCode` matching.
## S7 address grammar cheat sheet
| Form | Meaning |
@@ -70,6 +97,17 @@ otopcua-s7-cli read -h 192.168.1.30 -a M0.0 -t Bool
# 80-char S7 string
otopcua-s7-cli read -h 192.168.1.30 -a DB10.STRING[0] -t String --string-length 80
# CPU diagnostics (SZL) — virtual @System.* addresses (PR-S7-E1).
# Requires ExposeSystemTags = true on the driver instance; surfaces as
# BadNotSupported until S7netplus exposes a public ReadSzlAsync (or we ship
# a raw-PDU helper). See docs/v2/s7.md "CPU diagnostics (SZL)" for the full
# table and the snap7 / S7netplus 0.20 caveat.
otopcua-s7-cli read -h 192.168.1.30 -a @System.CpuType -t String
otopcua-s7-cli read -h 192.168.1.30 -a @System.Firmware -t String
otopcua-s7-cli read -h 192.168.1.30 -a @System.OrderNo -t String
otopcua-s7-cli read -h 192.168.1.30 -a @System.CycleMs.Min -t Float64
otopcua-s7-cli read -h 192.168.1.30 -a "@System.DiagBuffer.Entry[0]" -t String
```
### `write`
@@ -83,6 +121,63 @@ otopcua-s7-cli write -h 192.168.1.30 -a M0.0 -t Bool -v true
**Writes to M / Q are real** — they drive the PLC program. Be careful what you
flip on a running machine.
### Hardened CPU — forcing OP-class TSAP
```powershell
# Probe a hardened S7-1500 that rejects PG class but accepts OP.
otopcua-s7-cli probe -h 10.50.12.30 --tsap-mode Op
# Read against the same CPU.
otopcua-s7-cli read -h 10.50.12.30 --tsap-mode Op -a DB1.DBW0 -t Int16
# Manual TSAP override (e.g. site with a fixed proprietary TSAP gateway).
otopcua-s7-cli probe -h 10.50.12.30 --tsap-mode Other --local-tsap 0x4D57 --remote-tsap 0x4D58
```
Without `--tsap-mode`, the CLI uses S7netplus's CpuType-derived default (PG
class for almost everything). The same connection-refused failure shape that a
wrong `--slot` produces also shows up when the CPU rejects PG class — try
`--tsap-mode Op` first when the handshake is failing on otherwise-correct
endpoint config. See [s7.md TSAP / Connection Type](v2/s7.md#tsap--connection-type)
for the byte table and motivation.
### Hardened CPU — supplying a connection-level password
```powershell
# S7-300 protection-level 2 — read+write protected without unlock.
otopcua-s7-cli read -h 192.168.1.31 -c S7300 --slot 2 `
--password "tia-portal-set-password" `
--protection-level Level2 `
-a DB1.DBW0 -t Int16
# S7-1500 ConnectionMechanism — TIA Portal Protection & Security pane gate.
otopcua-s7-cli probe -h 10.50.12.30 `
--tsap-mode Op `
--password "tia-portal-set-password" `
--protection-level ConnectionMechanism
```
The password is emitted to the PLC immediately after `OpenAsync` succeeds and
before the pre-flight PUT/GET probe runs (the same probe that would otherwise
be the first operation a hardened CPU refuses). Never logged in any form;
identifier-only success line is `S7 password sent for {Host}`.
**S7netplus 0.20 does not yet expose a public `SendPassword`** — the driver
discovers the method reflectively, so a future minor release will be picked
up automatically. Until then, configuring `--password` on a hardened CPU
emits this warning at Init:
```
[Warning] S7 password is set on driver '<id>' against host '<host>', but
the linked S7netplus library does not expose SendPassword; password is
being ignored at the wire.
```
Init still completes (the COTP handshake itself doesn't require the
password) but the first read against a hardened CPU will surface
`BadDeviceFailure`. See [s7.md "PLC password / protection levels"](v2/s7.md#plc-password--protection-levels)
for the full motivation, the no-log invariant, and the workaround matrix.
### `subscribe`
```powershell
@@ -91,3 +186,42 @@ otopcua-s7-cli subscribe -h 192.168.1.30 -a DB1.DBW0 -t Int16 -i 500
S7comm has no native push — the CLI polls through `PollGroupEngine` just like
Modbus / AB.
### `import-symbols`
PR-S7-D1 / [#299](https://github.com/dohertj2/lmxopcua/issues/299) — read a TIA
Portal CSV ("Show all tags" export) or STEP 7 Classic `.AWL` file and emit a
JSON tag fragment for `appsettings.json`, or a one-line summary. Mirrors the
AB Legacy `import-rslogix` CLI in shape.
```powershell
# TIA Portal CSV — emit JSON fragment to stdout
otopcua-s7-cli import-symbols --file plc-export.csv --format tia
# STEP 7 Classic AWL — emit summary line
otopcua-s7-cli import-symbols --file classic.awl --format awl --emit summary
# DE-locale CSV — auto-detected; output to file
otopcua-s7-cli import-symbols `
--file plc-de.csv `
--format tia `
--emit appsettings-fragment `
--output tags.json
# Strict mode — fail-fast on the first malformed row (CI lint)
otopcua-s7-cli import-symbols --file plc.csv --format tia --strict
```
| Flag | Default | Purpose |
|---|---|---|
| `-f` / `--file` | **required** | Path to the TIA CSV or `.AWL` file |
| `--format` | `tia` | `tia` (CSV) or `awl` (STEP 7 Classic) |
| `-d` / `--device` | none | Optional documentation tag (held for symmetry with `import-rslogix`) |
| `--emit` | `appsettings-fragment` | `appsettings-fragment` (JSON) or `summary` (one-line counter) |
| `-o` / `--output` | stdout | Optional path; when set the JSON fragment is written there + summary line goes to stdout |
| `--max-rows` | unlimited | Defensive cap on rows imported |
| `--strict` | off | Fail-fast on the first malformed row (default permissive: skip + log) |
UDT-typed rows import as placeholder tags (data type forced to `Byte`); see
[S7-TIA-Import.md](drivers/S7-TIA-Import.md) for the full format reference,
locale auto-detection, and AWL position-based addressing rules.
+152 -35
View File
@@ -1,9 +1,9 @@
# `otopcua-twincat-cli` — Beckhoff TwinCAT test client
Ad-hoc probe / read / write / subscribe / browse tool for Beckhoff TwinCAT 2 /
TwinCAT 3 runtimes via ADS. Uses the **same** `TwinCATDriver` the OtOpcUa
server does (`Beckhoff.TwinCAT.Ads` package). Native ADS notifications by
default; `--poll-only` falls back to the shared `PollGroupEngine`.
Ad-hoc probe / read / write / subscribe tool for Beckhoff TwinCAT 2 / TwinCAT 3
runtimes via ADS. Uses the **same** `TwinCATDriver` the OtOpcUa server does
(`Beckhoff.TwinCAT.Ads` package). Native ADS notifications by default;
`--poll-only` falls back to the shared `PollGroupEngine`.
Fifth (final) of the driver test-client CLIs.
@@ -28,6 +28,56 @@ sessions. Pick one:
The CLI compiles + runs without a router, but every wire call fails with a
transport error until one is reachable.
## UDT decomposition
PR 4.1 (issue #315) replaces the old "skip non-atomic symbols" behaviour
of `BrowseSymbolsAsync` with a recursive type walker
(`TwinCATTypeWalker`). When the OtOpcUa server's TwinCAT driver runs
discovery with `EnableControllerBrowse=true`, struct / UDT / function-block
typed symbols flatten into one OPC UA variable per atomic leaf. Browse
addresses use the same dotted-instance form the PLC exposes:
| PLC declaration | OPC UA browse paths surfaced |
|---|---|
| `MAIN.bStart : BOOL` | `MAIN.bStart` |
| `GVL.stMotor : ST_Motor` | `GVL.stMotor.bRunning`, `GVL.stMotor.nState`, `GVL.stMotor.rTemperature`, … |
| `GVL.aRecipe : ARRAY[1..10] OF DINT` | `GVL.aRecipe[1]``GVL.aRecipe[10]` |
| `GVL.aPairs : ARRAY[0..2] OF ST_Pair` | `GVL.aPairs[0].nCount`, `GVL.aPairs[0].rValue`, `GVL.aPairs[1].…` |
| `GVL.aBig : ARRAY[1..5000] OF DINT` | `GVL.aBig` (single whole-array root — over the cap) |
The CLI's `read` / `write` / `subscribe` commands take dotted paths
directly:
```powershell
# Read a struct member
otopcua-twincat-cli read -n 192.168.1.40.1.1 -s GVL.stMotor.rTemperature -t Real
# Read an array element
otopcua-twincat-cli read -n 192.168.1.40.1.1 -s "GVL.aRecipe[3]" -t DInt
```
### Array expansion bound
`TwinCATDriverOptions.MaxArrayExpansion` (default `1024`) caps how many
elements an array contributes to the discovered address space. Arrays
whose total element count exceeds the cap surface as a single
whole-array root with `IsArrayRoot=true` instead of one variable per
element. Raise the bound when operators routinely care about individual
elements of large recipe / lookup tables; lower it to keep discovery
cheap for symbol tables that ship multi-thousand-element scratch
arrays. Pre-declared whole-array tags from the `Tags` config bypass the
walker entirely — set `ArrayDimensions` on a `TwinCATTagDefinition` to
keep array reads on the existing PR 1.4 read-array path.
### Cycle / depth guard
The walker tracks the visited-type set + a hard depth cap of 8 levels
so a self-pointer (`POINTER TO ST_Self`) or pathological alias chain
terminates rather than spinning. POINTER / REFERENCE members are
skipped at the type-graph level — surfacing them would require
dereferencing through the AMS routing layer which has its own access
patterns.
## Common flags
| Flag | Default | Purpose |
@@ -50,13 +100,6 @@ caller interpret semantics.
### `probe`
Per-command flags:
| Flag | Default | Purpose |
|---|---|---|
| `-s` / `--symbol` | **required** | Symbol path to probe (e.g. `MAIN.bRunning`) |
| `--type` | `DInt` | Declared data type — see the [Data types](#data-types) list |
```powershell
# Local TwinCAT 3, probe a canonical global
otopcua-twincat-cli probe -n 127.0.0.1.1.1 -s "TwinCAT_SystemInfoVarList._AppInfo.OnlineChangeCnt"
@@ -65,6 +108,35 @@ otopcua-twincat-cli probe -n 127.0.0.1.1.1 -s "TwinCAT_SystemInfoVarList._AppInf
otopcua-twincat-cli probe -n 192.168.1.40.1.1 -s MAIN.bRunning --type Bool
```
#### Health probe
The OtOpcUa server's TwinCAT driver runs an internal probe loop (PR 3.2, issue #314)
that — alongside the cheap `ReadStateAsync` reachability check — samples four
well-known system symbols once per probe interval and surfaces the result through
the cross-driver `driver-diagnostics` RPC (added for Modbus, task #154). The same
symbols can be probed directly via the CLI for ad-hoc troubleshooting:
```powershell
# Cycle time (UDINT, 100 ns ticks → ÷10000 for ms)
otopcua-twincat-cli probe -n 192.168.1.40.1.1 -s "TwinCAT_SystemInfoVarList._TaskInfo[1].CycleTime" --type UDInt
# Last task execution wall-clock (UDINT, 100 ns ticks → ÷10000 for ms)
otopcua-twincat-cli probe -n 192.168.1.40.1.1 -s "TwinCAT_SystemInfoVarList._TaskInfo[1].LastExecTime" --type UDInt
# Online-change count — increments on every accepted online change
otopcua-twincat-cli probe -n 192.168.1.40.1.1 -s "TwinCAT_SystemInfoVarList._AppInfo.OnlineChangeCnt" --type UDInt
# Loaded PLC project name (STRING(80))
otopcua-twincat-cli probe -n 192.168.1.40.1.1 -s "TwinCAT_SystemInfoVarList._AppInfo.AppName" --type String
```
Within the running OtOpcUa server these four signals land on
`DeviceState.LastDiagnostics` as a `TwinCATDeviceDiagnostics` record + are folded
into `DriverHealth.Diagnostics` keyed `TwinCAT.CycleTimeMs`, `TwinCAT.LastExecTimeMs`,
`TwinCAT.JitterMs` (computed `LastExecTimeMs - CycleTimeMs`),
`TwinCAT.OnlineChangeCnt`, and `TwinCAT.OnlineChangeIncrements`. See
`docs/drivers/TwinCAT-Test-Fixture.md §Diagnostics` for the full mapping.
### `read`
```powershell
@@ -84,6 +156,14 @@ otopcua-twincat-cli read -n 192.168.1.40.1.1 -s "Recipe[3]" -t Real
otopcua-twincat-cli read -n 192.168.1.40.1.1 -s GVL.sMessage -t WString
```
ADS variable handles for `read` / `write` symbols are cached transparently
inside the CLI's underlying `AdsTwinCATClient`. The first read of a symbol
resolves a handle; repeats reuse the cached handle for smaller AMS payloads
and skipped name resolution. The cache wipes on reconnect, on
`DeviceSymbolVersionInvalid` (with a one-shot retry), and on CLI exit. See
`docs/drivers/TwinCAT-Test-Fixture.md §Handle caching` for the full story
including the staleness caveat after an online change.
### `write`
```powershell
@@ -96,41 +176,78 @@ Structure writes refused — drop to driver config JSON for those.
### `subscribe`
Per-command flags:
| Flag | Default | Purpose |
|---|---|---|
| `-s` / `--symbol` | **required** | Symbol path — same format as `read` |
| `-t` / `--type` | `DInt` | Declared data type |
| `-i` / `--interval-ms` | `1000` | Publishing interval in **milliseconds** — native mode passes this as the ADS `NotificationSettings.CycleTime` |
```powershell
# Native ADS notifications (default) — PLC pushes on its own cycle
otopcua-twincat-cli subscribe -n 192.168.1.40.1.1 -s GVL.Counter -t DInt -i 500
# Fall back to polling for runtimes where native notifications are constrained
otopcua-twincat-cli subscribe -n 192.168.1.40.1.1 -s GVL.Counter -t DInt -i 500 --poll-only
# Coalesce bursty changes — runtime buffers up to 500 ms before dispatch
otopcua-twincat-cli subscribe -n 192.168.1.40.1.1 -s GVL.Counter -t DInt -i 50 --max-delay-ms 500
```
The subscribe banner announces which mechanism is in play — "ADS notification"
or "polling" — so it's obvious in screen-recorded bug reports.
### `browse`
Walks the controller's symbol table via ADS `SymbolLoaderFactory` (same path
`TwinCATDriver.DiscoverAsync` takes when `EnableControllerBrowse = true`).
Output filters to symbols whose type maps onto the driver's atomic surface —
UDTs / function-block instances don't appear.
| Flag | Default | Purpose |
|---|---|---|
| `--prefix` | _(none)_ | Case-sensitive instance-path prefix filter (e.g. `GVL_Fixture`) |
| `--max` | `500` | Max symbols to print. `0` = unbounded |
| `-s` / `--symbol` | **required** | Symbol path — same format as `read` |
| `-t` / `--type` | `DInt` | IEC type (see Data types section) |
| `-i` / `--interval-ms` | `1000` | **Cycle time** — minimum interval between change checks the PLC runtime applies |
| `--max-delay-ms` | `0` | **Max coalescing window** — upper bound on how long the runtime buffers change events before dispatch. `0` = fire ASAP, no coalescing |
| `--poll-only` | off | Disable native notifications, use `PollGroupEngine` instead |
`-i` / `--interval-ms` and `--max-delay-ms` are different things and both flow
into the Beckhoff `NotificationSettings` ctor:
- **`--interval-ms`** is the *cycle*: the runtime checks for value changes at
most this often. Smaller = lower latency, higher CPU.
- **`--max-delay-ms`** is the *coalescing ceiling*: once a change is detected,
the runtime can hold it for up to this long before dispatching, which lets
it batch a burst of changes into a single callback. Default `0` means
every detected change fires immediately — same as the pre-PR-3.1 behaviour.
For high-frequency signals (a counter incrementing every 10 ms PLC cycle),
pair a small `-i` (so latency stays bounded) with a non-zero `--max-delay-ms`
(so the OPC UA queue downstream doesn't flood). For slow signals just leave
`--max-delay-ms` at `0`.
The subscribe banner announces which mechanism is in play — "ADS notification"
or "polling" — and includes the `max-delay` value when set, so it's obvious
in screen-recorded bug reports.
`--poll-only` polls go through the same cached-handle path as `read`, so
repeated polls of the same symbol carry only a 4-byte handle on the wire
rather than the full symbolic path.
### `alarms` (PR 5.1 / #316)
Stream TC3 EventLogger alarms via the driver's `IAlarmSource` bridge.
Subscribes against AMS port 110 (`AMSPORT_EVENTLOG`) on the same target,
prints each event with timestamp / source / severity / message until
Ctrl+C.
```powershell
# Everything under a single GVL
otopcua-twincat-cli browse -n 192.168.1.40.1.1 --prefix GVL_Fixture
# All alarms — every event the EventLogger surfaces
otopcua-twincat-cli alarms -n 192.168.1.40.1.1
# Full dump (beware: flat-mode walks on a real controller can top 10k symbols)
otopcua-twincat-cli browse -n 192.168.1.40.1.1 --max 0
# Filter by source — only events whose source name matches (case-insensitive)
otopcua-twincat-cli alarms -n 192.168.1.40.1.1 --source Conveyor1.MotorOverload
# Multiple sources — repeat the flag
otopcua-twincat-cli alarms -n 192.168.1.40.1.1 --source Conveyor1 --source Pump3
```
| Flag | Default | Purpose |
|---|---|---|
| `--source` | (none) | Optional source filter; repeat for multiple |
Output format (one line per event):
```
[HH:mm:ss.fff] <source> sev=<Low|Medium|High|Critical> type=<event-class> cond=<condition-id> "<message>"
```
The verb forces `EnableAlarms=true` on the underlying driver; the
default driver config keeps it off so deployments without an
EventLogger configured pay no cost. See
[`docs/drivers/TwinCAT.md` §Alarms](drivers/TwinCAT.md) for the
full bridge architecture and decode caveats.
+13
View File
@@ -67,6 +67,19 @@ their flag values to the already-shipped driver.
then the other. The plausible result identifies the correct setting
for that device family. (Modbus, S7.)
## Family-specific commands
Most drivers ship the four shared verbs and nothing else. AB Legacy adds a
fifth family-specific verb for bulk symbol-table import:
| Driver | Extra verb | Doc |
|---|---|---|
| AB Legacy | `import-rslogix` — read RSLogix 500/5 CSV symbol exports + emit a JSON tag fragment | [drivers/AbLegacy-RSLogix-Import.md](drivers/AbLegacy-RSLogix-Import.md) |
Binary RSLogix project files (`.RSS` / `.RSP`) are out of scope for v1 — the
format is proprietary and undocumented; no parser ships in libplctag or any
community library. Export to CSV first.
## Known gaps
- **AB Legacy cip-path quirk** — libplctag's ab_server requires a
+14 -20
View File
@@ -11,8 +11,9 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
- **Core** owns the OPC UA stack, address space, session/security/subscription machinery.
- **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports.
- **Server** is the OPC UA endpoint process (net10, AnyCPU). Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo.
- **Server** is the OPC UA endpoint process (net10, x64). Hosts every driver except Galaxy in-process; talks to Galaxy via a named pipe because MXAccess COM is 32-bit-only.
- **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint.
- **Galaxy.Host** is a .NET Framework 4.8 x86 Windows service that wraps MXAccess COM on an STA thread for the Galaxy driver.
## Where to find what
@@ -23,11 +24,11 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess
| [OpcUaServer.md](OpcUaServer.md) | Top-level server architecture — Core, driver dispatch, Config DB, generations |
| [AddressSpace.md](AddressSpace.md) | `GenericDriverNodeManager` + `ITagDiscovery` + `IAddressSpaceBuilder` |
| [ReadWriteOperations.md](ReadWriteOperations.md) | OPC UA Read/Write → `CapabilityInvoker``IReadable`/`IWritable` |
| [Subscriptions.md](v1/Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount (v1 archive) |
| [AlarmTracking.md](v1/AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions (v1 archive) |
| [DataTypeMapping.md](v1/DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types (v1 archive — live mapping is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
| [Subscriptions.md](Subscriptions.md) | Monitored items → `ISubscribable` + per-driver subscription refcount |
| [AlarmTracking.md](AlarmTracking.md) | `IAlarmSource` + `AlarmSurfaceInvoker` + OPC UA alarm conditions |
| [DataTypeMapping.md](DataTypeMapping.md) | Per-driver `DriverAttributeInfo` → OPC UA variable types |
| [IncrementalSync.md](IncrementalSync.md) | Address-space rebuild on redeploy + `sp_ComputeGenerationDiff` |
| [HistoricalDataAccess.md](v1/HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability (v1 archive) |
| [HistoricalDataAccess.md](HistoricalDataAccess.md) | `IHistoryProvider` as a per-driver optional capability |
| [VirtualTags.md](VirtualTags.md) | `Core.Scripting` + `Core.VirtualTags` — Roslyn script sandbox, engine, dispatch alongside driver tags |
| [ScriptedAlarms.md](ScriptedAlarms.md) | `Core.ScriptedAlarms` — script-predicate `IAlarmSource` + Part 9 state machine |
@@ -35,7 +36,7 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti
| Project | See |
|---------|-----|
| `Core.AlarmHistorian` | [AlarmTracking.md](v1/AlarmTracking.md) § Alarm historian sink (v1 archive) |
| `Core.AlarmHistorian` | [AlarmTracking.md](AlarmTracking.md) § Alarm historian sink |
| `Analyzers` (Roslyn OTOPCUA0001) | [security.md](security.md) § OTOPCUA0001 Analyzer |
### Drivers
@@ -43,8 +44,8 @@ Two Core subsystems are shipped without a dedicated top-level doc; see the secti
| Doc | Covers |
|-----|--------|
| [drivers/README.md](drivers/README.md) | Index of the eight shipped drivers + capability matrix |
| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — in-process gRPC client to the mxaccessgw sidecar |
| [v1/drivers/Galaxy-Repository.md](v1/drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database (v1 archive — the gateway owns this path now) |
| [drivers/Galaxy.md](drivers/Galaxy.md) | Galaxy driver — MXAccess bridge, Host/Proxy split, named-pipe IPC |
| [drivers/Galaxy-Repository.md](drivers/Galaxy-Repository.md) | Galaxy-specific discovery via the ZB SQL database |
For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics, see [v2/driver-specs.md](v2/driver-specs.md).
@@ -52,10 +53,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics
| Doc | Covers |
|-----|--------|
| [Configuration.md](v1/Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish (v1 archive — `OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
| [Configuration.md](Configuration.md) | appsettings bootstrap + Config DB + Admin UI draft/publish |
| [security.md](security.md) | Transport security profiles, LDAP auth, ACL trie, role grants, OTOPCUA0001 analyzer |
| [Redundancy.md](Redundancy.md) | `RedundancyCoordinator`, `ServiceLevelCalculator`, apply-lease, Prometheus metrics |
| [ServiceHosting.md](ServiceHosting.md) | Two-process deploy (Server + Admin) install/uninstall, plus the optional `OtOpcUaWonderwareHistorian` sidecar |
| [ServiceHosting.md](ServiceHosting.md) | Three-process deploy (Server + Admin + Galaxy.Host) install/uninstall |
| [StatusDashboard.md](StatusDashboard.md) | Pointer — superseded by [v2/admin-ui.md](v2/admin-ui.md) |
### Client tooling
@@ -78,10 +79,10 @@ For Modbus / S7 / AB CIP / AB Legacy / TwinCAT / FOCAS / OPC UA Client specifics
|-----|--------|
| [reqs/HighLevelReqs.md](reqs/HighLevelReqs.md) | HLRs — numbered system-level requirements |
| [reqs/OpcUaServerReqs.md](reqs/OpcUaServerReqs.md) | OPC UA server-layer reqs |
| [v1/reqs/ServiceHostReqs.md](v1/reqs/ServiceHostReqs.md) | Per-process hosting reqs (v1 archive — only `OtOpcUa` server hosting remains in scope post-PR-7.2) |
| [reqs/ServiceHostReqs.md](reqs/ServiceHostReqs.md) | Per-process hosting reqs |
| [reqs/ClientRequirements.md](reqs/ClientRequirements.md) | Client CLI + UI reqs |
| [v1/reqs/GalaxyRepositoryReqs.md](v1/reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs (v1 archive — owned by mxaccessgw today) |
| [v1/reqs/MxAccessClientReqs.md](v1/reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs (v1 archive — owned by mxaccessgw today) |
| [reqs/GalaxyRepositoryReqs.md](reqs/GalaxyRepositoryReqs.md) | Galaxy-scoped repository reqs |
| [reqs/MxAccessClientReqs.md](reqs/MxAccessClientReqs.md) | Galaxy-scoped MXAccess reqs |
| [reqs/StatusDashboardReqs.md](reqs/StatusDashboardReqs.md) | Pointer — superseded by Admin UI |
## Implementation history (`docs/v2/`)
@@ -95,11 +96,4 @@ Design decisions + phase plans + execution notes. Load-bearing cross-references
- [v2/driver-specs.md](v2/driver-specs.md) — per-driver addressing + quirks for every shipped protocol
- [v2/dev-environment.md](v2/dev-environment.md) — dev-box bootstrap
- [v2/test-data-sources.md](v2/test-data-sources.md) — integration-test simulator matrix (includes the pinned libplctag `ab_server` version for AB CIP tests)
- [v2/multi-host-dispatch.md](v2/multi-host-dispatch.md) — per-PLC circuit breakers (Phase 6.1 decision #144)
- [v2/v2-release-readiness.md](v2/v2-release-readiness.md) — release-readiness tracker
- [v2/lmx-followups.md](v2/lmx-followups.md) — historical Galaxy-bridge follow-ups (pre-PR-7.2)
- [v2/implementation/phase-*-*.md](v2/implementation/) — per-phase execution plans with exit-gate evidence
## v1 archive
The v1 in-process MXAccess architecture (Galaxy.Host + Galaxy.Proxy + Galaxy.Shared, .NET 4.8 x86 COM, the `OtOpcUaGalaxyHost` Windows service) was retired in PR 7.2 (2026-04-30, commit `ae7106d`). Docs that described that shape are kept under [v1/](v1/) as historical record — see [v1/README.md](v1/README.md) for the index.
+4
View File
@@ -98,6 +98,10 @@ Role swaps, stand-alone promotions, and base-level adjustments all happen throug
The OtOpcUa Client CLI at `src/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md) for the command reference.
## vs. upstream-side redundancy
The mechanics on this page describe **OtOpcUa as a redundant server** — two of our instances clustered behind one OPC UA address space, exposing `ServerUriArray` + dynamic `ServiceLevel` to downstream clients. The mirror-image scenario — **the OPC UA Client driver consuming an upstream redundant pair** — is documented separately in [`drivers/OpcUaClient.md` § Upstream redundancy](drivers/OpcUaClient.md#upstream-redundancy-serverarray). Both rely on the same OPC UA Part 4 § 6.6.2 model (non-transparent warm/hot via `RedundancySupport` + `ServerUriArray` + `ServiceLevel`); they sit at opposite ends of the gateway pipeline. A deployment can wire either, both, or neither.
## Depth reference
For the full decision trail and implementation plan — topology invariants, peer-probe cadence, recovery-dwell policy, compliance-script guard against enum-value drift — see `docs/v2/plan.md` §Phase 6.3.
+112 -41
View File
@@ -2,61 +2,132 @@
## Overview
A production OtOpcUa deployment runs **two or three processes**, each
with a distinct runtime and install surface:
A production OtOpcUa deployment runs **three processes**, each with a distinct runtime, platform target, and install surface:
| Process | Project | Runtime | Platform | Responsibility |
|---|---|---|---|---|
| **OtOpcUa Server** | `src/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every driver in-process (Modbus, S7, AbCip, AbLegacy, TwinCAT, FOCAS, OPC UA Client, Galaxy via mxaccessgw); exposes `/healthz`. |
| **OtOpcUa Server** | `src/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every non-Galaxy driver in-process; exposes `/healthz`. |
| **OtOpcUa Admin** | `src/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true` in `appsettings.json`. |
| **OtOpcUa Galaxy.Host** | `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host` | .NET Framework 4.8 | x86 (32-bit) | Hosts MXAccess COM on a dedicated STA thread with a Win32 message pump; exposes a named-pipe IPC surface consumed by `Driver.Galaxy.Proxy` inside the Server process. |
Galaxy access uses a separately-installed **mxaccessgw** running out
of a sibling repo (`c:\Users\dohertj2\Desktop\mxaccessgw\`) — see
`docs/v2/Galaxy.ParityRig.md` for setup. The mxaccessgw owns the
MXAccess COM bitness constraint (its worker is x86 net48); nothing
in the OtOpcUa repo carries that constraint anymore. PR 7.2 retired
the legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
projects + the `OtOpcUaGalaxyHost` Windows service.
The x86 / .NET Framework 4.8 constraint applies **only** to Galaxy.Host because the MXAccess toolkit DLLs (`Program Files (x86)\ArchestrA\Framework\bin`) are 32-bit-only COM. Every other driver (Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS) runs in-process in the 64-bit Server.
## OtOpcUa Server
## Server process
Hosted via `Microsoft.Extensions.Hosting` with `AddWindowsService`
(decision #30 — replaced TopShelf in v2). The host's `Build()`
returns immediately when launched interactively (e.g. `dotnet run`)
but blocks for SCM signals when running as a Windows service.
`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` uses the generic host:
In-process drivers are registered at startup in `Program.cs`'s
`DriverFactoryRegistry` block; the `DriverInstance` rows in the
central Config DB select which driver factories materialise into
live `IDriver` instances. See `docs/v2/driver-specs.md` for the
per-driver `DriverConfig` JSON shapes.
```csharp
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSerilog();
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");
builder.Services.AddHostedService<OpcUaServerService>();
builder.Services.AddHostedService<HostStatusPublisher>();
```
## OtOpcUa Admin
`OpcUaServerService` is a `BackgroundService` (decision #30 — TopShelf from v1 was replaced by the generic-host `AddWindowsService` wrapper; no TopShelf dependency remains in any csproj). It owns:
Same hosting model; runs the Blazor Server UI + SignalR hubs.
Reads from the same Config DB the Server writes to.
1. Config bootstrap — reads `Node:NodeId`, `Node:ClusterId`, `Node:ConfigDbConnectionString`, `Node:LocalCachePath` from `appsettings.json`.
2. `NodeBootstrap` — pulls the latest published generation from the Config DB into the LiteDB local cache (`LiteDbConfigCache`) so the node starts even if the central DB is briefly unreachable.
3. `DriverHost` — instantiates configured driver instances from the generation, wires each through `CapabilityInvoker` resilience pipelines.
4. `OpcUaApplicationHost` — builds the OPC UA endpoint, applies `OpcUaServerOptions` + `LdapOptions`, registers `AuthorizationGate` at dispatch.
5. `HostStatusPublisher` — a second hosted service that heartbeats `DriverHostStatus` rows so the Admin UI Fleet view sees the node.
## OtOpcUa Wonderware Historian (optional)
### Installation
When `Historian:Wonderware:Enabled=true`, the Server speaks to a
sidecar that wraps the Wonderware Historian SDK (which is .NET
Framework only). The pipe IPC contract is in
`src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`
and the sidecar's pipe handler lives at
`src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`.
Same executable, different modes driven by the .NET generic-host `AddWindowsService` wrapper:
Install via the `-InstallWonderwareHistorian` switch on
`scripts/install/Install-Services.ps1`.
| Mode | Invocation |
|---|---|
| Console | `ZB.MOM.WW.OtOpcUa.Server.exe` |
| Install as Windows service | `sc create OtOpcUa binPath="C:\Program Files\OtOpcUa\Server\ZB.MOM.WW.OtOpcUa.Server.exe" start=auto` |
| Start | `sc start OtOpcUa` |
| Stop | `sc stop OtOpcUa` |
| Uninstall | `sc delete OtOpcUa` |
## Install / Uninstall
### Health endpoints
- `scripts/install/Install-Services.ps1` installs `OtOpcUa` and
optionally `OtOpcUaWonderwareHistorian`.
- `scripts/install/Uninstall-Services.ps1` — stops + removes both,
plus `OtOpcUaGalaxyHost` if a pre-7.2 rig still carries it.
The Server exposes `/healthz` + `/readyz` used by (a) the Admin `FleetStatusPoller` as input to Fleet status and (b) `PeerReachabilityTracker` in a peer Server process as the HTTP side of the peer-reachability probe.
## Logging
## Admin process
Serilog with rolling-daily file sinks. Each service writes to
`%ProgramData%\OtOpcUa\<service>-*.log` plus stdout (NSSM-friendly).
`src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs` is a stock `WebApplication`. Highlights:
- Cookie auth (`CookieAuthenticationDefaults`, scheme name `OtOpcUa.Admin`) + Blazor Server (`AddInteractiveServerComponents`) + SignalR.
- Authorization policies gated by `AdminRoles`: `ConfigViewer`, `ConfigEditor`, `FleetAdmin` (see `Services/AdminRoles.cs`). `CanEdit` policy requires `ConfigEditor` or `FleetAdmin`; `CanPublish` requires `FleetAdmin`.
- `OtOpcUaConfigDbContext` registered against `ConnectionStrings:ConfigDb`.
- Scoped services: `ClusterService`, `GenerationService`, `EquipmentService`, `UnsService`, `NamespaceService`, `DriverInstanceService`, `NodeAclService`, `PermissionProbeService`, `AclChangeNotifier`, `ReservationService`, `DraftValidationService`, `AuditLogService`, `HostStatusService`, `ClusterNodeService`, `EquipmentImportBatchService`, `ILdapGroupRoleMappingService`.
- Singleton `RedundancyMetrics` (meter name `ZB.MOM.WW.OtOpcUa.Redundancy`) + `CertTrustService` (promotes rejected client certs in the Server's PKI store to trusted via the Admin Certificates page).
- `LdapAuthService` bound to `Authentication:Ldap` — same LDAP flow as ScadaLink CentralUI for visual parity.
- SignalR hubs mapped at `/hubs/fleet` and `/hubs/alerts`; `FleetStatusPoller` runs as a hosted service and pushes `RoleChanged`, host status, and alert events.
- OpenTelemetry → Prometheus exporter at `/metrics` when `Metrics:Prometheus:Enabled=true` (default). Pull-based means no Collector required in the common K8s deploy.
### Installation
Deployed as an ASP.NET Core service; the generic-host `AddWindowsService` wrapper (or IIS reverse-proxy for multi-node fleets) provides install/uninstall. Listens on whatever `ASPNETCORE_URLS` specifies.
## Galaxy.Host process
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Program.cs` is a .NET Framework 4.8 x86 console executable. Configuration comes from environment variables supplied by the supervisor (`Driver.Galaxy.Proxy.Supervisor`):
| Env var | Purpose |
|---|---|
| `OTOPCUA_GALAXY_PIPE` | Pipe name the host listens on (default `OtOpcUaGalaxy`). |
| `OTOPCUA_ALLOWED_SID` | SID of the Server process's principal; anyone else is refused during the handshake. |
| `OTOPCUA_GALAXY_SECRET` | Per-spawn shared secret the client must present in the Hello frame. |
| `OTOPCUA_GALAXY_BACKEND` | `mxaccess` (default), `db` (ZB-only, no COM), `stub` (in-memory; for tests). |
| `OTOPCUA_GALAXY_ZB_CONN` | SQL connection string to the ZB Galaxy repository. |
| `OTOPCUA_HISTORIAN_*` | Optional Wonderware Historian SDK config if Historian is enabled for this node. |
The host spins up `StaPump` (the STA thread with message pump), creates the MXAccess `LMXProxyServer` COM object on that thread, and handles all COM calls there; the IPC layer marshals work items via `PostThreadMessage`.
### Pipe security
`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed. **By design the pipe ACL denies BUILTIN\Administrators** — live smoke tests must therefore run from a non-elevated shell that matches the allowed principal. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`.
### Installation
NSSM-wrapped (the Non-Sucking Service Manager) because the executable itself is a plain console app, not a `ServiceBase` Windows service. The supervisor then adopts the child process over the pipe after install. Install/uninstall commands follow the NSSM pattern:
```bash
nssm install OtOpcUaGalaxyHost "C:\Program Files (x86)\OtOpcUa\Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe"
nssm set OtOpcUaGalaxyHost ObjectName .\dohertj2 <password>
nssm set OtOpcUaGalaxyHost AppEnvironmentExtra OTOPCUA_GALAXY_BACKEND=mxaccess OTOPCUA_GALAXY_SECRET=OTOPCUA_ALLOWED_SID=
nssm start OtOpcUaGalaxyHost
```
(Exact values for the environment block are generated by the Admin UI + committed alongside `.local/galaxy-host-secret.txt` on the dev box.)
## Inter-process communication
```
┌──────────────────────────┐ LDAP bind (Authentication:Ldap) ┌──────────────────────────┐
│ OtOpcUa Admin (x64) │ ─────────────────────────────────────────────▶│ LDAP / AD │
│ Blazor Server + SignalR │ └──────────────────────────┘
│ /metrics (Prometheus) │ FleetStatusPoller → ClusterNode poll
│ │ ─────────────────────────────────────────────▶┌──────────────────────────┐
│ │ Cluster/Generation/ACL writes │ Config DB (SQL Server) │
└──────────────────────────┘ ─────────────────────────────────────────────▶│ OtOpcUaConfigDbContext │
▲ └──────────────────────────┘
│ SignalR ▲
│ (role change, │ sp_GetCurrentGenerationForCluster
│ host status, │ sp_PublishGeneration
│ alerts) │
┌──────────────────────────┐ │
│ OtOpcUa Server (x64) │ ──────────────────────────────────────────────────────────┘
│ OPC UA endpoint │
│ Non-Galaxy drivers │ Named pipe (OtOpcUaGalaxy) ┌──────────────────────────┐
│ Driver.Galaxy.Proxy │ ─────────────────────────────────────────────▶│ Galaxy.Host (x86 .NFx) │
│ │ SID + shared-secret handshake │ STA + message pump │
│ /healthz /readyz │ │ MXAccess COM │
└──────────────────────────┘ │ Historian SDK (opt) │
└──────────────────────────┘
```
## appsettings.json boundary
Each process reads its own `appsettings.json` for **bootstrap only** — connection strings, LDAP bind config, transport security profile, redundancy node id, logging. The authoritative configuration tree (drivers, UNS, tags, ACLs) lives in the Config DB and is edited through the Admin UI. See [`Configuration.md`](Configuration.md) for the split.
## Development bootstrap
For the Windows install steps (SQL Server in Docker, .NET 10 SDK, .NET Framework 4.8 SDK, Docker Desktop WSL 2 backend, EF Core CLI, first-run migration), see [`docs/v2/dev-environment.md`](v2/dev-environment.md).
+332
View File
@@ -0,0 +1,332 @@
# AbCip — ControlLogix HSBY paired-IP support
PR abcip-5.1 + 5.2 ship **non-transparent** HSBY (Hot-Standby) awareness
to the AB CIP driver. Each device may declare a partner gateway; when both
gateways are up the driver concurrently probes a role tag on each chassis,
reports which one is currently Active, and routes reads / writes through
that chassis automatically.
- **PR abcip-5.1** — gathers + reports the role of each chassis through
driver diagnostics. See [Role-tag detection matrix](#role-tag-detection-matrix)
+ [Active-resolution rules](#active-resolution-rules).
- **PR abcip-5.2** — wires the resolved active address into
`AbCipDriver.ResolveHost` and the runtime-cache lifecycle. See
[Failover behaviour](#failover-behaviour-pr-52) +
[Failure-mode walkthrough](#failure-mode-walkthrough).
## When to use HSBY paired IPs
You have a redundant **ControlLogix** chassis pair (1756-RM redundancy
module, two CPUs, one acting + one standby) and the SCADA / OPC UA layer
needs to keep talking to *whichever chassis is currently Active* without an
operator manually re-pointing the connection.
Pre-5.1 the driver only knew about a single `HostAddress`. After a
hot-standby switch-over, the standby (now Active) carried a **different IP**
and the driver kept probing the dead-but-was-Active address until someone
edited the config.
PR abcip-5.1 closes the visibility half of that gap by reading the role tag
on both chassis. PR abcip-5.2 closes the routing half by re-pointing
`ResolveHost` at the Active address each tick + invalidating the per-tag
runtime cache + write-coalescer state on every flip.
## Configuration
```jsonc
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PartnerHostAddress": "ab://10.0.0.6/1,0",
"Hsby": {
"Enabled": true,
"RoleTagAddress": "WallClockTime.SyncStatus",
"ProbeIntervalMs": 2000
}
}
]
}
```
| Field | Default | Notes |
|---|---|---|
| `PartnerHostAddress` | `null` | Canonical `ab://gateway[:port]/cip-path` of the partner chassis. `null` = no HSBY pair; the driver behaves exactly like every pre-5.1 build. |
| `Hsby.Enabled` | `false` | Master switch. When `false` (or `Hsby` omitted) no role probing happens, even if `PartnerHostAddress` is set. |
| `Hsby.RoleTagAddress` | `WallClockTime.SyncStatus` | Address of the role tag on each chassis. See [role-tag detection matrix](#role-tag-detection-matrix). |
| `Hsby.ProbeIntervalMs` | `2000` | How often each chassis is sampled. 2 s is a good default — tight enough to detect a switch-over within one Admin-UI refresh, loose enough to leave headroom for the regular probe loop. |
## Feature-flag gate (`Redundancy.Hsby.Enabled`)
`Hsby.Enabled = false` (the default) is the off-switch for the entire
feature. The role-probe loop never starts, the diagnostics keys are not
emitted, and the driver behaves identically to a pre-5.1 build. This is the
gate to flip when an operator wants to roll the feature out cautiously
across a fleet — set `Hsby.Enabled = true` per-device in driver config (no
build flag, no env var).
When the gate is on but the partner gateway is unreachable, the role-probe
loop reports `HsbyRole.Unknown` for the partner each tick. The primary's
role still drives the active-chassis resolution; the operator sees the
partner's role as Unknown in the Admin UI / driver diagnostics, which is the
correct surface for "we can't reach the standby chassis right now."
## Role-tag detection matrix
| Firmware / fronts | Address | Decode |
|---|---|---|
| **v20 / v24 / v32+ ControlLogix HSBY** | `WallClockTime.SyncStatus` (DINT) | `0` = Standby, `1` = Synchronized / Active, `2` = Disqualified, anything else = Unknown |
| **PLC-5 / SLC500 status-byte fallback** | `S:34` Module Status word | bit 0 = "this chassis is Active". Bit set → `Active`; clear → `Standby` |
| **Custom user role tag** | any DINT-typed CIP path | Same matrix as `WallClockTime.SyncStatus` (0 / 1 / 2). Out-of-range values → Unknown. |
`AbCipHsbyRoleProber.MapValueToRole` is the value-to-role mapper; unit tests
in `tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipHsbyTests.cs` pin every
row of the matrix.
## What gets reported
The driver surfaces three diagnostics counters per HSBY-enabled device
(visible via `driver-diagnostics` RPC + the Admin UI):
| Counter | Value |
|---|---|
| `AbCip.HsbyActive` | `1` if primary is Active, `2` if partner is Active, `0` if neither (or HSBY off) |
| `AbCip.HsbyPrimaryRole` | `(int)HsbyRole``0` = Unknown, `1` = Active, `2` = Standby, `3` = Disqualified |
| `AbCip.HsbyPartnerRole` | Same encoding as `HsbyPrimaryRole`, observed on the partner chassis |
| `AbCip.HsbyFailoverCount` (PR 5.2) | Total number of `ActiveAddress` transitions the probe loop has observed across every HSBY-enabled device on this driver. Each increment maps to one runtime-cache invalidation + write-coalescer reset. |
When more than one HSBY pair is configured on the same driver instance the
flat keys are scoped per primary host: `AbCip.HsbyActive[ab://10.0.0.5/1,0]`,
etc.
The `DeviceState.ActiveAddress` field (internal; surfaced via
`HsbyActive` diagnostics) is the address PR 5.2 routes through
`ResolveHost` + uses to scope the per-host bulkhead / breaker key.
See [Failover behaviour](#failover-behaviour-pr-52) for the runtime
implications.
### Active-resolution rules
| Primary role | Partner role | `ActiveAddress` resolution |
|---|---|---|
| Active | Standby / Disqualified / Unknown | primary |
| Standby / Disqualified / Unknown | Active | partner |
| Active | Active (split-brain) | **primary wins**, warning logged |
| Standby + Standby | Standby + Standby | `null` — PR 5.2's `ResolveHost` falls back to the configured primary; the existing dial flow surfaces `BadCommunicationError` if the primary is also down. See [Both-stuck](#both-stuck-no-chassis-active). |
| Unknown + Unknown | Unknown + Unknown | `null` (same fallback as Standby + Standby) |
Split-brain (both chassis claim Active simultaneously) is a real
production failure mode — typically a redundancy-module misconfiguration or
a partial network split. The driver picks primary deterministically + emits
a warning through `AbCipDriverOptions.OnWarning` so operators see it in the
log.
## CLI flags
The `otopcua-abcip-cli` tool exposes the HSBY plumbing through two surfaces
(see [Driver.AbCip.Cli.md](../Driver.AbCip.Cli.md) for the full CLI guide):
- `--partner <gateway>` — global flag on every command. Sets
`PartnerHostAddress` + auto-enables `Hsby.Enabled = true` so the role
probe runs alongside any read / write / subscribe.
- `hsby-status` — dedicated command that prints which chassis is
currently Active. Reads the role tag on both gateways for a few ticks +
prints the `(primary, partner, active)` tuple.
```powershell
# Print which chassis is Active right now
otopcua-abcip-cli hsby-status -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0
# Subscribe through the active chassis (PR 5.2 follow-up — today the
# subscribe stays pointed at the primary; the role probe runs alongside).
otopcua-abcip-cli subscribe -g ab://10.0.0.5/1,0 --partner ab://10.0.0.6/1,0 \
-t Motor01_Speed --type Real -i 500
```
## Test coverage
- **Unit** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipHsbyTests.cs`):
- Pure `MapValueToRole` matrix (WallClockTime.SyncStatus + S:34 bit
mask + Unknown values).
- End-to-end driver loop: primary Active / partner Standby resolves to
primary; both Active resolves to primary with a warning; both
Standby clears `ActiveAddress`; primary read failure routes to
partner.
- Diagnostics surface (`AbCip.HsbyActive` / `HsbyPrimaryRole` /
`HsbyPartnerRole`).
- DTO JSON round-trip (`PartnerHostAddress` + `Hsby.{Enabled,
RoleTagAddress, ProbeIntervalMs}` survive deserialise → driver →
`DeviceState`).
- `Hsby.Enabled = false` → no role probing.
- **Integration** (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/`):
- `AbCipHsbyRoleProberTests.cs` (PR 5.1) and
`AbCipHsbyFailoverTests.cs` (PR 5.2) — both **skipped by default**
(`Assert.Skip`). `ab_server` cannot emulate a ControlLogix HSBY
pair (no `WallClockTime.SyncStatus`, no second chassis concept).
The Docker `paired` profile (PR 5.1) brings up two `ab_server`
instances + a stub `hsby-mux` sidecar so the topology is
documented, but a patched `ab_server` image that actually serves
the role tag is still on the follow-up list.
- Trait `Category=Hsby` so `dotnet test --filter Category=Hsby`
finds them once they're promoted.
- **End-to-end** (`scripts/e2e/test-abcip-hsby.ps1`, PR 5.2):
- Paired-fixture variant of `test-abcip.ps1`. Subscribes to a tag
through the OPC UA server, flips the active chassis mid-stream
via the `hsby-mux` sidecar's `POST /flip` endpoint, asserts the
stream survives + `AbCip.HsbyFailoverCount` increments. Gated
on operator-supplied `BridgeNodeId` + a running paired fixture;
ships unwired into `test-all.ps1` until the patched `ab_server`
lands.
## Failover behaviour (PR 5.2)
PR 5.2 wires `DeviceState.ActiveAddress` into the read / write hot path
through `AbCipDriver.ResolveHost` and the runtime-cache lifecycle. After
the role-probe loop (PR 5.1) detects an active-address transition the
driver re-points every wire-level operation at the now-Active chassis
without operator intervention.
### What flips on a failover
| Aspect | Pre-flip | Post-flip |
|---|---|---|
| `ResolveHost(tag)` return | primary `HostAddress` | the partner address (when partner is now Active) |
| Per-tag libplctag handles in `DeviceState.Runtimes` | created against primary gateway | dropped on flip; lazily re-created against the partner gateway on next read / write |
| Parent-DINT RMW handles in `DeviceState.ParentRuntimes` | primary gateway | dropped on flip; same re-create-on-demand path |
| `AbCipWriteCoalescer` per-device cache | last-known-written values from the primary | reset; the first write of any value to the partner pays the full round-trip |
| `LogicalInstanceMap` (Logical-mode `@tags` walk) | populated for primary | cleared; the next read on a Logical-mode device re-walks `@tags` against the partner |
| Per-host bulkhead key (Polly bulkhead + breaker, plan decision #144) | keyed on primary `HostAddress` | keyed on the new active address — the partner gets its own fresh breaker state instead of inheriting a tripped breaker from the now-standby |
| `AbCip.HsbyFailoverCount` diagnostic | 0 | incremented by 1 on every transition observed by the probe loop |
### How the invalidation runs
PR 5.2 introduces an internal `OnActiveAddressChanged` event raised by
`HsbyProbeLoopAsync` on every `DeviceState.ActiveAddress` transition. The
driver subscribes to it from its own constructor; the handler
(`HandleActiveAddressChanged`) does the cache invalidation in one place:
1. Disposes every entry in `DeviceState.Runtimes` and
`DeviceState.ParentRuntimes`, then clears both dicts. Disposed
`IAbCipTagRuntime` instances release their underlying libplctag
handles so the native heap doesn't leak.
2. Clears `DeviceState.LogicalInstanceMap` and resets
`LogicalWalkComplete = false` so the next read on a Logical-mode
device re-fires the `@tags` symbol walk against the new chassis.
3. Calls `AbCipWriteCoalescer.Reset(deviceHostAddress)` so cached
"we already wrote 42" decisions don't stale-suppress the first
partner-side write.
4. Resets `DeviceState.RuntimesAddress = null` so subsequent
diagnostics observers see a fresh stamp on the next runtime
creation.
5. `Interlocked.Increment` on the driver-wide
`AbCip.HsbyFailoverCount` counter.
The handler is idempotent — a second event for the same address change
is harmless because the dicts are already empty + the coalescer reset
is itself idempotent.
### Bulkhead key semantics
The per-host resilience pipeline (Polly bulkhead + circuit breaker, plan
decision #144) keys on whatever `IPerCallHostResolver.ResolveHost`
returns. PR 5.2 changes that resolver so an HSBY-failed-over device
returns the partner's address, which means:
- The **device-state lookup** (`_devices.TryGetValue`) keeps using the
configured primary `HostAddress` as the dictionary key — that key
never changes for the lifetime of a device, so multi-device
configurations stay routable.
- The **resilience pipeline** (Polly bulkhead, breaker, retry policies)
receives the active address as the host-name dimension. The standby
chassis's tripped breaker (if its primary went away) doesn't bleed
over to the partner; the partner gets fresh limits + a closed
breaker.
When HSBY is disabled (`Hsby.Enabled = false`) `ResolveHost` returns the
configured primary `HostAddress` exactly as it always has — pre-5.2
behaviour, no double-key risk.
## Failure-mode walkthrough
PR 5.2 adds three failover surface areas to reason about. The table
below summarises the behaviour the driver reports + how an operator
can inspect it.
### Primary-stuck (primary unreachable, partner Active)
The primary chassis goes away (network partition, power loss, a stuck
Forward Open). The role-probe loop reads `HsbyRole.Unknown` for the
primary and `HsbyRole.Active` for the partner.
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | partner address |
| `DeviceState.PrimaryRole` | `Unknown` |
| `DeviceState.PartnerRole` | `Active` |
| `ResolveHost(tag)` | partner address |
| Reads / writes | route through partner gateway transparently |
| `AbCip.HsbyFailoverCount` | incremented when the address transitioned away from the primary |
| `AbCip.HsbyActive` | `2` (partner is the active chassis) |
| Operator action | none required for routing; investigate why the primary is unreachable through the connectivity-probe loop's `_System/_ConnectionStatus` for the device |
### Secondary-stuck (partner unreachable, primary Active)
The partner chassis goes away (its OPC UA server is down, its IP is
unreachable, the redundancy module unhitched it). The probe loop reads
`HsbyRole.Active` for the primary and `HsbyRole.Unknown` for the partner.
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | primary address (no transition; this is the steady state) |
| `DeviceState.PrimaryRole` | `Active` |
| `DeviceState.PartnerRole` | `Unknown` |
| `ResolveHost(tag)` | primary address |
| Reads / writes | route through primary gateway exactly as in a non-HSBY deployment |
| `AbCip.HsbyFailoverCount` | unchanged — no flip happened |
| `AbCip.HsbyActive` | `1` (primary is the active chassis) |
| Operator action | investigate why the partner is unreachable; the operational risk is that a future primary-side outage has no fall-back |
### Both-stuck (no chassis Active)
Both chassis report `Standby` / `Disqualified` / `Unknown` (a
redundancy-module misconfiguration, both controllers in Program mode,
or both unreachable).
| Surface | Behaviour |
|---|---|
| `DeviceState.ActiveAddress` | `null` |
| `ResolveHost(tag)` | falls back to the configured primary `HostAddress` |
| Reads / writes | dispatched to the configured primary; a stuck primary surfaces `BadCommunicationError` per the existing dial flow |
| `AbCip.HsbyActive` | `0` (no chassis Active) |
| `AbCip.HsbyFailoverCount` | incremented when the transition `Active → null` happened |
| Operator action | investigate the redundancy module / mode keys; the SCADA layer sees stuck-or-bad-quality reads, not incorrect routing |
The "fall back to primary on null Active" choice is deliberate. Routing
all reads to a deterministic chassis (the configured primary) keeps the
breaker key + bulkhead state stable while the operator diagnoses the
double-down outage; the alternative (round-robin / partner) would just
trip both breakers in turn and obscure which chassis is the real
problem.
## Follow-ups (beyond PR 5.2)
- **Patched `ab_server` image** — add a writable `WallClockTime.SyncStatus`
tag (or a separate Python shim) so the Docker `paired` profile can
exercise the wire-level role probe + the
`tests/.../IntegrationTests/AbCipHsbyFailoverTests.cs` scaffold can
flip its `Assert.Skip` for a real integration assertion.
- **`hsby-mux` REST endpoint** — `POST /flip {"active": "primary"}` writes
`1` to the chosen chassis + `0` to the other so integration tests +
`scripts/e2e/test-abcip-hsby.ps1` can drive switch-overs
deterministically.
- **GuardLogix HSBY** — same role-tag plumbing applies; verify against a
real 1756-L8xS pair when one is on-site.
## See also
- [`docs/Driver.AbCip.Cli.md`](../Driver.AbCip.Cli.md) — `--partner` flag +
`hsby-status` command reference
- [`docs/drivers/AbServer-Test-Fixture.md`](AbServer-Test-Fixture.md) §"What
it does NOT cover" — HSBY entry
- [`docs/Redundancy.md`](../Redundancy.md) — server-level (OPC UA-stack)
redundancy; HSBY is the **driver-level** companion
+406
View File
@@ -0,0 +1,406 @@
# AB CIP — Operability knobs
Phase 4 of the AB CIP driver plan introduces operator-tunable behaviour that
changes how the driver schedules per-tag traffic, deduplicates updates, and
surfaces health — knobs that an operator typically reaches for *after* the
address space is in place and the deployment is past the green-field phase.
The Phase 3 doc (`AbCip-Performance.md`) covers connection-shape and
read-strategy knobs; this doc is the home for the per-tag scheduling and
operability levers as PRs land.
PR abcip-4.1 ships the first knob: per-tag **Scan Rate** (Kepware-parity scan
classes).
## Per-tag scan rate
### What it is
A per-tag override of the OPC UA subscription's `publishingInterval`. The AB
CIP driver mirrors the Galaxy hierarchy as a single OPC UA address space, so
every tag served from one driver normally ticks at the publishing interval the
client requested when it created the Subscription. This knob lets specific
tags publish at a different cadence — fast HMI tags at 100 ms, batch /
historian tags at 110 s — without forcing the operator to split tags into
separate subscriptions or driver instances.
It is the Kepware "scan classes" model expressed per-tag. The same shape is
already shipped in the S7 driver (`S7TagDefinition.ScanGroup`) and the AB
Legacy / TwinCAT drivers; AB CIP adopts a leaner per-tag-only form because the
CIP single-connection model means the practical knob a deployment reaches for
is "this one tag, faster", not "every tag in this group".
### How it interacts with OPC UA publishingInterval
OPC UA semantics:
- The Subscription's `publishingInterval` is the *upper bound* on how often
the server publishes a NotificationMessage. Each MonitoredItem also has its
own `samplingInterval`; that's where this knob lands.
- A per-tag `samplingInterval` shorter than the Subscription's
`publishingInterval` means the server samples faster but only publishes at
the next Subscription tick — clients may receive multiple values for one
tag in a single Publish response.
- A per-tag `samplingInterval` longer than the Subscription's
`publishingInterval` is legal too — the server simply skips ticks for that
tag.
AB CIP-side: the driver's `SubscribeAsync` receives one `publishingInterval`
plus a list of tag references. With per-tag `ScanRateMs` it buckets the input
list by resolved interval and registers one `PollGroupEngine` subscription per
bucket. Each bucket runs an independent timer, so a 100 ms tag never waits
for a 1000 ms tag's `Task.Delay` to expire.
### Override knob
`AbCipTagDefinition.ScanRateMs` (`int?`, default `null`). `null` = use the
subscription's default `publishingInterval` (legacy behaviour). Bind via
driver config JSON:
```json
{
"Tags": [
{
"Name": "Motor1.Speed",
"DeviceHostAddress": "ab://10.0.0.5/1,0",
"TagPath": "Motor1.Speed",
"DataType": "DInt",
"ScanRateMs": 100
},
{
"Name": "Motor1.RunHours",
"DeviceHostAddress": "ab://10.0.0.5/1,0",
"TagPath": "Motor1.RunHours",
"DataType": "DInt",
"ScanRateMs": 5000
},
{
"Name": "Motor1.NamePlate",
"DeviceHostAddress": "ab://10.0.0.5/1,0",
"TagPath": "Motor1.NamePlate",
"DataType": "String"
}
]
}
```
Result: three buckets — 100 ms, 5000 ms, and the subscription default for
`NamePlate`. UDT members inherit the parent tag's `ScanRateMs` at fan-out
time, so a UDT declared at 100 ms publishes every member at 100 ms without
the operator having to repeat the override on each member.
### Floor and degenerate cases
- `PollGroupEngine` floors every bucket at **100 ms** — a `ScanRateMs: 25`
is clamped up. The floor matches the Modbus / S7 / TwinCAT floors and
protects the wire from sub-mailbox-scan polling.
- `ScanRateMs: 0` and negative values are treated as unset — the tag falls
back to the subscription default. Mis-typed config degrades, doesn't fault.
- A `ScanRateMs` equal to the subscription default collapses into the same
bucket as plain tags. The driver doesn't fragment poll loops when the
override is redundant.
- Tags whose names don't appear in the driver's tag map (typo / discovery
miss) fall through to the subscription default — same "config typo
degrades" stance as the rest of the driver.
### Wire impact
Per-bucket independent timers do **not** parallelise CIP traffic. The driver
serializes wire-side reads through its per-device libplctag handles, so a
fast bucket and a slow bucket trade off against each other on the wire — the
multi-rate split decouples *cadence* (the 100 ms bucket isn't queued behind
the 1000 ms bucket's `Task.Delay`), not *throughput*. The wire still moves
one CIP request at a time per device.
If you're reading a large tag set and the slow bucket starves the fast
bucket, the lever is `AbCipDeviceOptions.ConnectionSize` (Phase 3) — pack
more tags into one CIP RTT so the slow bucket finishes faster. Per-tag scan
rate is a *scheduling* knob, not a *throughput* knob.
### Comparison to Kepware scan classes
| Kepware concept | AB CIP equivalent |
|---|---|
| Scan class table (named groups → rate) | implicit: each distinct `ScanRateMs` value is its own bucket |
| Default scan class | OPC UA Subscription's `publishingInterval` |
| Per-tag scan class assignment | `AbCipTagDefinition.ScanRateMs` |
| "Scan mode: Respect client" | always — the OPC UA `publishingInterval` is the default |
| "Force write" / "Write through cache" | not exposed — AB CIP writes always go to the wire |
The leaner shape (per-tag rate, not named groups) keeps the JSON config flat
and reflects how operators tend to use the knob in practice — a handful of
"this specific tag needs to be fast" overrides on top of a sensible default,
rather than a separate tier of scan-class definitions.
### Verification
- **Unit**: `AbCipPerTagScanRateTests` (`tests/.../AbCip.Tests`). Asserts
bucketing math, default-rate collapse, UDT member inheritance, JSON DTO
round-trip, and end-to-end cadence against the in-process fake.
- **Integration**: `AbCipPerTagScanRateTests`
(`tests/.../AbCip.IntegrationTests`). Drives two tags at 100 ms / 1000 ms
against a live `ab_server` and asserts the bucket count + each tag receives
the initial-data push.
- **E2E**: `scripts/e2e/test-abcip.ps1` — see the *PerTagScanRate* assertion.
### Cross-references
- `docs/Driver.AbCip.Cli.md` — there is no CLI surface change for this knob;
scan rate is a config-time concern.
- `docs/drivers/AbCip-Performance.md` — Phase 3 throughput knobs that pair
with per-tag scan rate when a slow bucket starves a fast one.
- S7 driver `ScanGroup` model in `src/.../S7DriverOptions.cs` — the
named-group form of the same idea.
## Write deadband / write-on-change
PR abcip-4.2 ships the second operability knob: per-tag write coalescing,
the *write-side* companion to the read-side deadband already shipped at the
OPC UA monitored-item layer. The driver remembers the value last
successfully written for a tag and can suppress redundant or below-threshold
follow-up writes — they return `Good` to the OPC UA client without ever
hitting the wire.
### What it is
- **`AbCipTagDefinition.WriteDeadband`** (`double?`, default `null`) —
numeric absolute-difference threshold. When set, a write whose
`|new last|` is below the deadband is suppressed.
- **`AbCipTagDefinition.WriteOnChange`** (`bool`, default `false`) —
equality gate. When set, a write whose value equals the last successfully
written value is suppressed.
Both knobs combine on the same tag. For numerics, the deadband path takes
priority; the equality fallback covers the cases the deadband doesn't (BOOL
setpoints, STRING constants, `WriteDeadband=0`, etc).
### Worked setpoint-jitter example
A motor speed setpoint published from an HMI tends to wobble by a few
ticks even when the operator hasn't touched it — UI rounding, Modbus
gateway re-encoding, RPN script noise. With `WriteDeadband: 0.5`:
```json
{
"Tags": [
{
"Name": "Motor1.Speed.SP",
"DeviceHostAddress": "ab://10.0.0.5/1,0",
"TagPath": "Motor1.Speed.SP",
"DataType": "Real",
"WriteDeadband": 0.5
}
]
}
```
Sequence of writes from the HMI (one every 100 ms, no operator input):
| Time | Value | `\|new last\|` | Wire? |
|---|---|---|---|
| 0 ms | 50.0 | n/a (first) | yes |
| 100 ms | 50.2 | 0.2 < 0.5 | suppressed |
| 200 ms | 50.3 | 0.3 < 0.5 | suppressed |
| 300 ms | 50.6 | 0.6 ≥ 0.5 | yes |
| 400 ms | 50.6 | 0.0 < 0.5 | suppressed |
| 500 ms | 51.5 | 0.9 ≥ 0.5 | yes |
Three writes hit the wire; three are suppressed. The OPC UA client sees
`Good` on every call. The PLC sees only the values that actually crossed
the deadband.
### Combining with WriteOnChange
A digital reset bit driven by a UI that pulses it at every cycle:
```json
{
"Name": "Conveyor.Reset",
"DeviceHostAddress": "ab://10.0.0.5/1,0",
"TagPath": "Conveyor.Reset",
"DataType": "Bool",
"WriteOnChange": true
}
```
Three consecutive `false → false → false` writes from the UI collapse to
one wire write (`false`, the first). When the operator clicks the reset
button (`true`), that write passes; subsequent `true → true` repeats
suppress until the UI clears it back to `false`.
Numeric tags can also opt into both: `WriteDeadband: 0.5` plus
`WriteOnChange: true` is well-defined — the deadband suppresses jitter, the
equality gate suppresses exact repeats (which the deadband path also catches
because `|0| < 0.5`, but having both set documents the operator's intent).
### Special cases
- **First write** always passes through. The coalescer has no prior value
to compare against, so the first write of any tag pays the full
round-trip and seeds the cache.
- **NaN / Infinity** bypass deadband suppression. IEEE-754 comparisons
against NaN are undefined and a stale `+Inf` shouldn't silently swallow
a real reset; the wire decides. `WriteOnChange` equality on NaN still
follows .NET semantics (`Equals(NaN, NaN) == true` for `double` boxed in
`object`), so a `WriteOnChange` tag stuck on NaN will suppress repeats
until something else writes a real value.
- **Failed writes** do *not* seed the cache. If the wire write fails, the
next attempt with the same value still hits the wire because the
coalescer never recorded a "last successful value" for it.
- **Reconnect drops the cache**. The driver's host-state probe transitions
`Stopped → Running` after a reconnect; both transitions reset the
per-device coalescer cache, so the first post-reconnect write of any
value pays the full round-trip. The PLC may have been restarted while
the driver was offline and our cached "we already wrote 42" is stale.
- **Two devices, same tag address**. The cache is keyed on
`(deviceHostAddress, tagAddress)` so two PLCs running the same Logix
program keep independent caches — writing 42 to A doesn't suppress
writing 42 to B.
- **Bit-in-DINT writes** consult the coalescer too, so a UI that pulses
`Flags.3` at every cycle benefits from the same `WriteOnChange`
suppression as a plain BOOL tag.
- **Plain back-compat tags** (no `WriteDeadband`, no `WriteOnChange`)
take a fast-path through the coalescer that increments only the
`WritesPassedThrough` counter — no dictionary lookup, no allocation. The
knobs are zero-overhead opt-in.
### Diagnostics
The driver surfaces two counters through `DriverHealth.Diagnostics` (the
same path the `driver-diagnostics` RPC + Admin UI render for Modbus / S7 /
OPC UA Client):
- `AbCip.WritesSuppressed` — total writes the coalescer skipped.
- `AbCip.WritesPassedThrough` — total writes that hit the wire after
consulting the coalescer.
Their ratio is the "wire savings" headline. A deployment with `0`
suppressions either has no tags opted in or has the deadband too tight /
the equality threshold too loose; revisit the per-tag config.
### Verification
- **Unit**: `AbCipWriteDeadbandTests` (`tests/.../AbCip.Tests`). Asserts
the deadband math, the equality fallback, the first-write pass-through,
reset-on-reconnect, two-device cache independence, suppressed-Good
status, NaN bypass, the back-compat fast path, and DTO round-trip.
- **Integration**: `AbCipWriteDeadbandTests`
(`tests/.../AbCip.IntegrationTests`). Drives a 5-write jittery sequence
with `WriteDeadband: 1.0` against a live `ab_server` and asserts the
driver's diagnostics counter matches the expected suppression count.
- **E2E**: `scripts/e2e/test-abcip.ps1` — see the *WriteCoalesce*
assertion.
### Cross-references
- `docs/drivers/AbServer-Test-Fixture.md` §7 — capability surfaces beyond
read; mentions write-coalesce coverage.
- Modbus driver — read-side deadband in `ModbusDriver` predates this
write-side equivalent; the config shape is intentionally similar.
- Kepware "Deadband (write)" knob — this is the AB CIP equivalent.
## System tags / `_System` folder
PR abcip-4.3 surfaces five read-only diagnostic variables under
`AbCip/<device>/_System/` so SCADA / Admin clients can pivot from "is the
wire up?" to "what's our scan rate / tag count?" without leaving the OPC UA
address space. The values come straight from the live
`IHostConnectivityProbe` + `DriverHealth` surfaces — reads bypass libplctag
and are served from the in-memory snapshot the probe loop / read loop
updates. PR abcip-4.4 added `_RefreshTagDb` as a sixth, writeable entry —
the Kepware-style refresh trigger.
### What it ships
| Variable | Type | Access | Source | Notes |
|---|---|---|---|---|
| `_ConnectionStatus` | String | ViewOnly | `HostState` | `Running` / `Stopped` / `Unknown` / `Faulted`. Mirrors what the connectivity probe sees. |
| `_ScanRate` | Float64 | ViewOnly | `AbCipProbeOptions.Interval` | Configured probe interval in milliseconds — compare against `_LastScanTimeMs` to spot wire stretch. |
| `_TagCount` | Int32 | ViewOnly | `_tagsByName` | Discovered tag count for this device, excluding `_System/*`. |
| `_DeviceError` | String | ViewOnly | `DriverHealth.LastError` | Most recent error message; empty when the device is healthy. |
| `_LastScanTimeMs` | Float64 | ViewOnly | `ReadAsync` wall-clock | Duration of the most-recent `ReadAsync` iteration on this device. |
| `_RefreshTagDb` | Boolean | **Operate** | n/a (write-only trigger) | PR abcip-4.4 — Kepware-style refresh trigger. Reads always return `false`. Writing any truthy value (`true`, non-zero number, `"true"` / `"1"` strings, case-insensitive) dispatches to `RebrowseAsync` against the device's cached `IAddressSpaceBuilder`. Falsy / unparseable writes are no-ops that report `Good` so a UI that resets the trigger flag doesn't see a phantom error. The `AbCip.RefreshTriggers` diagnostic counter increments per truthy write. |
### When the snapshot updates
- **Probe transitions** — every `Running ↔ Stopped` flip refreshes the
device's snapshot inline, so a client subscribed to
`_System/_ConnectionStatus` sees the new state on the next OPC UA
publish tick.
- **Read iterations** — `ReadAsync` recomputes `_LastScanTimeMs` per
device that owned at least one reference in the batch + writes a fresh
snapshot before returning.
- **Driver init** — every device gets a seeded snapshot
(`Unknown` / `0` / `""`) before the probe loop spins up so a read that
arrives before the first probe iteration returns a stable shape rather
than null.
### Browse + read example
```powershell
# Browse the synthetic folder
otopcua-client-cli browse -u opc.tcp://localhost:4840 \
-n "ns=2;s=AbCip/ab://10.0.0.5/1,0/_System"
# Read the connection status
otopcua-client-cli read -u opc.tcp://localhost:4840 \
-n "ns=2;s=AbCip/ab://10.0.0.5/1,0/_System/_ConnectionStatus"
```
The driver-side reference embeds the device host address (the
`_System/<device>/<name>` form) so the dispatcher can route by device
without an additional registry. PR abcip-4.4 turned `_RefreshTagDb` into
a writeable refresh trigger; the rest of the surface remains `ViewOnly`.
### Refreshing the tag DB via OPC UA write
PR abcip-4.4 wires `_RefreshTagDb` to the same `RebrowseAsync` entry point
the CLI's `rebrowse` command exercises (issue #233). Operators have two
roughly-equivalent ways to force a controller-side `@tags` re-walk after a
program download:
```powershell
# Path A — OPC UA write to the system tag (production / Admin UI path)
otopcua-client-cli write -u opc.tcp://localhost:4840 \
-n "ns=2;s=AbCip/ab://10.0.0.5/1,0/_System/_RefreshTagDb" \
-v true --type Boolean
# Path B — direct CLI rebrowse against a transient driver (admin / debug path)
otopcua-abcip-cli rebrowse -g ab://10.0.0.5/1,0
```
Both paths drop the UDT template cache + re-run the enumerator walk. Path A
is the operator-facing surface (the same `IDriverControl.RebrowseAsync`
contract, just dispatched from the OPC UA write surface instead of an
in-process call). Path B spins up its own driver instance so it doesn't
share the live server's cache, which makes it useful for one-off
controller-side validation.
The `AbCip.RefreshTriggers` driver-diagnostics counter increments per
successful truthy write, so the Admin UI / driver-diagnostics RPC can show
a "Refreshes since boot" tile that pairs naturally with the existing
`WritesSuppressed` / `WritesPassedThrough` write-coalescer counters.
### Verification
- **Unit**: `AbCipSystemTagSourceTests`
(`tests/.../AbCip.Tests`) — covers snapshot round-trip, two-device
isolation, recognised-name lookup, default-shape on unseeded devices,
discovery emits the six canonical nodes, and `ReadAsync` dispatches
through the source instead of libplctag.
- **Unit**: `AbCipRefreshTagDbTests`
(`tests/.../AbCip.Tests`) — PR abcip-4.4 — covers discovery emits the
trigger as Operate, reads always return `false`, truthy/falsy/null write
semantics, the `AbCip.RefreshTriggers` counter, two-device counter
independence, defends-in-depth `BadNotWritable` for read-only system
variables, no-op-Good when no builder is cached yet, and mixed-batch
routing alongside ordinary tag writes.
- **Integration**: `AbCipSystemTagDiscoveryTests`
(`tests/.../AbCip.IntegrationTests`) — `[AbServerFact]` connects to a
real `ab_server`, browses `_System/`, reads each variable, asserts
every one returns Good with a non-null value.
- **Integration**: `AbCipRefreshTagDbTests`
(`tests/.../AbCip.IntegrationTests`) — PR abcip-4.4 — `[AbServerFact]`
drives a `_RefreshTagDb` write, asserts the template cache drops + the
per-device counter advances against a live `ab_server`.
- **E2E**: `scripts/e2e/test-abcip.ps1` — see the *SystemTagBrowse* +
*RefreshTagDbWrite* assertions.
+405
View File
@@ -0,0 +1,405 @@
# AB CIP — Performance knobs
Phase 3 of the AB CIP driver plan introduces a small set of operator-tunable
performance knobs that change how the driver talks to the controller without
altering the address space or per-tag semantics. They consolidate decisions
that Kepware exposes as a slider / advanced page so deployments running into
high-latency PLCs, narrow-CPU CompactLogix parts, or legacy ControlLogix
firmware have an explicit lever to pull.
This document is the home for those knobs as PRs land. PR abcip-3.1 ships the
first knob: per-device **CIP Connection Size**.
## Connection Size
### What it is
CIP Connection Size — the byte ceiling on a single Forward Open response
fragment, set during the EtherNet/IP Forward Open handshake. Larger
connection sizes pack more tags into a single CIP RTT (higher request-packing
density, fewer round-trips for the same scan list); smaller connection sizes
stay compatible with legacy or narrow-buffer firmware that rejects oversized
Forward Open requests.
### Family defaults
The driver picks a Connection Size from the per-family profile when the
device-level override is unset:
| Family | Default | Rationale |
|---|---:|---|
| `ControlLogix` | `4002` | Large Forward Open — FW20+ |
| `GuardLogix` | `4002` | Same wire protocol as ControlLogix |
| `CompactLogix` | `504` | 5069-L1/L2/L3 narrow-buffer parts (5370 family) |
| `Micro800` | `488` | Hard cap on Micro800 firmware |
These map straight to libplctag's `connection_size` attribute and match the
defaults Kepware uses out of the box for the same families.
### Override knob
`AbCipDeviceOptions.ConnectionSize` (`int?`, default `null`) overrides the
family default for one device. Bind it through driver config JSON:
```json
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PlcFamily": "ControlLogix",
"ConnectionSize": 504
}
]
}
```
The override threads through every libplctag handle the driver creates for
that device — read tags, write tags, probe tags, UDT-template reads, the
`@tags` walker, and BOOL-in-DINT parent runtimes. There is no per-tag
override; one Connection Size applies to the whole controller (matches CIP
session semantics).
### Valid range
`[500..4002]` bytes. This matches the slider Kepware exposes for the same
family. Values outside the range fail driver `InitializeAsync` with an
`InvalidOperationException` — there's no silent clamp; misconfigured devices
fail loudly so operators see the problem at deploy time.
| Value | Behaviour |
|---|---|
| `null` | Use family default (4002 / 504 / 488) |
| `499` or below | Driver init fault — out-of-range |
| `500..4002` | Threaded through to libplctag |
| `4003` or above | Driver init fault — out-of-range |
### Legacy-firmware caveat
ControlLogix firmware **v19 and earlier** caps the CIP buffer at **504
bytes** — Connection Sizes above that cause the controller to reject the
Forward Open with CIP error 0x01/0x113. The 5069-L1/L2/L3 CompactLogix narrow
parts are subject to the same cap.
The driver emits a warning via `AbCipDriverOptions.OnWarning` when the
configured Connection Size **exceeds 511** *and* the device's family profile
default is also at-or-below the legacy cap (i.e. CompactLogix with default
504, or Micro800 with default 488). Production hosting should wire
`OnWarning` to the application logger; the unit tests (`AbCipConnectionSizeTests`)
collect into a list to assert which warnings fired.
The warning fires once per device at `InitializeAsync`. It does not block
initialisation — operators may need the override anyway when running newer
CompactLogix firmware that does support the larger Forward Open. The
controller will reject the connection at runtime if it can't honour the size,
and that surfaces through the standard `IHostConnectivityProbe` channel.
### Performance trade-off
| Larger Connection Size | Smaller Connection Size |
|---|---|
| More tags per CIP RTT — higher throughput | Compatible with legacy / narrow firmware |
| Bigger buffers held by libplctag native (RSS impact) | Lower memory footprint |
| Forward Open rejected on FW19- ControlLogix | Always works (assuming ≥500) |
| Required for high-density scan lists | Forces more round-trips — higher latency |
For most FW20+ ControlLogix shops, the default `4002` is correct and the
override is unnecessary. The override is mainly useful when:
1. **Migrating off Kepware** with a controller-specific slider value already
tuned in production — set Connection Size to match.
2. **Mixed-firmware fleets** where some controllers are still on FW19 — set
the legacy controllers explicitly to `504`.
3. **CompactLogix L1/L2/L3** running newer firmware that supports a larger
Forward Open than the family-default 504 — bump the override up.
4. **Micro800** never goes above `488`; the override is for documentation /
discoverability rather than capability change.
### libplctag wrapper limitation
The libplctag .NET wrapper (1.5.x) does not expose `connection_size` as a
public `Tag` property. The driver propagates the value via reflection on the
wrapper's internal `NativeTagWrapper.SetIntAttribute("connection_size", N)`
after `InitializeAsync` — equivalent to libplctag's
`plc_tag_set_int_attribute`. Because libplctag native parses
`connection_size` only at create time, this is **best-effort** until either:
- the libplctag .NET wrapper exposes `ConnectionSize` directly (planned in
the upstream backlog), in which case the reflection no-ops cleanly, or
- libplctag native gains post-create hot-update for `connection_size`, in
which case the call lands as intended.
In the meantime the value is correctly stored on `DeviceState.ConnectionSize`
+ surfaces in every `AbCipTagCreateParams` the driver builds, so the override
is observable end-to-end through the public driver surface and unit tests
even if the underlying wrapper isn't yet honouring it on the wire.
Operators who need *guaranteed* Connection Size enforcement against FW19
controllers today can pin `libplctag` to a wrapper version that exposes
`ConnectionSize` once one is available, or run a libplctag native build
patched for runtime updates. Both paths are tracked in the AB CIP plan.
### See also
- [`docs/Driver.AbCip.Cli.md`](../Driver.AbCip.Cli.md) — AB CIP CLI uses the
family default ConnectionSize on each invocation; per-device overrides only
apply through the driver's device-config JSON, not the CLI's command-line.
- [`docs/drivers/AbServer-Test-Fixture.md`](AbServer-Test-Fixture.md) §5 —
ab_server simulator does not enforce the narrow CompactLogix cap, so
Connection Size correctness is verified by unit tests + Emulate-rig live
smokes only.
- [`PlcFamilies/AbCipPlcFamilyProfile.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/PlcFamilies/AbCipPlcFamilyProfile.cs) —
per-family default values.
- [`AbCipConnectionSize`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipConnectionSize.cs) —
range bounds + legacy-firmware threshold constants.
## Addressing mode
### What it is
CIP exposes two equivalent ways to address a Logix tag on the wire:
1. **Symbolic** — the request carries the tag's ASCII name and the controller
parses + resolves the path on every read. This is the libplctag default
and what every previous driver build has used.
2. **Logical** — the request carries a CIP Symbol Object instance ID (a small
integer assigned by the controller when the project was downloaded). The
controller skips ASCII parsing entirely; the lookup is a single
instance-table dereference.
Logical addressing is faster on the controller side and produces smaller
request frames. The trade-off is that the driver has to learn the
name → instance-id mapping once, by reading the `@tags` pseudo-tag at
startup, and the resolution step has to repeat after a controller program
download (instance IDs are re-assigned).
### Enum values
`AbCipDeviceOptions.AddressingMode` (`AddressingMode` enum, default
`Auto`) takes one of three values:
| Value | Behaviour |
|---|---|
| `Auto` | Driver picks. **Currently resolves to `Symbolic`** — a future PR will plumb a real auto-detection heuristic (firmware version + symbol-table size). |
| `Symbolic` | Force ASCII symbolic addressing on the wire. The historical default. |
| `Logical` | Use CIP logical-segment / instance-ID addressing. Triggers a one-time `@tags` walk at the first read; subsequent reads consult the cached map. |
`Auto` is documented as "Symbolic-for-now" so deployments setting `Auto`
explicitly today will silently flip to a real heuristic when one ships,
matching the spirit of the toggle. Operators who want to pin the wire
behaviour should set `Symbolic` or `Logical` directly.
### Family compatibility
Logical addressing depends on the controller implementing CIP Symbol Object
class 0x6B with stable instance IDs. Older AB families don't:
| Family | Logical addressing supported? | Why |
|---|---|---|
| `ControlLogix` | yes | Native class 0x6B support, FW10+ |
| `CompactLogix` | yes | Same wire protocol as ControlLogix |
| `GuardLogix` | yes | Same wire protocol; safety partition is tag-level, not addressing-level |
| `Micro800` | **no** | Firmware does not implement class 0x6B; instance-ID reads trip CIP "Path Segment Error" 0x04 |
| `SLC500` / `PLC5` | **no** | Pre-CIP families; PCCC bridging only — no Symbol Object at all |
When `AddressingMode = Logical` is set on an unsupported family, the driver
**falls back to Symbolic with a warning** (via `OnWarning`) instead of
faulting. This keeps mixed-firmware deployments working — operators can ship
a uniform "Logical" config across the fleet and let the driver downgrade
the families that can't honour it.
The driver-level decision is exposed via
`PlcFamilies.AbCipPlcFamilyProfile.SupportsLogicalAddressing` and resolved at
`AbCipDriver.InitializeAsync` time; the resolved mode is stored on
`DeviceState.AddressingMode` and threaded through every
`AbCipTagCreateParams` from then on.
### One-time symbol-table walk
The first read on a Logical-mode device triggers a one-time `@tags` walk via
`LibplctagTagEnumerator` (the same component used for opt-in controller
browse). The driver caches the resulting name → instance-id map on
`DeviceState.LogicalInstanceMap`; subsequent reads consult the cache without
issuing another walk. The walk is gated by a per-device `SemaphoreSlim` so
parallel first-reads serialise on a single dispatch.
The walk happens in `AbCipDriver.EnsureLogicalMappingsAsync` and runs only
for devices that have actually resolved to `Logical`. Symbolic-mode devices
skip the walk entirely. Walk failures are non-fatal: the
`LogicalWalkComplete` flag still flips to `true` so the driver does not
re-attempt indefinitely, and per-tag handles fall back to Symbolic addressing
on the wire (libplctag's default).
A controller program download invalidates the instance IDs. There is no
auto-invalidation today — operators trigger a fresh walk by either
restarting the driver or calling `RebrowseAsync` (the same surface that
clears the UDT template cache) with logic-mode plumbing extended in a
future PR. For now, restart-on-download is the recommended workflow.
### libplctag wrapper limitation
The libplctag .NET wrapper (1.5.x) does **not** expose a public knob for
instance-ID addressing. The driver translates Logical-mode params into
libplctag attributes via reflection on
`NativeTagWrapper.SetAttributeString("use_connected_msg", "1")` +
`SetAttributeString("cip_addr", "0x6B,N")` — same best-effort fallback
pattern as the Connection Size knob.
This means **Logical mode is observable end-to-end through the public
driver surface and unit tests today**, but the actual wire behaviour
remains Symbolic until either:
- the upstream libplctag .NET wrapper exposes the
`UseConnectedMessaging` + `CipAddr` properties on `Tag` directly
(planned in the upstream backlog), in which case the reflection no-ops
cleanly, or
- libplctag native gains post-create hot-update for `cip_addr`, in which
case the call lands as intended.
The driver-level bookkeeping (resolved mode, instance-id map, family
compatibility, fall-back warning) is fully wired so the upgrade path is
purely a wrapper-version bump.
### Performance trade-off
| Symbolic addressing | Logical addressing |
|---|---|
| Works everywhere | Requires Symbol Object class 0x6B |
| ASCII parse on every read (controller-side cost) | One-time walk; instance-id lookup thereafter |
| No first-read latency | First read on a device pays the `@tags` walk |
| Smaller code surface | Stale on program download — restart driver to re-walk |
| Best for small / sparse tag sets | Best for >500-tag scans with stable controller |
For scan lists in the **single-digit-tag** range, the per-poll ASCII parse
cost is invisible. For **medium** scan lists (~100 tags) the gain is real
but small — typically 510% per CIP RTT depending on tag-name length. The
break-even point is where the ASCII-parse overhead starts dominating,
roughly **>500 tags** in a tight scan loop, which is also where libplctag's
own request-packing benefits compound. Large MES / batch projects with
many UDT instances are the canonical case.
### Driver config JSON
Bind the toggle through the driver-config JSON:
```json
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PlcFamily": "ControlLogix",
"AddressingMode": "Logical"
}
]
}
```
`"Auto"`, `"Symbolic"`, and `"Logical"` parse case-insensitively. Omitting
the field defaults to `"Auto"`.
### See also
- [`AbCipDriverOptions.AddressingMode`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverOptions.cs) —
enum definition + per-value docstrings.
- [`AbCipPlcFamilyProfile.SupportsLogicalAddressing`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/PlcFamilies/AbCipPlcFamilyProfile.cs) —
family compatibility table source-of-truth.
- [`docs/drivers/AbServer-Test-Fixture.md`](AbServer-Test-Fixture.md) §
"What it actually covers" — Logical-mode fixture coverage status.
- [`AbCipAddressingModeBenchTests`](../../tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/AbCipAddressingModeBenchTests.cs) —
scaffold for the wall-clock comparison; gated on `[AbServerFact]`.
## Read strategy (PR abcip-3.3)
A per-device toggle that controls how multi-member UDT batches are read.
The default `Auto` value matches every previous build's behaviour for dense
reads but switches to per-member bundling when only a handful of members of
a large UDT are subscribed — the canonical "5 of 50" sparse-subscription
case where reading the whole UDT buffer just to extract a few fields wastes
wire bandwidth.
### Three modes
| Mode | When to use |
|---|---|
| `WholeUdt` | Most members of every subscribed UDT are read together. One libplctag read per parent UDT, members decoded in-memory at their byte offsets. The task #194 default. |
| `MultiPacket` | A few members of a large UDT are subscribed at a time. One read per subscribed member, bundled per parent into one CIP Multi-Service Packet. |
| `Auto` (default) | Planner picks per-batch from the subscribed-member fraction (see *Sparsity threshold*). |
### Sparsity threshold
Auto mode divides `subscribedMembers / totalMembers` for each parent UDT and
picks `MultiPacket` when the fraction is **strictly less than** the
threshold, else `WholeUdt`. Default threshold `0.25` — a 1/4 subscription is
the rough break-even where the wire-cost of one whole-UDT read still beats
N member reads on a ControlLogix 4002-byte connection-size buffer; above
1/4, the per-member overhead dominates.
Tune via `AbCipDeviceOptions.MultiPacketSparsityThreshold` (clamped to
`[0..1]`). Threshold `0.0` = "never MultiPacket"; `1.0` = "always MultiPacket
when any member is subscribed."
### Family compatibility
`MultiPacket` requires CIP service `0x0A` (Multi-Service Packet) on the
controller. Source of truth is
[`AbCipPlcFamilyProfile.SupportsRequestPacking`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/PlcFamilies/AbCipPlcFamilyProfile.cs):
| Family | `SupportsRequestPacking` |
|---|---|
| ControlLogix | yes |
| CompactLogix | yes |
| GuardLogix | yes (wire identical to ControlLogix) |
| Micro800 | **no** |
| SLC500 / PLC5 (when those profiles ship) | **no** |
User-forced `MultiPacket` against a non-packing family logs a warning at
device init and falls back to `WholeUdt`. `Auto` against a non-packing
family stays `Auto` at the device level — the per-batch heuristic caps the
strategy to `WholeUdt` so the wire never sees a Multi-Service-Packet against
a controller that can't decode it.
### libplctag wrapper limitation
The libplctag .NET wrapper (1.5.x) does not expose the `0x0A` service as a
public knob — same wrapper-version constraint that gates PR abcip-3.1's
`connection_size` and PR abcip-3.2's instance-ID addressing. Today's
MultiPacket runtime therefore issues N libplctag reads sequentially when
the planner picks the strategy; the wire-level bundling lands cleanly when
an upstream wrapper release exposes the primitive.
The driver-level bookkeeping (resolved strategy, per-batch heuristic,
family-compat fall-back, per-device dispatch counters) is fully wired so
the upgrade path is a wrapper-version bump only — the planner already
produces the right plan, and `AbCipMultiPacketReadPlanner.Build` is
covered by unit tests that pin the plan shape rather than wire bytes.
### Driver config JSON
```json
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PlcFamily": "ControlLogix",
"ReadStrategy": "Auto",
"MultiPacketSparsityThreshold": 0.25
}
]
}
```
`"Auto"`, `"WholeUdt"`, and `"MultiPacket"` parse case-insensitively.
Omitting the field defaults to `"Auto"`. Omitting
`MultiPacketSparsityThreshold` defaults to `0.25`.
### See also
- [`AbCipDriverOptions.ReadStrategy`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverOptions.cs) —
enum definition + per-value docstrings.
- [`AbCipMultiPacketReadPlanner`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipMultiPacketReadPlanner.cs) —
plan shape + Auto-mode heuristic.
- [`AbCipPlcFamilyProfile.SupportsRequestPacking`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/PlcFamilies/AbCipPlcFamilyProfile.cs) —
family compatibility table source-of-truth.
- [`AbCipReadStrategyTests`](../../tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/AbCipReadStrategyTests.cs) —
device-init resolution, heuristic edges, dispatch counters, DTO round-trip.
- [`AbCipEmulateMultiPacketReadTests`](../../tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/Emulate/AbCipEmulateMultiPacketReadTests.cs) —
golden-box-tier wire-level coverage scaffold; gated on `AB_SERVER_PROFILE=emulate`.
+141
View File
@@ -0,0 +1,141 @@
# AB Legacy — DH+ via 1756-DHRIO bridging
PR ablegacy-13 / [#256](https://github.com/dohertj2/lmxopcua/issues/256). The AB
Legacy driver can address a PLC-5 sitting on a DH+ link by routing CIP requests
through a 1756-DHRIO module installed in a ControlLogix chassis. This is the
canonical way to keep an installed-base PLC-5 fleet alive after the chassis-
level migration to ControlLogix; the DHRIO module exposes a DH+ "side" that
talks to the legacy PLC-5 / SLC-DH+ peers and a backplane "side" that the
ControlLogix CPU + Ethernet bridge can route through.
## Wire layout
```
OtOpcUa server ──EtherNet/IP──► 1756-EN2T (slot 0) ──backplane──► 1756-DHRIO (slot N) ──DH+──► PLC-5
```
Two CIP hops:
1. **Backplane** — port `1`, slot `<N>` (the slot the DHRIO module lives in).
2. **DH+** — port `2`, station `<S>` (the DH+ node address of the target PLC-5,
in **octal**).
Resulting CIP path: `1,<N>,2,<S>`.
> The first port `1` is always the backplane; port `2` is the DH+ side of the
> 1756-DHRIO module. This mirrors the convention Rockwell uses in RSLinx + RSLogix
> 5.
## Octal station number
The DH+ network was specified with **octal** node addresses. Rockwell tooling
displays them in octal too (RSLogix 5 → "DH+ Node Address" field on the
controller properties dialog). The driver follows suit — the station segment
of the CIP path **must be parsed as octal** (digits 0..7 only; `8`, `9`, and
multi-byte garbage are rejected).
DH+ addresses run `0..77` octal == `0..63` decimal. Quick reference:
| Octal | Decimal | Octal | Decimal | Octal | Decimal | Octal | Decimal |
|------:|--------:|------:|--------:|------:|--------:|------:|--------:|
| 00 | 0 | 20 | 16 | 40 | 32 | 60 | 48 |
| 01 | 1 | 21 | 17 | 41 | 33 | 61 | 49 |
| 02 | 2 | 22 | 18 | 42 | 34 | 62 | 50 |
| 03 | 3 | 23 | 19 | 43 | 35 | 63 | 51 |
| 04 | 4 | 24 | 20 | 44 | 36 | 64 | 52 |
| 05 | 5 | 25 | 21 | 45 | 37 | 65 | 53 |
| 06 | 6 | 26 | 22 | 46 | 38 | 66 | 54 |
| 07 | 7 | 27 | 23 | 47 | 39 | 67 | 55 |
| 10 | 8 | 30 | 24 | 50 | 40 | 70 | 56 |
| 11 | 9 | 31 | 25 | 51 | 41 | 71 | 57 |
| 12 | 10 | 32 | 26 | 52 | 42 | 72 | 58 |
| 13 | 11 | 33 | 27 | 53 | 43 | 73 | 59 |
| 14 | 12 | 34 | 28 | 54 | 44 | 74 | 60 |
| 15 | 13 | 35 | 29 | 55 | 45 | 75 | 61 |
| 16 | 14 | 36 | 30 | 56 | 46 | 76 | 62 |
| 17 | 15 | 37 | 31 | 57 | 47 | 77 | 63 |
Anything past `77` octal (i.e. decimal > 63) is invalid on a real DH+ network
and rejected by the parser.
## PLC-5 only
DHRIO bridging is **PLC-5-only**. The driver enforces this at
`AbLegacyDriver.InitializeAsync` time: a DH+ bridge path combined with
`PlcFamily=Slc500 / MicroLogix / LogixPccc` throws
`InvalidOperationException("DHRIO bridging is PLC-5-only")` immediately rather
than letting reads silently fail with `BadCommunicationError` on the wire.
Background: the 1756-DHRIO module only speaks DH+ to PLC-5 / SLC-DH+ peers, and
libplctag's PCCC stack only exposes the PLC-5 side. SLC 5/04 boxes on DH+
**can** be physically reached through a DHRIO module, but the protocol stack
needed to drive them isn't exposed by libplctag — out of scope for this driver.
## CLI worked example
PLC-5 at DH+ node `07` (octal == 7 decimal), DHRIO module in slot 3, gateway
`192.168.1.10`:
```powershell
otopcua-ablegacy-cli probe `
-g ab://192.168.1.10/1,3,2,07 `
-P Plc5 `
-a N7:0
```
```powershell
# Read N7:10 from the PLC-5 across the DHRIO bridge
otopcua-ablegacy-cli read `
-g ab://192.168.1.10/1,3,2,07 `
-P Plc5 `
-a N7:10 `
-t Int
```
The driver surfaces the parsed bridge form on the host-address record:
`BackplaneSlot=3`, `DhPlusPort=2`, `DhPlusStation=7` (decimal-translated). Use
those values when reading driver-diagnostics output to confirm the bridge was
recognised — a non-bridge CIP path leaves all three fields null.
## Manual smoke procedure
There is no automated end-to-end coverage for DH+ bridging because the only
path to wire-level validation is real hardware (libplctag's `ab_server` Docker
image doesn't simulate the DHRIO + DH+ + PLC-5 stack). The unit-test layer
covers parser positive / negative cases.
Hardware smoke checklist:
1. Confirm the 1756-DHRIO module is present in the target ControlLogix chassis.
RSLinx Classic should show `DH+, 1` under the chassis tree with the PLC-5
nodes enumerated underneath.
2. Note the DHRIO module's slot number (the `<N>` in `1,<N>,2,<S>`).
3. Note the target PLC-5's DH+ node address — read it off the front-panel switch
bank, or the controller properties in RSLogix 5. **Read it as octal**.
4. From an OtOpcUa box that can reach the EtherNet/IP gateway:
```powershell
otopcua-ablegacy-cli probe -g ab://<gateway>/1,<slot>,2,<station-octal> -P Plc5 -a S:0
```
`S:0` (status file word 0) is non-destructive and present on every PLC-5.
5. If the probe succeeds, exercise an N file read against a known
non-zero address. Compare against the value displayed in RSLogix 5 →
Online → Data → N7.
If the probe fails with `BadCommunicationError`:
- Wrong slot number — re-check via RSLinx.
- Wrong octal node — convert from RSLogix 5's display value (already octal); a
decimal-thinking conversion mistake is the most common smoke failure.
- DHRIO module's DH+ baud rate doesn't match the PLC-5's switch setting (57.6k
/ 115.2k / 230.4k) — driver-side problem this can't paper over.
- A scanner on the DHRIO is in scheduled-mode and starving unscheduled
PCCC traffic — bump the DHRIO's unscheduled-message slice in RSLogix 5000.
## See also
- [`Driver.AbLegacy.Cli.md`](../Driver.AbLegacy.Cli.md) — the family / CIP-path
cheat sheet now carries a DHRIO row.
- [`drivers/AbLegacy-Test-Fixture.md`](AbLegacy-Test-Fixture.md) — DH+ bridging
is unit-only; no Docker fixture supports it.
+188
View File
@@ -0,0 +1,188 @@
# AB Legacy diagnostic counters
Per-device diagnostic counters surface as auto-generated read-only OPC UA
variables under each device's synthetic `_Diagnostics/` folder. HMIs can bind
directly without going through a separate diagnostics RPC. Mirrors the AB CIP
`_System/` pattern from PR abcip-4.3.
Closes #253 (PR ablegacy-10).
## The nine counters
Each device managed by the `AbLegacyDriver` exposes nine read-only nodes under
`AbLegacy/<host>/_Diagnostics/<name>`. The first seven shipped in PR ablegacy-10;
`DemoteCount` + `LastDemotedUtc` arrived with PR ablegacy-12 / #255 (auto-demote
on comm failure).
| Name | Type | Semantics |
|---|---|---|
| `RequestCount` | Int64 | Total `ReadAsync` requests issued against this device. One increment per non-diagnostic reference per call, success or failure. |
| `ResponseCount` | Int64 | Successful read responses. |
| `ErrorCount` | Int64 | Failed read responses (any non-Good status). |
| `RetryCount` | Int64 | Retry attempts beyond the first per the PR 9 retry loop. A single read with two retries adds two. |
| `LastErrorCode` | Int32 | Most recent libplctag status code on a failed read; `0` when no error has been seen since the last reset. |
| `LastErrorMessage` | String | Most recent libplctag error message on a failed read; empty when no error has been seen since the last reset. |
| `CommFailures` | Int64 | Count of read failures mapped to `BadCommunicationError`. Spans transient libplctag throws + retried-out chains so operators see a single "wire fell off" counter. |
| `DemoteCount` | Int64 | **PR ablegacy-12** — cumulative auto-demote events for this device. Bumps every time the driver crosses the consecutive-failure threshold and arms a fresh cool-down window. Cumulative across `ReinitializeAsync` (preserved through redeploys) so a flapping link surfaces as a steadily climbing counter. |
| `LastDemotedUtc` | String | **PR ablegacy-12** — ISO-8601 UTC timestamp of the most recent auto-demotion. Empty string when this device has never been demoted. |
**Address shape**: `_Diagnostics/<deviceHostAddress>/<name>`
e.g. `_Diagnostics/ab://10.0.0.5/1,0/RequestCount`.
The `<deviceHostAddress>` segment is the canonical `ab://host[:port]/cip-path`
string from `AbLegacyDeviceOptions.HostAddress`. The browse path looks like
`AbLegacy/<deviceHostAddress>/_Diagnostics/<name>` — the same shape as a
user-config tag node, just under a reserved sibling folder.
## Reset behaviour
| Trigger | Effect |
|---|---|
| `ReinitializeAsync` | Every counter for every device resets to zero, plus `LastErrorMessage` clears to empty. **PR ablegacy-12 exception:** `DemoteCount` + `LastDemotedUtc` survive the reinit so an operator redeploying mid-incident doesn't lose the flapping-link history. |
| `ShutdownAsync` | All counters drop with the device map (including `DemoteCount`). |
| Driver process restart | Counters start at zero. |
| Probe transition Stopped→Running | **No automatic reset** — counters are cumulative across reconnect events so operators can spot intermittent links by watching `CommFailures` keep climbing. |
| Probe transition Demoted→Running | **PR ablegacy-12** — early-clear of the active demote window, but the cumulative `DemoteCount` stays put. |
There is no in-process "reset" RPC at the time of writing. If you need to
clear counters without a redeploy, kick a `ReinitializeAsync` from the Admin
RPC surface — the driver re-EnsureDevice's each host so the freshly registered
counters start at zero.
## What does *not* increment counters
Reads against `_Diagnostics/<host>/<name>` are **driver-local observability**,
not field traffic — they short-circuit before the libplctag dispatch and do
NOT increment `RequestCount` or any other counter. Otherwise a 1 Hz HMI poll
of `RequestCount` would make the counter chase its own tail.
Writes against `_Diagnostics/*` are rejected with `BadNotWritable` because
every diagnostic node is `SecurityClassification.ViewOnly` — a misbehaving
SCADA template can't accidentally clobber the diagnostic surface.
## Collision with user tags
User-config tags must not shadow the seven reserved diagnostic names and
must not live under the synthetic `_Diagnostics/` folder. Both shapes are
rejected at `InitializeAsync` time with a clear `InvalidOperationException`:
- A tag named `RequestCount` (or any of the other six reserved names) is
rejected because it would silently never resolve at read time — the
diagnostics short-circuit wins.
- A tag whose `Address` starts with `_Diagnostics/` is rejected because the
whole prefix is owned by the auto-emitted counters.
Pick a different name (`SiteRequestCount`, `MachineRequestCount`) or a
different address path (real PCCC files like `N7:0`).
## HMI binding examples
### OPC UA Client CLI
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840 `
-n "ns=2;s=AbLegacy/ab://10.0.0.5/1,0/_Diagnostics/RequestCount"
```
### AB Legacy CLI (driver-direct, no OPC UA layer)
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli -- read `
-g "ab://10.0.0.5/1,0" -P Slc500 `
--address "_Diagnostics/RequestCount"
```
The driver-direct path lets you sanity-check the counter without standing up
an OPC UA server — useful when triaging a wire-level issue on the bench.
### Subscription pattern
Subscribe to all seven counters at a slow rate (e.g. 510 s) on a long-lived
overview dashboard, plus a faster rate (1 s) on `LastErrorMessage` /
`LastErrorCode` when actively debugging a flapping link. The diagnostics
short-circuit makes every read O(1) — there's no penalty for fast polling
of the counter itself, only the OPC UA subscription bookkeeping.
## Auto-demote on comm failure (PR ablegacy-12 / #255)
When a device fails N consecutive reads or probes the driver marks it
**Demoted** for a configurable cool-down window. Reads against a demoted
device short-circuit with `BadCommunicationError` *without invoking
libplctag* — that's the whole point of the feature: one slow PLC sharing
the driver thread can't starve faster peers reading from healthy hosts on
the same `AbLegacyDriver` instance.
### Configuration
Per-device, optional. `null` keeps the documented defaults (auto-demote
**enabled** with 3 failures / 30 s).
```jsonc
{
"Devices": [
{
"HostAddress": "ab://10.0.0.5/1,0",
"PlcFamily": "Slc500",
"Demote": {
"FailureThreshold": 3, // default 3
"DemoteForMs": 30000, // default 30s
"Enabled": true // default true
}
}
]
}
```
| Knob | Default | Notes |
|---|---|---|
| `FailureThreshold` | `3` | Consecutive comm failures before the device is demoted. A successful read or probe resets the tally. Terminal failures (`BadNodeIdUnknown`, `BadTypeMismatch`, …) **do not count** — they're config / decoder mismatches, not field outages. |
| `DemoteForMs` | `30000` (30s) | Cool-down window. Reads while this is active short-circuit; a successful probe clears it early. |
| `Enabled` | `true` | Set to `false` to keep the diagnostic counters but skip the auto-throttle. The failure tally still ticks but never arms the cool-down. |
### Recovery
Three ways out of Demoted, in order of likelihood:
1. **Probe success** — the per-device probe loop (`Probe.Enabled = true`,
default address `S:0`) is the fast path. The next probe iteration after
demotion will exercise the wire; on success it clears
`DemotedUntilUtc` immediately and transitions the host to `Running`.
2. **Window expiry** — once `DemoteForMs` elapses the demote marker
clears on the next read attempt. The read goes through; if it fails,
the failure tally keeps counting from where it left off (so a
permanently-down device re-arms the window after one more consecutive
failure rather than having to repeat the full threshold).
3. **`ReinitializeAsync`** — clears `ConsecutiveFailures` +
`DemotedUntilUtc` outright. Cumulative `DemoteCount` survives.
### Observability
`DemoteCount` is the headline counter — it bumps once per demotion event,
not per short-circuited read. A device that flaps every hour for a week
shows `DemoteCount = ~168` on Friday afternoon, which is the operator
signal you actually want.
`LastDemotedUtc` is the ISO-8601 UTC timestamp of the most recent
demotion. Bind it on a per-device tile alongside `DemoteCount` for
"flapping link" alerting.
### Host-state surface
A demoted device reports `HostState.Demoted` (new in PR ablegacy-12
on `Core.Abstractions/IHostConnectivityProbe.cs`). Consumers that
predate the new value (the central `HostStatusPublisher`) safely treat
it as `Stopped` — no schema migration needed.
## Cross-references
- [`AbLegacyDiagnosticTags.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDiagnosticTags.cs)
— counter store + read short-circuit
- [`AbLegacyDriver.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs)
— increment sites in `ReadAsync`, discovery emission in `DiscoverAsync`,
auto-demote bookkeeping in `RecordFailureAndMaybeDemote` + `ProbeLoopAsync`
- [`AbLegacy-Test-Fixture.md`](AbLegacy-Test-Fixture.md) — `AbLegacyDiagnosticsTests`
+ `AbLegacyAutoDemoteTests` + collision-rejection contract
- [AB CIP `_System/` parallel](../../src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipSystemTagSource.cs)
— same pattern with the CIP-specific six entries (incl. writeable
`_RefreshTagDb` trigger)
+163
View File
@@ -0,0 +1,163 @@
# AB Legacy — RSLogix symbol & data-table import
ablegacy-11 / [#254](https://github.com/dohertj2/lmxopcua/issues/254) — bulk-import
RSLogix 500 / 5 symbol exports into the AB Legacy driver. Saves operators from
hand-typing every `N7:0` / `F8:12` / `B3:0/5` row of a several-hundred-tag PLC
into `appsettings.json`.
## Supported formats — v1
| Format | Status | Notes |
|---|---|---|
| `.CSV` "Database Export" | **supported** | Header columns `Symbol,Address,Description,DataType,Scope`; quoted fields, doubled-quote escapes, comment lines (`;` / `#`) all honoured |
| `.SLC` text export | **supported** | RSLogix 500's "Save As Text" emits the same column shape — point the importer at the file directly |
| `.RSS` (RSLogix 500 binary project) | **out of scope** | Proprietary; no parser ships in libplctag or any community project. Export to CSV first |
| `.RSP` (RSLogix 5 binary project) | **out of scope** | Same as `.RSS` |
The binary `.RSS` / `.RSP` non-goal isn't a "we don't have time" decision —
Rockwell's binary format is undocumented + tied to RSLogix's internal page
layout, and the only known parsers are commercial IDE plugins. v1 ships with
text/CSV only and a clean abstraction (`IRsLogixImporter`) so a binary parser
can slot in later without reshaping the call sites.
## CSV column reference
| Column | Required | Notes |
|---|---|---|
| `Symbol` | yes | OPC UA tag name. RSLogix symbols are already stable; the importer uses them verbatim |
| `Address` | yes | PCCC address. File letter implies `DataType` (see below); the importer's resolution wins over the CSV's `DataType` column |
| `Description` | no | Parsed but currently unused — `AbLegacyTagDefinition` has no `Description` field at the v2 schema layer (see [#248](https://github.com/dohertj2/lmxopcua/issues/248)). Held in the column contract for future schema bumps |
| `DataType` | no | RSLogix-supplied (`INT` / `REAL` / `BOOL` / `TIMER` / …). Ignored at import time; the importer derives the type from the file letter |
| `Scope` | no | `Global` (default when blank) or `Local:N` for ladder-file-N-scoped tags. Acts as a filter when `--scope` is set on the CLI |
### File-letter → `AbLegacyDataType` mapping
| Letter | Example | Maps to | Notes |
|---|---|---|---|
| `N` | `N7:0` | `Int` (signed 16-bit) | |
| `F` | `F8:0` | `Float` (32-bit IEEE-754) | |
| `B` | `B3:0/0` | `Bit` | Bit-within-word also forces Bit when `BitIndex` is set |
| `L` | `L9:0` | `Long` (signed 32-bit) | SLC 5/05+ only |
| `ST` | `ST10:0` | `String` | 82-byte fixed-length + length word |
| `T` | `T4:0.ACC` | `TimerElement` | Sub-element implied by `.ACC` / `.PRE` / `.EN` / `.DN` |
| `C` | `C5:0.ACC` | `CounterElement` | |
| `R` | `R6:0.LEN` | `ControlElement` | |
| `A` | `A14:0` | `AnalogInt` | Older hardware |
| `I` / `O` / `S` | `I:0/0` | `Int` (or `Bit` with bit suffix) | I/O + status files |
| `PD` / `MG` / `PLS` / `BT` | `PD9:0` | `PidElement` etc. | Family-gated; PD/MG common on SLC500 + PLC-5; PLS/BT PLC-5 only |
| `RTC` / `HSC` / `DLS` / … | `RTC:0.YR` | `MicroLogixFunctionFile` | MicroLogix 1100 / 1400 only |
A bit suffix (`/N`) on any file letter forces `Bit`, regardless of the file
letter's normal classification — `N7:0/3` parses as Bit, not Int.
## Scope filter
The `Scope` column distinguishes program-scoped tags (`Local:1`, `Local:2`, …)
from globals. RSLogix exports usually mix both. The CLI's `--scope` flag (and
`ImportOptions.ScopeFilter` at the API level) keeps only the rows whose
`Scope` value matches case-insensitively; rows with no `Scope` column count as
`Global`.
```powershell
# Import only the Global symbols
otopcua-ablegacy-cli import-rslogix `
--file plc-export.csv `
--device ab://192.168.1.20/1,0 `
--scope Global
# Import only the file-2 program-scope tags
otopcua-ablegacy-cli import-rslogix `
--file plc-export.csv `
--device ab://192.168.1.20/1,0 `
--scope Local:2
```
## CLI subcommand — `import-rslogix`
```powershell
otopcua-ablegacy-cli import-rslogix --help
```
| Flag | Default | Purpose |
|---|---|---|
| `-f` / `--file` | **required** | Path to the CSV export |
| `-d` / `--device` | **required** | Canonical AB Legacy gateway URI every imported tag binds to |
| `--emit` | `appsettings-fragment` | `appsettings-fragment` (JSON) or `summary` (one-line counter) |
| `-o` / `--output` | stdout | Optional path; when set the JSON fragment is written there + summary line goes to stdout |
| `--scope` | none | Optional Scope filter (case-insensitive) |
| `--max-rows` | unlimited | Defensive cap on rows imported |
| `--strict` | off | Fail-fast on the first malformed row (default permissive: skip + log) |
### `appsettings-fragment` output shape
The default `--emit appsettings-fragment` mode writes a JSON object whose
`Tags` array is shaped like the `AbLegacyDriverConfigDto.Tags` array — paste
straight into the driver-instance config under
`Drivers/<instance>/Config/Tags`.
```json
{
"Tags": [
{
"Name": "MotorSpeed",
"DeviceHostAddress": "ab://192.168.1.20/1,0",
"Address": "N7:0",
"DataType": "Int",
"Writable": true
},
]
}
```
### Summary line
`--emit summary` writes a single line:
```
Imported 142 tag(s), skipped 3, errors 0.
```
`Skipped` covers Scope-filter rejections + missing-required-field rows; `errors`
covers rows whose `Address` failed to parse as a PCCC address.
## API surface — `IRsLogixImporter` + `AddRsLogixImport`
For server-side / bootstrap use-cases the importer is also reachable via:
```csharp
using ZB.MOM.WW.OtOpcUa.Driver.AbLegacy;
using ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Import;
var options = new AbLegacyDriverOptions
{
Devices = [new AbLegacyDeviceOptions("ab://192.168.1.20/1,0")],
};
// Append imported tags onto an existing options object.
var updated = options.AddRsLogixImport(
path: @"C:\plc\plc-export.csv",
deviceHostAddress: "ab://192.168.1.20/1,0",
out var result);
// result.ParsedCount / SkippedCount / ErrorCount surface the import telemetry.
Console.WriteLine($"Imported {result.ParsedCount} tags");
```
For a hand-managed importer instance (e.g. supplying a custom `ILogger`) call
`new RsLogixSymbolImport(logger).Parse(stream, deviceHostAddress, opts)`
directly.
## Operational notes
- The importer is **additive**`AddRsLogixImport` concatenates onto the
existing `Tags` list rather than replacing it. Hand-rolled tags (system-status
variables, computed fields the operator added by hand) survive a re-import.
- Re-imports are not idempotent today — calling `AddRsLogixImport` twice will
produce duplicate tag rows. Operators are expected to either start from a
clean options object or de-duplicate themselves; a future schema rev may add
a `replace=true` switch.
- Description metadata is dropped on the floor — see the column reference
above. When [#248](https://github.com/dohertj2/lmxopcua/issues/248) lands a
`Description` field on `AbLegacyTagDefinition` the importer will start
populating it without further changes to the CSV contract.
+106
View File
@@ -36,6 +36,12 @@ supplies a `FakeAbLegacyTag`.
- `AbLegacyAddressTests` — PCCC address parsing for SLC / MicroLogix / PLC-5
/ LogixPccc-mode (`N7:0`, `F8:12`, `B3:0/5`, etc.)
- `AbLegacyArrayTests` — PR 7 array contiguous-block addressing: parser
positives + rejects for `,N` / `[N]` suffixes, options-override
(`ArrayLength`), driver `IsArray` discovery, and array decoding for N / F /
L / B files (Rockwell convention: one BOOL per word for `B3:0,10`). Latency
benchmark against the Docker fixture is a perf-flagged integration case in
`AbLegacyArrayReadTests` — runs only when ab_server is reachable.
- `AbLegacyCapabilityTests` — data type mapping, read-only enforcement
- `AbLegacyReadWriteTests` — read + write happy + error paths against the fake
- `AbLegacyBitRmwTests` — bit-within-DINT read-modify-write serialization via
@@ -43,6 +49,63 @@ supplies a `FakeAbLegacyTag`.
- `AbLegacyHostAndStatusTests` — probe + host-status transitions driven by
fake-returned statuses
- `AbLegacyDriverTests``IDriver` lifecycle
- `AbLegacyDiagnosticsTests` — PR ablegacy-10 / #253 per-device diagnostic
counters: 5 reads (3 ok / 2 fail) → `RequestCount=5`, `ResponseCount=3`,
`ErrorCount=2`; `LastErrorCode` reflects the most recent libplctag status;
`RetryCount` increments per retry attempt beyond the first; counters reset
on `ReinitializeAsync`; discovery emits the canonical diagnostic variables
per device under `_Diagnostics/` (now 9 with PR ablegacy-12); collision
rejection at `InitializeAsync` for user tags shadowing reserved names or
`_Diagnostics/` addresses; the `_Diagnostics/<host>/<name>` short-circuit
returns the live snapshot through `ReadAsync` without bumping
`RequestCount`; two devices keep counters independent.
- `AbLegacyAutoDemoteTests`**PR ablegacy-12 / #255** auto-demote on comm
failure: 3 consecutive failures arm the demote window and surface
`HostState.Demoted`; subsequent reads short-circuit with
`BadCommunicationError` *without invoking libplctag* (verified via
`factory.Tags["N7:0"].ReadCount` not advancing); successful read resets
the consecutive-failure counter; failure-success-failure pattern doesn't
cross the threshold; `DemoteCount` + `LastDemotedUtc` surface via
`_Diagnostics/`; `Enabled=false` opts out (failures still count, demotion
never fires); `ReinitializeAsync` clears the active window but preserves
cumulative `DemoteCount`; cool-down expiry allows the next read through;
two devices in one driver — one faulty, one healthy — proves the faulty
side's demotion doesn't starve the healthy side; `BadNodeIdUnknown`
(terminal) does not count toward the comm-failure tally; DTO JSON
round-trip preserves `FailureThreshold` / `DemoteForMs` / `Enabled` at
the per-device level; `HostState.Demoted` enum value is wired through
`Core.Abstractions`. Companion integration test in
`tests/.../IntegrationTests/AbLegacyAutoDemoteTests.cs` runs the
two-device-one-unreachable scenario against a live ab_server fixture
using `127.0.0.1:1` as the unreachable peer.
- `RsLogixSymbolImportTests` — ablegacy-11 / #254 RSLogix CSV symbol-import parser:
canonical 8-row CSV (one row per N/F/B/L/ST/T/C/R) → 8 typed
`AbLegacyTagDefinition`s with the right `DataType`; header + comment-line
(`;` / `#`) skipping; malformed-row → log warning + skip (`IgnoreInvalid=true`
default) vs. `InvalidDataException` (`IgnoreInvalid=false`); empty stream →
empty result; UTF-8 BOM survival; embedded comma in quoted Description;
doubled-quote escape; `--scope` filter (Global vs. Local:N); `MaxRowsToImport`
cap; missing required header column → `InvalidDataException` regardless of
`IgnoreInvalid`; `TryResolveDataType` rejects garbage + bit-suffix overrides
the file letter (`N7:0/3` → Bit).
- `RsLogixSymbolImportGoldenTests` — golden-snapshot integration: loads
`Fixtures/rslogix-canonical.csv` (8-row canonical export covering every v1
file letter), serialises the resulting tag list, and compares to
`Fixtures/rslogix-canonical-expected.json`. On mismatch the actual JSON is
dumped to `%TEMP%/rslogix-canonical-actual.json` and the path printed in the
failure message so the dev can `cp` the golden after reviewing the diff.
- `AbLegacyDriverFactoryAddRsLogixImportTests` — covers the
`AbLegacyDriverFactoryExtensions.AddRsLogixImport` extension method:
appends imported tags onto an existing options object without dropping the
hand-rolled tags or the device list; mutates by-copy (immutability
guarantee); `AddRsLogixImportWithResult` tuple overload returns both the
modified options and the import counters.
- `AbLegacyDeadbandTests` — PR 8 per-tag deadband / change filter:
absolute-only suppression sequence `[10.0, 10.5, 11.5, 11.6] -> [10.0, 11.5]`,
percent-only suppression with a zero-prev short-circuit, both-set logical-OR
semantics (Kepware), Boolean edge-only publish, string change-only publish,
status-change always-publish, first-seen always-publish, ReinitializeAsync
cache wipe, JSON DTO round-trip.
Capability surfaces whose contract is verified: `IDriver`, `IReadable`,
`IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`,
@@ -69,6 +132,17 @@ driver-side correctness depends on libplctag being correct.
`IPerCallHostResolver` contract is verified; real PCCC wire routing across
multiple gateways is not.
### 3a. DH+ via 1756-DHRIO bridging (PR ablegacy-13 / #256)
Unit-only — coverage lives in `AbLegacyDhPlusBridgingTests`. The CIP-path
parser positive / negative cases (octal-station validation, slot bounds, port
shape) and the PLC-5-only family guard at `InitializeAsync` are exercised
against fakes. There is no Docker fixture for DH+ because libplctag's
`ab_server` doesn't simulate the DHRIO + DH+ + PLC-5 stack — wire-level
validation requires real hardware. See
[`AbLegacy-DH-Bridging.md`](AbLegacy-DH-Bridging.md) for the manual smoke
procedure.
### 4. Alarms / history
PCCC has no alarm object + no history object. Driver doesn't implement
@@ -111,6 +185,33 @@ cover the common ones but uncommon ones (`R` counters, `S` status files,
network; parts are end-of-life but still available. PLC-5 +
LogixPccc-mode behaviour + DF1 serial need specific controllers.
## Per-device options (`AbLegacyDeviceOptions`)
Each entry in `AbLegacyDriverOptions.Devices` carries:
| Field | Type | Default | Notes |
|---|---|---|---|
| `HostAddress` | string | required | `ab://host[:port]/cip-path` |
| `PlcFamily` | enum | `Slc500` | Slc500 / MicroLogix / Plc5 / LogixPccc |
| `DeviceName` | string | null | Friendly label used in browse + diagnostics |
| `Timeout` | TimeSpan? | null → driver-wide default | **PR 9 / #252** — wins over the driver-wide `Timeout`. Mix-and-match: SLC 5/01 ≈ 5 s, SLC 5/05 ≈ 2 s, MicroLogix 1100 ≈ 3 s |
| `Retries` | int? | null → driver-wide default → 0 | **PR 9 / #252** — retries on transient `BadCommunicationError`; terminal errors surface on the first attempt |
JSON shape (mirrored on `AbLegacyDeviceDto`):
```json
{
"HostAddress": "ab://192.168.1.10/1,0",
"PlcFamily": "Slc500",
"DeviceName": "slc-5-01-line-A",
"TimeoutMs": 5000,
"Retries": 1
}
```
Per-device overrides also flow into the probe loop — slow chassis won't be
falsely marked Stopped just because the driver-wide probe timeout is tight.
## Key fixture / config files
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/AbLegacyServerFixture.cs`
@@ -124,5 +225,10 @@ cover the common ones but uncommon ones (`R` counters, `S` status files,
— known-limitations write-up + resolution paths
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/FakeAbLegacyTag.cs`
in-process fake + factory
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/rslogix-canonical.csv`
— ablegacy-11 / #254 8-row canonical RSLogix CSV symbol export, one row per
v1 file letter (N/F/B/L/ST/T/C/R)
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/rslogix-canonical-expected.json`
— golden snapshot the import tests compare against
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — scope remarks
at the top of the file
+69 -3
View File
@@ -38,6 +38,16 @@ quirk. UDT / alarm / quirk behavior is verified only by unit tests with
- `--plc controllogix` and `--plc compactlogix` mode dispatch.
- The skip-on-missing-binary behavior (`AbServerFactAttribute`) so a fresh
clone without the simulator stays green.
- **Symbolic vs Logical addressing wall-clock** (PR abcip-3.2,
`AbCipAddressingModeBenchTests`) — both modes complete + emit timing.
**Emulate-tier only**: `ab_server` does not currently honour the CIP Symbol
Object class 0x6B `cip_addr` attribute that Logical mode sets, so on the
fixture the two modes measure the same wire path. The bench scaffold
asserts both complete + records timing for human inspection; the actual
Symbolic-vs-Logical perf comparison requires a real ControlLogix /
CompactLogix on the network. See
[`docs/drivers/AbCip-Performance.md`](AbCip-Performance.md) §"Addressing
mode" for the full caveat.
## What it does NOT cover
@@ -60,6 +70,19 @@ Unit coverage: `AbCipFetchUdtShapeTests`, `CipTemplateObjectDecoderTests`,
`AbCipDriverWholeUdtReadTests` — all with golden Template-Object byte buffers
+ offset-keyed `FakeAbCipTag` values.
PR abcip-3.3 layers a per-device **`ReadStrategy`** selector on top
(`WholeUdt` / `MultiPacket` / `Auto`, see
[`AbCip-Performance.md`](AbCip-Performance.md) §"Read strategy"). Strategy
switching is planner-side: the dispatcher picks between
`AbCipUdtReadPlanner` (whole-UDT) and `AbCipMultiPacketReadPlanner`
(per-member, bundled per parent) per batch. The selector + per-batch Auto
heuristic + family-compat fall-back + per-device dispatch counters are
**unit-tested only** in `AbCipReadStrategyTests``ab_server` cannot host
a 50-member UDT to exercise the sparse case the strategy is designed for,
and the libplctag .NET wrapper (1.5.x) does not expose explicit
Multi-Service-Packet bundling, so wire-level coverage stays Emulate-tier
in `AbCipEmulateMultiPacketReadTests` (gated on `AB_SERVER_PROFILE=emulate`).
### 2. ALMD / ALMA alarm projection (#177)
Depends on the ALMD UDT shape, which `ab_server` cannot emulate. The
@@ -96,6 +119,15 @@ value per PR 10, but `ab_server` accepts whatever the client asks for — the
cap's correctness is trusted from its unit test, never stressed against a
simulator that rejects oversized requests.
PR abcip-3.1 layers the **per-device `ConnectionSize` override** on top
(`AbCipDeviceOptions.ConnectionSize`, range `[500..4002]`, see
[`AbCip-Performance.md`](AbCip-Performance.md)). Same gap — `ab_server`
happily honours an oversized override against the CompactLogix profile, so
the legacy-firmware warning + Forward Open rejection that real 5069-L1/L2/L3
parts emit are unit-tested only. Live coverage stays Emulate / rig-only
(connect against a real CompactLogix L2 with `ConnectionSize=1500` to
confirm the Forward Open fails with CIP error 0x01/0x113).
### 6. BOOL-within-DINT read-modify-write (#181)
The `AbCipDriver.WriteBitInDIntAsync` RMW path + its per-parent `SemaphoreSlim`
@@ -107,14 +139,48 @@ the RMW path is not exercised end-to-end.
No smoke test for:
- `IWritable.WriteAsync`
- `IWritable.WriteAsync` — atomic write coverage; PR abcip-4.2 added a
multi-write *suppression* smoke (jittery 5-write sequence with
`WriteDeadband: 1.0` against `ab_server`, asserting the driver's
diagnostics counter matches the expected suppression count) but pure
atomic-write coverage end-to-end is still unit-only.
- `ITagDiscovery.DiscoverAsync` (`@tags` walker)
- `ISubscribable.SubscribeAsync` (poll-group engine)
- `IHostConnectivityProbe` state transitions under wire failure
- ~~`IHostConnectivityProbe` state transitions under wire failure~~
covered as of PR abcip-4.3. `AbCipSystemTagDiscoveryTests` connects to
`ab_server`, drives the discovery + read path against the synthetic
`_System/_ConnectionStatus` variable, and asserts the live snapshot
reflects the probe-driven `HostState`. Wire-failure transitions still
rely on unit-level `ThrowOnRead` injection rather than a real wire pull,
but the end-to-end probe → snapshot → OPC UA address-space link is
exercised against `ab_server`.
- `IPerCallHostResolver` multi-device routing
The driver implements all of these + they have unit coverage, but the only
end-to-end path `ab_server` validates today is atomic `ReadAsync`.
end-to-end paths `ab_server` validates today are atomic `ReadAsync` and
write-deadband / write-on-change suppression.
### 8. ControlLogix HSBY paired-IP role probing (PR abcip-5.1)
`ab_server` has no second-chassis concept and no `WallClockTime.SyncStatus`
tag. The HSBY paired-IP role-prober (PR abcip-5.1) is unit-tested only —
`AbCipHsbyTests` drives two fake runtimes (primary + partner), pins each
chassis's role-tag value, and asserts the active-resolution rules + DTO
round-trip + diagnostics surface.
The `paired` Docker compose profile spins up two `ab_server` instances +
a stub `hsby-mux` sidecar so the topology is documented, but PR 5.2 follow-
up needs a patched `ab_server` image (or a Python shim) that actually
serves the role tag before the integration test
(`AbCipHsbyRoleProberTests`) can flip its `Assert.Skip` into a real wire
assertion. Until then the test is gated on `Category=Hsby` + skipped by
default.
Lab-rig coverage is the authoritative path — a real 1756-RM redundant
chassis pair is the only place the live `WallClockTime.SyncStatus` matrix
+ split-brain handling can be exercised end-to-end. See
[`AbCip-HSBY.md`](AbCip-HSBY.md) for the full configuration + role-tag
detection matrix.
## Logix Emulate golden-box tier
+131 -112
View File
@@ -2,149 +2,168 @@
Coverage map + gap inventory for the FANUC FOCAS2 CNC driver.
**Status:** as of 2026-04-24, OtOpcUa speaks FOCAS2 directly over TCP
via the pure-managed [`Focas.Wire`](https://github.com/Ladder99/focas-mock/tree/main/dotnet/Focas.Wire)
client. Integration tests run the managed driver end-to-end against the
vendored `focas-mock` Python server (at
[`tests/.../Docker/focas-mock/`](../../tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/focas-mock/VENDORED.md))
whose native FOCAS Ethernet responder is verified PDU-by-PDU against the
real `fwlibe64.dll`.
**TL;DR: there is no integration fixture.** Every test uses a
`FakeFocasClient` injected via `IFocasClientFactory`. Fanuc's FOCAS library
(`Fwlib32.dll`) is closed-source proprietary with no public simulator;
CNC-side behavior is trusted from field deployments.
No shim DLL, no P/Invoke, no licensed binary — any dev box or CI runner
with Docker can run the full fixture end-to-end.
## What the fixture is
Hardware validation against a real CNC is still useful to catch
series-specific firmware quirks (see [§ Hardware-only gaps](#hardware-only-gaps))
but the mock's wire responder covers every FOCAS call OtOpcUa issues.
Nothing at the integration layer.
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/` is unit-only. The driver ships
as Tier C (process-isolated) per `docs/v2/driver-stability.md` because the
FANUC DLL has known crash modes; tests can't replicate those in-process.
## What the fixture covers
## What it actually covers (unit only)
### Unit layer (no container required)
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/` uses `FakeFocasClient`
injected via `IFocasClientFactory`:
- `FocasCapabilityTests` — data-type mapping (PMC bit / byte / word /
long / float / double, macro variable types, parameter types)
- `FocasCapabilityMatrixTests` — per-CNC-series range validation across
16i / 0i-D / 0i-F / 30i / Power Motion, 46 theory cases locking every
documented range boundary. See
[`docs/v2/focas-version-matrix.md`](../v2/focas-version-matrix.md).
- `FocasReadWriteTests` — read / write contract against the fake, FOCAS
native status → OPC UA `StatusCode` mapping
- `FocasCapabilityTests` — data-type mapping (PMC bit / word / float,
macro variable types, parameter types)
- `FocasCapabilityMatrixTests` — per-CNC-series range validation (macro
/ parameter / PMC letter + number) across 16i / 0i-D / 0i-F /
30i / PowerMotion. See [`docs/v2/focas-version-matrix.md`](../v2/focas-version-matrix.md)
for the authoritative matrix. 46 theory cases lock every documented
range boundary — widening a range without updating the doc fails a
test.
- `FocasReadWriteTests` — read + write against the fake, FOCAS native status
→ OPC UA StatusCode mapping
- `FocasScaffoldingTests``IDriver` lifecycle + multi-device routing
- `FocasPmcBitRmwTests` — PMC bit read-modify-write synchronisation
- `FocasAlarmProjectionTests` — raise / clear diffing, severity mapping
- `FocasHandleRecycleTests` — proactive session recycle cadence
- `FocasPmcBitRmwTests` — PMC bit read-modify-write synchronization (per-byte
`SemaphoreSlim`, mirrors the AB / Modbus pattern from #181)
- `FwlibNativeHelperTests``Focas32.dll``Fwlib32.dll` bridge validation
+ P/Invoke signature validation
Capability surfaces whose contract is verified: `IDriver`, `IReadable`,
`ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`,
`IPerCallHostResolver`, `IAlarmSource`. `IWritable` intentionally
returns `BadNotWritable` — OtOpcUa is read-only against FOCAS.
`IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`,
`IPerCallHostResolver`.
Pre-flight validation runs in `FocasDriver.InitializeAsync` — configs
referencing out-of-range addresses fail at load time with a diagnostic
message naming the CNC series + documented limit.
message naming the CNC series + documented limit. This closes the
cheap half of the hardware-free stability gap; Tier-C process
isolation (task #220) closes the expensive half — see
[`docs/v2/implementation/focas-isolation-plan.md`](../v2/implementation/focas-isolation-plan.md).
### Integration layer (mock only, no CNC, no shim)
## What it does NOT cover
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/` drives the
managed `FocasDriver` end-to-end. A single gate:
### 1. FOCAS wire traffic
**Docker compose up** — tests skip when the TCP probe to
`localhost:8193` fails with a pointer to the compose command.
No FOCAS TCP frame is sent. `Fwlib32.dll`'s TCP-to-FANUC-gateway exchange is
closed-source; the driver trusts the P/Invoke layer per #193. Real CNC
correctness is trusted from field deployments.
When the mock is up, `WireFocasClient` dials it over TCP exactly like a
real CNC, and the mock's native FOCAS Ethernet responder replies with
binary PDUs against the documented command IDs. Covered assertions:
### 2. Alarm / parameter-change callbacks
- Session open / close (`cnc_allclibhndl3` + `cnc_freelibhndl`)
- Parameter read-back after `mock_patch` seed → `cnc_rdparam`
- Macro read-back after seed → `cnc_rdmacro` (scaled-decimal
translation verified)
- PMC range read after seed → `pmc_rdpmcrng`
- `IAlarmSource` raise + clear transitions after `mock_patch`
alarm-list changes → `cnc_rdalmmsg2`
- Fixed-tree bootstrap: identity / axes / spindle / program / timers /
servo meters populate via `cnc_sysinfo`, `cnc_rdaxisname`,
`cnc_rdspdlname`, `cnc_rddynamic2`, `cnc_exeprgname2`,
`cnc_rdblkcount`, `cnc_rdopmode`, `cnc_rdsvmeter`, `cnc_rdspload`,
`cnc_rdspmaxrpm`, `cnc_rdtimer`
- Per-series profile selection via `mock_load_profile` — tests can
pin one profile and assert series-gated capability suppression
FOCAS has no push model — the driver polls via the shared `PollGroupEngine`.
There are no CNC-initiated callbacks to test; the absence is by design.
### E2E script (CLI)
### 3. Macro / ladder variable types
`scripts/e2e/test-focas.ps1` drives the Client.CLI against a running
OtOpcUa server. Accepts:
FANUC has CNC-specific extensions (macro variables `#100-#999`, system
variables `#1000-#5000`, PMC timers / counters / keep-relays) whose
per-address semantics differ across 0i-F / 30i / 31i / 32i Series. Driver
covers the common address shapes; per-model quirks are not stressed.
- `-CncHost` / `-CncPort` for real hardware
- `-ProfileName <compose-profile>` for the Docker mock
- `-Series <csv>` for per-series matrix mode
- `-HandleLeakCycles <N>` for handle-leak stress
### 4. Model-specific behavior
## Hardware-only gaps
- Alarm retention across power cycles (model-specific CNC behavior)
- Parameter range enforcement (CNC rejects out-of-range writes)
- MTB (machine tool builder) custom screens that expose non-standard data
The mock has parity with the real `fwlibe64.dll` for the calls OtOpcUa
issues, but a real CNC can still surface things a reference
implementation can't:
### 5. Tier-C process isolation — architecture shipped, Fwlib32 integration hardware-gated
1. **Series-specific firmware quirks** — alarm retention across power
cycles, parameter range enforcement by the CNC (not the driver),
MTB custom screens, series-specific option bits. Each series has
documented behaviours that only a bench CNC exercises.
2. **Wire-level stress** — burst reads, concurrent device writes,
network-partition recovery under load. The mock handles these
correctly but production behaviour is the source of truth.
3. **Transient operational states** — alarm floods, emergency-stop
transitions, power-on resync. These are easy to stub but hard to
cover comprehensively in the mock.
The Tier-C architecture is now in place as of PRs #169#173 (FOCAS
PR AE, task #220):
Track the close-out under task #54 (live-CNC smoke). When the rig
lands, the hardware path runs alongside the mock path; the mock
stays as the CI quality gate.
- `Driver.FOCAS.Shared` carries MessagePack IPC contracts
- `Driver.FOCAS.Host` (.NET 4.8 x86 Windows service via NSSM) accepts
a connection on a strictly-ACL'd named pipe + dispatches frames to
an `IFocasBackend`
- `Driver.FOCAS.Ipc.IpcFocasClient` implements the `IFocasClient` DI
seam by forwarding over IPC — swap the DI registration and the
driver runs Tier-C with zero other changes
- `Driver.FOCAS.Supervisor.FocasHostSupervisor` owns the spawn +
heartbeat + respawn + 3-in-5min crash-loop breaker + sticky alert
- `Driver.FOCAS.Host.Stability.PostMortemMmf`
`Driver.FOCAS.Supervisor.PostMortemReader` — ring-buffer of the
last ~1000 IPC operations survives a Host crash
## When to trust each layer
The one remaining gap is the production `FwlibHostedBackend`: an
`IFocasBackend` implementation that wraps the licensed
`Fwlib32.dll` P/Invoke. That's hardware-gated on task #222 — we
need a CNC on the bench (or the licensed FANUC developer kit DLL
with a test harness) to validate it. Until then, the Host ships
`FakeFocasBackend` + `UnconfiguredFocasBackend`. Setting
`OTOPCUA_FOCAS_BACKEND=fake` lets operators smoke-test the whole
Tier-C pipeline end-to-end without any CNC.
| Question | Unit | Integration (mock) | Real CNC |
| --- | :---: | :---: | :---: |
| "Does PMC address `R100.3` route to the right bit?" | ✅ | ✅ | ✅ |
| "Does the Fanuc status → OPC UA StatusCode map cover every documented code?" | ✅ (contract) | ✅ | ✅ |
| "Does `FocasDriver.ReadAsync` correctly decode a seeded parameter?" | no | ✅ | ✅ |
| "Does `IAlarmSource` fire raise + clear events?" | ✅ (Fake) | ✅ (wire) | ✅ |
| "Does a real read against a 30i Series return correct bytes?" | no | ✅ (via profile) | ✅ (required) |
| "Do series-specific firmware quirks behave as documented?" | no | no | ✅ (required) |
| "Does the driver survive real network partitions?" | no | partial (socket kill) | ✅ (required) |
## When to trust FOCAS tests, when to reach for a rig
## Running the integration fixture
| Question | Unit tests | Real CNC |
| --- | --- | --- |
| "Does PMC address `R100.3` route to the right bit?" | yes | yes |
| "Does the FANUC status → OPC UA StatusCode map cover every documented code?" | yes (contract) | yes |
| "Does a real read against a 30i Series return correct bytes?" | no | yes (required) |
| "Does `Fwlib32.dll` crash on concurrent reads?" | no | yes (stress) |
| "Do macro variables round-trip across power cycles?" | no | yes (required) |
```powershell
# 1) Start the mock on a chosen profile.
docker compose -f tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/docker-compose.yml up -d
## Alarm history (`cnc_rdalmhistry`) — issue #267, plan PR F3-a
# 2) Run the tests. No shim build, no DLL copy — the driver dials the mock directly.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/
```
`FocasAlarmProjection` ships two modes:
Or use `scripts/integration/run-focas.ps1` which wraps compose up / test
/ compose down and accepts `-Profile <name>` to pin a per-series run.
- **`ActiveOnly`** (default) — surfaces only currently-active alarms.
No history poll. Same back-compat shape every prior FOCAS deployment used.
- **`ActivePlusHistory`** — additionally polls `cnc_rdalmhistry` on connect
+ on the configured cadence (`HistoryPollInterval`, default 5 min). Each
unseen entry fires an `OnAlarmEvent` with `SourceTimestampUtc` set from
the CNC's reported timestamp, not Now.
Unit-test coverage in `FocasAlarmProjectionTests`:
- mode `ActiveOnly` — no `ReadAlarmHistoryAsync` call ever issued
- mode `ActivePlusHistory` — first poll fires on subscribe (== "on connect")
- dedup — same `(OccurrenceTime, AlarmNumber, AlarmType)` triple across two
polls only emits once
- distinct entries with different timestamps each emit separately
- same alarm number / different type still emits both (type is part of the
dedup key)
- `OccurrenceTime` is the wire timestamp (round-trips a year-old stamp
without bleeding into Now)
- `HistoryDepth` clamp — user-supplied 500 collapses to 250 on the wire;
zero / negative falls back to the 100 default
- `FocasAlarmHistoryDecoder` — round-trips through `Encode` / `Decode` and
pins the simulator command id at `0x0F1A`
Future integration coverage (not yet shipped — no FOCAS integration test
project exists):
- a focas-mock with a per-profile ring buffer and `mock_patch_alarmhistory`
admin endpoint will let `cnc_rdalmhistry` round-trip end-to-end through
the wire protocol
- `FocasSimFixture.SeedAlarmHistoryAsync` will let series tests prime canned
history without per-test JSON
## Follow-up candidates
1. **Nothing public** — Fanuc's FOCAS Developer Kit ships an emulator DLL
but it's under NDA + tied to licensed dev-kit installations; can't
redistribute for CI.
2. **Lab rig** — used FANUC 0i-F simulator controller (or a retired machine
tool) on a dedicated network; only path that covers real CNC behavior.
3. **Process isolation first** — before trusting FOCAS in production at
scale, shipping the Tier-C out-of-process Host architecture (similar to
Galaxy) is higher value than a CI simulator.
## Key fixture / config files
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/focas-mock/`
— vendored `focas-mock` Python source + Dockerfile
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/docker-compose.yml`
— per-series compose profiles
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/FocasSimFixture.cs`
— collection fixture + mock admin API client
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Series/FixedTreePopulatesTests.cs`
— fixed-tree end-to-end tests
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Series/WireBackendTests.cs`
— pure-wire-backend end-to-end tests
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/FakeFocasClient.cs`
in-process unit fake
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/Wire/WireFocasClient.cs` — the
managed wire client backing production deployments
in-process fake implementing `IFocasClient`
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/FocasCapabilityMatrixTests.cs`
— parameterized theories locking the per-series matrix
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasDriver.cs` — ctor takes
`IFocasClientFactory`
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasCapabilityMatrix.cs`
per-series range validator
per-CNC-series range validator (the matrix the doc describes)
- `docs/v2/focas-version-matrix.md` — authoritative range reference
- `docs/v2/implementation/focas-isolation-plan.md` — Tier-C isolation
plan (task #220)
- `docs/v2/driver-stability.md` — Tier C scope + process-isolation rationale
+249 -206
View File
@@ -1,238 +1,281 @@
# FOCAS Driver
# FOCAS driver
Getting-started guide for the FANUC FOCAS2 driver. This is the short path — for
the exhaustive per-node mapping read [`docs/v2/driver-specs.md §7`](../v2/driver-specs.md),
for the test-harness map read [FOCAS-Test-Fixture.md](FOCAS-Test-Fixture.md).
Fanuc CNC driver for the FS 0i / 16i / 18i / 21i / 30i / 31i / 32i / 35i /
Power Mate i families. Talks to the controller via the licensed
`Fwlib32.dll` (Tier C, process-isolated per
[`docs/v2/driver-stability.md`](../v2/driver-stability.md)).
## What it talks to
For range-validation and per-series capability surface see
[`docs/v2/focas-version-matrix.md`](../v2/focas-version-matrix.md).
FANUC CNCs (0i-D / 0i-F / 0i-MF / 0i-TF / 16i / 30i / 31i / 32i / Power Motion i)
over the proprietary FOCAS2 protocol on TCP port 8193. The wire is spoken
directly by the pure-managed [`Focas.Wire`](https://github.com/Ladder99/focas-mock)
client — no Fwlib64.dll, no P/Invoke, no out-of-process isolation needed.
## Fixed-tree `Production/` projection — issue #258 (F1-b) + issue #272 (F5-a)
OtOpcUa is **read-only** against FOCAS; all reads go over the native wire
protocol using the documented command IDs. Writes return
`BadNotWritable` by design.
Per-device read-only nodes refreshed from the same `cnc_rdparam` /
cycle-timer poll the probe loop already runs. No additional wire calls
are issued for any of these — they are all cache-or-derive reads.
## Project split
| Node | DataType | Source | Notes |
| --- | --- | --- | --- |
| `Production/PartsProduced` | `Int32` | `cnc_rdparam(6711)` | Active parts-count counter. Wraps to 0 on operator reset. |
| `Production/PartsRequired` | `Int32` | `cnc_rdparam(6712)` | Operator-set target. |
| `Production/PartsTotal` | `Int32` | `cnc_rdparam(6713)` | Lifetime parts counter. |
| `Production/CycleTimeSeconds` | `Int32` | `cnc_rdtimer` (channel 0) | Live cycle-time accumulator. Resets to 0 on next cycle start (CNC-side behaviour). |
| **`Production/LastCycleSeconds`** | **`Float64`** | **derived** | **Plan PR F5-a — seconds for the most recently completed cycle, computed as `CycleTimeSeconds(now) - CycleTimeSeconds(at previous parts-count increment)`. `null` until the second observed parts-count increment establishes a delta. Pure derivation, no new wire calls. See edge-case rules below.** |
| **`Production/LastCycleStartUtc`** | **`DateTime`** *(UTC)* | **derived** | **Plan PR F5-a — UTC wall-clock of the most-recent cycle's start, computed as `nowUtc - LastCycleSeconds`. `null` alongside `LastCycleSeconds` until the second observed increment.** |
| Project | Target | Role |
|---------|--------|------|
| `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/` | net10.0 | In-process driver — hosts `WireFocasClient` which speaks FOCAS2 over TCP directly |
### F5-a derivation edge-case rules
Previous `Driver.FOCAS.Host` / `Driver.FOCAS.Shared` Tier-C split has been
retired — the managed wire client removes the native-crash blast radius
that justified the out-of-process service.
- **First observation** establishes the baseline; `LastCycleSeconds` /
`LastCycleStartUtc` stay `null` until the second observed parts-count
increment produces the first delta.
- **Parts-count counter reset** (current value goes backwards, e.g.
shift-change zero) **preserves the last published values** so an
operator reading the tag mid-shift-change sees the last known cycle
duration rather than `null` / Bad. The next positive transition
produces a fresh delta from the new baseline.
- **Cycle-timer rollover** (delta would be negative — e.g. CNC zeroes
the cycle timer at part completion) **leaves the previously-published
values unchanged for one tick** and re-baselines so the next
increment produces a clean delta. The driver does NOT publish a
negative `LastCycleSeconds`.
- **Parts-count jumps `> 1`** (backfill — e.g. counter increments by
3 at once) publish the **timer delta over the window** as
`LastCycleSeconds`. The plan's "delta over the window between
successive parts-count increments" definition does not divide by the
count delta; the value reflects the actual elapsed timer between the
two observations.
- **Reconnect / reinit** clears the derivation state — the prior CNC
session's cycle-timer + parts-count snapshots may be invalidated by
the FWLIB session boundary, so the next post-reconnect probe tick
re-establishes the baseline before the next delta publishes.
## Minimum deployment
## Alarm history (`cnc_rdalmhistry`) — issue #267, plan PR F3-a
Register the driver instance in the main server's `appsettings.json`. No
separate service, no DLL deployment, no shared-secret handshake:
`FocasAlarmProjection` exposes two modes via `FocasDriverOptions.AlarmProjection`:
| Mode | Behaviour |
| --- | --- |
| `ActiveOnly` *(default)* | Subscribe / unsubscribe / acknowledge wire up so capability negotiation works, but no history poll runs. Back-compat with every pre-F3-a deployment. |
| `ActivePlusHistory` | On subscribe (== "on connect") and on every `HistoryPollInterval` tick, the projection issues `cnc_rdalmhistry` for the most recent `HistoryDepth` entries. Each previously-unseen entry fires an `OnAlarmEvent` with `SourceTimestampUtc` set from the CNC's reported timestamp — OPC UA dashboards see the real occurrence time, not the moment the projection polled. |
### Config knobs
```jsonc
"Drivers": {
"focas-cnc-1": {
"Type": "FOCAS",
"Config": {
"Backend": "wire",
"Devices": [
{ "HostAddress": "focas://10.20.30.40:8193", "Series": "ThirtyOne_i" }
],
"Tags": [
{ "Name": "Mode", "DeviceHostAddress": "focas://10.20.30.40:8193",
"Address": "PARAM:3402", "DataType": "Int32", "Writable": false },
{ "Name": "SpndLoad", "DeviceHostAddress": "focas://10.20.30.40:8193",
"Address": "MACRO:500", "DataType": "Float64", "Writable": false }
]
}
}
}
```
The main server opens two TCP sockets per configured device and speaks the
FOCAS2 binary protocol directly. No local privileged components, no
platform bitness constraint — the driver runs on every host OtOpcUa runs
on.
## Address forms
| Form | Example | Meaning |
|------|---------|---------|
| `X0.0` / `R100` / `R100.3` | PMC bit or byte | Letter + number; optional `.bit` for bit access |
| `PARAM:1815` / `PARAM:1815/0` | CNC parameter | Number + optional axis index |
| `MACRO:500` | Custom macro variable | System / user macro variable number |
Addresses are validated against the per-device `Series` at `InitializeAsync`
a config referencing a number outside the documented range for that series
fails at load time with an error message naming the limit. See
[`docs/v2/focas-version-matrix.md`](../v2/focas-version-matrix.md) for the
authoritative range table.
## Backend selection
The driver picks its client from `Config.Backend`:
| Value | Client | Use it for |
|-------|--------|------------|
| `wire` (default) | `WireFocasClient` | Production — pure-managed FOCAS2 over TCP |
| `unimplemented` / `none` / `stub` | `UnimplementedFocasClientFactory` | Scaffolding a DriverInstance row before the CNC endpoint is reachable |
Previous backends (`fwlib`, `fwlib32`, `ipc`) have been retired along
with `Driver.FOCAS.Host` and the Fwlib P/Invoke path. Configs that still
reference them will throw at startup with a message pointing here.
## Capability surface
| Capability | Wire path | Notes |
|------------|-----------|-------|
| `IReadable` | `ReadAsync``cnc_rdpmcrng` / `cnc_rdparam` / `cnc_rdmacro` | One TCP request/response per read; `Focas.Wire` serializes requests on socket 2 internally |
| `IWritable` | returns `BadNotWritable` | OtOpcUa is read-only against FOCAS by design — no `cnc_wrparam` / `pmc_wrpmcrng` / `cnc_wrmacro` path is implemented |
| `ITagDiscovery` | `DiscoverAsync` | Emits `FOCAS/{device}/{tag}` folders per configured device |
| `ISubscribable` | polled via shared `PollGroupEngine` | FOCAS has no push model — subscriptions turn into per-tag polling groups |
| `IHostConnectivityProbe` | periodic `cnc_rdcncstat` | Probe cadence is `Probe.Interval`; transitions fire `OnHostStatusChanged` |
| `IPerCallHostResolver` | lookup in `_tagsByName` | Each call routes to the device of the referenced tag |
| `IAlarmSource` | polled `cnc_rdalmmsg2` via `FocasAlarmProjection` | Opt-in — set `AlarmProjection.Enabled=true`; diffs `(AlarmNumber, Type)` between ticks |
Ack is a no-op — FANUC clears alarms on its own once the underlying condition
resolves, so `AcknowledgeAsync` swallows the batch rather than surfacing
`BadNotSupported`.
## Fixed node tree
Enable a pre-defined hierarchy of CNC nodes populated automatically from
`cnc_sysinfo` + `cnc_rdaxisname` + `cnc_rddynamic2` + related FWLIB calls,
so operators get an out-of-the-box view of identity / axes / program /
timers without declaring per-address tags.
```jsonc
"Config": {
"Devices": [ ... ],
"Tags": [ ... ],
"FixedTree": {
"Enabled": true,
"PollInterval": "00:00:00.250", // fast — per-axis dynamic reads
"ProgramPollInterval": "00:00:01", // medium — program + mode changes
"TimerPollInterval": "00:00:30" // slow — cumulative counters
}
}
```
What gets populated (all under `FOCAS/{deviceHostAddress}/`):
| Subtree | Nodes | Source call |
|---------|-------|-------------|
| `Identity/` | `SeriesNumber`, `Version`, `MaxAxes`, `CncType`, `MtType`, `AxisCount` | `cnc_sysinfo` once at bootstrap |
| `Axes/{name}/` | `AbsolutePosition`, `MachinePosition`, `RelativePosition`, `DistanceToGo`, `ServoLoad` — one folder per discovered axis | `cnc_rdaxisname` once + `cnc_rddynamic2` + `cnc_rdsvmeter` per tick |
| `Axes/FeedRate/Actual`, `Axes/SpindleSpeed/Actual` | Current feed + spindle RPM | `cnc_rddynamic2` |
| `Spindle/{name}/` | `Load` (percentage), `MaxRpm` — one folder per discovered spindle | `cnc_rdspdlname` once + `cnc_rdspload` + `cnc_rdspmaxrpm` |
| `Program/` | `Name` (filename), `ONumber`, `Number`, `MainNumber`, `Sequence`, `BlockCount` | `cnc_exeprgname2` + `cnc_rdblkcount` + cached `cnc_rddynamic2` |
| `OperationMode/` | `Mode` (int), `ModeText` ("AUTO", "MDI", "EDIT", …) | `cnc_rdopmode` |
| `Timers/` | `PowerOnSeconds`, `OperatingSeconds`, `CuttingSeconds`, `CycleSeconds` | `cnc_rdtimer` × 4 |
### Per-series node suppression
The driver probes each optional call once at bootstrap. If the target CNC
returns `EW_FUNC` / `EW_NOOPT` / `EW_VERSION` on the wire, the
corresponding subtree is **not emitted** — the operator doesn't see nodes
that will only ever return `BadDeviceFailure`. Capability suppression
covers `Spindle/`, `Program/` + `OperationMode/`, `Timers/`, and
per-axis `ServoLoad` independently. Identity + `Axes/*` position reads
(which every Fanuc CNC supports) are always emitted.
Position values are scaled integers (matching FOCAS's convention). The
managed side exposes them as `Float64` OPC UA nodes; a future
`cnc_getfigure` integration will add per-axis decimal scaling. Until
then, treat the raw integer as the value the CNC reports and scale on
the client side if decimal precision matters.
**Still user-authored**: `PARAM:6711`, `MACRO:500`, `R100` etc. — specific
numbers whose meaning is MTB-specific. Those go under the device folder
alongside the fixed subtree.
## Alarm projection
Alarm surfacing is **disabled by default** because the polling cost is wasted
on sites that don't consume CNC alarms. Opt in per driver instance:
```jsonc
"Config": {
"Devices": [ ... ],
"Tags": [ ... ],
{
"AlarmProjection": {
"Enabled": true,
"PollInterval": "00:00:02"
"Mode": "ActivePlusHistory", // "ActiveOnly" (default) | "ActivePlusHistory"
"HistoryPollInterval": "00:05:00", // default 5 min
"HistoryDepth": 100 // default 100, capped at 250
}
}
```
Every alarm transition fires `OnAlarmEvent` with:
### Dedup key
- `SourceNodeId` = the device host address (FOCAS has no per-node alarm model;
the CNC exposes a single flat active-alarm list per session)
- `ConditionId` = `"{host}#{Type}:{AlarmNumber}"`
- `AlarmType` = projected from FANUC's `ALM_TYPE_*` (e.g. `Overtravel`, `Servo`,
`Parameter`, `MacroAlarm`)
- `Severity` = Overtravel / Servo / PulseCode → `Critical`; Parameter / Macro
`Medium`; everything else → `High`
`(OccurrenceTime, AlarmNumber, AlarmType)`. The same triple across two
polls only emits once. The dedup set is in-memory and **resets on
reconnect** — first poll after reconnect re-emits everything in the ring
buffer. OPC UA clients that need exactly-once semantics dedupe client-side
on the same triple (the timestamp + type + number tuple is stable across
the boundary).
Cleared alarms fire a second event with `" (cleared)"` appended to the message
so downstream consumers can ignore the clear if they only care about raises.
### `HistoryDepth` cap
## Handle recycling
Capped at `FocasAlarmProjectionOptions.MaxHistoryDepth = 250` so an
operator who types `10000` by accident can't blast the wire session with a
giant request. Typical FANUC ring buffers cap at ~100 entries; the default
`HistoryDepth = 100` matches the most common ring-buffer size.
FANUC CNCs have a finite FWLIB handle pool (~510 concurrent connections) and
certain series have documented handle-leak bugs that manifest after long uptime.
The driver can proactively close + reopen each device's session on a cadence to
release its slot back to the pool:
### Wire surface
- Wire-protocol command id: `0x0F1A` (see
[`docs/v2/implementation/focas-wire-protocol.md`](../v2/implementation/focas-wire-protocol.md)).
- ODBALMHIS struct decoder: `Wire/FocasAlarmHistoryDecoder.cs`.
- Tier-C Fwlib32 backend short-circuits the packed-buffer decoder by
surfacing the FWLIB struct fields directly into
`FocasAlarmHistoryEntry`.
## Writes (opt-in, off by default) — issue #268 (F4-a) + #269 (F4-b) + #270 (F4-c)
Writes ship behind multiple independent opt-ins. All default off so a freshly
deployed FOCAS driver is read-only until the deployment makes a deliberate
choice. Decision record: [`docs/v2/decisions.md`](../v2/decisions.md) →
"FOCAS write-path opt-in".
| Knob | Default | Effect when off |
| --- | --- | --- |
| `FocasDriverOptions.Writes.Enabled` *(driver-level master switch)* | `false` | Every entry in a `WriteAsync` batch short-circuits to `BadNotWritable` with status text `writes disabled at driver level`. Wire client never gets touched. |
| **`FocasDriverOptions.Writes.AllowParameter`** *(F4-b granular kill switch)* | **`false`** | **`PARAM:` writes return `BadNotWritable` with no wire client constructed. Defense in depth — even if `Enabled = true` an operator must explicitly opt into parameter writes per kind because a misdirected `cnc_wrparam` can put the CNC in a bad state.** |
| **`FocasDriverOptions.Writes.AllowMacro`** *(F4-b granular kill switch)* | **`false`** | **`MACRO:` writes return `BadNotWritable` with no wire client constructed. Macro writes are the normal HMI-driven recipe / setpoint surface; gating them separately from `AllowParameter` lets a deployment open MACRO without exposing the heavier PARAM write surface.** |
| **`FocasDriverOptions.Writes.AllowPmc`** *(F4-c granular kill switch)* | **`false`** | **PMC writes (R/G/F/D/X/Y/K/A/E/T/C letters, both Bit and Byte) return `BadNotWritable` with no wire client constructed. PMC is ladder working memory — a mistargeted bit can move motion, latch a feedhold, or flip a safety interlock, so PMC writes are gated separately from PARAM/MACRO so an operator team can open PARAM (commissioning) without exposing the much higher-blast-radius PMC surface.** |
| `FocasTagDefinition.Writable` *(per-tag opt-in)* | `false` | The per-tag check returns `BadNotWritable` for that tag even when the driver-level flags are on. |
> **PMC SAFETY CALLOUT** — PMC is the FANUC ladder's working memory. A
> mistargeted bit can move motion (a Y-coil writing to a servo enable),
> latch a feedhold (an internal R-relay the ladder ANDs with cycle-start),
> or flip a safety interlock (an X-input shadow). **Treat PMC writes the
> same way you'd treat editing a live ladder:** verify e-stop is live and
> the machine is in jog mode before issuing the first write of a session.
> The driver gates these writes behind THREE independent opt-ins
> (`Writes.Enabled` + `Writes.AllowPmc` + per-tag `Writable`) precisely
> because the blast radius is higher than parameter writes.
### PMC bit-write read-modify-write semantics — F4-c
The FOCAS wire call `pmc_wrpmcrng` is **byte-addressed** — there is no
sub-byte write primitive. When the driver receives a write request on a
`Bit` tag (e.g. `R100.3`), it:
1. Reads the parent byte via `pmc_rdpmcrng` (1 byte at `R100`).
2. Masks the target bit (set: `current | (1 << bit)`; clear: `current & ~(1 << bit)`).
3. Writes the modified byte back via `pmc_wrpmcrng` (1 byte at `R100`).
A **per-byte semaphore** serialises concurrent bit writes against the same
byte so two updates that race never lose one another's bit. RMW means **a
PMC bit write reads first, then writes back the whole byte** — if the ladder
is also writing to that byte at the same instant, there is a small window
where the driver's value can clobber the ladder's. Operators who care about
this race must coordinate the write through a ladder-side handshake (e.g.
the operator sets a request bit, the ladder reads + clears it).
### Config shape — F4-c
```jsonc
"Config": {
"Devices": [ ... ],
"HandleRecycle": {
{
"Writes": {
"Enabled": true,
"Interval": "01:00:00"
}
"AllowParameter": true, // F4-b — opt into cnc_wrparam
"AllowMacro": true, // F4-b — opt into cnc_wrmacro
"AllowPmc": true // F4-c — opt into pmc_wrpmcrng (incl. RMW bit writes)
},
"Tags": [
{ "Name": "RPM", "Address": "PARAM:1815", "DataType": "Int32",
"Writable": true, "WriteIdempotent": false },
{ "Name": "Recipe", "Address": "MACRO:500", "DataType": "Int32",
"Writable": true, "WriteIdempotent": false },
{ "Name": "StartFlag", "Address": "R100.3", "DataType": "Bit",
"Writable": true, "WriteIdempotent": true }
]
}
```
Disabled by default — a healthy CNC + driver doesn't need it. Enable when field
experience shows handle exhaustion. Typical tuning: 30 min for sites running
multiple OtOpcUa instances against the same CNC (they share the pool); 6 h for a
single-client deployment. Reads / writes during recycle simply wait for the
reconnect rather than failing — worst case, an operator sees a brief read
latency spike once per cadence.
### Server-layer ACL (LDAP groups)
## Testing
Per the [`docs/v2/acl-design.md`](../v2/acl-design.md) tier model, the FOCAS
driver only declares per-tag `SecurityClassification`; `DriverNodeManager`
applies the gate. The classification post-F4-b is:
- **Unit tests**`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/` cover the
driver surface via `FakeFocasClient`. Includes the alarm-projection raise /
clear diffing tests.
- **Integration tests**`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`
hold the Docker simulator scaffold; see
[`docs/v2/implementation/focas-wire-protocol.md`](../v2/implementation/focas-wire-protocol.md)
for what the simulator emits vs. real CNC behaviour.
- **E2E script**`scripts/e2e/test-focas.ps1` stages Host + Proxy + a real
CNC (or the simulator) and exercises connect → read → write → subscribe
round-trips. See [`docs/drivers/FOCAS-Test-Fixture.md`](FOCAS-Test-Fixture.md)
for the coverage map.
| Tag kind | Classification | LDAP group required (default mapping) |
| --- | --- | --- |
| `PARAM:N` writable | `Configure` | **`WriteConfigure`** |
| `MACRO:N` writable | `Operate` | `WriteOperate` |
| Other writable (PMC R/G/F/...) | `Operate` | `WriteOperate` |
| Non-writable | `ViewOnly` | (no write permission) |
## Troubleshooting
Parameter writes need the heavier `WriteConfigure` group because they're
mostly emergency commissioning territory; macro writes use `WriteOperate`
because they're the normal HMI recipe surface. The driver-level
`AllowParameter` / `AllowMacro` kill switches sit independently of ACL — an
operator-team kill switch the deployment can flip without redeploying ACL
group memberships. See [`docs/security.md`](../security.md) for the full
group/permission map.
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `BadCommunicationError` on every read | CNC unreachable on TCP:8193 | Check firewall / LAN reachability; FOCAS Ethernet option must be licensed on the CNC side |
| Every read returns `BadNotWritable` on writes | Expected — OtOpcUa is read-only against FOCAS | If you actually need writes, open a feature request — the driver's managed wire client doesn't expose the write commands |
| `BadOutOfRange` on reads for a macro/parameter | Config address outside the declared `Series` range | Check `docs/v2/focas-version-matrix.md` — either fix the address or widen the `Series` |
| Alarm events never fire | `AlarmProjection.Enabled` left at default (false) | Set it to `true` in the driver config |
`WriteIdempotent` is plumbed through Polly retry by the server-layer
`CapabilityInvoker.ExecuteWriteAsync`. When `false` (default), failed writes
are NOT auto-retried per plan decisions #44/#45 — a timeout that fires after
the CNC already accepted the write would otherwise risk a duplicate
non-idempotent action (alarm acks, M-code pulses, recipe steps). Flip
`WriteIdempotent` on per tag for genuinely-idempotent writes (a parameter
value that the operator simply wants forced to a target).
## Further reading
### FOCAS password — issue #271 (F4-d)
- [`docs/v2/driver-specs.md §7`](../v2/driver-specs.md) — full OPC UA node
mapping, pre-defined tag set, per-API notes
- [`docs/v2/focas-version-matrix.md`](../v2/focas-version-matrix.md) —
per-series macro / parameter / PMC range table
- [`docs/v2/implementation/focas-wire-protocol.md`](../v2/implementation/focas-wire-protocol.md)
— captured FOCAS2 wire semantics (magic prefix, handshake, command-id table)
- [upstream `Focas.Wire`](https://github.com/Ladder99/focas-mock/tree/main/dotnet/Focas.Wire)
— the managed client implementation OtOpcUa consumes as a NuGet dependency
Some controllers — notably 16i and certain 30i firmwares with the
parameter-protect switch on — gate `cnc_wrparam` and a handful of reads
behind a connection-level password. Without unlocking the session, every
gated wire call returns `EW_PASSWD`, which the F4-b mapping surfaces as
`BadUserAccessDenied`.
`FocasDeviceOptions.Password` plumbs the password through the device config:
```jsonc
{
"Devices": [
{
"HostAddress": "focas://10.0.0.5:8193",
"Password": "1234" // F4-d — optional CNC password
}
]
}
```
When set, the driver:
1. **On connect**, calls `IFocasClient.UnlockAsync(password, ct)` after
the FWLIB handle opens but before any read/write fires. The FWLIB-backed
client emits `cnc_wrunlockparam` with the password ASCII-encoded into
the 4-byte FOCAS password slot (right-padded with `0x00`, truncated at
4 bytes — that's the shape the public Fanuc samples document).
2. **On `BadUserAccessDenied` from any gated read or write**, re-issues
`UnlockAsync` and retries the call **exactly once**. A second
`EW_PASSWD` propagates unchanged so a wrong password doesn't loop
forever on the wire.
3. **Reset on reconnect** — FWLIB unlock state lives on the handle, so
any reconnect path (planned or unplanned) re-runs unlock automatically
via `EnsureConnectedAsync`.
**No-log invariant.** The password is a secret. The driver MUST NOT log
it. Specifically:
- `FocasDeviceOptions` overrides the record's auto-generated `ToString`
to print `Password = ***` when the field is non-null. Any Serilog
destructure that flows the device options through `{Device}` gets the
redaction for free.
- `FwlibFocasClient.UnlockAsync` does not include the password in any
exception message — only the FWLIB return code (`EW_PASSWD`,
`EW_HANDLE`, etc.) makes it into the surface.
- `FocasDriver` logs only `"FOCAS unlock applied for {host}"` when the
unlock succeeds — no password.
- The Driver.FOCAS.Cli `--cnc-password` flag is also redacted at the
same `FocasDeviceOptions` choke point.
- See [`docs/v2/focas-deployment.md`](../v2/focas-deployment.md)
§ "FOCAS password handling" for the storage/rotation runbook + the
cross-link to [`docs/Security.md`](../Security.md).
When the controller does **not** need a password, leave `Password`
unset (`null`) and the driver short-circuits the unlock call entirely —
no wire-level cost.
### Status-code semantics post-F4-b
- `BadNotWritable` — one of: driver-level `Writes.Enabled = false`; per-tag
`Writable = false`; **`Writes.AllowParameter = false` for a `PARAM:` tag
(F4-b)**; **`Writes.AllowMacro = false` for a `MACRO:` tag (F4-b)**;
**`Writes.AllowPmc = false` for a PMC tag (F4-c)**. Same status code,
five distinct paths — operators distinguish by checking the knobs.
- `BadUserAccessDenied`**F4-b** — the CNC reported `EW_PASSWD`
(parameter-write switch off / unlock required). **F4-d** wires the
`cnc_wrunlockparam` retry path on top: when `Password` is configured
the driver re-issues unlock + retries the gated call once before
surfacing this status. A persistent `BadUserAccessDenied` after F4-d
means either (a) the password doesn't match the controller, or (b)
the parameter-write switch on the pendant is still off and the
controller wants both the switch + the password.
- `BadNotSupported` — both opt-ins flipped on, but the wire client doesn't
implement the kind being written (e.g. older transport variant). F4-a
wired the generic dispatch; F4-b adds typed `WriteParameterAsync` /
`WriteMacroAsync` entry points whose default impls return
`BadNotSupported` so transports compiled against a stale `IFocasClient`
surface still build.
- `BadNodeIdUnknown` — full-reference doesn't match any configured
`FocasTagDefinition.Name`.
- `BadCommunicationError` — wire failure (DLL not loaded, IPC peer dead,
etc.).
### CLI bypass
`otopcua-focas-cli write` ([`docs/Driver.FOCAS.Cli.md`](../Driver.FOCAS.Cli.md))
sets `Writes.Enabled=true` locally for the lifetime of one invocation
because the CLI is a per-operator tool — not a long-lived process bound to
the central config DB. The server-side flag is untouched; configure-the-
server code paths remain safer-by-default.
@@ -35,14 +35,13 @@ Multi-project test topology:
## How tests skip
- **E2E parity**: `ParityFixture.SkipIfUnavailable()` runs at class init and
checks Windows-only, ZB SQL reachable on `localhost:1433`, Host EXE built
in the expected `bin/` folder. Any miss → tests skip.
checks Windows-only, non-admin user, ZB SQL reachable on
`localhost:1433`, Host EXE built in the expected `bin/` folder. Any miss
→ tests skip.
- **Live-smoke** (`GalaxyRepositoryLiveSmokeTests`): `Assert.Skip` when ZB
unreachable. A `per project_galaxy_host_installed` memory on this repo's
dev box notes the MXAccess runtime is installed. The pipe ACL allows the
configured SID outright; elevation of the caller doesn't matter because
the per-connection SID check in `PipeServer.VerifyCaller` only compares
user SIDs (not group membership or integrity level).
dev box notes the MXAccess runtime is installed + pipe ACL denies Admins,
so live tests must run from a non-elevated shell.
- **Unit** tests (Shared, Proxy contract, most Host.Tests) have no skip —
they run anywhere.
+184 -77
View File
@@ -1,104 +1,211 @@
# Galaxy Driver
The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies. It is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` server (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). The gateway owns the MXAccess COM apartment, the STA + Win32 message pump, the Galaxy Repository SQL reader, and the Historian SDK — all the bits that need x86 / .NET Framework 4.8 / COM interop. The driver itself is platform-agnostic and contains no COM, no STA thread, and no x86 bitness constraint.
The Galaxy driver bridges OtOpcUa to AVEVA System Platform (Wonderware) Galaxies through the `ArchestrA.MxAccess` COM API plus the Galaxy Repository SQL database. It is one driver of seven in the OtOpcUa platform (see [drivers/README.md](README.md) for the full list); all other drivers run in-process in the main Server (.NET 10 x64). Galaxy is the exception — it runs as its own Windows service and talks to the Server over a local named pipe.
For the driver spec (capability surface, config shape, addressing), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md). For the gateway setup recipe, see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). For tracing, metrics, and soak profile, see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md).
For the decision record on why Galaxy is out-of-process and how the refactor was staged, see [docs/v2/plan.md §4 Galaxy/MXAccess as Out-of-Process Driver](../v2/plan.md). For the full driver spec (addressing, data-type map, config shape), see [docs/v2/driver-specs.md §1](../v2/driver-specs.md).
> **Note**: the related drivers `Galaxy-Repository.md` and `Galaxy-Test-Fixture.md` describe the previous v1 / out-of-process topology and are being moved to `docs/v1/` by a parallel cleanup track. Use `Galaxy.ParityRig.md` and the `mxaccessgw` repo for current testing.
## Project Split
## Architecture
Galaxy ships as three projects:
```
+---------------------------------------+
| OtOpcUa.Server (.NET 10 AnyCPU) |
| GalaxyDriver (in-process) |
| ITagDiscovery / IReadable / |
| IWritable / ISubscribable / |
| IRediscoverable / |
| IHostConnectivityProbe / |
| IAlarmSource |
+-------------------+-------------------+
|
gRPC (default http://localhost:5120)
|
v
+---------------------------------------+
| mxaccessgw (sibling repo) |
| +-------------------------------+ |
| | MxGateway.Worker (x86 net48) | |
| | STA + WM_APP pump | |
| | ArchestrA.MxAccess COM | |
| | Galaxy Repository SQL | |
| | Wonderware Historian SDK | |
| +-------------------------------+ |
+---------------------------------------+
| Project | Target | Role |
|---------|--------|------|
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/` | .NET Standard 2.0 | IPC contracts (MessagePack records + `MessageKind` enum) referenced by both sides |
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/` | .NET Framework 4.8 **x86** | Separate Windows service hosting the MXAccess COM objects, STA thread + Win32 message pump, Galaxy Repository reader, Historian SDK, runtime-probe manager |
| `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/` | .NET 10 (matches Server) | `GalaxyProxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IRediscoverable, IHostConnectivityProbe` — loaded in-process by the Server; every call forwards over the pipe to the Host |
The Shared assembly is the **only** contract between the two runtimes. It carries no COM or SDK references so Proxy (net10) can reference it without dragging x86 code into the Server process.
## Why Out-of-Process
Two reasons drive the split, per `docs/v2/plan.md`:
1. **Bitness constraint.** MXAccess is 32-bit COM only — `ArchestrA.MxAccess.dll` in `Program Files (x86)\ArchestrA\Framework\bin` has no 64-bit variant. The main OtOpcUa Server is .NET 10 x64 (the OPC Foundation stack, SqlClient, and every other non-Galaxy driver target 64-bit). In-process hosting would force the whole Server to x86, which every other driver project would then inherit.
2. **Tier-C stability isolation.** Galaxy is classified Tier C in [docs/v2/driver-stability.md](../v2/driver-stability.md) — the COM runtime, STA thread, Aveva Historian SDK, and SQL queries all have crash/hang modes that can take down the hosting process. Isolating the driver in its own Windows service means a COM deadlock, AccessViolation in an unmanaged Historian DLL, or a runaway SQL query never takes the Server endpoint down. The Proxy-side supervisor restarts the Host with crash-loop circuit-breaker.
The same Tier-C isolation story applies to FOCAS (decision record in `docs/v2/plan.md` §7), which is the second out-of-process driver.
## IPC Transport
`GalaxyProxyDriver``GalaxyIpcClient` → named pipe → `Galaxy.Host` pipe server.
- Pipe name: `otopcua-galaxy-{DriverInstanceId}` (localhost-only, no TCP surface)
- Wire format: MessagePack-CSharp, length-prefixed frames
- ACL: pipe is created with a DACL that grants only the Server's service identity; the Admins group is explicitly denied so a live-smoke test running from an elevated shell fails fast rather than silently bypassing the handshake
- Handshake: Proxy presents a shared secret at `OpenSessionRequest`; Host rejects anything else with `MessageKind.OpenSessionResponse{Success=false}`
- Heartbeat: Proxy sends a periodic ping; missed heartbeats trigger the Proxy-side crash-loop supervisor to restart the Host
Every capability call on `GalaxyProxyDriver` (Read, Write, Subscribe, HistoryRead*, etc.) serializes a `*Request`, awaits the matching `*Response` via a `CallAsync<TReq, TResp>` helper, and rehydrates the result into the `Core.Abstractions` shape the Server expects.
## STA Thread Requirement (Host-side)
MXAccess COM objects — `LMXProxyServer` instantiation, `Register`, `AddItem`, `AdviseSupervisory`, `Write`, and cleanup calls — must all execute on the same Single-Threaded Apartment. Calling a COM object from the wrong thread causes marshalling failures or silent data corruption.
`StaComThread` in the Host provides that thread with the apartment state set before the thread starts:
```csharp
_thread = new Thread(ThreadEntry) { Name = "MxAccess-STA", IsBackground = true };
_thread.SetApartmentState(ApartmentState.STA);
```
History reads moved server-side in PR 7.2 (`IHistoryRouter`). Galaxy no longer implements `IHistoryProvider` of its own.
Work items queue via `RunAsync(Action)` or `RunAsync<T>(Func<T>)` into a `ConcurrentQueue<Action>` and post `WM_APP` to wake the pump. Each work item is wrapped in a `TaskCompletionSource` so callers can `await` the result from any thread — including the IPC handler thread that receives the inbound pipe request.
`IAlarmSource` was retired with PR 7.2 and **restored in PR B.2** of the
alarms-over-gateway epic ([docs/plans/alarms-over-gateway.md](../plans/alarms-over-gateway.md)).
Alarm transitions arrive on the same gateway `StreamEvents` channel as
data-change events under the new `MX_EVENT_FAMILY_ON_ALARM_TRANSITION`
family; acknowledgements route through the gateway's
`AcknowledgeAlarm` RPC. The previous value-driven sub-attribute path
remains as a fallback for Galaxy templates without `$Alarm*`
extensions — the server-side `AlarmConditionService` dedups when both
paths fire on the same condition. See [docs/AlarmTracking.md](../AlarmTracking.md)
for the v2-final architecture.
## Win32 Message Pump (Host-side)
## Project Layout
COM callbacks (`OnDataChange`, `OnWriteComplete`) are delivered through the Windows message loop. `StaComThread` runs a standard Win32 message pump via P/Invoke:
The driver ships as a single project: `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/` (.NET 10, AnyCPU). Sub-folders:
1. `PeekMessage` primes the message queue (required before `PostThreadMessage` works)
2. `GetMessage` blocks until a message arrives
3. `WM_APP` drains the work queue
4. `WM_APP + 1` drains the queue and posts `WM_QUIT` to exit the loop
5. All other messages go through `TranslateMessage` / `DispatchMessage` for COM callback delivery
| Folder | Role |
|--------|------|
| `Browse/` | Static-side discovery: `GalaxyDiscoverer` walks the gateway's hierarchy + attribute-set RPCs, `DataTypeMap` and `SecurityMap` translate Galaxy types and security classifications into OPC UA equivalents, `AlarmRefBuilder` extracts alarm-bearing attribute references for the server-layer alarm engine. `IGalaxyHierarchySource` + `GatewayGalaxyHierarchySource` + `TracedGalaxyHierarchySource` decorate the gateway browse RPC; `IGalaxyDeployWatchSource` + `GatewayGalaxyDeployWatchSource` + `DeployWatcher` drive `IRediscoverable`. |
| `Runtime/` | Live data path: `EventPump` runs the gateway's `StreamEvents` RPC and fans out to subscribers via a bounded channel; `GalaxyMxSession` is the read-side handle; `GatewayGalaxySubscriber` + `GatewayGalaxyDataWriter` (each with a `Traced*` decorator) implement `ISubscribable` / `IWritable`; `SubscriptionRegistry` tracks subscription state for replay; `ReconnectSupervisor` owns the backoff loop and triggers `ReplaySubscriptions` on session loss; `StatusCodeMap` translates gateway StatusCodes to OPC UA; `MxValueDecoder` / `MxValueEncoder` handle scalar + array marshalling; `GalaxyTelemetry` + `GalaxySubscriptionHandle` round out the surface. |
| `Health/` | `HostStatusAggregator` rolls per-platform probe state into the driver's `IHostConnectivityProbe` view; `PerPlatformProbeWatcher` listens on the gateway's per-host status stream; `HostConnectivityForwarder` pushes transitions out to the server's connectivity bus. |
| `Config/` | `GalaxyDriverOptions` and the four nested option records (`GalaxyGatewayOptions`, `GalaxyMxAccessOptions`, `GalaxyRepositoryOptions`, `GalaxyReconnectOptions`). |
Without this pump MXAccess callbacks never fire and the driver delivers no live data.
Project root files:
## LMXProxyServer COM Object
- `GalaxyDriver.cs``IDriver` + capability-interface implementation; composes the Browse / Runtime / Health collaborators.
- `GalaxyDriverFactoryExtensions.cs` — DI registration helper used by the server's driver bootstrap.
`MxProxyAdapter` wraps the real `ArchestrA.MxAccess.LMXProxyServer` COM object behind the `IMxProxy` interface so Host unit tests can substitute a fake proxy without requiring the ArchestrA runtime. Lifecycle:
## Capability Surface
1. **`Register(clientName)`** — Creates a new `LMXProxyServer` instance, wires up `OnDataChange` and `OnWriteComplete` event handlers, calls `Register` to obtain a connection handle
2. **`Unregister(handle)`** — Unwires event handlers, calls `Unregister`, releases the COM object via `Marshal.ReleaseComObject`
`GalaxyDriver : IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IRediscoverable, IHostConnectivityProbe, IDisposable`.
## Register / AddItem / AdviseSupervisory Pattern
| Capability | Implementation entry point |
|------------|---------------------------|
| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs` |
| `IRediscoverable` | `Browse/DeployWatcher.cs` |
| `IReadable` | `Runtime/GalaxyMxSession.cs` |
| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` |
| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (driven by `EventPump`) |
| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs` |
Every MXAccess data operation follows a three-step pattern, all executed on the STA thread:
## Configuration
1. **`AddItem(handle, address)`** — Resolves a Galaxy tag reference (e.g., `TestMachine_001.MachineID`) to an integer item handle
2. **`AdviseSupervisory(handle, itemHandle)`** — Subscribes the item for supervisory data-change callbacks
3. The runtime begins delivering `OnDataChange` events
`DriverConfig` JSON binds to `Config/GalaxyDriverOptions.cs`. The four sections are:
For writes, after `AddItem` + `AdviseSupervisory`, `Write(handle, itemHandle, value, securityClassification)` sends the value; `OnWriteComplete` confirms or rejects. Cleanup reverses: `UnAdviseSupervisory` then `RemoveItem`.
- **`Gateway`** — endpoint, API key secret ref, TLS knobs, connect/call/stream timeouts. `StreamTimeoutSeconds = 0` keeps the long-lived `StreamEvents` RPC open for the driver's lifetime.
- **`MxAccess`** — `ClientName` (must be unique per OtOpcUa instance — redundancy pairs enforce uniqueness at install time), `PublishingIntervalMs` (forwarded as `buffered_update_interval_ms` on subscribe), `WriteUserId` for ArchestrA secured-write, `EventPumpChannelCapacity` (default 50_000 — one second of headroom at 50k tags / 1Hz; tune via the `galaxy.events.dropped` metric).
- **`Repository`** — `DiscoverPageSize`, `WatchDeployEvents`.
- **`Reconnect`** — `InitialBackoffMs`, `MaxBackoffMs`, `ReplayOnSessionLost` (calls the gateway's `ReplaySubscriptions` RPC after reconnect rather than re-issuing subscribe-bulk for every tag).
## OnDataChange and OnWriteComplete Callbacks
Full per-field descriptions live in `Config/GalaxyDriverOptions.cs`. The full JSON skeleton is reproduced in [docs/v2/driver-specs.md §1](../v2/driver-specs.md).
### OnDataChange
## Reconnect + Replay
Fired by the COM runtime on the STA thread when a subscribed tag changes. The handler in `MxAccessClient.EventHandlers.cs`:
`ReconnectSupervisor` owns an exponential-backoff loop bounded by `Reconnect.InitialBackoffMs` / `MaxBackoffMs`. On session loss it tears down the gRPC channel, redials, and — when `ReplayOnSessionLost = true` — calls the gateway's `ReplaySubscriptions` RPC with the cached subscription set from `SubscriptionRegistry` instead of re-subscribing tag-by-tag. The gateway's worker then re-issues `AdviseSupervisory` server-side under the apartment lock.
1. Maps the integer `phItemHandle` back to a tag address via `_handleToAddress`
2. Maps the MXAccess quality code to the internal `Quality` enum
3. Checks `MXSTATUS_PROXY` for error details and adjusts quality
4. Converts the timestamp to UTC
5. Constructs a `Vtq` (Value/Timestamp/Quality) and delivers it to:
- The stored per-tag subscription callback
- Any pending one-shot read completions
- The global `OnTagValueChanged` event (consumed by the Host's subscription dispatcher, which packages changes into `DataChangeEventArgs` and forwards them over the pipe to `GalaxyProxyDriver.OnDataChange`)
## Testing
### OnWriteComplete
- **Unit tests**: `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/` — fakes the gateway gRPC surface; covers Browse, Runtime, Health, and Config in isolation.
- **Parity rig + dev-rig walkthrough**: see [docs/v2/Galaxy.ParityRig.md](../v2/Galaxy.ParityRig.md). The rig stands up a real `mxaccessgw` against a live Galaxy and exercises the full read / write / subscribe / rediscover path.
- **Performance + soak**: see [docs/v2/Galaxy.Performance.md](../v2/Galaxy.Performance.md).
Fired when the runtime acknowledges or rejects a write. The handler resolves the pending `TaskCompletionSource<bool>` for the item handle. If `MXSTATUS_PROXY.success == 0` the write is considered failed and the error detail is logged.
## Operational Notes
## Reconnection Logic
- **MXAccess `ClientName` collisions**: two OtOpcUa instances sharing a `ClientName` cause the older Wonderware session to lose subscription state. Redundancy pairs (decision #149) enforce uniqueness via install scripts.
- **Channel saturation**: `galaxy.events.dropped > 0` indicates `EventPump` is back-pressured. Raise `EventPumpChannelCapacity` or investigate downstream slowness in the server-side fan-out.
- **Connectivity surface**: per-platform probe state is exposed through `IHostConnectivityProbe` and aggregated by the server's connectivity bus — there is no driver-private dashboard surface anymore. The Admin UI's Host Status panel is the consumer.
`MxAccessClient` implements automatic reconnection through two mechanisms.
### Monitor loop
`StartMonitor` launches a background task that polls at `MonitorIntervalSeconds`. On each cycle:
- If the state is `Disconnected` or `Error` and `AutoReconnect` is enabled, it calls `ReconnectAsync`
- If connected and a probe tag is configured, it checks the probe staleness threshold
### Reconnect sequence
`ReconnectAsync` performs a full disconnect-then-connect cycle:
1. Increment the reconnect counter
2. `DisconnectAsync` — tear down all active subscriptions (`UnAdviseSupervisory` + `RemoveItem` for each), detach COM event handlers, call `Unregister`, clear all handle mappings
3. `ConnectAsync` — create a fresh `LMXProxyServer`, register, replay all stored subscriptions, re-subscribe the probe tag
Stored subscriptions (`_storedSubscriptions`) persist across reconnects. `ReplayStoredSubscriptionsAsync` iterates the stored entries and calls `AddItem` + `AdviseSupervisory` for each.
## Probe Tag Health Monitoring
A configurable probe tag (e.g., a frequently updating Galaxy attribute) serves as a connection health indicator. After connecting, the client subscribes to the probe tag and records `_lastProbeValueTime` on every `OnDataChange`. The monitor loop compares `DateTime.UtcNow - _lastProbeValueTime` against `ProbeStaleThresholdSeconds`; if the probe has not updated within the window, the connection is assumed stale and a reconnect is forced. This catches scenarios where the COM connection is technically alive but the runtime has stopped delivering data.
## Per-Host Runtime Status Probes (`<Host>.ScanState`)
Separate from the connection-level probe, the driver advises `<HostName>.ScanState` on every deployed `$WinPlatform` and `$AppEngine` in the Galaxy. These probes track per-host runtime state so the Admin UI dashboard can report "this specific Platform / AppEngine is off scan" and the driver can proactively invalidate every OPC UA variable hosted by the stopped object — preventing MXAccess from serving stale Good-quality cached values to clients who read those tags while the host is down.
Enabled by default via `MxAccess.RuntimeStatusProbesEnabled`; see [Configuration](../Configuration.md#mxaccess) for the two config fields.
### How it works
`GalaxyRuntimeProbeManager` lives in `Driver.Galaxy.Host` alongside the rest of the MXAccess code. It is owned by the Host's subscription dispatcher and runs a three-state machine per host (Unknown / Running / Stopped):
1. **Discovery** — After the Host completes `BuildAddressSpace`, the manager filters the hierarchy to rows where `CategoryId == 1` (`$WinPlatform`) or `CategoryId == 3` (`$AppEngine`) and issues `AdviseSupervisory` for `<TagName>.ScanState` on each one. Probes are driver-owned, not ref-counted against client subscriptions, and persist across address-space rebuilds via a `Sync` diff.
2. **Transition predicate** — A probe callback is interpreted as `isRunning = vtq.Quality.IsGood() && vtq.Value is bool b && b`. Everything else (explicit `ScanState = false`, bad quality, communication errors) means **Stopped**.
3. **On-change-only delivery**`ScanState` is delivered only when the value actually changes. A stably Running host may go hours without a callback. `Tick()` does NOT run a starvation check on Running entries — the only time-based transition is **Unknown → Stopped** when the initial callback hasn't arrived within `RuntimeStatusUnknownTimeoutSeconds` (default 15s). This protects against a probe that fails to resolve at all without incorrectly flipping healthy long-running hosts.
4. **Transport gating** — When `IMxAccessClient.State != Connected`, `GetSnapshot()` forces every entry to `Unknown`. The dashboard shows the Connection panel as the primary signal in that case rather than misleading operators with "every host stopped".
5. **Subscribe failure rollback** — If `SubscribeAsync` throws for a new probe (SDK failure, broker rejection, transport error), the manager rolls back both `_byProbe` and `_probeByGobjectId` so the probe never appears in `GetSnapshot()`. Stability review 2026-04-13 Finding 1.
### Subtree quality invalidation on transition
When a host transitions **Running → Stopped**, the probe manager invokes a callback that walks `_hostedVariables[gobjectId]` — the set of every OPC UA variable transitively hosted by that Galaxy object — and sets each variable's `StatusCode` to `BadOutOfService`. **Stopped → Running** calls `ClearHostVariablesBadQuality` to reset each to `Good` so the next on-change MXAccess update repopulates the value.
The hosted-variables map is built once per `BuildAddressSpace` by walking each object's `HostedByGobjectId` chain up to the nearest Platform or Engine ancestor. A variable hosted by an Engine inside a Platform lands in both the Engine's list and the Platform's list, so stopping the Platform transitively invalidates every descendant Engine's variables.
### Read-path short-circuit (`IsTagUnderStoppedHost`)
The Host's Read handler checks `IsTagUnderStoppedHost(tagRef)` (a reverse-index lookup `_hostIdsByTagRef[tagRef]``GalaxyRuntimeProbeManager.IsHostStopped(hostId)`) before the MXAccess round-trip. When the owning host is Stopped, the handler returns a synthesized `DataValue { Value = cachedVar.Value, StatusCode = BadOutOfService }` directly without touching MXAccess. This guarantees clients see a uniform `BadOutOfService` on every descendant tag of a stopped host, regardless of whether they're reading or subscribing.
### Deferred dispatch — the STA deadlock
**Critical**: probe transition callbacks must **not** run synchronously on the STA thread that delivered the `OnDataChange`. `MarkHostVariablesBadQuality` takes the subscription dispatcher lock, which may be held by a worker thread currently inside `Read` waiting on an `_mxAccessClient.ReadAsync()` round-trip that is itself waiting for the STA thread. Classic circular wait — the first real deploy of this feature hung inside 30 seconds from exactly this pattern.
The fix is a deferred-dispatch queue: probe callbacks enqueue the transition onto `ConcurrentQueue<(int GobjectId, bool Stopped)>` and set the existing dispatch signal. The dispatch thread drains the queue inside its existing 100ms `WaitOne` loop — outside any locks held by the STA path — and then calls `MarkHostVariablesBadQuality` / `ClearHostVariablesBadQuality` under its own natural lock acquisition. No circular wait, no STA involvement.
### Dashboard and health surface
- Admin UI **Galaxy Runtime** panel shows per-host state with Name / Kind / State / Since / Last Error columns. Panel color is green (all Running), yellow (any Unknown, none Stopped), red (any Stopped), gray (MXAccess transport disconnected)
- `HealthCheckService.CheckHealth` rolls overall driver health to `Degraded` when any host is Stopped
See [Status Dashboard](../StatusDashboard.md#galaxy-runtime) for the field table and [Configuration](../Configuration.md#mxaccess) for the config fields.
## Request Timeout Safety Backstop
Every sync-over-async site on the OPC UA stack thread that calls into Galaxy (`Read`, `Write`, address-space rebuild probe sync) is wrapped in a bounded `SyncOverAsync.WaitSync(...)` helper with timeout `MxAccess.RequestTimeoutSeconds` (default 30s). Inner `ReadTimeoutSeconds` / `WriteTimeoutSeconds` bounds on the async path are the first line of defense; the outer wrapper is a backstop so a scheduler stall, slow reconnect, or any other non-returning async path cannot park the stack thread indefinitely.
On timeout, the underlying task is **not** cancelled — it runs to completion on the thread pool and is abandoned. This is acceptable because Galaxy IPC clients are shared singletons and the abandoned continuation does not capture request-scoped state. The OPC UA stack receives `StatusCodes.BadTimeout` on the affected operation.
`ConfigurationValidator` enforces `RequestTimeoutSeconds >= 1` and warns when it is set below the inner Read/Write timeouts (operator misconfiguration). Stability review 2026-04-13 Finding 3.
All capability calls at the Server dispatch layer are additionally wrapped by `CapabilityInvoker` (Core/Resilience/) which runs them through a Polly pipeline keyed on `(DriverInstanceId, HostName, DriverCapability)`. `OTOPCUA0001` analyzer enforces the wrap at build time.
## Why Marshal.ReleaseComObject Is Needed
The .NET Framework runtime's garbage collector releases COM references non-deterministically. For MXAccess, delayed release can leave stale COM connections open, preventing clean re-registration. `MxProxyAdapter.Unregister` calls `Marshal.ReleaseComObject(_lmxProxy)` in a `finally` block to immediately drive the COM reference count to zero. This ensures the underlying COM server is freed before a reconnect attempt creates a new instance.
## Tag Discovery and Historical Data
Tag discovery (the Galaxy Repository SQL reader + `LocalPlatform` scope filter) is covered in [Galaxy-Repository.md](Galaxy-Repository.md). The Galaxy driver is `ITagDiscovery` for the Server's bootstrap path and `IRediscoverable` for the on-change-redeploy path.
Historical data access (raw, processed, at-time, events) runs against the Aveva Historian via the `aahClientManaged` SDK and is exposed through the Galaxy driver's `IHistoryProvider` implementation. See [HistoricalDataAccess.md](../HistoricalDataAccess.md).
## Key source files
Host-side (`.NET 4.8 x86`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/`):
- `Backend/MxAccess/StaComThread.cs` — STA thread and Win32 message pump
- `Backend/MxAccess/MxAccessClient.cs` — Core client (partial)
- `Backend/MxAccess/MxAccessClient.Connection.cs` — Connect / disconnect / reconnect
- `Backend/MxAccess/MxAccessClient.Subscription.cs` — Subscribe / unsubscribe / replay
- `Backend/MxAccess/MxAccessClient.ReadWrite.cs` — Read and write operations
- `Backend/MxAccess/MxAccessClient.EventHandlers.cs``OnDataChange` / `OnWriteComplete` handlers
- `Backend/MxAccess/MxAccessClient.Monitor.cs` — Background health monitor
- `Backend/MxAccess/MxProxyAdapter.cs` — COM object wrapper
- `Backend/MxAccess/GalaxyRuntimeProbeManager.cs` — Per-host `ScanState` probes, state machine, `IsHostStopped` lookup
- `Backend/Historian/HistorianDataSource.cs``aahClientManaged` SDK wrapper (see [HistoricalDataAccess.md](../HistoricalDataAccess.md))
- `Ipc/GalaxyIpcServer.cs` — Named-pipe server, message dispatch
- `Domain/IMxAccessClient.cs` — Client interface
Shared (`.NET Standard 2.0`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/`):
- `Contracts/MessageKind.cs` — IPC message kinds (`ReadRequest`, `HistoryReadRequest`, `OpenSessionResponse`, …)
- `Contracts/*.cs` — MessagePack DTOs for every request/response pair
Proxy-side (`.NET 10`, `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/`):
- `GalaxyProxyDriver.cs``IDriver`/`ITagDiscovery`/`IReadable`/`IWritable`/`ISubscribable`/`IAlarmSource`/`IHistoryProvider`/`IRediscoverable`/`IHostConnectivityProbe` implementation; every method forwards via `GalaxyIpcClient`
- `Ipc/GalaxyIpcClient.cs` — Named-pipe client, `CallAsync<TReq, TResp>`, reconnect on broken pipe
- `GalaxyProxySupervisor.cs` — Host-process monitor, crash-loop circuit-breaker, Host relaunch
+40
View File
@@ -47,6 +47,13 @@ the tests mock.
- `OpcUaClientSmokeTests.Client_subscribe_receives_StepUp_data_changes_from_live_server`
real `MonitoredItem` subscription against `ns=3;s=FastUInt1` (ticks every
100 ms); asserts `OnDataChange` fires within 3 s of subscribe
- `OpcUaClientReverseConnectSmokeTests.Driver_accepts_reverse_connect_from_opc_plc_rc_simulator`
reverse-connect (server-initiated) coverage. Driver binds
`opc.tcp://0.0.0.0:4844`, the `opc-plc-rc` docker service dials in via
`--rc opc.tcp://host.docker.internal:4844`, and a Read round-trips over
the inbound socket. Gated on `OPCUA_RC_SIM=1` because the simulator
requires `host.docker.internal` resolution which not every CI runner
exposes.
Wire-level surfaces verified: `IDriver` + `IReadable` + `ISubscribable` +
`IHostConnectivityProbe` (via the Secure Channel exchange).
@@ -159,6 +166,35 @@ Beyond that:
3. **Dedicated historian integration lab** — only path for
historian-specific coverage.
## HistoryRead aggregate coverage
PR-13 (issue #285) extended `HistoryAggregateType` from 5 to ~30 values
matching the OPC UA Part 13 §5 catalog. The mapping itself
(`OpcUaClientDriver.MapAggregateToNodeId`) is unit-tested via
`OpcUaClientAggregateMappingTests`:
- The full enum is swept with `Enum.GetValues<HistoryAggregateType>()`
every value must resolve to a non-null namespace-0 numeric `NodeId`.
- The 25 new aggregates each assert against a reflection-resolved
`Opc.Ua.ObjectIds.AggregateFunction_*` field by name, so a future SDK
upgrade that renames a constant trips the test loudly.
- The original 5 ordinals stay pinned to their pre-PR-13 NodeIds so existing
config files / persisted enums keep working.
This is **the well-known-NodeId test path** — the standard Part 13 NodeIds
are stable across SDK versions; round-tripping each one against a live
upstream is the integration suite's job and doesn't add coverage to the
mapping table itself.
`OpcUaClientAggregateSweepTests` is the integration counterpart. It loops
every enum value against a real opc-plc upstream and asserts the wire path
doesn't crash even when the simulator returns
`BadAggregateNotSupported` for an aggregate it doesn't honour. opc-plc's
default profile doesn't enable HistoryRead on the well-known nodes, so the
test currently `Assert.Skip`s — re-enables when the fixture image is
upgraded to a history-sim profile (`--useslowtypes --ut=10` or similar) and
a known-good historized NodeId is wired into `OpcPlcProfile`.
## Key fixture / config files
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/` — unit tests with
@@ -168,3 +204,7 @@ Beyond that:
- `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/OpcUaServerIntegrationTests.cs`
the server-side integration harness a future loopback client test could
piggyback on
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.Tests/OpcUaClientAggregateMappingTests.cs`
— Part 13 aggregate enum-to-NodeId mapping coverage (PR-13)
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcUaClientAggregateSweepTests.cs`
— wire-side aggregate sweep against opc-plc (build-only scaffold; PR-13)
+350
View File
@@ -0,0 +1,350 @@
# OPC UA Client driver
Tier-A in-process driver that opens a `Session` against a remote OPC UA server
and re-exposes its address space through the local OtOpcUa server. The
"gateway / aggregation" direction — opposite to the usual "server exposes PLC
data" flow.
For the test fixture (opc-plc) see [`OpcUaClient-Test-Fixture.md`](OpcUaClient-Test-Fixture.md).
For the configuration surface see `OpcUaClientDriverOptions` in
[`src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs`](../../src/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient/OpcUaClientDriverOptions.cs).
## Auto re-import on `ModelChangeEvent`
The driver subscribes to `BaseModelChangeEventType` (and its subtype
`GeneralModelChangeEventType`) on the upstream `Server` node (`i=2253`) at
the end of `InitializeAsync`. When the upstream server advertises a
topology change, the driver coalesces events over a debounce window and
runs a single re-import (equivalent to calling `ReinitializeAsync`
internally `ShutdownAsync` + `InitializeAsync`).
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `WatchModelChanges` | `true` | Disable to skip the watch entirely (no extra subscription, no re-import on topology change). |
| `ModelChangeDebounce` | `5s` | Coalescing window. The first event starts the timer; further events extend it; when it elapses with no new events, the driver fires one re-import. |
### Behaviour
- One model-change subscription per driver instance, separate from the
data + alarm subscriptions. Created best-effort: a server that doesn't
advertise the event types or rejects the `EventFilter` falls through to
no-watch — `InitializeAsync` still succeeds.
- The `EventFilter` selects only the `EventType` field (a `WhereClause`
constrains by `OfType BaseModelChangeEventType`). Payload fields like
`Changes[]` are intentionally ignored: the driver always re-imports the
full upstream root, so per-event delta tracking would just add wire
overhead.
- Debounce is implemented via a single-shot `Timer`; every event calls
`Timer.Change(window, Infinite)` so a burst of N events triggers exactly
one re-import after the window elapses with no further events.
- The re-import path acquires the same `_gate` semaphore that `ReadAsync`
/ `WriteAsync` / `BrowseAsync` / `SubscribeAsync` use. Downstream callers
see a brief browse-gap (≈ the upstream `DiscoverAsync` duration) while
the gate is held — but no torn reads or split-batch writes.
- Failure during the re-import is best-effort: the next `ModelChangeEvent`
triggers another attempt, and the keep-alive watchdog covers permanent
upstream loss. Operators see failures through `DriverHealth.LastError`
+ the diagnostics counters.
### When to disable
Flip `WatchModelChanges` to `false` when:
- The upstream topology is known-static (e.g. firmware-pinned PLC) and
the driver should never run a re-import unprompted.
- The brief browse-gap during re-import is unacceptable and a manual
`ReinitializeAsync` call from the operator is preferred.
- The upstream server fires spurious `ModelChangeEvent`s that don't
reflect real topology changes, causing wasted re-imports. Tighten or
disable rather than chasing the noise downstream.
## Reverse Connect (server-initiated)
OPC UA's reverse-connect mode flips the transport direction: instead of the
client dialling the server, the **server** dials the client's listener. The
upstream sends a `ReverseHello` and the client continues the OPC UA
handshake on the inbound socket. Required for OT-DMZ deployments where the
plant firewall only permits outbound traffic from the upstream — the
gateway opens a listener, the upstream reaches out.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `ReverseConnect.Enabled` | `false` | Opt-in. When `true`, replaces the failover dial-sweep with a `WaitForConnection` call. |
| `ReverseConnect.ListenerUrl` | `null` | Local listener URL the SDK binds. Typically `opc.tcp://0.0.0.0:4844` (any interface) or a specific NIC for multi-homed gateways. **Required when `Enabled` is `true`.** |
| `ReverseConnect.ExpectedServerUri` | `null` | Upstream's `ApplicationUri` to filter inbound dials. `null` accepts the first connection (only safe with one upstream targeting the listener). |
### Shared listener (singleton)
A single underlying `Opc.Ua.Client.ReverseConnectManager` per process keyed
on `ListenerUrl`. Two driver instances that share a listener URL multiplex
onto one TCP socket; the SDK demuxes inbound dials by the upstream's
reported `ServerUri`. The wrapper (`ReverseConnectListener`) is
reference-counted — first `Acquire` binds the port, last `Release` tears it
down. Letting drivers come and go independently without races on
port-bind / port-unbind.
When two drivers share a listener:
- They MUST set `ExpectedServerUri` to disambiguate; otherwise the first
upstream to dial in wins regardless of which driver is waiting.
- They CAN come and go independently; the listener stays alive while at
least one driver references it.
### Behaviour
- The dial path is bypassed entirely when `Enabled` is `true`. Failover
across multiple `EndpointUrls` doesn't apply — there's no client-side
dial to fail over.
- `ExpectedServerUri` is the SDK's filter parameter to `WaitForConnectionAsync`.
Inbound `ReverseHello`s from a different upstream are ignored and the
caller keeps waiting.
- The same `EndpointDescription` derivation runs as the dial path — the
first `EndpointUrl` in the candidate list seeds `SecurityPolicy` /
`SecurityMode` / `EndpointUrl` for the session-create call. The actual
endpoint lives on the upstream and the SDK reconciles after the
`ReverseHello`.
- Cancellation: `Timeout` bounds the wait. A stuck listener with no inbound
dial throws after `Timeout` rather than hanging init forever.
- Shutdown releases the listener reference. The last release stops the
listener so the port can be re-bound by a future driver lifecycle.
### Wiring it up on the upstream
The upstream OPC UA server has to be configured to dial out. The `opc-plc`
simulator does this with `--rc=opc.tcp://<gateway-host>:4844`; for a real
upstream see your server's reverse-connect docs (most major implementations
expose a "ReverseConnect.Endpoint" config knob).
### When NOT to use
- Standard plant networks where the gateway can dial the upstream — the
conventional dial path is simpler and supports failover natively.
- Public-internet OPC UA: reverse-connect is a network-policy workaround,
not a security primitive. Always pair with `Sign` or `SignAndEncrypt`
+ a vetted user-token policy.
## HistoryRead Events
The driver passes through OPC UA `HistoryReadEvents` to the upstream server.
HistoryRead Raw / Processed / AtTime ship in the same code path
(`ExecuteHistoryReadAsync`); event history takes a slightly different shape
because the client sends an `EventFilter` (SelectClauses + WhereClause) rather
than a plain numeric / time-based detail block.
### Wire path
`IHistoryProvider.ReadEventsAsync(fullReference, EventHistoryRequest, ct)`
translates to:
```
new ReadEventDetails {
StartTime,
EndTime,
NumValuesPerNode,
Filter = EventFilter { SelectClauses, WhereClause }
}
```
…and is sent through `Session.HistoryReadAsync` to the upstream server. The
returned `HistoryEvent.Events` collection (one `HistoryEventFieldList` per
historical event) is unwrapped into `HistoricalEventBatch.Events`, where each
`HistoricalEventRow.Fields` dictionary is keyed by the
`SimpleAttributeSpec.FieldName` the caller supplied. The server-side history
dispatcher uses those keys to align fields with the wire-side SelectClause
order — drivers don't have to honour the entire OPC UA `EventFilter` shape
verbatim.
### SelectClauses
When `EventHistoryRequest.SelectClauses` is `null` the driver falls back to a
default set that matches `BuildHistoryEvent` on the server side:
| Field | Browse path | Notes |
| --- | --- | --- |
| `EventId` | `EventId` | BaseEventType — stable unique id. |
| `SourceName` | `SourceName` | Source-object name. |
| `Time` | `Time` | Process-side event timestamp. Used for `OccurrenceTime`. |
| `Message` | `Message` | LocalizedText payload. |
| `Severity` | `Severity` | OPC UA 1-1000 scale. |
| `ReceiveTime` | `ReceiveTime` | Server-side ingest timestamp. |
Custom SelectClauses are supported — pass any
`IReadOnlyList<SimpleAttributeSpec>`. Each entry's `TypeDefinitionId`
defaults to `BaseEventType` when `null`; pass an explicit NodeId text (e.g.
`"i=2782"` for `ConditionType`) to reach typed-condition fields.
### WhereClause
`ContentFilterSpec.EncodedOperands` carries the binary-encoded
`ContentFilter` from the wire. The driver decodes it into the SDK
`ContentFilter` and attaches it to the outgoing `EventFilter` verbatim — the
OPC UA Client driver is a passthrough for filter semantics, it does not
evaluate them. A malformed filter is dropped silently; the SelectClause
projection still goes out.
### Continuation points
Returned in `HistoricalEventBatch.ContinuationPoint`. The server-side
HistoryRead facade is responsible for round-tripping these so a paged event
read against a chatty upstream completes incrementally. The driver itself
doesn't track them — every `ReadEventsAsync` call issues a fresh
`HistoryReadAsync`.
## HistoryRead Aggregates (Part 13 catalog)
`IHistoryProvider.ReadProcessedAsync` takes a `HistoryAggregateType` and the
driver maps it to the standard `Opc.Ua.ObjectIds.AggregateFunction_*` NodeId
in `MapAggregateToNodeId`. PR-13 (issue #285) extended the enum from the
original 5 values (Average / Minimum / Maximum / Total / Count) to the full
OPC UA Part 13 §5 catalog — ~30 aggregates.
The mapping is best-effort: not every upstream OPC UA server implements every
aggregate. Aggregates the upstream rejects come back with
`StatusCode=BadAggregateNotSupported` on the per-row HistoryRead result; the
driver passes that through verbatim (cascading-quality rule, Part 11 §8) — it
does not throw. Servers advertise the aggregates they support via the
`AggregateConfiguration` object on the `Server` node; clients can probe it at
runtime.
### Catalog
| Enum value | SDK NodeId field | Part 13 § | Server-side support | Typical use |
| --- | --- | --- | --- | --- |
| `Average` | `AggregateFunction_Average` | §5.4 | almost always | smoothing |
| `Minimum` | `AggregateFunction_Minimum` | §5.5 | almost always | low watermark |
| `Maximum` | `AggregateFunction_Maximum` | §5.6 | almost always | high watermark |
| `Total` | `AggregateFunction_Total` | §5.10 | usually | totalisation |
| `Count` | `AggregateFunction_Count` | §5.18 | almost always | sample count |
| `TimeAverage` | `AggregateFunction_TimeAverage` | §5.4.2 | usually | time-weighted mean |
| `TimeAverage2` | `AggregateFunction_TimeAverage2` | §5.4.3 | sometimes | bounded time-weighted mean |
| `Interpolative` | `AggregateFunction_Interpolative` | §5.3 | usually | trend snapshot |
| `MinimumActualTime` | `AggregateFunction_MinimumActualTime` | §5.5.4 | sometimes | when low occurred |
| `MaximumActualTime` | `AggregateFunction_MaximumActualTime` | §5.6.4 | sometimes | when high occurred |
| `Range` | `AggregateFunction_Range` | §5.7 | usually | spread |
| `Range2` | `AggregateFunction_Range2` | §5.7 | sometimes | bounded spread |
| `AnnotationCount` | `AggregateFunction_AnnotationCount` | §5.21 | rarely | operator notes |
| `DurationGood` | `AggregateFunction_DurationGood` | §5.16 | sometimes | quality coverage |
| `DurationBad` | `AggregateFunction_DurationBad` | §5.16 | sometimes | gap accounting |
| `PercentGood` | `AggregateFunction_PercentGood` | §5.17 | sometimes | quality % |
| `PercentBad` | `AggregateFunction_PercentBad` | §5.17 | sometimes | gap % |
| `WorstQuality` | `AggregateFunction_WorstQuality` | §5.20 | sometimes | worst seen |
| `WorstQuality2` | `AggregateFunction_WorstQuality2` | §5.20 | rarely | bounded worst |
| `StandardDeviationSample` | `AggregateFunction_StandardDeviationSample` | §5.13 | sometimes | n-1 stddev |
| `StandardDeviationPopulation` | `AggregateFunction_StandardDeviationPopulation` | §5.13 | sometimes | n stddev |
| `VarianceSample` | `AggregateFunction_VarianceSample` | §5.13 | sometimes | n-1 variance |
| `VariancePopulation` | `AggregateFunction_VariancePopulation` | §5.13 | sometimes | n variance |
| `NumberOfTransitions` | `AggregateFunction_NumberOfTransitions` | §5.12 | sometimes | event count |
| `DurationInStateZero` | `AggregateFunction_DurationInStateZero` | §5.19 | sometimes | OFF time |
| `DurationInStateNonZero` | `AggregateFunction_DurationInStateNonZero` | §5.19 | sometimes | ON time |
| `Start` | `AggregateFunction_Start` | §5.8 | usually | first sample |
| `End` | `AggregateFunction_End` | §5.9 | usually | last sample |
| `Delta` | `AggregateFunction_Delta` | §5.11 | usually | end-start |
| `StartBound` | `AggregateFunction_StartBound` | §5.8 | sometimes | extrapolated start |
| `EndBound` | `AggregateFunction_EndBound` | §5.9 | sometimes | extrapolated end |
"Server-side support" is heuristic — see your upstream's `AggregateConfiguration`
node for the authoritative list. AVEVA Historian, KEPServerEX, Prosys, and
opc-plc each implement different subsets.
### Driver-side validation
The mapping itself is unit-tested over the full enum
(`OpcUaClientAggregateMappingTests`) — every value resolves to a non-null
namespace-0 NodeId, and the original 5 ordinals stay pinned. Wire-side
behaviour against a live server is exercised by
`OpcUaClientAggregateSweepTests` (build-only scaffold pending an opc-plc
history-sim profile).
## Upstream redundancy (`ServerArray`)
When the upstream OPC UA server is itself a redundant pair (warm or hot per
OPC UA Part 4 §6.6.2), the driver supports **mid-session failover** driven by
the upstream's own `Server.ServerRedundancy.RedundancySupport` +
`ServerUriArray` + `Server.ServiceLevel` nodes. Distinct from the static
boot-time failover sweep on `EndpointUrls`: that path picks a single survivor
at session-create time; this path swaps the active session live when the
upstream signals degradation, transferring subscriptions onto the secondary so
monitored-item handles stay valid.
### Configuration
| Option | Default | Notes |
| --- | --- | --- |
| `Redundancy.Enabled` | `false` | Opt-in. When `false`, the driver doesn't read `RedundancySupport` / `ServerUriArray` and doesn't subscribe to `ServiceLevel`. |
| `Redundancy.ServiceLevelThreshold` | `200` | Byte value below which the driver triggers failover. OPC UA spec convention: 200+ = healthy primary, 100..199 = degraded, 0..99 = unrecoverable. |
| `Redundancy.RecheckInterval` | `5s` | Lower bound between two consecutive failovers — suppresses oscillation when ServiceLevel flaps around the threshold. |
### Behaviour
- At session activation the driver reads
`Server.ServerRedundancy.RedundancySupport`. When `None`, the driver records
an empty peer list and the failover path becomes a no-op (`ServiceLevel`
drops are still observable via diagnostics but trigger nothing).
- When the upstream advertises `Cold` / `Warm` / `WarmActive` / `Hot`, the
driver pulls `Server.ServerRedundancy.ServerUriArray` for the peer list,
falling back to the top-level `Server.ServerArray` for legacy upstreams that
don't expose the redundancy node.
- A dedicated subscription on `Server.ServiceLevel` (publish interval 1s,
separate from the alarm + data subscriptions) drives every failover decision
via the SDK's notification path — no polling loop.
- On a drop below `ServiceLevelThreshold` the driver picks the next URI in the
peer list that isn't the active one, opens a parallel session against it,
and calls `Session.TransferSubscriptionsAsync(other, sendInitialValues:true)`
to migrate every live subscription (data + alarm + model-change +
service-level itself). On success the driver swaps `Session`, closes the
old one, and bumps `RedundancyFailoverCount`.
- On any failure (`BadSecureChannelClosed`, `BadCertificateUntrusted`,
`TransferSubscriptions` returning `false`, secondary unreachable) the driver
leaves the existing session untouched, increments
`RedundancyFailoverFailures`, and waits for the next ServiceLevel
notification. The keep-alive watchdog continues to cover full
upstream-loss scenarios.
### Shared client-cert prerequisite
`TransferSubscriptionsAsync` requires the secondary's secure channel to accept
the same client certificate the primary did. Operators running heterogeneous
secondaries (different cert trust stores) will see `BadCertificateUntrusted`
on every failover attempt and the failures counter climbing. The fix is to
push the gateway driver's application-instance certificate into both
upstreams' `TrustedPeerCertificates` store before enabling redundancy. A
follow-up adds a fallback path that re-creates subscriptions instead of
transferring when the secondary rejects the channel.
### Diagnostics
The `driver-diagnostics` RPC surfaces three new counters via
`DriverHealth.Diagnostics`:
| Key | Type | Notes |
| --- | --- | --- |
| `RedundancyFailoverCount` | `double` (long-counted) | Successful mid-session swaps since driver start. |
| `RedundancyFailoverFailures` | `double` (long-counted) | Swap attempts that bailed (TransferSubscriptions false, secondary unreachable, etc.). |
| `ActiveServerUri` | string (in `OpcUaClientDiagnostics.ActiveServerUri`) | URI of the upstream the driver is currently bound to. Updates on every successful failover. |
### Forced-failover runbook
To validate the wiring against a real redundant upstream pair:
1. Confirm the upstream advertises `RedundancySupport != None` and a
non-empty `ServerUriArray`. Use the Client CLI:
`dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- redundancy -u <primary>`.
2. Set `Redundancy.Enabled = true` on the gateway's `OpcUaClient` driver
instance and restart.
3. Tail driver diagnostics:
`driver-diagnostics --instance <id>` — note `RedundancyFailoverCount = 0`
pre-test.
4. Drive a `ServiceLevel` drop on the primary. On AVEVA / KEPServer this is
typically a "force standby" Admin action; on a custom server it's a write
to the simulated ServiceLevel node.
5. Observe `RedundancyFailoverCount = 1` within `RecheckInterval` of the
drop, the gateway's `HostName` swap to the secondary URI, and downstream
reads/subscriptions continuing without interruption.
For non-redundant upstreams (single-server deployments) the recommended
configuration is to leave `Redundancy.Enabled = false` and rely on
`EndpointUrls` for boot-time failover only.
+1 -4
View File
@@ -26,7 +26,7 @@ Driver type metadata is registered at startup in `DriverTypeRegistry` (`src/ZB.M
| AB CIP | `Driver.AbCip` | A | libplctag CIP | IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IPerCallHostResolver, IAlarmSource | ControlLogix / CompactLogix. Tag discovery uses the `@tags` walker to enumerate controller-scoped + program-scoped symbols; UDT member resolution via the UDT template reader |
| AB Legacy | `Driver.AbLegacy` | A | libplctag PCCC | IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IPerCallHostResolver | SLC 500 / MicroLogix. File-based addressing (`N7:0`, `F8:0`) — no symbol table, tag list is user-authored in the config DB |
| TwinCAT | `Driver.TwinCAT` | B | Beckhoff `TwinCAT.Ads` (`TcAdsClient`) | IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IPerCallHostResolver | The only native-notification driver outside Galaxy — ADS delivers `ValueChangedCallback` events the driver forwards straight to `ISubscribable.OnDataChange` without polling. Symbol tree uploaded via `SymbolLoaderFactory` |
| [FOCAS](FOCAS.md) | `Driver.FOCAS` | A | Pure-managed `FocasWireClient` FOCAS/2 Ethernet binary protocol on TCP:8193, inlined into the driver assembly | IDriver, ITagDiscovery, IReadable, ISubscribable, IHostConnectivityProbe, IPerCallHostResolver, IAlarmSource | Read-only by design (WriteAsync returns `BadNotWritable`). CNC-shaped data model (axes, spindle, PMC, macros, alarms) not a flat tag map. Previously Tier-C (Host + P/Invoke + shim DLL); retired in the 2026-04-24 migration when the managed wire client landed |
| FOCAS | `Driver.FOCAS` | C | FANUC FOCAS2 (`Fwlib32.dll` P/Invoke) | IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IHostConnectivityProbe, IPerCallHostResolver | Tier C — FOCAS DLL has crash modes that warrant process isolation. CNC-shaped data model (axes, spindle, PMC, macros, alarms) not a flat tag map |
| OPC UA Client | `Driver.OpcUaClient` | B | OPCFoundation `Opc.Ua.Client` | IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IHostConnectivityProbe | Gateway/aggregation driver. Opens a single `Session` against a remote OPC UA server and re-exposes its address space. Owns its own `ApplicationConfiguration` (distinct from `Client.Shared`) because it's always-on with keep-alive + `TransferSubscriptions` across SDK reconnect, not an interactive CLI |
## Per-driver documentation
@@ -35,9 +35,6 @@ Driver type metadata is registered at startup in `DriverTypeRegistry` (`src/ZB.M
- [Galaxy.md](Galaxy.md) — COM bridge, STA pump, IPC, runtime probes
- [Galaxy-Repository.md](Galaxy-Repository.md) — ZB SQL reader, `LocalPlatform` scope filter, change detection
- **FOCAS** has a short getting-started doc because the Tier-C two-project deployment + backend-selection env var + alarm projection opt-in all need explaining up front:
- [FOCAS.md](FOCAS.md) — deployment, config, capability surface, alarm projection, troubleshooting
- **All other drivers** share a single per-driver specification in [docs/v2/driver-specs.md](../v2/driver-specs.md) — addressing, data-type maps, connection settings, and quirks live there. That file is the authoritative per-driver reference; this index points at it rather than duplicating.
## Test-fixture coverage maps
+338
View File
@@ -0,0 +1,338 @@
# S7 — TIA Portal CSV & STEP 7 Classic AWL symbol import
PR-S7-D1 / [#299](https://github.com/dohertj2/lmxopcua/issues/299) — bulk-import
TIA Portal "Show all tags" CSV exports and STEP 7 Classic AWL declaration files
into the S7 driver. Saves operators from hand-typing every `%MW0` /
`%DB1.DBW0` row of a several-hundred-tag PLC into `appsettings.json`.
## Supported formats — v1
| Format | Status | Notes |
|---|---|---|
| TIA Portal `.CSV` ("Show all tags" export) | **supported** | Header columns `Name,Path,Data type,Logical address,Comment,Hmi accessible,…`; en-US (`,`) and DE-locale (`;` separator + `,` decimal) auto-detected |
| STEP 7 Classic `.AWL` (`VAR_GLOBAL` + `DATA_BLOCK`) | **supported, best-effort** | Position-based offset assignment (no exact byte offsets in hand-exported AWL — see below) |
| STEP 7 / TIA Portal native binary (`.s7p`, `.zap`) | **out of scope** | Proprietary; no community parser. Use TIA's "Show all tags" CSV export |
| TIA Portal Openness API | **out of scope** | Requires a licensed TIA install + OpenAPI license; future PR |
## TIA Portal CSV column reference
| Column | Required | Notes |
|---|---|---|
| `Name` | yes | OPC UA tag name. TIA symbols are stable across deployments; the importer uses them verbatim |
| `Logical address` (or `Address`) | yes | TIA-style address with leading `%` (e.g. `%MW0`, `%DB1.DBW10`, `%DB1.DBX2.3`). Stripped on import |
| `Data type` | recommended | TIA primitive type (`Int`, `Real`, `Bool`, `String`, …) — drives the imported `S7DataType` |
| `Comment` | no | Parsed but currently unused — `S7TagDefinition` has no `Description` field at the v2 schema layer (see [#248](https://github.com/dohertj2/lmxopcua/issues/248)). Held in the column contract for future schema bumps |
| `Hmi accessible` | no | Filter — rows with `False` / `FALSCH` / `nein` are skipped (internal symbols TIA shows in the editor but doesn't expose to client interfaces). Missing column defaults to `True` |
| `Hmi visible` / `Hmi writeable` | no | Currently unused — held for future Admin-UI-side metadata |
| `Length` | no | For `String` rows: max length. Default 254. Drives `StringLength` on the imported tag |
| `Path` | no | TIA tag-table path (`Default tag table`, custom names). Currently unused; held in the contract |
### TIA `Data type``S7DataType` mapping
| TIA type | Maps to | Notes |
|---|---|---|
| `Bool` | `Bool` | Bit access; address must include a `.bit` suffix |
| `Byte`, `SInt`, `USInt` | `Byte` | 1-byte unsigned/signed |
| `Int` | `Int16` | Signed 16-bit |
| `Word`, `UInt` | `UInt16` | Unsigned 16-bit |
| `DInt` | `Int32` | Signed 32-bit |
| `DWord`, `UDInt` | `UInt32` | Unsigned 32-bit |
| `LInt` | `Int64` | 64-bit signed (S7-1500 only) |
| `LWord`, `ULInt` | `UInt64` | 64-bit unsigned (S7-1500 only) |
| `Real` | `Float32` | IEEE-754 32-bit |
| `LReal` | `Float64` | IEEE-754 64-bit (S7-1500 only) |
| `String` | `String` | S7 STRING with 2-byte header; `Length` column drives `StringLength` |
| `WString` | `WString` | S7 WSTRING (UTF-16BE) |
| `Char` / `WChar` | `Char` / `WChar` | Single-character |
| `Date` | `Date` | UInt16 days since 1990-01-01 |
| `Time` | `Time` | Int32 ms |
| `TOD` / `Time_Of_Day` | `TimeOfDay` | UInt32 ms since midnight |
| `DT` / `Date_And_Time` | `DateAndTime` | 8-byte BCD |
| `DTL` | `Dtl` | 12-byte structured (S7-1200 / S7-1500) |
| `S5Time` | `S5Time` | 16-bit BCD duration |
| `Struct` / quoted UDT name | UDT placeholder | See below |
### UDT placeholders
UDT-typed symbols (TIA `Data type` = `"MyUdt"` quoted, or the literal `Struct`)
import as a **placeholder** — the resulting tag lands in the driver options so
it shows up in the Admin UI tag list, but its data type is forced to `Byte`
and the row is marked `Writable = false`.
`S7ImportResult.UdtPlaceholderCount` tracks how many of the imported tags
landed in this bucket.
#### Cooperation with `Udts` declarations (PR-S7-D2 / #300)
PR-S7-D2 ships UDT fan-out via `S7DriverOptions.Udts` + `S7TagDefinition.UdtName`.
The importer and the `Udts` declaration cooperate as follows:
1. The importer emits a placeholder row for each UDT-typed symbol — same as
today (data type forced to `Byte`, `Writable = false`).
2. The operator hand-edits the placeholder row in the resulting JSON / options
object and:
- Sets `UdtName` to the UDT type name from the TIA "Data type" column
- Removes the `Writable: false` marker (UDT leaves inherit the parent's
writability)
3. The operator declares the matching `S7UdtDefinition` in
`S7DriverOptions.Udts` (member offsets come from the TIA UDT definition
in the project file — TIA's "Show all tags" CSV does not export struct
field offsets, hence the manual layout step).
4. At driver init, the fan-out replaces the placeholder with one scalar leaf
per UDT member.
The importer does NOT auto-populate `Udts` — UDT layouts live in the project
file, not the symbol-table CSV. A future enhancement may parse the SCL UDT
declaration alongside the CSV; for now the cooperation is "importer flags it,
operator declares the layout, driver fans out at init".
See [`docs/v2/s7.md` "UDT / STRUCT support"](../v2/s7.md#udt--struct-support)
for the full fan-out semantics, the 4-level nesting cap, and the
Optimized-block-access prerequisite.
## Instance DBs / FB parameters
PR-S7-D3 / [#301](https://github.com/dohertj2/lmxopcua/issues/301) — multi-instance
Function-Block (FB) instances are addressed symbolically inside the PLC program
(`MyFB_Instance.MyParam`) but the runtime wire access still needs the absolute
`DBn.DBW_offset`. TIA Portal's "Show all tags" CSV export distinguishes these
rows from regular global DBs via the **`DB type`** column.
### `DB type` column convention
| `DB type` value | Meaning | Path |
|---|---|---|
| (empty) | Legacy export — no column at all (TIA pre-v15 / partial export). Treated as Global. | D1 (existing) |
| `Global DB` / `Global` / `Global Data Block` | Standalone DB declared in the project tree. | D1 (existing) |
| `Globaler Datenbaustein` | Same as above, DE locale. | D1 (existing) |
| `Instance DB` / `Instance` / `Instance Data Block` | Multi-instance FB instance. Member tags are the FB's `IN` / `OUT` / `IN_OUT` / `STAT` parameters. | **D3 (new)** |
| `Instance-DB` / `Instanz-DB` / `Instanz-Datenbaustein` | Same as above (locale + dashing variants). | **D3 (new)** |
The `DB type` column is matched case-insensitively; quoting and surrounding
whitespace are tolerated.
### `MyFB_Instance.MyParam``DBn.DBW_offset`
The TIA Portal export ships the **resolved absolute address** in the
`Logical address` column for every instance-DB member — TIA itself walks the FB
interface declaration at export time and writes out the byte-offset-anchored
address verbatim. The importer accepts these rows the same way as a Global-DB
row, with two differences:
1. The row counts under `S7ImportResult.InstanceDbCount` (a sub-counter of
`ParsedCount`) so the operator can see how much of the import depends on the
FB-interface layout.
2. The row is rejected from the UDT placeholder path even if the data type
column happens to match a UDT name pattern — instance-DB members always
import as fully-functional scalar tags.
Example fixture row:
```csv
Name,Path,Data type,Logical address,Comment,Hmi accessible,DB type
MotorFB_1.Speed,FB instances,Int,%DB7.DBW0,Speed setpoint,True,Instance DB
```
The imported `S7TagDefinition` ends up with:
```csharp
new S7TagDefinition(
Name: "MotorFB_1.Speed",
Address: "DB7.DBW0",
DataType: S7DataType.Int16,
Writable: true);
```
### Empty-`Logical address` fallback
When TIA exports an instance-DB row with an empty `Logical address` column
(rare in practice — happens when the export was generated against a
not-yet-compiled project), `InstanceDbResolver` can compute the absolute
address from explicit parent-DB / parent-base-offset / member-offset inputs.
This fallback is exposed at the resolver-class level for advanced bootstrap
scenarios; the CSV path itself does not currently parse interface declarations
out of the file (TIA's CSV doesn't carry them).
For now the operator workflow is: re-export from TIA after compiling the
project so every instance-DB row carries a resolved `Logical address`.
### Re-import on FB-interface edit — caveat
When the FB interface changes — a member is added, removed, or reordered in
TIA — the instance-DB layout shifts on the PLC side. Member byte offsets that
worked yesterday point at the wrong word today; absolute-offset addressing has
no in-band schema check.
**The driver does not auto-detect this.** Operators must:
1. Recompile the FB in TIA Portal.
2. Download the updated program to the PLC.
3. **Re-export "Show all tags" CSV** from the updated project.
4. Re-import the CSV via `AddTiaCsvImport` or the `import-symbols` CLI.
5. Restart the driver instance (Admin UI → Drivers → Reload).
A stale import will silently read / write the wrong byte offsets — the values
will look like valid PLC data but reference whichever member used to live at
that offset before the interface edit. There is no runtime guard; this is the
same caveat that applies to all absolute-offset DB addressing on S7-1200 /
1500 (see [`docs/v2/s7.md` "UDT / STRUCT support"](../v2/s7.md#udt--struct-support)
for the parallel UDT-edit story).
A future enhancement may add a project-fingerprint compare at driver init —
hashing the interface offsets at import time and re-checking against a known
PLC system function. Tracked as a follow-up; not in PR-S7-D3.
## DE locale handling
TIA Portal honours the Windows display locale when writing CSV. A DE-locale
install emits:
- Field separator `;` (because `,` is the decimal separator)
- Decimal-comma in addresses: `%MW0,5` rather than `%MW0.5` for bit addresses
- Boolean column values `WAHR` / `FALSCH` rather than `True` / `False`
The importer **auto-detects** the locale from the first non-blank line:
- Field-separator detection: counts `;` vs `,` occurrences in the header
- Decimal-comma detection: scans the first data row's address column for a
digit-comma-digit pattern
- Boolean column values: recognises both languages (`true/false/wahr/falsch/yes/no/ja/nein`,
case-insensitive) plus bare `0`/`1`
The address column is rewritten to en-US shape (`%MW0,5``MW0.5`) before the
strict `S7AddressParser` runs, so the rest of the driver pipeline sees a
single canonical address shape.
## STEP 7 Classic AWL — `VAR_GLOBAL` + `DATA_BLOCK`
Best-effort parser for legacy STEP 7 Classic projects:
- `VAR_GLOBAL … END_VAR` — global memory area declarations. Each entry maps to
a sequential `M{B|W|D}{offset}` address based on declaration order.
- `DATA_BLOCK DBn … END_DATA_BLOCK` — DB declarations. Each field maps to a
`DB{n}.DB{B|W|D}{offset}` address based on declaration order; the DB number
is parsed from the `DATA_BLOCK` line's `DBn` keyword.
### Position-based addressing — heuristic
Real STEP 7 Classic projects carry exact byte offsets in the symbol table /
.gr8 deployment artefact, but a hand-exported AWL file omits them. The
importer assumes:
| Type | Bytes |
|---|---|
| `BOOL` | 1 (rounded up to byte alignment) |
| `BYTE` / `SINT` / `USINT` / `CHAR` | 1 |
| `INT` / `WORD` / `UINT` | 2 |
| `DINT` / `DWORD` / `UDINT` / `REAL` | 4 |
| `LREAL` / `LINT` / `ULINT` / `LWORD` | 8 |
| `STRING[N]` | N + 2 (2-byte header) |
| `STRING` (no length) | 256 |
| `STRUCT` / `Array[…] of …` / quoted UDT name | UDT placeholder (8-bit Byte at next aligned offset) |
S7 alignment rule: offsets round up to a 2-byte boundary for any 16-bit-or-larger
type. Sites needing exact offsets should drive their symbol import from the
TIA Portal CSV path instead — the CSV carries the offsets verbatim.
Comments (`(* ... *)` block, `// ...` line) are stripped before declaration
parsing. Initial-value clauses (`:= 0`) are recognised and discarded.
## CLI subcommand — `import-symbols`
```powershell
otopcua-s7-cli import-symbols --help
```
| Flag | Default | Purpose |
|---|---|---|
| `-f` / `--file` | **required** | Path to the TIA CSV or `.AWL` file |
| `--format` | `tia` | `tia` (CSV) or `awl` (STEP 7 Classic) |
| `-d` / `--device` | none | Optional documentation tag (reserved for symmetry with `import-rslogix`) |
| `--emit` | `appsettings-fragment` | `appsettings-fragment` (JSON) or `summary` (one-line counter) |
| `-o` / `--output` | stdout | Optional path; when set the JSON fragment is written there + summary line goes to stdout |
| `--max-rows` | unlimited | Defensive cap on rows imported |
| `--strict` | off | Fail-fast on the first malformed row (default permissive: skip + log) |
### `appsettings-fragment` output shape
The default `--emit appsettings-fragment` mode writes a JSON object whose
`Tags` array is shaped like the `S7DriverConfigDto.Tags` array — paste
straight into the driver-instance config under
`Drivers/<instance>/Config/Tags`.
```json
{
"Tags": [
{
"Name": "MotorSpeed",
"Address": "MW0",
"DataType": "Int16",
"Writable": true,
"StringLength": 254
},
]
}
```
### Summary line
`--emit summary` writes a single line:
```
Imported 142 tag(s), skipped 3, errors 0, udt-placeholders 5, instance-db 9.
```
`Skipped` covers HMI-accessible-false rows + missing-required-field rows;
`errors` covers rows whose `Address` failed to parse as an S7 address;
`udt-placeholders` covers UDT-typed rows that imported as placeholders;
`instance-db` (PR-S7-D3) covers rows whose `DB type` column tagged them as
multi-instance FB-instance members.
## API surface — `IS7SymbolImporter` + `AddTiaCsvImport` / `AddAwlImport`
For server-side / bootstrap use-cases the importer is reachable via:
```csharp
using ZB.MOM.WW.OtOpcUa.Driver.S7;
using ZB.MOM.WW.OtOpcUa.Driver.S7.SymbolImport;
var options = new S7DriverOptions { Host = "192.168.1.30", CpuType = CpuType.S71500 };
// Append imported tags onto an existing options object.
var updated = options.AddTiaCsvImport(
path: @"C:\plc\tia-export.csv",
out var result);
Console.WriteLine($"Imported {result.ParsedCount} tags ({result.UdtPlaceholderCount} placeholders)");
// AWL variant — same shape.
var withAwl = updated.AddAwlImport(
path: @"C:\plc\classic.awl",
out var awlResult);
```
For a hand-managed importer instance (e.g. supplying a custom `ILogger`) call
`new TiaCsvImporter(logger).Parse(stream, opts)` or
`new AwlImporter(logger).Parse(stream, opts)` directly.
## Operational notes
- The importers are **additive**`AddTiaCsvImport` / `AddAwlImport` concatenate
onto the existing `Tags` list rather than replacing it. Hand-rolled tags
(system-status variables, computed fields the operator added by hand) survive
a re-import.
- Re-imports are not idempotent — calling `AddTiaCsvImport` twice will produce
duplicate tag rows. Operators are expected to start from a clean options
object or de-duplicate themselves; a future schema rev may add a
`replace=true` switch.
- UDT placeholders surface in the Admin UI as non-writable Byte tags. PR-S7-D2
added the runtime UDT fan-out (`S7DriverOptions.Udts` + `S7TagDefinition.UdtName`)
— operators upgrade a placeholder row by setting `UdtName` and declaring the
matching `S7UdtDefinition`; see "Cooperation with `Udts` declarations" above.
Placeholder-only rows still work as a Byte view of the first byte but
can't browse / read their members until the layout is declared.
- Description metadata is dropped on the floor today — see the column
reference above. When [#248](https://github.com/dohertj2/lmxopcua/issues/248)
lands a `Description` field on `S7TagDefinition` the importer will start
populating it without further changes to the CSV contract.
+106 -3
View File
@@ -44,6 +44,10 @@ The driver ctor change that made this possible:
bool-with-bit in one batch call; proves typed decode per S7DataType
- `S7_1500SmokeTests.Driver_write_then_read_round_trip_on_scratch_word`
`DB1.DBW100` write → read-back; proves write path + buffer visibility
- `S7_1500DiagnosticsTests.Driver_exposes_negotiated_pdu_size_post_init`
asserts `DriverHealth.Diagnostics["S7.NegotiatedPduSize"]` is non-zero
after `InitializeAsync`; proves the negotiated PDU size surfaces in
driver health (Snap7 fixture pins this at 240 bytes — see fixture README)
### Unit
@@ -84,10 +88,59 @@ real PLC latency is not exercised.
S7-1200 vs S7-1500 vs S7-300/400 connection semantics (PG vs OP vs S7-Basic)
not differentiated at test time.
**Optimized DB / S7Plus** is the variant-shaped gap with the biggest field
impact. snap7 happens to behave like a classic-S7comm-only PLC, so the
integration suite cannot reproduce the shape that an S7-1500 with default
"Optimized block access" checked would return (`BadDeviceFailure` on every
absolute-offset read). The decision is documented at
[`docs/v2/s7.md` § Optimized DB constraint (S7Plus)](../v2/s7.md#optimized-db-constraint-s7plus)
and tracked in [`docs/featuregaps.md`](../featuregaps.md) row #1; the
project ships **Track 1** (operator unchecks Optimized block access in TIA
Portal) and **Track 3** (bridge via the `OpcUaClient` driver against the
CPU's onboard OPC UA server). A custom S7Plus implementation is out of
scope.
### 5. Data types beyond the scalars
UDT fan-out, `STRING` with length-prefix quirks, `DTL` / `DATE_AND_TIME`,
arrays of structs — not covered.
`STRING` with length-prefix quirks, `DTL` / `DATE_AND_TIME`, arrays of
structs — not covered. UDT fan-out IS covered (PR-S7-D2 / #300) via the
`udt_layout` meta-seed in `Docker/profiles/s7_1500.json` and the
`Driver_fans_out_udt_into_member_tags` integration test.
### 6. SZL (System Status List) — `@System.*` virtual addresses
PR-S7-E1 / [#302](https://github.com/dohertj2/dohertj2/lmxopcua/issues/302)
adds a virtual `@System.*` address surface (CPU type, firmware, scan-cycle
stats, diagnostic-buffer ring) backed by SZL reads. **snap7 does not
implement SZL** — the simulator answers every SZL request with a function-
not-supported error, so the integration profile exercises only the
not-supported semantics (`@System.CpuType` against snap7 returns
`BadNotSupported`). Live-firmware SZL coverage is parked behind a
`[Fact(Skip = ...)]` until either S7netplus exposes a public `ReadSzlAsync`
or we ship a raw S7comm PDU helper. See
[`docs/v2/s7.md` "CPU diagnostics (SZL)"](../v2/s7.md#cpu-diagnostics-szl)
for the wire-status detail.
### 7. Password / protection levels — not modelled by snap7
PR-S7-E2 / [#303](https://github.com/dohertj2/lmxopcua/issues/303) adds
`Password` + `ProtectionLevel` options that emit a connection-level password
right after `OpenAsync`. **snap7 does not model S7 protection levels** — the
simulator accepts every connection regardless of the password set on the
client, so the integration profile cannot distinguish "password sent
correctly" from "password ignored". Coverage stays at the unit-test seam:
`S7PasswordOptionsTests` injects a fake `IS7PlcAuthGate` to assert the
dispatch contract (Password=null skips the call; Password+SupportsSendPassword
calls the gate; auth-failed wraps to a clean `InvalidOperationException`),
plus the no-log invariant on `S7DriverOptions.ToString()`.
The wire path is also fundamentally limited until S7netplus 0.20 exposes a
public `SendPassword` — the driver currently logs a warning and continues
when the API is missing. See
[`docs/v2/s7.md` "PLC password / protection levels"](../v2/s7.md#plc-password--protection-levels)
for the library-limitation note. Live-firmware coverage of the unlock path
requires a hardened S7-1500 lab rig with TIA Portal "Protection & Security"
configured, which is parked as a follow-up.
## When to trust the S7 tests, when to reach for a rig
@@ -97,7 +150,7 @@ arrays of structs — not covered.
| "Does the driver lifecycle hang / crash?" | yes | yes |
| "Does a real read against an S7-1500 return correct bytes?" | no | yes (required) |
| "Does mailbox serialization actually prevent PG timeouts?" | no | yes (required) |
| "Does a UDT fan-out produce usable member variables?" | no | yes (required) |
| "Does a UDT fan-out produce usable member variables?" | yes (Snap7 + `udt_layout` meta-seed) | yes |
## Follow-up candidates
@@ -109,6 +162,56 @@ arrays of structs — not covered.
lab rig but not CI.
3. **Real S7 lab rig** — cheapest physical PLC (CPU 1212C) on a dedicated
network port, wired via self-hosted runner.
4. **PR-S7-C5 — PUT/GET-disabled pre-flight rejection.** Snap7 does *not*
model the hardened-CPU PUT/GET response (it accepts every read once the
COTP handshake completes), so the **failure** path of the pre-flight
probe — `S7PutGetDisabledException` thrown from `InitializeAsync` when
the PLC rejects the probe read with `ErrorCode.WrongCPU_Type` /
`ErrorCode.ReadData` — needs a real S7-1500 with PUT/GET disabled in TIA
Portal. The integration suite covers the *happy* path
(`Driver_preflight_passes_when_probe_address_seeded`); the failure path
should be added as a `--with-real-plc` opt-in test that the self-hosted
runner with the lab rig executes. The classifier branch
(`S7PreflightClassifier.IsPutGetDisabled`) is unit-tested without a
network in `S7PreflightTests.Classifier_matches_only_PUT_GET_disabled_error_codes`.
5. **Live-firmware Optimized-block-access toggle (PR-S7-F / [#304](https://github.com/dohertj2/lmxopcua/issues/304)).**
snap7 happens to behave like a classic-S7comm CPU, so the integration
profile cannot reproduce the failure that a default new TIA Portal V14+
project produces (`BadDeviceFailure` on `DB1.DBW0` against an Optimized
DB). A manual smoke test on the lab rig, gated behind `--with-real-plc`,
would close that loop. Suggested checklist on a real S7-1500 V2.5+:
1. Create `DB1` in TIA Portal with three INT members at offsets 0, 2, 4.
Leave **Optimized block access checked** (the default).
2. Compile + download to the PLC.
3. Drive the OtOpcUa S7 driver against `DB1.DBW0` — assert that the read
returns `BadDeviceFailure` (the Track-1-not-applied symptom). This is
the failure shape the docs warn about.
4. Open `DB1`'s properties → **uncheck Optimized block access**
compile → download. Re-run the read; assert it returns the seeded
INT value at offset 0. (Track 1 verified end-to-end.)
5. **Track 3 verification (separate run on the same rig):** with
Optimized access re-enabled on `DB1`, activate the CPU's onboard
OPC UA server in TIA Portal, expose `DB1.<MemberName>` through a
Server interface, register an `OpcUaClient` driver against
`opc.tcp://<plc-ip>:4840`, and assert the symbolic read returns the
same seeded value. This proves the bridge path against a real
Optimized DB without the operator having to disable Optimized
access.
The test must stay manual: TIA Portal compile + download cannot be
automated from CI without a Siemens engineering toolchain license, and
download-with-CPU-stop is destructive on a shared lab rig. Document
results inline in PR descriptions when the rig is available.
6. **PR-S7-E1 — live SZL test against a real S7-1500.** snap7 doesn't
implement SZL at all, and S7netplus 0.20 doesn't expose a public
`ReadSzlAsync`, so the `@System.*` virtual address surface currently
answers `BadNotSupported` against every backend. The parser
(`S7SzlParser`) is unit-tested against golden bytes; flipping the wire
path on requires either an S7netplus PR or a raw-PDU helper. Once that's
in, [`S7_1500SzlTests.System_CpuType_against_live_S7_1500_returns_non_empty_string`](../../tests/ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests/S7_1500/S7_1500SzlTests.cs)
should be flipped from `[Fact(Skip = ...)]` to env-var-gated against the
self-hosted runner with the lab rig.
Without any of these, S7 driver correctness against real hardware is trusted
from field deployments, not from the test suite.
+230 -119
View File
@@ -2,85 +2,68 @@
Coverage map + gap inventory for the Beckhoff TwinCAT ADS driver.
**TL;DR:** Integration-test suite lives at
`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/`. `TwinCATXarFixture`
probes TCP 48898 on an operator-supplied runtime; the suite runs **14
`[TwinCATFact]` methods + one 16-case `[TwinCATTheory]` = 30 test cases** end-to-end
through the real ADS stack when the runtime is reachable, skips cleanly
otherwise. The runtime can be a Hyper-V XAR VM or a TCBSD VM
(`TwinCatProject/README.md` covers both). Unit tests via `FakeTwinCATClient`
still carry the exhaustive contract coverage alongside.
**TL;DR:** Integration-test scaffolding lives at
`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/` (task #221).
`TwinCATXarFixture` probes TCP 48898 on an operator-supplied VM; three
smoke tests (read / write / native notification) run end-to-end through
the real ADS stack when the VM is reachable, skip cleanly otherwise.
**Remaining operational work**: stand up a TwinCAT 3 XAR runtime in a
Hyper-V VM, author the `.tsproj` project documented at
`TwinCatProject/README.md`, rotate the 7-day trial license (or buy a
paid runtime). Unit tests via `FakeTwinCATClient` still carry the
exhaustive contract coverage.
TwinCAT is the only driver outside Galaxy that uses **native notifications**
(no polling) for `ISubscribable`. The integration suite verifies that path on
the wire; the fake exposes a fire-event harness so notification routing is
also contract-tested rigorously at the unit layer.
TwinCAT is the only driver outside Galaxy that uses **native
notifications** (no polling) for `ISubscribable`, and the fake exposes a
fire-event harness so notification routing is contract-tested rigorously
at the unit layer.
## What the fixture is
**Integration layer**: `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/`
`TwinCATXarFixture` TCP-probes ADS port 48898 on the host supplied by
`TWINCAT_TARGET_HOST` (defaults to `localhost`) + requires
`TWINCAT_TARGET_NETID` (AmsNetId of the runtime). Optionally takes
`TWINCAT_TARGET_PORT` (default `851` = TC3 PLC runtime 1). No fixture-owned
lifecycle — XAR / TCBSD can't run in Docker because they bypass the host
kernel scheduler, so the runtime stays operator-managed.
`TwinCatProject/README.md` documents the required project state; the tests
gate on `[TwinCATFact]` / `[TwinCATTheory]` and skip cleanly when
`TWINCAT_TARGET_NETID` is unset or the probe fails.
**Integration layer** (task #221, scaffolded):
`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/`
`TwinCATXarFixture` TCP-probes ADS port 48898 on the host specified by
`TWINCAT_TARGET_HOST` + requires `TWINCAT_TARGET_NETID` (AmsNetId of the
VM). No fixture-owned lifecycle — XAR can't run in Docker because it
bypasses the Windows kernel scheduler, so the VM stays
operator-managed. `TwinCatProject/README.md` documents the required
`.tsproj` project state; the file itself lands once the XAR VM is up +
the project is authored. Three smoke tests:
`Driver_reads_seeded_DINT_through_real_ADS`,
`Driver_write_then_read_round_trip_on_scratch_REAL`, and
`Driver_subscribe_receives_native_ADS_notifications_on_counter_changes`
— all skip cleanly via `[TwinCATFact]` when the runtime isn't
reachable.
**Unit layer**: `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` remains the
primary contract coverage. `FakeTwinCATClient` fakes the
`AddDeviceNotification` flow so tests can trigger callbacks without a running
runtime.
**Unit layer**: `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/` is
still the primary coverage. `FakeTwinCATClient` also fakes the
`AddDeviceNotification` flow so tests can trigger callbacks without a
running runtime.
## What it actually covers
### Integration (live runtime)
### Integration (XAR VM, task #221 — code scaffolded, needs VM + project)
Every capability the driver implements is exercised on the wire:
- `TwinCAT3SmokeTests.Driver_reads_seeded_DINT_through_real_ADS` — real AMS
handshake + ADS read of `GVL_Fixture.nCounter` (seeded at 1234, MAIN
increments each cycle)
- `TwinCAT3SmokeTests.Driver_write_then_read_round_trip_on_scratch_REAL`
real ADS write + read on `GVL_Fixture.rSetpoint`
- `TwinCAT3SmokeTests.Driver_subscribe_receives_native_ADS_notifications_on_counter_changes`
— real `AddDeviceNotification` against the cycle-incrementing counter;
observes `OnDataChange` firing within 3 s of subscribe
- **Read**`Driver_reads_seeded_DINT_through_real_ADS` (AMS handshake +
symbolic read of `GVL_Fixture.nCounter`)
- **Write + read round-trip**`Driver_write_then_read_round_trip_on_scratch_REAL`
on `GVL_Fixture.rSetpoint`
- **Array element round-trip**`Driver_round_trips_array_element_write_and_read`
on `GVL_Arrays.aReal1D[5]` (exercises `TwinCATSymbolPath` subscript
rendering)
- **Subscribe (native ADS notifications)**
`Driver_subscribe_receives_native_ADS_notifications_on_counter_changes`;
observes `OnDataChange` firing within 10 s of subscribe
- **Symbol browse (direct client path)**
`Driver_browses_committed_symbol_hierarchy_via_real_ADS` via
`ITwinCATClient.BrowseSymbolsAsync`
- **Symbol browse (through DiscoverAsync + `IAddressSpaceBuilder` pipeline)**
`DiscoverAsync_renders_declared_tags_and_controller_browse_hits_address_space_builder`
verifies the real `TwinCAT/ → device/ → Discovered/` folder tree
- **Auto-reconnect**`Driver_auto_reconnects_after_underlying_client_is_disposed`
disposes the `AdsClient` mid-flight; next read must re-establish
- **Primitive type coverage**`Driver_reads_every_primitive_type_with_correct_mapping`
runs as a `[Theory]` against the 16 primitives in `GVL_Primitives`
(Bool, SInt, USInt, Int, UInt, DInt, UDInt, LInt, ULInt, Real, LReal,
String, Time, TimeOfDay, Date, DateTime) — asserts status + CLR type +
seed value where ergonomic
- **Bit-indexed BOOL**`Driver_reads_bit_indexed_BOOL_from_word` against
`GVL_Primitives.vWord.3` + `.4` (bits of `0xBEEF`)
- **Nested UDT navigation**`Driver_reads_deeply_nested_UDT_path` reads
`GVL_Plant.Line1.Stations[1].Axes[1].Motor.Temperature` (LREAL) + `.Running` (BOOL)
- **Multi-device routing + isolation**
`Driver_routes_reads_per_device_and_isolates_unreachable_peers` pairs the
real runtime with a bogus AmsNetId; healthy device reads still succeed
- **Probe loop + `IHostConnectivityProbe`**
`Probe_loop_raises_host_status_transition_to_Running_on_reachable_target`
asserts `OnHostStatusChanged → Running` and snapshot parity
- **Negative error mappings**
`Driver_reports_errors_for_unknown_tag_and_nonexistent_symbol_and_readonly_write`
covers `BadNodeIdUnknown`, ghost-symbol communication errors, and the
`BadNotWritable` short-circuit
All three gated on `TWINCAT_TARGET_HOST` + `TWINCAT_TARGET_NETID` env
vars; skip cleanly via `[TwinCATFact]` when the VM isn't reachable or
vars are unset.
All tests gate on `TWINCAT_TARGET_NETID` (required) via `[TwinCATFact]` /
`[TwinCATTheory]`; `TWINCAT_TARGET_HOST` (default `localhost`) and
`TWINCAT_TARGET_PORT` (default `851`) are optional overrides.
PR 4.1 / #315 adds `TwinCATUdtBrowseTests.Driver_browses_UDT_tree_and_flattens_to_atomic_leaves`
which exercises `TwinCATDriver.DiscoverAsync` end-to-end against the
`GVL_Plant` UDT fixture. Asserts the discovery surface emits one OPC UA
variable per atomic leaf and folds `aAlarmRecords[1..2000]` into a
single `IsArrayRoot` placeholder when the element count exceeds the
default 1024-element cap (UDT per-member coverage; see
`TwinCatProject/README.md §Complex hierarchy` for the supporting DUTs).
### Unit
@@ -90,69 +73,87 @@ All tests gate on `TWINCAT_TARGET_NETID` (required) via `[TwinCATFact]` /
- `TwinCATReadWriteTests` — read + write through the fake, status mapping
- `TwinCATSymbolPathTests` — symbol-path routing for nested struct members
- `TwinCATSymbolBrowserTests``ITagDiscovery.DiscoverAsync` via
`BrowseSymbolsAsync` + system-symbol filtering
- `TwinCATNativeNotificationTests``AddDeviceNotification` registration,
callback-delivery-to-`OnDataChange` wiring, unregister on unsubscribe
`ReadSymbolsAsync` (#188) + system-symbol filtering
- `TwinCATTypeWalkerTests` — PR 4.1 / #315 nested-UDT decomposition:
atomic / single-level struct / nested struct / array-of-atomic
(in / over `MaxArrayExpansion`) / array-of-struct / alias chain /
pointer skip / self-referencing struct depth-cap / per-leaf
`MaxArrayExpansion` honored / ReadOnly propagation. Stub `IDataType`
/ `IStructType` / `IArrayType` / `IMember` / `IDimensionCollection`
trees built in-test so the walker is exercised without
`Beckhoff.TwinCAT.Ads`-internal ctors.
- `TwinCATNativeNotificationTests``AddDeviceNotification` (#189)
registration, callback-delivery-to-`OnDataChange` wiring, unregister on
unsubscribe
- `TwinCATDriverTests``IDriver` lifecycle
Capability surfaces whose contract is verified at the unit layer: `IDriver`,
`IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`,
`IHostConnectivityProbe`, `IPerCallHostResolver`. The integration suite now
verifies `ITagDiscovery` + `IHostConnectivityProbe` on the wire as well.
Capability surfaces whose contract is verified: `IDriver`, `IReadable`,
`IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`,
`IPerCallHostResolver`, `IAlarmSource` (PR 5.1 / #316, gated behind
`EnableAlarms=true` — see capability matrix below).
## Bugs caught by live runs
## Capability matrix
The integration suite surfaced three driver defects that `FakeTwinCATClient`
couldn't, since each lived below the abstraction boundary:
1. **Notification cycle time unit**`NotificationSettings(cycleTime, maxDelay)`
takes **milliseconds** per Beckhoff InfoSys
(`tcadsnetref/7313319051`), but the driver was multiplying by `10_000`
under a "100 ns units" assumption. A requested 250 ms cycle was being set
to ~41 minutes — subscribe never fired. Fix in `AdsTwinCATClient.AddNotificationAsync`.
2. **`STRING(N)` / `WSTRING(N)` type mapper** — `MapSymbolTypeName` only
matched bare `"STRING"` / `"WSTRING"`, so sized strings (the common case)
fell off `BrowseSymbolsAsync` entirely. Fix: strip the `(…)` bound before
the switch.
3. **Bit-indexed BOOL path** — driver was sending `"GVL.vWord.3"` to ADS as
a BOOL read. TwinCAT's symbol table doesn't expose bit-access paths; the
read returned `DeviceSymbolNotFound`. Fix: strip the `.N` suffix, read
the parent word as `uint`, extract the bit locally via `ExtractBit`.
All three paths are now pinned by live-wire tests.
| Capability | Status | Notes |
| --- | --- | --- |
| `IDriver` | yes | Lifecycle + health |
| `IReadable` | yes | Sum-read for scalars; per-tag for bit / array |
| `IWritable` | yes | Sum-write for scalars; per-tag for bit-RMW / array |
| `ITagDiscovery` | yes | Pre-declared + opt-in symbol-table walk |
| `ISubscribable` | yes | Native ADS notifications by default; poll fallback |
| `IHostConnectivityProbe` | yes | `ReadStateAsync` + system-symbol diagnostics |
| `IPerCallHostResolver` | yes | Tag → device hostAddress |
| `IAlarmSource` (PR 5.1 / #316) | partial | Scaffold + unit-tested; live wire decode is best-effort against AMS port 110, see `docs/v3/twincat-eventlogger-spike.md` |
| `IHistoryProvider` | no | Not in scope for this driver family |
## What it does NOT cover
### 1. AMS / ADS wire framing
### 1. AMS / ADS wire traffic
No raw AMS packet is inspected. Beckhoff's `TwinCAT.Ads` NuGet (their own
.NET SDK, not libplctag-style OSS) has no in-process fake at the frame
level; tests run against a real router.
No real AMS router frame is exchanged. Beckhoff's `TwinCAT.Ads` NuGet (their
own .NET SDK, not libplctag-style OSS) has no in-process fake; tests stub
the `ITwinCATClient` abstraction above it.
### 2. Multi-route AMS
ADS supports chained routes (`<localNetId> → <routerNetId> → <targetNetId>`)
for PLCs behind an EC master / IPC gateway. Parse coverage exists; wire-path
coverage is single-hop only.
coverage doesn't.
### 3. Notification coalescing under jitter
### 3. Notification reliability under jitter
`AddDeviceNotification` delivers at the runtime's cycle boundary; under
sustained CPU load or network jitter real notifications can coalesce. The
live test only asserts at-least-one delivery within a generous window —
coalescing behavior under stress isn't verified.
`AddDeviceNotification` delivers at the runtime's cycle boundary; under high
CPU load or network jitter real notifications can coalesce. The fake fires
one callback per test invocation — real callback-coalescing behavior is
untested.
PR 3.1 (#313) makes the per-tag `MaxDelay` configurable via
`TwinCATTagDefinition.MaxDelayMs` — the runtime can buffer changes for up to
that many milliseconds before dispatch, deliberately coalescing bursty
high-frequency signals so the OPC UA queue downstream doesn't flood. Default
`null` / `0` preserves the pre-PR-3.1 "fire ASAP" behaviour.
`TwinCATMaxDelayTests.Driver_coalesces_notifications_at_max_delay` exercises
the wire-side coalescer end-to-end against `GVL_Fixture.nCounter`; the unit
suite (`TwinCATNativeNotificationTests`) covers the plumbing contract via
the `FakeTwinCATClient.FakeNotification.MaxDelayMs` capture.
### 4. TC2 vs TC3 variant handling
TwinCAT 2 (ADS v1) and TwinCAT 3 (ADS v2) have subtly different
`GetSymbolInfoByName` semantics + symbol-table layouts. Driver + tests target
TC3; TC2 compatibility is not exercised.
`GetSymbolInfoByName` semantics + symbol-table layouts. Driver targets TC3;
TC2 compatibility is not exercised.
### 5. Alarms / history
### 5. Cycle-time alignment for `ISubscribable`
Driver doesn't implement `IAlarmSource` or `IHistoryProvider` — not in scope
for this driver family. TwinCAT 3's TcEventLogger could theoretically back
an `IAlarmSource`, but shipping that is a separate feature.
Native ADS notifications fire on the PLC cycle boundary. The fake test
harness assumes notifications fire on a timer the test controls;
cycle-aligned firing under real PLC control is not verified.
### 6. History
Driver doesn't implement `IHistoryProvider` — not in scope for this
driver family. (Alarms now have a dedicated `IAlarmSource` bridge — see
the capability matrix below + `docs/drivers/TwinCAT.md`.)
## When to trust TwinCAT tests, when to reach for a rig
@@ -162,25 +163,135 @@ an `IAlarmSource`, but shipping that is a separate feature.
| "Does notification → `OnDataChange` wire correctly?" | yes (contract) | yes |
| "Does symbol browsing filter TwinCAT internals?" | yes | yes |
| "Does a real ADS read return correct bytes?" | no | yes (required) |
| "Does auto-reconnect work on router restart?" | no (contract only) | yes (required) |
| "Do notifications coalesce under sustained load?" | no | yes (required) |
| "Do notifications coalesce under load?" | no | yes (required) |
| "Does a TC2 PLC work the same as TC3?" | no | yes (required) |
## Performance
PR 2.1 (Sum-read / Sum-write, IndexGroup `0xF080..0xF084`) replaced the per-tag
`ReadValueAsync` loop in `TwinCATDriver.ReadAsync` / `WriteAsync` with a
bucketed bulk dispatch — N tags addressed against the same device flow through a
single ADS sum-command round-trip via `SumInstancePathAnyTypeRead` (read) and
`SumWriteBySymbolPath` (write). Whole-array tags + bit-extracted BOOL tags
remain on the per-tag fallback path because the sum surface only marshals
scalars and bit-RMW writes need the per-parent serialisation lock.
**Baseline → Sum-command delta** (dev box, 1000 × DINT, XAR VM over LAN):
| Path | Round-trips | Wall-clock |
| --- | --- | --- |
| Per-tag loop (pre-PR 2.1) | 1000 | ~58 s |
| Sum-command bulk (PR 2.1) | 1 | ~250600 ms |
| Ratio | — | ≥ 10× typical, ≥ 5× CI floor |
The perf-tier test
`TwinCATSumCommandPerfTests.Driver_sum_read_1000_tags_beats_loop_baseline_by_5x`
asserts the ratio with a conservative 5× lower bound that survives noisy CI /
VM scheduling. It is gated behind both `TWINCAT_TARGET_NETID` (XAR reachable)
and `TWINCAT_PERF=1` (operator opt-in) — perf runs aren't part of the default
integration pass because they hit the wire heavily.
The required fixture state (1000-DINT GVL + churn POU) is documented in
`TwinCatProject/README.md §Performance scenarios`; XAE-form sources land at
`TwinCatProject/PLC/GVLs/GVL_Perf.TcGVL` + `TwinCatProject/PLC/POUs/FB_PerfChurn.TcPOU`.
### Handle caching (PR 2.2)
Per-tag reads / writes route through an in-process ADS variable-handle cache.
The first read of a symbol resolves a handle via `CreateVariableHandleAsync`;
subsequent reads / writes of the same symbol issue against the cached handle.
On the wire this trades a multi-byte symbolic path (`GVL_Perf.aTags[742]` =
20+ bytes) for a 4-byte handle, and the device server skips name resolution
on every subsequent op. Cache lifetime is process-scoped; entries are evicted
on `AdsErrorCode.DeviceSymbolVersionInvalid` (with one retry against a fresh
handle), wiped on reconnect (handles are per-AMS-session), and deleted
best-effort on driver disposal.
`TwinCATHandleCachePerfTests.Driver_handle_cache_avoids_repeat_symbol_resolution`
asserts the contract on real XAR by reading 50 symbols twice and verifying
the second pass issues zero new `CreateVariableHandleAsync` calls. It runs
under the standard `[TwinCATFact]` gate (XAR reachable; no `TWINCAT_PERF`
opt-in needed because 50 symbols is cheap).
**Self-invalidation (PR 2.3)**: handle cache is now self-invalidating on
TwinCAT online changes. `AdsTwinCATClient` registers an
`AdsSymbolVersionChanged` event listener (Beckhoff's high-level wrapper
around the SymbolVersion ADS notification, IndexGroup `0xF008`) on connect;
when the PLC's symbol-version counter increments — full re-init after a
download / activate-config — the listener fires and wipes the handle cache
proactively. Three-layered defence in depth: (1) proactive listener
preempts the next read entirely on full re-inits, (2) the
`DeviceSymbolVersionInvalid` evict-and-retry path from PR 2.2 catches the
narrower "symbol survives but its descriptor moved" race, and (3)
operators can still call `ITwinCATClient.FlushOptionalCachesAsync` manually
for the truly-paranoid case. The bulk Sum-read / Sum-write path remains
on symbolic paths in PR 2.2 (the bulk path's per-call symbol resolution
is already amortised across N tags; the perf delta vs. handle-batched
bulk is marginal — tracked as a follow-up for the Phase-2 perf sweep).
## Diagnostics
PR 3.2 (#314) augments the probe loop. On every successful tick (post `ReadStateAsync`)
the driver also reads four well-known system symbols off the AMS target and stashes
them on `DeviceState.LastDiagnostics` as a `TwinCATDeviceDiagnostics` record. The same
snapshot is folded into `DriverHealth.Diagnostics` so the cross-driver
`driver-diagnostics` RPC (added for Modbus, task #154) renders TwinCAT cycle-time /
jitter / online-change counters next to its peers without a per-driver special-case.
| Symbol | Type | Diagnostic key | Notes |
| --- | --- | --- | --- |
| `TwinCAT_SystemInfoVarList._AppInfo.AppName` | `STRING(80)` | (record only) | Running PLC project name, e.g. `"Plc1"` |
| `TwinCAT_SystemInfoVarList._AppInfo.OnlineChangeCnt` | `UDINT` | `TwinCAT.OnlineChangeCnt` | Increments on every accepted online change; informational |
| `TwinCAT_SystemInfoVarList._TaskInfo[1].CycleTime` | `UDINT` (100 ns ticks) | `TwinCAT.CycleTimeMs` | Configured task period after `÷10000` ms conversion |
| `TwinCAT_SystemInfoVarList._TaskInfo[1].LastExecTime` | `UDINT` (100 ns ticks) | `TwinCAT.LastExecTimeMs` | Wall-clock duration of the last task tick |
| (computed) | `double` | `TwinCAT.JitterMs` | `LastExecTimeMs - CycleTimeMs`; positive = overrun |
| (computed) | `long` | `TwinCAT.OnlineChangeIncrements` | Cumulative deltas observed since the driver started; only emitted once non-zero |
Each individual read is wrapped in best-effort try/catch. A runtime that doesn't
expose `_TaskInfo[1]` (older TwinCAT 2 builds, some soft-PLC implementations) still
produces a partial snapshot; the missing fields fall back to the previous tick's value
or the type default for the first probe tick. Wholesale failure of all four reads
leaves the previous snapshot in place and the next tick retries.
Single-device deployments produce flat keys (`TwinCAT.CycleTimeMs`); multi-device
deployments prefix with the AMS host address (`TwinCAT.<hostAddress>.CycleTimeMs`)
so the readout is unambiguous when one driver instance owns multiple AMS targets.
Wire-level coverage lives in
`TwinCATDiagnosticsIntegrationTests.Probe_loop_surfaces_cycle_time_and_online_change_count`
(asserts `CycleTimeMs > 0` + `OnlineChangeCnt >= 0` within one probe interval against a
reachable XAR runtime). Unit-level coverage of the dictionary shape, the per-symbol
try/catch, and the multi-device prefixing lives in `TwinCATDeviceDiagnosticsTests`
the `FakeTwinCATClient.SetSystemSymbolValue` helper drives the surface deterministically.
## Follow-up candidates
Deferred to v3 — see [`docs/v3/twincat-backlog.md`](../v3/twincat-backlog.md).
Covers TC2 coverage, notification-coalescing-under-load, multi-hop AMS,
license-rotation automation, and a dedicated lab IPC.
1. **XAR VM live-population** — scaffolding is in place (this PR); the
remaining work is operational: stand up the Hyper-V VM, install XAR,
author the `.tsproj` per `TwinCatProject/README.md`, configure the
bilateral ADS route, set `TWINCAT_TARGET_HOST` + `TWINCAT_TARGET_NETID`
on the dev box. Then the three smoke tests transition skip → pass.
Tracked as #221.
2. **License-rotation automation** — XAR's 7-day trial expires on
schedule. Either automate `TcActivate.exe /reactivate` via a
scheduled task on the VM (not officially supported; reportedly works
for some TC3 builds), or buy a paid runtime license (~$1k one-time
per runtime per CPU) to kill the rotation. The doc at
`TwinCatProject/README.md` §License rotation walks through both.
3. **Lab rig** — cheapest IPC (CX7000 / CX9020) on a dedicated network;
the only route that covers TC2 + real EtherCAT I/O timing + cycle
jitter under CPU load.
## Key fixture / config files
- `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCATXarFixture.cs`
— TCP probe + skip-attributes + env-var parsing
- `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs`
— wire-level test suite (14 `[TwinCATFact]` + 16-case `[TwinCATTheory]`)
three wire-level smoke tests
- `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md`
— project spec + VM setup + license-rotation notes
- `tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Tests/FakeTwinCATClient.cs`
in-process fake with the notification-fire harness
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs` — ctor is
`(TwinCATDriverOptions, string driverInstanceId, ITwinCATClientFactory? = null)`
in-process fake with the notification-fire harness used by
`TwinCATNativeNotificationTests`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs` — ctor takes
`ITwinCATClientFactory`
+115
View File
@@ -0,0 +1,115 @@
# TwinCAT driver — operator guide
Beckhoff TwinCAT 2 / TwinCAT 3 ADS driver. Talks to the runtime via
`Beckhoff.TwinCAT.Ads` v6 (managed); requires a reachable AMS router on
the host (local TwinCAT XAR, the standalone `Beckhoff.TwinCAT.Ads.TcpRouter`
NuGet, or any Windows box with TwinCAT installed and an authorised AMS
route).
## Configuration surface
`TwinCATDriverOptions` (one instance supports N AMS targets, each a
`TwinCATDeviceOptions`). Wire format mirrors the C# class on the JSON
side — every `init`-only property round-trips through
`System.Text.Json` with the default options.
| Option | Type | Default | Notes |
| --- | --- | --- | --- |
| `Devices` | `TwinCATDeviceOptions[]` | `[]` | One entry per AMS target. |
| `Tags` | `TwinCATTagDefinition[]` | `[]` | Pre-declared symbol set. |
| `Probe.Enabled` | `bool` | `true` | Per-tick `ReadStateAsync` against the runtime. |
| `Probe.Interval` | `TimeSpan` | `5 s` | |
| `Timeout` | `TimeSpan` | `2 s` | Per-operation timeout. |
| `UseNativeNotifications` | `bool` | `true` | False = fall through to PollGroupEngine. |
| `EnableControllerBrowse` | `bool` | `false` | Walk symbol table on `DiscoverAsync`. |
| `MaxArrayExpansion` | `int` | `1024` | Per-element cutoff during nested-UDT browse. |
| `EnableAlarms` (PR 5.1) | `bool` | `false` | Opt-in TC3 EventLogger bridge — see "Alarms" below. |
## Alarms (TC3 EventLogger bridge, PR 5.1 / #316)
When `EnableAlarms=true`, the driver implements `IAlarmSource` by
opening a second `AdsClient` against AMS port **110**
(`AMSPORT_EVENTLOG`) and adding a device notification on
`ADSIGRP_TCEVENTLOG_ALARMS`. Subscribers receive `OnAlarmEvent`
notifications for every transition the EventLogger surfaces (raise /
clear / acknowledge).
### Decode caveat
Beckhoff doesn't ship a managed wrapper for `TcEventLogger` in the
regular `Beckhoff.TwinCAT.Ads` v6 NuGet — only the C++ TcCOM headers
exist. The driver therefore decodes the AMS-port-110 binary payload
manually. The current implementation is best-effort: event class GUIDs
and source names usually decode cleanly; some less-common fields may
surface as `"Unknown"` until a follow-up PR lands a complete decoder.
Spike output captured at
[`docs/v3/twincat-eventlogger-spike.md`](../v3/twincat-eventlogger-spike.md).
### Wire path
| Layer | What it does |
| --- | --- |
| Primary `AdsClient` | The existing per-device session against the PLC runtime port (default `851`) — handles reads / writes / native subscriptions. |
| Secondary `AdsClient` (alarms) | Opens against AMS port `110` on the same target NetId. Adds one device notification on `ADSIGRP_TCEVENTLOG_ALARMS` with a `length=...` payload covering the full alarm-list shape. |
| `ITwinCATAlarmGate` (driver-internal) | Decodes incoming notifications into `TwinCATAlarmEvent` records (`EventClass`, `Source`, `Severity`, `Message`, `OccurrenceUtc`, `Acked`). |
| `TwinCATAlarmSource` | Projects `TwinCATAlarmEvent` onto the driver-agnostic `IAlarmSource.OnAlarmEvent`. |
### Severity mapping (TC3 → OPC UA AC)
TC3 EventLogger severity is a 0255 `USINT`. The driver maps it onto
the four-bucket `AlarmSeverity` enum the OPC UA AC layer consumes:
| TC3 severity | `AlarmSeverity` |
| --- | --- |
| 064 | `Low` |
| 65128 | `Medium` |
| 129192 | `High` |
| 193255 | `Critical` |
### Acknowledge
`AcknowledgeAsync` round-trips through `ITwinCATAlarmGate.AcknowledgeAsync`,
which writes to the EventLogger ack index group. Best-effort — the wire
format isn't documented in managed code, so individual ack failures don't
poison the batch and the gate returns silently when the EventLogger isn't
configured.
### Disabling
`EnableAlarms=false` (default) returns a sentinel handle from
`SubscribeAlarmsAsync` and never opens the secondary `AdsClient`.
`OnAlarmEvent` simply never fires. Capability negotiation still works,
which is why the driver advertises `IAlarmSource` unconditionally.
## CLI
The `otopcua-twincat-cli` test client exposes an `alarms` subcommand
that wraps the bridge end-to-end:
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.Cli -- alarms `
--ams-net-id 5.23.91.23.1.1 --ams-port 851 `
--source Conveyor1.MotorOverload
```
See [`docs/Driver.TwinCAT.Cli.md`](../Driver.TwinCAT.Cli.md) for the
full CLI surface.
## Test coverage
- **Unit**: `TwinCATAlarmSourceTests` covers (a) feature-gating off vs.
on, (b) gate-event projection shape, (c) multi-event ordering, (d)
source-filter matching, (e) acknowledge round-trip, (f) JSON DTO
round-trip.
- **Integration**: `TwinCATAlarmIntegrationTests.Driver_raises_alarm_event_when_PLC_logs_event`
ships build-only in PR 5.1; the GVL + FB_AlarmHarness ship as XAE
stubs at `tests/.../TwinCatProject/PLC/`. Once the XAR project
imports them the test transitions skip → pass.
## See also
- [`docs/v3/twincat-eventlogger-spike.md`](../v3/twincat-eventlogger-spike.md)
— spike output for the managed-wrapper question
- [`docs/drivers/TwinCAT-Test-Fixture.md`](TwinCAT-Test-Fixture.md)
— coverage map + capability matrix
- [`docs/Driver.TwinCAT.Cli.md`](../Driver.TwinCAT.Cli.md) — CLI guide
+20
View File
@@ -0,0 +1,20 @@
# Feature gaps — driver-side limitations and decisions
Cross-driver registry of known capability gaps, the workaround we ship, and
whether the gap is on the roadmap. Each row links to the driver-specific
deep-dive document. Closed entries stay in the table for traceability — they
are not deleted, only marked.
| # | Driver | Gap | Status | Workaround / decision | Roadmap | Reference |
|---|--------|-----|--------|-----------------------|---------|-----------|
| 1 | S7 | **Optimized DB / S7Plus** — S7netplus speaks classic S7comm only and cannot read S7-1200 / S7-1500 DBs that have "Optimized block access" checked (the TIA Portal V14+ default). Absolute-offset reads against an Optimized DB return `BadDeviceFailure`. | **Decided — Track 1 + Track 3 (closed by [#304](https://github.com/dohertj2/lmxopcua/issues/304))** | **Track 1 (docs):** operators uncheck "Optimized block access" in TIA Portal on every DB the driver reads, recompile, and download. **Track 3 (bridge):** for shops that won't or can't disable Optimized access, run an `OpcUaClient` driver instance against the S7-1500 V2.5+ CPU's onboard OPC UA server (Siemens runtime OPC UA license required). | **Track 2 (custom S7Plus library) is out of scope** unless a customer funds the ≥4-week initial implementation plus ongoing protocol-revision maintenance. Sharp7 / Snap7Net don't help — they are also classic-S7comm-only. | [`docs/v2/s7.md` § Optimized DB constraint](v2/s7.md#optimized-db-constraint-s7plus) · [`docs/drivers/OpcUaClient.md`](drivers/OpcUaClient.md) |
## How to read this table
- **Status** is one of `Open` (work pending), `Decided` (architectural
decision made; docs reflect it; no code change planned), or
`Closed` (delivered).
- **Roadmap** captures whether the gap is funded for the next phase. A blank
cell means "no roadmap; doc-only outcome."
- The numeric **#** is stable — new rows append at the bottom and keep their
number across deletions/edits so cross-references survive.
File diff suppressed because it is too large Load Diff
-136
View File
@@ -1,136 +0,0 @@
# Alarm Tracking — v1 archive
> **Historical record.** This document describes the v1 / pre-PR-7.2
> Galaxy alarm path that ran inside `Galaxy.Host`'s STA pump as
> `GalaxyAlarmTracker`. PR 7.2 retired the in-process Galaxy stack; the
> alarms-over-gateway epic (B.2 / B.3 / E.7) restored Galaxy's
> `IAlarmSource` capability against the new gateway-mediated transport.
> See [docs/AlarmTracking.md](../AlarmTracking.md) for the v2 final
> architecture — that is the document to read for current behaviour.
Alarm surfacing is an optional driver capability exposed via `IAlarmSource` (`src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs`). Drivers whose backends have an alarm concept implement it — today: Galaxy (MXAccess alarms), FOCAS (CNC alarms), OPC UA Client (A&C events from the upstream server). Modbus / S7 / AB CIP / AB Legacy / TwinCAT do not implement the interface and the feature is simply absent from their subtrees.
## IAlarmSource surface
```csharp
Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken);
Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken);
Task AcknowledgeAsync(IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements,
CancellationToken cancellationToken);
event EventHandler<AlarmEventArgs>? OnAlarmEvent;
```
The driver fires `OnAlarmEvent` for every transition (`Active`, `Acknowledged`, `Inactive`) with an `AlarmEventArgs` carrying the source node id, condition id, alarm type, message, severity (`AlarmSeverity` enum), and source timestamp.
## AlarmSurfaceInvoker
`AlarmSurfaceInvoker` (`src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs`) wraps the three mutating surfaces through `CapabilityInvoker`:
- `SubscribeAlarmsAsync` / `UnsubscribeAlarmsAsync` run through the `DriverCapability.AlarmSubscribe` pipeline — retries apply under the tier configuration.
- `AcknowledgeAsync` runs through `DriverCapability.AlarmAcknowledge` which does NOT retry per decision #143. A timed-out ack may have already registered at the plant floor; replay would silently double-acknowledge.
Multi-host fan-out: when the driver implements `IPerCallHostResolver`, each source node id is resolved individually and batches are grouped by host so a dead PLC inside a multi-device driver doesn't poison sibling breakers. Single-host drivers fall back to `IDriver.DriverInstanceId` as the pipeline-key host.
## Condition-node creation via CapturingBuilder
Alarm-condition nodes are materialized at address-space build time. During `GenericDriverNodeManager.BuildAddressSpaceAsync` the builder is wrapped in a `CapturingBuilder` that observes every `Variable()` call. When a driver calls `IVariableHandle.MarkAsAlarmCondition(AlarmConditionInfo)` on a returned handle, the server-side `DriverNodeManager.VariableHandle` creates a sibling `AlarmConditionState` node and returns an `IAlarmConditionSink`. The wrapper stores the sink in `_alarmSinks` keyed by the variable's full reference, then `GenericDriverNodeManager` registers a forwarder on `IAlarmSource.OnAlarmEvent` that routes each push to the matching sink by `SourceNodeId`. Unknown source ids are dropped silently — they may belong to another driver.
The `AlarmConditionState` layout matches OPC UA Part 9:
- `SourceNode` → the originating variable
- `SourceName` / `ConditionName` → from `AlarmConditionInfo.SourceName`
- Initial state: enabled, inactive, acknowledged, severity per `InitialSeverity`, retain false
- `HasCondition` references wire the source variable ↔ the condition node bidirectionally
Drivers flag alarm-bearing variables at discovery time via `DriverAttributeInfo.IsAlarm = true`. The Galaxy driver, for example, sets this on attributes that have an `AlarmExtension` primitive in the Galaxy repository DB; FOCAS sets it on the CNC alarm register.
## State transitions
`ConditionSink.OnTransition` runs under the node manager's `Lock` and maps the `AlarmEventArgs.AlarmType` string to Part 9 state:
| AlarmType | Action |
|---|---|
| `Active` | `SetActiveState(true)`, `SetAcknowledgedState(false)`, `Retain = true` |
| `Acknowledged` | `SetAcknowledgedState(true)` |
| `Inactive` | `SetActiveState(false)`; `Retain = false` once both inactive and acknowledged |
Severity is remapped: `AlarmSeverity.Low/Medium/High/Critical` → OPC UA numeric 250 / 500 / 700 / 900. `Message.Value` is set from `AlarmEventArgs.Message` on every transition. `ClearChangeMasks(true)` and `ReportEvent(condition)` fire the OPC UA event notification for clients subscribed to any ancestor notifier.
## Acknowledge dispatch
Alarm acknowledgement initiated by an OPC UA client flows:
1. The SDK invokes the `AlarmConditionState.OnAcknowledge` method delegate.
2. The handler checks the session's roles for `AlarmAck` — drivers never see a request the session wasn't entitled to make.
3. `AlarmSurfaceInvoker.AcknowledgeAsync` is called with the source / condition / comment tuple. The invoker groups by host and runs each batch through the no-retry `AlarmAcknowledge` pipeline.
Drivers return normally for success or throw to signal the ack failed at the backend.
## EventNotifier propagation
Drivers that want hierarchical alarm subscriptions propagate `EventNotifier.SubscribeToEvents` up the containment chain during discovery — the Galaxy driver flips the flag on every ancestor of an alarm-bearing object up to the driver root, mirroring v1 behavior. Clients subscribed at the driver root, a mid-level folder, or the `Objects/` root see alarm events from every descendant with an `AlarmConditionState` sibling. The driver-root `FolderState` is created in `DriverNodeManager.CreateAddressSpace` with `EventNotifier = SubscribeToEvents | HistoryRead` so alarm event subscriptions and alarm history both have a single natural target.
## ConditionRefresh
The OPC UA `ConditionRefresh` service queues the current state of every retained condition back to the requesting monitored items. `DriverNodeManager` iterates the node manager's `AlarmConditionState` collection and queues each condition whose `Retain.Value == true` — matching the Part 9 requirement.
## Alarm historian sink
Distinct from the live `IAlarmSource` stream and the Part 9 `AlarmConditionState` materialization above, qualifying alarm transitions are **also** persisted to a durable event log for downstream AVEVA Historian ingestion. This is a separate subsystem from the `IHistoryProvider` capability used by `HistoryReadEvents` (see [HistoricalDataAccess.md](HistoricalDataAccess.md#alarm-event-history-vs-ihistoryprovider)): the sink is a *producer* path (server → Historian) that runs independently of any client HistoryRead call.
### `IAlarmHistorianSink`
`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs` defines the intake contract:
```csharp
Task EnqueueAsync(AlarmHistorianEvent evt, CancellationToken cancellationToken);
HistorianSinkStatus GetStatus();
```
`EnqueueAsync` is fire-and-forget from the producer's perspective — it must never block the emitting thread. The event payload (`AlarmHistorianEvent` — same file) is source-agnostic: `AlarmId`, `EquipmentPath`, `AlarmName`, `AlarmTypeName` (Part 9 subtype name), `Severity`, `EventKind` (free-form transition string — `Activated` / `Cleared` / `Acknowledged` / `Confirmed` / `Shelved` / …), `Message`, `User`, `Comment`, `TimestampUtc`.
The sink scope is defined to span every alarm source (plan decision #15: scripted, Galaxy-native, AB CIP ALMD, any future `IAlarmSource`), gated per-alarm by a `HistorizeToAveva` toggle on the producer. Today only `Phase7EngineComposer.RouteToHistorianAsync` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs`) is wired — it subscribes to `ScriptedAlarmEngine.OnEvent` and marshals each emission into `AlarmHistorianEvent`. Galaxy-native alarms continue to reach AVEVA Historian via the driver's direct `aahClientManaged` path and do not flow through the sink; the AB CIP ALMD path remains unwired pending a producer-side integration.
### `SqliteStoreAndForwardSink`
Default production implementation (`src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs`). A local SQLite queue absorbs every `EnqueueAsync` synchronously; a background `Timer` drains batches asynchronously to an `IAlarmHistorianWriter` so operator actions are never blocked on historian reachability.
Queue schema (single table `Queue`): `RowId PK autoincrement`, `AlarmId`, `EnqueuedUtc`, `PayloadJson` (serialized `AlarmHistorianEvent`), `AttemptCount`, `LastAttemptUtc`, `LastError`, `DeadLettered` (bool), plus `IX_Queue_Drain (DeadLettered, RowId)`. Default capacity `1_000_000` non-dead-lettered rows; oldest rows evict with a WARN log past the cap.
Drain cadence: `StartDrainLoop(tickInterval)` arms a periodic timer. `DrainOnceAsync` reads up to `batchSize` rows (default 100) in `RowId` order and forwards them through `IAlarmHistorianWriter.WriteBatchAsync`, which returns one `HistorianWriteOutcome` per row:
| Outcome | Action |
|---|---|
| `Ack` | Row deleted. |
| `PermanentFail` | Row flipped to `DeadLettered = 1` with reason. Peers in the batch retry independently. |
| `RetryPlease` | `AttemptCount` bumped; row stays queued. Drain worker enters `BackingOff`. |
Writer-side exceptions treat the whole batch as `RetryPlease`.
Backoff ladder on `RetryPlease` (hard-coded): 1s → 2s → 5s → 15s → 60s cap. Reset to 0 on any batch with no retries. `CurrentBackoff` exposes the current step for instrumentation; the drain timer itself fires on `tickInterval`, so the ladder governs write cadence rather than timer period.
Dead-letter retention defaults to 30 days (plan decision #21). `PurgeAgedDeadLetters` runs each drain pass and deletes rows whose `LastAttemptUtc` is past the cutoff. `RetryDeadLettered()` is an operator action that clears `DeadLettered` + resets `AttemptCount` on every dead-lettered row so they rejoin the main queue.
### Composition and writer resolution
`Phase7Composer.ResolveHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs`) scans the registered drivers for one that implements `IAlarmHistorianWriter`. Today that is `GalaxyProxyDriver` via `GalaxyHistorianWriter` (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/Ipc/GalaxyHistorianWriter.cs`), which forwards batches over the Galaxy.Host pipe to the `aahClientManaged` alarm schema. When a writer is found, a `SqliteStoreAndForwardSink` is instantiated against `%ProgramData%/OtOpcUa/alarm-historian-queue.db` with a 2 s drain tick and the writer attached. When no driver provides a writer the fallback is the DI-registered `NullAlarmHistorianSink` (`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs`), which silently discards and reports `HistorianDrainState.Disabled`.
### Status and observability
`GetStatus()` returns `HistorianSinkStatus(QueueDepth, DeadLetterDepth, LastDrainUtc, LastSuccessUtc, LastError, DrainState)` — two `COUNT(*)` scalars plus last-drain telemetry. `DrainState` is one of `Disabled` / `Idle` / `Draining` / `BackingOff`.
The Admin UI `/alarms/historian` page surfaces this through `HistorianDiagnosticsService` (`src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs`), which also exposes `TryRetryDeadLettered` — it calls through to `SqliteStoreAndForwardSink.RetryDeadLettered` when the live sink is the SQLite implementation and returns 0 otherwise.
## Key source files
- `src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/IAlarmSource.cs` — capability contract + `AlarmEventArgs`
- `src/ZB.MOM.WW.OtOpcUa.Core/Resilience/AlarmSurfaceInvoker.cs` — per-host fan-out + no-retry ack
- `src/ZB.MOM.WW.OtOpcUa.Core/OpcUa/GenericDriverNodeManager.cs``CapturingBuilder` + alarm forwarder
- `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/DriverNodeManager.cs``VariableHandle.MarkAsAlarmCondition` + `ConditionSink`
- `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Backend/Alarms/GalaxyAlarmTracker.cs` — Galaxy-specific alarm-event production
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs` — historian sink intake contract + `AlarmHistorianEvent` + `HistorianSinkStatus` + `IAlarmHistorianWriter`
- `src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs` — durable queue + drain worker + backoff ladder + dead-letter retention
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7EngineComposer.cs``RouteToHistorianAsync` wires scripted-alarm emissions into the sink
- `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs``ResolveHistorianSink` selects `SqliteStoreAndForwardSink` vs `NullAlarmHistorianSink`
- `src/ZB.MOM.WW.OtOpcUa.Admin/Services/HistorianDiagnosticsService.cs` — Admin UI `/alarms/historian` status + retry-dead-lettered operator action
-29
View File
@@ -1,29 +0,0 @@
# v1 documentation archive
This folder contains documentation that described the original v1
in-process MXAccess architecture (`Galaxy.Host` + `Galaxy.Proxy` +
`Galaxy.Shared` three-project split, .NET 4.8 x86 + COM apartment, the
`OtOpcUaGalaxyHost` Windows service). That architecture was retired in
PR 7.2 (merged 2026-04-30 at commit `ae7106d`). These docs are kept as
the historical record of how the system worked before the v2-mxgw
migration; treat their content as accurate at the time of writing, NOT
as current state.
For current architecture see:
- `CLAUDE.md` — agent-facing v2 overview
- `docs/drivers/Galaxy.md` — current Galaxy driver doc
- `docs/v2/Galaxy.ParityRig.md` — current testing setup
- `docs/v2/Galaxy.Performance.md` — observability + perf
| File | What it covered |
|---|---|
| `AlarmTracking.md` | v1 alarm-tracking flow through the in-process MXAccess client |
| `Configuration.md` | v1 server configuration (`OTOPCUA_GALAXY_*` env vars now live in mxaccessgw config) |
| `DataTypeMapping.md` | Galaxy `mx_data_type` → OPC UA type mapping (still accurate as a reference; the live mapping logic is in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`) |
| `HistoricalDataAccess.md` | v1 IHistoryProvider on the Host side; current path is the server-level HistoryRouter + Wonderware sidecar |
| `Subscriptions.md` | v1 MXAccess subscription mechanics; current path uses gateway StreamEvents |
| `drivers/Galaxy-Repository.md` | v1 Host-side ZB SQL repository client; the gateway owns this path now |
| `drivers/Galaxy-Test-Fixture.md` | v1 test-fixture setup (parity tests + Galaxy.Host EXE spawn) |
| `reqs/GalaxyRepositoryReqs.md`, `reqs/MxAccessClientReqs.md` | Original Phase 0 requirements; satisfied in mxaccessgw repo today |
| `reqs/ServiceHostReqs.md` | Service-hosting requirements including `OtOpcUaGalaxyHost` (GHX-* section); only `OtOpcUa` server hosting remains in scope post-7.2 |
-161
View File
@@ -1,161 +0,0 @@
> **✅ Completed 2026-04-30 — historical record of the parity-rig validation gate for PR 7.2.**
>
> The matrix below was the go/no-go gate for retiring the legacy
> Galaxy.Host backend (PR 7.2). Final run on the dev rig 2026-04-30
> returned 14 passed / 1 skipped / 0 failed; PR 7.2 (commit `fe91d42`)
> deleted the legacy projects + service the next day. The "Running
> the matrix" section is preserved for historical reproducibility but
> the test projects it references (`Driver.Galaxy.ParityTests`) were
> deleted alongside the legacy backend; this matrix is no longer
> runnable. Current Galaxy testing flows through the gateway's own
> test suite (sibling mxaccessgw repo).
# Galaxy backend parity matrix
This document tracks the scenario × result matrix that the
`Driver.Galaxy.ParityTests` suite drives against both Galaxy backends —
the legacy out-of-process **Galaxy.Host** (.NET 4.8 x86 + MXAccess COM,
fronted by `GalaxyProxyDriver`) and the new in-process **mxgateway**
backend (`GalaxyDriver`, .NET 10 + gRPC against `mxaccessgw`).
Maintained alongside Phase 5 (PR 5.W). The Phase 7 default flip
(PR 7.1) consumes this matrix as its go/no-go gate — every row must be
either green or carry an explicit *accepted-delta* justification.
## Reading the matrix
- **Status: green** — the scenario asserts strict parity and passes
(or skips cleanly when the rig isn't up).
- **Status: yellow** — soft pin only (count or shape parity, not value
parity) — acceptable when the underlying COM/gRPC stacks have known
divergences in raw payloads but the surface presented to the
DriverNodeManager is equivalent.
- **Status: red** — divergence detected. Row carries a fix or a
follow-up task ID.
## Scenarios
Last verified end-to-end on the dev parity rig: **2026-04-30**
(legacy `OtOpcUaGalaxyHost` mxaccess backend; mxaccessgw v1.x at
`http://localhost:5120`; sandbox `OtOpcUaParityTest_001` deployed in
the `ZB` galaxy; 13 passed / 1 skipped / 0 failed in 19 minutes).
| PR | Test class | Scenario | Status | Notes |
|----|-----------|----------|--------|-------|
| 5.2 | `BrowseAndReadParityTests` | Same variable set | green | symmetric set diff on full-reference set, after `[]` array-suffix workaround in `GalaxyDiscoverer` |
| 5.2 | `BrowseAndReadParityTests` | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
| 5.2 | `BrowseAndReadParityTests` | Same StatusCode-class on a sampled read | yellow | pins status class (Bad/Uncertain/Good); CLR type intentionally not asserted — see "Accepted deltas" #6 |
| 5.3 | `SubscribeAndEventRateParityTests` | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
| 5.3 | `SubscribeAndEventRateParityTests` | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
| 5.4 | `WriteByClassificationParityTests` | FreeAccess / Operate write status-class parity | yellow | pins status class only; legacy flat-maps every failure to BadInternalError, mxgw distinguishes (BadCommunicationError, BadDeviceFailure, etc.) — see "Accepted deltas" #7 |
| 5.4 | `WriteByClassificationParityTests` | Configure / Tune routes via secured-write | yellow | same status-class pin |
| 5.5 | `AlarmTransitionParityTests` | Same alarm-condition source-node-id set | green | one-way invariant on sub-attribute refs (legacy populated → mxgw matches; legacy null → mxgw free to populate per AlarmRefBuilder) |
| 5.5 | `AlarmTransitionParityTests` | IsAlarm-marked variable count parity | green | soft pin — count must match, doesn't have to be non-zero |
| 5.6 | `HistoryReadParityTests` | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
| 5.6 | `HistoryReadParityTests` | New mxgw GalaxyDriver does not implement `IHistoryProvider` | green | architectural pin from Phase 1 (PR 1.3) on the *new* path; legacy `GalaxyProxyDriver` keeps the interface for back-compat until PR 7.2 — see "Accepted deltas" #8 |
| 5.7 | `ReconnectParityTests` | Reinitialize → both Healthy + reads succeed | green | recovery latency is *not* pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
| 5.7 | `ReconnectParityTests` | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
| 5.8 | `ScanStateProbeParityTests` | Same per-platform host set | n/a — deferred | dev rig is licensed for one `$WinPlatform` only; multi-platform parity deferred to a customer rig (PR 4.7's unit tests pin the state-decoder + member-tracking logic) |
| 5.8 | `ScanStateProbeParityTests` | Same `HostState` per overlapping platform | n/a — deferred | same single-platform constraint |
## Accepted deltas
These are intentional differences between the two backends — the parity
suite skips or tolerates them by design.
1. **Transport-entry host name.** The legacy backend's
`IHostConnectivityProbe` surface includes a host entry named after
the Galaxy.Host process identity; the mxgw backend uses the
configured `MxAccess.ClientName`. The names differ, but both are
correct for their respective sessions — the parity test compares
only the platform-host subset.
2. **Reconnect latency cadence.** Legacy reconnect roundtrips an OS
named pipe + an MxAccess COM client + a Galaxy.Host process restart
if the host died. The mxgw reconnect re-Registers the gateway session
over an existing gRPC channel. Sub-second vs multi-second recoveries
are both correct for their own paths; only the eventual `Healthy`
convergence is pinned.
3. **Read-value drift.** A read sampled twice on a live Galaxy can
return different values legitimately. We pin `StatusCode`-class
parity (Bad/Uncertain/Good); value equality is not pinned.
4. **Event-rate variance.** Both backends consume the same upstream
MXAccess publish events but route them through different deserializers
(LMXProxyServer COM events vs gRPC `MxEvent` protos). Scheduler
jitter on either side can shift counts within a 3s window; we pin a
±50% ratio, not strict equality.
5. **`IHistoryProvider` on the new path only.** Phase 1 (PR 1.3) lifted
history off the per-driver path onto the server-owned
`HistoryRouter` for the *new* in-process `GalaxyDriver`. The legacy
`GalaxyProxyDriver` still surfaces `IHistoryProvider` for back-compat
with the legacy server bootstrap path — it's an accepted delta
retired in PR 7.2 alongside the rest of the legacy projects. The
pin we want to enforce is "the new path doesn't regress to per-driver
history."
6. **Read value-CLR-type.** Legacy returns the raw VARIANT (e.g.
`Byte[]`) for an attribute that hasn't received its first value
cycle from MxAccess yet, while mxgw returns the typed value
(`Single`, `Int32`, etc.). Once a real value is written or scanned,
both converge. Pinning CLR-type equality across the uninitialized
window adds noise without a real parity invariant — the
`StatusCode`-class assertion already covers the
"did the read succeed" question.
7. **Write-failure StatusCode mapping.** Legacy
`MxAccessGalaxyBackend.WriteValuesAsync` flat-maps every failure to
`BadInternalError` (`0x80020000`); mxgw
`GatewayGalaxyDataWriter.TranslateReply` uses
`MxStatusProxy.RawDetectedBy` to distinguish gw-layer faults
(`BadCommunicationError`, `0x80050000`) from MxAccess HRESULT
faults (`BadDeviceFailure`, `BadNotConnected`, etc.). Both yield
Bad-status — the parity invariant is the *status class*, not the
exact code. Tighter mapping parity isn't worth investing in: the
legacy mapping retires alongside `GalaxyProxyDriver` in PR 7.2.
8. **Single-platform scope on the dev rig.** Two
`ScanStateProbeParityTests` scenarios are deferred to a customer
rig with multiple deployed `$WinPlatform` instances; this dev box
is licensed for one. PR 4.7's unit tests (`PerPlatformProbeWatcherTests`)
pin the state-decoder + member-tracking logic at the seam level,
so the runtime parity check becomes a customer-rig acceptance gate
before that customer goes live, not a precondition for retiring
the legacy projects on this dev box.
9. **Workaround for the gw `[]` array-suffix bug.**
`mxaccessgw/src/MxGateway.Server/Galaxy/GalaxyRepository.cs:173-175`
appends `[]` to the `full_tag_reference` of array-typed attributes,
which `MxAccess COM IInstance.AddItem` doesn't accept. The lmxopcua
discoverer (`GalaxyDiscoverer.StripArraySuffix`) defensively strips
the suffix. Tracked in `mxaccessgw/requirements-array-suffix-fix.md`;
the workaround is removed when that gw fix lands.
## Outstanding deltas
None as of 2026-04-30. Phase 7 (PR 7.1) flipped the default to
`mxgw`; PR 7.2 (legacy project deletion) is unblocked — the matrix
gate is satisfied and no further soak/pilot precondition applies.
## Running the matrix
```bash
# Both backends must be reachable for any row to run; rows skip
# cleanly when their backend is unavailable.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```
Environment overrides for the mxgw backend:
| Variable | Default | Purpose |
|----------|---------|---------|
| `OTOPCUA_PARITY_GW_ENDPOINT` | `http://localhost:5120` | mxaccessgw gRPC endpoint |
| `OTOPCUA_PARITY_GW_API_KEY` | `parity-suite-key` | API key handed to `MxGatewayClient` |
| `OTOPCUA_PARITY_CLIENT_NAME` | `OtOpcUa-Parity` | `MxAccess.ClientName` for the session |
The legacy backend reads ZB SQL on `localhost:1433` and spawns
`OtOpcUa.Driver.Galaxy.Host.exe` from
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` — both
must exist for the legacy half to resolve.
-381
View File
@@ -1,381 +0,0 @@
# Galaxy parity rig — runbook
> ✅ **Completed 2026-04-30 — historical record.** This runbook is the
> recipe that produced the green parity matrix that gated PR 7.2
> (retire legacy Galaxy projects, merged at commit `ae7106d`). The
> matrix it produced is captured in
> [`Galaxy.ParityMatrix.md`](Galaxy.ParityMatrix.md), also marked
> historical. The test project this doc drove
> (`Driver.Galaxy.ParityTests`) was deleted in PR 7.2, along with
> `Driver.Galaxy.{Host,Proxy,Shared}` and the `OtOpcUaGalaxyHost`
> Windows service. **You cannot re-run this rig today.** Current
> Galaxy testing flows through the gateway's own test suite in the
> sibling `mxaccessgw` repo.
>
> The text below is preserved as-written so the migration trail (what
> was tested, against what shape, with what env vars) stays auditable.
Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak
scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs`
can run for real. Closing the parity matrix was the gate for PR 7.2
(retire legacy Galaxy projects).
## Conceptual layout
```
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86) [DELETED in PR 7.2]
│ └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
│ └── named pipe "OtOpcUaGalaxy"
│ ▲
│ │ pipe IPC
│ │
│ GalaxyProxyDriver ◄── parity test (legacy half)
└── mxaccessgw service
└── MxAccess COM, ClientName "OtOpcUa-Parity"
└── gRPC on http://localhost:5120
│ gRPC
GalaxyDriver (in-process) ◄── parity test (mxgw half)
```
Both halves talk to the **same Galaxy** through **two distinct MxAccess
sessions** (different ClientNames so they don't evict each other).
## What was on the dev box at the time
Per `~/.claude/projects/.../memory/` *as of the rig run*:
- **AVEVA System Platform + Galaxy + MXAccess runtime**`project_aveva_platform_installed.md`.
- **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped,
binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`,
shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433`
`project_galaxy_host_installed.md`. **(Service uninstalled and binary
retired as part of PR 7.2; the host source project no longer exists in
this repo.)**
- **Parity test project** (`Driver.Galaxy.ParityTests`) — committed and
skip-clean at the time of the rig run. **Deleted in PR 7.2.**
## Setup steps (one-time)
### 1. Build + run mxaccessgw
The gateway source is at `c:\Users\dohertj2\Desktop\mxaccessgw\`.
Build both halves — the worker has to be x86 net48 (MxAccess COM
bitness), the server is .NET 10:
```powershell
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release # produces bin\Release\net10.0\MxGateway.Server.dll
```
Initialize the auth database and mint an API key. The CLI mode is
gated by an `apikey` first-arg prefix:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"
dotnet $srv apikey init-db # → "init-db: initialized"
dotnet $srv apikey create-key `
--key-id parity-rig `
--display-name "OtOpcUa-Parity" `
--scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>" ← capture this; you can't list secrets later
```
Save that exact key string for `OTOPCUA_PARITY_GW_API_KEY` in step 2.
Run the server with three env-var overrides — the defaults don't
quite match what gRPC + the parity test need:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2" # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
"C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
# appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that
dotnet $srv
# → "Now listening on: http://localhost:5120"
```
The worker spawns lazily on the first OpenSession RPC — there's no
worker process visible in Task Manager until the first session. If
the worker can't spawn, the server returns `Failed to open session
session-…` with a `WorkerProcessLaunchException` in the server log.
NSSM-wrap it later if the rig becomes long-lived; for first-pass
provisioning a console window is easier to inspect.
### 2. Set the parity env vars
In the test-runner shell:
```powershell
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY = "parity-suite-key" # match the gw config
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
```
Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated `dohertj2` shells alike (the Administrators deny
ACE was removed 2026-04-24; see `project_galaxy_host_installed.md`).
### 3. Verify both halves resolve
```powershell
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "FullyQualifiedName~HarnessShapeTests"
```
`Harness_records_a_skip_reason_for_each_unavailable_backend` is the
two-line truth-teller:
- Both `LegacyDriver` non-null + both `MxGatewayDriver` non-null → rig is up.
- One side null → read its `LegacySkipReason` / `MxGatewaySkipReason` and fix.
## Running the matrix
Once both halves resolve:
```powershell
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=ParityE2E"
```
This runs all 17 scenario tests across the seven scenario classes
(BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect /
ScanState). Each scenario class is independent — failures in one don't
block the rest.
Track the result against `docs/v2/Galaxy.ParityMatrix.md`. Update each
row to:
- **green** if the scenario passes
- **yellow** if it skipped because the dev Galaxy doesn't have the right
shape (see coverage matrix below)
- **red** if it asserted a real delta — those are the deltas that block
PR 7.2; chase each before retiring the legacy backend
## Galaxy shape needed for full coverage
Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a
real result, the dev Galaxy needs the shape in the right column:
| Scenario | Needs | Local rig |
|---|---|---|
| `BrowseAndReadParityTests` (3 tests) | Any deployed objects with attributes | ✅ existing seed |
| `SubscribeAndEventRateParityTests` event-rate | ≥5 attributes whose values *change* in 3s | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli |
| `AlarmTransitionParityTests` (2 tests) | Attributes with the `$Alarm*` extension | ⚙ scriptable via graccess-cli |
| `HistoryReadParityTests` (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli |
| `ScanStateProbeParityTests` (2 tests) | Multiple `$WinPlatform` / `$AppEngine` objects | ❌ **deferred to customer rig** — this dev box is provisioned for one platform only |
### The single-platform constraint
The dev box at `DESKTOP-6JL3KKO` is licensed / configured for a single
deployed `$WinPlatform`. Adding a second platform isn't feasible here,
so `ScanStateProbeParityTests` will skip in a "no overlap" branch on
this rig. Both of its scenarios already handle that case gracefully
(`Assert.Skip("no overlapping platform hosts between backends — likely
the transport names differ but no $WinPlatform was discovered")`), so
the matrix reports them as **n/a (deferred)** rather than red.
Plan: defer the two ScanState scenarios to a customer rig with multiple
platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows
provided the legacy `GalaxyRuntimeProbeManager` and the in-process
`PerPlatformProbeWatcher` have matching unit-test coverage of the
state-decoder + member-tracking logic — which they do (PR 4.7's tests).
Treat the runtime parity check as a customer-rig acceptance gate before
that customer goes live, not a precondition for retiring the legacy
projects on this dev box.
### Provisioning the rest via graccess-cli
`C:\Users\dohertj2\Desktop\graccess\graccess_cli\` is a .NET Framework
4.8 console app over the ArchestrA GRAccess COM API. It can configure
templates, instances, attributes, UDAs, extensions, and attribute
security — i.e. every row above marked ⚙ scriptable. Full surface in
`graccess/graccess_cli/docs/usage.md` and per-area workflow guides
(`attribute-editing.md`, `template-editing.md`,
`template-instance-editing.md`).
Reserve a sandbox UDO (e.g. `OtOpcUaParityTest`) to avoid mutating
attributes on plant-relevant objects. Concrete commands per requirement:
**A FreeAccess/Operate numeric attribute** (covers WriteByClassification
FreeAccess/Operate scenario):
```powershell
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda OperateValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityOperate `
--confirm --confirm-target OtOpcUaParityTest
```
**A Configure / Tune attribute** (covers WriteByClassification
Configure/Tune scenario):
```powershell
# Tune
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda TuneValue --data-type MxFloat `
--category MxCategoryWriteable_T --security MxSecurityTune `
--confirm --confirm-target OtOpcUaParityTest
# Configure
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda ConfigValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityConfigure `
--confirm --confirm-target OtOpcUaParityTest
```
**A changing-value attribute** (covers Subscribe event-rate scenario).
Two ways:
1. *On-scan increment* — bind a script extension that bumps a counter
each scan. Simplest to author with `object extension add` against
`ScriptExtension` plus `object attribute set` for the script body
(see `attribute-editing.md` §"Edit Extensions" for the pattern).
2. *External writer loop* — leave the attribute as plain Float and run
a one-liner that writes incrementing values from the parity-test
shell. Uses the legacy backend path so it's available before the
mxgw subscriber is up. This keeps the Galaxy template clean.
For first-pass validation pick #2 — no template surgery needed, and the
write loop runs only during `dotnet test`.
**Attributes with the `$Alarm*` extension** (covers AlarmTransition
scenario). Per `attribute-editing.md` §"Edit Alarm Settings" the
likely-named attributes vary by extension type
(`Limit`, `RateOfChange`, etc.). Add the extension via:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type AnalogLimitAlarm --primitive AlarmInput `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
```
Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting
attributes via `object attribute set`. Inspect first via
`object attributes` to see the names the extension introduces — they
differ across Aveva versions.
**Attributes with the History extension** (covers HistoryRead routing
scenario). History settings are usually attribute or extension
attributes; `attribute-editing.md` §"Edit History Settings" covers the
discovery flow. Quick start:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type HistoryExtension --primitive HistoryRecord `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
# Then enable history on whichever attribute the extension points at
graccess object attribute set `
--galaxy ZB --name OtOpcUaParityTest --type template `
--attribute HistoryEnabled --value true --data-type bool `
--confirm --confirm-target OtOpcUaParityTest
```
**Deploy + restart Galaxy.Host after any of the above** so MxAccess
sees the change:
```powershell
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
--confirm --confirm-target OtOpcUaParityTest_001
sc.exe restart OtOpcUaGalaxyHost # service no longer exists post-PR-7.2; in the modern shape, restart mxaccessgw instead
```
Then re-run the parity matrix. The previously-skipped scenarios should
now find a sandbox attribute matching their selector and assert.
## Soak run
The 24h × 50k soak gates the production confidence half of PR 7.2.
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=Soak"
```
The test logs a per-minute CSV-style line to stdout:
```
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
```
Capture stdout to a file for post-run analysis. The three guards
(`received` growing, `dropped/received` ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
## Compressed-tag soak (when Galaxy isn't 50k tags)
A first-pass validation is fine with the override:
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
```
This validates the *plumbing* (bounded channel, pump invariants, leak
guard) but doesn't pin the 50k-tag scaling assertion. Defer the full
50k validation to a customer rig with that scale, or build a synthetic
Galaxy with a script that imports 50k attributes onto a generated UDO
(~2 hours of one-off work).
## Troubleshooting
- **`MxGatewaySkipReason` says "mxaccessgw not reachable"** — the gw
isn't listening, or it's on a different port. `Test-NetConnection
localhost -Port 5120` is the quick check.
- **`MxGatewaySkipReason` says "mxgateway backend boot failed:
RpcException: Unauthenticated"** — API key mismatch. Verify the
`OTOPCUA_PARITY_GW_API_KEY` env var matches the gw's configured key.
- **`LegacySkipReason` says "Galaxy ZB SQL not reachable on
localhost:1433"** — SQL Server isn't running, or its TCP listener is
off. Check `services.msc` for the SQL Server (default) instance.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — at rig time
the parity harness looked under
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` for the
EXE it spawned as a subprocess, separate from the published copy at
`C:\publish\OtOpcUaGalaxyHost\` used by the Windows service. **Both
the source project and the published binary were removed in PR 7.2,
so this troubleshooting branch no longer applies — the legacy half
cannot be brought up at all.**
- **Both halves resolve but parity scenarios assert deltas** — that's
the expected outcome the rig exists to surface. Review each delta
against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section
to decide whether it's a real bug or a pre-accepted divergence.
## After the rig is green
When the matrix is fully green or carries documented accepted-deltas,
PR 7.2 (legacy project deletion) is unblocked. The only follow-up is
to promote any newly-discovered accepted-delta to the matrix doc with
the why so the matrix history stays auditable.
-152
View File
@@ -1,152 +0,0 @@
# Galaxy backend performance
This document covers the performance surface of the in-process
`GalaxyDriver` (the v2 mxgw backend) — the ActivitySource it emits, the
metrics on its EventPump, the soak scenario that validates it, and the
tuning knobs you can reach for when the dev parity rig surfaces a hot
spot.
## Tracing surface (PR 6.1)
The driver emits spans on the `ZB.MOM.WW.OtOpcUa.Driver.Galaxy`
ActivitySource. No package dependency on OpenTelemetry — the host
process picks the listener (OTLP exporter, dotnet-trace, Application
Insights). Wire it via `OpenTelemetry.Trace.AddSource(...)` in the
host's tracing pipeline.
| Span | Source | Tags |
|------|--------|------|
| `galaxy.subscribe_bulk` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.tag_count`, `galaxy.buffered_interval_ms`, `galaxy.success_count` |
| `galaxy.unsubscribe_bulk` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.tag_count` |
| `galaxy.stream_events` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.event_count` (set on stream end) |
| `galaxy.write` | `TracedGalaxyDataWriter` | `galaxy.client`, `galaxy.tag_count`, `galaxy.secured_write_count`, `galaxy.success_count` |
| `galaxy.get_hierarchy` | `TracedGalaxyHierarchySource` | `galaxy.client`, `galaxy.object_count` |
The stream-events span deliberately covers the *entire* stream lifetime
rather than per-event spans — at 50k tags / 1Hz the per-event volume
would dominate the trace pipeline. Per-event visibility flows through
the metrics surface instead.
## Metrics surface (PR 6.2)
`EventPump` publishes three counters on the
`ZB.MOM.WW.OtOpcUa.Driver.Galaxy` meter, each tagged with
`galaxy.client` so multi-driver hosts can split by source:
| Counter | Unit | Meaning |
|---------|------|---------|
| `galaxy.events.received` | `{event}` | MxEvents read from the gateway StreamEvents stream |
| `galaxy.events.dispatched` | `{event}` | MxEvents that made it through the bounded channel into `OnDataChange` |
| `galaxy.events.dropped` | `{event}` | MxEvents discarded because the bounded channel was full (newest-dropped) |
The invariant is `received = dispatched + dropped + (in-flight in the
channel)`. Watch the dropped counter — it is the leading indicator of
listener back-pressure. A non-zero dropped rate means a downstream
consumer (DriverNodeManager → UA notification queue → client) is
slower than the gw event stream; investigate that consumer before
raising `EventPump` channel capacity.
### Bounded channel design
The pump runs two background tasks:
1. **Producer** — reads from `IGalaxySubscriber.StreamEventsAsync`,
increments `events.received`, and `TryWrite`s into a bounded
`Channel<MxEvent>`. When the channel is full, the producer counts
the drop and continues reading the gw stream so back-pressure does
not propagate upstream (which would stall the gw worker and cascade
to *all* driver instances sharing that worker).
2. **Consumer** — reads from the channel, fans out via
`SubscriptionRegistry`, increments `events.dispatched`.
Default channel capacity is 50_000 (one second of headroom at 50k
tags / 1Hz). Override via the `EventPump` constructor's
`channelCapacity` parameter; the public-facing wiring path in
`GalaxyDriver.EnsureEventPumpStarted` does not yet expose this through
`GalaxyDriverOptions` because no parity scenario has needed it. Add it
when soak data does.
## Buffered update interval (PR 6.3)
`MxAccess.PublishingIntervalMs` (default 1000) flows through both
subscribe paths:
- `GalaxyDriver.SubscribeAsync` — the caller's `publishingInterval`
wins when non-zero (the server's UA subscription publishingInterval
drives this in production). When the caller passes
`TimeSpan.Zero`, the configured option is the fallback.
- `PerPlatformProbeWatcher` — the watcher passes the configured value
through `SubscribeBulkAsync` so probe `ScanState` changes publish at
the deployment's chosen cadence.
A session-level `SetBufferedUpdateInterval` RPC exists in the gw
protocol but the .NET client doesn't expose a typed helper yet —
adjusting an existing subscription's interval mid-flight is a
follow-up. Today's path subscribes once at the right interval, which
covers the common case.
## Soak scenario (PR 6.4)
`SoakScenarioTests.Soak_HoldsSubscription_AndKeepsEventStreamFlowing`
in `Driver.Galaxy.ParityTests` is the long-running validation. It
subscribes a configurable tag count (default 50_000), holds the
subscription for a configurable duration (default 24h), polls the
three counters every minute, and asserts:
- `events.received` continues to grow (gw stream isn't stuck)
- `events.dropped / events.received` stays under the configured
ceiling (default 0.5%)
- process working-set doesn't grow more than 1 GB above baseline
(leak guard)
Always skipped unless the operator opts in:
```bash
# Full 24h × 50k soak (production validation)
OTOPCUA_SOAK_RUN=1 dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
# Compressed CI-friendly run (10min × 1k tags, 1% drop ceiling)
OTOPCUA_SOAK_RUN=1 OTOPCUA_SOAK_MINUTES=10 OTOPCUA_SOAK_TAGS=1000 \
OTOPCUA_SOAK_DROP_PCT=1.0 \
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```
The scenario writes a per-minute CSV-style row to stdout
(`soak,<minutes>,received=…,dispatched=…,dropped=…,ws_mb=…`) so an
operator can grep the test runner output mid-run.
## Tuned defaults (PR 6.5)
| Option | Default | Source | Notes |
|--------|---------|--------|-------|
| `Gateway.ConnectTimeoutSeconds` | 10 | unchanged | Cold-start network paths fit comfortably; soak never observed >2s |
| `Gateway.DefaultCallTimeoutSeconds` | 30 | **bumped from 5** in PR 6.5 | A 50k-tag `SubscribeBulk` can exceed 5s under MxAccess COM apartment lock contention; 30s leaves headroom while still failing fast on a wedged worker |
| `Gateway.StreamTimeoutSeconds` | 0 (unlimited) | unchanged | The stream must run for the lifetime of the driver |
| `MxAccess.PublishingIntervalMs` | 1000 | unchanged | Matches the legacy `LMXProxyServer` cadence; deployments needing tighter health visibility can dial down |
| `Reconnect.InitialBackoffMs` | 500 | unchanged | First retry shouldn't dogpile a recovering gw |
| `Reconnect.MaxBackoffMs` | 30_000 | unchanged | 30s ceiling so a long-down gw doesn't sit in 5+ min backoff |
| `Repository.DiscoverPageSize` | 5000 | unchanged | One Galaxy page round-trip per ~5k objects; soak hadn't surfaced pressure |
| `EventPump` channel capacity | 50_000 | unchanged | One second of headroom at 50k tags / 1Hz |
The unchanged rows are not "definitely correct" — they are "no live
data argues for changing them." Re-run the soak scenario after every
substantive driver change, and revise this table when the data does.
## Where to look first when something's slow
1. **Slow `Discover`?** Inspect `galaxy.get_hierarchy` span duration
and `galaxy.object_count`. The gw walks the Galaxy DB serially;
slow Discovers usually mean a slow ZB SQL.
2. **Subscribe pile-up?** `galaxy.subscribe_bulk` span duration
correlates with `galaxy.tag_count`. If duration ÷ tag_count starts
climbing, the gw worker is probably under apartment-lock pressure.
3. **Events stalled?** Watch `galaxy.events.received`. Flat-lined
means the gw stream is wedged — kick the reconnect supervisor by
forcing a `ReinitializeAsync`.
4. **Dropped events?** Non-zero `galaxy.events.dropped` means a slow
downstream consumer. Profile `OnDataChange` handlers in
`DriverNodeManager` before bumping the channel capacity.
5. **Memory growing?** Confirm with the soak scenario's working-set
leak guard. Likely culprits: lingering subscription handles in
`SubscriptionRegistry`, or a downstream consumer retaining
`DataValueSnapshot` references past their useful life.
+73
View File
@@ -0,0 +1,73 @@
# Decisions
Architecture-level decisions taken during the v2 implementation, captured
once and referenced from feature docs / PR descriptions / ADR-style
follow-ups. Each entry lists the decision, the alternatives we considered,
and the rationale that tipped the call.
## FOCAS write-path opt-in
**Issue:** [#268](https://github.com/dohertj2/lmxopcua/issues/268). **Plan PR:** F4-a.
### Decision
The FOCAS driver ships writes behind two independent opt-ins, both default
off:
1. **Driver-level master switch**`FocasDriverOptions.Writes.Enabled`,
default `false`. When off, every entry in a `WriteAsync` batch short-
circuits to `BadNotWritable` with status text `writes disabled at
driver level`. The wire client is never touched.
2. **Per-tag opt-in**`FocasTagDefinition.Writable`, default `false`
(flipped from `true` in F4-a). A `Writable = false` tag returns
`BadNotWritable` even when the driver-level flag is on.
`BadNotSupported` is reserved for kinds the wire client hasn't yet
implemented; F4-b/c land actual macro / parameter / PMC writes that
currently dispatch to `BadNotSupported` (or to `Good` against the F4-a
fake) for unimplemented branches.
### Alternatives considered
- **Always-on writes (the pre-F4-a default).** Rejected: a single
misconfigured tag flipping `Writable = true` by accident would let an
operator overwrite a CNC parameter from any OPC UA client. The two-
opt-in posture means an accidental tag flip alone isn't enough.
- **Driver-level switch only.** Rejected: doesn't protect against an
operator with admin rights flipping the master switch to do bulk diag
reads but inheriting write capability for tags that were intended
read-only.
- **Per-tag opt-in only.** Rejected: doesn't give the deployment an "all
writes off" emergency lever — useful during a CNC commissioning where
writes are unsafe across the board for a period.
### Rationale
CNC writes are non-idempotent in the field's worst-case shape: feed
overrides, M-code pulses, alarm acks, recipe-step advances. Two opt-ins
is the cheapest defence-in-depth posture that still lets writes ship.
Both default off so a fresh deployment is read-only — the explicit choice
to enable writes lands at config time where it's reviewable, not at
runtime where it's invisible.
`WriteIdempotent` plumbs through `CapabilityInvoker.ExecuteWriteAsync`
into the Polly retry pipeline; default `false` means failed writes are
not auto-retried (plan decisions #44 / #45). Per-tag flip required for
genuinely-idempotent writes.
### CLI carve-out
`otopcua-focas-cli write` sets `Writes.Enabled = true` locally for the
lifetime of one process and synthesises a `Writable = true` tag. The CLI
is a per-operator direct-to-CNC tool — not a long-lived process bound to
the central config DB. Configuring the server still requires both opt-ins
to be set explicitly in the DriverInstance JSON. The bypass is documented
in `docs/Driver.FOCAS.Cli.md` so operators understand the asymmetry.
### Migration
Pre-F4-a deployments that relied on the `Writable = true` default need to
add `"Writable": true` to every tag they intend to write + an enclosing
`"Writes": { "Enabled": true }` block in their DriverInstance JSON.
Bootstrap rows seeded before F4-a get `Writable = false` after upgrade —
this is intentional; review-then-flip is the safer migration path.
+42 -88
View File
@@ -4,7 +4,6 @@
>
> **Branch**: `v2`
> **Created**: 2026-04-17
> **Updated 2026-04-28**: Docker workloads moved off the Windows dev VM to a shared Linux Docker host at `10.100.0.35` so the dev VM can have its GPU re-attached via ESXi passthrough (Hyper-V/WSL2 was blocking it). The two-tier model below is updated accordingly: per-developer Docker Desktop is gone; SQL Server + driver fixtures all live on the central Linux host, identifiable via `docker ps --filter label=project=lmxopcua`.
## Scope
@@ -14,31 +13,30 @@ Every external resource a developer needs on their machine, plus the dedicated i
## Two Environment Tiers
Per decision #99 (updated 2026-04-28):
Per decision #99:
| Tier | Purpose | Where it runs | Resources |
|------|---------|---------------|-----------|
| **PR-CI / inner-loop dev** | Fast, runs on minimal Windows + Linux build agents and developer laptops | Each developer's machine; CI runners | Pure-managed in-process simulators (NModbus, OPC Foundation reference server, FOCAS TCP stub from test project). No Docker, no VMs locally. |
| **Integration / nightly CI** | Full driver-stack validation against real wire protocols | **Shared Linux Docker host at `10.100.0.35`** (Debian 13, Docker 29.2.1) — one host for all developers; replaces the former per-developer Docker Desktop + Hyper-V model | All Docker simulators (pymodbus, ab_server, python-snap7, opc-plc) + central SQL Server, all running as `/opt/otopcua-<driver>/` stacks with the `project=lmxopcua` label. TwinCAT XAR + the Galaxy/mxaccessgw stack stay on the Windows dev VM (license + Hyper-V constraints unchanged) |
| **PR-CI / inner-loop dev** | Fast, runs on minimal Windows + Linux build agents and developer laptops | Each developer's machine; CI runners | Pure-managed in-process simulators (NModbus, OPC Foundation reference server, FOCAS TCP stub from test project). No Docker, no VMs. |
| **Nightly / integration CI** | Full driver-stack validation against real wire protocols | One dedicated Windows host with Docker Desktop + Hyper-V + a TwinCAT XAR VM | All Docker simulators (`oitc/modbus-server`, `ab_server`, Snap7), TwinCAT XAR VM, Galaxy.Host installer + dev Galaxy access, FOCAS TCP stub binary, FOCAS FaultShim assembly |
The Linux Docker host is shared because (a) only one team member needs it active at a time, (b) it removes the per-developer Docker Desktop install, and (c) the dev VM no longer needs Hyper-V/WSL2 — freeing it for GPU passthrough.
The tier split keeps developer onboarding fast (no Docker required for first build) while concentrating the heavy simulator setup on one machine the team maintains.
## Installed Inventory — Dev VM (`DESKTOP-6JL3KKO`)
## Installed Inventory — This Machine
Running record of v2 dev services on the Windows dev VM. Updated on every install / config change. Credentials here are **dev-only** per decision #137 — production uses Integrated Security / gMSA per decision #46 and never any value in this table.
Running record of every v2 dev service stood up on this developer machine. Updated on every install / config change. Credentials here are **dev-only** per decision #137 — production uses Integrated Security / gMSA per decision #46 and never any value in this table.
**Last updated**: 2026-04-28 — Docker Desktop + WSL2 removed; Docker workloads now live on the Linux Docker host (see next section).
**Last updated**: 2026-04-17
### Host
| Attribute | Value |
|-----------|-------|
| Machine name | `DESKTOP-6JL3KKO` (10.100.0.48) |
| User | `dohertj2` (local Administrators) |
| VM platform | VMware ESXi |
| Machine name | `DESKTOP-6JL3KKO` |
| User | `dohertj2` (member of local Administrators + `docker-users`) |
| VM platform | VMware (`VMware20,1`), nested virtualization enabled |
| CPU | Intel Xeon E5-2697 v4 @ 2.30GHz (3 vCPUs) |
| OS | Windows 10 Enterprise (10.0.19045) |
| GPU | (Re-attached after WSL2/Hyper-V removal) |
| OS | Windows (WSL2 + Hyper-V Platform features installed) |
### Toolchain
@@ -48,40 +46,36 @@ Running record of v2 dev services on the Windows dev VM. Updated on every instal
| .NET AspNetCore runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App\` | Pre-installed |
| .NET NETCore runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.NETCore.App\` | Pre-installed |
| .NET WindowsDesktop runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App\` | Pre-installed |
| .NET Framework 4.8 SDK | — | Optional — only needed when building the mxaccessgw worker (sibling repo, x86 net48) | — |
| .NET Framework 4.8 SDK | — | Pending (needed for Phase 2 Galaxy.Host; not yet required) | — |
| Git | Pre-installed | Standard | — |
| PowerShell 7 | Pre-installed | Standard | — |
| winget | v1.28.220 | Standard Windows feature | — |
| Docker CLI (standalone, no daemon) | 29.3.1 | `%USERPROFILE%\bin\docker.exe` | Static binary from download.docker.com (2026-04-28) |
| Docker Compose CLI plugin | latest | `%USERPROFILE%\.docker\cli-plugins\docker-compose.exe` | Direct download from github.com/docker/compose (2026-04-28) |
| `lmxopcua-fix.ps1` helper | n/a | `%USERPROFILE%\bin\lmxopcua-fix.ps1` | See "Docker host" section below |
| WSL | Default v2, distro `docker-desktop` `STATE Running` | — | `wsl --install --no-launch` (2026-04-17) |
| Docker Desktop | 29.3.1 (engine) / Docker Desktop 4.68.0 (app) | Standard | `winget install --id Docker.DockerDesktop` (2026-04-17) |
| `dotnet-ef` CLI | 10.0.6 | `%USERPROFILE%\.dotnet\tools\dotnet-ef.exe` | `dotnet tool install --global dotnet-ef --version 10.0.*` (2026-04-17) |
| ~~Docker Desktop~~ | — | Removed 2026-04-28 — replaced by remote Linux Docker host | — |
| ~~WSL2 (`docker-desktop` distro)~~ | — | Removed 2026-04-28 (frees Hyper-V for GPU passthrough) | — |
### Services
| Service | Container / Process | Version | Host:Port | Credentials (dev-only) | Data location | Status |
|---------|---------------------|---------|-----------|------------------------|---------------|--------|
| **Central config DB** | Docker container `otopcua-mssql` on the Linux Docker host (image `mcr.microsoft.com/mssql/server:2022-latest`) | 16.0.4250.1 (RTM-CU24-GDR, KB5083252) | `10.100.0.35:14330``1433` (container) — port 14330 retained from the previous local-container setup so connection-string ports don't churn | User `sa` / Password `OtOpcUaDev_2026!` | Docker named volume `otopcua-mssql-data` on the Docker host | ✅ Running on Docker host (`/opt/otopcua-mssql/`) since 2026-04-28; carries `project=lmxopcua` label |
| **Central config DB** | Docker container `otopcua-mssql` (image `mcr.microsoft.com/mssql/server:2022-latest`) | 16.0.4250.1 (RTM-CU24-GDR, KB5083252) | `localhost:14330` (host)`1433` (container) — remapped from 1433 to avoid collision with the native MSSQL14 instance that hosts the Galaxy `ZB` DB (both bind 0.0.0.0:1433; whichever wins the race gets connections) | User `sa` / Password `OtOpcUaDev_2026!` | Docker named volume `otopcua-mssql-data` (mounted at `/var/opt/mssql` inside container) | ✅ Running — `InitialSchema` migration applied, 16 entity tables live |
| Dev Galaxy (AVEVA System Platform) | Local install on this dev box — full ArchestrA + Historian + OI-Server stack | v1 baseline | Local COM via MXAccess (`C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll`); Historian via `aaH*` services; SuiteLink via `slssvc` | Windows Auth | Galaxy repository DB `ZB` on local SQL Server (separate instance from `otopcua-mssql` — legacy v1 Galaxy DB, not related to v2 config DB) | ✅ **Fully available — Phase 2 lift unblocked.** 27 ArchestrA / AVEVA / Wonderware services running incl. `aaBootstrap`, `aaGR` (Galaxy Repository), `aaLogger`, `aaUserValidator`, `aaPim`, `ArchestrADataStore`, `AsbServiceManager`, `AutoBuild_Service`; full Historian set (`aahClientAccessPoint`, `aahGateway`, `aahInSight`, `aahSearchIndexer`, `aahSupervisor`, `InSQLStorage`, `InSQLConfiguration`, `InSQLEventSystem`, `InSQLIndexing`, `InSQLIOServer`, `InSQLManualStorage`, `InSQLSystemDriver`, `HistorianSearch-x64`); `slssvc` (Wonderware SuiteLink); `OI-Gateway` install present at `C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\` (decision #142 AppServer-via-OI-Gateway smoke test now also unblocked) |
| GLAuth (LDAP) | Local install at `C:\publish\glauth\` | v2.4.0 | `localhost:3893` (LDAP) / `3894` (LDAPS, disabled) | Direct-bind `cn={user},dc=lmxopcua,dc=local` per `auth.md`; users `readonly`/`writeop`/`writetune`/`writeconfig`/`alarmack`/`admin`/`serviceaccount` (passwords in `glauth.cfg` as SHA-256) | `C:\publish\glauth\` | ✅ Running (NSSM service `GLAuth`). Phase 1 Admin uses GroupToRole map `ReadOnly→ConfigViewer`, `WriteOperate→ConfigEditor`, `AlarmAck→FleetAdmin`. v2-rebrand to `dc=otopcua,dc=local` is a future cosmetic change |
| OPC Foundation reference server | Not yet built | — | `10.100.0.35:62541` (target) | `user1` / `password1` (reference-server defaults) | — | Pending (needed for Phase 5 OPC UA Client driver testing) |
| FOCAS TCP stub | Not yet built | — | `10.100.0.35:8193` (target) | n/a | — | Pending (built in Phase 5; runs on Docker host) |
| Modbus simulator (`otopcua-pymodbus:3.13.0`) | Docker compose at `/opt/otopcua-modbus/` on Docker host | pinned 3.13.0 | `10.100.0.35:5020` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up modbus <profile>` from this VM |
| AB CIP fixture (`otopcua-ab-server:libplctag-release`) | Docker compose at `/opt/otopcua-abcip/` on Docker host | source-pinned `release` tag | `10.100.0.35:44818` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up abcip <profile>` from this VM |
| S7 fixture (`otopcua-python-snap7:1.0`) | Docker compose at `/opt/otopcua-s7/` on Docker host | python-snap7 ≥2.0 | `10.100.0.35:1102` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up s7 s7_1500` from this VM |
| OPC UA simulator (`mcr.microsoft.com/iotedge/opc-plc:2.14.10`) | Docker compose at `/opt/otopcua-opcuaclient/` on Docker host | pinned 2.14.10 | `10.100.0.35:50000` | anonymous | n/a | Stack staged; bring up with `lmxopcua-fix up opcuaclient` from this VM |
| TwinCAT XAR VM | — | — | TBD via Hyper-V on a separate Windows host (NOT this dev VM) | TwinCAT default route creds | — | Pending — Hyper-V removed from this dev VM; XAR will live on a separate dedicated Windows machine if needed |
| OPC Foundation reference server | Not yet built | — | `localhost:62541` (target) | `user1` / `password1` (reference-server defaults) | — | Pending (needed for Phase 5 OPC UA Client driver testing) |
| FOCAS TCP stub | Not yet built | — | `localhost:8193` (target) | n/a | — | Pending (built in Phase 5) |
| Modbus simulator (`oitc/modbus-server`) | — | — | `localhost:502` (target) | n/a | — | Pending (needed for Phase 3 Modbus driver; moves to integration host per two-tier model) |
| libplctag `ab_server` | — | — | `localhost:44818` (target) | n/a | — | Pending (Phase 3/4 AB CIP and AB Legacy drivers) |
| Snap7 Server | — | — | `localhost:102` (target) | n/a | — | Pending (Phase 4 S7 driver) |
| TwinCAT XAR VM | — | — | `localhost:48898` (ADS) (target) | TwinCAT default route creds | — | Pending — runs in Hyper-V VM, not on this dev box (per decision #135) |
### Connection strings for `appsettings.Development.json`
Copy-paste-ready. The checked-in `appsettings.json` defaults already point at the Docker host (`10.100.0.35,14330`), so `appsettings.Development.json` is only needed for per-developer overrides.
Copy-paste-ready. **Never commit these to the repo** — they go in `appsettings.Development.json` (gitignored per the standard .NET convention) or in user-scoped dotnet secrets.
```jsonc
{
"ConfigDatabase": {
"ConnectionString": "Server=10.100.0.35,14330;Database=OtOpcUaConfig_Dev;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=true;Encrypt=false;"
"ConnectionString": "Server=localhost,14330;Database=OtOpcUaConfig_Dev;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=true;Encrypt=false;"
},
"Authentication": {
"Ldap": {
@@ -95,26 +89,29 @@ Copy-paste-ready. The checked-in `appsettings.json` defaults already point at th
}
```
LDAP host stays `localhost` because GLAuth still runs as a native NSSM service on this dev VM (not yet migrated to the Docker host).
For xUnit test fixtures that need a throwaway DB per test run, build connection strings with `Database=OtOpcUaConfig_Test_{timestamp}` to avoid cross-run pollution.
### Container management quick reference
All commands SSH into the Docker host. The standalone Windows `docker.exe` on this VM has no daemon — every operation runs server-side via the helper.
```powershell
# Status / log / lifecycle from this VM
lmxopcua-fix ls # list lmxopcua-tagged containers + status
lmxopcua-fix logs mssql # SQL Server log tail
ssh dohertj2@10.100.0.35 'docker stop otopcua-mssql; docker start otopcua-mssql'
ssh dohertj2@10.100.0.35 'docker logs otopcua-mssql --tail 50'
# Start / stop the SQL Server container (survives reboots via Docker Desktop auto-start)
docker stop otopcua-mssql
docker start otopcua-mssql
# sqlcmd inside the container (run on the Docker host)
ssh dohertj2@10.100.0.35 'docker exec otopcua-mssql /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "OtOpcUaDev_2026!" -C -Q "SELECT @@VERSION"'
# Logs (useful for diagnosing startup failures or login issues)
docker logs otopcua-mssql --tail 50
# Nuclear reset (destroys dev DB data)
ssh dohertj2@10.100.0.35 'cd /opt/otopcua-mssql && docker compose down -v && docker compose up -d'
# Shell into the container (rarely needed; sqlcmd is the usual tool)
docker exec -it otopcua-mssql bash
# Query via sqlcmd inside the container (Git Bash needs MSYS_NO_PATHCONV=1 to avoid path mangling)
MSYS_NO_PATHCONV=1 docker exec otopcua-mssql /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "OtOpcUaDev_2026!" -C -Q "SELECT @@VERSION"
# Nuclear reset: drop the container + volume (destroys all DB data)
docker stop otopcua-mssql
docker rm otopcua-mssql
docker volume rm otopcua-mssql-data
# …then re-run the docker run command from Bootstrap Step 6
```
### Credential rotation
@@ -128,7 +125,7 @@ Dev credentials in this inventory are convenience defaults, not secrets. Change
| Resource | Purpose | Type | Default port | Default credentials | Owner |
|----------|---------|------|--------------|---------------------|-------|
| **.NET 10 SDK** | Build all .NET 10 x64 projects | OS install | n/a | n/a | Developer |
| **.NET Framework 4.8 SDK + targeting pack** | Optional — build the mxaccessgw worker (sibling repo, x86 net48) | Windows install | n/a | n/a | Developer |
| **.NET Framework 4.8 SDK + targeting pack** | Build `Driver.Galaxy.Host` (Phase 2+) | Windows install | n/a | n/a | Developer |
| **Visual Studio 2022 17.8+ or Rider 2024+** | IDE (any C# IDE works; these are the supported configs) | OS install | n/a | n/a | Developer |
| **Git** | Source control | OS install | n/a | n/a | Developer |
| **PowerShell 7.4+** | Compliance scripts (`phase-N-compliance.ps1`) | OS install | n/a | n/a | Developer |
@@ -250,7 +247,7 @@ Order matters because some installs have prerequisites and several need admin el
winget install --id Microsoft.DotNet.SDK.10 --accept-package-agreements --accept-source-agreements
```
2. **Install .NET Framework 4.8 SDK + targeting pack** optional, only needed when building the mxaccessgw worker (sibling repo, x86 net48). Not required by anything in this repo.
2. **Install .NET Framework 4.8 SDK + targeting pack** — only needed when starting Phase 2 (Galaxy.Host); skip for Phase 01 if not yet there
```powershell
winget install --id Microsoft.DotNet.Framework.DeveloperPack_4 --accept-package-agreements --accept-source-agreements
```
@@ -408,49 +405,6 @@ For production:
- Per-NodeId credentials in `ClusterNodeCredential` table (per decision #83)
- Admin app uses LDAP (no SQL credential at all on the user-facing side)
## Service Refresh — `Refresh-Services.ps1`
The deploy host hosts three NSSM-wrapped services (`MxAccessGw`,
`OtOpcUaWonderwareHistorian`, `OtOpcUa`) that consume binaries from
`C:\publish\`. After landing changes in either repo, refresh the
deployed bits with `scripts\install\Refresh-Services.ps1`:
```powershell
# Default invocation (dev rig).
& C:\Users\dohertj2\Desktop\lmxopcua\scripts\install\Refresh-Services.ps1
# Skip the timestamped backup (faster on iterative dev cycles).
& Refresh-Services.ps1 -SkipBackup
# Dry-run — print the actions without doing them.
& Refresh-Services.ps1 -WhatIf
```
The script:
1. Stops services in reverse-dependency order (`OtOpcUa`
`OtOpcUaWonderwareHistorian``MxAccessGw`) and force-kills
any residual processes.
2. Snapshots the existing `C:\publish\mxaccessgw\` and
`C:\publish\lmxopcua\` trees to `C:\publish\.backup-<timestamp>\`
for rollback (skip with `-SkipBackup`).
3. Builds + copies mxaccessgw worker (x86 net48) + server (net10.0)
binaries from the sibling repo.
4. `dotnet publish`-es the OtOpcUa server + Wonderware historian
sidecar from this repo.
5. Ensures `OTOPCUA_HISTORIAN_ALARM_WRITE_ENABLED=true` is set on
the historian service env block (PR C.2 toggle).
6. Starts services in forward-dependency order (`MxAccessGw`
`OtOpcUaWonderwareHistorian``OtOpcUa`).
7. Smoke-verifies — service status, listening ports (5120 / 4840 /
4841), recent log tails.
Functional verification (alarm raise / scripted alarm historian
round-trip / sub-attribute fallback) is the operator's next step
after the refresh; see
[docs/plans/alarms-over-gateway.md](../plans/alarms-over-gateway.md)
§Track D for the scenarios.
## Test Data Seed
Each environment needs a baseline data set so cross-developer tests are reproducible. Lives in `tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/SeedData/`:
@@ -528,7 +482,7 @@ Seeds are idempotent (re-runnable) and gitignored where they contain credentials
| Docker Desktop license terms change for org use | Track Docker pricing; budget approved or fall back to Podman if license becomes blocking |
| Integration host single point of failure | Document the setup so a second host can be provisioned in <2 days; test fixtures pin to a hostname so failover changes one DNS entry |
| GLAuth dev config drifts between developers | Sync script + template (Step 4) keep configs aligned; periodic review |
| Galaxy / MXAccess licensing for non-dev-machine | Galaxy stays on the dev machines that already have Aveva licenses; integration host does NOT run Galaxy (the mxaccessgw worker requires the AVEVA stack and runs on the dev box, not the shared host) |
| Galaxy / MXAccess licensing for non-dev-machine | Galaxy stays on the dev machines that already have Aveva licenses; integration host does NOT run Galaxy (Galaxy.Host integration tests run on the dev box, not the shared host) |
| Long-lived dev env credentials in dev `appsettings.Development.json` | Gitignored; documented as dev-only; production never uses these |
## Decisions to Add to plan.md
+271 -47
View File
@@ -10,65 +10,289 @@
### Summary
Galaxy (MXAccess) is a **Tier-A in-process driver** that runs in the OtOpcUa server's .NET 10 AnyCPU process and speaks gRPC to a separately installed `mxaccessgw` (sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`). The gateway owns the MXAccess COM apartment, the STA pump, and the Galaxy Repository / Historian SDK on its own host; the driver itself is platform-agnostic and carries no COM or x86 bitness constraint. Project lives at `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`.
Out-of-process **Tier C** driver bridging AVEVA System Platform (Wonderware) Galaxies. The existing v1 implementation is refactored behind the new driver capability interfaces and hosted in a separate Windows service (.NET 4.8 x86) that communicates with the main OtOpcUa server (.NET 10 x64) via named pipes + MessagePack. Hosted out-of-process for **two reasons**: COM/.NET 4.8 x86 bitness constraint **and** Tier C stability isolation (per `driver-stability.md`). FOCAS is the second Tier C driver, also out-of-process — see §7.
### Capability Surface
### Library & Dependencies
`GalaxyDriver` (in `GalaxyDriver.cs`) implements `IDriver`, `IDisposable`, plus six driver capabilities — eight interfaces total.
| Component | Package / Source | Version | Target | Notes |
|-----------|------------------|---------|--------|-------|
| **MXAccess COM** | `ArchestrA.MxAccess` (GAC / `lib/ArchestrA.MxAccess.dll`) | version-neutral late-bound | .NET 4.8 x86 | Pinned via `<Reference Include="ArchestrA.MxAccess">` with `EmbedInteropTypes=false`; interfaces: `LMXProxyServer`, `ILMXProxyServerEvents`, `MXSTATUS_PROXY` |
| **Galaxy DB client** | `System.Data.SqlClient` (BCL) | BCL | .NET 4.8 x86 | Direct SQL for hierarchy/attribute/change-detection queries |
| **Wonderware Historian SDK** | `aahClientManaged`, `aahClientCommon` | Historian-shipped | .NET 4.8 x86 | Optional — loaded only when `Historian.Enabled=true` |
| **MessagePack-CSharp** | `MessagePack` NuGet | 2.x | .NET Standard 2.0 (Shared) | IPC serialization; shared contract between Proxy and Host |
| **Named pipes** | `System.IO.Pipes` (BCL) | BCL | both sides | IPC transport, localhost only |
| Capability | Source files |
|------------|--------------|
| `ITagDiscovery` | `Browse/GalaxyDiscoverer.cs`, `Browse/GatewayGalaxyHierarchySource.cs`, `Browse/DataTypeMap.cs`, `Browse/SecurityMap.cs`, `Browse/AlarmRefBuilder.cs` |
| `IRediscoverable` | `Browse/DeployWatcher.cs`, `Browse/GatewayGalaxyDeployWatchSource.cs` |
| `IReadable` | `Runtime/GalaxyMxSession.cs`, `Runtime/MxValueDecoder.cs`, `Runtime/StatusCodeMap.cs` |
| `IWritable` | `Runtime/GatewayGalaxyDataWriter.cs` (+ `TracedGalaxyDataWriter.cs`), `Runtime/MxValueEncoder.cs` |
| `ISubscribable` | `Runtime/GatewayGalaxySubscriber.cs` (+ `TracedGalaxySubscriber.cs`), `Runtime/EventPump.cs`, `Runtime/SubscriptionRegistry.cs`, `Runtime/ReconnectSupervisor.cs` |
| `IHostConnectivityProbe` | `Health/HostStatusAggregator.cs`, `Health/HostConnectivityForwarder.cs`, `Health/PerPlatformProbeWatcher.cs` |
### Required Components
History reads + alarm condition tracking now live in the server-layer `IHistoryRouter` and `AlarmConditionService` (PR 7.2). Galaxy no longer carries `IHistoryProvider` or `IAlarmSource` of its own.
- **AVEVA System Platform / ArchestrA Platform** deployed on the same machine as `Galaxy.Host` (installs MXAccess COM objects into the GAC)
- A **deployed Galaxy** with at least one $WinPlatform object hosting $AppEngine(s) hosting AutomationObjects
- **SQL Server** reachable from `Galaxy.Host` with the Galaxy repository database (default `ZB`); Windows Auth by default
- **32-bit .NET Framework 4.8** runtime on the Host machine (MXAccess is 32-bit COM, no 64-bit variant)
- **STA thread + Win32 message pump** inside the Host process for all COM calls and event callbacks (see §13)
- **Wonderware Historian** installed on-box or reachable via aah SDK — *only* if HDA is enabled
- **No external firewall ports** — MXAccess is local-machine COM/IPC; pipe is localhost-only. Galaxy DB port (default SQL 1433) if the ZB database is remote.
### DriverConfig JSON shape
### Connection Settings (per driver instance, from central config DB)
Per `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Config/GalaxyDriverOptions.cs`:
All settings live under a schemaless `DriverConfig` JSON blob on the `DriverInstance` row. Current v1 equivalents (defaults and source file references in parentheses):
```jsonc
{
"Gateway": {
"Endpoint": "http://localhost:5120",
"ApiKeySecretRef": "secret:galaxy-gw-api-key",
"UseTls": true,
"CaCertificatePath": null,
"ConnectTimeoutSeconds": 10,
"DefaultCallTimeoutSeconds": 30,
"StreamTimeoutSeconds": 0
},
"MxAccess": {
"ClientName": "OtOpcUa",
"PublishingIntervalMs": 1000,
"WriteUserId": 0,
"EventPumpChannelCapacity": 50000
},
"Repository": {
"DiscoverPageSize": 5000,
"WatchDeployEvents": true
},
"Reconnect": {
"InitialBackoffMs": 500,
"MaxBackoffMs": 30000,
"ReplayOnSessionLost": true
}
}
```
**MXAccess** (`MxAccessConfiguration.cs`):
`Gateway.ApiKeySecretRef` resolves through the server-side secret store (DPAPI in production, env override in dev) — the API key never appears in cleartext config. `MxAccess.ClientName` MUST be unique per OtOpcUa instance; redundancy pairs enforce uniqueness at install time. `StreamTimeoutSeconds = 0` keeps the `StreamEvents` RPC alive for the lifetime of the driver.
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `ClientName` | string | `"LmxOpcUa"` | Registration name passed to `LMXProxyServer.Register()` |
| `NodeName` | string? | `null` | Optional ArchestrA node override (null = local) |
| `GalaxyName` | string? | `null` | Optional Galaxy name override |
| `ReadTimeoutSeconds` | int | `5` | Per-read timeout |
| `WriteTimeoutSeconds` | int | `5` | Per-write timeout |
| `RequestTimeoutSeconds` | int | `30` | Outer safety timeout around any MXAccess request |
| `MaxConcurrentOperations` | int | `10` | Pool bound on in-flight MXAccess work items |
| `MonitorIntervalSeconds` | int | `5` | Connectivity heartbeat probe interval |
| `AutoReconnect` | bool | `true` | Replay stored subscriptions on COM reconnect |
| `ProbeTag` | string? | `null` | Optional heartbeat tag for health monitoring |
| `ProbeStaleThresholdSeconds` | int | `60` | Mark connection stale if no probe callback within |
| `RuntimeStatusProbesEnabled` | bool | `true` | Auto-subscribe `ScanState` for $WinPlatform / $AppEngine |
| `RuntimeStatusUnknownTimeoutSeconds` | int | `15` | Grace period before an un-probed host is assumed Stopped |
### Performance, tracing, soak
**Galaxy repository** (`GalaxyRepositoryConfiguration.cs`):
See [Galaxy.Performance.md](Galaxy.Performance.md) for the OpenTelemetry trace map, the per-RPC metric set (`galaxy.events.dropped`, channel headroom, reconnect backoff distribution), and the soak-run profile.
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `ConnectionString` | string | `Server=localhost;Database=ZB;Integrated Security=true;` | ZB SQL Server connection |
| `ChangeDetectionIntervalSeconds` | int | `30` | Poll interval for `galaxy.time_of_last_deploy` |
| `CommandTimeoutSeconds` | int | `30` | SQL command timeout |
| `ExtendedAttributes` | bool | `false` | Include extended attribute metadata in discovery |
| `Scope` | enum (`Galaxy` \| `LocalPlatform`) | `Galaxy` | Address-space scope filter (commit bc282b6) |
| `PlatformName` | string? | `Environment.MachineName` | Platform to scope to when `Scope=LocalPlatform` |
### Parity rig + gateway setup
**IPC** (new for v2):
See [Galaxy.ParityRig.md](Galaxy.ParityRig.md) and the `mxaccessgw` repo for the gateway worker layout and the dev-rig recipe.
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `PipeName` | string | `otopcua-galaxy-{InstanceId}` | Named pipe name |
| `HostStartupTimeoutMs` | int | `30000` | Proxy wait for Host `Ready` handshake |
| `IpcCallTimeoutMs` | int | `15000` | Per-call RPC timeout |
### Addressing
Galaxy objects carry two names:
- **`contained_name`** — human-readable, scoped to parent; used for OPC UA browse tree
- **`tag_name`** — globally unique system identifier; used for MXAccess runtime references
| Layer | Example |
|-------|---------|
| OPC UA browse path | `TestMachine_001/DelmiaReceiver/DownloadPath` |
| OPC UA NodeId | `ns=<galaxyNs>;s=<tagName>.<AttributeName>` |
| MXAccess reference | `DelmiaReceiver_001.DownloadPath` (passed to `AddItem()`) |
Tag discovery is **dynamic** — driven by the Galaxy repository DB (`gobject`, `dynamic_attribute`, `primitive_instance`, `template_definition`). Optional `Scope=LocalPlatform` filters the hierarchy via the `hosted_by_gobject_id` chain to the subtree rooted at the local $WinPlatform (on a dev Galaxy: 49→3 objects, 4206→386 attributes).
### Data Type Mapping (`MxDataTypeMapper.cs`, `gr/data_type_mapping.md`)
| mx_data_type | Galaxy Type | OPC UA BuiltInType | CLR Type |
|--------------|-------------|--------------------|----------|
| 1 | Boolean | Boolean (i=1) | `bool` |
| 2 | Integer | Int32 (i=6) | `int` |
| 3 | Float | Float (i=10) | `float` |
| 4 | Double | Double (i=11) | `double` |
| 5 | String | String (i=12) | `string` |
| 6 | Time | DateTime (i=13) | `DateTime` |
| 7 | ElapsedTime | Double (i=11) | `double` (seconds) |
| 8 | Reference | String (i=12) | `string` |
| 13 | Enumeration | Int32 (i=6) | `int` |
| 14 / 16 | Custom | String (i=12) | `string` |
| 15 | InternationalizedString | LocalizedText (i=21) | `string` |
| (default) | Unknown | String (i=12) | `string` |
**Arrays**: `is_array=0` → ValueRank `-1` (Scalar); `is_array=1` → ValueRank `1` (OneDimension), ArrayDimensions = `[array_dimension]`.
### Security Classification Mapping (`SecurityClassificationMapper.cs`)
| security_classification | Galaxy Level | OPC UA Write Permission |
|-------------------------|--------------|-------------------------|
| 0 | FreeAccess | `WriteOperate` |
| 1 | Operate | `WriteOperate` |
| 2 | SecuredWrite | — (read-only in v1) |
| 3 | VerifiedWrite | — (read-only in v1) |
| 4 | Tune | `WriteTune` |
| 5 | Configure | `WriteConfigure` |
| 6 | ViewOnly | — (read-only) |
Maps to the OPC UA roles `ReadOnly` / `WriteOperate` / `WriteTune` / `WriteConfigure` defined in the LDAP role provider (see `docs/security.md`).
### Subscription Model — Native MXAccess Advisories
**Galaxy is one of three drivers with native subscriptions (Galaxy, TwinCAT, OPC UA Client).** No polling.
- Mechanism: `LMXProxyServer.AddItem()``AdviseSupervisory(handle, itemHandle)`; callbacks delivered through the `ILMXProxyServerEvents.OnDataChange` COM event
- Callback signature: `MxDataChangeHandler(itemHandle, MXSTATUS_PROXY, value, quality, timestamp)`
- Dispatch: STA COM event → dispatch-thread queue → OPC UA `ClearChangeMasks` fan-out (decouples COM thread from UA stack lock — commit c76ab8f)
- **Stored subscriptions** replayed on reconnect via `ReplayStoredSubscriptionsAsync()`
- **Probe tag** + runtime-status probes provide connection-health visibility (see §14)
- **Bad-quality fan-out**: when a host ($WinPlatform or $AppEngine) ScanState transitions to Stopped, every attribute under that host is immediately published as `BadOutOfService` (commits 7310925, c76ab8f)
### Alarm Model
In-process alarm-condition tracking (v1 baseline; extended in v2 to match `IAlarmSource`):
- **Auto-subscribed attributes per alarm-eligible object**: `InAlarm`, `Priority`, `Description` (cached for severity and message)
- **Filtering**: `AlarmFilterConfiguration.ObjectFilters[]` — include/exclude by template chain (empty = all eligible)
- **Transitions**: `InAlarm` change → OPC UA A&C `AlarmConditionState` event (Active / Return to Normal)
- **Severity**: Galaxy `Priority` (1 = highest) mapped to OPC UA 11000 severity (higher = more severe)
- **Acknowledgment**: local OPC UA ack forwards to MXAccess write on the `Ack` attribute of the alarm-bearing object
### History Model — Wonderware Historian (optional plugin)
- Loaded **at runtime** from `ZB.MOM.WW.LmxOpcUa.Historian.Aveva.dll` when `Historian.Enabled=true`; compile-time optional
- SDK: `aahClientManaged` / `aahClientCommon`
- Supported OPC UA HDA calls:
- `HistoryReadRawModified` (raw values with bounds)
- `HistoryReadProcessed` (Historian aggregates: AVG, MIN, MAX, TIMEAVG, etc. — mapped to OPC UA aggregates)
- Continuation points for paged reads
- Only attributes flagged `historize=1` in the Galaxy DB expose `AccessLevel.HistoryRead`
### Error Mapping — MXAccess → Quality → OPC UA StatusCode
**Byte quality (OPC DA convention)** — `QualityMapper.cs`:
| OPC DA Quality | Category |
|----------------|----------|
| `>= 192` | Good |
| `64191` | Uncertain |
| `< 64` | Bad |
**MXAccess error codes → Quality** (`MxErrorCodes.cs`):
| Code | Name | Quality |
|------|------|---------|
| 1008 | `MX_E_InvalidReference` | `BadConfigError` |
| 1012 | `MX_E_WrongDataType` | `BadConfigError` |
| 1013 | `MX_E_NotWritable` | `BadOutOfService` |
| 1014 | `MX_E_RequestTimedOut` | `BadCommFailure` |
| 1015 | `MX_E_CommFailure` | `BadCommFailure` |
| 1016 | `MX_E_NotConnected` | `BadNotConnected` |
**Quality → OPC UA StatusCode** (`QualityMapper.cs`):
| Quality | StatusCode |
|---------|-----------|
| Good | `0x00000000` |
| GoodLocalOverride | `0x00D80000` |
| Uncertain | `0x40000000` |
| Bad (generic) | `0x80000000` |
| BadCommFailure | `0x80050000` |
| BadNotConnected | `0x808A0000` |
| BadOutOfService | `0x808D0000` |
### Change Detection
- `ChangeDetectionService` polls `galaxy.time_of_last_deploy` at `ChangeDetectionIntervalSeconds` (default 30s)
- On timestamp change, `OnGalaxyChanged` fires → Host re-queries hierarchy/attributes → emits `TagSetChanged` over IPC → Proxy implements `IRediscoverable` and rebuilds the affected subtree in the address space
- Platform-scope filter (commit bc282b6) applied during hierarchy load when `Scope=LocalPlatform`
### IPC Contract (Proxy ↔ Host) — `Galaxy.Shared`
.NET Standard 2.0 MessagePack contracts. Every request carries a correlation ID; responses carry the same ID plus success/error.
**Lifecycle / handshake**:
| Message | Direction | Payload |
|---------|-----------|---------|
| `ClientHello` | Proxy → Host | InstanceId, expected protocol version |
| `HostReady` | Host → Proxy | Host version, Galaxy name, capabilities |
| `Shutdown` | Proxy → Host | Graceful stop |
**Tag discovery** (`ITagDiscovery`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `DiscoverHierarchyRequest` | Proxy → Host | `Scope`, `PlatformName` |
| `DiscoverHierarchyResponse` | Host → Proxy | `GalaxyObjectInfo[]` (TagName, ContainedName, ParentTagName, TemplateChain, category) |
| `DiscoverAttributesRequest` | Proxy → Host | `TagName[]` |
| `DiscoverAttributesResponse` | Host → Proxy | `GalaxyAttributeInfo[]` (Name, MxDataType, IsArray, ArrayDim, SecurityClass, Historized, WriteableRuntimeChecked) |
| `TagSetChangedNotification` | Host → Proxy | New deploy timestamp; triggers re-discover |
**Read / Write** (`IReadable`, `IWritable`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `ReadRequest` | Proxy → Host | `TagRef[]` (tag_name + attribute) |
| `ReadResponse` | Host → Proxy | `VtqPayload[]` (value, quality, timestamp, statusCode) |
| `WriteRequest` | Proxy → Host | `(TagRef, Value, ExpectedDataType)[]` |
| `WriteResponse` | Host → Proxy | `(TagRef, StatusCode)[]` |
**Subscription** (`ISubscribable`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `SubscribeRequest` | Proxy → Host | `TagRef[]` + Proxy-generated subscription ID |
| `SubscribeResponse` | Host → Proxy | Per-tag subscribe ack + handle |
| `UnsubscribeRequest` | Proxy → Host | handles |
| `DataChangeNotification` | Host → Proxy (push) | handle, VTQ, sequence number |
| `ProbeHealthNotification` | Host → Proxy (push) | probe tag staleness, `ScanState` transitions, overall connected/disconnected |
**Alarms** (`IAlarmSource`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `AlarmEventNotification` | Host → Proxy (push) | source tag, InAlarm, Priority, Description, severity, transition type |
| `AlarmAckRequest` | Proxy → Host | source tag, user, comment |
**History** (`IHistoryProvider`):
| Message | Direction | Payload |
|---------|-----------|---------|
| `HistoryReadRawRequest` | Proxy → Host | TagRef, start, end, numValues, returnBounds, continuationPoint |
| `HistoryReadRawResponse` | Host → Proxy | values + next continuation point |
| `HistoryReadProcessedRequest` | Proxy → Host | TagRef, aggregateId, start, end, resampleInterval |
| `HistoryReadProcessedResponse` | Host → Proxy | aggregated values |
**Framing**: length-prefixed MessagePack frames over a single `NamedPipeServerStream` in `PipeTransmissionMode.Byte`. Separate outgoing pipe for push notifications or multiplex via message type tag.
### Threading / COM Constraints
- **STA thread** (`StaComThread.cs`) hosts MXAccess: `ApartmentState.STA`, raw Win32 `GetMessage` / `DispatchMessage` loop
- Work items marshaled in via `PostThreadMessage(WM_APP=0x8000)`
- **Per-handle serialization**: LMXProxyServer is not thread-safe — all Read/Write/Subscribe calls on one handle run serially via the STA queue
- **Dispatch thread** (separate from STA thread) drains `_pendingDataChanges` to the OPC UA framework; decouples the STA pump from UA stack locks so a slow subscriber can't back up COM event delivery
- **Reentrancy guards** — event unwiring must precede `Marshal.ReleaseComObject()` on disconnect
### Runtime Status (recent commits bc282b6 / 4b209f6 / 7310925 / c76ab8f / 0003984)
- `GalaxyRuntimeProbeManager` auto-subscribes `<ObjectName>.ScanState` for every $WinPlatform (category 1) and $AppEngine (category 3) in scope
- Per-host state machine: `Unknown → Running | Stopped`; transitions fire `_onHostStopped` / `_onHostRunning` callbacks on the dispatch thread
- **Synthetic OPC UA nodes** expose `ScanState` per host as read-only variables so clients see runtime topology without the dashboard
- **HealthCheck Rule 2e** monitors probe subscription health; a failed probe can no longer leave phantom entries that fan out false `BadOutOfService`
- Generalizes to the driver-agnostic `IHostConnectivityProbe` capability interface in v2 (see `plan.md` §5a)
### Implementation Notes
- **First Tier C out-of-process driver** — uses the `Galaxy.Proxy` / `Galaxy.Host` / `Galaxy.Shared` three-project split. The pattern is reusable; FOCAS is the second adopter (see §7), and any future driver with bitness, licensing, or stability-isolation needs reuses the same template. See `driver-stability.md` for the generalized contract
- `Galaxy.Proxy` (in the main server) implements `IDriver`, `ITagDiscovery`, `IRediscoverable`, `IReadable`, `IWritable`, `ISubscribable`, `IAlarmSource`, `IHistoryProvider`, `IHostConnectivityProbe`
- `Galaxy.Host` owns `MxAccessBridge`, `GalaxyRepository`, alarm tracking, `GalaxyRuntimeProbeManager`, and the Historian plugin — no reference to `Core.Abstractions`
- `Galaxy.Shared` is .NET Standard 2.0, referenced by both sides
- Existing v1 code is the implementation — **refactor in place** (extract capability interfaces first, then move behind IPC — see `plan.md` Decision #55)
- **Parity gate**: v2 driver must pass v1 `IntegrationTests` suite + scripted Client.CLI walkthrough before Phase 3 begins
### Operational Stability Notes
Galaxy has a Tier C deep dive in `driver-stability.md` covering the STA pump, COM object lifetime, subscription replay, recycle policy, and post-mortem contents. Driver-instance specifics:
- **Memory baseline scales with Galaxy size**. Watchdog floor of 200 MB above baseline + 1.5 GB hard ceiling — higher than FOCAS because legitimate Galaxy footprints are larger.
- **Slope tolerance is 5 MB/min** (more permissive than FOCAS) because address-space rebuild on redeploy can transiently allocate large amounts.
- **Known regression-prone failure modes** (closed in commits `c76ab8f` and `7310925`, must remain closed): phantom probe subscription flipping Tick() to Stopped; cross-host quality clear wiping sibling state during recovery; sync-over-async on the OPC UA stack thread; fire-and-forget alarm tasks racing shutdown. Each should have a regression test in the v2 parity suite.
- **STA pump health probe** every 10 s (separate from the proxy↔host heartbeat). A wedged pump is the most likely Tier C failure mode for Galaxy.
- **Recycle preserves cached `time_of_last_deploy` watermark** — the common case (crash unrelated to redeploy) skips full DB rediscovery for faster recovery.
### Namespace Assignment
Galaxy is the canonical **SystemPlatform-kind namespace** driver. It exposes Aveva System Platform / Galaxy objects as OPC UA — these are *processed* values with business meaning attached at Layer 3, not raw equipment signals. Per `plan.md` §4:
- The Galaxy driver's `DriverInstance.NamespaceId` must reference a `Namespace` row with `Kind = 'SystemPlatform'`.
- **UNS naming rules do NOT apply** to the Galaxy hierarchy. Tags belong to `DriverInstanceId + FolderPath` (v1 LmxOpcUa pattern preserved); `Tag.EquipmentId` is NULL.
- The Galaxy hierarchy reflects the gobject parent chain as v1 has always done — no migration to UNS path conventions in v2.
- If a future need arises to expose raw Galaxy gobject data alongside processed (e.g. an Aveva-Wonderware Historian raw signal feed), that becomes a *separate* driver instance assigned to an Equipment-kind namespace, with its own per-equipment mapping.
---
+1 -1
View File
@@ -174,7 +174,7 @@ Common contract for the proxy in the main server:
Named pipes default to allowing connections from any local user. Without explicit ACLs, any process on the host machine that knows the pipe name could connect, bypass the OPC UA server's authentication and authorization layers, and issue reads, writes, or alarm acknowledgments directly against the driver host. **This is a real privilege-escalation surface** — a service account with no OPC UA permissions could write field values it should never have access to. Every Tier C driver enforces the following:
1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. `LocalSystem` is explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable. Administrators are **not** added to the deny list — UAC's filtered token carries the Admins group SID as deny-only, so a deny ACE on Administrators would fire even for non-elevated callers whose user account happens to be a member (common on dev boxes). The per-connection SID check in §2 remains the authorization boundary.
1. **Pipe ACL**: the host creates the pipe with a `PipeSecurity` ACL that grants `ReadWrite | Synchronize` only to the OtOpcUa server's service principal SID. All other local users — including LocalSystem and Administrators — are explicitly denied. The ACL is set at pipe-creation time so it's atomic with the pipe being listenable.
2. **Caller identity verification**: on each new pipe connection, the host calls `NamedPipeServerStream.GetImpersonationUserName()` (or impersonates and inspects the token) and verifies the connected client's SID matches the configured server service SID. Mismatches are logged and the connection is dropped before any RPC frame is read.
3. **Per-message authorization context**: every RPC frame includes the operation's authenticated OPC UA principal (forwarded by the Core after it has done its own authn/authz). The host treats this as input only — the driver-level authorization (e.g. "is this principal allowed to write Tune attributes?") is performed by the Core, but the host's own audit log records the principal so post-incident attribution is possible.
4. **No anonymous endpoints**: the heartbeat pipe has the same ACL as the data-plane pipe. There are no "open" pipes a generic client can probe.
+321
View File
@@ -0,0 +1,321 @@
# FOCAS deployment guide
Per-driver runbook for deploying the FANUC FOCAS driver. See
[`docs/drivers/FOCAS.md`](../drivers/FOCAS.md) for the per-feature
reference and [`focas-version-matrix.md`](./focas-version-matrix.md) for
the per-CNC-series capability surface.
## Operator config-knob cheat sheet
| Knob | Where | Default | Notes |
| --- | --- | --- | --- |
| `Devices[].HostAddress` | `FocasDriverOptions.Devices` | — | `focas://{ip}[:{port}]` |
| `Devices[].Series` | `FocasDriverOptions.Devices` | `Unknown` | Drives per-series range validation in `FocasCapabilityMatrix`. |
| `Devices[].OverrideParameters` | `FocasDriverOptions.Devices` | `null` | MTB-specific parameter numbers for Feed/Rapid/Spindle/Jog overrides. `null` suppresses the `Override/` subtree. |
| `Probe.Enabled` | `FocasDriverOptions.Probe` | `true` | Background reachability probe. |
| `Probe.Interval` | `FocasDriverOptions.Probe` | `00:00:05` | Probe cadence. |
| `FixedTree.ApplyFigureScaling` | `FocasDriverOptions.FixedTree` | `true` | Divide position values by 10^decimal-places (issue #262). |
| **`AlarmProjection.Mode`** | **`FocasDriverOptions.AlarmProjection`** | **`ActiveOnly`** | **`ActiveOnly` keeps today's behaviour. `ActivePlusHistory` polls `cnc_rdalmhistry` on connect + on `HistoryPollInterval` ticks (issue #267, plan PR F3-a).** |
| **`AlarmProjection.HistoryPollInterval`** | **`FocasDriverOptions.AlarmProjection`** | **`00:05:00`** | **Cadence of the history poll. Operator dashboards run the default; high-frequency rigs can drop to 30 s.** |
| **`AlarmProjection.HistoryDepth`** | **`FocasDriverOptions.AlarmProjection`** | **`100`** | **Most-recent-N ring-buffer entries pulled per poll. Hard-capped at `250` so misconfigured values can't blast the wire session.** |
## Sample `appsettings.json` snippet for `ActivePlusHistory`
```jsonc
{
"Drivers": {
"FOCAS": {
"Devices": [
{ "HostAddress": "focas://10.0.0.5:8193", "Series": "Series30i" }
],
"AlarmProjection": {
"Mode": "ActivePlusHistory",
"HistoryPollInterval": "00:05:00",
"HistoryDepth": 100
}
}
}
}
```
The history projection emits each unseen entry through
`IAlarmSource.OnAlarmEvent` with `SourceTimestampUtc` set from the CNC's
reported wall-clock — keep CNC clocks on UTC so the dedup key
`(OccurrenceTime, AlarmNumber, AlarmType)` stays stable across DST
transitions.
## Derived telemetry — issue #272 (plan PR F5-a)
The `Production/` subtree gains two **derived** nodes alongside the four
F1-b wire-sourced fields:
- `Production/LastCycleSeconds` (`Float64`)
- `Production/LastCycleStartUtc` (`DateTime` UTC)
**No new wire calls.** Both nodes are computed client-visible from the
same `cnc_rdparam(6711)` + `cnc_rdtimer` poll the F1-b projection
already runs on every probe tick. There is no per-device knob — the
nodes are present for every CNC the driver connects to and surface
`null` until the second observed parts-count increment produces the
first delta.
This means:
- **No additional CNC load.** Probe-tick wire traffic is unchanged.
- **No new opt-in.** The nodes ship enabled by default and are
read-only (`SecurityClassification.ViewOnly`); no LDAP group needs
the new permission.
- **Reconnect re-baselines.** Per the FWLIB session boundary the
derivation state resets on reconnect / reinit, so the first cycle
observed after a reconnect re-establishes the baseline before
publishing the first post-reconnect delta.
See [`docs/drivers/FOCAS.md`](../drivers/FOCAS.md) § "Fixed-tree
`Production/` projection" for the full edge-case behaviour matrix
(parts-count counter reset, cycle-timer rollover, parts-count jumps
&gt; 1).
## Write safety — issue #269 (PARAM/MACRO, F4-b) + issue #270 (PMC, F4-c)
The FOCAS driver supports `cnc_wrparam`, `cnc_wrmacro`, and `pmc_wrpmcrng`
writes behind multiple independent opt-ins. A misdirected parameter write
can put the CNC in a bad state; a misdirected PMC write can move motion or
latch a feedhold. The runbook below MUST be followed before flipping any
of the granular kill switches on.
### Operator pre-checks (every deployment, every change)
1. **CNC must be in MDI mode.** Most parameter writes fail with `EW_PASSWD`
(surfaces as `BadUserAccessDenied`) unless the CNC is in MDI. The
server-side write returns immediately with the access-denied status; no
value reaches the wire.
2. **Parameter-write switch enabled on the CNC pendant.** Even in MDI mode
protected parameters require the operator to physically enable the
parameter-write switch. Without it `cnc_wrparam` returns `EW_PASSWD`.
Plan PR F4-d will land an OPC UA-side unlock workflow; today the only
path is the pendant.
3. **Verify each tag's address against the FANUC manual.** Ranges vary per
CNC series; the
[`focas-version-matrix`](./focas-version-matrix.md) capability matrix
rejects out-of-range numbers at startup, but address-vs-meaning is the
operator's job.
4. **Dry run with `Writable = true` but `Writes.AllowParameter = false`.**
Staged opt-in catches mis-mapped tags: every PARAM write returns
`BadNotWritable` until you flip the granular flag, so you can confirm
the tag list before any wire write fires.
### PMC pre-checks (in addition to the above) — F4-c
PMC writes have a higher blast radius than PARAM/MACRO writes because PMC
is the ladder's working memory — bits in R/G/F/D directly drive servo
enables, feedhold latches, and safety interlocks. Before flipping
`Writes.AllowPmc` on:
1. **E-stop verified live + reachable.** The first PMC write of a session
should be issued with the operator's hand on the e-stop. PMC writes
bypass the ladder's normal MDI-mode protections; a misdirected bit can
move motion the moment it lands on the wire.
2. **Machine in JOG mode (or equivalent low-energy mode).** Auto / MEM
modes interpret PMC state immediately; JOG / MDI surface symptoms
slowly enough that the e-stop is the recovery path. **Never issue the
first PMC write of a deployment in Auto.**
3. **Audit the PMC tag list against the ladder print-out.** `R100.3` on
one machine is "homing complete"; on another it's "feedhold released".
The driver has no way to distinguish — the ladder source is the only
ground truth.
4. **Bit writes are read-modify-write — see
[`docs/drivers/FOCAS.md`](../drivers/FOCAS.md) "PMC bit-write read-modify-write semantics".**
`pmc_wrpmcrng` is byte-addressed; the driver reads the parent byte
first, masks the target bit, and writes the byte back. Concurrent
ladder writes to the same byte create a small race window. Coordinate
through a ladder-side handshake when this matters.
5. **Dry run with `Writable = true` but `Writes.AllowPmc = false`.** Same
staged-opt-in pattern as PARAM/MACRO — confirm tag mapping before any
PMC byte hits the wire.
### LDAP group requirements
Per [`docs/security.md`](../security.md) the server-layer ACL maps
`SecurityClassification` to LDAP groups. Post-F4-b:
| Tag kind | LDAP group required |
| --- | --- |
| `PARAM:N` (writable) | **`WriteConfigure`** — heaviest write tier; matches commissioning roles |
| `MACRO:N` (writable) | `WriteOperate` — standard HMI recipe / setpoint group |
| PMC R/G/F (writable) | `WriteOperate` |
| Read-only | `ReadOnly` |
Per the `feedback_acl_at_server_layer` design note, the FOCAS driver
declares the classification but does NOT enforce it; `DriverNodeManager`
applies the gate before the driver's `WriteAsync` ever runs. A user
without `WriteConfigure` who attempts a `PARAM:` write gets
`BadUserAccessDenied` from the server with no driver-level audit entry —
the OPC UA layer's audit log catches it.
### Audit-log expectations
Every successful write produces:
- An OPC UA AuditWriteEvent (server layer — see
[`docs/security.md`](../security.md) "Audit logging").
- A FOCAS driver-level Serilog entry tagged `Driver=FOCAS DriverInstanceId=...
TagName=... Address=... ResultStatus=...`.
- A `Writes/LastWriteAt` and `Writes/LastWriteStatus` diagnostic counter
refresh on the device's `Diagnostics/` fixed-tree node (planned;
populated as F4-c lands).
Failures to write (`BadUserAccessDenied`, `BadCommunicationError`, etc.)
produce the same audit entries with the failure status code so a
post-incident reviewer sees the same shape regardless of whether the write
succeeded.
**Audit PMC writes specifically.** Because PMC writes have the highest blast
radius of the three write kinds, ops should set up a saved-search /
dashboard query for `Driver=FOCAS` + `Address` matching the PMC letter
prefixes (`R*`, `G*`, `F*`, `D*`, `Y*`, etc.) and review on the same
cadence as ladder change reviews. A spike in PMC write rate or a write
to an address outside the audited tag list is the leading indicator of a
misconfigured client or compromised credential.
### Granular config example
```jsonc
{
"Drivers": {
"FOCAS": {
"Devices": [
{ "HostAddress": "focas://10.0.0.5:8193", "Series": "Series30i" }
],
"Writes": {
"Enabled": true,
"AllowMacro": true, // recipe / setpoint writes — operator role
"AllowParameter": false, // commissioning only — keep locked except during planned work
"AllowPmc": false // PMC writes — keep locked unless the deployment specifically needs them
},
"Tags": [
{ "Name": "Recipe.PartCount", "DeviceHostAddress": "focas://10.0.0.5:8193",
"Address": "MACRO:500", "DataType": "Int32",
"Writable": true, "WriteIdempotent": true },
{ "Name": "MaxFeedrate", "DeviceHostAddress": "focas://10.0.0.5:8193",
"Address": "PARAM:1815", "DataType": "Int32",
"Writable": false /* keep read-only until commissioning window */ },
{ "Name": "OperatorRequest", "DeviceHostAddress": "focas://10.0.0.5:8193",
"Address": "R100.3", "DataType": "Bit",
"Writable": false /* keep PMC read-only until ladder handshake reviewed */ }
]
}
}
}
```
Flipping `AllowParameter` / `AllowPmc` on for the commissioning window
(and back off afterward) is the recommended deployment cadence — the
granular kill switches are lightweight runtime toggles, not config-DB
redeploys. PMC in particular should default OFF in production and only
flip on for windows where the ladder team has signed off on the write
path.
## FOCAS password handling — issue #271 (F4-d)
Some controllers (16i + certain 30i firmwares with parameter-protect on)
gate `cnc_wrparam` and selected reads behind a connection-level password.
The driver supports this via the `Password` field on `FocasDeviceOptions`
which is emitted via `cnc_wrunlockparam` on connect and re-emitted on any
`EW_PASSWD` read/write retry path. See
[`docs/drivers/FOCAS.md`](../drivers/FOCAS.md) § "FOCAS password" for the
driver-side behaviour; this section covers the deployment side.
### Storage in `appsettings.json`
```jsonc
{
"Drivers": {
"Focas01": {
"DriverConfigJson": {
"Backend": "fwlib",
"Series": "Sixteen_i",
"Devices": [
{
"HostAddress": "focas://10.0.0.5:8193",
"Password": "1234"
}
]
}
}
}
}
```
For dev environments, the password is materialised under
`.local/focas-passwords.txt` (or whichever .local subkey the deployment
team prefers); production deployments use the same secrets-store /
KeyVault pattern the LDAP `Authentication.Ldap.Password` field follows.
**The `.local/` directory is .gitignore'd** — this is the same posture
as `.local/galaxy-host-secret.txt` and other dev secrets in this repo.
### No-log invariant
The driver guarantees the password is **never logged**:
1. **`FocasDeviceOptions` ToString redaction.** The record overrides
`PrintMembers` so any Serilog destructure of the device options renders
`Password = ***` when the field is non-null. This catches the most
common leak path — a structured-log statement that included
`{@Device}` for diagnostic context.
2. **No password in exception messages.** `FwlibFocasClient.UnlockAsync`
omits the password from its `InvalidOperationException` text — only
the FWLIB error code (`EW_PASSWD`, `EW_HANDLE`, etc.) makes it through.
3. **Driver log line uses host only.** When unlock succeeds the driver
updates `DriverHealth.StatusText` to `"FOCAS unlock applied for
{host}"` — no password.
4. **CLI flag covered by the same choke point.** The
`Driver.FOCAS.Cli --cnc-password` flag flows through
`FocasDeviceOptions.Password`, so its redaction is identical to the
server's. The PowerShell e2e harness (`scripts/e2e/test-focas.ps1
-CncPassword`) follows the same path.
Any new logging surface that touches `FocasDeviceOptions` MUST continue
to use the record's `ToString` (or otherwise omit `Password`). A code
review checklist item: "no log statement contains `device.Options.Password`
or `device.Password` directly."
### Password-rotation runbook
When the CNC password rotates (operator team flipped a parameter-protect
gate, or your security policy requires periodic rotation):
1. **Update the password on the controller** (CNC pendant or vendor's
admin tool). The exact path varies by series — Fanuc service manual
page reference depends on the MTB.
2. **Update `appsettings.json`** in place with the new value.
- Production: bump the secrets-store entry that backs the
`Devices[*].Password` config-DB column. Same workflow as rotating
the LDAP service-account password.
- Dev: update `.local/focas-passwords.txt` (or wherever the dev
deployment sources the secret).
3. **Restart the OtOpcUa server** (or trigger a config-DB bump that
forces driver reinitialise). The driver picks up the new password
on the next `EnsureConnectedAsync` call. **No need to manually
reconnect each device** — `cnc_wrunlockparam` emits on the next
wire-call boundary.
4. **Verify**. The first wire call after restart logs
`"FOCAS unlock applied for focas://{host}:{port}"` at info. A wrong
password surfaces as `BadUserAccessDenied` on the next gated read or
write.
5. **Audit.** OPC UA wrote-event entries (per
[`audit-log-rules.md`](audit-log-rules.md)) cover the
parameter/macro write paths. Password rotation itself is NOT logged
beyond "unlock applied" — same posture as LDAP service-account
rotation, where the password change is logged out-of-band by the IAM
system.
### Cross-references
- [`docs/Security.md`](../Security.md) — server-wide secrets handling +
the same `.local/` pattern used for LDAP and the Galaxy.Host pipe
secret. The FOCAS password follows the same posture.
- [`docs/drivers/FOCAS.md`](../drivers/FOCAS.md) § "FOCAS password" —
driver-side behaviour, EW_PASSWD retry semantics, status-code
surface.
- [`docs/v2/implementation/focas-wire-protocol.md`](implementation/focas-wire-protocol.md)
§ "cnc_wrunlockparam" — wire-frame layout for the password buffer.
+2 -2
View File
@@ -6,7 +6,7 @@ enforces at driver init time. Every row cites the Fanuc FOCAS Developer
Kit function whose documented input range determines the ceiling.
**Why this exists** — we have no FOCAS hardware on the bench and no
working simulator. FWLIB (Fwlib64, or Fwlib32 on legacy deployments) returns `EW_NUMBER` / `EW_PARAM` when you
working simulator. Fwlib32 returns `EW_NUMBER` / `EW_PARAM` when you
hand it an address outside the controller's supported range; the
driver would map that to a per-read `BadOutOfRange` at steady state.
Catching at `InitializeAsync` with this matrix surfaces operator
@@ -140,6 +140,6 @@ matrix: Macro variable #50000 is outside the documented range
This validation closes the cheap half of the FOCAS hardware-free
stability gap — config errors now fail at load instead of per-read.
The expensive half is Tier-C process isolation so that a crashing
`Fwlib64.dll` doesn't take the main OPC UA server down with it. See
`Fwlib32.dll` doesn't take the main OPC UA server down with it. See
[`docs/v2/implementation/focas-isolation-plan.md`](implementation/focas-isolation-plan.md)
for that plan (task #220).
@@ -1,38 +0,0 @@
# Admin UI Phase 6 status audit (2026-04-24)
Audit pass that closes the Phase 6 Admin-UI tasks that were tracked as still-open (#128#131) but already had their Blazor pages shipped. Every page listed below compiles against the current `OtOpcUaConfigDbContext` schema + the current Admin service surface, has substantive (non-stub) content, and is covered by `ZB.MOM.WW.OtOpcUa.Admin.Tests` (112/112 green).
## Task #128 — /hosts column refresh (Phase 6.1 Stream E.2/E.3)
`Components/Pages/Hosts.razor` — 233 LOC. Route `/hosts`. Ships:
- Per-driver circuit-breaker columns (`ConsecutiveFailures`, `LastCircuitBreakerOpenUtc`).
- Stale-row detection via `HostStatusService.IsStale` (publisher heartbeat ≥ 30 s stale threshold).
- Summary cards: Running / Stale / Faulted / total.
- Auto-refresh every `RefreshIntervalSeconds` driven by the `FleetStatusHub` SignalR feed.
- Health band via `DriverHostState` enum colour coding.
## Task #129 — RoleGrantsTab + AclsTab + Probe (Phase 6.2 Stream D)
- `Components/Pages/RoleGrants.razor` — 192 LOC. Route `/role-grants`. Edits LDAP-group → OPC-UA-role mappings with live reload over `AclChangeNotifier` SignalR.
- `Components/Pages/Clusters/AclsTab.razor` — 279 LOC. NodeAcl CRUD + the **"Probe this permission"** form (task #196 slice 1, embedded at line 38 onward). Binds `_probeGroup` / `_probeNamespaceId` / `_probeUnsAreaId` / `_probeUnsLineId` / `_probeEquipmentId` / `_probeTagId` / `_probePermission` through `PermissionProbeService`.
## Task #130 — RedundancyTab (Phase 6.3 Stream E)
`Components/Pages/Clusters/RedundancyTab.razor` — 175 LOC. Topology table, per-peer reachability (via `FleetStatusHub`), ServiceLevel band + `ApplyLeaseRegistry` / `RecoveryStateManager` state surfaces, failover action button. Live updates over the same SignalR hub `RedundancyPublisherHostedService` ticks.
## Task #131 — Draft / publish / diff / identification (Phase 6.4 Streams AD)
- `Components/Pages/Clusters/DraftEditor.razor` — 105 LOC. Route `/clusters/{ClusterId}/draft/{GenerationId:long}`. Calls `DraftValidationService` + `GenerationService`.
- `Components/Pages/Clusters/Generations.razor` — 73 LOC. Publish flow (generation state transitions through `sp_PublishGeneration`).
- `Components/Pages/Clusters/DiffViewer.razor` — 87 LOC. Route `/clusters/{ClusterId}/draft/{GenerationId:long}/diff`. Renders `sp_ComputeGenerationDiff` output.
- `Components/Pages/Clusters/IdentificationFields.razor` — 49 LOC. OPC 40010 Identification folder editor bound to the `Equipment` entity.
## What's NOT in this audit
- `#124` — Phase 6.2 3-user interop matrix. Authz layer is now covered by `ThreeUserInteropMatrixTests` in `ZB.MOM.WW.OtOpcUa.Server.Tests` (drives the 5 GLAuth users + admin through `LdapUserAuthenticator``AuthorizationGate.IsAllowed` for the role × operation matrix). The wire-level OPC UA-client cross-vendor leg still needs a UserName-token endpoint policy + manual client drill — that part stays a manual deliverable.
- `#119` — Phase 6.3 client interop matrix. Manual Ignition/Kepware/Aveva drills.
- `#113` — OPC UA CTT conformance pass. Manual CTT run.
- `#114` / `#115` — Redundancy cutover + deployment checklist. Manual.
Those remain GA-gating but require a human at a console, not a code change.
-129
View File
@@ -1,129 +0,0 @@
# Phase 3 Exit Gate — Driver Fleet (reconstructed retroactively)
> **Status**: **CLOSED (reconstructed 2026-04-23)**. The original plan split the
> driver work across Phases 3 / 4 / 5 (Modbus alone → four PLC drivers → two
> specialty drivers). In execution, all seven non-Galaxy drivers shipped under
> one umbrella against `Core.Abstractions` + `Core`'s generic driver-hosting
> machinery. This doc captures the closure retroactively; no forward work
> remains under these three original phase numbers.
>
> **Plan doc**: none — phases 3/4/5 were intentionally not split out into
> separate plan docs once it was clear the capability-interface contract
> introduced in Phase 1 (`Core.Abstractions` — plan decision #4) was stable
> enough that each driver could land as its own stream rather than as a
> gated mini-phase. See `docs/v2/plan.md` §6 for the now-consolidated
> migration strategy.
## Scope
All seven drivers in the v2 target list (Decision #5) minus Galaxy (closed
separately under Phase 2). The Galaxy Proxy+Host+Shared split exited under
`exit-gate-phase-2-final.md`; this gate does not re-cover it.
## What shipped
### Drivers
| Driver | Project | Capability surface | Test projects |
|---|---|---|---|
| Modbus TCP | `Driver.Modbus` + `Driver.Modbus.Cli` | `IDriver` + `ITagDiscovery` + `IReadable` + `IWritable` + `ISubscribable` + `IHostConnectivityProbe` | `Tests`, `IntegrationTests`, `Cli.Tests` |
| AB CIP | `Driver.AbCip` + `Driver.AbCip.Cli` | all of the above + `IPerCallHostResolver` + `IAlarmSource` | `Tests`, `IntegrationTests`, `Cli.Tests` |
| AB Legacy (PCCC / DF1) | `Driver.AbLegacy` + `Driver.AbLegacy.Cli` | `IDriver` + `IReadable` + `IWritable` + `ITagDiscovery` + `ISubscribable` + `IHostConnectivityProbe` + `IPerCallHostResolver` | `Tests`, `IntegrationTests`, `Cli.Tests` |
| Siemens S7 | `Driver.S7` + `Driver.S7.Cli` | `IDriver` + `ITagDiscovery` + `IReadable` + `IWritable` + `ISubscribable` + `IHostConnectivityProbe` | `Tests`, `IntegrationTests`, `Cli.Tests` |
| Beckhoff TwinCAT (ADS) | `Driver.TwinCAT` + `Driver.TwinCAT.Cli` | `IDriver` + `IReadable` + `IWritable` + `ITagDiscovery` + `ISubscribable` + `IHostConnectivityProbe` + `IPerCallHostResolver` | `Tests`, `IntegrationTests`, `Cli.Tests` |
| FANUC FOCAS | `Driver.FOCAS` + `Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `Driver.FOCAS.Cli` | `IDriver` + `IReadable` + `IWritable` + `ITagDiscovery` + `ISubscribable` + `IHostConnectivityProbe` + `IPerCallHostResolver`; Tier-C out-of-process backend mirrors the Galaxy Proxy/Host split. `Fwlib64FocasBackend` shipped 2026-04-23 as the production backend (P/Invoke against `Fwlib64.dll`); Host retargeted from net48 x86 to net10.0-windows x64 at the same time. | `Tests`, `Host.Tests`, `Shared.Tests`, `Cli.Tests` |
| OPC UA Client (gateway) | `Driver.OpcUaClient` | `IDriver` + `ITagDiscovery` + `IReadable` + `IWritable` + `ISubscribable` + `IHostConnectivityProbe` + `IAlarmSource` + `IHistoryProvider` (richest surface in the fleet — it's bridging another UA server) | `Tests`, `IntegrationTests` |
### Supporting infrastructure
| PR / Task | Summary |
|---|---|
| #248 | `DriverFactoryRegistry` + `DriverInstanceBootstrapper` — central DB `DriverInstance` rows materialise into live `IDriver` instances at server startup. |
| #210 | Modbus server-side factory + seed SQL (closed first child of umbrella #209). |
| #211 #212 #213 | AB CIP / S7 / AB Legacy server-side factories + seed SQL. |
| #220 (FOCAS) | FOCAS factory wired into the bootstrap pipeline; Tier-C split (`Driver.FOCAS.Host` process launcher, named-pipe IPC, NSSM install scripts, post-mortem MMF) shipped across the five-PR series. |
| (this session) | TwinCAT factory wired in + Server project reference added; all seven driver factories now register uniformly in `Server/Program.cs`. |
| #249 #250 #251 | Per-driver test-client CLI suite (`otopcua-<driver>-cli`) — shared lib + one CLI per driver for direct-to-PLC smoke testing independent of the server. |
| #253 + follow-ups | E2E CLI test scripts (`scripts/e2e/test-<driver>.ps1`) — five-stage bidirectional bridge + subscribe-sees-change assertions per driver, plus `test-all.ps1` matrix runner. |
| (this session) | OPC UA Client e2e script shipped (`test-opcuaclient.ps1`, 8 stages) — the only driver that was missing an e2e script. |
### Docs
Per-driver test-fixture documentation:
- `docs/drivers/Modbus-Test-Fixture.md`
- `docs/drivers/AbServer-Test-Fixture.md` (covers AB CIP fixture)
- `docs/drivers/AbLegacy-Test-Fixture.md`
- `docs/drivers/S7-Test-Fixture.md`
- `docs/drivers/TwinCAT-Test-Fixture.md`
- `docs/drivers/FOCAS-Test-Fixture.md`
- `docs/drivers/OpcUaClient-Test-Fixture.md`
Driver-level ops docs:
- `docs/Driver.Modbus.Cli.md`, `docs/Driver.AbCip.Cli.md`, `docs/Driver.AbLegacy.Cli.md`, `docs/Driver.S7.Cli.md`, `docs/Driver.TwinCAT.Cli.md`, `docs/Driver.FOCAS.Cli.md`
- `docs/v2/driver-specs.md` — unified capability-matrix spec for all eight drivers (Galaxy + seven).
## Compliance evidence
No dedicated `phase-3-compliance.ps1` exists — scope was too broad to fit the
single-script pattern that worked for Phases 6.x and 7. Verification instead
takes the form of the per-driver test suites + e2e scripts:
- [x] **Unit tests** — every driver has a `Tests` project with capability-interface contract tests; `dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.*.Tests` is green.
- [x] **Integration tests**`Driver.*.IntegrationTests` stands up Docker-hosted simulators (pymodbus, ab_server, python-snap7, opc-plc) at collection init and exercises real wire-level read/write/subscribe/probe per driver.
- [x] **CLI tests**`Driver.*.Cli.Tests` covers the per-driver test-client CLIs (#249#251).
- [x] **E2E scripts**`scripts/e2e/test-<driver>.ps1` covers the driver-CLI → PLC → OtOpcUa server → OPC UA client round-trip for all seven drivers + Galaxy; `test-all.ps1` aggregates; README status section (rewritten this session) summarises live-boot evidence.
- [x] **Factory registration** — all seven factories plus Galaxy register in `src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` inside the `DriverFactoryRegistry` composition; the `DriverInstanceBootstrapper` can materialise any configured row.
- [x] **Seed SQL**#210#213 provide per-driver Config DB seed scripts so a fresh Config DB is populatable without Admin UI interaction.
### Live-boot verification
Recorded across the session-level tracking tasks:
| Driver | Fixture | Stages | Tracking |
|---|---|---|---|
| Modbus | pymodbus (dl205 profile) | 5/5 | #209 exit gate; bidirectional + subscribe-sees-change added in #253 follow-ups |
| AB CIP | `ab_server` ControlLogix | 5/5 | #220 |
| S7 | python-snap7 | 5/5 | #220 |
| AB Legacy | `ab_server` SLC500 / MicroLogix / PLC-5 (requires `/1,0` cip-path for Docker fixture) | 5/5 | #222 partial |
| OPC UA Client | opc-plc Docker fixture | 5/8 (probe, remote read, forward bridge, subscribe, browse) | (this session) |
| TwinCAT | TCBSD VM @ 10.100.0.128 (AmsNetId `41.169.163.43.1.1`) — real TwinCAT runtime under FreeBSD on ESXi; bypasses the Hyper-V/RTIME conflict that blocks XAR on this dev box | features validated | fixture is the TCBSD VM; `TWINCAT_TRUST_WIRE=1` still gates the e2e script by default so unintentional runs against cold fixtures don't false-pass |
| FOCAS | Lab-rig CNC + `Fwlib64.dll` | — | **deferred**`Fwlib64FocasBackend` shipped 2026-04-23; wire-level live-boot gated `FOCAS_TRUST_WIRE=1`, lab rig tracked under #222 follow-up |
| Galaxy | Live Galaxy + `OtOpcUaGalaxyHost` (this dev box) | 7/7 (read / write / subscribe / alarms / history) | closed under Phase 2 |
## Deferred to post-gate follow-ups
Items intentionally not blocking closure of this umbrella — each is hardware-
dependent and tracked separately:
- [ ] **FOCAS wire-level live-boot**`test-focas.ps1` against a real CNC once `Fwlib64.dll` is on PATH and `FOCAS_TRUST_WIRE=1` (#222 follow-up). The `Fwlib64FocasBackend` shipped 2026-04-23 — code exists, unit-tests green; only the live-CNC smoke test remains.
- [x] **FOCAS `Fwlib64FocasBackend`****CLOSED 2026-04-23**. The production backend in `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/Backend/Fwlib64FocasBackend.cs` wraps `FwlibFocasClient` to fulfil `IFocasBackend` against the licensed `Fwlib64.dll`. Host project retargeted to `net10.0-windows` x64. Default when `OTOPCUA_FOCAS_BACKEND` is unset. 6 new backend tests green. Only wire-level live-boot against real hardware remains — see item above.
- [ ] **OPC UA Client stages 5/7/8** — reverse-bridge, alarm, history stages are opt-in via sidecar NodeId params because opc-plc's default image has no writable nodes and doesn't historize. Against a richer upstream (Prosys, UA Expert sample server) all eight stages can run.
## Completion checklist
- [x] Modbus driver shipped + unit + integration + CLI tests green
- [x] AB CIP driver shipped + tests green + live-boot 5/5
- [x] AB Legacy driver shipped + tests green + live-boot 5/5
- [x] S7 driver shipped + tests green + live-boot 5/5
- [x] TwinCAT driver shipped + tests green + features validated against the TCBSD VM virtual-PLC fixture
- [x] FOCAS driver shipped (Tier-C split) + tests green (wire-live deferred)
- [x] OPC UA Client driver shipped + tests green + live-boot 5/8
- [x] `DriverFactoryRegistry` + `DriverInstanceBootstrapper` shipped
- [x] All seven factories registered in `Server/Program.cs`
- [x] Per-driver test-client CLI suite shipped
- [x] E2E test scripts shipped + `test-all.ps1` aggregator green
- [x] Per-driver test-fixture docs present
- [x] `docs/v2/driver-specs.md` unified capability spec present
- [x] `scripts/e2e/README.md` status section reflects current live-boot matrix
- [x] Exit gate doc checked in (this file)
- [x] TwinCAT validated against the TCBSD VM virtual-PLC fixture — `TWINCAT_TRUST_WIRE=1` + e2e script still gated by default to prevent false-pass against cold fixtures
- [ ] FOCAS lab-rig follow-up filed + tracked (#222)
## Why no compliance script
The Phases 6.1/6.2/6.3/6.4/7 pattern of a single `phase-N-compliance.ps1`
worked because each of those phases touched a narrow slice of server-side
runtime. A "phase-3-compliance.ps1" would have had to boot seven simulators,
configure seven DriverInstance rows, and run seven e2e scripts — which is
exactly what `scripts/e2e/test-all.ps1` already does. The aggregate runner
+ its README is the compliance artefact for this umbrella.
+9 -9
View File
@@ -1,6 +1,6 @@
# Phase 7 Exit Gate — Scripting, Virtual Tags, Scripted Alarms, Historian Sink
> **Status**: **FULLY CLOSED** 2026-04-23 audit — the three original follow-ups (#239 / #240 / #241) were all shipped under later branches but this exit-gate doc wasn't updated at the time. All three verified against the repo + tests green.
> **Status**: Open. Closed when every compliance check passes + every deferred item either ships or is filed as a post-v2-release follow-up.
>
> **Compliance script**: `scripts/compliance/phase-7-compliance.ps1`
> **Plan doc**: `docs/v2/implementation/phase-7-scripting-and-alarming.md`
@@ -45,13 +45,13 @@ Covered by `scripts/compliance/phase-7-compliance.ps1`:
- [x] Walker emits `NodeSourceKind.Virtual` + `NodeSourceKind.ScriptedAlarm` variables
- [x] `DriverNodeManager` dispatch routes Reads by source; Writes to non-Driver rejected with `BadUserAccessDenied` (plan #6)
## Deferred to Post-Gate Follow-ups (all closed as of 2026-04-23 audit)
## Deferred to Post-Gate Follow-ups
Originally kept out of the capstone so the gate could close cleanly. Each landed as a targeted follow-up PR; audit this session verified them against the repo:
Kept out of the capstone so the gate can close cleanly while the less-critical wiring lands in targeted PRs:
- [x] **SealedBootstrap composition root** (task #239) — **CLOSED**. `src/ZB.MOM.WW.OtOpcUa.Server/Phase7/Phase7Composer.cs` instantiates `VirtualTagEngine` + `ScriptedAlarmEngine` via `Phase7EngineComposer.Compose`, and `SqliteStoreAndForwardSink` in `ResolveHistorianSink` when a registered driver provides `IAlarmHistorianWriter` (today: `GalaxyProxyDriver`). `OpcUaServerService.ExecuteAsync` calls `Phase7Composer.PrepareAsync` then `OpcUaApplicationHost.SetPhase7Sources` **before** `applicationHost.StartAsync` so `OtOpcUaServer` + `DriverNodeManager` capture the `VirtualReadable` / `ScriptedAlarmReadable` at construction. 38 tests green under `tests/ZB.MOM.WW.OtOpcUa.Server.Tests/Phase7/` + `SealedBootstrapIntegrationTests`. The work landed under the label "Phase 7 follow-up #246" and was never re-labelled against #239.
- [x] **Live OPC UA end-to-end smoke** (task #240) — **CLOSED**. `scripts/e2e/test-phase7-virtualtags.ps1` drives a full Client.CLI read of a driver-sourced input, reads the VirtualTag computed off it, triggers a scripted alarm by writing the trigger value, and subscribes to the alarm condition — all through a running OtOpcUa server. Covered in `scripts/e2e/test-all.ps1` + `scripts/e2e/README.md` matrix.
- [x] **sp_ComputeGenerationDiff extension** (task #241) — **CLOSED**. Migration `20260420232000_ExtendComputeGenerationDiffWithPhase7.cs` extends the stored proc to emit Script / VirtualTag / ScriptedAlarm sections alongside the existing NodeAcl / Tag / Equipment / DriverInstance / Namespace output. Admin DiffViewer picks them up through its existing section-plugin architecture (Phase 6.4 Stream C).
- [ ] **SealedBootstrap composition root** (task #239) — instantiate `VirtualTagEngine` + `ScriptedAlarmEngine` + `SqliteStoreAndForwardSink` in `Program.cs`; pass `VirtualTagSource` + `ScriptedAlarmSource` as the new `IReadable` parameters on `DriverNodeManager`. Without this, the engines are dormant in production even though every piece is tested.
- [ ] **Live OPC UA end-to-end smoke** (task #240) — Client.CLI browse + read a virtual tag computed by Roslyn; Client.CLI acknowledge a scripted alarm via the Part 9 method node; historian-disabled deployment returns `BadNotFound` for virtual nodes rather than silent failure.
- [ ] **sp_ComputeGenerationDiff extension** (task #241) — emit Script / VirtualTag / ScriptedAlarm sections alongside the existing Namespace/DriverInstance/Equipment/Tag/NodeAcl rows so the Admin DiffViewer shows Phase 7 changes between generations.
## Completion Checklist
@@ -66,9 +66,9 @@ Originally kept out of the capstone so the gate could close cleanly. Each landed
- [x] `phase-7-compliance.ps1` present and passes
- [x] Full solution `dotnet test` passes (no new failures beyond pre-existing tolerated CLI flake)
- [x] Exit-gate doc checked in
- [x] `SealedBootstrap` composition follow-up shipped (#239 / Phase 7 follow-up #246)
- [x] Live end-to-end smoke follow-up shipped (#240`scripts/e2e/test-phase7-virtualtags.ps1`)
- [x] `sp_ComputeGenerationDiff` extension follow-up shipped (#241 — migration `ExtendComputeGenerationDiffWithPhase7`)
- [ ] `SealedBootstrap` composition follow-up filed + tracked
- [ ] Live end-to-end smoke follow-up filed + tracked
- [ ] `sp_ComputeGenerationDiff` extension follow-up filed + tracked
## How to run
+5 -16
View File
@@ -1,21 +1,10 @@
# FOCAS Tier-C isolation — plan for task #220
> **Status**: **FULLY SHIPPED** (code). PRs AE shipped the architecture; the
> 2026-04-23 follow-up shipped the production `Fwlib64FocasBackend` wrapping
> the licensed `Fwlib64.dll`. Only the wire-level live-boot against real
> hardware remains (task #222 / requires a bench CNC).
>
> **Major update 2026-04-23 — Host retargeted to .NET 10 x64 + Fwlib64**:
> Both `Fwlib32.dll` and `Fwlib64.dll` are licensed for this project. The
> original plan put the Host on .NET 4.8 x86 because Fwlib32 was assumed.
> With Fwlib64 available, the Host moves to `net10.0-windows` x64 — same
> runtime as the rest of the fleet. **Tier-C isolation stays anyway** — the
> blast-radius argument against a closed-source vendor P/Invoke is independent
> of bitness. Galaxy (forced x86 by MXAccess COM) is a pure bitness forcing;
> FOCAS is a pure blast-radius choice. Body of this document still reflects
> the original x86 assumptions in a few places — read them as historical
> design context; the current shape is in `docs/drivers/FOCAS-Test-Fixture.md`
> and `exit-gate-phase-3.md`.
> **Status**: PRs AE shipped. Architecture is in place; the only
> remaining FOCAS work is the hardware-dependent production
> integration of `Fwlib32.dll` into a real `IFocasBackend`
> (`FwlibHostedBackend`), which needs an actual CNC on the bench
> and is tracked as a follow-up on #220.
>
> **Pre-reqs shipped**: version matrix + pre-flight validation
> (PR #168 — the cheap half of the hardware-free stability gap).
@@ -0,0 +1,394 @@
# FOCAS simulator (focas-mock) plan
Notes on the focas-mock simulator that the FOCAS driver's integration
tests will eventually talk to. Today there is no FOCAS integration-test
project; this doc is the contract the future fixture will be built
against. Keeping the contract tracked in repo means the wire-protocol
command ids (and their request/response payloads) don't drift between the
.NET wire client and a future Python implementation.
## Ground rules
- Append-only command ids. Mirror
[`focas-wire-protocol.md`](./focas-wire-protocol.md) verbatim.
- Per-profile state. The simulator hosts N CNC profiles concurrently
(`Series0i`, `Series30i`, `PowerMotion`, ...). Each profile has its own
alarm-history ring buffer + its own override map.
- Admin endpoints under `POST /admin/...` mutate state without going
through the wire protocol; integration tests use these to seed canned
inputs.
## Protocol surface (current scope)
| Cmd | API | State impact |
| --- | --- | --- |
| `0x0001` | `cnc_rdcncstat` | reads cached ODBST per profile |
| `0x0002` | `cnc_rdparam` | reads parameter map per profile |
| `0x0003` | `cnc_rdmacro` | reads macro variables per profile |
| `0x0004` | `cnc_rddiag` | reads diagnostic map per profile |
| `0x0010` | `pmc_rdpmcrng` | reads PMC byte ranges |
| `0x0020` | `cnc_modal` | reads cached modal MSTB per profile |
| ... | ... | ... |
| **`0x0102`** | **`cnc_wrparam`** | **mutates per-profile parameter map; returns `EW_PASSWD` (`11`) when the profile's `unlock_state` is off (sets up F4-d's unlock workflow) — issue #269, plan PR F4-b** |
| **`0x0103`** | **`cnc_wrmacro`** | **mutates per-profile macro map; integer-only writes for now (decimalPointCount=0) — issue #269, plan PR F4-b** |
| **`0x0104`** | **`pmc_wrpmcrng`** | **mutates per-profile PMC byte tables; byte-aligned writes preserve untouched bytes; bit-level writes never reach the simulator (driver wraps with RMW) — issue #270, plan PR F4-c** |
| **`0x0105`** | **`cnc_wrunlockparam`** | **flips the per-profile `unlock_state` to true when the supplied 4-byte password buffer matches the profile's `unlock_password`; otherwise returns `EW_PASSWD`. State persists for the connection lifetime (per-session). — issue #271, plan PR F4-d** |
| **`0x0F1A`** | **`cnc_rdalmhistry`** | **dumps the per-profile alarm-history ring buffer (issue #267, plan PR F3-a)** |
## `cnc_rdalmhistry` mock behaviour
The simulator keeps a per-profile ring buffer of alarm-history entries.
Default fixture seeds 5 profiles with 10 canned entries each (per the F3-a
plan).
### Request decode
```
[int16 LE depth]
```
### Response encode
Use `FocasAlarmHistoryDecoder.Encode` semantics in reverse: emit the
count followed by `ALMHIS_data` blocks padded to 4-byte boundaries. The
.NET-side decoder consumes the same format verbatim, so a Python encoder
written against the table in
[`focas-wire-protocol.md`](./focas-wire-protocol.md) interoperates without
extra glue.
### Admin endpoint — `POST /admin/mock_patch_alarmhistory`
Replaces the alarm-history ring buffer for a profile.
```
POST /admin/mock_patch_alarmhistory
{
"profile": "Series30i",
"entries": [
{
"occurrenceTime": "2025-04-01T09:30:00Z",
"axisNo": 1,
"alarmType": 2,
"alarmNumber": 100,
"message": "Spindle overload"
},
...
]
}
```
`entries` order is interpreted as ring-buffer order (most-recent first to
match FANUC's natural surface).
### `FocasSimFixture.SeedAlarmHistoryAsync`
The future test-support helper wraps the admin endpoint:
```csharp
await fixture.SeedAlarmHistoryAsync(
profile: "Series30i",
entries: new []
{
new FocasAlarmHistoryEntry(
new DateTimeOffset(2025, 4, 1, 9, 30, 0, TimeSpan.Zero),
AxisNo: 1, AlarmType: 2, AlarmNumber: 100, Message: "Spindle overload"),
});
```
Integration test `Series/AlarmHistoryProjectionTests.cs` will assert:
- historic events fire once with the seeded timestamps
- second poll yields zero new events (dedup honoured end-to-end)
- active-alarm raise/clear still works alongside the history poll
These tests are blocked on the focas-mock + integration-test project
landing; the unit-test coverage in `FocasAlarmProjectionTests` already
exercises every same-process invariant.
## `cnc_wrparam` / `cnc_wrmacro` mock behaviour — issue #269, plan PR F4-b
When the focas-mock fixture lands, it MUST implement the contract below.
The .NET side already ships against this contract (`FwlibFocasClient.cs`
write helpers, `FakeFocasClient` round-trip support); writing the simulator
to the same shape lets the existing integration-test scaffolds at
`tests/.../IntegrationTests/Series/ParameterWriteTests.cs` and
`MacroWriteTests.cs` (when they materialise) light up without driver
changes.
### Per-profile state
Each profile owns:
- `parameters: Dict[int, int]` — map from parameter number to current value.
- `macros: Dict[int, int]` — map from macro number to current scaled-int
value (decimal-point count fixed at 0 for F4-b).
- `unlock_state: bool` — defaults `False`. When `False`, every
`cnc_wrparam` returns `EW_PASSWD` (numeric `11`) regardless of
parameter. Macro writes are NOT gated by `unlock_state`.
- `unlock_password: bytes` (4-byte buffer) — defaults to the profile's
fixture default (e.g. `b"1234"` for Series30i). Compared byte-for-byte
by the `cnc_wrunlockparam` handler; flips `unlock_state = True` on
match, leaves it untouched on mismatch (and returns `EW_PASSWD`).
Mutable via `POST /admin/mock_set_password` for tests that exercise
rotation. Issue #271, plan PR F4-d.
- `last_write: Optional[LastWrite]` — most-recent successful
`(kind, number, value, ts)` tuple, surfaced via the admin endpoint
below for audit-log assertions.
### `cnc_wrparam` request decode
```
[int16 LE datano][int16 LE axis][int8|int16|int32 LE value]
```
Width of the value field is determined by the request frame trailer
length per the table in
[`focas-wire-protocol.md`](./focas-wire-protocol.md). On
`unlock_state == False` short-circuit to `[int16 LE 11]` (`EW_PASSWD`).
Otherwise mutate `parameters[datano] = value`, set `last_write`, return
`[int16 LE 0]`.
### `cnc_wrmacro` request decode
```
[int16 LE number][int16 LE length=8][int32 LE mcr_val][int16 LE dec_val]
```
Always accept (no `unlock_state` gate). Mutate
`macros[number] = mcr_val` (we ignore `dec_val` for F4-b — integer-only).
Return `[int16 LE 0]`. Round-trip: a subsequent `cnc_rdmacro(number)`
returns `(mcr_val, 0)`.
### Admin endpoint — `POST /admin/mock_set_unlock_state`
Toggles `unlock_state` for the F4-d unlock workflow tests. Without this,
F4-b parameter-write integration tests can't reproduce the
`EW_PASSWD``BadUserAccessDenied` mapping.
```
POST /admin/mock_set_unlock_state
{ "profile": "Series30i", "unlocked": true }
```
### `cnc_wrunlockparam` request decode — issue #271, plan PR F4-d
```
[byte[4] password]
```
Match `password == profile.unlock_password` byte-for-byte. On match:
flip `unlock_state = True`, return `[int16 LE 0]`. On mismatch: leave
`unlock_state` untouched, return `[int16 LE 11]` (`EW_PASSWD`).
The simulator deliberately keeps unlock state per-session (per OpenSession
handle) so a reconnect drops back to `unlock_state = False` — matching the
FWLIB lifetime semantics described in
[`focas-wire-protocol.md`](./focas-wire-protocol.md) § "cnc_wrunlockparam".
### Admin endpoint — `POST /admin/mock_set_password`
Rotates the per-profile `unlock_password` for tests that exercise the
F4-d password-rotation runbook (`docs/v2/focas-deployment.md`
§ "FOCAS password handling"). Idempotent — call again to revert.
```
POST /admin/mock_set_password
{ "profile": "Series30i", "password": "5678" }
```
The endpoint accepts the password as a UTF-8/ASCII string and applies
the same right-pad-to-4-bytes / truncate-to-4-bytes normalisation the
driver does, so simulator-side matching is byte-symmetric with the
production wire encoder.
### Admin endpoint — `GET /admin/mock_get_last_write`
Returns the simulator's view of the most-recent successful write, used by
F4-b audit-log integration assertions ("did the write actually reach the
fixture, and is the audit log capturing the right kind/number/value?").
```
GET /admin/mock_get_last_write?profile=Series30i
->
{
"kind": "param", // "param" | "macro"
"number": 1815,
"value": 100,
"writtenAt": "2026-04-25T13:30:00Z"
}
```
When no write has happened the endpoint returns `null` rather than 404 so
the test helper can assert "no writes since fixture reset" without
exception handling.
## `pmc_wrpmcrng` mock behaviour — issue #270, plan PR F4-c
The simulator keeps a per-profile PMC byte table keyed by `(addr_type,
byte_address)` — the same map the existing `pmc_rdpmcrng` handler reads
from. The write handler mutates the same map so a subsequent read sees
the written bytes.
### Per-profile state
Each profile carries:
```python
pmc: Dict[int, bytearray] # addr_type -> bytearray (one per PMC letter, default 256 bytes each)
```
`addr_type` is the PMC area code (R=5, G=4, F=3, D=8, X=1, Y=2, K=10,
A=11, E=12, T=6, C=7); the existing `pmc_rdpmcrng` fixture seeds the
defaults (zeros + a few canned bits per the dl205-style profile fixtures).
### `pmc_wrpmcrng` request decode
| Offset | Width | Field |
| --- | --- | --- |
| 0 | int16 LE | `addr_type` |
| 2 | int16 LE | `data_type` (must be `0` = byte; the driver only emits byte writes) |
| 4 | uint16 LE | `datano_s` |
| 6 | uint16 LE | `datano_e` |
| 8 | bytes | `data[]``(datano_e - datano_s + 1)` bytes |
Handler steps:
1. Look up the per-profile bytearray for `addr_type` (allocate on first
write, default 256 zeros).
2. **Validate** `0 <= datano_s <= datano_e < len(bytearray)` — otherwise
return `EW_NUMBER` (`4`).
3. **Validate** `len(data) == datano_e - datano_s + 1` — otherwise
return `EW_LENGTH` (`14`).
4. **Validate** `data_type == 0` — otherwise return `EW_DATA` (`9`)
because the driver only ever emits byte writes (bit writes wrap with
driver-side RMW so they reach the simulator as 1-byte writes).
5. Copy `data[]` into `bytearray[datano_s:datano_e+1]`. Other bytes
in the array are untouched.
6. Update `last_write` admin-endpoint state (kind=`pmc`, address-type,
start byte, length, bytes).
7. Return `ew_status = 0`.
### Round-trip invariant
The simulator MUST satisfy:
```
write(R, [10..12], [0xAA, 0xBB, 0xCC]); read(R, [10..12]) == [0xAA, 0xBB, 0xCC]
```
and the **byte-isolation invariant**:
```
write(R, [11], [0xFF]); bytes[10] == prior bytes[10] && bytes[12] == prior bytes[12]
```
The integration tests `Series/PmcRangeWriteTests.cs` and
`Series/PmcBitRmwIntegrationTests.cs` assert both shapes.
### Admin endpoint — `GET /admin/mock_get_last_write` extension
The `last_write` payload gains a `kind: "pmc"` variant:
```
{
"kind": "pmc",
"addr_type": 5, // R
"datano_s": 100,
"datano_e": 100,
"bytes": "0x08", // hex-encoded
"writtenAt": "2026-04-25T13:30:00Z"
}
```
Bit-level writes never appear here as a separate kind — they reach the
simulator as 1-byte writes after the driver's RMW wrapper, so the audit
shape is identical to a byte write at the same address.
## Cycle-time per part / last cycle delta — F5-a (issue #272)
Plan PR F5-a derives `Production/LastCycleSeconds` +
`Production/LastCycleStartUtc` from the existing `cnc_rdparam(6711)` +
`cnc_rdtimer` snapshot stream — **pure derivation, no new wire calls**.
The simulator does NOT need new wire commands; the existing
`cnc_rdparam` + `cnc_rdtimer` handlers already cover the read surface.
What focas-mock DOES need is an admin endpoint + test-fixture helper
that lets integration tests atomically increment the parts-count
counter alongside the cycle-time timer so the driver sees a clean
"cycle completed" transition on the next probe tick.
### Per-profile state
Already covered by the existing F1-b state map:
- `parameters: Dict[int, int]` (entry `6711` is the parts-count counter).
- `timers: Dict[int, int]` (entry `0` is the live cycle-time counter,
in seconds).
### Admin endpoint — `POST /admin/mock_simulate_cycle_completion`
Atomically advances both values to model "the CNC just finished a
cycle". Atomicity matters: the F5-a derivation samples both fields on
every probe tick, so if the simulator updated parts-count and the
timer in two separate writes the test could observe an intermediate
state where parts-count incremented but the timer hasn't updated yet
(producing a misleading `LastCycleSeconds`).
```
POST /admin/mock_simulate_cycle_completion
{
"profile": "Series30i",
"partsDelta": 1, // default 1; tests asserting backfill use 3+
"newCycleTimerSeconds": 18 // absolute value, NOT a delta
}
```
Handler steps:
1. `parameters[6711] += partsDelta` (under the per-profile lock).
2. `timers[0] = newCycleTimerSeconds`.
3. Return `200 OK` with the new values for verification.
The endpoint MUST hold the profile's update lock for the full
read-modify-write so a concurrent `cnc_rdparam` + `cnc_rdtimer` poll
sees both fields in their pre-update OR post-update state — never
half-applied.
### `FocasSimFixture.SimulateCycleCompletionAsync`
The future test-support helper wraps the admin endpoint:
```csharp
await fixture.SimulateCycleCompletionAsync(
profile: "Series30i",
partsDelta: 1,
newCycleTimerSeconds: 18);
```
Integration test `Series/CycleDeltaTests.cs` will assert:
- After a 5 -> 6 transition with `newCycleTimerSeconds=18`, the
driver's `Production/LastCycleSeconds` settles to `currentTimer -
prevTimer`.
- `Production/LastCycleStartUtc` is within driver-tolerance of
`nowUtc - LastCycleSeconds` (allow a small window for probe-tick
jitter).
- Counter reset (parts -> 0) preserves the last published values.
- Cycle-timer rollover does not publish a negative delta.
These tests are blocked on the focas-mock + integration-test project
landing; the unit-test coverage in `FocasCycleDeltaTests` already
exercises every same-process invariant of the derivation.
### Status
focas-mock simulator has not landed yet (tracked separately from F4-b /
F4-c). F4-b + F4-c land the .NET-side wire encoders + dispatch + status
mapping unconditionally; the integration-test scaffolds at
`tests/.../IntegrationTests/Series/ParameterWriteTests.cs`,
`MacroWriteTests.cs`, `PmcRangeWriteTests.cs`, and
`PmcBitRmwIntegrationTests.cs` are deferred until the simulator +
integration-test project land. Until then unit-test coverage in
`FocasWriteParameterTests` / `FocasWriteMacroTests` /
`FocasWritePmcTests` exercises every same-process invariant against the
in-memory `FakeFocasClient`.
+263 -286
View File
@@ -1,290 +1,267 @@
# FOCAS wire protocol — what's authoritative vs. what's guessed
Written during Stream B on 2026-04-23 after a research pass through `strangesast/fwlib` +
public FOCAS documentation. Purpose: separate what we *know* about the FOCAS
wire protocol (can quote with confidence) from what we're *guessing* (will need
Wireshark traces to validate in Stream C).
This document directly informs `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/server/`.
## Authoritative — from Fanuc's public `fwlib32.h`
The header file is distributed with the FOCAS Developer Kit and mirrored in OSS
repos (notably `strangesast/fwlib`). The **struct layouts** documented there
are stable across FOCAS versions and authoritative for the payload shapes our
Python mock has to emit.
### ODBM — macro variable read buffer
```c
typedef struct odbm {
short datano; // macro variable number
short dummy; // reserved / alignment padding
long mcr_val; // 32-bit signed macro value
short dec_val; // decimal-point count (0-9)
} ODBM;
```
With `#pragma pack(push, 4)` (the FOCAS default), total size is **10 bytes** on
Windows: 2 + 2 + 4 + 2. Our `FwlibNative.cs` matches this exactly.
Our mock's `_READ_RESP_STRUCT = struct.Struct(">iH")` is **only 6 bytes**
missing `datano` + `dummy`. A real Fwlib decoding the scaffold response will
read garbage. Stream C fix: prepend two `short` fields.
### IODBPSD — CNC parameter read/write buffer
```c
typedef struct iodbpsd {
short datano; // parameter number
short type; // axis index (0 for non-axis parameters)
union {
char cdata;
short idata;
long ldata;
char cdatas[MAX_AXIS]; // MAX_AXIS varies — 8 on 0i, 32 on 30i
short idatas[MAX_AXIS];
long ldatas[MAX_AXIS];
} u;
} IODBPSD;
```
With `pack(4)` and `MAX_AXIS=8`, total size = 2 + 2 + 32 = **36 bytes**. Our
`FwlibNative.cs` matches this (`[SizeConst = 32]` data buffer).
Our mock's current param handler doesn't return bytes in IODBPSD shape —
response payload is just the raw value. Stream C fix: wrap in 4-byte header
+ union-padded data.
### ODBST — status info
```c
typedef struct odbst {
short dummy; // reserved
short tmmode; // Memory / Tape / MDI / EDIT / DNC
short aut; // automatic mode
short run; // running state
short motion; // motion state
short mstb; // M/S/T/B finish signal
short emergency; // emergency stop
short alarm; // alarm state
short edit; // edit mode sub-state
} ODBST;
```
9 × short = **18 bytes**. Our mock already emits 18 bytes via
`struct.Struct(">9h")`. ✓ correct.
### IODBPMC — PMC range read/write buffer
```c
typedef struct iodbpmc {
short type_a; // PMC address letter encoded as ADR_* numeric code
short type_d; // data type: 0=byte, 1=word, 2=long, 4=float, 5=double
unsigned short datano_s; // start address number
unsigned short datano_e; // end address number
union {
char cdata[5];
short idata[5];
long ldata[5];
float fdata[5];
double dbdata[5];
} u; // 40-byte union (widest = dbdata = 5×8 bytes)
} IODBPMC;
```
With `pack(4)` the union is 40 bytes; struct total = 8 + 40 = **48 bytes**.
Our `FwlibNative.cs` matches this.
Our mock's PMC handler takes a different layout (uint16 handle + uint8 letter
+ ...). Stream C fix: rewrite to IODBPMC shape.
## Reference trace findings (2026-04-23 dev-box reversing)
**Good news** — we don't need a bench CNC for first-pass reversing. Loading
`Fwlib64.dll` in `otopcua-focas-cli` + pointing it at our Python simulator on
`127.0.0.1:8193` + enabling `OTOPCUA_FOCAS_RAW_CAPTURE=1` on the sim lets us
observe Fwlib's outbound bytes + iterate on reply shapes. Each cycle is ~5s;
progress measure is "Fwlib sends more bytes before disconnecting".
### Confirmed wire facts
**Magic prefix** — every frame Fwlib sends begins with `0xA0 0xA0 0xA0 0xA0`
(4 bytes). This is NOT a length prefix — our scaffold tried to decode it as
uint32-big-endian = 2.7 GB and died. It's a fixed protocol marker.
**Handshake request** — `cnc_allclibhndl3` produces this 8-byte frame:
```
a0 a0 a0 a0 00 01 01 01
└─ magic ─┘ └── negotiation ──┘
```
The 4-byte negotiation field is stable across our observations (always
`00 01 01 01`). Interpretation TBD — possibly `(version_major=0x0001,
version_minor=0x0101)` or `(protocol=0x01, subtype=0x010101)`.
**Handshake reply that Fwlib accepts** (empirically confirmed — doesn't
disconnect):
```
a0 a0 a0 a0 00 01 01 01 00 XX 00 YY
└─ magic ─┘ └── echo ──┘ handle api_version
```
12 bytes: magic + echoed negotiation + 2-byte handle + 2-byte api_version code.
### Post-handshake frame shape — decoded via drain mode
The simulator's `OTOPCUA_FOCAS_DRAIN_AFTER_HANDSHAKE=1` mode reads all inbound
bytes for 1000 ms after the handshake reply without attempting any decode.
Captured payload from `cnc_allclibhndl3`:
```
00 02 00 02 a0 a0 a0 a0 00 01 21 01 00 00
└── prefix ─┘ └── magic ─┘ └─── body ────┘
4 bytes 4 bytes 6 bytes (total = 14 bytes)
```
**Key discovery**: post-handshake frames have a **4-byte prefix BEFORE the
magic**, not magic-first. Frame shape:
```
uint16 msg_counter // starts at 2; handshake was #1 implicitly
uint16 handle_echo // matches the handle our open reply returned
4 bytes FOCAS_MAGIC // 0xA0A0A0A0
N bytes body // function-specific
```
Session 1's drain captured only the prefix (`00 02 00 01`) before timing
out — TCP multiplexed the two test sessions's bytes differently. Session 2
caught the full 14-byte frame.
### Body bytes — first post-handshake request
Body on `cnc_allclibhndl3` first post-handshake frame:
```
00 01 21 01 00 00
```
Informed guesses (unvalidated):
- `00 01` = body length (1 useful byte?) or sub-request count
- `21 01` = function code / operation tag — `0x21` is seen in public FOCAS
reverse-engineering notes associated with "system info" / "controller
identification" queries
- `00 00` = padding / reserved
# FOCAS wire protocol — packed-buffer surface
Notes on the language-neutral packed-buffer encoding the FOCAS driver +
focas-mock simulator share. This format is **not** the FWLIB native struct
layout — Tier-C Fwlib32 backends marshal directly from the FANUC C struct.
The packed surface exists so the simulator (Python / FastAPI) and the .NET
wire client can speak a common format over IPC without piping a Win32 DLL
through both ends.
## Command id table
Each FOCAS-equivalent call gets a stable wire-protocol command id. Ids are
**append-only** — never renumber, never reuse.
| Id | FOCAS API | Surface |
| --- | --- | --- |
| `0x0001` | `cnc_rdcncstat` | ODBST 9-field status struct |
| `0x0002` | `cnc_rdparam` | parameter value (one number) |
| `0x0003` | `cnc_rdmacro` | macro variable value |
| `0x0004` | `cnc_rddiag` | diagnostic value |
| ... | ... | ... |
| **`0x0102`** | **`cnc_wrparam`** | **IODBPSD parameter-write packet (issue #269, plan PR F4-b)** |
| **`0x0103`** | **`cnc_wrmacro`** | **ODBM macro-write packet (issue #269, plan PR F4-b)** |
| **`0x0104`** | **`pmc_wrpmcrng`** | **IODBPMC PMC range-write packet (issue #270, plan PR F4-c)** |
| **`0x0105`** | **`cnc_wrunlockparam`** | **4-byte password buffer for the parameter-protect / read-protect unlock (issue #271, plan PR F4-d)** |
| `0x0F1A` | **`cnc_rdalmhistry`** | **ODBALMHIS alarm-history ring-buffer dump (issue #267, plan PR F3-a)** |
## ODBALMHIS — alarm history (`cnc_rdalmhistry`, command `0x0F1A`)
Issued by `FocasAlarmProjection` when
`FocasDriverOptions.AlarmProjection.Mode == ActivePlusHistory`. Returns up
to `depth` most-recent ring-buffer entries.
### Request
| Offset | Width | Field | Notes |
| --- | --- | --- | --- |
| 0 | `int16 LE` | `depth` | clamped client-side to `[1..250]` (`FocasAlarmProjectionOptions.MaxHistoryDepth`) |
### Response (packed buffer, little-endian)
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `num_alm` — number of entries that follow. `< 0` indicates CNC error. |
| 2 | repeated | `ALMHIS_data alm[num_alm]` (see below) |
Each entry block:
| Offset (rel.) | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `year` |
| 2 | `int16 LE` | `month` |
| 4 | `int16 LE` | `day` |
| 6 | `int16 LE` | `hour` |
| 8 | `int16 LE` | `minute` |
| 10 | `int16 LE` | `second` |
| 12 | `int16 LE` | `axis_no` (1-based; 0 = whole-CNC) |
| 14 | `int16 LE` | `alm_type` (P/S/OT/SV/SR/MC/SP/PW/IO encoded numerically) |
| 16 | `int16 LE` | `alm_no` |
| 18 | `int16 LE` | `msg_len` (0..32 typical) |
| 20 | `msg_len` | ASCII message (no null terminator) |
| `20 + msg_len` | 0..3 | pad to 4-byte boundary so per-entry blocks stay self-delimiting |
The CNC stamps `year..second` in **its own local time**. The deployment
guide instructs operators to keep CNC clocks on UTC so the projection's
dedup key `(OccurrenceTime, AlarmNumber, AlarmType)` stays stable across
DST transitions. The .NET decoder
(`Wire/FocasAlarmHistoryDecoder.Decode`) constructs each
`DateTimeOffset` with `TimeSpan.Zero` (UTC) on that assumption.
### Error handling
- A negative `num_alm` short-circuits decode to an empty list — the
projection treats it as "no history this tick" and the next poll
retries.
- Malformed timestamps (e.g. month=0) are skipped per-entry instead of
faulting the whole decode; the dedup key for malformed entries would be
unstable anyway.
- `msg_len` overrunning the payload truncates the entry list at the
malformed entry rather than throwing.
## IODBPSD — parameter write (`cnc_wrparam`, command `0x0102`)
Issue #269, plan PR F4-b. The write-side payload is the **byte-symmetric
inverse of the `cnc_rdparam` read** — the same `IODBPSD` struct shape, and
the .NET wire client uses the read-side decoder reversed (`EncodeParamValue`
in `FwlibFocasClient.cs`) so the encoder/decoder are guaranteed to stay in
lock-step.
### Request
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `datano` — parameter number (e.g. `1815`) |
| 2 | `int16 LE` | `type` — axis index (1-based; `0` = whole-CNC parameter) |
| 4 | `length` | `data` payload — width depends on parameter type |
`length` (request frame trailer, drives `data` width):
| FocasDataType | `length` | Payload encoding |
| --- | --- | --- |
| `Byte` | `4 + 1` | one signed byte at offset 4 |
| `Int16` | `4 + 2` | int16 LE at offset 4 |
| `Int32` | `4 + 4` | int32 LE at offset 4 |
Bit-addressed parameters (`PARAM:1815/0` form) are not supported by F4-b
and surface as `BadNotSupported`; F4-c will land the read-modify-write
helper alongside the PMC bit RMW path.
### Response
Single `int16 LE` return code per the standard FWLIB convention:
- `0``Good`
- `11` (`EW_PASSWD`) → **`BadUserAccessDenied`** (was `BadNotWritable`
pre-F4-b — see `FocasStatusMapper`). Means the parameter-write switch is
off or the CNC isn't in MDI mode; the F4-d unlock workflow will close
the loop on this from the OPC UA side.
- Other `EW_*` codes map per
[`FocasStatusMapper.MapFocasReturn`](../../src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasStatusMapper.cs).
## ODBM — macro write (`cnc_wrmacro`, command `0x0103`)
Issue #269, plan PR F4-b. The write-side payload mirrors the
`cnc_rdmacro` read shape: the same `(mcr_val, dec_val)` (integer +
decimal-point count) split, but emitted from the .NET side rather than
decoded.
### Request
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `number` — macro variable number (e.g. `500`) |
| 2 | `int16 LE` | `length` — fixed at `8` for ODBM |
| 4 | `int32 LE` | `mcr_val` — scaled integer value |
| 8 | `int16 LE` | `dec_val` — decimal-point count |
F4-b ships **integer-only writes** (`dec_val = 0`) to match the most
common HMI pattern; a future `WriteMacroScaled` overload will land if the
field calls for fractional macro setpoints. Read-side decoders apply
`mcr_val / 10^dec_val`, so a `dec_val = 0` write surfaces back as the
integer it was emitted as.
### Response
Same single-int16 envelope as `cnc_wrparam`. `EW_PASSWD` is rare on macro
writes (the gate-switch protection is parameter-specific) but the mapper
treats both kinds identically.
### Symmetry note
The plan carries a "byte layout symmetry" requirement — the encoder for
each kind is the read-side decoder reversed. Adding a new parameter type
(e.g. `Int64` parameters, when they ship) means extending both sides in
the same PR; the unit test
`FocasWriteParameterTests.ParameterWrite_round_trip_stores_value_visible_to_subsequent_read`
exercises encode → store → decode with the fake wire client and is the
canary for symmetry regressions.
## IODBPMC — PMC range write (`pmc_wrpmcrng`, command `0x0104`)
Issue #270, plan PR F4-c. The write-side payload is the read-side
`pmc_rdpmcrng` IODBPMC packet with the data direction inverted: the
caller fills the `data[]` byte run and the simulator / Fwlib32 stores
it; the response is the small status envelope rather than the populated
data buffer the read side returns.
### Request
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `type_a` — PMC address-type code (R=5, G=4, F=3, D=8, X=1, Y=2, K=10, A=11, E=12, T=6, C=7) |
| 2 | `int16 LE` | `type_d` — data type (`0` = byte; only byte writes are issued — bit writes wrap the byte path with a read-modify-write helper) |
| 4 | `uint16 LE` | `datano_s` — first byte address (inclusive) |
| 6 | `uint16 LE` | `datano_e` — last byte address (inclusive) — `(datano_e - datano_s + 1)` is the byte count |
| 8 | `bytes` | `data[]` — payload, exactly `(datano_e - datano_s + 1)` bytes |
The header is 8 bytes; the FWLIB `IODBPMC.data` field caps at 32 bytes
(40-byte total per call), so larger ranges are chunked into 32-byte
sub-calls by the wire client. The simulator MUST honour the same chunk
ceiling so chunked-vs-single round-trips produce the same final bytes.
### Response
Same single-int16 envelope as `cnc_wrparam` / `cnc_wrmacro`:
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `ew_status``0` = success, non-zero = FANUC `EW_*` |
`EW_NOOPT` (option not installed), `EW_NUMBER` (out-of-range address),
`EW_LENGTH` (chunk size mismatch) are the typical failures the simulator
reproduces; the mapper translates them to OPC UA status codes the same
way the read-side does.
### Bit-level RMW (driver-side, no extra wire op)
Likely this is Fwlib's "tell me what CNC you are" query — part of
`cnc_allclibhndl3`'s internal handshake continuation before the handle is
fully established. Returning an empty or malformed response causes Fwlib
to declare the far end "not a CNC" and error with `EW_FUNC` (16).
`pmc_wrpmcrng` is **byte-addressed** — there is no sub-byte write op on
the wire. Bit writes go through `IFocasClient.WritePmcBitAsync` which:
1. Issues a 1-byte `pmc_rdpmcrng` to fetch the parent byte.
2. Masks the target bit (set: OR; clear: AND-NOT).
3. Issues a 1-byte `pmc_wrpmcrng` with the modified byte.
A per-byte semaphore in `FwlibFocasClient` serialises concurrent bit
writes against the same byte so two updates that race never lose one
another's bit. The simulator's handler implements the same byte-aligned
semantics — bit writes never reach it as a separate frame.
### Iteration 3 — echo response, error-code advances
### Symmetry note
The encoder is the `pmc_rdpmcrng` decoder reversed: the read side parses
`(type_a, type_d, datano_s, datano_e)` from the request and emits the
data buffer in the response; the write side parses all five fields plus
the data buffer from the request and emits a status int16 in the
response. Tests `FocasWritePmcTests.PMC_*` exercise the round-trip on
the fake wire client.
## cnc_wrunlockparam — connection-level password unlock (command `0x0105`)
Issue #271, plan PR F4-d. Some controllers (notably 16i + certain 30i
firmwares with parameter-protect on) gate `cnc_wrparam` and selected
reads behind a connection-level password switch. The driver emits this
frame on connect when `FocasDeviceOptions.Password` is configured, and
re-emits it on any read/write that returns `EW_PASSWD` (then retries the
gated call once).
### Request
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `byte[4]` | `password[4]` — 4-byte password buffer. ASCII-encoded from `FocasDeviceOptions.Password`, right-padded with `0x00`, truncated at 4 bytes. |
The 4-byte fixed slot matches the FANUC published shape — the controller
compares byte-for-byte. Longer / shorter source strings are normalised at
the driver layer before they hit this frame so the wire surface stays
canonical.
### Response
Same single-int16 envelope as the write frames:
Sending back `<prefix><magic><echoed body>` (14 bytes matching request shape)
advances Fwlib's client-side error code from **`EW_-16` (socket-level)** to
**`EW_-17` (protocol-level rejection)**. Fwlib reads our response in full
before disconnecting with `peer closed mid-frame`.
| Offset | Width | Field |
| --- | --- | --- |
| 0 | `int16 LE` | `ew_status``0` = success (gate now lifted for the lifetime of this FWLIB handle), `EW_PASSWD` = supplied password did not match the controller's slot, `EW_HANDLE` = handle invalid. |
### Lifetime
Unlock is bound to the FWLIB handle: it persists until the handle closes
(disconnect / reconnect). The driver reinvokes unlock on every
`EnsureConnectedAsync` reconnect path so a planned or unplanned wire
restart self-heals without operator intervention. A `BadUserAccessDenied`
on a read/write triggers a single-shot retry: re-emit unlock + redispatch
the gated call once. A second `EW_PASSWD` propagates unchanged so a
mismatched password doesn't loop forever on the wire.
### No-log invariant
Meaning: our **frame structure is correct enough** that Fwlib parses it as a
valid FOCAS frame; the **body content** (the 6 bytes after magic) is where
the semantic mismatch now lives. Fwlib expects specific bytes back for the
`0x2101` system-info query and an echo doesn't match.
### Current iteration block
Going deeper without reference requires either:
- **A bench CNC** (#54) to capture a real response to the `0x2101` query.
Stream C.2 Wireshark trace gives us the exact byte pattern Fwlib expects.
- **Published FOCAS response specs** for sub-function `0x2101` — not present
in `strangesast/fwlib` headers; likely only in the licensed Developer Kit
binary docs.
- **Blind enumeration** — try N variations of the 6-byte body response until
Fwlib's error code changes again. High cost, low signal.
The first two are both blocked on resources we don't have. The third is
~hundreds of cycles with no guarantee of convergence.
### Diminishing-returns checkpoint
**What we've proven without hardware**:
1. Magic prefix `0xA0A0A0A0` confirmed
2. Handshake request format decoded (`magic + 4-byte negotiation`)
3. Handshake response format that Fwlib accepts (`magic + echo + handle + api`)
4. Post-handshake frame format decoded (`prefix + magic + body`)
5. First post-handshake function code observed (`0x2101` — likely system-info)
6. Error code progression `EW_SOCKET``EW_PROTOCOL` confirms our framing is
structurally correct
**What we can't prove without bench CNC or reference docs**:
1. The exact 6-byte response body Fwlib expects for `0x2101`
2. The full list of post-handshake function codes + their body shapes
3. Whether subsequent frames use length prefixes or fixed body sizes
**Recommendation**: checkpoint here. The framing discoveries above are
preserved in `server/frames.py` + `server/state.py` + `server/focas_server.py`
+ `server/handlers/__init__.py`. When bench-CNC access unblocks Stream C.2's
reference trace, the iteration loop (with the framing work already done)
should converge in hours rather than days.
### Still unknown
- **Response shape** for the post-handshake body request — we can frame the
prefix + magic correctly now, but what the 6-byte body response should
carry (CNC series ID? version? capability flags?) needs further iteration.
- **Function-id numeric values** for the 9 FWLIB calls our driver makes —
one per call, need to be observed separately.
- **Error encoding** on the wire.
### Next iteration cycles
With the handshake working, each subsequent function gets its own probe-and-observe
loop. The simulator now has a `RAW_FRAME_MARKER = 0xFFFF` sentinel that lets a
handler return exact wire bytes (bypassing the scaffold envelope) — use that to
try different post-handshake replies and watch Fwlib's reaction.
## Stream C work order
Given what's authoritative vs. guessed, here's the most efficient path:
### Phase 1 — payload shapes (no hardware required)
- [ ] Rewrite `server/handlers/macro.py` response to return 10-byte ODBM:
`short datano, short dummy, int32 mcr_val, short dec_val`
- [ ] Rewrite `server/handlers/param.py` response to return 36-byte IODBPSD:
`short datano, short type, bytes[32] u`
- [ ] Rewrite `server/handlers/pmc.py` response to return 48-byte IODBPMC:
`short type_a, short type_d, uint16 datano_s, uint16 datano_e, bytes[40] u`
- [ ] Add unit tests asserting byte-exact sizes
- [ ] Update validate_harness.py to match the new shapes
Effect: when Stream C gets its first Wireshark trace, the payload-layer of the
mock is already correct. Only the framing layer needs iteration.
### Phase 2 — framing (requires hardware)
This is the iterative Wireshark loop — no point starting until the Windows rig
+ licensed Fwlib64.dll + real CNC are all available. See the implementer's
checklist in
[`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/README.md`](../../../tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/README.md).
### Phase 3 — flip the C# test gate
Once Phase 2 proves Fwlib64 can talk to the mock:
- [ ] Flip `OTOPCUA_FOCAS_SIM_WIRE_COMPAT=1` in the CI env
- [ ] Expand `tests/.../IntegrationTests/Series/WireCompatGatedTests.cs` with
real per-series assertions
- [ ] Update `scripts/e2e/test-focas.ps1` to accept `-ProfileName`
- [ ] Close Stream D
## References
- [`src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibNative.cs`](../../../src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibNative.cs) — P/Invoke surface, authoritative struct layouts
- [`src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibFocasClient.cs`](../../../src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibFocasClient.cs) — reference C# implementation of each FWLIB call
- [`src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasStatusMapper.cs`](../../../src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasStatusMapper.cs) — EW_* → OPC UA status mapping
- Fanuc FOCAS Developer Kit (licensed, not in repo) — ultimate source of truth
- `strangesast/fwlib` on GitHub — redistributes `fwlib32.h` + runtime binaries; no wire protocol docs
The password is a secret. Wire-client implementations MUST NOT log the
password on either request or response. The current
`FwlibFocasClient.UnlockAsync` constructs an exception that includes
only the `EW_*` return code; the `FocasDeviceOptions` record overrides
its auto-generated `ToString` so any Serilog destructure renders
`Password = ***`. See
[`docs/v2/focas-deployment.md`](../focas-deployment.md)
§ "FOCAS password handling" for the deployment-side guarantees +
rotation runbook.
@@ -1,14 +1,3 @@
> **✅ Completed 2026-04-30 — historical record of Phase 2 (Galaxy out-of-process split).**
>
> Phase 2 produced the `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
> three-project split as a stepping stone toward the eventual mxaccessgw
> architecture. Those projects shipped, served their purpose for
> roughly a year, then retired in PR 7.2 alongside the
> `OtOpcUaGalaxyHost` Windows service. This file is preserved as the
> phase-exit evidence; do not treat it as live architecture
> documentation. See `docs/drivers/Galaxy.md` for the current
> in-process driver.
# Phase 2 — Galaxy Out-of-Process Refactor (Tier C)
> **Status**: DRAFT — implementation plan for Phase 2 of the v2 build (`plan.md` §6, `driver-stability.md` §"Galaxy — Deep Dive").
@@ -183,7 +172,7 @@ Lift the existing `GalaxyRuntimeProbeManager` into the new project. Behaviors pe
#### Task B.6 — Named-pipe IPC server with mandatory ACL
Per decision #76 + `driver-stability.md` §"IPC Security":
- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem **explicitly denied**. Administrators was dropped from the deny list so non-elevated admins on dev boxes aren't blocked via UAC-filtered-token deny-only semantics — the per-connection SID check (§2 of driver-stability.md) remains the real authorization boundary.
- Pipe ACL on creation: `ReadWrite | Synchronize` granted only to the OtOpcUa server's service principal SID; LocalSystem and Administrators **explicitly denied**
- Caller identity verification on each new connection: `GetImpersonationUserName()` cross-checked against configured server service SID; mismatches dropped before any RPC frame is read
- Per-process shared secret: passed by the supervisor at spawn time, required on first frame of every connection
- Heartbeat pipe: separate from data-plane pipe, same ACL
@@ -1,8 +1,6 @@
# Phase 6.1 — Resilience & Observability Runtime
> **Status**: **SHIPPED** 2026-04-19 — Streams A/B/C/D + E data layer merged to `v2` across PRs #78-82. Final exit-gate PR #83 turns the compliance script into real checks (all pass) and records this status update.
>
> **Stream E.2/E.3 closed 2026-04-23**`FleetStatusPoller` now polls `DriverInstanceResilienceStatus`, detects per-`(DriverInstanceId, HostName)` deltas, and pushes `ResilienceStatusChangedMessage` via `FleetStatusHub` on the fleet group. Admin `/hosts` page subscribes on load and upserts the matching `HostStatusRow` in-memory on receipt, so operator-visible resilience state now reflects the runtime within one poller tick (~5 s) instead of the Admin page's own 10-second refresh. `FleetStatusPollerTests.Poller_pushes_ResilienceStatusChanged_on_delta` covers the first-observation push, the no-delta-no-push invariant, and the mutated-row re-push.
> **Status**: **SHIPPED** 2026-04-19 — Streams A/B/C/D + E data layer merged to `v2` across PRs #78-82. Final exit-gate PR #83 turns the compliance script into real checks (all pass) and records this status update. One deferred piece: Stream E.2/E.3 SignalR hub + Blazor `/hosts` column refresh lands in a visual-compliance follow-up PR on the Phase 6.4 Admin UI branch.
>
> Baseline: 906 solution tests → post-Phase-6.1: 1042 passing (+136 net). One pre-existing Client.CLI Subscribe flake unchanged.
>
@@ -131,7 +129,7 @@ Closes these gaps flagged in the 2026-04-19 audit:
- [ ] Stream B: Tier registry + generalised watchdog + scheduled recycle + wedge detector
- [ ] Stream C: `/healthz` + `/readyz` + structured logging + JSON Serilog sink
- [ ] Stream D: LiteDB cache + Polly fallback in Configuration
- [x] Stream E: Admin `/hosts` page refresh (E.1 in PRs #78-82 with the data layer; E.2/E.3 closed 2026-04-23)
- [ ] Stream E: Admin `/hosts` page refresh
- [ ] Cross-cutting: `phase-6-1-compliance.ps1` exits 0; full solution `dotnet test` passes; exit-gate doc recorded
## Adversarial Review — 2026-04-19 (Codex, thread `019da489-e317-7aa1-ab1f-6335e0be2447`)
@@ -1,9 +1,10 @@
# Phase 6.2 — Authorization Runtime (ACL + LDAP grants)
> **Status**: **FULLY SHIPPED** (updated 2026-04-23 audit). Streams A-D core merged to `v2` across PRs #84-87 + exit-gate PR #88 on 2026-04-19; both named deferrals landed separately and were confirmed against the repo this session:
> **Status**: **SHIPPED (core)** 2026-04-19 — Streams A, B, C (foundation), D (data layer) merged to `v2` across PRs #84-87. Final exit-gate PR #88 turns the compliance stub into real checks (all pass, 2 deferred surfaces tracked).
>
> - **Task #143 Stream C dispatch wiring**`DriverNodeManager` calls `AuthorizationGate.IsAllowed(context.UserIdentity, OpcUaOperation.<Op>, scope)` on Read (line 249), Write (line 536) with per-classification `OpcUaOperation.WriteOperate` / `WriteTune` / `WriteConfigure` routed via `WriteAuthzPolicy`, and HistoryRead (4 call sites). `TriePermissionEvaluator` + `PermissionTrieCache` back the gate.
> - **Task #144 Stream D Admin UI**`RoleGrants.razor` (LDAP group → Admin role mapping) + `AclsTab.razor` (per-cluster node-ACL editor with a probe-this-permission surface via `PermissionProbeService`) + `AclChangeNotifier` SignalR hub for cache invalidation all present and wired.
> Deferred follow-ups (tracked separately):
> - Stream C dispatch wiring on the 11 OPC UA operation surfaces (task #143).
> - Stream D Admin UI — RoleGrantsTab, AclsTab Probe-this-permission, SignalR invalidation, draft-diff ACL section + visual-compliance reviewer signoff (task #144).
>
> Baseline pre-Phase-6.2: 1042 solution tests → post-Phase-6.2 core: 1097 passing (+55 net). One pre-existing Client.CLI Subscribe flake unchanged.
>
@@ -1,20 +1,13 @@
# Phase 6.3 — Redundancy Runtime
> **Status**: **SHIPPED (core + Stream C)** — original body merged 2026-04-19; audit 2026-04-23 promoted **Stream C (task #147)** into shipped state.
> **Status**: **SHIPPED (core)** 2026-04-19 — Streams B (ServiceLevelCalculator + RecoveryStateManager) and D core (ApplyLeaseRegistry) merged to `v2` in PR #89. Exit gate in PR #90.
>
> **In** (verified in repo):
> - Stream A — `ClusterTopologyLoader`, `RedundancyCoordinator`, `RedundancyTopology`, `PeerReachability` all present under `src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/`. Coordinator is now also hosted by `Program.cs` via the new `RedundancyPublisherHostedService`, which calls `RefreshAsync` on startup.
> - Stream B`ServiceLevelCalculator` + `RecoveryStateManager`.
> - **Stream C (task #147) — OPC UA node wiring**. `ServerRedundancyNodeWriter` maintains `Server.ServiceLevel` (i=2267), `Server.ServerRedundancy.RedundancySupport` (i=2994), and `Server.ServerRedundancy.ServerUriArray` (non-transparent subtype) by writing the `PropertyState.Value` + calling `ClearChangeMasks`. `RedundancyPublisherHostedService` drives the publisher on a 1 s tick and fans `OnStateChanged` / `OnServerUriArrayChanged` into the writer. Mapping of `Configuration.RedundancyMode` → Part 4 `RedundancySupport` is Warm/Hot/None (v2 clusters don't enumerate Cold / HotAndMirrored per decision #85). Idempotent per-value dedupe prevents spurious OPC UA notifications. Unit coverage: `ServerRedundancyNodeWriterTests` (4 tests, green).
> - Stream D`ApplyLeaseRegistry`.
> - Stream E — `RedundancyTab.razor` with SignalR `RoleChanged` wiring (via `FleetStatusPoller` + `FleetStatusHub`) — stale-flag + role-swap banner.
>
> **Closed this session (2026-04-23)**:
> - **Task #148 part 2**`DraftValidator.ValidateClusterTopology(cluster, nodes)` now catches three pre-publish invariants the SQL CHECK can't see: (a) unsupported `NodeCount`/`RedundancyMode` pairs; (b) `Enabled`-node count vs. declared `NodeCount` mismatch (catches disabled-node drift with mode still Hot/Warm); (c) multiple-Primary per decision #84. Returns every failure in one pass — same shape as `Validate`. 8 new tests in `DraftValidatorTests` green.
> - **Task #150 Stream F**`docs/v2/redundancy-interop-playbook.md` captures the manual validation matrix against UaExpert + Kepware + AVEVA MXAccess failover. Automating these closed-source GUI clients in PR-CI is out of scope; the automatable half is already covered by `ServiceLevelCalculatorTests` / `RedundancyStatePublisherTests` / `ClusterTopologyLoaderTests` / `ServerRedundancyNodeWriterTests`.
>
> **Remaining (documented limitation, not blocking v2.0)**:
> - Non-transparent redundancy-state node upgrade — the SDK's default `Server.ServerRedundancy` object is the base `ServerRedundancyState`, so `ApplyServerUriArray` currently logs-and-skips. Operators on the rare deployment that needs `ServerUriArray` read-back get a clear warning with the upgrade path. Documented in the interop playbook's "Known limitations" section.
> Deferred follow-ups (tracked separately):
> - Stream A — RedundancyCoordinator cluster-topology loader (task #145).
> - Stream COPC UA node wiring: ServiceLevel + ServerUriArray + RedundancySupport (task #147).
> - Stream E — Admin UI RedundancyTab + OpenTelemetry metrics + SignalR (task #149).
> - Stream Fclient interop matrix + Galaxy MXAccess failover test (task #150).
> - sp_PublishGeneration pre-publish validator rejecting unsupported RedundancyMode values (task #148 part 2 — SQL-side).
>
> Baseline pre-Phase-6.3: 1097 solution tests → post-Phase-6.3 core: 1137 passing (+40 net).
>
@@ -1,17 +1,12 @@
# Phase 6.4 — Admin UI Completion
> **Status**: **SHIPPED (mostly)** 2026-04-19; audit 2026-04-23 confirms what landed separately after the data-layer PR #91:
> **Status**: **SHIPPED (data layer)** 2026-04-19 — Stream A.2 (UnsImpactAnalyzer + DraftRevisionToken) and Stream B.1 (EquipmentCsvImporter parser) merged to `v2` in PR #91. Exit gate in PR #92.
>
> **In** (verified in repo):
> - **Task #153 Stream A UI**`UnsTab.razor` with drag/drop handlers + concurrent-edit via `DraftRevisionToken` + `UnsImpactAnalyzer`; Playwright smoke test in `tests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/UnsTabDragDropE2ETests.cs`.
> - **Task #155 Stream B**`EquipmentImportBatch` entity + migration, `EquipmentImportBatchService.CreateBatchAsync` / `FinaliseBatchAsync` / `DropBatchAsync` / `ListByUserAsync`, `ImportEquipment.razor` UI.
> - **Task #156 Stream C**`DiffViewer.razor` + `DiffSection.razor` refactor in place.
> - Admin UI `IdentificationFields.razor` surface shipped (part of #157).
>
> **Closed this session (2026-04-23)**:
> - **Task #157 Stream D server-side half** was a stale audit claim. `src/ZB.MOM.WW.OtOpcUa.Core/OpcUa/IdentificationFolderBuilder.cs` ships the OPC 40010 Identification sub-folder materializer (Manufacturer / Model / SerialNumber / HardwareRevision / SoftwareRevision / YearOfConstruction / AssetLocation / ManufacturerUri / DeviceManualUri); `EquipmentNodeWalker.Walk` calls it per equipment; `IdentificationFolderBuilderTests` (158 lines) + two walker-level tests (`Walk_Materializes_Identification_Subfolder_When_AnyFieldPresent`, `Walk_Omits_Identification_Subfolder_When_AllFieldsNull`) cover the null-handling branches. The initial audit grepped only `src/ZB.MOM.WW.OtOpcUa.Server/OpcUa/`; the builder lives in `Core/OpcUa/`.
>
> **Phase 6.4 is now FULLY SHIPPED — no deferred surfaces remain.**
> Deferred follow-ups (Blazor UI + staging tables + address-space wiring):
> - Stream A UI — UnsTab MudBlazor drag/drop + 409 concurrent-edit modal + Playwright smoke (task #153).
> - Stream B follow-up — EquipmentImportBatch staging + FinaliseImportBatch transaction + CSV import UI (task #155).
> - Stream C — DiffViewer refactor into base + 6 section plugins + 1000-row cap + SignalR paging (task #156).
> - Stream D — IdentificationFields.razor + DriverNodeManager OPC 40010 sub-folder exposure (task #157).
>
> Baseline pre-Phase-6.4: 1137 solution tests → post-Phase-6.4 data layer: 1159 passing (+22).
>
+14 -111
View File
@@ -14,7 +14,7 @@ End-to-end validation that the Phase 7 production wiring chain (#243 / #244 / #2
| SQL Server reachable, `OtOpcUaConfig` DB exists with all migrations applied | `sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "..." -Q "SELECT COUNT(*) FROM dbo.__EFMigrationsHistory"` returns ≥ 11 |
| Server's `appsettings.json` `Node:ConfigDbConnectionString` matches your SQL Server | `cat src/ZB.MOM.WW.OtOpcUa.Server/appsettings.json` |
> **Galaxy.Host pipe ACL.** The pipe allows the configured `OTOPCUA_ALLOWED_SID` (typically the user that runs `OtOpcUaGalaxyHost``dohertj2` on the dev box). Run the Server under the same user; elevation doesn't matter — `PipeAcl.cs` no longer denies `BUILTIN\Administrators` since UAC's deny-only Admins SID would have blocked non-elevated dev-box admins too.
> **Galaxy.Host pipe ACL.** Per `docs/ServiceHosting.md`, the pipe ACL deliberately denies `BUILTIN\Administrators`. **Run the Server in a non-elevated shell** so its principal matches `OTOPCUA_ALLOWED_SID` (typically the same user that runs `OtOpcUaGalaxyHost``dohertj2` on the dev box).
## Setup
@@ -36,49 +36,11 @@ sqlcmd -S "localhost,14330" -d OtOpcUaConfig -U sa -P "OtOpcUaDev_2026!" `
Expected output ends with `Phase 7 smoke seed complete.` plus a Cluster / Node / Generation summary. Idempotent — re-running wipes the prior smoke state and starts clean.
The seed creates one each of: `ServerCluster`, `ClusterNode`, `ClusterNodeCredential` (binds the SQL login to the node — without this `sp_GetCurrentGenerationForCluster` returns `Unauthorized: caller X is not bound to NodeId p7-smoke-node`), `ConfigGeneration` (Published), `Namespace`, `UnsArea`, `UnsLine`, `Equipment`, `DriverInstance` (Galaxy proxy), `Tag`, two `Script` rows, one `VirtualTag` (`MachineStatus` = `Source > 0`, Boolean, historized), one `ScriptedAlarm` (`OverTemp` when `Source > 50`).
The seed creates one each of: `ServerCluster`, `ClusterNode`, `ConfigGeneration` (Published), `Namespace`, `UnsArea`, `UnsLine`, `Equipment`, `DriverInstance` (Galaxy proxy), `Tag`, two `Script` rows, one `VirtualTag` (`Doubled` = `Source × 2`), one `ScriptedAlarm` (`OverTemp` when `Source > 50`).
### 3. (Optional) Swap the Galaxy attribute
### 3. Replace the Galaxy attribute placeholder
The shipped seed points `dbo.Tag.TagConfig` at `TestMachine_001.TestHistoryValue` — the dev-box Galaxy ships it as Int32, writable (`security_classification = Operate`), and historized (`HistoryExtension` primitive), so every E2E stage has a real live target. To swap to another attribute on a different Galaxy, pick a candidate via the same shape:
```sql
-- Run against the Galaxy Repository DB (ZB).
;WITH dpc AS (
SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
FROM gobject g INNER JOIN package p ON p.package_id = g.deployed_package_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0
UNION ALL
SELECT c.gobject_id, p.package_id, p.derived_from_package_id, c.depth + 1
FROM dpc c INNER JOIN package p ON p.package_id = c.derived_from_package_id
WHERE c.derived_from_package_id <> 0 AND c.depth < 10
)
SELECT DISTINCT g.tag_name + '.' + da.attribute_name AS full_ref,
dt.description AS dtype, da.security_classification
FROM dpc
INNER JOIN dynamic_attribute da ON da.package_id = dpc.package_id
INNER JOIN gobject g ON g.gobject_id = dpc.gobject_id
LEFT JOIN data_type dt ON dt.mx_data_type = da.mx_data_type
WHERE da.attribute_name NOT LIKE '[_]%'
AND da.attribute_name NOT LIKE '%.Description'
AND da.mx_data_type IN (1, 2, 3, 4)
AND da.security_classification > 0 -- writable
AND EXISTS (
SELECT 1 FROM primitive_instance pi
INNER JOIN primitive_definition pd
ON pd.primitive_definition_id = pi.primitive_definition_id
AND pd.primitive_name = 'HistoryExtension'
WHERE pi.package_id = dpc.package_id AND pi.primitive_name = da.attribute_name)
ORDER BY full_ref;
```
Then update the seed:
```sql
UPDATE dbo.Tag
SET TagConfig = N'{"FullName":"YourReal.GalaxyAttr","DataType":"Int32"}'
WHERE TagId = 'p7-smoke-tag-source';
```
`scripts/smoke/seed-phase-7-smoke.sql` inserts a `dbo.Tag.TagConfig` JSON with `FullName = "REPLACE_WITH_REAL_GALAXY_ATTRIBUTE"`. Edit the SQL + re-run, or `UPDATE dbo.Tag SET TagConfig = N'{"FullName":"YourReal.GalaxyAttr","DataType":"Float64"}' WHERE TagId='p7-smoke-tag-source'`. Pick an attribute that exists on the running Galaxy + has a numeric value the script can multiply.
### 4. Point Server.appsettings at the smoke node
@@ -92,38 +54,9 @@ UPDATE dbo.Tag
}
```
### 4a. (Optional) Enable LDAP + SecurityProfile for the write stage
Anonymous OPC UA sessions are denied writes against `Operate`-classified tags by the PR 26 server-layer classification gate. To exercise the reverse-bridge + alarm-fires stages fully, the Server has to advertise a `UserName` UserTokenPolicy (any profile other than `None`) and authenticate against LDAP.
```json
{
"OpcUa": {
"SecurityProfile": "Basic256Sha256-Sign",
"Ldap": {
"Enabled": true,
"Server": "localhost",
"Port": 3893,
"SearchBase": "dc=lmxopcua,dc=local",
"ServiceAccountDn": "cn=serviceaccount,dc=lmxopcua,dc=local",
"ServiceAccountPassword": "serviceaccount123",
"GroupToRole": {
"ReadOnly": "ReadOnly",
"WriteOperate": "WriteOperate",
"WriteTune": "WriteTune",
"WriteConfigure": "WriteConfigure",
"AlarmAck": "AlarmAck"
}
}
}
}
```
Dev-box GLAuth ships `writeop` / `writeop123` in the `WriteOperate` group, `admin` / `admin123` across all write groups. See `C:\publish\glauth\auth.md`.
## Run
### 5. Start the Server
### 5. Start the Server (non-elevated shell)
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Server
@@ -149,39 +82,27 @@ Any line missing = follow up the failure surface (each step has its own log sign
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- browse -u opc.tcp://localhost:4840/OtOpcUa -r -d 5
```
Expect to see under the namespace root: `lab-floor → galaxy-line → reactor-1` with three child variables: `Source` (driver-sourced Int32), `MachineStatus` (virtual tag Boolean, `Source > 0`), and `OverTemp` (scripted alarm Boolean, `Source > 50`). NodeIds are path-based per OPC UA Part 3 §5.2.2 — the walker mints them from `{driverId}/{folder-path}/{browseName}` and stores the driver-side FullReference in an internal NodeId→FullRef map, so client subscriptions survive backend address renames.
Expect to see under the namespace root: `lab-floor → galaxy-line → reactor-1` with three child variables: `Source` (driver-sourced), `Doubled` (virtual tag, value should track Source×2), and `OverTemp` (scripted alarm, boolean reflecting whether Source > 50).
#### Read the virtual tag
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840/OtOpcUa `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/MachineStatus"
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/OtOpcUa -n "ns=2;s=p7-smoke-vt-derived"
```
Expected: `Boolean`. Push a value change into the Source Galaxy attribute and re-read — `MachineStatus` should follow within the bridge's publishing interval (1 second by default).
Expected: a `Float64` value approximately equal to `2 × Source`. Push a value change in Galaxy + re-read — the virtual tag should follow within the bridge's publishing interval (1 second by default).
#### Read the scripted alarm
```powershell
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read `
-u opc.tcp://localhost:4840/OtOpcUa `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/OverTemp"
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- read -u opc.tcp://localhost:4840/OtOpcUa -n "ns=2;s=p7-smoke-al-overtemp"
```
Expected: `Boolean``false` when Source ≤ 50, `true` when Source > 50.
#### Drive the alarm + verify historian queue
Push a Source value above 50 — either from Galaxy itself, or via the Server's OPC UA write path using LDAP credentials (step 4a). Within ~1 second, `OverTemp.Read` flips to `true`. The alarm engine emits a transition to `Phase7EngineComposer.RouteToHistorianAsync``SqliteStoreAndForwardSink.EnqueueAsync` → drain worker (every 2s) → `GalaxyHistorianWriter.WriteBatchAsync` → Galaxy.Host pipe → Aveva Historian alarm schema.
```powershell
# OPC UA write path — requires LDAP from step 4a + a writeop-class user.
dotnet run --project src/ZB.MOM.WW.OtOpcUa.Client.CLI -- write `
-u opc.tcp://localhost:4840/OtOpcUa -S sign `
-n "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/Source" `
-v 75 -U writeop -P writeop123
```
In Galaxy, push a Source value above 50. Within ~1 second, `OverTemp.Read` flips to `true`. The alarm engine emits a transition to `Phase7EngineComposer.RouteToHistorianAsync``SqliteStoreAndForwardSink.EnqueueAsync` → drain worker (every 2s) → `GalaxyHistorianWriter.WriteBatchAsync` → Galaxy.Host pipe → Aveva Historian alarm schema.
Verify the queue absorbed the event:
@@ -199,32 +120,14 @@ Open the Historian Client (or InTouch alarm summary) — the `OverTemp` activati
- [ ] EF migrations applied through `20260420232000_ExtendComputeGenerationDiffWithPhase7`
- [ ] Smoke seed completes without errors + creates exactly 1 Published generation
- [ ] Server starts + logs the Phase 7 composition lines
- [ ] Client.CLI browse shows the UNS tree with Source / MachineStatus / OverTemp under reactor-1
- [ ] Read on `Source` returns a Good-quality Int32 value (proves MXAccess round-trip)
- [ ] Read on `MachineStatus` returns the live boolean truth of `Source > 0`
- [ ] Server starts in non-elevated shell + logs the Phase 7 composition lines
- [ ] Client.CLI browse shows the UNS tree with Source / Doubled / OverTemp under reactor-1
- [ ] Read on `Doubled` returns `2 × Source` value
- [ ] Read on `OverTemp` returns the live boolean truth of `Source > 50`
- [ ] `test-galaxy.ps1 -Username writeop -Password writeop123` drives Source past 50 and flips `OverTemp` to `true` within 1 s
- [ ] Pushing Source past 50 in Galaxy flips `OverTemp` to `true` within 1 s
- [ ] SQLite queue drains (`COUNT(*)` returns to 0 within 2 s of an alarm transition)
- [ ] Historian shows the `OverTemp` activation event with the rendered message
## Second-run evidence (2026-04-24 dev box)
Full live stack ran end-to-end once the IPC unblocks (commit `d11dd05`), path-based NodeIds (commit `8be82e0`), cold-start engine guards (commit `69e1d32`), and seed retarget to `TestMachine_001.TestHistoryValue` (commit `ec1a590`) landed. Anonymous `scripts/e2e/test-galaxy.ps1` run reaches 3/7:
```
[PASS] source NodeId readable (Galaxy pipe → proxy → server → client chain up)
[PASS] source value = System.Byte[]
[INFO] BadUserAccessDenied — attribute's Galaxy-side ACL blocks writes for this session.
```
The `INFO` stage is correct behaviour — Source is `Operate`-classified and the anonymous session carries no LDAP roles. The Virtual-tag / Subscribe / Alarm / History stages stay at `[FAIL]` for two further environmental reasons once write is unblocked:
1. `TestMachine_001.TestHistoryValue` is driven by whatever Galaxy code runs on the object — idle in the default dev-box state, so no subscription pushes fire.
2. Historian writes require the Aveva Historian SDK to accept the alarm schema event — dev box doesn't have that path live.
Running `./test-galaxy.ps1 -Username writeop -Password writeop123` with step 4a's LDAP + `SecurityProfile = Basic256Sha256-Sign` applied unblocks the reverse-bridge + alarm-fires stages. The virtual-tag, subscribe, and history stages depend on further deployment choices (pick an attribute Galaxy is actively writing to, wire Aveva Historian SDK).
## First-run evidence (2026-04-20 dev box)
Ran the smoke against the live dev environment. Captured log signatures prove the Phase 7 wiring chain executes in production:
+80
View File
@@ -0,0 +1,80 @@
# PR 1 — Phase 1 + Phase 2 A/B/C → v2
**Source**: `phase-1-configuration` (commits `980ea51..7403b92`, 11 commits)
**Target**: `v2`
**URL**: https://gitea.dohertylan.com/dohertj2/lmxopcua/pulls/new/phase-1-configuration
## Summary
- **Phase 1 complete** — Configuration project with 16 entities + 3 EF migrations
(InitialSchema + 8 stored procs + AuthorizationGrants), Core + Server + full Admin UI
(Blazor Server with cluster CRUD, draft → diff → publish → rollback, equipment with
OPC 40010, UNS, namespaces, drivers, ACLs, reservations, audit), LDAP via GLAuth
(`localhost:3893`), SignalR real-time fleet status + alerts.
- **Phase 2 Streams A + B + C feature-complete** — full IPC contract surface
(Galaxy.Shared, netstandard2.0, MessagePack), Galaxy.Host with real Win32 STA pump,
ACL + caller-SID + per-process-secret IPC, Galaxy-specific MemoryWatchdog +
RecyclePolicy + PostMortemMmf + MxAccessHandle, three `IGalaxyBackend`
implementations (Stub / DbBacked / **MxAccess** — real ArchestrA.MxAccess.dll
reference, x86, smoke-tested live against `LMXProxyServer`), Galaxy.Proxy with all
9 capability interfaces (`IDriver` / `ITagDiscovery` / `IReadable` / `IWritable` /
`ISubscribable` / `IAlarmSource` / `IHistoryProvider` / `IRediscoverable` /
`IHostConnectivityProbe`) + supervisor (Backoff + CircuitBreaker +
HeartbeatMonitor).
- **Phase 2 Stream D non-destructive deliverables** — appsettings.json → DriverConfig
migration script, two-service Windows installer scripts, process-spawn cross-FX
parity test, Stream D removal procedure doc with both Option A (rewrite 494 v1
tests) and Option B (archive + new v2 E2E suite) spelled out step-by-step.
## What's NOT in this PR
- Legacy `OtOpcUa.Host` deletion (Stream D.1) — reserved for a follow-up PR after
Option B's E2E suite is green. The 494 v1 tests still pass against the unchanged
legacy Host.
- Live-Galaxy parity validation (Stream E) — needs the iterative debug cycle the
removal-procedure doc describes.
## Tests
**964 pass / 1 pre-existing Phase 0 baseline failure**, across 14 test projects:
| Project | Pass | Notes |
|---|---:|---|
| Core.Abstractions.Tests | 24 | |
| Configuration.Tests | 42 | incl. 7 schema compliance, 8 stored-proc, 3 SQL-role auth, 13 validator, 6 LiteDB cache, 5 generation-applier |
| Core.Tests | 4 | DriverHost lifecycle |
| Server.Tests | 2 | NodeBootstrap + LiteDB cache fallback |
| Admin.Tests | 21 | incl. 5 RoleMapper, 6 LdapAuth, 3 LiveLdap, 2 FleetStatusPoller, 2 services-integration |
| Driver.Galaxy.Shared.Tests | 6 | Round-trip + framing |
| Driver.Galaxy.Host.Tests | 30 | incl. 5 GalaxyRepository live ZB, 3 live MXAccess COM, 5 EndToEndIpc, 2 IpcHandshake, 4 MemoryWatchdog, 3 RecyclePolicy, 3 PostMortemMmf, 3 StaPump, 2 service-installer dry-run |
| Driver.Galaxy.Proxy.Tests | 10 | 9 unit + 1 process-spawn parity |
| Client.Shared.Tests | 131 | unchanged |
| Client.UI.Tests | 98 | unchanged |
| Client.CLI.Tests | 51 / 1 fail | pre-existing baseline failure |
| Historian.Aveva.Tests | 41 | unchanged |
| IntegrationTests (net48) | 6 | unchanged — v1 parity baseline |
| **OtOpcUa.Tests (net48)** | **494** | **unchanged — v1 parity baseline** |
## Test plan for reviewers
- [ ] `dotnet build ZB.MOM.WW.OtOpcUa.slnx` succeeds with no warnings beyond the
known NuGetAuditSuppress + xUnit1051 warnings
- [ ] `dotnet test ZB.MOM.WW.OtOpcUa.slnx` shows the same 964/1 result
- [ ] `Get-Service aaGR, aaBootstrap` reports Running on the merger's box
- [ ] `docker ps --filter name=otopcua-mssql` shows the SQL container Up
- [ ] Admin UI boots (`dotnet run --project src/ZB.MOM.WW.OtOpcUa.Admin`); home page
renders at http://localhost:5123/; LDAP sign-in with GLAuth `readonly` /
`readonly123` succeeds
- [ ] Migration script dry-run: `powershell -File
scripts/migration/Migrate-AppSettings-To-DriverConfig.ps1 -DryRun` produces
a well-formed DriverConfig JSON
- [ ] Spot-read three commit messages to confirm the deferred-with-rationale items
are explicitly documented (`549cd36`, `a7126ba`, `7403b92` are the most
recent and most detailed)
## Follow-up tracking
PR 2 (next session) will execute Stream D Option B — archive `OtOpcUa.Tests` as
`OtOpcUa.Tests.v1Archive`, build the new `OtOpcUa.Driver.Galaxy.E2E` test project,
delete legacy `OtOpcUa.Host`, and run the parity-validation cycle. See
`docs/v2/implementation/stream-d-removal-procedure.md`.
+69
View File
@@ -0,0 +1,69 @@
# PR 2 — Phase 2 Stream D Option B (archive v1 + E2E suite) → v2
**Source**: `phase-2-stream-d` (branched from `phase-1-configuration`)
**Target**: `v2`
**URL** (after push): https://gitea.dohertylan.com/dohertj2/lmxopcua/pulls/new/phase-2-stream-d
## Summary
Phase 2 Stream D Option B per `docs/v2/implementation/stream-d-removal-procedure.md`:
- **Archived the v1 surface** without deleting:
- `tests/ZB.MOM.WW.OtOpcUa.Tests/``tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive/`
(`<AssemblyName>` kept as `ZB.MOM.WW.OtOpcUa.Tests` so v1 Host's `InternalsVisibleTo`
still matches; `<IsTestProject>false</IsTestProject>` so solution test runs skip it).
- `tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/``<IsTestProject>false</IsTestProject>`
+ archive comment.
- `src/ZB.MOM.WW.OtOpcUa.Host/` + `src/ZB.MOM.WW.OtOpcUa.Historian.Aveva/` — archive
PropertyGroup comments. Both still build (Historian plugin + 41 historian tests still
pass) so Phase 2 PR 3 can delete them in a focused, reviewable destructive change.
- **New `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/`** test project (.NET 10):
- `ParityFixture` spawns `OtOpcUa.Driver.Galaxy.Host.exe` (net48 x86) as a subprocess via
`Process.Start`, connects via real named pipe, exposes a connected `GalaxyProxyDriver`.
Skips when Galaxy ZB unreachable / Host EXE not built / Administrator shell.
- `HierarchyParityTests` (3) and `StabilityFindingsRegressionTests` (4) — one test per
2026-04-13 stability finding (phantom probe, cross-host quality clear, sync-over-async,
fire-and-forget alarm shutdown race).
- **`docs/v2/V1_ARCHIVE_STATUS.md`** — inventory + deletion plan for PR 3.
- **`docs/v2/implementation/exit-gate-phase-2-final.md`** — supersedes the two partial-exit
docs with the as-built state, adversarial review of PR 2 deltas (4 new findings), and the
recommended PR sequence (1 → 2 → 3 → 4).
## What's NOT in this PR
- Deletion of the v1 archive — saved for PR 3 with explicit operator review (destructive change).
- Wonderware Historian SDK plugin port — Task B.1.h, follow-up to enable real `HistoryRead`.
- MxAccess subscription push-frames — Task B.1.s, follow-up to enable real-time
data-change push from Host → Proxy.
## Tests
**`dotnet test ZB.MOM.WW.OtOpcUa.slnx`**: **470 pass / 7 skip / 1 pre-existing baseline**.
The 7 skips are the new E2E tests, all skipping with the documented reason
"PipeAcl denies Administrators on dev shells" — the production install runs as a non-admin
service account and these tests will execute there.
Run the archived v1 suites explicitly:
```powershell
dotnet test tests/ZB.MOM.WW.OtOpcUa.Tests.v1Archive # → 494 pass
dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests # → 6 pass
```
## Test plan for reviewers
- [ ] `dotnet build ZB.MOM.WW.OtOpcUa.slnx` succeeds with no warnings beyond the known
NuGetAuditSuppress + NU1702 cross-FX
- [ ] `dotnet test ZB.MOM.WW.OtOpcUa.slnx` shows the 470/7-skip/1-baseline result
- [ ] Both archived suites pass when run explicitly
- [ ] Build the Galaxy.Host EXE (`dotnet build src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`),
then run E2E tests on a non-admin shell — they should actually execute and pass
against live Galaxy ZB
- [ ] Spot-read `docs/v2/V1_ARCHIVE_STATUS.md` and confirm the deletion plan is acceptable
## Follow-up tracking
- **PR 3** (next session, when ready): execute the deletion plan in `V1_ARCHIVE_STATUS.md`.
4 projects removed, .slnx updated, full solution test confirms parity.
- **PR 4** (Phase 2 follow-up): port Historian plugin + wire MxAccess subscription pushes +
close the high/medium open findings from `exit-gate-phase-2-final.md`.
+91
View File
@@ -0,0 +1,91 @@
# PR 4 — Phase 2 follow-up: close the 4 open MXAccess findings
**Source**: `phase-2-pr4-findings` (branched from `phase-2-stream-d`)
**Target**: `v2`
## Summary
Closes the 4 high/medium open findings carried forward in `exit-gate-phase-2-final.md`:
- **High 1 — `ReadAsync` subscription-leak on cancel.** One-shot read now wraps the
subscribe→first-OnDataChange→unsubscribe pattern in a `try/finally` so the per-tag
callback is always detached, and if the read installed the underlying MXAccess
subscription itself (no other caller had it), it tears it down on the way out.
- **High 2 — No reconnect loop on the MXAccess COM connection.** New
`MxAccessClientOptions { AutoReconnect, MonitorInterval, StaleThreshold }` + a background
`MonitorLoopAsync` that watches a stale-activity threshold + probes the proxy via a
no-op COM call, then reconnects-with-replay (re-Register, re-AddItem every active
subscription) when the proxy is dead. Liveness signal: every `OnDataChange` callback bumps
`_lastObservedActivityUtc`. Defaults match v1 monitor cadence (5s poll, 60s stale).
`ReconnectCount` exposed for diagnostics; `ConnectionStateChanged` event for downstream
consumers (the supervisor on the Proxy side already surfaces this through its
HeartbeatMonitor, but the Host-side event lets local logging/metrics hook in).
- **Medium 3 — `MxAccessGalaxyBackend.SubscribeAsync` doesn't push OnDataChange frames back to
the Proxy.** New `IGalaxyBackend.OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged`
events that the new `GalaxyFrameHandler.AttachConnection` subscribes per-connection and
forwards as outbound `OnDataChangeNotification` / `AlarmEvent` /
`RuntimeStatusChange` frames through the connection's `FrameWriter`. `MxAccessGalaxyBackend`
fans out per-tag value changes to every `SubscriptionId` that's listening to that tag
(multiple Proxy subs may share a Galaxy attribute — single COM subscription, multi-fan-out
on the wire). Stub + DbBacked backends declare the events with `#pragma warning disable
CS0067` (treat-warnings-as-errors would otherwise fail on never-raised events that exist
only to satisfy the interface).
- **Medium 4 — `WriteValuesAsync` doesn't await `OnWriteComplete`.** New
`WriteAsync(...)` overload returns `bool` after awaiting the OnWriteComplete callback via
the v1-style `TaskCompletionSource`-keyed-by-item-handle pattern in `_pendingWrites`.
`MxAccessGalaxyBackend.WriteValuesAsync` now reports per-tag `Bad_InternalError` when the
runtime rejected the write, instead of false-positive `Good`.
## Pipe server change
`IFrameHandler` gains `AttachConnection(FrameWriter writer): IDisposable` so the handler can
register backend event sinks on each accepted connection and detach them at disconnect. The
`PipeServer.RunOneConnectionAsync` calls it after the Hello handshake and disposes it in the
finally of the per-connection scope. `StubFrameHandler` returns `IFrameHandler.NoopAttachment.Instance`
(net48 doesn't support default interface methods, so the empty-attach lives as a public nested
class).
## Tests
**`dotnet test ZB.MOM.WW.OtOpcUa.slnx`**: **460 pass / 7 skip (E2E on admin shell) / 1
pre-existing baseline failure**. No regressions. The Driver.Galaxy.Host unit tests + 5 live
ZB smoke + 3 live MXAccess COM smoke all pass unchanged.
## Test plan for reviewers
- [ ] `dotnet build` clean
- [ ] `dotnet test` shows 460/7-skip/1-baseline
- [ ] Spot-check `MxAccessClient.MonitorLoopAsync` against v1's `MxAccessClient.Monitor`
partial (`src/ZB.MOM.WW.OtOpcUa.Host/MxAccess/MxAccessClient.Monitor.cs`) — same
polling cadence, same probe-then-reconnect-with-replay shape
- [ ] Read `GalaxyFrameHandler.ConnectionSink.Dispose` and confirm event handlers are
detached on connection close (no leaked invocation list refs)
- [ ] `WriteValuesAsync` returning `Bad_InternalError` on a runtime-rejected write is the
correct shape — confirm against the v1 `MxAccessClient.ReadWrite.cs` pattern
## What's NOT in this PR
- Wonderware Historian SDK plugin port (Task B.1.h) — separate PR, larger scope.
- Alarm subsystem wire-up (`MxAccessGalaxyBackend.SubscribeAlarmsAsync` is still a no-op).
`OnAlarmEvent` is declared on the backend interface and pushed by the frame handler when
raised; `MxAccessGalaxyBackend` just doesn't raise it yet (waits for the alarm-tracking
port from v1's `AlarmObjectFilter` + Galaxy alarm primitives).
- Host-status push (`OnHostStatusChanged`) — declared on the interface and pushed by the
frame handler; `MxAccessGalaxyBackend` doesn't raise it (the Galaxy.Host's
`HostConnectivityProbe` from v1 needs porting too, scoped under the Historian PR).
## Adversarial review
Quick pass over the PR 4 deltas. No new findings beyond:
- **Low 1**`MonitorLoopAsync`'s `$Heartbeat` probe item-handle is leaked
(`AddItem` succeeds, never `RemoveItem`'d). Cosmetic — the probe item is internal to
the COM connection, dies with `Unregister` at disconnect/recycle. Worth a follow-up
to call `RemoveItem` after the probe succeeds.
- **Low 2** — Replay loop in `MonitorLoopAsync` swallows per-subscription failures. If
Galaxy permanently rejects a previously-valid reference (rare but possible after a
re-deploy), the user gets silent data loss for that one subscription. The stub-handler-
unaware operator wouldn't notice. Worth surfacing as a `ConnectionStateChanged(false)
→ ConnectionStateChanged(true)` payload that includes the replay-failures list.
Both are low-priority follow-ups, not PR 4 blockers.
-233
View File
@@ -1,233 +0,0 @@
# Modbus tag-addressing reference
Foundational doc for the Modbus addressing grammar shipped across #136#144.
Covers the address-string parser (`ModbusAddressParser`) that the wire driver
and the Admin UI both consume, the per-tag suffix modifiers, and the family-
native branch.
## Grammar
```
<region><offset>[.<bit>][:<type>[<len>]][:<order>][:<count>]
```
Each field is optional from left to right; the parser fills defaults.
### Region + offset
Three accepted forms — pick whichever matches your tag spreadsheet's
convention. All three resolve to the same `(Region, ushort PduOffset)`
on the wire.
| Form | Example | Means |
|---|---|---|
| Modicon 5-digit | `40001` | Holding register 1 (PDU 0) |
| Modicon 6-digit | `400001` | Holding register 1 (PDU 0); supports up to `465536` (PDU 65535) |
| Mnemonic | `HR1`, `IR1`, `C100`, `DI1` | Same regions; `1`-based register number |
Modicon leading-digit → region:
| Digit | Region | OPC UA wire FC |
|---|---|---|
| `0` | Coils | FC01 / FC05 / FC15 |
| `1` | DiscreteInputs | FC02 (read-only) |
| `3` | InputRegisters | FC04 (read-only) |
| `4` | HoldingRegisters | FC03 / FC06 / FC16 |
### Bit suffix `.N`
`40001.5` = bit 5 (LSB-first) of HR[0]. Implies `DataType=BitInRegister`;
mixing with an explicit type or array-count is rejected.
### Type code `:T`
Codes verified 2026-04-25 against [Wonderware DASMBTCP user
guide](https://cdn.logic-control.com/media/DASMBTCP.pdf) and the
[Ignition Modbus addressing
manual](https://www.docs.inductiveautomation.com/docs/8.1/ignition-modules/opc-ua/opc-ua-drivers/modbus/modbus-addressing).
The `I` / `UI` / `I_64` / `UI_64` / `BCD_32` shapes match Wonderware's
suffix convention and Ignition's underscore-N prefix variants where
those vendors agree.
| Code | Type | Registers | Vendor reference |
|---|---|---|---|
| `BOOL` | Boolean | 1 (region must be Coils / DiscreteInputs) | universal |
| `S` | Int16 | 1 | Wonderware DASMBTCP `S` = 16-bit signed |
| `US` | UInt16 | 1 | Ignition `HRUS` = Unsigned Short |
| `I` | Int32 | 2 | Wonderware DASMBTCP `I` = 32-bit signed; Ignition `HRI` |
| `UI` | UInt32 | 2 | Ignition `HRUI` |
| `I_64` | Int64 | 4 | Ignition `HRI_64` |
| `UI_64` | UInt64 | 4 | Ignition `HRUI_64` |
| `F` | Float32 | 2 | Wonderware `F`; Ignition `HRF` |
| `D` | Float64 | 4 | Ignition `HRD` |
| `BCD` | 16-bit BCD | 1 | Ignition `HRBCD` |
| `BCD_32` | 32-bit BCD | 2 | Ignition `HRBCD_32` |
| `STR<len>` | ASCII string, `len` chars (2 chars / register) | `ceil(len/2)` | analogous to Ignition `HRS<addr>:<len>` |
Default when omitted:
- Coils / DiscreteInputs → `BOOL`
- HoldingRegisters / InputRegisters → `S` (Int16) — matches Ignition's bare-`HR` default
**Codes removed in #146** (silent wrong-data risk, never compatible with the
two reference vendors): `:DI`, `:L`, `:UDI`, `:UL`, `:LI`, `:ULI`, `:LBCD`.
Pre-#146 configs that use these get a clear "Unknown type code" diagnostic at
parse time; rewrite to the post-#146 codes per the table above.
### Byte order `:O`
| Mnemonic | Meaning | Wire |
|---|---|---|
| `ABCD` | Big-endian (Modbus spec default) | `[A,B,C,D]` |
| `CDAB` | Word swap (Siemens, several AB) | `[C,D,A,B]` |
| `BADC` | Byte swap (legacy little-endian-internal devices) | `[B,A,D,C]` |
| `DCBA` | Full reverse (some EtherNet/IP gateways) | `[D,C,B,A]` |
For 8-byte values (Int64 / Float64) the same labels apply pairwise.
### Array count `:N`
`40001:F:5` = `Float32[5]` (consumes HR[0..9]). Array + bit suffix is
rejected. Strings are not arrays.
### Composition
The 3-field shorthand `40001:F:5` is parsed as `(type=F, count=5)` because
`5` isn't a valid byte-order mnemonic. Use the explicit 4-field form
`40001:F:CDAB:5` when you need a non-default order.
## Family-native syntax (#144)
When the driver instance has `Family != Generic`, the parser tries the
family's native syntax FIRST, then falls back to Modicon / mnemonic.
### DL205 (AutomationDirect DirectLOGIC)
| Form | Example | Mapping |
|---|---|---|
| `Vnnnn` (octal) | `V2000` | HoldingRegisters[1024] (octal 2000 = decimal 1024) |
| `Ynn` (octal) | `Y17` | Coils[2048 + 15] (Y-output base + offset) |
| `Cnn` (octal) | `C100` | Coils[3072 + 64] (C-relay base + offset) |
| `Xnn` (octal) | `X17` | DiscreteInputs[15] |
| `SPnn` (octal) | `SP10` | DiscreteInputs[1024 + 8] |
**Cross-family ambiguity**: `C100` means Coils[99] under `Generic`
(mnemonic) but Coils[3136] under `DL205`. Per-driver Family choice
disambiguates.
### MELSEC (Mitsubishi)
| Form | Example | Mapping (sub-family Q_L_iQR / F_iQF) |
|---|---|---|
| `Dnnn` (decimal) | `D100` | HoldingRegisters[100] |
| `Mnnn` (decimal) | `M50` | Coils[50] |
| `Xnn` | `X20` | DiscreteInputs[32 hex / 16 octal] |
| `Ynn` | `Y20` | Coils[32 hex / 16 octal] |
X / Y digit interpretation depends on `MelsecSubFamily`:
- `Q_L_iQR` → hex (default)
- `F_iQF` → octal
Bank-base offsets default to 0 in the grammar string. Sites with non-zero
"Modbus Device Assignment" bases use the structured tag form.
## Driver-instance options
Beyond per-tag addressing, `ModbusDriverOptions` exposes (#139#143):
### Connection (#139)
- `KeepAlive { Enabled, Time, Interval, RetryCount }` — TCP-level probes.
Defaults match the historical PR 53 wire output (Enabled=true, Time=30s,
Interval=10s, RetryCount=3).
- `IdleDisconnectTimeout` — proactively close + reconnect after this much
socket idle time. Default null = disabled.
- `Reconnect { InitialDelay, MaxDelay, BackoffMultiplier }` — geometric
backoff for the post-drop reconnect loop. Default
`(0, 30s, 2.0)` = immediate first retry, geometric thereafter.
### Protocol (#140)
- `MaxCoilsPerRead` (default 2000) — separate cap for FC01/FC02 coil reads.
- `UseFC15ForSingleCoilWrites` — force FC15 (write multiple coils
qty=1) for single-coil writes. Safety/audit PLCs may require this.
- `UseFC16ForSingleRegisterWrites` — same for FC16 vs FC06.
- `DisableFC23` — kill switch for FC23 (currently unused; reserved).
### Subscribe (#141)
- Per-tag `Deadband` — suppress sub-threshold publishes on numeric tags.
- `WriteOnChangeOnly` (driver-level) — short-circuit identical-value
writes. Cache invalidates on read-divergence.
### Multi-unit (#142)
- Per-tag `UnitId` — overrides the driver-level UnitId in the MBAP
header. Required for one-Ethernet-gateway / N-RTU-slave deployments.
- `IPerCallHostResolver.ResolveHost` returns `host:port/unitN` per tag so
per-PLC circuit breakers fire per slave.
- Per-tag `CoalesceProhibited` — escape hatch for #143's planner (read
this tag in isolation regardless of `MaxReadGap`).
### Block-read coalescing (#143)
- `MaxReadGap` (default 0 = off) — gap budget the planner is willing to
bridge between adjacent register tags. With `MaxReadGap=10`, three tags
at HR 100/102/110 collapse into one FC03 of quantity 11.
### Coalescing auto-recovery (#148 / #150 / #151 / #152)
- A coalesced read that fails with a Modbus exception (write-only or
protected register mid-block) records the failed range as
auto-prohibited. The planner stops re-coalescing across the range; the
per-tag fallback path keeps healthy members working in the same scan.
- **Bisection (#150)**: every re-probe pass narrows multi-register
prohibitions by trying the two halves separately. Over log2(span)
ticks the prohibition pins at the actual offending register(s);
intermediate halves that succeed get cleared.
- **Periodic re-probe (#151)**: opt in via
`AutoProhibitReprobeInterval` (TimeSpan?). Default null = disabled
(prohibitions persist for the driver lifetime; clear on
`ReinitializeAsync`).
- **Per-tag escape hatch**: `CoalesceProhibited` (bool, default false)
on `ModbusTagDefinition`. The planner reads such tags in isolation
regardless of `MaxReadGap`. Use for known-bad addresses you want to
exclude from the auto-discovery loop.
- **Diagnostics (#152)**: `ModbusDriver.GetAutoProhibitedRanges()`
returns a snapshot of every active prohibition as
`ModbusAutoProhibition` records (UnitId / Region / StartAddress /
EndAddress / LastProbedUtc / BisectionPending). Surface in the
driver-diagnostics RPC channel when that wiring lands; for now
consumable by in-process callers (Server health endpoints, log
aggregation).
## JSON DTO shape
The factory accepts both the structured form (legacy) and the new
`AddressString` form per-tag. Mix freely — newer pasted rows use the
grammar string; legacy rows keep the structured fields.
```json
{
"host": "10.1.2.3",
"port": 502,
"unitId": 1,
"family": "DL205",
"keepAlive": { "enabled": true, "timeMs": 30000, "intervalMs": 10000, "retryCount": 3 },
"idleDisconnectMs": 120000,
"reconnect": { "initialDelayMs": 0, "maxDelayMs": 30000, "backoffMultiplier": 2.0 },
"maxCoilsPerRead": 2000,
"writeOnChangeOnly": false,
"maxReadGap": 8,
"tags": [
{ "name": "Temp", "addressString": "V2000:F:CDAB" },
{ "name": "Setpoint", "addressString": "40001:I" },
{ "name": "Outputs", "addressString": "Y0:5" },
{ "name": "AlarmCount", "region": "HoldingRegisters", "address": 200, "dataType": "Int16", "deadband": 5.0 }
]
}
```
## Vendor compatibility caveat
The exact spelling of type codes (e.g. `I` vs `INT`, `BCD` vs `L_BCD`) and
the byte-order mnemonics were synthesised from training-era vendor docs
(Wonderware DASMBTCP, Kepware KEPServerEX, Ignition, Matrikon, OAS).
Before locking the grammar for a production deployment, verify against
the current Kepware "Modbus Ethernet Driver Help" PDF and Ignition's
"Modbus Addressing" user-manual page — if a critical tool's mnemonics
have shifted, add aliases in `ModbusAddressParser.TryParseType` rather
than asking users to rewrite spreadsheets.
-11
View File
@@ -70,17 +70,6 @@ integration tests until reproduced on hardware:
- TxId drop under load (forum rumour; not reproduced).
- Pre-2004 firmware ABCD word order (every shipped DL205/DL260 since 2004 is CDAB).
### Siemens SIMATIC S7
Quirk catalog at [`s7.md`](s7.md) — covers S7-1200 / S7-1500 / S7-300 / S7-400 /
ET 200SP. Modbus TCP isn't native; each platform exposes it via a different
add-on module with its own register-mapping conventions.
### Mitsubishi MELSEC
Quirk catalog at [`mitsubishi.md`](mitsubishi.md) — Modbus TCP via add-on modules
across the MELSEC family.
### Future devices
One section per device class, same shape as DL205. Quirks that apply across
-46
View File
@@ -1,46 +0,0 @@
# Multi-host dispatch — per-PLC circuit breakers
Phase 6.1 decision #144 / task #135. Motivation: a single DriverInstance that fronts N PLCs (Modbus with multiple slaves, AB CIP with multiple ControlLogix chassis, etc.) must not let one dead PLC trip the resilience breaker for its healthy siblings.
This note documents the shipped contract so future driver authors don't re-derive it.
## Contract
The resilience pipeline keys on `(DriverInstanceId, HostName, DriverCapability)`. One dead PLC opens only the pipeline keyed on its HostName; healthy sibling PLCs keep their own pipelines intact.
Three participants:
1. **`DriverResiliencePipelineBuilder.GetOrCreate(driverInstanceId, hostName, capability, options)`** — the pipeline cache. First call per key builds a Polly pipeline (timeout → retry → breaker). Subsequent calls return the cached instance. Covered by `DriverResiliencePipelineBuilderTests.Pipeline_IsIsolated_PerHost`.
2. **`CapabilityInvoker.ExecuteAsync(capability, hostName, callSite, ct)`** — takes `hostName` per-call. Threads it straight through to the pipeline builder. Covered by `CapabilityInvokerTests`.
3. **`IPerCallHostResolver.ResolveHost(fullReference)`** — an optional interface a multi-device driver implements. `DriverNodeManager.ResolveHostFor` calls it on every capability dispatch so the host flowing into the invoker comes from the tag's per-PLC metadata, not the driver instance. Single-device drivers don't implement it — `DriverNodeManager` falls back to `DriverInstanceId` as the hostname, which still flows through the same `(instance, host, capability)` key shape (one pipeline per single-device instance).
End-to-end `dead PLC, healthy PLC` scenario proven by `PerCallHostResolverDispatchTests.DeadPlc_DoesNotOpenBreaker_For_HealthyPlc_With_Resolver`.
## Driver author checklist
To light up per-PLC circuit breakers on a multi-device driver:
1. **Options model** — extend the driver's options type with an explicit device list. See `AbCipDriverOptions.Devices : IReadOnlyList<AbCipDeviceConfig>`.
2. **Tag → device mapping** — parse the tag's `DeviceId` from `TagConfig`. The driver's per-tag definition records the device HostAddress alongside the wire address. See `AbCipTagDefinition.DeviceHostAddress`.
3. **`IPerCallHostResolver`** — implement it on the driver. `ResolveHost(fullReference)` looks up the tag's definition and returns the device HostAddress. Unknown references should return a deterministic fallback (e.g. the first configured device's host) rather than throw — the invoker handles the mislookup at capability level when the actual read surfaces `BadNodeIdUnknown`.
4. **Health surface**`IHostConnectivityProbe.GetHostStatuses()` returns one `HostConnectivityStatus` per configured device so the Admin UI fleet page lights the per-PLC status distinctly.
5. **Transport per device** — one network connection per PLC, serialized per device via `SemaphoreSlim` (or equivalent). Do not share a transport across PLCs; the breaker-isolation guarantee disappears if they share a queue.
## Current fleet status (2026-04-24)
| Driver | Per-tag device | `IPerCallHostResolver` | Per-PLC breaker isolation |
|---|---|---|---|
| AB CIP | ✅ `DeviceId` | ✅ | ✅ live |
| AB Legacy | 1 device / instance | — (not needed) | trivial |
| Modbus | 1 device / instance today | — | trivial — multi-device refactor tracked separately |
| S7 | 1 device / instance today | — | trivial — same |
| TwinCAT | 1 device / instance today | — | trivial — same |
| FOCAS | 1 CNC / instance | — (not needed) | trivial |
| Galaxy | 1 Galaxy Host / instance | — (not needed) | trivial — Host recycle runs per instance |
| OPC UA Client | 1 upstream / instance | — (not needed) | trivial |
"Trivial" above means the pipeline key ends up as `(DriverInstanceId, DriverInstanceId, capability)` via `DriverNodeManager.ResolveHostFor`'s fallback — one pipeline per driver instance, which is correct for single-device drivers.
Extending Modbus / S7 / TwinCAT to multi-device follows the AB CIP template verbatim; it's per-driver surgery (schema row + options model + resolver implementation + transport fan-out) rather than shared-infrastructure work.
+13 -19
View File
@@ -689,7 +689,7 @@ Galaxy.Proxy ──→ Galaxy.Shared ←── Galaxy.Host
**Decided:**
- Mono-repo (Decision #31 above).
- `Core.Abstractions` is **internal-only for now** — no standalone NuGet. Keep the contract mutable while the first 8 drivers are being built; revisit publishing after the driver fleet (originally Phase 5, folded into the Phase 3 umbrella — see exit gate) once the shape has stabilized. Design the contract *as if* it will eventually be public (no leaky types, stable names) to minimize churn later.
- `Core.Abstractions` is **internal-only for now** — no standalone NuGet. Keep the contract mutable while the first 8 drivers are being built; revisit publishing after Phase 5 when the shape has stabilized. Design the contract *as if* it will eventually be public (no leaky types, stable names) to minimize churn later.
---
@@ -742,30 +742,24 @@ Each step leaves the system runnable. The generic extraction is effectively free
10. **Build `Galaxy.Proxy`** — .NET 10 in-process proxy implementing IDriver interfaces, forwarding over IPC
11. **Validate parity** — v2 Galaxy driver must pass the same integration tests as v1
**Phase 3 — Driver fleet (all seven non-Galaxy drivers) — ✅ CLOSED 2026-04-23** (see [`implementation/exit-gate-phase-3.md`](implementation/exit-gate-phase-3.md))
**Phase 3 — Modbus TCP driver (prove the abstraction)**
12. **Build `Driver.ModbusTcp`** — NModbus, config-driven tags from central DB, internal poll loop, device-as-folder hierarchy
13. **Add Modbus config screens to Admin** (first driver-specific config UI)
Originally split across Phase 3 (Modbus alone), Phase 4 (PLC drivers), and
Phase 5 (specialty drivers). In execution, once `Core.Abstractions` had
stabilised under Phase 1 + Phase 2, each driver landed as its own stream
rather than as a gated mini-phase; the phase numbers were folded into a
single umbrella. Shipped:
**Phase 4 — PLC drivers**
14. **Build `Driver.AbCip`** — libplctag, ControlLogix/CompactLogix symbolic tags + Admin config screens
15. **Build `Driver.AbLegacy`** — libplctag, SLC 500/MicroLogix file-based addressing + Admin config screens
16. **Build `Driver.S7`** — S7netplus, Siemens S7-300/400/1200/1500 + Admin config screens
17. **Build `Driver.TwinCat`** — Beckhoff.TwinCAT.Ads v6, native ADS notifications, symbol upload + Admin config screens
12. **`Driver.Modbus`** — NModbus, config-driven tags, internal poll loop, device-as-folder hierarchy (umbrella closure #210)
13. **`Driver.AbCip`** — libplctag, ControlLogix/CompactLogix symbolic tags (#211, live-booted under #220)
14. **`Driver.AbLegacy`** — libplctag, SLC 500 / MicroLogix / PLC-5 file-based addressing (#213, live-booted under #222)
15. **`Driver.S7`** — S7netplus, Siemens S7-300/400/1200/1500 (#212, live-booted under #220)
16. **`Driver.TwinCAT`** — Beckhoff.TwinCAT.Ads v7, native ADS notifications, symbol upload (factory wired 2026-04-23; wire-live deferred, #221)
17. **`Driver.FOCAS`** — FANUC FOCAS2 P/Invoke via Tier-C out-of-process `Driver.FOCAS.Host` (#220 five-PR split; wire-live deferred, #222 follow-up)
18. **`Driver.OpcUaClient`** — OPC UA client gateway / aggregation, namespace remapping, subscription proxying (scaffold #66; live-boot 5/8 stages via `test-opcuaclient.ps1`)
Supporting infrastructure: `DriverFactoryRegistry` + `DriverInstanceBootstrapper`
(#248); per-driver test-client CLI suite (#249#251); e2e test scripts with
aggregate runner (#253); server-side factory + seed SQL per driver (#210#213).
**Phase 5 — Specialty drivers**
18. **Build `Driver.Focas`**FANUC FOCAS2 P/Invoke, pre-defined CNC tag set, PMC/macro config + Admin config screens
19. **Build `Driver.OpcUaClient`**OPC UA client gateway/aggregation, namespace remapping, subscription proxying + Admin config screens
**Decided:**
- **Parity test for Galaxy**: existing v1 IntegrationTests suite + scripted Client.CLI walkthrough (see Section 4 above).
- **Timeline**: no hard deadline. Each phase ships when it's right — tests passing, Galaxy parity bar met. Quality cadence over calendar cadence.
- **FOCAS SDK**: license already secured. FOCAS driver shipped as part of the Phase 3 umbrella with Tier-C host; `Fwlib64.dll` available for P/Invoke (wire-level live-boot gated on lab rig, #222 follow-up).
- **FOCAS SDK**: license already secured. Phase 5 can proceed as scheduled; `Fwlib64.dll` available for P/Invoke.
---
-128
View File
@@ -1,128 +0,0 @@
# Redundancy Interop Playbook (Phase 6.3 Stream F — task #150)
> **Scope**: manual validation that third-party OPC UA clients + AVEVA MXAccess
> observe our non-transparent redundancy signals (ServiceLevel, ServerUriArray,
> RedundancySupport) and fail over to the Backup node when the Primary drops.
>
> **Why manual**: the third-party clients named here are Windows-GUI binaries
> (UaExpert, Kepware QuickClient) or embedded inside AVEVA System Platform.
> Automating any of them into PR-CI is out of scope for v2. This playbook
> captures the minimal dev-box-plus-VM setup and the expected pass criteria so
> the work can be executed repeatably at v2 release readiness and after any
> Phase 6.3 follow-up change.
## Prerequisites
1. Two `OtOpcUa.Server` nodes in one `ServerCluster`:
- Declared as `NodeCount = 2`, `RedundancyMode = Hot` (or `Warm`).
- Each with a distinct `ApplicationUri` (enforced by unique index per
decision #86).
- Each node's `StaticRoutes.xml` points at the other (`ServerCluster.Node[].Host`).
2. `scripts/install/Install-Services.ps1` applied on each node so the
`RedundancyPublisherHostedService` is running.
3. At least one `DriverInstance` with a reachable simulator or PLC so both
servers have a non-empty address space to browse.
4. On the client host:
- `UaExpert` ≥ 1.7 installed
- Kepware `ClientAce QuickClient` (or equivalent) — optional, for a second
client
5. For the AVEVA leg: a `Galaxy.Host` running against an MXAccess deployment
with an external OPC UA client object pointed at the cluster (not at a
single node).
## Expected signals on a running cluster
| Node | `ServiceLevel` | `RedundancySupport` | `ServerUriArray` |
|---|---|---|---|
| Primary, healthy, peer reachable | 200 | `Hot` (or `Warm`) | self + peer |
| Primary, mid-apply | 75 (`PrimaryMidApply`) | same | same |
| Primary, peer UNreachable | 150 (`PrimaryPeerDown`) | same | same |
| Backup, healthy | 100 (`Secondary`) | same | same |
| Either, dwelling in recovery | 50 (`Recovering`) | same | same |
| Either, invariant violation (two Primary, disabled-node mismatch) | 2 (`InvalidTopology`) | same | same |
(The band constants live in `ServiceLevelCalculator.Classify`.)
## Test matrix
Each row is one manual run; pass criterion in the right column.
### Block A — UA protocol signals (UaExpert)
| # | Scenario | Procedure | Pass criterion |
|---|---|---|---|
| A1 | ServiceLevel published | Connect UaExpert to Primary. Browse to `Server.ServerStatus.ServiceLevel`. | Value = 200 (or the expected Band byte per table above) |
| A2 | ServiceLevel updates on peer down | Connect to Primary. Stop Backup (`sc stop OtOpcUa`). Watch `ServiceLevel`. | Transitions 200 → 150 within ~2 s of peer probe timeout |
| A3 | RedundancySupport | Browse to `Server.ServerRedundancy.RedundancySupport`. | Value matches the declared `RedundancyMode` (Warm / Hot / None) |
| A4 | ServerUriArray (non-transparent upgrade) | Requires a redundancy-object-type upgrade follow-up. | When upgrade lands: `ServerUriArray` reports both ApplicationUris, self first |
| A5 | Mid-apply dip | On Primary trigger a `sp_PublishGeneration` apply. | `ServiceLevel` drops to 75 for the apply duration + dwell |
### Block B — Client failover
| # | Scenario | Procedure | Pass criterion |
|---|---|---|---|
| B1 | UaExpert picks Primary by ServiceLevel | In UaExpert configure a Redundancy Group with both endpoint URLs. | Client picks the Primary URL (higher ServiceLevel) |
| B2 | UaExpert cuts over on Primary kill | Kill the Primary's `OtOpcUa` service. | Client session reconnects to Backup within UaExpert's reconnect timeout (default 5 s). Data-change monitored items resume. |
| B3 | UaExpert cuts back when Primary returns | Start the Primary service. Wait ≥ recovery dwell (see `RecoveryStateManager.DwellTime`). | `ServiceLevel` on returning Primary goes through 50 (Recovering) → 200; UaExpert may or may not switch back (client-policy dependent; both are accepted outcomes) |
| B4 | Kepware QuickClient failover | Repeat B1B3 with Kepware in place of UaExpert. | Same pass criteria; establishes we're not UaExpert-specific |
### Block C — Galaxy MXAccess failover
This block validates that an AVEVA System Platform app consuming our cluster
via MXAccess tolerates a Primary drop the same way a native OPC UA client does.
The MXAccess toolkit internally wraps the OPC UA Client and does its own
redundancy negotiation; we're asserting that negotiation honors our
`ServiceLevel` signal.
| # | Scenario | Procedure | Pass criterion |
|---|---|---|---|
| C1 | Galaxy binds to Primary on first connect | Bring the cluster up. Start a Galaxy `$MxAccessClient` object pointed at the cluster with both node URLs. | Galaxy reports `QUALITY = Good` + initial values from the Primary |
| C2 | Galaxy redirects on Primary drop | Stop the Primary. | Galaxy's `QUALITY` briefly goes `Uncertain`, then back to `Good`; values continue streaming from the Backup within MXAccess's `ReconnectInterval` (default 20 s) |
| C3 | Galaxy handles mid-apply dip | Trigger a generation apply on the Primary. | Galaxy continues reading — the mid-apply dip is advertisory (ServiceLevel 75), not a session drop; MXAccess should stay bound |
## Recording results
Copy the tables above into a tracking doc per run. The tracking doc shape:
```
Run date: 2026-MM-DD
Cluster: <id> Primary: <node> Backup: <node> Release: <sha>
A1: PASS evidence: UaExpert screenshot uaexpert-a1.png
A2: PASS evidence: ServiceLevel trend grafana-a2.png
```
One pass of every row is the acceptance criterion. Re-run after any Phase 6.3
follow-up ships (especially the non-transparent redundancy-type upgrade, which
flips A4 from "deferred" to "expected pass").
## Known limitations
- **A4 pending**: `Server.ServerRedundancy` on our current SDK build lands as
the base `ServerRedundancyState`, which has no `ServerUriArray` child.
`ServerRedundancyNodeWriter.ApplyServerUriArray` logs-and-skips until the
redundancy-object-type upgrade follow-up lands.
- **Recovery dwell default**: `RecoveryStateManager.DwellTime` defaults to 60 s
in `Program.cs`. Adjust via future config knob if B3 takes too long to
observe.
- **C-block external dependency**: The `Galaxy.Host` side of the redundancy
story is largely out of our code — it's MXAccess's own client-redundancy
policy talking to our published ServiceLevel. A negative result on C1-C3
does not necessarily indicate an OtOpcUa bug; cross-check with UaExpert
(Block A / B) first.
## Automation notes (why this is a playbook, not a test)
- UaExpert and Kepware binaries are closed-source Windows GUIs; they don't
ship headless CLIs for the browse/connect/subscribe flows.
- The OPC Foundation reference SDK *can* drive every scenario, but our own
`Driver.OpcUaClient` tests already cover that client's behaviour; Block B
adds value specifically because these two clients have independent
redundancy implementations we don't control.
- For the sub-set of scenarios that *can* be automated — the self-loopback
case where our own `otopcua-cli` drives Primary + Backup — the existing
`tests/ZB.MOM.WW.OtOpcUa.Server.Tests/RedundancyStatePublisherTests` +
`ServiceLevelCalculatorTests` (unit) + `ClusterTopologyLoaderTests`
(integration) already cover the math + data path. The wire-level assertion
that the values actually land on the right OPC UA nodes is covered by
`ServerRedundancyNodeWriterTests`.
+1012
View File
File diff suppressed because it is too large Load Diff
+40 -61
View File
@@ -1,7 +1,7 @@
# v2 Release Readiness
> **Last updated**: 2026-04-24 (Phase 5 driver complement closed — AB CIP, AB Legacy, TwinCAT, FOCAS all shipped; FOCAS Tier-C retired for a pure-managed in-process client)
> **Status**: **RELEASE-READY (code-path)** for v2 GA. All three original code-path release blockers remain closed. Phase 5 is now complete. Remaining work is manual (live-hardware validations, client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.
> **Last updated**: 2026-04-19 (all three release blockers CLOSED — Phase 6.3 Streams A/C core shipped)
> **Status**: **RELEASE-READY (code-path)** for v2 GA — all three code-path release blockers are closed. Remaining work is manual (client interop matrix, deployment checklist signoff, OPC UA CTT pass) + hardening follow-ups; see exit-criteria checklist below.
This doc is the single view of where v2 stands against its release criteria. Update it whenever a deferred follow-up closes or a new release blocker is discovered.
@@ -14,78 +14,67 @@ This doc is the single view of where v2 stands against its release criteria. Upd
| Phase 2 — Galaxy driver split (Proxy/Host/Shared) | ✓ | Shipped |
| Phase 3 — OPC UA server + LDAP + security profiles | ✓ | Shipped |
| Phase 4 — Redundancy scaffold (entities + endpoints) | ✓ | Shipped (runtime closes in 6.3) |
| Phase 5 — Drivers | ✓ | **Shipped** Galaxy, Modbus (+ DL205/S7/MELSEC profiles), S7 native, OPC UA Client, AB CIP, AB Legacy, TwinCAT ADS, FOCAS (managed wire client) |
| Phase 6.1 — Resilience & Observability | ✓ | Shipped (PRs #7883) |
| Phase 6.2 — Authorization runtime | ◐ core | Core shipped (PRs #8488, #94 dispatch wiring); finer-grained Browse/Subscribe/Alarm/Call gating + 3-user interop matrix deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | Core shipped (PRs #8990, #9899); peer-probe HostedServices, OPC UA variable-node binding, `sp_PublishGeneration` lease wrap, client interop matrix deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer + Identification | Data layer + OPC 40010 Identification folder shipped (PRs #9192, Identification audit close-out 2026-04-23); Blazor UI pieces deferred |
| Phase 5 — Drivers | ⚠ partial | Galaxy / Modbus / S7 / OpcUaClient shipped; AB CIP / AB Legacy / TwinCAT / FOCAS deferred (task #120) |
| Phase 6.1 — Resilience & Observability | ✓ | **SHIPPED** (PRs #7883) |
| Phase 6.2 — Authorization runtime | ◐ core | **SHIPPED (core)** (PRs #8488); dispatch wiring + Admin UI deferred |
| Phase 6.3 — Redundancy runtime | ◐ core | **SHIPPED (core)** (PRs #8990); coordinator + UA-node wiring + Admin UI + interop deferred |
| Phase 6.4 — Admin UI completion | ◐ data layer | **SHIPPED (data layer)** (PRs #9192); Blazor UI + OPC 40010 address-space wiring deferred |
**Driver integration-test counts** (end-to-end against live or simulated targets): Modbus 26, FOCAS 9, AbCip 7, OpcUaClient 3, S7 3, AbLegacy 2, TwinCAT 2. Plus Galaxy's separate cross-FX parity/stability suite.
**Aggregate test counts** (2026-04-19 baseline): 1159 passing across the solution. One pre-existing Client.CLI `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake tracked separately. Rerun `dotnet test ZB.MOM.WW.OtOpcUa.slnx` after the FOCAS migration commits land to refresh the number.
**Aggregate test counts:** 906 baseline (pre-Phase-6) → **1159 passing** across Phase 6. One pre-existing Client.CLI `SubscribeCommandTests.Execute_PrintsSubscriptionMessage` flake tracked separately.
## Release blockers (must close before v2 GA)
All code-path release blockers are closed. The remaining items are live-hardware / manual validations listed under exit criteria.
Ordered by severity + impact on production fitness.
### ~~Security — Phase 6.2 dispatch wiring~~ (task #143**CLOSED** 2026-04-19, PR #94)
**Closed**. `AuthorizationGate` + `NodeScopeResolver` thread through `OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager`. `OnReadValue` + `OnWriteValue` + all four HistoryRead paths call `gate.IsAllowed(identity, operation, scope)` before the invoker. Production deployments activate enforcement by constructing `OpcUaApplicationHost` with an `AuthorizationGate(StrictMode: true)` + populating the `NodeAcl` table.
**Closed**. `AuthorizationGate` + `NodeScopeResolver` now thread through `OpcUaApplicationHost → OtOpcUaServer → DriverNodeManager`. `OnReadValue` + `OnWriteValue` + all four HistoryRead paths call `gate.IsAllowed(identity, operation, scope)` before the invoker. Production deployments activate enforcement by constructing `OpcUaApplicationHost` with an `AuthorizationGate(StrictMode: true)` + populating the `NodeAcl` table.
Remaining Stream C surfaces (hardening, not release-blocking):
Additional Stream C surfaces (not release-blocking, hardening only):
- ~~Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per `acl-design.md` §Browse.~~ **Partial, 2026-04-24.** `DriverNodeManager.Browse` override post-filters the `ReferenceDescription` list via a new `FilterBrowseReferences` helper — denied nodes disappear silently per OPC UA convention. Ancestor-visibility implication (Read-grant at `Line/Tag` implying Browse on `Line`) still to ship; needs a subtree-has-any-grant query on the trie evaluator. `TranslateBrowsePathsToNodeIds` surface not yet wired.
- ~~CreateMonitoredItems + TransferSubscriptions gating with per-item `(AuthGenerationId, MembershipVersion)` stamp so revoked grants surface `BadUserAccessDenied` within one publish cycle (decision #153).~~ **Partial, 2026-04-24.** `DriverNodeManager.CreateMonitoredItems` override pre-gates each request and pre-populates `BadUserAccessDenied` into the errors slot for denied items (the base stack honours pre-set errors and skips those items). Decision #153's per-item `(AuthGenerationId, MembershipVersion)` stamp for detecting mid-subscription revocation is still to ship — needs subscription-layer plumbing. TransferSubscriptions not yet wired (same pattern).
- ~~Alarm Acknowledge / Confirm / Shelve gating.~~ **Partial, 2026-04-24.** Acknowledge + Confirm map to dedicated `OpcUaOperation.AlarmAcknowledge` / `AlarmConfirm` via `MapCallOperation`; Shelve falls through to generic `OpcUaOperation.Call` (needs per-instance method NodeId resolution to distinguish — follow-up).
- ~~Call (method invocation) gating.~~ **Closed 2026-04-24.** `DriverNodeManager.Call` override pre-gates each `CallMethodRequest` via `GateCallMethodRequests`. Denied calls return `BadUserAccessDenied` without running the method. Alarm methods map to alarm-specific operation kinds; everything else gates as generic `Call`.
- ~~Finer-grained scope resolution — current `NodeScopeResolver` returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.~~ **Closed 2026-04-24.** `AuthorizationBootstrap` now loads `NodeAcl` rows for the current generation into a `PermissionTrieCache`, builds the gate, and merges every registered driver's `EquipmentNamespaceContent` into a full-path `NodeScopeResolver` index. `OpcUaServerService` calls the bootstrap after the equipment registry is populated, before `OpcUaApplicationHost.StartAsync`. Disabled by default — operators flip `Node:Authorization:Enabled=true` to enforce, `StrictMode=true` to reject anonymous/no-groups identities.
- Browse + TranslateBrowsePathsToNodeIds gating with ancestor-visibility logic per `acl-design.md` §Browse.
- CreateMonitoredItems + TransferSubscriptions gating with per-item `(AuthGenerationId, MembershipVersion)` stamp so revoked grants surface `BadUserAccessDenied` within one publish cycle (decision #153).
- Alarm Acknowledge / Confirm / Shelve gating.
- Call (method invocation) gating.
- Finer-grained scope resolution — current `NodeScopeResolver` returns a flat cluster-level scope. Joining against the live Configuration DB to populate UnsArea / UnsLine / Equipment path is tracked as Stream C.12.
- 3-user integration matrix covering every operation × allow/deny.
These are additional hardening — the three highest-value surfaces (Read / Write / HistoryRead) are now gated, which covers the base-security gap for v2 GA.
### ~~Config fallback — Phase 6.1 Stream D wiring~~ (task #136**CLOSED** 2026-04-19, PR #96)
**Closed**. `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag` end-to-end; `/healthz` surfaces the stale flag.
**Closed**. `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag` end-to-end: bootstrap calls go through the timeout → retry → fallback-to-sealed pipeline; every central-DB success writes a fresh sealed snapshot so the next cache-miss has a known-good fallback; `StaleConfigFlag.IsStale` is now consumed by `HealthEndpointsHost.usingStaleConfig` so `/healthz` body reports reality.
Remaining follow-ups (hardening):
Production activation: Program.cs switches `NodeBootstrap → SealedBootstrap` + constructs `OpcUaApplicationHost` with the `StaleConfigFlag` as an optional ctor parameter.
Remaining follow-ups (hardening, not release-blocking):
- A `HostedService` that polls `sp_GetCurrentGenerationForCluster` periodically so peer-published generations land in this node's cache without a restart.
- Richer snapshot payload via `sp_GetGenerationContent` so fallback can serve full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.
- Richer snapshot payload via `sp_GetGenerationContent` so fallback can serve the full generation content (DriverInstance enumeration, ACL rows, etc.) from the sealed cache alone.
### ~~Redundancy — Phase 6.3 Streams A/C core~~ (tasks #145 + #147**CLOSED** 2026-04-19, PRs #9899)
**Closed**. `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker` orchestrate topology + apply lease + recovery state + peer reachability through `ServiceLevelCalculator` + emit `OnStateChanged` / `OnServerUriArrayChanged` edge-triggered events.
**Closed**. The runtime orchestration layer now exists end-to-end:
- `RedundancyCoordinator` reads `ClusterNode` + peer list at startup (Stream A shipped in PR #98). Invariants enforced: 1-2 nodes (decision #83), unique ApplicationUri (#86), ≤1 Primary in Warm/Hot (#84). Startup fails fast on violation; runtime refresh logs + flips `IsTopologyValid=false` so the calculator falls to band 2 without tearing down.
- `RedundancyStatePublisher` orchestrates topology + apply lease + recovery state + peer reachability through `ServiceLevelCalculator` + emits `OnStateChanged` / `OnServerUriArrayChanged` edge-triggered events (Stream C core shipped in PR #99). The OPC UA `ServiceLevel` Byte variable + `ServerUriArray` String[] variable subscribe to these events.
Remaining Phase 6.3 surfaces (hardening, not release-blocking):
- ~~`PeerHttpProbeLoop` + `PeerUaProbeLoop` HostedServices populating `PeerReachabilityTracker` on each tick.~~ **Closed 2026-04-24.** Two-layer probe model shipped: HTTP probe at 2 s / 1 s timeout against `/healthz`; OPC UA probe at 10 s / 5 s timeout via `DiscoveryClient.GetEndpoints`, short-circuiting when HTTP reports the peer unhealthy. Registered on the Server as `AddHostedService<PeerHttpProbeLoop>` + `AddHostedService<PeerUaProbeLoop>`. Publisher now sees accurate `PeerReachability` per peer instead of degrading to `Unknown` → Isolated-Primary band (230).
- OPC UA variable-node wiring: bind `ServiceLevel` Byte + `ServerUriArray` String[] to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push.
- ~~`sp_PublishGeneration` wraps its apply in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band (200) fires during actual publishes (task #148 part 2).~~ **Closed 2026-04-24.** The apply loop now lives in `GenerationRefreshHostedService` — polls `sp_GetCurrentGenerationForCluster` every 5s, opens a lease when a new generation is detected, calls `RedundancyCoordinator.RefreshAsync` inside the `await using`, releases the lease on all exit paths. Replaces the previous "topology never refreshes without a process restart" behaviour.
- Client interop matrix — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only.
- `PeerHttpProbeLoop` + `PeerUaProbeLoop` HostedServices that poll the peer + write to `PeerReachabilityTracker` on each tick. Without these the publisher sees `PeerReachability.Unknown` for every peer → Isolated-Primary band (230) even when the peer is up. Safe default (retains authority) but not the full non-transparent-redundancy UX.
- OPC UA variable-node wiring layer: bind the `ServiceLevel` Byte node + `ServerUriArray` String[] node to the publisher's events via `BaseDataVariable.OnReadValue` / direct value push. Scoped follow-up on the Opc.Ua.Server stack integration.
- `sp_PublishGeneration` wraps its apply in `await using var lease = coordinator.BeginApplyLease(...)` so the `PrimaryMidApply` band (200) fires during actual publishes (task #148 part 2).
- Client interop matrix validation — Ignition / Kepware / Aveva OI Gateway (Stream F, task #150). Manual + doc-only work; doesn't block code ship.
### ~~Phase 5 driver complement~~ (task #120**CLOSED** 2026-04-24)
### Remaining drivers (task #120)
**Closed**. All four deferred drivers shipped:
- **AB CIP** (PRs #202222) — `Driver.AbCip`, `Driver.AbCip.IntegrationTests` (7 tests), AB CIP Cli. Live-boot verified against a ControlLogix rig.
- **AB Legacy** (PRs #202, #223) — `Driver.AbLegacy`, `Driver.AbLegacy.IntegrationTests` (2 tests), AB Legacy Cli. PCCC cip-path workaround for SLC/MicroLogix.
- **TwinCAT ADS** (PRs #205, this branch `task-galaxy-e2e`) — `Driver.TwinCAT`, `Driver.TwinCAT.IntegrationTests` (2 tests), TwinCAT Cli. TCBSD/ESXi fixture for e2e since local Hyper-V / TwinCAT RTIME are mutually exclusive on the dev box.
- **FOCAS** (PRs #173, #199 + this session's migration) — `Driver.FOCAS` with an **in-process managed `FocasWireClient`** that speaks FOCAS/2 over TCP directly. Tier-C isolation retired — `Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` P/Invoke + shim DLL + NSSM service all deleted. `Driver.FOCAS.IntegrationTests` covers 9 scenarios (fixed tree identity/axes/program/timers/spindle + user-authored PARAM/MACRO/PMC reads, Browse, Subscribe, IAlarmSource raise/clear, Probe transitions).
Decision recorded: FOCAS is **read-only** against the CNC by design — writes return `BadNotWritable`. See `docs/drivers/FOCAS.md` + `docs/drivers/FOCAS-Test-Fixture.md` for the deployment + coverage map.
AB CIP, AB Legacy, TwinCAT ADS, FOCAS drivers are planned but unshipped. Decision pending on whether these are release-blocking for v2 GA or can slip to a v2.1 follow-up.
## Nice-to-haves (not release-blocking)
- **Admin UI** — Phase 6.1 Stream E.2/E.3 (`/hosts` column refresh), Phase 6.2 Stream D (`RoleGrantsTab` + `AclsTab` Probe), Phase 6.3 Stream E (`RedundancyTab`), Phase 6.4 Streams A/B UI pieces, Stream C DiffViewer, Stream D `IdentificationFields.razor`. Tasks #134, #144, #149, #153, #155, #156, #157.
- **Background services** — Phase 6.1 Stream B.4 `ScheduledRecycleScheduler` HostedService (task #137), Phase 6.1 Stream A analyzer (task #135 — Roslyn analyzer asserting every capability surface routes through `CapabilityInvoker`).
- **Multi-host dispatch** — Phase 6.1 Stream A follow-up (task #135). Every driver currently gets a single pipeline keyed on `driver.DriverInstanceId`; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this but not wired.
- **Phase 7** — scripting + alarming + historian sink (plan drafted 2026-04-20 in `docs/v2/implementation/phase-7-*.md`). Out of scope for v2 GA.
## Live-hardware validations (task #54 + task family)
The code ships; these tasks remain open as lab/field verification:
- **#54** — FOCAS live-CNC wire-level smoke against a real FANUC control. The mock's wire responder is PDU-verified against `fwlibe64.dll` upstream but OtOpcUa's managed client has not been pointed at a production CNC.
- **AB CIP live-boot** — already passed on a ControlLogix rig (PR #222). Continue to run ahead of each release.
- **TwinCAT wire-live** — TCBSD/ESXi fixture covers the common path; production PLC verification remains lab-gated.
- **Multi-host dispatch** — Phase 6.1 Stream A follow-up (task #135). Currently every driver gets a single pipeline keyed on `driver.DriverInstanceId`; multi-host drivers (Modbus with N PLCs) need per-PLC host resolution so failing PLCs trip per-PLC breakers without poisoning siblings. Decision #144 requires this but we haven't wired it yet.
## Running the release-readiness check
@@ -93,12 +82,7 @@ The code ships; these tasks remain open as lab/field verification:
pwsh ./scripts/compliance/phase-6-all.ps1
```
This meta-runner invokes each `phase-6-N-compliance.ps1` script in sequence and reports an aggregate PASS/FAIL:
- `phase-6-1-compliance.ps1` — Resilience & Observability
- `phase-6-2-compliance.ps1` — Authorization runtime
- `phase-6-3-compliance.ps1` — Redundancy runtime
- `phase-6-4-compliance.ps1` — Admin UI completion
This meta-runner invokes each `phase-6-N-compliance.ps1` script in sequence and reports an aggregate PASS/FAIL. It is the single-command verification that what we claim is shipped still compiles + tests pass + the plan-level invariants are still satisfied.
Exit 0 = every phase passes its compliance checks + no test-count regression.
@@ -108,23 +92,18 @@ v2 GA requires all of the following:
- [ ] All four Phase 6.N compliance scripts exit 0.
- [ ] `dotnet test ZB.MOM.WW.OtOpcUa.slnx` passes with ≤ 1 known-flake failure.
- [x] Release blockers listed above all closed.
- [x] Phase 5 driver complement shipped (Galaxy, Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS).
- [ ] Release blockers listed above all closed (or consciously deferred to v2.1 with a written decision).
- [ ] Production deployment checklist (separate doc) signed off by Fleet Admin.
- [ ] At least one end-to-end integration run against the live Galaxy on the dev box succeeds.
- [ ] FOCAS live-CNC wire-level smoke (#54) runs clean against a real FANUC control.
- [ ] OPC UA conformance test (CTT or UA Compliance Test Tool) passes against the live endpoint.
- [ ] Non-transparent redundancy cutover validated with at least one production client (Ignition 8.3 recommended — see decision #85).
## Change log
- **2026-04-24**Phase 5 driver complement closed (task #120 CLOSED). AB CIP, AB Legacy, TwinCAT, FOCAS all shipped. FOCAS migration: retired the Tier-C split (`Driver.FOCAS.Host` + `Driver.FOCAS.Shared` + `FwlibNative` + shim DLL deleted) in favour of a pure-managed in-process `FocasWireClient` inlined into `Driver.FOCAS`; driver is now read-only against the CNC by design. Integration test matrix grew to cover Browse / Subscribe / IAlarmSource / Probe end-to-end.
- **2026-04-23**Phase 6.4 audit close-out. IdentificationFolderBuilder + OPC 40010 Identification folder verified against the shipped code.
- **2026-04-20**Phase 7 plan drafted (`phase-7-scripting-and-alarming.md`, `phase-7-e2e-smoke.md`). Out of scope for v2 GA.
- **2026-04-19**Release blocker #3 closed (PRs #9899). Phase 6.3 Streams A + C core shipped: `ClusterTopologyLoader` + `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker`. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, `sp_PublishGeneration` lease wrap, client interop matrix) are hardening follow-ups.
- **2026-04-19** — Release blocker #2 closed (PR #96). `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag`; `/healthz` surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
- **2026-04-19** — Release blocker #1 closed (PR #94). `AuthorizationGate` wired into `DriverNodeManager` Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
- **2026-04-19** — Phase 6.4 data layer merged (PRs #9192). Phase 6 core complete.
- **2026-04-19**Release blocker #3 **closed** (PRs #9899). Phase 6.3 Streams A + C core shipped: `ClusterTopologyLoader` + `RedundancyCoordinator` + `RedundancyStatePublisher` + `PeerReachabilityTracker`. Code-path release blockers all closed; remaining Phase 6.3 surfaces (peer-probe HostedServices, OPC UA variable-node binding, sp_PublishGeneration lease wrap, client interop matrix) are hardening follow-ups.
- **2026-04-19**Release blocker #2 **closed** (PR #96). `SealedBootstrap` consumes `ResilientConfigReader` + `GenerationSealedCache` + `StaleConfigFlag`; `/healthz` now surfaces the stale flag. Remaining follow-ups (periodic poller + richer snapshot payload) downgraded to hardening.
- **2026-04-19**Release blocker #1 **closed** (PR #94). `AuthorizationGate` wired into `DriverNodeManager` Read / Write / HistoryRead dispatch. Remaining Stream C surfaces (Browse / Subscribe / Alarm / Call + finer-grained scope resolution) downgraded to hardening follow-ups — no longer release-blocking.
- **2026-04-19**Phase 6.4 data layer merged (PRs #9192). Phase 6 core complete. Capstone doc created.
- **2026-04-19** — Phase 6.3 core merged (PRs #8990). `ServiceLevelCalculator` + `RecoveryStateManager` + `ApplyLeaseRegistry` land as pure logic; coordinator / UA-node wiring / Admin UI / interop deferred.
- **2026-04-19** — Phase 6.2 core merged (PRs #8488). `AuthorizationGate` + `TriePermissionEvaluator` + `LdapGroupRoleMapping` land; dispatch wiring + Admin UI deferred.
- **2026-04-19** — Phase 6.1 shipped (PRs #7883). Polly resilience + Tier A/B/C stability + health endpoints + LiteDB generation-sealed cache + Admin `/hosts` data layer all live.
-31
View File
@@ -1,31 +0,0 @@
# TwinCAT driver — v3 backlog
The v2 TwinCAT driver is considered solid: 28 integration tests (14 `[TwinCATFact]` +
16-case `[TwinCATTheory]`) running live against the TCBSD fixture, 110 unit tests,
three latent driver bugs shaken out (notification cycle units, `STRING(N)` mapper,
bit-indexed BOOL path). Further work is deferred.
Archived from `docs/drivers/TwinCAT-Test-Fixture.md` § Follow-up candidates.
## Deferred items
1. **TC2 coverage** — spin up a TC2 runtime (Windows CE IPC or legacy XAR)
and run the same suite; any delta surfaces. Blocked on hardware.
2. **Notification coalescing under load** — run the subscribe test while
the PLC cycle is saturated (bump `lineSim` complexity, watch for
dropped notifications). Doable on current rig; deferred as lower
priority than v3 feature work.
3. **Multi-hop AMS route** — add a test behind an IPC gateway with a
chained route entry. Blocked on hardware (gateway IPC).
4. **License-rotation automation** — XAR's 7-day trial expires on
schedule. Either automate `TcActivate.exe /reactivate` via a scheduled
task on the VM (not officially supported; reportedly works for some
TC3 builds), or buy a paid runtime license (~$1k one-time per runtime
per CPU) to kill the rotation. Ops item, not code.
5. **Lab rig** — cheapest IPC (CX7000 / CX9020) on a dedicated network;
the only route that covers TC2 + real EtherCAT I/O timing + cycle
jitter under CPU load. Blocked on hardware + budget.
+101
View File
@@ -0,0 +1,101 @@
# TC3 EventLogger spike — managed-wrapper investigation
**Question (b) from the PR 5.1 / #316 plan**: Does Beckhoff publish a
managed `TcEventLogger` wrapper that lets the driver subscribe to
alarms via `EventLogger.AlarmRaised` instead of decoding AMS port 110
notifications by hand?
## TL;DR
**No managed wrapper.** The `Beckhoff.TwinCAT.Ads` v6 NuGet (the regular
managed SDK the driver already takes a dependency on) ships only the
ADS read/write/notification surface — it does not surface
`TcEventLogger` on the .NET side. The C++ TcCOM headers
(`TcEventLogger.h` etc.) exist in the on-box TwinCAT install
(`%TC_INSTALLPATH%\Components\TcEventLogger\`) but there is no managed
projection of those COM interfaces in any official Beckhoff NuGet as
of TC3 build 4024.x.
Decision: **ship a binary-protocol decode against AMS port 110**
(`AMSPORT_EVENTLOG`) with index group `ADSIGRP_TCEVENTLOG_ALARMS`. The
decoder lands in `AdsTwinCATAlarmGate` (production) and `NullTwinCATAlarmGate`
(default / no-op). Best-effort field decoding — fields the protocol
analyzer hasn't yet identified surface as `"Unknown"`.
## What was checked
| Source | Result |
| --- | --- |
| `Beckhoff.TwinCAT.Ads` v6.x NuGet, namespace inventory | `TwinCAT.Ads`, `TwinCAT.Ads.SumCommand`, `TwinCAT.Ads.TypeSystem`, `TwinCAT.TypeSystem`. **No** `TcEventLogger` namespace. |
| `Beckhoff.TwinCAT.Ads.TcpRouter` v6.x NuGet | Router only; no EventLogger surface. |
| Beckhoff Information System (Infosys) → TwinCAT 3 → EventLogger → API reference | Documents only the C++ TcCOM API + the PLC-side `Tc3_EventLogger` library. No managed-language section. |
| TwinCAT install on dev box → `Components\TcEventLogger\` | C++ headers + DLL only; the `.tlb` could be COM-imported via `tlbimp` but that creates a brittle install-path-coupled binding. |
| Public Beckhoff GitHub orgs | `Beckhoff/TwinCAT-Tools-Library` etc. — no managed EventLogger wrapper. |
## Why decode at the wire?
A `tlbimp` projection of the on-box TcCOM `.tlb` would technically work
but introduces three problems:
1. **Install-path coupling** — the `.tlb` lives under
`%TC_INSTALLPATH%`; the driver would need to find / load it at
runtime + ship a per-build interop assembly.
2. **Bitness lock-in** — TcCOM is x86; the driver builds AnyCPU.
3. **No upgrade path** — Beckhoff makes no API-stability guarantees
on the TcCOM surface across TC3 builds.
Direct AMS-port-110 notifications keep the driver coupled to **only**
the `Beckhoff.TwinCAT.Ads` v6 NuGet's stable wire surface. Trade-off:
the binary protocol is undocumented in managed-code form; we work
around that by:
- Writing a permissive decoder that surfaces unrecognised fields as
`"Unknown"` rather than throwing.
- Gating the entire bridge behind `EnableAlarms=false` so deployments
that don't run TcEventLogger pay no cost.
- Logging the raw payload at TRACE level when a decode partially
succeeds, so operators can hand the bytes to the integration team
for follow-up decoding.
## What ships in PR 5.1
- `ITwinCATAlarmGate` interface — driver-internal seam.
- `NullTwinCATAlarmGate` — default no-op implementation, used when
`EnableAlarms=false` and as the unit-test substitute base.
- `TwinCATAlarmSource` — projects `TwinCATAlarmEvent` onto the
driver's `IAlarmSource` surface; handles subscription bookkeeping
+ source-id filtering.
- `TwinCATDriver` declares `IAlarmSource`; methods short-circuit when
the gate is null (default).
- Production `AdsTwinCATAlarmGate` (with the binary decoder) is
scaffolded — the wire path is best-effort and can be tightened in
a follow-up PR without touching the driver's public surface.
## Open questions for the follow-up PR
1. **Exact byte layout** of the alarm-list notification payload —
needs a wire trace from a known-good TC3 EventLogger configuration
compared against the C++ `TcEventLogger.h` struct definitions.
2. **Acknowledge wire format** — the `AcknowledgeAsync` path writes
to the EventLogger ack index group; the operand layout (event-id
vs. condition-id mapping) is best-effort in PR 5.1.
3. **Multi-language alarm text** — TC3 EventLogger supports localized
message texts. The decoder should pick the runtime's configured
language; PR 5.1 falls back to the first text it finds.
4. **Active-alarm refresh on subscribe** — TC3's `RefreshActive`
semantic is documented in C++ but not exposed through AMS port 110
notifications directly. The follow-up PR should investigate
whether a separate `Read` against the active-alarm-list index
group can backfill the snapshot at subscribe time.
## Why land PR 5.1 anyway
The driver's public `IAlarmSource` surface, the options knob, the unit
tests, the CLI verb, and the integration-test scaffold are all
independent of the wire decoder's completeness. Deferring the entire
PR until decode coverage is 100 % blocks every consumer that just
needs the capability negotiation contract (the OPC UA server's
`DriverNodeManager` checks `driver is IAlarmSource` to decide whether
to expose the alarm subtree). Shipping the gated scaffold now lets
those consumers light up without committing to a specific decoder
quality bar.
+51
View File
@@ -0,0 +1,51 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Purpose
The goal of this project is to identify and develop SQL queries that extract the Galaxy object hierarchy from the **System Platform Galaxy Repository** database in order to build a tag structure for an OPC UA server.
Specifically, we need to:
- Build the hierarchy of **areas** and **automation objects** (using contained names for human-readable browsing)
- Translate contained names to **tag_names** for read/write operations (e.g., `TestMachine_001.DelmiaReceiver` in the hierarchy becomes `DelmiaReceiver_001` when addressing tag values)
See `layout.md` for details on the hierarchy vs tag name relationship.
## Key Files
### Documentation
- `connectioninfo.md` — Database connection details and sqlcmd usage
- `layout.md` — Galaxy object hierarchy, contained_name vs tag_name translation, and target OPC UA structure
- `build_layout_plan.md` — Step-by-step plan for extracting hierarchy, attaching attributes, and monitoring for changes
- `data_type_mapping.md` — Galaxy mx_data_type to OPC UA DataType mapping, including array handling (ValueRank, ArrayDimensions)
### Queries
- `queries/hierarchy.sql` — Deployed object hierarchy with browse names and parent relationships
- `queries/attributes.sql` — User-defined (dynamic) attributes with data types and array dimensions
- `queries/attributes_extended.sql` — All attributes (system + user-defined) with data types and array dimensions
- `queries/change_detection.sql` — Poll `galaxy.time_of_last_deploy` to detect deployment changes
### Schema Reference
- `schema.md` — Full schema reference for all tables and views in the ZB database
- `ddl/tables/` — Individual CREATE TABLE definitions
- `ddl/views/` — Individual view definitions
## Working with the Galaxy Repository Database
The Galaxy Repository is the backing SQL Server database for Wonderware/AVEVA System Platform (Galaxy: ZB, localhost, Windows Auth). Key tables used by the queries:
- **gobject** — Object instances, hierarchy (contained_by_gobject_id, area_gobject_id), deployment state (deployed_package_id)
- **template_definition** — Object type categories (category_id distinguishes areas, engines, user-defined objects, etc.)
- **dynamic_attribute** — User-defined attributes on templates, inherited by instances via derived_from_gobject_id chain
- **attribute_definition** — System/primitive attributes
- **primitive_instance** — Links objects to their primitive components and attribute definitions
- **galaxy** — Single-row table with time_of_last_deploy for change detection
Use `sqlcmd -S localhost -d ZB -E -Q "..."` to run queries. See `connectioninfo.md` for details.
## Conventions
- Store all connection parameters in `connectioninfo.md`, not scattered across scripts.
- Keep SQL query examples and extraction notes as Markdown files in this repo.
- If scripts are added (Python, PowerShell, etc.), document their usage and dependencies alongside them.
+84
View File
@@ -0,0 +1,84 @@
# OPC UA Server Layout — Build Plan
## Overview
Extract the Galaxy object hierarchy and tag definitions from the ZB (Galaxy Repository) database to construct an OPC UA server address space. The root node is hardcoded as **ZB**.
## Step 1: Build the Browse Tree
Run `queries/hierarchy.sql` to get all deployed automation objects and their parent-child relationships.
For each row returned:
- `parent_gobject_id = 0` → child of the root ZB node
- `is_area = 1` → create as an OPC UA folder node (organizational)
- `is_area = 0` → create as an OPC UA object node (container for tags)
- Use `browse_name` as the OPC UA BrowseName/DisplayName
- Store `gobject_id` and `tag_name` for attribute lookup and tag reference translation
Build the tree by matching each row's `parent_gobject_id` to another row's `gobject_id`. The result is:
```
ZB (root, hardcoded)
└── DEV (folder, is_area=1)
├── DevAppEngine (object)
├── DevPlatform (object)
└── TestArea (folder, is_area=1)
├── DevTestObject (object)
└── TestMachine_001 (object)
├── DelmiaReceiver (object, browse_name from contained_name)
└── MESReceiver (object, browse_name from contained_name)
```
## Step 2: Attach Attributes as Tag Nodes
Run `queries/attributes.sql` to get all user-defined attributes for deployed objects.
For each attribute row:
- Match to the browse tree via `gobject_id`
- Create an OPC UA variable node under the matching object node
- Use `attribute_name` as the BrowseName/DisplayName
- Use `full_tag_reference` as the runtime tag path for read/write operations
- Map `mx_data_type` to OPC UA built-in types:
| mx_data_type | Description | OPC UA Type |
|--------------|-------------|-------------|
| 1 | Boolean | Boolean |
| 2 | Integer | Int32 |
| 3 | Float | Float |
| 4 | Double | Double |
| 5 | String | String |
| 6 | Time | DateTime |
| 7 | ElapsedTime | Double (seconds) or Duration |
- If `is_array = 1`, create the variable as an array with rank 1 and dimension from `array_dimension`
## Step 3: Monitor for Changes
Poll `queries/change_detection.sql` on a regular interval (e.g., every 30 seconds).
```
SELECT time_of_last_deploy FROM galaxy;
```
Compare the returned `time_of_last_deploy` to the last known value:
- **No change** → do nothing
- **Changed** → a deployment occurred; re-run Steps 1 and 2 to rebuild the address space
This handles objects being deployed, undeployed, added, or removed.
## Connection Details
See `connectioninfo.md` for database connection parameters and sqlcmd usage.
```
sqlcmd -S localhost -d ZB -E -Q "YOUR QUERY HERE"
```
## Query Files
| File | Purpose |
|------|---------|
| `queries/hierarchy.sql` | Deployed object hierarchy with browse names and parent relationships |
| `queries/attributes.sql` | User-defined attributes with data types and array dimensions |
| `queries/attributes_extended.sql` | All attributes (system + user-defined) with data types and array dimensions |
| `queries/change_detection.sql` | Poll galaxy.time_of_last_deploy for deployment changes |
+26
View File
@@ -0,0 +1,26 @@
# Galaxy Repository — Connection Information
## Database Connection
| Parameter | Value |
|-----------------|----------------|
| Server | localhost (default instance) |
| Database Name | ZB |
| Port | 1433 (default) |
| Authentication | Windows Auth |
| Username | dohertj2 |
## sqlcmd Usage
```
sqlcmd -S localhost -d ZB -E -Q "YOUR QUERY HERE"
```
- `-S localhost` — default instance
- `-d ZB` — database name
- `-E` — Windows Authentication (dohertj2)
## Notes
- The Galaxy Repository is a SQL Server database created and managed by AVEVA System Platform (formerly Wonderware).
- Typically accessed via SQL Server Management Studio (SSMS), `sqlcmd`, or programmatically via ODBC/ADO.NET/pyodbc.
+96
View File
@@ -0,0 +1,96 @@
# Data Type Mapping — Galaxy Repository to OPC UA
## Scalar Type Mapping
| mx_data_type | Galaxy Description | OPC UA DataType | OPC UA NodeId | Notes |
|--------------|--------------------|-----------------|---------------|-------|
| 1 | Boolean | Boolean | i=1 | Direct mapping |
| 2 | Integer (Int32) | Int32 | i=6 | Galaxy integers are 32-bit signed |
| 3 | Float (Single) | Float | i=10 | 32-bit IEEE 754 |
| 4 | Double | Double | i=11 | 64-bit IEEE 754 |
| 5 | String | String | i=12 | Unicode string |
| 6 | Time (DateTime) | DateTime | i=13 | Galaxy DateTime to OPC UA DateTime (100ns ticks since 1601-01-01) |
| 7 | ElapsedTime (TimeSpan) | Double | i=11 | No native OPC UA TimeSpan; map to Double representing seconds (or use Duration type alias, NodeId i=290) |
| 8 | (reference) | String | i=12 | Object reference; expose as string representation |
| 13 | (enumeration) | Int32 | i=6 | Enum backing value is integer |
| 14 | (custom) | String | i=12 | Fallback to string |
| 15 | InternationalizedString | LocalizedText | i=21 | OPC UA LocalizedText supports locale + text pairs |
| 16 | (custom) | String | i=12 | Fallback to string |
## OPC UA Built-in Type Reference
For context, the full set of OPC UA built-in types and their NodeIds:
| NodeId | Type | Description |
|--------|------|-------------|
| i=1 | Boolean | True/false |
| i=2 | SByte | Signed 8-bit integer |
| i=3 | Byte | Unsigned 8-bit integer |
| i=4 | Int16 | Signed 16-bit integer |
| i=5 | UInt16 | Unsigned 16-bit integer |
| i=6 | Int32 | Signed 32-bit integer |
| i=7 | UInt32 | Unsigned 32-bit integer |
| i=8 | Int64 | Signed 64-bit integer |
| i=9 | UInt64 | Unsigned 64-bit integer |
| i=10 | Float | 32-bit IEEE 754 |
| i=11 | Double | 64-bit IEEE 754 |
| i=12 | String | Unicode string |
| i=13 | DateTime | Date and time (100ns ticks since 1601-01-01) |
| i=14 | Guid | 128-bit globally unique identifier |
| i=15 | ByteString | Sequence of bytes |
| i=21 | LocalizedText | Locale + text pair |
## Array Handling
When `is_array = 1` in the attributes query, the OPC UA variable node must be configured as an array.
### ValueRank
Set on the OPC UA variable node to indicate scalar vs array:
| is_array | ValueRank | Meaning |
|----------|-----------|---------|
| 0 | -1 (Scalar) | Value is not an array |
| 1 | 1 (OneDimension) | Value is a one-dimensional array |
### ArrayDimensions
When `ValueRank = 1`, set the `ArrayDimensions` attribute to a single-element array containing the `array_dimension` value from the attributes query.
Example for `MESReceiver_001.MoveInPartNumbers` (`is_array=1`, `array_dimension=50`):
- DataType: String (i=12)
- ValueRank: 1
- ArrayDimensions: [50]
Example for `TestMachine_001.MachineID` (`is_array=0`):
- DataType: String (i=12)
- ValueRank: -1
- ArrayDimensions: (not set)
## Security Classification
Galaxy attributes have a `security_classification` column that controls the access level required for writes. The attributes query returns this value for each attribute.
| security_classification | Galaxy Level | OPC UA Access | Description |
|-------------------------|--------------|---------------|-------------|
| 0 | FreeAccess | ReadWrite | No security restrictions |
| 1 | Operate | ReadWrite | Normal operating level (default) |
| 2 | SecuredWrite | ReadOnly | Requires elevated write access |
| 3 | VerifiedWrite | ReadOnly | Requires verified/confirmed write access |
| 4 | Tune | ReadWrite | Tuning-level access |
| 5 | Configure | ReadWrite | Configuration-level access |
| 6 | ViewOnly | ReadOnly | Read-only, no writes permitted |
Most attributes default to `Operate` (1). Higher values indicate more restrictive write access. `ViewOnly` (6) attributes should be exposed as read-only in OPC UA (`AccessLevel = CurrentRead` only, no `CurrentWrite`).
## DateTime Conversion
Galaxy `Time` (mx_data_type=6) stores DateTime values. OPC UA DateTime is defined as the number of 100-nanosecond intervals since January 1, 1601 (UTC). Ensure the conversion accounts for:
- Timezone: Galaxy may store local time; OPC UA expects UTC
- Epoch difference: adjust if Galaxy uses a different epoch (e.g., Unix epoch 1970-01-01)
## ElapsedTime Handling
Galaxy `ElapsedTime` (mx_data_type=7) represents a duration/timespan. OPC UA has no native TimeSpan type. Options:
- **Double (i=11)**: Store as seconds (recommended for simplicity)
- **Duration (i=290)**: OPC UA type alias for Double, semantically represents milliseconds — use if the OPC UA SDK supports it
+13
View File
@@ -0,0 +1,13 @@
-- Table: ConversionQueue
CREATE TABLE [ConversionQueue] (
[id] int NULL,
[Name] nvarchar(329) NULL,
[IsCheckedOut] bit NOT NULL,
[Status] bit NOT NULL DEFAULT ((0)),
[MetaData] nchar(256) NULL,
[OperationType] nchar(20) NOT NULL,
[timestamp_of_last_change] bigint NULL,
[change_type] int NULL
);
GO
@@ -0,0 +1,9 @@
-- Table: CurrentSessionContainedName
CREATE TABLE [CurrentSessionContainedName] (
[Uniqeid] int NOT NULL,
[obj_id] int NULL,
[containedname] nvarchar(32) NULL,
CONSTRAINT [PK_CurrentSessionContainedName] PRIMARY KEY ([Uniqeid])
);
GO
+7
View File
@@ -0,0 +1,7 @@
-- Table: ImportTransaction
CREATE TABLE [ImportTransaction] (
[ImportOperationId] nvarchar(329) NULL,
[Status] bit NOT NULL DEFAULT ((1))
);
GO
+8
View File
@@ -0,0 +1,8 @@
-- Table: aa_sql_objects
CREATE TABLE [aa_sql_objects] (
[object_name] nvarchar(128) NOT NULL,
[object_type] nvarchar(10) NOT NULL,
CONSTRAINT [PK_aa_sql_objects] PRIMARY KEY ([object_name])
);
GO
@@ -0,0 +1,9 @@
-- Table: affected_overview_symbols
CREATE TABLE [affected_overview_symbols] (
[gobject_id] int NOT NULL,
[package_id] int NOT NULL,
[mx_primitive_id] smallint NOT NULL,
[visual_element_id] int NOT NULL
);
GO
+8
View File
@@ -0,0 +1,8 @@
-- Table: alarm_message_defaults
CREATE TABLE [alarm_message_defaults] (
[phrase_id] int NOT NULL,
[default_message] nvarchar(1024) NOT NULL,
CONSTRAINT [PK_alarm_message_defaults] PRIMARY KEY ([phrase_id])
);
GO
@@ -0,0 +1,8 @@
-- Table: alarm_message_timestamps
CREATE TABLE [alarm_message_timestamps] (
[gobject_id] int NOT NULL,
[timestamp_of_populate] bigint NOT NULL DEFAULT ((0)),
CONSTRAINT [PK_alarm_message_timestamps] PRIMARY KEY ([gobject_id])
);
GO
@@ -0,0 +1,12 @@
-- Table: alarm_message_translations
CREATE TABLE [alarm_message_translations] (
[phrase_id] int NOT NULL,
[locale_id] smallint NOT NULL,
[translated_message] nvarchar(1024) NOT NULL,
CONSTRAINT [PK_alarm_message_translations] PRIMARY KEY ([phrase_id], [locale_id], [phrase_id], [locale_id])
);
GO
ALTER TABLE [alarm_message_translations] ADD FOREIGN KEY ([locale_id]) REFERENCES [supported_locales] ([locale_id]);
GO
+13
View File
@@ -0,0 +1,13 @@
-- Table: alarm_messages
CREATE TABLE [alarm_messages] (
[gobject_id] int NOT NULL,
[package_id] int NOT NULL,
[mx_primitive_id] smallint NOT NULL,
[phrase_id] int NOT NULL,
CONSTRAINT [PK_alarm_messages] PRIMARY KEY ([gobject_id], [package_id], [mx_primitive_id], [phrase_id], [gobject_id], [gobject_id], [mx_primitive_id], [package_id], [gobject_id], [mx_primitive_id], [package_id], [gobject_id], [mx_primitive_id], [package_id])
);
GO
ALTER TABLE [alarm_messages] ADD FOREIGN KEY ([package_id]) REFERENCES [primitive_instance] ([package_id]);
GO
+24
View File
@@ -0,0 +1,24 @@
-- Table: attribute_definition
CREATE TABLE [attribute_definition] (
[attribute_definition_id] int NOT NULL,
[primitive_definition_id] int NOT NULL,
[attribute_name] nvarchar(329) NOT NULL,
[mx_attribute_id] smallint NOT NULL,
[has_config_set_handler] bit NOT NULL,
[mx_data_type] smallint NOT NULL,
[is_array] bit NOT NULL,
[security_classification] smallint NOT NULL,
[security_classification_needs_deployed] bit NOT NULL,
[mx_attribute_category] int NOT NULL,
[is_frequently_accessed] bit NOT NULL,
[is_locked] bit NOT NULL,
[is_locked_needs_deployed] bit NOT NULL,
[mx_value] text(2147483647) NOT NULL,
[mx_value_needs_deployed] bit NOT NULL,
CONSTRAINT [PK_attribute_definition] PRIMARY KEY ([primitive_definition_id], [mx_attribute_id], [primitive_definition_id])
);
GO
ALTER TABLE [attribute_definition] ADD FOREIGN KEY ([primitive_definition_id]) REFERENCES [primitive_definition] ([primitive_definition_id]);
GO
+26
View File
@@ -0,0 +1,26 @@
-- Table: attribute_reference
CREATE TABLE [attribute_reference] (
[gobject_id] int NOT NULL,
[package_id] int NOT NULL,
[referring_mx_primitive_id] smallint NOT NULL DEFAULT ((0)),
[referring_mx_attribute_id] smallint NOT NULL DEFAULT ((0)),
[element_index] smallint NOT NULL DEFAULT ((0)),
[resolved_gobject_id] int NOT NULL DEFAULT ((0)),
[reference_string] nvarchar(700) NOT NULL DEFAULT (''),
[context_string] nvarchar(329) NOT NULL DEFAULT (''),
[object_signature] int NOT NULL DEFAULT ((0)),
[resolved_mx_primitive_id] smallint NOT NULL DEFAULT ((0)),
[resolved_mx_attribute_id] smallint NOT NULL DEFAULT ((0)),
[resolved_mx_property_id] smallint NOT NULL DEFAULT ((0)),
[attribute_signature] int NOT NULL DEFAULT ((0)),
[lock_type] int NOT NULL DEFAULT ((0)),
[is_valid] bit NOT NULL DEFAULT ((0)),
[attr_res_status] int NOT NULL DEFAULT ((0)),
[attribute_index] smallint NULL DEFAULT ((-1)),
CONSTRAINT [PK_attribute_reference] PRIMARY KEY ([gobject_id], [package_id], [referring_mx_primitive_id], [referring_mx_attribute_id], [element_index], [gobject_id], [package_id], [referring_mx_primitive_id], [gobject_id], [package_id], [referring_mx_primitive_id], [gobject_id], [package_id], [referring_mx_primitive_id])
);
GO
ALTER TABLE [attribute_reference] ADD FOREIGN KEY ([referring_mx_primitive_id]) REFERENCES [primitive_instance] ([package_id]);
GO
@@ -0,0 +1,11 @@
-- Table: attributes_translation_table
CREATE TABLE [attributes_translation_table] (
[gobject_id] int NULL,
[attribute_name] nvarchar(329) NOT NULL,
[new_primitive_id] int NULL,
[new_attribute_id] int NULL,
[old_primitive_id] int NULL,
[old_attribute_id] int NULL
);
GO
+11
View File
@@ -0,0 +1,11 @@
-- Table: autobind_device
CREATE TABLE [autobind_device] (
[dio_id] int NOT NULL,
[overridden_naming_rule_id] int NULL,
CONSTRAINT [PK_autobind_device] PRIMARY KEY ([dio_id], [overridden_naming_rule_id], [dio_id])
);
GO
ALTER TABLE [autobind_device] ADD FOREIGN KEY ([dio_id]) REFERENCES [gobject] ([gobject_id]);
GO
@@ -0,0 +1,11 @@
-- Table: autobind_device_category
CREATE TABLE [autobind_device_category] (
[category_id] smallint NOT NULL,
[rule_id] int NULL DEFAULT ((0)),
CONSTRAINT [PK_autobind_device_category] PRIMARY KEY ([category_id], [rule_id], [category_id])
);
GO
ALTER TABLE [autobind_device_category] ADD FOREIGN KEY ([category_id]) REFERENCES [lookup_category] ([category_id]);
GO

Some files were not shown because too many files have changed in this diff Show More