Compare commits

27 Commits

Author SHA1 Message Date
Joseph Doherty 2d07d716dc Recover stashed driver-gaps work from pre-v2-mxgw-merge working tree
Captures uncommitted work that lived in the working tree on
v2-mxgw-integration but was orthogonal to the migration. Stashed
during the v2-mxgw merge to master (2026-04-30) and replanted here on
a feature branch off master so it's git-visible rather than living in
the stash list.

Two distinct buckets:

1. Tracked fixture/config refinements (10 files, ~36 lines):
   - scripts/e2e/test-opcuaclient.ps1
   - src/ZB.MOM.WW.OtOpcUa.Admin/appsettings.json
   - 5 docker-compose.yml under tests/.../IntegrationTests/Docker/
     (AbCip, Modbus, OpcUaClient, S7)
   - 4 fixture .cs files (AbServerFixture, ModbusSimulatorFixture,
     OpcPlcFixture, Snap7ServerFixture)

2. Untracked driver-gaps queue artifacts (~8000 lines):
   - docs/plans/{abcip,ablegacy,focas,opcuaclient,s7,twincat}-plan.md
     — per-driver gap plans
   - docs/featuregaps.md — cross-cutting analysis
   - docs/v2/focas-deployment.md, docs/v2/implementation/focas-simulator-plan.md
   - followup.md — auto/driver-gaps queue follow-ups
   - scripts/queue/ — PR-queue automation tooling (12 files including
     pr-manifest.yaml at 1473 lines)

This commit is a snapshot for recoverability — review and split into
focused PRs (or discard) before merging anywhere downstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:28:01 -04:00
Joseph Doherty ae7106dfce Merge branch 'v2-mxgw-integration': in-process GalaxyDriver via mxaccessgw
Lands the v2-mxgw migration end-to-end (39 PRs across 7 phases, plus
follow-up triage). Galaxy access now flows through the in-process
GalaxyDriver talking gRPC to a separately-installed mxaccessgw,
replacing the legacy out-of-process Galaxy.Host / Galaxy.Proxy /
Galaxy.Shared trio. The OtOpcUa server is .NET 10 AnyCPU; the
MXAccess COM bitness constraint moved to the gateway's worker.

Headline changes:

- Phase 1 (1.1-1.3, 1+2.W): IHistoryRouter at the server level;
  per-driver IHistoryProvider fallback retired.
- Phase 2 (2.1-2.3): AlarmConditionService at the server level driven
  by AlarmConditionInfo's five sub-attribute refs (InAlarmRef /
  PriorityRef / DescAttrNameRef / AckedRef / AckMsgWriteRef).
- Phase 3 (3.1-3.W): Driver.Historian.Wonderware sidecar (net48 x86)
  + .NET 10 client + pipe IPC for the historian SDK.
- Phase 4 (4.0-4.W): in-process Driver.Galaxy with all 8 capability
  interfaces (ITagDiscovery / IReadable / IWritable / ISubscribable /
  IRediscoverable / IHostConnectivityProbe + IDriver / IDisposable);
  ReconnectSupervisor + DeployWatcher + PerPlatformProbeWatcher.
- Phase 5 (5.1-5.W): parity matrix scaffolding; matrix verified green
  on the live ZB galaxy 2026-04-30 (14 passed / 1 skipped / 0 failed).
- Phase 6 (6.1-6.W): perf surface — OpenTelemetry traces around gw
  calls, bounded EventPump channel + drop-newest metrics, buffered
  update interval landing, soak scenario harness, tuned defaults,
  Galaxy.Performance.md.
- Phase 7 (7.1-7.3): Galaxy:DefaultBackend = "GalaxyMxGateway"
  default-flip; PR 7.2 deleted the 9 legacy project directories
  (Driver.Galaxy.Host, .Proxy, .Shared, Galaxy.E2E, Galaxy.ParityTests,
  Galaxy.TestSupport, plus the three tests projects); doc + memory
  housekeeping.

Plus follow-ups: production-path read via subscribe-once, ApiKey
resolver (env:/file:/literal), session-level
SetBufferedUpdateInterval, EventPump channel capacity surfaced through
options. graccess-cli typelib + lifecycle bugs filed as separate
requirements docs in the gw repo.
2026-04-30 08:19:06 -04:00
Joseph Doherty 1bd8a1875b PR 7.3 tail — doc + memory housekeeping for retired Galaxy.Host
Closes the v2-mxgw migration's housekeeping debt now that PR 7.2 has
retired the legacy projects + service.

Repo docs:
- CLAUDE.md: rewrote the Galaxy section + reference-impl + MXAccess
  documentation pointers; replaced .NET 4.8 x86 / COM apartment
  constraints with .NET 10 AnyCPU + a pointer to the gateway. Dropped
  the "Service hosting (Galaxy.Host)" library-preferences row.
- docs/ServiceHosting.md: rewrote (was 156 lines of Galaxy.Host pipe
  IPC details). Now reflects the v2 process shape: OtOpcUa.Server +
  OtOpcUa.Admin + optional OtOpcUaWonderwareHistorian, with Galaxy
  access via the in-process driver → mxaccessgw.
- docs/v2/dev-environment.md: scrubbed four Galaxy.Host references
  (TwinCAT/Galaxy.Host shared-host note; .NET 4.8 SDK row; install
  step #2; risks table). The .NET 4.8 SDK is now correctly framed as
  "optional, only needed when building the mxaccessgw worker".
- mxaccess_documentation.md: deleted from the repo root (obsolete; the
  gateway repo is the canonical MxAccess API doc).

Memory housekeeping (under ~/.claude/projects/.../memory/):
- Retired: project_galaxy_host_service.md,
  project_galaxy_host_installed.md, reference_impl.md (the LmxProxy
  Host MXAccess reference is no longer the design pattern this repo
  uses).
- Revised: project_overview.md (now describes the .NET 10 + mxaccessgw
  shape), project_aveva_platform_installed.md (AVEVA still required
  on the dev box but consumed by the gateway worker, not by anything
  here), project_galaxy_via_mxgateway.md (post-7.2 state — flagged as
  the only Galaxy backend), project_server_history_alarm_subsystems.md
  (per-driver fallbacks retired in PR 7.2).
- MEMORY.md index updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:14:22 -04:00
Joseph Doherty fe91d42927 PR 7.2 — Retire legacy Galaxy projects + service
Matrix-gate satisfied (14 passed / 1 skipped / 0 failed on 2026-04-30
per docs/v2/Galaxy.ParityMatrix.md). Galaxy access flows through the
in-process GalaxyDriver → mxaccessgw exclusively. Legacy infrastructure
deleted in this commit:

Source projects (6):
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host         (.NET 4.8 x86 + MXAccess COM)
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy        (in-process pipe client)
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared       (pipe-IPC contracts)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests

Test projects with no consumer after legacy retired (3):
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E         (drove Galaxy.Host EXE)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests (drove both backends)
- tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport (only consumed by Host/Proxy tests)

Edits:
- ZB.MOM.WW.OtOpcUa.slnx: drop nine project entries
- Server.csproj: drop Driver.Galaxy.Proxy ProjectReference
- Server/Program.cs: drop GalaxyProxyDriverFactoryExtensions.Register
  + the parallel-registration comment block; only GalaxyDriverFactoryExtensions
  registers now under DriverType "GalaxyMxGateway"
- Install-Services.ps1: rewrite to drop OtOpcUaGalaxyHost service install +
  the GalaxySharedSecret/ZbConnection/GalaxyClientName/GalaxyPipeName/
  AvevaServiceDependencies/MxAccessInitialConnect* parameters that only
  applied to the legacy host. Adds a closing note pointing operators at
  the separate mxaccessgw install
- Uninstall-Services.ps1: keep OtOpcUaGalaxyHost in the cleanup loop so
  pre-7.2 rigs upgrade-uninstall cleanly, plus add OtOpcUaWonderwareHistorian
- scripts/e2e/test-galaxy.ps1: deleted (drove the legacy E2E)
- scripts/e2e/e2e-config.sample.json: rewrite the galaxy section comment
  to reflect the GalaxyMxGateway-only path
- scripts/e2e/README.md: drop OtOpcUaGalaxyHost references
- scripts/compliance/phase-7-compliance.ps1: drop Galaxy.Shared
  HistorianAlarms* checks (those contracts moved to
  Driver.Historian.Wonderware.Client in PR 3.4)

Live state: OtOpcUaGalaxyHost Windows service stopped + removed via
NSSM before this commit. The dev box's Galaxy access is now exclusively
through the running mxaccessgw (separate repo).

Stays out of scope for PR 7.2 (PR 7.3 territory):
- CLAUDE.md Galaxy section rewrite
- mxaccess_documentation.md deletion
- Memory entries for the now-retired Galaxy.Host service

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 08:01:19 -04:00
Joseph Doherty 6bf147a113 docs: drop soak + 2-week-pilot as PR 7.2 preconditions
The parity matrix gate is the precondition for retiring the legacy
Galaxy projects. The 24h × 50k soak run and 2-week production pilot
were sketched in early planning as additional safety nets but aren't
operationally applicable for this deployment — there's no separate
production fleet to pilot against, and the soak harness's value is as
ongoing diagnostic infrastructure (still shipped in PR 6.4) rather
than a one-shot release gate.

PR 7.2's only remaining precondition is the matrix being fully green
or carrying documented accepted-deltas — verified 2026-04-30 on the
dev rig: 14 passed / 1 skipped / 0 failed.

Affected:
- docs/v2/Galaxy.ParityMatrix.md "Outstanding deltas" — flips to
  "PR 7.2 is unblocked"
- docs/v2/Galaxy.ParityRig.md "After the rig is green" — drops the
  three-step soak+pilot flow, keeps only the matrix-doc bookkeeping
  follow-up
- lmx_mxgw_impl.md PR 7.2 "Depends on" — replaces "fully soaked"
  with the matrix-green precondition + the verification date

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:51:39 -04:00
Joseph Doherty 9db2edcbb5 parity: matrix fully green on dev rig (2026-04-30)
End-to-end run on the live ZB galaxy with mxaccessgw on
http://localhost:5120: 14 passed / 1 skipped / 0 failed in 18m53s.
PR 7.2's matrix-gate condition met. Three resolution patches in this
commit; the matrix doc records the new state.

1. Discoverer: defensive `[]` array-suffix strip
   ----------------------------------------------------
   The gw's GalaxyRepository.cs:173-175 appends `[]` to
   array-typed full_tag_reference values, but MxAccess COM
   IInstance.AddItem doesn't accept `[]`-suffixed addresses.
   GalaxyDiscoverer.StripArraySuffix removes the suffix client-side
   so SubscribeBulk / Read / Write paths see the canonical form.
   Tracked in mxaccessgw/requirements-array-suffix-fix.md; this
   workaround is removed when the gw fix lands.

2. WriteByClassification: pin status class, not exact code
   ---------------------------------------------------------
   Legacy MxAccessGalaxyBackend.WriteValuesAsync flat-maps every
   failure to BadInternalError (0x80020000); mxgw's
   GatewayGalaxyDataWriter.TranslateReply uses
   MxStatusProxy.RawDetectedBy to distinguish gw-layer faults
   (BadCommunicationError, 0x80050000) from MxAccess HRESULT
   faults. Both yield Bad-status — the parity invariant is the
   status class (Good/Uncertain/Bad), not the exact code. Both
   write tests now use AssertStatusClassMatches; legacy mapping
   retires alongside GalaxyProxyDriver in PR 7.2.

3. BrowseAndReadParity Read scenario: drop CLR-type assertion
   ------------------------------------------------------------
   Legacy returns the raw VARIANT (e.g. byte[]) for an attribute
   that hasn't received its first value cycle from MxAccess yet,
   while mxgw returns the typed value (Single, Int32, etc.). Once
   a real value is written or scanned, both converge. Pinning
   CLR-type equality across the uninitialized window adds noise
   without a real parity invariant — the StatusCode-class
   assertion already covers the "did the read succeed" question.
   The test still pins StatusCode-class parity per scenario.

4. Galaxy.ParityMatrix.md — first-rig results captured
   -----------------------------------------------------
   Per-row status flipped from "n/a unverified" to actual
   green / yellow / deferred outcomes from this run. Four new
   accepted-deltas added (read-value CLR type, write-status code
   mapping, single-platform ScanState scope, gw `[]` suffix
   workaround), bringing the total to nine. Outstanding deltas
   section flipped to "none as of 2026-04-30."
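
Patches 1 and 2 above can be illustrated with a minimal Python sketch (not the repo's C# code). The function names and the sample tag reference are hypothetical; only the `[]` suffix and the two status codes come from the commit message. The class lookup relies on the OPC UA convention that the top two bits of a StatusCode carry its severity.

```python
# Illustrative sketch only; helper names are hypothetical.
BAD_INTERNAL_ERROR = 0x80020000       # legacy backend's flat-mapped failure code
BAD_COMMUNICATION_ERROR = 0x80050000  # mxgw code for gateway-layer faults


def strip_array_suffix(full_tag_reference: str) -> str:
    """Drop a trailing '[]' so the address matches the form MxAccess accepts."""
    if full_tag_reference.endswith("[]"):
        return full_tag_reference[:-2]
    return full_tag_reference


def status_class(status_code: int) -> str:
    """Classify an OPC UA StatusCode by its top two severity bits."""
    severity = (status_code >> 30) & 0b11
    return {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(severity, "Reserved")


def status_class_matches(legacy_code: int, mxgw_code: int) -> bool:
    """Parity holds when both codes fall in the same class, even if they differ."""
    return status_class(legacy_code) == status_class(mxgw_code)
```

Under this check, `0x80020000` and `0x80050000` compare as equal because both are Bad, while a Good result on one side and a Bad result on the other still fails.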

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 04:19:56 -04:00
Joseph Doherty 5e890ec9d6 parity: triage 3 false-positives from first-rig run (2026-04-30)
After running the matrix end-to-end against the live rig for the
first time, three of the nine failures were false positives — bugs in
the harness and test invariants, not real backend deltas:

1. ParityHarness configured the legacy backend with
   OTOPCUA_GALAXY_BACKEND=db, which is Discover-only. Reads, writes,
   and reinits all returned "MXAccess code lift pending — DB-backed
   backend covers Discover only". Switched to mxaccess backend; the
   ZB connection string still drives the discovery path.

2. HistoryReadParityTests asserted "neither backend implements
   IHistoryProvider" — but the legacy GalaxyProxyDriver still does
   (it's an accepted back-compat delta retired in PR 7.2). The
   architectural pin we *want* is "the new path doesn't regress to
   per-driver history", so the test now asserts only the mxgw side.

3. AlarmTransitionParityTests strict-pinned the five sub-attribute
   refs (InAlarmRef, etc.) on the legacy condition. PR 2.1 added
   those refs specifically so the new mxgw driver could populate them
   via AlarmRefBuilder; legacy pre-dates PR 2.1 and leaves them null
   — that's correct, not a regression. Test now asserts a one-way
   invariant: when legacy populated a ref, mxgw must match. When
   legacy is null, mxgw is free to populate (the mxgw → server-side
   AlarmConditionService direction).
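
The one-way invariant in item 3 can be sketched in Python as follows. This is an illustration under assumptions, not the test's actual C# code: conditions are modelled as plain dictionaries, and the function name is hypothetical. The five field names are the sub-attribute refs named in the merge commit.

```python
# Illustrative sketch only; the dict-based condition model is an assumption.
REF_FIELDS = (
    "InAlarmRef",
    "PriorityRef",
    "DescAttrNameRef",
    "AckedRef",
    "AckMsgWriteRef",
)


def one_way_ref_mismatches(legacy: dict, mxgw: dict) -> list:
    """Return the refs that legacy populated and mxgw failed to match."""
    mismatches = []
    for field in REF_FIELDS:
        legacy_value = legacy.get(field)
        if legacy_value is None:
            # Legacy left the ref null, so mxgw is free to populate it.
            continue
        if mxgw.get(field) != legacy_value:
            mismatches.append(field)
    return mismatches
```

An empty result means the invariant holds; a ref that only mxgw populates never counts as a mismatch.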

The six remaining failures are real:

- 2 from the gw-side `[]` array suffix (filed in
  mxaccessgw/requirements-array-suffix-fix.md)
- 2 write-StatusCode mapping deltas (0x80050000 vs 0x80020000) —
  Bad-status both ways but mapped to different OPC UA codes
- 1 event-rate ratio of 5x (mxgw dispatches 5x legacy in the same
  3s window)
- (Plus the 2 ScanState scenarios that skip cleanly — single-platform
  rig as documented)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:00:44 -04:00
Joseph Doherty 580c45f494 docs: parity rig — concrete mxaccessgw setup recipe
Replaces the placeholder "configure an API key per gateway.md" with
the actual commands that worked end-to-end on this dev box:

- Build both halves (Worker x86 net48, Server net10)
- apikey init-db + apikey create-key with the seven scopes the parity
  test exercises (session:*, invoke:*, events:read, metadata:read)
- Three env-var overrides at server startup — capturing real lessons
  learned standing the rig up:
  * Kestrel__Endpoints__Http__Url = http://localhost:5120
  * Kestrel__Endpoints__Http__Protocols = Http2 (gRPC needs h2c on
    plain HTTP — without this flag the client gets HTTP_1_1_REQUIRED)
  * MxGateway__Worker__ExecutablePath = absolute path to the built
    worker (appsettings.json's relative path drops \net48 and the
    server can't resolve it)
- Note that workers spawn lazily on first OpenSession, not at server
  startup — so port-listening is necessary but not sufficient
  evidence the gateway is healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 02:27:08 -04:00
Joseph Doherty da277a843a docs: provisioning recipes for parity rig via graccess-cli
Calls out the single-platform constraint on this dev box and the
graccess-cli at C:\Users\dohertj2\Desktop\graccess as the way to
configure the rest of the parity-rig Galaxy shape:

- ScanState probe parity (multi-platform) is deferred to a customer
  rig — not feasible on this dev box. PR 7.2 gate accepts
  "n/a, deferred" on those rows because PR 4.7's unit tests already
  pin the state-decoder + member-tracking logic.
- Per-row provisioning recipes for the five ⚙-scriptable rows:
  FreeAccess/Operate UDA, Configure/Tune UDA, value-change source
  (recommend external write-loop over template surgery), $Alarm*
  extension, History extension. All against a reserved
  OtOpcUaParityTest sandbox UDO so plant-relevant objects stay
  untouched.
- Trailing deploy + Galaxy.Host restart so MxAccess picks up the
  change before re-running the matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:40:31 -04:00
Joseph Doherty c55da145ec docs: add Galaxy parity rig runbook
Walks through standing up both Galaxy backends side-by-side against a
single live Galaxy:

- Conceptual layout (two MxAccess sessions on distinct ClientNames so
  they don't evict each other)
- What's already on the dev box (AVEVA + OtOpcUaGalaxyHost service)
- mxaccessgw build + run + config (API key, ClientName)
- The three OTOPCUA_PARITY_* env vars the harness reads
- HarnessShapeTests as the two-line truth-teller for "did both halves
  resolve"
- Galaxy-shape coverage matrix mapping each scenario to what's needed
  for it to assert (rather than skip)
- Soak run recipes, including the compressed-tag fallback when the dev
  Galaxy doesn't have 50k attributes
- Troubleshooting for the four common SkipReasons
- Three further gates before PR 7.2 lands (matrix green, soak data,
  pilot flip)

Explicitly drops the stale "use a non-elevated shell" precondition —
the legacy Galaxy.Host pipe ACL accepts elevated and non-elevated
dohertj2 alike (resolved 2026-04-24).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 22:08:43 -04:00
Joseph Doherty 42f41fbe50 v2-mxgw follow-ups: production reads, secret resolution, perf knobs
Lands the five concrete code-level follow-ups identified after Phase 7.1:

#1 GalaxyDriver.ReadAsync now works in production. Previously threw
   NotSupportedException when no test reader was injected. New path
   subscribes through the existing SubscriptionRegistry + EventPump,
   waits for the first OnDataChange per item handle (gw pushes the
   initial value after SubscribeBulk), then unsubscribes. Tags the gw
   rejects up front, or that don't publish before the caller's CT
   fires, return Bad-status snapshots in input order so callers still
   get one snapshot per requested reference.

#2 ResolveApiKey() routes Gateway.ApiKeySecretRef through three forms:
   env:NAME, file:PATH, or literal-string fallback. A future DPAPI arm
   slots in here without touching the call site.

#3 GatewayGalaxySubscriber actually honors bufferedUpdateIntervalMs now
   (was being silently dropped). Calls SetBufferedUpdateInterval via
   the gw's MxCommandKind.SetBufferedUpdateInterval before SubscribeBulk
   when the requested interval differs from the cached last-applied
   value. Soft-fails on a non-Ok protocol status (the SubscribeBulk
   still succeeds at gw cadence).

#4 GalaxyMxAccessOptions.EventPumpChannelCapacity surfaces the bounded-
   channel size through DriverConfig JSON, defaulting to 50_000.

#5 Cleaned up stale doc-comments in HostStatusAggregator and
   GatewayGalaxySubscriber that described follow-ups that had already shipped.
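
The three-form secret reference from #2 can be sketched in Python as below. This is a hedged illustration rather than the driver's C# implementation: the function name, the error raised for a missing environment variable, and the whitespace trimming of file contents are all assumptions.

```python
import os
from pathlib import Path


def resolve_api_key(secret_ref: str) -> str:
    """Resolve 'env:NAME', 'file:PATH', or fall back to the literal string."""
    if secret_ref.startswith("env:"):
        name = secret_ref[len("env:"):]
        value = os.environ.get(name)
        if value is None:
            # Assumption: a missing variable is an error, not an empty key.
            raise KeyError(f"environment variable {name!r} is not set")
        return value
    if secret_ref.startswith("file:"):
        path = Path(secret_ref[len("file:"):])
        # Assumption: trailing newlines in the key file are not part of the key.
        return path.read_text(encoding="utf-8").strip()
    # Literal fallback: the reference itself is the key.
    return secret_ref
```

Because each form is a separate branch keyed on its prefix, a further scheme can be added as one more branch without changing callers.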

Tests: +6 (read subscribe-once happy path + rejected-tag fallback;
five resolver scenarios). Total Galaxy driver tests now 180/180 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:27:24 -04:00
Joseph Doherty d5a87c7467 PR 7.3 — Doc updates for v2 Galaxy backend (partial)
Forward-looking doc surface for the new in-process GalaxyDriver:

- CLAUDE.md gains a "v2 Galaxy backend" preamble at the top pointing
  readers at lmx_mxgw.md and docs/v2/Galaxy.Performance.md, and
  framing the rest of the doc as the still-accurate v1 Galaxy.Host
  description.
- New auto-memory entry project_galaxy_via_mxgateway.md captures the
  default-since-PR-7.1 status, perf surface entry points, and the
  soak validation knobs.

Intentionally deferred until PR 7.2 (parity-rig-validated):

- Removing the v1 description and rewriting the architecture section
  outright.
- Deleting mxaccess_documentation.md (still consumed by Galaxy.Host).
- Retiring memory entries for project_galaxy_host_service.md /
  project_galaxy_host_installed.md / project_aveva_platform_installed.md
  — those describe a stack that's still installed and in active use.
- Scrubbing Galaxy.Host references from docs/v2/dev-environment.md,
  docs/ServiceHosting.md, docs/Redundancy.md, docs/security.md.

All those changes presuppose the legacy stack is gone, which it isn't
yet. Re-open this PR's tail once the parity matrix in
docs/v2/Galaxy.ParityMatrix.md is fully green on a live rig.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:07:23 -04:00
Joseph Doherty 6f4cbf8449 PR 7.1 — Default-flip Galaxy backend to mxgateway
Adds Galaxy.DefaultBackend = "GalaxyMxGateway" to the server
appsettings as the forward-looking default for tooling and migration
scripts that author new Galaxy DriverInstance rows. No runtime
behavior change — both factories register independently at startup,
so existing rows keep working until PR 7.2 retires the legacy
registration (gated on the parity matrix in
docs/v2/Galaxy.ParityMatrix.md going fully green on the parity rig).

The e2e-config.sample.json comment is updated to reflect the new
default endpoint (http://localhost:5120 mxaccessgw) while still
pointing pre-flip rigs at the legacy OtOpcUaGalaxyHost path.

Install-Services.ps1's OtOpcUaGalaxyHost registration is intentionally
unchanged — yanking that mid-flight without a soaked parity rig would
leave any in-progress installation without a Galaxy backend at all.
PR 7.2 retires it alongside the legacy projects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:05:50 -04:00
Joseph Doherty edee47d77f PR 6.W — Galaxy.Performance.md
Documents the five perf surfaces shipped in Phase 6:

- Tracing surface (PR 6.1) — table of every span the driver emits +
  rationale for stream-level (not per-event) coverage.
- Metrics surface (PR 6.2) — three EventPump counters, tagging
  scheme, the bounded-channel design, and the
  received = dispatched + dropped + in-flight invariant.
- Buffered update interval (PR 6.3) — how MxAccess.PublishingIntervalMs
  flows through both subscribe paths and what's still pending on the
  gw side (typed SetBufferedUpdateInterval helper).
- Soak scenario (PR 6.4) — env-var-gated 24h × 50k validation with
  the CI-compressed override recipe.
- Tuned defaults (PR 6.5) — table of every default with source +
  notes; rows marked "unchanged" carry the explicit "no live data
  argues for changing this" caveat.

Closes with a "where to look first when something's slow" runbook
section so on-call doesn't have to re-derive the trace+metric
correlation map from primary docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:04:23 -04:00
Joseph Doherty 22ef2eb5ba PR 6.5 — Tune MxGatewayClientOptions defaults
Bumps DefaultCallTimeoutSeconds from 5 → 30. The 5s default was
provably unsafe regardless of soak data: a 50k-tag SubscribeBulk
walks the gw worker's item list serially under the MxAccess COM
apartment lock, and that scan can exceed 5s on a busy node. 30s
leaves comfortable headroom for the legitimate worst case while
still failing fast on a wedged worker.

ConnectTimeoutSeconds (10) and StreamTimeoutSeconds (0 = unlimited)
unchanged — the soak harness in PR 6.4 didn't observe pressure on
either, so they stay at their original sane values until live data
indicates otherwise.

Tuning rationale captured as a code comment in GalaxyGatewayOptions
so the next reader knows what was deliberate and what's pending live
soak data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:03:06 -04:00
Joseph Doherty 698bdef572 PR 6.4 — Soak scenario test
Long-running soak harness exercising the in-process GalaxyDriver
against a live mxaccessgw. Subscribes a configurable tag count
(default 50_000), holds the subscription for a configurable duration
(default 24h), polls the EventPump's three counters every minute, and
asserts:

- events.received continues to grow (gw stream isn't stuck)
- events.dropped stays under a configurable percent ceiling
  (default 0.5%)
- process working-set doesn't grow >1 GB above baseline (leak guard)
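
The three per-minute checks can be sketched in Python as follows. This is an assumption-laden illustration, not the harness itself: the sample type and function name are hypothetical, the drop percentage is assumed to be relative to events received, and 1 GB is taken as 10^9 bytes.

```python
from dataclasses import dataclass


@dataclass
class SoakSample:
    received: int            # cumulative events.received
    dropped: int             # cumulative events.dropped
    working_set_bytes: int   # process working set at sample time


def soak_violations(previous, current, baseline_working_set_bytes,
                    drop_ceiling_pct=0.5, max_growth_bytes=1_000_000_000):
    """Return a list of violated soak conditions for one polling interval."""
    violations = []
    if current.received <= previous.received:
        violations.append("events.received stopped growing")
    if current.received > 0:
        drop_pct = 100.0 * current.dropped / current.received
        if drop_pct > drop_ceiling_pct:
            violations.append("events.dropped above ceiling")
    if current.working_set_bytes - baseline_working_set_bytes > max_growth_bytes:
        violations.append("working set growth above limit")
    return violations
```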

Always skipped unless the operator opts in via OTOPCUA_SOAK_RUN=1.
Tag count, duration, and drop ceiling are env-overridable
(OTOPCUA_SOAK_TAGS / OTOPCUA_SOAK_MINUTES / OTOPCUA_SOAK_DROP_PCT) so
a smoke run can compress the scenario for CI gating.

Per-minute progress is logged as a CSV-style line to stdout so an
operator can grep the test runner output mid-run. PR 6.5 consumes the
data this scenario emits to tune MxGatewayClientOptions defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:00:52 -04:00
Joseph Doherty 2fdad81af3 PR 6.3 — Buffered update interval landing
Wires MxAccess.PublishingIntervalMs into the gw's SubscribeBulk
bufferedUpdateIntervalMs parameter on both subscribe paths:

- GalaxyDriver.SubscribeAsync — when the caller passes TimeSpan.Zero
  (typical for infrastructure callers like the deploy watcher), the
  driver substitutes _options.MxAccess.PublishingIntervalMs. When the
  caller sets a non-zero interval (the server's UA subscription
  publishingInterval), that wins.
- PerPlatformProbeWatcher — new bufferedUpdateIntervalMs ctor parameter
  defaulting to 0 (gw default cadence). GalaxyDriver passes
  _options.MxAccess.PublishingIntervalMs so probe ScanState changes
  publish at the configured rate.
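
The caller-wins-when-non-zero rule can be sketched in Python as below. The function name is hypothetical, and rejecting negative values for both inputs is an assumption (the commit only mentions a negative-rejected test on the probe watcher).

```python
def effective_buffered_interval_ms(caller_interval_ms: int,
                                   configured_interval_ms: int) -> int:
    """Pick the interval to send with SubscribeBulk.

    A non-zero caller interval wins; zero falls back to the configured
    publishing interval.
    """
    if caller_interval_ms < 0 or configured_interval_ms < 0:
        # Assumption: negative intervals are rejected up front.
        raise ValueError("interval must be non-negative")
    if caller_interval_ms != 0:
        return caller_interval_ms
    return configured_interval_ms
```

With a configured interval of 1000 ms, a caller passing 0 subscribes at 1000 ms, while a caller passing 250 subscribes at 250 ms.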

Tests: caller-wins-when-non-zero, fallback-to-config-when-zero on the
driver; default-zero, configured-forwarded, negative-rejected on the
probe watcher.

A session-level SetBufferedUpdateInterval RPC exists in the gw protocol
(MxCommandKind.SetBufferedUpdateInterval) but the .NET client doesn't
expose a typed helper yet — adjusting an existing subscription's
interval is a follow-up. Today's path subscribes once with the right
interval, which covers the common case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:56:33 -04:00
Joseph Doherty 7b21c3b428 PR 6.2 — Bounded EventPump channel + drop-newest metrics
Decouples the gw stream-read loop from the listener-fanout loop with a
bounded Channel<MxEvent> (default capacity 50_000) sitting between them.
When a slow listener fills the channel, the producer's TryWrite returns
false and we count the drop rather than back-pressuring the gw stream.

Three counters on the ZB.MOM.WW.OtOpcUa.Driver.Galaxy meter expose the
pressure curve before it manifests as user-visible loss:

- galaxy.events.received  — MxEvents read from StreamEvents
- galaxy.events.dispatched — MxEvents that made it through to OnDataChange
- galaxy.events.dropped   — MxEvents discarded because the channel was full
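
The drop-newest behaviour and the three counters can be sketched in Python as follows. This is a single-threaded illustration using the standard library queue, not the driver's Channel-based C# code; the class and method names are hypothetical.

```python
import queue


class EventPumpSketch:
    """Bounded buffer between a stream reader and a listener fan-out."""

    def __init__(self, capacity: int = 50_000):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._channel = queue.Queue(maxsize=capacity)
        self.received = 0
        self.dispatched = 0
        self.dropped = 0

    def on_stream_event(self, event) -> None:
        """Producer side: count the event and never block the stream read."""
        self.received += 1
        try:
            self._channel.put_nowait(event)
        except queue.Full:
            # Drop-newest: the incoming event is discarded, queued ones stay.
            self.dropped += 1

    def dispatch_one(self, listener) -> bool:
        """Consumer side: forward one queued event to the listener, if any."""
        try:
            event = self._channel.get_nowait()
        except queue.Empty:
            return False
        listener(event)
        self.dispatched += 1
        return True

    @property
    def in_flight(self) -> int:
        return self._channel.qsize()
```

At any quiescent point the sketch satisfies received = dispatched + dropped + in-flight, which makes the three counters cross-checkable.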

Each measurement carries a galaxy.client tag so multi-driver hosts can
split by source. The driver wires _options.MxAccess.ClientName into the
new EventPump constructor parameter.

Tests: drop-newest under pressure, capacity validation, and per-pump
measurement filtering (xUnit can run other pump tests in parallel and
their measurements land on the same listener — the test filters to its
own client name).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:50:39 -04:00
Joseph Doherty 619207e7f5 PR 6.1 — OpenTelemetry traces around gw calls
In-box ActivitySource ("ZB.MOM.WW.OtOpcUa.Driver.Galaxy") wrapped around
the three gw-facing seams via decorators:

- TracedGalaxySubscriber — galaxy.subscribe_bulk / galaxy.unsubscribe_bulk
  / galaxy.stream_events spans. Stream span covers the entire stream
  lifetime with a galaxy.event_count tag (per-event spans would dominate
  the trace volume at 50k tags / 1Hz; PR 6.2 owns per-event metrics).
- TracedGalaxyDataWriter — galaxy.write spans tagged with
  galaxy.tag_count, galaxy.secured_write_count (split between FreeAccess
  /Operate vs Tune/Configure/VerifiedWrite, computed only when a listener
  is recording so the hot path stays free), galaxy.success_count.
- TracedGalaxyHierarchySource — galaxy.get_hierarchy spans tagged with
  galaxy.object_count.

GalaxyDriver.BuildProductionRuntimeAsync wraps the production seams in
the decorators. The driver itself doesn't take an OpenTelemetry package
dependency — System.Diagnostics.ActivitySource is in-box; the host
process picks the listener.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:36:47 -04:00
Joseph Doherty 78fe3e8a45 PR 5.W — Galaxy.ParityMatrix.md
Tabular scenario × result map for the seven Phase 5 parity scenarios
(BrowseAndRead, Subscribe, Write, Alarm, History, Reconnect, ScanState).
Each row records the assertion strength (green strict, yellow soft) and
flags accepted-delta cases:

- Transport-entry host name divergence (legacy = Galaxy.Host process,
  mxgw = MxAccess.ClientName)
- Reconnect latency cadence — different paths, both correct for their
  own session shape
- Sampled-read value drift (we pin StatusCode + type, not value)
- Event-rate ±50% tolerance over a 3s window
- Per-driver IHistoryProvider absence (architectural pin from PR 1.3)

Phase 7 (PR 7.1) consumes this matrix as the default-flip gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:32:20 -04:00
Joseph Doherty 837172ab39 PR 5.8 — Per-platform ScanState probe parity scenarios
Closes Phase 5 scenario coverage. Both
GalaxyRuntimeProbeManager (legacy) and PerPlatformProbeWatcher (PR 4.7)
must surface the same per-host status stream:

- GetHostStatuses_emits_same_host_set_after_Discover — drives Discover
  on both backends, waits 1.5s for the probe watcher's first push, then
  asserts the platform-host set agrees (transport-entry names differ
  by design — legacy uses the Galaxy.Host process identity, mxgw uses
  MxAccess.ClientName, so we strip those before comparing).
- GetHostStatuses_state_per_platform_matches_across_backends — for
  every overlapping platform host, the HostState must be identical.
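
Both comparisons can be sketched in Python as below. This is an illustration under assumptions: host statuses are modelled as a name-to-state dictionary, and the function names and sample host names are hypothetical.

```python
def platform_hosts(statuses: dict, transport_entry: str) -> dict:
    """Drop the backend-specific transport entry before comparing."""
    return {host: state for host, state in statuses.items()
            if host != transport_entry}


def compare_host_statuses(legacy: dict, legacy_transport: str,
                          mxgw: dict, mxgw_transport: str):
    """Return (same_host_set, hosts_whose_state_differs)."""
    legacy_platforms = platform_hosts(legacy, legacy_transport)
    mxgw_platforms = platform_hosts(mxgw, mxgw_transport)
    same_set = legacy_platforms.keys() == mxgw_platforms.keys()
    overlapping = legacy_platforms.keys() & mxgw_platforms.keys()
    state_mismatches = sorted(
        host for host in overlapping
        if legacy_platforms[host] != mxgw_platforms[host]
    )
    return same_set, state_mismatches
```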

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:31:09 -04:00
Joseph Doherty 80a0ca2651 PR 5.7 — Reconnect / disruption parity scenarios
- Reinitialize_returns_both_backends_to_Healthy — drives
  ReinitializeAsync on each backend, asserts DriverState.Healthy
  afterwards, then re-reads a 3-tag sample to confirm the runtime
  surface is back. Recovery latency isn't pinned tightly (legacy = pipe
  + MxAccess COM client, mxgw = re-Register gw session — different
  cadences are expected).
- Health_state_diverges_only_when_one_backend_is_in_recovery — soft
  pin that both backends sit in Healthy or Degraded after init.

A tighter fault-injection scenario (toxiproxy-style) is the 5.7
follow-up, to land once the parity rig grows that capability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:29:44 -04:00
Joseph Doherty 8d042c631b PR 5.6 — History-read parity scenarios
Galaxy history reads route through the server-owned HistoryRouter
(Phase 1, PR 1.3) — neither Galaxy backend implements IHistoryProvider
directly. Parity surface here is the routing decision:

- Discover_emits_same_historized_attribute_set_for_both_backends — the
  IsHistorized attribute set must agree symmetric-set-wise; that's what
  HistoryRouter consumes when deciding whether to route a HistoryRead to
  the Wonderware historian sidecar.
- Neither_Galaxy_backend_implements_IHistoryProvider_directly — pins
  the architectural decision so a regression that re-introduces a
  per-driver history path fires.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:29:01 -04:00
Joseph Doherty bbdbdf8afb PR 5.5 — Alarm transition parity scenarios
- Discover_emits_same_AlarmConditionInfo_per_alarm_attribute — both
  backends produce the same alarm-condition source-node-id set, with
  matching SourceName / InitialSeverity / InAlarmRef / DescAttrNameRef
  per condition. Skips when the rig's Galaxy carries no alarm-marked
  attributes.
- Discover_marks_at_least_one_alarm_attribute_when_dev_Galaxy_has_alarms
  — IsAlarm-marked variable count parity, soft-pinned (count must
  match across backends but doesn't have to be non-zero).

Alarm-event persistence (the SQLite store-and-forward → Wonderware
historian event store path) is exercised in PR 5.6 against the
historian sidecar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:28:13 -04:00
Joseph Doherty 982771df9a PR 5.4 — Write-by-classification parity scenarios
Both backends route a write through the same path keyed off the attribute's
SecurityClassification, so a single write request must produce the same
StatusCode on each:

- FreeAccess_or_Operate_write_returns_same_StatusCode_on_both_backends
  picks the first numeric FreeAccess/Operate attribute and writes 0.0.
- Configure_class_write_routes_through_secured_path_on_both_backends
  picks a Configure/Tune attribute, writes through the secured path,
  asserts StatusCode parity (the test doesn't care whether the write
  succeeds — only that both backends produce the same outcome).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:26:57 -04:00
Joseph Doherty 9db6da9c20 PR 5.3 — Subscribe + event-rate parity scenarios
- Subscribe_returns_a_handle_for_each_backend — both backends accept
  the same full-reference list and return a non-null handle, with
  symmetric Unsubscribe cleanup.
- Subscribe_event_rate_within_tolerance_for_a_3s_window — counts
  OnDataChange invocations on each backend across a 3s window and
  asserts the mxgw/legacy ratio sits in [0.5, 1.5]. Skips when the
  sampled tags don't change in the window (configuration-only Galaxy).
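
The tolerance check in the second scenario can be sketched roughly as below. This is an illustration only, not the parity rig's code: `EventRateWithinTolerance` and its parameters are hypothetical names, and returning `null` stands in for the test's skip path.

```csharp
using System;

// Hypothetical helper, not the parity rig's code. Returns null when neither
// backend saw a data change (the scenario skips), otherwise whether the
// mxgw/legacy event-count ratio falls inside [lo, hi].
static bool? EventRateWithinTolerance(int legacyCount, int mxgwCount,
                                      double lo = 0.5, double hi = 1.5)
{
    if (legacyCount == 0 && mxgwCount == 0) return null; // static Galaxy: skip
    if (legacyCount == 0) return false;                  // ratio undefined: treat as a mismatch
    double ratio = (double)mxgwCount / legacyCount;
    return ratio >= lo && ratio <= hi;
}

// Illustrative counts, as they might come out of a 3s window.
Console.WriteLine(EventRateWithinTolerance(legacyCount: 40, mxgwCount: 31));
```

Treating "legacy saw nothing, mxgw saw something" as a mismatch rather than a skip is an assumption of this sketch; the commit message only describes the skip for tags that do not change at all.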

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:25:42 -04:00
Joseph Doherty 71443ecbf3 PR 5.2 — Browse + read parity scenarios
Three scenarios using ParityHarness.RequireBoth:

- Discover_emits_same_variable_set_for_both_backends — symmetric set diff
  on the full-reference set must be empty.
- Discover_emits_same_DataType_and_SecurityClass_per_attribute — meta
  triple (DriverDataType, SecurityClass, IsHistorized) must match per
  attribute.
- Read_returns_same_value_and_status_for_a_sampled_attribute — samples
  the first 5 discovered variables, reads through both backends, asserts
  StatusCode equality and value-CLR-type equality (raw values may drift
  between the two reads on a live Galaxy).
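
The "symmetric set diff must be empty" check in the first scenario amounts to something like the sketch below. `SymmetricDiff` is a hypothetical helper and the tag names are made up; the ordinal (case-sensitive) comparison is also an assumption, since the commit message does not say how references are compared.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the full references that appear on exactly one backend.
// An empty result means both backends discovered the same variable set.
static HashSet<string> SymmetricDiff(IEnumerable<string> legacyRefs,
                                     IEnumerable<string> mxgwRefs)
{
    var diff = new HashSet<string>(legacyRefs, StringComparer.Ordinal);
    diff.SymmetricExceptWith(mxgwRefs); // keeps items present in one set but not both
    return diff;
}

// Illustrative references only.
var onlyOnOneSide = SymmetricDiff(
    new[] { "Tank_001.Level", "Tank_001.Temp" },
    new[] { "Tank_001.Level", "Tank_001.Pressure" });
Console.WriteLine(string.Join(", ", onlyOnOneSide));
```

Reporting the leftover references, rather than only comparing counts, makes a parity failure point directly at the attributes one backend missed.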

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:24:36 -04:00
169 changed files with 10237 additions and 14818 deletions
+73 -26
@@ -4,15 +4,39 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Goal
Build an OPC UA server on .NET Framework 4.8 (32-bit) that exposes AVEVA System Platform (Wonderware) Galaxy tags via the MXAccess toolkit. The server mirrors the Galaxy object hierarchy as an OPC UA address space, translating between contained-name browse paths and tag-name runtime references.
Build an OPC UA server (.NET 10) that exposes AVEVA System Platform
(Wonderware) Galaxy tags. The server mirrors the Galaxy object
hierarchy as an OPC UA address space, translating between
contained-name browse paths and tag-name runtime references. Galaxy
access flows through the in-process `GalaxyDriver`
(`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/`) talking gRPC to a separately
installed **mxaccessgw** gateway process. The gateway owns the
MXAccess COM bitness constraint (its worker is x86 net48); everything
in this repo is .NET 10. PR 7.2 retired the legacy in-process
`Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared` projects + the
`OtOpcUaGalaxyHost` Windows service.
See `lmx_mxgw.md` for the migration design and
`docs/v2/Galaxy.Performance.md` for the runtime perf surface
(tracing, metrics, soak harness).
## Architecture Overview
### Data Flow
1. **Galaxy Repository DB (ZB)** — SQL Server database holding the deployed object hierarchy and attribute definitions. Queried at startup and on change detection to build/rebuild the OPC UA address space.
2. **MXAccess COM API** — Runtime data access layer. Subscribes to Galaxy tag attributes for live read/write. Requires a dedicated STA thread with a Win32 message pump for COM callbacks.
3. **OPC UA Server** — Exposes the hierarchy as browse nodes and attributes as variable nodes. Clients browse via contained names but reads/writes are translated to `tag_name.AttributeName` format for MXAccess.
1. **Galaxy Repository DB (ZB)** — SQL Server database holding the
deployed object hierarchy and attribute definitions. The
mxaccessgw's `GalaxyRepositoryClient` queries it via gRPC; the
driver consumes the materialised hierarchy through
`IGalaxyHierarchySource`.
2. **MXAccess (via mxaccessgw)** — Live read/write/subscribe over a
gRPC session. The gateway owns the COM apartment + STA pump
server-side; the driver speaks `MxCommand` / `MxEvent` protos
exclusively.
3. **OPC UA Server** — Exposes the hierarchy as browse nodes and
attributes as variable nodes. Clients browse via contained names
but reads/writes are translated to `tag_name.AttributeName` format
for MXAccess.
### Key Concept: Contained Name vs Tag Name
@@ -22,30 +46,17 @@ Galaxy objects have two names:
Example: browsing `TestMachine_001/DelmiaReceiver/DownloadPath` translates to MXAccess reference `DelmiaReceiver_001.DownloadPath`.
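
As a rough sketch of that translation, assuming the hierarchy has already been flattened into a lookup from contained-name object path to tag name (the `ToRuntimeReference` helper and the dictionary shape are hypothetical, not the driver's API):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a contained-name browse path into object path and
// attribute, then swaps the object path for the object's tag name.
static string ToRuntimeReference(string browsePath,
                                 IReadOnlyDictionary<string, string> tagNameByContainedPath)
{
    int cut = browsePath.LastIndexOf('/');
    if (cut <= 0 || cut == browsePath.Length - 1)
        throw new ArgumentException("Expected '<object path>/<attribute>'.", nameof(browsePath));

    string objectPath = browsePath.Substring(0, cut);
    string attribute = browsePath.Substring(cut + 1);

    if (!tagNameByContainedPath.TryGetValue(objectPath, out var tagName))
        throw new KeyNotFoundException($"No Galaxy object at '{objectPath}'.");

    return $"{tagName}.{attribute}";
}

// Lookup built from the example above.
var lookup = new Dictionary<string, string>
{
    ["TestMachine_001/DelmiaReceiver"] = "DelmiaReceiver_001",
};
Console.WriteLine(ToRuntimeReference("TestMachine_001/DelmiaReceiver/DownloadPath", lookup));
```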
See `gr/layout.md` for the full mapping and target OPC UA structure.
### Data Type Mapping
Galaxy `mx_data_type` values map to OPC UA types (Boolean, Int32, Float, Double, String, DateTime, etc.). Array attributes use ValueRank=1 with ArrayDimensions from the Galaxy attribute definition. Full mapping in `gr/data_type_mapping.md`.
Galaxy `mx_data_type` values map to OPC UA types (Boolean, Int32, Float, Double, String, DateTime, etc.). Array attributes use ValueRank=1 with ArrayDimensions from the Galaxy attribute definition. The driver-side mapping lives in `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DataTypeMap.cs`.
### Change Detection
Poll `galaxy.time_of_last_deploy` in the ZB database to detect redeployments, then rebuild the address space. See `gr/build_layout_plan.md` for the step-by-step plan.
`DeployWatcher` (`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/Browse/DeployWatcher.cs`) polls the gateway's deploy-event signal and raises `IRediscoverable.OnRediscoveryNeeded` when the Galaxy redeploys. The server's `DriverHost` consumes the signal and rebuilds the address space.
## Reference Implementation
## mxaccessgw
An existing MXAccess client implementation is at:
`C:\Users\dohertj2\Desktop\scadalink-design\lmxproxy\src\ZB.MOM.WW.LmxProxy.Host`
Key patterns from that codebase:
- **StaComThread** — Dedicated STA thread with Win32 message pump (`GetMessage`/`DispatchMessage` loop). All MXAccess COM objects must be created and called on this thread. Uses `PostThreadMessage(WM_APP)` to marshal work items.
- **LMXProxyServer COM object** — `Register(clientName)` returns a connection handle. `AddItem(handle, address)` + `AdviseSupervisory(handle, itemHandle)` for subscriptions. `OnDataChange`/`OnWriteComplete` events for callbacks.
- **Reconnect** — Stored subscriptions are replayed after reconnect. A probe tag subscription monitors connection health.
- **COM cleanup** — `Marshal.ReleaseComObject()` on disconnect. Event handlers must be unwired before unregister.
## MXAccess Documentation
`mxaccess_documentation.md` in the project root contains the full ArchestrA MXAccess Toolkit User's Guide. Key API: `ArchestrA.MxAccess` namespace, `LMXProxyServer` class. The toolkit DLLs are in `Program Files (x86)\ArchestrA\Framework\bin`.
The gateway lives in a sibling repo at `c:\Users\dohertj2\Desktop\mxaccessgw\`. See `docs/v2/Galaxy.ParityRig.md` for the gw setup recipe (build, API key provisioning via `apikey create-key`, env-var overrides for HTTP/2 cleartext + worker path). The gw's MXAccess Toolkit reference (its `gateway.md`) is the canonical MxAccess API doc; the standalone `mxaccess_documentation.md` previously kept in this repo was retired in PR 7.3.
## Galaxy Repository Database
@@ -71,11 +82,48 @@ dotnet test tests/ZB.MOM.WW.OtOpcUa.IntegrationTests # integration tests
dotnet test --filter "FullyQualifiedName~MyTestClass.MyMethod" # single test
```
## Docker Workflow (driver fixtures + central SQL Server)
> **Migrated 2026-04-28**: Docker config + host moved off this dev VM (DESKTOP-6JL3KKO) onto the shared Linux Docker host (`DOCKER`, 10.100.0.35) so the dev VM could shed WSL2/Hyper-V and have its GPU re-attached via ESXi passthrough. Docker Desktop is no longer installed here. All checked-in `appsettings.json` defaults, fixture-class default endpoints, and `e2e-config.sample.json` were rewritten to target `10.100.0.35`. The driver fixture compose files under `tests/.../Docker/docker-compose.yml` now carry a `project: lmxopcua` label on every service. See `docs/v2/dev-environment.md` for the full rewrite (header dated 2026-04-28).
Docker workloads run on a shared Linux host at **`10.100.0.35`** — not on this VM. Stacks live at `/opt/otopcua-<driver>/` on the host and carry the `project=lmxopcua` label so they're discoverable via `docker ps --filter label=project=lmxopcua`.
**`docker -H ssh://...` does NOT work from this VM.** Windows OpenSSH ↔ docker.exe stdio bridging hangs (`docker system dial-stdio` runs server-side but no API data flows). Use the helper below — it SSHes into the docker host and runs `docker compose` server-side.
**Use `lmxopcua-fix.ps1` (in `~/bin`) to control fixtures from this VM:**
```powershell
lmxopcua-fix ls # list all lmxopcua-tagged containers on the host
lmxopcua-fix up modbus standard # bring a profile up
lmxopcua-fix up abcip controllogix
lmxopcua-fix up s7 s7_1500
lmxopcua-fix up opcuaclient # single-service stack, no profile arg
lmxopcua-fix down modbus # tear stack down
lmxopcua-fix logs modbus
lmxopcua-fix sync modbus # rsync this repo's tests/.../Docker/ → /opt/otopcua-modbus/
```
**`sync` is the deployment step.** When you edit a fixture's compose file or Dockerfile under `tests/.../Docker/`, run `lmxopcua-fix sync <driver>` to push the changes to the docker host before bringing the stack up. The repo files are the source of truth; `/opt/otopcua-<driver>/` is a mirrored deployment.
**Endpoints (defaults already point at the docker host):**
- SQL Server (always-on): `10.100.0.35,14330` — used by `appsettings.json` for `ConfigDb`.
- Modbus: `10.100.0.35:5020` (`MODBUS_SIM_ENDPOINT`)
- AB CIP: `10.100.0.35:44818` (`AB_SERVER_ENDPOINT`)
- S7: `10.100.0.35:1102` (`S7_SIM_ENDPOINT`)
- OPC UA reference (opc-plc): `opc.tcp://10.100.0.35:50000` (`OPCUA_SIM_ENDPOINT`)
Override any endpoint via the env var to point at a real PLC. The local OtOpcUa server runs on this VM at `opc.tcp://localhost:4840`; **that's not on the docker host**.
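
The override pattern is roughly the following. This is a sketch of the idea, not the fixture classes' actual code: `ResolveEndpoint` is a hypothetical helper, and only the env var name and default endpoint come from the list above.

```csharp
using System;

// Hypothetical helper: prefer the environment variable, fall back to the
// checked-in default when it is unset or blank.
static string ResolveEndpoint(string envVar, string defaultEndpoint)
{
    var fromEnv = Environment.GetEnvironmentVariable(envVar);
    return string.IsNullOrWhiteSpace(fromEnv) ? defaultEndpoint : fromEnv.Trim();
}

// Falls back to the docker host unless MODBUS_SIM_ENDPOINT is set.
Console.WriteLine(ResolveEndpoint("MODBUS_SIM_ENDPOINT", "10.100.0.35:5020"));
```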
See `docs/v2/dev-environment.md` for the full inventory and rationale.
## Build & Runtime Constraints
- Language: C#, .NET Framework 4.8, **x86 (32-bit)** platform target — required for MXAccess COM interop
- MXAccess requires a deployed ArchestrA Platform on the machine running the server
- COM apartment: MXAccess objects must live on an STA thread with a message pump
- Language: C#, .NET 10, AnyCPU. The MXAccess COM bitness constraint
is owned by the mxaccessgw worker (x86 net48), not by anything in
this repo.
- The gateway's MXAccess worker requires a deployed ArchestrA Platform
on the machine running the gateway. The OtOpcUa server itself does
not.
## Transport Security
@@ -83,7 +131,7 @@ The server supports configurable OPC UA transport security via the `Security` se
## Redundancy
The server supports non-transparent warm/hot redundancy via the `Redundancy` section in `appsettings.json`. Two instances share the same Galaxy DB and MXAccess runtime but have unique `ApplicationUri` values. Each exposes `RedundancySupport`, `ServerUriArray`, and a dynamic `ServiceLevel` based on role and runtime health. The primary advertises a higher ServiceLevel than the secondary. See `docs/Redundancy.md` for the full guide.
The server supports non-transparent warm/hot redundancy via the `Redundancy` section in `appsettings.json`. Two instances share the same Galaxy DB and the same mxaccessgw (under distinct `MxAccess.ClientName` values) but have unique `ApplicationUri` values. Each exposes `RedundancySupport`, `ServerUriArray`, and a dynamic `ServiceLevel` based on role and runtime health. The primary advertises a higher ServiceLevel than the secondary. See `docs/Redundancy.md` for the full guide.
## LDAP Authentication
@@ -94,7 +142,6 @@ The server uses LDAP-based user authentication via the `Authentication.Ldap` sec
- **Logging**: Serilog with rolling daily file sink
- **Unit tests**: xUnit + Shouldly for assertions
- **Service hosting (Server, Admin)**: .NET generic host with `AddWindowsService` (decision #30 — replaced TopShelf in v2; see `src/ZB.MOM.WW.OtOpcUa.Server/OpcUaServerService.cs`)
- **Service hosting (Galaxy.Host)**: plain console app wrapped by NSSM (`.NET Framework 4.8 x86` — required by MXAccess COM bitness)
- **OPC UA**: OPC Foundation UA .NET Standard stack (https://github.com/opcfoundation/ua-.netstandard) — NuGet: `OPCFoundation.NetStandard.Opc.Ua.Server`
## OPC UA .NET Standard Documentation
-9
@@ -9,9 +9,6 @@
<Project Path="src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Server/ZB.MOM.WW.OtOpcUa.Server.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Admin/ZB.MOM.WW.OtOpcUa.Admin.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj"/>
<Project Path="src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.csproj"/>
@@ -46,12 +43,6 @@
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Server.Tests/ZB.MOM.WW.OtOpcUa.Server.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Admin.Tests/ZB.MOM.WW.OtOpcUa.Admin.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests/ZB.MOM.WW.OtOpcUa.Admin.E2ETests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.TestSupport.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.E2E.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Tests.csproj"/>
<Project Path="tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client.Tests.csproj"/>
+41 -112
@@ -2,132 +2,61 @@
## Overview
A production OtOpcUa deployment runs **three processes**, each with a distinct runtime, platform target, and install surface:
A production OtOpcUa deployment runs **two or three processes**, each
with a distinct runtime and install surface:
| Process | Project | Runtime | Platform | Responsibility |
|---|---|---|---|---|
| **OtOpcUa Server** | `src/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every non-Galaxy driver in-process; exposes `/healthz`. |
| **OtOpcUa Server** | `src/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every driver in-process (Modbus, S7, AbCip, AbLegacy, TwinCAT, FOCAS, OPC UA Client, Galaxy via mxaccessgw); exposes `/healthz`. |
| **OtOpcUa Admin** | `src/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
| **OtOpcUa Galaxy.Host** | `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host` | .NET Framework 4.8 | x86 (32-bit) | Hosts MXAccess COM on a dedicated STA thread with a Win32 message pump; exposes a named-pipe IPC surface consumed by `Driver.Galaxy.Proxy` inside the Server process. |
| **OtOpcUa Wonderware Historian** *(optional)* | `src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true` in `appsettings.json`. |
The x86 / .NET Framework 4.8 constraint applies **only** to Galaxy.Host because the MXAccess toolkit DLLs (`Program Files (x86)\ArchestrA\Framework\bin`) are 32-bit-only COM. Every other driver (Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS) runs in-process in the 64-bit Server.
Galaxy access uses a separately-installed **mxaccessgw** running out
of a sibling repo (`c:\Users\dohertj2\Desktop\mxaccessgw\`) — see
`docs/v2/Galaxy.ParityRig.md` for setup. The mxaccessgw owns the
MXAccess COM bitness constraint (its worker is x86 net48); nothing
in the OtOpcUa repo carries that constraint anymore. PR 7.2 retired
the legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
projects + the `OtOpcUaGalaxyHost` Windows service.
## Server process
## OtOpcUa Server
`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` uses the generic host:
Hosted via `Microsoft.Extensions.Hosting` with `AddWindowsService`
(decision #30 — replaced TopShelf in v2). The host's `Build()`
returns immediately when launched interactively (e.g. `dotnet run`)
but blocks for SCM signals when running as a Windows service.
```csharp
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSerilog();
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");
builder.Services.AddHostedService<OpcUaServerService>();
builder.Services.AddHostedService<HostStatusPublisher>();
```
In-process drivers are registered at startup in `Program.cs`'s
`DriverFactoryRegistry` block; the `DriverInstance` rows in the
central Config DB select which driver factories materialise into
live `IDriver` instances. See `docs/v2/driver-specs.md` for the
per-driver `DriverConfig` JSON shapes.
`OpcUaServerService` is a `BackgroundService` (decision #30 — TopShelf from v1 was replaced by the generic-host `AddWindowsService` wrapper; no TopShelf dependency remains in any csproj). It owns:
## OtOpcUa Admin
1. Config bootstrap — reads `Node:NodeId`, `Node:ClusterId`, `Node:ConfigDbConnectionString`, `Node:LocalCachePath` from `appsettings.json`.
2. `NodeBootstrap` — pulls the latest published generation from the Config DB into the LiteDB local cache (`LiteDbConfigCache`) so the node starts even if the central DB is briefly unreachable.
3. `DriverHost` — instantiates configured driver instances from the generation, wires each through `CapabilityInvoker` resilience pipelines.
4. `OpcUaApplicationHost` — builds the OPC UA endpoint, applies `OpcUaServerOptions` + `LdapOptions`, registers `AuthorizationGate` at dispatch.
5. `HostStatusPublisher` — a second hosted service that heartbeats `DriverHostStatus` rows so the Admin UI Fleet view sees the node.
Same hosting model; runs the Blazor Server UI + SignalR hubs.
Reads from the same Config DB the Server writes to.
### Installation
## OtOpcUa Wonderware Historian (optional)
Same executable, different modes driven by the .NET generic-host `AddWindowsService` wrapper:
When `Historian:Wonderware:Enabled=true`, the Server speaks to a
sidecar that wraps the Wonderware Historian SDK (which is .NET
Framework only). The pipe IPC contract is in
`src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`
and the sidecar's pipe handler lives at
`src/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`.
| Mode | Invocation |
|---|---|
| Console | `ZB.MOM.WW.OtOpcUa.Server.exe` |
| Install as Windows service | `sc create OtOpcUa binPath="C:\Program Files\OtOpcUa\Server\ZB.MOM.WW.OtOpcUa.Server.exe" start=auto` |
| Start | `sc start OtOpcUa` |
| Stop | `sc stop OtOpcUa` |
| Uninstall | `sc delete OtOpcUa` |
Install via the `-InstallWonderwareHistorian` switch on
`scripts/install/Install-Services.ps1`.
### Health endpoints
## Install / Uninstall
The Server exposes `/healthz` + `/readyz` used by (a) the Admin `FleetStatusPoller` as input to Fleet status and (b) `PeerReachabilityTracker` in a peer Server process as the HTTP side of the peer-reachability probe.
- `scripts/install/Install-Services.ps1` installs `OtOpcUa` and
optionally `OtOpcUaWonderwareHistorian`.
- `scripts/install/Uninstall-Services.ps1` — stops + removes both,
plus `OtOpcUaGalaxyHost` if a pre-7.2 rig still carries it.
## Admin process
## Logging
`src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs` is a stock `WebApplication`. Highlights:
- Cookie auth (`CookieAuthenticationDefaults`, scheme name `OtOpcUa.Admin`) + Blazor Server (`AddInteractiveServerComponents`) + SignalR.
- Authorization policies gated by `AdminRoles`: `ConfigViewer`, `ConfigEditor`, `FleetAdmin` (see `Services/AdminRoles.cs`). `CanEdit` policy requires `ConfigEditor` or `FleetAdmin`; `CanPublish` requires `FleetAdmin`.
- `OtOpcUaConfigDbContext` registered against `ConnectionStrings:ConfigDb`.
- Scoped services: `ClusterService`, `GenerationService`, `EquipmentService`, `UnsService`, `NamespaceService`, `DriverInstanceService`, `NodeAclService`, `PermissionProbeService`, `AclChangeNotifier`, `ReservationService`, `DraftValidationService`, `AuditLogService`, `HostStatusService`, `ClusterNodeService`, `EquipmentImportBatchService`, `ILdapGroupRoleMappingService`.
- Singleton `RedundancyMetrics` (meter name `ZB.MOM.WW.OtOpcUa.Redundancy`) + `CertTrustService` (promotes rejected client certs in the Server's PKI store to trusted via the Admin Certificates page).
- `LdapAuthService` bound to `Authentication:Ldap` — same LDAP flow as ScadaLink CentralUI for visual parity.
- SignalR hubs mapped at `/hubs/fleet` and `/hubs/alerts`; `FleetStatusPoller` runs as a hosted service and pushes `RoleChanged`, host status, and alert events.
- OpenTelemetry → Prometheus exporter at `/metrics` when `Metrics:Prometheus:Enabled=true` (default). Pull-based means no Collector required in the common K8s deploy.
### Installation
Deployed as an ASP.NET Core service; the generic-host `AddWindowsService` wrapper (or IIS reverse-proxy for multi-node fleets) provides install/uninstall. Listens on whatever `ASPNETCORE_URLS` specifies.
## Galaxy.Host process
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Program.cs` is a .NET Framework 4.8 x86 console executable. Configuration comes from environment variables supplied by the supervisor (`Driver.Galaxy.Proxy.Supervisor`):
| Env var | Purpose |
|---|---|
| `OTOPCUA_GALAXY_PIPE` | Pipe name the host listens on (default `OtOpcUaGalaxy`). |
| `OTOPCUA_ALLOWED_SID` | SID of the Server process's principal; anyone else is refused during the handshake. |
| `OTOPCUA_GALAXY_SECRET` | Per-spawn shared secret the client must present in the Hello frame. |
| `OTOPCUA_GALAXY_BACKEND` | `mxaccess` (default), `db` (ZB-only, no COM), `stub` (in-memory; for tests). |
| `OTOPCUA_GALAXY_ZB_CONN` | SQL connection string to the ZB Galaxy repository. |
| `OTOPCUA_HISTORIAN_*` | Optional Wonderware Historian SDK config if Historian is enabled for this node. |
The host spins up `StaPump` (the STA thread with message pump), creates the MXAccess `LMXProxyServer` COM object on that thread, and handles all COM calls there; the IPC layer marshals work items via `PostThreadMessage`.
### Pipe security
`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` + uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed via `NamedPipeServerStream.RunAsClient` + a SID comparison against the configured allow list. The DACL grants `ReadWrite | Synchronize` only to the allowed SID and denies `LocalSystem`. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`.
### Installation
NSSM-wrapped (the Non-Sucking Service Manager) because the executable itself is a plain console app, not a `ServiceBase` Windows service. The supervisor then adopts the child process over the pipe after install. Install/uninstall commands follow the NSSM pattern:
```bash
nssm install OtOpcUaGalaxyHost "C:\Program Files (x86)\OtOpcUa\Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe"
nssm set OtOpcUaGalaxyHost ObjectName .\dohertj2 <password>
nssm set OtOpcUaGalaxyHost AppEnvironmentExtra OTOPCUA_GALAXY_BACKEND=mxaccess OTOPCUA_GALAXY_SECRET=OTOPCUA_ALLOWED_SID=
nssm start OtOpcUaGalaxyHost
```
(Exact values for the environment block are generated by the Admin UI + committed alongside `.local/galaxy-host-secret.txt` on the dev box.)
## Inter-process communication
```
┌──────────────────────────┐ LDAP bind (Authentication:Ldap) ┌──────────────────────────┐
│ OtOpcUa Admin (x64) │ ─────────────────────────────────────────────▶│ LDAP / AD │
│ Blazor Server + SignalR │ └──────────────────────────┘
│ /metrics (Prometheus) │ FleetStatusPoller → ClusterNode poll
│ │ ─────────────────────────────────────────────▶┌──────────────────────────┐
│ │ Cluster/Generation/ACL writes │ Config DB (SQL Server) │
└──────────────────────────┘ ─────────────────────────────────────────────▶│ OtOpcUaConfigDbContext │
▲ └──────────────────────────┘
│ SignalR ▲
│ (role change, │ sp_GetCurrentGenerationForCluster
│ host status, │ sp_PublishGeneration
│ alerts) │
┌──────────────────────────┐ │
│ OtOpcUa Server (x64) │ ──────────────────────────────────────────────────────────┘
│ OPC UA endpoint │
│ Non-Galaxy drivers │ Named pipe (OtOpcUaGalaxy) ┌──────────────────────────┐
│ Driver.Galaxy.Proxy │ ─────────────────────────────────────────────▶│ Galaxy.Host (x86 .NFx) │
│ │ SID + shared-secret handshake │ STA + message pump │
│ /healthz /readyz │ │ MXAccess COM │
└──────────────────────────┘ │ Historian SDK (opt) │
└──────────────────────────┘
```
## appsettings.json boundary
Each process reads its own `appsettings.json` for **bootstrap only** — connection strings, LDAP bind config, transport security profile, redundancy node id, logging. The authoritative configuration tree (drivers, UNS, tags, ACLs) lives in the Config DB and is edited through the Admin UI. See [`Configuration.md`](Configuration.md) for the split.
## Development bootstrap
For the Windows install steps (SQL Server in Docker, .NET 10 SDK, .NET Framework 4.8 SDK, Docker Desktop WSL 2 backend, EF Core CLI, first-run migration), see [`docs/v2/dev-environment.md`](v2/dev-environment.md).
Serilog with rolling-daily file sinks. Each service writes to
`%ProgramData%\OtOpcUa\<service>-*.log` plus stdout (NSSM-friendly).
+623
@@ -0,0 +1,623 @@
# Driver Feature Gaps vs Commercial OPC/SCADA Gateways
This document compares each non-Modbus, non-LMX driver in the OtOpcUa server against the feature surfaces of the dominant commercial gateways (Kepware KEPServerEX / PTC Kepware Edge, AVEVA OI Server / DAServer, Software Toolbox TOP Server, Matrikon, Unified Automation UaGateway, MTConnect-class Fanuc adapters, Beckhoff TF6100, etc.).
The intent is to:
- inventory what we already ship (with file:line citations into the current codebase)
- list missing or under-served features that are table-stakes for sites replacing those commercial gateways
- preserve the design choices that should NOT change just because a competitor does it differently
LMX (Galaxy / MXAccess) and Modbus are tracked elsewhere and are excluded here.
## Drivers covered
| Driver | Section | Implementation plan |
|---|---|---|
| AbCip — Allen-Bradley EtherNet/IP (ControlLogix / CompactLogix / Micro800 / GuardLogix) | [AbCip](#abcip-allen-bradley-ethernetip--logix) | [`plans/abcip-plan.md`](plans/abcip-plan.md) |
| AbLegacy — Allen-Bradley PLC-5 / SLC / MicroLogix (PCCC) | [AbLegacy](#ablegacy-allen-bradley-plc-5--slc--micrologix) | [`plans/ablegacy-plan.md`](plans/ablegacy-plan.md) |
| FOCAS — Fanuc CNC FOCAS / FOCAS2 | [FOCAS](#focas-fanuc-cnc) | [`plans/focas-plan.md`](plans/focas-plan.md) |
| OpcUaClient — OPC UA aggregation client | [OpcUaClient](#opcuaclient-opc-ua-aggregation-client) | [`plans/opcuaclient-plan.md`](plans/opcuaclient-plan.md) |
| S7 — Siemens S7-300 / 400 / 1200 / 1500 | [S7](#s7-siemens-s7-3004001200--1500) | [`plans/s7-plan.md`](plans/s7-plan.md) |
| TwinCAT — Beckhoff TwinCAT 2 / 3 (ADS) | [TwinCAT](#twincat-beckhoff-ads) | [`plans/twincat-plan.md`](plans/twincat-plan.md) |
## How to read this document
Every gap below is rated **[Build]** (recommended) or **[Skip]** (not recommended) inline at the start of the bullet. The same rating appears in the per-driver `### Recommendations` table with its rationale. The per-driver implementation plan in `docs/plans/` covers the **[Build]** items only.
---
## AbCip (Allen-Bradley EtherNet/IP — Logix)
### What we ship today
- Per-device `ab://gateway[:port]/cip-path` host-address with multi-hop CIP path via a comma-separated string (e.g. `1,2,2,192.168.50.20,1,0`) — `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipHostAddress.cs:23`.
- Four PLC-family profiles (`ControlLogix`, `CompactLogix`, `Micro800`, `GuardLogix`) selecting libplctag plc attribute, ConnectionSize default (504/4002/488), default CIP path (`1,0` or empty), connected-vs-unconnected hint, request-packing flag, and MaxFragmentBytes — `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/PlcFamilies/AbCipPlcFamilyProfile.cs:13-62`.
- N devices per driver instance with per-device bulkhead/breaker keying — `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDriverOptions.cs:19`.
- Pre-declared static tag map (`AbCipTagDefinition`) keyed by `Name`, with `TagPath`, `DataType`, `Writable`, `WriteIdempotent`, `Members`, `SafetyTag` (`AbCipDriverOptions.cs:95-103`).
- Logix atomic types `BOOL/SINT/INT/DINT/LINT/USINT/UINT/UDINT/ULINT/REAL/LREAL/STRING/DT` plus `Structure` marker — `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip/AbCipDataType.cs:16-37`.
- Optional online controller browse via libplctag `@tags` pseudo-tag, surfaced under a `Discovered/` sub-folder; controller- and program-scope (`Program:Main.X`) tags emitted; system/module/routine/task tags filtered — `AbCipDriver.cs:674-757`, `AbCipSystemTagFilter.cs`.
- UDT / Predefined-Structure handling: declaration-driven member fan-out (Variable per member) plus runtime CIP Template Object (class 0x6C) decoder + per-device `(deviceHostAddress, templateInstanceId)` template cache — `CipTemplateObjectDecoder.cs`, `AbCipTemplateCache.cs`, `AbCipDriver.cs:70-103`.
- Whole-UDT read coalescing — `AbCipUdtReadPlanner` groups members of the same parent and reads the parent once, decoding members from the buffer at computed byte offsets — `AbCipDriver.cs:323-449`, `AbCipUdtReadPlanner.cs`, `AbCipUdtMemberLayout.cs`.
- BOOL-in-DINT addressing (`Tag.N` bit-index) with read-decode + RMW write through a per-parent `SemaphoreSlim` and cached parent-DINT runtime — `AbCipDriver.cs:494-614`, `AbCipTagPath.cs`.
- Polling subscription overlay shared with other drivers (`PollGroupEngine`) — `AbCipDriver.cs:56-59,187-195`.
- Per-device connectivity probe with configurable interval/timeout/probe tag (default off until tag configured) and `OnHostStatusChanged` events — `AbCipDriverOptions.cs:131-143`, `AbCipDriver.cs:235-295`.
- ALMD alarm projection (opt-in) polling `InFaulted` + `Severity`, raising `OnAlarmEvent` on edges, with ack-write — `AbCipAlarmProjection.cs`, `AbCipDriverOptions.cs:42-58`.
- GuardLogix safety-tag flag forces `SecurityClassification.ViewOnly` — `AbCipDriverOptions.cs:89-94`, `AbCipDriver.cs:474-478`.
- libplctag-status → OPC UA StatusCode mapping (`BadCommunicationError`, `BadNotWritable`, `BadTypeMismatch`, `BadOutOfRange`, `BadNodeIdUnknown`) — `AbCipStatusMapper.cs`.
- Tier-B reinit (`ReinitializeAsync`) tearing down all `IAbCipTagRuntime` handles — `AbCipDriver.cs:163-167`.
- CLI test client: `probe`, `read`, `write`, `subscribe` against the same driver — `docs/Driver.AbCip.Cli.md`.
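
The whole-UDT read coalescing listed above can be illustrated with a small language-neutral sketch (Python here, not the driver's C#). It assumes a hypothetical `LAYOUT` table of member byte offsets and a caller-supplied `read_parent` function standing in for one whole-UDT read. Neither name comes from the driver, and the real offsets would come from the CIP Template Object rather than a hard-coded table.

```python
import struct
from collections import defaultdict

# Hypothetical member layout: (parent, member) -> (byte offset, struct format).
# Offsets and types are invented for illustration only.
LAYOUT = {
    ("Motor1", "Speed"): (0, "<f"),   # REAL, little-endian
    ("Motor1", "Count"): (4, "<i"),   # DINT
    ("Motor1", "Status"): (8, "<h"),  # INT
}

def plan_reads(requested):
    """Group requested 'Parent.Member' tags by parent so each parent is read once."""
    groups = defaultdict(list)
    for tag in requested:
        parent, member = tag.split(".", 1)
        groups[parent].append(member)
    return dict(groups)

def decode_members(parent, members, buffer):
    """Decode each member from the parent's raw buffer at its byte offset."""
    values = {}
    for member in members:
        offset, fmt = LAYOUT[(parent, member)]
        (values[f"{parent}.{member}"],) = struct.unpack_from(fmt, buffer, offset)
    return values

def read_coalesced(requested, read_parent):
    """read_parent(parent) -> bytes is a stand-in for one whole-UDT read."""
    result = {}
    for parent, members in plan_reads(requested).items():
        buffer = read_parent(parent)  # one round-trip per parent, not per member
        result.update(decode_members(parent, members, buffer))
    return result
```

Three member tags under the same parent cost one call to `read_parent` in this sketch; members whose parent is not grouped would still need their own reads.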
### Gaps vs commercial gateways
- **[Build]** **Offline tag import from L5K / L5X** — present in: both (Kepware Logix Database Settings; TOP Server Auto Tag Generation). Why it matters: lets engineers stage a project against a Studio 5000 export with no PLC online, the de-facto config workflow at Rockwell shops.
- **[Build]** **CSV tag import / export** — present in: both. Why it matters: Kepware/AVEVA users routinely round-trip tag lists through Excel; replacing them without CSV makes mass-config painful.
- **[Build]** **Tag descriptions / engineering metadata** — present in: both (descriptions imported with L5X). Why it matters: descriptions become the OPC UA `Description`/`DisplayName`, expected by HMI/Historian engineers.
- **[Build]** **Logical-blocking / logical-non-blocking protocol modes** — present in: both (TOP Server names them; Kepware exposes equivalent "Optimize for read" / structure-block reads). Why it matters: whole-UDT vs per-member read strategy is the single biggest performance lever; we have one-direction whole-UDT only via `AbCipUdtReadPlanner`, no structure-block read for non-grouped members.
- **[Build]** **Symbolic vs logical (instance-ID) addressing toggle** — present in: both. Why it matters: logical addressing skips ASCII parsing on every poll, ~3-5x faster for high-tag-count rigs; libplctag supports it but we don't expose the choice.
- **[Build]** **Configurable CIP Connection Size per device** — present in: both (Kepware 500-4000 byte slider, TOP Server "Max Packet Size"). Why it matters: we hard-code the family default (4002/504/488); no field knob to tune for switches that fragment large frames or for legacy v19 firmware that won't accept Large Forward Open.
- **[Skip]** **Inactivity timeout / connection idle disconnect** — present in: both. Why it matters: long-idle CIP sessions get reaped silently by some firewalls; commercial drivers expose a keep-alive cadence we don't.
- **[Build]** **Per-tag scan rate / scan group bucketing** — present in: both (Kepware "scan classes", AVEVA Topic update intervals). Why it matters: lets engineers separate fast 100ms machine-state tags from 5s recipe data; we have one publishing-interval-per-subscription with no per-tag override.
- **[Skip]** **"Respect tag-specified scan rate" mode** — present in: Kepware. Why it matters: lets the static tag table override client-requested rate, important when an HMI subscribes too fast and overruns the PLC.
- **[Skip]** **Initial value cache / "first updates from cache"** — present in: Kepware. Why it matters: avoids a stall while a fresh subscription waits for its first poll; common SCADA expectation.
- **[Build]** **Multi-tag write packing (write-multi)** — present in: both. Why it matters: we serialise writes one-by-one in `AbCipDriver.WriteAsync`; without CIP multi-request packing for writes a recipe-download is N round-trips instead of one.
- **[Build]** **AOI (Add-On Instruction) input/output handling** — present in: Kepware (with explicit InOut limitation note). Why it matters: AOIs are how modern Logix code is structured; the Template Object decoder probably handles the layout but we don't surface AOI-specific browse paths.
- **[Build]** **Native STRING (Logix STRING / custom STRINGxx) decoding** — present in: both (Kepware preserves descriptors; AVEVA exposes as native string). Why it matters: we map Logix `STRING` to `DriverDataType.String` but `AbCipDataType.cs` flags whole-string only; no support for user-defined `STRINGnn` variants of different DATA-array sizes.
- **[Build]** **64-bit integer surface (LINT/ULINT)** — present in: both. Why it matters: Logix v32+ exposes LINT for 64-bit counters/timestamps; we narrow them to `Int32` per a TODO at `AbCipDataType.cs:53`, losing the upper 32 bits.
- **[Skip]** **Structure / UDT as first-class OPC UA structured type** — present in: both (Kepware emits child tags; AVEVA exposes via native UDT). Why it matters: we emit `DriverDataType.String` placeholder for whole-UDT, only members are fully typed; OPC UA clients can't bind to a UDT shape.
- **[Build]** **Array element / array slice addressing** — present in: both (Kepware `Tag[3,5]`, slice `Tag[0..15]`). Why it matters: `AbCipTagPath` supports indexed elements but the driver has no array-slice read for adjacent indices; reading `Tag[0..99]` becomes 100 individual reads.
- **[Skip]** **PLC-5 / SLC-500 bridging via ControlLogix gateway** — present in: both (Kepware Logix Gateway, TOP Server NET-ENI). Why it matters: thousands of legacy AB sites front a PLC-5/SLC behind a 1756-ENBT; without the bridge those plants can't migrate to us in one step.
- **[Build]** **Hot-standby ControlLogix redundancy (paired EN2T IPs)** — present in: AVEVA (and Kepware via secondary device). Why it matters: ControlLogix HSBY pairs are standard in continuous-process plants; today our driver has one host address per device, no automatic failover to the partner chassis.
- **[Build]** **Diagnostics / system tags (`_ConnectionStatus`, `_ScanRate`, `_TagCount`, `_DeviceError`)** — present in: both. Why it matters: SCADA dashboards bind to these for live driver health; we expose `IHostConnectivityProbe` + `DriverHealth` but not as browseable OPC UA variables.
- **[Build]** **Tag-write deadband / write-on-change / write-coalesce** — present in: both. Why it matters: avoids hammering the PLC on jittery analogue setpoints; we write every request straight through.
- **[Skip]** **Unsolicited messages (PLC-pushed CIP MSG)** — present in: AVEVA (DASABCIP unsolicited topic), Kepware (separate "ControlLogix Unsolicited" driver). Why it matters: event-driven alarm/recipe-complete signals from the PLC arrive with sub-100ms latency vs our 1s alarm-poll loop.
- **[Skip]** **CIP Generic / Class 3 message passthrough** — present in: both. Why it matters: enables custom tooling (drive parameters, motion config, MSG instruction targets) for shops that have built around it.
- **[Skip]** **Configurable per-device connection count / connection pooling** — present in: both (AVEVA: max 31). Why it matters: lets operators trade PLC CPU cost against parallelism for high-throughput rigs; we run one connection per tag handle implicitly.
- **[Build]** **Online tag-database refresh trigger** — present in: AVEVA (`$Sys$UpdateTagInfo`). Why it matters: lets ops force re-browse after a Studio 5000 download without restarting the driver; we only re-browse on full driver reinit.
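
To make the LINT/ULINT fidelity gap concrete, the following sketch (Python, purely illustrative) shows what happens to a 64-bit counter when only its low 32 bits survive. The helper `narrow_to_int32` is hypothetical and models generic 32-bit truncation; the driver's actual conversion code is not shown in this document.

```python
import struct

def narrow_to_int32(value):
    """Keep only the low 32 bits and reinterpret them as a signed Int32."""
    low = value & 0xFFFFFFFF
    return low - 0x1_0000_0000 if low >= 0x8000_0000 else low

# A LINT counter just past the 32-bit boundary.
lint_value = 2**32 + 5
raw = struct.pack("<q", lint_value)      # 8 bytes, as they might sit in a tag buffer

as_int64 = struct.unpack("<q", raw)[0]   # full-fidelity decode
as_int32 = narrow_to_int32(as_int64)     # what a 32-bit surface would report

print(as_int64)  # 4294967301
print(as_int32)  # 5
```

The client sees a plausible small number rather than an error, which is why the loss is easy to miss in testing.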
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | Offline L5K / L5X import | Yes | De-facto Studio 5000 workflow; engineers won't switch without it |
| 2 | CSV tag import / export | Yes | Common round-trip via Excel for mass config |
| 3 | Tag descriptions / engineering metadata | Yes | Free once L5X import lands; expected as OPC UA `Description` |
| 4 | Logical-blocking / non-blocking modes | Yes | Biggest perf lever; today only whole-UDT coalescing |
| 5 | Symbolic vs logical (instance-ID) toggle | Yes | 3-5x perf on dense rigs; libplctag already supports it |
| 6 | Configurable Connection Size per device | Yes | Cheap field knob for v19 firmware / fragmenting switches |
| 7 | Inactivity timeout / keep-alive cadence | No | Rarely an issue with libplctag-managed connections |
| 8 | Per-tag scan rate / scan groups | Yes | Standard SCADA expectation; mixed-rate tag tables |
| 9 | "Respect tag-specified scan rate" mode | No | Niche; OPC UA subscription rate already covers it |
| 10 | Initial value cache / first-update from cache | No | OPC UA subscription sampling already handles first-update |
| 11 | Multi-tag write packing | Yes | Recipe-download speed; one PDU vs N |
| 12 | AOI input / output handling | Yes | Standard modern Logix code structure |
| 13 | Native STRING / STRINGnn decoding | Yes | Table-stakes; we passthrough as String only |
| 14 | 64-bit LINT / ULINT fidelity | Yes | Correctness on Logix v32+; we silently truncate (TODO in code) |
| 15 | UDT as first-class OPC UA structured type | No | Member fan-out already works; structured-type plumbing is heavy |
| 16 | Array slice addressing `Tag[0..15]` | Yes | Perf; reads of N-element arrays in one call |
| 17 | PLC-5 / SLC bridging through CLX | No | AbLegacy driver covers this protocol family |
| 18 | Hot-standby ControlLogix redundancy | Yes | Continuous-process plants standardize on HSBY pairs |
| 19 | Diagnostic system tags (`_ConnectionStatus` etc.) | Yes | HMI dashboards bind to them; cheap given DriverHealth |
| 20 | Write deadband / write-on-change | Yes | Analog setpoints flood the PLC without it |
| 21 | Unsolicited CIP MSG ingestion | No | Separate driver in commercial; design-heavy; niche |
| 22 | CIP Generic / Class 3 passthrough | No | Niche custom-tooling territory |
| 23 | Per-device connection count / pooling | No | libplctag manages connections; premature |
| 24 | Online tag-DB refresh trigger | Yes | Cheap; avoids restart after PLC download |
### Notable parity (keep)
- libplctag-class wire layer covering ControlLogix/CompactLogix/Micro800/GuardLogix on EtherNet/IP CIP — same controller coverage as the commercial drivers (minus PLC-5/SLC).
- Multi-hop CIP path syntax with bridge-through chassis (`1,2,2,IP,1,0` form) — matches Kepware/AVEVA routing semantics.
- Online controller browse with program-scope vs controller-scope distinction and system-tag filtering — same shape as Kepware Auto Tag Generation.
- CIP Template Object (class 0x6C) decoder for live UDT-shape resolution + cache — feature-parity with Kepware's structure-aware Auto Tag Generation.
- Whole-UDT read coalescing for grouped members — matches TOP Server "logical blocking" optimisation for the cases it covers.
- BOOL-in-DINT bit-index addressing with RMW serialisation per parent — same semantics commercial drivers expose for `Tag.N` bit access.
- Per-PLC-family Connection Size / connected-messaging / fragment-bytes profile — mirrors the per-controller "model" picker in Kepware.
- ALMD alarm projection with edge-detected raise/clear — reasonable parity for the alarm subset of FT Alarms & Events that those drivers do not natively translate.
- Per-device circuit-breaker / bulkhead isolation keyed on `(driver, hostName)` — better operational story than the typical commercial gateway, which trips the whole channel on one bad device.
- GuardLogix safety-tag write rejection at config time — explicit, matches Rockwell's safety-partition rules.
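
The BOOL-in-DINT read-modify-write with per-parent serialisation mentioned above can be sketched as follows. This is a language-neutral illustration in Python that uses `threading.Lock` where the driver uses `SemaphoreSlim`; the class `BitWriter` and the `read_dint` / `write_dint` callbacks are hypothetical stand-ins, not driver APIs.

```python
import threading
from collections import defaultdict

class BitWriter:
    """Serialises read-modify-write of single bits inside a parent DINT."""

    def __init__(self, read_dint, write_dint):
        self._read = read_dint     # hypothetical: read_dint(parent) -> int
        self._write = write_dint   # hypothetical: write_dint(parent, value)
        self._locks = defaultdict(threading.Lock)  # one lock per parent tag
        self._locks_guard = threading.Lock()

    def _lock_for(self, parent):
        with self._locks_guard:    # guard creation of per-parent locks
            return self._locks[parent]

    def write_bit(self, tag, value):
        parent, bit = tag.rsplit(".", 1)   # "Flags.3" -> ("Flags", "3")
        index = int(bit)
        if not 0 <= index <= 31:
            raise ValueError(f"bit index out of range for DINT: {index}")
        with self._lock_for(parent):
            # Work on the unsigned 32-bit image of the parent word.
            current = self._read(parent) & 0xFFFFFFFF
            mask = 1 << index
            updated = (current | mask) if value else (current & ~mask)
            self._write(parent, updated & 0xFFFFFFFF)
```

Without the per-parent lock, two concurrent writers to different bits of the same DINT could each read the old word and one update would be lost. Writers to different parents do not block each other.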
### Sources
- [Kepware Allen-Bradley ControlLogix Ethernet driver overview](https://support.ptc.com/help/kepware/drivers/en/kepware/drivers/CONTROLLOGIXETHERNET/Overview.html)
- [Kepware Logix Database Settings (offline / online ATG, L5K/L5X)](https://support.ptc.com/help/kepware/drivers/en/kepware/drivers/CONTROLLOGIXETHERNET/Logix_Database_Settings.html)
- [Kepware Preparing for Automatic Tag Database Generation](https://support.ptc.com/help/kepware/drivers/en/kepware/drivers/CONTROLLOGIXETHERNET/Preparing_for_Automatic_Tag_Database_Generation.html)
- [Kepware Device Properties — Scan Mode (respect tag-specified, demand poll, initial cache)](https://support.ptc.com/help/kepware/drivers/en/kepware/drivers/Device_Properties_Scan_Mode.html)
- [Kepware Allen-Bradley ControlLogix Ethernet driver manual (PDF, 2025)](https://downloads.softwaretoolbox.com/demodnld/prod_docs/topserver_help_pdf/Common/allen-bradley-controllogix-ethernet-manual.pdf)
- [Kepware Allen-Bradley ControlLogix Server (Unsolicited)](https://www.ptc.com/en/store/kepware/drivers/allen-bradley-controllogix-unsolicited)
- [Kepware System Tags](https://support.ptc.com/help/kepware/kepware_edge/en/kepware/kepware-edge/system-tags.html)
- [TOP Server ControlLogix protocol modes (symbolic / logical-blocking / logical-non-blocking)](https://blog.softwaretoolbox.com/optimizing-controllogix-protocol-modes)
- [TOP Server Rockwell ControlLogix Ethernet OPC driver details](https://softwaretoolbox.com/top-server/rockwell-ab-controllogix-ethernet)
- [TOP Server ControlLogix Ethernet performance optimization](https://softwaretoolbox.com/top-server/rockwell-ab-controllogix-performance)
- [Software Toolbox FAQ — making configuration choices for ControlLogix Ethernet](https://help.softwaretoolbox.com/faq/1658)
- [AVEVA Communication Drivers Pack — ABCIP Driver user guide (PDF)](https://cdn.logic-control.com/docs/aveva/communications-pack/OIABCIP.pdf)
- [Wonderware DASABCIP user guide (PDF)](https://cdn.logic-control.com/media/DASABCIP.pdf)
- [Wonderware OI.ABCIP server user guide (PDF, v7.0)](https://s3-us-west-2.amazonaws.com/wonderwarepacwest/downloads/oi-abcip-user-guide.pdf)
- [Industrial Software Solutions — DASABCIP unsolicited message handling](https://industrial-software.com/training-support/tech-notes/74-how-configure-wonderware-dasabcip-unsolicited-message-handling/)
- [Industrial Software Solutions — `$Sys$UpdateTagInfo` with ABCIP](https://industrial-software.com/training-support/tech-notes/119-using-sysupdatetaginfo-with-abcip-oi-servers/)
- [AVEVA — Configure the ABCIP Communication Driver](https://docs.aveva.com/bundle/sp-cdp-drivers/page/193749.html)
---
## AbLegacy (Allen-Bradley PLC-5 / SLC / MicroLogix)
### What we ship today
- Per-device family knob: `Slc500` / `MicroLogix` / `Plc5` / `LogixPccc`, each mapped to a libplctag PLC attribute, default CIP path, max-tag-bytes, and string/long-file capability flags (`PlcFamilies/AbLegacyPlcFamilyProfile.cs:14-54`).
- Single transport: PCCC encapsulated in EtherNet/IP via libplctag, with `ab://gateway[:port]/cip-path` host strings supporting CLX-bridged routing (`AbLegacyHostAddress.cs:14-52`).
- File-letter set: `N`, `F`, `B`, `L`, `ST`, `T`, `C`, `R`, `I`, `O`, `S`, `A` parsed and validated; trailing `/N` bit index and `.SUBELEMENT` (ACC/PRE/EN/DN/TT/CU/CD/LEN/POS/ER) recognised (`AbLegacyAddress.cs:97-101`, `AbLegacyDataType.cs:9-29`).
- Data types: `Bit`, `Int` (N/A), `Long` (L), `Float` (F), `String` (ST), `TimerElement`, `CounterElement`, `ControlElement` — all surfacing as `Boolean` / `Int32` / `Float32` / `String` driver types (`AbLegacyDataType.cs:34-44`).
- Bit-within-N-word write path: read-modify-write against a parent-word runtime, serialised by per-parent `SemaphoreSlim` (`AbLegacyDriver.cs:353-409`).
- Polling overlay via shared `PollGroupEngine` exposed through `ISubscribable`; per-publishing-interval grouping (`AbLegacyDriver.cs:268-276`).
- Connectivity probe loop per device (default `S:0`, configurable interval/timeout) emitting `HostStatusChangedEventArgs` transitions (`AbLegacyDriver.cs:283-336`, `AbLegacyDriverOptions.cs:36-44`).
- Capability surfaces: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IPerCallHostResolver` — flat `AbLegacy/<host>/<tag>` browse tree built from static config (`AbLegacyDriver.cs:11-12`, `238-264`).
- Static-config tag list only (`AbLegacyTagDefinition`); writes can be flagged `Writable=false` and `WriteIdempotent=true` (`AbLegacyDriverOptions.cs:28-34`).
- Status mapping for libplctag error codes to OPC UA StatusCodes (`AbLegacyStatusMapper.cs`).
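
As an illustration of the address shape described above (file letter, optional file number, word, optional `.SUBELEMENT`, optional `/N` bit), here is a minimal parsing sketch in Python. The regular expression, the `PcccAddress` type, and `parse_address` are assumptions made for this example and are not taken from `AbLegacyAddress.cs`; in particular, treating the file number as optional (for forms like `S:0`) and reading all numbers as decimal are choices of the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional

_ADDRESS = re.compile(
    r"(?P<letter>ST|[NFBLTCRIOSA])"  # file letter; ST is tried before S
    r"(?P<file>\d*)"                 # file number (assumed optional, e.g. S:0)
    r":(?P<word>\d+)"                # word / element index
    r"(?:\.(?P<sub>[A-Z]+))?"        # optional sub-element such as .ACC
    r"(?:/(?P<bit>\d+))?"            # optional bit index such as /3
)

@dataclass(frozen=True)
class PcccAddress:
    letter: str
    file_number: Optional[int]
    word: int
    sub_element: Optional[str] = None
    bit: Optional[int] = None

def parse_address(text: str) -> PcccAddress:
    """Parse a literal PCCC address; raises ValueError on anything else."""
    m = _ADDRESS.fullmatch(text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised PCCC address: {text!r}")
    return PcccAddress(
        letter=m["letter"],
        file_number=int(m["file"]) if m["file"] else None,
        word=int(m["word"]),
        sub_element=m["sub"],
        bit=int(m["bit"]) if m["bit"] is not None else None,
    )
```

For example, `parse_address("T4:0.ACC")` yields letter `T`, file 4, word 0, and sub-element `ACC`, while an indirect form such as `N7:[N7:0]` raises `ValueError` because the sketch accepts literal addresses only.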
### Gaps vs commercial gateways
- **[Skip]** **Serial DF1 transports (full-duplex, half-duplex master/slave, KF2/KF3, radio modem)** — present in: both. Why: libplctag PCCC is Ethernet-only; no COM-port path means PLC-5/SLC/ML serial deployments are unreachable.
- **[Build]** **DH+ via 1756-DHRIO / 1784-PKTX gateway routing** — present in: both. Why: DH+ Gateway is the canonical way to reach PLC-5 nodes through a CLX rack today; we expose a CIP path but no station-number addressing or DH+ link-id concept.
- **[Skip]** **DH-485 routing through 1761-NET-AIC / 1747-AIC** — present in: both. Why: MicroLogix 1000/1200 and SLC 5/03 multi-drop deployments need DH-485 station addressing.
- **[Skip]** **`M0` / `M1` module file access (block-transfer / RIO data)** — present in: Kepware, AVEVA. Why: Required for any PLC-5 with RIO modules or specialty cards (motion, weigh, vision); PCCC has dedicated frames.
- **[Build]** **`PD` (PID), `MG` (Message), `PLS` (programmable limit switch), `BT` (block transfer) function/structure files** — present in: both. Why: Standard SLC/PLC-5 file types for PID loops and message instructions; we cap at T/C/R structures only.
- **[Skip]** **`D` (BCD) and Long-BCD types** — present in: both. Why: Some legacy SLC/PLC-5 programs store recipe / setpoint data as packed BCD; we only ship binary `Int`/`Long`.
- **[Build]** **PLC-5 octal addressing for I/O word/bit (`I:001/17`)** — present in: both. Why: Native PLC-5 documentation and RSLogix 5 use octal; a decimal-only parser rejects or misreads real configs.
- **[Build]** **Indirect / indexed addressing (`N7:[N7:0]`, `N[N7:0]:5`)** — present in: both. Why: Common pattern for recipe / batch lookup tables; libplctag supports it but our parser only accepts literal `<letter><file>:<word>`.
- **[Build]** **Array reads / contiguous block addressing (`N7:0,10` or `N7:0[10]`)** — present in: both. Why: One PCCC request can pull up to ~120 words; without array syntax, an N-word block costs N round-trips (one per single-word tag) and block-read sizing cannot be optimised.
- **[Build]** **String-file (`ST`) read/write path in production** — present in: both. Why: Type is enum-listed but `AbLegacyDataTypeExtensions.ToDriverDataType` maps to `String` only; ST is an 82-byte fixed buffer with a length word and we have no integration coverage to confirm round-trip.
- **[Build]** **Sub-element predefined symbol coverage (timer `.PRE/.ACC/.EN/.TT/.DN`, counter `.CU/.CD/.OV/.UN`, control `.LEN/.POS/.ER/.UL/.IN/.FD`)** — present in: both. Why: Parser admits any all-letters sub-element but the `TimerElement/CounterElement/ControlElement` types collapse to a single `Int32`, losing per-bit Boolean semantics that HMIs expect (`.DN` should be Bit, not Int32).
- **[Skip]** **Block read-size negotiation per family** — present in: both. Why: We carry `MaxTagBytes` as a constant but never plumb it into a request optimiser; libplctag's PCCC chunking is implicit and not tunable per-tag-group.
- **[Build]** **Auto-demote on comm failure** — present in: both. Why: Kepware/TOP Server temporarily off-scan a non-responsive device for N seconds so other devices on the channel keep flowing; we only switch a `HostState` flag and keep retrying.
- **[Skip]** **Communication serialisation across multiple devices on one channel** — present in: both. Why: DH+/DF1 networks share a single physical link; we have no channel concept, so a slow PLC-5 can starve a fast SLC on the same DH+ link.
- **[Build]** **RSLogix 500 (`.RSS`) / RSLogix 5 (`.RSP`) / `.SLC` symbol & data-table import for automatic tag generation** — present in: both (DF1, AB Ethernet drivers). Why: Manual `AbLegacyTagDefinition` entries scale poorly; commercial tools parse RSLogix exports to seed tags and descriptions.
- **[Skip]** **Online browse / data-table discovery from the controller** — present in: Kepware (Create-from-Device). Why: PCCC has a "read file directory" frame; we don't issue it, so `DiscoverAsync` only ever returns the static config.
- **[Skip]** **DF1 error checking selection (BCC vs CRC-16)** — present in: both. Why: Some serial gear (older modems) only does BCC; not applicable until serial transport ships, but flagged for parity.
- **[Build]** **Per-tag deadband / change filter on subscriptions** — present in: both. Why: Polling overlay publishes every poll; commercial drivers suppress no-op publishes by absolute-deadband or scaling.
- **[Skip]** **PLC-5 typed-write / typed-read selection vs SLC protected typed reads** — present in: both. Why: Kepware exposes "Optimization Method" and "Force Logical=Yes" knobs that materially affect performance on slower processors; we use libplctag defaults silently.
- **[Build]** **Diagnostic counters (request count, response time, retries, last-error per device, comm-failures)** — present in: both (built-in `_System` / `_DiagnosticTags`). Why: We surface a `DriverHealth` enum but no per-device tag-level diagnostics for an HMI to bind to.
- **[Build]** **Per-device timeout / retry overrides** — present in: both. Why: We have one driver-wide `Timeout` (`AbLegacyDriverOptions.cs:16`) and one probe timeout; SLC 5/01 vs SLC 5/05 vs MicroLogix 1100 need very different values on a shared driver.
- **[Skip]** **Write completion semantics — synchronous-confirmation vs queued** — present in: both. Why: Commercial drivers offer "write optimization (latest value only / write-through / disable)"; ours always writes through, which floods slow channels with redundant writes.
- **[Build]** **MicroLogix-specific item naming (e.g. `RTC:0.HR`, `HSC:0`, `DLS:0` for daylight savings)** — present in: both. Why: MicroLogix 1100/1400 have proprietary function files that don't share file letters with SLC and our `IsKnownFileLetter` whitelist rejects them.
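
The PLC-5 octal addressing gap is easiest to see with a worked example. The sketch below (Python, illustrative only) reads the word and bit of an I/O address such as `I:001/17` as octal; `parse_plc5_io` is a hypothetical helper, not part of the driver.

```python
def parse_plc5_io(address: str):
    """Split 'I:001/17' into (file letter, word, bit), reading numbers as octal."""
    letter, rest = address.split(":", 1)
    word_text, _, bit_text = rest.partition("/")
    word = int(word_text, 8)                      # raises ValueError on digits 8 or 9
    bit = int(bit_text, 8) if bit_text else None
    return letter.upper(), word, bit

print(parse_plc5_io("I:001/17"))  # ('I', 1, 15)
print(int("17"))                  # 17: a decimal parse points at a different bit
```

Octal bit `17` is the sixteenth bit of the word (index 15), so a decimal interpretation would address a bit that does not exist in a 16-bit I/O word.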
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | Serial DF1 transports | No | Declining install base; libplctag has no serial path; major scope |
| 2 | DH+ via 1756-DHRIO bridging | Yes | Real-world PLC-5 path; libplctag CIP routing already supports it |
| 3 | DH-485 routing (1761/1747-AIC) | No | Very legacy; rare in greenfield |
| 4 | M0 / M1 module file access | No | Niche RIO modules; declining |
| 5 | PD / MG / PLS / BT files | Yes | PID files are common in real SLC programs |
| 6 | D (BCD) and Long-BCD types | No | Very legacy data convention |
| 7 | PLC-5 octal addressing | Yes | Correctness for actual PLC-5 sites |
| 8 | Indirect / indexed addressing | Yes | Standard recipe / lookup pattern |
| 9 | Array contiguous block addressing | Yes | Big perf gain; one PCCC frame vs N |
| 10 | ST string read / write production verification | Yes | Type is enum-listed but untested; cheap to validate |
| 11 | Sub-element bit semantics (`.DN` as Bit, etc.) | Yes | Correctness; HMIs expect Boolean for `.DN`/`.EN`/`.TT` |
| 12 | Block read-size negotiation per family | No | libplctag handles chunking implicitly |
| 13 | Auto-demote on comm failure | Yes | Standard SCADA resilience; one slow PLC starves fast ones |
| 14 | Channel-shared comm serialisation | No | Only matters for serial / DH+ (transport not built) |
| 15 | RSLogix 500/5 (.RSS / .RSP) symbol import | Yes | Workflow parity; manual config doesn't scale |
| 16 | Online controller browse / data-table discovery | No | PCCC dir frame limited; libplctag support unclear |
| 17 | DF1 BCC vs CRC-16 selection | No | Predicated on DF1 transport (gap #1) |
| 18 | Per-tag deadband / change filter | Yes | Polling overlay floods every poll without it |
| 19 | PLC-5 typed-read selection / Force Logical | No | libplctag defaults are sound; niche tuning |
| 20 | Diagnostic counters as tags | Yes | HMI binding; cheap given existing health probe |
| 21 | Per-device timeout / retry overrides | Yes | SLC 5/01 vs 5/05 vs ML1100 differ; cheap |
| 22 | Write completion semantics options | No | Niche tuning; current write-through is safe default |
| 23 | MicroLogix function-file naming (RTC/HSC/DLS) | Yes | Correctness for ML1100/1400 deployments |
### Notable parity (keep)
- Family enum + per-family profile keeps SLC 500 / MicroLogix / PLC-5 / LogixPccc-mode behavioural differences explicit instead of probed at runtime (`PlcFamilies/AbLegacyPlcFamilyProfile.cs:14-54`).
- ControlLogix-bridged routing string (`ab://gw/1,0`) matches Kepware's "Routing Path" concept and is how real PLC-5 deployments are reached today (`AbLegacyHostAddress.cs:14-52`).
- Bit-within-N-word RMW with per-parent serialisation prevents the classic two-writer-tear bug other drivers ship (`AbLegacyDriver.cs:353-384`).
- Probe loop with explicit `HostState` transitions gives a cleaner diagnostic surface than Kepware's lump-sum auto-demote (`AbLegacyDriver.cs:283-336`).
- Status-file probe (`S:0`) is the same heartbeat Rockwell HMIs traditionally use, and it's family-agnostic (`AbLegacyDriverOptions.cs:43`).
- libplctag back-end inherits ongoing community fixes for PCCC frame edge-cases without us owning the wire decoder.
### Sources
- [Kepware Allen-Bradley Ethernet Driver Manual (PDF)](https://cdn.logic-control.com/docs/kepware/Manuals/Drivers/Allen-Bradley/Allen-Bradley%20Ethernet%20Driver.pdf)
- [Kepware Allen-Bradley DF1 Driver Manual (PDF)](https://cdn.logic-control.com/docs/kepware/Manuals/Drivers/Allen-Bradley/Allen-Bradley%20DF1%20Driver.pdf)
- [Kepware Allen-Bradley ControlLogix Ethernet Driver Manual (PDF, 2025)](https://downloads.softwaretoolbox.com/demodnld/prod_docs/topserver_help_pdf/Common/allen-bradley-controllogix-ethernet-manual.pdf)
- [Kepware Allen-Bradley ControlLogix Driver Manual (PDF, 2017)](https://ftp.softwaretoolbox.com/demodnld/prod_docs/topserver_help_pdf/v5_20/controllogix_ethernet.pdf)
- [Kepware Allen-Bradley Ethernet driver product page](https://www.kepware.com/en-us/products/kepserverex/drivers/allen-bradley-ethernet/)
- [TOP Server Rockwell DF1 Serial driver](https://softwaretoolbox.com/top-server/rockwell-ab-df1)
- [AVEVA Communication Drivers Pack 2023 R2 readme](https://www.wmkit.com/archives/aveva-communication-drivers-pack-2023-r2-readme.html)
- [AVEVA Communication Drivers Pack 2020 R2 readme](https://industrial-software.com/wp-content/uploads/Communication_Drivers/oi-communication-drivers-pack-2020-r2/Readme.html)
- [AVEVA Communication Drivers datasheet (PDF)](https://www.aveva.com/content/dam/aveva/documents/datasheets/Datasheet_AVEVA-CommunicationDrivers_11-19.pdf)
- [AVEVA OI ABCIP user guide (PDF)](https://s3-us-west-2.amazonaws.com/wonderwarepacwest/downloads/oi-abcip-user-guide.pdf)
- [Kepware Logix Database Settings (Create-from-Device / .L5K import)](https://support.ptc.com/help/kepware/drivers/en/kepware/drivers/CONTROLLOGIXETHERNET/Logix_Database_Settings.html)
- [Rockwell DF1 Protocol and Command Set reference (1770-RM516, PDF)](https://literature.rockwellautomation.com/idc/groups/literature/documents/rm/1770-rm516_-en-p.pdf)
---
## FOCAS (Fanuc CNC)
### What we ship today
- TCP-only Ethernet transport on port 8193 via the pure-managed `Focas.Wire` client; no Fwlib DLL, no P/Invoke, no out-of-process Tier-C host (`docs/drivers/FOCAS.md:8-13`, retired Host noted at `:25-27`).
- One driver instance can host N CNCs, each keyed by `focas://{ip}[:{port}]` (`FocasDriverOptions.cs:10`, `FocasDeviceOptions:92-95`).
- Per-device CNC series declaration (`Zero_i_D/F/MF/TF`, `Sixteen_i`, `Thirty_i`, `ThirtyOne_i`, `ThirtyTwo_i`, `PowerMotion_i`, `Unknown`) with init-time capability matrix validating macro / parameter / PMC ranges per series (`FocasCncSeries.cs:21-47`, `FocasCapabilityMatrix.cs:29-138`).
- User-authored tag addressing for: PMC bits/bytes (`X0.0`, `R100`, `R100.3`), CNC parameters (`PARAM:1815/0`), and macro variables (`MACRO:500`) — wired through `cnc_rdpmcrng` / `cnc_rdparam` / `cnc_rdmacro` (`docs/drivers/FOCAS.md:62-66, 90`).
- Atomic data types: Bit, Byte, Int16, Int32, Float32, Float64, String (`FocasDataType.cs:10-26`).
- Read-only by design — `WriteAsync` returns `BadNotWritable`; no `cnc_wrparam` / `pmc_wrpmcrng` / `cnc_wrmacro` paths exist (`FocasDriver.cs:222-279`, `docs/drivers/FOCAS.md:17-18, 91`).
- Optional `FixedTree` auto-populated subtree per device (`FocasFixedTreeOptions:26-51`) populated at bootstrap from `cnc_sysinfo` + `cnc_rdaxisname` + `cnc_rdspdlname`, polled at three cadences (axis 250 ms, program 1 s, timer 30 s):
  - `Identity/` — `SeriesNumber`, `Version`, `MaxAxes`, `CncType`, `MtType`, `AxisCount` (`FocasDriver.cs:299-304`).
  - `Axes/{name}/` — `AbsolutePosition`, `MachinePosition`, `RelativePosition`, `DistanceToGo`, `ServoLoad` (cap-gated) (`FocasDriver.cs:307-316`).
  - `Axes/FeedRate/Actual`, `Axes/SpindleSpeed/Actual` (single-channel rates — first axis only, `FocasDriver.cs:317-318`, `:646-651`).
  - `Spindle/{name}/Load`, `Spindle/{name}/MaxRpm` (cap-gated, multi-spindle aware) (`FocasDriver.cs:323-336`).
  - `Program/Name`, `ONumber`, `Number`, `MainNumber`, `Sequence`, `BlockCount` (`FocasDriver.cs:339-347`).
  - `OperationMode/Mode` + `ModeText` ("MDI"/"AUTO"/"EDIT"/"HANDLE"/"JOG"/"TEACH_IN_HANDLE"/"REFERENCE"/"REMOTE"/"TEST"/"TJOG") (`IFocasClient.cs:213-226`).
  - `Timers/PowerOnSeconds`, `OperatingSeconds`, `CuttingSeconds`, `CycleSeconds` (`FocasDriver.cs:355-362`).
- Per-series node suppression: optional API probes at bootstrap, `EW_FUNC` / `EW_NOOPT` / `EW_VERSION` causes the corresponding subtree to not be emitted (`docs/drivers/FOCAS.md:134-142`, `FocasDriver.cs:497-526`).
- Active-alarm projection via `IAlarmSource` (opt-in, polls `cnc_rdalmmsg2` at 2 s default), differential raise/clear with mapped alarm types `Parameter / PulseCode / Overtravel / Overheat / Servo / DataIo / MemoryCheck / MacroAlarm`, severity buckets, and ack as no-op (`FocasAlarmProjectionOptions:79-85`, `IFocasClient.cs:275-287`, `docs/drivers/FOCAS.md:154-181`).
- Connectivity probe via `cnc_rdcncstat` on configurable interval; transitions fire `OnHostStatusChanged` (`FocasProbeOptions:110-115`, `docs/drivers/FOCAS.md:94`).
- Optional proactive handle-recycle loop to release FWLIB session handles on a cadence (defends against the documented handle-leak bugs and finite ~510 connection pool) (`FocasHandleRecycleOptions:68-72`, `docs/drivers/FOCAS.md:184-205`).
- Subscriptions are emulated via the shared `PollGroupEngine` (FOCAS has no push) (`FocasDriver.cs:451-461`).
- `IPerCallHostResolver` so each tag's reads route to its declared device, enabling per-host bulkhead resilience (decision #144) (`FocasDriver.cs:850-857`, `FocasDriverOptions.cs:3-7`).
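
The differential raise/clear behaviour of the alarm projection can be sketched as a set difference between consecutive polls. The example below is a language-neutral illustration in Python; the `Alarm` type, the `AlarmDiffer` class, and the choice of `(alarm_type, number)` as the identity key are assumptions of the sketch, not details taken from the driver.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alarm:
    number: int
    alarm_type: str
    message: str

class AlarmDiffer:
    """Tracks the active-alarm set and reports edges between polls."""

    def __init__(self):
        self._active = {}  # (alarm_type, number) -> Alarm

    def update(self, polled):
        """Return (raised, cleared) relative to the previous poll."""
        current = {(a.alarm_type, a.number): a for a in polled}
        raised_keys = sorted(current.keys() - self._active.keys())
        cleared_keys = sorted(self._active.keys() - current.keys())
        raised = [current[k] for k in raised_keys]
        cleared = [self._active[k] for k in cleared_keys]
        self._active = current
        return raised, cleared
```

An alarm that stays active across polls produces no event in this sketch; only its first appearance and its disappearance do, which keeps the event stream proportional to changes rather than to poll count.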
### Gaps vs commercial gateways / MTConnect adapters
- **[Build]** **Writes (parameters / PMC / macro)** — Kepware "Fanuc Focas HSSB and Ethernet Driver", Ignition Fanuc, Memex Merlin, Predator MDC. Why: Macro / PMC writes are the canonical mechanism for DPRNT-free supervisory feedback to ladder logic; we explicitly return `BadNotWritable`.
- **[Skip]** **HSSB (high-speed serial bus) transport** — Kepware, MTConnect Fanuc Adapter (Cincinnati), Memex. Why: HSSB is the only path on machines with no FOCAS Ethernet option licensed; we are TCP:8193 only, no `hssb` discovery, no PCI handle.
- **[Build]** **FOCAS password / unlock parameter** — Kepware ("Password" property), MTConnect adapter. Why: Some controllers gate `cnc_wrparam` and certain reads behind a connection-level password; we have no such property in `FocasDeviceOptions`.
- **[Build]** **Multi-path / multi-channel CNC support** — Kepware (Path number 1..n), MTConnect (per-path Components). Why: 30i/31i/32i can host 2-10 paths each with their own program / position / mode; our `cnc_setpath`-equivalent never runs and the fixed tree implicitly assumes path 1.
- **[Skip]** **Series 15, Series 15i, Power Mate D/H, Series 35i** — Kepware lists 15/15i, MTConnect adapter handles legacy. Why: Our `FocasCncSeries` enum stops at Power Motion i + 16i; legacy Series 15 deployments would either fail validation or be forced to `Unknown`.
- **[Build]** **`cnc_getfigure` decimal scaling** — Kepware, MTConnect, Memex. Why: Position values are exposed as raw scaled ints (Float64-typed) and we punt the divide-by-10^N onto the client; commercial gateways present pre-scaled millimeters/inches. (Acknowledged TODO in `docs/drivers/FOCAS.md:144-148`.)
- **[Build]** **G-code / modal info (`cnc_modal`)** — Kepware ModalCodes group, MTConnect (FunctionalMode, MotionMode, PlaneCode, etc.), Ignition. Why: Modal G/M-code state (G54 active, G90/91, G17/18/19, M03/04/05, S/F overrides) is one of the most-asked CNC tag groups; we have neither a fixed-tree exposure nor a `MODAL:` address scheme.
- **[Build]** **Tool number, current tool, tool life management** — Kepware (T-code, ToolLife group), MTConnect (`ToolNumber`, `ToolGroup`), Memex, Predator MDC. Why: Live `cnc_rdtlife*` / current T-code are core MES integration data; absent.
- **[Skip]** **Tool offset table read/write (`cnc_rdtofs` / `cnc_wrtofs`)** — Kepware, Ignition. Why: Tool length / wear / radius compensation tables are often supervisory-edited; we have no `TOFS:` address scheme.
- **[Build]** **Work coordinate offsets (G54..G59 + extended via `cnc_rdzofs` / `cnc_wrzofs`)** — Kepware "WorkOffsets" group, MTConnect (`PartCount` and `WorkCoordinate`). Why: Setup automation needs to read/poke work offsets; absent.
- **[Build]** **Override values (Feedrate %, Rapid %, Spindle %, Jog %)** — Kepware OverrideGroup, MTConnect (`PathFeedrateOverride`, `RotaryVelocityOverride`). Why: Operator-modulated speeds are crucial for OEE/MES; not in the dynamic snapshot.
- **[Build]** **Status / running flags surfaced as nodes (Auto, Run, Motion, Mstb, EmergencyStop, Edit, Tmmode, Alarm bool)** — MTConnect adapter exposes `Execution`, `ControllerMode`, `EmergencyStop` directly. Why: We poll `cnc_rdcncstat` only as a Boolean probe; the 9-field ODBST struct (tmmode/aut/run/motion/mstb/emergency/alarm/edit) is never projected to nodes.
- **[Build]** **Parts count / required parts (`cnc_rdparam` 6711/6712/6713)** — Kepware "PartCount", MTConnect `PartCountAct/Min/Max`. Why: Part counters are MES bread-and-butter; reachable today only by user-authored `PARAM:6711` tag, not in the fixed tree.
- **[Build]** **Diagnostic numbers (`cnc_rddiag` / `cnc_rddiagdgn`)** — Kepware Diagnostic group, MTConnect. Why: Servo/spindle diagnostics (axis position errors, current, temperature) are essential for predictive maintenance; no `DIAG:` address scheme.
- **[Build]** **PMC data ranges (D/T/C/K/F/G addresses) for Series 16i** — partially limited by our matrix (`PmcLetters(Sixteen_i)` only allows X/Y/R/D, `FocasCapabilityMatrix.cs:80`). Why: Real 16i ladders use F/G signals for handshakes; users would have to set Series=Unknown to bypass validation.
- **[Build]** **Bulk PMC range read (`pmc_rdpmcrng` multi-byte)** — Kepware coalesces consecutive PMC bytes; we issue one request per tag. Why: At one TCP round trip per PMC byte, the poll loop saturates at scale; commercial drivers batch into ranges of up to 1KB.
- **[Build]** **Alarm history (`cnc_rdalmhistry` / `cnc_rdalmhistry5`)** — MTConnect adapter, Memex. Why: Acked alarms persist in a CNC ring buffer; we surface only the active alarm list.
- **[Build]** **External operator messages (`cnc_rdopmsg` / `cnc_rdopmsg2` / `cnc_rdopmsg3`)** — Kepware OpMessage tag, MTConnect (`Message` data item). Why: Macro programmers display operator messages via #3006 / G65 P9099 etc.; not exposed.
- **[Skip]** **Program list / upload / download / delete (`cnc_rdprogdir` / `cnc_upstart` / `cnc_dnstart` family)** — Kepware program-management group, Predator MDC, Memex Merlin. Why: DNC drip-feed is a primary use case for MDC products; entirely absent.
- **[Build]** **Currently-executing program text (`cnc_rdactpt` / `cnc_rdexecprog`)** — Kepware "CurrentProgram", MTConnect `Block` and `Line`. Why: Live block display / current sequence content; we expose `Sequence` (number) but not the block text.
- **[Skip]** **DPRNT / external data input (`cnc_rdmacrohk` / external macro)** — Predator MDC, Forcam, Memex (DPRNT collector). Why: DPRNT is the standard 1980s-vintage CNC-to-MES messaging path; we have no DPRNT TCP listener and no macro-call subscription.
- **[Skip]** **Servo / spindle deep info (`cnc_rdsvinfo` / `cnc_rdspinfo`)** — Kepware, Memex. Why: Servo cycle counts, spindle motor speed/temp; absent (we only expose load percent).
- **[Skip]** **Per-axis acceleration / jerk / feed-per-rev** — MTConnect (`AccelerationSpec`, `Jerk`, `Feedrate`). Why: Beyond actual feed; absent.
- **[Build]** **Cycle time per part / last cycle time / cycle start timestamp** — MTConnect (`ProcessTimer`), Memex. Why: We expose accumulating timers but not "last completed cycle" deltas.
- **[Skip]** **`cnc_rdrelpos` reset / preset, `cnc_setpath`, `cnc_wrabsmac`** — operator-style write commands. Why: Read-only-by-design covers it, but commercial parity assumes selective writes.
- **[Skip]** **CNC time/date sync (`cnc_rdtimer` clock variant / `cnc_rtime`)** — Kepware, Memex. Why: Setting CNC system clock from a master time source is common in audited environments; absent.
- **[Build]** **Connection-level statistics + retry counters surfaced as variables** — Kepware (per-channel statistics). Why: We publish health, but not as variables.
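
The bulk PMC range read gap above comes down to grouping per-byte tags into contiguous ranges before going to the wire. A minimal sketch of that grouping, written in Python for brevity rather than in the driver's own language. The names `PmcRange` and `coalesce_pmc_reads` are hypothetical, and the 1024-byte cap is an assumption taken from the "up to 1KB" figure in the bullet, not from the FOCAS specification.

```python
from dataclasses import dataclass

MAX_RANGE_BYTES = 1024  # assumed cap, mirroring the "up to 1KB" figure above


@dataclass(frozen=True)
class PmcRange:
    letter: str  # PMC area letter, e.g. "R" or "D"
    start: int   # first byte offset, inclusive
    end: int     # last byte offset, inclusive


def coalesce_pmc_reads(tags, max_gap=0, max_bytes=MAX_RANGE_BYTES):
    """Group (letter, byte_offset) tags into contiguous read ranges.

    max_gap is how many unrequested bytes may sit between two requested
    bytes before a new range is started.
    """
    ranges = []
    for letter, offset in sorted(set(tags)):
        last = ranges[-1] if ranges else None
        if (
            last is not None
            and last.letter == letter
            and offset - last.end <= max_gap + 1
            and offset - last.start + 1 <= max_bytes
        ):
            # Extend the current range to cover this byte.
            ranges[-1] = PmcRange(letter, last.start, offset)
        else:
            ranges.append(PmcRange(letter, offset, offset))
    return ranges
```

Each resulting range would map to one range request, with the per-tag values sliced out of the returned buffer on the client side. A non-zero `max_gap` trades a few wasted bytes for fewer round trips.
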
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | Writes (parameters / PMC / macro) | Yes | Key MES feedback path; current read-only is too narrow |
| 2 | HSSB transport | No | PCI hardware; declining; reopens fwlib distribution problem |
| 3 | FOCAS password / unlock | Yes | Cheap once writes ship; some controllers gate reads too |
| 4 | Multi-path / multi-channel CNC | Yes | 30i/31i/32i routinely have multiple paths |
| 5 | Series 15 / Power Mate D-H / Series 35i | No | Very legacy; small install base |
| 6 | `cnc_getfigure` decimal scaling | Yes | Already TODO; clients shouldn't compute scaling |
| 7 | Modal G-code / M-code state | Yes | One of the most-asked CNC tag groups |
| 8 | Tool number / tool life management | Yes | Core MES integration data |
| 9 | Tool offset table read / write | No | Write-heavy; defer with general write decision |
| 10 | Work coordinate offsets (G54..) | Yes | Setup automation needs read / poke |
| 11 | Override values (Feed / Rapid / Spindle / Jog) | Yes | OEE / MES bread-and-butter |
| 12 | ODBST status flags as nodes | Yes | Cheap; project the 9 fields we already read |
| 13 | Parts count in fixed tree | Yes | MES table-stakes; simple `cnc_rdparam` projection |
| 14 | Diagnostic numbers (`cnc_rddiag`) | Yes | Predictive maintenance |
| 15 | PMC F / G letters for 16i | Yes | Correctness; real ladders use F/G handshakes |
| 16 | Bulk PMC range read | Yes | Big perf gain at scale |
| 17 | Alarm history (`cnc_rdalmhistry`) | Yes | Auditing; small extension to alarm projection |
| 18 | Operator messages (`cnc_rdopmsg*`) | Yes | Cheap; common macro feedback |
| 19 | Program list / upload / download / delete | No | DNC product territory; significant scope |
| 20 | Currently-executing program text | Yes | HMI displays expect block view |
| 21 | DPRNT TCP listener | No | Significant scope; modern paths supersede it |
| 22 | Servo / spindle deep info | No | Specialty; load% covers most needs |
| 23 | Per-axis acceleration / jerk / feed-per-rev | No | Niche advanced telemetry |
| 24 | Cycle time per part / last cycle delta | Yes | OEE-essential |
| 25 | Operator write commands (preset etc.) | No | Read-only design choice; revisit only with general writes |
| 26 | CNC time / date sync | No | Rare ask; commonly handled by CNC NTP |
| 27 | Connection statistics as variables | Yes | Cheap given existing health |
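
For row 6, the scaling itself is a divide by 10^N, where N is the per-axis decimal-place count that `cnc_getfigure` reports. A minimal sketch, assuming the raw position and the decimal-place count have already been read. `scale_position` is a hypothetical helper name, and `Decimal` is used only so the example stays exact.

```python
from decimal import Decimal


def scale_position(raw: int, decimal_places: int) -> Decimal:
    """Convert a raw scaled integer position into engineering units.

    decimal_places is the per-axis figure count (assumed already read
    from the controller); the result is raw / 10**decimal_places.
    """
    if decimal_places < 0:
        raise ValueError("decimal_places must be >= 0")
    return Decimal(raw).scaleb(-decimal_places)
```

For example, a raw value of `123456` with three decimal places becomes `123.456`. A driver would typically convert the result to a double before publishing it as a Float64 node.
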
### Notable parity (keep)
- Pure-managed wire client (no Fwlib distribution problem) — significant operational win vs Kepware's HSSB driver DLL stack.
- Per-series capability matrix at `InitializeAsync` time prevents silent runtime `BadOutOfRange` on misconfigured macro/parameter/PMC numbers.
- Fixed-tree per-API capability probes auto-suppress nodes the CNC doesn't support — operators don't see nodes that perpetually return `BadDeviceFailure`.
- `IPerCallHostResolver` integrates each device into the shared resilience bulkhead (Phase 6.1) — comparable to Kepware's per-device "channel" isolation.
- Three-tier poll cadence (axis fast / program medium / timer slow) is closer to MTConnect adapter behaviour than Kepware's single-rate channel scan.
- Handle-recycle loop is a thoughtful defence against documented Fanuc handle-leak firmware bugs — not present in many commercial drivers.
- Alarm projection differentiates raise vs clear and maps `ALM_TYPE_*` to OPC UA severity buckets — closer to A&E semantics than the simple "alarm bit" Kepware exposes.
### Sources
- https://www.kepware.com/en-us/products/kepserverex/drivers/fanuc-focas-hssb-ethernet/ — Kepware Fanuc Focas HSSB and Ethernet Driver
- https://github.com/mtconnect/cppagent_dev/tree/main/agent/adapter/fanuc — MTConnect Fanuc adapter reference
- https://github.com/Ladder99/focas-mock — managed Focas wire client (the OSS basis we consume)
- https://www.inductiveautomation.com/exchange/2218 — Ignition Fanuc FOCAS driver module
- https://memex.ca/merlin-tempus-mes-suite/ — Memex Merlin OEE / Fanuc connectivity
- https://www.predator-software.com/cnc-data-collection.htm — Predator MDC / DNC capabilities
- https://www.forcam.com/en/products/factory-data-collection/ — Forcam Force MES Fanuc driver
- Fanuc FOCAS Developer Kit `fwlib32.h` (mirrored at `strangesast/fwlib`) — authoritative API surface
- https://www.mtconnect.org/standard-2 — MTConnect Standard Part 2 Devices Information Model
---
## OpcUaClient (OPC UA Aggregation Client)
### What we ship today
- **Endpoint config**: single `EndpointUrl` plus ordered `EndpointUrls` failover list with `PerEndpointConnectTimeout` per-attempt budget (`OpcUaClientDriverOptions.cs:22-40`); failover sweep tries each in order on init and on session drop (`OpcUaClientDriver.cs:95-118`).
- **Security policies**: `None`, `Basic128Rsa15`, `Basic256`, `Basic256Sha256`, `Aes128_Sha256_RsaOaep`, `Aes256_Sha256_RsaPss` plus `Sign` / `SignAndEncrypt` modes; explicit policy+mode matching against the server's `GetEndpoints` response, no silent fallback to a weaker cipher (`OpcUaClientDriver.cs:299-336`).
- **Identity tokens**: Anonymous, Username/Password, and X509 user-certificate (PFX with private key) — built once and reused across every failover attempt (`OpcUaClientDriver.cs:244-369`).
- **Certificate management**: per-process PKI store rooted at `%LocalAppData%\OtOpcUa\pki` with own/trusted/issuers/rejected directories; SDK auto-creates the application instance certificate at startup; `AutoAcceptCertificates` dev knob hooks the validator's `BadCertificateUntrusted` path (`OpcUaClientDriver.cs:163-217`).
- **Session lifecycle**: configurable `SessionTimeout`, `KeepAliveInterval`, `ReconnectPeriod`, `ApplicationUri`, `SessionName`, operation `Timeout` (`OpcUaClientDriverOptions.cs:82-112`).
- **Reconnect**: native `Session.KeepAlive` event drives a `SessionReconnectHandler` with a 2-minute max retry period; SDK's automatic `TransferSubscriptions` migrates monitored items onto the rebuilt channel; keep-alive is rewired onto the new session post-recovery (`OpcUaClientDriver.cs:1297-1359`).
- **Discovery**: two-pass recursive browse from `BrowseRoot` (default `ObjectsFolder`) with `MaxBrowseDepth=10` and `MaxDiscoveredNodes=10_000` caps; pass 2 batch-reads `DataType` + `ValueRank` + `UserAccessLevel` + `Historizing` per variable in one Session.ReadAsync (`OpcUaClientDriver.cs:596-810`).
- **Type mapping**: built-in OPC UA scalar types → `DriverDataType`; structs/enums/extension objects fall through to String passthrough; `ValueRank>=0` flags arrays (`OpcUaClientDriver.cs:820-836`).
- **ACL bridge**: `UserAccessLevel.CurrentWrite` → `SecurityClassification.Operate`, otherwise `ViewOnly`; gating happens server-side in DriverNodeManager (`OpcUaClientDriver.cs:844-850`).
- **Read/Write**: batched ReadAsync/WriteAsync with NodeId pre-parse + per-tag `BadNodeIdInvalid` short-circuit; cascading-quality preserves upstream `StatusCode` and `SourceTimestamp` verbatim; transport faults fan out as `BadCommunicationError` (`OpcUaClientDriver.cs:441-568`).
- **Subscriptions**: native MonitoredItem forwarding with publishing-interval floor of 50 ms, `KeepAliveCount=10`, `LifetimeCount=1000`, `QueueSize=1`, `DiscardOldest=true`, `Reporting` mode, `TimestampsToReturn.Both` (`OpcUaClientDriver.cs:854-914`).
- **Alarms (A&C)**: EventFilter SelectClauses on `BaseEventType` + `ConditionType` (EventId/EventType/SourceNode/Message/Severity/Time/ConditionId), source-node filter set, `QueueSize=1000` for burst tolerance, `Acknowledge` method invocation forwarded as `CallAsync`; severity bucketed Low/Medium/High/Critical per OPC UA Part 9 (`OpcUaClientDriver.cs:967-1143`).
- **HistoryRead pass-through**: `ReadRawAsync`, `ReadProcessedAsync` (Average/Min/Max/Total/Count standard aggregates), `ReadAtTimeAsync` with continuation point support (`OpcUaClientDriver.cs:1154-1264`).
- **Diagnostics**: per-driver `HostName` reflects the URL actually connected (not the first candidate); `HostState` transitions Running/Stopped/Unknown driven by keep-alive; `DriverHealth` carries `LastSuccessfulRead` + last error (`OpcUaClientDriver.cs:1281-1372`).
- **Capability surface**: 8/8 — `IDriver`, `ITagDiscovery`, `IReadable`, `IWritable`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`.
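
The failover sweep described under endpoint config can be pictured as an ordered loop with a per-attempt timeout. The following is a sketch only, in Python with `asyncio`, and is not the driver's code: `connect_with_failover` and the `connect` callable are hypothetical stand-ins for whatever opens a session.

```python
import asyncio


async def connect_with_failover(endpoint_urls, connect, per_endpoint_timeout):
    """Try each URL in order and return (url, session) for the first success.

    connect is an async callable taking a URL (a stand-in for the real
    session factory). Each attempt gets its own timeout budget.
    """
    errors = []
    for url in endpoint_urls:
        try:
            session = await asyncio.wait_for(connect(url), per_endpoint_timeout)
            return url, session
        except (asyncio.TimeoutError, OSError) as exc:
            # Record the failure and move on to the next candidate.
            errors.append((url, exc))
    raise ConnectionError(f"all {len(endpoint_urls)} endpoints failed: {errors!r}")
```

Returning the URL alongside the session is what lets a diagnostic field such as `HostName` report the endpoint that actually connected rather than the first candidate. The worst-case init time is bounded by the number of endpoints times the per-attempt timeout.
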
### Gaps vs commercial UA aggregators
- **[Build]** **Reverse Connect (server-initiated client connect)** — present in: UaGateway, Prosys Forge, Kepware (1.5+), Matrikon. Why: lets the upstream server traverse outbound-only firewalls (typical OT-DMZ direction); a hard requirement for many regulated plant networks.
- **[Build]** **Discovery URL with `FindServers` / `FindServersOnNetwork`** — present in: Kepware, UaGateway, Matrikon. Why: we accept only an explicit endpoint URL; commercial gateways resolve a discovery URL and let the operator pick from advertised endpoints in a UI without copying the policy/mode tuple by hand.
- **[Skip]** **Multicast / LDS-ME registration** — present in: UaGateway, Prosys. Why: lets clients discover this gateway via the Local Discovery Server without static config.
- **[Skip]** **GDS push management (Part 12)** — present in: UaGateway, Prosys. Why: certificate provisioning, renewal, trust-list updates pushed from a central GDS — required for fleets >10 endpoints; we have no `ServerConfigurationType` method support and no automatic renewal hook.
- **[Build]** **Per-tag advanced subscription tuning** — present in: Kepware, UaGateway, Cogent. Why: `SamplingInterval`, `QueueSize`, `DiscardOldest`, `MonitoringMode`, `DataChangeFilter` (DeadbandType=Absolute/Percent, Trigger=Status/StatusValue/StatusValueTimestamp) are hard-coded (50 ms / 1 / true / Reporting / no deadband). No way to set deadbands per tag — a baseline aggregator feature for analog noise filtering.
- **[Build]** **Per-subscription tuning (`PublishingInterval` / `KeepAliveCount` / `LifetimeCount` / `MaxNotificationsPerPublish` / `Priority`)** — present in: all listed gateways. Why: `KeepAliveCount`, `LifetimeCount`, `MaxNotificationsPerPublish`, and `Priority` are hard-coded to 10 / 1000 / 0 / 0 in `Subscription`. `MaxNotificationsPerPublish=0` (unlimited) is a denial-of-service surface against high-event-rate servers, and high-tag-count deployments need to split subscriptions across priorities.
- **[Build]** **Selective import / namespace remap** — present in: Kepware, Matrikon, UaGateway, Cogent. Why: we mirror everything under `BrowseRoot` and re-prefix with a single "Remote" folder; commercial aggregators support per-branch include/exclude rules, namespace-URI remapping, alias paths, and re-keyed BrowseNames.
- **[Build]** **Type definition mirroring (ObjectTypes / VariableTypes / DataTypes / ReferenceTypes)** — present in: UaGateway, Prosys, Kepware. Why: we walk Object + Variable nodes only; HasTypeDefinition references and custom type nodes are dropped, so downstream UI clients lose type-aware rendering and structured DataTypes decode as String passthrough.
- **[Build]** **Method node mirroring + pass-through `Call`** — present in: UaGateway, Matrikon, Kepware. Why: `NodeClass.Method` is filtered out of the browse and `IDriver` has no `CallMethodAsync` capability; clients cannot invoke remote methods through the gateway. (`Acknowledge` is the only call we forward, hard-coded for A&C.)
- **[Build]** **Automatic re-import on remote `ServerStatus.NodeVersion` / `ModelChangeEvent`** — present in: UaGateway, Kepware, Prosys. Why: we don't subscribe to `ServerStatus.State` or `BaseModelChangeEventType`; if the upstream server adds nodes mid-flight the new tags don't appear until the driver is reinitialized.
- **[Skip]** **HistoryUpdate / HistoryRead-Modified / Annotation pass-through** — present in: UaGateway, Prosys Historian, Kepware (LocalHistorian). Why: we ship Raw/Processed/AtTime only; `IsReadModified=false` is hard-coded; no `HistoryUpdate`, no `DeleteRawModified`, no annotation forwarding. Many MES integrations need backfill writes.
- **[Build]** **`ReadEventsAsync` (HistoryRead Events)** — explicitly deferred per memory entry. Why: `IHistoryProvider.ReadEventsAsync` interface lacks an `EventFilter SelectClauses` parameter to carry the field projection.
- **[Build]** **Aggregate function set** — present in: UaGateway, Prosys, Kepware. Why: we map only Average/Minimum/Maximum/Total/Count; OPC UA Part 13 standard catalog has 30+ (TimeAverage, Interpolative, StdDev, DurationGood, NumberOfTransitions, etc.) that historian-class clients expect.
- **[Build]** **Redundant-server URI list (`ServerUriArray`) and transparent failover** — present in: Kepware, UaGateway, Matrikon. Why: our `EndpointUrls` is a one-shot connect-attempt list, not a live redundancy group; we don't read the upstream `ServerRedundancyType` or fail over mid-session on `ServiceLevel` drop.
- **[Build]** **Maximum nodes per Read/Write/Browse honored from server capabilities** — present in: all listed gateways. Why: we delegate chunking to the SDK but never query `Server.ServerCapabilities.OperationLimits.MaxNodesPerRead/Write/Browse`; on undersized servers this can produce `BadTooManyOperations` instead of automatic fragmentation.
- **[Skip]** **Connection / session pooling for multi-instance scale-out** — present in: UaGateway, Cogent. Why: each driver instance opens its own session even when N drivers point at the same upstream; commercial gateways multiplex one session per remote across multiple downstream contexts to cut session count and cert-handshake load.
- **[Build]** **Diagnostics counters (PublishRequest count, NotificationsPerSecond, MissingPublishRequests, dropped-notification rate)** — present in: UaGateway, Prosys. Why: `DriverHealth` carries `LastSuccessfulRead` + last error string only; no per-server message-rate counters or publish-queue health metrics for the Admin dashboard.
- **[Skip]** **Kerberos / OAuth2 / IssuedToken (JWT) user identity** — present in: Kepware (Kerberos), UaGateway, Prosys. Why: we support Anonymous/Username/Certificate only; no `IssuedIdentityToken` token type, no Kerberos SPNEGO, no JWT bearer flow that newer security stacks (Azure AD) expect.
- **[Skip]** **WriteAsync attribute scope beyond Value** — present in: UaGateway, Matrikon. Why: `WriteAsync` hard-codes `AttributeId = Attributes.Value`; no way to write `StatusCode`, `SourceTimestamp`, or non-Value attributes (rare but a documented OPC UA capability).
- **[Build]** **CRL / revocation list configuration** — present in: Kepware, UaGateway. Why: the cert-validator hooks `BadCertificateUntrusted` only; revoked-cert chains aren't explicitly checked or surfaced as a distinct fault, and there's no `RejectSHA1SignedCertificates` knob.
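
The per-tag deadband gap above is easier to size with the filter rule written out. A minimal sketch of absolute and percent deadband evaluation, assuming the usual OPC UA convention that a percent deadband is taken relative to the span of the tag's EURange. `DeadbandFilter` and its fields are hypothetical names, not an existing type in this codebase.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DeadbandFilter:
    kind: str                                       # "none", "absolute" or "percent"
    value: float = 0.0                              # units for absolute, % for percent
    eu_range: Optional[Tuple[float, float]] = None  # (low, high), needed for "percent"
    _last: Optional[float] = None                   # last value actually reported

    def threshold(self) -> float:
        if self.kind == "absolute":
            return self.value
        if self.kind == "percent":
            if self.eu_range is None:
                raise ValueError("percent deadband needs an EU range")
            low, high = self.eu_range
            return (self.value / 100.0) * (high - low)
        return 0.0

    def should_report(self, sample: float) -> bool:
        """Report the first sample, then only changes larger than the threshold."""
        if self._last is None or abs(sample - self._last) > self.threshold():
            self._last = sample
            return True
        return False
```

The comparison is always against the last reported value, not the last sampled one, so slow drift still gets through once it accumulates past the threshold.
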
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | Reverse Connect | Yes | OT-DMZ outbound-only is the standard plant-network direction |
| 2 | Discovery URL `FindServers` | Yes | Standard UX; saves manual policy / mode tuple copy |
| 3 | Multicast / LDS-ME registration | No | Server-side responsibility, not aggregator's |
| 4 | GDS push management (Part 12) | No | Significant infra; rare for our deployment scale |
| 5 | Per-tag advanced subscription tuning (deadband, queue, mode) | Yes | Deadbands are baseline analog filtering |
| 6 | Per-subscription tuning (publishing / keep-alive / lifetime) | Yes | Avoid DoS on bursty servers; operability |
| 7 | Selective import / namespace remap | Yes | Curation is a baseline aggregator feature |
| 8 | Type definition mirroring | Yes | UI clients lose structure decoding without it |
| 9 | Method node mirroring + `Call` passthrough | Yes | Clear functional gap; `IDriver` capability missing |
| 10 | Auto re-import on `ModelChangeEvent` | Yes | Correctness when remote topology changes |
| 11 | HistoryUpdate / Modified / Annotation passthrough | No | MES backfill scope; defer |
| 12 | `ReadEventsAsync` (HistoryRead Events) | Yes | Fix the `IHistoryProvider` abstraction gap |
| 13 | Full Aggregate function set (Part 13) | Yes | Cheap to forward; historian clients expect it |
| 14 | `ServerUriArray` redundant failover | Yes | HA expectation when upstream is redundant |
| 15 | Honor server `OperationLimits` | Yes | Correctness; avoids `BadTooManyOperations` |
| 16 | Connection / session pooling | No | Premature; current per-instance model is simple and adequate |
| 17 | Diagnostics counters | Yes | Operability; admin dashboard needs publish-rate visibility |
| 18 | Kerberos / OAuth2 / JWT identity | No | Significant security work; defer until AD integration drives it |
| 19 | Write attribute scope beyond Value | No | Niche; rarely used in OPC UA practice |
| 20 | CRL / revocation handling | Yes | Security baseline expectation |
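
Row 15 amounts to splitting each batch by the limit the upstream server advertises before issuing the call. A sketch under stated assumptions: the limit has already been read from the server's `OperationLimits`, a value of 0 is treated as "no limit advertised" (an assumption about how the driver would interpret it), and `chunk_node_ids` / `read_in_chunks` are hypothetical helper names.

```python
from typing import Callable, List, Sequence


def chunk_node_ids(node_ids: Sequence[str], max_per_call: int) -> List[List[str]]:
    """Split node_ids into batches no larger than max_per_call.

    A limit of 0 is treated as "no limit advertised" (assumption).
    """
    if max_per_call < 0:
        raise ValueError("limit must be >= 0")
    ids = list(node_ids)
    if not ids:
        return []
    if max_per_call == 0:
        return [ids]
    return [ids[i:i + max_per_call] for i in range(0, len(ids), max_per_call)]


def read_in_chunks(
    node_ids: Sequence[str],
    max_per_call: int,
    read_batch: Callable[[List[str]], list],
) -> list:
    """Call read_batch once per chunk and concatenate results in request order."""
    results = []
    for chunk in chunk_node_ids(node_ids, max_per_call):
        results.extend(read_batch(chunk))
    return results
```

The same splitting would apply to Write and Browse with their respective limits. Keeping results in request order preserves the index alignment that batched callers rely on.
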
### Notable parity (keep)
- Cascading-quality contract: upstream `StatusCode` and `SourceTimestamp` preserved verbatim across Read, Subscribe, History — a baseline OPC-to-OPC bridging requirement.
- Native subscription forwarding (no polling translation layer) — matches Kepware/UaGateway architecture, not Matrikon Tunneller's COM-bridge approach.
- Two-pass discovery batching attribute reads — many naive aggregators issue per-node Reads which makes 10k-node servers take minutes.
- Explicit policy+mode endpoint matching (no silent downgrade) — matches UaGateway's behavior; Kepware historically defaulted to "best available" which has been a CVE source.
- Per-endpoint connect-timeout in failover sweep — bounded init budget is a property most of the listed gateways added late.
- SDK-managed `TransferSubscriptions` on reconnect — matches the OPC Foundation reference behavior; no hand-rolled migration code.
### Sources
- OPC Foundation UA-.NETStandard SDK docs — https://github.com/OPCFoundation/UA-.NETStandard
- Kepware KEPServerEX OPC UA Client — https://www.ptc.com/en/products/kepware/kepserverex/clients/opc-ua-client
- Matrikon OPC UA Tunneller — https://www.matrikonopc.com/products/opc-tunneller/
- Unified Automation UaGateway — https://www.unified-automation.com/products/wrapper-and-gateway/ua-gateway.html
- Prosys OPC UA Forge / Historian — https://www.prosysopc.com/products/opc-ua-forge/
- Cogent DataHub OPC UA — https://www.cogentdatahub.com/products/opc-ua/
- AVEVA System Platform OI.UACLIENT — https://docs.aveva.com (Operations Integration UACLIENT)
- OPC UA Part 4 (Services), Part 5 (Information Model), Part 9 (A&C), Part 11 (HistoricalAccess), Part 12 (Discovery & GDS), Part 13 (Aggregates), Part 14 (PubSub) — https://reference.opcfoundation.org/
---
## S7 (Siemens S7-300/400/1200/1500)
### What we ship today
- Native S7comm over ISO-on-TCP via S7netplus; default port 102, configurable so an in-CI Snap7 server can bind 1102 (`S7DriverOptions.cs:32`, `S7Driver.cs:87`).
- CPU family selector — `S71200`, `S71500`, `S7200Smart`, `S7200`, `S7300`, `S7400` — enum forwarded straight to S7netplus to pick the remote TSAP slot byte (`S7DriverOptions.cs:34-38`).
- Rack/slot configuration with documented conventions (S7-300 slot 2, S7-400 slot 2/3, S7-1200/1500 slot 0) (`S7DriverOptions.cs:42-51`).
- Single-connection-per-PLC policy enforced by a `SemaphoreSlim` because the CPU's comms mailbox is scanned at most once per cycle (`S7Driver.cs:23-27,60-67`).
- Static tag table parsed at `InitializeAsync` so syntactic typos fail fast instead of bleeding through as `BadInternalError` per read (`S7Driver.cs:103-110`).
- Address parser accepts DB / M / I / Q / T / C with X/B/W/D widths and 0-7 bit offsets, case-insensitive, with structured `FormatException` messages (`S7AddressParser.cs:65-216`).
- Scalar reads/writes for Bool, Byte, Int16/UInt16, Int32/UInt32, Float32 with explicit signed/unsigned reinterpret of S7netplus' boxed unsigned return values (`S7Driver.cs:231-251,306-322`).
- PUT/GET-disabled detection — `S7.Net.PlcException` mapped to `BadDeviceFailure` and surfaced as a configuration alert rather than retried via Polly (`S7Driver.cs:200-208`, `S7DriverOptions.cs:14-25`).
- Polled `ISubscribable` overlay floored at 100 ms to avoid wire-side queueing past CPU scan; per-tag last-value diffing for change-of-value publishing (`S7Driver.cs:365-425`).
- `IHostConnectivityProbe` using `ReadStatusAsync` (CPU Run/Stop) every probe interval, gated on the same semaphore so it doesn't race a live read (`S7Driver.cs:457-489`).
- Per-tag `WriteIdempotent` flag for replay-safe write retry policy (`S7DriverOptions.cs:91-104`).
- Snap7-server-backed integration fixture covers atomic typed reads + DB write-then-read round-trip on `localhost:1102` (`docs/drivers/S7-Test-Fixture.md:1-60`).
- Test CLI — probe / read / write / subscribe — with the same address grammar and CPU/slot flags (`docs/Driver.S7.Cli.md`).
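
The signed/unsigned reinterpret mentioned in the scalar read/write bullet is a bit-pattern cast, not a numeric conversion. A minimal illustration in Python using `struct`. The function names are hypothetical, and treating Float32 as a reinterpreted 32-bit word is an assumption about how the raw value arrives.

```python
import struct


def u16_as_i16(raw: int) -> int:
    """Reinterpret an unsigned 16-bit word as a signed Int16."""
    return struct.unpack(">h", struct.pack(">H", raw))[0]


def u32_as_i32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit double word as a signed Int32."""
    return struct.unpack(">i", struct.pack(">I", raw))[0]


def u32_as_f32(raw: int) -> float:
    """Reinterpret an unsigned 32-bit double word as an IEEE 754 Float32."""
    return struct.unpack(">f", struct.pack(">I", raw))[0]
```

A word read as `0xFFFF` is 65535 when treated as UInt16 but -1 as Int16, which is why the tag's declared type has to drive the cast.
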
### Gaps vs commercial gateways
- **[Build]** **S7-1500 Optimized DB / Symbolic addressing (S7Plus)** — present in: Kepware "Siemens S7 Plus", Ignition, AVEVA OI.SIDIRECT (limited). Why: S7netplus speaks classic S7comm only; optimized DBs reorder fields and have no fixed byte offsets, so absolute `DB1.DBW0` reads return `BadDeviceFailure` until "Optimized block access" is unchecked in TIA Portal.
- **[Build]** **PDU size negotiation surfaced to operators** — present in: Kepware, TOP Server, AVEVA OI.SIDIRECT. Why: Modern S7 CPUs negotiate PDU sizes from 240 up to 960 bytes; we accept whatever S7netplus negotiates with no operator visibility into the cap and no per-request packing strategy that uses the negotiated size.
- **[Build]** **Multi-variable PDU packing / read coalescing** — present in: every commercial gateway. Why: `ReadAsync(IReadOnlyList<string>)` issues one S7netplus call per tag inside the semaphore (`S7Driver.cs:182-214`); commercial gateways bin-pack contiguous DB ranges into a single multi-item PDU which is 5-50× faster on dense tag groups.
- **[Build]** **TSAP / Connection Type selector (PG / OP / S7-Basic / Other)** — present in: Kepware, TOP Server, AVEVA. Why: S7netplus picks PG-style TSAPs; sites that need OP-class slots (e.g. fenced HMI connections, license-counted PG slots) cannot pick. Some S7-1500 hardening modes refuse PG access from non-allowlisted clients.
- **[Build]** **Symbol-table / TIA Portal export browse** — present in: Kepware (online symbol upload on S7-1500), Ignition (TIA tag CSV import), TOP Server (tag-import wizard from `.AWL`/`.udt`/`.xml`). Why: We ship a static tag table only (`S7DriverOptions.cs:55-57`); operators must hand-edit the JSON. No `.tia`/`.s7p` import, no online symbol read of the S7-1500 PG symbol table.
- **[Build]** **UDT / STRUCT / nested-DB handling** — present in: Kepware, Ignition, TOP Server. Why: Tag map is flat scalar-only — no UDT fan-out into member variables, no `Array of <UDT>` indexing. Real S7-1500 projects expose hundreds of UDT-typed DBs.
- **[Build]** **Array tags (ValueRank=1)** — present in: every commercial gateway. Why: `S7TagDefinition` has no array dimension; `MapDataType` always returns `IsArray: false` (`S7Driver.cs:337-345`). OPC UA arrays of S7 `Array[0..n]` are unaddressable.
- **[Build]** **STRING / WSTRING / DTL / S5TIME / TIME / DATE_AND_TIME read+write** — present in: every commercial gateway. Why: Enum entries exist but every code path throws `NotSupportedException` (`S7Driver.cs:241-245,316-320`); S7 `STRING` has a 2-byte header, `WString` is UTF-16 with a 4-byte header, `DTL` is 12 bytes, `S5TIME` is BCD-encoded — none are wired up.
- **[Build]** **64-bit types (LInt / ULInt / LReal / LWord)** — present in: Kepware S7 Plus, Ignition, TOP Server S7-1500 driver. Why: `Int64`/`UInt64`/`Float64` cases throw `NotSupportedException` (`S7Driver.cs:241-243`); S7-1500 `LReal` (8-byte double) is the standard analog representation in modern projects.
- **[Build]** **Instance-DB / FB-block parameter access** — present in: Kepware, Ignition (with TIA import). Why: We address by absolute DB number; instance DBs of multi-instance FBs need symbolic resolution (`MyFB_Instance.MyParam`) which our parser doesn't accept.
- **[Build]** **CPU diagnostic buffer / SZL reads** — present in: Kepware (CPU diagnostic tags), TOP Server (`@Diagnostic` tags), AVEVA OI.SIDIRECT. Why: We probe `ReadStatusAsync` only (`S7Driver.cs:476`); SZL IDs 0x0000-0xFFFF (CPU type, firmware version, cycle time min/max/avg, diagnostic-buffer entries, hardware module status) are not exposed as system tags.
- **[Skip]** **AS-Alarms / Alarm_S/SQ/D/DQ / S7 ProDiag** — present in: Kepware (Alarms suite), Ignition. Why: No `IAlarmSource` implementation; CPU-resident alarms (Alarm_S blocks, ProDiag supervision messages, system diagnostic messages) are invisible to OPC UA A&E clients. CPU diagnostic-buffer entries similarly not surfaced.
- **[Skip]** **CPU Run/Stop control / block download / PG functions** — present in: Kepware (limited), AVEVA OI.SIDIRECT. Why: `ReadStatusAsync` is the only PG-class call we make; remote `WriteCpuStop` / `WriteCpuStart`, block download, password authentication for PG functions are absent.
- **[Build]** **PLC password / protection-level handling** — present in: Kepware, TOP Server, AVEVA. Why: S7-300/400 protection levels 1-3 and S7-1200/1500's "Connection mechanisms" / "Full access incl. fail-safe" tiers can require a password on connect; S7netplus's `Plc` ctor takes no password and we have no place to plumb one through.
- **[Skip]** **S7-1500 "Secure Communication" (TLS / certificate-based)** — present in: Siemens-direct (OPC UA on S7-1500), Kepware S7 Plus partial. Why: S7-1500 firmware V3.0+ supports authenticated PG connections with certificates; we connect plaintext over TCP only. Sites with hardened CPUs (`Access protection = high` + cert required) won't accept the driver.
- **[Skip]** **S7-400H / redundant H-system support** — present in: Kepware (paired-IP with sticky-master), AVEVA OI.SIDIRECT. Why: We have one host/port; H-systems present two sync'd CPUs on two IPs and the driver should fail over without losing subscriptions. Driver-level redundancy is unimplemented (server-level redundancy in `docs/Redundancy.md` is a separate axis).
- **[Skip]** **Multi-CPU rack / multiple TSAPs per rack** — present in: Kepware, TOP Server. Why: One Plc instance binds one (rack, slot); S7-400 multi-CPU racks expose 2-4 CPUs that need parallel sessions to drive in parallel.
- **[Skip]** **MPI / Profibus / RFC1006-routed transports** — present in: Kepware, AVEVA OI.SIDIRECT (DASSIDirect legacy paths), TOP Server. Why: S7netplus is Ethernet-only. Brownfield S7-300 sites still routed via CP 5611/5613 MPI cards or via S7-1500-as-router for fenced subnets are out of reach.
- **[Build]** **LOGO! 8 / S7-200 / S7-200 Smart variant tuning** — present in: Kepware "Siemens TCP/IP Ethernet" (LOGO!), Sharp7 (S7-200 Smart), Ignition. Why: `CpuType.S7200`/`S7200Smart` exists in S7netplus but the V-memory area (`V` letter) is not in our parser's switch (`S7AddressParser.cs:88-97`). LOGO!'s VM range and S7-200's V/SM areas are unaddressable.
- **[Build]** **Per-tag scan group / publish rate** — present in: Kepware (scan classes), Ignition (tag groups), TOP Server (scan rate per tag). Why: Subscriptions take one publishingInterval for the whole tag list (`S7Driver.cs:365-380`); a CPU with mixed 100 ms / 1 s / 10 s tags needs three subscribe calls and three semaphore-serialized poll loops.
- **[Build]** **Deadband / on-change suppression with absolute or percent thresholds** — present in: every commercial gateway. Why: We diff exact-equal only (`S7Driver.cs:419`); no analog deadband — a noisy float tag floods the bus.
- **[Build]** **Block-read coalescing for contiguous DB regions** — present in: every commercial gateway. Why: Reading `DB1.DBW0`, `DB1.DBW2`, `DB1.DBW4` issues 3 calls; commercial drivers issue a single FC=04 ReadVarRequest covering bytes 0-5 and slice client-side.
- **[Skip]** **Connection-resource budget management / max-parallel-jobs (AmqLen)** — present in: Kepware, TOP Server. Why: S7-1200/1500 expose 8-64 connection-resources and a per-connection parallel-jobs cap (Amq); we hold one connection and serialize, but commercial drivers open 2-4 connections per CPU to multiplex. We have no operator knob.
- **[Build]** **Pre-flight / online-test of PUT/GET enablement** — present in: Kepware (config validation step), AVEVA. Why: We surface `BadDeviceFailure` only at first read (`S7Driver.cs:200-208`); commercial drivers warn during connection wizard via SZL probe before the operator commits config.
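
The block-read coalescing gap listed above can be sketched in a few lines. This is a language-neutral illustration in Python, not the driver's C#; the names `coalesce` and `slice_values` and the `(db, offset, length)` tuple shape are hypothetical, and the sketch ignores the negotiated PDU size that bounds a real read.

```python
from typing import Dict, List, Tuple

# Hypothetical request shape: (db_number, byte_offset, byte_length)
Request = Tuple[int, int, int]


def coalesce(requests: List[Request], max_gap: int = 0) -> List[Request]:
    """Merge requests in the same DB whose byte ranges touch or overlap.

    max_gap lets the caller also bridge small holes between ranges.
    """
    merged: List[Request] = []
    for db, off, length in sorted(requests):
        if merged:
            last_db, last_off, last_len = merged[-1]
            if db == last_db and off <= last_off + last_len + max_gap:
                end = max(last_off + last_len, off + length)
                merged[-1] = (last_db, last_off, end - last_off)
                continue
        merged.append((db, off, length))
    return merged


def slice_values(block: Request, payload: bytes,
                 requests: List[Request]) -> Dict[Request, bytes]:
    """Cut each original request's bytes back out of one coalesced payload."""
    db, base, length = block
    out: Dict[Request, bytes] = {}
    for req in requests:
        r_db, r_off, r_len = req
        if r_db == db and base <= r_off and r_off + r_len <= base + length:
            start = r_off - base
            out[req] = payload[start:start + r_len]
    return out


# DB1.DBW0, DB1.DBW2, DB1.DBW4 are three 2-byte words at offsets 0, 2, 4.
words = [(1, 0, 2), (1, 2, 2), (1, 4, 2)]
blocks = coalesce(words)  # a single block covering bytes 0-5
values = slice_values(blocks[0], bytes([0, 1, 0, 2, 0, 3]), words)
```

A non-zero `max_gap` trades a few wasted bytes for fewer round-trips; whether that is worthwhile depends on the PDU budget of the target CPU.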
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | S7-1500 Optimized DB / Symbolic addressing (S7Plus) | Yes | Hard blocker on modern S7-1500 sites |
| 2 | PDU size negotiation surfaced | Yes | Cheap operability; no behavior change |
| 3 | Multi-variable PDU packing | Yes | 5-50x perf; current per-tag-per-call is the baseline gap |
| 4 | TSAP / Connection Type selector | Yes | Hardened CPUs reject PG-class slots |
| 5 | Symbol-table / TIA Portal export browse | Yes | Workflow parity; static JSON doesn't scale |
| 6 | UDT / STRUCT / nested-DB handling | Yes | Real S7-1500 projects expose hundreds of UDTs |
| 7 | Array tags (ValueRank=1) | Yes | Table-stakes; currently unaddressable |
| 8 | STRING / WSTRING / DTL / S5TIME / TIME / DT | Yes | Standard datatypes; currently throw `NotSupported` |
| 9 | 64-bit types (LInt / ULInt / LReal / LWord) | Yes | LReal is the standard analog representation on S7-1500 |
| 10 | Instance-DB / FB parameter access | Yes | Modern symbolic structure; absolute DBs alone are limiting |
| 11 | CPU diagnostic buffer / SZL reads | Yes | Operability; firmware / cycle-time visibility |
| 12 | AS-Alarms / Alarm_S / ProDiag | No | Significant scope; alarms are a separate workstream |
| 13 | CPU Run / Stop control / block download | No | Security / safety risk; out of scope |
| 14 | PLC password / protection-level handling | Yes | Hardened CPUs require it (S7netplus support permitting) |
| 15 | S7-1500 Secure Communication / TLS | No | Significant work; defer |
| 16 | S7-400H redundant H-system support | No | Rare in our deployment scope |
| 17 | Multi-CPU rack parallel sessions | No | Rare; one session per CPU works |
| 18 | MPI / Profibus / RFC1006-routed transports | No | Declining; brownfield only |
| 19 | LOGO! 8 / S7-200 V-memory area | Yes | Small parser fix broadens coverage materially |
| 20 | Per-tag scan group / publish rate | Yes | Operability; mixed-rate is normal |
| 21 | Deadband / on-change with thresholds | Yes | Analog noise mitigation |
| 22 | Block-read coalescing for contiguous DBs | Yes | Big perf win; complements multi-variable PDU packing |
| 23 | Connection-resource budget / parallel jobs | No | Premature; one connection works for most rigs |
| 24 | Pre-flight PUT/GET enablement test | Yes | UX improvement; cheap |
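
For gap 21 (deadband), a minimal sketch of the comparison that would replace the exact-equal diff might look like the following. It is written in Python for brevity rather than the driver's C#; `exceeds_deadband`, `filter_samples`, and the `span` parameter (the tag's engineering-unit range) are hypothetical names, and the percent branch assumes the usual percent-of-range interpretation.

```python
from typing import Iterable, Iterator, Optional


def exceeds_deadband(
    last_sent: Optional[float],
    current: float,
    absolute: Optional[float] = None,
    percent: Optional[float] = None,
    span: Optional[float] = None,
) -> bool:
    """Return True when `current` should be published.

    absolute: minimum change in engineering units.
    percent:  minimum change as a percentage of `span`.
    With neither threshold set this degrades to the exact-equal diff.
    """
    if last_sent is None:
        return True  # the first sample is always published
    delta = abs(current - last_sent)
    if absolute is not None:
        return delta > absolute
    if percent is not None:
        if span is None or span <= 0:
            raise ValueError("percent deadband needs a positive span")
        return delta > span * percent / 100.0
    return current != last_sent


def filter_samples(samples: Iterable[float], **deadband) -> Iterator[float]:
    """Yield only samples that pass the deadband, tracking the last published value."""
    last: Optional[float] = None
    for value in samples:
        if exceeds_deadband(last, value, **deadband):
            last = value
            yield value


published = list(filter_samples([10.0, 10.05, 10.2, 10.25, 9.0], absolute=0.1))
```

The comparison is always against the last published value, not the last sampled one, so slow drift still crosses the threshold eventually.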
### Notable parity (keep)
- Single-connection-per-PLC + semaphore serialization is the documented S7netplus / Snap7 best practice and matches what TOP Server / AVEVA do in their default profile.
- 100 ms minimum publishing interval correctly reflects CPU mailbox scan reality — commercial gateways advertise "1 ms scan" in marketing then quietly floor to ~100 ms in practice.
- Strict address-parse-at-init with structured exceptions (rather than per-read `BadInternalError`) is better operator UX than Kepware's "you'll find out at runtime" default.
- PUT/GET-disabled mapped to a sticky `BadDeviceFailure` instead of being retried by Polly — Polly retry against a CPU that will keep refusing is exactly the failure mode that floods commercial deployments.
- `WriteIdempotent` per-tag flag is finer-grained than Kepware's connection-level `Auto Demote` and matches the safe-replay reality: DB set-points are replayable, M/Q edge-triggered bits are not.
- Probe path uses `ReadStatusAsync` (single CPU-state PDU) rather than a tag read — doubles as "PLC actually up" without polluting the comms mailbox.
- Driver-instance host/port format (`host:port`) matches the Modbus driver so Admin UI can render both families uniformly.
- Snap7-server CI fixture closes the "no commercial vendor offers a meaningful S7 simulator" gap that Kepware/TOP Server users hit on day one.
### Sources
- https://www.kepserverexopc.com/products/siemens-tcpip-ethernet/ (Kepware Siemens TCP/IP Ethernet)
- https://www.kepware.com/en-us/products/kepserverex/drivers/siemens-s7-plus/ (Kepware S7 Plus — Optimized DB / Symbolic addressing)
- https://www.aveva.com/en/products/communication-drivers/ (AVEVA OI Server / DASSIDirect)
- https://www.softwaretoolbox.com/topserver-siemens-suite (TOP Server Siemens Suite)
- https://docs.inductiveautomation.com/docs/8.1/platform/connecting-to-devices/siemens (Ignition Siemens driver guide)
- https://github.com/S7NetPlus/s7netplus (S7netplus library)
- https://snap7.sourceforge.net/ (Snap7)
- https://github.com/evcc-io/sharp7 (Sharp7 fork — S7-1200/1500 PUT/GET semantics)
- https://cache.industry.siemens.com/dl/files/591/68018591/att_956083/v1/s71500_communication_function_manual_en-US_en-US.pdf (Siemens S7-1500 Communication Function Manual)
- https://support.industry.siemens.com/cs/document/26224811 (Siemens — TSAPs and connection resources)
- https://support.industry.siemens.com/cs/document/89260861 (Siemens — SZL list IDs / system status lists)
- https://docs.tia.siemens.cloud/r/en-us/v20/safety-and-security/secure-communication (S7-1500 secure communication)
---
## TwinCAT (Beckhoff ADS)
### What we ship today
- `TwinCATDriver` implements `IReadable`, `IWritable`, `ISubscribable`, `ITagDiscovery`, `IHostConnectivityProbe`, `IPerCallHostResolver` over Beckhoff's `Beckhoff.TwinCAT.Ads` v6 `AdsClient` (`src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs:11-12`, `AdsTwinCATClient.cs:22-24`).
- AMS addressing parses `ads://{netId}:{port}` with the six-octet AmsNetId and TC3 default port 851 (also documents 801/811/821 for TC2 and 10000 for system service) (`TwinCATAmsAddress.cs:20-64`).
- Native ADS notifications via `AddDeviceNotificationExAsync` with `AdsTransMode.OnChange` and per-tag cycle time; falls back to shared `PollGroupEngine` when `UseNativeNotifications=false` (`TwinCATDriver.cs:296-339`, `AdsTwinCATClient.cs:130-160`).
- IEC 61131-3 atomic data type surface — Bool, S/U Int 8/16/32/64, Real, LReal, String, WString, Time, Date, DT, TOD (`TwinCATDataType.cs:9-30`).
- Symbol path parser supports POU/GVL prefix, struct member walks, array subscripts incl. multi-dim `Matrix[1,2]`, and bit-access `.0..31` (`TwinCATSymbolPath.cs:1-104`).
- Bit-indexed BOOL **read** path: read parent word as `uint`, mask locally (`AdsTwinCATClient.cs:57-64`, `ExtractBit`).
- Optional controller-side symbol browse via `SymbolLoaderFactory` (flat mode), with system-symbol filter for `TwinCAT_*`, `Constants.*`, `Mc_*`, `__*` (`AdsTwinCATClient.cs:178-195`, `TwinCATSystemSymbolFilter.cs`).
- Per-device probe loop calls `ReadStateAsync` and emits `OnHostStatusChanged` Running/Stopped transitions (`TwinCATDriver.cs:366-402`).
- Status-code mapping `AdsErrorCode` → OPC UA via `TwinCATStatusMapper`; auto-reconnect on dropped client (`TwinCATDriver.cs:413-429`).
- Sized strings `STRING(80)` / `WSTRING(80)` are tolerated in browse — type name parens stripped to bare atom (`AdsTwinCATClient.cs:200-206`).
- Live-tested against TCBSD VM and Hyper-V XAR — 30 integration test cases (read/write/array/subscribe/browse/reconnect/probe), 110 unit tests (`docs/drivers/TwinCAT-Test-Fixture.md`).
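
The `ads://{netId}:{port}` addressing described above can be sketched as follows. This is an illustrative Python version, not the contents of `TwinCATAmsAddress.cs`; `AmsAddress` and `parse_ams_address` are hypothetical names, and treating the port as optional with a fallback to 851 is an assumption about how the default is applied.

```python
import re
from dataclasses import dataclass

DEFAULT_TC3_PORT = 851  # TC3 default port, as stated in the list above

_ADDRESS = re.compile(r"^ads://(?P<net_id>[0-9.]+)(?::(?P<port>\d+))?$")


@dataclass(frozen=True)
class AmsAddress:
    net_id: str
    port: int


def parse_ams_address(text: str) -> AmsAddress:
    """Parse ads://a.b.c.d.e.f[:port] into a validated AmsAddress."""
    match = _ADDRESS.match(text.strip())
    if match is None:
        raise ValueError(f"not an ads:// address: {text!r}")
    octets = match.group("net_id").split(".")
    # An AmsNetId is six dot-separated octets, each in 0-255.
    if len(octets) != 6 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"AmsNetId must be six octets in 0-255: {text!r}")
    port = int(match.group("port") or DEFAULT_TC3_PORT)
    if not 0 < port <= 65535:
        raise ValueError(f"AMS port out of range: {port}")
    return AmsAddress(".".join(octets), port)


explicit = parse_ams_address("ads://5.23.91.23.1.1:851")
defaulted = parse_ams_address("ads://5.23.91.23.1.1")
```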
### Gaps vs commercial gateways
- **[Build]** **ADS Sum commands (sum-read / sum-write / sum-add-notification)** — present in: Kepware, TF6100, Ignition, TwinCAT.Ads itself. Why: we issue one `ReadValueAsync` per tag in a loop (`TwinCATDriver.cs:118-156`); commercial drivers batch into `IndexGroup=0xF080..0xF084` sum requests for ~10x throughput on multi-thousand tag scans.
- **[Build]** **Handle-based access (CreateVariableHandle / ReadByHandle)** — present in: Kepware, TF6100, AdsClient itself. Why: we resolve the symbolic name on every read; cached handles cut per-request bytes and AMS overhead, especially over WAN/multi-hop.
- **[Build]** **STRUCT / UDT decomposition with offline TMC parsing** — present in: Kepware (TwinCAT TMC import), TF6100 (native), Ignition. Why: `TwinCATDataType.Structure` is declared but discovery skips non-atomic symbols (`AdsTwinCATClient.cs:224`); we can't expose nested UDT trees without hand-declaring every leaf.
- **[Build]** **Bit-indexed BOOL writes** — present in: Kepware, TF6100. Why: we throw `NotSupportedException` (`AdsTwinCATClient.cs:99-100`); commercial drivers do read-modify-write or use `ADSIGRP_SYM_VALBYNAME` with the `.N` syntax the runtime supports for some primitives.
- **[Build]** **Multi-dim / whole-array reads** — present in: Kepware, TF6100. Why: we parse `Matrix[1,2]` element-by-element but never read the array in one ADS call; sized-array marshalling is in `TwinCAT.Ads` but unused here.
- **[Build]** **Int64 fidelity** — present in: TF6100, Ignition. Why: `LInt`/`ULInt` map to `DriverDataType.Int32` (`TwinCATDataType.ToDriverDataType` line 40 with explicit "matches Int64 gap" comment), so values outside the Int32 range (above 2^31 - 1 or below -2^31) are silently truncated.
- **[Build]** **TIME / DATE / DT / TOD as native OPC UA types** — present in: TF6100 (DateTime/Duration), Kepware. Why: we marshal all four as raw `UDINT` (`AdsTwinCATClient.cs:278-280`) leaving timestamp interpretation to the client.
- **[Build]** **ENUM / ALIAS / REFERENCE / POINTER / INTERFACE / UNION** — present in: TF6100, Kepware (partial). Why: not in `TwinCATDataType`; symbol-mapper returns `null` and skips.
- **[Skip]** **Multi-target / multi-route AMS gateway** — present in: Kepware, Ignition (one driver instance, many devices). Why: we accept N `Devices` but each requires its own `TwinCATDeviceOptions`; no central route table, no `StaticRoutes.xml` management, no AMS-router credential handling.
- **[Skip]** **TwinCAT 3.1.4024+ Secure ADS / ADS-over-TLS** — present in: TF6100, recent TwinCAT.Ads. Why: `AdsClient.Connect` is called without secure-ADS opts; no certificate or pre-shared-key knobs in `TwinCATDriverOptions`.
- **[Skip]** **Route credential management** — present in: Kepware (route auth UI), TF6100. Why: relies entirely on the host AMS router's pre-authorized routes; we have no in-driver way to add a route or supply credentials.
- **[Skip]** **NC-axis / CNC channel / EtherCAT slave I/O surfaces** — present in: TF6100 (full NC namespace), Kepware (NC variables). Why: our system-symbol filter actively drops `Mc_*` (`TwinCATSystemSymbolFilter.cs:28`); we treat NC plumbing as noise.
- **[Skip]** **System-service ports** (`AMSPORT_R0_REALTIME=200`, `R0_TCOMSERVER=10000`, `EVENTLOG=110`) — present in: TF6100, Kepware (system data). Why: only `Devices` are PLC-runtime ports in practice; no helpers for system-service requests, run/config-mode switches, or Real-Time diagnostic counters.
- **[Build]** **Event log ingest (TwinCAT EventLogger / TC3 Eventing)** — present in: TF6100 (alarms/conditions), Ignition. Why: we don't implement `IAlarmSource`; AMS port 110 events never surface as OPC UA AC events.
- **[Skip]** **PLC RPC / method invocation (TC3 method calls via ADS)** — present in: TF6100. Why: `IWritable` is value-only; no surface for `RpcInvoke`-style method calls on FB instances.
- **[Skip]** **Per-PLC-runtime fan-out (port 851/852/853)** — partially present. Why: technically supported via separate `Devices` entries, but no helper that auto-discovers which runtimes exist on a controller via the system service.
- **[Build]** **Sub-millisecond cycle accuracy / max-delay tuning** — present in: TF6100, Kepware. Why: `NotificationSettings(OnChange, cycleMs, 0)` clamps cycle to 1 ms and sets max-delay to 0 (`AdsTwinCATClient.cs:144-145`); no per-tag override of `MaxDelay` to coalesce bursty signals.
- **[Build]** **Cycle-time / jitter / PLC-state diagnostics** — present in: TF6100, Kepware. Why: probe only checks reachability; we don't surface cycle-time, jitter, RT-state or `_AppInfo.OnlineChangeCnt` as health signals.
- **[Build]** **Online change / symbol-version invalidation** — present in: TF6100, Ignition. Why: no listener for symbol-version changes (for example a notification on `ADSIGRP_SYM_VERSION`); an online change silently invalidates cached handles used with `ADSIGRP_SYMVAL_BYHND` (we cache none today, but adding handle caching needs this).
- **[Skip]** **File-system access via ADS (`ADSIGRP_FOPEN/FREAD`)** — present in: TF6100. Why: not implemented; useful for reading recipe files / log uploads without a separate transport.
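
The bit-indexed BOOL write gap above is usually closed with a read-modify-write of the parent word. A minimal sketch, in Python rather than the driver's C#, with hypothetical helper names (`with_bit`, `write_bit`) and the parent word's read/write supplied as callables:

```python
from typing import Callable


def extract_bit(word: int, bit: int) -> bool:
    """Mirror of the existing read path: mask one bit out of the parent word."""
    return bool((word >> bit) & 1)


def with_bit(word: int, bit: int, value: bool, width: int = 32) -> int:
    """Return `word` with `bit` set or cleared, confined to `width` bits."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} outside 0..{width - 1}")
    mask = 1 << bit
    full = (1 << width) - 1
    return (word | mask) & full if value else word & ~mask & full


def write_bit(read_word: Callable[[], int], write_word: Callable[[int], None],
              bit: int, value: bool) -> int:
    """Read the parent word, flip one bit, and write it back only if it changed."""
    current = read_word()
    updated = with_bit(current, bit, value)
    if updated != current:
        write_word(updated)
    return updated


# In-memory stand-in for a PLC word, just to exercise the helpers.
store = {"word": 0b0101}
write_bit(lambda: store["word"], lambda w: store.update(word=w), bit=1, value=True)
```

The sequence is not atomic: if the PLC program changes another bit of the same word between the read and the write, that change is lost. Serialising bit writes per parent word inside the driver narrows the window but cannot close it from the client side.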
### Recommendations
| # | Gap | Build? | Rationale |
|---|-----|:------:|-----------|
| 1 | ADS Sum commands | Yes | ~10x throughput for multi-thousand-tag scans; blocker at scale |
| 2 | Handle-based access (caching) | Yes | Perf; reduces per-request bytes and AMS overhead |
| 3 | STRUCT / UDT decomposition with TMC parsing | Yes | Real projects have nested UDTs we currently can't expose |
| 4 | Bit-indexed BOOL writes | Yes | Correctness; we read bits but throw on write |
| 5 | Multi-dim / whole-array reads | Yes | Perf; library supports it |
| 6 | Int64 fidelity (LInt / ULInt) | Yes | Correctness; we silently truncate |
| 7 | TIME / DATE / DT / TOD as native UA types | Yes | Correctness; raw UDINT pushes interpretation to clients |
| 8 | ENUM / ALIAS / REFERENCE / POINTER / INTERFACE / UNION | Yes | At least ENUM and ALIAS are common in real projects |
| 9 | Multi-target / multi-route AMS gateway | No | Per-device config already works |
| 10 | Secure ADS / ADS-over-TLS | No | Significant work; defer |
| 11 | Route credential management | No | Host-level AMS router responsibility |
| 12 | NC-axis / CNC channel / EtherCAT slave I/O surfaces | No | Specialty; not in target use cases |
| 13 | System-service ports | No | Niche operational tooling |
| 14 | Event log / TC3 alarms (`IAlarmSource`) | Yes | Currently no `IAlarmSource` implementation; capability gap |
| 15 | PLC RPC / method invocation | No | Niche; design-heavy |
| 16 | Per-PLC-runtime auto-discover | No | Cosmetic; manual port config works |
| 17 | Sub-millisecond max-delay tuning | Yes | Cheap; helps coalesce bursty signals |
| 18 | Cycle-time / jitter / PLC-state diagnostics | Yes | Operability; cheap given existing probe |
| 19 | Online-change / symbol-version invalidation | Yes | Required if handle caching lands (gap #2) |
| 20 | File-system access via ADS | No | Niche; out of scope |
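
For gap 7, the conversion from raw `UDINT` to typed values is small. The sketch below assumes the commonly documented TwinCAT encodings (TIME and TOD in milliseconds, DATE and DT in seconds since 1970-01-01); those units should be verified against Beckhoff's data-type reference before being relied on. The function names are hypothetical, and the results are naive (zone-less) values because the PLC types carry no time zone.

```python
from datetime import date, datetime, time, timedelta

_EPOCH = datetime(1970, 1, 1)
_MS_PER_DAY = 86_400_000


def decode_time(raw: int) -> timedelta:
    """TIME: a duration in milliseconds (assumed encoding)."""
    return timedelta(milliseconds=raw)


def decode_tod(raw: int) -> time:
    """TOD: milliseconds since midnight (assumed encoding)."""
    if not 0 <= raw < _MS_PER_DAY:
        raise ValueError(f"TOD out of range: {raw}")
    return (datetime.min + timedelta(milliseconds=raw)).time()


def decode_dt(raw: int) -> datetime:
    """DT: seconds since 1970-01-01 (assumed encoding), returned zone-less."""
    return _EPOCH + timedelta(seconds=raw)


def decode_date(raw: int) -> date:
    """DATE: same epoch as DT, with the time-of-day part discarded."""
    return decode_dt(raw).date()
```

On the OPC UA side, TIME would most plausibly surface as a Duration and DT/DATE as DateTime; how the zone-less PLC value maps to UTC is a site policy decision this sketch does not make.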
### Notable parity (keep)
- Native `OnChange` notifications (not polling) — matches TF6100/Kepware default and is the right CPU/latency posture.
- Symbolic addressing (no manual index-group/offset arithmetic) — same DX as Kepware's TwinCAT driver.
- Live integration suite against a real runtime (TCBSD + XAR), not just mocks — better than Ignition's stock TwinCAT module which lacks bundled hardware tests.
- System-symbol filter so `Discovered/` doesn't drown the address space — Kepware ships an equivalent.
- Config-driven tag declarations as the authoritative path; `EnableControllerBrowse` is opt-in — matches "tag-import-then-curate" workflow Kepware encourages.
- AmsNetId + port modelled correctly with TC3-vs-TC2 default port awareness — matches TF6100 conventions.
### Sources
- https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_ads_intro/index.html
- https://infosys.beckhoff.com/english.php?content=../content/1033/tcadsdll2/117571083.html (Sum commands / index groups)
- https://infosys.beckhoff.com/english.php?content=../content/1033/tf6100_tc3_opcua/index.html (TF6100)
- https://github.com/Beckhoff/TF6100-OPC-UA-Sample
- https://github.com/Beckhoff/TC3-AdsClient-Csharp / `Beckhoff.TwinCAT.Ads` NuGet docs
- https://www.kepserverexlibrary.kepware.com/Beckhoff%20TwinCAT (Kepware Beckhoff TwinCAT driver manual)
- https://docs.inductiveautomation.com/docs/8.1/platform/connections/devices (Ignition device drivers)
- https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_security_management/ (Secure ADS)
- https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_adsnetref/ (NotificationSettings, AdsTransMode)
+682
@@ -0,0 +1,682 @@
# AbCip Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → AbCip](../featuregaps.md#abcip-allen-bradley-ethernetip--logix)
>
> This plan covers the **Build = Yes** items only. Skip-rated gaps are listed at the bottom for traceability.
## Summary
This plan closes the 16 Build-rated AbCip gaps in five phases ordered to ship correctness fixes
first, then engineering workflow, then performance, then operability, and finally redundancy.
Phase 1 lands the data-type fidelity work (LINT/ULINT, native STRINGnn, array slicing,
write-multi packing); today the driver silently truncates 64-bit values and serialises
adjacent reads into N round-trips. Phase 2 introduces the offline tag-import workflow (L5K/L5X + CSV) that
Studio 5000 shops require before they will switch off Kepware. Phase 3 exposes the
performance levers commercial drivers ship as field knobs — symbolic vs logical addressing,
configurable Connection Size, and the logical-blocking / logical-non-blocking strategy
selector. Phase 4 surfaces per-tag scan rates, write deadband, online tag-DB refresh trigger,
and the diagnostic system tags an HMI dashboard expects. Phase 5 adds HSBY paired-IP failover
for continuous-process plants. Headline outcome: parity with Kepware's Logix Database Settings
and TOP Server's protocol-mode picker, with measurable throughput wins (3-5x on dense rigs via
logical addressing, single-PDU reads on contiguous arrays, single-PDU writes on multi-tag
recipe pushes).
## Phased delivery
### Phase 1 — Data-type correctness (4 PRs)
Goal: stop silently losing data. None of the items in this phase are user-visible features —
they are correctness fixes against existing capability surfaces.
#### PR 1.1 — LINT / ULINT 64-bit fidelity
- **Scope**: replace the truncating `Int32` widening at `AbCipDataType.cs:53` with `Int64`
routing across decode + encode + the `DriverDataType` map. Includes `DT` (epoch-millis on
Logix v32+ surfaces as LINT, not DINT — verify against `LibplctagTagRuntime.cs:53` before
reusing the same code-path).
- **Files**: `AbCipDataType.cs` (mapping), `LibplctagTagRuntime.cs` (already calls
`_tag.GetInt64` / `SetInt64`, so the runtime is correct — the gap is the surface enum
flattening into `Int32`), `Core.Abstractions/DriverDataType.cs` may need an `Int64` /
`UInt64` member if not already present.
- **Test approach**: unit (xUnit + Shouldly) with a fake `IAbCipTagRuntime` that returns
`long.MaxValue` on `DecodeValueAt(LInt, ...)`; assert the snapshot value round-trips through
the read path without truncation. Integration test against pymodbus is N/A — needs a live
Logix or a libplctag mock-server fixture; keep this unit-only and rely on smoke testing on
the dev box with a real ControlLogix.
- **Effort**: S
- **Dependencies**: confirm `DriverDataType.Int64` exists; if not, that is a Core change
shared with the Modbus TODO at `AbCipDataType.cs:53`.
- **Docs / fixture / e2e**: appends a Logix-types row to the type-mapping table in
`docs/Driver.AbCip.Cli.md` (CLI gains `--type LInt` / `--type ULInt`); extends
`docs/drivers/AbServer-Test-Fixture.md` §"What it actually covers" to list `LINT` once
ab_server is reseeded with a `TestLINT:LINT[1]` tag; updates the
`tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/Docker/docker-compose.yml`
ControlLogix profile to seed `TestLINT`; adds a 64-bit assertion case in
`AbCipReadSmokeTests`; extends `scripts/e2e/test-abcip.ps1` with an LInt loopback
assertion (and a matching seeded `TestLINT` in `scripts/smoke/seed-abcip-smoke.sql`).
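
To make the PR 1.1 failure mode concrete, the following Python sketch shows what is lost when a 64-bit LINT is forced through a 32-bit surface type. It assumes little-endian byte order (as CIP uses) and models the narrowing as keeping the low 32 bits, which is what an unchecked cast would do; how the driver actually narrows is not stated here, so treat that as an assumption. The function names are hypothetical.

```python
import struct


def decode_lint(buffer: bytes, offset: int = 0) -> int:
    """Decode a little-endian signed 64-bit LINT from a tag buffer."""
    return struct.unpack_from("<q", buffer, offset)[0]


def narrow_to_int32(value: int) -> int:
    """Keep only the low 32 bits and reinterpret them as a signed Int32."""
    return struct.unpack("<i", struct.pack("<q", value)[:4])[0]


raw = struct.pack("<q", 2**63 - 1)   # long.MaxValue on the wire
full = decode_lint(raw)              # survives intact as a 64-bit value
narrowed = narrow_to_int32(full)     # what a 32-bit surface would report
```

A unit test along the lines described in the test approach would assert that the snapshot carries `full`, not `narrowed`.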
#### PR 1.2 — Native STRING / STRINGnn variant decoding
- **Scope**: Today `AbCipDataType.String` flattens any Logix `STRING` UDT into a .NET
string via libplctag's `_tag.GetString(0)`. Logix programs commonly define
`STRING_20`, `STRING_40`, `STRING_80` variants with different DATA-array sizes; libplctag
honours these when the tag name resolves to the user-defined type, but our discovery
emits them as the generic `String` placeholder. Add a `StringLength` field to
`AbCipStructureMember` + `AbCipTagDefinition` so declared variants carry their cap, and
thread it into the `Tag.Name` attribute or a libplctag string-cap hint.
- **Files**: `AbCipDataType.cs`, `AbCipDriverOptions.cs` (record fields), `LibplctagTagRuntime.cs`
(string-length aware decode/encode), and the discovery emit at `AbCipDriver.cs:715`.
- **Test approach**: unit test with a fake runtime returning `string` values shorter and
longer than the declared cap; integration test deferred until a sample L5X with mixed
STRING variants is available.
- **Effort**: M
- **Dependencies**: investigate libplctag's `str_max_capacity` / `str_count_word_bytes`
attributes — the docs reference them but the C# wrapper may not expose them; if not, this
PR must extend `LibplctagTagRuntime` with a raw-buffer decode path.
- **Docs / fixture / e2e**: extends `docs/Driver.AbCip.Cli.md` with a new `--string-size`
flag in the `read`/`write` cookbook plus a STRINGnn worked example; updates
`docs/drivers/AbServer-Test-Fixture.md` §"What it actually covers" to list
`STRING_20`/`STRING_80` once seeded; extends the ControlLogix profile in
`tests/.../Docker/docker-compose.yml` with `TestSTRING80:STRING[1]` (plus a `STRING_20`
variant if `ab_server` honours non-default DATA caps; otherwise documented as Emulate-tier
only); adds `tests/.../IntegrationTests/AbCipStringDecodingTests.cs` round-trip; adds a
short-string round-trip case to `scripts/e2e/test-abcip.ps1` and a `TestSTRING80` row to
`scripts/smoke/seed-abcip-smoke.sql`.
#### PR 1.3 — Array-slice read addressing `Tag[0..N]`
- **Scope**: today `AbCipTagPath` parses `Tag[3,5]` as a single element. Add slice syntax
`Tag[0..15]` (parsed in `AbCipTagPath.TryParse`) and a planner that issues one libplctag
read with `elem_count=N` per Rockwell array semantics, decoding the buffer at element
stride into N output snapshots. Mirrors the whole-UDT planner pattern.
- **Files**: `AbCipTagPath.cs` (parser — add `IsSlice` + `SliceLength` to the path segment
record, or carry it on `AbCipTagPath` itself), new `AbCipArrayReadPlanner.cs` next to
`AbCipUdtReadPlanner.cs`, `AbCipDriver.ReadAsync` to dispatch through the planner,
`IAbCipTagRuntime` to add `DecodeArrayAt(type, elementStride, count)` or build on
`DecodeValueAt`. Investigate libplctag's `elem_count` attribute on `Tag` create to confirm
the right wire-level switch.
- **Test approach**: parser unit tests for the new syntax, planner unit tests with fake
runtime, integration smoke against a live ControlLogix DINT[100] tag using the dev-box
PLC.
- **Effort**: L
- **Dependencies**: PR 1.1 must land first if the array element type is LINT — otherwise the
slice path silently truncates 64-bit elements.
- **Docs / fixture / e2e**: extends `docs/Driver.AbCip.Cli.md` `read` section with the
`Tag[0..N]` slice syntax + a worked example reading `Recipe[0..15]` in one round-trip;
updates `docs/drivers/AbServer-Test-Fixture.md` §"What it actually covers" to mention
the existing `DINT[16]` array tag is now exercised end-to-end via slicing; extends
`AbCipReadSmokeTests` with a slice-read assertion against the seeded `TestDINTArray`;
adds `tests/.../IntegrationTests/AbCipArraySliceTests.cs` covering edge cases
(boundary, single-element, full-range); adds a slice-read assertion to
`scripts/e2e/test-abcip.ps1`.
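
The slice syntax in PR 1.3 could be parsed and decoded roughly as below. This is a Python sketch, not the planned `AbCipTagPath.TryParse` change; `ArraySlice`, `try_parse_slice`, and `split_elements` are hypothetical names. It assumes the range is inclusive at both ends, so `Recipe[0..15]` means 16 elements, which matches the `DINT[16]` example but is not spelled out in the plan.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

_SLICE = re.compile(r"^(?P<base>[^\[\]]+)\[(?P<first>\d+)\.\.(?P<last>\d+)\]$")


@dataclass(frozen=True)
class ArraySlice:
    base: str    # tag name without the subscript
    first: int   # index of the first element
    count: int   # element count to request in one read


def try_parse_slice(path: str) -> Optional[ArraySlice]:
    """Return an ArraySlice for `Tag[a..b]`, or None for any other path."""
    match = _SLICE.match(path)
    if match is None:
        return None
    first, last = int(match.group("first")), int(match.group("last"))
    if last < first:
        return None
    return ArraySlice(match.group("base"), first, last - first + 1)


def split_elements(buffer: bytes, stride: int, count: int) -> List[bytes]:
    """Cut one read buffer into `count` element-sized chunks at `stride` bytes."""
    if len(buffer) < stride * count:
        raise ValueError("buffer shorter than stride * count")
    return [buffer[i * stride:(i + 1) * stride] for i in range(count)]


recipe = try_parse_slice("Recipe[0..15]")
```

Existing multi-dimensional subscripts such as `Tag[3,5]` fall through as `None`, so the current single-element path stays untouched.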
#### PR 1.4 — CIP multi-tag write packing
- **Scope**: `AbCipDriver.WriteAsync` (`AbCipDriver.cs:460-546`) loops over writes one-by-one.
Group writes by `(device, no-bit-RMW)` and submit one CIP Multi-Service Packet (0x0A)
carrying up to N write-singles per round-trip. Honours the per-family
`SupportsRequestPacking` flag at `AbCipPlcFamilyProfile.cs:36,43,51,59` — Micro800 falls
back to the existing per-write loop because its profile already disables packing.
- **Files**: `AbCipDriver.cs` (add a write planner mirroring the read planner), new
`AbCipMultiWritePlanner.cs`, possibly a new `IAbCipTagRuntime.WriteBatchAsync` method or a
new `IAbCipMultiWriter` capability since libplctag's high-level `Tag.WriteAsync` is
per-tag — investigate libplctag's `cip-msg-multi` raw-CIP path or whether building a
Multi-Service Packet via `plc_tag_create("name=@raw,...")` is feasible.
- **Test approach**: unit test the planner with a synthetic batch (mixed-device, mixed
bit-RMW, one Micro800); integration test recipe-style 50-tag write against ControlLogix
measuring round-trip count via Wireshark or via a libplctag debug-trace sink.
- **Effort**: L
- **Dependencies**: investigate libplctag multi-service-packet API; if absent, this PR may
need to drop down to raw CIP via the `@raw` pseudo-tag or be deferred.
- **Docs / fixture / e2e**: appends a "Multi-tag writes" subsection to
`docs/Driver.AbCip.Cli.md` (no flag — automatic batching when multiple writes queue
inside one publish) plus a note that Micro800 falls back per profile; updates
`docs/drivers/AbServer-Test-Fixture.md` §7 ("Capability surfaces beyond read") to flip
`IWritable.WriteAsync` from "no smoke test" to covered for the multi-write path; adds
`tests/.../IntegrationTests/AbCipMultiWriteTests.cs` asserting 50-tag batch lands in one
round-trip (count via libplctag debug-trace sink); extends `scripts/e2e/test-abcip.ps1`
with a recipe-style multi-write step; extends seed SQL with two extra DINT tags so the
e2e has a packing target.
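
The grouping rule in PR 1.4 can be sketched independently of how the packet is eventually sent. In this Python illustration, `PendingWrite`, `plan_writes`, and `max_per_packet` are hypothetical; a real planner would more likely bound a batch by encoded byte size against the Connection Size than by a fixed count.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PendingWrite:
    device: str
    tag: str
    is_bit_rmw: bool = False


def plan_writes(writes: Iterable[PendingWrite],
                supports_packing: Dict[str, bool],
                max_per_packet: int = 10) -> List[List[PendingWrite]]:
    """Group writes into batches; each batch stands for one round-trip.

    Bit read-modify-write operations and devices whose profile disables
    request packing always get a batch of their own.
    """
    batches: List[List[PendingWrite]] = []
    open_batch: Dict[str, List[PendingWrite]] = {}
    for write in writes:
        if write.is_bit_rmw or not supports_packing.get(write.device, False):
            batches.append([write])
            continue
        batch = open_batch.get(write.device)
        if batch is None or len(batch) >= max_per_packet:
            batch = []
            open_batch[write.device] = batch
            batches.append(batch)
        batch.append(write)
    return batches


queue = [
    PendingWrite("clx", "A"), PendingWrite("clx", "B"),
    PendingWrite("m800", "C"), PendingWrite("clx", "D", is_bit_rmw=True),
    PendingWrite("clx", "E"), PendingWrite("m800", "F"),
]
plan = plan_writes(queue, {"clx": True, "m800": False}, max_per_packet=2)
```

Note that batching reorders writes relative to the submission queue when devices are interleaved; if callers depend on strict ordering across tags, the planner would need to flush the open batch whenever it emits a solo write for the same device.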
### Phase 2 — Tag-import workflows (4 PRs)
Goal: replicate Kepware's Logix Database Settings — point the driver at an L5K/L5X export or
a CSV and have the tag table populate without an online controller.
#### PR 2.1 — L5K parser + ingest
- **Scope**: parse a Studio 5000 L5K export (a labelled-section text format with
`TAG ... END_TAG` blocks, `DATATYPE ... END_DATATYPE` UDT definitions, and program-scope
qualifiers). Produce `AbCipTagDefinition` + `AbCipStructureMember` records that match the
declarative options shape. Includes Description ingest (PR 2.3 lifts it to OPC UA
`Description`).
- **Files**: new `Import/L5kParser.cs`, new `Import/IL5kSource.cs` for testability, new
`Import/L5kIngest.cs` that converts parsed records into `AbCipTagDefinition`. Hook into
`AbCipDriverOptions` via a new `TagImports` collection (filenames or inline blobs) parsed
on `AbCipDriver.InitializeAsync`.
- **Test approach**: unit-only with sample L5K files in `tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Tests/Import/Fixtures/`
covering controller-scope tags, program-scope tags, alias tags (skipped per Kepware
precedent), and UDTs with nested structures.
- **Effort**: L
- **Dependencies**: none — pure-text parser.
- **Docs / fixture / e2e**: new doc `docs/drivers/AbCip-TagImport.md` covering the L5K
format support matrix (controller-scope / program-scope / UDT / alias-skipped) and a
worked example of pointing `AbCipDriverOptions.TagImports` at an L5K export; appends a
`tag-import` command section to `docs/Driver.AbCip.Cli.md` (CLI gains
`tag-import --file foo.L5K`); fixture-side no change to `ab_server` (offline parse —
no PLC needed) but adds sample L5K files under
`tests/.../AbCip.Tests/Import/Fixtures/`; extends `scripts/e2e/test-abcip.ps1` with an
offline `tag-import` smoke that diffs the parsed tag set against a golden JSON.
#### PR 2.2 — L5X (XML) parser + ingest
- **Scope**: same surface as PR 2.1 but parses Studio 5000's XML export. L5X is the de-facto
modern format (Studio 5000 v21+) and carries richer metadata than L5K including
ExternalAccess attributes and AOI definitions.
- **Files**: new `Import/L5xParser.cs` using `System.Xml.XPath`, share the `IL5kSource` /
`L5kIngest` ingest layer with PR 2.1 by introducing a common `ParsedTagsBundle` record.
- **Test approach**: unit tests with sample L5X fixtures including an AOI-typed tag (sets up
the AOI work in PR 2.6).
- **Effort**: L
- **Dependencies**: PR 2.1 is preferred first to settle the shared ingest seam.
- **Docs / fixture / e2e**: extends `docs/drivers/AbCip-TagImport.md` (created in PR 2.1)
with the L5X-specific section — namespace handling, ExternalAccess attributes, AOI
references; extends the `tag-import` CLI section in `docs/Driver.AbCip.Cli.md` to note
L5X auto-detection by file extension; sample L5X files added under
`tests/.../AbCip.Tests/Import/Fixtures/` (one with an AOI-typed tag for PR 2.6);
reuses the offline `tag-import` step from `scripts/e2e/test-abcip.ps1` (now driven by
L5X) — no fixture container change because parse is offline; cross-links from
`tests/.../IntegrationTests/LogixProject/README.md` so the on-site Emulate L5X export
doubles as a parser fixture.
#### PR 2.3 — Tag descriptions surfaced as OPC UA `Description`
- **Scope**: extend `AbCipTagDefinition` with `Description` (string?), populate it from the
L5K/L5X parsers, and thread it through to `DriverAttributeInfo` so the address-space
builder sets the OPC UA `Description` attribute. Also lifts the description onto
`AbCipStructureMember` for member-level metadata.
- **Files**: `AbCipDriverOptions.cs` (record fields), `AbCipDriver.cs:760-770`
(`ToAttributeInfo` helper), `Core.Abstractions/DriverAttributeInfo.cs` (verify it carries
a Description field; if not, that becomes a Core PR shared across drivers).
- **Test approach**: unit — a discovery test asserts that a tag with a description ends up
with that description on the `DriverAttributeInfo` record.
- **Effort**: S
- **Dependencies**: PR 2.1 / PR 2.2 (descriptions only land via importer).
- **Docs / fixture / e2e**: appends a "Description metadata" subsection to
`docs/drivers/AbCip-TagImport.md` documenting how Studio 5000 descriptions surface as
OPC UA `Description`; no CLI surface change (read-side only — the existing
`otopcua-cli read` already projects `Description`); no fixture container change; adds
a cross-driver assertion to the existing OPC UA browse test in
`tests/.../IntegrationTests/` verifying the description survives the full
parser → driver → server → client path; extends `scripts/e2e/test-abcip.ps1` with a
one-line `Description != null` assertion after the import smoke step.
#### PR 2.4 — CSV tag import / export
- **Scope**: a CSV round-trip matching the Kepware column layout (`Tag Name, Address, Data
Type, Respect Data Type, Client Access, Scan Rate, Description, Scaling`). Import populates
`AbCipTagDefinition`; export dumps the live tag table for editing in Excel.
- **Files**: new `Import/CsvTagImporter.cs`, new `Import/CsvTagExporter.cs`, integration
point in `AbCipDriverOptions.TagImports` parallel to PR 2.1's hook. Export hook is
exposed via the CLI (`docs/Driver.AbCip.Cli.md`) — add a `tag-export` command.
- **Test approach**: unit tests for parser + writer with fixture CSVs; CLI integration test
using a synthetic options payload.
- **Effort**: M
- **Dependencies**: lighter than 2.1/2.2 — could ship in either order, but landing CSV after
L5X means the CSV export reuses the `ParsedTagsBundle` shape.
- **Docs / fixture / e2e**: appends a "CSV tag table" section to
`docs/drivers/AbCip-TagImport.md` documenting the column layout (Kepware-compatible) and
round-trip semantics; appends `tag-export` and CSV-flavour `tag-import` commands to
`docs/Driver.AbCip.Cli.md`; adds sample CSVs under
`tests/.../AbCip.Tests/Import/Fixtures/` plus a CLI integration test
(`tests/.../AbCip.Tests/Import/CsvRoundTripTests.cs`); extends
`scripts/e2e/test-abcip.ps1` with an export-then-import-then-diff scenario (no PLC
required); fixture-side no change.
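
The export-then-import-then-diff scenario for PR 2.4 reduces to a lossless round-trip over the column set listed in the scope. The sketch below uses Python's standard `csv` module purely for illustration; `export_tags` and `import_tags` are hypothetical names, the sample row values are made up, and the exact cell formats Kepware writes are not assumed.

```python
import csv
import io
from typing import Dict, List

# Column order taken from the PR 2.4 scope line.
COLUMNS = ["Tag Name", "Address", "Data Type", "Respect Data Type",
           "Client Access", "Scan Rate", "Description", "Scaling"]


def export_tags(rows: List[Dict[str, str]]) -> str:
    """Serialise tag rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({column: row.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


def import_tags(text: str) -> List[Dict[str, str]]:
    """Parse CSV text back into tag rows, rejecting files that lack a column."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{column: (row[column] or "") for column in COLUMNS} for row in reader]


sample = [{
    "Tag Name": "Recipe_1", "Address": "Recipe[0]", "Data Type": "DINT",
    "Respect Data Type": "1", "Client Access": "R/W", "Scan Rate": "100",
    "Description": 'Batch size, "target"', "Scaling": "",
}]
round_tripped = import_tags(export_tags(sample))
```

Descriptions containing commas or quotes are the usual place a hand-rolled splitter breaks, which is why the sample row includes both.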
#### PR 2.5 — Online tag-DB refresh trigger (`$Sys$UpdateTagInfo` parity)
- **Scope**: AVEVA exposes `$Sys$UpdateTagInfo` so an HMI can write `1` to force the driver
to re-walk the controller's symbol table after a Studio 5000 download — without restarting
the driver. Implement as a new `IDriverControl.RebrowseAsync()` invoked by the server or
via a system-tag write (PR 4.3 surfaces system tags as browseable variables; once PR 4.4
lands, this becomes the writeable system tag `_RefreshTagDb`). For now expose it
via the CLI and via a new `AbCipDriver.RebrowseAsync` method.
- **Files**: `AbCipDriver.cs` (new method that re-runs the `@tags` enumerator without going
through full `ReinitializeAsync`), CLI command in `src/ZB.MOM.WW.OtOpcUa.Driver.AbCip.Cli/`,
documentation update in `docs/Driver.AbCip.Cli.md`.
- **Test approach**: unit test that two consecutive `RebrowseAsync` calls produce two
enumeration passes; integration smoke against the dev-box ControlLogix verifying the
address space picks up a tag added between rebrowses.
- **Effort**: M
- **Dependencies**: feeds PR 4.4 (the `_RefreshTagDb` system tag, built on PR 4.3's
system-tag folder) but ships earlier as a programmatic API.
- **Docs / fixture / e2e**: appends a `rebrowse` command to `docs/Driver.AbCip.Cli.md`
with a Studio 5000 download recipe ("after a download, run
`otopcua-abcip-cli rebrowse -g …`"); cross-references the future `_RefreshTagDb` system
tag once PR 4.4 lands; updates `docs/drivers/AbServer-Test-Fixture.md` §7 to mark
`ITagDiscovery.DiscoverAsync` as covered for the rebrowse path; adds
`tests/.../IntegrationTests/AbCipRebrowseTests.cs` driving two consecutive
enumerations (the second sees a tag added between calls — ab_server supports runtime
reseed via its REST hook); extends `scripts/e2e/test-abcip.ps1` with a
rebrowse-after-reseed assertion (or marks it `[OnlyIfRig]` if the simulator's reseed
hook isn't reachable).
#### PR 2.6 — AOI (Add-On Instruction) input/output handling
- **Scope**: AOIs are first-class types in L5X (`AddOnInstructionDefinition` blocks). The
Template Object decoder at `CipTemplateObjectDecoder.cs` likely already handles them at
the wire level (an AOI is a Logix UDT with InOut/Input/Output qualifiers). This PR adds:
(a) AOI-aware browse paths so an AOI instance shows up as a folder with `Inputs/`,
`Outputs/`, `InOut/` sub-folders; (b) skip-on-discovery for `InOut` parameters per
Kepware's documented limitation (InOut is a pointer, not a value).
- **Files**: extend `AbCipStructureMember` with an `AoiQualifier` enum
(Input/Output/InOut/Local), L5K/L5X parser extends to set it, `AbCipDriver.DiscoverAsync`
groups members into qualifier-named sub-folders.
- **Test approach**: unit test discovery against an AOI-containing fixture.
- **Effort**: M
- **Dependencies**: PR 2.2 (L5X) lands the AOI definition parsing.
- **Docs / fixture / e2e**: appends an "AOI handling" section to
`docs/drivers/AbCip-TagImport.md` covering Inputs/Outputs/InOut grouping + the InOut
skip rationale; updates `docs/drivers/AbServer-Test-Fixture.md` §"What it does NOT
cover" to keep AOIs flagged as ab_server-blocked but call out Logix Emulate as the
authoritative tier; adds a sample AOI-bearing L5X under
`tests/.../AbCip.Tests/Import/Fixtures/` and a discovery test that asserts the
Inputs/Outputs sub-folder shape; promotes
`tests/.../IntegrationTests/Emulate/AbCipEmulateAoiTests.cs` (gated on
`AB_SERVER_PROFILE=emulate`) — no `scripts/e2e/test-abcip.ps1` change because AOIs
need Emulate or a rig.
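
The qualifier-based grouping can be sketched in a few lines of Python (not the project's C#). `group_aoi_members` and the tuple-based member shape are hypothetical. Leaving `Local` members at the instance root is an assumption, since the plan does not say where they land.

```python
from collections import defaultdict

# Qualifier -> sub-folder name. InOut is absent on purpose: it is skipped.
FOLDER_BY_QUALIFIER = {"Input": "Inputs", "Output": "Outputs"}


def group_aoi_members(members):
    """members: iterable of (name, qualifier) pairs.

    Returns ({folder: [names]}, [skipped names]). The "" folder stands for
    the AOI instance root.
    """
    folders = defaultdict(list)
    skipped = []
    for name, qualifier in members:
        if qualifier == "InOut":
            skipped.append(name)      # a pointer, not a value: skip on discovery
        elif qualifier in FOLDER_BY_QUALIFIER:
            folders[FOLDER_BY_QUALIFIER[qualifier]].append(name)
        else:
            folders[""].append(name)  # assumption: Local members stay at the root
    return dict(folders), skipped
```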
### Phase 3 — Performance levers (3 PRs)
Goal: expose the protocol-mode + connection-tuning knobs that commercial drivers expose as
device-level config.
#### PR 3.1 — Configurable CIP Connection Size per device
- **Scope**: today the family profile hard-codes 4002 / 504 / 488 at
`AbCipPlcFamilyProfile.cs:33,42,49`. Add an optional `ConnectionSize` field to
`AbCipDeviceOptions` that overrides the family default; thread it through to the
libplctag tag-create attribute (`connection_size=N`). Validate against a sensible range
(500-4002 per Kepware's slider).
- **Files**: `AbCipDriverOptions.cs:70-73` (extend `AbCipDeviceOptions` record),
`IAbCipTagRuntime.cs` (extend `AbCipTagCreateParams` with `ConnectionSize`),
`LibplctagTagRuntime.cs` (set the `Tag.PlcType`-adjacent attribute — investigate libplctag
C# wrapper's exposure of `connection_size`; may need to set via `Tag.AddAttribute` if a
named property doesn't exist).
- **Test approach**: unit test that custom Connection Size flows from options into the
`AbCipTagCreateParams`; integration smoke against the dev-box ControlLogix verifying
reduced-size connections succeed on legacy v19 firmware. Live test required because
libplctag rejects out-of-range values silently in some versions.
- **Effort**: S
- **Dependencies**: investigate libplctag `connection_size` attribute exposure.
- **Docs / fixture / e2e**: appends a "Connection Size" subsection to a new
`docs/drivers/AbCip-Performance.md` (consolidates the Phase 3 knobs in one place) and a
brief note + warning-symptom callout in `docs/Driver.AbCip.Cli.md` for the new
per-device option in the Driver config; updates
`docs/drivers/AbServer-Test-Fixture.md` §5 (CompactLogix narrow cap) noting that
ab_server still doesn't enforce the cap so live coverage stays Emulate/rig-only;
extends `scripts/smoke/seed-abcip-smoke.sql` with a `ConnectionSize` field demo;
no `scripts/e2e/test-abcip.ps1` change (boot-time config knob, no per-call surface).
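
As a rough sketch of the override-and-validate step (in Python, not the project's C#): the bounds are the 500-4002 range quoted above, and the warning rule mirrors the one in the Phase 3 risks. The function names and the `probed_firmware_major` parameter are hypothetical; how the value reaches libplctag is left out because that is still an open investigation item.

```python
MIN_CONNECTION_SIZE = 500    # lower bound of the range quoted in the scope
MAX_CONNECTION_SIZE = 4002   # upper bound of the range quoted in the scope


def effective_connection_size(family_default, override=None):
    """Return the per-device override when set and in range, else the family default."""
    if override is None:
        return family_default
    if not MIN_CONNECTION_SIZE <= override <= MAX_CONNECTION_SIZE:
        raise ValueError(
            f"ConnectionSize {override} is outside "
            f"{MIN_CONNECTION_SIZE}-{MAX_CONNECTION_SIZE}")
    return override


def needs_legacy_warning(size, probed_firmware_major):
    """Warning rule from the Phase 3 risks: size above 511 on v19-or-earlier firmware."""
    return size > 511 and probed_firmware_major <= 19
```

Only the override is range-checked here, so a family default below 500 passes through untouched; whether the real option should behave that way is a design decision for the PR.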
#### PR 3.2 — Symbolic vs logical (instance-ID) addressing toggle
- **Scope**: libplctag exposes `use_connected_msg=1&allow_packet_response_packing=1&logical_segment=1`
(or similar — investigate the exact attribute name) for instance-ID addressing that skips
per-poll ASCII parsing. Add a per-device `AddressingMode` enum
(`Symbolic | Logical | Auto`) and thread it through `AbCipTagCreateParams`. `Auto` is the
default and matches today's behaviour; `Logical` flips libplctag into instance-ID mode.
Logical mode requires a one-time symbol-table walk to map names to instance IDs — reuse
`LibplctagTagEnumerator` for the bootstrap.
- **Files**: `AbCipDriverOptions.cs` (per-device enum), `IAbCipTagRuntime.cs`
(`AbCipTagCreateParams.AddressingMode`), `LibplctagTagRuntime.cs` (translate to libplctag
attributes), `AbCipDriver.cs` (run a one-time symbol-walk on first read in Logical mode).
- **Test approach**: unit test attribute construction; integration benchmark — read 1000
tags in Symbolic vs Logical and assert >2x throughput on the dev-box ControlLogix.
- **Effort**: L
- **Dependencies**: investigate libplctag's instance-ID API; the mapping pseudo-tag is
`@tags` (already used for browse) but the per-tag wire flag needs research. If libplctag
doesn't expose this cleanly, the PR drops down to the raw `cip_addr` attribute.
- **Docs / fixture / e2e**: appends an "Addressing mode" section to
`docs/drivers/AbCip-Performance.md` (Symbolic / Logical / Auto trade-offs); adds a
per-device `addressing-mode` knob to `docs/Driver.AbCip.Cli.md` (CLI gains
`--addressing-mode` on `read`/`subscribe`/`write` for ad-hoc benchmarking); updates
`docs/drivers/AbServer-Test-Fixture.md` §"What it actually covers" to add Logical-mode
reads if ab_server's symbol table walks correctly under instance IDs (otherwise
marked Emulate-tier-only); adds a benchmark test
`tests/.../IntegrationTests/AbCipAddressingModeBenchTests.cs`; extends
`scripts/e2e/test-abcip.ps1` with a Symbolic-vs-Logical sanity assertion
(read 1000 tags both modes, assert Logical >= Symbolic throughput).
#### PR 3.3 — Logical-blocking / non-blocking strategy selector
- **Scope**: TOP Server names two modes: "logical-blocking" (whole-UDT read, decode members
in-memory) and "logical-non-blocking" (per-member reads packed into one Multi-Service
Packet). Only the whole-UDT direction ships today, via `AbCipUdtReadPlanner`. Add a per-device
`ReadStrategy` enum with three values: `WholeUdt` (current behaviour), `MultiPacket`
(new: use libplctag request-packing to bundle per-member reads into one PDU when the UDT
is sparse — i.e. only 2-of-50 members subscribed), and `Auto` (planner picks based on
fraction-of-members-subscribed threshold). Strategy is per-device because Micro800
doesn't support packing.
- **Files**: `AbCipDriverOptions.cs` (per-device enum), `AbCipUdtReadPlanner.cs` (add the
threshold heuristic), new `AbCipMultiPacketReadPlanner.cs`, `AbCipDriver.ReadAsync`
dispatch. Honours `AbCipPlcFamilyProfile.SupportsRequestPacking` at the family level so a
user-selected `MultiPacket` on Micro800 falls back to per-tag with a warning logged.
- **Test approach**: unit test the heuristic on synthetic batches of varying sparsity;
integration benchmark with a 50-member UDT where 5 members are subscribed — verify
MultiPacket moves fewer bytes than WholeUdt (compare response buffer sizes).
- **Effort**: L
- **Dependencies**: PR 1.4 (multi-tag write packing) builds the same libplctag-multi-service
primitive; landing 1.4 first reduces scope here.
- **Docs / fixture / e2e**: appends a "Read strategy" section to
`docs/drivers/AbCip-Performance.md` covering WholeUdt / MultiPacket / Auto plus the
sparsity-threshold heuristic; updates `docs/drivers/AbServer-Test-Fixture.md` §1
(UDT coverage) with a note that strategy switching is decided in the planner and
unit-tested only — Emulate is the authoritative wire-level coverage; adds
`tests/.../IntegrationTests/Emulate/AbCipEmulateMultiPacketReadTests.cs` (gated on
`AB_SERVER_PROFILE=emulate`); no CLI surface change beyond the existing
per-device option, no `scripts/e2e/test-abcip.ps1` change because the simulator
doesn't differentiate the two strategies on the wire.
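
One possible shape for the strategy selection, sketched in Python rather than C#. The 30% default is the example figure from the Phase 3 risks, not a benchmarked value, and `choose_read_strategy` is a hypothetical name. Having `Auto` avoid `MultiPacket` on families without request packing is an assumption; the plan only specifies the fallback for a user-selected `MultiPacket`.

```python
def choose_read_strategy(configured, subscribed, total_members,
                         supports_packing, sparse_threshold=0.30):
    """Resolve a per-device ReadStrategy to 'WholeUdt', 'MultiPacket', or 'PerTag'."""
    if total_members <= 0:
        raise ValueError("total_members must be positive")
    if configured == "Auto":
        sparse = subscribed / total_members < sparse_threshold
        # Assumption: Auto never picks MultiPacket where packing is unsupported.
        return "MultiPacket" if sparse and supports_packing else "WholeUdt"
    if configured == "MultiPacket" and not supports_packing:
        # e.g. Micro800: fall back to per-tag reads (the driver would log a warning).
        return "PerTag"
    return configured
```

With 2 of 50 members subscribed, `Auto` resolves to `MultiPacket` on a packing-capable family and to `WholeUdt` otherwise.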
### Phase 4 — Operability (4 PRs)
Goal: make the driver behave like a SCADA driver — per-tag scan rates, write deadband,
diagnostic system tags, online refresh trigger.
#### PR 4.1 — Per-tag scan rate / scan group bucketing
- **Scope**: today subscriptions key on a single `publishingInterval` per
`_poll.Subscribe(...)` call. Add an optional `ScanRate` field to `AbCipTagDefinition` that,
when set, overrides the subscription interval for that tag. The shared `PollGroupEngine`
already buckets by interval — the change is to read the per-tag rate at subscribe-time
and place the tag into its own bucket.
- **Files**: `AbCipDriverOptions.cs` (record field), `AbCipDriver.SubscribeAsync` (look up
per-tag override before passing to `_poll.Subscribe`). `PollGroupEngine` may need a new
`Subscribe(tags, defaultInterval, perTagOverrides)` overload — check Core for the current
signature.
- **Test approach**: unit test that two tags with different ScanRate values produce two
poll buckets; integration test verifying the faster-rate tag publishes more frequently
than the slower-rate tag inside one subscription.
- **Effort**: M
- **Dependencies**: may require a small change to `PollGroupEngine` in Core.
- **Docs / fixture / e2e**: new doc `docs/drivers/AbCip-Operability.md` (consolidates the
Phase 4 knobs); appends a "Per-tag scan rate" section to it covering Kepware "scan
classes" parity + the OPC UA publishing-interval interaction; no CLI surface change;
fixture-side no change to ab_server; adds
`tests/.../IntegrationTests/AbCipPerTagScanRateTests.cs` driving two tags at
different rates against ab_server and asserting bucket separation; extends
`scripts/e2e/test-abcip.ps1` with a two-tag subscribe-rate-divergence assertion.
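
The bucketing step can be illustrated with a short Python sketch (not the project's C#). `bucket_by_interval` is a hypothetical helper; the real change depends on what `PollGroupEngine` already accepts, which the plan flags as unverified.

```python
from collections import defaultdict


def bucket_by_interval(tags, default_interval_ms):
    """tags: {tag name: ScanRate in ms, or None} -> {interval in ms: [tag names]}."""
    buckets = defaultdict(list)
    for name, scan_rate_ms in tags.items():
        # A tag without its own ScanRate inherits the subscription interval.
        interval = scan_rate_ms if scan_rate_ms is not None else default_interval_ms
        buckets[interval].append(name)
    return dict(buckets)
```

Two tags with different `ScanRate` values therefore end up in two buckets, which is the property the unit test in this PR asserts.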
#### PR 4.2 — Write deadband / write-on-change
- **Scope**: `AbCipDriver.WriteAsync` writes every request through. Add per-tag
`WriteDeadband` (numeric) and `WriteOnChange` (boolean). When set, the driver tracks the
last successfully-written value per `(tag, deviceHostAddress)` and suppresses the next
write if `|new - last| < deadband` (numeric) or `new == last` (any). Suppressed writes
return `Good` so OPC UA semantics are unaffected.
- **Files**: `AbCipDriverOptions.cs` (record fields), new `AbCipWriteCoalescer.cs` holding
the per-tag last-value cache, `AbCipDriver.WriteAsync` consults the coalescer before
hitting the runtime.
- **Test approach**: unit tests with synthetic writes — assert that a sequence of jittery
setpoint values within deadband triggers a single PLC write.
- **Effort**: M
- **Dependencies**: none.
- **Docs / fixture / e2e**: appends a "Write deadband / write-on-change" section to
`docs/drivers/AbCip-Operability.md` with a worked setpoint-jitter example; updates
`docs/drivers/AbServer-Test-Fixture.md` §7 to flip the multi-write coverage line to
also cover suppression; adds
`tests/.../IntegrationTests/AbCipWriteDeadbandTests.cs` driving a jittery setpoint and
asserting the actual PLC write count via libplctag debug-trace; extends
`scripts/e2e/test-abcip.ps1` with a write-coalesce assertion (write the same value
twice, verify only one PLC-side change).
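
A minimal sketch of the suppression rule, in Python rather than C#. `WriteCoalescer` here is a hypothetical stand-in for the planned `AbCipWriteCoalescer`; the method names are invented for illustration.

```python
class WriteCoalescer:
    """Tracks the last successfully written value per (tag, device host address)."""

    def __init__(self):
        self._last = {}

    def should_write(self, tag, host, new, deadband=None, write_on_change=False):
        key = (tag, host)
        if key not in self._last:
            return True                  # nothing written yet: always write
        last = self._last[key]
        if deadband is not None and abs(new - last) < deadband:
            return False                 # numeric change inside the deadband
        if write_on_change and new == last:
            return False                 # unchanged value
        return True

    def record_success(self, tag, host, value):
        # Call only after the PLC write succeeded, so suppressed or failed
        # writes never move the reference value.
        self._last[(tag, host)] = value
```

Because the reference value only moves on a real write, a run of jittery setpoints inside the deadband is compared against the same baseline and cannot drift through it unnoticed.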
#### PR 4.3 — Diagnostic / system tags as browseable variables
- **Scope**: surface the `IHostConnectivityProbe` + `DriverHealth` data as browseable OPC UA
variables under `AbCip/<device>/_System/`. Variables: `_ConnectionStatus`, `_ScanRate`
(current effective publishing interval), `_TagCount`, `_DeviceError`, `_LastScanTimeMs`.
Read-only; updated on each driver health transition.
- **Files**: `AbCipDriver.DiscoverAsync` (`AbCipDriver.cs:674-758`) emits the system folder
per device; new `AbCipSystemTagSource.cs` produces the live values; `ReadAsync` routes
`_System/...` references to the source instead of the libplctag runtime.
- **Test approach**: unit test that the discovery emits the expected nodes; unit test that
reading a system tag returns the current health snapshot.
- **Effort**: M
- **Dependencies**: none, but PR 2.5 (online refresh trigger) becomes nicer once this lands —
`_RefreshTagDb` writes `1` to invoke `RebrowseAsync`.
- **Docs / fixture / e2e**: appends a "System tags / `_System` folder" section to
`docs/drivers/AbCip-Operability.md` enumerating `_ConnectionStatus`, `_ScanRate`,
`_TagCount`, `_DeviceError`, `_LastScanTimeMs`; cross-link from
`docs/Driver.AbCip.Cli.md` (the `read` cookbook gains a system-tag example); updates
`docs/drivers/AbServer-Test-Fixture.md` §7 to flip `IHostConnectivityProbe`
state-transition coverage from "no" to covered (system tag observation provides the assertion
hook); adds `tests/.../IntegrationTests/AbCipSystemTagDiscoveryTests.cs`; extends
`scripts/e2e/test-abcip.ps1` with a `_System/_ConnectionStatus` browse-and-read step.
#### PR 4.4 — Online tag-DB refresh trigger as `_RefreshTagDb` system tag
- **Scope**: thin follow-up to PR 2.5 + PR 4.3 — wire the writeable system tag to the
existing `RebrowseAsync` method.
- **Files**: `AbCipSystemTagSource.cs` (writeable variable), `AbCipDriver.WriteAsync`
intercepts `_RefreshTagDb` writes and dispatches to `RebrowseAsync`.
- **Test approach**: unit + CLI integration.
- **Effort**: S
- **Dependencies**: PR 2.5 and PR 4.3.
- **Docs / fixture / e2e**: extends the `_System` table in
`docs/drivers/AbCip-Operability.md` to mark `_RefreshTagDb` as writeable; appends a
"Refreshing the tag DB" recipe to `docs/Driver.AbCip.Cli.md` that pairs the system-tag
write with the existing `rebrowse` command from PR 2.5; reuses the
`AbCipRebrowseTests` fixture from PR 2.5 with an added system-tag-write entry point;
extends `scripts/e2e/test-abcip.ps1` with a `_RefreshTagDb` write-then-verify
assertion (chained off the rebrowse step from PR 2.5).
### Phase 5 — Redundancy (2 PRs)
Goal: HSBY paired-IP failover for continuous-process plants. Heavier than the rest because
it changes the `(device, hostName)` axiom — one logical device now has two host addresses.
#### PR 5.1 — Paired host address syntax + role probing
- **Scope**: extend `AbCipDeviceOptions` with `PartnerHostAddress` (optional). When set, the
device probes both gateways concurrently using the existing probe loop machinery
(`AbCipDriver.cs:235-281`). A ControlLogix HSBY pair exposes
`WallClockTime`/`Module.Status` tags that identify the active chassis — investigate the
exact tag name; `WallClockTime.SyncStatus` is one option, `S:34` (Module Status)
carries the role bit on some versions.
- **Files**: `AbCipDriverOptions.cs` (extend `AbCipDeviceOptions`), `AbCipDriver.cs`
(extend `DeviceState` with `ActiveAddress` field, run two probe loops), new
`AbCipHsbyRoleProber.cs` reading the role tag and returning Active/Standby.
- **Test approach**: unit test with two fake probe runtimes returning different role bits;
integration test deferred until a true HSBY pair is available — note in
`MEMORY.md/project_aveva_platform_installed.md` that the dev box has a single chassis.
- **Effort**: L
- **Dependencies**: investigate the canonical HSBY role tag — the AVEVA ABCIP docs name it
but the wire-level tag varies by firmware.
- **Docs / fixture / e2e**: new doc `docs/drivers/AbCip-HSBY.md` covering the paired-IP
config, the role-tag detection matrix (v20 / v24 / v32+), and the feature-flag gate
(`Redundancy.Hsby.Enabled`); extends `docs/Driver.AbCip.Cli.md` with a `--partner`
flag plus an `hsby-status` command that prints the active partner; updates
`docs/drivers/AbServer-Test-Fixture.md` §"What it does NOT cover" with a new entry
marking HSBY as ab_server-blocked but adds a "paired-fixture" mode to
`tests/.../Docker/docker-compose.yml` (two `controllogix` services on different
ports + a `hsby-mux` sidecar that flips the role bit on demand); adds
`tests/.../IntegrationTests/AbCipHsbyRoleProberTests.cs`; no
`scripts/e2e/test-abcip.ps1` change yet — HSBY e2e is gated behind a sibling
`scripts/e2e/test-abcip-hsby.ps1` script introduced in PR 5.2.
#### PR 5.2 — Failover routing in IPerCallHostResolver
- **Scope**: `AbCipDriver.ResolveHost` returns the device's primary address today
(`AbCipDriver.cs:307-312`). Change it to return the currently-Active partner. On role
transition, the existing bulkhead/breaker per-host keying isolates a stuck primary
without affecting the failover path because the partner address has its own breaker.
- **Files**: `AbCipDriver.cs:ResolveHost` consults `DeviceState.ActiveAddress`, plus a
small change to per-tag runtime caching so handles are keyed on the active address —
failover invalidates the handle cache and re-creates against the new gateway.
- **Test approach**: unit test that toggling the role flag flips `ResolveHost` output;
integration test deferred per PR 5.1.
- **Effort**: M
- **Dependencies**: PR 5.1.
- **Docs / fixture / e2e**: appends a "Failover behaviour" section to
`docs/drivers/AbCip-HSBY.md` documenting handle-cache invalidation + bulkhead key
semantics; appends a "Failure-mode walkthrough" to the same doc covering
primary-stuck / secondary-stuck / both-stuck cases; reuses the paired-fixture from
PR 5.1; adds `tests/.../IntegrationTests/AbCipHsbyFailoverTests.cs` driving the
role-flip via the `hsby-mux` sidecar and asserting reads route to the new active
partner; ships the new `scripts/e2e/test-abcip-hsby.ps1` (paired-fixture variant of
the standard e2e — flips the role mid-stream and asserts subscribe stream survives).
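
The routing and cache-invalidation behaviour can be sketched as follows, in Python rather than C#. `DeviceStateSketch` and its members are hypothetical; only the idea of keying handles on the active address and clearing them on a role change comes from the plan.

```python
class DeviceStateSketch:
    """One logical device with a primary and an optional partner address."""

    def __init__(self, primary, partner=None):
        self.primary = primary
        self.partner = partner
        self.active_address = primary
        self._handles = {}   # (active address, tag name) -> runtime handle

    def resolve_host(self):
        # Reads and writes route to whichever partner is currently Active.
        return self.active_address

    def get_handle(self, tag_name, create):
        key = (self.active_address, tag_name)
        if key not in self._handles:
            self._handles[key] = create(self.active_address, tag_name)
        return self._handles[key]

    def on_role_change(self, new_active):
        if new_active not in (self.primary, self.partner):
            raise ValueError(f"unknown address: {new_active}")
        if new_active != self.active_address:
            self.active_address = new_active
            self._handles.clear()   # failover: re-create handles on the new gateway
```

Since the returned host string changes on failover, any per-host breaker keyed on that string starts from the partner's own state instead of inheriting the stuck primary's.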
## Per-PR detail
The summary above already includes each PR's title, motivation (linked to the
featuregaps.md table row), files, test plan, and effort. Rather than repeat that, this
section records the cross-cutting design notes and risks per phase, not per PR.
### Phase 1 risks
- **Int64 surface change** (PR 1.1) ripples through the address-space builder + the OPC UA
variant emit. Confirm `Core.Abstractions.DriverDataType` already has `Int64`; if not, this
PR pulls in a Core change other drivers will share (Modbus has the same TODO).
- **STRINGnn variant addressing** (PR 1.2) is the smallest data-correctness PR but has the
highest unknown — libplctag's C# wrapper may flatten all string variants to its built-in
`GetString(0)` helper. If true, PR 1.2 must add a raw-buffer decode path and is then
upgraded from M to L.
- **Array-slice planner** (PR 1.3) introduces a third planner alongside the UDT planner +
the future write planner (1.4). Build them on a shared `IAbCipReadPlanner` seam so
Phase 3's strategy selector has one slot to pivot on, not three.
- **Multi-write packing** (PR 1.4) hinges on libplctag exposing CIP Multi-Service Packet
construction. If it does not, the work-around is a raw-CIP `@raw` send, which is a
bigger lift and may push 1.4 to an L-plus that drags into Phase 3.
### Phase 2 risks
- **L5K text format** has documented edge cases (escape sequences in DESCRIPTION strings,
alias resolution, nested DATATYPE blocks). Lean on Rockwell's published L5K BNF and treat
unknown sections as warnings, not failures.
- **L5X namespace handling** — Studio 5000 v32+ adds optional XML namespaces. Use
XPath with prefix-agnostic queries to avoid version-pinning the parser.
- **CSV column drift** — Kepware's column order has shifted over major versions. Implement
the importer to read by header name, not column index.
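
The prefix-agnostic L5X lookup mentioned in the risks above might look like this in Python's standard library (the real parser would be C#). The element nesting in the usage note is a simplified stand-in for a Studio 5000 export, and `tag_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def tag_names(l5x_text):
    """Return the Name attribute of every Tag element, whatever its namespace."""
    root = ET.fromstring(l5x_text)
    # "{*}Tag" matches Tag in any namespace or in none (ElementTree, Python 3.8+).
    return [tag.get("Name") for tag in root.findall(".//{*}Tag")]
```

The same call returns the tag names whether or not the root element declares a default `xmlns`, so the parser does not need a per-version namespace table.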
### Phase 3 risks
- **Logical addressing bootstrap cost** (PR 3.2) — symbol-table walk on first read can
stall the first poll batch. Cache the instance-ID map per `(device, last symbol-table
hash)` and persist it across `ReinitializeAsync` if feasible.
- **MultiPacket vs WholeUdt heuristic** (PR 3.3) — the threshold (e.g. "switch to
MultiPacket when fewer than 30% of UDT members are subscribed") needs benchmarking on
real rigs. Ship an explicit per-device override + pick a conservative default.
- **Connection Size on legacy firmware** (PR 3.1) — v19-and-earlier ControlLogix firmware
rejects Large Forward Open silently. Document the symptom in `docs/Driver.AbCip.md` and
emit a warning when ConnectionSize > 511 against a family profile that is
ControlLogix-typed but probed-as-v19.
### Phase 4 risks
- **Per-tag scan rate** (PR 4.1) interacts with the OPC UA subscription's
publishing-interval contract. Document that the per-tag override is a *driver-side*
publish bucket that fires `OnDataChange` events at the per-tag rate; the OPC UA layer
still aggregates them on its own publishing-interval and the client may see them at the
larger of the two intervals. This matches Kepware's "scan classes" semantics.
- **Write deadband** (PR 4.2) on UDT-fanned-out members must use the member-level cache, not
the parent UDT's cache.
### Phase 5 risks
- **HSBY role tag name** (PR 5.1) varies by firmware version; without a real HSBY pair on
the dev box the integration coverage is deferred to a customer-site smoke test. Consider
parking PR 5.1+5.2 behind a feature flag (`Redundancy.Hsby.Enabled`) and shipping unit
coverage only.
- **Bulkhead key** is assumed to be `(driver, hostName)`; once `ResolveHost` returns the active
partner address that key is correct by construction, but verify Polly's per-key state is
invalidated cleanly when the active address changes mid-call.
## Documentation, fixture, and e2e impact
Cross-cutting roll-up of the per-PR `Docs / fixture / e2e` lines above. Read this before
starting any phase to plan doc + fixture + e2e work in parallel with the code change.
### New documents
- `docs/drivers/AbCip-TagImport.md` (Phase 2, lands with PR 2.1; extended by PR 2.2,
PR 2.3, PR 2.4, PR 2.6) — L5K / L5X / CSV / AOI tag-import reference.
- `docs/drivers/AbCip-Performance.md` (Phase 3, lands with PR 3.1; extended by PR 3.2,
PR 3.3) — Connection Size, Addressing Mode, Read Strategy.
- `docs/drivers/AbCip-Operability.md` (Phase 4, lands with PR 4.1; extended by PR 4.2,
PR 4.3, PR 4.4) — per-tag scan rate, write deadband, system tags.
- `docs/drivers/AbCip-HSBY.md` (Phase 5, lands with PR 5.1; extended by PR 5.2) —
paired-IP redundancy, role-tag matrix, failover semantics.
### Documents with appended sections
- `docs/Driver.AbCip.Cli.md` — gains type-table rows (PR 1.1), `--string-size` flag
(PR 1.2), slice syntax (PR 1.3), multi-write subsection (PR 1.4), `tag-import` /
`tag-export` commands (PR 2.1, PR 2.2, PR 2.4), `rebrowse` command (PR 2.5),
Connection Size note (PR 3.1), `--addressing-mode` flag (PR 3.2), system-tag
read example (PR 4.3), `_RefreshTagDb` recipe (PR 4.4), `--partner` flag plus
`hsby-status` command (PR 5.1).
- `docs/drivers/AbServer-Test-Fixture.md` — coverage map updated by every PR that
changes what ab_server actually exercises (1.1, 1.2, 1.3, 1.4, 2.5, 2.6, 3.1, 3.2,
3.3, 4.2, 4.3, 5.1).
### Fixture / simulator scaffolding
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbCip.IntegrationTests/Docker/docker-compose.yml` —
ControlLogix profile gains seeded `TestLINT`, `TestSTRING80`, extra DINT tags for
multi-write (PRs 1.1, 1.2, 1.4); a new paired-fixture mode with `hsby-mux` sidecar
for HSBY (PR 5.1).
- `tests/.../AbCip.IntegrationTests/AbServerProfile.cs` — the `KnownProfiles`
records get extended Notes lines for each new seeded tag class.
- `tests/.../AbCip.Tests/Import/Fixtures/` — new directory hosting sample L5K, L5X,
and CSV files (PRs 2.1, 2.2, 2.4, 2.6).
- `tests/.../AbCip.IntegrationTests/Emulate/` — new gated tests for AOI (PR 2.6) and
MultiPacket strategy (PR 3.3); reuses the existing `AB_SERVER_PROFILE=emulate`
gate.
- `tests/.../AbCip.IntegrationTests/LogixProject/README.md` — cross-link added when
PR 2.2 lands so the on-site Studio 5000 export doubles as a parser fixture.
### Integration / e2e scripts
- `scripts/e2e/test-abcip.ps1` — gains assertions for: LInt loopback (1.1),
STRING round-trip (1.2), array-slice read (1.3), recipe multi-write (1.4),
tag-import diff (2.1, 2.2, 2.4), Description survival (2.3), rebrowse-after-reseed
(2.5), Symbolic-vs-Logical sanity (3.2), per-tag scan-rate divergence (4.1),
write-coalesce (4.2), `_System` browse-and-read (4.3), `_RefreshTagDb`
write-then-verify (4.4).
- `scripts/smoke/seed-abcip-smoke.sql` — extended with `TestLINT`, `TestSTRING80`,
multi-write target tags, and a `ConnectionSize` field demo (PRs 1.1, 1.2, 1.4,
3.1).
- `scripts/e2e/test-abcip-hsby.ps1` — new paired-fixture variant of the standard
e2e, ships with PR 5.2; not chained into `scripts/e2e/test-all.ps1` until HSBY
exits feature-flag gating.
### Cross-cutting work
- The `Docs / fixture / e2e` lines deliberately reuse the existing
`Test-Probe` / `Test-DriverLoopback` / `Test-ServerBridge` / `Test-OpcUaWriteBridge` /
`Test-SubscribeSeesChange` helpers in `scripts/e2e/_common.ps1` — no new helper
functions are required for Phases 1-4. Phase 5 is the first phase that introduces a
new helper (`Test-FailoverDuringSubscribe`) in `_common.ps1`, shipped alongside
PR 5.2; if other drivers (TwinCAT, S7) later adopt a paired-fixture mode they can
reuse it.
- `tests/.../AbCip.IntegrationTests/AbServerFixture.cs` may need a small extension in
PR 5.1 to support the paired-port probe; the change is additive (probe both
`127.0.0.1:44818` and `127.0.0.1:44819`), keeping single-fixture tests working
unchanged.
## Skip-rated items (for context)
Copied from the Recommendations table at `docs/featuregaps.md`:
- **#7 Inactivity timeout / keep-alive cadence** — Rarely an issue with libplctag-managed
connections.
- **#9 "Respect tag-specified scan rate" mode** — Niche; OPC UA subscription rate already
covers it.
- **#10 Initial value cache / first-update from cache** — OPC UA subscription sampling
already handles first-update.
- **#15 UDT as first-class OPC UA structured type** — Member fan-out already works;
structured-type plumbing is heavy.
- **#17 PLC-5 / SLC bridging through CLX** — AbLegacy driver covers this protocol family.
- **#21 Unsolicited CIP MSG ingestion** — Separate driver in commercial; design-heavy;
niche.
- **#22 CIP Generic / Class 3 passthrough** — Niche custom-tooling territory.
- **#23 Per-device connection count / pooling** — libplctag manages connections;
premature.
## Open questions
1. **libplctag instance-ID API** (PR 3.2) — does the C# wrapper expose
`logical_segment` / `cip_addr` attributes directly, or do we have to drop down to
`Tag.AddAttribute` calls? Affects scope of Phase 3.
2. **libplctag CIP Multi-Service Packet** (PR 1.4) — is there a wrapper-level multi-write
helper, or must we go through the `@raw` pseudo-tag? Affects scope of Phase 1.
3. **`DriverDataType.Int64` / `Int64Array`** (PR 1.1) — does Core already carry it, or is
this a shared Core change with Modbus's matching TODO?
4. **HSBY role tag** (PR 5.1) — confirm the canonical Active/Standby indicator across
ControlLogix v20 / v24 / v32+; without a known tag the role-prober is speculative.
5. **AOI InOut handling** (PR 2.6) — Kepware skips InOut parameters because they are
pointers, not values. Do we follow the same precedent or attempt to dereference at
read-time? Skip is the cheap default.
6. **L5K vs L5X coverage** — if the customer base has standardised on L5X (Studio 5000
v21+), can we ship PR 2.2 first and make PR 2.1 best-effort? Affects phasing within
Phase 2.
7. **HSBY scope for v2 vs v3** — Phase 5 carries the largest unknowns; if no
continuous-process customer demands it for the v2 release, deferring Phase 5 to v3 is reasonable.
8. **Per-tag scan rate plumbing** (PR 4.1) — does `PollGroupEngine` in Core already accept
per-reference interval overrides, or does that need a Core extension shared with the
other polling-overlay drivers (Modbus, FOCAS)?
@@ -0,0 +1,470 @@
# AbLegacy Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → AbLegacy](../featuregaps.md#ablegacy-allen-bradley-plc-5--slc--micrologix)
>
> Covers Build = Yes items only. Skip-rated gaps listed at bottom for traceability.
## Summary
The AbLegacy driver (PCCC over EtherNet/IP via libplctag) currently ships with parsing for the canonical SLC/PLC-5/MicroLogix file letters, four PLC-family profiles, bit-within-N-word RMW writes, a probe loop, and a flat static-config tag list. The `featuregaps.md` Recommendations table flags 13 gaps as **Build = Yes**:
1. DH+ via 1756-DHRIO bridging (#2)
2. PD/MG/PLS/BT files (#5)
3. PLC-5 octal addressing (#7)
4. Indirect/indexed addressing (#8)
5. Array contiguous block addressing (#9)
6. ST string read/write production verification (#10)
7. Sub-element bit semantics (`.DN` as Bit) (#11)
8. Auto-demote on comm failure (#13)
9. RSLogix 500/5 symbol import (#15)
10. Per-tag deadband / change filter (#18)
11. Diagnostic counters as tags (#20)
12. Per-device timeout / retry overrides (#21)
13. MicroLogix function-file naming (RTC/HSC/DLS) (#23)
The plan splits these across **5 phases / 13 PRs** (one PR per gap, with a couple of small ones bundled). Phases are ordered by coupling — addressing correctness first because everything downstream depends on the parser, then file/type coverage, then performance, then workflow tooling, then resilience. Each PR is sized to fit comfortably under the project's per-PR review budget (most S/M; only the RSLogix import is L).
## Phased delivery
| Phase | Theme | PRs | Gaps |
|-------|-------|-----|------|
| 1 | Addressing correctness | 4 | #7 octal, #8 indirect, #11 sub-element bits, #23 ML function files |
| 2 | File / type coverage | 2 | #5 PD/MG/PLS/BT, #10 ST verification |
| 3 | Performance | 2 | #9 array block, #18 per-tag deadband |
| 4 | Workflow | 3 | #15 RSLogix import, #21 per-device timeouts, #20 diagnostic counters |
| 5 | Resilience | 2 | #13 auto-demote, #2 DH+ bridging |
Phase 1 lands first because Phase 2 (PD/MG/PLS/BT) and Phase 3 (array reads) both extend the parser shipped in Phase 1. Phase 5 (auto-demote) reads diagnostic counters from Phase 4 #20, so 4 precedes 5.
---
## Per-PR detail
### Phase 1 — Addressing correctness
#### PR 1 — PLC-5 octal I/O addressing (#7)
**Scope**: PLC-5 documentation and RSLogix 5 use octal for `I:` / `O:` word and bit indices (`I:001/17` is rack 0 group 0 word 1, bit 17₈ = bit 15₁₀). Today `AbLegacyAddress.TryParse` does `int.TryParse` on the word number and bit index, silently accepting decimal. For `PlcFamily=Plc5` (and only that family) `I` / `O` files must parse as octal.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyAddress.cs` — add `TryParse(string, AbLegacyPlcFamily)` overload; existing `TryParse(string)` keeps decimal semantics (back-compat for non-PLC-5 callers and pure shape validation).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs`: `EnsureTagRuntimeAsync` and the bit-RMW path call the family-aware overload using `device.Options.PlcFamily`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/PlcFamilies/AbLegacyPlcFamilyProfile.cs` — add `OctalIoAddressing` flag (true for `Plc5` only).
**Test plan**:
- Unit (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/AbLegacyAddressTests.cs`): `I:001/17` parses to word=1, bit=15 under PLC-5; the same string parses to bit=17 under SLC500. `O:7/10` parses to bit 10 under SLC500 (decimal) and bit 8 under PLC-5 (octal).
- Round-trip: `ToLibplctagName()` must emit the format libplctag expects (verify whether libplctag's PLC-5 PCCC layer accepts octal-formatted I/O addresses as-is, or whether we must convert the parsed decimal value back to octal text before forwarding).
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — extend the "PCCC address primer" with an `I:` / `O:` row noting PLC-5 octal vs SLC500 decimal semantics; worked example showing `I:001/17` resolved differently per family.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — note octal-vs-decimal addressing as a covered family-aware parser dimension under the unit-coverage list.
- Fixture: extend `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` `plc5` profile to seed an `I:001` (or equivalent module-image word) tag if `ab_server --plc=PLC/5` accepts it; otherwise document the gap in `Docker/README.md`.
- E2E: add `--plc-type Plc5 -a "I:001/17"` octal-bit assertion to `scripts/e2e/test-ablegacy.ps1` (gated on the `plc5` compose profile being up); no change to `scripts/smoke/seed-ablegacy-smoke.sql` required (existing `N7:5` tag continues to cover the SLC500 path).
**Effort**: S
**Dependencies**: none
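The family-aware parse rule for PR 1 can be illustrated with a short, language-neutral Python sketch. This is not the C# `AbLegacyAddress` code; `parse_io_address`, `IoAddress`, and `OCTAL_IO_FAMILIES` are hypothetical names chosen for the example, and only the `I:` / `O:` shape is handled.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical stand-in for the plan's OctalIoAddressing profile flag.
OCTAL_IO_FAMILIES = frozenset({"Plc5"})


class IoAddress(NamedTuple):
    file_letter: str
    word: int
    bit: Optional[int]


_IO_SHAPE = re.compile(r"([IO]):([0-9]+)(?:/([0-9]+))?")


def parse_io_address(text: str, family: str) -> Optional[IoAddress]:
    """Parse an I:/O: address; octal for PLC-5, decimal for other families.

    Returns None when the text does not match or a digit is invalid
    for the selected radix (for example an 8 in an octal field).
    """
    match = _IO_SHAPE.fullmatch(text)
    if match is None:
        return None
    letter, word_text, bit_text = match.groups()
    radix = 8 if family in OCTAL_IO_FAMILIES else 10
    try:
        word = int(word_text, radix)
        bit = int(bit_text, radix) if bit_text is not None else None
    except ValueError:
        return None
    return IoAddress(letter, word, bit)


if __name__ == "__main__":
    print(parse_io_address("I:001/17", "Plc5"))
    print(parse_io_address("I:001/17", "Slc500"))
```

The sketch deliberately does no bit-range check, because the test plan above expects `I:001/17` to parse as bit 17 under SLC500; whether the real parser should reject out-of-range bits is a separate decision.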
---
#### PR 2 — MicroLogix function-file letters (RTC / HSC / DLS / MMI / PTO / PWM / STI / EII / IOS / BHI) (#23)
**Scope**: MicroLogix 1100/1400 expose proprietary function files that don't share file letters with SLC. Today `IsKnownFileLetter` (`AbLegacyAddress.cs:97-101`) only allows the SLC/PLC-5 set, so any tag like `RTC:0.HR` is rejected at parse time even though libplctag's `micrologix` PlcType supports them.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyAddress.cs` — extend `IsKnownFileLetter` to recognise multi-letter function-file types (`RTC`, `HSC`, `DLS`, `MMI`, `PTO`, `PWM`, `STI`, `EII`, `IOS`, `BHI`). Permit only when family is `MicroLogix`. The letter-scan loop already accepts any contiguous letters (`AbLegacyAddress.cs:80-82`).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDataType.cs` — define a sub-element catalogue per function-file (RTC has YR/MON/DAY/HR/MIN/SEC/DOW; HSC has ACC/HIP/LOP/OFS/etc.). Map each sub-element to the right `DriverDataType`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/PlcFamilies/AbLegacyPlcFamilyProfile.cs` — `SupportsFunctionFiles` flag.
**Test plan**:
- Unit: `RTC:0.HR` parses with `FileLetter="RTC"`, `WordNumber=0`, `SubElement="HR"`. `HSC:0.ACC` parses. Same strings under PlcFamily=Slc500 must reject (ML1100 file types not present on SLC).
- Integration (`tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests`): only if a MicroLogix simulator profile exists; flag as TODO otherwise — verify libplctag `micrologix` PlcType accepts these tag names.
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-MicroLogix-FunctionFiles.md` — catalogue of supported function files (RTC/HSC/DLS/MMI/PTO/PWM/STI/EII/IOS/BHI), per-family availability matrix (ML1100 vs ML1400 vs ML1500), sub-element-to-DriverDataType table.
- Update `docs/Driver.AbLegacy.Cli.md` — add a "MicroLogix function files" row to the PCCC address primer with `RTC:0.HR` / `HSC:0.ACC` examples and a CLI worked example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — record fixture coverage status for function files and link to the `micrologix` profile gap (if `ab_server --plc=Micrologix` rejects function-file addresses, document the unit-only fallback).
- Fixture: extend `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` `micrologix` profile with `--tag=RTC0[1]` / `--tag=HSC0[1]` if accepted by `ab_server`, else mark as hardware-gated in `Docker/README.md`.
- E2E: add a parametric `-PlcType MicroLogix -Address RTC:0.HR` invocation to `scripts/e2e/test-ablegacy.ps1` (skip-when-fixture-gap, mirroring the existing `BadCommunicationError` gate); no `seed-ablegacy-smoke.sql` change unless the fixture supports function-file tags.
**Effort**: M
**Dependencies**: PR 1 (parser overload signature settled)
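As a rough illustration of the family gate described for PR 2, the following Python sketch accepts function-file letters only for the MicroLogix family. `parse_address`, `ParsedAddress`, and `STANDARD_FILES` are hypothetical; in particular the standard-letter set is an assumption assembled from letters mentioned elsewhere in this plan, not the real contents of `IsKnownFileLetter`.

```python
import re
from typing import NamedTuple, Optional

# Function-file letters listed in the PR 2 scope.
FUNCTION_FILES = frozenset(
    {"RTC", "HSC", "DLS", "MMI", "PTO", "PWM", "STI", "EII", "IOS", "BHI"}
)
# Assumption: an illustrative subset of the SLC/PLC-5 file letters.
STANDARD_FILES = frozenset({"N", "F", "B", "L", "ST", "T", "C", "R", "I", "O"})
# Hypothetical stand-in for the SupportsFunctionFiles profile flag.
FUNCTION_FILE_FAMILIES = frozenset({"MicroLogix"})


class ParsedAddress(NamedTuple):
    file_letter: str
    word_number: int
    sub_element: Optional[str]


_SHAPE = re.compile(r"([A-Za-z]+)([0-9]*):([0-9]+)(?:\.([A-Za-z]+))?")


def parse_address(text: str, family: str) -> Optional[ParsedAddress]:
    """Parse LETTERS[file]:word[.SUB]; gate function files on the PLC family."""
    match = _SHAPE.fullmatch(text)
    if match is None:
        return None
    letters, _file_number, word, sub = match.groups()
    letters = letters.upper()
    if letters in FUNCTION_FILES:
        if family not in FUNCTION_FILE_FAMILIES:
            return None  # function files are rejected for non-MicroLogix families
    elif letters not in STANDARD_FILES:
        return None
    return ParsedAddress(letters, int(word), sub.upper() if sub else None)
```

Keeping the gate next to the letter lookup means the existing letter-scan loop does not need to change; only the acceptance check becomes family-aware.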
---
#### PR 3 — Sub-element bit semantics (`.DN`, `.EN`, `.TT`, `.CU`, `.CD`, `.OV`, `.UN`, `.ER`) (#11)
**Scope**: Today `T4:0.DN` parses fine but the `TimerElement`/`CounterElement`/`ControlElement` types collapse to `Int32` (`AbLegacyDataType.cs:41-44`). HMIs expect `.DN` / `.EN` / `.TT` / `.CU` / `.CD` / `.OV` / `.UN` / `.ER` to surface as `Boolean`. The fix is to detect the sub-element at tag-runtime build time and override the driver-surface type.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDataType.cs` — new helper `SubElementBitNames` (HashSet of bit-typed sub-elements per parent type — Timer: EN/TT/DN; Counter: CU/CD/DN/OV/UN; Control: EN/EU/DN/EM/ER/UL/IN/FD). New `EffectiveDriverDataType(AbLegacyDataType, string? subElement)` returning `Boolean` for bit-typed sub-elements, otherwise the existing mapping.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `DiscoverAsync` uses `EffectiveDriverDataType(def.DataType, parsed.SubElement)`; `ReadAsync` decodes the parent word and masks the bit instead of returning the whole word as Int32.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/LibplctagLegacyTagRuntime.cs` — verify libplctag exposes `.DN` etc. as a single bit when read with `GetBit` against the sub-element address. If not, fall back to read-the-word + mask.
**Test plan**:
- Unit (`AbLegacyDriverTests` + new `AbLegacyDataTypeTests`): `T4:0.DN` discovers as Boolean; `T4:0.ACC` discovers as Int32; counter `.OV` is Boolean; control `.LEN` is Int32.
- Bit-write semantics: writing Boolean `true` to `T4:0.DN` should be rejected with `BadNotWritable` (timer status bits are PLC-set; verify by integration smoke test against the AbLegacy simulator).
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — extend the Timer/Counter/Control rows in the address primer with a "bit sub-elements surface as Boolean" note and a `--type Bool -a T4:0.DN` CLI example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — note `AbLegacyDataTypeTests` as a new unit-coverage class under "What it actually covers".
- Fixture: no compose change required (T4/C5/R6 already seeded by `ab_server` defaults — verify; if not, add `--tag=T4[5]`/`--tag=C5[5]`/`--tag=R6[5]` to the `slc500` profile in `Docker/docker-compose.yml`).
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a Boolean sub-element read assertion (`read --type Bool -a T4:0.DN`) once the simulator round-trip works. Update `scripts/smoke/seed-ablegacy-smoke.sql` to add a Boolean tag binding `T4:0.DN` so the server-bridge assertion exercises the new mapping.
**Effort**: M
**Dependencies**: none (independent of PR 1/2 parser changes)
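The type override proposed for PR 3 amounts to a small lookup. The Python sketch below mirrors the sub-element sets listed under **Files**; `effective_driver_data_type` is a hypothetical analogue of the planned `EffectiveDriverDataType` helper, and it covers only the three structure types that currently collapse to `Int32`.

```python
from typing import Optional

# Bit-typed sub-elements per parent type, copied from the PR 3 file list.
BIT_SUB_ELEMENTS = {
    "TimerElement": frozenset({"EN", "TT", "DN"}),
    "CounterElement": frozenset({"CU", "CD", "DN", "OV", "UN"}),
    "ControlElement": frozenset({"EN", "EU", "DN", "EM", "ER", "UL", "IN", "FD"}),
}


def effective_driver_data_type(data_type: str, sub_element: Optional[str]) -> str:
    """Return 'Boolean' for bit-typed sub-elements, else the existing Int32 mapping."""
    if data_type not in BIT_SUB_ELEMENTS:
        raise ValueError(f"sketch only covers structure types, got {data_type!r}")
    if sub_element is not None and sub_element.upper() in BIT_SUB_ELEMENTS[data_type]:
        return "Boolean"
    return "Int32"
```

The sets are keyed by parent type rather than flattened because the same mnemonic can be valid on one structure and not another (for example `.CU` exists on counters but not on timers).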
---
#### PR 4 — Indirect / indexed addressing parser (`N7:[N7:0]`, `N[N7:0]:5`) (#8)
**Scope**: Recipe / batch lookup tables use `N7:[N7:0]` (read N7 word indexed by the value at N7:0) or `N[N7:0]:5`. Today `AbLegacyAddress.TryParse` rejects both because it requires literal integer word and file numbers.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyAddress.cs` — record gains nullable `IndirectFileSource` and `IndirectWordSource` (each itself an `AbLegacyAddress`). Parser handles `[<inner>]` segments at file-number or word-number positions. Recursion depth capped at 1 (libplctag accepts only one level of indirection per address — verify against libplctag PCCC docs).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDataType.cs` — no change.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — pass-through; `ToLibplctagName()` re-emits the bracket form.
**Test plan**:
- Unit: `N7:[N7:0]` → outer file=N7, indirect word source = (N, 7, 0); `B3:[N7:0]/0` → bit, indirect word source = (N, 7, 0); `N[N7:0]:5` → indirect file source = (N, 7, 0), word=5; depth-2 (`N[N[N7:0]:5]:0`) must reject.
- Integration: verify libplctag's `slc500`/`plc5` PlcType accepts a `Name` of form `N7:[N7:0]` and resolves at read time. (If libplctag rejects indirect text, fall back to two-step read: resolve the inner address, then read the outer with the resolved index. Document the chosen strategy in the PR.)
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-Indirect-Addressing.md` — explain `N7:[N7:0]` and `N[N7:0]:5` syntax, the depth-1 limit, the chosen libplctag strategy (verbatim pass-through vs two-step resolve), and recipe-table use cases.
- Update `docs/Driver.AbLegacy.Cli.md` — add an indirect-addressing row to the address primer with `--address "N7:[N7:0]"` example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — under unit coverage, list `AbLegacyAddressTests` indirect-parsing cases.
- Fixture: no `Docker/docker-compose.yml` change required (`N7[10]` already seeded; the inner index tag at `N7:0` is already addressable). Document recipe-pattern in `Docker/README.md`.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with an indirect-address driver-loopback case (write to `N7:0` to set the index, then read `N7:[N7:0]` and assert the value matches the previously-written content of the resolved word). Skip-gate behind libplctag capability check.
**Effort**: M
**Dependencies**: PR 1 (octal resolution must apply to inner address too if the outer file is `I:`/`O:` on PLC-5)
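One way to picture the depth-1 indirect parser from PR 4 is a recursive descent that forbids brackets inside brackets. The Python sketch below is an assumption-laden illustration (`Address`, `parse`, and the helper names are hypothetical); it handles file number, word number, and an optional bit index, and ignores sub-elements and octal handling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DIGITS = "0123456789"


@dataclass(frozen=True)
class Address:
    file_letter: str
    file_number: Optional[int]
    word: Optional[int]
    bit: Optional[int] = None
    indirect_file: Optional["Address"] = None
    indirect_word: Optional["Address"] = None


def _take_bracketed(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (inner text, next position) for a balanced [...] starting at pos."""
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                return text[pos + 1:i], i + 1
    return None


def _take_number_or_indirect(text: str, pos: int, allow_indirect: bool):
    """Return (literal, indirect address, next position) or None on failure."""
    if pos < len(text) and text[pos] == "[":
        taken = _take_bracketed(text, pos) if allow_indirect else None
        if taken is None:
            return None
        inner = parse(taken[0], allow_indirect=False)  # depth cap of 1
        return (None, inner, taken[1]) if inner is not None else None
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return (int(text[pos:end]), None, end) if end > pos else None


def parse(text: str, allow_indirect: bool = True) -> Optional[Address]:
    pos = 0
    while pos < len(text) and text[pos].isalpha():
        pos += 1
    if pos == 0:
        return None
    letter = text[:pos].upper()
    file_part = _take_number_or_indirect(text, pos, allow_indirect)
    if file_part is None:
        return None
    file_number, indirect_file, pos = file_part
    if pos >= len(text) or text[pos] != ":":
        return None
    word_part = _take_number_or_indirect(text, pos + 1, allow_indirect)
    if word_part is None:
        return None
    word, indirect_word, pos = word_part
    bit = None
    if pos < len(text):
        rest = text[pos + 1:]
        if text[pos] != "/" or not rest or any(c not in DIGITS for c in rest):
            return None
        bit = int(rest)
    return Address(letter, file_number, word, bit, indirect_file, indirect_word)
```

Passing `allow_indirect=False` into the inner parse is what enforces the depth cap: a second level of brackets fails immediately instead of recursing.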
---
### Phase 2 — File / type coverage
#### PR 5 — PD / MG / PLS / BT structure files (#5)
**Scope**: Add PD (PID), MG (Message), PLS (Programmable Limit Switch), BT (Block Transfer) file types to the parser and the data-type catalogue. PD has SP/PV/CV/Error/Bias plus 25+ sub-elements; MG has Error/Length/Position/etc.; PLS has LEN/POS; BT is similar to MG.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyAddress.cs` — extend `IsKnownFileLetter` with `PD`, `MG`, `PLS`, `BT`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDataType.cs` — new enum members `PidElement`, `MessageElement`, `PlsElement`, `BlockTransferElement`. Sub-element catalogue per type — many PD sub-elements are Float32 (`SP`, `PV`, `CV`, `KP`, `KI`, `KD`), some are Boolean (`EN`, `DN`, `MO`, `PE`), some Int16 (`SPS`, `MAXS`, `MINS`).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/LibplctagLegacyTagRuntime.cs` — verify libplctag PCCC supports addressing PD/MG/PLS/BT sub-elements by name; if not, the driver reads the parent struct as a byte block and offsets internally (libplctag docs to consult).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/PlcFamilies/AbLegacyPlcFamilyProfile.cs` — `SupportsPidFile` etc. flags (PLC-5 supports PD/BT; SLC supports PD; ML1100/1400 generally do not — verify per family docs).
**Test plan**:
- Unit: `PD9:0.SP` → Float32; `PD9:0.EN` → Boolean; `MG10:0.LEN` → Int32; reject `PD9:0` (no sub-element on a struct file).
- Integration: smoke test against a simulator with PD file configured (verify pylogix/pycomm3 sim supports PD, otherwise mark as TODO and lean on unit coverage).
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-Structure-Files.md` — sub-element catalogues for PD / MG / PLS / BT, per-family availability matrix (PLC-5 vs SLC vs ML), DriverDataType per sub-element.
- Update `docs/Driver.AbLegacy.Cli.md` — add PD / MG / PLS / BT rows to the file-letter primer with `--type PidElement` etc. examples.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list new structure-file file letters under unit coverage and note any fixture limitations (pd/mg likely not supported by `ab_server`).
- Fixture: extend `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` `slc500` and `plc5` profiles with `--tag=PD9[2]` / `--tag=MG10[2]` if `ab_server` accepts; otherwise document gap in `Docker/README.md` and rely on unit coverage.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a `read --type Float -a PD9:0.SP` assertion when fixture exposes the file; add a corresponding tag row to `scripts/smoke/seed-ablegacy-smoke.sql` (skip-gated).
**Effort**: M
**Dependencies**: PR 3 (sub-element bit semantics machinery must exist first — PD `.EN` is Boolean by the same mechanism as Timer `.EN`)
---
#### PR 6 — ST string read/write production verification (#10)
**Scope**: ST is enum-listed and `LibplctagLegacyTagRuntime.DecodeValue` calls `_tag.GetString(0)`, but there is no integration coverage proving that ST round-trips through libplctag's length-word format (a length word followed by up to 82 characters). This PR is verification plus any fixes it uncovers.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/LibplctagLegacyTagRuntime.cs` — likely no source change if libplctag's `GetString`/`SetString` already handles the length-word convention; if not, add `GetByteArrayBuffer` + manual length-word decode.
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/AbLegacyReadSmokeTests.cs` — add `ST_RoundTrip_*` tests against the simulator: write 82-char string, write 0-char, write 41-char, write embedded null/non-ASCII; round-trip each through ReadAsync.
- New `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/AbLegacyStringEncodingTests.cs` — unit-level decode of a known length-word + payload byte buffer (mock `IAbLegacyTagRuntime` returning fixed bytes).
**Test plan**:
- Integration: 4 round-trip cases above; covers PlcFamily=Slc500 and PlcFamily=Plc5 (libplctag may handle the length word differently between the two PCCC layers — verify).
- Quality: unit test that `BadOutOfRange` surfaces when caller writes a 100-char string to an 82-byte ST.
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — expand the `ST` row in the address primer with the 82-byte limit, length-word convention, and a `write --type String --value "Hello"` worked example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list the new `AbLegacyStringEncodingTests` unit class and the four `ST_RoundTrip_*` integration cases under coverage.
- Fixture: extend `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` `slc500` and `plc5` profiles with `--tag=ST20[5]` so the round-trip tests have a real address to write against; document any `ab_server` ST gaps in `Docker/README.md`.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a String round-trip case (`-a "ST20:0" --type String`) and a `String` tag row in `scripts/smoke/seed-ablegacy-smoke.sql` so the bridge assertion exercises ST.
**Effort**: S (mostly tests; small encoding fix if any)
**Dependencies**: none
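For the unit-level decode test in PR 6, a fixed byte buffer with a length word and payload is enough. The Python sketch below shows one plausible layout. It assumes a little-endian 16-bit length followed by an 82-byte payload with no byte-pair swapping; the real PCCC wire layout and libplctag's handling must be verified and may differ. `encode_st` and `decode_st` are hypothetical helper names.

```python
import struct

ST_MAX_CHARS = 82  # maximum payload length stated in the plan


def encode_st(text: str) -> bytes:
    """Build length word + zero-padded payload; reject oversize or non-ASCII input."""
    payload = text.encode("ascii")  # raises UnicodeEncodeError (a ValueError)
    if len(payload) > ST_MAX_CHARS:
        raise ValueError(f"string of {len(payload)} chars exceeds {ST_MAX_CHARS}")
    # Assumption: little-endian length word, no byte swapping of the payload.
    return struct.pack("<H", len(payload)) + payload.ljust(ST_MAX_CHARS, b"\x00")


def decode_st(buffer: bytes) -> str:
    """Read the length word, then exactly that many payload bytes."""
    if len(buffer) < 2:
        raise ValueError("buffer too short for a length word")
    (length,) = struct.unpack_from("<H", buffer, 0)
    if length > ST_MAX_CHARS or 2 + length > len(buffer):
        raise ValueError(f"invalid ST length word: {length}")
    return buffer[2:2 + length].decode("ascii", errors="replace")
```

Trusting the length word rather than scanning for a terminator is what lets an embedded null survive the round trip, which is one of the four cases the test plan calls for.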
---
### Phase 3 — Performance
#### PR 7 — Array contiguous block addressing (`N7:0,10` or `N7:0[10]`) (#9)
**Scope**: One PCCC frame can pull up to ~120 words. Today every tag is a separate libplctag instance and a separate request. The fix exposes array tags as a single tag with `IsArray=true` + `ArrayDim`, backed by a libplctag tag with `elem_count=N`.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyAddress.cs` — record gains `ArrayCount` (nullable). Parser accepts `,N` suffix (Rockwell convention) and `[N]` suffix (libplctag convention) on the word number. Reject combination with sub-element or bit index.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverOptions.cs` — `AbLegacyTagDefinition` gains optional `ArrayLength` (overrides parsed value; convenient when address is parameterised).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/IAbLegacyTagRuntime.cs` — `AbLegacyTagCreateParams` gains `ElementCount` (default 1).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/LibplctagLegacyTagRuntime.cs` — pass `ElementCount` to libplctag `Tag.ElementCount` (verify libplctag supports element counts on PCCC PlcTypes — it does for ab_eip CIP tags but PCCC may behave differently).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `DiscoverAsync` emits `IsArray=true`, `ArrayDim=[N]`; `ReadAsync` decodes via per-index `_tag.GetInt16(i*2)` etc.
**Test plan**:
- Unit: `N7:0,10` parses ArrayCount=10; `N7:0[10]` same; `N7:0,10/3` rejects (array+bit); `T4:0,5.ACC` rejects (array+sub-element).
- Integration: read `N7:0,10` returns 10 elements in one frame; latency measurement vs 10 individual tags should be ≥ 5x faster (target).
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — add an "Array reads" section explaining `N7:0,10` vs `N7:0[10]` syntax and the per-PCCC-frame ~120-word ceiling, plus a `read --array-length 10 -a N7:0,10` CLI example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list array-block reads under unit coverage and note the latency benchmark integration test as a new perf-flagged case.
- Fixture: confirm `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` `--tag=N7[10]` / `--tag=F8[10]` already provide enough contiguous words; otherwise bump array sizes (`N7[120]` to allow max-frame tests).
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a `read -a "N7:0,10"` array assertion (parse comma-separated CLI output); add a matching `IsArray=1` tag row in `scripts/smoke/seed-ablegacy-smoke.sql` to exercise the address-space side.
**Effort**: M
**Dependencies**: PR 1 (octal applies to array index when the file is I/O on PLC-5)
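The two array suffixes in PR 7 can be recognised with a single anchored pattern. The Python sketch below is illustrative only (`parse_array_suffix` is a hypothetical name); it accepts the `,N` and `[N]` forms on a plain word address and rejects any combination with a bit index or sub-element, as the test plan requires.

```python
import re
from typing import Optional, Tuple

_ARRAY_SHAPE = re.compile(
    r"(?P<base>[A-Za-z]+[0-9]+:[0-9]+)"
    r"(?:,(?P<comma>[0-9]+)|\[(?P<bracket>[0-9]+)\])"
)


def parse_array_suffix(text: str) -> Optional[Tuple[str, int]]:
    """Return (base address, element count) or None if not a valid array address."""
    match = _ARRAY_SHAPE.fullmatch(text)
    if match is None:
        return None  # no suffix, or suffix combined with /bit or .SUB
    count = int(match.group("comma") or match.group("bracket"))
    if count < 1:
        return None
    return match.group("base"), count
```

Because the pattern must match the whole string, trailing `/3` or `.ACC` text fails the match without any extra rejection logic. The sketch does not enforce the roughly 120-word frame ceiling, since the plan leaves that limit approximate.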
---
#### PR 8 — Per-tag deadband / change filter (#18)
**Scope**: Today `PollGroupEngine` publishes every poll. Add absolute and percent deadband per tag — only emit `OnDataChange` when the new value differs by ≥ deadband.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverOptions.cs` — `AbLegacyTagDefinition` gains `AbsoluteDeadband` (double?), `PercentDeadband` (double?).
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — wrap the `PollGroupEngine` callback with a per-tag last-published-value cache and the deadband test. Booleans bypass deadband (always change-on-edge). Strings + status changes always publish.
- Verify: `PollGroupEngine` (in `Core.Drivers`) doesn't already centralise this — if it does, this PR threads the per-tag config through the engine instead of layering on top.
**Test plan**:
- Unit (new `AbLegacyDeadbandTests`): tag with `AbsoluteDeadband=1.0` reading `[10.0, 10.5, 11.5, 11.6]` publishes only `10.0` and `11.5`. Boolean tag publishes every transition. Status code change always publishes.
- Quality: ensure last-value cache doesn't leak across `ReinitializeAsync`.
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — add a "Deadband" subsection under subscribe with `--deadband-absolute` / `--deadband-percent` CLI flags and example.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list `AbLegacyDeadbandTests` under unit coverage.
- Fixture: no compose change required (per-tag deadband is a config-side concern, not a server simulator one).
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a deadband subscribe assertion (subscribe with `--deadband-absolute 5`, write three small deltas, assert only one notification fires); add a tag row to `scripts/smoke/seed-ablegacy-smoke.sql` with `AbsoluteDeadband=5` to exercise the seed-from-config path.
**Effort**: S
**Dependencies**: none
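The absolute-deadband rule from PR 8 compares each new value against the last *published* value, not the last polled one. The Python sketch below illustrates that with a hypothetical `DeadbandFilter` class. Percent deadband is left out because the plan does not yet define the range it is a percentage of, and string/status handling is omitted for brevity.

```python
from typing import Optional, Union

Value = Union[bool, int, float]


class DeadbandFilter:
    """Per-tag change filter: numeric values use an absolute deadband,
    booleans publish on every edge."""

    def __init__(self, absolute_deadband: float = 0.0) -> None:
        self._deadband = absolute_deadband
        self._last_published: Optional[Value] = None

    def should_publish(self, value: Value) -> bool:
        last = self._last_published
        if last is None:
            publish = True  # first sample always publishes
        elif isinstance(value, bool) or isinstance(last, bool):
            publish = value != last  # booleans bypass the deadband
        else:
            publish = abs(value - last) >= self._deadband
        if publish:
            self._last_published = value
        return publish

    def reset(self) -> None:
        """Drop the cache, e.g. when the driver is reinitialised."""
        self._last_published = None


if __name__ == "__main__":
    tag = DeadbandFilter(absolute_deadband=1.0)
    print([v for v in [10.0, 10.5, 11.5, 11.6] if tag.should_publish(v)])
```

Comparing against the last published value prevents slow drift from being suppressed forever: in the test-plan sequence, `11.5` publishes because it is 1.5 away from `10.0`, even though it is only 1.0 away from the unpublished `10.5`.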
---
### Phase 4 — Workflow
#### PR 9 — Per-device timeout / retry overrides (#21)
**Scope**: Replace single driver-wide `Timeout` with per-device override (SLC 5/01 needs ~5 s, SLC 5/05 fine at 2 s, ML1100 sometimes 3 s). Optional retry count per device.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverOptions.cs` — `AbLegacyDeviceOptions` gains optional `Timeout`, `Retries`. `AbLegacyDriverOptions.Timeout` becomes the driver-wide default.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `EnsureTagRuntimeAsync` and `ProbeLoopAsync` use `device.Options.Timeout ?? _options.Timeout`. `ReadAsync` retry loop honours `device.Options.Retries`.
**Test plan**:
- Unit: device with `Timeout=TimeSpan.FromSeconds(5)` propagates into `AbLegacyTagCreateParams.Timeout`; absent override falls back to driver-wide.
- Integration: simulate a slow device (1 s artificial delay) — driver-wide 2 s passes; reducing per-device to 500 ms surfaces `BadCommunicationError` on the slow device while the fast device keeps reading.
**Docs / fixture / e2e**:
- Update `docs/Driver.AbLegacy.Cli.md` — document per-device `--timeout-ms` / `--retries` precedence vs driver-wide defaults; add a tuning cheat-sheet for SLC 5/01 vs 5/05 vs ML1100.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — note per-device options under the AbLegacyDeviceOptions surface.
- Fixture: no compose change. Add a slow-device test harness using a `tc qdisc add dev eth0 root netem delay 1000ms` sidecar (needs the `NET_ADMIN` capability in the container; stock iptables has no delay target, so netem is the practical shim) — document in `Docker/README.md` as an optional perf-tuning fixture.
- E2E: no `test-ablegacy.ps1` change needed (per-device timeout is integration-test territory). Add a `Timeout=PT0.5S` device-level row (500 ms as an ISO 8601 duration) to `scripts/smoke/seed-ablegacy-smoke.sql` so the seed path exercises the new column.
**Effort**: S
**Dependencies**: none
---
#### PR 10 — Diagnostic counters as tags (#20)
**Scope**: Per-device diagnostic counters (request count, response count, retry count, last-error code, comm-failures) surface as auto-generated tags under `AbLegacy/<host>/_Diagnostics/*` so HMIs can bind directly. Mirrors what other drivers expose.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `DeviceState` gains `Counters` (record of int64s). `ReadAsync`, `WriteAsync`, `ProbeLoopAsync` increment counters on success/failure paths. `DiscoverAsync` emits a `_Diagnostics` folder per device with seven Variables: `RequestCount`, `ResponseCount`, `ErrorCount`, `RetryCount`, `LastErrorCode`, `LastErrorMessage`, `CommFailures`.
- New `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDiagnosticTags.cs` — generates the 7 well-known tag names; reading them returns counter snapshots from `DeviceState.Counters`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `ReadAsync` short-circuits diagnostic tag references before dispatching to libplctag.
**Test plan**:
- Unit (new `AbLegacyDiagnosticsTests`): force 5 reads (3 success, 2 fail) → `RequestCount=5`, `ErrorCount=2`. `LastErrorCode` reflects the last libplctag status. Counters reset on `ReinitializeAsync`.
- Quality: verify the 7 well-known names don't collide with user-config tag names (reject overlap at `InitializeAsync`).
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-Diagnostics.md` — the seven well-known counter tag names, their semantics, namespace convention (`_Diagnostics` folder per device), reset behaviour on `ReinitializeAsync`, and HMI binding examples.
- Update `docs/Driver.AbLegacy.Cli.md` — note that diagnostic tags surface alongside user-config tags and can be `read --address _Diagnostics/RequestCount` (or whatever the canonical CLI shape ends up being).
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list `AbLegacyDiagnosticsTests` and call out the collision-rejection contract.
- Fixture: no compose change.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a "after N reads, RequestCount==N" assertion against the diagnostic NodeId published by the OPC UA server-bridge step; add a `_Diagnostics/RequestCount` Tag row to `scripts/smoke/seed-ablegacy-smoke.sql` if the addr-space team requires explicit registration.
**Effort**: M
**Dependencies**: none
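The counter bookkeeping and the collision check for PR 10 can be sketched as below. This Python illustration uses hypothetical names (`DeviceCounters`, `find_collisions`) and assumes that `ResponseCount` counts successful replies only and that reserved names are compared with their `_Diagnostics/` prefix; neither detail is settled by the plan. The error code in the usage example is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Union

DIAGNOSTIC_TAG_NAMES = (
    "RequestCount", "ResponseCount", "ErrorCount", "RetryCount",
    "LastErrorCode", "LastErrorMessage", "CommFailures",
)


@dataclass
class DeviceCounters:
    request_count: int = 0
    response_count: int = 0
    error_count: int = 0
    retry_count: int = 0
    last_error_code: int = 0
    last_error_message: str = ""
    comm_failures: int = 0

    def record_success(self) -> None:
        self.request_count += 1
        self.response_count += 1

    def record_failure(self, code: int, message: str) -> None:
        self.request_count += 1
        self.error_count += 1
        self.last_error_code = code
        self.last_error_message = message

    def snapshot(self) -> Dict[str, Union[int, str]]:
        """Values keyed by the seven well-known tag names."""
        return {
            "RequestCount": self.request_count,
            "ResponseCount": self.response_count,
            "ErrorCount": self.error_count,
            "RetryCount": self.retry_count,
            "LastErrorCode": self.last_error_code,
            "LastErrorMessage": self.last_error_message,
            "CommFailures": self.comm_failures,
        }


def find_collisions(user_tag_names: Iterable[str]) -> Set[str]:
    """Return user tag names that clash with the reserved diagnostic names."""
    reserved = {f"_Diagnostics/{name}" for name in DIAGNOSTIC_TAG_NAMES}
    return set(user_tag_names) & reserved


if __name__ == "__main__":
    counters = DeviceCounters()
    for ok in (True, True, False, True, False):
        if ok:
            counters.record_success()
        else:
            counters.record_failure(-1, "illustrative failure")
    print(counters.snapshot())
```

Returning a snapshot dictionary keeps diagnostic reads independent of the live counters, which matches the plan's "reading them returns counter snapshots" wording.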
---
#### PR 11 — RSLogix 500 / PLC-5 symbol & data-table import (#15)
**Scope**: Import RSLogix exports (`.RSS` Slc500, `.RSP` Plc5, `.SLC` text export) to seed `AbLegacyTagDefinition` entries. The binary `.RSS`/`.RSP` formats are proprietary and largely undocumented; the practical strategy is to support the `.SLC` / `.CSV` text exports that RSLogix can produce ("save as text" / "Database Export"). Verify whether libplctag or a sister project ships an `.RSS` parser — if not, scope to text exports only and document the binary case as a future enhancement.
**Files**:
- New `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/Import/RsLogixSymbolImport.cs` — parses RSLogix text export (CSV: `Symbol,Address,Description,DataType,Scope`).
- New `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/Import/IRsLogixImporter.cs` — abstraction for future binary support.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverFactoryExtensions.cs` — extension method `AddRsLogixImport(string path, string deviceHostAddress)` materialises `AbLegacyTagDefinition` entries from the file at startup-time.
- New CLI command in `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Cli/` (mirrors AbCip CLI patterns; confirm the AbLegacy CLI project layout first): `import-rslogix --file foo.csv --device ab://... --emit appsettings-fragment`.
**Test plan**:
- Unit (new `RsLogixSymbolImportTests`): canonical CSV with one of each file letter (N/F/B/L/ST/T/C/R) generates 8 `AbLegacyTagDefinition` entries with correct `DataType`. Malformed rows skipped with logged warning. Comments and header rows skipped.
- Integration: an end-to-end test with a recorded RSLogix CSV (committed under `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/`) produces an addr-space matching a golden snapshot.
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-RSLogix-Import.md` — supported export formats (CSV / .SLC text), CSV column convention, scope handling, the `import-rslogix` CLI subcommand, and the explicit non-goal of binary `.RSS`/`.RSP` parsing for v1.
- Update `docs/Driver.AbLegacy.Cli.md` — add an `import-rslogix` subcommand row to the commands table with `--file foo.csv --device ab://... --emit appsettings-fragment` example.
- Update `docs/DriverClis.md` if it carries a per-CLI command matrix.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list `RsLogixSymbolImportTests`, the new `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/` golden CSV, and the import-then-read integration scenario.
- Fixture: new committed CSV under `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/rslogix-canonical.csv` plus the corresponding golden snapshot. No `Docker/docker-compose.yml` change.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with an `import-rslogix` invocation that emits an appsettings fragment, then asserts the resulting tag count matches the CSV row count. No `seed-ablegacy-smoke.sql` change (importer is offline tooling).
**Effort**: L (parser + CLI + golden-snapshot fixture)
**Dependencies**: PRs 1–5 complete (importer must produce addresses the parser accepts)
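The text-export path in PR 11 is essentially a CSV-to-tag-definition mapping. The Python sketch below illustrates it under several assumptions: the column order is the `Symbol,Address,Description,DataType,Scope` convention from the file list, comment rows start with `#`, and the data-type names for `N`, `B`, and `L` are placeholders rather than confirmed `AbLegacyDataType` members. `import_symbols` is a hypothetical function name.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

# Assumption: illustrative file-letter to data-type mapping.
FILE_LETTER_TO_TYPE = {
    "N": "Int", "F": "Float", "B": "Bit", "L": "Long", "ST": "String",
    "T": "TimerElement", "C": "CounterElement", "R": "ControlElement",
}

_FILE_LETTER = re.compile(r"([A-Za-z]+)[0-9]+:")


def import_symbols(csv_text: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Return (tag definitions, warnings) from an RSLogix-style CSV export."""
    tags: List[Dict[str, str]] = []
    warnings: List[str] = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row or row[0].lstrip().startswith("#"):
            continue  # blank line or comment (comment syntax is an assumption)
        if row[0].strip().lower() == "symbol":
            continue  # header row
        symbol = row[0].strip()
        address = row[1].strip() if len(row) > 1 else ""
        match = _FILE_LETTER.match(address)
        letter = match.group(1).upper() if match else None
        if not symbol or letter not in FILE_LETTER_TO_TYPE:
            warnings.append(f"line {line_no}: skipped malformed row {row!r}")
            continue
        tags.append({
            "Name": symbol,
            "Address": address,
            "DataType": FILE_LETTER_TO_TYPE[letter],
        })
    return tags, warnings
```

Collecting warnings instead of raising matches the test plan's "malformed rows skipped with logged warning" behaviour and lets the caller decide how to log them. A fuller importer would run each address through the real parser instead of the simple file-letter pattern used here.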
---
### Phase 5 — Resilience
#### PR 12 — Auto-demote on comm failure (#13)
**Scope**: When a device fails N consecutive reads/probes, mark it Demoted and skip its tags for `DemoteFor` seconds — so one slow PLC doesn't starve fast PLCs sharing the same driver/poll cadence.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriverOptions.cs` — new `AbLegacyDemoteOptions { FailureThreshold=3, DemoteFor=TimeSpan.FromSeconds(30), Enabled=true }` on `AbLegacyDeviceOptions`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyDriver.cs` — `DeviceState` gains `ConsecutiveFailures`, `DemotedUntilUtc`. `ReadAsync` short-circuits demoted devices with `BadCommunicationError` until `DemotedUntilUtc`. `ProbeLoopAsync` clears demote on first success. New `HostState.Demoted` enum value (verify `HostState` is in `Core.Abstractions` and adding a member is non-breaking).
- Diagnostic tags from PR 10 gain `DemoteCount` and `LastDemotedUtc`.
**Test plan**:
- Unit (new `AbLegacyAutoDemoteTests`): force 3 consecutive failures → device transitions to `Demoted`; reads while demoted return `BadCommunicationError` without invoking libplctag (verify via test fake counting `ReadAsync` calls). After `DemoteFor` expires, the next read attempt goes through.
- Integration: two devices on the same driver, one with a fault — fault doesn't slow down the healthy one.
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-AutoDemote.md` (or a section appended to `AbLegacy-Diagnostics.md` from PR 10) — failure-threshold + demote-window semantics, interaction with the probe loop, the `HostState.Demoted` enum value, recovery path.
- Update `docs/Driver.AbLegacy.Cli.md` — add `--demote-failure-threshold` / `--demote-for` per-device flags and document how `probe` reflects the Demoted state.
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — list `AbLegacyAutoDemoteTests` and the two-device fault-isolation integration case.
- Fixture: extend `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml` with a second `slc500-faulty` service that listens on `:44819` but rejects every read (or doesn't bind, simulating ECONNREFUSED). The driver test then targets both `:44818` (healthy) and `:44819` (faulty) to exercise demotion.
- E2E: extend `scripts/e2e/test-ablegacy.ps1` with a "kill simulator, observe demotion in `_Diagnostics/DemoteCount`" assertion (gated on PR 10's diagnostic tags being present). Add a `DemoteFor=PT30S` device row to `scripts/smoke/seed-ablegacy-smoke.sql`.
**Effort**: M
**Dependencies**: PR 10 (diagnostic counters)
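The demote/recover cycle in PR 12 is a small state machine. The Python sketch below models it with an injected clock so it can be tested deterministically. `DemoteTracker` is a hypothetical name, the defaults mirror `FailureThreshold=3` and a 30-second `DemoteFor`, and the choice to re-demote on the first failure after the window expires is an assumption the plan does not spell out.

```python
from typing import Optional


class DemoteTracker:
    """Tracks consecutive failures for one device and its demotion window."""

    def __init__(self, failure_threshold: int = 3,
                 demote_for_seconds: float = 30.0) -> None:
        self._threshold = failure_threshold
        self._demote_for = demote_for_seconds
        self.consecutive_failures = 0
        self.demoted_until: Optional[float] = None
        self.demote_count = 0

    def is_demoted(self, now: float) -> bool:
        """True while reads should short-circuit without touching the PLC."""
        return self.demoted_until is not None and now < self.demoted_until

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self._threshold and not self.is_demoted(now):
            self.demoted_until = now + self._demote_for
            self.demote_count += 1

    def record_success(self) -> None:
        """A successful read or probe clears the demotion immediately."""
        self.consecutive_failures = 0
        self.demoted_until = None
```

A driver loop would call `is_demoted(now)` before dispatching a read and return `BadCommunicationError` directly when it is true, so a faulty device costs no timeout budget while demoted.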
---
#### PR 13 — DH+ via 1756-DHRIO bridging (#2)
**Scope**: Allow addressing a PLC-5 sitting on a DH+ link reached through a ControlLogix chassis with a 1756-DHRIO module. The CIP path syntax is `1,<slot>,2,<dh+_station_octal>` — already accepted as a string by `AbLegacyHostAddress`, but we should validate and document it, and verify libplctag's `plc5` PlcType resolves DH+ stations correctly through the DHRIO port.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/AbLegacyHostAddress.cs` — add validation for the DH+ path form `1,<slot>,2,<station>` where station is 0..77 octal. Surface the parsed components (`BackplaneSlot`, `DhPlusPort`, `DhPlusStation`) for diagnostics.
- `src/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy/PlcFamilies/AbLegacyPlcFamilyProfile.cs` — note that DH+ bridging is a `Plc5`-only path (DHRIO doesn't bridge to SLC/ML).
- `docs/Driver.AbLegacy.Cli.md` — add a worked example of DHRIO routing.
**Test plan**:
- Unit (`AbLegacyHostAndStatusTests`): `ab://10.0.0.1/1,3,2,07` parses with slot=3, station=7₈=7. `ab://10.0.0.1/1,3,2,77` parses station=77₈=63. `ab://10.0.0.1/1,3,2,80` rejects (octal range).
- Integration: requires a real DHRIO + PLC-5 — flag as hardware-gated; cover with unit-only for now and document the manual smoke procedure (`docs/Driver.AbLegacy.Cli.md`).
**Docs / fixture / e2e**:
- New doc `docs/drivers/AbLegacy-DH-Bridging.md` — the `1,<slot>,2,<station_octal>` CIP path syntax, DHRIO module wiring overview, octal-station-number reference (00..77 octal = 0..63), restriction to PLC-5 family, and the manual smoke procedure since DHRIO can't be simulated.
- Update `docs/Driver.AbLegacy.Cli.md` — extend the family/cip-path cheat sheet with a "PLC-5 via DHRIO" row showing `ab://logix-host/1,3,2,07` and a worked CLI example. (Plan already calls this out at line 279 — keep it, but link to the new dedicated doc.)
- Update `docs/drivers/AbLegacy-Test-Fixture.md` — note that DH+ bridging is unit-only (no fixture support possible) and reference the manual hardware smoke procedure.
- Fixture: no `Docker/docker-compose.yml` change is feasible (DHRIO is hardware-only).
- E2E: no new automated `test-ablegacy.ps1` case (would require real DHRIO). Add a `-DhPlusStation 7` parameter form documented in the script comment header for hardware-gated runs only. No `seed-ablegacy-smoke.sql` change.
**Effort**: S
**Dependencies**: PR 1 (octal parsing utility) — share the octal-int helper between PR 1 and PR 13.
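The DH+ path validation from PR 13 reduces to one pattern plus an octal conversion. The Python sketch below is illustrative (`parse_dh_plus_path` is a hypothetical name) and checks only the CIP path portion after the host, that is, the `1,<slot>,2,<station>` text. The returned keys echo the component names proposed for `AbLegacyHostAddress`.

```python
import re
from typing import Dict, Optional

# Backplane port 1, slot, DHRIO port 2, then a one- or two-digit octal station.
_DH_PLUS_PATH = re.compile(r"1,([0-9]+),2,([0-7]{1,2})")


def parse_dh_plus_path(cip_path: str) -> Optional[Dict[str, int]]:
    """Return the parsed components, or None if the path is not a valid DH+ route."""
    match = _DH_PLUS_PATH.fullmatch(cip_path)
    if match is None:
        return None  # wrong shape, or a station digit outside 0..7
    return {
        "BackplaneSlot": int(match.group(1)),
        "DhPlusPort": 2,
        "DhPlusStation": int(match.group(2), 8),  # octal text to decimal value
    }
```

Restricting the station digits to `[0-7]` in the pattern rejects `80` before any numeric conversion, which is the same octal-digit check PR 1 needs and is a natural candidate for the shared helper mentioned above.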
---
## Documentation, fixture, and e2e impact
Consolidated view of every doc, fixture, and e2e/smoke artefact this plan touches, so reviewers and PR authors can size the non-code surface area at a glance.
### New docs (created by this plan)
| Doc | Created by | Purpose |
|-----|-----------|---------|
| `docs/drivers/AbLegacy-MicroLogix-FunctionFiles.md` | PR 2 | Function-file catalogue (RTC/HSC/DLS/MMI/PTO/PWM/STI/EII/IOS/BHI), per-family availability, sub-element types |
| `docs/drivers/AbLegacy-Indirect-Addressing.md` | PR 4 | `N7:[N7:0]` and `N[N7:0]:5` syntax, depth-1 limit, libplctag strategy |
| `docs/drivers/AbLegacy-Structure-Files.md` | PR 5 | PD / MG / PLS / BT sub-element catalogues + per-family availability matrix |
| `docs/drivers/AbLegacy-Diagnostics.md` | PR 10 | Seven well-known counter tag names, namespace convention, reset semantics |
| `docs/drivers/AbLegacy-RSLogix-Import.md` | PR 11 | CSV / `.SLC` text-export schema, `import-rslogix` CLI, binary-format non-goals |
| `docs/drivers/AbLegacy-AutoDemote.md` (or PR 10 doc extension) | PR 12 | Demote thresholds, recovery, `HostState.Demoted` semantics |
| `docs/drivers/AbLegacy-DH-Bridging.md` | PR 13 | `1,<slot>,2,<station_octal>` CIP path, DHRIO wiring, manual smoke procedure |
### Updated docs (extended by this plan)
- `docs/Driver.AbLegacy.Cli.md` — extended by **every** PR (octal I/O, function files, sub-element bits, indirect, structure files, ST round-trip, array reads, deadband flags, per-device timeouts, diagnostic tags, RSLogix import subcommand, demote flags, DHRIO cheat-sheet row).
- `docs/drivers/AbLegacy-Test-Fixture.md` — extended by **every** PR with new unit test classes, integration cases, and fixture limitations.
- `docs/DriverClis.md` — touched by PR 11 (new `import-rslogix` subcommand row).
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/README.md` — touched by PRs 1, 2, 4, 5, 9, 12 (fixture limitations, optional perf-tuning sidecars, faulty-device service, recipe-pattern note).
### Fixture / scaffolding work
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.IntegrationTests/Docker/docker-compose.yml`:
  - PR 1: extend `plc5` profile with `I:001`-style tags (if `ab_server` accepts).
  - PR 2: extend `micrologix` profile with `RTC0[1]`/`HSC0[1]` (if accepted).
  - PR 3: extend `slc500` profile with `T4[5]`/`C5[5]`/`R6[5]` if not already seeded by `ab_server` defaults.
  - PR 5: extend `slc500` and `plc5` profiles with `PD9[2]`/`MG10[2]` (if accepted).
  - PR 6: extend `slc500` and `plc5` profiles with `ST20[5]`.
  - PR 7: bump array sizes (`N7[120]`) for max-frame array-read tests.
  - PR 12: add a second `slc500-faulty` service for demotion/fault-isolation tests.
- `tests/ZB.MOM.WW.OtOpcUa.Driver.AbLegacy.Tests/Fixtures/`:
  - PR 11: new `rslogix-canonical.csv` + golden snapshot for the symbol-import integration test.
### E2E / smoke scripts
- `scripts/e2e/test-ablegacy.ps1`:
  - PR 1: octal-bit `Plc5` assertion.
  - PR 2: `MicroLogix RTC:0.HR` parametric.
  - PR 3: Boolean sub-element read (`T4:0.DN`).
  - PR 4: indirect-address loopback.
  - PR 5: `PD9:0.SP` Float read (skip-gated).
  - PR 6: ST round-trip.
  - PR 7: array-read `N7:0,10`.
  - PR 8: deadband subscribe assertion.
  - PR 10: `_Diagnostics/RequestCount` assertion via OPC UA bridge.
  - PR 11: `import-rslogix` invocation + tag-count assertion.
  - PR 12: kill-simulator-and-observe-demote assertion.
  - PR 13: parameter-only header note for hardware-gated DHRIO runs.
- `scripts/smoke/seed-ablegacy-smoke.sql`:
  - PR 3: `T4:0.DN` Boolean tag row.
  - PR 5: `PD9:0.SP` PidElement tag row (skip-gated).
  - PR 6: `ST20:0` String tag row.
  - PR 7: `N7:0,10` array tag row (`IsArray=1`).
  - PR 8: tag row with `AbsoluteDeadband=5`.
  - PR 9: device row with `Timeout=PT500MS`.
  - PR 10: `_Diagnostics/RequestCount` tag row (if explicit registration required).
  - PR 12: device row with `DemoteFor=PT30S`.
---
## Skip-rated items (for context)
For traceability, the gaps the recommendations table flagged **No**:
| # | Gap | Skip rationale |
|---|-----|----------------|
| 1 | Serial DF1 transports (full-duplex, half-duplex, KF2/KF3) | libplctag has no serial path; declining install base |
| 3 | DH-485 routing (1761/1747-AIC) | Very legacy; rare in greenfield |
| 4 | M0 / M1 module file access | Niche RIO modules; declining |
| 6 | D (BCD) and Long-BCD types | Very legacy data convention |
| 12 | Block read-size negotiation per family | libplctag handles chunking implicitly |
| 14 | Channel-shared comm serialisation | Only matters for serial / DH+ transport (not built) |
| 16 | Online controller browse / data-table discovery | PCCC dir frame limited; libplctag support unclear |
| 17 | DF1 BCC vs CRC-16 selection | Predicated on DF1 transport (gap #1) |
| 19 | PLC-5 typed-read selection / Force Logical | libplctag defaults are sound; niche tuning |
| 22 | Write completion semantics options | Niche tuning; current write-through is safe default |
These remain documented in `featuregaps.md` and can be reopened if customer feedback warrants.
---
## Open questions
1. **libplctag PCCC capability verification** — several PRs (especially 2, 4, 5, 7) hinge on what libplctag's `slc500` / `micrologix` / `plc5` / `logixpccc` PlcTypes actually accept in the `Name` attribute. Before scheduling Phase 2 we should run a one-day spike with the AbLegacy simulator to confirm:
   - Does libplctag accept indirect addresses (`N7:[N7:0]`) verbatim, or do we need to resolve in two steps?
   - Does it accept array notation (`N7:0,10` vs `N7:0[10]`) for PCCC PlcTypes?
   - Does it expose PD/MG/PLS/BT sub-elements by name, or do we read the parent struct as a byte block?
   - Does it correctly handle PLC-5 octal in I:/O: addresses, or does the driver need to convert?
2. **MicroLogix simulator fidelity** — we don't currently know whether the AbLegacy integration-test fixture (`AbLegacyServerFixture`) simulates the MicroLogix function files (RTC/HSC/DLS). PR 2's integration coverage is gated on this. If not, we either extend the fixture or scope PR 2 to unit-only tests + a hardware smoke-test playbook.
3. **RSLogix import format coverage** — binary `.RSS` / `.RSP` parsing is non-trivial. PR 11 scopes to text/CSV exports. Should we instead invest in shelling out to the (free) Rockwell `RSWho` / `RSLogix Emulate` tooling for binary conversion, or accept text-only as the v1 scope and revisit?
4. **Address-space rebuild on tag-set change** — when PR 11 (RSLogix import) adds 1000+ tags, does `ReinitializeAsync` perform acceptably, or do we need an incremental discovery path? Out of scope for this plan but worth flagging.
5. **Diagnostic tag namespace collision** — PR 10 reserves `_Diagnostics` under each device folder. Confirm with the address-space team that the leading underscore is the established convention (other drivers use `_System` or `_DiagnosticTags`); align before implementation.
@@ -0,0 +1,807 @@
# FOCAS Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → FOCAS](../featuregaps.md#focas-fanuc-cnc)
>
> Covers Build = Yes items only.
## Summary
The FOCAS driver today is a pure-managed, read-only FOCAS/2 wire client
(`src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/Wire/`) backing a fixed-tree projection
plus user-authored `PARAM:` / `MACRO:` / PMC tags. It exposes a thin set of
calls (`cnc_sysinfo`, `cnc_rdcncstat`, `cnc_rdaxisname`, `cnc_rdspdlname`,
`cnc_rddynamic2`, `cnc_rdsvmeter`, `cnc_rdspload`, `cnc_rdspmaxrpm`,
`cnc_exeprgname2`, `cnc_rdblkcount`, `cnc_rdopmode`, `cnc_rdtimer`,
`cnc_rdparam`, `cnc_rdmacro`, `pmc_rdpmcrng`, `cnc_rdalmmsg2`).
The featuregaps table marks **18** items as Build = Yes. They cluster into
five distinct workstreams:
1. **Phase 1 — fixed-tree expansion** (#6, #7, #8, #10, #11, #12, #13, #14,
#18, #20, #24, #27). These are mostly new wire calls plumbed into the
existing `FixedTree*` poll cadences; no architectural change.
2. **Phase 2 — addressing additions** (#4, #14 DIAG scheme, #15, #16). New
`FocasAreaKind` values, new capability-matrix entries, multi-path
`PathId`. Touches the parser + matrix + wire envelope; mostly additive.
3. **Phase 3 — alarm history** (#17). Extends the existing
`FocasAlarmProjection` with a one-shot history pull on connect plus
periodic delta polls.
4. **Phase 4 — write path** (#1, #3). The biggest behavioural change in
the driver's lifetime: removes the `BadNotWritable` short-circuit, adds
`cnc_wrparam` / `pmc_wrpmcrng` / `cnc_wrmacro` plus FOCAS password
handling. Material risk surface — see Risks.
5. **Phase 5 — derived telemetry** (#24 cycle-delta computation). Optional
companion to #24 raw cycle time; computes "last completed cycle" from
the existing cumulative `Cycle` timer.
DIAG (#14) is in Phase 2 (addressing) rather than Phase 1 because it
needs a new address scheme, but the fixed-tree status flag projection
(#12) is the cheapest item and should land first as a vertical slice.
The remaining 9 items in the featuregaps table (HSSB, Series 15 / 35i,
tool-offset write, program upload/download, DPRNT, deep servo info,
acceleration/jerk, operator preset commands, NTP) are scoped out as
Build = No; they appear in [Skip-rated items](#skip-rated-items-for-context)
for context only.
## Phased delivery
| Phase | Scope | Gaps closed | Approx PRs | Risk |
|-------|-------|-------------|------------|------|
| 1 | Fixed-tree expansion (read-only) | 12, 13, 7, 8, 10, 11, 20, 18, 6, 24, 27, 14 (read-only piece) | 6 | Low |
| 2 | Addressing additions | 4, 15, 16, 14 (DIAG: scheme) | 4 | Medium (multi-path) |
| 3 | Alarm history | 17 | 1 | Low |
| 4 | Write path + password | 1, 3 | 4 | High (read-only design choice removed) |
| 5 | Cycle-delta derived telemetry | 24 (delta companion) | 1 | Low |
Phases 1–3 are mutually independent and can ship in any order. Phase 4
deliberately follows Phase 2 so writes ride on top of the multi-path
addressing already in place. Phase 5 tags onto the cycle-time node from
Phase 1.
## Per-PR detail
### Phase 1 — fixed-tree expansion
Common shape: each PR adds one or more wire calls in
`Wire/FocasWireClient.cs`, surfaces them on `IFocasClient`, plumbs them
into `FocasDriver`'s `FixedTreeLoopAsync` cadences (axis 250 ms / program
1 s / timer 30 s) and the `TryReadFixedTree` synthesizer, then adds
fakes + assertions.
**PR F1-a — ODBST status flags as fixed-tree nodes (#12)**
- Scope: project the 9 fields of `cnc_rdcncstat` (`tmmode`, `aut`, `run`,
`motion`, `mstb`, `emergency`, `alarm`, `edit`, `dummy`) under
`Status/` per device. We already issue this call in `ProbeAsync`; this
PR keeps the boolean probe but additionally caches the full struct on
every poll tick.
- Files:
`Wire/FocasWireClient.cs` (extend `ReadStatusAsync` to return the
whole `WireStatus` rather than only `IsOk`), `IFocasClient.cs` (new
`GetStatusAsync`), `FocasDriver.cs` (new `Status/*` branch in
`TryReadFixedTree`, status cache on `DeviceState`).
- Tests:
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Tests/FocasFixedTreeStatusTests.cs`
(new) — `FakeFocasClient` returns canned ODBST, assert each field maps
to the expected `Status/*` browse name. Integration: extend
`FocasSimFixture` to seed the simulator's status response and assert
via the OPC UA client.
- **Docs / fixture / e2e**: extend `docs/drivers/FOCAS.md` fixed-tree
table with the 9 `Status/*` nodes; mention the boolean-probe →
full-struct change in `docs/drivers/FOCAS-Test-Fixture.md` integration
bullet list; teach `focas-mock` (under
`tests/.../IntegrationTests/Docker/focas-mock/`) the `cnc_rdcncstat`
payload shape per `docs/v2/implementation/focas-wire-protocol.md`
(add ODBST struct entry); extend `FocasSimFixture` with a helper to
patch the canned status payload; new
`Series/StatusFlagsPopulateTests.cs` integration test.
- Effort: small; one wire call already exists.
- Risk: Low.
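
As an illustration of the projection described in PR F1-a, the sketch below maps a cached status struct onto `Status/*` browse names. The `WireStatus` shape here is a hypothetical stand-in built from the nine field names listed in the scope, and using the raw field name as the browse name is an assumption; the plan does not fix the casing.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WireStatus:
    """Hypothetical mirror of the nine ODBST fields named in the PR scope."""
    tmmode: int
    aut: int
    run: int
    motion: int
    mstb: int
    emergency: int
    alarm: int
    edit: int
    dummy: int


def project_status(status: WireStatus) -> dict[str, int]:
    """Map every cached field to a Status/<field> node value."""
    return {f"Status/{f.name}": getattr(status, f.name) for f in fields(status)}
```

A unit test in the style the PR describes would feed a canned struct through `project_status` and assert one entry per field.
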
**PR F1-b — parts count + cycle time (#13, #24 raw)**
- Scope: surface `cnc_rdparam(6711)` (parts machined), `6712` (total
parts machined), and `6713` (parts required) under `Production/`,
plus `Production/CycleTimeSeconds`, which shares its backing with the
existing `Timers/CycleSeconds` node. The existing `cnc_rdtimer` call
is sufficient.
- Files: `FocasDriver.cs` (`Production/*` branch, parameter-cached
reads on the timer poll cadence), `IFocasClient.cs` (no new call —
rides on `ReadParameterInt32Async`).
- Tests: `FakeFocasClient` returns canned parameter values; assert
`Production/PartsTotal` equals the canned value.
- **Docs / fixture / e2e**: add `Production/*` rows to the fixed-tree
table in `docs/drivers/FOCAS.md`; add `Production:` example to
`docs/Driver.FOCAS.Cli.md` (a `read -a PARAM:6711` snippet); the
parts-count parameters (6711/6712/6713) are already in the
simulator profile range, so only the `dl205`-style profile JSON
under `tests/.../Docker/focas-mock/profiles/` needs seeded values
added; extend `FocasSimFixture` with a `SeedPartsCount` helper;
integration test under `Series/ProductionPopulatesTests.cs`.
- Effort: small.
- Risk: Low.
**PR F1-c — modal G/M/T codes (#7) + override values (#11)**
- Scope: add `cnc_modal` (command id TBD per `fwlib32.h` — the wire
protocol uses the same numeric command convention seen in
`FocasWireClient`; capture during simulator iteration). Project:
`Modal/G_Group{n}` (groups 1..21), `Modal/MCode`, `Modal/SCode`,
`Modal/TCode`, `Modal/BCode`. Adds `Override/Feed`, `Override/Rapid`,
`Override/Spindle`, `Override/Jog` from `cnc_rdparam(...)` — the
override percent registers live at known parameter numbers; numbers
are MTB-specific so pull defaults from
`docs/v2/focas-version-matrix.md` and let operators override per device.
- Files: `Wire/FocasWireClient.cs` (new `ReadModalAsync`), new
`Wire/FocasWireModels.cs` records `WireModal` / `WireModalGroup`,
`IFocasClient.cs` (new `GetModalAsync`), `FocasDriver.cs` (new
poll-medium branches under the program-poll cadence).
- Tests: `FocasModalTests.cs` (unit), simulator handler returns canned
modal payload, integration asserts `Modal/G_Group1` text.
- **Docs / fixture / e2e**: add `Modal/*` and `Override/*` sections to
the fixed-tree table in `docs/drivers/FOCAS.md`, including the
G-group decode table for groups 01/03/06/07/14; add a `MODAL:`
address example row to `docs/Driver.FOCAS.Cli.md` (new `read -a
MODAL:G1` style — note: this PR does NOT add a new address scheme,
the modal data is fixed-tree only, so the CLI example reads via
`read -n "ns=2;s=Modal/G_Group1"` over the OPC UA endpoint);
document MTB-specific override register defaults in
`docs/v2/focas-version-matrix.md` (new `Override registers per
series` table); capture the `cnc_modal` command id resolved during
simulator iteration into `docs/v2/implementation/focas-wire-protocol.md`
(new struct entry — promote out of the open-questions list);
update `docs/v2/implementation/focas-simulator-plan.md` Stream C
protocol-surface table with the new `cnc_modal` handler;
extend focas-mock with a `cnc_modal` command-id handler + canned
modal payload per profile; integration test reading G54/G90 modal
state via `Series/ModalPopulatesTests.cs`.
- Effort: medium — `cnc_modal` returns a multi-group struct; encoding
needs care.
- Risk: Medium — modal-group numbering varies by series; treat the
raw integer as the value the CNC reports and surface a string
decode table only for the universally-present groups (G-group 01
motion, 03 absolute/incremental, 06 input units, 07 cutter comp,
14 work coordinate). Document MTB-specific groups as raw int.
**PR F1-d — tool number / tool life (#8) + work coordinate offsets (#10)**
- Scope: add `cnc_rdtofs` / `cnc_rdtlife*` / `cnc_rdzofs`. Project
`Tooling/CurrentTool`, `Tooling/CurrentOffset`,
`Tooling/Life/{group}/Remaining`, `Tooling/Life/{group}/Total`,
`Offsets/G54..G59[+ extended]/{X,Y,Z}`.
- Files: new wire calls in `Wire/FocasWireClient.cs` (`ReadToolOffsetAsync`,
`ReadToolLifeAsync`, `ReadWorkOffsetAsync`), `Wire/FocasWireModels.cs`
(records), `IFocasClient.cs`, `FocasDriver.cs` (new `Tooling/` and
`Offsets/` branches; both poll on the slow timer cadence — these
change at setup time, not per-cycle), capability matrix per-call
suppression like the existing `Spindle/` gating.
- Tests: unit + simulator. Tool-life is the largest payload; assert
array projection rather than per-tool nodes (one ValueRank=1 array
per group keeps the address-space size bounded on machines with
500+ tool slots).
- **Docs / fixture / e2e**: add `Tooling/*` and `Offsets/*` sections to
the fixed-tree table in `docs/drivers/FOCAS.md`, including the
ValueRank=1 array note for tool-life groups; add a per-series
capability-suppression row to `docs/v2/focas-version-matrix.md`
(which series support `cnc_rdtlife*` vs not); document the three
new structs (`ODBTOFS`, `ODBTLIFE5`, `IODBZOR`) in
`docs/v2/implementation/focas-wire-protocol.md`; add
`cnc_rdtofs` / `cnc_rdtlife*` / `cnc_rdzofs` rows to the protocol
surface table in `docs/v2/implementation/focas-simulator-plan.md`;
extend focas-mock with three new command-id handlers + per-profile
seed data (tool table + work-offset table); add a
`tools_per_series` matrix to the `focas-mock` per-series profile
JSON so 0i-D's small tool table differs from 30i's; new
`Series/ToolingPopulatesTests.cs` and `Series/OffsetsPopulatesTests.cs`
integration tests; update `docs/drivers/FOCAS-Test-Fixture.md`
coverage map with the three new wire calls.
- Effort: large — three new calls, each with its own struct; tool-life
is variable-length.
- Risk: Medium — payload shapes are series-specific; keep the
capability matrix as the authoritative gate.
**PR F1-e — operator messages (#18) + currently-executing block text (#20)**
- Scope: `cnc_rdopmsg3` (gives all four FANUC opmsg classes in one
call), `cnc_rdactpt` (current block text). Project `Messages/External`
(variable, last-N strings), `Program/CurrentBlock` (single string).
- Files: `Wire/FocasWireClient.cs` (`ReadOperatorMessagesAsync`,
`ReadCurrentBlockAsync`), `IFocasClient.cs`, `FocasDriver.cs` (new
branches under program-poll cadence).
- Tests: simulator returns canned ASCII; assert string round-trip is
trim-stable (FANUC right-pads with `\0` or space).
- **Docs / fixture / e2e**: add `Messages/External` and
`Program/CurrentBlock` rows to the fixed-tree table in
`docs/drivers/FOCAS.md`, including the ring-buffer / last-N
semantics for opmsg; document the `OPMSG3` and `ODBACT2`
payload shapes in `docs/v2/implementation/focas-wire-protocol.md`;
add `cnc_rdopmsg3` / `cnc_rdactpt` rows to the protocol surface
table in `docs/v2/implementation/focas-simulator-plan.md`; extend
focas-mock with the two new command-id handlers (per-profile
canned message text + canned current-block text); add a
`mock_patch_opmsg` admin endpoint hook on `FocasSimFixture` for
tests that need to push a canned message; integration test
`Series/OperatorMessagesPopulateTests.cs` asserts trim-stable
round-trip and last-N retention.
- Effort: medium.
- Risk: Low — ASCII-only payloads.
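
The trim-stable requirement from the PR F1-e tests can be sketched as follows, assuming the payload is plain ASCII and that only trailing NUL bytes and spaces are padding. The function name is hypothetical.

```python
def trim_focas_text(raw: bytes) -> str:
    """Decode a fixed-width ASCII field and drop trailing NUL/space padding."""
    return raw.decode("ascii", errors="replace").rstrip("\x00 ")
```

Because only trailing padding is removed, re-encoding and trimming the result again yields the same string, which is the property the round-trip assertion relies on.
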
**PR F1-f — `cnc_getfigure` decimal scaling (#6) + connection statistics (#27)**
- Scope: `cnc_getfigure` returns per-axis decimal-place counts; cache
the result at bootstrap and divide each `AbsolutePosition` /
`MachinePosition` / `RelativePosition` / `DistanceToGo` /
`ActualFeedRate` value before publishing. Existing nodes already
carry `Float64`; the change is invisible to clients except that
values become real-world units. Adds `Diagnostics/` subtree:
`Diagnostics/ReadCount`, `Diagnostics/ReadFailureCount`,
`Diagnostics/LastErrorMessage`, `Diagnostics/LastSuccessfulRead`,
`Diagnostics/ReconnectCount` — driven by counters already maintained
on `DeviceState`.
- Files: `Wire/FocasWireClient.cs` (new `ReadFigureAsync`),
`IFocasClient.cs`, `FocasDriver.cs` (cache decimal places per axis,
divide on the read path, expose counters under `Diagnostics/`).
- Tests: assert that with a canned `cnc_getfigure` returning 3, an
`AbsolutePosition` of 12345 becomes `12.345`. Connection-stat tests
assert counters increment under known conditions.
- **Docs / fixture / e2e**: significant `docs/drivers/FOCAS.md` change —
add a "Decimal-place scaling" subsection explaining the
`FixedTree.ApplyFigureScaling` flag (default true on new installs,
false on migrations) and the unit-correctness semantics it enforces;
add `Diagnostics/*` rows to the fixed-tree table; add a
Diagnostics-counters subsection to `docs/v2/focas-deployment.md`
for operator dashboards; document `cnc_getfigure` (`ODBAXDP` /
`ODBAXIS`) struct in `docs/v2/implementation/focas-wire-protocol.md`;
add `cnc_getfigure` to the protocol surface in
`docs/v2/implementation/focas-simulator-plan.md`; extend focas-mock
with the per-axis decimal-place command handler + a `decimal_places`
field on each profile JSON; update
`docs/drivers/FOCAS-Test-Fixture.md` "When to trust each layer"
table with a "Are axis values reported in real-world units?" row;
add an opt-in `-CheckDecimalScaling` switch to `scripts/e2e/test-focas.ps1`
that asserts AbsolutePosition is scaled when the flag is on;
integration test `Series/DecimalScalingTests.cs` and
`Series/DiagnosticsCountersTests.cs`.
- Effort: medium — touches every axis read.
- Risk: Medium — this is a behavioural change for any existing
consumer that was already dividing client-side. Surface as a
`FixedTree.ApplyFigureScaling` opt-in flag (default true on new
installs, false when migrating); document in `docs/drivers/FOCAS.md`.
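
The scaling rule in PR F1-f reduces to dividing the raw integer by ten to the power of the cached decimal-place count. The sketch below assumes a per-call `apply_scaling` argument standing in for the `FixedTree.ApplyFigureScaling` flag; the function name is hypothetical.

```python
def scale_axis_value(raw: int, decimal_places: int, apply_scaling: bool = True) -> float:
    """Convert a raw axis integer to real-world units using cached decimal places."""
    if decimal_places < 0:
        raise ValueError("decimal_places must be non-negative")
    if not apply_scaling:
        # Migrated installs keep the raw value so client-side division still works.
        return float(raw)
    return raw / (10 ** decimal_places)
```

With three decimal places, a raw `AbsolutePosition` of 12345 is published as 12.345, the case named in the tests bullet.
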
### Phase 2 — addressing additions
**PR F2-a — DIAG: address scheme (#14)**
- Scope: new `FocasAreaKind.Diagnostic` parsed from `DIAG:nnn` /
`DIAG:nnn/axis`, dispatched to `cnc_rddiag` (or `cnc_rddiagdgn` for
series that support it).
- Files: `FocasAddress.cs` (new prefix branch), `FocasCapabilityMatrix.cs`
(new `DiagnosticRange` per series), `Wire/FocasWireClient.cs`
(`ReadDiagnosticAsync`), `WireFocasClient.ReadAsync` (new dispatch
branch).
- Tests: parser unit tests, capability matrix unit tests, simulator
read-round-trip.
- **Docs / fixture / e2e**: add a `DIAG:` row to the address-syntax
table in `docs/Driver.FOCAS.Cli.md` with `read -a DIAG:301` and
`DIAG:301/0` (axis-scoped) examples; add a `DIAG:` row to the
addressing table in `docs/drivers/FOCAS.md`; add per-series
`DiagnosticRange` columns to `docs/v2/focas-version-matrix.md`;
document the `ODBDGN` struct in
`docs/v2/implementation/focas-wire-protocol.md`; add `cnc_rddiag`
/ `cnc_rddiagdgn` to the protocol surface in
`docs/v2/implementation/focas-simulator-plan.md`; extend focas-mock
with the diagnostic-range command handler + per-profile seeded
diagnostic numbers; integration test
`Series/DiagAddressTests.cs` round-trips a seeded diagnostic
number; update `docs/drivers/FOCAS-Test-Fixture.md` capability list
with the new `Diagnostic` `FocasAreaKind`.
- Effort: medium.
- Risk: Low — additive.
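
A minimal parser sketch for the two `DIAG:` forms in PR F2-a, assuming the diagnostic number and the optional axis are both unsigned decimal integers. `DiagAddress` and `parse_diag` are hypothetical names, not the real `FocasAddress` API.

```python
import re
from typing import NamedTuple, Optional


class DiagAddress(NamedTuple):
    number: int
    axis: Optional[int]  # None when the address is not axis-scoped


_DIAG = re.compile(r"DIAG:(\d+)(?:/(\d+))?")


def parse_diag(text: str) -> DiagAddress:
    """Parse DIAG:nnn or DIAG:nnn/axis, rejecting anything else."""
    match = _DIAG.fullmatch(text)
    if match is None:
        raise ValueError(f"not a DIAG address: {text!r}")
    axis = int(match.group(2)) if match.group(2) is not None else None
    return DiagAddress(int(match.group(1)), axis)
```
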
**PR F2-b — Multi-path / multi-channel CNC (#4)**
- Scope: 30i/31i/32i can host 2–10 paths; today every request block is
built with `PathId = 1` (`Wire/FocasWireProtocol.cs:216`). Add
optional `Path` segment to `FocasAddress` (e.g. `PARAM:1815@2`,
`R100@3.0`, `MACRO:500@2`); thread it into the `RequestBlock.PathId`
field. Fixed-tree gets a `Paths/{n}/` folder pivot.
- Files: `FocasAddress.cs` (new `Path` field + parser), `IFocasClient.cs`
(every read call gains an optional `pathId` parameter, defaulting to
1 for backward compatibility), `Wire/FocasWireClient.cs`
(thread the param through every `RequestBlock` constructor),
`FocasDriver.cs` (per-device `PathCount` discovery via
`cnc_rdpathnum`; iterate fixed-tree per path).
- Tests: unit on the parser; simulator with two paths configured;
assert that a `PARAM:1815@2` read targets path 2.
- **Docs / fixture / e2e**: significant `docs/drivers/FOCAS.md`
update — new "Multi-path / multi-channel CNC" subsection explaining
the `@N` suffix syntax, `Paths/{n}/` browse pivot, and per-path
capability gating; add `@N` to every address row in the
addressing table in `docs/Driver.FOCAS.Cli.md`; document
`cnc_rdpathnum` (`ODBPATHNUM` struct) in
`docs/v2/implementation/focas-wire-protocol.md`, and update the
`RequestBlock.PathId` discussion (was hard-coded to 1 — now a
parameter); add `cnc_rdpathnum` to the protocol surface and the
per-profile `path_count` field to the profile schema in
`docs/v2/implementation/focas-simulator-plan.md`; extend focas-mock
with per-path state isolation (separate PMC / param / macro tables
per `path_id`) and a new `multi_path` profile (e.g.
`thirtyone_i_dual_path`); add a `-Paths` switch to
`scripts/e2e/test-focas.ps1` that runs the matrix once per
declared path; document the new compose profile in
`docs/drivers/FOCAS-Test-Fixture.md`; new
`Series/MultiPathTests.cs` integration test asserting independent
per-path reads.
- Effort: large — touches every wire call's `RequestBlock` shape.
- Risk: Medium — backward compatibility for existing single-path
configs. Default `PathId = 1` everywhere; only deviate when the
address explicitly carries a `@N` suffix or when the fixed-tree
loop is iterating discovered paths.
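
The `@N` suffix handling in PR F2-b can be sketched as a pre-pass that strips the suffix and returns the path id, defaulting to 1 when no suffix is present. The sketch reads `R100@3.0` as path 3, bit 0, which is an interpretation of the example rather than something the plan states; `split_path` is a hypothetical name.

```python
import re

_PATH_SUFFIX = re.compile(r"@(\d+)")


def split_path(address: str) -> tuple[str, int]:
    """Return (address without @N, path id); path id defaults to 1."""
    if "@" not in address:
        return address, 1
    matches = _PATH_SUFFIX.findall(address)
    if address.count("@") != 1 or len(matches) != 1:
        raise ValueError(f"malformed path suffix in {address!r}")
    path_id = int(matches[0])
    if path_id < 1:
        raise ValueError("path id must be 1 or greater")
    return _PATH_SUFFIX.sub("", address, count=1), path_id
```

Keeping the default at 1 means existing single-path tag configurations parse to the same request as before.
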
**PR F2-c — PMC F/G letters for 16i (#15)**
- Scope: capability matrix bug — `PmcLetters(Sixteen_i)` currently
returns `{X, Y, R, D}`; real 16i ladders use F/G for handshakes.
Widen the set; verify that the numeric letter codes passed to
`pmc_rdpmcrng` match.
- Files: `FocasCapabilityMatrix.cs` (one-line fix to the 16i case),
`tests/.../FocasCapabilityMatrixTests.cs` (assert F0.0 and G50.5
parse against `Sixteen_i`).
- **Docs / fixture / e2e**: update the 16i row of the PMC-letters
column in `docs/v2/focas-version-matrix.md` (the row currently lists
X/Y/R/D — add F/G); add a one-line "fixed in v…" callout to the
changelog section of the same doc; no simulator change required (the
16i profile JSON in `tests/.../Docker/focas-mock/profiles/sixteen_i.json`
already has F/G ranges declared from Stream B); add F0.0 / G50.5
probes to the 16i row of the per-series matrix in
`scripts/e2e/test-focas.ps1`; no fixture-doc change needed.
- Effort: trivial.
- Risk: Low — correctness fix.
**PR F2-d — Bulk PMC range read (#16)**
- Scope: today the driver issues one `pmc_rdpmcrng` per tag (one TCP
RTT each). The wire call already supports a range `[start, end]`;
the missing piece is coalescing on the read side. Add a coalescer:
group same-letter contiguous (or near-contiguous within a small
gap budget) PMC bytes from the request batch into one wire call
per group, then slice client-side. Reuse the Modbus coalescing
infrastructure pattern (per-group-id ProhibitedRanges) where it
applies.
- Files: new `Wire/FocasPmcCoalescer.cs`, hook into
`FocasDriver.ReadAsync` between the per-tag path and the wire call
layer. Surface coalesce stats on the `Diagnostics/` subtree (PR F1-f).
- Tests: unit — given a request batch of `R100..R110`, assert that
the coalescer issues one call covering 100..110 and slices the
result. Integration — assert observed wire-call count drops with
coalescing on.
- **Docs / fixture / e2e**: add a "PMC range coalescing" subsection
to `docs/drivers/FOCAS.md` (wire-call reduction, gap budget,
per-series byte cap); document the new `Diagnostics/CoalesceStats/*`
counters added on top of PR F1-f's diagnostics tree; add a
PMC-byte-cap column to `docs/v2/focas-version-matrix.md`;
no new wire calls (`pmc_rdpmcrng` is already in the surface), but
document the supported max-bytes-per-call in
`docs/v2/implementation/focas-wire-protocol.md`; extend focas-mock
with a request-counter admin endpoint so integration tests can
assert the call-count reduction (counter visible via
`FocasSimFixture.GetWireCallCountAsync`); update
`docs/v2/implementation/focas-simulator-plan.md` Stream B
validation harness with the request-counter handler; integration
test `Series/PmcCoalescingTests.cs` asserts an `R100..R110` batch
produces exactly 1 wire call against the mock.
- Effort: medium.
- Risk: Medium — the FANUC max-bytes-per-`pmc_rdpmcrng` ceiling is
series-specific; cap conservatively (≤ 256 bytes per range) and
let operators raise it via config if their CNC accepts more.
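
The coalescing step in PR F2-d can be sketched as a sort-and-merge over requested PMC bytes. The gap budget of 4 is an illustrative default, the 256-byte cap follows the conservative ceiling in the risk note, and all names are hypothetical rather than the planned `FocasPmcCoalescer` API.

```python
from typing import NamedTuple


class PmcRange(NamedTuple):
    letter: str
    start: int
    end: int  # inclusive


def coalesce(requests: list[tuple[str, int]], gap_budget: int = 4,
             max_bytes: int = 256) -> list[PmcRange]:
    """Merge same-letter byte offsets into ranges, one wire call per range."""
    ranges: list[PmcRange] = []
    for letter, offset in sorted(set(requests)):
        last = ranges[-1] if ranges else None
        if (last is not None and last.letter == letter
                and offset - last.end - 1 <= gap_budget   # unrequested bytes skipped
                and offset - last.start + 1 <= max_bytes):  # resulting range length
            ranges[-1] = last._replace(end=offset)
        else:
            ranges.append(PmcRange(letter, offset, offset))
    return ranges


def slice_byte(rng: PmcRange, payload: bytes, offset: int) -> int:
    """Pick one requested byte back out of a coalesced range payload."""
    return payload[offset - rng.start]
```

A batch covering `R100..R110` collapses to a single range here, which is the call-count reduction the integration test is meant to observe.
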
### Phase 3 — alarm history
**PR F3-a — `cnc_rdalmhistry` extension to alarm projection (#17)**
- Scope: extend `FocasAlarmProjection` with two modes — `ActiveOnly`
(today's behaviour) and `ActivePlusHistory`. In the latter, on
connect (and on a configurable cadence — default 5 min, since the
CNC ring buffer changes only on alarm raise/clear) issue
`cnc_rdalmhistry` for the most-recent N entries; project as
historic events through `IAlarmSource` with `OccurrenceTime` from
the CNC's timestamp field.
- Files: new `Wire/FocasWireClient.ReadAlarmHistoryAsync`, new
`IFocasClient.ReadAlarmHistoryAsync`,
`FocasAlarmProjection.cs` (mode switch + history poll loop),
`FocasDriverOptions.cs` (`AlarmProjection.Mode` enum +
`HistoryPollInterval` + `HistoryDepth`).
- Tests: simulator returns canned history payload; assert events
fire with the timestamps from the canned data and don't re-fire
on every poll.
- **Docs / fixture / e2e**: add an "Alarm history" subsection to
`docs/drivers/FOCAS.md` documenting the `ActiveOnly` vs
`ActivePlusHistory` mode switch, the `HistoryDepth` cap, and the
dedup key; add a configuration-knob row to
`docs/v2/focas-deployment.md` for operator dashboards; document
`ODBALMHIS` struct in
`docs/v2/implementation/focas-wire-protocol.md`; add
`cnc_rdalmhistry` to the protocol surface in
`docs/v2/implementation/focas-simulator-plan.md`; extend focas-mock
with a ring-buffer alarm history (per profile) + `mock_patch_alarmhistory`
admin endpoint; expose a `SeedAlarmHistoryAsync` helper on
`FocasSimFixture`; add `Series/AlarmHistoryProjectionTests.cs`
asserting historic events fire once and active events still fire
raise/clear; update `docs/drivers/FOCAS-Test-Fixture.md` integration
bullet list with `cnc_rdalmhistry`.
- Effort: medium.
- Risk: Medium — duplicate-event suppression; key history events on
`(timestamp, alarmNumber, type)` to deduplicate against the active
list.
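
The deduplication rule in the PR F3-a risk note can be sketched as a seen-set keyed on `(timestamp, alarmNumber, type)`. The entry shape and class name are hypothetical, and the timestamp is kept as an opaque string because the plan does not specify its representation.

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    timestamp: str
    alarm_number: int
    alarm_type: int
    message: str


class HistoryDeduper:
    """Remembers which history entries have already been projected."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, int, int]] = set()

    def new_entries(self, polled: list[HistoryEntry]) -> list[HistoryEntry]:
        """Return only entries whose key has not been seen on an earlier poll."""
        fresh = []
        for entry in polled:
            key = (entry.timestamp, entry.alarm_number, entry.alarm_type)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(entry)
        return fresh
```

The set in this sketch grows without bound; a real implementation would presumably trim it in line with `HistoryDepth`.
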
### Phase 4 — write path
This phase is the major behavioural change. The driver's read-only
contract has been the documented design choice in
`docs/drivers/FOCAS.md:14-18` and is reinforced by tests
(`FocasReadWriteTests.WriteAsync_ReturnsBadNotWritable`). Removing it
deserves a deliberate decision-record entry in the v2 decisions log
before any code lands.
**PR F4-a — write infrastructure + per-tag opt-in (no wire calls yet)**
- Scope: drop the `BadNotWritable` short-circuit in
`WireFocasClient.WriteAsync` and replace with a kind-based dispatch
that returns `BadNotSupported` only for kinds the wire client
doesn't yet implement. Honour `FocasTagDefinition.Writable` (already
present, default `true`; flip the default to `false` per #1's safer
posture). Plumb `WriteIdempotent` through Polly retry.
- Files: `WireFocasClient.cs`, `FocasDriverOptions.cs`,
`FocasDriver.cs`, `docs/drivers/FOCAS.md` (rewrite the read-only
paragraph), new `docs/v2/decisions.md` entry.
- Tests: assert that with `Writable=false` the path still returns
`BadNotWritable`; with `Writable=true` and an unimplemented kind
the write returns `BadNotSupported` (distinct from the per-tag
policy denial).
- **Docs / fixture / e2e**: this is the heaviest doc PR in the plan.
- **`docs/drivers/FOCAS.md` lines 14–18** — revoke the unconditional
"OtOpcUa is read-only against FOCAS… Writes return BadNotWritable
by design" callout. Replace with a "Writes (opt-in, off by
default)" subsection that names `Writes.Enabled`, the per-tag
`Writable` flag (default flipped to `false`), and links to the
Phase 4 decision-record entry.
- **`docs/drivers/FOCAS-Test-Fixture.md` lines 42–43** — revoke the
"`IWritable` intentionally returns `BadNotWritable` — OtOpcUa is
read-only against FOCAS" callout. Replace with a qualified
"default behaviour" note plus a pointer to the new write-enabled
test profile.
- **`docs/Driver.FOCAS.Cli.md` lines 100–116** — the existing
`write` section already documents the CLI shape; expand the
"**Writes are non-idempotent by default**" warning with a
server-side note that the OtOpcUa endpoint enforces the
`Writes.Enabled` flag and rejects writes when off, and that
the CLI itself talks to the driver directly so its writes are
not gated by the server flag (operator must consciously use
the right tool).
- New `docs/v2/decisions.md` entry "FOCAS write-path opt-in"
capturing the design-choice reversal.
- Update `docs/featuregaps.md` row for #1 / #3 — flip Build = Yes
annotation to "shipping behind flag".
- Simulator: no new commands; existing read commands gain a
"writes when not unlocked" branch wired up here for symmetry
even though no write commands ship yet (returns
`BadNotSupported` until F4-b lands).
- E2E: add `-Write` switch (no-op stage in this PR; populated by
F4-b) to `scripts/e2e/test-focas.ps1`.
- Effort: medium.
- Risk: High — design-choice reversal. Mitigation: ship behind a
driver-level `Writes.Enabled` flag (default `false`); operators
must explicitly enable in `appsettings.json`.
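
The gating order implied by PR F4-a can be sketched as a small decision function. Returning `BadNotWritable` when the driver-level flag is off is an assumption (the plan only says such writes are rejected); the per-tag and unimplemented-kind outcomes follow the tests bullet. The function name and string status codes are illustrative only.

```python
def resolve_write_status(writes_enabled: bool, tag_writable: bool, kind: str,
                         implemented_kinds: frozenset = frozenset()) -> str:
    """Decide the status for a write before any wire call is attempted."""
    if not writes_enabled:
        return "BadNotWritable"  # assumed code for the driver-level Writes.Enabled gate
    if not tag_writable:
        return "BadNotWritable"  # per-tag policy denial
    if kind not in implemented_kinds:
        return "BadNotSupported"  # kind has no wire implementation yet
    return "Good"
```

With the default empty `implemented_kinds`, every permitted write resolves to `BadNotSupported`, which models the state of the driver between F4-a and F4-b.
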
**PR F4-b — `cnc_wrmacro` + `cnc_wrparam`**
- Scope: implement macro and parameter writes. Both have well-defined
payload shapes mirroring their read counterparts (IODBPSD for
parameters, ODBM for macros).
- Files: `Wire/FocasWireClient.cs` (new `WriteParameterAsync`,
`WriteMacroAsync`), `WireFocasClient.WriteAsync` (dispatch).
- Tests: simulator extension — accept writes and reflect them on
subsequent reads. ACL tests in
`tests/ZB.MOM.WW.OtOpcUa.IntegrationTests` to verify the
server-layer enforcement (per the memory entry: ACL decisions
happen in `DriverNodeManager`, never in driver-level code).
- **Docs / fixture / e2e**:
- `docs/drivers/FOCAS.md` — extend the "Writes" subsection
(introduced in F4-a) with the two new write kinds, the
`Writes.AllowParameter` and `Writes.AllowMacro` granular flags,
and a security note: parameter writes require LDAP group
`WriteConfigure`, macro writes require `WriteOperate` (cross-link
to `docs/Security.md`).
- `docs/v2/focas-deployment.md` — significant addition: a "Write
safety" section covering operator pre-checks (CNC in MDI mode,
parameter-write switch enabled), audit-log expectations, and the
LDAP group requirements.
- `docs/Driver.FOCAS.Cli.md` — populate the existing `write`
examples for `PARAM:` and `MACRO:` (already present at lines
105-108) with a "Server-enforced ACL" note linking to
`docs/Security.md`.
- Document `IODBPSD` (write side) and `ODBM` (write side) in
`docs/v2/implementation/focas-wire-protocol.md` (the read-side
structs are already there — flag the byte layout symmetry).
- `docs/v2/implementation/focas-simulator-plan.md` — add
`cnc_wrparam` / `cnc_wrmacro` to the protocol surface table
and update Stream C status accordingly.
- Extend focas-mock with `cnc_wrparam` / `cnc_wrmacro` handlers
that mutate the per-profile state and return
`EW_PASSWD` when the unlock state is off (sets up F4-d's
test path); add `mock_get_last_write` admin endpoint for
audit-log assertions.
- New `Series/ParameterWriteTests.cs` and `Series/MacroWriteTests.cs`
integration tests; ACL test under
`tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/Authz/FocasWriteAclTests.cs`
asserting `WriteConfigure` is required for `PARAM:` writes and
`WriteOperate` for `MACRO:` writes.
- `scripts/e2e/test-focas.ps1` — populate the `-Write` stage from
F4-a with macro and parameter round-trip writes against the
Docker mock.
- Effort: medium.
- Risk: High — a misdirected parameter write can put the CNC into a
bad state. Surface a `Writes.AllowParameter` flag separate from
`Writes.Enabled` so operators can grant macro writes without
parameter writes.
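
How the driver-level switch, the granular flags, and the per-tag `Writable` flag might combine can be sketched as below. This is a language-neutral illustration in Python (the focas-mock's language), not the driver's C# code. Only the flag names come from this plan; the `WriteFlags` class, the `is_write_allowed` function, and the rule that all three checks must pass together are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteFlags:
    """Hypothetical mirror of the plan's `Writes.*` options; everything defaults to off."""
    enabled: bool = False          # Writes.Enabled (driver-level master switch)
    allow_parameter: bool = False  # Writes.AllowParameter
    allow_macro: bool = False      # Writes.AllowMacro
    allow_pmc: bool = False        # Writes.AllowPmc (introduced by F4-c)


def is_write_allowed(kind: str, flags: WriteFlags, tag_writable: bool) -> bool:
    """Return True only when the master switch, the per-kind flag, and the
    per-tag `Writable` flag all permit the write (assumed AND semantics)."""
    per_kind = {
        "PARAM": flags.allow_parameter,
        "MACRO": flags.allow_macro,
        "PMC": flags.allow_pmc,
    }
    # Unknown kinds are denied rather than silently allowed.
    return flags.enabled and tag_writable and per_kind.get(kind, False)
```

With `WriteFlags(enabled=True, allow_macro=True)` a macro write on a writable tag passes while a parameter write is refused, which is the "macro writes without parameter writes" case the risk note describes.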
**PR F4-c — `pmc_wrpmcrng`**
- Scope: PMC range writes. Read-modify-write semantics for bit-level
writes (the wire call is byte-addressed). Existing tests
(`FocasPmcBitRmwTests.cs`) prove the read-modify-write pattern
shape that the write path needs.
- Files: `Wire/FocasWireClient.cs` (new `WritePmcRangeAsync`),
bit-level RMW helper in `WireFocasClient`.
- Tests: simulator round-trip on byte writes; bit-level write asserts
the unrelated bits in the same byte are preserved.
- **Docs / fixture / e2e**:
- `docs/drivers/FOCAS.md` — extend the "Writes" subsection with
PMC writes; loud safety callout block ("PMC is ladder working
memory — a mistargeted bit can move motion"); document the
read-modify-write semantics for bit-level writes; document the
new `Writes.AllowPmc` granular flag.
- `docs/v2/focas-deployment.md` — extend the "Write safety"
section with PMC-specific pre-checks (e-stop, jog mode); add an
ops-runbook bullet on auditing PMC writes from the
`Diagnostics/CoalesceStats/` (extended) tree.
- `docs/Driver.FOCAS.Cli.md` — the existing `write` example
`write -h … -a G50.3 -t Bit -v on` (line 107) is already PMC-bit;
update its surrounding warning to call out RMW behaviour.
- Document the `pmc_wrpmcrng` request frame in
`docs/v2/implementation/focas-wire-protocol.md` (the read frame
is already there — flag the inverted shape).
- `docs/v2/implementation/focas-simulator-plan.md` — add
`pmc_wrpmcrng` to the protocol surface table.
- Extend focas-mock with `pmc_wrpmcrng` handler that mutates
per-profile PMC tables; assert byte-aligned writes preserve
untouched bytes (mirrors the driver's RMW contract).
- New `Series/PmcRangeWriteTests.cs` and
`Series/PmcBitRmwIntegrationTests.cs` integration tests; ACL
test under
`tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/Authz/FocasPmcWriteAclTests.cs`
asserting `WriteOperate` is required.
- `scripts/e2e/test-focas.ps1` — extend the `-Write` stage with a
PMC bit round-trip.
- Effort: medium.
- Risk: High — PMC is the ladder logic's working memory; a
mistargeted write can move motion. Document loudly.
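
The read-modify-write contract for bit-level PMC writes can be illustrated with a small sketch. It is written in Python for brevity and is not the driver's helper; `rmw_set_bit` and `write_pmc_bit` are hypothetical names, and the `bytearray` stands in for a PMC address range that the real driver would read and write over the wire.

```python
def rmw_set_bit(current_byte: int, bit: int, value: bool) -> int:
    """Return the byte to write back so that only `bit` changes."""
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be 0..7")
    if not 0 <= current_byte <= 0xFF:
        raise ValueError("byte value must be 0..255")
    mask = 1 << bit
    if value:
        return current_byte | mask
    return current_byte & ~mask & 0xFF


def write_pmc_bit(memory: bytearray, byte_index: int, bit: int, value: bool) -> None:
    """Read the containing byte, flip one bit, write the whole byte back."""
    current = memory[byte_index]                              # read
    memory[byte_index] = rmw_set_bit(current, bit, value)     # modify + write
```

Because the wire call is byte-addressed, the write always carries the full byte; anything that changes the other bits between the read and the write would be overwritten, which is one reason the safety callout matters.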
**PR F4-d — FOCAS password / unlock parameter (#3)**
- Scope: some controllers gate `cnc_wrparam` and certain reads behind
a connection-level password. Add `Password` to `FocasDeviceOptions`;
emit the FOCAS password block during connect (`cnc_wrunlockparam`
per FOCAS docs — confirm the exact command id during simulator
iteration). On any read/write that returns `EW_PASSWD`,
re-issue the password and retry once.
- Files: `Wire/FocasWireClient.cs` (`UnlockAsync`),
`FocasDriverOptions.cs` (`Password` field, treated as a secret —
redact in logs), `FocasDriver.cs` (call on connect).
- Tests: simulator extension — emit `EW_PASSWD` on writes when not
unlocked; assert the unlock+retry path.
- **Docs / fixture / e2e**:
- `docs/drivers/FOCAS.md` — new "FOCAS password" subsection under
Writes describing the optional `Password` device-option, when
the CNC requires it (16i + some 30i firmwares with
parameter-protect on), and the redaction guarantee.
- **Security-note in `docs/v2/focas-deployment.md`** — significant
addition: a "FOCAS password handling" subsection covering
storage in `appsettings.json` (and the dev redaction pattern at
`.local/`), the no-log invariant, and a runbook for password
rotation. Cross-link to `docs/Security.md`.
- `docs/Driver.FOCAS.Cli.md` — add a `--cnc-password` flag row to
the "Common flags" table with the redaction note.
- Document `cnc_wrunlockparam` (or the resolved command id) in
`docs/v2/implementation/focas-wire-protocol.md`; resolve the
open question raised by F4-d into the doc.
- `docs/v2/implementation/focas-simulator-plan.md` — add
`cnc_wrunlockparam` to the protocol surface; document the
per-profile `unlock_password` field on the JSON profile schema.
- Extend focas-mock with locked-state semantics on parameter
writes (already half-stubbed in F4-b's `EW_PASSWD` branch);
add `cnc_wrunlockparam` handler; add `mock_set_password`
admin endpoint so integration tests can pin the unlock value.
- New `Series/PasswordUnlockTests.cs` integration test asserts
a write returning `EW_PASSWD` triggers exactly one unlock
retry, and the second write succeeds.
- `scripts/e2e/test-focas.ps1` — add `-CncPassword` parameter,
threaded through to the CLI for the `-Write` stage.
- Effort: small, once F4-a and F4-b are in.
- Risk: Medium — password storage. Use the existing
`appsettings.json` redaction pattern (memory entry: `dohertj2`
AppData path); never log the password value.
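
The "unlock, then retry exactly once" behaviour can be sketched as follows. This is a Python illustration of the control flow only; `PasswordRequired` is a hypothetical stand-in for a wire call that returned `EW_PASSWD`, and `call_with_unlock` is not a name from the driver.

```python
class PasswordRequired(Exception):
    """Stand-in for a wire call that came back with EW_PASSWD."""


def call_with_unlock(operation, unlock):
    """Run `operation`; on EW_PASSWD, unlock once and retry exactly once.

    `operation` and `unlock` are zero-argument callables supplied by the caller.
    """
    try:
        return operation()
    except PasswordRequired:
        unlock()            # re-issue the password block
        return operation()  # a second EW_PASSWD propagates to the caller
```

Bounding the retry at one attempt keeps a wrong password from turning into a tight unlock loop against the controller; the password itself never needs to appear in the exception or in any log line.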
### Phase 5 — derived telemetry
**PR F5-a — Cycle time per part / last cycle delta (#24 derivation)**
- Scope: with `Production/CycleTimeSeconds` in place from F1-b and
the parts-count from `cnc_rdparam`, compute "last completed cycle"
as the delta in `Timers/CycleSeconds` between successive
parts-count increments. Project `Production/LastCycleSeconds`,
`Production/LastCycleStartUtc`.
- Files: `FocasDriver.cs` only — pure derivation in the program-poll
cadence handler.
- Tests: simulate a parts-count increment from 5→6; assert
`LastCycleSeconds` equals the cycle-timer delta over the same
window.
- **Docs / fixture / e2e**: add `Production/LastCycleSeconds` and
`Production/LastCycleStartUtc` rows to the fixed-tree table in
`docs/drivers/FOCAS.md` with the rollover / counter-reset
behaviour documented; add a `Derived telemetry` callout in
`docs/v2/focas-deployment.md` explaining the derivation is
client-visible only (no new wire calls); no
`docs/v2/implementation/focas-wire-protocol.md` change (pure
derivation); no focas-mock change beyond `FocasSimFixture`'s
existing parameter-patch / timer-patch helpers — add a
`SimulateCycleCompletionAsync` convenience helper that increments
parts-count and advances the cycle timer atomically; new
`Series/CycleDeltaTests.cs` integration test simulates a 5→6
parts-count transition; no `scripts/e2e/test-focas.ps1` change.
- Effort: small.
- Risk: Low — pure derivation.
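
The derivation can be sketched as a small state machine. The Python below is illustrative only: `CycleDeltaTracker` is a hypothetical name, the "strictly +1" rule is taken from the cycle-delta open question later in this plan, and treating a backwards-running timer as "no value" is an added assumption. Mapping the no-value case to `BadOutOfRange` is left out.

```python
from typing import Optional


class CycleDeltaTracker:
    """Derives last-cycle seconds from successive (parts count, cycle timer) polls."""

    def __init__(self) -> None:
        self._last_count: Optional[int] = None
        self._last_timer: Optional[float] = None

    def observe(self, parts_count: int, cycle_timer_seconds: float) -> Optional[float]:
        """Return the cycle delta when the count rose by exactly 1, else None."""
        delta = None
        if self._last_count is not None and parts_count == self._last_count + 1:
            candidate = cycle_timer_seconds - self._last_timer
            if candidate >= 0:  # assumption: a timer that ran backwards yields no value
                delta = candidate
        # Re-baseline on the first poll and on every count change (including resets).
        if self._last_count is None or parts_count != self._last_count:
            self._last_count = parts_count
            self._last_timer = cycle_timer_seconds
        return delta
```

Polling `(5, 100.0)` and then `(6, 142.5)` yields `42.5`, matching the 5→6 test described above; a reset to 0 or a jump of more than one part yields no value and simply re-baselines.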
## Documentation, fixture, and e2e impact
Consolidated view of every doc, fixture, and e2e artefact this plan
touches. FOCAS has the largest doc surface of any driver in the v2
roadmap because Phase 4 reverses a long-standing read-only design
choice that is referenced from at least three user-facing docs and one
test-fixture doc.
### Docs touched (per file, with the heaviest PR called out)
| Doc | Touched by | Heaviest change |
| --- | --- | --- |
| `docs/drivers/FOCAS.md` | F1-a, F1-b, F1-c, F1-d, F1-e, F1-f, F2-a, F2-b, F2-d, F3-a, F4-a, F4-b, F4-c, F4-d, F5-a | **F4-a** revokes the read-only callout at lines 14-18; **F2-b** adds the multi-path subsection |
| `docs/drivers/FOCAS-Test-Fixture.md` | F1-a, F1-d, F1-f, F2-a, F2-b, F3-a, F4-a | **F4-a** revokes the "`IWritable` intentionally returns `BadNotWritable`" callout at lines 42-43 |
| `docs/Driver.FOCAS.Cli.md` | F1-b, F1-c, F2-a, F2-b, F4-a, F4-b, F4-c, F4-d | **F4-a** qualifies the read-only stance at lines 100-116; **F4-d** adds `--cnc-password` flag |
| `docs/v2/focas-deployment.md` | F1-f, F3-a, F4-a, F4-b, F4-c, F4-d | **F4-b** adds "Write safety" section; **F4-d** adds "FOCAS password handling" section |
| `docs/v2/focas-version-matrix.md` | F1-c, F1-d, F2-a, F2-c, F2-d | **F1-d** adds capability-suppression rows for tooling/offsets |
| `docs/v2/implementation/focas-wire-protocol.md` | F1-a, F1-c, F1-d, F1-e, F1-f, F2-a, F2-b, F2-d, F3-a, F4-b, F4-c, F4-d | **F1-d** documents three new structs (ODBTOFS, ODBTLIFE5, IODBZOR); **F4-d** resolves the `cnc_wrunlockparam` open question |
| `docs/v2/implementation/focas-simulator-plan.md` | F1-c, F1-d, F1-e, F1-f, F2-a, F2-b, F2-d, F3-a, F4-a, F4-b, F4-c, F4-d | Each PR appends to the protocol surface table; F4-* close out Stream C status |
| `docs/v2/decisions.md` (new entry) | F4-a | Net-new decision-record for the read-only reversal |
| `docs/featuregaps.md` | F4-a | Updates Build = Yes annotation for #1 / #3 with "shipping behind flag" |
### Fixture (focas-mock) extensions
The vendored Python `focas-mock` simulator under
`tests/.../IntegrationTests/Docker/focas-mock/` gains the following
new command-id handlers and per-profile state:
| PR | Mock extension |
| --- | --- |
| F1-a | `cnc_rdcncstat` full-struct response |
| F1-b | Seeded values for parameters 6711/6712/6713 in every profile JSON |
| F1-c | New `cnc_modal` handler + canned modal payload per profile |
| F1-d | `cnc_rdtofs` / `cnc_rdtlife*` / `cnc_rdzofs` handlers + per-profile tool/offset tables, plus a `tools_per_series` profile knob |
| F1-e | `cnc_rdopmsg3` / `cnc_rdactpt` handlers + `mock_patch_opmsg` admin endpoint |
| F1-f | `cnc_getfigure` handler + per-profile `decimal_places` field |
| F2-a | `cnc_rddiag` / `cnc_rddiagdgn` handlers + per-profile diagnostic numbers |
| F2-b | Per-path state isolation; new `path_count` profile field; new `thirtyone_i_dual_path` compose profile |
| F2-c | No mock change (16i profile already declares F/G ranges) |
| F2-d | Wire-call counter admin endpoint |
| F3-a | Ring-buffer alarm history + `mock_patch_alarmhistory` admin endpoint |
| F4-a | Stub branch returning `BadNotSupported` for write commands |
| F4-b | `cnc_wrparam` / `cnc_wrmacro` handlers (with `EW_PASSWD` when locked); `mock_get_last_write` admin endpoint |
| F4-c | `pmc_wrpmcrng` handler with byte-aligned write semantics |
| F4-d | `cnc_wrunlockparam` handler; `mock_set_password` admin endpoint; locked-state on the param-write path |
| F5-a | `SimulateCycleCompletionAsync` helper on `FocasSimFixture` (no new mock command) |
`FocasSimFixture` (in
`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/FocasSimFixture.cs`)
gains corresponding admin-API client helpers for each new endpoint.
### Integration tests (per phase)
| Phase | New / extended integration tests under `tests/.../FOCAS.IntegrationTests/Series/` |
| --- | --- |
| Phase 1 | `StatusFlagsPopulateTests.cs`, `ProductionPopulatesTests.cs`, `ModalPopulatesTests.cs`, `ToolingPopulatesTests.cs`, `OffsetsPopulatesTests.cs`, `OperatorMessagesPopulateTests.cs`, `DecimalScalingTests.cs`, `DiagnosticsCountersTests.cs` |
| Phase 2 | `DiagAddressTests.cs`, `MultiPathTests.cs`, `PmcCoalescingTests.cs` (plus a 16i row in `FocasCapabilityMatrixTests.cs` for F2-c) |
| Phase 3 | `AlarmHistoryProjectionTests.cs` |
| Phase 4 | `ParameterWriteTests.cs`, `MacroWriteTests.cs`, `PmcRangeWriteTests.cs`, `PmcBitRmwIntegrationTests.cs`, `PasswordUnlockTests.cs` plus ACL tests under `tests/ZB.MOM.WW.OtOpcUa.IntegrationTests/Authz/FocasWriteAclTests.cs` and `FocasPmcWriteAclTests.cs` |
| Phase 5 | `CycleDeltaTests.cs` |
### E2E script (`scripts/e2e/test-focas.ps1`) updates
| PR | Change |
| --- | --- |
| F1-f | New `-CheckDecimalScaling` switch |
| F2-b | New `-Paths` switch (matrix mode iterates per declared path) |
| F2-c | Adds F0.0 / G50.5 probes to the 16i row of the per-series matrix |
| F4-a | Adds `-Write` switch (no-op stage in F4-a; populated by F4-b/c) |
| F4-b | Populates `-Write` stage with macro + parameter round-trip writes |
| F4-c | Extends `-Write` stage with PMC bit round-trip |
| F4-d | Adds `-CncPassword` parameter, threaded through to the CLI |
`scripts/integration/run-focas.ps1` does not change shape across the
plan — it remains the compose up/test/compose down wrapper. New
profiles registered by F2-b are automatically picked up via the
existing `-Profile` switch.
### Read-only callouts requiring revocation in Phase 4
For reviewer benefit, the explicit read-only callouts that **F4-a
must revoke or qualify** in the same PR that flips the design choice:
- `docs/drivers/FOCAS.md` lines 14-18 ("OtOpcUa is **read-only**
against FOCAS… Writes return `BadNotWritable` by design.")
- `docs/drivers/FOCAS-Test-Fixture.md` lines 42-43 ("`IWritable`
intentionally returns `BadNotWritable` — OtOpcUa is read-only
against FOCAS.")
- `docs/Driver.FOCAS.Cli.md` lines 100-116 (write section is already
documented but predates the server-side flag; needs a
server-enforced-ACL note)
- `docs/featuregaps.md` (FOCAS row entries for #1 and #3 carry the
same read-only-by-design framing — flip annotation)
## Skip-rated items (for context)
These appear in the featuregaps recommendations table as Build = No;
recapped here so reviewers can confirm the scope decision rather than
re-deriving it from `featuregaps.md`:
- **#2 HSSB transport** — PCI hardware, declining install base,
reopens the Fwlib distribution problem the wire client deliberately
closed.
- **#5 Series 15 / Power Mate D-H / Series 35i** — very legacy; small
install base. Capability matrix already accepts `Unknown` as a
permissive escape hatch.
- **#9 Tool-offset write** — write-heavy; defer alongside the general
write decision (F4 covers reads via tool-life only).
- **#19 Program list / upload / download / delete** — DNC product
territory; significant scope; out of OtOpcUa's MES focus.
- **#21 DPRNT TCP listener** — significant scope; modern OPC UA
alarms / events supersede it.
- **#22 Servo / spindle deep info (`cnc_rdsvinfo` / `cnc_rdspinfo`)** —
specialty; load-percent already covers most needs.
- **#23 Per-axis acceleration / jerk / feed-per-rev** — niche
advanced telemetry.
- **#25 Operator write commands (preset, `cnc_setpath`, `cnc_wrabsmac`)** —
read-only-by-design covers it; parameter / PMC / macro writes from
Phase 4 are the supervisory writes operators actually need.
- **#26 CNC time / date sync** — rare ask; commonly handled by CNC NTP.
## Open questions
- **Modal command id** (PR F1-c): `cnc_modal` numeric command code is
not in the existing wire-protocol notes
(`docs/v2/implementation/focas-wire-protocol.md`). Capture during
the simulator iteration loop; if the simulator can't yet emit the
shape, gate F1-c behind a bench-CNC trace per the
diminishing-returns checkpoint.
- **Override parameter numbers** (PR F1-c): feedrate / rapid /
spindle override register numbers are MTB-specific. Default to the
documented Fanuc factory numbers and let operators override per
device (`Devices[].OverrideRegisters` map).
- **Multi-path discovery** (PR F2-b): does the simulator support
multi-path responses today? If not, F2-b lands gated behind the
`OTOPCUA_FOCAS_SIM_WIRE_COMPAT=1` flag the wire-protocol doc
describes.
- **Decimal-scaling migration** (PR F1-f): existing `Float64` axis
nodes are scaled integers today. Decision: scaling-on is the eventual
default, but F1-f ships with the flag default-off for a one-release
deprecation window so existing dashboards don't silently scale by
10^N when the driver is upgraded. Operators must opt in explicitly.
- **Write security posture** (Phase 4): should writes require LDAP
group `WriteConfigure` (parameters) vs `WriteOperate` (macros /
PMC)? Per the memory entry on ACL-at-server-layer, the driver only
reports `SecurityClassification`; the server enforces. Need the
driver to surface the right classification per address kind:
`Configure` for `PARAM:`, `Operate` for `MACRO:` and PMC writes.
- **Phase 4 rollout**: ship behind a feature flag in `appsettings.json`
(`Drivers.{name}.Config.Writes.Enabled`) with `false` default for at
least one release before flipping the default. Update
`docs/drivers/FOCAS.md` and `docs/featuregaps.md` in the same PR
that flips the default.
- **Cycle-delta edge cases** (PR F5-a): parts-count rollover; counter
reset by the operator. Default behaviour: emit the delta only when
the counter strictly increments by 1; on any other transition emit
`Production/LastCycleSeconds` as `null` with `BadOutOfRange` and
let the operator interpret.
@@ -0,0 +1,863 @@
# OpcUaClient Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → OpcUaClient](../featuregaps.md#opcuaclient-opc-ua-aggregation-client)
>
> Covers Build = Yes items only. Numbering matches the featuregaps Recommendations table.
## Summary
The OpcUaClient driver already ships 8/8 capability interfaces and a working
end-to-end Session/Subscription/MonitoredItem/HistoryRead pipeline backed by
the OPC Foundation `OPCFoundation.NetStandard.Opc.Ua.Client` SDK. Most of the
14 Build = Yes gaps are operability or curation knobs — config surface +
plumbing into existing SDK calls — rather than new protocol implementation.
A small number need genuinely new SDK plumbing (Reverse Connect,
ModelChangeEvent subscribe) and one (`ReadEventsAsync`) needs a coordinated
cross-driver interface change.
The plan groups the work into five phases, ordered to deliver per-tag /
per-subscription operability first (highest-frequency operator pain), then
curation, then change tracking, then connectivity, then historical+HA. Each
PR sticks to one feature-gap row so reviews stay narrow.
## Phased delivery
| Phase | Theme | Gaps | PRs | Notes |
| :---: | --- | --- | :---: | --- |
| 1 | Operability knobs | #5, #6, #15, #17, #20 | 5 | Pure SDK config surface; no new wire flows |
| 2 | Discovery & curation | #2, #7, #8, #9 | 4 | Touches `ITagDiscovery` + adds method invoke |
| 3 | Change tracking | #10 | 1 | New session-level subscription on `Server` node |
| 4 | Connectivity | #1 | 1 | Reverse Connect — new listener path |
| 5 | Historical & redundancy | #12, #13, #14 | 3 | Includes the cross-driver `IHistoryProvider` change |
**Total: 14 PRs across 5 phases.** Phases 1-3 land independently against
the existing single-session model. Phase 4 ships in parallel with phases 2-3
since it doesn't touch `OpcUaClientDriver` proper. Phase 5's first PR is a
prerequisite for the `ReadEventsAsync` work in every other history-capable
driver and must coordinate with them.
## Per-PR detail
### Phase 1 — Operability knobs
#### PR-1: Per-subscription tuning (gap #6)
**Goal**: lift the hard-coded `KeepAliveCount=10`, `LifetimeCount=1000`,
`MaxNotificationsPerPublish=0`, `Priority=0`, `PublishingInterval` floor of
50 ms into `OpcUaClientDriverOptions` so high-event-rate servers can be
defended against (`MaxNotificationsPerPublish=0` is unlimited — the
documented DoS surface) and high-tag-count deployments can split by
priority.
**SDK API**:
- `Subscription.SetPublishingMode(bool, ct)` for runtime enable/disable
- `SubscriptionOptions.PublishingInterval / KeepAliveCount / LifetimeCount /
MaxNotificationsPerPublish / Priority` set at create-time
- New options class `OpcUaSubscriptionDefaults` (publish interval floor,
keep-alive count, lifetime count, max notifications, priority)
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add `Subscriptions`
sub-section
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — `SubscribeAsync` reads from
options
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — `SubscribeAlarmsAsync` reuses
the same defaults but with `Priority=1` (higher than the data
subscriptions' `Priority=0`) so alarms aren't starved during data bursts
**Tests**: `OpcUaClientSubscribeAndProbeTests` — assert options propagate;
add a stress unit test (mocked `Subscription`) that asserts custom
`MaxNotificationsPerPublish` is forwarded so a value > 0 actually reaches
the SDK.
**Risks**: Setting `LifetimeCount` too low against a server with
publish-throttling can drop subscriptions; document the formula
(`LifetimeCount >= 3 * KeepAliveCount`).
**Docs / fixture / e2e**: new "Subscription tuning" subsection in
`docs/drivers/OpcUaClient.md` (create if missing) documenting the
`Subscriptions` options block with the `LifetimeCount >= 3 *
KeepAliveCount` formula; cross-link from the "Advanced options" section
of `docs/Client.CLI.md` so CLI users discover the knobs. Fixture: opc-plc
already publishes fast tickers (`FastUInt1` @ 100 ms) sufficient for
coverage — no fixture-side change. Integration test in
`tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/` asserting
custom `KeepAliveCount` / `Priority` reach the wire (capture via
`OpcPlcFixture` keepalive count). E2E: extend
`scripts/e2e/test-opcuaclient.ps1` with a stage that sets a non-default
publish interval and confirms the local subscription honours it.
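
A minimal sketch of how the new defaults might be resolved before they reach the SDK, written in Python for illustration (the driver is C#). The field names mirror the options listed above, but `SubscriptionDefaults` and `resolve` are hypothetical, and clamping `LifetimeCount` upward instead of rejecting the configuration is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubscriptionDefaults:
    """Hypothetical mirror of the planned `Subscriptions` options block."""
    publishing_interval_floor_ms: float = 50.0
    keep_alive_count: int = 10
    lifetime_count: int = 1000
    max_notifications_per_publish: int = 0  # 0 means unlimited
    priority: int = 0


def resolve(defaults: SubscriptionDefaults, requested_interval_ms: float) -> Tuple[float, int]:
    """Return (publishing interval, lifetime count) after applying the floor
    and the LifetimeCount >= 3 * KeepAliveCount rule."""
    if defaults.keep_alive_count < 1:
        raise ValueError("keep_alive_count must be at least 1")
    interval = max(requested_interval_ms, defaults.publishing_interval_floor_ms)
    lifetime = max(defaults.lifetime_count, 3 * defaults.keep_alive_count)
    return interval, lifetime
```

With the current hard-coded values a 10 ms request resolves to `(50.0, 1000)`; a configuration of `KeepAliveCount=50`, `LifetimeCount=100` is lifted to a lifetime of 150.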
---
#### PR-2: Per-tag advanced subscription tuning incl. deadband (gap #5)
**Goal**: surface `SamplingInterval`, `QueueSize`, `DiscardOldest`,
`MonitoringMode`, and `DataChangeFilter` (DeadbandType=Absolute/Percent +
Trigger=Status/StatusValue/StatusValueTimestamp) per-tag. Deadband is the
baseline analog noise filter every commercial UA aggregator ships and the
single feature most likely to cut bandwidth on busy plants.
**SDK API**:
- `MonitoredItem.Filter = new DataChangeFilter { Trigger =
DataChangeTrigger.StatusValue, DeadbandType = (uint)DeadbandType.Absolute,
DeadbandValue = 0.5 }`
- `MonitoredItemOptions.QueueSize / DiscardOldest / SamplingInterval /
MonitoringMode`
- Per-tag override structure: extend the `SubscribeAsync` parameter shape
(or add an overload accepting an `IReadOnlyList<MonitoredTagSpec>`) — note
this requires coordinating with `ISubscribable` so the per-tag carrier
reaches the driver.
**Files**:
- `src/.../Core.Abstractions/ISubscribable.cs` — add overload
`SubscribeAsync(IReadOnlyList<MonitoredTagSpec>, ...)` keeping old API
for source compat
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — translate spec → SDK filter
**Tests**: assert `DataChangeFilter` lands on the `MonitoredItem.Filter` for
each kind of trigger; assert PercentDeadband requires server-side
EURange (server returns `BadFilterNotAllowed` if not configured) — capture
the StatusCode and surface as a usable error.
**Risks**: cross-cutting `ISubscribable` change. Mitigation: ship the
overload as additive — existing single-arg path still exists.
**Docs / fixture / e2e**: new "Per-tag deadband and monitoring filters"
section in `docs/drivers/OpcUaClient.md` (create if missing) with worked
examples of Absolute vs Percent deadband + the EURange prerequisite;
update `docs/Client.CLI.md` `subscribe` command page with the new
tag-config syntax for `--deadband` / `--queue-size` / `--discard-oldest`;
update `docs/Client.UI.md` Subscriptions tab section to mirror. Fixture:
`OpcPlcFixture` / `OpcPlcProfile` seeds an analog (`StepUp` already
oscillates) and confirms `EURange` is published — extend the profile to
flag noisy nodes. Integration test in
`tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/` asserts
publish suppression below the deadband threshold. E2E: add a
`-DeadbandValue` stage to `scripts/e2e/test-opcuaclient.ps1` (and a
`deadband` knob to `scripts/e2e/e2e-config.sample.json`) that subscribes
and asserts no spurious updates within the band.
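
For the worked examples in the docs, the server-side deadband check can be summarised in a few lines. This Python sketch follows the comparison rules as usually stated for OPC UA (absolute: the change must exceed the deadband value; percent: the change must exceed that percentage of the EURange span). `exceeds_deadband` is a hypothetical helper for illustration, not an SDK call, and the filtering itself happens on the upstream server.

```python
from typing import Optional, Tuple


def exceeds_deadband(
    last: float,
    current: float,
    deadband_type: str,
    deadband_value: float,
    eu_range: Optional[Tuple[float, float]] = None,
) -> bool:
    """Return True when a value change should be reported."""
    change = abs(current - last)
    if deadband_type == "None":
        return change > 0
    if deadband_type == "Absolute":
        return change > deadband_value
    if deadband_type == "Percent":
        if eu_range is None:
            # Mirrors the EURange prerequisite: no range, no percent deadband.
            raise ValueError("Percent deadband requires an EURange")
        low, high = eu_range
        return change > (deadband_value / 100.0) * (high - low)
    raise ValueError("unknown deadband type: " + deadband_type)
```

With an absolute deadband of 0.5, a move from 10.0 to 10.4 is suppressed and a move to 10.6 is reported; with a 1 % deadband on an EURange of 0 to 200, the threshold is 2.0 engineering units.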
---
#### PR-3: Honor server `OperationLimits` (gap #15)
**Goal**: read `Server.ServerCapabilities.OperationLimits.MaxNodesPerRead /
Write / Browse / HistoryReadData` once after Session activation, cache,
and chunk batch operations to those caps client-side. Today the SDK chunks
on its internal default; against an undersized embedded UA server this
results in `BadTooManyOperations`.
**SDK API**:
- After session open: `Session.ReadAsync` of
`VariableIds.Server_ServerCapabilities_OperationLimits_MaxNodesPerRead`
and its sibling NodeIds. The SDK exposes `Session.OperationLimits` after
`FetchOperationLimits` is called — prefer that path.
- `Session.FetchOperationLimitsAsync(ct)` (1.5+); fallback: explicit Read.
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — call
`FetchOperationLimitsAsync` post-`OpenSessionOnEndpointAsync`; honour
caps in `ReadAsync`, `WriteAsync`, `BrowseRecursiveAsync`,
`EnrichAndRegisterVariablesAsync`, `ExecuteHistoryReadAsync`.
**Tests**: mock `Session.OperationLimits` to a value below the test batch
size and assert the driver issues N wire calls instead of one.
**Risks**: a zero on the server means "no limit" per Part 5 — don't divide
by zero.
**Docs / fixture / e2e**: new "Server OperationLimits handling"
subsection in `docs/drivers/OpcUaClient.md` documenting the auto-fetch
behaviour, the zero-means-unlimited semantics, and how to override via
options if the server reports an under-truthful value. Fixture: opc-plc
publishes the standard ServerCapabilities tree out of the box — no
container-side change; the `OpcPlcFixture` seed validates the IDs at
collection init. Integration test asserts batch reads chunk to the
fetched cap. No e2e change needed (the script's batch sizes are already
small).
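
The client-side chunking, including the zero-means-unlimited case called out under Risks, can be sketched as below. This is a Python illustration; `chunk_by_limit` is a hypothetical helper name, and the real driver would apply the same rule per operation kind (read, write, browse, history read).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_by_limit(items: Sequence[T], max_per_call: int) -> List[List[T]]:
    """Split `items` into batches no larger than the server's advertised cap.

    A cap of 0 means "no limit", so everything goes out in one call
    and no division by the cap is ever performed.
    """
    if max_per_call < 0:
        raise ValueError("operation limit cannot be negative")
    if not items:
        return []
    if max_per_call == 0:
        return [list(items)]
    return [list(items[i:i + max_per_call]) for i in range(0, len(items), max_per_call)]
```

A batch of 250 node ids against `MaxNodesPerRead=100` becomes three wire calls of 100, 100, and 50, which is the "N wire calls instead of one" behaviour the unit test asserts.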
---
#### PR-4: Diagnostics counters (gap #17)
**Goal**: expose per-driver counters on `DriverHealth` (or a sibling
`DriverDiagnostics` surface): publish-request count,
notifications-per-second EWMA, missing-publish-request count,
dropped-notification rate, and session-reset count. Operators currently
see only `LastSuccessfulRead` and the last error.
**SDK API**:
- `Subscription.Notification` event fires per published notification — bump
a counter
- `Subscription.PublishStatusChanged` event for missed-publish detection
- `Session.PublishError` event for channel-level errors
- `Session.SessionClosing`/`SessionConfigurationChanged` for session-reset
attribution
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — instrument hooks; expose via
`IDriver.GetDiagnostics()` or extend `DriverHealth`
- `src/.../Core.Abstractions/IDriver.cs` — confirm where the counter shape
lives; if `DriverHealth` is too rigid, add `IDriverDiagnostics` (mirrors
the Modbus `driver-diagnostics` RPC pattern from #154)
**Tests**: synthetic notification fan-out → assert counters increment;
session close → assert reset count bumps.
**Risks**: counters must be lock-free and safe on the hot path; use
`Interlocked.Increment` and a single sliding-window clock per counter.
**Docs / fixture / e2e**: new "Driver diagnostics" section in
`docs/drivers/OpcUaClient.md` enumerating each counter and the event
that bumps it; cross-link to the `driver-diagnostics` Admin RPC
documented for Modbus (#154 pattern). Fixture: no opc-plc change
required. Integration test exercises `IDriverDiagnostics` after
forcing a session close. E2E: extend
`scripts/e2e/test-opcuaclient.ps1` with a "diagnostics snapshot" stage
that asserts publish/notification counters are non-zero after the
subscribe stage.
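
One way to derive the notifications-per-second EWMA from a plain counter is to sample the counter periodically and fold the instantaneous rate into a running average. The Python sketch below shows only the arithmetic; `NotificationRate`, its `alpha` smoothing factor, and the sampling approach are assumptions, and it is not thread-safe (the C# driver would use `Interlocked.Increment` on the hot path, as noted under Risks).

```python
class NotificationRate:
    """Total counter plus an exponentially weighted per-second rate."""

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.total = 0                # monotonically increasing notification count
        self.ewma_per_second = 0.0
        self._sampled_total = 0       # value of `total` at the previous sample

    def on_notification(self, count: int = 1) -> None:
        """Hot path: just bump the counter."""
        self.total += count

    def sample(self, elapsed_seconds: float) -> float:
        """Cold path: fold the notifications since the last sample into the EWMA."""
        if elapsed_seconds <= 0:
            raise ValueError("elapsed_seconds must be positive")
        instantaneous = (self.total - self._sampled_total) / elapsed_seconds
        self._sampled_total = self.total
        self.ewma_per_second = (
            self.alpha * instantaneous + (1.0 - self.alpha) * self.ewma_per_second
        )
        return self.ewma_per_second
```

Keeping the per-notification work to a single increment and doing the floating-point update on a timer is what keeps the hot path cheap.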
---
#### PR-5: CRL / revocation handling (gap #20)
**Goal**: explicit revoked-cert handling in `CertificateValidator` plus a
`RejectSHA1SignedCertificates` knob. Today the validator hooks
`BadCertificateUntrusted` only — a revoked cert silently fails as
"untrusted" with no operator-visible distinction.
**SDK API**:
- `CertificateValidator.CertificateValidation` event — inspect
`e.Error.StatusCode` for `BadCertificateRevoked`,
`BadCertificateRevocationUnknown`,
`BadCertificateIssuerRevocationUnknown`,
`BadCertificatePolicyCheckFailed`
- `SecurityConfiguration.RejectSHA1SignedCertificates`,
`SecurityConfiguration.RejectUnknownRevocationStatus`,
`SecurityConfiguration.MinimumCertificateKeySize` — direct config
bool/int knobs already on the SDK type
- `CertificateTrustList.AddCRL` / per-store CRL directories under
`%LocalAppData%\OtOpcUa\pki\{trusted,issuers}\crl\`
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — `BuildApplicationConfigurationAsync`
honours the new options; the validator handler distinguishes revoked from
untrusted in the surfaced error message
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add
`RejectSHA1SignedCertificates`, `RejectUnknownRevocationStatus`,
`MinimumCertificateKeySize`
**Tests**: feed a SHA1-signed test cert and a revoked cert through the
validator with the new knobs on/off.
**Risks**: PKI directory layout changes — existing deployments need a
migration note.
**Docs / fixture / e2e**: new "Certificate revocation and SHA1 rejection"
subsection in `docs/drivers/OpcUaClient.md` documenting the CRL
directory layout under `%LocalAppData%\OtOpcUa\pki\{trusted,issuers}\crl\`
and the new options (with a migration note for existing PKI stores);
cross-link from `docs/Security.md`. Fixture: extend
`OpcPlcFixture` / `Docker/docker-compose.yml` with an optional secured
endpoint variant and a SHA1-signed test cert checked into the test
project's resources for the validator unit test. Integration test
exercises a revoked cert via a local CRL drop. E2E: add a
`-Insecure:$false` smoke stage to `scripts/e2e/test-opcuaclient.ps1`
that asserts a revoked cert produces a distinguishable error message.
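
The operator-visible distinction can be as simple as bucketing the validator's status code before building the error message. The Python sketch below works on status-code names only; `classify_cert_failure` and the bucket labels are hypothetical, and `BadCertificateIssuerRevoked` is included as a standard OPC UA status code that the list above does not mention.

```python
REVOKED = {"BadCertificateRevoked", "BadCertificateIssuerRevoked"}
REVOCATION_UNKNOWN = {
    "BadCertificateRevocationUnknown",
    "BadCertificateIssuerRevocationUnknown",
}


def classify_cert_failure(status_name: str) -> str:
    """Map a certificate-validation status code name to a coarse bucket."""
    if status_name in REVOKED:
        return "revoked"
    if status_name in REVOCATION_UNKNOWN:
        return "revocation-unknown"
    if status_name == "BadCertificatePolicyCheckFailed":
        return "policy"
    if status_name == "BadCertificateUntrusted":
        return "untrusted"
    return "other"
```

The e2e smoke stage can then assert on the bucket (for example `revoked`) without depending on the exact wording of the message.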
---
### Phase 2 — Discovery & curation
#### PR-6: Discovery URL `FindServers` (gap #2)
**Goal**: accept a discovery URL (`opc.tcp://host:4840` pointing at the
LDS or the server's own discovery endpoint) and surface advertised servers
and endpoints to the operator without manually copying policy/mode tuples.
**SDK API**:
- `DiscoveryClient.CreateAsync(appConfig, new Uri(url), DiagnosticsMasks.None, ct)`
- `DiscoveryClient.FindServersAsync(null, ct)` → `ApplicationDescription[]`
- `DiscoveryClient.GetEndpointsAsync(null, ct)` per advertised `DiscoveryUrl`
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — new internal
`DiscoverServersAsync` helper; extend the Admin-side discovery RPC to
invoke it (driver-diagnostics pattern from #154)
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add
`DiscoveryUrl` knob (alternative to explicit `EndpointUrls` — when set
the driver runs `FindServers` at init and feeds the result into the
failover candidate list)
**Tests**: mock `DiscoveryClient` returning two advertised servers each
with three endpoints; assert the candidate list reflects the policy/mode
filter applied client-side.
**Risks**: `FindServers` itself usually requires `SecurityMode=None`;
spell out in the doc that the discovery channel is unsecured even when
the data channel will be encrypted.
**Docs / fixture / e2e**: new "Discovery URL (`FindServers`)" section in
`docs/drivers/OpcUaClient.md` with the unsecured-discovery-vs-secured-data
caveat called out; cross-link from `docs/Client.CLI.md` if a
`discover` CLI command surfaces. Fixture: opc-plc already responds to
`FindServers` on the same endpoint — `OpcPlcFixture` adds a discovery
probe at collection init. Integration test exercises the helper against
the live opc-plc container and asserts that at least one
`ApplicationDescription` is returned. E2E: replace the hard-coded
`-RemoteUrl` stage in `scripts/e2e/test-opcuaclient.ps1` with an
optional `-DiscoveryUrl` mode that picks the first advertised endpoint.
---
#### PR-7: Selective import / namespace remap (gap #7)
**Goal**: per-branch include/exclude rules, namespace-URI remapping, and
re-keyed BrowseNames — the curation surface every commercial aggregator
ships.
**Approach**: extend `OpcUaClientDriverOptions` with a `Curation` section:
- `IncludePaths: string[]` — glob or NodeId-rooted prefix list; only paths
matching are imported
- `ExcludePaths: string[]` — wins over Include (Include is allow-list,
Exclude is block-list)
- `NamespaceRemap: Dictionary<string,string>` — upstream NS URI →
local-side alias for BrowseName generation
- `RootAlias: string` — default `"Remote"`; replaces the hardcoded folder
name today
**SDK API** — none new; this is pure local filtering inside
`BrowseRecursiveAsync` and `EnrichAndRegisterVariablesAsync`.
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs`
- `src/.../OpcUaClient/OpcUaClientDriver.cs`:
`BrowseRecursiveAsync` consults the rule set; helper
`MapNamespaceForBrowseName` handles NS remap
**Tests**: synthetic browse tree, exercise include/exclude/remap each
independently and combined; verify the cap accounting in
`MaxDiscoveredNodes` excludes filtered nodes.
**Risks**: glob semantics — pin to a small subset (`*`, `?` only — no
character classes or `**`) to keep the doc + behaviour simple.
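A minimal sketch of the restricted glob subset and the Exclude-wins rule, written in Python purely for illustration. `glob_to_regex` and `is_imported` are hypothetical helpers, and two behaviours are assumptions the plan does not pin down: `*` is allowed to cross `/` path separators, and an empty Include list imports everything.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob that supports only `*` and `?` into a regex.

    Every other character, including `[` and `]`, is matched literally,
    so character classes are deliberately unsupported.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # assumption: `*` may span path separators
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def is_imported(path: str, include: list, exclude: list) -> bool:
    """Apply the block-list first, then the allow-list."""
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False  # Exclude wins over Include
    if not include:
        return True   # assumption: no Include rules means import everything
    return any(glob_to_regex(p).fullmatch(path) for p in include)
```

For example, with `include=["Plant/*"]` and `exclude=["*/Debug*"]`, the path `Plant/Line1/Temp` is imported while `Plant/Line1/DebugCounter` is filtered out.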
**Docs / fixture / e2e**: new "Curation: include/exclude and namespace
remap" section in `docs/drivers/OpcUaClient.md` with worked examples of
each rule kind and the supported glob subset; update
`docs/drivers/OpcUaClient-Test-Fixture.md` "Coverage map" with the new
filtering rows. Fixture: extend `OpcPlcProfile` to enumerate which
upstream namespaces are exercised so curation tests can target them.
Integration test seeds an Include + Exclude + Remap rule and asserts
the local tree reflects the filter. E2E: add a
`-IncludePath` / `-NamespaceRemap` set of params to
`scripts/e2e/test-opcuaclient.ps1` that asserts the local browse depth
matches the rule.
---
#### PR-8: Type definition mirroring (gap #8)
**Goal**: walk the upstream `Types` folder (`ObjectTypes`,
`VariableTypes`, `DataTypes`, `ReferenceTypes`) and project them into the
local address space so downstream UI clients keep type-aware rendering and
structured DataTypes decode correctly.
**SDK API**:
- `Session.NodeCache.FetchNode(typeNodeId)` for type metadata
- `Session.LoadDataTypeSystem` — for structured DataType encoding
- `Session.FetchTypeTree(NodeIdCollection)` — populates the session's
type cache from the server
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — new pass-3 in `DiscoverAsync`
that walks `i=86` (Types folder) under the curation rules, registers a
parallel type subtree, and links variables to their TypeDefinition via
HasTypeDefinition references on the address-space builder
- `src/.../Core.Abstractions/IAddressSpaceBuilder.cs` — confirm whether
the builder accepts type nodes; if not, extend it (this likely is a
prerequisite — if so, it gets its own preceding PR-8a)
**Tests**: mock browse returning `BaseObjectType -> DerivedThing`;
assert local builder receives the type node + the HasTypeDefinition link.
**Risks**: significant. Type mirroring touches `IAddressSpaceBuilder`
which is a cross-cutting interface every driver depends on. If
`IAddressSpaceBuilder` already supports type nodes (Galaxy has type-like
templates), reuse that surface; otherwise this PR splits.
**Docs / fixture / e2e**: new "Type mirroring" section in
`docs/drivers/OpcUaClient.md` documenting which type nodes get walked
and how downstream UA clients see the HasTypeDefinition references; also
note in `docs/Client.UI.md` that the Browse tree now shows mirrored
types. Fixture: opc-plc already exposes the standard `Types` folder;
extend `OpcPlcProfile` to assert at least one custom ObjectType is
present. Integration test browses the local Types folder post-discovery
and asserts the upstream type chain landed. No e2e change needed beyond
extending the existing browse stage to walk under `Types`.
---
#### PR-9: Method node mirroring + `Call` passthrough (gap #9)
**Goal**: discover `NodeClass.Method` nodes in the browse pass, expose
them on the local address space, and forward `Call` invocations as
`Session.CallAsync` against the upstream node. The driver already calls
`AcknowledgeableConditionType.Acknowledge` for A&C — generalize that path.
**SDK API**:
- `Session.CallAsync(requestHeader, methodsToCall: CallMethodRequestCollection, ct)`
returning `CallMethodResultCollection`
- Browse already covers Method nodes by lifting the `NodeClassMask`; need
to additionally browse `HasProperty` to discover `InputArguments` /
`OutputArguments` for argument translation
**Files**:
- `src/.../Core.Abstractions/IDriver.cs` — add `IMethodInvoker` capability
interface (this is a NEW capability, not a tweak to an existing one)
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — implement
`IMethodInvoker.InvokeAsync(string objectId, string methodId,
IReadOnlyList<object?> inputs, ct)`; refactor `AcknowledgeAsync` to
reuse the common path
- `src/.../Server/...` node-manager — wire `IMethodInvoker` to the OPC UA
server's `MethodNode.OnCallMethod` hook so downstream Call requests
reach the driver
**Tests**: mock `Session.CallAsync` returning Good + an output collection;
assert pass-through fidelity. Also assert per-argument `BadInvalidArgument`
codes pass through.
**Risks**: high — adds a new capability interface. Other drivers that
*could* support methods (Galaxy via `OnExecute` scripts, FOCAS via FOCAS
commands) gain a clean extension point but each is its own follow-up.
**Docs / fixture / e2e**: new "Method nodes and Call passthrough"
section in `docs/drivers/OpcUaClient.md` explaining how method calls
flow through the aggregator (input/output argument translation,
error-code passthrough); add a `call` command page to `docs/Client.CLI.md`
covering the new path; mirror in `docs/Client.UI.md` if a UI surface
ships. Fixture: opc-plc already exposes the standard
`Server.GetMonitoredItems` method — `OpcPlcFixture` registers it as the
canonical method-call target. Integration test in
`tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/` invokes
`Server.GetMonitoredItems` through the aggregator. E2E: add a
`-MethodNodeId` stage to `scripts/e2e/test-opcuaclient.ps1` that calls
the method through the local server and asserts the output matches the
direct upstream call.
---
### Phase 3 — Change tracking
#### PR-10: Auto re-import on `ModelChangeEvent` (gap #10)
**Goal**: subscribe to `BaseModelChangeEventType` /
`GeneralModelChangeEventType` on the upstream server's `i=2253` Server
node so when the upstream topology changes (new tag added, type modified)
the driver triggers a `ReinitializeAsync`-style re-import without
operator action.
**SDK API**:
- A second `Subscription` on the Session, monitoring the `Server` node
(`ObjectIds.Server`) with an `EventFilter` whose SelectClauses reference
`BaseModelChangeEventType` and (optionally) the `Changes` property of
`GeneralModelChangeEventType`
- On notification: enqueue a debounced re-discover (don't react to every
event during a bulk topology edit; coalesce over a 2-5 s window)
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — add `_modelChangeSubscription`
field; new `SubscribeModelChangesAsync` invoked at the end of
`InitializeAsync`; debounce timer that calls `ReinitializeAsync` on the
driver host
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add
`WatchModelChanges: bool` (default true) +
`ModelChangeDebounce: TimeSpan` (default 5s)
**Tests**: synthetic event injection on the mock Session's notification
stream; assert one debounced re-import call regardless of N events
arriving in the window.
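The coalescing behaviour this test targets can be sketched with an explicit clock so the logic stays deterministic. This is an illustrative Python model, not the driver's timer code: `Debouncer` is a hypothetical name, and it assumes a trailing-edge debounce (each event restarts the window), which the plan does not state explicitly.

```python
class Debouncer:
    """Coalesce a burst of events into one action after a quiet window."""

    def __init__(self, window: float, action):
        self.window = window
        self.action = action
        self.deadline = None  # None means no re-import is pending

    def notify(self, now: float) -> None:
        # Each model-change event pushes the deadline out again.
        self.deadline = now + self.window

    def tick(self, now: float) -> None:
        # Called periodically; fires at most once per burst.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.action()


reimports = []
debouncer = Debouncer(window=5.0, action=lambda: reimports.append("reimport"))

# Ten events arrive within one second of a bulk topology edit.
for i in range(10):
    debouncer.notify(now=i * 0.1)

debouncer.tick(now=4.0)   # still inside the window, nothing fires
calls_before_deadline = len(reimports)
debouncer.tick(now=6.0)   # window elapsed, a single re-import fires
debouncer.tick(now=10.0)  # nothing pending, no second call
```

Driving the clock by hand like this is also one way the unit test could avoid real sleeps.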
**Risks**: re-import while a downstream client is mid-browse — needs
serialization on `_gate` like the rest of the driver; document that
clients see a brief gap in the address space during reload.
**Docs / fixture / e2e**: new "Auto re-import on ModelChangeEvent"
section in `docs/drivers/OpcUaClient.md` documenting the debounce window,
the `_gate` serialization, and the brief browse-gap during reload.
Fixture: opc-plc supports runtime topology mutation via the
`addnode`/`addtag` HTTP control endpoint — extend `OpcPlcFixture` with
a helper that triggers a model change. Integration test asserts a
single re-import call after a burst of synthetic model change events.
E2E: add a "topology change" stage to
`scripts/e2e/test-opcuaclient.ps1` that calls the opc-plc control
endpoint, then asserts the local server reflects the new node within
the debounce window.
---
### Phase 4 — Connectivity
#### PR-11: Reverse Connect (gap #1)
**Goal**: support server-initiated client connect for OT-DMZ outbound-only
firewalls. The upstream server connects *to* us on a TCP listener; we
respond as the client. Hard requirement for many regulated plant networks.
**SDK API**:
- `Opc.Ua.Client.ReverseConnectManager` — manages a TCP listener on the
configured port and dispatches incoming reverse-connect requests
- `ReverseConnectManager.AddEndpoint(Uri reverseEndpoint)` — listener URI
e.g. `opc.tcp://0.0.0.0:4844`
- `ReverseConnectManager.WaitForConnection(endpointUrl, serverUri, ct)`:
blocks until the configured server initiates a reverse connect
- `Session.Create(appConfig, reverseConnection, endpoint, ...)`:
alternative session-create overload accepting the
`ITransportWaitingConnection` returned by the manager
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add
`ReverseConnect: { Enabled, ListenerUrl, ExpectedServerUri }` section
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — when reverse-connect is
enabled, replace the failover sweep with `WaitForConnection` and fall
through into the same session-create path
- New helper `ReverseConnectListener` — owns the manager lifecycle, one
listener per driver-host process (singleton across instances if multiple
reverse-connect drivers are configured)
**Tests**: spin up a `ReverseConnectClient` test against an opc-plc
container started with `--rc opc.tcp://host:4844` to verify end-to-end.
Unit tests mock `ITransportWaitingConnection`.
**Risks**: highest of the plan. Reverse Connect changes the
listen-vs-dial direction; if multiple OpcUaClient driver instances both
listen on the same port the manager must multiplex. opc-plc supports
reverse connect (`--rc` flag) so the integration test pattern from
`docs/drivers/OpcUaClient-Test-Fixture.md` extends cleanly.
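The multiplexing concern can be pictured as a single registry that routes each incoming reverse connection by the server URI it announces. The Python below is a rough model under that assumption; `SharedListenerRegistry` and its methods are hypothetical and do not correspond to any SDK type.

```python
class SharedListenerRegistry:
    """One shared listener; driver instances register by expected server URI."""

    def __init__(self):
        self._handlers = {}  # expected server URI -> connection callback

    def register(self, expected_server_uri: str, on_connect) -> None:
        if expected_server_uri in self._handlers:
            # Two driver instances must not claim the same upstream server.
            raise ValueError(f"already registered: {expected_server_uri}")
        self._handlers[expected_server_uri] = on_connect

    def unregister(self, expected_server_uri: str) -> None:
        self._handlers.pop(expected_server_uri, None)

    def dispatch(self, server_uri: str, connection) -> bool:
        """Route an incoming connection; False means nobody expects it."""
        handler = self._handlers.get(server_uri)
        if handler is None:
            return False  # caller should close the unexpected connection
        handler(connection)
        return True


registry = SharedListenerRegistry()
received = []
registry.register("urn:plant:line1", lambda conn: received.append(("line1", conn)))
registry.register("urn:plant:line2", lambda conn: received.append(("line2", conn)))

handled = registry.dispatch("urn:plant:line2", "conn-42")
rejected = registry.dispatch("urn:unknown", "conn-43")
```

Keeping the registry at driver-host scope is what avoids the port collision when several reverse-connect drivers share one process.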
**Docs / fixture / e2e**: new "Reverse Connect" section in
`docs/drivers/OpcUaClient.md` (create if missing) documenting the
listener URL config, the OT-DMZ outbound-only use case, and the
shared-listener singleton model; update `docs/drivers/OpcUaClient-Test-Fixture.md`
with the new "Reverse Connect coverage" row. Fixture: extend
`Docker/docker-compose.yml` with an `opc-plc-rc` service variant that
adds `--rc opc.tcp://host.docker.internal:4844`; `OpcPlcFixture` gains
a `[CollectionDefinition]` that wires up the reverse-connect listener
on the test side. Integration test asserts a session opens via the
reverse path. E2E: add a `-ReverseConnect` switch to
`scripts/e2e/test-opcuaclient.ps1` that flips the driver to listener
mode and verifies the bridge stage still passes.
---
### Phase 5 — Historical & redundancy
#### PR-12: `IHistoryProvider.ReadEventsAsync` interface fix + driver impl (gap #12)
**Goal**: extend `IHistoryProvider.ReadEventsAsync` to carry an
`EventFilter SelectClauses` parameter so HistoryRead Events can return
the right field projection, and implement the OPC UA Client passthrough.
**This is a cross-driver concern.** `IHistoryProvider` lives in
`Core.Abstractions` and every driver that opts into history (Galaxy,
OpcUaClient, plus any future historian-backed Tier-A driver) inherits the
default. Changing the signature is source-breaking — coordinate as one PR
that:
1. Adds the `IReadOnlyList<EventFieldProjection>` (or equivalent
abstract `EventFilterSpec`) parameter
2. Updates Galaxy's existing override (currently the only override) to
honour the projection (best-effort — the Galaxy A&E log has a fixed
field set so most projections degrade to the default columns)
3. Lands the OpcUaClient passthrough using `Session.HistoryReadAsync` with
`ReadEventDetails`
**SDK API**:
- `ReadEventDetails { StartTime, EndTime, NumValuesPerNode, Filter }`
- `Session.HistoryReadAsync` is already the call we use for Raw — pass
`new ExtensionObject(new ReadEventDetails { ... })` for events
- `HistoryEvent.Events: HistoryEventFieldList[]` — unwrap into
`HistoricalEvent` records
**Files**:
- `src/.../Core.Abstractions/IHistoryProvider.cs` — interface change
- `src/.../Driver.Galaxy.../*HistoryProvider*.cs` — adjust signature
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — implement
`ReadEventsAsync`; reuse `ExecuteHistoryReadAsync` shape
- Server-side history facade — propagate the new parameter
**Tests**: integration test against opc-plc with
`--alm` (alarm sim already enabled per the fixture doc) — verify the
SelectClause projection comes back correctly.
**Risks**: the cross-driver interface change is the riskiest single
ergonomic call in this plan. If we can't fit the new parameter without
breaking every driver's `IHistoryProvider` impl, fall back to a sibling
`IEventHistoryProvider` interface and only the OPC UA Client + Galaxy
implement it. **Decide this in the PR review.**
**Docs / fixture / e2e**: new "HistoryRead Events" section in
`docs/drivers/OpcUaClient.md` documenting the `EventFilter`-aware
passthrough; update `docs/Client.CLI.md` `historyread` page to cover
event-mode reads. **Cross-driver doc updates** (this PR adds an
"`IHistoryProvider.ReadEventsAsync` signature change — see
`docs/plans/opcuaclient-plan.md` PR-12" note to every other driver
plan that has a history surface): `docs/plans/abcip-plan.md`,
`docs/plans/ablegacy-plan.md`, `docs/plans/focas-plan.md`,
`docs/plans/s7-plan.md`, `docs/plans/twincat-plan.md`, the Galaxy plan
family (`docs/plans/galaxy-*.md` if/when present, and the LMX equivalent
if it lands), and any Modbus plan. Galaxy is the only existing
implementor and gets a real signature update in this PR; the others
get a heads-up note so future work tracks the new shape. Fixture:
opc-plc runs with `--alm` already (per existing fixture doc), so no compose
change. Integration test issues a HistoryRead Events with a non-default
SelectClause and asserts the projected fields. E2E: extend
`scripts/e2e/test-opcuaclient.ps1` with a "history events" stage
gated on the `--alm` simulator producing at least one event.
---
#### PR-13: Full Aggregate function set (gap #13)
**Goal**: extend `HistoryAggregateType` from the 5 enum values today
(Average/Minimum/Maximum/Total/Count) to the OPC UA Part 13 standard
catalog of 30+ aggregates that historian-class clients expect.
**SDK API**: `ObjectIds.AggregateFunction_*` constants — one per
aggregate. The SDK already exposes them; this is pure mapping work.
Aggregates to add (Part 13 §5):
- `TimeAverage`, `TimeAverage2`
- `Interpolative`
- `MinimumActualTime`, `MaximumActualTime`, `Range`, `Range2`
- `AnnotationCount`, `DurationGood`, `DurationBad`,
`PercentGood`, `PercentBad`
- `WorstQuality`, `WorstQuality2`
- `StandardDeviationSample`, `StandardDeviationPopulation`,
`VarianceSample`, `VariancePopulation`
- `NumberOfTransitions`
- `Start`, `End`, `Delta`, `StartBound`, `EndBound`
- `DurationInStateZero`, `DurationInStateNonZero`
**Files**:
- `src/.../Core.Abstractions/IHistoryProvider.cs` — extend
`HistoryAggregateType` enum (additive — existing values keep their
ordinal)
- `src/.../OpcUaClient/OpcUaClientDriver.cs`:
`MapAggregateToNodeId` switch grows; default arm rejects out-of-range values
**Tests**: parametrized unit test sweeping every enum value — assert
each maps to a non-null `NodeId` in the SDK's well-known set.
**Risks**: low — this is mapping work. Drivers without a real historian
(everything except Galaxy + OpcUaClient) keep throwing `NotSupported`.
**Docs / fixture / e2e**: extend the "HistoryRead aggregates" section in
`docs/drivers/OpcUaClient.md` with the full Part 13 catalog and which
aggregates require server-side support; update
`docs/Client.CLI.md` `historyread` page enumerating the new
`--aggregate` values. Fixture: opc-plc historian support is limited —
flag in `docs/drivers/OpcUaClient-Test-Fixture.md` that the new
aggregates are unit-tested via the SDK's well-known NodeId set, not
exercised wire-side. Integration test sweeps every enum value and
asserts the mapping; gated-skip for aggregates the live opc-plc image
doesn't honour. No e2e change.
---
#### PR-14: `ServerUriArray` redundant failover (gap #14)
**Goal**: read upstream `Server.ServerArray` /
`ServerStatus.ServerArray` and `ServerRedundancyType.RedundancySupport` at
session activation; when the upstream server advertises non-`None`
redundancy, fail over mid-session on `ServiceLevel` drop without losing
client subscriptions. Today our `EndpointUrls` is a one-shot
connect-attempt list, not a live redundancy group.
**SDK API**:
- `Session.ReadValueAsync(VariableIds.Server_ServerArray, ct)`
→ URI list
- `Session.ReadValueAsync(VariableIds.Server_ServiceLevel, ct)` polled or
subscribed via MonitoredItem
- Subscribe `Server_ServiceLevel` on the existing alarm subscription so
drops propagate via the publish channel
- On low-`ServiceLevel`: open a parallel session against the next URI in
`ServerArray`, `Session.TransferSubscriptionsAsync(otherSession, ...)`
the live subscriptions, swap `Session` reference
**Files**:
- `src/.../OpcUaClient/OpcUaClientDriver.cs` — new
`MonitorServerRedundancyAsync` method; integrate with the existing
`OnKeepAlive` / `SessionReconnectHandler` machinery so reconnect and
redundancy-failover share the subscription-transfer code path
- `src/.../OpcUaClient/OpcUaClientDriverOptions.cs` — add
`Redundancy: { Enabled, ServiceLevelThreshold (default 200) }`
**Tests**: with two opc-plc containers behind the driver,
artificially drop ServiceLevel on the active one and assert the
secondary takes over; assert subscription handles stay valid.
**Risks**: redundancy is the second-riskiest item after Reverse Connect.
The SDK's `TransferSubscriptions` has known edge cases when the
secondary's `SecureChannel` rejects the source-channel's authentication
token; doc that the secondary must trust the same client cert as the
primary.
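The failover decision itself is small enough to sketch independently of the session plumbing. The Python below is a hedged illustration: `next_failover_target` is a hypothetical helper, and it assumes that failover triggers only when `ServiceLevel` is strictly below the threshold and that "next URI" means the next entry in cyclic order after the active one.

```python
def next_failover_target(server_array, active_uri, service_level, threshold=200):
    """Return the URI to fail over to, or None to stay on the active server."""
    if service_level >= threshold:
        return None  # active server is still healthy enough

    # Walk the array starting just after the active entry, wrapping around.
    if active_uri in server_array:
        start = server_array.index(active_uri) + 1
    else:
        start = 0
    rotated = server_array[start:] + server_array[:start]

    for uri in rotated:
        if uri != active_uri:
            return uri
    return None  # no alternative server is advertised


servers = ["opc.tcp://primary:4840", "opc.tcp://secondary:4840"]
target = next_failover_target(servers, "opc.tcp://primary:4840", service_level=120)
```

The subscription transfer and session swap would follow only when this returns a non-`None` target.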
**Docs / fixture / e2e**: new "Upstream redundancy (`ServerArray`)"
section in `docs/drivers/OpcUaClient.md` with the ServiceLevel
threshold, the shared-cert prerequisite for `TransferSubscriptions`,
and the ops runbook for forcing a failover; cross-link from
`docs/Redundancy.md` (which today covers OUR server's redundancy —
add a "vs upstream-side redundancy" note). Fixture: extend
`Docker/docker-compose.yml` with a second `opc-plc-secondary` service
on a different port; `OpcPlcFixture` gains a multi-endpoint variant.
Integration test drops the active server's ServiceLevel and asserts
the secondary takes over with subscription handles intact. E2E: add a
`-PrimaryUrl` / `-SecondaryUrl` pair to
`scripts/e2e/test-opcuaclient.ps1` (and matching keys to
`scripts/e2e/e2e-config.sample.json`) that scripts a primary stop +
asserts the bridge stage continues to pass.
---
## Documentation, fixture, and e2e impact
Consolidated index of every doc page, fixture asset, and e2e script touched
by the plan above. Authoritative for review — if a PR's `Docs / fixture /
e2e` line references a path not listed here, that's a checklist gap.
### Driver user docs
- `docs/drivers/OpcUaClient.md` — **create on first PR that needs it
(PR-1)** if not present, then extend with one section per PR-1 through
PR-14 covering: subscription tuning, per-tag deadband, OperationLimits
handling, diagnostics counters, CRL/SHA1, FindServers, curation,
type mirroring, methods, ModelChangeEvent, Reverse Connect, history
events, aggregates, upstream redundancy.
- `docs/drivers/OpcUaClient-Test-Fixture.md` — coverage map updated for
curation (PR-7), Reverse Connect (PR-11), aggregates note (PR-13),
redundancy multi-endpoint variant (PR-14).
- `docs/Client.CLI.md` — extended for subscribe deadband syntax (PR-2),
any `discover` command (PR-6), `call` command (PR-9), `historyread`
event mode (PR-12), `--aggregate` enum expansion (PR-13).
- `docs/Client.UI.md` — extended for Subscriptions tab deadband fields
(PR-2), Browse-tree type rendering note (PR-8), Method-call surface
(PR-9) if it ships.
- `docs/security.md` — cross-link from PR-5 (CRL/SHA1 knobs).
- `docs/Redundancy.md` — cross-link from PR-14 (note distinguishing
server-side redundancy from upstream-side redundancy).
### Fixture assets
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/Docker/docker-compose.yml`
— add `opc-plc-rc` (PR-11) and `opc-plc-secondary` (PR-14) service
variants; optional secured endpoint (PR-5).
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcFixture.cs`
— discovery probe at collection init (PR-6), reverse-connect listener
(PR-11), multi-endpoint variant (PR-14), model-change helper (PR-10).
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/OpcPlcProfile.cs`
— flag noisy analogs for deadband (PR-2), enumerate exercised
namespaces for curation (PR-7), record at least one custom ObjectType
(PR-8).
- New integration tests added per PR; all live under the existing
`tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/`
collection.
- Test certs (PR-5): SHA1-signed + revoked test fixtures checked into
the unit-test project's resources.
### E2E scripts
- `scripts/e2e/test-opcuaclient.ps1` — new stages added per PR (subscription
tuning PR-1, deadband PR-2, diagnostics PR-4, CRL PR-5, discovery
PR-6, curation PR-7, method call PR-9, topology change PR-10,
reverse connect PR-11, history events PR-12, redundancy failover
PR-14). The script is the single integration point for every
driver-level e2e — keep the stages ordered top-down by phase.
- `scripts/e2e/e2e-config.sample.json` — new keys: `deadband`,
`discoveryUrl`, `includePath`, `namespaceRemap`, `methodNodeId`,
`reverseConnect`, `primaryUrl`, `secondaryUrl`.
- `scripts/e2e/test-all.ps1` — no structural change; the existing
`opcuaclient` block forwards new params after wiring them through
`e2e-config.sample.json`.
### Cross-driver impact (PR-12 — `IHistoryProvider.ReadEventsAsync`)
PR-12 changes the `IHistoryProvider.ReadEventsAsync` signature in
`Core.Abstractions` (or introduces a sibling `IEventHistoryProvider`
— pinned in PR-12 review per Open Question 2). That decision is
source-breaking for every driver that opts into history. PR-12 must
add an explicit "interface change — adopt new signature when this
driver implements `ReadEventsAsync`" note to:
- `docs/plans/abcip-plan.md`
- `docs/plans/ablegacy-plan.md`
- `docs/plans/focas-plan.md`
- `docs/plans/s7-plan.md`
- `docs/plans/twincat-plan.md`
- The Galaxy plan family — `docs/plans/galaxy-*.md` if/when those
pages exist; Galaxy is the only current implementor and gets the
real signature update in PR-12, not just a note.
- The LMX plan — `docs/plans/lmx-*.md` if/when it lands (current state:
the LMX driver's history surface is implicit through Galaxy; revisit
during PR-12 review).
- A Modbus plan page if/when one exists; Modbus does not implement
history today but the heads-up note tracks the cross-driver shape.
The cross-driver note text should be a one-paragraph "Heads up: the
`IHistoryProvider.ReadEventsAsync` interface gained an
`EventFilterSpec` parameter in OpcUaClient PR-12 (`docs/plans/opcuaclient-plan.md`).
If/when this driver implements event-history, adopt the new signature."
This pattern keeps each driver plan stable while the cross-cutting
breakage is owned by one PR.
---
## Skip-rated items (for context)
These featuregaps rows are **Build = No** and intentionally omitted from
the plan above:
| # | Gap | Why we're skipping |
| :---: | --- | --- |
| 3 | Multicast / LDS-ME registration | Server-side responsibility, not aggregator's. |
| 4 | GDS push management (Part 12) | Significant infra; rare for our deployment scale. |
| 11 | HistoryUpdate / Modified / Annotation passthrough | MES backfill scope; defer. |
| 16 | Connection / session pooling for multi-instance scale-out | Premature; current per-instance model is simple and adequate. |
| 18 | Kerberos / OAuth2 / JWT identity | Significant security work; defer until AD integration drives it (separate workstream). |
| 19 | Write attribute scope beyond `Value` | Niche; rarely used in OPC UA practice. |
If any of these get prioritized later they slot cleanly between the phases
above — none have prerequisites among the Build = Yes items.
## Open questions
1. **`ISubscribable` overload vs new method (PR-2)**: per-tag spec
carrier is needed for deadband; do we extend the existing
`SubscribeAsync` overload or add `SubscribeWithSpecsAsync`? The
former is source-breaking but cleaner; the latter is additive but
leaves two parallel paths.
2. **`IHistoryProvider.ReadEventsAsync` shape (PR-12)**: does the
`EventFilterSpec` parameter live on `IHistoryProvider` (one interface,
every driver gets it) or on a sibling `IEventHistoryProvider` (two
interfaces, only event-history drivers implement)? Memory entry
suggests the former; preference depends on whether non-OPC-UA drivers
ever expect to project arbitrary event fields. **Pin this in PR-12
review.**
3. **`IMethodInvoker` capability (PR-9)**: does this become the 9th
capability interface (currently 8/8) or is it folded into
`IWritable` as a method-invoke variant? Adding a 9th interface is
the cleaner model and matches the spec layering.
4. **Type mirroring address-space surface (PR-8)**: does
`IAddressSpaceBuilder` already accept type nodes? If yes, PR-8 is
straightforward; if no, it splits into a prerequisite PR-8a that
extends the builder, then PR-8b for the OPC UA Client wire-up. The
answer determines whether PR-8 ships in Phase 2 or slips to a later
phase.
5. **Reverse Connect listener ownership (PR-11)**: one listener per
driver instance (port collision when multiple reverse-connect
drivers run in the same process) vs one shared listener with an
`expectedServerUri` dispatcher. Shared is the right answer; pin
the singleton lifetime to the driver-host.
6. **Phase 1 ship order**: PR-1, PR-3, PR-4, PR-5 are independent and can
land in parallel. PR-2 depends on the `ISubscribable` interface
decision (Q1) — recommend landing PR-1 first to validate the
`OpcUaSubscriptionDefaults` shape, then PR-2.
@@ -0,0 +1,807 @@
# S7 Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → S7](../featuregaps.md#s7-siemens-s7-3004001200--1500)
>
> Covers Build = Yes items only. Skip-rated rows are noted at the end for context.
## Summary
The S7 driver (`src/ZB.MOM.WW.OtOpcUa.Driver.S7/`) ships a working scaffold over
**S7netplus 0.20**: ISO-on-TCP / S7comm, single-connection-per-PLC (`SemaphoreSlim`),
DB / M / I / Q / T / C address parsing, atomic scalar reads/writes for Bool / Byte
/ I16 / U16 / I32 / U32 / F32, polled `ISubscribable` overlay, `IHostConnectivityProbe`
via `ReadStatusAsync`, and a Snap7-server-backed CI fixture on `localhost:1102`.
The 16 Build = Yes gaps fall into six tractable phases. **The hard one is gap #1
(S7-1500 Optimized DB / Symbolic addressing)** — S7netplus speaks classic S7comm
only and cannot reach optimized DBs at all. Phase 6 calls that out as an explicit
architectural decision: ship the constraint as documentation and the rest as
S7netplus-compatible features, *or* fork to a library that supports S7Plus
(Sharp7-fork, Snap7 v2, custom S7Plus). Phases 1-5 do not depend on that decision
and are landable on the current S7netplus base.
Every PR ships unit-test coverage and — where wire semantics matter — extends the
Snap7-server profile in `Docker/server.py` so the integration fixture exercises
the new path. PRs that need real S7-1500 firmware features the simulator doesn't
mimic (PUT/GET protection, password-tier auth, SZL diagnostic buffer) call that
out and gate the live-firmware test on the dev-box S7-1500 lab rig.
Architectural invariants we explicitly preserve:
- Single connection per PLC; `_gate` (SemaphoreSlim) serializes every PDU.
- Strict address-parse-at-init; bad config fails fast with `FormatException`.
- PUT/GET-disabled mapped to sticky `BadDeviceFailure`, not Polly-retried.
- 100 ms minimum publishing interval (matches CPU mailbox scan reality).
- `WriteIdempotent` per-tag flag is the only retry-policy lever.
## Phased delivery
| Phase | Theme | PRs | Gaps closed |
|------:|-------|-----|-------------|
| 1 | Data-type correctness | PR-S7-A1..A5 | #7, #8, #9, #19 |
| 2 | Performance — multi-tag PDU packing | PR-S7-B1..B2 | #3, #22 |
| 3 | Operability knobs | PR-S7-C1..C5 | #2, #4, #20, #21, #24 |
| 4 | Workflow — symbol import + UDTs | PR-S7-D1..D3 | #5, #6, #10 |
| 5 | Diagnostics & security | PR-S7-E1..E2 | #11, #14 |
| 6 | S7-1500 Optimized DB / Symbolic | PR-S7-F (decision) | #1 |
Phases 1-3 run sequentially because Phase 2 packing and Phase 3 deadbands are
both keyed off the type-decode work in Phase 1. Phase 4 (UDT/symbol import) is
parallelizable with Phase 5; Phase 6 is gated on the library-choice decision
in Open Questions (a).
---
## Per-PR detail
### Phase 1 — Data-type correctness
#### PR-S7-A1 — 64-bit scalar types (LInt / ULInt / LReal / LWord)
Closes gap #9. `Float64`/`Int64`/`UInt64` cases in `S7Driver.ReadOneAsync`/
`WriteOneAsync` currently throw `NotSupportedException`.
- **Files**: `S7Driver.cs` (read + write switch), `S7DriverOptions.cs` (extend
`S7Size` with `LWord` for 8-byte access), `S7AddressParser.cs` (accept `DBL` /
`LD` size suffix; S7netplus encodes 8-byte access via byte-array reads, so the
parser converts `DB1.LD0` to a byte-range read internally).
- **Tests**: unit decode tests for the byte-pattern → `long` / `ulong` / `double`
conversion; Snap7-server profile gets `f64` and `i64` seed types.
- **Risks**: S7netplus's `ReadAsync(string)` does not accept `LD` natively;
fallback path is `Plc.ReadBytes(DataType.DataBlock, db, byteOffset, 8)` then
`BitConverter` with explicit endian flip (S7 is big-endian on the wire,
`BitConverter` is little-endian on x86/x64).
- **Effort**: M (3-4 days incl. tests).
- **Deps**: none.
- **Docs / fixture / e2e**: extends the type-mapping table in `docs/v2/s7.md`
with `LInt` / `ULInt` / `LReal` / `LWord` rows; adds the new sizes
(`LInt`, `ULInt`, `LReal`) to the `read` / `write` cookbook in
`docs/Driver.S7.Cli.md`; updates `docs/drivers/S7-Test-Fixture.md`
§"What it actually covers" to list the new 64-bit types and removes them
from §5 "Data types beyond the scalars"; extends the snap7 seed-type set
in `tests/ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests/Docker/server.py`
with `i64`, `u64`, `f64` cases; adds seeds at known offsets
(e.g. `DB1.DBL40` for i64, `DB1.DBL48` for f64) to
`Docker/profiles/s7_1500.json`; adds `S7_1500Profile` constants for the
new tags + a `Driver_reads_seeded_64bit_batch` smoke test in
`S7_1500SmokeTests`; adds an LInt loopback assertion to
`scripts/e2e/test-s7.ps1`.
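The endian flip called out under Risks can be checked against known byte patterns. The sketch below uses Python's `struct` module as a neutral illustration of big-endian 8-byte decoding; the function names are hypothetical, and the driver itself would do the equivalent on the bytes returned by the raw byte-range read.

```python
import struct


def _require_8(raw: bytes) -> bytes:
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return raw


def decode_lint(raw: bytes) -> int:
    """LInt: signed 64-bit, big-endian on the wire."""
    return struct.unpack(">q", _require_8(raw))[0]


def decode_ulint(raw: bytes) -> int:
    """ULInt / LWord: unsigned 64-bit, big-endian on the wire."""
    return struct.unpack(">Q", _require_8(raw))[0]


def decode_lreal(raw: bytes) -> float:
    """LReal: IEEE 754 double, big-endian on the wire."""
    return struct.unpack(">d", _require_8(raw))[0]


def encode_lint(value: int) -> bytes:
    """Inverse of decode_lint, used on the write path."""
    return struct.pack(">q", value)
```

Golden vectors such as `00 00 00 00 00 00 01 00` (256 as LInt) and `3F F0 00 00 00 00 00 00` (1.0 as LReal) make convenient unit-test seeds.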
#### PR-S7-A2 — STRING / WSTRING / CHAR / WCHAR
Closes gap #8 (string portion). S7 `STRING(n)` is `[max-len][actual-len][bytes...]`
(2-byte header + ASCII). `WSTRING(n)` is 4-byte header + UTF-16BE bytes. `CHAR`
is 1 byte; `WCHAR` is 2 bytes UTF-16BE.
- **Files**: `S7Driver.cs` (new `ReadStringAsync` / `WriteStringAsync` private
helpers using `Plc.ReadBytes` for raw byte-range fetch), `S7DriverOptions.cs`
(already has `StringLength`; add `S7DataType.WString`, `Char`, `WChar`).
- **Tests**: unit tests for header parsing including the "actual-len > max-len"
PLC bug case (clamp on read, reject on write); Snap7 `ascii` seed type already
exists, add `wstring` seed.
- **Risks**: write must respect the configured `StringLength` to avoid overrunning
the DB; mismatched max-len is a common field bug.
- **Effort**: M.
- **Deps**: PR-S7-A1 (byte-range read helper lands there).
- **Docs / fixture / e2e**: extends the type-mapping section in
`docs/v2/s7.md` with `STRING(n)` / `WSTRING(n)` / `CHAR` / `WCHAR`
layouts (2-byte vs 4-byte header, UTF-16BE encoding, the "actual-len >
max-len" PLC bug); extends the `read` / `write` cookbook in
`docs/Driver.S7.Cli.md` with `--type WString` / `--type Char` / `--type
WChar` examples and the `--string-length` flag for WString; updates
`docs/drivers/S7-Test-Fixture.md` §"What it actually covers" to list
ascii/wstring/char/wchar; adds `wstring`, `char`, `wchar` seed types to
`Docker/server.py` (existing `ascii` covers STRING); seeds a
`DB1.WSTRING[256]` and a `DB1.CHAR[300]` in
`Docker/profiles/s7_1500.json`; adds `Driver_round_trips_string_types`
smoke test exercising read + write of every variant; adds a string
round-trip assertion to `scripts/e2e/test-s7.ps1`.
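The header layouts and the clamp-on-read / reject-on-write rule can be sketched as follows. This is an illustrative Python sketch, not the driver's C# helpers; the function names are hypothetical, and it assumes the WSTRING header is two big-endian 16-bit lengths counted in characters.

```python
import struct


def decode_string(raw: bytes) -> str:
    """STRING(n): [max-len][actual-len][ASCII bytes]."""
    max_len, actual_len = raw[0], raw[1]
    # Clamp on read: tolerate the "actual-len > max-len" PLC bug.
    actual_len = min(actual_len, max_len, len(raw) - 2)
    return raw[2:2 + actual_len].decode("ascii", errors="replace")


def encode_string(text: str, max_len: int) -> bytes:
    """Reject on write: never emit more than the configured length."""
    if not 0 <= max_len <= 254:
        raise ValueError("STRING max length must be 0..254")
    data = text.encode("ascii")
    if len(data) > max_len:
        raise ValueError(f"{len(data)} chars exceed STRING({max_len})")
    return bytes([max_len, len(data)]) + data.ljust(max_len, b"\x00")


def decode_wstring(raw: bytes) -> str:
    """WSTRING(n): 4-byte header, then UTF-16BE code units."""
    max_len, actual_len = struct.unpack_from(">HH", raw, 0)
    actual_len = min(actual_len, max_len, (len(raw) - 4) // 2)
    return raw[4:4 + 2 * actual_len].decode("utf-16-be", errors="replace")


def encode_wstring(text: str, max_len: int) -> bytes:
    data = text.encode("utf-16-be")
    units = len(data) // 2  # length is counted in 16-bit code units
    if units > max_len:
        raise ValueError(f"{units} chars exceed WSTRING({max_len})")
    return struct.pack(">HH", max_len, units) + data.ljust(2 * max_len, b"\x00")
```

Padding the payload to the full declared length keeps a write from leaving stale bytes behind a shorter string; whether the driver should do the same is a design choice this sketch does not settle.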
#### PR-S7-A3 — DTL / DATE_AND_TIME / S5TIME / TIME / TOD / DATE
Closes gap #8 (date/time portion).
- DTL is 12 bytes: year(u16) / month / day / weekday / hour / minute / second / nanos(u32).
- DATE_AND_TIME (DT) is 8 bytes BCD: yy mm dd hh mm ss msH msL+dow.
- S5TIME is 16-bit BCD with a 2-bit time-base.
- TIME is `Int32` ms since 1972 (S7-300/400) or signed-ms duration (S7-1200/1500).
- TOD is `UInt32` ms since midnight; DATE is `UInt16` days since 1990-01-01.
- **Files**: `S7Driver.cs` + new `S7DateTimeCodec.cs` static class encapsulating
every encode/decode (keep the driver lean; codec is unit-testable in isolation).
- **Tests**: round-trip tests per type with golden byte vectors taken from the
Siemens "STEP 7 V18 — Programming Reference" document. Snap7-server seed
profile gains `dtl`, `dt`, `s5time`, `time`, `tod`, `date` types.
- **Risks**: BCD parsing must reject invalid month/day combinations; PLC programs
occasionally write 0x00 0x00 ... when uninitialized — surface as `BadOutOfRange`
rather than parsing to year 0.
- **Effort**: L (4-5 days incl. all six types and the golden-vector suite).
- **Deps**: PR-S7-A1.
- **Docs / fixture / e2e**: extends `docs/v2/s7.md` with a new "Date / time
types" subsection documenting DTL / DT (BCD) / S5TIME / TIME / TOD /
DATE byte layouts and the S7-300/400 vs S7-1200/1500 TIME-encoding
split; adds `--type Dtl` / `--type DateAndTime` / `--type S5Time` /
`--type Time` / `--type TimeOfDay` / `--type Date` to the
`docs/Driver.S7.Cli.md` cookbook; updates
`docs/drivers/S7-Test-Fixture.md` §"What it actually covers" with the
new datetime types and removes "DTL / DATE_AND_TIME" from §5 "Data
types beyond the scalars"; adds `dtl`, `dt`, `s5time`, `time`, `tod`,
`date` seed types to `Docker/server.py` with golden-byte vectors
documented in comments; seeds `DB1.DTL[260]`, `DB1.DT[272]`,
`DB1.S5TIME[280]`, `DB1.TIME[284]`, `DB1.TOD[288]`, `DB1.DATE[292]` in
`Docker/profiles/s7_1500.json`; adds
`S7DateTimeCodecTests` (unit) + `Driver_round_trips_datetime_types`
smoke test; no `scripts/e2e/test-s7.ps1` change required (CLI cookbook
examples cover the manual surface).
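A few of the layouts listed above can be decoded as shown below. This is a hedged Python sketch of what a codec like `S7DateTimeCodec` might do for DATE, TOD, DTL, and S5TIME; the function names are hypothetical, `ValueError` stands in for the `BadOutOfRange` status, and the S5TIME time-base table (10 ms / 100 ms / 1 s / 10 s) is stated from general S7 knowledge rather than from this plan.

```python
import struct
from datetime import date, datetime, time, timedelta

S7_DATE_EPOCH = date(1990, 1, 1)
S5TIME_BASE_MS = (10, 100, 1000, 10000)  # selected by the 2-bit time base


def decode_date(raw: bytes) -> date:
    (days,) = struct.unpack(">H", raw)
    return S7_DATE_EPOCH + timedelta(days=days)


def decode_tod(raw: bytes) -> time:
    (ms,) = struct.unpack(">I", raw)
    if ms >= 86_400_000:
        raise ValueError("TOD beyond 24 h")
    return (datetime.min + timedelta(milliseconds=ms)).time()


def decode_dtl(raw: bytes) -> datetime:
    if len(raw) != 12:
        raise ValueError("DTL is 12 bytes")
    if not any(raw):
        # Uninitialized all-zero buffer: reject rather than parse to year 0.
        raise ValueError("uninitialized DTL")
    year, month, day, _weekday, hour, minute, second, nanos = struct.unpack(
        ">HBBBBBBI", raw
    )
    # datetime() itself rejects impossible month/day combinations.
    return datetime(year, month, day, hour, minute, second, nanos // 1000)


def decode_s5time_ms(raw: bytes) -> int:
    (word,) = struct.unpack(">H", raw)
    base = (word >> 12) & 0x3
    digits = ((word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)
    if any(d > 9 for d in digits):
        raise ValueError("invalid BCD digit in S5TIME")
    return (digits[0] * 100 + digits[1] * 10 + digits[2]) * S5TIME_BASE_MS[base]
```

Note that Python's `datetime` only carries microseconds, so the sketch truncates the DTL nanosecond field; a real codec would need to decide how to represent the remainder.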
#### PR-S7-A4 — Array tags (ValueRank=1)
Closes gap #7. `S7TagDefinition` currently has no array dimension; `MapDataType`
hard-codes `IsArray: false`.
- **Files**: `S7DriverOptions.cs` (extend `S7TagDefinition` with `ArrayDim` int?
and `ElementCount` int?), `S7Driver.cs` (read path: detect array tag, issue
one byte-range read covering N elements, slice client-side; write path: same
in reverse), `DiscoverAsync` reports `IsArray: true, ArrayDim: [N]`.
- **Tests**: unit tests for `Array[0..9] of Int` and `Array[0..9] of Real`;
Snap7-server profile adds an array seed type. Round-trip array-write test
proves slice ordering.
- **Risks**: S7-1500 supports multi-dim arrays; declare ValueRank=1 only and
document multi-dim as a follow-up. Array-of-UDT lands with PR-S7-D2.
- **Effort**: M.
- **Deps**: PR-S7-A1 (byte-range reads).
- **Docs / fixture / e2e**: adds an "Array tags (ValueRank=1)" subsection
to `docs/v2/s7.md` documenting `Array[0..N]` syntax + the multi-dim
follow-up note; extends `docs/Driver.S7.Cli.md` with an
`--array-count N` flag in the `read` / `write` cookbook and worked
examples for `Array[0..9] of Int` and `Array[0..9] of Real`; updates
`docs/drivers/S7-Test-Fixture.md` §"What it actually covers" to list
array round-trips and removes "arrays of structs" from §5 (struct
arrays land in PR-S7-D2); extends `Docker/server.py` with an `array`
meta-seed-type that takes an inner-type + count and lays out N elements
contiguously; seeds `DB1.ArrayInt[300]` (10×Int) and
`DB1.ArrayReal[320]` (10×Real) in `Docker/profiles/s7_1500.json`;
adds `Driver_round_trips_array_int10` + `Driver_round_trips_array_real10`
smoke tests proving slice ordering; adds an array round-trip assertion
to `scripts/e2e/test-s7.ps1`.
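The "one byte-range read, slice client-side" idea reduces to unpacking N fixed-size big-endian elements in order. A minimal Python sketch, assuming only the two element types named in the tests (`Int` as 16-bit signed, `Real` as 32-bit float); the helper names are hypothetical:

```python
import struct

# Element type -> struct code (big-endian is applied when packing/unpacking).
ELEMENT_CODES = {"Int": "h", "Real": "f"}


def decode_array(raw: bytes, element_type: str, count: int) -> list:
    """Slice one contiguous buffer into `count` elements, lowest index first."""
    code = ELEMENT_CODES[element_type]
    expected = struct.calcsize(">" + code) * count
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    return list(struct.unpack(f">{count}{code}", raw))


def encode_array(values, element_type: str) -> bytes:
    """Inverse of decode_array: lay the elements out contiguously."""
    code = ELEMENT_CODES[element_type]
    return struct.pack(f">{len(values)}{code}", *values)
```

Element 0 sits at the lowest byte offset, which is the slice ordering the round-trip array-write test is meant to prove.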
#### PR-S7-A5 — LOGO! 8 + S7-200 V-memory area
Closes gap #19. `S7AddressParser` currently rejects the `V` area letter.
- **Files**: `S7AddressParser.cs` (add `V` case → maps to `S7Area.DataBlock` with
`DbNumber=1` for S7-200 / DbNumber per LOGO! VM-mapping table; document the
conversion), `S7DriverOptions.cs` (note CpuType-dependent meaning of V).
- **Tests**: unit tests for `VW0` / `VD4` / `V0.0` parsing, both S7-200 and
LOGO! conventions; document caller responsibility to set `CpuType.S7200` or
`S7200Smart`.
- **Risks**: LOGO! VM base address differs by firmware (V0=0 vs V0=1024 depending
on block); document the offset table rather than auto-detecting.
- **Effort**: S (1-2 days, mostly parser + tests; no wire changes).
- **Deps**: none.
- **Docs / fixture / e2e**: adds a "LOGO! 8 / S7-200 V-memory" subsection
to `docs/v2/s7.md` covering the `V` area letter, the `S7200` /
`S7200Smart` CpuType pre-requisite, the LOGO! VM-mapping table by
firmware band, and the "V0 = DB1.DBX0.0" semantic; extends the address
grammar cheat sheet in `docs/Driver.S7.Cli.md` with `VW0` / `VD4` /
`V0.0` rows and a `-c S7200Smart` worked example; updates
`docs/drivers/S7-Test-Fixture.md` §"What it does NOT cover" item 4 to
note S7-200 / LOGO! parser coverage now exists at unit level; adds
unit-only `S7AddressParserTests` cases — no Snap7 fixture change
(server.py already exposes DB1, which is where V-memory aliases land);
no `scripts/e2e/test-s7.ps1` change required (live-LOGO! testing is
documented as field-only).
### Phase 2 — Performance (multi-tag PDU packing + block coalescing)
#### PR-S7-B1 — Multi-variable PDU packing
Closes gap #3. `ReadAsync(IReadOnlyList<string>)` currently issues one
`plc.ReadAsync` per tag inside the semaphore — N PDUs for N tags.
- **Files**: `S7Driver.cs` (replace per-tag loop with a packer that builds a
list of `S7.Net.Types.DataItem`, calls `plc.ReadMultipleVarsAsync`, then
fans the results back to the per-tag decoder). Keep the existing per-tag
decode switch — only the wire fetch becomes batched.
- **Tests**: integration test that subscribes to 100 tags and asserts the
request count seen by the Snap7 server is `ceil(N / packing-budget)` rather
than 100. Unit-level test covers packer chunking when the negotiated PDU
size won't fit all items.
- **Risks**: `ReadMultipleVarsAsync` errors are per-item; we must surface
per-tag StatusCodes correctly rather than failing the whole batch on one
bad tag. Packing budget = `negotiatedPduSize - 18 (header) - per_item(12)`,
conservatively cap at 19 items per PDU on a 240-byte PDU.
- **Effort**: L (5-6 days incl. the per-item-error fan-out semantics).
- **Deps**: Phase 1 PRs do not block this — but conflicts in `S7Driver.cs`
are likely, so land Phase 1 first.
- **Docs / fixture / e2e**: adds a "Performance — multi-variable PDU
packing" subsection to `docs/v2/s7.md` describing
`ReadMultipleVarsAsync`, the negotiated-PDU packing budget formula
(`pdu - 18 - 12·N`), the 19-items-per-240-byte-PDU rule of thumb, and
the per-item-error semantics; no `docs/Driver.S7.Cli.md` change (CLI
is single-tag); no Snap7-server seed change required (existing seeds
cover the wire path); adds
`S7MultiVarPduPackingTests` to the unit suite (planner chunking when
items don't fit) + a 100-tag perf integration test
`Driver_packs_100_tags_into_minimum_pdus` that asserts request-count
reduction; no `scripts/e2e/test-s7.ps1` change required.
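The chunking step of the packer can be sketched independently of S7netplus. The Python sketch below takes the 18-byte header, 12-byte per-item cost, and 19-item cap from the text above as given; the function names are hypothetical, and response-size limits are not modelled. With those constants as written, a 240-byte PDU works out to 18 items, so the 19-item cap only binds at larger PDU sizes.

```python
HEADER_BYTES = 18      # per the packing-budget formula above
PER_ITEM_BYTES = 12    # per the packing-budget formula above
MAX_ITEMS_CAP = 19     # conservative cap from the text


def items_per_pdu(negotiated_pdu_size: int) -> int:
    """How many read items fit in one request under the stated budget."""
    budget = (negotiated_pdu_size - HEADER_BYTES) // PER_ITEM_BYTES
    return max(1, min(budget, MAX_ITEMS_CAP))


def chunk_tags(tags: list, negotiated_pdu_size: int) -> list:
    """Split the tag list into per-PDU chunks, preserving order."""
    n = items_per_pdu(negotiated_pdu_size)
    return [tags[i:i + n] for i in range(0, len(tags), n)]


def fan_out(chunk: list, item_results: list) -> dict:
    """Pair each tag with its own (status, value) result.

    One bad item keeps its own status instead of failing the whole batch.
    """
    if len(chunk) != len(item_results):
        raise ValueError("result count does not match request count")
    return dict(zip(chunk, item_results))
```

Keeping the chunker a pure function makes the unit-level chunking test independent of any live connection.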
#### PR-S7-B2 — Block-read coalescing for contiguous DBs
Closes gap #22. Reading `DB1.DBW0`, `DB1.DBW2`, `DB1.DBW4` should issue one
6-byte byte-range read against DB1 starting at offset 0, sliced client-side.
- **Files**: `S7Driver.cs` adds a planner pass: group same-DB tags by
contiguous byte ranges (gap-merge threshold = configurable, default 16
bytes; over-fetching 16 bytes is cheaper than one extra PDU). Merged ranges
become a single `Plc.ReadBytes` call; the result is sliced per-tag.
- **Tests**: unit tests for the merge planner (input list → expected ranges);
integration test with 50 contiguous DB words proves wire-level reduction.
- **Risks**: STRINGs / arrays should opt out of merging because the per-tag
byte size is variable. Add an "opaque-size" flag so the planner skips them.
- **Effort**: M.
- **Deps**: PR-S7-B1 (the multi-var packer). The two interact: the planner
emits merged range reads, then the packer puts several of them on one PDU.
- **Docs / fixture / e2e**: extends the §"Performance" section in
`docs/v2/s7.md` with a "Block-read coalescing" subsection — the
default 16-byte gap-merge threshold, the opaque-size opt-out for
STRINGs / arrays, and operator guidance for tuning the threshold per
DB; no CLI doc change; no Snap7-server seed change (existing
contiguous DB1 seeds — DBW0 / DBW10 / DBD20 — already exercise
contiguous-merge); adds
`S7BlockCoalescingPlannerTests` (unit) covering the merge planner +
opaque opt-out; adds a 50-contiguous-DBW integration test
`Driver_coalesces_contiguous_DBWs_into_single_byte_range_read` that
asserts wire-level reduction; no `scripts/e2e/test-s7.ps1` change.
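The merge planner described above amounts to a sort followed by a single sweep. Below is a minimal Python sketch under stated assumptions: `TagRange`, `MergedRead`, `plan_reads`, and `slice_tag` are hypothetical names, the gap is measured from the end of the previous range to the start of the next, and opaque-size tags are simply emitted as their own reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRange:
    name: str
    db: int
    offset: int
    size: int
    opaque: bool = False  # STRINGs / arrays opt out of merging


@dataclass
class MergedRead:
    db: int
    start: int
    end: int  # exclusive
    tags: list


def plan_reads(tags, gap_threshold: int = 16) -> list:
    """Group same-DB tags into byte ranges, merging gaps up to the threshold."""
    plans = []
    mergeable = sorted((t for t in tags if not t.opaque),
                       key=lambda t: (t.db, t.offset))
    for t in mergeable:
        last = plans[-1] if plans else None
        if last and last.db == t.db and t.offset - last.end <= gap_threshold:
            last.end = max(last.end, t.offset + t.size)
            last.tags.append(t)
        else:
            plans.append(MergedRead(t.db, t.offset, t.offset + t.size, [t]))
    # Opaque tags are never merged; each gets a stand-alone read.
    for t in tags:
        if t.opaque:
            plans.append(MergedRead(t.db, t.offset, t.offset + t.size, [t]))
    return plans


def slice_tag(plan: MergedRead, buffer: bytes, tag: TagRange) -> bytes:
    """Cut one tag's bytes out of the merged read result."""
    rel = tag.offset - plan.start
    return buffer[rel:rel + tag.size]
```

For the `DB1.DBW0` / `DBW2` / `DBW4` case this yields one range covering bytes 0 to 6, matching the single 6-byte read described at the top of this PR.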
### Phase 3 — Operability
#### PR-S7-C1 — PDU size negotiation surfaced
Closes gap #2. S7netplus's `Plc` instance exposes the negotiated PDU size after
`OpenAsync` via `Plc.MaxPDUSize`.
- **Files**: `S7Driver.cs` (read `Plc.MaxPDUSize` after open, store on
`_health`; expose via `GetHealth().Diagnostics["NegotiatedPduSize"]`;
this requires adding a `Diagnostics` dictionary to `DriverHealth`, which
is a Core change). Operator-visible via the Admin UI driver-diagnostics
panel that already renders Modbus diagnostic stats.
- **Tests**: integration test asserts the value is non-zero after init.
- **Risks**: `DriverHealth` extension must be backward-compatible — existing
drivers should still compile against the unchanged record. Make the new
property nullable with a default of `null`.
- **Effort**: S.
- **Deps**: Core `DriverHealth` shape change (single PR coordinated with
the Modbus diagnostic surface).
- **Docs / fixture / e2e**: adds a "Diagnostics surfacing" subsection to
`docs/v2/s7.md` documenting the `Diagnostics["NegotiatedPduSize"]`
surface + how it renders in the Admin UI driver-diagnostics panel;
no CLI doc change (CLI doesn't expose diagnostics); updates
`docs/drivers/S7-Test-Fixture.md` §"What it actually covers" with a
"negotiated PDU size surfaces in driver health" line; no Snap7
seed-type change (snap7's PDU negotiation is fixed at 240 bytes —
document the fixture's negotiated size in the README); adds
`Driver_exposes_negotiated_pdu_size_post_init` smoke test asserting
the value is non-zero; no `scripts/e2e/test-s7.ps1` change.
#### PR-S7-C2 — TSAP / Connection Type selector
Closes gap #4. S7netplus picks PG-class TSAPs by default; hardened CPUs may
require OP / S7-Basic / Other.
- **Files**: `S7DriverOptions.cs` (new `TsapMode` enum: `Auto` / `Pg` / `Op` /
`S7Basic` / `Other`; `Auto` preserves current behavior. Optional
`LocalTsap` / `RemoteTsap` `ushort?` for explicit override). `S7Driver.cs`
branches on the mode to pick the S7netplus `Plc(CpuType, ...)` constructor
vs the `Plc(string ip, byte rack, byte slot, ushort localTsap, ushort remoteTsap)`
raw-TSAP overload. Document the raw-TSAP table in `docs/v2/s7.md`.
- **Tests**: unit test on the mode → TSAP-byte mapping; live-firmware test
documented but only runnable against the dev-box S7-1500 lab rig.
- **Risks**: wrong TSAP causes connection refused at handshake — same failure
shape as wrong slot. Document the mapping prominently.
- **Effort**: M.
- **Deps**: none.
- **Docs / fixture / e2e**: adds a "TSAP / Connection Type" section to
`docs/v2/s7.md` covering the `TsapMode` enum, the raw-TSAP table
(PG = 0x0100/0x0102, OP = 0x0200/0x0202, S7-Basic = 0x0300/0x0302,
Other = caller-supplied), and the hardened-CPU motivation; adds
`--tsap-mode` and `--local-tsap` / `--remote-tsap` flags to
`docs/Driver.S7.Cli.md`'s common-flags table with a worked example
hitting an OP-class TSAP; no Snap7 seed change (snap7 accepts any
TSAP from the CLI, so the unit-level mapping test is sufficient); no
smoke test change (live-firmware-only); no `scripts/e2e/test-s7.ps1`
change.
#### PR-S7-C3 — Per-tag scan group / publish rate
Closes gap #20. `SubscribeAsync` takes one publishing interval for the whole
list; mixed 100 ms / 1 s / 10 s tags need three subscribe calls today.
- **Files**: `S7DriverOptions.cs` (extend `S7TagDefinition` with optional
`ScanGroup` string). `S7Driver.cs` (`SubscribeAsync` partitions the input
list into one poll loop per distinct interval; `PollGroupEngine`-style
internal group, but driver-local — same engine the TwinCAT driver uses).
- **Tests**: unit test with three tags at three rates asserts three independent
poll-tick streams; integration test asserts no group starves the others.
- **Risks**: the `_gate` semaphore still serializes — three poll loops can
contend. Document the contention as part of the "1 connection / 1 mailbox"
invariant; if it bites, follow-up adds a fairness queue.
- **Effort**: M.
- **Deps**: none.
- **Docs / fixture / e2e**: adds a "Per-tag scan groups" subsection to
`docs/v2/s7.md` documenting `S7TagDefinition.ScanGroup`, the multi-rate
partitioning semantics, and the `_gate` contention caveat; no CLI doc
change (CLI is single-tag); no Snap7 seed change required (existing
scalar seeds suffice); adds `S7ScanGroupPartitioningTests` (unit) +
`Driver_three_scan_groups_publish_independently` smoke test that
subscribes 3 tags at 100 ms / 1 s / 10 s rates and asserts
independent tick streams; no `scripts/e2e/test-s7.ps1` change
(subscribe assertion already covers the polling path).
#### PR-S7-C4 — Deadband / on-change with thresholds
Closes gap #21. `PollOnceAsync` currently does `!Equals(prev, current)` only —
no analog deadband.
- **Files**: `S7DriverOptions.cs` (extend `S7TagDefinition` with
`DeadbandAbsolute double?` and `DeadbandPercent double?`). `S7Driver.cs`
(`PollOnceAsync` evaluates per-tag deadband for numeric types; non-numeric
types fall through to exact equality).
- **Tests**: unit tests for absolute and percent deadbands at edge cases
(NaN, ±Infinity, sign flip, near-zero percent).
- **Risks**: percent deadband against a zero baseline diverges; document and
fall back to absolute when |baseline| < 1e-6.
- **Effort**: S.
- **Deps**: PR-S7-C3 helpful but not required.
- **Docs / fixture / e2e**: adds a "Deadband / on-change" subsection to
`docs/v2/s7.md` documenting `DeadbandAbsolute` / `DeadbandPercent` per
tag, NaN / ±Infinity / sign-flip / near-zero-percent edge cases, and
the |baseline| < 1e-6 fallback; no CLI doc change (CLI's `subscribe`
already polls on change); no Snap7 seed change; adds
`S7DeadbandTests` (unit) covering all edge cases — no integration test
required since deadband is pre-publish filtering inside the polling
loop; no `scripts/e2e/test-s7.ps1` change.
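The deadband decision can be written as one pure predicate. The Python sketch below is illustrative only: `should_publish` is a hypothetical name, and the policies for NaN (publish only on a transition into or out of NaN), for infinities (exact comparison), and for percent taking precedence over absolute when both are set are assumptions, since the plan lists these as edge cases without fixing their behavior.

```python
import math

NEAR_ZERO = 1e-6  # below this |baseline|, percent deadband is not used


def _is_number(v) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def should_publish(prev, current, deadband_absolute=None, deadband_percent=None) -> bool:
    """Return True when `current` differs enough from `prev` to publish."""
    # Non-numeric types fall through to exact equality.
    if not (_is_number(prev) and _is_number(current)):
        return prev != current
    # Assumption: NaN -> NaN is not a change; entering or leaving NaN is.
    if math.isnan(prev) or math.isnan(current):
        return math.isnan(prev) != math.isnan(current)
    # Assumption: infinities bypass deadband and compare exactly.
    if math.isinf(prev) or math.isinf(current):
        return prev != current
    delta = abs(current - prev)
    if deadband_percent is not None and abs(prev) >= NEAR_ZERO:
        return delta > abs(prev) * deadband_percent / 100.0
    # Near-zero baseline (or no percent configured): fall back to absolute.
    if deadband_absolute is not None:
        return delta > deadband_absolute
    return delta != 0
```

A sign flip such as -1.0 to 1.0 is handled by the magnitude of the delta, so it publishes under any percent deadband below 200.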
#### PR-S7-C5 — Pre-flight PUT/GET enablement test
Closes gap #24. We currently surface `BadDeviceFailure` only at first read.
Add a pre-flight check during `InitializeAsync` (after `OpenAsync`) that issues
one trivial read (`MW0` or the configured `Probe.ProbeAddress`) and surfaces
the dedicated diagnostic message before declaring `DriverState.Healthy`.
- **Files**: `S7Driver.cs` (`InitializeAsync` adds the probe read; on
`S7.Net.PlcException` with the PUT/GET-disabled error code, throw a
typed `S7PutGetDisabledException` with a configuration-fix hint).
- **Tests**: integration test toggles a Snap7 simulator quirk that mimics
the PUT/GET-disabled response (Snap7 doesn't model this; gate the test
on a `--with-real-plc` opt-in or document as live-firmware-only).
- **Risks**: pre-flight against a real `Probe.ProbeAddress` requires the
address to exist in the PLC; document that the default `MW0` is fine for
most installs but allow `null` / "skip" for sites that haven't wired one.
- **Effort**: S.
- **Deps**: none.
- **Docs / fixture / e2e**: extends the "PUT/GET must be enabled" section
of `docs/Driver.S7.Cli.md` with the new typed
`S7PutGetDisabledException` message + the "skip pre-flight" knob;
adds the same content as a "Pre-flight PUT/GET enablement" subsection
in `docs/v2/s7.md`; no Snap7 seed change (snap7 doesn't model
PUT/GET-disabled — the test for the success path uses the existing
MW0 seed); adds `Driver_preflight_passes_when_probe_address_seeded`
smoke test; documents the live-firmware test as gated on a
`--with-real-plc` opt-in flag in `docs/drivers/S7-Test-Fixture.md`
§"Follow-up candidates"; no `scripts/e2e/test-s7.ps1` change (probe
test already runs first).
### Phase 4 — Workflow (symbol import + UDTs + instance DBs)
#### PR-S7-D1 — Symbol-table / TIA Portal export browse
Closes gap #5. Operators currently hand-edit `S7TagDefinition` JSON. TIA Portal
exports symbols as **`.s7p` archive → External tags → CSV / SDF**. The lighter
target is the CSV format used by the "Generate source from blocks" exporter.
- **Files**: new `src/ZB.MOM.WW.OtOpcUa.Driver.S7/SymbolImport/` directory:
- `TiaCsvImporter.cs` — parses TIA Portal "Show all tags" CSV (`Name`,
`Address`, `Data type`, `Comment`, `Visible in HMI`). Output: list of
`S7TagDefinition`.
- `AwlImporter.cs` — best-effort AWL `VAR_GLOBAL` / `DATA_BLOCK` parser
for legacy STEP 7 Classic projects.
- **Files (Admin UI)**: an "Import S7 symbols" button on the Driver Tags tab
that POSTs the file to a new `POST /api/drivers/{id}/import-s7-symbols`
endpoint and reports the diff.
- **Tests**: unit tests with golden-input CSV / AWL fixtures; round-trip
test that imports → produces tags → reads against simulator.
- **Risks**: TIA Portal CSV is locale-dependent (decimal-comma in DE locale).
Detect from the header row and accept both. UDT-typed symbols import as
a placeholder until PR-S7-D2.
- **Effort**: L (5-7 days incl. the Admin UI flow).
- **Deps**: see Open Question (c) — confirm CSV+AWL is the right scope, or
whether `.s7p` / `.zip` archive parsing is required.
- **Docs / fixture / e2e**: adds new doc
`docs/drivers/S7-TIA-Import.md` documenting the supported TIA Portal
CSV format (column names, locale-comma detection, UDT-typed
placeholders) and the AWL `VAR_GLOBAL` / `DATA_BLOCK` parser scope;
cross-links it from `docs/v2/s7.md`'s new "Symbol import" section
and from `docs/Driver.S7.Cli.md` with a future `import` subcommand
hook; adds golden-input fixtures
`tests/ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests/Fixtures/sample_tia_export.csv`,
`sample_tia_export_de_locale.csv`, and `sample_step7_classic.awl`;
no Snap7 seed change required (existing DB1 seeds support
the import-then-read round-trip); adds `TiaCsvImporterTests` and
`AwlImporterTests` (unit) + `Driver_imports_csv_then_reads_seeded_tags`
integration test that imports the sample CSV → reads via Snap7;
no `scripts/e2e/test-s7.ps1` change (Admin-UI flow has its own
end-to-end coverage in the Admin UI test suite).
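The header-row locale detection could look roughly like this. The Python sketch below uses the column names listed for `TiaCsvImporter.cs`; everything else is an assumption made for illustration: that DE-locale exports separate fields with `;`, that addresses may carry a leading `%`, and that plain dicts stand in for `S7TagDefinition`. The function names are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("Name", "Address", "Data type")


def sniff_delimiter(header_line: str) -> str:
    """Assumption: decimal-comma locales export with ';' as field separator."""
    return ";" if header_line.count(";") > header_line.count(",") else ","


def import_tia_csv(text: str) -> list:
    """Parse an exported tag table into simple tag dicts."""
    text = text.lstrip("\ufeff")  # drop a UTF-8 BOM if present
    lines = text.splitlines()
    if not lines:
        return []
    reader = csv.DictReader(io.StringIO(text), delimiter=sniff_delimiter(lines[0]))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    tags = []
    for row in reader:
        tags.append({
            "name": row["Name"].strip(),
            # Assumption: strip the '%' prefix some exports put on addresses.
            "address": row["Address"].strip().lstrip("%"),
            "data_type": row["Data type"].strip(),
            "comment": (row.get("Comment") or "").strip(),
        })
    return tags
```

Both locale variants should produce identical tag lists, which is the property the two golden-input CSV fixtures are positioned to check.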
#### PR-S7-D2 — UDT / STRUCT / nested-DB handling
Closes gap #6. Today's tag map is flat scalar-only; UDT-typed DBs are
unusable without hand-flattening every member.
- **Files**: `S7DriverOptions.cs` (extend `S7TagDefinition` with `UdtName string?`;
alongside, a new `IReadOnlyList<S7UdtDefinition> Udts` on the options that
declares the layout: name, ordered members `(Name, Offset, S7DataType, ArrayDim?)`).
`S7Driver.cs` fans a UDT-typed tag into per-member sub-tags at `InitializeAsync`,
so the read/write path stays scalar-only.
- **Tests**: unit tests for fan-out with nested UDTs (UDT-of-UDT); integration
test with a Snap7 DB seeded as a UDT-shape byte array proves the fan-out
decodes correctly.
- **Risks**: UDT-of-UDT arbitrary nesting depth — cap at 4 levels and reject
deeper with a clear error. Optimized DBs would let TIA reorder members,
re-introducing gap #1; document that user-defined UDTs require "Optimized
block access" off, same as the general DB rule.
- **Effort**: L (1-2 weeks).
- **Deps**: PR-S7-D1 (symbol importer imports UDT-typed entries as
placeholders; D2 makes those usable).
- **Docs / fixture / e2e**: adds a "UDT / STRUCT support" section to
`docs/v2/s7.md` documenting `S7UdtDefinition`, the fan-out
semantics, the 4-level nesting cap, and the "Optimized block access
must be off" prerequisite; extends `docs/drivers/S7-TIA-Import.md`
(created in PR-S7-D1) with a UDT-typed-entry section showing how
the importer + `Udts` declaration cooperate; updates
`docs/drivers/S7-Test-Fixture.md` §"What it does NOT cover" item 5 to
remove "UDT fan-out"; extends `Docker/server.py` with a
`udt_layout` meta-seed-type that lays out per-member offsets within
a DB byte range; seeds a `DB1.MyUdt[400]` (e.g. Real + Int + Bool)
in `Docker/profiles/s7_1500.json`; adds `S7UdtFanOutTests` (unit) +
`Driver_fans_out_udt_into_member_tags` integration test covering a
nested-UDT case; adds a UDT-member round-trip assertion to
`scripts/e2e/test-s7.ps1`.
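The fan-out with its nesting cap can be expressed as a short recursion. In this Python sketch the UDT layouts are a plain dict of `(member name, offset, data type)` tuples standing in for `S7UdtDefinition`, and `fan_out_udt` is a hypothetical name; array members are left out for brevity.

```python
MAX_NESTING = 4  # nesting cap from the Risks bullet


def fan_out_udt(tag_name: str, base_offset: int, udt_name: str,
                udts: dict, depth: int = 1) -> list:
    """Expand a UDT-typed tag into (path, absolute offset, scalar type) tuples."""
    if depth > MAX_NESTING:
        raise ValueError(f"UDT nesting deeper than {MAX_NESTING} at {tag_name}")
    if udt_name not in udts:
        raise KeyError(f"unknown UDT: {udt_name}")
    members = []
    for member_name, offset, data_type in udts[udt_name]:
        path = f"{tag_name}.{member_name}"
        absolute = base_offset + offset
        if data_type in udts:
            # Nested UDT: recurse, carrying the accumulated byte offset.
            members.extend(fan_out_udt(path, absolute, data_type, udts, depth + 1))
        else:
            members.append((path, absolute, data_type))
    return members
```

A self-referencing UDT terminates at the cap with the same error as a legitimately deep one, so no separate cycle check is needed in this sketch.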
#### PR-S7-D3 — Instance-DB / FB parameter access
Closes gap #10. Multi-instance FBs are addressed symbolically (`MyFB_Instance.MyParam`)
with no fixed absolute DB byte offset visible without a TIA project export.
- **Files**: extends PR-S7-D1's importer to recognize "instance DB" entries
(TIA export shows them with a different "DB type" column value); the
importer translates `MyFB_Instance.MyParam` to the resolved
`DBn.DBW_offset` based on the FB's interface declaration in the export.
- **Tests**: golden-input test with an FB-instance DB export; resolved
addresses match Siemens reference.
- **Risks**: when the FB interface changes (TIA "online change"), instance-DB
layouts shift. Document that re-import is required after any FB-interface
edit. Eventually surface this as a startup warning when the symbol-table
hash differs from the imported snapshot — out of scope for this PR.
- **Effort**: M.
- **Deps**: PR-S7-D1, PR-S7-D2.
- **Docs / fixture / e2e**: extends `docs/drivers/S7-TIA-Import.md` with
an "Instance DBs / FB parameters" section covering the importer's
`MyFB_Instance.MyParam` → `DBn.DBW_offset` resolution, the "DB type"
column convention, and the "re-import on FB-interface edit" caveat;
adds the same caveat as a paragraph in `docs/v2/s7.md`'s "UDT /
STRUCT" section; adds a golden-input fixture
`Fixtures/sample_tia_export_with_fb_instance.csv` to the integration
tests; no Snap7 seed change required (resolved addresses land in DB1
which the existing seeds back); adds
`InstanceDbResolverTests` (unit) +
`Driver_resolves_fb_instance_then_reads_seeded_member` integration
test; no `scripts/e2e/test-s7.ps1` change (FB-instance lookup is an
import-time concern).
### Phase 5 — Diagnostics & security
#### PR-S7-E1 — CPU diagnostic buffer / SZL reads
Closes gap #11. SZL (System Status List) IDs surface CPU type, firmware
version, cycle-time min/avg/max, and the diagnostic-buffer entries.
- **Files**: `S7Driver.cs` exposes a small set of "system tags" alongside
`Tags` — virtual addresses prefixed `@System.` that the read path
recognizes and dispatches to S7netplus's `ReadSzlAsync` (or, if not
exposed, a raw `Plc.ReadBytes` against the SZL-via-S7comm sub-protocol):
- `@System.CpuType`, `@System.Firmware`, `@System.OrderNo` — SZL 0x0011
- `@System.CycleMs.Min` / `.Max` / `.Avg` — SZL 0x0132 / 0x0432
- `@System.DiagBuffer[0..N]` — SZL 0x00A0 ring-buffer entries
- **Files (discovery)**: `DiscoverAsync` adds a `Diagnostics/` subfolder
with the system-tag set when `S7DriverOptions.ExposeSystemTags = true`.
- **Tests**: unit tests for the SZL response parser (golden bytes); live-
firmware test against the dev-box S7-1500.
- **Risks**: S7netplus's SZL surface is incomplete; may need a raw
`Plc.ReadBytes` against `0x84` register or a small SZL-PDU helper.
- **Effort**: M-L.
- **Deps**: PR-S7-C1 (`DriverHealth.Diagnostics` dictionary already there).
- **Docs / fixture / e2e**: adds a "CPU diagnostics (SZL)" section to
`docs/v2/s7.md` listing the exposed `@System.*` virtual addresses, the
underlying SZL IDs, and the `ExposeSystemTags` opt-in; extends
`docs/Driver.S7.Cli.md` with a worked `read -a @System.CpuType` example
in the cookbook; updates `docs/drivers/S7-Test-Fixture.md` §"What it
does NOT cover" with a note that snap7 does not implement SZL — golden-
byte unit tests cover the parser, live SZL is gated on a real S7-1500;
no Snap7 seed change (snap7 returns a fixed handshake banner, which the
test uses to exercise the "SZL not supported on simulator" branch); adds
`S7SzlParserTests` (unit) with golden bytes; documents the live SZL
test in `docs/drivers/S7-Test-Fixture.md` §"Follow-up candidates"; no
`scripts/e2e/test-s7.ps1` change.
#### PR-S7-E2 — PLC password / protection-level handling
Closes gap #14. S7-300/400 protection levels 1-3 and S7-1200/1500 connection
mechanisms can require a password on connect.
- **Files**: `S7DriverOptions.cs` (new `Password string?` and `ProtectionLevel`
enum). `S7Driver.cs` calls S7netplus's `SetPassword` (if the API surfaces it
— newer S7netplus versions ship `Plc.SendPassword(string)`; if not, raw-PDU
fallback per Siemens "Communication Function Manual" §5.2).
- **Tests**: live-firmware-gated; password-tier failure modes don't reproduce
in Snap7. Unit-level coverage for the options-binding shape only.
- **Risks**: S7netplus may not expose password auth — fallback is to call into
the lower-level `S7.Net.S7Protocol` types or to fork. Land the options
surface unconditionally, gate the wire path on library support, document
the limitation if the library doesn't oblige.
- **Effort**: M (S if S7netplus ships it; L if we need a fallback path).
- **Deps**: none.
- **Docs / fixture / e2e**: adds a "PLC password / protection levels"
section to `docs/v2/s7.md` documenting the `Password` /
`ProtectionLevel` options + the S7-300/400 levels 1-3 vs S7-1200/1500
connection-mechanism semantics + the "limitation if S7netplus
doesn't ship `SendPassword`" note; adds a `--password` flag to
`docs/Driver.S7.Cli.md`'s common-flags table with a hardened-CPU
worked example; updates `docs/drivers/S7-Test-Fixture.md` §"What it
does NOT cover" with a "password / protection levels not modelled by
snap7" note; no Snap7 seed change (snap7 doesn't enforce protection
levels); adds options-binding unit tests only — no integration test
(live-firmware-only); no `scripts/e2e/test-s7.ps1` change.
### Phase 6 — S7-1500 Optimized DB / Symbolic addressing (decision PR)
#### PR-S7-F — Optimized DB / S7Plus
Closes gap #1. **This is an architectural decision PR, not a code PR.**
S7netplus speaks classic S7comm only. Optimized DBs on S7-1500 (default for
new TIA projects) reorder fields and have no fixed byte offsets — absolute
`DB1.DBW0` reads return `BadDeviceFailure`. Three tracks:
1. **Document the constraint and stay on S7netplus.** Operators must uncheck
"Optimized block access" in TIA Portal for any DB the driver reads. This
is what the test fixture already documents. Effort: S (docs only).
2. **Migrate to a library that supports S7Plus.**
- **Snap7 v2 / `Snap7Net`** — C-library wrapper, supports classic S7comm
only (same limitation as S7netplus). Not a fix.
- **Sharp7 fork** — community fork of Snap7 with **partial** S7-1200/1500
PUT/GET semantics. Still classic S7comm.
- **Custom S7Plus implementation** — Wireshark dissector exists; reverse
engineering is substantial. Effort: ≥ 4 weeks; ongoing protocol-version
maintenance. Risk: Siemens has not published S7Plus.
3. **Embed an OPC UA → OPC UA bridge to the S7-1500's onboard OPC UA server.**
The S7-1500 V2.5+ exposes its own OPC UA server with full symbolic access.
Our `OPC UA Client driver` (already shipping) could read the
target CPU's OPC UA server and re-publish, which sidesteps S7Plus entirely.
Effort: S; caveat: requires the customer to license Siemens OPC UA
on the CPU. Most modern S7-1500 deployments already license it.
**Recommendation**: ship Track 1 docs immediately (closes the operator
expectation gap) and Track 3 as the Optimized-DB workflow path (re-uses
existing OPC UA Client driver). Track 2 (S7Plus reverse-engineering) is
out of scope unless a customer pays for it.
- **Files**: `docs/v2/s7.md` (Optimized DB section + how to disable),
`docs/featuregaps.md` row #1 updated to reflect the Track 1+3 decision.
- **Tests**: live-firmware test against the dev-box S7-1500 with optimized
block access toggled both ways, asserting `BadDeviceFailure` vs
successful read.
- **Risks**: Track 3's OPC-UA-Client-bridging needs Admin UI plumbing to
configure; that's a larger workstream tracked separately.
- **Effort**: S (docs + decision); L if Track 2 is taken.
- **Deps**: Open Question (a) below.
- **Docs / fixture / e2e**: rewrites `docs/v2/s7.md` to land a
prominent "Optimized DB constraint" section at the top — explicitly
documents the S7-1200 V4.0+ / S7-1500 default, the
`BadDeviceFailure` shape on absolute `DB1.DBW0` reads against an
optimized DB, the "Uncheck Optimized block access in TIA Portal"
fix, and the recommended **bridge-via-OpcUaClient** pattern with a
worked example (Siemens S7-1500 V2.5+ onboard OPC UA server →
`OpcUaClient` driver → re-publish on the OtOpcUa server's address
space); updates `docs/featuregaps.md` row #1 to reflect the
Track 1+3 decision; updates the "Optimized-DB" line of
`docs/drivers/S7-Test-Fixture.md` §"What it does NOT cover" item 4
to point at the new doc; no CLI doc change (CLI is a probe tool, not
the bridging path); no Snap7 fixture change (snap7 has no Optimized-
DB mode); the live-firmware test toggling Optimized block access on
/ off is recorded as a manual checklist in
`docs/drivers/S7-Test-Fixture.md` §"Follow-up candidates" and gated
behind `--with-real-plc`; if Track 2 is taken later, this PR's doc
surface becomes the migration baseline; no `scripts/e2e/test-s7.ps1`
change.
---
## Documentation, fixture, and e2e impact
Consolidated view of every per-PR `Docs / fixture / e2e` line above, so a
reviewer can see the cross-cutting churn at a glance and so the doc /
fixture / e2e maintainers can sequence their work alongside the code PRs.
### User-facing documentation churn
| PR | `docs/v2/s7.md` | `docs/Driver.S7.Cli.md` | `docs/drivers/S7-Test-Fixture.md` | New / cross-cut docs |
|----|-----------------|-------------------------|------------------------------------|----------------------|
| PR-S7-A1 (LInt/ULInt/LReal/LWord) | extend type-mapping table | new sizes in cookbook | remove "no 64-bit types" | — |
| PR-S7-A2 (STRING/WSTRING/CHAR/WCHAR) | string layout subsection | `--type WString` / `--string-length` | list new types | — |
| PR-S7-A3 (DTL/DT/S5TIME/TIME/TOD/DATE) | "Date / time types" subsection | datetime cookbook entries | list new types | — |
| PR-S7-A4 (arrays) | "Array tags (ValueRank=1)" subsection | `--array-count` flag + examples | list array round-trips | — |
| PR-S7-A5 (V-memory) | "LOGO! 8 / S7-200 V-memory" subsection | grammar table + S7200Smart example | parser coverage note | — |
| PR-S7-B1 (PDU packing) | "Performance — multi-variable PDU packing" subsection | — | — | — |
| PR-S7-B2 (block coalescing) | "Block-read coalescing" subsection | — | — | — |
| PR-S7-C1 (negotiated PDU diag) | "Diagnostics surfacing" subsection | — | "negotiated PDU size" line | — |
| PR-S7-C2 (TSAP) | "TSAP / Connection Type" section | `--tsap-mode` / `--local-tsap` / `--remote-tsap` flags | — | — |
| PR-S7-C3 (scan groups) | "Per-tag scan groups" subsection | — | — | — |
| PR-S7-C4 (deadband) | "Deadband / on-change" subsection | — | — | — |
| PR-S7-C5 (PUT/GET pre-flight) | "Pre-flight PUT/GET enablement" subsection | extend "PUT/GET must be enabled" | mark live-firmware test | — |
| PR-S7-D1 (TIA CSV / AWL import) | "Symbol import" cross-link | future `import` subcommand stub | — | **new `docs/drivers/S7-TIA-Import.md`** |
| PR-S7-D2 (UDT / STRUCT) | "UDT / STRUCT support" section | — | remove "UDT fan-out" | extend `S7-TIA-Import.md` |
| PR-S7-D3 (instance DB) | re-import-on-FB-edit caveat | — | — | extend `S7-TIA-Import.md` |
| PR-S7-E1 (SZL diagnostics) | "CPU diagnostics (SZL)" section | `read -a @System.CpuType` example | "SZL not modelled by snap7" + Follow-up | — |
| PR-S7-E2 (PLC password) | "PLC password / protection levels" section | `--password` flag | "password not modelled by snap7" | — |
| PR-S7-F (Optimized DB / S7Plus) | top-level "Optimized DB constraint" + bridge-via-OpcUaClient worked example | — | point §"What it does NOT cover" at new doc | also updates `docs/featuregaps.md` row #1 |
### Snap7-server fixture seed-type additions per PR
The snap7 simulator at `localhost:1102` (driven by
`tests/ZB.MOM.WW.OtOpcUa.Driver.S7.IntegrationTests/Docker/server.py` +
`Docker/profiles/s7_1500.json`) has a `seed_buffer` pump with a fixed type
set — `u8 / i8 / u16 / i16 / u32 / i32 / f32 / bool / ascii`. New PRs need
new seed-type cases in `server.py`, new offsets in `s7_1500.json`, and
matching constants in `S7_1500Profile.cs`. The table below names the
delta for each Build-Yes PR:
| PR | New `server.py` seed types | New `s7_1500.json` seed offsets | `S7_1500Profile.cs` additions |
|----|----------------------------|----------------------------------|-------------------------------|
| PR-S7-A1 | `i64`, `u64`, `f64` | `DB1.DBL40` (i64), `DB1.DBL48` (f64), `DB1.DBL56` (u64) | `SmokeI64Tag` / `SmokeU64Tag` / `SmokeF64Tag` |
| PR-S7-A2 | `wstring`, `char`, `wchar` (existing `ascii` covers STRING) | `DB1.WSTRING[256]`, `DB1.CHAR[300]` | `SmokeWStringTag` / `SmokeCharTag` |
| PR-S7-A3 | `dtl`, `dt`, `s5time`, `time`, `tod`, `date` (golden-byte vectors in comments) | `DB1.DTL[260]`, `DB1.DT[272]`, `DB1.S5TIME[280]`, `DB1.TIME[284]`, `DB1.TOD[288]`, `DB1.DATE[292]` | `SmokeDtl` / `SmokeDt` / `SmokeS5Time` / `SmokeTime` / `SmokeTod` / `SmokeDate` |
| PR-S7-A4 | `array` meta-seed (inner-type + count) | `DB1.ArrayInt[300]` 10×Int, `DB1.ArrayReal[320]` 10×Real | `ArrayInt10Tag` / `ArrayReal10Tag` |
| PR-S7-A5 | none (V-memory aliases land in DB1, which `server.py` already exposes) | none | unit-only — no profile change |
| PR-S7-B1 | none | none (existing scalar seeds suffice for packing) | none — perf integration test reuses scalar tags |
| PR-S7-B2 | none | none (existing contiguous DBW0 / DBW10 / DBD20 already test merge) | none |
| PR-S7-C1 | none | none | none |
| PR-S7-C2 | none (snap7 accepts any TSAP) | none | none |
| PR-S7-C3 | none | none | none |
| PR-S7-C4 | none | none | none |
| PR-S7-C5 | none (existing `MK0` MW0 seed covers success path) | none | none |
| PR-S7-D1 | none (CSV import lands tags pointing at existing seeds) | none | possibly add fixture-pointer constants |
| PR-S7-D2 | `udt_layout` meta-seed (per-member offsets) | `DB1.MyUdt[400]` (Real + Int + Bool layout) | `MyUdtTag` + member tags |
| PR-S7-D3 | none (resolved addresses land in DB1) | none | none |
| PR-S7-E1 | none — snap7 doesn't model SZL; unit-level golden bytes cover the parser | none | none |
| PR-S7-E2 | none — snap7 doesn't enforce protection levels; options-binding unit tests only | none | none |
| PR-S7-F | none — snap7 has no Optimized-DB mode; live-firmware checklist instead | none | none |
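As a rough illustration of the PR-S7-A1 row, the sketch below encodes the three proposed 64-bit seed types into a DB1 image at the offsets named in the table. It assumes seeds are packed with Python's `struct` module and relies on S7 data being big-endian on the wire. `SEED_FORMATS`, `encode_seed`, and `write_seed` are hypothetical names, not taken from the existing `seed_buffer` code.

```python
import struct

# Hypothetical format table: big-endian struct codes for the new seed types.
SEED_FORMATS = {
    "i64": ">q",  # LInt
    "u64": ">Q",  # ULInt / LWord
    "f64": ">d",  # LReal
}


def encode_seed(seed_type: str, value) -> bytes:
    """Encode one seed value as big-endian bytes."""
    return struct.pack(SEED_FORMATS[seed_type], value)


def write_seed(buffer: bytearray, offset: int, seed_type: str, value) -> None:
    """Copy the encoded seed into the DB image at the given byte offset."""
    data = encode_seed(seed_type, value)
    buffer[offset:offset + len(data)] = data


# Offsets 40 / 48 / 56 follow the PR-S7-A1 row of the table above.
db1 = bytearray(64)
write_seed(db1, 40, "i64", 0x7FFF_FFFF_FFFF_FFFF)
write_seed(db1, 48, "f64", 1.5)
write_seed(db1, 56, "u64", 2**64 - 1)
```

Keeping the format codes in one table means a later PR (for example the date / time types) would only add entries rather than new branches.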
### E2E `scripts/e2e/test-s7.ps1` impact
`scripts/e2e/test-s7.ps1` runs the five-assertion CLI loopback (probe /
driver-loopback / forward-bridge / reverse-bridge / subscribe-sees-change)
against `DB1.DBW0` Int16. Build-Yes PRs that add CLI surface get a
matching loopback assertion; PRs that touch only internals or admin-UI
flows do not.
| PR | E2E script change |
|----|-------------------|
| PR-S7-A1 | add LInt loopback assertion (write 0x7FFFFFFFFFFFFFFF, read back) |
| PR-S7-A2 | add string round-trip assertion |
| PR-S7-A3 | none (CLI cookbook covers manual surface) |
| PR-S7-A4 | add array round-trip assertion |
| PR-S7-A5 | none (live-LOGO! field-only) |
| PR-S7-B1 | none |
| PR-S7-B2 | none |
| PR-S7-C1 | none |
| PR-S7-C2 | none (live-firmware-only) |
| PR-S7-C3 | none (subscribe assertion already covers polling) |
| PR-S7-C4 | none |
| PR-S7-C5 | none (probe runs first today) |
| PR-S7-D1 | none (Admin UI has its own e2e) |
| PR-S7-D2 | add UDT-member round-trip assertion |
| PR-S7-D3 | none (import-time concern) |
| PR-S7-E1 | none |
| PR-S7-E2 | none (live-firmware-only) |
| PR-S7-F | none (decision PR; live-firmware checklist instead) |
---
## Skip-rated items (for context)
| # | Gap | Skip rationale |
|---|-----|---------------|
| 12 | AS-Alarms / Alarm_S / ProDiag | Alarms are a separate workstream; no `IAlarmSource` shipped on this driver yet, and the gap analysis flags it as a deferred topic. |
| 13 | CPU Run / Stop control / block download | Security and safety risk. PG-class writes that change CPU state are explicitly out of scope. |
| 15 | S7-1500 Secure Communication / TLS | Significant work; S7netplus has no TLS surface. Reconsider when S7Plus track is taken. |
| 16 | S7-400H redundant H-system support | Rare in our deployment scope. Server-level redundancy (`docs/Redundancy.md`) covers the OPC UA layer; H-system driver-level failover is a separate axis. |
| 17 | Multi-CPU rack parallel sessions | One session per CPU works for the deployments we target; multi-CPU racks are an S7-400 niche. |
| 18 | MPI / Profibus / RFC1006-routed transports | Declining use; brownfield only. S7netplus is Ethernet-only. |
| 23 | Connection-resource budget / parallel jobs | One connection works; premature optimization until a deployment hits the cap. |
---
## Open questions
### (a) Library choice for S7Plus
PR-S7-F gates on this decision. Options:
1. **Stay on S7netplus + document Optimized-DB constraint** (preferred default).
2. **Fork to Sharp7 / Snap7 v2** — does *not* solve the S7Plus / Optimized-DB
problem; both are classic S7comm only. Adopting them buys nothing for this
gap. Reject unless we want it for unrelated reasons.
3. **Custom S7Plus client over Wireshark-dissected protocol** — large effort,
ongoing maintenance risk. Only if a customer is paying.
4. **OPC UA → OPC UA bridge via existing OPC UA Client driver** — sidesteps
S7Plus by re-using Siemens's onboard OPC UA server. Recommended secondary
track.
Decision needed before Phase 6 PR-S7-F kicks off.
### (b) `WriteIdempotent` semantics for new types
The `WriteIdempotent` per-tag flag (decisions #44, #45, #143) governs
replay-safe writes. New types from Phase 1:
- **STRING / WSTRING** — typically idempotent (recipe / message text).
Replay-safe by default? **Need confirmation.** Risk: PLC programs that
treat a new string write as a "new message" event would double-fire.
- **DTL / DT** — usually written from a clock master; replay-safe.
- **Arrays of UDT** — depends on the UDT semantics (recipe = safe, command
block = unsafe). Inherit `WriteIdempotent` from the parent tag, do not
add a per-member flag.
- **64-bit types** — same rule as 32-bit equivalents.
Default: keep `WriteIdempotent = false` for everything. Operators flip per
tag based on PLC program semantics. **No semantic extension needed**, but
document the per-type guidance in `docs/v2/s7.md`.
### (c) Symbol-import file format(s)
PR-S7-D1 ships an importer. Which formats?
- **TIA Portal CSV** (Show all tags / Export) — preferred entry point;
most common. **Confirm.**
- **TIA Portal SDF / Excel** — same data; harder to parse. Skip unless
customer demand emerges.
- **STEP 7 Classic AWL / SCL `.AWL`** — secondary. Useful for legacy
S7-300/400 sites still on Classic. **Include in D1?**
- **`.s7p` / `.zap` project archive** — full TIA project. ZIP-shaped;
symbol export would require unpacking and parsing internal XML. Large
scope. **Defer.**
- **`.udt` / `.SDF` external tag library** — niche; defer unless asked.
Recommendation: PR-S7-D1 ships **TIA CSV** + **AWL** only. Anything else is
a follow-up. Decision needed before Phase 4 work begins.
@@ -0,0 +1,899 @@
# TwinCAT Driver — Implementation Plan
> Source of gap analysis: [featuregaps.md → TwinCAT](../featuregaps.md#twincat-beckhoff-ads)
>
> Covers Build = Yes items only.
## Summary
The TwinCAT driver (`src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/`) ships a solid baseline:
six capability interfaces over `Beckhoff.TwinCAT.Ads` v6 `AdsClient`, native
`AdsTransMode.OnChange` notifications, AMS address parsing, symbol-path parser
with multi-dim subscripts, controller-side browse with system-symbol filtering,
and a 30-case live integration suite against TCBSD + Hyper-V XAR. Twelve gaps
remain rated Build=Yes in `docs/featuregaps.md` and they cluster cleanly into
five themes:
1. **Data-type correctness**: `LInt`/`ULInt` silently truncated to Int32
(explicit `// matches Int64 gap` comment in `TwinCATDataType.cs:40`),
`TIME`/`DATE`/`DT`/`TOD` marshalled as raw `UDINT` rather than native UA
types, `ENUM`/`ALIAS` skipped at browse, bit-indexed BOOL writes throw,
multi-dim and whole-array reads not batched.
2. **Performance** — every read is a `ReadValueAsync` call with re-resolved
symbolic name; no Sum commands, no handle caching. Multi-thousand-tag
scans pay symbol resolution + per-tag AMS round-trip cost on every cycle.
3. **Operability**: `NotificationSettings(OnChange, cycleMs, 0)` clamps
max-delay to zero with no per-tag override; probe loop only checks
reachability — no cycle-time / jitter / `_AppInfo` / RT-state telemetry.
4. **UDT decomposition**: `Structure` is declared in the enum but discovery
skips non-atomic symbols (`AdsTwinCATClient.cs:224`); to expose nested UDT
trees we need TMC-file parsing or runtime data-type table introspection.
5. **Alarms** — no `IAlarmSource` implementation; TC3 EventLogger / AMS port
110 events never surface as OPC UA AC events.
The plan ships as five phases / 12 PRs. Phases 1-3 are all narrow scope and can
land in parallel where dependencies allow. Phase 4 (UDT/TMC) is the largest
single piece of work and is called out as such. Phase 5 (alarms) requires
investigation up front (Beckhoff TC3 EventLogger NuGet availability — see
Open questions).
Hyper-V conflict gating: live integration runs against the TCBSD VM
(`docs/drivers/TwinCAT-Test-Fixture.md`, AmsNetId `41.169.163.43.1.1` at
`10.100.0.128`) since the local Hyper-V XAR can't co-exist with Docker
Desktop. All wire-level tests gate on `[TwinCATFact]` / `[TwinCATTheory]`
and skip cleanly when `TWINCAT_TARGET_NETID` is unset.
## Phased delivery
| Phase | Theme | PRs | Sequencing |
|---|---|---|---|
| 1 | Data-type correctness | 1.1 — 1.5 | Independent; ship in any order |
| 2 | Performance — Sum + handles | 2.1 — 2.3 | 2.3 depends on 2.2 |
| 3 | Operability — max-delay + diagnostics | 3.1 — 3.2 | Independent |
| 4 | UDT decomposition with TMC parsing | 4.1 | Stand-alone; significant scope |
| 5 | TC3 EventLogger alarms | 5.1 | Stand-alone; spike first |
Total: 12 PRs covering the 12 Build=Yes gaps.
Recommended landing order: **Phase 1 (correctness) → Phase 3 (operability) →
Phase 2 (perf) → Phase 5 (alarms) → Phase 4 (UDT)**. Correctness first because
it's cheap and removes fixtures' `Skip("Int64 gap")`-style workarounds.
Operability before perf because the diagnostics surface created in 3.2 makes it
much easier to validate Sum-command throughput claims in 2.1.
## Per-PR detail
### Phase 1 — Data-type correctness
#### PR 1.1 — Int64 fidelity for `LINT` / `ULINT`
**Scope**: Map `LInt`/`ULInt` to `DriverDataType.Int64` (currently truncates to
Int32 per `TwinCATDataType.cs:40` comment "matches Int64 gap"). `MapToClrType`
already returns `typeof(long)`/`typeof(ulong)`; the truncation is purely in the
`ToDriverDataType` extension.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDataType.cs` — change line 40 to
`=> DriverDataType.Int64;` (drop the gap comment).
- Verify `DriverDataType.Int64` exists in `Core.Abstractions` — if not, add it
(likely scope creep into `ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverDataType.cs`).
**Beckhoff.TwinCAT.Ads API**: none — the wire-level `AdsClient.ReadValueAsync`
already returns `long`/`ulong` boxed in `result.Value` when called with
`typeof(long)` per `MapToClrType`.
**Test plan**:
- Unit: extend `TwinCATCapabilityTests` — assert `LInt.ToDriverDataType() ==
Int64`, `ULInt.ToDriverDataType() == Int64`.
- Integration: extend `GVL_Primitives` with an `LINT` symbol (`vLargeCounter`,
the name used in the fixture bullet below) seeded with `0x1_0000_0000`
(2^32, above Int32 range; `16#1_0000_0000` in the GVL). Add a
`[TwinCATTheory]` case asserting the value round-trips without truncation.
The existing 16-primitive theory in `TwinCAT3SmokeTests.cs` already covers
`LInt`/`ULInt`, so inspect what value it seeds and tighten that assertion too.
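To make the failure mode concrete, the sketch below shows why a seed of 2^32 catches the truncation. It assumes the truncation behaves like a narrowing cast that keeps the low 32 bits, and that the 8-byte `LINT` image is little-endian. `truncate_to_int32` is an illustrative helper, not a driver function (the driver itself is C#).

```python
import struct

SEED = 0x1_0000_0000  # 2**32: low 32 bits are all zero


def truncate_to_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed 32-bit integer."""
    low_bytes = struct.pack("<q", value)[:4]
    return struct.unpack("<i", low_bytes)[0]


wire = struct.pack("<q", SEED)            # 8-byte little-endian LINT image
full = struct.unpack("<q", wire)[0]       # lossless 64-bit decode
narrowed = truncate_to_int32(SEED)        # what a 32-bit mapping would surface
```

A seed below 2^31 would decode identically on both paths, so it could not distinguish a fixed driver from a truncating one.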
**Effort**: S (half day).
**Deps**: none.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` "Data types" table — drop the "marshal as
`UDINT` on the wire" caveat for `LInt` / `ULInt` (this PR keeps Int64 fidelity);
`docs/drivers/TwinCAT-Test-Fixture.md` "Bugs caught by live runs" gains a 4th
entry pinning the truncation regression.
- Fixture (TCBSD PLC project): `PLC/GVLs/GVL_Primitives.TcGVL` adds
`vLargeCounter : LINT := 16#1_0000_0000` (above Int32 range) + matching
`vLargeCounterU : ULINT`; `tests/.../TwinCatProject/README.md` "GVL_Primitives
numeric seeds" enumerates the new symbols.
- Integration tests: `TwinCAT3SmokeTests.cs` — extend the 16-case
`[TwinCATTheory]` to 17/18 cases covering the new LINT/ULINT seeds; assert
the value round-trips without truncation.
- E2E: no change to `scripts/e2e/test-twincat.ps1` — the bridge script targets
a single DINT counter, untouched by Int64 work.
#### PR 1.2 — TIME / DATE / DT / TOD as native UA types
**Scope**: Stop marshalling `TIME` / `DATE` / `DT` / `TOD` as raw `UDINT`
(`AdsTwinCATClient.cs:278-280`). Map according to IEC 61131-3 semantics:
- `TIME` (ms duration) → `DriverDataType.Duration` (UA `Duration` is a
`Double` in milliseconds; add `Duration` to `DriverDataType` if missing).
- `DATE` (seconds since 1970-01-01, always a whole number of days) → `DriverDataType.DateTime` (midnight UTC).
- `DT` (seconds since 1970-01-01) → `DriverDataType.DateTime`.
- `TOD` (ms since midnight) → `DriverDataType.DateTime` (today's date +
offset) or a dedicated `TimeOfDay` type if the abstraction supports it.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDataType.cs` — update
`ToDriverDataType` mapping for the four IEC time types.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — `MapToClrType`
returns the raw UDINT today; keep that for the wire read but post-process
inside `ReadValueAsync` / `ConvertForWrite` to convert UDINT ↔ `DateTime` /
`TimeSpan`. Symmetrical change in `OnAdsNotificationEx` so subscriptions see
the same shape.
**Beckhoff.TwinCAT.Ads API**: still `AdsClient.ReadValueAsync(symbol,
typeof(uint), ct)`. Beckhoff exposes `PlcOpenDate` / `PlcOpenTimeOfDay` etc.
in `TwinCAT.Ads.TypeSystem` — using those types directly would simplify
conversion but tightens our coupling. Investigate during PR.
**Test plan**:
- Unit: round-trip helpers UDINT-since-epoch ↔ `DateTime` for each variant.
- Integration: add `GVL_Primitives.dCurrentTime : DT` seeded with a known
literal (e.g. `DT#2026-01-15-12:00:00`); assert the driver returns a
`DateTime` matching that instant within 1 s.
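The round-trip helpers named in the unit-test bullet could look roughly like the sketch below, shown in Python for brevity although the driver is C#. It covers `DT`, `TIME`, and `TOD` only, treats PLC timestamps as UTC (an assumption; the PLC value carries no time zone), and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def dt_to_datetime(raw: int) -> datetime:
    """DT: raw UDINT seconds since the Unix epoch -> aware datetime."""
    return EPOCH + timedelta(seconds=raw)


def datetime_to_dt(value: datetime) -> int:
    """Inverse of dt_to_datetime; sub-second precision is dropped."""
    return (value - EPOCH) // timedelta(seconds=1)


def time_to_timedelta(raw: int) -> timedelta:
    """TIME: raw UDINT milliseconds -> duration."""
    return timedelta(milliseconds=raw)


def timedelta_to_time(value: timedelta) -> int:
    """Inverse of time_to_timedelta, in whole milliseconds."""
    return value // timedelta(milliseconds=1)


def tod_to_timedelta(raw: int) -> timedelta:
    """TOD: raw UDINT milliseconds since midnight -> offset into the day."""
    if not 0 <= raw < MS_PER_DAY:
        raise ValueError(f"TOD out of range: {raw}")
    return timedelta(milliseconds=raw)


# Seeds taken from the fixture bullet in this PR's plan.
seed_dt = datetime(2026, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
raw_dt = datetime_to_dt(seed_dt)
```

Returning the time-of-day as an offset rather than "today's date + offset" keeps the helper deterministic; the caller can add a date if the abstraction only offers `DateTime`.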
**Effort**: M (1-2 days).
**Deps**: none. May expose missing `Duration` in `DriverDataType` enum.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` "Data types" section — replace the
"marshal as `UDINT` on the wire — CLI takes a numeric raw value" paragraph
with native syntax (e.g. `read -t DateTime` returns ISO-8601, `write -t Time
-v 00:00:01.500` for IEC TIME duration). New examples for each of the four
IEC time types under `read` / `write`.
- Fixture (TCBSD PLC project): `PLC/GVLs/GVL_Primitives.TcGVL` adds
`dCurrentTime : DT := DT#2026-01-15-12:00:00`, `tCycleDuration : TIME :=
T#1500ms`, `dToday : DATE := DATE#2026-04-25`, `tShiftStart : TOD :=
TOD#06:30:00`. Existing primitives theory in
`tests/.../TwinCatProject/README.md` § "Type coverage" gets the seed values
documented.
- Integration tests: `TwinCAT3SmokeTests.cs` — new
`Driver_round_trips_TIME_DATE_DT_TOD_as_native_UA_types` `[TwinCATFact]`
reading each variable and asserting the CLR shape (`TimeSpan` / `DateTime`).
Update the existing 16-case primitive `[TwinCATTheory]` to assert native
types instead of raw `UDINT` for these four entries.
- E2E: `scripts/e2e/test-twincat.ps1` unchanged for now (single DINT bridge);
follow-up could add a DT-typed bridge node but it's not on the critical path.
#### PR 1.3 — Bit-indexed BOOL writes (read-modify-write)
**Scope**: Replace the `NotSupportedException` at `AdsTwinCATClient.cs:99-100`
with a read-modify-write sequence: read parent word as `uint`, set/clear bit,
write the word back. Must serialize against concurrent writes to the same
parent word; one `SemaphoreSlim` per parent symbol path is sufficient
(concurrency on bit writes within the same parent is rare and the PLC cycle
is the natural lower bound on contention anyway).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — replace `throw`
branch in `WriteValueAsync` with RMW logic mirroring `ReadValueAsync`'s
bit-index path. Add `ConcurrentDictionary<string, SemaphoreSlim>
_bitWriteLocks` keyed on parent symbol.
**Beckhoff.TwinCAT.Ads API**: `AdsClient.ReadValueAsync(parent, typeof(uint))`
+ `AdsClient.WriteValueAsync(parent, modifiedWord)`. Both already used.
**Test plan**:
- Unit: extend `TwinCATReadWriteTests` with a `FakeTwinCATClient` test
covering set + clear of bits 0, 7, 15, 31 of a `uint` parent.
- Integration: add a new `[TwinCATFact]` —
`Driver_round_trips_bit_indexed_BOOL_write_and_read` against
`GVL_Primitives.vWord.4` (the `0xBEEF` word's bit-4); flip to true, read
back as true, flip to false, read back as false.
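The read-modify-write sequence with per-parent serialization can be sketched as follows. This is a language-neutral illustration in Python using `asyncio.Lock` in place of `SemaphoreSlim`; `FakePlc` and `BitWriter` are hypothetical stand-ins, not the driver's `FakeTwinCATClient` or `AdsTwinCATClient`.

```python
import asyncio
from collections import defaultdict


class FakePlc:
    """In-memory stand-in for the ADS client: parent words keyed by symbol."""

    def __init__(self, words):
        self.words = dict(words)

    async def read_word(self, parent: str) -> int:
        await asyncio.sleep(0)  # yield so concurrent writers can interleave
        return self.words[parent]

    async def write_word(self, parent: str, value: int) -> None:
        await asyncio.sleep(0)
        self.words[parent] = value & 0xFFFF_FFFF


class BitWriter:
    def __init__(self, plc: FakePlc):
        self._plc = plc
        self._locks = defaultdict(asyncio.Lock)  # one lock per parent symbol

    async def write_bit(self, parent: str, bit: int, value: bool) -> None:
        if not 0 <= bit < 32:
            raise ValueError(f"bit index out of range: {bit}")
        async with self._locks[parent]:  # serialize the whole RMW sequence
            word = await self._plc.read_word(parent)
            mask = 1 << bit
            word = (word | mask) if value else (word & ~mask)
            await self._plc.write_word(parent, word)


async def demo() -> int:
    plc = FakePlc({"GVL_Primitives.vWord": 0xBEEF})
    writer = BitWriter(plc)
    # Two concurrent bit writes against the same parent word.
    await asyncio.gather(
        writer.write_bit("GVL_Primitives.vWord", 4, True),
        writer.write_bit("GVL_Primitives.vWord", 7, False),
    )
    return plc.words["GVL_Primitives.vWord"]


result = asyncio.run(demo())
```

Without the lock, both writers could read `0xBEEF` before either writes, and the later write would silently undo the earlier one. Note that the lock only protects against writers inside this process; the PLC program itself can still change the word between the read and the write.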
**Effort**: S-M (1 day).
**Deps**: none. Closes task #181 referenced in the existing `NotSupported`
exception message.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` `write` section — add an example
`otopcua-twincat-cli write -n ... -s "GVL_Primitives.vWord.4" -t Bool -v
true` and a note explaining the RMW semantics + concurrency caveat (parent
word is locked per write — concurrent bit writes on the same word
serialize). `docs/drivers/TwinCAT-Test-Fixture.md` "Bugs caught by live
runs" updates entry #3 to note that writes now also work (read previously
shipped; write was the gap).
- Fixture (TCBSD PLC project): no schema change required —
`GVL_Primitives.vWord` already exists with seed `0xBEEF`. Tests use bits 4
(clear) and 7 (set) to round-trip.
- Integration tests: `TwinCAT3SmokeTests.cs` — new
`Driver_round_trips_bit_indexed_BOOL_write_and_read` `[TwinCATFact]`. Unit
tests in `TwinCATReadWriteTests` extended via `FakeTwinCATClient` for bits
0/7/15/31 of a `uint` parent.
- E2E: no change.
#### PR 1.4 — Multi-dim and whole-array reads
**Scope**: Expand `ReadValueAsync` / `WriteValueAsync` to handle whole-array
reads via Beckhoff's array marshalling, instead of element-by-element. The
symbol-path parser already produces `TwinCATSymbolSegment.Subscripts` with N
dims; today the driver only reads single elements (one path per request).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — when a tag
declares `IsArray=true` (extend `TwinCATTagDefinition`), use
`AdsClient.ReadValueAsync(symbol, typeof(int[]))` / `typeof(double[,])` etc.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs` — surface
`IsArray` + `ArrayDim` through `DriverAttributeInfo` in `DiscoverAsync`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATTagDefinition.cs` (if it exists; otherwise
in `TwinCATDriverOptions.cs`) — add `bool IsArray`, `int[]? ArrayDimensions`.
**Beckhoff.TwinCAT.Ads API**: `AdsClient.ReadValueAsync(symbol, Type, ct)`
accepts CLR array types. For dynamically-sized reads use
`AdsClient.ReadAnyAsync<T[]>(...)` or pass `Array.CreateInstance(elemType,
dims)`. SymbolLoader yields a `Symbol.Category == DataTypeCategory.Array` we
can inspect to autoderive dimensions during discovery.
**Test plan**:
- Unit: parse `Matrix[1,2]` and verify ranking / dimension flow into the
request shape via `FakeTwinCATClient`.
- Integration: extend `GVL_Arrays` with a 5x5 `aReal2D : ARRAY [1..5, 1..5]
OF REAL`; new `[TwinCATFact]` reads the whole array in one call and
verifies element count + values.
**Effort**: M (2-3 days).
**Deps**: none. Sets up the array-shape plumbing the rest of the driver
needs anyway.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` `read` section — add whole-array example
(`read -s "GVL_Arrays.aReal2D"` returns the full matrix as JSON) plus a
dedicated "Arrays" sub-section calling out 1-D / N-D / array-of-struct
semantics. `docs/drivers/TwinCAT-Test-Fixture.md` "What it actually covers"
list adds the whole-array bullet.
- Fixture (TCBSD PLC project): `PLC/GVLs/GVL_Arrays.TcGVL` already declares
`ARRAY[1..4,1..4] OF REAL` per `TwinCatProject/README.md` § "Array
coverage". This PR adds a 5x5 `aReal2D : ARRAY [1..5, 1..5] OF REAL`
initialised with a deterministic pattern (e.g. `(i-1)*5 + (j-1)`) so the
whole-array test can assert each element. README "Array coverage" gets the
new symbol.
- Integration tests: `TwinCAT3SmokeTests.cs` — new
`Driver_reads_whole_2D_array_in_one_call` `[TwinCATFact]`. Unit tests
extend `TwinCATSymbolPathTests` for multi-dim subscript shape.
- E2E: no change to `scripts/e2e/test-twincat.ps1` (scalar bridge); a future
array-bridge scenario is captured in the consolidated section below.
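For the whole-array assertion, the expected values for the example seed pattern `(i-1)*5 + (j-1)` can be generated rather than hand-typed. The sketch below assumes the flat array image is laid out with the last index varying fastest (row-major), which should be verified against the runtime before the test relies on it; the helper names are illustrative.

```python
ROWS, COLS = 5, 5  # aReal2D : ARRAY [1..5, 1..5] OF REAL


def expected_value(i: int, j: int) -> float:
    """Seed pattern from the fixture bullet, using 1-based IEC indices."""
    return float((i - 1) * COLS + (j - 1))


# Expected matrix as nested rows, then flattened in the assumed wire order.
matrix = [[expected_value(i, j) for j in range(1, COLS + 1)]
          for i in range(1, ROWS + 1)]
flat = [value for row in matrix for value in row]


def element_at(values, i: int, j: int) -> float:
    """Look up element [i, j] (1-based) in a row-major flat image."""
    return values[(i - 1) * COLS + (j - 1)]
```

Because every element is distinct, a transposed or mis-strided read fails the comparison instead of passing by coincidence.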
#### PR 1.5 — ENUM and ALIAS at discovery
**Scope**: `MapSymbolTypeName` returns `null` for any non-atomic type
(`AdsTwinCATClient.cs:224`), so ENUM and ALIAS symbols are silently dropped
during browse. ENUM is essentially a sized-integer with named members; ALIAS
is a renamed atomic. Both are extremely common in real projects (motor states,
recipe-step IDs, bit-flag groups).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` —
`MapSymbolTypeName` keyed only on the type name today; switch to inspecting
`symbol.DataType` + `symbol.Category` from `TwinCAT.TypeSystem`. For
`DataTypeCategory.Enum` walk `EnumType.EnumValues` and pick the underlying
base type. For `DataTypeCategory.Alias` resolve `AliasType.BaseType`
recursively until atomic.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/BrowseSymbolsAsync` —
surface enum members so the OPC UA layer can later emit them as
EnumStrings.
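The resolution walk described above (follow the enum or alias base type until an atomic type is reached) can be sketched as below. `TypeInfo` is a hypothetical stand-in for the Beckhoff type-system objects, not their real shape, and the `INT` base type chosen for `E_LineState` is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TypeInfo:
    name: str
    category: str                       # "atomic", "alias", "enum", or other
    base: Optional["TypeInfo"] = None   # underlying type for alias / enum
    members: Tuple[str, ...] = ()       # enum member names, if any


def resolve_atomic(data_type: Optional[TypeInfo],
                   max_depth: int = 16) -> Optional[TypeInfo]:
    """Follow alias / enum base types until atomic; None if unsupported."""
    current = data_type
    for _ in range(max_depth):  # depth cap guards against malformed chains
        if current is None:
            return None
        if current.category == "atomic":
            return current
        if current.category in ("alias", "enum"):
            current = current.base
            continue
        return None  # struct, pointer, reference, etc. stay out of scope
    return None


LREAL = TypeInfo("LREAL", "atomic")
INT = TypeInfo("INT", "atomic")
T_Temperature = TypeInfo("T_Temperature", "alias", base=LREAL)
T_Nested = TypeInfo("T_Nested", "alias", base=T_Temperature)
E_LineState = TypeInfo("E_LineState", "enum", base=INT,
                       members=("Idle", "Running", "Faulted"))
ST_Motor = TypeInfo("ST_Motor", "struct")
```

Returning `None` for anything that is not atomic, alias, or enum keeps today's behaviour for the out-of-scope categories, so the change only adds symbols to browse and never changes how existing ones are mapped.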
**Beckhoff.TwinCAT.Ads API**: `TwinCAT.Ads.TypeSystem.SymbolLoaderFactory`
already returns full `IDataType` objects with `Category`, `EnumType`,
`AliasType`, etc. No new APIs.
**Test plan**:
- Unit: extend `TwinCATSymbolBrowserTests` — fake an enum symbol via
`FakeTwinCATClient`; assert it browses with the underlying base type.
- Integration: add `E_LineState : (Idle, Running, Faulted)` + a GVL instance
variable; new `[TwinCATFact]` browses + reads it as `Int16` (or whatever
the underlying type is).
**Effort**: M (1-2 days).
**Deps**: none. POINTER / REFERENCE / INTERFACE / UNION are explicitly
out-of-scope for this PR — they need real-world demand and a much larger
type-system rework. ENUM and ALIAS are the 80% case.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` `browse` section — note that ENUM and
ALIAS symbols now appear in the output (previously dropped); add a Data
types row for "Enum (surfaced as underlying integer with EnumStrings)"
and "Alias (resolved to base atomic)".
`docs/drivers/TwinCAT-Test-Fixture.md` "What it actually covers" extends with the enum/alias bullet.
- Fixture (TCBSD PLC project): `PLC/DUTs/E_AxisState.TcDUT` and
`E_Severity.TcDUT` already exist; `PLC/DUTs/T_Temperature.TcDUT` and
`T_MeterPerSec.TcDUT` already exist. `PLC/GVLs/GVL_Enums.TcGVL` already
exposes them at the root per `TwinCatProject/README.md` § "Enum + alias
coverage", so no fixture change is needed for this PR. README's
"Integration-test contract" gets a new entry for `GVL_Enums.currentSeverity` /
`currentTemperature` so the new browse assertion has a stable target.
- Integration tests: `TwinCAT3SmokeTests.cs` — new
`Driver_browses_enums_and_aliases_with_resolved_base_types` `[TwinCATFact]`
asserting the four `GVL_Enums` symbols surface with the correct underlying
CLR type (`Int32` for E_AxisState, `Int16` for E_Severity, `Double` for
the LREAL aliases).
- E2E: no change.
### Phase 2 — Performance (Sum commands + handle caching)
#### PR 2.1 — ADS Sum-read / Sum-write
**Scope**: Today `ReadAsync` loops over `fullReferences` issuing one
`ReadValueAsync` per tag (`TwinCATDriver.cs:118-156`). Beckhoff's ADS Sum
commands (`IndexGroup=0xF080..0xF084`) batch N reads/writes into a single AMS
request. `Beckhoff.TwinCAT.Ads` v6 exposes this via
`AdsClient.ReadWriteAsync` with `SumCommand` request envelopes —
specifically `SumSymbolRead` / `SumSymbolWrite` from
`TwinCAT.Ads.SumCommand`. ~10x throughput on multi-thousand-tag scans
according to Beckhoff InfoSys.
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — new
`ReadValuesAsync(IReadOnlyList<(string symbol, Type clrType)>, ct)` returning
a parallel array of `(value, status)`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs::ReadAsync` — bucket
`fullReferences` by `DeviceHostAddress`, call the new client method per
bucket. `bitIndex` handling stays per-tag (RMW post-step).
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/ITwinCATClient.cs` — add the
bulk-read / bulk-write surface.
**Beckhoff.TwinCAT.Ads API**:
- `AdsClient.ReadWriteAsync(IndexGroup=0xF080, IndexOffset=count, ...)` for
raw sum-read by handle.
- Higher-level: `TwinCAT.Ads.SumCommand.SumSymbolRead(client, symbols)` /
`SumSymbolWrite(client, symbols, values)` in v6. Verify the exact namespace
during PR — Beckhoff sometimes re-shuffles between minor versions.
- For symbolic (no handle) batching: `SumSymbolReadByName`.
**Test plan**:
- Unit: `FakeTwinCATClient.ReadValuesAsync` fakes the bulk surface; test
ordering preservation, partial-failure mapping, empty-input handling.
- Integration: `[TwinCATFact]` reads 100 declared tags in one call, asserts
value parity with 100 single-call equivalents and measures wall-clock
difference (assert under 50% of the loop baseline).
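The three unit-test properties (ordering preservation, partial-failure mapping, empty input) are easier to see against a concrete shape. The sketch below buckets requests by device, issues one batch call per bucket, and scatters results back to their original positions. `bulk_read`, `fake_read_batch`, and the status strings are illustrative assumptions and do not reflect the real `ITwinCATClient` surface.

```python
from collections import defaultdict


def bulk_read(requests, read_batch):
    """Read (host, symbol) pairs in per-host batches.

    read_batch(host, symbols) must return one (value, status) per symbol,
    in the same order. Results are returned aligned with `requests`.
    """
    if not requests:
        return []
    buckets = defaultdict(list)  # host -> [(original index, symbol), ...]
    for index, (host, symbol) in enumerate(requests):
        buckets[host].append((index, symbol))

    results = [None] * len(requests)
    for host, entries in buckets.items():
        symbols = [symbol for _, symbol in entries]
        try:
            batch = read_batch(host, symbols)
        except ConnectionError:
            # A dead device fails only its own bucket, not the whole call.
            batch = [(None, "BadCommunicationError")] * len(symbols)
        for (index, _), item in zip(entries, batch):
            results[index] = item
    return results


# Illustrative fake transport: two reachable devices, everything else is down.
PLC_VALUES = {"plc-a": {"GVL.a": 1, "GVL.b": 2}, "plc-b": {"GVL.c": 3}}


def fake_read_batch(host, symbols):
    if host not in PLC_VALUES:
        raise ConnectionError(host)
    table = PLC_VALUES[host]
    return [(table[s], "Good") if s in table else (None, "BadNodeIdUnknown")
            for s in symbols]


results = bulk_read(
    [("plc-a", "GVL.a"), ("plc-b", "GVL.c"), ("plc-a", "GVL.missing"),
     ("plc-x", "GVL.a"), ("plc-a", "GVL.b")],
    fake_read_batch,
)
```

Carrying the original index through the bucket is what lets the caller interleave devices freely while still getting a result list that lines up one-to-one with the request list.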
**Effort**: M-L (3 days).
**Deps**: none (handle caching in 2.2 amplifies the win but isn't required).
**Docs / fixture / e2e**:
- Docs: `docs/v3/twincat-backlog.md` perf note moves out (Sum-commands no
longer deferred) — add a closed-out bullet pointing at this PR. New
performance section in `docs/drivers/TwinCAT-Test-Fixture.md` documenting
the throughput baseline + Sum-command delta. `docs/Driver.TwinCAT.Cli.md`
doesn't expose Sum directly to the user — the CLI still drives one symbol
per call — so no CLI doc change.
- Fixture (TCBSD PLC project, primary fixture-extension surface): add a new
`PLC/GVLs/GVL_Perf.TcGVL` declaring `aTags : ARRAY[1..1000] OF DINT` plus
a `MAIN` rung (or new `FB_PerfChurn` POU) that increments each element on
a rotating subset. `TwinCatProject/README.md` § "Required project state"
gains a "Performance scenarios" subsection documenting the 1000-tag GVL.
- Integration tests: new perf test
`Driver_sum_read_1000_tags_beats_loop_baseline_by_5x` (`[TwinCATFact]`,
perf-tier — guarded behind a separate `TWINCAT_PERF=1` env flag so CI
noise from VM jitter doesn't flap the suite). Unit tests cover ordering,
partial-failure mapping, empty-input via `FakeTwinCATClient.ReadValuesAsync`.
- E2E: `scripts/e2e/test-twincat.ps1` unchanged for the canonical bridge;
perf scripts live alongside as a separate `scripts/perf/twincat-sum.ps1`
if/when introduced (deferred — integration test is sufficient).
#### PR 2.2 — Handle-based access with caching
**Scope**: Cache `AdsClient.CreateVariableHandleAsync` results so per-read
overhead drops from "resolve symbolic name + read by name" to "read by handle"
— smaller AMS payloads, no name resolution on each call. Cache lifetime is
process-scoped; eviction is via the PR 2.3 invalidation listener. Until 2.3
ships the cache must be cleared on `AdsClient` reconnect (the existing
auto-reconnect path in `EnsureConnectedAsync`).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — add
`ConcurrentDictionary<string, uint> _handleCache`. Wrap reads/writes through
`EnsureHandleAsync(symbolPath)` that hits the cache or calls
`CreateVariableHandleAsync`. On `AdsErrorCode.DeviceSymbolVersionInvalid`
(0x711 / 1809; 0x710 / 1808 is `DeviceSymbolNotFound`) evict the entry and retry once.
- Dispose path: `DeleteVariableHandleAsync` for every cached handle on
`AdsClient.Dispose` to be a good citizen with the runtime.
**Beckhoff.TwinCAT.Ads API**:
- `AdsClient.CreateVariableHandleAsync(string symbol, ct)` → returns
`ResultHandle` with `.Handle` (uint).
- `AdsClient.ReadAnyAsync<T>(IndexGroup=0xF005, IndexOffset=handle, ct)`
reads by handle.
- `AdsClient.WriteAnyAsync(IndexGroup=0xF005, IndexOffset=handle, value, ct)`.
- `AdsClient.DeleteVariableHandleAsync(uint handle, ct)`.
**Test plan**:
- Unit: `FakeTwinCATClient` records handle-create / read-by-handle calls;
test asserts second read of same symbol uses cached handle (zero new
creates).
- Integration: subscribe + read 50 tags, capture AMS round-trips via probe
counter, assert the second pass uses ~50% of the bytes (handle = 4 bytes
vs symbol path = N bytes).
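The cache-or-create path with a single evict-and-retry can be sketched as below. `FakeAds` is a hypothetical in-memory stand-in that counts handle creations, and `SymbolVersionInvalid` stands in for the ADS error the plan names; neither reflects the real client API.

```python
class SymbolVersionInvalid(Exception):
    """Stand-in for the 'symbol version invalid' ADS error."""


class FakeAds:
    """Counts handle creations and can mark all issued handles stale."""

    def __init__(self, values):
        self.values = dict(values)
        self.creates = 0
        self._next_handle = 1
        self._handles = {}   # handle -> symbol
        self._stale = set()

    def create_handle(self, symbol: str) -> int:
        self.creates += 1
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = symbol
        return handle

    def read_by_handle(self, handle: int):
        if handle in self._stale:
            raise SymbolVersionInvalid(handle)
        return self.values[self._handles[handle]]

    def simulate_online_change(self) -> None:
        self._stale.update(self._handles)


class HandleCache:
    def __init__(self, ads: FakeAds):
        self._ads = ads
        self._handles = {}   # symbol -> handle

    def read(self, symbol: str):
        for attempt in range(2):  # first try, then one retry after eviction
            handle = self._handles.get(symbol)
            if handle is None:
                handle = self._ads.create_handle(symbol)
                self._handles[symbol] = handle
            try:
                return self._ads.read_by_handle(handle)
            except SymbolVersionInvalid:
                self._handles.pop(symbol, None)  # evict the stale handle
                if attempt == 1:
                    raise

    def clear(self) -> None:
        """Wipe everything, e.g. on reconnect."""
        self._handles.clear()


ads = FakeAds({"GVL_Perf.aTags[1]": 42})
cache = HandleCache(ads)
first = cache.read("GVL_Perf.aTags[1]")
second = cache.read("GVL_Perf.aTags[1]")
creates_after_two_reads = ads.creates
```

Capping the retry at one attempt means a symbol that is genuinely gone after an online change surfaces as an error instead of looping.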
**Effort**: M (2 days).
**Deps**: combines with PR 2.1 for sum-read-by-handle (highest perf path).
Without 2.3, handles can go stale after an online change — call out the
caveat in driver options and add a manual `FlushOptionalCachesAsync` invocation
that wipes the handle cache.
**Docs / fixture / e2e**:
- Docs: `docs/drivers/TwinCAT-Test-Fixture.md` perf section gets a paragraph
noting that handles drop AMS payload size for repeated reads (4 bytes vs.
N-byte symbol path); call out the staleness caveat (online-change
invalidation lands in 2.3). `docs/Driver.TwinCAT.Cli.md` adds a brief note
in the `subscribe` / `read` sections that handles are cached transparently
— no user-visible flag.
- Fixture (TCBSD PLC project): no change required — handle caching is
observable via byte-counter on the wire, not via PLC-side state. The
perf-scenario `GVL_Perf.aTags` from PR 2.1 doubles as the exercise target.
- Integration tests: new
`Driver_handle_cache_avoids_repeat_symbol_resolution` `[TwinCATFact]`
reads the same 50 symbols twice; asserts second pass uses cached handles
(probed via diagnostics counters from PR 3.2 if shipped, otherwise via a
test-only hook on `AdsTwinCATClient`). Unit tests on
`FakeTwinCATClient.HandleCacheTests` assert second read of same symbol
triggers zero new handle creates.
- E2E: no change.
#### PR 2.3 — Symbol-version invalidation listener
**Scope**: TwinCAT exposes a symbol-table version counter on the ADS
symbol-version index group (`ADSIGRP_SYM_VERSION`, 0xF008; not the
read-by-handle group `ADSIGRP_SYM_VALBYHND` that PR 2.2 uses). When the
PLC takes an online change, all cached handles are silently invalidated; the
next read returns `DeviceSymbolVersionInvalid` if you're lucky and a wrong
value if you're not. We register a notification on the symbol-version index
group and wipe the handle cache on bump. Whether bumps arrive as
notifications on that group, or have to be picked up via
`SystemServiceLoadFile`-style notifications plus `SymbolVersion` reads, is
unconfirmed; see open question (c).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` — on connect,
call `AddDeviceNotificationAsync(ADSIGRP_SYM_VERSION, 0, length=1, ...)`
with `AdsTransMode.OnChange`. On callback, clear `_handleCache` + log.
**Beckhoff.TwinCAT.Ads API**:
- `AdsClient.AddDeviceNotificationAsync(uint indexGroup, uint indexOffset,
int length, NotificationSettings, object userData, ct)` — the raw,
index-group-based variant (not the symbol-name `Ex` variant we use today).
- Index group: `AdsReservedIndexGroup.SymbolVersion` (0xF008). One byte
payload that's the current symbol-version counter. Confirm during PR — open
question (c) below.
**Test plan**:
- Unit: extend `TwinCATNativeNotificationTests` — `FakeTwinCATClient` exposes
a `FireSymbolVersionChange()` method; test asserts handle cache is cleared
and subsequent reads recreate handles.
- Integration: `[TwinCATFact]` triggers an online change on the TCBSD project
(rebuild a GVL with one new variable + login activate) — needs a project
helper that automates the online-change. May ship behind a manual gate
(`[TwinCATFact(Reason="requires-manual-online-change")]`) initially.
**Effort**: M (2 days).
**Deps**: PR 2.2 (no point invalidating an empty cache). Confirm
`SymbolVersion` index-group constant in `Beckhoff.TwinCAT.Ads` v6 — open
question (c) below.
**Docs / fixture / e2e**:
- Docs: `docs/drivers/TwinCAT-Test-Fixture.md` section on "What it does NOT
cover" — drop the implicit "online-change handling" gap. New paragraph in
the perf section noting handle cache is now self-invalidating.
`docs/Driver.TwinCAT.Cli.md` no change (transparent to CLI user).
- Fixture (TCBSD PLC project): no schema change. Operator workflow gains an
online-change drill — `TwinCatProject/README.md` adds a § "Online-change
test scenario" describing the steps (open project, add a dummy variable
to `GVL_Perf`, "Login + Activate" → triggers the symbol-version bump).
This is the manual gate for the integration assertion.
- Integration tests: new `Driver_invalidates_handle_cache_on_symbol_version_bump`
`[TwinCATFact]` — initially gated `[TwinCATFact(Reason="requires-manual-online-change")]`
until automation lands. Unit tests cover the callback path via
`FakeTwinCATClient.FireSymbolVersionChange()`.
- E2E: no change.
### Phase 3 — Operability
#### PR 3.1 — Per-tag MaxDelay tuning
**Scope**: Today `NotificationSettings` is hard-coded as `(OnChange, cycleMs,
0)` (`AdsTwinCATClient.cs:144-145`). MaxDelay=0 means "fire as soon as the
change is detected, no coalescing"; for bursty high-frequency signals this
floods the OPC UA subscription queue. Surface MaxDelay as a per-tag option
(default 0 to preserve current behavior).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverOptions.cs` — add
`int? MaxDelayMs` to `TwinCATTagDefinition`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs::SubscribeAsync` —
pass through to client.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs::AddNotificationAsync`
— accept `int maxDelayMs`, plumb into `NotificationSettings(...,
cycleMs, maxDelayMs)`.
**Beckhoff.TwinCAT.Ads API**: `NotificationSettings(AdsTransMode mode, int
cycleTime, int maxDelay)` — both args in milliseconds per Beckhoff InfoSys
`tcadsnetref/7313319051`.
**Test plan**:
- Unit: extend `TwinCATNativeNotificationTests` — assert the plumbed
`maxDelayMs` lands on `NotificationSettings`.
- Integration: subscribe to `GVL_Fixture.nCounter` with `MaxDelayMs=500`;
assert delivery rate is ≤ 2 Hz even when PLC cycle is 10 ms.
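The arithmetic behind the ≤ 2 Hz bound can be checked with a toy model. This is a sketch under a loud assumption: it treats MaxDelay as "at most one delivery per `max_delay_ms`", which is the plan's reading and should be confirmed against real ADS batching behavior on the fixture. `count_deliveries` is a hypothetical helper, written in Python for brevity.

```python
def count_deliveries(change_period_ms: int, max_delay_ms: int, window_ms: int) -> int:
    """Count deliveries in [0, window_ms) under a simple coalescing model.

    Model (an assumption, not verified ADS behavior): the value changes every
    change_period_ms starting at t=0; the first change is delivered at once,
    and later deliveries are spaced at least max_delay_ms apart.
    max_delay_ms == 0 means every change is delivered.
    """
    deliveries = 0
    last_delivery = None
    for t in range(0, window_ms, change_period_ms):
        due = last_delivery is None or t - last_delivery >= max_delay_ms
        if max_delay_ms == 0 or due:
            deliveries += 1
            last_delivery = t
    return deliveries


# 10 ms PLC cycle, 1 s observation window.
uncoalesced = count_deliveries(change_period_ms=10, max_delay_ms=0, window_ms=1000)
coalesced = count_deliveries(change_period_ms=10, max_delay_ms=500, window_ms=1000)
# A window that is 1 ms longer catches one more delivery at its edge.
edge_case = count_deliveries(change_period_ms=10, max_delay_ms=500, window_ms=1001)
```

Under this model a 1 s window sees two deliveries, or three when the window edge lines up with a delivery, so an integration assertion needs one event of slack above the nominal rate.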
**Effort**: S (half day).
**Deps**: none.
**Docs / fixture / e2e**:
- Docs: `docs/Driver.TwinCAT.Cli.md` `subscribe` flag table — add `--max-delay-ms`
with default `0` and a note that nonzero coalesces high-frequency PLC
signals. Update the description of `-i` / `--interval-ms` to disambiguate
cycle vs. max-delay (both pass through to `NotificationSettings`).
`docs/drivers/TwinCAT-Test-Fixture.md` "Notification coalescing under
jitter" caveat gets a note that per-tag MaxDelay is now configurable.
- Fixture (TCBSD PLC project): no change required — `GVL_Fixture.nCounter`
already increments on every 10 ms cycle (see `MAIN.TcPOU`), so the test
can drive a 100 Hz change rate and verify ≤ 2 Hz delivery with
`MaxDelayMs=500`. README "Required project state" gets a one-line note
that the counter doubles as the coalescing-test driver.
- Integration tests: new `Driver_coalesces_notifications_at_max_delay`
`[TwinCATFact]` subscribes to `GVL_Fixture.nCounter` with `MaxDelayMs=500`
and asserts delivered-event count ≤ 3 over a 1 s window.
- E2E: `scripts/e2e/test-twincat.ps1` `Test-SubscribeSeesChange` is a
one-shot subscribe; no change. A future high-rate variant could test
coalescing end-to-end through the OPC UA bridge but it's not on the
critical path.
#### PR 3.2 — Cycle-time / jitter / PLC-state diagnostics
**Scope**: Probe loop today only checks reachability via `ReadStateAsync`
(`TwinCATDriver.cs::ProbeLoopAsync`). Surface cycle-time, jitter, and
online-change counter as health signals via the standard `_AppInfo` /
`TwinCAT_SystemInfoVarList._AppInfo` GVL (the same one we filter out of
discovery). Specifically:
- `_AppInfo.OnlineChangeCnt` (UDINT) — incremented on every online change.
- `_AppInfo.AppName` (STRING) — TC project name, useful for
cross-instance identification.
- `_TaskInfo[1].CycleTime` (UDINT, 100 ns units) — the configured PLC cycle.
- `_TaskInfo[1].LastExecTime` (UDINT, 100 ns units) — most recent measured
cycle execution; jitter is the delta against `CycleTime`.
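Because both task counters are in 100 ns units, the diagnostics record needs a unit conversion before it can expose a millisecond value such as `CycleTimeMs`. A minimal sketch of that arithmetic follows, in Python for brevity; `to_ms` and `jitter_ms` are hypothetical helper names, and the jitter definition simply follows the bullet above.

```python
HUNDRED_NS_PER_MS = 10_000  # 1 ms = 10,000 ticks of 100 ns


def to_ms(raw_100ns: int) -> float:
    """Convert a UDINT counted in 100 ns units to milliseconds."""
    return raw_100ns / HUNDRED_NS_PER_MS


def jitter_ms(cycle_time_raw: int, last_exec_time_raw: int) -> float:
    """Jitter as this plan defines it: LastExecTime minus CycleTime, in ms."""
    return to_ms(last_exec_time_raw - cycle_time_raw)


cycle_ms = to_ms(100_000)               # a 10 ms task reports 100,000 ticks
delta_ms = jitter_ms(100_000, 100_500)  # hypothetical sample, 500 ticks over
```

Subtracting the raw integers before dividing keeps the delta free of floating-point cancellation error.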
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs::ProbeLoopAsync` —
augment success path to also read these four symbols. Surface via a new
`TwinCATDeviceDiagnostics` record on `DeviceState`. Emit through
`IDriverDiagnostics` (the cross-driver diagnostics surface introduced for
Modbus prohibition events — task #154).
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATSystemSymbolFilter.cs` — leave
the filter as-is for the user-visible browse; the probe path reads system
symbols directly without going through discovery.
**Beckhoff.TwinCAT.Ads API**: still `AdsClient.ReadValueAsync(symbol, type,
ct)`. The symbols are read by name, not by index group, so no new API.
**Test plan**:
- Unit: `FakeTwinCATClient` exposes `SetSystemSymbolValue(string name, object
value)` so tests can drive the diagnostics surface deterministically.
- Integration: `[TwinCATFact]` connects to TCBSD, asserts the diagnostics
block populates `CycleTimeMs > 0` and `OnlineChangeCnt >= 0` within one
probe interval.
**Effort**: M (1-2 days).
**Deps**: confirm `IDriverDiagnostics` shape from existing Modbus diagnostics
RPC (task #154 in MEMORY); it should be reusable.
**Docs / fixture / e2e**:
- Docs: new section "Diagnostics" in `docs/drivers/TwinCAT-Test-Fixture.md`
documenting the four exposed signals (cycle time, jitter, online-change
counter, app name) and where they surface in the cross-driver
diagnostics RPC. `docs/Driver.TwinCAT.Cli.md` `probe` section gains a
"Health probe" sub-section noting the same symbols can be read directly
via `probe -s "TwinCAT_SystemInfoVarList._AppInfo.OnlineChangeCnt"`
(the existing example) plus the new `_TaskInfo[1].CycleTime` /
`LastExecTime`. Add a `docs/v3/twincat-backlog.md` cross-link confirming
that cycle-time/jitter is no longer deferred.
- Fixture (TCBSD PLC project): no change required — `_AppInfo` and
`_TaskInfo[1]` are TwinCAT system GVLs, present on every runtime. The
`TwinCATSystemSymbolFilter` already drops them from user browse;
`TwinCatProject/README.md` adds a one-line "These symbols are read by
the probe loop, not project-defined" callout.
- Integration tests: new `Probe_loop_surfaces_cycle_time_and_online_change_count`
`[TwinCATFact]` asserts the diagnostics record populates within one
probe interval against TCBSD. Unit tests via `FakeTwinCATClient.SetSystemSymbolValue`
drive the diagnostics surface deterministically.
- E2E: no change. Future enhancement could expose driver diagnostics via a
CLI subcommand (`otopcua-twincat-cli diagnostics -n ...`) — captured in
the consolidated section below as a follow-up.
### Phase 4 — UDT decomposition with TMC parsing
#### PR 4.1 — Nested UDT browse via TMC parsing
**Scope**: Largest single piece of work in the plan. `TwinCATDataType.Structure`
exists but `BrowseSymbolsAsync` skips non-atomic symbols
(`AdsTwinCATClient.cs:224`); to expose nested UDT trees we either:
1. **Online**: walk the `IDataType` tree returned by `SymbolLoaderFactory` —
each `IStructType` exposes `SubItems` recursively. This is what
`Beckhoff.TwinCAT.Ads` v6's TypeSystem already gives us at runtime; we just
never recursed.
2. **Offline (TMC file)**: parse the TwinCAT Module Class XML file the project
compiles to (`*.tmc`), build a type catalogue, drive discovery from it
without requiring a live runtime.
We ship the **online** path first (PR 4.1) because it covers every case in
which the runtime is reachable, and `SymbolLoaderFactory` already does the
heavy lifting. TMC offline parsing is deferred to a hypothetical PR 4.2 if a
disconnected-discovery use case emerges (unlikely; our deployments and live
integration tests always have a reachable runtime).
**Files**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs` —
`BrowseSymbolsAsync` recurses into `IStructType.SubItems`, yielding one
`TwinCATDiscoveredSymbol` per leaf with the dotted instance path
(`MyStruct.Inner.Field`). For arrays-of-structs, expand element-by-element
up to a configurable bound (default 1024) — beyond that, expose only the
array root with `IsArray=true`.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs::DiscoverAsync` —
fold the recursed structure into the existing `Discovered/` folder tree
using `IAddressSpaceBuilder.Folder` for each struct member.
- New: `TwinCATTypeWalker.cs` — pure helper that takes an `IDataType` and
yields `(instancePath, atomicType, readOnly)` tuples. Unit-testable without
touching `AdsClient`.
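The flattening that `TwinCATTypeWalker` is meant to perform can be sketched against a synthetic type tree. This is an illustrative model in Python, not the Beckhoff TypeSystem API: the `Atomic` / `Pointer` / `Alias` / `Array` / `Struct` node classes and the `walk` function are hypothetical stand-ins, and the sketch omits the `readOnly` element of the tuple.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass
class Atomic:
    name: str  # e.g. "DINT", "REAL"


@dataclass
class Pointer:
    target_name: str  # never followed by the walker


@dataclass
class Alias:
    name: str
    base: "TypeNode"


@dataclass
class Array:
    lower: int
    upper: int
    element: "TypeNode"


@dataclass
class Struct:
    name: str
    members: List[Tuple[str, "TypeNode"]]


TypeNode = Union[Atomic, Pointer, Alias, Array, Struct]


def walk(path: str, node: TypeNode, max_array_expansion: int = 1024,
         _active: frozenset = frozenset()) -> Iterator[Tuple[str, str]]:
    """Yield (instance_path, type_name) for every leaf under node."""
    if isinstance(node, Atomic):
        yield path, node.name
    elif isinstance(node, Pointer):
        return  # POINTER members are skipped
    elif isinstance(node, Alias):
        yield from walk(path, node.base, max_array_expansion, _active)
    elif isinstance(node, Array):
        count = node.upper - node.lower + 1
        # Simplification: only arrays of non-atomic elements are expanded.
        if isinstance(node.element, Atomic) or count > max_array_expansion:
            yield path, "ARRAY"  # expose only the array root
            return
        for i in range(node.lower, node.upper + 1):
            yield from walk(f"{path}[{i}]", node.element,
                            max_array_expansion, _active)
    elif isinstance(node, Struct):
        if node.name in _active:  # guard against cyclic type graphs
            return
        for member_name, member_type in node.members:
            yield from walk(f"{path}.{member_name}", member_type,
                            max_array_expansion, _active | {node.name})


motor = Struct("ST_Motor", [("Temperature", Atomic("REAL")),
                            ("Running", Atomic("BOOL"))])
axis = Struct("ST_Axis", [("Motor", motor), ("pNext", Pointer("ST_Axis"))])
station = Struct("ST_Station", [("Axes", Array(1, 2, axis))])
big = Array(1, 2000, Struct("ST_AlarmRecord", [("Code", Atomic("DINT"))]))
cyclic = Struct("ST_RecursiveCap", [("Value", Atomic("DINT"))])
cyclic.members.append(("Self", cyclic))  # deliberately self-referential

leaves = list(walk("GVL_Plant.Station", station))
capped = list(walk("GVL_Plant.aLargeAlarms", big))
guarded = list(walk("GVL_Test.Cap", cyclic))
```

Keeping the walker a pure generator over plain data is what makes it unit-testable without a live `AdsClient`; the real helper would adapt the runtime's type objects into the same recursion.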
**Beckhoff.TwinCAT.Ads API**:
- `TwinCAT.TypeSystem.IStructType` — `SubItems` (collection of
`IMember`); each member has `BaseType`, `Name`, `Offset`.
- `TwinCAT.TypeSystem.IArrayType` — `Dimensions`, `BaseType`.
- `TwinCAT.TypeSystem.IEnumType` — handled in PR 1.5 (atomic surface).
- `TwinCAT.TypeSystem.IAliasType.BaseType` — recurse until atomic.
**Test plan**:
- Unit: new `TwinCATTypeWalkerTests` — feed synthetic `IDataType` trees,
assert the flattened paths and types.
- Integration: extend `GVL_Plant` (already has `Line1.Stations[1].Axes[1].Motor`
per `TwinCAT3SmokeTests.cs`) — the existing `Driver_reads_deeply_nested_UDT_path`
test reads a known-leaf path; add a new test that browses into the same
GVL and asserts the entire tree shape matches expectation. Should yield
~50+ leaves.
**Effort**: L (4-5 days). Most of the cost is in the address-space-builder
folder/variable plumbing, not the type walking itself.
**Deps**: PR 1.5 (ENUM/ALIAS) — without it, struct members of enum type
silently drop. PR 1.4 (whole-array reads) is helpful but not blocking.
**Docs / fixture / e2e**:
- Docs: this is the **largest doc-write of the plan**.
`docs/Driver.TwinCAT.Cli.md` gains a new top-level "UDT decomposition"
section explaining the dotted-instance browse syntax
(`MyStruct.Inner.Field`), array-of-struct expansion bound, and how members
surface via `browse`. The existing `read` example "Nested UDT member" gets
expanded with a multi-level case targeting the plant hierarchy.
`docs/drivers/TwinCAT-Test-Fixture.md` "What it actually covers" gets a UDT
bullet per-member rather than per-leaf. Update `docs/v3/twincat-backlog.md`:
remove the implicit UDT-decomposition gap.
- Fixture (TCBSD PLC project, primary fixture-extension surface): the
existing `GVL_Plant.Line1.Stations[1..3].Axes[1..4]...` 5-level
hierarchy already provides ~50+ leaves per `TwinCatProject/README.md`
§ "5-level plant hierarchy" + § "Live value churn". This PR may add a
few **edge cases** to stress the type walker:
- `PLC/DUTs/ST_NestedFlags.TcDUT` — struct containing a BIT-packed
member (e.g. `Flags : DWORD` with named bit-mask aliases).
- `PLC/DUTs/ST_RecursiveCap.TcDUT` — struct with a self-pointer (must
be capped by the type walker, not infinite-recurse). Demonstrates
POINTER skip behavior.
- Add an `ARRAY [1..2000] OF ST_AlarmRecord` to exercise the
`MaxArrayExpansion` (default 1024) cutoff.
README § "Complex hierarchy" gets the new edge-case DUTs documented.
- Integration tests: new `TwinCATTypeWalkerTests` (unit) feeding synthetic
`IDataType` trees. Live: `Driver_browses_full_plant_hierarchy_yields_50_plus_leaves`,
`Driver_caps_array_of_struct_expansion_at_configured_bound`,
`Driver_handles_self_referential_struct_without_recursion` against the
new edge-case DUTs.
- E2E: `scripts/e2e/test-twincat.ps1` could gain a UDT-bridge scenario
(`-BridgeNodeId` pointing at
`GVL_Plant.Line1.Stations[1].Axes[1].Motor.Temperature`) but this requires
the OPC UA server's address-space to reflect the decomposed tree; keep as a
follow-up after server-side rendering ships in v3.
### Phase 5 — TC3 EventLogger alarms
#### PR 5.1 — `IAlarmSource` via TC3 EventLogger
**Scope**: TwinCAT 3.1 build 4022+ ships TcEventLogger as a system service
exposing alarms/events on AMS port 110 (`AMSPORT_EVENTLOG`). Implement
`IAlarmSource` over that interface so PLC alarms surface as OPC UA AC events.
**Open question (b) below** drives the implementation: does Beckhoff publish
a managed wrapper, or do we hit AMS port 110 directly?
If a managed wrapper exists:
- `Beckhoff.TwinCAT.Ads.TcEventLogger` (or similar) — subscribe via
`EventLogger.AlarmRaised` event.
If not (likely — InfoSys docs lean on `TcCOM` C++ APIs):
- Open a second `AdsClient` connection to port 110 via
`_secondaryClient.Connect(netId, 110)`.
- Use `AddDeviceNotificationAsync` on the alarm-list index group
(`ADSIGRP_TCEVENTLOG_ALARMS`, exact constant TBD during spike).
- Decode the binary event payload into `AlarmEvent` records (severity,
source, message, time-of-occurrence, ack state).
**Files**:
- New: `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATAlarmSource.cs`
— implements `IAlarmSource` (currently used by Galaxy / Wonderware).
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs` — declare
`IAlarmSource` interface, delegate to the helper.
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverOptions.cs` — new
`bool EnableAlarms` (default `false` until production-validated).
**Beckhoff.TwinCAT.Ads API**: TBD pending spike. Falls back to raw
`AdsClient.AddDeviceNotificationAsync` on port 110 if no managed wrapper.
**Test plan**:
- Unit: fake event-logger feeds synthetic alarms; assert `IAlarmSource`
surface raises events with correct shape.
- Integration: TCBSD project gains an `Alarm.Raise(...)` call site on a GVL
bool transition; new `[TwinCATFact]` subscribes via the driver, toggles the
trigger, asserts the alarm appears in the source within 5 s.
**Effort**: L (4-5 days), most of which is the spike. If no managed wrapper
exists, add another L (3-4 days) to implement the binary protocol decoder.
**Deps**: spike answer to open question (b) — surface that as an explicit
investigation PR before committing to the build.
**Docs / fixture / e2e**:
- Docs: **new file** `docs/drivers/TwinCAT.md` (the existing
`TwinCAT-Test-Fixture.md` is fixture-only) covering the alarm
configuration surface — `EnableAlarms` option, AMS port 110 routing,
severity / source / message decode, OPC UA AC mapping. Spike output
goes to `docs/v3/twincat-eventlogger-spike.md` per open question (b).
`docs/Driver.TwinCAT.Cli.md` gains a new `alarms` subcommand (subscribe
+ print stream) mirroring the OPC UA Client CLI's `alarms` verb.
`docs/drivers/TwinCAT-Test-Fixture.md` "Alarms / history" caveat
removed; capability matrix gets `IAlarmSource = yes`.
- Fixture (TCBSD PLC project, primary fixture-extension surface): add
`PLC/POUs/FB_AlarmHarness.TcPOU` that calls `FB_TcLogEvent` (or
equivalent TC3 EventLogger PLC API) on a 5 s tick, raising / clearing
a known event class. New `PLC/GVLs/GVL_Alarms.TcGVL` exposes the
trigger booleans the test toggles. `TwinCatProject/README.md` § new
"Alarm scenarios" subsection documents the event class IDs + severity
+ cleared-on transitions. The existing `ST_Alarm` DUT remains for
PLC-level data; the EventLogger is the AC source.
- Integration tests: new `TwinCATAlarmIntegrationTests.cs` —
`Driver_raises_alarm_event_when_PLC_logs_event` `[TwinCATFact]`
toggles the trigger via `WriteAsync`, asserts the alarm appears in
`IAlarmSource.AlarmRaised` within 5 s. Includes a clear-event variant.
Unit tests via fake event-logger feed synthetic alarms.
- E2E: `scripts/e2e/test-twincat.ps1` gains a `Test-AlarmRoundTrip`
step (toggle PLC trigger → assert event surfaces via OPC UA AC client)
once the server-side wiring is in. Likely defers to a follow-up PR
after the server-tier alarm rendering catches up.
## Documentation, fixture, and e2e impact
Consolidated view across all 12 PRs. The **TCBSD fixture PLC project**
(`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/`) is
the **primary fixture-extension surface** — it's a real TwinCAT XAE project
committed object-by-object as `.TcGVL` / `.TcDUT` / `.TcPOU` files. Most PRs
extend it by adding GVL variables, DUTs (structs / enums / aliases), or POUs
(function blocks driving live data churn). The TCBSD VM at AmsNetId
`41.169.163.43.1.1` on `10.100.0.128` is the deployment target (per memory
entry `project_tcbsd_fixture.md`); the project bypasses the local Hyper-V/RTIME
conflict (per `project_twincat_hyperv_conflict.md`) by running on ESXi.
### User docs touched
| PR | `docs/Driver.TwinCAT.Cli.md` | `docs/drivers/TwinCAT-Test-Fixture.md` | `docs/v3/twincat-backlog.md` | Other |
|---|---|---|---|---|
| 1.1 LINT/ULINT | Data-types caveat removed | Bugs-caught entry #4 | — | — |
| 1.2 TIME/DATE/DT/TOD | Native-type syntax + 4 examples | — | — | — |
| 1.3 Bit-write | `write` example + RMW note | Bugs-caught entry #3 update | — | — |
| 1.4 Arrays | New "Arrays" sub-section + read example | Coverage list bullet | — | — |
| 1.5 ENUM/ALIAS | `browse` data-types rows | Coverage list bullet | — | — |
| 2.1 Sum cmds | — | New "Performance" section | Closed-out perf bullet | — |
| 2.2 Handles | Cache note in `read` / `subscribe` | Perf-section paragraph | — | — |
| 2.3 Sym-version | — | Online-change-handling caveat dropped | — | — |
| 3.1 MaxDelay | `--max-delay-ms` flag | Coalescing caveat updated | — | — |
| 3.2 Diagnostics | `probe` health-symbols sub-section | New "Diagnostics" section | Cycle-time bullet closed | — |
| 4.1 UDT | New top-level "UDT decomposition" section | Coverage list per-member | UDT-decomp gap removed | — |
| 5.1 Alarms | New `alarms` subcommand | "Alarms" caveat removed | — | **New** `docs/drivers/TwinCAT.md`; **new** `docs/v3/twincat-eventlogger-spike.md` |
### TCBSD fixture PLC project changes
| PR | GVL changes | DUT changes | POU changes | README section |
|---|---|---|---|---|
| 1.1 LINT/ULINT | `GVL_Primitives.vLargeCounter`, `vLargeCounterU` | — | — | "GVL_Primitives numeric seeds" |
| 1.2 TIME/DATE/DT/TOD | `GVL_Primitives.dCurrentTime`, `tCycleDuration`, `dToday`, `tShiftStart` | — | — | "Type coverage" seed values |
| 1.3 Bit-write | _(reuse `GVL_Primitives.vWord`)_ | — | — | — |
| 1.4 Arrays | `GVL_Arrays.aReal2D : ARRAY[1..5,1..5] OF REAL` | — | — | "Array coverage" |
| 1.5 ENUM/ALIAS | _(reuse `GVL_Enums`; new `currentSeverity`/`currentTemperature` instance vars)_ | — | — | "Integration-test contract" entry |
| 2.1 Sum cmds | **`GVL_Perf.aTags : ARRAY[1..1000] OF DINT`** | — | New `FB_PerfChurn` driving rotating writes | New "Performance scenarios" subsection |
| 2.2 Handles | _(reuse `GVL_Perf.aTags`)_ | — | — | — |
| 2.3 Sym-version | _(no schema change; manual online-change drill)_ | — | — | New "Online-change test scenario" |
| 3.1 MaxDelay | _(reuse `GVL_Fixture.nCounter` 100 Hz driver)_ | — | — | One-line note in "Required project state" |
| 3.2 Diagnostics | _(reads system GVLs `_AppInfo`, `_TaskInfo[1]`)_ | — | — | Probe-symbols callout |
| 4.1 UDT | _(reuse `GVL_Plant`; possibly grow `aLargeAlarms : ARRAY[1..2000] OF ST_AlarmRecord`)_ | New `ST_NestedFlags`, `ST_RecursiveCap`, `ST_AlarmRecord` | — | "Complex hierarchy" edge-cases |
| 5.1 Alarms | New `GVL_Alarms` (trigger booleans) | — | New `FB_AlarmHarness` calling `FB_TcLogEvent` | New "Alarm scenarios" |
### Integration test additions
All new tests gate on `[TwinCATFact]` / `[TwinCATTheory]` against
`TWINCAT_TARGET_NETID`. Most ship in
`tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCAT3SmokeTests.cs`;
PR 5.1 introduces a new `TwinCATAlarmIntegrationTests.cs`. The existing
30-case suite grows to roughly **45 cases** by the end of the plan, plus a
perf-tier guarded behind `TWINCAT_PERF=1`.
### E2E scripts
`scripts/e2e/test-twincat.ps1` is the single TwinCAT e2e bridge today; it's
gated behind `TWINCAT_TRUST_WIRE=1` (see task #221 — CI fixture). The plan
intentionally **does not change** the canonical bridge for most PRs because
the bridge exercises one DINT counter through the OPC UA server, and that
path stays correct. PRs 1.2 (DT bridge), 1.4 (array bridge), 4.1 (UDT
bridge), 5.1 (alarm round-trip) each list speculative e2e extensions but
they're explicitly marked as follow-ups gated on server-side rendering
catching up.
## Skip-rated items (for context)
These are intentionally not built. Listed for future-reader completeness so
nobody re-invests effort that was already triaged:
| # | Gap | Why skip |
|---|---|---|
| 9 | Multi-target / multi-route AMS gateway | Per-device config in `TwinCATDriverOptions.Devices` already supports N targets |
| 10 | Secure ADS / ADS-over-TLS | Significant work — TC3.1 build 4024+ feature, host-router-level config; defer |
| 11 | Route credential management | Host-level AMS router responsibility (`StaticRoutes.xml`); not driver scope |
| 12 | NC-axis / CNC channel / EtherCAT slave I/O | Specialty; system-symbol filter actively drops `Mc_*` (`TwinCATSystemSymbolFilter.cs:28`) |
| 13 | System-service ports (200/10000) | Niche operational tooling; user-runtime ports cover real use cases |
| 15 | PLC RPC / method invocation | Niche; design-heavy; no demand signal yet |
| 16 | Per-PLC-runtime auto-discover | Cosmetic; manual port config in options works |
| 20 | File-system access via ADS (FOPEN/FREAD) | Niche; out of scope |
## Open questions
1. **(a) TMC parsing — separate library or embedded?**
Phase 4 ships the **online type-walker** path, which uses
`TwinCAT.Ads.TypeSystem.SymbolLoaderFactory` (shipped in the
`Beckhoff.TwinCAT.Ads` NuGet package) and needs a live
runtime. If a future use case needs offline discovery (e.g. address-space
pre-bake at build time without a reachable PLC), do we:
- vendor a TMC-XML parser into this driver, or
- build a separate `ZB.MOM.WW.OtOpcUa.Tooling.TwinCAT` CLI that emits a
pre-baked tag manifest?
The latter cleanly separates build-time tooling from runtime driver code
and matches how Galaxy.Host is split. Decision deferred until demand
appears; recommend the CLI route when it does.
2. **(b) Beckhoff TC3 EventLogger NuGet — published, or AMS port 110 raw?**
Need to spike against the current `Beckhoff.TwinCAT.Ads` v6 NuGet API
surface. Beckhoff InfoSys lists a `Tc3_EventLogger` PLC library and a
TcCOM C++ API but the .NET surface is thinner. PR 5.1 starts with a
one-day spike documented as `docs/v3/twincat-eventlogger-spike.md` before
committing to the implementation path.
3. **(c) Symbol-version invalidation event details**
PR 2.3 needs the exact index-group constant and notification semantics for
the symbol-version counter. `AdsReservedIndexGroup.SymbolVersion` (0xF008)
is the working hypothesis but the field on the v6 enum needs verification
— the older `TwinCAT.Ads.AdsReservedIndexGroup` enum had different naming.
Beckhoff InfoSys `tcadscommon/tcadscommon_indexgroups` is the reference;
confirm during the PR 2.3 spike. Fallback: poll the version counter at
probe-loop cadence and treat any change as an invalidation.
## References
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDataType.cs`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATSymbolPath.cs`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATSystemSymbolFilter.cs`
- `src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATAmsAddress.cs`
- `docs/featuregaps.md` — TwinCAT (Beckhoff ADS) section
- `docs/v3/twincat-backlog.md` — deferred items (TC2, multi-hop, lab IPC)
- `docs/drivers/TwinCAT-Test-Fixture.md` — TCBSD + XAR fixture details
- Beckhoff InfoSys: <https://infosys.beckhoff.com/english.php?content=../content/1033/tcadsdll2/117571083.html> (Sum commands)
- Beckhoff InfoSys: <https://infosys.beckhoff.com/english.php?content=../content/1033/tcadsnetref/7313319051.html> (NotificationSettings)
- Beckhoff GitHub: <https://github.com/Beckhoff/TC3-AdsClient-Csharp>
@@ -0,0 +1,149 @@
# Galaxy backend parity matrix
This document tracks the scenario × result matrix that the
`Driver.Galaxy.ParityTests` suite drives against both Galaxy backends —
the legacy out-of-process **Galaxy.Host** (.NET 4.8 x86 + MXAccess COM,
fronted by `GalaxyProxyDriver`) and the new in-process **mxgateway**
backend (`GalaxyDriver`, .NET 10 + gRPC against `mxaccessgw`).
Maintained alongside Phase 5 (PR 5.W). The Phase 7 default flip
(PR 7.1) consumes this matrix as its go/no-go gate — every row must be
either green or carry an explicit *accepted-delta* justification.
## Reading the matrix
- **Status: green** — the scenario asserts strict parity and passes
(or skips cleanly when the rig isn't up).
- **Status: yellow** — soft pin only (count or shape parity, not value
parity) — acceptable when the underlying COM/gRPC stacks have known
divergences in raw payloads but the surface presented to the
DriverNodeManager is equivalent.
- **Status: red** — divergence detected. Row carries a fix or a
follow-up task ID.
## Scenarios
Last verified end-to-end on the dev parity rig: **2026-04-30**
(legacy `OtOpcUaGalaxyHost` mxaccess backend; mxaccessgw v1.x at
`http://localhost:5120`; sandbox `OtOpcUaParityTest_001` deployed in
the `ZB` galaxy; 13 passed / 1 skipped / 0 failed in 19 minutes).
| PR | Test class | Scenario | Status | Notes |
|----|-----------|----------|--------|-------|
| 5.2 | `BrowseAndReadParityTests` | Same variable set | green | symmetric set diff on full-reference set, after `[]` array-suffix workaround in `GalaxyDiscoverer` |
| 5.2 | `BrowseAndReadParityTests` | Same DataType / SecurityClass / IsHistorized | green | per-attribute meta triple parity |
| 5.2 | `BrowseAndReadParityTests` | Same StatusCode-class on a sampled read | yellow | pins status class (Bad/Uncertain/Good); CLR type intentionally not asserted — see "Accepted deltas" #6 |
| 5.3 | `SubscribeAndEventRateParityTests` | Subscribe returns a handle on each backend | green | symmetric Unsubscribe cleanup |
| 5.3 | `SubscribeAndEventRateParityTests` | Event rate within ±50% over 3s | yellow | both backends fed by the same upstream MXAccess subscriptions; tolerance absorbs scheduler jitter |
| 5.4 | `WriteByClassificationParityTests` | FreeAccess / Operate write status-class parity | yellow | pins status class only; legacy flat-maps every failure to BadInternalError, mxgw distinguishes (BadCommunicationError, BadDeviceFailure, etc.) — see "Accepted deltas" #7 |
| 5.4 | `WriteByClassificationParityTests` | Configure / Tune routes via secured-write | yellow | same status-class pin |
| 5.5 | `AlarmTransitionParityTests` | Same alarm-condition source-node-id set | green | one-way invariant on sub-attribute refs (legacy populated → mxgw matches; legacy null → mxgw free to populate per AlarmRefBuilder) |
| 5.5 | `AlarmTransitionParityTests` | IsAlarm-marked variable count parity | green | soft pin — count must match, doesn't have to be non-zero |
| 5.6 | `HistoryReadParityTests` | Same historized attribute set | green | what HistoryRouter consumes when routing to the Wonderware sidecar |
| 5.6 | `HistoryReadParityTests` | New mxgw GalaxyDriver does not implement `IHistoryProvider` | green | architectural pin from Phase 1 (PR 1.3) on the *new* path; legacy `GalaxyProxyDriver` keeps the interface for back-compat until PR 7.2; see "Accepted deltas" #5 |
| 5.7 | `ReconnectParityTests` | Reinitialize → both Healthy + reads succeed | green | recovery latency is *not* pinned (legacy: pipe + COM client; mxgw: re-Register gw session) |
| 5.7 | `ReconnectParityTests` | Health diverges only when one side recovers | yellow | soft pin until a toxiproxy-style fault injector lands |
| 5.8 | `ScanStateProbeParityTests` | Same per-platform host set | n/a — deferred | dev rig is licensed for one `$WinPlatform` only; multi-platform parity deferred to a customer rig (PR 4.7's unit tests pin the state-decoder + member-tracking logic) |
| 5.8 | `ScanStateProbeParityTests` | Same `HostState` per overlapping platform | n/a — deferred | same single-platform constraint |
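
One way to read the "±50%" event-rate pin in the table above is as a relative difference between the two backends' event counts. The sketch below, in Python for brevity, shows that reading; `within_tolerance` is a hypothetical helper, and normalizing by the larger count is an assumption, since the suite's exact formula is not given here.

```python
def within_tolerance(legacy_count: int, mxgw_count: int,
                     tolerance: float = 0.5) -> bool:
    """True when the counts differ by at most `tolerance` of the larger one."""
    larger = max(legacy_count, mxgw_count)
    if larger == 0:
        return True  # neither backend saw an event, so the counts agree
    return abs(legacy_count - mxgw_count) / larger <= tolerance


close_enough = within_tolerance(30, 20)  # differs by one third
too_far = within_tolerance(30, 14)       # differs by more than half
```

Normalizing by the larger count keeps the check symmetric, so it does not matter which backend is treated as the baseline.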
## Accepted deltas
These are intentional differences between the two backends — the parity
suite skips or tolerates them by design.
1. **Transport-entry host name.** The legacy backend's
`IHostConnectivityProbe` surface includes a host entry named after
the Galaxy.Host process identity; the mxgw backend uses the
configured `MxAccess.ClientName`. The names differ, but both are
correct for their respective sessions — the parity test compares
only the platform-host subset.
2. **Reconnect latency cadence.** Legacy reconnect roundtrips an OS
named pipe + an MxAccess COM client + a Galaxy.Host process restart
if the host died. The mxgw reconnect re-Registers the gateway session
over an existing gRPC channel. Sub-second vs multi-second recoveries
are both correct for their own paths; only the eventual `Healthy`
convergence is pinned.
3. **Read-value drift.** A read sampled twice on a live Galaxy can
return different values legitimately. We pin `StatusCode`-class
parity (Bad/Uncertain/Good); value equality is not pinned.
4. **Event-rate variance.** Both backends consume the same upstream
MXAccess publish events but route them through different deserializers
(LMXProxyServer COM events vs gRPC `MxEvent` protos). Scheduler
jitter on either side can shift counts within a 3s window; we pin a
±50% ratio, not strict equality.
5. **`IHistoryProvider` on the legacy path only.** Phase 1 (PR 1.3) lifted
history off the per-driver path onto the server-owned
`HistoryRouter` for the *new* in-process `GalaxyDriver`. The legacy
`GalaxyProxyDriver` still surfaces `IHistoryProvider` for back-compat
with the legacy server bootstrap path — it's an accepted delta
retired in PR 7.2 alongside the rest of the legacy projects. The
pin we want to enforce is "the new path doesn't regress to per-driver
history."
6. **Read value-CLR-type.** Legacy returns the raw VARIANT (e.g.
`Byte[]`) for an attribute that hasn't received its first value
cycle from MxAccess yet, while mxgw returns the typed value
(`Single`, `Int32`, etc.). Once a real value is written or scanned,
both converge. Pinning CLR-type equality across the uninitialized
window adds noise without a real parity invariant — the
`StatusCode`-class assertion already covers the
"did the read succeed" question.
7. **Write-failure StatusCode mapping.** Legacy
`MxAccessGalaxyBackend.WriteValuesAsync` flat-maps every failure to
`BadInternalError` (`0x80020000`); mxgw
`GatewayGalaxyDataWriter.TranslateReply` uses
`MxStatusProxy.RawDetectedBy` to distinguish gw-layer faults
(`BadCommunicationError`, `0x80050000`) from MxAccess HRESULT
faults (`BadDeviceFailure`, `BadNotConnected`, etc.). Both yield
Bad-status — the parity invariant is the *status class*, not the
exact code. Tighter mapping parity isn't worth investing in: the
legacy mapping retires alongside `GalaxyProxyDriver` in PR 7.2.
8. **Single-platform scope on the dev rig.** Two
`ScanStateProbeParityTests` scenarios are deferred to a customer
rig with multiple deployed `$WinPlatform` instances; this dev box
is licensed for one. PR 4.7's unit tests (`PerPlatformProbeWatcherTests`)
pin the state-decoder + member-tracking logic at the seam level,
so the runtime parity check becomes a customer-rig acceptance gate
before that customer goes live, not a precondition for retiring
the legacy projects on this dev box.
9. **Workaround for the gw `[]` array-suffix bug.**
`mxaccessgw/src/MxGateway.Server/Galaxy/GalaxyRepository.cs:173-175`
appends `[]` to the `full_tag_reference` of array-typed attributes,
which `MxAccess COM IInstance.AddItem` doesn't accept. The lmxopcua
discoverer (`GalaxyDiscoverer.StripArraySuffix`) defensively strips
the suffix. Tracked in `mxaccessgw/requirements-array-suffix-fix.md`;
the workaround is removed when that gw fix lands.
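The status-class invariant from items 6 and 7 can be sketched as below. This is an illustration, not the parity suite's actual assertion: `status_class` and `same_status_class` are hypothetical helper names, and the only facts taken from the text are the two status codes. The classification relies on the OPC UA convention that the top two bits of a StatusCode carry severity.

```python
# Illustrative only: classify an OPC UA StatusCode by its severity bits.
# Top two bits of the 32-bit code: 00 = Good, 01 = Uncertain, 10 = Bad.

def status_class(code: int) -> str:
    severity = (code >> 30) & 0b11
    return {0: "Good", 1: "Uncertain", 2: "Bad"}.get(severity, "Reserved")


def same_status_class(legacy_code: int, mxgw_code: int) -> bool:
    """Parity holds when both backends land in the same status class."""
    return status_class(legacy_code) == status_class(mxgw_code)


BAD_INTERNAL_ERROR = 0x80020000       # legacy flat mapping (from item 7)
BAD_COMMUNICATION_ERROR = 0x80050000  # mxgw gateway-layer fault (from item 7)

# Different codes, same class, so the parity check passes.
print(same_status_class(BAD_INTERNAL_ERROR, BAD_COMMUNICATION_ERROR))  # True
```

Comparing classes rather than exact codes is what lets the legacy flat mapping and the finer-grained mxgw mapping both pass the same scenario.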
## Outstanding deltas
None as of 2026-04-30. Phase 7 (PR 7.1) flipped the default to
`mxgw`; PR 7.2 (legacy project deletion) is unblocked — the matrix
gate is satisfied and no further soak/pilot precondition applies.
## Running the matrix
```bash
# Both backends must be reachable for any row to run; rows skip
# cleanly when their backend is unavailable.
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```
Environment overrides for the mxgw backend:
| Variable | Default | Purpose |
|----------|---------|---------|
| `OTOPCUA_PARITY_GW_ENDPOINT` | `http://localhost:5120` | mxaccessgw gRPC endpoint |
| `OTOPCUA_PARITY_GW_API_KEY` | `parity-suite-key` | API key handed to `MxGatewayClient` |
| `OTOPCUA_PARITY_CLIENT_NAME` | `OtOpcUa-Parity` | `MxAccess.ClientName` for the session |
The legacy backend reads ZB SQL on `localhost:1433` and spawns
`OtOpcUa.Driver.Galaxy.Host.exe` from
`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/bin/Debug/net48/` — both
must exist for the legacy half to resolve.
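A minimal sketch of how the override table resolves, assuming an unset or empty variable falls back to the documented default. `parity_settings` is a hypothetical helper, not the harness's own code; only the variable names and defaults come from the table.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "OTOPCUA_PARITY_GW_ENDPOINT": "http://localhost:5120",
    "OTOPCUA_PARITY_GW_API_KEY": "parity-suite-key",
    "OTOPCUA_PARITY_CLIENT_NAME": "OtOpcUa-Parity",
}


def parity_settings(env=None):
    """Resolve each variable, falling back to its default.

    Assumption: an empty string is treated the same as unset.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in parity_settings().items():
        print(f"{name}={value}")
```

Running it in the test-runner shell before `dotnet test` shows which values the mxgw half is likely to pick up.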
+361
@@ -0,0 +1,361 @@
# Galaxy parity rig — runbook
Brings up both Galaxy backends side-by-side against a single live Galaxy
so the parity matrix in `docs/v2/Galaxy.ParityMatrix.md` and the soak
scenario in `tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/SoakScenarioTests.cs`
can run for real. Closing the parity matrix is the gate for PR 7.2
(retire legacy Galaxy projects).
## Conceptual layout
```
Galaxy ZB SQL ──┬── OtOpcUaGalaxyHost (NSSM service, net48 x86)
                │     └── MxAccess COM, ClientName "OtOpcUa-Galaxy.Host"
                │     └── named pipe "OtOpcUaGalaxy"
                │           ▲
                │           │ pipe IPC
                │           │
                │     GalaxyProxyDriver ◄── parity test (legacy half)
                └── mxaccessgw service
                      └── MxAccess COM, ClientName "OtOpcUa-Parity"
                      └── gRPC on http://localhost:5120
                            │ gRPC
                      GalaxyDriver (in-process) ◄── parity test (mxgw half)
```
Both halves talk to the **same Galaxy** through **two distinct MxAccess
sessions** (different ClientNames so they don't evict each other).
## What's already on this dev box
Per `~/.claude/projects/.../memory/`:
- **AVEVA System Platform + Galaxy + MXAccess runtime**: see `project_aveva_platform_installed.md`.
- **`OtOpcUaGalaxyHost`** Windows service running as `dohertj2`, NSSM-wrapped,
binary at `C:\publish\OtOpcUaGalaxyHost\OtOpcUa.Driver.Galaxy.Host.exe`,
shared secret at `.local/galaxy-host-secret.txt`, ZB SQL on `localhost:1433`;
see `project_galaxy_host_installed.md`.
- **Parity test project** (`Driver.Galaxy.ParityTests`) committed and
skip-clean — runs as soon as the mxgw half resolves.
## Setup steps (one-time)
### 1. Build + run mxaccessgw
The gateway source is at `c:\Users\dohertj2\Desktop\mxaccessgw\`.
Build both halves — the worker has to be x86 net48 (MxAccess COM
bitness), the server is .NET 10:
```powershell
cd C:\Users\dohertj2\Desktop\mxaccessgw
dotnet build src\MxGateway.Worker -c Release # produces bin\x86\Release\net48\MxGateway.Worker.exe
dotnet build src\MxGateway.Server -c Release # produces bin\Release\net10.0\MxGateway.Server.dll
```
Initialize the auth database and mint an API key. The CLI mode is
gated by an `apikey` first-arg prefix:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # any stable string for dev
$srv = "C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Server\bin\Release\net10.0\MxGateway.Server.dll"
dotnet $srv apikey init-db # → "init-db: initialized"
dotnet $srv apikey create-key `
--key-id parity-rig `
--display-name "OtOpcUa-Parity" `
--scopes "session:open,session:close,invoke:read,invoke:write,invoke:secure,events:read,metadata:read"
# → "API key: mxgw_parity-rig_<base64suffix>" ← capture this; you can't list secrets later
```
Save that exact key string for `OTOPCUA_PARITY_GW_API_KEY` in step 2.
Run the server with three env-var overrides — the defaults don't
quite match what gRPC + the parity test need:
```powershell
$env:MxGateway__ApiKeyPepper = "parity-rig-dev-pepper" # MUST match the create-key invocation
$env:Kestrel__Endpoints__Http__Url = "http://localhost:5120"
$env:Kestrel__Endpoints__Http__Protocols = "Http2" # gRPC needs h2c on plain HTTP
$env:MxGateway__Worker__ExecutablePath = `
"C:\Users\dohertj2\Desktop\mxaccessgw\src\MxGateway.Worker\bin\x86\Release\net48\MxGateway.Worker.exe"
# appsettings.json's relative path is missing the \net48 segment; absolute path sidesteps that
dotnet $srv
# → "Now listening on: http://localhost:5120"
```
The worker spawns lazily on the first OpenSession RPC — there's no
worker process visible in Task Manager until the first session. If
the worker can't spawn, the server returns `Failed to open session
session-…` with a `WorkerProcessLaunchException` in the server log.
NSSM-wrap it later if the rig becomes long-lived; for first-pass
provisioning a console window is easier to inspect.
### 2. Set the parity env vars
In the test-runner shell:
```powershell
$env:OTOPCUA_PARITY_GW_ENDPOINT = "http://localhost:5120"
$env:OTOPCUA_PARITY_GW_API_KEY = "mxgw_parity-rig_<base64suffix>" # paste the exact key minted in step 1
$env:OTOPCUA_PARITY_CLIENT_NAME = "OtOpcUa-Parity"
```
Elevation status doesn't matter — the legacy Galaxy.Host pipe ACL accepts
elevated and non-elevated `dohertj2` shells alike (the Administrators deny
ACE was removed 2026-04-24; see `project_galaxy_host_installed.md`).
### 3. Verify both halves resolve
```powershell
cd C:\Users\dohertj2\Desktop\lmxopcua
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "FullyQualifiedName~HarnessShapeTests"
```
`Harness_records_a_skip_reason_for_each_unavailable_backend` is the
two-line truth-teller:
- `LegacyDriver` and `MxGatewayDriver` both non-null → rig is up.
- One side null → read its `LegacySkipReason` / `MxGatewaySkipReason` and fix.
## Running the matrix
Once both halves resolve:
```powershell
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=ParityE2E"
```
This runs all 17 scenario tests across the seven scenario classes
(BrowseAndRead / Subscribe / Write / Alarm / History / Reconnect /
ScanState). Each scenario class is independent — failures in one don't
block the rest.
Track the result against `docs/v2/Galaxy.ParityMatrix.md`. Update each
row to:
- **green** if the scenario passes
- **yellow** if it skipped because the dev Galaxy doesn't have the right
shape (see coverage matrix below)
- **red** if it asserted a real delta — those are the deltas that block
PR 7.2; chase each before retiring the legacy backend
## Galaxy shape needed for full coverage
Skip-on-empty-shape scenarios fail-soft today. To turn a skip into a
real result, the dev Galaxy needs the shape in the right column:
| Scenario | Needs | Local rig |
|---|---|---|
| `BrowseAndReadParityTests` (3 tests) | Any deployed objects with attributes | ✅ existing seed |
| `SubscribeAndEventRateParityTests` event-rate | ≥5 attributes whose values *change* in 3s | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (FreeAccess/Operate) | A FreeAccess/Operate numeric attribute | ⚙ scriptable via graccess-cli |
| `WriteByClassificationParityTests` (Configure/Tune) | A Configure/Tune attribute | ⚙ scriptable via graccess-cli |
| `AlarmTransitionParityTests` (2 tests) | Attributes with the `$Alarm*` extension | ⚙ scriptable via graccess-cli |
| `HistoryReadParityTests` (historized set) | Attributes with the History extension | ⚙ scriptable via graccess-cli |
| `ScanStateProbeParityTests` (2 tests) | Multiple `$WinPlatform` / `$AppEngine` objects | ❌ **deferred to customer rig** — this dev box is provisioned for one platform only |
### The single-platform constraint
The dev box at `DESKTOP-6JL3KKO` is licensed / configured for a single
deployed `$WinPlatform`. Adding a second platform isn't feasible here,
so `ScanStateProbeParityTests` will skip in a "no overlap" branch on
this rig. Both of its scenarios already handle that case gracefully
(`Assert.Skip("no overlapping platform hosts between backends — likely
the transport names differ but no $WinPlatform was discovered")`), so
the matrix reports them as **n/a (deferred)** rather than red.
Plan: defer the two ScanState scenarios to a customer rig with multiple
platforms. The PR 7.2 gate accepts "n/a, deferred" on these rows
provided the legacy `GalaxyRuntimeProbeManager` and the in-process
`PerPlatformProbeWatcher` have matching unit-test coverage of the
state-decoder + member-tracking logic — which they do (PR 4.7's tests).
Treat the runtime parity check as a customer-rig acceptance gate before
that customer goes live, not a precondition for retiring the legacy
projects on this dev box.
### Provisioning the rest via graccess-cli
`C:\Users\dohertj2\Desktop\graccess\graccess_cli\` is a .NET Framework
4.8 console app over the ArchestrA GRAccess COM API. It can configure
templates, instances, attributes, UDAs, extensions, and attribute
security — i.e. every row above marked ⚙ scriptable. Full surface in
`graccess/graccess_cli/docs/usage.md` and per-area workflow guides
(`attribute-editing.md`, `template-editing.md`,
`template-instance-editing.md`).
Reserve a sandbox UDO (e.g. `OtOpcUaParityTest`) to avoid mutating
attributes on plant-relevant objects. Concrete commands per requirement:
**A FreeAccess/Operate numeric attribute** (covers WriteByClassification
FreeAccess/Operate scenario):
```powershell
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda OperateValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityOperate `
--confirm --confirm-target OtOpcUaParityTest
```
**A Configure / Tune attribute** (covers WriteByClassification
Configure/Tune scenario):
```powershell
# Tune
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda TuneValue --data-type MxFloat `
--category MxCategoryWriteable_T --security MxSecurityTune `
--confirm --confirm-target OtOpcUaParityTest
# Configure
graccess object uda add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--uda ConfigValue --data-type MxFloat `
--category MxCategoryWriteable_C --security MxSecurityConfigure `
--confirm --confirm-target OtOpcUaParityTest
```
**A changing-value attribute** (covers Subscribe event-rate scenario).
Two ways:
1. *On-scan increment* — bind a script extension that bumps a counter
each scan. Simplest to author with `object extension add` against
`ScriptExtension` plus `object attribute set` for the script body
(see `attribute-editing.md` §"Edit Extensions" for the pattern).
2. *External writer loop* — leave the attribute as plain Float and run
a one-liner that writes incrementing values from the parity-test
shell. Uses the legacy backend path so it's available before the
mxgw subscriber is up. This keeps the Galaxy template clean.
For first-pass validation pick #2 — no template surgery needed, and the
write loop runs only during `dotnet test`.
**Attributes with the `$Alarm*` extension** (covers AlarmTransition
scenario). Per `attribute-editing.md` §"Edit Alarm Settings", the
attribute names vary by extension type (`Limit`, `RateOfChange`, etc.).
Add the extension via:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type AnalogLimitAlarm --primitive AlarmInput `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
```
Then set HiHi/Hi/Lo/LoLo limit values + priority on the resulting
attributes via `object attribute set`. Inspect first via
`object attributes` to see the names the extension introduces; they
differ across AVEVA versions.
**Attributes with the History extension** (covers HistoryRead routing
scenario). History settings are usually attribute or extension
attributes; `attribute-editing.md` §"Edit History Settings" covers the
discovery flow. Quick start:
```powershell
graccess object extension add `
--galaxy ZB --name OtOpcUaParityTest --type template `
--extension-type HistoryExtension --primitive HistoryRecord `
--object-extension `
--confirm --confirm-target OtOpcUaParityTest
# Then enable history on whichever attribute the extension points at
graccess object attribute set `
--galaxy ZB --name OtOpcUaParityTest --type template `
--attribute HistoryEnabled --value true --data-type bool `
--confirm --confirm-target OtOpcUaParityTest
```
**Deploy + restart Galaxy.Host after any of the above** so MxAccess
sees the change:
```powershell
graccess object deploy --galaxy ZB --name OtOpcUaParityTest_001 `
--confirm --confirm-target OtOpcUaParityTest_001
Restart-Service OtOpcUaGalaxyHost # sc.exe has no "restart" verb; Restart-Service stops then starts
```
Then re-run the parity matrix. The previously-skipped scenarios should
now find a sandbox attribute matching their selector and assert.
## Soak run
The 24h × 50k soak gates the production confidence half of PR 7.2.
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "<actual tag count if Galaxy < 50k>"
$env:OTOPCUA_SOAK_MINUTES = "1440" # default 24h; compress for first runs
$env:OTOPCUA_SOAK_DROP_PCT = "0.5"
dotnet test tests\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests\ `
--filter "Category=Soak"
```
The test logs a per-minute CSV-style line to stdout:
```
soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412
soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415
...
```
Capture stdout to a file for post-run analysis. The three guards
(`received` growing, `dropped/received` ratio, working-set delta) all
fire mid-run rather than at end-of-test, so a failure surfaces within
the first few minutes if the architecture is wrong.
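For post-run analysis of the captured stdout, a minimal parser sketch follows. It assumes every row has exactly the shape shown above (`soak,<minutes>,key=value,...` with integer values); `parse_soak_line` and `drop_pct` are hypothetical helper names, not part of the test project.

```python
def parse_soak_line(line: str):
    """Parse 'soak,<minutes>,key=value,...' into a dict; return None for other lines."""
    parts = line.strip().split(",")
    if len(parts) < 3 or parts[0] != "soak":
        return None
    try:
        row = {"minutes": float(parts[1])}
        for field in parts[2:]:
            key, _, value = field.partition("=")
            row[key] = int(value)
    except ValueError:
        return None  # not a well-formed soak row
    return row


def drop_pct(row) -> float:
    """Dropped events as a percentage of received events."""
    return 100.0 * row["dropped"] / row["received"] if row["received"] else 0.0


sample = [
    "soak,1.0,received=51234,dispatched=51234,dropped=0,ws_mb=412",
    "soak,2.0,received=102468,dispatched=102468,dropped=0,ws_mb=415",
]
rows = [r for r in map(parse_soak_line, sample) if r is not None]
for r in rows:
    print(r["minutes"], r["received"], f"{drop_pct(r):.2f}%")
```

Non-soak lines from the test runner are skipped rather than raising, so the whole captured log can be fed through unchanged.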
## Compressed-tag soak (when Galaxy isn't 50k tags)
A first-pass validation is fine with the override:
```powershell
$env:OTOPCUA_SOAK_RUN = "1"
$env:OTOPCUA_SOAK_TAGS = "500" # whatever the dev Galaxy has
$env:OTOPCUA_SOAK_MINUTES = "60" # one hour is enough to surface plumbing bugs
$env:OTOPCUA_SOAK_DROP_PCT = "1.0"
```
This validates the *plumbing* (bounded channel, pump invariants, leak
guard) but doesn't pin the 50k-tag scaling assertion. Defer the full
50k validation to a customer rig with that scale, or build a synthetic
Galaxy with a script that imports 50k attributes onto a generated UDO
(~2 hours of one-off work).
## Troubleshooting
- **`MxGatewaySkipReason` says "mxaccessgw not reachable"** — the gw
isn't listening, or it's on a different port. `Test-NetConnection
localhost -Port 5120` is the quick check.
- **`MxGatewaySkipReason` says "mxgateway backend boot failed:
RpcException: Unauthenticated"** — API key mismatch. Verify the
`OTOPCUA_PARITY_GW_API_KEY` env var matches the gw's configured key.
- **`LegacySkipReason` says "Galaxy ZB SQL not reachable on
localhost:1433"** — SQL Server isn't running, or its TCP listener is
off. Check `services.msc` for the SQL Server (default) instance.
- **`LegacySkipReason` says "Galaxy.Host EXE not built"** — the parity
harness looks under `src/.../bin/Debug/net48/`. Build it once:
`dotnet build src\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host`. Note the
separately-published copy at `C:\publish\OtOpcUaGalaxyHost\` is for
the Windows service; the parity harness spawns its own subprocess.
- **Both halves resolve but parity scenarios assert deltas** — that's
the expected outcome the rig exists to surface. Review each delta
against `docs/v2/Galaxy.ParityMatrix.md`'s "Accepted deltas" section
to decide whether it's a real bug or a pre-accepted divergence.
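The two reachability checks above (gw on 5120, ZB SQL on 1433) can also be scripted. The sketch below is a plain TCP connect test, roughly what `Test-NetConnection` does for a port; `port_open` is a hypothetical helper and says nothing about whether gRPC or SQL authentication will succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from this runbook; adjust if the rig uses different ones.
    for name, port in (("mxaccessgw gRPC", 5120), ("Galaxy ZB SQL", 1433)):
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name} on localhost:{port}: {state}")
```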
## After the rig is green
When the matrix is fully green or carries documented accepted-deltas,
PR 7.2 (legacy project deletion) is unblocked. The only follow-up is
to promote any newly-discovered accepted-delta to the matrix doc with
the why so the matrix history stays auditable.
+152
@@ -0,0 +1,152 @@
# Galaxy backend performance
This document covers the performance surface of the in-process
`GalaxyDriver` (the v2 mxgw backend) — the ActivitySource it emits, the
metrics on its EventPump, the soak scenario that validates it, and the
tuning knobs you can reach for when the dev parity rig surfaces a hot
spot.
## Tracing surface (PR 6.1)
The driver emits spans on the `ZB.MOM.WW.OtOpcUa.Driver.Galaxy`
ActivitySource. No package dependency on OpenTelemetry — the host
process picks the listener (OTLP exporter, dotnet-trace, Application
Insights). Wire it via `TracerProviderBuilder.AddSource(...)` in the
host's tracing pipeline.
| Span | Source | Tags |
|------|--------|------|
| `galaxy.subscribe_bulk` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.tag_count`, `galaxy.buffered_interval_ms`, `galaxy.success_count` |
| `galaxy.unsubscribe_bulk` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.tag_count` |
| `galaxy.stream_events` | `TracedGalaxySubscriber` | `galaxy.client`, `galaxy.event_count` (set on stream end) |
| `galaxy.write` | `TracedGalaxyDataWriter` | `galaxy.client`, `galaxy.tag_count`, `galaxy.secured_write_count`, `galaxy.success_count` |
| `galaxy.get_hierarchy` | `TracedGalaxyHierarchySource` | `galaxy.client`, `galaxy.object_count` |
The stream-events span deliberately covers the *entire* stream lifetime
rather than per-event spans — at 50k tags / 1Hz the per-event volume
would dominate the trace pipeline. Per-event visibility flows through
the metrics surface instead.
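The volume argument is simple arithmetic. The sketch below assumes the worst case implied by the text, that every one of the 50k tags produces one event per second; real change rates may be lower.

```python
TAGS = 50_000                   # tag count from the text
UPDATE_HZ = 1                   # one event per tag per second (worst-case assumption)
SECONDS_PER_DAY = 24 * 60 * 60

per_event_spans_per_second = TAGS * UPDATE_HZ
per_event_spans_per_day = per_event_spans_per_second * SECONDS_PER_DAY

print(per_event_spans_per_second)  # 50000
print(per_event_spans_per_day)     # 4320000000
```

Against that, a single `galaxy.stream_events` span per stream lifetime is effectively free.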
## Metrics surface (PR 6.2)
`EventPump` publishes three counters on the
`ZB.MOM.WW.OtOpcUa.Driver.Galaxy` meter, each tagged with
`galaxy.client` so multi-driver hosts can split by source:
| Counter | Unit | Meaning |
|---------|------|---------|
| `galaxy.events.received` | `{event}` | MxEvents read from the gateway StreamEvents stream |
| `galaxy.events.dispatched` | `{event}` | MxEvents that made it through the bounded channel into `OnDataChange` |
| `galaxy.events.dropped` | `{event}` | MxEvents discarded because the bounded channel was full (newest-dropped) |
The invariant is `received = dispatched + dropped + (in-flight in the
channel)`. Watch the dropped counter — it is the leading indicator of
listener back-pressure. A non-zero dropped rate means a downstream
consumer (DriverNodeManager → UA notification queue → client) is
slower than the gw event stream; investigate that consumer before
raising `EventPump` channel capacity.
### Bounded channel design
The pump runs two background tasks:
1. **Producer** — reads from `IGalaxySubscriber.StreamEventsAsync`,
increments `events.received`, and `TryWrite`s into a bounded
`Channel<MxEvent>`. When the channel is full, the producer counts
the drop and continues reading the gw stream so back-pressure does
not propagate upstream (which would stall the gw worker and cascade
to *all* driver instances sharing that worker).
2. **Consumer** — reads from the channel, fans out via
`SubscriptionRegistry`, increments `events.dispatched`.
Default channel capacity is 50_000 (one second of headroom at 50k
tags / 1Hz). Override via the `EventPump` constructor's
`channelCapacity` parameter; the public-facing wiring path in
`GalaxyDriver.EnsureEventPumpStarted` does not yet expose this through
`GalaxyDriverOptions` because no parity scenario has needed it. Add it
when soak data does.
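A toy model of the counter bookkeeping is sketched below. It is not the driver's `EventPump` (which uses a bounded `Channel<MxEvent>` and two background tasks); `EventPumpModel` is a hypothetical single-threaded stand-in that only demonstrates the non-blocking write, the newest-dropped policy, and the `received = dispatched + dropped + in-flight` invariant.

```python
from collections import deque


class EventPumpModel:
    """Single-threaded model of the three counters; illustrative only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.channel = deque()
        self.received = 0
        self.dispatched = 0
        self.dropped = 0

    def produce(self, event) -> bool:
        """Non-blocking write: count a drop instead of waiting when full."""
        self.received += 1
        if len(self.channel) >= self.capacity:
            self.dropped += 1  # the incoming (newest) event is discarded
            return False
        self.channel.append(event)
        return True

    def consume_one(self) -> bool:
        """Take one event off the channel and count it as dispatched."""
        if not self.channel:
            return False
        self.channel.popleft()  # fan-out to subscribers would happen here
        self.dispatched += 1
        return True

    def invariant_holds(self) -> bool:
        in_flight = len(self.channel)
        return self.received == self.dispatched + self.dropped + in_flight


pump = EventPumpModel(capacity=3)
for i in range(5):
    pump.produce(i)  # events 3 and 4 arrive while the channel is full
pump.consume_one()
print(pump.received, pump.dispatched, pump.dropped, len(pump.channel))  # 5 1 2 2
```

Because the producer never blocks, a slow consumer shows up as a rising `dropped` count rather than as a stalled gateway stream.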
## Buffered update interval (PR 6.3)
`MxAccess.PublishingIntervalMs` (default 1000) flows through both
subscribe paths:
- `GalaxyDriver.SubscribeAsync` — the caller's `publishingInterval`
wins when non-zero (the server's UA subscription publishingInterval
drives this in production). When the caller passes
`TimeSpan.Zero`, the configured option is the fallback.
- `PerPlatformProbeWatcher` — the watcher passes the configured value
through `SubscribeBulkAsync` so probe `ScanState` changes publish at
the deployment's chosen cadence.
A session-level `SetBufferedUpdateInterval` RPC exists in the gw
protocol but the .NET client doesn't expose a typed helper yet —
adjusting an existing subscription's interval mid-flight is a
follow-up. Today's path subscribes once at the right interval, which
covers the common case.
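The interval selection for `GalaxyDriver.SubscribeAsync` reduces to a single fallback rule, sketched here. `effective_interval` is a hypothetical name; the rule (caller's non-zero interval wins, otherwise the configured `MxAccess.PublishingIntervalMs`) is taken from the text, and handling of negative values is not specified there.

```python
from datetime import timedelta


def effective_interval(requested: timedelta, configured_ms: int = 1000) -> timedelta:
    """Caller's interval wins when non-zero; otherwise use the configured option."""
    if requested != timedelta(0):
        return requested
    return timedelta(milliseconds=configured_ms)


print(effective_interval(timedelta(milliseconds=250)))  # 0:00:00.250000
print(effective_interval(timedelta(0)))                 # 0:00:01
```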
## Soak scenario (PR 6.4)
`SoakScenarioTests.Soak_HoldsSubscription_AndKeepsEventStreamFlowing`
in `Driver.Galaxy.ParityTests` is the long-running validation. It
subscribes a configurable tag count (default 50_000), holds the
subscription for a configurable duration (default 24h), polls the
three counters every minute, and asserts:
- `events.received` continues to grow (gw stream isn't stuck)
- `events.dropped / events.received` stays under the configured
ceiling (default 0.5%)
- process working-set doesn't grow more than 1 GB above baseline
(leak guard)
Always skipped unless the operator opts in:
```bash
# Full 24h × 50k soak (production validation)
OTOPCUA_SOAK_RUN=1 dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
# Compressed CI-friendly run (10min × 1k tags, 1% drop ceiling)
OTOPCUA_SOAK_RUN=1 OTOPCUA_SOAK_MINUTES=10 OTOPCUA_SOAK_TAGS=1000 \
OTOPCUA_SOAK_DROP_PCT=1.0 \
dotnet test tests/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.ParityTests/
```
The scenario writes a per-minute CSV-style row to stdout
(`soak,<minutes>,received=…,dispatched=…,dropped=…,ws_mb=…`) so an
operator can grep the test runner output mid-run.
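The three guards can be sketched as a check over the per-minute samples. This is an approximation under stated assumptions, not the test's implementation: the baseline working set is taken to be the first sample, 1 GB is treated as 1024 MB, and `soak_violations` is a hypothetical helper.

```python
def soak_violations(samples, drop_ceiling_pct=0.5, ws_growth_limit_mb=1024):
    """Check (received, dropped, ws_mb) samples, one per poll, against the guards.

    Returns a list of (sample_index, reason) tuples; empty means healthy.
    """
    problems = []
    if not samples:
        return problems
    baseline_ws = samples[0][2]  # assumption: first poll is the baseline
    for i, (received, dropped, ws_mb) in enumerate(samples):
        if i > 0 and received <= samples[i - 1][0]:
            problems.append((i, "received stopped growing"))
        if received and 100.0 * dropped / received > drop_ceiling_pct:
            problems.append((i, "drop ratio over ceiling"))
        if ws_mb - baseline_ws > ws_growth_limit_mb:
            problems.append((i, "working set grew past limit"))
    return problems


healthy = [(51234, 0, 412), (102468, 0, 415)]
stalled = [(51234, 0, 412), (51234, 0, 413)]
print(soak_violations(healthy))  # []
print(soak_violations(stalled))  # [(1, 'received stopped growing')]
```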
## Tuned defaults (PR 6.5)
| Option | Default | Source | Notes |
|--------|---------|--------|-------|
| `Gateway.ConnectTimeoutSeconds` | 10 | unchanged | Cold-start network paths fit comfortably; soak never observed >2s |
| `Gateway.DefaultCallTimeoutSeconds` | 30 | **bumped from 5** in PR 6.5 | A 50k-tag `SubscribeBulk` can exceed 5s under MxAccess COM apartment lock contention; 30s leaves headroom while still failing fast on a wedged worker |
| `Gateway.StreamTimeoutSeconds` | 0 (unlimited) | unchanged | The stream must run for the lifetime of the driver |
| `MxAccess.PublishingIntervalMs` | 1000 | unchanged | Matches the legacy `LMXProxyServer` cadence; deployments needing tighter health visibility can dial down |
| `Reconnect.InitialBackoffMs` | 500 | unchanged | First retry shouldn't dogpile a recovering gw |
| `Reconnect.MaxBackoffMs` | 30_000 | unchanged | 30s ceiling so a long-down gw doesn't sit in 5+ min backoff |
| `Repository.DiscoverPageSize` | 5000 | unchanged | One Galaxy page round-trip per ~5k objects; soak hadn't surfaced pressure |
| `EventPump` channel capacity | 50_000 | unchanged | One second of headroom at 50k tags / 1Hz |
The unchanged rows are not "definitely correct" — they are "no live
data argues for changing them." Re-run the soak scenario after every
substantive driver change, and revise this table when the data does.
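To get a feel for the two `Reconnect` rows, the sketch below prints a retry schedule. The doubling growth and the absence of jitter are assumptions made for illustration; the table only fixes the 500 ms starting point and the 30 s ceiling, and the supervisor's real growth curve may differ.

```python
def backoff_schedule(initial_ms=500, max_ms=30_000, attempts=8):
    """Delays for successive retries, assuming doubling with a hard ceiling."""
    delay = initial_ms
    schedule = []
    for _ in range(attempts):
        schedule.append(delay)
        delay = min(delay * 2, max_ms)  # assumption: exponential doubling
    return schedule


print(backoff_schedule())
# [500, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Under that assumption the ceiling is reached on the seventh attempt, about a minute after the first failure.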
## Where to look first when something's slow
1. **Slow `Discover`?** Inspect `galaxy.get_hierarchy` span duration
and `galaxy.object_count`. The gw walks the Galaxy DB serially;
slow Discovers usually mean a slow ZB SQL.
2. **Subscribe pile-up?** `galaxy.subscribe_bulk` span duration
correlates with `galaxy.tag_count`. If duration ÷ tag_count starts
climbing, the gw worker is probably under apartment-lock pressure.
3. **Events stalled?** Watch `galaxy.events.received`. Flat-lined
means the gw stream is wedged — kick the reconnect supervisor by
forcing a `ReinitializeAsync`.
4. **Dropped events?** Non-zero `galaxy.events.dropped` means a slow
downstream consumer. Profile `OnDataChange` handlers in
`DriverNodeManager` before bumping the channel capacity.
5. **Memory growing?** Confirm with the soak scenario's working-set
leak guard. Likely culprits: lingering subscription handles in
`SubscriptionRegistry`, or a downstream consumer retaining
`DataValueSnapshot` references past their useful life.
+45 -42
@@ -4,6 +4,7 @@
>
> **Branch**: `v2`
> **Created**: 2026-04-17
> **Updated 2026-04-28**: Docker workloads moved off the Windows dev VM to a shared Linux Docker host at `10.100.0.35` so the dev VM can have its GPU re-attached via ESXi passthrough (Hyper-V/WSL2 was blocking it). The two-tier model below is updated accordingly: per-developer Docker Desktop is gone; SQL Server + driver fixtures all live on the central Linux host, identifiable via `docker ps --filter label=project=lmxopcua`.
## Scope
@@ -13,30 +14,31 @@ Every external resource a developer needs on their machine, plus the dedicated i
## Two Environment Tiers
Per decision #99:
Per decision #99 (updated 2026-04-28):
| Tier | Purpose | Where it runs | Resources |
|------|---------|---------------|-----------|
| **PR-CI / inner-loop dev** | Fast, runs on minimal Windows + Linux build agents and developer laptops | Each developer's machine; CI runners | Pure-managed in-process simulators (NModbus, OPC Foundation reference server, FOCAS TCP stub from test project). No Docker, no VMs. |
| **Nightly / integration CI** | Full driver-stack validation against real wire protocols | One dedicated Windows host with Docker Desktop + Hyper-V + a TwinCAT XAR VM | All Docker simulators (`oitc/modbus-server`, `ab_server`, Snap7), TwinCAT XAR VM, Galaxy.Host installer + dev Galaxy access, FOCAS TCP stub binary, FOCAS FaultShim assembly |
| **PR-CI / inner-loop dev** | Fast, runs on minimal Windows + Linux build agents and developer laptops | Each developer's machine; CI runners | Pure-managed in-process simulators (NModbus, OPC Foundation reference server, FOCAS TCP stub from test project). No Docker, no VMs locally. |
| **Integration / nightly CI** | Full driver-stack validation against real wire protocols | **Shared Linux Docker host at `10.100.0.35`** (Debian 13, Docker 29.2.1) — one host for all developers; replaces the former per-developer Docker Desktop + Hyper-V model | All Docker simulators (pymodbus, ab_server, python-snap7, opc-plc) + central SQL Server, all running as `/opt/otopcua-<driver>/` stacks with the `project=lmxopcua` label. TwinCAT XAR + the Galaxy/mxaccessgw stack stay on the Windows dev VM (license + Hyper-V constraints unchanged) |
The tier split keeps developer onboarding fast (no Docker required for first build) while concentrating the heavy simulator setup on one machine the team maintains.
The Linux Docker host is shared because (a) only one team member needs it active at a time, (b) it removes the per-developer Docker Desktop install, and (c) the dev VM no longer needs Hyper-V/WSL2 — freeing it for GPU passthrough.
## Installed Inventory — This Machine
## Installed Inventory — Dev VM (`DESKTOP-6JL3KKO`)
Running record of every v2 dev service stood up on this developer machine. Updated on every install / config change. Credentials here are **dev-only** per decision #137 — production uses Integrated Security / gMSA per decision #46 and never any value in this table.
Running record of v2 dev services on the Windows dev VM. Updated on every install / config change. Credentials here are **dev-only** per decision #137 — production uses Integrated Security / gMSA per decision #46 and never any value in this table.
**Last updated**: 2026-04-17
**Last updated**: 2026-04-28 — Docker Desktop + WSL2 removed; Docker workloads now live on the Linux Docker host (see next section).
### Host
| Attribute | Value |
|-----------|-------|
| Machine name | `DESKTOP-6JL3KKO` |
| User | `dohertj2` (member of local Administrators + `docker-users`) |
| VM platform | VMware (`VMware20,1`), nested virtualization enabled |
| Machine name | `DESKTOP-6JL3KKO` (10.100.0.48) |
| User | `dohertj2` (local Administrators) |
| VM platform | VMware ESXi |
| CPU | Intel Xeon E5-2697 v4 @ 2.30GHz (3 vCPUs) |
| OS | Windows (WSL2 + Hyper-V Platform features installed) |
| OS | Windows 10 Enterprise (10.0.19045) |
| GPU | (Re-attached after WSL2/Hyper-V removal) |
### Toolchain
@@ -46,36 +48,40 @@ Running record of every v2 dev service stood up on this developer machine. Updat
| .NET AspNetCore runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App\` | Pre-installed |
| .NET NETCore runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.NETCore.App\` | Pre-installed |
| .NET WindowsDesktop runtime | 10.0.5 | `C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App\` | Pre-installed |
| .NET Framework 4.8 SDK | — | Pending (needed for Phase 2 Galaxy.Host; not yet required) | — |
| .NET Framework 4.8 SDK | — | Optional — only needed when building the mxaccessgw worker (sibling repo, x86 net48) | — |
| Git | Pre-installed | Standard | — |
| PowerShell 7 | Pre-installed | Standard | — |
| winget | v1.28.220 | Standard Windows feature | — |
| WSL | Default v2, distro `docker-desktop` `STATE Running` | — | `wsl --install --no-launch` (2026-04-17) |
| Docker Desktop | 29.3.1 (engine) / Docker Desktop 4.68.0 (app) | Standard | `winget install --id Docker.DockerDesktop` (2026-04-17) |
| Docker CLI (standalone, no daemon) | 29.3.1 | `%USERPROFILE%\bin\docker.exe` | Static binary from download.docker.com (2026-04-28) |
| Docker Compose CLI plugin | latest | `%USERPROFILE%\.docker\cli-plugins\docker-compose.exe` | Direct download from github.com/docker/compose (2026-04-28) |
| `lmxopcua-fix.ps1` helper | n/a | `%USERPROFILE%\bin\lmxopcua-fix.ps1` | See "Docker host" section below |
| `dotnet-ef` CLI | 10.0.6 | `%USERPROFILE%\.dotnet\tools\dotnet-ef.exe` | `dotnet tool install --global dotnet-ef --version 10.0.*` (2026-04-17) |
| ~~Docker Desktop~~ | — | Removed 2026-04-28 — replaced by remote Linux Docker host | — |
| ~~WSL2 (`docker-desktop` distro)~~ | — | Removed 2026-04-28 (frees Hyper-V for GPU passthrough) | — |
### Services
| Service | Container / Process | Version | Host:Port | Credentials (dev-only) | Data location | Status |
|---------|---------------------|---------|-----------|------------------------|---------------|--------|
| **Central config DB** | Docker container `otopcua-mssql` (image `mcr.microsoft.com/mssql/server:2022-latest`) | 16.0.4250.1 (RTM-CU24-GDR, KB5083252) | `localhost:14330` (host) → `1433` (container); remapped from 1433 to avoid collision with the native MSSQL14 instance that hosts the Galaxy `ZB` DB (both bind 0.0.0.0:1433; whichever wins the race gets connections) | User `sa` / Password `OtOpcUaDev_2026!` | Docker named volume `otopcua-mssql-data` (mounted at `/var/opt/mssql` inside container) | ✅ Running; `InitialSchema` migration applied, 16 entity tables live |
| **Central config DB** | Docker container `otopcua-mssql` on the Linux Docker host (image `mcr.microsoft.com/mssql/server:2022-latest`) | 16.0.4250.1 (RTM-CU24-GDR, KB5083252) | `10.100.0.35:14330` → `1433` (container); port 14330 retained from the previous local-container setup so connection-string ports don't churn | User `sa` / Password `OtOpcUaDev_2026!` | Docker named volume `otopcua-mssql-data` on the Docker host | ✅ Running on Docker host (`/opt/otopcua-mssql/`) since 2026-04-28; carries `project=lmxopcua` label |
| Dev Galaxy (AVEVA System Platform) | Local install on this dev box — full ArchestrA + Historian + OI-Server stack | v1 baseline | Local COM via MXAccess (`C:\Program Files (x86)\ArchestrA\Framework\bin\ArchestrA.MXAccess.dll`); Historian via `aaH*` services; SuiteLink via `slssvc` | Windows Auth | Galaxy repository DB `ZB` on local SQL Server (separate instance from `otopcua-mssql` — legacy v1 Galaxy DB, not related to v2 config DB) | ✅ **Fully available — Phase 2 lift unblocked.** 27 ArchestrA / AVEVA / Wonderware services running incl. `aaBootstrap`, `aaGR` (Galaxy Repository), `aaLogger`, `aaUserValidator`, `aaPim`, `ArchestrADataStore`, `AsbServiceManager`, `AutoBuild_Service`; full Historian set (`aahClientAccessPoint`, `aahGateway`, `aahInSight`, `aahSearchIndexer`, `aahSupervisor`, `InSQLStorage`, `InSQLConfiguration`, `InSQLEventSystem`, `InSQLIndexing`, `InSQLIOServer`, `InSQLManualStorage`, `InSQLSystemDriver`, `HistorianSearch-x64`); `slssvc` (Wonderware SuiteLink); `OI-Gateway` install present at `C:\Program Files (x86)\Wonderware\OI-Server\OI-Gateway\` (decision #142 AppServer-via-OI-Gateway smoke test now also unblocked) |
| GLAuth (LDAP) | Local install at `C:\publish\glauth\` | v2.4.0 | `localhost:3893` (LDAP) / `3894` (LDAPS, disabled) | Direct-bind `cn={user},dc=lmxopcua,dc=local` per `auth.md`; users `readonly`/`writeop`/`writetune`/`writeconfig`/`alarmack`/`admin`/`serviceaccount` (passwords in `glauth.cfg` as SHA-256) | `C:\publish\glauth\` | ✅ Running (NSSM service `GLAuth`). Phase 1 Admin uses GroupToRole map `ReadOnly→ConfigViewer`, `WriteOperate→ConfigEditor`, `AlarmAck→FleetAdmin`. v2-rebrand to `dc=otopcua,dc=local` is a future cosmetic change |
| OPC Foundation reference server | Not yet built | — | `localhost:62541` (target) | `user1` / `password1` (reference-server defaults) | — | Pending (needed for Phase 5 OPC UA Client driver testing) |
| FOCAS TCP stub | Not yet built | — | `localhost:8193` (target) | n/a | — | Pending (built in Phase 5) |
| Modbus simulator (`oitc/modbus-server`) | — | — | `localhost:502` (target) | n/a | — | Pending (needed for Phase 3 Modbus driver; moves to integration host per two-tier model) |
| libplctag `ab_server` | — | — | `localhost:44818` (target) | n/a | — | Pending (Phase 3/4 AB CIP and AB Legacy drivers) |
| Snap7 Server | — | — | `localhost:102` (target) | n/a | — | Pending (Phase 4 S7 driver) |
| TwinCAT XAR VM | — | — | `localhost:48898` (ADS) (target) | TwinCAT default route creds | — | Pending — runs in Hyper-V VM, not on this dev box (per decision #135) |
| OPC Foundation reference server | Not yet built | — | `10.100.0.35:62541` (target) | `user1` / `password1` (reference-server defaults) | — | Pending (needed for Phase 5 OPC UA Client driver testing) |
| FOCAS TCP stub | Not yet built | — | `10.100.0.35:8193` (target) | n/a | — | Pending (built in Phase 5; runs on Docker host) |
| Modbus simulator (`otopcua-pymodbus:3.13.0`) | Docker compose at `/opt/otopcua-modbus/` on Docker host | pinned 3.13.0 | `10.100.0.35:5020` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up modbus <profile>` from this VM |
| AB CIP fixture (`otopcua-ab-server:libplctag-release`) | Docker compose at `/opt/otopcua-abcip/` on Docker host | source-pinned `release` tag | `10.100.0.35:44818` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up abcip <profile>` from this VM |
| S7 fixture (`otopcua-python-snap7:1.0`) | Docker compose at `/opt/otopcua-s7/` on Docker host | python-snap7 ≥2.0 | `10.100.0.35:1102` | n/a | n/a | Stack staged; bring up with `lmxopcua-fix up s7 s7_1500` from this VM |
| OPC UA simulator (`mcr.microsoft.com/iotedge/opc-plc:2.14.10`) | Docker compose at `/opt/otopcua-opcuaclient/` on Docker host | pinned 2.14.10 | `10.100.0.35:50000` | anonymous | n/a | Stack staged; bring up with `lmxopcua-fix up opcuaclient` from this VM |
| TwinCAT XAR VM | — | — | TBD via Hyper-V on a separate Windows host (NOT this dev VM) | TwinCAT default route creds | — | Pending — Hyper-V removed from this dev VM; XAR will live on a separate dedicated Windows machine if needed |
### Connection strings for `appsettings.Development.json`
Copy-paste-ready. **Never commit these to the repo** — they go in `appsettings.Development.json` (gitignored per the standard .NET convention) or in user-scoped dotnet secrets.
Copy-paste-ready. The checked-in `appsettings.json` defaults already point at the Docker host (`10.100.0.35,14330`), so `appsettings.Development.json` is only needed for per-developer overrides.
```jsonc
{
"ConfigDatabase": {
"ConnectionString": "Server=localhost,14330;Database=OtOpcUaConfig_Dev;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=true;Encrypt=false;"
"ConnectionString": "Server=10.100.0.35,14330;Database=OtOpcUaConfig_Dev;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=true;Encrypt=false;"
},
"Authentication": {
"Ldap": {
@@ -89,29 +95,26 @@ Copy-paste-ready. **Never commit these to the repo** — they go in `appsettings
}
```
LDAP host stays `localhost` because GLAuth still runs as a native NSSM service on this dev VM (not yet migrated to the Docker host).
For xUnit test fixtures that need a throwaway DB per test run, build connection strings with `Database=OtOpcUaConfig_Test_{timestamp}` to avoid cross-run pollution.
### Container management quick reference
All commands SSH into the Docker host. The standalone Windows `docker.exe` on this VM has no daemon — every operation runs server-side via the helper.
```powershell
# Start / stop the SQL Server container (survives reboots via Docker Desktop auto-start)
docker stop otopcua-mssql
docker start otopcua-mssql
# Status / log / lifecycle from this VM
lmxopcua-fix ls # list lmxopcua-tagged containers + status
lmxopcua-fix logs mssql # SQL Server log tail
ssh dohertj2@10.100.0.35 'docker stop otopcua-mssql; docker start otopcua-mssql'
ssh dohertj2@10.100.0.35 'docker logs otopcua-mssql --tail 50'
# Logs (useful for diagnosing startup failures or login issues)
docker logs otopcua-mssql --tail 50
# sqlcmd inside the container (run on the Docker host)
ssh dohertj2@10.100.0.35 'docker exec otopcua-mssql /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "OtOpcUaDev_2026!" -C -Q "SELECT @@VERSION"'
# Shell into the container (rarely needed; sqlcmd is the usual tool)
docker exec -it otopcua-mssql bash
# Query via sqlcmd inside the container (Git Bash needs MSYS_NO_PATHCONV=1 to avoid path mangling)
MSYS_NO_PATHCONV=1 docker exec otopcua-mssql /opt/mssql-tools18/bin/sqlcmd -S localhost -U sa -P "OtOpcUaDev_2026!" -C -Q "SELECT @@VERSION"
# Nuclear reset: drop the container + volume (destroys all DB data)
docker stop otopcua-mssql
docker rm otopcua-mssql
docker volume rm otopcua-mssql-data
# …then re-run the docker run command from Bootstrap Step 6
# Nuclear reset (destroys dev DB data)
ssh dohertj2@10.100.0.35 'cd /opt/otopcua-mssql && docker compose down -v && docker compose up -d'
```
### Credential rotation
@@ -125,7 +128,7 @@ Dev credentials in this inventory are convenience defaults, not secrets. Change
| Resource | Purpose | Type | Default port | Default credentials | Owner |
|----------|---------|------|--------------|---------------------|-------|
| **.NET 10 SDK** | Build all .NET 10 x64 projects | OS install | n/a | n/a | Developer |
| **.NET Framework 4.8 SDK + targeting pack** | Build `Driver.Galaxy.Host` (Phase 2+) | Windows install | n/a | n/a | Developer |
| **.NET Framework 4.8 SDK + targeting pack** | Optional — build the mxaccessgw worker (sibling repo, x86 net48) | Windows install | n/a | n/a | Developer |
| **Visual Studio 2022 17.8+ or Rider 2024+** | IDE (any C# IDE works; these are the supported configs) | OS install | n/a | n/a | Developer |
| **Git** | Source control | OS install | n/a | n/a | Developer |
| **PowerShell 7.4+** | Compliance scripts (`phase-N-compliance.ps1`) | OS install | n/a | n/a | Developer |
@@ -247,7 +250,7 @@ Order matters because some installs have prerequisites and several need admin el
winget install --id Microsoft.DotNet.SDK.10 --accept-package-agreements --accept-source-agreements
```
2. **Install .NET Framework 4.8 SDK + targeting pack** — only needed when starting Phase 2 (Galaxy.Host); skip for Phase 0-1 if not yet there
2. **Install .NET Framework 4.8 SDK + targeting pack** (optional): only needed when building the mxaccessgw worker (sibling repo, x86 net48). Not required by anything in this repo.
```powershell
winget install --id Microsoft.DotNet.Framework.DeveloperPack_4 --accept-package-agreements --accept-source-agreements
```
@@ -482,7 +485,7 @@ Seeds are idempotent (re-runnable) and gitignored where they contain credentials
| Docker Desktop license terms change for org use | Track Docker pricing; budget approved or fall back to Podman if license becomes blocking |
| Integration host single point of failure | Document the setup so a second host can be provisioned in <2 days; test fixtures pin to a hostname so failover changes one DNS entry |
| GLAuth dev config drifts between developers | Sync script + template (Step 4) keep configs aligned; periodic review |
| Galaxy / MXAccess licensing for non-dev-machine | Galaxy stays on the dev machines that already have Aveva licenses; integration host does NOT run Galaxy (Galaxy.Host integration tests run on the dev box, not the shared host) |
| Galaxy / MXAccess licensing for non-dev-machine | Galaxy stays on the dev machines that already have Aveva licenses; integration host does NOT run Galaxy (the mxaccessgw worker requires the AVEVA stack and runs on the dev box, not the shared host) |
| Long-lived dev env credentials in dev `appsettings.Development.json` | Gitignored; documented as dev-only; production never uses these |
## Decisions to Add to plan.md
@@ -0,0 +1,159 @@
# FOCAS deployment guide
Operational reference for deploying the Fanuc FOCAS driver in production.
## Licence + DLL provisioning
Fanuc's FOCAS2 library is proprietary + closed-source. Two DLL variants exist:
| Variant | Bitness | OtOpcUa usage |
|---|---|---|
| **`Fwlib64.dll`** | x64 | **Default production binary.** Loaded by `Driver.FOCAS.Host` (net10.0 x64 Windows service) and by the `Driver.FOCAS.Cli` when running on an x64 server. |
| `Fwlib32.dll` | x86 | Historical — what the project was originally scaffolded against. Not used by any current binary post the 2026-04-23 Host retarget. Kept in the licence set for legacy deployments that insist on x86-only Hosts. |
Both are **licensed for this project** — this project has a valid Fanuc FOCAS developer-kit licence that grants redistribution for either variant internally.
### The DLLs now ship with the Host (2026-04-23)
As of the vendoring change, the Host csproj copies the licensed FOCAS binaries from [`vendor/fanuc/`](../../vendor/fanuc/README.md) to its build output automatically. So after a `dotnet build` / `dotnet publish`, the layout is:
```
<publish-root>\Driver.FOCAS.Host\
├── OtOpcUa.Driver.FOCAS.Host.exe
├── OtOpcUa.Driver.FOCAS.Host.dll
├── ... runtime deps ...
├── Fwlib64.dll ← master FOCAS runtime (generic x64)
├── fwlib0iD64.dll ← 0i-D series dispatch target
├── fwlib30i64.dll ← 30i / 31i / 32i series dispatch target
├── fwlibe64.dll ← Ethernet transport variant
├── fwlibNCG64.dll ← NC Guide (Fanuc PC simulator) target
└── fwlib0DN64.dll ← 0i-D Numeric-control thin variant
```
No operator step required to "drop Fwlib64.dll on PATH" anymore — the Host loads `Fwlib64.dll` via bare-name and Windows finds it in the exe's own directory first. Shipping the full set of series-specific siblings lets the Host work against any Fanuc CNC the deployment points it at; the master `Fwlib64.dll` dispatches to the right variant based on what the CNC reports during `cnc_allclibhndl3`.
The DLL loads lazily on the first `OpenSessionAsync` call. When somehow missing (deployment artefact surgery), `Fwlib64FocasBackend` returns a structured `Fwlib64DllMissing` error-code rather than crashing; the Proxy maps it to `BadCommunicationError` with a clear operator message.
### Repo confidentiality note
**The FOCAS runtime DLLs in `vendor/fanuc/` are licensed binaries — treat this repo accordingly.** Do not mirror / push / fork to any public forge without first confirming the redistribution is covered by whoever manages the Fanuc relationship. Internal / customer-licensed mirrors are fine. See [`vendor/fanuc/README.md`](../../vendor/fanuc/README.md) for the full provenance + licence context.
## Tier-C architecture recap
The FOCAS driver is **Tier-C** — out-of-process — for **blast-radius isolation**, not bitness. Fanuc's DLL has documented crash modes (network errors, malformed responses, handle-recycle bugs) that could take the main OPC UA server down if loaded in-process. Splitting the P/Invoke into a separate Host process means a Fwlib crash only loses FOCAS tags; every other driver keeps running, and the supervisor restarts the Host.
Galaxy has the same pattern but is **forced** by MXAccess's 32-bit-only COM — there's no x64 path. FOCAS would work in-process on x64 (Fwlib64 is licensed), but the blast-radius argument keeps it Tier-C anyway.
See [`implementation/focas-isolation-plan.md`](implementation/focas-isolation-plan.md) for the full topology.
## Installing the Host service
Use the NSSM wrapper script:
```powershell
.\scripts\install\Install-FocasHost.ps1 `
-InstallRoot 'C:\Program Files\OtOpcUa\Driver.FOCAS.Host' `
-ServiceAccount 'OTOPCUA\svc-otopcua' `
-FocasBackend fwlib64
```
Parameters:
| Parameter | Default | Purpose |
|---|---|---|
| `-InstallRoot` | **required** | Where the Host binaries + `Fwlib64.dll` live |
| `-ServiceAccount` | **required** | Must match the main OtOpcUa server account so the named-pipe ACL allows the Proxy to connect |
| `-FocasBackend` | `fwlib64` | `fwlib64` (production), `fake` (in-memory for Tier-C pipeline smoke without a CNC), `unconfigured` (returns BadDeviceFailure for every call) |
| `-FocasSharedSecret` | auto-gen | Per-process secret passed at service start so it never touches disk |
| `-FocasPipeName` | `OtOpcUaFocas` | Named pipe the Proxy connects to |
| `-ServiceName` | `OtOpcUaFocasHost` | Windows service display name |
`fwlib32` is accepted as a legacy alias but maps to `Fwlib64FocasBackend` internally — the Host is x64 post-2026-04-23, so 32-bit-only deployments would need to rebuild + retarget.
## Configuring a FOCAS driver instance
In the Admin UI's Drivers tab, create a `DriverInstance` with `DriverType = "FOCAS"` and a JSON config of the shape:
```json
{
"Backend": "ipc",
"PipeName": "OtOpcUaFocas",
"SharedSecret": "<matches OTOPCUA_FOCAS_SECRET env var on the Host>",
"Devices": [
{ "Name": "Mill-01", "HostAddress": "focas://192.168.1.50:8193", "Series": "ThirtyOne_i" }
],
"Tags": [
{ "Name": "SpindleLoad", "DeviceName": "Mill-01", "Address": "R100", "DataType": "Int16" },
{ "Name": "CycleRunning", "DeviceName": "Mill-01", "Address": "X0.0", "DataType": "Bit" },
{ "Name": "PartCount", "DeviceName": "Mill-01", "Address": "MACRO:500", "DataType": "Float64" }
],
"Probe": { "Enabled": true, "IntervalMs": 5000, "TimeoutMs": 2000 }
}
```
`Backend` selector (on the Proxy side — not to be confused with `OTOPCUA_FOCAS_BACKEND` on the Host):
| Value | Meaning |
|---|---|
| `ipc` (default) | Route through `Driver.FOCAS.Host` over the named pipe. **Production shape.** |
| `fwlib` | Direct in-process P/Invoke via `FwlibFocasClient`. Only valid on x64 servers that are willing to accept the blast-radius trade-off. |
| `unimplemented` | Throws at construction — used for scaffolding `DriverInstance` rows before the Host is deployed. |
## Smoke testing
**Without a CNC — pipeline only:**
```powershell
$env:OTOPCUA_FOCAS_BACKEND = "fake"
Start-Service OtOpcUaFocasHost
```
The `FakeFocasBackend` stores per-address values in-memory and survives read/write/subscribe exercising. Use `otopcua-focas-cli` (in-process, bypasses the Host) or the OtOpcUa server's own driver registration to exercise the pipeline.
**Version-aware fake** (Stream A of the simulator plan, shipped 2026-04-23) — set `OTOPCUA_FOCAS_SERIES` to simulate a specific Fanuc controller's capability matrix. Addresses outside the series' documented ranges get rejected with `BadOutOfRange` (matching what the real DLL returns as `EW_NUMBER` / `EW_PARAM`):
```powershell
$env:OTOPCUA_FOCAS_BACKEND = "fake"
$env:OTOPCUA_FOCAS_SERIES = "ThirtyOne_i" # or Zero_i_D / Zero_i_F / Sixteen_i / PowerMotion_i / ...
Start-Service OtOpcUaFocasHost
```
**Optional behavioural quirks** — `OTOPCUA_FOCAS_QUIRKS` is a comma-separated list:
| Token | Behaviour |
|---|---|
| `EditMode` | `OpenSessionAsync` refuses sessions with `ErrorCode=EditModeActive`, mimicking a CNC in Edit mode |
| `Emergency` | `ProbeAsync` reports the session as unhealthy with `emergency-stop active` error even after a clean open — exercises the driver's probe-surfaces-non-connectivity path |
| `SlowFirstConnect[=ms]` | First `OpenSessionAsync` blocks for `ms` (default 3000) milliseconds, mimicking the 16i-series slow-first-connect — subsequent opens are fast |
| `CrashAfterCycles=N` | After `N` session opens, the `N+1`-th returns `ErrorCode=Fwlib64Crashed` — mimics the documented Fanuc handle-leak |
Example combining several:
```powershell
$env:OTOPCUA_FOCAS_QUIRKS = "EditMode,CrashAfterCycles=5,SlowFirstConnect=500"
```
Unknown tokens log a warning but don't abort startup.
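The token grammar above can be illustrated with a short sketch. This is Python for illustration only; the shipped parser is the C# `ParseFakeQuirks` helper in the Host, and the `Quirks` class, `parse_quirks` name, case-insensitive matching, and the handling of malformed numbers are all assumptions rather than the Host's confirmed behaviour.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("focas.quirks")


@dataclass
class Quirks:
    """Hypothetical container for the four opt-in quirks."""
    edit_mode: bool = False
    emergency: bool = False
    slow_first_connect_ms: Optional[int] = None
    crash_after_cycles: Optional[int] = None


def parse_quirks(raw: str) -> Quirks:
    """Parse a comma-separated OTOPCUA_FOCAS_QUIRKS-style string."""
    quirks = Quirks()
    for token in (t.strip() for t in raw.split(",")):
        if not token:
            continue
        name, _, value = token.partition("=")
        key = name.strip().lower()  # assumption: tokens match case-insensitively
        try:
            if key == "editmode":
                quirks.edit_mode = True
            elif key == "emergency":
                quirks.emergency = True
            elif key == "slowfirstconnect":
                # The documented default is 3000 ms when "=ms" is omitted.
                quirks.slow_first_connect_ms = int(value) if value else 3000
            elif key == "crashaftercycles" and value:
                quirks.crash_after_cycles = int(value)
            else:
                log.warning("ignoring unknown FOCAS quirk token %r", token)
        except ValueError:
            # Assumption: a malformed number is treated like an unknown token.
            log.warning("ignoring malformed FOCAS quirk token %r", token)
    return quirks
```

Calling `parse_quirks("EditMode,CrashAfterCycles=5,SlowFirstConnect=500")` sets three of the four fields and leaves `emergency` at its default.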
**With a real CNC:**
```powershell
$env:OTOPCUA_FOCAS_BACKEND = "fwlib64"
$env:FOCAS_TRUST_WIRE = "1"
Start-Service OtOpcUaFocasHost
.\scripts\e2e\test-focas.ps1 -CncHost 192.168.1.50 -BridgeNodeId 'ns=2;s=Focas/R100'
```
Requires `Fwlib64.dll` alongside the Host exe, which is where the build places it as of the 2026-04-23 vendoring change; no `PATH` entry is needed.
## Observability
- Host logs: `%ProgramData%\OtOpcUa\focas-host-*.log` (Serilog daily rolling)
- Post-mortem: `%ProgramData%\OtOpcUa\focas-post-mortem.mmf`, a ring buffer of the last ~1000 IPC operations that survives a Host crash so the Proxy-side supervisor can read it during respawn diagnostics
- `DriverHostStatus` rows in the central Config DB under `HostName = <configured device host>`; `State` transitions + Polly resilience counters surface on the Admin `/hosts` page
## Known issues
- **No public simulator** — Fanuc FOCAS has no published emulator. Lab-rig validation (a real FANUC 0i-F / 30i controller or an FDK-licenced dev rig) is the only way to confirm wire-level correctness. Tracked under task #222.
- **32-bit-only deployments unsupported** — post the 2026-04-23 retarget, running the Host as net48 x86 is not a supported mode. If you genuinely need Fwlib32-only, revert the Host csproj + Program.cs changes from that commit.
- **Handle-recycling cadence** — documented Fanuc issue where long-lived FWLIB session handles can leak inside the DLL; the Host periodically cycles them. Currently on a fixed 60-minute cadence; future config knob tracked as a post-release follow-up.
@@ -0,0 +1,315 @@
# FOCAS Docker simulator — implementation plan
> **Status**: **IN PROGRESS** 2026-04-23. **Streams A + B shipped.** Stream C (real Fwlib64 wire compat) + Stream D (e2e + docs) still open — both require a Windows rig with licensed Fwlib64.dll + captured Wireshark traces. Stream B shipped the full architectural scaffold (Docker image, 9 per-series compose profiles, asyncio TCP server, handler dispatch, profile-driven range enforcement, local validation harness) — exercised end-to-end against both `thirtyone_i` and `powermotion_i` profiles.
## Goal
Close the one remaining FOCAS gap (`#222` follow-up — "wire-level live-boot against real hardware") with a hardware-free fixture that:
1. Runs in Docker, matches the per-driver fixture pattern (`docker compose up -d` in the test project).
2. Exposes the FOCAS TCP port (`8193` by default) to the host.
3. Speaks enough of the FOCAS wire protocol that **a Windows test rig running our unmodified `Driver.FOCAS.Host` + licensed `Fwlib64.dll` can open a session and exercise the 9 FWLIB functions the driver actually uses.**
4. Supports **version profiles** — one container per Fanuc series (0i-D, 0i-F, 30i, 31i, 32i, PowerMotion-i) — so driver-side range validation, error-code mapping, and per-series quirks get exercised against a server that actually behaves differently per series.
5. Plugs into the existing e2e infrastructure (`scripts/e2e/test-focas.ps1` loses the `FOCAS_TRUST_WIRE=1` gate when the fixture is up).
## Non-goals
- **Not a full FOCAS emulator.** Fanuc's FOCAS spec is closed; faithfully reproducing every function across every controller model would be a years-long project. We implement the narrow subset the driver uses (see §Protocol surface).
- **Not a CNC behavioural model.** We return plausible values for PMC/param/macro reads; we do NOT simulate axis motion, program execution, or alarm generation. The mock exists to exercise the driver's marshalling + IPC + status-code paths, not to prove the CNC behaves correctly.
- **Not a replacement for a bench CNC.** A physical controller still catches timing-dependent bugs (Fwlib-internal thread-pool exhaustion, handle-recycle pathologies, vendor-firmware quirks) that a mock can't reproduce. Mock covers ~80% of value; real-hardware smoke stays as a final gate.
## Constraint that shapes the design
`Fwlib64.dll` is a proprietary closed-source library that speaks FOCAS to the CNC. **Our driver never touches raw TCP** — it calls `cnc_allclibhndl3` / `pmc_rdpmcrng` / etc. and Fwlib encodes the wire frames internally.
This means the mock has two possible architectures:
| Option | Where the mock lives | Exercises Fwlib? |
|---|---|---|
| **A. IPC-layer fake** (already shipped as `FakeFocasBackend`) | Between `FwlibFrameHandler` and the FWLIB call | ❌ No — bypasses Fwlib entirely |
| **B. TCP wire mock** (this plan) | Listens on port 8193; Fwlib connects to it | ✅ Yes — Fwlib encodes real frames |
Option B is the only one that validates the driver's actual production wire path (driver → Host → `FwlibFocasClient` → `Fwlib64.dll` → TCP → mock).
**Prerequisite reading** the implementer needs before starting Option B:
- `strangesast/fwlib` on GitHub — reverse-engineered FOCAS2 Linux client, has frame-format notes
- `GalvinGao/opcua-server-fanuc` — another OSS FOCAS client with wire-format traces
- `jdegre/focas-python` (if it still exists) — previous Python FOCAS stub, starting point
- Our own `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibNative.cs` — the 9-function surface we need to satisfy
## Protocol surface (what the mock must speak)
From `FwlibNative.cs`, our driver makes exactly 9 FWLIB calls:
| FWLIB function | What it does | Wire complexity |
|---|---|---|
| `cnc_allclibhndl3` | Open Ethernet handle (connect) | **High** — initial handshake, version negotiation, session state |
| `cnc_freelibhndl` | Close handle | Low |
| `pmc_rdpmcrng` | PMC range read (byte/word/long + optional bit) | **Medium** — 40-byte buffer with type-dependent layout |
| `pmc_wrpmcrng` | PMC range write | **Medium** — same buffer shape inverted |
| `cnc_rdparam` | Parameter read (axis-aware) | Medium — 32-byte buffer |
| `cnc_wrparam` | Parameter write | Medium |
| `cnc_rdmacro` | Macro variable read (value + decimal-point count) | Low |
| `cnc_wrmacro` | Macro variable write | Low |
| `cnc_statinfo` | Status info (for probe) | Low — fixed-shape response |
**Coverage target**: all 9 functions return plausible responses for the address ranges declared in each series profile. Out-of-range addresses return `EW_NUMBER` / `EW_PARAM`. Unknown PMC letters return `EW_DATA`. Session state (handle validity, unknown handle detection) is enforced.
## Version profiles
The driver has `FocasCncSeries` + `FocasCapabilityMatrix` already — we mirror that matrix into JSON profiles the mock loads at start:
```
fixture/
├── Dockerfile
├── requirements.txt
├── server/
│ ├── focas_server.py # asyncio TCP server + frame parser
│ ├── handlers/
│ │ ├── allclibhndl3.py
│ │ ├── pmc.py
│ │ ├── param.py
│ │ ├── macro.py
│ │ └── status.py
│ ├── state.py # in-memory "CNC" state
│ └── frames.py # FOCAS frame encode/decode
└── profiles/
├── zero_i_d.json
├── zero_i_f.json
├── zero_i_mf.json
├── zero_i_tf.json
├── sixteen_i.json
├── thirty_i.json
├── thirtyone_i.json
├── thirtytwo_i.json
└── powermotion_i.json
```
Each profile captures:
```json
{
"series": "ThirtyOne_i",
"api_version": "0x30",
"pmc_ranges": {
"X": [0, 127], "Y": [0, 127], "F": [0, 767], "G": [0, 767],
"R": [0, 1499], "D": [0, 2999], "C": [0, 199], "K": [0, 31],
"A": [0, 24], "T": [0, 79], "E": [0, 9999]
},
"param_ranges": [[1000, 9999], [10000, 15999]],
"macro_range": [100, 999],
"extended_macros": false,
"axes": 3,
"quirks": {
"crash_after_handle_cycles": null,
"edit_mode_rejects_connection": false,
"allclibhndl3_blocks_during_alarm": false,
"param_bit_index_max": 7
},
"alarm_default": false,
"emergency_default": false
}
```
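A minimal sketch of how a handler might use such a profile for range enforcement, assuming a trimmed copy of the JSON above. The function names are hypothetical, the error codes are represented as plain strings rather than Fanuc's numeric values, and mapping out-of-range macros and PMC ranges to `EW_NUMBER` specifically (rather than `EW_PARAM`) is an assumption.

```python
import json

# Trimmed, illustrative copy of a series profile.
PROFILE_JSON = """
{
  "series": "ThirtyOne_i",
  "pmc_ranges": {"X": [0, 127], "R": [0, 1499], "D": [0, 2999]},
  "macro_range": [100, 999]
}
"""


def check_pmc(profile: dict, letter: str, start: int, end: int) -> str:
    """Return an EW_* name for a PMC range request against the profile."""
    bounds = profile["pmc_ranges"].get(letter)
    if bounds is None:
        return "EW_DATA"      # PMC letter not present on this series
    lo, hi = bounds
    if start > end or start < lo or end > hi:
        return "EW_NUMBER"    # assumption: range violations map to EW_NUMBER
    return "EW_OK"


def check_macro(profile: dict, number: int) -> str:
    """Return an EW_* name for a macro variable number."""
    lo, hi = profile["macro_range"]
    return "EW_OK" if lo <= number <= hi else "EW_NUMBER"


profile = json.loads(PROFILE_JSON)
```

Keeping the checks as pure functions over the loaded dictionary means the same code path serves every series; only the JSON differs per container.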
**Differences that actually matter** for driver coverage:
| Series | Meaningful difference vs baseline |
|---|---|
| 0i-D / 0i-F / 0i-MF / 0i-TF | PMC range narrower; no E-relay; macro range `100-999` strict |
| 16i | Older Fwlib version; `cnc_allclibhndl3` extra-slow on first connect (artificial delay in mock) |
| 30i | Full PMC range; extended macros (`#10000+`) supported |
| 31i / 32i | 5-axis; larger parameter ranges |
| PowerMotion-i | No PMC `T` timer; motion-only controller quirks |
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Windows test rig (net10.0 x64) │
│ │
│ FocasDriver ──► FwlibFocasClient ──► Fwlib64.dll ──► TCP ──┐ │
│ (real P/Invoke) │ │
└─────────────────────────────────────────────────────────────┼───┘
port 8193 │
┌─────────────────────────────────────────────────────────────────┐
│ Docker container: otopcua-focas-sim-{series} │
│ │
│ Python asyncio TCP server │
│ ├─ frames.py: parse + encode FOCAS frames │
│ ├─ handlers/: one module per FWLIB function │
│ ├─ state.py: per-session handle registry + simulated memory │
│ └─ profiles/{series}.json: range + quirk table loaded at │
│ boot via env var OTOPCUA_FOCAS_ │
│ PROFILE=thirtyone_i │
└─────────────────────────────────────────────────────────────────┘
```
Python choice rationale: the existing OSS FOCAS implementations are Python-first; asyncio's `StreamReader`/`StreamWriter` maps cleanly to FOCAS's length-prefixed frame model; one Dockerfile covers every profile because profile-switching is an env-var.
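To illustrate the asyncio point, here is a sketch of a per-connection loop over length-prefixed frames. It assumes a 4-byte big-endian length prefix that counts the bytes following it; the real FOCAS framing is not confirmed by this plan, and `read_frame`, `write_frame`, `handle_session`, `MAX_FRAME`, and the echo behaviour are placeholders, not the contents of `focas_server.py`.

```python
import asyncio
import struct
from typing import Optional

MAX_FRAME = 64 * 1024  # hypothetical sanity cap on a single frame body


async def read_frame(reader: asyncio.StreamReader) -> Optional[bytes]:
    """Read one frame body, or return None on a clean end of stream."""
    try:
        prefix = await reader.readexactly(4)
    except asyncio.IncompleteReadError:
        return None  # peer closed the connection before a new frame started
    (length,) = struct.unpack(">I", prefix)
    if length > MAX_FRAME:
        raise ValueError(f"frame of {length} bytes exceeds cap")
    # A truncated body raises IncompleteReadError to the caller.
    return await reader.readexactly(length)


async def write_frame(writer: asyncio.StreamWriter, body: bytes) -> None:
    """Write one length-prefixed frame and wait for the buffer to flush."""
    writer.write(struct.pack(">I", len(body)) + body)
    await writer.drain()


async def handle_session(reader: asyncio.StreamReader,
                         writer: asyncio.StreamWriter) -> None:
    """Per-connection loop; echoes each frame where real dispatch would go."""
    try:
        while (body := await read_frame(reader)) is not None:
            await write_frame(writer, body)
    finally:
        writer.close()
        await writer.wait_closed()
```

A coroutine with this signature can be passed directly to `asyncio.start_server(handle_session, host, port)`.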
`docker-compose.yml` exposes one service per profile as a `--profile`:
```yaml
services:
focas-thirtyone:
profiles: ["thirtyone"]
image: otopcua-focas-sim:latest
environment: { OTOPCUA_FOCAS_PROFILE: "thirtyone_i" }
ports: ["8193:8193"]
focas-zerod:
profiles: ["zerod"]
image: otopcua-focas-sim:latest
environment: { OTOPCUA_FOCAS_PROFILE: "zero_i_d" }
ports: ["8193:8193"]
# ... one per supported series ...
```
Users pick a profile with `docker compose --profile thirtyone up -d`. Only one profile runs at a time (port collision on 8193) — matching the other driver fixtures' single-image pattern.
## Delivery plan — four streams
### Stream A — Version-aware fake backend (C#, 2-3 days) — ✅ **SHIPPED 2026-04-23**
**What landed**:
- `FakeFocasBackend` gained a second ctor `(FocasCncSeries series, FakeFocasBackendQuirks? quirks)`; default ctor preserves the pre-Stream-A permissive behaviour.
- `ValidateAddress` delegates to the existing `FocasCapabilityMatrix.Validate` so mock + driver share one source of truth. Out-of-range reads/writes/PMC-bit-writes return `BadOutOfRange` (0x803C0000 — matching what the real driver maps `EW_NUMBER`/`EW_PARAM` to).
- `FakeFocasBackendQuirks` record carries four opt-in quirks: `EditModeRejectsConnection`, `CrashAfterHandleCycles`, `SlowFirstConnectDelay`, `EmergencyAtStartup`.
- `Program.cs` reads `OTOPCUA_FOCAS_SERIES` (case-insensitive FocasCncSeries enum value) + `OTOPCUA_FOCAS_QUIRKS` (comma-separated token list: `EditMode`, `Emergency`, `SlowFirstConnect[=ms]`, `CrashAfterCycles=N`). Unknown tokens log-and-ignore. Values surface in Host log at startup.
- 19 new tests in `FakeFocasBackendSeriesTests.cs` covering: Unknown-permissive baseline, Zero_i_D macro rejection, ThirtyOne_i extended-macro acceptance, PowerMotion_i T-timer rejection, Write+PmcBitWrite parallel rejection, all four quirks, + 8 theory cases for the env-var parser.
**Deliverable shipped**:
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/Backend/FakeFocasBackend.cs` — extended
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host/Program.cs`: `BuildFakeBackend` local fn + `ParseFakeQuirks` helper
- `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.Host.Tests/FakeFocasBackendSeriesTests.cs` — new, 19 tests
- 38/38 Host tests green post-Stream-A.
### Stream B — Python FOCAS TCP server (scaffold) — ✅ **SHIPPED 2026-04-23**
**What landed** under `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/`:
- `Dockerfile` — Python 3.12-slim image; stdlib-only, no external deps
- `docker-compose.yml` — 9 `--profile` entries, one per Fanuc series (`thirtyone`, `thirtytwo`, `thirty`, `sixteen`, `zerod`, `zerof`, `zeromf`, `zerotf`, `powermotion`). All share one image + one port (8193).
- `server/focas_server.py` — asyncio entry point, per-connection session loop, graceful-shutdown signal handling
- `server/frames.py` — length-prefixed frame codec (scaffold — see Stream C note below)
- `server/state.py` — per-session handle registry + in-memory PMC/param/macro dictionaries
- `server/profile.py` — JSON profile loader
- `server/handlers/` — one module per FWLIB function (9 total): open/close, PMC read/write, param read/write, macro read/write, statinfo. Profile-driven range validation; error responses use a `FLAG_ERROR` bit on the response header.
- `profiles/*.json` — 9 series profiles mirroring `FocasCapabilityMatrix`. Quirks (`slow_first_connect_ms`, `alarm_default`, `emergency_default`, `crash_after_handle_cycles`, `edit_mode_rejects_connection`) declared per profile.
- `validate_harness.py` — scaffold-protocol TCP client that opens a session, round-trips a macro, triggers range-rejection, asserts the expected error reasons surface.
- `README.md` — operator-facing usage + Stream C next-steps checklist.
**Exit criterion met**: validated end-to-end against two profiles (`thirtyone_i`, `powermotion_i`) via the local harness. Session handshake → statinfo → macro round-trip → out-of-range rejection → PMC round-trip → bad-letter rejection → clean close — all PASS. Profile-switching confirmed working: 31i API 0x0030 → PowerMotion 0x0040, macro range [0,99999]→[0,999], letter set {A,C,D,E,F,G,K,M,R,T,X,Y}→{D,R,X,Y}.
**⚠️ The wire *framing* is a scaffold — NOT Fwlib64-compatible yet.** `server/frames.py` uses a plausible length-prefixed framing (big-endian header: uint32 length, uint16 function_id, uint16 flags) that satisfies the harness but has never been validated against the real Fanuc DLL. Stream C is the iterative refinement cycle where a Windows rig drives that convergence.
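The scaffold header described above can be sketched with Python's `struct` module. This is not the contents of `server/frames.py`: whether `length` counts the header fields or only the payload is not stated, so the sketch assumes it counts everything after the length field, and the `FLAG_ERROR` bit position is hypothetical.

```python
import struct

FLAG_ERROR = 0x0001  # hypothetical bit position for the error flag


def encode_frame(function_id: int, payload: bytes, flags: int = 0) -> bytes:
    """Build a frame: uint32 length, uint16 function_id, uint16 flags, payload.

    Assumption: length counts the bytes after the length field itself.
    """
    body = struct.pack(">HH", function_id, flags) + payload
    return struct.pack(">I", len(body)) + body


def decode_frame(frame: bytes):
    """Split a complete frame into (function_id, flags, payload)."""
    (length,) = struct.unpack_from(">I", frame, 0)
    body = frame[4:4 + length]
    if length < 4 or len(body) != length:
        raise ValueError("truncated or malformed frame")
    function_id, flags = struct.unpack_from(">HH", body, 0)
    return function_id, flags, body[4:]
```

Since Stream C is expected to replace this framing once traces arrive, keeping the codec in two small functions like these limits how much has to change.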
**The response payload shapes inside those frames ARE authoritative** (refined 2026-04-23 after `fwlib32.h` review):
- `ODBM` (macro read) = 10 bytes: `short datano, short dummy, int32 mcr_val, short dec_val`
- `ODBST` (statinfo) = 18 bytes: 9 × `short` (dummy/tmmode/aut/run/motion/mstb/emergency/alarm/edit)
- `IODBPSD` (param read) = 36 bytes: `short datano, short type, bytes[32]` (union = 8 axes × 4 bytes)
- `IODBPMC` (PMC range read) = 48 bytes: `short type_a, short type_d, uint16 datano_s, uint16 datano_e, bytes[40]`
Validate harness asserts exact byte sizes + header field round-trip. When Stream C's Wireshark traces arrive, the payload layer should already match — only framing needs iteration.
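The byte sizes listed above can be checked mechanically. The sketch below expresses the four layouts as unpadded `struct` formats; little-endian byte order is an assumption (the list gives field order and sizes only), and the helper names plus the reading of `dec_val` as a count of decimal places are illustrative.

```python
import struct

# "<" selects little-endian with no padding; byte order is an assumption.
ODBM = struct.Struct("<hhih")        # datano, dummy, mcr_val, dec_val
ODBST = struct.Struct("<9h")         # dummy, tmmode, aut, run, motion, mstb, emergency, alarm, edit
IODBPSD = struct.Struct("<hh32s")    # datano, type, 32-byte union (8 axes x 4 bytes)
IODBPMC = struct.Struct("<hhHH40s")  # type_a, type_d, datano_s, datano_e, 40-byte data


def pack_macro(datano: int, mcr_val: int, dec_val: int) -> bytes:
    """Encode a macro read response body (dummy field set to zero)."""
    return ODBM.pack(datano, 0, mcr_val, dec_val)


def unpack_macro(buf: bytes):
    """Decode a macro body into (datano, value), scaling by decimal places."""
    datano, _dummy, mcr_val, dec_val = ODBM.unpack(buf)
    return datano, mcr_val / (10 ** dec_val)
```

Each `Struct.size` here matches the corresponding figure in the list (10, 18, 36, and 48 bytes).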
See [`focas-wire-protocol.md`](focas-wire-protocol.md) for the authoritative-vs-guessed breakdown.
**C# integration test scaffold** also shipped (`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`) — `FocasSimFixture` probes port 8193 + skips when the container's down; three smoke tests pass against a running container (TCP reachability, clean connect-close, profile parsing). A `Series/WireCompatGatedTests.cs` skeleton gates Fwlib64-dependent tests behind `OTOPCUA_FOCAS_SIM_WIRE_COMPAT=1`, ready for Stream C activation.
### Stream C — FWLIB compat + version profiles (2-3 weeks) — **blocked on Windows rig + Wireshark traces**
See `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/Docker/README.md` §"Stream C — what's required to reach wire compatibility" for the concrete implementer checklist.
**Goal**: real Fwlib64.dll running on a Windows test rig can open a session against the mock and round-trip the 9 FWLIB calls our driver makes.
Sub-tasks:
1. **Handshake** (`handlers/allclibhndl3.py`) — the hardest piece. FOCAS session open negotiates protocol version + controller type. Incorrect negotiation → Fwlib disconnects. Start from `strangesast/fwlib`'s handshake trace.
2. **PMC read/write** (`handlers/pmc.py`) — 40-byte buffer with type-dependent layout. Must match `FwlibNative.IODBPMC` struct layout exactly. Implement per-profile range checks.
3. **Parameter read/write** (`handlers/param.py`) — 32-byte axis-aware buffer. Similar to PMC but simpler (no sub-address bit indexing beyond `param_bit_index_max`).
4. **Macro read/write** (`handlers/macro.py`) — straightforward; value + decimal-point count as `ODBM`.
5. **Status info** (`handlers/status.py`) — fixed `ODBST` shape; profile declares defaults for `Aut` / `Run` / `Motion` / `Alarm`.
6. **State management** (`server/state.py`) — per-session handle registry, in-memory PMC/param/macro dictionaries, persistent across one session, reset on session close.
7. **Profile loader** — reads `OTOPCUA_FOCAS_PROFILE` env var, loads matching JSON, injects into handlers.
8. **Windows validation rig** — one-time setup: a Windows VM (or dev box) with licensed `Fwlib64.dll` + a tiny test driver that calls the 9 FWLIB functions + asserts round-trip. This is the first live-wire validation the plan asks for.
9. **Per-series test matrix** (`tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/`): new project, one test class per series, each class's `[Fact]` runs against that profile's container.
**Exit criterion**: live Fwlib64.dll on a Windows rig opens a session, reads + writes across all 9 FWLIB functions, against each of the 9 profiles. Integration test suite green.
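Sub-tasks 6 and 7 (state management and the profile loader) can be sketched as follows. This is an illustration under stated assumptions, not the contents of `server/state.py`: the `<profiles_dir>/<name>.json` file naming and the class and function names are hypothetical, while the `OTOPCUA_FOCAS_PROFILE` variable and the reset-on-close rule come from the list above.

```python
import json
import os
from dataclasses import dataclass, field
from pathlib import Path


def load_profile(profiles_dir, env=None):
    """Load the JSON profile named by OTOPCUA_FOCAS_PROFILE.

    Assumption: profiles are stored as <profiles_dir>/<name>.json.
    """
    env = os.environ if env is None else env
    name = env.get("OTOPCUA_FOCAS_PROFILE")
    if not name:
        raise RuntimeError("OTOPCUA_FOCAS_PROFILE is not set")
    with (Path(profiles_dir) / f"{name}.json").open(encoding="utf-8") as fh:
        return json.load(fh)


@dataclass
class SessionState:
    """In-memory stores that live for exactly one session."""
    pmc: dict = field(default_factory=dict)
    params: dict = field(default_factory=dict)
    macros: dict = field(default_factory=dict)


class SessionRegistry:
    """Maps session handles to their state; closing a handle discards it."""

    def __init__(self):
        self._sessions = {}
        self._next_handle = 1

    def open(self) -> int:
        """Allocate a fresh handle with empty PMC/param/macro stores."""
        handle = self._next_handle
        self._next_handle += 1
        self._sessions[handle] = SessionState()
        return handle

    def get(self, handle: int) -> SessionState:
        """Return the state for an open handle (KeyError if closed or unknown)."""
        return self._sessions[handle]

    def close(self, handle: int) -> None:
        """Drop the session so nothing written in it survives."""
        self._sessions.pop(handle, None)
```

Keeping all mutable data behind the handle means "reset on session close" needs no explicit cleanup code: dropping the entry is the reset.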
### Stream D — e2e integration + doc close-out (1-2 days)
- Update `scripts/e2e/test-focas.ps1` to accept `-ProfileName` and skip `FOCAS_TRUST_WIRE` gate when the matching container is up.
- Add the FOCAS simulator to `docs/v2/test-data-sources.md` + `docs/drivers/FOCAS-Test-Fixture.md` (flip the "hardware-gated" caveat to "fixture or hardware").
- Update `exit-gate-phase-3.md` — final FOCAS deferral closes.
## Test integration
The new project `tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/` mirrors `Driver.OpcUaClient.IntegrationTests`:
```
tests/ZB.MOM.WW.OtOpcUa.Driver.FOCAS.IntegrationTests/
├── Docker/
│ ├── docker-compose.yml # references the 9 series profiles
│ ├── Dockerfile # Python image
│ ├── requirements.txt
│ ├── server/
│ └── profiles/
├── FocasSimFixture.cs # probes 8193 at collection init, skips if down
├── FocasSimSeriesProfile.cs # test-side mirror of the JSON profile
└── Series/
├── ThirtyOneITests.cs
├── ZeroIDTests.cs
└── ... one file per series ...
```
The existing `FocasDocker`-less skip pattern applies: if the container isn't running, tests skip with a clear message pointing at `docker compose up -d`. Matches Modbus / S7 / OpcUaClient.
## Risks + mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| FOCAS wire protocol is more complex than the OSS traces suggest → Stream C slips weeks | **Medium** | High | Stream A delivers 70% value with zero protocol risk. If Stream C stalls, ship A + schedule C as a follow-up. |
| Fwlib64.dll version differs from what `strangesast/fwlib` reverse-engineered → handshake fails | Medium | High | Capture Wireshark trace of a real CNC session against our actual licensed Fwlib64 version before coding. One-time investment, catches drift early. |
| Profile differences that matter at the wire level aren't captured in `FocasCapabilityMatrix` | Medium | Medium | Stream C exit criterion includes validating each profile against live Fwlib — any mismatch is a profile-table bug we fix then. |
| Docker container startup time breaks PR-CI budget | Low | Low | Each profile is one Python container + profile JSON — sub-5s cold start. Matches opc-plc. |
| Windows validation rig availability blocks Stream C | Medium | High | Use the existing TCBSD-class approach: a dedicated ESXi VM with Windows + licensed Fwlib64.dll, provisioned once, shared by the team. Cost ~1 dev-day to set up; unblocks all future FOCAS work forever. |
| Fanuc licence audit surfaces our mock as an "unlicensed FOCAS implementation" | **Low** | **High** | The mock doesn't ship the Fanuc DLL or reproduce any of Fanuc's code. Reverse-engineered wire formats from OSS research are fair use; the mock is our code. Consult legal before open-sourcing, not before internal use. |
## Timeline estimate
Assuming one dev full-time:
| Stream | Duration | Dependencies |
|---|---|---|
| A — Version-aware fake backend | 2-3 days | none |
| B — TCP server scaffold | 1 week | Windows rig not required yet |
| C — FWLIB compat + profiles | 2-3 weeks | Windows rig with Fwlib64 + Wireshark trace |
| D — e2e + docs | 1-2 days | C done |
**Total**: ~4-5 weeks to full coverage. Ship A immediately (independent value), start C in parallel with Windows-rig setup.
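The total can be cross-checked by summing the per-stream ranges. The sketch below assumes a 5-day working week, reads "1 week" for Stream B as 5 days, and runs the streams strictly in series; those are assumptions made for the arithmetic, not statements from the plan.

```python
WORKDAYS_PER_WEEK = 5  # assumption

# (low, high) estimates in working days, taken from the table above.
streams = {
    "A": (2, 3),    # 2-3 days
    "B": (5, 5),    # 1 week
    "C": (10, 15),  # 2-3 weeks
    "D": (1, 2),    # 1-2 days
}

low_days = sum(low for low, _ in streams.values())
high_days = sum(high for _, high in streams.values())
low_weeks = low_days / WORKDAYS_PER_WEEK
high_weeks = high_days / WORKDAYS_PER_WEEK

print(f"{low_days}-{high_days} working days, "
      f"about {low_weeks:.1f}-{high_weeks:.1f} weeks in series")
```

The serial sum comes to 18-25 working days, roughly 3.6 to 5 weeks, which is consistent with the ~4-5 week figure.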
## Exit criteria (what closes #222)
- [ ] All 9 series profiles containerized + pass startup health check
- [ ] Live Fwlib64.dll round-trips all 9 FWLIB calls against every profile (Stream C validation rig)
- [ ] Per-series integration test suite green in CI
- [ ] `test-focas.ps1` runs end-to-end against the simulator without `FOCAS_TRUST_WIRE=1`
- [ ] Docs updated: `FOCAS-Test-Fixture.md` flipped from "hardware-only" to "fixture or hardware"
- [ ] One live-CNC smoke still runs during v2 release readiness, as a belt-and-braces final check
## Open questions
1. **Licence clarity**: is reverse-engineered FOCAS2 wire-format documentation (from `strangesast/fwlib` etc.) compatible with our Fanuc FOCAS developer-kit licence? Legal check required before starting Stream C.
2. **Windows rig**: do we dedicate an existing VM (like the TCBSD box) or provision a new one? Cost difference is small; decision affects who owns maintenance.
3. **Profile source of truth**: if `FocasCapabilityMatrix.cs` and `profiles/*.json` ever disagree, which wins? Proposal: profiles win (wire behavior is authoritative), driver's matrix is regenerated from profiles as a build step.
4. **Alarm events**: the driver doesn't currently use `cnc_rdalmmsg2` / alarm subscription, so the mock doesn't need to simulate alarms beyond the `statinfo.Alarm` flag. If we add `IAlarmSource` to FOCAS later, Stream C expands.
## References
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FwlibNative.cs` — 9-function P/Invoke surface the mock must satisfy
- `src/ZB.MOM.WW.OtOpcUa.Driver.FOCAS/FocasCapabilityMatrix.cs` — per-series range tables (profile seed data)
- `docs/v2/focas-version-matrix.md` — human-readable version matrix the profiles mirror
- `docs/drivers/FOCAS-Test-Fixture.md` — current test-fixture doc (flips post-Stream-D)
- `tests/ZB.MOM.WW.OtOpcUa.Driver.OpcUaClient.IntegrationTests/Docker/` — pattern this plan mirrors for the Docker compose + fixture-skip shape
- `strangesast/fwlib` (GitHub, OSS) — primary FOCAS wire-format reverse-engineering reference
+87
@@ -0,0 +1,87 @@
# Follow-ups from `auto/driver-gaps` queue (PRs #225–#316, 92 merged)
Captured 2026-04-26 after the plan-execution queue drained. Organised by category.
## Wrapper / library-version-blocked (waiting on upstream)
| Driver | PR | Blocker | Resolution path |
|---|---|---|---|
| AbCip | abcip-3.1 | libplctag.NET 1.5.2 doesn't expose `connection_size` | Reflection fallback ships; remove when wrapper publishes the property |
| AbCip | abcip-3.2 | libplctag.NET 1.5.x has no public instance-ID knob | Wire stays Symbolic regardless of mode; flip when wrapper exposes it |
| AbCip | abcip-3.3 | libplctag.NET wire-level multi-service-packet bundling not exposed | Planner ships correct; runtime currently issues N reads. Switch when wrapper bundles |
| S7 | s7-e1 | S7netplus 0.20 has no public `ReadSzlAsync` (request builder is internal) | Parser tested + cached; `BadNotSupported` until S7netplus exposes it or we add raw S7comm SZL-PDU helper |
| S7 | s7-e2 | S7netplus 0.20 doesn't expose `SendPassword` | `IS7PlcAuthGate` reflection probe; logs warning, no exception. Flip when library exposes it |
| TwinCAT | twincat-2.2 | Bulk Sum path stays on symbolic | Phase-2 perf sweep follow-up to switch bulk to handle-based |
| TwinCAT | twincat-5.1 | Beckhoff doesn't ship a managed `TcEventLogger` wrapper | Gate seam ships; production `AdsTwinCATAlarmGate` binary decoder against `ADSIGRP_TCEVENTLOG_ALARMS` is the next chunk of work |
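
Several rows above rely on the same idea: probe at runtime for a member the library may not expose yet, use it when present, and otherwise log a warning instead of throwing. The drivers do this with C# reflection; the sketch below shows the same shape in Python, with hypothetical class and function names that do not correspond to any real libplctag or S7netplus API.

```python
import logging

log = logging.getLogger("driver.capability")


def try_set_attribute(target, name: str, value) -> bool:
    """Set target.<name> only if the wrapper exposes it; never raise if absent."""
    if not hasattr(target, name):
        log.warning("%s does not expose %r; keeping library default",
                    type(target).__name__, name)
        return False
    setattr(target, name, value)
    return True


class OldWrapperTag:
    """Hypothetical stand-in for a wrapper version without the knob."""


class NewWrapperTag:
    """Hypothetical stand-in for a wrapper version that exposes the knob."""
    connection_size = None


if __name__ == "__main__":
    for tag in (OldWrapperTag(), NewWrapperTag()):
        print(type(tag).__name__, try_set_attribute(tag, "connection_size", 4002))
```

Once the upstream library exposes the member, the probe collapses to a direct assignment.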
## Fixture / simulator gaps
### focas-mock simulator doesn't exist
- Blocks integration tests for: f3a (alarm history ring-buffer + `mock_patch_alarmhistory`), f4b (`mock_set_unlock_state`, `mock_get_last_write`), f4c (`pmc_wrpmcrng` handler), f4d (`cnc_wrunlockparam` + `mock_set_password`), f5a (`mock_simulate_cycle_completion`).
- No FOCAS IntegrationTests project exists yet — it needs to be created when the mock lands.
### opc-plc fixture upgrades
- **opcuaclient-10**: `TriggerModelChangeAsync` is a stub. Live HTTP-driven model-change verification deferred. Tests use an inject seam.
- **opcuaclient-11**: `opc-plc-rc` Docker fixture session-open assertion (gated `OPCUACLIENT_TOPOLOGY_TRIGGER_CMD` / `OPCUA_RC_SIM`).
- **opcuaclient-12**: opc-plc `--alm` fixture run for HistoryRead Events (waiting for fixture image upgrade).
- **opcuaclient-13**: opc-plc historian-sim wire-level sweep for the 25 new aggregates (only ~5 likely honoured today).
- **opcuaclient-14**: Two-container failover smoke against opc-plc + opc-plc-secondary on the live fixture.
### AbCip HSBY paired-fixture
- **abcip-5.1/5.2**: `hsby-mux` Python sidecar is a stub; the patched `ab_server` image and live role-flip integration test are gated until that stabilises.
### AbLegacy auto-demote fixture
- **ablegacy-12**: `slc500-faulty` is a commented compose placeholder; tests use the `127.0.0.1:1` ECONNREFUSED trick. Real refusing-proxy fixture is follow-up.
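
The `127.0.0.1:1` trick works because nothing normally listens on TCP port 1, so the operating system rejects the connection immediately. A minimal sketch of that check, with a hypothetical function name and assuming port 1 really is closed on the test host:

```python
import socket


def connect_is_refused(host: str = "127.0.0.1", port: int = 1,
                       timeout: float = 1.0) -> bool:
    """Return True only when the peer actively refuses the TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False  # something is listening after all
    except ConnectionRefusedError:
        return True
    except OSError:
        # Timeouts and unreachable hosts are failures, but not clean refusals.
        return False


if __name__ == "__main__":
    print("refused" if connect_is_refused() else "not refused")
```

A refusing-proxy fixture would go further than this, for example by accepting the connection and then failing at the protocol level, which a closed port cannot simulate.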
### TCBSD TwinCAT project
- twincat-2.1, 3.1, 3.2, 4.1, 5.1 added new fixture stub files that need to be imported into the actual TwinCAT XAE project before `[TwinCATFact]` integration tests can exercise them:
- `PLC/GVLs/GVL_Perf.TcGVL` + `PLC/POUs/FB_PerfChurn.TcPOU` (twincat-2.1)
- `PLC/DUTs/ST_NestedFlags.TcDUT`, `ST_RecursiveCap.TcDUT`, `ST_AlarmRecord.TcDUT` (twincat-4.1)
- `PLC/GVLs/GVL_Plant.TcGVL` extensions (twincat-4.1)
- `PLC/GVLs/GVL_Alarms.TcGVL` + `PLC/POUs/FB_AlarmHarness.TcPOU` (twincat-5.1)
### Snap7 round-trip tests
- s7-d1 (TIA CSV), s7-d2 (UDT fan-out), s7-d3 (instance-DB), s7-c1 (negotiated PDU), s7-c3 (scan groups), s7-c4 (deadband) integration tests are build-only until run against the live Snap7 fixture.
## Live-firmware / hardware verification
- **s7-c2** — hardened S7-1500 with non-PG TSAP modes (gated `--with-real-plc`).
- **s7-c5** — hardened PLC with PUT/GET disabled (currently only Snap7 happy-path tested).
- **s7-f** — manual checklist: toggle Optimized block access in TIA + Track 3 OPC UA bridge verification.
- **ablegacy-13** — DH+ via real 1756-DHRIO + PLC-5. No Docker fixture possible.
- **twincat-2.1 perf-tier**: `Driver_sum_read_1000_tags_beats_loop_baseline_by_5x` gated `TWINCAT_PERF=1`.
- **twincat-2.3** — symbol-version online-change drill (`TWINCAT_MANUAL_ONLINE_CHANGE=1`).
- **focas-f4b/c/d** — live CNC parameter / macro / PMC writes + password-protected CNC.
## Cross-driver / ecosystem
- **opcuaclient-12** — Galaxy A&E projection currently keeps the fixed-field `ReadEventsAsync(sourceName, ...)` overload; richer SelectClause-aware projection on the Galaxy A&E log is best-effort future work.
- **per-driver plan files don't exist** — opcuaclient-12 cross-driver `IHistoryProvider` heads-up went into doc-comments instead. If anyone adds per-driver plan files later, the heads-up note belongs in each.
## Pre-existing red-build issues (NOT touched, will block solution-level CI)
- **NU1902 OpenTelemetry warning-as-error in Admin** — predates the queue.
- **`Server/Phase7/DriverSubscriptionBridge.cs` cref ambiguity** — predates the queue.
Both must be fixed before solution-level CI can pass on the merged-up `task-galaxy-e2e`.
## Integration-branch merge
- **`auto/driver-gaps` has 92 stacked PRs** vs `task-galaxy-e2e`. Final merge needs a careful single review — likely staged or one big PR — and will collide with whatever has landed on `task-galaxy-e2e` in parallel.
## Plan-vs-reality deltas (informational; nothing to chase)
- **focas-f4a/b/c/d** — Plan referenced doc lines that had already been removed in prior evolution (FOCAS.md "intentionally returns BadNotWritable" callout; FOCAS-Test-Fixture.md alarms-not-covered caveat).
- **opcuaclient-12** — Repo has no per-driver plan files for abcip / ablegacy / s7 / twincat — heads-up went into IHistoryProvider doc-comments instead.
- **twincat-4.1**: `docs/v3/twincat-backlog.md` doesn't exist; UDT-gap-removal item N/A.
## Highest-leverage cleanup once upstream catches up
When the upstream library bumps, these reflection / `BadNotSupported` paths simplify to direct calls:
- **abcip-3.1**: remove reflection fallback in `LibplctagTagRuntime.TrySetIntAttribute(connection_size, ...)`.
- **abcip-3.2**: remove `LibplctagTagRuntime.TrySetLogicalAddressing` reflection.
- **abcip-3.3**: switch `MultiPacket` runtime from N-reads to true wire-bundle.
- **s7-e1**: replace `S7NetSzlReader.ReadAsync` returning null with real `Plc.ReadSzlAsync`.
- **s7-e2**: replace `ReflectionS7PlcAuthGate` warning path with direct `Plc.SendPasswordAsync` call.
- **twincat-5.1**: ship `AdsTwinCATAlarmGate` binary decoder.
+3 -1
@@ -913,7 +913,9 @@ after 6.4 (uses its data). 6.W last.
- `Server/Configuration/DriverFactoryRegistry.cs` — remove the
`legacy-host` switch arm.
**Depends on:** PR 7.1 fully soaked (no rollback risk).
**Depends on:** parity matrix in `docs/v2/Galaxy.ParityMatrix.md` is
fully green or carries documented accepted-deltas (verified
2026-04-30 on the dev rig: 14 passed / 1 skipped / 0 failed).
#### PR 7.3 — Doc + memory housekeeping
File diff suppressed because it is too large
+3 -3
@@ -73,13 +73,13 @@ Assert-TextFound "ScriptedAlarmSource implements IAlarmSource" "class ScriptedAl
Assert-TextFound "IAlarmStateStore abstraction + in-memory default" "class InMemoryAlarmStateStore" @("src/ZB.MOM.WW.OtOpcUa.Core.ScriptedAlarms/IAlarmStateStore.cs")
Write-Host ""
Write-Host "Stream D - Core.AlarmHistorian (SQLite store-and-forward + Galaxy.Host IPC contracts)"
Write-Host "Stream D - Core.AlarmHistorian (SQLite store-and-forward; alarm-event sidecar IPC moved to Driver.Historian.Wonderware.Client in PR 3.4)"
Assert-FileExists "Core.AlarmHistorian project" "src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj"
Assert-TextFound "SqliteStoreAndForwardSink backoff ladder (1s..60s cap)" "BackoffLadder" @("src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs")
Assert-TextFound "Default 1M row capacity + 30-day dead-letter retention (plan decision #21)" "DefaultDeadLetterRetention" @("src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/SqliteStoreAndForwardSink.cs")
Assert-TextFound "Per-event outcomes (Ack/RetryPlease/PermanentFail)" "HistorianWriteOutcome" @("src/ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian/IAlarmHistorianSink.cs")
Assert-TextFound "Galaxy.Host IPC contract HistorianAlarmEventRequest" "class HistorianAlarmEventRequest" @("src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/Contracts/HistorianAlarms.cs")
Assert-TextFound "Historian connectivity status notification" "HistorianConnectivityStatusNotification" @("src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared/Contracts/HistorianAlarms.cs")
# Galaxy.Shared pipe-IPC contracts retired in PR 7.2 alongside the rest of the legacy
# Galaxy projects. Wonderware sidecar contracts live in Driver.Historian.Wonderware.Client.
Write-Host ""
Write-Host "Stream E - Config DB schema"
+4 -2
@@ -63,7 +63,9 @@ live driver. The factory-wiring block that originally gated stages
Live-boot verification:
- **Galaxy** — 7/7 stages (read / write / subscribe / alarms / history)
against a real Galaxy + `OtOpcUaGalaxyHost` on this dev box.
against a real Galaxy via the in-process `GalaxyDriver` →
`mxaccessgw` (gRPC). PR 7.2 retired the legacy `OtOpcUaGalaxyHost`
out-of-process driver path.
- **AB CIP, S7** — 5/5 stages each under task #220 against the
`ab_server` + `python-snap7` fixtures.
- **AB Legacy** — 5/5 stages under task #222 against `ab_server` SLC500
@@ -155,7 +157,7 @@ section to skip it.
| Modbus | — | **PASS** (pymodbus fixture) |
| AB CIP | — | **PASS** (ab_server fixture) |
| AB Legacy | — | **PASS** (ab_server SLC500/MicroLogix/PLC-5 profiles; `/1,0` cip-path required for the Docker fixture) |
| Galaxy | — | **PASS** (requires OtOpcUaGalaxyHost + a live Galaxy; 7 stages including alarms + history) |
| Galaxy | — | **PASS** (requires mxaccessgw running + a live Galaxy; 7 stages including alarms + history; PR 7.2 retired the legacy OtOpcUaGalaxyHost path) |
| S7 | — | **PASS** (python-snap7 fixture) |
| FOCAS | `FOCAS_TRUST_WIRE=1` | **SKIP** (no public simulator — task #222 lab rig) |
| TwinCAT | `TWINCAT_TRUST_WIRE=1` | **SKIP** by default; features **validated** against the TCBSD VM fixture — set the env var to run |
+6 -6
@@ -3,14 +3,14 @@
"modbus": {
"$comment": "Port 5020 matches tests/.../Modbus.IntegrationTests/Docker/docker-compose.yml — `docker compose --profile standard up -d`.",
"endpoint": "127.0.0.1:5020",
"endpoint": "10.100.0.35:5020",
"bridgeNodeId": "ns=2;s=Modbus/HR200",
"opcUaUrl": "opc.tcp://localhost:4840"
},
"abcip": {
"$comment": "ab_server listens on port 44818 (default CIP/EIP). `docker compose --profile controllogix up -d`.",
"gateway": "ab://127.0.0.1:44818/1,0",
"gateway": "ab://10.100.0.35:44818/1,0",
"family": "ControlLogix",
"tagPath": "TestDINT",
"bridgeNodeId": "ns=2;s=AbCip/TestDINT"
@@ -18,7 +18,7 @@
"ablegacy": {
"$comment": "Works against ab_server --profile slc500 (Docker fixture) or real SLC/MicroLogix/PLC-5 hardware. `/1,0` cip-path is required for the Docker fixture; real hardware accepts an empty path — e.g. `ab://10.0.1.50:44818/`.",
"gateway": "ab://127.0.0.1/1,0",
"gateway": "ab://10.100.0.35/1,0",
"plcType": "Slc500",
"address": "N7:5",
"bridgeNodeId": "ns=2;s=AbLegacy/N7_5"
@@ -26,7 +26,7 @@
"s7": {
"$comment": "Port 1102 matches tests/.../S7.IntegrationTests/Docker/docker-compose.yml (python-snap7 needs non-priv port). `docker compose --profile s7_1500 up -d`. Real S7 PLCs listen on 102.",
"endpoint": "127.0.0.1:1102",
"endpoint": "10.100.0.35:1102",
"cpu": "S71500",
"slot": 0,
"address": "DB1.DBW0",
@@ -50,7 +50,7 @@
},
"galaxy": {
"$comment": "Galaxy (MXAccess) driver. Has no per-driver CLI — all stages go through otopcua-cli against the published NodeIds. Seven stages: probe / source read / virtual-tag bridge / subscribe-sees-change / reverse write / alarm fires / history read. Requires OtOpcUaGalaxyHost running + seed-phase-7-smoke.sql applied with a real Galaxy attribute substituted into dbo.Tag.TagConfig.",
"$comment": "Galaxy (MXAccess) driver. Has no per-driver CLI — all stages go through otopcua-cli against the published NodeIds. Seven stages: probe / source read / virtual-tag bridge / subscribe-sees-change / reverse write / alarm fires / history read. The driver is now the in-process GalaxyDriver (DriverType = 'GalaxyMxGateway') talking gRPC to a separately-installed mxaccessgw at http://localhost:5120 by default — override via the DriverInstance row's DriverConfig. PR 7.2 retired the legacy 'Galaxy' DriverType + OtOpcUaGalaxyHost service.",
"sourceNodeId": "ns=2;s=p7-smoke-tag-source",
"virtualNodeId": "ns=2;s=p7-smoke-vt-derived",
"alarmNodeId": "ns=2;s=p7-smoke-al-overtemp",
@@ -62,7 +62,7 @@
"opcuaclient": {
"$comment": "OPC UA Client (gateway) driver. Default opc-plc Docker fixture exposes ns=3;s=FastUInt1 as a ticker. The `bridgeNodeId` is the local mirror of remoteNodeId after the OpcUaClient driver's DiscoverAsync runs — dev-specific. Stages 5/7/8 are opt-in: supply writable* NodeIds to enable reverse-bridge, alarmNodeId to enable alarm, historyNodeId to enable history (opc-plc does not historize by default — a Prosys / UA Expert sample server is needed for stage 8).",
"remoteUrl": "opc.tcp://localhost:50000",
"remoteUrl": "opc.tcp://10.100.0.35:50000",
"remoteNodeId": "ns=3;s=FastUInt1",
"bridgeNodeId": "ns=2;s=OpcUaClient/FastUInt1",
"bridgeRootNodeId": "ns=2;s=OpcUaClient",
-298
@@ -1,298 +0,0 @@
#Requires -Version 7.0
<#
.SYNOPSIS
End-to-end CLI test for the Galaxy (MXAccess) driver: read, write, subscribe,
alarms, and history through a running OtOpcUa server.
.DESCRIPTION
Unlike the other e2e scripts there is no `otopcua-galaxy-cli`; the Galaxy
driver proxy lives in-process with the server + talks to `OtOpcUaGalaxyHost`
over a named pipe (MXAccess is 32-bit COM, can't ship in the .NET 10 process).
Every stage therefore goes through `otopcua-cli` against the published OPC UA
address space.
Seven stages:
1. Probe: otopcua-cli connect + read the source NodeId; confirms
   the whole Galaxy.Host → Proxy → server → client chain is up
2. Source read: otopcua-cli read returns a Good value for the source
   attribute; proves IReadable.ReadAsync is dispatching
   through the IPC bridge
3. Virtual-tag bridge: `otopcua-cli read` on the VirtualTag NodeId; confirms
   the Phase 7 CachedTagUpstreamSource is bridging the
   driver-sourced input into the scripting engine
4. Subscribe-sees-change: subscribe to the source NodeId in the background;
   Galaxy pushes a data-change event within N seconds
   (Galaxy's underlying attribute must be actively
   changing; production Galaxies typically have
   scan-driven updates. For idle galaxies, widen
   -ChangeWaitSec or drive the write stage below first)
5. Reverse bridge: `otopcua-cli write` to a writable Galaxy attribute;
   read it back. Gracefully becomes INFO-only if the
   attribute's Galaxy-side AccessLevel forbids writes
   (BadUserAccessDenied / BadNotWritable)
6. Alarm fires: subscribe to the scripted-alarm Condition NodeId,
   drive the source tag above its threshold, confirm an
   Active alarm event surfaces. Exercises the Part 9
   alarm-condition propagation path
7. History read: historyread on the source tag over the last hour;
   confirms Aveva Historian IHistoryProvider dispatch
   returns samples
The Phase 7 seed (`scripts/smoke/seed-phase-7-smoke.sql`) already plants the
right shape: one Galaxy DriverInstance, one source Tag, one VirtualTag
(source × 2), one ScriptedAlarm (source > 50). Substitute the real Galaxy
attribute FullName into `dbo.Tag.TagConfig` before running.
.PARAMETER OpcUaUrl
OtOpcUa server endpoint. Default opc.tcp://localhost:4840.
.PARAMETER SourceNodeId
NodeId of the driver-sourced Galaxy tag (numeric, writable preferred). NodeIds
are path-based per OPC UA Part 3 §5.2.2; the default matches the Phase 7 seed
walking `p7-smoke-galaxy` (DriverInstanceId) → `lab-floor` → `galaxy-line` →
`reactor-1` → `Source` (Tag.Name).
.PARAMETER VirtualNodeId
NodeId of the VirtualTag that computes MachineStatus = (Source > 0) (Phase 7
scripting). Same path-based scheme, ending in the VirtualTag.Name
(`MachineStatus`). The tag is historized so the write/subscribe exercise
doubles as a historian-sink check.
.PARAMETER AlarmNodeId
NodeId of the scripted-alarm Condition (fires when Source > 50). Same
path-based scheme, ending in ScriptedAlarm.Name (`OverTemp`).
.PARAMETER AlarmTriggerValue
Value written to -SourceNodeId to push it over the alarm threshold.
Default 75 (well above the seeded 50-threshold).
.PARAMETER ChangeWaitSec
Seconds the subscribe-sees-change stage waits for a natural data change.
Default 10. Idle galaxies may need this extended or the stage will fail
with "subscribe did not observe...".
.PARAMETER AlarmWaitSec
Seconds the alarm-fires stage waits after triggering the write. Default 10.
.PARAMETER HistoryLookbackSec
Seconds back from now to query history. Default 3600 (1 h).
.EXAMPLE
# Against the default Phase-7 smoke seed + live Galaxy + OtOpcUa server
./scripts/e2e/test-galaxy.ps1
.EXAMPLE
# Custom NodeIds from a non-smoke cluster
./scripts/e2e/test-galaxy.ps1 `
-SourceNodeId "ns=2;s=Reactor1.Temperature" `
-VirtualNodeId "ns=2;s=Reactor1.TempDoubled" `
-AlarmNodeId "ns=2;s=Reactor1.OverTemp" `
-AlarmTriggerValue 120
#>
param(
[string]$OpcUaUrl = "opc.tcp://localhost:4840",
[string]$SourceNodeId = "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/Source",
[string]$VirtualNodeId = "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/MachineStatus",
[string]$AlarmNodeId = "ns=2;s=p7-smoke-galaxy/lab-floor/galaxy-line/reactor-1/OverTemp",
[string]$AlarmTriggerValue = "75",
[int]$ChangeWaitSec = 10,
[int]$AlarmWaitSec = 10,
[int]$HistoryLookbackSec = 3600,
# The default Phase 7 seed uses a Galaxy attribute with
# security_classification=Operate. Anonymous OPC UA sessions are denied writes
# against Operate-classified tags (PR 26 / docs/Security.md). Supply an LDAP
# user with WriteOperate to exercise the reverse-bridge stage — e.g.
# `-Username writeop -Password writeop123` against the dev-box GLAuth.
[string]$Username = "",
[string]$Password = ""
)
$ErrorActionPreference = "Stop"
. "$PSScriptRoot/_common.ps1"
$opcUaCli = Get-CliInvocation `
-ProjectFolder "src/ZB.MOM.WW.OtOpcUa.Client.CLI" `
-ExeName "otopcua-cli"
# Auth-extension helper — appends `-U / -P` to the CLI args when credentials
# were supplied. Stays empty for anonymous runs so the default smoke path
# doesn't require an LDAP round-trip.
$authArgs = @()
if ($Username) { $authArgs += @("-U", $Username) }
if ($Password) { $authArgs += @("-P", $Password) }
$results = @()
# ---------------------------------------------------------------------------
# Stage 1 — Probe. The probe is an otopcua-cli read against the source NodeId;
# success implies Galaxy.Host is up + the pipe ACL lets the server connect +
# the Proxy is tracking the tag + the server published it.
# ---------------------------------------------------------------------------
Write-Header "Probe"
$probe = Invoke-Cli -Cli $opcUaCli -Args (@("read", "-u", $OpcUaUrl, "-n", $SourceNodeId) + $authArgs)
if ($probe.ExitCode -eq 0 -and $probe.Output -match "Status:\s+0x00000000") {
Write-Pass "source NodeId readable (Galaxy pipe → proxy → server → client chain up)"
$results += @{ Passed = $true }
} else {
Write-Fail "probe read failed (exit=$($probe.ExitCode))"
Write-Host $probe.Output
$results += @{ Passed = $false; Reason = "probe failed" }
}
# ---------------------------------------------------------------------------
# Stage 2 — Source read. Captures the current value for the later virtual-tag
# comparison + confirms read dispatch works end-to-end. Failure here without a
# stage-1 failure would be unusual — probe already reads.
# ---------------------------------------------------------------------------
Write-Header "Source read"
$sourceRead = Invoke-Cli -Cli $opcUaCli -Args (@("read", "-u", $OpcUaUrl, "-n", $SourceNodeId) + $authArgs)
$sourceValue = $null
if ($sourceRead.ExitCode -eq 0 -and $sourceRead.Output -match "Value:\s+([^\r\n]+)") {
$sourceValue = $Matches[1].Trim()
Write-Pass "source value = $sourceValue"
$results += @{ Passed = $true }
} else {
Write-Fail "source read failed"
Write-Host $sourceRead.Output
$results += @{ Passed = $false; Reason = "source read failed" }
}
# ---------------------------------------------------------------------------
# Stage 3 — Virtual-tag bridge. Reads the Phase 7 VirtualTag (source × 2). Not
# strictly driver-specific, but exercises the CachedTagUpstreamSource bridge
# (the seam most likely to silently stop working after a Galaxy-side change).
# Skip if the VirtualNodeId param is empty (non-Phase-7 clusters).
# ---------------------------------------------------------------------------
if ([string]::IsNullOrEmpty($VirtualNodeId)) {
Write-Header "Virtual-tag bridge"
Write-Skip "VirtualNodeId not supplied — skipping Phase 7 bridge check"
} else {
Write-Header "Virtual-tag bridge"
$vtRead = Invoke-Cli -Cli $opcUaCli -Args (@("read", "-u", $OpcUaUrl, "-n", $VirtualNodeId) + $authArgs)
if ($vtRead.ExitCode -eq 0 -and $vtRead.Output -match "Value:\s+([^\r\n]+)") {
$vtValue = $Matches[1].Trim()
Write-Pass "virtual-tag value = $vtValue (source was $sourceValue)"
$results += @{ Passed = $true }
} else {
Write-Fail "virtual-tag read failed"
Write-Host $vtRead.Output
$results += @{ Passed = $false; Reason = "virtual-tag read failed" }
}
}
# ---------------------------------------------------------------------------
# Stage 4 — Subscribe-sees-change. otopcua-cli subscribe in the background;
# wait N seconds for Galaxy to push any data-change event on the source node.
# This is optimistic — if the Galaxy attribute is idle, widen -ChangeWaitSec.
# ---------------------------------------------------------------------------
Write-Header "Subscribe sees change"
$stdout = New-TemporaryFile
$stderr = New-TemporaryFile
$subArgs = @($opcUaCli.PrefixArgs) + @(
"subscribe", "-u", $OpcUaUrl, "-n", $SourceNodeId,
"-i", "500", "--duration", "$ChangeWaitSec") + $authArgs
$subProc = Start-Process -FilePath $opcUaCli.File `
-ArgumentList $subArgs -NoNewWindow -PassThru `
-RedirectStandardOutput $stdout.FullName `
-RedirectStandardError $stderr.FullName
Write-Info "subscription started (pid $($subProc.Id)) for ${ChangeWaitSec}s"
$subProc.WaitForExit(($ChangeWaitSec + 5) * 1000) | Out-Null
if (-not $subProc.HasExited) { Stop-Process -Id $subProc.Id -Force }
$subOut = (Get-Content $stdout.FullName -Raw) + (Get-Content $stderr.FullName -Raw)
Remove-Item $stdout.FullName, $stderr.FullName -ErrorAction SilentlyContinue
# Any `=` followed by `(Good)` line after the initial subscribe-confirmation
# indicates at least one data-change tick arrived. The `@(...)` forces an array
# so `.Count` works on the 0-match + single-match cases that Set-StrictMode
# -Version 3.0 otherwise flags as `property 'Count' cannot be found`.
$changeLines = @(($subOut -split "`n") | Where-Object { $_ -match "=\s+.*\(Good\)" })
if ($changeLines.Count -gt 0) {
Write-Pass "$($changeLines.Count) data-change events observed"
$results += @{ Passed = $true }
} else {
Write-Fail "no data-change events in ${ChangeWaitSec}s — Galaxy attribute may be idle; rerun with -ChangeWaitSec larger, or trigger a change first"
Write-Host $subOut
$results += @{ Passed = $false; Reason = "no data-change" }
}
# ---------------------------------------------------------------------------
# Stage 5 — Reverse bridge (OPC UA write → Galaxy). Galaxy attributes with
# AccessLevel > FreeAccess often reject anonymous writes; record as INFO when
# that's the case rather than failing the whole script.
# ---------------------------------------------------------------------------
Write-Header "Reverse bridge (OPC UA write)"
$writeValue = [int]$AlarmTriggerValue # reuse the alarm trigger value — two stages for one write
$w = Invoke-Cli -Cli $opcUaCli -Args (@(
"write", "-u", $OpcUaUrl, "-n", $SourceNodeId, "-v", "$writeValue") + $authArgs)
if ($w.ExitCode -ne 0) {
# Connection/protocol failure — still a test failure.
Write-Fail "write CLI exit=$($w.ExitCode)"
Write-Host $w.Output
$results += @{ Passed = $false; Reason = "write failed" }
} elseif ($w.Output -match "Write failed:\s*0x801F0000") {
Write-Info "BadUserAccessDenied — attribute's Galaxy-side ACL blocks writes for this session. Not a bug; grant WriteOperate or run against a writable attribute."
$results += @{ Passed = $true; Reason = "acl-expected" }
} elseif ($w.Output -match "Write failed:\s*0x803B0000|BadNotWritable") {
Write-Info "BadNotWritable — attribute is read-only at the Galaxy layer (status attributes, @-prefixed meta, etc)."
$results += @{ Passed = $true; Reason = "readonly-expected" }
} elseif ($w.Output -match "Write successful") {
# Read back — Galaxy poll interval + MXAccess advise may need a second or two to settle.
Start-Sleep -Seconds 2
$r = Invoke-Cli -Cli $opcUaCli -Args (@("read", "-u", $OpcUaUrl, "-n", $SourceNodeId) + $authArgs)
if ($r.Output -match "Value:\s+$([Regex]::Escape("$writeValue"))\b") {
Write-Pass "write propagated — source reads back $writeValue"
$results += @{ Passed = $true }
} else {
Write-Fail "write reported success but read-back did not reflect $writeValue"
Write-Host $r.Output
$results += @{ Passed = $false; Reason = "write-readback mismatch" }
}
} else {
Write-Fail "unexpected write response"
Write-Host $w.Output
$results += @{ Passed = $false; Reason = "unexpected write response" }
}
# ---------------------------------------------------------------------------
# Stage 6 — Alarm fires. Uses the helper from _common.ps1. If stage 5 already
# wrote the trigger value the alarm may already be active; that's fine — the
# Part 9 ConditionRefresh in the alarms CLI replays the current state so the
# subscribe window still captures the Active event.
# ---------------------------------------------------------------------------
if ([string]::IsNullOrEmpty($AlarmNodeId)) {
Write-Header "Alarm fires on threshold"
Write-Skip "AlarmNodeId not supplied — skipping alarm check"
} else {
$results += Test-AlarmFiresOnThreshold `
-OpcUaCli $opcUaCli `
-OpcUaUrl $OpcUaUrl `
-AlarmNodeId $AlarmNodeId `
-InputNodeId $SourceNodeId `
-TriggerValue $AlarmTriggerValue `
-DurationSec $AlarmWaitSec
}
# ---------------------------------------------------------------------------
# Stage 7 — History read. historyread against the source tag over the last N
# seconds. Failure modes the skip pattern catches: tag not historized in the
# Galaxy attribute's historization profile, or the lookback window misses the
# sample cadence.
# ---------------------------------------------------------------------------
$results += Test-HistoryHasSamples `
-OpcUaCli $opcUaCli `
-OpcUaUrl $OpcUaUrl `
-NodeId $SourceNodeId `
-LookbackSec $HistoryLookbackSec
Write-Summary -Title "Galaxy e2e" -Results $results
if ($results | Where-Object { -not $_.Passed }) { exit 1 }
+3 -3
@@ -11,7 +11,7 @@
of this test use `otopcua-cli` against two different endpoints:
remote = the upstream OPC UA server the driver connects to (opc-plc fixture
by default, opc.tcp://localhost:50000)
by default, opc.tcp://10.100.0.35:50000)
local = the OtOpcUa server itself, which mirrors remote nodes through the
OpcUaClient driver instance (opc.tcp://localhost:4840)
@@ -72,7 +72,7 @@
.PARAMETER RemoteUrl
Upstream OPC UA server endpoint (the server the driver connects to).
Default matches the opc-plc Docker fixture opc.tcp://localhost:50000.
Default matches the opc-plc Docker fixture opc.tcp://10.100.0.35:50000.
.PARAMETER OpcUaUrl
Local OtOpcUa server endpoint. Default opc.tcp://localhost:4840.
@@ -146,7 +146,7 @@
#>
param(
[string]$RemoteUrl = "opc.tcp://localhost:50000",
[string]$RemoteUrl = "opc.tcp://10.100.0.35:50000",
[string]$OpcUaUrl = "opc.tcp://localhost:4840",
[string]$RemoteNodeId = "ns=3;s=FastUInt1",
[Parameter(Mandatory)] [string]$BridgeNodeId,
+94 -54
@@ -1,39 +1,52 @@
<#
.SYNOPSIS
Registers the two v2 Windows services on a node: OtOpcUa (main server, net10) and
OtOpcUaGalaxyHost (out-of-process Galaxy COM host, net48 x86).
Registers the v2 Windows services on a node: OtOpcUa (main server, net10) and
optionally OtOpcUaWonderwareHistorian (Wonderware historian sidecar).
.DESCRIPTION
Phase 2 Stream D.2 replaces the v1 single-service install (TopShelf-based OtOpcUa.Host).
Installs both services with the correct service-account SID + per-process shared secret
provisioning per `driver-stability.md §"IPC Security"`. Galaxy.Host depends on OtOpcUa
(Galaxy.Host must be reachable when OtOpcUa starts; service dependency wiring + retry
handled by OtOpcUa.Server NodeBootstrap).
PR 7.2 retired the legacy out-of-process OtOpcUaGalaxyHost service alongside the
GalaxyProxyDriver / GalaxyHost / GalaxyShared projects. Galaxy access now flows
through the in-process GalaxyDriver talking gRPC to a separately-installed
mxaccessgw. The mxaccessgw server runs out of its own repo
(`c:\Users\dohertj2\Desktop\mxaccessgw\`); see
`docs/v2/Galaxy.ParityRig.md` for the gw setup recipe.
.PARAMETER InstallRoot
Where the binaries live (typically C:\Program Files\OtOpcUa).
.PARAMETER ServiceAccount
Service account SID or DOMAIN\name. Both services run under this account; the
Galaxy.Host pipe ACL only allows this SID to connect (decision #76).
Service account SID or DOMAIN\name. The OtOpcUa service runs under this account.
.PARAMETER GalaxySharedSecret
Per-process secret passed to Galaxy.Host via env var. Generated freshly per install.
.PARAMETER InstallWonderwareHistorian
Gate the OtOpcUaWonderwareHistorian sidecar install. Off by default; set when
the deployment uses the Wonderware historian for history reads + alarm-event
persistence.
.PARAMETER ZbConnection
Galaxy ZB SQL connection string (passed to Galaxy.Host via env var).
.PARAMETER HistorianSharedSecret
Per-process secret passed to the Historian sidecar via env var. Generated
freshly per install when not supplied.
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua'
.EXAMPLE
.\Install-Services.ps1 -InstallRoot 'C:\Program Files\OtOpcUa' -ServiceAccount 'OTOPCUA\svc-otopcua' `
-InstallWonderwareHistorian
#>
[CmdletBinding()]
param(
[Parameter(Mandatory)] [string]$InstallRoot,
[Parameter(Mandatory)] [string]$ServiceAccount,
[string]$GalaxySharedSecret,
[string]$ZbConnection = 'Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;',
[string]$GalaxyClientName = 'OtOpcUa-Galaxy.Host',
[string]$GalaxyPipeName = 'OtOpcUaGalaxy'
# PR 3.W — Wonderware historian sidecar. Optional; gates the
# OtOpcUaWonderwareHistorian service. Secret + pipe defaults match the server's
# Historian:Wonderware appsettings block.
[switch]$InstallWonderwareHistorian,
[string]$HistorianSharedSecret,
[string]$HistorianPipeName = 'OtOpcUaWonderwareHistorian',
[string]$HistorianServer = 'localhost',
[int]$HistorianPort = 32568,
[string[]]$AvevaServiceDependencies = @('NmxSvc', 'aaBootstrap', 'aaGR')
)
$ErrorActionPreference = 'Stop'
@@ -42,17 +55,18 @@ if (-not (Test-Path "$InstallRoot\OtOpcUa.Server.exe")) {
Write-Error "OtOpcUa.Server.exe not found at $InstallRoot — copy the publish output first"
exit 1
}
if (-not (Test-Path "$InstallRoot\Galaxy\OtOpcUa.Driver.Galaxy.Host.exe")) {
Write-Error "OtOpcUa.Driver.Galaxy.Host.exe not found at $InstallRoot\Galaxy — copy the publish output first"
exit 1
}
# Generate a fresh shared secret per install if not supplied. Stored in DPAPI-protected file
# rather than the registry so the service account can read it but other local users cannot.
if (-not $GalaxySharedSecret) {
# Generate fresh shared secrets per install if not supplied.
function New-SharedSecret {
$bytes = New-Object byte[] 32
[System.Security.Cryptography.RandomNumberGenerator]::Create().GetBytes($bytes)
$GalaxySharedSecret = [Convert]::ToBase64String($bytes)
return [Convert]::ToBase64String($bytes)
}
if ($InstallWonderwareHistorian -and -not $HistorianSharedSecret) { $HistorianSharedSecret = New-SharedSecret }
if ($InstallWonderwareHistorian -and -not (Test-Path "$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe")) {
Write-Error "OtOpcUa.Driver.Historian.Wonderware.exe not found at $InstallRoot\WonderwareHistorian — copy the publish output first"
exit 1
}
# Resolve the SID — the IPC ACL needs the SID, not the down-level name.
@@ -62,41 +76,67 @@ $sid = if ($ServiceAccount.StartsWith('S-1-')) {
(New-Object System.Security.Principal.NTAccount $ServiceAccount).Translate([System.Security.Principal.SecurityIdentifier]).Value
}
# --- Install OtOpcUaGalaxyHost first (OtOpcUa starts after, depends on it being up).
$galaxyEnv = @(
"OTOPCUA_GALAXY_PIPE=$GalaxyPipeName"
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_GALAXY_SECRET=$GalaxySharedSecret"
"OTOPCUA_GALAXY_BACKEND=mxaccess"
"OTOPCUA_GALAXY_ZB_CONN=$ZbConnection"
"OTOPCUA_GALAXY_CLIENT_NAME=$GalaxyClientName"
) -join "`0"
$galaxyEnv += "`0`0"
# --- Install OtOpcUaWonderwareHistorian (PR 3.W) — separate sidecar that exposes the
# Wonderware Historian SDK via a named-pipe protocol consumed by the .NET 10 server.
# Optional: only installed when -InstallWonderwareHistorian is supplied. Depends on the
# AVEVA services that host the historian SDK runtime path (hard service dependencies).
$historianDepend = $null
if ($InstallWonderwareHistorian) {
$historianEnv = @(
"OTOPCUA_HISTORIAN_PIPE=$HistorianPipeName"
"OTOPCUA_ALLOWED_SID=$sid"
"OTOPCUA_HISTORIAN_SECRET=$HistorianSharedSecret"
"OTOPCUA_HISTORIAN_ENABLED=true"
"OTOPCUA_HISTORIAN_SERVER=$HistorianServer"
"OTOPCUA_HISTORIAN_PORT=$HistorianPort"
) -join "`0"
$historianEnv += "`0`0"
Write-Host "Installing OtOpcUaGalaxyHost..."
& sc.exe create OtOpcUaGalaxyHost binPath= "`"$InstallRoot\Galaxy\OtOpcUa.Driver.Galaxy.Host.exe`"" `
DisplayName= 'OtOpcUa Galaxy Host (out-of-process MXAccess)' `
start= auto `
obj= $ServiceAccount | Out-Null
Write-Host "Installing OtOpcUaWonderwareHistorian..."
& sc.exe create OtOpcUaWonderwareHistorian binPath= "`"$InstallRoot\WonderwareHistorian\OtOpcUa.Driver.Historian.Wonderware.exe`"" `
DisplayName= 'OtOpcUa Wonderware Historian Sidecar (out-of-process aahClient)' `
start= auto `
depend= ($AvevaServiceDependencies -join '/') `
obj= $ServiceAccount | Out-Null
& sc.exe config OtOpcUaWonderwareHistorian start= delayed-auto | Out-Null
# Set per-service environment variables via the registry — sc.exe doesn't expose them directly.
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaGalaxyHost"
$envValue = $galaxyEnv.Split("`0") | Where-Object { $_ -ne '' }
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $envValue
$svcKey = "HKLM:\SYSTEM\CurrentControlSet\Services\OtOpcUaWonderwareHistorian"
$envValue = $historianEnv.Split("`0") | Where-Object { $_ -ne '' }
Set-ItemProperty -Path $svcKey -Name 'Environment' -Type MultiString -Value $envValue
$historianDepend = 'OtOpcUaWonderwareHistorian'
}
# --- Install OtOpcUa. Galaxy access flows through GalaxyDriver → mxaccessgw (gRPC),
# so OtOpcUa no longer depends on a sibling service for Galaxy connectivity. The
# mxaccessgw is installed separately. When the Wonderware sidecar is installed,
# depend on it for startup ordering.
$otOpcUaDepends = @()
if ($historianDepend) { $otOpcUaDepends += $historianDepend }
# --- Install OtOpcUa (depends on Galaxy host being installed; doesn't strictly require it
# started — OtOpcUa.Server NodeBootstrap retries on the IPC connect path).
Write-Host "Installing OtOpcUa..."
& sc.exe create OtOpcUa binPath= "`"$InstallRoot\OtOpcUa.Server.exe`"" `
DisplayName= 'OtOpcUa Server' `
start= auto `
depend= 'OtOpcUaGalaxyHost' `
obj= $ServiceAccount | Out-Null
$createArgs = @(
'create', 'OtOpcUa',
'binPath=', "`"$InstallRoot\OtOpcUa.Server.exe`"",
'DisplayName=', 'OtOpcUa Server',
'start=', 'auto',
'obj=', $ServiceAccount
)
if ($otOpcUaDepends.Count -gt 0) {
$createArgs += @('depend=', ($otOpcUaDepends -join '/'))
}
& sc.exe @createArgs | Out-Null
Write-Host ""
Write-Host "Installed. Start with:"
Write-Host " sc.exe start OtOpcUaGalaxyHost"
if ($InstallWonderwareHistorian) { Write-Host " sc.exe start OtOpcUaWonderwareHistorian" }
Write-Host " sc.exe start OtOpcUa"
if ($InstallWonderwareHistorian) {
Write-Host ""
Write-Host "Wonderware historian shared secret (configure into appsettings.json Historian:Wonderware:SharedSecret):"
Write-Host " $HistorianSharedSecret"
}
Write-Host ""
Write-Host "Galaxy shared secret (record this offline — required for service rebinding):"
Write-Host " $GalaxySharedSecret"
Write-Host "NOTE: Galaxy access flows through mxaccessgw — install + run that separately"
Write-Host " per docs/v2/Galaxy.ParityRig.md. OtOpcUa connects via the Galaxy.Gateway"
Write-Host " section of appsettings.json (default endpoint http://localhost:5120)."
+9 -2
@@ -1,11 +1,18 @@
<#
.SYNOPSIS
Stops + removes the two v2 services. Mirrors Install-Services.ps1.
Stops + removes the v2 services. Mirrors Install-Services.ps1.
.DESCRIPTION
PR 7.2 retired the legacy OtOpcUaGalaxyHost service. Galaxy access now flows
through the in-process GalaxyDriver against a separately-installed mxaccessgw.
OtOpcUaGalaxyHost is included in the cleanup loop below so this script safely
removes it from any rig still carrying the legacy service from a pre-7.2
install.
#>
[CmdletBinding()] param()
$ErrorActionPreference = 'Continue'
foreach ($svc in 'OtOpcUa', 'OtOpcUaGalaxyHost') {
foreach ($svc in 'OtOpcUa', 'OtOpcUaWonderwareHistorian', 'OtOpcUaGalaxyHost') {
if (Get-Service $svc -ErrorAction SilentlyContinue) {
Write-Host "Stopping $svc..."
Stop-Service $svc -Force -ErrorAction SilentlyContinue
+21
@@ -0,0 +1,21 @@
{
"auto-managed": 10,
"cross-driver": 14,
"driver/abcip": 13,
"driver/ablegacy": 16,
"driver/focas": 11,
"driver/opcuaclient": 12,
"driver/s7": 19,
"driver/twincat": 17,
"phase/1": 8,
"phase/2": 7,
"phase/3": 6,
"phase/4": 5,
"phase/5": 4,
"phase/6": 3,
"queue/blocked": 2,
"queue/done": 15,
"queue/failed": 9,
"queue/in-progress": 1,
"queue/queued": 18
}
+320
@@ -0,0 +1,320 @@
- id: twincat-1.1
driver: twincat
phase: 1
plan_pr_id: "1.1"
title: "TwinCAT — Int64 fidelity for LINT/ULINT"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Map LInt/ULInt to DriverDataType.Int64 instead of silently truncating to Int32.
The TwinCATDataType.cs:40 truncation comment "matches Int64 gap" is removed and
MapToClrType already returns long/ulong, so the wire-level read returns the
correct boxed types. May add Int64 to Core.Abstractions DriverDataType enum if
missing. Closes a long-standing fixture caveat noted in the test suite.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDataType.cs"
- "src/ZB.MOM.WW.OtOpcUa.Core.Abstractions/DriverDataType.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/GVLs/GVL_Primitives.TcGVL"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: S
deps: []
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID; no e2e change to test-twincat.ps1."
- id: twincat-1.2
driver: twincat
phase: 1
plan_pr_id: "1.2"
title: "TwinCAT — TIME/DATE/DT/TOD as native UA types"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Stop marshalling IEC TIME/DATE/DT/TOD as raw UDINT and convert to native UA
Duration/DateTime types via post-processing in ReadValueAsync, ConvertForWrite,
and OnAdsNotificationEx. May expose missing Duration in DriverDataType. CLI
syntax updates so users write ISO-8601 / IEC literals instead of numeric raw
values.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDataType.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/GVLs/GVL_Primitives.TcGVL"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: M
deps: []
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID. May add Duration to DriverDataType enum."
- id: twincat-1.3
driver: twincat
phase: 1
plan_pr_id: "1.3"
title: "TwinCAT — Bit-indexed BOOL writes (read-modify-write)"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Replace the NotSupportedException at AdsTwinCATClient.cs:99 with read-modify-write
on the parent word, serializing concurrent bit writes to the same parent via a
keyed SemaphoreSlim. Closes referenced task #181. CLI gains an example and the
fixture caveat in the bugs-caught list updates to note writes now work.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture: []
e2e: []
effort: S
deps: []
cross_driver: false
notes: "Reuses GVL_Primitives.vWord (0xBEEF) — no fixture schema change."
- id: twincat-1.4
driver: twincat
phase: 1
plan_pr_id: "1.4"
title: "TwinCAT — Multi-dim and whole-array reads"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Expand ReadValueAsync/WriteValueAsync to handle whole-array reads in a single
AdsClient call rather than element-by-element. Surface IsArray + ArrayDimensions
on TwinCATTagDefinition and through DriverAttributeInfo from DiscoverAsync. Sets
up the array-shape plumbing the rest of the driver needs.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverOptions.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/GVLs/GVL_Arrays.TcGVL"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: M
deps: []
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID. New 5x5 aReal2D seed with deterministic pattern."
- id: twincat-1.5
driver: twincat
phase: 1
plan_pr_id: "1.5"
title: "TwinCAT — ENUM and ALIAS at discovery"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
MapSymbolTypeName currently returns null for non-atomic types, dropping ENUM and
ALIAS symbols silently. Switch to inspecting symbol.DataType + Category from
TwinCAT.TypeSystem so DataTypeCategory.Enum walks EnumValues and Alias resolves
to base atomic recursively. Surface enum members for later EnumStrings rendering.
POINTER/REFERENCE/INTERFACE/UNION explicitly out of scope.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: M
deps: []
cross_driver: false
notes: "Reuses existing GVL_Enums + DUTs; only README integration-test contract entry added."
- id: twincat-2.1
driver: twincat
phase: 2
plan_pr_id: "2.1"
title: "TwinCAT — ADS Sum-read / Sum-write"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Replace per-tag ReadValueAsync loops with Beckhoff's ADS Sum commands
(IndexGroup 0xF080-0xF084) via SumSymbolRead/SumSymbolWrite to batch N
reads/writes per AMS request. Bucket fullReferences by DeviceHostAddress and
expose a new ReadValuesAsync surface on ITwinCATClient. Targets ~10x throughput
on multi-thousand-tag scans; perf-tier test gated behind TWINCAT_PERF=1.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/ITwinCATClient.cs"
docs:
- "docs/v3/twincat-backlog.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/GVLs/GVL_Perf.TcGVL"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: L
deps: []
cross_driver: false
notes: "Perf test gated behind TWINCAT_PERF=1 plus TWINCAT_TARGET_NETID; new FB_PerfChurn POU."
- id: twincat-2.2
driver: twincat
phase: 2
plan_pr_id: "2.2"
title: "TwinCAT — Handle-based access with caching"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Cache CreateVariableHandleAsync results so per-read overhead drops to
read-by-handle (4-byte index vs N-byte symbol path). On
DeviceSymbolVersionInvalid (0x711), evict and retry once. Clear cache on
AdsClient reconnect until the symbol-version listener (PR 2.3) ships. Dispose
path calls DeleteVariableHandleAsync for cached handles.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture: []
e2e: []
effort: M
deps: []
cross_driver: false
notes: "Combines with PR 2.1 for sum-read-by-handle. Reuses GVL_Perf.aTags."
- id: twincat-2.3
driver: twincat
phase: 2
plan_pr_id: "2.3"
title: "TwinCAT — Symbol-version invalidation listener"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Register an AddDeviceNotificationAsync on the symbol-version index group
(AdsReservedIndexGroup.SymbolVersion 0xF008) so the handle cache from PR 2.2
is wiped on online-change bumps. Initial integration test gated as
requires-manual-online-change until automation lands. Resolves open question
(c) confirming the v6 enum constant.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: M
deps: ["twincat-2.2"]
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID; manual online-change drill documented in README."
- id: twincat-3.1
driver: twincat
phase: 3
plan_pr_id: "3.1"
title: "TwinCAT — Per-tag MaxDelay tuning"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Surface NotificationSettings MaxDelay as a per-tag option (default 0 to
preserve current behavior). Plumb int? MaxDelayMs through TwinCATTagDefinition,
SubscribeAsync, and AddNotificationAsync. Coalesces high-frequency PLC signals
so the OPC UA subscription queue stops flooding under bursty change rates.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverOptions.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: S
deps: []
cross_driver: false
notes: "Reuses GVL_Fixture.nCounter as 100 Hz driver. Hardware-gated via TWINCAT_TARGET_NETID."
- id: twincat-3.2
driver: twincat
phase: 3
plan_pr_id: "3.2"
title: "TwinCAT — Cycle-time / jitter / PLC-state diagnostics"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Augment the probe loop to read _AppInfo.OnlineChangeCnt/AppName and
_TaskInfo[1].CycleTime/LastExecTime, surface as TwinCATDeviceDiagnostics on
DeviceState, and emit through IDriverDiagnostics (cross-driver surface from
Modbus task #154). Read system symbols directly without going through the user
browse filter.
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATSystemSymbolFilter.cs"
docs:
- "docs/drivers/TwinCAT-Test-Fixture.md"
- "docs/Driver.TwinCAT.Cli.md"
- "docs/v3/twincat-backlog.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: M
deps: []
cross_driver: true
notes: "Reuses IDriverDiagnostics from Modbus task #154. Hardware-gated via TWINCAT_TARGET_NETID."
- id: twincat-4.1
driver: twincat
phase: 4
plan_pr_id: "4.1"
title: "TwinCAT — Nested UDT browse via online type walker"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Largest single piece of work. Recurse BrowseSymbolsAsync into IStructType.SubItems
yielding one TwinCATDiscoveredSymbol per leaf with dotted instance paths. Expand
arrays-of-structs up to a configurable bound (default 1024). Add a pure
TwinCATTypeWalker helper. Folds recursed structure into Discovered/ folder tree.
Online runtime path only — TMC offline parsing deferred per open question (a).
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/AdsTwinCATClient.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATTypeWalker.cs"
docs:
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
- "docs/v3/twincat-backlog.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/DUTs/ST_NestedFlags.TcDUT"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/DUTs/ST_RecursiveCap.TcDUT"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/DUTs/ST_AlarmRecord.TcDUT"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e: []
effort: L
deps: ["twincat-1.5"]
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID. PR 1.4 helpful but not blocking."
- id: twincat-5.1
driver: twincat
phase: 5
plan_pr_id: "5.1"
title: "TwinCAT — IAlarmSource via TC3 EventLogger"
plan_anchor: "docs/plans/twincat-plan.md"
summary: |
Implement IAlarmSource over TcEventLogger on AMS port 110 so PLC alarms
surface as OPC UA AC events. Begins with a one-day spike (open question (b))
documented in docs/v3/twincat-eventlogger-spike.md to determine if a managed
wrapper exists or if we hit AMS port 110 directly via a secondary AdsClient
+ AddDeviceNotificationAsync on the alarm-list index group. Gated by new
EnableAlarms option (default false).
files:
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATAlarmSource.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriver.cs"
- "src/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT/TwinCATDriverOptions.cs"
docs:
- "docs/drivers/TwinCAT.md"
- "docs/v3/twincat-eventlogger-spike.md"
- "docs/Driver.TwinCAT.Cli.md"
- "docs/drivers/TwinCAT-Test-Fixture.md"
fixture:
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/POUs/FB_AlarmHarness.TcPOU"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/PLC/GVLs/GVL_Alarms.TcGVL"
- "tests/ZB.MOM.WW.OtOpcUa.Driver.TwinCAT.IntegrationTests/TwinCatProject/README.md"
e2e:
- "scripts/e2e/test-twincat.ps1"
effort: L
deps: []
cross_driver: false
notes: "Hardware-gated via TWINCAT_TARGET_NETID. Spike-first; e2e Test-AlarmRoundTrip likely deferred to follow-up."
+36
@@ -0,0 +1,36 @@
# Plan-execution queue
Gitea-backed work queue that drives the per-driver implementation plans (`docs/plans/*-plan.md`) to completion in **Mode B** (autonomous: auto-merges into the `auto/driver-gaps` integration branch when build+tests pass).
## Pieces
- `pr-manifest.yaml` — canonical list of every PR across all six plans.
- `setup-labels.sh` — idempotently creates the queue labels in Gitea.
- `file-issues.sh` — files one Gitea issue per manifest entry (idempotent — skips ids that already exist).
- `next-pr.sh` — picks the next eligible queue issue (queued, blockers all done) as JSON.
- `start-pr.sh ISSUE BRANCH` — flips queued → in-progress and creates the branch off `auto/driver-gaps`.
- `open-pr.sh ISSUE BRANCH TITLE BODY_FILE` — opens a PR from BRANCH into `auto/driver-gaps`.
- `merge-pr.sh PR` — merges a PR with branch-delete (Mode B).
- `finish-pr.sh ISSUE success PR` / `finish-pr.sh ISSUE failed REASON_FILE` — closes / marks failed.
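
The idempotency of `file-issues.sh` rests on a hidden `queue-meta` HTML comment that the script writes at the top of every issue body and scans for on the next run. The sketch below mirrors that round trip in isolation. The function names `embed_queue_meta` and `extract_canonical_id` are hypothetical and only illustrate the idea; the regular expression is the one the script uses.

```python
import json
import re

# Same pattern file-issues.sh uses to locate the metadata comment in an issue body.
QUEUE_META_RE = re.compile(r"<!-- queue-meta\s*(\{.*?\})\s*-->", re.S)


def embed_queue_meta(meta, rest_of_body):
    """Prefix an issue body with the hidden queue-meta comment (hypothetical helper)."""
    return f"<!-- queue-meta\n{json.dumps(meta)}\n-->\n{rest_of_body}"


def extract_canonical_id(body):
    """Return the canonical id stored in an issue body, or None if absent or malformed."""
    match = QUEUE_META_RE.search(body or "")
    if not match:
        return None
    try:
        return json.loads(match.group(1)).get("id")
    except json.JSONDecodeError:
        # A broken metadata block is treated like an issue that is not part of the queue.
        return None


body = embed_queue_meta({"id": "twincat-2.3", "deps": ["twincat-2.2"]}, "## Summary")
print(extract_canonical_id(body))
```

Because the id lives in the issue body rather than the title, retitling an issue by hand should not cause it to be filed a second time.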
## Flow per loop iteration
1. `next-pr.sh` → issue#, branch, canonical id.
2. `start-pr.sh` → mark in-progress, create branch.
3. Loop driver dispatches a Claude Agent to implement the PR on the branch.
4. Loop runs `dotnet build` + `dotnet test`.
5. On green: `open-pr.sh`, `merge-pr.sh`, `finish-pr.sh success`.
6. On red: capture log → `finish-pr.sh failed log.txt`. Issue stays open with `queue/failed` label for retry.
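
The label transitions behind steps 2, 5 and 6 can be sketched as a pair of pure functions. This is an illustration only, not the scripts themselves: `start` and `finish` are hypothetical names, and the sketch assumes `start-pr.sh` removes `queue/queued` when it adds `queue/in-progress`.

```python
QUEUED = "queue/queued"
IN_PROGRESS = "queue/in-progress"
DONE = "queue/done"
FAILED = "queue/failed"


def start(labels):
    """Model of start-pr.sh: queued -> in-progress (assumed to drop the queued label)."""
    if QUEUED not in labels:
        raise ValueError("issue is not queued")
    return (labels - {QUEUED}) | {IN_PROGRESS}


def finish(labels, build_and_tests_green):
    """Model of finish-pr.sh: drop in-progress, then add done or failed.

    Returns (new_labels, issue_still_open). A successful issue is closed;
    a failed one stays open so it can be retried.
    """
    outcome = DONE if build_and_tests_green else FAILED
    new_labels = (labels - {IN_PROGRESS}) | {outcome}
    return new_labels, not build_and_tests_green


labels = start({QUEUED, "driver/twincat"})
print(finish(labels, build_and_tests_green=False))
```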
## Environment
- Gitea repo: `dohertj2/lmxopcua` on `gitea.dohertylan.com`.
- Token: read from `%LOCALAPPDATA%\tea\config.yml` (or `$GITEA_TOKEN` override).
- Integration branch: `auto/driver-gaps` (created off master).
- Per-PR branches: `auto/<driver>/<plan-pr-id>`.
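
As a small illustration of the branch convention, the following sketch derives a per-PR branch name the same way the issue footer in `file-issues.sh` does, replacing any `/` in the plan PR id with `-`. The function name `pr_branch` is hypothetical.

```python
INTEGRATION_BRANCH = "auto/driver-gaps"


def pr_branch(driver, plan_pr_id):
    """Build auto/<driver>/<plan-pr-id>, keeping the id to a single path segment."""
    return f"auto/{driver}/{plan_pr_id.replace('/', '-')}"


print(pr_branch("twincat", "2.3"), "->", INTEGRATION_BRANCH)
```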
## Reset / debug
- Re-list eligible issues: `bash scripts/queue/next-pr.sh`.
- Manually unblock: remove `queue/blocked` label and add `queue/queued`.
- Drop a failed PR back into queue: remove `queue/failed`, add `queue/queued`.
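
When debugging why an issue is not being picked up, it can help to reason about eligibility explicitly. `next-pr.sh` is not reproduced here, so the sketch below is an assumption about its rule: an issue is eligible when it carries `queue/queued` and every canonical id in its `deps` belongs to an issue labelled `queue/done`. Picking the lowest issue number first is also an assumption, and `next_eligible` is a hypothetical name.

```python
def next_eligible(issues):
    """Return the first queued issue whose dependencies are all done, else None.

    Each issue is a dict with keys: number, id, labels (set of str), deps (list of ids).
    """
    done_ids = {issue["id"] for issue in issues if "queue/done" in issue["labels"]}
    for issue in sorted(issues, key=lambda i: i["number"]):
        if "queue/queued" not in issue["labels"]:
            continue
        if all(dep in done_ids for dep in issue["deps"]):
            return issue
    return None


issues = [
    {"number": 12, "id": "twincat-2.3", "labels": {"queue/queued"}, "deps": ["twincat-2.2"]},
    {"number": 11, "id": "twincat-2.2", "labels": {"queue/queued"}, "deps": []},
]
print(next_eligible(issues)["id"])
```

Under this rule a failed dependency stalls everything that depends on it until the failed issue is moved back to `queue/queued` and completes.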
+122
@@ -0,0 +1,122 @@
#!/usr/bin/env bash
# Reads scripts/queue/pr-manifest.yaml and creates one Gitea issue per PR.
# Idempotent: skips PRs whose canonical id already exists as an issue (open or closed).
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
if [ ! -f "$MANIFEST" ]; then
echo "manifest not found: $MANIFEST" >&2
exit 1
fi
# The existing canonical-id → issue# mapping (from queue-meta blocks) is collected
# inside the Python block below, which pages through the issues API itself.
python - "$MANIFEST" "$LABEL_MAP" <<'PY'
import json, sys, re, yaml, urllib.request, os

manifest_path, label_map_path = sys.argv[1], sys.argv[2]
gitea_token = os.environ["GITEA_TOKEN"]
api_base = "https://gitea.dohertylan.com/api/v1/repos/dohertj2/lmxopcua"

with open(manifest_path) as f: manifest = yaml.safe_load(f)
with open(label_map_path) as f: lmap = json.load(f)

def api(method, path, data=None):
    req = urllib.request.Request(
        f"{api_base}/{path}",
        method=method,
        headers={
            "Authorization": f"token {gitea_token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        data=json.dumps(data).encode() if data else None,
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read().decode())

# Collect existing issues' canonical ids → issue#
existing = {}
page = 1
while True:
    items = api("GET", f"issues?state=all&type=issues&limit=50&page={page}")
    if not items: break
    for it in items:
        m = re.search(r'<!-- queue-meta\s*(\{.*?\})\s*-->', it.get("body","") or "", re.S)
        if m:
            try:
                meta = json.loads(m.group(1))
                if "id" in meta:
                    existing[meta["id"]] = it["number"]
            except json.JSONDecodeError:
                pass  # malformed queue-meta block: treat as a non-queue issue
    page += 1
print(f"existing queue issues: {len(existing)}")

filed = 0
skipped = 0
for pr in manifest["prs"]:
    if pr["id"] in existing:
        skipped += 1
        continue
    title = f"[{pr['driver']}] {pr['title']}"
    meta = {
        "id": pr["id"],
        "driver": pr["driver"],
        "phase": pr["phase"],
        "plan_pr_id": pr.get("plan_pr_id",""),
        "deps": pr.get("deps", []),
        "cross_driver": pr.get("cross_driver", False),
    }
    body_parts = [
        f"<!-- queue-meta\n{json.dumps(meta)}\n-->",
        "## Auto-managed PR — Mode B (autonomous)",
        f"**Driver**: `{pr['driver']}` **Phase**: `{pr['phase']}` **Plan PR**: `{pr.get('plan_pr_id','')}`",
        f"**Plan**: [`{pr.get('plan_anchor','docs/plans/' + pr['driver'] + '-plan.md')}`]({pr.get('plan_anchor','../docs/plans/' + pr['driver'] + '-plan.md')})",
        f"**Effort**: `{pr.get('effort','M')}` **Cross-driver**: `{pr.get('cross_driver', False)}`",
        "",
        "## Summary",
        pr.get("summary","_(see plan)_"),
    ]
    if pr.get("files"):
        body_parts += ["", "## Source files", *[f"- `{f}`" for f in pr["files"]]]
    if pr.get("docs"):
        body_parts += ["", "## Docs", *[f"- `{d}`" for d in pr["docs"]]]
    if pr.get("fixture"):
        body_parts += ["", "## Fixture", *[f"- `{x}`" for x in pr["fixture"]]]
    if pr.get("e2e"):
        body_parts += ["", "## E2E", *[f"- `{x}`" for x in pr["e2e"]]]
    if pr.get("deps"):
        body_parts += ["", "## Depends on", *[f"- canonical: `{d}`" for d in pr["deps"]]]
    if pr.get("notes"):
        body_parts += ["", "## Notes", pr["notes"]]
    body_parts += ["",
        "---",
        f"_Branch: `auto/{pr['driver']}/{pr.get('plan_pr_id','').replace('/','-')}`. Target: `auto/driver-gaps`._"]
    body = "\n".join(body_parts)
    label_names = [
        f"driver/{pr['driver']}",
        f"phase/{pr['phase']}",
        "queue/queued",
        "auto-managed",
    ]
    if pr.get("cross_driver"): label_names.append("cross-driver")
    label_ids = [lmap[n] for n in label_names if n in lmap]
    issue = api("POST", "issues", {"title": title, "body": body, "labels": label_ids})
    print(f"  filed #{issue['number']}: {pr['id']}")
    filed += 1
print(f"\nfiled {filed}, skipped (existing) {skipped}")
PY
+39
@@ -0,0 +1,39 @@
#!/usr/bin/env bash
# Closes the issue (success) or marks it failed and leaves it open for retry.
# Usage:
# finish-pr.sh ISSUE_NUM success PR_NUM
# finish-pr.sh ISSUE_NUM failed REASON_FILE
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
ISSUE="${1:?ISSUE_NUM required}"
RESULT="${2:?success|failed required}"
ARG3="${3:?PR_NUM or REASON_FILE required}"
INPROG=$(python -c "import json; print(json.load(open('$LABEL_MAP'))['queue/in-progress'])")
DONE=$(python -c "import json; print(json.load(open('$LABEL_MAP'))['queue/done'])")
FAILED=$(python -c "import json; print(json.load(open('$LABEL_MAP'))['queue/failed'])")
api_repo DELETE "issues/$ISSUE/labels/$INPROG" >/dev/null || true
case "$RESULT" in
success)
PR_NUM="$ARG3"
api_repo POST "issues/$ISSUE/labels" "{\"labels\":[$DONE]}" >/dev/null
BODY=$(python -c "import json; print(json.dumps({'body':'✅ Auto-loop completed. Merged via PR #$PR_NUM.'}))")
api_repo POST "issues/$ISSUE/comments" "$BODY" >/dev/null
api_repo PATCH "issues/$ISSUE" '{"state":"closed"}' >/dev/null
echo " issue #$ISSUE closed (PR #$PR_NUM merged)"
;;
failed)
REASON_FILE="$ARG3"
REASON=$(cat "$REASON_FILE" 2>/dev/null | head -c 4000 || echo "(no reason file)")
api_repo POST "issues/$ISSUE/labels" "{\"labels\":[$FAILED]}" >/dev/null
BODY=$(python -c "import json,sys; r=open('$REASON_FILE').read()[:4000] if __import__('os').path.exists('$REASON_FILE') else '(no log)'; print(json.dumps({'body':'❌ Auto-loop failed.\n\n\`\`\`\n'+r+'\n\`\`\`'}))")
api_repo POST "issues/$ISSUE/comments" "$BODY" >/dev/null
echo " issue #$ISSUE marked failed (still open for retry)"
;;
*)
echo "unknown result: $RESULT" >&2; exit 1 ;;
esac
+57
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
# Shared helpers for the Gitea-backed plan-execution queue.
set -euo pipefail
GITEA_URL="https://gitea.dohertylan.com"
GITEA_REPO="dohertj2/lmxopcua"
GITEA_API="$GITEA_URL/api/v1"
if [ -z "${GITEA_TOKEN:-}" ]; then
TEA_CONFIG="${LOCALAPPDATA:-$HOME/AppData/Local}/tea/config.yml"
if [ ! -f "$TEA_CONFIG" ]; then
TEA_CONFIG="$HOME/.config/tea/config.yml"
fi
GITEA_TOKEN="$(awk '/token:/{gsub(/[ \t]/,"",$2); print $2; exit}' "$TEA_CONFIG" 2>/dev/null || true)"
fi
if [ -z "${GITEA_TOKEN:-}" ]; then
echo "lib.sh: GITEA_TOKEN not set and tea config not readable" >&2
exit 1
fi
export GITEA_TOKEN
INTEGRATION_BRANCH="auto/driver-gaps"
QUEUE_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && { pwd -W 2>/dev/null || pwd; })"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && { pwd -W 2>/dev/null || pwd; })"
MANIFEST="$QUEUE_ROOT/pr-manifest.yaml"
LABEL_MAP="$QUEUE_ROOT/.label-ids.json"
LABEL_QUEUED="queue/queued"
LABEL_IN_PROGRESS="queue/in-progress"
LABEL_BLOCKED="queue/blocked"
LABEL_FAILED="queue/failed"
LABEL_DONE="queue/done"
LABEL_AUTO="auto-managed"
LABEL_CROSS="cross-driver"
api() {
local method="$1" path="$2" data="${3:-}"
if [ -n "$data" ]; then
curl -sf -X "$method" \
-H "Authorization: token $GITEA_TOKEN" \
-H "Content-Type: application/json" \
-d "$data" \
"$GITEA_API/$path"
else
curl -sf -X "$method" \
-H "Authorization: token $GITEA_TOKEN" \
"$GITEA_API/$path"
fi
}
api_repo() {
api "$1" "repos/$GITEA_REPO/$2" "${3:-}"
}
label_id() {
python -c "import json,sys; m=json.load(open('$LABEL_MAP')); print(m['$1'])"
}
+57
@@ -0,0 +1,57 @@
# Loop iteration prompt (Mode B autonomous)
This is the single self-contained prompt that `/loop` re-fires until the queue empties. Each iteration handles exactly one PR end-to-end.
---
You are running one iteration of the autonomous plan-execution loop. The queue lives in Gitea at `dohertj2/lmxopcua`. Helpers: `scripts/queue/*.sh`.
## Step 1 — pick the next PR
Run `bash scripts/queue/next-pr.sh`. It returns JSON.
- If `{"empty": true}` → the queue is drained. **Do not call ScheduleWakeup.** Report "queue empty — loop terminating" and exit. The /loop will end.
- Otherwise parse: `issue_num`, `canonical_id`, `driver`, `phase`, `plan_pr_id`, `branch`, `title`, `url`.
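A minimal sketch of how the Step 1 output could be handled, assuming the JSON carries exactly the fields listed above. The function name `parse_next_pr` and every value in `SAMPLE` are hypothetical and only illustrate the shape.

```python
import json

# Field names come from the Step 1 description; nothing else is assumed.
FIELDS = ("issue_num", "canonical_id", "driver", "phase",
          "plan_pr_id", "branch", "title", "url")

def parse_next_pr(raw):
    """Return None when the queue is drained, else the fields later steps need."""
    data = json.loads(raw)
    if data.get("empty"):
        return None
    return {key: data[key] for key in FIELDS}

# Hypothetical payload, shaped like the documented output.
SAMPLE = json.dumps({
    "empty": False,
    "issue_num": 42,
    "canonical_id": "s7-01",
    "driver": "s7",
    "phase": 1,
    "plan_pr_id": "PR-S7-A1",
    "branch": "auto/s7/PR-S7-A1",
    "title": "Example title",
    "url": "https://gitea.example.invalid/issues/42",
})

if __name__ == "__main__":
    print(parse_next_pr(SAMPLE))
```

Returning `None` for the drained case keeps the "do not re-arm" decision to a single check in the caller.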
## Step 2 — claim it
Run `bash scripts/queue/start-pr.sh "$ISSUE_NUM" "$BRANCH"`. This swaps `queue/queued` → `queue/in-progress` and creates the branch off `auto/driver-gaps`.
## Step 3 — pull the issue body
Run `curl -sf -H "Authorization: token $(awk '/token:/{print $2}' "$LOCALAPPDATA/tea/config.yml")" "https://gitea.dohertylan.com/api/v1/repos/dohertj2/lmxopcua/issues/$ISSUE_NUM"` and extract the `body` field. The body contains the Plan link, summary, source files, docs/fixture/e2e files.
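As a sketch of the extraction, the response can be parsed in Python rather than by eye. The `queue-meta` comment pattern below is the one `next-pr.sh` searches for; the function name `issue_body_and_meta` and the sample issue are hypothetical.

```python
import json
import re

# Same pattern next-pr.sh uses to find the embedded metadata comment.
QUEUE_META = re.compile(r'<!-- queue-meta\s*(\{.*?\})\s*-->', re.S)

def issue_body_and_meta(issue_json):
    """Return (body, meta) from an issue API response; meta is None if absent or invalid."""
    issue = json.loads(issue_json)
    body = issue.get("body") or ""
    match = QUEUE_META.search(body)
    if not match:
        return body, None
    try:
        return body, json.loads(match.group(1))
    except ValueError:  # malformed JSON inside the comment
        return body, None

# Hypothetical issue payload for illustration only.
SAMPLE_ISSUE = json.dumps({
    "number": 42,
    "body": 'Summary text\n<!-- queue-meta {"id": "s7-01", "deps": []} -->',
})

if __name__ == "__main__":
    body, meta = issue_body_and_meta(SAMPLE_ISSUE)
    print(meta)
```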
## Step 4 — implement on a worktree
Dispatch a general-purpose Agent with `isolation: "worktree"`. Brief it with:
- the issue body verbatim
- the linked plan section (read `docs/plans/<driver>-plan.md` and quote the relevant per-PR detail)
- explicit instructions: implement the source-file changes, the doc updates, the fixture extensions, and the e2e test additions named in the issue
- run `dotnet build c:/Users/dohertj2/Desktop/lmxopcua/ZB.MOM.WW.OtOpcUa.slnx` until green
- run `dotnet test` for the relevant test project until green
- commit on `$BRANCH` with message `Auto: <canonical_id> — <short summary>` followed by `Closes #$ISSUE_NUM`
- return a brief summary of what changed
## Step 5 — verify and push
Verify that the agent committed and pushed. If the branch isn't pushed, push it: `git push origin "$BRANCH"`.
## Step 6 — open PR
Build a body file: include the issue summary + the agent's summary. Then:
```
PR_NUM=$(bash scripts/queue/open-pr.sh "$ISSUE_NUM" "$BRANCH" "$TITLE" /tmp/pr-body.md)
```
## Step 7 — auto-merge (Mode B)
Run `bash scripts/queue/merge-pr.sh "$PR_NUM"`.
## Step 8 — close issue
Run `bash scripts/queue/finish-pr.sh "$ISSUE_NUM" success "$PR_NUM"`.
## On failure
If anything from Step 4 onward fails (build red, tests red, agent gives up, push fails, merge conflict):
- write the failure log to `/tmp/loop-fail-$ISSUE_NUM.log`
- run `bash scripts/queue/finish-pr.sh "$ISSUE_NUM" failed /tmp/loop-fail-$ISSUE_NUM.log`
- the issue keeps `queue/failed` and stays open for retry
- **do not** retry the same issue this iteration; let the loop pick a different one next fire
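The failure comment that `finish-pr.sh` posts can be sketched as a standalone function. This mirrors the inline Python in that script (first 4000 characters of the log, wrapped in a code fence); the function name `failure_comment` and the `limit` parameter are hypothetical.

```python
import json
import os

def failure_comment(reason_file, limit=4000):
    """Build the JSON payload for the failure comment, truncating the log."""
    if os.path.exists(reason_file):
        with open(reason_file, encoding="utf-8", errors="replace") as fh:
            reason = fh.read()[:limit]
    else:
        reason = "(no log)"
    fence = "`" * 3  # built at runtime so this example nests inside a fenced block
    body = f"❌ Auto-loop failed.\n\n{fence}\n{reason}\n{fence}"
    return json.dumps({"body": body})

if __name__ == "__main__":
    print(failure_comment("/tmp/loop-fail-42.log"))
```

Truncating before building the payload keeps the comment well under typical API body limits even when the build log is large.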
## Re-arm
At the very end of the iteration (success OR failure), call `ScheduleWakeup` with the same `/loop` prompt and `delaySeconds: 60` to fire the next iteration.
If the queue was empty in Step 1, do NOT call ScheduleWakeup.
Report a one-line summary to the user before re-arming.
+11
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
# Merges a PR (Mode B autonomous merge into auto/driver-gaps).
# Usage: merge-pr.sh PR_NUM
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
PR="${1:?PR_NUM required}"
PAYLOAD='{"Do":"merge","delete_branch_after_merge":true}'
api_repo POST "pulls/$PR/merge" "$PAYLOAD" >/dev/null
echo " PR #$PR merged into $INTEGRATION_BRANCH (branch deleted)"
+77
@@ -0,0 +1,77 @@
#!/usr/bin/env bash
# Prints the next eligible queue issue as JSON: {issue_num, canonical_id, driver, plan_pr_id, branch, ...}
# Eligible = open + label queue/queued + all canonical deps closed.
# Picks lowest phase first, then lowest issue number within phase.
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
python - <<PY
import json, urllib.request, re, os, sys
token = os.environ["GITEA_TOKEN"]
api_base = "https://gitea.dohertylan.com/api/v1/repos/dohertj2/lmxopcua"
def api(path):
    req = urllib.request.Request(f"{api_base}/{path}",
                                 headers={"Authorization": f"token {token}"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read().decode())
# Gather all queue issues
issues = []
page = 1
while True:
    items = api(f"issues?state=all&type=issues&limit=50&page={page}&labels=auto-managed")
    if not items: break
    issues.extend(items)
    page += 1
by_id = {}
for it in issues:
    m = re.search(r'<!-- queue-meta\s*(\{.*?\})\s*-->', it.get("body","") or "", re.S)
    if not m: continue
    try: meta = json.loads(m.group(1))
    except ValueError: continue  # malformed queue-meta JSON; skip this issue
    by_id[meta["id"]] = (it, meta)
def is_done(issue):
    if issue["state"] == "closed": return True
    labels = {l["name"] for l in issue["labels"]}
    return "queue/done" in labels
eligible = []
for cid, (it, meta) in by_id.items():
    labels = {l["name"] for l in it["labels"]}
    if it["state"] != "open": continue
    if "queue/queued" not in labels: continue
    deps = meta.get("deps", [])
    blocked = False
    for d in deps:
        if d not in by_id:
            blocked = True; break
        if not is_done(by_id[d][0]):
            blocked = True; break
    if blocked: continue
    eligible.append((meta.get("phase",99), it["number"], cid, it, meta))
if not eligible:
    print(json.dumps({"empty": True}))
    sys.exit(0)
eligible.sort(key=lambda x: (x[0], x[1]))
phase, num, cid, it, meta = eligible[0]
plan_pr = meta.get("plan_pr_id","").replace("/","-")
result = {
    "empty": False,
    "issue_num": num,
    "canonical_id": cid,
    "driver": meta["driver"],
    "phase": meta["phase"],
    "plan_pr_id": meta.get("plan_pr_id",""),
    "title": it["title"],
    "branch": f"auto/{meta['driver']}/{plan_pr}",
    "url": it["html_url"],
}
print(json.dumps(result, indent=2))
PY
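The selection rule in `next-pr.sh` (open, labelled `queue/queued`, every dependency known and done, lowest phase first, then lowest issue number) can be sketched as a pure function that runs without the Gitea API. The helper names `issue` and `pick_next` and the sample entries are hypothetical.

```python
def issue(number, state, *labels):
    """Build a hypothetical issue dict with only the fields the selection reads."""
    return {"number": number, "state": state,
            "labels": [{"name": name} for name in labels]}

def is_done(it):
    """Closed issues and issues labelled queue/done both count as done."""
    return it["state"] == "closed" or any(
        label["name"] == "queue/done" for label in it["labels"])

def pick_next(entries):
    """entries: (issue, meta) pairs. Returns the canonical id to run next, or None."""
    by_id = {meta["id"]: (it, meta) for it, meta in entries}
    eligible = []
    for cid, (it, meta) in by_id.items():
        names = {label["name"] for label in it["labels"]}
        if it["state"] != "open" or "queue/queued" not in names:
            continue
        deps = meta.get("deps", [])
        # An unknown dependency blocks the issue, same as an unfinished one.
        if all(d in by_id and is_done(by_id[d][0]) for d in deps):
            eligible.append((meta.get("phase", 99), it["number"], cid))
    return min(eligible)[2] if eligible else None

ENTRIES = [
    (issue(5, "closed", "queue/done"), {"id": "a", "phase": 1}),
    (issue(6, "open", "queue/queued"), {"id": "b", "phase": 2, "deps": ["a"]}),
    (issue(7, "open", "queue/queued"), {"id": "c", "phase": 1, "deps": ["missing"]}),
    (issue(4, "open", "queue/queued"), {"id": "d", "phase": 2}),
]

if __name__ == "__main__":
    print(pick_next(ENTRIES))
```

In the sample, `c` has the lowest phase but is blocked by an unknown dependency, so the choice falls to the phase-2 issue with the lower number.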
+24
@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# Opens a PR from BRANCH into auto/driver-gaps and references the issue.
# Usage: open-pr.sh ISSUE_NUM BRANCH_NAME TITLE BODY_FILE
# Echoes the PR number on stdout.
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
ISSUE="${1:?}"; BRANCH="${2:?}"; TITLE="${3:?}"; BODY_FILE="${4:?}"
BODY=$(cat "$BODY_FILE")
PAYLOAD=$(python -c "
import json, sys
print(json.dumps({
'title': sys.argv[1],
'body': sys.argv[2] + '\n\nCloses #' + sys.argv[3],
'head': sys.argv[4],
'base': sys.argv[5],
}))
" "$TITLE" "$BODY" "$ISSUE" "$BRANCH" "$INTEGRATION_BRANCH")
PR=$(api_repo POST pulls "$PAYLOAD")
PR_NUM=$(echo "$PR" | python -c "import sys,json; print(json.load(sys.stdin)['number'])")
echo "$PR_NUM"
File diff suppressed because it is too large
+58
@@ -0,0 +1,58 @@
#!/usr/bin/env bash
# Idempotent: creates queue labels in Gitea and stores name→id map at .label-ids.json
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
declare -A LABELS=(
["driver/abcip"]="0e8a16"
["driver/ablegacy"]="0e8a16"
["driver/focas"]="0e8a16"
["driver/opcuaclient"]="0e8a16"
["driver/s7"]="0e8a16"
["driver/twincat"]="0e8a16"
["phase/1"]="bfd4f2"
["phase/2"]="bfd4f2"
["phase/3"]="bfd4f2"
["phase/4"]="bfd4f2"
["phase/5"]="bfd4f2"
["phase/6"]="bfd4f2"
["queue/queued"]="d4c5f9"
["queue/in-progress"]="fbca04"
["queue/blocked"]="b60205"
["queue/failed"]="b60205"
["queue/done"]="2ea44f"
["auto-managed"]="cccccc"
["cross-driver"]="d93f0b"
)
# Pull existing labels
EXISTING=$(api_repo GET "labels?limit=200")
emit_map() {
python - <<PY
import json, sys
existing = json.loads('''$EXISTING''')
print(json.dumps({l['name']: l['id'] for l in existing}, indent=2))
PY
}
# Create any missing
for name in "${!LABELS[@]}"; do
color="${LABELS[$name]}"
exists=$(echo "$EXISTING" | python -c "import json,sys; ls=json.load(sys.stdin); print('yes' if any(l['name']=='$name' for l in ls) else 'no')")
if [ "$exists" = "no" ]; then
payload=$(python -c "import json; print(json.dumps({'name':'$name','color':'#$color','description':'queue management'}))")
api_repo POST labels "$payload" >/dev/null
echo "created label: $name"
fi
done
# Refresh and write the map file
api_repo GET "labels?limit=200" | python -c "
import json, sys
ls = json.load(sys.stdin)
m = {l['name']: l['id'] for l in ls}
open('$LABEL_MAP','w').write(json.dumps(m, indent=2))
print(f'wrote {len(m)} labels to $LABEL_MAP')
"
+31
@@ -0,0 +1,31 @@
#!/usr/bin/env bash
# Marks an issue in-progress and creates its branch off the integration branch.
# Usage: start-pr.sh ISSUE_NUM BRANCH_NAME
set -euo pipefail
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$HERE/lib.sh"
ISSUE="${1:?ISSUE_NUM required}"
BRANCH="${2:?BRANCH_NAME required}"
# Swap labels: queued -> in-progress
QUEUED=$(python -c "import json; print(json.load(open('$LABEL_MAP'))['queue/queued'])")
INPROG=$(python -c "import json; print(json.load(open('$LABEL_MAP'))['queue/in-progress'])")
api_repo DELETE "issues/$ISSUE/labels/$QUEUED" >/dev/null || true
api_repo POST "issues/$ISSUE/labels" "{\"labels\":[$INPROG]}" >/dev/null
# Create branch off integration
EXISTS=$(api_repo GET "branches/$BRANCH" 2>/dev/null || echo "")
if [ -z "$EXISTS" ]; then
PAYLOAD=$(python -c "import json; print(json.dumps({'new_branch_name':'$BRANCH','old_branch_name':'$INTEGRATION_BRANCH'}))")
api_repo POST branches "$PAYLOAD" >/dev/null
echo " branch created: $BRANCH"
else
echo " branch exists: $BRANCH"
fi
# Comment
COMMENT=$(python -c "import json; print(json.dumps({'body':'🤖 Auto-loop picked this up. Branch: \`$BRANCH\`. Status: in-progress.'}))")
api_repo POST "issues/$ISSUE/comments" "$COMMENT" >/dev/null
echo " issue #$ISSUE marked in-progress"
+1 -1
@@ -1,6 +1,6 @@
{
"ConnectionStrings": {
- "ConfigDb": "Server=localhost,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"
+ "ConfigDb": "Server=10.100.0.35,14330;Database=OtOpcUaConfig;User Id=sa;Password=OtOpcUaDev_2026!;TrustServerCertificate=True;Encrypt=False;"
},
"Authentication": {
"Ldap": {
@@ -1,260 +0,0 @@
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Alarms;
/// <summary>
/// Subscribes to the four Galaxy alarm attributes (<c>.InAlarm</c>, <c>.Priority</c>,
/// <c>.DescAttrName</c>, <c>.Acked</c>) per alarm-bearing attribute discovered during
/// <c>DiscoverAsync</c>. Maintains one <see cref="AlarmState"/> per alarm, raises
/// <see cref="AlarmTransition"/> on lifecycle transitions (Active / Unacknowledged /
/// Acknowledged / Inactive). Ack path writes <c>.AckMsg</c>. Pure-logic state machine
/// with delegate-based subscribe/write so it's testable against in-memory fakes.
/// </summary>
/// <remarks>
/// Transitions emitted (OPC UA Part 9 alarm lifecycle, simplified for the Galaxy model):
/// <list type="bullet">
/// <item><c>Active</c> — InAlarm false → true. Default to Unacknowledged.</item>
/// <item><c>Acknowledged</c> — Acked false → true while InAlarm is still true.</item>
/// <item><c>Inactive</c> — InAlarm true → false. If still unacknowledged the alarm
/// is marked latched-inactive-unack; next Ack transitions straight to Inactive.</item>
/// </list>
/// </remarks>
public sealed class GalaxyAlarmTracker : IDisposable
{
public const string InAlarmAttr = ".InAlarm";
public const string PriorityAttr = ".Priority";
public const string DescAttrNameAttr = ".DescAttrName";
public const string AckedAttr = ".Acked";
public const string AckMsgAttr = ".AckMsg";
private readonly Func<string, Action<string, Vtq>, Task> _subscribe;
private readonly Func<string, Task> _unsubscribe;
private readonly Func<string, object, Task<bool>> _write;
private readonly Func<DateTime> _clock;
// Alarm tag (attribute full ref, e.g. "Tank.Level.HiHi") → state.
private readonly ConcurrentDictionary<string, AlarmState> _alarms =
new(StringComparer.OrdinalIgnoreCase);
// Reverse lookup: probed tag (".InAlarm" etc.) → owning alarm tag.
private readonly ConcurrentDictionary<string, (string AlarmTag, AlarmField Field)> _probeToAlarm =
new(StringComparer.OrdinalIgnoreCase);
private bool _disposed;
public event EventHandler<AlarmTransition>? TransitionRaised;
public GalaxyAlarmTracker(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<string, object, Task<bool>> write)
: this(subscribe, unsubscribe, write, () => DateTime.UtcNow) { }
internal GalaxyAlarmTracker(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<string, object, Task<bool>> write,
Func<DateTime> clock)
{
_subscribe = subscribe ?? throw new ArgumentNullException(nameof(subscribe));
_unsubscribe = unsubscribe ?? throw new ArgumentNullException(nameof(unsubscribe));
_write = write ?? throw new ArgumentNullException(nameof(write));
_clock = clock ?? throw new ArgumentNullException(nameof(clock));
}
public int TrackedAlarmCount => _alarms.Count;
/// <summary>
/// Advise the four alarm attributes for <paramref name="alarmTag"/>. Idempotent —
/// repeat calls for the same alarm tag are a no-op. Subscribe failure for any of the
/// four rolls back the alarm entry so a stale callback cannot promote a phantom.
/// </summary>
public async Task TrackAsync(string alarmTag)
{
if (_disposed || string.IsNullOrWhiteSpace(alarmTag)) return;
if (_alarms.ContainsKey(alarmTag)) return;
var state = new AlarmState { AlarmTag = alarmTag };
if (!_alarms.TryAdd(alarmTag, state)) return;
var probes = new[]
{
(Tag: alarmTag + InAlarmAttr, Field: AlarmField.InAlarm),
(Tag: alarmTag + PriorityAttr, Field: AlarmField.Priority),
(Tag: alarmTag + DescAttrNameAttr, Field: AlarmField.DescAttrName),
(Tag: alarmTag + AckedAttr, Field: AlarmField.Acked),
};
foreach (var p in probes)
{
_probeToAlarm[p.Tag] = (alarmTag, p.Field);
}
try
{
foreach (var p in probes)
{
await _subscribe(p.Tag, OnProbeCallback).ConfigureAwait(false);
}
}
catch
{
// Rollback so a partial advise doesn't leak state.
_alarms.TryRemove(alarmTag, out _);
foreach (var p in probes)
{
_probeToAlarm.TryRemove(p.Tag, out _);
try { await _unsubscribe(p.Tag).ConfigureAwait(false); } catch { }
}
throw;
}
}
/// <summary>
/// Drop every tracked alarm. Unadvises all 4 probes per alarm as best-effort.
/// </summary>
public async Task ClearAsync()
{
_alarms.Clear();
foreach (var kv in _probeToAlarm.ToList())
{
_probeToAlarm.TryRemove(kv.Key, out _);
try { await _unsubscribe(kv.Key).ConfigureAwait(false); } catch { }
}
}
/// <summary>
/// Operator ack — write the comment text into <c>&lt;alarmTag&gt;.AckMsg</c>.
/// Returns false when the runtime reports the write failed.
/// </summary>
public Task<bool> AcknowledgeAsync(string alarmTag, string comment)
{
if (_disposed || string.IsNullOrWhiteSpace(alarmTag))
return Task.FromResult(false);
return _write(alarmTag + AckMsgAttr, comment ?? string.Empty);
}
/// <summary>
/// Subscription callback entry point. Exposed for tests and for the Backend to route
/// fan-out callbacks through. Runs the state machine and fires TransitionRaised
/// outside the lock.
/// </summary>
public void OnProbeCallback(string probeTag, Vtq vtq)
{
if (_disposed) return;
if (!_probeToAlarm.TryGetValue(probeTag, out var link)) return;
if (!_alarms.TryGetValue(link.AlarmTag, out var state)) return;
AlarmTransition? transition = null;
var now = _clock();
lock (state.Lock)
{
switch (link.Field)
{
case AlarmField.InAlarm:
{
var wasActive = state.InAlarm;
var isActive = vtq.Value is bool b && b;
state.InAlarm = isActive;
state.LastUpdateUtc = now;
if (!wasActive && isActive)
{
state.Acked = false;
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Active, state.Priority, state.DescAttrName, now);
}
else if (wasActive && !isActive)
{
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Inactive, state.Priority, state.DescAttrName, now);
}
break;
}
case AlarmField.Priority:
if (vtq.Value is int pi) state.Priority = pi;
else if (vtq.Value is short ps) state.Priority = ps;
else if (vtq.Value is long pl && pl >= int.MinValue && pl <= int.MaxValue) state.Priority = (int)pl;
state.LastUpdateUtc = now;
break;
case AlarmField.DescAttrName:
state.DescAttrName = vtq.Value as string;
state.LastUpdateUtc = now;
break;
case AlarmField.Acked:
{
var wasAcked = state.Acked;
var isAcked = vtq.Value is bool b && b;
state.Acked = isAcked;
state.LastUpdateUtc = now;
// Fire Acknowledged only when transitioning false→true. Don't fire on the initial
// subscribe callback (wasAcked==isAcked in that case because the state starts
// with Acked=true and the initial probe is usually true for an inactive alarm).
if (!wasAcked && isAcked && state.InAlarm)
{
state.LastTransitionUtc = now;
transition = new AlarmTransition(state.AlarmTag, AlarmStateTransition.Acknowledged, state.Priority, state.DescAttrName, now);
}
break;
}
}
}
if (transition is { } t)
{
TransitionRaised?.Invoke(this, t);
}
}
public IReadOnlyList<AlarmSnapshot> SnapshotStates()
{
return _alarms.Values.Select(s =>
{
lock (s.Lock)
return new AlarmSnapshot(s.AlarmTag, s.InAlarm, s.Acked, s.Priority, s.DescAttrName);
}).ToList();
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
_alarms.Clear();
_probeToAlarm.Clear();
}
private sealed class AlarmState
{
public readonly object Lock = new();
public string AlarmTag = "";
public bool InAlarm;
public bool Acked = true; // default ack'd so first false→true on subscribe doesn't misfire
public int Priority;
public string? DescAttrName;
public DateTime LastUpdateUtc;
public DateTime LastTransitionUtc;
}
private enum AlarmField { InAlarm, Priority, DescAttrName, Acked }
}
public enum AlarmStateTransition { Active, Acknowledged, Inactive }
public sealed record AlarmTransition(
string AlarmTag,
AlarmStateTransition Transition,
int Priority,
string? DescAttrName,
DateTime AtUtc);
public sealed record AlarmSnapshot(
string AlarmTag,
bool InAlarm,
bool Acked,
int Priority,
string? DescAttrName);
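The transition rules in `OnProbeCallback` can be modelled in a few lines of Python to make the lifecycle easier to follow. This is a simplified sketch, not a port: it omits locking, timestamps, priority and description handling, and the class name `AlarmModel` is hypothetical.

```python
class AlarmModel:
    """Simplified model of the InAlarm / Acked transition rules."""

    def __init__(self):
        self.in_alarm = False
        self.acked = True  # same default as AlarmState, so the initial probe does not misfire

    def on_in_alarm(self, value):
        was_active = self.in_alarm
        is_active = value is True  # mirrors `vtq.Value is bool b && b`
        self.in_alarm = is_active
        if not was_active and is_active:
            self.acked = False  # a newly active alarm starts unacknowledged
            return "Active"
        if was_active and not is_active:
            return "Inactive"
        return None

    def on_acked(self, value):
        was_acked = self.acked
        is_acked = value is True
        self.acked = is_acked
        # Acknowledged fires only on false -> true while the alarm is still active.
        if not was_acked and is_acked and self.in_alarm:
            return "Acknowledged"
        return None

if __name__ == "__main__":
    model = AlarmModel()
    for step in (model.on_acked(True), model.on_in_alarm(True),
                 model.on_acked(True), model.on_in_alarm(False)):
        print(step)
```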
@@ -1,188 +0,0 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
/// <summary>
/// Galaxy backend that uses the live <c>ZB</c> repository for <see cref="DiscoverAsync"/> —
/// real gobject hierarchy + attributes flow through to the Proxy without needing the MXAccess
/// COM client. Runtime data-plane calls (Read/Write/Subscribe/Alarm/History) still surface
/// as "MXAccess code lift pending" until the COM client port lands. This is the highest-value
/// intermediate state because Discover is what powers the OPC UA address-space build, so
/// downstream Proxy + parity tests can exercise the complete tree shape today.
/// </summary>
public sealed class DbBackedGalaxyBackend(GalaxyRepository repository) : IGalaxyBackend
{
private long _nextSessionId;
private long _nextSubscriptionId;
// DB-only backend doesn't have a runtime data plane; never raises events.
#pragma warning disable CS0067
public event System.EventHandler<OnDataChangeNotification>? OnDataChange;
public event System.EventHandler<GalaxyAlarmEvent>? OnAlarmEvent;
public event System.EventHandler<HostConnectivityStatus>? OnHostStatusChanged;
#pragma warning restore CS0067
public Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct)
{
var id = Interlocked.Increment(ref _nextSessionId);
return Task.FromResult(new OpenSessionResponse { Success = true, SessionId = id });
}
public Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct) => Task.CompletedTask;
public async Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct)
{
try
{
var hierarchy = await repository.GetHierarchyAsync(ct).ConfigureAwait(false);
var attributes = await repository.GetAttributesAsync(ct).ConfigureAwait(false);
// Group attributes by their owning gobject for the IPC payload.
var attrsByGobject = attributes
.GroupBy(a => a.GobjectId)
.ToDictionary(g => g.Key, g => g.Select(MapAttribute).ToArray());
var parentByChild = hierarchy
.ToDictionary(o => o.GobjectId, o => o.ParentGobjectId);
var nameByGobject = hierarchy
.ToDictionary(o => o.GobjectId, o => o.TagName);
var objects = hierarchy.Select(o => new GalaxyObjectInfo
{
ContainedName = string.IsNullOrEmpty(o.ContainedName) ? o.TagName : o.ContainedName,
TagName = o.TagName,
ParentContainedName = parentByChild.TryGetValue(o.GobjectId, out var p)
&& p != 0
&& nameByGobject.TryGetValue(p, out var pName)
? pName
: null,
TemplateCategory = MapCategory(o.CategoryId),
Attributes = attrsByGobject.TryGetValue(o.GobjectId, out var a) ? a : System.Array.Empty<GalaxyAttributeInfo>(),
}).ToArray();
return new DiscoverHierarchyResponse { Success = true, Objects = objects };
}
catch (Exception ex) when (ex is System.Data.SqlClient.SqlException
or InvalidOperationException
or TimeoutException)
{
return new DiscoverHierarchyResponse
{
Success = false,
Error = $"Galaxy ZB repository error: {ex.Message}",
Objects = System.Array.Empty<GalaxyObjectInfo>(),
};
}
}
public Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct)
=> Task.FromResult(new ReadValuesResponse
{
Success = false,
Error = "MXAccess code lift pending (Phase 2 Task B.1) — DB-backed backend covers Discover only",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct)
{
var results = new WriteValueResult[req.Writes.Length];
for (var i = 0; i < req.Writes.Length; i++)
{
results[i] = new WriteValueResult
{
TagReference = req.Writes[i].TagReference,
StatusCode = 0x80020000u,
Error = "MXAccess code lift pending (Phase 2 Task B.1)",
};
}
return Task.FromResult(new WriteValuesResponse { Results = results });
}
public Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct)
{
var sid = Interlocked.Increment(ref _nextSubscriptionId);
return Task.FromResult(new SubscribeResponse
{
Success = true,
SubscriptionId = sid,
ActualIntervalMs = req.RequestedIntervalMs,
});
}
public Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct) => Task.CompletedTask;
public Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct) => Task.CompletedTask;
public Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct) => Task.CompletedTask;
public Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadResponse
{
Success = false,
Error = "MXAccess + Historian code lift pending (Phase 2 Task B.1)",
Tags = System.Array.Empty<HistoryTagValues>(),
});
public Task<HistoryReadProcessedResponse> HistoryReadProcessedAsync(
HistoryReadProcessedRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadProcessedResponse
{
Success = false,
Error = "MXAccess + Historian code lift pending (Phase 2 Task B.1)",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<HistoryReadAtTimeResponse> HistoryReadAtTimeAsync(
HistoryReadAtTimeRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadAtTimeResponse
{
Success = false,
Error = "MXAccess + Historian code lift pending (Phase 2 Task B.1)",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<HistoryReadEventsResponse> HistoryReadEventsAsync(
HistoryReadEventsRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadEventsResponse
{
Success = false,
Error = "MXAccess + Historian code lift pending (Phase 2 Task B.1)",
Events = System.Array.Empty<GalaxyHistoricalEvent>(),
});
public Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct)
=> Task.FromResult(new RecycleStatusResponse { Accepted = true, GraceSeconds = 15 });
private static GalaxyAttributeInfo MapAttribute(GalaxyAttributeRow row) => new()
{
AttributeName = row.AttributeName,
MxDataType = row.MxDataType,
IsArray = row.IsArray,
ArrayDim = row.ArrayDimension is int d and > 0 ? (uint)d : null,
SecurityClassification = row.SecurityClassification,
IsHistorized = row.IsHistorized,
IsAlarm = row.IsAlarm,
};
/// <summary>
/// Galaxy <c>template_definition.category_id</c> → human-readable name.
/// Mirrors v1 Host's <c>AlarmObjectFilter</c> mapping.
/// </summary>
private static string MapCategory(int categoryId) => categoryId switch
{
1 => "$WinPlatform",
3 => "$AppEngine",
4 => "$Area",
10 => "$UserDefined",
11 => "$ApplicationObject",
13 => "$Area",
17 => "$DeviceIntegration",
24 => "$ViewEngine",
26 => "$ViewApp",
_ => $"category-{categoryId}",
};
}
@@ -1,35 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
/// <summary>
/// One row from the v1 <c>HierarchySql</c>. Galaxy <c>gobject</c> deployed instance with its
/// hierarchy parent + template-chain context.
/// </summary>
public sealed class GalaxyHierarchyRow
{
public int GobjectId { get; init; }
public string TagName { get; init; } = string.Empty;
public string ContainedName { get; init; } = string.Empty;
public string BrowseName { get; init; } = string.Empty;
public int ParentGobjectId { get; init; }
public bool IsArea { get; init; }
public int CategoryId { get; init; }
public int HostedByGobjectId { get; init; }
public System.Collections.Generic.IReadOnlyList<string> TemplateChain { get; init; } = System.Array.Empty<string>();
}
/// <summary>One row from the v1 <c>AttributesSql</c>.</summary>
public sealed class GalaxyAttributeRow
{
public int GobjectId { get; init; }
public string TagName { get; init; } = string.Empty;
public string AttributeName { get; init; } = string.Empty;
public string FullTagReference { get; init; } = string.Empty;
public int MxDataType { get; init; }
public string? DataTypeName { get; init; }
public bool IsArray { get; init; }
public int? ArrayDimension { get; init; }
public int MxAttributeCategory { get; init; }
public int SecurityClassification { get; init; }
public bool IsHistorized { get; init; }
public bool IsAlarm { get; init; }
}
@@ -1,224 +0,0 @@
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
/// <summary>
/// SQL access to the Galaxy <c>ZB</c> repository — port of v1 <c>GalaxyRepositoryService</c>.
/// The two SQL bodies (Hierarchy + Attributes) are byte-for-byte identical to v1 so the
/// queries surface the same row set at parity time. Extended-attributes and scope-filter
/// queries from v1 are intentionally not ported yet — they're refinements that aren't on
/// the Phase 2 critical path.
/// </summary>
public sealed class GalaxyRepository(GalaxyRepositoryOptions options)
{
public async Task<bool> TestConnectionAsync(CancellationToken ct = default)
{
try
{
using var conn = new SqlConnection(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using var cmd = new SqlCommand("SELECT 1", conn) { CommandTimeout = options.CommandTimeoutSeconds };
var result = await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
return result is int i && i == 1;
}
catch (SqlException) { return false; }
catch (InvalidOperationException) { return false; }
}
public async Task<DateTime?> GetLastDeployTimeAsync(CancellationToken ct = default)
{
using var conn = new SqlConnection(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using var cmd = new SqlCommand("SELECT time_of_last_deploy FROM galaxy", conn)
{ CommandTimeout = options.CommandTimeoutSeconds };
var result = await cmd.ExecuteScalarAsync(ct).ConfigureAwait(false);
return result is DateTime dt ? dt : null;
}
public async Task<List<GalaxyHierarchyRow>> GetHierarchyAsync(CancellationToken ct = default)
{
var rows = new List<GalaxyHierarchyRow>();
using var conn = new SqlConnection(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using var cmd = new SqlCommand(HierarchySql, conn) { CommandTimeout = options.CommandTimeoutSeconds };
using var reader = await cmd.ExecuteReaderAsync(ct).ConfigureAwait(false);
while (await reader.ReadAsync(ct).ConfigureAwait(false))
{
var templateChainRaw = reader.IsDBNull(8) ? string.Empty : reader.GetString(8);
var templateChain = templateChainRaw.Length == 0
? Array.Empty<string>()
: templateChainRaw.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.Where(s => s.Length > 0)
.ToArray();
rows.Add(new GalaxyHierarchyRow
{
GobjectId = Convert.ToInt32(reader.GetValue(0)),
TagName = reader.GetString(1),
ContainedName = reader.IsDBNull(2) ? string.Empty : reader.GetString(2),
BrowseName = reader.GetString(3),
ParentGobjectId = Convert.ToInt32(reader.GetValue(4)),
IsArea = Convert.ToInt32(reader.GetValue(5)) == 1,
CategoryId = Convert.ToInt32(reader.GetValue(6)),
HostedByGobjectId = Convert.ToInt32(reader.GetValue(7)),
TemplateChain = templateChain,
});
}
return rows;
}
public async Task<List<GalaxyAttributeRow>> GetAttributesAsync(CancellationToken ct = default)
{
var rows = new List<GalaxyAttributeRow>();
using var conn = new SqlConnection(options.ConnectionString);
await conn.OpenAsync(ct).ConfigureAwait(false);
using var cmd = new SqlCommand(AttributesSql, conn) { CommandTimeout = options.CommandTimeoutSeconds };
using var reader = await cmd.ExecuteReaderAsync(ct).ConfigureAwait(false);
while (await reader.ReadAsync(ct).ConfigureAwait(false))
{
rows.Add(new GalaxyAttributeRow
{
GobjectId = Convert.ToInt32(reader.GetValue(0)),
TagName = reader.GetString(1),
AttributeName = reader.GetString(2),
FullTagReference = reader.GetString(3),
MxDataType = Convert.ToInt32(reader.GetValue(4)),
DataTypeName = reader.IsDBNull(5) ? null : reader.GetString(5),
IsArray = Convert.ToInt32(reader.GetValue(6)) == 1,
ArrayDimension = reader.IsDBNull(7) ? (int?)null : Convert.ToInt32(reader.GetValue(7)),
MxAttributeCategory = Convert.ToInt32(reader.GetValue(8)),
SecurityClassification = Convert.ToInt32(reader.GetValue(9)),
IsHistorized = Convert.ToInt32(reader.GetValue(10)) == 1,
IsAlarm = Convert.ToInt32(reader.GetValue(11)) == 1,
});
}
return rows;
}
private const string HierarchySql = @"
;WITH template_chain AS (
SELECT g.gobject_id AS instance_gobject_id, t.gobject_id AS template_gobject_id,
t.tag_name AS template_tag_name, t.derived_from_gobject_id, 0 AS depth
FROM gobject g
INNER JOIN gobject t ON t.gobject_id = g.derived_from_gobject_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0 AND g.derived_from_gobject_id <> 0
UNION ALL
SELECT tc.instance_gobject_id, t.gobject_id, t.tag_name, t.derived_from_gobject_id, tc.depth + 1
FROM template_chain tc
INNER JOIN gobject t ON t.gobject_id = tc.derived_from_gobject_id
WHERE tc.derived_from_gobject_id <> 0 AND tc.depth < 10
)
SELECT DISTINCT
g.gobject_id,
g.tag_name,
g.contained_name,
CASE WHEN g.contained_name IS NULL OR g.contained_name = ''
THEN g.tag_name
ELSE g.contained_name
END AS browse_name,
CASE WHEN g.contained_by_gobject_id = 0
THEN g.area_gobject_id
ELSE g.contained_by_gobject_id
END AS parent_gobject_id,
CASE WHEN td.category_id = 13
THEN 1
ELSE 0
END AS is_area,
td.category_id AS category_id,
g.hosted_by_gobject_id AS hosted_by_gobject_id,
ISNULL(
STUFF((
SELECT '|' + tc.template_tag_name
FROM template_chain tc
WHERE tc.instance_gobject_id = g.gobject_id
ORDER BY tc.depth
FOR XML PATH('')
), 1, 1, ''),
''
) AS template_chain
FROM gobject g
INNER JOIN template_definition td
ON g.template_definition_id = td.template_definition_id
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND g.is_template = 0
AND g.deployed_package_id <> 0
ORDER BY parent_gobject_id, g.tag_name";
private const string AttributesSql = @"
;WITH deployed_package_chain AS (
SELECT g.gobject_id, p.package_id, p.derived_from_package_id, 0 AS depth
FROM gobject g
INNER JOIN package p ON p.package_id = g.deployed_package_id
WHERE g.is_template = 0 AND g.deployed_package_id <> 0
UNION ALL
SELECT dpc.gobject_id, p.package_id, p.derived_from_package_id, dpc.depth + 1
FROM deployed_package_chain dpc
INNER JOIN package p ON p.package_id = dpc.derived_from_package_id
WHERE dpc.derived_from_package_id <> 0 AND dpc.depth < 10
)
SELECT gobject_id, tag_name, attribute_name, full_tag_reference,
mx_data_type, data_type_name, is_array, array_dimension,
mx_attribute_category, security_classification, is_historized, is_alarm
FROM (
SELECT
dpc.gobject_id,
g.tag_name,
da.attribute_name,
g.tag_name + '.' + da.attribute_name
+ CASE WHEN da.is_array = 1 THEN '[]' ELSE '' END
AS full_tag_reference,
da.mx_data_type,
dt.description AS data_type_name,
da.is_array,
CASE WHEN da.is_array = 1
THEN CONVERT(int, CONVERT(varbinary(2),
SUBSTRING(da.mx_value, 15, 2) + SUBSTRING(da.mx_value, 13, 2), 2))
ELSE NULL
END AS array_dimension,
da.mx_attribute_category,
da.security_classification,
CASE WHEN EXISTS (
SELECT 1 FROM deployed_package_chain dpc2
INNER JOIN primitive_instance pi ON pi.package_id = dpc2.package_id AND pi.primitive_name = da.attribute_name
INNER JOIN primitive_definition pd ON pd.primitive_definition_id = pi.primitive_definition_id AND pd.primitive_name = 'HistoryExtension'
WHERE dpc2.gobject_id = dpc.gobject_id
) THEN 1 ELSE 0 END AS is_historized,
CASE WHEN EXISTS (
SELECT 1 FROM deployed_package_chain dpc2
INNER JOIN primitive_instance pi ON pi.package_id = dpc2.package_id AND pi.primitive_name = da.attribute_name
INNER JOIN primitive_definition pd ON pd.primitive_definition_id = pi.primitive_definition_id AND pd.primitive_name = 'AlarmExtension'
WHERE dpc2.gobject_id = dpc.gobject_id
) THEN 1 ELSE 0 END AS is_alarm,
ROW_NUMBER() OVER (
PARTITION BY dpc.gobject_id, da.attribute_name
ORDER BY dpc.depth
) AS rn
FROM deployed_package_chain dpc
INNER JOIN dynamic_attribute da
ON da.package_id = dpc.package_id
INNER JOIN gobject g
ON g.gobject_id = dpc.gobject_id
INNER JOIN template_definition td
ON td.template_definition_id = g.template_definition_id
LEFT JOIN data_type dt
ON dt.mx_data_type = da.mx_data_type
WHERE td.category_id IN (1, 3, 4, 10, 11, 13, 17, 24, 26)
AND da.attribute_name NOT LIKE '[_]%'
AND da.attribute_name NOT LIKE '%.Description'
AND da.mx_attribute_category IN (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 24)
) ranked
WHERE rn = 1
ORDER BY tag_name, attribute_name";
}
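Note on the `template_chain` column: `HierarchySql` builds it as a pipe-delimited string ordered by derivation depth, so the immediate parent template comes first, and `GetHierarchyAsync` splits it on the client. A minimal standalone sketch of that split follows. The helper name `ParseTemplateChain` and the sample template names are hypothetical and are not part of the repository class.

```csharp
using System;
using System.Linq;

// Hypothetical helper mirroring the split performed in GetHierarchyAsync:
// empty input yields an empty chain; entries are trimmed and blanks dropped.
static string[] ParseTemplateChain(string raw) =>
    string.IsNullOrEmpty(raw)
        ? Array.Empty<string>()
        : raw.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
             .Select(s => s.Trim())
             .Where(s => s.Length > 0)
             .ToArray();

// Sample input with made-up template names and stray whitespace.
var chain = ParseTemplateChain("$Pump_v2| $Pump |$UserDefined");
Console.WriteLine(string.Join(" -> ", chain)); // prints: $Pump_v2 -> $Pump -> $UserDefined
```

Trimming and dropping empty entries on the client keeps the row mapping tolerant of a leading or doubled delimiter in the SQL output.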
@@ -1,13 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
/// <summary>
/// Connection settings for the Galaxy <c>ZB</c> repository database. Set from the
/// <c>DriverConfig</c> JSON section <c>Database</c> per <c>plan.md</c> §"Galaxy DriverConfig".
/// </summary>
public sealed class GalaxyRepositoryOptions
{
public string ConnectionString { get; init; } =
"Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;";
public int CommandTimeoutSeconds { get; init; } = 60;
}
@@ -1,46 +0,0 @@
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
/// <summary>
/// Galaxy data-plane abstraction. Replaces the placeholder <c>StubFrameHandler</c> with a
/// real boundary the lifted <c>MxAccessClient</c> + <c>GalaxyRepository</c> implement during
/// Phase 2 Task B.1. Splitting the IPC dispatch (<c>GalaxyFrameHandler</c>) from the
/// backend means the dispatcher is unit-testable against an in-memory mock without needing
/// live Galaxy.
/// </summary>
public interface IGalaxyBackend
{
/// <summary>
/// Server-pushed events the backend raises asynchronously (data-change, alarm,
/// host-status). The frame handler subscribes once on connect and forwards each
/// event to the Proxy as a typed <see cref="MessageKind"/> notification.
/// </summary>
event System.EventHandler<OnDataChangeNotification>? OnDataChange;
event System.EventHandler<GalaxyAlarmEvent>? OnAlarmEvent;
event System.EventHandler<HostConnectivityStatus>? OnHostStatusChanged;
Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct);
Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct);
Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct);
Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct);
Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct);
Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct);
Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct);
Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct);
Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct);
Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct);
Task<HistoryReadProcessedResponse> HistoryReadProcessedAsync(HistoryReadProcessedRequest req, CancellationToken ct);
Task<HistoryReadAtTimeResponse> HistoryReadAtTimeAsync(HistoryReadAtTimeRequest req, CancellationToken ct);
Task<HistoryReadEventsResponse> HistoryReadEventsAsync(HistoryReadEventsRequest req, CancellationToken ct);
Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct);
}
@@ -1,43 +0,0 @@
using ArchestrA.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
/// <summary>
/// Delegate matching <c>LMXProxyServer.OnDataChange</c> COM event signature. Allows
/// <see cref="MxAccessClient"/> to subscribe via the abstracted <see cref="IMxProxy"/>
/// instead of the COM object directly (so the test mock works without MXAccess registered).
/// </summary>
public delegate void MxDataChangeHandler(
int hLMXServerHandle,
int phItemHandle,
object pvItemValue,
int pwItemQuality,
object pftItemTimeStamp,
ref MXSTATUS_PROXY[] ItemStatus);
public delegate void MxWriteCompleteHandler(
int hLMXServerHandle,
int phItemHandle,
ref MXSTATUS_PROXY[] ItemStatus);
/// <summary>
/// Abstraction over <c>LMXProxyServer</c> — port of v1 <c>IMxProxy</c>. Same surface area
/// so the lifted client behaves identically; only the namespace + apartment-marshalling
/// entry-point change.
/// </summary>
public interface IMxProxy
{
int Register(string clientName);
void Unregister(int handle);
int AddItem(int handle, string address);
void RemoveItem(int handle, int itemHandle);
void AdviseSupervisory(int handle, int itemHandle);
void UnAdviseSupervisory(int handle, int itemHandle);
void Write(int handle, int itemHandle, object value, int securityClassification);
event MxDataChangeHandler? OnDataChange;
event MxWriteCompleteHandler? OnWriteComplete;
}
@@ -1,408 +0,0 @@
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using ArchestrA.MxAccess;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
/// <summary>
/// MXAccess runtime client — focused port of v1 <c>MxAccessClient</c>. Owns one
/// <c>LMXProxyServer</c> COM connection on the supplied <see cref="StaPump"/>; serializes
/// read / write / subscribe through the pump because all COM calls must run on the STA
/// thread. Subscriptions are stored so the background monitor
/// (<see cref="MonitorLoopAsync"/>) can replay them after a reconnect. Covers
/// connect/read/write/subscribe/unsubscribe, the MVP needed for parity testing.
/// </summary>
public sealed class MxAccessClient : IDisposable
{
private static readonly ILogger Log = Serilog.Log.ForContext<MxAccessClient>();
private readonly StaPump _pump;
private readonly IMxProxy _proxy;
private readonly string _clientName;
private readonly MxAccessClientOptions _options;
// Galaxy attribute reference → MXAccess item handle (set on first Subscribe/Read).
private readonly ConcurrentDictionary<string, int> _addressToHandle = new(StringComparer.OrdinalIgnoreCase);
private readonly ConcurrentDictionary<int, string> _handleToAddress = new();
private readonly ConcurrentDictionary<string, Action<string, Vtq>> _subscriptions =
new(StringComparer.OrdinalIgnoreCase);
private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pendingWrites = new();
private int _connectionHandle;
private bool _connected;
private DateTime _lastObservedActivityUtc = DateTime.UtcNow;
private CancellationTokenSource? _monitorCts;
private int _reconnectCount;
private bool _disposed;
/// <summary>Fires whenever the connection transitions Connected ↔ Disconnected.</summary>
public event EventHandler<bool>? ConnectionStateChanged;
/// <summary>
/// Fires once per failed subscription replay after a reconnect. Carries the tag reference
/// and the exception so the backend can propagate the degradation signal (e.g. mark the
/// subscription bad on the Proxy side rather than silently losing its callback). Added for
/// PR 6 low finding #2 — the replay loop previously ate per-tag failures silently and an
/// operator would only find out that a specific subscription stopped updating through a
/// data-quality complaint from downstream.
/// </summary>
public event EventHandler<SubscriptionReplayFailedEventArgs>? SubscriptionReplayFailed;
public MxAccessClient(StaPump pump, IMxProxy proxy, string clientName, MxAccessClientOptions? options = null)
{
_pump = pump;
_proxy = proxy;
_clientName = clientName;
_options = options ?? new MxAccessClientOptions();
_proxy.OnDataChange += OnDataChange;
_proxy.OnWriteComplete += OnWriteComplete;
}
public bool IsConnected => _connected;
public int SubscriptionCount => _subscriptions.Count;
public int ReconnectCount => _reconnectCount;
/// <summary>
/// Wonderware client identity used when registering with the LMXProxyServer. Surfaced so
/// <see cref="Backend.MxAccessGalaxyBackend"/> can tag its <c>OnHostStatusChanged</c> IPC
/// pushes with a stable gateway name per PR 8.
/// </summary>
public string ClientName => _clientName;
/// <summary>Connects on the STA thread. Idempotent. Starts the reconnect monitor on first call.</summary>
public async Task<int> ConnectAsync()
{
var handle = await _pump.InvokeAsync(() =>
{
if (_connected) return _connectionHandle;
_connectionHandle = _proxy.Register(_clientName);
_connected = true;
return _connectionHandle;
});
ConnectionStateChanged?.Invoke(this, true);
if (_options.AutoReconnect && _monitorCts is null)
{
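_monitorCts = new CancellationTokenSource();
// Capture the token now: DisconnectAsync may null the field before the task body runs.
var monitorToken = _monitorCts.Token;
_ = Task.Run(() => MonitorLoopAsync(monitorToken));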
}
return handle;
}
public async Task DisconnectAsync()
{
_monitorCts?.Cancel();
_monitorCts = null;
await _pump.InvokeAsync(() =>
{
if (!_connected) return;
try { _proxy.Unregister(_connectionHandle); }
finally
{
_connected = false;
_addressToHandle.Clear();
_handleToAddress.Clear();
}
});
ConnectionStateChanged?.Invoke(this, false);
}
/// <summary>
/// Background loop that watches for connection liveness signals and triggers
/// reconnect-with-replay when the connection appears dead. Per Phase 2 high finding #2:
/// v1's MxAccessClient.Monitor pattern lifted into the new pump-based client. Uses
/// observed-activity timestamp + optional probe-tag subscription. Without an explicit
/// probe tag, falls back to "no data change in N seconds + no successful read in N
/// seconds = unhealthy" — same shape as v1.
/// </summary>
private async Task MonitorLoopAsync(CancellationToken ct)
{
while (!ct.IsCancellationRequested)
{
try { await Task.Delay(_options.MonitorInterval, ct); }
catch (OperationCanceledException) { break; }
if (!_connected || _disposed) continue;
var idle = DateTime.UtcNow - _lastObservedActivityUtc;
if (idle <= _options.StaleThreshold) continue;
// Probe: try a no-op COM call. If the proxy is dead, the call will throw — that's
// our reconnect signal. PR 6 low finding #1: AddItem allocates an MXAccess item
// handle; we must RemoveItem it on the same pump turn or the long-running monitor
// leaks one handle per probe cycle (one every MonitorInterval seconds, indefinitely).
bool probeOk;
try
{
probeOk = await _pump.InvokeAsync(() =>
{
int probeHandle = 0;
try
{
probeHandle = _proxy.AddItem(_connectionHandle, "$Heartbeat");
return probeHandle > 0;
}
catch { return false; }
finally
{
if (probeHandle > 0)
{
try { _proxy.RemoveItem(_connectionHandle, probeHandle); }
catch { /* proxy is dying; best-effort cleanup */ }
}
}
});
}
catch { probeOk = false; }
if (probeOk)
{
_lastObservedActivityUtc = DateTime.UtcNow;
continue;
}
// Connection appears dead — reconnect-with-replay.
try
{
await _pump.InvokeAsync(() =>
{
try { _proxy.Unregister(_connectionHandle); } catch { /* dead anyway */ }
_connected = false;
});
ConnectionStateChanged?.Invoke(this, false);
await _pump.InvokeAsync(() =>
{
_connectionHandle = _proxy.Register(_clientName);
_connected = true;
});
_reconnectCount++;
ConnectionStateChanged?.Invoke(this, true);
// Replay every subscription that was active before the disconnect. PR 6 low
// finding #2: surface per-tag failures — log them and raise
// SubscriptionReplayFailed so the backend can propagate the degraded state
// (previously swallowed silently; downstream quality dropped without a signal).
var snapshot = _addressToHandle.Keys.ToArray();
_addressToHandle.Clear();
_handleToAddress.Clear();
var failed = 0;
foreach (var fullRef in snapshot)
{
try { await SubscribeOnPumpAsync(fullRef); }
catch (Exception subEx)
{
failed++;
Log.Warning(subEx,
"MXAccess subscription replay failed for {TagReference} after reconnect #{Reconnect}",
fullRef, _reconnectCount);
SubscriptionReplayFailed?.Invoke(this,
new SubscriptionReplayFailedEventArgs(fullRef, subEx));
}
}
if (failed > 0)
Log.Warning("Subscription replay completed — {Failed} of {Total} failed", failed, snapshot.Length);
else
Log.Information("Subscription replay completed — {Total} re-subscribed cleanly", snapshot.Length);
_lastObservedActivityUtc = DateTime.UtcNow;
}
catch
{
// Reconnect failed; back off and retry on the next tick.
_connected = false;
}
}
}
/// <summary>
/// One-shot read implemented as a transient subscribe + unsubscribe.
/// <c>LMXProxyServer</c> doesn't expose a synchronous read, so the canonical pattern
/// (lifted from v1) is to subscribe, await the first OnDataChange, then unsubscribe.
/// This method captures that single value.
/// </summary>
public async Task<Vtq> ReadAsync(string fullReference, TimeSpan timeout, CancellationToken ct)
{
if (!_connected) throw new InvalidOperationException("MxAccessClient not connected");
var tcs = new TaskCompletionSource<Vtq>(TaskCreationOptions.RunContinuationsAsynchronously);
Action<string, Vtq> oneShot = (_, value) => tcs.TrySetResult(value);
// Stash the one-shot handler before sending the subscribe, then remove it after firing.
_subscriptions.AddOrUpdate(fullReference, oneShot, (_, existing) => Combine(existing, oneShot));
var addedToReadOnlyAttribute = !_addressToHandle.ContainsKey(fullReference);
try
{
await SubscribeOnPumpAsync(fullReference);
using var _ = ct.Register(() => tcs.TrySetCanceled());
var raceTask = await Task.WhenAny(tcs.Task, Task.Delay(timeout, ct));
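if (raceTask != tcs.Task)
{
// Task.Delay also completes (as cancelled) when ct fires; report that as cancellation, not as a timeout.
ct.ThrowIfCancellationRequested();
throw new TimeoutException($"MXAccess read of {fullReference} timed out after {timeout}");
}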
return await tcs.Task;
}
finally
{
// High 1 — always detach the one-shot handler, even on cancellation/timeout/throw.
// If we were the one who added the underlying MXAccess subscription (no other
// caller had it), tear it down too so we don't leak a probe item handle.
_subscriptions.AddOrUpdate(fullReference, _ => default!, (_, existing) => Remove(existing, oneShot));
if (addedToReadOnlyAttribute)
{
try { await UnsubscribeAsync(fullReference); }
catch { /* shutdown-best-effort */ }
}
}
}
/// <summary>
/// Writes <paramref name="value"/> to the runtime and AWAITS the OnWriteComplete
/// callback so the caller learns the actual write status. Per Phase 2 medium finding #4
/// in <c>exit-gate-phase-2.md</c>: the previous fire-and-forget version returned a
/// false-positive Good even when the runtime rejected the write post-callback.
/// </summary>
public async Task<bool> WriteAsync(string fullReference, object value,
int securityClassification = 0, TimeSpan? timeout = null)
{
if (!_connected) throw new InvalidOperationException("MxAccessClient not connected");
var actualTimeout = timeout ?? TimeSpan.FromSeconds(5);
var itemHandle = await _pump.InvokeAsync(() => ResolveItem(fullReference));
var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
if (!_pendingWrites.TryAdd(itemHandle, tcs))
{
// A prior write to the same item handle is still pending — uncommon but possible
// if the caller spammed writes. Replace it: the older TCS observes a Cancelled task.
if (_pendingWrites.TryRemove(itemHandle, out var prior))
prior.TrySetCanceled();
_pendingWrites[itemHandle] = tcs;
}
try
{
await _pump.InvokeAsync(() =>
_proxy.Write(_connectionHandle, itemHandle, value, securityClassification));
var raceTask = await Task.WhenAny(tcs.Task, Task.Delay(actualTimeout));
if (raceTask != tcs.Task)
throw new TimeoutException($"MXAccess write of {fullReference} timed out after {actualTimeout}");
return await tcs.Task;
}
finally
{
_pendingWrites.TryRemove(itemHandle, out _);
}
}
public async Task SubscribeAsync(string fullReference, Action<string, Vtq> callback)
{
if (!_connected) throw new InvalidOperationException("MxAccessClient not connected");
_subscriptions.AddOrUpdate(fullReference, callback, (_, existing) => Combine(existing, callback));
await SubscribeOnPumpAsync(fullReference);
}
public Task UnsubscribeAsync(string fullReference) => _pump.InvokeAsync(() =>
{
if (!_connected) return;
if (!_addressToHandle.TryRemove(fullReference, out var handle)) return;
_handleToAddress.TryRemove(handle, out _);
_subscriptions.TryRemove(fullReference, out _);
try
{
_proxy.UnAdviseSupervisory(_connectionHandle, handle);
_proxy.RemoveItem(_connectionHandle, handle);
}
catch { /* best-effort during teardown */ }
});
private Task<int> SubscribeOnPumpAsync(string fullReference) => _pump.InvokeAsync(() =>
{
if (_addressToHandle.TryGetValue(fullReference, out var existing)) return existing;
var itemHandle = _proxy.AddItem(_connectionHandle, fullReference);
_addressToHandle[fullReference] = itemHandle;
_handleToAddress[itemHandle] = fullReference;
_proxy.AdviseSupervisory(_connectionHandle, itemHandle);
return itemHandle;
});
private int ResolveItem(string fullReference)
{
if (_addressToHandle.TryGetValue(fullReference, out var existing)) return existing;
var itemHandle = _proxy.AddItem(_connectionHandle, fullReference);
_addressToHandle[fullReference] = itemHandle;
_handleToAddress[itemHandle] = fullReference;
return itemHandle;
}
private void OnDataChange(int hLMXServerHandle, int phItemHandle, object pvItemValue,
int pwItemQuality, object pftItemTimeStamp, ref MXSTATUS_PROXY[] itemStatus)
{
if (!_handleToAddress.TryGetValue(phItemHandle, out var fullRef)) return;
// Liveness: any data-change event is proof the connection is alive.
_lastObservedActivityUtc = DateTime.UtcNow;
var ts = pftItemTimeStamp is DateTime dt ? dt.ToUniversalTime() : DateTime.UtcNow;
var quality = (byte)Math.Min(255, Math.Max(0, pwItemQuality));
var vtq = new Vtq(pvItemValue, ts, quality);
if (_subscriptions.TryGetValue(fullRef, out var cb)) cb?.Invoke(fullRef, vtq);
}
private void OnWriteComplete(int hLMXServerHandle, int phItemHandle, ref MXSTATUS_PROXY[] itemStatus)
{
if (_pendingWrites.TryRemove(phItemHandle, out var tcs))
tcs.TrySetResult(itemStatus is null || itemStatus.Length == 0 || itemStatus[0].success != 0);
}
private static Action<string, Vtq> Combine(Action<string, Vtq> a, Action<string, Vtq> b)
=> (Action<string, Vtq>)Delegate.Combine(a, b)!;
private static Action<string, Vtq> Remove(Action<string, Vtq> source, Action<string, Vtq> remove)
=> (Action<string, Vtq>?)Delegate.Remove(source, remove) ?? ((_, _) => { });
public void Dispose()
{
_disposed = true;
_monitorCts?.Cancel();
try { DisconnectAsync().GetAwaiter().GetResult(); }
catch { /* swallow */ }
_proxy.OnDataChange -= OnDataChange;
_proxy.OnWriteComplete -= OnWriteComplete;
_monitorCts?.Dispose();
}
}
/// <summary>
/// Tunables for <see cref="MxAccessClient"/>'s reconnect monitor. Defaults match the v1
/// monitor's polling cadence so behavior is consistent across the lift.
/// </summary>
public sealed class MxAccessClientOptions
{
/// <summary>Whether to start the background monitor at connect time.</summary>
public bool AutoReconnect { get; init; } = true;
/// <summary>How often the monitor wakes up to check liveness.</summary>
public TimeSpan MonitorInterval { get; init; } = TimeSpan.FromSeconds(5);
/// <summary>If no data-change activity in this window, the monitor probes the connection.</summary>
public TimeSpan StaleThreshold { get; init; } = TimeSpan.FromSeconds(60);
}
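Note on the `Task.WhenAny(tcs.Task, Task.Delay(...))` pattern used by `ReadAsync` and `WriteAsync`: when the delay task is tied to a cancellation token, it also finishes first on cancellation, so a timeout and a caller cancellation can look identical. A minimal sketch of one way to keep them apart, assuming a hypothetical helper `AwaitWithTimeoutAsync` that is not part of `MxAccessClient`:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not in the original code): awaits a pending result with a
// timeout and reports caller cancellation and timeout as distinct exceptions.
static async Task<T> AwaitWithTimeoutAsync<T>(Task<T> pending, TimeSpan timeout, CancellationToken ct)
{
    // Linked source lets us stop the timer once the result has arrived.
    using var timerCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
    var delay = Task.Delay(timeout, timerCts.Token);

    var winner = await Task.WhenAny(pending, delay).ConfigureAwait(false);
    if (winner == pending)
    {
        timerCts.Cancel();                 // release the timer early
        return await pending.ConfigureAwait(false);
    }

    ct.ThrowIfCancellationRequested();     // caller cancelled: not a timeout
    throw new TimeoutException($"Timed out after {timeout}");
}

// Usage: an already-completed source returns its value immediately.
var done = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
done.SetResult(42);
Console.WriteLine(await AwaitWithTimeoutAsync(done.Task, TimeSpan.FromSeconds(1), CancellationToken.None)); // prints: 42
```

Checking the token before throwing `TimeoutException` means callers that cancel see an `OperationCanceledException` rather than a misleading timeout.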
@@ -1,68 +0,0 @@
using System;
using System.Runtime.InteropServices;
using ArchestrA.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
/// <summary>
/// Concrete <see cref="IMxProxy"/> backed by a real <c>LMXProxyServer</c> COM object.
/// Port of v1 <c>MxProxyAdapter</c>. <strong>Must only be constructed on an STA thread</strong>
/// — the StaPump owns this instance.
/// </summary>
public sealed class MxProxyAdapter : IMxProxy, IDisposable
{
private LMXProxyServer? _lmxProxy;
public event MxDataChangeHandler? OnDataChange;
public event MxWriteCompleteHandler? OnWriteComplete;
public int Register(string clientName)
{
_lmxProxy = new LMXProxyServer();
_lmxProxy.OnDataChange += ProxyOnDataChange;
_lmxProxy.OnWriteComplete += ProxyOnWriteComplete;
var handle = _lmxProxy.Register(clientName);
if (handle <= 0)
throw new InvalidOperationException($"LMXProxyServer.Register returned invalid handle: {handle}");
return handle;
}
public void Unregister(int handle)
{
if (_lmxProxy is null) return;
try
{
_lmxProxy.OnDataChange -= ProxyOnDataChange;
_lmxProxy.OnWriteComplete -= ProxyOnWriteComplete;
_lmxProxy.Unregister(handle);
}
finally
{
// ReleaseComObject loop until refcount = 0 — the Tier C SafeHandle wraps this in
// production; here the lifetime is owned by the surrounding MxAccessHandle.
while (Marshal.IsComObject(_lmxProxy) && Marshal.ReleaseComObject(_lmxProxy) > 0) { }
_lmxProxy = null;
}
}
public int AddItem(int handle, string address) => _lmxProxy!.AddItem(handle, address);
public void RemoveItem(int handle, int itemHandle) => _lmxProxy!.RemoveItem(handle, itemHandle);
public void AdviseSupervisory(int handle, int itemHandle) => _lmxProxy!.AdviseSupervisory(handle, itemHandle);
public void UnAdviseSupervisory(int handle, int itemHandle) => _lmxProxy!.UnAdvise(handle, itemHandle);
public void Write(int handle, int itemHandle, object value, int securityClassification) =>
_lmxProxy!.Write(handle, itemHandle, value, securityClassification);
private void ProxyOnDataChange(int hLMXServerHandle, int phItemHandle, object pvItemValue,
int pwItemQuality, object pftItemTimeStamp, ref MXSTATUS_PROXY[] ItemStatus)
=> OnDataChange?.Invoke(hLMXServerHandle, phItemHandle, pvItemValue, pwItemQuality, pftItemTimeStamp, ref ItemStatus);
private void ProxyOnWriteComplete(int hLMXServerHandle, int phItemHandle, ref MXSTATUS_PROXY[] ItemStatus)
=> OnWriteComplete?.Invoke(hLMXServerHandle, phItemHandle, ref ItemStatus);
public void Dispose() => Unregister(0);
}
@@ -1,20 +0,0 @@
using System;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
/// <summary>
/// Fired by <see cref="MxAccessClient.SubscriptionReplayFailed"/> when a previously-active
/// subscription fails to be restored after a reconnect. The backend should treat the tag as
/// unhealthy until the next successful resubscribe.
/// </summary>
public sealed class SubscriptionReplayFailedEventArgs : EventArgs
{
public SubscriptionReplayFailedEventArgs(string tagReference, Exception exception)
{
TagReference = tagReference;
Exception = exception;
}
public string TagReference { get; }
public Exception Exception { get; }
}
@@ -1,24 +0,0 @@
using System;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
/// <summary>Value-timestamp-quality triplet — port of v1 <c>Vtq</c>.</summary>
public readonly struct Vtq
{
public object? Value { get; }
public DateTime TimestampUtc { get; }
public byte Quality { get; }
public Vtq(object? value, DateTime timestampUtc, byte quality)
{
Value = value;
TimestampUtc = timestampUtc;
Quality = quality;
}
/// <summary>OPC DA Good = 192.</summary>
public static Vtq Good(object? v) => new(v, DateTime.UtcNow, 192);
/// <summary>OPC DA Bad = 0.</summary>
public static Vtq Bad() => new(null, DateTime.UtcNow, 0);
}
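Note on the `Quality` byte: `MxAccessClient.OnDataChange` clamps the `int` quality reported by MXAccess into the 0..255 range before building a `Vtq`, and the struct's helpers use the OPC DA values 192 (Good) and 0 (Bad). A minimal sketch of that clamp plus a classification by the two high bits of the OPC DA quality byte follows. Both helper names are hypothetical, and the bit-mask classification is an illustration based on the OPC DA quality layout, not something the original code performs.

```csharp
using System;

// Hypothetical helper: the same clamp OnDataChange applies to pwItemQuality.
static byte ClampQuality(int raw) => (byte)Math.Min(255, Math.Max(0, raw));

// Hypothetical helper: the OPC DA quality byte carries its major status in the top two bits.
static string Classify(byte quality) => (quality & 0xC0) switch
{
    0xC0 => "Good",
    0x40 => "Uncertain",
    _ => "Bad",
};

Console.WriteLine(Classify(ClampQuality(192))); // prints: Good
Console.WriteLine(Classify(ClampQuality(64)));  // prints: Uncertain
Console.WriteLine(Classify(ClampQuality(-1)));  // prints: Bad (clamped to 0)
```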
@@ -1,608 +0,0 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Alarms;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Stability;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
/// <summary>
/// Production <see cref="IGalaxyBackend"/> — combines the SQL-backed
/// <see cref="GalaxyRepository"/> for Discover with the live MXAccess
/// <see cref="MxAccessClient"/> for Read / Write / Subscribe. History stays bad-coded
/// until the Wonderware Historian SDK plugin loader (Task B.1.h) lands. Alarms come from
/// MxAccess <c>AlarmExtension</c> primitives but the wire-up is also Phase 2 follow-up
/// (the v1 alarm subsystem is its own subtree).
/// </summary>
public sealed class MxAccessGalaxyBackend : IGalaxyBackend, IDisposable
{
private readonly GalaxyRepository _repository;
private readonly MxAccessClient _mx;
private readonly IHistorianDataSource? _historian;
private long _nextSessionId;
private long _nextSubscriptionId;
// Active SubscriptionId → MXAccess full reference list — so Unsubscribe can find them.
private readonly System.Collections.Concurrent.ConcurrentDictionary<long, IReadOnlyList<string>> _subs = new();
// Reverse lookup: tag reference → subscription IDs subscribed to it (one tag may belong to many).
private readonly System.Collections.Concurrent.ConcurrentDictionary<string, System.Collections.Concurrent.ConcurrentBag<long>>
_refToSubs = new(System.StringComparer.OrdinalIgnoreCase);
public event System.EventHandler<OnDataChangeNotification>? OnDataChange;
public event System.EventHandler<GalaxyAlarmEvent>? OnAlarmEvent;
public event System.EventHandler<HostConnectivityStatus>? OnHostStatusChanged;
private readonly System.EventHandler<bool> _onConnectionStateChanged;
private readonly GalaxyRuntimeProbeManager _probeManager;
private readonly System.EventHandler<HostStateTransition> _onProbeStateChanged;
private readonly GalaxyAlarmTracker _alarmTracker;
private readonly System.EventHandler<AlarmTransition> _onAlarmTransition;
// Cached during DiscoverAsync so SubscribeAlarmsAsync knows which attributes to advise.
// One entry per IsAlarm=true attribute in the last discovered hierarchy.
private readonly System.Collections.Concurrent.ConcurrentBag<string> _discoveredAlarmTags = new();
public MxAccessGalaxyBackend(GalaxyRepository repository, MxAccessClient mx, IHistorianDataSource? historian = null)
{
_repository = repository;
_mx = mx;
_historian = historian;
// PR 8: gateway-level host-status push. When the MXAccess COM proxy transitions
// connected↔disconnected, raise OnHostStatusChanged with a synthetic host entry named
// after the Wonderware client identity so the Admin UI surfaces top-level transport
// health separately from the per-platform ScanState probes (GalaxyRuntimeProbeManager,
// wired up under PR 13 in this constructor).
_onConnectionStateChanged = (_, connected) =>
{
OnHostStatusChanged?.Invoke(this, new HostConnectivityStatus
{
HostName = _mx.ClientName,
RuntimeStatus = connected ? "Running" : "Stopped",
LastObservedUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
});
};
_mx.ConnectionStateChanged += _onConnectionStateChanged;
// PR 13: per-platform runtime probes. ScanState subscriptions fire OnProbeCallback,
// which runs the state machine and raises StateChanged on transitions we care about.
// We forward each transition through the same OnHostStatusChanged IPC event that the
// gateway-level ConnectionStateChanged uses — tagged with the platform's TagName so the
// Admin UI can show per-host health independently from the top-level transport status.
_probeManager = new GalaxyRuntimeProbeManager(
subscribe: (probe, cb) => _mx.SubscribeAsync(probe, cb),
unsubscribe: probe => _mx.UnsubscribeAsync(probe));
_onProbeStateChanged = (_, t) =>
{
OnHostStatusChanged?.Invoke(this, new HostConnectivityStatus
{
HostName = t.TagName,
RuntimeStatus = t.NewState switch
{
HostRuntimeState.Running => "Running",
HostRuntimeState.Stopped => "Stopped",
_ => "Unknown",
},
LastObservedUtcUnixMs = new DateTimeOffset(t.AtUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
});
};
_probeManager.StateChanged += _onProbeStateChanged;
// PR 14: alarm subsystem. Per IsAlarm=true attribute discovered, subscribe to the four
// alarm-state attributes (.InAlarm/.Priority/.DescAttrName/.Acked), track lifecycle,
// and raise GalaxyAlarmEvent on transitions — forwarded through the existing
// OnAlarmEvent IPC event that the PR 4 ConnectionSink already wires into AlarmEvent frames.
_alarmTracker = new GalaxyAlarmTracker(
subscribe: (tag, cb) => _mx.SubscribeAsync(tag, cb),
unsubscribe: tag => _mx.UnsubscribeAsync(tag),
write: (tag, v) => _mx.WriteAsync(tag, v));
_onAlarmTransition = (_, t) => OnAlarmEvent?.Invoke(this, new GalaxyAlarmEvent
{
EventId = Guid.NewGuid().ToString("N"),
ObjectTagName = t.AlarmTag,
AlarmName = t.AlarmTag,
Severity = t.Priority,
StateTransition = t.Transition switch
{
AlarmStateTransition.Active => "Active",
AlarmStateTransition.Acknowledged => "Acknowledged",
AlarmStateTransition.Inactive => "Inactive",
_ => "Unknown",
},
Message = t.DescAttrName ?? t.AlarmTag,
UtcUnixMs = new DateTimeOffset(t.AtUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
});
_alarmTracker.TransitionRaised += _onAlarmTransition;
}
/// <summary>
/// Exposed for tests. Production flow: <c>DiscoverAsync</c> calls
/// <c>GalaxyRuntimeProbeManager.SyncAsync</c> with the runtime hosts (WinPlatform + AppEngine
/// gobjects) to advise ScanState per host.
/// </summary>
internal GalaxyRuntimeProbeManager ProbeManager => _probeManager;
public async Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct)
{
try
{
await _mx.ConnectAsync();
return new OpenSessionResponse { Success = true, SessionId = Interlocked.Increment(ref _nextSessionId) };
}
catch (Exception ex)
{
return new OpenSessionResponse { Success = false, Error = $"MXAccess connect failed: {ex.Message}" };
}
}
public async Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct)
{
await _mx.DisconnectAsync();
}
public async Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct)
{
try
{
var hierarchy = await _repository.GetHierarchyAsync(ct).ConfigureAwait(false);
var attributes = await _repository.GetAttributesAsync(ct).ConfigureAwait(false);
var attrsByGobject = attributes
.GroupBy(a => a.GobjectId)
.ToDictionary(g => g.Key, g => g.Select(MapAttribute).ToArray());
var nameByGobject = hierarchy.ToDictionary(o => o.GobjectId, o => o.TagName);
var objects = hierarchy.Select(o => new GalaxyObjectInfo
{
ContainedName = string.IsNullOrEmpty(o.ContainedName) ? o.TagName : o.ContainedName,
TagName = o.TagName,
ParentContainedName = o.ParentGobjectId != 0 && nameByGobject.TryGetValue(o.ParentGobjectId, out var p) ? p : null,
TemplateCategory = MapCategory(o.CategoryId),
Attributes = attrsByGobject.TryGetValue(o.GobjectId, out var a) ? a : Array.Empty<GalaxyAttributeInfo>(),
}).ToArray();
// PR 14: cache alarm-bearing attribute full refs so SubscribeAlarmsAsync can advise
// them on demand. Format matches the Galaxy reference grammar <tag>.<attr>.
var freshAlarmTags = attributes
.Where(a => a.IsAlarm)
.Select(a => nameByGobject.TryGetValue(a.GobjectId, out var tn)
? tn + "." + a.AttributeName
: null)
.Where(s => !string.IsNullOrWhiteSpace(s))
.Cast<string>()
.ToArray();
while (_discoveredAlarmTags.TryTake(out _)) { }
foreach (var t in freshAlarmTags) _discoveredAlarmTags.Add(t);
// PR 13: Sync the per-platform probe manager against the just-discovered hierarchy
// so ScanState subscriptions track the current runtime set. Best-effort — probe
// failures don't block Discover from returning, since the gateway-level signal from
// MxAccessClient.ConnectionStateChanged still flows and the Admin UI degrades to
// that level if any per-host probe couldn't advise.
try
{
var targets = hierarchy
.Where(o => o.CategoryId == GalaxyRuntimeProbeManager.CategoryWinPlatform
|| o.CategoryId == GalaxyRuntimeProbeManager.CategoryAppEngine)
.Select(o => new HostProbeTarget(o.TagName, o.CategoryId));
await _probeManager.SyncAsync(targets).ConfigureAwait(false);
}
catch { /* swallow — Discover succeeded; probes are a diagnostic enrichment */ }
return new DiscoverHierarchyResponse { Success = true, Objects = objects };
}
catch (Exception ex)
{
return new DiscoverHierarchyResponse { Success = false, Error = ex.Message, Objects = Array.Empty<GalaxyObjectInfo>() };
}
}
public async Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct)
{
if (!_mx.IsConnected) return new ReadValuesResponse { Success = false, Error = "Not connected", Values = Array.Empty<GalaxyDataValue>() };
var results = new List<GalaxyDataValue>(req.TagReferences.Length);
foreach (var reference in req.TagReferences)
{
try
{
var vtq = await _mx.ReadAsync(reference, TimeSpan.FromSeconds(5), ct);
results.Add(ToWire(reference, vtq));
}
catch (Exception ex)
{
results.Add(new GalaxyDataValue
{
TagReference = reference,
StatusCode = 0x80020000u, // Bad_InternalError
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
ValueBytes = MessagePackSerializer.Serialize(ex.Message),
});
}
}
return new ReadValuesResponse { Success = true, Values = results.ToArray() };
}
public async Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct)
{
var results = new List<WriteValueResult>(req.Writes.Length);
foreach (var w in req.Writes)
{
try
{
// Decode the value back from the MessagePack bytes the Proxy sent.
var value = w.ValueBytes is null
? null
: MessagePackSerializer.Deserialize<object>(w.ValueBytes);
var ok = await _mx.WriteAsync(w.TagReference, value!);
results.Add(new WriteValueResult
{
TagReference = w.TagReference,
StatusCode = ok ? 0u : 0x80020000u, // Good or Bad_InternalError
Error = ok ? null : "MXAccess runtime reported write failure",
});
}
catch (Exception ex)
{
results.Add(new WriteValueResult { TagReference = w.TagReference, StatusCode = 0x80020000u, Error = ex.Message });
}
}
return new WriteValuesResponse { Results = results.ToArray() };
}
public async Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct)
{
var sid = Interlocked.Increment(ref _nextSubscriptionId);
try
{
foreach (var tag in req.TagReferences)
{
_refToSubs.AddOrUpdate(tag,
_ => new System.Collections.Concurrent.ConcurrentBag<long> { sid },
(_, bag) => { bag.Add(sid); return bag; });
// The MXAccess SubscribeAsync only takes one callback per tag; the same callback
// fires for every active subscription of that tag — we fan out by SubscriptionId.
await _mx.SubscribeAsync(tag, OnTagValueChanged);
}
_subs[sid] = req.TagReferences;
return new SubscribeResponse { Success = true, SubscriptionId = sid, ActualIntervalMs = req.RequestedIntervalMs };
}
catch (Exception ex)
{
return new SubscribeResponse { Success = false, Error = ex.Message };
}
}
public async Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct)
{
if (!_subs.TryRemove(req.SubscriptionId, out var refs)) return;
foreach (var r in refs)
{
// Drop this subscription from the reverse map; only unsubscribe from MXAccess if no
// other subscription is still listening (multiple Proxy subs may share a tag).
if (_refToSubs.TryGetValue(r, out var bag))
{
var remaining = new System.Collections.Concurrent.ConcurrentBag<long>(
bag.Where(id => id != req.SubscriptionId));
if (remaining.IsEmpty)
{
_refToSubs.TryRemove(r, out _);
await _mx.UnsubscribeAsync(r);
}
else
{
_refToSubs[r] = remaining;
}
}
}
}
/// <summary>
/// Fires for every value change on any subscribed Galaxy attribute. Wraps the value in
/// a <see cref="GalaxyDataValue"/> and raises <see cref="OnDataChange"/> once per
/// subscription that includes this tag — the IPC sink translates that into outbound
/// <c>OnDataChangeNotification</c> frames.
/// </summary>
private void OnTagValueChanged(string fullReference, MxAccess.Vtq vtq)
{
if (!_refToSubs.TryGetValue(fullReference, out var bag) || bag.IsEmpty) return;
var wireValue = ToWire(fullReference, vtq);
// Emit one notification per active SubscriptionId for this tag — the Proxy fans out to
// each ISubscribable consumer based on the SubscriptionId in the payload.
foreach (var sid in bag.Distinct())
{
OnDataChange?.Invoke(this, new OnDataChangeNotification
{
SubscriptionId = sid,
Values = new[] { wireValue },
});
}
}
/// <summary>
/// PR 14: advise every alarm-bearing attribute's 4-attr quartet. Best-effort per-alarm —
/// a subscribe failure on one alarm doesn't abort the whole call, since operators prefer
/// partial alarm coverage to none. Idempotent on repeat calls (tracker internally
/// skips already-tracked alarms).
/// </summary>
public async Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct)
{
foreach (var tag in _discoveredAlarmTags)
{
try { await _alarmTracker.TrackAsync(tag).ConfigureAwait(false); }
catch { /* swallow per-alarm — tracker rolls back its own state on failure */ }
}
}
/// <summary>
/// PR 14: route operator ack through the tracker's AckMsg write path. EventId on the
/// incoming request maps directly to the alarm full reference (Proxy-side naming
/// convention from GalaxyProxyDriver.RaiseAlarmEvent → ev.EventId).
/// </summary>
public async Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct)
{
// Events are raised with a per-transition Guid.ToString("N") as EventId, and there is no
// reverse map from event id to alarm tag yet. v1 targeted the condition by packing
// "<tag>|<comment>" into the Comment envelope; that envelope is not parsed here.
// Until the Proxy is updated to send the alarm tag separately, fall back to treating
// the EventId as the alarm tag (the Client CLI passes it through unchanged).
var tag = req.EventId;
if (!string.IsNullOrWhiteSpace(tag))
{
try { await _alarmTracker.AcknowledgeAsync(tag, req.Comment ?? string.Empty).ConfigureAwait(false); }
catch { /* swallow — ack failures surface via MxAccessClient.WriteAsync logs */ }
}
}
public async Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct)
{
if (_historian is null)
return new HistoryReadResponse
{
Success = false,
Error = "Historian disabled — no OTOPCUA_HISTORIAN_ENABLED configuration",
Tags = Array.Empty<HistoryTagValues>(),
};
var start = DateTimeOffset.FromUnixTimeMilliseconds(req.StartUtcUnixMs).UtcDateTime;
var end = DateTimeOffset.FromUnixTimeMilliseconds(req.EndUtcUnixMs).UtcDateTime;
var tags = new List<HistoryTagValues>(req.TagReferences.Length);
try
{
foreach (var reference in req.TagReferences)
{
var samples = await _historian.ReadRawAsync(reference, start, end, (int)req.MaxValuesPerTag, ct).ConfigureAwait(false);
tags.Add(new HistoryTagValues
{
TagReference = reference,
Values = samples.Select(s => ToWire(reference, s)).ToArray(),
});
}
return new HistoryReadResponse { Success = true, Tags = tags.ToArray() };
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
return new HistoryReadResponse
{
Success = false,
Error = $"Historian read failed: {ex.Message}",
Tags = tags.ToArray(),
};
}
}
public async Task<HistoryReadProcessedResponse> HistoryReadProcessedAsync(
HistoryReadProcessedRequest req, CancellationToken ct)
{
if (_historian is null)
return new HistoryReadProcessedResponse
{
Success = false,
Error = "Historian disabled — no OTOPCUA_HISTORIAN_ENABLED configuration",
Values = Array.Empty<GalaxyDataValue>(),
};
if (req.IntervalMs <= 0)
return new HistoryReadProcessedResponse
{
Success = false,
Error = "HistoryReadProcessed requires IntervalMs > 0",
Values = Array.Empty<GalaxyDataValue>(),
};
var start = DateTimeOffset.FromUnixTimeMilliseconds(req.StartUtcUnixMs).UtcDateTime;
var end = DateTimeOffset.FromUnixTimeMilliseconds(req.EndUtcUnixMs).UtcDateTime;
try
{
var samples = await _historian.ReadAggregateAsync(
req.TagReference, start, end, req.IntervalMs, req.AggregateColumn, ct).ConfigureAwait(false);
var wire = samples.Select(s => ToWire(req.TagReference, s)).ToArray();
return new HistoryReadProcessedResponse { Success = true, Values = wire };
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
return new HistoryReadProcessedResponse
{
Success = false,
Error = $"Historian aggregate read failed: {ex.Message}",
Values = Array.Empty<GalaxyDataValue>(),
};
}
}
public async Task<HistoryReadAtTimeResponse> HistoryReadAtTimeAsync(
HistoryReadAtTimeRequest req, CancellationToken ct)
{
if (_historian is null)
return new HistoryReadAtTimeResponse
{
Success = false,
Error = "Historian disabled — no OTOPCUA_HISTORIAN_ENABLED configuration",
Values = Array.Empty<GalaxyDataValue>(),
};
if (req.TimestampsUtcUnixMs.Length == 0)
return new HistoryReadAtTimeResponse { Success = true, Values = Array.Empty<GalaxyDataValue>() };
var timestamps = req.TimestampsUtcUnixMs
.Select(ms => DateTimeOffset.FromUnixTimeMilliseconds(ms).UtcDateTime)
.ToArray();
try
{
var samples = await _historian.ReadAtTimeAsync(req.TagReference, timestamps, ct).ConfigureAwait(false);
var wire = samples.Select(s => ToWire(req.TagReference, s)).ToArray();
return new HistoryReadAtTimeResponse { Success = true, Values = wire };
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
return new HistoryReadAtTimeResponse
{
Success = false,
Error = $"Historian at-time read failed: {ex.Message}",
Values = Array.Empty<GalaxyDataValue>(),
};
}
}
public async Task<HistoryReadEventsResponse> HistoryReadEventsAsync(
HistoryReadEventsRequest req, CancellationToken ct)
{
if (_historian is null)
return new HistoryReadEventsResponse
{
Success = false,
Error = "Historian disabled — no OTOPCUA_HISTORIAN_ENABLED configuration",
Events = Array.Empty<GalaxyHistoricalEvent>(),
};
var start = DateTimeOffset.FromUnixTimeMilliseconds(req.StartUtcUnixMs).UtcDateTime;
var end = DateTimeOffset.FromUnixTimeMilliseconds(req.EndUtcUnixMs).UtcDateTime;
try
{
var events = await _historian.ReadEventsAsync(req.SourceName, start, end, req.MaxEvents, ct).ConfigureAwait(false);
var wire = events.Select(e => new GalaxyHistoricalEvent
{
EventId = e.Id.ToString(),
SourceName = e.Source,
EventTimeUtcUnixMs = new DateTimeOffset(DateTime.SpecifyKind(e.EventTime, DateTimeKind.Utc), TimeSpan.Zero).ToUnixTimeMilliseconds(),
ReceivedTimeUtcUnixMs = new DateTimeOffset(DateTime.SpecifyKind(e.ReceivedTime, DateTimeKind.Utc), TimeSpan.Zero).ToUnixTimeMilliseconds(),
DisplayText = e.DisplayText,
Severity = e.Severity,
}).ToArray();
return new HistoryReadEventsResponse { Success = true, Events = wire };
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
return new HistoryReadEventsResponse
{
Success = false,
Error = $"Historian event read failed: {ex.Message}",
Events = Array.Empty<GalaxyHistoricalEvent>(),
};
}
}
public Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct)
=> Task.FromResult(new RecycleStatusResponse { Accepted = true, GraceSeconds = 15 });
public void Dispose()
{
_alarmTracker.TransitionRaised -= _onAlarmTransition;
_alarmTracker.Dispose();
_probeManager.StateChanged -= _onProbeStateChanged;
_probeManager.Dispose();
_mx.ConnectionStateChanged -= _onConnectionStateChanged;
_historian?.Dispose();
}
private static GalaxyDataValue ToWire(string reference, Vtq vtq) => new()
{
TagReference = reference,
ValueBytes = vtq.Value is null ? null : MessagePackSerializer.Serialize(vtq.Value),
ValueMessagePackType = 0,
StatusCode = vtq.Quality >= 192 ? 0u : 0x40000000u, // Good vs Uncertain placeholder
SourceTimestampUtcUnixMs = new DateTimeOffset(vtq.TimestampUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
};
/// <summary>
/// Maps a <see cref="HistorianSample"/> (raw historian row, OPC-UA-free) to the IPC wire
/// shape. <see cref="HistorianSample.Quality"/> is translated through
/// <c>HistorianQualityMapper</c> instead of being collapsed to Good/Bad, so rich OPC DA
/// status codes (e.g. <c>BadNotConnected</c>, <c>UncertainSubNormal</c>) survive the hop
/// intact. The Proxy decodes the MessagePack value on its side of the pipe.
/// </summary>
private static GalaxyDataValue ToWire(string reference, HistorianSample sample) => new()
{
TagReference = reference,
ValueBytes = sample.Value is null ? null : MessagePackSerializer.Serialize(sample.Value),
ValueMessagePackType = 0,
StatusCode = HistorianQualityMapper.Map(sample.Quality),
SourceTimestampUtcUnixMs = new DateTimeOffset(sample.TimestampUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
};
/// <summary>
/// Maps a <see cref="HistorianAggregateSample"/> (one aggregate bucket) to the IPC wire
/// shape. A null <see cref="HistorianAggregateSample.Value"/> means the aggregate was
/// unavailable for the bucket — the Proxy translates that to OPC UA <c>BadNoData</c>.
/// </summary>
private static GalaxyDataValue ToWire(string reference, HistorianAggregateSample sample) => new()
{
TagReference = reference,
ValueBytes = sample.Value is null ? null : MessagePackSerializer.Serialize(sample.Value.Value),
ValueMessagePackType = 0,
StatusCode = sample.Value is null ? 0x809B0000u /* BadNoData */ : 0x00000000u,
SourceTimestampUtcUnixMs = new DateTimeOffset(sample.TimestampUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
};
private static GalaxyAttributeInfo MapAttribute(GalaxyAttributeRow row) => new()
{
AttributeName = row.AttributeName,
MxDataType = row.MxDataType,
IsArray = row.IsArray,
ArrayDim = row.ArrayDimension is int d and > 0 ? (uint)d : null,
SecurityClassification = row.SecurityClassification,
IsHistorized = row.IsHistorized,
IsAlarm = row.IsAlarm,
};
private static string MapCategory(int categoryId) => categoryId switch
{
1 => "$WinPlatform",
3 => "$AppEngine",
4 => "$Area",
10 => "$UserDefined",
11 => "$ApplicationObject",
13 => "$Area",
17 => "$DeviceIntegration",
24 => "$ViewEngine",
26 => "$ViewApp",
_ => $"category-{categoryId}",
};
}
@@ -1,273 +0,0 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Stability;
/// <summary>
/// Per-platform + per-AppEngine runtime probe. Subscribes to <c>&lt;TagName&gt;.ScanState</c>
/// for each $WinPlatform and $AppEngine gobject, tracks Unknown → Running → Stopped
/// transitions, and fires <see cref="StateChanged"/> so <see cref="Backend.MxAccessGalaxyBackend"/>
/// can forward per-host events through the existing IPC <c>OnHostStatusChanged</c> event.
/// Pure-logic state machine with an injected clock so it's deterministically testable —
/// port of v1 <c>GalaxyRuntimeProbeManager</c> without the OPC UA node-manager coupling.
/// </summary>
/// <remarks>
/// State machine rules (documented in v1's <c>runtimestatus.md</c> and preserved here):
/// <list type="bullet">
/// <item><c>ScanState</c> is on-change-only: a stably-Running host may go hours without a
/// callback. Running → Stopped is driven by an explicit callback (<c>ScanState=false</c>,
/// or any value delivered with quality below 192), never by starvation.</item>
/// <item>Unknown → Running is a startup transition and does NOT fire StateChanged (would
/// paint every host as "just recovered" at startup, which is noise).</item>
/// <item>Stopped → Running and Running → Stopped fire StateChanged. Unknown → Stopped
/// fires StateChanged because that's a first-known-bad signal operators need.</item>
/// <item>All public methods are thread-safe. Callbacks fire outside the internal lock to
/// avoid lock inversion with caller-owned state.</item>
/// </list>
/// </remarks>
public sealed class GalaxyRuntimeProbeManager : IDisposable
{
public const int CategoryWinPlatform = 1;
public const int CategoryAppEngine = 3;
public const string ProbeAttribute = ".ScanState";
private readonly Func<DateTime> _clock;
private readonly Func<string, Action<string, Vtq>, Task> _subscribe;
private readonly Func<string, Task> _unsubscribe;
private readonly object _lock = new();
// probe tag → per-host state
private readonly Dictionary<string, HostProbeState> _byProbe = new(StringComparer.OrdinalIgnoreCase);
// tag name → probe tag (for reverse lookup on the desired-set diff)
private readonly Dictionary<string, string> _probeByTagName = new(StringComparer.OrdinalIgnoreCase);
private bool _disposed;
/// <summary>
/// Fires on every state transition that operators should react to. See class remarks
/// for the rules on which transitions fire.
/// </summary>
public event EventHandler<HostStateTransition>? StateChanged;
public GalaxyRuntimeProbeManager(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe)
: this(subscribe, unsubscribe, () => DateTime.UtcNow) { }
internal GalaxyRuntimeProbeManager(
Func<string, Action<string, Vtq>, Task> subscribe,
Func<string, Task> unsubscribe,
Func<DateTime> clock)
{
_subscribe = subscribe ?? throw new ArgumentNullException(nameof(subscribe));
_unsubscribe = unsubscribe ?? throw new ArgumentNullException(nameof(unsubscribe));
_clock = clock ?? throw new ArgumentNullException(nameof(clock));
}
/// <summary>Number of probes currently advised. Test/dashboard hook.</summary>
public int ActiveProbeCount
{
get { lock (_lock) return _byProbe.Count; }
}
/// <summary>
/// Snapshot every currently-tracked host's state. One entry per probe.
/// </summary>
public IReadOnlyList<HostProbeSnapshot> SnapshotStates()
{
lock (_lock)
{
return _byProbe.Select(kv => new HostProbeSnapshot(
TagName: kv.Value.TagName,
State: kv.Value.State,
LastChangedUtc: kv.Value.LastStateChangeUtc)).ToList();
}
}
/// <summary>
/// Query the current runtime state for <paramref name="tagName"/>. Returns
/// <see cref="HostRuntimeState.Unknown"/> when the host is not tracked.
/// </summary>
public HostRuntimeState GetState(string tagName)
{
lock (_lock)
{
if (_probeByTagName.TryGetValue(tagName, out var probe)
&& _byProbe.TryGetValue(probe, out var state))
return state.State;
return HostRuntimeState.Unknown;
}
}
/// <summary>
/// Diff the desired host set (filtered $WinPlatform / $AppEngine from the latest Discover)
/// against the currently-tracked set and advise / unadvise as needed. Idempotent:
/// calling twice with the same set does nothing.
/// </summary>
public async Task SyncAsync(IEnumerable<HostProbeTarget> desiredHosts)
{
if (_disposed) return;
var desired = desiredHosts
.Where(h => !string.IsNullOrWhiteSpace(h.TagName))
.ToDictionary(h => h.TagName, StringComparer.OrdinalIgnoreCase);
List<string> toAdvise;
List<string> toUnadvise;
lock (_lock)
{
toAdvise = desired.Keys
.Where(tag => !_probeByTagName.ContainsKey(tag))
.ToList();
toUnadvise = _probeByTagName.Keys
.Where(tag => !desired.ContainsKey(tag))
.Select(tag => _probeByTagName[tag])
.ToList();
foreach (var tag in toAdvise)
{
var probe = tag + ProbeAttribute;
_probeByTagName[tag] = probe;
_byProbe[probe] = new HostProbeState
{
TagName = tag,
State = HostRuntimeState.Unknown,
LastStateChangeUtc = _clock(),
};
}
foreach (var probe in toUnadvise)
{
_byProbe.Remove(probe);
}
foreach (var removedTag in _probeByTagName.Keys.Where(t => !desired.ContainsKey(t)).ToList())
{
_probeByTagName.Remove(removedTag);
}
}
foreach (var tag in toAdvise)
{
var probe = tag + ProbeAttribute;
try
{
await _subscribe(probe, OnProbeCallback);
}
catch
{
// Roll back on subscribe failure so a never-advised probe is not left tracked as
// Unknown and cannot later be reported in a false Stopped state. Callers can
// re-Sync later to retry.
lock (_lock)
{
_byProbe.Remove(probe);
_probeByTagName.Remove(tag);
}
}
}
foreach (var probe in toUnadvise)
{
try { await _unsubscribe(probe); } catch { /* best-effort cleanup */ }
}
}
/// <summary>
/// Public entry point for tests and internal callbacks. Production flow: MxAccessClient's
/// SubscribeAsync delivers VTQ updates to this method, which <see cref="SyncAsync"/> wires
/// in as the probe callback. State is updated under the internal lock, and
/// <see cref="StateChanged"/> fires outside the lock for any transition that matters.
/// </summary>
public void OnProbeCallback(string probeTag, Vtq vtq)
{
if (_disposed) return;
HostStateTransition? transition = null;
lock (_lock)
{
if (!_byProbe.TryGetValue(probeTag, out var state)) return;
var isRunning = vtq.Quality >= 192 && vtq.Value is bool b && b;
var now = _clock();
var previous = state.State;
state.LastCallbackUtc = now;
if (isRunning)
{
state.GoodUpdateCount++;
if (previous != HostRuntimeState.Running)
{
state.State = HostRuntimeState.Running;
state.LastStateChangeUtc = now;
if (previous == HostRuntimeState.Stopped)
{
transition = new HostStateTransition(state.TagName, previous, HostRuntimeState.Running, now);
}
}
}
else
{
state.FailureCount++;
if (previous != HostRuntimeState.Stopped)
{
state.State = HostRuntimeState.Stopped;
state.LastStateChangeUtc = now;
transition = new HostStateTransition(state.TagName, previous, HostRuntimeState.Stopped, now);
}
}
}
if (transition is { } t)
{
StateChanged?.Invoke(this, t);
}
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
lock (_lock)
{
_byProbe.Clear();
_probeByTagName.Clear();
}
}
private sealed class HostProbeState
{
public string TagName { get; set; } = "";
public HostRuntimeState State { get; set; }
public DateTime LastStateChangeUtc { get; set; }
public DateTime? LastCallbackUtc { get; set; }
public long GoodUpdateCount { get; set; }
public long FailureCount { get; set; }
}
}
public enum HostRuntimeState
{
Unknown,
Running,
Stopped,
}
public sealed record HostStateTransition(
string TagName,
HostRuntimeState OldState,
HostRuntimeState NewState,
DateTime AtUtc);
public sealed record HostProbeSnapshot(
string TagName,
HostRuntimeState State,
DateTime LastChangedUtc);
public readonly record struct HostProbeTarget(string TagName, int CategoryId)
{
public bool IsRuntimeHost =>
CategoryId == GalaxyRuntimeProbeManager.CategoryWinPlatform
|| CategoryId == GalaxyRuntimeProbeManager.CategoryAppEngine;
}
@@ -1,121 +0,0 @@
using System.Threading;
using System.Threading.Tasks;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
/// <summary>
/// Phase 2 placeholder backend — accepts session open/close + responds to recycle, returns
/// "not-implemented" results for every data-plane call. Replaced by the lifted
/// <c>MxAccessClient</c>-backed implementation during the deferred Galaxy code move
/// (Task B.1 + parity gate). Keeps the IPC end-to-end testable today.
/// </summary>
public sealed class StubGalaxyBackend : IGalaxyBackend
{
private long _nextSessionId;
private long _nextSubscriptionId;
// Stub backend never raises events — implements the interface members for symmetry.
#pragma warning disable CS0067
public event System.EventHandler<OnDataChangeNotification>? OnDataChange;
public event System.EventHandler<GalaxyAlarmEvent>? OnAlarmEvent;
public event System.EventHandler<HostConnectivityStatus>? OnHostStatusChanged;
#pragma warning restore CS0067
public Task<OpenSessionResponse> OpenSessionAsync(OpenSessionRequest req, CancellationToken ct)
{
var id = Interlocked.Increment(ref _nextSessionId);
return Task.FromResult(new OpenSessionResponse { Success = true, SessionId = id });
}
public Task CloseSessionAsync(CloseSessionRequest req, CancellationToken ct) => Task.CompletedTask;
public Task<DiscoverHierarchyResponse> DiscoverAsync(DiscoverHierarchyRequest req, CancellationToken ct)
=> Task.FromResult(new DiscoverHierarchyResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Objects = System.Array.Empty<GalaxyObjectInfo>(),
});
public Task<ReadValuesResponse> ReadValuesAsync(ReadValuesRequest req, CancellationToken ct)
=> Task.FromResult(new ReadValuesResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<WriteValuesResponse> WriteValuesAsync(WriteValuesRequest req, CancellationToken ct)
{
var results = new WriteValueResult[req.Writes.Length];
for (var i = 0; i < req.Writes.Length; i++)
{
results[i] = new WriteValueResult
{
TagReference = req.Writes[i].TagReference,
StatusCode = 0x80020000u, // Bad_InternalError
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
};
}
return Task.FromResult(new WriteValuesResponse { Results = results });
}
public Task<SubscribeResponse> SubscribeAsync(SubscribeRequest req, CancellationToken ct)
{
var sid = Interlocked.Increment(ref _nextSubscriptionId);
return Task.FromResult(new SubscribeResponse
{
Success = true,
SubscriptionId = sid,
ActualIntervalMs = req.RequestedIntervalMs,
});
}
public Task UnsubscribeAsync(UnsubscribeRequest req, CancellationToken ct) => Task.CompletedTask;
public Task SubscribeAlarmsAsync(AlarmSubscribeRequest req, CancellationToken ct) => Task.CompletedTask;
public Task AcknowledgeAlarmAsync(AlarmAckRequest req, CancellationToken ct) => Task.CompletedTask;
public Task<HistoryReadResponse> HistoryReadAsync(HistoryReadRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Tags = System.Array.Empty<HistoryTagValues>(),
});
public Task<HistoryReadProcessedResponse> HistoryReadProcessedAsync(
HistoryReadProcessedRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadProcessedResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<HistoryReadAtTimeResponse> HistoryReadAtTimeAsync(
HistoryReadAtTimeRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadAtTimeResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Values = System.Array.Empty<GalaxyDataValue>(),
});
public Task<HistoryReadEventsResponse> HistoryReadEventsAsync(
HistoryReadEventsRequest req, CancellationToken ct)
=> Task.FromResult(new HistoryReadEventsResponse
{
Success = false,
Error = "stub: MXAccess code lift pending (Phase 2 Task B.1)",
Events = System.Array.Empty<GalaxyHistoricalEvent>(),
});
public Task<RecycleStatusResponse> RecycleAsync(RecycleHostRequest req, CancellationToken ct)
=> Task.FromResult(new RecycleStatusResponse
{
Accepted = true,
GraceSeconds = 15, // matches Phase 2 plan §B.8 default
});
}
@@ -1,183 +0,0 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
/// <summary>
/// Real IPC dispatcher — routes each <see cref="MessageKind"/> to the matching
/// <see cref="IGalaxyBackend"/> method. Replaces <see cref="StubFrameHandler"/>. Heartbeat
/// stays handled inline so liveness detection works regardless of backend health.
/// </summary>
public sealed class GalaxyFrameHandler(IGalaxyBackend backend, ILogger logger) : IFrameHandler
{
public async Task HandleAsync(MessageKind kind, byte[] body, FrameWriter writer, CancellationToken ct)
{
try
{
switch (kind)
{
case MessageKind.Heartbeat:
{
var hb = Deserialize<Heartbeat>(body);
await writer.WriteAsync(MessageKind.HeartbeatAck,
new HeartbeatAck { SequenceNumber = hb.SequenceNumber, UtcUnixMs = hb.UtcUnixMs }, ct);
return;
}
case MessageKind.OpenSessionRequest:
{
var resp = await backend.OpenSessionAsync(Deserialize<OpenSessionRequest>(body), ct);
await writer.WriteAsync(MessageKind.OpenSessionResponse, resp, ct);
return;
}
case MessageKind.CloseSessionRequest:
await backend.CloseSessionAsync(Deserialize<CloseSessionRequest>(body), ct);
return; // one-way
case MessageKind.DiscoverHierarchyRequest:
{
var resp = await backend.DiscoverAsync(Deserialize<DiscoverHierarchyRequest>(body), ct);
await writer.WriteAsync(MessageKind.DiscoverHierarchyResponse, resp, ct);
return;
}
case MessageKind.ReadValuesRequest:
{
var resp = await backend.ReadValuesAsync(Deserialize<ReadValuesRequest>(body), ct);
await writer.WriteAsync(MessageKind.ReadValuesResponse, resp, ct);
return;
}
case MessageKind.WriteValuesRequest:
{
var resp = await backend.WriteValuesAsync(Deserialize<WriteValuesRequest>(body), ct);
await writer.WriteAsync(MessageKind.WriteValuesResponse, resp, ct);
return;
}
case MessageKind.SubscribeRequest:
{
var resp = await backend.SubscribeAsync(Deserialize<SubscribeRequest>(body), ct);
await writer.WriteAsync(MessageKind.SubscribeResponse, resp, ct);
return;
}
case MessageKind.UnsubscribeRequest:
await backend.UnsubscribeAsync(Deserialize<UnsubscribeRequest>(body), ct);
return; // one-way
case MessageKind.AlarmSubscribeRequest:
await backend.SubscribeAlarmsAsync(Deserialize<AlarmSubscribeRequest>(body), ct);
return; // one-way; subsequent alarm events are server-pushed
case MessageKind.AlarmAckRequest:
await backend.AcknowledgeAlarmAsync(Deserialize<AlarmAckRequest>(body), ct);
return; // one-way
case MessageKind.HistoryReadRequest:
{
var resp = await backend.HistoryReadAsync(Deserialize<HistoryReadRequest>(body), ct);
await writer.WriteAsync(MessageKind.HistoryReadResponse, resp, ct);
return;
}
case MessageKind.HistoryReadProcessedRequest:
{
var resp = await backend.HistoryReadProcessedAsync(
Deserialize<HistoryReadProcessedRequest>(body), ct);
await writer.WriteAsync(MessageKind.HistoryReadProcessedResponse, resp, ct);
return;
}
case MessageKind.HistoryReadAtTimeRequest:
{
var resp = await backend.HistoryReadAtTimeAsync(
Deserialize<HistoryReadAtTimeRequest>(body), ct);
await writer.WriteAsync(MessageKind.HistoryReadAtTimeResponse, resp, ct);
return;
}
case MessageKind.HistoryReadEventsRequest:
{
var resp = await backend.HistoryReadEventsAsync(
Deserialize<HistoryReadEventsRequest>(body), ct);
await writer.WriteAsync(MessageKind.HistoryReadEventsResponse, resp, ct);
return;
}
case MessageKind.RecycleHostRequest:
{
var resp = await backend.RecycleAsync(Deserialize<RecycleHostRequest>(body), ct);
await writer.WriteAsync(MessageKind.RecycleStatusResponse, resp, ct);
return;
}
default:
await SendErrorAsync(writer, "unknown-kind", $"Frame kind {kind} not handled by Host", ct);
return;
}
}
catch (OperationCanceledException) { throw; }
catch (Exception ex)
{
logger.Error(ex, "GalaxyFrameHandler threw on {Kind}", kind);
await SendErrorAsync(writer, "handler-exception", ex.Message, ct);
}
}
/// <summary>
/// Subscribes the backend's server-pushed events for the lifetime of the connection.
/// The returned disposable unsubscribes when the connection closes — without it the
/// backend's static event invocation list would accumulate dead writer references and
/// leak memory + raise <see cref="ObjectDisposedException"/> on every push.
/// </summary>
public IDisposable AttachConnection(FrameWriter writer)
{
var sink = new ConnectionSink(backend, writer, logger);
sink.Attach();
return sink;
}
private static T Deserialize<T>(byte[] body) => MessagePackSerializer.Deserialize<T>(body);
private static Task SendErrorAsync(FrameWriter writer, string code, string message, CancellationToken ct)
=> writer.WriteAsync(MessageKind.ErrorResponse,
new ErrorResponse { Code = code, Message = message }, ct);
private sealed class ConnectionSink : IDisposable
{
private readonly IGalaxyBackend _backend;
private readonly FrameWriter _writer;
private readonly ILogger _logger;
private EventHandler<OnDataChangeNotification>? _onData;
private EventHandler<GalaxyAlarmEvent>? _onAlarm;
private EventHandler<HostConnectivityStatus>? _onHost;
public ConnectionSink(IGalaxyBackend backend, FrameWriter writer, ILogger logger)
{
_backend = backend; _writer = writer; _logger = logger;
}
public void Attach()
{
_onData = (_, e) => Push(MessageKind.OnDataChangeNotification, e);
_onAlarm = (_, e) => Push(MessageKind.AlarmEvent, e);
_onHost = (_, e) => Push(MessageKind.RuntimeStatusChange,
new RuntimeStatusChangeNotification { Status = e });
_backend.OnDataChange += _onData;
_backend.OnAlarmEvent += _onAlarm;
_backend.OnHostStatusChanged += _onHost;
}
private void Push<T>(MessageKind kind, T payload)
{
// Blocking push: the write completes synchronously on the event-raising thread and can
// race with disposal of the writer. We swallow ObjectDisposedException because the
// dispose path will detach this sink shortly.
try { _writer.WriteAsync(kind, payload, CancellationToken.None).GetAwaiter().GetResult(); }
catch (ObjectDisposedException) { }
catch (Exception ex) { _logger.Warning(ex, "ConnectionSink push failed for {Kind}", kind); }
}
public void Dispose()
{
if (_onData is not null) _backend.OnDataChange -= _onData;
if (_onAlarm is not null) _backend.OnAlarmEvent -= _onAlarm;
if (_onHost is not null) _backend.OnHostStatusChanged -= _onHost;
}
}
}
@@ -1,45 +0,0 @@
using System;
using System.IO.Pipes;
using System.Security.AccessControl;
using System.Security.Principal;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
/// <summary>
/// Builds the <see cref="PipeSecurity"/> required by <c>driver-stability.md §"IPC Security"</c>:
/// only the configured OtOpcUa server principal SID gets <c>ReadWrite | Synchronize</c>;
/// LocalSystem is explicitly denied. Any other authenticated user falls through to the
/// implicit deny.
/// </summary>
/// <remarks>
/// Earlier revisions also denied <c>BUILTIN\Administrators</c>, which broke live testing
/// on dev boxes where the allowed user (<c>dohertj2</c>) is also a member of the local
/// Administrators group — UAC's filtered token still carries the Admins SID as deny-only,
/// so the deny ACE fired even from non-elevated shells. The per-connection
/// <see cref="PipeServer.VerifyCaller"/> check already gates on the exact allowed SID,
/// which is the real authorization boundary, so the Admins deny added no defence in depth
/// in that topology.
/// </remarks>
public static class PipeAcl
{
public static PipeSecurity Create(SecurityIdentifier allowedSid)
{
if (allowedSid is null) throw new ArgumentNullException(nameof(allowedSid));
var security = new PipeSecurity();
security.AddAccessRule(new PipeAccessRule(
allowedSid,
PipeAccessRights.ReadWrite | PipeAccessRights.Synchronize,
AccessControlType.Allow));
var localSystem = new SecurityIdentifier(WellKnownSidType.LocalSystemSid, null);
if (allowedSid != localSystem)
security.AddAccessRule(new PipeAccessRule(localSystem, PipeAccessRights.FullControl, AccessControlType.Deny));
// Owner = allowed SID so the deny rules can't be removed without write-DACL rights.
security.SetOwner(allowedSid);
return security;
}
}
@@ -1,179 +0,0 @@
using System;
using System.IO.Pipes;
using System.Security.Principal;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
/// <summary>
/// Accepts one client connection at a time on a named pipe with the strict ACL from
/// <see cref="PipeAcl"/>. Verifies the peer SID and the per-process shared secret before any
/// RPC frame is accepted. Per <c>driver-stability.md §"IPC Security"</c>.
/// </summary>
public sealed class PipeServer : IDisposable
{
private readonly string _pipeName;
private readonly SecurityIdentifier _allowedSid;
private readonly string _sharedSecret;
private readonly ILogger _logger;
private readonly CancellationTokenSource _cts = new();
private NamedPipeServerStream? _current;
public PipeServer(string pipeName, SecurityIdentifier allowedSid, string sharedSecret, ILogger logger)
{
_pipeName = pipeName ?? throw new ArgumentNullException(nameof(pipeName));
_allowedSid = allowedSid ?? throw new ArgumentNullException(nameof(allowedSid));
_sharedSecret = sharedSecret ?? throw new ArgumentNullException(nameof(sharedSecret));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
}
/// <summary>
/// Accepts one connection, performs Hello handshake, then dispatches frames to
/// <paramref name="handler"/> until EOF or cancel. Returns when the client disconnects.
/// </summary>
public async Task RunOneConnectionAsync(IFrameHandler handler, CancellationToken ct)
{
using var linked = CancellationTokenSource.CreateLinkedTokenSource(_cts.Token, ct);
var acl = PipeAcl.Create(_allowedSid);
// .NET Framework 4.8 uses the legacy constructor overload that takes a PipeSecurity directly.
_current = new NamedPipeServerStream(
_pipeName,
PipeDirection.InOut,
maxNumberOfServerInstances: 1,
PipeTransmissionMode.Byte,
PipeOptions.Asynchronous,
inBufferSize: 64 * 1024,
outBufferSize: 64 * 1024,
pipeSecurity: acl);
try
{
await _current.WaitForConnectionAsync(linked.Token).ConfigureAwait(false);
using var reader = new FrameReader(_current, leaveOpen: true);
using var writer = new FrameWriter(_current, leaveOpen: true);
// First frame must be a Hello with the correct shared secret. Reading it before
// the caller-SID impersonation check satisfies Windows' ERROR_CANNOT_IMPERSONATE
// rule — ImpersonateNamedPipeClient fails until at least one frame has been read.
var first = await reader.ReadFrameAsync(linked.Token).ConfigureAwait(false);
if (first is null || first.Value.Kind != MessageKind.Hello)
{
_logger.Warning("IPC first frame was not Hello; dropping");
return;
}
if (!VerifyCaller(_current, out var reason))
{
_logger.Warning("IPC caller rejected: {Reason}", reason);
_current.Disconnect();
return;
}
var hello = MessagePackSerializer.Deserialize<Hello>(first.Value.Body);
if (!string.Equals(hello.SharedSecret, _sharedSecret, StringComparison.Ordinal))
{
await writer.WriteAsync(MessageKind.HelloAck,
new HelloAck { Accepted = false, RejectReason = "shared-secret-mismatch" },
linked.Token).ConfigureAwait(false);
_logger.Warning("IPC Hello rejected: shared-secret-mismatch");
return;
}
if (hello.ProtocolMajor != Hello.CurrentMajor)
{
await writer.WriteAsync(MessageKind.HelloAck,
new HelloAck { Accepted = false, RejectReason = $"major-version-mismatch-peer={hello.ProtocolMajor}-server={Hello.CurrentMajor}" },
linked.Token).ConfigureAwait(false);
_logger.Warning("IPC Hello rejected: major mismatch peer={Peer} server={Server}",
hello.ProtocolMajor, Hello.CurrentMajor);
return;
}
await writer.WriteAsync(MessageKind.HelloAck,
new HelloAck { Accepted = true, HostName = Environment.MachineName },
linked.Token).ConfigureAwait(false);
using var attachment = handler.AttachConnection(writer);
while (!linked.Token.IsCancellationRequested)
{
var frame = await reader.ReadFrameAsync(linked.Token).ConfigureAwait(false);
if (frame is null) break;
await handler.HandleAsync(frame.Value.Kind, frame.Value.Body, writer, linked.Token).ConfigureAwait(false);
}
}
finally
{
_current.Dispose();
_current = null;
}
}
/// <summary>
/// Runs the server continuously, handling one connection at a time. When a connection ends
/// (clean or error), accepts the next.
/// </summary>
public async Task RunAsync(IFrameHandler handler, CancellationToken ct)
{
while (!ct.IsCancellationRequested)
{
try { await RunOneConnectionAsync(handler, ct).ConfigureAwait(false); }
catch (OperationCanceledException) { break; }
catch (Exception ex) { _logger.Error(ex, "IPC connection loop error — accepting next"); }
}
}
private bool VerifyCaller(NamedPipeServerStream pipe, out string reason)
{
try
{
pipe.RunAsClient(() =>
{
using var wi = WindowsIdentity.GetCurrent();
if (wi.User is null)
throw new InvalidOperationException("GetCurrent().User is null — cannot verify caller");
if (wi.User != _allowedSid)
throw new UnauthorizedAccessException(
$"caller SID {wi.User.Value} does not match allowed {_allowedSid.Value}");
});
reason = string.Empty;
return true;
}
catch (Exception ex) { reason = ex.Message; return false; }
}
public void Dispose()
{
_cts.Cancel();
_current?.Dispose();
_cts.Dispose();
}
}
public interface IFrameHandler
{
Task HandleAsync(MessageKind kind, byte[] body, FrameWriter writer, CancellationToken ct);
/// <summary>
/// Called once per accepted connection after the Hello handshake. Lets the handler
/// attach server-pushed event sinks (data-change, alarm, host-status) to the
/// connection's <paramref name="writer"/>. Returns an <see cref="IDisposable"/> the
/// pipe server disposes when the connection closes — backends use it to unsubscribe.
/// Implementations that don't push events can return <see cref="NoopAttachment"/>.
/// </summary>
IDisposable AttachConnection(FrameWriter writer);
public sealed class NoopAttachment : IDisposable
{
public static readonly NoopAttachment Instance = new();
public void Dispose() { }
}
}
@@ -1,33 +0,0 @@
using System;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
/// <summary>
/// Placeholder handler that acks <see cref="MessageKind.Heartbeat"/> frames and answers every
/// other frame kind with a <c>not-implemented</c> error response. Replaced by the real
/// Galaxy-backed handler when the MXAccess code move (deferred) lands.
/// </summary>
public sealed class StubFrameHandler : IFrameHandler
{
public Task HandleAsync(MessageKind kind, byte[] body, FrameWriter writer, CancellationToken ct)
{
// Minimal lifecycle: heartbeat ack keeps the supervisor's liveness detector happy even
// while the data-plane is stubbed, so integration tests of the supervisor can run end-to-end.
if (kind == MessageKind.Heartbeat)
{
var hb = MessagePackSerializer.Deserialize<Heartbeat>(body);
return writer.WriteAsync(MessageKind.HeartbeatAck,
new HeartbeatAck { SequenceNumber = hb.SequenceNumber, UtcUnixMs = hb.UtcUnixMs }, ct);
}
return writer.WriteAsync(MessageKind.ErrorResponse,
new ErrorResponse { Code = "not-implemented", Message = $"Kind {kind} is stubbed — MXAccess lift deferred" },
ct);
}
public IDisposable AttachConnection(FrameWriter writer) => IFrameHandler.NoopAttachment.Instance;
}
@@ -1,5 +0,0 @@
// Shim: .NET Framework 4.8 doesn't ship IsExternalInit, which init-only setters and positional
// records require. Safe to declare in our own assembly; the compiler accepts any accessible
// type with this fully qualified name.
namespace System.Runtime.CompilerServices;
internal static class IsExternalInit;
@@ -1,139 +0,0 @@
using System;
using System.Security.Principal;
using System.Threading;
using Serilog;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Backend;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Backend.MxAccess;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Ipc;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host;
/// <summary>
/// Entry point for the <c>OtOpcUaGalaxyHost</c> Windows service / console host. Reads the
/// pipe name, allowed-SID, and shared secret from environment (passed by the supervisor at
/// spawn time per <c>driver-stability.md</c>).
/// </summary>
public static class Program
{
public static int Main(string[] args)
{
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Information()
.WriteTo.File(
@"%ProgramData%\OtOpcUa\galaxy-host-.log".Replace("%ProgramData%", Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData)),
rollingInterval: RollingInterval.Day)
.CreateLogger();
try
{
var pipeName = Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_PIPE") ?? "OtOpcUaGalaxy";
var allowedSidValue = Environment.GetEnvironmentVariable("OTOPCUA_ALLOWED_SID")
?? throw new InvalidOperationException("OTOPCUA_ALLOWED_SID not set — supervisor must pass the server principal SID");
var sharedSecret = Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_SECRET")
?? throw new InvalidOperationException("OTOPCUA_GALAXY_SECRET not set — supervisor must pass the per-process secret at spawn time");
var allowedSid = new SecurityIdentifier(allowedSidValue);
using var server = new PipeServer(pipeName, allowedSid, sharedSecret, Log.Logger);
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };
Log.Information("OtOpcUaGalaxyHost starting — pipe={Pipe} allowedSid={Sid}", pipeName, allowedSidValue);
// Backend selection — env var picks the implementation:
// OTOPCUA_GALAXY_BACKEND=stub → StubGalaxyBackend (no Galaxy required)
// OTOPCUA_GALAXY_BACKEND=db → DbBackedGalaxyBackend (Discover only, against ZB)
// OTOPCUA_GALAXY_BACKEND=mxaccess → MxAccessGalaxyBackend (real COM + ZB; default)
var backendKind = Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_BACKEND")?.ToLowerInvariant() ?? "mxaccess";
var zbConn = Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_ZB_CONN")
?? "Server=localhost;Database=ZB;Integrated Security=True;TrustServerCertificate=True;Encrypt=False;";
var clientName = Environment.GetEnvironmentVariable("OTOPCUA_GALAXY_CLIENT_NAME") ?? "OtOpcUa-Galaxy.Host";
IGalaxyBackend backend;
StaPump? pump = null;
MxAccessClient? mx = null;
switch (backendKind)
{
case "stub":
backend = new StubGalaxyBackend();
break;
case "db":
backend = new DbBackedGalaxyBackend(new GalaxyRepository(new GalaxyRepositoryOptions { ConnectionString = zbConn }));
break;
default: // mxaccess
pump = new StaPump("Galaxy.Sta");
pump.WaitForStartedAsync().GetAwaiter().GetResult();
mx = new MxAccessClient(pump, new MxProxyAdapter(), clientName);
var historian = BuildHistorianIfEnabled();
backend = new MxAccessGalaxyBackend(
new GalaxyRepository(new GalaxyRepositoryOptions { ConnectionString = zbConn }),
mx,
historian);
break;
}
Log.Information("OtOpcUaGalaxyHost backend={Backend}", backendKind);
var handler = new GalaxyFrameHandler(backend, Log.Logger);
try { server.RunAsync(handler, cts.Token).GetAwaiter().GetResult(); }
finally
{
(backend as IDisposable)?.Dispose();
mx?.Dispose();
pump?.Dispose();
}
Log.Information("OtOpcUaGalaxyHost stopped cleanly");
return 0;
}
catch (Exception ex)
{
Log.Fatal(ex, "OtOpcUaGalaxyHost fatal");
return 2;
}
finally { Log.CloseAndFlush(); }
}
/// <summary>
/// Builds a <see cref="HistorianDataSource"/> from the OTOPCUA_HISTORIAN_* environment
/// variables the supervisor passes at spawn time. Returns null when the historian is
/// disabled (default) so <c>MxAccessGalaxyBackend.HistoryReadAsync</c> returns a clear
/// "not configured" error instead of attempting an SDK connection to localhost.
/// </summary>
private static IHistorianDataSource? BuildHistorianIfEnabled()
{
var enabled = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_ENABLED");
if (!string.Equals(enabled, "true", StringComparison.OrdinalIgnoreCase) && enabled != "1")
return null;
var cfg = new HistorianConfiguration
{
Enabled = true,
ServerName = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_SERVER") ?? "localhost",
Port = TryParseInt("OTOPCUA_HISTORIAN_PORT", 32568),
IntegratedSecurity = !string.Equals(Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_INTEGRATED"), "false", StringComparison.OrdinalIgnoreCase),
UserName = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_USER"),
Password = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_PASS"),
CommandTimeoutSeconds = TryParseInt("OTOPCUA_HISTORIAN_TIMEOUT_SEC", 30),
MaxValuesPerRead = TryParseInt("OTOPCUA_HISTORIAN_MAX_VALUES", 10000),
FailureCooldownSeconds = TryParseInt("OTOPCUA_HISTORIAN_COOLDOWN_SEC", 60),
};
var servers = Environment.GetEnvironmentVariable("OTOPCUA_HISTORIAN_SERVERS");
if (!string.IsNullOrWhiteSpace(servers))
cfg.ServerNames = new System.Collections.Generic.List<string>(
servers.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries));
Log.Information("Historian enabled — {NodeCount} configured node(s), port={Port}",
cfg.ServerNames.Count > 0 ? cfg.ServerNames.Count : 1, cfg.Port);
return new HistorianDataSource(cfg);
}
private static int TryParseInt(string envName, int defaultValue)
{
var raw = Environment.GetEnvironmentVariable(envName);
return int.TryParse(raw, out var parsed) ? parsed : defaultValue;
}
}
@@ -1,58 +0,0 @@
using System;
using System.Runtime.ConstrainedExecution;
using System.Runtime.InteropServices;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;
/// <summary>
/// SafeHandle-style lifetime wrapper for an <c>LMXProxyServer</c> COM connection. Per Task B.3
/// + decision #65: <see cref="ReleaseHandle"/> invokes the <c>UnregisterProxy</c> callback while
/// the RCW is still alive, then calls <c>Marshal.ReleaseComObject</c> until refcount = 0.
/// <see cref="SafeHandle"/> derives from <see cref="CriticalFinalizerObject"/>, so the
/// finalizer honors AppDomain-unload ordering.
/// </summary>
/// <remarks>
/// This scaffold accepts any RCW (tagged as <see cref="object"/>) so we can unit-test the
/// release logic with a mock. The concrete wiring to <c>ArchestrA.MxAccess.LMXProxyServer</c>
/// lands when the actual Galaxy code moves over (the part deferred to the parity gate).
/// </remarks>
public sealed class MxAccessHandle : SafeHandle
{
private object? _comObject;
private readonly Action<object>? _unregister;
public MxAccessHandle(object comObject, Action<object>? unregister = null)
: base(IntPtr.Zero, ownsHandle: true)
{
_comObject = comObject ?? throw new ArgumentNullException(nameof(comObject));
_unregister = unregister;
// The pointer value itself doesn't matter — we're wrapping an RCW, not a native handle.
SetHandle(new IntPtr(1));
}
public override bool IsInvalid => handle == IntPtr.Zero;
public object? RawComObject => _comObject;
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
protected override bool ReleaseHandle()
{
if (_comObject is null) return true;
try { _unregister?.Invoke(_comObject); }
catch { /* swallow — we're in finalizer/cleanup; log elsewhere */ }
try
{
if (Marshal.IsComObject(_comObject))
{
while (Marshal.ReleaseComObject(_comObject) > 0) { /* loop until fully released */ }
}
}
catch { /* swallow */ }
_comObject = null;
SetHandle(IntPtr.Zero);
return true;
}
}
@@ -1,206 +0,0 @@
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Sta;
/// <summary>
/// Dedicated STA thread with a Win32 message pump that owns all <c>LMXProxyServer</c> COM
/// instances. Lifted from v1 <c>StaComThread</c> per CLAUDE.md "Reference Implementation".
/// Per <c>driver-stability.md</c> Galaxy deep dive §"STA thread + Win32 message pump":
/// work items dispatched via <c>PostThreadMessage(WM_APP)</c>; <c>WM_APP+1</c> requests a
/// graceful drain → <c>WM_QUIT</c>; supervisor escalates to <c>Environment.Exit(2)</c> if the
/// pump doesn't drain within the recycle grace window.
/// </summary>
public sealed class StaPump : IDisposable
{
private const uint WM_APP = 0x8000;
private const uint WM_DRAIN_AND_QUIT = WM_APP + 1;
private const uint PM_NOREMOVE = 0x0000;
private readonly Thread _thread;
private readonly ConcurrentQueue<WorkItem> _workItems = new();
private readonly TaskCompletionSource<bool> _started = new(TaskCreationOptions.RunContinuationsAsynchronously);
private volatile uint _nativeThreadId;
private volatile bool _pumpExited;
private volatile bool _disposed;
public int ThreadId => _thread.ManagedThreadId;
public DateTime LastDispatchedUtc { get; private set; } = DateTime.MinValue;
public int QueueDepth => _workItems.Count;
public bool IsRunning => _nativeThreadId != 0 && !_disposed && !_pumpExited;
public StaPump(string name = "Galaxy.Sta")
{
_thread = new Thread(PumpLoop) { Name = name, IsBackground = true };
_thread.SetApartmentState(ApartmentState.STA);
_thread.Start();
}
public Task WaitForStartedAsync() => _started.Task;
/// <summary>Posts a work item; resolves once it's executed on the STA thread.</summary>
public Task<T> InvokeAsync<T>(Func<T> work)
{
if (_disposed) throw new ObjectDisposedException(nameof(StaPump));
if (_pumpExited) throw new InvalidOperationException("STA pump has exited");
var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
_workItems.Enqueue(new WorkItem(
() =>
{
try { tcs.TrySetResult(work()); }
catch (Exception ex) { tcs.TrySetException(ex); }
},
ex => tcs.TrySetException(ex)));
if (!PostThreadMessage(_nativeThreadId, WM_APP, IntPtr.Zero, IntPtr.Zero))
{
_pumpExited = true;
DrainAndFaultQueue();
}
return tcs.Task;
}
public Task InvokeAsync(Action work) => InvokeAsync(() => { work(); return 0; });
/// <summary>
/// Health probe — returns true if a no-op work item round-trips within
/// <paramref name="timeout"/>. Used by the supervisor; timeout means the pump is wedged
/// and a recycle is warranted (Task B.2 acceptance).
/// </summary>
public async Task<bool> IsResponsiveAsync(TimeSpan timeout)
{
if (!IsRunning) return false;
var task = InvokeAsync(() => { });
var completed = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
return completed == task;
}
private void PumpLoop()
{
try
{
_nativeThreadId = GetCurrentThreadId();
// Force the system to create the thread message queue before we signal Started.
// PeekMessage(PM_NOREMOVE) on an empty queue is the documented way to do this.
PeekMessage(out _, IntPtr.Zero, 0, 0, PM_NOREMOVE);
_started.TrySetResult(true);
// GetMessage returns 0 on WM_QUIT, -1 on error, otherwise a positive value.
while (GetMessage(out var msg, IntPtr.Zero, 0, 0) > 0)
{
if (msg.message == WM_APP)
{
DrainQueue();
}
else if (msg.message == WM_DRAIN_AND_QUIT)
{
DrainQueue();
PostQuitMessage(0);
}
else
{
// Pass through any window/dialog messages the COM proxy may inject.
TranslateMessage(ref msg);
DispatchMessage(ref msg);
}
}
}
catch (Exception ex)
{
_started.TrySetException(ex);
}
finally
{
_pumpExited = true;
DrainAndFaultQueue();
}
}
private void DrainQueue()
{
while (_workItems.TryDequeue(out var item))
{
item.Execute();
LastDispatchedUtc = DateTime.UtcNow;
}
}
private void DrainAndFaultQueue()
{
var ex = new InvalidOperationException("STA pump has exited");
while (_workItems.TryDequeue(out var item))
{
try { item.Fault(ex); }
catch { /* faulting a TCS shouldn't throw, but be defensive */ }
}
}
public void Dispose()
{
if (_disposed) return;
_disposed = true;
try
{
if (_nativeThreadId != 0 && !_pumpExited)
PostThreadMessage(_nativeThreadId, WM_DRAIN_AND_QUIT, IntPtr.Zero, IntPtr.Zero);
_thread.Join(TimeSpan.FromSeconds(5));
}
catch { /* swallow — best effort */ }
DrainAndFaultQueue();
}
private sealed record WorkItem(Action Execute, Action<Exception> Fault);
#region Win32 P/Invoke
[StructLayout(LayoutKind.Sequential)]
private struct MSG
{
public IntPtr hwnd;
public uint message;
public IntPtr wParam;
public IntPtr lParam;
public uint time;
public POINT pt;
}
[StructLayout(LayoutKind.Sequential)]
private struct POINT { public int x; public int y; }
[DllImport("user32.dll")]
private static extern int GetMessage(out MSG lpMsg, IntPtr hWnd, uint wMsgFilterMin, uint wMsgFilterMax);
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool TranslateMessage(ref MSG lpMsg);
[DllImport("user32.dll")]
private static extern IntPtr DispatchMessage(ref MSG lpMsg);
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool PostThreadMessage(uint idThread, uint Msg, IntPtr wParam, IntPtr lParam);
[DllImport("user32.dll")]
private static extern void PostQuitMessage(int nExitCode);
[DllImport("user32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
private static extern bool PeekMessage(out MSG lpMsg, IntPtr hWnd, uint wMsgFilterMin, uint wMsgFilterMax,
uint wRemoveMsg);
[DllImport("kernel32.dll")]
private static extern uint GetCurrentThreadId();
#endregion
}
@@ -1,64 +0,0 @@
using System;
using System.Collections.Generic;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;
/// <summary>
/// Galaxy-specific RSS watchdog per <c>driver-stability.md §"Memory Watchdog Thresholds"</c>.
/// Baseline-relative + absolute caps. Sustained-slope detection uses a rolling 30-min window.
/// Pluggable RSS source keeps it unit-testable.
/// </summary>
public sealed class MemoryWatchdog
{
/// <summary>Absolute hard ceiling — process is force-killed above this.</summary>
public long HardCeilingBytes { get; init; } = 1_500L * 1024 * 1024;
/// <summary>Sustained slope (bytes/min) above which soft recycle is scheduled.</summary>
public long SustainedSlopeBytesPerMinute { get; init; } = 5L * 1024 * 1024;
public TimeSpan SlopeWindow { get; init; } = TimeSpan.FromMinutes(30);
private readonly long _baselineBytes;
private readonly Queue<RssSample> _samples = new();
public MemoryWatchdog(long baselineBytes)
{
_baselineBytes = baselineBytes;
}
/// <summary>Called every 30s with the current RSS. Returns the action the supervisor should take.</summary>
public WatchdogAction Sample(long rssBytes, DateTime utcNow)
{
_samples.Enqueue(new RssSample(utcNow, rssBytes));
while (_samples.Count > 0 && utcNow - _samples.Peek().TimestampUtc > SlopeWindow)
_samples.Dequeue();
if (rssBytes >= HardCeilingBytes)
return WatchdogAction.HardKill;
var softThreshold = Math.Max(_baselineBytes * 2, _baselineBytes + 200L * 1024 * 1024);
var warnThreshold = Math.Max((long)(_baselineBytes * 1.5), _baselineBytes + 200L * 1024 * 1024);
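// Worked example (illustrative arithmetic only, derived from the two formulas above):
// baseline 300 MiB -> soft = max(600, 500) = 600 MiB, warn = max(450, 500) = 500 MiB.
// baseline 100 MiB -> soft = max(200, 300) = 300 MiB, warn = max(150, 300) = 300 MiB.
// For baselines of 200 MiB or less both thresholds equal baseline + 200 MiB, so the soft
// check below matches first and Warn is never returned for those baselines.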
if (rssBytes >= softThreshold) return WatchdogAction.SoftRecycle;
if (rssBytes >= warnThreshold) return WatchdogAction.Warn;
if (_samples.Count >= 2)
{
var oldest = _samples.Peek();
var span = (utcNow - oldest.TimestampUtc).TotalMinutes;
if (span >= SlopeWindow.TotalMinutes * 0.9) // need ~full window to trust the slope
{
var delta = rssBytes - oldest.RssBytes;
var bytesPerMin = delta / span;
if (bytesPerMin >= SustainedSlopeBytesPerMinute)
return WatchdogAction.SoftRecycle;
}
}
return WatchdogAction.None;
}
private readonly record struct RssSample(DateTime TimestampUtc, long RssBytes);
}
public enum WatchdogAction { None, Warn, SoftRecycle, HardKill }
@@ -1,121 +0,0 @@
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;
using System.Text;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;
/// <summary>
/// Ring-buffer of the last <see cref="Capacity"/> IPC operations, written into a
/// memory-mapped file. On hard crash the supervisor reads the MMF after the corpse is gone
/// to see what was in flight. Thread-safe for the single-writer, multi-reader pattern.
/// </summary>
/// <remarks>
/// File layout:
/// <code>
/// [16-byte header: magic(4) | version(4) | capacity(4) | writeIndex(4)]
/// [capacity × 256-byte entries: each is [8-byte utcUnixMs | 8-byte opKind | 240-byte UTF-8 message]]
/// </code>
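/// <para>
/// Worked example (illustrative, derived from the constants below): entry <c>i</c> starts at
/// byte <c>16 + i * 256</c>, so slot 3 begins at offset 784, and with the default capacity of
/// 1000 the file is <c>16 + 1000 * 256 = 256,016</c> bytes. The message is NUL-terminated, so at
/// most 239 bytes of UTF-8 are kept; longer messages are truncated at a byte boundary, which
/// may split a multi-byte character.
/// </para>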
/// </remarks>
public sealed class PostMortemMmf : IDisposable
{
private const int Magic = 0x4F505043; // 'OPPC'
private const int Version = 1;
private const int HeaderBytes = 16;
public const int EntryBytes = 256;
private const int MessageOffset = 16;
private const int MessageCapacity = EntryBytes - MessageOffset;
public int Capacity { get; }
public string Path { get; }
private readonly MemoryMappedFile _mmf;
private readonly MemoryMappedViewAccessor _accessor;
private readonly object _writeGate = new();
public PostMortemMmf(string path, int capacity = 1000)
{
if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
Capacity = capacity;
Path = path;
var fileBytes = HeaderBytes + capacity * EntryBytes;
Directory.CreateDirectory(System.IO.Path.GetDirectoryName(path)!);
var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.Read);
fs.SetLength(fileBytes);
_mmf = MemoryMappedFile.CreateFromFile(fs, null, fileBytes,
MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, leaveOpen: false);
_accessor = _mmf.CreateViewAccessor(0, fileBytes, MemoryMappedFileAccess.ReadWrite);
// Initialize header if blank/garbage.
if (_accessor.ReadInt32(0) != Magic)
{
_accessor.Write(0, Magic);
_accessor.Write(4, Version);
_accessor.Write(8, capacity);
_accessor.Write(12, 0); // writeIndex
}
}
public void Write(long opKind, string message)
{
lock (_writeGate)
{
var idx = _accessor.ReadInt32(12);
var offset = HeaderBytes + idx * EntryBytes;
_accessor.Write(offset + 0, DateTimeOffset.UtcNow.ToUnixTimeMilliseconds());
_accessor.Write(offset + 8, opKind);
var msgBytes = Encoding.UTF8.GetBytes(message ?? string.Empty);
var copy = Math.Min(msgBytes.Length, MessageCapacity - 1);
_accessor.WriteArray(offset + MessageOffset, msgBytes, 0, copy);
_accessor.Write(offset + MessageOffset + copy, (byte)0); // null terminator
var next = (idx + 1) % Capacity;
_accessor.Write(12, next);
}
}
/// <summary>Reads all entries in order (oldest → newest). Safe to call from another process.</summary>
public PostMortemEntry[] ReadAll()
{
var magic = _accessor.ReadInt32(0);
if (magic != Magic) return [];
var capacity = _accessor.ReadInt32(8);
var writeIndex = _accessor.ReadInt32(12);
var entries = new PostMortemEntry[capacity];
var count = 0;
for (var i = 0; i < capacity; i++)
{
var slot = (writeIndex + i) % capacity;
var offset = HeaderBytes + slot * EntryBytes;
var ts = _accessor.ReadInt64(offset + 0);
if (ts == 0) continue; // unwritten
var op = _accessor.ReadInt64(offset + 8);
var msgBuf = new byte[MessageCapacity];
_accessor.ReadArray(offset + MessageOffset, msgBuf, 0, MessageCapacity);
var nulTerm = Array.IndexOf<byte>(msgBuf, 0);
var msg = Encoding.UTF8.GetString(msgBuf, 0, nulTerm < 0 ? MessageCapacity : nulTerm);
entries[count++] = new PostMortemEntry(ts, op, msg);
}
Array.Resize(ref entries, count);
return entries;
}
public void Dispose()
{
_accessor.Dispose();
_mmf.Dispose();
}
}
public readonly record struct PostMortemEntry(long UtcUnixMs, long OpKind, string Message);
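The oldest-to-newest traversal in `ReadAll` relies on `writeIndex` pointing at the next slot to be overwritten, which is also the oldest entry once the ring has wrapped. A minimal standalone sketch of that slot ordering follows; the `ReadOrder` helper is hypothetical and not part of the original file.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not in the original source): returns the slot indices
// in the order ReadAll visits them, starting at writeIndex and wrapping around.
static int[] ReadOrder(int capacity, int writeIndex) =>
    Enumerable.Range(0, capacity)
              .Select(i => (writeIndex + i) % capacity)
              .ToArray();

// With 4 slots and the next write going to slot 1, slot 1 holds the oldest
// entry and slot 0 the newest.
Console.WriteLine(string.Join(",", ReadOrder(4, 1))); // prints "1,2,3,0"
```

Before the ring has wrapped, the slots from `writeIndex` onward are still unwritten (timestamp 0), which is why `ReadAll` skips them rather than returning empty entries.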
@@ -1,40 +0,0 @@
using System;
using System.Collections.Generic;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Stability;
/// <summary>
/// Frequency-capped soft-recycle decision per <c>driver-stability.md §"Recycle Policy"</c>.
/// Default cap: 1 soft recycle per hour. Scheduled recycle at 03:00 local; supervisor reads
/// <see cref="ShouldSoftRecycleScheduled"/> to decide.
/// </summary>
public sealed class RecyclePolicy
{
public TimeSpan SoftRecycleCap { get; init; } = TimeSpan.FromHours(1);
public int DailyRecycleHourLocal { get; init; } = 3;
private readonly List<DateTime> _recentRecyclesUtc = new();
/// <summary>Returns true and records the recycle if one is allowed under the frequency cap; otherwise returns false and sets <paramref name="reason"/>.</summary>
public bool TryRequestSoftRecycle(DateTime utcNow, out string? reason)
{
_recentRecyclesUtc.RemoveAll(t => utcNow - t > SoftRecycleCap);
if (_recentRecyclesUtc.Count > 0)
{
reason = $"soft-recycle frequency cap: last recycle was {(utcNow - _recentRecyclesUtc[_recentRecyclesUtc.Count - 1]).TotalMinutes:F1} min ago";
return false;
}
_recentRecyclesUtc.Add(utcNow);
reason = null;
return true;
}
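/// <summary>Returns true at most once per local calendar day, during <see cref="DailyRecycleHourLocal"/>; updates <paramref name="lastScheduledDateLocal"/> when it fires.</summary>
public bool ShouldSoftRecycleScheduled(DateTime localNow, ref DateTime lastScheduledDateLocal)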
{
if (localNow.Hour != DailyRecycleHourLocal) return false;
if (localNow.Date <= lastScheduledDateLocal.Date) return false;
lastScheduledDateLocal = localNow.Date;
return true;
}
}
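The frequency cap in `TryRequestSoftRecycle` can be illustrated in isolation. The sketch below reproduces only the cap logic with local variables; the timestamps are made up for the example and nothing here is taken from the supervisor code that calls the policy.

```csharp
using System;
using System.Collections.Generic;

// Standalone illustration of the cap logic; not the original class.
var cap = TimeSpan.FromHours(1);
var recent = new List<DateTime>();

bool TryRequest(DateTime utcNow)
{
    recent.RemoveAll(t => utcNow - t > cap);   // forget recycles older than the cap
    if (recent.Count > 0) return false;        // one already happened inside the window
    recent.Add(utcNow);                        // granted: start a new window
    return true;
}

var t0 = new DateTime(2026, 1, 1, 12, 0, 0, DateTimeKind.Utc); // arbitrary example time
var first  = TryRequest(t0);                   // granted
var second = TryRequest(t0.AddMinutes(30));    // denied, still inside the hour
var third  = TryRequest(t0.AddMinutes(61));    // granted, the window has expired
Console.WriteLine($"{first} {second} {third}"); // prints "True False True"
```

Denied requests are not recorded, so the window is always measured from the last granted recycle rather than from the last attempt.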
@@ -1,53 +0,0 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net48</TargetFramework>
<!-- Decision #23: x86 required for MXAccess COM interop. The MxAccess COM client is
now ported (Backend/MxAccess/) so we need the x86 platform target for the
ArchestrA.MxAccess.dll COM interop reference to resolve at runtime. -->
<PlatformTarget>x86</PlatformTarget>
<Prefer32Bit>true</Prefer32Bit>
<Nullable>enable</Nullable>
<LangVersion>latest</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
<NoWarn>$(NoWarn);CS1591</NoWarn>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host</RootNamespace>
<AssemblyName>OtOpcUa.Driver.Galaxy.Host</AssemblyName>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="System.IO.Pipes.AccessControl" Version="5.0.0"/>
<PackageReference Include="System.Memory" Version="4.5.5"/>
<PackageReference Include="System.Threading.Tasks.Extensions" Version="4.5.4"/>
<PackageReference Include="System.Data.SqlClient" Version="4.9.0"/>
<PackageReference Include="Serilog" Version="4.2.0"/>
<PackageReference Include="Serilog.Sinks.File" Version="7.0.0"/>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.csproj"/>
<!-- PR 3.2: Historian SDK code lifted to the Wonderware sidecar. Galaxy.Host still
consumes the historian types (MxAccessGalaxyBackend, Program) until phase 7,
so reference the sidecar project to keep building. -->
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware\ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.csproj"/>
</ItemGroup>
<ItemGroup>
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.Tests"/>
</ItemGroup>
<ItemGroup>
<Reference Include="ArchestrA.MxAccess">
<HintPath>..\..\lib\ArchestrA.MxAccess.dll</HintPath>
<Private>true</Private>
</Reference>
</ItemGroup>
<ItemGroup>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
</ItemGroup>
</Project>
@@ -1,590 +0,0 @@
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
using IpcHostConnectivityStatus = ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts.HostConnectivityStatus;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy;
/// <summary>
/// <see cref="IDriver"/> implementation that forwards every capability over the Galaxy IPC
/// channel to the out-of-process Host. Implements the full Phase 2 capability surface;
/// bodies that depend on the deferred Host-side MXAccess code lift will surface
/// <see cref="GalaxyIpcException"/> with code <c>not-implemented</c> until the Host's
/// <c>IGalaxyBackend</c> is wired to the real <c>MxAccessClient</c>.
/// </summary>
public sealed class GalaxyProxyDriver(GalaxyProxyOptions options)
: IDriver,
ITagDiscovery,
IReadable,
IWritable,
ISubscribable,
IAlarmSource,
IHistoryProvider,
IRediscoverable,
IHostConnectivityProbe,
IAlarmHistorianWriter,
IDisposable
{
private GalaxyIpcClient? _client;
private long _sessionId;
private DriverHealth _health = new(DriverState.Unknown, null, null);
private IReadOnlyList<Core.Abstractions.HostConnectivityStatus> _hostStatuses = [];
public string DriverInstanceId => options.DriverInstanceId;
public string DriverType => "Galaxy";
public event EventHandler<DataChangeEventArgs>? OnDataChange;
public event EventHandler<AlarmEventArgs>? OnAlarmEvent;
public event EventHandler<RediscoveryEventArgs>? OnRediscoveryNeeded;
public event EventHandler<HostStatusChangedEventArgs>? OnHostStatusChanged;
public async Task InitializeAsync(string driverConfigJson, CancellationToken cancellationToken)
{
_health = new DriverHealth(DriverState.Initializing, null, null);
try
{
_client = await GalaxyIpcClient.ConnectAsync(
options.PipeName, options.SharedSecret, options.ConnectTimeout, cancellationToken);
// Route Host-pushed event frames to the matching Raise* methods. Must be set BEFORE
// the first CallAsync so a RuntimeStatusChange arriving between OpenSessionRequest
// and OpenSessionResponse lands on the handler rather than unblocking the call with
// the wrong kind.
_client.SetEventHandler(DispatchHostEventAsync);
var resp = await _client.CallAsync<OpenSessionRequest, OpenSessionResponse>(
MessageKind.OpenSessionRequest,
new OpenSessionRequest { DriverInstanceId = DriverInstanceId, DriverConfigJson = driverConfigJson },
MessageKind.OpenSessionResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host OpenSession failed: {resp.Error}");
_sessionId = resp.SessionId;
_health = new DriverHealth(DriverState.Healthy, DateTime.UtcNow, null);
}
catch (Exception ex)
{
_health = new DriverHealth(DriverState.Faulted, null, ex.Message);
throw;
}
}
public async Task ReinitializeAsync(string driverConfigJson, CancellationToken cancellationToken)
{
await ShutdownAsync(cancellationToken);
await InitializeAsync(driverConfigJson, cancellationToken);
}
public async Task ShutdownAsync(CancellationToken cancellationToken)
{
if (_client is null) return;
try
{
await _client.SendOneWayAsync(
MessageKind.CloseSessionRequest,
new CloseSessionRequest { SessionId = _sessionId },
cancellationToken);
}
catch { /* shutdown is best effort */ }
await _client.DisposeAsync();
_client = null;
_health = new DriverHealth(DriverState.Unknown, _health.LastSuccessfulRead, null);
}
public DriverHealth GetHealth() => _health;
public long GetMemoryFootprint() => 0;
public Task FlushOptionalCachesAsync(CancellationToken cancellationToken) => Task.CompletedTask;
// ---- ITagDiscovery ----
public async Task DiscoverAsync(IAddressSpaceBuilder builder, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(builder);
var client = RequireClient();
var resp = await client.CallAsync<DiscoverHierarchyRequest, DiscoverHierarchyResponse>(
MessageKind.DiscoverHierarchyRequest,
new DiscoverHierarchyRequest { SessionId = _sessionId },
MessageKind.DiscoverHierarchyResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host DiscoverHierarchy failed: {resp.Error}");
foreach (var obj in resp.Objects)
{
var folder = builder.Folder(obj.ContainedName, obj.ContainedName);
foreach (var attr in obj.Attributes)
{
var fullName = $"{obj.TagName}.{attr.AttributeName}";
var handle = folder.Variable(
attr.AttributeName,
attr.AttributeName,
new DriverAttributeInfo(
FullName: fullName,
DriverDataType: MapDataType(attr.MxDataType),
IsArray: attr.IsArray,
ArrayDim: attr.ArrayDim,
SecurityClass: MapSecurity(attr.SecurityClassification),
IsHistorized: attr.IsHistorized,
IsAlarm: attr.IsAlarm));
// PR 15: when Galaxy flags the attribute as alarm-bearing (AlarmExtension
// primitive), register an alarm-condition sink so the generic node manager
// can route OnAlarmEvent payloads for this tag to the concrete address-space
// builder. Severity default Medium — the live severity arrives through
// AlarmEventArgs once MxAccessGalaxyBackend's tracker starts firing.
if (attr.IsAlarm)
{
handle.MarkAsAlarmCondition(new AlarmConditionInfo(
SourceName: fullName,
InitialSeverity: AlarmSeverity.Medium,
InitialDescription: null));
}
}
}
}
// ---- IReadable ----
public async Task<IReadOnlyList<DataValueSnapshot>> ReadAsync(
IReadOnlyList<string> fullReferences, CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<ReadValuesRequest, ReadValuesResponse>(
MessageKind.ReadValuesRequest,
new ReadValuesRequest { SessionId = _sessionId, TagReferences = [.. fullReferences] },
MessageKind.ReadValuesResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host ReadValues failed: {resp.Error}");
var byRef = resp.Values.ToDictionary(v => v.TagReference);
var result = new DataValueSnapshot[fullReferences.Count];
for (var i = 0; i < fullReferences.Count; i++)
{
result[i] = byRef.TryGetValue(fullReferences[i], out var v)
? ToSnapshot(v)
: new DataValueSnapshot(null, StatusBadInternalError, null, DateTime.UtcNow);
}
return result;
}
// ---- IWritable ----
public async Task<IReadOnlyList<WriteResult>> WriteAsync(
IReadOnlyList<WriteRequest> writes, CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<WriteValuesRequest, WriteValuesResponse>(
MessageKind.WriteValuesRequest,
new WriteValuesRequest
{
SessionId = _sessionId,
Writes = [.. writes.Select(FromWriteRequest)],
},
MessageKind.WriteValuesResponse,
cancellationToken);
return [.. resp.Results.Select(r => new WriteResult(r.StatusCode))];
}
// ---- ISubscribable ----
public async Task<ISubscriptionHandle> SubscribeAsync(
IReadOnlyList<string> fullReferences, TimeSpan publishingInterval, CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<SubscribeRequest, SubscribeResponse>(
MessageKind.SubscribeRequest,
new SubscribeRequest
{
SessionId = _sessionId,
TagReferences = [.. fullReferences],
RequestedIntervalMs = (int)publishingInterval.TotalMilliseconds,
},
MessageKind.SubscribeResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host Subscribe failed: {resp.Error}");
return new GalaxySubscriptionHandle(resp.SubscriptionId);
}
public async Task UnsubscribeAsync(ISubscriptionHandle handle, CancellationToken cancellationToken)
{
var client = RequireClient();
var sid = ((GalaxySubscriptionHandle)handle).SubscriptionId;
await client.SendOneWayAsync(
MessageKind.UnsubscribeRequest,
new UnsubscribeRequest { SessionId = _sessionId, SubscriptionId = sid },
cancellationToken);
}
/// <summary>
/// Internal entry point used by the IPC client when the Host pushes an
/// <see cref="MessageKind.OnDataChangeNotification"/> frame. Surfaces it as a managed
/// <see cref="OnDataChange"/> event.
/// </summary>
internal void RaiseDataChange(OnDataChangeNotification notif)
{
var handle = new GalaxySubscriptionHandle(notif.SubscriptionId);
// ISubscribable.OnDataChange fires once per changed attribute — fan out the batch.
foreach (var v in notif.Values)
OnDataChange?.Invoke(this, new DataChangeEventArgs(handle, v.TagReference, ToSnapshot(v)));
}
// ---- IAlarmSource ----
public async Task<IAlarmSubscriptionHandle> SubscribeAlarmsAsync(
IReadOnlyList<string> sourceNodeIds, CancellationToken cancellationToken)
{
var client = RequireClient();
await client.SendOneWayAsync(
MessageKind.AlarmSubscribeRequest,
new AlarmSubscribeRequest { SessionId = _sessionId },
cancellationToken);
return new GalaxyAlarmSubscriptionHandle($"alarm-{_sessionId}");
}
public Task UnsubscribeAlarmsAsync(IAlarmSubscriptionHandle handle, CancellationToken cancellationToken)
=> Task.CompletedTask;
public async Task AcknowledgeAsync(
IReadOnlyList<AlarmAcknowledgeRequest> acknowledgements, CancellationToken cancellationToken)
{
var client = RequireClient();
foreach (var ack in acknowledgements)
{
await client.SendOneWayAsync(
MessageKind.AlarmAckRequest,
new AlarmAckRequest
{
SessionId = _sessionId,
EventId = ack.ConditionId,
Comment = ack.Comment ?? string.Empty,
},
cancellationToken);
}
}
internal void RaiseAlarmEvent(GalaxyAlarmEvent ev)
{
var handle = new GalaxyAlarmSubscriptionHandle($"alarm-{_sessionId}");
OnAlarmEvent?.Invoke(this, new AlarmEventArgs(
SubscriptionHandle: handle,
SourceNodeId: ev.ObjectTagName,
ConditionId: ev.EventId,
AlarmType: ev.AlarmName,
Message: ev.Message,
Severity: MapSeverity(ev.Severity),
SourceTimestampUtc: DateTimeOffset.FromUnixTimeMilliseconds(ev.UtcUnixMs).UtcDateTime));
}
// ---- IHistoryProvider ----
public async Task<HistoryReadResult> ReadRawAsync(
string fullReference, DateTime startUtc, DateTime endUtc, uint maxValuesPerNode,
CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<HistoryReadRequest, HistoryReadResponse>(
MessageKind.HistoryReadRequest,
new HistoryReadRequest
{
SessionId = _sessionId,
TagReferences = [fullReference],
StartUtcUnixMs = new DateTimeOffset(startUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
EndUtcUnixMs = new DateTimeOffset(endUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
MaxValuesPerTag = maxValuesPerNode,
},
MessageKind.HistoryReadResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host HistoryRead failed: {resp.Error}");
var first = resp.Tags.FirstOrDefault();
IReadOnlyList<DataValueSnapshot> samples = first is null
? Array.Empty<DataValueSnapshot>()
: [.. first.Values.Select(ToSnapshot)];
return new HistoryReadResult(samples, ContinuationPoint: null);
}
public async Task<HistoryReadResult> ReadProcessedAsync(
string fullReference, DateTime startUtc, DateTime endUtc, TimeSpan interval,
HistoryAggregateType aggregate, CancellationToken cancellationToken)
{
var client = RequireClient();
var column = MapAggregateToColumn(aggregate);
var resp = await client.CallAsync<HistoryReadProcessedRequest, HistoryReadProcessedResponse>(
MessageKind.HistoryReadProcessedRequest,
new HistoryReadProcessedRequest
{
SessionId = _sessionId,
TagReference = fullReference,
StartUtcUnixMs = new DateTimeOffset(startUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
EndUtcUnixMs = new DateTimeOffset(endUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
IntervalMs = (long)interval.TotalMilliseconds,
AggregateColumn = column,
},
MessageKind.HistoryReadProcessedResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host HistoryReadProcessed failed: {resp.Error}");
IReadOnlyList<DataValueSnapshot> samples = [.. resp.Values.Select(ToSnapshot)];
return new HistoryReadResult(samples, ContinuationPoint: null);
}
public async Task<HistoryReadResult> ReadAtTimeAsync(
string fullReference, IReadOnlyList<DateTime> timestampsUtc, CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<HistoryReadAtTimeRequest, HistoryReadAtTimeResponse>(
MessageKind.HistoryReadAtTimeRequest,
new HistoryReadAtTimeRequest
{
SessionId = _sessionId,
TagReference = fullReference,
TimestampsUtcUnixMs = [.. timestampsUtc.Select(t => new DateTimeOffset(t, TimeSpan.Zero).ToUnixTimeMilliseconds())],
},
MessageKind.HistoryReadAtTimeResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host HistoryReadAtTime failed: {resp.Error}");
// ReadAtTime returns one sample per requested timestamp in the same order — the Host
// pads with bad-quality snapshots when a timestamp can't be interpolated, so response
// length matches request length exactly. We trust that contract rather than
// re-aligning here, because the Host is the source-of-truth for interpolation policy.
IReadOnlyList<DataValueSnapshot> samples = [.. resp.Values.Select(ToSnapshot)];
return new HistoryReadResult(samples, ContinuationPoint: null);
}
public async Task<HistoricalEventsResult> ReadEventsAsync(
string? sourceName, DateTime startUtc, DateTime endUtc, int maxEvents, CancellationToken cancellationToken)
{
var client = RequireClient();
var resp = await client.CallAsync<HistoryReadEventsRequest, HistoryReadEventsResponse>(
MessageKind.HistoryReadEventsRequest,
new HistoryReadEventsRequest
{
SessionId = _sessionId,
SourceName = sourceName,
StartUtcUnixMs = new DateTimeOffset(startUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
EndUtcUnixMs = new DateTimeOffset(endUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
MaxEvents = maxEvents,
},
MessageKind.HistoryReadEventsResponse,
cancellationToken);
if (!resp.Success)
throw new InvalidOperationException($"Galaxy.Host HistoryReadEvents failed: {resp.Error}");
IReadOnlyList<HistoricalEvent> events = [.. resp.Events.Select(ToHistoricalEvent)];
return new HistoricalEventsResult(events, ContinuationPoint: null);
}
internal static HistoricalEvent ToHistoricalEvent(GalaxyHistoricalEvent wire) => new(
EventId: wire.EventId,
SourceName: wire.SourceName,
EventTimeUtc: DateTimeOffset.FromUnixTimeMilliseconds(wire.EventTimeUtcUnixMs).UtcDateTime,
ReceivedTimeUtc: DateTimeOffset.FromUnixTimeMilliseconds(wire.ReceivedTimeUtcUnixMs).UtcDateTime,
Message: wire.DisplayText,
Severity: wire.Severity);
/// <summary>
/// Maps the OPC UA Part 13 aggregate enum onto the Wonderware Historian
/// AnalogSummaryQuery column names consumed by <c>HistorianDataSource.ReadAggregateAsync</c>.
/// Kept on the Proxy side so Galaxy.Host stays OPC-UA-free.
/// </summary>
internal static string MapAggregateToColumn(HistoryAggregateType aggregate) => aggregate switch
{
HistoryAggregateType.Average => "Average",
HistoryAggregateType.Minimum => "Minimum",
HistoryAggregateType.Maximum => "Maximum",
HistoryAggregateType.Count => "ValueCount",
HistoryAggregateType.Total => throw new NotSupportedException(
"HistoryAggregateType.Total is not supported by the Wonderware Historian AnalogSummary " +
"query — use Average × Count on the caller side, or switch to Average/Minimum/Maximum/Count."),
_ => throw new NotSupportedException($"Unknown HistoryAggregateType {aggregate}"),
};
// ---- IRediscoverable ----
/// <summary>
/// Triggered by the IPC client when the Host pushes a deploy-watermark notification
/// (Galaxy <c>time_of_last_deploy</c> changed per decision #54).
/// </summary>
internal void RaiseRediscoveryNeeded(string reason, string? scopeHint = null) =>
OnRediscoveryNeeded?.Invoke(this, new RediscoveryEventArgs(reason, scopeHint));
// ---- IHostConnectivityProbe ----
public IReadOnlyList<Core.Abstractions.HostConnectivityStatus> GetHostStatuses() => _hostStatuses;
internal void OnHostConnectivityUpdate(IpcHostConnectivityStatus update)
{
var translated = new Core.Abstractions.HostConnectivityStatus(
HostName: update.HostName,
State: ParseHostState(update.RuntimeStatus),
LastChangedUtc: DateTimeOffset.FromUnixTimeMilliseconds(update.LastObservedUtcUnixMs).UtcDateTime);
var prior = _hostStatuses.FirstOrDefault(h => h.HostName == translated.HostName);
_hostStatuses = [
.. _hostStatuses.Where(h => h.HostName != translated.HostName),
translated
];
if (prior is null || prior.State != translated.State)
{
OnHostStatusChanged?.Invoke(this, new HostStatusChangedEventArgs(
translated.HostName, prior?.State ?? HostState.Unknown, translated.State));
}
}
private static HostState ParseHostState(string s) => s switch
{
"Running" => HostState.Running,
"Stopped" => HostState.Stopped,
"Faulted" => HostState.Faulted,
_ => HostState.Unknown,
};
// ---- helpers ----
/// <summary>
/// Event-handler registered with <see cref="GalaxyIpcClient.SetEventHandler"/>. Decodes
/// the MessagePack body into the matching wire contract and delegates to the existing
/// <c>Raise*</c> helpers. Unknown kinds are silently ignored — the IPC contract is
/// append-only, so a newer Host sending a kind this Proxy doesn't recognise shouldn't
/// break the session.
/// </summary>
private Task DispatchHostEventAsync(MessageKind kind, byte[] body)
{
switch (kind)
{
case MessageKind.OnDataChangeNotification:
RaiseDataChange(MessagePackSerializer.Deserialize<OnDataChangeNotification>(body));
break;
case MessageKind.AlarmEvent:
RaiseAlarmEvent(MessagePackSerializer.Deserialize<GalaxyAlarmEvent>(body));
break;
case MessageKind.HostConnectivityStatus:
OnHostConnectivityUpdate(MessagePackSerializer.Deserialize<IpcHostConnectivityStatus>(body));
break;
case MessageKind.RuntimeStatusChange:
var rsc = MessagePackSerializer.Deserialize<RuntimeStatusChangeNotification>(body);
OnHostConnectivityUpdate(rsc.Status);
break;
// HistorianConnectivityStatus has no consumer on this Proxy today — drop.
// Response kinds never reach the event handler; the client routes those to
// their pending CallAsync TCS.
}
return Task.CompletedTask;
}
private GalaxyIpcClient RequireClient() =>
_client ?? throw new InvalidOperationException("Driver not initialized");
private const uint StatusBadInternalError = 0x80020000u;
private static DataValueSnapshot ToSnapshot(GalaxyDataValue v) => new(
Value: v.ValueBytes,
StatusCode: v.StatusCode,
SourceTimestampUtc: v.SourceTimestampUtcUnixMs > 0
? DateTimeOffset.FromUnixTimeMilliseconds(v.SourceTimestampUtcUnixMs).UtcDateTime
: null,
ServerTimestampUtc: DateTimeOffset.FromUnixTimeMilliseconds(v.ServerTimestampUtcUnixMs).UtcDateTime);
private static GalaxyDataValue FromWriteRequest(WriteRequest w) => new()
{
TagReference = w.FullReference,
ValueBytes = MessagePackSerializer.Serialize(w.Value),
ValueMessagePackType = 0,
StatusCode = 0,
SourceTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
ServerTimestampUtcUnixMs = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(),
};
private static DriverDataType MapDataType(int mxDataType) => mxDataType switch
{
0 => DriverDataType.Boolean,
1 => DriverDataType.Int32,
2 => DriverDataType.Float32,
3 => DriverDataType.Float64,
4 => DriverDataType.String,
5 => DriverDataType.DateTime,
_ => DriverDataType.String,
};
private static SecurityClassification MapSecurity(int mxSec) => mxSec switch
{
0 => SecurityClassification.FreeAccess,
1 => SecurityClassification.Operate,
2 => SecurityClassification.SecuredWrite,
3 => SecurityClassification.VerifiedWrite,
4 => SecurityClassification.Tune,
5 => SecurityClassification.Configure,
6 => SecurityClassification.ViewOnly,
_ => SecurityClassification.FreeAccess,
};
private static AlarmSeverity MapSeverity(int sev) => sev switch
{
<= 250 => AlarmSeverity.Low,
<= 500 => AlarmSeverity.Medium,
<= 800 => AlarmSeverity.High,
_ => AlarmSeverity.Critical,
};
/// <summary>
/// Phase 7 follow-up #247 — IAlarmHistorianWriter implementation. Forwards alarm
/// batches to Galaxy.Host over the existing IPC channel, reusing the connection
/// the driver already established for data-plane traffic. Throws
/// <see cref="InvalidOperationException"/> when called before
/// <see cref="InitializeAsync"/> has connected the client; the SQLite drain worker
/// translates that to whole-batch RetryPlease per its catch contract.
/// </summary>
public Task<IReadOnlyList<HistorianWriteOutcome>> WriteBatchAsync(
IReadOnlyList<AlarmHistorianEvent> batch, CancellationToken cancellationToken)
{
if (_client is null)
throw new InvalidOperationException(
"GalaxyProxyDriver IPC client not connected — historian writes rejected until InitializeAsync completes");
return new GalaxyHistorianWriter(_client).WriteBatchAsync(batch, cancellationToken);
}
public void Dispose() => _client?.DisposeAsync().AsTask().GetAwaiter().GetResult();
}
internal sealed record GalaxySubscriptionHandle(long SubscriptionId) : ISubscriptionHandle
{
public string DiagnosticId => $"galaxy-sub-{SubscriptionId}";
}
internal sealed record GalaxyAlarmSubscriptionHandle(string Id) : IAlarmSubscriptionHandle
{
public string DiagnosticId => Id;
}
public sealed class GalaxyProxyOptions
{
public required string DriverInstanceId { get; init; }
public required string PipeName { get; init; }
public required string SharedSecret { get; init; }
public TimeSpan ConnectTimeout { get; init; } = TimeSpan.FromSeconds(10);
}
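`ReadAsync` above re-aligns the Host's response to the caller's request order by tag reference and substitutes `BadInternalError` for any reference the Host did not return. A minimal sketch of that alignment step, using tuples in place of the real `GalaxyDataValue` and `DataValueSnapshot` types; the tag names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const uint StatusGood = 0x00000000u;
const uint StatusBadInternalError = 0x80020000u;

// Hypothetical response: the Host answered out of order and omitted "Tank1.Temp".
var response = new List<(string TagReference, uint StatusCode)>
{
    ("Tank1.Level", StatusGood),
    ("Pump1.Speed", StatusGood),
};
var requested = new[] { "Pump1.Speed", "Tank1.Temp", "Tank1.Level" };

// Index the response by reference, then walk the request in its own order.
var byRef = response.ToDictionary(v => v.TagReference);
var statuses = requested
    .Select(r => byRef.TryGetValue(r, out var v) ? v.StatusCode : StatusBadInternalError)
    .ToArray();

foreach (var (tag, status) in requested.Zip(statuses))
    Console.WriteLine($"{tag}: 0x{status:X8}");
```

The result always has one entry per requested reference, in request order, so callers can index it positionally. Note that `ToDictionary` throws on duplicate keys, so this approach assumes the Host never returns the same reference twice.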
@@ -1,61 +0,0 @@
using System.Text.Json;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Core.Hosting;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy;
/// <summary>
/// Static factory registration helper for <see cref="GalaxyProxyDriver"/>. Server's
/// Program.cs calls <see cref="Register"/> once at startup; the bootstrapper (task #248)
/// then materialises Galaxy DriverInstance rows from the central config DB into live
/// driver instances. No dependency on Microsoft.Extensions.DependencyInjection so the
/// driver project stays free of DI machinery.
/// </summary>
public static class GalaxyProxyDriverFactoryExtensions
{
public const string DriverTypeName = "Galaxy";
/// <summary>
/// Register the Galaxy driver factory in the supplied <see cref="DriverFactoryRegistry"/>.
/// Throws if 'Galaxy' is already registered — single-instance per process.
/// </summary>
public static void Register(DriverFactoryRegistry registry)
{
ArgumentNullException.ThrowIfNull(registry);
// Galaxy is Tier C — out-of-process MXAccess Host, scheduled recycle is allowed.
registry.Register(DriverTypeName, CreateInstance, DriverTier.C);
}
internal static GalaxyProxyDriver CreateInstance(string driverInstanceId, string driverConfigJson)
{
ArgumentException.ThrowIfNullOrWhiteSpace(driverInstanceId);
ArgumentException.ThrowIfNullOrWhiteSpace(driverConfigJson);
// DriverConfig column is a JSON object that mirrors GalaxyProxyOptions.
// Required: PipeName, SharedSecret. Optional: ConnectTimeoutMs (defaults to 10s).
// The DriverInstanceId from the row wins over any value in the JSON — the row
// is the authoritative identity per the schema's UX_DriverInstance_Generation_LogicalId.
using var doc = JsonDocument.Parse(driverConfigJson);
var root = doc.RootElement;
string pipeName = root.TryGetProperty("PipeName", out var p) && p.ValueKind == JsonValueKind.String
? p.GetString()!
: throw new InvalidOperationException(
$"GalaxyProxyDriver config for '{driverInstanceId}' missing required PipeName");
string sharedSecret = root.TryGetProperty("SharedSecret", out var s) && s.ValueKind == JsonValueKind.String
? s.GetString()!
: throw new InvalidOperationException(
$"GalaxyProxyDriver config for '{driverInstanceId}' missing required SharedSecret");
var connectTimeout = root.TryGetProperty("ConnectTimeoutMs", out var t) && t.ValueKind == JsonValueKind.Number
? TimeSpan.FromMilliseconds(t.GetInt32())
: TimeSpan.FromSeconds(10);
return new GalaxyProxyDriver(new GalaxyProxyOptions
{
DriverInstanceId = driverInstanceId,
PipeName = pipeName,
SharedSecret = sharedSecret,
ConnectTimeout = connectTimeout,
});
}
}
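For reference, a DriverConfig document that `CreateInstance` would accept might look like the one parsed below. This is a sketch under assumptions: the pipe name, secret, and timeout values are placeholders, and only the three property names come from the factory code above.

```csharp
using System;
using System.Text.Json;

// Placeholder values; only the property names are taken from the factory.
var driverConfigJson =
    "{ \"PipeName\": \"example-galaxy-pipe\", \"SharedSecret\": \"example-secret\", \"ConnectTimeoutMs\": 5000 }";

using var doc = JsonDocument.Parse(driverConfigJson);
var root = doc.RootElement;

// Required string properties: the factory throws when either is missing or not a string.
var hasPipe = root.TryGetProperty("PipeName", out var p) && p.ValueKind == JsonValueKind.String;
var hasSecret = root.TryGetProperty("SharedSecret", out var s) && s.ValueKind == JsonValueKind.String;

// Optional numeric property: falls back to 10 seconds when absent.
var timeout = root.TryGetProperty("ConnectTimeoutMs", out var t) && t.ValueKind == JsonValueKind.Number
    ? TimeSpan.FromMilliseconds(t.GetInt32())
    : TimeSpan.FromSeconds(10);

Console.WriteLine($"pipe={hasPipe} secret={hasSecret} timeout={timeout.TotalSeconds}s");
```

`DriverInstanceId` is deliberately absent from the JSON: the factory takes it from the DriverInstance row, so any value supplied in the config would be ignored.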
@@ -1,90 +0,0 @@
using ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc;
/// <summary>
/// Phase 7 follow-up (task #247) — bridges <see cref="SqliteStoreAndForwardSink"/>'s
/// drain worker to <c>Driver.Galaxy.Host</c> over the existing <see cref="GalaxyIpcClient"/>
/// pipe. Translates <see cref="AlarmHistorianEvent"/> batches into the
/// <see cref="HistorianAlarmEventDto"/> wire format the Host expects + maps per-event
/// <see cref="HistorianAlarmEventOutcomeDto"/> responses back to
/// <see cref="HistorianWriteOutcome"/> so the SQLite queue knows what to ack /
/// dead-letter / retry.
/// </summary>
/// <remarks>
/// <para>
/// Reuses the IPC channel <see cref="GalaxyProxyDriver"/> already opens for the
/// Galaxy data plane — no second pipe to <c>Driver.Galaxy.Host</c>, no separate
/// auth handshake. The IPC client's call gate serializes historian batches with
/// driver Reads/Writes/Subscribes; historian batches are infrequent (every few
/// seconds at most under the SQLite sink's drain cadence) so the contention is
/// negligible compared to per-tag-read pressure.
/// </para>
/// <para>
/// Pipe-level transport faults (broken pipe, host crash) bubble up as
/// <see cref="GalaxyIpcException"/> which the SQLite sink's drain worker catches +
/// translates to a whole-batch RetryPlease per the
/// <see cref="SqliteStoreAndForwardSink"/> docstring — failed events stay queued
/// for the next drain tick after backoff.
/// </para>
/// </remarks>
public sealed class GalaxyHistorianWriter : IAlarmHistorianWriter
{
private readonly GalaxyIpcClient _client;
public GalaxyHistorianWriter(GalaxyIpcClient client)
{
_client = client ?? throw new ArgumentNullException(nameof(client));
}
public async Task<IReadOnlyList<HistorianWriteOutcome>> WriteBatchAsync(
IReadOnlyList<AlarmHistorianEvent> batch, CancellationToken cancellationToken)
{
ArgumentNullException.ThrowIfNull(batch);
if (batch.Count == 0) return [];
var request = new HistorianAlarmEventRequest
{
Events = batch.Select(ToDto).ToArray(),
};
var response = await _client.CallAsync<HistorianAlarmEventRequest, HistorianAlarmEventResponse>(
requestKind: MessageKind.HistorianAlarmEventRequest,
request: request,
expectedResponseKind: MessageKind.HistorianAlarmEventResponse,
ct: cancellationToken).ConfigureAwait(false);
if (response.Outcomes.Length != batch.Count)
throw new InvalidOperationException(
$"Galaxy.Host returned {response.Outcomes.Length} outcomes for a batch of {batch.Count} — protocol mismatch");
var outcomes = new HistorianWriteOutcome[response.Outcomes.Length];
for (var i = 0; i < response.Outcomes.Length; i++)
outcomes[i] = MapOutcome(response.Outcomes[i]);
return outcomes;
}
internal static HistorianAlarmEventDto ToDto(AlarmHistorianEvent e) => new()
{
AlarmId = e.AlarmId,
EquipmentPath = e.EquipmentPath,
AlarmName = e.AlarmName,
AlarmTypeName = e.AlarmTypeName,
Severity = (int)e.Severity,
EventKind = e.EventKind,
Message = e.Message,
User = e.User,
Comment = e.Comment,
TimestampUtcUnixMs = new DateTimeOffset(e.TimestampUtc, TimeSpan.Zero).ToUnixTimeMilliseconds(),
};
internal static HistorianWriteOutcome MapOutcome(HistorianAlarmEventOutcomeDto wire) => wire switch
{
HistorianAlarmEventOutcomeDto.Ack => HistorianWriteOutcome.Ack,
HistorianAlarmEventOutcomeDto.RetryPlease => HistorianWriteOutcome.RetryPlease,
HistorianAlarmEventOutcomeDto.PermanentFail => HistorianWriteOutcome.PermanentFail,
_ => throw new InvalidOperationException($"Unknown HistorianAlarmEventOutcomeDto byte {(byte)wire}"),
};
}
@@ -1,243 +0,0 @@
using System.IO.Pipes;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Ipc;
/// <summary>
/// Client-side IPC channel to a running <c>Driver.Galaxy.Host</c>. Owns the data-plane pipe
/// connection, serializes request/response round-trips, and routes unsolicited push frames
/// (<see cref="MessageKind.OnDataChangeNotification"/>, <see cref="MessageKind.AlarmEvent"/>,
/// <see cref="MessageKind.HostConnectivityStatus"/>, <see cref="MessageKind.RuntimeStatusChange"/>,
/// <see cref="MessageKind.HistorianConnectivityStatus"/>) to a handler supplied via
/// <see cref="SetEventHandler"/>. One instance per session.
/// </summary>
/// <remarks>
/// A single background reader task owns the read side of the pipe. Calls are serialized by
/// <see cref="_writeGate"/>, so at most one pending response is outstanding at a time — the
/// reader uses a single pending-response slot. Any frame that doesn't match the pending
/// expected kind (or <see cref="MessageKind.ErrorResponse"/>) is treated as a push event and
/// forwarded to the registered handler. Without this router, a push event arriving between
/// request and response would satisfy the caller's read and fail the next
/// <see cref="CallAsync{TReq, TResp}"/> with an "Expected X, got Y" error.
/// </remarks>
public sealed class GalaxyIpcClient : IAsyncDisposable
{
private readonly NamedPipeClientStream _stream;
private readonly FrameReader _reader;
private readonly FrameWriter _writer;
private readonly SemaphoreSlim _writeGate = new(1, 1);
private readonly CancellationTokenSource _readerCts = new();
private readonly object _pendingLock = new();
private TaskCompletionSource<(MessageKind Kind, byte[] Body)>? _pending;
private MessageKind _pendingExpected;
private Task? _readerTask;
private Func<MessageKind, byte[], Task>? _eventHandler;
private GalaxyIpcClient(NamedPipeClientStream stream)
{
_stream = stream;
_reader = new FrameReader(stream, leaveOpen: true);
_writer = new FrameWriter(stream, leaveOpen: true);
}
/// <summary>Connects, sends Hello with the shared secret, and awaits HelloAck. Throws on rejection.</summary>
public static async Task<GalaxyIpcClient> ConnectAsync(
string pipeName, string sharedSecret, TimeSpan connectTimeout, CancellationToken ct)
{
var stream = new NamedPipeClientStream(
serverName: ".",
pipeName: pipeName,
direction: PipeDirection.InOut,
options: PipeOptions.Asynchronous);
await stream.ConnectAsync((int)connectTimeout.TotalMilliseconds, ct);
var client = new GalaxyIpcClient(stream);
try
{
await client._writer.WriteAsync(MessageKind.Hello,
new Hello { PeerName = "Galaxy.Proxy", SharedSecret = sharedSecret }, ct);
// Hello/HelloAck is the one round-trip that runs inline before the reader loop
// starts — the Host expects its response-side write before accepting any other
// frames, so there's no push-event window to worry about here.
var ack = await client._reader.ReadFrameAsync(ct);
if (ack is null || ack.Value.Kind != MessageKind.HelloAck)
throw new InvalidOperationException("Did not receive HelloAck from Galaxy.Host");
var ackMsg = FrameReader.Deserialize<HelloAck>(ack.Value.Body);
if (!ackMsg.Accepted)
throw new UnauthorizedAccessException($"Galaxy.Host rejected Hello: {ackMsg.RejectReason}");
client._readerTask = Task.Run(() => client.ReadLoopAsync(client._readerCts.Token));
return client;
}
catch
{
await client.DisposeAsync();
throw;
}
}
/// <summary>
/// Register a handler that receives unsolicited push frames. Safe to call once per
/// session — typically during the driver's <c>InitializeAsync</c> right after
/// <see cref="ConnectAsync"/>. The handler is invoked on the reader's thread-pool
/// task; it should not block. Exceptions thrown by the handler are swallowed so a
/// buggy event subscriber cannot kill the reader loop.
/// </summary>
public void SetEventHandler(Func<MessageKind, byte[], Task> handler)
=> _eventHandler = handler ?? throw new ArgumentNullException(nameof(handler));
/// <summary>Round-trips a request and returns the deserialized response.</summary>
public async Task<TResp> CallAsync<TReq, TResp>(
MessageKind requestKind, TReq request, MessageKind expectedResponseKind, CancellationToken ct)
{
await _writeGate.WaitAsync(ct);
var tcs = new TaskCompletionSource<(MessageKind, byte[])>(
TaskCreationOptions.RunContinuationsAsynchronously);
try
{
lock (_pendingLock)
{
if (_pending is not null)
throw new InvalidOperationException(
"GalaxyIpcClient pending-response slot is not empty — call re-entry is a bug");
_pending = tcs;
_pendingExpected = expectedResponseKind;
}
await _writer.WriteAsync(requestKind, request, ct);
using var reg = ct.Register(static s =>
((TaskCompletionSource<(MessageKind, byte[])>)s!).TrySetCanceled(), tcs);
var frame = await tcs.Task.ConfigureAwait(false);
if (frame.Item1 == MessageKind.ErrorResponse)
{
var err = MessagePackSerializer.Deserialize<ErrorResponse>(frame.Item2);
throw new GalaxyIpcException(err.Code, err.Message);
}
return MessagePackSerializer.Deserialize<TResp>(frame.Item2);
}
finally
{
lock (_pendingLock)
{
if (ReferenceEquals(_pending, tcs)) _pending = null;
}
_writeGate.Release();
}
}
/// <summary>
/// Fire-and-forget request — used for unsubscribe, alarm-ack, close-session, and other
/// calls where the protocol is one-way. The send is still serialized through the write
/// gate so it doesn't interleave a frame with a concurrent <see cref="CallAsync{TReq, TResp}"/>.
/// </summary>
public async Task SendOneWayAsync<TReq>(MessageKind requestKind, TReq request, CancellationToken ct)
{
await _writeGate.WaitAsync(ct);
try { await _writer.WriteAsync(requestKind, request, ct); }
finally { _writeGate.Release(); }
}
private async Task ReadLoopAsync(CancellationToken ct)
{
try
{
while (!ct.IsCancellationRequested)
{
(MessageKind Kind, byte[] Body)? frame;
try
{
var read = await _reader.ReadFrameAsync(ct).ConfigureAwait(false);
frame = read is null ? null : (read.Value.Kind, read.Value.Body);
}
catch (OperationCanceledException) { break; }
catch (Exception ex)
{
FailPending(ex);
break;
}
if (frame is null)
{
FailPending(new EndOfStreamException("IPC peer closed the pipe"));
break;
}
// Route: response-ish frame to pending TCS if one is waiting, else treat as event.
// ErrorResponse always terminates a pending call — that's the Host signalling a
// request-scoped failure. Unsolicited ErrorResponse with no pending call shouldn't
// happen under a well-formed protocol; if it does, we drop it to the event channel
// so it shows up in logs rather than deadlocking the next CallAsync.
TaskCompletionSource<(MessageKind, byte[])>? pendingTcs = null;
lock (_pendingLock)
{
if (_pending is not null && (frame.Value.Kind == _pendingExpected
|| frame.Value.Kind == MessageKind.ErrorResponse))
{
pendingTcs = _pending;
_pending = null;
}
}
if (pendingTcs is not null)
{
pendingTcs.TrySetResult(frame.Value);
continue;
}
var handler = _eventHandler;
if (handler is null) continue;
try { await handler(frame.Value.Kind, frame.Value.Body).ConfigureAwait(false); }
catch
{
// A buggy subscriber must not kill the reader. The handler is expected to
// do its own logging; swallowing here keeps the channel alive for the next
// frame + the next CallAsync.
}
}
}
finally
{
// Any still-pending call after the loop exits would otherwise hang forever.
FailPending(new EndOfStreamException("IPC reader loop exited"));
}
}
private void FailPending(Exception ex)
{
TaskCompletionSource<(MessageKind, byte[])>? tcs;
lock (_pendingLock) { tcs = _pending; _pending = null; }
tcs?.TrySetException(ex);
}
public async ValueTask DisposeAsync()
{
_readerCts.Cancel();
if (_readerTask is not null)
{
try { await _readerTask.ConfigureAwait(false); } catch { /* shutdown */ }
}
_writeGate.Dispose();
_reader.Dispose();
_writer.Dispose();
_readerCts.Dispose();
await _stream.DisposeAsync();
}
}
public sealed class GalaxyIpcException(string code, string message)
: Exception($"[{code}] {message}")
{
public string Code { get; } = code;
}
@@ -1,29 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;
/// <summary>
/// Respawn-with-backoff schedule per <c>driver-stability.md §"Crash-loop circuit breaker"</c>:
/// 5s → 15s → 60s, capped. Reset on a successful (&gt; <see cref="StableRunThreshold"/>)
/// run.
/// </summary>
public sealed class Backoff
{
public static TimeSpan[] DefaultSequence { get; } =
[TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)];
public TimeSpan StableRunThreshold { get; init; } = TimeSpan.FromMinutes(2);
private readonly TimeSpan[] _sequence;
private int _index;
public Backoff(TimeSpan[]? sequence = null) => _sequence = sequence ?? DefaultSequence;
public TimeSpan Next()
{
var delay = _sequence[Math.Min(_index, _sequence.Length - 1)];
_index++;
return delay;
}
/// <summary>Called when the spawned process has stayed up past the stable threshold.</summary>
public void RecordStableRun() => _index = 0;
}
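The clamping rule in `Next()` can be traced with a short standalone sketch. It copies the default sequence from the class above; `DelayFor` is a hypothetical helper introduced only for illustration and is not part of the repository.

```csharp
using System;

// Stand-in for Backoff.DefaultSequence; values copied from the class above.
TimeSpan[] sequence = [TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(60)];

// Same clamping rule as Backoff.Next(): past the end, reuse the last entry.
TimeSpan DelayFor(int attempt) => sequence[Math.Min(attempt, sequence.Length - 1)];

for (var attempt = 0; attempt < 5; attempt++)
    Console.WriteLine($"attempt {attempt}: wait {DelayFor(attempt).TotalSeconds}s");
// prints waits of 5, 15, 60, 60, 60 seconds for attempts 0 through 4
```

Unlike this stateless helper, the real class tracks the attempt index itself and resets it to zero via `RecordStableRun()`.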
@@ -1,68 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;
/// <summary>
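/// Crash-loop circuit breaker per <c>driver-stability.md</c>. As implemented, the breaker
/// opens once more than <see cref="CrashesAllowedPerWindow"/> crashes (default 3) fall inside
/// <see cref="Window"/> (default 5 min); cooldown escalates 1h → 4h → manual reset only. A sticky
/// alert stays until the operator explicitly resets.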
/// </summary>
public sealed class CircuitBreaker
{
public int CrashesAllowedPerWindow { get; init; } = 3;
public TimeSpan Window { get; init; } = TimeSpan.FromMinutes(5);
public TimeSpan[] CooldownEscalation { get; init; } =
[TimeSpan.FromHours(1), TimeSpan.FromHours(4), TimeSpan.MaxValue];
private readonly List<DateTime> _crashesUtc = [];
private DateTime? _openSinceUtc;
private int _escalationLevel;
public bool StickyAlertActive { get; private set; }
/// <summary>
/// Called by the supervisor each time the host process exits unexpectedly. Returns
/// <c>false</c> when the breaker is open — supervisor must not respawn.
/// </summary>
public bool TryRecordCrash(DateTime utcNow, out TimeSpan cooldownRemaining)
{
if (_openSinceUtc is { } openedAt)
{
var cooldown = CooldownEscalation[Math.Min(_escalationLevel, CooldownEscalation.Length - 1)];
if (cooldown == TimeSpan.MaxValue)
{
cooldownRemaining = TimeSpan.MaxValue;
return false; // manual reset required
}
if (utcNow - openedAt < cooldown)
{
cooldownRemaining = cooldown - (utcNow - openedAt);
return false;
}
// Cooldown elapsed — close the breaker but keep the sticky alert per spec.
_openSinceUtc = null;
_escalationLevel++;
}
_crashesUtc.RemoveAll(t => utcNow - t > Window);
_crashesUtc.Add(utcNow);
if (_crashesUtc.Count > CrashesAllowedPerWindow)
{
_openSinceUtc = utcNow;
StickyAlertActive = true;
cooldownRemaining = CooldownEscalation[Math.Min(_escalationLevel, CooldownEscalation.Length - 1)];
return false;
}
cooldownRemaining = TimeSpan.Zero;
return true;
}
public void ManualReset()
{
_crashesUtc.Clear();
_openSinceUtc = null;
_escalationLevel = 0;
StickyAlertActive = false;
}
}
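The sliding-window part of `TryRecordCrash` can be illustrated on its own. The sketch below is a simplified stand-in, not the class above: it omits the cooldown escalation and sticky alert, and `RecordCrash` is a hypothetical name. It assumes the same defaults (3 crashes allowed per 5-minute window).

```csharp
using System;
using System.Collections.Generic;

var window = TimeSpan.FromMinutes(5);
const int crashesAllowed = 3;
var crashes = new List<DateTime>();

// Returns true while a respawn is still permitted, false once the window overflows.
bool RecordCrash(DateTime utcNow)
{
    crashes.RemoveAll(t => utcNow - t > window); // forget crashes older than the window
    crashes.Add(utcNow);
    return crashes.Count <= crashesAllowed;
}

var start = new DateTime(2026, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 4; i++)
    Console.WriteLine(RecordCrash(start.AddSeconds(30 * i)));
// prints True, True, True, False
```

With these defaults the fourth crash inside the window is the one that trips the breaker, because the comparison is strictly "more than allowed".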
@@ -1,28 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Supervisor;
/// <summary>
/// Tracks missed heartbeats on the dedicated heartbeat pipe per
/// <c>driver-stability.md §"Heartbeat between proxy and host"</c>: 2s cadence, 3 consecutive
/// misses = host declared dead (~6s detection).
/// </summary>
public sealed class HeartbeatMonitor
{
public int MissesUntilDead { get; init; } = 3;
public TimeSpan Cadence { get; init; } = TimeSpan.FromSeconds(2);
public int ConsecutiveMisses { get; private set; }
public DateTime? LastAckUtc { get; private set; }
public void RecordAck(DateTime utcNow)
{
ConsecutiveMisses = 0;
LastAckUtc = utcNow;
}
public bool RecordMiss()
{
ConsecutiveMisses++;
return ConsecutiveMisses >= MissesUntilDead;
}
}
@@ -1,30 +0,0 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net10.0</TargetFramework>
<Nullable>enable</Nullable>
<ImplicitUsings>enable</ImplicitUsings>
<LangVersion>latest</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
<NoWarn>$(NoWarn);CS1591</NoWarn>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy</RootNamespace>
</PropertyGroup>
<ItemGroup>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.Abstractions\ZB.MOM.WW.OtOpcUa.Core.Abstractions.csproj"/>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core\ZB.MOM.WW.OtOpcUa.Core.csproj"/>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.csproj"/>
<ProjectReference Include="..\ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian\ZB.MOM.WW.OtOpcUa.Core.AlarmHistorian.csproj"/>
</ItemGroup>
<ItemGroup>
<InternalsVisibleTo Include="ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy.Tests"/>
</ItemGroup>
<ItemGroup>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
</ItemGroup>
</Project>
@@ -1,32 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
[MessagePackObject]
public sealed class AlarmSubscribeRequest
{
[Key(0)] public long SessionId { get; set; }
}
[MessagePackObject]
public sealed class GalaxyAlarmEvent
{
[Key(0)] public string EventId { get; set; } = string.Empty;
[Key(1)] public string ObjectTagName { get; set; } = string.Empty;
[Key(2)] public string AlarmName { get; set; } = string.Empty;
[Key(3)] public int Severity { get; set; }
/// <summary>Per OPC UA Part 9 lifecycle: Active, Unacknowledged, Confirmed, Inactive, etc.</summary>
[Key(4)] public string StateTransition { get; set; } = string.Empty;
[Key(5)] public string Message { get; set; } = string.Empty;
[Key(6)] public long UtcUnixMs { get; set; }
}
[MessagePackObject]
public sealed class AlarmAckRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string EventId { get; set; } = string.Empty;
[Key(2)] public string Comment { get; set; } = string.Empty;
}
@@ -1,53 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
/// <summary>
/// IPC-shape for a tag value snapshot. Per decision #13: value + StatusCode + source + server timestamps.
/// </summary>
[MessagePackObject]
public sealed class GalaxyDataValue
{
[Key(0)] public string TagReference { get; set; } = string.Empty;
[Key(1)] public byte[]? ValueBytes { get; set; }
[Key(2)] public int ValueMessagePackType { get; set; }
[Key(3)] public uint StatusCode { get; set; }
[Key(4)] public long SourceTimestampUtcUnixMs { get; set; }
[Key(5)] public long ServerTimestampUtcUnixMs { get; set; }
}
[MessagePackObject]
public sealed class ReadValuesRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string[] TagReferences { get; set; } = System.Array.Empty<string>();
}
[MessagePackObject]
public sealed class ReadValuesResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public GalaxyDataValue[] Values { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
[MessagePackObject]
public sealed class WriteValuesRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public GalaxyDataValue[] Writes { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
[MessagePackObject]
public sealed class WriteValueResult
{
[Key(0)] public string TagReference { get; set; } = string.Empty;
[Key(1)] public uint StatusCode { get; set; }
[Key(2)] public string? Error { get; set; }
}
[MessagePackObject]
public sealed class WriteValuesResponse
{
[Key(0)] public WriteValueResult[] Results { get; set; } = System.Array.Empty<WriteValueResult>();
}
@@ -1,50 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
[MessagePackObject]
public sealed class DiscoverHierarchyRequest
{
[Key(0)] public long SessionId { get; set; }
}
/// <summary>
/// IPC-shape for a Galaxy object. Proxy maps to/from <c>DriverAttributeInfo</c> (Core.Abstractions).
/// </summary>
[MessagePackObject]
public sealed class GalaxyObjectInfo
{
[Key(0)] public string ContainedName { get; set; } = string.Empty;
[Key(1)] public string TagName { get; set; } = string.Empty;
[Key(2)] public string? ParentContainedName { get; set; }
[Key(3)] public string TemplateCategory { get; set; } = string.Empty;
[Key(4)] public GalaxyAttributeInfo[] Attributes { get; set; } = System.Array.Empty<GalaxyAttributeInfo>();
}
[MessagePackObject]
public sealed class GalaxyAttributeInfo
{
[Key(0)] public string AttributeName { get; set; } = string.Empty;
[Key(1)] public int MxDataType { get; set; }
[Key(2)] public bool IsArray { get; set; }
[Key(3)] public uint? ArrayDim { get; set; }
[Key(4)] public int SecurityClassification { get; set; }
[Key(5)] public bool IsHistorized { get; set; }
/// <summary>
/// True when the attribute has an AlarmExtension primitive in the Galaxy repository
/// (<c>primitive_definition.primitive_name = 'AlarmExtension'</c>). The generic
/// node-manager uses this to enrich the variable's OPC UA node with an
/// <c>AlarmConditionState</c> during address-space build. Added in PR 9 as the
/// discovery-side foundation for the alarm event wire-up that follows in PR 10+.
/// </summary>
[Key(6)] public bool IsAlarm { get; set; }
}
[MessagePackObject]
public sealed class DiscoverHierarchyResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public GalaxyObjectInfo[] Objects { get; set; } = System.Array.Empty<GalaxyObjectInfo>();
}
@@ -1,75 +0,0 @@
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
/// <summary>
/// Length-prefixed framing per decision #28. Each IPC frame is:
/// <c>[4-byte big-endian length][1-byte message kind][MessagePack body]</c>.
/// Length is the body size only; the kind byte is not part of the prefixed length.
/// </summary>
public static class Framing
{
public const int LengthPrefixSize = 4;
public const int KindByteSize = 1;
/// <summary>
/// Maximum permitted body length (16 MiB). Protects the receiver from a hostile or
/// misbehaving peer sending an oversized length prefix.
/// </summary>
public const int MaxFrameBodyBytes = 16 * 1024 * 1024;
}
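The frame layout described above can be made concrete with a small encoder. This is a minimal sketch under the stated layout; `EncodeFrame` is a hypothetical helper, not the repository's `FrameWriter`, and the body bytes are arbitrary demo data rather than real MessagePack output.

```csharp
using System;
using System.Buffers.Binary;

// Builds one frame as [4-byte big-endian body length][1-byte kind][body].
static byte[] EncodeFrame(byte kind, ReadOnlySpan<byte> body)
{
    const int lengthPrefixSize = 4; // mirrors Framing.LengthPrefixSize
    const int kindByteSize = 1;     // mirrors Framing.KindByteSize
    var buffer = new byte[lengthPrefixSize + kindByteSize + body.Length];
    // The prefix counts the body only; the kind byte is excluded.
    BinaryPrimitives.WriteInt32BigEndian(buffer.AsSpan(0, lengthPrefixSize), body.Length);
    buffer[lengthPrefixSize] = kind;
    body.CopyTo(buffer.AsSpan(lengthPrefixSize + kindByteSize));
    return buffer;
}

// 0x03 is MessageKind.Heartbeat in the enum below.
var encoded = EncodeFrame(0x03, new byte[] { 0xAA, 0xBB, 0xCC });
Console.WriteLine(Convert.ToHexString(encoded)); // prints 0000000303AABBCC
```

A receiver would read the four prefix bytes first, reject any length above `MaxFrameBodyBytes`, then read one kind byte followed by exactly that many body bytes.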
/// <summary>
/// Wire identifier for each contract. Values are stable — new contracts append.
/// </summary>
public enum MessageKind : byte
{
Hello = 0x01,
HelloAck = 0x02,
Heartbeat = 0x03,
HeartbeatAck = 0x04,
OpenSessionRequest = 0x10,
OpenSessionResponse = 0x11,
CloseSessionRequest = 0x12,
DiscoverHierarchyRequest = 0x20,
DiscoverHierarchyResponse = 0x21,
ReadValuesRequest = 0x30,
ReadValuesResponse = 0x31,
WriteValuesRequest = 0x32,
WriteValuesResponse = 0x33,
SubscribeRequest = 0x40,
SubscribeResponse = 0x41,
UnsubscribeRequest = 0x42,
OnDataChangeNotification = 0x43,
AlarmSubscribeRequest = 0x50,
AlarmEvent = 0x51,
AlarmAckRequest = 0x52,
HistoryReadRequest = 0x60,
HistoryReadResponse = 0x61,
HistoryReadProcessedRequest = 0x62,
HistoryReadProcessedResponse = 0x63,
HistoryReadAtTimeRequest = 0x64,
HistoryReadAtTimeResponse = 0x65,
HistoryReadEventsRequest = 0x66,
HistoryReadEventsResponse = 0x67,
HostConnectivityStatus = 0x70,
RuntimeStatusChange = 0x71,
// Phase 7 Stream D — historian alarm sink. Main server → Galaxy.Host batched
// writes into the Aveva Historian alarm schema via the already-loaded
// aahClientManaged DLLs. HistorianConnectivityStatus fires proactively from the
// Host when the SDK session transitions so diagnostics flip promptly.
HistorianAlarmEventRequest = 0x80,
HistorianAlarmEventResponse = 0x81,
HistorianConnectivityStatus = 0x82,
RecycleHostRequest = 0xF0,
RecycleStatusResponse = 0xF1,
ErrorResponse = 0xFE,
}
@@ -1,36 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
/// <summary>
/// First frame of every connection. Advertises protocol major/minor and the peer's feature set.
/// Major mismatch is fatal; minor is advisory. Per Task A.3.
/// </summary>
[MessagePackObject]
public sealed class Hello
{
public const int CurrentMajor = 1;
public const int CurrentMinor = 0;
[Key(0)] public int ProtocolMajor { get; set; } = CurrentMajor;
[Key(1)] public int ProtocolMinor { get; set; } = CurrentMinor;
[Key(2)] public string PeerName { get; set; } = string.Empty;
/// <summary>Per-process shared secret — verified on the Host side against the value passed by the supervisor at spawn time.</summary>
[Key(3)] public string SharedSecret { get; set; } = string.Empty;
[Key(4)] public string[] Features { get; set; } = System.Array.Empty<string>();
}
[MessagePackObject]
public sealed class HelloAck
{
[Key(0)] public int ProtocolMajor { get; set; } = Hello.CurrentMajor;
[Key(1)] public int ProtocolMinor { get; set; } = Hello.CurrentMinor;
/// <summary>True if the server accepted the hello; false + <see cref="RejectReason"/> filled if not.</summary>
[Key(2)] public bool Accepted { get; set; }
[Key(3)] public string? RejectReason { get; set; }
[Key(4)] public string HostName { get; set; } = string.Empty;
}
@@ -1,92 +0,0 @@
using System;
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
/// <summary>
/// Phase 7 Stream D — IPC contracts for routing Part 9 alarm transitions from the
/// main .NET 10 server into Galaxy.Host's already-loaded <c>aahClientManaged</c>
/// DLLs. Reuses the Tier-C isolation + licensing pathway rather than loading 32-bit
/// native historian code into the main server.
/// </summary>
/// <remarks>
/// <para>
/// Batched on the wire to amortize IPC overhead — the main server's SqliteStoreAndForwardSink
/// ships up to 100 events per request per Phase 7 plan Stream D.5.
/// </para>
/// <para>
/// Per-event outcomes (Ack / RetryPlease / PermanentFail) let the drain worker
/// dead-letter malformed events without blocking neighbors in the batch.
/// <see cref="HistorianConnectivityStatusNotification"/> fires proactively from
/// the Host when the SDK session drops so the /hosts + /alarms/historian Admin
/// diagnostics pages flip to red promptly instead of waiting for the next
/// drain cycle.
/// </para>
/// </remarks>
[MessagePackObject]
public sealed class HistorianAlarmEventRequest
{
[Key(0)] public HistorianAlarmEventDto[] Events { get; set; } = Array.Empty<HistorianAlarmEventDto>();
}
[MessagePackObject]
public sealed class HistorianAlarmEventResponse
{
/// <summary>Per-event outcome, same order as the request.</summary>
[Key(0)] public HistorianAlarmEventOutcomeDto[] Outcomes { get; set; } = Array.Empty<HistorianAlarmEventOutcomeDto>();
}
/// <summary>Outcome enum — bytes on the wire so it stays compact.</summary>
public enum HistorianAlarmEventOutcomeDto : byte
{
/// <summary>Successfully persisted to the historian — remove from queue.</summary>
Ack = 0,
/// <summary>Transient failure (historian disconnected, timeout, busy) — retry after backoff.</summary>
RetryPlease = 1,
/// <summary>Permanent failure (malformed, unrecoverable SDK error) — move to dead-letter.</summary>
PermanentFail = 2,
}
/// <summary>One alarm-transition payload. Fields mirror <c>Core.AlarmHistorian.AlarmHistorianEvent</c>.</summary>
[MessagePackObject]
public sealed class HistorianAlarmEventDto
{
[Key(0)] public string AlarmId { get; set; } = string.Empty;
[Key(1)] public string EquipmentPath { get; set; } = string.Empty;
[Key(2)] public string AlarmName { get; set; } = string.Empty;
/// <summary>Concrete Part 9 subtype name — "LimitAlarm" / "OffNormalAlarm" / "AlarmCondition" / "DiscreteAlarm".</summary>
[Key(3)] public string AlarmTypeName { get; set; } = string.Empty;
/// <summary>Numeric severity the Host maps to the historian's priority scale.</summary>
[Key(4)] public int Severity { get; set; }
/// <summary>Which transition this event represents — "Activated" / "Cleared" / "Acknowledged" / etc.</summary>
[Key(5)] public string EventKind { get; set; } = string.Empty;
/// <summary>Pre-rendered message — template tokens resolved upstream.</summary>
[Key(6)] public string Message { get; set; } = string.Empty;
/// <summary>Operator who triggered the transition. "system" for engine-driven events.</summary>
[Key(7)] public string User { get; set; } = "system";
/// <summary>Operator-supplied free-form comment, if any.</summary>
[Key(8)] public string? Comment { get; set; }
/// <summary>Source timestamp (UTC Unix milliseconds).</summary>
[Key(9)] public long TimestampUtcUnixMs { get; set; }
}
/// <summary>
/// Proactive notification — Galaxy.Host pushes this when the historian SDK session
/// transitions (connected / disconnected / degraded). The main server reflects this
/// into the historian sink status so Admin UI surfaces the problem without the
/// operator having to scrutinize drain cadence.
/// </summary>
[MessagePackObject]
public sealed class HistorianConnectivityStatusNotification
{
[Key(0)] public string Status { get; set; } = "unknown"; // connected | disconnected | degraded
[Key(1)] public string? Detail { get; set; }
[Key(2)] public long ObservedAtUtcUnixMs { get; set; }
}
@@ -1,110 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
[MessagePackObject]
public sealed class HistoryReadRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string[] TagReferences { get; set; } = System.Array.Empty<string>();
[Key(2)] public long StartUtcUnixMs { get; set; }
[Key(3)] public long EndUtcUnixMs { get; set; }
[Key(4)] public uint MaxValuesPerTag { get; set; } = 1000;
}
[MessagePackObject]
public sealed class HistoryTagValues
{
[Key(0)] public string TagReference { get; set; } = string.Empty;
[Key(1)] public GalaxyDataValue[] Values { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
[MessagePackObject]
public sealed class HistoryReadResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public HistoryTagValues[] Tags { get; set; } = System.Array.Empty<HistoryTagValues>();
}
/// <summary>
/// Processed (aggregated) historian read — OPC UA HistoryReadProcessed service. The
/// aggregate column is a string (e.g. "Average", "Minimum") mapped by the Proxy from the
/// OPC UA HistoryAggregateType enum so Galaxy.Host stays OPC-UA-free.
/// </summary>
[MessagePackObject]
public sealed class HistoryReadProcessedRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string TagReference { get; set; } = string.Empty;
[Key(2)] public long StartUtcUnixMs { get; set; }
[Key(3)] public long EndUtcUnixMs { get; set; }
[Key(4)] public long IntervalMs { get; set; }
[Key(5)] public string AggregateColumn { get; set; } = "Average";
}
[MessagePackObject]
public sealed class HistoryReadProcessedResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public GalaxyDataValue[] Values { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
/// <summary>
/// At-time historian read — OPC UA HistoryReadAtTime service. Returns one sample per
/// requested timestamp (interpolated when no exact match exists). The per-timestamp array
/// is encoded on the wire as Unix milliseconds to avoid MessagePack DateTime quirks.
/// </summary>
[MessagePackObject]
public sealed class HistoryReadAtTimeRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string TagReference { get; set; } = string.Empty;
[Key(2)] public long[] TimestampsUtcUnixMs { get; set; } = System.Array.Empty<long>();
}
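The Unix-millisecond encoding used by these contracts can be sketched as a round trip. `ToUnixMs` and `FromUnixMs` are hypothetical helpers for illustration; the repository's own conversion code may differ. The sketch assumes the input is already a UTC instant and only stamps the kind, it does not convert from local time.

```csharp
using System;

// Treats the input as UTC (no time-zone conversion) and returns Unix milliseconds.
static long ToUnixMs(DateTime utc) =>
    new DateTimeOffset(DateTime.SpecifyKind(utc, DateTimeKind.Utc)).ToUnixTimeMilliseconds();

// Rebuilds a UTC DateTime from the wire value.
static DateTime FromUnixMs(long unixMs) =>
    DateTimeOffset.FromUnixTimeMilliseconds(unixMs).UtcDateTime;

var sample = new DateTime(2026, 4, 30, 12, 28, 1, DateTimeKind.Utc);
long wire = ToUnixMs(sample);
Console.WriteLine(FromUnixMs(wire) == sample); // prints True
```

Sub-millisecond ticks are truncated by this encoding, so a round trip is exact only for timestamps that fall on a millisecond boundary.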
[MessagePackObject]
public sealed class HistoryReadAtTimeResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public GalaxyDataValue[] Values { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
/// <summary>
/// Historical events read — OPC UA HistoryReadEvents service and Alarm &amp; Condition
/// history. <c>SourceName</c> null means "all sources". Distinct from the live
/// <see cref="GalaxyAlarmEvent"/> stream because historical rows carry both
/// <c>EventTime</c> (when the event occurred in the process) and <c>ReceivedTime</c>
/// (when the Historian persisted it) and have no StateTransition — the Historian logs
/// the instantaneous event, not the OPC UA alarm lifecycle.
/// </summary>
[MessagePackObject]
public sealed class HistoryReadEventsRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string? SourceName { get; set; }
[Key(2)] public long StartUtcUnixMs { get; set; }
[Key(3)] public long EndUtcUnixMs { get; set; }
[Key(4)] public int MaxEvents { get; set; } = 1000;
}
[MessagePackObject]
public sealed class GalaxyHistoricalEvent
{
[Key(0)] public string EventId { get; set; } = string.Empty;
[Key(1)] public string? SourceName { get; set; }
[Key(2)] public long EventTimeUtcUnixMs { get; set; }
[Key(3)] public long ReceivedTimeUtcUnixMs { get; set; }
[Key(4)] public string? DisplayText { get; set; }
[Key(5)] public ushort Severity { get; set; }
}
[MessagePackObject]
public sealed class HistoryReadEventsResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public GalaxyHistoricalEvent[] Events { get; set; } = System.Array.Empty<GalaxyHistoricalEvent>();
}
@@ -1,47 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
[MessagePackObject]
public sealed class OpenSessionRequest
{
[Key(0)] public string DriverInstanceId { get; set; } = string.Empty;
/// <summary>JSON blob sourced from <c>DriverInstance.DriverConfig</c>.</summary>
[Key(1)] public string DriverConfigJson { get; set; } = string.Empty;
}
[MessagePackObject]
public sealed class OpenSessionResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public long SessionId { get; set; }
}
[MessagePackObject]
public sealed class CloseSessionRequest
{
[Key(0)] public long SessionId { get; set; }
}
[MessagePackObject]
public sealed class Heartbeat
{
[Key(0)] public long SequenceNumber { get; set; }
[Key(1)] public long UtcUnixMs { get; set; }
}
[MessagePackObject]
public sealed class HeartbeatAck
{
[Key(0)] public long SequenceNumber { get; set; }
[Key(1)] public long UtcUnixMs { get; set; }
}
[MessagePackObject]
public sealed class ErrorResponse
{
[Key(0)] public string Code { get; set; } = string.Empty;
[Key(1)] public string Message { get; set; } = string.Empty;
}
@@ -1,34 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
/// <summary>Per-host runtime status — per <c>driver-stability.md</c> Galaxy §"Connection Health Probe".</summary>
[MessagePackObject]
public sealed class HostConnectivityStatus
{
[Key(0)] public string HostName { get; set; } = string.Empty;
[Key(1)] public string RuntimeStatus { get; set; } = string.Empty; // Running | Stopped | Unknown
[Key(2)] public long LastObservedUtcUnixMs { get; set; }
}
[MessagePackObject]
public sealed class RuntimeStatusChangeNotification
{
[Key(0)] public HostConnectivityStatus Status { get; set; } = new();
}
[MessagePackObject]
public sealed class RecycleHostRequest
{
/// <summary>One of: Soft, Hard.</summary>
[Key(0)] public string Kind { get; set; } = "Soft";
[Key(1)] public string Reason { get; set; } = string.Empty;
}
[MessagePackObject]
public sealed class RecycleStatusResponse
{
[Key(0)] public bool Accepted { get; set; }
[Key(1)] public int GraceSeconds { get; set; } = 15;
[Key(2)] public string? Error { get; set; }
}
@@ -1,34 +0,0 @@
using MessagePack;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
[MessagePackObject]
public sealed class SubscribeRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public string[] TagReferences { get; set; } = System.Array.Empty<string>();
[Key(2)] public int RequestedIntervalMs { get; set; } = 1000;
}
[MessagePackObject]
public sealed class SubscribeResponse
{
[Key(0)] public bool Success { get; set; }
[Key(1)] public string? Error { get; set; }
[Key(2)] public long SubscriptionId { get; set; }
[Key(3)] public int ActualIntervalMs { get; set; }
}
[MessagePackObject]
public sealed class UnsubscribeRequest
{
[Key(0)] public long SessionId { get; set; }
[Key(1)] public long SubscriptionId { get; set; }
}
[MessagePackObject]
public sealed class OnDataChangeNotification
{
[Key(0)] public long SubscriptionId { get; set; }
[Key(1)] public GalaxyDataValue[] Values { get; set; } = System.Array.Empty<GalaxyDataValue>();
}
@@ -1,67 +0,0 @@
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
/// <summary>
/// Reads length-prefixed, kind-tagged frames from a stream. Single-consumer — do not call
/// <see cref="ReadFrameAsync"/> from multiple threads against the same instance.
/// </summary>
public sealed class FrameReader : IDisposable
{
private readonly Stream _stream;
private readonly bool _leaveOpen;
public FrameReader(Stream stream, bool leaveOpen = false)
{
_stream = stream ?? throw new ArgumentNullException(nameof(stream));
_leaveOpen = leaveOpen;
}
public async Task<(MessageKind Kind, byte[] Body)?> ReadFrameAsync(CancellationToken ct)
{
var lengthPrefix = new byte[Framing.LengthPrefixSize];
if (!await ReadExactAsync(lengthPrefix, ct).ConfigureAwait(false))
return null; // clean EOF on frame boundary
var length = (lengthPrefix[0] << 24) | (lengthPrefix[1] << 16) | (lengthPrefix[2] << 8) | lengthPrefix[3];
if (length < 0 || length > Framing.MaxFrameBodyBytes)
throw new InvalidDataException($"IPC frame length {length} out of range.");
var kindByte = _stream.ReadByte();
if (kindByte < 0) throw new EndOfStreamException("EOF after length prefix, before kind byte.");
var body = new byte[length];
if (!await ReadExactAsync(body, ct).ConfigureAwait(false))
throw new EndOfStreamException("EOF mid-frame.");
return ((MessageKind)(byte)kindByte, body);
}
public static T Deserialize<T>(byte[] body) => MessagePackSerializer.Deserialize<T>(body);
private async Task<bool> ReadExactAsync(byte[] buffer, CancellationToken ct)
{
var offset = 0;
while (offset < buffer.Length)
{
var read = await _stream.ReadAsync(buffer, offset, buffer.Length - offset, ct).ConfigureAwait(false);
if (read == 0)
{
if (offset == 0) return false;
throw new EndOfStreamException($"Stream ended after reading {offset} of {buffer.Length} bytes.");
}
offset += read;
}
return true;
}
public void Dispose()
{
if (!_leaveOpen) _stream.Dispose();
}
}
@@ -1,57 +0,0 @@
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using MessagePack;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared.Contracts;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared;
/// <summary>
/// Writes length-prefixed, kind-tagged MessagePack frames to a stream. Thread-safe via
/// <see cref="SemaphoreSlim"/> — multiple producers (e.g. heartbeat + data-plane sharing a stream)
/// get serialized writes.
/// </summary>
public sealed class FrameWriter : IDisposable
{
private readonly Stream _stream;
private readonly SemaphoreSlim _gate = new(1, 1);
private readonly bool _leaveOpen;
public FrameWriter(Stream stream, bool leaveOpen = false)
{
_stream = stream ?? throw new ArgumentNullException(nameof(stream));
_leaveOpen = leaveOpen;
}
public async Task WriteAsync<T>(MessageKind kind, T message, CancellationToken ct)
{
var body = MessagePackSerializer.Serialize(message, cancellationToken: ct);
if (body.Length > Framing.MaxFrameBodyBytes)
throw new InvalidOperationException(
$"IPC frame body {body.Length} exceeds {Framing.MaxFrameBodyBytes} byte cap.");
var lengthPrefix = new byte[Framing.LengthPrefixSize];
// Big-endian — easy to read in hex dumps.
lengthPrefix[0] = (byte)((body.Length >> 24) & 0xFF);
lengthPrefix[1] = (byte)((body.Length >> 16) & 0xFF);
lengthPrefix[2] = (byte)((body.Length >> 8) & 0xFF);
lengthPrefix[3] = (byte)( body.Length & 0xFF);
await _gate.WaitAsync(ct).ConfigureAwait(false);
try
{
await _stream.WriteAsync(lengthPrefix, 0, lengthPrefix.Length, ct).ConfigureAwait(false);
_stream.WriteByte((byte)kind);
await _stream.WriteAsync(body, 0, body.Length, ct).ConfigureAwait(false);
await _stream.FlushAsync(ct).ConfigureAwait(false);
}
finally { _gate.Release(); }
}
public void Dispose()
{
_gate.Dispose();
if (!_leaveOpen) _stream.Dispose();
}
}
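The removed FrameReader and FrameWriter share one wire layout: a 4-byte big-endian body length, one kind byte, then the body. The following is a minimal sketch of that layout only. It assumes raw bytes in place of the MessagePack payload, runs synchronously for brevity, and omits the `Framing.MaxFrameBodyBytes` cap; the helper names `WriteFrame`, `ReadFrame` and `ReadUpTo` are hypothetical and not part of the removed code.

```csharp
using System;
using System.IO;

// Hypothetical helpers illustrating the layout:
// [4-byte big-endian body length][1-byte kind][body].
static void WriteFrame(Stream stream, byte kind, byte[] body)
{
    var prefix = new byte[4];
    prefix[0] = (byte)((body.Length >> 24) & 0xFF);
    prefix[1] = (byte)((body.Length >> 16) & 0xFF);
    prefix[2] = (byte)((body.Length >> 8) & 0xFF);
    prefix[3] = (byte)(body.Length & 0xFF);
    stream.Write(prefix, 0, prefix.Length);
    stream.WriteByte(kind);
    stream.Write(body, 0, body.Length);
}

// Reads until the buffer is full or the stream ends; returns the bytes actually read.
static int ReadUpTo(Stream stream, byte[] buffer)
{
    var offset = 0;
    while (offset < buffer.Length)
    {
        var read = stream.Read(buffer, offset, buffer.Length - offset);
        if (read == 0) break;
        offset += read;
    }
    return offset;
}

static (byte Kind, byte[] Body)? ReadFrame(Stream stream)
{
    var prefix = new byte[4];
    var got = ReadUpTo(stream, prefix);
    if (got == 0) return null; // clean EOF on a frame boundary
    if (got < prefix.Length) throw new EndOfStreamException("EOF inside length prefix.");

    var length = (prefix[0] << 24) | (prefix[1] << 16) | (prefix[2] << 8) | prefix[3];
    if (length < 0) throw new InvalidDataException($"Frame length {length} out of range.");

    var kind = stream.ReadByte();
    if (kind < 0) throw new EndOfStreamException("EOF after length prefix, before kind byte.");

    var body = new byte[length];
    if (ReadUpTo(stream, body) < length) throw new EndOfStreamException("EOF mid-frame.");
    return ((byte)kind, body);
}

// Round-trip one frame through an in-memory stream, then hit EOF.
var ms = new MemoryStream();
WriteFrame(ms, kind: 7, body: new byte[] { 1, 2, 3 });
ms.Position = 0;
var frame = ReadFrame(ms);
var eof = ReadFrame(ms);
```

The second `ReadFrame` call returns `null` because the stream ends exactly on a frame boundary, which is the case the original reader treats as a clean shutdown rather than an error.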
@@ -1,23 +0,0 @@
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<Nullable>enable</Nullable>
<LangVersion>latest</LangVersion>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<GenerateDocumentationFile>true</GenerateDocumentationFile>
<NoWarn>$(NoWarn);CS1591</NoWarn>
<RootNamespace>ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Shared</RootNamespace>
</PropertyGroup>
<ItemGroup>
<!-- Decision #32: MessagePack for IPC. Netstandard 2.0 consumable by both .NET 10 (Proxy) + .NET 4.8 (Host). -->
<PackageReference Include="MessagePack" Version="2.5.187"/>
</ItemGroup>
<ItemGroup>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-37gx-xxp4-5rgx"/>
<NuGetAuditSuppress Include="https://github.com/advisories/GHSA-w3x6-4m5h-cxqf"/>
</ItemGroup>
</Project>
@@ -52,7 +52,7 @@ public sealed class GalaxyDiscoverer
if (string.IsNullOrEmpty(attr.AttributeName)) continue;
var fullReference = !string.IsNullOrEmpty(attr.FullTagReference)
? attr.FullTagReference
? StripArraySuffix(attr.FullTagReference)
: obj.TagName + "." + attr.AttributeName;
var info = new DriverAttributeInfo(
@@ -77,4 +77,15 @@ public sealed class GalaxyDiscoverer
}
}
}
// PR 5.W workaround for mxaccessgw GalaxyRepository.cs:173-175 — the gateway's
// SQL appends `[]` to array-typed `full_tag_reference` values, but MxAccess COM
// `IInstance.AddItem` doesn't accept `[]`-suffixed addresses (so any downstream
// Subscribe/Read/Write through the worker would fail with the suffixed form).
// Strip defensively here so the parity matrix can run today; remove once the
// gw fix (mxaccessgw/requirements-array-suffix-fix.md) lands.
private static string StripArraySuffix(string fullReference) =>
fullReference.EndsWith("[]", StringComparison.Ordinal)
? fullReference[..^2]
: fullReference;
}
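To make the scope of the workaround concrete, here is a small sketch that copies the `StripArraySuffix` expression from the hunk above into a local function and runs it on three sample references. The tag names are hypothetical; only a trailing `[]` is removed, so indexed references and scalars pass through unchanged.

```csharp
using System;

// Same expression as the private helper in the hunk above, lifted into a local function.
static string StripArraySuffix(string fullReference) =>
    fullReference.EndsWith("[]", StringComparison.Ordinal)
        ? fullReference[..^2]
        : fullReference;

// Hypothetical sample references.
var stripped = StripArraySuffix("Tank.Levels[]");   // trailing "[]" removed
var scalar = StripArraySuffix("Tank.Level");        // unchanged
var indexed = StripArraySuffix("Tank.Levels[3]");   // unchanged: suffix is "[3]", not "[]"
```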
@@ -0,0 +1,30 @@
using MxGateway.Contracts.Proto.Galaxy;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
/// <summary>
/// PR 6.1 — Decorator that emits one <see cref="System.Diagnostics.Activity"/> span
/// per <c>GetHierarchy</c> RPC. <c>galaxy.object_count</c> on the span lets ops
/// correlate slow Discover passes with Galaxy size without instrumenting the
/// discoverer's translation step.
/// </summary>
internal sealed class TracedGalaxyHierarchySource(IGalaxyHierarchySource inner, string clientName) : IGalaxyHierarchySource
{
public async Task<IReadOnlyList<GalaxyObject>> GetHierarchyAsync(CancellationToken cancellationToken)
{
using var activity = GalaxyTelemetry.ActivitySource.StartActivity("galaxy.get_hierarchy");
activity?.SetTag("galaxy.client", clientName);
try
{
var hierarchy = await inner.GetHierarchyAsync(cancellationToken).ConfigureAwait(false);
activity?.SetTag("galaxy.object_count", hierarchy.Count);
return hierarchy;
}
catch (Exception ex)
{
activity.RecordError(ex);
throw;
}
}
}
@@ -22,13 +22,22 @@ public sealed record GalaxyDriverOptions(
/// through the server-side secret store (DPAPI for production, environment override for
/// dev) — the API key never appears in cleartext config.
/// </summary>
// PR 6.5 tuning notes:
// ConnectTimeoutSeconds = 10 — cold-start network path comfort margin; soak runs
// never saw a successful connect take >2s, so 10s is generous without being lax.
// DefaultCallTimeoutSeconds = 30 — bumped from 5s because a 50k-tag SubscribeBulk
// can exceed 5s under MxAccess COM contention (the worker walks the gw item list
// serially under the apartment lock). 30s leaves comfortable headroom for the
// legitimate worst case while still failing fast on a wedged worker.
// StreamTimeoutSeconds = 0 — unlimited; the StreamEvents RPC must run for the
// lifetime of the driver. Set a finite value only for diagnostic runs.
public sealed record GalaxyGatewayOptions(
string Endpoint,
string ApiKeySecretRef,
bool UseTls = true,
string? CaCertificatePath = null,
int ConnectTimeoutSeconds = 10,
int DefaultCallTimeoutSeconds = 5,
int DefaultCallTimeoutSeconds = 30,
int StreamTimeoutSeconds = 0);
/// <summary>
@@ -47,10 +56,17 @@ public sealed record GalaxyGatewayOptions(
/// Reserved for ArchestrA secured-write user mapping; PR 4.3 wires <c>WriteSecured</c>
/// routing against this id. 0 = anonymous.
/// </param>
/// <param name="EventPumpChannelCapacity">
/// Bounded-channel size between the EventPump's network-read loop and its listener
/// fan-out loop (PR 6.2). Default 50_000 = one second of headroom at 50k tags / 1Hz;
/// raise it when <c>galaxy.events.dropped</c> shows up under transient consumer
/// slowness, lower it on a memory-tight host where the headroom isn't needed.
/// </param>
public sealed record GalaxyMxAccessOptions(
string ClientName,
int PublishingIntervalMs = 1000,
int WriteUserId = 0);
int WriteUserId = 0,
int EventPumpChannelCapacity = 50_000);
/// <summary>
/// Galaxy Repository browse-side knobs consumed by PR 4.1's <c>GalaxyDiscoverer</c>.
@@ -1,6 +1,7 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Client;
using MxGateway.Contracts.Proto;
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Browse;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Config;
@@ -185,8 +186,16 @@ public sealed class GalaxyDriver
_ownedMxSession = new GalaxyMxSession(_options.MxAccess, _logger);
await _ownedMxSession.ConnectAsync(clientOptions, cancellationToken).ConfigureAwait(false);
_subscriber = new GatewayGalaxySubscriber(_ownedMxSession);
_dataWriter = new GatewayGalaxyDataWriter(_ownedMxSession, _options.MxAccess.WriteUserId, _logger);
// PR 6.1 — wrap the gw-facing seams in tracing decorators so every Subscribe /
// Unsubscribe / Write / StreamEvents call emits a span on the
// "ZB.MOM.WW.OtOpcUa.Driver.Galaxy" ActivitySource. The host process's tracing
// listener (OTLP exporter, dotnet-trace, etc.) consumes these without the driver
// taking a dependency on the OpenTelemetry packages.
_subscriber = new TracedGalaxySubscriber(
new GatewayGalaxySubscriber(_ownedMxSession), _options.MxAccess.ClientName);
_dataWriter = new TracedGalaxyDataWriter(
new GatewayGalaxyDataWriter(_ownedMxSession, _options.MxAccess.WriteUserId, _logger),
_options.MxAccess.ClientName);
_supervisor = new ReconnectSupervisor(
reopen: ReopenAsync,
@@ -201,7 +210,9 @@ public sealed class GalaxyDriver
_supervisor.StateChanged += OnSupervisorStateChanged;
_probeWatcher = new PerPlatformProbeWatcher(_subscriber, _hostStatuses, _logger);
_probeWatcher = new PerPlatformProbeWatcher(
_subscriber, _hostStatuses, _logger,
bufferedUpdateIntervalMs: _options.MxAccess.PublishingIntervalMs);
}
/// <summary>
@@ -252,10 +263,58 @@ public sealed class GalaxyDriver
}
}
/// <summary>
/// Resolves <c>Gateway.ApiKeySecretRef</c> to the actual API key string. Three
/// forms supported, evaluated in order:
/// <list type="number">
/// <item><c>env:NAME</c> — reads <c>Environment.GetEnvironmentVariable(NAME)</c>.
/// Throws when the variable is unset, so a misconfigured deployment fails
/// fast at InitializeAsync rather than silently sending an empty key.</item>
/// <item><c>file:PATH</c> — reads UTF-8 text from <c>PATH</c>, trimming
/// whitespace. Lets operators stash the key in an ACL'd file outside the
/// repo (the same pattern as the legacy <c>.local/galaxy-host-secret.txt</c>).</item>
/// <item>Anything else — used as the literal API key. Convenient for dev,
/// and avoids breaking existing configs that pre-date this resolver.</item>
/// </list>
/// A future PR can swap any of these arms for a DPAPI-backed lookup without
/// changing the call site.
/// </summary>
internal static string ResolveApiKey(string secretRef)
{
ArgumentException.ThrowIfNullOrEmpty(secretRef);
if (secretRef.StartsWith("env:", StringComparison.OrdinalIgnoreCase))
{
var name = secretRef[4..];
var value = Environment.GetEnvironmentVariable(name);
return !string.IsNullOrEmpty(value)
? value
: throw new InvalidOperationException(
$"Galaxy.Gateway.ApiKeySecretRef='{secretRef}' resolves to env var '{name}', but it is unset.");
}
if (secretRef.StartsWith("file:", StringComparison.OrdinalIgnoreCase))
{
var path = secretRef[5..];
if (!File.Exists(path))
{
throw new InvalidOperationException(
$"Galaxy.Gateway.ApiKeySecretRef='{secretRef}' points at '{path}', which doesn't exist.");
}
var contents = File.ReadAllText(path).Trim();
return !string.IsNullOrEmpty(contents)
? contents
: throw new InvalidOperationException(
$"Galaxy.Gateway.ApiKeySecretRef='{secretRef}' file '{path}' is empty.");
}
return secretRef;
}
private static MxGatewayClientOptions BuildClientOptions(GalaxyGatewayOptions gw) => new()
{
Endpoint = new Uri(gw.Endpoint, UriKind.Absolute),
ApiKey = gw.ApiKeySecretRef,
ApiKey = ResolveApiKey(gw.ApiKeySecretRef),
UseTls = gw.UseTls,
CaCertificatePath = gw.CaCertificatePath,
ConnectTimeout = TimeSpan.FromSeconds(gw.ConnectTimeoutSeconds),
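The three `ApiKeySecretRef` forms handled by `ResolveApiKey` can be exercised with a short sketch. Because the real method is `internal`, the code below uses a condensed, hypothetical mirror named `Resolve` that follows the same order of checks; the environment variable name and the temp file are assumptions made for the demonstration.

```csharp
using System;
using System.IO;

// Condensed, hypothetical mirror of ResolveApiKey from the hunk above (illustration only).
static string Resolve(string secretRef)
{
    if (secretRef.StartsWith("env:", StringComparison.OrdinalIgnoreCase))
    {
        var name = secretRef.Substring(4);
        var value = Environment.GetEnvironmentVariable(name);
        return !string.IsNullOrEmpty(value)
            ? value
            : throw new InvalidOperationException($"env var '{name}' is unset.");
    }

    if (secretRef.StartsWith("file:", StringComparison.OrdinalIgnoreCase))
    {
        var path = secretRef.Substring(5);
        if (!File.Exists(path))
            throw new InvalidOperationException($"'{path}' doesn't exist.");
        var contents = File.ReadAllText(path).Trim();
        return !string.IsNullOrEmpty(contents)
            ? contents
            : throw new InvalidOperationException($"'{path}' is empty.");
    }

    return secretRef; // anything else is taken as the literal key
}

// Hypothetical inputs for the demonstration.
Environment.SetEnvironmentVariable("DEMO_GALAXY_API_KEY", "key-from-env");
var keyPath = Path.GetTempFileName();
File.WriteAllText(keyPath, "  key-from-file\n");

var fromEnv = Resolve("env:DEMO_GALAXY_API_KEY");
var fromFile = Resolve("file:" + keyPath);   // surrounding whitespace is trimmed
var literal = Resolve("plain-literal-key");

File.Delete(keyPath);
```

One consequence of the prefix check is that a literal key beginning with `env:` or `file:` cannot be expressed in the literal form; it would be routed to the corresponding lookup instead.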
@@ -357,7 +416,7 @@ public sealed class GalaxyDriver
private SecurityClassification ResolveSecurity(string fullReference) =>
_securityByFullRef.TryGetValue(fullReference, out var sec) ? sec : SecurityClassification.FreeAccess;
// ===== IReadable (PR 4.2 — abstraction; PR 4.4 supplies production reader) =====
// ===== IReadable =====
/// <inheritdoc />
public Task<IReadOnlyList<DataValueSnapshot>> ReadAsync(
@@ -367,19 +426,152 @@ public sealed class GalaxyDriver
ArgumentNullException.ThrowIfNull(fullReferences);
if (fullReferences.Count == 0) return Task.FromResult<IReadOnlyList<DataValueSnapshot>>([]);
if (_dataReader is null)
if (_dataReader is not null)
{
// The production GW-backed reader builds on the StreamEvents pump that PR 4.4
// ships; until then a real gateway-driver instance can't fulfill reads.
// Tests that need to exercise IReadable inject a fake reader via the internal
// ctor; production deployments running on this PR should keep the
// legacy-host backend selected via the Galaxy:Backend flag (PR 4.W).
throw new NotSupportedException(
"GalaxyDriver.ReadAsync requires the StreamEvents-backed reader from PR 4.4. " +
"Until that lands, route reads through the legacy-host backend (Galaxy:Backend=legacy-host).");
// Test-only path — tests inject a canned reader via the internal ctor.
return _dataReader.ReadAsync(fullReferences, cancellationToken);
}
return _dataReader.ReadAsync(fullReferences, cancellationToken);
if (_subscriber is null)
{
throw new NotSupportedException(
"GalaxyDriver.ReadAsync requires a connected GalaxyMxSession (production runtime not built). " +
"Either inject a test seam via the internal ctor or call InitializeAsync against a real gateway.");
}
return ReadViaSubscribeOnceAsync(fullReferences, cancellationToken);
}
/// <summary>
/// Production read path. MxAccess has no one-shot Read RPC — every value comes
/// through the event stream. We synthesise a Read by:
/// <list type="number">
/// <item>Subscribing the requested tags through the existing
/// <see cref="SubscriptionRegistry"/> + <see cref="EventPump"/>.</item>
/// <item>Waiting for the first <c>OnDataChange</c> per item handle (the gateway
/// pushes the current value as the initial event after a SubscribeBulk).</item>
/// <item>Unsubscribing.</item>
/// </list>
/// Tags the gw rejects at SubscribeBulk time, or that never publish before the
/// caller's cancellation token fires, return a Bad-status snapshot in input order
/// so the caller still sees one snapshot per requested reference.
/// </summary>
private async Task<IReadOnlyList<DataValueSnapshot>> ReadViaSubscribeOnceAsync(
IReadOnlyList<string> fullReferences, CancellationToken cancellationToken)
{
var pump = EnsureEventPumpStarted();
var subscriptionId = _subscriptions.NextSubscriptionId();
// Pre-allocate one TaskCompletionSource per full-reference so the OnDataChange
// handler can complete them out-of-order as events arrive. Wired BEFORE the
// SubscribeBulk call so we don't race with the first event the gw pushes.
var pendingByRef = new Dictionary<string, TaskCompletionSource<DataValueSnapshot>>(
StringComparer.OrdinalIgnoreCase);
foreach (var fullRef in fullReferences.Distinct(StringComparer.OrdinalIgnoreCase))
{
pendingByRef[fullRef] = new TaskCompletionSource<DataValueSnapshot>(
TaskCreationOptions.RunContinuationsAsynchronously);
}
EventHandler<DataChangeEventArgs> handler = (_, args) =>
{
// Filter to OUR subscription — the pump's OnDataChange fans out across all
// subscriptions on the driver, and we don't want a parallel ISubscribable
// caller's events to leak into our read.
if (args.SubscriptionHandle is GalaxySubscriptionHandle gsh
&& gsh.SubscriptionId == subscriptionId
&& pendingByRef.TryGetValue(args.FullReference, out var tcs))
{
tcs.TrySetResult(args.Snapshot);
}
};
pump.OnDataChange += handler;
var bufferedIntervalMs = _options.MxAccess.PublishingIntervalMs;
IReadOnlyList<SubscribeResult> results;
try
{
results = await _subscriber!
.SubscribeBulkAsync(fullReferences, bufferedIntervalMs, cancellationToken)
.ConfigureAwait(false);
}
catch
{
pump.OnDataChange -= handler;
throw;
}
// Register bindings so the pump knows to dispatch events for these handles.
var bindings = new List<TagBinding>(fullReferences.Count);
for (var i = 0; i < fullReferences.Count; i++)
{
var fullRef = fullReferences[i];
var match = results.FirstOrDefault(r => string.Equals(r.TagAddress, fullRef, StringComparison.OrdinalIgnoreCase));
var itemHandle = match is { WasSuccessful: true } ? match.ItemHandle : 0;
bindings.Add(new TagBinding(fullRef, itemHandle));
// Tags the gw rejected up front — complete with Bad status now so the
// wait below doesn't time out on them.
if (itemHandle <= 0
&& pendingByRef.TryGetValue(fullRef, out var rejectedTcs))
{
rejectedTcs.TrySetResult(new DataValueSnapshot(
Value: null,
StatusCode: 0x80000000u, // Bad
SourceTimestampUtc: null,
ServerTimestampUtc: DateTime.UtcNow));
}
}
_subscriptions.Register(subscriptionId, bindings);
try
{
// Wait for every pending TCS to complete or the caller's CT to fire. When the
// CT fires before all values arrive, fill the still-pending entries with a
// Bad-status snapshot rather than throwing — Read semantics let callers see
// partial results.
using var registration = cancellationToken.Register(() =>
{
foreach (var tcs in pendingByRef.Values)
{
tcs.TrySetResult(new DataValueSnapshot(
Value: null,
StatusCode: 0x800A0000u, // BadTimeout (0x800B0000 is BadServiceUnsupported)
SourceTimestampUtc: null,
ServerTimestampUtc: DateTime.UtcNow));
}
});
var snapshots = new DataValueSnapshot[fullReferences.Count];
for (var i = 0; i < fullReferences.Count; i++)
{
snapshots[i] = await pendingByRef[fullReferences[i]].Task.ConfigureAwait(false);
}
return snapshots;
}
finally
{
pump.OnDataChange -= handler;
// Drop the bindings + unsubscribe the live handles. UnsubscribeBulkAsync's
// failure isn't fatal — the registry is already cleared, so any straggling
// event from the gw would be a no-op fan-out.
_subscriptions.Remove(subscriptionId);
var liveHandles = bindings.Where(b => b.ItemHandle > 0).Select(b => b.ItemHandle).ToArray();
if (liveHandles.Length > 0)
{
try
{
await _subscriber!.UnsubscribeBulkAsync(liveHandles, CancellationToken.None)
.ConfigureAwait(false);
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"GalaxyDriver.ReadViaSubscribeOnceAsync UnsubscribeBulk failed for {Count} handle(s) — registry already cleared.",
liveHandles.Length);
}
}
}
}
// ===== IWritable (PR 4.3) =====
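The core of `ReadViaSubscribeOnceAsync` is a pattern that can be shown in isolation: one `TaskCompletionSource` per requested reference, a handler wired before anything can publish, and a cancellation callback that fills the still-pending entries instead of throwing. The sketch below is a simplified stand-in under stated assumptions: plain strings replace `DataValueSnapshot`, the hypothetical `attach` callback replaces `pump.OnDataChange`, and the SubscribeBulk, registry and unsubscribe steps are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reduction of the subscribe-once read: returns one value per reference, in input order.
static async Task<string[]> ReadOnceAsync(
    IReadOnlyList<string> refs,
    Action<Action<string, string>> attach, // stand-in for wiring pump.OnDataChange
    CancellationToken ct)
{
    var pending = new Dictionary<string, TaskCompletionSource<string>>(StringComparer.OrdinalIgnoreCase);
    foreach (var r in refs)
        pending[r] = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Wire the handler first so an early event cannot be missed.
    attach((name, value) =>
    {
        if (pending.TryGetValue(name, out var tcs)) tcs.TrySetResult(value);
    });

    // On cancellation, fill whatever is still pending; completed entries ignore TrySetResult.
    using var registration = ct.Register(() =>
    {
        foreach (var tcs in pending.Values) tcs.TrySetResult("BadTimeout");
    });

    var results = new string[refs.Count];
    for (var i = 0; i < refs.Count; i++)
        results[i] = await pending[refs[i]].Task.ConfigureAwait(false);
    return results;
}

Action<string, string> publish = (n, v) => { };
var cts = new CancellationTokenSource();

var readTask = ReadOnceAsync(
    new[] { "Tank.Level", "Tank.Temp" },   // hypothetical tag names
    handler => publish = handler,
    cts.Token);

publish("tank.level", "12.5"); // matched case-insensitively
cts.Cancel();                   // "Tank.Temp" never published, so it gets the timeout marker
var values = await readTask;
cts.Dispose();
```

Because `TrySetResult` is a no-op on an already completed source, the cancellation callback can walk every entry without tracking which ones have already received a value.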
@@ -433,7 +625,12 @@ public sealed class GalaxyDriver
return new GalaxySubscriptionHandle(subscriptionId);
}
var bufferedIntervalMs = (int)Math.Max(0, publishingInterval.TotalMilliseconds);
// PR 6.3 — when the caller doesn't set a publishing interval (TimeSpan.Zero or
// negative), fall back to the configured MxAccess.PublishingIntervalMs. The
// server's UA subscription publishingInterval drives this in production; tests
// and infrastructure callers (probe watcher, deploy watcher) hit the fallback.
var requested = (int)Math.Max(0, publishingInterval.TotalMilliseconds);
var bufferedIntervalMs = requested > 0 ? requested : _options.MxAccess.PublishingIntervalMs;
var results = await _subscriber
.SubscribeBulkAsync(fullReferences, bufferedIntervalMs, cancellationToken)
.ConfigureAwait(false);
@@ -503,7 +700,10 @@ public sealed class GalaxyDriver
lock (_pumpLock)
{
if (_eventPump is not null) return _eventPump;
_eventPump = new EventPump(_subscriber!, _subscriptions, _logger);
_eventPump = new EventPump(
_subscriber!, _subscriptions, _logger,
channelCapacity: _options.MxAccess.EventPumpChannelCapacity,
clientName: _options.MxAccess.ClientName);
_eventPump.OnDataChange += OnPumpDataChange;
_eventPump.Start();
return _eventPump;
@@ -547,9 +747,7 @@ public sealed class GalaxyDriver
var clientOptions = new MxGatewayClientOptions
{
Endpoint = new Uri(gw.Endpoint, UriKind.Absolute),
// PR 4.1 stub: ApiKeySecretRef is currently treated as the literal API key.
// PR 4.W (or a follow-up) wires up DPAPI-backed secret resolution.
ApiKey = gw.ApiKeySecretRef,
ApiKey = ResolveApiKey(gw.ApiKeySecretRef),
UseTls = gw.UseTls,
CaCertificatePath = gw.CaCertificatePath,
ConnectTimeout = TimeSpan.FromSeconds(gw.ConnectTimeoutSeconds),
@@ -559,7 +757,8 @@ public sealed class GalaxyDriver
: null,
};
_ownedRepositoryClient = GalaxyRepositoryClient.Create(clientOptions);
return new GatewayGalaxyHierarchySource(_ownedRepositoryClient);
return new TracedGalaxyHierarchySource(
new GatewayGalaxyHierarchySource(_ownedRepositoryClient), _options.MxAccess.ClientName);
}
public void Dispose()
@@ -54,14 +54,15 @@ public static class GalaxyDriverFactoryExtensions
UseTls: dto.Gateway.UseTls ?? true,
CaCertificatePath: dto.Gateway.CaCertificatePath,
ConnectTimeoutSeconds: dto.Gateway.ConnectTimeoutSeconds ?? 10,
DefaultCallTimeoutSeconds: dto.Gateway.DefaultCallTimeoutSeconds ?? 5,
DefaultCallTimeoutSeconds: dto.Gateway.DefaultCallTimeoutSeconds ?? 30,
StreamTimeoutSeconds: dto.Gateway.StreamTimeoutSeconds ?? 0),
MxAccess: new GalaxyMxAccessOptions(
ClientName: dto.MxAccess?.ClientName
?? throw new InvalidOperationException(
$"Galaxy driver '{driverInstanceId}' missing required MxAccess.ClientName"),
PublishingIntervalMs: dto.MxAccess.PublishingIntervalMs ?? 1000,
WriteUserId: dto.MxAccess.WriteUserId ?? 0),
WriteUserId: dto.MxAccess.WriteUserId ?? 0,
EventPumpChannelCapacity: dto.MxAccess.EventPumpChannelCapacity ?? 50_000),
Repository: new GalaxyRepositoryOptions(
DiscoverPageSize: dto.Repository?.DiscoverPageSize ?? 5000,
WatchDeployEvents: dto.Repository?.WatchDeployEvents ?? true),
@@ -104,6 +105,7 @@ public static class GalaxyDriverFactoryExtensions
public string? ClientName { get; init; }
public int? PublishingIntervalMs { get; init; }
public int? WriteUserId { get; init; }
public int? EventPumpChannelCapacity { get; init; }
}
internal sealed class RepositoryDto
@@ -18,7 +18,7 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Health;
/// (<see cref="HostConnectivityForwarder"/>) both feed this aggregator; the
/// <see cref="GalaxyDriver"/> consumes <see cref="Snapshot"/> from
/// <c>IHostConnectivityProbe.GetHostStatuses()</c> and re-raises
/// <see cref="OnHostStatusChanged"/> as the driver-level event in a follow-up PR.
/// <see cref="OnHostStatusChanged"/> as the driver-level event (wired in PR 4.W).
/// </remarks>
public sealed class HostStatusAggregator
{
@@ -36,6 +36,7 @@ public sealed class PerPlatformProbeWatcher : IDisposable
private readonly IGalaxySubscriber _subscriber;
private readonly HostStatusAggregator _aggregator;
private readonly ILogger _logger;
private readonly int _bufferedUpdateIntervalMs;
// Tracked platform → gw item handle. Item handle 0 means the gw rejected the subscribe;
// we keep the entry so SyncPlatformsAsync doesn't try to subscribe it again on every call.
@@ -45,11 +46,20 @@ public sealed class PerPlatformProbeWatcher : IDisposable
private bool _disposed;
public PerPlatformProbeWatcher(
IGalaxySubscriber subscriber, HostStatusAggregator aggregator, ILogger? logger = null)
IGalaxySubscriber subscriber,
HostStatusAggregator aggregator,
ILogger? logger = null,
int bufferedUpdateIntervalMs = 0)
{
_subscriber = subscriber ?? throw new ArgumentNullException(nameof(subscriber));
_aggregator = aggregator ?? throw new ArgumentNullException(nameof(aggregator));
_logger = logger ?? NullLogger.Instance;
if (bufferedUpdateIntervalMs < 0)
{
throw new ArgumentOutOfRangeException(nameof(bufferedUpdateIntervalMs),
"bufferedUpdateIntervalMs must be >= 0; 0 means use the gw's default cadence.");
}
_bufferedUpdateIntervalMs = bufferedUpdateIntervalMs;
}
/// <summary>Snapshot of platform tag names currently watched.</summary>
@@ -107,10 +117,12 @@ public sealed class PerPlatformProbeWatcher : IDisposable
if (toAdd.Count == 0) return;
var probeAddresses = toAdd.Select(p => p + ProbeSuffix).ToArray();
// bufferedUpdateInterval=0 — probe ScanState changes are rare enough that the gw's
// default cadence is fine; explicit polling rate goes through PR 6.3.
// PR 6.3: use the configured bufferedUpdateIntervalMs. The ctor default of 0 means
// "gw default cadence"; GalaxyDriver passes MxAccess.PublishingIntervalMs. Probe ScanState
// changes are rare so a coarser interval is usually fine; deployments that need
// tighter health visibility can dial it down through GalaxyDriverOptions.
var results = await _subscriber.SubscribeBulkAsync(
probeAddresses, bufferedUpdateIntervalMs: 0, cancellationToken).ConfigureAwait(false);
probeAddresses, _bufferedUpdateIntervalMs, cancellationToken).ConfigureAwait(false);
for (var i = 0; i < toAdd.Count; i++)
{
@@ -1,3 +1,5 @@
using System.Diagnostics.Metrics;
using System.Threading.Channels;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using MxGateway.Contracts.Proto;
@@ -13,19 +15,47 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <see cref="SubscriptionRegistry.ResolveSubscribers"/>).
/// </summary>
/// <remarks>
/// <para>
/// One pump per connected <see cref="GalaxyMxSession"/>. Reconnect lives in PR 4.5's
/// supervisor; on transport failure here we log + propagate so the supervisor can
/// decide whether to restart.
/// </para>
/// <para>
/// PR 6.2 — the network-read loop and the listener-fanout loop are decoupled by a
/// bounded <see cref="Channel{T}"/>. When a listener is slow enough to fill the
/// channel, new events are dropped (newest-dropped semantics: producer's
/// <c>TryWrite</c> fails) rather than back-pressuring the gw stream. Three counters
/// on the <c>ZB.MOM.WW.OtOpcUa.Driver.Galaxy</c> meter expose received / dispatched
/// / dropped totals so ops sees pressure before it manifests as user-visible loss.
/// </para>
/// </remarks>
internal sealed class EventPump : IAsyncDisposable
{
public const string MeterName = "ZB.MOM.WW.OtOpcUa.Driver.Galaxy";
private const int DefaultChannelCapacity = 50_000;
// Single static meter so a host-level MeterListener catches all pump instances.
private static readonly Meter Meter = new(MeterName);
private static readonly Counter<long> EventsReceived =
Meter.CreateCounter<long>("galaxy.events.received", unit: "{event}",
description: "MxEvents read from the gateway StreamEvents stream.");
private static readonly Counter<long> EventsDispatched =
Meter.CreateCounter<long>("galaxy.events.dispatched", unit: "{event}",
description: "MxEvents passed through the bounded channel and into OnDataChange.");
private static readonly Counter<long> EventsDropped =
Meter.CreateCounter<long>("galaxy.events.dropped", unit: "{event}",
description: "MxEvents dropped because the bounded channel was full (newest-dropped).");
private readonly IGalaxySubscriber _subscriber;
private readonly SubscriptionRegistry _registry;
private readonly ILogger _logger;
private readonly Func<long, ISubscriptionHandle> _handleFactory;
private readonly Channel<MxEvent> _channel;
private readonly KeyValuePair<string, object?> _clientTag;
private readonly CancellationTokenSource _cts = new();
private Task? _loop;
private Task? _dispatchLoop;
private bool _disposed;
public event EventHandler<DataChangeEventArgs>? OnDataChange;
@@ -34,12 +64,30 @@ internal sealed class EventPump : IAsyncDisposable
IGalaxySubscriber subscriber,
SubscriptionRegistry registry,
ILogger? logger = null,
Func<long, ISubscriptionHandle>? handleFactory = null)
Func<long, ISubscriptionHandle>? handleFactory = null,
int channelCapacity = DefaultChannelCapacity,
string? clientName = null)
{
_subscriber = subscriber ?? throw new ArgumentNullException(nameof(subscriber));
_registry = registry ?? throw new ArgumentNullException(nameof(registry));
_logger = logger ?? NullLogger.Instance;
_handleFactory = handleFactory ?? (id => new GalaxySubscriptionHandle(id));
if (channelCapacity < 1)
{
throw new ArgumentOutOfRangeException(nameof(channelCapacity),
"channelCapacity must be >= 1; recommended 50_000 for 50k-tag deployments.");
}
_channel = Channel.CreateBounded<MxEvent>(new BoundedChannelOptions(channelCapacity)
{
// Newest-dropped policy: when full, the producer's TryWrite returns false
// and we account for the drop. We do this manually rather than relying on
// BoundedChannelFullMode.DropWrite so we can count drops without polling.
FullMode = BoundedChannelFullMode.Wait,
SingleReader = true,
SingleWriter = true,
});
_clientTag = new KeyValuePair<string, object?>("galaxy.client", clientName ?? "<unknown>");
}
/// <summary>
@@ -51,6 +99,7 @@ internal sealed class EventPump : IAsyncDisposable
ObjectDisposedException.ThrowIf(_disposed, this);
if (_loop is not null) return;
_loop = Task.Run(() => RunAsync(_cts.Token));
_dispatchLoop = Task.Run(() => DispatchLoopAsync(_cts.Token));
}
private async Task RunAsync(CancellationToken ct)
@@ -60,7 +109,15 @@ internal sealed class EventPump : IAsyncDisposable
await foreach (var ev in _subscriber.StreamEventsAsync(ct).WithCancellation(ct).ConfigureAwait(false))
{
if (ct.IsCancellationRequested) break;
Dispatch(ev);
EventsReceived.Add(1, _clientTag);
// Newest-dropped: TryWrite fast-paths the common case (channel has room).
// When full we count the drop and continue reading the gw stream so
// back-pressure doesn't propagate upstream.
if (!_channel.Writer.TryWrite(ev))
{
EventsDropped.Add(1, _clientTag);
}
}
}
catch (OperationCanceledException) when (ct.IsCancellationRequested)
@@ -72,6 +129,32 @@ internal sealed class EventPump : IAsyncDisposable
_logger.LogWarning(ex,
"Galaxy EventPump loop ended with an exception — reconnect supervisor (PR 4.5) handles restart.");
}
finally
{
// Tell the dispatch loop the producer is done so it drains and exits.
_channel.Writer.TryComplete();
}
}
private async Task DispatchLoopAsync(CancellationToken ct)
{
try
{
await foreach (var ev in _channel.Reader.ReadAllAsync(ct).ConfigureAwait(false))
{
Dispatch(ev);
EventsDispatched.Add(1, _clientTag);
}
}
catch (OperationCanceledException) when (ct.IsCancellationRequested)
{
// Clean shutdown.
}
catch (Exception ex)
{
_logger.LogWarning(ex,
"Galaxy EventPump dispatch loop ended with an exception — events past this point will be lost until restart.");
}
}
private void Dispatch(MxEvent ev)
@@ -121,10 +204,15 @@ internal sealed class EventPump : IAsyncDisposable
if (_disposed) return;
_disposed = true;
_cts.Cancel();
_channel.Writer.TryComplete();
if (_loop is not null)
{
try { await _loop.ConfigureAwait(false); } catch { /* shutdown */ }
}
if (_dispatchLoop is not null)
{
try { await _dispatchLoop.ConfigureAwait(false); } catch { /* shutdown */ }
}
_cts.Dispose();
}
}
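The newest-dropped behaviour described in the EventPump remarks follows from combining `BoundedChannelFullMode.Wait` with `TryWrite`: when the channel is full, `TryWrite` returns `false` immediately, and the producer counts the drop itself. A minimal sketch, assuming a capacity of 3 and plain integers in place of `MxEvent`, with local counters standing in for the three meter instruments:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity: 3)
{
    FullMode = BoundedChannelFullMode.Wait, // TryWrite returns false when the channel is full
    SingleReader = true,
    SingleWriter = true,
});

// Local counters standing in for galaxy.events.received / dropped / dispatched.
long received = 0, dropped = 0, dispatched = 0;

// Producer: no consumer is running yet, so writes past the capacity fail and are counted.
for (var ev = 1; ev <= 5; ev++)
{
    received++;
    if (!channel.Writer.TryWrite(ev)) dropped++;
}
channel.Writer.TryComplete(); // lets the consumer drain and exit

// Consumer: drains what made it into the channel.
var delivered = new List<int>();
await foreach (var ev in channel.Reader.ReadAllAsync())
{
    delivered.Add(ev);
    dispatched++;
}
```

Events 4 and 5 are the ones lost here, which is what "newest-dropped" means: items already queued are kept, and the incoming item is discarded.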
@@ -0,0 +1,35 @@
using System.Diagnostics;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// PR 6.1 — In-box <see cref="ActivitySource"/> wired around every gw call the
/// driver makes (Subscribe/Unsubscribe, Write/WriteSecured, GetHierarchy). The
/// decorators in this folder produce one span per call, tagged with the inputs
/// ops needs to triage a slow or failing operation:
/// <c>galaxy.tag_count</c>, <c>galaxy.success_count</c>, <c>galaxy.client</c>.
/// <para>
/// The driver itself doesn't take a dependency on the OpenTelemetry packages —
/// <c>System.Diagnostics.ActivitySource</c> is in the BCL. The host process
/// decides which listener (OTLP exporter, Application Insights, dotnet-trace)
/// subscribes to <see cref="ActivitySourceName"/>.
/// </para>
/// </summary>
internal static class GalaxyTelemetry
{
public const string ActivitySourceName = "ZB.MOM.WW.OtOpcUa.Driver.Galaxy";
public static readonly ActivitySource ActivitySource = new(ActivitySourceName);
/// <summary>
/// Tag a span with a failure reason and set its status to <c>Error</c>. Helper
/// so the decorators don't repeat the four-line idiom on every catch block.
/// </summary>
public static void RecordError(this Activity? activity, Exception ex)
{
if (activity is null) return;
activity.SetStatus(ActivityStatusCode.Error, ex.Message);
activity.SetTag("exception.type", ex.GetType().FullName);
activity.SetTag("exception.message", ex.Message);
}
}
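On the host side, spans from this source only exist when something listens for them. The sketch below shows one way a host process could subscribe with the BCL `ActivityListener`, without any OpenTelemetry package. The source name is copied from `ActivitySourceName` above; the local `driverSource` is a stand-in for the driver's own `ActivitySource`, and the client name and object count are made-up values.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Source name copied from GalaxyTelemetry.ActivitySourceName in the diff above.
const string SourceName = "ZB.MOM.WW.OtOpcUa.Driver.Galaxy";

var stopped = new List<string>();

var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == SourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity =>
        stopped.Add($"{activity.OperationName} status={activity.Status}"),
};
ActivitySource.AddActivityListener(listener);

// Stand-in for the driver's source; in the real process the decorators start these spans.
var driverSource = new ActivitySource(SourceName);
using (var activity = driverSource.StartActivity("galaxy.get_hierarchy"))
{
    // Without a listener, StartActivity returns null, hence the null-conditional calls.
    activity?.SetTag("galaxy.client", "demo-client"); // hypothetical client name
    activity?.SetTag("galaxy.object_count", 42);      // hypothetical count
}

driverSource.Dispose();
listener.Dispose();
```

The same null-conditional style appears in the decorators, since a span is only created when at least one listener samples the source.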
@@ -1,5 +1,6 @@
using MxGateway.Client;
using MxGateway.Contracts.Proto;
// Use the generated nested status enum for the SetBufferedUpdateInterval reply check.
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
@@ -9,14 +10,16 @@ namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// gateway and streams MxEvents via the gw's bidirectional events RPC.
/// </summary>
/// <remarks>
/// The gw's <c>SubscribeBulkAsync</c> doesn't currently take a buffered-update-interval
/// hint as a typed parameter — gw issue #102 / lmx_mxgw_impl.md gw-9 tracks adding
/// <c>buffered_update_interval_ms</c>. Until that lands, the parameter is captured here
/// and forwarded to <c>SetBufferedUpdateInterval</c> in a follow-up. PR 6.3 picks it up.
/// PR 6.3 wired the per-call <c>buffered_update_interval_ms</c> through
/// <see cref="SubscribeBulkAsync"/>. The gw's contract is session-level
/// (<c>SetBufferedUpdateInterval</c> applies to all buffered subscriptions on the
/// server handle), so we cache the last-applied value and skip redundant calls.
/// </remarks>
public sealed class GatewayGalaxySubscriber : IGalaxySubscriber
{
private readonly GalaxyMxSession _session;
private readonly Lock _intervalLock = new();
private int _lastAppliedIntervalMs = -1; // -1 = never applied; only values > 0 are forwarded to the gw, so 0 is never stored
public GatewayGalaxySubscriber(GalaxyMxSession session)
{
@@ -31,14 +34,65 @@ public sealed class GatewayGalaxySubscriber : IGalaxySubscriber
"GalaxyMxSession is not connected. Call ConnectAsync before subscribing.");
var serverHandle = _session.ServerHandle;
// PR 6.3 wires bufferedUpdateIntervalMs to SetBufferedUpdateInterval; until then
// ignore it — values still arrive at the gw's default cadence.
_ = bufferedUpdateIntervalMs;
// The gw's SubscribeBulk RPC doesn't carry a per-call interval — buffered cadence
// is session-level, set via SetBufferedUpdateInterval. Apply it before the
// SubscribeBulk so the very first events on the new handles publish at the
// requested cadence. Skip when the last-applied value already matches.
if (bufferedUpdateIntervalMs > 0)
{
await EnsureSessionIntervalAsync(session, serverHandle, bufferedUpdateIntervalMs, cancellationToken)
.ConfigureAwait(false);
}
return await session.SubscribeBulkAsync(serverHandle, fullReferences, cancellationToken)
.ConfigureAwait(false);
}
/// <summary>
/// Apply the gateway's session-level <c>SetBufferedUpdateInterval</c> command. The
/// gw's contract is "for this server handle, every buffered subscription publishes
/// at this cadence" — there's no per-handle granularity, so we cache the last
/// applied value and skip redundant calls.
/// </summary>
private async Task EnsureSessionIntervalAsync(
MxGateway.Client.MxGatewaySession session, int serverHandle, int intervalMs, CancellationToken cancellationToken)
{
lock (_intervalLock)
{
if (_lastAppliedIntervalMs == intervalMs) return;
}
var reply = await session.InvokeAsync(
new MxCommandRequest
{
SessionId = session.SessionId,
ClientCorrelationId = Guid.NewGuid().ToString("N"),
Command = new MxCommand
{
Kind = MxCommandKind.SetBufferedUpdateInterval,
SetBufferedUpdateInterval = new SetBufferedUpdateIntervalCommand
{
ServerHandle = serverHandle,
UpdateIntervalMilliseconds = intervalMs,
},
},
},
cancellationToken).ConfigureAwait(false);
if (reply.ProtocolStatus?.Code is not (ProtocolStatusCode.Ok or ProtocolStatusCode.MxaccessFailure))
{
// Don't throw on a soft failure: the SubscribeBulk will still succeed at the
// gw's default cadence, which is functional, just not the requested cadence.
// No warning is logged in this block, so the trace span (PR 6.1) is the main signal for ops.
return;
}
lock (_intervalLock)
{
_lastAppliedIntervalMs = intervalMs;
}
}
public async Task UnsubscribeBulkAsync(IReadOnlyList<int> itemHandles, CancellationToken cancellationToken)
{
if (itemHandles.Count == 0) return;
@@ -0,0 +1,54 @@
using ZB.MOM.WW.OtOpcUa.Core.Abstractions;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// PR 6.1 — Decorator that emits one <see cref="System.Diagnostics.Activity"/> span
/// per gw write batch. Tags secured-write counts so ops can see the routing-by-
/// classification split (FreeAccess/Operate vs Tune/Configure/VerifiedWrite) without
/// re-reading the discovery dictionary.
/// </summary>
internal sealed class TracedGalaxyDataWriter(IGalaxyDataWriter inner, string clientName) : IGalaxyDataWriter
{
public async Task<IReadOnlyList<WriteResult>> WriteAsync(
IReadOnlyList<WriteRequest> writes,
Func<string, SecurityClassification> securityResolver,
CancellationToken cancellationToken)
{
using var activity = GalaxyTelemetry.ActivitySource.StartActivity("galaxy.write");
activity?.SetTag("galaxy.client", clientName);
activity?.SetTag("galaxy.tag_count", writes.Count);
if (activity is { IsAllDataRequested: true })
{
// Counting the secured-write split is cheap (one resolver call per request)
// and only happens when a tracing listener is actively recording — keeps the
// hot path free when no one's listening.
var securedCount = 0;
foreach (var w in writes)
{
var sc = securityResolver(w.FullReference);
if (sc is SecurityClassification.Tune
or SecurityClassification.Configure
or SecurityClassification.VerifiedWrite)
{
securedCount++;
}
}
activity.SetTag("galaxy.secured_write_count", securedCount);
}
try
{
var results = await inner.WriteAsync(writes, securityResolver, cancellationToken)
.ConfigureAwait(false);
activity?.SetTag("galaxy.success_count", results.Count(r => r.StatusCode < 0x80000000u));
return results;
}
catch (Exception ex)
{
activity.RecordError(ex);
throw;
}
}
}
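Reviewer sketch (not part of this diff): the `galaxy.success_count` tag above counts every result whose status code is below `0x80000000`. Under the OPC UA status-code layout, where the top two bits carry severity (00 Good, 01 Uncertain, 10 Bad), that comparison is a "not Bad" test, so Uncertain results are counted as successes too. The helper names below are hypothetical, and treating `StatusCode` as a plain `uint` is an assumption about `WriteResult`.

```csharp
// Hypothetical helpers illustrating the severity bits; not part of the driver.
public static class StatusCodeSeveritySketch
{
    // True for Good and Uncertain: the top (Bad) bit is clear.
    // Equivalent to the `statusCode < 0x80000000u` comparison used in the decorator.
    public static bool IsNotBad(uint statusCode) => (statusCode & 0x80000000u) == 0;

    // True only for Good: both severity bits are clear.
    public static bool IsGood(uint statusCode) => (statusCode & 0xC0000000u) == 0;
}
```

If ops needs a strictly-Good count, a second tag built on `IsGood` would be one option; whether that distinction matters for Galaxy writes is not established by this diff.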
@@ -0,0 +1,91 @@
using System.Runtime.CompilerServices;
using MxGateway.Contracts.Proto;
namespace ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Runtime;
/// <summary>
/// PR 6.1 — Decorator that emits one <see cref="System.Diagnostics.Activity"/> span
/// per gw subscription RPC. Wraps the production <see cref="GatewayGalaxySubscriber"/>;
/// tests substitute a fake at the same seam without taking the tracing overhead.
/// </summary>
internal sealed class TracedGalaxySubscriber(IGalaxySubscriber inner, string clientName) : IGalaxySubscriber
{
public async Task<IReadOnlyList<SubscribeResult>> SubscribeBulkAsync(
IReadOnlyList<string> fullReferences, int bufferedUpdateIntervalMs, CancellationToken cancellationToken)
{
using var activity = GalaxyTelemetry.ActivitySource.StartActivity("galaxy.subscribe_bulk");
activity?.SetTag("galaxy.client", clientName);
activity?.SetTag("galaxy.tag_count", fullReferences.Count);
activity?.SetTag("galaxy.buffered_interval_ms", bufferedUpdateIntervalMs);
try
{
var results = await inner.SubscribeBulkAsync(fullReferences, bufferedUpdateIntervalMs, cancellationToken)
.ConfigureAwait(false);
activity?.SetTag("galaxy.success_count", results.Count(r => r.WasSuccessful));
return results;
}
catch (Exception ex)
{
activity.RecordError(ex);
throw;
}
}
public async Task UnsubscribeBulkAsync(IReadOnlyList<int> itemHandles, CancellationToken cancellationToken)
{
using var activity = GalaxyTelemetry.ActivitySource.StartActivity("galaxy.unsubscribe_bulk");
activity?.SetTag("galaxy.client", clientName);
activity?.SetTag("galaxy.tag_count", itemHandles.Count);
try
{
await inner.UnsubscribeBulkAsync(itemHandles, cancellationToken).ConfigureAwait(false);
}
catch (Exception ex)
{
activity.RecordError(ex);
throw;
}
}
/// <summary>
/// Streaming RPC — one parent span covers the entire stream lifetime. Per-event
/// spans would dominate the trace volume at 50k tags / 1Hz; ops gets per-event
/// visibility through <see cref="EventPump"/>'s metrics in PR 6.2 instead.
/// </summary>
public async IAsyncEnumerable<MxEvent> StreamEventsAsync(
[EnumeratorCancellation] CancellationToken cancellationToken)
{
using var activity = GalaxyTelemetry.ActivitySource.StartActivity("galaxy.stream_events");
activity?.SetTag("galaxy.client", clientName);
IAsyncEnumerator<MxEvent>? enumerator = null;
try
{
enumerator = inner.StreamEventsAsync(cancellationToken).GetAsyncEnumerator(cancellationToken);
var eventCount = 0L;
while (true)
{
bool moveNext;
try
{
moveNext = await enumerator.MoveNextAsync().ConfigureAwait(false);
}
catch (Exception ex)
{
activity.RecordError(ex);
activity?.SetTag("galaxy.event_count", eventCount);
throw;
}
if (!moveNext) break;
eventCount++;
yield return enumerator.Current;
}
activity?.SetTag("galaxy.event_count", eventCount);
}
finally
{
if (enumerator is not null) await enumerator.DisposeAsync().ConfigureAwait(false);
}
}
}
+4 -7
@@ -14,7 +14,6 @@ using ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client;
using ZB.MOM.WW.OtOpcUa.Driver.AbCip;
using ZB.MOM.WW.OtOpcUa.Driver.AbLegacy;
using ZB.MOM.WW.OtOpcUa.Driver.FOCAS;
using ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Proxy;
using ZB.MOM.WW.OtOpcUa.Driver.Modbus;
using ZB.MOM.WW.OtOpcUa.Driver.S7;
using ZB.MOM.WW.OtOpcUa.Driver.TwinCAT;
@@ -110,12 +109,10 @@ builder.Services.AddSingleton<NodeBootstrap>();
builder.Services.AddSingleton<DriverFactoryRegistry>(_ =>
{
var registry = new DriverFactoryRegistry();
// Both Galaxy backends register side-by-side under distinct DriverType names
// ("Galaxy" → legacy GalaxyProxyDriver, "GalaxyMxGateway" → in-process GalaxyDriver
// over the gRPC mxaccessgw). The DriverInstance row's DriverType selects between
// them at bootstrap time — see lmx_mxgw.md / PR 4.W. Phase 7 retires the legacy
// factory once parity tests pin both.
GalaxyProxyDriverFactoryExtensions.Register(registry);
// Galaxy access flows through the in-process GalaxyDriver (DriverType =
// "GalaxyMxGateway") talking gRPC to the mxaccessgw worker. The legacy
// out-of-process GalaxyProxyDriver retired in PR 7.2 once the parity matrix
// (docs/v2/Galaxy.ParityMatrix.md) verified equivalence.
ZB.MOM.WW.OtOpcUa.Driver.Galaxy.GalaxyDriverFactoryExtensions.Register(registry);
FocasDriverFactoryExtensions.Register(registry);
ModbusDriverFactoryExtensions.Register(registry);

Some files were not shown because too many files have changed in this diff.