92 lines
5.5 KiB
Markdown
92 lines
5.5 KiB
Markdown
# PR 4 — Phase 2 follow-up: close the 4 open MXAccess findings
|
|
|
|
**Source**: `phase-2-pr4-findings` (branched from `phase-2-stream-d`)
|
|
**Target**: `v2`
|
|
|
|
## Summary
|
|
|
|
Closes the 4 high/medium open findings carried forward in `exit-gate-phase-2-final.md`:
|
|
|
|
- **High 1 — `ReadAsync` subscription-leak on cancel.** One-shot read now wraps the
|
|
subscribe→first-OnDataChange→unsubscribe pattern in a `try/finally` so the per-tag
|
|
callback is always detached, and if the read installed the underlying MXAccess
|
|
subscription itself (no other caller had it), it tears it down on the way out.
|
|
- **High 2 — No reconnect loop on the MXAccess COM connection.** New
|
|
`MxAccessClientOptions { AutoReconnect, MonitorInterval, StaleThreshold }` + a background
|
|
`MonitorLoopAsync` that watches a stale-activity threshold + probes the proxy via a
|
|
no-op COM call, then reconnects-with-replay (re-Register, re-AddItem every active
|
|
subscription) when the proxy is dead. Liveness signal: every `OnDataChange` callback bumps
|
|
`_lastObservedActivityUtc`. Defaults match v1 monitor cadence (5s poll, 60s stale).
|
|
`ReconnectCount` exposed for diagnostics; `ConnectionStateChanged` event for downstream
|
|
consumers (the supervisor on the Proxy side already surfaces this through its
|
|
HeartbeatMonitor, but the Host-side event lets local logging/metrics hook in).
|
|
- **Medium 3 — `MxAccessGalaxyBackend.SubscribeAsync` doesn't push OnDataChange frames back to
|
|
the Proxy.** New `IGalaxyBackend.OnDataChange` / `OnAlarmEvent` / `OnHostStatusChanged`
|
|
events that the new `GalaxyFrameHandler.AttachConnection` subscribes per-connection and
|
|
forwards as outbound `OnDataChangeNotification` / `AlarmEvent` /
|
|
`RuntimeStatusChange` frames through the connection's `FrameWriter`. `MxAccessGalaxyBackend`
|
|
fans out per-tag value changes to every `SubscriptionId` that's listening to that tag
|
|
(multiple Proxy subs may share a Galaxy attribute — single COM subscription, multi-fan-out
|
|
on the wire). Stub + DbBacked backends declare the events with `#pragma warning disable
|
|
CS0067` (treat-warnings-as-errors would otherwise fail on never-raised events that exist
|
|
only to satisfy the interface).
|
|
- **Medium 4 — `WriteValuesAsync` doesn't await `OnWriteComplete`.** New
|
|
`WriteAsync(...)` overload returns `bool` after awaiting the OnWriteComplete callback via
|
|
the v1-style `TaskCompletionSource`-keyed-by-item-handle pattern in `_pendingWrites`.
|
|
`MxAccessGalaxyBackend.WriteValuesAsync` now reports per-tag `Bad_InternalError` when the
|
|
runtime rejected the write, instead of false-positive `Good`.
|
|
|
|
## Pipe server change
|
|
|
|
`IFrameHandler` gains `AttachConnection(FrameWriter writer): IDisposable` so the handler can
|
|
register backend event sinks on each accepted connection and detach them at disconnect. The
|
|
`PipeServer.RunOneConnectionAsync` calls it after the Hello handshake and disposes it in the
|
|
finally of the per-connection scope. `StubFrameHandler` returns `IFrameHandler.NoopAttachment.Instance`
|
|
(net48 doesn't support default interface methods, so the empty-attach lives as a public nested
|
|
class).
|
|
|
|
## Tests
|
|
|
|
**`dotnet test ZB.MOM.WW.OtOpcUa.slnx`**: **460 pass / 7 skip (E2E on admin shell) / 1
|
|
pre-existing baseline failure**. No regressions. The Driver.Galaxy.Host unit tests + 5 live
|
|
ZB smoke + 3 live MXAccess COM smoke all pass unchanged.
|
|
|
|
## Test plan for reviewers
|
|
|
|
- [ ] `dotnet build` clean
|
|
- [ ] `dotnet test` shows 460/7-skip/1-baseline
|
|
- [ ] Spot-check `MxAccessClient.MonitorLoopAsync` against v1's `MxAccessClient.Monitor`
|
|
partial (`src/ZB.MOM.WW.OtOpcUa.Host/MxAccess/MxAccessClient.Monitor.cs`) — same
|
|
polling cadence, same probe-then-reconnect-with-replay shape
|
|
- [ ] Read `GalaxyFrameHandler.ConnectionSink.Dispose` and confirm event handlers are
|
|
detached on connection close (no leaked invocation list refs)
|
|
- [ ] `WriteValuesAsync` returning `Bad_InternalError` on a runtime-rejected write is the
|
|
correct shape — confirm against the v1 `MxAccessClient.ReadWrite.cs` pattern
|
|
|
|
## What's NOT in this PR
|
|
|
|
- Wonderware Historian SDK plugin port (Task B.1.h) — separate PR, larger scope.
|
|
- Alarm subsystem wire-up (`MxAccessGalaxyBackend.SubscribeAlarmsAsync` is still a no-op).
|
|
`OnAlarmEvent` is declared on the backend interface and pushed by the frame handler when
|
|
raised; `MxAccessGalaxyBackend` just doesn't raise it yet (waits for the alarm-tracking
|
|
port from v1's `AlarmObjectFilter` + Galaxy alarm primitives).
|
|
- Host-status push (`OnHostStatusChanged`) — declared on the interface and pushed by the
|
|
frame handler; `MxAccessGalaxyBackend` doesn't raise it (the Galaxy.Host's
|
|
`HostConnectivityProbe` from v1 needs porting too, scoped under the Historian PR).
|
|
|
|
## Adversarial review
|
|
|
|
Quick pass over the PR 4 deltas. No new findings beyond:
|
|
|
|
- **Low 1** — `MonitorLoopAsync`'s `$Heartbeat` probe item-handle is leaked
|
|
(`AddItem` succeeds, never `RemoveItem`'d). Cosmetic — the probe item is internal to
|
|
the COM connection, dies with `Unregister` at disconnect/recycle. Worth a follow-up
|
|
to call `RemoveItem` after the probe succeeds.
|
|
- **Low 2** — Replay loop in `MonitorLoopAsync` swallows per-subscription failures. If
|
|
Galaxy permanently rejects a previously-valid reference (rare but possible after a
|
|
re-deploy), the user gets silent data loss for that one subscription. The stub-handler-
|
|
unaware operator wouldn't notice. Worth surfacing as a `ConnectionStateChanged(false)
|
|
→ ConnectionStateChanged(true)` payload that includes the replay-failures list.
|
|
|
|
Both are low-priority follow-ups, not PR 4 blockers.
|