When a client calls Subscribe multiple times with the same session ID
(one tag per RPC), each call overwrites the ClientSubscription entry.
UnsubscribeClient only cleaned up tags from the last entry, leaving
earlier tags orphaned in _tagSubscriptions. Now scans all tag
subscriptions for client references during cleanup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Grpc.Core doesn't reliably fire CancellationToken on client disconnect,
so Subscribe RPCs can hang forever and leak session subscriptions. Bridge
SessionManager scavenging to SubscriptionManager cleanup, and add a
30-second periodic session validity check in the Subscribe loop so stale
streams exit within 30s of session scavenge rather than hanging until
process restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, the staleness check could fire immediately after reconnect
before the first OnDataChange callback arrives, causing a reconnect loop.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReadAsync internally subscribes/unsubscribes the same ScanTime tag used
by the persistent probe, which was tearing down the probe subscription
and triggering false reconnects every ~5s. Guard UnsubscribeInternal and
stored subscription state so the probe tag is never removed by other
callers. Also removes DetailedHealthCheckService (redundant with the
persistent probe), adds per-instance config files (appsettings.v2.json,
appsettings.v2b.json) loaded via LMXPROXY_INSTANCE env var so deploys
no longer overwrite port settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old probe did a subscribe-read-unsubscribe cycle every 5 seconds to
check connection health. This created unnecessary churn and didn't detect
the failure mode where long-lived subscriptions silently stop receiving
COM callbacks (e.g. stalled STA message pump). The new approach keeps a
persistent subscription on the health check tag and forces reconnect if
no value update arrives within a configurable threshold (ProbeStaleThresholdMs,
default 5s). Also adds STA message pump debug logging (5-min heartbeat with
message counters) and fixes log file path resolution for Windows services.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The SiteCommunicationActor expected an event log handler but none was
registered, causing "Event log handler not available" on the Event Logs
page and CLI. Bridge IEventLogQueryService to Akka via a simple actor.
Table now displays all 5 RPC types (Read, ReadBatch, Write, WriteBatch,
Subscribe) with dashes for zero-count operations instead of hiding the
table entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Subscription metrics (totalDelivered, totalDropped) now visible in
/api/status JSON and HTML dashboard. Card turns yellow if drops > 0.
Aggregated from per-client counters in SubscriptionManager.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deviation #2: document three STA iterations (failed → Task.Run → StaComThread)
- Deviation #7: mark resolved — OnWriteComplete now works via STA message pump
- Deviation #8: note awaited subscription creation fixes flaky subscribe test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SubscriptionManager.Subscribe was fire-and-forgetting the MxAccess COM
subscription creation. The initial OnDataChange callback could fire
before the subscription was established, losing the first (and possibly
only) value update. Changed to async SubscribeAsync that awaits
CreateMxAccessSubscriptionsAsync before returning the channel reader.
Subscribe_ReceivesUpdates now passes 5/5 consecutive runs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With StaComThread's GetMessage loop in place, OnWriteComplete callbacks
are now delivered properly. Write flow: dispatch Write() on STA thread,
await OnWriteComplete via TCS, clean up on STA thread. Falls back to
fire-and-forget on timeout as safety net. OnWriteComplete now resolves
or rejects the TCS with MxStatus error details.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Multiple instances registering with the same name may cause MxAccess to
conflict on callback routing. ClientName is now configurable via
appsettings.json, defaulting to a GUID-suffixed name if not set.
Instances A and B use "LmxProxy-A" and "LmxProxy-B" respectively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for the SubscriptionManager/MxAccessClient subscription pipeline:
1. Serialize Subscribe and UnsubscribeClient with a SemaphoreSlim gate to prevent
race where old-session unsubscribe removes new-session COM subscriptions.
CreateMxAccessSubscriptionsAsync is now awaited instead of fire-and-forget.
2. Fix dual VTQ delivery in MxAccessClient.OnDataChange — each update was delivered
twice (once via stored callback, once via OnTagValueChanged property). Now uses
stored callback as the single delivery path.
3. Store pending tag addresses when CreateMxAccessSubscriptionsAsync fails (MxAccess
down) and retry them on reconnect via NotifyReconnection/RetryPendingSubscriptionsAsync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move connection bindings, attribute overrides, and area assignment from
inline expandable rows on the Instances table to a separate page at
/deployment/instances/{id}/configure for a cleaner, less cramped UX.
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
The LmxProxy client's ExtractArrayValue now returns proper .NET arrays
(bool[], int[], DateTime[], etc.) instead of ArrayValue objects. Removed
the reflection-based FormatArrayContainer logic — IEnumerable handling
is sufficient for all array types.
Added DatetimeArray message (repeated int64, UTC ticks) to proto and
code-first contracts. Host serializes DateTime[] → DatetimeArray.
Client deserializes DatetimeArray → DateTime[] (not raw long[]).
Client ExtractArrayValue now unpacks all array types including DateTime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DateTime[] from MxAccess was falling through to ToString() fallback,
producing "System.DateTime[]" instead of actual values. Now converts
each DateTime to UTC ticks and stores in Int64Array.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ArrayValue from LmxProxy client was showing as type name in debug views.
Added ValueFormatter utility and NormalizeValue in LmxProxyDataConnection
to convert arrays at the adapter boundary. DateTime arrays remain as
"System.DateTime[]" due to server-side v1 string serialization.
Docker: include lmxproxy/src/ZB.MOM.WW.LmxProxy.Client in build context
so the project reference resolves during container image build.
CLI: fix set-bindings JSON parsing — use JsonElement.GetString()/GetInt32()
instead of object.ToString() which returned null for deserialized elements.
Thread backup data connection fields through management command messages,
ManagementActor handlers, SiteService, site-side SQLite storage, and
deployment/replication actors. The old --configuration CLI flag is kept
as a hidden alias for backwards compatibility.
Add ActiveEndpoint field to DataConnectionHealthReport showing which
endpoint is active (Primary, Backup, or Primary with no backup configured).
Log failover transitions and connection restoration events to the site
event log via ISiteEventLogger, passed as an optional parameter through
the actor hierarchy for backwards compatibility.
Update CreateConnectionCommand to carry PrimaryConnectionDetails,
BackupConnectionDetails, and FailoverRetryCount. Update all callers:
DataConnectionManagerActor, DataConnectionActor, DeploymentManagerActor,
FlatteningService, and ConnectionConfig. The actor stores both configs
but continues using primary only — failover logic comes in Task 3.
Covers entity model, failover state machine, health reporting,
UI/CLI changes, and deployment flow for optional backup endpoints
with automatic failover after configurable retry count.
Switches from v1 string-based proto stubs to the production LmxProxyClient
(v2 native TypedValue protocol) via project reference. Deletes 6k+ lines of
generated proto code. Preserves ILmxProxyClient adapter interface for testability.
LmxFakeProxy is no longer needed now that two real LmxProxy v2 instances
are available for testing. Added remote test infra section to test_infra.md
documenting the windev instances. Removed tagsim (never committed).
Both instances share API keys and connect to the same AVEVA platform.
Verified: 17/17 integration tests pass against both instances.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AppEngine built-in tag is always present and constantly updating (~1s),
making it a more reliable probe than a user-deployed TestChildObject tag.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gap 1: Active health probing verified — 60s recovery after platform restart.
Gap 2: Address-based subscription cleanup — no stale handles.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tested aaBootstrap kill on windev — three gaps identified:
1. No active health probing (IsConnected stays true on dead connection)
2. Stale SubscriptionManager handles after reconnect cycle
3. AVEVA objects don't auto-start after platform crash (platform behavior)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- MxStatusMapper: maps all 40+ MxStatusDetail codes, MxStatusCategory,
and MxStatusSource to human-readable names and client messages
- OnDataChange: checks MXSTATUS_PROXY.success and overrides quality with
specific OPC UA code when MxAccess reports a failure (e.g., CommFailure,
ConfigError, WaitingForInitialData)
- OnWriteComplete: uses MxStatusMapper.FormatStatus for structured logging
- Write errors: catches COMException separately with HRESULT in message
- Read errors: distinguishes COM, timeout, and generic failures in logging
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents when the full STA+Application.Run() approach is needed
(secured/verified writes), why our first attempt failed, the correct
pattern using Form.BeginInvoke(), and tradeoffs vs fire-and-forget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>