Central nodes crashed at startup with `CREATE DATABASE permission denied`
when MSSQL accepted connections before recovering user databases —
DB_ID(@db) returned null, so EF Core's MigrateAsync fell through to
SqlServerDatabaseCreator.CreateAsync. The non-privileged app login then
failed CREATE DATABASE and the host terminated with FTL, leaving Traefik's
/health/active probe unable to find an upstream ("no available server" at
localhost:9000).
Add MigrationHelper.WaitForDatabaseReadyAsync that polls
Database.CanConnectAsync() for up to 60s before invoking MigrateAsync, and
thread an ILogger through so retry attempts surface in normal logs. This
removes the startup race without requiring depends_on across compose stacks
or granting dbcreator to the app login.
LmxProxy is no longer needed. Moved the entire lmxproxy/ workspace, DCL
adapter files, and related docs to deprecated/. Removed LmxProxy registration
from DataConnectionFactory, project reference from DCL, protocol option from
UI, and cleaned up all requirement docs.
HandleConnectionQualityChanged now publishes AttributeValueChanged events
to the SiteStreamManager for all affected attributes. This ensures the
central UI debug view updates in real-time when a data connection
disconnects and attributes go bad quality.
Only publishes to the stream — does NOT notify script or alarm actors,
since the value hasn't changed and firing scripts/alarms on quality-only
changes would cause spurious evaluations.
Previously, failover only triggered when ConnectAsync failed consecutively.
If a connection succeeded but went stale quickly (e.g., heartbeat timeout),
the failure counter reset on each successful connect and failover never
triggered.
Added a separate _consecutiveUnstableDisconnects counter that increments
when a connection lasts less than StableConnectionThreshold (60s) before
disconnecting. When this counter reaches failoverRetryCount, the actor
fails over to the backup endpoint. Stable connections (lasting >60s)
reset this counter.
The original connection-failure failover path is unchanged.
Existing site databases created before the primary/backup data connections
feature lack the backup_configuration and failover_retry_count columns.
Added TryAddColumnAsync migration that runs on startup after table creation.
Composable StaleTagMonitor class in Commons fires a Stale event when no
value is received within a configurable max silence period. Integrated
into both LmxProxyDataConnection and OpcUaDataConnection adapters via
optional HeartbeatTagPath/HeartbeatMaxSilence connection config keys.
When stale, the adapter fires Disconnected triggering the standard
reconnect cycle. 10 unit tests cover timer behavior.
Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider
that queries Akka cluster membership for all site-role nodes. HealthReportSender
populates ClusterNodes before each report. UI shows a row per node with
hostname, Online/Offline badge, and Primary/Standby badge. Falls back to
single-node display if ClusterNodes is not populated.
- AkkaHostedService: SetNodeHostname from NodeOptions
- DataConnectionActor: UpdateConnectionEndpoint on state transitions,
track per-tag quality counts and UpdateTagQuality on value changes
- HealthReportSender: query StoreAndForwardStorage for parked message count
- StoreAndForwardStorage: add GetParkedMessageCountAsync()
New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints
(primary/secondary), DataConnectionTagQuality (good/bad/uncertain),
ParkedMessageCount. New collector methods to populate them.
Health dashboard redesigned to match mockup: Nodes | Data Connections
(with per-connection tag quality) | Instances + S&F Buffers | Error
Counts + Parked Messages. Site names resolved from repository.
Keys from KeySelector (e.g. boxed int) were compared against string keys
restored from sessionStorage, causing expansion state to be lost on
navigation. All keys are now normalized to strings internally.
Areas page now shows a single TreeView with sites as roots and areas as
children. Context menus: sites get "Add Area", areas get "Add Child Area",
"Edit Area", "Delete Area" — each navigating to a dedicated page.
The Delete Area page shows a TreeView of the area and all recursive children
with assigned instances. Deletion is blocked if any instances are assigned
to the area or its descendants.
The SiteCommunicationActor expected an event log handler but none was
registered, causing "Event log handler not available" on the Event Logs
page and CLI. Bridge IEventLogQueryService to Akka via a simple actor.
Move connection bindings, attribute overrides, and area assignment from
inline expandable rows on the Instances table to a separate page at
/deployment/instances/{id}/configure for a cleaner, less cramped UX.
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
The LmxProxy client's ExtractArrayValue now returns proper .NET arrays
(bool[], int[], DateTime[], etc.) instead of ArrayValue objects. Removed
the reflection-based FormatArrayContainer logic — IEnumerable handling
is sufficient for all array types.
ArrayValue from LmxProxy client was showing as type name in debug views.
Added ValueFormatter utility and NormalizeValue in LmxProxyDataConnection
to convert arrays at the adapter boundary. DateTime arrays remain as
"System.DateTime[]" due to server-side v1 string serialization.
Docker: include lmxproxy/src/ZB.MOM.WW.LmxProxy.Client in build context
so the project reference resolves during container image build.
CLI: fix set-bindings JSON parsing — use JsonElement.GetString()/GetInt32()
instead of object.ToString() which returned null for deserialized elements.
Thread backup data connection fields through management command messages,
ManagementActor handlers, SiteService, site-side SQLite storage, and
deployment/replication actors. The old --configuration CLI flag is kept
as a hidden alias for backwards compatibility.
Add ActiveEndpoint field to DataConnectionHealthReport showing which
endpoint is active (Primary, Backup, or Primary with no backup configured).
Log failover transitions and connection restoration events to the site
event log via ISiteEventLogger, passed as an optional parameter through
the actor hierarchy for backwards compatibility.
Update CreateConnectionCommand to carry PrimaryConnectionDetails,
BackupConnectionDetails, and FailoverRetryCount. Update all callers:
DataConnectionManagerActor, DataConnectionActor, DeploymentManagerActor,
FlatteningService, and ConnectionConfig. The actor stores both configs
but continues using primary only — failover logic comes in Task 3.
Switches from v1 string-based proto stubs to the production LmxProxyClient
(v2 native TypedValue protocol) via project reference. Deletes 6k+ lines of
generated proto code. Preserves ILmxProxyClient adapter interface for testability.
Replace SiteDataConnectionAssignment join table with a direct SiteId FK on DataConnection,
simplifying the data model, repositories, UI, CLI, and deployment service.
Move all CRUD create/edit forms from inline on list pages to dedicated form pages
with back-button navigation and post-save redirect. Add Playwright Docker container
(browser server on port 3000) with 25 passing E2E tests covering login, navigation,
and site CRUD workflows. Add POST /auth/token endpoint for clean JWT retrieval.
Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server,
add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber,
expose gRPC ports in docker-compose, add site seed script, update all 10
requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.
After receiving the initial snapshot via ClusterClient, the bridge actor
now opens a gRPC server-streaming subscription via SiteStreamGrpcClient
for ongoing AttributeValueChanged/AlarmStateChanged events. Adds NodeA/
NodeB failover with max 3 retries, retry count reset on successful event,
and IWithTimers-based reconnect scheduling.
- DebugStreamBridgeActor: gRPC stream after snapshot, reconnect state machine
- DebugStreamService: inject SiteStreamGrpcClientFactory, resolve gRPC addresses
- ServiceCollectionExtensions: register SiteStreamGrpcClientFactory singleton
- SiteStreamGrpcClient: make SubscribeAsync/Unsubscribe virtual for testability
- SiteStreamGrpcClientFactory: make GetOrCreate virtual for testability
- New test suite: DebugStreamBridgeActorTests (8 tests)
Per-site gRPC client for central-side streaming subscriptions to site
servers. SiteStreamGrpcClient manages server-streaming calls with
keepalive, converts proto events to domain types, and supports
cancellation via Unsubscribe. SiteStreamGrpcClientFactory caches one
client per site identifier.
Includes InternalsVisibleTo for test access to conversion helpers and
comprehensive unit tests for event mapping, quality/alarm-state
conversion, unsubscribe behavior, and factory caching.