Commit Graph

413 Commits

Author SHA1 Message Date
Joseph Doherty
5fdeaf613f feat(dcl): failover on repeated unstable connections (connect-then-stale pattern)
Previously, failover only triggered when ConnectAsync failed consecutively.
If a connection succeeded but went stale quickly (e.g., heartbeat timeout),
the failure counter reset on each successful connect and failover never
triggered.

Added a separate _consecutiveUnstableDisconnects counter that increments
when a connection lasts less than StableConnectionThreshold (60s) before
disconnecting. When this counter reaches failoverRetryCount, the actor
fails over to the backup endpoint. Stable connections (lasting >60s)
reset this counter.

The original connection-failure failover path is unchanged.
2026-03-24 16:19:39 -04:00
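The connect-then-stale counter described in the commit above can be modelled in a few lines. This is a minimal, language-neutral sketch in Python, not the actor's C# code: the `FailoverTracker` class and its method names are hypothetical. Treating exactly 60 seconds as stable and resetting the counter after a failover are both assumptions the commit message does not confirm.

```python
from dataclasses import dataclass

# 60 s threshold taken from the commit message (StableConnectionThreshold).
STABLE_CONNECTION_THRESHOLD_S = 60.0


@dataclass
class FailoverTracker:
    """Hypothetical model of the unstable-disconnect counter."""

    failover_retry_count: int
    consecutive_unstable_disconnects: int = 0

    def on_disconnect(self, connected_for_s: float) -> bool:
        """Record a disconnect; return True when the caller should fail over."""
        if connected_for_s < STABLE_CONNECTION_THRESHOLD_S:
            # The connection went stale quickly: count it as unstable.
            self.consecutive_unstable_disconnects += 1
        else:
            # A stable connection resets the counter.
            self.consecutive_unstable_disconnects = 0

        if self.consecutive_unstable_disconnects >= self.failover_retry_count:
            # Assumption: the counter starts over once failover is triggered.
            self.consecutive_unstable_disconnects = 0
            return True
        return False
```

With `failover_retry_count=3`, three short-lived connections in a row trigger a failover, while one long-lived connection in between restarts the count.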
Joseph Doherty
ff2784b862 fix(site-runtime): add SQLite schema migration for backup_configuration column
Existing site databases created before the primary/backup data connections
feature lack the backup_configuration and failover_retry_count columns.
Added TryAddColumnAsync migration that runs on startup after table creation.
2026-03-24 16:19:39 -04:00
Joseph Doherty
0d03aec4f2 feat(dcl): log connection disconnect events to site event log 2026-03-24 16:19:39 -04:00
Joseph Doherty
d4397910f0 feat(dcl): add StaleTagMonitor for heartbeat-based disconnect detection
Composable StaleTagMonitor class in Commons fires a Stale event when no
value is received within a configurable max silence period. Integrated
into both LmxProxyDataConnection and OpcUaDataConnection adapters via
optional HeartbeatTagPath/HeartbeatMaxSilence connection config keys.
When stale, the adapter fires Disconnected, triggering the standard
reconnect cycle. Ten unit tests cover timer behavior.
2026-03-24 16:19:39 -04:00
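The heartbeat idea behind StaleTagMonitor can be sketched as follows. This is an illustrative Python model under stated assumptions, not the Commons class itself: the commit describes a timer-driven C# component that fires a Stale event, whereas this sketch is polled through a hypothetical `check()` method and takes an injectable clock so that it can be exercised without waiting. All names here are invented for illustration.

```python
import time
from typing import Callable


class StaleMonitor:
    """Hypothetical polling model of a max-silence heartbeat monitor."""

    def __init__(
        self,
        max_silence_s: float,
        on_stale: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence_s = max_silence_s
        self._on_stale = on_stale
        self._clock = clock
        self._last_value_at = clock()
        self._stale = False

    def on_value_received(self) -> None:
        """Any heartbeat value restarts the silence window."""
        self._last_value_at = self._clock()
        self._stale = False

    def check(self) -> bool:
        """Fire on_stale once per silent period; return the current stale flag."""
        silent_for = self._clock() - self._last_value_at
        if not self._stale and silent_for >= self._max_silence_s:
            self._stale = True
            self._on_stale()
        return self._stale
```

In this model the `on_stale` callback stands in for the adapter raising Disconnected, which then starts the normal reconnect cycle.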
Joseph Doherty
02a7e8abc6 feat(health): show all cluster nodes (online/offline, primary/standby) in health dashboard
Add NodeStatus record, IClusterNodeProvider interface, and AkkaClusterNodeProvider
that queries Akka cluster membership for all site-role nodes. HealthReportSender
populates ClusterNodes before each report. UI shows a row per node with
hostname, Online/Offline badge, and Primary/Standby badge. Falls back to
single-node display if ClusterNodes is not populated.
2026-03-24 16:19:39 -04:00
Joseph Doherty
65cc7b69cd feat(health): wire up NodeHostname, ConnectionEndpoint, TagQuality, ParkedMessageCount collectors
- AkkaHostedService: SetNodeHostname from NodeOptions
- DataConnectionActor: UpdateConnectionEndpoint on state transitions,
  track per-tag quality counts and UpdateTagQuality on value changes
- HealthReportSender: query StoreAndForwardStorage for parked message count
- StoreAndForwardStorage: add GetParkedMessageCountAsync()
2026-03-24 16:19:39 -04:00
Joseph Doherty
e84a831a02 feat(health): redesign health dashboard with 4-column layout and new metrics
New fields in SiteHealthReport: NodeHostname, DataConnectionEndpoints
(primary/secondary), DataConnectionTagQuality (good/bad/uncertain),
ParkedMessageCount. New collector methods to populate them.

Health dashboard redesigned to match mockup: Nodes | Data Connections
(with per-connection tag quality) | Instances + S&F Buffers | Error
Counts + Parked Messages. Site names resolved from repository.
2026-03-24 16:19:39 -04:00
Joseph Doherty
5e2a4c9080 fix(ui): align TreeView node text by giving toggle and spacer equal fixed width 2026-03-24 16:19:39 -04:00
Joseph Doherty
0abaa47de2 fix(ui): normalize TreeView expanded keys to strings for sessionStorage compatibility
Keys from KeySelector (e.g. boxed int) were compared against string keys
restored from sessionStorage, causing expansion state to be lost on
navigation. All keys are now normalized to strings internally.
2026-03-24 16:19:39 -04:00
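The key-normalization fix above addresses a type mismatch: a key that is an integer in memory never equals the string restored from storage. A small Python sketch of the same idea follows. The `ExpansionState` class and its methods are hypothetical stand-ins, not the TreeView component's API, and `str()` is assumed here as the normalization function.

```python
from typing import Any, Iterable, List, Set


def normalize_key(key: Any) -> str:
    """Convert any key to its string form before storing or comparing."""
    return str(key)


class ExpansionState:
    """Hypothetical holder for expanded-node keys, always kept as strings."""

    def __init__(self, restored: Iterable[str] = ()) -> None:
        # Keys restored from storage are already strings.
        self._expanded: Set[str] = set(restored)

    def expand(self, key: Any) -> None:
        self._expanded.add(normalize_key(key))

    def is_expanded(self, key: Any) -> bool:
        # Without normalization, the int 42 would not match the stored "42".
        return normalize_key(key) in self._expanded

    def to_storage(self) -> List[str]:
        return sorted(self._expanded)
```

Normalizing on both the write path and the read path means the comparison no longer depends on what type the key selector returns.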
Joseph Doherty
a0a6bb4986 refactor(ui): replace manual template inheritance tree with TreeView component 2026-03-24 16:19:39 -04:00
Joseph Doherty
2b5dabb336 refactor(ui): redesign Areas page with TreeView and dedicated Add/Edit/Delete pages
Areas page now shows a single TreeView with sites as roots and areas as
children. Context menus: sites get "Add Area", areas get "Add Child Area",
"Edit Area", "Delete Area" — each navigating to a dedicated page.

The Delete Area page shows a TreeView of the area and all recursive children
with assigned instances. Deletion is blocked if any instances are assigned
to the area or its descendants.
2026-03-24 16:19:39 -04:00
Joseph Doherty
968fc4adc7 fix(ui): disable site and instance dropdowns while debug view is connected 2026-03-24 16:19:39 -04:00
Joseph Doherty
4c7fa03c07 fix(ui): remove default list-style bullets from TreeView ul elements 2026-03-24 16:19:39 -04:00
Joseph Doherty
addbb6ffeb fix(ui): move treeview-storage.js to Host wwwroot where static files are served 2026-03-24 16:19:39 -04:00
Joseph Doherty
f1537b62ca refactor(ui): replace instances table with hierarchical TreeView (Site → Area → Instance) 2026-03-24 16:19:39 -04:00
Joseph Doherty
71894f4ba9 refactor(ui): replace manual area tree rendering with TreeView component 2026-03-24 16:19:39 -04:00
Joseph Doherty
4426f3e928 refactor(ui): replace data connections table with TreeView grouped by site 2026-03-24 16:19:39 -04:00
Joseph Doherty
08d511f609 test(ui): add external filtering tests for TreeView (R8) 2026-03-24 16:19:39 -04:00
Joseph Doherty
4e5b5facec feat(ui): add right-click context menu to TreeView (R15) 2026-03-24 16:19:39 -04:00
Joseph Doherty
f127efe6ea feat(ui): add ExpandAll, CollapseAll, RevealNode to TreeView (R12, R13) 2026-03-24 16:19:39 -04:00
Joseph Doherty
d3a6ed5f68 feat(ui): add sessionStorage persistence for TreeView expansion state (R11) 2026-03-24 16:19:39 -04:00
Joseph Doherty
da4f29f6ee feat(ui): add selection support to TreeView (R5) 2026-03-24 16:19:39 -04:00
Joseph Doherty
75648c0c76 feat(ui): add TreeView<TItem> component with core rendering, expand/collapse, ARIA (R1-R4, R14) 2026-03-24 16:19:39 -04:00
Joseph Doherty
b3222cf30b fix(site-runtime): wire EventLogHandlerActor so site event log queries work
The SiteCommunicationActor expected an event log handler but none was
registered, causing "Event log handler not available" on the Event Logs
page and CLI. Bridge IEventLogQueryService to Akka via a simple actor.
2026-03-23 00:37:33 -04:00
Joseph Doherty
bc4fc97652 refactor(ui): extract instance bindings and overrides to dedicated Configure page
Move connection bindings, attribute overrides, and area assignment from
inline expandable rows on the Instances table to a separate page at
/deployment/instances/{id}/configure for a cleaner, less cramped UX.
2026-03-22 15:58:32 -04:00
Joseph Doherty
161dc406ed feat(scripts): add typed Parameters.Get<T>() helpers for script API
Replace raw dictionary casting with ScriptParameters wrapper that provides
Get<T>, Get<T?>, Get<T[]>, and Get<List<T>> with clear error messages,
numeric conversion, and JsonElement support for Inbound API parameters.
2026-03-22 15:47:18 -04:00
Joseph Doherty
ecf4b434c2 refactor(dcl): simplify ValueFormatter now that SDK returns native .NET arrays
The LmxProxy client's ExtractArrayValue now returns proper .NET arrays
(bool[], int[], DateTime[], etc.) instead of ArrayValue objects. Removed
the reflection-based FormatArrayContainer logic — IEnumerable handling
is sufficient for all array types.
2026-03-22 15:15:38 -04:00
Joseph Doherty
dcdf79afdc fix(dcl): format ArrayValue objects as comma-separated strings for display
ArrayValue from LmxProxy client was showing as type name in debug views.
Added ValueFormatter utility and NormalizeValue in LmxProxyDataConnection
to convert arrays at the adapter boundary. DateTime arrays remain as
"System.DateTime[]" due to server-side v1 string serialization.
2026-03-22 14:46:15 -04:00
Joseph Doherty
ea9c2857a7 fix(docker,cli): add LmxProxy.Client to Docker build, fix set-bindings JSON parsing
Docker: include lmxproxy/src/ZB.MOM.WW.LmxProxy.Client in build context
so the project reference resolves during container image build.

CLI: fix set-bindings JSON parsing — use JsonElement.GetString()/GetInt32()
instead of object.ToString() which returned null for deserialized elements.
2026-03-22 14:25:09 -04:00
Joseph Doherty
e8df71ea64 feat(cli): add --primary-config, --backup-config, --failover-retry-count to data connection commands
Thread backup data connection fields through management command messages,
ManagementActor handlers, SiteService, site-side SQLite storage, and
deployment/replication actors. The old --configuration CLI flag is kept
as a hidden alias for backwards compatibility.
2026-03-22 08:41:57 -04:00
Joseph Doherty
ab4e88f17f feat(ui): add primary/backup endpoint fields to data connection form 2026-03-22 08:36:18 -04:00
Joseph Doherty
801c0c1df2 feat(dcl): add active endpoint to health reports and log failover events
Add ActiveEndpoint field to DataConnectionHealthReport showing which
endpoint is active (Primary, Backup, or Primary with no backup configured).
Log failover transitions and connection restoration events to the site
event log via ISiteEventLogger, passed as an optional parameter through
the actor hierarchy for backwards compatibility.
2026-03-22 08:34:05 -04:00
Joseph Doherty
da290fa4f8 feat(dcl): add failover state machine to DataConnectionActor with round-robin endpoint switching 2026-03-22 08:30:03 -04:00
Joseph Doherty
46304678da feat(dcl): extend CreateConnectionCommand with backup config and failover retry count
Update CreateConnectionCommand to carry PrimaryConnectionDetails,
BackupConnectionDetails, and FailoverRetryCount. Update all callers:
DataConnectionManagerActor, DataConnectionActor, DeploymentManagerActor,
FlatteningService, and ConnectionConfig. The actor stores both configs
but continues using primary only — failover logic comes in Task 3.
2026-03-22 08:24:39 -04:00
Joseph Doherty
04af03980e feat(dcl): rename Configuration to PrimaryConfiguration, add BackupConfiguration and FailoverRetryCount 2026-03-22 08:18:31 -04:00
Joseph Doherty
5ec7f35150 feat(dcl): replace hand-rolled LmxProxy gRPC client with real LmxProxyClient library
Switches from v1 string-based proto stubs to the production LmxProxyClient
(v2 native TypedValue protocol) via project reference. Deletes 6k+ lines of
generated proto code. Preserves ILmxProxyClient adapter interface for testability.
2026-03-22 07:55:50 -04:00
Joseph Doherty
970d0a5cb3 refactor: simplify data connections from many-to-many site assignment to direct site ownership
Replace SiteDataConnectionAssignment join table with a direct SiteId FK on DataConnection,
simplifying the data model, repositories, UI, CLI, and deployment service.
2026-03-21 21:07:10 -04:00
Joseph Doherty
d3194e3634 feat: separate create/edit form pages, Playwright test infrastructure, /auth/token endpoint
Move all CRUD create/edit forms from inline on list pages to dedicated form pages
with back-button navigation and post-save redirect. Add Playwright Docker container
(browser server on port 3000) with 25 passing E2E tests covering login, navigation,
and site CRUD workflows. Add POST /auth/token endpoint for clean JWT retrieval.
2026-03-21 15:17:24 -04:00
Joseph Doherty
b3f8850711 docs: document script hot-reload mechanisms for all script types 2026-03-21 13:42:06 -04:00
Joseph Doherty
eeca930cbd fix: add EF migration for GrpcNodeAAddress/GrpcNodeBAddress columns on Sites table 2026-03-21 12:44:21 -04:00
Joseph Doherty
416a03b782 feat: complete gRPC streaming channel — site host, docker config, docs, integration tests
Switch site host to WebApplicationBuilder with Kestrel HTTP/2 gRPC server,
add GrpcPort/keepalive config, wire SiteStreamManager as ISiteStreamSubscriber,
expose gRPC ports in docker-compose, add site seed script, update all 10
requirement docs + CLAUDE.md + README.md for the new dual-transport architecture.
2026-03-21 12:38:33 -04:00
Joseph Doherty
49f042a937 refactor: remove ClusterClient streaming path (DebugStreamEvent), events flow via gRPC 2026-03-21 12:18:52 -04:00
Joseph Doherty
2cd43b6992 feat: update DebugStreamBridgeActor to use gRPC for streaming events
After receiving the initial snapshot via ClusterClient, the bridge actor
now opens a gRPC server-streaming subscription via SiteStreamGrpcClient
for ongoing AttributeValueChanged/AlarmStateChanged events. Adds NodeA/
NodeB failover with max 3 retries, retry count reset on successful event,
and IWithTimers-based reconnect scheduling.

- DebugStreamBridgeActor: gRPC stream after snapshot, reconnect state machine
- DebugStreamService: inject SiteStreamGrpcClientFactory, resolve gRPC addresses
- ServiceCollectionExtensions: register SiteStreamGrpcClientFactory singleton
- SiteStreamGrpcClient: make SubscribeAsync/Unsubscribe virtual for testability
- SiteStreamGrpcClientFactory: make GetOrCreate virtual for testability
- New test suite: DebugStreamBridgeActorTests (8 tests)
2026-03-21 12:14:24 -04:00
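The reconnect behaviour described above (NodeA/NodeB failover, at most 3 retries, retry count reset on a successful event) can be sketched as a small state holder. This Python model is illustrative only. The `ReconnectState` class is hypothetical, and alternating between the two nodes on every failure is an assumption: the commit message does not say how the next node is chosen.

```python
from typing import Optional


class ReconnectState:
    """Hypothetical model of the bridge actor's reconnect bookkeeping."""

    MAX_RETRIES = 3  # "max 3 retries" from the commit message

    def __init__(self, node_a: str, node_b: str) -> None:
        self._nodes = (node_a, node_b)
        self._index = 0
        self.retries = 0

    @property
    def current(self) -> str:
        return self._nodes[self._index]

    def on_event(self) -> None:
        """A successfully received event resets the retry budget."""
        self.retries = 0

    def on_stream_failed(self) -> Optional[str]:
        """Return the next address to try, or None when retries are exhausted."""
        if self.retries >= self.MAX_RETRIES:
            return None
        self.retries += 1
        # Assumption: switch to the other node on each failure.
        self._index = 1 - self._index
        return self.current
```

In the actor itself the reconnect is scheduled through timers rather than returned from a call; the sketch covers only the counting and node selection.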
Joseph Doherty
25a6022f7b feat: add SiteStreamGrpcClient and SiteStreamGrpcClientFactory
Per-site gRPC client for central-side streaming subscriptions to site
servers. SiteStreamGrpcClient manages server-streaming calls with
keepalive, converts proto events to domain types, and supports
cancellation via Unsubscribe. SiteStreamGrpcClientFactory caches one
client per site identifier.

Includes InternalsVisibleTo for test access to conversion helpers and
comprehensive unit tests for event mapping, quality/alarm-state
conversion, unsubscribe behavior, and factory caching.
2026-03-21 12:06:38 -04:00
Joseph Doherty
55a05914d0 feat: add SiteStreamGrpcServer with Channel<T> bridge and stream limits
- Define ISiteStreamSubscriber interface for decoupling from SiteRuntime
- Implement SiteStreamGrpcServer (inherits SiteStreamServiceBase) with:
  - Readiness gate (SetReady)
  - Max concurrent stream enforcement
  - Duplicate correlationId replacement (cancels previous stream)
  - StreamRelayActor creation per subscription
  - Bounded Channel<SiteStreamEvent> bridge (1000 capacity, drop-oldest)
  - Clean teardown: unsubscribe, stop actor, remove tracking entry
- Identity-safe cleanup using ConcurrentDictionary.TryRemove(KeyValuePair)
  to prevent replacement streams from being removed by predecessor cleanup
- 7 unit tests covering reject-not-ready, max-streams, duplicate cancel,
  cleanup-on-cancel, subscribe/remove lifecycle, event forwarding
2026-03-21 11:52:31 -04:00
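Two of the behaviours listed above are easy to get wrong: the drop-oldest bounded buffer, and cleanup that must not remove a replacement stream. The Python sketch below models both under stated assumptions. It is single-threaded, it uses `collections.deque(maxlen=...)` in place of a bounded channel, and the `StreamEntry`, `subscribe`, and `cleanup` names are hypothetical. The real server relies on `ConcurrentDictionary.TryRemove(KeyValuePair)`, which matches on both key and value.

```python
from collections import deque
from typing import Dict


class StreamEntry:
    """Hypothetical tracking entry for one subscription."""

    def __init__(self, capacity: int = 1000) -> None:
        # A deque with maxlen discards the oldest item when full (drop-oldest).
        self.events: deque = deque(maxlen=capacity)
        self.cancelled = False


def subscribe(
    registry: Dict[str, StreamEntry], correlation_id: str, capacity: int = 1000
) -> StreamEntry:
    """Register a stream; a duplicate correlation id cancels the previous one."""
    previous = registry.get(correlation_id)
    if previous is not None:
        previous.cancelled = True
    entry = StreamEntry(capacity)
    registry[correlation_id] = entry
    return entry


def cleanup(
    registry: Dict[str, StreamEntry], correlation_id: str, entry: StreamEntry
) -> bool:
    """Remove the entry only if the registry still holds this exact object."""
    if registry.get(correlation_id) is entry:
        del registry[correlation_id]
        return True
    return False
```

The identity check in `cleanup` is the important part: when a cancelled predecessor finishes tearing down after its replacement has registered, its cleanup call becomes a no-op instead of deleting the new stream's entry.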
Joseph Doherty
d70bbbe739 feat: add StreamRelayActor bridging Akka events to gRPC proto channel 2026-03-21 11:48:04 -04:00
Joseph Doherty
9b0a80dcbd feat: add GrpcNodeAAddress/GrpcNodeBAddress to Site entity, CLI, and UI 2026-03-21 11:45:22 -04:00
Joseph Doherty
64ee316609 feat: add GrpcPort config to NodeOptions with startup validation 2026-03-21 11:42:41 -04:00
Joseph Doherty
826cfbee31 feat: add sitestream.proto definition and generated gRPC stubs
Define the SiteStreamService proto for real-time instance event
streaming (attribute value changes, alarm state changes) from site
nodes to central. Add pre-generated C# stubs following the existing
LmxProxy pattern, gRPC NuGet packages with FrameworkReference for
ASP.NET Core server types, and proto roundtrip tests.
2026-03-21 11:37:39 -04:00
Joseph Doherty
3efec91386 fix: route debug stream events through ClusterClient site→central path
ClusterClient Sender refs are temporary proxies — valid for immediate reply
but not durable for future Tells. Events now flow as DebugStreamEvent through
SiteCommunicationActor → ClusterClient → CentralCommunicationActor → bridge
actor (same pattern as health reports). Also fix DebugStreamHub to use
IHubContext for long-lived callbacks instead of transient hub instance.
2026-03-21 11:32:17 -04:00