Add ActiveNodeHealthCheck that returns 200 only on the Akka.NET cluster
leader, enabling Traefik to route traffic to the active central node and
automatically fail over when the leader changes. Also fixes AkkaClusterHealthCheck
to resolve ActorSystem from AkkaHostedService (was always null via DI).
Replace the CLI's Akka.NET ClusterClient transport with a simple HTTP client
targeting a new POST /management endpoint on the Central Host. The endpoint
handles Basic Auth, LDAP authentication, role resolution, and ManagementActor
dispatch in a single round-trip — eliminating the CLI's Akka, LDAP, and
Security dependencies.
Also fixes DCL ReSubscribeAll losing subscriptions on repeated reconnect by
deriving the tag list from _subscriptionsByInstance instead of _subscriptionIds.
- Add JoeAppEngine folder to OPC UA nodes.json (BTCS, AlarmCntsBySeverity, Scheduler/ScanTime)
- Fix DataConnectionActor: capture Self in PreStart for use from non-actor threads,
preventing Self.Tell failure in Disconnected event handler
- Implement InstanceActor.HandleConnectionQualityChanged to mark attributes Bad on disconnect
- Fix LmxFakeProxy TagMapper to serialize arrays as JSON instead of "System.Int32[]"
- Allow DataType and DataSourceReference updates in TemplateService.UpdateAttributeAsync
- Update test_infra_opcua.md with JoeAppEngine documentation
Defines a gRPC server implementing the scada.ScadaService proto that bridges
to the existing OPC UA test server. Enables end-to-end testing of
RealLmxProxyClient without a Windows LmxProxy deployment.
Replace stub ILmxProxyClient with production proto-generated gRPC client
(RealLmxProxyClient) that connects to LmxProxy servers with x-api-key
metadata header authentication. Includes pre-generated proto stubs for
ARM64 Docker compatibility, updated adapter with proper quality mapping
(Good/Uncertain/Bad), subscription via server-streaming RPC, and 20 unit
tests covering all operations. Updated Component-DataConnectionLayer.md
to reflect the actual implementation.
- InboundScriptExecutor lazy-compiles scripts on first request, solving
the multi-node problem where methods created via CLI/UI were only compiled
on the ManagementActor's node, not the node handling the HTTP request.
- ManagementActor hot-registers API method scripts on create/update/delete
for the local node.
- FlatteningService prefixes the "attribute" field in composed alarm trigger
configs with the composition instance name so alarms evaluate against the
correct path-qualified attribute (e.g. CoolingTank.Level not Level).
NotificationRepository.GetAllNotificationListsAsync() was missing
.Include(Recipients), causing artifact deployments to push empty recipient
lists to sites. Also load shared scripts from SQLite on DeploymentManager
startup so they're available before Instance Actors compile their scripts.
Completes the Inbound API → site script call chain by adding RouteToCallRequest
handlers in SiteCommunicationActor and DeploymentManagerActor. Also replaces the
placeholder dispatch table in InboundScriptExecutor with Roslyn compilation of
API method scripts at startup, enabling user-defined inbound API methods to call
instance scripts across the cluster.
Add SiteReplicationActor (runs on every site node) to replicate deployed
configs and store-and-forward buffer operations to the standby peer via
cluster member discovery and fire-and-forget Tell. Wire ReplicationService
handler and pass replication actor to DeploymentManagerActor singleton.
Fix 5 pre-existing ConfigurationDatabase test failures: RowVersion NOT NULL
on SQLite, stale migration name assertion, and seed data count mismatch.
Three runtime bugs fixed:
- DataConnectionActor: TagValueReceived/TagResolutionSucceeded/Failed not
handled in any Become state — OPC UA values went to dead letters. Added
initial read after subscribe to seed current values immediately.
- AlarmActor: ParseEvalConfig expected "attributeName"/"matchValue"/"min"/
"max" keys but seed data uses "attribute"/"value"/"high"/"low". Added
support for both conventions and !=prefix for not-equal matching.
- InstanceActor: snapshots reported all alarms (including unevaluated) with
correct priorities and source timestamps instead of current UTC. Removed
bogus Vibration template attribute that shadowed Speed's tag mapping.
Adds `debug snapshot --id <int>` to query a running instance's current
attribute values and alarm states without the subscribe/stream overhead
of the debug view. Routes through ManagementActor → CommunicationService
→ site DeploymentManager → InstanceActor using the existing remote query
pattern.
IServiceProvider now flows through the actor chain (DeploymentManagerActor
→ InstanceActor → ScriptActor → ScriptExecutionActor) so scripts can
resolve IExternalSystemClient, IDatabaseGateway, and
INotificationDeliveryService from DI. ScriptGlobals exposes ExternalSystem,
Database, Notify, and Scripts as top-level properties so scripts can use
them without the Instance. prefix.
Both nodes of a site cluster were sending health reports. The standby
node (without the DeploymentManager singleton) reported 0 instances and
no connections, overwriting the active node's data in the aggregator.
Added IsActiveNode flag to ISiteHealthCollector, set by
DeploymentManagerActor on PreStart/PostStop. HealthReportSender skips
sending when the node is not active. Also ensured EnsureDclConnections
is called during startup batch creation so data connections survive
container restarts.
UpdateInstanceCounts() was only called before Instance Actors were
created (in HandleStartupConfigsLoaded), showing 0 enabled on the
health dashboard. Now also called after each batch in
HandleStartNextBatch to reflect actual running actor count.
Add 33 new management message records, ManagementActor handlers, and CLI
commands to close all functionality gaps between the Central UI and the
Management CLI. New capabilities include:
- Template member CRUD (attributes, alarms, scripts, compositions)
- Shared script CRUD
- Database connection definition CRUD
- Inbound API method CRUD
- LDAP scope rule management
- API key enable/disable
- Area update
- Remote event log and parked message queries
- Missing get/update commands for templates, sites, instances, data
connections, external systems, notifications, and SMTP config
Includes 12 new ManagementActor unit tests covering authorization,
happy-path queries, and error handling. Updates CLI README and component
design documents (Component-CLI.md, Component-ManagementService.md).
Wired ISiteHealthCollector calls for script errors (ScriptExecutionActor),
alarm eval errors (AlarmActor), dead letters (DeadLetterMonitorActor), and
S&F buffer depth placeholder. Added instance count tracking (deployed/
enabled/disabled) to SiteHealthReport via DeploymentManagerActor. Updated
Health Dashboard UI to show instance counts per site. All metrics flow
through the existing health report pipeline via ClusterClient.
DeploymentManagerActor deserialized connection config JSON as
Dictionary<string, string>, which silently failed on non-string values
like {"publishInterval":1000}. The OPC UA adapter then fell back to
localhost:4840 (unreachable in Docker). Now uses JsonDocument to handle
any JSON value type. OPC PLC Simulator connects successfully.
Areas management is a design concern, not admin. Moved Areas page
authorization from RequireAdmin to RequireDesign, moved nav link from
Admin to Design section, updated ManagementActor role check. Added
GET /logout endpoint (was 404, now redirects to login). Improved Sign
Out button visibility in sidebar next to username.
DataConnectionActor now calls UpdateConnectionHealth() on state
transitions (Connecting/Connected/Reconnecting) and UpdateTagResolution()
on connection establishment. DataConnectionManagerActor calls
RemoveConnection() on actor removal. Health reports now include
data connection statuses when instances are deployed with bindings.
Central and site clusters now communicate via ClusterClient/
ClusterClientReceptionist instead of direct ActorSelection. Both
CentralCommunicationActor and SiteCommunicationActor are registered
with their cluster's receptionist. Central creates one ClusterClient
per site using NodeA/NodeB contact points from the DB. Sites configure
multiple CentralContactPoints for automatic failover between central
nodes. ISiteClientFactory enables test injection.
Sites now send SiteHealthReport via AkkaHealthReportTransport →
SiteCommunicationActor → CentralCommunicationActor → CentralHealthAggregator.
Added IHealthReportTransport impl, ISiteIdentityProvider impl, registered
HealthReportSender on site nodes, and added SiteHealthReport handler in
CentralCommunicationActor. Health Dashboard now shows all 3 sites online.
Central now resolves site Akka remoting addresses from the Sites DB table
(NodeAAddress/NodeBAddress) instead of relying on runtime RegisterSite
messages. Eliminates the race condition where sites starting before central
had their registration dead-lettered. Addresses are cached in
CentralCommunicationActor with 60s periodic refresh and on-demand refresh
when sites are added/edited/deleted via UI or CLI.
Multi-stage Dockerfile with NuGet restore layer caching, per-node appsettings
with Docker hostnames, shared bridge network with infra services, and
build/deploy/teardown scripts. Ports use 90xx block to avoid conflicts.
Attributes bound to data connections now initialize with "Uncertain" quality,
distinguishing "never received a value" from "known good" or "connection lost."
Quality is tracked per attribute and included in GetAttributeResponse.
Two Akka.NET deserialization bugs prevented CLI commands from reaching ManagementActor: IReadOnlyList<string> in AuthenticatedUser serialized as a compiler-generated internal type unknown to the server, and ManagementSuccess.Data carried server-side assembly types the CLI couldn't resolve on receipt. Fixed by using string[] for roles and pre-serializing response data to JSON in ManagementActor before sending. Adds full CLI reference documentation covering all 10 command groups.
New components 18-19: ManagementService (Akka.NET actor on Central exposing
all admin operations via ClusterClientReceptionist) and CLI (console app using
ClusterClient for scripting). Updated HighLevelReqs, CLAUDE.md, README,
Component-Host, Component-Communication, Component-Security.
All 22 occurrences of hardcoded "system" user string replaced with
GetCurrentUserAsync() which reads the Username claim from AuthenticationState.
Affected: Instances.razor (6), Sites.razor (2), Templates.razor (11),
SharedScripts.razor (3).