# ScadaBridge
ScadaBridge is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.
## Overview
This repository is the full implementation project for ScadaBridge — the C#/.NET source (src/), tests (tests/), deployable Docker topology (docker/, docker-env2/, infra/), and the design documentation (docs/) that the code implements. This README is the master index: it links the per-component design specs (the spec the code in src/ implements) and shows the system architecture. The solution file is ZB.MOM.WW.ScadaBridge.slnx.
## Technology Stack
| Layer | Technology |
|---|---|
| Runtime | .NET, Akka.NET (actors, clustering, remoting, persistence, streams) |
| Central UI | Blazor Server (ASP.NET Core + SignalR) |
| Inbound API | ASP.NET Core Web API (REST/JSON) |
| Central Database | MS SQL Server, Entity Framework Core |
| Site Storage | SQLite (deployed configs, store-and-forward buffer, event logs) |
| Authentication | Direct LDAP/AD bind (LDAPS/StartTLS), JWT sessions |
| Notifications | Delivered from the central cluster (SMTP, OAuth2/Microsoft 365); store-and-forwarded from sites |
| Hosting | Windows Server, Windows Service |
| Cluster | Akka.NET Cluster (active/standby, keep-oldest split-brain resolver (SBR)) |
| Logging | Serilog (structured) |
## Scale
- Central cluster: 2-node active/standby behind a load balancer.
- Site clusters: 2-node active/standby, headless (no UI).
## Repository Layout
| Path | Contents |
|---|---|
| `src/` | C#/.NET implementation, one project per component (`ZB.MOM.WW.ScadaBridge.<Component>`). Solution: `ZB.MOM.WW.ScadaBridge.slnx`. |
| `tests/` | Unit and integration test projects. |
| `docs/` | Design documentation: `docs/requirements/` (high-level + per-component specs, the spec the code implements), `docs/test_infra/`, `docs/plans/`. |
| `docker/` | Primary 8-node cluster topology (2 central + 3 sites × 2 nodes + Traefik) + `deploy.sh`. |
| `docker-env2/` | Minimal second cluster (2 central + 1 site) for exercising Transport (#24) against a real second environment. |
| `infra/` | Local test services (MS SQL, LDAP, OPC UA, SMTP, REST API, Traefik). |
| `deploy/` | Production/on-host deployment artifacts (e.g. `wonder-app-vd03/`). |
| `AkkaDotNet/` | Akka.NET reference notes. |

## Build, Test & Run
```bash
# Build the solution
dotnet build ZB.MOM.WW.ScadaBridge.slnx

# Run the tests
dotnet test ZB.MOM.WW.ScadaBridge.slnx

# Bring up the primary local cluster (builds the scadabridge:latest image + recreates containers)
bash docker/deploy.sh   # central load balancer at http://localhost:9000

# Drive the system from the CLI (reads ~/.scadabridge/config.json; test user has all roles)
dotnet run --project src/ZB.MOM.WW.ScadaBridge.CLI -- \
  --username multi-role --password password template list
```
See docker/README.md for ports and management commands, and src/ZB.MOM.WW.ScadaBridge.CLI/README.md for the full CLI reference.
## Local Test Environments
Two Docker-based cluster topologies are available for local development and testing:
- Primary (`docker/`): Full topology (2 central + 3 sites × 2 nodes + Traefik). Default development target.
- Env2 (`docker-env2/`): Minimal sibling stack (2 central + 1 site × 2 nodes + Traefik), runs concurrently with primary on host ports 91XX. Purpose: exercise the Transport (#24) bundle export/import feature against a real second environment.

Both stacks share the infrastructure services in infra/ (MS SQL, LDAP, SMTP, OPC UA, REST API).
## Document Map
### Requirements
- HighLevelReqs.md — Complete high-level requirements covering all functional areas.
### Component Design Documents
| # | Component | Document | Description |
|---|---|---|---|
| 1 | Template Engine | docs/requirements/Component-TemplateEngine.md | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, native alarm source bindings, flattening, semantic validation, revision hashing, diff calculation, and folder organization (nested folders, drag-drop). |
| 2 | Deployment Manager | docs/requirements/Component-DeploymentManager.md | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | docs/requirements/Component-SiteRuntime.md | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, site-wide Akka stream with per-subscriber backpressure, and a read-only Native Alarm Actor (peer to the computed Alarm Actor) mirroring native OPC UA A&C / MxAccess alarms with site SQLite persistence. |
| 4 | Data Connection Layer | docs/requirements/Component-DataConnectionLayer.md | Common data connection interface (OPC UA, MxGateway, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry, protocol-agnostic address-space browse, and optional read-only native alarm mirroring (IAlarmSubscribableConnection, one alarm feed per connection with snapshot replay). |
| 5 | Central–Site Communication | docs/requirements/Component-Communication.md | Dual transport: Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data). 9 message patterns with per-pattern timeouts, SiteStreamGrpcServer/Client, application-level correlation IDs, transport heartbeat config, gRPC keepalive, message ordering, connection failure behavior. The gRPC stream additively carries the read-only native alarm mirror (computed + native OPC UA / MxAccess) via the enriched AlarmStateUpdate. |
| 6 | Store-and-Forward Engine | docs/requirements/Component-StoreAndForward.md | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | docs/requirements/Component-ExternalSystemGateway.md | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | docs/requirements/Component-NotificationService.md | Central-only — manages typed notification-list and SMTP definitions, supplies per-type delivery adapters (SMTP with OAuth2 (M365) or Basic Auth, BCC, plain text); delivery performed by the Notification Outbox. |
| 9 | Central UI | docs/requirements/Component-CentralUI.md | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | docs/requirements/Component-Security.md | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | docs/requirements/Component-HealthMonitoring.md | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | docs/requirements/Component-SiteEventLogging.md | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | docs/requirements/Component-ClusterInfrastructure.md | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. The ClusterInfrastructure project owns the ClusterOptions config model; the Akka bootstrap/SBR/CoordinatedShutdown wiring lives in the Host. |
| 14 | Inbound API | docs/requirements/Component-InboundAPI.md | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | docs/requirements/Component-Host.md | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | docs/requirements/Component-Commons.md | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention, the unified read-only alarm condition model (AlarmConditionState/AlarmKind), and native alarm source entities + the IAlarmSubscribableConnection capability seam. |
| 17 | Configuration Database | docs/requirements/Component-ConfigurationDatabase.md | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management (incl. the AddNativeAlarmSources migration + native alarm source repository CRUD). |
| 18 | Management Service | docs/requirements/Component-ManagementService.md | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | docs/requirements/Component-CLI.md | Standalone command-line tool, System.CommandLine, HTTP transport via Management API, JSON/table output, mirrors all Management Service operations. |
| 20 | Traefik Proxy | docs/requirements/Component-TraefikProxy.md | Reverse proxy/load balancer fronting central cluster, active node routing via /health/active, automatic failover. |
| 21 | Notification Outbox | docs/requirements/Component-NotificationOutbox.md | Central component ingesting store-and-forwarded notifications into the Notifications audit table, with NotificationOutboxActor singleton dispatcher, per-type delivery adapters, retry/parking, status tracking, daily purge, and delivery KPIs. |
| 22 | Site Call Audit | docs/requirements/Component-SiteCallAudit.md | Central component auditing site cached calls (ExternalSystem.CachedCall/Database.CachedWrite) into the SiteCalls audit table, with SiteCallAuditActor singleton, telemetry ingest, periodic reconciliation, point-in-time KPIs, daily purge, and central→site Retry/Discard relay for parked calls. |
| 23 | Audit Log | docs/requirements/Component-AuditLog.md | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. |
| 24 | Transport | docs/requirements/Component-Transport.md | Bundle export/import for templates, shared scripts, external systems, central-only artifacts. AES-256-GCM encryption; per-conflict resolution on import; correlated audit trail. |

Shared UI sub-component (not a top-level component): TreeView, a reusable hierarchical tree/grid Blazor component used by the Central UI (#9) for the templates folder hierarchy, data-connection browse, and tag pickers.
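
As one illustration of the component summaries above, the Health Monitoring row (#11) names a 30s report interval, a 60s offline threshold, and monotonic sequence numbers. The following is a minimal C# sketch of how a central-side tracker could combine them. It is not taken from the ScadaBridge source: `SiteHealthTracker`, `HealthReport`, and their members are hypothetical names. It assumes a report is ignored when its sequence number is not greater than the last accepted one, and that a site counts as offline once no accepted report has arrived within the threshold.

```csharp
using System;
using System.Collections.Generic;

// Usage: feed reports in, then ask whether a site is still considered online.
var tracker = new SiteHealthTracker(offlineThreshold: TimeSpan.FromSeconds(60));
var t0 = new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);

Console.WriteLine(tracker.Accept(new HealthReport("site-a", 1, t0)));                // True
Console.WriteLine(tracker.Accept(new HealthReport("site-a", 1, t0.AddSeconds(30)))); // False (stale sequence)
Console.WriteLine(tracker.IsOnline("site-a", t0.AddSeconds(45)));                    // True
Console.WriteLine(tracker.IsOnline("site-a", t0.AddSeconds(61)));                    // False

// Hypothetical report shape: site id, monotonic sequence number, UTC receive time.
public sealed record HealthReport(string SiteId, long Sequence, DateTimeOffset ReceivedAtUtc);

public sealed class SiteHealthTracker
{
    private readonly TimeSpan _offlineThreshold;
    private readonly Dictionary<string, HealthReport> _lastAccepted = new();

    public SiteHealthTracker(TimeSpan offlineThreshold) => _offlineThreshold = offlineThreshold;

    // Accepts a report only if its sequence number is strictly newer than the last one seen.
    public bool Accept(HealthReport report)
    {
        if (_lastAccepted.TryGetValue(report.SiteId, out var last) && report.Sequence <= last.Sequence)
            return false;

        _lastAccepted[report.SiteId] = report;
        return true;
    }

    // A site is online if an accepted report arrived within the offline threshold.
    public bool IsOnline(string siteId, DateTimeOffset nowUtc) =>
        _lastAccepted.TryGetValue(siteId, out var last)
        && nowUtc - last.ReceivedAtUtc <= _offlineThreshold;
}
```

With a 30s report interval, the 60s threshold in this sketch tolerates one missed report before a site is flagged offline. The authoritative rules are in docs/requirements/Component-HealthMonitoring.md.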
### Reference Documentation
- AkkaDotNet/ — Akka.NET reference notes covering actors, remoting, clustering, persistence, streams, serialization, hosting, testing, and best practices.
- docs/plans/ — Design decision documents from refinement sessions.
## Architecture Diagram (Logical)
### Site Runtime Actor Hierarchy
```mermaid
%%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
    DMS["Deployment Manager Singleton<br/>(Cluster Singleton)"]
    IA["Instance Actor<br/>(one per deployed, enabled instance)"]
    IA2["Instance Actor<br/>( … )"]
    MOREIA["… more Instance Actors"]
    DMS --> IA
    DMS --> IA2
    DMS -.-> MOREIA
    SA["Script Actor<br/>(coordinator, one per instance script)"]
    AA["Alarm Actor<br/>(coordinator, one per alarm definition)"]
    MORE1["… more Script /<br/>Alarm Actors"]
    IA --> SA
    IA --> AA
    IA -.-> MORE1
    SEA["Script Execution Actor<br/>(short-lived, per invocation)"]
    AEA["Alarm Execution Actor<br/>(short-lived, per on-trigger invocation)"]
    IA2C["… (Script / Alarm Actors)"]
    SA --> SEA
    AA --> AEA
    IA2 -.-> IA2C
    subgraph STREAM["Site-Wide Akka Stream"]
        PUB["All Instance Actors"]
        STR["Site-Wide Akka Stream<br/>(attribute + alarm state changes)"]
        DBG["Debug view<br/>(instance-level filtering)"]
        PUB -->|publish| STR
        STR -->|subscribe filtered| DBG
    end
    classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
    classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
    classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
    classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
    classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
    class DMS,STR alt
    class IA,IA2,PUB proc
    class SA,AA,DBG start
    class SEA,AEA warn
    class MOREIA,MORE1,IA2C muted
```
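
The publish/subscribe relationship in the STREAM subgraph above can be illustrated with a small stand-in. The sketch below is plain single-threaded C#, not Akka Streams, and it leaves out the per-subscriber backpressure that the Site Runtime spec mentions. It only shows instance-level filtering as a debug view might see it. `SiteWideStream`, `StateChange`, and the instance names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var stream = new SiteWideStream();
var seen = new List<string>();

// The debug view subscribes to a single instance; other instances are filtered out.
using (stream.Subscribe("Pump1", change => seen.Add($"{change.Member}={change.Value}")))
{
    stream.Publish(new StateChange("Pump1", "Speed", "1200"));
    stream.Publish(new StateChange("Pump2", "Speed", "900"));      // different instance: not delivered
    stream.Publish(new StateChange("Pump1", "HighTemp", "Active"));
}
stream.Publish(new StateChange("Pump1", "Speed", "0"));             // subscription already disposed

Console.WriteLine(string.Join(", ", seen)); // Speed=1200, HighTemp=Active

// Hypothetical change event: which instance, which attribute or alarm, and its new value.
public sealed record StateChange(string InstanceName, string Member, string Value);

public sealed class SiteWideStream
{
    private readonly List<Subscription> _subscriptions = new();

    // Registers a handler that only receives changes for the named instance.
    public IDisposable Subscribe(string instanceName, Action<StateChange> onChange)
    {
        var subscription = new Subscription(this, instanceName, onChange);
        _subscriptions.Add(subscription);
        return subscription;
    }

    // Delivers a change to every subscriber whose filter matches the publishing instance.
    public void Publish(StateChange change)
    {
        foreach (var subscription in _subscriptions.ToArray()) // copy: handlers may unsubscribe
        {
            if (subscription.InstanceName == change.InstanceName)
                subscription.OnChange(change);
        }
    }

    private sealed class Subscription : IDisposable
    {
        private readonly SiteWideStream _owner;
        public string InstanceName { get; }
        public Action<StateChange> OnChange { get; }

        public Subscription(SiteWideStream owner, string instanceName, Action<StateChange> onChange)
        {
            _owner = owner;
            InstanceName = instanceName;
            OnChange = onChange;
        }

        // Disposing the subscription removes it from the stream.
        public void Dispose() => _owner._subscriptions.Remove(this);
    }
}
```

Filtering on the subscriber side keeps publishers unaware of who is listening, which matches the diagram's single publish edge from all Instance Actors. The actual stream design is specified in docs/requirements/Component-SiteRuntime.md.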
