Files
ScadaBridge/README.md
T
Joseph Doherty 3763f6d2d8 docs: reframe README as the ScadaBridge implementation project
Retitle from 'SCADA System — Design Documentation' to ScadaBridge; the
overview now describes the repo as the full implementation (src/tests/docker
+ design docs as spec) rather than design docs only. Add Repository Layout
and Build/Test/Run sections. Component table + architecture diagrams unchanged.
2026-05-31 22:12:16 -04:00

212 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ScadaBridge
ScadaBridge is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.
## Overview
This repository is the full **implementation** project for ScadaBridge — the C#/.NET source (`src/`), tests (`tests/`), deployable Docker topology (`docker/`, `docker-env2/`, `infra/`), and the design documentation (`docs/`) that the code implements. This README is the master index: it links the per-component **design specs** (the spec the code in `src/` implements) and shows the system architecture. The solution file is `ZB.MOM.WW.ScadaBridge.slnx`.
### Technology Stack
| Layer | Technology |
|-------|-----------|
| Runtime | .NET, Akka.NET (actors, clustering, remoting, persistence, streams) |
| Central UI | Blazor Server (ASP.NET Core + SignalR) |
| Inbound API | ASP.NET Core Web API (REST/JSON) |
| Central Database | MS SQL Server, Entity Framework Core |
| Site Storage | SQLite (deployed configs, S&F buffer, event logs) |
| Authentication | Direct LDAP/AD bind (LDAPS/StartTLS), JWT sessions |
| Notifications | Delivered from the central cluster (SMTP, OAuth2/Microsoft 365); store-and-forwarded from sites |
| Hosting | Windows Server, Windows Service |
| Cluster | Akka.NET Cluster (active/standby, keep-oldest SBR) |
| Logging | Serilog (structured) |
### Scale
- Central cluster: 2-node active/standby behind a load balancer.
- Site clusters: 2-node active/standby, headless (no UI).
## Repository Layout
| Path | Contents |
|------|----------|
| `src/` | C#/.NET implementation — one project per component (`ZB.MOM.WW.ScadaBridge.<Component>`). Solution: `ZB.MOM.WW.ScadaBridge.slnx`. |
| `tests/` | Unit and integration test projects. |
| `docs/` | Design documentation — `docs/requirements/` (high-level + per-component specs, the spec the code implements), `docs/test_infra/`, `docs/plans/`. |
| `docker/` | Primary 8-node cluster topology (2 central + 3 sites × 2 nodes + Traefik) + `deploy.sh`. |
| `docker-env2/` | Minimal second cluster (2 central + 1 site) for exercising Transport (#24) against a real second environment. |
| `infra/` | Local test services (MS SQL, LDAP, OPC UA, SMTP, REST API, Traefik). |
| `deploy/` | Production/on-host deployment artifacts (e.g. `wonder-app-vd03/`). |
| `AkkaDotNet/` | Akka.NET reference notes. |
## Build, Test & Run
```bash
# Build the solution
dotnet build ZB.MOM.WW.ScadaBridge.slnx
# Run the tests
dotnet test ZB.MOM.WW.ScadaBridge.slnx
# Bring up the primary local cluster (builds the scadabridge:latest image + recreates containers)
bash docker/deploy.sh # central load balancer at http://localhost:9000
# Drive the system from the CLI (reads ~/.scadabridge/config.json; test user has all roles)
dotnet run --project src/ZB.MOM.WW.ScadaBridge.CLI -- \
--username multi-role --password password template list
```
See [`docker/README.md`](docker/README.md) for ports and management commands, and [`src/ZB.MOM.WW.ScadaBridge.CLI/README.md`](src/ZB.MOM.WW.ScadaBridge.CLI/README.md) for the full CLI reference.
## Local Test Environments
Two Docker-based cluster topologies are available for local development and testing:
- **Primary** ([`docker/`](docker/)) — Full topology (2 central + 3 sites × 2 nodes + Traefik). Default development target.
- **Env2** ([`docker-env2/`](docker-env2/)) — Minimal sibling stack (2 central + 1 site × 2 nodes + Traefik), runs concurrently with primary on host ports 91XX. Purpose: exercise the Transport (#24) bundle export/import feature against a real second environment.
Both stacks share the infrastructure services in [`infra/`](infra/) (MS SQL, LDAP, SMTP, OPC UA, REST API).
## Document Map
### Requirements
- [HighLevelReqs.md](docs/requirements/HighLevelReqs.md) — Complete high-level requirements covering all functional areas.
### Component Design Documents
| # | Component | Document | Description |
|---|-----------|----------|-------------|
| 1 | Template Engine | [docs/requirements/Component-TemplateEngine.md](docs/requirements/Component-TemplateEngine.md) | Template modeling, inheritance, composition, path-qualified member addressing, override granularity, locking, alarms, native alarm source bindings, flattening, semantic validation, revision hashing, diff calculation, and folder organization (nested folders, drag-drop). |
| 2 | Deployment Manager | [docs/requirements/Component-DeploymentManager.md](docs/requirements/Component-DeploymentManager.md) | Central-side deployment pipeline with deployment ID/idempotency, per-instance operation lock, state transition matrix, all-or-nothing site apply, system-wide artifact deployment with per-site status. |
| 3 | Site Runtime | [docs/requirements/Component-SiteRuntime.md](docs/requirements/Component-SiteRuntime.md) | Site-side actor hierarchy with explicit supervision strategies, staggered startup, script trust model (constrained APIs), Tell/Ask conventions, concurrency serialization, site-wide Akka stream with per-subscriber backpressure, and a read-only Native Alarm Actor (peer to the computed Alarm Actor) mirroring native OPC UA A&C / MxAccess alarms with site SQLite persistence. |
| 4 | Data Connection Layer | [docs/requirements/Component-DataConnectionLayer.md](docs/requirements/Component-DataConnectionLayer.md) | Common data connection interface (OPC UA, MxGateway, custom), Become/Stash connection actor model, auto-reconnect, immediate bad quality on disconnect, transparent re-subscribe, synchronous write failures, tag path resolution retry, protocol-agnostic address-space browse, and optional read-only native alarm mirroring (`IAlarmSubscribableConnection`, one alarm feed per connection with snapshot replay). |
| 5 | CentralSite Communication | [docs/requirements/Component-Communication.md](docs/requirements/Component-Communication.md) | Dual transport: Akka.NET ClusterClient (command/control) + gRPC server-streaming (real-time data). 9 message patterns with per-pattern timeouts, SiteStreamGrpcServer/Client, application-level correlation IDs, transport heartbeat config, gRPC keepalive, message ordering, connection failure behavior. The gRPC stream additively carries the read-only native alarm mirror (computed + native OPC UA / MxAccess) via the enriched `AlarmStateUpdate`. |
| 6 | Store-and-Forward Engine | [docs/requirements/Component-StoreAndForward.md](docs/requirements/Component-StoreAndForward.md) | Buffering (transient failures only), fixed-interval retry, parking, async best-effort replication, SQLite persistence at sites. |
| 7 | External System Gateway | [docs/requirements/Component-ExternalSystemGateway.md](docs/requirements/Component-ExternalSystemGateway.md) | HTTP/REST + JSON, API key/Basic Auth, per-system timeout, dual call modes (Call/CachedCall), transient/permanent error classification, dedicated blocking I/O dispatcher, ADO.NET connection pooling. |
| 8 | Notification Service | [docs/requirements/Component-NotificationService.md](docs/requirements/Component-NotificationService.md) | Central-only — manages typed notification-list and SMTP definitions, supplies per-type delivery adapters (SMTP with OAuth2 (M365) or Basic Auth, BCC, plain text); delivery performed by the Notification Outbox. |
| 9 | Central UI | [docs/requirements/Component-CentralUI.md](docs/requirements/Component-CentralUI.md) | Blazor Server with SignalR real-time push, load balancer failover with JWT, all management workflows. |
| 10 | Security & Auth | [docs/requirements/Component-Security.md](docs/requirements/Component-Security.md) | Direct LDAP bind (LDAPS/StartTLS), JWT sessions (HMAC-SHA256, 15-min refresh, 30-min idle), role-based authorization, site-scoped permissions. |
| 11 | Health Monitoring | [docs/requirements/Component-HealthMonitoring.md](docs/requirements/Component-HealthMonitoring.md) | 30s report interval, 60s offline threshold, monotonic sequence numbers, raw error counts, tag resolution counts, dead letter monitoring. |
| 12 | Site Event Logging | [docs/requirements/Component-SiteEventLogging.md](docs/requirements/Component-SiteEventLogging.md) | SQLite storage, 30-day retention + 1GB cap, daily purge, paginated remote queries with keyword search. |
| 13 | Cluster Infrastructure | [docs/requirements/Component-ClusterInfrastructure.md](docs/requirements/Component-ClusterInfrastructure.md) | Akka.NET cluster, keep-oldest SBR with down-if-alone, min-nr-of-members=1, 2s/10s/15s failure detection, CoordinatedShutdown, automatic dual-node recovery. The `ClusterInfrastructure` project owns the `ClusterOptions` config model; the Akka bootstrap/SBR/CoordinatedShutdown wiring lives in the Host. |
| 14 | Inbound API | [docs/requirements/Component-InboundAPI.md](docs/requirements/Component-InboundAPI.md) | POST /api/{methodName}, X-API-Key header, flat JSON, extended type system (Object/List), script-based implementations, failures-only logging. |
| 15 | Host | [docs/requirements/Component-Host.md](docs/requirements/Component-Host.md) | Single deployable binary, role-based component registration, per-component config binding (Options pattern), readiness gating, dead letter monitoring, Akka.NET bootstrap, ASP.NET Core hosting for central. |
| 16 | Commons | [docs/requirements/Component-Commons.md](docs/requirements/Component-Commons.md) | Namespace/folder convention (Types/Interfaces/Entities/Messages), shared data types, POCOs, repository interfaces, message contracts with additive-only versioning, UTC timestamp convention, the unified read-only alarm condition model (`AlarmConditionState`/`AlarmKind`), and native alarm source entities + the `IAlarmSubscribableConnection` capability seam. |
| 17 | Configuration Database | [docs/requirements/Component-ConfigurationDatabase.md](docs/requirements/Component-ConfigurationDatabase.md) | EF Core data access, per-component repositories, unit-of-work, optimistic concurrency on deployment status, audit logging (IAuditService), migration management (incl. the `AddNativeAlarmSources` migration + native alarm source repository CRUD). |
| 18 | Management Service | [docs/requirements/Component-ManagementService.md](docs/requirements/Component-ManagementService.md) | Akka.NET ManagementActor on central, ClusterClientReceptionist registration, programmatic access to all admin operations, CLI interface. |
| 19 | CLI | [docs/requirements/Component-CLI.md](docs/requirements/Component-CLI.md) | Standalone command-line tool, System.CommandLine, HTTP transport via Management API, JSON/table output, mirrors all Management Service operations. |
| 20 | Traefik Proxy | [docs/requirements/Component-TraefikProxy.md](docs/requirements/Component-TraefikProxy.md) | Reverse proxy/load balancer fronting central cluster, active node routing via `/health/active`, automatic failover. |
| 21 | Notification Outbox | [docs/requirements/Component-NotificationOutbox.md](docs/requirements/Component-NotificationOutbox.md) | Central component ingesting store-and-forwarded notifications into the `Notifications` audit table, with `NotificationOutboxActor` singleton dispatcher, per-type delivery adapters, retry/parking, status tracking, daily purge, and delivery KPIs. |
| 22 | Site Call Audit | [docs/requirements/Component-SiteCallAudit.md](docs/requirements/Component-SiteCallAudit.md) | Central component auditing site cached calls (`ExternalSystem.CachedCall`/`Database.CachedWrite`) into the `SiteCalls` audit table, with `SiteCallAuditActor` singleton, telemetry ingest, periodic reconciliation, point-in-time KPIs, daily purge, and central→site Retry/Discard relay for parked calls. |
| 23 | Audit Log | [docs/requirements/Component-AuditLog.md](docs/requirements/Component-AuditLog.md) | New central append-only AuditLog spanning every script-trust-boundary action (outbound API sync+cached, outbound DB sync+cached, notifications, inbound API). Site-local SQLite hot-path append + gRPC telemetry + central reconciliation; combined telemetry packet with Site Call Audit; central direct-write for Notification Outbox dispatch + Inbound API middleware; monthly partitioning, 365-day default retention. |
| 24 | Transport | [docs/requirements/Component-Transport.md](docs/requirements/Component-Transport.md) | Bundle export/import for templates, shared scripts, external systems, central-only artifacts. AES-256-GCM encryption; per-conflict resolution on import; correlated audit trail. |
**Shared UI sub-component** (not a top-level component): [TreeView](docs/requirements/Component-TreeView.md) — reusable hierarchical tree/grid Blazor component used by the Central UI (#9) for the templates folder hierarchy, data-connection browse, and tag pickers.
### Reference Documentation
- [AkkaDotNet/](AkkaDotNet/) — Akka.NET reference notes covering actors, remoting, clustering, persistence, streams, serialization, hosting, testing, and best practices.
- [docs/plans/](docs/plans/) — Design decision documents from refinement sessions.
### Architecture Diagram (Logical)
```
Users (Blazor Server)
Load Balancer
┌────────────────────────┼────────────────────────────┐
│ CENTRAL CLUSTER │
│ (2-node active/standby) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Template │ │Deployment│ │ Central │ │
│ │ Engine │ │ Manager │ │ UI │ Blazor Svr │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Security │ │ Config │ │ Health │ │
│ │ & Auth │ │ DB │ │ Monitor │ │
│ │ (JWT/LDAP)│ │ (EF+IAud)│ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ │
│ │ Inbound │ ◄── External Systems (X-API-Key) │
│ │ API │ POST /api/{method}, JSON │
│ └──────────┘ │
│ ┌──────────┐ │
│ │ Mgmt │ ◄── CLI (ClusterClient) │
│ │ Service │ ManagementActor + Receptionist │
│ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Ntf │ │ Site │ │ Audit │ Observ. / │
│ │ Outbox │ │ Call │ │ Log │ Audit area │
│ │ (#21) │ │ Audit │ │ (#23) │ │
│ │ │ │ (#22) │ │ │ │
│ └────▲─────┘ └────▲─────┘ └────▲─────┘ │
│ │ ingests │ ingests │ ingests │
│ │ (S&F) │ (telemetry)│ (telemetry + │
│ │ │ │ direct-write │
│ │ │ │ from Ntf Outbox │
│ │ │ │ & Inbound API) │
│ ┌───────────────────────────────────┐ │
│ │ Akka.NET Communication Layer │ │
│ │ ClusterClient: command/control │ │
│ │ gRPC Client: real-time streams │ │
│ │ (correlation IDs, per-pattern │ │
│ │ timeouts, message ordering) │ │
│ └──────────────┬────────────────────┘ │
│ ┌──────────────┴────────────────────┐ │
│ │ Configuration Database (EF) │──► MS SQL │
│ └───────────────────────────────────┘ (Config DB)│
│ │ Machine Data DB│
└─────────────────┼───────────────────────────────────┘
│ Akka.NET Remoting (command/control)
│ gRPC HTTP/2 (real-time data, port 8083)
┌────────────┼────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ SITE A │ │ SITE B │ │ SITE N │
│ (2-node)│ │ (2-node)│ │ (2-node)│
│ ┌─────┐ │ │ ┌─────┐ │ │ ┌─────┐ │
│ │Data │ │ │ │Data │ │ │ │Data │ │
│ │Conn │ │ │ │Conn │ │ │ │Conn │ │
│ │Layer │ │ │ │Layer │ │ │ │Layer │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │Site │ │ │ │Site │ │ │ │Site │ │
│ │Runtm│ │ │ │Runtm│ │ │ │Runtm│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │gRPC │ │ │ │gRPC │ │ │ │gRPC │ │
│ │Srvr │ │ │ │Srvr │ │ │ │Srvr │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │S&F │ │ │ │S&F │ │ │ │S&F │ │
│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │ExtSys│ │ │ │ExtSys│ │ │ │ExtSys│ │
│ │Gatwy │ │ │ │Gatwy │ │ │ │Gatwy │ │
│ └─────┘ │ │ └─────┘ │ │ └─────┘ │
│ SQLite │ │ SQLite │ │ SQLite │
└─────────┘ └─────────┘ └─────────┘
│ │ │
OPC UA / OPC UA / OPC UA /
Custom Custom Custom
Protocol Protocol Protocol
```
### Site Runtime Actor Hierarchy
```
Deployment Manager Singleton (Cluster Singleton)
├── Instance Actor (one per deployed, enabled instance)
│ ├── Script Actor (coordinator, one per instance script)
│ │ └── Script Execution Actor (short-lived, per invocation)
│ ├── Alarm Actor (coordinator, one per alarm definition)
│ │ └── Alarm Execution Actor (short-lived, per on-trigger invocation)
│ └── ... (more Script/Alarm Actors)
├── Instance Actor
│ └── ...
└── ... (more Instance Actors)
Site-Wide Akka Stream (attribute + alarm state changes)
├── All Instance Actors publish to the stream
└── Debug view subscribes with instance-level filtering
```