10 Commits

Author SHA1 Message Date
Joseph Doherty adf1bd2693 build: drop orphaned AspNetCore.HealthChecks.UI.Client ref (UIResponseWriter removed) 2026-06-01 13:56:12 -04:00
Joseph Doherty bbff1d19b5 feat: adopt shared ZB.MOM.WW.Health probes; add /healthz; canonical writer 2026-06-01 13:46:49 -04:00
Joseph Doherty 2a7ff03718 feat: bridge ActorSystem into DI (transient) for shared health checks 2026-06-01 13:37:21 -04:00
Joseph Doherty 38e48299a4 build: reference ZB.MOM.WW.Health packages from the Gitea feed 2026-06-01 13:30:33 -04:00
Joseph Doherty 43228185b4 docs: convert standard diagrams from draw.io PNGs to inline Mermaid
Gitea renders mermaid inline, so the flow/state/hierarchy/DAG diagrams
move to text-in-markdown: auto-layout (removes the manual overlap-prone
draw.io step), diffable source, no committed binaries, and a dark-text
theme so labels stay legible. Keep draw.io PNGs only for the two complex
bespoke diagrams (logical architecture, env2 topology) where pixel
control still wins. All 24 mermaid blocks validated by rendering.
2026-06-01 00:23:00 -04:00
Joseph Doherty e3ca5ac0cf docs(spike): darken Mermaid label text for readability
Add explicit dark text color (per-class color + base theme override) to
the store-and-forward mermaid diagram so node/edge labels read clearly
regardless of gitea's page theme.
2026-06-01 00:08:08 -04:00
Joseph Doherty 4c5e7eb917 docs(spike): inline Mermaid for store-and-forward lifecycle
Swap the store-and-forward Message Lifecycle PNG embed for an inline
mermaid block to verify whether gitea renders mermaid in markdown. If it
does, the standard flow/state/hierarchy diagrams can move to inline
mermaid (text-only, auto-layout) instead of draw.io source + PNG.
2026-05-31 23:53:04 -04:00
Joseph Doherty bdee12f4e9 docs: render architecture & flow diagrams as draw.io charts
Replace ASCII-art diagrams across the README and docs/ with editable
.drawio sources plus exported PNGs, so the diagrams render clearly in
rendered markdown and can be maintained/regenerated instead of being
hand-edited as fragile text art. Non-diagram blocks (code, folder
trees, UI wireframes) were left as text.
2026-05-31 23:32:53 -04:00
Joseph Doherty 3763f6d2d8 docs: reframe README as the ScadaBridge implementation project
Retitle from 'SCADA System — Design Documentation' to ScadaBridge; the
overview now describes the repo as the full implementation (src/tests/docker
+ design docs as spec) rather than design docs only. Add Repository Layout
and Build/Test/Run sections. Component table + architecture diagrams unchanged.
2026-05-31 22:12:16 -04:00
Joseph Doherty 300841b205 chore: mark rename plan complete (all 7 tasks done) 2026-05-31 22:05:13 -04:00
34 changed files with 1389 additions and 768 deletions
+3 -1
@@ -15,7 +15,6 @@
 <PackageVersion Include="Akka.Streams" Version="1.5.62" />
 <PackageVersion Include="Akka.Streams.TestKit" Version="1.5.62" />
 <PackageVersion Include="Akka.TestKit.Xunit2" Version="1.5.62" />
-<PackageVersion Include="AspNetCore.HealthChecks.UI.Client" Version="9.0.0" />
 <PackageVersion Include="bunit" Version="2.0.33-preview" />
 <PackageVersion Include="coverlet.collector" Version="6.0.4" />
 <PackageVersion Include="FluentAssertions" Version="8.3.0" />
@@ -73,6 +72,9 @@
 to mark tests as Skipped (not silently Passed) when MSSQL is unreachable.
 -->
 <PackageVersion Include="Xunit.SkippableFact" Version="1.5.61" />
+<PackageVersion Include="ZB.MOM.WW.Health" Version="0.1.0" />
+<PackageVersion Include="ZB.MOM.WW.Health.Akka" Version="0.1.0" />
+<PackageVersion Include="ZB.MOM.WW.Health.EntityFrameworkCore" Version="0.1.0" />
 <PackageVersion Include="ZB.MOM.WW.MxGateway.Client" Version="0.1.0" />
 <PackageVersion Include="ZB.MOM.WW.MxGateway.Contracts" Version="0.1.0" />
 </ItemGroup>
+80 -96
@@ -1,8 +1,10 @@
# SCADA System — Design Documentation # ScadaBridge
ScadaBridge is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology.
## Overview ## Overview
This document serves as the master index for the SCADA system design. The system is a centrally-managed, distributed SCADA configuration and deployment platform built on Akka.NET, running across a central cluster and multiple site clusters in a hub-and-spoke topology. This repository is the full **implementation** project for ScadaBridge — the C#/.NET source (`src/`), tests (`tests/`), deployable Docker topology (`docker/`, `docker-env2/`, `infra/`), and the design documentation (`docs/`) that the code implements. This README is the master index: it links the per-component **design specs** (the spec the code in `src/` implements) and shows the system architecture. The solution file is `ZB.MOM.WW.ScadaBridge.slnx`.
### Technology Stack ### Technology Stack
@@ -24,6 +26,38 @@ This document serves as the master index for the SCADA system design. The system
- Central cluster: 2-node active/standby behind a load balancer. - Central cluster: 2-node active/standby behind a load balancer.
- Site clusters: 2-node active/standby, headless (no UI). - Site clusters: 2-node active/standby, headless (no UI).
## Repository Layout
| Path | Contents |
|------|----------|
| `src/` | C#/.NET implementation — one project per component (`ZB.MOM.WW.ScadaBridge.<Component>`). Solution: `ZB.MOM.WW.ScadaBridge.slnx`. |
| `tests/` | Unit and integration test projects. |
| `docs/` | Design documentation — `docs/requirements/` (high-level + per-component specs, the spec the code implements), `docs/test_infra/`, `docs/plans/`. |
| `docker/` | Primary 8-node cluster topology (2 central + 3 sites × 2 nodes + Traefik) + `deploy.sh`. |
| `docker-env2/` | Minimal second cluster (2 central + 1 site) for exercising Transport (#24) against a real second environment. |
| `infra/` | Local test services (MS SQL, LDAP, OPC UA, SMTP, REST API, Traefik). |
| `deploy/` | Production/on-host deployment artifacts (e.g. `wonder-app-vd03/`). |
| `AkkaDotNet/` | Akka.NET reference notes. |
## Build, Test & Run
```bash
# Build the solution
dotnet build ZB.MOM.WW.ScadaBridge.slnx
# Run the tests
dotnet test ZB.MOM.WW.ScadaBridge.slnx
# Bring up the primary local cluster (builds the scadabridge:latest image + recreates containers)
bash docker/deploy.sh # central load balancer at http://localhost:9000
# Drive the system from the CLI (reads ~/.scadabridge/config.json; test user has all roles)
dotnet run --project src/ZB.MOM.WW.ScadaBridge.CLI -- \
--username multi-role --password password template list
```
See [`docker/README.md`](docker/README.md) for ports and management commands, and [`src/ZB.MOM.WW.ScadaBridge.CLI/README.md`](src/ZB.MOM.WW.ScadaBridge.CLI/README.md) for the full CLI reference.
## Local Test Environments ## Local Test Environments
Two Docker-based cluster topologies are available for local development and testing: Two Docker-based cluster topologies are available for local development and testing:
@@ -76,102 +110,52 @@ Both stacks share the infrastructure services in [`infra/`](infra/) (MS SQL, LDA
### Architecture Diagram (Logical) ### Architecture Diagram (Logical)
``` ![Logical architecture](diagrams/architecture-logical.png)
Users (Blazor Server) <!-- source: diagrams/architecture-logical.drawio — edit, then re-export with export-drawio.sh -->
Load Balancer
┌────────────────────────┼────────────────────────────┐
│ CENTRAL CLUSTER │
│ (2-node active/standby) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Template │ │Deployment│ │ Central │ │
│ │ Engine │ │ Manager │ │ UI │ Blazor Svr │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Security │ │ Config │ │ Health │ │
│ │ & Auth │ │ DB │ │ Monitor │ │
│ │ (JWT/LDAP)│ │ (EF+IAud)│ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ │
│ │ Inbound │ ◄── External Systems (X-API-Key) │
│ │ API │ POST /api/{method}, JSON │
│ └──────────┘ │
│ ┌──────────┐ │
│ │ Mgmt │ ◄── CLI (ClusterClient) │
│ │ Service │ ManagementActor + Receptionist │
│ └──────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Ntf │ │ Site │ │ Audit │ Observ. / │
│ │ Outbox │ │ Call │ │ Log │ Audit area │
│ │ (#21) │ │ Audit │ │ (#23) │ │
│ │ │ │ (#22) │ │ │ │
│ └────▲─────┘ └────▲─────┘ └────▲─────┘ │
│ │ ingests │ ingests │ ingests │
│ │ (S&F) │ (telemetry)│ (telemetry + │
│ │ │ │ direct-write │
│ │ │ │ from Ntf Outbox │
│ │ │ │ & Inbound API) │
│ ┌───────────────────────────────────┐ │
│ │ Akka.NET Communication Layer │ │
│ │ ClusterClient: command/control │ │
│ │ gRPC Client: real-time streams │ │
│ │ (correlation IDs, per-pattern │ │
│ │ timeouts, message ordering) │ │
│ └──────────────┬────────────────────┘ │
│ ┌──────────────┴────────────────────┐ │
│ │ Configuration Database (EF) │──► MS SQL │
│ └───────────────────────────────────┘ (Config DB)│
│ │ Machine Data DB│
└─────────────────┼───────────────────────────────────┘
│ Akka.NET Remoting (command/control)
│ gRPC HTTP/2 (real-time data, port 8083)
┌────────────┼────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ SITE A │ │ SITE B │ │ SITE N │
│ (2-node)│ │ (2-node)│ │ (2-node)│
│ ┌─────┐ │ │ ┌─────┐ │ │ ┌─────┐ │
│ │Data │ │ │ │Data │ │ │ │Data │ │
│ │Conn │ │ │ │Conn │ │ │ │Conn │ │
│ │Layer │ │ │ │Layer │ │ │ │Layer │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │Site │ │ │ │Site │ │ │ │Site │ │
│ │Runtm│ │ │ │Runtm│ │ │ │Runtm│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │gRPC │ │ │ │gRPC │ │ │ │gRPC │ │
│ │Srvr │ │ │ │Srvr │ │ │ │Srvr │ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │S&F │ │ │ │S&F │ │ │ │S&F │ │
│ │Engine│ │ │ │Engine│ │ │ │Engine│ │
│ ├─────┤ │ │ ├─────┤ │ │ ├─────┤ │
│ │ExtSys│ │ │ │ExtSys│ │ │ │ExtSys│ │
│ │Gatwy │ │ │ │Gatwy │ │ │ │Gatwy │ │
│ └─────┘ │ │ └─────┘ │ │ └─────┘ │
│ SQLite │ │ SQLite │ │ SQLite │
└─────────┘ └─────────┘ └─────────┘
│ │ │
OPC UA / OPC UA / OPC UA /
Custom Custom Custom
Protocol Protocol Protocol
```
### Site Runtime Actor Hierarchy ### Site Runtime Actor Hierarchy
``` ```mermaid
Deployment Manager Singleton (Cluster Singleton) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
├── Instance Actor (one per deployed, enabled instance) flowchart TD
├── Script Actor (coordinator, one per instance script) DMS["Deployment Manager Singleton<br/>(Cluster Singleton)"]
│ │ └── Script Execution Actor (short-lived, per invocation) IA["Instance Actor<br/>(one per deployed, enabled instance)"]
│ ├── Alarm Actor (coordinator, one per alarm definition) IA2["Instance Actor<br/>( … )"]
│ │ └── Alarm Execution Actor (short-lived, per on-trigger invocation) MOREIA["… more Instance Actors"]
│ └── ... (more Script/Alarm Actors) DMS --> IA
├── Instance Actor DMS --> IA2
│ └── ... DMS -.-> MOREIA
└── ... (more Instance Actors)
Site-Wide Akka Stream (attribute + alarm state changes) SA["Script Actor<br/>(coordinator, one per instance script)"]
├── All Instance Actors publish to the stream AA["Alarm Actor<br/>(coordinator, one per alarm definition)"]
└── Debug view subscribes with instance-level filtering MORE1["… more Script /<br/>Alarm Actors"]
IA --> SA
IA --> AA
IA -.-> MORE1
SEA["Script Execution Actor<br/>(short-lived, per invocation)"]
AEA["Alarm Execution Actor<br/>(short-lived, per on-trigger invocation)"]
IA2C["… (Script / Alarm Actors)"]
SA --> SEA
AA --> AEA
IA2 -.-> IA2C
subgraph STREAM["Site-Wide Akka Stream"]
PUB["All Instance Actors"]
STR["Site-Wide Akka Stream<br/>(attribute + alarm state changes)"]
DBG["Debug view<br/>(instance-level filtering)"]
PUB -->|publish| STR
STR -->|subscribe filtered| DBG
end
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class DMS,STR alt
class IA,IA2,PUB proc
class SA,AA,DBG start
class SEA,AEA warn
class MOREIA,MORE1,IA2C muted
``` ```
+214
@@ -0,0 +1,214 @@
<mxfile host="app.diagrams.net">
<diagram id="arch-logical" name="Logical Architecture">
<mxGraphModel dx="1200" dy="900" grid="1" gridSize="10" guides="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1000" pageHeight="1200" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<!-- top: users + load balancer -->
<mxCell id="users" value="Users (Blazor Server)" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="430" y="20" width="180" height="40" as="geometry" />
</mxCell>
<mxCell id="lb" value="Load Balancer / Traefik" style="whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="430" y="92" width="180" height="40" as="geometry" />
</mxCell>
<mxCell id="e_users_lb" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;" edge="1" parent="1" source="users" target="lb">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_lb_central" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;" edge="1" parent="1" source="lb" target="central">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- central cluster container -->
<mxCell id="central" value="CENTRAL CLUSTER — 2-node active / standby" style="rounded=0;whiteSpace=wrap;html=1;verticalAlign=top;fontStyle=1;fontSize=14;fillColor=#eef3fb;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="40" y="160" width="740" height="490" as="geometry" />
</mxCell>
<mxCell id="te" value="Template Engine" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="70" y="206" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="dm" value="Deployment Manager" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="300" y="206" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="ui" value="Central UI (Blazor Server)" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="530" y="206" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="sec" value="Security &amp; Auth (JWT / LDAP)" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="70" y="270" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="cfg" value="Configuration DB (EF + IAudit)" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="300" y="270" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="health" value="Health Monitor" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="530" y="270" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="inapi" value="Inbound API" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="70" y="338" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="extsys" value="External Systems&#10;(X-API-Key)" style="shape=cloud;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="548" y="352" width="184" height="48" as="geometry" />
</mxCell>
<mxCell id="e_ext_in" value="POST /api/{method} · JSON" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;fontSize=10;" edge="1" parent="1" source="extsys" target="inapi">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="mgmt" value="Management Service" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;" vertex="1" parent="1">
<mxGeometry x="70" y="402" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="cli" value="CLI&#10;(ClusterClient)" style="shape=cloud;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;" vertex="1" parent="1">
<mxGeometry x="548" y="414" width="184" height="48" as="geometry" />
</mxCell>
<mxCell id="e_cli_mgmt" value="ManagementActor + Receptionist" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;fontSize=10;" edge="1" parent="1" source="cli" target="mgmt">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="obslabel" value="Observability / Audit" style="text;html=1;align=left;verticalAlign=middle;fontStyle=2;fontSize=11;" vertex="1" parent="1">
<mxGeometry x="70" y="456" width="300" height="18" as="geometry" />
</mxCell>
<mxCell id="ntf" value="Notification Outbox (#21)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" vertex="1" parent="1">
<mxGeometry x="70" y="478" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="sca" value="Site Call Audit (#22)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" vertex="1" parent="1">
<mxGeometry x="300" y="478" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="audit" value="Audit Log (#23)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" vertex="1" parent="1">
<mxGeometry x="530" y="478" width="200" height="44" as="geometry" />
</mxCell>
<mxCell id="comm" value="Akka.NET Communication Layer&#10;ClusterClient (command/control) · gRPC Client (real-time streams)" style="whiteSpace=wrap;html=1;fillColor=#e1d5e7;strokeColor=#9673a6;" vertex="1" parent="1">
<mxGeometry x="70" y="558" width="430" height="64" as="geometry" />
</mxCell>
<mxCell id="mssql" value="MS SQL&#10;Config DB · Machine Data DB" style="shape=cylinder3;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;" vertex="1" parent="1">
<mxGeometry x="580" y="556" width="200" height="70" as="geometry" />
</mxCell>
<mxCell id="e_cfg_sql" value="EF Core" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;dashed=1;fontSize=10;exitX=1;exitY=0.7;entryX=0;entryY=0.4;" edge="1" parent="1" source="cfg" target="mssql">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="515" y="301" />
<mxPoint x="515" y="560" />
</Array>
</mxGeometry>
</mxCell>
<!-- ingests edges -->
<mxCell id="e_ing_ntf" value="ingests (S&amp;F)" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;dashed=1;fontSize=9;" edge="1" parent="1" source="comm" target="ntf">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_ing_sca" value="ingests (telemetry)" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;dashed=1;fontSize=9;" edge="1" parent="1" source="comm" target="sca">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_ing_audit" value="ingests (telemetry + direct-write)" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;dashed=1;fontSize=9;exitX=0.95;exitY=0;entryX=0.1;entryY=1;" edge="1" parent="1" source="comm" target="audit">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- transport annotation between central and sites -->
<mxCell id="transport" value="Akka.NET Remoting (command/control) · gRPC HTTP/2 (real-time data, port 8083)" style="text;html=1;align=center;verticalAlign=middle;fontStyle=2;fontSize=11;" vertex="1" parent="1">
<mxGeometry x="120" y="662" width="580" height="28" as="geometry" />
</mxCell>
<!-- SITE A -->
<mxCell id="siteA" value="SITE A — 2-node" style="rounded=0;whiteSpace=wrap;html=1;verticalAlign=top;fontStyle=1;fillColor=#eafaf0;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="40" y="720" width="230" height="364" as="geometry" />
</mxCell>
<mxCell id="a_dcl" value="Data Connection Layer" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="60" y="758" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="a_rt" value="Site Runtime" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="60" y="806" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="a_grpc" value="gRPC Server" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="60" y="854" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="a_snf" value="Store-and-Forward Engine" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="60" y="902" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="a_esg" value="External System Gateway" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="60" y="950" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="a_sql" value="SQLite" style="shape=cylinder3;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;" vertex="1" parent="1">
<mxGeometry x="85" y="1002" width="140" height="64" as="geometry" />
</mxCell>
<!-- SITE B -->
<mxCell id="siteB" value="SITE B — 2-node" style="rounded=0;whiteSpace=wrap;html=1;verticalAlign=top;fontStyle=1;fillColor=#eafaf0;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="295" y="720" width="230" height="364" as="geometry" />
</mxCell>
<mxCell id="b_dcl" value="Data Connection Layer" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="315" y="758" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="b_rt" value="Site Runtime" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="315" y="806" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="b_grpc" value="gRPC Server" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="315" y="854" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="b_snf" value="Store-and-Forward Engine" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="315" y="902" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="b_esg" value="External System Gateway" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="315" y="950" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="b_sql" value="SQLite" style="shape=cylinder3;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;" vertex="1" parent="1">
<mxGeometry x="340" y="1002" width="140" height="64" as="geometry" />
</mxCell>
<!-- SITE N -->
<mxCell id="siteN" value="SITE N — 2-node" style="rounded=0;whiteSpace=wrap;html=1;verticalAlign=top;fontStyle=1;fillColor=#eafaf0;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="550" y="720" width="230" height="364" as="geometry" />
</mxCell>
<mxCell id="n_dcl" value="Data Connection Layer" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="570" y="758" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="n_rt" value="Site Runtime" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="570" y="806" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="n_grpc" value="gRPC Server" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="570" y="854" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="n_snf" value="Store-and-Forward Engine" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="570" y="902" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="n_esg" value="External System Gateway" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;" vertex="1" parent="1">
<mxGeometry x="570" y="950" width="190" height="40" as="geometry" />
</mxCell>
<mxCell id="n_sql" value="SQLite" style="shape=cylinder3;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;" vertex="1" parent="1">
<mxGeometry x="595" y="1002" width="140" height="64" as="geometry" />
</mxCell>
<!-- central -> sites transport edges -->
<mxCell id="e_c_a" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;strokeWidth=1.5;" edge="1" parent="1" source="comm" target="siteA">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_c_b" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;strokeWidth=1.5;" edge="1" parent="1" source="comm" target="siteB">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_c_n" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;strokeWidth=1.5;" edge="1" parent="1" source="comm" target="siteN">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- field protocol per site -->
<mxCell id="a_proto" value="OPC UA / Custom Protocol" style="whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;fontSize=11;" vertex="1" parent="1">
<mxGeometry x="60" y="1100" width="190" height="36" as="geometry" />
</mxCell>
<mxCell id="b_proto" value="OPC UA / Custom Protocol" style="whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;fontSize=11;" vertex="1" parent="1">
<mxGeometry x="315" y="1100" width="190" height="36" as="geometry" />
</mxCell>
<mxCell id="n_proto" value="OPC UA / Custom Protocol" style="whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;fontSize=11;" vertex="1" parent="1">
<mxGeometry x="570" y="1100" width="190" height="36" as="geometry" />
</mxCell>
<mxCell id="e_a_proto" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;" edge="1" parent="1" source="a_dcl" target="a_proto">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_b_proto" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;" edge="1" parent="1" source="b_dcl" target="b_proto">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="e_n_proto" style="edgeStyle=orthogonalEdgeStyle;html=1;endArrow=block;" edge="1" parent="1" source="n_dcl" target="n_proto">
<mxGeometry relative="1" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
Binary file not shown (size: 429 KiB).

+44 -17
@@ -6,23 +6,50 @@ ScadaBridge uses a hub-and-spoke architecture:
- **Central Cluster**: Two-node active/standby Akka.NET cluster for management, UI, and coordination. - **Central Cluster**: Two-node active/standby Akka.NET cluster for management, UI, and coordination.
- **Site Clusters**: Two-node active/standby Akka.NET clusters at each remote site for data collection and local processing. - **Site Clusters**: Two-node active/standby Akka.NET clusters at each remote site for data collection and local processing.
``` ```mermaid
┌──────────────────────────┐ %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ Central Cluster │ flowchart TD
│ ┌──────┐ ┌──────┐ │ USERS["Users<br/>(HTTPS / LB)"]
Users ──────────► │ │Node A│◄──►│Node B│ │
(HTTPS/LB) │ │Active│ │Stby │ │ subgraph CENTRAL["Central Cluster"]
│ └──┬───┘ └──┬───┘ │ NA["Node A<br/>Active"]
└─────┼───────────┼────────┘ NB["Node B<br/>Standby"]
│ │ NA <--> NB
┌───────────┼───────────┼───────────┐ end
│ │ │ │
┌─────▼─────┐ ┌──▼──────┐ ┌──▼──────┐ ┌──▼──────┐ USERS --> NA
│ Site 01 │ │ Site 02 │ │ Site 03 │ │ Site N │ CENTRAL --> SITE01
│ ┌──┐ ┌──┐ │ │ ┌──┐┌──┐│ │ ┌──┐┌──┐│ │ ┌──┐┌──┐│ CENTRAL --> SITE02
│ │A │ │B │ │ │ │A ││B ││ │ │A ││B ││ │ │A ││B ││ CENTRAL --> SITE03
│ └──┘ └──┘ │ │ └──┘└──┘│ │ └──┘└──┘│ │ └──┘└──┘│ CENTRAL --> SITEN
└───────────┘ └─────────┘ └─────────┘ └─────────┘
subgraph SITE01["Site 01"]
S01A["A<br/>Active"]
S01B["B<br/>Standby"]
end
subgraph SITE02["Site 02"]
S02A["A<br/>Active"]
S02B["B<br/>Standby"]
end
subgraph SITE03["Site 03"]
S03A["A<br/>Active"]
S03B["B<br/>Standby"]
end
subgraph SITEN["Site N"]
SNA["A<br/>Active"]
SNB["B<br/>Standby"]
end
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class USERS dec
class CENTRAL proc
class NA,S01A,S02A,S03A,SNA start
class NB,S01B,S02B,S03B,SNB muted
class SITE01,SITE02,SITE03,SITEN warn
``` ```
## Central Cluster Setup ## Central Cluster Setup
@@ -39,27 +39,42 @@ Both endpoints use the same `Protocol`. EF Core migration renames `Configuration
The `DataConnectionActor` Reconnecting state is extended: The `DataConnectionActor` Reconnecting state is extended:
``` ```mermaid
Connected %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ disconnect detected flowchart TD
C(["Connected"])
Push bad quality to all subscribers BQ["Push bad quality<br/>to all subscribers"]
RT["Retry active endpoint<br/>(5s interval)"]
INC["_consecutiveFailures++"]
Retry active endpoint (5s interval) BR{"Evaluate<br/>_consecutiveFailures"}
│ failure SAME["Retry same endpoint"]
FO["Failover<br/>- dispose adapter, switch _activeEndpoint, reset counter<br/>- create fresh adapter with other config<br/>- attempt connect"]
_consecutiveFailures++ NB["Keep retrying indefinitely<br/>(current behavior)"]
RC(["On successful reconnect (either endpoint)<br/>1. Reset _consecutiveFailures = 0<br/>2. ReSubscribeAll() — re-create subscriptions on new adapter<br/>3. Transition to Connected<br/>4. Log failover event if endpoint changed<br/>5. Report active endpoint in health metrics"])
├─ < FailoverRetryCount → retry same endpoint
C -->|"disconnect detected"| BQ
├─ ≥ FailoverRetryCount AND backup exists BQ --> RT
│ → dispose adapter, switch _activeEndpoint, reset counter RT -->|"failure"| INC
│ → create fresh adapter with other config INC --> BR
│ → attempt connect BR -->|"&lt; FailoverRetryCount"| SAME
SAME -.->|"retry"| RT
└─ ≥ FailoverRetryCount AND no backup BR -->|"&gt;= FailoverRetryCount AND backup exists"| FO
→ keep retrying indefinitely (current behavior) BR -->|"&gt;= FailoverRetryCount AND no backup"| NB
NB -.->|"retry (round-robin n/a)"| RT
FO -->|"connect succeeds"| RC
FO -.->|"connect fails (round-robin: primary to backup to primary...)"| RT
RC -->|"Transition to Connected"| C
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
class C,RC start
class BQ,RT,SAME proc
class INC,BR dec
class FO warn
class NB bad
``` ```
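
The retry-versus-failover branch in the diagram above can be sketched as a pure function. This is a minimal illustration, not the actual `DataConnectionActor` code: `ReconnectAction`, `FailoverDecision`, and `Decide` are hypothetical names introduced here, and only the `_consecutiveFailures` counter and the `FailoverRetryCount` threshold come from the design.

```csharp
// Hypothetical names for illustration; only the counter and threshold
// semantics are taken from the state diagram.
public enum ReconnectAction
{
    RetrySameEndpoint, // counter is still below the threshold
    FailoverToOther,   // threshold reached and a backup endpoint exists
    KeepRetrying       // threshold reached but no backup is configured
}

public static class FailoverDecision
{
    // consecutiveFailures is the counter value after it has been
    // incremented for the most recent failed attempt.
    public static ReconnectAction Decide(
        int consecutiveFailures, int failoverRetryCount, bool hasBackup)
    {
        if (consecutiveFailures < failoverRetryCount)
            return ReconnectAction.RetrySameEndpoint;

        return hasBackup
            ? ReconnectAction.FailoverToOther
            : ReconnectAction.KeepRetrying;
    }
}
```

With a threshold of 3 and a backup configured, the first two failures retry the active endpoint and the third selects failover. Resetting the counter and swapping the adapter stay with the caller, as the diagram shows.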
**On successful reconnect (either endpoint):** **On successful reconnect (either endpoint):**
@@ -32,33 +32,45 @@ We want a strongly-typed model for OPC UA endpoint configuration, a validator th
## Architecture ## Architecture
``` ```mermaid
┌──────────────────────────────────────┐ %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ ZB.MOM.WW.ScadaBridge.Commons │ flowchart TD
│ Types/DataConnections/ │ subgraph COMMONS["ZB.MOM.WW.ScadaBridge.Commons"]
OpcUaEndpointConfig.cs (POCO) TYPES["Types/DataConnections/<br/>OpcUaEndpointConfig.cs (POCO)<br/>OpcUaHeartbeatConfig.cs (POCO)<br/>OpcUaSecurityMode.cs (enum)"]
│ OpcUaHeartbeatConfig.cs (POCO) │ VALID["Validators/<br/>OpcUaEndpointConfigValidator.cs"]
│ OpcUaSecurityMode.cs (enum) │ SER["Serialization/<br/>OpcUaEndpointConfigSerializer.cs"]
│ Validators/ │ TYPES ~~~ VALID ~~~ SER
│ OpcUaEndpointConfigValidator.cs │ end
│ Serialization/ │
│ OpcUaEndpointConfigSerializer.cs │ subgraph CENTRALUI["ZB.MOM.WW.ScadaBridge.CentralUI"]
└──────────────────────────────────────┘ CUIFORMS["Components/Forms/<br/>OpcUaEndpointEditor.razor (shared)"]
CUIPAGES["Pages/Admin/<br/>DataConnectionForm.razor"]
│ (referenced by both) CUIFORMS ~~~ CUIPAGES
┌───────┴────────────────────────┐ end
▼ ▼
┌──────────────────────────┐ ┌────────────────────────────┐ subgraph SITERUNTIME["ZB.MOM.WW.ScadaBridge.SiteRuntime"]
│ ZB.MOM.WW.ScadaBridge.CentralUI │ │ ZB.MOM.WW.ScadaBridge.SiteRuntime │ SRACTORS["Actors/<br/>DeploymentManagerActor<br/>(passes raw JSON to DataConnectionFactory)"]
│ Components/Forms/ │ │ Actors/ │ SRDC["DataConnections.OpcUa/<br/>OpcUaDataConnection.cs<br/>(consumes typed model)"]
│ OpcUaEndpointEditor │ │ DeploymentManagerActor │ SRACTORS ~~~ SRDC
│ .razor (shared) │ │ (passes raw JSON to │ end
│ │ │ DataConnectionFactory)│
│ Pages/Admin/ │ │ │ COMMONS -->|referenced by| CENTRALUI
│ DataConnectionForm │ │ DataConnections.OpcUa/ │ COMMONS -->|referenced by| SITERUNTIME
│ .razor │ │ OpcUaDataConnection.cs │
└──────────────────────────┘ │ (consumes typed model) │ NOTE["Both sides deserialize DataConnection.PrimaryConfiguration / BackupConfiguration<br/>into the same OpcUaEndpointConfig instance. The DB column type does not change."]
└────────────────────────────┘ CENTRALUI -.- NOTE
SITERUNTIME -.- NOTE
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class COMMONS dec
class TYPES,VALID,SER warn
class CENTRALUI,CUIFORMS,CUIPAGES proc
class SITERUNTIME,SRACTORS,SRDC start
class NOTE muted
``` ```
Both sides deserialize from `DataConnection.PrimaryConfiguration` / `BackupConfiguration` strings into the same `OpcUaEndpointConfig` instance. The DB column type does not change. Both sides deserialize from `DataConnection.PrimaryConfiguration` / `BackupConfiguration` strings into the same `OpcUaEndpointConfig` instance. The DB column type does not change.
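As a rough illustration of one typed model shared by both consumers of the stored string, the sketch below round-trips a config object through JSON. The field names (`endpoint_url`, `security_mode`) and the enum members are assumptions made for the example; the source only names the types, not their members.

```python
import json
from dataclasses import dataclass
from enum import Enum


class OpcUaSecurityMode(str, Enum):
    # Member names are assumed; the source only states that the enum exists.
    NONE = "None"
    SIGN = "Sign"
    SIGN_AND_ENCRYPT = "SignAndEncrypt"


@dataclass(frozen=True)
class OpcUaEndpointConfig:
    endpoint_url: str                                          # hypothetical field
    security_mode: OpcUaSecurityMode = OpcUaSecurityMode.NONE  # hypothetical field


def serialize(config: OpcUaEndpointConfig) -> str:
    """Produce the string that would live in the configuration column."""
    return json.dumps({
        "endpoint_url": config.endpoint_url,
        "security_mode": config.security_mode.value,
    })


def deserialize(raw: str) -> OpcUaEndpointConfig:
    """Both the UI side and the runtime side would call this same function."""
    data = json.loads(raw)
    return OpcUaEndpointConfig(
        endpoint_url=data["endpoint_url"],
        security_mode=OpcUaSecurityMode(data["security_mode"]),
    )
```

Because the stored value stays a plain string, the column type is unaffected by the change; only the code on either side of it becomes typed.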
@@ -17,29 +17,8 @@ A sibling `docker-env2/` directory with `deploy.sh` / `teardown.sh` / `seed-site
## Architecture Overview ## Architecture Overview
``` ![env2-architecture-overview](diagrams/env2-architecture-overview.png)
(host machine) <!-- source: diagrams/env2-architecture-overview.drawio — edit, then re-export with export-drawio.sh -->
Primary stack (already existing — unchanged) Env2 stack (new)
┌────────────────────────────────────┐ ┌──────────────────────────────┐
│ Traefik :9000 ◄── 9001/9002 UI │ │ Traefik :9100 ◄── 9101/9102 UI│
│ Central A/B (9011/9012 Akka) │ │ Central A/B (9111/9112 Akka) │
│ Site-A/B/C (9021..9044) │ │ Site-X (9121/9122 Akka, │
└─────────────┬──────────────────────┘ │ 9123/9124 gRPC) │
│ └──────────┬───────────────────┘
│ │
▼ scadabridge-net (shared bridge network) ◄──────┘
┌──────────────────────────────────────────────────────────────┐
│ scadabridge-mssql ScadaBridgeConfig (primary DB) │
│ ScadaBridgeMachineData (primary DB) │
│ ScadaBridgeConfig2 (env2 DB) ← new │
│ ScadaBridgeMachineData2(env2 DB) ← new │
│ scadabridge-ldap (shared — same test users) │
│ scadabridge-smtp (shared Mailpit) │
│ scadabridge-opcua (shared) │
│ scadabridge-restapi (shared) │
└──────────────────────────────────────────────────────────────┘
```
Both stacks attach to the same `scadabridge-net` Docker bridge so env2's app containers can reach the infra services by container hostname (`scadabridge-mssql`, `scadabridge-ldap`, etc.). Akka clusters are independent — each side's `SeedNodes` lists only its own central nodes, so they never gossip-merge despite sharing the network. Both stacks attach to the same `scadabridge-net` Docker bridge so env2's app containers can reach the infra services by container hostname (`scadabridge-mssql`, `scadabridge-ldap`, etc.). Akka clusters are independent — each side's `SeedNodes` lists only its own central nodes, so they never gossip-merge despite sharing the network.
+37 -12
@@ -14,18 +14,43 @@
## Task Dependency Graph ## Task Dependency Graph
``` ```mermaid
T0 ─┐ ┐ %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
T1 ─┤ (all independent, all │ flowchart LR
T2 ─┤ parallelizable, all ├─► T10 (manual smoke test) GRP["all independent, all parallelizable, all ready from the start"]
T3 ─┤ ready from the start) │ T0["T0"]
T4 ─┤ │ T1["T1"]
T6 ─┤ │ T2["T2"]
T7 ─┤ │ T3["T3"]
T8 ─┤ │ T4["T4"]
T9 ─┘ │ T6["T6"]
T7["T7"]
T0,T4 ──► T5 (lifecycle scripts) ─────────┘ T8["T8"]
T9["T9"]
T5["T5<br/>lifecycle scripts"]
T10(["T10<br/>manual smoke test"])
NOTE["T10 is the only task that requires all of T0-T9 done. Everything else runs in parallel."]
T0 --> T10
T1 --> T10
T2 --> T10
T3 --> T10
T6 --> T10
T7 --> T10
T8 --> T10
T9 --> T10
T0 --> T5
T4 --> T5
T5 --> T10
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class T0,T1,T2,T3,T4,T6,T7,T8,T9 proc
class T5 warn
class T10 start
class GRP,NOTE muted
``` ```
T10 is the only task that requires all of T0-T9 done. Everything else can run in parallel. T10 is the only task that requires all of T0-T9 done. Everything else can run in parallel.
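The dependency graph can also be expressed as data and grouped into parallel waves. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the edge set is read off the diagram, and the `execution_waves` helper is a hypothetical name introduced for the example.

```python
from graphlib import TopologicalSorter

# Tasks with no prerequisites, as listed in the diagram.
INDEPENDENT = ["T0", "T1", "T2", "T3", "T4", "T6", "T7", "T8", "T9"]

# Map each task to the set of tasks it depends on.
DEPENDS_ON = {
    "T5": {"T0", "T4"},                 # lifecycle scripts
    "T10": set(INDEPENDENT) | {"T5"},   # manual smoke test needs everything
}


def execution_waves(depends_on):
    """Group tasks into waves; tasks within a wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Calling `execution_waves(DEPENDS_ON)` yields three waves: the nine independent tasks, then T5, then T10.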
@@ -18,26 +18,32 @@
## Section 1 — Architecture ## Section 1 — Architecture
``` ```mermaid
[Blazor Server browser] %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ SignalR flowchart TD
N1["Blazor Server browser"]
[CentralUI: InstanceConfigure.razor] N2["CentralUI: InstanceConfigure.razor"]
│ opens N3["CentralUI: OpcUaBrowserDialog component"]
N4["CentralUI: IOpcUaBrowseService"]
[CentralUI: <OpcUaBrowserDialog/>] N5["CommunicationService.SendCommandToSiteAsync of BrowseOpcUaNodeResult (siteId, BrowseOpcUaNodeCommand)"]
│ uses N6["Site: CentralCommunicationActor → DataConnectionManagerActor"]
N7["OPC UA server"]
[CentralUI: IOpcUaBrowseService] ── implementation calls
N1 -->|SignalR| N2
N2 -->|opens| N3
[CommunicationService.SendCommandToSiteAsync<BrowseOpcUaNodeResult>(siteId, BrowseOpcUaNodeCommand)] N3 -->|uses| N4
│ ClusterClient Ask, ManagementEnvelope { User, Command, CorrelationId } N4 -->|implementation calls| N5
N5 -->|"ClusterClient Ask<br/>ManagementEnvelope { User, Command, CorrelationId }"| N6
[Site: CentralCommunicationActor → DataConnectionManagerActor] N6 -->|"dispatches to IBrowsableDataConnection (RealOpcUaClient)<br/>OPC Foundation .NET SDK Browse service"| N7
│ dispatches to IBrowsableDataConnection (RealOpcUaClient)
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
[OPC UA server] ◄── OPC Foundation .NET SDK Browse service classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
class N1,N2,N3,N4 proc
class N5 alt
class N6 start
class N7 warn
``` ```
Three slices, top-to-bottom: Three slices, top-to-bottom:
@@ -164,24 +170,47 @@ Returning failure inside `BrowseOpcUaNodeResult` (rather than exceptions across
**Wire flow.** **Wire flow.**
``` ```mermaid
CentralUI.OpcUaBrowseService.BrowseChildrenAsync(siteId, connId, parent) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
→ CommunicationService.SendCommandToSiteAsync<BrowseOpcUaNodeResult>( flowchart TD
siteId, S1["CentralUI.OpcUaBrowseService.BrowseChildrenAsync(siteId, connId, parent)"]
new BrowseOpcUaNodeCommand(connId, parent)) S2["CommunicationService.SendCommandToSiteAsync of BrowseOpcUaNodeResult (siteId, new BrowseOpcUaNodeCommand(connId, parent))"]
ManagementEnvelope { User, Command, CorrelationId } over ClusterClient S3["ManagementEnvelope { User, Command, CorrelationId }<br/>over ClusterClient"]
Site: CentralCommunicationActor unwraps envelope S4["Site: CentralCommunicationActor unwraps envelope"]
Site: DataConnectionManagerActor receives BrowseOpcUaNodeCommand S5["Site: DataConnectionManagerActor receives BrowseOpcUaNodeCommand<br/>(DCL coordinator actor — owns the per-connection IDataConnection instances)"]
- Look up IDataConnection by Id
- if not found → ConnectionNotFound S1 --> S2 --> S3 --> S4 --> S5
- if !(conn is IBrowsableDataConnection) → NotBrowsable
- else await conn.BrowseChildrenAsync(ParentNodeId, ct) subgraph HANDLER["Handler logic"]
- Catch ConnectionNotConnectedException → ConnectionNotConnected direction TB
- Catch OperationCanceledException → Timeout HL["Look up IDataConnection by Id"]
- Catch ServiceResultException → ServerError + verbatim msg HNF["if not found → ConnectionNotFound"]
- Else success: BrowseOpcUaNodeResult(children, truncated, null) HNB["if not (conn is IBrowsableDataConnection) → NotBrowsable"]
→ Reply travels back via CentralCommunicationActor → CommunicationService HAW["else await conn.BrowseChildrenAsync(ParentNodeId, ct)"]
→ returned to CentralUI page HNC["Catch ConnectionNotConnectedException → ConnectionNotConnected"]
HCN["Catch OperationCanceledException → Timeout"]
HSVC["Catch ServiceResultException → ServerError + verbatim msg"]
HSUC["Else success: BrowseOpcUaNodeResult(children, truncated, null)"]
HL --- HNF --- HNB --- HAW --- HNC --- HCN --- HSVC --- HSUC
end
S5 -->|processes| HANDLER
R1["Reply travels back via<br/>CentralCommunicationActor → CommunicationService"]
R2["returned to CentralUI page"]
HANDLER -->|result / failure| R1
R1 --> R2
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
class S1,R1,R2 proc
class S2,S3 alt
class S4,S5,HSUC start
class HANDLER,HL,HAW dec
class HNF,HNB,HNC,HCN,HSVC bad
``` ```
Handler lives in the **DCL coordinator actor** (the same actor that owns the per-connection `IDataConnection` instances) — keeps lifecycle and browse co-located so we don't race against reconnect. Handler lives in the **DCL coordinator actor** (the same actor that owns the per-connection `IDataConnection` instances) — keeps lifecycle and browse co-located so we don't race against reconnect.
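The handler logic in the wire flow maps every failure onto a result value rather than letting an exception cross the wire. A minimal sketch of that mapping follows, with stand-in types: `BrowsableConnection`, `ConnectionNotConnectedError`, `ServiceResultError` and `BrowseResult` are hypothetical Python analogues of the C# names in the diagram, and `asyncio.TimeoutError` stands in for `OperationCanceledException`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


class ConnectionNotConnectedError(Exception):
    """Stand-in for ConnectionNotConnectedException."""


class ServiceResultError(Exception):
    """Stand-in for the OPC UA SDK's ServiceResultException."""


@dataclass
class BrowseResult:
    children: list = field(default_factory=list)
    truncated: bool = False
    failure: Optional[str] = None   # None means success
    message: Optional[str] = None


class BrowsableConnection:
    """Stand-in for IBrowsableDataConnection; returns or raises a canned outcome."""

    def __init__(self, outcome):
        self._outcome = outcome

    async def browse_children(self, parent_node_id):
        if isinstance(self._outcome, Exception):
            raise self._outcome
        return self._outcome, False   # (children, truncated)


async def handle_browse(connections, conn_id, parent_node_id) -> BrowseResult:
    conn = connections.get(conn_id)
    if conn is None:
        return BrowseResult(failure="ConnectionNotFound")
    if not isinstance(conn, BrowsableConnection):
        return BrowseResult(failure="NotBrowsable")
    try:
        children, truncated = await conn.browse_children(parent_node_id)
    except ConnectionNotConnectedError:
        return BrowseResult(failure="ConnectionNotConnected")
    except asyncio.TimeoutError:
        return BrowseResult(failure="Timeout")
    except ServiceResultError as ex:
        # Pass the server's message through verbatim.
        return BrowseResult(failure="ServerError", message=str(ex))
    return BrowseResult(children=children, truncated=truncated)
```

The caller only ever inspects `failure`, so the UI can render each case without a try/except around the remote call.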
+80 -16
@@ -16,22 +16,86 @@
## Task dependency overview ## Task dependency overview
``` ```mermaid
T1 ─┬─ T2 ─┬─ T17 (computed AlarmActor enrich) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ ├─ T18 (proto) ── T19 (grpc mapping) ── T23 (DebugView) flowchart LR
T3 ─┼─ T10 (DCL actor) T1["T1"]
├─ T11 (OPC UA adapter) T3["T3"]
└─ T12 (MxGateway adapter) T2["T2"]
T4 ─┬─ T5 ── T6 ── T21 (mgmt handlers) T10["T10<br/>DCL actor"]
├─ T7 (migration) T11["T11<br/>OPC UA adapter"]
├─ T8 ── T9 (validation) T12["T12<br/>MxGateway adapter"]
└─ T20 ─┬─ T21 ── T26 (seed) T17["T17<br/>computed AlarmActor enrich"]
├─ T22 (CLI) T18["T18<br/>proto"]
├─ T24 (template UI) T19["T19<br/>grpc mapping"]
└─ T25 (instance UI) T23["T23<br/>DebugView"]
T13, T14 ──┐
T1,T2,T3,T4(Resolved),T13,T14 ── T15 (NativeAlarmActor) ── T16 (InstanceActor wiring) T4["T4"]
(everything) ── T27 (docs) , T28 (integration/manual verify) T5["T5"]
T6["T6"]
T7["T7<br/>migration"]
T8["T8"]
T9["T9<br/>validation"]
T20["T20"]
T21["T21<br/>mgmt handlers"]
T26["T26<br/>seed"]
T22["T22<br/>CLI"]
T24["T24<br/>template UI"]
T25["T25<br/>instance UI"]
T13["T13"]
T14["T14"]
T15["T15<br/>NativeAlarmActor"]
T16["T16<br/>InstanceActor wiring"]
T15IN["inputs to T15:<br/>T1, T2, T3, T4 (Resolved), T13, T14"]
T27["T27<br/>docs"]
T28["T28<br/>integration / manual verify"]
EVT["(everything) emits to T27 and T28"]
T1 --> T2
T1 --> T10
T1 --> T11
T1 --> T12
T3 --> T2
T3 --> T10
T3 --> T11
T3 --> T12
T2 --> T17
T2 --> T18
T18 --> T19
T19 --> T23
T4 --> T5
T4 --> T7
T4 --> T8
T4 --> T20
T5 --> T6
T6 --> T21
T8 --> T9
T20 --> T21
T20 --> T22
T20 --> T24
T20 --> T25
T21 --> T26
T13 --> T15
T14 --> T15
T15 --> T16
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class T1,T2,T3,T10,T11,T12 proc
class T17,T18,T19,T23 alt
class T4,T5,T6,T7,T8,T9,T20,T21,T22,T24,T25,T26 start
class T13,T14 dec
class T15,T16 warn
class T27,T28,T15IN,EVT muted
``` ```
--- ---
@@ -6,8 +6,8 @@
{"id": 2, "subject": "Task 2: Run scrub + completeness gate", "status": "completed", "blockedBy": [1]}, {"id": 2, "subject": "Task 2: Run scrub + completeness gate", "status": "completed", "blockedBy": [1]},
{"id": 3, "subject": "Task 3: Build, run unit tests, fix stragglers", "status": "completed", "blockedBy": [2]}, {"id": 3, "subject": "Task 3: Build, run unit tests, fix stragglers", "status": "completed", "blockedBy": [2]},
{"id": 4, "subject": "Task 4: Commit the scrub", "status": "completed", "blockedBy": [3]}, {"id": 4, "subject": "Task 4: Commit the scrub", "status": "completed", "blockedBy": [3]},
{"id": 5, "subject": "Task 5: Update local Git remote after Gitea web-UI rename", "status": "pending", "blockedBy": [4]}, {"id": 5, "subject": "Task 5: Update local Git remote after Gitea web-UI rename", "status": "completed", "blockedBy": [4]},
{"id": 6, "subject": "Task 6: Runtime cutover redeploy (conditional)", "status": "pending", "blockedBy": [4]} {"id": 6, "subject": "Task 6: Runtime cutover redeploy (conditional)", "status": "completed", "blockedBy": [4]}
], ],
"lastUpdated": "2026-06-01T01:59:34Z" "lastUpdated": "2026-06-01T02:05:03Z"
} }
@@ -0,0 +1,96 @@
<mxfile host="app.diagrams.net">
<diagram id="env2arch" name="Env2 Architecture">
<mxGraphModel dx="1400" dy="900" grid="1" gridSize="10" guides="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1100" pageHeight="900" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<!-- Host machine container -->
<mxCell id="host" value="host machine" style="rounded=0;whiteSpace=wrap;html=1;fillColor=none;strokeColor=#666666;verticalAlign=top;fontStyle=2;fontColor=#666666;dashed=1;" vertex="1" parent="1">
<mxGeometry x="40" y="40" width="1020" height="800" as="geometry" />
</mxCell>
<!-- Primary stack -->
<mxCell id="primary" value="Primary stack&#10;(already existing — unchanged)" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;verticalAlign=top;fontStyle=1;align=center;spacingTop=6;" vertex="1" parent="1">
<mxGeometry x="80" y="100" width="420" height="220" as="geometry" />
</mxCell>
<mxCell id="p-traefik" value="Traefik :9000 ◄── 9001/9002 UI" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#6c8ebf;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="100" y="160" width="380" height="40" as="geometry" />
</mxCell>
<mxCell id="p-central" value="Central A/B (9011/9012 Akka)" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#6c8ebf;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="100" y="210" width="380" height="40" as="geometry" />
</mxCell>
<mxCell id="p-site" value="Site-A/B/C (9021..9044)" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#6c8ebf;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="100" y="260" width="380" height="40" as="geometry" />
</mxCell>
<!-- Env2 stack -->
<mxCell id="env2" value="Env2 stack&#10;(new)" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;verticalAlign=top;fontStyle=1;align=center;spacingTop=6;" vertex="1" parent="1">
<mxGeometry x="600" y="100" width="420" height="220" as="geometry" />
</mxCell>
<mxCell id="e-traefik" value="Traefik :9100 ◄── 9101/9102 UI" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#82b366;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="620" y="160" width="380" height="40" as="geometry" />
</mxCell>
<mxCell id="e-central" value="Central A/B (9111/9112 Akka)" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#82b366;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="620" y="210" width="380" height="40" as="geometry" />
</mxCell>
<mxCell id="e-site" value="Site-X (9121/9122 Akka, 9123/9124 gRPC)" style="whiteSpace=wrap;html=1;fillColor=#ffffff;strokeColor=#82b366;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="620" y="260" width="380" height="40" as="geometry" />
</mxCell>
<!-- Shared network bar -->
<mxCell id="net" value="scadabridge-net (shared bridge network)" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#fff2cc;strokeColor=#d6b656;fontStyle=1;" vertex="1" parent="1">
<mxGeometry x="80" y="400" width="940" height="50" as="geometry" />
</mxCell>
<!-- Infra container -->
<mxCell id="infra" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;verticalAlign=top;" vertex="1" parent="1">
<mxGeometry x="120" y="500" width="860" height="300" as="geometry" />
</mxCell>
<!-- MSSQL block -->
<mxCell id="mssql" value="scadabridge-mssql" style="shape=cylinder3;whiteSpace=wrap;html=1;fillColor=#e1d5e7;strokeColor=#9673a6;verticalAlign=top;fontStyle=1;spacingTop=4;" vertex="1" parent="1">
<mxGeometry x="150" y="530" width="160" height="240" as="geometry" />
</mxCell>
<mxCell id="db1" value="ScadaBridgeConfig (primary DB)" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="340" y="540" width="300" height="40" as="geometry" />
</mxCell>
<mxCell id="db2" value="ScadaBridgeMachineData (primary DB)" style="whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="340" y="590" width="300" height="40" as="geometry" />
</mxCell>
<mxCell id="db3" value="ScadaBridgeConfig2 (env2 DB) ← new" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="340" y="640" width="300" height="40" as="geometry" />
</mxCell>
<mxCell id="db4" value="ScadaBridgeMachineData2 (env2 DB) ← new" style="whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="340" y="690" width="300" height="40" as="geometry" />
</mxCell>
<!-- Shared commodity infra services -->
<mxCell id="ldap" value="scadabridge-ldap&#10;(shared — same test users)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="680" y="530" width="280" height="40" as="geometry" />
</mxCell>
<mxCell id="smtp" value="scadabridge-smtp&#10;(shared Mailpit)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="680" y="580" width="280" height="40" as="geometry" />
</mxCell>
<mxCell id="opcua" value="scadabridge-opcua (shared)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="680" y="630" width="280" height="40" as="geometry" />
</mxCell>
<mxCell id="restapi" value="scadabridge-restapi (shared)" style="whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;align=left;spacingLeft=8;" vertex="1" parent="1">
<mxGeometry x="680" y="680" width="280" height="40" as="geometry" />
</mxCell>
<!-- Edges: primary -> net, env2 -> net -->
<mxCell id="ep" style="edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;endArrow=block;strokeColor=#6c8ebf;" edge="1" parent="1" source="primary" target="net">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="ee" style="edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;endArrow=block;strokeColor=#82b366;" edge="1" parent="1" source="env2" target="net">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<!-- net -> infra -->
<mxCell id="eni" style="edgeStyle=orthogonalEdgeStyle;rounded=0;html=1;endArrow=block;strokeColor=#d6b656;" edge="1" parent="1" source="net" target="infra">
<mxGeometry relative="1" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
Binary file not shown.


+20 -20
@@ -547,26 +547,26 @@ This section governs how implementation plans are executed. The goal is autonomo
For each work package, follow this sequence: For each work package, follow this sequence:
``` ```mermaid
┌─────────────────────────────────────────────────────┐ %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ 1. READ the WP description and acceptance criteria │ flowchart TD
│ 2. READ all traced requirements (HLR bullets, KDD, │ S1["1. READ the WP description and acceptance criteria"]
CD constraints) to understand intent S2["2. READ all traced requirements (HLR bullets, KDD, CD constraints) to understand intent"]
3. IMPLEMENT the WP S3["3. IMPLEMENT the WP<br/>• Write code<br/>• Write unit tests for acceptance criteria<br/>• Write negative tests for prohibition criteria"]
│ - Write code │ S4["4. VERIFY acceptance criteria<br/>• Run tests: all must pass<br/>• Walk each acceptance criterion line by line<br/>• If a criterion cannot be verified yet (depends on a later WP), note it as deferred to WP-N"]
│ - Write unit tests for acceptance criteria │ S5["5. UPDATE the phase execution checklist<br/>• Mark WP as complete with date<br/>• Note any deferred criteria<br/>• Note any questions logged"]
│ - Write negative tests for prohibition criteria │ S6["6. COMMIT with message:<br/>Phase N WP-M: summary"]
│ 4. VERIFY acceptance criteria │
│ - Run tests: all must pass │ S1 --> S2 --> S3 --> S4 --> S5 --> S6
│ - Walk each acceptance criterion line by line │
│ - If a criterion cannot be verified yet (depends │ classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
│ on a later WP), note it as "deferred to WP-N" │ classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
│ 5. UPDATE the phase execution checklist │ classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
│ - Mark WP as complete with date │ classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
│ - Note any deferred criteria │ class S1,S2 proc
│ - Note any questions logged │ class S3 start
│ 6. COMMIT with message: "Phase N WP-M: <summary>" │ class S4,S5 dec
└─────────────────────────────────────────────────────┘ class S6 warn
``` ```
### Mid-Phase Compliance Check ### Mid-Phase Compliance Check
+73 -56
@@ -23,28 +23,38 @@ gRPC server-streaming is an established pattern for real-time tag value updates;
## Architecture ## Architecture
``` ```mermaid
Central Cluster Site Cluster %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
───────────── ──────────── flowchart TD
subgraph CENTRAL["Central Cluster"]
BT["DebugStreamBridgeActor"]
GC["SiteStreamGrpcClient<br/>(per-site, on central)"]
BB["DebugStreamBridgeActor"]
SR(["SignalR Hub / Blazor UI"])
end
DebugStreamBridgeActor InstanceActor subgraph SITE["Site Cluster"]
│ │ IN["InstanceActor"]
│── SubscribeDebugView ──► │ (ClusterClient: command/control) PB{"publishes<br/>AttributeValueChanged<br/>AlarmStateChanged"}
│◄── DebugViewSnapshot ── │ GS["SiteStreamGrpcServer<br/>(Kestrel, on site)"]
│ │ end
│ │ publishes AttributeValueChanged
│ │ publishes AlarmStateChanged BT -.->|"SubscribeDebugView"| IN
│ ▼ IN -.->|"DebugViewSnapshot"| BT
SiteStreamGrpcClient ◄──── gRPC stream ───── SiteStreamGrpcServer IN --> PB
(per-site, on central) (HTTP/2) (Kestrel, on site) PB --> GS
│ │ GS -->|"gRPC stream (HTTP/2)"| GC
│ reads from gRPC stream │ receives from SiteStreamManager GC --> BB
│ routes by correlationId │ filters by instance name BB --> SR
▼ │
DebugStreamBridgeActor │ classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
│ │ classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
▼ │ classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
SignalR Hub / Blazor UI │ classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
class BT,GC,BB proc
class SR start
class IN,GS warn
class PB dec
``` ```
**Key separation**: ClusterClient handles subscribe/unsubscribe/snapshot (request-response). gRPC handles the ongoing value stream (server-streaming). **Key separation**: ClusterClient handles subscribe/unsubscribe/snapshot (request-response). gRPC handles the ongoing value stream (server-streaming).
@@ -271,16 +281,22 @@ public override async Task SubscribeInstance(
`IServerStreamWriter<T>` is **not thread-safe**. Multiple Akka actors may publish events concurrently. The `Channel<SiteStreamEvent>` bridges these worlds: `IServerStreamWriter<T>` is **not thread-safe**. Multiple Akka actors may publish events concurrently. The `Channel<SiteStreamEvent>` bridges these worlds:
``` ```mermaid
Akka Actor Thread(s) gRPC Response Stream %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ ▲ flowchart TD
│ channel.Writer.TryWrite(evt) │ await responseStream.WriteAsync(evt) AKKA["Akka Actor Thread(s)"]
▼ │ CH(["Channel&lt;SiteStreamEvent&gt;<br/><br/>BoundedChannelOptions(1000)<br/>FullMode = DropOldest"])
┌─────────────────────────────────────────┐ GRPC["gRPC Response Stream"]
│ Channel<SiteStreamEvent> │
│ BoundedChannelOptions(1000) │ AKKA -->|"channel.Writer.TryWrite(evt)"| CH
│ FullMode = DropOldest │ CH -->|"await responseStream.WriteAsync(evt)"| GRPC
└─────────────────────────────────────────┘
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
class AKKA warn
class CH start
class GRPC proc
``` ```
- **Bounded capacity** (1000): prevents unbounded memory growth if the gRPC client is slow - **Bounded capacity** (1000): prevents unbounded memory growth if the gRPC client is slow
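To make the drop-oldest behaviour concrete, here is a small sketch of a bounded buffer with the same overflow policy. It is not the .NET `Channel<T>` API: it uses Python's `collections.deque` with `maxlen`, and the `DropOldestBuffer` class with its `dropped` counter is a hypothetical name added for illustration.

```python
from collections import deque


class DropOldestBuffer:
    """Bounded buffer that discards the oldest item when full."""

    def __init__(self, capacity: int) -> None:
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # how many events were evicted to make room

    def try_write(self, evt) -> bool:
        """Never blocks and never fails; a full buffer evicts its oldest entry."""
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(evt)  # deque(maxlen=...) drops from the left
        return True

    def drain(self) -> list:
        """Hand everything currently buffered to the single reader."""
        items = list(self._items)
        self._items.clear()
        return items
```

With a capacity of 3, writing events 0 through 4 leaves 2, 3 and 4 in the buffer: a slow reader sees the most recent values rather than a growing backlog.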
@@ -431,31 +447,32 @@ private void HandleGrpcStreamError(Exception ex)
### Reconnection State Machine (DebugStreamBridgeActor) ### Reconnection State Machine (DebugStreamBridgeActor)
``` ```mermaid
┌──────────────────┐ %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ Streaming │ ◄── Normal state: gRPC stream active flowchart TD
└────────┬─────────┘ S(["Streaming<br/><i>Normal state: gRPC stream active</i>"])
│ gRPC stream error / keepalive timeout R(["Reconnecting<br/><i>try other node endpoint</i>"])
D{"reconnect result?"}
┌──────────────────┐ RT["schedule retry<br/>(5s backoff)"]
┌──► │ Reconnecting │ ── try other node endpoint T(["Terminated<br/><i>notify consumer, stop actor</i>"])
│ └────────┬─────────┘
│ │ S -->|"gRPC stream error / keepalive timeout"| R
│ ┌────────┴─────────┐ R --> D
│ │ │ D -->|"success"| S
│ success failure (retry < max) D -->|"failure (retry &lt; max)"| RT
│ │ │ RT --> R
│ ▼ │ D -->|"failure (retry &gt;= max)"| T
│ Streaming schedule retry (5s backoff)
│ │ classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
└───────────────────────┘ classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
failure (retry >= max) classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
class S start
┌──────────────────┐ class R dec
│ Terminated │ ── notify consumer, stop actor class D proc
└──────────────────┘ class RT warn
class T bad
``` ```
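The state machine above can be written as a pure transition function. This is a sketch under assumptions: the retry counter is incremented on each failed reconnect before being compared with the maximum, the event names are invented for the example, and the 5s backoff is left to the caller.

```python
from enum import Enum, auto


class StreamState(Enum):
    STREAMING = auto()
    RECONNECTING = auto()
    TERMINATED = auto()


def step(state, event, retries, max_retries):
    """Return (next_state, next_retries) for one event.

    Events (hypothetical names): "stream_error", "success", "failure".
    """
    if state is StreamState.STREAMING and event == "stream_error":
        return StreamState.RECONNECTING, 0
    if state is StreamState.RECONNECTING and event == "success":
        return StreamState.STREAMING, 0
    if state is StreamState.RECONNECTING and event == "failure":
        retries += 1
        if retries >= max_retries:
            # Give up: notify the consumer and stop the actor.
            return StreamState.TERMINATED, retries
        # Caller schedules the next attempt after the backoff delay.
        return StreamState.RECONNECTING, retries
    # Unknown event, or already terminated: no transition.
    return state, retries
```

Keeping the transition logic free of timers and I/O makes each branch of the diagram checkable in isolation.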
### Summary ### Summary
+47 -12
@@ -167,19 +167,54 @@ Keepalive settings are configurable via `CommunicationOptions`:
## Topology ## Topology
``` ```mermaid
Central Cluster %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
├── ClusterClient → Site A Cluster (SiteCommunicationActor via Receptionist) [command/control] flowchart LR
├── ClusterClient → Site B Cluster (SiteCommunicationActor via Receptionist) [command/control] subgraph Central["Central Cluster"]
└── ClusterClient → Site N Cluster (SiteCommunicationActor via Receptionist) [command/control] CCA["ClusterClient<br/>(command/control)"]
CCB["ClusterClient<br/>(command/control)"]
├── SiteStreamGrpcClient ◄── gRPC stream ── Site A (SiteStreamGrpcServer) [real-time data] CCN["ClusterClient<br/>(command/control)"]
├── SiteStreamGrpcClient ◄── gRPC stream ── Site B (SiteStreamGrpcServer) [real-time data] GRPCC["SiteStreamGrpcClient<br/>(real-time data)"]
└── SiteStreamGrpcClient ◄── gRPC stream ── Site N (SiteStreamGrpcServer) [real-time data] end
Site Clusters subgraph SiteA["Site A Cluster"]
└── ClusterClient → Central Cluster (CentralCommunicationActor via Receptionist) [command/control] SACOMM["SiteCommunicationActor<br/>(via Receptionist)"]
└── SiteStreamGrpcServer (Kestrel HTTP/2, port 8083) → serves gRPC streams [real-time data] SAGRPC["SiteStreamGrpcServer<br/>(Kestrel HTTP/2, port 8083)"]
SACC["ClusterClient to Central<br/>(CentralCommunicationActor)"]
end
subgraph SiteB["Site B Cluster"]
SBCOMM["SiteCommunicationActor<br/>(via Receptionist)"]
SBGRPC["SiteStreamGrpcServer"]
end
subgraph SiteN["Site N Cluster"]
SNCOMM["SiteCommunicationActor<br/>(via Receptionist)"]
SNGRPC["SiteStreamGrpcServer"]
end
CCA -->|command/control| SACOMM
CCB -->|command/control| SBCOMM
CCN -->|command/control| SNCOMM
SAGRPC -->|"gRPC stream (real-time data)"| GRPCC
SBGRPC -->|gRPC stream| GRPCC
SNGRPC -->|gRPC stream| GRPCC
SACC -.->|command/control| Central
NOTE["Sites do NOT communicate with each other.<br/>All inter-cluster communication flows through Central."]
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class CCA,CCB,CCN,SACOMM,SACC,SBCOMM,SNCOMM dec
class GRPCC,SAGRPC,SBGRPC,SNGRPC start
class NOTE muted
class Central proc
class SiteA,SiteB,SiteN alt
``` ```
- Sites do **not** communicate with each other. - Sites do **not** communicate with each other.
@@ -143,14 +143,32 @@ EF Core's DbContext naturally provides unit-of-work semantics:
### Example Transactional Flow ### Example Transactional Flow
``` ```mermaid
Template Engine: Create Template %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
├── repository.AddTemplate(template) // template is a Commons POCO start(["Template Engine: Create Template"])
├── repository.AddAttributes(attributes) // attributes are Commons POCOs add1["repository.AddTemplate(template)<br/>// template is a Commons POCO"]
├── repository.AddAlarms(alarms) // alarms are Commons POCOs add2["repository.AddAttributes(attributes)<br/>// attributes are Commons POCOs"]
├── repository.AddScripts(scripts) // scripts are Commons POCOs add3["repository.AddAlarms(alarms)<br/>// alarms are Commons POCOs"]
└── repository.SaveChangesAsync() // single transaction commits all add4["repository.AddScripts(scripts)<br/>// scripts are Commons POCOs"]
save["repository.SaveChangesAsync()<br/>// single transaction commits all"]
db[("Configuration DB<br/>(MS SQL)")]
start --> add1
add1 --> add2
add2 --> add3
add3 --> add4
add4 --> save
save -. "single transaction" .-> db
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class start start
class add1,add2,add3,add4 proc
class save dec
class db muted
``` ```
---
@@ -184,13 +202,30 @@ Audit entries are written **synchronously** within the same database transaction
### Integration Example
``` ```mermaid
Template Engine: Update Template %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
├── repository.UpdateTemplate(template) start(["Template Engine: Update Template"])
├── auditService.LogAsync(user, "Update", "Template", template.Id, upd["repository.UpdateTemplate(template)"]
template.Name, template) audit["auditService.LogAsync(user, &quot;Update&quot;, &quot;Template&quot;,<br/>template.Id, template.Name, template)"]
└── repository.SaveChangesAsync() ← both the change and audit entry commit together save["repository.SaveChangesAsync()"]
note["both the change and audit entry<br/>commit together"]
start --> upd
upd --> audit
audit --> save
save -.- note
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
class start start
class upd proc
class audit alt
class save dec
class note warn
``` ```
### Audit Entry Schema
@@ -80,11 +80,38 @@ Data connections support an optional backup endpoint for automatic failover when
**Failover state machine:**
``` ```mermaid
Connected → disconnect → push bad quality → retry active endpoint (5s) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
→ N failures (≥ FailoverRetryCount) → switch to other endpoint flowchart TD
→ dispose adapter, create fresh adapter with other config connected(["Connected"])
→ reconnect → ReSubscribeAll → Connected pushbad["push bad quality"]
retry["retry active endpoint<br/>(5s)"]
decide{"N failures<br/>(≥ FailoverRetryCount)?"}
switch["switch to other endpoint"]
dispose["dispose adapter,<br/>create fresh adapter<br/>with other config"]
reconnect["reconnect"]
resub["ReSubscribeAll"]
connected -->|disconnect| pushbad
pushbad --> retry
retry --> decide
decide -->|"no (retry again)"| retry
decide -->|yes| switch
switch --> dispose
dispose --> reconnect
reconnect --> resub
resub -->|back to Connected| connected
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
class connected start
class pushbad bad
class retry,reconnect,resub proc
class decide dec
class switch,dispose warn
``` ```
- **Round-robin**: primary → backup → primary → backup. There is no preferred endpoint after the first failover: the connection stays on whichever endpoint is working.
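
The round-robin rule can be illustrated with a minimal sketch. Only `FailoverRetryCount` comes from the text above; the tuple state, the helper names `OnRetryFailed` and `OnConnected`, and the value `3` are assumptions made for this example, not the actual implementation.

```csharp
using System;

// Hypothetical sketch: every name except FailoverRetryCount is illustrative.
const int FailoverRetryCount = 3;   // assumed value for the example
const int EndpointCount = 2;        // primary (0) and backup (1)

// State: index of the active endpoint and consecutive failures against it.
(int Active, int Failures) state = (Active: 0, Failures: 0);

// Called once per failed retry of the active endpoint.
(int Active, int Failures) OnRetryFailed((int Active, int Failures) s)
{
    int failures = s.Failures + 1;
    // At the threshold, rotate to the other endpoint and reset the counter.
    return failures >= FailoverRetryCount
        ? ((s.Active + 1) % EndpointCount, 0)
        : (s.Active, failures);
}

// A successful reconnect keeps the current endpoint: there is no fail-back.
(int Active, int Failures) OnConnected((int Active, int Failures) s) => (s.Active, 0);

for (int i = 0; i < FailoverRetryCount; i++) state = OnRetryFailed(state);
Console.WriteLine(state.Active);   // 1: switched to backup
state = OnConnected(state);
Console.WriteLine(state.Active);   // 1: stays on backup while it works
```

Because the counter resets on every switch, a later run of failures on the backup would rotate back to the primary in the same way.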
@@ -22,23 +22,46 @@ Central cluster only. The site-side deployment responsibilities (receiving confi
## Deployment Flow
``` ```mermaid
Engineer (UI) → Deployment Manager (Central) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
├── 1. Request validated + flattened config from Template Engine engineer(["Engineer (UI)"])
│ (validation includes flattening, script compilation,
│ trigger references, connection binding completeness) subgraph DMC["Deployment Manager (Central)"]
├── 2. If validation fails → return errors to UI, stop step1["1. Request validated and flattened config from Template Engine<br/>(validation: flattening, script compilation, trigger references,<br/>connection binding completeness)"]
├── 3. Send config to site via Communication Layer step2{"2. Validation fails?"}
│ │ step2fail(["Return errors to UI, stop"])
│ ▼ step3["3. Send config to site via Communication Layer"]
│ Site Runtime (Deployment Manager Singleton) step8[("8. Update deployment status in config DB")]
│ ├── 4. Store new flattened config locally (SQLite) end
│ ├── 5. Compile scripts at site
│ ├── 6. Create/update Instance Actor (with child Script + Alarm Actors) subgraph SR["Site Runtime (Deployment Manager Singleton)"]
│ └── 7. Report success/failure back to central step4[("4. Store new flattened config locally (SQLite)")]
step5["5. Compile scripts at site"]
└── 8. Update deployment status in config DB step6["6. Create/update Instance Actor<br/>(with child Script + Alarm Actors)"]
step7["7. Report success/failure back to central"]
end
engineer --> step1
step1 --> step2
step2 -->|yes| step2fail
step2 -->|no| step3
step3 -->|config| step4
step4 --> step5
step5 --> step6
step6 --> step7
step7 -. "report success/failure" .-> step8
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
class engineer start
class step1,step5,step6,step7 dec
class step2,step2fail bad
class step3 dec
class step8 proc
class step4 start
``` ```
## Deployment Identity & Idempotency
+34 -13
@@ -123,19 +123,40 @@ API method scripts are compiled at central startup — all method definitions ar
## Request Flow ## Request Flow
``` ```mermaid
External System %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
ext(["External System"])
Inbound API (Central) api["Inbound API (Central)"]
├── 1. Extract API key from request s1["1. Extract API key from request"]
├── 2. Validate key exists and is enabled s2["2. Validate key exists and is enabled"]
├── 3. Resolve method by name s3["3. Resolve method by name"]
├── 4. Check API key is in method's approved list s4["4. Check API key is in method's approved list"]
├── 5. Validate and deserialize parameters s5["5. Validate and deserialize parameters"]
├── 6. Execute implementation script (subject to method timeout) s6["6. Execute implementation script<br/>(subject to method timeout)"]
├── 7. Serialize return value s7["7. Serialize return value"]
└── 8. Return response s8["8. Return response"]
ext --> api
api --> s1
s1 --> s2
s2 --> s3
s3 --> s4
s4 --> s5
s5 --> s6
s6 --> s7
s7 --> s8
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
class ext start
class api proc
class s1,s2,s3,s4,s5,s7 dec
class s6 alt
class s8 warn
``` ```
## Implementation Script Capabilities
@@ -24,23 +24,39 @@ SMTP and HTTP delivery is blocking I/O. Delivery work runs on a **dedicated bloc
## End-to-End Flow
``` ```mermaid
Site script: Notify.To("list").Send(subject, body) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
│ generate NotificationId (GUID) locally; return it to the script immediately flowchart TD
SCRIPT(["Site script: Notify.To('list').Send(subject, body)<br/>generate NotificationId (GUID) locally;<br/>return it to the script immediately"])
Site Store-and-Forward Engine (notification category, target = central) SNF["Site Store-and-Forward Engine<br/>(notification category, target = central)<br/>durably forwards to central via Central-Site Communication<br/>(ClusterClient); buffers/retries if central is unreachable"]
│ durably forwards to central via CentralSite Communication (ClusterClient); INGEST[("Central ingest: insert-if-not-exists on NotificationId<br/>to Notifications table (Pending)<br/>ack the site, site S and F clears the message")]
│ buffers/retries if central is unreachable OUTBOX["Central Notification Outbox actor<br/>(singleton, active central node)<br/>polls due rows; resolves the list;<br/>delivers via the matching adapter"]
D1{Delivery outcome}
Central ingest: insert-if-not-exists on NotificationId → Notifications table (Pending) DELIVERED(["Delivered"])
│ ack the site → site S&F clears the message RETRYING["Retrying<br/>(schedule NextAttemptAt)"]
PARKED(["Parked"])
Central Notification Outbox actor (singleton, active central node)
│ polls due rows; resolves the list; delivers via the matching adapter SCRIPT --> SNF
├── success → Delivered SNF --> INGEST
├── transient failure → Retrying (schedule NextAttemptAt) INGEST --> OUTBOX
└── permanent failure OUTBOX --> D1
/ retries exhausted → Parked D1 -->|success| DELIVERED
D1 -->|transient failure| RETRYING
D1 -->|"permanent failure /<br/>retries exhausted"| PARKED
RETRYING -.->|retry due| OUTBOX
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
class SCRIPT,DELIVERED start
class SNF warn
class INGEST proc
class OUTBOX alt
class D1,RETRYING dec
class PARKED bad
``` ```
The site forwards only `(listName, subject, body)` plus provenance; recipient resolution happens at central, at delivery time. This keeps notification-list definitions in one place and removes the deploy-to-sites artifact entirely.
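
The two properties described above (insert-if-not-exists on `NotificationId`, and recipient resolution at delivery time) can be sketched as follows. This is an in-memory illustration only: the dictionaries stand in for the Notifications table and the central list definitions, and `Ingest`, `ResolveRecipients`, and the sample addresses are hypothetical names, not the project's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the Notifications table and the
// centrally stored notification-list definitions.
var notifications = new Dictionary<Guid, (string ListName, string Subject, string Body, string Status)>();
var lists = new Dictionary<string, string[]>
{
    ["operators"] = new[] { "ops@example.com" }
};

// Insert-if-not-exists keyed on NotificationId: a redelivered message can be
// acked again without being stored twice, so site-side retries are safe.
bool Ingest(Guid id, string listName, string subject, string body)
{
    return notifications.TryAdd(id, (listName, subject, body, "Pending"));
}

// Recipients are looked up only at delivery time, so edits to a list apply
// to notifications that are still pending.
string[] ResolveRecipients(Guid id) =>
    lists.TryGetValue(notifications[id].ListName, out var r) ? r : Array.Empty<string>();

var notificationId = Guid.NewGuid();   // generated at the site
Console.WriteLine(Ingest(notificationId, "operators", "Alarm", "OverTemp"));  // True
Console.WriteLine(Ingest(notificationId, "operators", "Alarm", "OverTemp"));  // False (duplicate)
Console.WriteLine(notifications.Count);                                       // 1
lists["operators"] = new[] { "ops@example.com", "oncall@example.com" };
Console.WriteLine(ResolveRecipients(notificationId).Length);                  // 2
```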
+50 -14
@@ -27,20 +27,56 @@ Site clusters only.
## Actor Hierarchy
``` ```mermaid
Deployment Manager Singleton (Cluster Singleton) %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
├── Instance Actor ("MachineA-001") flowchart TD
│ ├── Script Actor ("MonitorSpeed") — coordinator DMS["Deployment Manager Singleton<br/>(Cluster Singleton)"]
│ │ └── Script Execution Actor — short-lived, per invocation IA1["Instance Actor<br/>('MachineA-001')"]
│ ├── Script Actor ("CalculateOEE") — coordinator IA2["Instance Actor<br/>('MachineA-002')"]
│ │ └── Script Execution Actor — short-lived, per invocation IAMORE["… more Instance Actors"]
│ ├── Alarm Actor ("OverTemp") — coordinator (computed)
│ │ └── Alarm Execution Actor — short-lived, per on-trigger invocation SA1["Script Actor ('MonitorSpeed')<br/>— coordinator"]
├── Alarm Actor ("LowPressure") — coordinator (computed) SA2["Script Actor ('CalculateOEE')<br/>— coordinator"]
└── Native Alarm Actor ("OpcUaServer1") — read-only mirror, peer to Alarm Actor AA1["Alarm Actor ('OverTemp')<br/>— coordinator (computed)"]
├── Instance Actor ("MachineA-002") AA2["Alarm Actor ('LowPressure')<br/>— coordinator (computed)"]
│ └── ... NAA1["Native Alarm Actor ('OpcUaServer1')<br/>— read-only mirror, peer to Alarm Actor"]
└── ...
SEA1["Script Execution Actor<br/>— short-lived, per invocation"]
SEA2["Script Execution Actor<br/>— short-lived, per invocation"]
AEA1["Alarm Execution Actor<br/>— short-lived, per on-trigger invocation"]
IA2CHILD["… (Script / Alarm Actors)"]
DMS --> IA1
DMS --> IA2
DMS -.-> IAMORE
IA1 --> SA1
IA1 --> SA2
IA1 --> AA1
IA1 --> AA2
IA1 --> NAA1
SA1 --> SEA1
SA2 --> SEA2
AA1 --> AEA1
IA2 -.-> IA2CHILD
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class DMS proc
class IA1,IA2 start
class SA1,SA2 dec
class AA1,AA2 bad
class NAA1 alt
class SEA1,SEA2,AEA1 warn
class IAMORE,IA2CHILD muted
``` ```
---
+22 -16
@@ -25,22 +25,28 @@ Site clusters only. The central cluster does not buffer messages.
## Message Lifecycle
``` ```mermaid
Script submits message %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
A([Script submits message]) --> B[Attempt immediate delivery]
Attempt immediate delivery B --> C{Delivered?}
C -->|Success| D([Remove from buffer])
├── Success → Remove from buffer C -->|Failure| E[Buffer message]
E --> F[Retry loop<br/>per retry policy]
└── Failure → Buffer message F --> G{Retry outcome}
G -->|Success| H([Remove from buffer<br/>+ notify standby])
G -->|Max retries exhausted| I([Park message<br/>dead-letter])
Retry loop (per retry policy)
classDef ok fill:#d5e8d4,stroke:#82b366,color:#111111;
├── Success → Remove from buffer + notify standby classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
└── Max retries exhausted → Park message classDef buf fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef bad fill:#f8cecc,stroke:#b85450,color:#111111;
class A,D,H ok
class B,F proc
class C,G dec
class E buf
class I bad
``` ```
For notifications, "delivery" means forwarding the message to the central cluster via Central-Site Communication; "success" is central's ack, on which the message is cleared. Notifications are retried at the fixed forward interval until central acks, but, like every other category, they are bounded by the engine's `DefaultMaxRetries` cap: a sustained central outage that exceeds `DefaultMaxRetries × forward-interval` will park the buffered notification, after which an operator can Retry/Discard it via the parked-message UI. Operationally, the cap is sized so the normal central-recovery window stays well inside it; "do not park" is the design's operational intent on the happy path, not an absolute invariant. Callers that genuinely require unbounded retry pass `maxRetries: 0` on `EnqueueAsync` (the documented "no limit" escape hatch; see `StoreAndForward-015`).
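
The arithmetic behind the cap can be sketched as below. The values `50` and 30 seconds are made up for illustration, and `ParkWindow` and `ShouldPark` are hypothetical helpers; only `DefaultMaxRetries`, the fixed forward interval, and the `maxRetries: 0` "no limit" convention come from the text. Whether the engine parks on exactly the N-th failure is an assumption of this sketch.

```csharp
using System;

// Illustrative values only; the real cap and forward interval are configured.
const int DefaultMaxRetries = 50;
TimeSpan forwardInterval = TimeSpan.FromSeconds(30);

// Longest central outage a buffered notification survives before parking.
// maxRetries == 0 follows the "no limit" convention, modelled here as null.
TimeSpan? ParkWindow(int maxRetries, TimeSpan interval) =>
    maxRetries == 0 ? (TimeSpan?)null : interval * maxRetries;

// Assumed rule: park once the failed-attempt count reaches the cap.
bool ShouldPark(int attempts, int maxRetries) =>
    maxRetries != 0 && attempts >= maxRetries;

Console.WriteLine(ParkWindow(DefaultMaxRetries, forwardInterval)); // 00:25:00
Console.WriteLine(ShouldPark(50, DefaultMaxRetries));              // True
Console.WriteLine(ShouldPark(1000, 0));                            // False (no limit)
```

With these example numbers, a central outage longer than 25 minutes would park the notification, which is why the cap has to be sized against the expected recovery window.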
+98 -56
@@ -92,19 +92,31 @@ The manifest is plaintext so the import wizard can preview bundle contents and s
## Architecture
``` ```mermaid
ZB.MOM.WW.ScadaBridge.Transport %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
├── IBundleExporter flowchart TD
│ ExportAsync(ExportSelection, Passphrase?, ct) → Stream subgraph T["ZB.MOM.WW.ScadaBridge.Transport"]
├── IBundleImporter EXPORTER["IBundleExporter<br/>ExportAsync(ExportSelection, Passphrase?, ct) → Stream"]
LoadAsync(stream, Passphrase?, ct) → BundleSession IMPORTER["IBundleImporter<br/>LoadAsync(stream, Passphrase?, ct) → BundleSession<br/>PreviewAsync(sessionId, ct) → ImportPreview<br/>ApplyAsync(sessionId, resolutions, ct) → ImportResult"]
│ PreviewAsync(sessionId, ct) → ImportPreview RESOLVER["DependencyResolver"]
│ ApplyAsync(sessionId, resolutions, ct) → ImportResult SERIALIZER["BundleSerializer<br/>(manifest + content JSON; ZIP packer)"]
├── DependencyResolver ENCRYPTOR["BundleSecretEncryptor<br/>(AES-256-GCM + PBKDF2)"]
├── BundleSerializer (manifest + content JSON; ZIP packer) SESSIONSTORE["BundleSessionStore<br/>(in-memory, TTL'd)"]
├── BundleSecretEncryptor (AES-256-GCM + PBKDF2) MANIFESTVALIDATOR["ManifestValidator<br/>(schema/version gating, hash check)"]
├── BundleSessionStore (in-memory, TTL'd) end
└── ManifestValidator (schema/version gating, hash check)
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class EXPORTER,IMPORTER proc
class RESOLVER,SERIALIZER start
class ENCRYPTOR alt
class SESSIONSTORE warn
class MANIFESTVALIDATOR dec
class T muted
``` ```
The component is central-only. It is registered in `ZB.MOM.WW.ScadaBridge.Host` for central roles only, never for site roles. All persistence flows through existing audited repository interfaces in `ZB.MOM.WW.ScadaBridge.ConfigurationDatabase`; the component does not call `DbContext.SaveChangesAsync` directly. `BundleSessionStore` is in-process on the active central node (matching Blazor Server circuit affinity): 30-minute TTL, eviction on expiry, 3-strike passphrase lockout per session.
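
The session-store rules (30-minute TTL, eviction on expiry, 3-strike lockout) can be sketched as follows. This is not the real `BundleSessionStore`: the dictionary, the helper names `Open`, `IsUsable`, and `RecordBadPassphrase`, and the choice to evict lazily on lookup are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; only the 30-minute TTL and 3-strike lockout come from the text.
TimeSpan ttl = TimeSpan.FromMinutes(30);
const int MaxPassphraseFailures = 3;

// sessionId -> (expiry time, failed passphrase attempts)
var sessions = new Dictionary<Guid, (DateTimeOffset ExpiresAt, int Failures)>();

Guid Open(DateTimeOffset now)
{
    var id = Guid.NewGuid();
    sessions[id] = (now + ttl, 0);
    return id;
}

// A session is usable only if it exists, has not expired, and is not locked out.
bool IsUsable(Guid id, DateTimeOffset now)
{
    if (!sessions.TryGetValue(id, out var s)) return false;
    if (now >= s.ExpiresAt) { sessions.Remove(id); return false; } // evict on expiry
    return s.Failures < MaxPassphraseFailures;
}

void RecordBadPassphrase(Guid id)
{
    if (sessions.TryGetValue(id, out var s))
        sessions[id] = (s.ExpiresAt, s.Failures + 1);
}

var start = DateTimeOffset.UtcNow;
var session = Open(start);
Console.WriteLine(IsUsable(session, start.AddMinutes(29)));      // True
for (int i = 0; i < 3; i++) RecordBadPassphrase(session);
Console.WriteLine(IsUsable(session, start.AddMinutes(29)));      // False (locked out)
Console.WriteLine(IsUsable(Open(start), start.AddMinutes(31)));  // False (expired)
```

Passing the clock in as a parameter keeps the sketch deterministic; a production store would more likely read a time provider and evict on a timer.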
@@ -132,21 +144,49 @@ The user can toggle "include all dependencies" off (with a warning that the bund
### Backend
``` ```mermaid
User (Design role) ─► Central UI Export wizard %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
USER(["User (Design role)"])
IBundleExporter WIZARD["Central UI Export wizard"]
EXPORTER["IBundleExporter"]
├─► DependencyResolver ─► repositories (read) RESOLVER["DependencyResolver"]
├─► EntitySerializer ─► content.json REPOS[("repositories (read)")]
├─► BundleSecretEncryptor ► content.enc (if passphrase) SERIALIZER["EntitySerializer"]
├─► ManifestBuilder ─► manifest.json CONTENTJSON["content.json"]
ENCRYPTOR["BundleSecretEncryptor"]
ZIP packer → temp file → browser download CONTENTENC["content.enc<br/>(if passphrase)"]
MANIFESTBUILDER["ManifestBuilder"]
MANIFESTJSON["manifest.json"]
IAuditService.LogAsync(BundleExported …) ZIP["ZIP packer → temp file → browser download"]
AUDIT["IAuditService.LogAsync(BundleExported …)"]
USER --> WIZARD
WIZARD --> EXPORTER
EXPORTER --> RESOLVER
RESOLVER --> SERIALIZER
SERIALIZER --> ENCRYPTOR
ENCRYPTOR --> MANIFESTBUILDER
MANIFESTBUILDER --> ZIP
ZIP --> AUDIT
RESOLVER --> REPOS
SERIALIZER --> CONTENTJSON
ENCRYPTOR --> CONTENTENC
MANIFESTBUILDER --> MANIFESTJSON
classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
class USER,AUDIT start
class WIZARD,EXPORTER,ZIP proc
class RESOLVER,SERIALIZER,MANIFESTBUILDER dec
class ENCRYPTOR alt
class CONTENTJSON,CONTENTENC,MANIFESTJSON warn
class REPOS muted
``` ```
Audit event: `BundleExported`, recording caller, artifact count, content hash, encrypted yes/no, and bundle filename.
@@ -179,34 +219,36 @@ Bundle references that cannot be satisfied in either the bundle or the target DB
### Backend
``` ```mermaid
User (Admin role) ─► uploads bundle %%{init: {'theme':'base', 'themeVariables': {'textColor':'#111111','lineColor':'#555555','edgeLabelBackground':'#ffffff','fontSize':'15px'}}}%%
flowchart TD
USER(["User (Admin role) → uploads bundle"])
IBundleImporter.LoadAsync LOAD["IBundleImporter.LoadAsync<br/>· verify SHA-256 (manifest vs content)<br/>· check bundleFormatVersion supported<br/>· decrypt content.enc with passphrase (if encrypted)<br/>· deserialize entities<br/>· open BundleSession (30-min TTL)"]
· verify SHA-256 (manifest vs content) PREVIEW["PreviewAsync → diff vs target DB → ImportPreview"]
· check bundleFormatVersion supported REVIEW["(user reviews + resolves conflicts)"]
· decrypt content.enc with passphrase (if encrypted) APPLY["ApplyAsync (single EF transaction)<br/>· run two-tier semantic validation<br/>&nbsp;&nbsp;(minimal name scan + full SemanticValidator)<br/>· apply resolutions (add / overwrite / skip / rename)<br/>· upsert TemplateFolder hierarchy<br/>· IAuditService.LogAsync(BundleImported …)<br/>· commit"]
· deserialize entities RESULT["ImportResult → UI step 5"]
· open BundleSession (30-min TTL) DEPLOYMENTS["'View on Deployments →' (existing page)"]
USER --> LOAD
PreviewAsync → diff vs target DB → ImportPreview LOAD --> PREVIEW
PREVIEW --> APPLY
▼ (user reviews + resolves conflicts) PREVIEW -.- REVIEW
APPLY --> RESULT
ApplyAsync (single EF transaction) RESULT --> DEPLOYMENTS
· run two-tier semantic validation (minimal name scan + full SemanticValidator)
· apply resolutions (add / overwrite / skip / rename) classDef start fill:#d5e8d4,stroke:#82b366,color:#111111;
· upsert TemplateFolder hierarchy classDef proc fill:#dae8fc,stroke:#6c8ebf,color:#111111;
· IAuditService.LogAsync(BundleImported …) classDef dec fill:#fff2cc,stroke:#d6b656,color:#111111;
· commit classDef warn fill:#ffe6cc,stroke:#d79b00,color:#111111;
classDef alt fill:#e1d5e7,stroke:#9673a6,color:#111111;
classDef muted fill:#f5f5f5,stroke:#999999,color:#666666;
ImportResult → UI step 5 class USER start
class LOAD,RESULT proc
class PREVIEW dec
"View on Deployments →" (existing page) class APPLY alt
class DEPLOYMENTS warn
class REVIEW muted
``` ```
Authorization: `RequireAdmin` on both the Razor page and `IBundleImporter.*` entrypoints.
+2
@@ -16,6 +16,8 @@
</packageSource> </packageSource>
<packageSource key="dohertj2-gitea"> <packageSource key="dohertj2-gitea">
<package pattern="ZB.MOM.WW.MxGateway.*" /> <package pattern="ZB.MOM.WW.MxGateway.*" />
<package pattern="ZB.MOM.WW.Health" />
<package pattern="ZB.MOM.WW.Health.*" />
</packageSource> </packageSource>
</packageSourceMapping> </packageSourceMapping>
<!-- <!--
@@ -1,45 +0,0 @@
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.ScadaBridge.Host.Actors;
namespace ZB.MOM.WW.ScadaBridge.Host.Health;
/// <summary>
/// Health check that returns healthy only if this node is the active (leader) node
/// in the Akka.NET cluster. Used by Traefik to route traffic to the active node.
/// </summary>
public class ActiveNodeHealthCheck : IHealthCheck
{
private readonly AkkaHostedService _akkaService;
/// <summary>Initializes a new <see cref="ActiveNodeHealthCheck"/> with the given Akka hosted service.</summary>
/// <param name="akkaService">The Akka hosted service providing access to the actor system and cluster state.</param>
public ActiveNodeHealthCheck(AkkaHostedService akkaService)
{
_akkaService = akkaService;
}
/// <summary>Returns healthy if this node is the cluster leader (active node); otherwise returns unhealthy.</summary>
/// <param name="context">Health check context providing registration details.</param>
/// <param name="cancellationToken">Cancellation token.</param>
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
var system = _akkaService.ActorSystem;
if (system == null)
return Task.FromResult(HealthCheckResult.Unhealthy("ActorSystem not yet available."));
var cluster = Cluster.Get(system);
var self = cluster.SelfMember;
if (self.Status != MemberStatus.Up)
return Task.FromResult(HealthCheckResult.Unhealthy($"Node not Up (status: {self.Status})."));
var leader = cluster.State.Leader;
if (leader != null && leader == self.Address)
return Task.FromResult(HealthCheckResult.Healthy("Active node (cluster leader)."));
return Task.FromResult(HealthCheckResult.Unhealthy("Standby node (not cluster leader)."));
}
}
@@ -1,52 +0,0 @@
using Akka.Cluster;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.ScadaBridge.Host.Actors;
namespace ZB.MOM.WW.ScadaBridge.Host.Health;
/// <summary>
/// Health check that verifies this node is an active member of the Akka.NET cluster.
/// Returns healthy only if the node's self-member status is Up or Joining.
/// </summary>
public class AkkaClusterHealthCheck : IHealthCheck
{
private readonly AkkaHostedService _akkaService;
/// <summary>
/// Initializes the health check with the Akka hosted service.
/// </summary>
/// <param name="akkaService">The hosted service providing access to the Akka actor system.</param>
public AkkaClusterHealthCheck(AkkaHostedService akkaService)
{
_akkaService = akkaService;
}
/// <summary>
/// Checks that this node is an active member of the Akka.NET cluster.
/// </summary>
/// <param name="context">Health check context.</param>
/// <param name="cancellationToken">Cancellation token.</param>
public Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
var system = _akkaService.ActorSystem;
if (system == null)
return Task.FromResult(HealthCheckResult.Degraded("ActorSystem not yet available."));
var cluster = Cluster.Get(system);
var status = cluster.SelfMember.Status;
var result = status switch
{
MemberStatus.Up or MemberStatus.Joining =>
HealthCheckResult.Healthy($"Akka cluster member status: {status}"),
MemberStatus.Leaving or MemberStatus.Exiting =>
HealthCheckResult.Degraded($"Akka cluster member status: {status}"),
_ =>
HealthCheckResult.Unhealthy($"Akka cluster member status: {status}")
};
return Task.FromResult(result);
}
}
@@ -1,43 +0,0 @@
using Microsoft.Extensions.Diagnostics.HealthChecks;
using ZB.MOM.WW.ScadaBridge.ConfigurationDatabase;
namespace ZB.MOM.WW.ScadaBridge.Host.Health;
/// <summary>
/// Health check that verifies database connectivity for Central nodes.
/// </summary>
public class DatabaseHealthCheck : IHealthCheck
{
private readonly ScadaBridgeDbContext _dbContext;
/// <summary>
/// Initializes a new <see cref="DatabaseHealthCheck"/>.
/// </summary>
/// <param name="dbContext">The EF Core database context used to test connectivity.</param>
public DatabaseHealthCheck(ScadaBridgeDbContext dbContext)
{
_dbContext = dbContext;
}
/// <summary>
/// Checks database connectivity by attempting to open a connection.
/// </summary>
/// <param name="context">Health check context providing failure status information.</param>
/// <param name="cancellationToken">Cancellation token for the check.</param>
public async Task<HealthCheckResult> CheckHealthAsync(
HealthCheckContext context,
CancellationToken cancellationToken = default)
{
try
{
var canConnect = await _dbContext.Database.CanConnectAsync(cancellationToken);
return canConnect
? HealthCheckResult.Healthy("Database connection is available.")
: HealthCheckResult.Unhealthy("Database connection failed.");
}
catch (Exception ex)
{
return HealthCheckResult.Unhealthy("Database connection failed.", ex);
}
}
}
+39 -23
@@ -1,5 +1,6 @@
using HealthChecks.UI.Client; using ZB.MOM.WW.Health;
using Microsoft.AspNetCore.Diagnostics.HealthChecks; using ZB.MOM.WW.Health.Akka;
using ZB.MOM.WW.Health.EntityFrameworkCore;
using ZB.MOM.WW.ScadaBridge.AuditLog; using ZB.MOM.WW.ScadaBridge.AuditLog;
using ZB.MOM.WW.ScadaBridge.CentralUI; using ZB.MOM.WW.ScadaBridge.CentralUI;
using ZB.MOM.WW.ScadaBridge.ClusterInfrastructure; using ZB.MOM.WW.ScadaBridge.ClusterInfrastructure;
@@ -110,16 +111,37 @@ try
?? throw new InvalidOperationException("ScadaBridge:Database:ConfigurationDb connection string is required for Central role."); ?? throw new InvalidOperationException("ScadaBridge:Database:ConfigurationDb connection string is required for Central role.");
builder.Services.AddConfigurationDatabase(configDbConnectionString); builder.Services.AddConfigurationDatabase(configDbConnectionString);
// WP-12: Health checks for readiness gating // WP-12: Health checks for readiness gating — shared ZB.MOM.WW.Health probes.
// Check names and the ready/active tier split are preserved: database + akka-cluster
// carry the Ready tag (/health/ready), active-node carries the Active tag (/health/active).
// The Akka checks resolve ActorSystem from DI via the transient bridge registered below;
// the DatabaseHealthCheck<TContext> resolves a scoped ScadaBridgeDbContext (no factory).
builder.Services.AddHealthChecks() builder.Services.AddHealthChecks()
.AddCheck<DatabaseHealthCheck>("database") .AddTypeActivatedCheck<DatabaseHealthCheck<ScadaBridgeDbContext>>(
.AddCheck<AkkaClusterHealthCheck>("akka-cluster") "database",
.AddCheck<ActiveNodeHealthCheck>("active-node"); failureStatus: null,
tags: new[] { ZbHealthTags.Ready })
.AddTypeActivatedCheck<AkkaClusterHealthCheck>(
"akka-cluster",
failureStatus: null,
tags: new[] { ZbHealthTags.Ready },
args: AkkaClusterStatusPolicy.Default)
.AddTypeActivatedCheck<ActiveNodeHealthCheck>(
"active-node",
failureStatus: null,
tags: new[] { ZbHealthTags.Active });
// WP-13: Akka.NET bootstrap via hosted service // WP-13: Akka.NET bootstrap via hosted service
builder.Services.AddSingleton<AkkaHostedService>(); builder.Services.AddSingleton<AkkaHostedService>();
builder.Services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>()); builder.Services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>());
// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
// resolve re-reads the current value — null while warming up (checks → Degraded), live after.
// The factory must NOT throw: GetService<ActorSystem>() must return null (not raise) pre-start.
builder.Services.AddTransient<Akka.Actor.ActorSystem>(sp =>
sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
// InboundAPI-022: register the production IActiveNodeGate implementation so // InboundAPI-022: register the production IActiveNodeGate implementation so
// standby-node gating is actually enforced (the InboundApiEndpointFilter // standby-node gating is actually enforced (the InboundApiEndpointFilter
// consults IActiveNodeGate and defaults to "allow" when none is registered, // consults IActiveNodeGate and defaults to "allow" when none is registered,
@@ -214,23 +236,17 @@ try
&& HttpMethods.IsPost(ctx.Request.Method), && HttpMethods.IsPost(ctx.Request.Method),
branch => branch.UseAuditWriteMiddleware()); branch => branch.UseAuditWriteMiddleware());
// WP-12: Map readiness endpoint — returns 503 until ready, 200 when ready. // WP-12: Map the canonical three-tier health endpoints in one call:
// REQ-HOST-4a defines readiness as cluster membership + DB connectivity, // /health/ready — Ready-tagged checks (database + akka-cluster). REQ-HOST-4a defines
// explicitly NOT cluster leadership. The leader-only "active-node" check is // readiness as cluster membership + DB connectivity, explicitly NOT
// excluded here so a fully operational standby central node reports ready; // cluster leadership, so the leader-only active-node check is excluded
// leadership is reported separately on /health/active. // (a fully operational standby central node still reports ready).
app.MapHealthChecks("/health/ready", new HealthCheckOptions // /health/active — Active-tagged check (active-node); returns 200 only on the cluster
{ // leader; used by Traefik for routing.
Predicate = check => check.Name != "active-node", // /healthz — bare process liveness; runs no checks (always 200 while the process
ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse // is up). New tier added by adopting the shared library.
}); // All three are anonymous and use the canonical ZbHealthWriter JSON output.
app.MapZbHealth();
// Active node endpoint — returns 200 only on the cluster leader; used by Traefik for routing
app.MapHealthChecks("/health/active", new HealthCheckOptions
{
Predicate = check => check.Name == "active-node",
ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
app.MapStaticAssets(); app.MapStaticAssets();
app.MapCentralUI<ZB.MOM.WW.ScadaBridge.Host.Components.App>(); app.MapCentralUI<ZB.MOM.WW.ScadaBridge.Host.Components.App>();
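The tier split described in the WP-12 comment can be approximated with the stock ASP.NET Core health-check API. The sketch below is not the `MapZbHealth` implementation, whose internals this diff does not show. The `TierOptions` helper and the literal tag strings "ready" and "active" are assumptions made for illustration.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

// Hypothetical helper (not part of ZB.MOM.WW.Health): builds per-tier options
// so that each endpoint runs only the checks registered with a matching tag.
public static class TierOptions
{
    // Selects only the registrations that carry the given tag.
    public static HealthCheckOptions ForTag(string tag) => new()
    {
        Predicate = registration => registration.Tags.Contains(tag)
    };

    // Liveness: the predicate rejects every registration, so no checks run
    // and the endpoint reports Healthy for as long as the process can answer.
    public static HealthCheckOptions Liveness() => new()
    {
        Predicate = _ => false
    };
}
```

Under these assumptions the three routes would be mapped as `app.MapHealthChecks("/health/ready", TierOptions.ForTag("ready"))`, `app.MapHealthChecks("/health/active", TierOptions.ForTag("active"))`, and `app.MapHealthChecks("/healthz", TierOptions.Liveness())`. Selecting by tag rather than by check name means a newly added check joins a tier through its registration alone, with no endpoint code to update.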
@@ -73,6 +73,13 @@ public static class SiteServiceRegistration
services.AddSingleton<AkkaHostedService>(); services.AddSingleton<AkkaHostedService>();
services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>()); services.AddHostedService(sp => sp.GetRequiredService<AkkaHostedService>());
// The shared ZB.MOM.WW.Health Akka checks resolve ActorSystem from DI. ScadaBridge owns the
// ActorSystem inside AkkaHostedService (not a DI singleton), so bridge it as TRANSIENT: each
// resolve re-reads the current value — null while warming up (checks → Degraded), live after.
// The factory must NOT throw: before startup, GetService<ActorSystem>() must return null rather than throw.
services.AddTransient<Akka.Actor.ActorSystem>(sp =>
sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
// Cluster node status provider for health reports // Cluster node status provider for health reports
services.AddSingleton<IClusterNodeProvider>(sp => services.AddSingleton<IClusterNodeProvider>(sp =>
{ {
@@ -16,7 +16,6 @@
<PackageReference Include="Akka.Cluster.Tools" /> <PackageReference Include="Akka.Cluster.Tools" />
<PackageReference Include="Akka.Hosting" /> <PackageReference Include="Akka.Hosting" />
<PackageReference Include="Akka.Remote.Hosting" /> <PackageReference Include="Akka.Remote.Hosting" />
<PackageReference Include="AspNetCore.HealthChecks.UI.Client" />
<PackageReference Include="Microsoft.EntityFrameworkCore.Design"> <PackageReference Include="Microsoft.EntityFrameworkCore.Design">
<IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets> <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
<PrivateAssets>all</PrivateAssets> <PrivateAssets>all</PrivateAssets>
@@ -29,6 +28,9 @@
<!-- Transitive override: Akka.Hosting 1.5.62 pins OpenTelemetry.Api 1.9.0 which is flagged <!-- Transitive override: Akka.Hosting 1.5.62 pins OpenTelemetry.Api 1.9.0 which is flagged
(GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6). Bumping directly clears both advisories. --> (GHSA-g94r-2vxg-569j, GHSA-8785-wc3w-h8q6). Bumping directly clears both advisories. -->
<PackageReference Include="OpenTelemetry.Api" /> <PackageReference Include="OpenTelemetry.Api" />
<PackageReference Include="ZB.MOM.WW.Health" />
<PackageReference Include="ZB.MOM.WW.Health.Akka" />
<PackageReference Include="ZB.MOM.WW.Health.EntityFrameworkCore" />
</ItemGroup> </ItemGroup>
<ItemGroup> <ItemGroup>
@@ -0,0 +1,46 @@
using Akka.Actor;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.ScadaBridge.ClusterInfrastructure;
using ZB.MOM.WW.ScadaBridge.Communication;
using ZB.MOM.WW.ScadaBridge.Host;
using ZB.MOM.WW.ScadaBridge.Host.Actors;
namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
/// <summary>
/// Verifies the DI bridge that exposes the Akka <see cref="ActorSystem"/> — owned by
/// <see cref="AkkaHostedService"/>, not registered as a DI singleton — to consumers that
/// resolve <c>ActorSystem</c> from the container (notably the shared ZB.MOM.WW.Health Akka
/// checks). The bridge is registered TRANSIENT so each resolve re-reads the current value:
/// null while the hosted service is warming up (checks treat that as Degraded), the live
/// system afterwards. A SINGLETON would cache the startup-time null forever.
/// </summary>
public sealed class ActorSystemBridgeTests
{
[Fact]
public void ActorSystem_ResolvesNull_BeforeHostedServiceStarts()
{
var services = new ServiceCollection();
// Register AkkaHostedService the same way Program.cs does, supplying the minimal
// constructor dependencies so the container can build it. Its ActorSystem property
// is null until StartAsync runs — which it never does here.
services.AddSingleton(Options.Create(new NodeOptions()));
services.AddSingleton(Options.Create(new ClusterOptions()));
services.AddSingleton(Options.Create(new CommunicationOptions()));
services.AddSingleton<ILogger<AkkaHostedService>>(NullLogger<AkkaHostedService>.Instance);
services.AddSingleton<AkkaHostedService>();
// The bridge under test: TRANSIENT factory that re-reads the owned ActorSystem.
services.AddTransient<ActorSystem>(sp =>
sp.GetRequiredService<AkkaHostedService>().ActorSystem!);
using var provider = services.BuildServiceProvider();
// The hosted service has not started, so the bridge must yield null (not throw).
Assert.Null(provider.GetService<ActorSystem>());
}
}
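The lifetime behaviour that this test file's summary comment describes can be seen in isolation with stand-in types. This is a minimal sketch that assumes only Microsoft.Extensions.DependencyInjection. `FakeOwner`, `FakeSystem`, and `BridgeDemo` are hypothetical names, not ScadaBridge types.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for the ActorSystem.
public sealed class FakeSystem { }

// Hypothetical stand-in for AkkaHostedService: owns the value, which stays
// null until "startup" assigns it.
public sealed class FakeOwner
{
    public FakeSystem? Current { get; set; }
}

public static class BridgeDemo
{
    // Returns (resolvedNullBeforeStart, resolvedLiveAfterStart).
    public static (bool NullBefore, bool LiveAfter) Probe()
    {
        var services = new ServiceCollection();
        services.AddSingleton<FakeOwner>();

        // Transient bridge: the factory runs on every resolve, so it always
        // reflects the owner's current value.
        services.AddTransient<FakeSystem>(sp =>
            sp.GetRequiredService<FakeOwner>().Current!);

        using var provider = services.BuildServiceProvider();

        // Before "startup": GetService (not GetRequiredService) yields null.
        bool nullBefore = provider.GetService<FakeSystem>() is null;

        // Simulate the hosted service finishing startup.
        provider.GetRequiredService<FakeOwner>().Current = new FakeSystem();

        // After "startup": the same registration now yields the live value.
        bool liveAfter = provider.GetService<FakeSystem>() is not null;

        return (nullBefore, liveAfter);
    }
}
```

Consumers of such a bridge need to resolve with `GetService<T>()` and handle null themselves, because `GetRequiredService<T>()` throws when the factory returns null.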
@@ -1,11 +1,20 @@
using System.Linq;
using Microsoft.AspNetCore.Mvc.Testing; using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.Configuration; using Microsoft.Extensions.Configuration;
using ZB.MOM.WW.ScadaBridge.Host.Health; using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Options;
using ZB.MOM.WW.Health;
namespace ZB.MOM.WW.ScadaBridge.Host.Tests; namespace ZB.MOM.WW.ScadaBridge.Host.Tests;
/// <summary> /// <summary>
/// WP-12: Tests for /health/ready and /health/active endpoints. /// WP-12: Tests for the three-tier health endpoints after adopting the shared
/// ZB.MOM.WW.Health probes. Verifies that /health/ready, /health/active and the new
/// /healthz tier are mapped, and that the readiness/active tier split is now carried by
/// the canonical <see cref="ZbHealthTags"/> (Ready for database + akka-cluster, Active for
/// active-node) rather than by check-name predicates. These are pure route/tag assertions
/// — they require no database, LDAP, or formed Akka cluster.
/// </summary> /// </summary>
public class HealthCheckTests : IDisposable public class HealthCheckTests : IDisposable
{ {
@@ -25,14 +34,8 @@ public class HealthCheckTests : IDisposable
} }
} }
[Fact] private WebApplicationFactory<Program> CreateCentralFactory()
public async Task HealthReady_Endpoint_ReturnsResponse()
{ {
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try
{
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = new WebApplicationFactory<Program>() var factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder => .WithWebHostBuilder(builder =>
{ {
@@ -51,15 +54,29 @@ public class HealthCheckTests : IDisposable
builder.UseSetting("ScadaBridge:Database:SkipMigrations", "true"); builder.UseSetting("ScadaBridge:Database:SkipMigrations", "true");
}); });
_disposables.Add(factory); _disposables.Add(factory);
return factory;
}
private static IEnumerable<HealthCheckRegistration> Registrations(WebApplicationFactory<Program> factory) =>
factory.Services.GetRequiredService<IOptions<HealthCheckServiceOptions>>().Value.Registrations;
[Fact]
public async Task HealthReady_Endpoint_IsMapped()
{
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try
{
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = CreateCentralFactory();
var client = factory.CreateClient(); var client = factory.CreateClient();
_disposables.Add(client); _disposables.Add(client);
var response = await client.GetAsync("/health/ready"); var response = await client.GetAsync("/health/ready");
// The endpoint exists and returns a status code. // The endpoint exists and returns a status code. With test infrastructure
// With test infrastructure (no real DB), the database check may fail, // (no real DB / cluster) the readiness checks may report Unhealthy, so we
// so we accept either 200 (Healthy) or 503 (Unhealthy). // accept either 200 (Healthy/Degraded) or 503 (Unhealthy) — never 404.
Assert.NotEqual(System.Net.HttpStatusCode.NotFound, response.StatusCode);
Assert.True( Assert.True(
response.StatusCode == System.Net.HttpStatusCode.OK || response.StatusCode == System.Net.HttpStatusCode.OK ||
response.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable, response.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable,
@@ -72,39 +89,19 @@ public class HealthCheckTests : IDisposable
} }
[Fact] [Fact]
public async Task HealthActive_Endpoint_ReturnsResponse() public async Task HealthActive_Endpoint_IsMapped()
{ {
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT"); var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try try
{ {
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central"); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = CreateCentralFactory();
var factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder =>
{
builder.ConfigureAppConfiguration((context, config) =>
{
config.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaBridge:Node:NodeHostname"] = "localhost",
["ScadaBridge:Node:RemotingPort"] = "0",
["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@localhost:2551",
["ScadaBridge:Cluster:SeedNodes:1"] = "akka.tcp://scadabridge@localhost:2552",
["ScadaBridge:Database:SkipMigrations"] = "true",
});
});
builder.UseSetting("ScadaBridge:Node:Role", "Central");
builder.UseSetting("ScadaBridge:Database:SkipMigrations", "true");
});
_disposables.Add(factory);
var client = factory.CreateClient(); var client = factory.CreateClient();
_disposables.Add(client); _disposables.Add(client);
var response = await client.GetAsync("/health/active"); var response = await client.GetAsync("/health/active");
// In test mode, the ActorSystem may not be fully available, Assert.NotEqual(System.Net.HttpStatusCode.NotFound, response.StatusCode);
// so the active-node check returns 503 (Unhealthy).
Assert.True( Assert.True(
response.StatusCode == System.Net.HttpStatusCode.OK || response.StatusCode == System.Net.HttpStatusCode.OK ||
response.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable, response.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable,
@@ -117,46 +114,21 @@ public class HealthCheckTests : IDisposable
} }
[Fact] [Fact]
public async Task HealthReady_Endpoint_ExcludesActiveNodeCheck() public async Task Healthz_LivenessEndpoint_IsMappedAndReturns200()
{ {
// Host-001 regression: /health/ready must reflect cluster membership + DB // New tier added by adopting the shared library: /healthz runs no checks, so it
// connectivity only (REQ-HOST-4a), NOT cluster leadership. The leader-only // returns 200 as long as the process is up — independent of DB / cluster state.
// "active-node" check belongs solely to /health/active. If /health/ready
// included "active-node", a fully operational standby central node would
// permanently report 503, breaking load-balancer failover readiness.
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT"); var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try try
{ {
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central"); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = CreateCentralFactory();
var factory = new WebApplicationFactory<Program>()
.WithWebHostBuilder(builder =>
{
builder.ConfigureAppConfiguration((context, config) =>
{
config.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaBridge:Node:NodeHostname"] = "localhost",
["ScadaBridge:Node:RemotingPort"] = "0",
["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@localhost:2551",
["ScadaBridge:Cluster:SeedNodes:1"] = "akka.tcp://scadabridge@localhost:2552",
["ScadaBridge:Database:SkipMigrations"] = "true",
});
});
builder.UseSetting("ScadaBridge:Node:Role", "Central");
builder.UseSetting("ScadaBridge:Database:SkipMigrations", "true");
});
_disposables.Add(factory);
var client = factory.CreateClient(); var client = factory.CreateClient();
_disposables.Add(client); _disposables.Add(client);
var response = await client.GetAsync("/health/ready"); var response = await client.GetAsync("/healthz");
var body = await response.Content.ReadAsStringAsync();
// The readiness body lists each executed check by name in its entries map. Assert.Equal(System.Net.HttpStatusCode.OK, response.StatusCode);
// The leader-only "active-node" check must not be among them.
Assert.DoesNotContain("active-node", body);
} }
finally finally
{ {
@@ -165,43 +137,54 @@ public class HealthCheckTests : IDisposable
} }
[Fact] [Fact]
public async Task ActiveNodeHealthCheck_SystemNotStarted_ReturnsUnhealthy() public void ReadyTier_Carries_Database_And_AkkaCluster()
{ {
// AkkaHostedService before StartAsync has ActorSystem == null. // Host-001 regression guard: readiness reflects cluster membership + DB connectivity
// The integration test (HealthActive_Endpoint_ReturnsResponse) validates the full // only (REQ-HOST-4a), NOT cluster leadership. The split is now carried by the Ready tag
// endpoint wiring. This test validates the null-system path via WebApplicationFactory // rather than a check-name predicate: database + akka-cluster are Ready-tagged, and the
// where the ActorSystem may not be available. // leader-only active-node check is NOT — so a fully operational standby central node
// still reports ready on /health/ready.
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT"); var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try try
{ {
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central"); Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = new WebApplicationFactory<Program>() var factory = CreateCentralFactory();
.WithWebHostBuilder(builder =>
{
builder.ConfigureAppConfiguration((context, config) =>
{
config.AddInMemoryCollection(new Dictionary<string, string?>
{
["ScadaBridge:Node:NodeHostname"] = "localhost",
["ScadaBridge:Node:RemotingPort"] = "0",
["ScadaBridge:Cluster:SeedNodes:0"] = "akka.tcp://scadabridge@localhost:2551",
["ScadaBridge:Database:SkipMigrations"] = "true",
});
});
builder.UseSetting("ScadaBridge:Node:Role", "Central");
builder.UseSetting("ScadaBridge:Database:SkipMigrations", "true");
});
_disposables.Add(factory);
var client = factory.CreateClient(); var registrations = Registrations(factory).ToDictionary(r => r.Name);
_disposables.Add(client);
var response = await client.GetAsync("/health/active"); Assert.True(registrations.ContainsKey("database"), "Expected a 'database' health check.");
var body = await response.Content.ReadAsStringAsync(); Assert.True(registrations.ContainsKey("akka-cluster"), "Expected an 'akka-cluster' health check.");
// Active-node check returns 503 when ActorSystem is not yet available or not leader Assert.Contains(ZbHealthTags.Ready, registrations["database"].Tags);
Assert.Equal(System.Net.HttpStatusCode.ServiceUnavailable, response.StatusCode); Assert.Contains(ZbHealthTags.Ready, registrations["akka-cluster"].Tags);
Assert.Contains("active-node", body);
// The leader-only active-node check must NOT be on the readiness tier.
Assert.DoesNotContain(ZbHealthTags.Ready, registrations["active-node"].Tags);
}
finally
{
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", previousEnv);
}
}
[Fact]
public void ActiveTier_Carries_Only_ActiveNode()
{
// The active-node leader check carries the Active tag (→ /health/active); the readiness
// checks do not, so /health/active reports leadership alone.
var previousEnv = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT");
try
{
Environment.SetEnvironmentVariable("DOTNET_ENVIRONMENT", "Central");
var factory = CreateCentralFactory();
var registrations = Registrations(factory).ToDictionary(r => r.Name);
Assert.True(registrations.ContainsKey("active-node"), "Expected an 'active-node' health check.");
Assert.Contains(ZbHealthTags.Active, registrations["active-node"].Tags);
Assert.DoesNotContain(ZbHealthTags.Active, registrations["database"].Tags);
Assert.DoesNotContain(ZbHealthTags.Active, registrations["akka-cluster"].Tags);
} }
finally finally
{ {