diff --git a/docs/README.md b/docs/README.md
index d8cb059..c85de84 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -9,10 +9,13 @@ The project was originally called **LmxOpcUa** (a single-driver Galaxy/MXAccess

 ## Platform overview

-- **Core** owns the OPC UA stack, address space, session/security/subscription machinery.
+> **v2 (2026-05-26):** the separate `OtOpcUa.Server` + `OtOpcUa.Admin` services fused into a single role-gated `OtOpcUa.Host` binary, joined by an Akka.NET cluster. See [v2 design](plans/2026-05-26-akka-hosting-alignment-design.md) for the architectural decision.
+
+- **Core** owns shared abstractions (driver capability contracts, scripting, virtual tags, alarm historian).
 - **Drivers** plug in via capability interfaces in `ZB.MOM.WW.OtOpcUa.Core.Abstractions`: `IDriver`, `IReadable`, `IWritable`, `ITagDiscovery`, `ISubscribable`, `IHostConnectivityProbe`, `IAlarmSource`, `IHistoryProvider`, `IPerCallHostResolver`. Each driver opts into whichever it supports.
-- **Server** is the OPC UA endpoint process (net10, AnyCPU). Hosts every driver in-process. The Galaxy driver reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo); it is no longer hosted from this repo.
-- **Admin** is the Blazor Server operator UI (net10, x64). Owns the Config DB draft/publish flow, ACL + role-grant authoring, fleet status + `/metrics` scrape endpoint.
+- **Host** (`src/Server/ZB.MOM.WW.OtOpcUa.Host`) is the single fused binary (.NET 10, AnyCPU). `OTOPCUA_ROLES` env decides what to mount: `admin` (Blazor + control-plane singletons), `driver` (OPC UA endpoint + per-node actors), or both. See [ServiceHosting.md](ServiceHosting.md).
+- **Cluster + ControlPlane + Runtime + AdminUI + Security** sit between Core and Host. The cluster glues per-node actors into one logical fleet; the control-plane singletons (deploy coordinator, audit writer, redundancy state) live on the admin role-leader. See [Redundancy.md](Redundancy.md).
+- The Galaxy driver still reaches MXAccess via gRPC to a separately-installed **mxaccessgw** sidecar (sibling repo).

 ## Where to find what

diff --git a/docs/Redundancy.md b/docs/Redundancy.md
index 6de172f..fbab890 100644
--- a/docs/Redundancy.md
+++ b/docs/Redundancy.md
@@ -1,103 +1,93 @@
-# Redundancy
+# Redundancy (v2)

 ## Overview

-OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two (or more) OtOpcUa Server processes run side-by-side, share the same Config DB, the same driver backends (Galaxy ZB, MXAccess runtime, remote PLCs), and advertise the same OPC UA node tree. Each process owns a distinct `ApplicationUri`; OPC UA clients see both endpoints via the standard `ServerUriArray` and pick one based on the `ServiceLevel` that each server publishes.
+OtOpcUa supports OPC UA **non-transparent** warm/hot redundancy. Two or more `OtOpcUa.Host` processes run side-by-side, share the same Config DB, and join the same Akka.NET cluster. Each process owns a distinct `ApplicationUri`; OPC UA clients see both endpoints via the standard `ServerUriArray` and pick one based on the `ServiceLevel` byte that each server publishes.

-The redundancy surface lives in `src/Server/ZB.MOM.WW.OtOpcUa.Server/Redundancy/`:
+> **v2 change.** v1's operator-managed `ClusterNode.RedundancyRole` column + `RedundancyCoordinator` / `ApplyLeaseRegistry` / `PeerHttpProbeLoop` are gone. Primary/secondary is now derived from **Akka cluster role-leader** for the `driver` role. The operator no longer writes a role into the DB; cluster topology + health drive ServiceLevel automatically.

-| Class | Role |
-|---|---|
-| `RedundancyCoordinator` | Process-singleton; owns the current `RedundancyTopology` loaded from the `ClusterNode` table. `RefreshAsync` re-reads after `sp_PublishGeneration` so operator role swaps take effect without a process restart. CAS-style swap (`Interlocked.Exchange`) means readers always see a coherent snapshot. |
-| `RedundancyTopology` | Immutable `(ClusterId, Self, Peers, ServerUriArray, ValidityFlags)` snapshot. |
-| `ApplyLeaseRegistry` | Tracks in-progress `sp_PublishGeneration` apply leases keyed on `(ConfigGenerationId, PublishRequestId)`. `await using` the disposable scope guarantees every exit path (success / exception / cancellation) decrements the lease; a stale-lease watchdog force-closes any lease older than `ApplyMaxDuration` (default 10 minutes) so a crashed publisher can't pin the node at `PrimaryMidApply`. |
-| `PeerReachabilityTracker` | Maintains last-known reachability for each peer node over two independent probes — OPC UA ping and HTTP `/healthz`. Both must succeed for `peerReachable = true`. |
-| `RecoveryStateManager` | Gates transitions out of the `Recovering*` bands; requires dwell + publish-witness satisfaction before allowing a return to nominal. |
-| `ServiceLevelCalculator` | Pure function `(role, selfHealthy, peerUa, peerHttp, applyInProgress, recoveryDwellMet, topologyValid, operatorMaintenance) → byte`. |
-| `RedundancyStatePublisher` | Orchestrates inputs into the calculator, pushes the resulting byte to the OPC UA `ServiceLevel` variable via an edge-triggered `OnStateChanged` event, and fires `OnServerUriArrayChanged` when the topology's `ServerUriArray` shifts. |
+The runtime pieces live in:

-## Data model
-
-Per-node redundancy state lives in the Config DB `ClusterNode` table (`src/Core/ZB.MOM.WW.OtOpcUa.Configuration/Entities/ClusterNode.cs`):
-
-| Column | Role |
-|---|---|
-| `NodeId` | Unique node identity; matches `Node:NodeId` in the server's bootstrap `appsettings.json`. |
-| `ClusterId` | Foreign key into `ServerCluster`. |
-| `RedundancyRole` | `Primary`, `Secondary`, or `Standalone` (`RedundancyRole` enum in `Configuration/Enums`). |
-| `ServiceLevelBase` | Per-node base value used to bias nominal ServiceLevel output. |
-| `ApplicationUri` | Unique-per-node OPC UA ApplicationUri advertised in endpoint descriptions. |
-
-`ServerUriArray` is derived from the set of peer `ApplicationUri` values at topology-load time and republished when the topology changes.
-
-## ServiceLevel matrix
-
-`ServiceLevelCalculator` produces one of the following bands (see `ServiceLevelBand` enum in the same file):
-
-| Band | Byte | Meaning |
+| Component | Project | Role |
 |---|---|---|
-| `Maintenance` | 0 | Operator-declared maintenance. |
-| `NoData` | 1 | Self-reported unhealthy (`/healthz` fails). |
-| `InvalidTopology` | 2 | More than one Primary detected; both nodes self-demote. |
-| `RecoveringBackup` | 30 | Backup post-fault, dwell not met. |
-| `BackupMidApply` | 50 | Backup inside a publish-apply window. |
-| `IsolatedBackup` | 80 | Primary unreachable; Backup says "take over if asked" — does **not** auto-promote (non-transparent model). |
-| `AuthoritativeBackup` | 100 | Backup nominal. |
-| `RecoveringPrimary` | 180 | Primary post-fault, dwell not met. |
-| `PrimaryMidApply` | 200 | Primary inside a publish-apply window. |
-| `IsolatedPrimary` | 230 | Primary with unreachable peer, retains authority. |
-| `AuthoritativePrimary` | 255 | Primary nominal. |
+| `ServiceLevelCalculator` | `OtOpcUa.ControlPlane.Redundancy` | Pure function `(NodeHealthInputs) → byte`. No side effects. |
+| `RedundancyStateActor` | `OtOpcUa.ControlPlane.Redundancy` | Admin-role cluster singleton; subscribes to cluster topology events, debounces 250ms, broadcasts `RedundancyStateChanged` on the `redundancy-state` DPS topic. |
+| `DbHealthProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; runs `SELECT 1` against ConfigDb every 5s. Read by health endpoint + redundancy calc. |
+| `PeerOpcUaProbeActor` | `OtOpcUa.Runtime.Health` | Per-node; pings peer `opc.tcp://peer:4840` (real probe call is staged for follow-up F12). |
+| `ClusterRoleInfo` | `OtOpcUa.Cluster` | Live view of cluster membership + role-leader; exposes `IClusterRoleInfo` to the rest of the host. |

-The reserved bands (0 Maintenance, 1 NoData, 2 InvalidTopology) take precedence over operational states per OPC UA Part 5 §6.3.34. Operational values occupy 2..255 so spec-compliant clients that treat "<3 = unhealthy" keep working.
+## ServiceLevel tiers (Part 5 §6.5)

-Standalone nodes (single-instance deployments) report `AuthoritativePrimary` when healthy and `PrimaryMidApply` during publish.
+`ServiceLevelCalculator.Compute(NodeHealthInputs)` returns a byte in 0..255 by tier:

-## Publish fencing and split-brain prevention
+| Tier | Byte | Condition |
+|---|---|---|
+| Down | 0 | Member status is not `Up` or `Joining` (leaving, removed, exiting). |
+| Critically degraded | 100 | ConfigDb unreachable AND data is stale. |
+| Stale | 200 | Data stale but ConfigDb reachable. |
+| Healthy follower | 240 | DB ok + OPC UA probe ok + not stale. |
+| Healthy leader | 250 | Healthy + this node is the `driver` role-leader. |

-Any Admin-triggered `sp_PublishGeneration` acquires an apply lease through `ApplyLeaseRegistry.BeginApplyLease`. While the lease is held:
+Drivers write their computed byte into the OPC UA `ServiceLevel` Variable on each refresh. Clients with the standard redundancy heuristic ("pick the highest ServiceLevel") therefore prefer the role-leader and fall back to followers on its degradation.

-- The calculator reports `PrimaryMidApply` / `BackupMidApply` — clients see the band shift and cut over to the unaffected peer rather than racing against a half-applied generation.
-- `RedundancyCoordinator.RefreshAsync` is called at the end of the apply window so the post-publish topology becomes visible exactly once, atomically.
-- The watchdog force-closes any lease older than `ApplyMaxDuration`; a stuck publisher therefore cannot strand a node at `PrimaryMidApply`.
+## Data flow

-Because role transitions are **operator-driven** (write `RedundancyRole` in the Config DB + publish), the Backup never auto-promotes. An `IsolatedBackup` at 80 is the signal that the operator should intervene; auto-failover is intentionally out of scope for the non-transparent model (decision #154).
+```
+Cluster topology event ──┐
+DB health probe ─────────┤
+OPC UA peer probe ───────┤
+                         ▼
+         RedundancyStateActor (admin singleton)
+                         │  debounce 250ms
+                         ▼
+              DPS topic "redundancy-state"
+                         │
+                         ▼
+          Driver nodes' OpcUaPublishActor
+                         │
+                         ▼
+           ServiceLevelCalculator → byte
+                         │
+                         ▼
+           OPC UA ServiceLevel Variable
+```

-## Metrics
+The admin singleton is the cluster's only `RedundancyStateActor`. If the admin leader fails over, the new admin node spins up its replacement, re-subscribes to cluster events, and publishes a fresh snapshot from the current `Cluster.State`. There is no DB-persisted state to recover.

-`RedundancyMetrics` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Services/RedundancyMetrics.cs` registers the `ZB.MOM.WW.OtOpcUa.Redundancy` meter on the Admin process. Instruments:
+## Configuration

-| Name | Kind | Tags | Description |
-|---|---|---|---|
-| `otopcua.redundancy.role_transition` | Counter | `cluster.id`, `node.id`, `from_role`, `to_role` | Incremented every time `FleetStatusPoller` observes a `RedundancyRole` change on a `ClusterNode` row. |
-| `otopcua.redundancy.primary_count` | ObservableGauge | `cluster.id` | Primary-role nodes per cluster — should be exactly 1 in nominal state. |
-| `otopcua.redundancy.secondary_count` | ObservableGauge | `cluster.id` | Secondary-role nodes per cluster. |
-| `otopcua.redundancy.stale_count` | ObservableGauge | `cluster.id` | Nodes whose `LastSeenAt` exceeded the stale threshold. |
+Per-node identity comes from `appsettings.json` + the `OTOPCUA_ROLES` env var:

-Admin `Program.cs` wires OpenTelemetry to the Prometheus exporter when `Metrics:Prometheus:Enabled=true` (default), exposing the meter under `/metrics`. The endpoint is intentionally unauthenticated — fleet conventions put it behind a reverse-proxy basic-auth gate if needed.
+```json
+{
+  "Cluster": {
+    "Hostname": "0.0.0.0",
+    "Port": 4053,
+    "PublicHostname": "node-a.lan",
+    "SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
+    "Roles": ["admin", "driver"]
+  }
+}
+```

-## Real-time notifications (Admin UI)
+```
+OTOPCUA_ROLES=admin,driver
+```

-`FleetStatusPoller` in `src/Server/ZB.MOM.WW.OtOpcUa.Admin/Hubs/` polls the `ClusterNode` table, records role transitions, updates `RedundancyMetrics.SetClusterCounts`, and pushes a `RoleChanged` SignalR event onto `FleetStatusHub` when a transition is observed. `RedundancyTab.razor` subscribes with `_hub.On("RoleChanged", …)` so connected Admin sessions see role swaps the moment they happen.
+Both nodes share the same `ConfigDb` connection string; `Cluster.PublicHostname` + `Roles` are what makes them distinct in cluster gossip. The first node bootstraps the cluster (its address goes in `SeedNodes`); the second node joins via the same `SeedNodes` list.

-## Configuring a redundant pair
+There is no longer a `Node:NodeId` setting, no `ClusterNode.RedundancyRole`, no `ServiceLevelBase`. NodeId is derived as `host:port` of the cluster `PublicHostname` (see `ClusterRoleInfo.LocalNode` for the formula).

-Redundancy is configured **in the Config DB, not appsettings.json**. The fields that must differ between the two instances:
+## Split-brain

-| Field | Location | Instance 1 | Instance 2 |
-|---|---|---|---|
-| `NodeId` | `appsettings.json` `Node:NodeId` (bootstrap) | `node-a` | `node-b` |
-| `ClusterNode.ApplicationUri` | Config DB | `urn:node-a:OtOpcUa` | `urn:node-b:OtOpcUa` |
-| `ClusterNode.RedundancyRole` | Config DB | `Primary` | `Secondary` |
-| `ClusterNode.ServiceLevelBase` | Config DB | typically 255 | typically 100 |
+`akka.conf` configures Akka's split-brain resolver with `active-strategy = keep-oldest`, `stable-after = 15s`, and `failure-detector.threshold = 10.0`. Under a clean partition: the oldest member stays up + the smaller (or younger) side downs itself within ~15 seconds. The `RedundancyStateActor` on the surviving partition re-computes from the post-partition `Cluster.State`.

-Shared between instances: `ClusterId`, Config DB connection string, published generation, cluster-level ACLs, UNS hierarchy, driver instances.
-
-Role swaps, stand-alone promotions, and base-level adjustments all happen through the Admin UI `RedundancyTab` — the operator edits the `ClusterNode` row in a draft generation and publishes. `RedundancyCoordinator.RefreshAsync` picks up the new topology without a process restart.
+There is no operator-driven role swap during a partition. Failover is what the cluster does automatically.

 ## Client-side failover

-The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md) for the command reference.
+The OtOpcUa Client CLI at `src/Client/ZB.MOM.WW.OtOpcUa.Client.CLI` supports `-F` / `--failover-urls` for automatic client-side failover; for long-running subscriptions the CLI monitors session KeepAlive and reconnects to the next available server, recreating the subscription on the new endpoint. See [`Client.CLI.md`](Client.CLI.md).

 ## Depth reference

-For the full decision trail and implementation plan — topology invariants, peer-probe cadence, recovery-dwell policy, compliance-script guard against enum-value drift — see `docs/v2/plan.md` §Phase 6.3.
+For the full design — message contracts, tiered calculator truth table, recovery semantics — see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §6.
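Reviewer sketch (not part of the patch): the ServiceLevel tier table added to `docs/Redundancy.md` can be read as a small pure function. The Python below is a language-neutral illustration only, not the project's `ServiceLevelCalculator`. The field names on `NodeHealthInputs` are assumptions, since the diff does not show the real record shape, and the sketch returns `None` for input combinations the table does not cover (for example fresh data with an unreachable ConfigDb) rather than guessing a tier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeHealthInputs:
    """Hypothetical input record; field names are illustrative assumptions."""
    member_status: str           # Akka member status, e.g. "Up", "Joining", "Leaving"
    db_reachable: bool           # result of the ConfigDb health probe
    opcua_probe_ok: bool         # result of the peer OPC UA probe
    data_stale: bool             # whether the node's data is considered stale
    is_driver_role_leader: bool  # whether this node leads the `driver` role


def compute_service_level(inputs: NodeHealthInputs) -> Optional[int]:
    """Map health inputs to a ServiceLevel byte following the tier table."""
    # Down: member status is neither Up nor Joining.
    if inputs.member_status not in ("Up", "Joining"):
        return 0
    if inputs.data_stale:
        # Stale (200) when ConfigDb is reachable, critically degraded (100) otherwise.
        return 200 if inputs.db_reachable else 100
    if inputs.db_reachable and inputs.opcua_probe_ok:
        # Healthy: the driver role-leader advertises a higher byte than followers.
        return 250 if inputs.is_driver_role_leader else 240
    # The tier table does not say what to publish here, so the sketch does not guess.
    return None
```

A client applying the "pick the highest ServiceLevel" heuristic would then prefer a node reporting 250 over one reporting 240, which matches the leader-first behaviour the patch describes.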
diff --git a/docs/ServiceHosting.md b/docs/ServiceHosting.md
index 2a28a11..df0ff98 100644
--- a/docs/ServiceHosting.md
+++ b/docs/ServiceHosting.md
@@ -1,62 +1,76 @@
-# Service Hosting
+# Service Hosting (v2)

 ## Overview

-A production OtOpcUa deployment runs **two or three processes**, each
-with a distinct runtime and install surface:
+A production OtOpcUa deployment runs **one binary per node**, plus the optional Wonderware historian sidecar:

 | Process | Project | Runtime | Platform | Responsibility |
 |---|---|---|---|---|
-| **OtOpcUa Server** | `src/Server/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every driver in-process (Modbus, S7, AbCip, AbLegacy, TwinCAT, FOCAS, OPC UA Client, Galaxy via mxaccessgw); exposes `/healthz`. |
-| **OtOpcUa Admin** | `src/Server/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
-| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true` in `appsettings.json`. |
+| **OtOpcUa Host** | `src/Server/ZB.MOM.WW.OtOpcUa.Host` | .NET 10 | AnyCPU | Single fused binary. `OTOPCUA_ROLES` env decides what to mount: `admin` (Blazor + auth + control-plane singletons), `driver` (OPC UA endpoint + per-driver actors), or both. |
+| **OtOpcUa Wonderware Historian** *(optional)* | `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware` | .NET Framework 4.8 | x86 (32-bit) | Out-of-process sidecar exposing the Wonderware Historian SDK over a named pipe. Required only when `Historian:Wonderware:Enabled=true`. |

-Galaxy access uses a separately-installed **mxaccessgw** running out
-of a sibling repo (`c:\Users\dohertj2\Desktop\mxaccessgw\`) — see
-`docs/v2/Galaxy.ParityRig.md` for setup. The mxaccessgw owns the
-MXAccess COM bitness constraint (its worker is x86 net48); nothing
-in the OtOpcUa repo carries that constraint anymore. PR 7.2 retired
-the legacy in-process `Galaxy.Host` / `Galaxy.Proxy` / `Galaxy.Shared`
-projects + the `OtOpcUaGalaxyHost` Windows service.
+Galaxy access still uses the separately-installed **mxaccessgw** sidecar (see `docs/v2/Galaxy.ParityRig.md`); the gateway owns the MXAccess COM bitness constraint (its worker is x86 net48). Nothing in the OtOpcUa repo carries that constraint anymore.

-## OtOpcUa Server
+> **v2 change.** v1's separate `OtOpcUa.Server` + `OtOpcUa.Admin` Windows services merged into a single role-gated `OtOpcUa.Host` binary. Two installers became one (with a `-Roles` parameter). The whole DI graph is composed in `OtOpcUa.Host/Program.cs`; per-role wiring is conditional on the env var.

-Hosted via `Microsoft.Extensions.Hosting` with `AddWindowsService`
-(decision #30 — replaced TopShelf in v2). The host's `Build()`
-returns immediately when launched interactively (e.g. `dotnet run`)
-but blocks for SCM signals when running as a Windows service.
+## Role gating

-In-process drivers are registered at startup in `Program.cs`'s
-`DriverFactoryRegistry` block; the `DriverInstance` rows in the
-central Config DB select which driver factories materialise into
-live `IDriver` instances. See `docs/v2/driver-specs.md` for the
-per-driver `DriverConfig` JSON shapes.
+`Program.cs` reads `OTOPCUA_ROLES`, parses it with `RoleParser`, and conditionally registers services:

-## OtOpcUa Admin
+| Role present | Wires |
+|---|---|
+| `admin` | `AddOtOpcUaAuth`, `AddAdminUI`, `AddSignalR`, `AddOtOpcUaAdminClients`, `MapOtOpcUaAuth`, `MapAdminUI`, `MapOtOpcUaHubs`, `WithOtOpcUaControlPlaneSingletons` (5 admin singletons via `Akka.Hosting`) |
+| `driver` | `WithOtOpcUaRuntimeActors` (DriverHostActor + DbHealthProbeActor) — and the OPC UA endpoint on port 4840 |
+| Either / both | `AddOtOpcUaConfigDb`, `AddOtOpcUaCluster`, `AddOtOpcUaHealth` (`/health/ready`, `/health/active`, `/healthz`) |

-Same hosting model; runs the Blazor Server UI + SignalR hubs.
-Reads from the same Config DB the Server writes to.
+Single-node dev: `OTOPCUA_ROLES=admin,driver`. Production: typically two admin nodes (HA pair) + N driver nodes.
+
+## Akka cluster
+
+The host joins an Akka.NET cluster bound to the address in `appsettings.json::Cluster`:
+
+```json
+{
+  "Cluster": {
+    "Hostname": "0.0.0.0",
+    "Port": 4053,
+    "PublicHostname": "node-a.lan",
+    "SeedNodes": ["akka.tcp://otopcua@node-a.lan:4053"],
+    "Roles": ["admin", "driver"]
+  }
+}
+```
+
+- `WithOtOpcUaClusterBootstrap` (in `OtOpcUa.Cluster`) loads the embedded HOCON (split-brain resolver, pinned dispatcher, failure detector tuning) and overlays remote endpoint + cluster options.
+- All cluster singletons + per-node actors live on this single ActorSystem — there is no second Akka instance.
+
+See [Redundancy.md](Redundancy.md) for the role-leader + ServiceLevel story.
+
+## Health endpoints
+
+Both admin and driver nodes expose:
+
+| Path | Status meaning |
+|---|---|
+| `/healthz` | Process alive. |
+| `/health/ready` | ConfigDb reachable + cluster member state is `Up`. |
+| `/health/active` | Admin-role leader (the node Traefik or an HA LB should route traffic to). |
+
+Used by Traefik for the active-leader-only routing pattern (see [Task 63 traefik docs](v2/Architecture-v2.md) — TODO).
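Reviewer sketch (not part of the patch): the three health endpoints added above can be combined into a single routing decision. The Python below is a hedged illustration. It assumes an HTTP 200 response means the check passed, which the diff does not state, and the state labels (`down`, `not-ready`, `standby`, `active`) are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request


def endpoint_ok(base_url: str, path: str, timeout: float = 2.0) -> bool:
    """Return True when GET base_url+path answers HTTP 200 (assumed pass signal)."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as a failure.
        return False


def classify_node(healthz_ok: bool, ready_ok: bool, active_ok: bool) -> str:
    """Combine the three endpoint results into one illustrative state label."""
    if not healthz_ok:
        return "down"        # process not answering at all
    if not ready_ok:
        return "not-ready"   # alive, but ConfigDb or cluster membership not ready
    # Ready nodes are either the admin-role leader or a standby.
    return "active" if active_ok else "standby"


def probe(base_url: str) -> str:
    """Probe one host and classify it; performs real HTTP requests."""
    return classify_node(
        endpoint_ok(base_url, "/healthz"),
        endpoint_ok(base_url, "/health/ready"),
        endpoint_ok(base_url, "/health/active"),
    )
```

Under these assumptions a load balancer would route operator traffic only to nodes classified as `active`, which is the active-leader-only pattern the patch mentions.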
 ## OtOpcUa Wonderware Historian (optional)

-When `Historian:Wonderware:Enabled=true`, the Server speaks to a
-sidecar that wraps the Wonderware Historian SDK (which is .NET
-Framework only). The pipe IPC contract is in
-`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`
-and the sidecar's pipe handler lives at
-`src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`.
-
-Install via the `-InstallWonderwareHistorian` switch on
-`scripts/install/Install-Services.ps1`.
+Unchanged from v1. Pipe IPC contract lives in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware.Client/Contracts/`; sidecar pipe handler in `src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Historian.Wonderware/Pipe/`. Install via `scripts/install/Install-Services.ps1 -InstallWonderwareHistorian`.

 ## Install / Uninstall

-- `scripts/install/Install-Services.ps1` — installs `OtOpcUa` and
-  optionally `OtOpcUaWonderwareHistorian`.
-- `scripts/install/Uninstall-Services.ps1` — stops + removes both,
-  plus `OtOpcUaGalaxyHost` if a pre-7.2 rig still carries it.
+- `scripts/install/Install-Services.ps1 -Roles admin,driver` — installs `OtOpcUaHost`. v2 rewrite tracked as plan Task 62.
+- `scripts/install/Uninstall-Services.ps1` — stops + removes the host service (and the historian sidecar if installed).

 ## Logging

-Serilog with rolling-daily file sinks. Each service writes to
-`%ProgramData%\OtOpcUa\-*.log` plus stdout (NSSM-friendly).
+Serilog with rolling-daily file sinks. Each host writes to `logs/otopcua-*.log` plus stdout (NSSM/systemd-friendly). Per-environment log level overrides go in `appsettings.{Environment}.json`.
+
+## Depth reference
+
+For the full host-architecture rationale (why fused vs. split, role-gating tradeoffs, multi-node deployment shapes), see `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §3-4.
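Reviewer sketch (not part of the patch): the role gating table in `docs/ServiceHosting.md` amounts to parsing a comma-separated role list and wiring registrations per role. The Python below illustrates that idea only. It is not the project's `RoleParser`, whose behaviour the diff does not show, so trimming, lower-casing, and rejecting unknown or empty role lists are all assumptions made for this example. The registration names are copied from the table and used purely as labels.

```python
KNOWN_ROLES = frozenset({"admin", "driver"})


def parse_roles(raw: str) -> frozenset:
    """Parse a value such as 'admin,driver' into a set of role names."""
    # Assumption: whitespace and letter case are not significant.
    roles = frozenset(part.strip().lower() for part in raw.split(",") if part.strip())
    if not roles:
        raise ValueError("no roles given")
    unknown = roles - KNOWN_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return roles


def registrations_for(roles: frozenset) -> list:
    """List which registrations from the role gating table would be wired."""
    # Wired regardless of role ("Either / both" row).
    wired = ["AddOtOpcUaConfigDb", "AddOtOpcUaCluster", "AddOtOpcUaHealth"]
    if "admin" in roles:
        wired += [
            "AddOtOpcUaAuth",
            "AddAdminUI",
            "AddSignalR",
            "AddOtOpcUaAdminClients",
            "WithOtOpcUaControlPlaneSingletons",
        ]
    if "driver" in roles:
        wired.append("WithOtOpcUaRuntimeActors")
    return wired
```

For a single-node development setup, `registrations_for(parse_roles("admin,driver"))` lists the shared, admin, and driver registrations together, mirroring the `OTOPCUA_ROLES=admin,driver` case in the patch.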
diff --git a/docs/security.md b/docs/security.md
index e8d4bb3..ab2a450 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -1,5 +1,19 @@
 # Security

+> **v2 status (2026-05-26).** The four security concerns below are unchanged in v2.
+> Paths + project names moved: `OtOpcUa.Server/Security/` → `OtOpcUa.Security/`
+> (`Ldap/`, `Jwt/`, `Endpoints/AuthEndpoints.cs`), `OtOpcUa.Admin` is gone (its
+> auth + role-grant pages live in `OtOpcUa.AdminUI`), and Admin auth policies
+> register in `OtOpcUa.Host/Program.cs` via `AddOtOpcUaAuth` rather than in a
+> separate Admin process. The v2 `Security:Jwt` section adds JWT bearer auth
+> alongside the existing cookie scheme (`AddJwtBearer` wired via
+> `IPostConfigureOptions` in `OtOpcUa.Security`). DataProtection
+> keys persist to the shared `ConfigDb.DataProtectionKeys` table so cookies
+> survive failover between admin-role nodes.
+>
+> See `docs/plans/2026-05-26-akka-hosting-alignment-design.md` §5 for the v2
+> auth + DataProtection rationale.
+
 OtOpcUa has four independent security concerns. This document covers all four:

 1. **Transport security** — OPC UA secure channel (signing, encryption, X.509 trust).