Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md: replaced appsettings-only story with the two-layer model. appsettings.json is bootstrap only (Node identity, Config DB connection string, transport security, LDAP bind, logging). Authoritative config (clusters, namespaces, UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI draft/publish workflow. Added v1-to-v2 migration index so operators can locate where each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.
- docs/Redundancy.md: Phase 6.3 rewrite. Named every class under src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology, ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager, ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255) from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole, ServiceLevelBase, ApplicationUri). Covered metrics (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from FleetStatusPoller to RedundancyTab.razor.
- docs/security.md: preserved the transport-security section (still accurate) and added Phase 6.2 authorization. Four concerns now documented in one place: (1) transport security profiles; (2) OPC UA auth via LdapUserAuthenticator (note: task spec called this LdapAuthenticationProvider; actual class name is LdapUserAuthenticator in Server/Security/); (3) data-plane authorization via NodeAcl + PermissionTrie + AuthorizationGate (additive-only model per decision #129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy, NodePermissions bundle, PermissionProbeService in Admin for "probe this permission"); (4) control-plane authorization via LdapGroupRoleMapping + AdminRole (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies), deliberately independent of data-plane ACLs per decision #150. Documented the OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.
- docs/ServiceHosting.md: three-process rewrite: OtOpcUa Server (net10 x64, BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86, NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Pipe ACL denies-Admins detail + non-elevated shell requirement captured from feedback memory. Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf; decision #30 replaced it with the generic host's AddWindowsService wrapper (per the doc comment on OpcUaServerService). Reflected the actual state + flagged this divergence here so someone can update CLAUDE.md separately.
- docs/StatusDashboard.md: replaced the full v1 reference (dashboard endpoints, health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI" pointer that preserves git-blame continuity + avoids broken links from other docs that reference it.

Class references verified by reading:

- src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator, ApplyLeaseRegistry, RedundancyStatePublisher}.cs
- src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder, PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
- src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
- src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs, Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
- src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
- src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
- src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl, LdapGroupRoleMapping}.cs
- src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Service Hosting

## Overview

A production OtOpcUa deployment runs **three processes**, each with a distinct runtime, platform target, and install surface:

| Process | Project | Runtime | Platform | Responsibility |
|---|---|---|---|---|
| **OtOpcUa Server** | `src/ZB.MOM.WW.OtOpcUa.Server` | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every non-Galaxy driver in-process; exposes `/healthz`. |
| **OtOpcUa Admin** | `src/ZB.MOM.WW.OtOpcUa.Admin` | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (`FleetStatusHub`, `AlertHub`), Prometheus `/metrics`. |
| **OtOpcUa Galaxy.Host** | `src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host` | .NET Framework 4.8 | x86 (32-bit) | Hosts MXAccess COM on a dedicated STA thread with a Win32 message pump; exposes a named-pipe IPC surface consumed by `Driver.Galaxy.Proxy` inside the Server process. |

The x86 / .NET Framework 4.8 constraint applies **only** to Galaxy.Host because the MXAccess toolkit DLLs (`Program Files (x86)\ArchestrA\Framework\bin`) are 32-bit-only COM. Every other driver (Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS) runs in-process in the 64-bit Server.

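Because the bitness split matters at load time, a process can assert its own bitness at startup so that a mis-targeted build fails fast. The following is a minimal sketch, not code from the repository; `PlatformGuard` and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper (not from the OtOpcUa source): fail fast when a process
// is started with the wrong bitness for its role.
public static class PlatformGuard
{
    // Pure check, kept separate so it can be tested without a real x86 process.
    public static bool IsExpectedBitness(bool is64BitProcess, bool expect64Bit)
        => is64BitProcess == expect64Bit;

    public static void EnsureBitness(bool expect64Bit)
    {
        if (!IsExpectedBitness(Environment.Is64BitProcess, expect64Bit))
        {
            string expected = expect64Bit ? "x64" : "x86";
            throw new PlatformNotSupportedException(
                "This process must run as " + expected + "; check the project's PlatformTarget.");
        }
    }
}
```

Under this sketch, Galaxy.Host would call `PlatformGuard.EnsureBitness(expect64Bit: false)` before touching MXAccess, while the Server and Admin would pass `true`.
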
## Server process

`src/ZB.MOM.WW.OtOpcUa.Server/Program.cs` uses the generic host:

```csharp
var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSerilog();
// Windows-service lifetime; only takes effect when the process is started by the SCM.
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");
// … (remaining registrations elided in this excerpt)
builder.Services.AddHostedService<OpcUaServerService>();
builder.Services.AddHostedService<HostStatusPublisher>();
```

`OpcUaServerService` is a `BackgroundService` (decision #30: TopShelf from v1 was replaced by the generic-host `AddWindowsService` wrapper; no TopShelf dependency remains in any csproj). It owns:

1. Config bootstrap: reads `Node:NodeId`, `Node:ClusterId`, `Node:ConfigDbConnectionString`, `Node:LocalCachePath` from `appsettings.json`.
2. `NodeBootstrap`: pulls the latest published generation from the Config DB into the LiteDB local cache (`LiteDbConfigCache`) so the node starts even if the central DB is briefly unreachable.
3. `DriverHost`: instantiates configured driver instances from the generation and wires each through `CapabilityInvoker` resilience pipelines.
4. `OpcUaApplicationHost`: builds the OPC UA endpoint, applies `OpcUaServerOptions` + `LdapOptions`, registers `AuthorizationGate` at dispatch.
5. `HostStatusPublisher`: a second hosted service that heartbeats `DriverHostStatus` rows so the Admin UI Fleet view sees the node.

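The "start even if the central DB is briefly unreachable" behaviour in step 2 can be sketched as a fetch-then-fall-back routine. This is an illustration under assumptions, not the actual `NodeBootstrap` code: the delegates stand in for the Config DB query and the LiteDB cache, and every name below is hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of "central DB first, local cache as fallback".
// None of these names are the repository's actual API.
public static class GenerationBootstrapSketch
{
    public static async Task<T> LoadAsync<T>(
        Func<Task<T>> fetchFromConfigDb,
        Func<T?> readFromLocalCache,
        Action<T> writeToLocalCache) where T : class
    {
        try
        {
            T latest = await fetchFromConfigDb();
            writeToLocalCache(latest);   // refresh the cache on every successful pull
            return latest;
        }
        catch (Exception ex)
        {
            // Assumption: any failure counts as "DB unreachable". Real code would
            // likely catch only connectivity and timeout errors.
            T? cached = readFromLocalCache();
            if (cached is null)
                throw new InvalidOperationException(
                    "Config DB unreachable and no cached generation is available.", ex);
            return cached;               // start from the last cached generation
        }
    }
}
```

The design point is that the cache is written only after a successful pull, so a node that has never reached the Config DB fails loudly instead of starting with an empty configuration.
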
### Installation

The same executable runs either as a console app or as a Windows service, courtesy of the .NET generic-host `AddWindowsService` wrapper:

| Mode | Invocation |
|---|---|
| Console | `ZB.MOM.WW.OtOpcUa.Server.exe` |
| Install as Windows service | `sc create OtOpcUa binPath= "C:\Program Files\OtOpcUa\Server\ZB.MOM.WW.OtOpcUa.Server.exe" start= auto` |
| Start | `sc start OtOpcUa` |
| Stop | `sc stop OtOpcUa` |
| Uninstall | `sc delete OtOpcUa` |

### Health endpoints

The Server exposes `/healthz` and `/readyz`. They are used by (a) the Admin `FleetStatusPoller` as input to Fleet status and (b) `PeerReachabilityTracker` in a peer Server process as the HTTP side of the peer-reachability probe.

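A caller of these endpoints might look like the sketch below. It is an illustration only: the response contract of `/healthz` and `/readyz` is not specified here, so the sketch assumes that any 2xx status means "up", and `HealthProbeSketch` is a hypothetical name rather than the code used by `FleetStatusPoller` or `PeerReachabilityTracker`.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical probe: treats any 2xx response as "up" (an assumption).
public static class HealthProbeSketch
{
    public static async Task<bool> IsUpAsync(
        HttpClient client, Uri baseAddress, string path, CancellationToken ct = default)
    {
        try
        {
            using HttpResponseMessage response =
                await client.GetAsync(new Uri(baseAddress, path), ct);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false;   // connection refused, DNS failure, and similar
        }
        catch (TaskCanceledException)
        {
            return false;   // HttpClient timeout or caller cancellation
        }
    }
}
```

For example, `await HealthProbeSketch.IsUpAsync(http, new Uri("http://localhost:5000"), "/readyz")`, where the host and port are placeholders for whatever address the Server actually listens on.
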
## Admin process

`src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs` is a stock `WebApplication`. Highlights:

- Cookie auth (`CookieAuthenticationDefaults`, scheme name `OtOpcUa.Admin`) + Blazor Server (`AddInteractiveServerComponents`) + SignalR.
- Authorization policies gated by `AdminRoles`: `ConfigViewer`, `ConfigEditor`, `FleetAdmin` (see `Services/AdminRoles.cs`). The `CanEdit` policy requires `ConfigEditor` or `FleetAdmin`; `CanPublish` requires `FleetAdmin`.
- `OtOpcUaConfigDbContext` registered against `ConnectionStrings:ConfigDb`.
- Scoped services: `ClusterService`, `GenerationService`, `EquipmentService`, `UnsService`, `NamespaceService`, `DriverInstanceService`, `NodeAclService`, `PermissionProbeService`, `AclChangeNotifier`, `ReservationService`, `DraftValidationService`, `AuditLogService`, `HostStatusService`, `ClusterNodeService`, `EquipmentImportBatchService`, `ILdapGroupRoleMappingService`.
- Singleton `RedundancyMetrics` (meter name `ZB.MOM.WW.OtOpcUa.Redundancy`) + `CertTrustService` (promotes rejected client certs in the Server's PKI store to trusted via the Admin Certificates page).
- `LdapAuthService` bound to `Authentication:Ldap`, using the same LDAP flow as ScadaLink CentralUI for visual parity.
- SignalR hubs mapped at `/hubs/fleet` and `/hubs/alerts`; `FleetStatusPoller` runs as a hosted service and pushes `RoleChanged`, host status, and alert events.
- OpenTelemetry → Prometheus exporter at `/metrics` when `Metrics:Prometheus:Enabled=true` (default). Because the exporter is pull-based, no Collector is required in the common K8s deployment.

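The `CanEdit` / `CanPublish` rules in the list above can be written as two pure predicates over a `ClaimsPrincipal`. This is a sketch for reasoning about the role matrix; it is not how the Admin app registers its ASP.NET Core policies, and `AdminPolicySketch` is a hypothetical class. Only the role and policy names come from the text.

```csharp
using System.Linq;
using System.Security.Claims;

// Sketch of the role-to-policy rules as pure functions (hypothetical class).
public static class AdminPolicySketch
{
    public const string ConfigViewer = "ConfigViewer";
    public const string ConfigEditor = "ConfigEditor";
    public const string FleetAdmin = "FleetAdmin";

    // CanEdit: ConfigEditor or FleetAdmin.
    public static bool CanEdit(ClaimsPrincipal user)
        => user.IsInRole(ConfigEditor) || user.IsInRole(FleetAdmin);

    // CanPublish: FleetAdmin only.
    public static bool CanPublish(ClaimsPrincipal user)
        => user.IsInRole(FleetAdmin);

    // Helper for experiments: an authenticated principal carrying the given roles.
    public static ClaimsPrincipal WithRoles(params string[] roles)
        => new ClaimsPrincipal(new ClaimsIdentity(
            roles.Select(r => new Claim(ClaimTypes.Role, r)), "sketch"));
}
```

Under these rules a `ConfigViewer` can neither edit nor publish, a `ConfigEditor` can edit drafts but not publish them, and a `FleetAdmin` can do both.
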
### Installation

Deployed as an ASP.NET Core service; the generic-host `AddWindowsService` wrapper (or IIS reverse-proxy for multi-node fleets) provides install/uninstall. It listens on the addresses specified by `ASPNETCORE_URLS`.

## Galaxy.Host process

`src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Program.cs` is a .NET Framework 4.8 x86 console executable. Configuration comes from environment variables supplied by the supervisor (`Driver.Galaxy.Proxy.Supervisor`):

| Env var | Purpose |
|---|---|
| `OTOPCUA_GALAXY_PIPE` | Pipe name the host listens on (default `OtOpcUaGalaxy`). |
| `OTOPCUA_ALLOWED_SID` | SID of the Server process's principal; anyone else is refused during the handshake. |
| `OTOPCUA_GALAXY_SECRET` | Per-spawn shared secret the client must present in the Hello frame. |
| `OTOPCUA_GALAXY_BACKEND` | `mxaccess` (default), `db` (ZB-only, no COM), `stub` (in-memory; for tests). |
| `OTOPCUA_GALAXY_ZB_CONN` | SQL connection string to the ZB Galaxy repository. |
| `OTOPCUA_HISTORIAN_*` | Optional Wonderware Historian SDK config if Historian is enabled for this node. |

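Reading the variables in the table above might look like the following sketch, written in C# 7.3 style so it would compile for .NET Framework 4.8. The variable names and the two documented defaults come from the table; the class, its property names, the decision to treat the SID and secret as required, and the case-sensitive backend check are all assumptions made for illustration.

```csharp
using System;

// Hypothetical reader for the Galaxy.Host environment block.
public sealed class GalaxyHostEnvSketch
{
    public string PipeName { get; private set; }
    public string AllowedSid { get; private set; }
    public string Secret { get; private set; }
    public string Backend { get; private set; }

    // 'lookup' is injected so tests need not mutate the process environment;
    // a real host would pass Environment.GetEnvironmentVariable.
    public static GalaxyHostEnvSketch Read(Func<string, string> lookup)
    {
        string backend = OrDefault(lookup("OTOPCUA_GALAXY_BACKEND"), "mxaccess");
        if (backend != "mxaccess" && backend != "db" && backend != "stub")
            throw new InvalidOperationException("Unknown OTOPCUA_GALAXY_BACKEND: " + backend);

        return new GalaxyHostEnvSketch
        {
            PipeName = OrDefault(lookup("OTOPCUA_GALAXY_PIPE"), "OtOpcUaGalaxy"),
            AllowedSid = Required(lookup, "OTOPCUA_ALLOWED_SID"),
            Secret = Required(lookup, "OTOPCUA_GALAXY_SECRET"),
            Backend = backend,
        };
    }

    private static string OrDefault(string value, string fallback)
        => string.IsNullOrWhiteSpace(value) ? fallback : value;

    private static string Required(Func<string, string> lookup, string name)
    {
        string value = lookup(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException(name + " must be set by the supervisor.");
        return value;
    }
}
```

A host entry point could then call `GalaxyHostEnvSketch.Read(Environment.GetEnvironmentVariable)` once at startup and refuse to continue if it throws.
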
The host spins up `StaPump` (the STA thread with message pump), creates the MXAccess `LMXProxyServer` COM object on that thread, and handles all COM calls there; the IPC layer marshals work items via `PostThreadMessage`.

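The "one dedicated thread owns every call" idea can be shown with a simplified stand-in. Note the substitution: this sketch queues work through a `BlockingCollection` instead of a Win32 message pump and `PostThreadMessage`, so it does not dispatch window messages and would not be sufficient for COM objects that depend on them. `SingleThreadWorkerSketch` is a hypothetical name, not the repository's `StaPump`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in for an STA worker: every work item runs on one thread.
public sealed class SingleThreadWorkerSketch : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadWorkerSketch()
    {
        _thread = new Thread(() =>
        {
            foreach (Action work in _queue.GetConsumingEnumerable())
                work();
        });
        _thread.IsBackground = true;
        // STA apartments are only supported on Windows.
        if (Environment.OSVersion.Platform == PlatformID.Win32NT)
            _thread.SetApartmentState(ApartmentState.STA);
        _thread.Start();
    }

    public int ThreadId => _thread.ManagedThreadId;

    // Queues 'func' onto the worker thread and returns a task for its result.
    public Task<T> InvokeAsync<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding();   // lets the worker loop drain and exit
        _thread.Join();
        _queue.Dispose();
    }
}
```

Callers on any thread await `InvokeAsync`, and the delegate itself always executes on the worker thread, which is the property an apartment-threaded COM object needs from its host.
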
### Pipe security

`PipeServer` builds a `PipeAcl` from the provided `SecurityIdentifier` and uses `NamedPipeServerStream` with `maxNumberOfServerInstances: 1`. The handshake requires a matching shared secret in the first Hello frame; callers whose SID doesn't match `OTOPCUA_ALLOWED_SID` are rejected before any frame is processed. **By design the pipe ACL denies BUILTIN\Administrators**, so live smoke tests must run from a non-elevated shell that matches the allowed principal. The installed dev host (`OtOpcUaGalaxyHost`) runs as `dohertj2` with the secret at `.local/galaxy-host-secret.txt`.

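The ACL and secret check described above could be sketched as follows for .NET Framework 4.8 on Windows. This is not the repository's `PipeServer` or `PipeAcl` code: the class name is hypothetical, the exact rights granted and denied are assumptions, and the constant-time secret comparison is an extra hardening step that this document does not claim the real handshake performs.

```csharp
using System;
using System.IO.Pipes;
using System.Security.AccessControl;
using System.Security.Principal;

// Sketch for .NET Framework 4.8 (Windows only); names are hypothetical.
public static class PipeSecuritySketch
{
    public static PipeSecurity BuildAcl(SecurityIdentifier allowedSid)
    {
        var security = new PipeSecurity();
        // Assumption: the Server's principal gets read/write access and nothing more.
        security.AddAccessRule(new PipeAccessRule(
            allowedSid, PipeAccessRights.ReadWrite, AccessControlType.Allow));
        // Explicit deny for BUILTIN\Administrators. In a canonically ordered ACL,
        // deny entries are evaluated before allow entries.
        var admins = new SecurityIdentifier(WellKnownSidType.BuiltinAdministratorsSid, null);
        security.AddAccessRule(new PipeAccessRule(
            admins, PipeAccessRights.FullControl, AccessControlType.Deny));
        return security;
    }

    public static NamedPipeServerStream CreateServer(string pipeName, SecurityIdentifier allowedSid)
    {
        // This constructor overload (taking PipeSecurity) exists on .NET Framework;
        // modern .NET exposes NamedPipeServerStreamAcl.Create instead.
        return new NamedPipeServerStream(
            pipeName, PipeDirection.InOut, 1,   // maxNumberOfServerInstances: 1
            PipeTransmissionMode.Byte, PipeOptions.Asynchronous,
            0, 0, BuildAcl(allowedSid));
    }

    // Compares the Hello-frame secret without exiting early on the first
    // mismatching byte, so response timing does not reveal matching prefixes.
    public static bool SecretsEqual(byte[] expected, byte[] presented)
    {
        if (expected == null || presented == null) return false;
        int diff = expected.Length ^ presented.Length;
        for (int i = 0; i < expected.Length && i < presented.Length; i++)
            diff |= expected[i] ^ presented[i];
        return diff == 0;
    }
}
```

Because the deny entry applies to any token that carries the Administrators group as enabled, an elevated shell is refused even when its user is otherwise the allowed principal, which is why the smoke tests need a non-elevated shell.
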
### Installation

Galaxy.Host is wrapped with NSSM (the Non-Sucking Service Manager) because the executable itself is a plain console app, not a `ServiceBase` Windows service. The supervisor then adopts the child process over the pipe after install. Install/uninstall commands follow the NSSM pattern:

```bash
nssm install OtOpcUaGalaxyHost "C:\Program Files (x86)\OtOpcUa\Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe"
nssm set OtOpcUaGalaxyHost ObjectName .\dohertj2 <password>
nssm set OtOpcUaGalaxyHost AppEnvironmentExtra OTOPCUA_GALAXY_BACKEND=mxaccess OTOPCUA_GALAXY_SECRET=… OTOPCUA_ALLOWED_SID=…
nssm start OtOpcUaGalaxyHost
```

(Exact values for the environment block are generated by the Admin UI and committed alongside `.local/galaxy-host-secret.txt` on the dev box.)

## Inter-process communication

```
┌──────────────────────────┐        LDAP bind (Authentication:Ldap)        ┌──────────────────────────┐
│ OtOpcUa Admin (x64)      │ ─────────────────────────────────────────────▶│ LDAP / AD                │
│ Blazor Server + SignalR  │                                               └──────────────────────────┘
│ /metrics (Prometheus)    │     FleetStatusPoller → ClusterNode poll
│                          │ ─────────────────────────────────────────────▶┌──────────────────────────┐
│                          │         Cluster/Generation/ACL writes         │ Config DB (SQL Server)   │
└──────────────────────────┘ ─────────────────────────────────────────────▶│ OtOpcUaConfigDbContext   │
      ▲                                                                    └──────────────────────────┘
      │ SignalR                                                                         ▲
      │ (role change,                                                                   │ sp_GetCurrentGenerationForCluster
      │  host status,                                                                   │ sp_PublishGeneration
      │  alerts)                                                                        │
┌──────────────────────────┐                                                            │
│ OtOpcUa Server (x64)     │ ───────────────────────────────────────────────────────────┘
│ OPC UA endpoint          │
│ Non-Galaxy drivers       │          Named pipe (OtOpcUaGalaxy)           ┌──────────────────────────┐
│ Driver.Galaxy.Proxy      │ ─────────────────────────────────────────────▶│ Galaxy.Host (x86 .NFx)   │
│                          │         SID + shared-secret handshake         │ STA + message pump       │
│ /healthz /readyz         │                                               │ MXAccess COM             │
└──────────────────────────┘                                               │ Historian SDK (opt)      │
                                                                           └──────────────────────────┘
```

## appsettings.json boundary

Each process reads its own `appsettings.json` for **bootstrap only**: connection strings, LDAP bind config, transport security profile, redundancy node id, logging. The authoritative configuration tree (drivers, UNS, tags, ACLs) lives in the Config DB and is edited through the Admin UI. See [`Configuration.md`](Configuration.md) for the split.

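To make the bootstrap layer concrete, the sketch below shows how colon-delimited keys such as `Node:NodeId` (named in the Server process section) map onto nested JSON. The key names come from this document; every value is a placeholder, and `BootstrapKeysSketch` is a hypothetical helper. The real processes read these settings through the .NET configuration system, whose key lookup is case-insensitive, unlike this sketch.

```csharp
using System;
using System.Text.Json;

// Illustrates "Section:Key" paths against nested JSON (hypothetical helper).
public static class BootstrapKeysSketch
{
    // Placeholder values only; none of these are real deployment settings.
    public const string SampleJson = @"{
      ""Node"": {
        ""NodeId"": ""node-a"",
        ""ClusterId"": ""cluster-1"",
        ""ConfigDbConnectionString"": ""Server=localhost;Database=ExampleDb;"",
        ""LocalCachePath"": ""cache/config.db""
      }
    }";

    // Walks a colon-delimited path through the document; returns null if absent.
    public static string? Get(string json, string colonPath)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement current = doc.RootElement;
        foreach (string part in colonPath.Split(':'))
        {
            if (current.ValueKind != JsonValueKind.Object ||
                !current.TryGetProperty(part, out JsonElement next))
                return null;
            current = next;
        }
        return current.ValueKind == JsonValueKind.String
            ? current.GetString()
            : current.GetRawText();
    }
}
```

Anything that is not a bootstrap value of this kind (drivers, UNS, tags, ACLs) would not appear in the file at all, since it lives in the Config DB.
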
## Development bootstrap

For the Windows install steps (SQL Server in Docker, .NET 10 SDK, .NET Framework 4.8 SDK, Docker Desktop WSL 2 backend, EF Core CLI, first-run migration), see [`docs/v2/dev-environment.md`](v2/dev-environment.md).