lmxopcua/docs/ServiceHosting.md
Joseph Doherty 5506b43ddc Doc refresh (task #204) — operational docs for multi-process multi-driver OtOpcUa
Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
  appsettings.json is bootstrap only (Node identity, Config DB connection string,
  transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
  UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
  the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
  draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
  each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.

- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
  ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
  ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
  full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
  from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
  ServiceLevelBase, ApplicationUri). Covered metrics
  (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
  on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
  FleetStatusPoller to RedundancyTab.razor.

- docs/security.md — preserved the transport-security section (still accurate) and
  added Phase 6.2 authorization. Four concerns now documented in one place:
  (1) transport security profiles, (2) OPC UA auth via LdapUserAuthenticator
  (note: task spec called this LdapAuthenticationProvider — actual class name is
  LdapUserAuthenticator in Server/Security/), (3) data-plane authorization via
  NodeAcl + PermissionTrie + AuthorizationGate — additive-only model per decision
  #129, ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag hierarchy,
  NodePermissions bundle, PermissionProbeService in Admin for "probe this permission",
  (4) control-plane authorization via LdapGroupRoleMapping + AdminRole
  (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
  deliberately independent of data-plane ACLs per decision #150. Documented the
  OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer) as the compile-time
  guard ensuring every driver-capability async call is wrapped by CapabilityInvoker.

- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
  BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
  drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
  OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
  NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Pipe ACL
  denies-Admins detail + non-elevated shell requirement captured from feedback memory.
  Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
  wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
  #30 replaced it with the generic host's AddWindowsService wrapper (per the doc
  comment on OpcUaServerService). Reflected the actual state + flagged this divergence
  here so someone can update CLAUDE.md separately.

- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
  health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
  pointer that preserves git-blame continuity + avoids broken links from other docs
  that reference it.

Class references verified by reading:
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
      ApplyLeaseRegistry, RedundancyStatePublisher}.cs
  src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
      PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
  src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
  src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
      Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
  src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
  src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
  src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
      LdapGroupRoleMapping}.cs
  src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:34:25 -04:00


Service Hosting

Overview

A production OtOpcUa deployment runs three processes, each with a distinct runtime, platform target, and install surface:

| Process | Project | Runtime | Platform | Responsibility |
| --- | --- | --- | --- | --- |
| OtOpcUa Server | src/ZB.MOM.WW.OtOpcUa.Server | .NET 10 | x64 | Hosts the OPC UA endpoint; loads every non-Galaxy driver in-process; exposes /healthz. |
| OtOpcUa Admin | src/ZB.MOM.WW.OtOpcUa.Admin | .NET 10 (ASP.NET Core / Blazor Server) | x64 | Operator UI for Config DB editing + fleet status, SignalR hubs (FleetStatusHub, AlertHub), Prometheus /metrics. |
| OtOpcUa Galaxy.Host | src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host | .NET Framework 4.8 | x86 (32-bit) | Hosts MXAccess COM on a dedicated STA thread with a Win32 message pump; exposes a named-pipe IPC surface consumed by Driver.Galaxy.Proxy inside the Server process. |

The x86 / .NET Framework 4.8 constraint applies only to Galaxy.Host because the MXAccess toolkit DLLs (Program Files (x86)\ArchestrA\Framework\bin) are 32-bit-only COM. Every other driver (Modbus, S7, OpcUaClient, AbCip, AbLegacy, TwinCAT, FOCAS) runs in-process in the 64-bit Server.

Server process

src/ZB.MOM.WW.OtOpcUa.Server/Program.cs uses the generic host:

var builder = Host.CreateApplicationBuilder(args);
builder.Services.AddSerilog();
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUa");

builder.Services.AddHostedService<OpcUaServerService>();
builder.Services.AddHostedService<HostStatusPublisher>();

OpcUaServerService is a BackgroundService (decision #30 — TopShelf from v1 was replaced by the generic-host AddWindowsService wrapper; no TopShelf dependency remains in any csproj). It owns:

  1. Config bootstrap — reads Node:NodeId, Node:ClusterId, Node:ConfigDbConnectionString, Node:LocalCachePath from appsettings.json.
  2. NodeBootstrap — pulls the latest published generation from the Config DB into the LiteDB local cache (LiteDbConfigCache) so the node starts even if the central DB is briefly unreachable.
  3. DriverHost — instantiates configured driver instances from the generation, wires each through CapabilityInvoker resilience pipelines.
  4. OpcUaApplicationHost — builds the OPC UA endpoint, applies OpcUaServerOptions + LdapOptions, registers AuthorizationGate at dispatch.
  5. HostStatusPublisher — a second hosted service that heartbeats DriverHostStatus rows so the Admin UI Fleet view sees the node.
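
The cache fallback described in step 2 can be pictured roughly as follows. This is a minimal sketch, not the actual NodeBootstrap code: the function name LoadGeneration, the three delegates, and the tuple shape are all hypothetical stand-ins for the Config DB query and the LiteDB cache.

```csharp
using System;

// Hypothetical sketch: names are illustrative, not the real NodeBootstrap API.
// Try the central Config DB first; on failure fall back to the last generation
// held in the local cache so the node can still start.
static (long GenerationId, string Source) LoadGeneration(
    Func<long> fetchFromConfigDb,   // stands in for the Config DB query
    Func<long?> readLocalCache,     // stands in for the local cache read
    Action<long> writeLocalCache)   // stands in for the cache refresh
{
    try
    {
        long id = fetchFromConfigDb();
        writeLocalCache(id);        // keep the cache current for the next outage
        return (id, "config-db");
    }
    catch (Exception ex)
    {
        long? cached = readLocalCache();
        if (cached is null)
            throw new InvalidOperationException(
                "Config DB unreachable and no cached generation is available.", ex);
        return (cached.Value, "local-cache");
    }
}

// Usage: simulate an unreachable Config DB with generation 41 already cached.
long? cache = 41;
var result = LoadGeneration(
    fetchFromConfigDb: () => throw new TimeoutException("Config DB unreachable"),
    readLocalCache: () => cache,
    writeLocalCache: id => cache = id);
Console.WriteLine($"Started from {result.Source}, generation {result.GenerationId}");
// prints "Started from local-cache, generation 41"
```

Catching every exception type is a simplification made for the sketch; a real bootstrap would likely fall back only on connectivity failures.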

Installation

Same executable, different modes driven by the .NET generic-host AddWindowsService wrapper:

| Mode | Invocation |
| --- | --- |
| Console | ZB.MOM.WW.OtOpcUa.Server.exe |
| Install as Windows service | sc create OtOpcUa binPath= "C:\Program Files\OtOpcUa\Server\ZB.MOM.WW.OtOpcUa.Server.exe" start= auto |
| Start | sc start OtOpcUa |
| Stop | sc stop OtOpcUa |
| Uninstall | sc delete OtOpcUa |

Health endpoints

The Server exposes /healthz and /readyz. They are consumed by (a) the Admin FleetStatusPoller, as input to Fleet status, and (b) PeerReachabilityTracker in a peer Server process, as the HTTP side of the peer-reachability probe.
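
A consumer of these endpoints could probe them along the following lines. This is a hedged sketch only: IsHealthyAsync is a hypothetical helper, the host and port are placeholders (the source does not state the listen address), and the rule that a timeout counts as unhealthy is an assumption rather than the documented behaviour of FleetStatusPoller or PeerReachabilityTracker.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical probe: true only when the endpoint answers 2xx within the timeout.
static async Task<bool> IsHealthyAsync(HttpClient client, Uri endpoint, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        using var response = await client.GetAsync(endpoint, cts.Token);
        return response.IsSuccessStatusCode;
    }
    catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
    {
        return false; // unreachable or timed out is treated as unhealthy
    }
}

// Usage: the URL is a placeholder, not a value taken from the repo.
using var http = new HttpClient();
bool ready = await IsHealthyAsync(
    http, new Uri("http://localhost:5000/readyz"), TimeSpan.FromSeconds(2));
Console.WriteLine(ready ? "peer ready" : "peer not ready");
```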

Admin process

src/ZB.MOM.WW.OtOpcUa.Admin/Program.cs is a stock WebApplication. Highlights:

  • Cookie auth (CookieAuthenticationDefaults, scheme name OtOpcUa.Admin) + Blazor Server (AddInteractiveServerComponents) + SignalR.
  • Authorization policies gated by AdminRoles: ConfigViewer, ConfigEditor, FleetAdmin (see Services/AdminRoles.cs). CanEdit policy requires ConfigEditor or FleetAdmin; CanPublish requires FleetAdmin.
  • OtOpcUaConfigDbContext registered against ConnectionStrings:ConfigDb.
  • Scoped services: ClusterService, GenerationService, EquipmentService, UnsService, NamespaceService, DriverInstanceService, NodeAclService, PermissionProbeService, AclChangeNotifier, ReservationService, DraftValidationService, AuditLogService, HostStatusService, ClusterNodeService, EquipmentImportBatchService, ILdapGroupRoleMappingService.
  • Singleton RedundancyMetrics (meter name ZB.MOM.WW.OtOpcUa.Redundancy) + CertTrustService (promotes rejected client certs in the Server's PKI store to trusted via the Admin Certificates page).
  • LdapAuthService bound to Authentication:Ldap — same LDAP flow as ScadaLink CentralUI for visual parity.
  • SignalR hubs mapped at /hubs/fleet and /hubs/alerts; FleetStatusPoller runs as a hosted service and pushes RoleChanged, host status, and alert events.
  • OpenTelemetry → Prometheus exporter at /metrics when Metrics:Prometheus:Enabled=true (default). Because Prometheus scrapes the endpoint directly, the common K8s deployment needs no OpenTelemetry Collector.
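
The role-to-policy rule from the authorization bullet above (CanEdit requires ConfigEditor or FleetAdmin; CanPublish requires FleetAdmin) can be illustrated with plain ClaimsPrincipal checks. The helpers Satisfies and UserWithRoles are hypothetical. In ASP.NET Core the same rule is normally registered through AddAuthorization with RequireRole, but whether Program.cs does it exactly that way is an assumption.

```csharp
using System;
using System.Linq;
using System.Security.Claims;

// Role names come from the text above; the helper names are hypothetical.
string[] canEditRoles = { "ConfigEditor", "FleetAdmin" };
string[] canPublishRoles = { "FleetAdmin" };

// A policy is satisfied when the user holds at least one of its roles.
static bool Satisfies(ClaimsPrincipal user, string[] anyOfRoles) =>
    anyOfRoles.Any(user.IsInRole);

// Builds an authenticated principal carrying the given role claims.
static ClaimsPrincipal UserWithRoles(params string[] roles) =>
    new(new ClaimsIdentity(
        roles.Select(r => new Claim(ClaimTypes.Role, r)),
        authenticationType: "OtOpcUa.Admin"));

var editor = UserWithRoles("ConfigEditor");
Console.WriteLine(Satisfies(editor, canEditRoles));    // True
Console.WriteLine(Satisfies(editor, canPublishRoles)); // False
```

A ConfigViewer-only user fails both checks, which matches the read-only intent of that role.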

Installation

Deployed as an ASP.NET Core service. The generic-host AddWindowsService wrapper handles Windows-service install/uninstall; an IIS reverse proxy is the alternative for multi-node fleets. The process listens on whatever ASPNETCORE_URLS specifies.

Galaxy.Host process

src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/Program.cs is a .NET Framework 4.8 x86 console executable. Configuration comes from environment variables supplied by the supervisor (Driver.Galaxy.Proxy.Supervisor):

| Env var | Purpose |
| --- | --- |
| OTOPCUA_GALAXY_PIPE | Pipe name the host listens on (default OtOpcUaGalaxy). |
| OTOPCUA_ALLOWED_SID | SID of the Server process's principal; anyone else is refused during the handshake. |
| OTOPCUA_GALAXY_SECRET | Per-spawn shared secret the client must present in the Hello frame. |
| OTOPCUA_GALAXY_BACKEND | mxaccess (default), db (ZB-only, no COM), stub (in-memory; for tests). |
| OTOPCUA_GALAXY_ZB_CONN | SQL connection string to the ZB Galaxy repository. |
| OTOPCUA_HISTORIAN_* | Optional Wonderware Historian SDK config if Historian is enabled for this node. |
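
Reading these variables with their documented defaults might look like the sketch below. The variable names and defaults come from the table; ReadHostSettings, its tuple shape, and the fail-fast handling of missing values are assumptions. The sketch uses current C# syntax, so the .NET Framework 4.8 host would need an adjusted form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reader: the lookup is injected so the logic can run without
// touching the real process environment.
static (string Pipe, string Backend, string AllowedSid, string Secret) ReadHostSettings(
    Func<string, string?> getEnv)
{
    string Required(string name) =>
        getEnv(name) is { Length: > 0 } value
            ? value
            : throw new InvalidOperationException($"{name} must be set by the supervisor.");

    string Optional(string name, string fallback) =>
        getEnv(name) is { Length: > 0 } value ? value : fallback;

    string backend = Optional("OTOPCUA_GALAXY_BACKEND", "mxaccess");
    if (backend is not ("mxaccess" or "db" or "stub"))
        throw new InvalidOperationException($"Unknown backend '{backend}'.");

    return (
        Optional("OTOPCUA_GALAXY_PIPE", "OtOpcUaGalaxy"),
        backend,
        Required("OTOPCUA_ALLOWED_SID"),
        Required("OTOPCUA_GALAXY_SECRET"));
}

// Usage with made-up values; a real host would pass Environment.GetEnvironmentVariable.
var fake = new Dictionary<string, string?>
{
    ["OTOPCUA_ALLOWED_SID"] = "S-1-5-21-0-0-0-1001", // made-up SID
    ["OTOPCUA_GALAXY_SECRET"] = "example-secret",
};
var settings = ReadHostSettings(name => fake.GetValueOrDefault(name));
Console.WriteLine($"{settings.Pipe} / {settings.Backend}"); // OtOpcUaGalaxy / mxaccess
```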

The host spins up StaPump (the STA thread with message pump), creates the MXAccess LMXProxyServer COM object on that thread, and handles all COM calls there; the IPC layer marshals work items via PostThreadMessage.
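
The thread-affinity idea behind StaPump can be shown with a simplified stand-in. Note the substitution: this sketch replaces the Win32 message pump and PostThreadMessage with a BlockingCollection, so it demonstrates only that every work item runs on one dedicated thread. It does not pump Windows messages and is not the StaPump implementation; InvokeAsync is a hypothetical helper name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Simplified stand-in: a BlockingCollection plays the role of the message queue.
var queue = new BlockingCollection<Action>();
var worker = new Thread(() =>
{
    foreach (var work in queue.GetConsumingEnumerable())
        work(); // every work item runs on this one thread
});
if (OperatingSystem.IsWindows())
    worker.SetApartmentState(ApartmentState.STA); // must be set before Start
worker.IsBackground = true;
worker.Start();

// Marshals a call onto the worker thread and exposes its result as a Task.
Task<T> InvokeAsync<T>(Func<T> func)
{
    var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
    queue.Add(() =>
    {
        try { tcs.SetResult(func()); }
        catch (Exception ex) { tcs.SetException(ex); }
    });
    return tcs.Task;
}

int first = await InvokeAsync(() => Environment.CurrentManagedThreadId);
int second = await InvokeAsync(() => Environment.CurrentManagedThreadId);
Console.WriteLine(first == second && first == worker.ManagedThreadId); // True

queue.CompleteAdding(); // lets the worker loop exit
worker.Join();
```

A queue like this is enough to serialize calls, but an STA that hosts COM objects generally also needs a real message loop, which is presumably why the host pairs the thread with a Win32 pump.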

Pipe security

PipeServer builds a PipeAcl from the provided SecurityIdentifier and opens a NamedPipeServerStream with maxNumberOfServerInstances: 1. Callers whose SID doesn't match OTOPCUA_ALLOWED_SID are rejected before any frame is processed; callers that pass must then present the matching shared secret in the first Hello frame. By design the pipe ACL denies BUILTIN\Administrators, so live smoke tests must run from a non-elevated shell that matches the allowed principal. The installed dev host (OtOpcUaGalaxyHost) runs as dohertj2 with the secret at .local/galaxy-host-secret.txt.
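
The two-stage gate (SID first, then shared secret) can be expressed as a small decision function. This is a sketch under stated assumptions, not PipeServer's code: AcceptCaller is a hypothetical name, the SIDs and secrets are made up, and the fixed-time comparison is a choice made for the example rather than something the source confirms.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical decision function: the SID gate runs first, then the shared secret.
static bool AcceptCaller(string callerSid, string allowedSid,
                         string presentedSecret, string expectedSecret)
{
    if (!string.Equals(callerSid, allowedSid, StringComparison.OrdinalIgnoreCase))
        return false; // wrong principal: reject before looking at any frame

    // Fixed-time comparison (an assumption of this sketch) avoids leaking how
    // many leading bytes of the secret matched.
    byte[] presented = Encoding.UTF8.GetBytes(presentedSecret);
    byte[] expected = Encoding.UTF8.GetBytes(expectedSecret);
    return CryptographicOperations.FixedTimeEquals(presented, expected);
}

const string allowed = "S-1-5-21-0-0-0-1001"; // made-up SID for illustration
Console.WriteLine(AcceptCaller(allowed, allowed, "s3cret", "s3cret"));        // True
Console.WriteLine(AcceptCaller("S-1-5-32-544", allowed, "s3cret", "s3cret")); // False
Console.WriteLine(AcceptCaller(allowed, allowed, "wrong", "s3cret"));         // False
```

S-1-5-32-544 is the well-known SID for BUILTIN\Administrators, so the second call mirrors the elevated-shell case that the pipe ACL is designed to refuse.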

Installation

Galaxy.Host is wrapped with NSSM (the Non-Sucking Service Manager) because the executable itself is a plain console app, not a ServiceBase Windows service. After install, the supervisor adopts the child process over the pipe. Install/uninstall commands follow the NSSM pattern:

nssm install OtOpcUaGalaxyHost "C:\Program Files (x86)\OtOpcUa\Galaxy.Host\ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe"
nssm set OtOpcUaGalaxyHost ObjectName .\dohertj2 <password>
nssm set OtOpcUaGalaxyHost AppEnvironmentExtra OTOPCUA_GALAXY_BACKEND=mxaccess OTOPCUA_GALAXY_SECRET=… OTOPCUA_ALLOWED_SID=…
nssm start OtOpcUaGalaxyHost

(Exact values for the environment block are generated by the Admin UI and committed alongside .local/galaxy-host-secret.txt on the dev box.)

Inter-process communication

┌──────────────────────────┐        LDAP bind (Authentication:Ldap)        ┌──────────────────────────┐
│  OtOpcUa Admin (x64)     │ ─────────────────────────────────────────────▶│     LDAP / AD            │
│  Blazor Server + SignalR │                                               └──────────────────────────┘
│  /metrics (Prometheus)   │        FleetStatusPoller → ClusterNode poll
│                          │ ─────────────────────────────────────────────▶┌──────────────────────────┐
│                          │        Cluster/Generation/ACL writes          │   Config DB (SQL Server) │
└──────────────────────────┘ ─────────────────────────────────────────────▶│   OtOpcUaConfigDbContext │
             ▲                                                             └──────────────────────────┘
             │ SignalR                                                                 ▲
             │ (role change,                                                           │ sp_GetCurrentGenerationForCluster
             │  host status,                                                           │ sp_PublishGeneration
             │  alerts)                                                                │
┌──────────────────────────┐                                                           │
│  OtOpcUa Server (x64)    │ ──────────────────────────────────────────────────────────┘
│  OPC UA endpoint         │
│  Non-Galaxy drivers      │       Named pipe (OtOpcUaGalaxy)                ┌──────────────────────────┐
│  Driver.Galaxy.Proxy     │ ─────────────────────────────────────────────▶│  Galaxy.Host (x86 .NFx)  │
│                          │       SID + shared-secret handshake           │  STA + message pump      │
│  /healthz /readyz        │                                               │  MXAccess COM            │
└──────────────────────────┘                                               │  Historian SDK (opt)     │
                                                                           └──────────────────────────┘

appsettings.json boundary

Each process reads its own appsettings.json for bootstrap only — connection strings, LDAP bind config, transport security profile, redundancy node id, logging. The authoritative configuration tree (drivers, UNS, tags, ACLs) lives in the Config DB and is edited through the Admin UI. See Configuration.md for the split.

Development bootstrap

For the Windows install steps (SQL Server in Docker, .NET 10 SDK, .NET Framework 4.8 SDK, Docker Desktop WSL 2 backend, EF Core CLI, first-run migration), see docs/v2/dev-environment.md.