Service Host — Component Requirements
Revision: Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). v1 was a single Windows service; v2 ships three cooperating Windows services, and the service-host requirements are rewritten per process. SVC-001…SVC-006 from v1 are preserved in spirit (TopShelf, Serilog, config loading, graceful shutdown, startup sequence, unhandled-exception handling) but are now scoped to the process they apply to. SRV-* covers the Server process, ADM-* the Admin process, and GHX-* the Galaxy Host process. A shared-requirements section at the top covers cross-process concerns (Serilog, log rotation, bootstrap config scope).
Parent: HLR-007, HLR-008, HLR-011
Shared Requirements (all three processes)
SVC-SHARED-001: Serilog Logging
Every process shall use Serilog with a rolling daily file sink at Information level minimum, a console sink, and an opt-in CompactJsonFormatter file sink.
Acceptance Criteria
- Console sink active on every process (for interactive / debug mode).
- Rolling daily file sink:
  - Server: `logs/otopcua-YYYYMMDD.log`
  - Admin: `logs/otopcua-admin-YYYYMMDD.log`
  - Galaxy Host: `%ProgramData%\OtOpcUa\galaxy-host-YYYYMMDD.log`
- Retention count and minimum level configurable via `Serilog:*` in each process's `appsettings.json`.
- JSON sink opt-in via `Serilog:WriteJson = true` (emits `*.json.log` alongside the plain-text file) for SIEM ingestion.
- `Log.CloseAndFlush()` invoked in a `finally` block on shutdown.
- Structured logging (Serilog message templates); no `string.Format`.
SVC-SHARED-002: Bootstrap Configuration Scope
appsettings.json is bootstrap-only per HLR-011. Operational configuration (clusters, drivers, namespaces, tags, ACLs, poll groups) lives in the Config DB.
Acceptance Criteria
- `appsettings.json` may contain only: the Config DB connection string, `Node:NodeId`, `Node:ClusterId`, `Node:LocalCachePath`, `OpcUa:*` security bootstrap fields, `Ldap:*` bootstrap fields, `Serilog:*`, and the `Redundancy:*` role id.
- Any attempt to configure driver instances, tags, or equipment through `appsettings.json` shall be rejected at startup with a descriptive error.
- Invalid or missing required bootstrap fields are detected at startup with a clear error (`"Node:NodeId not configured"` style).
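The startup check can be sketched as follows. This is a minimal, language-neutral Python model, not the Server's code: it assumes the JSON is flattened into colon-separated keys the way .NET configuration addresses sections, and `validate_bootstrap`, `BootstrapConfigError`, and the `ConnectionStrings:ConfigDb` key are hypothetical (the requirement does not name the connection-string key).

```python
from typing import Any, Dict, Mapping

# Hypothetical key for the Config DB connection string; the requirement does not name it.
CONFIG_DB_KEY = "ConnectionStrings:ConfigDb"

# Individually allowed keys, plus whole sections the requirement permits.
ALLOWED_KEYS = {"Node:NodeId", "Node:ClusterId", "Node:LocalCachePath", CONFIG_DB_KEY}
ALLOWED_SECTIONS = ("OpcUa:", "Ldap:", "Serilog:", "Redundancy:")
REQUIRED_KEYS = ("Node:NodeId", "Node:ClusterId", CONFIG_DB_KEY)


class BootstrapConfigError(Exception):
    """Raised at startup when appsettings.json violates the bootstrap-only rule."""


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into colon-separated keys (arrays become numeric segments)."""
    if isinstance(node, Mapping):
        items = [(str(k), v) for k, v in node.items()]
    elif isinstance(node, list):
        items = [(str(i), v) for i, v in enumerate(node)]
    else:
        return {prefix: node}
    flat: Dict[str, Any] = {}
    for key, value in items:
        flat.update(flatten(value, f"{prefix}:{key}" if prefix else key))
    return flat


def validate_bootstrap(settings: Mapping[str, Any]) -> Dict[str, Any]:
    flat = flatten(settings)
    # Anything outside the allow-list (drivers, tags, equipment, ...) is operational config.
    rejected = sorted(
        k for k in flat if k not in ALLOWED_KEYS and not k.startswith(ALLOWED_SECTIONS)
    )
    if rejected:
        raise BootstrapConfigError(
            "appsettings.json is bootstrap-only (HLR-011); configure these in the Config DB: "
            + ", ".join(rejected)
        )
    # Required fields must be present and non-blank.
    for key in REQUIRED_KEYS:
        if not str(flat.get(key) or "").strip():
            raise BootstrapConfigError(f"{key} not configured")
    return flat
```

The sketch compares keys case-sensitively, whereas .NET configuration keys are case-insensitive; a real check would normalize case before comparing.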
OtOpcUa.Server — Service Host Requirements (SRV-*)
SRV-001: Microsoft.Extensions.Hosting + AddWindowsService
The Server shall use Host.CreateApplicationBuilder(args) with AddWindowsService(o => o.ServiceName = "OtOpcUa") to run as a Windows service.
Acceptance Criteria
- Service name `OtOpcUa`.
- Installs via standard `sc.exe` tooling or the build-provided installer.
- Runs as a configured service account (typically a domain service account with Config DB read access; Windows Auth to SQL Server).
- Console mode (running `ZB.MOM.WW.OtOpcUa.Server.exe` with no Windows service context) works for development and debugging.
- Platform target: .NET 10 x64 (default per decision in `plan.md` §3).
SRV-002: Startup Sequence
The Server shall start components in a defined order, with failure handling at each step.
Acceptance Criteria
- Startup sequence:
  1. Load `appsettings.json` bootstrap configuration and initialize Serilog.
  2. Validate bootstrap fields (NodeId, ClusterId, Config DB connection).
  3. Initialize `OpcUaApplicationHost` (server-certificate resolution via `SecurityProfileResolver`).
  4. Connect to the Config DB; request the current published generation for `ClusterId`.
  5. If the Config DB is unreachable, fall back to `LiteDbConfigCache` (latest applied generation).
  6. Apply the generation: register driver instances, build namespaces, wire capability pipelines.
  7. Start the `OpcUaServerService` hosted service (opens the endpoint listener).
  8. Start `HostStatusPublisher` (pushes `ClusterNodeGenerationState` to the Config DB for Admin UI SignalR consumers).
  9. Start `RedundancyCoordinator` + `ServiceLevelCalculator`.
- Failure in steps 1-3 prevents startup.
- Failure in steps 4-6 logs Error and enters degraded mode (empty namespaces, `DriverHealth.Unavailable` on every driver, `ServiceLevel = 0`).
- Failure in steps 7-9 logs Error and shuts down (the endpoint is non-optional).
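The three failure ranges can be modeled as below. This is a hypothetical Python sketch, not the Server's code: `start_server`, `Outcome`, and the callables are illustrative names, and steps 4-6 are modeled as a Config DB fetch, a cache fallback, and an apply call. It assumes a degraded node still proceeds to steps 7-9, which is an interpretation: a node advertising `ServiceLevel = 0` presumably still has an open endpoint.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Outcome(Enum):
    RUNNING = "running"
    DEGRADED = "degraded"    # empty namespaces, DriverHealth.Unavailable, ServiceLevel = 0
    ABORTED = "aborted"      # steps 1-3 failed: the service never starts
    SHUT_DOWN = "shut_down"  # steps 7-9 failed: the endpoint is non-optional


Step = Tuple[str, Callable[[], None]]


def start_server(
    bootstrap: Sequence[Step],                # steps 1-3
    fetch_from_config_db: Callable[[], int],  # step 4: returns a generation number
    fetch_from_cache: Callable[[], int],      # step 5: LiteDbConfigCache fallback
    apply_generation: Callable[[int], None],  # step 6
    serving: Sequence[Step],                  # steps 7-9
    log: List[str],
) -> Outcome:
    for name, action in bootstrap:
        try:
            action()
        except Exception as exc:
            log.append(f"startup aborted at {name}: {exc}")
            return Outcome.ABORTED

    degraded = False
    try:
        try:
            generation = fetch_from_config_db()
        except Exception as exc:
            log.append(f"Config DB unreachable ({exc}); falling back to local cache")
            generation = fetch_from_cache()
        apply_generation(generation)
    except Exception as exc:
        log.append(f"Error: configuration failed ({exc}); entering degraded mode")
        degraded = True

    for name, action in serving:
        try:
            action()
        except Exception as exc:
            log.append(f"Error: {name} failed ({exc}); shutting down")
            return Outcome.SHUT_DOWN

    return Outcome.DEGRADED if degraded else Outcome.RUNNING
```

Keeping the cache fallback inside the configuration phase means a Config DB outage alone does not degrade the node; only the absence of any usable generation does.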
SRV-003: Graceful Shutdown
On service stop, the Server shall gracefully shut down all driver instances, the OPC UA listener, and flush logs before exiting.
Acceptance Criteria
- `IHostApplicationLifetime.ApplicationStopping` triggers orderly shutdown.
- Shutdown sequence: stop `HostStatusPublisher` → stop driver instances (disconnect each via `IDriver.DisposeAsync`, which for Galaxy tears down the named pipe) → stop OPC UA server (stop accepting new sessions, complete pending reads/writes) → flush Serilog.
- Shutdown completes within 30 seconds (Windows SCM timeout).
- All `IDisposable` / `IAsyncDisposable` components disposed in reverse-creation order.
- Final log entry: `"OtOpcUa.Server shutdown complete"` at Information level.
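Reverse-creation disposal under a fixed time budget can be sketched with Python's `asyncio` standing in for `IAsyncDisposable`. `ShutdownStack` is a hypothetical name; this illustrates the ordering and deadline criteria only and is not how the Server is built.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Disposer = Callable[[], Awaitable[None]]


class ShutdownStack:
    """Collects async disposers in creation order and runs them in reverse on shutdown."""

    def __init__(self) -> None:
        self._disposers: List[Tuple[str, Disposer]] = []

    def register(self, name: str, dispose: Disposer) -> None:
        self._disposers.append((name, dispose))

    async def _dispose_all(self, log: List[str]) -> None:
        for name, dispose in reversed(self._disposers):
            try:
                await dispose()
                log.append(f"disposed {name}")
            except Exception as exc:  # keep going: one bad component must not block the rest
                log.append(f"dispose failed for {name}: {exc}")

    async def shutdown(self, log: List[str], budget_seconds: float = 30.0) -> bool:
        """Returns False if the whole sequence overran the budget."""
        try:
            await asyncio.wait_for(self._dispose_all(log), timeout=budget_seconds)
        except asyncio.TimeoutError:
            log.append(f"shutdown exceeded {budget_seconds}s budget")
            return False
        log.append("OtOpcUa.Server shutdown complete")
        return True
```

Applying the budget to the whole sequence, rather than per component, matches the single 30-second limit in the criteria; a slow early disposer eats into the time left for later ones.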
SRV-004: Unhandled Exception Handling
The Server shall handle unexpected crashes gracefully.
Acceptance Criteria
- Registers an `AppDomain.CurrentDomain.UnhandledException` handler that logs Fatal before the process terminates.
- Windows service recovery configured: restart on failure with a 60-second delay.
- Fatal log entry includes full exception details.
SRV-005: Drivers Hosted In-Process
All drivers except Galaxy run in-process within the Server.
Acceptance Criteria
- Modbus TCP, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, and OPC UA Client drivers are resolved from the DI container and managed by `DriverHost`.
- The Galaxy driver's in-process component is `Driver.Galaxy.Proxy`, which forwards to `OtOpcUa.Galaxy.Host` over the named pipe (see GHX-*).
- Each driver instance's lifecycle (connect, discover, subscribe, dispose) is orchestrated by `DriverHost`.
SRV-006: Redundancy-Node Bootstrap
The Server shall bootstrap its redundancy identity from appsettings.json and the Config DB.
Acceptance Criteria
- `Node:NodeId` + `Node:ClusterId` identify this node uniquely; the `Redundancy` coordinator looks up `ClusterNode.RedundancyRole` (Primary / Secondary) from the Config DB.
- Two nodes of the same cluster connect to the same Config DB and the same ClusterId but have different NodeIds and different `ApplicationUri` values.
- Missing or ambiguous `(ClusterId, NodeId)` causes startup failure.
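The lookup and its failure cases can be sketched as follows, assuming a simplified row shape. `ClusterNodeRow`, its column names, and `resolve_node` are hypothetical; the real lookup runs against the Config DB's `ClusterNode` table. The `ApplicationUri` clash check is an illustrative way to enforce the "different `ApplicationUri` values" criterion at startup.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClusterNodeRow:
    # Hypothetical projection of a ClusterNode row; column names are illustrative.
    cluster_id: str
    node_id: str
    redundancy_role: str  # "Primary" or "Secondary"
    application_uri: str


class RedundancyBootstrapError(Exception):
    """Startup failure: this node's redundancy identity cannot be resolved."""


def resolve_node(rows: Sequence[ClusterNodeRow], cluster_id: str, node_id: str) -> ClusterNodeRow:
    cluster = [r for r in rows if r.cluster_id == cluster_id]
    matches = [r for r in cluster if r.node_id == node_id]
    if not matches:
        raise RedundancyBootstrapError(f"no ClusterNode for ({cluster_id}, {node_id})")
    if len(matches) > 1:
        raise RedundancyBootstrapError(f"ambiguous ClusterNode for ({cluster_id}, {node_id})")
    me = matches[0]
    if me.redundancy_role not in ("Primary", "Secondary"):
        raise RedundancyBootstrapError(f"unknown RedundancyRole {me.redundancy_role!r}")
    # Peers in the same cluster must not reuse this node's ApplicationUri.
    if any(r.node_id != node_id and r.application_uri == me.application_uri for r in cluster):
        raise RedundancyBootstrapError(
            f"ApplicationUri {me.application_uri} is shared within {cluster_id}"
        )
    return me
```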
OtOpcUa.Admin — Service Host Requirements (ADM-*)
ADM-001: ASP.NET Core Blazor Server
The Admin app shall use WebApplication.CreateBuilder with Razor Components (AddRazorComponents().AddInteractiveServerComponents()), SignalR, and cookie authentication.
Acceptance Criteria
- Blazor Server (not WebAssembly) per `plan.md` §Tech Stack.
- Hosts SignalR hubs for live cluster state (used by `ClusterNodeGenerationState` views, crash-loop alerts, etc.).
- Runs as a Windows service via `AddWindowsService` OR as a standard ASP.NET Core process behind IIS / reverse proxy (site decides).
- Platform target: .NET 10 x64.
ADM-002: Authentication and Authorization
Admin users authenticate via LDAP bind with cookie auth; three admin roles gate operations.
Acceptance Criteria
- Cookie auth scheme: `OtOpcUa.Admin`, 8-hour expiry, path `/login` for challenge.
- LDAP bind via `LdapAuthService`; user group memberships map to admin roles (`ConfigViewer`, `ConfigEditor`, `FleetAdmin`).
- Authorization policies:
  - `CanEdit` requires `ConfigEditor` or `FleetAdmin`.
  - `CanPublish` requires `FleetAdmin`.
  - View-only access requires `ConfigViewer` (or higher).
- Unauthenticated requests to any Admin page redirect to `/login`.
- Per-cluster role grants layer on top: a `ConfigEditor` with no grant for cluster X can view it but not edit.
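The policy rules, including the per-cluster grant layer, can be expressed as plain predicates. This is a hypothetical Python sketch, not the ASP.NET Core policy code; the LDAP group names are made up, and it assumes `FleetAdmin` is fleet-wide and not subject to per-cluster grants, since the requirement only spells out the `ConfigEditor` case.

```python
from typing import AbstractSet, FrozenSet, Iterable, Mapping

VIEWER, EDITOR, FLEET_ADMIN = "ConfigViewer", "ConfigEditor", "FleetAdmin"


def roles_for(groups: Iterable[str], group_to_role: Mapping[str, str]) -> FrozenSet[str]:
    """Map LDAP group memberships to admin roles; the group names are site-specific."""
    return frozenset(group_to_role[g] for g in groups if g in group_to_role)


def can_view(roles: AbstractSet[str]) -> bool:
    return bool(roles & {VIEWER, EDITOR, FLEET_ADMIN})


def can_publish(roles: AbstractSet[str]) -> bool:
    return FLEET_ADMIN in roles


def can_edit(roles: AbstractSet[str], cluster_id: str, cluster_grants: AbstractSet[str]) -> bool:
    # Assumption: FleetAdmin is fleet-wide and does not need a per-cluster grant.
    if FLEET_ADMIN in roles:
        return True
    # A ConfigEditor additionally needs a grant for this specific cluster.
    return EDITOR in roles and cluster_id in cluster_grants
```

For example, a user in a hypothetical `OtOpcUa-Editors` group with a grant only for `cluster-y` can view `cluster-x` but `can_edit(roles, "cluster-x", {"cluster-y"})` is false.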
ADM-003: Config DB as Sole Write Path
The Admin service shall be the only process with write access to the Config DB.
Acceptance Criteria
- EF Core `OtOpcUaConfigDbContext` configured with the SQL login / connection string that has read+write permission on config tables.
- Server nodes connect with a read-only principal (`GRANT SELECT` only).
- Admin writes produce draft-generation rows; publish writes are atomic and transactional.
- Every write is audited via `AuditLogService` per ADM-006.
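What "atomic and transactional" means for a publish can be shown with Python's built-in `sqlite3` standing in for SQL Server and EF Core. The two-table schema, the `Draft`/`Published`/`Superseded` status values, and `publish` are all hypothetical; the point is that promoting the draft, retiring the previous generation, and writing the audit row either all commit or all roll back.

```python
import sqlite3

# Hypothetical miniature schema; the real Config DB is SQL Server via EF Core.
SCHEMA = """
CREATE TABLE generation (
    id         INTEGER PRIMARY KEY,
    cluster_id TEXT NOT NULL,
    status     TEXT NOT NULL CHECK (status IN ('Draft', 'Published', 'Superseded'))
);
CREATE TABLE audit (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    action        TEXT NOT NULL,
    generation_id INTEGER NOT NULL
);
"""


def publish(conn: sqlite3.Connection, cluster_id: str, draft_id: int,
            fail_before_audit: bool = False) -> None:
    """Promote a draft generation and write its audit row in one transaction."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute(
            "UPDATE generation SET status = 'Superseded' "
            "WHERE cluster_id = ? AND status = 'Published'",
            (cluster_id,),
        )
        promoted = conn.execute(
            "UPDATE generation SET status = 'Published' "
            "WHERE id = ? AND cluster_id = ? AND status = 'Draft'",
            (draft_id, cluster_id),
        ).rowcount
        if promoted != 1:
            raise ValueError(f"generation {draft_id} is not a draft of {cluster_id}")
        if fail_before_audit:  # test hook: simulate a crash between the writes
            raise RuntimeError("simulated failure before audit insert")
        conn.execute(
            "INSERT INTO audit (action, generation_id) VALUES ('Publish', ?)", (draft_id,)
        )
```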
ADM-004: Prometheus /metrics Endpoint
The Admin service shall expose an OpenTelemetry → Prometheus metrics endpoint at /metrics.
Acceptance Criteria
- `OpenTelemetry.Metrics` registered with the Prometheus exporter.
- `/metrics` scrapeable without authentication (standard Prometheus pattern) OR gated behind an infrastructure allow-list (site-configurable).
- Exports metrics from Server nodes of managed clusters (aggregated via Config DB heartbeat telemetry) plus Admin-local metrics (login attempts, publish duration, active sessions).
ADM-005: Graceful Shutdown
On shutdown, the Admin service shall disconnect SignalR clients cleanly, finish in-flight DB writes, and flush Serilog.
Acceptance Criteria
- `IHostApplicationLifetime.ApplicationStopping` closes SignalR hub connections gracefully.
- In-flight publish transactions are allowed up to 30 seconds to complete.
- Final log entry: `"OtOpcUa.Admin shutdown complete"`.
ADM-006: Audit Logging
Every publish and every ACL / role-grant change shall produce an immutable audit row via AuditLogService.
Acceptance Criteria
- Audit rows include: timestamp (UTC), acting principal (LDAP DN + display name), action, entity kind + id, before/after generation number where applicable, session id, source IP.
- Audit rows are never mutated or deleted by application code.
- Audit table schema enforces immutability via DB permissions (no UPDATE / DELETE granted to the Admin app's principal).
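The field list and the "never mutated by application code" rule can be mirrored in the application layer as an immutable record behind an append-only interface. This is a hypothetical Python sketch (`AuditRow` and `AuditLog` are illustrative names); it complements, and does not replace, the DB-permission enforcement above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuditRow:
    """One audit record with the ADM-006 fields; frozen so it cannot be mutated."""
    timestamp_utc: datetime
    principal_dn: str
    principal_display_name: str
    action: str
    entity_kind: str
    entity_id: str
    generation_before: Optional[int]
    generation_after: Optional[int]
    session_id: str
    source_ip: str

    def __post_init__(self) -> None:
        # Naive datetimes (utcoffset() is None) and non-UTC offsets are both rejected.
        if self.timestamp_utc.utcoffset() != timedelta(0):
            raise ValueError("audit timestamps must be timezone-aware UTC")


class AuditLog:
    """Append-only in memory: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._rows: List[AuditRow] = []

    def append(self, row: AuditRow) -> None:
        self._rows.append(row)

    def rows(self) -> Tuple[AuditRow, ...]:
        return tuple(self._rows)  # a copy, so callers cannot edit the backing list
```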
OtOpcUa.Galaxy.Host — Service Host Requirements (GHX-*)
GHX-001: TopShelf Windows Service Hosting
The Galaxy Host shall use TopShelf for Windows service lifecycle (install, uninstall, start, stop) and interactive console mode.
Acceptance Criteria
- Service name `OtOpcUaGalaxyHost`, display name `OtOpcUa Galaxy Host`.
- Installs via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe install`.
- Uninstalls via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe uninstall`.
- Runs as a configured user account (typically the same account as the Server, or a dedicated Galaxy service account with ArchestrA platform access).
- Interactive console mode (no args) for development / debugging.
- Platform target: .NET Framework 4.8 x86, required for 32-bit MXAccess COM interop.
- Development deployments may use NSSM in place of TopShelf (memory: `project_galaxy_host_installed`).
Details
- Service description: "OtOpcUa Galaxy Host — MXAccess + Galaxy Repository backend for the Galaxy driver, named-pipe IPC to OtOpcUa.Server."
GHX-002: Named-Pipe IPC Bootstrap
The Host shall open a named pipe on startup whose name, ACL, and shared secret come from environment variables supplied by the supervisor at spawn time.
Acceptance Criteria
- `OTOPCUA_GALAXY_PIPE` → pipe name (default `OtOpcUaGalaxy`).
- `OTOPCUA_ALLOWED_SID` → SID of the principal allowed to connect; any other principal is denied at the ACL layer.
- `OTOPCUA_GALAXY_SECRET` → per-process shared secret; `Driver.Galaxy.Proxy` must present it on handshake.
- `OTOPCUA_GALAXY_BACKEND` → `stub` / `db` / `mxaccess` (default `mxaccess`); selects which backend implementation is loaded.
- Missing `OTOPCUA_ALLOWED_SID` or `OTOPCUA_GALAXY_SECRET` at startup throws with a descriptive error.
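Resolving these variables, with the documented defaults and the fail-fast rule, can be sketched in Python. `resolve_bootstrap`, `HostBootstrap`, and `handshake_accepted` are hypothetical names. Rejecting an unknown backend value is an assumption (the requirement lists the valid values but not the behavior for others), and the constant-time secret comparison via `hmac.compare_digest` is a suggested design choice rather than something the requirement mandates.

```python
import hmac
from dataclasses import dataclass
from typing import Mapping

BACKENDS = ("stub", "db", "mxaccess")


@dataclass(frozen=True)
class HostBootstrap:
    pipe_name: str
    allowed_sid: str
    secret: str
    backend: str


def resolve_bootstrap(env: Mapping[str, str]) -> HostBootstrap:
    """Read the supervisor-supplied variables, applying the documented defaults."""
    missing = [n for n in ("OTOPCUA_ALLOWED_SID", "OTOPCUA_GALAXY_SECRET") if not env.get(n)]
    if missing:
        raise RuntimeError(
            "Galaxy Host cannot start: " + ", ".join(missing) + " must be set by the supervisor"
        )
    backend = env.get("OTOPCUA_GALAXY_BACKEND") or "mxaccess"
    if backend not in BACKENDS:  # assumption: unknown values are rejected, not ignored
        raise RuntimeError(
            f"OTOPCUA_GALAXY_BACKEND={backend!r}; expected one of {', '.join(BACKENDS)}"
        )
    return HostBootstrap(
        pipe_name=env.get("OTOPCUA_GALAXY_PIPE") or "OtOpcUaGalaxy",
        allowed_sid=env["OTOPCUA_ALLOWED_SID"],
        secret=env["OTOPCUA_GALAXY_SECRET"],
        backend=backend,
    )


def handshake_accepted(bootstrap: HostBootstrap, presented_secret: str) -> bool:
    # Constant-time comparison, so response timing does not reveal how much matched.
    return hmac.compare_digest(bootstrap.secret.encode("utf-8"),
                               presented_secret.encode("utf-8"))
```

In a real process the mapping would be `os.environ`; passing a plain dict keeps the sketch testable.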
GHX-003: Backend Lifecycle
The Host shall instantiate the STA pump + MXAccess backend + Galaxy Repository + optional Historian plugin in a defined order and tear them down cleanly on shutdown.
Acceptance Criteria
- Startup (`mxaccess` backend): initialize Serilog → resolve env vars → create `PipeServer` → start `StaPump` → create `MxAccessClient` on the STA thread → initialize `GalaxyRepository` → optionally initialize Historian plugin → begin pipe request handling.
- Shutdown: stop pipe → dispose `MxAccessClient` (MXA-007 COM cleanup) → dispose STA pump → flush Serilog.
- Shutdown must complete within 30 seconds (Windows SCM timeout).
- `Console.CancelKeyPress` triggers the same sequence in console mode.
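`MxAccessClient` is created on the STA thread because COM objects living in a single-threaded apartment must be called from the thread that owns them, so every later call has to be marshaled back onto that thread. The Python sketch below shows only that marshaling pattern; `StaPumpSketch` is a hypothetical name, and unlike a real STA thread it does not call `CoInitializeEx` with apartment threading or pump Windows messages.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class StaPumpSketch:
    """Runs every submitted callable on one dedicated thread, in submission order."""

    _STOP = object()

    def __init__(self) -> None:
        self._work: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="sta-pump", daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._work.get()
            if item is self._STOP:
                return
            func, future = item
            if not future.set_running_or_notify_cancel():
                continue  # the caller cancelled before the pump reached it
            try:
                future.set_result(func())
            except Exception as exc:
                future.set_exception(exc)

    def invoke(self, func: Callable[[], Any]) -> "Future[Any]":
        """Queue func for the pump thread; the caller waits on the returned Future."""
        future: "Future[Any]" = Future()
        self._work.put((func, future))
        return future

    def dispose(self, timeout: float = 30.0) -> None:
        # Work queued before the stop marker still runs, because the queue is FIFO.
        self._work.put(self._STOP)
        self._thread.join(timeout)
```

Draining queued work before stopping mirrors the shutdown order above: pending calls into the client finish before the pump that owns it goes away.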
GHX-004: Unhandled Exception Handling
The Host shall log Fatal on crash and let the supervisor restart it.
Acceptance Criteria
- `AppDomain.CurrentDomain.UnhandledException` handler logs Fatal with full exception details before termination.
- The supervisor's driver-stability policy (`docs/v2/driver-stability.md`) governs restart behavior; backoff, crash-loop detection, and alerting live there, not in the Host.
- Server-side: `Driver.Galaxy.Proxy` detects pipe disconnect, opens its capability circuit, reports Bad quality on Galaxy nodes, and reconnects automatically when the Host is back.
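The Proxy-side behavior can be modeled as a two-state circuit. This is a hypothetical Python sketch (`GalaxyLinkSketch` and `Quality` are illustrative, and the reconnect loop itself is left out): while the circuit is open, reads return Bad quality immediately instead of waiting on a dead pipe.

```python
from enum import Enum
from typing import Any, Callable, Optional, Tuple


class Quality(Enum):
    GOOD = "Good"
    BAD = "Bad"


class GalaxyLinkSketch:
    """Proxy-side view of the pipe: while the circuit is open, reads fail fast as Bad."""

    def __init__(self, read_from_host: Callable[[str], Any]) -> None:
        self._read_from_host = read_from_host
        self.circuit_open = False

    def on_pipe_disconnected(self) -> None:
        self.circuit_open = True

    def on_pipe_reconnected(self) -> None:
        self.circuit_open = False

    def read(self, tag: str) -> Tuple[Optional[Any], Quality]:
        if self.circuit_open:
            return None, Quality.BAD  # the Host is not contacted at all
        try:
            return self._read_from_host(tag), Quality.GOOD
        except ConnectionError:
            # A broken pipe discovered mid-call opens the circuit as well.
            self.on_pipe_disconnected()
            return None, Quality.BAD
```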