# Service Host — Component Requirements

> **Revision** — Refreshed 2026-04-19 for the OtOpcUa v2 multi-driver platform (task #205). v1 was a single Windows service; v2 ships **three cooperating Windows services** and the service-host requirements are rewritten per-process. SVC-001…SVC-006 from v1 are preserved in spirit (TopShelf, Serilog, config loading, graceful shutdown, startup sequence, unhandled-exception handling) but are now scoped to the process they apply to. SRV-* prefixes the Server process, ADM-* the Admin process, GHX-* the Galaxy Host process. A shared-requirements section at the top covers cross-process concerns (Serilog, logging rotation, bootstrap config scope).

Parent: [HLR-007](HighLevelReqs.md#hlr-007-service-hosting), [HLR-008](HighLevelReqs.md#hlr-008-logging), [HLR-011](HighLevelReqs.md#hlr-011-config-db-and-draft-publish)

## Shared Requirements (all three processes)

### SVC-SHARED-001: Serilog Logging

Every process shall use Serilog with a rolling daily file sink at Information level minimum, plus a console sink, plus opt-in CompactJsonFormatter file sink.

#### Acceptance Criteria

- Console sink active on every process (for interactive / debug mode).
- Rolling daily file sink:
  - Server: `logs/otopcua-YYYYMMDD.log`
  - Admin: `logs/otopcua-admin-YYYYMMDD.log`
  - Galaxy Host: `%ProgramData%\OtOpcUa\galaxy-host-YYYYMMDD.log`
- Retention count and min level configurable via `Serilog:*` in each process's `appsettings.json`.
- JSON sink opt-in via `Serilog:WriteJson = true` (emits `*.json.log` alongside the plain-text file) for SIEM ingestion.
- `Log.CloseAndFlush()` invoked in a `finally` block on shutdown.
- Structured logging (Serilog message templates) — no `string.Format`.

---

### SVC-SHARED-002: Bootstrap Configuration Scope

`appsettings.json` is bootstrap-only per HLR-011. Operational configuration (clusters, drivers, namespaces, tags, ACLs, poll groups) lives in the Config DB.

#### Acceptance Criteria

- `appsettings.json` may contain only: Config DB connection string, `Node:NodeId`, `Node:ClusterId`, `Node:LocalCachePath`, `OpcUa:*` security bootstrap fields, `Ldap:*` bootstrap fields, `Serilog:*`, `Redundancy:*` role id.
- Any attempt to configure driver instances, tags, or equipment through `appsettings.json` shall be rejected at startup with a descriptive error.
- Invalid or missing required bootstrap fields are detected at startup with a clear error (`"Node:NodeId not configured"` style).

---

## OtOpcUa.Server — Service Host Requirements (SRV-*)

### SRV-001: Microsoft.Extensions.Hosting + AddWindowsService

The Server shall use `Host.CreateApplicationBuilder(args)` with `AddWindowsService(o => o.ServiceName = "OtOpcUa")` to run as a Windows service.

#### Acceptance Criteria

- Service name `OtOpcUa`.
- Installs via standard `sc.exe` tooling or the build-provided installer.
- Runs as a configured service account (typically a domain service account with Config DB read access; Windows Auth to SQL Server).
- Console mode (running `ZB.MOM.WW.OtOpcUa.Server.exe` with no Windows service context) works for development and debugging.
- Platform target: .NET 10 x64 (default per decision in `plan.md` §3).

---

### SRV-002: Startup Sequence

The Server shall start components in a defined order, with failure handling at each step.

#### Acceptance Criteria

- Startup sequence:
  1. Load `appsettings.json` bootstrap configuration + initialize Serilog.
  2. Validate bootstrap fields (NodeId, ClusterId, Config DB connection).
  3. Initialize `OpcUaApplicationHost` (server-certificate resolution via `SecurityProfileResolver`).
  4. Connect to Config DB; request current published generation for `ClusterId`.
  5. If unreachable, fall back to `LiteDbConfigCache` (latest applied generation).
  6. Apply generation: register driver instances, build namespaces, wire capability pipelines.
  7. Start `OpcUaServerService` hosted service (opens endpoint listener).
  8. Start `HostStatusPublisher` (pushes `ClusterNodeGenerationState` to Config DB for Admin UI SignalR consumers).
  9. Start `RedundancyCoordinator` + `ServiceLevelCalculator`.
- Failure in steps 1-3 prevents startup.
- Failure in steps 4-6 logs Error and enters degraded mode (empty namespaces, `DriverHealth.Unavailable` on every driver, `ServiceLevel = 0`).
- Failure in steps 7-9 logs Error and shuts down (endpoint is non-optional).

---

### SRV-003: Graceful Shutdown

On service stop, the Server shall gracefully shut down all driver instances, the OPC UA listener, and flush logs before exiting.

#### Acceptance Criteria

- `IHostApplicationLifetime.ApplicationStopping` triggers orderly shutdown.
- Shutdown sequence: stop `HostStatusPublisher` → stop driver instances (disconnect each via `IDriver.DisposeAsync`, which for Galaxy tears down the named pipe) → stop OPC UA server (stop accepting new sessions, complete pending reads/writes) → flush Serilog.
- Shutdown completes within 30 seconds (Windows SCM timeout).
- All `IDisposable` / `IAsyncDisposable` components disposed in reverse-creation order.
- Final log entry: `"OtOpcUa.Server shutdown complete"` at Information level.

---

### SRV-004: Unhandled Exception Handling

The Server shall handle unexpected crashes gracefully.

#### Acceptance Criteria

- Registers `AppDomain.CurrentDomain.UnhandledException` handler that logs Fatal before the process terminates.
- Windows service recovery configured: restart on failure with 60-second delay.
- Fatal log entry includes full exception details.

---

### SRV-005: Drivers Hosted In-Process

All drivers except Galaxy run in-process within the Server.

#### Acceptance Criteria

- Modbus TCP, AB CIP, AB Legacy, S7, TwinCAT, FOCAS, OPC UA Client drivers are resolved from the DI container and managed by `DriverHost`.
- Galaxy driver in-process component is `Driver.Galaxy.Proxy`, which forwards to `OtOpcUa.Galaxy.Host` over the named pipe (see GHX-*).
- Each driver instance's lifecycle (connect, discover, subscribe, dispose) is orchestrated by `DriverHost`.

---

### SRV-006: Redundancy-Node Bootstrap

The Server shall bootstrap its redundancy identity from `appsettings.json` and the Config DB.

#### Acceptance Criteria

- `Node:NodeId` + `Node:ClusterId` identify this node uniquely; the `Redundancy` coordinator looks up `ClusterNode.RedundancyRole` (Primary / Secondary) from the Config DB.
- Two nodes of the same cluster connect to the same Config DB and the same ClusterId but have different NodeIds and different `ApplicationUri` values.
- Missing or ambiguous `(ClusterId, NodeId)` causes startup failure.

---

## OtOpcUa.Admin — Service Host Requirements (ADM-*)

### ADM-001: ASP.NET Core Blazor Server

The Admin app shall use `WebApplication.CreateBuilder` with Razor Components (`AddRazorComponents().AddInteractiveServerComponents()`), SignalR, and cookie authentication.

#### Acceptance Criteria

- Blazor Server (not WebAssembly) per `plan.md` §Tech Stack.
- Hosts SignalR hubs for live cluster state (used by `ClusterNodeGenerationState` views, crash-loop alerts, etc.).
- Runs as a Windows service via `AddWindowsService` OR as a standard ASP.NET Core process behind IIS / reverse proxy (site decides).
- Platform target: .NET 10 x64.

---

### ADM-002: Authentication and Authorization

Admin users authenticate via LDAP bind with cookie auth; three admin roles gate operations.

#### Acceptance Criteria

- Cookie auth scheme: `OtOpcUa.Admin`, 8-hour expiry, path `/login` for challenge.
- LDAP bind via `LdapAuthService`; user group memberships map to admin roles (`ConfigViewer`, `ConfigEditor`, `FleetAdmin`).
- Authorization policies:
  - `CanEdit` requires `ConfigEditor` or `FleetAdmin`.
  - `CanPublish` requires `FleetAdmin`.
  - View-only access requires `ConfigViewer` (or higher).
- Unauthenticated requests to any Admin page redirect to `/login`.
- Per-cluster role grants layer on top: a `ConfigEditor` with no grant for cluster X can view it but not edit.

---

### ADM-003: Config DB as Sole Write Path

The Admin service shall be the only process with write access to the Config DB.

#### Acceptance Criteria

- EF Core `OtOpcUaConfigDbContext` configured with the SQL login / connection string that has read+write permission on config tables.
- Server nodes connect with a read-only principal (`grant SELECT` only).
- Admin writes produce draft-generation rows; publish writes are atomic and transactional.
- Every write is audited via `AuditLogService` per ADM-006.

---

### ADM-004: Prometheus /metrics Endpoint

The Admin service shall expose an OpenTelemetry → Prometheus metrics endpoint at `/metrics`.

#### Acceptance Criteria

- `OpenTelemetry.Metrics` registered with Prometheus exporter.
- `/metrics` scrapeable without authentication (standard Prometheus pattern) OR gated behind an infrastructure allow-list (site-configurable).
- Exports metrics from Server nodes of managed clusters (aggregated via Config DB heartbeat telemetry) plus Admin-local metrics (login attempts, publish duration, active sessions).

---

### ADM-005: Graceful Shutdown

On shutdown, the Admin service shall disconnect SignalR clients cleanly, finish in-flight DB writes, and flush Serilog.

#### Acceptance Criteria

- `IHostApplicationLifetime.ApplicationStopping` closes SignalR hub connections gracefully.
- In-flight publish transactions are allowed to complete up to 30 seconds.
- Final log entry: `"OtOpcUa.Admin shutdown complete"`.

---

### ADM-006: Audit Logging

Every publish and every ACL / role-grant change shall produce an immutable audit row via `AuditLogService`.

#### Acceptance Criteria

- Audit rows include: timestamp (UTC), acting principal (LDAP DN + display name), action, entity kind + id, before/after generation number where applicable, session id, source IP.
- Audit rows are never mutated or deleted by application code.
- Audit table schema enforces immutability via DB permissions (no UPDATE / DELETE granted to the Admin app's principal).

---

## OtOpcUa.Galaxy.Host — Service Host Requirements (GHX-*)

### GHX-001: TopShelf Windows Service Hosting

The Galaxy Host shall use TopShelf for Windows service lifecycle (install, uninstall, start, stop) and interactive console mode.

#### Acceptance Criteria

- Service name `OtOpcUaGalaxyHost`, display name `OtOpcUa Galaxy Host`.
- Installs via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe install`.
- Uninstalls via `ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host.exe uninstall`.
- Runs as a configured user account (typically the same account as the Server, or a dedicated Galaxy service account with ArchestrA platform access).
- Interactive console mode (no args) for development / debugging.
- Platform target: **.NET Framework 4.8 x86** — required for MXAccess COM 32-bit interop.
- Development deployments may use NSSM in place of TopShelf (memory: `project_galaxy_host_installed`).

#### Details

- Service description: "OtOpcUa Galaxy Host — MXAccess + Galaxy Repository backend for the Galaxy driver, named-pipe IPC to OtOpcUa.Server."

---

### GHX-002: Named-Pipe IPC Bootstrap

The Host shall open a named pipe on startup whose name, ACL, and shared secret come from environment variables supplied by the supervisor at spawn time.

#### Acceptance Criteria

- `OTOPCUA_GALAXY_PIPE` → pipe name (default `OtOpcUaGalaxy`).
- `OTOPCUA_ALLOWED_SID` → SID of the principal allowed to connect; any other principal is denied at the ACL layer.
- `OTOPCUA_GALAXY_SECRET` → per-process shared secret; `Driver.Galaxy.Proxy` must present it on handshake.
- `OTOPCUA_GALAXY_BACKEND` → `stub` / `db` / `mxaccess` (default `mxaccess`) — selects which backend implementation is loaded.
- Missing `OTOPCUA_ALLOWED_SID` or `OTOPCUA_GALAXY_SECRET` at startup throws with a descriptive error.
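
The GHX-002 environment contract could be validated along the following lines. This is a minimal sketch under stated assumptions, not the Host's actual code: `GalaxyHostBootstrap`, `FromEnvironment`, and the error strings are hypothetical names chosen for illustration, and only the variable names and defaults come from the criteria above. Treating whitespace-only values as missing, rejecting unknown backend values, and matching backend names case-insensitively are also assumptions. The code sticks to C# 7.3 features, the default language version for .NET Framework projects. In the Host it would presumably be called as `GalaxyHostBootstrap.FromEnvironment(Environment.GetEnvironmentVariable)`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder for the GHX-002 bootstrap values (not the real Host type).
public sealed class GalaxyHostBootstrap
{
    // Assumption: backend names are matched case-insensitively.
    private static readonly HashSet<string> KnownBackends =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "stub", "db", "mxaccess" };

    public string PipeName { get; }
    public string AllowedSid { get; }
    public string Secret { get; }
    public string Backend { get; }

    private GalaxyHostBootstrap(string pipeName, string allowedSid, string secret, string backend)
    {
        PipeName = pipeName;
        AllowedSid = allowedSid;
        Secret = secret;
        Backend = backend;
    }

    // getEnv is injected so the rules can be exercised without touching the process environment.
    public static GalaxyHostBootstrap FromEnvironment(Func<string, string> getEnv)
    {
        // Required values: fail fast with a descriptive error when absent.
        string allowedSid = Require(getEnv, "OTOPCUA_ALLOWED_SID");
        string secret = Require(getEnv, "OTOPCUA_GALAXY_SECRET");

        // Optional values: fall back to the documented defaults.
        string pipeName = OrDefault(getEnv, "OTOPCUA_GALAXY_PIPE", "OtOpcUaGalaxy");
        string backend = OrDefault(getEnv, "OTOPCUA_GALAXY_BACKEND", "mxaccess");

        // Assumption: an unrecognized backend is a startup error, not a silent fallback.
        if (!KnownBackends.Contains(backend))
            throw new InvalidOperationException(
                "OTOPCUA_GALAXY_BACKEND must be stub, db, or mxaccess (got '" + backend + "')");

        return new GalaxyHostBootstrap(pipeName, allowedSid, secret, backend.ToLowerInvariant());
    }

    private static string Require(Func<string, string> getEnv, string name)
    {
        string value = getEnv(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException(name + " not configured");
        return value;
    }

    private static string OrDefault(Func<string, string> getEnv, string name, string fallback)
    {
        string value = getEnv(name);
        return string.IsNullOrWhiteSpace(value) ? fallback : value;
    }
}
```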

---

### GHX-003: Backend Lifecycle

The Host shall instantiate the STA pump + MXAccess backend + Galaxy Repository + optional Historian plugin in a defined order and tear them down cleanly on shutdown.

#### Acceptance Criteria

- Startup (mxaccess backend): initialize Serilog → resolve env vars → create `PipeServer` → start `StaPump` → create `MxAccessClient` on STA thread → initialize `GalaxyRepository` → optionally initialize Historian plugin → begin pipe request handling.
- Shutdown: stop pipe → dispose MxAccessClient (MXA-007 COM cleanup) → dispose STA pump → flush Serilog.
- Shutdown must complete within 30 seconds (Windows SCM timeout).
- `Console.CancelKeyPress` triggers the same sequence in console mode.

---

### GHX-004: Unhandled Exception Handling

The Host shall log Fatal on crash and let the supervisor restart it.

#### Acceptance Criteria

- `AppDomain.CurrentDomain.UnhandledException` handler logs Fatal with full exception details before termination.
- The supervisor's driver-stability policy (`docs/v2/driver-stability.md`) governs restart behavior — backoff, crash-loop detection, and alerting live there, not in the Host.
- Server-side: `Driver.Galaxy.Proxy` detects pipe disconnect, opens its capability circuit, reports Bad quality on Galaxy nodes; reconnects automatically when the Host is back.
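
The first GHX-004 criterion can be sketched as a small handler factory. This is an illustration under assumptions, not the Host's implementation: `CrashHandler` and `Create` are hypothetical names, the two delegates stand in for Serilog so the sketch has no package dependency, and flushing inside the handler is an assumption made so the Fatal entry reaches disk before the process exits. The handler only logs; it does not attempt any restart, which stays with the supervisor.

```csharp
using System;

// Hypothetical helper (not part of the real Host).
public static class CrashHandler
{
    // Builds the delegate to attach to AppDomain.CurrentDomain.UnhandledException.
    // With Serilog, wiring might look like:
    //   AppDomain.CurrentDomain.UnhandledException +=
    //       CrashHandler.Create((ex, msg) => Log.Fatal(ex, msg), Log.CloseAndFlush);
    public static UnhandledExceptionEventHandler Create(
        Action<Exception, string> logFatal, Action flush)
    {
        return (sender, e) =>
        {
            // ExceptionObject is typed as object, so it is not guaranteed to be an Exception.
            Exception ex = e.ExceptionObject as Exception
                ?? new InvalidOperationException("Non-exception object thrown: " + e.ExceptionObject);

            // Passing the exception itself lets the logger record type, message, and stack trace.
            logFatal(ex, "Unhandled exception in OtOpcUa.Galaxy.Host");

            // Assumption: flush here so the Fatal entry is written before termination.
            flush();
        };
    }
}
```

Injecting the logging delegates keeps the handler exercisable from a test by invoking it directly with an `UnhandledExceptionEventArgs` instance, which is difficult to do against a real process crash.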