Doc refresh (task #204) — operational docs for multi-process multi-driver OtOpcUa

Five operational docs rewritten for v2 (multi-process, multi-driver, Config-DB authoritative):

- docs/Configuration.md — replaced appsettings-only story with the two-layer model.
  appsettings.json is bootstrap only (Node identity, Config DB connection string,
  transport security, LDAP bind, logging). Authoritative config (clusters, namespaces,
  UNS, equipment, tags, driver instances, ACLs, role grants, poll groups) lives in
  the Config DB accessed via OtOpcUaConfigDbContext and edited through the Admin UI
  draft/publish workflow. Added v1-to-v2 migration index so operators can locate where
  each old section moved. Cross-links to docs/v2/config-db-schema.md + docs/v2/admin-ui.md.
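
A minimal sketch of the two-layer read path described above, under conventional EF Core assumptions (the "Node:Id" key, the ConfigDb connection-string name, the SQL Server provider, and the ClusterNodes DbSet are invented for illustration; only OtOpcUaConfigDbContext and the ClusterNode entity come from the repo):

```csharp
// Sketch, not the shipped bootstrap code: config keys, provider choice, and
// DbSet names are assumptions; OtOpcUaConfigDbContext/ClusterNode are real types.
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;

var bootstrap = new ConfigurationBuilder()
    .AddJsonFile("appsettings.json")   // bootstrap layer: identity, Config DB, transport, LDAP, logging
    .Build();

string nodeId       = bootstrap["Node:Id"]!;                      // assumed key
string configDbConn = bootstrap.GetConnectionString("ConfigDb")!; // assumed name

// Authoritative layer: clusters, namespaces, UNS, tags, ACLs, poll groups
// all come from the Config DB, never from appsettings.json.
var options = new DbContextOptionsBuilder<OtOpcUaConfigDbContext>()
    .UseSqlServer(configDbConn)        // provider is an assumption
    .Options;

await using var db = new OtOpcUaConfigDbContext(options);
var self = await db.ClusterNodes.SingleAsync(n => n.NodeId == nodeId);
```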

- docs/Redundancy.md — Phase 6.3 rewrite. Named every class under
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/: RedundancyCoordinator, RedundancyTopology,
  ApplyLeaseRegistry (publish fencing), PeerReachabilityTracker, RecoveryStateManager,
  ServiceLevelCalculator (pure function), RedundancyStatePublisher. Documented the
  full 11-band ServiceLevel matrix (Maintenance=0 through AuthoritativePrimary=255)
  from ServiceLevelCalculator.cs and the per-ClusterNode fields (RedundancyRole,
  ServiceLevelBase, ApplicationUri). Covered metrics
  (otopcua.redundancy.role_transition counter + primary/secondary/stale_count gauges
  on meter ZB.MOM.WW.OtOpcUa.Redundancy) and SignalR RoleChanged push from
  FleetStatusPoller to RedundancyTab.razor.
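
Since ServiceLevelCalculator is called out as a pure function, a toy sketch of that shape; only the band endpoints Maintenance=0 and AuthoritativePrimary=255 come from the doc, and the inputs and pass-through band here are invented:

```csharp
// Toy sketch of a pure service-level calculation (not the 11-band matrix in
// ServiceLevelCalculator.cs; only the two named band values are from the doc).
public enum RedundancyRole { Primary, Secondary }

public static class ServiceLevelSketch
{
    public const byte Maintenance = 0;            // documented floor band
    public const byte AuthoritativePrimary = 255; // documented ceiling band

    // Pure: same inputs always produce the same byte, so the matrix can be
    // unit-tested without touching RedundancyCoordinator state.
    public static byte Calculate(RedundancyRole role, bool inMaintenance, byte serviceLevelBase) =>
        inMaintenance ? Maintenance
        : role == RedundancyRole.Primary && serviceLevelBase == AuthoritativePrimary
            ? AuthoritativePrimary
            : serviceLevelBase; // intermediate bands elided in this sketch
}
```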

- docs/security.md — preserved the transport-security section (still accurate) and
  added Phase 6.2 authorization. Four concerns now documented in one place:
  (1) transport security profiles;
  (2) OPC UA auth via LdapUserAuthenticator (note: the task spec called this
      LdapAuthenticationProvider — the actual class name is LdapUserAuthenticator
      in Server/Security/);
  (3) data-plane authorization via NodeAcl + PermissionTrie + AuthorizationGate —
      additive-only model per decision #129, ClusterId → Namespace → UnsArea →
      UnsLine → Equipment → Tag hierarchy, NodePermissions bundle,
      PermissionProbeService in Admin for "probe this permission" (see the sketch
      below);
  (4) control-plane authorization via LdapGroupRoleMapping + AdminRole
      (ConfigViewer / ConfigEditor / FleetAdmin, CanEdit / CanPublish policies) —
      deliberately independent of data-plane ACLs per decision #150.
  Also documented the OTOPCUA0001 Roslyn analyzer (UnwrappedCapabilityCallAnalyzer)
  as the compile-time guard ensuring every driver-capability async call is wrapped
  by CapabilityInvoker.
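
A hedged sketch of what an additive-only trie walk can look like over that hierarchy; besides the PermissionTrie/NodePermissions names, every member here is invented for illustration:

```csharp
// Illustrative trie walk; not the Core/Authorization implementation.
using System;
using System.Collections.Generic;

[Flags]
public enum NodePermissions { None = 0, Read = 1, Write = 2, Subscribe = 4 }

public sealed class PermissionTrieSketch
{
    // One level per documented hierarchy segment:
    // ClusterId → Namespace → UnsArea → UnsLine → Equipment → Tag.
    private readonly Dictionary<string, PermissionTrieSketch> _children = new();
    private NodePermissions _granted; // grants attached at this level

    public void Grant(string[] path, NodePermissions perms)
    {
        var node = this;
        foreach (var segment in path)
            node = node._children.TryGetValue(segment, out var child)
                ? child
                : node._children[segment] = new PermissionTrieSketch();
        node._granted |= perms;
    }

    // Additive-only (decision #129): permissions accumulate down the path and
    // there is no deny entry that can revoke a grant made higher up.
    public NodePermissions Evaluate(string[] path)
    {
        var node = this;
        var effective = node._granted;
        foreach (var segment in path)
        {
            if (!node._children.TryGetValue(segment, out var child)) break;
            node = child;
            effective |= node._granted;
        }
        return effective;
    }
}
```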

- docs/ServiceHosting.md — three-process rewrite: OtOpcUa Server (net10 x64,
  BackgroundService + AddWindowsService, hosts OPC UA endpoint + all non-Galaxy
  drivers), OtOpcUa Admin (net10 x64, Blazor Server + SignalR + /metrics via
  OpenTelemetry Prometheus exporter), OtOpcUa Galaxy.Host (.NET Framework 4.8 x86,
  NSSM-wrapped, env-variable driven, STA thread + MXAccess COM). Captured from
  feedback memory: the pipe ACL's denies-Admins detail and the non-elevated-shell
  requirement.
  Divergence from CLAUDE.md: task spec said "TopShelf is still the service-installer
  wrapper per CLAUDE.md note" but no csproj in the repo references TopShelf — decision
  #30 replaced it with the generic host's AddWindowsService wrapper (per the doc
  comment on OpcUaServerService). Reflected the actual state + flagged this divergence
  here so someone can update CLAUDE.md separately.
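
For reference, the generic-host pattern from decision #30 looks roughly like this (the service name and options are assumptions; OpcUaServerService is the repo's hosted service):

```csharp
// Minimal sketch of the AddWindowsService wiring the doc describes; the
// ServiceName value is illustrative.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// AddWindowsService integrates with the SCM when running as a service and is
// a no-op under a console, so one binary serves both modes (no TopShelf).
builder.Services.AddWindowsService(o => o.ServiceName = "OtOpcUaServer");
builder.Services.AddHostedService<OpcUaServerService>(); // a BackgroundService

await builder.Build().RunAsync();
```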

- docs/StatusDashboard.md — replaced the full v1 reference (dashboard endpoints,
  health check rules, StatusData DTO, etc.) with a short "superseded by Admin UI"
  pointer that preserves git-blame continuity + avoids broken links from other docs
  that reference it.

Class references verified by reading:
  src/ZB.MOM.WW.OtOpcUa.Server/Redundancy/{RedundancyCoordinator, ServiceLevelCalculator,
      ApplyLeaseRegistry, RedundancyStatePublisher}.cs
  src/ZB.MOM.WW.OtOpcUa.Core/Authorization/{PermissionTrie, PermissionTrieBuilder,
      PermissionTrieCache, TriePermissionEvaluator, AuthorizationGate}.cs
  src/ZB.MOM.WW.OtOpcUa.Server/Security/{AuthorizationGate, LdapUserAuthenticator}.cs
  src/ZB.MOM.WW.OtOpcUa.Admin/{Program.cs, Services/AdminRoles.cs,
      Services/RedundancyMetrics.cs, Hubs/FleetStatusPoller.cs}
  src/ZB.MOM.WW.OtOpcUa.Server/Program.cs + appsettings.json
  src/ZB.MOM.WW.OtOpcUa.Driver.Galaxy.Host/{Program.cs, Ipc/PipeServer.cs}
  src/ZB.MOM.WW.OtOpcUa.Configuration/Entities/{ClusterNode, NodeAcl,
      LdapGroupRoleMapping}.cs
  src/ZB.MOM.WW.OtOpcUa.Analyzers/UnwrappedCapabilityCallAnalyzer.cs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joseph Doherty
2026-04-20 01:30:56 -04:00
parent 71339307fa
commit 5506b43ddc
5 changed files with 509 additions and 1236 deletions

docs/StatusDashboard.md

@@ -1,274 +1,16 @@
New file (16 lines):

# Status Dashboard — Superseded

This document has been superseded.

The single-process, HTTP-listener "Status Dashboard" (`StatusWebServer` bound to port 8081) belonged to v1 LmxOpcUa, where one process owned the OPC UA endpoint, the MXAccess bridge, and the operator surface. In the multi-process OtOpcUa platform the operator surface has moved into the **OtOpcUa Admin** app — a Blazor Server UI that talks to the shared Config DB and to every deployed node over SignalR (`FleetStatusHub`, `AlertHub`). Prometheus scraping lives on the Admin app's `/metrics` endpoint via OpenTelemetry (`Metrics:Prometheus:Enabled`).

Operator surfaces now covered by the Admin UI:

- Fleet health, per-node role/ServiceLevel, crash-loop detection (`Fleet.razor`, `Hosts.razor`, `FleetStatusPoller`)
- Redundancy state + role transitions (`RedundancyMetrics`, `otopcua.redundancy.*`)
- Cluster + node + credential management (`ClusterService`, `ClusterNodeService`)
- Draft/publish generation editor, diff viewer, CSV import, UnsTab, IdentificationFields, RedundancyTab, AclsTab with Probe-this-permission
- Certificate trust management (`CertTrustService` promotes rejected client certs to trusted)
- Audit log viewer (`AuditLogService`)

See [`docs/v2/admin-ui.md`](v2/admin-ui.md) for the current operator surface and [`docs/ServiceHosting.md`](ServiceHosting.md) for the three-process layout.

Removed v1 content (274 lines):

# Status Dashboard

## Overview

The service hosts an embedded HTTP status dashboard that surfaces real-time health, connection state, subscription counts, data change throughput, and Galaxy metadata. Operators access it through a browser to verify the bridge is functioning without needing an OPC UA client. The dashboard is enabled by default on port 8081 and can be disabled via configuration.

## HTTP Server

`StatusWebServer` wraps a `System.Net.HttpListener` bound to `http://+:{port}/`. It starts a background task that accepts requests in a loop and dispatches them by path. Only `GET` requests are accepted; all other methods return `405 Method Not Allowed`. Responses include `Cache-Control: no-cache` headers to prevent stale data in the browser.
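
For orientation, a reconstructed sketch of that accept loop, based on the description above rather than the actual StatusWebServer source:

```csharp
// Sketch of the documented v1 contract only: GET-only, Cache-Control: no-cache,
// 405 for other methods, 404 for unknown paths. Not the shipped code.
using System.Net;
using System.Text;

var listener = new HttpListener();
listener.Prefixes.Add("http://+:8081/");   // default dashboard port
listener.Start();

while (listener.IsListening)
{
    var ctx = await listener.GetContextAsync();
    ctx.Response.AddHeader("Cache-Control", "no-cache");  // prevent stale data in the browser
    if (ctx.Request.HttpMethod != "GET")
    {
        ctx.Response.StatusCode = 405;                    // all non-GET methods rejected
    }
    else if (ctx.Request.Url?.AbsolutePath == "/api/status")
    {
        ctx.Response.ContentType = "application/json";
        byte[] body = Encoding.UTF8.GetBytes("{\"state\":\"Connected\"}"); // snapshot elided
        await ctx.Response.OutputStream.WriteAsync(body);
    }
    else
    {
        ctx.Response.StatusCode = 404;                    // any other path
    }
    ctx.Response.Close();
}
```
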
### Endpoints
| Path | Content-Type | Description |
|------|-------------|-------------|
| `/` | `text/html` | Operator dashboard with auto-refresh |
| `/health` | `text/html` | Focused health page with service-level badge and component cards |
| `/api/status` | `application/json` | Full status snapshot as JSON (`StatusData`) |
| `/api/health` | `application/json` | Health endpoint (`HealthEndpointData`) -- returns `503` when status is `Unhealthy`, `200` otherwise |
Any other path returns `404 Not Found`.
## Health Check Logic
`HealthCheckService.CheckHealth` evaluates bridge health using the following rules applied in order. The first rule that matches wins; rules 2b, 2c, 2d, and 2e only fire when the corresponding integration is enabled and a non-null snapshot is passed:
1. **Rule 1 -- Unhealthy**: MXAccess connection state is not `Connected`. Returns a red banner with the current state.
2. **Rule 2b -- Degraded**: `Historian.Enabled=true` but the plugin load outcome is not `Loaded`. Returns a yellow banner citing the plugin status (`NotFound`, `LoadFailed`) and the error message if one is available.
3. **Rule 2 / 2c -- Degraded**: Any recorded operation has a low success rate. The sample threshold depends on the operation category:
- Regular operations (`Read`, `Write`, `Subscribe`, `AlarmAcknowledge`): >100 invocations and <50% success rate.
- Historian operations (`HistoryReadRaw`, `HistoryReadProcessed`, `HistoryReadAtTime`, `HistoryReadEvents`): >10 invocations and <50% success rate. The lower threshold surfaces a stuck historian quickly, since history reads are rare relative to live reads.
4. **Rule 2d -- Degraded (latched)**: `AlarmTrackingEnabled=true` and any alarm acknowledge MXAccess write has failed since startup. Latched on purpose -- an ack write failure is a durable MXAccess write problem that should stay visible until the operator restarts.
5. **Rule 2e -- Degraded**: `RuntimeStatus.StoppedCount > 0` -- at least one Galaxy runtime host (`$WinPlatform` / `$AppEngine`) is currently reported Stopped by the runtime probe manager. The rule names the stopped hosts in the message. Ordered after Rule 1 so an MxAccess transport outage stays `Unhealthy` via Rule 1 and this rule never double-messages; the probe manager also forces every entry to `Unknown` when the transport is disconnected, so the `StoppedCount` is always 0 in that case.
6. **Rule 3 -- Healthy**: All checks pass. Returns a green banner with "All systems operational."
The `/api/health` endpoint returns `200` for both Healthy and Degraded states, and `503` only for Unhealthy. This allows load balancers or monitoring tools to distinguish between a service that is running but degraded and one that has lost its runtime connection.
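
The first-match-wins ordering reads as a straight cascade. A simplified sketch (the success-rate rules 2/2c are elided and the component snapshots are flattened into scalars for illustration):

```csharp
// Sketch of the rule cascade; the real CheckHealth takes component snapshots.
public record Health(string Status, string Message, string Color);

public static class HealthRulesSketch
{
    public static Health CheckHealth(
        bool mxConnected, bool historianEnabled, bool historianLoaded,
        long ackWriteFailures, int stoppedHostCount)
    {
        // Rule 1 (Unhealthy): transport outage trumps everything else.
        if (!mxConnected)
            return new("Unhealthy", "MXAccess connection state is not Connected", "red");

        // Rule 2b (Degraded): historian enabled but plugin not Loaded.
        if (historianEnabled && !historianLoaded)
            return new("Degraded", "Historian plugin NotFound/LoadFailed", "yellow");

        // (Rules 2/2c, low success rate, elided in this sketch.)

        // Rule 2d (Degraded, latched): any ack write failure since startup.
        if (ackWriteFailures > 0)
            return new("Degraded", "Alarm ack MXAccess writes have failed", "yellow");

        // Rule 2e (Degraded): ordered after Rule 1, so a transport outage never
        // double-messages; StoppedCount is forced to 0 while disconnected.
        if (stoppedHostCount > 0)
            return new("Degraded", $"{stoppedHostCount} runtime host(s) Stopped", "yellow");

        // Rule 3 (Healthy).
        return new("Healthy", "All systems operational", "green");
    }
}
```
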
## Status Data Model
`StatusReportService` aggregates data from all bridge components into a `StatusData` DTO, which is then rendered as HTML or serialized to JSON. The DTO contains the following sections:
### Connection
| Field | Type | Description |
|-------|------|-------------|
| `State` | `string` | Current MXAccess connection state (Connected, Disconnected, Connecting) |
| `ReconnectCount` | `int` | Number of reconnect attempts since startup |
| `ActiveSessions` | `int` | Number of active OPC UA client sessions |
### Health
| Field | Type | Description |
|-------|------|-------------|
| `Status` | `string` | Healthy, Degraded, or Unhealthy |
| `Message` | `string` | Operator-facing explanation |
| `Color` | `string` | CSS color token (green, yellow, red, gray) |
### Subscriptions
| Field | Type | Description |
|-------|------|-------------|
| `ActiveCount` | `int` | Number of active MXAccess tag subscriptions (includes bridge-owned runtime status probes — see `ProbeCount`) |
| `ProbeCount` | `int` | Subset of `ActiveCount` attributable to bridge-owned runtime status probes (`<Host>.ScanState` per deployed `$WinPlatform` / `$AppEngine`). Rendered as a separate `Probes: N (bridge-owned runtime status)` line on the dashboard so operators can distinguish probe overhead from client-driven subscription load |
### Galaxy
| Field | Type | Description |
|-------|------|-------------|
| `GalaxyName` | `string` | Name of the Galaxy being bridged |
| `DbConnected` | `bool` | Whether the Galaxy repository database is reachable |
| `LastDeployTime` | `DateTime?` | Most recent deploy timestamp from the Galaxy |
| `ObjectCount` | `int` | Number of Galaxy objects in the address space |
| `AttributeCount` | `int` | Number of Galaxy attributes as OPC UA variables |
| `LastRebuildTime` | `DateTime?` | UTC timestamp of the last completed address-space rebuild |
### Data change
| Field | Type | Description |
|-------|------|-------------|
| `EventsPerSecond` | `double` | Rate of MXAccess data change events per second |
| `AvgBatchSize` | `double` | Average items processed per dispatch cycle |
| `PendingItems` | `int` | Items waiting in the dispatch queue |
| `TotalEvents` | `long` | Total MXAccess data change events since startup |
### Galaxy Runtime
Populated from the `GalaxyRuntimeProbeManager` that advises `<Host>.ScanState` on every deployed `$WinPlatform` and `$AppEngine`. See [MXAccess Bridge](MxAccessBridge.md#per-host-runtime-status-probes-hostscanstate) for the probe machinery, state machine, and the subtree quality invalidation that fires on transitions. Disabled when `MxAccess.RuntimeStatusProbesEnabled = false`; the panel is suppressed entirely from the HTML when `Total == 0`.
| Field | Type | Description |
|-------|------|-------------|
| `Total` | `int` | Number of runtime hosts tracked (Platforms + AppEngines) |
| `RunningCount` | `int` | Hosts whose last probe callback reported `ScanState = true` with Good quality |
| `StoppedCount` | `int` | Hosts whose last probe callback reported `ScanState != true` or a failed item status, or whose initial probe timed out in Unknown state |
| `UnknownCount` | `int` | Hosts still awaiting initial probe resolution, or rewritten to Unknown when the MxAccess transport is Disconnected |
| `Hosts` | `List<GalaxyRuntimeStatus>` | Per-host detail rows, sorted alphabetically by `ObjectName` |
Each `GalaxyRuntimeStatus` entry:
| Field | Type | Description |
|-------|------|-------------|
| `ObjectName` | `string` | Galaxy `tag_name` of the host (e.g., `DevPlatform`, `DevAppEngine`) |
| `GobjectId` | `int` | Galaxy `gobject_id` of the host |
| `Kind` | `string` | `$WinPlatform` or `$AppEngine` |
| `State` | `enum` | `Unknown`, `Running`, or `Stopped` |
| `LastStateCallbackTime` | `DateTime?` | UTC time of the most recent probe callback, whether good or bad |
| `LastStateChangeTime` | `DateTime?` | UTC time of the most recent Running↔Stopped transition; backs the dashboard "Since" column |
| `LastScanState` | `bool?` | Last `ScanState` value received; `null` before the first callback |
| `LastError` | `string?` | Detail message from the most recent failure callback (e.g., `"ScanState = false (OffScan)"`); cleared on successful recovery |
| `GoodUpdateCount` | `long` | Cumulative count of `ScanState = true` callbacks |
| `FailureCount` | `long` | Cumulative count of `ScanState != true` callbacks or failed item statuses |
The HTML panel renders a per-host table with Name / Kind / State / Since / Last Error columns. Panel color reflects aggregate state: green when every host is `Running`, yellow when any host is `Unknown` with zero `Stopped`, red when any host is `Stopped`, gray when the MxAccess transport is disconnected (the Connection panel is the primary signal in that case and every row is force-rewritten to `Unknown`).
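
The aggregate color rule stated above is a small pure function; a sketch with illustrative names:

```csharp
// Sketch of the panel-color aggregation described in the paragraph above.
static string PanelColor(bool transportConnected, int stoppedCount, int unknownCount) =>
    !transportConnected ? "gray"    // Connection panel is the primary signal
    : stoppedCount > 0  ? "red"     // any host Stopped
    : unknownCount > 0  ? "yellow"  // Unknown present with zero Stopped
    : "green";                      // every host Running
```
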
### Operations
A dictionary of `MetricsStatistics` keyed by operation name. Each entry contains:
- `TotalCount` -- total invocations
- `SuccessRate` -- fraction of successful operations
- `AverageMilliseconds`, `MinMilliseconds`, `MaxMilliseconds`, `Percentile95Milliseconds` -- latency distribution
The instrumented operation names are:
| Name | Source |
|---|---|
| `Read` | MXAccess live tag reads (`MxAccessClient.ReadWrite.cs`) |
| `Write` | MXAccess live tag writes |
| `Subscribe` | MXAccess subscription attach |
| `HistoryReadRaw` | `LmxNodeManager.HistoryReadRawModified` -> historian plugin |
| `HistoryReadProcessed` | `LmxNodeManager.HistoryReadProcessed` -> historian plugin (aggregates) |
| `HistoryReadAtTime` | `LmxNodeManager.HistoryReadAtTime` -> historian plugin (interpolated) |
| `HistoryReadEvents` | `LmxNodeManager.HistoryReadEvents` -> historian plugin (alarm/event history) |
| `AlarmAcknowledge` | `LmxNodeManager.OnAlarmAcknowledge` -> MXAccess AckMsg write |
New operation names are auto-registered on first use, so the `Operations` dictionary only contains entries for features that have actually been exercised since startup.
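
One plausible shape for that first-use auto-registration, using a concurrent map; a sketch, not the actual Metrics implementation:

```csharp
// Sketch of first-use registration; latency percentiles are elided.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class MetricsStatistics
{
    private long _total, _successes;

    public void Record(bool success, double elapsedMs)
    {
        Interlocked.Increment(ref _total);
        if (success) Interlocked.Increment(ref _successes);
        // latency distribution tracking elided in this sketch
    }

    public long TotalCount => Interlocked.Read(ref _total);
    public double SuccessRate =>
        TotalCount == 0 ? 1.0 : (double)Interlocked.Read(ref _successes) / TotalCount;
}

public sealed class OperationMetricsSketch
{
    private readonly ConcurrentDictionary<string, MetricsStatistics> _ops = new();

    // GetOrAdd means an operation appears in the dictionary only once it has
    // actually been exercised, matching the behavior described above.
    public void Record(string operation, bool success, double elapsedMs) =>
        _ops.GetOrAdd(operation, _ => new MetricsStatistics()).Record(success, elapsedMs);

    public IReadOnlyDictionary<string, MetricsStatistics> Snapshot() => _ops;
}
```
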
### Historian
`HistorianStatusInfo` -- reflects the outcome of the runtime-loaded historian plugin and the runtime query-health counters. See [Historical Data Access](HistoricalDataAccess.md) for the plugin architecture and the [Runtime Health Counters](HistoricalDataAccess.md#runtime-health-counters) section for the data source instrumentation.
| Field | Type | Description |
|-------|------|-------------|
| `Enabled` | `bool` | Whether `Historian.Enabled` is set in configuration |
| `PluginStatus` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` — load-time outcome from `HistorianPluginLoader.LastOutcome` |
| `PluginError` | `string?` | Exception message from the last load attempt when `PluginStatus=LoadFailed`; otherwise `null` |
| `PluginPath` | `string` | Absolute path the loader probed for the plugin assembly |
| `ServerName` | `string` | Legacy single-node hostname from `Historian.ServerName`; ignored when `ServerNames` is non-empty |
| `Port` | `int` | Configured historian TCP port |
| `QueryTotal` | `long` | Total historian read queries attempted since startup (raw + aggregate + at-time + events) |
| `QuerySuccesses` | `long` | Queries that completed without an exception |
| `QueryFailures` | `long` | Queries that raised an exception — each failure also triggers the plugin's reconnect path |
| `ConsecutiveFailures` | `int` | Failures since the last success. Resets to zero on any successful query. Drives the `Degraded` health rule at threshold 3 |
| `LastSuccessTime` | `DateTime?` | UTC timestamp of the most recent successful query, or `null` when no query has succeeded since startup |
| `LastFailureTime` | `DateTime?` | UTC timestamp of the most recent failure |
| `LastQueryError` | `string?` | Exception message from the most recent failure. Prefixed with the read-path name (`raw:`, `aggregate:`, `at-time:`, `events:`) so operators can tell which SDK call failed |
| `ProcessConnectionOpen` | `bool` | Whether the plugin currently holds an open SDK connection for the **process** silo (historical value queries — `ReadRaw`, `ReadAggregate`, `ReadAtTime`). See [Two SDK connection silos](HistoricalDataAccess.md#two-sdk-connection-silos) |
| `EventConnectionOpen` | `bool` | Whether the plugin currently holds an open SDK connection for the **event** silo (alarm history queries — `ReadEvents`). Separate from the process connection because the SDK requires distinct query channels |
| `ActiveProcessNode` | `string?` | Cluster node currently serving the process silo, or `null` when no process connection is open |
| `ActiveEventNode` | `string?` | Cluster node currently serving the event silo, or `null` when no event connection is open |
| `NodeCount` | `int` | Total configured historian cluster nodes. 1 for a legacy single-node deployment |
| `HealthyNodeCount` | `int` | Nodes currently eligible for new connections (not in failure cooldown) |
| `Nodes` | `List<HistorianClusterNodeState>` | Per-node cluster state in configuration order. Each entry carries `Name`, `IsHealthy`, `CooldownUntil`, `FailureCount`, `LastError`, `LastFailureTime` |
The operator dashboard renders a cluster table inside the Historian panel when `NodeCount > 1`. Legacy single-node deployments render a compact `Node: <hostname>` line and no table. Panel color reflects combined load-time + runtime health: green when everything is fine, yellow when any cluster node is in cooldown or 1-4 consecutive query failures are accumulated, red when the plugin is unloaded / all cluster nodes are failed / 5+ consecutive failures.
### Alarms
`AlarmStatusInfo` -- surfaces alarm-condition tracking health and dispatch counters.
| Field | Type | Description |
|-------|------|-------------|
| `TrackingEnabled` | `bool` | Whether `OpcUa.AlarmTrackingEnabled` is set in configuration |
| `ConditionCount` | `int` | Number of distinct alarm conditions currently tracked |
| `ActiveAlarmCount` | `int` | Number of alarms currently in the `InAlarm=true` state |
| `TransitionCount` | `long` | Total `InAlarm` transitions observed in the dispatch loop since startup |
| `AckEventCount` | `long` | Total alarm acknowledgement transitions observed since startup |
| `AckWriteFailures` | `long` | Total MXAccess AckMsg writes that have failed while processing alarm acknowledges. Any non-zero value latches the service into Degraded (see Rule 2d). |
| `FilterEnabled` | `bool` | Whether `OpcUa.AlarmFilter.ObjectFilters` has any patterns configured |
| `FilterPatternCount` | `int` | Number of compiled filter patterns (after comma-splitting and trimming) |
| `FilterIncludedObjectCount` | `int` | Number of Galaxy objects included by the filter during the most recent address-space build. Zero when the filter is disabled. |
When the filter is active, the operator dashboard's Alarms panel renders an extra line `Filter: N pattern(s), M object(s) included` so operators can verify scope at a glance. See [Alarm Tracking](AlarmTracking.md#template-based-alarm-object-filter) for the matching rules and resolution algorithm.
### Redundancy
`RedundancyInfo` -- only populated when `Redundancy.Enabled=true` in configuration. Shows mode, role, computed service level, application URI, and the set of peer server URIs. See [Redundancy](Redundancy.md) for the full guide.
### Footer
| Field | Type | Description |
|-------|------|-------------|
| `Timestamp` | `DateTime` | UTC time when the snapshot was generated |
| `Version` | `string` | Service assembly version |
## `/api/health` Payload
The health endpoint returns a `HealthEndpointData` document distinct from the full dashboard snapshot. It is designed for load balancers and external monitoring probes that only need an up/down signal plus component-level detail:
| Field | Type | Description |
|-------|------|-------------|
| `Status` | `string` | `Healthy`, `Degraded`, or `Unhealthy` (drives the HTTP status code) |
| `ServiceLevel` | `byte` | OPC UA-style 0-255 service level. 255 when healthy non-redundant; 0 when MXAccess is down; redundancy-adjusted otherwise |
| `RedundancyEnabled` | `bool` | Whether redundancy is configured |
| `RedundancyRole` | `string?` | `Primary` or `Secondary` when redundancy is enabled; `null` otherwise |
| `RedundancyMode` | `string?` | `Warm` or `Hot` when redundancy is enabled; `null` otherwise |
| `Components.MxAccess` | `string` | `Connected` or `Disconnected` |
| `Components.Database` | `string` | `Connected` or `Disconnected` |
| `Components.OpcUaServer` | `string` | `Running` or `Stopped` |
| `Components.Historian` | `string` | `Disabled`, `NotFound`, `LoadFailed`, or `Loaded` -- matches `HistorianStatusInfo.PluginStatus` |
| `Components.Alarms` | `string` | `Disabled` or `Enabled` -- mirrors `OpcUa.AlarmTrackingEnabled` |
| `Uptime` | `string` | Formatted service uptime (e.g., `3d 5h 20m`) |
| `Timestamp` | `DateTime` | UTC time the snapshot was generated |
Monitoring tools should:
- Alert on `Status=Unhealthy` (HTTP 503) for hard outages.
- Alert on `Status=Degraded` (HTTP 200) for latched or cumulative failures -- a degraded status means the server is still operating but a subsystem needs attention (historian plugin missing, alarm ack writes failing, history read error rate too high, etc.).
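
A probe honoring that 200-vs-503 contract might look like the following sketch (the host, port, and alert sinks are placeholders):

```csharp
// Sketch of an external monitoring probe; the endpoint host and the alert
// actions are placeholders, not part of the product.
using System;
using System.Net.Http;
using System.Text.Json;

using var http = new HttpClient();
var resp = await http.GetAsync("http://bridge-host:8081/api/health");
using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
string status = doc.RootElement.GetProperty("Status").GetString() ?? "Unknown";

if ((int)resp.StatusCode == 503)            // Unhealthy: hard outage
    Console.WriteLine($"PAGE: {status}");
else if (status == "Degraded")              // 200, but a subsystem needs attention
    Console.WriteLine("TICKET: Degraded; check /api/health Components");
```
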
## HTML Dashboards
### `/` -- Operator dashboard
Monospace, dark background, color-coded panels. Panels: Connection, Health, Redundancy (when enabled), Subscriptions, Data Change Dispatch, Galaxy Info, **Historian**, **Alarms**, Operations (table), Footer. Each panel border color reflects component state (green, yellow, red, or gray).
The page includes a `<meta http-equiv='refresh'>` tag set to the configured `RefreshIntervalSeconds` (default 10 seconds), so the browser polls automatically without JavaScript.
### `/health` -- Focused health view
Large status badge, computed `ServiceLevel` value, redundancy summary (when enabled), and a row of component cards: MXAccess, Galaxy Database, OPC UA Server, **Historian**, **Alarm Tracking**. Each card turns red when its component is in a failure state and grey when disabled. Best for wallboards and quick at-a-glance monitoring.
## Configuration
The dashboard is configured through the `Dashboard` section in `appsettings.json`:
```json
{
  "Dashboard": {
    "Enabled": true,
    "Port": 8081,
    "RefreshIntervalSeconds": 10
  }
}
```
Setting `Enabled` to `false` prevents the `StatusWebServer` from starting. The `StatusReportService` is still created so that other components can query health programmatically, but no HTTP listener is opened.
### Dashboard start failures are non-fatal
If the dashboard is enabled but the configured port is already bound (e.g., a previous instance did not clean up, another service is squatting on the port, or the user lacks URL-reservation rights), `StatusWebServer.Start()` logs the listener exception at Error level and returns `false`. `OpcUaService` then logs a Warning, disposes the unstarted instance, sets `DashboardStartFailed = true`, and continues in degraded mode — the OPC UA endpoint still starts. Operators can detect the failure by searching the service log for:
```
[WRN] Status dashboard failed to bind on port {Port}; service continues without dashboard
```
Stability review 2026-04-13 Finding 2.
## Component Wiring
`StatusReportService` is initialized after all other service components are created. `OpcUaService.Start()` calls `SetComponents()` to supply the live references, including the historian configuration so the dashboard can label the plugin target and evaluate Rule 2b:
```csharp
StatusReportInstance.SetComponents(
    effectiveMxClient,
    Metrics,
    GalaxyStatsInstance,
    ServerHost,
    NodeManagerInstance,
    _config.Redundancy,
    _config.OpcUa.ApplicationUri,
    _config.Historian);
```
This deferred wiring allows the report service to be constructed before the MXAccess client or node manager are fully initialized. If a component is `null`, the report service falls back to default values (e.g., `ConnectionState.Disconnected`, zero counts, `HistorianPluginStatus.Disabled`).
The historian plugin status is sourced from `HistorianPluginLoader.LastOutcome`, which is updated on every load attempt. `OpcUaService` explicitly calls `HistorianPluginLoader.MarkDisabled()` when `Historian.Enabled=false` so the dashboard can distinguish "feature off" from "load failed" without ambiguity.
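
A sketch of that null-tolerant fallback; the interface and members are invented for illustration, and only the Disconnected default comes from the text above:

```csharp
// Sketch of the deferred-wiring fallback; IMxClient and these members are
// illustrative, only the default value is from the doc.
public enum ConnectionState { Disconnected, Connecting, Connected }

public interface IMxClient { ConnectionState State { get; } }

public sealed class StatusReportSketch
{
    private IMxClient? _mxClient;   // null until OpcUaService.Start() calls SetComponents

    public void SetComponents(IMxClient mxClient) => _mxClient = mxClient;

    // Safe default when construction outran component wiring.
    public ConnectionState Connection => _mxClient?.State ?? ConnectionState.Disconnected;
}
```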