lmxopcua/docs/reqs/HighLevelReqs.md
Joseph Doherty 3d982d9a65 docs: sync against recent code changes
Five doc-content updates after this session's code-review resolution
sweep. No code touched; pure documentation drift correction.

1. docs/reqs/HighLevelReqs.md (HLR-007 — Service Hosting):
   Refreshed the deployment description from "three cooperating
   processes (Server, Admin, Galaxy.Host)" to "two cooperating
   Windows services (Server, Admin)". The legacy x86 TopShelf
   Galaxy.Host process was retired in PR 7.2 (2026-04-30); Galaxy
   access now flows through the in-process Tier-A GalaxyDriver
   talking gRPC to the sibling mxaccessgw gateway. Also called out
   decision #30 (AddWindowsService replacing TopShelf) inline.

2. docs/VirtualTags.md:
   - Line 9: "compiled via Microsoft.CodeAnalysis.CSharp.Scripting"
     replaced with the current pipeline (Microsoft.CodeAnalysis.CSharp
     regular compiler — Core.Scripting-008 / -016 retired the
     CSharpScript/ScriptRunner path).
   - Line 39: orphan-thread leak description rewritten. The
     CSharp.Scripting-era "underlying ScriptRunner keeps running on
     its thread-pool thread until the Roslyn runtime returns" is no
     longer accurate — the new pipeline binds the script as a
     regular C# Func<> delegate, so the leak is now "synchronous
     CPU-bound work on a pool thread" (same operator-visible
     effect, different mechanism).

3. docs/v2/plan.md decision #29 ("Galaxy Host is a separate Windows
   service"):
   Annotated both the decision body and the decision-log table row
   with "Reversed PR 7.2, 2026-04-30" + a one-line summary of the
   replacement architecture. The original reasoning is preserved as
   audit trail per the decision-log convention.

4. docs/v2/implementation/phase-7-scripting-and-alarming.md A.1:
   Added an Implementation note describing the
   Core.Scripting-008 / -016 supersession of the original
   CSharpScript pipeline. The historical record stays; the note
   points future readers at docs/VirtualTags.md "Compile cache"
   for the current contract.

5. docs/plans/alarms-over-gateway.md "Files" section under client
   regeneration:
   Updated the .NET regeneration instructions to point at the new
   ZB.MOM.WW.MxGateway.Contracts.csproj path. The old
   clients/dotnet/MxGateway.Client.csproj no longer exists in the
   sibling repo (restructure after this plan was written) and the
   vendored-binaries situation in
   src/Drivers/ZB.MOM.WW.OtOpcUa.Driver.Galaxy/libs/ is called out
   so a reader following the plan won't chase a deleted path.

Verification: grep against docs/ for the pre-fix wordings ("three
cooperating processes", "Galaxy.Host (TopShelf)", "ScriptRunner",
the wrong BadDeviceFailure hex code 0x80550000) returns no hits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 18:57:04 -04:00

High-Level Requirements

Revision: Refreshed 2026-05-23 for the OtOpcUa v2 multi-driver platform. The original 2025 text described a single-process Galaxy/MXAccess server called LmxOpcUa. Today the project is the OtOpcUa multi-driver OPC UA platform, deployed as two cooperating processes (Server, Admin). The Galaxy integration is one of the eight drivers listed in HLR-002 and is now an in-process Tier-A driver that talks gRPC to a separately installed mxaccessgw gateway (sibling repo); PR 7.2 (2026-04-30) retired the legacy out-of-process Galaxy.Host Windows service. HLR-001 through HLR-008 have been rewritten driver-agnostically; HLR-009 has been retired (the embedded Status Dashboard is superseded by the Admin UI). HLR-010 through HLR-018 cover plug-in drivers, resilience, Config DB / draft-publish, cluster redundancy, fleet-wide identifier uniqueness, Admin UI, audit logging, metrics, and the Roslyn capability-wrapping analyzer.

HLR-001: OPC UA Server

The system shall expose an OPC UA server endpoint that OPC UA clients can connect to for browsing, reading, writing, subscribing, acknowledging alarms, and reading historical values. Data is sourced from one or more driver instances that plug into the common core; OPC UA clients see a single unified address space per endpoint regardless of how many drivers are active behind it.

HLR-002: Multi-Driver Plug-In Model

The system shall support pluggable driver modules that bind to specific data sources. v2.0 ships eight drivers: Galaxy (AVEVA System Platform via MXAccess), Modbus TCP (including DL205 via AddressFormat=DL205), Allen-Bradley CIP (ControlLogix/CompactLogix), Allen-Bradley Legacy (SLC/MicroLogix via PCCC), Siemens S7, Beckhoff TwinCAT (ADS), FANUC FOCAS, and OPC UA Client (aggregation/gateway). Each driver implements only those capability interfaces that apply to its protocol. The interfaces (IDriver, ITagDiscovery, IReadable, IWritable, ISubscribable, IAlarmSource, IHistoryProvider, IHostConnectivityProbe, IPerCallHostResolver, IRediscoverable) are defined in ZB.MOM.WW.OtOpcUa.Core.Abstractions. Multiple instances of the same driver type are supported; each instance binds to its own OPC UA namespace index.
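
As an illustration of the capability model, the sketch below uses Python structural typing rather than the project's C# interfaces. `Readable`, `Writable`, and both driver classes are hypothetical stand-ins, not the real `IReadable` / `IWritable` contracts. The only point it makes is that a driver advertises a capability by implementing it, so the core can discover what each instance supports.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    """Hypothetical stand-in for a read capability."""

    def read(self, tag: str) -> object: ...


@runtime_checkable
class Writable(Protocol):
    """Hypothetical stand-in for a write capability."""

    def write(self, tag: str, value: object) -> None: ...


class ReadOnlyDriver:
    """Hypothetical driver for a source that can only be read."""

    def __init__(self) -> None:
        self._values = {"Line1.Speed": 42}

    def read(self, tag: str) -> object:
        return self._values[tag]


class ReadWriteDriver(ReadOnlyDriver):
    """Hypothetical driver that additionally supports writes."""

    def write(self, tag: str, value: object) -> None:
        self._values[tag] = value


def capabilities(driver: object) -> list[str]:
    """List the capability names a driver instance actually implements."""
    known = {"Readable": Readable, "Writable": Writable}
    return [name for name, proto in known.items() if isinstance(driver, proto)]
```

Under these assumptions, a dispatcher would refuse a write against `ReadOnlyDriver` simply because `Writable` is absent from its capability list.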

HLR-003: Address Space Composition per Namespace

The system shall build the OPC UA address space by composing per-driver subtrees into a single endpoint. Each driver instance owns one namespace and registers its nodes via the core-provided IAddressSpaceBuilder streaming API. The Galaxy driver continues to mirror the deployed ArchestrA object hierarchy (contained-name browse paths) in a namespace of kind SystemPlatform. Native-protocol drivers populate a namespace of kind Equipment whose browse structure conforms to the canonical 5-level Unified Namespace (Enterprise / Site / Area / Line / Equipment), with Signal nodes as the leaves under each Equipment.

HLR-004: Data Type Mapping

Each driver shall map its native data types to OPC UA built-in types via DriverDataType conversions, including support for arrays (ValueRank=1 with ArrayDimensions). Type mapping is driver-specific — docs/DataTypeMapping.md covers Galaxy/MXAccess; each other driver's spec in docs/v2/driver-specs.md covers its own mapping. Unknown/unmapped driver types shall default to String per the driver's spec.
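
A minimal sketch of the fallback rule, written in Python for brevity. The native type names in the table are invented for illustration; the authoritative mappings are the per-driver specs cited above.

```python
# Hypothetical native-type names; real tables live in the per-driver specs.
NATIVE_TO_OPCUA = {
    "bool": "Boolean",
    "int32": "Int32",
    "float": "Float",
    "double": "Double",
    "string": "String",
}


def map_type(native_type: str, is_array: bool = False) -> dict:
    """Map a native type name to an OPC UA built-in type description."""
    # Unknown or unmapped native types default to String.
    builtin = NATIVE_TO_OPCUA.get(native_type.lower(), "String")
    # OPC UA ValueRank: -1 means scalar, 1 means one-dimensional array.
    return {"data_type": builtin, "value_rank": 1 if is_array else -1}
```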

HLR-005: Live Data Access

For every data-path operation (read, write, subscribe notification, alarm event, history read, tag rediscovery, host connectivity probe), the system shall route the call through the capability interface owned by the target driver instance. Reads and subscriptions shall deliver a DataValueSnapshot carrying value, OPC UA StatusCode, and source timestamp regardless of the underlying protocol. Every async capability invocation at dispatch shall pass through Core.Resilience.CapabilityInvoker.
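
The protocol-independent snapshot can be pictured roughly as follows. This is a Python sketch, not the project's C# `DataValueSnapshot`; the field names and the `is_good` helper are assumptions. The severity check relies on the OPC UA StatusCode encoding, in which the top two bits are 00 for Good, 01 for Uncertain, and 10 for Bad.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

GOOD = 0x00000000  # OPC UA "Good" status code


@dataclass(frozen=True)
class DataValueSnapshot:
    """Value, status, and source timestamp, independent of the protocol."""

    value: object
    status_code: int
    source_timestamp: datetime

    @property
    def is_good(self) -> bool:
        # Top two bits of a StatusCode carry the severity; 00 means Good.
        return (self.status_code >> 30) == 0


sample = DataValueSnapshot(21.5, GOOD, datetime.now(timezone.utc))
```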

HLR-006: Change Detection and Rediscovery

Drivers whose backend has a native change signal (e.g. Galaxy's time_of_last_deploy, OPC UA Client receiving ServerStatusChange) shall implement the optional IRediscoverable interface so the core can rebuild only the affected subtree. Drivers whose tag set is static relative to a published config generation are not required to implement IRediscoverable; their address-space structure changes only via a new published Config DB generation (see HLR-011).

HLR-007: Service Hosting

The legacy OtOpcUa.Galaxy.Host x86 host was retired in PR 7.2. Galaxy access now flows through the separately installed mxaccessgw gateway, which lives in a sibling repository and is not part of the OtOpcUa deployment. The system shall therefore be deployed as two cooperating Windows services:

  • OtOpcUa.Server — .NET 10 AnyCPU, Microsoft.Extensions.Hosting + AddWindowsService (decision #30 replaced the original TopShelf choice), hosts every driver in-process — including the new Tier-A GalaxyDriver that speaks gRPC to mxaccessgw — and the OPC UA endpoint.
  • OtOpcUa.Admin — .NET 10 x64 Blazor Server web app, hosts the admin UI, SignalR hubs for live updates, /metrics Prometheus endpoint, and audit log writers.

HLR-008: Logging

The system shall log operational events to rolling daily file sinks using Serilog in every process. Plain-text output is on by default; structured JSON (CompactJsonFormatter) is opt-in via Serilog:WriteJson = true so that SIEMs (Splunk, Datadog) can ingest the logs without a regex parser.

HLR-009: Transport Security and Authentication

The system shall support configurable OPC UA transport-security profiles (None, Basic256Sha256-Sign, Basic256Sha256-SignAndEncrypt, Aes128_Sha256_RsaOaep-Sign, Aes128_Sha256_RsaOaep-SignAndEncrypt, Aes256_Sha256_RsaPss-Sign, Aes256_Sha256_RsaPss-SignAndEncrypt) resolved at startup by SecurityProfileResolver. UserName-token authentication shall be validated against LDAP (production: Active Directory; dev: GLAuth). The server certificate is always created even for None-only deployments because UserName token encryption depends on it.

HLR-010: Per-Driver-Instance Resilience

Every async capability call at dispatch shall pass through Core.Resilience.CapabilityInvoker, which runs a Polly v8 pipeline keyed on (DriverInstanceId, HostName, DriverCapability). Retry and circuit-breaker strategies are per capability per decision #143: Read / Discover / Probe / Subscribe / AlarmSubscribe / HistoryRead retry automatically; Write and AlarmAcknowledge do not retry unless the tag or capability is explicitly marked with WriteIdempotentAttribute. A driver-instance circuit-breaker trip sets Bad quality on that instance's nodes only; other drivers are unaffected (decision #144 — per-host Polly isolation).
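
The retry rule can be sketched in a few lines. This is a plain Python loop, not Polly and not the real `CapabilityInvoker`; it models neither the circuit breaker nor backoff, and `max_attempts=3` is an arbitrary illustrative value.

```python
RETRYABLE = {"Read", "Discover", "Probe", "Subscribe", "AlarmSubscribe", "HistoryRead"}
NO_RETRY_BY_DEFAULT = {"Write", "AlarmAcknowledge"}


def may_retry(capability: str, marked_idempotent: bool = False) -> bool:
    """Decide whether a failed call of this capability may be retried."""
    if capability in RETRYABLE:
        return True
    if capability in NO_RETRY_BY_DEFAULT:
        # Only an explicit idempotency marker opts these capabilities in.
        return marked_idempotent
    raise ValueError(f"unknown capability: {capability}")


def invoke(capability: str, call, *, marked_idempotent: bool = False, max_attempts: int = 3):
    """Run call(), retrying only when the capability permits it."""
    attempts = max_attempts if may_retry(capability, marked_idempotent) else 1
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except Exception as exc:  # sketch: treat every failure as transient
            last_error = exc
    raise last_error
```

The reason for the asymmetry is that repeating a read is harmless, whereas repeating a write or an acknowledgement may apply a side effect twice.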

HLR-011: Config DB and Draft/Publish

Cluster topology, driver instances, namespaces, UNS hierarchy, equipment, tags, node ACLs, poll groups, and role grants shall live in a central MSSQL Config DB, not in appsettings.json. Changes accumulate in a draft generation that is validated and then atomically published. Each published generation gets a monotonically increasing GenerationNumber scoped per cluster. Nodes poll the DB for new published generations and diff-apply surgically against an atomic snapshot. appsettings.json is reduced to bootstrap-only fields (Config DB connection, NodeId, ClusterId, LDAP, security profile, redundancy role, logging, local cache path).
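
A toy model of the draft/publish cycle, assuming an in-memory dictionary in place of the MSSQL Config DB. `ConfigStore` and its method names are hypothetical. The sketch shows only that a draft accumulates edits, that validation precedes publication, and that generation numbers increase per cluster.

```python
class ConfigStore:
    """In-memory stand-in for the Config DB (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._published: dict[str, list[dict]] = {}  # cluster_id -> generations
        self._drafts: dict[str, dict] = {}           # cluster_id -> pending draft

    def latest(self, cluster_id: str) -> tuple[int, dict]:
        """Return (GenerationNumber, config), or (0, {}) if nothing is published."""
        generations = self._published.get(cluster_id, [])
        if not generations:
            return 0, {}
        return len(generations), generations[-1]

    def edit_draft(self, cluster_id: str, key: str, value: object) -> None:
        """Accumulate a change in the draft, starting from the latest generation."""
        if cluster_id not in self._drafts:
            self._drafts[cluster_id] = dict(self.latest(cluster_id)[1])
        self._drafts[cluster_id][key] = value

    def publish(self, cluster_id: str, validate) -> int:
        """Validate the draft, then publish it as the next generation."""
        draft = self._drafts[cluster_id]
        validate(draft)  # raises to reject; nothing is published in that case
        self._published.setdefault(cluster_id, []).append(dict(draft))
        del self._drafts[cluster_id]
        return len(self._published[cluster_id])
```

Because the counter is derived per cluster, two clusters can both be at generation 1 without conflict.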

HLR-012: Local Cache Fallback

Each node shall maintain a sealed LiteDB local cache of the most recent successfully applied generation. If the central Config DB is unreachable at startup, the node shall boot from its cached generation and log a warning. Cache reads are the Polly Fallback leg of the Config DB pipeline.
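
The startup fallback might look like the following Python sketch. A plain dictionary stands in for the LiteDB cache, `fetch_published` is a hypothetical callable, and the sketch caches on fetch rather than after a successful apply, which is a simplification. Raising an error when the DB is unreachable and no cache exists is also an assumption; the requirement does not say what happens in that case.

```python
import logging

log = logging.getLogger("config")


def boot_generation(fetch_published, cache: dict) -> tuple[dict, str]:
    """Return (generation, source), where source is 'db' or 'cache'."""
    try:
        generation = fetch_published()
    except ConnectionError as exc:
        if "generation" not in cache:
            # Assumption: with no cached generation there is nothing to boot from.
            raise RuntimeError("Config DB unreachable and no cached generation") from exc
        log.warning("Config DB unreachable, booting from cached generation: %s", exc)
        return cache["generation"], "cache"
    cache["generation"] = generation  # simplification: cache on fetch, not on apply
    return generation, "db"
```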

HLR-013: Cluster Redundancy

The system shall support non-transparent OPC UA redundancy via 2-node clusters sharing a Config DB generation. RedundancyCoordinator + ServiceLevelCalculator compute a dynamic OPC UA ServiceLevel reflecting role (Primary/Secondary), publish state (current generation applied vs mid-apply), health (driver circuit-breaker state), and apply-lease state. Clients select an endpoint by ServerUriArray + ServiceLevel per the OPC UA spec; there is no VIP or load balancer. Single-node deployments use the same model with NodeCount = 1.
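
From the client's side, endpoint selection reduces to preferring the server that reports the highest ServiceLevel. The Python sketch below assumes the client has already read ServiceLevel from each server in ServerUriArray; the function name and the endpoint URIs in the usage line are hypothetical.

```python
def pick_endpoint(service_levels: dict[str, int]) -> str:
    """Return the server URI currently reporting the highest ServiceLevel."""
    if not service_levels:
        raise ValueError("no servers to choose from")
    for uri, level in service_levels.items():
        if not 0 <= level <= 255:  # ServiceLevel is an OPC UA Byte
            raise ValueError(f"ServiceLevel out of range for {uri}: {level}")
    return max(service_levels, key=lambda uri: service_levels[uri])


preferred = pick_endpoint({"opc.tcp://node-a:4840": 200, "opc.tcp://node-b:4840": 255})
```

With `NodeCount = 1` the same function trivially returns the only server, which is why single-node deployments need no special case.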

HLR-014: Fleet-Wide Identifier Uniqueness

Equipment identifiers that integrate with external systems (ZTag for ERP, SAPID for SAP PM) shall be unique fleet-wide (across all clusters), not just within a cluster. The Admin UI enforces this at draft-publish time via the ExternalIdReservation table, which reserves external IDs across clusters so two clusters cannot publish the same ZTag or SAPID. EquipmentUuid is immutable and globally unique (UUIDv4). EquipmentId and MachineCode are unique within a cluster.
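
The reservation idea can be illustrated with a dictionary keyed on the external ID. This Python sketch is not the schema of the `ExternalIdReservation` table; the class, its method, and the choice to key ownership on the equipment UUID are assumptions.

```python
class ExternalIdReservations:
    """Fleet-wide registry of external IDs (hypothetical in-memory model)."""

    def __init__(self) -> None:
        # (kind, value) -> equipment UUID that holds the reservation
        self._owners: dict[tuple[str, str], str] = {}

    def reserve(self, kind: str, value: str, equipment_uuid: str) -> None:
        """Reserve an external ID, rejecting a claim by a different equipment."""
        owner = self._owners.setdefault((kind, value), equipment_uuid)
        if owner != equipment_uuid:
            raise ValueError(f"{kind} {value!r} is already reserved by equipment {owner}")
```

Re-reserving an ID for the same equipment succeeds, so republishing an unchanged draft is not treated as a conflict in this model.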

HLR-015: Admin UI Operator Surface

The system shall provide a Blazor Server Admin UI (OtOpcUa.Admin) as the sole write path into the Config DB. Capabilities include: cluster + node management, driver-instance CRUD with schemaless JSON editors, UNS drag-and-drop hierarchy editor, CSV-driven equipment import with fleet-wide external-id reservation, draft/publish with a 6-section diff viewer (Drivers / Namespaces / UNS / Equipment / Tags / ACLs), node-ACL editor producing a permission trie, LDAP role grants, redundancy tab, live cluster-generation state via SignalR, audit log viewer. Users authenticate via cookie-auth over LDAP bind; three admin roles (ConfigViewer, ConfigEditor, FleetAdmin) gate UI operations.

HLR-016: Audit Logging

Every publish event and every ACL / role-grant change shall produce an immutable audit log row in the Config DB via AuditLogService with the acting principal, timestamp, action, before/after generation numbers, and affected entity ids. Audit rows are never mutated or deleted.
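
An append-only log with immutable rows can be sketched as below. This is Python rather than the real `AuditLogService`; the class and field names are assumptions derived from the attributes listed above. The log exposes no update or delete method, and rows are frozen dataclasses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRow:
    """One immutable audit entry."""

    principal: str
    timestamp: datetime
    action: str
    generation_before: int
    generation_after: int
    entity_ids: tuple[str, ...]


class AuditLog:
    """Append-only store: rows can be added and read, never changed."""

    def __init__(self) -> None:
        self._rows: list[AuditRow] = []

    def append(self, principal: str, action: str, generation_before: int,
               generation_after: int, entity_ids) -> AuditRow:
        row = AuditRow(principal, datetime.now(timezone.utc), action,
                       generation_before, generation_after, tuple(entity_ids))
        self._rows.append(row)
        return row

    @property
    def rows(self) -> tuple[AuditRow, ...]:
        # Hand out a tuple so callers cannot edit the underlying list.
        return tuple(self._rows)
```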

HLR-017: Prometheus Metrics

The Admin service shall expose a /metrics endpoint using OpenTelemetry → Prometheus. Core / Server shall emit driver health (per DriverInstanceId), Polly circuit-breaker states (per DriverInstanceId + HostName + DriverCapability), capability-call duration histograms, subscription counts, session counts, memory-tracking gauges (Phase 6.1), publish durations, and Config-DB apply-status gauges.

HLR-018: Roslyn Analyzer OTOPCUA0001

All direct call sites to capability-interface methods (IReadable.ReadAsync, IWritable.WriteAsync, ITagDiscovery.DiscoverAsync, ISubscribable.SubscribeAsync, IAlarmSource.SubscribeAlarmsAsync / AcknowledgeAsync, IHistoryProvider.*, IHostConnectivityProbe.*) made outside Core.Resilience.CapabilityInvoker shall produce Roslyn diagnostic OTOPCUA0001 at build time. The analyzer is shipped in ZB.MOM.WW.OtOpcUa.Analyzers and referenced by every project that could host a capability call, guaranteeing that resilience cannot be accidentally bypassed.

Retired HLRs

  • HLR-009 (Status Dashboard) — retired. Superseded by the Admin UI (HLR-015). See docs/v2/admin-ui.md.

Component-Level Requirements

Detailed requirements are broken out into the following documents: